At work we run our production systems on Solaris 10, mainly for ZFS (the most awesome filesystem) and Zones (virtualization). We had 3 different revisions of the site, with 2 live at the same time, and each revision required 4 different tiers. Running all these sites ended up requiring 12 different virtual machines, each on its own ZFS filesystem. Some were active and some were shut down in case they were needed later. Easy as pie with Solaris.
[root@zone01 ~]$ zoneadm list -cv
ID NAME STATUS PATH BRAND IP
0 global running / native shared
7 web01-s6 running /zones/web01-s6 native shared
10 web01 running /zones/web01 native shared
- app01 installed /zones/app01 native shared
- graph01 installed /zones/graph01 native shared
- upload01 installed /zones/upload01 native shared
- web01-front installed /zones/web01-front native shared
- app01-front installed /zones/app01-front native shared
- web01-back installed /zones/web01-back native shared
- app01-back installed /zones/app01-back native shared
- graph01-back installed /zones/graph01-back native shared
- upload01-back installed /zones/upload01-back native shared
- appv01-back installed /zones/appv01-back native shared
- app01-s6 installed /zones/app01-s6 native shared
- upload01-s6 installed /zones/upload01-s6 native shared
[root@zone01 ~]$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
zones 106G 9.51G 41.5K /zones
zones/app01 23.0G 9.51G 23.0G /zones/app01
zones/app01-back 2.58G 9.51G 2.58G /zones/app01-back
zones/app01-front 3.95G 9.51G 3.95G /zones/app01-front
zones/app01-s6 3.38G 9.51G 3.38G /zones/app01-s6
zones/appv01-back 1.45G 9.51G 1.45G /zones/appv01-back
zones/export 5.20G 9.51G 1.55G /export
zones/export@100608 3.65G - 3.65G -
zones/graph01 13.8G 9.51G 13.8G /zones/graph01
zones/graph01-back 2.57G 9.51G 2.57G /zones/graph01-back
zones/upload01 2.69G 9.51G 2.69G /zones/upload01
zones/upload01-back 2.58G 9.51G 2.58G /zones/upload01-back
zones/upload01-s6 3.25G 9.51G 3.25G /zones/upload01-s6
zones/web01 28.7G 9.51G 28.7G /zones/web01
zones/web01-back 3.33G 9.51G 3.33G /zones/web01-back
zones/web01-front 5.07G 9.51G 5.07G /zones/web01-front
zones/web01-s6 4.01G 9.51G 4.01G /zones/web01-s6
What wasn’t easy was the crazy behavior I started seeing on certain zones. SSH logins, vi saves and sudo commands were each taking around 20 seconds to complete. There were no abnormal load spikes or unusual processes running. Here is the truss output:
[root@richard.dev ~]$ truss -D ssh web01 uptime
0.0001 read(5, "0E\0\0\0 7\0\0\007 s s h".., 60) = 60
0.0001 write(4, "D3\b1ACD96FFB9 1B0DCE9 /".., 960) = 960
pollsys(0x080455F0, 1, 0x00000000, 0x00000000) (sleeping...)
22.5205 pollsys(0x080455F0, 1, 0x00000000, 0x00000000) = 1
0.0001 read(4, " q9DFB aFA9B C16DE1D13F0".., 8192) = 32
0.0002 close(5)
[root@zone01 ~]$ truss -D vi tmp.txt
0.0001 open("tmp.txt", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 4
0.0000 write(4, " r i c h a r d r i c h a".., 22) = 22
17.6760 fdsync(4, FSYNC) = 0
0.0001 close(4) = 0
[root@zone01 ~]$ truss -D sudo -u build id -a
0.0001 open64("/var/adm/lastlog", O_RDWR|O_DSYNC|O_CREAT, 0444) = 4
0.0001 llseek(4, 14000, SEEK_SET) = 14000
0.0000 time() = 1237920731
20.5669 write(4, "DB +C9 I p t s / 2\0\0\0".., 28) = 28
0.0002 close(4) = 0
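In all three traces the time goes to a single call that waits on a write reaching disk (or, for ssh, on the remote end). Picking those lines out by eye gets tedious on longer traces, so here is a minimal Python sketch that scans `truss -D` output for slow calls. The function name `slow_calls` and the 1-second threshold are my own assumptions for illustration, not part of truss.

```python
import re

# Matches lines such as "17.6760 fdsync(4, FSYNC) = 0":
# a leading delta in seconds, then the syscall name and "(".
LINE_RE = re.compile(r"^\s*(\d+\.\d+)\s+([A-Za-z_]\w*)\(")


def slow_calls(truss_output, threshold=1.0):
    """Return (delta_seconds, syscall) pairs at or above threshold.

    Hypothetical helper; the threshold is an arbitrary cutoff.
    Lines without a leading delta (e.g. "(sleeping...)") are skipped.
    """
    hits = []
    for line in truss_output.splitlines():
        m = LINE_RE.match(line)
        if m and float(m.group(1)) >= threshold:
            hits.append((float(m.group(1)), m.group(2)))
    return hits


# The vi trace from above, pasted in as sample input.
sample = """\
0.0001 open("tmp.txt", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 4
0.0000 write(4, " r i c h a r d r i c h a".., 22) = 22
17.6760 fdsync(4, FSYNC) = 0
0.0001 close(4) = 0
"""

for delta, name in slow_calls(sample):
    print(f"{name} took {delta:.4f}s")
```

Run against the vi trace, this reports only the fdsync call; the same scan over the ssh and sudo traces would flag pollsys and the write to /var/adm/lastlog.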
The only thing I could find through Google was an old ZFS write bug, which didn’t help much except that I now knew what to blame. I left this problem on the back burner while I did some awesome snowboarding in Tahoe over the weekend, my mind still processing in the background. After 2 days of snowboarding I tried cleaning out the old zones and their ZFS filesystems. After all the revisions and system changes, we had gone from 12 zones to 2. Here’s what the cleaned-up zone and ZFS lists look like:
[root@zone01 ~]$ zoneadm list -cv
ID NAME STATUS PATH BRAND IP
0 global running / native shared
7 web01-s6 running /zones/web01-s6 native shared
10 web01 running /zones/web01 native shared
[root@zone01 ~]$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
zones 33.1G 82.1G 27.5K /zones
zones/web01 29.0G 82.1G 29.0G /zones/web01
zones/web01-s6 3.99G 82.1G 3.99G /zones/web01-s6
As soon as I destroyed the first old zone’s ZFS filesystem, things sped up to normal. SSHing into my zones no longer had a delay. I could save in vi and use sudo like a normal person again. I don’t understand why ZFS had such slow write problems, but this sure did fix it. Case closed.
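One thing the two zfs listings do show is how full the pool was before and after the cleanup. ZFS is often reported to write more slowly as a pool gets close to full, though nothing here proves that was the cause. The sketch below just does the arithmetic on the USED and AVAIL figures for the top-level `zones` dataset. The helper names are made up for illustration, and treating USED + AVAIL as the pool size is an approximation.

```python
# Multipliers to GiB for the unit suffixes that zfs list prints.
UNITS = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}


def to_gib(size):
    """Convert a zfs-style size such as '9.51G' to GiB (hypothetical helper)."""
    return float(size[:-1]) * UNITS[size[-1]]


def percent_used(used, avail):
    """Approximate how full the pool is, taking USED + AVAIL as its size."""
    used_g, avail_g = to_gib(used), to_gib(avail)
    return 100.0 * used_g / (used_g + avail_g)


# Figures for the top-level "zones" dataset from the two listings.
before = percent_used("106G", "9.51G")
after = percent_used("33.1G", "82.1G")

print(f"before cleanup: {before:.1f}% used")
print(f"after cleanup:  {after:.1f}% used")
```

By this estimate the pool went from roughly 92% full to roughly 29% full, which at least lines up with the slowdown disappearing once the old zones were destroyed.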