Discussion:
EON: zfs resilver slow for USB disks?
Manojav Sridhar
2010-06-16 14:24:58 UTC
  pool: sahara
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 10h27m, 0.22% done, 4652h48m to go
config:

        NAME        STATE     READ WRITE CKSUM
        sahara      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0  2.60G resilvered
            c1t0d0  ONLINE       0     0     0

errors: No known data errors

Is that remaining time normal? I am running this on brand-new hardware: 2.9 GHz dual core / 4 GB ECC. Seems a bit slow. Any ideas on how to speed it up, or how to see what is choking the resilver?
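For what it's worth, zpool's remaining-time figure is just a straight-line extrapolation of the progress so far, which the numbers above let us check. A minimal sketch:

```python
# Sanity-check the resilver ETA: 0.22% done after 10h27m.
elapsed_h = 10 + 27 / 60        # time the resilver has been running
frac_done = 0.22 / 100          # fraction reported complete

total_h = elapsed_h / frac_done           # straight-line projection
remaining_h = total_h - elapsed_h
print(f"~{total_h:.0f} h total, ~{remaining_h:.0f} h to go")
```

That lands within a couple of percent of zpool's 4652h48m (zpool tracks bytes examined, so its internal fraction is more precise than the rounded 0.22%). The ETA is only as good as the throughput seen so far.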
--
This message posted from opensolaris.org
Andre Lue
2010-06-16 22:25:19 UTC
Vajonam,

Whatever you do, do not turn it off.

Please describe in detail what caused the failure and the exact steps you took to replace the disk.

I've seen a 7210 take approximately 10 hours to resilver a 1 TB drive, so if this is a 1 TB drive I would say it's on track.

Do you have dedup enabled?

Please post the output of the following:
zpool list

zpool iostat -v 2

iostat -xnzp 1

zdb -v
Manojav Sridhar
2010-06-17 01:37:14 UTC
dre2kse,

Thanks for your pointers.

It was not a failure.

I created a pool with a single stripe and copied about 1.1 TB of data to disk A (backing up another mirror I am about to move off hardware RAID onto a ZFS mirror). Then I attached disk B to form the mirror, hoping the attach and resilver would be quick.

But since I posted the original message, iostat has been reporting about 16 MB/s reads and writes to the USB disk(s), which I think is par for the course. The remaining time is now down to about 29 hours after running for 21 hours, but I think it will keep coming down non-linearly, as it has so far.
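As a rough sanity check on those numbers (a sketch, assuming the resilver streams sequentially at the observed USB rate):

```python
# How long a full pass over ~1.1 TB should take at the observed ~16 MB/s.
data_bytes = 1.1e12     # ~1.1 TB copied onto disk A
rate = 16e6             # ~16 MB/s observed over USB

hours = data_bytes / rate / 3600
print(f"~{hours:.0f} h")   # best case, streaming the whole time
```

That works out to roughly 19 hours, so ~21 h elapsed plus ~29 h remaining is well above the best case, consistent with the first stretch running far below the disk's capability.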

What is strange is that for the first 11 hours it was very slow. I didn't know how to use iostat at the time; now I do, so if this happens again I will be sure to post the speeds.

Is there any reason metadata is slower to resilver than the actual data? I intend to do another scrub before I trust the data on the two drives entirely. If the scrub returns 0 errors, am I guaranteed the data is good?
Andre Lue
2010-06-17 04:02:21 UTC
The following are general reasons for resilver slowness:
- something wrong with the replacement disk, a disk in the pool, or the controller
- dedup tables too big to fit in RAM, forcing them to swap
- lots of small, highly fragmented files, and a zpool that is almost (> 80%) full
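On the dedup bullet: a commonly quoted ballpark (an assumption here, not an exact figure) is a few hundred bytes of dedup-table entry per unique block, which for a pool this size can approach a 4 GB machine's RAM:

```python
# Rough DDT memory estimate. ~320 bytes per unique block is a commonly
# cited rule of thumb, not an exact per-platform figure; 128 KiB is the
# default recordsize and is assumed here.
pool_bytes = 1.13 * 2**40        # ~1.13 TiB allocated in the pool
recordsize = 128 * 2**10         # assumed default 128 KiB records
ddt_entry = 320                  # assumed bytes per DDT entry

blocks = pool_bytes / recordsize
ddt_bytes = blocks * ddt_entry
print(f"~{ddt_bytes / 2**30:.1f} GiB of DDT if every block were unique")
```

That comes to a bit under 3 GiB, which is why an undersized-RAM dedup pool can crawl during resilver.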

Noting that the USB writes are on par with the interface's capabilities (~16 MB/s) was a good catch.

If the scrub returns 0 errors, yeah it should be good.
Manojav Sridhar
2010-06-17 11:40:35 UTC
I've started a scrub, but that too is starting off very slowly.

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
sahara      1.13T   243G     66     32  7.50M   180K
  mirror    1.13T   243G     66     32  7.50M   180K
    c0t0d0      -      -      1     77  12.2K  7.66M
    c1t0d0      -      -     64      7  7.53M   160K
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
sahara      1.13T   243G     66    464   171K  2.53M
  mirror    1.13T   243G     66    464   171K  2.53M
    c0t0d0      -      -     61     29   163K  2.53M
    c1t0d0      -      -     57     33   154K  2.90M
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
sahara      1.13T   243G     28    501  70.8K  5.62M
  mirror    1.13T   243G     28    501  70.8K  5.62M
    c0t0d0      -      -     12     60  30.8K  5.77M
    c1t0d0      -      -     15     55  40.0K  5.27M
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
sahara      1.13T   243G     39     51  99.3K  6.40M
  mirror    1.13T   243G     39     51  99.3K  6.40M
    c0t0d0      -      -     18     51  47.5K  6.30M
    c1t0d0      -      -     20     51  51.7K  6.40M
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
sahara      1.13T   243G     82      7   215K   268K
  mirror    1.13T   243G     82      7   215K   268K
    c0t0d0      -      -     80      4   215K   210K
    c1t0d0      -      -     88      4   231K   268K
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
sahara      1.13T   243G     49    544   128K  2.29M
  mirror    1.13T   243G     49    544   128K  2.29M
    c0t0d0      -      -     44     31   117K  2.29M
    c1t0d0      -      -     39     30   104K  2.29M
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
sahara      1.13T   243G     74     24   245K  1.52M
  mirror    1.13T   243G     74     24   245K  1.52M
    c0t0d0      -      -     68     16   186K  1.52M
    c1t0d0      -      -     74     19   248K  1.52M
----------  -----  -----  -----  -----  -----  -----

sahara:
    version: 22
    name: 'sahara'
    state: 0
    txg: 436691
    pool_guid: 1647524158157444734
    hostid: 8832737
    hostname: 'getafix'
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 1647524158157444734
        children[0]:
            type: 'mirror'
            id: 0
            guid: 15124998041910158027
            metaslab_array: 14
            metaslab_shift: 32
            ashift: 9
            asize: 1500294676480
            is_log: 0
            children[0]:
                type: 'disk'
                id: 0
                guid: 5405856878232070164
                path: '/dev/dsk/c0t0d0s0'
                devid: 'id1,***@TWD______Ext_HDD_1021____WMAVU3096894____/a'
                phys_path: '/***@0,0/pci15d9,***@1a,7/***@1/***@0,0:a'
                whole_disk: 1
                DTL: 77
                offline: 1
            children[1]:
                type: 'disk'
                id: 1
                guid: 5096009099665214152
                path: '/dev/dsk/c1t0d0s0'
                devid: 'id1,***@TST31500341AS____________________________2GEVYWE5/a'
                phys_path: '/***@0,0/pci15d9,***@1a,7/***@2/***@0,0:a'
                whole_disk: 1
                DTL: 85
    rewind_txg_ts: 1276641613
    seconds_of_rewind: 0
    verify_data_errors: 0
Post by Andre Lue
- dedup tables too big to fit in RAM, forced to swap
The dedup ratio is 1.00x, so there aren't many duplicates in the data.
Post by Andre Lue
- lotsa lotsa small highly fragmented files and zpool
almost (> 80+ %) full
This is true

NAME     SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
sahara  1.36T  1.13T  243G  82%  1.00x  ONLINE  -

I have 4 GB; should it be using more?

last pid: 2195; load avg: 0.01, 0.01, 0.01; up 1+07:47:54 07:39:52
36 processes: 35 sleeping, 1 on cpu
CPU states: 98.8% idle, 0.0% user, 1.2% kernel, 0.0% iowait, 0.0% swap
Kernel: 3067 ctxsw, 3 trap, 3915 intr, 307 syscall, 2 flt
Memory: 4094M phys mem, 237M free mem

PID USERNAME NLWP PRI NICE SIZE RES STATE TIME CPU COMMAND
2194 root 1 50 0 3672K 1768K cpu/0 0:00 0.04% top
713 root 18 59 0 16M 7648K sleep 0:14 0.00% smbd
1016 root 1 59 0 5472K 2644K sleep 0:03 0.00% ntpd
212 root 6 59 0 11M 4124K sleep 0:01 0.00% devfsadm
669 root 1 59 0 4112K 828K sleep 0:01 0.00% ipmon
1215 root 1 59 0 23M 8560K sleep 0:02 0.00% httpd
9 root 13 59 0 8668K 5960K sleep 0:02 0.00% svc.configd
1232 daemon 2 60 -20 2800K 1884K sleep 0:01 0.00% nfsd
1043 daemon 2 60 -20 2756K 1892K sleep 0:00 0.00% lockd
2104 root 1 59 0 11M 5132K sleep 0:00 0.00% sshd
1894 root 1 59 0 11M 5116K sleep 0:00 0.00% sshd
210 daemon 3 59 0 7836K 4420K sleep 0:00 0.00% kcfd
7 root 10 59 0 7756K 4396K sleep 0:00 0.00% svc.startd
2103 root 1 59 0 9652K 4092K sleep 0:00 0.00% sshd
1893 root 1 59 0 9652K 4028K sleep 0:00 0.00% sshd
1624 root 18 59 0 5448K 3532K sleep 0:00 0.00% nscd
203 daemon 5 59 0 5576K 2964K sleep 0:00 0.00% idmapd
209 root 17 59 0 5676K 2476K sleep 0:00 0.00% syseventd
1216 webservd 1 59 0 21M 2464K sleep 0:00 0.00% httpd
1230 root 7 59 0 3532K 2368K sleep 0:00 0.00% mountd
1897 root 1 59 0 3680K 2352K sleep 0:00 0.00% bash
2107 root 1 59 0 3680K 2340K sleep 0:00 0.00% bash
1217 webservd 1 59 0 23M 2292K sleep 0:00 0.00% httpd
1218 webservd 1 59 0 23M 2292K sleep 0:00 0.00% httpd
881 root 11 59 0 4436K 2276K sleep 0:00 0.00% syslogd
1029 daemon 1 59 0 3116K 2148K sleep 0:00 0.00% statd
892 root 1 59 0 4200K 1512K sleep 0:00 0.00% sshd
1262 root 1 59 0 2860K 1472K sleep 0:00 0.00% rsync
847 daemon 1 59 0 3248K 1424K sleep 0:00 0.00% rpcbind
1294 root 1 59 0 2032K 1416K sleep 0:00 0.00% ttymon
Manojav Sridhar
2010-06-17 16:28:17 UTC
So the scrub sat at about 0.11% for 5 hours, and now suddenly the I/O speed has picked up and it's scrubbing at about 16 MB/s per drive, which is acceptable again given USB.

Can you recommend any commands that will give me a clue as to why the initial stages of a scrub/resilver are so slow?
Manojav Sridhar
2010-06-23 14:10:59 UTC
dre2kse,

iostat -xnzp 1

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   32.0    0.0   88.9  7.0  3.0  219.0   93.8 100 100 c1t0d0s0
   15.0    0.0   36.5    0.0  0.0  0.1    0.0    6.3   0   8 c0t0d0s0
    0.0   32.0    0.0   88.9  7.0  3.0  219.0   93.8 100 100 c1t0d0
   15.0    0.0   36.5    0.0  0.0  0.1    0.0    6.3   0   8 c0t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   34.0    0.0   87.1  7.0  3.0  205.7   88.1 100 100 c1t0d0s0
    5.0    0.0   13.5    0.0  0.0  0.0    0.0    7.5   0   4 c0t0d0s0
    0.0   34.0    0.0   87.1  7.0  3.0  205.7   88.1 100 100 c1t0d0
    5.0    0.0   13.5    0.0  0.0  0.0    0.0    7.5   0   4 c0t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   36.0    0.0   84.4  5.0  2.9  138.0   82.0  92 100 c1t0d0s0
    0.0    8.0    0.0   27.0  0.0  0.0    0.9    2.1   0   1 c0t0d0s0
    0.0   36.0    0.0   84.4  5.0  2.9  138.0   82.0  92 100 c1t0d0
    0.0    8.0    0.0   27.0  0.0  0.0    0.9    2.1   0   1 c0t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   51.0    0.0  305.0  1.7  1.4   33.5   26.8  38  49 c1t0d0s0
    0.0   50.0    0.0  288.0  0.1  0.1    2.2    2.6   3   5 c0t0d0s0
    0.0   51.0    0.0  305.0  1.7  1.4   33.5   26.8  38  49 c1t0d0
    0.0   50.0    0.0  288.0  0.1  0.1    2.2    2.6   3   5 c0t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   75.1    0.0  181.7  0.8  0.8   10.8   10.4  24  27 c1t0d0s0
   33.0    5.0  152.2    4.0  1.0  0.7   25.4   17.7  20  24 c0t0d0s0
    0.0   75.1    0.0  181.7  0.8  0.8   10.8   10.4  24  27 c1t0d0
   33.0    5.0  152.2    4.0  1.0  0.7   25.4   17.7  20  24 c0t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   26.0    0.0   65.0  7.0  3.0  269.2  115.4 100 100 c1t0d0s0
    2.0    0.0    6.0    0.0  0.0  0.0    0.0   16.9   0   2 c0t0d0s0
    0.0   26.0    0.0   65.0  7.0  3.0  269.2  115.4 100 100 c1t0d0
    2.0    0.0    6.0    0.0  0.0  0.0    0.0   16.9   0   2 c0t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   30.0    0.0   71.9  7.0  3.0  233.5  100.1 100 100 c1t0d0s0
   18.0    0.0   48.0    0.0  0.0  0.1    0.0    6.8   0  11 c0t0d0s0
    0.0   30.0    0.0   71.9  7.0  3.0  233.5  100.1 100 100 c1t0d0
   18.0    0.0   48.0    0.0  0.0  0.1    0.0    6.8   0  11 c0t0d0

What do 100 %w and 100 %b mean? This is still during a resilver.
Andre Lue
2010-06-23 16:18:09 UTC
%w = wait (i.e., percent of time transactions were waiting for I/O resources)
%b = busy (i.e., percent of time the device was busy performing I/O)
Manojav Sridhar
2010-07-02 14:44:36 UTC
Found out what the problem was: one of my disks was a 4K-sector WD Caviar Green, which doesn't play well with ZFS according to various forums. I got rid of that drive, and things seem much faster now!
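The 4K issue comes down to read-modify-write: the pool was created with ashift=9 (visible in the zdb output earlier), so ZFS issues 512-byte-aligned I/O, while a Caviar Green's physical sector is 4096 bytes. A minimal sketch of the amplification (sector sizes assumed from the drive specs, not measured here):

```python
# Why an ashift=9 pool hurts on a 4 KiB-sector ("Advanced Format") drive:
# any write smaller than, or misaligned to, a physical sector forces the
# drive to read the whole 4 KiB sector, patch it, and write it back.
LOGICAL = 512     # sector size ZFS assumes (2**9, from ashift: 9)
PHYSICAL = 4096   # assumed physical sector of the 4K drive

def sectors_touched(offset, length, sector=PHYSICAL):
    """Number of physical sectors a write at (offset, length) touches."""
    first = offset // sector
    last = (offset + length - 1) // sector
    return last - first + 1

# A single 512 B write lands inside one 4 KiB sector; the drive moves
# 4096 B to satisfy a 512 B request on each side of the read-modify-write.
w = sectors_touched(offset=512, length=LOGICAL)
print(w * PHYSICAL / LOGICAL)  # bytes moved per byte requested
```

For the small, scattered writes a resilver generates, that per-write penalty adds up quickly, which fits the pegged %w/%b on one drive seen above.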
Andre Lue
2010-07-02 14:46:20 UTC
Cool, thanks for sharing your findings.