[zfs-discuss] zpool scrub repeatedly detects checksum errors
achim at ag-web.biz
Fri Sep 13 06:21:36 EDT 2013
On 13.09.2013 11:17, Gregor Kopka wrote:
> On 13.09.2013 04:37, Achim Gottinger wrote:
>> Recently I checked my zpool on Debian wheezy with a scrub and it
>> detected around 450 checksum errors. Two of them remained
>> uncorrectable, resulting in two defective files, which I deleted.
>> The pool consists of a stripe of two 2TB drives plus 10GB of cache and
>> 512MB for logs.
>> Wheezy runs as a VM with 4 cores and 4GB RAM on top of Xen Cloud
>> Platform. The two 2TB drives are physically raid1s on the host, on
>> an Adaptec 6805E RAID controller. Because the VM is
>> limited to a maximum of six virtual disks, each with a maximum of
>> 2TB, I had to use the above approach to get a ~4TB pool inside the VM.
>> The pool is accessed via NFSv3 and Samba (4).
>> In the past I had a few kernel lockups due to ARC-related memory
>> issues. I ended up with a max ARC size of 1GB back then; afterwards
>> it was stable.
>> Back on topic: after the scrub I checked both raid1s by running
>> verify and fix on the Adaptec controller, and they were both OK. The
>> physical drives themselves had no errors (SMART/hardware/parity),
>> just a few aborted-command warnings. I noted the aborted-command
>> values, cleared the warnings on the zpool and the discs, and reran a
>> scrub.
>> Once again it found about 450 checksum errors on each drive. This
>> time all of them were correctable, but how can that be? The
>> aborted-command counters on the physical discs involved did not change
>> during the test.
>> So now I wonder where these checksum errors come from when the
>> discs involved are intact and all previous errors could be fixed.
> In case the errors are in zfs metadata then they're correctable (since
> zfs keeps multiple copies of metadata).
Thank you for the reply, Gregor!
This is the output from the second scrub, which I ran after checking the raid1s and discs involved for errors.
The discs and raids are OK, the system has run flawlessly since the previous scrub, and I had cleared the errors before I ran the scrub again.
I wonder where these checksum errors come from, because I expected them to have been fixed during the first scrub and by my removing the two affected files.
I run zfs/spl version 0.6.2, btw.
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
scan: scrub repaired 1,07M in 5h51m with 0 errors on Fri Sep 13 01:15:56 2013
NAME    STATE   READ  WRITE  CKSUM
zpool   ONLINE     0      0      0
xvde    ONLINE     0      0    446
xvdf    ONLINE     0      0    459
xvda3   ONLINE     0      0      0
xvdd1   ONLINE     0      0      0
errors: No known data errors
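
For comparing the counters between scrubs, the per-device CKSUM values in the status output above can be pulled out with a small script. This is only a rough sketch in Python: the function name `parse_cksum_counts` is made up for illustration, the sample text is the table pasted above, and it assumes plain integer counters (rows where zpool abbreviates large counts with a suffix such as K would be skipped).

```python
def parse_cksum_counts(status_text):
    """Return {device_name: cksum_errors} from the device rows of `zpool status`.

    Hypothetical helper for illustration; it only understands rows of the
    form NAME STATE READ WRITE CKSUM with plain integer counters.
    """
    counts = {}
    for line in status_text.splitlines():
        fields = line.split()
        # Device rows have exactly five whitespace-separated fields.
        if len(fields) != 5 or fields[0] == "NAME":
            continue
        name, _state, _read, _write, cksum = fields
        # Skip anything whose last column is not a plain number
        # (e.g. the "errors: No known data errors" line).
        if cksum.isdigit():
            counts[name] = int(cksum)
    return counts


sample = """\
NAME STATE READ WRITE CKSUM
zpool ONLINE 0 0 0
xvde ONLINE 0 0 446
xvdf ONLINE 0 0 459
xvda3 ONLINE 0 0 0
xvdd1 ONLINE 0 0 0
errors: No known data errors
"""

counts = parse_cksum_counts(sample)
# Keep only the devices that reported checksum errors.
bad = {dev: n for dev, n in counts.items() if n > 0}
print(bad)
```

Saving the result after each scrub would make it easy to see whether the same devices keep reporting roughly the same number of errors.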
> Having the raid find nothing is quite normal, since it'll happily feed
> you garbage because it doesn't know anything about the data on disk.
> With the 6 drives limit on the VM it would be better to hand the 4
> data disks to the VM (since it would fit the limitation, in case the
> cache and log drives are from one SSD you could hand it completely and
> create the partitions inside the VM) and let ZFS handle them directly,
> then you would have had better info which disk failed (where the
> checksum errors come from) and also the ability of zfs to repair them
> from the other side of the mirror.
Indeed, that would be better, but one drive slot is occupied by the system, one
by the DVD, and I need another free slot for drives I occasionally mount from
other VMs, so I had to go the raid0 route.
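
To illustrate the point quoted above about a controller-level raid1 versus a ZFS mirror, here is a toy model in Python of a checksum-verified read. It is not how ZFS is implemented: `checked_mirror_read` is a made-up name, SHA-256 merely stands in for whatever checksum the pool uses, and the "disks" are just byte strings in a list. It only shows why a reader that knows the expected checksum can pick the good copy and repair the bad one, while a single-copy stripe can only report the error.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is a stand-in here, not a claim about the pool's checksum.
    return hashlib.sha256(data).hexdigest()


def checked_mirror_read(copies, expected):
    """Return (data, repaired_count) using the first copy that matches
    the expected checksum; mismatching copies are overwritten in place."""
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("uncorrectable: no copy matches the checksum")
    repaired = 0
    for i, c in enumerate(copies):
        if c != good:
            copies[i] = good  # "self-heal" the bad side from the good one
            repaired += 1
    return good, repaired


block = b"important data"
expected = checksum(block)

# Two-way mirror with one silently corrupted side: the read succeeds
# and the bad copy is rewritten.
copies = [b"important dat\x00", block]
data, repaired = checked_mirror_read(copies, expected)

# Single copy (as in a stripe on top of hardware raid1): the error is
# detected, but there is nothing to repair it from.
try:
    checked_mirror_read([b"important dat\x00"], expected)
    outcome = "ok"
except IOError:
    outcome = "uncorrectable"

print(repaired, outcome)
```

In this model the hardware raid1 underneath corresponds to the single-copy case: the controller may hold two copies, but the checksum-aware layer only ever sees one of them.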
> To unsubscribe from this group and stop receiving emails from it, send
> an email to zfs-discuss+unsubscribe at zfsonlinux.org.