[zfs-discuss] zfs large pool - many checksum errors, no read or write errors
Uwe Sauter
uwe.sauter.de at gmail.com
Mon May 16 17:06:45 EDT 2016
Hi,
you could also try to temporarily disable NCQ (= set queue depth to 1).
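A minimal Python sketch of that change follows. It assumes a Linux host where libata disks expose /sys/block/<dev>/device/queue_depth, and writing the real attribute needs root. The function name set_queue_depth and the sysfs_root parameter are hypothetical. The parameter exists only so the sketch can be exercised against a scratch directory.

```python
"""Hypothetical helper: lower a disk's queue depth (depth 1 = NCQ off) via sysfs."""
from pathlib import Path
from typing import Tuple


def set_queue_depth(dev: str, depth: int = 1, sysfs_root: str = "/sys") -> Tuple[int, int]:
    """Write `depth` to <sysfs_root>/block/<dev>/device/queue_depth.

    Returns (previous_depth, depth_now_reported) so the old value can be
    written back later. Writing the real /sys attribute requires root.
    """
    if depth < 1:
        raise ValueError("queue depth must be >= 1")
    attr = Path(sysfs_root) / "block" / dev / "device" / "queue_depth"
    previous = int(attr.read_text().strip())
    attr.write_text(f"{depth}\n")
    # Read back what the kernel actually accepted rather than assuming it.
    current = int(attr.read_text().strip())
    return previous, current
```

Run as root, set_queue_depth("sda") returns the old and new depth. Writing the old value back undoes the change. Doing it per disk at runtime avoids a reboot, unlike a boot parameter.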
As mentioned elsewhere in this thread, the controller's firmware could also be at fault. But I don't think you've said
which controller and firmware version you are using, have you?
Uwe
On 16.05.2016 at 22:49, Badi' Abdul-Wahid via zfs-discuss wrote:
> Francois, here is an instance of the errors I get.
>
>
> May 15 07:15:02 namo kernel: blk_update_request: I/O error, dev sda, sector 1930956344
> May 15 07:15:02 namo kernel: ata1: EH complete
> May 15 07:16:18 namo kernel: ata1.00: exception Emask 0x10 SAct 0x10000 SErr 0x280100 action 0x6 frozen
> May 15 07:16:18 namo kernel: ata1.00: irq_stat 0x08000000, interface fatal error
> May 15 07:16:18 namo kernel: ata1: SError: { UnrecovData 10B8B BadCRC }
> May 15 07:16:18 namo kernel: ata1.00: failed command: READ FPDMA QUEUED
> May 15 07:16:18 namo kernel: ata1.00: cmd 60/00:80:c8:58:23/01:00:74:00:00/40 tag 16 ncq 131072 in
> res 40/00:84:c8:58:23/00:00:74:00:00/40 Emask 0x10 (ATA bus error)
> May 15 07:16:18 namo kernel: ata1.00: status: { DRDY }
> May 15 07:16:18 namo kernel: ata1: hard resetting link
>
> I see several instances of these until
> ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
> after which the errors subside.
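The following minimal Python sketch counts how often this happens per port. It assumes syslog lines in the format quoted above and recognizes only the "SError: { ... }" and "SATA link up" messages. The function name summarize is hypothetical. BadCRC and 10B8B are link-level (interface) errors, so their counts say more about cable, connector, or backplane than about the media.

```python
"""Sketch: tally libata SError flags and last link speed per port from kernel log lines."""
import re
from collections import Counter

# Illustrative patterns covering only the two message types shown in the thread.
SERROR_RE = re.compile(r"\b(ata\d+): SError: \{ ([^}]*) \}")
LINK_UP_RE = re.compile(r"\b(ata\d+): SATA link up ([\d.]+) Gbps")


def summarize(lines):
    """Return (SError flag counts per port, last negotiated link speed per port in Gbps)."""
    flags = {}   # port -> Counter of flag names such as "BadCRC"
    speeds = {}  # port -> speed from the most recent "SATA link up" line
    for line in lines:
        m = SERROR_RE.search(line)
        if m:
            port, names = m.groups()
            flags.setdefault(port, Counter()).update(names.split())
            continue
        m = LINK_UP_RE.search(line)
        if m:
            speeds[m.group(1)] = float(m.group(2))
    return flags, speeds
```

You can feed it the kernel messages, for example from journalctl -k or /var/log/kern.log. The output then shows whether the BadCRC count stops growing once the link comes back up at 3.0 Gbps.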
>
> Looking at smartctl for the disk:
> ata-WDC_WD3001FFSX-68JNUN0_WD-WMC1F0E426P7
> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
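The same downgrade can be spotted from the smartctl output. Below is a minimal sketch that parses the "SATA Version is:" line in the form shown above. The function names are hypothetical, and the regex covers only this one output format.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)".
SATA_VERSION_RE = re.compile(
    r"SATA Version is:\s*.+?,\s*([\d.]+) Gb/s(?:\s*\(current:\s*([\d.]+) Gb/s\))?"
)


def parse_sata_version(line: str) -> Optional[Tuple[float, Optional[float]]]:
    """Return (max_gbps, current_gbps); current is None if smartctl did not report it."""
    m = SATA_VERSION_RE.search(line)
    if m is None:
        return None
    max_gbps = float(m.group(1))
    current = float(m.group(2)) if m.group(2) else None
    return max_gbps, current


def link_downgraded(line: str) -> Optional[bool]:
    """True if the negotiated speed is below the drive's maximum; None if unknown."""
    parsed = parse_sata_version(line)
    if parsed is None or parsed[1] is None:
        return None
    max_gbps, current = parsed
    return current < max_gbps
```

A current speed below the maximum is also expected when the controller or backplane only supports the lower rate. Treat the result as a prompt to compare against other drives on the same controller, not as proof of a fault.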
>
>
>
> On Mon, May 16, 2016 at 12:36 PM, Badi' Abdul-Wahid <abdulwahidc at gmail.com> wrote:
>
> Francois Stark <francois at postmasters.co.za> writes:
>
> > ... Thanks for the suggestion - but wouldn't I be getting read or write errors?
> >
> > I am only getting ZFS checksum errors - no read or write errors. I also don't see any errors in the kernel log for the disks.
> >
> > Can you paste the kind of errors you have found with the WD disks?
>
> Sure, I'll send it this evening when I get a chance.
>
> >
> > Thanks
> >
> > ________________________________________
> > From: zfs-discuss [zfs-discuss-bounces at list.zfsonlinux.org] On Behalf Of Badi' Abdul-Wahid
> > Sent: 16 May 2016 04:27 PM
> >
> > This looks similar to something I see on my system.
> > I've got a couple of the WD FFSX drives (same as yours but 7200rpm).
> > Do you see any errors in your kernel logs regarding these disks?
> > For me, libata reports bad sectors and I can always reproduce these errors when doing a dd test or a scrub.
> > After a few minutes, the link speed gets downgraded from 6 to 3 Gbps and the issue subsides.
> > I've been able to "hide" the problem by setting libata.force=3 in the kernel parameters during boot.
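The following sketch composes that boot parameter for a single port instead of all ports. It assumes the comma-separated "[ID:]VAL" syntax described for libata.force in the kernel's kernel-parameters documentation. The ID "1" targets ata1 as named in the kernel log. The function libata_force and the KNOWN_VALUES subset are hypothetical.

```python
"""Hypothetical helper: assemble a per-port libata.force= boot parameter."""

# A subset of values documented for libata.force: link-speed caps and NCQ on/off.
KNOWN_VALUES = {"1.5Gbps", "3.0Gbps", "ncq", "noncq"}


def libata_force(settings):
    """settings: list of (port, value) pairs, e.g. [(1, "3.0Gbps")] for ata1."""
    entries = []
    for port, value in settings:
        if value not in KNOWN_VALUES:
            raise ValueError(f"unsupported value: {value!r}")
        if not isinstance(port, int) or port < 1:
            raise ValueError(f"port must be a positive integer, got {port!r}")
        entries.append(f"{port}:{value}")
    if not entries:
        raise ValueError("at least one setting is required")
    return "libata.force=" + ",".join(entries)
```

The resulting string, for example libata.force=1:3.0Gbps,1:noncq, would go on the kernel command line in the boot loader configuration. Scoping it to the affected port leaves the other disks at full speed with NCQ enabled.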
> > Can you pull a couple out and connect them directly to the motherboard?
> > When I tested this by bypassing the backplane board, the issue disappeared as well.
> > At this point I'm in the process of replacing my WD Red drives.
> > My guess is that the vibration is beyond what the disks can handle, despite the manufacturer's claims.
> >
> > FWIW, my pool is a mix of WD, HGST, and Seagate disks arranged as three two-disk mirrors.
> >
>
> --
> Badi' Abdul-Wahid
>
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at list.zfsonlinux.org
> http://list.zfsonlinux.org/cgi-bin/mailman/listinfo/zfs-discuss
>