[zfs-discuss] zfs large pool - many checksum errors, no read or write errors

Uwe Sauter uwe.sauter.de at gmail.com
Mon May 16 17:06:45 EDT 2016


Hi,

You could also try temporarily disabling NCQ (i.e. setting the queue depth to 1).
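
For reference, on Linux the queue depth of a SATA disk is usually exposed through sysfs as /sys/block/<dev>/device/queue_depth, and writing 1 there effectively turns NCQ off until it is changed back or the machine reboots. The following is only a minimal sketch under that assumption; the function names and the sysfs_root parameter are made up for illustration, and the write needs root.

```python
from pathlib import Path


def queue_depth_path(dev: str, sysfs_root: str = "/sys") -> Path:
    # Assumed layout: <sysfs_root>/block/<dev>/device/queue_depth
    return Path(sysfs_root) / "block" / dev / "device" / "queue_depth"


def get_queue_depth(dev: str, sysfs_root: str = "/sys") -> int:
    """Return the current queue depth of the given block device."""
    return int(queue_depth_path(dev, sysfs_root).read_text().strip())


def set_queue_depth(dev: str, depth: int, sysfs_root: str = "/sys") -> int:
    """Write a new queue depth and return the previous value.

    A depth of 1 means only one command is outstanding at a time,
    which is what "disabling NCQ" amounts to in practice.
    """
    if depth < 1:
        raise ValueError("queue depth must be at least 1")
    previous = get_queue_depth(dev, sysfs_root)
    queue_depth_path(dev, sysfs_root).write_text(f"{depth}\n")
    return previous
```

Calling set_queue_depth("sda", 1) as root would return the old value, so it can be restored after the test with a second call.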

As mentioned elsewhere in this thread, the controller's firmware could also be at fault. But I don't think you
have told us the controller's firmware version yet, right?

	Uwe

On 16.05.2016 at 22:49, Badi' Abdul-Wahid via zfs-discuss wrote:
> Francois, here is an instance of the errors I get.
> 
> 
> May 15 07:15:02 namo kernel: blk_update_request: I/O error, dev sda, sector 1930956344
> May 15 07:15:02 namo kernel: ata1: EH complete
> May 15 07:16:18 namo kernel: ata1.00: exception Emask 0x10 SAct 0x10000 SErr 0x280100 action 0x6 frozen
> May 15 07:16:18 namo kernel: ata1.00: irq_stat 0x08000000, interface fatal error
> May 15 07:16:18 namo kernel: ata1: SError: { UnrecovData 10B8B BadCRC }
> May 15 07:16:18 namo kernel: ata1.00: failed command: READ FPDMA QUEUED
> May 15 07:16:18 namo kernel: ata1.00: cmd 60/00:80:c8:58:23/01:00:74:00:00/40 tag 16 ncq 131072 in
>                                       res 40/00:84:c8:58:23/00:00:74:00:00/40 Emask 0x10 (ATA bus error)
> May 15 07:16:18 namo kernel: ata1.00: status: { DRDY }
> May 15 07:16:18 namo kernel: ata1: hard resetting link
> 
> I see several instances of these until
> ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
> after which the errors subside.
> 
> Looking at smartctl for the disk:
> ata-WDC_WD3001FFSX-68JNUN0_WD-WMC1F0E426P7
> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
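
The "SATA Version is:" line quoted above carries both the maximum and the currently negotiated link speed, so a downgraded link can be spotted by comparing the two. A rough sketch of that check follows; the function names are hypothetical, the regular expression is modelled only on the single line quoted here, and treating a missing "(current: ...)" part as "running at full speed" is an assumption.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)"
_SATA_LINE = re.compile(
    r"SATA Version is:\s+.*?,\s*([\d.]+)\s*Gb/s"
    r"(?:\s*\(current:\s*([\d.]+)\s*Gb/s\))?"
)


def link_speeds(smartctl_output: str) -> Optional[Tuple[float, float]]:
    """Return (max_gbps, current_gbps), or None if no SATA line is found."""
    match = _SATA_LINE.search(smartctl_output)
    if match is None:
        return None
    max_speed = float(match.group(1))
    # Assumption: no "(current: ...)" part means the link runs at max speed.
    current = float(match.group(2)) if match.group(2) else max_speed
    return max_speed, current


def is_downgraded(smartctl_output: str) -> bool:
    """True if the negotiated speed is below the drive's maximum."""
    speeds = link_speeds(smartctl_output)
    return speeds is not None and speeds[1] < speeds[0]
```

Fed the line quoted above, link_speeds() gives (6.0, 3.0) and is_downgraded() is true, matching the 6 to 3 Gbps drop described further down in the thread.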
> 
> 
> 
> On Mon, May 16, 2016 at 12:36 PM, Badi' Abdul-Wahid <abdulwahidc at gmail.com> wrote:
> 
>     Francois Stark <francois at postmasters.co.za> writes:
> 
>     > ... Thanks for the suggestion - but wouldn't I be getting read or write errors?
>     >
>     > I am only getting ZFS checksum errors - no read or write errors. I also don't see any errors in the kernel for the disks.
>     >
>     > Can you paste the kind of errors you have found with the WD disks?
> 
>     Sure, I'll send it this evening when I get a chance.
> 
>     >
>     > Thanks
>     >
>     > ________________________________________
>     > From: zfs-discuss [zfs-discuss-bounces at list.zfsonlinux.org] On Behalf Of Badi' Abdul-Wahid
>     > Sent: 16 May 2016 04:27 PM
>     >
>     > This looks similar to something I see on my system.
>     > I've got a couple of the WD FFSX drives (same as yours but 7200rpm).
>     > Do you see any errors in your kernel logs regarding these disks?
>     > For me, libata reports bad sectors and I can always reproduce these errors when doing a dd test or a scrub.
>     > After a few minutes, the link speed gets downgraded from 6 to 3 Gbps and the issue subsides.
>     > I've been able to "hide" the problem by setting libata.force=3 in the kernel parameters during boot.
>     > Can you pull a couple out and connect them directly to the motherboard?
>     > When I tested this, skipping the backplane board, the issue disappeared as well.
>     > I'm now in the process of replacing my WD Red drives.
>     > My guess is that the vibration is beyond what the disks can handle, despite the manufacturer's claims.
>     >
>     > FWIW, my pool is a mix of WD, HGST, and Seagate disks in 3 mirrors of two disks.
>     >
> 
>     --
>     Badi' Abdul-Wahid
> 
> 
> 
> 
> -- 
> 
> Badi' Abdul-Wahid
> 
> 
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at list.zfsonlinux.org
> http://list.zfsonlinux.org/cgi-bin/mailman/listinfo/zfs-discuss
> 

