[zfs-discuss] zfs large pool - many checksum errors, no read or write errors

Badi' Abdul-Wahid abdulwahidc at gmail.com
Mon May 16 10:27:25 EDT 2016


This looks similar to something I see on my system.
I've got a couple of the WD FFSX drives (same as yours but 7200rpm).
Do you see any errors in your kernel logs regarding these disks?
For me, libata reports bad sectors and I can always reproduce these errors when doing a dd test or a scrub.
After a few minutes, the link speed gets downgraded from 6 to 3 Gbps and the issue subsides.
I've been able to "hide" the problem by setting libata.force=3 in the kernel parameters during boot.
Can you pull a couple out and connect them directly to the motherboard?
When I tested this, skipping the backplane board, the issue disappeared as well.
I'm now in the process of replacing my WD Red drives.
My guess is that the vibration is beyond what the disks can handle, despite the manufacturer's claims.
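
If it helps, the negotiated link speed can be read from sysfs directly, without waiting for a downgrade message in the kernel log. Below is a minimal Python sketch. It assumes a libata-driven controller that exposes /sys/class/ata_link/link*/sata_spd, so disks behind a SAS HBA will not show up there. The function names are made up for illustration, and the "<unknown>" value for a link with nothing attached is my understanding of what libata reports.

```python
from pathlib import Path


def read_link_speeds(root="/sys/class/ata_link"):
    """Return {link_name: (current_speed, speed_limit)} for each ATA link."""
    base = Path(root)
    speeds = {}
    if not base.is_dir():
        return speeds
    for link in sorted(base.glob("link*")):
        try:
            current = (link / "sata_spd").read_text().strip()
            limit = (link / "sata_spd_limit").read_text().strip()
        except OSError:
            # Attribute missing or unreadable: skip this link.
            continue
        speeds[link.name] = (current, limit)
    return speeds


def below_expected(speeds, expected="6.0 Gbps"):
    """Names of links that negotiated something other than `expected`.

    Links reporting "<unknown>" (assumed to mean no device attached)
    are ignored.
    """
    return sorted(
        name
        for name, (current, _limit) in speeds.items()
        if current not in (expected, "<unknown>")
    )


if __name__ == "__main__":
    found = read_link_speeds()
    for name, (current, limit) in found.items():
        print(f"{name}: current={current} limit={limit}")
    print("below expected:", below_expected(found))
```

Running it before and after a scrub or dd test should show whether a link has dropped from 6.0 to 3.0 Gbps.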

FWIW, my pool is a mix of WD, HGST, and Seagate disks in 3 mirrors of two disks.


Francois Stark via zfs-discuss <zfs-discuss at list.zfsonlinux.org> writes:

> Over the past five years we have built four large ZFS on Linux
> servers, all on Ubuntu 12 and 14.04; Supermicro storage servers, with
> zfs pools ranging from 2x 8 disk raidz1 to the latest monster - 3x 10
> disk raidz2, filled with WD NAS RED 6TB disks. We have never seen
> anything like this on ZFS before.
>
> So what do you make of our latest scrub feedback :
>
> pool: saturnpool
> state: ONLINE
> status: One or more devices has experienced an unrecoverable error. An
> attempt was made to correct the error. Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
> using 'zpool clear' or replace the device with 'zpool replace'.
> see: http://zfsonlinux.org/msg/ZFS-8000-9P
> scan: scrub repaired 12,5M in 27h32m with 0 errors on Sun May 1 04:32:43 2016
> config:
>
> NAME                                                STATE    READ WRITE CKSUM
> saturnpool                                          ONLINE      0     0     0
>   raidz2-0                                          ONLINE      0     0     0
>     ata-WDC_WD60EFRX-68MYMN1_WD-WX71D65JE3HN        ONLINE      0     0    20
>     ata-WDC_WD60EFRX-68MYMN1_WD-WX71D65JEXK4        ONLINE      0     0    24
>     ata-WDC_WD60EFRX-68MYMN1_WD-WX71D65JEYP6        ONLINE      0     0    27
>     ata-WDC_WD60EFRX-68MYMN1_WD-WX81D6550AYV        ONLINE      0     0    25
>     ata-WDC_WD60EFRX-68MYMN1_WD-WX81D6550CRK        ONLINE      0     0    15
>     ata-WDC_WD60EFRX-68MYMN1_WD-WX81D6550R7L        ONLINE      0     0    27
>     ata-WDC_WD60EFRX-68MYMN1_WD-WX91D65359U2        ONLINE      0     0    27
>     ata-WDC_WD60EFRX-68MYMN1_WD-WX91D6535C70        ONLINE      0     0    18
>     ata-WDC_WD60EFRX-68MYMN1_WD-WX91D6535RFL        ONLINE      0     0    32
>     ata-WDC_WD60EFRX-68MYMN1_WD-WX91D6535SJJ        ONLINE      0     0    31
>   raidz2-1                                          ONLINE      0     0     0
>     ata-WDC_WD60EFRX-68MYMN1_WD-WX91D6535Y18        ONLINE      0     0    28
>     ata-WDC_WD60EFRX-68MYMN1_WD-WX91D6535YK9        ONLINE      0     0    19
>     ata-WDC_WD60EFRX-68MYMN1_WD-WX91D65DCS2R        ONLINE      0     0    23
>     ata-WDC_WD60EFRX-68MYMN1_WD-WXA1D6542060        ONLINE      0     0    37
>     ata-WDC_WD60EFRX-68MYMN1_WD-WXA1D65426T4        ONLINE      0     0    26
>     ata-WDC_WD60EFRX-68MYMN1_WD-WXA1D6542C71        ONLINE      0     0    26
>     ata-WDC_WD60EFRX-68MYMN1_WD-WXA1D6542CRZ        ONLINE      0     0    21
>     ata-WDC_WD60EFRX-68MYMN1_WD-WXA1D6542DDT        ONLINE      0     0    22
>     ata-WDC_WD60EFRX-68MYMN1_WD-WXA1D6542DY2        ONLINE      0     0    36
>     ata-WDC_WD60EFRX-68MYMN1_WD-WXA1D6542EJJ        ONLINE      0     0    29
>   raidz2-2                                          ONLINE      0     0     0
>     ata-WDC_WD60EFRX-68MYMN1_WD-WXA1D6542EYN        ONLINE      0     0    40
>     ata-WDC_WD60EFRX-68MYMN1_WD-WXA1D6542H5Y        ONLINE      0     0    29
>     ata-WDC_WD60EFRX-68MYMN1_WD-WXA1D6542S04        ONLINE      0     0    24
>     ata-WDC_WD60EFRX-68MYMN1_WD-WXA1D6542SC0        ONLINE      0     0    30
>     ata-WDC_WD60EFRX-68MYMN1_WD-WXA1D6542ZNE        ONLINE      0     0    26
>     ata-WDC_WD60EFRX-68MYMN1_WD-WXA1D65E34U9        ONLINE      0     0    26
>     ata-WDC_WD60EFRX-68MYMN1_WD-WXA1D65E3R25        ONLINE      0     0    24
>     ata-WDC_WD60EFRX-68MYMN1_WD-WXA1D65E3XL3        ONLINE      0     0    35
>     ata-WDC_WD60EFRX-68MYMN1_WD-WXA1D65E3Y62        ONLINE      0     0    24
>     ata-WDC_WD60EFRX-68MYMN1_WD-WXA1D65E3YJ9        ONLINE      0     0    28
> cache
>     ata-Samsung_SSD_850_PRO_256GB_S251NXAH232543L   ONLINE      0     0     0
>     ata-Samsung_SSD_850_PRO_256GB_S251NXAH232545F   ONLINE      0     0     0
>     ata-Samsung_SSD_850_PRO_256GB_S251NXAH232550B   ONLINE      0     0     0
>
> errors: No known data errors
>
>
>
> Hmm. Faulty LSI JBOD card? Normally we see read or write errors causing checksum errors, so how do we get so many checksum errors without any r/w errors?
>
> Does this point to a software error? The errors are also spread across all disks, and it is impossible that all the disks are faulty...
>
> This server is meant to be a near-line backup server. We send 27 TB zfs filesystems to it over 1 Gb Ethernet and then do daily incremental zfs send | zfs receive to it.
>
> Anybody seen so many checksum errors in zfs yet?
>
> We have contacted supermicro - they replaced the LSI card (Avago LSI SAS3008), but the errors persist.  They are now blaming the disks - WD NAS RED 6TB.
>
>
> Here is some more detail about the server:
> Supermicro SSG-6048R-E1CR36L 4U RACK, 2 XE5-2600V3, 36X 3.5”
>
> We are running the latest ZFS version from the zfs-native/stable PPA.
>
> Ubuntu 14.04.4 LTS (GNU/Linux 4.2.0-36-generic x86_64)
>
> 03:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
>
> NAME        PROPERTY                    VALUE                       SOURCE
> saturnpool  size                        164T                        -
> saturnpool  capacity                    20%                         -
> saturnpool  altroot                     -                           default
> saturnpool  health                      ONLINE                      -
> saturnpool  guid                        16758994863278285701        default
> saturnpool  version                     -                           default
> saturnpool  bootfs                      -                           default
> saturnpool  delegation                  on                          default
> saturnpool  autoreplace                 off                         default
> saturnpool  cachefile                   -                           default
> saturnpool  failmode                    wait                        default
> saturnpool  listsnapshots               on                          local
> saturnpool  autoexpand                  off                         default
> saturnpool  dedupditto                  0                           default
> saturnpool  dedupratio                  1.00x                       -
> saturnpool  free                        130T                        -
> saturnpool  allocated                   33.7T                       -
> saturnpool  readonly                    off                         -
> saturnpool  ashift                      0                           default
> saturnpool  comment                     -                           default
> saturnpool  expandsize                  -                           -
> saturnpool  freeing                     0                           default
> saturnpool  fragmentation               13%                         -
> saturnpool  leaked                      0                           default
> saturnpool  feature@async_destroy       enabled                     local
> saturnpool  feature@empty_bpobj         active                      local
> saturnpool  feature@lz4_compress        active                      local
> saturnpool  feature@spacemap_histogram  active                      local
> saturnpool  feature@enabled_txg         active                      local
> saturnpool  feature@hole_birth          active                      local
> saturnpool  feature@extensible_dataset  enabled                     local
> saturnpool  feature@embedded_data       active                      local
> saturnpool  feature@bookmarks           enabled                     local
> saturnpool  feature@filesystem_limits   enabled                     local
> saturnpool  feature@large_blocks        enabled                     local
>
>
> zfs get version saturnpool
> NAME        PROPERTY  VALUE    SOURCE
> saturnpool  version   5        -
>
>
> Thanks
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at list.zfsonlinux.org
> http://list.zfsonlinux.org/cgi-bin/mailman/listinfo/zfs-discuss
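
To see how evenly the checksum errors in the quoted scrub output are spread, the CKSUM column can be tallied per vdev. The Python sketch below is rough and assumes the config section looks like the one quoted above: a NAME/STATE/READ/WRITE/CKSUM header, then the pool row, then vdevs and disks, with plain integer counters (suffixed values such as "1.2K" are not handled). The function name is hypothetical.

```python
import re

# A config row: name, state, then three integer counters.
ROW = re.compile(r"^(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)$")
# Top-level redundancy groups such as raidz2-0 or mirror-1.
VDEV = re.compile(r"^(mirror|raidz[123]?)-\d+$")
# Section labels that appear alone on a line.
SECTIONS = {"cache", "logs", "spares"}


def cksum_by_vdev(status_text):
    """Map each vdev (or section) name to the CKSUM counts of its disks."""
    counts = {}
    group = None
    seen_header = False
    seen_pool = False
    for raw in status_text.splitlines():
        # Drop mail quote markers and indentation.
        line = raw.lstrip("> \t").strip()
        if line.split() == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            seen_header, seen_pool = True, False
            continue
        if not seen_header:
            continue
        if line in SECTIONS:
            group = line
            continue
        match = ROW.match(line)
        if not match:
            continue
        name, cksum = match.group(1), int(match.group(5))
        if not seen_pool:
            # The first row after the header is the pool itself.
            seen_pool = True
            continue
        if VDEV.match(name):
            group = name
            counts.setdefault(group, [])
            continue
        # A disk: file it under the current group, or under its own
        # name if the pool has no redundancy groups.
        counts.setdefault(group or name, []).append(cksum)
    return counts


if __name__ == "__main__":
    import sys

    for vdev, values in cksum_by_vdev(sys.stdin.read()).items():
        print(f"{vdev}: disks={len(values)} total={sum(values)}")
```

Fed with `zpool status saturnpool` on stdin, the quoted output gives totals of 246, 267 and 286 for raidz2-0, raidz2-1 and raidz2-2, and 0 for the cache devices. Every spinning disk is affected at a similar rate, which fits a shared component better than thirty individually failing drives.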

-- 
Badi' Abdul-Wahid

