[zfs-discuss] Corrupted ZFS, but Scrub Shows no Errors!

Omen Wild omen at mandarb.com
Fri Nov 27 00:10:51 EST 2015


Quoting Gordan Bobic via zfs-discuss <zfs-discuss at list.zfsonlinux.org> on Thu, Nov 26 19:22:
>
> I have a zfs that is verifiably corrupted, but the pool containing it
> scrubs out as completely clean with no errors.
> 
> Various directories/links are showing up as, for example:
> 
> # ls -la /maximilian/hammersteinROOT/boot.bak/rd/usr/
> ls: cannot access /maximilian/hammersteinROOT/boot.bak/rd/usr/lib: No such
> file or directory
> total 17
> drwxr-xr-x. 3 root root 3 Nov 26 18:19 .
> drwxr-xr-x. 3 root root 3 Nov 26 18:17 ..
> d?????????? ? ?    ?    ?            ? lib
> 
> This corruption survives zfs send/receive, and pools on both sides of
> send/receive scrub as clean, without any errors detected.
> 
> Trying to remove the entries that show up as the above results in an
> instantaneous kernel panic on both x86-64 and 32-bit ARM.

We have a similar problem on OmniOS. In our case, the xattrs appear to be
corrupt and give pretty much identical results, though we have not tried
a send/receive of the corrupt pool.
<http://www.listbox.com/member/archive/182191/2015/09/sort/thread/page/4/entry/16:109/20150915180426:B64F1110-5BF5-11E5-A431-A649A75614FD/>

The pool originated on OpenIndiana. Before we got corruption we did a zfs
send/receive from an OpenIndiana system, then moved the disks to an
OmniOS system in late spring. In August we started getting panics when
trying to unlink a specific file (one that has a bunch of hard links).
The panics, and in fact the exact line of code that panics the system,
are described in the email thread linked above, on the Illumos ZFS
mailing list.

The first scrub I did after the corruption was detected found no
errors, but the next one found one error:
----- Begin zpool status -v -----
  scan: scrub repaired 0 in 22h22m with 1 errors on Tue Sep 29 14:47:52 2015
  ...
  errors: Permanent errors have been detected in the following files:
  
          zaphod:<0x1f2724c>
---- End zpool status -v -----

"zdb zaphod 0x1f2724c" ends with:
zdb: dmu_bonus_hold(32666188) failed, errno 2

So even zdb is unable to read the data.
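
For reference, the two numbers above refer to the same object: zpool
status prints the object ID in hex, while the zdb error prints it in
decimal, and errno 2 is ENOENT. A minimal Python sketch to check this
(the variable names are just for illustration, and this is not part of
any ZFS tooling):

```python
import errno
import os

obj_hex = "0x1f2724c"          # object ID as printed by zpool status -v
obj_dec = int(obj_hex, 16)     # the same ID in decimal, as zdb prints it
err_name = errno.errorcode[2]  # symbolic name for errno 2
err_text = os.strerror(2)      # the platform's message for errno 2

print(obj_dec, err_name, err_text)
```

So zdb is reporting that the object named in the scrub error cannot be
found at all.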

We are pretty much stuck. The system is our backup server (BackupPC), so
restoring from backups isn't an option, and there is a single filesystem
that has 17TB of data. Every time BackupPC tries to unlink the one
specific file the system panics. I've started moving BackupPC's trash to
a new folder and deleting everything that isn't the bad file.
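
The cleanup can be sketched roughly as below. This is only an
illustration under assumptions: purge_except and its arguments are
made-up names, the paths are whatever the relocated trash folder and the
bad file happen to be, and it matches the bad file by path only, so any
other hard link to the same inode would have to be listed as well.

```python
import os

def purge_except(root, keep_paths):
    """Unlink every file under root except the paths in keep_paths.

    Returns the list of paths that were deliberately left in place.
    """
    keep = {os.path.abspath(p) for p in keep_paths}
    skipped = []
    # Walk bottom-up so each directory is visited after its contents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.abspath(os.path.join(dirpath, name))
            if path in keep:
                skipped.append(path)  # never touch the known-bad entry
                continue
            os.unlink(path)
        for name in dirnames:
            try:
                # rmdir only succeeds on an empty directory
                os.rmdir(os.path.join(dirpath, name))
            except OSError:
                pass  # still holds a kept file (or is a symlink); leave it
    return skipped
```

Directories that still contain the bad file are left behind, so the
filesystem never sees an unlink of that entry.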

The last comment we got in the email thread was that we could rework the
code where ZFS panics so that it ignores the error and proceeds anyway.

-- 
Experience enables you to recognize a mistake when you make it again.