[zfs-discuss] Re: null pointer dereference ddt_phys_decref+0x9/0x10

ScOut3R mailinglist at modernbiztonsag.org
Thu Mar 22 09:19:06 EDT 2012


arc_reclaim locks up, but the first stack trace comes from z_int_rw
(or something similar) with a kernel oops. This happens when I'm
rsyncing data from an mdadm/ext4 array to the ZFS pool. I'm using the
daily repo now. What I noticed is that a few minutes before the oops
happens, the rsync speed drops significantly. The system is running
with 3GB of RAM and I've set the ARC's maximum size to 2GB. I don't use
dedup, but I have compression enabled on half of the volumes. Is there
any other info that would be helpful?

On Mar 22, 12:15 pm, ScOut3R <mailingl... at modernbiztonsag.org> wrote:
> Thank you, I just came across the open issue.
>
> After I was able to create a new pool from the same drives, I began to
> copy over the backed-up data and the copy process crashed. The kernel
> was printing hung-process warnings. I did not manage to catch the
> actual process name, but I'm running memtest on the system now. I
> wouldn't be surprised if the initial pool corruption was caused by
> faulty hardware.
>
> On Mar 19, 5:49 pm, Brian Behlendorf <behlendo... at llnl.gov> wrote:
>
> > This looks like an upstream bug which is very well described in the
> > following post to zfs-discuss.  If you were to make the same change I
> > suspect you would be able to import the pool.  I'd then suggest you
> > migrate your data off to a new pool.
>
> > http://mail.opensolaris.org/pipermail/zfs-discuss/2012-February/05097...
>
> > How your pool could have been damaged like this is the real mystery, but
> > it appears you're not the first person to see this.  I'll open an issue so
> > we can at least update the code with some better error handling.
>
> > --
> > Thanks,
> > Brian
>
> > On Mon, 2012-03-19 at 03:12 -0700, ScOut3R wrote:
> > > Dear List,
>
> > > > I'm running Ubuntu Lucid 64-bit with the latest stable PPA packages
> > > > (0.6.0.54). On import I get the following kernel output and the import
> > > > fails:
>
> > > [ 1993.534996] BUG: unable to handle kernel NULL pointer dereference
> > > at 0000000000000030
> > > [ 1993.535026] IP: [<ffffffffa067d129>] ddt_phys_decref+0x9/0x10 [zfs]
> > > [ 1993.535088] PGD b5e18067 PUD b5eeb067 PMD 0
> > > [ 1993.535104] Oops: 0002 [#1] SMP
> > > [ 1993.535116] last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/
> > > host4/target4:0:0/4:0:0:0/block/sdd/queue/scheduler
> > > [ 1993.535133] CPU 0
> > > [ 1993.535140] Modules linked in: zfs(P) zcommon(P) znvpair(P) zavl(P)
> > > zunicode(P) spl fbcon tileblit font bitblit softcursor
> > > snd_hda_codec_realtek snd_hda_intel vga16fb vgastate snd_hda_codec
> > > snd_hwdep nouveau ppdev snd_pcm ttm drm_kms_helper lp parport_pc
> > > snd_timer parport drm snd soundcore snd_page_alloc i2c_algo_bit
> > > intel_agp zlib_deflate raid10 raid456 async_pq async_xor xor
> > > async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 ohci1394
> > > multipath 3w_9xxx ieee1394 pata_jmicron r8169 mii linear ahci
> > > [ 1993.535309] Pid: 29559, comm: z_fr_iss/0 Tainted: P
> > > 2.6.32-38-generic #83-Ubuntu P35-DS3P
> > > [ 1993.535323] RIP: 0010:[<ffffffffa067d129>]  [<ffffffffa067d129>]
> > > ddt_phys_decref+0x9/0x10 [zfs]
> > > [ 1993.535375] RSP: 0018:ffff88009af39dc0  EFLAGS: 00010246
> > > [ 1993.535384] RAX: 0000000000000000 RBX: ffff880099dd0000 RCX:
> > > ffffffff817b8ae0
> > > [ 1993.535396] RDX: 0000000000000004 RSI: ffff880099d28c70 RDI:
> > > 0000000000000000
> > > [ 1993.535407] RBP: ffff88009af39dc0 R08: 0000000000000000 R09:
> > > 964bfdd7b6bcfd8d
> > > [ 1993.535418] R10: 0000000000000001 R11: 0000000000000001 R12:
> > > ffff880099d28c70
> > > [ 1993.535429] R13: ffff8800b4cd8800 R14: 0000000000000000 R15:
> > > ffff88009a1aec50
> > > [ 1993.535442] FS:  0000000000000000(0000) GS:ffff880001c00000(0000)
> > > knlGS:0000000000000000
> > > [ 1993.535454] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > > [ 1993.535464] CR2: 0000000000000030 CR3: 00000000b5e72000 CR4:
> > > 00000000000006f0
> > > [ 1993.535476] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > 0000000000000000
> > > [ 1993.535487] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> > > 0000000000000400
> > > [ 1993.535498] Process z_fr_iss/0 (pid: 29559, threadinfo
> > > ffff88009af38000, task ffff88009daaade0)
> > > [ 1993.535511] Stack:
> > > [ 1993.535516]  ffff88009af39de0 ffffffffa0708cf1 ffff880099d28c10
> > > 0000000000000200
> > > [ 1993.535532] <0> ffff88009af39e10 ffffffffa070c339 ffff88009a1aec40
> > > ffff8800baf79600
> > > [ 1993.535552] <0> ffff880099d28fa0 ffff8800baf79628 ffff88009af39ee0
> > > ffffffffa05b5c84
> > > [ 1993.535574] Call Trace:
> > > [ 1993.535633]  [<ffffffffa0708cf1>] zio_ddt_free+0x51/0x70 [zfs]
> > > [ 1993.535691]  [<ffffffffa070c339>] zio_execute+0x99/0xf0 [zfs]
> > > [ 1993.535715]  [<ffffffffa05b5c84>] taskq_thread+0x224/0x5b0 [spl]
> > > [ 1993.535731]  [<ffffffff815452c9>] ? thread_return+0x48/0x41f
> > > [ 1993.535744]  [<ffffffff8105cd70>] ? default_wake_function+0x0/0x20
> > > [ 1993.535766]  [<ffffffffa05b5a60>] ? taskq_thread+0x0/0x5b0 [spl]
> > > [ 1993.535779]  [<ffffffff81084a06>] kthread+0x96/0xa0
> > > [ 1993.535791]  [<ffffffff810131ea>] child_rip+0xa/0x20
> > > [ 1993.535802]  [<ffffffff81084970>] ? kthread+0x0/0xa0
> > > [ 1993.535814]  [<ffffffff810131e0>] ? child_rip+0x0/0x20
> > > [ 1993.535822] Code: 75 04 48 8b 46 50 48 89 47 38 c9 c3 66 0f 1f 44
> > > 00 00 55 48 89 e5 0f 1f 44 00 00 48 83 47 30 01 c9 c3 55 48 89 e5 0f
> > > 1f 44 00 00 <48> 83 6f 30 01 c9 c3 55 48 89 e5 0f 1f 44 00 00 31 d2 48
> > > 8d 47
> > > [ 1993.535969] RIP  [<ffffffffa067d129>] ddt_phys_decref+0x9/0x10
> > > [zfs]
> > > [ 1993.536018]  RSP <ffff88009af39dc0>
> > > [ 1993.536636] CR2: 0000000000000030
> > > [ 1993.548418] [drm] nouveau 0000:01:00.0: Setting dpms mode 0 on vga
> > > encoder (output 0)
> > > [ 1993.554358] ---[ end trace 9dd386860ac68813 ]---
>
> > > > The beginning of this story is that I had a 500GB volume in the pool
> > > > with deduplication enabled. To revert this I wanted to destroy that
> > > > filesystem, but the system always crashed during the process, so I
> > > > booted the latest OpenIndiana system and tried to delete the
> > > > filesystem from there. The process ran for days, but after a
> > > > week or so the system rebooted (I guess it crashed too). After the
> > > > reboot the OpenIndiana system could not import the pool, because it
> > > > crashed during the import. When I tried to import the same pool under
> > > > Ubuntu I got the error shown above.
> > > > Is there a way to bring some life back into my pool?
>
> > > Best regards,
> > > Mate
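
Note on the oops quoted above: the register dump is consistent with
ddt_phys_decref() being handed a NULL pointer. RDI (the first argument)
is 0, CR2 (the faulting address) is 0x30, and the marked instruction
bytes "<48> 83 6f 30 01" decode to a 64-bit subtract of 1 from the
memory at RDI+0x30. The sketch below only illustrates that reading and
the kind of "better error handling" mentioned in the thread. It is not
the actual ZFS source or the fix that was applied: the struct layouts
are simplified stand-ins that I am assuming for illustration, and
ddt_phys_decref_checked() is a hypothetical name.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, NULL */
#include <stdint.h>

/* Simplified stand-in for a ZFS data virtual address: assumed 16 bytes. */
typedef struct dva {
    uint64_t dva_word[2];
} dva_t;

#define SPA_DVAS_PER_BP 3   /* assumed number of DVAs per block pointer */

/* Simplified stand-in for a dedup-table physical entry (assumed layout). */
typedef struct ddt_phys {
    dva_t    ddp_dva[SPA_DVAS_PER_BP];  /* 3 * 16 = 48 = 0x30 bytes */
    uint64_t ddp_refcnt;                /* would sit at offset 0x30 */
    uint64_t ddp_phys_birth;
} ddt_phys_t;

/* With this layout, a NULL ddp plus the refcount offset gives 0x30,
 * which matches the CR2 value in the oops. */
static_assert(offsetof(ddt_phys_t, ddp_refcnt) == 0x30,
    "refcount expected at offset 0x30 in this illustrative layout");

/*
 * Hypothetical guarded decrement: returns 0 on success, or -1 when no
 * entry was found (NULL) or the count is already zero, instead of
 * dereferencing a NULL pointer or underflowing the counter.
 */
static int
ddt_phys_decref_checked(ddt_phys_t *ddp)
{
    if (ddp == NULL)
        return -1;
    if (ddp->ddp_refcnt == 0)
        return -1;
    ddp->ddp_refcnt--;
    return 0;
}
```

A caller in the position of zio_ddt_free() could then log or skip the
entry when the function returns -1 rather than crashing the taskq
thread. Whether skipping is safe for a damaged dedup table is a
separate question that this sketch does not answer.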


