[zfs-discuss] Zpool scrub system lockup

Niels de Carpentier zfs at decarpentier.com
Thu Aug 29 06:22:14 EDT 2013


This is why people want drives with TLER (time-limited error recovery).
The drive is most likely in deep error recovery and won't respond for
minutes while it tries to recover a bad sector. The controller gets
confused, tries to reset the drive, and stays stuck until the drive
responds again.
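
On drives that support it, the recovery time can usually be capped through
SCT Error Recovery Control, which smartctl exposes as "-l scterc" with limits
given in units of 100 ms. What follows is only a minimal sketch: the helper
names, the /dev/sdX placeholder and the 7-second limits are assumptions, not
something taken from this system, and a drive behind an Areca controller may
need a smartctl device type (-d) that is not shown here being verified.

```python
import subprocess


def scterc_command(device, read_ds=70, write_ds=70, device_type=None):
    """Build a smartctl command that sets the SCT ERC read/write limits.

    Limits are in deciseconds (70 = 7.0 seconds). The defaults are an
    assumption for illustration, not a recommendation for this hardware.
    """
    cmd = ["smartctl"]
    if device_type:
        # Optional smartctl device type for drives behind a RAID controller.
        cmd += ["-d", device_type]
    cmd += ["-l", f"scterc,{read_ds},{write_ds}", device]
    return cmd


def apply_scterc(device, **kwargs):
    """Run the command built above; needs root and a drive that supports ERC."""
    return subprocess.run(scterc_command(device, **kwargs), check=True)


if __name__ == "__main__":
    # /dev/sdX is a placeholder; this only prints the command, it changes nothing.
    print(" ".join(scterc_command("/dev/sdX")))
```

The setting is often lost on a power cycle, so it would normally be reapplied
at boot. It does not repair a failing drive; it only keeps the drive from
going silent for minutes at a time.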

It should all work again once the drive finishes its recovery, but the
same thing may happen with another sector. Just replace the drive ASAP.
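
To check whether the lockup keeps recurring before the drive is replaced, the
"soft lockup" lines can be pulled out of a saved kernel log. This is a rough
sketch: the function name is made up for illustration, and the pattern is
based only on the message format quoted in this thread.

```python
import re

# Matches e.g. "soft lockup - CPU#0 stuck for 67s! [scsi_eh_8:612]"
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!\s*"
    r"\[(?P<task>[^\]:]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(text):
    """Return one dict per soft lockup message found in the log text."""
    # Drop mail quote markers ("> ") and rejoin lines wrapped by the mailer.
    lines = [re.sub(r"^\s*>\s?", "", line) for line in text.splitlines()]
    flat = " ".join(lines)
    return [
        {
            "cpu": int(m["cpu"]),
            "seconds": int(m["secs"]),
            "task": m["task"],
            "pid": int(m["pid"]),
        }
        for m in LOCKUP_RE.finditer(flat)
    ]


if __name__ == "__main__":
    sample = (
        "> Aug 28 18:40:34 server kernel: BUG: soft lockup - CPU#0 stuck for 67s!\n"
        "> [scsi_eh_8:612]\n"
    )
    for event in find_soft_lockups(sample):
        print(event)
```

If the stuck task is always a scsi_eh thread, as in the log below, that points
at the SCSI error handler waiting on the controller rather than at ZFS itself.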

Niels

> Started a "zpool scrub tank" on a pool. A minute or so into the scrub,
> "zpool status" reported a drive with 158 read and write errors. The scrub
> tried to continue and the kernel started throwing errors on the console:
>
> Message from syslogd at server at Aug 28 19:05:46 ...
>  kernel:BUG: soft lockup - CPU#0 stuck for 67s! [scsi_eh_8:612]
>
> As soon as I can get access to the system again (it is now completely
> locked up) I can send a current error log. There are some older ones,
> however:
>
>
> Aug 28 18:40:34 server kernel: BUG: soft lockup - CPU#0 stuck for 67s!
> [scsi_eh_8:612]
> Aug 28 18:40:34 server kernel: Modules linked in: zfs(P)(U) zcommon(P)(U)
> znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate ipvore shpchp
> ext4 mbcache jbd2 arcmsr sd_mod crc_t10dif firewire_ohci firewire_core
> crc_itu_t ahci ata_generic pata_acpi pata_jmicron dm_mirror
> Aug 28 18:40:34 server kernel: CPU 0
> Aug 28 18:40:34 server kernel: Modules linked in: zfs(P)(U) zcommon(P)(U)
> znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate ipvore shpchp
> ext4 mbcache jbd2 arcmsr sd_mod crc_t10dif firewire_ohci firewire_core
> crc_itu_t ahci ata_generic pata_acpi pata_jmicron dm_mirror
> Aug 28 18:40:34 server kernel:
> Aug 28 18:40:34 server kernel: Pid: 612, comm: scsi_eh_8 Tainted: P
>   ---------------    2.6.32-358.14.1.el6.x86_64 #1 Supermicro
> Aug 28 18:40:34 server kernel: RIP: 0010:[<ffffffffa007c67a>]
>  [<ffffffffa007c67a>] arcmsr_get_firmware_spec+0x4a/0x500 [arcmsr]
> Aug 28 18:40:34 server kernel: RSP: 0018:ffff880c29a55ce0  EFLAGS:
> 00000246
> Aug 28 18:40:34 server kernel: RAX: 0000000000000000 RBX: ffff880c29a55d10
> RCX: 000000000000000d
> Aug 28 18:40:34 server kernel: RDX: ffffc9001cde00bc RSI: 0000000000000282
> RDI: ffff88182af305e0
> Aug 28 18:40:34 server kernel: RBP: ffffffff8100bb8e R08: ffff880c29a54000
> R09: 00000000ffffffff
> Aug 28 18:40:34 server kernel: R10: 0000000000000000 R11: 0000000000000000
> R12: 0000000000000000
> Aug 28 18:40:34 server kernel: R13: 0000000000000282 R14: ffff880c29a55cd0
> R15: ffffffff8150eb7a
> Aug 28 18:40:34 server kernel: FS:  0000000000000000(0000)
> GS:ffff880028200000(0000) knlGS:0000000000000000
> Aug 28 18:40:34 server kernel: CS:  0010 DS: 0018 ES: 0018 CR0:
> 000000008005003b
> Aug 28 18:40:34 server kernel: CR2: 0000000000404190 CR3: 000000182a7e9000
> CR4: 00000000000007f0
> Aug 28 18:40:34 server kernel: DR0: 0000000000000000 DR1: 0000000000000000
> DR2: 0000000000000000
> Aug 28 18:40:34 server kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0
> DR7: 0000000000000400
> Aug 28 18:40:34 server kernel: Process scsi_eh_8 (pid: 612, threadinfo
> ffff880c29a54000, task ffff880c2abdeaa0)
> Aug 28 18:40:34 server kernel: Stack:
> Aug 28 18:40:34 server kernel: ffff880c29a55cf0 ffffffff8150ecde
> ffff88182af305e0 000000000000000d
> Aug 28 18:40:34 server kernel: <d> 0000000000000000 ffffc9001cde00f8
> ffff880c29a55d80 ffffffffa007ce02
> Aug 28 18:40:34 server kernel: <d> ffffffffa007f2a0 ffffffffa007f2a0
> ffff880c29a55d40 ffffffff8135a7e7
> Aug 28 18:40:34 server kernel: Call Trace:
> Aug 28 18:40:34 server kernel: [<ffffffff8150ecde>] ?
> schedule_timeout_uninterruptible+0x1e/0x20
> Aug 28 18:40:34 server kernel: [<ffffffffa007ce02>] ?
> arcmsr_bus_reset+0x2d2/0x560 [arcmsr]
> Aug 28 18:40:34 server kernel: [<ffffffff8135a7e7>] ? put_device+0x17/0x20
> Aug 28 18:40:34 server kernel: [<ffffffff81371314>] ?
> scsi_device_put+0x44/0x60
> Aug 28 18:40:34 server kernel: [<ffffffff81375ce2>] ?
> scsi_try_bus_reset+0x42/0x120
> Aug 28 18:40:34 server kernel: [<ffffffff81377783>] ?
> scsi_eh_ready_devs+0x4a3/0x840
> Aug 28 18:40:34 server kernel: [<ffffffff81378213>] ?
> scsi_error_handler+0x4e3/0x6b0
> Aug 28 18:40:34 server kernel: [<ffffffff81377d30>] ?
> scsi_error_handler+0x0/0x6b0
> Aug 28 18:40:34 server kernel: [<ffffffff81096956>] ? kthread+0x96/0xa0
> Aug 28 18:40:34 server kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
> Aug 28 18:40:34 server kernel: [<ffffffff810968c0>] ? kthread+0x0/0xa0
> Aug 28 18:40:34 server kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
> Aug 28 18:40:34 server kernel: Code: 01 00 00 83 f8 02 0f 84 42 02 00 00
> 4c
> 8b 77 38 41 8b 46 34 83 c8 0d 41 89 46 34 49 8d 96 bc 00 00 0
> Aug 28 18:40:34 server kernel: Call Trace:
> Aug 28 18:40:34 server kernel: [<ffffffff8150ecde>] ?
> schedule_timeout_uninterruptible+0x1e/0x20
> Aug 28 18:40:34 server kernel: [<ffffffffa007ce02>] ?
> arcmsr_bus_reset+0x2d2/0x560 [arcmsr]
> Aug 28 18:40:34 server kernel: [<ffffffff8135a7e7>] ? put_device+0x17/0x20
> Aug 28 18:40:34 server kernel: [<ffffffff81371314>] ?
> scsi_device_put+0x44/0x60
> Aug 28 18:40:34 server kernel: [<ffffffff81375ce2>] ?
> scsi_try_bus_reset+0x42/0x120
> Aug 28 18:40:34 server kernel: [<ffffffff81377783>] ?
> scsi_eh_ready_devs+0x4a3/0x840
> Aug 28 18:40:34 server kernel: [<ffffffff81378213>] ?
> scsi_error_handler+0x4e3/0x6b0
> Aug 28 18:40:34 server kernel: [<ffffffff81377d30>] ?
> scsi_error_handler+0x0/0x6b0
> Aug 28 18:40:34 server kernel: [<ffffffff81096956>] ? kthread+0x96/0xa0
> Aug 28 18:40:34 server kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
> Aug 28 18:40:34 server kernel: [<ffffffff810968c0>] ? kthread+0x0/0xa0
> Aug 28 18:40:34 server kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
>
>
> Looks like it may be having issues with the Areca card? The system is a 3
> chassis Supermicro with 3TB Ultrastars and an Areca 1880 in JBOD mode. Any
> help figuring out what went wrong would be grand!
>
> J
>
> To unsubscribe from this group and stop receiving emails from it, send an
> email to zfs-discuss+unsubscribe at zfsonlinux.org.
>




