[zfs-discuss] hung thread while making FS on zvol durring vmbuilder

Christ Schlacta aarcane at aarcane.org
Sat Mar 10 00:57:30 EST 2012


while running vmbuilder on my main server (raidz on 6 drives, 1068E 
controller, more details if needed), I got the following stack traces 
from dmesg and 100% CPU usage on one core:

aarcane at density:~$ sudo dmesg -c
[100529.120001] INFO: rcu_sched_state detected stall on CPU 1 (t=15000 
jiffies)
[100681.668141] INFO: task fsnotify_mark:56 blocked for more than 120 
seconds.
[100681.668176] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[100681.668207] fsnotify_mark   D ffffffff81805060     0    56      2 
0x00000000
[100681.668211]  ffff88020f55dc60 0000000000000046 0000000000000000 
0000000000000000
[100681.668214]  ffff88020f55dfd8 ffff88020f55dfd8 ffff88020f55dfd8 
0000000000012a40
[100681.668216]  ffff88020f7fdc80 ffff88020f59ae40 0000000000000000 
7fffffffffffffff
[100681.668218] Call Trace:
[100681.668225]  [<ffffffff816058cf>] schedule+0x3f/0x60
[100681.668228]  [<ffffffff81605f15>] schedule_timeout+0x2a5/0x320
[100681.668231]  [<ffffffff81103180>] ? 
__perf_event_task_sched_out+0x30/0x60
[100681.668233]  [<ffffffff81604f7f>] wait_for_common+0xdf/0x180
[100681.668236]  [<ffffffff810574b0>] ? try_to_wake_up+0x200/0x200
[100681.668238]  [<ffffffff810d4090>] ? call_rcu_sched+0x20/0x20
[100681.668240]  [<ffffffff816050fd>] wait_for_completion+0x1d/0x20
[100681.668242]  [<ffffffff810d40eb>] synchronize_sched+0x5b/0x60
[100681.668244]  [<ffffffff8107e120>] ? find_ge_pid+0x50/0x50
[100681.668246]  [<ffffffff81086a63>] __synchronize_srcu+0x63/0xc0
[100681.668248]  [<ffffffff81086af5>] synchronize_srcu+0x15/0x20
[100681.668251]  [<ffffffff811a4185>] fsnotify_mark_destroy+0x95/0x170
[100681.668253]  [<ffffffff81081850>] ? add_wait_queue+0x60/0x60
[100681.668255]  [<ffffffff811a40f0>] ? fsnotify_put_mark+0x30/0x30
[100681.668257]  [<ffffffff81080dac>] kthread+0x8c/0xa0
[100681.668260]  [<ffffffff81610ca4>] kernel_thread_helper+0x4/0x10
[100681.668262]  [<ffffffff81080d20>] ? flush_kthread_worker+0xa0/0xa0
[100681.668264]  [<ffffffff81610ca0>] ? gs_change+0x13/0x13
[100709.244004] INFO: rcu_sched_state detected stall on CPU 1 (t=60031 
jiffies)
[100801.668146] INFO: task fsnotify_mark:56 blocked for more than 120 
seconds.
[100801.668182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[100801.668214] fsnotify_mark   D ffffffff81805060     0    56      2 
0x00000000
[100801.668217]  ffff88020f55dc60 0000000000000046 0000000000000000 
0000000000000000
[100801.668220]  ffff88020f55dfd8 ffff88020f55dfd8 ffff88020f55dfd8 
0000000000012a40
[100801.668223]  ffff88020f7fdc80 ffff88020f59ae40 0000000000000000 
7fffffffffffffff
[100801.668225] Call Trace:
[100801.668231]  [<ffffffff816058cf>] schedule+0x3f/0x60
[100801.668234]  [<ffffffff81605f15>] schedule_timeout+0x2a5/0x320
[100801.668237]  [<ffffffff81103180>] ? 
__perf_event_task_sched_out+0x30/0x60
[100801.668240]  [<ffffffff81604f7f>] wait_for_common+0xdf/0x180
[100801.668242]  [<ffffffff810574b0>] ? try_to_wake_up+0x200/0x200
[100801.668245]  [<ffffffff810d4090>] ? call_rcu_sched+0x20/0x20
[100801.668247]  [<ffffffff816050fd>] wait_for_completion+0x1d/0x20
[100801.668249]  [<ffffffff810d40eb>] synchronize_sched+0x5b/0x60
[100801.668251]  [<ffffffff8107e120>] ? find_ge_pid+0x50/0x50
[100801.668253]  [<ffffffff81086a63>] __synchronize_srcu+0x63/0xc0
[100801.668255]  [<ffffffff81086af5>] synchronize_srcu+0x15/0x20
[100801.668258]  [<ffffffff811a4185>] fsnotify_mark_destroy+0x95/0x170
[100801.668260]  [<ffffffff81081850>] ? add_wait_queue+0x60/0x60
[100801.668262]  [<ffffffff811a40f0>] ? fsnotify_put_mark+0x30/0x30
[100801.668264]  [<ffffffff81080dac>] kthread+0x8c/0xa0
[100801.668266]  [<ffffffff81610ca4>] kernel_thread_helper+0x4/0x10
[100801.668269]  [<ffffffff81080d20>] ? flush_kthread_worker+0xa0/0xa0
[100801.668270]  [<ffffffff81610ca0>] ? gs_change+0x13/0x13

a few minutes later, I get a similar output:

aarcane at density:~$ sudo dmesg -c
[100889.368001] INFO: rcu_sched_state detected stall on CPU 1 (t=105062 
jiffies)
[100921.668115] INFO: task ksmd:54 blocked for more than 120 seconds.
[100921.668148] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[100921.668181] ksmd            D 0000000000000002     0    54      2 
0x00000000
[100921.668184]  ffff88020f597c20 0000000000000046 ffff88020f597bc0 
ffffffff81046ad4
[100921.668187]  ffff88020f597fd8 ffff88020f597fd8 ffff88020f597fd8 
0000000000012a40
[100921.668189]  ffff88020f712e40 ffff88020f598000 ffff88021fc12a40 
7fffffffffffffff
[100921.668191] Call Trace:
[100921.668197]  [<ffffffff81046ad4>] ? __enqueue_entity+0x84/0x90
[100921.668201]  [<ffffffff816058cf>] schedule+0x3f/0x60
[100921.668204]  [<ffffffff81605f15>] schedule_timeout+0x2a5/0x320
[100921.668206]  [<ffffffff81087ef5>] ? sched_clock_local+0x25/0x90
[100921.668209]  [<ffffffff81026558>] ? native_smp_send_reschedule+0x48/0x60
[100921.668212]  [<ffffffff81032a69>] ? default_spin_lock_flags+0x9/0x10
[100921.668214]  [<ffffffff81604f7f>] wait_for_common+0xdf/0x180
[100921.668216]  [<ffffffff810574b0>] ? try_to_wake_up+0x200/0x200
[100921.668219]  [<ffffffff816050fd>] wait_for_completion+0x1d/0x20
[100921.668221]  [<ffffffff8107cdde>] flush_work+0x2e/0x40
[100921.668223]  [<ffffffff81079100>] ? do_work_for_cpu+0x30/0x30
[100921.668225]  [<ffffffff8107d003>] schedule_on_each_cpu+0xc3/0x110
[100921.668228]  [<ffffffff81116a05>] lru_add_drain_all+0x15/0x20
[100921.668230]  [<ffffffff8114f773>] scan_get_next_rmap_item+0x2c3/0x360
[100921.668232]  [<ffffffff8114fb4f>] ksm_scan_thread+0xdf/0x2a0
[100921.668234]  [<ffffffff81081850>] ? add_wait_queue+0x60/0x60
[100921.668236]  [<ffffffff8114fa70>] ? cmp_and_merge_page+0x260/0x260
[100921.668238]  [<ffffffff81080dac>] kthread+0x8c/0xa0
[100921.668240]  [<ffffffff81610ca4>] kernel_thread_helper+0x4/0x10
[100921.668242]  [<ffffffff81080d20>] ? flush_kthread_worker+0xa0/0xa0
[100921.668244]  [<ffffffff81610ca0>] ? gs_change+0x13/0x13
[100921.668246] INFO: task fsnotify_mark:56 blocked for more than 120 
seconds.
[100921.668273] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[100921.668304] fsnotify_mark   D ffffffff81805060     0    56      2 
0x00000000
[100921.668307]  ffff88020f55dc60 0000000000000046 0000000000000000 
0000000000000000
[100921.668309]  ffff88020f55dfd8 ffff88020f55dfd8 ffff88020f55dfd8 
0000000000012a40
[100921.668311]  ffff88020f7fdc80 ffff88020f59ae40 0000000000000000 
7fffffffffffffff
[100921.668314] Call Trace:
[100921.668316]  [<ffffffff816058cf>] schedule+0x3f/0x60
[100921.668318]  [<ffffffff81605f15>] schedule_timeout+0x2a5/0x320
[100921.668320]  [<ffffffff81103180>] ? 
__perf_event_task_sched_out+0x30/0x60
[100921.668323]  [<ffffffff81604f7f>] wait_for_common+0xdf/0x180
[100921.668324]  [<ffffffff810574b0>] ? try_to_wake_up+0x200/0x200
[100921.668327]  [<ffffffff810d4090>] ? call_rcu_sched+0x20/0x20
[100921.668329]  [<ffffffff816050fd>] wait_for_completion+0x1d/0x20
[100921.668330]  [<ffffffff810d40eb>] synchronize_sched+0x5b/0x60
[100921.668332]  [<ffffffff8107e120>] ? find_ge_pid+0x50/0x50
[100921.668334]  [<ffffffff81086a63>] __synchronize_srcu+0x63/0xc0
[100921.668336]  [<ffffffff81086af5>] synchronize_srcu+0x15/0x20
[100921.668339]  [<ffffffff811a4185>] fsnotify_mark_destroy+0x95/0x170
[100921.668341]  [<ffffffff81081850>] ? add_wait_queue+0x60/0x60
[100921.668343]  [<ffffffff811a40f0>] ? fsnotify_put_mark+0x30/0x30
[100921.668345]  [<ffffffff81080dac>] kthread+0x8c/0xa0
[100921.668347]  [<ffffffff81610ca4>] kernel_thread_helper+0x4/0x10
[100921.668349]  [<ffffffff81080d20>] ? flush_kthread_worker+0xa0/0xa0
[100921.668350]  [<ffffffff81610ca0>] ? gs_change+0x13/0x13
aarcane at density:~$


It seems to not be progressing, I may have to kill the vmbuilder process 
shortly, any idea how to "fix" this, or what to do other than kill it 
and reboot?



More information about the zfs-discuss mailing list