[zfs-discuss] zfs send stalls

Andreas Pflug pgadmin at pse-consulting.de
Wed Feb 7 12:04:20 EST 2018


For a year, I've been backing up a Debian stretch-based installation by
sending daily incremental snapshots to a remote system. Everything went
well until there was heavy file activity: copying around 1TB of data,
which increased the used space to more than 6TB. Now the incremental
zfs send stalls after 20 minutes.
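
The daily job is roughly of the following shape. This is only a sketch:
the dataset, snapshot, and host names (tank/data, daily-1, daily-2,
backuphost, backup/data) are placeholders and not the real ones, and the
helper function is hypothetical. It only prints the pipeline instead of
running it, so it is safe to try on a machine without ZFS.

```shell
#!/bin/sh
# Sketch of an incremental send pipeline. All names passed in below are
# placeholders (assumptions), not the actual pool or host names.

# Print the command line for an incremental send of DATASET from snapshot
# PREV to snapshot CURR, received on HOST into TARGET.
incremental_send_cmd() {
    dataset=$1
    prev=$2
    curr=$3
    host=$4
    target=$5
    # "zfs send -i" sends only the changes between the two snapshots;
    # "zfs receive" on the remote side applies them to the target dataset.
    printf 'zfs send -i %s@%s %s@%s | ssh %s zfs receive %s\n' \
        "$dataset" "$prev" "$dataset" "$curr" "$host" "$target"
}

# Example with placeholder names: show what would be run for one day.
incremental_send_cmd tank/data daily-1 daily-2 backuphost backup/data
```

The stall happens in the "zfs send" part of such a pipeline, on the
sending host.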

ZFS 0.7.5.1 on Linux 4.9.65 from Debian stretch-backports (happened with
0.7.3 as well).


I get this in kern.log:

INFO: task send_traverse:8361 blocked for more than 120 seconds.
      Tainted: P           O    4.9.0-5-amd64 #1 Debian 4.9.65-3+deb9u2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
send_traverse   D    0  8361      2 0x00000080
 ffff8801c13bec00 0000000000000000 ffff8801d346a080 ffff8801f5c18940
 ffffffff81c11500 ffffc9004bc5b860 ffffffff81602923 ffff8801f2cd8140
 00ffffff810a0144 ffff8801f5c18940 ffffc9004bc5b880 ffff8801d346a080
Call Trace:
 [<ffffffff81602923>] ? __schedule+0x233/0x6d0
 [<ffffffff81602df2>] ? schedule+0x32/0x80
 [<ffffffffc025539f>] ? cv_wait_common+0x11f/0x140 [spl]
 [<ffffffff810b86c0>] ? prepare_to_wait_event+0xf0/0xf0
 [<ffffffffc03452ad>] ? bqueue_enqueue+0x5d/0xd0 [zfs]
 [<ffffffffc03560c4>] ? send_cb+0x144/0x190 [zfs]
 [<ffffffffc035944e>] ? traverse_visitbp+0x47e/0x9a0 [zfs]
 [<ffffffffc0359598>] ? traverse_visitbp+0x5c8/0x9a0 [zfs]
 [<ffffffffc0359598>] ? traverse_visitbp+0x5c8/0x9a0 [zfs]
 [<ffffffffc0359598>] ? traverse_visitbp+0x5c8/0x9a0 [zfs]
 [<ffffffffc0359598>] ? traverse_visitbp+0x5c8/0x9a0 [zfs]
 [<ffffffffc0359598>] ? traverse_visitbp+0x5c8/0x9a0 [zfs]
 [<ffffffffc0359598>] ? traverse_visitbp+0x5c8/0x9a0 [zfs]
 [<ffffffffc035a015>] ? traverse_dnode+0xa5/0x1b0 [zfs]
 [<ffffffffc035984e>] ? traverse_visitbp+0x87e/0x9a0 [zfs]
 [<ffffffffc0359b63>] ? traverse_impl+0x1f3/0x460 [zfs]
 [<ffffffffc0355f80>] ? dmu_send_impl+0x1360/0x1360 [zfs]
 [<ffffffffc0351a50>] ? byteswap_record+0x2a0/0x2a0 [zfs]
 [<ffffffffc0250230>] ? __thread_exit+0x20/0x20 [spl]
 [<ffffffffc035a16e>] ? traverse_dataset_resume+0x4e/0x60 [zfs]
 [<ffffffffc0355f80>] ? dmu_send_impl+0x1360/0x1360 [zfs]
 [<ffffffffc0351aa2>] ? send_traverse_thread+0x52/0xb0 [zfs]
 [<ffffffffc025029d>] ? thread_generic_wrapper+0x6d/0x80 [spl]
 [<ffffffff81095ea7>] ? kthread+0xd7/0xf0
 [<ffffffff81095dd0>] ? kthread_park+0x60/0x60
 [<ffffffff81607911>] ? ret_from_fork+0x41/0x50

The scrub I executed right before the last send attempt didn't report
any problems.

Any hints on what I can do to restore functionality?

Regards,
Andreas


