[zfs-discuss] ZFS eating up all 16 GB RAM when combining 4k blocksize ZVOL with XFS

Omar Siam simar at gmx.net
Mon Nov 12 17:08:03 EST 2018


Hi list!

I am a bit frustrated because I am trying to debug a problem on a test RAIDZ 
array. I can reproducibly exhaust the 16 GB of physical memory and make 
that test machine hang, OOM-kill processes, or (if I configure it that way) 
kernel panic on hung task or OOM.

Is this a bug for the issue tracker?

I set up kdump to capture crash dumps on OOM and on hung task, and I got 
two dumps of the situation.
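
Roughly, that setup looks like the following (a sketch; it assumes 
Ubuntu's linux-crashdump/kdump-tools packages and the standard panic 
sysctls, which may not match my exact configuration):

apt install linux-crashdump                 # pulls in kdump-tools and kexec-tools
sysctl -w vm.panic_on_oom=1                 # panic instead of OOM-killing, so kdump fires
sysctl -w kernel.hung_task_panic=1          # panic on a hung-task warning
sysctl -w kernel.hung_task_timeout_secs=120 # how long a task may block before it counts as hung
# the resulting dump under /var/crash/ can then be opened with the crash utility, e.g.
# crash /usr/lib/debug/boot/vmlinux-$(uname -r) /var/crash/<date>/dump.<time>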

My problem is that whatever eats up all the RAM is not tracked in any 
statistics I know of. htop does not show it, and I also don't find any 
details in the ps output of the kernel crash dump. It is, however, clear 
that in that situation something uses up all the memory.
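
For reference, these are the kinds of statistics I mean (a sketch; the 
arcstats path assumes ZFS on Linux):

free -m                              # overall memory usage
cat /proc/meminfo                    # kernel breakdown (Slab, SUnreclaim, ...)
slabtop -o -s c                      # slab caches sorted by cache size
cat /proc/spl/kstat/zfs/arcstats     # ZFS ARC size and targets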

The setup may be somewhat special:

* 3-disk RAIDZ array of old 250 GB drives, not all the same brand or model

* ZVOL with a block size of 4K (RAID-Storage/iSCSI1, volsize=100G, 
volblocksize=4K, checksum=on, compression=lz4); see the sketch after this list

* I formatted the zvol with XFS, which I forced to use 4k blocks (mkfs -t 
xfs -d su=4k -d sw=3 -f /dev/RAID-Storage/iSCSI1)

* Ubuntu 18.04.1, stock kernel 4.15.0-38-generic with its shipped zfs 
0.7.5-1ubuntu1 (I also tried 4.18 series kernels with zfs/spl dkms 0.7.11)

* an NVMe SSD used as cache and log device, but it does not seem to matter 
much regarding the crash

* I created the array in FreeNAS, but as far as I know that should make 
no difference now that I use it with ZFS on Linux

* I use phoronix-test-suite to run some disk speed test utilities 
(1808205-RA-SAMSUNGQU86 [1]; it runs the same utility and configuration 
several times)

** it runs some SQLite insertions (3 or 4 times)

** fio-3.1 with "block sizes" of 4k and 2M random reading using libaio 
(Linux AIO) (about 10 times both)

** fio-3.1 with "block size" of 2M random writing using libaio (Linux 
AIO) [2]
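
For anyone trying to reproduce this, the zvol creation and formatting boil 
down to roughly the following (a sketch; the pool was actually created in 
FreeNAS, but an equivalent ZFS on Linux command would be):

# 100G zvol with 4K blocks, lz4 compression and checksums on
zfs create -V 100G -o volblocksize=4K -o checksum=on -o compression=lz4 RAID-Storage/iSCSI1
# force XFS onto 4k stripe units across the 3 data disks
mkfs -t xfs -d su=4k -d sw=3 -f /dev/RAID-Storage/iSCSI1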

The first run of this fio random write test succeeds. On the second run RAM 
usage suddenly spikes from around 4 GB to all (or almost all) of the RAM, 
which either crashes the machine or produces hung-task kernel messages and 
leaves the machine unresponsive. [3]

The ARC statistics show only about 4 GB of RAM use at that moment.
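
That number comes from the ARC kstats, roughly like this (assuming the 
usual ZFS on Linux kstat path):

grep -E '^(size|c|c_max) ' /proc/spl/kstat/zfs/arcstats   # current ARC size vs. target and limit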

I am willing to provide any information I can get from the machine to help 
track this down, but I have no more ideas where to look.

My conclusion is that something is wrong in the interaction between ZFS 
ZVOLs and Linux's kernel-based asynchronous IO.

As a side note: I also tried to run the same test in a Linux bhyve VM on 
FreeNAS 11.2 RC, using the same physical array, the same ZVOL and the 
FreeBSD ZFS implementation, and that works. I can run the write test and 
all the other following tests.

Furthermore, this started as a test of iSCSI on RAIDZ. With Linux's built-in 
LIO iSCSI implementation the test also finishes when the machine with this 
setup is used as the target. With the alternative SCST implementation the 
random write test crashes the target machine. Perhaps the former does not 
use async IO and the latter does.
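
For completeness, exporting the zvol over LIO looks roughly like this (a 
sketch, not necessarily my exact configuration; the IQN is made up and the 
backstore name is hypothetical):

# create a block backstore on the zvol and export it over iSCSI
targetcli /backstores/block create name=iscsi1 dev=/dev/RAID-Storage/iSCSI1
targetcli /iscsi create iqn.2018-11.test:iscsi1
targetcli /iscsi/iqn.2018-11.test:iscsi1/tpg1/luns create /backstores/block/iscsi1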

Best regards

Omar Siam

[1] https://openbenchmarking.org/result/1808205-RA-SAMSUNGQU86

[2] Generated fio-3.1 configuration:

[global]
rw=randwrite
ioengine=libaio
iodepth=64
size=1g
direct=1
buffered=0
startdelay=5
ramp_time=5
runtime=20
time_based
disk_util=0
clat_percentiles=0
disable_lat=1
disable_clat=1
disable_slat=1
filename=fiofile

[test]
name=test
bs=2m
stonewall
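
The job file is run against a file on the XFS filesystem, roughly like this 
(the mountpoint and job file name below are placeholders):

cd /mnt/iscsi1            # assumed mountpoint of the XFS filesystem on the zvol
fio randwrite-2m.fio      # hypothetical name for the job file above; it writes to ./fiofile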

[3]
crash> kmem -i
                  PAGES        TOTAL      PERCENTAGE
     TOTAL MEM  3990273      15.2 GB         ----
          FREE    34063     133.1 MB    0% of TOTAL MEM
          USED  3956210      15.1 GB   99% of TOTAL MEM
        SHARED    65844     257.2 MB    1% of TOTAL MEM
       BUFFERS        0            0    0% of TOTAL MEM
        CACHED    64807     253.2 MB    1% of TOTAL MEM
          SLAB   875668       3.3 GB   21% of TOTAL MEM

    PID    PPID  CPU       TASK        ST  %MEM     VSZ RSS  COMM
    5379   4951   3  ffff9844f780ae80  UN   0.0  511368    772 fio


