[zfs-discuss] Summary: space overhead caused by record-/volblock-size

Klaus Hartnegg hartnegg at uni-freiburg.de
Mon Jul 10 10:38:19 EDT 2017

This is an attempt to summarize what I learned from this discussion.
Thanks to everybody who sent comments.

Foreword: this effect is only relevant with large physical disk sectors 
and small logical blocks. This typically happens on zvols, not on files, 
because the default recordsize (for files) is 128k, while the default 
volblocksize is only 8k. The effect is much larger on new disks with 
large physical sectors (large ashift).

Turns out there are two reasons for the kind of space overhead which I 
had asked about: additional parity sectors, and padding sectors.

additional parity sectors

Chunks of data are split into blocks of size recordsize (or volblocksize).
If the data size is not a multiple of recordsize, the last block will be 
smaller. But parity sectors are added to each block, regardless of how 
small it is. This can cause a much higher ratio of parity to data 
sectors than a classic RAID would have.
If volblocksize equals sectorsize, parity sectors are added to each data 
sector.
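As a sketch of this first effect, assuming a simplified RAIDZ layout 
(the disk count, ashift, and parity level below are made-up example 
parameters, not defaults):

```python
import math

def parity_sectors(blocksize, ashift, ndisks, nparity):
    """Estimate parity sectors added to one block on a RAIDZ vdev.

    Each stripe row spans the data disks and gets nparity parity
    sectors, so small blocks pay proportionally more parity.
    """
    sector = 1 << ashift
    data = math.ceil(blocksize / sector)           # data sectors in block
    rows = math.ceil(data / (ndisks - nparity))    # stripe rows needed
    return rows * nparity

# 6-disk raidz2 with 4k sectors (ashift=12):
print(parity_sectors(8192, 12, 6, 2))    # 2 parity for 2 data sectors: 100%
print(parity_sectors(131072, 12, 6, 2))  # 16 parity for 32 data sectors: 50%
```

With an 8k volblock, the parity overhead here is twice what the same 
pool shows for 128k records.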

padding sectors

The total number of sectors allocated (data plus parity) for one chunk 
of data must be a multiple of p+1 (p = number of parity disks).
If it is not, padding sectors are added.
"So that when it is freed it does not leave a free segment which is too 
small to be used (i.e. too small to fit even a single sector of data 
plus p parity sectors)"
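The round-up quoted above can be sketched like this (a simplified model 
of the rule, not the exact allocator code):

```python
def padding_sectors(data_plus_parity, nparity):
    """Padding needed to round an allocation up to a multiple of p+1."""
    return (-data_plus_parity) % (nparity + 1)

# raidz2 (p=2): allocations must be multiples of 3 sectors.
print(padding_sectors(4, 2))  # 2 padding sectors (4 rounds up to 6)
print(padding_sectors(6, 2))  # 0, already a multiple of 3
```

So the 8k volblock from the raidz2 example above (2 data + 2 parity 
sectors) gets 2 padding sectors on top, for 6 sectors total.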

For both sources of space overhead, the effect becomes small when 
recordsize or volblocksize is large.

The actual amount of space overhead is difficult to predict if you have 
compression enabled and your data is compressible. Then the compression 
often overcompensates for the space overhead.

Notes for those who put NTFS into a zvol:
Several people recommended NTFS compression instead of ZFS compression.
NTFS compression is not supported with cluster sizes > 4KB.
Each change of a single sector requires reading and rewriting the whole 
volblock (if it is not already cached).

While researching this topic, I found two more space overheads:

Unused space behind last metaslab

"Vdevs are divided into 200 or fewer metaslabs for the purpose of space 
management. Metaslab size is always 2^N, where N is defined by the 
metaslab_shift parameter."
If the available space is not a multiple of 2^N, the remaining space is 
left unused.
The value metaslab_shift can be found with
zdb -C $pool | grep metaslab_shift
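Given metaslab_shift, the unused tail can be estimated like this (the 
4 TB vdev size and shift value of 34 are made-up example numbers):

```python
def metaslab_waste(vdev_size, metaslab_shift):
    """Bytes behind the last whole metaslab, left unused."""
    metaslab_size = 1 << metaslab_shift
    return vdev_size % metaslab_size

# hypothetical 4 TB vdev with metaslab_shift = 34 (16 GiB metaslabs):
waste = metaslab_waste(4 * 10**12, 34)
print(waste / 2**30, "GiB unused")
```

On pools whose vdev size happens to be a power of two the waste is 
zero; otherwise it can be up to one metaslab minus one byte.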

Slop space

Slop space reservation: 1/32 of the zpool capacity.
"to ensure that some critical ZFS operations can complete even in 
situations with very low free space remaining"
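The reservation is a simple shift of the pool size (ZFS exposes this as 
the spa_slop_shift tunable, 5 by default, i.e. 1/32); as a quick 
estimate:

```python
def slop_space(pool_size, slop_shift=5):
    """Bytes reserved as slop space: pool_size / 2^slop_shift."""
    return pool_size >> slop_shift

# a 10 TiB pool reserves 320 GiB:
print(slop_space(10 * 2**40) / 2**30, "GiB")
```

Note that newer ZFS versions cap the slop space, so on very large pools 
the real reservation can be less than this estimate.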

See also:
Capacity calculator http://wintelguy.com/zfs-calc.pl
