[zfs-discuss] Compressratio vs du

Joel Krauska jkrauska at gmail.com
Wed Feb 14 17:20:19 EST 2018


Nmz:

Your method seems to use find -ls, which the man page describes as:

"""list current file in ls -dils format on standard output.  The block
counts are of 1K blocks, unless the environment variable POSIXLY_CORRECT is
set, in which case 512-byte blocks are used.  See the UNUSUAL FILENAMES
section for information about how unusual characters in filenames are
handled."""

Example dir
# lsdu .
Path: .
  Total Size: 6.28 GB
  Disk Usage: 2.50 GB
  Compress Ratio: 2.5095


Seeing nearly the same results from my script...

#!/bin/bash

# Compares du -s and du -s --apparent-size
# to get effective compression ratio for a given dir

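if [ -z "$1" ]; then
    dir=$(pwd)
else
    dir=$1
fi

# du -s prints 1K blocks by default; --apparent-size sums file sizes
# instead of allocated blocks. Quote $dir so paths with spaces work.
big=$(du -s --apparent-size "$dir" | awk '{print $1}')
small=$(du -s "$dir" | awk '{print $1}')
ratio=$(echo "scale=3;$big / $small" | bc -l)
echo "$dir $big $small $ratio"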


# /usr/local/sbin/du-compress-ratio .
. 6585664 2580693 2.551


So maybe a 1000 vs 1024 thing going on?
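
A quick way to sanity-check the 1000 vs 1024 idea is to convert the du
numbers above to GiB and compare them with what lsdu printed. This is only
a sketch: the helper names kib_to_gib and ratio are made up for
illustration, and it assumes du was reporting its default 1K blocks.

```shell
#!/bin/bash

# Convert a count of 1K blocks (as printed by du -s) to GiB, two decimals.
kib_to_gib() {
    awk -v kib="$1" 'BEGIN { printf "%.2f\n", kib / (1024 * 1024) }'
}

# Ratio of two numbers, four decimals (same precision lsdu prints).
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.4f\n", a / b }'
}

kib_to_gib 6585664        # apparent size reported by du
kib_to_gib 2580693        # disk usage reported by du
ratio 6585664 2580693     # apparent / used
```

The apparent size comes out at 6.28, matching lsdu's Total Size, while disk
usage comes out at 2.46 against lsdu's 2.50. That is a gap of under 2%, so
it may be down to how the two tools count blocks rather than a plain
1000 vs 1024 mix-up, but I have not confirmed that.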

In any case, neither number lines up with compressratio, which is the
reason I posted.

# zfs get compressratio data
NAME  PROPERTY       VALUE  SOURCE
data  compressratio  7.60x  -
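
For a cross-check that does not go through du at all, one option is to
divide the dataset's logicalused by its used property. The sketch below
assumes both properties exist on this version of ZFS on Linux; the helper
name logical_ratio and the byte counts in the demo call are placeholders,
not measurements from my pool. I would expect the result to be in the same
ballpark as compressratio, not necessarily identical.

```shell
#!/bin/bash

# Print logical bytes / physical bytes as "N.NNx", or "n/a" if used is zero.
logical_ratio() {
    awk -v logical="$1" -v used="$2" 'BEGIN {
        if (used == 0) { print "n/a"; exit }
        printf "%.2fx\n", logical / used
    }'
}

# Demo with placeholder byte counts (hypothetical values):
logical_ratio 7600 1000

# Against a real dataset ("data" is the dataset from this post; needs zfs):
# logical_ratio "$(zfs get -Hp -o value logicalused data)" \
#               "$(zfs get -Hp -o value used data)"
```

The -H and -p flags ask zfs get for script-friendly output with exact byte
counts, which keeps the arithmetic free of any unit rounding.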







On Wed, Feb 14, 2018 at 1:40 PM, Nmz via zfs-discuss <
zfs-discuss at list.zfsonlinux.org> wrote:

>
>
> Take this function
>
> # cat .bashrc
> ...
>
> function lsdu() (
>     export SEARCH_PATH=$*
>     if [ ! -e "$SEARCH_PATH" ]; then
>         echo "ERROR: Invalid file or directory ($SEARCH_PATH)"
>         return 1
>     fi
>     find "$SEARCH_PATH" -ls | gawk --lint --posix '
>         BEGIN {
>             split("B KB MB GB TB PB",type)
>             ls=hls=du=hdu=0;
>             out_fmt="Path: %s \n  Total Size: %.2f %s \n  Disk Usage: %.2f %s \n  Compress Ratio: %.4f \n"
>         }
>         NF >= 7 {
>             ls += $7
>             du += $2
>         }
>         END {
>             du *= 1024
>             for(i=5; hls<1; i--) hls = ls / (2^(10*i))
>             for(j=5; hdu<1; j--) hdu = du / (2^(10*j))
>             printf out_fmt, ENVIRON["SEARCH_PATH"], hls, type[i+2], hdu, type[j+2], ls/du
>         }
>     '
> )
>
> Example
>
> # lsdu /root/
> Path: /root/
>   Total Size: 3.91 GB
>   Disk Usage: 5.18 GB
>   Compress Ratio: 0.7544
>
> # lsdu rezultatai/
> Path: rezultatai/
>   Total Size: 335.46 GB
>   Disk Usage: 29.25 GB
>   Compress Ratio: 11.4690
>
>
> ----- Original Message -----
> From: Joel Krauska via zfs-discuss <
> zfs-discuss at list.zfsonlinux.org>
> To: zfs-discuss at list.zfsonlinux.org
>
> Date: Wednesday, February 14, 2018, 11:27:56 PM
> Subject: [zfs-discuss] Compressratio vs du
>
>
> Hello,
>
> I've been trying to better understand how to evaluate effective
> compression ratios.
>
> I've come across many examples that compare the output from
>
> du -s . and   du -s --apparent-size
> as a technique to validate/confirm compression.
>
> eg.
> # dd if=/dev/zero of=zeros bs=1M count=100
>
> # du -s zeros
> 1 zeros
>
> vs
>
> # du -s --apparent-size zeros
> 102400 zeros
>
> Using this technique I've been attempting to establish which
> directories/datasets compress well and which do not.
>
> However on the whole I'm seeing a pretty big disparity between what du
> reports (typically 3x-4x) and what 'zfs get compressratio' reports
> (typically 7-8x).
>
> This has led me to question whether du is an accurate representation of
> zfs's compression.  Either zfs's internal metric is skewed or du is skewed.
>
> Any advice here?
>
> Note: This became of interest in part after shifting to an ashift=13 setup
> for our SSDs.  du got farther off, but the zfs parameter seems similar.
>
> Cheers,
>
> Joel Krauska
>
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at list.zfsonlinux.org
> http://list.zfsonlinux.org/cgi-bin/mailman/listinfo/zfs-discuss
>
>