[zfs-discuss] General tuning suggestions for a read only workload using SSD caching

Ivan Ator ivanatorz at gmail.com
Fri Mar 23 22:21:06 EDT 2012


Hi everyone, I'm new to ZFS in general and am trying it out on Ubuntu 
11.10. I seem to have hit a disk I/O bottleneck far too easily given the 
amount of data on disk and the workload.

I have 24x900GB SAS drives and 2x256GB SSDs, all exported to the OS as 
JBOD via an LSI MegaRAID 9240-8i. ZFS uses the SAS drives as a stripe of 
mirrors (RAID10) and the SSDs as L2ARC:

# zpool status
   pool: raid
  state: ONLINE
  scan: none requested
config:

         NAME         STATE     READ WRITE CKSUM
         raid         ONLINE       0     0     0
           mirror-0   ONLINE       0     0     0
             sdc      ONLINE       0     0     0
             sdd      ONLINE       0     0     0
           mirror-1   ONLINE       0     0     0
             sde      ONLINE       0     0     0
             sdf      ONLINE       0     0     0
           mirror-2   ONLINE       0     0     0
             sdg      ONLINE       0     0     0
             sdh      ONLINE       0     0     0
           mirror-3   ONLINE       0     0     0
             sdi      ONLINE       0     0     0
             sdj      ONLINE       0     0     0
           mirror-4   ONLINE       0     0     0
             sdk      ONLINE       0     0     0
             sdl      ONLINE       0     0     0
           mirror-5   ONLINE       0     0     0
             sdm      ONLINE       0     0     0
             sdn      ONLINE       0     0     0
           mirror-6   ONLINE       0     0     0
             sdo      ONLINE       0     0     0
             sdp      ONLINE       0     0     0
           mirror-7   ONLINE       0     0     0
             sdq      ONLINE       0     0     0
             sdr      ONLINE       0     0     0
           mirror-8   ONLINE       0     0     0
             sds      ONLINE       0     0     0
             sdt      ONLINE       0     0     0
           mirror-9   ONLINE       0     0     0
             sdu      ONLINE       0     0     0
             sdv      ONLINE       0     0     0
           mirror-10  ONLINE       0     0     0
             sdw      ONLINE       0     0     0
             sdx      ONLINE       0     0     0
           mirror-11  ONLINE       0     0     0
             sdy      ONLINE       0     0     0
             sdz      ONLINE       0     0     0
         cache
           sda        ONLINE       0     0     0
           sdb        ONLINE       0     0     0

errors: No known data errors

# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
raid  9.75T  1.61T  8.14T    16%  1.00x  ONLINE  -
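
For reference, a pool with this layout would have been created roughly 
like the following (device names are taken from the status output above; 
the exact command I originally ran may have differed slightly):

# zpool create raid \
    mirror sdc sdd mirror sde sdf mirror sdg sdh mirror sdi sdj \
    mirror sdk sdl mirror sdm sdn mirror sdo sdp mirror sdq sdr \
    mirror sds sdt mirror sdu sdv mirror sdw sdx mirror sdy sdz \
    cache sda sdb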

I haven't changed anything other than the recordsize, which is set to 128k.
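
For what it's worth, that was set and checked with the usual property 
commands, applied here to the pool root dataset (child datasets would be 
analogous):

# zfs set recordsize=128k raid    # same value as the ZFS default
# zfs get recordsize raid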

What I'm noticing is a very high volume of reads from the disks relative 
to the amount of data going out over the network (the disks are 100% 
saturated):

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
   0   7  90   0   0   3| 730M  128k|2664k  130M|   0     0 |  65k   19k
   0   9  88   0   0   3| 744M 2048k|2529k  124M|   0     0 |  64k   23k
   0   8  90   0   0   3| 666M  128k|2650k  130M|   0     0 |  65k   19k
   0   7  90   0   0   2| 657M  512k|2649k  130M|   0     0 |  64k   18k
   0   7  90   0   0   3| 560M 1664k|2556k  128M|   0     0 |  61k   17k
   0   8  89   0   0   3| 716M 2176k|2645k  131M|   0     0 |  66k   20k
   0   9  88   0   0   3| 706M 1664k|2647k  128M|   0     0 |  65k   20k
   0   8  89   0   0   3| 674M 2580k|2693k  130M|   0     0 |  65k
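
The "100% saturated" figure comes from watching per-device utilization 
with sysstat's iostat; something like this (the 5-second interval is 
arbitrary) shows the %util column for the sdc-sdz data disks sitting at 
or near 100:

# iostat -x 5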

The cache disks' usage and activity seem very low given that the pool has 
been in use for almost 24 hours now:


                 capacity     operations    bandwidth
pool         alloc   free   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
raid         1.61T  8.14T  5.80K      0   742M      0
   mirror      138G   694G    360      0  45.1M      0
     sdc          -      -    153      0  19.2M      0
     sdd          -      -    207      0  25.9M      0
   mirror      138G   694G    489      0  61.2M      0
     sde          -      -    194      0  24.3M      0
     sdf          -      -    295      0  36.9M      0
   mirror      138G   694G    495      0  61.9M      0
     sdg          -      -    283      0  35.4M      0
     sdh          -      -    212      0  26.5M      0
   mirror      138G   694G    577      0  72.1M      0
     sdi          -      -    303      0  38.0M      0
     sdj          -      -    273      0  34.1M      0
   mirror      138G   694G    512      0  64.0M      0
     sdk          -      -    226      0  28.3M      0
     sdl          -      -    286      0  35.8M      0
   mirror      138G   694G    578      0  72.3M      0
     sdm          -      -    278      0  34.8M      0
     sdn          -      -    299      0  37.5M      0
   mirror      138G   694G    565      0  70.6M      0
     sdo          -      -    281      0  35.1M      0
     sdp          -      -    284      0  35.5M      0
   mirror      138G   694G    522      0  65.3M      0
     sdq          -      -    303      0  38.0M      0
     sdr          -      -    218      0  27.3M      0
   mirror      138G   694G    495      0  61.9M      0
     sds          -      -    227      0  28.4M      0
     sdt          -      -    268      0  33.5M      0
   mirror      138G   694G    482      0  60.3M      0
     sdu          -      -    180      0  22.5M      0
     sdv          -      -    301      0  37.7M      0
   mirror      138G   694G    446      0  55.8M      0
     sdw          -      -    289      0  36.1M      0
     sdx          -      -    157      0  19.7M      0
   mirror      138G   694G    410      0  51.3M      0
     sdy          -      -    299      0  37.5M      0
     sdz          -      -    110      0  13.8M      0
cache            -      -      -      -      -      -
   sda         108G   131G     82      0  10.2M      0
   sdb         107G   131G     80      6  9.87M   558K
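
To check whether the L2ARC is actually being hit, I've been looking at the 
kstats that ZFS on Linux exposes (path as on my Ubuntu install; the field 
names may vary between versions):

# grep -E '^(hits|misses|l2_hits|l2_misses|l2_size) ' \
    /proc/spl/kstat/zfs/arcstats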


I have other servers with only 16 drives and an identical workload that 
push more network traffic with less disk usage, so something is clearly 
wrong here.

Can anyone comment on this? Any responses are appreciated; I think many 
people would benefit from general guidance on a problem like this. I've 
seen suggestions to reduce the recordsize, but that seems to be aimed 
more at improving small writes. I'd be happy to provide additional 
information; just tell me what you need.
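
In the meantime, here are the sorts of things I can dump on request. The 
property list and module parameter below are just my guess at what's 
relevant for a read-heavy workload (zfs_arc_max is the ZFS on Linux 
tunable that caps the ARC size; 0 means the built-in default):

# zfs get recordsize,primarycache,secondarycache,atime,compression raid
# cat /sys/module/zfs/parameters/zfs_arc_max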

Thanks!

Ivan


