[zfs-discuss] Re: ZFS soft lockup bug?

Steve Costaras stevecs at chaven.com
Tue Jun 14 16:23:49 EDT 2011



I'd say it's about average. A couple of notes I've found so far, though:

- With ST2000DL003 drives it appears you can get about 32 per 6Gbps SAS port (24Gbps across the 4-channel multi-lane link) before saturating it for I/O or scrubs. I'm running 16/port now.
- That works out to about 64 drives per 9200-8e controller (32 max per port) before you saturate the x8 PCIe (v2, 5GT/s) bandwidth with these types of drives (SATA helps here as it's kind of crappy and self-limiting). Rough arithmetic is sketched after this list.
- Unlike Solaris/SPARC, the issues in bootstrapping the system are interesting. Most BIOSes under Intel, it seems, are limited to 10-12 INT 13 devices. This causes an issue where the system won't boot (it can't find your boot drive). Limit the INT 13 devices on your controllers and disable the BIOS on secondary (non-boot-disk) controllers.
- With many HBAs you will invariably run into option ROM space issues; disable BIOS loading on PCIe slots for everything except your boot card/chipset.
- With the test system here the bottleneck is CPU: I'm getting ~4GB/s (raw I/O as seen from iostat) running dual X5680 CPUs, with all 12 threads in the 90% utilization range, zero time in I/O wait, and each drive hitting the 75% utilization mark (iostat).
- Hyperthreading hurts performance (with it enabled, which I expected, throughput drops to ~3.2-3.5GB/s).
- This would indicate that during a scrub you would need more cores (X7xxx series or 8+ core procs) to handle ZFS plus network I/O and NFS/SMB/iSCSI exports at the same time, assuming 10GbE network loads.
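For what it's worth, the back-of-the-envelope math behind the per-port and per-controller limits above (the ~75MB/s average per-drive scrub throughput is my assumption, roughly the right ballpark for these 5900rpm drives; the SAS/PCIe figures are the usual post-8b/10b numbers):

    # Rough saturation estimate for the limits above.
    DRIVE_MB_S = 75.0           # assumed average per-drive throughput during a scrub
    LANE_MB_S  = 600.0          # 6Gbps SAS lane after 8b/10b encoding
    PORT_MB_S  = 4 * LANE_MB_S  # 4-lane wide port -> 2400 MB/s
    PCIE_MB_S  = 8 * 500.0      # PCIe v2 x8 (5GT/s, 8b/10b) -> ~4000 MB/s

    print("drives before a wide port saturates: ~%d" % (PORT_MB_S / DRIVE_MB_S))
    print("drives before the x8 link saturates: ~%d" % (PCIE_MB_S / DRIVE_MB_S))
    # Two fully loaded ports would be 64 drives; in practice the x8 link's
    # ~4GB/s of usable bandwidth is the tighter per-controller budget.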

The above is mainly scrub performance (my primary goal was verification scans: getting full-array scrubs down to less than 16 hours for the ~128TB array). A scrub of ~50TB of data took about 4.5 hours.
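Quick extrapolation from that run (this just assumes the scrub rate stays roughly constant as the pool fills, which is optimistic):

    # Scrub-rate projection from the ~50TB / ~4.5h run above.
    scrubbed_tb = 50.0
    scrub_hours = 4.5
    target_tb   = 128.0

    rate_tb_h = scrubbed_tb / scrub_hours     # ~11.1 TB/h
    print("observed scrub rate : ~%.1f TB/h" % rate_tb_h)
    print("projected full scrub: ~%.1f hours for %.0f TB" % (target_tb / rate_tb_h, target_tb))
    # ~11.5 hours for 128TB, which would come in under the 16-hour goal if the
    # rate holds.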

I'm seeing 'stalls' of up to 6-10 seconds under write loads, though the system recovers (it currently has 24GB RAM; I'll be upgrading to 192GB when I can find 16GB 1333MHz sticks). The stalls affect all system tasks (I/O, network, applications, etc.), so perhaps they're kernel/SPL related? No real indication as to why yet; I'm trying to monitor it.
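The crude way I'm thinking of catching them from userspace is a watchdog loop that sleeps briefly and logs whenever the wall clock jumps well past the expected wakeup (an illustrative sketch, not anything from the ZFS tooling):

    #!/usr/bin/env python
    # Userspace stall watchdog: if every task on the box stalls, a plain
    # process like this oversleeps, and the gap is roughly the stall length.
    import time

    INTERVAL  = 0.1      # seconds between checks
    THRESHOLD = 2.0      # report gaps longer than this many seconds

    last = time.time()
    while True:
        time.sleep(INTERVAL)
        now = time.time()
        gap = now - last - INTERVAL
        if gap > THRESHOLD:
            print("%s: stalled for ~%.1f s" % (time.ctime(now), gap))
        last = now

Lining those timestamps up against iostat output should at least show whether the stalls coincide with heavy write flushes.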




-----Original Message-----
From: devsk [mailto:devsku at gmail.com]
Sent: Tuesday, June 14, 2011 02:54 PM
To: 'zfs-discuss'
Subject: [zfs-discuss] Re: ZFS soft lockup bug?


Oh WOW!

That's a big install!

-devsk


On Jun 14, 11:43 am, "Steve Costaras" <stev... at chaven.com> wrote:
> If this is a new issue, then it's not too common. This is the first time I'm running into it (using the PPA sources) and I've been testing for the past several months. The only real items of note would be that I just got all 104 drives hooked up to the system (16 6-disk raidz2 vdevs/pool + 1 8-disk raidz2 pool) across 6 LSI 9200-8e controllers, have put on about 3,000,000 files (~50TB), and have been hitting the arrays: replacing drives, doing scrub performance testing, et al. So the system is not just sitting idle.



