[zfs-discuss] ZOL + SATA + Interposers + SAS Expanders is bad Mkay

Dead Horse deadhorseconsulting at gmail.com
Tue Oct 15 14:50:41 EDT 2013

Answers inline

> "interposers are more likely to do the correct thing than the drives'
> firmwares"

Regrettably not always, which is why this problem can and does occur. A
quick Google search will yield many tales of woe. That is also why the
aforementioned fix (workaround) was put into illumos/Solaris in the first
place. Garrett from Nexenta does a good job explaining things
and here:

Open Indiana discuss thread

A good quick technical overview of this generation of interposer can be
found here:

> "I have the feeling your experience might be another version of "drive X
> incompatible with interposer Y and controller Z"

Not the case. As noted in my prior replies, this setup is Sun Amber Road
generation 2 hardware (e.g. Sun Storage 7000 series). The Sun variants
of the Moose drives I have are ST31000NSSUN1.0T, and they are *different*
from the consumer drives in both firmware and design. Additionally, the
interposer, which is again a Sun variant of the LSISS1320 AAMUX, differs in
both firmware and design from the consumer equivalents. Both drive and
interposer firmware are also tuned and designed to work with the J-Series
arrays and their LSI expander + backplane, the expander and backplane again
being Sun specific. The original technologies were acquired from StorageTek
and manufactured for Sun by Quanta.

> "Another possibility is drive X is faulty". "Considering the Moose drives
> are not enterprise" "Check smart...."

The drives are fine, with no surface issues or otherwise. I spent quite a
bit of time poring over the SMART data from all the drives in this setup to
rule this out. The Moose drives are Enterprise SATA drives, hence the (ES.2)
AKA: (E)nterprise (S)ATA Generation 2

Quoted from the Solaris ZFS discuss lists: "The J series JBODs aren't
overly expensive, it's the darn drives for them that break the budget."
<-- EG: Engineered drive solution for the Amber Road storage appliances

Also, if you are interested or curious, you can peruse the Fishworks
changelogs; much of the underlying history is documented there (at
least what is public knowledge) ;-)
Found here: https://wikis.oracle.com/display/FishWorks/Software+Updates

> "Can you determine the first drive which has given errors and caused
> reset from dmesg or /var/log/dmesg?"

The attachment to my original mail to the mailing list contains example
output. The drive on which it occurs is random; no one particular disk is at
fault.
"Can you confirm NCQ is disabled on such drives"

This is a problem on *certain* consumer Moose drives and firmware. NCQ
works fine either on or off on the Sun drives and has no linkage to the
issue at hand. I actually have some of the consumer versions here as well,
and interestingly I have flashed some of the reported *affected* firmware
on them for fun and tested the reported NCQ issue. I was not actually able
to observe that reported issue with firmware MA0D, but I did reproduce it
with SN04.
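If you want to confirm the queueing state yourself on Linux, one way is to
read the per-device queue depth from sysfs. A minimal sketch, assuming the
standard /sys/block/sdX/device/queue_depth attribute; a depth of 1 means the
host issues one command at a time, which is how NCQ is commonly switched
off. The function names are made up for this example, and with an interposer
in the path the value reflects what the host side queues, which may not be
the whole story for the drive itself.

```python
from pathlib import Path


def queue_depths(sys_block="/sys/block"):
    """Map each sd* disk to the queue depth reported by sysfs."""
    depths = {}
    for dev in sorted(Path(sys_block).glob("sd*")):
        attr = dev / "device" / "queue_depth"
        try:
            depths[dev.name] = int(attr.read_text().strip())
        except (OSError, ValueError):
            # Attribute missing or unreadable: skip this device.
            continue
    return depths


def queueing_disabled(depth):
    """A queue depth of 1 means no command queueing from the host side."""
    return depth <= 1
```

Calling queue_depths() with no arguments reads the live system; passing a
different directory is only there so it can be tried against a copy of the
sysfs tree.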

"One thing I don't understand in your story. You write"

Quoting Garrett from Nexenta:
"The problem is that when a reset occurs on an expander, it aborts any
in-flight operations, and they fail. Unfortunately, the *way* in which they
fail is to generate a generic "hardware error". The problem is that the
sd(7d) driver's response to this is to ... issue another reset, in a futile
effort to hopefully correct things."

In this case replace Solaris (sd) with Linux (sg).

