[zfs-discuss] nvme -- ashift value -- Samsung XS1715

Andreas Dilger adilger at dilger.ca
Wed Feb 17 18:52:53 EST 2016


On Feb 17, 2016, at 4:36 PM, Jeff Johnson via zfs-discuss <zfs-discuss at list.zfsonlinux.org> wrote:
> 
> Kevin,
> 
> I don't think you can do anything but ashift=9 here. The drive reports as a 512-byte sector device. I don't think any (or many) of the NVMe flash drives are Advanced Format by default.
> 
> You may be able to re-initialize the drive using vendor-provided tools to reconfigure it as a 4Kn drive, and thereby be able to use ashift=12, but that depends on the design of the drive and the level of modification the vendor (Dell) is willing to allow.
> 
> You may be able to run `blockdev --getss /dev/<nvmedevice>` and `blockdev --getpbsz /dev/<nvmedevice>` to verify the logical (reported) and physical sector sizes of the device.

Note that it is still possible to use "ashift=12" on 512-byte sector drives.  The main reason to do so would be flexibility: ashift is fixed when the VDEV is created, so choosing 12 up front lets you later move the VDEV to a newer device that _does_ require "ashift=12" in order to work.
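
As a quick illustration (my own sketch, not taken from the ZFS source): ashift is the base-2 logarithm of the sector size ZFS allocates in, so 512-byte sectors correspond to ashift=9 and 4096-byte sectors to ashift=12. The helper name `ashift_for` is hypothetical.

```python
def ashift_for(sector_bytes):
    """Return the ashift (log2 of the sector size) for a power-of-two sector size.

    Hypothetical helper for illustration only; it is not part of any ZFS tool.
    """
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    # For a power of two, bit_length() - 1 is exactly log2.
    return sector_bytes.bit_length() - 1


if __name__ == "__main__":
    for size in (512, 4096):
        print(size, "bytes -> ashift", ashift_for(size))
```

Feeding it the values reported by the two blockdev commands quoted above gives the smallest ashift the device can accept and the one matching its physical sectors.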

The impact would be increased space usage for small files, but it would also reduce overhead for larger files.  The increased space usage is amplified when using RAID-Z2 VDEVs, since even a small file needs a 4KB sector on each of at least three devices (one for data plus two for parity).  With mirrored VDEVs this isn't as pronounced unless you have tiny files.
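
To put rough numbers on that, here is a small sketch of the arithmetic. It is a simplified model based on my understanding of how RAID-Z sizes an allocation (data sectors rounded up, parity sectors added per stripe row, total padded to a multiple of parity+1); it ignores compression and metadata. The function name `raidz_asize` and the 6-disk RAID-Z2 layout are assumptions chosen for illustration.

```python
def raidz_asize(psize, ashift, ncols, nparity):
    """Estimate the bytes allocated for one block on a RAID-Z vdev.

    Simplified model for illustration; psize is the block size in bytes,
    ncols the number of disks in the vdev, nparity the parity level.
    """
    sector = 1 << ashift
    # Data sectors needed, rounded up to whole sectors.
    data = (psize + sector - 1) // sector
    # Each stripe row holds (ncols - nparity) data sectors and needs
    # nparity parity sectors.
    data_per_row = ncols - nparity
    rows = (data + data_per_row - 1) // data_per_row
    total = data + nparity * rows
    # Pad the allocation to a multiple of (nparity + 1) sectors.
    pad = nparity + 1
    total = ((total + pad - 1) // pad) * pad
    return total * sector


if __name__ == "__main__":
    # Assumed layout: 6-disk RAID-Z2; a 2 KiB file versus a 128 KiB block.
    for psize in (2048, 131072):
        for ashift in (9, 12):
            print(psize, ashift, raidz_asize(psize, ashift, ncols=6, nparity=2))
```

Under these assumptions the small file costs several times more space at ashift=12 than at ashift=9, while the 128 KiB block comes out the same size in both cases.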

Cheers, Andreas

> 
> 
> On Wed, Feb 17, 2016 at 3:21 PM, Kevin Abbey via zfs-discuss <zfs-discuss at list.zfsonlinux.org> wrote:
> 
> Hi, 
> 
> I'm considering formatting an NVMe device with ZFS but am not sure which ashift is correct. 
> 
> I've read the following two links for reference but am still unable to determine the correct ashift value. 
> 
> http://list.zfsonlinux.org/pipermail/zfs-discuss/2014-June/016263.html 
> 
> https://github.com/zfsonlinux/zfs/blob/master/cmd/zpool/zpool_vdev.c 
> 
> 
> I've pasted the device information below.  If anyone can assist, please share with an explanation. 
> 
> Thank you, 
> Kevin 
> 
> 
> 
> 
> ==================================== 
> 
> ~]# lspci -s 84:00.0 -vvv 
> 
> 
> 84:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 171X (rev 03) (prog-if 02 [NVM Express]) 
>     Subsystem: Dell Express Flash NVMe XS1715 SSD 800GB 
>     Physical Slot: 180 
>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ 
>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 
>     Latency: 0 
>     Interrupt: pin A routed to IRQ 37 
>     Region 0: Memory at c8600000 (64-bit, non-prefetchable) [size=16K] 
>     Capabilities: [c0] Power Management version 3 
>         Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) 
>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- 
>     Capabilities: [c8] MSI: Enable- Count=1/32 Maskable+ 64bit+ 
>         Address: 0000000000000000  Data: 0000 
>         Masking: 00000000  Pending: 00000001 
>     Capabilities: [e0] MSI-X: Enable+ Count=129 Masked- 
>         Vector table: BAR=0 offset=00002000 
>         PBA: BAR=0 offset=00003000 
>     Capabilities: [70] Express (v2) Endpoint, MSI 00 
>         DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited 
>             ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ 
>         DevCtl:    Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+ 
>             RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset- 
>             MaxPayload 256 bytes, MaxReadReq 512 bytes 
>         DevSta:    CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- 
>         LnkCap:    Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <4us 
>             ClockPM- Surprise- LLActRep- BwNot- 
>         LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- CommClk- 
>             ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- 
>         LnkSta:    Speed 8GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- 
>         DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported 
>         DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled 
>         LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- 
>              Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- 
>              Compliance De-emphasis: -6dB 
>         LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+ 
>              EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest- 
>     Capabilities: [40] Vendor Specific Information: Len=24 <?> 
>     Capabilities: [100 v2] Advanced Error Reporting 
>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- 
>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- 
>         UESvrt:    DLP+ SDES- TLP+ FCP- CmpltTO+ CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC+ UnsupReq- ACSViol- 
>         CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr- 
>         CEMsk:    RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+ 
>         AERCap:    First Error Pointer: 00, GenCap+ CGenEn+ ChkCap+ ChkEn+ 
>     Capabilities: [180 v1] #19 
>     Capabilities: [150 v1] Vendor Specific Information: ID=0001 Rev=1 Len=02c <?> 
>     Kernel driver in use: nvme 
> 
> 
> ==================================== 
> 
> 
> ~]# modinfo nvme 
> 
> 
> filename: /lib/modules/3.10.0-327.4.5.el7.x86_64/kernel/drivers/block/nvme.ko 
> version:        1.0 
> license:        GPL 
> author:         Matthew Wilcox <willy at linux.intel.com> 
> rhelversion:    7.2 
> srcversion:     6FE34EC5F6A703F8EDE6C77 
> alias:          pci:v*d*sv*sd*bc01sc08i02* 
> depends: 
> intree:         Y 
> vermagic:       3.10.0-327.4.5.el7.x86_64 SMP mod_unload modversions 
> signer:         CentOS Linux kernel signing key 
> sig_key: 10:5D:A1:3D:CA:AA:74:AE:50:00:17:E7:D5:2C:DA:9B:7C:C5:10:93 
> sig_hashalgo:   sha256 
> parm:           admin_timeout:timeout in seconds for admin commands (byte) 
> parm:           io_timeout:timeout in seconds for I/O (byte) 
> parm:           shutdown_timeout:timeout in seconds for controller shutdown (byte) 
> parm:           nvme_major:int 
> parm:           nvme_char_major:int 
> parm:           use_threaded_interrupts:int 
> 
> 
> 
> 
> ==================================== 
> 
> 
> ~]# nvme list 
> 
> Node             Model                Version  Namepace Usage                      Format           FW Rev 
> ---------------- -------------------- -------- -------- -------------------------- ---------------- -------- 
> /dev/nvme0n1     Dell Express Flash N 1.0      1 14.27  MB / 800.17  GB    512   B +  0 B   IPM0FD3Q 
> 
> 
> ==================================== 
> 
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at list.zfsonlinux.org
> http://list.zfsonlinux.org/cgi-bin/mailman/listinfo/zfs-discuss
> 
> 
> 
> 
> -- 
> ------------------------------
> Jeff Johnson
> Co-Founder
> Aeon Computing
> 
> jeff.johnson at aeoncomputing.com
> www.aeoncomputing.com
> t: 858-412-3810 x1001   f: 858-412-3845
> m: 619-204-9061
> 
> 4170 Morena Boulevard, Suite D - San Diego, CA 92117
> 
> High-Performance Computing / Lustre Filesystems / Scale-out Storage








