[zfs-discuss] changing ashift=9 to ashift=12

Badi' Abdul-Wahid abdulwahidc at gmail.com
Wed May 4 00:43:19 EDT 2016


Thank you all for your comments.
I've updated my procedure and run another simulation.
The main changes are using zpool detach instead of zpool split, taking a
recursive snapshot with zfs snapshot -r, and replicating with zfs send -R.

   zpool create -o ashift=9 p0 mirror /dev/loop{0,1} mirror /dev/loop{2,3}
   zfs create p0/v0
   zfs create p0/v1
   dd if=/dev/urandom of=/p0/v0/data bs=10M count=50
   dd if=/dev/urandom of=/p0/v1/data bs=10M count=40
   mkdir /tmp/p0 && cp -av /p0/* /tmp/p0
   diff /p0/v0/data /tmp/p0/v0/data ; echo $?   # => 0
   zpool detach p0 loop1
   zpool detach p0 loop3
   zpool labelclear /dev/loop1 /dev/loop3
   zpool create -o ashift=12 p1 /dev/loop{1,3}
   zfs snapshot -r p0@now
   zfs send -vR p0@now | zfs recv -vdF p1
   for i in 0 1 ; do diff /p0/v$i/data /p1/v$i/data; echo $?; done   # => 0 0
   zpool destroy p0
   zpool labelclear /dev/loop{0,2}
   zpool attach p1 loop1 /dev/loop0
   zpool attach p1 loop3 /dev/loop2
   zpool export p1
   zpool import p1 p0
   for i in 0 1 ; do diff /p0/v$i/data /tmp/p0/v$i/data; echo $?; done   # => 0 0
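
As a sanity check (a minimal sketch, assuming zdb can read the pool
configuration from the cache file; the exact output format differs between
releases), the new ashift can be read back after the rename:

   zdb -C p0 | grep ashift   # each top-level vdev should now report ashift: 12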

A couple of other questions (I'm trying to minimize the time that the disks
are left unmirrored):


Resilvering:

Each call to zpool attach requires the previous attach to have completed
resilvering, even though the mirrors are distinct.
E.g., mirror-0 must finish resilvering before loop2 can be attached alongside loop3.
NAME        STATE     READ WRITE CKSUM
p1          ONLINE       0     0     0
 mirror-0  ONLINE       0     0     0
   loop0   ONLINE       0     0     0
   loop1   ONLINE       0     0     0
 mirror-1  ONLINE       0     0     0
   loop2   ONLINE       0     0     0
   loop3   ONLINE       0     0     0

Is it possible to resilver the mirrors in parallel?
The man page indicates that "attach"ing triggers an immediate resilver.
Can resilvering be deferred until all of the disks have been attached, and
then triggered for all mirrors at once?
Something like:
  zpool attach -o delayed_resilver=true p1 loop1 /dev/loop0
  zpool attach -o delayed_resilver=true p1 loop3 /dev/loop2
  zpool scrub p1 # triggers resilver then scrub
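
If no such option exists, a crude workaround to avoid babysitting the
serialized attaches would be to poll between them (a sketch, assuming
zpool status prints "resilver in progress" while a resilver is running;
the exact wording may differ between versions):

   zpool attach p1 loop1 /dev/loop0
   while zpool status p1 | grep -q 'resilver in progress'; do sleep 60; done
   zpool attach p1 loop3 /dev/loop2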


Failures:

My understanding now is that, barring hardware failure of disks or other
components, this process should recover from power loss.
I experimented with killing the send/recv partway and was able to start
over after destroying the partial snapshot on p1.
Is there a way for the send/recv to be incremental so that the send | recv
picks up from the previous state?
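For concreteness, what I have in mind is something like an incremental
follow-up send between two snapshots (a sketch; this would not resume a
partially received stream, and it assumes the first full send of p0@now
completed and that p0@now still exists on both sides -- as far as I know,
resumable receive only arrived in releases newer than 0.6.5.x):

   zfs snapshot -r p0@now2
   zfs send -R -I p0@now p0@now2 | zfs recv -vdF p1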


PS. My critical data is stored on a separate machine.




On Tue, May 3, 2016 at 3:36 AM, Sam Van den Eynde <svde.tech at gmail.com>
wrote:

> I'd scrub the pool before starting as well.
>
>
>
> On Tue, May 3, 2016 at 6:59 AM, Badi' Abdul-Wahid via zfs-discuss <
> zfs-discuss at list.zfsonlinux.org> wrote:
>
>> Hi
>>
>> I need to migrate a pool from ashift=9 to ashift=12.
>> I *think* I've figured out how to do it, but I'd like any
>> comments/suggestions before I *actually* go through the process.
>> Some questions:
>> - are there any assumptions that I am obviously making that are wildly
>> off the mark?
>> - while my test run (below) indicates the process should work, is there
>> anything that could result in data loss, beyond disk failure? Eg, how would
>> sudden power loss (rare, but known to happen where I live), or other issues
>> affect the process?
>> - what are my options if a disk dies during the process? Is the whole
>> pool lost, or would I lose just the data on that disk?
>> - is there another way of doing this that is easy on the wallet (say <
>> $100 USD)?
>> - have I backed myself into a corner and is this a totally disastrous
>> plan almost guaranteed to fail?
>>
>> This is for my self-built home NAS server, and it has been my primary
>> exposure to zfs as I tinkered with it.
>> My platform is Linux (4.4.6) on NixOS 16.03 with zfs-on-linux v0.6.5.4.
>> I created the pool several years ago without specifying ashift=12 and the
>> initial disks all had 512 byte sectors.
>> The pool has survived several drive replacements and upgrades since, and I
>> had forgotten about this until I put in a new drive as a preemptive upgrade
>> (not a replacement of a failed drive). The new drive happened to use 4K
>> sectors, and I got an error when I tried to replace one of the existing
>> disks with it.
>> Rebuilding the pool entirely from scratch is not an option.
>> The layout is 4 mirrors with 2 disks per mirror.
>> The disks seem to be in good health (regular scrubs and SMART checks), and
>> while I would prefer not to lose data, it would not be the end of the world
>> if I lost it. It would make me really sad, though.
>>
>> The procedure is based on the steps here, with some modifications:
>> https://forums.freebsd.org/threads/45693/
>> In summary, the steps are:
>> 0) given an initial pool p0
>> 1) split the mirrored vdevs into pool p1
>> 2) destroy p1 to reclaim the disks
>> 3) create a new pool p1 with ashift=12
>> 4) send the data from p0 to p1
>> 5) destroy p0 to reclaim the disks
>> 6) add the disks to p1
>> 7) rename p1 to p0 using export/import
>>
>> I've tried with loopback files and it *seems* to work.
>> Some random data was shown to survive the process.
>>
>> 0) Given: initial pool p0 of mirrored vdevs with some random data:
>>
>> # zpool create -o ashift=9 p0 mirror /dev/loop{0,1} mirror /dev/loop{2,3}
>> # dd if=/dev/urandom of=/p0/data bs=32M count=10
>> # zpool status
>>   pool: p0
>>  state: ONLINE
>>   scan: none requested
>> config:
>>
>> NAME        STATE     READ WRITE CKSUM
>> p0          ONLINE       0     0     0
>>  mirror-0  ONLINE       0     0     0
>>    loop0   ONLINE       0     0     0
>>    loop1   ONLINE       0     0     0
>>  mirror-1  ONLINE       0     0     0
>>    loop2   ONLINE       0     0     0
>>    loop3   ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>>
>> 1) # zpool split p0 p1 loop{1,3}
>>
>> 2) # zpool destroy p1
>>
>> 3) create new pool with ashift=12
>>
>> # zpool create -o ashift=12 p1 /dev/loop{1,3}
>> # zpool status
>>   pool: p0
>>  state: ONLINE
>>   scan: none requested
>> config:
>>
>> NAME        STATE     READ WRITE CKSUM
>> p0          ONLINE       0     0     0
>>  loop0     ONLINE       0     0     0
>>  loop2     ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>>   pool: p1
>>  state: ONLINE
>>   scan: none requested
>> config:
>>
>> NAME        STATE     READ WRITE CKSUM
>> p1          ONLINE       0     0     0
>>  loop1     ONLINE       0     0     0
>>  loop3     ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>> 4) send the data from p0 to p1
>>
>> # zfs snapshot p0@now
>> # zfs send p0@now | zfs receive -F p1
>> # diff /p{0,1}/data; echo $?
>> 0
>>
>> 5) destroy p0 to reclaim disks
>>
>> # zpool destroy p0
>>
>> 6) add the newly freed disks to p1
>>
>> # zpool attach -o ashift=12 p1 loop1 /dev/loop0
>> # zpool attach -o ashift=12 p1 loop3 /dev/loop2
>> # zpool status
>>   pool: p1
>>  state: ONLINE
>>   scan: resilvered 160M in 0h0m with 0 errors on Mon May  2 23:52:52 2016
>> config:
>>
>> NAME        STATE     READ WRITE CKSUM
>> p1          ONLINE       0     0     0
>>  mirror-0  ONLINE       0     0     0
>>    loop1   ONLINE       0     0     0
>>    loop0   ONLINE       0     0     0
>>  mirror-1  ONLINE       0     0     0
>>    loop3   ONLINE       0     0     0
>>    loop2   ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>> 7) rename p1 to p0
>>
>> # zpool export p1
>> # zpool import p1 p0
>> # zpool status p0
>>   pool: p0
>>  state: ONLINE
>>   scan: resilvered 160M in 0h0m with 0 errors on Mon May  2 23:52:52 2016
>> config:
>>
>> NAME        STATE     READ WRITE CKSUM
>> p0          ONLINE       0     0     0
>>  mirror-0  ONLINE       0     0     0
>>    loop1   ONLINE       0     0     0
>>    loop0   ONLINE       0     0     0
>>  mirror-1  ONLINE       0     0     0
>>    loop3   ONLINE       0     0     0
>>    loop2   ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>>
>>
>>
>>
>> --
>>
>> Badi' Abdul-Wahid
>>
>> _______________________________________________
>> zfs-discuss mailing list
>> zfs-discuss at list.zfsonlinux.org
>> http://list.zfsonlinux.org/cgi-bin/mailman/listinfo/zfs-discuss
>>
>>
>


-- 

Badi' Abdul-Wahid