[zfs-discuss] zpool add and spa_namespace_lock

Richard Elling richard.elling at richardelling.com
Thu Nov 1 14:14:08 EDT 2018



> On Oct 31, 2018, at 3:28 PM, Gaurav Kumar <gauravk.18 at gmail.com> wrote:
> 
> Ok, now that I have an audience :), I will explain a bit more about the use case.
> 
> Our share is actually sharded across multiple pools and multiple VMs, with each pool having 4 virtual disks.

What does "shard" mean in this context? It is not a term we use with ZFS or storage in general. It is often used
in the context of databases, and ZFS is not a database.

> We could have started with, say, 10 disks to give a bigger share size, but the issue is that not all pools need to be big, and we want to reduce the number of iSCSI logins because of the time each one takes. So if 3 pools out of 20 see more incoming data, only those should be expanded with more disks. The way we do it is that when a pool's allocated size reaches 80%, we raise a zed event, which does a bunch of infra work (outside the scope of ZFS) and then calls zpool add. We want to do this in a manner that causes no downtime from the client's perspective. This works fine as long as spa_sync times are low, but if they are high then zpool add takes a long time to finish while holding the spa_namespace_lock, and other operations get stalled.

I think you're misusing the concept of pooled storage, like ZFS and ReFS. A ZFS volume can be resized without
changing the pool configuration, and this is how most people use it.
 -- richard

> 
> In order to find a solution, I wanted to understand the use and necessity of spa_namespace_lock in spa_vdev_enter as well as spa_config_update. Based on that understanding, I want to see if I can drop the lock in between so that other operations can make progress.
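> 
> For context, here is the locking pattern I am referring to, as I read it in the OpenZFS spa code. This is an abridged, hand-written sketch (other locking and most of the txg bookkeeping are elided), not the verbatim source:
> 
>     /* Abridged sketch of the spa_vdev_enter()/spa_vdev_exit() pair; not verbatim OpenZFS code. */
>     uint64_t
>     spa_vdev_enter(spa_t *spa)
>     {
>             /*
>              * spa_namespace_lock is a single global mutex, so this serializes
>              * vdev changes against imports/exports of every pool, not just this one.
>              */
>             mutex_enter(&spa_namespace_lock);
>             return (spa_vdev_config_enter(spa));
>     }
> 
>     int
>     spa_vdev_exit(spa_t *spa, vdev_t *vd, uint64_t txg, int error)
>     {
>             /*
>              * Waits for the dirtied config to reach stable storage (txg_wait_synced)
>              * before the global lock is released; this is where the long hold time comes from.
>              */
>             spa_vdev_config_exit(spa, vd, txg, error, FTAG);
>             mutex_exit(&spa_namespace_lock);
>             return (error);
>     }
> 
> That is the window I would like to shrink or split, if it is safe to do so.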
> 
> The other solution I have in mind is to limit the dirty data (say 500 MB rather than 4 GB) so that spa_sync runs faster since there is less data to sync. The downside is that there will be some impact on throughput for the brief period when the expansion is happening.
> 
> Hope the above explains the use case well.
> 
> -Gaurav
> 
> On Wed, Oct 31, 2018 at 2:58 PM Richard Elling <richard.elling at richardelling.com> wrote:
> 
> 
>> On Oct 31, 2018, at 12:11 PM, Gaurav Kumar via zfs-discuss <zfs-discuss at list.zfsonlinux.org> wrote:
>> 
>> Hi, 
>> 
>> I am issuing zpool add (triggered via zed) while load is running in the VM.
> 
> This sounds like a crazy idea, well outside the expected use of zpool add. Can you explain
> what you're trying to accomplish here? Perhaps there is a better way...
>  -- richard
> 
>> Storage is all virtualized, backed by a different layer, and accessed via iSCSI. Over its course, zpool add waits for 3 txgs to get synced (in spa_vdev_enter and spa_config_update). The backend is really slow and each txg takes around 30-40 seconds, so the zpool add command can take minutes to complete in the worst case. During this time, spa_namespace_lock is held and none of the other commands (most importantly spa_import, which can happen due to HA) will proceed, which is a big concern. Looking at spa_namespace_lock, I don't fully understand why we need to hold this lock during the config update of a pool. If any locking is needed, shouldn't it be on a per-pool basis and not across multiple pools?
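>> 
>> To show where the 3 txg syncs come from, this is roughly the call sequence as I understand it from reading the code. It is a heavily abridged sketch (vdev construction and error handling elided), not the actual source:
>> 
>>     /* Abridged sketch of the zpool add path (spa_vdev_add); not verbatim OpenZFS code. */
>>     int
>>     spa_vdev_add(spa_t *spa, nvlist_t *nvroot)
>>     {
>>             vdev_t *vd = NULL;                       /* new top-level vdev tree (construction elided) */
>>             uint64_t txg = spa_vdev_enter(spa);      /* takes the global spa_namespace_lock */
>> 
>>             /* ... build and attach the new vdevs described by nvroot ... */
>> 
>>             (void) spa_vdev_exit(spa, vd, txg, 0);   /* txg sync #1, then drops the lock */
>> 
>>             mutex_enter(&spa_namespace_lock);        /* re-taken to update the config cache */
>>             spa_config_update(spa, SPA_CONFIG_UPDATE_POOL);
>>             /* SPA_CONFIG_UPDATE_POOL syncs a txg (#2) and then repeats the update
>>              * with SPA_CONFIG_UPDATE_VDEVS, syncing another txg (#3). */
>>             mutex_exit(&spa_namespace_lock);
>>             return (0);
>>     }
>> 
>> With each sync taking 30-40 seconds on our backend, that adds up to minutes during which anything else needing spa_namespace_lock is stalled.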
>> 
>> -Gaurav 
>> _______________________________________________
>> zfs-discuss mailing list
>> zfs-discuss at list.zfsonlinux.org
>> http://list.zfsonlinux.org/cgi-bin/mailman/listinfo/zfs-discuss
> 
