suspicious code in dsl_scan_ds_clone_swapped, related to sequential scan #9140

Closed
avg-I opened this issue Aug 8, 2019 · 3 comments · Fixed by #9163

Comments

avg-I (Contributor) commented Aug 8, 2019

Sorry for not following the template.
This potential issue was found through a code review.
The code in question:

        if (scan_ds_queue_contains(scn, ds1->ds_object, &mintxg)) {
                scan_ds_queue_remove(scn, ds1->ds_object);
                scan_ds_queue_insert(scn, ds2->ds_object, mintxg);
        }
        if (scan_ds_queue_contains(scn, ds2->ds_object, &mintxg)) {
                scan_ds_queue_remove(scn, ds2->ds_object);
                scan_ds_queue_insert(scn, ds1->ds_object, mintxg);
        }

I see two potential problems.

If the first condition is true, then the second condition will always be true as well.
That is because the first block replaces ds1->ds_object with ds2->ds_object in the queue,
so the second block just reverses the replacement.
It seems unlikely that this is the intended behavior.
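
A worked trace makes the double swap concrete. Suppose only ds1->ds_object is on the queue when this code runs:

        /* initially:  queue = { ds1 } */
        /* 1st if: contains(ds1) is true -> remove ds1, insert ds2 */
        /*             queue = { ds2 } */
        /* 2nd if: contains(ds2) is now true -> remove ds2, insert ds1 */
        /*             queue = { ds1 } again, i.e. the net effect is no swap at all */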

Also, I think it is possible that both ds1->ds_object and ds2->ds_object are already on the queue. In that case, the code will attempt a duplicate insertion.

Maybe the code should first check whether each object ID is on the queue and remember the results.
Then do the necessary removals (if either ID is present, should both be removed?).
And then do the necessary insertions (insert the other ID for each originally present ID). A sketch of this approach is below.
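
A minimal sketch of that approach (hypothetical, not the actual patch; it reuses the scan_ds_queue_* helpers from the snippet above):

        boolean_t ds1_queued, ds2_queued;
        uint64_t mintxg1, mintxg2;

        /* Query both IDs up front so neither check observes the other's insert. */
        ds1_queued = scan_ds_queue_contains(scn, ds1->ds_object, &mintxg1);
        ds2_queued = scan_ds_queue_contains(scn, ds2->ds_object, &mintxg2);

        /* Remove first... */
        if (ds1_queued)
                scan_ds_queue_remove(scn, ds1->ds_object);
        if (ds2_queued)
                scan_ds_queue_remove(scn, ds2->ds_object);

        /* ...then insert each originally present ID under the other object. */
        if (ds1_queued)
                scan_ds_queue_insert(scn, ds2->ds_object, mintxg1);
        if (ds2_queued)
                scan_ds_queue_insert(scn, ds1->ds_object, mintxg2);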

CC: @tcaputi @ahrens

avg-I added a commit to avg-I/zfs that referenced this issue Aug 14, 2019
The code was incorrect with respect to swapping dataset IDs both in the
on-disk ZAP object and the in-memory queue.

In both cases, if ds1 was already present, then it would first be
replaced with ds2, and then ds2 would be replaced back with ds1.
Also, neither case properly handled the situation where both ds1 and
ds2 are already queued.  A duplicate insertion would be attempted and
its failure would result in a panic.

This change should fix openzfs#9140.

Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
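
The commit message notes the same flaw on the on-disk side. A hedged sketch of applying the same query-first pattern to the ZAP object, assuming dp, scn, and tx are in scope as in dsl_scan_ds_clone_swapped() and using the standard zap_*_int helpers (the actual patch in #9163 may differ):

        objset_t *mos = dp->dp_meta_objset;
        uint64_t queue_obj = scn->scn_phys.scn_queue_obj;
        uint64_t mintxg1, mintxg2;
        boolean_t ds1_queued, ds2_queued;

        /* Query both IDs before touching the ZAP. */
        ds1_queued = (zap_lookup_int_key(mos, queue_obj,
            ds1->ds_object, &mintxg1) == 0);
        ds2_queued = (zap_lookup_int_key(mos, queue_obj,
            ds2->ds_object, &mintxg2) == 0);

        /* Remove both entries first, then re-insert them swapped. */
        if (ds1_queued)
                VERIFY0(zap_remove_int(mos, queue_obj, ds1->ds_object, tx));
        if (ds2_queued)
                VERIFY0(zap_remove_int(mos, queue_obj, ds2->ds_object, tx));
        if (ds1_queued)
                VERIFY0(zap_add_int_key(mos, queue_obj,
                    ds2->ds_object, mintxg1, tx));
        if (ds2_queued)
                VERIFY0(zap_add_int_key(mos, queue_obj,
                    ds1->ds_object, mintxg2, tx));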

avg-I (Contributor, Author) commented Aug 14, 2019

I believe that FreeBSD bug 239566 is the same issue. That crash happens when both ds1 and ds2 are already queued when dsl_scan_ds_clone_swapped() is called.

tcaputi (Contributor) commented Aug 14, 2019

I believe you are correct. We should probably just have separate if / else statements for each case.

avg-I added a commit to avg-I/zfs that referenced this issue Aug 26, 2019
avg-I added a commit to avg-I/zfs that referenced this issue Aug 29, 2019
avg-I added a commit to avg-I/zfs that referenced this issue Sep 4, 2019
behlendorf pushed a commit that referenced this issue Sep 18, 2019

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #9140 
Closes #9163
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Sep 18, 2019
ahrens pushed a commit to ahrens/zfs that referenced this issue Sep 18, 2019
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Sep 19, 2019
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Sep 23, 2019
tonyhutter pushed a commit that referenced this issue Sep 26, 2019

PaulZ-98 (Contributor) commented

Hi all - just a follow-up. This is not a theoretical issue; it happens without the fix. So thanks for taking care of it!

VERIFY3(avl_find(&scn->scn_queue, sds, &where) == ((void *)0)) failed (00000000185a2639 ==           (null))
2020-05-26T11:59:00.867230+00:00 kern.emerg usw2-bfyii-307 kernel: [6444694.715839] PANIC at dsl_scan.c:1101:scan_ds_queue_insert()

      1 7ddba5456e65ef9403f3e1c6d4ae2eb9  /proc/23127/stack
[<0>] spl_panic+0xfa/0x110 [spl]
[<0>] scan_ds_queue_insert+0x8e/0xc0 [zfs]
[<0>] dsl_scan_ds_clone_swapped+0x2dc/0x3c0 [zfs]
[<0>] dsl_dataset_clone_swap_sync_impl+0x9a2/0xa50 [zfs]
[<0>] dmu_recv_end_sync+0xd7/0x480 [zfs]
[<0>] dsl_sync_task_sync+0x11c/0x120 [zfs]
[<0>] dsl_pool_sync+0x295/0x340 [zfs]
[<0>] spa_sync+0x3d6/0x8c0 [zfs]
[<0>] txg_sync_thread+0x29b/0x340 [zfs]
[<0>] thread_generic_wrapper+0x74/0x90 [spl]
[<0>] kthread+0x105/0x140
[<0>] ret_from_fork+0x35/0x40
[<0>] 0xffffffffffffffff
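
For context, the VERIFY3 at dsl_scan.c:1101 fires because scan_ds_queue_insert asserts that the dataset is not already in the queue's AVL tree before inserting. The shape is roughly as follows (a sketch inferred from the assertion text in the trace above, not verbatim source):

        static void
        scan_ds_queue_insert(dsl_scan_t *scn, uint64_t dsobj, uint64_t txg)
        {
                scan_ds_t *sds = kmem_zalloc(sizeof (*sds), KM_SLEEP);
                avl_index_t where;

                sds->sds_dsobj = dsobj;
                sds->sds_txg = txg;

                /* Panics if dsobj is already queued, as in the trace above. */
                VERIFY3P(avl_find(&scn->scn_queue, sds, &where), ==, NULL);
                avl_insert(&scn->scn_queue, sds, where);
        }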
