Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge: Fix double free issue after merging the diffs #6589

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

vincenzopalazzo
Copy link
Contributor

This commit addresses a double free issue that occurs when merging two diffs and free the diff objects.

The problem was caused from copying the pointer using a vector swap git function and then make a deep free of the original vector.

This operation made on a vector of pointers deallocated all the pointers inside the new vector where we swap the content to.

As a result, an invalid object was returned to the user.

The valgrid report.

==256010== HEAP SUMMARY:
==256010==     in use at exit: 696,247 bytes in 11,244 blocks
==256010==   total heap usage: 50,123 allocs, 38,880 frees, 4,419,142 bytes allocated
==256010==
==256010== Searching for pointers to 11,244 not-freed blocks
==256010== Checked 850,720 bytes
==256010==
==256010== 112 bytes in 1 blocks are definitely lost in loss record 69 of 163
==256010==    at 0x4841848: malloc (vg_replace_malloc.c:431)
==256010==    by 0x48964FD: stdalloc__malloc.lto_priv.0 (stdalloc.c:22)
==256010==    by 0x48D4C34: git_diff__delta_dup (diff_tform.c:23)
==256010==    by 0x48DAA22: git_diff__merge_like_cgit (diff_tform.c:81)
==256010==    by 0x48D5C14: git_diff__merge (diff_tform.c:160)
==256010==    by 0x1092A6: main (in /home/vincent/Github/libgit2/build/a.out)
==256010==
==256010== LEAK SUMMARY:
==256010==    definitely lost: 112 bytes in 1 blocks
==256010==    indirectly lost: 0 bytes in 0 blocks
==256010==      possibly lost: 0 bytes in 0 blocks
==256010==    still reachable: 696,135 bytes in 11,243 blocks
==256010==         suppressed: 0 bytes in 0 blocks
==256010== Reachable blocks (those to which a pointer was found) are not shown.
==256010== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==256010==
==256010== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
==256010==
==256010== 1 errors in context 1 of 2:
==256010== Invalid free() / delete / delete[] / realloc()
==256010==    at 0x484412F: free (vg_replace_malloc.c:974)
==256010==    by 0x490D5B0: patch_parsed__free.part.0.lto_priv.0 (patch_parse.c:1150)
==256010==    by 0x48CC454: diff_parsed_free (diff_parse.c:21)
==256010==    by 0x1092EA: main (in /home/vincent/Github/libgit2/build/a.out)
==256010==  Address 0x59e55b0 is 0 bytes inside a block of size 112 free'd
==256010==    at 0x484412F: free (vg_replace_malloc.c:974)
==256010==    by 0x48A17CD: git_vector_free_deep.part.0 (vector.c:99)
==256010==    by 0x48D5A2B: UnknownInlinedFun (vector.c:95)
==256010==    by 0x48D5A2B: git_diff__merge (diff_tform.c:193)
==256010==    by 0x1092A6: main (in /home/vincent/Github/libgit2/build/a.out)
==256010==  Block was alloc'd at
==256010==    at 0x48469B3: calloc (vg_replace_malloc.c:1554)
==256010==    by 0x489652D: stdalloc__calloc.lto_priv.0 (stdalloc.c:42)
==256010==    by 0x48CCDE9: UnknownInlinedFun (patch_parse.c:1184)
==256010==    by 0x48CCDE9: git_diff_from_buffer (diff_parse.c:86)
==256010==    by 0x109219: main (in /home/vincent/Github/libgit2/build/a.out)
==256010==
==256010== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

Fixes: #6588
Reported-by: Eric Huss

@vincenzopalazzo

This comment was marked as off-topic.

@ethomson
Copy link
Member

The git_vector_swap followed by a free should be safe and correct. This will, effectively, move the newly generated data to the diff, swapping it with the old data. The old data is then freed.

I think that the problem is not with the swap, it's with the git_vector_free_deep. This is an old bug that's been lurking here for a while. libgit2 originally generated diffs and patches by comparing two blobs of data. Eventually, libgit2 learned how to read a textual diff and turn it into a similar model -- so that you could (say) apply a patch file.

But this model is similar but not exactly the same. So a diff and a patch today are really indirections over a generated diff (and patch) and a parsed diff (and patch). In particular, one difference is the deltas. See diff_generated_free (in diff_generate.c) and diff_parsed_free (in diff_parse.c).

It looks like diff_merge did not catch up with these changes when they happened.

The problem -- I think -- is that parsed patches own the deltas for a parsed diff, the diff itself doesn't own the deltas. (That's only true for generated diffs.)

So neither the existing code (that does the swap and free) nor the changes in this PR (which is the moral equivalent of the swap and free) is quite correct. When the parsed delta is freed (in diff_parsed_free the patches will still point to the deltas that were freed by git_diff__merge.

I haven't tested this yet - but I wonder if we should provide a deltas_free for both types of diffs, and call that here instead of vector_free_deep. 🤔

@ethomson
Copy link
Member

Here's more concretely what I'm thinking -- this builds on your PR:

commit c852d166c02ca93e1753f67ee56b6dd3ba296ab7
Author: Edward Thomson <ethomson@edwardthomson.com>
Date:   Fri Jul 14 17:05:21 2023 +0100

    diff: provide a delta free function, use it in merge
    
    `git_diff_merge` needs to free the old deltas, which are created and
    managed differently for parsed vs generated deltas. Provide a function
    to free them and use it in the diff merge functionality.

diff --git a/src/libgit2/diff.h b/src/libgit2/diff.h
index f21b2764505e..2cd4feccb816 100644
--- a/src/libgit2/diff.h
+++ b/src/libgit2/diff.h
@@ -47,6 +47,7 @@ struct git_diff {
 	int (*entrycomp)(const void *a, const void *b);
 
 	int (*patch_fn)(git_patch **out, git_diff *diff, size_t idx);
+	void (*free_deltas_fn)(git_diff *diff);
 	void (*free_fn)(git_diff *diff);
 };
 
diff --git a/src/libgit2/diff_generate.c b/src/libgit2/diff_generate.c
index 78fe510e7482..6d52f34c24ae 100644
--- a/src/libgit2/diff_generate.c
+++ b/src/libgit2/diff_generate.c
@@ -424,15 +424,23 @@ static void diff_set_ignore_case(git_diff *diff, bool ignore_case)
 	git_vector_sort(&diff->deltas);
 }
 
+static void diff_generated_free_deltas(git_diff *d)
+{
+	git_diff_generated *diff = (git_diff_generated *)d;
+
+	git_vector_free_deep(&diff->base.deltas);
+	git_pool_clear(&diff->base.pool);
+}
+
 static void diff_generated_free(git_diff *d)
 {
 	git_diff_generated *diff = (git_diff_generated *)d;
 
+	diff_generated_free_deltas(d);
+
 	git_attr_session__free(&diff->base.attrsession);
-	git_vector_free_deep(&diff->base.deltas);
 
 	git_pathspec__vfree(&diff->pathspec);
-	git_pool_clear(&diff->base.pool);
 
 	git__memzero(diff, sizeof(*diff));
 	git__free(diff);
@@ -459,6 +467,7 @@ static git_diff_generated *diff_generated_alloc(
 	diff->base.old_src = old_iter->type;
 	diff->base.new_src = new_iter->type;
 	diff->base.patch_fn = git_patch_generated_from_diff;
+	diff->base.free_deltas_fn = diff_generated_free_deltas;
 	diff->base.free_fn = diff_generated_free;
 	git_attr_session__init(&diff->base.attrsession, repo);
 	memcpy(&diff->base.opts, &dflt, sizeof(git_diff_options));
diff --git a/src/libgit2/diff_parse.c b/src/libgit2/diff_parse.c
index 04603969e408..c95b4c51dbe1 100644
--- a/src/libgit2/diff_parse.c
+++ b/src/libgit2/diff_parse.c
@@ -11,7 +11,7 @@
 #include "patch.h"
 #include "patch_parse.h"
 
-static void diff_parsed_free(git_diff *d)
+static void diff_parsed_free_deltas(git_diff *d)
 {
 	git_diff_parsed *diff = (git_diff_parsed *)d;
 	git_patch *patch;
@@ -24,6 +24,13 @@ static void diff_parsed_free(git_diff *d)
 
 	git_vector_free(&diff->base.deltas);
 	git_pool_clear(&diff->base.pool);
+}
+
+static void diff_parsed_free(git_diff *d)
+{
+	git_diff_parsed *diff = (git_diff_parsed *)d;
+
+	diff_parsed_free_deltas(d);
 
 	git__memzero(diff, sizeof(*diff));
 	git__free(diff);
@@ -43,6 +50,7 @@ static git_diff_parsed *diff_parsed_alloc(git_oid_t oid_type)
 	diff->base.pfxcomp = git__prefixcmp;
 	diff->base.entrycomp = git_diff__entry_cmp;
 	diff->base.patch_fn = git_patch_parsed_from_diff;
+	diff->base.free_deltas_fn = diff_parsed_free_deltas;
 	diff->base.free_fn = diff_parsed_free;
 
 	if (git_diff_options_init(&diff->base.opts, GIT_DIFF_OPTIONS_VERSION) < 0) {
diff --git a/src/libgit2/diff_tform.c b/src/libgit2/diff_tform.c
index a651d47b861d..65181b13d40b 100644
--- a/src/libgit2/diff_tform.c
+++ b/src/libgit2/diff_tform.c
@@ -174,27 +174,30 @@ int git_diff__merge(
 			break;
 	}
 
-	if (!error) {
-		onto->deltas = onto_new;
-		onto->pool = onto_pool;
-
-		if ((onto->opts.flags & GIT_DIFF_REVERSE) != 0)
-			onto->old_src = from->old_src;
-		else
-			onto->new_src = from->new_src;
-
-		/* prefix strings also come from old pool, so recreate those.*/
-		onto->opts.old_prefix =
-			git_pool_strdup_safe(&onto->pool, onto->opts.old_prefix);
-		onto->opts.new_prefix =
-			git_pool_strdup_safe(&onto->pool, onto->opts.new_prefix);
-	} else {
-		/* In the case of an error, it is safe to perform a free the objects of the vector. */
+	if (error) {
 		git_vector_free_deep(&onto_new);
 		git_pool_clear(&onto_pool);
+
+		return error;
 	}
 
-	return error;
+	onto->free_deltas_fn(onto);
+
+	onto->deltas = onto_new;
+	onto->pool = onto_pool;
+
+	if ((onto->opts.flags & GIT_DIFF_REVERSE) != 0)
+		onto->old_src = from->old_src;
+	else
+		onto->new_src = from->new_src;
+
+	/* prefix strings also come from old pool, so recreate those.*/
+	onto->opts.old_prefix =
+		git_pool_strdup_safe(&onto->pool, onto->opts.old_prefix);
+	onto->opts.new_prefix =
+		git_pool_strdup_safe(&onto->pool, onto->opts.new_prefix);
+
+	return 0;
 }
 
 int git_diff_merge(git_diff *onto, const git_diff *from)

A review of this ☝️ would be super helpful. Does this make sense to you? Does this work? What else am I missing, if anything?

@vincenzopalazzo
Copy link
Contributor Author

The git_vector_swap followed by a free should be safe and correct. This will, effectively, move the newly generated data to the diff, swapping it with the old data. The old data is then freed.

Yes it is correct, the swap does not free any data

I think that the problem is not with the swap. It's with the git_vector_free_deep. This is an old bug that's been lurking here for a while.

Correct, if we swap the array A in the array B and then we git_vector_free_deep, we delete the pointer in the array A and B. This looked a little odd to me, but now I understand that it has a historical reason.

A review of this point_up would be super helpful. Does this make sense to you? Does this work? What else am I missing, if anything?

Mh I ready it and tested, but I need to take a look with a fresh mind tomorrow. I will report my though in one day or so

This commit addresses a double free issue that occurs when
merging two diffs and free the diff objects.

The problem was caused from copying the pointer using a vector swap
git function and then make a deep free of the original vector.

This operation made on a vector of pointers deallocated all the
pointers inside the new vector where we swap the content to.

As a result, an invalid object was returned to the user.

The valgrid report.

```
==256010== HEAP SUMMARY:
==256010==     in use at exit: 696,247 bytes in 11,244 blocks
==256010==   total heap usage: 50,123 allocs, 38,880 frees, 4,419,142 bytes allocated
==256010==
==256010== Searching for pointers to 11,244 not-freed blocks
==256010== Checked 850,720 bytes
==256010==
==256010== 112 bytes in 1 blocks are definitely lost in loss record 69 of 163
==256010==    at 0x4841848: malloc (vg_replace_malloc.c:431)
==256010==    by 0x48964FD: stdalloc__malloc.lto_priv.0 (stdalloc.c:22)
==256010==    by 0x48D4C34: git_diff__delta_dup (diff_tform.c:23)
==256010==    by 0x48DAA22: git_diff__merge_like_cgit (diff_tform.c:81)
==256010==    by 0x48D5C14: git_diff__merge (diff_tform.c:160)
==256010==    by 0x1092A6: main (in /home/vincent/Github/libgit2/build/a.out)
==256010==
==256010== LEAK SUMMARY:
==256010==    definitely lost: 112 bytes in 1 blocks
==256010==    indirectly lost: 0 bytes in 0 blocks
==256010==      possibly lost: 0 bytes in 0 blocks
==256010==    still reachable: 696,135 bytes in 11,243 blocks
==256010==         suppressed: 0 bytes in 0 blocks
==256010== Reachable blocks (those to which a pointer was found) are not shown.
==256010== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==256010==
==256010== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
==256010==
==256010== 1 errors in context 1 of 2:
==256010== Invalid free() / delete / delete[] / realloc()
==256010==    at 0x484412F: free (vg_replace_malloc.c:974)
==256010==    by 0x490D5B0: patch_parsed__free.part.0.lto_priv.0 (patch_parse.c:1150)
==256010==    by 0x48CC454: diff_parsed_free (diff_parse.c:21)
==256010==    by 0x1092EA: main (in /home/vincent/Github/libgit2/build/a.out)
==256010==  Address 0x59e55b0 is 0 bytes inside a block of size 112 free'd
==256010==    at 0x484412F: free (vg_replace_malloc.c:974)
==256010==    by 0x48A17CD: git_vector_free_deep.part.0 (vector.c:99)
==256010==    by 0x48D5A2B: UnknownInlinedFun (vector.c:95)
==256010==    by 0x48D5A2B: git_diff__merge (diff_tform.c:193)
==256010==    by 0x1092A6: main (in /home/vincent/Github/libgit2/build/a.out)
==256010==  Block was alloc'd at
==256010==    at 0x48469B3: calloc (vg_replace_malloc.c:1554)
==256010==    by 0x489652D: stdalloc__calloc.lto_priv.0 (stdalloc.c:42)
==256010==    by 0x48CCDE9: UnknownInlinedFun (patch_parse.c:1184)
==256010==    by 0x48CCDE9: git_diff_from_buffer (diff_parse.c:86)
==256010==    by 0x109219: main (in /home/vincent/Github/libgit2/build/a.out)
==256010==
==256010== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
```

Link: libgit2#6588
Reported-by: Eric Huss
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
@vincenzopalazzo
Copy link
Contributor Author

vincenzopalazzo commented Jul 22, 2023

Ok @ethomson I am back with a little bit more analysis on what the diff that you provide! (Sorry for delay btw)

A review of this point_up would be super helpful. Does this make sense to you?

Yeah, this makes sense. I thought that my solution was not the real one because I needed the free deltas function, but with your help, I could provide a clear solution for it, thanks!.

Does this work?

Mh Ni, I guess we should also move the last allocation from the pool to because if we free the git_pool before and then we use a string that is inside this pool, we have an invalid free (thanks to Valgrind)

I guess this should be a step forward version

--- a/src/libgit2/diff_tform.c
+++ b/src/libgit2/diff_tform.c
@@ -174,27 +174,28 @@ int git_diff__merge(
                        break;
        }
 
-       if (!error) {
-               onto->deltas = onto_new;
-               onto->pool = onto_pool;
-
-               if ((onto->opts.flags & GIT_DIFF_REVERSE) != 0)
-                       onto->old_src = from->old_src;
-               else
-                       onto->new_src = from->new_src;
-
-               /* prefix strings also come from old pool, so recreate those.*/
-               onto->opts.old_prefix =
-                       git_pool_strdup_safe(&onto->pool, onto->opts.old_prefix);
-               onto->opts.new_prefix =
-                       git_pool_strdup_safe(&onto->pool, onto->opts.new_prefix);
-       } else {
-               /* In the case of an error, it is safe to perform a free the objects of the vector. */
+       if (error) {
                git_vector_free_deep(&onto_new);
                git_pool_clear(&onto_pool);
+
+               return error;
        }
+       onto->opts.old_prefix =
+               git_pool_strdup_safe(&onto_pool, onto->opts.old_prefix);
+       onto->opts.new_prefix =
+               git_pool_strdup_safe(&onto_pool, onto->opts.new_prefix);
 
-       return error;
+       onto->free_deltas_fn(onto);
+
+       onto->deltas = onto_new;
+       onto->pool = onto_pool;
+
+       if ((onto->opts.flags & GIT_DIFF_REVERSE) != 0)
+               onto->old_src = from->old_src;
+       else
+               onto->new_src = from->new_src;
+
+       return 0;
 }

What else am I missing, if anything?

I think there is a "forget to free" error from Valgrid while running the test that I added, but I cannot track it down. We should free all the deltas inside the free deltas callback.

This is the Valgrid report

==70853== 112 bytes in 1 blocks are definitely lost in loss record 4 of 6
==70853==    at 0x4841848: malloc (vg_replace_malloc.c:431)
==70853==    by 0x34DF80: stdalloc__malloc (stdalloc.c:20)
==70853==    by 0x3AE6A4: git__malloc (alloc.h:19)
==70853==    by 0x3AE8F5: git_diff__delta_dup (diff_tform.c:23)
==70853==    by 0x3AEA80: git_diff__merge_like_cgit (diff_tform.c:81)
==70853==    by 0x3AEEF2: git_diff__merge (diff_tform.c:160)
==70853==    by 0x3AF148: git_diff_merge (diff_tform.c:203)
==70853==    by 0x1CF331: test_diff_patch__free_diff_after_git_diff_merge (patch.c:729)
==70853==    by 0x14D245: clar_run_test (clar.c:305)
==70853==    by 0x14D653: clar_run_suite (clar.c:401)
==70853==    by 0x14DDFD: clar_test_run (clar.c:596)
==70853==    by 0x1542C5: main (main.c:33)

For now I am pushing your diff + my change inside the commit, and we can see what the CI tell us

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Double free after calling git_diff_merge
2 participants