Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test failure in Debian bullseye: session/test-01.xml... .(bcacfghi vs. acbcfghi) FAILED #22

Closed
pabs3 opened this issue Oct 10, 2019 · 12 comments · May be fixed by #24 or #25
Closed

test failure in Debian bullseye: session/test-01.xml... .(bcacfghi vs. acbcfghi) FAILED #22

pabs3 opened this issue Oct 10, 2019 · 12 comments · May be fixed by #24 or #25

Comments

@pabs3
Copy link

pabs3 commented Oct 10, 2019

Forwarding Debian bug #935614:

The first session test fails with an error on Debian bullseye/sid (full build log:

session/test-01.xml... .(bcacfghi vs. acbcfghi) FAILED

The same version builds fine on Debian buster.

We could use some help in figuring out why the test fails.

@NK-Analyst
Copy link

NK-Analyst commented Oct 16, 2019

Hi, @pabs3

Please, try to debug this file https://github.com/gobby/libinfinity/blob/master/test/inf-test-text-session.c (function perform_single_test in line 136). I think something is wrong in code result = inf_text_chunk_equal(test_chunk, final);
P.S. Code for function inf_text_chunk_equal exists in file https://github.com/gobby/libinfinity/blob/master/libinftext/inf-text-chunk.c

@pkern
Copy link
Contributor

pkern commented Oct 16, 2019

I mean sure. The thing is: The algorithm changed its result and it's literally stopping corruption from getting shipped. Someone needs to take a look as to what changed so that the ordering of the first two chunks is consistently wrong. The two requests are effectively reordered when they are not expected to me:

 <request time="" user="2">
  <insert pos="2">ac</insert>
 </request>

 <request time="" user="3">
  <insert pos="3">bc</insert>
 </request>

If it wouldn't fail that consistently I'd have said that something started randomizing but that's not what we see in Debian. @aburgm Do you happen to have an idea?

@zhsj
Copy link

zhsj commented Dec 15, 2019

Not sure why previously the tests are passed, the requests are randomized because of the following code

/* TODO: This can be optimized */
/* shuffle random */
for(item = permutation->next; item != NULL; item = g_slist_next(item))
{
++ dist;
dist_item = g_slist_nth(permutation, g_rand_int(rand) % (dist + 1));
rval = g_rand_int(rand) % (dist + 1);
temp = item->data;
item->data = dist_item->data;
dist_item->data = temp;
}

I think the test is expected to fail, since order of the two requests is indeterminate.

<request time="" user="2">
<insert pos="2">ac</insert>
</request>
<request time="" user="3">
<insert pos="3">bc</insert>
</request>

@aburgm
Copy link
Contributor

aburgm commented Dec 15, 2019 via email

@pkern
Copy link
Contributor

pkern commented Dec 27, 2019

So even after staring at the code for a long time and after a lot of printf debugging... In the failing permutation bc is already committed to the chunk as a segment at offset 0. It feels like the chunk code is entirely in its right to place ac after bc given that it does not have more information about the original offset. Which suggests that adopted is failing at providing sufficient state information and transforming the incoming requests in such a way that ac can actually go first? The successful permutations insert that string first, unsurprisingly. And of course the kicker is that it used to work, which usually points to changes in glib.

The thing that irritates me the most here is where the code would preserve the state. It does not seem to buffer the request - but then all arrive at the same "time" anyway, so why would it have to. The state vector that it preserves seems to revolve primarily about the time as expressed in count of operations.

Shifting the operations one char into the string has the same failure mode as well, bc's offset is squashed to the first character remaining after the delete.

Specifically the following premutations trigger the problem: (bc|delete) (the remaining one of the two) ac

@pkern
Copy link
Contributor

pkern commented Feb 23, 2020

As expected this is due to a change in glib - and I thought I was looking at that commit before, but now I finally bisected it:

0dee62973cd0e269c0d7dd194d6bdc0cd4a732f9 is the first bad commit
commit 0dee62973cd0e269c0d7dd194d6bdc0cd4a732f9
Author: Hans Petter Jansson <hpj@cl.no>
Date:   Tue Jul 10 12:21:59 2018 +0200

    ghash: Fix poor performance with densely populated keyspaces
    
    Sequential integers would be densely packed in the table, leaving the
    high-index buckets unused and causing abnormally long probes for many
    operations. This was especially noticeable with failed lookups and
    when "aging" the table by repeatedly inserting and removing integers
    from a narrow range using g_direct_hash() as the hashing function.
    
    The solution is to multiply the hash by a small prime before applying
    the modulo. The compiler optimizes this to a few left shifts and adds, so
    the constant overhead is small, and the entries will be spread out,
    yielding a lower average probe count.

:040000 040000 97ed217496c64783af2455ef39ccd634dfd4bba7 d599c7dcd1c4112d5dba5edb50deb4091eaf6fd9 M	glib

The full commit:

commit 0dee62973cd0e269c0d7dd194d6bdc0cd4a732f9 (HEAD, refs/bisect/bad)
Author: Hans Petter Jansson <hpj@cl.no>
Date:   Tue Jul 10 12:21:59 2018 +0200

    ghash: Fix poor performance with densely populated keyspaces
    
    Sequential integers would be densely packed in the table, leaving the
    high-index buckets unused and causing abnormally long probes for many
    operations. This was especially noticeable with failed lookups and
    when "aging" the table by repeatedly inserting and removing integers
    from a narrow range using g_direct_hash() as the hashing function.
    
    The solution is to multiply the hash by a small prime before applying
    the modulo. The compiler optimizes this to a few left shifts and adds, so
    the constant overhead is small, and the entries will be spread out,
    yielding a lower average probe count.

diff --git a/glib/ghash.c b/glib/ghash.c
index 18c7185ef..888e67e00 100644
--- a/glib/ghash.c
+++ b/glib/ghash.c
@@ -334,6 +334,16 @@ g_hash_table_set_shift_from_size (GHashTable *hash_table, gint size)
   g_hash_table_set_shift (hash_table, shift);
 }
 
+static inline guint
+g_hash_table_hash_to_index (GHashTable *hash_table, guint hash)
+{
+  /* Multiply the hash by a small prime before applying the modulo. This
+   * prevents the table from becoming densely packed, even with a poor hash
+   * function. A densely packed table would have poor performance on
+   * workloads with many failed lookups or a high degree of churn. */
+  return (hash * 11) % hash_table->mod;
+}
+
 /*
  * g_hash_table_lookup_node:
  * @hash_table: our #GHashTable
@@ -382,7 +392,7 @@ g_hash_table_lookup_node (GHashTable    *hash_table,
 
   *hash_return = hash_value;
 
-  node_index = hash_value % hash_table->mod;
+  node_index = g_hash_table_hash_to_index (hash_table, hash_value);
   node_hash = hash_table->hashes[node_index];
 
   while (!HASH_IS_UNUSED (node_hash))
@@ -602,7 +612,7 @@ g_hash_table_resize (GHashTable *hash_table)
       if (!HASH_IS_REAL (node_hash))
         continue;
 
-      hash_val = node_hash % hash_table->mod;
+      hash_val = g_hash_table_hash_to_index (hash_table, node_hash);
 
       while (!HASH_IS_UNUSED (new_hashes[hash_val]))
         {

I also validated that 2.62.5 and 2.63.5 are still broken.

@pkern
Copy link
Contributor

pkern commented Feb 23, 2020

That being said I am still unsure where the disambiguation of the user ID happens in a way that the hash table behavior matters.

@jscott0
Copy link

jscott0 commented Jun 2, 2020

I checked that GLib 2.64.3 is also still broken.

SunilMohanAdapa added a commit to SunilMohanAdapa/libinfinity that referenced this issue Jun 11, 2020
This is primarily to help test cases which assume that the adopted algorithm
prioritizes the users in the exact order they appear in the test cases (and get
inserted into the session in that order). With older glib version, the five
users being inserted into the user table hash table by the tests happened to
retain the order. With latest glib, due to a minor tweak in hashing strategy,
the order of iteration is different from the order of insertion failing the
tests.

In addition, GHashTable makes no guarantees about the stability of items when
iterating multiple times. Since the algorithm is sensitive to order of users, it
is best to return users in an order that is consistent over multiple calls and
stable over insert/remove operations.

This patch sorts the keys during iteration of users in the user table.

Closes: gobby#22.

Signed-off-by: Sunil Mohan Adapa <sunil@medhas.org>
SunilMohanAdapa added a commit to SunilMohanAdapa/libinfinity that referenced this issue Jun 11, 2020
This is primarily to help test cases which assume that the adopted algorithm
prioritizes the users in the exact order they appear in the test cases (and get
inserted into the session in that order). With older glib version, the five
users being inserted into the user table hash table by the tests happened to
retain the order. With latest glib, due to a minor tweak in hashing strategy,
the order of iteration is different from the order of insertion failing the
tests.

In addition, GHashTable makes no guarantees about the stability of items when
iterating multiple times. Since the algorithm is sensitive to order of users, it
is best to return users in an order that is consistent over multiple calls and
stable over insert/remove operations.

This patch sorts the keys during iteration of users in the user table.

Closes: gobby#22.

Signed-off-by: Sunil Mohan Adapa <sunil@medhas.org>
SunilMohanAdapa added a commit to SunilMohanAdapa/libinfinity that referenced this issue Jun 11, 2020
This is primarily to help test cases which assume that the adopted algorithm
prioritizes the users in the exact reverse order they appear in the test
cases (and get inserted into the session in reverse order). With older glib
version, the five users being inserted happened to return the order expected by
the tests. With latest glib, due to a minor tweak in hashing strategy, the
insertion leads to unsorted list leading to failed tests.

In addition, GHashTable makes no guarantees about the stability of items when
iterating multiple times. Since the algorithm is sensitive to order of users, it
is best to return users in an order that is consistent over multiple calls and
stable over insert/remove operations.

This patch maintains a sorted list of user ids and uses it for iteration.

Closes: gobby#22.

Signed-off-by: Sunil Mohan Adapa <sunil@medhas.org>
SunilMohanAdapa added a commit to SunilMohanAdapa/libinfinity that referenced this issue Jun 11, 2020
This is primarily to help test cases which assume that the adopted algorithm
prioritizes the users in the exact reverse order they appear in the test
cases (and get inserted into the session in reverse order). With older glib
version, the five users being inserted happened to return the order expected by
the tests. With latest glib, due to a minor tweak in hashing strategy, the
insertion leads to unsorted list leading to failed tests.

In addition, GHashTable makes no guarantees about the stability of items when
iterating multiple times. Since the algorithm is sensitive to order of users, it
is best to return users in an order that is consistent over multiple calls and
stable over insert/remove operations.

This patch sorts the keys during iteration of users in the user table.

Closes: gobby#22.

Signed-off-by: Sunil Mohan Adapa <sunil@medhas.org>
@SunilMohanAdapa
Copy link
Contributor

Apparently, all the tests list users in the sorted order 1 to 5 and expect the algorithm to give priority accordingly when expecting the output. When the values are inserted into the user table, with earlier glib, the iteration order was always 1 to 5 (by chance) even though they were inserted 5 to 1. GHashTable makes no such promises. In the newer glib due to a minor tweak to the hash algorithm, this order is disrupted and the tests started to fail.

Further, GHashTable also does not make any guarantees about consistent ordering when iterating multiple times.

A possible solution seems to be to iterate the user table hash table in sorted order. I proposed fixes with two different approaches in #24 and #25.

Please consider merging a fix soon and making a release. I would love to see infinoted return to Debian testing (and FreedomBox testing).

@pabs3
Copy link
Author

pabs3 commented Jul 7, 2020

infinoted has reached testing again as @pkern has imported the patches.

This issue still needs an upstream release though.

@pabs3
Copy link
Author

pabs3 commented Jul 7, 2020

@pkern: you appear to have commit access (or had in 2017), would you mind merging the fixes?

@pkern pkern closed this as completed in bd546d0 Aug 16, 2020
@pabs3
Copy link
Author

pabs3 commented Sep 6, 2020

@aburgm @pkern: could we get a new release so that non-Debian distros can get the fixes?

Also #24 and #25 should get closed since the commits are now in the master branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants