Introduce custom hash table data structures. #3940
base: trunk
Conversation
I wanted to give feedback on map.h early, before you update set.h
Thanks, definitely useful to get map.h tidied up before I port it to set.h. I think I've responded to all the comments there.
Sending what comments I have so far.
Thanks for all the feedback so far, PTAL.
Finished a pass through raw_hashtable.h
PTAL!
Beyond inline replies, I was also able to simplify the specialization strategy for the table type and improved some other comments I spotted as I was working.
Also, I think this is far enough along in review that I'm moving it out of draft.
The hash table design is heavily based on Abseil's ["Swiss Tables"][swiss-tables] design. It uses an array of bytes storing metadata about each entry, and an array of entries where each entry is a pair of key and value. The metadata byte consists of 7 bits of the key's hash (distinct from the bits used to index the table), and one bit indicating the presence of a special entry -- either empty or deleted.

[swiss-tables]: https://abseil.io/about/design/swisstables

See the comments in `raw_hashtable.h` for a detailed overview of the design.

Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
So, I'm most of the way through adding really nice support for handling stateful keys, like indices into vectors and such. It turned out to be necessary in order to even move the toolchain over to these data structures, as it also lets us use something other than
Wanted to both let you know @josh11b about my progress there, and also ask whether you'd like me to merge this into this code review or keep it as a follow-up. I can easily manage either way.
I'm fine either way. If it simplifies the files I haven't reviewed yet, it would be a benefit.
This allows customizing both the hash and equality comparison. It also does so in a way that enables stateful key contexts, such as indices into a specific vector. This works by requiring the context to be passed to every API accepting a key. When the context is stateless and can be default constructed, this works via a default argument. When it is stateful, the argument must be provided explicitly.

This allows us to add a stress test that forces colliding hashes in egregious ways, as well as allowing the stateful hashing. There are also a number of basic cleanups to the code that were uncovered while working on this.
After discussion, it seems a bit borderline, with pros and cons, but on the whole it probably simplifies the review. Added and pushed. I did make it a clean follow-up commit so that it is easy to see the diff:
// trailing zero bits, AArch64 only supports counting the leading zero
// bits and requires a bit-reverse to count the trailing zero bits. Doing
// the byte-reverse approach essentially combines moving the high bit into
// the low bit and the reverse necessary for counting the zero bits.
Can you say how valuable this optimization is?
Added some context on why it is important. Let me know if you want me to try to repeat the measurements to compute the exact % difference I can see on the M1.
template <int N>
auto Test() const -> bool {
  return mask_ & (static_cast<MaskT>(1) << N);
}
What is this function for? Seems like it should do something different when ByteEncoding is true.
This was stale code from a past experiment, sorry. Removed.
EntryT* local_entries = this->entries();
const uint8_t* local_arg_metadata = arg.metadata();
const EntryT* local_arg_entries = arg.entries();
memcpy(local_metadata, local_arg_metadata, local_size);
Say that we are preserving which slot each entry is in, along with all the tombstones, instead of rehashing just the present values.
Done.
common/raw_hashtable.h
Outdated
KeyContextT key_context = KeyContextT())
    -> std::pair<EntryT*, bool>;

// Looks up the entry in the hashtable, and if found destroys the entry and
// returns `true`. If not found, returns `false`.
//
// Does not release any memory, just leaves a tombstone behind so this entry
// cannot be found and the slot can in theory be re-used.
template <typename LookupKeyT>
auto EraseImpl(LookupKeyT lookup_key, KeyContextT key_context = KeyContextT())
Any reason the raw versions of these operations should have default values? Seems like that would just invite accidents where the key context isn't passed through.
Doh, I forgot to remove them. I had them initially to test in phases as I introduced things, and meant to go back. Thanks for catching this.
auto MatchEmpty() const -> MatchIndex;
auto MatchDeleted() const -> MatchIndex;
Shouldn't these return MatchRange too, like Match() and MatchPresent()? Or do they only return the first match? If the latter, please add comments to document.
As discussed a bit, no, this is an intentionally weaker API. It opens some doors to different implementation strategies. I've tried to add comments here (and up in BitIndex and BitIndexRange) to help clarify what's going on here.
// They directly verify their results to help establish baseline
// functionality.
Shouldn't that only be in debug mode?
True, commented.
static auto CompareEqual(MetadataGroup lhs, MetadataGroup rhs) -> bool;

// Directly verify the provided masks that will be used to build a
// `MatchIndex` or `MatchGroup`. These either `CHECK`-fail or return true.
Could you explain what these functions are verifying and/or what the parameters are?
MatchMask index_mask,
llvm::function_ref<auto(uint8_t byte)->bool> byte_match) const -> bool {
  for (ssize_t byte_index : llvm::seq<ssize_t>(0, Size)) {
    if constexpr (Size > 8) {
Why is Size > 8 significant?
It's where we switch from byte-encoding of the index to bit-encoding.
  return UseSIMD ? simd_result : portable_result;
}

inline auto MetadataGroup::VerifyIndexMask(
I will need to review the Verify* functions again once I know what they are supposed to do.
Your suggested comments above are correct.
inline auto MetadataGroup::PortableClearDeleted() -> void {
  for (uint64_t& byte_int : byte_ints) {
    byte_int &= (~LSBs | byte_int >> 7);
Magic enough to merit an explanation.
Yeah. It took a long explanation.
auto ClearDeleted() -> void;

auto Match(uint8_t present_byte) const -> MatchRange;
Should the high bit be set in present_byte?
No, added a detailed comment to help there. (We check this in the implementation.)
Finished a pass on raw_hashtable_metadata_group.h
// present matches which only require testing 7 bits and have a particular
// layout.

// Set the high bit of every byte to `1`. The match byte always has this bit
I'm not sure what you mean by "match" byte. Do you mean each byte of "pattern"?
Rewrote the comment a bunch to clarify.
// MSBs of each byte. We structure this as a multiply and an add because we
// know that the add cannot carry, and this way it can be lowered using
// combined multiply-add instructions if available.
uint64_t pattern = LSBs * present_byte + MSBs;
Perhaps this should be called the "query"? or "broadcast_query"? or "query_repeated"? I'm not sure why this is called "pattern".
🤷 it's the pattern we use for the parallel search. I can rename, though... query seems a bit wrong. Is broadcast ok?
// Set the high bit of every byte to `1`. The match byte always has this bit
// set as well, which ensures the xor below, in addition to zeroing the byte
// that matches, also clears the high bit of every byte.
uint64_t mask = byte_ints[0] | MSBs;
In what sense is this a "mask"? I would associate that term with something used to & with something else. It looks like this ultimately becomes "matches", so perhaps that would be a better name?
FWIW, I have a slightly broader definition of "mask": more "a collection of bits with significant positions" rather than a numeric quantity or other "integer".
The reason for this being called "mask" is that it eventually becomes the mask input to the MatchRange. I'm not sure it ever becomes "matches"?
I can call this match_mask, I guess, but it didn't seem like adding that prefix was very helpful in the context of this large function.
auto match_byte_vec = vdup_n_u8(present_byte | PresentMask);
auto match_byte_cmp_vec = vceq_u8(byte_vec, match_byte_vec);
uint64_t mask = vreinterpret_u64_u8(match_byte_cmp_vec)[0];
Could you comment what these do, for people who don't know all the SIMD functions?
I'd rather avoid duplicating intrinsic documentation here. I can try to add some links to the documentation, although I worry about them becoming stale.
auto match_byte_vec = vdup_n_u8(present_byte | PresentMask);
auto match_byte_cmp_vec = vceq_u8(byte_vec, match_byte_vec);
uint64_t mask = vreinterpret_u64_u8(match_byte_cmp_vec)[0];
// Mask in scalar to fold with testing for zero.
I didn't understand what you meant by this, other than there is some advantage to switching to scalar operations over SIMD for this step.
Wrote more details.
#elif CARBON_X86_SIMD_SUPPORT
// We arrange the byte vector for present bytes so that we can directly
// extract it as a mask.
result = MatchRange(_mm_movemask_epi8(byte_vec));
Not sure what this operation does, and so why you don't need to & MSBs like the other code paths.
This produces a bit-encoded mask instead of a byte-encoded mask. This is also probably the source of me calling the backing integer storage for BitIndex and BitIndexRange a mask of some kind -- that's how x86's SIMD instructions refer to them.
// Mask in scalar to fold with testing for zero.
result = MatchRange(mask & MSBs);
#elif CARBON_X86_SIMD_SUPPORT
result = X86SIMDMatch(present_byte | PresentMask);
Any reason this is a separate function rather than inlined here?
Because it has subtly different semantics from what Match needs. I wanted to keep SIMDMatch and PortableMatch to exactly the same contract as Match. But for PortableMatch, we specifically don't want to allow Empty or Deleted. On x86, though, we have a common implementation we want to use for all three cases.
  return result;
}

inline auto MetadataGroup::SIMDMatchEmpty() const -> MatchIndex {
Please add and update comments on these last functions too.
I've propagated some, but not sure if I got everything you were looking for here.
// An index encoded as low zero bits ending in (at least) one set high bit. The
// index can be extracted by counting the low zero bits. Its presence can be
// tested directly, however, by checking for any set bits. The underlying type
// to be used is provided as `MaskT`, which must be an unsigned integer type.
I think moving away from "Mask" terminology for this type and these values would be clearer. I think these are much more like "match (bit) sets."
I suspect many of the comments about set.h also apply to map.h
auto Lookup(LookupKeyT lookup_key,
            KeyContextT key_context = KeyContextT()) const -> LookupResult;

// Run the provided callback for every key in the set.
What is the signature of this callback? Less critical here than for MapView.
Added a constraint that checks the signature -- is that enough? It's not the most readable, but seems a bit more semantically unambiguous.
auto operator[](LookupKeyT lookup_key) const
    -> ValueT* requires(std::default_initializable<KeyContextT>);

// Run the provided callback for every key and value in the map.
Please write the signature for the callback.
Added a constraint that checks the signature -- is that enough? It's not the most readable, but seems a bit more semantically unambiguous.
common/set.h
Outdated
template <typename CallbackT>
void ForEach(CallbackT callback);

// Count the probed keys. This routine is purely informational and for use in
"Count the probed keys" doesn't provide a lot of value beyond the name of the function "CountProbedKeys". It would be more valuable to say this function is slow or to give additional clarity about what is returned.
Tried to improve the comment.
// by using `MapView<const T, const V>`, and we enable conversions to more-const
// views. This mirrors the semantics of views like `std::span`.
template <typename InputKeyT, typename InputValueT,
          typename InputKeyContextT = DefaultKeyContext>
Should document this in this file and in set.h
Doh, done!
common/set.h
Outdated
// This data structure optimizes heavily for small key types that are cheap to
// move and even copy. Using types with large keys or expensive to copy keys may
// create surprising performance bottlenecks. A `std::string` key should be fine
// with largely small strings, but if some or many strings are large heap
"largely" seems strange to describe something small
// with largely small strings, but if some or many strings are large heap | |
// with generally small strings, but if some or many strings are large heap |
Done.
Read through set_test.cpp and started on map_test.cpp
auto MakeElements(RangeT&& range, RangeTs&&... ranges) {
  std::vector<typename RangeT::value_type> elements;
  auto add_range = [&elements](RangeT&& r) {
    for (const auto&& e : r) {
      elements.push_back(e);
    }
  };
  add_range(std::forward<RangeT>(range));
  (add_range(std::forward<RangeT>(ranges)), ...);
Looks like your C++ variadics code has been influenced by Carbon's variadics.
=]
  ExpectSetElementsAre(other_s2, MakeElements(llvm::seq(1, 32)));
}

TYPED_TEST(SetTest, Move) {
Is this test done? I don't see anything testing move.
Doh, forgot to finish it. Done now.
static constexpr uint8_t PresentMask = 0b1000'0000;

static constexpr bool FastByteClear = Size == 8;
Please write a comment to explain the situation with what makes this case fast.
Good call, this is a weird case. Wrote it up. Hopefully it makes some sense, but may also need some fine tuning.
static auto Load(const uint8_t* metadata, ssize_t index) -> MetadataGroup;
auto Store(uint8_t* metadata, ssize_t index) const -> void;

auto ClearByte(ssize_t byte_index) -> void;
Please explain the requirement that FastByteClear is true.
Done.
I've now looked at everything except for files with names containing "benchmark"
common/map_test.cpp
Outdated
EXPECT_TRUE(mv.Contains(1));
EXPECT_TRUE(cmv.Contains(2));
EXPECT_TRUE(cmv2.Contains(3));
EXPECT_TRUE(cmv3.Contains(4));
Also validate that these keys have the expected values?
Done.
common/map_test.cpp
Outdated
EXPECT_FALSE(m.Update(2, 201).is_inserted());
EXPECT_FALSE(m.Update(4, 401).is_inserted());

// Now fill up the first control group.
What does this comment mean? Is "control group" defined anywhere?
Stale comment -- the original term for the metadata group was "control group", dating back to the original SwissTable design. Updated the comment.
// Fill the map through a couple of growth steps, verifying at each step. Note
// that because this is a collision test, we synthesize actively harmful
// hashes in terms of collisions and so this test is essentially quadratic. We
// need to keep it relatively small.
I'd be interested to see this test do some erasing and reinserting.
Sure, done. I've only done this for Map because it's a lot of code and doesn't seem to add value duplicated. We can also test everything a bit more nicely in map because we can track updates to keys with new values.
common/map_test.cpp
Outdated
for (int j : llvm::seq(1, i)) {
  SCOPED_TRACE(llvm::formatv("Assert key: {0}", j).str());
  ASSERT_EQ(j * 100, *m[j]);
}
Why this instead of ExpectMapElementsAre() like you do below?
Eh, this is code that was copied around a bit, and I think it originally predated the helper. Sure, switched to just use the helper here.
Map<ssize_t, int, 0, IndexKeyContext<TestData>> m;

EXPECT_FALSE(m.Contains(42, key_context));
EXPECT_TRUE(m.Insert(1, 100, key_context).is_inserted());
Using * 100 for both the key and the value here -- perhaps choose a different multiplier for the keys?
Sure, let's go with 100000 for keys. Mostly want them to be easy to see.
Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
I think I've responded to all of the comments. Tried to make all the improvements I could based on them, and huge thanks for all the suggested comments and help crafting better documentation throughout!
auto operator*() -> ssize_t& {
  CARBON_DCHECK(mask_ != 0) << "Cannot get an index from a zero mask!";
  __builtin_assume(mask_ != 0);
  index_ = BitIndexT(mask_).index();
(as discussed) we return it on the next line, and we have to return a reference as this is an iterator, so we need to store the index in this iterator to provide the target of that reference. I'll add a comment.
TODO: document the design and PR more fully