Introduce custom hash table data structures. #3940
base: trunk
Conversation
I wanted to give feedback on map.h early, before you update set.h
Thanks, definitely useful to get map.h tidied up before I port it to set.h. I think I've responded to all the comments there.
Sending what comments I have so far.
Thanks for all the feedback so far, PTAL.
Finished a pass through raw_hashtable.h
PTAL!
Beyond inline replies, I was also able to simplify the specialization strategy for the table type and improved some other comments I spotted as I was working.
Also, I think this is far enough along in review that I'm moving it out of draft.
The hash table design is heavily based on Abseil's ["Swiss Tables"][swiss-tables] design. It uses an array of bytes storing metadata about each entry, and an array of entries where each entry is a pair of key and value. The metadata byte consists of 7 bits of the key's hash (distinct from the bits used to index the table), and one bit indicating the presence of a special entry -- either empty or deleted.

[swiss-tables]: https://abseil.io/about/design/swisstables

See the comments in `raw_hashtable.h` for a detailed overview of the design.

Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
So, I'm most of the way through adding really nice support for handling stateful keys, like indices into vectors and such. It turned out to be necessary in order to even move the toolchain over to these data structures, as it also lets us use something other than
Wanted to both let you know @josh11b about my progress there, and also ask whether you'd like me to merge this into this code review or keep it as a follow-up. I can easily manage either way.
I'm fine either way. If it simplifies the files I haven't reviewed yet, it would be a benefit.
This allows customizing both the hash and equality comparison. It also does so in a way that enables stateful key contexts, such as indices into a specific vector. This works by requiring the context to be passed to every API accepting a key. When the context is stateless and can be default constructed, this works via a default argument. When it is stateful, the argument must be provided explicitly.

This allows us to add a stress test that forces colliding hashes in egregious ways, as well as allowing the stateful hashing. There are also a number of basic cleanups to the code that were uncovered while working on this.
After discussion, it seems a bit borderline, with pros and cons, but on the whole it probably simplifies the review. Added and pushed. I did make it a clean follow-up commit so that it is easy to see the diff:
// trailing zero bits, AArch64 only supports counting the leading zero
// bits and requires a bit-reverse to count the trailing zero bits. Doing
// the byte-reverse approach essentially combines moving the high bit into
// the low bit and the reverse necessary for counting the zero bits.
Can you say how valuable this optimization is?
Added some context on why it is important. Let me know if you want me to try to repeat the measurements to compute the exact % difference I can see on the M1.
template <int N>
auto Test() const -> bool {
  return mask_ & (static_cast<MaskT>(1) << N);
}
What is this function for? Seems like it should do something different when ByteEncoding is true.
This was stale code from a past experiment, sorry. Removed.
EntryT* local_entries = this->entries();
const uint8_t* local_arg_metadata = arg.metadata();
const EntryT* local_arg_entries = arg.entries();
memcpy(local_metadata, local_arg_metadata, local_size);
Say that we are preserving which slot each entry is in, along with all the tombstones, instead of rehashing just the present values.
Done.
common/raw_hashtable.h
Outdated
KeyContextT key_context = KeyContextT())
    -> std::pair<EntryT*, bool>;

// Looks up the entry in the hashtable, and if found destroys the entry and
// returns `true`. If not found, returns `false`.
//
// Does not release any memory, just leaves a tombstone behind so this entry
// cannot be found and the slot can in theory be re-used.
template <typename LookupKeyT>
auto EraseImpl(LookupKeyT lookup_key, KeyContextT key_context = KeyContextT())
Any reason the raw versions of these operations should have default values? Seems like that would just invite accidents where the key context isn't passed through.
Doh, I forgot to remove them. I had them initially to test in phases as I introduced things, and meant to go back. Thanks for catching this.
auto MatchEmpty() const -> MatchIndex;
auto MatchDeleted() const -> MatchIndex;
Shouldn't these return MatchRange too, like Match() and MatchPresent()? Or do they only return the first match? If the latter, please add comments to document.
As discussed a bit, no, this is an intentionally weaker API. It opens some doors to different implementation strategies. I've tried to add comments here (and up in BitIndex and BitIndexRange) to help clarify what's going on here.
// They directly verify their results to help establish baseline
// functionality.
Shouldn't that only be in debug mode?
True, commented.
static auto CompareEqual(MetadataGroup lhs, MetadataGroup rhs) -> bool;

// Directly verify the provided masks that will be used to build a
// `MatchIndex` or `MatchGroup`. These either `CHECK`-fail or return true.
Could you explain what these functions are verifying and/or what the parameters are?
MatchMask index_mask,
llvm::function_ref<auto(uint8_t byte)->bool> byte_match) const -> bool {
  for (ssize_t byte_index : llvm::seq<ssize_t>(0, Size)) {
    if constexpr (Size > 8) {
Why is Size > 8 significant?
It's where we switch from byte-encoding of the index to bit-encoding.
  return UseSIMD ? simd_result : portable_result;
}

inline auto MetadataGroup::VerifyIndexMask(
I will need to review the Verify* functions again once I know what they are supposed to do.
Your suggested comments above are correct.
inline auto MetadataGroup::PortableClearDeleted() -> void {
  for (uint64_t& byte_int : byte_ints) {
    byte_int &= (~LSBs | byte_int >> 7);
Magic enough to merit an explanation.
Yeah. It took a long explanation.
auto ClearDeleted() -> void;

auto Match(uint8_t present_byte) const -> MatchRange;
Should the high bit be set in present_byte?
No, added a detailed comment to help there. (We check this in the implementation.)
Finished a pass on raw_hashtable_metadata_group.h
// present matches which only require testing 7 bits and have a particular
// layout.

// Set the high bit of every byte to `1`. The match byte always has this bit
I'm not sure what you mean by "match" byte. Do you mean each byte of "pattern"?
Rewrote the comment a bunch to clarify.
// MSBs of each byte. We structure this as a multiply and an add because we
// know that the add cannot carry, and this way it can be lowered using
// combined multiply-add instructions if available.
uint64_t pattern = LSBs * present_byte + MSBs;
Perhaps this should be called the "query"? or "broadcast_query"? or "query_repeated"? I'm not sure why this is called "pattern".
🤷 it's the pattern we use for the parallel search. I can rename, though... query seems a bit wrong. Is broadcast ok?
// Set the high bit of every byte to `1`. The match byte always has this bit
// set as well, which ensures the xor below, in addition to zeroing the byte
// that matches, also clears the high bit of every byte.
uint64_t mask = byte_ints[0] | MSBs;
In what sense is this a "mask"? I would associate that term with something used to & with something else. It looks like this ultimately becomes "matches", so perhaps that would be a better name?
FWIW, I have a slightly broader definition of "mask": more "a collection of bits with significant positions" rather than a numeric quantity or other "integer".
The reason for this being called "mask" is that it eventually becomes the mask input to the MatchRange. I'm not sure it ever becomes "matches"?
I can call this match_mask, I guess, but it didn't seem like adding that prefix was very helpful in the context of this large function.
auto match_byte_vec = vdup_n_u8(present_byte | PresentMask);
auto match_byte_cmp_vec = vceq_u8(byte_vec, match_byte_vec);
uint64_t mask = vreinterpret_u64_u8(match_byte_cmp_vec)[0];
Could you comment what these do, for people who don't know all the SIMD functions?
I'd rather avoid duplicating intrinsic documentation here. I can try to add some links to the documentation, although I worry about them becoming stale.
auto match_byte_vec = vdup_n_u8(present_byte | PresentMask);
auto match_byte_cmp_vec = vceq_u8(byte_vec, match_byte_vec);
uint64_t mask = vreinterpret_u64_u8(match_byte_cmp_vec)[0];
// Mask in scalar to fold with testing for zero.
I didn't understand what you meant by this, other than there is some advantage to switching to scalar operations over SIMD for this step.
Wrote more details.
#elif CARBON_X86_SIMD_SUPPORT
// We arrange the byte vector for present bytes so that we can directly
// extract it as a mask.
result = MatchRange(_mm_movemask_epi8(byte_vec));
Not sure what this operation does, and so why you don't need to & MSBs like the other code paths.
This produces a bit-encoded mask instead of a byte-encoded mask. This is also probably the source of me calling the backing integer storage for BitIndex and BitIndexRange a mask of some kind -- that's how x86's SIMD instructions refer to them.
// Mask in scalar to fold with testing for zero.
result = MatchRange(mask & MSBs);
#elif CARBON_X86_SIMD_SUPPORT
result = X86SIMDMatch(present_byte | PresentMask);
Any reason this is a separate function rather than inlined here?
Because it has subtly different semantics from what Match needs. I wanted to keep SIMDMatch and PortableMatch to exactly the same contract as Match. But for PortableMatch, we specifically don't want to allow Empty or Deleted. On x86, though, we have a common implementation we want to use for all three cases.
  return result;
}

inline auto MetadataGroup::SIMDMatchEmpty() const -> MatchIndex {
Please add and update comments on these last functions too.
I've propagated some, but not sure if I got everything you were looking for here.
// An index encoded as low zero bits ending in (at least) one set high bit. The
// index can be extracted by counting the low zero bits. Its presence can be
// tested directly, however, by checking for any set bits. The underlying type
// to be used is provided as `MaskT`, which must be an unsigned integer type.
I think moving away from "Mask" terminology for this type and these values would be clearer. I think these are much more like "match (bit) sets."
I suspect many of the comments about set.h also apply to map.h
auto Lookup(LookupKeyT lookup_key,
            KeyContextT key_context = KeyContextT()) const -> LookupResult;

// Run the provided callback for every key in the set.
What is the signature of this callback? Less critical here than for MapView.
Added a constraint that checks the signature -- is that enough? It's not the most readable, but seems a bit more semantically unambiguous.
auto operator[](LookupKeyT lookup_key) const
    -> ValueT* requires(std::default_initializable<KeyContextT>);

// Run the provided callback for every key and value in the map.
Please write the signature for the callback.
Added a constraint that checks the signature -- is that enough? It's not the most readable, but seems a bit more semantically unambiguous.
common/set.h
Outdated
template <typename CallbackT>
void ForEach(CallbackT callback);

// Count the probed keys. This routine is purely informational and for use in
"Count the probed keys" doesn't provide a lot of value beyond the name of the function "CountProbedKeys". It would be more valuable to say this function is slow or to give additional clarity about what is returned.
Tried to improve the comment.
// by using `MapView<const T, const V>`, and we enable conversions to more-const
// views. This mirrors the semantics of views like `std::span`.
template <typename InputKeyT, typename InputValueT,
          typename InputKeyContextT = DefaultKeyContext>
Should document this in this file and in set.h
Doh, done!
common/set.h
Outdated
// This data structure optimizes heavily for small key types that are cheap to
// move and even copy. Using types with large keys or expensive to copy keys may
// create surprising performance bottlenecks. A `std::string` key should be fine
// with largely small strings, but if some or many strings are large heap
"largely" seems strange to describe something small
// with largely small strings, but if some or many strings are large heap | |
// with generally small strings, but if some or many strings are large heap |
Done.
Read through set_test.cpp and started on map_test.cpp
auto MakeElements(RangeT&& range, RangeTs&&... ranges) {
  std::vector<typename RangeT::value_type> elements;
  auto add_range = [&elements](RangeT&& r) {
    for (const auto&& e : r) {
      elements.push_back(e);
    }
  };
  add_range(std::forward<RangeT>(range));
  (add_range(std::forward<RangeT>(ranges)), ...);
Looks like your C++ variadics code has been influenced by Carbon's variadics.
=]
  ExpectSetElementsAre(other_s2, MakeElements(llvm::seq(1, 32)));
}

TYPED_TEST(SetTest, Move) {
Is this test done? I don't see anything testing move.
Doh, forgot to finish it. Done now.
static constexpr uint8_t PresentMask = 0b1000'0000;

static constexpr bool FastByteClear = Size == 8;
Please write a comment to explain the situation with what makes this case fast.
Good call, this is a weird case. Wrote it up. Hopefully it makes some sense, but may also need some fine tuning.
static auto Load(const uint8_t* metadata, ssize_t index) -> MetadataGroup;
auto Store(uint8_t* metadata, ssize_t index) const -> void;

auto ClearByte(ssize_t byte_index) -> void;
Please explain the requirement that FastByteClear is true.
Done.
I've now looked at everything except for files with names containing "benchmark"
common/map_test.cpp
Outdated
EXPECT_TRUE(mv.Contains(1));
EXPECT_TRUE(cmv.Contains(2));
EXPECT_TRUE(cmv2.Contains(3));
EXPECT_TRUE(cmv3.Contains(4));
Also validate that these keys have the expected values?
Done.
common/map_test.cpp
Outdated
EXPECT_FALSE(m.Update(2, 201).is_inserted());
EXPECT_FALSE(m.Update(4, 401).is_inserted());

// Now fill up the first control group.
What does this comment mean? Is "control group" defined anywhere?
Stale comment -- the original term for the metadata group was "control group", dating back to the original SwissTable design. Updated the comment.
// Fill the map through a couple of growth steps, verifying at each step. Note
// that because this is a collision test, we synthesize actively harmful
// hashes in terms of collisions and so this test is essentially quadratic. We
// need to keep it relatively small.
I'd be interested to see this test do some erasing and reinserting.
Sure, done. I've only done this for Map because it's a lot of code and doesn't seem to add value duplicated. We can also test everything a bit more nicely in map because we can track updates to keys with new values.
common/map_test.cpp
Outdated
for (int j : llvm::seq(1, i)) {
  SCOPED_TRACE(llvm::formatv("Assert key: {0}", j).str());
  ASSERT_EQ(j * 100, *m[j]);
}
Why this instead of ExpectMapElementsAre() like you do below?
Eh, this is code that was copied around a bit, and I think it originally predated the helper. Sure, switched to just use the helper here.
Map<ssize_t, int, 0, IndexKeyContext<TestData>> m;

EXPECT_FALSE(m.Contains(42, key_context));
EXPECT_TRUE(m.Insert(1, 100, key_context).is_inserted());
Using * 100 for both the key and the value here -- perhaps choose a different multiplier for the keys?
Sure, let's go with 100000 for keys. Mostly want them to be easy to see.
Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
I think I've responded to all of the comments. Tried to make all the improvements I could based on them, and huge thanks for all the suggested comments and help crafting better documentation throughout!
auto operator*() -> ssize_t& {
  CARBON_DCHECK(mask_ != 0) << "Cannot get an index from a zero mask!";
  __builtin_assume(mask_ != 0);
  index_ = BitIndexT(mask_).index();
(as discussed) we return it on the next line, and we have to return a reference as this is an iterator, so we need to store the index in this iterator to provide the target of that reference. I'll add a comment.
TODO: document the design and PR more fully