Skip to content
This repository has been archived by the owner on Mar 22, 2023. It is now read-only.

Persistent-aware self_relative_ptr and Single-writer-multi-reader (SWMR) skiplist #1174

Draft
wants to merge 41 commits into
base: master
Choose a base branch
from

Conversation

yjrobin
Copy link
Contributor

@yjrobin yjrobin commented Jul 23, 2021

Dear all,

It has been a long time since we talked earlier. I create this PR as the kick-start to discuss with you about how we can contribute to this repo in order to make a new storage engine (based on a single-writer-multi-reader skiplist) for PmemKV.
As we discussed earlier, we both found the traditional persistent_ptr in PMDK is too expensive (16 bytes) and not convenient for atomic operations. So I looked into the experimental self_relative_ptr you mentioned and come up with some idea to modify it to facilitate the implementation of Persistent Compare-and-swap. This PR includes the following components:

1. pa_self_relative_ptr

  • To prevent concurrent read ops to see non-persistent data updated by a concurrent write op in PMem, expensive locks/txns are commonly used for write op, e.g., locks the data object during updates and flush, unlock it after flush is finished, or encapsulate the updates and flush in a PMDK txns.
  • Based on self_relative_ptr, the definition of the stored offset is real_offset-1, so for 8-byte aligned allocation, the lower 3 bits of the stored offset are always 1 (except null_ptr). Therefore, the second lowest bit can be used as the indicator of whether the pa_self_relative_ptr (persistent-aware self_relative_ptr) itself needs explicit flush. If a consequent (atomic) read sees the second lowest bit is 0, it uses CAS to reset the bit to 1 and explicitly performs the flush.
  • Similar test cases to those of self_relative_ptr are created. Maybe more needs to be added after discussion.

2. swmr_skip_list

  • Currently it is almost the same as the existing MWMR concurrent_skip_list (multi-writer-multi-reader skiplist) except the procedure of setting and retrieving the next ptr of a skiplist node.
  • TODO: remove the locks and txns that are not needed for single-writer-multi-reader semantics.

Before I move on to implement the final swmr_skip_list, I'd like to discuss with you about the pa_self_relative_ptr to see if there are any correctness problems. And also how you think the existing MWMR concurrent_skip_list can use pa_self_relative_ptr to remove unnecessary locks and txns in SWMR semantics.

Thanks!

Best rgds,
Jun Yang


This change is Reviewable

Copy link
Contributor

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yjrobin Thank you very much for the patch! I really like the direction of this. My main comment would be to think about reusing existing components in pa_self_relative_ptr and swmr_skip_list. For pa_self_relative_ptr I think it would be easier to implement the flush_needed on top of the existing self_relative_ptr implementation (see comment below). That way it would also be easier to tell if the implementation is correct.

Regarding the skip_list implementation, I think it would be best to have everything in one place (e.g. in concurrent_map/concurrent_skip_list_impl.hpp) and just add some (template) parameters to control the concurrency setting (swmr vs mwmr) or just have separate insert methods for swmr and mwmr modes. Do you think this is possible?

Reviewed 1 of 18 files at r1.
Reviewable status: 1 of 19 files reviewed, 2 unresolved discussions (waiting on @yjrobin)


include/libpmemobj++/experimental/atomic_pa_self_relative_ptr.hpp, line 17 at r3 (raw file):

{
/**
 * Atomic specialization for pa_self_relative_ptr

Does it make sense to have pa_self_relative_ptr on it's own? (without atomic around it)? I'm wondering if we couldn't just use self_relative_ptr and implement the flush_needed logic on top of it in this atomic specialization. Please take a look at include/libpmemobj++/detail/tagged_ptr.hpp. It's quite similar.

Maybe you could even use tagged_ptr directly in atomic<pa_self_relative_ptr> implementation?


include/libpmemobj++/experimental/swmr_skip_list_impl.hpp, line 125 at r3 (raw file):

	next(size_type level)
	{
		assert(level < height());

I think that this kind of logic is pretty generic and this could be implemented in some load() method in the atomic<pa_self_relative_ptr> class perhaps?

@yjrobin
Copy link
Contributor Author

yjrobin commented Jul 26, 2021

@yjrobin Thank you very much for the patch! I really like the direction of this. My main comment would be to think about reusing existing components in pa_self_relative_ptr and swmr_skip_list. For pa_self_relative_ptr I think it would be easier to implement the flush_needed on top of the existing self_relative_ptr implementation (see comment below). That way it would also be easier to tell if the implementation is correct.

This is the right way to do it, I create a new one because I want to show you the idea and logic without changing existing file. I'll try to merge the pa_self_relative_ptr into self_relative_ptr with some extra overloaded APIs.

Regarding the skip_list implementation, I think it would be best to have everything in one place (e.g. in concurrent_map/concurrent_skip_list_impl.hpp) and just add some (template) parameters to control the concurrency setting (swmr vs mwmr) or just have separate insert methods for swmr and mwmr modes. Do you think this is possible?

Reviewed 1 of 18 files at r1.
Reviewable status: 1 of 19 files reviewed, 2 unresolved discussions (waiting on @yjrobin)

This is possible as they are quite similar. I'll merge them.

include/libpmemobj++/experimental/atomic_pa_self_relative_ptr.hpp, line 17 at r3 (raw file):

{
/**
 * Atomic specialization for pa_self_relative_ptr

Does it make sense to have pa_self_relative_ptr on it's own? (without atomic around it)? I'm wondering if we couldn't just use self_relative_ptr and implement the flush_needed logic on top of it in this atomic specialization. Please take a look at include/libpmemobj++/detail/tagged_ptr.hpp. It's quite similar.

Maybe you could even use tagged_ptr directly in atomic<pa_self_relative_ptr> implementation?

If merging pa_self_relative_ptr into self_relative_ptr cannot be done, I'll look into the tagged_ptr.

include/libpmemobj++/experimental/swmr_skip_list_impl.hpp, line 125 at r3 (raw file):

	next(size_type level)
	{
		assert(level < height());

I think that this kind of logic is pretty generic and this could be implemented in some load() method in the atomic<pa_self_relative_ptr> class perhaps?

Implementing this logic in load() may introduce some problems for "const" iterators as load() function generally does not modify anything. My idea is allow some read-only (not the read in the write-after-read) operation such as const iterators to read the those dirty self_relative_ptr without flushing and resetting the flag. How do you think?

Copy link
Contributor

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! In general, I think that having two separate classes: self_relative_ptr and pa_self_relative_ptr might be a good idea. pa_self_relative could provide only methods like store/load/compare_exchange/... which ensure atomicity is the same as visibility. If we have all API in self_relateive_ptr it might be easy to accidentally mix the API by a user.

The flush_neede/dirty flag should be an implementation detail, hidden from the user IMO.

Reviewable status: 1 of 19 files reviewed, 2 unresolved discussions (waiting on @igchor and @yjrobin)


include/libpmemobj++/experimental/swmr_skip_list_impl.hpp, line 125 at r3 (raw file):

Previously, yjrobin wrote…

Implementing this logic in load() may introduce some problems for "const" iterators as load() function generally does not modify anything. My idea is allow some read-only (not the read in the write-after-read) operation such as const iterators to read the those dirty self_relative_ptr without flushing and resetting the flag. How do you think?

Right, the load should not modify anything - but I believe you should still put this logic somewhere into the pa_self_relative_ptr (some separate method). Regarding reading dirty self_relative_ptrs, how would you ensure consistency of the data structure? Maybe for const load() we could have something similar to this implementation: https://github.com/pmem/pmdk/blob/master/src/examples/libpmem2/ringbuf/ringbuf.c#L247

The load method there only persists the value, but does not clear the dirty flag.

Copy link
Contributor

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 19 files reviewed, 3 unresolved discussions (waiting on @igchor and @yjrobin)

a discussion (no related file):

Getting rid of transactions is quite tricky and I would suggest to focus on locks first. We need to use transactions, at least for allocations (make_persistent only works within a transaction). Replacing current implementation with Flush-on-Read approach is not so simple because you need to think about memory leaks. We can discuss this in more detail if you want.

As a first step, I would suggest removing locks and replacing enumerable_thread_specific with a single variable for SWMR case. To remove locks you can just create a dummy version of try_lock_nodes method which just do nothing in SWMR case. For enumerable_thread_specific, you could also create a dummy class, which always returns the same reference in local() method (since we only have one writer, we do not need per-thread storage).


Copy link
Contributor Author

@yjrobin yjrobin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 19 files reviewed, 3 unresolved discussions (waiting on @igchor and @yjrobin)

a discussion (no related file):

Previously, igchor (Igor Chorążewicz) wrote…

Getting rid of transactions is quite tricky and I would suggest to focus on locks first. We need to use transactions, at least for allocations (make_persistent only works within a transaction). Replacing current implementation with Flush-on-Read approach is not so simple because you need to think about memory leaks. We can discuss this in more detail if you want.

As a first step, I would suggest removing locks and replacing enumerable_thread_specific with a single variable for SWMR case. To remove locks you can just create a dummy version of try_lock_nodes method which just do nothing in SWMR case. For enumerable_thread_specific, you could also create a dummy class, which always returns the same reference in local() method (since we only have one writer, we do not need per-thread storage).

Yes, it is not necessary to remove the txn for pmem allocation. I just focus on the internal locks of the SWMR skiplist.
Dummy version of try_lock_nodes method and per-thread class is what I'm doing. Thanks for the suggestion.


Copy link
Contributor

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 19 files reviewed, 3 unresolved discussions (waiting on @igchor and @yjrobin)

a discussion (no related file):

Previously, yjrobin wrote…

Yes, it is not necessary to remove the txn for pmem allocation. I just focus on the internal locks of the SWMR skiplist.
Dummy version of try_lock_nodes method and per-thread class is what I'm doing. Thanks for the suggestion.

@yjrobin just wanted to let you know that I will be out of the office for the next 3 or 4 weeks. If you have any questions please feel free to reach out to lukasz.stolarczuk@intel.com or piotr.balcer@intel.com My team will also review the changes regarding locks and tls once they are ready.

Once I'm back we can discuss how to further optimize the skip list, perhaps using flush-on-read.


…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
yjrobin and others added 10 commits August 24, 2021 10:06
…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
@yjrobin
Copy link
Contributor Author

yjrobin commented Aug 24, 2021

Hi @lukaszstolarczuk and @pbalcer ,

I've updated the code according to the previous discussion with @igchor. He said you can help to review the code when he's on vacation.
Please take a look and let me know if there are any questions.

Currently, there are some problems in the UTs of radix_tree which uses tagged_ptr based on self_relative_ptr. I've tried to modify it accordingly using "self_relative_ptr<void, std::false_type>". It makes the compilation successful but the testcases of radix_tree cannot pass. Could you help me with this?

Thanks a lot.
Jun

Copy link

@karczex karczex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 30 files reviewed, 5 unresolved discussions (waiting on @igchor and @yjrobin)


include/libpmemobj++/detail/self_relative_ptr_base_impl.hpp, line 208 at r5 (raw file):

	}
	void
	set_dirty_flag(bool dirty)

Why flag have to be 0 when set and 1 when not? Wouldn't be logic cleaner otherwise?
This method may be just something like

void
set_dirty_flag(bool dirty)
{
	if(dirty)
		offset |= dirty_flag;
}

include/libpmemobj++/detail/self_relative_ptr_base_impl.hpp, line 249 at r5 (raw file):

		uintptr_t ptr = static_cast<uintptr_t>(
			reinterpret_cast<intptr_t>(this) +
			(other_offset | ~dirty_flag) + 1);

This line cause your problems with radix.
Address, which is checked in this assert

assert(get<P1>() == rhs.get());

is increased by 2 (and probably points to some garbage) when you unconditionally change this bit.

…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
…d parameter.

Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter.
Add some UTs.
@yjrobin
Copy link
Contributor Author

yjrobin commented Aug 27, 2021

Why flag have to be 0 when set and 1 when not? Wouldn't be logic cleaner otherwise?

The definition of the offset in self_relative_ptr (offset=real_offset-1) indicates that for a normal pointer (e.g. T *) with 8-byte aligned allocation, the lower 3 bits of offset should always be 1. Therefore, I use 0 to represent 'dirty', and 1 otherwise.

This line cause your problems with radix.
Address, which is checked in this assert

I've fixed this problem. This is quite interesting. The tagged_ptr is quite similar to persistent-aware self_relative_ptr in the concept of marking the pointer. The difference is tagged_ptr tags the virtual address (the last bit: 1 for tagged, 0 otherwise), where persistent-aware self_relative_ptr tags the offset. This problems with radix caused by the logic of is_dirty(). Previously, the offset is considered dirty when the second lowest bit is 0, for tagged tagged_ptr, it is also true so it is set to 1 wrongly. I fix this by using 'the lower two bits is 01 ' to check if it's dirty.

I also removed the testcase for pa_self_relative_ptr_atomic, only pa_self_relative_ptr_atomic_pmem is tested. This is because persistent-aware self_relative_ptr must be on PMem.

@igchor
Copy link
Contributor

igchor commented Sep 8, 2021

Hi @yjrobin, I have a few follow-up questions regarding the design and implementation. Would you be available for a meeting/call? I think it would be easier to discuss in real time than on github. If you could, please send me an email at igor.chorazewicz@intel.com so we could schedule something.

@codecov
Copy link

codecov bot commented Sep 13, 2021

Codecov Report

Merging #1174 (ec0b540) into master (2ab4604) will decrease coverage by 3.51%.
The diff coverage is 94.38%.

❗ Current head ec0b540 differs from pull request most recent head d561afd. Consider uploading reports for the commit d561afd to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1174      +/-   ##
==========================================
- Coverage   93.65%   90.13%   -3.52%     
==========================================
  Files          53       53              
  Lines        4567     4175     -392     
==========================================
- Hits         4277     3763     -514     
- Misses        290      412     +122     
Flag Coverage Δ
tests_clang_debug_cpp17 ?
tests_gcc_debug 90.13% <94.38%> (-0.30%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
include/libpmemobj++/detail/tagged_ptr.hpp 89.39% <ø> (-10.61%) ⬇️
...clude/libpmemobj++/experimental/concurrent_map.hpp 100.00% <ø> (ø)
...memobj++/experimental/atomic_self_relative_ptr.hpp 82.56% <87.50%> (-8.86%) ⬇️
...j++/container/detail/concurrent_skip_list_impl.hpp 90.15% <100.00%> (-6.54%) ⬇️
...ibpmemobj++/detail/self_relative_ptr_base_impl.hpp 96.07% <100.00%> (-0.85%) ⬇️
...de/libpmemobj++/experimental/self_relative_ptr.hpp 100.00% <100.00%> (ø)
include/libpmemobj++/experimental/swmr_map.hpp 100.00% <100.00%> (ø)
include/libpmemobj++/utils.hpp 75.00% <0.00%> (-25.00%) ⬇️
include/libpmemobj++/pexceptions.hpp 4.54% <0.00%> (-18.67%) ⬇️
include/libpmemobj++/detail/ringbuf.hpp 79.66% <0.00%> (-17.42%) ⬇️
... and 43 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2ab4604...d561afd. Read the comment docs.

@lukaszstolarczuk lukaszstolarczuk marked this pull request as draft September 16, 2022 10:22
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants