Persistent-aware self_relative_ptr and Single-writer-multi-reader (SWMR) skiplist #1174

yjrobin · 2021-07-23T08:19:33Z

Dear all,

It has been a long time since we talked earlier. I create this PR as the kick-start to discuss with you about how we can contribute to this repo in order to make a new storage engine (based on a single-writer-multi-reader skiplist) for PmemKV.
As we discussed earlier, we both found the traditional persistent_ptr in PMDK is too expensive (16 bytes) and not convenient for atomic operations. So I looked into the experimental self_relative_ptr you mentioned and come up with some idea to modify it to facilitate the implementation of Persistent Compare-and-swap. This PR includes the following components:

1. pa_self_relative_ptr

To prevent concurrent read ops to see non-persistent data updated by a concurrent write op in PMem, expensive locks/txns are commonly used for write op, e.g., locks the data object during updates and flush, unlock it after flush is finished, or encapsulate the updates and flush in a PMDK txns.
Based on self_relative_ptr, the definition of the stored offset is real_offset-1, so for 8-byte aligned allocation, the lower 3 bits of the stored offset are always 1 (except null_ptr). Therefore, the second lowest bit can be used as the indicator of whether the pa_self_relative_ptr (persistent-aware self_relative_ptr) itself needs explicit flush. If a consequent (atomic) read sees the second lowest bit is 0, it uses CAS to reset the bit to 1 and explicitly performs the flush.
Similar test cases to those of self_relative_ptr are created. Maybe more needs to be added after discussion.

2. swmr_skip_list

Currently it is almost the same as the existing MWMR concurrent_skip_list (multi-writer-multi-reader skiplist) except the procedure of setting and retrieving the next ptr of a skiplist node.
TODO: remove the locks and txns that are not needed for single-writer-multi-reader semantics.

Before I move on to implement the final swmr_skip_list, I'd like to discuss with you about the pa_self_relative_ptr to see if there are any correctness problems. And also how you think the existing MWMR concurrent_skip_list can use pa_self_relative_ptr to remove unnecessary locks and txns in SWMR semantics.

Thanks!

Best rgds,
Jun Yang

This change is

…_map

… swmr_map Add some related tests

igchor

@yjrobin Thank you very much for the patch! I really like the direction of this. My main comment would be to think about reusing existing components in pa_self_relative_ptr and swmr_skip_list. For pa_self_relative_ptr I think it would be easier to implement the flush_needed on top of the existing self_relative_ptr implementation (see comment below). That way it would also be easier to tell if the implementation is correct.

Regarding the skip_list implementation, I think it would be best to have everything in one place (e.g. in concurrent_map/concurrent_skip_list_impl.hpp) and just add some (template) parameters to control the concurrency setting (swmr vs mwmr) or just have separate insert methods for swmr and mwmr modes. Do you think this is possible?

Reviewed 1 of 18 files at r1.
Reviewable status: 1 of 19 files reviewed, 2 unresolved discussions (waiting on @yjrobin)

include/libpmemobj++/experimental/atomic_pa_self_relative_ptr.hpp, line 17 at r3 (raw file):

{
/**
 * Atomic specialization for pa_self_relative_ptr

Does it make sense to have pa_self_relative_ptr on it's own? (without atomic around it)? I'm wondering if we couldn't just use self_relative_ptr and implement the flush_needed logic on top of it in this atomic specialization. Please take a look at include/libpmemobj++/detail/tagged_ptr.hpp. It's quite similar.

Maybe you could even use tagged_ptr directly in atomic<pa_self_relative_ptr> implementation?

include/libpmemobj++/experimental/swmr_skip_list_impl.hpp, line 125 at r3 (raw file):

	next(size_type level)
	{
		assert(level < height());

I think that this kind of logic is pretty generic and this could be implemented in some load() method in the atomic<pa_self_relative_ptr> class perhaps?

yjrobin · 2021-07-26T01:23:15Z

@yjrobin Thank you very much for the patch! I really like the direction of this. My main comment would be to think about reusing existing components in pa_self_relative_ptr and swmr_skip_list. For pa_self_relative_ptr I think it would be easier to implement the flush_needed on top of the existing self_relative_ptr implementation (see comment below). That way it would also be easier to tell if the implementation is correct.

This is the right way to do it, I create a new one because I want to show you the idea and logic without changing existing file. I'll try to merge the pa_self_relative_ptr into self_relative_ptr with some extra overloaded APIs.

Regarding the skip_list implementation, I think it would be best to have everything in one place (e.g. in concurrent_map/concurrent_skip_list_impl.hpp) and just add some (template) parameters to control the concurrency setting (swmr vs mwmr) or just have separate insert methods for swmr and mwmr modes. Do you think this is possible?

Reviewed 1 of 18 files at r1.
Reviewable status: 1 of 19 files reviewed, 2 unresolved discussions (waiting on @yjrobin)

This is possible as they are quite similar. I'll merge them.

include/libpmemobj++/experimental/atomic_pa_self_relative_ptr.hpp, line 17 at r3 (raw file):
{
/**
 * Atomic specialization for pa_self_relative_ptr
Does it make sense to have pa_self_relative_ptr on it's own? (without atomic around it)? I'm wondering if we couldn't just use self_relative_ptr and implement the flush_needed logic on top of it in this atomic specialization. Please take a look at include/libpmemobj++/detail/tagged_ptr.hpp. It's quite similar.

Maybe you could even use tagged_ptr directly in atomic<pa_self_relative_ptr> implementation?

If merging pa_self_relative_ptr into self_relative_ptr cannot be done, I'll look into the tagged_ptr.

include/libpmemobj++/experimental/swmr_skip_list_impl.hpp, line 125 at r3 (raw file):
	next(size_type level)
	{
		assert(level < height());
I think that this kind of logic is pretty generic and this could be implemented in some load() method in the atomic<pa_self_relative_ptr> class perhaps?

Implementing this logic in load() may introduce some problems for "const" iterators as load() function generally does not modify anything. My idea is allow some read-only (not the read in the write-after-read) operation such as const iterators to read the those dirty self_relative_ptr without flushing and resetting the flag. How do you think?

igchor

Great! In general, I think that having two separate classes: self_relative_ptr and pa_self_relative_ptr might be a good idea. pa_self_relative could provide only methods like store/load/compare_exchange/... which ensure atomicity is the same as visibility. If we have all API in self_relateive_ptr it might be easy to accidentally mix the API by a user.

The flush_neede/dirty flag should be an implementation detail, hidden from the user IMO.

Reviewable status: 1 of 19 files reviewed, 2 unresolved discussions (waiting on @igchor and @yjrobin)

include/libpmemobj++/experimental/swmr_skip_list_impl.hpp, line 125 at r3 (raw file):

Previously, yjrobin wrote…

Implementing this logic in load() may introduce some problems for "const" iterators as load() function generally does not modify anything. My idea is allow some read-only (not the read in the write-after-read) operation such as const iterators to read the those dirty self_relative_ptr without flushing and resetting the flag. How do you think?

Right, the load should not modify anything - but I believe you should still put this logic somewhere into the pa_self_relative_ptr (some separate method). Regarding reading dirty self_relative_ptrs, how would you ensure consistency of the data structure? Maybe for const load() we could have something similar to this implementation: https://github.com/pmem/pmdk/blob/master/src/examples/libpmem2/ringbuf/ringbuf.c#L247

The load method there only persists the value, but does not clear the dirty flag.

igchor

Reviewable status: 1 of 19 files reviewed, 3 unresolved discussions (waiting on @igchor and @yjrobin)

a discussion (no related file):

Getting rid of transactions is quite tricky and I would suggest to focus on locks first. We need to use transactions, at least for allocations (make_persistent only works within a transaction). Replacing current implementation with Flush-on-Read approach is not so simple because you need to think about memory leaks. We can discuss this in more detail if you want.

As a first step, I would suggest removing locks and replacing enumerable_thread_specific with a single variable for SWMR case. To remove locks you can just create a dummy version of try_lock_nodes method which just do nothing in SWMR case. For enumerable_thread_specific, you could also create a dummy class, which always returns the same reference in local() method (since we only have one writer, we do not need per-thread storage).

yjrobin

Reviewable status: 1 of 19 files reviewed, 3 unresolved discussions (waiting on @igchor and @yjrobin)

a discussion (no related file):

Previously, igchor (Igor Chorążewicz) wrote…

Getting rid of transactions is quite tricky and I would suggest to focus on locks first. We need to use transactions, at least for allocations (make_persistent only works within a transaction). Replacing current implementation with Flush-on-Read approach is not so simple because you need to think about memory leaks. We can discuss this in more detail if you want.

As a first step, I would suggest removing locks and replacing enumerable_thread_specific with a single variable for SWMR case. To remove locks you can just create a dummy version of try_lock_nodes method which just do nothing in SWMR case. For enumerable_thread_specific, you could also create a dummy class, which always returns the same reference in local() method (since we only have one writer, we do not need per-thread storage).

Yes, it is not necessary to remove the txn for pmem allocation. I just focus on the internal locks of the SWMR skiplist.
Dummy version of try_lock_nodes method and per-thread class is what I'm doing. Thanks for the suggestion.

igchor

Reviewable status: 1 of 19 files reviewed, 3 unresolved discussions (waiting on @igchor and @yjrobin)

a discussion (no related file):

Previously, yjrobin wrote…

Yes, it is not necessary to remove the txn for pmem allocation. I just focus on the internal locks of the SWMR skiplist.
Dummy version of try_lock_nodes method and per-thread class is what I'm doing. Thanks for the suggestion.

@yjrobin just wanted to let you know that I will be out of the office for the next 3 or 4 weeks. If you have any questions please feel free to reach out to lukasz.stolarczuk@intel.com or piotr.balcer@intel.com My team will also review the changes regarding locks and tls once they are ready.

Once I'm back we can discuss how to further optimize the skip list, perhaps using flush-on-read.

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

yjrobin · 2021-08-24T04:45:29Z

Hi @lukaszstolarczuk and @pbalcer ,

I've updated the code according to the previous discussion with @igchor. He said you can help to review the code when he's on vacation.
Please take a look and let me know if there are any questions.

Currently, there are some problems in the UTs of radix_tree which uses tagged_ptr based on self_relative_ptr. I've tried to modify it accordingly using "self_relative_ptr<void, std::false_type>". It makes the compilation successful but the testcases of radix_tree cannot pass. Could you help me with this?

Thanks a lot.
Jun

karczex

Reviewable status: 0 of 30 files reviewed, 5 unresolved discussions (waiting on @igchor and @yjrobin)

include/libpmemobj++/detail/self_relative_ptr_base_impl.hpp, line 208 at r5 (raw file):

	}
	void
	set_dirty_flag(bool dirty)

Why flag have to be 0 when set and 1 when not? Wouldn't be logic cleaner otherwise?
This method may be just something like

void
set_dirty_flag(bool dirty)
{
	if(dirty)
		offset |= dirty_flag;
}

include/libpmemobj++/detail/self_relative_ptr_base_impl.hpp, line 249 at r5 (raw file):

		uintptr_t ptr = static_cast<uintptr_t>(
			reinterpret_cast<intptr_t>(this) +
			(other_offset | ~dirty_flag) + 1);

This line cause your problems with radix.
Address, which is checked in this assert

libpmemobj-cpp/include/libpmemobj++/detail/tagged_ptr.hpp

Line 62 in 1185a85

assert(get<P1>() == rhs.get());

is increased by 2 (and probably points to some garbage) when you unconditionally change this bit.

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

yjrobin · 2021-08-27T06:50:39Z

Why flag have to be 0 when set and 1 when not? Wouldn't be logic cleaner otherwise?

The definition of the offset in self_relative_ptr (offset=real_offset-1) indicates that for a normal pointer (e.g. T *) with 8-byte aligned allocation, the lower 3 bits of offset should always be 1. Therefore, I use 0 to represent 'dirty', and 1 otherwise.

This line cause your problems with radix.
Address, which is checked in this assert

I've fixed this problem. This is quite interesting. The tagged_ptr is quite similar to persistent-aware self_relative_ptr in the concept of marking the pointer. The difference is tagged_ptr tags the virtual address (the last bit: 1 for tagged, 0 otherwise), where persistent-aware self_relative_ptr tags the offset. This problems with radix caused by the logic of is_dirty(). Previously, the offset is considered dirty when the second lowest bit is 0, for tagged tagged_ptr, it is also true so it is set to 1 wrongly. I fix this by using 'the lower two bits is 01 ' to check if it's dirty.

I also removed the testcase for pa_self_relative_ptr_atomic, only pa_self_relative_ptr_atomic_pmem is tested. This is because persistent-aware self_relative_ptr must be on PMem.

igchor · 2021-09-08T14:09:18Z

Hi @yjrobin, I have a few follow-up questions regarding the design and implementation. Would you be available for a meeting/call? I think it would be easier to discuss in real time than on github. If you could, please send me an email at igor.chorazewicz@intel.com so we could schedule something.

codecov · 2021-09-13T14:50:26Z

Codecov Report

Merging #1174 (ec0b540) into master (2ab4604) will decrease coverage by 3.51%.
The diff coverage is 94.38%.

❗ Current head ec0b540 differs from pull request most recent head d561afd. Consider uploading reports for the commit d561afd to get more accurate results

@@            Coverage Diff             @@
##           master    #1174      +/-   ##
==========================================
- Coverage   93.65%   90.13%   -3.52%     
==========================================
  Files          53       53              
  Lines        4567     4175     -392     
==========================================
- Hits         4277     3763     -514     
- Misses        290      412     +122

Flag	Coverage Δ
tests_clang_debug_cpp17	`?`
tests_gcc_debug	`90.13% <94.38%> (-0.30%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
include/libpmemobj++/detail/tagged_ptr.hpp	`89.39% <ø> (-10.61%)`	⬇️
...clude/libpmemobj++/experimental/concurrent_map.hpp	`100.00% <ø> (ø)`
...memobj++/experimental/atomic_self_relative_ptr.hpp	`82.56% <87.50%> (-8.86%)`	⬇️
...j++/container/detail/concurrent_skip_list_impl.hpp	`90.15% <100.00%> (-6.54%)`	⬇️
...ibpmemobj++/detail/self_relative_ptr_base_impl.hpp	`96.07% <100.00%> (-0.85%)`	⬇️
...de/libpmemobj++/experimental/self_relative_ptr.hpp	`100.00% <100.00%> (ø)`
include/libpmemobj++/experimental/swmr_map.hpp	`100.00% <100.00%> (ø)`
include/libpmemobj++/utils.hpp	`75.00% <0.00%> (-25.00%)`	⬇️
include/libpmemobj++/pexceptions.hpp	`4.54% <0.00%> (-18.67%)`	⬇️
include/libpmemobj++/detail/ringbuf.hpp	`79.66% <0.00%> (-17.42%)`	⬇️
... and 43 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2ab4604...d561afd. Read the comment docs.

# Conflicts: # tests/CMakeLists.txt

yjrobin and others added 15 commits July 9, 2021 15:28

Add code skeleton for new persistent-aware self_relative_ptr and swmr…

7c6e8d9

…_map

Merge remote-tracking branch 'origin/master' into swmr_skip_list

0ec4b0d

Add new pa_self_relative_ptr (persistent-aware self_relative_ptr) and…

5e942eb

… swmr_map Add some related tests

Add new pa_self_relative_ptr (persistent-aware self_relative_ptr) and…

e4f4cc4

… swmr_map Add some related tests

Clear format

4e5a08b

Merge branch 'pmem:master' into master

2c30c74

Clear format

def1592

Clear format

0ea98e7

Clear format

ec1f1cc

Clear format

51ee8aa

Clear format

6af3575

Clear format

1a3c8c0

Fix copy constructor

08ddbfe

Add some comments

380c70f

Fix and add some comments.

8226e94

igchor suggested changes Jul 23, 2021

View reviewed changes

igchor suggested changes Jul 26, 2021

View reviewed changes

igchor suggested changes Jul 30, 2021

View reviewed changes

yjrobin commented Jul 31, 2021

View reviewed changes

igchor reviewed Aug 6, 2021

View reviewed changes

yjrobin added 9 commits August 23, 2021 16:43

Implement persistent-aware feature of self_relative_ptr as a template…

92d238d

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

Implement persistent-aware feature of self_relative_ptr as a template…

f4eac57

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

Implement persistent-aware feature of self_relative_ptr as a template…

7952c73

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

Implement persistent-aware feature of self_relative_ptr as a template…

07d9e54

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

Implement persistent-aware feature of self_relative_ptr as a template…

2879d1a

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

Implement persistent-aware feature of self_relative_ptr as a template…

d5b38ac

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

Implement persistent-aware feature of self_relative_ptr as a template…

783c3ab

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

Implement persistent-aware feature of self_relative_ptr as a template…

54e7a63

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

Implement persistent-aware feature of self_relative_ptr as a template…

cd406ba

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

yjrobin and others added 10 commits August 24, 2021 10:06

Implement persistent-aware feature of self_relative_ptr as a template…

cae1d74

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

Implement persistent-aware feature of self_relative_ptr as a template…

f00194b

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

Implement persistent-aware feature of self_relative_ptr as a template…

7690fe0

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

Implement persistent-aware feature of self_relative_ptr as a template…

cf97885

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

Implement persistent-aware feature of self_relative_ptr as a template…

6a37da5

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

Implement persistent-aware feature of self_relative_ptr as a template…

e18e9da

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

Implement persistent-aware feature of self_relative_ptr as a template…

b2fea9d

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

Merge remote-tracking branch 'upstream/master'

be7f84f

Implement persistent-aware feature of self_relative_ptr as a template…

627eaef

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

Implement persistent-aware feature of self_relative_ptr as a template…

df944c8

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

karczex suggested changes Aug 26, 2021

View reviewed changes

yjrobin added 3 commits August 27, 2021 12:02

Implement persistent-aware feature of self_relative_ptr as a template…

dde768b

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

Implement persistent-aware feature of self_relative_ptr as a template…

f087cc6

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

Implement persistent-aware feature of self_relative_ptr as a template…

5442e77

…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.

yjrobin added 4 commits September 26, 2021 11:06

Merge branch 'pmem:master' into master

ec0b540

Merge branch 'pmem:master' into master

bdad6f7

Merge branch 'pmem-master'

01064d7

# Conflicts: # tests/CMakeLists.txt

Merge remote-tracking branch 'origin/master'

d561afd

lukaszstolarczuk marked this pull request as draft September 16, 2022 10:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Persistent-aware self_relative_ptr and Single-writer-multi-reader (SWMR) skiplist #1174

Persistent-aware self_relative_ptr and Single-writer-multi-reader (SWMR) skiplist #1174

yjrobin commented Jul 23, 2021 •

edited by igchor

igchor left a comment

yjrobin commented Jul 26, 2021

igchor left a comment

igchor left a comment

yjrobin left a comment

igchor left a comment

yjrobin commented Aug 24, 2021

karczex left a comment

yjrobin commented Aug 27, 2021 •

edited

igchor commented Sep 8, 2021

codecov bot commented Sep 13, 2021 •

edited

Persistent-aware self_relative_ptr and Single-writer-multi-reader (SWMR) skiplist #1174

Are you sure you want to change the base?

Persistent-aware self_relative_ptr and Single-writer-multi-reader (SWMR) skiplist #1174

Conversation

yjrobin commented Jul 23, 2021 • edited by igchor

1. pa_self_relative_ptr

2. swmr_skip_list

igchor left a comment

Choose a reason for hiding this comment

yjrobin commented Jul 26, 2021

igchor left a comment

Choose a reason for hiding this comment

igchor left a comment

Choose a reason for hiding this comment

yjrobin left a comment

Choose a reason for hiding this comment

igchor left a comment

Choose a reason for hiding this comment

yjrobin commented Aug 24, 2021

karczex left a comment

Choose a reason for hiding this comment

yjrobin commented Aug 27, 2021 • edited

igchor commented Sep 8, 2021

codecov bot commented Sep 13, 2021 • edited

Codecov Report

yjrobin commented Jul 23, 2021 •

edited by igchor

yjrobin commented Aug 27, 2021 •

edited

codecov bot commented Sep 13, 2021 •

edited