-
Notifications
You must be signed in to change notification settings - Fork 488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adjust sds types #502
base: unstable
Are you sure you want to change the base?
Adjust sds types #502
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #502 +/- ##
============================================
+ Coverage 70.17% 70.26% +0.08%
============================================
Files 109 109
Lines 59904 59983 +79
============================================
+ Hits 42039 42148 +109
+ Misses 17865 17835 -30
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense to me. The slightly more memory usage on some systems is likely OK since I think we still prefer to use jemalloc in most cases.
@poiuj overall looks much better now. I would suggest we include some unit tests to cover different types edge cases (for example the case of old style alloc overflow). I am comfortable with the asserts, but we will probably want to trigger ci and daily before we add this. After we finish this PR lets go and fix #453 accordingly |
@poiuj the tests looks fine.
Is it really a waste? we already reduced the overflow in the prev implementation and AFAIK we never should have written pass the usable anyhow right? lets say I will allocate 253 sds. Before: I would allocate according to 253+(sizeof(sdshdr8)=3)+1 = 257 --> allocated size of 320 (IIRC) but the usable size is 255 So IMO this is 59 bytes gain:) I still want to verify there are no issues related to backward compatibility (I think RDB does not serialize sds headers, but would like to think on other potential case) |
@ranshid your example is correct. It's a 59 bytes gain if we use jemalloc. My top commented is about libc allocator that wouldn't allocate 320 bytes in this case. |
Oh I get it now. yes. indeed this is not too bad IMO. Please also consider avoid the malloc_size use in realloc |
3f6c96f
to
a9b378b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/sds.c
Outdated
@@ -124,8 +132,14 @@ sds _sdsnewlen(const void *init, size_t initlen, int trymalloc) { | |||
s = (char*)sh+hdrlen; | |||
fp = ((unsigned char*)s)-1; | |||
usable = usable-hdrlen-1; | |||
|
|||
#ifdef USE_JEMALLOC | |||
assert(usable <= sdsTypeMaxSize(type)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assertion here is based on the behavior we expected/monitor from jemalloc, it depends on the internal of jemalloc, is there any guarantee from jemalloc? Maybe a potential risk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't we always keep the defending part? (usable = min(usable, sdsTypeMaxSize(type))
AND also maybe we need to consider keep the assert for the usable size correctness, eg:
assert(usable >= initlen);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert(usable <= sdsTypeMaxSize(type));
assert(initlen <= usable);
Are our two invariants. Should we be doing those asserts regardless of the allocator being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that is a good point:
we want to make sure that in case JEMALLOC is used we will SURELY have real usable == usable, in order to satisfy #453, so I guess we do have to keep assert(usable <= sdsTypeMaxSize(type)); for JEMALLOC (for example in order to assert when JEMALLOC was compiled with different bin sizes?).
I think (as @madolson suggested) the defensive option is better at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still change the code so that:
#ifndef USE_JEMALLOC
usable = MIN(usable, sdsTypeMaxSize(type));
#endif
assert(usable <= sdsTypeMaxSize(type) && initlen <= usable);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a better solution. We can adjust types after the allocation/reallocation. Previously I thought there are some edge cases that wouldn't work. But after reconsidering the idea, and noting again that alloc
doesn't include header size and null terminator, I think it's a better way.
With this solution we don't need to take assumptions for correctness. Instead of truncating usable
, we're going to adjust the header type.
I'll update the PR with this idea.
sds type should be determined based on the size of the underlying buffer, not the logical length of the sds. Currently we truncate the alloc field in case buffer is larger than we can handle. It leads to a mismatch between alloc field and the actual size of the buffer. Even considering that alloc doesn't include header size and the null terminator. It also leads to a waste of memory with jemalloc. For example, let's consider creation of sds of length 253. According to the length, the appropriate type is SDS_TYPE_8. But we allocate `253 + sizeof(struct sdshdr8) + 1` bytes, which sums to 257 bytes. In this case jemalloc allocates buffer from the next size bucket. With current configuration on Linux it's 320 bytes. So we end up with 320 bytes buffer, while we can't address more than 255. The same happens with other types and length close enough to the appropriate powers of 2. The downside of the adjustment is that with allocators that do not allocate larger than requested chunks (like GNU allocator), we switch to a larger type "too early". It leads to small waste of memory. Specifically: sds of length 31 takes 35 bytes instead of 33 (2 bytes wasted) sds of length 255 takes 261 bytes instead of 259 (2 bytes wasted) sds of length 65,535 takes 65,545 bytes instead of 65,541 (4 bytes wasted) sds of length 4,294,967,295 takes 4,294,967,313 bytes instead of 4,294,967,305 (8 bytes wasted) Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
Make sdsTypeMaxSize account for header and null terminator. Use it in sdsReqType and to assert `usable` is less than max size. Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
The test checks that appropriate types are selected for string length. It also checks that sdsAllocSize returns correct allocation size. Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
Also eliminate copying of strings - pass NULL as init. Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
This is to preserve git history. To be able to use it in sdsReqType a forward declaration is added. Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
After the adjustments, jemalloc always returns buffer that fit to chosen SDS type. On the other hand, it's not guaranteed with other allocators. We leave the current behavior for all allocators, but jemalloc. For jemalloc we assert instead. Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
With other allocators we can't guarantee alloc field is correct. Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
libc allocator can return larger buffer than requested. Unlike jemalloc the buffer sizes are not aligned with SDS sizes. That's why we may need to adjust the SDS type to a larger one to be sure alloc field can hold the size of the allocated buffer. The maximum length for type SDS_TYPE_X is 2^X - header_size(SDS_TYPE_X) - 1. The maxium value to be stored in alloc field is 2^X - 1. When allocated buffer is larger than 2^X + header_size(SDS_TYPE_X), we "move" to a larger type SDS_TYPE_Y. To be sure SDS_TYPE_Y header fits into 2^X + header_size(SDS_TYPE_X) + 1 bytes, the difference between header sizes must be smaller than header_size(SDS_TYPE_X) + 1. We ignore SDS_TYPE_5 as it doesn't have alloc field. Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
Updated the branch. Includes the new approach with adjusting types. Also rebased on the latest unstable. Unfortunately, missed some style changes during conflict resolution. But it's small stuff that I can fix later before the actual merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can make a function to capture the repeated code?:
/* Adjust type if usable won't fit into alloc. */
if (type != SDS_TYPE_5 && usable > sdsTypeMaxSize(type)) {
type = sdsReqType(usable);
hdrlen = sdsHdrSize(type);
usable = bufsize - hdrlen - 1;
assert(usable <= sdsTypeMaxSize(type));
}
src/sds.c
Outdated
if (type == SDS_TYPE_8) return (1 << 8) - 1; | ||
if (type == SDS_TYPE_16) return (1 << 16) - 1; | ||
if (type == SDS_TYPE_5) | ||
return (1<<5) - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to leave the current format in order to avoid non-logical changes.
Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
LGTM but would like to hear @lipzhu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just one concern.
x = sdsnewlen(NULL, 252); | ||
TEST_ASSERT_MESSAGE("len 252 type", (x[-1] & SDS_TYPE_MASK) >= SDS_TYPE_8); | ||
#ifdef USE_JEMALLOC | ||
TEST_ASSERT_MESSAGE("len 252 sdsAllocSize", sdsAllocSize(x) == s_malloc_size(sdsAllocPtr(x))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the warn message from jemalloc manual.
Any discrepancy between the requested allocation size and the size reported by malloc_usable_size() should not be depended on, since such behavior is entirely implementation-dependent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't depend on discrepancy between the requested allocation size and the size reported by malloc_usable_size
. Not in the test, not in the code. The test checks that sdsAllocSize
returns the same value as s_malloc_size
. It's always true as sdsAllocSize
is based on the value we got from malloc_usable_size
before.
In any case, this test checks that our assumption is correct. As we vendor jemalloc, the only case when it can change (theoretically) is when we update jemalloc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's always true as sdsAllocSize is based on the value we got from malloc_usable_size before.
You are right, so we can open assertion not only for jemalloc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you're right! This ifdef
isn't needed anymore. Updated the PR.
Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
sds type should be determined based on the size of the underlying buffer, not the logical length of the sds. Currently we truncate the alloc field in case buffer is larger than we can handle. It leads to a mismatch between alloc field and the actual size of the buffer. Even considering that alloc doesn't include header size and the null terminator.
It also leads to a waste of memory with jemalloc. For example, let's consider creation of sds of length 253. According to the length, the appropriate type is SDS_TYPE_8. But we allocate
253 + sizeof(struct sdshdr8) + 1
bytes, which sums to 257 bytes. In this case jemalloc allocates buffer from the next size bucket. With current configuration on Linux it's 320 bytes. So we end up with 320 bytes buffer, while we can't address more than 255.The same happens with other types and length close enough to the appropriate powers of 2.
The downside of the adjustment is that with allocators that do not allocate larger than requested chunks (like GNU allocator), we switch to a larger type "too early". It leads to small waste of memory. Specifically: sds of length 31 takes 35 bytes instead of 33 (2 bytes wasted) sds of length 255 takes 261 bytes instead of 259 (2 bytes wasted) sds of length 65,535 takes 65,545 bytes instead of 65,541 (4 bytes wasted) sds of length 4,294,967,295 takes 4,294,967,313 bytes instead of 4,294,967,305 (8 bytes wasted)