-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Old] Keyname overhaul #3447
[Old] Keyname overhaul #3447
Conversation
Thank you so much for rebasing this PR! I don't know if anybody will be able to pick this up, but it will definitely be much easier with this rebased version. |
Hard to say. It really depends on what needs to be done before this PR can be merged...
#3471 does have some progress, but I don't think it really affects this PR, there may be a few minor changes needed to make it compatible with the new keyname scheme but it shouldn't be too hard. I don't see any progress in #2969 since this PR (the rebased version) was created. But unlike #3471, it has a lot of overlap with this PR. So merging one will have a big effect on the other. Personally, I would prefer, if #2969 would be finished first, because that would probably cause less work over all. However, at this point it seems a bit unlikely to me. I will invest a few hours this week to get this PR rebased again and to identify the parts that still need work. Then I'll be able to give a more concrete timeline and maybe there are some things were other people can help out as well. Lots of the bugfixes so far were just a while of debugging to find the one magic number that makes an assumption about namespace length, or something like that. Very few bugfixes actually required a deep understanding of the changes in this PR. |
Thank you for the update! Yes, it would be very interesting to know what still needs to be done, thank you so much for looking into it 💖. I also hope that @vLesk can finish his PR before this PR. This PR seems to me it should be rather merged directly after a release and tested extensively. I'll help with testing. |
Merging after a release certainly allows for the most testing yes. In any case, this PR should ideally be merged while no other big PRs are in progress. Anything that is in progress when this PR is merged, will likely require extensive changes. |
53b1b04
to
26e3491
Compare
I did the rebase and I think the PR is now more or less in the same state as before the rebases:
Also documentation and code comments need to be updated. But that could be done in a separate PR to avoid additional rebases. Some problems I found only appear on my machine, but not on the build server:
Regarding the changes in 0c5c870:
If you want me to, I can extract these two changes into a separate PR. As far as I am concerned, there are now 3 things I can do to move forward:
In any case, I don't think this PR will be mergeable before the end of September. My suggestion would be to publish a release in September and then get this PR and #2969 merged ASAP. |
Thank you so much! 💖 What is very clear is that whichever PR is first mergable, we will merge. This was already discussed earlier with @mpranj. How to better fix this bug is hard to say. Maybe with @vLesk's code changes the bug would not exist. But that is a big maybe. We could discuss this on Monday or you could try to find out (see also below). Anyway, I do not think that waiting (for anything) is a good option. Both this PR and #2969 are essential for 1.0. For @PhilippGackstatter's work (finalizing the 1.0 API) this PR is even more important. So the faster we get this merged, the better.
Why would this need rebases? But I trust you that you will do it later.
They could be unrelated to this PR and a problem of your lua/boost version? If yes, a separate bug report would be helpful.
If you somehow can help @vLesk it would be very much appreciated 💖. I think he is quite desperate with his PR.
Having a release in September (with TOML) and then merging this two PRs asap sounds amazing 🚀 |
Yes, this is probably the case. I will open an issue. But that's all I can do, since I have zero knowledge about both Lua and Boost.
I'll see what I can do. I can at least fix compiler warnings and memory leaks. |
37cd8f2
to
e96678e
Compare
@mpranj @markus2330 What should I do about the ASAN failures from Also, are we going to remove the old And I assume, I can ignore/disable the failing |
3bbd14c
to
14f3a05
Compare
I found the right attributes for the ASAN problem, disabled the old That means this PR should be working now... It's not done yet, because there are still some minor API changes I want to do (mainly replacing |
@@ -1,4 +1,4 @@ | |||
return () | |||
remove_tool (pythongen "Deprecated and will be removed soon") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better if you directly remove it if this PR breaks it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll do that. I just wanted to get the PR in a state where all CI jobs run successfully. Small incremental changes from a working state are much easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, are we going to remove the old pythongen stuff for 1.0? Because then I won't bother fixing the remaining bugs...
Yes, let us remove it. Later we can port the code generator for contextual values to your new template system. Its low-priority for 1.0, though.
libelektra/tests/shell/check_oclint.sh Lines 18 to 29 in 4b0e9f8
@sanssecours What exactly makes these handful of files special? Why are we only checking those with |
Please make this in a separate (later) PR. It also should be discussed with @PhilippGackstatter as it is a huge API change with many implications on API usability. |
Most of these files contain code I wrote. I just used
|
The whole keyname change is a huge API change already so why not do it right now... Usability is IMO the same, in most of these functions In some cases usability even improves. For example |
If you can change every ssize_t to size_t, then it is a clear improvement. If it is only sometimes, the change would bring an inconsistency. |
Every |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit more review...
@@ -41,24 +41,26 @@ static void addWarning (Key * key, const char * code, const char * name, const c | |||
return; | |||
} | |||
|
|||
char buffer[25] = "warnings/#00"; | |||
buffer[12] = '\0'; | |||
char buffer[64] = "warnings/#0\0\0"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for also fixing this! It also should be mentioned in the release notes.
It should now be more performant since we try to do more work with fewer iterations of the string. | ||
However, there are no benchmarks (yet), so your mileage may vary. | ||
|
||
If you rely on specific behaviour of Elektra's Keynames and have already taken the two changes above into account, please refer to the newly created [Keyname documentation](../keynames) and (easier to read) [Python reference implementation](../keynames/keynames.py). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot agree that the python reference implementation is easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you look at the C-implementation? Did you actually try and understand it?
Just the fact, that the C-implementation validates the whole key name in reverse makes it much more difficult to understand. You even said in another comment "Function is very long"...
Can you give me one concrete example where the Python implementation is harder to read?
To be clear "easier to read" refers to the fact that in the Python implementation it is much easier to spot what key names are valid and why. With just the C-implementation it is quite difficult to see, if a certain key name is valid or not, without executing the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you did not say what you wanted to say in the release notes. To make the release notes more clear, remove:
- the "(easier to read)" (as it compares with the documentation in the sentence you wrote)
- the text about performance and not-done benchmarks
and please add all the other important changes that are currently not mentioned (like removal of fallbacks).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as it compares with the documentation in the sentence you wrote
Ah I see... Yes, this could be misunderstood. I will clarify that the Python implementation is easier compared to the C implementation. Of course the documentation text, should be easiest to understand (although it might take longer to read).
the text about performance and not-done benchmarks
That is already removed, just not pushed. It turns out that performance is about the same, because the old implementation started modifying the key before it validated the name.
please add all the other important changes that are currently not mentioned
I will add everything I can remember ;)
like removal of fallbacks
Which fallbacks do you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the fallback in KDB's bootstrap code. As there are quite many important breaking changes it would be best if you can make a git diff
and look through.
@@ -131,16 +131,14 @@ enum elektraLockFlags | |||
enum elektraNamespace | |||
{ | |||
KEY_NS_NONE=0, ///< no key given as parameter to keyGetNamespace() | |||
KEY_NS_EMPTY=1, ///< key name was empty, e.g. invalid key name | |||
KEY_NS_CASCADING=1, ///< cascading key, starts with /, abstract name for any of the namespaces below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be rephrased as it is not "below" (e.g. META is below but not relevant for cascading).
@@ -537,6 +550,13 @@ int keyCopy (Key * dest, const Key * source) | |||
return -1; | |||
} | |||
|
|||
static void keyClearNameValue (Key * key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no prefix is needed as it is static, better rename it to "clearNameAndValue".
} | ||
|
||
ssize_t elektraKeySetName (Key * key, const char * newName, elektraKeyFlags options) | ||
int elektraKeyNameValidate (const char * name, const char * prefix, size_t * sizePtr, size_t * usizePtr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add docu for the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will. You just found the one function that doesn't have a "TODO: document" comment...
} | ||
} | ||
|
||
if (state != PART_END && partStart) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function is very long, can you refactor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can try, but honestly I don't think it will change anything. Personally, I think splitting this function will make maintaining it harder not easier. Other than extracting the bits before and after the loop, I don't see a good place to split the function. The result will be multiple functions with large numbers of arguments. These functions won't make any sense, if you don't understand the main function and you can't understand the main function without understanding every detail of the "helper" functions.
Another (IMO more important) point is that this function will likely have to change quite drastically before 1.0, if we want to implement any of the ideas related to restrictions on key names. To implement the change, we will essentially have to merge the functions back together, do the change and split it into new parts. In short, refactoring now seems a waste of time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather love to see some functions in the middle part. If the functions get good names of what they do it should be a big improvement in any case. At the moment elektraKeyNameValidate is very hard to understand/review. keySetName is done very nicely 💖
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather love to see some functions in the middle part.
Me too, but I don't think there is an easy way to do that. The new function will be so tightly coupled to the existing one that I don't think there is any benefit. If anything it might make it more difficult to understand, because you need to keep track of two functions at the same time.
I will see what I can do, but I can't promise anything...
At the moment elektraKeyNameValidate is very hard to understand/review.
That's why I added the Pyhton implementation and said that one is easier to read.
|
||
ssize_t keySetNamespace (Key * key, elektraNamespace ns) | ||
{ | ||
// TODO (kodebach): document |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
* - Given @p user:someuser/..... return @p someuser | ||
* - Given @p user:some.user/.... return @p some.user | ||
* - Given @p user/.... return the current user | ||
* - Given @p user:someuser:/..... return @p someuser |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you also consider to remove everything related to owner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did, but owner can also come from the owner
metadata and that is still used in keyCompareByNameOwner
, i.e. the implementation of keyCmp
. So I decided to just remove the fullname stuff and leave the rest for another PR.
This docu has to be changed in any case...
@@ -140,30 +108,30 @@ static void test_keyNewSystem (void) | |||
printf ("Test system key creation\n"); | |||
|
|||
// Empty key | |||
key = keyNew (0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens with the old syntax? (I am all for getting rid of it but as long it compiles we somehow need to do something with it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean "the old syntax"?
NULL
is not a valid key name anymore and is treated like any other invalid key name. If you pass an invalid key name to keyNew
, then keyNew
will return NULL
.
This was always the case according to the documentation of keyNew
on master:
libelektra/src/libs/elektra/key.c
Line 163 in 6d23610
* @retval NULL on allocation error or if an invalid @p name was passed (see keySetName()). |
Although the implementation of keyVNew
totally ignores the return value of elektraKeySetName
on master and therefore doesn't agree with the documentation.
In any case, the new keyVNew
does check the return value and does properly free
the Key
and return NULL
, if the key name wasn't valid. It also logs a warning stating that the key name was invalid.
libelektra/src/libs/elektra/key.c
Lines 314 to 319 in 1429765
if (keySetName (key, name) < 0) | |
{ | |
ELEKTRA_LOG_WARNING ("Invalid name: %s", name); | |
elektraFree (key); | |
return NULL; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- key = keyNew (0);
What do you mean "the old syntax"?
keyNew (0);
is the old syntax for keyNew ("", KEY_END);
|NULL| is not a valid key name anymore and is treated like any other
invalid key name. If you pass an invalid key name to |keyNew|, then
|keyNew| will return |NULL|.
Yes, I was missing a testcase for that.
In any case, the new |keyVNew| does check the return value and does
properly |free| the |Key| and return |NULL|, if the key name wasn't
valid. It also logs a warning stating that the key name was invalid.
Thank you for fixing this bug 💖
One question regarding this PR and #3507 and the upgrade path in general: Let's say we merge both PRs, so we have the Keyname overhaul as well as the new dump version. That version of Elektra would not be able to correctly read old configs (with old keynames) anyway, right? Or is there something I'm overlooking here? EDIT: in other words: the old configs will be gone for good? |
AFAIK merging #3507 should not destroy any configs. Most configs that are updated to the dump version from #3507 should also survive this PR. Depending on how exactly this PR will affect key names, some configs may be broken. Currently this would only affect configs which:
Other configs should be unaffected. If needed, I may also update the "dump v1" part of |
Yes, I was thinking more of Keyname overhaul regarding the breaking changes. I don't know how other storage plugins will handle it. It would definitely be frustrating to users to loose lots of config data during an innocent looking 0.9.2 to 0.9.3 upgrade.
That's great if it can be done without causing issues. |
The other storage plugins should all use relative key names, so the namespace changes, shouldn't be a problem. The other things will always be a problem, but shouldn't affect too many configs. Once we know what exactly will cause problems, we could also provide a script that can be run with the old
We could call it |
We should clearly communicate that these upgrades are breaking changes as preparation for 1.0.0. Probably we need this disclaimer every time.
Or we rethink these changes, see also #3518 In particular:
But we will discuss this hopefully soon in a meeting.
I once did this and it caused quite some trouble at many places in the build system, packages and similar. We should avoid it, at simply communicate that |
SemVer actually says:
So according to SemVer, you can never assume
I don't know enough about the build system to say something about this, but we will have somebody working on a new release process, so maybe this can actually be fixed.
I'm not gonna go into detail again, but please do not think in terms of the existing functions. Only think about (unescaped) key names and key name parts. It might be a good strategy to use completely new function names to avoid any existing assumptions. |
Exactly.
Everything can be changed. But with 1.0.0 we would need to change it back again (so that KDB_VERSION_PATCH is a number). As version are mostly a matter of taste and without reading release notes there is always room for misunderstanding (what means beta or alpha after all) I would not waste time for that. We already communicated several times what
Yes, let us go step by step through the all relevant decisions in the next meeting, with as little assumptions as possible in the beginning. This way we hopefully come to the best key names possible for configuration. |
Did we say somewhere that we use SemVer? If so there should be room for misunderstanding. That's the whole point of SemVer.
Personally, I would be a lot more cautious about a But that's just my opinion. In any case, we may need |
@@ -0,0 +1,1113 @@ | |||
#!/bin/python3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markus2330 @mpranj I added this as a demo of how I would imagine the Elektra key name API. Take a look, if you want to, but please wait with comments until our meeting. There are many parts of this that can be changed and adapted to better fit any requirements.
If you want to play with the API, I recommend using ./new_api.py repl
. There is also ./new_api.py demo
which among other things prints a KeySet
(list of Key
s actually) in various formats to show how storage files might look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hehe, I definitely prefer the idea to reserve ®elektra
(if we need to reserve anything at all, see discussions about having reservation internal to storage plugins. If it is only reserved internally, we might also reserve ®meta
and ®value
?). Also the other things written below "Demo of new key name API" sound like a big improvement to me.
About the base name escaping of #
: maybe our misunderstanding is about if we expect someone using keySetBaseName
"that he/she knows what he/she is doing". Imho the escaping of e.g. #
is unnecessary, why not simply allow construing arrays (also broken ones without the proper number of _
if someone wants) using it as a low-level interface? Then keySetBaseName
would not need to "escape" anything in the unescaped key name and we still have the strong guarantees it had. Only keyAddName
does the proper escaping of arrays (i.e. adding _
) and might reject things. Btw. I am actually fine with keyAddName
rejecting ®
, as long as users can bypass these restrictions with keySetBaseName
(and then run into the danger that these things might get interpreted by plugins!). Obviously this all depends on if we want semantic in key names at all.
Additional Interfaces like _KeyHierarchy I welcome very much but please not before 1.0 as this is very hard to get right (I did not look into the details of your proposal) and there are very few applications which actually need something like this (e.g. whole of KDE only uses flat configurations and with a few exceptions they are happy with that). But after 1.0 I definitely want something like this! (In some scenarios, possibly also in the spec plugin, it is the much more sane interface to work with. E.g. yajl could be rewritten with 10% of the code having such an interface. Maybe we start by adding such an interface internal to spec, and then make it public later?)
My current "perfect Elektra" would be:
- no semantic in key names
- arrays in the KeySet are only a representation for arrays but not real arrays (as arrays is also a rather rare feature, e.g. KDE does not need it), are simply numbered (without starting with
#
, 50 escaped with_50
, but tagged witharray
meta-data). From the representation users can get real arrays via other interfaces like _KeyHierarchy or the high-level API. - keyAddName is a safe interface (in the sense it does not allow to construct weird things usually unwanted like numbers that are not prefixed with
_
) to add something to the key name - keyAddBaseName is low-level raw interface to add something directly to the unescaped key name which can be dangerous and make unwanted things (and I am fine also saying this in the API docu). It is mostly needed for storage plugins but not for applications.
Just a few thoughts, no need to answer, let us discuss it in the next meeting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few further notes (also to myself) and clarifications, discussion will happen in the meeting...
I definitely prefer the idea to reserve ®elektra
The nice thing about reserving ®
in UTF-8 is that the encoding is \xc2\xae
. \xae
is actually the codepoint for ® in ISO-8859-1 and Windows-1252 (\xc2
is Â), so even if an editor doesn't default to UTF-8, there is a good chance that ®elektra
will be recognizable.
why not simply allow construing arrays (also broken ones without the proper number of
_
if someone wants) [...] OnlykeyAddName
does the proper escaping of arrays (i.e. adding_
)
I'm not quite sure I understood your idea about constructing arrays. Also: It is already the case that only keyAddName
adds _
s and I wouldn't suggested anything else.
What I maybe didn't put into the code comments, but would've explained in the meeting:
The idea with this Python API was that there are names and there are indices. Names are for maps, indices for arrays. add_base_name
always adds a name and add_array_index
always adds an index. Basic functionality of the functions is different. Key name parts starting with a single #
are always array indices. add_base_name
always adds a second #
if it detects a #
to make sure the intention of "add a name" is fulfilled.
Additional Interfaces like _KeyHierarchy [...] there are very few applications which actually need something like this
KeyHierarchy
was intended to help storage plugins and not applications. I agree that only few applications could use it. The lcdexec
menu in LCDproc is a candidate, but the fact that I managed to implement it without a KeyHierarachy
shows it isn't really needed.
A possibility (especially if we leave the plugin API unstable for 1.0) would be to explicitly state that KeyHierarchy
is not yet part of the stable API and only patch releases, but not minor version updates will be compatible.
as arrays is also a rather rare feature [...] without starting with
#
I think this very much depends on what config format an application uses. If the application uses a JSON config, I think it is much more likely to use arrays than with INI. The concept of arrays definitely comes up often (e.g. multiple accounts, list of servers, etc.), but often you can work around the need for actual arrays.
IIRC when I started this PR last year, we said the #
is a nice visual indicator and makes it easier to spot array elements within long key names. But maybe idea of separating names and indices would benefit from removing the #
. It's quite common in programming that identifiers cannot start with a digit. However, prefixing with _
is a common way around that, but we use _
for number. Let's discuss this in the meeting.
keyAddName is a safe interface [...] keyAddBaseName is low-level raw interface
Maybe we need another level. keyAddName
works with the escaped name. That can be tedious at times.
In the Python API I thought about adding a add_last_part
which just does self.__parts.append (part)
without any checks or processing. I didn't like the idea. Without add_last_part
you have to think about what you are doing. That makes it harder to accidentally create something broken. It also avoids some problems with aliasing/uniqueness of key names.
[keyAddBaseName] is mostly needed for storage plugins but not for applications.
I think this depends on the application. For a static config with a fixed number of keys, yes keyAddBaseName
isn't needed. If there are dynamic parts e.g. there is a /mail/[account_name]/...
subset for each account [1], then keyAddBaseName
is indeed useful.
I would also say that keyAddBaseName
is useful in all plugins and libraries, not just storage plugins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have time, please review the
doc/keynames/README.md
as well as doc/keynames/keynames.py (mostly the comments).
Looks like you put a lot of thought into the topic of key names 👍. I added some small comments to the pull request. Please feel free to ignore them if you do not like them.
Like Markus I am not a big fan of capitalizing special terms like “Key Name” or “System”. I would prefer if we just use the current lowercase style, e.g “key name”. Maybe it would make sense to emphasize just the first occurrence of special terms (for example using capitalization) and then switch back to the lowercase spelling.
Co-authored-by: René Schwaiger <sanssecours@me.com>
Yes, a lot of effort was spent on this, since it is a quite essential part for Elektra. However, the files you reviewed no longer represent the latest decisions. In #3514 and #3549, we decided on a few changes (mostly regarding arrays and reserved names). If you think we should go back to the way it is implemented in this PR for some reason or if you have any other suggestions, please comment in #3549.
I will do that in the next version of the docu. |
Co-authored-by: René Schwaiger <sanssecours@me.com>
I squashed and rebased this PR again. The new PR is #3555 |
Continuation of #3115 (commit squashed and rebased)
Implements #3102
Closes #3050
Closes #3084
Closes #2698
Closes #3082
Basics
These points need to be fulfilled for every PR:
(added as entry in
doc/news/_preparation_next_release.md
whichcontains
_(my name)_
)Please always add something to the release notes.
(first line should have
module: short statement
syntax)close #X
, are in the commit messages.Checklist
(not in the PR description)
Review