Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Old] Keyname overhaul #3447

Closed
wants to merge 57 commits into from

Conversation

kodebach
Copy link
Member

Continuation of #3115 (commit squashed and rebased)

Implements #3102

Closes #3050
Closes #3084
Closes #2698
Closes #3082

Basics

These points need to be fulfilled for every PR:

  • Short descriptions of your changes are in the release notes
    (added as entry in doc/news/_preparation_next_release.md which
    contains _(my name)_)
    Please always add something to the release notes.
  • Details of what you changed are in commit messages
    (first line should have module: short statement syntax)
  • References to issues, e.g. close #X, are in the commit messages.
  • The buildservers are happy.
  • The PR is rebased with current master.

Checklist

  • I added unit tests for my code
  • I fully described what my PR does in the documentation
    (not in the PR description)
  • I fixed all affected documentation
  • I added code comments, logging, and assertions as appropriate (see Coding Guidelines)
  • I updated all meta data (e.g. README.md of plugins and METADATA.ini)
  • I mentioned every code not directly written by me in THIRD-PARTY-LICENSES

Review

@mpranj
Copy link
Member

mpranj commented May 30, 2020

Thank you so much for rebasing this PR!

I don't know if anybody will be able to pick this up, but it will definitely be much easier with this rebased version.

@markus2330
Copy link
Contributor

@kodebach any timeline when you can work on this?

In #3471 and #2969 is some progress but I would say it is a simply first-come-first-merge between these 3 PRs, or what do you think @mpranj?

@mpranj
Copy link
Member

mpranj commented Sep 1, 2020

In #3471 and #2969 is some progress but I would say it is a simply first-come-first-merge between these 3 PRs, or what do you think @mpranj?

Fully agree. A ton of work has been put into these PRs and I am happy to merge any PR that is ready!

@kodebach
Copy link
Member Author

kodebach commented Sep 2, 2020

@kodebach any timeline when you can work on this?

Hard to say. It really depends on what needs to be done before this PR can be merged...

In #3471 and #2969 is some progress but I would say it is a simply first-come-first-merge between these 3 PRs [...]

#3471 does have some progress, but I don't think it really affects this PR, there may be a few minor changes needed to make it compatible with the new keyname scheme but it shouldn't be too hard.

I don't see any progress in #2969 since this PR (the rebased version) was created. But unlike #3471, it has a lot of overlap with this PR. So merging one will have a big effect on the other.


Personally, I would prefer, if #2969 would be finished first, because that would probably cause less work over all. However, at this point it seems a bit unlikely to me. I will invest a few hours this week to get this PR rebased again and to identify the parts that still need work. Then I'll be able to give a more concrete timeline and maybe there are some things were other people can help out as well. Lots of the bugfixes so far were just a while of debugging to find the one magic number that makes an assumption about namespace length, or something like that. Very few bugfixes actually required a deep understanding of the changes in this PR.

@markus2330
Copy link
Contributor

Thank you for the update! Yes, it would be very interesting to know what still needs to be done, thank you so much for looking into it 💖.

I also hope that @vLesk can finish his PR before this PR. This PR seems to me it should be rather merged directly after a release and tested extensively. I'll help with testing.

@kodebach
Copy link
Member Author

kodebach commented Sep 2, 2020

it should be rather merged directly after a release and tested extensively

Merging after a release certainly allows for the most testing yes. In any case, this PR should ideally be merged while no other big PRs are in progress. Anything that is in progress when this PR is merged, will likely require extensive changes.

@kodebach kodebach mentioned this pull request Sep 5, 2020
16 tasks
@kodebach
Copy link
Member Author

kodebach commented Sep 5, 2020

I did the rebase and I think the PR is now more or less in the same state as before the rebases:

  • There is a major bug that surfaces in the testshell_markdown_kdb_global_umount. It is very likely that be bug is somewhere deep within the backend code that will be replace by backend plugin #2969. It could also be a bug in the the cache plugin (caused by the keyname change).
  • There is at least one memory leak somewhere within the core. This might be related to the bug with testshell_markdown_kdb_global_umount.

Also documentation and code comments need to be updated. But that could be done in a separate PR to avoid additional rebases.

Some problems I found only appear on my machine, but not on the build server:

  • The markdown shell test for the tcl plugin is broken. I get an exception from some boost library caused by the special characters in the keyname. Maybe the ccode plugin should escape does, but it doesn't. (Boost Version: 1.72.0)
  • The Lua plugin doesn't compile. I get a weird error: "undefined reference to lua_newuserdatauv" (Lua Version: 5.4.0)
  • There is a problem in the Lua binding: The ipairs iterator always skips the first key in the KeySet. (Lua Version: 5.4.0)

Regarding the changes in 0c5c870:

  • I modified reformat-shell, because my version of shfmt (from the Arch/Manjaro community repo) reports (devel) as its version.
  • I modified restyled.yaml, because it tried to use clang-format on some files that really should not be re-formatted. It seems the Restyled job didn't trigger after this change. @mpranj Please check, that I didn't break the config.

If you want me to, I can extract these two changes into a separate PR.


As far as I am concerned, there are now 3 things I can do to move forward:

  1. I try to get this PR into a mergeable state and we merge it. This would break backend plugin #2969 and require significant changes over there.
  2. I try to fix as much as I can in this PR, but don't merge it immediately. (Not sure whether that is worth it, since another rebase will be required)
  3. I take a look backend plugin #2969. It seems that some of the remaining problems should be quite easy to fix (compiler warnings and basic memory leaks). Without @vLesk I can of course not guarantee that the PR is completely done, but I might be able to speed things up. After that I would fix this PR.

In any case, I don't think this PR will be mergeable before the end of September. My suggestion would be to publish a release in September and then get this PR and #2969 merged ASAP.

@markus2330
Copy link
Contributor

Thank you so much! 💖

What is very clear is that whichever PR is first mergable, we will merge. This was already discussed earlier with @mpranj.

How to better fix this bug is hard to say. Maybe with @vLesk's code changes the bug would not exist. But that is a big maybe. We could discuss this on Monday or you could try to find out (see also below).

Anyway, I do not think that waiting (for anything) is a good option. Both this PR and #2969 are essential for 1.0. For @PhilippGackstatter's work (finalizing the 1.0 API) this PR is even more important. So the faster we get this merged, the better.

Also documentation and code comments need to be updated. But that could be done in a separate PR to avoid additional rebases.

Why would this need rebases? But I trust you that you will do it later.

Some problems I found only appear on my machine, but not on the build server:

They could be unrelated to this PR and a problem of your lua/boost version? If yes, a separate bug report would be helpful.

As far as I am concerned, there are now 3 things I can do to move forward:

If you somehow can help @vLesk it would be very much appreciated 💖. I think he is quite desperate with his PR.

In any case, I don't think this PR will be mergeable before the end of September. My suggestion would be to publish a release in September and then get this PR and #2969 merged ASAP.

Having a release in September (with TOML) and then merging this two PRs asap sounds amazing 🚀

@kodebach
Copy link
Member Author

kodebach commented Sep 5, 2020

They could be unrelated to this PR and a problem of your lua/boost version?

Yes, this is probably the case. I will open an issue. But that's all I can do, since I have zero knowledge about both Lua and Boost.

If you somehow can help @vLesk it would be very much appreciated .

I'll see what I can do. I can at least fix compiler warnings and memory leaks.

@kodebach
Copy link
Member Author

I added d10e727, since that should circumvent/hide the Postcondition problem in kdb_global_umount (see also #3299). This avoids the test breaking other tests. However, I don't think we should leave it this way, since it could also hide other bugs (which is why I originally undid those changes).

@kodebach
Copy link
Member Author

@mpranj @markus2330 What should I do about the ASAN failures from opmphm? They occur within a part of the code that is copied from an external source and that is already marked with some ASAN exceptions (seemingly no longer enough). Technically ASAN is right that the reads are a buffer overflow, but I think the code might be written this way (instead of using casts and multiple reads) for performance reasons. I also couldn't figure out what the correct __attribute__ for ignoring the ASAN error is...

Also, are we going to remove the old pythongen stuff for 1.0? Because then I won't bother fixing the remaining bugs...

And I assume, I can ignore/disable the failing debian-stretch-full-ini build job, since the ini plugin is going to be removed in #3491?

@kodebach
Copy link
Member Author

I found the right attributes for the ASAN problem, disabled the old pythongen and removed the failing Jenkins job (since it is gonna be removed in #3491 anyway).

That means this PR should be working now... It's not done yet, because there are still some minor API changes I want to do (mainly replacing ssize_t with size_t in a lot of place, since 0 is no longer a valid key name size and therefore can be an error code).

@@ -1,4 +1,4 @@
return ()
remove_tool (pythongen "Deprecated and will be removed soon")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better if you directly remove it if this PR breaks it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll do that. I just wanted to get the PR in a state where all CI jobs run successfully. Small incremental changes from a working state are much easier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, are we going to remove the old pythongen stuff for 1.0? Because then I won't bother fixing the remaining bugs...

Yes, let us remove it. Later we can port the code generator for contextual values to your new template system. Its low-priority for 1.0, though.

@kodebach kodebach mentioned this pull request Sep 22, 2020
14 tasks
@kodebach
Copy link
Member Author

oclint -p "@PROJECT_BINARY_DIR@" -no-analytics -enable-global-analysis -enable-clang-static-analyzer \
"src/libs/ease/array.c" \
"src/libs/ease/keyname.c" \
"src/libs/utility/text.c" \
"src/plugins/base64/"*.c \
"src/plugins/ccode/"*.cpp \
"src/plugins/cpptemplate/"*.cpp \
"src/plugins/directoryvalue/"*.cpp \
"src/plugins/mini/mini.c" \
"src/plugins/yamlcpp/"*.cpp \
"src/plugins/yamlsmith/"*.cpp \
"src/plugins/yanlr/"*.cpp

@sanssecours What exactly makes these handful of files special? Why are we only checking those with oclint?

@markus2330
Copy link
Contributor

I want to do (mainly replacing ssize_t with size_t in a lot of place

Please make this in a separate (later) PR. It also should be discussed with @PhilippGackstatter as it is a huge API change with many implications on API usability.

@sanssecours
Copy link
Member

@sanssecours What exactly makes these handful of files special? Why are we only checking those with oclint?

Most of these files contain code I wrote. I just used oclint as an additional code checking tool. I did not include additional files, since this

  • would require fixing the problems reported by the tool, and
  • probably cause a timeout in a lot of build jobs (checking a file with OCLint takes quite a lot of time).

@kodebach
Copy link
Member Author

I want to do (mainly replacing ssize_t with size_t in a lot of place

Please make this in a separate (later) PR. It also should be discussed with @PhilippGackstatter as it is a huge API change with many implications on API usability.

The whole keyname change is a huge API change already so why not do it right now...

Usability is IMO the same, in most of these functions -1 was the only error code and that was used when an input value was NULL. I don't think there are many application that check the return value of e.g. keySetName anyway (outside of tests keyVNew was the only place I found in the whole Elektra repo).

In some cases usability even improves. For example ksGetNameSize currently it returns a ssize_t because it uses -1 to indicate the given key is NULL. But that means 1) we actually cast the size_t keySize from struct _Key into a ssize_t and 2) the caller has to cast the return value back to size_t because the probably actually need a size_t e.g. for malloc or memcpy.

@markus2330
Copy link
Contributor

If you can change every ssize_t to size_t, then it is a clear improvement. If it is only sometimes, the change would bring an inconsistency.

@kodebach
Copy link
Member Author

Every ssize_t where the size part refers to the size of a key name might be possible. If size refers to e.g. key value size it won't be possible.

Copy link
Contributor

@markus2330 markus2330 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit more review...

@@ -41,24 +41,26 @@ static void addWarning (Key * key, const char * code, const char * name, const c
return;
}

char buffer[25] = "warnings/#00";
buffer[12] = '\0';
char buffer[64] = "warnings/#0\0\0";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for also fixing this! It also should be mentioned in the release notes.

It should now be more performant since we try to do more work with fewer iterations of the string.
However, there are no benchmarks (yet), so your mileage may vary.

If you rely on specific behaviour of Elektra's Keynames and have already taken the two changes above into account, please refer to the newly created [Keyname documentation](../keynames) and (easier to read) [Python reference implementation](../keynames/keynames.py).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot agree that the python reference implementation is easier to read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you look at the C-implementation? Did you actually try and understand it?

Just the fact, that the C-implementation validates the whole key name in reverse makes it much more difficult to understand. You even said in another comment "Function is very long"...

Can you give me one concrete example where the Python implementation is harder to read?

To be clear "easier to read" refers to the fact that in the Python implementation it is much easier to spot what key names are valid and why. With just the C-implementation it is quite difficult to see, if a certain key name is valid or not, without executing the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you did not say what you wanted to say in the release notes. To make the release notes more clear, remove:

  • the "(easier to read)" (as it compares with the documentation in the sentence you wrote)
  • the text about performance and not-done benchmarks

and please add all the other important changes that are currently not mentioned (like removal of fallbacks).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as it compares with the documentation in the sentence you wrote

Ah I see... Yes, this could be misunderstood. I will clarify that the Python implementation is easier compared to the C implementation. Of course the documentation text, should be easiest to understand (although it might take longer to read).

the text about performance and not-done benchmarks

That is already removed, just not pushed. It turns out that performance is about the same, because the old implementation started modifying the key before it validated the name.

please add all the other important changes that are currently not mentioned

I will add everything I can remember ;)

like removal of fallbacks

Which fallbacks do you mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the fallback in KDB's bootstrap code. As there are quite many important breaking changes it would be best if you can make a git diff and look through.

@@ -131,16 +131,14 @@ enum elektraLockFlags
enum elektraNamespace
{
KEY_NS_NONE=0, ///< no key given as parameter to keyGetNamespace()
KEY_NS_EMPTY=1, ///< key name was empty, e.g. invalid key name
KEY_NS_CASCADING=1, ///< cascading key, starts with /, abstract name for any of the namespaces below
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be rephrased as it is not "below" (e.g. META is below but not relevant for cascading).

@@ -537,6 +550,13 @@ int keyCopy (Key * dest, const Key * source)
return -1;
}

static void keyClearNameValue (Key * key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no prefix is needed as it is static, better rename it to "clearNameAndValue".

}

ssize_t elektraKeySetName (Key * key, const char * newName, elektraKeyFlags options)
int elektraKeyNameValidate (const char * name, const char * prefix, size_t * sizePtr, size_t * usizePtr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add docu for the function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will. You just found the one function that doesn't have a "TODO: document" comment...

}
}

if (state != PART_END && partStart)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function is very long, can you refactor?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try, but honestly I don't think it will change anything. Personally, I think splitting this function will make maintaining it harder not easier. Other than extracting the bits before and after the loop, I don't see a good place to split the function. The result will be multiple functions with large numbers of arguments. These functions won't make any sense, if you don't understand the main function and you can't understand the main function without understanding every detail of the "helper" functions.

Another (IMO more important) point is that this function will likely have to change quite drastically before 1.0, if we want to implement any of the ideas related to restrictions on key names. To implement the change, we will essentially have to merge the functions back together, do the change and split it into new parts. In short, refactoring now seems a waste of time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather love to see some functions in the middle part. If the functions get good names of what they do it should be a big improvement in any case. At the moment elektraKeyNameValidate is very hard to understand/review. keySetName is done very nicely 💖

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather love to see some functions in the middle part.

Me too, but I don't think there is an easy way to do that. The new function will be so tightly coupled to the existing one that I don't think there is any benefit. If anything it might make it more difficult to understand, because you need to keep track of two functions at the same time.

I will see what I can do, but I can't promise anything...

At the moment elektraKeyNameValidate is very hard to understand/review.

That's why I added the Pyhton implementation and said that one is easier to read.


ssize_t keySetNamespace (Key * key, elektraNamespace ns)
{
// TODO (kodebach): document
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

* - Given @p user:someuser/..... return @p someuser
* - Given @p user:some.user/.... return @p some.user
* - Given @p user/.... return the current user
* - Given @p user:someuser:/..... return @p someuser
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you also consider to remove everything related to owner?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did, but owner can also come from the owner metadata and that is still used in keyCompareByNameOwner, i.e. the implementation of keyCmp. So I decided to just remove the fullname stuff and leave the rest for another PR.

This docu has to be changed in any case...

@@ -140,30 +108,30 @@ static void test_keyNewSystem (void)
printf ("Test system key creation\n");

// Empty key
key = keyNew (0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens with the old syntax? (I am all for getting rid of it but as long it compiles we somehow need to do something with it.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean "the old syntax"?

NULL is not a valid key name anymore and is treated like any other invalid key name. If you pass an invalid key name to keyNew, then keyNew will return NULL.

This was always the case according to the documentation of keyNew on master:

* @retval NULL on allocation error or if an invalid @p name was passed (see keySetName()).

Although the implementation of keyVNew totally ignores the return value of elektraKeySetName on master and therefore doesn't agree with the documentation.

In any case, the new keyVNew does check the return value and does properly free the Key and return NULL, if the key name wasn't valid. It also logs a warning stating that the key name was invalid.

if (keySetName (key, name) < 0)
{
ELEKTRA_LOG_WARNING ("Invalid name: %s", name);
elektraFree (key);
return NULL;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • key = keyNew (0);

What do you mean "the old syntax"?

keyNew (0); is the old syntax for keyNew ("", KEY_END);

|NULL| is not a valid key name anymore and is treated like any other
invalid key name. If you pass an invalid key name to |keyNew|, then
|keyNew| will return |NULL|.

Yes, I was missing a testcase for that.

In any case, the new |keyVNew| does check the return value and does
properly |free| the |Key| and return |NULL|, if the key name wasn't
valid. It also logs a warning stating that the key name was invalid.

Thank you for fixing this bug 💖

@kodebach kodebach mentioned this pull request Oct 15, 2020
16 tasks
@mpranj
Copy link
Member

mpranj commented Oct 19, 2020

One question regarding this PR and #3507 and the upgrade path in general:

Let's say we merge both PRs, so we have the Keyname overhaul as well as the new dump version. That version of Elektra would not be able to correctly read old configs (with old keynames) anyway, right? Or is there something I'm overlooking here?

EDIT: in other words: the old configs will be gone for good?

@kodebach
Copy link
Member Author

AFAIK merging #3507 should not destroy any configs. Most configs that are updated to the dump version from #3507 should also survive this PR. Depending on how exactly this PR will affect key names, some configs may be broken. Currently this would only affect configs which:

  • contain a key with a key name part that starts with @ (illegal, reserved)
  • contain a key with a key name part that starts with # and is not a valid array part (illegal)
  • contain a key with a key name part that starts with # followed by just digits (_s will be inserted)

Other configs should be unaffected.

If needed, I may also update the "dump v1" part of dump in this PR so that it inserts :s before calling keyNew. This way even the old dump should be readable (apart from the limitations above) will be updated.

@mpranj
Copy link
Member

mpranj commented Oct 19, 2020

AFAIK merging #3507 should not destroy any configs.

Yes, I was thinking more of Keyname overhaul regarding the breaking changes.
As an example, I will definitely have to increment the mmapstorage format version number within this PR to avoid problems. The cache, however, does not need an upgrade path.

I don't know how other storage plugins will handle it. It would definitely be frustrating to users to loose lots of config data during an innocent looking 0.9.2 to 0.9.3 upgrade.

If needed, I may also update the "dump v1" part of dump in this PR so that it inserts :s before calling keyNew. This way even the old dump should be readable (apart from the limitations above) will be updated.

That's great if it can be done without causing issues.

@kodebach
Copy link
Member Author

As an example, I will definitely have to increment the mmapstorage format version number within this PR to avoid problems. The cache, however, does not need an upgrade path.

I don't know how other storage plugins will handle it.

The other storage plugins should all use relative key names, so the namespace changes, shouldn't be a problem. The other things will always be a problem, but shouldn't affect too many configs.

Once we know what exactly will cause problems, we could also provide a script that can be run with the old kdb to list all problematic keys.

It would definitely be frustrating to users to loose lots of config data during an innocent looking 0.9.2 to 0.9.3 upgrade.

We could call it 1.0.0-alpha01 to make it a more obvious breaking change and also clearly mark it as a pre-release version.

@markus2330
Copy link
Contributor

I don't know how other storage plugins will handle it. It would definitely be frustrating to users to loose lots of config data during an innocent looking 0.9.2 to 0.9.3 upgrade.

We should clearly communicate that these upgrades are breaking changes as preparation for 1.0.0. Probably we need this disclaimer every time.

Once we know what exactly will cause problems, we could also provide a script that can be run with the old kdb to list all problematic keys.

Or we rethink these changes, see also #3518

In particular:

  • not reserve @ (but only @elektra or nothing or change to some different escaping strategy)
  • make something only an array if added by keyAddName and take it literally if passed to keySetBaseName.

But we will discuss this hopefully soon in a meeting.

We could call it 1.0.0-alpha01 to make it a more obvious breaking change and also clearly mark it as a pre-release version.

I once did this and it caused quite some trouble at many places in the build system, packages and similar. We should avoid it, at simply communicate that 0.9.* are our pre-releases for 1.0. This is also exactly as described in semantic versioning.

@kodebach
Copy link
Member Author

This is also exactly as described in semantic versioning.

SemVer actually says:

  • Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable. [source]
  • A pre-release version MAY be denoted by appending a hyphen and a series of dot separated identifiers immediately following the patch version. Identifiers MUST comprise only ASCII alphanumerics and hyphens [0-9A-Za-z-]. Identifiers MUST NOT be empty. Numeric identifiers MUST NOT include leading zeroes. Pre-release versions have a lower precedence than the associated normal version. A pre-release version indicates that the version is unstable and might not satisfy the intended compatibility requirements as denoted by its associated normal version. Examples: 1.0.0-alpha, 1.0.0-alpha.1, 1.0.0-0.3.7, 1.0.0-x.7.z.92, 1.0.0-x-y-z.–. [source]

So according to SemVer, you can never assume 0.9.3 doesn't break everything. But since Elektra did guarantee some stable API in the past, 1.0.0-alpha01 would be better, since such a version is also exempt from all guarantees in SemVer and contains the word "alpha" to clearly indicate that.

it caused quite some trouble at many places in the build system, packages and similar.

I don't know enough about the build system to say something about this, but we will have somebody working on a new release process, so maybe this can actually be fixed.

make something only an array if added by keyAddName and take it literally if passed to keySetBaseName.

I'm not gonna go into detail again, but please do not think in terms of the existing functions. Only think about (unescaped) key names and key name parts. It might be a good strategy to use completely new function names to avoid any existing assumptions.

@markus2330
Copy link
Contributor

Anything MAY change at any time.

Exactly.

so maybe this can actually be fixed.

Everything can be changed. But with 1.0.0 we would need to change it back again (so that KDB_VERSION_PATCH is a number). As version are mostly a matter of taste and without reading release notes there is always room for misunderstanding (what means beta or alpha after all) I would not waste time for that. We already communicated several times what 0.9.* means. As we already have 0.9.* releases, we would only increase the confusion by changing again.

I'm not gonna go into detail again, but please do not think in terms of the existing functions. Only think about (unescaped) key names and key name parts. It might be a good strategy to use completely new function names to avoid any existing assumptions.

Yes, let us go step by step through the all relevant decisions in the next meeting, with as little assumptions as possible in the beginning. This way we hopefully come to the best key names possible for configuration.

@kodebach
Copy link
Member Author

As version are mostly a matter of taste and without reading release notes there is always room for misunderstanding

Did we say somewhere that we use SemVer? If so there should be room for misunderstanding. That's the whole point of SemVer.

As we already have 0.9.* releases, we would only increase the confusion by changing again.

Personally, I would be a lot more cautious about a 0.9.2 -> 1.0.0-alpha01 update than about 0.9.2 -> 0.9.3. Even if 0.9.x is a 0.x.y release and it was stated that it may contain breaking changes, I would not expect a breaking change that breaks as much as this PR does to be hidden behind a patch release.

But that's just my opinion. In any case, we may need 1.x.0-alpha01 releases for future pre-release versions so we should at least investigate the possibility.

@@ -0,0 +1,1113 @@
#!/bin/python3
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markus2330 @mpranj I added this as a demo of how I would imagine the Elektra key name API. Take a look, if you want to, but please wait with comments until our meeting. There are many parts of this that can be changed and adapted to better fit any requirements.

If you want to play with the API, I recommend using ./new_api.py repl. There is also ./new_api.py demo which among other things prints a KeySet (list of Keys actually) in various formats to show how storage files might look.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hehe, I definitely prefer the idea to reserve ®elektra (if we need to reserve anything at all, see discussions about having reservation internal to storage plugins. If it is only reserved internally, we might also reserve ®meta and ®value?). Also the other things written below "Demo of new key name API" sound like a big improvement to me.

About the base name escaping of #: maybe our misunderstanding is about if we expect someone using keySetBaseName "that he/she knows what he/she is doing". Imho the escaping of e.g. # is unnecessary, why not simply allow construing arrays (also broken ones without the proper number of _ if someone wants) using it as a low-level interface? Then keySetBaseName would not need to "escape" anything in the unescaped key name and we still have the strong guarantees it had. Only keyAddName does the proper escaping of arrays (i.e. adding _) and might reject things. Btw. I am actually fine with keyAddName rejecting ®, as long as users can bypass these restrictions with keySetBaseName (and then run into the danger that these things might get interpreted by plugins!). Obviously this all depends on if we want semantic in key names at all.

Additional Interfaces like _KeyHierarchy I welcome very much but please not before 1.0 as this is very hard to get right (I did not look into the details of your proposal) and there are very few applications which actually need something like this (e.g. whole of KDE only uses flat configurations and with a few exceptions they are happy with that). But after 1.0 I definitely want something like this! (In some scenarios, possibly also in the spec plugin, it is the much more sane interface to work with. E.g. yajl could be rewritten with 10% of the code having such an interface. Maybe we start by adding such an interface internal to spec, and then make it public later?)

My current "perfect Elektra" would be:

  • no semantic in key names
  • arrays in the KeySet are only a representation for arrays but not real arrays (as arrays is also a rather rare feature, e.g. KDE does not need it), are simply numbered (without starting with #, 50 escaped with _50, but tagged with array meta-data). From the representation users can get real arrays via other interfaces like _KeyHierarchy or the high-level API.
  • keyAddName is a safe interface (in the sense it does not allow to construct weird things usually unwanted like numbers that are not prefixed with _) to add something to the key name
  • keyAddBaseName is low-level raw interface to add something directly to the unescaped key name which can be dangerous and make unwanted things (and I am fine also saying this in the API docu). It is mostly needed for storage plugins but not for applications.

Just a few thoughts, no need to answer, let us discuss it in the next meeting.

Copy link
Member Author

@kodebach kodebach Oct 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few further notes (also to myself) and clarifications, discussion will happen in the meeting...

I definitely prefer the idea to reserve ®elektra

The nice thing about reserving ® in UTF-8 is that the encoding is \xc2\xae. \xae is actually the codepoint for ® in ISO-8859-1 and Windows-1252 (\xc2 is Â), so even if an editor doesn't default to UTF-8, there is a good chance that ®elektra will be recognizable.

why not simply allow construing arrays (also broken ones without the proper number of _ if someone wants) [...] Only keyAddName does the proper escaping of arrays (i.e. adding _)

I'm not quite sure I understood your idea about constructing arrays. Also: It is already the case that only keyAddName adds _s and I wouldn't suggested anything else.

What I maybe didn't put into the code comments, but would've explained in the meeting:
The idea with this Python API was that there are names and there are indices. Names are for maps, indices for arrays. add_base_name always adds a name and add_array_index always adds an index. Basic functionality of the functions is different. Key name parts starting with a single # are always array indices. add_base_name always adds a second # if it detects a # to make sure the intention of "add a name" is fulfilled.

Additional Interfaces like _KeyHierarchy [...] there are very few applications which actually need something like this

KeyHierarchy was intended to help storage plugins and not applications. I agree that only few applications could use it. The lcdexec menu in LCDproc is a candidate, but the fact that I managed to implement it without a KeyHierarachy shows it isn't really needed.

A possibility (especially if we leave the plugin API unstable for 1.0) would be to explicitly state that KeyHierarchy is not yet part of the stable API and only patch releases, but not minor version updates will be compatible.

as arrays is also a rather rare feature [...] without starting with #

I think this very much depends on what config format an application uses. If the application uses a JSON config, I think it is much more likely to use arrays than with INI. The concept of arrays definitely comes up often (e.g. multiple accounts, list of servers, etc.), but often you can work around the need for actual arrays.

IIRC when I started this PR last year, we said the # is a nice visual indicator and makes it easier to spot array elements within long key names. But maybe idea of separating names and indices would benefit from removing the #. It's quite common in programming that identifiers cannot start with a digit. However, prefixing with _ is a common way around that, but we use _ for number. Let's discuss this in the meeting.

keyAddName is a safe interface [...] keyAddBaseName is low-level raw interface

Maybe we need another level. keyAddName works with the escaped name. That can be tedious at times.

In the Python API I thought about adding a add_last_part which just does self.__parts.append (part) without any checks or processing. I didn't like the idea. Without add_last_part you have to think about what you are doing. That makes it harder to accidentally create something broken. It also avoids some problems with aliasing/uniqueness of key names.

[keyAddBaseName] is mostly needed for storage plugins but not for applications.

I think this depends on the application. For a static config with a fixed number of keys, yes keyAddBaseName isn't needed. If there are dynamic parts e.g. there is a /mail/[account_name]/... subset for each account [1], then keyAddBaseName is indeed useful.

I would also say that keyAddBaseName is useful in all plugins and libraries, not just storage plugins.

@markus2330 markus2330 mentioned this pull request Oct 24, 2020
@kodebach kodebach mentioned this pull request Oct 24, 2020
Copy link
Member

@sanssecours sanssecours left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have time, please review the doc/keynames/README.md as well as doc/keynames/keynames.py (mostly the comments).

Looks like you put a lot of thought into the topic of key names 👍. I added some small comments to the pull request. Please feel free to ignore them if you do not like them.

Like Markus I am not a big fan of capitalizing special terms like “Key Name” or “System”. I would prefer if we just use the current lowercase style, e.g “key name”. Maybe it would make sense to emphasize just the first occurrence of special terms (for example using capitalization) and then switch back to the lowercase spelling.

doc/keynames/keynames.py Outdated Show resolved Hide resolved
doc/keynames/keynames.py Outdated Show resolved Hide resolved
doc/keynames/keynames.py Outdated Show resolved Hide resolved
doc/keynames/keynames.py Outdated Show resolved Hide resolved
doc/keynames/README.md Outdated Show resolved Hide resolved
doc/keynames/README.md Outdated Show resolved Hide resolved
doc/keynames/keynames.py Outdated Show resolved Hide resolved
doc/keynames/keynames.py Outdated Show resolved Hide resolved
doc/keynames/keynames.py Outdated Show resolved Hide resolved
doc/keynames/keynames.py Outdated Show resolved Hide resolved
Co-authored-by: René Schwaiger <sanssecours@me.com>
@kodebach
Copy link
Member Author

kodebach commented Nov 8, 2020

Looks like you put a lot of thought into the topic of key names . I added some small comments to the pull request. Please feel free to ignore them if you do not like them.

Yes, a lot of effort was spent on this, since it is a quite essential part for Elektra. However, the files you reviewed no longer represent the latest decisions. In #3514 and #3549, we decided on a few changes (mostly regarding arrays and reserved names). If you think we should go back to the way it is implemented in this PR for some reason or if you have any other suggestions, please comment in #3549.

Like Markus I am not a big fan of capitalizing special terms like “Key Name” or “System”. I would prefer if we just use the current lowercase style, e.g “key name”. Maybe it would make sense to emphasize just the first occurrence of special terms (for example using capitalization) and then switch back to the lowercase spelling.

I will do that in the next version of the docu.

Co-authored-by: René Schwaiger <sanssecours@me.com>
@kodebach kodebach mentioned this pull request Nov 12, 2020
16 tasks
@kodebach
Copy link
Member Author

I squashed and rebased this PR again. The new PR is #3555

@kodebach kodebach closed this Nov 12, 2020
@markus2330 markus2330 changed the title Keyname overhaul [Old] Keyname overhaul Nov 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

order of namespaces in KeySet default namespace remove elektraKeySetName arrays with natural sorting
5 participants