-
Notifications
You must be signed in to change notification settings - Fork 757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simple source code to integrate #550
Comments
I personally +1 for amalgamation and relatively modular code base. I also suppose its "root header" may look like this: /* xxhash-root.h */
/* Here's copyright notice, etc. */
#if defined (__cplusplus)
extern "C" {
#endif
#define XXH_VERSION_MAJOR 0
#define XXH_VERSION_MINOR 8
#define XXH_VERSION_RELEASE 1
#define XXH_VERSION_NUMBER (XXH_VERSION_MAJOR *100*100 + XXH_VERSION_MINOR *100 + XXH_VERSION_RELEASE)
#include "xxhash-inline-all.h" /* INLINE mode */
#ifndef XXHASH_H_5627135585666179
# define XXHASH_H_5627135585666179 1
# include "xxhash-api.h" /* Stable API */
# include "xxhash-xxh32-decl.h" /* 32-bit hash */
# ifndef XXH_NO_LONG_LONG
# include "xxhash-xxh64-decl.h"
# include "xxhash-xxh3-decl.h"
# endif
#endif
#if defined(XXH_STATIC_LINKING_ONLY) && !defined(XXHASH_H_STATIC_13879238742)
# define XXHASH_H_STATIC_13879238742
# include "xxhash-xxh32-static.h" /* XXH32_state_s */
# ifndef XXH_NO_LONG_LONG
# include "xxhash-xxh64-static.h"
# include "xxhash-xxh3-static.h"
# endif
# if defined(XXH_INLINE_ALL) || defined(XXH_PRIVATE_API)
# define XXH_IMPLEMENTATION
# endif
#endif
#if ( defined(XXH_INLINE_ALL) || defined(XXH_PRIVATE_API) || defined(XXH_IMPLEMENTATION) ) && !defined(XXH_IMPLEM_13a8737387)
# define XXH_IMPLEM_13a8737387
# include "xxhash-impl.h" /* Tuning parameters, Compiler specific options, etc */
# include "xxhash-xxh32-impl.h"
# ifndef XXH_NO_LONG_LONG
# include "xxhash-xxh64-impl.h"
# ifndef XXH_NO_XXH3
# include "xxhash-xxh3-impl.h"
# endif
# endif
#endif
#if defined (__cplusplus)
}
#endif Amalgamation process might be done by the following command. It "expands" every xxh-amalgum-util.py xxhash-root.h > xxhash.h
Also root The unknown point is when amalgamation will occur. For example,
|
Yes, that's exactly that.
That's a good point. The other idea (ensure that every PR has rebuilt
In which cases does it make a difference ? |
I thought that old version of amalgamed SQLite contained Anyway, my question doesn't make sense. I must rephrase it.
I don't want to object to the amalgamation. But I'd like to know typical scenario.
As for test, I think we can add explicit rebuilding of # Checkout procedure
... git checkout ...
rm xxhash.h
make xxhash.h # Rebuild
# Actual test
make all test-all
... We might also have test for real # Checkout procedure
... git checkout ...
# do not remove/rebuild xxhash.h #
# Actual test
make all test-all
... As for PR/commit process, I suppose typical scenario is: (0) We always have amalgamed |
I currently presume that issues will be reported using line numbers in the file actually included in user's project. aka, generally
Good idea, could be attempted.
Right
Yes, this one is more difficult.
Yes
Except if they modify something else, like the CLI code for example.
Yes |
I made a pair of small script for PoC. Demo
cd
mkdir xxh-amalgum
cd xxh-amalgum
curl -LOJ https://gist.githubusercontent.com/t-mat/0614c9ed6ef7b35dcf00cd022eacfa92/raw/f0a54688f30d5ce0516d8cc7523a8c6a859afee1/amalgum-expander.py
curl -LOJ https://gist.githubusercontent.com/t-mat/0614c9ed6ef7b35dcf00cd022eacfa92/raw/f0a54688f30d5ce0516d8cc7523a8c6a859afee1/amalgum-separator.py
curl -LOJ https://gist.githubusercontent.com/t-mat/8414155918bb7595d8c9c43dfadd6b4f/raw/4c12d56530d49c0185b2e4402d1114ee5a560081/xxhash.h
# Check special comments
grep "amalgum" xxhash.h
# split xxhash.h into multiple headers. base file is xxhash-root.h
python3 ./amalgum-separator.py
ls -l *.h
# expand include statement in xxhash-root.h
python3 ./amalgum-expander.py > test.h
diff xxhash.h test.h What's the point?
|
how about using a |
It's an interesting idea, It offers the advantages of a gradual expansion of build scripts able to produce smaller (limited) versions of the library, In a way, it answers #543 quite directly. However, separating the code into multiple small headers also adds maintenance convenience to the list of benefits. Bidirectional changes feels like an interesting property, though it could also introduce the question: "what's the source of truth?". Note that, if both versions are required to be exactly interchangeable at each commit, I presume it's a moot point at review stage. It's more an issue for the committer, which must decide which version he wants to modify, and update the other one with the appropriate automatic generator. It also impacts Once again, cutting the source code into multiple smaller files looks like a great boon to maintenance, and for certain deployments, so it feels like the right end-objective. But the resulting amalgamation must still produce a readable |
@t-mat : I've looked at your PoC demo, and ran it locally. Now, the resulting files from the splitting process are not yet meaningful enough to become reference. |
I'm glad it seems we find promising direction. It also brings many further questions/issues though.
If we choose separation/amalgamation method, I think this is a core question.
|
Yes, that's a good plan. |
As for license, my idea is adding We introduce special comment // xxhash.h
/* --amalgum--:begin-prologue */
/*
* xxHash - Extremely Fast Hash algorithm
* ...
*/
/* --amalgum--:end-prologue */ Each separated header contains |
As for dependency, we might introduce dedicated root header file for each variant. /* xxhash-xxh32-root.h which produces specific variant file xxhash-xxh32.h */
#if defined (__cplusplus)
extern "C" {
#endif
#include "xxhash-inline-mode.h"
#ifndef XXHASH_H_5627135585666179
# define XXHASH_H_5627135585666179 1
# include "xxhash-decl.h"
# include "xxhash-xxh32-decl.h"
#endif /* XXHASH_H_5627135585666179 */
#if defined(XXH_STATIC_LINKING_ONLY) && !defined(XXHASH_H_STATIC_13879238742)
# define XXHASH_H_STATIC_13879238742
# include "xxhash-xxh32-static.h"
#endif /* defined(XXH_STATIC_LINKING_ONLY) && !defined(XXHASH_H_STATIC_13879238742) */
#if ( defined(XXH_INLINE_ALL) || defined(XXH_PRIVATE_API) \
|| defined(XXH_IMPLEMENTATION) ) && !defined(XXH_IMPLEM_13a8737387)
# define XXH_IMPLEM_13a8737387
# include "xxhash-tuning.h"
# include "xxhash-impl.h"
# include "xxhash-xxh32-impl.h"
#endif /* XXH_IMPLEMENTATION */
#if defined (__cplusplus)
}
#endif I don't say this is the best approach since it still contains some unnecessary complexity of |
Looks good to me.
That's one way of solving it, A remaining issue is that, someone opening It's not a big deal as long as One potential solution could be to extend /* --amalgum--:begin:FILENAME
* #include "DEPENDENCY1"
* #include "DEPENDENCY2"
*/ with obviously the intention to add the commented |
I've updated amalgum scripts and example democd
mkdir xxh-amalgum
cd xxh-amalgum
curl -LOJ https://gist.githubusercontent.com/t-mat/0614c9ed6ef7b35dcf00cd022eacfa92/raw/f0a54688f30d5ce0516d8cc7523a8c6a859afee1/amalgum-expander.py
curl -LOJ https://gist.githubusercontent.com/t-mat/0614c9ed6ef7b35dcf00cd022eacfa92/raw/f0a54688f30d5ce0516d8cc7523a8c6a859afee1/amalgum-separator.py
curl -LOJ https://gist.githubusercontent.com/t-mat/8414155918bb7595d8c9c43dfadd6b4f/raw/12298ac6d1c0386f26eef469b5963f39b221014e/xxhash.h
# Check special comments
grep "amalgum" xxhash.h
# split xxhash.h into multiple headers. base file is xxhash-root.h
python3 ./amalgum-separator.py
ls -l *.h
# expand include statement in xxhash-root.h
python3 ./amalgum-expander.py > test.h
diff xxhash.h test.h Now,
|
As for separated headers, your concern is right. Perhaps we may rename these headers as Other solution is requiring every single separated header must be valid C language header. To satisfy this requirement, separated header might look like this: /* xxhash-xxh32-impl.h */
/* --amalgum--:begin-ignore */
#include "xxhash-decl.h"
#include "xxhash-xxh32-decl.h"
#include "xxhash-xxh32-static.h"
#include "xxhash-tuning.h"
#include "xxhash-impl.h"
/* --amalgum--:end-ignore */
... actual xxh32 implementation ... Here, let's say Since I suppose it might work. But I'm not sure how we can make harmony with INLINE mode or not. If INLINE mode works fine, I personally prefer to this format. Because every single contributor and IDEs understand them easily. |
Yes, guard macro per header file are needed
Indeed. Also, this order only matters for the |
I added special notation for dependency: /* --amalgum--:begin:xxhash-xxh32-impl.h */
/* --amalgum--:include:xxhash-inline-mode.h */
/* --amalgum--:include:xxhash-decl.h */
/* --amalgum--:include:xxhash-xxh32-decl.h */
/* --amalgum--:include:xxhash-xxh32-static.h */
/* --amalgum--:include:xxhash-tuning.h */
/* --amalgum--:include:xxhash-impl.h */ It may be just a bit bulky. To reduce number of /* --amalgum--:begin:xxhash-xxh32-impl.h */
/* --amalgum--:include:xxhash-impl.h */ /* all other dependencies are resolved automatically */ But since it may prevent to tuning of dependency and order by hand, so far I'd like to leave it as primitive form. I also added it to testcd
mkdir xxh-amalgum
cd xxh-amalgum
curl -LOJ https://gist.githubusercontent.com/t-mat/0614c9ed6ef7b35dcf00cd022eacfa92/raw/0062dcbd710ba541dbd4c1a91e195029c5976bd8/amalgum-expander.py
curl -LOJ https://gist.githubusercontent.com/t-mat/0614c9ed6ef7b35dcf00cd022eacfa92/raw/0062dcbd710ba541dbd4c1a91e195029c5976bd8/amalgum-separator.py
curl -LOJ https://gist.githubusercontent.com/t-mat/8414155918bb7595d8c9c43dfadd6b4f/raw/9899dea05aa09f72e5caafed3354fc521f30faf3/xxhash.h
# Check special comments
grep "amalgum" xxhash.h
# split xxhash.h into multiple headers. base file is xxhash-root.h
python3 ./amalgum-separator.py
ls -l *.h
# expand include statement in xxhash-root.h
python3 ./amalgum-expander.py > test.h
diff xxhash.h test.h I still don't have conclusion about guard macro for each fragment.
/* --amalgum--:begin:xxhash-xxh32-impl.h */
/* --amalgum--:guard:XXHASH_XXH32_IMPL */
/* --amalgum--:include:xxhash-inline-mode.h */
... becomes /* --amalgum--:begin-prologue */
...
/* --amalgum--:end-prologue */
#ifndef XXHASH_XXH32_IMPL /* --amalgum--:guard-line */
#define XXHASH_XXH32_IMPL 1 /* --amalgum--:guard-line */
/* --amalgum--:begin:xxhash-xxh32-impl.h */
#include "xxhash-inline-mode.h" /* --amalgum--:include-line */
...
#endif /* XXHASH_XXH32_IMPL */ /* --amalgum--:guard-line */ |
Added /* --amalgam--:begin:xxhash-decl.h */
/* --amalgam--:guard:XXHASH_DECL_H */ becomes #ifndef XXHASH_DECL_H /* --amalgam--:guard-line */
#define XXHASH_DECL_H 1 /* --amalgam--:ignore-line */
...
#endif /* XXHASH_DECL_H */ /* --amalgam--:ignore-line */ cd
mkdir xxh-amalgam
cd xxh-amalgam
curl -LOJ https://gist.githubusercontent.com/t-mat/675417703f9956ee73ce0f2b3721edca/raw/f34bdbd4037e6287c9972f39fe5ab3eaf8d8476a/xxhash.h
curl -LOJ https://gist.githubusercontent.com/t-mat/6b289ec10f6130e1a34841b69237a1cd/raw/0e769e68fc800c242ca52ae011da1d425f0ebedb/amalgam-separator.py
curl -LOJ https://gist.githubusercontent.com/t-mat/6b289ec10f6130e1a34841b69237a1cd/raw/0e769e68fc800c242ca52ae011da1d425f0ebedb/amalgam-expander.py
# Check special comments
grep "amalgam" xxhash.h
# split xxhash.h into multiple headers. base file is xxhash-root.h
python3 ./amalgam-separator.py
ls -l *.h
# expand include statement in xxhash-root.h
python3 ./amalgam-expander.py > test.h
diff xxhash.h test.h (edit) note : rename |
There is something I wanted to point out with xxhash. It popped into my head when I was reading about non/bsd hash result convention (#645) and about simple source code to integrate (#550). xxhash is family of hashes, As I don't want to create another issue but have question that is associated with the subject: Is there any sensible way to compile single hash (by single hash I mean xxh32 or xxh64 or xxh3-128, and so on) other than meticulous, manual carving xxh32 functions from the source? |
This is a listed objective, and will require time to achieve. In the meantime, you can control the scope of the binary by using the following build macros : This is explained and documented here : |
Thanks, it worked. |
This level of complexity is unnecessary. All we need, in my opinion, is to split it into I actually tried to split it but due to all these optimizations it's so complicated that I got stuck, also because of construction like that, which suggests it uses XXH- code (XXH32: XXH_readLE32, XXH64: XXH_readLE64):
and
Maybe some of these functions has to be "duplicated" and have separate XXH and XXH3 "versions". All I achieved was to remove everything enclosed within Ed: corrected xxh.h, xxh3.h; I left |
Coming back to this topic, I know this topic has already been attempted, though not completed, likely due to complexity. While I don't remember all the details, I suspect one issue which impacted complexity was an attempt to have a both-ways conversion, splitting the initial I now suspect that this objective directly lead to increased complexity of the solution, which in turn meant it was not completed. I'm now considering a simpler approach, which would be strictly one-direction, aka from small files to amalgamated Of course, there are a few drawbacks too, and it would be better to underline them first. To begin with, it must be clear that contributors should no longer edit the main Another issue is interface documentation, since the content of Finally, there is also the question of the "transition period", during which it's not easy to integrate new changes while the source code get dismantled. There is also an inherent complexity in deciding how to split Note that I don't expect to start such an effort "immediately". |
One of the strong points of
xxhash
is its integration story : it's enough to grab a single filexxhash.h
and there you have the entire library available in your program.However, this only applies to people who trust the source code. In some cases, when a programmer feels he should inspect the source code before integration, the large source size of
xxhash.h
can feel daunting by itself.One idea could be to split the source code into more dedicated units, resulting in more focused code bases. If someone is only interested in
XXH32()
orXXH64()
, they could integrate, or even copy/paste, just this code.A good example of this strategy is Stefan Brumme's simple xxHash C++ libraries.
To reach the same level of clarity,
XXH3
would have to be split even further, alongside its vector variants.And of course, one-shot and streaming variants would preferably be separated too.
This could work. But it would also result in a more complex multi-files code base, making the integration story more difficult.
A potential work-around could be to split the source code into these dedicated units, and then have an automated build system which regenerates the "complete" amalgamated
xxhash.h
library, so that current user base can still benefit from the 1-file integration story.Splitting is not too difficult, though requires quite some work, hence some available time.
The amalgamation process can be a bit more tricky, especially if an important objective is that
xxhash.h
must remain an approachable source file which can be read and debugged, as readability experience can be hurt by a dumb amalgamation process.Supersedes #543.
The text was updated successfully, but these errors were encountered: