-
Notifications
You must be signed in to change notification settings - Fork 35.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Remove boost usage from kernel api / headers #28335
base: master
Are you sure you want to change the base?
RFC: Remove boost usage from kernel api / headers #28335
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
This is a pretty big change, but looking over it, it seems all straightforward and mechanical, and should not be very hard to review. I'd be interested in opinions from developers who work more with closely with mempool code about whether this improves the mempool code, or if it creates any issues. My opinion is would be that if this improves mempool code it should be merged, and if it makes mempool code more complicated or harder to maintain, it should not be merged. I don't think the stated goals of this PR are very compelling on their own if they are just about speeding up compilation times, or making it easier to compile the kernel library and kernel applications with different versions of boost. (I doubt there are a lot of applications that would opt for using multiple versions of boost in the same build, and if there are it would seem more straightforward for them to put an #include boundary between their application code and libbitcoin_kernel than would be is for this PR to put a boundary between mempool code and the rest of the kernel.) But despite the large diff and long description, the changes in the PR do seem pretty straightforward, so I don't think the benefits need to be that big to justify the changes. |
Concept ACK. I looked at an early version of this and thought generally the same: it's mostly mechanical and not too bad to review.
Generally agree. I'd also like to hear from some mempool folks. @glozow ?
I think this glosses over the larger result of this work: boost is now not a requirement for using libbitcoinkernel at all. It removes our last external dependency. That's a huge achievement imo. Barring us vendoring Boost, it's the difference between: "take this lib and build a hello world app in your point-and-click IDE" to "use this docker image which includes our recommended version of boost headers to build from a script". I know, I know, the latter is an exaggeration and wouldn't be the end of the world. But without boost, as-is right now, we'd be able to ship a libbitcoinkernel that depends on only these headers (though obviously paring down that list is a next step) and nothing from the outside world. To me that's very significant.
+1 though as stated above, to me the benefits are that big :) |
PR title is misleading btw. Boost is not removed from the kernel, it's removed from the kernel's api. I think conceptually this is the commit that really demonstrates the change here: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boost is now not a requirement for using libbitcoinkernel at all. It removes our last external dependency. That's a huge achievement imo.
Sounds pretty cool to me.
I'm also quite interested in this advertised benefit:
Pimpling the mempool members (like done in this PR) might also make a future splitting of block and mempool validation logic into separate compilation units easier.
My opinion is would be that if this improves mempool code it should be merged, and if it makes mempool code more complicated or harder to maintain, it should not be merged.
I think the size of the diff is a symptom of lots of code external to txmempool knowing a bit too much about how txmempool is implemented, i.e. accessing mapTx
when it should probably use a different CTxMemPool
method to get the information it needs. Adding .impl
in those areas seems a bit ugly tbh - I think a cleaner approach would be to have those bits of code not use txiter/mapTx/setEntries. Apart from that, it seems generally good to move more of the implementation inward. I just had 1 comment about UpdateForDescendants
.
This branch refactors some of these mapTx
accesses to use things like infoAll()
and GetIter()
and size()
, which imo are the more appropriate functions to be calling there. The result is that a lot of things don't need to be pimpl'd anymore or fewer places are using the pimpl'd data structures. I rebased this PR on top of that branch, which resulted in 7 commits being unnecessary and the pimpling commits touching fewer files. The overall diff isn't that much smaller, but I think it achieves the intended result + independently nice changes.
I sorted the commits from most trivial to least trivial - the first ones are very straightforward things like "call pool.size() instead of pool.mapTx.size()." The last part re-implements DisconnectedBlockTransactions
to not need a boost multi index container, which I suppose could be controversial. Would you maybe be interested in taking any of these changes?
* @param[in,out] mapTx a reference to the complete mempool map. | ||
* @param[in] limits the mempool limits configuration. | ||
*/ | ||
static void UpdateForDescendants( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e59f925
Instead of making this static, perhaps you could inline this into UpdateTransactionsFromBlock
? Dropping the thread-safety annotations is a bit of a negative imo.
Yes this is a distinction I glossed over and didn't think much about. With this PR, boost is still a requirement for building libbitcoinkernel, but not a requirement for using it. But I don't understand what the big advantage is (other than the ones already mentioned about speeding up build times or mixing versions of boost). And I don't undestand or what you are saying about the docker image and IDE. To my mind there are two type of developers who might want to use libbitcoinkernel:
The effect of this PR seems negligible in both of this cases, probably resulting in literal 1-line change like fd49dc9 removing a path from the include list. So I'm guessing maybe the thing you are enthusiastic about is supporting a third type of developer who (1) doesn't want to use a package and (2) doesn't want to build from source, but instead wants to download a tarball containing a precompiled libbitcoinkernel library binary and some headers, and use that. But the difference is pretty small in this case too, no? Without this PR the tarball needs to contain a few boost header files, and with this PR, it doesn't. In any case, I'm happy about the benefits of this PR, but just think the main consideration here should be whether it improves the mempool code or not, and glad to see glozow had some suggestions for simplifying it. |
I think we should leave it at this. I'll admit to complete tunnel-vision on this particular boost removal. It's been @glozow thanks for taking a look! Seems getting rid of some of the direct uses of |
Thank you for taking so much consideration for this change! I think your additional patches are well worth reducing the amount of code that has to directly access the boost data structure from outside of the
Most of the changes (as you mentioned) should be uncontroversial, e.g. just directly using existing helpers, or the I think the The other two cases I am not sure about is using |
dbb14bd
to
8e125c0
Compare
Updated with @glozow's patches. I think these are well worth it for both cleaning up the code to not needlessly reach into complex mempool member variables as well as reducing the number of source files that need to include the multiindex headers. The order of the commits was slightly changed, the heavier |
@@ -890,7 +890,7 @@ bitcoin_util_LDADD = \ | |||
|
|||
# bitcoin-chainstate binary # | |||
bitcoin_chainstate_SOURCES = bitcoin-chainstate.cpp | |||
bitcoin_chainstate_CPPFLAGS = $(AM_CPPFLAGS) $(BITCOIN_INCLUDES) $(BOOST_CPPFLAGS) | |||
bitcoin_chainstate_CPPFLAGS = $(AM_CPPFLAGS) $(BITCOIN_INCLUDES) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW this won't have any effect on c-i currently because boost is installed into the common depends prefix.
As a quick hack to demonstrate the functionality of this PR, you could add: theuni@e131d8f and drop it before merge. That puts boost in a different path, so BOOST_CPPFLAGS
becomes a requirement as-intended. (We should also fix that for good in a less hackish way.)
As far as I can tell this PR should currently fail with the above patch because the removal is not yet done as mentioned here: bdbc509#r1313092304
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a quick hack to demonstrate the functionality of this PR, you could add: theuni@e131d8f and drop it before merge. That puts boost in a different path, so
BOOST_CPPFLAGS
becomes a requirement as-intended.
IIRC, this issue is recurring, no?
We should also fix that for good in a less hackish way.
Agree with it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, yes, I think this is the 3rd time we've needed this to demonstrate a boost dependency removal. Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this was covered by CI until we dropped the native macOS ARM builds.
d50a9b6
to
d2423b2
Compare
Opened #28385 with the less straightforward changes. |
bd5c1e6
to
3bf4a26
Compare
Currently the mempool returns and consumes sets of multiindex iterators in its public API. A likely motivation for this over working with references to the actual values is that the multi index interface works with these iterators and not with pointers or references to the actual values. However, the iterator type provides little benefit in practice as applied currently. Its purpose, ownership, and safety semantics often remain ambiguous, and it is hardly used for actually iterating through the data structures. So replace it where possible with CTxMemPoolEntryRefs. Since CTxMemPoolEntry itself refers to its Parents and Children by CTxMemPoolEntryRef and not txiter, this allowed for an overall reduction of calls to iterator_to. This also makes the long term goal of eliminating boost types from the headers more feasible.
Introduces mempool_set_definitions.h which collects the boost types that were previously defined in the txmempool header. The context of this change is an effort towards removing the boost multi index includes from the header tree. The goal is that the experimental `bitcoin-chainstate` binary can be compiled without requiring boost includes.
They need not be in the header, so move them to the implementation.
The context of this change is an effort towards removing the boost multi index includes from the header tree. The goal is that the experimental `bitcoin-chainstate` binary can be compiled without requiring boost includes.
The context of this change is an effort towards removing the boost multi index includes from the header tree. The goal is that the experimental `bitcoin-chainstate` binary can be compiled without requiring boost includes.
The context of this change is an effort towards removing the boost multi index includes from the header tree. The goal is that the experimental `bitcoin-chainstate` binary can be compiled without requiring boost includes.
This allows them to use the mapTx without including the full mapTx definition in the header. The context of this change is an effort towards removing the boost multi index includes from the header tree. The goal is that the experimental `bitcoin-chainstate` binary can be compiled without requiring boost includes.
This removes the need for defining its complex boost multi index return type in the header. The context of this change is an effort towards removing the boost multi index includes from the header tree. The goal is that the experimental `bitcoin-chainstate` binary can be compiled without requiring boost includes.
This missing include was found in the following commit.
All the header files should now no longer contain boost multi index includes. Only the implementation files directly include the multi index types for manipulating mempool data structures. The context of this change is an effort towards removing the boost multi index includes from the header tree. The goal is that the experimental `bitcoin-chainstate` binary can be compiled without requiring boost includes.
This is possible as of the previous commit.
3bf4a26
to
55d99c2
Compare
🐙 This pull request conflicts with the target branch and needs rebase. |
⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
Similarly to #28327 I wanted to open this PR to receive some opinions and better approaches.
The kernel library is currently at the stage where unwanted headers are removed from its set of headers. In practice, this means we are reducing the number of includes that are required for compiling the experimental
bitcoin-chainstate
binary. This is described in stage 1 step 3 of the project tracking issue.Currently the mempool is part of the kernel library. The mempool headers include the boost multi index headers. Thus any application wanting to use the kernel library and its headers will have to include the boost headers too. This is not only undesirable because of the sheer size of these headers, but also might lead to conflicts if the including application uses a different boost version.
In the approach laid out by this PR, mempool member variables and methods are declared in the header without having to include boost by either wrapping them in a struct and pimpling them, or making methods static implementation functions. The boost definitions are gathered into separate header (
mempool_set_definitions.h
) that is only included by implementation files that require definitions of the boost types. This allows us to retain the current architecture with roughly the same interfaces.The approach laid out by this PR also has some, albeit small, compilation speed and size benefits. Averaged over a few of compilation runs I consistently observe faster compilation by a couple of seconds and some smaller pre-processed and compiled object sizes. The main detractor of this method is obviously the number of lines touched. However it also has the benefit of inventorizing all the files that require direct access to the mempool data structures as well getting rid of boost multi index includes in non-kernel implementation files that include the mempool, but don't directly manipulate its data structures (e.g. wallet.cpp).
A much simpler alternative approach, at least on the surface, would be removing all
txmempool.h
includes from kernel library headers (see this branch). Currently this is onlyvalidation.h
. Due to the mutex member ofCTxMemPool
and the correspondingly defined lock decorators on the chainstate methods this becomes a bit more complicated though and I am not sure how this might be possible with the current architecture.A discussion of how and if to remove the mempool from the kernel library has so far been intentionally punted to the next stage of the kernel library development. Pimpling the mempool itself precludes this discussion, since the library could never be shipped with the
CTxMemPool
headers. Pimpling the mempool members (like done in this PR) might also make a future splitting of block and mempool validation logic into separate compilation units easier.