Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Remove boost usage from kernel api / headers #28335

Draft
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

TheCharlatan
Copy link
Contributor

@TheCharlatan TheCharlatan commented Aug 24, 2023

Similarly to #28327 I wanted to open this PR to receive some opinions and better approaches.

The kernel library is currently at the stage where unwanted headers are removed from its set of headers. In practice, this means we are reducing the number of includes that are required for compiling the experimental bitcoin-chainstate binary. This is described in stage 1 step 3 of the project tracking issue.

Currently the mempool is part of the kernel library. The mempool headers include the boost multi index headers. Thus any application wanting to use the kernel library and its headers will have to include the boost headers too. This is not only undesirable because of the sheer size of these headers, but also might lead to conflicts if the including application uses a different boost version.

In the approach laid out by this PR, mempool member variables and methods are declared in the header without having to include boost by either wrapping them in a struct and pimpling them, or making methods static implementation functions. The boost definitions are gathered into separate header (mempool_set_definitions.h) that is only included by implementation files that require definitions of the boost types. This allows us to retain the current architecture with roughly the same interfaces.

The approach laid out by this PR also has some, albeit small, compilation speed and size benefits. Averaged over a few of compilation runs I consistently observe faster compilation by a couple of seconds and some smaller pre-processed and compiled object sizes. The main detractor of this method is obviously the number of lines touched. However it also has the benefit of inventorizing all the files that require direct access to the mempool data structures as well getting rid of boost multi index includes in non-kernel implementation files that include the mempool, but don't directly manipulate its data structures (e.g. wallet.cpp).

A much simpler alternative approach, at least on the surface, would be removing all txmempool.h includes from kernel library headers (see this branch). Currently this is only validation.h. Due to the mutex member of CTxMemPool and the correspondingly defined lock decorators on the chainstate methods this becomes a bit more complicated though and I am not sure how this might be possible with the current architecture.

A discussion of how and if to remove the mempool from the kernel library has so far been intentionally punted to the next stage of the kernel library development. Pimpling the mempool itself precludes this discussion, since the library could never be shipped with the CTxMemPool headers. Pimpling the mempool members (like done in this PR) might also make a future splitting of block and mempool validation logic into separate compilation units easier.

@DrahtBot
Copy link
Contributor

DrahtBot commented Aug 24, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK theuni

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #28960 (kernel: Remove dependency on CScheduler by TheCharlatan)
  • #28948 (v3 transaction policy for anti-pinning by glozow)
  • #28368 (Fee Estimator updates from Validation Interface/CScheduler thread by ismaelsadeeq)
  • #26593 (tracing: Only prepare tracepoint arguments when actually tracing by 0xB10C)
  • #25038 (policy: nVersion=3 and Package RBF by glozow)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@ryanofsky
Copy link
Contributor

This is a pretty big change, but looking over it, it seems all straightforward and mechanical, and should not be very hard to review.

I'd be interested in opinions from developers who work more with closely with mempool code about whether this improves the mempool code, or if it creates any issues.

My opinion is would be that if this improves mempool code it should be merged, and if it makes mempool code more complicated or harder to maintain, it should not be merged.

I don't think the stated goals of this PR are very compelling on their own if they are just about speeding up compilation times, or making it easier to compile the kernel library and kernel applications with different versions of boost. (I doubt there are a lot of applications that would opt for using multiple versions of boost in the same build, and if there are it would seem more straightforward for them to put an #include boundary between their application code and libbitcoin_kernel than would be is for this PR to put a boundary between mempool code and the rest of the kernel.)

But despite the large diff and long description, the changes in the PR do seem pretty straightforward, so I don't think the benefits need to be that big to justify the changes.

@theuni
Copy link
Member

theuni commented Aug 30, 2023

Concept ACK. I looked at an early version of this and thought generally the same: it's mostly mechanical and not too bad to review.

My opinion is would be that if this improves mempool code it should be merged, and if it makes mempool code more complicated or harder to maintain, it should not be merged.

Generally agree. I'd also like to hear from some mempool folks. @glozow ?

making it easier to compile the kernel library and kernel applications with different versions of boost. (I doubt there are a lot of applications that would opt for using multiple versions of boost in the same build, and if there are it would seem more straightforward for them to put an #include boundary between their application code and libbitcoin_kernel than would be is for this PR to put a boundary between mempool code and the rest of the kernel.)

I think this glosses over the larger result of this work: boost is now not a requirement for using libbitcoinkernel at all. It removes our last external dependency. That's a huge achievement imo. Barring us vendoring Boost, it's the difference between: "take this lib and build a hello world app in your point-and-click IDE" to "use this docker image which includes our recommended version of boost headers to build from a script". I know, I know, the latter is an exaggeration and wouldn't be the end of the world. But without boost, as-is right now, we'd be able to ship a libbitcoinkernel that depends on only these headers (though obviously paring down that list is a next step) and nothing from the outside world. To me that's very significant.

But despite the large diff and long description, the changes in the PR do seem pretty straightforward, so I don't think the benefits need to be that big to justify the changes.

+1 though as stated above, to me the benefits are that big :)

@theuni
Copy link
Member

theuni commented Aug 30, 2023

PR title is misleading btw. Boost is not removed from the kernel, it's removed from the kernel's api.

I think conceptually this is the commit that really demonstrates the change here: bitcoin-chainstate no longer has any external dependencies.

@TheCharlatan TheCharlatan changed the title RFC: Remove boost usage from kernel RFC: Remove boost usage from kernel api / headers Aug 30, 2023
Copy link
Member

@glozow glozow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boost is now not a requirement for using libbitcoinkernel at all. It removes our last external dependency. That's a huge achievement imo.

Sounds pretty cool to me.

I'm also quite interested in this advertised benefit:

Pimpling the mempool members (like done in this PR) might also make a future splitting of block and mempool validation logic into separate compilation units easier.

My opinion is would be that if this improves mempool code it should be merged, and if it makes mempool code more complicated or harder to maintain, it should not be merged.

I think the size of the diff is a symptom of lots of code external to txmempool knowing a bit too much about how txmempool is implemented, i.e. accessing mapTx when it should probably use a different CTxMemPool method to get the information it needs. Adding .impl in those areas seems a bit ugly tbh - I think a cleaner approach would be to have those bits of code not use txiter/mapTx/setEntries. Apart from that, it seems generally good to move more of the implementation inward. I just had 1 comment about UpdateForDescendants.

This branch refactors some of these mapTx accesses to use things like infoAll() and GetIter() and size(), which imo are the more appropriate functions to be calling there. The result is that a lot of things don't need to be pimpl'd anymore or fewer places are using the pimpl'd data structures. I rebased this PR on top of that branch, which resulted in 7 commits being unnecessary and the pimpling commits touching fewer files. The overall diff isn't that much smaller, but I think it achieves the intended result + independently nice changes.

I sorted the commits from most trivial to least trivial - the first ones are very straightforward things like "call pool.size() instead of pool.mapTx.size()." The last part re-implements DisconnectedBlockTransactions to not need a boost multi index container, which I suppose could be controversial. Would you maybe be interested in taking any of these changes?

* @param[in,out] mapTx a reference to the complete mempool map.
* @param[in] limits the mempool limits configuration.
*/
static void UpdateForDescendants(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e59f925
Instead of making this static, perhaps you could inline this into UpdateTransactionsFromBlock? Dropping the thread-safety annotations is a bit of a negative imo.

@ryanofsky
Copy link
Contributor

I think this glosses over the larger result of this work: boost is now not a requirement for using libbitcoinkernel at all.

Yes this is a distinction I glossed over and didn't think much about. With this PR, boost is still a requirement for building libbitcoinkernel, but not a requirement for using it.

But I don't understand what the big advantage is (other than the ones already mentioned about speeding up build times or mixing versions of boost). And I don't undestand or what you are saying about the docker image and IDE. To my mind there are two type of developers who might want to use libbitcoinkernel:

  1. Developers who will want to use a packaged version of libbitcoinkernel in the form of a vcpkg, conan, nix, or wherever other package.
  2. Developers who will take a more DIY approach and put libbitcoinkernel in a vendor branch or monorepo or something like that.

The effect of this PR seems negligible in both of this cases, probably resulting in literal 1-line change like fd49dc9 removing a path from the include list.

So I'm guessing maybe the thing you are enthusiastic about is supporting a third type of developer who (1) doesn't want to use a package and (2) doesn't want to build from source, but instead wants to download a tarball containing a precompiled libbitcoinkernel library binary and some headers, and use that. But the difference is pretty small in this case too, no? Without this PR the tarball needs to contain a few boost header files, and with this PR, it doesn't.

In any case, I'm happy about the benefits of this PR, but just think the main consideration here should be whether it improves the mempool code or not, and glad to see glozow had some suggestions for simplifying it.

@theuni
Copy link
Member

theuni commented Aug 30, 2023

In any case, I'm happy about the benefits of this PR, but just think the main consideration here should be whether it improves the mempool code or not, and glad to see glozow had some suggestions for simplifying it.

I think we should leave it at this. I'll admit to complete tunnel-vision on this particular boost removal. It's been an obsession a goal of mine for long enough that I'm probably too biased to judge the code changes required for it. And as you said, I don't think i need to belabor that point anyway when there's other merit to the changes.

@glozow thanks for taking a look! Seems getting rid of some of the direct uses of mapTx would be a nice cleanup even outside of this PR?

@TheCharlatan
Copy link
Contributor Author

TheCharlatan commented Aug 30, 2023

Re #28335 (review)

Thank you for taking so much consideration for this change! I think your additional patches are well worth reducing the amount of code that has to directly access the boost data structure from outside of the CTxMemPool class.

Would you maybe be interested in taking any of these changes?

Most of the changes (as you mentioned) should be uncontroversial, e.g. just directly using existing helpers, or the vTxHashes changes. I'll gladly integrate these here.

I think the DisconnectedBlockTransactions change would merit its own PRs, since it does change the internal behavior and performance somewhat.

The other two cases I am not sure about is using infoAll() (as done here) to create a vector of TxMemPoolInfo instead of directly iterating over mapTx and construction of a new GenTxid here. In both cases it would be fairly straight forward to replace them with new CTxMemPool helper methods if they prove too controversial.

@TheCharlatan
Copy link
Contributor Author

Updated with @glozow's patches. I think these are well worth it for both cleaning up the code to not needlessly reach into complex mempool member variables as well as reducing the number of source files that need to include the multiindex headers.

The order of the commits was slightly changed, the heavier DisconnectedBlockTransactions refactor is now the first commit. As a further approach I think two separate PRs should be opened; one for the heavier DisconnectedBlockTransactions and related changes as well as another one for the easier refactors. Once these are processed, this current PR can be rebased and opened for code review.

@DrahtBot DrahtBot removed the CI failed label Sep 1, 2023
src/txmempool.h Outdated Show resolved Hide resolved
@@ -890,7 +890,7 @@ bitcoin_util_LDADD = \

# bitcoin-chainstate binary #
bitcoin_chainstate_SOURCES = bitcoin-chainstate.cpp
bitcoin_chainstate_CPPFLAGS = $(AM_CPPFLAGS) $(BITCOIN_INCLUDES) $(BOOST_CPPFLAGS)
bitcoin_chainstate_CPPFLAGS = $(AM_CPPFLAGS) $(BITCOIN_INCLUDES)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW this won't have any effect on c-i currently because boost is installed into the common depends prefix.

As a quick hack to demonstrate the functionality of this PR, you could add: theuni@e131d8f and drop it before merge. That puts boost in a different path, so BOOST_CPPFLAGS becomes a requirement as-intended. (We should also fix that for good in a less hackish way.)

As far as I can tell this PR should currently fail with the above patch because the removal is not yet done as mentioned here: bdbc509#r1313092304

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a quick hack to demonstrate the functionality of this PR, you could add: theuni@e131d8f and drop it before merge. That puts boost in a different path, so BOOST_CPPFLAGS becomes a requirement as-intended.

IIRC, this issue is recurring, no?

We should also fix that for good in a less hackish way.

Agree with it :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, yes, I think this is the 3rd time we've needed this to demonstrate a boost dependency removal. Will do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this was covered by CI until we dropped the native macOS ARM builds.

@glozow
Copy link
Member

glozow commented Sep 1, 2023

As a further approach I think two separate PRs should be opened; one for the heavier DisconnectedBlockTransactions and related changes as well as another one for the easier refactors.

The other two cases I am not sure about is using infoAll() (as done here) to create a vector of TxMemPoolInfo instead of directly iterating over mapTx and construction of a new GenTxid here. In both cases it would be fairly straight forward to replace them with new CTxMemPool helper methods if they prove too controversial.

Opened #28385 with the less straightforward changes.

Maytow

This comment was marked as spam.

Currently the mempool returns and consumes sets of multiindex iterators
in its public API. A likely motivation for this over working with
references to the actual values is that the multi index interface works
with these iterators and not with pointers or references to the actual
values.

However, the iterator type provides little benefit in practice as
applied currently. Its purpose, ownership, and safety semantics often
remain ambiguous, and it is hardly used for actually iterating through
the data structures. So replace it where possible with
CTxMemPoolEntryRefs.

Since CTxMemPoolEntry itself refers to its Parents and Children by
CTxMemPoolEntryRef and not txiter, this allowed for an overall
reduction of calls to iterator_to.

This also makes the long term goal of eliminating boost types from the
headers more feasible.
Introduces mempool_set_definitions.h which collects the boost types that
were previously defined in the txmempool header.

The context of this change is an effort towards removing the boost multi
index includes from the header tree. The goal is that the experimental
`bitcoin-chainstate` binary can be compiled without requiring boost
includes.
They need not be in the header, so move them to the implementation.
The context of this change is an effort towards removing the boost multi
index includes from the header tree. The goal is that the experimental
`bitcoin-chainstate` binary can be compiled without requiring boost
includes.
The context of this change is an effort towards removing the boost multi
index includes from the header tree. The goal is that the experimental
`bitcoin-chainstate` binary can be compiled without requiring boost
includes.
The context of this change is an effort towards removing the boost multi
index includes from the header tree. The goal is that the experimental
`bitcoin-chainstate` binary can be compiled without requiring boost
includes.
This allows them to use the mapTx without including the full mapTx
definition in the header.

The context of this change is an effort towards removing the boost multi
index includes from the header tree. The goal is that the experimental
`bitcoin-chainstate` binary can be compiled without requiring boost
includes.
This removes the need for defining its complex boost multi index return
type in the header.

The context of this change is an effort towards removing the boost multi
index includes from the header tree. The goal is that the experimental
`bitcoin-chainstate` binary can be compiled without requiring boost
includes.
This missing include was found in the following commit.
All the header files should now no longer contain boost multi index
includes. Only the implementation files directly include the multi index
types for manipulating mempool data structures.

The context of this change is an effort towards removing the boost multi
index includes from the header tree. The goal is that the experimental
`bitcoin-chainstate` binary can be compiled without requiring boost
includes.
This is possible as of the previous commit.
@DrahtBot
Copy link
Contributor

DrahtBot commented Dec 1, 2023

🐙 This pull request conflicts with the target branch and needs rebase.

@DrahtBot
Copy link
Contributor

⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

None yet

9 participants