feat: blockstore: GetMany blockstore method #492

Open
wants to merge 12 commits into
base: main

@i-norden i-norden commented Oct 17, 2023

This PR introduces a GetMany blockstore method to complement the existing PutMany method. It is intended for a GetMany go-ipld-cbor datastore implementation, which in turn supports parallel ForEach methods in both go-amt-ipld and go-hamt-ipld. This PR uses the TxnDatastore keytransform/namespace wrapper introduced in ipfs/go-datastore#210.

GetMany go-ipld-cbor PR that uses this: ipfs/go-ipld-cbor#97

TODO:
Replace the go.mod replace directives if/when the dependency is merged and released

@welcome

welcome bot commented Oct 17, 2023

Thank you for submitting this PR!
A maintainer will be here shortly to review it.
We are super grateful, but we are also overloaded! Help us by making sure that:

  • The context for this PR is clear, with relevant discussion, decisions
    and stakeholders linked/mentioned.

  • Your contribution itself is clear (code comments, self-review for the
    rest) and in its best form. Follow the code contribution guidelines
    if they apply.

Getting other community members to do a review would be great help too on complex PRs (you can ask in the chats/forums). If you are unsure about something, just leave us a comment.
Next steps:

  • A maintainer will triage and assign priority to this PR, commenting on
    any missing things and potentially assigning a reviewer for high
    priority items.

  • The PR gets reviewed, discussed, and approved as needed.

  • The PR is merged by maintainers when it has been approved and comments addressed.

We currently aim to provide initial feedback/triaging within two business days. Please keep an eye on any labelling actions, as these will indicate priorities and status of your contribution.
We are very grateful for your contribution!

Contributor

@Jorropo Jorropo left a comment

Note for future me:
We may also want automatic forwarding using type assertions in the default constructor function.

bs.rehash.Store(enabled)
}

func (bs *getManyBlockStore) Get(ctx context.Context, k cid.Cid) (blocks.Block, error) {
Contributor

I am surprised this function needs to be duplicated; I think you could use embedding here.

Author

@i-norden i-norden Oct 18, 2023

This is possible, but it would require passing a ds.Batching to the constructor function in addition to the ds.TxnDatastore (or some composite interface of the two). ds.TxnDatastore can't be used to construct a regular Blockstore for embedding because it doesn't satisfy the ds.Batching interface, even though the Put and Get methods we would like to fall through to don't use the portions of that interface that it doesn't support...

Author

@i-norden i-norden Oct 18, 2023

Another option would be to break the normal Blockstore up into the components that need ds.Batching and the ones that just need ds.Datastore. ds.TxnDatastore satisfies ds.Datastore, and we only need ds.Datastore to fulfill the Get and Put methods on either blockstore.
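
A minimal sketch of that split, using simplified stand-in types rather than the real go-datastore and blockstore interfaces. The names Datastore, mapStore, basicBlockstore, getManyBlockstore, and demoEmbedding are all hypothetical, and keys are plain strings instead of CIDs. Get and Put live on a base type that needs only a plain datastore, and the GetMany type embeds it so neither method has to be duplicated:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Datastore is a simplified stand-in for ds.Datastore
// (assumption: only Get and Put are modelled, with string keys).
type Datastore interface {
	Get(ctx context.Context, key string) ([]byte, error)
	Put(ctx context.Context, key string, value []byte) error
}

var errNotFound = errors.New("not found")

// mapStore is a toy in-memory Datastore used only for this illustration.
type mapStore map[string][]byte

func (m mapStore) Get(_ context.Context, key string) ([]byte, error) {
	v, ok := m[key]
	if !ok {
		return nil, errNotFound
	}
	return v, nil
}

func (m mapStore) Put(_ context.Context, key string, value []byte) error {
	m[key] = value
	return nil
}

// basicBlockstore holds the methods that need nothing beyond Datastore.
type basicBlockstore struct{ d Datastore }

func (b *basicBlockstore) Get(ctx context.Context, key string) ([]byte, error) {
	return b.d.Get(ctx, key)
}

func (b *basicBlockstore) Put(ctx context.Context, key string, value []byte) error {
	return b.d.Put(ctx, key, value)
}

// getManyBlockstore embeds basicBlockstore, so Get and Put are promoted
// instead of being re-implemented.
type getManyBlockstore struct {
	*basicBlockstore
}

// GetMany fetches each key in order and fails on the first error.
func (g *getManyBlockstore) GetMany(ctx context.Context, keys []string) ([][]byte, error) {
	out := make([][]byte, 0, len(keys))
	for _, k := range keys {
		v, err := g.Get(ctx, k) // promoted from basicBlockstore
		if err != nil {
			return nil, fmt.Errorf("get %q: %w", k, err)
		}
		out = append(out, v)
	}
	return out, nil
}

// demoEmbedding stores two values and reads them back through GetMany.
func demoEmbedding() (string, error) {
	ctx := context.Background()
	bs := &getManyBlockstore{&basicBlockstore{d: mapStore{}}}
	if err := bs.Put(ctx, "a", []byte("1")); err != nil {
		return "", err
	}
	if err := bs.Put(ctx, "b", []byte("2")); err != nil {
		return "", err
	}
	vals, err := bs.GetMany(ctx, []string{"a", "b"})
	if err != nil {
		return "", err
	}
	return string(vals[0]) + string(vals[1]), nil
}

func main() {
	got, err := demoEmbedding()
	fmt.Println(got, err)
}
```

The trade-off is that the base type has to be carved out of the existing Blockstore, which touches more code than a wrapper would.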

Author

Let me know what you think the best/cleanest approach is here

Contributor

I would make it take the union of batching and txn datastores so it has access to everything it needs.
You could also implement a stub, since a transaction provides everything a batch does and more:

Batching datastores support deferred, grouped updates to the database. Batches do NOT have transactional semantics: updates to the underlying datastore are not guaranteed to occur in the same iota of time. Similarly, batched updates will not be flushed to the underlying datastore until Commit has been called. Txns from a TxnDatastore have all the capabilities of a Batch, but the reverse is NOT true.

https://pkg.go.dev/github.com/ipfs/go-datastore#Batching

So if you accept a TxnDatastore, you can use a simple stub that implements batches using transactions. This would make the code work in the only place it's used right now.
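
A rough sketch of such a stub, assuming simplified stand-in interfaces that only mirror the shape of the go-datastore ones (string keys, a trimmed Write interface, and a Get that returns a bool instead of an error). The names txnBatching, memStore, memTxn, and demoBatchStub are hypothetical. The wrapper satisfies a Batch method by handing out a read-write transaction:

```go
package main

import (
	"context"
	"fmt"
)

// Simplified stand-ins for the go-datastore interfaces
// (assumption: string keys, no Delete, Query, or Sync).
type Write interface {
	Put(ctx context.Context, key string, value []byte) error
}

type Batch interface {
	Write
	Commit(ctx context.Context) error
}

type Txn interface {
	Write
	Get(ctx context.Context, key string) ([]byte, bool)
	Commit(ctx context.Context) error
	Discard(ctx context.Context)
}

type TxnDatastore interface {
	NewTransaction(ctx context.Context, readOnly bool) (Txn, error)
}

// txnBatching is the stub: Batch returns a read-write transaction,
// which already offers Put and Commit.
type txnBatching struct{ TxnDatastore }

func (t txnBatching) Batch(ctx context.Context) (Batch, error) {
	txn, err := t.NewTransaction(ctx, false)
	if err != nil {
		return nil, err
	}
	return txn, nil // every Txn here is usable as a Batch
}

// memStore and memTxn form a toy transactional store for the demo.
type memStore struct{ data map[string][]byte }

type memTxn struct {
	store   *memStore
	pending map[string][]byte
}

func (s *memStore) NewTransaction(context.Context, bool) (Txn, error) {
	return &memTxn{store: s, pending: map[string][]byte{}}, nil
}

func (t *memTxn) Put(_ context.Context, key string, value []byte) error {
	t.pending[key] = value // buffered until Commit
	return nil
}

func (t *memTxn) Get(_ context.Context, key string) ([]byte, bool) {
	if v, ok := t.pending[key]; ok {
		return v, true
	}
	v, ok := t.store.data[key]
	return v, ok
}

func (t *memTxn) Commit(context.Context) error {
	for k, v := range t.pending {
		t.store.data[k] = v
	}
	t.pending = map[string][]byte{}
	return nil
}

func (t *memTxn) Discard(context.Context) { t.pending = map[string][]byte{} }

// demoBatchStub reports how many keys are stored before and after Commit.
func demoBatchStub() (before, after int, err error) {
	ctx := context.Background()
	store := &memStore{data: map[string][]byte{}}
	b, err := txnBatching{store}.Batch(ctx)
	if err != nil {
		return 0, 0, err
	}
	if err = b.Put(ctx, "a", []byte("1")); err != nil {
		return 0, 0, err
	}
	if err = b.Put(ctx, "b", []byte("2")); err != nil {
		return 0, 0, err
	}
	before = len(store.data)
	if err = b.Commit(ctx); err != nil {
		return before, 0, err
	}
	return before, len(store.data), nil
}

func main() {
	fmt.Println(demoBatchStub())
}
```

A real wrapper would also need to forward the remaining datastore methods, which this sketch leaves out.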

blockstore/blockstore.go (outdated)
@@ -64,6 +65,13 @@ type Blockstore interface {
HashOnRead(enabled bool)
}

// TxnBlockstore is a blockstore interface that supports GetMany and PutMany methods using ds.TxnDatastore
type TxnBlockstore interface {
Contributor

I think TxnBlockstore is the wrong name because the blockstore does not provide transactions; GetManyBlockstore was fine, maybe.

@codecov

codecov bot commented Oct 23, 2023

Codecov Report

Attention: Patch coverage is 50.00000%, with 42 lines in your changes missing coverage. Please review.

Project coverage is 65.64%. Comparing base (0a566c9) to head (b3ed048).
Report is 124 commits behind head on main.

@@            Coverage Diff             @@
##             main     #492      +/-   ##
==========================================
- Coverage   65.78%   65.64%   -0.15%     
==========================================
  Files         207      203       -4     
  Lines       25156    25385     +229     
==========================================
+ Hits        16549    16663     +114     
- Misses       7147     7235      +88     
- Partials     1460     1487      +27     
Files Coverage Δ
blockstore/blockstore.go 54.95% <50.00%> (-3.02%) ⬇️

... and 42 files with indirect coverage changes

Contributor

@Jorropo Jorropo left a comment

Thanks, I'll try to look at this before the freeze on Friday, but I can't promise anything.

Contributor

@aschmahmann aschmahmann left a comment

Thanks @i-norden 🙏. IIUC this is related to your work landing a usable alternative to filecoin-project/go-hamt-ipld#103 which is great to see.

Left some suggestions.

type GetManyBlockstore interface {
	Blockstore
	PutMany(ctx context.Context, blocks []blocks.Block) error
	GetMany(context.Context, []cid.Cid) ([]blocks.Block, []cid.Cid, error)
Contributor

Two recommendations:

  1. It seems like there's no need to return both []blocks.Block and []cid.Cid, since Block has a .Cid() method: https://github.com/ipfs/go-block-format/blob/v0.2.0/blocks.go#L19-L25
  2. You may want to consider a streaming interface so that you don't have to buffer all the blocks in memory.

If you return an asynchronous object (e.g. a channel or iterator), it might be worth taking a look at ipfs/kubo#4592 to make sure you don't run into some common pitfalls. With Go generics, iterators may now also make this easier than it used to be.
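
One possible shape for a streaming variant is sketched below. It uses a callback rather than a channel or iterator, which is a choice made for this sketch and not something proposed in the thread. Block, memBlockstore, and demoStreaming are hypothetical stand-ins, Key() plays the role of Cid(), and skipping missing keys is an assumption. Each block is handed to the caller as it is read, so nothing is buffered and no separate key slice is returned:

```go
package main

import (
	"context"
	"fmt"
)

// Block is a stand-in for blocks.Block; Key plays the role of Cid().
type Block struct {
	key  string
	data []byte
}

func (b Block) Key() string { return b.key }

// memBlockstore is a toy in-memory store keyed by string.
type memBlockstore map[string][]byte

// GetMany calls fn once per block found, in the order of keys.
// Missing keys are skipped (an assumption of this sketch). Iteration
// stops early if ctx is cancelled or fn returns an error.
func (m memBlockstore) GetMany(ctx context.Context, keys []string, fn func(Block) error) error {
	for _, k := range keys {
		if err := ctx.Err(); err != nil {
			return err
		}
		data, ok := m[k]
		if !ok {
			continue
		}
		if err := fn(Block{key: k, data: data}); err != nil {
			return err
		}
	}
	return nil
}

// demoStreaming collects the keys of the blocks that were streamed back.
func demoStreaming() ([]string, error) {
	bs := memBlockstore{"a": []byte("1"), "b": []byte("2")}
	var seen []string
	err := bs.GetMany(context.Background(), []string{"a", "missing", "b"}, func(b Block) error {
		seen = append(seen, b.Key())
		return nil
	})
	return seen, err
}

func main() {
	seen, err := demoStreaming()
	fmt.Println(seen, err)
}
```

A callback runs on the caller's goroutine, so there is no background goroutine to leak when the consumer stops reading early.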

if len(cs) == 1 {
	// performance fast-path
	block, err := bs.Get(ctx, cs[0])
	return []blocks.Block{block}, nil, err
Contributor

It doesn't make sense to not return the CID here, given that it's in the signature. It also doesn't seem like []cid.Cid needs to be in the return signature at all.
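
A small sketch of a signature without the extra slice, assuming stand-in types (Block, memBlockstore, errNotFound, and demoFastPath are hypothetical, and Key() plays the role of Cid()). The single-key fast path and the general path return the same shape, and callers recover each key from the block itself:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Block is a stand-in for blocks.Block; Key plays the role of Cid().
type Block struct {
	key  string
	data []byte
}

func (b Block) Key() string { return b.key }

var errNotFound = errors.New("block not found")

// memBlockstore is a toy in-memory store keyed by string.
type memBlockstore map[string][]byte

func (m memBlockstore) Get(_ context.Context, key string) (Block, error) {
	data, ok := m[key]
	if !ok {
		return Block{}, errNotFound
	}
	return Block{key: key, data: data}, nil
}

// GetMany returns only blocks; each block carries its own key.
func (m memBlockstore) GetMany(ctx context.Context, keys []string) ([]Block, error) {
	if len(keys) == 1 {
		// Fast path: one lookup, and no partial slice on error.
		b, err := m.Get(ctx, keys[0])
		if err != nil {
			return nil, err
		}
		return []Block{b}, nil
	}
	out := make([]Block, 0, len(keys))
	for _, k := range keys {
		b, err := m.Get(ctx, k)
		if err != nil {
			return nil, err
		}
		out = append(out, b)
	}
	return out, nil
}

// demoFastPath returns the key of a found block and whether a missing
// key is reported as errNotFound.
func demoFastPath() (string, bool) {
	ctx := context.Background()
	bs := memBlockstore{"a": []byte("1")}
	found, err := bs.GetMany(ctx, []string{"a"})
	if err != nil || len(found) != 1 {
		return "", false
	}
	_, err = bs.GetMany(ctx, []string{"missing"})
	return found[0].Key(), errors.Is(err, errNotFound)
}

func main() {
	fmt.Println(demoFastPath())
}
```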

@lidel added the need/maintainers-input (Needs input from the current maintainer(s)) label on Apr 16, 2024
Labels
need/maintainers-input Needs input from the current maintainer(s)
4 participants