fdw: Add foreign data wrappers to read snapshots on s3, gcs or azure #15718

Open
2 tasks
mfussenegger opened this issue Mar 19, 2024 · 0 comments
Labels
complexity: 5-8 · feature: cold store · feature: fdw (Foreign data wrapper) · needs upvotes (Please use the reaction feature on the issue to signal your interest. This helps us prioritize.)

mfussenegger commented Mar 19, 2024

Problem Statement

Keeping all data forever in CrateDB can get expensive.
Deleting the data isn't an option because it might still be needed, even though it is rarely queried.

Having the option to use cheaper storage at the expense of query performance would be nice.

Possible Solutions

Backups already exist in the form of snapshots. These snapshots could be exposed via a foreign data wrapper/foreign table so that they can be queried ad hoc.

Advantages:

  • No need for additional data exports
  • Infrastructure is already set up if there are backups
  • Snapshots contain segment files, so we might be able to use the Lucene structures to filter more efficiently than with alternative file formats.
  • Full control over optimization possibilities, as we control the format.

Downsides:

  • Snapshot format compatibility across major version updates (?)
  • Others?

Considered Alternatives

Technical constraints

Initial Scope (estimate is only for this)

  • Simple but slow version: no caching of downloaded data; always reads from remote as needed; no attempts at operation push-down or partial result retrieval.
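
To make the cost of this initial scope concrete, here is a minimal Python sketch of the access pattern it implies. It is not CrateDB code: `InMemoryBlobStore`, `naive_scan`, and the `shard-*` names are hypothetical stand-ins for a remote bucket and a foreign table scan. Every query downloads every blob again and filters locally.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, int]


class InMemoryBlobStore:
    """Hypothetical stand-in for an s3/gcs/azure bucket holding snapshot data."""

    def __init__(self, blobs: Dict[str, List[Row]]) -> None:
        self._blobs = blobs
        self.downloads = 0  # counts remote reads to make the cost visible

    def list_blobs(self) -> List[str]:
        return sorted(self._blobs)

    def download(self, name: str) -> List[Row]:
        self.downloads += 1
        return list(self._blobs[name])


def naive_scan(
    store: InMemoryBlobStore, predicate: Callable[[Row], bool]
) -> Iterator[Row]:
    """Fetch every blob on every call and filter locally (no cache, no push-down)."""
    for name in store.list_blobs():
        for row in store.download(name):
            if predicate(row):
                yield row


store = InMemoryBlobStore({
    "shard-0": [{"id": 1, "temp": 20}, {"id": 2, "temp": 31}],
    "shard-1": [{"id": 3, "temp": 35}],
})

# Running the same query twice downloads both blobs twice.
hot = list(naive_scan(store, lambda r: r["temp"] > 30))
again = list(naive_scan(store, lambda r: r["temp"] > 30))
```

In this sketch the download counter grows with the number of blobs times the number of queries, which is the trade-off the initial scope accepts in exchange for simplicity.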

Follow up (for later dedicated issues, not included in the first implementation)

  • Download only the required data to minimize traffic, and maybe add a cache for the remote data to avoid repeated downloads
  • Index utilization: retrieve metadata/index files, then fetch only the required documents.
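
The two follow-up ideas above could look roughly like the following Python sketch. All names (`RemoteStore`, `CachingReader`, `scan_greater_than`) are hypothetical, and per-blob min/max statistics are used here only as a simple substitute for the Lucene index structures mentioned earlier. The reader first consults the small metadata, skips blobs that cannot match, and caches whatever it does download.

```python
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, int]


class RemoteStore:
    """Hypothetical remote bucket: rows per blob plus small min/max metadata."""

    def __init__(self, blobs: Dict[str, List[Row]], column: str) -> None:
        self._blobs = blobs
        self.column = column
        self.downloads = 0  # counts full blob downloads only

    def metadata(self) -> Dict[str, Tuple[int, int]]:
        # Assumed to be cheap compared with downloading the rows themselves.
        return {
            name: (
                min(r[self.column] for r in rows),
                max(r[self.column] for r in rows),
            )
            for name, rows in self._blobs.items()
        }

    def download(self, name: str) -> List[Row]:
        self.downloads += 1
        return list(self._blobs[name])


class CachingReader:
    """Prunes blobs via metadata and keeps downloaded blobs in a local cache."""

    def __init__(self, store: RemoteStore) -> None:
        self._store = store
        self._cache: Dict[str, List[Row]] = {}

    def scan_greater_than(self, threshold: int) -> Iterator[Row]:
        column = self._store.column
        for name, (_low, high) in sorted(self._store.metadata().items()):
            if high <= threshold:
                continue  # no row in this blob can match; skip the download
            if name not in self._cache:
                self._cache[name] = self._store.download(name)
            for row in self._cache[name]:
                if row[column] > threshold:
                    yield row


store = RemoteStore(
    {
        "shard-0": [{"id": 1, "temp": 20}, {"id": 2, "temp": 25}],
        "shard-1": [{"id": 3, "temp": 35}],
    },
    column="temp",
)
reader = CachingReader(store)
first = list(reader.scan_greater_than(30))
second = list(reader.scan_greater_than(30))
```

Here only `shard-1` is ever downloaded, and only once; the second query is served from the cache. A real implementation would also need cache eviction and invalidation, which this sketch leaves out.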