[FEATURE] Option to disable auto commit after data ingestion #2521

Open
HyunggyuJang opened this issue Aug 7, 2023 · 1 comment

@HyunggyuJang

Description

Currently, a new version is created on every data ingestion by the following call:

self.dataset.commit(allow_empty=True)

It seems that every commit captures the full current state of the dataset as a new version. So if the user commits often, the storage consumed by the versions grows rapidly.

This becomes a problem when the user ingests small amounts of data incrementally: the dataset is almost identical from one version to the next, so the space is used inefficiently.
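To make the concern concrete, here is a rough back-of-the-envelope sketch. It assumes that each commit really does store a full copy of the dataset, which I have not verified against the codebase. The function names and the abstract "units" are made up for illustration only.

```python
def snapshot_storage(num_commits: int, increment_size: int) -> int:
    """Total units stored if every commit keeps a full copy of the dataset.

    Assumption (unverified): a commit snapshots the whole current state.
    """
    # After the k-th ingestion the dataset holds k * increment_size units,
    # and a full-snapshot commit would store all of them again.
    return sum(k * increment_size for k in range(1, num_commits + 1))


def diff_storage(num_commits: int, increment_size: int) -> int:
    """Total units stored if every commit keeps only the newly added data."""
    return num_commits * increment_size


if __name__ == "__main__":
    # Compare the two schemes for a growing number of small ingestions.
    for n in (10, 100, 1000):
        print(n, snapshot_storage(n, 1), diff_storage(n, 1))
```

Under that assumption, full snapshots grow quadratically with the number of ingestions while diffs grow linearly, which is why many small ingestions would be the worst case.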

The canonical solution would be to store only the diff for each version, but as I'm not acquainted with the codebase, I don't know whether that is feasible.

So instead, I suggest offering an option that lets users choose whether or not to "auto-commit" when ingesting data.
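A minimal sketch of what such an option could look like. Everything here is hypothetical: `ToyDataset`, `ToyIngestor`, and the `auto_commit` parameter are stand-ins invented for illustration and are not the library's actual classes or API. Only the `commit(allow_empty=True)` call mirrors the line quoted above.

```python
from typing import Iterable, List


class ToyDataset:
    """Hypothetical stand-in for the real dataset; it only counts commits."""

    def __init__(self) -> None:
        self.rows: List[str] = []
        self.commit_count = 0

    def append(self, row: str) -> None:
        self.rows.append(row)

    def commit(self, allow_empty: bool = False) -> None:
        # A real commit would create a new version; here we just count it.
        self.commit_count += 1


class ToyIngestor:
    """Hypothetical ingestion wrapper with an opt-out for the automatic commit."""

    def __init__(self, dataset: ToyDataset, auto_commit: bool = True) -> None:
        self.dataset = dataset
        # Defaulting to True keeps the current behaviour for existing users.
        self.auto_commit = auto_commit

    def ingest(self, rows: Iterable[str]) -> None:
        for row in rows:
            self.dataset.append(row)
        if self.auto_commit:
            self.dataset.commit(allow_empty=True)


if __name__ == "__main__":
    ds = ToyDataset()
    ingestor = ToyIngestor(ds, auto_commit=False)
    for batch in (["a"], ["b"], ["c"]):
        ingestor.ingest(batch)  # no version is created per batch
    ds.commit()  # the user commits once, when they choose to
    print(len(ds.rows), ds.commit_count)
```

With the flag off, many small ingestions can be followed by a single explicit commit, so the number of versions is under the user's control.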

Use Cases

No response

@HyunggyuJang HyunggyuJang added the enhancement New feature or request label Aug 7, 2023
@FayazRahman
Contributor

Hey @HyunggyuJang, thanks a lot for raising the issue. We're already working on this, and I'll be sure to let you know when the updates are released.
