[Question] How to produce dataset with millions of data in batches #677

Open
Q-Bug4 opened this issue Apr 3, 2024 · 0 comments


Q-Bug4 commented Apr 3, 2024

Hi, we love using Hollow; it is very nice.

I would like to know if there is a proper way to produce data in batches. For example, I have 10 million objects to produce, and I want to split them into 10 parts and produce 1 million objects at a time. I need to produce the data in batches because my VM does not have enough memory to hold all 10 million objects at once.

I am using Incremental with withNumStatesBetweenSnapshots so that a snapshot is published only at the beginning and at the end, which makes it behave as if it runs "in batches". However, I ran into a problem: sometimes the Incremental producer does not publish the dataset, because some batches do not change it.
I have forked hollow-reference-implementation and added 2 test cases to show what we are looking for. You can check my test cases: ProducerTest
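
To make the symptom concrete, here is a minimal plain-Java sketch of the batching pattern described above. It does not use the Hollow API: `CycleRunner` and `ToyProducer` are hypothetical stand-ins for a producer cycle, and the "publish only when something changed" rule is an assumption modelled on the behaviour reported in this issue. The batch sizes are scaled down so it runs quickly.

```java
import java.util.HashMap;
import java.util.Map;

public class BatchedProduceSketch {

    /** Hypothetical stand-in for one producer cycle; NOT part of Hollow. */
    interface CycleRunner {
        /** Returns true if the batch changed the dataset and a state was published. */
        boolean runCycle(Map<Integer, String> batch);
    }

    /** Toy in-memory dataset that publishes a new state only when a batch changes it. */
    static class ToyProducer implements CycleRunner {
        final Map<Integer, String> dataset = new HashMap<>();
        int publishedStates = 0;

        @Override
        public boolean runCycle(Map<Integer, String> batch) {
            boolean changed = false;
            for (Map.Entry<Integer, String> e : batch.entrySet()) {
                String previous = dataset.put(e.getKey(), e.getValue());
                if (!e.getValue().equals(previous)) {
                    changed = true;
                }
            }
            if (changed) {
                publishedStates++; // assumption: unchanged batches publish nothing
            }
            return changed;
        }
    }

    /**
     * Feeds `total` records to the runner in chunks of `batchSize`, so only one
     * chunk is held in memory at a time. Returns how many batches published nothing.
     */
    static int produceInBatches(int total, int batchSize, CycleRunner runner) {
        int skipped = 0;
        for (int start = 0; start < total; start += batchSize) {
            int end = Math.min(start + batchSize, total);
            Map<Integer, String> batch = new HashMap<>();
            for (int id = start; id < end; id++) {
                batch.put(id, "value-" + id);
            }
            if (!runner.runCycle(batch)) {
                skipped++;
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        ToyProducer producer = new ToyProducer();
        int skippedFirst = produceInBatches(100, 10, producer);  // every batch is new data
        int skippedSecond = produceInBatches(100, 10, producer); // identical data again
        System.out.println("skipped in first run: " + skippedFirst);
        System.out.println("skipped in second run: " + skippedSecond);
        System.out.println("published states: " + producer.publishedStates);
    }
}
```

In this toy model, any batch whose records are identical to what is already in the dataset produces no new state, which is the situation described above where a batch "does not change the dataset". Whether Hollow can be configured to behave differently is exactly the open question here.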
