
Help needed to figure out asynchronous parallel writing during data stream #2066

Answered by davidbuniat
Alegrowin asked this question in Q&A

Hi @Alegrowin, thanks for posing the question and starting the discussion on this topic. Personally, I am pretty excited about it! Essentially, you are looking for ACID transaction support at the storage level, so that any process can concurrently read and modify the data. We have two options.

  1. Short-term: Currently, Deep Lake is limited to multi-branch concurrent writes. In other words, each process needs to check out its own branch and write its data there, and a final process then merges all branches together. Performance-wise, at the moment this is similar to creating separate datasets and combining them later (without metadata handling). We are happy…
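The branch-per-writer pattern above can be sketched in plain Python. This is a toy simulation of the coordination scheme only (one branch per writer, a single merge step at the end); `ToyVersionedDataset` and its methods are invented for illustration and are not Deep Lake's actual API.

```python
import threading

class ToyVersionedDataset:
    """Hypothetical stand-in for a versioned dataset: branch names map
    to lists of appended samples. Illustrates the coordination pattern,
    not real Deep Lake checkout/commit/merge calls."""

    def __init__(self):
        self.branches = {"main": []}
        self.lock = threading.Lock()  # only the merge step is serialized

    def checkout(self, branch, create=False):
        if create:
            self.branches[branch] = []
        return self.branches[branch]

    def merge_into_main(self, branch):
        with self.lock:
            self.branches["main"].extend(self.branches[branch])

def writer(ds, worker_id, samples):
    # Each worker writes only to its own branch, so writers never conflict.
    data = ds.checkout(f"worker-{worker_id}", create=True)
    data.extend(samples)

ds = ToyVersionedDataset()
threads = [
    threading.Thread(target=writer, args=(ds, i, [i * 10 + j for j in range(3)]))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# A single final process merges every branch back into main.
for i in range(4):
    ds.merge_into_main(f"worker-{i}")

print(len(ds.branches["main"]))  # 12 samples total, 3 from each of 4 writers
```

The key property being illustrated is that no coordination is needed during the write phase; only the merge at the end touches shared state.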

Answer selected by mikayelh