This repository has been archived by the owner on Mar 17, 2021. It is now read-only.

Optimize file streaming #51

Open
WilfredTA opened this issue Apr 9, 2018 · 0 comments
Labels
Optimization Space or time complexity improvement

Comments

@WilfredTA
Member

Currently, the space complexity of transferring a shard from one node to another scales linearly with the shard's size: peak memory usage (space complexity) = O(bytesInShard)

We can push peak memory usage down to constant space complexity by using something like Node.js's stream pipe() method.

Instead of fs.readFile, we can pipe a read stream directly into a TCP stream.

The JSONStream library we are using solves the problem of large JSON objects being split across multiple chunks when transferred over streams. It appears to do this by holding the partial JSON in memory and delaying the data event on the JSONStream until it has received a full JSON object. If so, JSONStream's peak memory usage also scales linearly with the size of the JSON object being sent to it. I need to verify this suspicion against their source code, though.
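The underlying constraint can be seen without the library itself: a JSON object split across two stream chunks is not parseable chunk-by-chunk, so any streaming parser has to buffer until the object is complete (example values are made up):

```javascript
// A JSON object split across two TCP chunks is not valid JSON per chunk:
const chunk1 = '{"shardId": 7, "data": "abc';
const chunk2 = 'def"}';

let partialParseFailed = false;
try {
  JSON.parse(chunk1); // incomplete object: throws SyntaxError
} catch (e) {
  partialParseFailed = true;
}

// A streaming parser must hold the partial object in memory until the
// closing brace arrives, so its peak memory grows with the object size:
const whole = JSON.parse(chunk1 + chunk2);
console.log(partialParseFailed, whole.data); // → true abcdef
```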

The problem with piping smaller JSON objects that each contain a portion of the total shard data is that the shard data must be written in the order it was read, which is hard to manage when the writes happen inside event handlers.

Essentially, we need to write multiple chunks that may arrive out of order, without storing all of them in memory.

@WilfredTA WilfredTA added the Optimization Space or time complexity improvement label Apr 9, 2018