Out of memory and "load_book" not being inserted #22

Open
martin-sn opened this issue Mar 10, 2021 · 2 comments

martin-sn commented Mar 10, 2021

Hi, once again, thank you very much for the repo.

I have been having an issue where I am unable to export a day of BTC-USD because the process gets killed when it runs out of memory, even with 32 GB of RAM. I commented on #20, but I am now opening a ticket as I have more issues.

I have been trying to modify the code so that I can query and export less than a day's worth of data at a time (to get around the memory problem), and it has been somewhat successful. However, I have an issue with the _query_artic function. If I understand the project correctly, a "load_book" flag should be inserted into the database after every snapshot (e.g. every second with the default settings). But when I query less than a day's worth at a time, no "load_book" is found, and I get an exception that my query contains no data (the start index does not exist because cursor.loc does not find a "load_book").
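To illustrate, this is roughly the failure mode I am hitting (column and value names are my approximations of what _query_artic does, not the exact schema):

```python
import pandas as pd

# A sub-day query window that happens to contain no 'load_book' row:
cursor = pd.DataFrame({'type': ['trades', 'trades'], 'price': [1.0, 2.0]})

# The filter returns an empty frame, so taking its first index
# raises an IndexError ("my query contains no data"):
start = cursor.loc[cursor['type'] == 'load_book'].index[0]
```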

So I have been querying the data directly with commands from Arctic, and in my data it seems that "load_book" is only inserted once, when I start recorder.py, and then never again.

Am I misunderstanding how the project works? Or is there an issue here?

I would be grateful for any help you can provide.

@sadighian
Owner

Hi Martin,

Sorry to hear you're experiencing "out of memory" challenges when exporting recorded data to a CSV. From what you've described, there are no technical problems with the code base or your machine's setup. However, there are a few ways you can modify the code to avoid running out of memory when exporting data.

Your understanding of the data processing and capture is correct: the "load_book" flag is inserted every time a LOB snapshot is processed (i.e., either when the WebSocket (re)connects, or when out-of-sequence messages are received, thereby triggering a LOB snapshot reload).

Two possible solutions come to mind:

  1. Create a trigger that forces a new LOB snapshot load more frequently. By writing more "load_book" flags into your database, you'll have more "checkpoints" to use when reconstructing the LOB for a CSV export. The trigger could be implemented as a counter that fires every n messages, or on a timer every n seconds (see the first sketch after this list).
  2. Insert LOB snapshots directly into the database, as opposed to individual order messages. This approach would reduce your data footprint significantly (i.e., save 86,400 LOB snapshots per day, as opposed to 1-10MM individual order update messages), but it would require you to know the snapshot frequency ahead of time (e.g., once per second) and to (i) create a new MongoDB collection for the LOB snapshot data, (ii) extend the Database class to perform read/write operations for LOB snapshot data, and (iii) update the Recorder class to pass the LOB snapshots to the database (see the second sketch below).
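For solution 1, here is a minimal sketch of a message-count trigger. The method names (new_tick, load_book) are placeholders for whatever the Recorder/OrderBook classes actually expose:

```python
class SnapshotTrigger:
    """Fires every `n` messages so the recorder can force a LOB snapshot
    reload and persist a fresh 'load_book' checkpoint."""

    def __init__(self, n: int = 100_000):
        self.n = n
        self._count = 0

    def should_fire(self) -> bool:
        self._count += 1
        if self._count >= self.n:
            self._count = 0
            return True
        return False


# Usage inside the recorder's message loop (pseudocode; `order_book` is
# assumed to expose a snapshot-reload method):
#
# trigger = SnapshotTrigger(n=100_000)
# for msg in message_stream:
#     order_book.new_tick(msg)
#     if trigger.should_fire():
#         order_book.load_book()  # reload snapshot -> writes 'load_book'
```

And for solution 2, a rough sketch of writing one snapshot per second into a dedicated MongoDB collection (the database/collection names and the render_bids/render_asks accessors are illustrative, not the repo's actual schema):

```python
import time
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
snapshots = client['crypto_rl']['lob_snapshots']  # hypothetical collection

def record_snapshots(order_book, seconds: int) -> None:
    """Persist one LOB snapshot per second for `seconds` seconds."""
    for _ in range(seconds):
        snapshots.insert_one({
            'timestamp': time.time(),
            'symbol': 'BTC-USD',
            'bids': order_book.render_bids(),  # assumed accessor
            'asks': order_book.render_asks(),  # assumed accessor
        })
        time.sleep(1.0)
```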

Hope this answers your question!


martin-sn commented Mar 17, 2021

Hi Sadighian,

Thank you very much for the reply.

Option 1 is the solution I am going for. But shouldn't the load_book flag already be inserted frequently, according to the specified snapshot rate?

I have been looking into creating the trigger. It should be placed here, https://github.com/sadighian/crypto-rl/blob/arctic-streaming-ticks-full/recorder.py#L69, right? But I can't quite figure out which function exactly I should call to insert the load_book flag. Roughly what I am attempting is shown below.
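This is roughly what I have in mind (the final call is my guess; I'm not sure it is the right method to write the flag):

```python
SNAPSHOT_EVERY = 1_000  # force a snapshot reload every n messages

counter = 0
while True:
    msg = queue.get()       # existing message loop in recorder.py
    book.new_tick(msg)      # existing per-message handler (assumed name)
    counter += 1
    if counter >= SNAPSHOT_EVERY:
        counter = 0
        book.load_book()    # <-- is this the right call to insert the flag?
```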

Once again, thank you for the help.
