[Sim plots] Sim plots use massive storage; so only keep most recent state #1020

Closed
1 task
trentmc opened this issue May 11, 2024 · 2 comments · Fixed by #1075
Labels
Type: Enhancement New feature or request

Comments

@trentmc
Member

trentmc commented May 11, 2024

Background / motivation

In the beginning, sim analytics plots used matplotlib. It didn't save state at all: what you saw were plots built from in-memory data.

Then as part of the matplotlib -> streamlit work #749, the plot data for each sim iteration got stored (as pickles).

It was a big upside: greatly helped UX to have persistent storage of plots.

But there was a downside: a single run can take 10 GB+ of disk space. That's acceptable on its own, but multiple runs can quickly chew up all storage space.

Example from a 2h run:
[Screenshot 2024-05-11 at 08 25 18]

Towards a solution

The way that the plots are constructed, they only need the most recent state. The most recent state holds data from all past iterations. Therefore the sim doesn't need past states.

This means that when a new state file is stored, the previous state's pickle file (.pkl) can be deleted. (But be sure that the new file is stored first.)

TODOs / DoD

  • In sim_engine: once a new state is stored to disk, delete all previous states' pickle files (.pkl).
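
A minimal sketch of the "store first, then delete" ordering described above. This is not the actual sim_engine code: the function name `save_state_and_prune`, the `state_{n}.pkl` file naming, and the directory layout are all assumptions made for illustration.

```python
import pickle
from pathlib import Path


def save_state_and_prune(state: object, state_dir: Path, iter_number: int) -> Path:
    """Write the new state pickle, then delete every older .pkl in state_dir.

    Hypothetical helper: the name, arguments, and file naming are assumptions.
    """
    state_dir.mkdir(parents=True, exist_ok=True)
    new_path = state_dir / f"state_{iter_number}.pkl"  # assumed naming scheme

    # Write to a temporary file first, then rename it into place, so a crash
    # mid-write never leaves a truncated pickle as the only remaining state.
    tmp_path = state_dir / (new_path.name + ".tmp")
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    tmp_path.replace(new_path)

    # Only now that the new file is on disk, remove the older state files.
    for old_path in state_dir.glob("*.pkl"):
        if old_path != new_path:
            old_path.unlink()
    return new_path
```

With this ordering, the directory holds at most one .pkl file after each call, and there is never a moment when no complete state exists on disk.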
@trentmc trentmc added the Type: Enhancement New feature or request label May 11, 2024
@trentmc trentmc changed the title from "[Sim, Analytics] Sim plots use massive storage; so only keep most recent state" to "[Sim plots] Sim plots use massive storage; so only keep most recent state" May 11, 2024
@calina-c
Contributor

calina-c commented May 21, 2024

@trentmc the previous states are used when browsing through with the slider, after finalising the state. I do agree: not all the pickles are needed. We have two options right now:

  • I can work to map previous states in the slider based on the most recent state
  • removing the slider altogether. It was originally added in order to experiment with some streamlit view components, to see if they are instantaneous enough. Then it got ported into Dash, due to inertia + checking to see if components can be used similarly. But that timeline slider is virtually useless for time-based graphs and I've seen few people even know it exists because they don't wait for the sim final state.

What I will do:

  • see if I can easily keep previous state management based on the most recent state.
  • if it takes too much time and effort, I suggest we reconsider the slider and just remove it. There's plenty more cool things to do.
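
For reference, the first option (mapping previous states from the most recent one) could look roughly like the sketch below. It assumes, purely for illustration, that a state is a dict of per-iteration series, one value appended per sim iteration; the real state objects may be structured differently, and `state_at_iteration` is a made-up name.

```python
from typing import Dict, List


def state_at_iteration(
    latest: Dict[str, List[float]], n_iters: int
) -> Dict[str, List[float]]:
    """Rebuild the state as it looked after n_iters iterations.

    Assumes each series in `latest` holds one value per iteration, so an
    earlier state is just a prefix of every series. Hypothetical layout.
    """
    return {name: values[:n_iters] for name, values in latest.items()}


# Made-up numbers, only to show the call shape.
latest_state = {"profit": [1.0, 2.0, 3.0], "accuracy": [0.5, 0.6, 0.7]}
slider_view = state_at_iteration(latest_state, 2)
```

This only works if nothing in a state is overwritten between iterations; any field that is replaced rather than appended could not be recovered this way.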

@trentmc
Member Author

trentmc commented May 21, 2024

removing the slider altogether

Yes, this is the way. We don't need this info, because the current state incorporates all the relevant historical info.

calina-c added a commit that referenced this issue May 22, 2024
* Remove slider and old sim states, keeping just the most recent.
2 participants