Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve session class to scale better #218

Open
alexlyttle opened this issue Feb 17, 2020 · 3 comments
Open

Improve session class to scale better #218

alexlyttle opened this issue Feb 17, 2020 · 3 comments

Comments

@alexlyttle
Copy link
Collaborator

Enhancement

While recently running PBjam for multiple stars, I have made some adjustments so that session scales better, e.g.

  • Catch errors when downloading the light curve via Lightkurve and remove/flag said target in list
  • Log stars which fail in a csv with a description of the error (I found this super helpful)
  • Add plt.close('all') after each loop in session.__call__ so plots don't consume as much RAM

I am happy to work on putting these and more in PBjam, making some tests for them.

This could also tie in with Issue #86 where I could consider ways of parallelise session rather than run it in multiple jobs.

@grd349
Copy link
Owner

grd349 commented Feb 18, 2020

Hi @alexlyttle,

That sounds great! Would we need a if make_plots: around the plt.close('all') - I'm not sure but @nielsenmb will know.

@nielsenmb
Copy link
Collaborator

I think pt. 1 and 2 might fall under a general logging module? Something along the lines of what's discussed in Issue #175.

It would be nice to have a general log file that catches errors and outputs, but the failed_runs_go_here.csv that you have sounds really good. I think we should have both. Log files are great for debugging individual cases, but csv files are easier to parse for long sessions.

The only issue I can think of is where to store them, right now we're operating on a kind of star-by-star basis. In a session a dir is created for each star, but not for the session. We could just dump it in the path that the user specifies or in the cwd by default?

The plt.close('all') needs some investigation. I've had this issue before and close() doesn't always seem to release memory. Apparently it's a deliberate thing in matplotlib . The solution they propose is creating a single fig object and clearing the axes in that before every use. This might get complicated in PBjam though.

@alexlyttle
Copy link
Collaborator Author

It would be nice to have a general log file that catches errors and outputs, but the failed_runs_go_here.csv that you have sounds really good. I think we should have both. Log files are great for debugging individual cases, but csv files are easier to parse for long sessions.

I think there is definitely room to do logging and a failed runs csv. I've been storing them under the path specified by the user in session.__init__() and appending to the file each time a star files.

The plt.close('all') needs some investigation. I've had this issue before and close() doesn't always seem to release memory. Apparently it's a deliberate thing in matplotlib . The solution they propose is creating a single fig object and clearing the axes in that before every use. This might get complicated in PBjam though.

I have seen suggestions that clearing the figure before closing it is a better idea. If we had this as an option, e.g. close_plots=True, I will test something like this (may be better ways, just an idea):

if make_plots:
    plots = [self.kde.plot_corner, self.kde.plot_spectrum]
    figs = []
    for plot in plots
        figs.append(plot(path=self.path, ID=self.ID, savefig=make_plots)
    
    if close_plots:
        for fig in figs:
            fig.clear()
            plt.close(fig)

The above could be generalised to one function called in run_kde, run_asy_peakbag etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants