
Creating TS for both 'old' and 'new' CSV files is slow #29

Open
Nightsphere opened this issue Jun 29, 2021 · 2 comments
Labels: critical (Priority: highest priority), enhancement, S (Size: day or less)

Nightsphere (Collaborator) commented Jun 29, 2021

Reading in the SnowpackStatisticsByBasin/ CSV files for all 332 basins from the GCP VM, plus the locally created files for the same 332 basins, is very slow. I have tried a few different test setups to debug:

  • Since the GCP files contain about 17 years' worth of daily data, I took some basins out and gradually put them back into the SnowpackStatisticsByBasin/ folder. The results:
    • 100 basins: 57 seconds
    • 150 basins: 1 minute 37 seconds
    • 240 basins: 3 minutes 31 seconds
    • 294 basins: 5 minutes 59 seconds

From what I saw, memory never went above 2.75 GB. The command file is quite small at this point, so there's really not much going on.
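
A quick least-squares fit of the timings above (a minimal Python sketch using just the four reported data points; the exponent is an estimate from these numbers, not a measured property of the tool) suggests run time grows roughly as basins^1.7, i.e. faster than linear but well short of exponential:

```python
import math

# (basin count, run time in seconds) from the tests above.
timings = [(100, 57), (150, 97), (240, 211), (294, 359)]

# Fit log(t) = k*log(n) + c to estimate the scaling exponent k.
xs = [math.log(n) for n, _ in timings]
ys = [math.log(t) for _, t in timings]
count = len(timings)
mx, my = sum(xs) / count, sum(ys) / count
k = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
print(f"estimated scaling exponent k ≈ {k:.2f}")  # ≈ 1.7, superlinear
```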

Nightsphere added the enhancement, critical (Priority: highest priority), and S (Size: day or less) labels on Jun 29, 2021
Nightsphere self-assigned this on Jun 29, 2021
smalers (Contributor) commented Jun 30, 2021

The performance does not seem unreasonable. Although ideally runs should be as fast as possible, processing a lot of data can take time. 332 basins × 3 time series = 996 time series. With 365.25 points/year and 17 years, the 3 time series give 18,628 data points per basin and about 6,184,000 data points total; at 4 bytes per value that is about 24.7 MB just for the time series data (8-byte doubles would be about 49.5 MB). There is other memory being used. The point is that memory should not be a problem.
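
The arithmetic, spelled out (a sketch assuming 4 bytes per stored value, as above):

```python
# Back-of-envelope memory estimate from the numbers above.
basins, series_per_basin = 332, 3
points_per_series = 365.25 * 17                            # daily data for ~17 years
points_per_basin = series_per_basin * points_per_series    # ≈ 18,628
total_points = basins * points_per_basin                   # ≈ 6,184,000
print(f"{total_points:,.0f} points")
print(f"{total_points * 4 / 1e6:.1f} MB at 4 bytes per value")
print(f"{total_points * 8 / 1e6:.1f} MB at 8 bytes (true doubles)")
```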

The increase in run time is not linear, but it is not crazy exponential either. Maybe it is what it is.

There is potential that the VM, or the combination of Linux and Windows, is slow in other ways such as I/O. There may also be some unintended inefficiencies in the command file that can be identified with review. The output may be getting buffered in some weird way, but usually the UI shows steady progress unless one command really is slow.

I suggest working out the details of the time series comparison using fewer stations (even 1 station) and then running the big comparison. I usually add commands to make it easy to switch between the short and long runs.
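
Whether raw CSV I/O is the bottleneck can also be checked outside TSTool with a small harness (a hypothetical Python sketch; the folder path and basin counts are placeholders):

```python
import glob
import time

import pandas as pd

def time_read(folder: str, max_basins: int) -> float:
    """Read up to max_basins CSV files and return elapsed seconds."""
    files = sorted(glob.glob(f"{folder}/*.csv"))[:max_basins]
    start = time.perf_counter()
    for path in files:
        pd.read_csv(path)
    return time.perf_counter() - start

# Start with 1 station, then scale up, mirroring the short/long-run idea.
for n in (1, 10, 100, 332):
    print(f"{n} basins: {time_read('SnowpackStatisticsByBasin', n):.1f} s")
```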

smalers (Contributor) commented Jun 30, 2021

Also, the ProfileCommands command under the Running and Properties menu will track command performance, but maybe I need to add a general performance check.
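
In the meantime, the kind of per-step timing that a profile report gives can be approximated with a trivial wrapper (a generic Python sketch, not TSTool's actual mechanism):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label: str):
    """Print elapsed wall-clock time for the wrapped step."""
    start = time.perf_counter()
    yield
    print(f"{label}: {time.perf_counter() - start:.2f} s")

# Example: time each logical step of a run separately to find the slow one.
with timed("read 'old' CSVs"):
    ...  # read the GCP VM SnowpackStatisticsByBasin/ files here
with timed("read 'new' CSVs"):
    ...  # read the locally created files here
```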
