add a gpu scaling job with diagnostics #2852

Open
wants to merge 1 commit into base: main

Conversation

szy21
Member

@szy21 szy21 commented Mar 28, 2024

Purpose

To-do

Content


  • I have read and checked the items on the review checklist.

@szy21 szy21 marked this pull request as draft March 28, 2024 18:20
@szy21
Member Author

szy21 commented Mar 28, 2024

gpu build: https://buildkite.com/clima/climaatmos-target-gpu-simulations/builds/251

The simulation runs for one day, with the default daily-averaged output. The SYPD values at the end of the simulation with and without diagnostics are 0.54 and 0.46, respectively. The SYPD values reported during time stepping are similar to each other. @Sbozzolo @charleskawczynski Do you think it's useful to add a job like this to the gpu scaling pipeline?
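For readers less familiar with the metric, SYPD (simulated years per day) relates simulated time to wall-clock time. The snippet below is a minimal sketch of that conversion, not the code used by this pipeline: the function names are hypothetical and the 365.25-day year is an assumption. Under that assumption, one simulated day at 0.54 and 0.46 SYPD corresponds to roughly 438 s and 514 s of wall-clock time.

```python
SECONDS_PER_DAY = 86_400.0
DAYS_PER_YEAR = 365.25  # assumption: the model may define a year differently


def sypd(simulated_seconds: float, wall_seconds: float) -> float:
    """Simulated years per wall-clock day (hypothetical helper)."""
    simulated_years = simulated_seconds / (SECONDS_PER_DAY * DAYS_PER_YEAR)
    wall_days = wall_seconds / SECONDS_PER_DAY
    return simulated_years / wall_days


def wall_seconds_for(simulated_seconds: float, sypd_value: float) -> float:
    """Wall-clock seconds needed to simulate `simulated_seconds` at a given SYPD."""
    return simulated_seconds / (sypd_value * DAYS_PER_YEAR)


if __name__ == "__main__":
    one_day = SECONDS_PER_DAY
    for value in (0.54, 0.46):
        print(f"SYPD {value}: {wall_seconds_for(one_day, value):.0f} s per simulated day")
```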

@szy21 szy21 marked this pull request as ready for review March 28, 2024 20:24
@Sbozzolo
Member

In #2646, I was trying to add a job like this, but also producing the flame graph, so that we have an actionable table of parts to optimize. However, I am running into limits for ProfileCanvas, and the HTML cannot be rendered because it has too many entries.

The difference in SYPD during runtime tells us that the online SYPD is not computed correctly for the last step. The SYPD is computed at the beginning of the step, so it does not account for the time spent in saving the output in the last step.
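The effect described above can be illustrated with a toy time-stepping loop. This is only a sketch under stated assumptions: the loop, the cost values, and the function name are hypothetical, the wall clock is simulated rather than measured, and the 365.25-day year is an assumption. It is not how the actual SYPD callback is implemented. It shows that a value sampled at the beginning of the last step cannot include the cost of output written during that step, so it overestimates the end-of-run SYPD.

```python
DAYS_PER_YEAR = 365.25  # assumption: the model may define a year differently


def run(n_steps: int, dt_sim: float, step_cost: float, output_cost: float):
    """Return (SYPD sampled at the start of the last step, SYPD after the run).

    `step_cost` and `output_cost` are made-up wall-clock costs in seconds.
    """
    wall = 0.0
    online = None
    for step in range(n_steps):
        # Sampled at the beginning of the step: only completed steps count.
        if step > 0:
            online = (step * dt_sim) / (DAYS_PER_YEAR * wall)
        wall += step_cost
        if step == n_steps - 1:
            # Averaged output is written during the last step.
            wall += output_cost
    final = (n_steps * dt_sim) / (DAYS_PER_YEAR * wall)
    return online, final


if __name__ == "__main__":
    online, final = run(n_steps=100, dt_sim=100.0, step_cost=1.0, output_cost=20.0)
    print(f"online SYPD at last step: {online:.3f}, end-of-run SYPD: {final:.3f}")
```

With zero output cost the two values agree; any output cost in the last step lowers only the end-of-run value.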

Is this job representative of the ideal job you want to run? Does it have all the physics you want to run and all the diagnostics you want to save?

@szy21
Member Author

szy21 commented Mar 28, 2024

This job is a good representative of the atmosphere-only run without EDMF (dyamond). The GPU scaling jobs only run for 1 day, with 1-day averaged diagnostics. I think in the end we want to run it with mostly monthly averaged diagnostics, so there may be some differences. But other than that I think this job is good.

@Sbozzolo
Member

Okay, I got a flame graph for this entire job, but it doesn't look good. I'll look into it.

@charleskawczynski
Member

Do you think it's useful to add a job like this in the gpu scaling pipeline?

Yes, I think it'd be good. It'd be helpful if we add an nsys report, too.


3 participants