blk_1m: A not-finite number detected in: RHS of rc after rc_src #103

Open
trontrytel opened this issue Jan 27, 2020 · 5 comments

Comments

@trontrytel
Contributor

For a job:

OMP_NUM_THREADS=32 bicycles --outdir=outdir --case=dycoms_rf02 --nx=129 --ny=0 --nz=301 --dt=1 --spinup=3600 --nt=25200 --micro=blk_1m --outfreq=900 --backend=serial --r_c0=0.000711222222222 --rng_seed=42 --prs_tol=5e-5

I get an error: A not-finite number detected in: RHS of rc after rc_src

(-2,2) x (-2,298)
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.24152e-12 6.5588e-11 4.10868e-10 6.10279e-10 5.16959e-10 3.64688e-10 2.21308e-10 3.41213e-11 -4.31171e-10 -7.08708e-10 -6.58321e-10 -5.48502e-10 -3.66218e-10 -1.11794e
...
]

I get a similar error for r_c0 = 0.000622444444444, but this time it's due to a single NaN value:

A not-finite number detected in: RHS of rc after rc_src
(-2,1) x (-2,298)
[ 0 0 0 0 0
...
0 0 0 0 0 0 -nan 0 0 0 0 0 0
...
]
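
To pin down where the bad value sits in a dump like the ones above, a small helper can scan the printed row for entries that are not finite. This is only a sketch: `find_nonfinite` is a hypothetical name, not part of the model, and it assumes the dump is a whitespace-separated list of numbers in square brackets, as in the error output. Elision markers (`...`) are skipped, so positions count numeric tokens only.

```python
import math

def find_nonfinite(dump):
    """Return (position, token) for every nan/inf entry in a printed array row."""
    tokens = dump.replace("[", " ").replace("]", " ").split()
    bad = []
    position = 0
    for tok in tokens:
        if tok == "...":
            continue  # elided part of the dump, nothing to check
        # float() understands "nan", "-nan", "inf" and "-inf"
        if not math.isfinite(float(tok)):
            bad.append((position, tok))
        position += 1
    return bad

# shortened, made-up row in the same format as the error output
row = "[ 0 0 0 1.24152e-12 -nan 0 0 ]"
print(find_nonfinite(row))  # [(4, '-nan')]
```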

Any clues?

@trontrytel
Contributor Author

I tested a little, and it seems that this issue and #102 might be symptoms of the same problem. Running the above commands sometimes leads to the error and sometimes to the simulation timing out without any error (not even the "stuck in the pressure solver" error).

From the plots of the results, it looks like --prs_tol=5e-5 is too large. Even when the simulation completes, the results are not good. I'll test what behaviour I get for prs_tol = 1e-6 or even 1e-7...

In general, I don't mind if the simulation runs a little longer. But getting randomly stuck for some parameter combinations is a problem.

@trontrytel
Contributor Author

As promised, I tested convergence for the 2D DYCOMS simulations with the blk_1m scheme. This is what I get:

[image: wall times and averaged profiles for different pressure solver tolerances]

The top left plot shows the wall time of the whole simulation as a function of the pressure solver tolerance. The colors from that plot are used to mark the rest of the profiles. The profiles are averaged over the last hour (4 model outputs).

I didn't do any ensemble averaging, so it should all be taken with a grain of salt. But it looks like the 1e-6 tolerance is the borderline, and we should not use looser tolerances (i.e. larger prs_tol values).
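
For reference, the averaging over the last four outputs can be sketched as below (four outputs span one hour if --outfreq=900 counts time steps of --dt=1). The data layout is an assumption: each output is taken to be a plain list with one value per vertical level, and `mean_profile` is a hypothetical helper, not a function from the plotting scripts.

```python
def mean_profile(outputs, n_last=4):
    """Average the last n_last profiles level by level."""
    if len(outputs) < n_last:
        raise ValueError("not enough outputs to average")
    last = outputs[-n_last:]
    n_levels = len(last[0])
    return [sum(p[k] for p in last) / n_last for k in range(n_levels)]

# made-up numbers: 5 outputs, 3 vertical levels each
outputs = [
    [9.0, 9.0, 9.0],  # older output, not part of the average
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
    [2.0, 3.0, 4.0],
]
print(mean_profile(outputs))  # [2.0, 3.0, 4.0]
```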

The timing out issue I mentioned in #102 looks more and more like a cluster hardware issue...

@trontrytel
Contributor Author

Still debugging. But I talked with other cluster users and it seems like their random node failure rate is much lower than mine. So it seems we do have some random bug somewhere that is especially prominent in my 2D blk1m simulations.

@pdziekan
Contributor

The convergence tests are nice, thanks for doing them.
1e-6 does look like a reasonable choice.

Could you test if you still get the error:
"A not-finite number detected in: RHS of rc after rc_src"
when some processes are disabled using the following arguments:
a) --rc_src=0
b) --accr=0
c) --conv=0
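
One way to run these three checks systematically is to generate one command per switch and record which runs still report the error. The sketch below only builds the command lines and scans captured output. The flags are the ones quoted in this thread; the helper names and the output directory names are made up for illustration.

```python
ERROR_TEXT = "A not-finite number detected in: RHS of rc after rc_src"

# options taken from the command in the original report
BASE = [
    "bicycles", "--case=dycoms_rf02", "--nx=129", "--ny=0", "--nz=301",
    "--dt=1", "--spinup=3600", "--nt=25200", "--micro=blk_1m",
    "--outfreq=900", "--backend=serial", "--r_c0=0.000711222222222",
    "--rng_seed=42", "--prs_tol=5e-5",
]

def build_variants(base=BASE, switches=("rc_src", "accr", "conv")):
    """Return {switch: argv} with exactly one process disabled per run."""
    return {
        # the outdir naming scheme is hypothetical
        s: base + [f"--outdir=outdir_no_{s}", f"--{s}=0"]
        for s in switches
    }

def still_fails(log_text):
    """True if the captured output contains the not-finite error."""
    return ERROR_TEXT in log_text

for name, argv in build_variants().items():
    print(name, "->", " ".join(argv[-2:]))
```

Each argv list could then be passed to `subprocess.run(argv, capture_output=True, text=True)` and the captured stdout and stderr checked with `still_fails`.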

@trontrytel
Contributor Author

> The convergence tests are nice, thanks for doing them.
> 1e-6 does look like a reasonable choice.
>
> Could you test if you still get the error:
> "A not-finite number detected in: RHS of rc after rc_src"
> when some processes are disabled using the following arguments:
> a) --rc_src=0
> b) --accr=0
> c) --conv=0

Thanks for the hints! I'll be debugging this week(end).

I was also thinking that maybe something is wrong with the 2D setup. I'll try to run a small 3D simulation and see if I get similar errors. Same for issue #105: I got used to some negative values in rr or even rc, but negative rv and th are not acceptable :)
