stuck in pressure solver error #92

Open
claresinger opened this issue Nov 21, 2019 · 6 comments

@claresinger

I fairly often get a "stuck in pressure solver" error (message below). If I run the same simulation repeatedly with a different random seed each time, it happens in roughly 1 out of 20 runs. Do you know why this might happen? Could it be a glitch on the HPC system I'm using rather than a bug in the code?

terminate called after throwing an instance of 'std::runtime_error'
  what():  stuck in pressure solver
SIGABRT: abort
PC=0x473e4b m=0 sigcode=0
@pdziekan

Does this happen every time for a given random seed?

It seems that under some rare conditions the pressure solver has trouble converging to a solution.
If it were a bug in, e.g., the boundary conditions, I don't see why it would depend on the seed.

To make it easier for the pressure solver, you can try to:

  • loosen the pressure solver tolerance, e.g. --prs_tol=5e-5 (the default is 1e-6)
  • decrease the time step with --dt
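
The two suggestions above can be tried together in a single rerun. The following is only a minimal sketch: `build_command` is a hypothetical helper, the flags are the ones that appear in this thread, and the halved time step is an illustrative value rather than a recommendation. Other options (`--nt`, `--outdir`, and so on) are passed through `extra`.

```python
import shlex


def build_command(seed, dt=1, prs_tol=None, extra=(), binary="./src/bicycles"):
    """Assemble an argument list for one 2D dycoms_rf02 run (hypothetical helper)."""
    args = [
        binary,
        "--case=dycoms_rf02",
        "--nx=129", "--ny=0", "--nz=301",
        f"--dt={dt}",
        "--micro=lgrngn",
        f"--rng_seed={seed}",
    ]
    if prs_tol is not None:
        # Only pass the flag when overriding the default tolerance.
        args.append(f"--prs_tol={prs_tol}")
    args.extend(extra)
    return args


baseline = build_command(seed=42)
# Looser tolerance and a smaller time step, as suggested above.
relaxed = build_command(seed=42, dt=0.5, prs_tol=5e-5)
print(shlex.join(relaxed))
```

Returning a list instead of a single string lets the result go straight to `subprocess.run` without shell quoting concerns.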

This runtime_error is thrown when the pressure solver needs more than 10000 iterations.
You could try increasing the iteration limit, which is hardcoded in libmpdata++/solvers/detail/mpdata_rhs_vip_prs_common.hpp
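
To illustrate how a tolerance and an iteration cap interact, here is a toy sketch. It is not the libmpdata++ solver: the Jacobi update, the stopping criterion (largest change between successive iterates), and the names `solve_iteratively` and `jacobi_step` are all assumptions made for this example. Only the default tolerance, the looser tolerance, the cap of 10000, and the error text come from this thread.

```python
def solve_iteratively(apply_update, x0, tol=1e-6, max_iter=10000):
    """Iterate until successive iterates differ by at most tol, or give up."""
    x = x0
    for iteration in range(1, max_iter + 1):
        x_new = apply_update(x)
        error = max(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if error <= tol:
            return x, iteration
    # Mirrors the behaviour described above: give up after max_iter iterations.
    raise RuntimeError("stuck in pressure solver")


def jacobi_step(u):
    """One Jacobi sweep for a 1D Laplace problem with fixed end values."""
    inner = [0.5 * (u[i - 1] + u[i + 1]) for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]


u0 = [0.0] * 9 + [1.0]  # boundary values 0 and 1, interior starts at 0
_, tight_iters = solve_iteratively(jacobi_step, u0, tol=1e-6)
_, loose_iters = solve_iteratively(jacobi_step, u0, tol=5e-5)
print(tight_iters, loose_iters)
```

In this toy problem the looser tolerance stops earlier on the same sequence of iterates, which is the sense in which a larger `--prs_tol` makes the solver's job easier.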

@trontrytel

I also just hit "stuck in pressure solver". I was running 2D dycoms with rng_seed = 42. I will rerun the same setup now to see whether the failure is deterministic or random.

@trontrytel

The command below got stuck 4 times on 2 different GPU nodes.

@pdziekan - could you check whether it also gets stuck on your machine? If it does, rng_seed=42 is a good candidate to debug from.

import os

# simulation setup
case = "dycoms_rf02"
nx = "129"
ny = "0"
nz = "301"
dt = "1"
nt = "21600"
spinup = "3600"
outfreq = "3600"
backend = "CUDA"

outdir = "out_test_lgrngn"
rng_seed = "42"

# microphysics options
micro = "lgrngn"
sd_conc = "40"
sstp_cond = "10"
sstp_coal = "10"

cmd = "OMP_NUM_THREADS=1 ./src/bicycles --outdir="+outdir+" --case="+case+\
      " --nx="+nx+" --ny=0 --nz="+nz+" --dt="+dt+" --spinup="+spinup+\
      " --nt="+nt+" --micro="+micro+" --outfreq="+outfreq+\
      " --backend="+backend+" --rng_seed="+rng_seed+" --sd_conc="+sd_conc+\
      " --sstp_cond="+sstp_cond+" --sstp_coal="+sstp_coal

print("running " + cmd)
os.system(cmd)

@trontrytel

The same command with rng_seed = 44 does not get stuck.
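
Since the failure depends on the seed, a small sweep can record which seeds reproduce it. This is a sketch under assumptions: `run_once` and `sweep` are hypothetical helpers, and detecting the failure by a nonzero exit code plus the message on stderr is inferred from the log posted at the top of this issue. The flags are the ones used in the command above.

```python
import subprocess

ERROR_MARKER = "stuck in pressure solver"


def run_once(seed, binary="./src/bicycles"):
    """Run one simulation for the given seed and return (returncode, stderr)."""
    cmd = [
        binary, "--outdir=out_test_lgrngn_" + str(seed), "--case=dycoms_rf02",
        "--nx=129", "--ny=0", "--nz=301", "--dt=1", "--spinup=3600",
        "--nt=21600", "--micro=lgrngn", "--outfreq=3600", "--backend=CUDA",
        "--rng_seed=" + str(seed), "--sd_conc=40",
        "--sstp_cond=10", "--sstp_coal=10",
    ]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stderr


def sweep(seeds, runner=run_once):
    """Return the seeds whose run aborted with the pressure solver error."""
    failed = []
    for seed in seeds:
        returncode, stderr = runner(seed)
        if returncode != 0 and ERROR_MARKER in stderr:
            failed.append(seed)
    return failed


# Example (requires the compiled binary):
# print(sweep([42, 44]))
```

Passing the runner as an argument keeps the bookkeeping testable without the model binary or a GPU node.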

@trontrytel

Not sure if it's the same issue. This combination hangs after time step 9000, but I don't get any error from the pressure solver.

case = "dycoms_rf02"                                                           
nx = "129"                                                                     
ny = "0"                                                                       
nz = "301"                                                                     
dt = "1"                                                                       
nt = "25200"                                                                   
spinup = "3600"                                                                
outfreq = "900"                                                                
backend = "CUDA"                                                               
                                                                               
rng_seed = "48"                                                                
outdir = "out_test_lgrngn_"+rng_seed                                           
                                                                               
micro = "lgrngn"                                                               
sd_conc = "512"                                                                
sstp_cond = "10"                                                               
sstp_coal = "10"                                                               
                                                                                                         
cmd = "OMP_NUM_THREADS=1 ./src/bicycles --outdir="+outdir+" --case="+case+\    
      " --nx="+nx+" --ny=0 --nz="+nz+" --dt="+dt+" --spinup="+spinup+\         
      " --nt="+nt+" --micro="+micro+" --outfreq="+outfreq+\                    
      " --backend="+backend+" --rng_seed="+rng_seed+" --sd_conc="+sd_conc+\    
      " --sstp_cond="+sstp_cond+" --sstp_coal="+sstp_coal+\                    
      " --gccn=1"

@trontrytel

The same happens with rng_seed=13.
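
Because these runs hang without printing an error, a wall-clock limit is one way to flag them automatically. The sketch below is an assumption about how one might wrap the run, not something from the model itself: `run_with_timeout` is a hypothetical helper, and a sleeping child process stands in for the hung simulation.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and report 'ok', 'failed', or 'timeout' if the limit is exceeded."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "timeout"
    return "ok" if proc.returncode == 0 else "failed"


# Stand-in for a hung simulation: a child that sleeps far longer than the limit.
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(hung, timeout_s=1))
```

A whole-run limit cannot tell a slow run from a stuck one, so the limit would need to be set well above the normal run time for the given setup.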
