
How to deal with "free list full with 5000 items" #525

Open
tovogt opened this issue Sep 22, 2021 · 15 comments

tovogt (Contributor) commented Sep 22, 2021

It would be great if you could give advice on what to do when the calculation stops with the error message: free list full with 5000 items. This typically appears after a long run of *** adjusting timestep for level 7 at t = lines, together with Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL, and usually also the following:

 out of bndry space - allowed ***** bndry grids
 There are    15447 total grids    120351 bndry nbors average num/grid      7.791
 Expanding size of boundary list from       120000  to       180000
 out of nodal space - allowed    30000 grids
    level    1 has       1 grids
    level    2 has       4 grids
    level    3 has      16 grids
    level    4 has      64 grids
    level    5 has     256 grids
    level    6 has    1024 grids
    level    7 has   14107 grids
  Could need twice as many grids as on any given
  level if regridding/birecting
 Expanding maximum number of grids from        30000  to        40000
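(As a quick sanity check of the numbers in that log, the "average num/grid" figure is just the total boundary neighbors divided by the total grid count:)

```python
# Reproduce the "average num/grid" value reported in the log above.
total_grids = 15447    # "15447 total grids"
bndry_nbors = 120351   # "120351 bndry nbors"
print(round(bndry_nbors / total_grids, 3))  # -> 7.791, matching the log
```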

Any ideas what might be causing this or how to deal with this? Thanks in advance!

mandli (Member) commented Sep 23, 2021

This is pretty rare to see; more often we run out of grids before we run out of boundary space. Given that you are seeing underflows as well, I am wondering if something is blowing up instead. Have you tried plotting the results up to this time to see what might be going on?

tovogt (Contributor, Author) commented Sep 24, 2021

Thanks for your response!

I now think that this is somehow related to a very irregular bathymetry. The problem occurs when a TC track crosses the Bahamas archipelago region (bathymetry from SRTM15+V2.0):
[image: bathymetry of the Bahamas region]
Even for medium-strength storms, GeoClaw tends to produce pretty large waves in this region. For example, Hurricane Jeanne (2004):
[image: simulated surface for Hurricane Jeanne (2004)]
I will be on vacation for a week and will come back to this at the beginning of October with plots of the GeoClaw wind fields, AMR regions, and surface. See you!

mandli (Member) commented Sep 24, 2021

The Bahamas are difficult to say the least. Let's pick it up after you get back then.

tovogt (Contributor, Author) commented May 30, 2022

I just wanted to report back that this still comes up from time to time, and not only for the Bahamas (as suggested above). Most of the time it is not a problem. But recently I was experimenting with higher-resolution runs (up to 30 m) with synthetic scenarios where the wind speeds reach 290 km/h, and found that I ran into "free list full with 5000 items". Reproducing this is nasty, since a run takes 5 days (!) before it fails.

In those cases I was running modified versions of Cyclone Idai (2019), but only its final landfall:
[image: track of the modified Cyclone Idai (2019) final landfall]
The bathymetry looks totally harmless in principle:
[image: bathymetry of the landfall region]

As I said, I can't easily rerun this and pull arbitrary plot outputs from the run, because it takes more than 5 days. I just wanted to record this here for future reference.

mjberger (Contributor) commented May 30, 2022 via email

mandli (Member) commented May 31, 2022

@mjberger that would not be too hard to do I suppose.

tovogt (Contributor, Author) commented Feb 12, 2024

Unfortunately, this still occurs from time to time. In a current setup, we would like to model the storm surge of Super Typhoon Haiyan (2013) at up to 9 arc-seconds resolution in GeoClaw, but the run fails after more than 24 hours of run time with "free list full with 5000 items":
[image]
Note that we are only modeling a 48-hour subset of the storm duration:
[image]
How could we implement auto-increase for this data structure?
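(For context, the standard answer to such "fixed capacity exceeded" failures is geometric growth: when the list fills, allocate a larger array and copy. The actual free list lives in amrclaw's Fortran code; the sketch below is a language-neutral illustration in Python with hypothetical names, not GeoClaw code:)

```python
# Illustrative sketch of auto-growing a fixed-size free list.
# In amrclaw this is a Fortran array; the class and names here are hypothetical.
class FreeList:
    def __init__(self, capacity=5000):
        self.items = [None] * capacity
        self.count = 0

    def push(self, item):
        if self.count == len(self.items):
            # Instead of aborting with "free list full", double the storage.
            self.items.extend([None] * len(self.items))
        self.items[self.count] = item
        self.count += 1

fl = FreeList(capacity=2)
for i in range(5):
    fl.push(i)
print(len(fl.items), fl.count)  # -> 8 5 (capacity doubled twice: 2 -> 4 -> 8)
```

In Fortran the same idea would use an allocatable array together with `move_alloc` to swap in the larger buffer.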

mjberger (Contributor) commented Feb 12, 2024 via email

tovogt (Contributor, Author) commented Feb 13, 2024

Thanks for the quick response, Marsha! The code is now running with the lfdim variable increased to 25000. :) I will report back on whether it still fails.
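(For future readers: the change amounts to editing one parameter declaration in the amrclaw Fortran source and recompiling. The exact file and declaration below are assumptions about where lfdim is defined, shown via a string substitution; check your amrclaw version:)

```python
import re

# Assumed original declaration of the free-list size in amrclaw's
# Fortran source (exact file/spelling may differ between versions).
line = "    integer, parameter :: lfdim = 5000"

# The one-line change discussed above: 5000 -> 25000. After editing the
# real source file, rebuild the executable so the larger list takes effect.
patched = re.sub(r"lfdim\s*=\s*5000", "lfdim = 25000", line)
print(patched)  # -> "    integer, parameter :: lfdim = 25000"
```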

tovogt (Contributor, Author) commented Feb 15, 2024

Okay, I have now produced some more time-dependent plots to understand what's going on:
[figure fig1001]
[figure fig1005]

Even though this is a pure surge-driven scenario, it looks as if there is an earthquake at the bottom boundary somewhere around hour "+10". What causes this, and how can I prevent it?

I already enforce that, outside of the inner rectangle (that you see in frame 1), refinement is limited to at most level 4 (of 6). If refinement at the boundary is causing this, would it help to restrict refinement to level 1 in a small strip along the boundary? Here is my regions.data:

5                    =: num_regions         
1  4 -4.32000000000000e+04  1.40400000000000e+05  1.05831993103027e+02  1.40850708007812e+02  9.86385345458984e-02  2.24680137634277e+01  
3  6 -4.32000000000000e+04  1.40400000000000e+05  1.15847679138184e+02  1.30129470825195e+02  8.31493663787842e+00  1.43523235321045e+01  
5  6 -4.32000000000000e+04  1.40400000000000e+05  1.18690833333333e+02  1.21242500000000e+02  9.57416666666733e+00  1.40425000000004e+01  
5  6 -4.32000000000000e+04  1.40400000000000e+05  1.21257500000000e+02  1.23792499999999e+02  9.10750000000069e+00  1.38425000000004e+01  
5  6 -4.32000000000000e+04  1.40400000000000e+05  1.23807499999999e+02  1.26342499999999e+02  8.64083333333405e+00  1.33091666666671e+01  
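(For readers unfamiliar with the file: each regions.data row is, to my understanding, minlevel maxlevel t1 t2 x1 x2 y1 y2, as written out from setrun.py's region specification. A minimal parse of the first row above, as a sketch rather than GeoClaw code:)

```python
# Parse one regions.data row: minlevel maxlevel t1 t2 x1 x2 y1 y2
# (first region from the file above: levels 1-4 over the full domain/time).
row = ("1  4 -4.32000000000000e+04  1.40400000000000e+05  "
       "1.05831993103027e+02  1.40850708007812e+02  "
       "9.86385345458984e-02  2.24680137634277e+01")
fields = row.split()
minlevel, maxlevel = int(fields[0]), int(fields[1])
t1, t2, x1, x2, y1, y2 = map(float, fields[2:])
print(minlevel, maxlevel, x1 < x2 and y1 < y2)  # -> 1 4 True
```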

mjberger (Contributor) commented Feb 15, 2024 via email

tovogt (Contributor, Author) commented Feb 16, 2024

Thanks for your quick response!

Here is the complete set of input data that I used to run the example (with make .output): https://www.pik-potsdam.de/~tovogt/for_mjberger/2013306N07162_60as_2024-02-16.zip (This archive also includes a file stdout.log with the full log output of GeoClaw!) I use clawpack version 5.9.2 and gfortran version 13.2.0, and I run the example on a single HPC cluster node with 16 CPUs and 64 GB of RAM. Running this example on that setup requires more than 6 hours (wall time).

Here I uploaded the complete _plots directory: https://www.pik-potsdam.de/~tovogt/for_mjberger/2013306N07162_60as_plots_2024-02-16.zip

And here is the _output directory: https://www.pik-potsdam.de/~tovogt/for_mjberger/2013306N07162_60as_output_2024-02-16.zip (5.9 GB of data!)

Note that this is not the exact same setup that caused the "free list full with 5000 items" error message for me (this post: #525 (comment)). That one has exceedingly long run times (more than 24 hours), so I reduced the resolution a bit, but left everything else unchanged.

tovogt (Contributor, Author) commented Feb 16, 2024

Oh, and I can now answer my question of whether this is caused by refinement at the boundary: I enforced that there is no refinement at the boundary, and the problem persists:
[figure fig1001]
Maybe this has to do with the bottom boundary being very close to the equator, and there is some kind of division by zero or similar?

mjberger (Contributor) commented Feb 16, 2024 via email

mjberger (Contributor) commented Feb 16, 2024 via email
