
Unstable connection using llcreader #317

Open
abodner opened this issue Dec 8, 2022 · 8 comments

Comments

@abodner

abodner commented Dec 8, 2022

Hello,

This is my first time posting here!

I do not have a Pleiades account, but for my research I would like to use the llc4320 data.
At first, I tried to use llcreader to access the model data and then run calculations with dask, saving only the final result locally. However, my connection is not stable enough to complete the calculations.
Since I only need to subsample the data (10×10 degree boxes of 3D temperature, salinity, and velocities over the top 700 m), I have been trying to transfer the subsampled variables to my local machine. But even for this task my connection breaks, and it has been a painfully long process to get the data I need.

Any suggestions? Is there a way to ensure the connection does not break? Is it possible to transfer using another method (e.g. globus or rsync)?

Thanks in advance!

@timothyas
Member

Hi @abodner! Sorry to hear about your troubles. Would you be able to share the lines of code you're using so we can make the discussion more concrete? Thanks!

@abodner
Author

abodner commented Dec 8, 2022

Thanks @timothyas! It has indeed been a quite frustrating process!

Here is a sample of my code for the variable W (ideally I would have about 20 of these regions for each of the five variables; so far I have only managed to get 6):

import numpy as np
import xmitgcm.llcreader as llcreader

# assumed setup: `model` and PATH were defined earlier in the original script
model = llcreader.ECCOPortalLLC4320Model()

lat_min = -45
lat_max = -30
lon_min = -140
lon_max = -125
depth_lim = -700

# lazily open W on the lat-lon grid
ds_W_full = model.get_dataset(varnames=['W'], type='latlon')

# mask everything outside the box and deeper than 700 m, then subsample to daily snapshots
sel_area_W = np.logical_and(
    np.logical_and(
        np.logical_and(ds_W_full.XC > lon_min, ds_W_full.XC < lon_max),
        np.logical_and(ds_W_full.YC > lat_min, ds_W_full.YC < lat_max)),
    ds_W_full.Zl > depth_lim)
ds_W = ds_W_full.where(sel_area_W, drop=True).resample(time='24H').nearest(tolerance="1H")

ds_W.to_netcdf(PATH + 'raw_data/ds_W.nc', engine='h5netcdf')

@rabernat
Member

rabernat commented Dec 8, 2022

Hi @abodner - thanks for posting here! Welcome!

Unfortunately the ECCO data portal is just not reliable or fast enough to facilitate this volume of data transfer. The best bet is to use rclone to move the data off of Pleiades. In order to do that, you need an allocation on that computer, which I'm assuming you don't have.

Fortunately, we are working on creating a mirror of more of the LLC data on Open Storage Network. We'd be happy to prioritize transferring the data you need. I'm cc'ing @dhruvbalwada and @rsaim who are working on this project.

Would you be available to join a call tomorrow at 10am to discuss in more detail? We'll be at https://columbiauniversity.zoom.us/j/92320021983?pwd=RmJ2TngxYTNrM0Fwd0ZYVDBNOUsrZz09

@abodner
Author

abodner commented Dec 8, 2022

Hi @rabernat! Thanks for your reply and willingness to help out.
I would love to join the call tomorrow and discuss further.
See you then and thanks again!

@Shirui-peng

Hi all -- I'm trying to load the llc2160 data in a similar way. In particular, I want to subsample the temperature and salinity data with one-year, full-depth coverage at selected grid points around the Kuroshio region. Here is some example code. I would appreciate any suggestions on how to do this efficiently. Thanks in advance!

import xmitgcm.llcreader as llcreader

model = llcreader.ECCOPortalLLC2160Model()

# one year of daily snapshots, starting n*1920 iterations after iteration 92160
n = 413
ds = model.get_dataset(varnames=['Theta', 'SALT'],
                       iter_start=92160 + n*1920,
                       iter_stop=92160 + (n+365)*1920,
                       iter_step=1920)

# load the full-depth temperature time series at a single grid point
pT = ds.Theta.isel(face=7, i=1600, j=320).values

@timothyas
Member

Hi @Shirui-peng, sorry for the long silence. Is this still an issue? Do you need a larger spatial region, and do you need all vertical levels? If you need a larger horizontal area, I would access the data with the entire horizontal slice you need each time rather than looping over i,j values. If you need all vertical levels, or a subset, I would increase the k_chunksize parameter, making it as large as possible while still fitting into memory. The default is 1, which is inefficient when many depth levels are needed over a small horizontal region.

Finally, if you have other computations that will reduce the dataset, it would be good to include those before pulling the values into memory with .values, as in your last line.
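
For concreteness, here is a minimal sketch of that access pattern, reusing the llc2160 setup from the comment above (the chunk size, face, and the reduction itself are illustrative placeholders, not part of the original discussion):

import xmitgcm.llcreader as llcreader

model = llcreader.ECCOPortalLLC2160Model()

# chunk many vertical levels together (90 covers the full llc2160 column);
# pick the largest value that still fits in memory
ds = model.get_dataset(varnames=['Theta'], k_chunksize=90)

# take the entire horizontal slice for one face and time step,
# reduce it lazily, and only then pull the result into memory
theta = ds.Theta.isel(time=0, face=7)
theta_reduced = theta.mean('k')   # placeholder reduction
result = theta_reduced.values     # this line triggers the actual transfer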

@Shirui-peng

Hi @timothyas, thank you for the response and help! Ideally, we will need all grid points that are nearest to 20,000+ Argo profile locations in the Kuroshio region. And we need all vertical levels, but we want to reduce the vertical dimension with some mode-weighted averaging. Inspired by your insights, it seems that one way is to access the entire horizontal slice with a large enough k_chunksize at each time step, and to include the vertical-averaging computation before calling .values. Do you think this is a reasonable approach?
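
A rough sketch of that workflow, under the same assumptions as the sketch above (the uniform weight profile and the point indices are placeholders for the real mode weights and Argo-matched locations):

import numpy as np
import xarray as xr
import xmitgcm.llcreader as llcreader

model = llcreader.ECCOPortalLLC2160Model()
ds = model.get_dataset(varnames=['Theta'], k_chunksize=90)

# hypothetical mode weights, one value per vertical level
weights = xr.DataArray(np.ones(ds.dims['k']), dims=['k'])

# mode-weighted vertical average over the full horizontal slice, still lazy
theta_avg = ds.Theta.isel(time=0, face=7).weighted(weights).mean('k')

# select the points of interest after the reduction, then load
pT = theta_avg.isel(i=1600, j=320).values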

@timothyas
Copy link
Member

That makes sense to me. Please let us know how it goes!
