
How to map correlations to Stokes parameters without looping the correlations? #213

Open
miguelcarcamov opened this issue Apr 30, 2022 · 4 comments

Comments

@miguelcarcamov

  • dask-ms version: 0.2.6
  • Python version: 3.9.7
  • Operating System: Manjaro

Hello, I am trying to compute the PSF/dirty beam analytically with dask-ms, and also to grid the data. However, to compute the PSF or the dirty image analytically for each Stokes parameter, I have had to loop over the correlations in order to match each correlation's data to its Stokes parameter. This extra loop, inside the loop over the list of sub-MSs, makes the code considerably slower. Has anyone found a way to map the correlations to each Stokes parameter without looping over the correlations? Is there a way to do this with dask?

If what I just wrote above does not make sense to you, please ask :).
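For context, the correlation-to-Stokes mapping is a fixed linear transform over the correlation axis, so in principle the per-correlation loop can be replaced by one matrix product. A minimal sketch, assuming linear feeds and the convention I = (XX+YY)/2, Q = (XX−YY)/2, U = (XY+YX)/2, V = −i(XY−YX)/2 (the matrix `L` and array names here are illustrative, not dask-ms API):

```python
import numpy as np

# Rows of L are [I, Q, U, V]; columns are [XX, XY, YX, YY].
# Assumed linear-feed convention; adjust signs/factors to your definition.
L = 0.5 * np.array([
    [1,   0,   0,  1],   # I = (XX + YY)/2
    [1,   0,   0, -1],   # Q = (XX - YY)/2
    [0,   1,   1,  0],   # U = (XY + YX)/2
    [0, -1j,  1j,  0],   # V = -i(XY - YX)/2
], dtype=np.complex128)

# vis has shape (..., 4) with the correlation axis last.
vis = np.random.default_rng(0).normal(size=(10, 4)) + 0j

# One matmul over the correlation axis replaces the loop.
stokes = vis @ L.T   # shape (10, 4): [I, Q, U, V] per row
```

The same expression works unchanged on a dask array, where it only builds the task graph rather than computing anything.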

@JSKenyon
Collaborator

JSKenyon commented May 3, 2022

Hi @miguelcarcamov! Apologies for the delay - I was on vacation. I am not entirely sure what you are trying to accomplish. Could you possibly provide more details/a code snippet?

Are you trying to map [XX, XY, YX, YY] to [I, Q, U, V] on the xarray datasets?

@miguelcarcamov
Author

Hi @JSKenyon, well, it depends on the feed, really. If you look at this code, which builds a dirty map from the data using dask, you can see that at line 120 I loop over the correlations in order to map them to I, Q, U, V depending on the feed. In the code, gridded_data and gridded_weights have shape (m, n, ncorrs), and summing them into the I, Q, U, V uv-grids for a given feed costs me a loop over all the correlations for each one of the sub-MSs. I want to get rid of that for loop, but I'm not yet sure how. This might be hard to follow, but let me know if you have questions :)
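One way to drop that loop (a sketch, not the actual code from the repository): precompute a (nstokes, ncorrs) conversion matrix for the feed in question, then contract it against the correlation axis of the whole grid with a single einsum. dask's `da.einsum` accepts the same subscripts, so the lazy version is a drop-in. The matrix below assumes linear feeds with I = (XX+YY)/2, etc.; `corr_to_stokes` is a hypothetical name.

```python
import numpy as np

# Assumed (nstokes, ncorrs) conversion matrix for linear feeds;
# chosen per feed type (linear/circular) outside this snippet.
corr_to_stokes = 0.5 * np.array([
    [1,   0,   0,  1],   # I
    [1,   0,   0, -1],   # Q
    [0,   1,   1,  0],   # U
    [0, -1j,  1j,  0],   # V
])

m, n, ncorrs = 8, 8, 4
gridded_data = np.ones((m, n, ncorrs), dtype=np.complex128)

# One contraction over the correlation axis replaces the per-correlation
# loop; with dask arrays, da.einsum builds the same contraction lazily.
stokes_grids = np.einsum("sc,mnc->mns", corr_to_stokes, gridded_data)
```

The same pattern applies to gridded_weights, and the matrix can live next to the feed-detection logic so nothing is branched per correlation.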

@JSKenyon
Collaborator

JSKenyon commented May 9, 2022

If you are doing all your operations on dask arrays, I am not sure why the loop itself would be slow (unless you have a huge number of datasets). You can likely simplify the code by just having a mapping stored somewhere so that you don't have to check so many conditions.
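A minimal sketch of such a stored mapping, with illustrative Stokes conventions (`STOKES_MAPS` and `to_stokes` are hypothetical names, not dask-ms API; correlation order is assumed [XX, XY, YX, YY] for linear and [RR, RL, LR, LL] for circular):

```python
import numpy as np

# One conversion matrix per feed type, instead of branching on
# many conditions; check the conventions against your own definitions.
STOKES_MAPS = {
    "linear": 0.5 * np.array([
        [1,   0,   0,  1],   # I = (XX + YY)/2
        [1,   0,   0, -1],   # Q = (XX - YY)/2
        [0,   1,   1,  0],   # U = (XY + YX)/2
        [0, -1j,  1j,  0],   # V = -i(XY - YX)/2
    ]),
    "circular": 0.5 * np.array([
        [1,   0,   0,  1],   # I = (RR + LL)/2
        [0,   1,   1,  0],   # Q = (RL + LR)/2
        [0, -1j,  1j,  0],   # U = -i(RL - LR)/2
        [1,   0,   0, -1],   # V = (RR - LL)/2
    ]),
}

def to_stokes(vis, feed="linear"):
    """Map a (..., 4) correlation array to (..., 4) Stokes values."""
    return np.einsum("sc,...c->...s", STOKES_MAPS[feed], vis)
```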

If your code is pure dask, the loop over correlations isn't doing any real work - it is just setting up a graph. If, however, your arrays have already been reified to NumPy at that point, I can imagine it being slow.

Would you be willing to run line_profiler on the function? That may make it a bit clearer to me.
