ENH: xarray support for COSMIC GPS #18

Merged
merged 36 commits into from Oct 21, 2021

rstoneback
Collaborator

@rstoneback rstoneback commented Apr 15, 2021

Addresses #1 by adding support for loading COSMIC GPS data into xarray. Full testing of this pull requires incorporating existing pulls, namely #14 and #17, as well as updates to GitHub Actions and such :)

In [1]: import pysat                                                                                                                                                                                                        

In [2]: gps = pysat.Instrument('cosmic', 'gps', 'ionprf', update_files=True, altitude_bin=5.)                                                                                                                               

In [3]: gps.load(2008, 1)                                                                                                                                                                                                   
<ipython-input-3-bb6939bfe4e3>:1: UserWarning: Metadata set to defaults, as they were missing in the Instrument
  gps.load(2008, 1)

In [4]: gps.data                                                                                                                                                                                                            
Out[4]: 
<xarray.Dataset>
Dimensions:           (RO: 172, time: 1199)
Coordinates:
    MSL_alt           (time, RO) float64 51.46 55.6 61.09 65.21 ... nan nan nan
    GEO_lat           (time, RO) float64 34.3 34.33 34.36 34.38 ... nan nan nan
    GEO_lon           (time, RO) float64 -85.61 -85.62 -85.64 ... nan nan nan
    OCC_azi           (time, RO) float64 131.1 131.0 130.8 130.6 ... nan nan nan
    MSL_bin_alt       (time, RO) float64 50.0 55.0 60.0 65.0 ... nan nan nan nan
  * time              (time) datetime64[ns] 2008-01-01T00:08:16.000083968 ......
Dimensions without coordinates: RO
Data variables:
    ELEC_dens         (time, RO) float64 nan nan nan nan nan ... nan nan nan nan
    TEC_cal           (time, RO) float64 19.15 18.31 17.49 17.8 ... nan nan nan
    occ_id            (time) float64 1.0 1.0 1.0 nan nan ... nan 1.0 1.0 nan 1.0
    fiducial_id       (time) object '' '' '' nan nan '' ... nan nan '' '' nan ''
    reference_sat_id  (time) float64 0.0 0.0 0.0 nan nan ... nan 0.0 0.0 nan 0.0
    occulting_sat_id  (time) float64 8.0 20.0 31.0 nan nan ... 4.0 21.0 nan 8.0
    year              (time) float64 2.008e+03 2.008e+03 ... nan 2.008e+03
    month             (time) float64 1.0 1.0 1.0 nan nan ... nan 1.0 1.0 nan 1.0
    day               (time) float64 1.0 1.0 1.0 nan nan ... nan 1.0 1.0 nan 1.0
    hour              (time) float64 0.0 0.0 0.0 nan nan ... 23.0 23.0 nan 23.0
    minute            (time) float64 8.0 11.0 11.0 nan ... 54.0 55.0 nan 57.0
    second            (time) float64 16.0 4.0 29.0 nan nan ... 39.0 55.0 nan 7.0
    offset            (time) float64 0.0 0.0 0.0 nan nan ... nan 0.0 0.0 nan 0.0
    shortlen          (time) float64 430.0 453.0 451.0 nan ... 510.0 nan 479.0
    setting           (time) float64 0.0 0.0 0.0 nan nan ... nan 1.0 1.0 nan 1.0
    icalib            (time) float64 1.0 1.0 1.0 nan nan ... nan 1.0 0.0 nan 1.0
    botnum            (time) float64 1.0 1.0 1.0 nan nan ... nan 1.0 1.0 nan 1.0
    bottime           (time) float64 8.832e+08 8.832e+08 ... nan 8.833e+08
    botlct            (time) float64 18.41 7.499 9.174 nan ... 10.19 nan 8.033
    botalt            (time) float64 51.46 42.13 52.45 nan ... 1.9 nan 1.099
    botlat            (time) float64 34.3 21.31 24.43 nan ... 21.79 nan 18.94
    botlon            (time) float64 -85.61 110.0 135.1 nan ... 153.5 nan 120.8
    botaz             (time) float64 131.1 -124.9 -117.6 ... -175.9 nan 27.99
    topnum            (time) float64 430.0 453.0 451.0 nan ... 510.0 nan 479.0
    toptime           (time) float64 8.832e+08 8.832e+08 ... nan 8.833e+08
    toplct            (time) float64 18.28 8.827 8.938 nan ... 8.176 nan 8.296
    topalt            (time) float64 857.4 832.2 831.8 nan ... 836.3 nan 835.0
    toplat            (time) float64 39.96 16.93 14.71 nan ... 37.25 nan 30.38
    toplon            (time) float64 -89.7 127.7 129.2 nan ... 126.0 nan 127.3
    topaz             (time) float64 104.5 -165.3 -116.1 nan ... 143.1 nan 56.32
    edmaxtime         (time) float64 8.832e+08 8.832e+08 ... nan 8.833e+08
    edmaxlct          (time) float64 18.38 7.705 9.147 nan ... 9.901 nan 8.064
    edmaxalt          (time) float64 257.1 218.2 226.6 nan ... 225.2 nan 209.7
    edmaxlat          (time) float64 35.47 20.57 23.06 nan ... 24.53 nan 20.79
    edmaxlon          (time) float64 -86.32 112.8 134.3 nan ... 149.5 nan 121.7
    edmaxaz           (time) float64 124.4 -128.7 -117.5 ... -177.1 nan 33.52
    smear             (time) float64 725.8 1.912e+03 1.244e+03 ... nan 1.427e+03
    edmax             (time) float64 1.014e+05 2.987e+05 ... nan 3.143e+05
    critfreq          (time) float64 2.859 4.907 6.558 nan ... 6.906 nan 5.034
    tec0              (time) float64 2.477 5.767 8.661 nan ... 9.473 nan 5.971
    tec1              (time) float64 0.2999 0.7336 3.249 nan ... 1.075 nan 3.596
    edorbtime         (time) float64 8.832e+08 8.832e+08 ... nan 8.833e+08
    edorbalt          (time) float64 858.5 833.3 832.8 nan ... 837.3 nan 836.1
    edorb             (time) float64 7.127e+03 1.911e+04 ... nan 1.307e+04
    hscale            (time) float64 -999.0 -999.0 -999.0 ... 128.4 nan -999.0
    topfit            (time) float64 0.004766 0.009597 0.005782 ... nan 0.04341
    nmax              (time) float64 430.0 453.0 451.0 nan ... 510.0 nan 479.0
    fileStamp         (time) object 'C004.2008.001.00.07.G08' ... 'C002.2008....
    inverter          (time) object 'gmrion' 'gmrion' 'gmrion' ... nan 'gmrion'
    parmsfile         (time) object 'parms8' 'parms8' 'parms8' ... nan 'parms8'
    center            (time) object 'UCAR/CDAAC' 'UCAR/CDAAC' ... 'UCAR/CDAAC'
    mission           (time) object 'COSMIC' 'COSMIC' 'COSMIC' ... nan 'COSMIC'
    creation_time     (time) object '10-JUL-14 18:56' ... '10-JUL-14 18:56'
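The `altitude_bin=5.` keyword in the session above produces the `MSL_bin_alt` coordinate, whose first values (50.0, 55.0, 60.0, 65.0) line up with the first `MSL_alt` values (51.46, 55.6, 61.09, 65.21). Below is a minimal sketch of one binning rule that reproduces those four values. The helper name `bin_altitude` and the choice of flooring are assumptions for illustration only: the sample values are equally consistent with rounding to the nearest bin, and the pysatCDAAC code may do either.

```python
import math


def bin_altitude(altitudes, bin_width):
    """Map each altitude (km) to the lower edge of its bin.

    Hypothetical helper, not part of pysat or pysatCDAAC. NaN values
    pass through unchanged so padded profile entries stay NaN.
    """
    binned = []
    for alt in altitudes:
        if math.isnan(alt):
            binned.append(alt)
        else:
            # Floor to the nearest lower multiple of bin_width (assumed rule).
            binned.append(math.floor(alt / bin_width) * bin_width)
    return binned


# First four MSL_alt values from the output above, with altitude_bin=5.
print(bin_altitude([51.46, 55.6, 61.09, 65.21], 5.0))
```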

@rstoneback rstoneback linked an issue Apr 15, 2021 that may be closed by this pull request
3 tasks
@rstoneback
Collaborator Author

Somehow some commits from my other pull ended up in here. sigh

@rstoneback
Collaborator Author

Marking this as ready for review as it is nominally complete. More accurately, the basic loading, cleaning, etc. is in there. Getting the tests to pass will be most effective after getting the previous pulls in.

@rstoneback
Collaborator Author

Ensure time mangling, if still present, is noted in the docstring.

…ysat/pysatCDAAC into xarray_support

Conflicts:
	pysatCDAAC/instruments/cosmic_gps.py
@rstoneback
Collaborator Author

My packages were behind yours, so I cloned my environment and upgraded xarray and pandas to the versions you have, then in a second round upgraded to the absolute latest. I still get the correct file dates and times in either case.

What about numpy version?

@jklenzing
Member

numpy 1.19.0

@rstoneback
Collaborator Author

Installed a nearby numpy version, as well as the latest; no dice. Files keep working for me.

In [1]: import pysat
In [2]: import datetime as dt

In [3]: gps = pysat.Instrument('cosmic', 'gps', 'ionprf', update_files=True)

In [4]: gps.files.files
Out[4]: Series([], dtype: object)

In [5]: gps.download(dt.datetime(2018, 12, 30), dt.datetime(2019, 1, 4))

In [6]: gps.files.files
Out[6]: 
2018-12-30 00:04:00.062799872    2018/364/ionPrf_C006.2018.364.00.04.G28_2016.1...
2018-12-30 00:14:00.061700096    2018/364/ionPrf_C006.2018.364.00.14.G17_2016.1...
2018-12-30 00:20:00.062200064    2018/364/ionPrf_C006.2018.364.00.20.G22_2016.1...
2018-12-30 00:22:00.060100096    2018/364/ionPrf_C006.2018.364.00.22.G01_2016.1...
2018-12-30 00:33:00.061400064    2018/364/ionPrf_C006.2018.364.00.33.G14_2016.1...
                                                       ...                        
2019-01-04 23:05:00.062200064    2019/004/ionPrf_C006.2019.004.23.05.G22_2016.1...
2019-01-04 23:23:00.063200000    2019/004/ionPrf_C006.2019.004.23.23.G32_2016.1...
2019-01-04 23:33:00.061000192    2019/004/ionPrf_C006.2019.004.23.33.G10_2016.1...
2019-01-04 23:35:00.063100160    2019/004/ionPrf_C006.2019.004.23.35.G31_2016.1...
2019-01-04 23:41:00.062000128    2019/004/ionPrf_C006.2019.004.23.41.G20_2016.1...
Length: 891, dtype: object

In [7]: import numpy as np

In [8]: np.__version__
Out[8]: '1.19.2'

@rstoneback
Collaborator Author

I did try to specifically get version 1.19 for numpy but specifying numpy=1.19.0 doesn't work and numpy=1.19 gets me numpy v 1.19.2. This is via conda.

@rstoneback
Collaborator Author

I'm on python 3.7.6. What about netCDF4 version?

@jklenzing
Member

python 3.8.2, netCDF4 1.5.6
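Since the comparison above goes through versions one package at a time, a single report may make it easier to line up two environments. This is a minimal sketch using only the standard library; the helper name `report_versions` and the default package list are assumptions drawn from the packages named in this thread. `importlib.metadata` requires Python 3.8 or later; on 3.7 the `importlib_metadata` backport provides the same interface.

```python
from importlib import metadata
import platform


def report_versions(packages=("numpy", "pandas", "xarray", "netCDF4", "pysat")):
    """Return a dict of installed versions for the given distributions."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Keep missing packages in the report instead of raising.
            versions[name] = "not installed"
    return versions


for name, version in report_versions().items():
    print(f"{name}: {version}")
```

Pasting the printed lines from both machines side by side shows any mismatch at a glance.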

@rstoneback
Collaborator Author

OK! I got through a fresh install and files are still looking good on my end. I don't have any other immediate ideas for things to try on my end. Open to suggestions of course and I'll give it some thought while poking at other issues.

Python 3.8.2 | packaged by conda-forge | (default, Apr 24 2020, 07:56:27) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.24.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pysat

In [2]: gps = pysat.Instrument('cosmic', 'gps', 'ionprf', update_files=True)

In [3]: gps.files.files
Out[3]: Series([], dtype: object)

In [4]: import datetime as dt

In [5]: gps.download(dt.datetime(2018, 12, 30), dt.datetime(2019, 1, 4))

In [6]: gps.files.files
Out[6]: 
2018-12-30 00:04:00.062799872    2018/364/ionPrf_C006.2018.364.00.04.G28_2016.1...
2018-12-30 00:14:00.061700096    2018/364/ionPrf_C006.2018.364.00.14.G17_2016.1...
2018-12-30 00:20:00.062200064    2018/364/ionPrf_C006.2018.364.00.20.G22_2016.1...
2018-12-30 00:22:00.060100096    2018/364/ionPrf_C006.2018.364.00.22.G01_2016.1...
2018-12-30 00:33:00.061400064    2018/364/ionPrf_C006.2018.364.00.33.G14_2016.1...
                                                       ...                        
2019-01-04 23:05:00.062200064    2019/004/ionPrf_C006.2019.004.23.05.G22_2016.1...
2019-01-04 23:23:00.063200000    2019/004/ionPrf_C006.2019.004.23.23.G32_2016.1...
2019-01-04 23:33:00.061000192    2019/004/ionPrf_C006.2019.004.23.33.G10_2016.1...
2019-01-04 23:35:00.063100160    2019/004/ionPrf_C006.2019.004.23.35.G31_2016.1...
2019-01-04 23:41:00.062000128    2019/004/ionPrf_C006.2019.004.23.41.G20_2016.1...
Length: 891, dtype: object
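For checking file dates by hand, the stamp embedded in each file name can be decoded directly. The sketch below assumes the fields are satellite, year, day of year, hour, minute, and GPS satellite. That reading is inferred from the listing above (day 364 of 2018 pairs with 2018-12-30, and `00.04` with 00:04), not taken from the pysatCDAAC source, and `parse_file_stamp` is a hypothetical helper. It does not reproduce the sub-second offsets seen in the file index.

```python
import datetime as dt


def parse_file_stamp(stamp):
    """Split a stamp like 'C006.2018.364.00.04.G28' into its fields.

    Hypothetical helper; the field meanings are inferred from the file
    listing, not taken from the pysatCDAAC source.
    """
    leo_id, year, doy, hour, minute, gps_id = stamp.split(".")
    # Day of year is 1-based, so day 1 adds zero days to January 1.
    start = dt.datetime(int(year), 1, 1) + dt.timedelta(
        days=int(doy) - 1, hours=int(hour), minutes=int(minute))
    return leo_id, gps_id, start


print(parse_file_stamp("C006.2018.364.00.04.G28"))
```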

@rstoneback rstoneback mentioned this pull request Jun 14, 2021
9 tasks
@rstoneback
Collaborator Author

I created issue #24 to track the download and file issue. That bug is out of scope for this particular pull since no changes have been made to those lines here.

@jklenzing
Member

OK, the date corruption is still messing with this, but I've tried to fix it as best as I can. Deleted all data, downloaded only Jan 1 2014 data for ionprf.

  • 1095 files downloaded.
  • 1041 data points loaded.
  • 358 points have a NaN for the variable 'year'
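To put the three counts above side by side, here is a small bookkeeping sketch. It assumes each file should contribute one point along `time`, which is an assumption about the intended behavior rather than something stated here; `load_bookkeeping` is a hypothetical helper.

```python
def load_bookkeeping(n_files, n_points, n_nan):
    """Summarize the gap between files on disk and usable points."""
    return {
        # Files that did not produce a point, assuming one point per file.
        "files_without_points": n_files - n_points,
        # Loaded points whose value is not NaN.
        "points_with_value": n_points - n_nan,
        "nan_fraction": n_nan / n_points,
    }


summary = load_bookkeeping(n_files=1095, n_points=1041, n_nan=358)
print(summary["files_without_points"])   # 54
print(summary["points_with_value"])      # 683
print(f"{summary['nan_fraction']:.1%}")  # 34.4%
```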

Here's the data loaded:

<xarray.Dataset>
Dimensions:           (time: 1041, RO: 672)
Coordinates:
  * time              (time) datetime64[ns] 2014-01-01T00:07:28.000324096 ......
    MSL_alt           (time, RO) float64 76.56 78.86 81.15 83.45 ... nan nan nan
    GEO_lat           (time, RO) float64 23.49 23.52 23.54 23.56 ... nan nan nan
    GEO_lon           (time, RO) float64 136.9 136.8 136.8 136.8 ... nan nan nan
    OCC_azi           (time, RO) float64 -112.0 -112.1 -112.2 ... nan nan nan
Dimensions without coordinates: RO
Data variables: (12/53)
    occ_id            (time) float64 1.0 nan 1.0 nan nan ... 1.0 nan 1.0 nan 1.0
    fiducial_id       (time) object '' nan '' nan nan '' ... '' '' nan '' nan ''
    reference_sat_id  (time) float64 0.0 nan 0.0 nan nan ... 0.0 nan 0.0 nan 0.0
    occulting_sat_id  (time) float64 32.0 nan 26.0 nan nan ... nan 25.0 nan 17.0
    year              (time) float64 2.014e+03 nan 2.014e+03 ... nan 2.014e+03
    month             (time) float64 1.0 nan 1.0 nan nan ... 1.0 nan 1.0 nan 1.0
    ...                ...
    parmsfile         (time) object 'parms8' nan 'parms8' ... nan 'parms8'
    center            (time) object 'UCAR/CDAAC' nan ... nan 'UCAR/CDAAC'
    mission           (time) object 'COSMIC' nan 'COSMIC' ... nan 'COSMIC'
    creation_time     (time) object '13-SEP-14 04:01' nan ... '13-SEP-14 04:02'
    TEC_cal           (time, RO) float64 200.7 202.5 204.4 206.4 ... nan nan nan
    ELEC_dens         (time, RO) float64 nan nan nan nan nan ... nan nan nan nan

I'm also getting a user warning from pysat about metadata, which is presumably because we are converting attributes to variables when we sum across files. Not sure why so much data is being dropped.
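The attribute-to-variable conversion mentioned above can be pictured with plain dictionaries: each file carries scalar attributes, and combining files turns every attribute name into one column with an entry per file. The sketch below is only an illustration of that mechanism with invented toy input. The helper `attrs_to_variables` is hypothetical, and nothing here establishes why the real entries are NaN.

```python
import math


def attrs_to_variables(per_file_attrs):
    """Turn a list of per-file attribute dicts into per-name columns.

    Hypothetical illustration: each attribute name becomes one list with
    one entry per file, and files lacking that attribute contribute NaN.
    """
    names = sorted({name for attrs in per_file_attrs for name in attrs})
    return {
        name: [attrs.get(name, math.nan) for attrs in per_file_attrs]
        for name in names
    }


# Toy input, not real COSMIC attributes.
files = [
    {"year": 2014, "mission": "COSMIC"},
    {},  # a file whose attributes did not come through
    {"year": 2014, "mission": "COSMIC"},
]
columns = attrs_to_variables(files)
print(columns["year"])
```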

Member

@jklenzing jklenzing left a comment


Some style bits on comments. Could you verify what loaded data looks like on your end? Not sure if my "dropped data" issue above is related to the file issues documented elsewhere.

Co-authored-by: Jeff Klenzing <jklenzing@gmail.com>
@rstoneback
Collaborator Author

In [9]: gps.load(2018, 365)
<ipython-input-9-17f90e0af544>:1: UserWarning: Metadata set to defaults, as they were missing in the Instrument
  gps.load(2018, 365)

In [10]: gps.data
Out[10]: 
<xarray.Dataset>
Dimensions:           (RO: 490, time: 95)
Coordinates:
  * time              (time) datetime64[ns] 2018-12-31T00:04:25.000145920 ......
    MSL_alt           (time, RO) float64 52.67 55.37 58.08 60.77 ... nan nan nan
    GEO_lat           (time, RO) float64 -36.9 -36.92 -36.93 ... nan nan nan
    GEO_lon           (time, RO) float64 -137.6 -137.7 -137.7 ... nan nan nan
    OCC_azi           (time, RO) float64 141.1 141.1 141.1 141.1 ... nan nan nan
Dimensions without coordinates: RO
Data variables: (12/53)
    occ_id            (time) float64 0.0 0.0 nan nan nan ... 0.0 0.0 0.0 nan 0.0
    fiducial_id       (time) object '    ' '    ' nan nan ... '    ' nan '    '
    reference_sat_id  (time) float64 -999.0 -999.0 nan nan ... -999.0 nan -999.0
    occulting_sat_id  (time) float64 14.0 31.0 nan nan nan ... 24.0 12.0 nan 6.0
    year              (time) float64 2.018e+03 2.018e+03 nan ... nan 2.018e+03
    month             (time) float64 12.0 12.0 nan nan ... 12.0 12.0 nan 12.0
    ...                ...
    parmsfile         (time) object 'parms8' 'parms8' nan ... nan 'parms8'
    center            (time) object 'UCAR/CDAAC' 'UCAR/CDAAC' ... 'UCAR/CDAAC'
    mission           (time) object 'COSMIC' 'COSMIC' nan ... nan 'COSMIC'
    creation_time     (time) object '06-APR-19 01:46' ... '06-APR-19 01:46'
    TEC_cal           (time, RO) float64 123.9 124.2 124.3 124.6 ... nan nan nan
    ELEC_dens         (time, RO) float64 1.687e+05 1.783e+05 ... nan nan

I'm also getting NaNs in occ_id and some others. When I check MSL_alt, though, all the starting altitudes are valid.

In [15]: gps[:, 0, 'MSL_alt']
Out[15]: 
<xarray.DataArray 'MSL_alt' (time: 95)>
array([52.66825104, 48.42986679, 58.19499207, 58.49398804, 58.96940231,
       46.29630661, 52.10069656, 63.45686722, 43.24747467, 56.62965012,
       44.27114487, 56.93402863, 54.12645721, 63.38557816, 55.27773666,
       50.20518494, 55.63033676, 37.34650421, 64.33950806, 55.44276428,
       53.65758514, 41.82579422, 51.86764145, 36.30073547, 39.68922424,
       55.43086624, 56.45871735, 43.15644836, 45.46442413, 50.55051804,
       64.1120224 , 48.71077347, 54.18301392, 57.50118256, 57.89498138,
       53.99144363, 47.18326569, 71.00965881, 54.10379791, 37.93962479,
       48.18090439, 61.35616684, 60.07112885, 57.65946579, 67.86323547,
       53.03757477, 42.25401306, 55.38709641, 47.97056198, 43.74263382,
       58.46900558, 43.55466843, 35.29411697, 58.04468536, 38.9177475 ,
       58.56344223, 46.8060379 , 52.69259644, 62.12597656, 53.95124054,
       59.46157455, 46.85376358, 45.90418625, 66.72284698, 39.32769394,
       54.50231552, 44.27587128, 49.54469299, 64.52571106, 72.15966034,
       41.98462677, 40.15797806, 51.29039764, 56.70417023, 62.16763306,
       40.00085068, 58.69358063, 47.50477219, 52.33638   , 56.17718506,
       44.95435715, 56.55222321, 43.51290131, 56.54217911, 43.90591049,
       59.87452698, 47.56930161, 57.19068146, 39.97000504, 38.92786026,
       41.8406868 , 53.08489227, 66.14324188, 54.96273422, 48.13269424])
Coordinates:
  * time     (time) datetime64[ns] 2018-12-31T00:04:25.000145920 ... 2018-12-...
    MSL_alt  (time) float64 52.67 48.43 58.19 58.49 ... 53.08 66.14 54.96 48.13
    GEO_lat  (time) float64 -36.9 -57.23 -61.48 -39.93 ... -36.68 -26.27 -2.639
    GEO_lon  (time) float64 -137.6 -95.67 -6.29 -15.29 ... 44.23 54.11 72.18
    OCC_azi  (time) float64 141.1 86.54 39.58 13.32 ... 86.59 13.06 14.3 31.85

I'm going to have to come back to this. I've got some checkout work to do before 8/31. I'd expect some NaNs in the coordinate variables near the end of each profile, since I had to extend all profiles to the longest profile length for the day.
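The padding described above can be sketched as follows. Plain lists keep the example self-contained; the actual instrument code works with xarray, and `pad_profiles` is a hypothetical name. The point is only that shorter profiles pick up trailing NaN while the first entry of every profile stays valid, which matches the `MSL_alt` check above.

```python
import math


def pad_profiles(profiles):
    """Pad each profile with NaN up to the longest profile's length."""
    longest = max(len(profile) for profile in profiles)
    return [
        list(profile) + [math.nan] * (longest - len(profile))
        for profile in profiles
    ]


# Toy altitudes, not real COSMIC values.
padded = pad_profiles([[52.7, 55.4, 58.1], [48.4, 51.0]])
print([len(row) for row in padded])
```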

@jklenzing
Member

jklenzing commented Sep 30, 2021

Initial checks using the pysat branch at pysat/pysat#913:

File issues are solved. Loaded 'ionprf' for 2008 (Jan 1, Jan 31, Dec 30, Dec 31) and inspected total points versus files. Dates chosen to span leap year issues. When loaded with clean_level='none', points match in all cases.

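def num_files(inst, year, month, day):
    """Print how many files in inst.files.files fall on the given date."""
    # Boolean masks over the datetime index of the file list.
    ind1 = inst.files.files.index.year == year
    ind2 = inst.files.files.index.month == month
    ind3 = inst.files.files.index.day == day
    # True counts as 1, so the sum is the number of matching files.
    print(sum(ind1 & ind2 & ind3))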
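Because 2008 is a leap year, Dec 30 and Dec 31 fall on days 365 and 366, one later than in a regular year. A minimal standard-library sketch of the day-of-year conversion, with `doy_to_date` as a hypothetical helper name, shows the shift:

```python
import datetime as dt


def doy_to_date(year, doy):
    """Convert a year and 1-based day of year to a calendar date."""
    return dt.date(year, 1, 1) + dt.timedelta(days=doy - 1)


print(doy_to_date(2008, 366))  # 2008-12-31 (leap year)
print(doy_to_date(2018, 365))  # 2018-12-31
print(doy_to_date(2008, 365))  # 2008-12-30
```

The second line matches the earlier `gps.load(2018, 365)` call in this thread, which returned data starting on 2018-12-31.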

@jklenzing
Member

jklenzing commented Sep 30, 2021

Verified this loads as expected for 'scnlvl1' for the same dates.

Member

@jklenzing jklenzing left a comment


After loading and working with the data, it looks like everything is working as expected. Tests run for 'ionprf' and 'scnlvl1'. There are some issues (default metadata), but I think those can be written up as a separate issue.

@jklenzing
Member

Going to start working on a branch downstream of this. Recommend we merge this into develop and treat other major changes as new branches.

@rstoneback
Collaborator Author

Thanks for the reminder that this was already ready 🐒

@rstoneback rstoneback merged commit 3284941 into develop Oct 21, 2021
@rstoneback rstoneback deleted the xarray_support branch October 21, 2021 21:07
Development

Successfully merging this pull request may close these issues.

BUG/ENH: modernize cosmic_gps methods
3 participants