Proposed Recipes for Himawari-8 Level 3 SST #173

Open
sharkinsspatial opened this issue Aug 17, 2022 · 6 comments
@sharkinsspatial
Contributor

sharkinsspatial commented Aug 17, 2022

Dataset Name

Himawari-8

Dataset URL

https://registry.opendata.aws/noaa-himawari/

Description

@pbranson has prototyped some initial kerchunk index generation for the Himawari-8 Level 3 SST data as part of a project for OceanHackWeek in this repo. Using his example, I'll try to put together an initial recipe. Ref: oceanhackweek/ohw22-proj-kerchunk#2

License

Open Data

Data Format

NetCDF

Data Format (other)

No response

Access protocol

S3

Source File Organization

s3://noaa-himawari8/{dataset}/{year}/{month}/{day}/{hour}/YYYYMMDDHHMMSS-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.80-v02.0-fv01.0.nc

This changes to s3://noaa-himawari8/{dataset}/{year}/{month}/{day}/{hour}/YYYYMMDDHHMMSS-NCCF-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.80-v02.0-fv01.0.nc from 7 April 2021 onward.

Example URLs

s3://noaa-himawari8/AHI-L2-FLDK-SST/2022/01/13/0000/20220113000000-NCCF-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.80-v02.0-fv01.0.nc

s3://noaa-himawari8/AHI-L2-FLDK-SST/2020/01/13/0000/20200113000000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.80-v02.0-fv01.0.nc

Authorization

No response

Transformation / Processing

NA

Target Format

Reference Filesystem (Kerchunk)

Comments

No response

@chuckwondo

chuckwondo commented Aug 18, 2022

@sharkinsspatial, per our "jam" session yesterday with @wildintellect, Anthony Lukach, and Aimee Barciauskas, I'm posting details on our work to identify gaps in the available files in AWS S3.

We may want to tidy things up a bit within a single script, but here are all the parts.

Produce List of Actual L3C Files

To produce a list of the relevant L3C files in lexicographical (and, given the naming convention, also chronological) order:

aws s3 ls --recursive s3://noaa-himawari8/AHI-L2-FLDK-SST/ | grep '[-]L3C' | cut -c 32- > l3c-actual.txt

Example file:

AHI-L2-FLDK-SST/2020/01/16/1500/20200116150000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.60-v02.0-fv01.0.nc
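If the fixed column offset used by `cut -c 32-` ever proves fragile, the same filtering could be sketched in Python by splitting each listing line on whitespace instead. This assumes each line of `aws s3 ls --recursive` output has the layout date, time, size, key; the function name `l3c_keys` is made up for illustration.

```python
def l3c_keys(ls_lines):
    """Extract object keys containing '-L3C' from `aws s3 ls --recursive` output lines."""
    keys = []
    for line in ls_lines:
        # Assumed layout of each line: "<date> <time> <size> <key>".
        # maxsplit=3 keeps the key intact even if it were to contain spaces.
        parts = line.rstrip("\n").split(maxsplit=3)
        if len(parts) == 4 and "-L3C" in parts[3]:
            keys.append(parts[3])
    return keys


# Possible usage, after saving the raw listing to a (hypothetical) file:
#   aws s3 ls --recursive s3://noaa-himawari8/AHI-L2-FLDK-SST/ > listing.txt
#   with open("listing.txt") as f:
#       print("\n".join(l3c_keys(f)))
```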

Upon visual inspection, we identified 3 places where the file pattern shifts:

  • Change in version from V2.60 to V2.71, occurring at this point in the list (note that the transition occurs between the 2 middle items, and that both of those items share the same date/time value, so we have an overlap):

    AHI-L2-FLDK-SST/2020/07/02/1300/20200702130000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.60-v02.0-fv01.0.nc
    AHI-L2-FLDK-SST/2020/07/02/1400/20200702140000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.60-v02.0-fv01.0.nc
    AHI-L2-FLDK-SST/2020/07/02/1400/20200702140000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.71-v02.0-fv01.0.nc
    AHI-L2-FLDK-SST/2020/07/06/1300/20200706130000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.71-v02.0-fv01.0.nc
    
  • Change in version from V2.71 to V2.80 (note that several hourly files are missing between these 2, so we cannot tell in which hourly interval the version change occurs):

    AHI-L2-FLDK-SST/2021/03/22/1600/20210322160000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.71-v02.0-fv01.0.nc
    AHI-L2-FLDK-SST/2021/03/23/0100/20210323010000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.80-v02.0-fv01.0.nc
    
  • Change in format from STAR to NCCF (again, there are several hourly files missing between these 2):

    AHI-L2-FLDK-SST/2021/04/05/1500/20210405150000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.80-v02.0-fv01.0.nc
    AHI-L2-FLDK-SST/2021/04/06/1200/20210406120000-NCCF-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.80-v02.0-fv01.0.nc
    

Further, the actual transition from STAR to NCCF (shown above) does not appear to jibe with the documented time of that change, which seems to indicate that the transition should appear starting 2021/05/03. Perhaps that document indicates only when NCCF files first become available, and not necessarily the date of the earliest NCCF files produced (sometime during 2021/04/05 or 2021/04/06, based upon the 2 files shown above). [Thanks to @wildintellect for locating this reference.]
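As a rough alternative to visual inspection, the pattern shifts listed above could be flagged by parsing each key and comparing neighbors. This is only a sketch: the regular expression is inferred from the example keys in this comment, and the names `KEY_RE`, `parse_key`, and `find_transitions` are made up for illustration.

```python
import re
from datetime import datetime

# Inferred from keys such as
# AHI-L2-FLDK-SST/2020/01/16/1500/20200116150000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.60-v02.0-fv01.0.nc
KEY_RE = re.compile(
    r"/(?P<stamp>\d{14})-(?P<fmt>[A-Z]+)-L3C_GHRSST-.*-ACSPO_(?P<version>V\d+\.\d+)-"
)


def parse_key(key):
    """Return (timestamp, format, version) for an L3C key, or None if it does not match."""
    m = KEY_RE.search(key)
    if m is None:
        return None
    stamp = datetime.strptime(m["stamp"], "%Y%m%d%H%M%S")
    return stamp, m["fmt"], m["version"]


def find_transitions(keys):
    """Yield (previous_key, key) pairs where the format or version changes between neighbors."""
    prev_key, prev = None, None
    for key in keys:
        parsed = parse_key(key)
        if parsed is None:
            continue  # skip lines that are not L3C keys
        # Compare only (format, version); the timestamp is expected to differ.
        if prev is not None and parsed[1:] != prev[1:]:
            yield prev_key, key
        prev_key, prev = key, parsed


# Possible usage against the sorted listing produced earlier:
#   with open("l3c-actual.txt") as f:
#       for before, after in find_transitions(line.strip() for line in f):
#           print(before, "->", after)
```

Since the input is assumed to be sorted, each reported pair brackets one change, and comparing the two parsed timestamps would also show whether hours are missing around it.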

Produce List of Expected L3C Files

Given that our visual inspection of the S3 file list makes it apparent that there are numerous gaps, we want to produce a list of expected hourly files so that we can identify all of the gaps.

This logic is similar to what we'll need for our FilePattern, and uses the pattern changes identified above (and perhaps writing something to automatically identify such pattern changes would be helpful, to avoid the manual, error-prone visual inspection):

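# list-expected-files.py

import pandas as pd


def print_filenames(start, end, fmt, version):
    """Print the expected hourly L3C key for every hour from start to end (inclusive)."""
    # "h" is the hourly alias; the uppercase "H" spelling is deprecated in recent pandas.
    for date in pd.date_range(start, end, freq="h"):
        print(
            "AHI-L2-FLDK-SST/{time:%Y/%m/%d/%H}00/{time:%Y%m%d%H}0000-{fmt}-"
            "L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_{version}-v02.0-fv01.0.nc".format(
                time=date,
                fmt=fmt,  # renamed from `format` to avoid shadowing the builtin
                version=version,
            )
        )


# One call per naming pattern, using the boundaries identified above.
# 2020-07-02 14:00 ends the first range and starts the second, since both versions exist for that hour.
print_filenames("2019-12-10 16:00:00", "2020-07-02 14:00:00", "STAR", "V2.60")
print_filenames("2020-07-02 14:00:00", "2021-03-22 16:00:00", "STAR", "V2.71")
print_filenames("2021-03-23 01:00:00", "2021-04-05 15:00:00", "STAR", "V2.80")
print_filenames("2021-04-06 12:00:00", "2022-08-17 23:00:00", "NCCF", "V2.80")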

To produce the list of expected files:

python list-expected-files.py > l3c-expected.txt

Identify Missing L3C Files

We can now produce a list of files that are missing from S3:

diff l3c-actual.txt l3c-expected.txt | grep "^>" | sed -E 's/^> (.*)/\1/' > l3c-missing.txt

For reference, I've attached a list of missing files, through 2022-08-17: l3c-missing.txt
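The `diff` pipeline works here because both lists are sorted the same way. A set difference gives the same kind of result without depending on line ordering. The following is only a sketch, and the function name `missing_keys` is made up for illustration.

```python
def missing_keys(actual, expected):
    """Return the expected keys that are absent from actual, in the order of `expected`."""
    present = {line.strip() for line in actual}
    stripped = (line.strip() for line in expected)
    # Blank lines are ignored; membership is an exact string match on the key.
    return [key for key in stripped if key and key not in present]


# Possible usage with the two files produced above:
#   with open("l3c-actual.txt") as a, open("l3c-expected.txt") as e:
#       print("\n".join(missing_keys(a, e)))
```

Either approach reports only hours that appear in the expected list, so the hours that fall between the four ranges in list-expected-files.py (for example, between 2021-03-22 16:00 and 2021-03-23 01:00) are not checked.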

@sharkinsspatial
Contributor Author

@Patrick-Keown In preparation for generating a kerchunk index for the Himawari-8 Level 3 SST data, we have identified several missing hourly time steps and a potential inconsistency in the timestamps for changes in product version id. There is an email contact associated with the NOAA BDP at https://registry.opendata.aws/noaa-himawari/ but, in the spirit of tracking the conversation in an open repository, I was hopeful that you might be able to provide some feedback on these anomalies. If there is a more appropriate point of contact for these discussions, please let me know.

  • Are these missing hourly timestamps expected?
  • Is there documentation describing the key naming structure for the version and the STAR to NCCF transition?

Thank you in advance for any input you can provide.

@Patrick-Keown

Patrick-Keown commented Aug 24, 2022 via email

@sharkinsspatial
Contributor Author

@Patrick-Keown I don't see an email in the chain for Matthew Jochum. Is there a good contact point for him? (You can message me directly with it so as not to publish his email on an open channel.) For a list of the missing L3 SST time steps, you can review this list compiled by @chuckwondo: https://github.com/pangeo-forge/staged-recipes/files/9377046/l3c-missing.txt

@Patrick-Keown

Patrick-Keown commented Aug 25, 2022 via email

@Patrick-Keown

Patrick-Keown commented Oct 11, 2022 via email
