Collect and publish data for multiple different time slots #142

Open

pnuu wants to merge 11 commits into main
Conversation

@pnuu pnuu commented May 24, 2023

This PR adds a way to collect metadata for multiple, separately configurable time slots and publish it in a single message.

Closes #140

@pnuu pnuu requested a review from mraspaud May 24, 2023 11:26
@pnuu pnuu self-assigned this May 24, 2023
codecov bot commented May 24, 2023

Codecov Report

Merging #142 (ad7cbbe) into main (5f154a1) will decrease coverage by 0.68%.
The diff coverage is 97.90%.

@@            Coverage Diff             @@
##             main     #142      +/-   ##
==========================================
- Coverage   91.64%   90.96%   -0.68%     
==========================================
  Files          27       29       +2     
  Lines        4115     4547     +432     
==========================================
+ Hits         3771     4136     +365     
- Misses        344      411      +67     
Flag       Coverage Δ
unittests  90.96% <97.90%> (-0.68%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                              Coverage Δ
pytroll_collectors/segments.py              93.14% <96.66%> (+0.64%) ⬆️
pytroll_collectors/tests/test_segments.py   100.00% <100.00%> (ø)

... and 3 files with indirect coverage changes


coveralls commented May 24, 2023

Pull Request Test Coverage Report for Build 5087848949

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 0.0%

Totals Coverage Status
  • Change from base Build 5067098725: 0.0%
  • Covered Lines: 0
  • Relevant Lines: 0

💛 - Coveralls


pnuu commented May 25, 2023

For reference, here are some message data structures I found in my production logs.

For the segment gatherer I only found this structure:

dataset = {
    "start_time": "2023-05-25T10:50:00",
    "platform_name": "Meteosat-11",
    "sensor": ["seviri"],
    "dataset": [
        {
            "uri": "/data/oper/seviri/rss/level1.5/H-000-MSG4__-MSG4_RSS____-_________-PRO______-202305251050-__",
            "uid": "H-000-MSG4__-MSG4_RSS____-_________-PRO______-202305251050-__"
        },
        ...
        {
            "uri": "/data/oper/seviri/rss/level1.5/H-000-MSG4__-MSG4_RSS____-_________-EPI______-202305251050-__",
            "uid": "H-000-MSG4__-MSG4_RSS____-_________-EPI______-202305251050-__"
        }
    ],
}

The message type is dataset. The same structure also appears when collecting e.g. VIIRS channel segments.
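For a downstream consumer, picking the file paths out of such a dataset message is straightforward. A minimal sketch (the helper name and the shortened example paths are illustrative, not part of pytroll-collectors):

```python
def dataset_uris(msg_data):
    """Return the URIs of all files in a 'dataset' type message payload."""
    return [item["uri"] for item in msg_data["dataset"]]


# Shortened example payload mirroring the structure above.
example = {
    "start_time": "2023-05-25T10:50:00",
    "platform_name": "Meteosat-11",
    "sensor": ["seviri"],
    "dataset": [
        {"uri": "/data/seviri/PRO", "uid": "PRO"},
        {"uri": "/data/seviri/EPI", "uid": "EPI"},
    ],
}
print(dataset_uris(example))
```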

For the simple case of single-segment AVHRR data, the geographic gatherer returns collection messages such as:

collection = {
    "sensor": "avhrr",
    "platform_name": "Metop-C",
    "start_time": "2023-05-25T06:24:00",
    "end_time": "2023-05-25T06:33:00",
    "collection": [
        {
            "start_time": "2023-05-25T06:24:00",
            "end_time": "2023-05-25T06:25:00",
            "uri": "/data/oper/avhrr/ears/level0/AVHR_HRP_00_M03_20230525062400Z_20230525062500Z_N_O_20230525062820Z",
            "uid": "AVHR_HRP_00_M03_20230525062400Z_20230525062500Z_N_O_20230525062820Z"
        },
        ...
        {
            "start_time": "2023-05-25T06:32:00",
            "end_time": "2023-05-25T06:33:00",
            "uri": "/data/oper/avhrr/ears/level0/AVHR_HRP_00_M03_20230525063200Z_20230525063300Z_N_O_20230525063403Z",
            "uid": "AVHR_HRP_00_M03_20230525063200Z_20230525063300Z_N_O_20230525063403Z"
        }
    ]
}

For compact VIIRS data, which has two channel segments for a single time, the collection consists of datasets:

collection_of_datasets = {
    "start_time": "2023-05-11T01:40:54.200000",
    "end_time": "2023-05-11T01:50:51.500000",
    "platform_name": "NOAA-20",
    "sensor": ["viirs"],
    "collection": [
        {
            "dataset": [
                {
                    "uri": "/data/oper/viirs/ears/level1b/SVDNBC_j01_d20230511_t0140542_e0142187_b28372_c20230511015204000213_eum_ops.h5",
                    "uid": "SVDNBC_j01_d20230511_t0140542_e0142187_b28372_c20230511015204000213_eum_ops.h5"
                },
                {
                    "uri": "/data/oper/viirs/ears/level1b/SVMC_j01_d20230511_t0140542_e0142187_b28372_c20230511015212000170_eum_ops.h5",
                    "uid": "SVMC_j01_d20230511_t0140542_e0142187_b28372_c20230511015212000170_eum_ops.h5"
                }
            ],
            "start_time": "2023-05-11T01:40:54.200000",
            "end_time": "2023-05-11T01:42:18.700000"
        },
        ...
        {
            "dataset": [
                {
                    "uri": "/data/oper/viirs/ears/level1b/SVDNBC_j01_d20230511_t0149270_e0150515_b28372_c20230511015839000126_eum_ops.h5",
                    "uid": "SVDNBC_j01_d20230511_t0149270_e0150515_b28372_c20230511015839000126_eum_ops.h5"
                },
                {
                    "uri": "/data/oper/viirs/ears/level1b/SVMC_j01_d20230511_t0149270_e0150515_b28372_c20230511015848000237_eum_ops.h5",
                    "uid": "SVMC_j01_d20230511_t0149270_e0150515_b28372_c20230511015848000237_eum_ops.h5"
                }
            ],
            "start_time": "2023-05-11T01:49:27",
            "end_time": "2023-05-11T01:50:51.500000"
        }
    ]
}
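The three shapes above differ only in how deeply the file entries are nested, so a small helper can flatten any of them into one file list. This is just a sketch to illustrate the nesting, not code from this PR:

```python
def flatten_files(data):
    """Return all file dicts from a 'dataset', 'collection', or
    collection-of-datasets message payload."""
    if "dataset" in data:
        return list(data["dataset"])
    files = []
    for item in data.get("collection", []):
        if "dataset" in item:
            # Collection of datasets: one nested file list per time slot.
            files.extend(item["dataset"])
        elif "uri" in item:
            # Plain collection: each entry is a file itself.
            files.append(item)
    return files
```

For the three example payloads above this would return the segment files, the AVHRR granule files, and the per-slot VIIRS channel files, respectively, all as one flat list.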


pnuu commented May 25, 2023

The multicollection message type could be something like this:

multicollection = {
    "start_times": ["2023-05-25T10:50:00", ..., "2023-05-25T11:50:00"],
    "end_times": [],
    "platform_name": "Meteosat-11",
    "sensor": ["seviri"],
    "multicollection":
    [
        {
            "start_time": "2023-05-25T10:50:00",
            "platform_name": "Meteosat-11",
            "sensor": ["seviri"],
            "dataset": [
                {
                    "uri": "/data/oper/seviri/rss/level1.5/H-000-MSG4__-MSG4_RSS____-_________-PRO______-202305251050-__",
                    "uid": "H-000-MSG4__-MSG4_RSS____-_________-PRO______-202305251050-__"
                },
                ...
                {
                    "uri": "/data/oper/seviri/rss/level1.5/H-000-MSG4__-MSG4_RSS____-_________-EPI______-202305251050-__",
                    "uid": "H-000-MSG4__-MSG4_RSS____-_________-EPI______-202305251050-__"
                }
            ],
        },
        ...
        {
            "start_time": "2023-05-25T11:50:00",
            "platform_name": "Meteosat-11",
            "sensor": ["seviri"],
            "dataset": [
                {
                    "uri": "/data/oper/seviri/rss/level1.5/H-000-MSG4__-MSG4_RSS____-_________-PRO______-202305251150-__",
                    "uid": "H-000-MSG4__-MSG4_RSS____-_________-PRO______-202305251150-__"
                },
                ...
                {
                    "uri": "/data/oper/seviri/rss/level1.5/H-000-MSG4__-MSG4_RSS____-_________-EPI______-202305251150-__",
                    "uid": "H-000-MSG4__-MSG4_RSS____-_________-EPI______-202305251150-__"
                }
            ],
        }
    ]
}
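A consumer of such a message could iterate the slots like this (a sketch; the function name is made up for illustration):

```python
def files_per_slot(msg_data):
    """Map each collected time slot's start time to the UIDs of its files."""
    return {
        entry["start_time"]: [f["uid"] for f in entry["dataset"]]
        for entry in msg_data["multicollection"]
    }


# Shortened example mirroring the structure above.
example = {
    "start_times": ["2023-05-25T10:50:00", "2023-05-25T11:50:00"],
    "multicollection": [
        {"start_time": "2023-05-25T10:50:00",
         "dataset": [{"uri": "/data/a", "uid": "a"}]},
        {"start_time": "2023-05-25T11:50:00",
         "dataset": [{"uri": "/data/b", "uid": "b"}]},
    ],
}
print(files_per_slot(example))
```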

The top-level lists start_times and end_times might later help with sorting or data selection. With my chosen approach of reusing segment gatherer internals, it is not possible to collect data from different streams. If the collection instead happened in a separate process listening to multiple segment or geographic gatherers, we could get multicollections like this:

multicollection_2 = {
    "start_times": ["2023-05-25T10:50:00", ..., "2023-05-06T21:52:10.300000"],
    "end_times": [None, ..., "2023-05-06T21:53:34.800000"],
    "platform_names": ["Meteosat-11", ..., "NOAA-20"],
    "sensors": ["seviri", ..., "viirs"],
    "multicollection":
    [
        {
            "start_time": "2023-05-25T10:50:00",
            "platform_name": "Meteosat-11",
            "sensor": ["seviri"],
            "dataset": [
                {
                    "uri": "/data/oper/seviri/rss/level1.5/H-000-MSG4__-MSG4_RSS____-_________-PRO______-202305251050-__",
                    "uid": "H-000-MSG4__-MSG4_RSS____-_________-PRO______-202305251050-__"
                },
                ...
                {
                    "uri": "/data/oper/seviri/rss/level1.5/H-000-MSG4__-MSG4_RSS____-_________-EPI______-202305251050-__",
                    "uid": "H-000-MSG4__-MSG4_RSS____-_________-EPI______-202305251050-__"
                }
            ],
        },
        ...
        {
            "start_time": "2023-05-06T21:52:10.300000",
            "end_time": "2023-05-06T21:53:34.800000",
            "platform_name": "NOAA-20",
            "sensor": ["viirs"],
            "dataset": [
                {
                    "uri": "/data/oper/viirs/ears/level1b/SVDNBC_j01_d20230506_t2152103_e2153348_b28312_c20230506220612000459_eum_ops.h5",
                    "uid": "SVDNBC_j01_d20230506_t2152103_e2153348_b28312_c20230506220612000459_eum_ops.h5"
                },
                ...
                {
                    "uri": "/data/oper/viirs/ears/level1b/SVMC_j01_d20230506_t2152103_e2153348_b28312_c20230506220623000658_eum_ops.h5",
                    "uid": "SVMC_j01_d20230506_t2152103_e2153348_b28312_c20230506220623000658_eum_ops.h5"
                }
            ],
        }
    ]
}

This structure could be used, for example, to collect geo ring data that could then be processed with Satpy MultiScene in one go. Now that I think of it, this would need completely different logic compared to the initial purpose of this PR (publishing multiple scenes with the same time, for example), so I'll go with the former.
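As a sketch of the sorting mentioned above: since the slot timestamps are ISO 8601 strings, the multicollection entries can be ordered and their file lists grouped per slot with plain string comparison, and each resulting uri list could then be opened as one scene. Function name and paths here are hypothetical:

```python
from operator import itemgetter


def sorted_slot_files(msg_data):
    """Return (start_time, list-of-uris) pairs ordered by slot start time.

    ISO 8601 timestamps sort correctly as plain strings, so no datetime
    parsing is needed.
    """
    entries = sorted(msg_data["multicollection"], key=itemgetter("start_time"))
    return [(e["start_time"], [f["uri"] for f in e["dataset"]]) for e in entries]


# Deliberately out-of-order example with made-up paths.
example = {
    "multicollection": [
        {"start_time": "2023-05-25T10:50:00",
         "dataset": [{"uri": "/data/seviri/PRO"}, {"uri": "/data/seviri/EPI"}]},
        {"start_time": "2023-05-06T21:52:10.300000",
         "dataset": [{"uri": "/data/viirs/SVMC"}]},
    ],
}
print(sorted_slot_files(example))
```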


pnuu commented May 25, 2023

As @mraspaud suggested in #140 (comment), I'll rename this collection type to temporal_collection and start building the metadata collection.

@pnuu pnuu marked this pull request as ready for review May 26, 2023 06:18
Projects
Status: Ready for review
Development

Successfully merging this pull request may close these issues.

Add "multi-scene" collecting and publishing