Add option for non-greedy matching #76

Open
gerritholl opened this issue Jan 21, 2021 · 0 comments
Feature request

The file delivery system in use at the German Weather Service (DWD), the Automatic File Distributor, delivers files to a system by first creating a temporary file whose name starts with a ., then renaming that file when the transfer is complete. For example, it first creates .AVHR_HRP_00_M01_20210121082440Z_20210121082740Z_N_O_20210121082441Z, then moves that file to AVHR_HRP_00_M01_20210121082440Z_20210121082740Z_N_O_20210121082441Z when copying is complete. Due to the greedy matching implemented in Trollsift, any pattern that matches the final, intended file will also match the temporary, unintended file. For example, using filepattern={path}AVHR_HRP_00_{platform_name}_{start_time:%Y%m%d%H%M%S}Z_{end_time:%Y%m%d%H%M%S}Z_N_O_{processing_time:%Y%m%d%H%M%S}Z, we get:

[DEBUG: 2021-01-17 22:25:47,850: trollstalker] trigger: IN_CLOSE_WRITE
[DEBUG: 2021-01-17 22:25:47,851: trollstalker] processing /data/pytroll/IN/HRPT/.AVHR_HRP_00_M03_20210117221434Z_20210117221733Z_N_O_20210117221434Z
[DEBUG: 2021-01-17 22:25:47,851: trollstalker] filter: {path}AVHR_HRP_00_{platform_name}_{start_time:%Y%m%d%H%M%S}Z_{end_time:%Y%m%d%H%M%S}Z_N_O_{processing_time:%Y%m%d%H%M%S}Z  event: /data/pytroll/IN/HRPT/.AVHR_HRP_00_M03_20210117221434Z_20210117221733Z_N_O_20210117221434Z
[DEBUG: 2021-01-17 22:25:47,851: trollstalker] No origin_inotify_base_dir_skip_levels in self.custom_vars
[DEBUG: 2021-01-17 22:25:47,852: trollstalker] Extracted: OrderedDict([('path', '.'), ('platform_name', 'M03'), ('start_time', datetime.datetime(2021, 1, 17, 22, 14, 34)), ('end_time', datetime.datetime(2021, 1, 17, 22, 17, 33)), ('processing_time', datetime.datetime(2021, 1, 17, 22, 14, 34))])
[DEBUG: 2021-01-17 22:25:47,852: trollstalker] self.info['sensor']: ['avhrr/3']
[INFO: 2021-01-17 22:25:47,852: trollstalker] Publishing message pytroll://file/poes/avhrr file pytroll@oflks333.dwd.de 2021-01-17T22:25:47.852507 v1.01 application/json {"path": ".", "platform_name": "Metop-C", "start_time": "2021-01-17T22:14:34", "end_time": "2021-01-17T22:17:33", "processing_time": "2021-01-17T22:14:34", "uri": "/data/pytroll/IN/HRPT/.AVHR_HRP_00_M03_20210117221434Z_20210117221733Z_N_O_20210117221434Z", "uid": ".AVHR_HRP_00_M03_20210117221434Z_20210117221733Z_N_O_20210117221434Z", "sensor": ["avhrr/3"], "orig_platform_name": "M03"}
[DEBUG: 2021-01-17 22:25:47,865: trollstalker] trigger: IN_MOVED_TO
[DEBUG: 2021-01-17 22:25:47,865: trollstalker] processing /data/pytroll/IN/HRPT/AVHR_HRP_00_M03_20210117221434Z_20210117221733Z_N_O_20210117221434Z
[DEBUG: 2021-01-17 22:25:47,866: trollstalker] filter: {path}AVHR_HRP_00_{platform_name}_{start_time:%Y%m%d%H%M%S}Z_{end_time:%Y%m%d%H%M%S}Z_N_O_{processing_time:%Y%m%d%H%M%S}Z  event: /data/pytroll/IN/HRPT/AVHR_HRP_00_M03_20210117221434Z_20210117221733Z_N_O_20210117221434Z
[DEBUG: 2021-01-17 22:25:47,866: trollstalker] No origin_inotify_base_dir_skip_levels in self.custom_vars
[DEBUG: 2021-01-17 22:25:47,866: trollstalker] Extracted: OrderedDict([('path', ''), ('platform_name', 'M03'), ('start_time', datetime.datetime(2021, 1, 17, 22, 14, 34)), ('end_time', datetime.datetime(2021, 1, 17, 22, 17, 33)), ('processing_time', datetime.datetime(2021, 1, 17, 22, 14, 34))])
[DEBUG: 2021-01-17 22:25:47,866: trollstalker] self.info['sensor']: ['avhrr/3']
[INFO: 2021-01-17 22:25:47,867: trollstalker] Publishing message pytroll://file/poes/avhrr file pytroll@oflks333.dwd.de 2021-01-17T22:25:47.867246 v1.01 application/json {"path": "", "platform_name": "Metop-C", "start_time": "2021-01-17T22:14:34", "end_time": "2021-01-17T22:17:33", "processing_time": "2021-01-17T22:14:34", "uri": "/data/pytroll/IN/HRPT/AVHR_HRP_00_M03_20210117221434Z_20210117221733Z_N_O_20210117221434Z", "uid": "AVHR_HRP_00_M03_20210117221434Z_20210117221733Z_N_O_20210117221434Z", "sensor": ["avhrr/3"], "orig_platform_name": "M03"}

This is undesirable because Trollstalker publishes messages about the temporary files as well as the final ones. Further down the processing chain, those messages cause errors, because the temporary files no longer exist by the time they are read. Although these error messages do not prevent successful processing of the final input files, they clutter the logs and may make genuinely anomalous error messages harder to spot.
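The double match can be reproduced outside Trollstalker with a small, self-contained sketch. The regular expression below is only a stand-in for the filepattern above: free fields are modelled as `.*` and timestamp fields as 14 digits, which is an assumption for illustration and not Trollsift's actual implementation. The `extract` helper is hypothetical.

```python
import re
from datetime import datetime

# Stand-in for the filepattern in this issue (an assumption, not the regex
# that Trollsift builds internally): free fields become ".*", timestamp
# fields become exactly 14 digits.
PATTERN = re.compile(
    r"(?P<path>.*)AVHR_HRP_00_(?P<platform_name>.*)"
    r"_(?P<start_time>\d{14})Z_(?P<end_time>\d{14})Z"
    r"_N_O_(?P<processing_time>\d{14})Z"
)


def extract(filename):
    """Return the extracted fields, or None if the name does not match."""
    match = PATTERN.fullmatch(filename)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("start_time", "end_time", "processing_time"):
        fields[key] = datetime.strptime(fields[key], "%Y%m%d%H%M%S")
    return fields


FINAL = "AVHR_HRP_00_M03_20210117221434Z_20210117221733Z_N_O_20210117221434Z"
TEMPORARY = "." + FINAL

# Both names match; the leading dot of the temporary file ends up in "path".
print(repr(extract(TEMPORARY)["path"]))
print(repr(extract(FINAL)["path"]))
```

In this sketch the free `path` field absorbs the leading dot, so the pattern alone cannot tell the temporary file from the final one.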

Describe the solution you'd like

I would like a new flag for whole-name matching only. This would likely require a change in both Trollstalker and Trollsift.
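One possible reading of such a flag, sketched with the standard library only: the pattern is written without the leading {path} field, and an optional switch decides whether it must cover the whole filename. The name `whole_name`, the `matches` helper, and the stand-in regex are all hypothetical; whether the permissive default corresponds to what Trollsift or Trollstalker do today is an assumption.

```python
import re

# Hypothetical stand-in for the filepattern without the leading {path} field.
NAME_PATTERN = re.compile(r"AVHR_HRP_00_.*_\d{14}Z_\d{14}Z_N_O_\d{14}Z")


def matches(filename, whole_name=False):
    """Check a filename against NAME_PATTERN.

    With whole_name=False the pattern may match anywhere in the name
    (permissive default); with whole_name=True it must cover the entire name.
    """
    if whole_name:
        return NAME_PATTERN.fullmatch(filename) is not None
    return NAME_PATTERN.search(filename) is not None


FINAL = "AVHR_HRP_00_M03_20210117221434Z_20210117221733Z_N_O_20210117221434Z"
TEMPORARY = "." + FINAL

for name in (TEMPORARY, FINAL):
    print(name[:12], matches(name), matches(name, whole_name=True))
```

With the switch enabled, only the final filename is accepted, while the default keeps accepting both.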

Describe any changes to existing user workflow

The new flag would be optional, and the default behaviour would correspond to the status quo. Therefore, this improvement should have no impact on backward compatibility.

Additional context

I tried to change the filepattern, but if I add a / between {path} and AVHR..., then neither the temporary nor the final file is matched, because Trollstalker matches against the filename, not against the full path.
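The effect of the extra / can be illustrated as follows. This is a sketch with a stand-in regex (an assumption, not Trollsift's pattern handling), using the paths from the log above: the bare filename contains no /, so a pattern that requires one can never match it, whereas the same pattern applied to the full path would accept only the final file.

```python
import posixpath
import re

# Stand-in for "{path}/AVHR_HRP_00_..." (illustrative assumption).
WITH_SLASH = re.compile(r"(?P<path>.*)/AVHR_HRP_00_.*")

FINAL_PATH = (
    "/data/pytroll/IN/HRPT/"
    "AVHR_HRP_00_M03_20210117221434Z_20210117221733Z_N_O_20210117221434Z"
)
TEMPORARY_PATH = (
    "/data/pytroll/IN/HRPT/"
    ".AVHR_HRP_00_M03_20210117221434Z_20210117221733Z_N_O_20210117221434Z"
)

for full_path in (TEMPORARY_PATH, FINAL_PATH):
    filename = posixpath.basename(full_path)
    on_name = WITH_SLASH.fullmatch(filename) is not None
    on_path = WITH_SLASH.fullmatch(full_path) is not None
    # Matching the bare filename always fails; matching the full path
    # succeeds only for the final file.
    print(filename[:12], on_name, on_path)
```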

My present workaround is to monitor only for the event IN_MOVED_TO and not for IN_CLOSE_WRITE. This workaround is fragile because it relies on an implementation detail of the file delivery software, namely that files are written under a temporary name and then renamed. This detail may change without warning to us as users, which could suddenly break operational file processing. A more sustainable solution would therefore be desirable.
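The workaround and its fragility can be sketched in a few lines. The `select_paths` helper and the event tuples are hypothetical and do not reflect how Trollstalker is configured; the event names and paths are taken from the log above.

```python
def select_paths(events, wanted=("IN_MOVED_TO",)):
    """Keep only the paths whose event name is in the wanted set."""
    return [path for name, path in events if name in wanted]


NAME = "AVHR_HRP_00_M03_20210117221434Z_20210117221733Z_N_O_20210117221434Z"
BASE = "/data/pytroll/IN/HRPT/"

# Current delivery behaviour: write a dot-file, then rename it.
rename_delivery = [
    ("IN_CLOSE_WRITE", BASE + "." + NAME),
    ("IN_MOVED_TO", BASE + NAME),
]
# Hypothetical future behaviour: write the final name directly.
direct_delivery = [("IN_CLOSE_WRITE", BASE + NAME)]

print(select_paths(rename_delivery))  # only the final file
print(select_paths(direct_delivery))  # nothing: the file would be missed
```

If the delivery system ever stopped renaming files, an event filter of this kind would silently drop every file, which is why the workaround depends on behaviour outside our control.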
