proper handling of database duplicates #252

Open
johntruckenbrodt opened this issue Jul 3, 2023 · 1 comment

@johntruckenbrodt
Owner

The SQLite database created via drivers.Archive maintains two tables, data and duplicates. The latter contains all scenes whose outname_base attribute (used as a unique scene ID) matches that of a scene already in data. At the moment, the first scene with a given ID is put into data, and further scenes sharing that ID are not compared against it.
One major deficiency of outname_base, namely that different products can receive the same ID (e.g. S1 SLCs and GRDs), was recently described in #251.
Furthermore, the scene in data and the scene to be inserted need to be compared to decide which of the two belongs in data. Scenes are often reprocessed/republished, in which case the scene with the latest processing time should be put into data. This means that the scene currently in data may have to be moved to duplicates when a scene with a later processing time is inserted.
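A minimal sketch of the proposed comparison, using Python's built-in sqlite3 module. It assumes a heavily simplified, hypothetical schema (columns outname_base, scene and proc_time) rather than the actual tables written by drivers.Archive, a hypothetical helper insert_scene, and processing times stored as ISO 8601 strings. It only illustrates the "latest processing time wins" rule and is not pyroSAR's implementation.

```python
import sqlite3
from datetime import datetime

# Hypothetical, simplified schema; the real drivers.Archive tables have many more columns.
SCHEMA = """
CREATE TABLE data (outname_base TEXT PRIMARY KEY, scene TEXT, proc_time TEXT);
CREATE TABLE duplicates (outname_base TEXT, scene TEXT, proc_time TEXT);
"""


def insert_scene(con, outname_base, scene, proc_time):
    """Hypothetical helper: keep the most recently processed scene in `data`.

    `proc_time` is assumed to be an ISO 8601 string, e.g. '2023-01-02T10:00:00'.
    """
    with con:  # one transaction: commits on success, rolls back on error
        row = con.execute(
            "SELECT scene, proc_time FROM data WHERE outname_base = ?",
            (outname_base,),
        ).fetchone()
        if row is None:
            # first scene with this ID: it goes straight into data
            con.execute("INSERT INTO data VALUES (?, ?, ?)",
                        (outname_base, scene, proc_time))
        elif datetime.fromisoformat(proc_time) > datetime.fromisoformat(row[1]):
            # newer processing: move the current entry to duplicates, replace it in data
            con.execute("INSERT INTO duplicates VALUES (?, ?, ?)",
                        (outname_base, row[0], row[1]))
            con.execute("UPDATE data SET scene = ?, proc_time = ? WHERE outname_base = ?",
                        (scene, proc_time, outname_base))
        else:
            # older or equally old: the new scene is the duplicate
            con.execute("INSERT INTO duplicates VALUES (?, ?, ?)",
                        (outname_base, scene, proc_time))


con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
insert_scene(con, "scene-id-1", "scene_v1.zip", "2023-01-02T10:00:00")
insert_scene(con, "scene-id-1", "scene_v2.zip", "2023-03-15T08:30:00")
print(con.execute("SELECT scene FROM data").fetchall())        # [('scene_v2.zip',)]
print(con.execute("SELECT scene FROM duplicates").fetchall())  # [('scene_v1.zip',)]
```

In this sketch a tie (equal processing times) keeps the existing entry in data; whether that is the desired rule is an open design choice.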

@johntruckenbrodt
Owner Author

johntruckenbrodt commented Sep 12, 2023

#251 has been fixed in #256