New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete Product CLI Command #1583
base: develop-1.9
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop-1.9 #1583 +/- ##
===============================================
+ Coverage 85.80% 85.84% +0.04%
===============================================
Files 140 140
Lines 15402 15469 +67
===============================================
+ Hits 13215 13279 +64
- Misses 2187 2190 +3 ☔ View full report in Codecov by Sentry. |
datacube/index/abstract.py
Outdated
@@ -641,6 +641,14 @@ def add_document(self, definition: JsonDict) -> Product: | |||
type_ = self.from_doc(definition) | |||
return self.add(type_) | |||
|
|||
@abstractmethod | |||
def delete(self, product: Product): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a ODC-beginner, it's hard to figure out if delete is guaranteed to succeed. Is there a reason return types are frequently omitted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The omission of a return type was my mistake, I've updated it to be None
, in line with the dataset purge
method. Beyond that, is there a more specific return type you would suggest to be more beginner-friendly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen some surprising types in my drive-bys of the pull requests in these repositories, but my main issue is missing type annotations. As a rule of thumb, input types help me understand what a function is working with/on, and return types generally help me understand if a function is expected to sometimes fail (None/bool) or produces/assembles some kind of data (string/dict/dataclass/etc).
I don't know if it makes sense for opendatacube, but the Ruff guy (Charlie Marsh) wrote about Mypy settings they used when he was working at Spring in a blog post: https://notes.crmarsh.com/using-mypy-in-production-at-spring
Even my own code, which is significantly smaller than opendatacube, doesn't pass with what he was running in large scale production, but if you want the pyproject.toml excerpt recommended without reading his post, here's what I believe the settings are:
[tool.mypy]
namespace_packages = true
disallow_untyped_defs = true
disallow_any_unimported = true
no_implicit_optional = true
check_untyped_defs = true
warn_return_any = true
show_error_codes = true
warn_unused_ignores = true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do run MyPy on this code, and it did pass (hmm). Thanks for the blog link, I'll play around with the settings and see what makes the most sense in a future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're in the process of systematically typehinting a large complex codebase that has historically had major modules that were either not typehinted at all (or worse, were typehinted incorrectly).
We've come a long way in a relatively short period of time, but there's still a lot of work to be done in this space.
Definitely want this. Will this be backported to 1.8, or will datacube-explorer/datacube-ows work with 1.9 when that is released? |
@pjonsson Based on previous gh issues and discussions, it seems like this is best kept to 1.9+ - @SpacemanPaul would probably have deeper insight into this. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. I think we should move the logic up from the CLI to the index API - as per comments.
|
||
:param product: Product to be deleted | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the signature should be:
delete(self, product: Product, delete_archived_only: bool = True)
raise an exception if there exist archived datasets and delete_archived_only
is true.
(Edit: i.e. move this logic up from the CLI)
Add an exception to datacube.index.exceptions
- and impose an IndexException
superclass while you're there.
Also catch the low level and wrap it in an IndexException if deleting the product (or dataset) throws a database error - rather than just letting the low level exception bubble up as dataset.purge does currently.
active_product = True | ||
|
||
if active_product: | ||
if not force: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Push this logic up to the index API via the delete_archived_only
flag discussed above.
@@ -10,6 +10,7 @@ v1.9.next | |||
|
|||
- Fix typos in docs (:pull:`1577`) | |||
- Merge in 1.8.x branch changes. (:pull:`1568`, :pull:`1579`) | |||
- Add Product delete methods to API and command in CLI, plus misc cleanup of the surrounds (:pull:`1583`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to mention here that user requests for this feature have been around longer than the name of the project.
I understand your priorities, and I think they are the right ones. My need to delete things is still months/quarters away, so it's possible explorer for 1.9 will be really close and I can put some effort in there to help out, or you are already done with the porting effort and I will be happily using 1.9 by that time. |
Reason for this pull request
Functionality to delete products via the CLI has long been requested.
Proposed changes
Add
datacube product delete
CLI command that deletes the specified products and all related datasets. By default, products with active datasets will not be permitted to be deleted, but this can be overwritten with the--force
option.Misc cleanup in surrounding code; mostly minor changes that should have no user impact
Mark
search_unique_datasets
as deprecated since without multiple locations, it doesn't seem to provide anything beyond whatsearch_datasets
orsearch_returning
already do.Closes Deleting Products from ODC Index #177
Tests added / passed
Fully documented, including
docs/about/whats_new.rst
for all changes