Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PLA-585][external] Allow imports where some annotation frames are missing a main type #788

Merged
merged 36 commits into from Apr 18, 2024

Conversation

JBWilkie
Copy link
Contributor

Problem

This PR solves one main problem (1) but includes lots of clean up and refactoring of old V1 code and concepts as well (2 & 3). It's a very large PR, so I've annotated it heavily to help with reviewing it

  • 1: In IO-1163, we introduced logic to exports to compress the resulting JSON if we determine that the export size was sufficiently large that it would cause it to OOM. This compression took the form of only exporting keyframes, and adding an only_keyframes field in the JSON to reflect this. Unfortunately, darwin-py import logic was built on the assumption that every frame has a main annotation type, which this change violates, breaking these imports
  • 2: Removal of the concept of a complex_polygon annotation type. In V1, we stored polygons as complex polygons if they had >1 path. We removed this distinction in V2, instead referring to all polygons as type polygon. Therefore we should update darwin-py to reflect this
  • 3: Removal of old functions & erroneous deprecation messages. Darwin-py contains a lot of unused functions from the V1 version of the platform. These have already been re-written for V2 and will thus never be used again. There were also many deprecation messages that were not needed, out of date or wrong (functions were actually in use)

Solution

  • 1: For each annotation with only_keyframes set to True, get the first frame with a main type. Store this type & annotation data. Then, iterate through each frame of the annotation. If the main type is missing, use the stored value to populate it for import. Otherwise if the main type is present, update the stored annotation data. Repeat this for the entire annotation
  • 2: Removed the concept of complex polygons from darwin-py, instead referring to all polygons simply as type polygon to reflect how we store polygons in our DB
  • 3: Removed the unused functions, and removed incorrect deprecation messages

Changelog

  • Updated import logic to support importing of annotations that were compressed during export to avoid memory cap errors
  • Removed the distinction between complex polygons & polygons
  • Removed unused functions & erroneous deprecation warnings

Copy link

linear bot commented Mar 14, 2024

removed_in="0.8.0",
current_version=__version__,
details="The api_url parameter will be removed.",
)
def download_all_images_from_annotations(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is used by pull()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the deprecation warning points to a api_url parameter:

    details="The api_url parameter will be removed.",

If we're removing the deprecation warning - Should we remove the api_url parameter from the function download_all_images_from_annotations?

I found a PR #335 which introduced the deprecation 👀

@@ -401,7 +82,7 @@ def _calculate_tag_categories(
categories[annotation_class.name] = _calculate_category_id(
annotation_class
)
return categories
return dict(sorted(categories.items(), key=lambda item: item[1]))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are to Ensure COCO categories are always in the same, ascending order to avoid E2E flakiness

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for fixing this!! 🥇

We could also replace the lambda function here and use python's operator.itemgetter (it can be a little faster than lambda function, especially for large dictionaries), by something like:

from operator import itemgetter  # importing it

- return dict(sorted(categories.items(), key=lambda item: item[1]))
+ return dict(sorted(categories.items(), key=itemgetter(1)))

removed_in="0.8.0",
current_version=__version__,
details=DEPRECATION_MESSAGE,
)
def build_main_annotations_lookup_table(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used when importing annotations

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what was the goal behind deprecating this? 🤔

Ah found it in this PR e034b37#diff-c59e878e89aaf69bda12258e348fc25d69750c2841667a84da53f2d4f5bf60feR26-R28

Should we make it private instead by prefixing it with _?

removed_in="0.8.0",
current_version=__version__,
details=DEPRECATION_MESSAGE,
)
def find_and_parse( # noqa: C901
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used when importing annotations

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE:

I wonder what was the goal behind deprecating this? 🤔

Ah found it in this PR e034b37#diff-c59e878e89aaf69bda12258e348fc25d69750c2841667a84da53f2d4f5bf60feR26-R28

Should we make it private instead by prefixing it with _?

removed_in="0.8.0",
current_version=__version__,
details=DEPRECATION_MESSAGE,
)
def build_attribute_lookup(dataset: "RemoteDataset") -> Dict[str, Unknown]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used when importing annotations

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE:

I wonder what was the goal behind deprecating this? 🤔

Ah found it in this PR e034b37#diff-c59e878e89aaf69bda12258e348fc25d69750c2841667a84da53f2d4f5bf60feR26-R28

Should we make it private instead by prefixing it with _?

removed_in="0.8.0",
current_version=__version__,
details=DEPRECATION_MESSAGE,
)
def get_remote_files(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used when importing annotations

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE:

I wonder what was the goal behind deprecating this? 🤔

Ah found it in this PR e034b37#diff-c59e878e89aaf69bda12258e348fc25d69750c2841667a84da53f2d4f5bf60feR26-R28

Should we make it private instead by prefixing it with _?

if "polygon" in data:
if len(annotation.data["paths"]) > 1:
data["polygon"] = {
"path": annotation.data["paths"][0],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When importing polygons, they have to be structured as a primary path, and any subsequent paths have to be passed in additional_paths in the payload

@@ -128,7 +106,6 @@ def is_image_extension_allowed_by_filename(filename: str) -> bool:
return any(filename.lower().endswith(ext) for ext in SUPPORTED_IMAGE_EXTENSIONS)


@deprecation.deprecated(deprecated_in="0.8.4", current_version=__version__)
def is_image_extension_allowed(extension: str) -> bool:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used when pulling exports

@JBWilkie JBWilkie requested a review from saurbhc March 14, 2024 16:25
removed_in="0.8.0",
current_version=__version__,
details="The api_url parameter will be removed.",
)
def download_all_images_from_annotations(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the deprecation warning points to a api_url parameter:

    details="The api_url parameter will be removed.",

If we're removing the deprecation warning - Should we remove the api_url parameter from the function download_all_images_from_annotations?

I found a PR #335 which introduced the deprecation 👀

darwin/datatypes.py Outdated Show resolved Hide resolved
@@ -36,6 +36,7 @@ def darwin_to_dt_gen(
for d in split_video_annotation(data):
d.seq = count
count += 1
print(count)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the print for debugging? 🤔

@@ -401,7 +82,7 @@ def _calculate_tag_categories(
categories[annotation_class.name] = _calculate_category_id(
annotation_class
)
return categories
return dict(sorted(categories.items(), key=lambda item: item[1]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for fixing this!! 🥇

We could also replace the lambda function here and use python's operator.itemgetter (it can be a little faster than lambda function, especially for large dictionaries), by something like:

from operator import itemgetter  # importing it

- return dict(sorted(categories.items(), key=lambda item: item[1]))
+ return dict(sorted(categories.items(), key=itemgetter(1)))

@@ -476,6 +157,7 @@ def _build_annotations(
) -> Iterator[Optional[Dict[str, Any]]]:
annotation_id = 0
for annotation_file in annotation_files:
print(annotation_file.filename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the print for debugging? 🤔

removed_in="0.8.0",
current_version=__version__,
details=DEPRECATION_MESSAGE,
)
def build_main_annotations_lookup_table(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what was the goal behind deprecating this? 🤔

Ah found it in this PR e034b37#diff-c59e878e89aaf69bda12258e348fc25d69750c2841667a84da53f2d4f5bf60feR26-R28

Should we make it private instead by prefixing it with _?

removed_in="0.8.0",
current_version=__version__,
details=DEPRECATION_MESSAGE,
)
def find_and_parse( # noqa: C901
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE:

I wonder what was the goal behind deprecating this? 🤔

Ah found it in this PR e034b37#diff-c59e878e89aaf69bda12258e348fc25d69750c2841667a84da53f2d4f5bf60feR26-R28

Should we make it private instead by prefixing it with _?

removed_in="0.8.0",
current_version=__version__,
details=DEPRECATION_MESSAGE,
)
def build_attribute_lookup(dataset: "RemoteDataset") -> Dict[str, Unknown]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE:

I wonder what was the goal behind deprecating this? 🤔

Ah found it in this PR e034b37#diff-c59e878e89aaf69bda12258e348fc25d69750c2841667a84da53f2d4f5bf60feR26-R28

Should we make it private instead by prefixing it with _?

removed_in="0.8.0",
current_version=__version__,
details=DEPRECATION_MESSAGE,
)
def get_remote_files(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE:

I wonder what was the goal behind deprecating this? 🤔

Ah found it in this PR e034b37#diff-c59e878e89aaf69bda12258e348fc25d69750c2841667a84da53f2d4f5bf60feR26-R28

Should we make it private instead by prefixing it with _?

@JBWilkie
Copy link
Contributor Author

@saurbhc I've actioned all points of feedback

Copy link
Member

@saurbhc saurbhc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you 🙏

@JBWilkie JBWilkie merged commit 973d9cc into master Apr 18, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants