Support non-sequential COCO category IDs #4354

Merged
merged 7 commits into from May 9, 2024

@brimoor brimoor commented May 6, 2024

Resolves #4293, Resolves #4162

Extends #4309 to support the full "spec" described in #4293 (comment).

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    max_samples=50,
)

# minor change in behavior: this now only contains the actual class labels, no "interpolations"
assert len(dataset.default_classes) == 80

assert len(dataset.info["categories"]) == 80

print(dataset.info["categories"])
"""
[
    {'supercategory': 'person', 'id': 1, 'name': 'person'},
    {'supercategory': 'vehicle', 'id': 2, 'name': 'bicycle'},
    ...
    {'supercategory': 'indoor', 'id': 90, 'name': 'toothbrush'}
]
"""

# Use pre-defined category IDs
dataset.export(
    export_dir="/tmp/coco1",
    dataset_type=fo.types.COCODetectionDataset,
    categories=dataset.info["categories"],
)

# Use pre-defined category IDs and only export 'cat' objects
dataset.export(
    export_dir="/tmp/coco2",
    dataset_type=fo.types.COCODetectionDataset,
    categories=dataset.info["categories"],
    classes=["cat"],
)
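The following is a minimal sketch, not FiftyOne's implementation, of what it means to keep pre-defined category IDs when they are non-sequential. The helper name `build_classes_map` is hypothetical; the `cat` entry (ID 17) comes from the standard COCO category list rather than from the output shown above.

```python
# Hypothetical helper for illustration only; not part of the FiftyOne API.
def build_classes_map(categories):
    """Map each COCO category ID to its name, keeping IDs exactly as given."""
    return {c["id"]: c["name"] for c in categories}


# A small excerpt of COCO-style categories with gaps between IDs
categories = [
    {"supercategory": "person", "id": 1, "name": "person"},
    {"supercategory": "vehicle", "id": 2, "name": "bicycle"},
    {"supercategory": "animal", "id": 17, "name": "cat"},
    {"supercategory": "indoor", "id": 90, "name": "toothbrush"},
]

classes_map = build_classes_map(categories)

# Only real labels appear; no placeholder entries fill the gaps between IDs
classes = [classes_map[i] for i in sorted(classes_map)]

# Restricting to a subset of classes keeps the original IDs untouched
cat_only = [c for c in categories if c["name"] in {"cat"}]
```

With this approach, `cat` keeps ID 17 in the subset rather than being renumbered, which mirrors the intent of passing both `categories` and `classes` in the second export above.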

Summary by CodeRabbit

  • New Features

    • Enhanced label input flexibility in functions, allowing both list of labels and dictionary mapping of class IDs to labels.
  • Improvements

    • Refined attribute and parameter naming for clarity and consistency across various methods and functions.
    • Upgraded logic in methods to handle new parameter structures and fallback scenarios effectively.

@brimoor brimoor requested a review from swheaton May 6, 2024 03:05

coderabbitai bot commented May 6, 2024

Walkthrough

This PR changes how FiftyOne's COCO utilities handle class labels. A classes_map parameter, a dictionary mapping class IDs to labels, replaces the classes list across the affected functions and methods. The goal is to handle datasets with non-sequential or random category IDs, and to address the memory leak reported for imports of such datasets.
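As a rough illustration of accepting either a list of labels or a dictionary of class IDs to labels, the sketch below normalizes both forms into one mapping. The function name `to_classes_map` and the `start` parameter are hypothetical, and the 1-based default is an assumption; the thread does not show the actual conversion code.

```python
# Hypothetical normalizer; the name and the `start` default are assumptions.
def to_classes_map(classes, start=1):
    """Return an {id: label} dict from a list of labels or an existing dict."""
    if isinstance(classes, dict):
        # Already a mapping: keep the caller's IDs, even if non-sequential
        return dict(classes)

    # A plain list carries no IDs, so assign consecutive ones from `start`
    return {i: label for i, label in enumerate(classes, start)}


from_list = to_classes_map(["cat", "dog"])
from_dict = to_classes_map({17: "cat", 18: "dog"})
```

Routing both input forms through one mapping means downstream code only ever has to look up labels by ID.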

Changes

Files | Changes Summary
fiftyone/utils/coco.py | Introduced classes_map in various functions and methods, replacing classes for improved mapping.
docs/source/user_guide/export_datasets.rst | Updated documentation to include the option to pass both classes and categories parameters.
tests/unittests/import_export_tests.py | Added code for exporting a COCO detection dataset with non-sequential categories.

Assessment against linked issues

Objective | Addressed | Explanation
Handle random category IDs without memory leak in COCODetection Dataset import (#4293) | |
Provide clear error messaging for issues with category IDs during dataset import (#4293) | | The changes focus on handling IDs better but do not explicitly mention improvements in error messages.

Poem

In the realm of code, where changes bloom bright,
A rabbit hopped in, bringing order to the sight.
🐇 "With classes_map, let's navigate this maze,
No more memory leaks in these data arrays!"
With a hop and a skip, and a fix in its stare,
Bugs scurry away, as solutions repair.
🌟 "Let's rejoice," it whispered with a smile,
"For cleaner datasets, free from turmoil!"




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Out of diff range and nitpick comments (4)
fiftyone/zoo/datasets/__init__.py (2)

Line range hint 120-120: Avoid using bare except statements; specify exception types to handle expected errors more gracefully.

-        except:
+        except OSError:

Line range hint 577-577: Avoid using bare except statements; specify exception types to handle expected errors more gracefully.

-        except:
+        except OSError:
fiftyone/utils/coco.py (2)

417-418: Initialization of _classes_map and _class_ids should be documented in the class docstring to clarify their usage and importance.

Consider adding a brief explanation in the class docstring about the purpose of _classes_map and _class_ids.


Line range hint 605-605: Ambiguous variable name l used in list comprehension. Consider using a more descriptive name.

-            license_map = {l.get("id", None): l.get(self.include_license, None) for l in info.get("licenses", [])}
+            license_map = {license.get("id", None): license.get(self.include_license, None) for license in info.get("licenses", [])}
Review Details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits: Files that changed from the base of the PR and between dc2e17f and 6b600a8.
Files selected for processing (2)
  • fiftyone/utils/coco.py (40 hunks)
  • fiftyone/zoo/datasets/__init__.py (1 hunks)
Additional Context Used
Ruff (12)
fiftyone/utils/coco.py (10)

605-605: Ambiguous variable name: l


908-908: Do not use bare except


918-918: Ambiguous variable name: l


1799-1799: Ambiguous variable name: l


1824-1824: Avoid equality comparisons to True; use if include_license: for truth checks


1988-1988: Ambiguous variable name: l


2042-2042: Do not use bare except


2139-2139: Avoid equality comparisons to True; use if extra_attrs: for truth checks


2142-2142: Avoid equality comparisons to False; use if not extra_attrs: for false checks


2264-2264: Do not use bare except

fiftyone/zoo/datasets/__init__.py (2)

120-120: Do not use bare except


577-577: Do not use bare except

Additional comments not posted (1)
fiftyone/zoo/datasets/__init__.py (1)

366-366: Ensure that info.classes is not None before assigning to dataset.default_classes.

Verification successful

The review comment suggested ensuring that info.classes is not None before assigning it to dataset.default_classes. The script output confirms that there is a check for info.classes not being None before the assignment occurs:

if info.classes is not None and not dataset.default_classes:
    dataset.default_classes = info.classes

Additionally, there is another usage where info.classes is set based on another variable classes if info.classes is not already set:

if classes and not info.classes:
    info.classes = classes

Both usages in the code handle the info.classes attribute carefully, ensuring it is not None before it is used in a critical assignment or set based on another variable. This aligns with the review comment's concern, and the implementation adheres to the suggested precaution.

Conclusion: The code correctly checks that info.classes is not None before using it in assignments.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that `info.classes` is not `None` before assignment.

# Test: Search for the usage of `info.classes` in the file. Expect: Proper null checks before usage.
rg --type python $'info.classes' fiftyone/zoo/datasets/__init__.py

Length of output: 96



Script:

#!/bin/bash
# Description: Verify that `info.classes` is not `None` before assignment.

# Correcting the previous script error by removing the unrecognized file type flag.
rg 'info.classes' fiftyone/zoo/datasets/__init__.py

Length of output: 255

fiftyone/utils/coco.py
Comment on lines +460 to +462
if self.only_matching and self._class_ids is not None:
coco_objects = _get_matching_objects(
coco_objects, self.classes, self._classes
coco_objects, self._class_ids

The conditional check for only_matching and self._class_ids should be simplified to improve readability.

-            if self.only_matching and self._class_ids is not None:
+            if self.only_matching:
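Whether this simplification is safe depends on whether `_class_ids` can be `None` while `only_matching` is set, which the thread does not settle. The sketch below uses hypothetical stand-ins (`get_matching_objects`, `filter_objects`) and assumes the helper does a set-membership test on `category_id`; it is meant only to show what the extra guard protects against.

```python
# Hypothetical stand-ins; the real helper's body is not shown in this PR.
def get_matching_objects(coco_objects, class_ids):
    """Keep only objects whose category_id is in class_ids."""
    class_ids = set(class_ids)  # raises TypeError if class_ids is None
    return [obj for obj in coco_objects if obj["category_id"] in class_ids]


def filter_objects(coco_objects, only_matching, class_ids):
    """Filter by class ID only when there is something to filter by."""
    if only_matching and class_ids is not None:
        return get_matching_objects(coco_objects, class_ids)

    return coco_objects


objects = [{"category_id": 1}, {"category_id": 90}]
kept = filter_objects(objects, True, [90])
unfiltered = filter_objects(objects, True, None)
```

Under these assumptions, dropping the `is not None` check would turn the "no class filter requested" case into an error rather than a pass-through.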


Comment on lines +558 to +569
classes_map,
supercategory_map,
images,
annotations,
) = load_coco_detection_annotations(
self.labels_path, extra_attrs=self.extra_attrs
)

classes = None
if classes_map is not None:
classes = _to_classes(classes_map)


The variable classes is redefined which might lead to confusion. Consider renaming it to avoid shadowing the outer scope variable.

-            classes = None
-            if classes_map is not None:
-                classes = _to_classes(classes_map)
+            local_classes = None
+            if classes_map is not None:
+                local_classes = _to_classes(classes_map)


try:
categories = _info.get("categories", None)
parse_coco_categories(categories)
except:

Avoid using a bare except statement. Specify the exception type to handle specific exceptions only.

-        except:
+        except Exception as e:
+            logger.error("Failed to parse categories: %s", str(e))
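For context, a bare `except:` also catches `KeyboardInterrupt` and `SystemExit`, which is usually the main argument against it. The sketch below shows one narrower alternative; `parse_categories` is a hypothetical stand-in for the real parser, and the choice of `TypeError` and `KeyError` is an assumption about how malformed input would fail.

```python
import logging

logger = logging.getLogger(__name__)


# Hypothetical stand-in for the real parser; returns an {id: name} dict.
def parse_categories(categories):
    return {c["id"]: c["name"] for c in categories}


def safe_parse(categories):
    """Parse categories, returning None when the input is malformed."""
    try:
        return parse_categories(categories)
    except (TypeError, KeyError) as e:
        # TypeError: categories is None or not iterable
        # KeyError: an entry is missing "id" or "name"
        logger.warning("Failed to parse categories: %s", e)
        return None


ok = safe_parse([{"id": 1, "name": "person"}])
missing = safe_parse(None)
malformed = safe_parse([{"id": 1}])
```

Catching `Exception`, as suggested above, is broader than this but still lets interpreter-level exits propagate.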


"name": l,
"supercategory": None,
}
for i, l in enumerate(classes)

Ambiguous variable name l used in list comprehension. Consider using a more descriptive name.

-            categories = [{ "id": i, "name": l, "supercategory": None } for i, l in enumerate(classes)]
+            categories = [{ "id": i, "name": label, "supercategory": None } for i, label in enumerate(classes)]



try:
classes_map, _ = parse_coco_categories(categories)
except:

Avoid using a bare except statement. Specify the exception type to handle specific exceptions only.

-    except:
+    except Exception as e:
+        logger.error("Failed to parse categories from info: %s", str(e))



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Out of diff range and nitpick comments (13)
fiftyone/utils/coco.py (7)

Line range hint 605-605: Ambiguous variable name l used in list comprehension. Consider using a more descriptive name.

-            license_map = {l.get("id", None): l.get(self.include_license, None) for l in info.get("licenses", [])}
+            license_map = {license_entry.get("id", None): license_entry.get(self.include_license, None) for license_entry in info.get("licenses", [])}

Line range hint 1795-1795: Ambiguous variable name l used in list comprehension. Consider using a more descriptive name.

-            categories = [{ "id": i, "name": l, "supercategory": None } for i, l in enumerate(classes)]
+            categories = [{ "id": i, "name": label, "supercategory": None } for i, label in enumerate(classes)]

Line range hint 1984-1984: Ambiguous variable name l used in list comprehension. Consider using a more descriptive name.

-            categories = [{ "id": i, "name": l, "supercategory": None } for i, l in enumerate(classes)]
+            categories = [{ "id": i, "name": label, "supercategory": None } for i, label in enumerate(classes)]

Line range hint 1820-1820: Avoid equality comparisons to True; use if include_license: for truth checks.

-        if include_license == True:
+        if include_license:

Line range hint 2127-2127: Avoid equality comparisons to True; use if extra_attrs: for truth checks.

-    if extra_attrs == True:
+    if extra_attrs:

Line range hint 2130-2130: Avoid equality comparisons to False; use if not extra_attrs: for false checks.

-    if extra_attrs == False:
+    if not extra_attrs:

Line range hint 2252-2252: Do not use bare except.

-    except:
+    except Exception:
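One caveat on the `extra_attrs == True` and `extra_attrs == False` suggestions above: they are only behavior-preserving if the parameter is strictly boolean. The sketch below assumes, without confirmation from this thread, that such a parameter may be `True`, `False`, or a name or list of names; `select_attrs` is a hypothetical function written for illustration.

```python
# Hypothetical function; assumes extra_attrs may be True, False, a single
# attribute name, or a list of attribute names.
def select_attrs(available, extra_attrs=True):
    """Return the attribute names to load from `available`."""
    if extra_attrs == True:  # noqa: E712 (a non-empty list is truthy but != True)
        return list(available)

    if extra_attrs == False:  # noqa: E712
        return []

    if isinstance(extra_attrs, str):
        extra_attrs = [extra_attrs]

    return [a for a in available if a in extra_attrs]


available = ["iscrowd", "area"]
everything = select_attrs(available, True)
nothing = select_attrs(available, False)
only_area = select_attrs(available, ["area"])
```

Under that assumption, rewriting the first check as `if extra_attrs:` would make `["area"]` load every attribute, so the explicit comparison (or `extra_attrs is True`) would be the safer form.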
tests/unittests/import_export_tests.py (6)

Line range hint 2160-2160: Remove unused import of pydicom.

- import pydicom  # pylint: disable=unused-import

Line range hint 3598-3598: Remove unused local variable img_filepath.

- img_filepath = self._new_image(name="openlabel_test")

Line range hint 3615-3615: Remove unused local variable img_filepath.

- img_filepath = self._new_image(name="openlabel_test")

Line range hint 3633-3633: Remove unused local variable img_filepath.

- img_filepath = self._new_image(name="openlabel_test")

Line range hint 3662-3662: Remove unused local variable img_filepath.

- img_filepath = self._new_image(name="openlabel_test")

Line range hint 3732-3732: Remove unused local variable vid_filepath.

- vid_filepath = self._new_video(filename="openlabel_test")
Review Details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits: Files that changed from the base of the PR and between 6b600a8 and b9ed7e9.
Files selected for processing (3)
  • docs/source/user_guide/export_datasets.rst (1 hunks)
  • fiftyone/utils/coco.py (42 hunks)
  • tests/unittests/import_export_tests.py (1 hunks)
Additional Context Used
Ruff (14)
fiftyone/utils/coco.py (8)

605-605: Ambiguous variable name: l


917-917: Ambiguous variable name: l


1795-1795: Ambiguous variable name: l


1820-1820: Avoid equality comparisons to True; use if include_license: for truth checks


1984-1984: Ambiguous variable name: l


2127-2127: Avoid equality comparisons to True; use if extra_attrs: for truth checks


2130-2130: Avoid equality comparisons to False; use if not extra_attrs: for false checks


2252-2252: Do not use bare except

tests/unittests/import_export_tests.py (6)

2160-2160: pydicom imported but unused


3598-3598: Local variable img_filepath is assigned to but never used


3615-3615: Local variable img_filepath is assigned to but never used


3633-3633: Local variable img_filepath is assigned to but never used


3662-3662: Local variable img_filepath is assigned to but never used


3732-3732: Local variable vid_filepath is assigned to but never used

Additional comments not posted (5)
fiftyone/utils/coco.py (5)

132-133: Update the docstring to reflect the change from classes to classes_map.


198-201: Ensure that the conversion from list to dictionary uses 1-based indexing if classes is a list, to maintain consistency with COCO's 1-based ID system.


460-462: The conditional check for only_matching and self._class_ids should be simplified to improve readability.


558-569: The variable classes is redefined which might lead to confusion. Consider renaming it to avoid shadowing the outer scope variable.


917-917: Ambiguous variable name l used in list comprehension. Consider using a more descriptive name.


@sashankaryal sashankaryal left a comment


🚀

@brimoor brimoor merged commit ba743bd into release/v0.24.0 May 9, 2024
10 checks passed
@brimoor brimoor deleted the fix_random_cat_id branch May 9, 2024 15:24