
Refactor evaluation to allow span based metrics #71

Open · wants to merge 16 commits into base: feature/new-datagen-and-eval

Conversation

tranguyen221 (Collaborator):

  • Implement the compare_span function in the Evaluator class, which takes lists of annotated and predicted spans and produces the numeric evaluation output (see the usage sketch after this list)
  • Implement the get_overlap_ratio function in the Span class to calculate the overlap ratio between annotated and predicted offsets
  • Add unit tests for these functions
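
A minimal usage sketch based only on the description above; Span's constructor matches the existing presidio_evaluator data objects, while the Evaluator arguments and the return shape of compare_span are assumptions:

from presidio_evaluator import Span
from presidio_evaluator.evaluation import Evaluator

annotated = [Span(entity_type="PERSON", entity_value="Dan",
                  start_position=0, end_position=3)]
predicted = [Span(entity_type="PERSON", entity_value="Dan",
                  start_position=0, end_position=4)]

evaluator = Evaluator(entities_to_keep=["PERSON"])  # constructor args assumed
span_metrics = evaluator.compare_span(annotated, predicted)  # numeric span-level output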

@melmatlis (Collaborator):

A general comment regarding the data_object.py file, unrelated to this PR:
the file is really long and contains two classes. Perhaps we can add a separate task to create a "data objects" folder and split the classes into separate files?
@omri374 @tranguyen221, what are your thoughts?

@omri374 (Contributor) commented Feb 5, 2023:

@melmatlis I agree, but I suggest waiting on this until we finalize all the changes, so that we don't create unnecessary merge conflicts.

@microsoft microsoft deleted a comment from azure-pipelines bot Feb 5, 2023
@microsoft microsoft deleted a comment from azure-pipelines bot Feb 5, 2023
@omri374 omri374 changed the base branch from feature/refactor-evaluator to master February 5, 2023 14:56
@omri374 omri374 changed the base branch from master to data-generator-2.1 February 5, 2023 14:56
@omri374 omri374 changed the base branch from data-generator-2.1 to master February 5, 2023 19:58
@omri374 (Contributor) commented Feb 5, 2023:

/azp run

@azure-pipelines:

Azure Pipelines successfully started running 1 pipeline(s).

@omri374 omri374 changed the base branch from master to feature/new-datagen-and-eval February 6, 2023 09:28
@omri374 (Contributor) left a comment:

Publishing initial comments as discussed in our code review



class Evaluator:
    def __init__(
        self,
        verbose: bool = False,
        compare_by_io=True,
-       entities_to_keep: Optional[List[str]] = None,
+       entities_to_keep=True,
Contributor:

Shouldn't this be a list of entities? If the logic has changed, please update the argument's name and its docstring.
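
For reference, a minimal sketch of what this comment seems to ask for, keeping the original Optional[List[str]] typing (surrounding arguments taken from the diff above, bodies assumed):

from typing import List, Optional

class Evaluator:
    def __init__(
        self,
        verbose: bool = False,
        compare_by_io: bool = True,
        # List of entity names to evaluate; None means "keep all entities".
        entities_to_keep: Optional[List[str]] = None,
    ):
        self.verbose = verbose
        self.compare_by_io = compare_by_io
        self.entities_to_keep = entities_to_keep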

@@ -37,6 +40,25 @@ def __init__(
        self.entities_to_keep = entities_to_keep
        self.span_overlap_threshold = span_overlap_threshold

        # setup a dict for storing the span metrics
        self.span_model_metrics = {
Contributor:

What are your thoughts on having this as a class/dataclass?
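
One possible shape for such a dataclass, assuming the SemEval-2013-style counts used elsewhere in this PR (field names are illustrative):

from dataclasses import dataclass

@dataclass
class SpanMetrics:
    correct: int = 0
    incorrect: int = 0
    partial: int = 0
    missed: int = 0
    spurious: int = 0

    @property
    def possible(self) -> int:
        # All spans present in the gold annotations.
        return self.correct + self.incorrect + self.partial + self.missed

    @property
    def actual(self) -> int:
        # All spans the model predicted.
        return self.correct + self.incorrect + self.partial + self.spurious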

@@ -0,0 +1,156 @@
import numpy as np
Contributor:

Can this file move into the evaluation folder?

"""
Given a predicted_span, get the best matchest annotated_span based on the overlap_threshold.
Return a SpanOutput
:param sample: InputSample
Contributor:

Please align docstring with function signature

from presidio_evaluator.evaluation import SpanOutput


def get_matched_gold(predicted_span: Span,
Contributor:

Is this method used? If not, consider removing. If yes, is the logic aligned with the docstring?

"""Find the overlap between two ranges
Find the overlap between two ranges. Return the overlapping values if
present, else return an empty set().
Examples:
Contributor:

Please add params to docstring, change the example format (:Example:) and add type hints
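
For illustration, one possible shape addressing all three points, assuming the two ranges are Python range objects:

from typing import Set

def find_overlap(true_range: range, pred_range: range) -> Set[int]:
    """Find the overlap between two ranges.

    Return the overlapping values if present, else return an empty set.

    :param true_range: character range of the annotated (gold) span
    :param pred_range: character range of the predicted span
    :return: set of positions shared by both ranges

    :Example:

    >>> find_overlap(range(0, 5), range(3, 8))
    {3, 4}
    """
    return set(true_range).intersection(pred_range)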

@@ -73,6 +74,14 @@ def intersect(self, other, ignore_entity_type: bool):
        return min(self.end_position, other.end_position) - max(
            self.start_position, other.start_position
        )

    def get_overlap_ratio(self, other):
Contributor:

Suggested change:
-    def get_overlap_ratio(self, other):
+    def get_overlap_ratio(self, other: "Span") -> float:

I know we don't have type hints across the entire codebase, but let's at least add them to the methods we introduce, to modernize the code gradually.

from pathlib import Path
from copy import deepcopy
from difflib import SequenceMatcher
Contributor:

Please remove unused imports.

@melmatlis (Collaborator) left a comment:

Hey Trang, thanks for the PR; a lot of work went into these complex changes. Could we please have a peer review session on the evaluator.py file? I need some guidance on the code readability. Thank you!

@@ -73,6 +74,14 @@ def intersect(self, other, ignore_entity_type: bool):
        return min(self.end_position, other.end_position) - max(
            self.start_position, other.start_position
        )

    def get_overlap_ratio(self, other):
Collaborator:

Suggested change:
-    def get_overlap_ratio(self, other):
+    def get_overlap_ratio(self, other: Span):

"""
Calculates the ratio as: ratio = 2.0*M / T , where M = matches , T = total number of elements in both sequences
"""
nb_matches = self.intersect(other, ignore_entity_type = True)
Collaborator:

Will we always want to ignore the entity type? Perhaps we should pass it as an argument to the function?

"""
nb_matches = self.intersect(other, ignore_entity_type = True)
total_characters = (self.end_position - self.start_position) + (other.end_position - other.start_position)
return np.round((2*nb_matches/total_characters), 2)
Collaborator:

Is there any theoretical chance that total_characters will be equal to 0?
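
A sketch addressing both review comments at once: ignore_entity_type becomes a parameter, and total_characters == 0 (possible if both spans are empty, i.e. start equals end) is guarded against. Illustrative only, not the PR's actual implementation:

import numpy as np

class Span:
    def __init__(self, entity_type: str, entity_value: str,
                 start_position: int, end_position: int):
        self.entity_type = entity_type
        self.entity_value = entity_value
        self.start_position = start_position
        self.end_position = end_position

    def intersect(self, other: "Span", ignore_entity_type: bool) -> int:
        # Number of overlapping characters between the two spans.
        if not ignore_entity_type and self.entity_type != other.entity_type:
            return 0
        return max(
            0,
            min(self.end_position, other.end_position)
            - max(self.start_position, other.start_position),
        )

    def get_overlap_ratio(self, other: "Span",
                          ignore_entity_type: bool = True) -> float:
        """ratio = 2.0 * M / T, where M = matching characters and
        T = total characters in both spans combined."""
        nb_matches = self.intersect(other, ignore_entity_type=ignore_entity_type)
        total_characters = (self.end_position - self.start_position) + (
            other.end_position - other.start_position
        )
        if total_characters == 0:
            # Two empty spans: define the ratio as 0 rather than divide by zero.
            return 0.0
        return float(np.round(2 * nb_matches / total_characters, 2))

For two identical spans of length 3 this gives 2 * 3 / 6 = 1.0; for disjoint spans it gives 0.0.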

    [
        (150, 153, "123", "A", 150, 153, "123", "A", True),
        (150, 153, "123", "B", 150, 153, "123", "A", False),
        (150, 153, "123", "A", 150, 153, "345", "A", False),
Collaborator:

Is it possible that the same range will have different entity values?

@@ -1,27 +1,30 @@
from collections import Counter
from typing import List, Optional, Dict, Tuple
from pathlib import Path
from copy import deepcopy
Collaborator:

@tranguyen221 May we please do a peer review of this file?

    def __eq__(self, other):
        return (
            self.output_type == other.output_type
            and self.overlap_score == other.overlap_score
Collaborator:

Perhaps we should compare the floats using math.isclose or some other alternative, in order to avoid floating point comparison errors?
[two screenshots showing alternative float-comparison approaches omitted]
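
A minimal sketch of that suggestion, with SpanOutput reduced to the two compared fields (the real class has more):

import math

class SpanOutput:
    def __init__(self, output_type: str, overlap_score: float):
        self.output_type = output_type
        self.overlap_score = overlap_score

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SpanOutput):
            return NotImplemented
        return (
            self.output_type == other.output_type
            # Tolerant comparison instead of ==, absorbing floating point noise.
            and math.isclose(self.overlap_score, other.overlap_score, abs_tol=1e-9)
        )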

@@ -22,57 +22,79 @@ def get_matched_gold(predicted_span: Span,
        overlap_score=0
    )

def find_overlap(true_range, pred_range):
Collaborator:

Should we move this file into the evaluation directory as well? What is the logic for what is included in or excluded from that directory?

def span_compute_actual_possible(results: dict) -> dict:
    """
    Takes a result dict that has been output by compute metrics.
    Returns the results dict with actual, possible populated.
Collaborator:

I would propose updating the docstring further to explain what "actual" and "possible" refer to. Add the formulas to the docstring as well.
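
For example, assuming these names follow the usual SemEval-2013 counting scheme (COR/INC/PAR/MIS/SPU), the docstring could spell out:

    """
    Takes a results dict produced by compute metrics and populates
    "actual" and "possible":

        possible = COR + INC + PAR + MIS   (all spans in the gold annotations)
        actual   = COR + INC + PAR + SPU   (all spans the model predicted)

    These are the denominators for the span-level metrics:

        precision = COR / actual
        recall    = COR / possible
    """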

    calculating precision and recall.
    """

    actual = results["actual"]
Collaborator:

Based on the schema you created, we should have a class named EvaluationResult. Are the results of type dict as an intermediate solution?
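
If the dict is indeed an intermediate step, one hypothetical bridge (field names assumed from the dict keys used above) would be a thin dataclass:

from dataclasses import dataclass, asdict

@dataclass
class EvaluationResult:
    correct: int = 0
    incorrect: int = 0
    partial: int = 0
    missed: int = 0
    spurious: int = 0
    actual: int = 0
    possible: int = 0

    @classmethod
    def from_dict(cls, results: dict) -> "EvaluationResult":
        # Bridge from the intermediate dict until the refactor settles.
        return cls(**results)

    def to_dict(self) -> dict:
        return asdict(self)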

@omri374 omri374 changed the title Tranguyen/implement compare span Refactor evaluation to allow span based metrics Oct 30, 2023
@omri374 omri374 reopened this Oct 30, 2023
Labels: none yet
Projects: none yet
Linked issues that merging may close: none yet
3 participants