[RFC][HMA-in-a-bottle] Possible example of creating a modular matcher #1302

Merged
3 commits merged on Mar 30, 2023

Conversation

Sam-Freeman

Summary

An example of decoupling the matcher logic from the CLI into its own module with a generic base class, which allows specific matcher 'types' to be created. This was the simplest way to separate the logic for file matching from the logic for raw hash matching.
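
A minimal sketch of the shape described above, not the code in this PR: a generic base class parameterized on the input type, with one subclass for files and one for raw hashes. All names here (`ToyIndex`, `HashMatcher`, `FileMatcher`, `match`, `_to_hash`) are hypothetical stand-ins, and the toy index only does exact lookups on SHA-256 strings, which is an assumption made to keep the example self-contained.

```python
import hashlib
import pathlib
import typing as t

T = t.TypeVar("T")


class ToyIndex:
    """Hypothetical stand-in for a signal index: exact-match lookup only."""

    def __init__(self, entries: t.Dict[str, str]) -> None:
        self._entries = entries  # maps hash string -> metadata

    def query(self, hash_str: str) -> t.List[str]:
        return [self._entries[hash_str]] if hash_str in self._entries else []


class Matcher(t.Generic[T]):
    """Base class: subclasses decide how an input of type T becomes a hash."""

    def __init__(self, index: ToyIndex) -> None:
        self._index = index

    def _to_hash(self, item: T) -> str:
        raise NotImplementedError

    def match(self, item: T) -> t.List[str]:
        return self._index.query(self._to_hash(item))


class HashMatcher(Matcher[str]):
    """Raw hash matching: the input is already a hash, so only normalize it."""

    def _to_hash(self, item: str) -> str:
        return item.strip().lower()


class FileMatcher(Matcher[pathlib.Path]):
    """File matching: hash the file contents first, then query the index."""

    def _to_hash(self, item: pathlib.Path) -> str:
        return hashlib.sha256(item.read_bytes()).hexdigest()
```

The CLI would then only need to pick a matcher type and call `match`, leaving the hashing details inside each subclass.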

Test Plan

This is an RFC, but I ran it locally and it all works.

… of the matcher, and decoupling it from the cli code
@github-actions github-actions bot added the python-threatexchange Items related to the threatexchange python tool / library label Mar 30, 2023
Sam Freeman added 2 commits March 29, 2023 22:44
# Supposed to be without whitespace, but let's make sure
distance_str = "".join(r.similarity_info.pretty_str().split())
print(
# s_type.get_name(),
Contributor

Hmm, knowing which type matched is valuable. One option, though I'm not sure it is worth the overhead, is to have it as part of the metadata(s).

Author

Yeah, agreed, and it's something that we/I need to come back to. I commented it out for now because it didn't exist in this scope, and I didn't want to forget that I had removed it.
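
One possible way to come back to this, sketched under assumptions: have each match result carry the name of the signal type that produced it, so the print no longer depends on `s_type` being in scope. `TypedMatch` and `format_match` are hypothetical names and not part of python-threatexchange; the whitespace stripping mirrors the snippet under review.

```python
import dataclasses
import typing as t


@dataclasses.dataclass(frozen=True)
class TypedMatch:
    """Hypothetical result wrapper that keeps the signal type with the match."""

    signal_type_name: str
    distance_str: str
    metadata: t.Any


def format_match(m: TypedMatch) -> str:
    # Supposed to be without whitespace, but let's make sure
    distance = "".join(m.distance_str.split())
    return f"{m.signal_type_name} {distance} {m.metadata}"
```

This costs one extra string per result, which is the overhead being weighed in the comment above.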


class Matcher(t.Generic[T]):
# question (sa) -- how do we want to handle the settings? In the CLI it's considered
# a god object -- do we want to pass it to the init of the class?
Contributor

Maybe a singleton?
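
For comparison, a rough sketch of the two options on the table: passing settings to `__init__` versus looking them up through a singleton-style accessor. `MatcherSettings`, `get_settings`, the field names, and both matcher classes are hypothetical and only illustrate the trade-off.

```python
import dataclasses
import functools
import typing as t


@dataclasses.dataclass(frozen=True)
class MatcherSettings:
    """Hypothetical settings object; the real CLI settings hold far more."""

    index_dir: str = "/tmp/indices"
    signal_types: t.Tuple[str, ...] = ("pdq", "md5")


@functools.lru_cache(maxsize=None)
def get_settings() -> MatcherSettings:
    """Singleton-style accessor: the first call builds it, later calls reuse it."""
    return MatcherSettings()


class InjectedMatcher:
    """Option 1: settings passed to __init__, easy to swap out in tests."""

    def __init__(self, settings: MatcherSettings) -> None:
        self.settings = settings


class SingletonMatcher:
    """Option 2: settings looked up globally; shorter call sites, hidden state."""

    def __init__(self) -> None:
        self.settings = get_settings()
```

Injection keeps the matcher usable outside the CLI without global state, while the singleton avoids threading the settings object through every constructor.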

) -> t.Sequence[IndexMatch]:
raise NotImplementedError

def _get_indicies(self) -> t.List[t.Tuple[t.Type[SignalType], SignalTypeIndex]]:
Contributor

s/indicies/indices/

@dougneal
Contributor

Nice, this is pretty much what I had in mind also.

@Sam-Freeman Sam-Freeman marked this pull request as ready for review March 30, 2023 14:31
@Sam-Freeman Sam-Freeman merged commit 6f71420 into facebook:hma-in-a-bottle Mar 30, 2023
2 checks passed
@Sam-Freeman Sam-Freeman deleted the hma-in-a-bottle branch March 30, 2023 15:12
Labels
CLA Signed python-threatexchange Items related to the threatexchange python tool / library
4 participants