-
Notifications
You must be signed in to change notification settings - Fork 297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modularise the Hasher code from the hasher CLI #1303
Conversation
… of the matcher, and decoupling it from the cli code
… of the matcher, and decoupling it from the cli code
…ge into hma-in-a-bottle
… into hasher-module
Note: some of these were just merged in #1302 but are still showing up here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the general thrust of this, and I think we are unequivocally going to do something very similar to this in pulling out components for HMAiaB.
However, I'm not convinced at some of the top level choices on the interface yet:
- I don't think passing in the settings object is needed for this functionality, and may be detrimental.
- Stepping back, what does a Hasher object represent? What would be the advantage over a functional version?
- Which things might be details of the CLI, and which things might be generalized?
- Looking at SignalType interface, we support 3 versions: bytes hasher, str hasher, and then a superclass - file hasher. How should that map to the hasher instance/helper?
- Which version will HMAiaB need, assuming that it will be sent original content?
- Is there anything we can learn from UnifiedHasher in AWS HMA? https://github.com/facebook/ThreatExchange/blob/main/hasher-matcher-actioner/hmalib/hashing/unified_hasher.py
class Hasher: | ||
def __init__( | ||
self, | ||
settings: CLISettings, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the only thing you pass in the settings type for is the list of signal types. What if you passed in the full list of signal types?
Alternatively, you only use content type to narrow down to the applicable signal types. What if you only passed in those?
hash_str = hasher.hash_from_file(file) | ||
if hash_str: | ||
print(hasher.get_name(), hash_str) | ||
hasher = Hasher(settings, self.content_type, self.signal_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking: Unfortunately, anything that is being passed in the settings god object is still heavily tied in with the CLI.
Hi @Sam-Freeman! Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention. You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted. ProcessIn order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks! |
Hey @Sam-Freeman , I'm going to close this PR for inactivity, feel free to re-open if you still want to try and get it merged! |
Summary
Pretty much as title, this is quite simple. It's the first iteration, and likely not the final form.
Test Plan
Ran locally, produces same result. All tests pass