Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modularise the Hasher code from the hasher CLI #1303

Closed

Conversation

Sam-Freeman
Copy link

Summary

Pretty much as title, this is quite simple. It's the first iteration, and likely not the final form.

Test Plan

Ran locally, produces same result. All tests pass

@github-actions github-actions bot added the python-threatexchange Items related to the threatexchange python tool / library label Mar 30, 2023
@Sam-Freeman
Copy link
Author

Note: some of these were just merged in #1302 but are still showing up here.

Copy link
Contributor

@Dcallies Dcallies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the general thrust of this, and I think we are unequivocally going to do something very similar to this in pulling out components for HMAiaB.

However, I'm not convinced at some of the top level choices on the interface yet:

  1. I don't think passing in the settings object is needed for this functionality, and may be detrimental.
  2. Stepping back, what does a Hasher object represent? What would be the advantage over a functional version?
  3. Which things might be details of the CLI, and which things might be generalized?
  4. Looking at SignalType interface, we support 3 versions: bytes hasher, str hasher, and then a superclass - file hasher. How should that map to the hasher instance/helper?
  5. Which version will HMAiaB need, assuming that it will be sent original content?
  6. Is there anything we can learn from UnifiedHasher in AWS HMA? https://github.com/facebook/ThreatExchange/blob/main/hasher-matcher-actioner/hmalib/hashing/unified_hasher.py

class Hasher:
def __init__(
self,
settings: CLISettings,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the only thing you pass in the settings type for is the list of signal types. What if you passed in the full list of signal types?

Alternatively, you only use content type to narrow down to the applicable signal types. What if you only passed in those?

hash_str = hasher.hash_from_file(file)
if hash_str:
print(hasher.get_name(), hash_str)
hasher = Hasher(settings, self.content_type, self.signal_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking: Unfortunately, anything that is being passed in the settings god object is still heavily tied in with the CLI.

@facebook-github-bot
Copy link
Contributor

Hi @Sam-Freeman!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@Dcallies
Copy link
Contributor

Dcallies commented Mar 1, 2024

Hey @Sam-Freeman , I'm going to close this PR for inactivity, feel free to re-open if you still want to try and get it merged!

@Dcallies Dcallies closed this Mar 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed python-threatexchange Items related to the threatexchange python tool / library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants