[py-tx] New extension interface for storage #1282

Open
Dcallies opened this issue Mar 27, 2023 · 0 comments
Labels
python-threatexchange Items related to the threatexchange python tool / library

Comments

@Dcallies
Contributor

We want the ability to add new storage mechanisms as alternatives to the one installed by default in py-tx. We think that dbm might be a much better default storage.

Pre-read material:

  1. Readme: https://github.com/facebook/ThreatExchange/tree/main/python-threatexchange
  2. SignalExchange interface (especially storage): https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/exchanges/signal_exchange_api.py#L20
  3. Backwards compatibility guarantee: https://github.com/facebook/ThreatExchange/tree/main/python-threatexchange#general-expectation-for-compatibility-and-versioning
  4. dbm module: https://docs.python.org/3/library/dbm.html
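
For anyone unfamiliar with dbm, here is a minimal sketch of a write-then-read round trip using only the standard library. The key name, the JSON encoding, and the `roundtrip_demo` helper are assumptions for illustration, not anything py-tx does today.

```python
import dbm
import json
import os


def roundtrip_demo(directory: str) -> dict:
    """Write one JSON record to a dbm file, reopen it, and read it back."""
    path = os.path.join(directory, "example_store")  # hypothetical file name

    # "c" opens for reading and writing, creating the database if needed.
    with dbm.open(path, "c") as db:
        record = {"hash": "abc123", "tags": ["demo"]}
        # dbm stores bytes, so encode both the key and the value explicitly.
        db[b"signal:1"] = json.dumps(record).encode("utf-8")

    # Reopen read-only to show that the record persisted on disk.
    with dbm.open(path, "r") as db:
        return json.loads(db[b"signal:1"])
```

Unlike a single file that is loaded and merged in memory, each key can be read or updated individually, which is the property that makes dbm interesting as a default.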

There will be a series of milestones:

  1. Define a new Python interface that specifies which methods a storage implementation must provide, likely patterned on https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/exchanges/helpers.py#L69
  2. Apply the interface to the existing storage at https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/cli/cli_state.py
  3. Create a dbm implementation of the interface
  4. Swap in the dbm implementation of the interface and show that it still produces the full dataset with the dataset command
  5. Add storage to the extensions interface at https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/extensions/manifest.py
  6. Add a configuration field to https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/cli/cli_config.py#L47 that holds the selected storage mechanism. If unset, it should default to the old in-memory merge file storage
  7. Add the ability to select the storage backend with a cli command - think about edge case behavior here
  8. End-to-end test swapping storages / large download
  9. [Stretch] work with Scott at the hackathon to spec out an AWS-based storage extension. It can live in https://github.com/facebook/ThreatExchange/tree/main/python-threatexchange/threatexchange/extensions as an "official" extension
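
To make milestones 1, 3, and 6 more concrete, here is a rough sketch of what the shape could look like. Everything in it is an assumption: `StorageInterface`, `InMemoryStorage`, `DbmStorage`, `make_storage`, and the method names are hypothetical and are not taken from the existing py-tx code. The real interface should follow the helpers linked above.

```python
import abc
import dbm
import json
import os
import typing as t

Record = t.Dict[str, t.Any]


class StorageInterface(abc.ABC):
    """Hypothetical storage interface; not the real py-tx API."""

    @abc.abstractmethod
    def put(self, collab: str, key: str, record: Record) -> None:
        """Insert or overwrite one record for a collaboration."""

    @abc.abstractmethod
    def delete(self, collab: str, key: str) -> None:
        """Remove one record; a missing key is not an error."""

    @abc.abstractmethod
    def load_all(self, collab: str) -> t.Dict[str, Record]:
        """Return every stored record for a collaboration."""


class InMemoryStorage(StorageInterface):
    """Stand-in for the existing default; keeps everything in a dict."""

    def __init__(self) -> None:
        self._data: t.Dict[str, t.Dict[str, Record]] = {}

    def put(self, collab: str, key: str, record: Record) -> None:
        self._data.setdefault(collab, {})[key] = record

    def delete(self, collab: str, key: str) -> None:
        self._data.get(collab, {}).pop(key, None)

    def load_all(self, collab: str) -> t.Dict[str, Record]:
        return dict(self._data.get(collab, {}))


class DbmStorage(StorageInterface):
    """One dbm database per collaboration, with JSON-encoded values."""

    def __init__(self, directory: str) -> None:
        self._directory = directory

    def _open(self, collab: str):
        # "c" opens read/write and creates the database if it is missing.
        return dbm.open(os.path.join(self._directory, collab), "c")

    def put(self, collab: str, key: str, record: Record) -> None:
        with self._open(collab) as db:
            db[key.encode("utf-8")] = json.dumps(record).encode("utf-8")

    def delete(self, collab: str, key: str) -> None:
        with self._open(collab) as db:
            try:
                del db[key.encode("utf-8")]
            except KeyError:
                pass  # Match the in-memory behavior for missing keys.

    def load_all(self, collab: str) -> t.Dict[str, Record]:
        with self._open(collab) as db:
            return {k.decode("utf-8"): json.loads(db[k]) for k in db.keys()}


def make_storage(name: t.Optional[str], directory: str) -> StorageInterface:
    """Pick a backend from a config value; unset falls back to the default."""
    if name is None or name == "memory":
        return InMemoryStorage()
    if name == "dbm":
        return DbmStorage(directory)
    raise ValueError(f"unknown storage backend: {name!r}")
```

A check like the one in milestone 4 could then write the same records through both backends and compare the results of `load_all`. Using the collaboration name directly as a file name is a simplification; a real implementation would need to sanitize it.
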
@Dcallies Dcallies added the python-threatexchange Items related to the threatexchange python tool / library label Dec 8, 2023