[py-tx] Investigate dbm as a replacement for the default store #1272

Open
Dcallies opened this issue Mar 14, 2023 · 0 comments
Labels
help wanted python-threatexchange Items related to the threatexchange python tool / library

Comments

@Dcallies
Contributor

We hand-rolled the file store for python-threatexchange, even though the data is an extremely simple key-value mapping:

  • Key: the int or string returned by fetch()
  • Value: the dataclass in the value returned by fetch(). All of the core ones are compatible with dacite, and so are JSON-serializable
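
As a rough illustration of how such a key and value could be flattened to strings, here is a minimal sketch. `ExampleRecord`, the `i:`/`s:` key prefixes, and the helper names are all hypothetical and not part of python-threatexchange; the flat `ExampleRecord(**...)` reconstruction stands in for what dacite would do for nested dataclasses.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Union


@dataclass
class ExampleRecord:
    # Hypothetical stand-in for a dataclass returned by fetch().
    signal_type: str
    tags: List[str]


def encode_key(key: Union[int, str]) -> str:
    # Prefix with the key type so that int 1 and str "1" do not collide.
    prefix = "i" if isinstance(key, int) else "s"
    return f"{prefix}:{key}"


def decode_key(raw: str) -> Union[int, str]:
    # Split on the first ":" only, so string keys may contain colons.
    kind, _, rest = raw.partition(":")
    return int(rest) if kind == "i" else rest


def encode_value(record: ExampleRecord) -> str:
    # asdict() turns the dataclass into plain dicts/lists that json can dump.
    return json.dumps(asdict(record), sort_keys=True)


def decode_value(raw: str) -> ExampleRecord:
    # Flat dataclass only; nested dataclasses would need something like dacite.
    return ExampleRecord(**json.loads(raw))
```

Each record then becomes an independent string: string pair, which is the only thing the backing store needs to understand.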

The current implementation stores this as the JSON serialization of one massive dict, so applying an update requires a full in-memory merge. At larger dataset sizes, this becomes untenable.

Because the data partitions so easily, any string-to-string key-value store should work. We've discussed SQLite in the past, but it has the downside of requiring additional libraries.

dbm looks like it could be a straightforward upgrade over the current dumb file. It's similarly flexible, with the added benefit that it may pick up an optimized implementation (on Unix), and even the fallback dbm.dumb implementation loads only the key names into memory, not the data.

https://docs.python.org/3/library/dbm.html
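
To make the idea concrete, here is a minimal sketch of what a dbm-backed store might look like, assuming JSON-encoded values and UTF-8 string keys. `ExampleRecord`, `merge_delta`, and `load_one` are hypothetical names for illustration, not the existing python-threatexchange API; only `dbm.open` and the mapping interface it returns come from the standard library.

```python
import dbm
import json
from dataclasses import asdict, dataclass
from typing import Dict, Iterable, Optional


@dataclass
class ExampleRecord:
    # Hypothetical stand-in for a dataclass returned by fetch().
    signal_type: str
    value: str


def merge_delta(
    db_path: str,
    updates: Dict[str, ExampleRecord],
    deletes: Iterable[str] = (),
) -> None:
    # "c" opens the database read-write, creating it if it does not exist.
    with dbm.open(db_path, "c") as db:
        for key, record in updates.items():
            # dbm stores bytes, so encode the key and JSON value explicitly.
            db[key.encode("utf-8")] = json.dumps(asdict(record)).encode("utf-8")
        for key in deletes:
            raw_key = key.encode("utf-8")
            if raw_key in db:
                del db[raw_key]


def load_one(db_path: str, key: str) -> Optional[ExampleRecord]:
    # Only the requested value is read; the rest of the data stays on disk.
    with dbm.open(db_path, "r") as db:
        raw = db.get(key.encode("utf-8"))
    return None if raw is None else ExampleRecord(**json.loads(raw))
```

With this shape, applying an update touches only the keys in the delta rather than rewriting one large serialized dict. Which backend `dbm.open` selects depends on the platform and Python build, so the on-disk file names may differ between machines.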

@Dcallies Dcallies added help wanted python-threatexchange Items related to the threatexchange python tool / library labels Mar 14, 2023