Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pdq] Replace POSIX dependent functions with STL equivalent #1333

Merged
merged 6 commits into from Jul 17, 2023

Conversation

ianwal
Copy link
Contributor

@ianwal ianwal commented Jul 17, 2023

Summary

While I was testing vpdq on Windows I had to make a couple changes to get it to build with pdq.

They're solid changes in general, so I thought I would upload them.

Use C++11 STL over OS dependent functions in general:

  • Change file pointers to streams
    • File pointers were used with getline() which is POSIX only. Replaced file pointers only wherever necessary with streams and use std::getline(). I tried to change as little as possible here.
  • Use C++ random device instead of random() which is POSIX only
    • This could have been done with rand() but I just used the C++11 one because it's a C++ project
  • Fix __builtin_popcount where hamming*() should have been used
    • There's a define in pdqhamming.h which chooses either a lookup table or __builtin_popcount based on the USE_BUILTIN_POPCOUNT directive but some functions were directly calling __builtin_popcount regardless.

None of these changes should affect bindings. The Python binding doesn't call any of the changed functions since they're almost all file IO.

I'm not sure why __builtin_popcount is being used at all. std::bitset::count and __builtin_popcount generate the same assembly. I can replace __builtin_popcount with std::bitset::count if you would like.

The getline problem is a really good example of why it's not good to do using namespace std; since std::getline and getline are both functions.

I vaguely remember reading something about Buck for building and if you guys use that for PDQ I'm not sure how these changes would be affected by that.

Test Plan

Run included PDQ tests and vPDQ tests.

@ianwal ianwal requested a review from Dcallies as a code owner July 17, 2023 15:42
@github-actions github-actions bot added the pdq Items related to the pdq libraries or reference implementations label Jul 17, 2023
Oops, this  was just for testing. Linux should be suppported.
Copy link
Contributor

@Dcallies Dcallies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation in the summary. I had a few questions that normally would have forced a roundtrip, but I was able to answer them from reading your summary.

As for the test plan, I don't know if there are dedicated unittests for it, the blast radius for the C++ implementation might be slightly larger if it is the one linked to the python bindings, though you mentioned it's not in the summary.

@Dcallies Dcallies merged commit c0cc68f into facebook:main Jul 17, 2023
6 checks passed
@ianwal
Copy link
Contributor Author

ianwal commented Jul 18, 2023

Did you have any thoughts on using std::bitset::count?

@Dcallies
Copy link
Contributor

If it produces the right output, then I'm game!

@ianwal ianwal deleted the pdq-fix-glibc-funcs branch January 11, 2024 07:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed pdq Items related to the pdq libraries or reference implementations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants