Bloomfilter for Test-PasswordQuality #146

Open
PatchRequest opened this issue Sep 15, 2022 · 6 comments

@PatchRequest

For: Test-PasswordQuality

Instead of passing a 30 GB file with all the hashes, a Bloom filter could be created from it and used for the check.
That would reduce the file size to around 3 GB and would be much faster and more efficient.

I could implement such a feature. Would you be interested?
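
For readers unfamiliar with the data structure, here is a minimal sketch of a Bloom filter in Python. It is not the proposed implementation: the `BloomFilter` class, its parameters, and the double-hashing scheme built on SHA-256 are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over byte strings (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # ASSUMPTION: derive all bit positions from one SHA-256 digest
        # using double hashing (h1 + i * h2); other schemes work as well.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so the stride is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        # Set every bit selected for this item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # True means "possibly in set"; False means "definitely not in set".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

Usage would look like `bf = BloomFilter(1 << 20, 7)`, then `bf.add(hash_bytes)` for each entry of the list and `hash_bytes in bf` for each lookup. Only the bit array needs to be stored on disk, which is where the size reduction comes from.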

@MichaelGrafnetter
Owner

Hi @PatchRequest, that sounds like a good idea!
What would be the expected search time for 1 hash and 10K hashes when compared to the current binary search approach? What would the false positive rate be and should it be dealt with?
What new parameter name for the Test-PasswordQuality cmdlet do you propose for this feature? -BloomFilterPath? And how would you like to name the cmdlet that would do the conversion? ConvertTo-BloomFilter?

@aseigler
Contributor

If time were an issue, I could see this being helpful as a sort of pre-filter. A Bloom filter would be much faster at returning "not in set", and if it returned "possibly in set", a follow-up lookup in the larger database would drop the false positive rate to zero. That's probably how I'd approach it in my use case. I'd be interested in testing how much faster I could run this scenario against my dataset.
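
The pre-filter idea described above could be sketched roughly as follows. This is only an illustration: `is_known_weak` and `in_sorted_list` are hypothetical helper names, the sorted list is held in memory rather than in a file on disk, and `prefilter` stands for any object that supports the `in` operator, such as a Bloom filter.

```python
from bisect import bisect_left


def in_sorted_list(sorted_hashes, candidate):
    """Exact membership test by binary search, O(log N)."""
    i = bisect_left(sorted_hashes, candidate)
    return i < len(sorted_hashes) and sorted_hashes[i] == candidate


def is_known_weak(candidate, prefilter, sorted_hashes):
    """Two-stage check: approximate filter first, exact lookup second."""
    if candidate not in prefilter:
        # A Bloom filter has no false negatives, so this answer is final.
        return False
    # "Possibly in set": confirm against the full list to rule out
    # a false positive.
    return in_sorted_list(sorted_hashes, candidate)
```

With this arrangement the expensive lookup only runs for hashes the filter flags, so the false positive rate of the combined check is zero while most negative answers never touch the large file.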

@PatchRequest
Author

I would use speed as a secondary argument. I think size is more interesting, because with a Bloom filter the "bad password list" can fit on any USB stick with a false positive rate of 0.001%.

When benchmarking Bloom filters, the nice thing is that lookups scale with O(1) in the size of the list, while binary search is O(log N). Therefore, the bigger the password list is, the more efficient the Bloom filter becomes, which is a win-win situation.

I think a parameter called -BloomFilterPath is a good idea, and the name for the cmdlet that creates the filter sounds good too.
The only thing I would add is to provide a ready-made Bloom filter for Have I Been Pwned via Git LFS. Then the bad password check is just a git clone -> download 3 GB -> let's go.
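
As a rough sanity check of the size figures in this thread, the standard Bloom filter sizing formulas can be evaluated directly: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The `bloom_size` helper is a hypothetical name, and the entry count of one billion is a round number assumed for illustration, not the actual size of any password list.

```python
import math


def bloom_size(num_items: int, fp_rate: float):
    """Return (bits, hash_count) for an optimally sized Bloom filter."""
    bits = math.ceil(-num_items * math.log(fp_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / num_items * math.log(2)))
    return bits, hashes


# ASSUMPTION: one billion entries is an illustrative round figure.
# A false positive rate of 0.001% corresponds to p = 1e-5.
bits, hashes = bloom_size(1_000_000_000, 1e-5)
print(f"{bits / 8 / 1e9:.2f} GB, {hashes} hash functions")
```

Under these assumptions the filter needs about 24 bits per entry, or roughly 3 GB in total. Each lookup probes a fixed number of bits regardless of how many entries the filter holds, which is the sense in which lookups are O(1).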

@MichaelGrafnetter
Owner

Sounds great.
Regarding Git LFS, I am a newbie here. I am having issues with it and constantly get "download quota exceeded" errors.

I used to store sample databases with Git LFS, which was not a good idea. I am considering doing a cleanup, uploading my test ntds.dit files (several GBs) to Azure Blob Storage, and integrating their download into the unit test runner.

@PatchRequest
Author

Hmm, an alternative could be to host it somewhere else where there is no quota :/

@PatchRequest
Author

But anyway, I will start developing the features.
