Flag to only collect count of people and be GDPR compliant? #50

Open
mwargan opened this issue Feb 20, 2019 · 3 comments

Comments


mwargan commented Feb 20, 2019

Related: #31 and #4

Would it be possible to add a flag so that MAC addresses are NOT stored and we only see an aggregate count of devices at a given timestamp? The program is great, but as it stands it cannot be used on public networks in Europe because of GDPR compliance :/
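To make the request concrete, here is a minimal sketch of what a count-only mode could store. The helper `summarize_scan` and its `count_only` parameter are hypothetical names for illustration and are not part of the project's current code; the sketch assumes the scan already yields a list of MAC address strings.

```python
import time


def summarize_scan(macs, count_only=True, now=None):
    """Hypothetical helper: reduce one scan's MAC addresses to the record to store."""
    # De-duplicate in memory only; MAC case is normalized so "AA:.." == "aa:..".
    unique = {mac.lower() for mac in macs}
    record = {
        "time": int(now if now is not None else time.time()),
        "count": len(unique),
    }
    if not count_only:
        # Only when the (assumed) flag is off are the addresses kept.
        record["macs"] = sorted(unique)
    return record


scan = ["AA:BB:CC:00:00:01", "aa:bb:cc:00:00:01", "AA:BB:CC:00:00:02"]
print(summarize_scan(scan, now=0))  # {'time': 0, 'count': 2}
```

With `count_only=True` the stored record holds just a timestamp and a number, which is the aggregate described above.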


mwargan commented Feb 21, 2019

OK, so not a flag, but I did create a fork and commented out the MAC for GDPR compliance (line 250): https://github.com/mwargan/howmanypeoplearearound/blob/master/howmanypeoplearearound/__main__.py


mwargan commented Feb 21, 2019

Hey @AlexNaga! I think that hashing the MAC would make it not anonymous but pseudonymous, which means that it could be reverse-engineered with more information (like the hashing algorithm).
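A rough illustration of why a plain hash is only pseudonymous: MAC addresses come from a small, structured space, so anyone who knows the algorithm can hash candidates until one matches. The sketch below assumes an unsalted SHA-256 over the lowercase colon-separated string, and assumes the attacker already knows the first four octets so the demo runs quickly; the MAC value and function names are made up for the example.

```python
import hashlib
from itertools import product


def hash_mac(mac):
    """Unsalted SHA-256 of a MAC string (the scheme assumed for this demo)."""
    return hashlib.sha256(mac.encode("ascii")).hexdigest()


def recover_mac(digest, known_prefix):
    """Brute-force the last two octets of a MAC whose first four are known."""
    for a, b in product(range(256), repeat=2):  # 65,536 candidates
        candidate = f"{known_prefix}:{a:02x}:{b:02x}"
        if hash_mac(candidate) == digest:
            return candidate
    return None


stored = hash_mac("aa:bb:cc:dd:12:34")      # what a database might hold
print(recover_mac(stored, "aa:bb:cc:dd"))   # aa:bb:cc:dd:12:34
```

Even without the extra known octet, the device-specific half of a MAC is 24 bits, so each vendor prefix has about 16.7 million candidates to try.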

Your idea to add the date is a good one as well, but the same data can be obtained by just setting a longer scan time, like -s 3600, which would be a safer option as it won't store any MAC addresses.

The whole problem lies in working out how many unique devices there were over a period that spans several sampling periods (e.g. how many people came in during a day when we only track how many people were present in a given hour). Apparently the problem is so common it has a Wikipedia page :D : https://en.wikipedia.org/wiki/Count-distinct_problem
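A small worked example of that problem, with made-up device labels: adding hourly counts overcounts anyone who stayed across hours, and once the identifiers are discarded the counts alone only give bounds on the true total. The helper `distinct_bounds` is a hypothetical name used for illustration and expects a non-empty list.

```python
def distinct_bounds(period_counts):
    """Lower and upper bound on distinct devices, given only per-period counts."""
    # At least the busiest period's count, at most everyone being different.
    return max(period_counts), sum(period_counts)


hour_1 = {"dev-a", "dev-b", "dev-c"}
hour_2 = {"dev-b", "dev-c", "dev-d"}   # two devices stayed, one is new

print(len(hour_1) + len(hour_2))       # 6: naive sum of hourly counts
print(len(hour_1 | hour_2))            # 4: true distinct count, needs the identifiers
print(distinct_bounds([len(hour_1), len(hour_2)]))  # (3, 6)
```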

I'm still unsure of what to do, but for now I have just commented out the MAC in my fork so ultimately it's not stored.


buremba commented Nov 6, 2019

@mwargan you can use the HyperLogLog algorithm and push the MAC addresses into the hash function for the minimum interval that you want to calculate. You can create a HyperLogLog instance for each minute and merge them to create rollups by hour, day, or even month. Here is an example implementation in Python: https://github.com/svpcom/hyperloglog
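To show the per-minute-then-merge idea without depending on any package, here is a minimal, self-contained HyperLogLog sketch. It is an illustrative toy and not the API of the linked library; the class name `TinyHLL`, the register count, and the device labels are all assumptions made for this example.

```python
import hashlib
import math


class TinyHLL:
    """Minimal HyperLogLog (illustrative toy, not the linked library's API)."""

    def __init__(self, p=12):
        self.p = p                        # number of index bits
        self.m = 1 << p                   # number of registers (4096 for p=12)
        self.registers = [0] * self.m

    def add(self, item):
        # Take 64 bits of a SHA-256 digest as the item's hash.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Merging is an element-wise max, so rollups need no raw identifiers.
        if other.p != self.p:
            raise ValueError("sketches must use the same precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias constant for m >= 128
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)


minute_1, minute_2 = TinyHLL(), TinyHLL()
for i in range(0, 60):
    minute_1.add(f"device-{i}")
for i in range(40, 100):                 # 20 devices overlap with minute_1
    minute_2.add(f"device-{i}")

hour = TinyHLL()
hour.merge(minute_1)
hour.merge(minute_2)
print(hour.count())                      # close to 100, the true distinct count
```

Each sketch keeps only register values rather than the addresses themselves, and merged rollups give an approximate distinct count instead of a sum that double-counts devices seen in more than one interval.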
