Blog Idea: "How to Scrub for and Obscure (Remove Entirely) PII From Your Data" #9

Open
cardoni opened this issue Apr 12, 2016 · 0 comments

cardoni commented Apr 12, 2016

Discuss my latest work at Float, Inc., namely how to scrub for and obscure (or entirely remove) whatever PII you'd like from your data.

Specifically, I'm doing this in JavaScript. Since I plan to release an open source library that does just this, I ought to write about PII: what it is, and how to keep it out of your long-term data store.

In handling financial data for many thousands of customers and applicants at Float, we're entrusted by our users to safeguard their most sensitive data. This means that we follow the law and always strive for the highest level of customer protection we can offer. As such ... yadda, yadda, yadda ... we scrub all PII before storing anything in the format we keep long term and use for anonymous statistics and research.

PII (personally identifiable information) is any data that could potentially identify a specific individual. The idea is to remove as much personally identifying information as possible from your data. If someone got hold of your transactional data, they could tell which transactions were made by the same individual, but not who that individual is. They might know someone's first name, but they wouldn't know their address, full name, phone number, any email addresses within transactions, or whom they sent money to or received money from.
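One possible way to keep transactions linkable without keeping identities is to replace the customer identifier with a keyed hash before storage. The sketch below is an assumption for illustration, not necessarily what Float does: the field names (`customerId`, `amountCents`, `postedOn`) and the helper names are hypothetical, and the secret key is assumed to live somewhere other than the long-term store.

```javascript
const crypto = require('crypto');

// Map an identifier to a stable token. The same input and key always give
// the same token, so transactions stay linkable, but the token cannot be
// turned back into the identifier without the key.
function pseudonymize(value, secretKey) {
  return crypto
    .createHmac('sha256', secretKey)
    .update(String(value))
    .digest('hex');
}

// Build the record that goes to long-term storage. Only fields listed here
// are copied; everything else on the transaction is left behind.
// (Field names are hypothetical.)
function anonymizeTransaction(tx, secretKey) {
  return {
    customerToken: pseudonymize(tx.customerId, secretKey),
    amountCents: tx.amountCents,
    postedOn: tx.postedOn,
  };
}
```

Two transactions from the same customer end up with the same `customerToken`, which is what makes anonymous statistics per individual possible.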

How do you find an address in a string? An email address? A phone number? Variations on someone's name? Credit card numbers? And how do you do all of that in JavaScript? Well, take a look at my library, X (coming soon). At the very least, you pass it text like this (example) and tell it to check for this type of data or that organization of words (regex). Optionally, you also pass in which parts of an object ought to be omitted, let through, and/or obfuscated, and how to do any of that (example).
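Until the library is out, here is a rough sketch of the "pass it text and tell it what to check for" idea. This is not the library's API: `findPII`, `scrubText`, and the detector names are hypothetical, and the patterns are deliberately simple (US-style phone numbers only, no street addresses or names), so they will miss real-world variations.

```javascript
// Hypothetical detectors. All regexes need the global flag for matchAll.
const DETECTORS = {
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  // US-style numbers such as (415) 555-0132 or 415-555-0132,
  // not embedded inside a longer run of digits.
  usPhone: /(?<!\d)(?:\+?1[ .-]?)?(?:\(\d{3}\)|\d{3})[ .-]?\d{3}[ .-]?\d{4}(?!\d)/g,
  // 13 to 16 digits, optionally separated by single spaces or dashes.
  cardNumber: /\b\d(?:[ -]?\d){12,15}\b/g,
};

// Luhn checksum, used to discard digit runs that cannot be card numbers.
function passesLuhn(candidate) {
  const digits = candidate.replace(/\D/g, '');
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Return every match for the requested detector types.
function findPII(text, types = Object.keys(DETECTORS)) {
  const hits = [];
  for (const type of types) {
    for (const match of text.matchAll(DETECTORS[type])) {
      if (type === 'cardNumber' && !passesLuhn(match[0])) continue;
      hits.push({ type, value: match[0], index: match.index });
    }
  }
  return hits;
}

// Replace each hit with a placeholder, working from the end of the string
// so earlier indexes stay valid. Assumes hits do not overlap.
function scrubText(text, types) {
  const hits = findPII(text, types).sort((a, b) => b.index - a.index);
  let out = text;
  for (const hit of hits) {
    out = out.slice(0, hit.index) + `[${hit.type}]` + out.slice(hit.index + hit.value.length);
  }
  return out;
}

const sample = 'Refund to jane.doe@example.com, call (415) 555-0132, card 4111 1111 1111 1111.';
console.log(scrubText(sample));
// → 'Refund to [email], call [usPhone], card [cardNumber].'
```

The Luhn check is there because a bare "13 to 16 digits" pattern would also flag order numbers and other long IDs.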

Some tips on dealing with PII within your app:

  1. Stream data to AWS S3/Redshift from memory, in-app, instead of writing anything to local disk and uploading those files.
  2. When copying an object, use regex (say) to indicate which data to copy over as-is, which to obscure by hashing, and which to leave out of the copy entirely.
  3. US vs. international formats: phone numbers, addresses, email addresses, etc.
  4. That's all I can think of for now. Will need to add more.
  5. Any Redshift tips?
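
Tip 2 above could look roughly like the sketch below. It is only an illustration under stated assumptions: `scrubObject` and the rule format are hypothetical, it handles flat objects only, and it drops any key that no rule matches, so new fields never leak through by accident.

```javascript
const crypto = require('crypto');

// Plain SHA-256 of the value. For low-entropy data such as phone numbers,
// a keyed hash (HMAC) with a secret key is harder to brute-force.
function sha256(value) {
  return crypto.createHash('sha256').update(String(value)).digest('hex');
}

// rules: [{ pattern: RegExp tested against the key name,
//           action: 'keep' | 'hash' | 'remove' }]
// The first matching rule wins. Patterns should not use the global flag,
// because RegExp.prototype.test is stateful on global regexes.
function scrubObject(source, rules, defaultAction = 'remove') {
  const copy = {};
  for (const [key, value] of Object.entries(source)) {
    const rule = rules.find((r) => r.pattern.test(key));
    const action = rule ? rule.action : defaultAction;
    if (action === 'keep') {
      copy[key] = value;
    } else if (action === 'hash') {
      copy[key] = sha256(value);
    }
    // 'remove': the key is simply not copied.
  }
  return copy;
}
```

A rule set such as `[{ pattern: /^(amountCents|postedOn)$/, action: 'keep' }, { pattern: /email/i, action: 'hash' }]` would keep the two transaction fields, hash anything with "email" in its key, and drop the rest.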
@cardoni cardoni added the topics label Apr 12, 2016
@cardoni cardoni assigned cardoni and unassigned cardoni May 4, 2016
@cardoni cardoni added this to the Ten More Articles! milestone May 4, 2016