A Python script that anonymizes datasets by replacing personally identifying information with fake data

HariprasadManimozhi/Anonymize-data-in-python

Anonymize data in Python

One way to deal with personally identifying information (PII) in a dataset is to anonymize it: replace the information that would identify a real individual with information about a fake (but similarly behaving or sounding) individual.

Objective:

Given a target dataset (for example, a CSV file with multiple columns), produce a new dataset in which no row contains any personally identifying information. The anonymized dataset should contain the same amount of data as the target and retain its analytical value.
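The objective above can be sketched with the standard library alone. This is a minimal illustration, not the repository's actual script: the function names `anonymize_rows` and `anonymize_csv` and the `generators` argument are hypothetical. Each PII column is paired with a zero-argument callable that returns a fake value, and a real value is always mapped to the same fake value, so counts and group-bys over that column still behave as they did on the original data.

```python
import csv


def anonymize_rows(rows, generators):
    """Yield copies of `rows` with the PII columns replaced by fake values.

    `generators` maps a column name to a zero-argument callable that
    returns a fake value (hypothetical interface, chosen for this sketch).
    """
    # One cache per column: real value -> fake value. Reusing the cached
    # fake keeps repeated real values consistent across rows.
    caches = {column: {} for column in generators}
    for row in rows:
        clean = dict(row)  # non-PII columns are copied unchanged
        for column, make_fake in generators.items():
            real = row[column]
            cache = caches[column]
            if real not in cache:
                cache[real] = make_fake()
            clean[column] = cache[real]
        yield clean


def anonymize_csv(source, target, generators):
    """Read CSV text from `source` and write the anonymized CSV to `target`."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in anonymize_rows(reader, generators):
        writer.writerow(row)
```

With a fake data library installed, bound methods such as `Faker().name` or `Faker().email` can be passed as the callables. Note that this sketch does not prevent two different real values from receiving the same fake value if the generator happens to repeat itself.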

Tools:

There are two third-party libraries for generating fake data with Python:

  • Faker
  • Fake Factory, also called “Faker”

Faker provides anonymization for user profile data, generating each profile entirely on a per-instance basis. Fake Factory takes a provider-based approach, loading many different fake data generators in multiple languages. Fake Factory is now deprecated but still usable.
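As a rough illustration of locale-aware generation, the sketch below uses the `Faker` package as currently published on PyPI. That is an assumption: this README does not say which package or version the script relies on. The helper name `make_generators` is hypothetical. `Faker(locale)`, `seed_instance()`, `name()`, `email()` and `address()` are part of that package's public API.

```python
def make_generators(locale="en_US", seed=None):
    """Return fake-value callables for a few common PII columns.

    Assumes the third-party `Faker` package is installed
    (pip install Faker); the import is local so that the module
    can still be loaded without it.
    """
    from faker import Faker

    fake = Faker(locale)  # the locale selects language-specific providers
    if seed is not None:
        fake.seed_instance(seed)  # reproducible output for this instance only
    return {
        "name": fake.name,
        "email": fake.email,
        "address": fake.address,
    }
```

Each value in the returned dictionary is a zero-argument callable, so `make_generators("en_US")["name"]()` produces one fake name per call. Seeding is optional and mainly useful when a run needs to be repeatable.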
