nunesgh/inep-anonymization

INEP [1] (syntactic) Anonymization

Code and attribute hierarchies used to anonymize INEP datasets with the ARX Deidentifier tool.

DOI: 10.5281/zenodo.6533684.

The resulting datasets were assessed for vulnerabilities with the BVM library (DOI: 10.5281/zenodo.6533704). The assessment results were published in: Mário S. Alvim, Natasha Fernandes, Annabelle McIver, Carroll Morgan, Gabriel H. Nunes, "Flexible and scalable privacy assessment for very large datasets, with an application to official governmental microdata" (2022, DOI: 10.48550/arXiv.2204.13734).

In each dataset, a student is identified by a unique pseudonymization code (ID_ALUNO) and may appear in several records. We kept exactly one randomly selected record per student in each dataset. The enrollment code (ID_MATRICULA) of each selected record is available in 10.5281/zenodo.6533675 (gitlab.com/nunesgh/inep-enrollment-codes).
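The selection step described above can be sketched as follows. This is a minimal illustration, not the code used to produce the published enrollment codes: the function name `select_one_record_per_student`, the fixed seed, and the in-memory list of dictionaries are all assumptions made for the example. Only the column names ID_ALUNO and ID_MATRICULA come from the datasets.

```python
import random
from collections import defaultdict


def select_one_record_per_student(records, key="ID_ALUNO", seed=0):
    """Return one randomly chosen record for each distinct value of `key`.

    Hypothetical helper: the name, the seed, and the input format are
    assumptions for this sketch, not part of the repository.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    # Pick exactly one record from each student's group of records.
    return [rng.choice(group) for group in groups.values()]


# Toy data: student "A" has two enrollments, student "B" has one.
records = [
    {"ID_ALUNO": "A", "ID_MATRICULA": "1"},
    {"ID_ALUNO": "A", "ID_MATRICULA": "2"},
    {"ID_ALUNO": "B", "ID_MATRICULA": "3"},
]
selected = select_one_record_per_student(records)
print([record["ID_MATRICULA"] for record in selected])
```

The result contains one record per distinct ID_ALUNO. Saving the ID_MATRICULA of each kept record, as done in the enrollment-codes repository, lets others reproduce the same subset without rerunning the random draw.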

ARX version

The jar files in arx/jars/ were compiled from the ARX fork made by @ramongonze, based on commit 8a936d3 and using the command ant -buildfile build.xml.

This fork allows for the creation of matrices with up to (2^31-1)^2 cells, instead of the original limit of up to 2^31-1 cells. Because this change causes some GUI errors, ARX must be run from the command line (CLI). For more information, see this issue.
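To give a sense of scale, the short calculation below compares the two limits. It is only an arithmetic sketch: the helper `fits` and the example dataset shape (50 million rows by 100 columns) are hypothetical and are not taken from the INEP datasets or from the ARX source code.

```python
# 2^31 - 1 is the largest value of a signed 32-bit integer.
INT_MAX = 2**31 - 1

ORIGINAL_LIMIT = INT_MAX       # cell limit of the original ARX, per the text above
FORK_LIMIT = INT_MAX ** 2      # cell limit of the fork, per the text above


def fits(rows, columns, limit):
    """Hypothetical helper: does a rows x columns matrix stay within `limit` cells?"""
    return rows * columns <= limit


print(ORIGINAL_LIMIT)  # 2147483647
print(FORK_LIMIT)      # 4611686014132420609

# Hypothetical dataset shape, chosen only to illustrate the difference.
rows, columns = 50_000_000, 100
print(fits(rows, columns, ORIGINAL_LIMIT))  # False
print(fits(rows, columns, FORK_LIMIT))      # True
```

A matrix of this hypothetical size has 5 billion cells, which exceeds the original limit of about 2.1 billion but is far below the limit of the fork.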

License

The Unlicense.

Footnotes

  1. The Anísio Teixeira National Institute of Educational Studies and Research.
