This repo extracts pieces of content from GDPR-like documents and turns them into well-structured data for easy processing. We measure the similarity between GDPR-like laws from different countries.

kornosk/GDPR-similarity-comparison

GDPR Similarity Comparison

This repo is part of the report "Towards Automatic Comparison of Data Privacy Documents: A Preliminary Experiment on GDPR-like Laws" 🔥

  • We extract information from GDPR-like documents written in natural language from different countries and construct well-structured data.

  • The structured data have four columns: chapter, section, article, and recital. This could benefit future work that explores GDPR-like laws using computational methods. 🚀

  • This project is inspired by COSC-824 Data Protection by Design, Department of Computer Science at Georgetown University.
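As a minimal sketch of the four-column layout described above, the snippet below checks that a CSV has the expected header before use. It relies only on the Python standard library. The helper name `check_columns` and the sample rows are hypothetical and serve only as illustration; they are not taken from the repo's data.

```python
import csv
import io

# Column names as listed in this README.
EXPECTED_COLUMNS = ["chapter", "section", "article", "recital"]

# Hypothetical sample rows for illustration; the real files live under data/.
SAMPLE_CSV = (
    "chapter,section,article,recital\n"
    "Chapter I,Section 1,Article 1,Placeholder text for the first row\n"
    "Chapter I,Section 1,Article 2,Placeholder text for the second row\n"
)


def check_columns(csv_file):
    """Return the rows of csv_file as dicts after verifying the header."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return list(reader)


rows = check_columns(io.StringIO(SAMPLE_CSV))
print(len(rows), rows[0]["article"])  # 2 Article 1
```

To check a real file, pass an open file handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`.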

Data

We convert each document from PDF to Docx and then to a well-structured CSV. Our data currently include GDPR-like documents from:

  • European Union 🇪🇺
  • Brazil 🇧🇷
  • India 🇮🇳
  • What next? 😉

Load the data into a pandas DataFrame in Python with the following code.

import pandas as pd

file_path = "data/LGPD-ES-Brazil-converted.csv"
df = pd.read_csv(file_path) # columns: ["chapter", "section", "article", "recital"]
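Once loaded, text from two documents can be compared. The sketch below is an illustration only and is not necessarily the method used in the report: it computes a plain bag-of-words cosine similarity with the standard library. The function names `tokenize` and `cosine_similarity` and the two sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Dot product over the words that appear in both texts.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # At least one text has no tokens.
    return dot / (norm_a * norm_b)


# Hypothetical sentences, not quoted from any of the documents.
text_a = "personal data shall be processed lawfully"
text_b = "personal data must be processed fairly"
score = cosine_similarity(text_a, text_b)
print(round(score, 3))  # 0.667
```

In practice the two inputs would be cells taken from the DataFrames of two different documents. A weighted scheme such as TF-IDF may suit longer texts better than raw word counts.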

Materials

Project Members

  • Kornraphop Kawintiranon - Github
  • Yaguang Liu - Github
  • Prof. Benjamin E. Ujcich (Instructor) - Personal

Citation

If you find our paper and resources useful, please consider citing our work! 🙏

@article{kawintiranon2021automatic,
    title={Towards Automatic Comparison of Data Privacy Documents: A Preliminary Experiment on GDPR-like Laws},
    author={Kawintiranon, Kornraphop and Liu, Yaguang},
    journal={arXiv preprint arXiv:2105.10117},
    year={2021},
    url={https://arxiv.org/abs/2105.10117}
}
