Updates to the PresidioSentenceGenerator #72

Open: wants to merge 41 commits into base: master

@omri374 (Contributor) commented Feb 5, 2023

Continuation of the work on PR #50 by @Robbie-Palmer

  1. Removed the redundant classes FakerSpan and FakerSpanResult and updated the code to use Span and InputSample respectively instead, as discussed in PR #50 (PresidioSentenceFaker).
  2. Changed SentenceFaker to inherit from Faker instead of using composition.
  3. Simplified the use of SentenceFaker in the default option (RecordGenerator is instantiated if records are passed, otherwise a SpanGenerator is instantiated.)
  4. Ran black on all files I touched (which caused a lot of the changes in this PR)
  5. Updated unit tests to support this change
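
To illustrate items 2 and 3, here is a minimal sketch of the shape of the change, not the code in this PR. `BaseFaker` is a hypothetical stand-in for `faker.Faker` so the snippet runs without dependencies, and the `fill`, `choose` and `sentence` methods, the `records` argument and the `{{name}}` placeholder syntax are assumptions made for the example. Only the class names SentenceFaker, RecordGenerator and SpanGenerator come from the description above.

```python
import random
from typing import Dict, List, Optional, Sequence


class BaseFaker:
    """Hypothetical stand-in for faker.Faker (assumption, not the real class)."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def choose(self, items: Sequence):
        """Pick one item at random (illustrative helper)."""
        return self._rng.choice(list(items))

    def name(self) -> str:
        return self.choose(["Ada Lovelace", "Alan Turing"])


class SpanGenerator:
    """Illustrative: fills each placeholder with an independently faked value."""

    def fill(self, faker: BaseFaker, template: str) -> str:
        return template.replace("{{name}}", faker.name())


class RecordGenerator:
    """Illustrative: fills all placeholders from one pre-loaded record."""

    def __init__(self, records: List[Dict[str, str]]) -> None:
        self.records = records

    def fill(self, faker: BaseFaker, template: str) -> str:
        record = faker.choose(self.records)
        for key, value in record.items():
            template = template.replace("{{" + key + "}}", value)
        return template


class SentenceFaker(BaseFaker):
    """Inherits from the faker class instead of wrapping an instance of it."""

    def __init__(
        self, records: Optional[List[Dict[str, str]]] = None, seed: int = 0
    ) -> None:
        super().__init__(seed)
        # Default behaviour from item 3: records passed -> RecordGenerator,
        # otherwise -> SpanGenerator.
        if records is not None:
            self.generator = RecordGenerator(records)
        else:
            self.generator = SpanGenerator()

    def sentence(self, template: str) -> str:
        return self.generator.fill(self, template)


span_faker = SentenceFaker()
record_faker = SentenceFaker(records=[{"name": "Grace Hopper"}])
print(span_faker.sentence("My name is {{name}}"))
print(record_faker.sentence("My name is {{name}}"))
```

With inheritance, a SentenceFaker can be passed anywhere the base faker is expected and its provider methods (here `name()`) are callable directly, which is the main practical difference from the composition approach.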

Robbie-Palmer and others added 30 commits August 2, 2022 10:00
Map NATIONALITY entity to NRP not to LOCATION
Change dict literals to dict constructors to improve readability
As a higher level abstraction over PresidioDataGenerator for utilising all templates and providers in this library
…fault AnalyzerEngine

Validate chosen language is available in provided AnalyzerEngine
Update PresidioDataGenerator tests to make stronger assertions about contents of results
Update PresidioFakeRecordGenerator to use ReligionProvider
…-record-generator

# Conflicts:
#	presidio_evaluator/data_generator/__init__.py
#	presidio_evaluator/data_generator/faker_extensions/providers.py
#	presidio_evaluator/data_generator/presidio_data_generator.py
Co-authored-by: melmatlis <93650751+melmatlis@users.noreply.github.com>
…ator

# Conflicts:
#	presidio_evaluator/data_generator/presidio_data_generator.py
Rename PresidioFakeRecordGenerator to PresidioSentenceFaker to distinguish it from `RecordGenerator`
Robbie-Palmer and others added 11 commits January 17, 2023 16:13
Move functions for loading data from FakeNameGenerator.com in faker format into new datasets.py module
Move logic for choosing templates out of SentenceFaker into PresidioSentenceFaker
Remove generic read file function
Add missing HospitalProvider
Update Recognizer tests to use PresidioSentenceProvider
Make single module to hold all sentence semantic dependency logic for Faker, including SentenceFaker, RecordGenerator and RecordsFaker
…aker

Rename faker_to_presidio_entity_type to ENTITY_TYPE_MAPPING
Make presidio_templates_file_path and presidio_additional_entity_providers available from package
Update Data Generation README to outline choices
…cord-generator

# Conflicts:
#	presidio_evaluator/data_generator/faker_extensions/sentences.py
@omri374 omri374 changed the base branch from feature/new-datagen-and-eval to master March 15, 2023 09:15
@omri374 omri374 changed the base branch from master to feature/new-datagen-and-eval March 15, 2023 09:15
@omri374 omri374 changed the base branch from feature/new-datagen-and-eval to master March 15, 2023 09:17
@omri374 omri374 marked this pull request as ready for review March 15, 2023 09:17