How to Use Custom SpaCy Model (beki/en_spacy_pii_distilbert) with Anonymize and Sensitive Scanners #112

rakendd · 2024-03-22T04:20:02Z

Hello llm_guard Team,

I've been exploring the use of custom models with the Anonymize and Sensitive scanners within the llm_guard library, as mentioned in the changelog for the latest release. Specifically, I'm interested in integrating the SpaCy model beki/en_spacy_pii_distilbert for PII detection tasks.

Objective
My goal is to leverage the beki/en_spacy_pii_distilbert model, which is not a traditional Hugging Face Transformer model but rather a SpaCy model, for enhanced PII detection accuracy and reduced latency as highlighted in your changelog.

Issue
I encountered difficulties when attempting to load and use this SpaCy model with the Anonymize scanner. Typically, the process for integrating models relies on specifying a model path or configuration that is compatible with Hugging Face's Transformer models. However, given that beki/en_spacy_pii_distilbert is a SpaCy model, the standard approach doesn't seem to apply.

Attempts
Here's an outline of my approach so far, based on the available documentation and examples:

Model Specification: Attempted to specify beki/en_spacy_pii_distilbert directly as a model path or through a configuration dictionary.
Custom Recognizer: Explored creating a custom recognizer to wrap the SpaCy model loading and analysis logic.
Adapter Pattern: Considered using an adapter to bridge the gap between the expected input/output formats of the llm_guard scanners and the SpaCy model.
The last approach is partly working, but I would like to know the best practice for using this model inside llm_guard:

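from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault

# CustomSpacyRecognizer and CustomRecognizerAdapter are my own wrapper classes (definitions omitted)
custom_recognizer = CustomSpacyRecognizer()
adapter = CustomRecognizerAdapter(custom_recognizer=custom_recognizer)
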
vault = Vault()
scanner = Anonymize(
    vault=vault,
    language="en",
    use_faker=True,
    custom_recognizer=adapter  # Passing the adapter as the custom recognizer
)
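
For context, the adapter idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not llm_guard's or Presidio's actual interface: `SpacyNerAdapter`, `Detection`, the `analyze` signature, the fixed `default_score`, and the label names in the usage note are all hypothetical. The only spaCy behaviour it relies on is that calling a loaded pipeline returns a document whose `ents` expose `start_char`, `end_char`, and `label_`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Detection:
    """Hypothetical result type: one detected PII span in the input text."""
    entity_type: str
    start: int
    end: int
    score: float


class SpacyNerAdapter:
    """Hypothetical adapter that turns spaCy-style entities into Detection objects.

    `nlp` is any callable that takes a string and returns an object with an
    `ents` iterable, for example a pipeline returned by `spacy.load(...)`.
    """

    def __init__(
        self,
        nlp: Callable[[str], object],
        label_map: Dict[str, str],
        default_score: float = 0.85,
    ) -> None:
        self.nlp = nlp
        # Maps the model's own labels to the entity names the scanner expects.
        self.label_map = label_map
        # spaCy NER does not expose a per-entity confidence by default,
        # so a fixed score is assumed here.
        self.default_score = default_score

    def analyze(
        self, text: str, entities: Optional[Iterable[str]] = None
    ) -> List[Detection]:
        wanted = set(entities) if entities is not None else None
        results: List[Detection] = []
        for ent in self.nlp(text).ents:
            entity_type = self.label_map.get(ent.label_)
            if entity_type is None:
                continue  # label not mapped, so it is ignored
            if wanted is not None and entity_type not in wanted:
                continue  # caller did not ask for this entity type
            results.append(
                Detection(entity_type, ent.start_char, ent.end_char, self.default_score)
            )
        return results
```

With spaCy and the model installed, this would be used along the lines of `SpacyNerAdapter(spacy.load("en_spacy_pii_distilbert"), {"PER": "PERSON"})`, where the installed package name and the `PER` label are placeholders that would need to be checked against the model itself.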

Could you provide guidance or examples on how to correctly integrate a SpaCy model like beki/en_spacy_pii_distilbert with the Anonymize and Sensitive scanners?

Thank you for developing llm_guard and for your support in enhancing its capabilities. I look forward to your advice on integrating SpaCy models for improved PII detection.

Best regards,
Rakend

asofter · 2024-03-22T08:32:15Z

Hey @rakendd, thanks for reaching out. We used to have this model but then realized that it blocked updates to the latest transformers because of its dependency on "spacy-transformers>=1.1.8,<1.2.0".

https://llm-guard.com/changelog/#030-2023-10-14

I think if this model can be updated, then we could make another custom recognizer or just use the spacy one like we did before.
