
Visual Document NER and New Healthcare Models in NLU 5.3.1!

C-K-Loan released this 30 Apr 22:34

We are excited to announce that NLU 5.3.1 has been released! It introduces Visual Document NER, which lets you extract entities from image files such as JPGs.
In addition, 5 healthcare pipelines have been added for domains such as Therapeutic Chemicals, HPO Resolvers, Voice of Patient, Oncology and Generic Clinical.
Pipelines based on TextMatcherInternal are now supported as well.


Visual NER

VisualDocumentNER is a transformer-based model for Named Entity Recognition (NER) in documents. It is the primary interface for tasks such as detecting the keys and values that make up the structure of a form, as annotated in datasets like FUNSD. These keys and values are then typically linked to each other by a FormRelationExtractor model.
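To make the idea of "linking keys to values" concrete, here is a minimal Python sketch that pairs already-detected keys with values by spatial proximity on the page. This is a deliberately simple geometric heuristic standing in for a learned relation model; it is not how FormRelationExtractor works. The `Entity` type, the `KEY`/`VALUE` labels, the `(x0, y0, x1, y1)` box format and the `link_keys_to_values` helper are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x0, y0, x1, y1), y grows downward


@dataclass(frozen=True)
class Entity:
    text: str
    label: str  # hypothetical label scheme: "KEY" or "VALUE"
    box: Box


def _center(box: Box) -> Tuple[float, float]:
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2, (y0 + y1) / 2


def link_keys_to_values(entities: List[Entity]) -> List[Tuple[str, str]]:
    """Greedy nearest-neighbour pairing: a stand-in for a learned relation model."""
    keys = [e for e in entities if e.label == "KEY"]
    free_values = [e for e in entities if e.label == "VALUE"]
    pairs = []
    # Visit keys in reading order (top to bottom, then left to right).
    for key in sorted(keys, key=lambda e: (e.box[1], e.box[0])):
        kx, ky = _center(key.box)
        # Only values that start to the right of the key or below it are candidates.
        candidates = [
            v for v in free_values
            if v.box[0] >= key.box[2] or v.box[1] >= key.box[3]
        ]
        if not candidates:
            continue
        best = min(
            candidates,
            key=lambda v: hypot(_center(v.box)[0] - kx, _center(v.box)[1] - ky),
        )
        pairs.append((key.text, best.text))
        free_values.remove(best)  # each value is linked to at most one key
    return pairs
```

For a form with a `Name:` key followed on the same line by `Jane Doe`, and a `Date:` key followed by `2024-04-30` one line below, this returns `[("Name:", "Jane Doe"), ("Date:", "2024-04-30")]`. A learned model can additionally exploit the text and layout context, which a pure distance rule cannot.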

Other VisualDocumentNER models, however, are trained to treat entities in isolation. Such entities might be names, places or medications, and the goal is to use each of them on its own rather than connect it to other entities.
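Token-level NER output usually has to be merged into entity spans before the entities can be used individually. The sketch below assumes the common BIO tagging scheme and uses hypothetical `NAME` and `MEDICATION` labels; the actual tag format and label set depend on the specific model you load, and `group_bio` is an illustrative helper, not part of the NLU API.

```python
from typing import List, Optional, Tuple


def group_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge token-level BIO tags into (entity_text, entity_type) spans."""
    entities: List[Tuple[str, str]] = []
    words: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the currently open entity, if any.
        if words:
            entities.append((" ".join(words), current_type))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            flush()
            words, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            words.append(token)  # continuation of the open entity
        else:  # "O" (or any non-BIO tag) ends the open entity
            flush()
            words, current_type = [], None
    flush()
    return entities
```

Each resulting span can then be consumed on its own, for example collecting only medications with `[text for text, kind in group_bio(tokens, tags) if kind == "MEDICATION"]`.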

Powered by Spark OCR's VisualDocumentNER


New Healthcare Models

| NLU ref | Model |
|---|---|
| `en.resolve.atc_pipeline` | `atc_resolver_pipeline` |
| `en.map_entity.hpo_resolver_pipe` | `hpo_resolver_pipeline` |
| `en.explain_doc.pipeline_vop` | `explain_clinical_doc_vop` |
| `en.explain_doc.clinical_generic.pipeline` | `explain_clinical_doc_generic` |
| `en.explain_doc.clinical_oncology.pipeline` | `explain_clinical_doc_oncology` |
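Each NLU reference in the table is the string you pass to `nlu.load`. The following sketch keeps the five references in a dictionary and validates a reference before loading it. The `HEALTHCARE_PIPELINES_531` name and the `load_healthcare_pipeline` helper are hypothetical; the sketch also assumes your licensed John Snow Labs session has already been set up (NLU provides `nlu.auth` for this), since these are healthcare models.

```python
# NLU references for the healthcare pipelines added in 5.3.1, mapped to model names.
HEALTHCARE_PIPELINES_531 = {
    "en.resolve.atc_pipeline": "atc_resolver_pipeline",
    "en.map_entity.hpo_resolver_pipe": "hpo_resolver_pipeline",
    "en.explain_doc.pipeline_vop": "explain_clinical_doc_vop",
    "en.explain_doc.clinical_generic.pipeline": "explain_clinical_doc_generic",
    "en.explain_doc.clinical_oncology.pipeline": "explain_clinical_doc_oncology",
}


def load_healthcare_pipeline(ref: str):
    """Load one of the new healthcare pipelines by its NLU reference (hypothetical helper)."""
    if ref not in HEALTHCARE_PIPELINES_531:
        raise KeyError(f"{ref!r} is not one of the pipelines added in NLU 5.3.1")
    # Imported lazily so the validation above works even without NLU installed.
    import nlu

    # Assumes licensed credentials were already provided, e.g. via nlu.auth(...).
    return nlu.load(ref)
```

Usage, once authenticated: `load_healthcare_pipeline("en.explain_doc.clinical_oncology.pipeline").predict("Patient was diagnosed with breast cancer.")`, which returns the predictions as a pandas DataFrame.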

New Medium Articles

Tutorials on how to leverage Visual NLP's table extraction and Visual NER in one line of code and with custom pipelines:


📖 Additional NLU resources


Installation

```bash
# PyPI
pip install nlu pyspark
```