Using only builtin Gazetteer and Grammar entity recognition without intent classification. #887

JECasillas · 2020-08-27T19:02:38Z

I would like to extract only entities (specifically the Spanish gazetteer and grammar entities) from arbitrary text, using Python.

As far as I know, snips_nlu can only extract entities from phrases that belong to an intent it has been trained on. I would like to extract entities from any text I'm interested in, not only from texts I have trained on or want to classify with an intent.
Does entity extraction need the contextual information from the training phrases to recognize entities in a text? Or can I bypass the intent recognition that Snips offers and use only the builtin entities, without having to specify and locate them in intent training phrases?

adrienballsonos · 2020-08-28T07:54:52Z

Hello @JECasillas,
This is not documented, but you can use the builtin entity parser of snips_nlu independently of the rest of the library:

from snips_nlu.entity_parser import BuiltinEntityParser

parser = BuiltinEntityParser.build(language="es")
parser.parse("me despierto a las siete y media")  # [{'value': 'a las siete y media', 'resolved_value': {'kind': 'InstantTime', 'value': '2020-08-28 19:30:00 +02:00', 'grain': 'Minute', 'precision': 'Exact'}, 'entity_kind': 'snips/datetime', 'range': {'start': 13, 'end': 32}}]

# With gazetteer entities
parser = BuiltinEntityParser.build(language="es", gazetteer_entity_scope=["snips/city"])
parser.parse("mi vuelo a madrid es al mediodía")  # [{'value': 'mediodía', 'resolved_value': {'kind': 'InstantTime', 'value': '2020-08-28 12:00:00 +02:00', 'grain': 'Hour', 'precision': 'Exact'}, 'entity_kind': 'snips/datetime', 'range': {'start': 24, 'end': 32}}, {'value': 'madrid', 'resolved_value': {'kind': 'City', 'value': 'Madrid'}, 'entity_kind': 'snips/city', 'range': {'start': 11, 'end': 17}}]
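
Since the parser returns a plain list of dicts, post-processing needs nothing from snips_nlu itself. The sketch below copies the second output above as literal data and groups the matched values by `entity_kind`; the helper name `group_by_kind` is made up for illustration and is not part of the library.

```python
from collections import defaultdict

# Output of the second parser.parse(...) call above, copied as literal data.
text = "mi vuelo a madrid es al mediodía"
entities = [
    {"value": "mediodía",
     "resolved_value": {"kind": "InstantTime", "value": "2020-08-28 12:00:00 +02:00",
                        "grain": "Hour", "precision": "Exact"},
     "entity_kind": "snips/datetime",
     "range": {"start": 24, "end": 32}},
    {"value": "madrid",
     "resolved_value": {"kind": "City", "value": "Madrid"},
     "entity_kind": "snips/city",
     "range": {"start": 11, "end": 17}},
]


def group_by_kind(entities):
    """Map each entity kind to the list of matched substrings (hypothetical helper)."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["entity_kind"]].append(entity["value"])
    return dict(grouped)


print(group_by_kind(entities))
```

The `range` values are character offsets into the input string, so `text[11:17]` gives back `madrid`.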

I hope this will help.

adrienballsonos · 2020-08-28T07:57:25Z

An update on this: the builtin entity parser was designed for short sentences (fewer than 20 words). It will work on longer sentences, but processing may take some time, in which case you should split the text into smaller chunks before passing it to the parser.
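
One possible way to do the splitting is sketched below. It is only an illustration: `iter_chunks` and `parse_long_text` are hypothetical helper names, the 20-word limit is taken from the remark above, and the parser is passed in as a plain callable so the chunking logic does not depend on snips_nlu. It assumes each result carries a `range` dict with character offsets, as in the outputs shown earlier in this thread.

```python
import re


def iter_chunks(text, max_words=20):
    """Yield (offset, chunk) pairs; each chunk holds at most max_words words."""
    words = list(re.finditer(r"\S+", text))
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        start, end = group[0].start(), group[-1].end()
        # offset is the position of the chunk inside the original text
        yield start, text[start:end]


def parse_long_text(text, parse, max_words=20):
    """Run parse() on each chunk and shift entity ranges back to the full text."""
    entities = []
    for offset, chunk in iter_chunks(text, max_words):
        for entity in parse(chunk):
            shifted = dict(entity)
            shifted["range"] = {
                "start": entity["range"]["start"] + offset,
                "end": entity["range"]["end"] + offset,
            }
            entities.append(shifted)
    return entities


# Intended usage with the parser from the previous comment (needs snips_nlu):
#   parser = BuiltinEntityParser.build(language="es")
#   entities = parse_long_text(long_text, parser.parse)
```

A fixed word count can cut an entity that straddles a chunk boundary (for example a multi-word date), so splitting on sentence boundaries first is probably safer where the text allows it.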

JECasillas · 2020-08-28T23:45:17Z

This seems like just the answer I was looking for.
Fortunately, the texts in this case are typically not that long, although a single paragraph can easily have more than 20 words.
I'll benchmark it and will probably implement something like what you suggest: splitting the text into groups of 20 words and extracting entities from each group separately.

I'll let you know if I run into any other issues.

JECasillas added the question label Aug 27, 2020
