Using only builtin Gazetteer and Grammar entity recognition without intent classification. #887

Open
JECasillas opened this issue Aug 27, 2020 · 3 comments
@JECasillas

I would like to extract entities only (specifically the Spanish gazetteer and grammar entities) from arbitrary text, using Python.

As far as I know, snips_nlu can only extract entities from phrases that belong to an intent it has been trained on. I would like to extract entities from any text I'm interested in, not just texts I've trained on or want to classify with an intent.
I was wondering whether entity extraction needs the contextual information from training phrases to recognize entities in a text, or whether I can bypass the intent recognition that snips offers and use only the builtin entities, without having to specify and locate them in intent training phrases.

@adrienballsonos

Hello @JECasillas,
It is not documented, but you can use the builtin entity parser of snips_nlu independently of the rest:

```python
from snips_nlu.entity_parser import BuiltinEntityParser

parser = BuiltinEntityParser.build(language="es")
parser.parse("me despierto a las siete y media")
# [{'value': 'a las siete y media',
#   'resolved_value': {'kind': 'InstantTime', 'value': '2020-08-28 19:30:00 +02:00',
#                      'grain': 'Minute', 'precision': 'Exact'},
#   'entity_kind': 'snips/datetime',
#   'range': {'start': 13, 'end': 32}}]

# With gazetteer entities
parser = BuiltinEntityParser.build(language="es", gazetteer_entity_scope=["snips/city"])
parser.parse("mi vuelo a madrid es al mediodía")
# [{'value': 'mediodía',
#   'resolved_value': {'kind': 'InstantTime', 'value': '2020-08-28 12:00:00 +02:00',
#                      'grain': 'Hour', 'precision': 'Exact'},
#   'entity_kind': 'snips/datetime',
#   'range': {'start': 24, 'end': 32}},
#  {'value': 'madrid',
#   'resolved_value': {'kind': 'City', 'value': 'Madrid'},
#   'entity_kind': 'snips/city',
#   'range': {'start': 11, 'end': 17}}]
```

I hope this will help.
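
If it is useful, the list returned by `parse` can be regrouped by entity kind with a few lines of plain Python. This is only a sketch: `group_by_kind` is a hypothetical helper, not part of snips_nlu, and it assumes each result has the shape shown in the outputs above. The fallback to the matched text when `resolved_value` has no `value` key is also an assumption.

```python
from typing import Any, Dict, List


def group_by_kind(entities: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Group parser results by 'entity_kind' (hypothetical helper).

    Uses resolved_value['value'] when present and falls back to the
    matched text otherwise (assumption: not every kind has that key).
    """
    grouped: Dict[str, List[Any]] = {}
    for entity in entities:
        resolved = entity.get("resolved_value", {}).get("value", entity["value"])
        grouped.setdefault(entity["entity_kind"], []).append(resolved)
    return grouped


# Sample copied from the second parser output in this thread.
sample = [
    {"value": "mediodía",
     "resolved_value": {"kind": "InstantTime", "value": "2020-08-28 12:00:00 +02:00",
                        "grain": "Hour", "precision": "Exact"},
     "entity_kind": "snips/datetime",
     "range": {"start": 24, "end": 32}},
    {"value": "madrid",
     "resolved_value": {"kind": "City", "value": "Madrid"},
     "entity_kind": "snips/city",
     "range": {"start": 11, "end": 17}},
]

print(group_by_kind(sample))
# {'snips/datetime': ['2020-08-28 12:00:00 +02:00'], 'snips/city': ['Madrid']}
```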

@adrienballsonos

An update on this: the builtin entity parser was designed for short sentences (< 20 words). It will work on longer sentences, but it may take some time to process them, in which case you should try splitting the text into smaller chunks before passing it to the parser.
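
One possible way to do the splitting is sketched below. `chunk_by_words` and `parse_long_text` are hypothetical helpers, not part of snips_nlu. The sketch assumes the `range` values are character offsets into the parsed string, as the outputs above suggest, and shifts them back to positions in the full text. Note that a fixed word count can cut an entity in half at a chunk boundary, so splitting on sentence boundaries may give better results.

```python
import re
from typing import Any, Callable, Dict, Iterator, List, Tuple

Entity = Dict[str, Any]


def chunk_by_words(text: str, max_words: int = 20) -> Iterator[Tuple[int, str]]:
    """Yield (offset, chunk) pairs, each chunk holding at most max_words words.

    offset is the character position of the chunk in the original text.
    """
    words = list(re.finditer(r"\S+", text))
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        start, end = group[0].start(), group[-1].end()
        yield start, text[start:end]


def parse_long_text(
    text: str,
    parse: Callable[[str], List[Entity]],
    max_words: int = 20,
) -> List[Entity]:
    """Run parse on each chunk and shift entity ranges to full-text offsets."""
    entities: List[Entity] = []
    for offset, chunk in chunk_by_words(text, max_words):
        for entity in parse(chunk):
            shifted = dict(entity)  # shallow copy, leave the original untouched
            shifted["range"] = {
                "start": entity["range"]["start"] + offset,
                "end": entity["range"]["end"] + offset,
            }
            entities.append(shifted)
    return entities
```

With the parser built earlier in this thread, this would be called as `parse_long_text(long_text, parser.parse)`.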

@JECasillas

This seems like just the answer I was looking for.
Fortunately, the texts in this case are typically not that long, although a single paragraph can easily have more than 20 words.
I'll benchmark it and will probably implement something like what you suggest: splitting the text into groups of 20 words and extracting entities from each group separately.

I'll let you know if I run into any other issues.
