Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Rasa Entities Roles and Groups #48

Open
tomgun132 opened this issue Aug 26, 2020 · 7 comments
Open

Support for Rasa Entities Roles and Groups #48

tomgun132 opened this issue Aug 26, 2020 · 7 comments
Labels
improvement Improvement of an existing feature question Interrogation on the functioning of the program

Comments

@tomgun132
Copy link

tomgun132 commented Aug 26, 2020

Hello, thanks for your work. I've been using chatette until recently.

I want to know whether you have plan to support the new Rasa's entities roles and groups since it's quite important to have with the recent Rasa.
If you don't plan to do it in short term, I might be able to look at it, but the parser section of the code is quite difficult to understand for me compared to the adapter part.

I've also created a new yml adapter in preparation for Rasa 2.0 on my forked repo.

Thanks!

@SimGus
Copy link
Owner

SimGus commented Aug 26, 2020

Hi :)

Thanks for your kind message

I don't currently have plans to support the entities roles and groups in the near future, as there are other things of higher priority on my todo list and I don't have as much time to work on this as I used to.
I quite agree that the parser code is quite hard to understand, I plan to improve it when I have more time.

I hope this isn't too much of an issue for you.
Feel free to implement it, I'd be happy to answer any questions you have about my code if you decide to do this :)

I'll take a look at your yaml adapter on your repo as well.

Have a good day!

@SimGus SimGus added improvement Improvement of an existing feature question Interrogation on the functioning of the program labels Aug 26, 2020
@tomgun132
Copy link
Author

tomgun132 commented Aug 27, 2020

Hi, thank you for your reply.

If I have time, I will take a look into it. Can you give me suggestion on what kind of symbol that can be used for entities roles and groups? Also, which part of the code that I should start looking into if I want to add this functionality?

Cheers!

@SimGus
Copy link
Owner

SimGus commented Sep 4, 2020

Hi!

Sorry about the late answer, I've been extremely busy for the past few days.

As entity roles and groups are related to entities, I'd say we could put the information about roles and groups within parentheses placed after the entity (e.g. @[entity](HERE)). Obviously, that's just a suggestion, if you have a better idea, propose away :)

Note that we can always change symbols before we merge changes.

To implement this kind of feature, you'll need to do 3 things:

  • Add parsing abilities for this new syntax. This is done by implementing (and calling) a parsing rule. I'd suggest starting by looking at the parser which opens the file(s), calls the different parsing rules and create the abstract syntax tree.
  • Store this information so that the output files can be correctly created. From what I can see, it would make sense to store the information about roles and groups in the SlotDefinition class.
  • Adapt all adapters to be able to use (or discard) this new information and create the different output files.

This is quite a big task that requires a quite deep understanding of the current code, so feel free to ask for help if you try this out!

Sorry again for the late answer.
Thanks for your enthusiasm!

@tomgun132
Copy link
Author

tomgun132 commented Mar 9, 2021

Hi, I've been busy so I just started looking into this improvement recently. I think I've found on a way on how to do it, but I need advice on adding the entity roles/groups to the Rule class. Before that I want to comment on something on what you said:

As entity roles and groups are related to entities, I'd say we could put the information about roles and groups within parentheses placed after the entity (e.g. @entity). Obviously, that's just a suggestion, if you have a better idea, propose away :)
Store this information so that the output files can be correctly created. From what I can see, it would make sense to store the information about roles and groups in the SlotDefinition class.

I think the entity role and group cannot be stored inside SlotDefinition class since it should be defined inside the intent's sentence pattern. For example, based on what Rasa's documentation about role and group "- I want to fly from [Berlin]{"entity": "city", "role": "departure"} to [San Francisco]{"entity": "city", "role": "destination"}.", both entities are city, but has different role based on the position so I don't think we can define role in SlotDefinition.

Based on above, I'm thinking of adding role/group pattern inside intent's sentence pattern, for example:

%[askFlight]
        I want to fly from @[place]("role": "departure") to @[place]("role": "destination")

As you can see, there can be an annotation pattern right after entity reference pattern. I've added this to the rule in my repository right here. As a result of adding it outside the entity reference, I will need to change something around here to separate the slot_ref identifier from alias_ref and intent_ref and add a function to detect whether there is an annotation after entity reference. There are some things that I'm still not sure about:

  • For the pattern, which one is better, the pattern I wrote above or to make the annotation inside entity ref, e.g. @[place("role": "departure)]?
  • Should the rule raise ValueError if there is a syntax error in matching annotation pattern?
  • I think the role/group information should be put inside UnitReference class and it should generate a SlotUnit class with the role/group information. However, I'm still looking for a way on where to do these processes. Do you have any advice on this? What I have in mind is to create a new SlotReference class that inherit from UnitReference, then override _generate_random_strategy and _generate_all_strategy functions.
  • as for implementing to other adapter, I'm not sure if markdown should be supported since Rasa has said that they have deprecated markdown format. But implementing adapter is quite easy compared to adding new pattern rules so I can do that. Let me know your thought on this.

Hope to hear something from you soon!
Cheers!

@SimGus
Copy link
Owner

SimGus commented Mar 16, 2021

Hello,

Sorry for the late answer. I'm glad you are interested in implementing this!

About the fact that roles and groups cannot be stored in SlotDefinition directly, you are right, I didn't think this through. It does make more sense to store this in the related intent definition.

We could even add a way to declare which values for roles and groups are expected for a given intent (e.g. the role destination makes sense for intent bookFlight but not intent greet). However, I would say it's not necessary in a first implementation: all roles/groups could be accepted for any slot inside any intent for now. Therefore I'm not sure it's even useful to have anything stored in the intent definition at all.

About where you want to make changes (and the code you already wrote), I think you are on the right track. Be careful when modifying the parser though, it is a central and complicated part of the system and is easy to get wrong: just be sure that the unit tests are still working after you made your changes.

For the pattern, I'd go for @[slot]("role": "xyz") rather than @[slot("role": "xyz")] as this is already what was done for intent annotations. That way, the annotation syntax is similar, making it easier for the user to remember. Plus, it might allow us to reuse more of the existing annotation code that way.

For the error to be raised when there is a syntax error, I'd say a SyntaxError is the best solution here. It is what I did for other cases like those (see here for example).

It definitely sounds good to me to make a new subclass of UnitReference to store role/group information. Putting it directly in UnitReference wouldn't make much sense and would make code more complicated.
SlotReference sounds like a very good name for it. It should indeed override both methods _generate_random_strategy and _generate_all_strategy. From what I can see, none of the other methods need to be overridden.

About Rasa's markdown format, I actually didn't know it was deprecated (I didn't check in a few months). I'd prefer to continue supporting it until it actually gets removed from Rasa's codebase entirely, so I think this should be implemented for the relevant adapter as well. That being said, I'm not sure Rasa itself supports roles/groups in the markdown format (I can't find anything about this in their documentation). If it turns out they don't, then of course we don't need to modify this adapter.

That means the adapters that should be changed are:

  • the rasa (json) adapter
  • the jsonl adapter
  • the rasa markdown adapter (only if this feature is supported by Rasa)

I hope this clears it up for you and you are still enthusiastic about implementing this.
Feel free to ask any more question you would have!

@tomgun132
Copy link
Author

Hello,

I think I have almost finished the implementations of adding role/group to Chatette. As you said, I'm going with the pattern @[slot]("role": "xyz") because of the same reason you mentioned. However, for adding error to be raised, I think this is a bit tricky because I made the logic in this part to add role/group annotation if the pattern matches with RuleAnnotation class, otherwise it will be added as a normal word token. Therefore, if the annotation pattern is wrong or incomplete, Parser will add it as a Word rule. I think it's better to add a warning instead of a SyntaxError in the rule_unit_ref.py. What do you think about this?

I've also created a new subclass of UnitReference called SlotRoleGroupReference since Slot reference without role/group will still be handled by UnitReference after all. To call this new subclass, I added a new logic inside the Parser class to separate the TerminalType.slot_ref_end from other unit reference type. If slot end token detected, it will then check whether there is an annotation or not after the slot. If there is an annotation, then the role/group annotation will be added to current builder, which's supposed to be UnitRefBuilder. I edited the function create_concrete inside UnitRefBuilder to return the newly created subclass SlotRoleGroupReference if it has the role/group annotation inside the class.

As for the Adapter, rasa (json) and jsonl adapter are automatically implemented after I added role and/or group into as_dict function in the Entity unit class. As for markdown and also the new yaml format adapter, you can see it here:

If you notice, I also added the inline entity value (as requested here) for entities with synonyms for both markdown and yaml. The rasa_role_group branch in the forked repository has passed the current unit tests. However, I think I should add new test for this implementation.

There are few things that I still need to do:

  • complete the yaml adapter by implementing _get_base_to_extend and check_base_file_contents functions
  • added ruamel.yaml to the requirements when installing. This is my first time doing this, so do I just need to add the new dependency to setup.py?
  • add new unit tests for this implementation(?)

If you have any more suggestions, please tell me.

Cheers!

@SimGus
Copy link
Owner

SimGus commented Mar 30, 2021

Hey!

Props to you for implementing most of this so fast!

About the fact that you can't raise an error if the annotation syntax is not correct, you are actually right.
I would have suggested not saying anything in that case to avoid cluttering the output, but it makes sense to write a warning since it will be useful to the user in a majority of cases.

That being said, I don't see how we can easily detect that what is after a slot is an incorrect annotation. For instance, I assume those examples should raise a warning:

@[slot]('problem': 'not closed'
@[slot]"problem': 'not opened')
@[slot](incorrect annotation contents)

but this shouldn't

@[slot]some word

Did you have something in mind to avoid raising a warning for every slot that doesn't have an annotation?

As I said above, if it turns out it's hard to detect that the user mistyped an annotation, it seems acceptable to me to not print any warning whatsoever.

The way you made the changes to the parser is exactly how I would have done it :)
About the adapters, it looks very good, but if I can make one remark: chatette is still supporting older versions of python (notably version 2.7 and 3.4 -- I think I'll drop support for both of them soon, but I first need to deprecate them for a few versions). For that reason, we cannot use f-strings in the code yet (even though I'd really like to). It's usually not too hard to replace them with str.format().

Thanks for adding support for inline entities at the same time. If you think adding a few tests can help catching errors quickly, feel free to do it. I'm not aiming at 100% coverage, just enough coverage to catch big mistakes when refactoring.
Btw, thanks for making the yaml adapter as well :)

If you can, it would be nice to add a unit test that checks the parser is correctly parsing annotations for slots. Maybe a few tests for the new SlotRoleGroupReference could be valuable as well.

For the new dependency, you will need to add it in setup.py (this allows pip to install the dependencies automatically) and in requirements/common.txt (this allows to manually install the dependencies, for instance when setting up a venv). After that, everything should be working.

When you are done with the implementation, can you please open a pull request so I can review it more in depth and merge it?

Cheers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement of an existing feature question Interrogation on the functioning of the program
Projects
None yet
Development

No branches or pull requests

2 participants