This repository has been archived by the owner on Jan 10, 2023. It is now read-only.

SLING with many frames on a small dataset #446

Open
JoshuaMathias opened this issue Aug 26, 2020 · 3 comments

@JoshuaMathias

These questions are about using SLING on a small data set.

Motivation for using SLING: SLING appears to be unique in providing a good framework for specifying and training on arbitrary entity relationships that originate from text, while still allowing entities that are not directly tied to specific tokens.

However, on limited data, my confidence in SLING is based on the following assumptions:

  1. Specifying many entities (frames) based on logical relationships will aid training (a different kind of feature engineering). Identifying some entities and their relationships is complex, while other aspects are purely logic-based. If the logical relationships and hierarchies are specified explicitly, the model only needs to learn what isn't specified, provided it can handle a large number of entities with a small dataset. The main risk to avoid would then be overfitting, but if I understand correctly, specifying more related entities to keep things in check can help prevent overfitting.
  2. Using hierarchical frames helps leverage the training data. For example, some features (in the text) are consistent across all sub-frames of a parent frame, while others are unique to individual sub-frames. Will the SLING model take advantage of this in the training data? My intuition is yes, since once it identifies the parent, it knows the children must meet the parent's requirements. (A rough sketch of the kind of hierarchy I mean follows this list.)
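
To make the hierarchy question concrete, here is a minimal sketch using the SLING Python API and frame text notation as I understand them from the docs; the /my/... ids and role names are placeholders I made up, not real SLING schemas:

```python
import sling

store = sling.Store()

# A generic parent type and a more specific sub-type that builds on it.
# All ids and role names here are made-up placeholders.
parent = store.parse('{=/my/event name: "event"}')
child = store.parse('''{
  =/my/purchase
  name: "purchase"
  isa: /my/event
}''')

# A frame evoked from text would be typed with the sub-frame and, via the
# isa chain, should also satisfy whatever the parent type implies.
instance = store.parse('{isa: /my/purchase /my/buyer: "Alice"}')
print(instance["isa"].name)   # prints: purchase
```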

I'd appreciate any insights you have or corrections to my understanding. Thank you.

@ringgaard
Contributor

ringgaard commented Aug 27, 2020

Like other deep models, the SLING parser is "data hungry" to train. It needs a fair amount of training data, and more data is better!

We are working on a silver annotation pipeline, which can take a Wikipedia dump and generate synthetic training data using the Wikidata knowledge base and a bunch of heuristics.
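
For reference, reading such a silver corpus with the Python API should look roughly like this; the file name is a placeholder and the accessors (Store, DocumentSchema, RecordReader, Document) follow the Python API guide, so treat it as a sketch rather than a tested recipe:

```python
import sling

# Global store with the document schema.
commons = sling.Store()
docschema = sling.DocumentSchema(commons)
commons.freeze()

# A silver corpus is a set of SLING documents stored in a record file
# ("silver-corpus.rec" is a placeholder name).
for key, value in sling.RecordReader("silver-corpus.rec"):
  store = sling.Store(commons)
  doc = sling.Document(frame=store.parse(value), schema=docschema)
  for mention in doc.mentions:
    phrase = doc.phrase(mention.begin, mention.end)
    for frame in mention.evokes():
      print(phrase, frame)
```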

PS: Please notice that the SLING project has moved to https://github.com/ringgaard/sling.

@JoshuaMathias
Author

JoshuaMathias commented Sep 5, 2020

Thanks for the pointers.

  1. On the silver annotation pipeline: the idea is to take natural text from Wikipedia articles and annotate it automatically using the Wikidata information that corresponds to each article (that is, to guess which parts of the text correspond to the knowledge already captured in Wikidata)?
  2. I assume the use of heuristics in the silver annotation pipeline is independent of SLING training. Do you have any pointers on adding support for custom logical relationships among frames? For example: frame 1 requires at least a certain number of instances of a specific role, or frame 1 requires a frame 2 to be present in the same sentence.
  3. Suppose there is a much larger dataset that includes some of the frames, and a smaller dataset that includes all of them. If the frames missing from the larger dataset build upon the frames it does include, is it reasonable to expect more effective identification of those missing frames, or would the model need some customization because the large dataset lacks them?

@ringgaard
Contributor

The current parser is just trained with supervised learning and a cross-entropy loss. While the silver data pipeline makes use of the knowledge base, the parser training does not (yet) use it. The silver annotations are noisy, especially when it comes to recall: many facts mentioned in Wikipedia are not known in Wikidata, so the ROLE score is not very high when measured on the silver data.

However, the silver parser is only meant to be the first step in the parser training. We are working on adding support for reinforcement learning (RL) to the parser trainer. After the parser has been trained on the silver data, the next phase is to use RL to "fine-tune" it. The silver parser is then used for sampling during the RL training. The hope is that the silver model will assign some probability mass to the correct (unknown) golden annotations and that the correct annotations will get a higher reward than the silver ones. We have a plausibility model that we hope can be used as part of the reward function.

I am not sure I understand what you mean by "custom logical relationships among frames". You can add your own annotators to the silver annotation pipeline, which can add or modify the annotations.
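
To illustrate the kind of check such a custom annotator could enforce, here is a rough sketch over an already parsed document (this is not the actual annotator interface; the /my/... type and role ids are made up):

```python
def satisfies_constraint(doc):
  # Hypothetical constraint: every frame typed /my/purchase that is evoked
  # in the document must have at least one /my/buyer fill. The doc argument
  # is a sling.Document; the ids are placeholders, not real SLING schemas.
  for mention in doc.mentions:
    for frame in mention.evokes():
      ftype = frame["isa"]
      if ftype is not None and ftype.id == "/my/purchase":
        has_buyer = any(role.id == "/my/buyer" for role, value in frame)
        if not has_buyer:
          return False
  return True
```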

You could try running the training in two phases: first the silver training, and then training on your own dataset. It might work, because the silver training will pretrain the contextual embeddings, but you will have to run the experiment to see how well it works. There is an option to start training from an existing model that you can use for this.

One thing to be aware of is that you will have to add all the roles you need. The role set is currently either determined from the training data (caspar) or a fixed set (knolex) that is hard-coded.
