Improvement #37
It's great that you tried this! I don't really know, to be honest. Perhaps you could tell me more about the domain you are looking at (which dataset?). Have you observed a specific problem that motivated the addition of these features?
Since the current features of the vectors are independent of the context, I tried to add some context-sensitive features. Currently I am working with the RSS dataset to train the model (although I have tried the merged_RSS_istex dataset as well).
We don't have a specific domain; we would like this to work on any content.
Is there any suggestion regarding which dataset might be more helpful for our goal?
At the moment there is still some dependency on the context (as we discussed before here); that was designed to "replace" context-sensitive features, in a sense. But it is entirely possible that directly adding context-sensitive features helps too! If you want to improve the performance of the heuristics, I would do as follows:
For me this process is very much example-driven: I design the features with specific examples in mind.
Yeah, the goal of the adjacency matrix in your work is to make it context-sensitive.
Hi Antonin,
I am working on OpenTapioca to improve its accuracy to some extent in order to apply it in our project.
I tried adding other features to the current feature vectors:
connection_count: connection_count(tag_i) = sum_j |tag_i.edges ∩ hrtag_j.edges| / |hrtag_j.edges|, where hrtag_j is the highest-ranked tag among the detected tags for phrase j.
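A minimal sketch of this feature in Python, assuming each candidate tag's graph edges are available as a set of neighbouring entity ids (the names `tag_edges` and `hrtag_edges_per_phrase` are illustrative, not OpenTapioca's API):

```python
def connection_count(tag_edges, hrtag_edges_per_phrase):
    """Sum over phrases j of |tag.edges ∩ hrtag_j.edges| / |hrtag_j.edges|,
    where hrtag_j is the highest-ranked candidate for phrase j."""
    total = 0.0
    for hr_edges in hrtag_edges_per_phrase:
        if hr_edges:  # skip isolated tags to avoid division by zero
            total += len(tag_edges & hr_edges) / len(hr_edges)
    return total

# Candidate shares 2 of hrtag_1's 4 edges and 1 of hrtag_2's 2 edges
tag = {"Q1", "Q2", "Q3"}
hrtags = [{"Q1", "Q2", "Q7", "Q8"}, {"Q3", "Q9"}]
print(connection_count(tag, hrtags))  # 2/4 + 1/2 = 1.0
```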
hop_count: hop_count(tag_i) = sum_j (1 − |tag_i.edges ∩ tag_j.edges| / |tag_i.edges ∪ tag_j.edges|), where tag_j ranges over the detected tags for every phrase in the input sentence.
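This is a sum of Jaccard distances between edge sets. A small sketch under the same illustrative representation as above:

```python
def hop_count(tag_edges, other_tag_edges_list):
    """Sum over detected tags j of (1 - Jaccard(tag.edges, tag_j.edges))."""
    total = 0.0
    for other in other_tag_edges_list:
        union = tag_edges | other
        if union:  # two empty edge sets contribute nothing
            total += 1 - len(tag_edges & other) / len(union)
    return total

# Identical edge sets contribute 0; disjoint ones contribute 1
print(hop_count({"Q1", "Q2"}, [{"Q1", "Q2"}, {"Q3"}]))  # 0 + 1 = 1.0
```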
cosine_similarity: applying S-BERT to generate embeddings of the candidate tags' descriptions and of the input sentence, then using the cosine similarity between the resulting vectors.
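In practice the embeddings would come from an S-BERT model (e.g. the `sentence-transformers` package); the sketch below only shows the cosine step, with toy vectors standing in for real embeddings:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy stand-ins for the sentence and candidate-description embeddings
sentence_vec = [1.0, 0.0, 1.0]
candidate_desc_vec = [1.0, 1.0, 0.0]
print(cosine_similarity(sentence_vec, candidate_desc_vec))  # 0.5
```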
I also used an XGBoost ranker (learning to rank) instead of the SVM classifier.
None of the above changes improved the F1 score.
Do you have any suggestions for me?