Create a function to compound tokens with skips #2104
Comments
It would be more useful and easy to implement
Here, all the unigrams are removed and skipgrams are created within the window.
That seems to me like a very good way to implement it. Things to think about:
I have no good idea how to remove the original tokens in the first example, so I will leave this branch for the moment.
Any rule: if a token is used in any compound, remove it (this is how it works now, without a skip argument).
All rule: if a token is used in all compounds, remove it. This only applies if length(skip) > 1.
The first already works when skip is not used.
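To make the difference between the two rules concrete, here is a minimal, language-agnostic sketch in Python. It is not quanteda code and not the implementation on the branch: the function name `compound_with_skip`, its parameters, and the placement of compounds at the position of their first token are all assumptions made for illustration. It also assumes a literal reading of the "all" rule, namely that a token is dropped only if it takes part in every compound formed in the document.

```python
def compound_with_skip(tokens, pattern, skip=(0,), rule="any", concatenator="_"):
    """Compound a two-token pattern, allowing skipped tokens in between.

    Hypothetical helper for illustration only; not part of quanteda.
    """
    first, second = pattern
    skips = sorted(set(skip))

    # Each match is a pair of positions (i, j) with k tokens between them.
    matches = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        for k in skips:
            j = i + 1 + k
            if j < len(tokens) and tokens[j] == second:
                matches.append((i, j))
    if not matches:
        return list(tokens)

    # Count in how many compounds each original position is used.
    used = {}
    for match in matches:
        for pos in match:
            used[pos] = used.get(pos, 0) + 1

    if rule == "all" and len(skips) > 1:
        # Assumed reading: drop a token only if every compound uses it.
        drop = {pos for pos, n in used.items() if n == len(matches)}
    else:
        # "Any" rule, also the fallback when only one skip value is given.
        drop = set(used)

    # Emit compounds at the position of their first token (an assumption).
    starts = {}
    for i, j in matches:
        starts.setdefault(i, []).append(concatenator.join((tokens[i], tokens[j])))

    out = []
    for pos, tok in enumerate(tokens):
        out.extend(starts.get(pos, []))
        if pos not in drop:
            out.append(tok)
    return out


toks = ["a", "b", "b"]
print(compound_with_skip(toks, ("a", "b"), skip=(0, 1), rule="any"))
print(compound_with_skip(toks, ("a", "b"), skip=(0, 1), rule="all"))
```

Under the "any" rule, both `b` tokens disappear because each is used in one compound. Under the "all" rule as read here, only `a` is removed, since it is the only token shared by both compounds. Other readings of "all compounds" are possible and would change which tokens survive.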
Following the discussion on #2102, I created the dev-skipgram2 branch to add `skip` to `tokens_compound()`. I managed to make it possible to generate skipgrams, but removing original tokens of compounds appeared very difficult. I also discovered that we have `window` already.
Created on 2021-04-13 by the reprex package (v1.0.0)