The usage of '<oov>' is not consistent with the paper #39

Open
plasmashen opened this issue Jan 4, 2021 · 3 comments

Comments


plasmashen commented Jan 4, 2021

In the paper, a word's importance score is computed by deleting that word, but the code instead replaces the word with '<oov>' when computing the score:
https://github.com/jind11/TextFooler/blob/master/attack_classification.py#L216

Moreover, '<oov>' will be tokenized into 4 tokens, which may interact with the other words through attention.
I'm wondering why a nonsensical string like '<oov>' is used?
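
The two scoring variants being compared can be sketched as follows. This is a minimal illustration, not the repository's code: `toy_prob` is a hypothetical bag-of-words classifier standing in for the real target model, and the score here is only the drop in predicted probability, which is simpler than the full definition in the paper.

```python
import math
from typing import Callable, List

# Hypothetical word weights for a toy sentiment classifier (an assumption
# made for illustration; the real attack queries the target model).
WEIGHTS = {"good": 2.0, "great": 3.0, "bad": -2.0, "awful": -3.0}


def toy_prob(tokens: List[str]) -> float:
    """Return P(positive) from a sum of word weights; unknown words add 0."""
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def importance_by_deletion(
    tokens: List[str], prob: Callable[[List[str]], float]
) -> List[float]:
    """Score each word by the probability drop when it is removed."""
    base = prob(tokens)
    return [base - prob(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]


def importance_by_placeholder(
    tokens: List[str],
    prob: Callable[[List[str]], float],
    placeholder: str = "<oov>",
) -> List[float]:
    """Score each word by the probability drop when it is replaced."""
    base = prob(tokens)
    return [
        base - prob(tokens[:i] + [placeholder] + tokens[i + 1:])
        for i in range(len(tokens))
    ]


tokens = "the movie was great".split()
deleted = importance_by_deletion(tokens, toy_prob)
replaced = importance_by_placeholder(tokens, toy_prob)
most_important = tokens[max(range(len(tokens)), key=lambda i: deleted[i])]
```

For this order-insensitive toy model the two variants give identical scores, because the placeholder contributes nothing. With a sequence model the scores can differ, since deletion shifts the positions of the remaining words while replacement keeps them fixed.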

Owner

jind11 commented Jan 7, 2021

Hi, I have tested both methods, removing the word and replacing it with "<oov>", and the difference is not obvious. "<oov>" is in the vocab, so I don't think it can be tokenized into 4 tokens. Let me know if you have more questions.
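
Whether '<oov>' stays a single token depends on the tokenizer in front of the target model, which may be why the two comments above disagree. The sketch below contrasts a word-level vocabulary lookup with a toy greedy longest-match subword splitter. Both vocabularies are hypothetical and chosen only for illustration; the actual split produced by a real subword tokenizer depends on its own vocabulary.

```python
import re
from typing import Dict, List, Set


def word_level_ids(
    tokens: List[str], vocab: Dict[str, int], unk: str = "<unk>"
) -> List[int]:
    """Map each whitespace token to one id; unknown tokens share the unk id."""
    return [vocab.get(t, vocab[unk]) for t in tokens]


def toy_subword_tokenize(word: str, pieces: Set[str]) -> List[str]:
    """Split punctuation off, then greedily match the longest known piece.

    Continuation pieces carry a '##' prefix. A chunk that cannot be fully
    covered by known pieces becomes a single '[UNK]'.
    """
    out: List[str] = []
    for chunk in re.findall(r"\w+|[^\w\s]", word):
        sub: List[str] = []
        start = 0
        while start < len(chunk):
            end = len(chunk)
            match = None
            while end > start:
                cand = chunk[start:end] if start == 0 else "##" + chunk[start:end]
                if cand in pieces:
                    match = cand
                    break
                end -= 1
            if match is None:
                sub = ["[UNK]"]
                break
            sub.append(match)
            start = end
        out.extend(sub)
    return out


# Hypothetical word-level vocabulary that contains the placeholder.
word_vocab = {"<unk>": 0, "<oov>": 1, "the": 2, "movie": 3}
ids = word_level_ids(["the", "<oov>", "movie"], word_vocab)

# Hypothetical subword inventory that lacks a dedicated '<oov>' entry.
piece_vocab = {"<", ">", "o", "##ov", "the", "movie"}
split = toy_subword_tokenize("<oov>", piece_vocab)
```

Under these assumed vocabularies, the word-level lookup maps '<oov>' to one id, while the subword splitter breaks it into several pieces.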


Youoo1 commented Oct 20, 2021

Where is the emdding.npz file, please? Or how is it generated?

Owner

jind11 commented Oct 21, 2021

The README explains how to obtain the embeddings:
Run the following code to pre-compute the cosine similarity scores between word pairs based on the counter-fitting word embeddings [https://drive.google.com/file/d/1bayGomljWb6HeYDMTDKXrh0HackKtSlx/view].

python comp_cos_sim_mat.py [PATH_TO_COUNTER_FITTING_WORD_EMBEDDINGS]
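
For readers who want to see what such a pre-computation involves, here is a minimal pure-Python sketch of building a cosine similarity matrix from word vectors. It is not the repository's script: the input format (one word followed by its vector components per line) and the three sample vectors are assumptions made for illustration, and all vectors are assumed to be nonzero.

```python
import math
from typing import Iterable, List, Tuple


def load_embeddings(lines: Iterable[str]) -> Tuple[List[str], List[List[float]]]:
    """Parse lines of the assumed form 'word v1 v2 ... vn'."""
    words: List[str] = []
    vectors: List[List[float]] = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        words.append(parts[0])
        vectors.append([float(x) for x in parts[1:]])
    return words, vectors


def cosine_similarity_matrix(vectors: List[List[float]]) -> List[List[float]]:
    """Normalize each vector to unit length, then take all pairwise dot products."""
    unit = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))  # assumes norm > 0
        unit.append([x / norm for x in v])
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


# Made-up 2-dimensional vectors, for illustration only.
sample = ["good 1.0 0.0", "great 0.8 0.6", "bad -1.0 0.0"]
words, vectors = load_embeddings(sample)
sim = cosine_similarity_matrix(vectors)
```

Entry `sim[i][j]` is the cosine similarity between `words[i]` and `words[j]`. For a full embedding vocabulary the matrix grows quadratically with the number of words, so a vectorized array library is the more practical choice at that scale.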
