Questions about string cleaning #38

Open
Opdoop opened this issue Dec 27, 2020 · 1 comment


Opdoop commented Dec 27, 2020

Thanks for this solid work.
The docstring of clean_str says that every dataset is lowercased except for TREC, but the example sentences in Table 6 are cased. This looks like a conflict to me.

def clean_str(string, TREC=False):

The clean_str docstring also says "Tokenization/string cleaning for all datasets except for SST."
Did you train the model on a cleaned, uncased dataset but test it on a cased, raw dataset? But the 1000-example split in 'data' is uncased. I'm really confused. Is there something I have missed?
I apologize for asking directly before going through your code. An answer would be very generous and helpful. Thanks in advance~
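For readers following the thread, the behavior the docstring describes (lowercase everything except TREC) can be sketched in Python as below. This is a minimal illustration under assumptions: only the function name, the TREC flag, and the lowercasing rule come from the thread; the individual regex rules are hypothetical and are not copied from the repository.

```python
import re


def clean_str(string: str, TREC: bool = False) -> str:
    """Sketch of the cleaning behavior described by the clean_str docstring.

    Every dataset is lowercased except TREC. The specific regex rules are
    illustrative assumptions, not the repository's exact implementation.
    """
    # Replace any character outside a small whitelist with a space.
    string = re.sub(r"[^A-Za-z0-9(),!?'`]", " ", string)
    # Split common English contractions off as separate tokens.
    string = re.sub(r"'s", " 's", string)
    string = re.sub(r"n't", " n't", string)
    # Surround some punctuation with spaces so it becomes its own token.
    string = re.sub(r",", " , ", string)
    string = re.sub(r"!", " ! ", string)
    string = re.sub(r"\?", " ? ", string)
    # Collapse runs of whitespace into a single space.
    string = re.sub(r"\s{2,}", " ", string)
    # Lowercase everything unless the TREC flag is set.
    return string.strip() if TREC else string.strip().lower()


print(clean_str("The movie wasn't GREAT, honestly!"))
# → the movie was n't great , honestly !
print(clean_str("What is the capital of France?", TREC=True))
# → What is the capital of France ?
```

The key point for the question above is the last line of the function: any text passed through this kind of cleaning loses its original casing unless TREC=True, so cased sentences cannot come directly out of the cleaned data.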

Owner

jind11 commented Jan 7, 2021

Hi, I am so sorry for the late response. The attacks in the experiments are actually conducted on uncased text; I converted the examples in Table 6 to cased text only so that they read better in the paper. Let me know if you have more questions.
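The reply does not say how the Table 6 examples were recased (it may simply have been done by hand). As a hypothetical illustration of why this is a display-only step, the sketch below applies a naive recasing to lowercased, tokenized text. The function name display_case and its rules are assumptions, not part of the repository; note that it cannot restore proper nouns, because lowercasing discards that information.

```python
import re


def display_case(text: str) -> str:
    """Hypothetical display-only recasing of lowercased, tokenized text.

    Only capitalizes sentence-initial words and the pronoun "i"; proper
    nouns stay lowercase because their original casing was discarded.
    """
    # Capitalize the standalone pronoun "i".
    text = re.sub(r"\bi\b", "I", text)
    # Capitalize the first letter of the text and of any word after . ! or ?
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


print(display_case("i liked it . the ending was weak !"))
# → I liked it . The ending was weak !
```

Because a step like this only changes how the text is shown, it has no effect on the model inputs or the attack results, which are computed on the uncased text.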
