Input sequences contain part of the outputs in CNN-for-Text-Classification #8

Open
ducalpha opened this issue May 30, 2018 · 0 comments

@ducalpha

Step [54], `data = [[d.split(':')[1][:-1], d.split(':')[0]] for d in data]`, seems to put the sub-category label (part of the output) into the input sequence.
For example, for the data line `DESC:def What is ethology ?`, `data` will be `["def What is ethology ?", "DESC"]`, so the `def` sub-category ends up in the input sequence.
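To reproduce the leak, here is step [54]'s expression applied to the example line. The one-element `lines` list is a sample, assumed to end with a newline as lines read from a file do:

```python
# Sample input: one raw line, assumed to end with '\n' as when read from a file.
lines = ["DESC:def What is ethology ?\n"]

# Same expression as step [54]: text after the first colon (minus '\n') and the coarse label.
data = [[d.split(':')[1][:-1], d.split(':')[0]] for d in lines]

print(data)  # [['def What is ethology ?', 'DESC']]
```

The first token of the input text is the fine-grained label `def`, so the model sees part of the answer.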

I suggest a fix:

```python
# Remove the sub-category (first word); [:-2] drops the trailing '?' and the newline.
data = [[d.split(':')[1].split(' ', 1)[1][:-2], d.split(':')[0]] for d in data]
```
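A slightly more defensive variant, sketched under the assumption that every line has the form `COARSE:fine question`, with the label prefix separated from the question by the first space. `parse_line` is a hypothetical helper name, not part of the notebook. Splitting on the first space (rather than taking `d.split(':')[1]`) keeps the question whole even if it contains a colon, and `strip()` avoids relying on every line ending with exactly one `\n`:

```python
def parse_line(line):
    """Hypothetical helper: split 'COARSE:fine question ...' into [question, coarse_label].

    Assumes the label prefix contains no spaces and is followed by the question.
    """
    label, question = line.strip().split(' ', 1)  # e.g. 'DESC:def', 'What is ethology ?'
    coarse = label.split(':', 1)[0]               # keep 'DESC', discard the fine label 'def'
    question = question.strip()
    if question.endswith('?'):
        question = question[:-1].rstrip()         # drop the trailing '?' and the space before it
    return [question, coarse]


print(parse_line("DESC:def What is ethology ?\n"))  # ['What is ethology', 'DESC']
```

Step [54] would then become `data = [parse_line(d) for d in data]`. Unlike `[:-2]`, this does not leave a trailing space and still works on a last line that has no newline.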