Convert csv to conllu format #102

Open
mastifatchiya opened this issue Nov 21, 2022 · 1 comment

@mastifatchiya

Hi everyone, I need help.
I'm researching contextual topic models for my thesis, and I'd like to try using ELMo for Indonesian.
Unfortunately, my data consists of tweets saved in CSV format, while this pre-trained model requires CoNLL-U format.
My question is: how do I convert from CSV to CoNLL-U format? So far, I have found a way to convert a single string to CoNLL-U, but not a whole document. Is there any advice for this? Thank you in advance, guys.

@melanchthon19

I suggest you look at this repo https://github.com/EMBEDDIA/supar-elmo#Usage
They have a util function to achieve what you want in an easy way:

>>> from supar.utils import CoNLL
>>> print(CoNLL.toconll(['She', 'enjoys', 'playing', 'tennis', '.']))
1       She     _       _       _       _       _       _       _       _
2       enjoys  _       _       _       _       _       _       _       _
3       playing _       _       _       _       _       _       _       _
4       tennis  _       _       _       _       _       _       _       _
5       .       _       _       _       _       _       _       _       _

Just read in your csv and tokenize your text.
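
To make that last step concrete, here is a minimal sketch that uses only the Python standard library instead of supar. It assumes the tweets sit in a CSV column named `text` (a hypothetical name, adjust it to your file) and it tokenizes by splitting on whitespace, which is a naive stand-in for a real tweet tokenizer. The function names `tokens_to_conllu` and `csv_to_conllu` are made up for this example. Only the ID and FORM columns are filled in; the other eight are left as `_`, as in the output above.

```python
import csv
import io


def tokens_to_conllu(tokens):
    """Format one tokenized sentence as CoNLL-U lines (ID and FORM only)."""
    lines = []
    for i, tok in enumerate(tokens, start=1):
        # CoNLL-U has 10 tab-separated columns; unknown fields are "_".
        lines.append("\t".join([str(i), tok] + ["_"] * 8))
    return "\n".join(lines) + "\n"


def csv_to_conllu(csv_file, out_file, text_column="text"):
    """Write one CoNLL-U sentence per non-empty CSV row; return the count."""
    reader = csv.DictReader(csv_file)
    written = 0
    for row in reader:
        # Assumption: whitespace splitting is good enough as a tokenizer.
        tokens = (row.get(text_column) or "").split()
        if not tokens:
            continue  # skip empty tweets
        out_file.write(tokens_to_conllu(tokens))
        out_file.write("\n")  # a blank line separates sentences
        written += 1
    return written


if __name__ == "__main__":
    # In-memory demo; with real files, pass open("tweets.csv", newline="",
    # encoding="utf-8") and open("tweets.conllu", "w", encoding="utf-8").
    sample = io.StringIO("id,text\n1,Saya suka kopi .\n2,Selamat pagi\n")
    result = io.StringIO()
    csv_to_conllu(sample, result)
    print(result.getvalue(), end="")
```

Each CSV row is treated as a single sentence here. If your tweets contain several sentences, or you want punctuation split off from words, swap the `.split()` call for a proper tokenizer and call `tokens_to_conllu` once per sentence.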
