Sequence labeling for documents #372

Closed
mouthgalya opened this issue Sep 5, 2019 · 3 comments
Labels
question Further information is requested

Comments

@mouthgalya

Hi
We would like to import whole documents (the full file content) and assign labels to the document content. Currently doccano automatically parses the input and splits it into individual sentences. Can we instead do sequence labeling at the document level?
MG

icoxfog417 added the "question" label (Further information is requested) on Sep 5, 2019
@icoxfog417
Contributor

You can do sequence labeling at the document level. However, I recommend splitting the document into sentences, because that makes model training easier.

@mouthgalya
Author

But we are noticing that, by default, the document is split into lines (because of the line feed characters) when we import the data. Is there any way to override this default behavior so that we can see the entire document content on the screen?

@icoxfog417
Contributor

icoxfog417 commented Sep 6, 2019

doccano currently splits imported data on line breaks. As a workaround, you can replace each line break with \n before importing. Please refer to #330.
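The line-break workaround mentioned above could be sketched roughly as follows. This is only an illustration, assuming a plain-text import where each line becomes one document; `flatten_document` and `restore_document` are hypothetical helper names, not part of doccano, and how doccano displays the escaped text is not confirmed here.

```python
def flatten_document(text: str) -> str:
    """Put a multi-line document on a single line by escaping its line breaks."""
    # Normalize Windows (CRLF) and old Mac (CR) line endings to LF first.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # Replace each real line break with the two characters backslash + n.
    return normalized.replace("\n", "\\n")


def restore_document(line: str) -> str:
    """Turn the two-character backslash + n sequences back into line breaks."""
    return line.replace("\\n", "\n")


if __name__ == "__main__":
    document = "First sentence.\nSecond sentence.\r\nThird sentence."
    single_line = flatten_document(document)
    # The flattened text contains no real line breaks, so a line-based
    # importer would treat it as one record.
    print(single_line)
```

Two caveats with this sketch: each line break becomes two characters, so character offsets of labels made on the flattened text will not match offsets in the original file, and `restore_document` is not an exact inverse if the original text already contains a literal backslash followed by `n`.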
