Can't upload a file with line break inside text #330
Comments
We don't support text that includes line breaks. Please refer to the discussion at #34.
I managed to solve it with a .jsonl file. The data I showed previously was saved as follows:
I saved every document on a single line, with a "\n" for every line break. The line breaks don't show up in the "Dataset" section, but when annotating the examples they are rendered correctly.
The same approach works with the csv format, but unfortunately csv requires at least one label and does not accept an empty array. Because I don't want to send any label value, I had to use the jsonl format, which seems to be the only one that allows an empty label array. The txt/plain text format expects one example per line, so it cannot support line breaks at all.
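A minimal sketch of the workaround described above, for anyone who already has a JSON array like the one in the issue. It relies only on Python's standard `json` module. The function name `json_array_to_jsonl` and the `labels` key are assumptions made for illustration; check the import format of your Doccano version for the exact field name it expects.

```python
import json


def json_array_to_jsonl(json_text, label_key="labels"):
    """Convert a JSON array of {"text": ...} objects into JSONL.

    `label_key` is an assumed field name for the empty label array.
    """
    documents = json.loads(json_text)
    lines = []
    for doc in documents:
        record = {"text": doc["text"], label_key: []}
        # json.dumps writes a real line break as the two characters \n,
        # so every record stays on one physical line of the file.
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


# A shortened stand-in for the document in the issue.
sample = '[{"text": "1.\\nFirst paragraph.\\n2.\\nSecond paragraph."}]'
jsonl = json_array_to_jsonl(sample)
print(jsonl, end="")
```

Each output line is a complete JSON object, so a line-by-line importer sees one document per line while the escaped "\n" sequences still decode back to line breaks.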
I copied your example and saved it as a.txt, but it does not work: the line breaks are still not rendered. My project type is sequence labeling.
System information
Windows 10
3.6.4
Describe the problem
I'm trying to upload a file in which the texts are not single lines; they can contain line breaks. Even when I use the JSON file format, so that every text is a separate property rather than a separate line, Doccano still seems to split the texts at line breaks on upload.
Source code / logs
For instance, this is a JSON file I'm trying to upload with a single text inside of it:
[{"text": "Processo 0000637-15.2012.8.12.0003 (003.12.000637-8) - Procedimento Comum - Inadimplemento Reqte: Fabiano Neves Gon\u00e7alves ADV: PAULO DE TARSO AZEVEDO PEGOLO (OAB 10789/MS) ADV: HENRIQUE LIMA (OAB 9979/MS) ADV: GUILHERME FERREIRA DE BRITO (OAB 9982/MS) ADV: RODRIGO LOUREIRO (OAB 13583/MS) ADV: FRANCIELLI SANCHEZ SALAZAR (OAB 15140/MS) ADV: JAC\u00d3 CARLOS SILVA COELHO (OAB 15155A/MS) ADV: IVONE CONCEI\u00c7\u00c3O SILVA (OAB 13609/MS) 1.\nCom o tr\u00e2nsito em julgado da senten\u00e7a de fl. 393 e satisfa\u00e7\u00e3o integral do cr\u00e9dito, o of\u00edcio jurisdicional acha-se cumprido e acabado, raz\u00e3o por que indefiro o pedido de digitaliza\u00e7\u00e3o do feito (fl. 416).\nAdemais, tramitam nessa unidade judici\u00e1ria milhares de processos e se for admitida a digitaliza\u00e7\u00e3o de todos os feitos finalizados, haver\u00e1 atraso injustificado nas atividades do cart\u00f3rio, pois \u00e9 necess\u00e1rio grande lapso temporal do servidor para este fim.\n2.\nDever\u00e1 o cart\u00f3rio promover a retifica\u00e7\u00e3o do advogado da Mafre Vida S/A no sistema SAJ, para futuras publica\u00e7\u00f5es e intima\u00e7\u00f5es, conforme declinado \u00e0 fl. 416.\nIntimem-se.\nAp\u00f3s, arquive-se."}]
The idea was to see the text with its line breaks during annotation, but instead Doccano turned every line of the text into a document of its own. For comparison, this is how the same text was uploaded:
As the image shows, the text was broken at every line break, and every substring was treated as a separate document.