ted crashes/hangs on binary files [bug] #40
Comments
Maybe it's a problem with the UTF-8 encoding/decoding mechanism |
Try opening without utf8 |
Sorry, closed by accident. |
Is there a way to disable utf8? |
Ok I'll try this asap |
OK, I tried it. With that part of the code commented out, lines are no longer misplaced, but now ted hangs around line 450. |
Maybe the problem lies in a call to utf8ToMultibyte from show_lines, but I am unsure how to handle that |
I have retried, and ted now reads successfully to the end of the binary file. However, it still misplaces the cursor sometimes, and past a certain point it becomes so clunky that I can't tell whether it has crashed or is just slow, so this still looks like a bug to me |
I think we need a system to ignore characters that don't represent anything in text. |
By replacing them with something like this (https://www.compart.com/en/unicode/U+16844)? I think it's a very good idea, and even production-grade editors do that. |
The worst problem is that, given a malformed Unicode sequence, we could read arrays out of bounds or end up in other UB situations |
Also I have successfully read over 5000 lines with ted, so this theory about memory is officially disproved.
|
In theory, ted can read ~4 GB |
I'll look into this, quoting from RFC 3629
|
I have tried to add UTF-8 validation, but I am unsure whether it actually works well. I would appreciate it if you could check here: https://github.com/bynect/Teditor/blob/utf8try/src/utf8.c (I have made only small changes). I tried to replace all invalid code points with this one |
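For reference, here is a minimal sketch of a bounds-checked decoder along the lines RFC 3629 describes. It is not the code in the linked utf8.c: the name `utf8_decode_checked` and the choice of U+FFFD as the substitute are assumptions made for illustration. It never reads past `len`, rejects overlong forms, surrogates, and values above U+10FFFF, and always consumes at least one byte so the caller cannot loop forever on bad input.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu /* assumption: U+FFFD as the substitute */

/* Hypothetical helper, not ted's actual API.
 * Decodes one code point from s[0..len) into *out and returns the number
 * of bytes consumed. Malformed input yields REPLACEMENT_CHAR and consumes
 * exactly one byte. Returns 0 only when len is 0. */
size_t utf8_decode_checked(const unsigned char *s, size_t len, uint32_t *out)
{
    if (len == 0) {
        *out = REPLACEMENT_CHAR;
        return 0;
    }

    unsigned char b0 = s[0];
    size_t need;   /* number of continuation bytes expected */
    uint32_t cp;   /* code point being assembled */
    uint32_t min;  /* smallest value allowed for this length */

    if (b0 < 0x80) {
        *out = b0;
        return 1;
    } else if ((b0 & 0xE0) == 0xC0) {
        need = 1; cp = b0 & 0x1F; min = 0x80;
    } else if ((b0 & 0xF0) == 0xE0) {
        need = 2; cp = b0 & 0x0F; min = 0x800;
    } else if ((b0 & 0xF8) == 0xF0) {
        need = 3; cp = b0 & 0x07; min = 0x10000;
    } else {
        /* stray continuation byte or a lead byte of 0xF8 and above */
        *out = REPLACEMENT_CHAR;
        return 1;
    }

    /* Truncated sequence: do not read past the end of the buffer. */
    if (len - 1 < need) {
        *out = REPLACEMENT_CHAR;
        return 1;
    }

    for (size_t i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            *out = REPLACEMENT_CHAR;
            return 1;
        }
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong encodings, surrogates, and out-of-range values. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        *out = REPLACEMENT_CHAR;
        return 1;
    }

    *out = cp;
    return need + 1;
}
```

A caller would advance its read offset by the return value on every call, so a binary file full of invalid bytes turns into a run of replacement characters instead of an out-of-bounds read.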
I cut it out of the screenshot, but the file was detected as LF, and to my knowledge all files, binary or not, are LF terminated on Linux |
If you download a file that was made on Windows, it will probably be CRLF. |
Well yes, but in this case the file was produced on Linux, specifically by gcc from a small C test program |
I will take a look at it later, now I am changing other things in the code. |
Regarding this, I think I have just found the root cause of the misplaced lines et al.: the way read_lines reads line breaks. I've also found that binary files are CRLF terminated even on Linux, and this problem is related to that |
Probably the problem is that ted is trying to detect the line break type, while the file does not have a consistent line break type. |
Exactly. Furthermore, I found that in CRLF mode read_lines discards the character after a carriage return without checking that it is a line feed. I was also trying to implement a heuristic that would guess, reasonably well, whether the file is valid UTF-8/non-binary. |
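One simple form such a heuristic could take, sketched here under assumptions: scan the first few thousand bytes for a NUL byte, which text files essentially never contain and compiled output almost always does. The name `looks_binary` and the 8000-byte limit are illustrative choices, not anything taken from ted.

```c
#include <stdbool.h>
#include <stddef.h>

#define SNIFF_LIMIT 8000 /* assumption: how many leading bytes to inspect */

/* Hypothetical helper, not part of ted.
 * Returns true if a NUL byte appears in the first SNIFF_LIMIT bytes. */
bool looks_binary(const unsigned char *buf, size_t len)
{
    size_t n = len < SNIFF_LIMIT ? len : SNIFF_LIMIT;

    for (size_t i = 0; i < n; i++) {
        if (buf[i] == 0)
            return true;
    }
    return false;
}
```

This only catches the common case. A file with no NUL bytes but invalid UTF-8 would still pass, so it complements sequence validation rather than replacing it.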
If we remove CR line break support, we can just ignore carriage returns when reading the file. |
CR is not used in any modern operating system. There is no reason to support it. |
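A rough sketch of what line splitting could look like once bare-CR line breaks are dropped: LF is the only terminator, and a CR is stripped only when it sits directly before that LF. This is a narrower variant than ignoring every carriage return, since a stray CR inside a line is kept as data. `next_line` is a hypothetical name and does not mirror the real read_lines.

```c
#include <stddef.h>

/* Hypothetical helper, not ted's read_lines.
 * Scans buf[start..len) for the next line. Stores the line's length
 * (terminator excluded) in *line_len and returns the offset at which
 * the following line starts (len when the input is exhausted). */
size_t next_line(const char *buf, size_t len, size_t start, size_t *line_len)
{
    size_t end = start;

    while (end < len && buf[end] != '\n')
        end++;

    size_t content = end - start;

    /* Treat "\r\n" as one terminator: drop the CR only if an LF follows. */
    if (end < len && content > 0 && buf[end - 1] == '\r')
        content--;

    *line_len = content;
    return end < len ? end + 1 : end;
}
```

Because the terminator is decided per line, a file that mixes LF and CRLF needs no up-front detection of a single line break type, and no character is ever skipped without being checked first.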
|
I accidentally opened a binary file in ted and it behaved strangely (misplaced cursor and line numbers), and after a certain line (300 or so) it just hung. I have yet to reproduce this behavior on multiple files, but I will post updates on this bug soon.