Avoid truncating a pattern in the middle of an UTF-8 character #1807
`-1` was for stripping the newline, and has no reason to be repeated for truncation length as truncation occurs only if it would result in a strictly shorter string anyway (at most `len - 1` already).
func95()func955,110 | ||
func97func979,219 | ||
func97(func979,219 |
@masatake do we want to change this test case to compensate for the previous off-by-one? I'm not sure, because it now properly checks the behavior when the line is exactly as long as, shorter than, and longer than the truncation limit, which seems like a good thing to do, but you might remember whether you had another goal here.
It seems that a bug in newline handling has existed since Exuberant Ctags.
The `line` returned from `readLineFromBypassAnyway` can end with or without a newline character; both can occur. Even so, the code trims the line with `line[len - 1] = '\0'`.
Combined with your fix, I think the diff should be:
diff --git a/main/writer-etags.c b/main/writer-etags.c
index 762c8378..dbd6470f 100644
--- a/main/writer-etags.c
+++ b/main/writer-etags.c
@@ -98,18 +98,18 @@ static int writeEtagsEntry (tagWriter *writer,
     long seekValue;
     char *const line =
         readLineFromBypassAnyway (etags->vLine, tag, &seekValue);
-    if (line == NULL)
+    if (line == NULL || line[0] == '\0')
         return 0;
     len = strlen (line);
     if (tag->truncateLineAfterTag)
         truncateTagLineAfterTag (line, tag->name, true);
-    else
-        line [len - 1] = '\0';
+    else if (line [len - 1] == '\n')
+        line [--len] = '\0';
     if (Option.patternLengthLimit < len)
-        line [Option.patternLengthLimit - 1] = '\0';
+        line [Option.patternLengthLimit] = '\0';
     length = mio_printf (mio, "%s\177%s\001%lu,%ld\n", line,
                          tag->name, tag->lineNumber, seekValue);
@b4n, thank you very much. I will take a look at this over the weekend.
@b4n, thank you for working on this topic.
Value returned by `readLineFromBypassAnyway()` can end with or without a newline character. Thus, make sure to handle both cases gracefully. Patch by @masatake.
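For illustration, the newline handling and the corrected truncation index can be sketched in isolation. This is a minimal sketch, not the code from the patch: `chomp_and_truncate` and its `limit` parameter are hypothetical names standing in for the logic around `Option.patternLengthLimit`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of ctags): strip one trailing newline if
 * there is one, then cut the line to at most `limit` bytes.
 * Returns the resulting length. */
static size_t chomp_and_truncate (char *line, size_t limit)
{
	assert (line != NULL);
	size_t len = strlen (line);

	/* The line may or may not end with '\n'; strip only when it does,
	 * and never index before the start of an empty string. */
	if (len > 0 && line[len - 1] == '\n')
		line[--len] = '\0';

	/* Truncate only when that yields a strictly shorter string, so the
	 * terminator goes at `limit`, not `limit - 1`. */
	if (limit < len)
	{
		line[limit] = '\0';
		len = limit;
	}
	return len;
}
```

With this shape, a line that is exactly `limit` bytes long after stripping is left untouched, which is the boundary case the test discussed above exercises.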
@masatake Thanks for the review! I made the changes; tell me if this is what you were expecting.
Avoid truncating a pattern in the middle of a UTF-8 character
Fixes #1275, fixes #1805.
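One way to avoid splitting a multi-byte character is to move the cut point to a character boundary before writing the terminator. The sketch below backs up to the start of the character that straddles the limit; that choice is an assumption made for illustration, and the actual commit may pick the boundary differently. `utf8_truncate` is a hypothetical name, not a ctags function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of ctags): truncate `s` to at most `limit`
 * bytes without cutting through a UTF-8 sequence.
 * Returns the resulting length in bytes. */
static size_t utf8_truncate (char *s, size_t limit)
{
	assert (s != NULL);
	size_t len = strlen (s);
	if (limit >= len)
		return len;	/* nothing to truncate */

	size_t cut = limit;
	/* Bytes of the form 10xxxxxx continue a character that started
	 * earlier; step back until the cut lands on a lead or ASCII byte. */
	while (cut > 0 && ((unsigned char) s[cut] & 0xC0) == 0x80)
		cut--;

	s[cut] = '\0';
	return cut;
}
```

Note that input made entirely of continuation bytes (invalid UTF-8) would be cut down to an empty string by this sketch; a real implementation may want to bound how far it moves from the requested limit.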