Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First entry of gitignore searched if it starts with BOM #2177

Open
tvrg opened this issue Apr 11, 2022 · 3 comments · May be fixed by #2782
Open

First entry of gitignore searched if it starts with BOM #2177

tvrg opened this issue Apr 11, 2022 · 3 comments · May be fixed by #2782
Labels
bug A bug.

Comments

@tvrg
Copy link

tvrg commented Apr 11, 2022

What version of ripgrep are you using?

ripgrep 13.0.0
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

How did you install ripgrep?

pacman -S ripgrep

What operating system are you using ripgrep on?

Arch Linux

Describe your bug.

When a .gitignore starts with a BOM, ripgrep does not ignore the first entry of the .gitignore. git itself ignores the first entry.

What are the steps to reproduce the behavior?

$ echo "\xef\xbb\xbftest\nignoreme" > .gitignore && echo "test" > ignoreme
$ rg test

What is the actual behavior?

$ rg --debug test
DEBUG|grep_regex::literal|crates/regex/src/literal.rs:58: literal prefixes detected: Literals { lits: [Complete(test)], limit_size: 250, limit_class: 10 }
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:416: glob converted to regex: Glob { glob: "**/npm-debug.log*", re: "(?-u)^(?:/?|.*/)npm\\-debug\\.log[^/]*$", opts: GlobOptions { case_insensitive: false, literal_separator: true, backslash_escape: true }, tokens: Tokens([RecursivePrefix, Literal('n'), Literal('p'), Literal('m'), Literal('-'), Literal('d'), Literal('e'), Literal('b'), Literal('u'), Literal('g'), Literal('.'), Literal('l'), Literal('o'), Literal('g'), ZeroOrMore]) }
DEBUG|globset|crates/globset/src/lib.rs:416: glob converted to regex: Glob { glob: "**/yarn-debug.log*", re: "(?-u)^(?:/?|.*/)yarn\\-debug\\.log[^/]*$", opts: GlobOptions { case_insensitive: false, literal_separator: true, backslash_escape: true }, tokens: Tokens([RecursivePrefix, Literal('y'), Literal('a'), Literal('r'), Literal('n'), Literal('-'), Literal('d'), Literal('e'), Literal('b'), Literal('u'), Literal('g'), Literal('.'), Literal('l'), Literal('o'), Literal('g'), ZeroOrMore]) }
DEBUG|globset|crates/globset/src/lib.rs:416: glob converted to regex: Glob { glob: "**/yarn-error.log*", re: "(?-u)^(?:/?|.*/)yarn\\-error\\.log[^/]*$", opts: GlobOptions { case_insensitive: false, literal_separator: true, backslash_escape: true }, tokens: Tokens([RecursivePrefix, Literal('y'), Literal('a'), Literal('r'), Literal('n'), Literal('-'), Literal('e'), Literal('r'), Literal('r'), Literal('o'), Literal('r'), Literal('.'), Literal('l'), Literal('o'), Literal('g'), ZeroOrMore]) }
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 1 literals, 6 basenames, 3 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 3 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 1 basenames, 0 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|ignore::walk|crates/ignore/src/walk.rs:1741: ignoring ./.git: Ignore(IgnoreMatch(Hidden))
DEBUG|ignore::walk|crates/ignore/src/walk.rs:1741: ignoring ./.gitignore: Ignore(IgnoreMatch(Hidden))
ignoreme
1:test
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes

What is the expected behavior?

It should ignore ignoreme because git does so as well. See e.g. git status.

@BurntSushi
Copy link
Owner

I believe the description of your bug report is correct, but your reproduction isn't. rg isn't ignoring ignoreme in your particular example because of the BOM, it's not ignoring it because it doesn't respect .gitignore outside of a git repository. You either need to run git init or mv .gitignore .ignore. Here's my repro:

$ echo "\xef\xbb\xbfignoreme1\nignoreme2" > .ignore && echo "test" > ignoreme1 && echo "test" > ignoreme2
$ rg test
ignoreme1
1:test

But rg test should return no results. Instead, it only ignores ignoreme2 and not ignoreme1. If I drop the BOM from the .ignore file, then ripgrep correctly ignores both .ignoreme1 and .ignoreme2.

I suspect the fix will be to tweak this code to handle the BOM:

pub fn add<P: AsRef<Path>>(&mut self, path: P) -> Option<Error> {
let path = path.as_ref();
let file = match File::open(path) {
Err(err) => return Some(Error::Io(err).with_path(path)),
Ok(file) => file,
};
let rdr = io::BufReader::new(file);
let mut errs = PartialErrorBuilder::default();
for (i, line) in rdr.lines().enumerate() {
let lineno = (i + 1) as u64;
let line = match line {
Ok(line) => line,
Err(err) => {
errs.push(Error::Io(err).tagged(path, lineno));
break;
}
};
if let Err(err) = self.add_line(Some(path.to_path_buf()), &line) {
errs.push(err.tagged(path, lineno));
}
}
errs.into_error_option()
}

@BurntSushi BurntSushi added the bug A bug. label Apr 11, 2022
@tvrg
Copy link
Author

tvrg commented Apr 11, 2022

Oh, yes, sorry for the incomplete reproduction. I was actually running the commands in an initialized git repo.

I'll try to fix this in the next days with the help of your pointer above if nobody else is faster.

tvrg added a commit to tvrg/ripgrep that referenced this issue Apr 12, 2022
@starthal
Copy link

Here is what Git does: https://github.com/git/git/blob/21306a098c3f174ad4c2a5cddb9069ee27a548b0/utf8.c#L794-L803

const char utf8_bom[] = "\357\273\277";

int skip_utf8_bom(char **text, size_t len)
{
	if (len < strlen(utf8_bom) ||
	    memcmp(*text, utf8_bom, strlen(utf8_bom)))
		return 0;
	*text += strlen(utf8_bom);
	return 1;
}

If ignore intends to match its behavior then it should just skip this sequence in the first line.

starthal added a commit to starthal/ripgrep that referenced this issue Apr 17, 2024
@starthal starthal linked a pull request Apr 17, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A bug.
Projects
None yet
3 participants