Bug: Big-ish files not being indexed #444

Open
critchtionary opened this issue Dec 19, 2022 · 1 comment

critchtionary commented Dec 19, 2022

Version 0.5.1, running in Docker.

In one of our repositories we have a ~150KB YAML file that Hound does not return search results for. The issue seems somewhat related to the file's size: if I split the file into two separate files, both halves can be searched. However, we have another 2.2MB JSON file for which search works perfectly fine.

Steps to reproduce:

  1. Create a new Git repo
  2. Commit exampleyaml.txt (I have replaced all YAML values with random strings to remove any sensitive data)
  3. Configure Hound to index this repo
  4. Search for a string in this file, e.g. "permanent"

Another hint that it is size-related: my first attempt at removing sensitive data replaced every character in each value string with "a", and that file was searchable in Hound, possibly because it compressed to a smaller size.

Splitting this file is not a suitable workaround, as there may be other unsearchable files that we are not aware of.

salemhilal added the bug label Dec 20, 2022

salemhilal (Contributor) commented:

Hmm, thank you for reporting this bug. I wonder if it's related to the 32-bit offsets Hound uses for indexing (see #351). That would align with your attempt to replace everything with the letter "a", since that would in theory create a much smaller index. Unfortunately, that would mean waiting until we have time to rewrite the indexing to use 64-bit offsets.
