Query would print all files indexed in repo #260

Open
stillonearth opened this issue Sep 26, 2023 · 5 comments

Comments

@stillonearth

I'm trying to run queries against a small repo with two Python files, but every query returns the same result.

@kantord
Owner

kantord commented Sep 27, 2023

Hi @stillonearth, can you please share some examples of the queries you ran, if they're not private? Access to the repo would also be great.

Also, I would look at the first few results rather than at all of them. One difference between grep and SeaGOAT is that grep just shows all matches in whatever order it finds them, whereas SeaGOAT orders matches by relevance. In this regard it is a bit like Google: the first few results can be very relevant, while the rest may be only loosely related to your query.
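To make the ranking point concrete, here is a minimal Python sketch (not SeaGOAT's actual code) contrasting unranked, grep-style output with relevance ordering. It assumes each match carries a vector distance where smaller means more relevant, which is how the test_result.py tests pasted below treat vector_distance. The Match class, both functions, and the distances are hypothetical:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Match:
    # Hypothetical record: one matching line plus its vector distance
    # (smaller distance = closer to the query = more relevant).
    path: str
    line: int
    distance: float


def file_order(matches: List[Match]) -> List[Match]:
    # grep-style: no notion of relevance; here simply path/line order.
    return sorted(matches, key=lambda m: (m.path, m.line))


def relevance_order(matches: List[Match]) -> List[Match]:
    # Search-engine style: most relevant (smallest distance) first.
    return sorted(matches, key=lambda m: m.distance)


# Made-up distances, for illustration only.
matches = [
    Match("main.py", 10, 0.18),
    Match("utils.py", 3, 0.42),
    Match("README.md", 1, 0.91),
]

print([m.path for m in relevance_order(matches)])  # ['main.py', 'utils.py', 'README.md']
```

With relevance ordering, the useful matches sit at the top, so reading only the head of the output is usually enough.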

SeaGOAT does remove irrelevant results, but the threshold is quite lax, so it will probably still return a lot of matches. For a small repo, it could return nearly the whole repo. Based on your experience, maybe we can fine-tune the threshold to make it a bit stricter.
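As a rough sketch of the threshold idea (not SeaGOAT's implementation), assume each file has a best vector distance and a hypothetical max_distance cut-off decides what is kept. With a lax cut-off, a two-file repo like the one in this issue returns both files; tightening it drops the weaker match:

```python
from typing import Dict, List


def filter_by_threshold(distances: Dict[str, float], max_distance: float) -> List[str]:
    # Keep files whose best distance is within the (hypothetical) cut-off,
    # then return them most relevant first.
    kept = [path for path, distance in distances.items() if distance <= max_distance]
    return sorted(kept, key=lambda path: distances[path])


# Made-up distances for a two-file repo.
small_repo = {"main.py": 0.35, "utils.py": 0.62}

print(filter_by_threshold(small_repo, max_distance=0.8))  # ['main.py', 'utils.py']
print(filter_by_threshold(small_repo, max_distance=0.5))  # ['main.py']
```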

Another thing you can do is show only the first N results. I have been experimenting with this, and for the most part I find what I'm looking for within the first few results. Running gt "my query" -l25 should return only the first 25 items.
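The -l25 option is the one mentioned above; the Python below only sketches the same idea of truncating an already-ranked list, using itertools.islice. The first_n helper and the sample data are hypothetical:

```python
from itertools import islice
from typing import Iterable, List, Tuple


def first_n(ranked: Iterable[Tuple[str, float]], n: int) -> List[Tuple[str, float]]:
    # Keep only the first n entries of an already-ranked stream; anything
    # past the cut-off is never consumed.
    return list(islice(ranked, n))


# Made-up ranked results: 40 files, best (smallest distance) first.
ranked = ((f"file{i}.py", i / 100) for i in range(40))
top = first_n(ranked, 25)
print(len(top), top[0])  # 25 ('file0.py', 0.0)
```

Because the list is already ordered by relevance, cutting it at N loses only the least related matches.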

@kantord
Owner

kantord commented Sep 27, 2023

In addition, any query that would make grep print the entire repo will also make SeaGOAT print the entire repo.
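A rough illustration of why, assuming (as the test_ripgrep.py tests pasted below suggest) that SeaGOAT merges grep-style keyword matches with vector-search matches, and that a line found by both keeps its smaller distance (as test_add_result_twice_when_combining_sources asserts). The function names and the fixed 0.5 placeholder distance are hypothetical, not SeaGOAT's API:

```python
import re
from typing import Dict, Tuple

LineKey = Tuple[str, int]  # (path, 1-based line number)


def keyword_matches(files: Dict[str, str], query: str, distance: float = 0.5) -> Dict[LineKey, float]:
    # Hypothetical grep-like pass: every line containing the query
    # (case-insensitive) becomes a match with a fixed placeholder distance.
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return {
        (path, number): distance
        for path, text in files.items()
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    }


def merge(*sources: Dict[LineKey, float]) -> Dict[LineKey, float]:
    # Union of all sources; a line found more than once keeps its
    # best (smallest) distance.
    merged: Dict[LineKey, float] = {}
    for source in sources:
        for key, distance in source.items():
            merged[key] = min(distance, merged.get(key, distance))
    return merged


files = {
    "events.txt": "Moon landing 1969\nThe fall of the Berlin Wall 1989",
    "notes.txt": "Step 1: install",
}
grep_hits = keyword_matches(files, "1")
vector_hits = {("events.txt", 1): 0.1}  # made-up semantic match
merged = merge(grep_hits, vector_hits)
print(sorted({path for path, _ in merged}))  # ['events.txt', 'notes.txt']
```

Because the merge is a union, a query such as "1" that matches a line in every file pulls every file into the results, whatever the semantic side returns.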

@GautierT

I have the same issue.

I tried it on the SeaGOAT repo, on the latest master branch,
with this query: gt "how to change the allowed file types ?"

It returns a lot of files (all of them?):

  22   │         "server": {
  23   │             "type": "object",
  24   │             "additionalProperties": False,
  25   │             "properties": {
  26   │                 "port": {"type": "integer", "minimum": 1, "maximum": 65535},
  27   │                 "ignorePatterns": {"type": "array", "items": {"type": "string"}},
  28   │             },
  29   │         },
  30   │         "client": {
  31   │             "type": "object",
  32   │             "additionalProperties": False,
  33   │             "properties": {"host": {"type": "string"}},
  34   │         },
  35   │     },
  36   │ }
  37   │
  38   │ GLOBAL_CONFIG_DIR = Path(
  39   │     appdirs.user_config_dir(
  40   │         "seagoat-pytest" if "PYTEST_CURRENT_TEST" in os.environ else "seagoat"
  41   │     )
  42   │ )
  43   │ GLOBAL_CONFIG_FILE = GLOBAL_CONFIG_DIR / "config.yml"
  44   │
  45   │
  46   │ def validate_config_file(config_file: str):
  47   │     if os.path.exists(config_file):
  48   │         content = read_file_with_correct_encoding(config_file)
  49   │         new_config = yaml.safe_load(content) or {}
  50   │         jsonschema.validate(instance=new_config, schema=CONFIG_SCHEMA)
  51   │         return new_config
  52   │     return {}
  53   │
  54   │
  55   │ def extend_config_with_file(base_config, config_file):
  56   │     new_config = validate_config_file(config_file)
  57   │     return always_merger.merge(base_config, new_config) if new_config else base_config
  58   │
  59   │
  60   │ def get_config_values(repo_path: Path):
  61   │     config = copy.deepcopy(DEFAULT_CONFIG)
  62   │     repo_config_file = repo_path / ".seagoat.yml"
  63   │
  64   │     if GLOBAL_CONFIG_FILE.exists():
  65   │         config = extend_config_with_file(config, GLOBAL_CONFIG_FILE)
  66   │
  67   │     if repo_config_file.exists():
  68   │         config = extend_config_with_file(config, repo_config_file)
  69   │
  70   │     return config
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: tests\test_json_serialization.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ import os
   2   │ import tempfile
   3   │ from pathlib import Path
   4   │
   5   │ from seagoat.result import Result
   6   │ from seagoat.result import ResultLine
   7   │ from seagoat.result import ResultLineType
   8   │
   9   │
  10   │ def test_to_result_line_correct_output_example1():
  11   │     line = ResultLine(1, 0.5, "some line text", {ResultLineType.RESULT})
  12   │     result_dict = line.to_json("")
  13   │     assert result_dict == {
  14   │         "score": 0.25,
  15   │         "line": 1,
  16   │         "lineText": "some line text",
  17   │         "resultTypes": ["result"],
  18   │     }
  19   │
  20   │
  21   │ def test_to_result_line_correct_output_example2():
  22   │     line = ResultLine(2, 0.2, "another line of text", {ResultLineType.RESULT})
  23   │     result_dict = line.to_json("")
  24   │     assert result_dict == {
  25   │         "score": 0.1,
  26   │         "line": 2,
  27   │         "lineText": "another line of text",
  28   │         "resultTypes": ["result"],
  29   │     }
  30   │
  31   │
  32   │ def test_to_result_json_correct_output_example1():
  33   │     with tempfile.TemporaryDirectory() as tmpdirname:
  34   │         file_path = os.path.join(tmpdirname, "example1.txt")
  35   │         with open(file_path, "w", encoding="utf-8") as tmp_file:
  36   │             tmp_file.write("Line 1\nLine 2\nLine 3\n")
  37   │
  38   │         result = Result("example1.txt", Path(file_path))
  39   │         result.add_line(1, 0.5)
  40   │         result.add_line(2, 0.3)
  41   │
  42   │         result_dict = result.to_json("")
  43   │         assert result_dict == {
  44   │             "score": 0.225,
  45   │             "path": "example1.txt",
  46   │             "fullPath": file_path,
  47   │             "blocks": [
  48   │                 {
  49   │                     "lineTypeCount": {"result": 2},
  50   │                     "lines": [
  51   │                         {
  52   │                             "score": 0.25,
  53   │                             "line": 1,
  54   │                             "lineText": "Line 1",
  55   │                             "resultTypes": ["result"],
  56   │                         },
  57   │                         {
  58   │                             "score": 0.15,
  59   │                             "line": 2,
  60   │                             "lineText": "Line 2",
  61   │                             "resultTypes": ["result"],
  62   │                         },
  63   │                     ],
  64   │                 },
  65   │             ],
  66   │         }
  67   │
  68   │
  69   │ def test_to_result_json_correct_output_example2():
  70   │     with tempfile.TemporaryDirectory() as tmpdirname:
  71   │         file_path = os.path.join(tmpdirname, "example2.txt")
  72   │         with open(file_path, "w", encoding="utf-8") as tmp_file:
  73   │             tmp_file.write("This is line 1\nThis is line 2\nThis is line 3\n")
  74   │
  75   │         result = Result("example2.txt", Path(file_path))
  76   │         result.add_line(1, 0.5)
  77   │         result.add_line(3, 0.1)
  78   │
  79   │         result_dict = result.to_json("")
  80   │         assert result_dict == {
  81   │             "score": 0.075,
  82   │             "path": "example2.txt",
  83   │             "fullPath": file_path,
  84   │             "blocks": [
  85   │                 {
  86   │                     "lineTypeCount": {"result": 1},
  87   │                     "lines": [
  88   │                         {
  89   │                             "score": 0.25,
  90   │                             "line": 1,
  91   │                             "lineText": "This is line 1",
  92   │                             "resultTypes": ["result"],
  93   │                         },
  94   │                     ],
  95   │                 },
  96   │                 {
  97   │                     "lineTypeCount": {"result": 1},
  98   │                     "lines": [
  99   │                         {
 100   │                             "score": 0.05,
 101   │                             "line": 3,
 102   │                             "lineText": "This is line 3",
 103   │                             "resultTypes": ["result"],
 104   │                         },
 105   │                     ],
 106   │                 },
 107   │             ],
 108   │         }
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: tests\test_ripgrep.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ import pytest
   2   │
   3   │ from seagoat.engine import Engine
   4   │
   5   │
   6   │ @pytest.mark.asyncio
   7   │ async def test_includes_all_matching_lines_from_line(repo):
   8   │     repo.add_file_change_commit(
   9   │         file_name="events.txt",
  10   │         contents="""1: Nothing
  11   │         2: Battle of Waterloo 1815
  12   │         3:
  13   │         4: Moon landing 1969
  14   │         5: Unrelated data
  15   │         6: The first flight of the Wright Brothers 1903
  16   │         7: The signing of the Magna Carta 1215
  17   │         8: Some other information
  18   │         9: The fall of the Berlin Wall 1989
  19   │         """,
  20   │         author=repo.actors["John Doe"],
  21   │         commit_message="Add historical events",
  22   │     )
  23   │     seagoat = Engine(repo.working_dir)
  24   │     seagoat.analyze_codebase()
  25   │     my_query = "19"
  26   │     seagoat.query(my_query)
  27   │     await seagoat.fetch()
  28   │
  29   │     assert seagoat.get_results()[0].path == "events.txt"
  30   │     assert set(seagoat.get_results()[0].get_lines(my_query)) == {4, 6, 9}
  31   │
  32   │
  33   │ @pytest.mark.asyncio
  34   │ async def test_search_is_case_insensitive(repo):
  35   │     repo.add_file_change_commit(
  36   │         file_name="events.txt",
  37   │         contents="""1: Nothing
  38   │         2: Battle of Waterloo 1815
  39   │         3:
  40   │         4: Moon landing 1969
  41   │         5: Unrelated data
  42   │         6: The first flight of the Wright Brothers 1903
  43   │         7: The signing of the Magna Carta 1215
  44   │         8: Some other information
  45   │         9: The fall of the Berlin Wall 1989
  46   │         """,
  47   │         author=repo.actors["John Doe"],
  48   │         commit_message="Add historical events",
  49   │     )
  50   │     seagoat = Engine(repo.working_dir)
  51   │     seagoat.analyze_codebase()
  52   │     my_query = "UNRELATED"
  53   │     seagoat.query(my_query)
  54   │     await seagoat.fetch()
  55   │
  56   │     assert seagoat.get_results()[0].path == "events.txt"
  57   │     assert set(seagoat.get_results()[0].get_lines(my_query)) == {5}
  58   │
  59   │
  60   │ @pytest.mark.asyncio
  61   │ async def test_respects_file_extension_restrictions(repo):
  62   │     repo.add_file_change_commit(
  63   │         file_name="rock.mp3",
  64   │         contents="19",
  65   │         author=repo.actors["John Doe"],
  66   │         commit_message="Add music file",
  67   │     )
  68   │     seagoat = Engine(repo.working_dir)
  69   │     seagoat.analyze_codebase()
  70   │     my_query = "19"
  71   │     seagoat.query(my_query)
  72   │     await seagoat.fetch()
  73   │
  74   │     assert "rock.mp3" not in [result.path for result in seagoat.get_results()]
  75   │
  76   │
  77   │ @pytest.mark.parametrize(
  78   │     "context_above,context_below,expected_lines",
  79   │     [
  80   │         (0, 0, {4, 6, 9}),
  81   │         (2, 0, {2, 3, 4, 5, 6, 7, 8, 9}),
  82   │         (0, 2, {4, 5, 6, 7, 8, 9, 10, 11}),
  83   │         (2, 3, {2, 3, 4, 5, 6, 7, 8, 9, 10, 11}),
  84   │     ],
  85   │ )
  86   │ @pytest.mark.asyncio
  87   │ async def test_includes_context_lines_properly(
  88   │     repo, context_above, context_below, expected_lines
  89   │ ):
  90   │     repo.add_file_change_commit(
  91   │         file_name="events.txt",
  92   │         contents="""1: Nothing
  93   │         2: Battle of Waterloo 1815
  94   │         3:
  95   │         4: Moon landing 1969
  96   │         5: Unrelated data
  97   │         6: The first flight of the Wright Brothers 1903
  98   │         7: The signing of the Magna Carta 1215
  99   │         8: Some other information
 100   │         9: The fall of the Berlin Wall 1989
 101   │         10: Random event
 102   │         11: Another unrelated data
 103   │         """,
 104   │         author=repo.actors["John Doe"],
 105   │         commit_message="Add historical events",
 106   │     )
 107   │     seagoat = Engine(repo.working_dir)
 108   │     seagoat.analyze_codebase()
 109   │     my_query = "19"
 110   │     seagoat.query(my_query)
 111   │     seagoat.fetch_sync(context_above=context_above, context_below=context_below)
 112   │
 113   │     assert seagoat.get_results()[0].path == "events.txt"
 114   │     assert set(seagoat.get_results()[0].get_lines(my_query)) == expected_lines
 115   │
 116   │
 117   │ @pytest.mark.asyncio
 118   │ async def test_does_not_cause_an_error_when_an_unanalized_file_is_found(repo):
 119   │     repo.add_file_change_commit(
 120   │         file_name="another_file.txt",
 121   │         contents="""asdf""",
 122   │         author=repo.actors["Alice Smith"],
 123   │         commit_message="",
 124   │     )
 125   │     seagoat = Engine(repo.working_dir)
 126   │     seagoat.analyze_codebase()
 127   │     repo.add_file_change_commit(
 128   │         file_name="events.txt",
 129   │         contents="""1: Nothing
 130   │         2: Battle of Waterloo 1815
 131   │         3:
 132   │         4: Moon landing 1969
 133   │         5: Unrelated data
 134   │         6: The first flight of the Wright Brothers 1903
 135   │         7: The signing of the Magna Carta 1215
 136   │         8: Some other information
 137   │         9: The fall of the Berlin Wall 1989
 138   │         """,
 139   │         author=repo.actors["John Doe"],
 140   │         commit_message="Add historical events",
 141   │     )
 142   │     my_query = "19"
 143   │     seagoat.query(my_query)
 144   │     await seagoat.fetch()
 145   │
 146   │     assert seagoat.get_results()[0].path == "events.txt"
 147   │
 148   │
 149   │ @pytest.mark.asyncio
 150   │ async def test_ripgrep_respects_custom_ignore_patterns(repo, create_config_file):
 151   │     create_config_file({"server": {"ignorePatterns": ["**/events.txt"]}})
 152   │
 153   │     repo.add_file_change_commit(
 154   │         file_name="history/files/events.txt",
 155   │         contents="Battle of Waterloo 1815",
 156   │         author=repo.actors["John Doe"],
 157   │         commit_message="commit",
 158   │     )
 159   │
 160   │     repo.add_file_change_commit(
 161   │         file_name="events.txt",
 162   │         contents="Moon landing 1969",
 163   │         author=repo.actors["John Doe"],
 164   │         commit_message="Add historical events",
 165   │     )
 166   │
 167   │     seagoat = Engine(repo.working_dir)
 168   │     seagoat.analyze_codebase()
 169   │     my_query = "1"
 170   │     seagoat.query(my_query)
 171   │     await seagoat.fetch()
 172   │
 173   │     results_files = set(result.path for result in seagoat.get_results())
 174   │     assert "history/files/events.txt" not in results_files
 175   │     assert "events.txt" in results_files
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: tests\test_result.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ from pathlib import Path
   2   │
   3   │ from seagoat.result import Result
   4   │ from tests.conftest import pytest
   5   │
   6   │
   7   │ @pytest.fixture(name="create_result")
   8   │ def create_result_(repo):
   9   │     def noop():
  10   │         pass
  11   │
  12   │     cleanup = {"cleanup": noop}
  13   │
  14   │     def result_factory(fake_lines=None):
  15   │         if fake_lines is None:
  16   │             fake_lines = {}
  17   │         test_file_path = Path(repo.working_dir) / "test.txt"
  18   │         with test_file_path.open("w", encoding="utf-8") as output_file:
  19   │             fake_content = "".join(
  20   │                 f"{fake_lines[i]}\n" if i in fake_lines else f"fake line {i}\n"
  21   │                 for i in range(1, 100)
  22   │             )
  23   │             output_file.write(fake_content)
  24   │
  25   │         result = Result("test.txt", test_file_path)
  26   │         result.add_line(40, 0.5)
  27   │
  28   │         cleanup["cleanup"] = test_file_path.unlink
  29   │
  30   │         return result
  31   │
  32   │     yield result_factory
  33   │
  34   │     cleanup["cleanup"]()
  35   │
  36   │
  37   │ def test_get_lines_without_context(create_result, repo):
  38   │     query = "famous typographic sample text"
  39   │     result = create_result(fake_lines={40: "lorem ipsum"})
  40   │     actual_lines = result.get_lines(query)
  41   │     assert actual_lines == [40]
  42   │     assert result.to_json(query) == {
  43   │         "score": 0.75,
  44   │         "fullPath": str(Path(repo.working_dir) / "test.txt"),
  45   │         "blocks": [
  46   │             {
  47   │                 "lineTypeCount": {"result": 1},
  48   │                 "lines": [
  49   │                     {
  50   │                         "score": 0.5,
  51   │                         "line": 40,
  52   │                         "lineText": "lorem ipsum",
  53   │                         "resultTypes": [
  54   │                             "result",
  55   │                         ],
  56   │                     },
  57   │                 ],
  58   │             }
  59   │         ],
  60   │         "path": "test.txt",
  61   │     }
  62   │
  63   │
  64   │ def test_add_result_twice_when_combining_sources(create_result):
  65   │     result = create_result(fake_lines={40: "lorem ipsum"})
  66   │     result.add_line(40, 0.01)
  67   │     result.lines[40].types.add("context")
  68   │     result.add_line(
  69   │         40, 0.02
  70   │     )  # this should not be the final value because 0.01 is better
  71   │
  72   │     assert result.lines[40].vector_distance == 0.01
  73   │     assert "context" in result.lines[40].types
  74   │
  75   │
  76   │ def test_add_context_above_1(create_result, repo):
  77   │     query = "QueryTest"
  78   │     result = create_result()
  79   │     result.add_context_lines(-1)
  80   │     actual_lines = result.get_lines(query)
  81   │     assert actual_lines == [39, 40]
  82   │     assert result.to_json(query) == {
  83   │         "score": 0.75,
  84   │         "fullPath": str(Path(repo.working_dir) / "test.txt"),
  85   │         "blocks": [
  86   │             {
  87   │                 "lineTypeCount": {"context": 1, "result": 1},
  88   │                 "lines": [
  89   │                     {
  90   │                         "score": 0.0,
  91   │                         "line": 39,
  92   │                         "lineText": "fake line 39",
  93   │                         "resultTypes": [
  94   │                             "context",
  95   │                         ],
  96   │                     },
  97   │                     {
  98   │                         "score": 0.5,
  99   │                         "line": 40,
 100   │                         "lineText": "fake line 40",
 101   │                         "resultTypes": [
 102   │                             "result",
 103   │                         ],
 104   │                     },
 105   │                 ],
 106   │             }
 107   │         ],
 108   │         "path": "test.txt",
 109   │     }
 110   │
 111   │
 112   │ def test_add_context_above_2(create_result, repo):
 113   │     query = "QueryTest"
 114   │     result = create_result()
 115   │     result.add_line(20, 0.5)
 116   │     result.add_context_lines(-1)
 117   │     actual_lines = result.get_lines(query)
 118   │     assert actual_lines == [19, 20, 39, 40]
 119   │     assert result.to_json(query) == {
 120   │         "score": 0.75,
 121   │         "fullPath": str(Path(repo.working_dir) / "test.txt"),
 122   │         "blocks": [
 123   │             {
 124   │                 "lineTypeCount": {"context": 1, "result": 1},
 125   │                 "lines": [
 126   │                     {
 127   │                         "score": 0.0,
 128   │                         "line": 19,
 129   │                         "lineText": "fake line 19",
 130   │                         "resultTypes": [
 131   │                             "context",
 132   │                         ],
 133   │                     },
 134   │                     {
 135   │                         "score": 0.5,
 136   │                         "line": 20,
 137   │                         "lineText": "fake line 20",
 138   │                         "resultTypes": [
 139   │                             "result",
 140   │                         ],
 141   │                     },
 142   │                 ],
 143   │             },
 144   │             {
 145   │                 "lineTypeCount": {"context": 1, "result": 1},
 146   │                 "lines": [
 147   │                     {
 148   │                         "score": 0.0,
 149   │                         "line": 39,
 150   │                         "lineText": "fake line 39",
 151   │                         "resultTypes": [
 152   │                             "context",
 153   │                         ],
 154   │                     },
 155   │                     {
 156   │                         "score": 0.5,
 157   │                         "line": 40,
 158   │                         "lineText": "fake line 40",
 159   │                         "resultTypes": [
 160   │                             "result",
 161   │                         ],
 162   │                     },
 163   │                 ],
 164   │             },
 165   │         ],
 166   │         "path": "test.txt",
 167   │     }
 168   │
 169   │
 170   │ def test_add_context_below_1(create_result, repo):
 171   │     query = "QueryTest"
 172   │     result = create_result()
 173   │     result.add_context_lines(1)
 174   │     actual_lines = result.get_lines(query)
 175   │     assert actual_lines == [40, 41]
 176   │     assert result.to_json(query) == {
 177   │         "score": 0.75,
 178   │         "fullPath": str(Path(repo.working_dir) / "test.txt"),
 179   │         "blocks": [
 180   │             {
 181   │                 "lineTypeCount": {"context": 1, "result": 1},
 182   │                 "lines": [
 183   │                     {
 184   │                         "score": 0.5,
 185   │                         "line": 40,
 186   │                         "lineText": "fake line 40",
 187   │                         "resultTypes": [
 188   │                             "result",
 189   │                         ],
 190   │                     },
 191   │                     {
 192   │                         "score": 0.0,
 193   │                         "line": 41,
 194   │                         "lineText": "fake line 41",
 195   │                         "resultTypes": [
 196   │                             "context",
 197   │                         ],
 198   │                     },
 199   │                 ],
 200   │             },
 201   │         ],
 202   │         "path": "test.txt",
 203   │     }
 204   │
 205   │
 206   │ def test_add_context_below_2(create_result, repo):
 207   │     query = "QueryTest"
 208   │     result = create_result()
 209   │     result.add_line(41, 0.5)
 210   │     result.add_line(42, 0.5)
 211   │     result.add_context_lines(1)
 212   │     actual_lines = result.get_lines(query)
 213   │     assert actual_lines == [40, 41, 42, 43]
 214   │     assert result.to_json(query) == {
 215   │         "score": 0.75,
 216   │         "fullPath": str(Path(repo.working_dir) / "test.txt"),
 217   │         "blocks": [
 218   │             {
 219   │                 "lineTypeCount": {"context": 3, "result": 3},
 220   │                 "lines": [
 221   │                     {
 222   │                         "score": 0.5,
 223   │                         "line": 40,
 224   │                         "lineText": "fake line 40",
 225   │                         "resultTypes": [
 226   │                             "result",
 227   │                         ],
 228   │                     },
 229   │                     {
 230   │                         "score": 0.5,
 231   │                         "line": 41,
 232   │                         "lineText": "fake line 41",
 233   │                         "resultTypes": [
 234   │                             "context",
 235   │                             "result",
 236   │                         ],
 237   │                     },
 238   │                     {
 239   │                         "score": 0.5,
 240   │                         "line": 42,
 241   │                         "lineText": "fake line 42",
 242   │                         "resultTypes": [
 243   │                             "context",
 244   │                             "result",
 245   │                         ],
 246   │                     },
 247   │                     {
 248   │                         "score": 0.0,
 249   │                         "line": 43,
 250   │                         "lineText": "fake line 43",
 251   │                         "resultTypes": [
 252   │                             "context",
 253   │                         ],
 254   │                     },
 255   │                 ],
 256   │             },
 257   │         ],
 258   │         "path": "test.txt",
 259   │     }
 260   │
 261   │
 262   │ @pytest.mark.parametrize(
 263   │     "context_line, expected_lines",
 264   │     [
 265   │         (-2, [38, 39, 40]),
 266   │         (-3, [37, 38, 39, 40]),
 267   │         (2, [40, 41, 42]),
 268   │         (4, [40, 41, 42, 43, 44]),
 269   │     ],
 270   │ )
 271   │ def test_adds_correct_context_lines(create_result, context_line, expected_lines):
 272   │     result = create_result()
 273   │     result.add_context_lines(context_line)
 274   │     actual_lines = result.get_lines("")
 275   │     assert actual_lines == expected_lines
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: seagoat\cache.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ import hashlib
   2   │ import os
   3   │ import pickle
   4   │ from pathlib import Path
   5   │ from typing import Generic
   6   │ from typing import TypeVar
   7   │
   8   │ import appdirs
   9   │
  10   │ T = TypeVar("T")
  11   │
  12   │ # Change this whenever a new version is released that requires files to be
  13   │ # re-analyzed
  14   │ CACHE_FORMAT_VERSION = 24
  15   │
  16   │
  17   │ def get_cache_root():
  18   │     if "RUNNER_TEMP" in os.environ:
  19   │         return Path(os.environ["RUNNER_TEMP"])
  20   │
  21   │     return Path(
  22   │         appdirs.user_cache_dir(
  23   │             "seagoat-pytest" if "PYTEST_CURRENT_TEST" in os.environ else "seagoat"
  24   │         )
  25   │     )
  26   │
  27   │
  28   │ class Cache(Generic[T]):
  29   │     def __init__(self, cache_name: str, path: Path, initial_value: T):
  30   │         self._path = path
  31   │         self.data = initial_value
  32   │         self._cache_name = cache_name
  33   │
  34   │     def load(self):
  35   │         try:
  36   │             with open(self._get_cache_file(), "rb") as cache_file:
  37   │                 self.data = pickle.load(cache_file)
  38   │         except (FileNotFoundError, pickle.UnpicklingError, EOFError):
  39   │             pass
  40   │
  41   │     def persist(self):
  42   │         with open(self._get_cache_file(), "wb") as cache_file:
  43   │             pickle.dump(self.data, cache_file)
  44   │
  45   │     def _get_cache_file(self):
  46   │         return self.get_cache_folder() / self._cache_name
  47   │
  48   │     def get_cache_folder(self):
  49   │         cache_folder = get_cache_root() / self._get_project_hash()
  50   │         cache_folder.mkdir(parents=True, exist_ok=True)
  51   │
  52   │         return cache_folder
  53   │
  54   │     def _get_project_hash(self):
  55   │         normalized_path = Path(self._path).expanduser().resolve()
  56   │         text = f"""
  57   │         Cache version: {CACHE_FORMAT_VERSION}
  58   │         Normalized path: {normalized_path}
  59   │         """
  60   │
  61   │         return hashlib.sha256(text.encode()).hexdigest()
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: seagoat\file.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ import hashlib
   2   │ from typing import Dict
   3   │ from typing import List
   4   │ from typing import Literal
   5   │
   6   │ from seagoat.utils.file_reader import read_file_with_correct_encoding
   7   │
   8   │
   9   │ class File:
  10   │     def __init__(
  11   │         self, path: str, absolute_path: str, score: float, commit_messages: list[str]
  12   │     ):
  13   │         self.path = path
  14   │         self.absolute_path = absolute_path
  15   │         self.commit_hashes = set()
  16   │         self.score = score
  17   │         self.commit_messages = commit_messages
  18   │
  19   │     def __repr__(self):
  20   │         return f"<File {self.path} {self.score}>"
  21   │
  22   │     def add_commit(self, commit_hash: str):
  23   │         self.commit_hashes.add(commit_hash)
  24   │
  25   │     def get_metadata(self):
  26   │         commit_messages = "\n-".join(sorted(self.commit_messages))
  27   │         return f"""###
  28   │     Filename: {self.path}
  29   │     Commits:
  30   │ {commit_messages}"""
  31   │
  32   │     def _get_file_lines(self) -> Dict[int, str]:
  33   │         lines = {
  34   │             (i + 1): line
  35   │             for i, line in enumerate(
  36   │                 read_file_with_correct_encoding(self.absolute_path).splitlines()
  37   │             )
  38   │         }
  39   │
  40   │         return lines
  41   │
  42   │     def _format_chunk_summary(self, relevant_lines: List[str]):
  43   │         truncated_lines = [line[:100] for line in relevant_lines]
  44   │         chunk = "\n".join(truncated_lines)
  45   │         chunk = chunk + self.get_metadata()
  46   │
  47   │         return chunk
  48   │
  49   │     def _get_context_lines(
  50   │         self, lines: Dict[int, str], line_number: int, direction: Literal[-1, 1]
  51   │     ) -> List[str]:
  52   │         context_lines = []
  53   │         for i in range(1, 6):
  54   │             current_line_number = line_number + (direction * i)
  55   │             current_line = lines.get(current_line_number)
  56   │
  57   │             if current_line is None:
  58   │                 break
  59   │
  60   │             if direction == -1:
  61   │                 context_lines = [current_line] + context_lines
  62   │             else:
  63   │                 context_lines.append(current_line)
  64   │
  65   │             if self._line_has_relevant_data(current_line):
  66   │                 break
  67   │
  68   │         return context_lines
  69   │
  70   │     def _get_chunk_for_line(self, line_number: int, lines: Dict[int, str]):
  71   │         relevant_lines = (
  72   │             self._get_context_lines(lines, line_number, -1)
  73   │             + [lines[line_number]]
  74   │             + self._get_context_lines(lines, line_number, 1)
  75   │         )
  76   │         return FileChunk(self, line_number, self._format_chunk_summary(relevant_lines))
  77   │
  78   │     def _line_has_relevant_data(self, line: str):
  79   │         return sum(c.isalnum() for c in line) > 3
  80   │
  81   │     def get_chunks(self):
  82   │         lines = self._get_file_lines()
  83   │         return [
  84   │             self._get_chunk_for_line(line_number, lines)
  85   │             for line_number in lines.keys()
  86   │             if self._line_has_relevant_data(lines[line_number])
  87   │         ]
  88   │
  89   │
  90   │ # pylint: disable=too-few-public-methods
  91   │ class FileChunk:
  92   │     def __init__(self, parent: File, codeline: int, chunk: str):
  93   │         self.path = parent.path
  94   │         self.codeline = codeline
  95   │         self.chunk = chunk
  96   │         self.chunk_id = self._get_id()
  97   │
  98   │     def _get_id(self):
  99   │         text = f"""
 100   │         Path: {self.path}
 101   │         Code line: {self.codeline}
 102   │         Chunk: {self.chunk}
 103   │         """
 104   │         return hashlib.sha256(text.encode()).hexdigest()
 105   │
 106   │     def __repr__(self) -> str:
 107   │         return f"<FileChunk {self.path} {self.codeline}>"
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
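Below is a minimal Python sketch of the chunking that `File.get_chunks` appears to perform in the listing above. Every line with more than three alphanumeric characters becomes its own chunk. Each chunk is padded with up to five neighbouring lines on each side, and padding stops early once a neighbour that is itself "relevant" has been included. The helper names (`has_relevant_data`, `context_window`, `chunk_lines`) are illustrative and are not SeaGOAT's API. The metadata suffix SeaGOAT appends to each chunk is left out. Because almost every non-blank line qualifies, a small repository produces chunks covering nearly all of its lines, which may help explain why results can span the whole repo.

```python
from typing import Dict, List

# Illustrative sketch only; these helpers are not SeaGOAT's actual API.


def has_relevant_data(line: str) -> bool:
    # Mirrors File._line_has_relevant_data: more than 3 alphanumeric characters
    return sum(c.isalnum() for c in line) > 3


def context_window(lines: Dict[int, str], line_number: int, max_distance: int = 5) -> List[str]:
    """Return the line plus up to `max_distance` neighbours above and below it."""
    before: List[str] = []
    for offset in range(1, max_distance + 1):
        neighbour = lines.get(line_number - offset)
        if neighbour is None:  # ran off the start of the file
            break
        before.insert(0, neighbour)  # keep original top-to-bottom order
        if has_relevant_data(neighbour):  # stop after the first meaningful neighbour
            break

    after: List[str] = []
    for offset in range(1, max_distance + 1):
        neighbour = lines.get(line_number + offset)
        if neighbour is None:  # ran off the end of the file
            break
        after.append(neighbour)
        if has_relevant_data(neighbour):
            break

    return before + [lines[line_number]] + after


def chunk_lines(text: str) -> Dict[int, List[str]]:
    """Map each relevant 1-based line number to its context window."""
    lines = {i + 1: line for i, line in enumerate(text.splitlines())}
    return {
        number: context_window(lines, number)
        for number, line in lines.items()
        if has_relevant_data(line)
    }


example = "#this is a Python file\n\n# xd\n\nclass FooBar:\n\n\ndef __init__(self):\n        pass"
chunks = chunk_lines(example)
print(sorted(chunks))  # [1, 5, 8, 9]
```

If you run it on the `example.py` contents from `tests\test_file.py`, it produces chunks for lines 1, 5, 8 and 9. The chunk for line 5 matches the list asserted in `test_ignores_almost_empyt_lines_in_chunks`.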
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: tests\test_file.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ # pylint: disable=redefined-outer-name
   2   │ from pathlib import Path
   3   │
   4   │ import pytest
   5   │
   6   │ from seagoat.file import File
   7   │
   8   │
   9   │ @pytest.fixture
  10   │ def repo_folder(repo):
  11   │     with open(Path(repo.working_dir) / "hello.md", "w", encoding="utf-8") as hello_file:
  12   │         hello_file.write("Hello world!")
  13   │     yield Path(repo.working_dir)
  14   │
  15   │
  16   │ def test_file_returns_global_metadata_1(repo_folder, snapshot):
  17   │     my_file = File(
  18   │         "hello.md",
  19   │         repo_folder / "hello.md",
  20   │         0.543245,
  21   │         [
  22   │             "First commit",
  23   │             "Second commit",
  24   │         ],
  25   │     )
  26   │
  27   │     assert my_file.get_metadata() == snapshot
  28   │
  29   │
  30   │ def test_file_returns_global_metadata_2(repo_folder, snapshot):
  31   │     my_file = File("hello.md", repo_folder / "hello.md", 0.234234, ["unrelated commit"])
  32   │
  33   │     assert my_file.get_metadata() == snapshot
  34   │
  35   │
  36   │ def test_ignores_almost_empyt_lines_in_chunks(repo):
  37   │     repo.add_file_change_commit(
  38   │         file_name="example.py",
  39   │         contents="""#this is a Python file
  40   │
  41   │ # xd
  42   │
  43   │ class FooBar:
  44   │
  45   │
  46   │ def __init__(self):
  47   │         pass""",
  48   │         author=repo.actors["John Doe"],
  49   │         commit_message=".",
  50   │     )
  51   │
  52   │     my_file = File(
  53   │         "example.py", str(Path(repo.working_dir) / "example.py"), 0.234234, []
  54   │     )
  55   │     assert {item.codeline for item in my_file.get_chunks()} == {1, 5, 8, 9}
  56   │     line5 = [item for item in my_file.get_chunks() if item.codeline == 5][0]
  57   │     found_lines = line5.chunk.split("###")[0].splitlines()
  58   │     assert found_lines == [
  59   │         "#this is a Python file",
  60   │         "",
  61   │         "# xd",
  62   │         "",
  63   │         "class FooBar:",
  64   │         "",
  65   │         "",
  66   │         "def __init__(self):",
  67   │     ]
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: tests\test_result_sorting.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ def test_sort_results_test1(create_prepared_seagoat):
   2   │     ripgrep_lines = {
   3   │         "file1.md": [(1, 10.0), (2, 4.0)],
   4   │         "file2.md": [(1, 5.0)],
   5   │     }
   6   │     chroma_lines = {
   7   │         "file2.md": [(2, 6.0)],
   8   │         "file3.md": [(1, 4.5)],
   9   │     }
  10   │     my_query = "fake query"
  11   │
  12   │     seagoat = create_prepared_seagoat(my_query, ripgrep_lines, chroma_lines)
  13   │     results = seagoat.get_results()
  14   │
  15   │     assert [result.path for result in results] == ["file1.md", "file3.md", "file2.md"]
  16   │
  17   │
  18   │ def test_sort_results_test2(create_prepared_seagoat):
  19   │     ripgrep_lines = {
  20   │         "file1.md": [(1, 10.0)],
  21   │         "file2.md": [(1, 15.0)],
  22   │     }
  23   │     chroma_lines = {
  24   │         "file3.md": [(1, 5.0)],
  25   │     }
  26   │     my_query = "fake query"
  27   │
  28   │     seagoat = create_prepared_seagoat(my_query, ripgrep_lines, chroma_lines)
  29   │     results = seagoat.get_results()
  30   │
  31   │     assert [result.path for result in results] == ["file3.md", "file1.md", "file2.md"]
  32   │
  33   │
  34   │ def test_missing_file_in_one_source(create_prepared_seagoat):
  35   │     ripgrep_lines = {
  36   │         "file1.md": [(1, 10.0)],
  37   │         "file2.md": [(1, 5.0)],
  38   │     }
  39   │     chroma_lines = {
  40   │         "file1.md": [(1, 6.0)],
  41   │     }
  42   │     my_query = "fake query"
  43   │
  44   │     seagoat = create_prepared_seagoat(my_query, ripgrep_lines, chroma_lines)
  45   │     results = seagoat.get_results()
  46   │
  47   │     assert [result.path for result in results] == ["file2.md", "file1.md"]
  48   │
  49   │
  50   │ def test_no_lines(create_prepared_seagoat):
  51   │     ripgrep_lines = {}
  52   │     chroma_lines = {}
  53   │     my_query = "fake query"
  54   │
  55   │     seagoat = create_prepared_seagoat(my_query, ripgrep_lines, chroma_lines)
  56   │     results = seagoat.get_results()
  57   │
  58   │     assert results == []
  59   │
  60   │
  61   │ def test_file_edits_influence_order(create_prepared_seagoat, repo):
  62   │     repo.add_file_change_commit(
  63   │         file_name="file_few_edits.md",
  64   │         contents="Some content",
  65   │         author=repo.actors["John Doe"],
  66   │         commit_message="Edit file_few_edits.md",
  67   │     )
  68   │
  69   │     for i in range(10):
  70   │         for j in range(3):
  71   │             repo.add_file_change_commit(
  72   │                 file_name=f"file_with_some_edits_{i}.md",
  73   │                 contents=f"Some content {i} {j}",
  74   │                 author=repo.actors["John Doe"],
  75   │                 commit_message="Edit file_many_edits.md",
  76   │             )
  77   │             repo.tick_fake_date(days=1)
  78   │
  79   │     for i in range(20):
  80   │         repo.add_file_change_commit(
  81   │             file_name="file_many_edits.md",
  82   │             contents=f"Some content {i}",
  83   │             author=repo.actors["John Doe"],
  84   │             commit_message="Edit file_many_edits.md",
  85   │         )
  86   │         repo.tick_fake_date(days=1)
  87   │
  88   │     ripgrep_lines = {
  89   │         "file_few_edits.md": [(1, 5.0)],
  90   │         "file_many_edits.md": [(1, 6.0)],
  91   │     }
  92   │     chroma_lines = {
  93   │         "file_few_edits.md": [(2, 5.0)],
  94   │         "file_many_edits.md": [(1, 6.0)],
  95   │         "random.py": [(1, 60.01)],
  96   │         "things.js": [(1, 160.01)],
  97   │     }
  98   │     my_query = "asdfadsfdfdffdafafdsfadsf"
  99   │
 100   │     seagoat = create_prepared_seagoat(my_query, ripgrep_lines, chroma_lines)
 101   │     seagoat.analyze_codebase()
 102   │     results = seagoat.get_results()
 103   │
 104   │     assert [result.path for result in results][0:2] == [
 105   │         "file_many_edits.md",
 106   │         "file_few_edits.md",
 107   │     ]
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: tests\test_queue.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ from unittest.mock import Mock
   2   │
   3   │ import pytest
   4   │
   5   │ from seagoat.queue.task_queue import TaskQueue
   6   │
   7   │
   8   │ @pytest.fixture(name="task_queue")
   9   │ def task_queue_(repo):
  10   │     return TaskQueue(repo_path=repo.working_dir, minimum_chunks_to_analyze=0)
  11   │
  12   │
  13   │ @pytest.mark.parametrize(
  14   │     "chunks_analyzed, unanalyzed, expected_accuracy",
  15   │     [
  16   │         (0, 0, 100),
  17   │         (1, 999999, 1),
  18   │         (1000, 0, 100),
  19   │         (0, 20, 0),
  20   │         (5, 150, 2),
  21   │         (50, 450, 11),
  22   │         (5, 15, 45),
  23   │         (10, 10, 91),
  24   │         (100, 100, 91),
  25   │         (100_000, 100_001, 91),
  26   │         (15, 5, 99),
  27   │         (150, 5, 99),
  28   │         (150_000, 5, 99),
  29   │     ],
  30   │ )
  31   │ def test_handle_get_stats(task_queue, chunks_analyzed, unanalyzed, expected_accuracy):
  32   │     context = {
  33   │         "seagoat_engine": Mock(),
  34   │     }
  35   │
  36   │     context["seagoat_engine"].cache.data = {
  37   │         "chunks_already_analyzed": set(range(chunks_analyzed)),
  38   │         "chunks_not_yet_analyzed": set(range(unanalyzed)),
  39   │     }
  40   │
  41   │     stats = task_queue.handle_get_stats(context)
  42   │     task_queue.shutdown()
  43   │
  44   │     assert stats["accuracy"]["percentage"] == expected_accuracy
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: seagoat\repository.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ import datetime
   2   │ import math
   3   │ import subprocess
   4   │ from collections import defaultdict
   5   │ from pathlib import Path
   6   │
   7   │ from seagoat.file import File
   8   │ from seagoat.utils.file_types import is_file_type_supported
   9   │
  10   │
  11   │ def parse_commit_info(raw_line: str):
  12   │     commit_hash, date_str, author, commit_subject = raw_line.split(":::", 3)
  13   │
  14   │     commit_date = datetime.datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S %z").date()
  15   │     today = datetime.date.today()
  16   │     days_passed = (today - commit_date).days
  17   │
  18   │     return (commit_hash, days_passed, author, commit_subject)
  19   │
  20   │
  21   │ class Repository:
  22   │     def __init__(self, repo_path: str):
  23   │         self.path = Path(repo_path)
  24   │         self.file_changes = defaultdict(list)
  25   │         self.frecency_scores = {}
  26   │
  27   │     def analyze_files(self):
  28   │         cmd = [
  29   │             "git",
  30   │             "-C",
  31   │             self.path,
  32   │             "log",
  33   │             "--name-only",
  34   │             "--pretty=format:###%h:::%ai:::%an <%ae>:::%s",
  35   │             "--no-merges",
  36   │         ]
  37   │
  38   │         self.file_changes.clear()
  39   │
  40   │         current_commit_info = None
  41   │         with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
  42   │             assert proc.stdout is not None
  43   │             for line in iter(proc.stdout.readline, ""):
  44   │                 line = line.strip()
  45   │                 if ":::" in line:
  46   │                     current_commit_info = parse_commit_info(line)
  47   │                 elif line:
  48   │                     filename = line
  49   │
  50   │                     if not is_file_type_supported(filename):
  51   │                         continue
  52   │
  53   │                     if not (self.path / filename).exists():
  54   │                         continue
  55   │
  56   │                     self.file_changes[filename].append(current_commit_info)
  57   │
  58   │         self._compute_frecency()
  59   │
  60   │     def _compute_frecency(self):
  61   │         self.frecency_scores = {}
  62   │         for file, commits in self.file_changes.items():
  63   │             score = sum(
  64   │                 1 / (math.log(days_passed + 2, 2))
  65   │                 for _, days_passed, __, ___ in commits
  66   │             )
  67   │             self.frecency_scores[file] = score
  68   │
  69   │     def top_files(self):
  70   │         return [
  71   │             (self.get_file(filename), score)
  72   │             for filename, score in sorted(
  73   │                 self.frecency_scores.items(), key=lambda x: x[0][1]
  74   │             )
  75   │         ]
  76   │
  77   │     def get_file(self, filename: str):
  78   │         return File(
  79   │             filename,
  80   │             str(self.path / filename),
  81   │             self.frecency_scores[filename],
  82   │             [commit[3] for commit in self.file_changes[filename]],
  83   │         )
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
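`Repository._compute_frecency` above sums `1 / log2(days_passed + 2)` over a file's commits. Under that formula, a commit made today contributes 1.0, a commit from two days ago contributes 0.5, and a commit from six days ago contributes about 0.33. Note that `top_files` sorts with `key=lambda x: x[0][1]`. Each item is a `(filename, score)` tuple, so that key is the second character of the filename, not the score. The sketch below sorts by score, highest first, on the assumption that this is what "top" is meant to mean. `frecency` and `rank_by_frecency` are hypothetical helpers that take lists of days-since-commit directly instead of parsing `git log`.

```python
import math
from typing import Dict, List, Tuple


def frecency(days_since_commits: List[int]) -> float:
    # Same formula as Repository._compute_frecency: recent commits weigh more
    return sum(1 / math.log(days + 2, 2) for days in days_since_commits)


def rank_by_frecency(file_commits: Dict[str, List[int]]) -> List[Tuple[str, float]]:
    """Return (filename, score) pairs ordered by score, highest first."""
    scores = {name: frecency(days) for name, days in file_commits.items()}
    # Sort on the score (item[1]), not on a character of the filename
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_frecency({"old.py": [30], "busy.py": [0, 1, 2], "new.py": [0]})
print([name for name, _ in ranking])  # ['busy.py', 'new.py', 'old.py']
```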
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: tests\test_source_ripgrep.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ from seagoat.repository import Repository
   2   │ from seagoat.sources.ripgrep import initialize
   3   │ from tests.test_ripgrep import pytest
   4   │
   5   │
   6   │ @pytest.fixture(name="initialize_source")
   7   │ def _initialize_source():
   8   │     def _initalize(repo):
   9   │         path = repo.working_dir
  10   │         source = initialize(Repository(path))
  11   │
  12   │         return source["fetch"]
  13   │
  14   │     return _initalize
  15   │
  16   │
  17   │ def test_fetch_and_initialize(repo, initialize_source):
  18   │     contents = """
  19   │ 234
  20   │ hello foo bar baz
  21   │ hello foo bar baz 23
  22   │
  23   │ 234234
  24   │ 345 adaf
  25   │ 2345234523452345235
  26   │ 2345
  27   │ """
  28   │     repo.add_file_change_commit(
  29   │         file_name="file1.txt",
  30   │         contents=contents,
  31   │         author=repo.actors["John Doe"],
  32   │         commit_message="Initial commit for text file",
  33   │     )
  34   │
  35   │     fetch = initialize_source(repo)
  36   │     fetched_files = fetch("[0-9]{2,10}", limit=400)
  37   │
  38   │     assert len(fetched_files) == 1
  39   │     file = next(iter(fetched_files))
  40   │     assert file.path == "file1.txt"
  41   │     assert set(line for line in file.lines) == {2, 4, 6, 7, 8, 9}
  42   │
  43   │
  44   │ def test_whitespace_is_used_as_or_operator(repo, initialize_source):
  45   │     contents = """
  46   │ 234
  47   │ hello foo bar baz
  48   │ hello foo bar baz 23
  49   │
  50   │ 234234
  51   │ 345 adaf
  52   │ 2345234523452345235
  53   │ 2345
  54   │ baz
  55   │ bar
  56   │ b3
  57   │ """
  58   │     repo.add_file_change_commit(
  59   │         file_name="file1.txt",
  60   │         contents=contents,
  61   │         author=repo.actors["John Doe"],
  62   │         commit_message="Initial commit for text file",
  63   │     )
  64   │
  65   │     fetch = initialize_source(repo)
  66   │     fetched_files = fetch("[0-9]{2,10} baz bar b[0-9]", limit=100)
  67   │
  68   │     assert len(fetched_files) == 1
  69   │     file = next(iter(fetched_files))
  70   │     assert file.path == "file1.txt"
  71   │     assert set(line for line in file.lines) == {2, 3, 4, 6, 7, 8, 9, 10, 11, 12}
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
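`test_whitespace_is_used_as_or_operator` expects a query such as `[0-9]{2,10} baz bar b[0-9]` to match any line that contains any one of its whitespace-separated terms. Assuming the ripgrep source combines terms as regex alternatives, the same behaviour can be reproduced with Python's `re` module. `matching_lines` is a hypothetical helper. Ripgrep's Rust regex syntax happens to coincide with Python's for simple patterns like these, but not in general. One consequence is that adding words to a query widens the set of matching lines rather than narrowing it, which may contribute to broad result lists.

```python
import re
from typing import Set


def matching_lines(query: str, text: str) -> Set[int]:
    """Hypothetical helper: 1-based numbers of lines matching ANY query term."""
    # Wrap each term in a non-capturing group and join them with "|" (regex OR)
    pattern = re.compile("|".join(f"(?:{term})" for term in query.split()))
    return {
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    }


contents = """
234
hello foo bar baz
hello foo bar baz 23

234234
345 adaf
2345234523452345235
2345
baz
bar
b3
"""

print(sorted(matching_lines("[0-9]{2,10}", contents)))  # [2, 4, 6, 7, 8, 9]
print(sorted(matching_lines("[0-9]{2,10} baz bar b[0-9]", contents)))  # [2, 3, 4, 6, 7, 8, 9, 10, 11, 12]
```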
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: seagoat\utils\json_file.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ from pathlib import Path
   2   │
   3   │ import orjson
   4   │
   5   │
   6   │ def get_json_file_contents(file_path: Path):
   7   │     with open(str(file_path), "rb") as file:
   8   │         file_content = file.read()
   9   │         if not file_content:
  10   │             return None
  11   │
  12   │         return orjson.loads(file_content)
  13   │
  14   │
  15   │ def write_to_json_file(file_path: Path, data: dict) -> None:
  16   │     with open(str(file_path), "wb") as file:
  17   │         file.write(orjson.dumps(data))
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: seagoat\utils\file_reader.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ from chardet.universaldetector import UniversalDetector
   2   │
   3   │
   4   │ def read_file_with_correct_encoding(file_path):
   5   │     try:
   6   │         # Try to read the file as UTF-8 first
   7   │         with open(file_path, "r", encoding="utf-8") as file:
   8   │             return file.read()
   9   │     except UnicodeDecodeError:
  10   │         # If UTF-8 reading fails, then detect encoding and read accordingly
  11   │         detector = UniversalDetector()
  12   │         with open(file_path, "rb") as file:
  13   │             for line in file:
  14   │                 detector.feed(line)
  15   │                 if detector.done:
  16   │                     break
  17   │         detector.close()
  18   │         encoding = detector.result["encoding"] or "utf-8"
  19   │
  20   │         with open(file_path, "r", encoding=encoding) as file:
  21   │             return file.read()
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: tests\test_regexp.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ import pytest
   2   │
   3   │ from seagoat.engine import Engine
   4   │
   5   │
   6   │ @pytest.mark.asyncio
   7   │ async def test_simple_regexp(repo):
   8   │     repo.add_file_change_commit(
   9   │         file_name="line_positions.txt",
  10   │         contents="""apple
  11   │         orange apple 0
  12   │         apple banana
  13   │         grape 12
  14   │
  15   │ 9999999
  16   │         """,
  17   │         author=repo.actors["John Doe"],
  18   │         commit_message="Add fruits data",
  19   │     )
  20   │
  21   │     seagoat = Engine(repo.working_dir)
  22   │     seagoat.analyze_codebase()
  23   │
  24   │     my_regex_query = "[0-9]+[0-9]+"
  25   │     seagoat.query(my_regex_query)
  26   │     await seagoat.fetch()
  27   │
  28   │     assert seagoat.get_results()[0].path == "line_positions.txt"
  29   │     assert set(seagoat.get_results()[0].get_lines(my_regex_query)) == {4, 6}
  30   │
  31   │
  32   │ @pytest.mark.asyncio
  33   │ async def test_regexp_combined_with_chroma(repo):
  34   │     repo.add_file_change_commit(
  35   │         file_name="line_positions.txt",
  36   │         contents="""samsung iphone
  37   │         smart apps
  38   │         bicycle 12
  39   │
  40   │         foo
  41   │         bar
  42   │         baz
  43   │ 9999999 apple pie with orange recipe
  44   │ 9999999 banana pie with pear recipe
  45   │ 9999999 kiwi pie with lemon recipe
  46   │
  47   │ 2345 23452345 2345
  48   │ 2345235 23452345 32
  49   │ asdf
  50   │ asdf
  51   │ asdf
  52   │         """,
  53   │         author=repo.actors["John Doe"],
  54   │         commit_message="Add fruits data",
  55   │     )
  56   │
  57   │     seagoat = Engine(repo.working_dir)
  58   │     seagoat.analyze_codebase()
  59   │
  60   │     my_regex_query = "[0-9]+[0-9]+ fruit"
  61   │     seagoat.query(my_regex_query)
  62   │     await seagoat.fetch()
  63   │
  64   │     assert seagoat.get_results()[0].path == "line_positions.txt"
  65   │     assert set(seagoat.get_results()[0].get_lines(my_regex_query)) == {
  66   │         3,
  67   │         8,
  68   │         9,
  69   │         10,
  70   │         12,
  71   │         13,
  72   │     }
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: tests\test_file_reader.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ import os
   2   │ import tempfile
   3   │
   4   │ import pytest
   5   │ from chardet.universaldetector import UniversalDetector
   6   │
   7   │ from seagoat.utils.file_reader import read_file_with_correct_encoding
   8   │
   9   │
  10   │ MISDETECTED_UTF8_EXAMPLES = [
  11   │     # (sequence, detected_encoding),  # detection_confidence
  12   │     (b"\xcc\x8fa\\>N7:A", "Windows-1252"),  # 0.73
  13   │     (b"\xdd\x81nl", "Windows-1254"),  # 0.7729647837244535
  14   │     (b"\xd8\x90Mas", "Windows-1254"),  # 0.6183718269795628
  15   │     (b"t\xd1\x8fXbci\x15\x1bE", "Windows-1254"),  # 0.322068659885189
  16   │     (b"\xde\x9cL&\x1ba\xca\xb4r1", "TIS-620"),  # 0.30841818174528296
  17   │     (b"qdl\t=\xdc\xb4\x10\x0e9\xe9\x9f\x9d\x03", "Windows-1254"),  # 0.38648239186222677
  18   │     (b"#L)Xlg<\xd6\x90x", "Windows-1254"),  # 0.7729647837244535
  19   │     (b"$\xc6\x8f>.", "Windows-1252"),  # 0.73
  20   │     (b"\xdf\x8fO\x15mg$G", "Windows-1254"),  # 0.6870798077550698
  21   │     (b"da\xd4\x9e|", "Windows-1254"),  # 0.5153098558163024
  22   │     (b"\xc7\x8fN.f\x1dgmk`", "Windows-1254"),  # 0.8833883242565184
  23   │     (b"\x02\xc4\xa0\xd4\xba%/", "TIS-620"),  # 0.8095977270813678
  24   │     (b"K\x1f<\xec\x90\xb8", "Windows-1252"),  # 0.73
  25   │     (b"@<L\xd2\x90 ", "Windows-1252"),  # 0.73
  26   │     (b"U \x19\x04\x19\x04\x7f<Iu\xd9\x8dx\x1e", "Windows-1252"),  # 0.73
  27   │     (b"C\x0eju=\xd3\x8em\x15", "Windows-1254"),  # 0.6870798077550698
  28   │     (b"uu\x15\xd0\x90\x7fA", "Windows-1254"),  # 0.6183718269795628
  29   │     (b"\xc5\x8d!x>\t", "Windows-1252"),  # 0.73
  30   │     (b"\xd3\x9d>t", "Windows-1252"),  # 0.73
  31   │     (b"\xc5\x8dTa5", "Windows-1254"),  # 0.5153098558163024
  32   │ ]
  33   │
  34   │
  35   │ @pytest.mark.parametrize("sequence, detected_encoding", MISDETECTED_UTF8_EXAMPLES)
  36   │ def test_file_reader_does_not_crash_because_of_misdetected_utf8(
  37   │     sequence: bytes, detected_encoding: str
  38   │ ):
  39   │     # the fallback encoding is utf8 as that is what most projects are expected to use
  40   │
  41   │     with tempfile.NamedTemporaryFile(delete=False) as temp_file:
  42   │         temp_file.write(sequence)
  43   │
  44   │     try:
  45   │         # Ensure chardet detects the expected encoding
  46   │         # If it does not detect as that encoding, the test case might be useless
  47   │         test_detector = UniversalDetector()
  48   │         with open(temp_file.name, "rb") as file:
  49   │             for line in file:
  50   │                 test_detector.feed(line)
  51   │                 if test_detector.done:
  52   │                     break
  53   │         test_detector.close()
  54   │         detected = test_detector.result["encoding"]
  55   │         assert detected == detected_encoding
  56   │
  57   │         # Now, proceed with the read_file_with_correct_encoding
  58   │         content = read_file_with_correct_encoding(temp_file.name)
  59   │         assert content is not None
  60   │     finally:
  61   │         os.unlink(temp_file.name)
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: seagoat\utils\file_types.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ from pathlib import Path
   2   │ from typing import Union
   3   │
   4   │
   5   │ IGNORED_BRANCHES = {"gh-pages"}
   6   │
   7   │ TEXT_FILE_TYPES = {
   8   │     ".txt",
   9   │     ".md",
  10   │ }
  11   │
  12   │ SUPPORTED_FILE_TYPES = TEXT_FILE_TYPES | {
  13   │     ".py",
  14   │     ".c",
  15   │     ".cc",
  16   │     ".cpp",
  17   │     ".cxx",
  18   │     ".h",
  19   │     ".hpp",
  20   │     ".ts",
  21   │     ".js",
  22   │     ".tsx",
  23   │     ".jsx",
  24   │     ".html",
  25   │     ".go",
  26   │     ".java",
  27   │     ".php",
  28   │     ".rb",
  29   │ }
  30   │
  31   │
  32   │ def is_file_type_supported(path: Union[Path, str]):
  33   │     return Path(path).suffix in SUPPORTED_FILE_TYPES
  34   │
  35   │
  36   │ def get_file_penalty_factor(path: Union[Path, str]) -> float:
  37   │     # Text file lines are penalized compared to code file lines as they
  38   │     # generally have more meaningful words, but the user might be more likely
  39   │     # to be looking for code than text.
  40   │     if Path(path).suffix in TEXT_FILE_TYPES:
  41   │         return 1.5
  42   │
  43   │     return 1.0
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
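The orderings asserted in the first three tests of `tests\test_result_sorting.py` are consistent with a simple rule. Each file is ranked by its single best line across both the ripgrep and Chroma sources, and a lower score is better. `get_file_penalty_factor` above returns 1.5 for `.txt`/`.md` files, and its comment says text lines are penalized. The sketch below assumes the penalty multiplies the score, so text files sink relative to code files with similar raw scores. The listing does not show how `seagoat/result.py` actually combines these values, or how it applies the frecency factor used in `test_file_edits_influence_order`. `rank_files` is therefore a hypothetical illustration only.

```python
from pathlib import Path
from typing import Dict, List, Tuple

TEXT_FILE_TYPES = {".txt", ".md"}

LineScores = Dict[str, List[Tuple[int, float]]]  # path -> [(line, score), ...]


def penalty_factor(path: str) -> float:
    # Same values as get_file_penalty_factor in seagoat/utils/file_types.py
    return 1.5 if Path(path).suffix in TEXT_FILE_TYPES else 1.0


def rank_files(*sources: LineScores) -> List[str]:
    """Hypothetical: order paths by their best (lowest) penalized line score."""
    best: Dict[str, float] = {}
    for source in sources:
        for path, lines in source.items():
            for _, score in lines:
                penalized = score * penalty_factor(path)  # assumption: multiplicative
                best[path] = min(best.get(path, penalized), penalized)
    return sorted(best, key=lambda path: best[path])


ripgrep_lines = {"file1.md": [(1, 10.0), (2, 4.0)], "file2.md": [(1, 5.0)]}
chroma_lines = {"file2.md": [(2, 6.0)], "file3.md": [(1, 4.5)]}
print(rank_files(ripgrep_lines, chroma_lines))  # ['file1.md', 'file3.md', 'file2.md']
```

With the penalty applied, a `.py` line scoring 5.0 would outrank a `.md` line scoring 4.0, because the latter becomes 6.0.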
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: seagoat\utils\wait.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ import time
   2   │
   3   │
   4   │ def wait_for(condition_function, timeout, period=0.05):
   5   │     start_time = time.time()
   6   │     while not condition_function():
   7   │         if time.time() - start_time > timeout:
   8   │             raise TimeoutError("Timeout expired while waiting for condition.")
   9   │         time.sleep(period)
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: docs\overrides\main.html
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ {% extends "base.html" %}
   2   │
   3   │ {% block outdated %}
   4   │   This documentation is for an old version of SeaGOAT.
   5   │   <a href="{{ '../' ~ base_url }}">
   6   │     <strong>Click here to go to latest documentation.</strong>
   7   │   </a>
   8   │ {% endblock %}
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: tests/test_repository.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 154   │     assert set(file.path for file, _ in seagoat2.repository.top_files()) == {
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
 158   │         "file4.js",
 159   │         "file4.md",
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
 161   │
 162   │
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
 178   │         "file4.md",
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
 182   │ def test_only_returns_supported_file_types(repo):
 183   │     seagoat = Engine(repo.working_dir)
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
 185   │         repo.add_file_change_commit(
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: seagoat/repository.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   7   │ from seagoat.file import File
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
  11   │ def parse_commit_info(raw_line: str):
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
  48   │                     filename = line
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
  50   │                     if not is_file_type_supported(filename):
  51   │                         continue
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: seagoat/utils/file_reader.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ from chardet.universaldetector import UniversalDetector
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
   4   │ def read_file_with_correct_encoding(file_path):
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
  12   │         with open(file_path, "rb") as file:
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
  21   │             return file.read()
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: seagoat/result.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  11   │ from seagoat.utils.file_reader import read_file_with_correct_encoding
  12   │ from seagoat.utils.file_types import get_file_penalty_factor
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: tests/test_file_reader.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ import os
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
   4   │ import pytest
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
   7   │ from seagoat.utils.file_reader import read_file_with_correct_encoding
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
  11   │     # (sequence, detected_encoding),  # detection_confidence
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
  14   │     (b"\xd8\x90Mas", "Windows-1254"),  # 0.6183718269795628
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
  25   │     (b"@<L\xd2\x90 ", "Windows-1252"),  # 0.73
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
  30   │     (b"\xd3\x9d>t", "Windows-1252"),  # 0.73
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
  39   │     # the fallback encoding is utf8 as that is what most projects are expected to use
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
  41   │     with tempfile.NamedTemporaryFile(delete=False) as temp_file:
  42   │         temp_file.write(sequence)
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
  47   │         test_detector = UniversalDetector()
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: CHANGELOG.md
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ # CHANGELOG
   2   │
   3   │
   4   │
   5   │ ## v0.35.1 (2023-09-28)
   6   │
   7   │ ### Fix
   8   │
   9   │ * fix: display files correctly when remote server is divergent ([`f852e06`](https://github.com/kantord/SeaGOAT/commit/f852e069193effa387f49566abe31f3f14ee1bb3))
  10   │
  11   │ ### Refactor
  12   │
  13   │ * refactor: extract result filtering to separate functions ([`ea0cf79`](https://github.com/kantord/SeaGOAT/commit/ea0cf79f8019cbb5c1472178f6e8172530a775ca))
  14   │
  15   │
  16   │ ## v0.35.0 (2023-09-28)
  17   │
  18   │ ### Chore
  19   │
  20   │ * chore(deps): update dependency pyright to v1.1.329 ([`b4f81c1`](https://github.com/kantord/SeaGOAT/commit/b4f81c16932bcb9d373b67feb7e517062ffa214b))
  21   │
  22   │ ### Ci
  23   │
  24   │ * ci: set up code coverage (#263) ([`12b4145`](https://github.com/kantord/SeaGOAT/commit/12b41458edf415e147eb0cd793f1676a69afeb74))
  25   │
  26   │ ### Feature
  27   │
  28   │ * feat: support ignoring files that are not gitignored ([`bb6e53f`](https://github.com/kantord/SeaGOAT/commit/bb6e53f64ce61db442b0860855c72717761fe055))
  29   │
  30   │ ### Fix
  31   │
  32   │ * fix: make config file checking more accurate ([`92167b7`](https://github.com/kantord/SeaGOAT/commit/92167b7b8f8c822813df60a627631355077654c1))
  33   │
  34   │ ### Refactor
  35   │
  36   │ * refactor: remove ripgrepy as a dependency ([`b9a5847`](https://github.com/kantord/SeaGOAT/commit/b9a5847db11352caa2e8776c2c3f8e8257ba6b65))
  37   │
  38   │ ### Test
  39   │
  40   │ * test: normalize path in seagoat to support &#39;.&#39;
  41   │
  42   │ tests #125 ([`f035203`](https://github.com/kantord/SeaGOAT/commit/f035203f498d1bc4a9873e306c5f2e96acbf0f75))
  43   │
  44   │ * test: remove redundant sleep() ([`6c48425`](https://github.com/kantord/SeaGOAT/commit/6c484258fab9504d603b07bb587a3fbe77c1bf0d))
  45   │
  46   │ * test: join server processes ([`b63865b`](https://github.com/kantord/SeaGOAT/commit/b63865b62f8d21701a72f1f976444662581c3f96))
  47   │
  48   │
  49   │ ## v0.34.0 (2023-09-27)
  50   │
  51   │ ### Chore
  52   │
  53   │ * chore(deps): update dependency pyright to v1.1.328 ([`03d49bf`](https://github.com/kantord/SeaGOAT/commit/03d49bfd00b2ebc74e778dc03bdef86994634459))
  54   │
  55   │ ### Documentation
  56   │
  57   │ * docs: fix configuration docs link in readme.md ([`0d61332`](https://github.com/kantord/SeaGOAT/commit/0d613325b0e37abfdb7279741e7633d3ea70ca44))
  58   │
  59   │ ### Feature
  60   │
  61   │ * feat: allow cli to connect to a remote server (#262)
  62   │
  63   │ fixes #236 ([`86b12e9`](https://github.com/kantord/SeaGOAT/commit/86b12e9bc8130cd551ef34b9a8974749d1df8259))
  64   │
  65   │
  66   │ ## v0.33.0 (2023-09-26)
  67   │
  68   │ ### Chore
  69   │
  70   │ * chore(deps): update dependency python-semantic-release to v8.1.1 (#258)
  71   │
  72   │ Co-authored-by: renovate[bot] &lt;29139614+renovate[bot]@users.noreply.github.com&gt; ([`b077c39`](https://github.com/kantord/SeaGOAT/commit/b077c39ef8f57db90ad2e5bb59d7d2ff10ee9118))
  73   │
  74   │ ### Feature
  75   │
  76   │ * feat: support config files
  77   │
  78   │ * feat: allow users to create repo-wide configuration
  79   │
  80   │ * feat: support global config files
  81   │
  82   │ * feat: allow overriding global config from repo config
  83   │
  84   │ * feat: allow configuring port via config file ([`6f337ce`](https://github.com/kantord/SeaGOAT/commit/6f337cec49a4b1d242d8d0fa14f1ede78a0b90b1))
  85   │
  86   │
  87   │ ## v0.32.2 (2023-09-25)
  88   │
  89   │ ### Fix
  90   │
  91   │ * fix(deps): update dependency chromadb to v0.4.13 (#257)
  92   │
  93   │ Co-authored-by: renovate[bot] &lt;29139614+renovate[bot]@users.noreply.github.com&gt; ([`3b8fdb5`](https://github.com/kantord/SeaGOAT/commit/3b8fdb516c081effc59cee202226e8bec20ecb97))
  94   │
  95   │
  96   │ ## v0.32.1 (2023-09-25)
  97   │
  98   │ ### Chore
  99   │
 100   │ * chore(deps): update dependency pylint to v2.17.6 (#254)
 101   │
 102   │ Co-authored-by: renovate[bot] &lt;29139614+renovate[bot]@users.noreply.github.com&gt; ([`b04a361`](https://github.com/kantord/SeaGOAT/commit/b04a361bc46978a604526d22f6f1da7bf97cc7b5))
 103   │
 104   │ * chore(deps): update dependency mkdocs-material to v9.4.2 (#253)
 105   │
 106   │ Co-authored-by: renovate[bot] &lt;29139614+renovate[bot]@users.noreply.github.com&gt; ([`559ba1d`](https://github.com/kantord/SeaGOAT/commit/559ba1d4c9b32126cd58639afe3bd4e8cbc2098d))
 107   │
 108   │ ### Fix
 109   │
 110   │ * fix: avoid crashing because of misdetected encoding (#255)
 111   │
 112   │ fixes #250 ([`5d471ea`](https://github.com/kantord/SeaGOAT/commit/5d471eabcb0959762513e72901b35308c7906fa1))
 113   │
 114   │
 115   │ ## v0.32.0 (2023-09-25)
 116   │
 117   │ ### Feature
 118   │
 119   │ * feat: penalize text files compared to code files (#252) ([`c79a6fa`](https://github.com/kantord/SeaGOAT/commit/c79a6fa2e06921fca68781e3d553899effb10829))
 120   │
 121   │
 122   │ ## v0.31.0 (2023-09-24)
 123   │
 124   │ ### Feature
 125   │
 126   │ * feat: include cache folders in server info JSON (#249) ([`3bdc226`](https://github.com/kantord/SeaGOAT/commit/3bdc226f5ec0909ed875f57573c3e9d04b193f2a))
 127   │
 128   │
 129   │ ## v0.30.2 (2023-09-24)
 130   │
 131   │ ### Fix
 132   │
 133   │ * fix: don&#39;t crash when ripgrep finds an uncached file (#248)
 134   │
 135   │ partially or fully fixes #226 ([`4fe3c60`](https://github.com/kantord/SeaGOAT/commit/4fe3c60095240e5a7d7b1d2617ef33c219dd2a36))
 136   │
 137   │
 138   │ ## v0.30.1 (2023-09-24)
 139   │
 140   │ ### Fix
 141   │
 142   │ * fix: avoid crashing when file no longer exists (#247)
 143   │
 144   │ fixes #245 ([`d85231a`](https://github.com/kantord/SeaGOAT/commit/d85231a9647052c773fb7c38dc618b3e46298f81))
 145   │
 146   │
 147   │ ## v0.30.0 (2023-09-24)
 148   │
 149   │ ### Feature
 150   │
 151   │ * feat: detect file encoding to support encodings other than UTF-8
 152   │
 153   │ * Try to ignore binary files
 154   │
 155   │ * Fix typo in README
 156   │
 157   │ * fix: always detect a file encoding
 158   │
 159   │ * test: test that other encodings are supported
 160   │
 161   │ * add FileReader
 162   │
 163   │ * docs: document list of supported character encodings
 164   │
 165   │ ---------
 166   │
 167   │ Co-authored-by: Daniel Kantor &lt;git@daniel-kantor.com&gt; ([`3b889bc`](https://github.com/kantord/SeaGOAT/commit/3b889bc43a5464f457a461b321f3bf851e75d6cc))
 168   │
 169   │
 170   │ ## v0.29.3 (2023-09-23)
 171   │
 172   │ ### Chore
 173   │
 174   │ * chore(deps): update actions/checkout digest to 8ade135 (#242)
 175   │
 176   │ Co-authored-by: renovate[bot] &lt;29139614+renovate[bot]@users.noreply.github.com&gt; ([`6e774aa`](https://github.com/kantord/SeaGOAT/commit/6e774aad53d29eae0bb289554db4d1d6578ee17f))
 177   │
 178   │ ### Fix
 179   │
 180   │ * fix: support Windows file paths (#234) ([`fe11547`](https://github.com/kantord/SeaGOAT/commit/fe11547432240fa6d0dab41894bee34fc054cf6a))
 181   │
 182   │
 183   │ ## v0.29.2 (2023-09-23)
 184   │
 185   │ ### Chore
 186   │
 187   │ * chore(deps): update dependency mkdocs-material to v9.4.1 (#239)
 188   │
 189   │ Co-authored-by: renovate[bot] &lt;29139614+renovate[bot]@users.noreply.github.com&gt; ([`3af3112`](https://github.com/kantord/SeaGOAT/commit/3af311251506f5d99f319b493f20a258e2625e4c))
 190   │
 191   │ ### Fix
 192   │
 193   │ * fix: support commit messages that contain :::
 194   │
 195   │ * Fix exception in repositories with commits containing &#39;:::&#39; in commit message
 196   │
 197   │ Setting [maxsplit](https://docs.python.org/3/library/stdtypes.html#str.split).
 198   │
 199   │ The following exception was thrown:
 200   │
 201   │ ```
 202   │ Exception in thread Thread-1 (_worker_function):
 203   │ Traceback (most recent call last):
 204   │   File &#34;/home/user/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/base_queue.py&#34;, line 76, in _worker_function
 205   │     task = self._task_queue.get(timeout=1)
 206   │            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 207   │   File &#34;/usr/lib/python3.11/queue.py&#34;, line 179, in get
 208   │     raise Empty
 209   │ _queue.Empty
 210   │
 211   │ During handling of the above exception, another exception occurred:
 212   │
 213   │ Traceback (most recent call last):
 214   │   File &#34;/usr/lib/python3.11/threading.py&#34;, line 1038, in _bootstrap_inner
 215   │     self.run()
 216   │   File &#34;/usr/lib/python3.11/threading.py&#34;, line 975, in run
 217   │     self._target(*self._args, **self._kwargs)
 218   │   File &#34;/home/user/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/base_queue.py&#34;, line 81, in _worker_function
 219   │     self.handle_maintenance(context)
 220   │   File &#34;/home/user/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/task_queue.py&#34;, line 50, in handle_maintenance
 221   │     remaining_chunks_to_analyze = context[&#34;seagoat_engine&#34;].analyze_codebase(
 222   │                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 223   │   File &#34;/home/user/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/engine.py&#34;, line 82, in analyze_codebase
 224   │     self.repository.analyze_files()
 225   │   File &#34;/home/user/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/repository.py&#34;, line 46, in analyze_files
 226   │     current_commit_info = parse_commit_info(line)
 227   │                           ^^^^^^^^^^^^^^^^^^^^^^^
 228   │   File &#34;/home/user/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/repository.py&#34;, line 12, in parse_commit_info
 229   │     commit_hash, date_str, author, commit_subject = raw_line.split(&#34;:::&#34;)
 230   │     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 231   │ ValueError: too many values to unpack (expected 4)
 232   │ ```
 233   │
 234   │ * Test commit messages with three or more colons
 235   │
 236   │ * style: fix code style issues
 237   │
 238   │ ---------
 239   │
 240   │ Co-authored-by: Daniel Kantor &lt;git@daniel-kantor.com&gt; ([`2a2df42`](https://github.com/kantord/SeaGOAT/commit/2a2df42cd84ebd4be9484a1c2a2c87903d7304b1))
 241   │
 242   │
 243   │ ## v0.29.1 (2023-09-22)
 244   │
 245   │ ### Chore
 246   │
 247   │ * chore(deps): update dependency mkdocs-material to v9.4.0 (#235)
 248   │
 249   │ Co-authored-by: renovate[bot] &lt;29139614+renovate[bot]@users.noreply.github.com&gt; ([`4b6e74a`](https://github.com/kantord/SeaGOAT/commit/4b6e74a06bf8ff02c0ca2a4e0c4df963d42636ea))
 250   │
 251   │ ### Documentation
 252   │
 253   │ * docs: document why SeaGOAT is not maxing out CPU (#233) ([`2499b6b`](https://github.com/kantord/SeaGOAT/commit/2499b6b2e1965299b1b514dbebd40b47906e198e))
 254   │
 255   │ ### Fix
 256   │
 257   │ * fix(deps): update dependency gitpython to v3.1.37 (#237)
 258   │
 259   │ Co-authored-by: renovate[bot] &lt;29139614+renovate[bot]@users.noreply.github.com&gt; ([`28e3c2d`](https://github.com/kantord/SeaGOAT/commit/28e3c2d21b601feb66cdf808fe0c7f79768cfc04))
 260   │
 261   │ ### Unknown
 262   │
 263   │ * Update README.md (#230)
 264   │
 265   │ fixing typo about Operating Systems ([`51ae32c`](https://github.com/kantord/SeaGOAT/commit/51ae32c845365e64226f49d031769c6b47673bb1))
 266   │
 267   │
 268   │ ## v0.29.0 (2023-09-20)
 269   │
 270   │ ### Feature
 271   │
 272   │ * feat: support .cc and .cxx files
 273   │
 274   │ * added support for alternative C++ extension (cc)
 275   │
 276   │ * modified readme to reflect that .cc extension is supported
 277   │
 278   │ * .cxx for C++ also ([`8ebd516`](https://github.com/kantord/SeaGOAT/commit/8ebd516b456d6655c7ba07b575bfdb19e919c5a5))
 279   │
 280   │
 281   │ ## v0.28.0 (2023-09-20)
 282   │
 283   │ ### Chore
 284   │
 285   │ * chore(deps): update python-semantic-release/python-semantic-release action to v8.1.1 (#219)
 286   │
 287   │ Co-authored-by: renovate[bot] &lt;29139614+renovate[bot]@users.noreply.github.com&gt; ([`1b581e3`](https://github.com/kantord/SeaGOAT/commit/1b581e32546321d2ed04bd8c030b355c1fa3ac01))
 288   │
 289   │ * chore(deps): update dependency mkdocs-material to v9.3.2 (#217)
 290   │
 291   │ Co-authored-by: renovate[bot] &lt;29139614+renovate[bot]@users.noreply.github.com&gt; ([`c444f9e`](https://github.com/kantord/SeaGOAT/commit/c444f9edfd496872b80d1e4d665596d77e5db8b3))
 292   │
 293   │ ### Documentation
 294   │
 295   │ * docs: fix URL for Bat (#221) ([`55c3ab3`](https://github.com/kantord/SeaGOAT/commit/55c3ab3bb3f2130e4b213656cc27b906dd31279d))
 296   │
 297   │ * docs: add notice about me looking for a job
 298   │
 299   │ * Update README.md
 300   │
 301   │ * docs: small grammar fix ([`05b4805`](https://github.com/kantord/SeaGOAT/commit/05b4805765b293b0c3b8a2caa14c3124da113bf0))
 302   │
 303   │ ### Feature
 304   │
 305   │ * feat: support more programming languages
 306   │
 307   │ this is an empty commit to trigger a release
 308   │ after this malformed commit message:
 309   │ https://github.com/kantord/SeaGOAT/commit/5b33c3eff26e6d8c157c6cac6d2524fc9bc8f06a ([`634b129`](https://github.com/kantord/SeaGOAT/commit/634b129f2e38ed58c57697842cfa08299bd4d07c))
 310   │
 311   │ ### Unknown
 312   │
 313   │ * Support more programming languages (#223)
 314   │
 315   │ * feat: extend set of supported languages
 316   │
 317   │ To support at least this list:
 318   │ https://huggingface.co/datasets/code_search_net#languages
 319   │
 320   │ * docs: document list of supported languages ([`5b33c3e`](https://github.com/kantord/SeaGOAT/commit/5b33c3eff26e6d8c157c6cac6d2524fc9bc8f06a))
 321   │
 322   │
 323   │ ## v0.27.2 (2023-09-20)
 324   │
 325   │ ### Fix
 326   │
 327   │ * fix(deps): update dependency chromadb to v0.4.12 ([`73f7826`](https://github.com/kantord/SeaGOAT/commit/73f78265aad6c75974697c300a67f6352d425902))
 328   │
 329   │
 330   │ ## v0.27.1 (2023-09-19)
 331   │
 332   │ ### Documentation
 333   │
 334   │ * docs: document query API ([`08f8cee`](https://github.com/kantord/SeaGOAT/commit/08f8cee0998ecd86ce55f19a235ad5c577297a25))
 335   │
 336   │ ### Fix
 337   │
 338   │ * fix(deps): update dependency chromadb to v0.4.11 ([`a2f494c`](https://github.com/kantord/SeaGOAT/commit/a2f494ceaad9481f8aa22d9fd37c03e3c53d4aa0))
 339   │
 340   │
 341   │ ## v0.27.0 (2023-09-18)
 342   │
 343   │ ### Feature
 344   │
 345   │ * feat: include isRunning for server-info ([`023ea85`](https://github.com/kantord/SeaGOAT/commit/023ea85ca4f7c40b5c27fb609b45a3612e2e661f))
 346   │
 347   │
 348   │ ## v0.26.0 (2023-09-18)
 349   │
 350   │ ### Feature
 351   │
 352   │ * feat: allow getting list of servers as JSON ([`21ff638`](https://github.com/kantord/SeaGOAT/commit/21ff638f517f64256460ca37c9c09dc20898dbec))
 353   │
 354   │
 355   │ ## v0.25.1 (2023-09-17)
 356   │
 357   │ ### Chore
 358   │
 359   │ * chore(deps): update dependency pyright to v1.1.327 ([`ffcc519`](https://github.com/kantord/SeaGOAT/commit/ffcc519342fef9cffabb783b0887dc823b84c234))
 360   │
 361   │ ### Fix
 362   │
 363   │ * fix(deps): update dependency nest-asyncio to v1.5.8 ([`7483c99`](https://github.com/kantord/SeaGOAT/commit/7483c99540d364bb4023a5280307a2223f442853))
 364   │
 365   │ ### Refactor
 366   │
 367   │ * refactor: use a single file for all server info ([`76471bb`](https://github.com/kantord/SeaGOAT/commit/76471bb9f22245868e27c9d48bda5c0ff34677eb))
 368   │
 369   │
 370   │ ## v0.25.0 (2023-09-13)
 371   │
 372   │ ### Feature
 373   │
 374   │ * feat: make scores rounded to 4 digits ([`80c4ec2`](https://github.com/kantord/SeaGOAT/commit/80c4ec2b6800b09714a32480af83e7a38fef596f))
 375   │
 376   │ * feat: include score for result lines ([`868d01f`](https://github.com/kantord/SeaGOAT/commit/868d01fa7f25b01255beeb1207f77327e152f632))
 377   │
 378   │ * feat: include score in results ([`2cde673`](https://github.com/kantord/SeaGOAT/commit/2cde67396b7e97302091883dbe8bf68169811aec))
 379   │
 380   │
 381   │ ## v0.24.0 (2023-09-13)
 382   │
 383   │ ### Feature
 384   │
 385   │ * feat: make grep vs chroma results more balanced ([`c802358`](https://github.com/kantord/SeaGOAT/commit/c8023586ffcd92d0c0d3b74aa00e819f615ad03d))
 386   │
 387   │
 388   │ ## v0.23.6 (2023-09-13)
 389   │
 390   │ ### Documentation
 391   │
 392   │ * docs: update macos call to action in README ([`ad53cad`](https://github.com/kantord/SeaGOAT/commit/ad53cad2013fb4b822c01f6183b4fcfd004b57d4))
 393   │
 394   │ ### Fix
 395   │
 396   │ * fix: avoid crashing when there are no results
 397   │
 398   │ test: test what happens when there are no results ([`49d28a2`](https://github.com/kantord/SeaGOAT/commit/49d28a2cd768633868199400673c70ba2095e6e6))
 399   │
 400   │
 401   │ ## v0.23.5 (2023-09-12)
 402   │
 403   │ ### Fix
 404   │
 405   │ * fix(deps): update dependency setuptools to v68.2.2 ([`45c69a0`](https://github.com/kantord/SeaGOAT/commit/45c69a07de8f050c90ec981b4af16214eee20d11))
 406   │
 407   │
 408   │ ## v0.23.4 (2023-09-12)
 409   │
 410   │ ### Fix
 411   │
 412   │ * fix(deps): update dependency gitpython to v3.1.36 ([`48c9a18`](https://github.com/kantord/SeaGOAT/commit/48c9a181d8955fbc4de333b1797b404322edf823))
 413   │
 414   │ * fix(deps): update dependency chromadb to v0.4.10 ([`837443e`](https://github.com/kantord/SeaGOAT/commit/837443ef8c648ebc7e0736e822a564d422897f4f))
 415   │
 416   │
 417   │ ## v0.23.3 (2023-09-12)
 418   │
 419   │ ### Ci
 420   │
 421   │ * ci: run all tests on Mac OS ([`68cb84a`](https://github.com/kantord/SeaGOAT/commit/68cb84a6c02d07dc667688485de43099d08f93fc))
 422   │
 423   │ * ci: run more tests for Mac OS ([`b3f8406`](https://github.com/kantord/SeaGOAT/commit/b3f84065210dd844536c2a843e96d6616d480b51))
 424   │
 425   │ ### Fix
 426   │
 427   │ * fix: fix tests in mac os ([`9f215fd`](https://github.com/kantord/SeaGOAT/commit/9f215fdafa5b8ef33e8e8eaddeed70d14c70b74a))
 428   │
 429   │
 430   │ ## v0.23.2 (2023-09-11)
 431   │
 432   │ ### Chore
 433   │
 434   │ * chore(deps): update dependency mkdocs-material to v9.3.0 ([`fa540bc`](https://github.com/kantord/SeaGOAT/commit/fa540bc7576e61b72bad09b35fe58045d534fdac))
 435   │
 436   │ * chore(deps): update dependency black to v23.9.1 ([`d414da7`](https://github.com/kantord/SeaGOAT/commit/d414da7d2219f6bb72d6d212f3a76495723cb366))
 437   │
 438   │ ### Fix
 439   │
 440   │ * fix(deps): update dependency setuptools to v68.2.1 ([`54cfc12`](https://github.com/kantord/SeaGOAT/commit/54cfc128c2fc66764724ae9f1701ea868bd10540))
 441   │
 442   │
 443   │ ## v0.23.1 (2023-09-10)
 444   │
 445   │ ### Fix
 446   │
 447   │ * fix: use a Queue type that works on Mac OS ([`d6d6761`](https://github.com/kantord/SeaGOAT/commit/d6d67616aaf7012c8019477541f3b56ab971f3be))
 448   │
 449   │
 450   │ ## v0.23.0 (2023-09-10)
 451   │
 452   │ ### Chore
 453   │
 454   │ * chore(deps): update dependency black to v23.9.0 ([`8b82efc`](https://github.com/kantord/SeaGOAT/commit/8b82efc3dcafa16c5471e6fcf6bc50fa2f111d9a))
 455   │
 456   │ ### Documentation
 457   │
 458   │ * docs: add info about system requirements ([`4cf71fa`](https://github.com/kantord/SeaGOAT/commit/4cf71faeb8738e1a01a48bc2560a2628c83da585))
 459   │
 460   │ * docs: add titles to slideshow gif ([`c581e36`](https://github.com/kantord/SeaGOAT/commit/c581e3660938c9f9bbb0654a0b039f891f73208b))
 461   │
 462   │ * docs: use a slideshow for the demo gif ([`0e8c510`](https://github.com/kantord/SeaGOAT/commit/0e8c5107e098c676915b153305ef8bc91e772cb3))
 463   │
 464   │ * docs: change gif theme ([`a9b76ad`](https://github.com/kantord/SeaGOAT/commit/a9b76adc4c21418fbf5709ae9440d71841412af6))
 465   │
 466   │ * docs: improve gif quality
 467   │
 468   │ .
 469   │
 470   │ .
 471   │
 472   │ docs: update asciinema cast
 473   │
 474   │ docs: update dmoe gif ([`3c7a96a`](https://github.com/kantord/SeaGOAT/commit/3c7a96abcfa57aee14dcbd1c32b665622c98b677))
 475   │
 476   │ ### Feature
 477   │
 478   │ * feat: use waitress as an HTTP server ([`16b31c2`](https://github.com/kantord/SeaGOAT/commit/16b31c244056d29d6d90d5e6c6b4348c53cffe9e))
 479   │
 480   │
 481   │ ## v0.22.1 (2023-09-08)
 482   │
 483   │ ### Fix
 484   │
 485   │ * fix(deps): update dependency orjson to v3.9.7 ([`43b963c`](https://github.com/kantord/SeaGOAT/commit/43b963c98fd5c5d24a5571c4abfea4948bbc4185))
 486   │
 487   │
 488   │ ## v0.22.0 (2023-09-08)
 489   │
 490   │ ### Documentation
 491   │
 492   │ * docs: fix too long lines in SECURITY.md ([`73c8d0b`](https://github.com/kantord/SeaGOAT/commit/73c8d0b3977c0ecd9459b32014215d2ad24f206a))
 493   │
 494   │ ### Feature
 495   │
 496   │ * feat: make regular expressions case insensitive ([`868c5f5`](https://github.com/kantord/SeaGOAT/commit/868c5f528b4f1f5cfbbdd047d5bd823a3d97b9db))
 497   │
 498   │ ### Unknown
 499   │
 500   │ * Create SECURITY.md ([`013af0d`](https://github.com/kantord/SeaGOAT/commit/013af0d4b6a5408a5bcf35b44aa242d000430c3f))
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
1017   │ ### Fix
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
1019   │ * fix(deps): update dependency tqdm to v4.65.2 ([`1e09ac9`](https://github.com/kantord/SeaGOAT/commit/1e09ac9064231ef44614035754e6e976f017784b))
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
1021   │
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
1025   │
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
1213   │ ## v0.7.3 (2023-07-30)
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
1227   │
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
1231   │
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
1233   │
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
1235   │ ## v0.7.1 (2023-07-30)
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
1280   │ * docs: add usage documentation to Readme ([`7cc15fa`](https://github.com/kantord/SeaGOAT/commit/7cc15fa7981da761e85eb5605aafdbfd6492e4cc))
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
1282   │ * docs: allow editing documentation files ([`6a08d63`](https://github.com/kantord/SeaGOAT/commit/6a08d636c3205213eda3b37cdc38a503befe72ee))
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
1284   │ * docs: set up repo_url ([`2202f16`](https://github.com/kantord/SeaGOAT/commit/2202f16a1e5ec91e64c8334fc5ecf8c95a612708))
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
1288   │ ### Feature
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
1476   │ * chore: release new version ([`dc29537`](https://github.com/kantord/SeaGOAT/commit/dc295372787583cd8825899c951568f630558a11))
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
1490   │ * chore: set up basic test framework ([`107fb7f`](https://github.com/kantord/SeaGOAT/commit/107fb7f351a15664a24b60b7ee02e8944f05162c))
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
1494   │ * ci: auto format yaml files ([`d01e233`](https://github.com/kantord/SeaGOAT/commit/d01e2331520e0ccbb91a23284dca1fa9df3a3f30))
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
1496   │ * ci: add markdown linting ([`45a7c1e`](https://github.com/kantord/SeaGOAT/commit/45a7c1ec2a95c75125dd73c99d5a9b8febeee1a4))
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
1498   │ * ci: test on osx ([`4e22195`](https://github.com/kantord/SeaGOAT/commit/4e22195d180156fa636205725ca076a54aaab064))
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: seagoat/file.py
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   4   │ from typing import Literal
 ...   │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 8< ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
   6   │ from seagoat.utils.file_reader import read_file_with_correct_encoding
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: README.md
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ <!-- markdownlint-disable MD033 -->
   2   │
   3   │ <h1>
   4   │   <p align="center">
   5   │     <img src="assets/logo-small.png" alt="Logo" width="200"/>
   6   │     <font size="8"><b>SeaGOAT</b></font>
   7   │   </p>
   8   │ </h1>
   9   │
  10   │ A code search engine for the AI age. SeaGOAT is a local search tool that
  11   │ leverages vector embeddings to enable to search your codebase semantically.
  12   │
  13   │ <p align="center">
  14   │   <img src="assets/demo-slideshow.gif" alt="" />
  15   │ </p>
  16   │
  17   │ *Note: I was recently laid off my job and I am looking for new
  18   │ opportunities. If you need a Senior Full Stack Developer,
  19   │ [contact me](mailto:github@daniel-kantor.com)! I have experience with
  20   │ React, Node and Python and I'm located in Spain (European Union).
  21   │ 10+ years in software development professionally.*
  22   │
  23   │ ## Getting started
  24   │
  25   │ ### Install SeaGOAT
  26   │
  27   │ In order to install SeaGOAT, you need to have the following
  28   │ dependencies already installed on your computer:
  29   │
  30   │ - Python 3.11 or newer
  31   │ - ripgrep
  32   │ - [bat](https://github.com/sharkdp/bat) (**optional**, highly recommended)
  33   │
  34   │ When `bat` is [installed](https://github.com/sharkdp/bat#on-ubuntu-using-apt),
  35   │ it is used to display results as long as color is enabled. When SeaGOAT is
  36   │ used as part of a pipeline, a grep-line output format is used. When color is
  37   │ enabled, but `bat` is not installed, SeaGOAT will highlight the output using
  38   │ pygments. Using `bat` is recommended.
  39   │
  40   │ To install SeaGOAT using `pipx`, use the following command:
  41   │
  42   │ ```bash
  43   │ pipx install seagoat
  44   │ ```
  45   │
  46   │ ### System requirements
  47   │
  48   │ #### Hardware
  49   │
  50   │ Should work on any decent laptop.
  51   │
  52   │ #### Operating system
  53   │
  54   │ SeaGOAT is designed to work on Linux (*tested* ✅),
  55   │ macOS ([partly tested, **help**](https://github.com/kantord/SeaGOAT/issues/178) 🙏)
  56   │ and Windows ([**help needed**](https://github.com/kantord/SeaGOAT/issues/179) 🙏).
  57   │
  58   │ ### Start SeaGOAT server
  59   │
  60   │ In order to use SeaGOAT in your project, you have to start the SeaGOAT server
  61   │ using the following command:
  62   │
  63   │ ```bash
  64   │ seagoat-server start /path/to/your/repo
  65   │ ```
  66   │
  67   │ ### Search your repository
  68   │
  69   │ If you have the server running, you can simply use the
  70   │ `gt` or `seagoat` command to query your repository. For example:
  71   │
  72   │ ```bash
  73   │ gt "Where are the numbers rounded"
  74   │ ```
  75   │
  76   │ You can also use
  77   │ [Regular Expressions](https://en.wikipedia.org/wiki/Regular_expression)
   78   │ in your queries. For example:
  79   │
  80   │ ```bash
  81   │ gt "function calc_.* that deals with taxes"
  82   │ ```
  83   │
  84   │ ### Stopping the server
  85   │
  86   │ You can stop the running server using the following command:
  87   │
  88   │ ```bash
  89   │ seagoat-server stop /path/to/your/repo
  90   │ ```
  91   │
  92   │ ### Configuring SeaGOAT
  93   │
  94   │ SeaGOAT can be tailored to your needs through YAML configuration files,
  95   │ either globally or project-specifically with a `.seagoat.yml` file.
  96   │ For instance:
  97   │
  98   │ ```yaml
  99   │ # .seagoat.yml
 100   │
 101   │ server:
 102   │   port: 31134  # Specify server port
 103   │ ```
 104   │
 105   │ [Check out the documentation](https://kantord.github.io/SeaGOAT/latest/configuration/)
 106   │ for more details!
 107   │
 108   │ ## Development
 109   │
 110   │ **Requirements**:
 111   │
 112   │ - [Poetry](https://python-poetry.org/)
 113   │ - Python 3.11 or newer
 114   │ - [ripgrep](https://github.com/BurntSushi/ripgrep)
 115   │
 116   │ ### Install dependencies
 117   │
 118   │ After cloning the repository, install dependencies using the following command:
 119   │
 120   │ ```bash
 121   │ poetry install
 122   │ ```
 123   │
 124   │ ### Running tests
 125   │
 126   │ #### Watch mode (recommended)
 127   │
 128   │ ```bash
 129   │ poetry run ptw
 130   │ ```
 131   │
 132   │ #### Test changed files
 133   │
 134   │ ```bash
 135   │ poetry run pytest .  --testmon
 136   │ ```
 137   │
 138   │ #### Test all files
 139   │
 140   │ ```bash
 141   │ poetry run pytest .
 142   │ ```
 143   │
 144   │ ### Manual testing
 145   │
 146   │ You can test any SeaGOAT command manually in your local development
 147   │ environment. For example to test the development version of the
 148   │ `seagoat-server` command, you can run:
 149   │
 150   │ ```bash
 151   │ poetry run seagoat-server start ~/path/an/example/repository
 152   │ ```
 153   │
 154   │ ## FAQ
 155   │
 156   │ The points in this FAQ are indications of how SeaGOAT works, but are not
 157   │ a legal contract. SeaGOAT is licensed under an open source license and if you
 158   │ are in doubt about the privacy/safety/etc implications of SeaGOAT, you are
 159   │ welcome to examine the source code,
 160   │ [raise your concerns](https://github.com/kantord/SeaGOAT/issues/new),
 161   │ or create a pull request to fix a problem.
 162   │
 163   │ ### How does SeaGOAT work? Does it send my data to ChatGPT?
 164   │
 165   │ SeaGOAT does not rely on 3rd party APIs or any remote APIs and executes all
 166   │ functionality locally using the SeaGOAT server that you are able to run on
 167   │ your own machine.
 168   │
 169   │ Instead of relying on APIs or "connecting to ChatGPT", it uses the vector
 170   │ database called ChromaDB, with a local vector embedding engine and
 171   │ telemetry disabled by default.
 172   │
 173   │ Apart from that, SeaGOAT also uses ripgrep, a regular-expression based code
  174   │ search engine, in order to provide regular expression/keyword based matches
 175   │ in addition to the "AI-based" matches.
 176   │
 177   │ While the current version of SeaGOAT does not send your data to remote
 178   │ servers, it might be possible that in the future there will be **optional**
 179   │ features that do so, if any further improvement can be gained from that.
 180   │
 181   │ ### Why does SeaGOAT need a server?
 182   │
 183   │ SeaGOAT needs a server in order to provide a speedy response. SeaGOAT heavily
 184   │ relies on vector embeddings and vector databases, which at the moment cannot
  185   │ be replaced with an architecture that processes files on the fly.
 186   │
 187   │ It's worth noting that *you are able to run SeaGOAT server entirely locally*,
 188   │ and it works even if you don't have an internet connection. This use case
  189   │ does not require you to share data with a remote server: you can use your
  190   │ own SeaGOAT server locally, although it's also possible to run a SeaGOAT
  191   │ server and allow other computers to connect to it, if you so wish.
 192   │
 193   │ ### Does SeaGOAT create AI-derived work? Is SeaGOAT ethical?
 194   │
  195   │ If you are concerned about the ethical implications of using AI tools, keep in
 196   │ mind that SeaGOAT is not a code generator but a code search engine, therefore
 197   │ it does not create AI derived work.
 198   │
 199   │ That being said, a language model *is* being used to generate vector
 200   │ embeddings. At the moment SeaGOAT uses ChromaDB's default model for
 201   │ calculating vector embeddings, and I am not aware of this being an ethical
 202   │ concern.
 203   │
  204   │ ### What programming languages are supported?
 205   │
 206   │ Currently SeaGOAT is hardcoded to only process files in the following
 207   │ formats:
 208   │
 209   │ - **Text Files** (`*.txt`)
 210   │ - **Markdown** (`*.md`)
 211   │ - **Python** (`*.py`)
 212   │ - **C** (`*.c`, `*.h`)
 213   │ - **C++** (`*.cpp`, `*.cc`, `*.cxx`, `*.hpp`)
 214   │ - **TypeScript** (`*.ts`, `*.tsx`)
 215   │ - **JavaScript** (`*.js`, `*.jsx`)
 216   │ - **HTML** (`*.html`)
 217   │ - **Go** (`*.go`)
 218   │ - **Java** (`*.java`)
 219   │ - **PHP** (`*.php`)
 220   │ - **Ruby** (`*.rb`)
 221   │
 222   │ ### Why is SeaGOAT processing files so slowly while barely using my CPU?
 223   │
 224   │ Since processing files for large repositories can take a long time, SeaGOAT
 225   │ is **designed to allow you to use your computer while processing files**. It is
 226   │ an intentional design choice to avoid blocking/slowing down your computer.
 227   │
 228   │ This design decision does not affect the performance of queries.
 229   │
 230   │ **By the way, you are able to use SeaGOAT to query your repository while
 231   │ it's processing your files!** When you make a query, and the files are not
  232   │ processed yet, you will receive a warning with an estimate of the accuracy
 233   │ of your results. Also, regular expression/full text search based results
 234   │ will be displayed from the very beginning!
 235   │
 236   │ ### What character encodings are supported?
 237   │
 238   │ The preferred character encoding is UTF-8. Most other character encodings
  239   │ should also work. Only text files are supported; SeaGOAT ignores binary files.
 240   │
  241   │ ### Where does SeaGOAT store its database/cache?
 242   │
 243   │ Where SeaGOAT stores databases and cache depends on your operating system.
 244   │ For your convenience, you can use the `seagoat-server server-info`
 245   │ command to find out where these files are stored on your system.
 246   │
 247   │ ### Can I host SeaGOAT server on a different computer?
 248   │
 249   │ Yes, if you would like to use SeaGOAT without having to run the server on
 250   │ the same computer, you can simply self-host SeaGOAT server on a different
 251   │ computer or in the cloud, and
 252   │ [configure](https://kantord.github.io/SeaGOAT/latest/configuration/)
 253   │ the `seagoat`/`gt` command to connect to this remote server through the
 254   │ internet.
 255   │
 256   │ Keep in mind that SeaGOAT itself does not enforce any security as it is
 257   │ primarily designed to run locally. If you have private code that you do not
 258   │ wish to leak, you will have to make sure that only trusted people have
 259   │ access to the SeaGOAT server. This could be done by making it only available
 260   │ through a VPN that only your teammates can access.
 261   │
 262   │ ### Can I ignore files/directories?
 263   │
 264   │ SeaGOAT already ignores all files/directories ignored in your `.gitignore`.
 265   │ If you wish to ignore additional files but keep them in git, you can use the
 266   │ `ignorePatterns` attribute from the server configuration.
 267   │ [Learn more](https://kantord.github.io/SeaGOAT/latest/configuration/)
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
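The README's FAQ lists the extensions SeaGOAT is hardcoded to process, which is why Markdown files such as `README.md` and the `docs` pages are indexed and can appear in query results. A minimal Python sketch of such an extension filter, assuming matching is done on the exact file suffix as listed (the names `SUPPORTED_EXTENSIONS`, `is_supported` and `filter_supported` are hypothetical, not SeaGOAT's actual code):

```python
from pathlib import PurePath

# Extensions copied from the README FAQ list of formats SeaGOAT processes.
# Hypothetical illustration; not SeaGOAT's real implementation.
SUPPORTED_EXTENSIONS = {
    ".txt", ".md", ".py", ".c", ".h", ".cpp", ".cc", ".cxx", ".hpp",
    ".ts", ".tsx", ".js", ".jsx", ".html", ".go", ".java", ".php", ".rb",
}


def is_supported(path: str) -> bool:
    """Return True if the file's suffix is in the README's supported list."""
    return PurePath(path).suffix in SUPPORTED_EXTENSIONS


def filter_supported(paths: list[str]) -> list[str]:
    """Keep only the paths SeaGOAT would process, preserving their order."""
    return [p for p in paths if is_supported(p)]
```

For example, from `README.md`, `docs/server.md`, `seagoat/cli.py`, `assets/logo-small.png` and `Makefile`, only the first three pass the filter.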
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: docs\server.md
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ <!-- markdownlint-disable MD046 -->
   2   │ # SeaGOAT-server
   3   │
    4   │ The seagoat-server is an integral component of the SeaGOAT command-line tool
   5   │ designed to analyze your codebase and create vector embeddings for it.
   6   │
    7   │ While it serves as a backend for the command-line tool, it also allows you to
   8   │ use it through HTTP to build your own SeaGOAT-based applications.
   9   │
  10   │ ## Starting the server
  11   │
  12   │ To boot up the server for a specific repository, use:
  13   │
  14   │ ```bash
  15   │ seagoat-server start <repo_path> [--port=<custom_port>]
  16   │ ```
  17   │
  18   │ * `repo_path`: Path to your Git repository
  19   │ * `--port`: (Optional) Run the server on a specific port
  20   │
  21   │ If you don't specify a custom port, a random port will be assigned to your
  22   │ server. Don't worry, SeaGOAT will still be able to automatically find
  23   │ the server corresponding to a specific repository.
  24   │
  25   │ ## Developing with SeaGOAT-server
  26   │
  27   │ SeaGOAT-server not only serves as a backend for the SeaGOAT command-line tool
  28   │ but also offers developers the capability to integrate its functions to build
  29   │ custom applications.
  30   │
  31   │ ### Retrieving server information
  32   │
  33   │ As SeaGOAT servers only run on one repository at a time, there is a command
  34   │ provided in order to gather information about all running servers, including
  35   │ how to access them through HTTP.
  36   │
  37   │ To get detailed information about all active SeaGOAT servers in JSON format,
  38   │ you can utilize the `server-info` command:
  39   │
  40   │ ```bash
  41   │ seagoat-server server-info
  42   │ ```
  43   │
  44   │ You will receive a response similar to this:
  45   │
  46   │ ```json
  47   │ {
  48   │     "version": "0.5.3",
  49   │     "globalCache": "/home/myuser/.cache/seagoat",
  50   │     "globalConfigFile": "/home/myuser/.config/seagoat/config.yml",
  51   │     "servers": {
  52   │         "/path/to/repository/1": {
  53   │             "cacheLocation": {
  54   │               "chroma": "/home/myuser/.cache/seagoat/bfe8133b9e871ea1c8498a0"
  55   │             },
  56   │             "isRunning": true,
  57   │             "host": "127.0.0.1",
  58   │             "port": "8080",
  59   │             "address": "http://127.0.0.1:8080"
  60   │         },
  61   │         "/path/to/repository/2": {
  62   │             "cacheLocation": {
  63   │               "chroma": "/home/myuser/.cache/seagoat/fbee39c83bd47a75e2f839"
  64   │             },
  65   │             "isRunning": false,
  66   │             "host": "127.0.0.1",
  67   │             "port": "8081",
  68   │             "address": "http://127.0.0.1:8081"
  69   │         }
  70   │     }
  71   │ }
  72   │ ```
  73   │
  74   │ In this output, you can also see information about where databases/caches
  75   │ related to your projects are stored. `globalCache` is the parent folder of
   76   │ all the cache directories, and within each server, you can find an attribute
  77   │ called `cacheLocation` which contains the path to the cache directory for
  78   │ each different type of cache associated with that project.
  79   │
  80   │ If you want to create a configuration file, you can see the path for it
  81   │ in the `globalConfigFile` attribute. This depends on your operating system.
  82   │ You can also create a configuration file for your project. See
  83   │ [the configuration documentation](configuration.md) for more information.
  84   │
  85   │ ### Making queries using the API
  86   │
  87   │ If you want to build an application using SeaGOAT-server, first you need to
  88   │ figure out the address of the server you want to connect to.
  89   │
  90   │ Once you have the address, you can start making queries to it. For instance,
  91   │ this is how you'd make a query using `curl` to the server running on
  92   │ `http://localhost:32835`:
  93   │
  94   │ ```bash
  95   │ curl 'http://localhost:32835/query/example'
  96   │ ```
  97   │
  98   │ You will receive a response similar to this one:
  99   │
 100   │ ```json
 101   │ {
 102   │   "results": [
 103   │     {
 104   │       "path": "tests/conftest.py",
 105   │       "fullPath": "/home/user/repos/SeaGOAT/tests/conftest.py",
 106   │       "score": 0.6,
 107   │       "blocks": [
 108   │       {
 109   │           "lines": [
 110   │             {
 111   │               "score": 0.21,
 112   │               "line": 100,
 113   │               "lineText": "def very_relevant_function():",
 114   │               "resultTypes": [
 115   │                 "result"
 116   │               ]
 117   │             }
 118   │           ],
 119   │           "lineTypeCount": {
 120   │             "result": 1
 121   │           }
  122   │         },
 123   │         {
 124   │           "lines": [
 125   │             {
 126   │               "score": 0.6,
 127   │               "line": 489,
 128   │               "lineText": " contents=(\"hello()\\n\" * (i % 50)),",
 129   │               "resultTypes": [
 130   │                 "result"
 131   │               ]
 132   │             },
 133   │             {
 134   │               "score": 0.84,
 135   │               "line": 490,
 136   │               "lineText": "     return foo * bar",
 137   │               "resultTypes": [
 138   │                 "result"
 139   │               ]
 140   │             }
 141   │           ],
 142   │           "lineTypeCount": {
  143   │             "result": 2
 144   │           }
 145   │         }
 146   │       ]
 147   │     },
 148   │     {
 149   │       "path": "tests/test_cli.py",
 150   │       "fullPath": "/home/user/repos/SeaGOAT/tests/test_cli.py",
 151   │       "score": 0.87,
 152   │       "blocks": [... etc ... ]
 153   │     },
 154   │     ... etc ...
 155   │   ],
 156   │   "version": "0.26.0"
 157   │ }
 158   │ ```
 159   │
 160   │ #### Understanding the response
 161   │
 162   │ The response contains the following information:
 163   │
 164   │ * `version` - This is the version of SeaGOAT being used
 165   │ * `results` - This is an array containing your results
 166   │
 167   │ Each result inside results has the following data:
 168   │
 169   │ * `path` - The (relative) path of the file within the repository.
 170   │ * `fullPath` - The absolute path to the file in your filesystem.
 171   │ * `score` - A number indicating how relevant a result is, smaller is better.
 172   │ * `blocks` - An array of relevant code blocks from this file.
 173   │
 174   │ Within each block you will find:
 175   │
 176   │ * `lines` - An array of line objects containing:
 177   │   * `score` - Relevance score for this line. See `score` above.
 178   │   * `line` - The line number in the file where the result was found.
 179   │   * `lineText` - The actual text content of that line.
 180   │   * `resultTypes` - An array indicating all types of result on this line
 181   │     * `"result"` Means that the line is directly relevant to the query
 182   │     * `"context"` Means that the line was added as a context line
 183   │ * `lineTypeCount` - An object containing a count of all line types within
 184   │   the code block. See `resultTypes` for more.
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: docs\configuration.md
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ # Configuring SeaGOAT
   2   │
   3   │ ## Introduction
   4   │
   5   │ Some features of SeaGOAT can be configured through config files.
   6   │ All configuration files are written in the *YAML* format.
   7   │
   8   │ There are two types of configuration files:
   9   │
  10   │ * **Global configuration files**. Use `seagoat-server server-info` to find the
  11   │ location of this file on your system.
  12   │ [Learn more](server.md#retrieving-server-information).
  13   │ * **Project configuration files**. Located in a file called
   14   │ `.seagoat.yml` in the root folder of your repository.
  15   │
  16   │ Both of these types of configuration files have the exact same format, and
  17   │ your project-wide configuration files are merged with the global
  18   │ configuration. Whenever both your local as well as global configuration
  19   │ files define a value, the local value takes precedence.
  20   │
  21   │ This is an example of a configuration file:
  22   │
  23   │ ```yaml
  24   │ # .seagoat.yml
  25   │
  26   │ server:
  27   │   port: 31134  # A port number to run the server on
  28   │
  29   │   # globs to ignore in addition to .gitignore
  30   │   ignorePatterns:
  31   │     - "**/locales/*" # Ignore all files inside 'locales' directories
  32   │     - "**/*.po"     # Ignore all gettext translation files
  33   │
  34   │ client:
   35   │   # Connect the CLI to a remote server
  36   │   host: https://example.com/seagoat-instance/
  37   │
  38   │ ```
  39   │
  40   │ ## Available configuration options
  41   │
  42   │ ### Server
  43   │
  44   │ Server-related configuration resides under the `server` attribute in your
  45   │ config files.
  46   │
  47   │ The following values can be configured:
  48   │
  49   │ * `port`: The port number the server will run on
  50   │ * `ignorePatterns`: A list of glob patterns to ignore. Keep in mind that all
  51   │ files ignored by `.gitignore` are already ignored. You probably should not
  52   │ need to configure this value. It is only useful if there are some files that
   53   │ you wish to keep in git, but you wish to hide from SeaGOAT.
  54   │ [Learn more about globs](https://en.wikipedia.org/wiki/Glob_(programming))
  55   │
  56   │ ### Client
  57   │
  58   │ Configuration for the CLI (`gt` command) resides under the `client` attribute.
  59   │
  60   │ The following values can be configured:
  61   │
  62   │ * `host`: The URL of the SeaGOAT instance to connect to. This is only
  63   │ needed when you are hosting your SeaGOAT server on a remote computer. *It is
  64   │ recommended to set this value in your project configuration file, so that
  65   │ you are still able to use the local server for different projects.*
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
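`docs/configuration.md` says project configuration is merged with the global configuration and that local values take precedence. A minimal Python sketch of that merge over already-parsed YAML mappings, assuming nested sections such as `server` and `client` are merged key by key and that lists such as `ignorePatterns` are replaced rather than concatenated (the docs do not say which; `merge_config` is hypothetical):

```python
from copy import deepcopy


def merge_config(global_cfg: dict, project_cfg: dict) -> dict:
    """Recursively merge project_cfg over global_cfg; project values win.

    Nested mappings are merged key by key. Any other value, including
    lists like `ignorePatterns`, is replaced by the project value
    (an assumption). Neither input is mutated.
    """
    merged = deepcopy(global_cfg)
    for key, value in project_cfg.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged
```

For instance, a project file that only sets `server.port` keeps the global `ignorePatterns` and `client.host`, while overriding the global port.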
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: docs\usage.md
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ <!-- markdownlint-disable MD046 -->
   2   │ # Usage
   3   │
   4   │ SeaGOAT is a command-line tool designed to assist in querying your codebase.
   5   │ By using technologies such as ChromaDB and ripgrep, it goes beyond direct
   6   │ match searches and uses semantic meaning to quickly find
   7   │ details related to your query.
   8   │
   9   │ !!! info "Only works with Git"
  10   │
  11   │     SeaGOAT takes your Git history into account in order to provide
  12   │     the most useful and relevant results.
  13   │
  14   │ ## Command Usage
  15   │
  16   │ ```bash
  17   │ seagoat <query> [repo_path] [OPTIONS]
  18   │ ```
  19   │
  20   │ !!! note
  21   │     The seagoat CLI queries the SeaGOAT server. If the server is not running,
   22   │     you will be prompted to start the server using the
   23   │     `seagoat-server start {repo_path}` command.
  24   │
  25   │ ## Arguments
  26   │
  27   │ * `query`: This is a required argument.
  28   │ It is the query to be made to the SeaGOAT server.
  29   │ * `repo_path`: This argument is optional, and defaults
  30   │ to the current working directory. It represents the path to the code repository.
  31   │
  32   │ ### Examples
  33   │
  34   │ #### Query current folder
  35   │
  36   │ ```bash
  37   │ seagoat "myQuery"
  38   │ ```
  39   │
  40   │ #### Query specific folder
  41   │
  42   │ ```bash
  43   │ seagoat "myQuery" "/path/to/my/repo"
  44   │ ```
  45   │
  46   │ #### Using Regular Expressions
  47   │
  48   │ One of SeaGOAT's most powerful features is the ability to combine regular expressions
  49   │ with AI-driven vector queries. This synergistic approach narrows down your
  50   │ codebase search using pattern-based regular expressions while leveraging AI
  51   │ to understand the semantic meaning behind your query.
  52   │
  53   │ ```bash
  54   │ seagoat "function db_.* that initializes database"
  55   │ ```
  56   │
  57   │ ## Options
  58   │
  59   │ ### `--no-color`: Disable syntax highlighting
  60   │
  61   │ This is automatically enabled when used as part of a bash pipeline.
  62   │
  63   │ ```bash title="Example"
  64   │ seagoat "myQuery" --no-color
  65   │ ```
  66   │
  67   │ ### `-l, --max-results`: Limit number of result lines
  68   │
  69   │ This limits the number of result lines displayed.
  70   │ Useful if you only care about the best results.
  71   │
  72   │ ```bash title="Example"
  73   │ seagoat "myQuery" --max-results=5
  74   │ ```
  75   │
  76   │ !!! note "SeaGOAT is oriented around code blocks, not individual lines"
  77   │
   78   │     In SeaGOAT, code is displayed in full, continuous blocks rather than
  79   │     individual lines. It'll always show at least one full block,
  80   │     even if your limit is 0.
  81   │
  82   │     If you set a limit, SeaGOAT ensures that complete blocks are shown as long
  83   │     as they fit in your limit. For example, with a 5-line limit,
  84   │     it can show a 3-line and a 2-line block, but not two 3-line blocks.
  85   │
  86   │     Also, this limit only counts the actual code, not any extra context lines
  87   │     you might request.
  88   │
  89   │ ### `--version`: Print version number
  90   │
  91   │ This prints the version of your current SeaGOAT installation.
  92   │
  93   │ ### `-B, --context-above`: Lines of context before each result
  94   │
  95   │ This option allows you to include a specified number of lines
  96   │ of context before each matching result.
  97   │
  98   │ !!! note "Tricky context lines"
  99   │
 100   │     Context lines are lines that are added because they are adjacent to a
 101   │     result line.
 102   │
 103   │     That being said, because lines are grouped into chunks of 3,
 104   │     results based on vector embeddings might already contain lines that might
 105   │     not be strictly related to the query.
 106   │
 107   │     This might make it appear like there are more context lines than you
 108   │     requested. Consider this when deciding how many context lines to include.
 109   │
 110   │ ```bash title="Example"
 111   │ seagoat "myQuery" --context-above=5
 112   │ ```
 113   │
 114   │ ### `-A, --context-below`: Lines of context after each result
 115   │
 116   │ This option allows you to include a specified number of lines of context after
 117   │ each matching result.
 118   │
 119   │ ```bash title="Example"
 120   │ seagoat "myQuery" --context-below=5
 121   │ ```
 122   │
 123   │ !!! note "Tricky context lines"
 124   │
 125   │     Context lines are lines that are added because they are adjacent to a
 126   │     result line.
 127   │
 128   │     That being said, because lines are grouped into chunks of 3,
 129   │     results based on vector embeddings might already contain lines that might
 130   │     not be strictly related to the query.
 131   │
 132   │     This might make it appear like there are more context lines than you
 133   │     requested. Consider this when deciding how many context lines to include.
 134   │
 135   │ ### `-C, --context`: Lines of context both before and after each result
 136   │
 137   │ This option sets both `--context-above` and `--context-below` to the same
 138   │ specified value. This is useful if you want an equal amount of context around
 139   │ each matching result.
 140   │
 141   │ ```bash title="Example"
 142   │ seagoat "myQuery" --context=5
 143   │ ```
 144   │
 145   │ !!! note "Tricky context lines"
 146   │
 147   │     Context lines are lines that are added because they are adjacent to a
 148   │     result line.
 149   │
 150   │     That being said, because lines are grouped into chunks of 3,
 151   │     results based on vector embeddings might already contain lines that might
 152   │     not be strictly related to the query.
 153   │
 154   │     This might make it appear like there are more context lines than you
 155   │     requested. Consider this when deciding how many context lines to include.
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
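The `--max-results` note in `docs/usage.md` describes a block-oriented limit: whole blocks are kept while they fit, at least one block is always shown, and context lines are not counted. A minimal Python sketch of that rule, assuming blocks arrive in ranking order and that selection stops at the first block that does not fit (the doc does not say whether later, smaller blocks are still tried; `select_blocks` is hypothetical):

```python
def select_blocks(block_sizes: list[int], max_lines: int) -> list[int]:
    """Return the indices of the blocks to display under a result-line limit.

    block_sizes holds the number of result lines (context lines excluded)
    in each block, in ranking order. The first block is always shown,
    even when max_lines is 0.
    """
    selected: list[int] = []
    used = 0
    for index, size in enumerate(block_sizes):
        if selected and used + size > max_lines:
            break  # stop at the first block that would not fit entirely
        selected.append(index)
        used += size
    return selected
```

With blocks of 3 and 2 lines and a 5-line limit, both are selected; with two 3-line blocks, only the first is, matching the doc's example.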
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: CODE_OF_CONDUCT.md
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ # Contributor Covenant Code of Conduct
   2   │
   3   │ ## Our Pledge
   4   │
   5   │ We as members, contributors, and leaders pledge to make participation in our
   6   │ community a harassment-free experience for everyone, regardless of age, body
   7   │ size, visible or invisible disability, ethnicity, sex characteristics, gender
   8   │ identity and expression, level of experience, education, socio-economic status,
   9   │ nationality, personal appearance, race, religion, or sexual identity
  10   │ and orientation.
  11   │
  12   │ We pledge to act and interact in ways that contribute to an open, welcoming,
  13   │ diverse, inclusive, and healthy community.
  14   │
  15   │ ## Our Standards
  16   │
  17   │ Examples of behavior that contributes to a positive environment for our
  18   │ community include:
  19   │
  20   │ * Demonstrating empathy and kindness toward other people
  21   │ * Being respectful of differing opinions, viewpoints, and experiences
  22   │ * Giving and gracefully accepting constructive feedback
  23   │ * Accepting responsibility and apologizing to those affected by our mistakes,
  24   │   and learning from the experience
  25   │ * Focusing on what is best not just for us as individuals, but for the
  26   │   overall community
  27   │
  28   │ Examples of unacceptable behavior include:
  29   │
  30   │ * The use of sexualized language or imagery, and sexual attention or
  31   │   advances of any kind
  32   │ * Trolling, insulting or derogatory comments, and personal or political attacks
  33   │ * Public or private harassment
  34   │ * Publishing others' private information, such as a physical or email
  35   │   address, without their explicit permission
  36   │ * Other conduct which could reasonably be considered inappropriate in a
  37   │   professional setting
  38   │
  39   │ ## Enforcement Responsibilities
  40   │
  41   │ Community leaders are responsible for clarifying and enforcing our standards of
  42   │ acceptable behavior and will take appropriate and fair corrective action in
  43   │ response to any behavior that they deem inappropriate, threatening, offensive,
  44   │ or harmful.
  45   │
  46   │ Community leaders have the right and responsibility to remove, edit, or reject
  47   │ comments, commits, code, wiki edits, issues, and other contributions that are
  48   │ not aligned to this Code of Conduct, and will communicate reasons for moderation
  49   │ decisions when appropriate.
  50   │
  51   │ ## Scope
  52   │
  53   │ This Code of Conduct applies within all community spaces, and also applies when
  54   │ an individual is officially representing the community in public spaces.
  55   │ Examples of representing our community include using an official e-mail address,
  56   │ posting via an official social media account, or acting as an appointed
  57   │ representative at an online or offline event.
  58   │
  59   │ ## Enforcement
  60   │
  61   │ Instances of abusive, harassing, or otherwise unacceptable behavior may be
  62   │ reported to the community leaders responsible for enforcement at
  63   │ <github@daniel-kantor.com>.
  64   │ All complaints will be reviewed and investigated promptly and fairly.
  65   │
  66   │ All community leaders are obligated to respect the privacy and security of the
  67   │ reporter of any incident.
  68   │
  69   │ ## Enforcement Guidelines
  70   │
  71   │ Community leaders will follow these Community Impact Guidelines in determining
  72   │ the consequences for any action they deem in violation of this Code of Conduct:
  73   │
  74   │ ### 1. Correction
  75   │
  76   │ **Community Impact**: Use of inappropriate language or other behavior deemed
  77   │ unprofessional or unwelcome in the community.
  78   │
  79   │ **Consequence**: A private, written warning from community leaders, providing
  80   │ clarity around the nature of the violation and an explanation of why the
  81   │ behavior was inappropriate. A public apology may be requested.
  82   │
  83   │ ### 2. Warning
  84   │
  85   │ **Community Impact**: A violation through a single incident or series
  86   │ of actions.
  87   │
  88   │ **Consequence**: A warning with consequences for continued behavior. No
  89   │ interaction with the people involved, including unsolicited interaction with
  90   │ those enforcing the Code of Conduct, for a specified period of time. This
  91   │ includes avoiding interactions in community spaces as well as external channels
  92   │ like social media. Violating these terms may lead to a temporary or
  93   │ permanent ban.
  94   │
  95   │ ### 3. Temporary Ban
  96   │
  97   │ **Community Impact**: A serious violation of community standards, including
  98   │ sustained inappropriate behavior.
  99   │
 100   │ **Consequence**: A temporary ban from any sort of interaction or public
 101   │ communication with the community for a specified period of time. No public or
 102   │ private interaction with the people involved, including unsolicited interaction
 103   │ with those enforcing the Code of Conduct, is allowed during this period.
 104   │ Violating these terms may lead to a permanent ban.
 105   │
 106   │ ### 4. Permanent Ban
 107   │
 108   │ **Community Impact**: Demonstrating a pattern of violation of community
 109   │ standards, including sustained inappropriate behavior,  harassment of an
 110   │ individual, or aggression toward or disparagement of classes of individuals.
 111   │
 112   │ **Consequence**: A permanent ban from any sort of public interaction within
 113   │ the community.
 114   │
 115   │ ## Attribution
 116   │
 117   │ This Code of Conduct is adapted from the [Contributor Covenant][homepage],
 118   │ version 2.0, available at
 119   │ <https://www.contributor-covenant.org/version/2/0/code_of_conduct.html>.
 120   │
 121   │ Community Impact Guidelines were inspired by [Mozilla's code of conduct
 122   │ enforcement ladder](https://github.com/mozilla/diversity).
 123   │
 124   │ [homepage]: https://www.contributor-covenant.org
 125   │
 126   │ For answers to common questions about this code of conduct, see the FAQ at
 127   │ <https://www.contributor-covenant.org/faq>. Translations are available at
 128   │ <https://www.contributor-covenant.org/translations>.
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: SECURITY.md
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ # Security Policy
   2   │
   3   │ ## Supported Versions
   4   │
   5   │ SeaGOAT uses semantic release to automatically release new versions
   6   │ automatically. Therefore there aren't multiple versions of SeaGOAT
   7   │ simultaneously maintained and to have the most secure version, you should be
   8   │ always using the latest version.
   9   │
  10   │ New versions are released automatically if one of SeaGOAT's dependencies
  11   │ releases a security upgrade.
  12   │
  13   │ ## Reporting a Vulnerability
  14   │
  15   │ Report vulnerabilities to <github@daniel-kantor.com>
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: docs\index.md
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ <!-- markdownlint-disable -->
   2   │ {!../README.md!}
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: docs/index.md
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   2   │ {!../README.md!}

@GautierT

GautierT commented Sep 29, 2023

I'm on Windows with SeaGOAT 0.35.1.

@kantord
Owner

kantord commented Sep 29, 2023

It seems like it's giving you some entire files. Only a few files are truncated to show just some of their lines. That is weird.

It's not necessarily that weird that it gives you all files, though. It's designed so that the best results appear at the top. Still, it looks like the filtering should be a bit stricter.
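To illustrate the trade-off described above, here is a minimal Python sketch of ranking results best-first and then dropping weak matches. It is not SeaGOAT's actual implementation. `Result`, `filter_results`, `min_ratio`, `limit`, and the file names and scores below are all hypothetical. The sketch also assumes relevance scores between 0 and 1, where higher means more relevant.

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical result type: a file path plus a relevance score in [0, 1],
    # higher = more relevant (not SeaGOAT's real data model).
    path: str
    score: float


def filter_results(results, min_ratio=0.5, limit=None):
    """Sort results best-first and drop weak matches.

    A result is kept only if its score is at least ``min_ratio`` times the
    best score, so raising ``min_ratio`` makes the filter stricter.
    ``limit`` optionally caps how many of the kept results are returned.
    """
    if not results:
        return []
    ranked = sorted(results, key=lambda r: r.score, reverse=True)
    cutoff = ranked[0].score * min_ratio  # assumes non-negative scores
    kept = [r for r in ranked if r.score >= cutoff]
    return kept if limit is None else kept[:limit]


# Made-up scores for illustration only.
results = [
    Result("CODE_OF_CONDUCT.md", 0.21),
    Result("SECURITY.md", 0.18),
    Result("main.py", 0.92),
    Result("utils.py", 0.64),
]

for r in filter_results(results, min_ratio=0.5):
    print(r.path, r.score)
```

With `min_ratio=0.5` the cutoff is 0.46, so only `main.py` and `utils.py` survive and the documentation files are filtered out. A lax threshold (a `min_ratio` near 0) would let nearly every file through, which matches the "whole repo printed" symptom on a small repo. `limit` plays a role similar to showing only the first N results.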
