Add an option for dealing with files with very long lines, fixes #13
dmitmel committed Jan 4, 2022
1 parent a01cfec commit f83773e
Showing 3 changed files with 29 additions and 8 deletions.
25 changes: 17 additions & 8 deletions README.md
@@ -113,14 +113,21 @@ end

_Default:_ `100`

Advanced option. See the section [Indexing](#indexing).
Optimization option. See the section [Indexing](#indexing-and-how-to-optimize-it).


### indexing_batch_size (type: number)

_Default:_ `1000`

Advanced option. See the section [Indexing](#indexing).
Optimization option. See the section [Indexing](#indexing-and-how-to-optimize-it).


### max_indexed_line_length (type: number)

_Default:_ `1024 * 40` (40 Kilobytes)

Optimization option. See the section [Indexing](#indexing-and-how-to-optimize-it).


## Locality bonus comparator (distance-based sorting)
@@ -149,7 +156,7 @@ cmp.setup({
```


## Indexing
## Indexing and how to optimize it

When a buffer is opened, this source first has to scan all lines in the buffer, match all words
and store all of their occurrences. This process is called _indexing_. When actually editing the
@@ -174,15 +181,17 @@ the indexer to the "synchronous" mode: this will process all lines in one go, ta
total (since no other code will be running on the Lua thread), but with the obvious downside that
the editor UI will be blocked.

The option `max_indexed_line_length` controls the plugin's behavior in files with very long lines.
Such lines are known to slow this source down significantly (see issue [#13](https://github.com/hrsh7th/cmp-buffer/issues/13)),
so by default only the first `1024 * 40` characters of each line are indexed. In other
words, very long lines are not ignored, but only a part of each one is indexed.
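
As a minimal sketch of how this option might be set, assuming the usual nvim-cmp convention of passing per-source settings through the `option` table of a source entry (the value `1024 * 10` is an arbitrary illustration, not a recommendation):

```lua
-- Sketch only: assumes nvim-cmp and cmp-buffer are installed and that
-- the buffer source is registered under the name 'buffer'.
local cmp = require('cmp')

cmp.setup({
  sources = {
    {
      name = 'buffer',
      option = {
        -- Illustrative value: index at most the first 10 * 1024
        -- characters of every line instead of the default 40 * 1024.
        max_indexed_line_length = 1024 * 10,
      },
    },
  },
})
```

A smaller limit reduces the indexing work done on each very long line, at the cost of not offering completions for words that appear only past the cutoff.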

### Performance on large text files

This source has been tested on code files of a few megabytes in size (5-10) and contains
optimizations for them, however, the indexed words can still take up tens of megabytes of RAM if
the file is large. It also currently has troubles on files with very long lines, see issue
[#13](https://github.com/hrsh7th/cmp-buffer/issues/13).

So, if you wish to avoid accidentally running this source on big files, you can tweak
`get_bufnrs`, for example like this:
the file is large. So, if you wish to avoid accidentally running this source on big files, you
can tweak `get_bufnrs`, for example like this:

```lua
get_bufnrs = function()
10 changes: 10 additions & 0 deletions lua/cmp_buffer/buffer.lua
@@ -305,6 +305,16 @@ function buffer.index_line(self, linenr, line)
local word_i = 1

local remaining = line
-- The if statement checks the number of bytes in the line string, but slices
-- it on the number of characters. This is not a problem because the number
-- of characters is always equal to (if only ASCII characters are used) or
-- smaller than (if multibyte Unicode characters are used) the number of bytes.
-- In other words, if the line contains more characters than the max limit,
-- then it will always contain more bytes than the same limit.
-- This check is here because calling a Vimscript function is relatively slow.
if #remaining > self.opts.max_indexed_line_length then
remaining = vim.fn.strcharpart(line, 0, self.opts.max_indexed_line_length)
end
while #remaining > 0 do
-- NOTE: Both start and end indexes here are 0-based (unlike Lua strings),
-- and the end index is not inclusive.
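
The reasoning in the comment above (a cheap byte-length check guarding a slower character-based slice) can be illustrated outside of Neovim. The following is a sketch under stated assumptions, not the plugin's code: `truncate_chars` is a hypothetical helper that stands in for `vim.fn.strcharpart` by counting UTF-8 code points with the standard string library only, which approximates but does not exactly reproduce Vim's notion of a character.

```lua
-- Hypothetical helper, not part of cmp-buffer: returns `line` cut down to at
-- most `max_chars` UTF-8 code points. Assumes `line` is valid UTF-8.
local function truncate_chars(line, max_chars)
  -- Fast path: the byte count is an upper bound on the character count, so a
  -- line with at most `max_chars` bytes can never exceed the character limit.
  if #line <= max_chars then
    return line
  end
  local chars = 0
  for i = 1, #line do
    local b = line:byte(i)
    -- Bytes 0x80..0xBF are UTF-8 continuation bytes; any other byte starts
    -- a new character.
    if b < 0x80 or b > 0xBF then
      chars = chars + 1
      if chars > max_chars then
        -- Byte i starts the first character over the limit; cut before it.
        return line:sub(1, i - 1)
      end
    end
  end
  -- More bytes than the limit, but not more characters: keep the whole line.
  return line
end

-- "\195\169" is the two-byte UTF-8 encoding of U+00E9.
local e = "\195\169"
print(#truncate_chars(("a"):rep(100), 10))  -- ASCII: bytes equal characters
print(#truncate_chars(e:rep(5), 3))         -- multibyte: cut at a character boundary
```

The last branch shows why the byte check is only a pre-filter: a line of multibyte characters may pass the `#line > limit` test and still come back unchanged from the character-based slice.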
2 changes: 2 additions & 0 deletions lua/cmp_buffer/source.lua
@@ -6,6 +6,7 @@ local buffer = require('cmp_buffer.buffer')
---@field public get_bufnrs fun(): number[]
---@field public indexing_batch_size number
---@field public indexing_interval number
---@field public max_indexed_line_length number

---@type cmp_buffer.Options
local defaults = {
@@ -16,6 +17,7 @@ local defaults = {
end,
indexing_batch_size = 1000,
indexing_interval = 100,
max_indexed_line_length = 1024 * 40,
}

local source = {}
