Minimally invasive changes to improve search performance. #439
Recently I have been using this tool on an almost daily basis for work. It saves me so much time, but parts search performance has been frustrating. I am excited about #348 and want to see it merged in some form, but I noticed the following comment from @chmorgan:
So I thought it would be informative to see how far I could get with more traditional SQLite indexes. I was able to get 10x-100x speedups for a lot of common queries, like searching for a particular part number or package, but didn't make much headway with pure full-text search of the description field. This is where I think the work of @chmorgan would really shine. The changes in this PR are minimally invasive and would work on every KiCad version we currently support, and the FTS5 index could be enabled where available to make searching the description field as fast as the other queries.
Benchmark
Most of the following queries took over 1.5s on my machine. The time for all of them was similar, because SQLite was doing a full table scan each time.
[...] 3ms
[...] .2ms
[...] `74LV` takes `7ms`
[...] `100nf`) and a footprint (e.g. `0603`) takes `208ms`. (There are a lot of 0603 components!)
[...] `10uf`) with a category (like `Capacitors`) takes `26ms`
[...] `10uf` again but with no category specified) takes `368ms`. This is still faster than the baseline because it is able to ignore out-of-stock parts.
In summary, everything is pretty fast as long as you have at least one hard constraint to narrow down the results. I was not able to get full-text search working using standard indexes. We need FTS5 for that part.
File Size
Adding the indexes increases the zip file download size from 226.9MB to 333MB (an increase of about 100MB, or 46%). Personally this is not an issue whatsoever for me, but it is a significant increase nonetheless.
Something we might want to consider is creating the indexes on the client after downloading rather than on the server. In my experience it only took about 30-60 seconds. This would let us customize which indexes, and even which tables, we create depending on the client's capabilities. For example, we could create an FTS5 table if it is supported.
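A minimal sketch of what client-side index creation could look like, assuming Python's built-in `sqlite3` module. The table name `parts`, the column names, and the index and function names are placeholders for illustration, not the schema or code used in this PR.

```python
import sqlite3


def fts5_available() -> bool:
    """Probe whether this SQLite build can create FTS5 virtual tables."""
    probe = sqlite3.connect(":memory:")
    try:
        probe.execute("CREATE VIRTUAL TABLE probe USING fts5(body)")
        return True
    except sqlite3.OperationalError:
        # Typically "no such module: fts5" on builds without the extension.
        return False
    finally:
        probe.close()


def build_indexes(conn: sqlite3.Connection) -> bool:
    """Create an ordinary index, plus an FTS5 table where supported.

    Returns True if an FTS5 table is available afterwards. All table,
    column, and index names here are hypothetical.
    """
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_parts_footprint "
        'ON parts ("Footprint" COLLATE NOCASE)'
    )
    use_fts = fts5_available()
    already_built = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE name = 'parts_fts'"
    ).fetchone()
    if use_fts and not already_built:
        conn.execute("CREATE VIRTUAL TABLE parts_fts USING fts5(description)")
        # Copy the description text into the full-text index once.
        conn.execute(
            "INSERT INTO parts_fts (rowid, description) "
            'SELECT rowid, "Description" FROM parts'
        )
    conn.commit()
    return use_fts
```

Calling `build_indexes(conn)` once after the download finishes would be enough in this sketch; clients without FTS5 simply end up with the ordinary index only.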
I would like to test how file size compares between downloading a premade SQLite database vs a compressed CSV file. I have a hunch the plain text might be more efficient, since SQLite is probably designed with fixed byte offsets for fast querying. If so, this might be an additional optimization to pursue which would "pay for" the time it takes to create the index client-side. But it's just a hypothesis and I could easily be wrong.
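One way to check this hunch might be a small script like the one below. It is only a sketch: it uses gzip rather than the zip container of the real download (both use DEFLATE, so the comparison should be roughly indicative), it reads everything into memory, and the function name and the table name passed to it are placeholders.

```python
import csv
import gzip
import io
import sqlite3
from typing import Tuple


def compressed_sizes(db_path: str, table: str) -> Tuple[int, int]:
    """Return (gzip size of the SQLite file, gzip size of a CSV dump) in bytes."""
    with open(db_path, "rb") as f:
        db_size = len(gzip.compress(f.read()))

    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(f'SELECT * FROM "{table}"')
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow([col[0] for col in cur.description])  # header row
        writer.writerows(cur)  # one CSV line per table row
    finally:
        conn.close()

    csv_size = len(gzip.compress(buf.getvalue().encode("utf-8")))
    return db_size, csv_size
```

Running it against a database without the extra indexes would show whether the compressed CSV is actually smaller.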
How to test this PR
I haven't yet touched the automatic database generation code, and I'm lost with GitHub Actions. I guess it would just be a matter of adding a few lines of SQL. But for now you will need to enter an SQLite shell and run a few commands to create the indexes:
Changes I had to make
[...] `74LV` would miss parts like `SN74LV`. That is just what it takes to make the index work. If you want full-text search, you have to type it in the query field and take the hit.
Tricks I learned along the way
[...] `LIKE` query will use an index if certain conditions, like case-sensitivity and not being a number, are met. Most of my work involved running lots of example queries through the SQLite shell with `.eqp on` so I could see what the query planner was thinking and tweak things until an index was used as expected.
[...] `LIKE` optimization doesn't work if the string to be compared contains only numbers (e.g. `"Footprint" LIKE "0603%"`). So I detect those digit-only strings and fall back to `"Footprint" = "0603"`. If someone knows a better way to handle this I'd love to hear about it.
[...] `"Field" = "{p}" COLLATE NOCASE`
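The query-planner check and the digit-only fallback can be sketched in Python roughly as follows. This assumes the built-in `sqlite3` module; the `parts` table, its columns, the index names, and the `condition_for` helper are hypothetical and are not taken from this PR's code. `EXPLAIN QUERY PLAN` is the statement that the shell's `.eqp on` runs for you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: table, column, and index names are placeholders.
conn.execute('CREATE TABLE parts ("Part" TEXT, "Footprint" TEXT)')
# With the default case-insensitive LIKE, the index needs NOCASE collation.
conn.execute('CREATE INDEX idx_part ON parts ("Part" COLLATE NOCASE)')
conn.execute('CREATE INDEX idx_footprint ON parts ("Footprint" COLLATE NOCASE)')


def query_plan(sql: str) -> list:
    """Return the 'detail' column of EXPLAIN QUERY PLAN for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def condition_for(column: str, term: str):
    """Build a (sql_fragment, parameter) pair for one search term.

    Digit-only terms fall back to an exact, case-insensitive match;
    everything else becomes a prefix LIKE. The column name is assumed
    to come from trusted code, not from user input.
    """
    if term.isascii() and term.isdigit():
        return f'"{column}" = ? COLLATE NOCASE', term
    return f'"{column}" LIKE ?', term + "%"


prefix_plan = query_plan("SELECT * FROM parts WHERE \"Part\" LIKE 'SN74LV%'")
exact_plan = query_plan(
    "SELECT * FROM parts WHERE \"Footprint\" = '0603' COLLATE NOCASE"
)
print(prefix_plan)  # look for the index name rather than a full-table SCAN
print(exact_plan)
print(condition_for("Footprint", "0603"))
print(condition_for("Part", "SN74LV"))
```

The SQLite documentation on the LIKE optimization describes the restriction on patterns that begin with a digit as applying when the indexed column does not have TEXT affinity, so the declared column type may be worth checking before relying on the fallback.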