Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimally invasive changes to improve search performance. #439

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

whmountains
Copy link
Collaborator

@whmountains whmountains commented Apr 9, 2024

Recently I have been using this tool on an almost daily basis for work. It saves me so much time, but parts search performance has been frustrating. I am excited about #348 and want to see it merged in some form, but I noticed the following comment from @chmorgan:

I did fix a number of performance issues, and introduced a few other changes that trimmed some off the database size. There are however a few corner case searches that take a moment longer, and the database is larger than the non fts5 one.

So I thought it would be informative/useful to see how far I could get with more traditional sqlite indexes. I was able to get 10x-100x speedups for a lot of common queries like searching for a particular part number or package, but didn't make much headway with pure fulltext search of the description field. This is where I think the work of @chmorgan would really shine. The changes in this PR are minimally invasive and would work on every KiCAD version we currently support. But the ft5 index could be enabled where available to make searching the description field as fast as the other queries.

Benchmark

Most of the following queries took over 1.5s on my machine. The time for all of them was similar, because SQLite was doing a full table scan each time.

  • Searching for a keyword, like "100nf", in the basic parts catalog takes 3ms.
  • Directly searching for a JLCPCB part number takes 2ms.
  • Directly searching for a manufacturer part number prefix like 74LV takes 7ms
  • Searching the entire catalog (basic and extended) using a keyword (i.e. 100nf) and a footprint (i.e. 0603) takes 208ms. (There are a lot of 0603 components!)
  • Searching for a keyword (like 10uf) with a category (like Capacitors) takes 26ms
  • Searching for a plain keyword (10uf again but with no category specified) takes 368ms. This is still faster than the baseline because it is able to ignore out of stock parts.

In summary, everything is pretty fast as long as you have at least one hard constraint to narrow down the results. I was not able to get fulltext search working using standard indexes. We need ft5 for that part.

File Size

Adding the indexes increases the zip file download size from 226.9Mb to 333Mb (100Mb or 46% increase). Personally this is not an issue whatsoever for me, but it is a significant increase nonetheless.

Something we might want to consider is creating the indexes on the client after downloading rather than on the server. In my experience it only took about 30-60 seconds. This would give us the opportunity to customize what indexes and even tables which we create depending on the client capabilities. For example we could create an ft5 table if it is supported.

I would like to test how file size compares between downloading a premade SQLite database vs a compressed CSV file. I have a hunch the plain text might be more efficient, since SQLite is probably designed with fixed byte offsets for fast querying. If so, this might be an additional optimization to pursue which would "pay for" the time it takes to create the index client-side. But it's just a hypothesis and I could easily be wrong.

How to test this PR

I haven't yet touched the automatic database generation code. I'm lost with GitHub actions. I guess it would just be a matter of adding a few lines of SQL. But for now you will need to enter an SQLite shell and run a few commands to create the indexes:

sqlite3 parts.db
-- Rewrite parts_lcsc_part_index to remove case sensitivity so it will be compatible with
-- queries using the LIKE operator which is also case-insensitive
DROP INDEX IF EXISTS parts_lcsc_part_index;
CREATE UNIQUE INDEX parts_lcsc_part_index ON parts ('LCSC Part' COLLATE NOCASE);

-- Create new indexes on most fields
CREATE INDEX parts_library_and_stock ON parts ('Library Type', 'Stock');
CREATE INDEX parts_manufacturer ON parts ('Manufacturer' COLLATE NOCASE);
CREATE INDEX parts_package ON parts ('Package' COLLATE NOCASE);
CREATE INDEX parts_first_category ON parts ('First Category' COLLATE NOCASE);
CREATE INDEX parts_second_category ON parts ('Second Category' COLLATE NOCASE);
CREATE INDEX parts_mfr_number ON parts ('MFR.Part' COLLATE NOCASE);
CREATE INDEX parts_solder_joints ON parts ('Solder Joint' COLLATE NOCASE);

-- This is very important for the query planner to efficiently use the indexes we just created
ANALYZE

Changes I had to make

  • Added indexes to the table. This will increase database file size.
  • Keyword field now only searches description and part number.
  • Part number field searches both manufacture part number prefixes and exact JLCPCB part numbers. NOTE: a query like 74LV would miss parts like SN74LV. That is just what it takes to make the index work. If you want full-text search, you have to type it in the query field and take the hit.
  • By default there is no sort order. Sorting only happens the first time you click a column. This makes queries that would match a huge number of parts much faster. My thinking is that those queries are generally not useful, but rather part of the exploration process where you haven't added enough filters yet.

Tricks I learned along the way

  • A LIKE query will use an index if certain conditions, like case-sensitivity and not being a number, are met. Most of my work involved running lots of example queries through the sqlite shell with .eqp on so I could see what the query planner was thinking and tweak things until an index was used as expected.
  • Unfortunately the LIKE optimization doesn't work if the string to be compared contains only numbers. (e.g. "Footprint" LIKE "0603%") So I detect those digit-only strings and fall back to `"Footprint" = "0603"./ If someone knows a better way to handle this I'd love to hear about it.
  • When comparing for equality, the default is for for the comparison to be case sensitive, thus skipping our COLLATE NOCASE index. We can turn that off again like this: "Field" = "{p}" COLLATE NOCASE
  • ANALYZE needs to be run after creating the database. This is very important. I found that the query planner would often do non-sensical things if ANALYZE had not been run.

@chmorgan
Copy link
Collaborator

chmorgan commented Apr 9, 2024

@whmountains I'll mention that my approach mostly obviates the part search gui elements other than the 'keywords' list.

I had considered this approach and you are correct that a massive amount of performance can be gained. The downside though is that you end up having to do a lot of the advanced query construction manually. The database is much smaller though, the fts5 database is a hefty 4GB. The zip size is only ~700MB though (so 3x the size).

My workflow of finding parts is to combine the jlcpcb plugin with the website and plug in freeform entries to the keywords list.

I've updated #348 against latest main and pushed out a new db to my GitHub hosted site if you want to give it a spin.

@whmountains
Copy link
Collaborator Author

I love the idea of having a single search box. Thanks for releasing the update, I will test it out and report back.

@whmountains
Copy link
Collaborator Author

whmountains commented Apr 10, 2024

@chmorgan FYI due to some things going on at work I don't think I will be able to test your fork for another 1-2 weeks. Looking forward to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants