
Persistent Index? #98

Open
dsully opened this issue Apr 10, 2021 · 3 comments

Comments


dsully commented Apr 10, 2021

Any plans to allow for the creation + update of a persistent index (perhaps SQLite backed)?

I'd love to be able to query over a large amount of data, where realtime queries for things like video width / height are extremely slow.

Thanks

pavlus (Contributor) commented Jul 9, 2021

To be able to use indexes, an RDBMS has to control all modification of the indexed data, so that changes to it can be reflected in the indexes as well.

Since this application doesn't control who changes the files on your disk, or how, I'm not sure how that would work.

Perhaps adding a filesystem watcher (e.g. via inotify) that runs fselect on changed files and updates some CSV file would be possible, but that could also be done externally.

Or populate a CSV file with fselect and query it with some other tool? Importing that CSV into an RDBMS as a table is a solution too.
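The CSV-into-an-RDBMS route can be sketched end to end. A minimal sketch, assuming a CSV shaped like what fselect's CSV output might produce (the column names here are illustrative, not fselect's actual schema):

```python
import csv
import io
import sqlite3

# Hypothetical CSV as a prior fselect run might have emitted it.
sample = (
    "path,size,width,height\n"
    "/videos/a.mkv,1000,1920,1080\n"
    "/videos/b.mkv,500,1280,720\n"
)

# Import the CSV into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE videos (path TEXT, size INTEGER, width INTEGER, height INTEGER)"
)
reader = csv.DictReader(io.StringIO(sample))
conn.executemany(
    "INSERT INTO videos VALUES (?, ?, ?, ?)",
    [(r["path"], int(r["size"]), int(r["width"]), int(r["height"])) for r in reader],
)

# Now the slow metadata is queryable without touching the files again.
rows = conn.execute("SELECT path FROM videos WHERE width >= 1920").fetchall()
print(rows)  # [('/videos/a.mkv',)]
```

Re-running fselect and re-importing would refresh the table; SQLite can then index `width`, `height`, etc. normally.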

jhspetersson (Owner) commented

Maybe using third-party indices is a more viable solution for fselect. I definitely plan to support Everything on Windows some day.

danieldjewell commented

> I'd love to be able to query over a large amount of data, where realtime queries for things like video width / height are extremely slow.

Sounds like you're looking more for a cache? (I suppose this is one of those things where SQL & Filesystem terms kinda clash... because <Index> != <Cache> in the SQL world.)

It would seem ideal and fitting to store cached results in some kind of database... (Perhaps SQLite or otherwise?)

Maybe even two modes:

  1. Cache results the first time they're searched (no pre-calculation)
  2. Pre-Calculate and cache all results in a batch job
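Mode 1 above can be sketched as a lazy, content-addressed cache. A minimal sketch assuming SQLite as the store, with the SHA-256 of the file contents as the primary key; `probe` stands in for the expensive metadata scan, and all names here are hypothetical:

```python
import hashlib
import os
import sqlite3
import tempfile

def file_hash(path):
    """SHA-256 of the file contents, streamed in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

class MetadataCache:
    """Mode 1: compute metadata the first time a file is seen, then reuse it."""

    def __init__(self, db=":memory:"):
        self.conn = sqlite3.connect(db)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS meta (hash TEXT PRIMARY KEY, value INTEGER)"
        )

    def lookup(self, path, probe):
        key = file_hash(path)  # content-addressed: survives renames and moves
        row = self.conn.execute(
            "SELECT value FROM meta WHERE hash = ?", (key,)
        ).fetchone()
        if row:
            return row[0]
        value = probe(path)  # the expensive scan runs only once per content hash
        self.conn.execute("INSERT INTO meta VALUES (?, ?)", (key, value))
        return value

# Demo: the probe is called once even though we look the file up twice.
calls = []
def probe(path):
    calls.append(path)
    return os.path.getsize(path)  # stand-in for e.g. probing video width

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"fake video bytes")
    path = f.name
cache = MetadataCache()
first = cache.lookup(path, probe)
second = cache.lookup(path, probe)  # served from the cache, probe not called
os.unlink(path)
```

Mode 2 (batch pre-calculation) would just walk the tree and call `lookup` on every file up front.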

I think it would be important to actually benchmark how long it takes to scan videos, etc. Using the sha256 (or similar) as the primary key would allow the file path to change to anywhere and still have metadata on the media file... But whether or not this is actually faster comes down to [Time to Scan Video File Metadata] vs. [Time to Hash File]. On systems with SHA processor extensions, it'd probably be faster to hash the file, but that's just a guess.
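That hash-vs-scan trade-off can be measured directly. A rough micro-benchmark sketch, where reading the first 64 KiB is only a stand-in for a real container-metadata probe:

```python
import hashlib
import os
import tempfile
import time

def sha256_file(path):
    """Hash the whole file, streamed in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def probe_header(path):
    """Stand-in for a metadata scan: read only the first 64 KiB."""
    with open(path, "rb") as f:
        return len(f.read(64 * 1024))

# 8 MiB sample file of random bytes.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(8 * 1024 * 1024))
    path = f.name

t0 = time.perf_counter(); sha256_file(path); t_hash = time.perf_counter() - t0
t0 = time.perf_counter(); probe_header(path); t_probe = time.perf_counter() - t0
print(f"hash: {t_hash:.4f}s  probe: {t_probe:.4f}s")
os.unlink(path)
```

On real media files the probe side would be an actual demuxer call, so the numbers here only illustrate the shape of the comparison, not the verdict.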

(Storing the filepath alongside in the DB would have the advantage that you could easily add verification of files at zero performance penalty - if you're already hashing the file, that is.)

On the file name front, maybe it's possible to use an existing mlocate/plocate database to speed up name searches? (Pretty sure those two don't store file metadata though.)
