
Persistent Index? #98

Open
dsully opened this issue Apr 10, 2021 · 3 comments

Comments


dsully commented Apr 10, 2021

Any plans to allow for the creation + update of a persistent index (perhaps SQLite backed)?

I'd love to be able to query over a large amount of data, where realtime queries for things like video width / height are extremely slow.

Thanks

pavlus (Contributor) commented Jul 9, 2021

To be able to use indexes, an RDBMS has to control all modification of the indexed data, so that changes to it can be reflected in the indexes as well.

Since this application doesn't control who changes the files on your disk, or how, I'm not sure how that would work.

Perhaps adding a filesystem watcher (e.g. via inotify) that runs fselect on changed files and updates some CSV file would be possible, but that could also be done externally.

Or populate a CSV file with fselect and query it with some other tool? Importing that CSV into an RDBMS as a table is a solution too.
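The CSV-into-an-RDBMS route can be sketched end to end. A minimal sketch, assuming a CSV shaped like what fselect's CSV output might produce (the column names here are illustrative, not fselect's actual schema):

```python
import csv
import io
import sqlite3

# Hypothetical CSV as a prior fselect run might have emitted it.
sample = (
    "path,size,width,height\n"
    "/videos/a.mkv,1000,1920,1080\n"
    "/videos/b.mkv,500,1280,720\n"
)

# Import the CSV into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE videos (path TEXT, size INTEGER, width INTEGER, height INTEGER)"
)
reader = csv.DictReader(io.StringIO(sample))
conn.executemany(
    "INSERT INTO videos VALUES (?, ?, ?, ?)",
    [(r["path"], int(r["size"]), int(r["width"]), int(r["height"])) for r in reader],
)

# Now the slow metadata is queryable without touching the files again.
rows = conn.execute("SELECT path FROM videos WHERE width >= 1920").fetchall()
print(rows)  # [('/videos/a.mkv',)]
```

Re-running fselect and re-importing would refresh the table; SQLite can then index `width`, `height`, etc. normally.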

jhspetersson (Owner) commented

Maybe using third-party indices is a more viable solution for fselect. I definitely plan to support Everything on Windows some day.

danieldjewell commented

> I'd love to be able to query over a large amount of data, where realtime queries for things like video width / height are extremely slow.

Sounds like you're looking more for a cache? (I suppose this is one of those things where SQL & Filesystem terms kinda clash... because <Index> != <Cache> in the SQL world.)

It would seem ideal and fitting to store cached results in some kind of database... (Perhaps SQLite or otherwise?)

Maybe even two modes:

  1. Cache results the first time they're searched (no pre-calculation)
  2. Pre-Calculate and cache all results in a batch job
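Mode 1 above can be sketched as a lazy, content-addressed cache. A minimal sketch assuming SQLite as the store, with the SHA-256 of the file contents as the primary key; `probe` stands in for the expensive metadata scan, and all names here are hypothetical:

```python
import hashlib
import os
import sqlite3
import tempfile

def file_hash(path):
    """SHA-256 of the file contents, streamed in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

class MetadataCache:
    """Mode 1: compute metadata the first time a file is seen, then reuse it."""

    def __init__(self, db=":memory:"):
        self.conn = sqlite3.connect(db)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS meta (hash TEXT PRIMARY KEY, value INTEGER)"
        )

    def lookup(self, path, probe):
        key = file_hash(path)  # content-addressed: survives renames and moves
        row = self.conn.execute(
            "SELECT value FROM meta WHERE hash = ?", (key,)
        ).fetchone()
        if row:
            return row[0]
        value = probe(path)  # the expensive scan runs only once per content hash
        self.conn.execute("INSERT INTO meta VALUES (?, ?)", (key, value))
        return value

# Demo: the probe is called once even though we look the file up twice.
calls = []
def probe(path):
    calls.append(path)
    return os.path.getsize(path)  # stand-in for e.g. probing video width

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"fake video bytes")
    path = f.name
cache = MetadataCache()
first = cache.lookup(path, probe)
second = cache.lookup(path, probe)  # served from the cache, probe not called
os.unlink(path)
```

Mode 2 (batch pre-calculation) would just walk the tree and call `lookup` on every file up front.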

I think it would be important to actually benchmark how long it takes to scan videos, etc. Using the sha256 (or similar) as the primary key would allow the file path to change to anywhere and still have metadata on the media file... But whether or not this is actually faster comes down to [Time to Scan Video File Metadata] vs. [Time to Hash File]. On systems with SHA processor extensions, it'd probably be faster to hash the file, but that's just a guess.
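That hash-vs-scan trade-off can be measured directly. A rough micro-benchmark sketch, where reading the first 64 KiB is only a stand-in for a real container-metadata probe:

```python
import hashlib
import os
import tempfile
import time

def sha256_file(path):
    """Hash the whole file, streamed in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def probe_header(path):
    """Stand-in for a metadata scan: read only the first 64 KiB."""
    with open(path, "rb") as f:
        return len(f.read(64 * 1024))

# 8 MiB sample file of random bytes.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(8 * 1024 * 1024))
    path = f.name

t0 = time.perf_counter(); sha256_file(path); t_hash = time.perf_counter() - t0
t0 = time.perf_counter(); probe_header(path); t_probe = time.perf_counter() - t0
print(f"hash: {t_hash:.4f}s  probe: {t_probe:.4f}s")
os.unlink(path)
```

On real media files the probe side would be an actual demuxer call, so the numbers here only illustrate the shape of the comparison, not the verdict.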

(Storing the filepath alongside in the DB would have the advantage that you could easily add verification of files at zero performance penalty - if you're already hashing the file, that is.)

On the file name front, maybe it's possible to use an existing mlocate/plocate database to speed up name searches? (Pretty sure those two don't store file metadata though.)
