Make queries fast, filter all flexible attributes #5240

snejus · 2024-05-09T11:36:51Z

Make LazyClassProperty / cached_classproperty reusable
Add support for filtering relations
Add ability to debug queries
Fix querying fields present in both tables
Aggregate flexible attributes
Add ability to filter flexible attributes through the Query
Enable querying related flexible attributes
Remove slow lookups from beetsplug/aura

Description

Another, and (hopefully) final, attempt to improve query speed.

Fixes #4360, #3515, and possibly other issues related to slow queries.

This PR supersedes #4746.

What's been done

The album and item tables are joined, and the corresponding data from item_attributes and album_attributes is merged and made available for filtering. This makes the following possible:

Faster album path queries, beet list -a path::some/path
Faster flexible attributes queries, both albums and tracks, beet list play_count:10
(New) Ability to filter albums with track-level (and vice-versa) db field queries, beet list -a title:something
(New) Ability to filter tracks with album-level flexible field queries, beet list artpath:cover
(New) Ability to filter albums with track-level flexible field queries, beet list -a art_source:something
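The join behind filtering albums by a track-level field (as in `beet list -a title:something`) can be illustrated with Python's `sqlite3` module. This is a minimal sketch on a toy schema: the table and column names are simplified assumptions, not beets' real schema, and `albums_with_track_title` is a hypothetical helper.

```python
import sqlite3

# Toy schema (assumption): both tables have an `id` column, as in beets.
SCHEMA = """
    CREATE TABLE albums (id INTEGER PRIMARY KEY, album TEXT);
    CREATE TABLE items (
        id INTEGER PRIMARY KEY, album_id INTEGER, title TEXT
    );
    INSERT INTO albums VALUES (1, 'First'), (2, 'Second');
    INSERT INTO items VALUES
        (1, 1, 'Intro'), (2, 1, 'Something Else'), (3, 2, 'Outro');
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def albums_with_track_title(conn: sqlite3.Connection, pattern: str) -> list:
    """Albums that have at least one track whose title contains `pattern`.

    Columns are qualified with their table name because `id` exists in
    both tables; an unqualified `id` would be ambiguous after the JOIN.
    DISTINCT collapses albums that match through several tracks.
    """
    sql = """
        SELECT DISTINCT albums.id, albums.album
        FROM albums
        JOIN items ON items.album_id = albums.id
        WHERE items.title LIKE ?
        ORDER BY albums.id
    """
    return conn.execute(sql, (f"%{pattern}%",)).fetchall()


conn = connect()
print(albums_with_track_title(conn, "something"))  # [(1, 'First')]
```

Qualifying every column also matters for fields present in both tables: in this toy schema, `SELECT id FROM albums JOIN items ...` fails with SQLite's "ambiguous column name" error.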

Benchmarks

You can see that query speed is now more or less constant regardless of the query; it depends mostly on how many results need to be printed.

Compare this with what we had previously

To Do

Documentation. (If you've added a new command-line flag, for example, find the appropriate page under docs/ to describe it.)
Changelog. (Add an entry to docs/changelog.rst near the top of the document.)
Tests. (Encouraged but not strictly required.)

Later

Submit PR with the corresponding adjustment for sorting and fix for lslimit
Submit PR with the corresponding adjustment for template variables resolution


github-actions · 2024-05-09T11:37:06Z

Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry.


For a flexible attribute query, replace the `col_name` property with a function call that extracts that attribute from the `flex_attrs` field introduced in the earlier commit. Additionally, for boolean, numeric, and date queries, CAST the value to NUMERIC affinity so that queries like 'flex:1..5' and 'flex:true' keep working. This removes the concept of a 'slow query', since every query on any field now has an SQL clause.
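Why the CAST matters can be shown with a minimal sketch, assuming flexible values live as JSON strings in a `flex_attrs` text column (a simplification; `flex_range` is a hypothetical helper, not beets' API). `json_extract` returns the stored string, and SQLite orders any TEXT value after any number, so a range such as `flex:1..5` matches nothing without the cast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, flex_attrs TEXT);
    INSERT INTO items VALUES
        (1, '{"play_count": "10"}'),
        (2, '{"play_count": "3"}'),
        (3, '{}');
""")


def flex_range(conn, field, low, high, cast=True):
    """Return ids of items whose flexible `field` lies in [low, high].

    Without the CAST, json_extract() yields TEXT such as '3', which SQLite
    compares against the INTEGER bounds without conversion; TEXT always
    sorts after numbers, so the BETWEEN test fails for every row.
    """
    value = "json_extract(flex_attrs, ?)"
    if cast:
        value = f"CAST({value} AS NUMERIC)"
    sql = f"SELECT id FROM items WHERE {value} BETWEEN ? AND ? ORDER BY id"
    path = f'$."{field}"'  # quoted key, so unusual key names still parse
    return [row[0] for row in conn.execute(sql, (path, low, high))]


print(flex_range(conn, "play_count", 1, 5))              # [2]
print(flex_range(conn, "play_count", 1, 5, cast=False))  # []
```

Item 3 has no `play_count`, so `json_extract` yields NULL and the row is simply excluded rather than raising an error.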

Unify the query creation logic from
- queryparse.py:construct_query_part
- Model.field_query
- DefaultTemplateFunctions._tmpl_unique

into a single implementation, the `LibModel.field_query` class method. This method should be used to resolve queries for model (flex)fields. Allow filtering item attributes in album queries and vice versa by merging `flex_attrs` from Album and Item into `all_flex_attrs`. This field is only used for filtering and is discarded afterwards.
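One way to merge album and item flexible attributes into a single `all_flex_attrs` object is SQLite's `json_patch`. This is a hedged sketch, not the PR's code: the use of `json_patch`, the rule that the item's value wins when both sides define the same key, and the helper `items_where_flex` are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE albums (id INTEGER PRIMARY KEY, flex_attrs TEXT);
    CREATE TABLE items (
        id INTEGER PRIMARY KEY, album_id INTEGER, flex_attrs TEXT
    );
    INSERT INTO albums VALUES
        (1, '{"rating": "5", "mood": "calm"}'),
        (2, '{"rating": "3"}');
    INSERT INTO items VALUES
        (1, 1, '{"play_count": "10", "mood": "loud"}'),
        (2, 2, '{"play_count": "2"}'),
        (3, NULL, '{"play_count": "7"}');  -- a singleton without an album
""")

# json_patch(target, patch): the item's keys override the album's on a
# clash. COALESCE supplies an empty object for singletons, which have no
# matching album row in the LEFT JOIN.
MERGED = """
    SELECT items.id AS id,
           json_patch(COALESCE(albums.flex_attrs, '{}'), items.flex_attrs)
               AS all_flex_attrs
    FROM items
    LEFT JOIN albums ON albums.id = items.album_id
"""


def items_where_flex(conn, field, value):
    """Ids of items whose merged flexible attribute `field` equals `value`."""
    sql = (f"SELECT id FROM ({MERGED}) "
           "WHERE json_extract(all_flex_attrs, ?) = ? ORDER BY id")
    return [row[0] for row in conn.execute(sql, (f'$."{field}"', value))]


print(items_where_flex(conn, "rating", "5"))      # [1]  album-level key
print(items_where_flex(conn, "play_count", "7"))  # [3]  item-level key
print(items_where_flex(conn, "mood", "loud"))     # [1]  item wins on a clash
```

Because the merged object exists only inside the subquery, it is used for filtering and then discarded, which matches the description above.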

It seems that previously, filtering by flexible attributes did not work: I received '{"data": []}' when trying to GET `/aura/tracks?filter[play_count]=11`. Now this works, not only for tracks but also for `/aura/artists` and `/aura/albums`. Additionally, this improves `/aura/tracks` response time significantly. I tried loading the default of 500 tracks from my library: on `master` it took ~20s; after this commit it takes under 1s.

snejus · 2024-05-09T21:05:38Z

I'm using the test-aura branch as the base, since this PR depends on it being merged. For now, though, I will change the base to master in order to run the tests.

snejus · 2024-05-09T21:09:10Z

beets/importer.py

@@ -1019,7 +1019,7 @@ def find_duplicates(self, lib):
        # temporary `Item` object to generate any computed fields.
        tmp_item = library.Item(lib, **info)
        keys = config["import"]["duplicate_keys"]["item"].as_str_seq()
-        dup_query = library.Album.all_fields_query(
+        dup_query = library.Item.match_all_query(


It seemed to me this was supposed to be Item instead of Album?

Serene-Arc · 2024-05-10T05:06:18Z

I'll be able to review in a week or two; it's just the end-of-semester push at the moment.


wisp3rwind · 2024-05-13T20:03:41Z

Hey, also just chiming in to say that it will take some time for me to go through the current batch of PRs. I won't be able to keep up the response times I had last week, but I'll slowly work through all of them.

snejus added 6 commits May 7, 2024 19:55

Feed in app context and args into Document to allow testing

6ee7346

This will help with testing each of the documents, which no longer depend on the 'global' `current_app` and `request`. These two can now be provided when the objects are instantiated.
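The idea of injecting the context instead of reading Flask globals can be sketched as follows. Everything here is hypothetical: `Document`, its `ctx`/`args` fields, the `lib` key and the `filters()` method are illustrative names, not the actual classes in beetsplug/aura. The `filter[<field>]=<value>` parameter shape is taken from the AURA request mentioned earlier in this PR.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, Mapping

# Matches AURA-style filter parameters such as `filter[play_count]`.
FILTER_PARAM = re.compile(r"^filter\[(?P<field>[^\]]+)\]$")


@dataclass
class Document:
    """Hypothetical document whose context is injected at construction.

    `ctx` stands in for what used to come from Flask's global
    `current_app`, and `args` for what used to come from `request.args`.
    """

    ctx: Mapping[str, Any]
    args: Mapping[str, str]

    @property
    def lib(self) -> Any:
        # e.g. the library object the app was configured with (assumption)
        return self.ctx["lib"]

    def filters(self) -> Dict[str, str]:
        """Collect `filter[<field>]=<value>` pairs from the request args."""
        found = {}
        for key, value in self.args.items():
            match = FILTER_PARAM.match(key)
            if match:
                found[match.group("field")] = value
        return found


# In a test, no Flask application or request context is needed:
doc = Document(ctx={"lib": None}, args={"filter[play_count]": "11", "include": "albums"})
print(doc.filters())  # {'play_count': '11'}
```

In a Flask view, the same class would presumably be built from `current_app` and `request.args`, so only the call site touches the globals.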

Test fetching each of the documents

c1518db

Dedupe get_attribute_converter

e9ce625

Make LazyClassProperty / cached_classproperty reusable

c90513f

Add support for filtering relations

c91647c

Add ability to debug queries

7867b64

snejus self-assigned this May 9, 2024

snejus requested a review from wisp3rwind May 9, 2024 11:36

snejus mentioned this pull request May 9, 2024

Use SQL to query flex fields, and related Album/Item data #4746

Closed


snejus requested review from sampsyo, Serene-Arc and JOJ0 May 9, 2024 11:38

snejus added 5 commits May 9, 2024 12:40

Fix querying fields present in both tables

d57526e

Aggregate flexible attributes

bc03f7b

Use `json_group_object` SQLite function to aggregate flexible attributes into `flex_attrs` field. Register SQLite converter `json.loads` to automatically convert the JSON string to a Python dictionary. Remove the code that had this task previously.
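The aggregation and the converter can be reproduced with Python's `sqlite3` module. This is a minimal sketch: the two tables are a simplified stand-in for beets' item and attribute tables, and the `"flex_attrs [json]"` column alias (which triggers the converter through `PARSE_COLNAMES`) is one way to attach the converter, not necessarily the one the PR uses.

```python
import json
import sqlite3

# Decode any result column whose name carries the "[json]" tag.
sqlite3.register_converter("json", json.loads)

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_COLNAMES)
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE item_attributes (
        id INTEGER PRIMARY KEY, entity_id INTEGER, key TEXT, value TEXT
    );
    INSERT INTO items VALUES (1, 'Song A'), (2, 'Song B');
    INSERT INTO item_attributes (entity_id, key, value) VALUES
        (1, 'play_count', '10'), (1, 'mood', 'happy'), (2, 'play_count', '3');
""")

# One row per item; all of its key/value pairs are folded into one JSON
# object, which the registered converter turns into a Python dict.
rows = conn.execute("""
    SELECT items.id,
           items.title,
           json_group_object(item_attributes.key, item_attributes.value)
               AS "flex_attrs [json]"
    FROM items
    JOIN item_attributes ON item_attributes.entity_id = items.id
    GROUP BY items.id
    ORDER BY items.id
""").fetchall()

for item_id, title, flex_attrs in rows:
    print(item_id, title, flex_attrs)
```

The inner JOIN keeps the sketch short, but items with no flexible attributes drop out of it; a real query would need a LEFT JOIN plus a way to yield an empty object for those items.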

snejus force-pushed the only-fast-filtering branch from ad2ea5a to 9ceffb6 May 9, 2024 11:40

snejus added the review-needed label May 9, 2024

snejus changed the base branch from test-aura to master May 9, 2024 21:05

snejus force-pushed the only-fast-filtering branch from 3c293fd to bdb7fd9 May 9, 2024 21:06

snejus commented May 9, 2024


Ensure that any field query uses the table name

070c87f

In order to include the table name for fields in this query, use the `field_query` method. Since `AnyFieldQuery` is just an `OrQuery` under the hood, remove it and construct `OrQuery` explicitly instead.
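The idea of spelling out an "any field" query as an explicit OR over table-qualified columns can be sketched as below. `SubstringQuery`, this `OrQuery`, and `any_field_query` are hypothetical stand-ins written for illustration; they only mimic the names and are not beets' `dbcore.query` classes.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Tuple

Clause = Tuple[str, list]


@dataclass
class SubstringQuery:
    """Hypothetical: case-insensitive substring match on one column."""
    table: str
    field: str
    pattern: str

    def clause(self) -> Clause:
        # Escape LIKE wildcards so the pattern is matched literally.
        escaped = (self.pattern.replace("\\", "\\\\")
                   .replace("%", "\\%").replace("_", "\\_"))
        # Qualify the column: after a JOIN, `genre` may exist in both tables.
        return (f"{self.table}.{self.field} LIKE ? ESCAPE '\\'",
                [f"%{escaped}%"])


@dataclass
class OrQuery:
    """Hypothetical: matches if any subquery matches."""
    subqueries: List[SubstringQuery]

    def clause(self) -> Clause:
        if not self.subqueries:
            return "0", []  # an empty OR matches nothing
        parts, params = [], []
        for query in self.subqueries:
            sql, sub_params = query.clause()
            parts.append(f"({sql})")
            params.extend(sub_params)
        return " OR ".join(parts), params


def any_field_query(table: str, fields: List[str], pattern: str) -> OrQuery:
    # An "any field" query written out as an explicit OrQuery.
    return OrQuery([SubstringQuery(table, f, pattern) for f in fields])


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE albums (id INTEGER PRIMARY KEY, genre TEXT);
    CREATE TABLE items (
        id INTEGER PRIMARY KEY, album_id INTEGER, title TEXT, genre TEXT
    );
    INSERT INTO albums VALUES (1, 'Electronic'), (2, 'Jazz');
    INSERT INTO items VALUES
        (1, 1, 'Blue Monday', 'Synthpop'),
        (2, 1, 'Ceremony', 'Post-punk'),
        (3, 2, 'So What', 'Blues');
""")

where, params = any_field_query("items", ["title", "genre"], "blue").clause()
sql = ("SELECT items.id FROM items "
       "JOIN albums ON albums.id = items.album_id "
       f"WHERE {where} ORDER BY items.id")
print([row[0] for row in conn.execute(sql, params)])  # [1, 3]
```

Without the `items.` prefix, the `genre` column in this toy schema would be ambiguous once `albums` is joined in, which is why each field query carries its table name.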

snejus force-pushed the only-fast-filtering branch from bdb7fd9 to 070c87f May 10, 2024 08:58
