Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

beet list -a path:some/path queries are extremely slow #4360

Open
snejus opened this issue May 31, 2022 · 3 comments · May be fixed by #5240
Open

beet list -a path:some/path queries are extremely slow #4360

snejus opened this issue May 31, 2022 · 3 comments · May be fixed by #5240

Comments

@snejus
Copy link
Member

snejus commented May 31, 2022

Problem

  • beet list -a path: seems to be much, much slower than beet list path: while in reality an album query should be at least a tiny bit faster than an item query since there are fewer entities in the table.

Running a path query for items:

$ count beet list path:/run/media/sarunas/music/Music/micronaudio/djstingray313_molecularlevelsolutions
DJ Stingray 313         Bioplastics  (Molecular Level Solutions, 2021)
DJ Stingray 313         Construction Materials From Organic Waste  (Molecular Level Solutions, 2021)
DJ Stingray 313         Carbon Neutral Fuels  (Molecular Level Solutions, 2021)
DJ Stingray 313         Enzymatic Detergents  (Molecular Level Solutions, 2021)
0s 943ms 734us

Running the same query for albums:

$ count beet list -a path:/run/media/sarunas/music/Music/micronaudio/djstingray313_molecularlevelsolutions
DJ Stingray 313 - Molecular Level Solutions released on Micron Audio, Cat No.  (2021, DE)
13s 951ms 836us

And querying albums for a path that does not exist:

$ count beet list -a path::djstingray3133333333
15s 398ms 109us

Note: count is a shell function which measures the timing of the supplied command

Seeing that the last query took so long even though the result is empty, I have a feeling it's caused by some initial query returning all items/albums from the database, where filtering is done in python by looping across the items (potentially making additional queries, - seeing that this operation is really slow).

FYI, probably related, that's my beet stats output:

Tracks: 5697
Total time: 3.6 weeks (2165608.03 seconds)
Total size: 266.1 GiB (285764413488 bytes)
Artists: 2634
Albums: 1304
Album artists: 901

Setup

  • OS: Arch, Linux kernel 5.17.9
  • Python version: 3.10.4
  • beets version: tip of the master branch, 988bf26
  • Turning off plugins made problem go away (yes/no): No
  • Using the default configuration helped: No
@snejus
Copy link
Member Author

snejus commented Jun 1, 2022

I added a print(model_cls._table, query) call to the Database._fetch method in beets.dbcore.db to see what's going on:

$ beet list -a path::djstingray
items MatchQuery('id', 0, True)
albums AndQuery([RegexpQuery('path', re.compile('djstingray'), False)])
items MatchQuery('album_id', 7222, True)
items MatchQuery('album_id', 7497, True)
items MatchQuery('album_id', 5422, True)
items MatchQuery('album_id', 5837, True)
items MatchQuery('album_id', 7690, True)
items MatchQuery('album_id', 7432, True)
items MatchQuery('album_id', 7225, True)
items MatchQuery('album_id', 7226, True)
items MatchQuery('album_id', 7655, True)
items MatchQuery('album_id', 5838, True)
... the rest of *all* albums

Seems like it initially gets all albums, then loops across them and hits the db to get its items.

Added a conditional raise in the same function in order to see the calls that lead to the first one of items MatchQuery:

if "FROM items" in sql and "album_id" in query.field:
    raise Exception
items MatchQuery('album_id', 7222, True)
Traceback (most recent call last):
  File "/home/sarunas/.local/bin/beetbin", line 5, in <module>
    main()
  File "/home/sarunas/repo/beets/beets/ui/__init__.py", line 1304, in main
    _raw_main(args)
  File "/home/sarunas/repo/beets/beets/ui/__init__.py", line 1291, in _raw_main
    subcommand.func(lib, suboptions, subargs)
  File "/home/sarunas/repo/beets/beets/ui/commands.py", line 1089, in list_func
    list_items(lib, decargs(args), opts.album)
  File "/home/sarunas/repo/beets/beets/ui/commands.py", line 1081, in list_items
    for album in lib.albums(query):
  File "/home/sarunas/repo/beets/beets/dbcore/db.py", line 714, in _get_objects
    if not self.query or self.query.match(obj):
  File "/home/sarunas/repo/beets/beets/dbcore/query.py", line 463, in match
    return all(q.match(item) for q in self.subqueries)
  File "/home/sarunas/repo/beets/beets/dbcore/query.py", line 463, in <genexpr>
    return all(q.match(item) for q in self.subqueries)
  File "/home/sarunas/repo/beets/beets/dbcore/query.py", line 119, in match
    return self.value_match(self.pattern, item.get(self.field))
  File "/home/sarunas/repo/beets/beets/dbcore/db.py", line 371, in _get
    return getters[key](self)
  File "/home/sarunas/repo/beets/beets/library.py", line 1254, in item_dir
    item = self.items().get()
  File "/home/sarunas/repo/beets/beets/library.py", line 1149, in items
    return self._db.items(dbcore.MatchQuery('album_id', self.id))
  File "/home/sarunas/repo/beets/beets/library.py", line 1555, in items
    return self._fetch(Item, query, sort or self.get_default_item_sort())
  File "/home/sarunas/repo/beets/beets/library.py", line 1533, in _fetch
    return super()._fetch(
  File "/home/sarunas/repo/beets/beets/dbcore/db.py", line 1081, in _fetch
    raise Exception
Exception

Since the query filters on the path field and that albums table does not have it, I reckon the items are being queried for filtering.

Initial thoughts on how this could be improved:

  1. Get albums and items in the initial query
    • SELECT * FROM albums INNER JOIN on items.album_id == albums.id
  2. Do matching/filtering in the same query (moving it away from PathQuery.match)
    • there's a slight complication with regexp patterns: it's only available in the most recent SQLite versions. Luckily, there's connection.register_function method that allows to register our own implementation, - the same that RegexpQuery.match uses.
  3. album_attributes and item_attributes could also be obtained through a join

This should in turn

@sampsyo
Copy link
Member

sampsyo commented Jun 1, 2022

Indeed; the big shortcoming here is that we can't generate proper JOINs when searching for albums according to track-level data. Doing this would be a huge improvement, but it's also a rather large architectural change to the database abstraction layer, dbcore, because of the extremely basic way that we currently stitch together queries. I don't have any perfect ideas at the moment about how best to improve the architecture so this is possible, but I would love to discuss it more if you have ideas!

@snejus
Copy link
Member Author

snejus commented Jun 5, 2022

I've just opened a pull request which contains an update on query composition, which (I think) does not cause a big change in the overall architecture.

@snejus snejus linked a pull request May 9, 2024 that will close this issue
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants