MBS-13075: Indicate the number of items in a series #2935

reosarevok · 2023-05-10T10:51:38Z

MBS-13075

Problem

Right now there's no good way of knowing how many items are in a series other than counting them by hand, which gets annoying for any series with more than a few entries. An item count gives a general idea of how big a series is (in a search, for example, it helps show which series are popular or large). For a series of known real-world size (such as a catalogue), it also gives a general idea of how much content is still missing from the MB entry.

Solution

This reuses the general idea of entity_count we already had for collections. The entity_count for a series is calculated from the ${entity_type}_series views. See the commit messages for additional details.

Testing

Manually, by checking series sidebars, creating a small series collection and making sure the counts appear, and doing both a direct and an indexed series search for "apple" and making sure the counts are there. None of these seemed particularly slow.

reosarevok · 2023-05-11T12:48:14Z

@brainzbot, retest this please

yvanzo

With a small collection it probably doesn’t change much, but it is 10x slower (9s instead of 0.9s roughly on average) on a large collection such as https://musicbrainz.org/collection/923e7570-b682-49a8-ba5f-82f0fecd48e5. Would it be possible to rather cache these counts?

mwiencek · 2023-09-18T00:30:31Z

> With a small collection it probably doesn’t change much, but it is 10x slower (9s instead of 0.9s roughly on average) on a large collection such as https://musicbrainz.org/collection/923e7570-b682-49a8-ba5f-82f0fecd48e5. Would it be possible to rather cache these counts?

If you're testing over an SSH tunnel, it might just be that plus the raw number of additional queries. On my machine, each PG query via an SSH tunnel can take an additional 120 ms or longer on top of the planning and execution time. This additional latency wouldn't be nearly as bad in production. However, it's still possible to reduce the number of queries performed here.
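As a back-of-the-envelope illustration of the point above, the sketch below multiplies a fixed per-query overhead by a number of extra queries. It assumes the quoted 120 ms applies uniformly to every query and ignores planning and execution time; the query counts and the helper name `added_latency_seconds` are hypothetical, not measurements from this PR.

```perl
use strict;
use warnings;

# Rough estimate only: assumes every extra query pays the same fixed
# round-trip overhead, ignoring planning and execution time.
sub added_latency_seconds {
    my ($extra_queries, $overhead_ms) = @_;
    return $extra_queries * $overhead_ms / 1000;
}

# Hypothetical query counts; 120 ms is the per-query overhead quoted above.
printf "%3d extra queries: +%.1f s\n", $_, added_latency_seconds($_, 120)
    for (10, 50, 100);
```

Under that assumption the added time grows linearly with the query count, which is why reducing the number of queries helps regardless of where the test is run.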

lib/MusicBrainz/Server/Data/Series.pm

reosarevok · 2023-12-22T05:36:07Z

@brainzbot, retest this please

mwiencek

The new query for loading the entity counts is quite fast (completing in ~8ms on hendrix for 100 series IDs), so I don't think caching is needed.

lib/MusicBrainz/Server/Data/Series.pm

mwiencek · 2024-02-02T18:15:47Z

lib/MusicBrainz/Server/Data/Series.pm

+    );
+
+    my @series_ids = map{ $_->id } @series;
+    my @query_params = (\@series_ids) x scalar(@entity_types);


We should probably partition the ID lists by item_entity_type, so that we only pass the relevant IDs to each subquery.

Hmm. So rather than having the one UNION ALL query, we would partition the ID lists, run (separately) only the queries that actually have relevant series, and then set the counts from each run of the entity type loop? Or partition the lists and change this to \@artist_series_ids, \@event_series_ids, etc.? (Can we do the latter? That is, can we trust the order of queries and params to always be consistent?)
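One way to picture the second option raised above (keeping a single UNION ALL query while passing each subquery only its own IDs) is the minimal sketch below. It is not the code from this PR: the function name `build_entity_count_query`, the plain-hash series records, and the assumption that each `${type}_series` view has a `series` column are all illustrative. The ordering question is sidestepped by pushing each subquery and its parameter in the same loop iteration.

```perl
use strict;
use warnings;

# Hypothetical input: each series is a plain hash with an id and the entity
# type of its items. The real code would work with Series entity objects.
sub build_entity_count_query {
    my (@series) = @_;

    # Partition the series IDs by item entity type.
    my %ids_by_type;
    for my $series (@series) {
        push @{ $ids_by_type{ $series->{item_entity_type} } }, $series->{id};
    }

    # Walk the types in one fixed (sorted) order and push each subquery
    # together with its parameter, so the two lists cannot drift apart.
    my (@subqueries, @params);
    for my $type (sort keys %ids_by_type) {
        # Assumption: the ${type}_series view exposes a "series" column.
        # $type is interpolated into SQL, so it must come from a fixed
        # list of known entity types, never from user input.
        push @subqueries,
            "SELECT series, count(*) AS entity_count FROM ${type}_series " .
            'WHERE series = any(?) GROUP BY series';
        push @params, $ids_by_type{$type};
    }

    return (join(' UNION ALL ', @subqueries), \@params);
}

my ($query, $params) = build_entity_count_query(
    { id => 1, item_entity_type => 'release' },
    { id => 2, item_entity_type => 'artist' },
    { id => 3, item_entity_type => 'release' },
);
print "$query\n";
print scalar(@$params), " parameter list(s)\n";
```

Entity types with no series in the input produce no subquery at all. A series with no items returns no row from a GROUP BY like this, so a caller would presumably need to default missing counts to 0.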

mwiencek · 2024-02-02T18:34:36Z

lib/MusicBrainz/Server/Data/Search.pm

+            $item_entity_type = $self->c->sql->select_single_value(
+                'SELECT entity_type FROM series_type WHERE gid = ?',
+                $type_gid,
+            );


AFAICT, this adds an additional query per search result (so if returning 100 items, 100 additional queries). With an SSH tunnel to hendrix, this makes a search like http://localhost:5000/search?query=the&type=series&limit=100&method=indexed take an additional 30s for me (in production it won't be as noticeable, but is not efficient in either case).

Can we add something like load_item_entity_type to Data::SeriesType (ideally only making a single query) and just call that in external_search?

I did something like that, it does seem significantly faster, but do take a look.
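The batching idea discussed in this thread can be sketched roughly as follows. This is an illustration rather than the method that was actually added: `load_entity_types_by_gid` is a hypothetical name, and `$select` is a stand-in code ref for a database call that returns a list of array-ref rows. Only the `series_type` table and its `gid` and `entity_type` columns come from the query quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: $select stands in for a database call that runs a
# query with bind parameters and returns a list of array-ref rows.
sub load_entity_types_by_gid {
    my ($select, @type_gids) = @_;

    my %entity_type_by_gid;

    # Deduplicate so each series type is requested at most once.
    my %seen;
    my @unique_gids = grep { !$seen{$_}++ } @type_gids;
    return \%entity_type_by_gid unless @unique_gids;

    # One query for the whole result page instead of one per result.
    my @rows = $select->(
        'SELECT gid, entity_type FROM series_type WHERE gid = any(?)',
        \@unique_gids,
    );

    $entity_type_by_gid{ $_->[0] } = $_->[1] for @rows;
    return \%entity_type_by_gid;
}

# Fake in-memory "database" so the sketch runs without a connection.
my $query_count = 0;
my %fake_table = ('gid-a' => 'release', 'gid-b' => 'work');
my $fake_select = sub {
    my ($sql, $gids) = @_;
    $query_count++;
    return map { [$_, $fake_table{$_}] } grep { exists $fake_table{$_} } @$gids;
};

my $types = load_entity_types_by_gid($fake_select, 'gid-a', 'gid-b', 'gid-a');
print "$_ => $types->{$_}\n" for sort keys %$types;
print "queries run: $query_count\n";
```

Each search result can then look up its entity type in the returned hash by type gid, so the number of queries no longer depends on the number of results.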

It seems useful to be able to see at a glance the number of entities in a series, so this adds it to the series sidebar. The display is the same as what we already do for collections, and so is most of the implementation (except for the load_entity_count method itself). We have views for this and it does not seem particularly slow.

It seems useful to be able to see at a glance the number of entities in each series in a collection. The display is similar to what we show in a user's collections list. I did not make this sortable right now since the SQL sort would be kind of annoying: we'd need to get the counts by joining different views for each of the series in the collection based on their entity type, which seems nontrivial.

It seems useful to be able to see at a glance the number of entities in each series in a search result. This requires loading the entity_type for the series type, since that is missing from the indexed search data. That in turn needs the type gid, which we have in the search response. For some reason, we were creating a type with just the name in schema_fixup_type; this adds the type gid to the fixed-up types, and then loads the entity type for the relevant series types with load_entity_type_from_gid.

reosarevok added the "New feature" (Non urgent new stuff) label May 10, 2023

reosarevok force-pushed the MBS-13075 branch from 15d8aca to bc87468 Compare May 10, 2023 12:54

reosarevok force-pushed the MBS-13075 branch from bc87468 to 98c5154 Compare June 8, 2023 08:42

yvanzo requested changes Jul 22, 2023

mwiencek reviewed Sep 18, 2023

reosarevok force-pushed the MBS-13075 branch from 98c5154 to 1187791 Compare December 14, 2023 14:04

reosarevok force-pushed the MBS-13075 branch from 1187791 to b364fcf Compare December 21, 2023 14:17

mwiencek requested changes Feb 2, 2024

reosarevok added 3 commits February 12, 2024 06:51

reosarevok force-pushed the MBS-13075 branch from b364fcf to 0bd1497 Compare February 12, 2024 06:50
