Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MBS-13075: Indicate the number of items in a series #2935

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

reosarevok
Copy link
Member

MBS-13075

Problem

Right now there's no good way of knowing how many items are in a series, other than just counting them by hand. This can be quite annoying for any series with more than a few entries. This is useful for a general idea of how big a series is (in a search for example it helps see what are popular/large series), and for a specific series of known real world size (such as a catalogue) it also can give a general idea of how much content is still missing from the MB entry.

Solution

This reuses the general idea of entity_count we already had for collections. The entity_count for a series is calculated based on the ${entity_type}_series views. See additional details on the commit messages.

Testing

Manually, by checking series sidebars, creating a small series collection and making sure the counts appear, and doing both a direct and an indexed series search for "apple" and making sure the counts are there. None of these seemed particularly slow.

@reosarevok
Copy link
Member Author

@brainzbot, retest this please

Copy link
Contributor

@yvanzo yvanzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a small collection it probably doesn’t change much, but it is 10x slower (9s instead of 0.9s roughly on average) on a large collection such as https://musicbrainz.org/collection/923e7570-b682-49a8-ba5f-82f0fecd48e5. Would it be possible to rather cache these counts?

@mwiencek
Copy link
Member

With a small collection it probably doesn’t change much, but it is 10x slower (9s instead of 0.9s roughly on average) on a large collection such as https://musicbrainz.org/collection/923e7570-b682-49a8-ba5f-82f0fecd48e5. Would it be possible to rather cache these counts?

If you're testing over ssh tunnel, it might just be that + the raw number of additional queries. On my machine, each PG query via ssh tunnel can take additional 120 ms or longer on top of the planning and execution time. This additional latency wouldn't be nearly as bad in production. However, it's still possible to reduce the number of queries performed here.

@reosarevok
Copy link
Member Author

@brainzbot, retest this please

Copy link
Member

@mwiencek mwiencek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new query for loading the entity counts is quite fast (completing in ~8ms on hendrix for 100 series IDs), so I don't think caching is needed.

lib/MusicBrainz/Server/Data/Series.pm Outdated Show resolved Hide resolved
);

my @series_ids = map{ $_->id } @series;
my @query_params = (\@series_ids) x scalar(@entity_types);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably partition the ID lists by item_entity_type, so that we only pass the relevant IDs to each subquery.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. So rather than having the one UNION ALL query, partition the ID lists and only run (separately) the queries that actually have relevant series, then set the counts from each run of the entity type loop? Or partition the lists and change this to \@artist_series_ids, \@event_series_ids etc? (can we do the last, as in, can we trust the order of queries and params to always be consistent?)

Comment on lines 419 to 422
$item_entity_type = $self->c->sql->select_single_value(
'SELECT entity_type FROM series_type WHERE gid = ?',
$type_gid,
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, this adds an additional query per search result (so if returning 100 items, 100 additional queries). With an SSH tunnel to hendrix, this makes a search like http://localhost:5000/search?query=the&type=series&limit=100&method=indexed take an additional 30s for me (in production it won't be as noticeable, but is not efficient in either case).

Can we add something like load_item_entity_type to Data::SeriesType (ideally only making a single query) and just call that in external_search?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did something like that, it does seem significantly faster, but do take a look.

It seems useful to be able to see at a glance the amount
of entities in a series, so this adds it to the series sidebar.
The display is the same as what we already do for collections,
and so is most of the implementation
(except for the load_entity_count method itself).
We have views for this and it does not seem particularly slow.
It seems useful to be able to see at a glance the amount
of entities in each series in a collection.
The display is similar to what we show in a user's collections list.
I did not make this sortable right now since the SQL sort
would be kind of annoying - we'd need to get the counts by joining
different views for each of the series in the collection
based on their entity type, which seems nontrivial.
It seems useful to be able to see at a glance the amount
of entities in each series in a search result.

This requires loading the entity_type for the series type
since that is missing from the indexed search data. That in return
needs the type gid, which we have in the search response.
For some reason, we were creating a type with just the name
in schema_fixup_type; this adds the type gid to the fixed up types,
and then loads the entity type for the relevant series types
with load_entity_type_from_gid.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
New feature Non urgent new stuff
Projects
None yet
3 participants