
Commit

Merge pull request #1753 from MTG/similarity-solr
Solr-based similarity search
ffont committed Feb 9, 2024
2 parents 5c0df49 + 199ce0f commit 20d47d9
Showing 27 changed files with 811 additions and 477 deletions.
6 changes: 3 additions & 3 deletions DEVELOPERS.md
@@ -144,7 +144,7 @@ If a new search engine backend class is to be implemented, it must closely follo
utils.search.SearchEngineBase docstrings. There is a Django management command that can be used in order to test
the implementation of a search backend. You can run it like:

docker-compose run --rm web python manage.py test_search_engine_backend -fsw --backend utils.search.backends.solr9pysolr.Solr9PySolrSearchEngine
docker compose run --rm web python manage.py test_search_engine_backend -fsw --backend utils.search.backends.solr9pysolr.Solr9PySolrSearchEngine

Please read the documentation of the management command carefully to better understand how it works and how it does the testing.
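As an editorial illustration (not part of this commit), a new backend would be a class implementing the `SearchEngineBase` interface; the method names below are placeholders modelled on the existing Solr backends and should be checked against the actual docstrings:

```python
# Illustrative skeleton only -- method names are assumptions modelled on the
# existing Solr backends; the authoritative interface is documented in the
# utils.search.SearchEngineBase docstrings.
from utils.search import SearchEngineBase


class MyCustomSearchEngine(SearchEngineBase):
    """Hypothetical backend, e.g. utils.search.backends.mybackend.MyCustomSearchEngine."""

    def add_sounds_to_index(self, sound_objects):
        raise NotImplementedError

    def remove_sounds_from_index(self, sound_objects_or_ids):
        raise NotImplementedError

    def search_sounds(self, **kwargs):
        raise NotImplementedError

    def add_forum_posts_to_index(self, forum_post_objects):
        raise NotImplementedError

    def search_forum_posts(self, **kwargs):
        raise NotImplementedError
```

Passing its dotted path to the management command above via `--backend` would run the same checks against it.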
@@ -217,7 +217,7 @@ https://github.com/mtg/freesound-audio-analyzers. The docker compose of the main
services for the external analyzers which depend on docker images having been previously built from the
`freesound-audio-analyzers` repository. To build these images you simply need to checkout the code repository and run
`make`. Once the images are built, Freesound can be run including the external analyzer services of the docker compose
file by running `docker-compose --profile analyzers up`
file by running `docker compose --profile analyzers up`

The new analysis pipeline uses a job queue based on Celery/RabbitMQ. RabbitMQ console can be accessed at port `5673`
(e.g. `http://localhost:5673/rabbitmq-admin`) and using `guest` as both username and password. Also, accessing
@@ -231,7 +231,7 @@ for Freesound async tasks other than analysis).

- Make sure that there are no outstanding deprecation warnings for the version of Django that we are upgrading to.

docker-compose run --rm web python -Wd manage.py test
docker compose run --rm web python -Wd manage.py test

Check for warnings of the form `RemovedInDjango110Warning` (TODO: Make tests fail if a warning occurs)
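One possible way to address that TODO (a sketch only; placing it in `freesound/test_settings.py` is an assumption) is to promote deprecation warnings to errors so any occurrence fails the test run:

```python
# Sketch only: turn Django deprecation warnings into errors during tests so the
# run fails when one is raised. The concrete warning class depends on the Django
# version (e.g. RemovedInDjango110Warning for the upgrade mentioned above);
# RemovedInNextVersionWarning covers the next planned release generically.
import warnings

from django.utils.deprecation import RemovedInNextVersionWarning

warnings.simplefilter("error", RemovedInNextVersionWarning)
```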

34 changes: 17 additions & 17 deletions README.md
@@ -65,35 +65,35 @@ Below are instructions for setting up a local Freesound installation for develop

8. Build all Docker containers. The first time you run this command can take a while as a number of Docker images need to be downloaded and things need to be installed and compiled.

docker-compose build
docker compose build

9. Download the [Freesound development database dump](https://drive.google.com/file/d/11z9s8GyYkVlmWdEsLSwUuz0AjZ8cEvGy/view?usp=share_link) (~6MB), uncompress it and place the resulting `freesound-small-dev-dump-2023-09.sql` in the `freesound-data/db_dev_dump/` directory. Then run the database container and load the data into it using the commands below. You should get permission to download this file from Freesound admins.

docker-compose up -d db
docker-compose run --rm db psql -h db -U freesound -d freesound -f freesound-data/db_dev_dump/freesound-small-dev-dump-2023-09.sql
docker compose up -d db
docker compose run --rm db psql -h db -U freesound -d freesound -f freesound-data/db_dev_dump/freesound-small-dev-dump-2023-09.sql
# or if the above command does not work, try this one
docker-compose run --rm --no-TTY db psql -h db -U freesound -d freesound < freesound-data/db_dev_dump/freesound-small-dev-dump-2023-09.sql
docker compose run --rm --no-TTY db psql -h db -U freesound -d freesound < freesound-data/db_dev_dump/freesound-small-dev-dump-2023-09.sql

10. Update database by running Django migrations

docker-compose run --rm web python manage.py migrate
docker compose run --rm web python manage.py migrate

11. Create a superuser account to be able to log in to the local Freesound website and to the admin site

docker-compose run --rm web python manage.py createsuperuser
docker compose run --rm web python manage.py createsuperuser

12. Install static build dependencies

docker-compose run --rm web npm install --force
docker compose run --rm web npm install --force

13. Build static files. Note that this step will need to be re-run every time there are changes in Freesound's static code (JS, CSS and static media files).

docker-compose run --rm web npm run build
docker-compose run --rm web python manage.py collectstatic --noinput
docker compose run --rm web npm run build
docker compose run --rm web python manage.py collectstatic --noinput

14. Run services 🎉

docker-compose up
docker compose up

When running this command, the most important services that make Freesound work will be run locally.
This includes the web application and database, but also the search engine, cache manager, queue manager and asynchronous workers, including audio processing.
@@ -102,24 +102,24 @@ Below are instructions for setting up a local Freesound installation for develop
15. Build the search index, so you can search for sounds and forum posts

# Open a new terminal window so the services started in the previous step keep running
docker-compose run --rm web python manage.py reindex_search_engine_sounds
docker-compose run --rm web python manage.py reindex_search_engine_forum
docker compose run --rm web python manage.py reindex_search_engine_sounds
docker compose run --rm web python manage.py reindex_search_engine_forum

After following the steps, you'll have a functional Freesound installation up and running, with the most relevant services properly configured.
You can run Django's shell plus command like this:

docker-compose run --rm web python manage.py shell_plus
docker compose run --rm web python manage.py shell_plus

Because the `web` container mounts a named volume for the home folder of the user running the shell plus process, command history should be kept between container runs :)
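For instance, once inside shell_plus (which auto-imports the project models), quick checks like the following are possible; the field names used here are illustrative and should be verified against the `Sound` model:

```python
# Run inside shell_plus; Sound is auto-imported from sounds.models.
# Field names below are assumptions for illustration only.
Sound.objects.count()
Sound.objects.order_by("-created").values_list("id", "original_filename")[:5]
```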

16. (extra step) The steps above will get Freesound running, but to save resources in your local machine some non-essential services will not be started by default. If you look at the `docker-compose.yml` file, you'll see that some services are marked with the profile `analyzers` or `all`. These services include sound similarity, search results clustering and the audio analyzers. To run these services you need to explicitly tell `docker-compose` using the `--profile` (note that some services need additional configuration steps (see *Freesound analysis pipeline* section in `DEVELOPERS.md`):
16. (extra step) The steps above will get Freesound running, but to save resources in your local machine some non-essential services will not be started by default. If you look at the `docker compose.yml` file, you'll see that some services are marked with the profile `analyzers` or `all`. These services include sound similarity, search results clustering and the audio analyzers. To run these services you need to explicitly tell `docker compose` using the `--profile` (note that some services need additional configuration steps (see *Freesound analysis pipeline* section in `DEVELOPERS.md`):

docker-compose --profile analyzers up # To run all basic services + sound analyzers
docker-compose --profile all up # To run all services
docker compose --profile analyzers up # To run all basic services + sound analyzers
docker compose --profile all up # To run all services


### Running tests

You can run tests using the Django test runner in the `web` container like this:

docker-compose run --rm web python manage.py test --settings=freesound.test_settings
docker compose run --rm web python manage.py test --settings=freesound.test_settings
2 changes: 1 addition & 1 deletion _docs/api/source/resources.rst
@@ -80,7 +80,7 @@ Filter name Type Description
``avg_rating`` numerical Average rating for the sound in the range [0, 5].
``num_ratings`` integer Number of times the sound has been rated.
``comment`` string Textual content of the comments of a sound (tokenized). The filter is satisfied if sound contains the filter value in at least one of its comments.
``comments`` integer Number of times the sound has been commented.
``num_comments`` integer Number of times the sound has been commented.
====================== ============= ====================================================
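For reference, the renamed `num_comments` filter can be used like any other numeric filter on the APIv2 text search endpoint; a hedged example (the endpoint and parameter names follow the public API docs, and `YOUR_API_KEY` is a placeholder):

```python
# Example use of the corrected ``num_comments`` filter against the APIv2 text
# search endpoint. YOUR_API_KEY is a placeholder.
import requests

response = requests.get(
    "https://freesound.org/apiv2/search/text/",
    params={
        "query": "piano",
        "filter": "num_comments:[1 TO *]",  # sounds with at least one comment
        "fields": "id,name,num_comments",
        "token": "YOUR_API_KEY",
    },
)
print(response.json()["count"])
```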


43 changes: 28 additions & 15 deletions freesound.code-workspace
@@ -34,28 +34,23 @@
"tasks": {
"version": "2.0.0",
"tasks": [
{
"label": "Run web and search",
"type": "shell",
"command": "docker-compose up web search",
"problemMatcher": []
},

{
"label": "Docker compose build",
"type": "shell",
"command": "docker-compose build",
"command": "docker compose build",
"problemMatcher": []
},
{
"label": "Build static",
"type": "shell",
"command": "docker-compose run --rm web npm run build && docker-compose run --rm web python manage.py collectstatic --clear --noinput",
"command": "docker compose run --rm web npm run build && docker compose run --rm web python manage.py collectstatic --clear --noinput",
"problemMatcher": []
},
{
"label": "Install static",
"type": "shell",
"command": "docker-compose run --rm web npm install --force",
"command": "docker compose run --rm web npm install --force",
"problemMatcher": []
},
{
@@ -67,37 +67,62 @@
{
"label": "Create caches",
"type": "shell",
"command": "docker-compose run --rm web python manage.py create_front_page_caches && docker-compose run --rm web python manage.py create_random_sounds && docker-compose run --rm web python manage.py generate_geotags_bytearray",
"command": "docker compose run --rm web python manage.py create_front_page_caches && docker compose run --rm web python manage.py create_random_sounds && docker compose run --rm web python manage.py generate_geotags_bytearray",
"problemMatcher": []
},
{
"label": "Run tests",
"type": "shell",
"command": "docker-compose run --rm web python manage.py test --settings=freesound.test_settings",
"command": "docker compose run --rm web python manage.py test --settings=freesound.test_settings",
"problemMatcher": []
},
{
"label": "Run tests verbose with warnings",
"type": "shell",
"command": "docker-compose run --rm web python -Wa manage.py test -v3 --settings=freesound.test_settings",
"command": "docker compose run --rm web python -Wa manage.py test -v3 --settings=freesound.test_settings",
"problemMatcher": []
},
{
"label": "Migrate",
"type": "shell",
"command": "docker-compose run --rm web python manage.py migrate",
"command": "docker compose run --rm web python manage.py migrate",
"problemMatcher": []
},
{
"label": "Make migrations",
"type": "shell",
"command": "docker-compose run --rm web python manage.py makemigrations",
"command": "docker compose run --rm web python manage.py makemigrations",
"problemMatcher": []
},
{
"label": "Shell plus",
"type": "shell",
"command": "docker-compose run --rm web python manage.py shell_plus",
"command": "docker compose run --rm web python manage.py shell_plus",
"problemMatcher": []
},
{
"label": "Reindex search engine",
"type": "shell",
"command": "docker compose run --rm web python manage.py reindex_search_engine_sounds && docker compose run --rm web python manage.py reindex_search_engine_forum",
"problemMatcher": []
},
{
"label": "Post dirty sounds to search engine",
"type": "shell",
"command": "docker compose run --rm web python manage.py post_dirty_sounds_to_search_engine",
"problemMatcher": []
},
{
"label": "Orchestrate analysis",
"type": "shell",
"command": "docker compose run --rm web python manage.py orchestrate_analysis",
"problemMatcher": []
}
]
10 changes: 10 additions & 0 deletions freesound/settings.py
@@ -638,6 +638,16 @@
SOLR5_BASE_URL = "http://search:8983/solr"
SOLR9_BASE_URL = "http://search:8983/solr"

SEARCH_ENGINE_SIMILARITY_ANALYZERS = {
FSDSINET_ANALYZER_NAME: {
'vector_property_name': 'embeddings',
'vector_size': 100,
}
}
SEARCH_ENGINE_DEFAULT_SIMILARITY_ANALYZER = FREESOUND_ESSENTIA_EXTRACTOR_NAME
SEARCH_ENGINE_NUM_SIMILAR_SOUNDS_PER_QUERY = 500
USE_SEARCH_ENGINE_SIMILARITY = False

# -------------------------------------------------------------------------------
# Similarity client settings
SIMILARITY_ADDRESS = 'similarity'
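To illustrate how the new `SEARCH_ENGINE_SIMILARITY_ANALYZERS` block above might be consumed (a sketch only; the real lookup lives in the search engine code of this PR, and the helper name is hypothetical):

```python
# Sketch: resolve the vector configuration for an analyzer, defaulting to
# SEARCH_ENGINE_DEFAULT_SIMILARITY_ANALYZER. Only the setting names come from
# the diff above; the helper itself is hypothetical.
from django.conf import settings


def get_similarity_vector_config(analyzer_name=None):
    analyzer_name = analyzer_name or settings.SEARCH_ENGINE_DEFAULT_SIMILARITY_ANALYZER
    config = settings.SEARCH_ENGINE_SIMILARITY_ANALYZERS.get(analyzer_name)
    if config is None:
        # Analyzer does not produce vectors indexed in the search engine
        return None
    return config["vector_property_name"], config["vector_size"]
```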
8 changes: 5 additions & 3 deletions general/tasks.py
@@ -260,10 +260,12 @@ def process_analysis_results(sound_id, analyzer, status, analysis_time, exceptio
{'task_name': PROCESS_ANALYSIS_RESULTS_TASK_NAME, 'sound_id': sound_id, 'analyzer': analyzer, 'status': status,
'exception': str(exception), 'work_time': round(time.time() - start_time)}))
else:
# Load analysis output to database field (following configuration in settings.ANALYZERS_CONFIGURATION)
# Load analysis output to database field (following configuration in settings.ANALYZERS_CONFIGURATION)
a.load_analysis_data_from_file_to_db()
# Set sound to index dirty so that the sound gets reindexed with updated analysis fields
a.sound.mark_index_dirty(commit=True)

if analyzer in settings.SEARCH_ENGINE_SIMILARITY_ANALYZERS or analyzer in settings.ANALYZERS_CONFIGURATION:
# If the analyzer produces data that should be indexed in the search engine, set sound index to dirty so that the sound gets reindexed soon
a.sound.mark_index_dirty(commit=True)
workers_logger.info("Finished processing analysis results (%s)" % json.dumps(
{'task_name': PROCESS_ANALYSIS_RESULTS_TASK_NAME, 'sound_id': sound_id, 'analyzer': analyzer, 'status': status,
'work_time': round(time.time() - start_time)}))
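Sounds flagged with `mark_index_dirty()` are presumably picked up later for reindexing; the `post_dirty_sounds_to_search_engine` management command added to the workspace tasks above can also be triggered from code, for example:

```python
# Illustrative only: run the same management command the workspace task uses,
# e.g. from a maintenance script inside the web container.
from django.core.management import call_command

call_command("post_dirty_sounds_to_search_engine")
```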
2 changes: 2 additions & 0 deletions search/templatetags/search.py
@@ -92,6 +92,8 @@ def display_facet(context, flt, facet, facet_type, title=""):
context['sort'] if context['sort'] is not None else '',
context['weights'] or ''
)
if context['similar_to'] is not None:
element['add_filter_url'] += '&similar_to={}'.format(context['similar_to'])
filtered_facet.append(element)

# We sort the facets by count. Also, we apply an opacity filter on "cloud" type facets
39 changes: 28 additions & 11 deletions search/tests.py
@@ -20,7 +20,7 @@

from django.core.cache import cache
from django.test import TestCase
from django.test.utils import skipIf
from django.test.utils import skipIf, override_settings
from django.urls import reverse
from sounds.models import Sound
from utils.search import SearchResults, SearchResultsPaginator
@@ -142,6 +142,7 @@ def test_search_page_response_ok(self, perform_search_engine_query):
self.assertEqual(resp.context['error_text'], None)
self.assertEqual(len(resp.context['docs']), self.NUM_RESULTS)


@mock.patch('search.views.perform_search_engine_query')
def test_search_page_num_queries(self, perform_search_engine_query):
perform_search_engine_query.return_value = self.perform_search_engine_query_response
@@ -155,16 +155,32 @@ def test_search_page_num_queries(self, perform_search_engine_query):
cache.clear()
with self.assertNumQueries(1):
self.client.get(reverse('sounds-search') + '?cm=1')

# Now check number of queries when displaying results as packs (i.e., searching for packs)
cache.clear()
with self.assertNumQueries(5):
self.client.get(reverse('sounds-search') + '?only_p=1')

# Also check packs when displaying in grid mode
cache.clear()
with self.assertNumQueries(5):
self.client.get(reverse('sounds-search') + '?only_p=1&cm=1')

with override_settings(USE_SEARCH_ENGINE_SIMILARITY=True):
# When using search engine similarity, there'll be one extra query performed to get the similarity status of the sounds

# Now check number of queries when displaying results as packs (i.e., searching for packs)
cache.clear()
with self.assertNumQueries(6):
self.client.get(reverse('sounds-search') + '?only_p=1')

# Also check packs when displaying in grid mode
cache.clear()
with self.assertNumQueries(6):
self.client.get(reverse('sounds-search') + '?only_p=1&cm=1')

with override_settings(USE_SEARCH_ENGINE_SIMILARITY=False):
# When not using search engine similarity, there'll be one less query performed as similarity state is retrieved directly from sound object

# Now check number of queries when displaying results as packs (i.e., searching for packs)
cache.clear()
with self.assertNumQueries(5):
self.client.get(reverse('sounds-search') + '?only_p=1')

# Also check packs when displaying in grid mode
cache.clear()
with self.assertNumQueries(5):
self.client.get(reverse('sounds-search') + '?only_p=1&cm=1')

@mock.patch('search.views.perform_search_engine_query')
def test_search_page_with_filters(self, perform_search_engine_query):
5 changes: 3 additions & 2 deletions search/views.py
@@ -131,6 +131,7 @@ def search_view_helper(request, tags_mode=False):
'filter_query': query_params['query_filter'],
'filter_query_split': filter_query_split,
'search_query': query_params['textual_query'],
'similar_to': query_params['similar_to'],
'group_by_pack_in_request': "1" if group_by_pack_in_request else "",
'disable_group_by_pack_option': disable_group_by_pack_option,
'only_sounds_with_pack': only_sounds_with_pack,
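The new `similar_to` entry above exposes the request parameter driving the Solr-based similarity search; a hedged sketch of exercising it with the Django test client (passing a sound id as the value is an assumption):

```python
# Illustrative only: hit the search view with the new similar_to parameter.
# The parameter name comes from query_params['similar_to'] above; the value
# format (a sound id here) is an assumption.
from django.test import Client
from django.urls import reverse

client = Client()
response = client.get(reverse("sounds-search"), {"similar_to": "1234"})
print(response.status_code)
```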
@@ -152,7 +153,6 @@ def search_view_helper(request, tags_mode=False):
'has_advanced_search_settings_set': contains_active_advanced_search_filters(request, query_params, extra_vars),
'advanced_search_closed_on_load': settings.ADVANCED_SEARCH_MENU_ALWAYS_CLOSED_ON_PAGE_LOAD
}

tvars.update(advanced_search_params_dict)

try:
@@ -205,7 +205,8 @@ def search_view_helper(request, tags_mode=False):
# sure to remove the filters for the corresponding facet field that are already active (so we remove
# redundant information)
if tags_in_filter:
results.facets['tag'] = [(tag, count) for tag, count in results.facets['tag'] if tag not in tags_in_filter]
if 'tag' in results.facets:
results.facets['tag'] = [(tag, count) for tag, count in results.facets['tag'] if tag not in tags_in_filter]

tvars.update({
'paginator': paginator,
