
Commit

Merge pull request #1753 from MTG/similarity-solr
Solr-based similarity search
ffont committed Feb 9, 2024
2 parents 5c0df49 + 199ce0f commit 20d47d9
Showing 27 changed files with 811 additions and 477 deletions.
6 changes: 3 additions & 3 deletions DEVELOPERS.md
@@ -144,7 +144,7 @@ If a new search engine backend class is to be implemented, it must closely follo
utils.search.SearchEngineBase docstrings. There is a Django management command that can be used in order to test
the implementation of a search backend. You can run it like:

docker-compose run --rm web python manage.py test_search_engine_backend -fsw --backend utils.search.backends.solr9pysolr.Solr9PySolrSearchEngine
docker compose run --rm web python manage.py test_search_engine_backend -fsw --backend utils.search.backends.solr9pysolr.Solr9PySolrSearchEngine

Please read the documentation of the management command carefully to better understand how it works and how it does the testing.
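As an editorial illustration (not part of this commit), a new backend would be a class implementing the `SearchEngineBase` interface; the method names below are placeholders modelled on the existing Solr backends and should be checked against the actual docstrings:

```python
# Illustrative skeleton only -- method names are assumptions modelled on the
# existing Solr backends; the authoritative interface is documented in the
# utils.search.SearchEngineBase docstrings.
from utils.search import SearchEngineBase


class MyCustomSearchEngine(SearchEngineBase):
    """Hypothetical backend, e.g. utils.search.backends.mybackend.MyCustomSearchEngine."""

    def add_sounds_to_index(self, sound_objects):
        raise NotImplementedError

    def remove_sounds_from_index(self, sound_objects_or_ids):
        raise NotImplementedError

    def search_sounds(self, **kwargs):
        raise NotImplementedError

    def add_forum_posts_to_index(self, forum_post_objects):
        raise NotImplementedError

    def search_forum_posts(self, **kwargs):
        raise NotImplementedError
```

Passing its dotted path to the management command above via `--backend` would run the same checks against it.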
@@ -217,7 +217,7 @@ https://github.com/mtg/freesound-audio-analyzers. The docker compose of the main
services for the external analyzers which depend on docker images having been previously built from the
`freesound-audio-analyzers` repository. To build these images you simply need to checkout the code repository and run
`make`. Once the images are built, Freesound can be run including the external analyzer services of the docker compose
file by running `docker-compose --profile analyzers up`
file by running `docker compose --profile analyzers up`

The new analysis pipeline uses a job queue based on Celery/RabbitMQ. RabbitMQ console can be accessed at port `5673`
(e.g. `http://localhost:5673/rabbitmq-admin`) and using `guest` as both username and password. Also, accessing
@@ -231,7 +231,7 @@ for Freesound async tasks other than analysis).

- Make sure that there are no outstanding deprecation warnings for the version of Django that we are upgrading to.

docker-compose run --rm web python -Wd manage.py test
docker compose run --rm web python -Wd manage.py test

Check for warnings of the form `RemovedInDjango110Warning` (TODO: Make tests fail if a warning occurs)
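One possible way to address that TODO (a sketch only; placing it in `freesound/test_settings.py` is an assumption) is to promote deprecation warnings to errors so any occurrence fails the test run:

```python
# Sketch only: turn Django deprecation warnings into errors during tests so the
# run fails when one is raised. The concrete warning class depends on the Django
# version (e.g. RemovedInDjango110Warning for the upgrade mentioned above);
# RemovedInNextVersionWarning covers the next planned release generically.
import warnings

from django.utils.deprecation import RemovedInNextVersionWarning

warnings.simplefilter("error", RemovedInNextVersionWarning)
```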

34 changes: 17 additions & 17 deletions README.md
@@ -65,35 +65,35 @@ Below are instructions for setting up a local Freesound installation for develop

8. Build all Docker containers. The first time you run this command can take a while as a number of Docker images need to be downloaded and things need to be installed and compiled.

docker-compose build
docker compose build

9. Download the [Freesound development database dump](https://drive.google.com/file/d/11z9s8GyYkVlmWdEsLSwUuz0AjZ8cEvGy/view?usp=share_link) (~6MB), uncompress it and place the resulting `freesound-small-dev-dump-2023-09.sql` in the `freesound-data/db_dev_dump/` directory. Then run the database container and load the data into it using the commands below. You should get permission to download this file from Freesound admins.

docker-compose up -d db
docker-compose run --rm db psql -h db -U freesound -d freesound -f freesound-data/db_dev_dump/freesound-small-dev-dump-2023-09.sql
docker compose up -d db
docker compose run --rm db psql -h db -U freesound -d freesound -f freesound-data/db_dev_dump/freesound-small-dev-dump-2023-09.sql
# or if the above command does not work, try this one
docker-compose run --rm --no-TTY db psql -h db -U freesound -d freesound < freesound-data/db_dev_dump/freesound-small-dev-dump-2023-09.sql
docker compose run --rm --no-TTY db psql -h db -U freesound -d freesound < freesound-data/db_dev_dump/freesound-small-dev-dump-2023-09.sql

10. Update database by running Django migrations

docker-compose run --rm web python manage.py migrate
docker compose run --rm web python manage.py migrate

11. Create a superuser account to be able to log in to the local Freesound website and to the admin site

docker-compose run --rm web python manage.py createsuperuser
docker compose run --rm web python manage.py createsuperuser

12. Install static build dependencies

docker-compose run --rm web npm install --force
docker compose run --rm web npm install --force

13. Build static files. Note that this step will need to be re-run every time there are changes in Freesound's static code (JS, CSS and static media files).

docker-compose run --rm web npm run build
docker-compose run --rm web python manage.py collectstatic --noinput
docker compose run --rm web npm run build
docker compose run --rm web python manage.py collectstatic --noinput

14. Run services 🎉

docker-compose up
docker compose up

When running this command, the most important services that make Freesound work will be run locally.
This includes the web application and database, but also the search engine, cache manager, queue manager and asynchronous workers, including audio processing.
@@ -102,24 +102,24 @@ Below are instructions for setting up a local Freesound installation for develop
15. Build the search index, so you can search for sounds and forum posts

# Open a new terminal window so the services started in the previous step keep running
docker-compose run --rm web python manage.py reindex_search_engine_sounds
docker-compose run --rm web python manage.py reindex_search_engine_forum
docker compose run --rm web python manage.py reindex_search_engine_sounds
docker compose run --rm web python manage.py reindex_search_engine_forum

After following the steps, you'll have a functional Freesound installation up and running, with the most relevant services properly configured.
You can run Django's shell plus command like this:

docker-compose run --rm web python manage.py shell_plus
docker compose run --rm web python manage.py shell_plus

Because the `web` container mounts a named volume for the home folder of the user running the shell plus process, command history should be kept between container runs :)
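For instance, once inside shell_plus (which auto-imports the project models), quick checks like the following are possible; the field names used here are illustrative and should be verified against the `Sound` model:

```python
# Run inside shell_plus; Sound is auto-imported from sounds.models.
# Field names below are assumptions for illustration only.
Sound.objects.count()
Sound.objects.order_by("-created").values_list("id", "original_filename")[:5]
```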

16. (extra step) The steps above will get Freesound running, but to save resources in your local machine some non-essential services will not be started by default. If you look at the `docker-compose.yml` file, you'll see that some services are marked with the profile `analyzers` or `all`. These services include sound similarity, search results clustering and the audio analyzers. To run these services you need to explicitly tell `docker-compose` using the `--profile` (note that some services need additional configuration steps (see *Freesound analysis pipeline* section in `DEVELOPERS.md`):
16. (extra step) The steps above will get Freesound running, but to save resources in your local machine some non-essential services will not be started by default. If you look at the `docker compose.yml` file, you'll see that some services are marked with the profile `analyzers` or `all`. These services include sound similarity, search results clustering and the audio analyzers. To run these services you need to explicitly tell `docker compose` using the `--profile` (note that some services need additional configuration steps (see *Freesound analysis pipeline* section in `DEVELOPERS.md`):

docker-compose --profile analyzers up # To run all basic services + sound analyzers
docker-compose --profile all up # To run all services
docker compose --profile analyzers up # To run all basic services + sound analyzers
docker compose --profile all up # To run all services


### Running tests

You can run tests using the Django test runner in the `web` container like this:

docker-compose run --rm web python manage.py test --settings=freesound.test_settings
docker compose run --rm web python manage.py test --settings=freesound.test_settings
2 changes: 1 addition & 1 deletion _docs/api/source/resources.rst
@@ -80,7 +80,7 @@ Filter name Type Description
``avg_rating`` numerical Average rating for the sound in the range [0, 5].
``num_ratings`` integer Number of times the sound has been rated.
``comment`` string Textual content of the comments of a sound (tokenized). The filter is satisfied if sound contains the filter value in at least one of its comments.
``comments`` integer Number of times the sound has been commented.
``num_comments`` integer Number of times the sound has been commented.
====================== ============= ====================================================
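For reference, the renamed `num_comments` filter can be used like any other numeric filter on the APIv2 text search endpoint; a hedged example (the endpoint and parameter names follow the public API docs, and `YOUR_API_KEY` is a placeholder):

```python
# Example use of the corrected ``num_comments`` filter against the APIv2 text
# search endpoint. YOUR_API_KEY is a placeholder.
import requests

response = requests.get(
    "https://freesound.org/apiv2/search/text/",
    params={
        "query": "piano",
        "filter": "num_comments:[1 TO *]",  # sounds with at least one comment
        "fields": "id,name,num_comments",
        "token": "YOUR_API_KEY",
    },
)
print(response.json()["count"])
```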


43 changes: 28 additions & 15 deletions freesound.code-workspace
@@ -34,28 +34,23 @@
"tasks": {
"version": "2.0.0",
"tasks": [
{
"label": "Run web and search",
"type": "shell",
"command": "docker-compose up web search",
"problemMatcher": []
},

{
"label": "Docker compose build",
"type": "shell",
"command": "docker-compose build",
"command": "docker compose build",
"problemMatcher": []
},
{
"label": "Build static",
"type": "shell",
"command": "docker-compose run --rm web npm run build && docker-compose run --rm web python manage.py collectstatic --clear --noinput",
"command": "docker compose run --rm web npm run build && docker compose run --rm web python manage.py collectstatic --clear --noinput",
"problemMatcher": []
},
{
"label": "Install static",
"type": "shell",
"command": "docker-compose run --rm web npm install --force",
"command": "docker compose run --rm web npm install --force",
"problemMatcher": []
},
{
@@ -67,37 +67,62 @@
{
"label": "Create caches",
"type": "shell",
"command": "docker-compose run --rm web python manage.py create_front_page_caches && docker-compose run --rm web python manage.py create_random_sounds && docker-compose run --rm web python manage.py generate_geotags_bytearray",
"command": "docker compose run --rm web python manage.py create_front_page_caches && docker compose run --rm web python manage.py create_random_sounds && docker compose run --rm web python manage.py generate_geotags_bytearray",
"problemMatcher": []
},
{
"label": "Run tests",
"type": "shell",
"command": "docker-compose run --rm web python manage.py test --settings=freesound.test_settings",
"command": "docker compose run --rm web python manage.py test --settings=freesound.test_settings",
"problemMatcher": []
},
{
"label": "Run tests verbose with warnings",
"type": "shell",
"command": "docker-compose run --rm web python -Wa manage.py test -v3 --settings=freesound.test_settings",
"command": "docker compose run --rm web python -Wa manage.py test -v3 --settings=freesound.test_settings",
"problemMatcher": []
},
{
"label": "Migrate",
"type": "shell",
"command": "docker-compose run --rm web python manage.py migrate",
"command": "docker compose run --rm web python manage.py migrate",
"problemMatcher": []
},
{
"label": "Make migrations",
"type": "shell",
"command": "docker-compose run --rm web python manage.py makemigrations",
"command": "docker compose run --rm web python manage.py makemigrations",
"problemMatcher": []
},
{
"label": "Shell plus",
"type": "shell",
"command": "docker-compose run --rm web python manage.py shell_plus",
"command": "docker compose run --rm web python manage.py shell_plus",
"problemMatcher": []
},
{
"label": "Reindex search engine",
"type": "shell",
"command": "docker compose run --rm web python manage.py reindex_search_engine_sounds && docker compose run --rm web python manage.py reindex_search_engine_forum",
"problemMatcher": []
},
{
"label": "Post dirty sounds to search engine",
"type": "shell",
"command": "docker compose run --rm web python manage.py post_dirty_sounds_to_search_engine",
"problemMatcher": []
},
{
"label": "Orchestrate analysis",
"type": "shell",
"command": "docker compose run --rm web python manage.py orchestrate_analysis",
"problemMatcher": []
}
]
10 changes: 10 additions & 0 deletions freesound/settings.py
@@ -638,6 +638,16 @@
SOLR5_BASE_URL = "http://search:8983/solr"
SOLR9_BASE_URL = "http://search:8983/solr"

SEARCH_ENGINE_SIMILARITY_ANALYZERS = {
FSDSINET_ANALYZER_NAME: {
'vector_property_name': 'embeddings',
'vector_size': 100,
}
}
SEARCH_ENGINE_DEFAULT_SIMILARITY_ANALYZER = FREESOUND_ESSENTIA_EXTRACTOR_NAME
SEARCH_ENGINE_NUM_SIMILAR_SOUNDS_PER_QUERY = 500
USE_SEARCH_ENGINE_SIMILARITY = False

# -------------------------------------------------------------------------------
# Similarity client settings
SIMILARITY_ADDRESS = 'similarity'
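To illustrate how the new `SEARCH_ENGINE_SIMILARITY_ANALYZERS` block above might be consumed (a sketch only; the real lookup lives in the search engine code of this PR, and the helper name is hypothetical):

```python
# Sketch: resolve the vector configuration for an analyzer, defaulting to
# SEARCH_ENGINE_DEFAULT_SIMILARITY_ANALYZER. Only the setting names come from
# the diff above; the helper itself is hypothetical.
from django.conf import settings


def get_similarity_vector_config(analyzer_name=None):
    analyzer_name = analyzer_name or settings.SEARCH_ENGINE_DEFAULT_SIMILARITY_ANALYZER
    config = settings.SEARCH_ENGINE_SIMILARITY_ANALYZERS.get(analyzer_name)
    if config is None:
        # Analyzer does not produce vectors indexed in the search engine
        return None
    return config["vector_property_name"], config["vector_size"]
```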
8 changes: 5 additions & 3 deletions general/tasks.py
@@ -260,10 +260,12 @@ def process_analysis_results(sound_id, analyzer, status, analysis_time, exceptio
{'task_name': PROCESS_ANALYSIS_RESULTS_TASK_NAME, 'sound_id': sound_id, 'analyzer': analyzer, 'status': status,
'exception': str(exception), 'work_time': round(time.time() - start_time)}))
else:
# Load analysis output to database field (following configuration in settings.ANALYZERS_CONFIGURATION)
# Load analysis output to database field (following configuration in settings.ANALYZERS_CONFIGURATION)
a.load_analysis_data_from_file_to_db()
# Set sound to index dirty so that the sound gets reindexed with updated analysis fields
a.sound.mark_index_dirty(commit=True)

if analyzer in settings.SEARCH_ENGINE_SIMILARITY_ANALYZERS or analyzer in settings.ANALYZERS_CONFIGURATION:
# If the analyzer produces data that should be indexed in the search engine, set sound index to dirty so that the sound gets reindexed soon
a.sound.mark_index_dirty(commit=True)
workers_logger.info("Finished processing analysis results (%s)" % json.dumps(
{'task_name': PROCESS_ANALYSIS_RESULTS_TASK_NAME, 'sound_id': sound_id, 'analyzer': analyzer, 'status': status,
'work_time': round(time.time() - start_time)}))
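Sounds flagged with `mark_index_dirty()` are presumably picked up later for reindexing; the `post_dirty_sounds_to_search_engine` management command added to the workspace tasks above can also be triggered from code, for example:

```python
# Illustrative only: run the same management command the workspace task uses,
# e.g. from a maintenance script inside the web container.
from django.core.management import call_command

call_command("post_dirty_sounds_to_search_engine")
```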
2 changes: 2 additions & 0 deletions search/templatetags/search.py
@@ -92,6 +92,8 @@ def display_facet(context, flt, facet, facet_type, title=""):
context['sort'] if context['sort'] is not None else '',
context['weights'] or ''
)
if context['similar_to'] is not None:
element['add_filter_url'] += '&similar_to={}'.format(context['similar_to'])
filtered_facet.append(element)

# We sort the facets by count. Also, we apply an opacity filter on "cloud" type facets
39 changes: 28 additions & 11 deletions search/tests.py
@@ -20,7 +20,7 @@

from django.core.cache import cache
from django.test import TestCase
from django.test.utils import skipIf
from django.test.utils import skipIf, override_settings
from django.urls import reverse
from sounds.models import Sound
from utils.search import SearchResults, SearchResultsPaginator
@@ -142,6 +142,7 @@ def test_search_page_response_ok(self, perform_search_engine_query):
self.assertEqual(resp.context['error_text'], None)
self.assertEqual(len(resp.context['docs']), self.NUM_RESULTS)


@mock.patch('search.views.perform_search_engine_query')
def test_search_page_num_queries(self, perform_search_engine_query):
perform_search_engine_query.return_value = self.perform_search_engine_query_response
@@ -155,16 +155,32 @@ def test_search_page_num_queries(self, perform_search_engine_query):
cache.clear()
with self.assertNumQueries(1):
self.client.get(reverse('sounds-search') + '?cm=1')

# Now check number of queries when displaying results as packs (i.e., searching for packs)
cache.clear()
with self.assertNumQueries(5):
self.client.get(reverse('sounds-search') + '?only_p=1')

# Also check packs when displaying in grid mode
cache.clear()
with self.assertNumQueries(5):
self.client.get(reverse('sounds-search') + '?only_p=1&cm=1')

with override_settings(USE_SEARCH_ENGINE_SIMILARITY=True):
# When using search engine similarity, there'll be one extra query performed to get the similarity status of the sounds

# Now check number of queries when displaying results as packs (i.e., searching for packs)
cache.clear()
with self.assertNumQueries(6):
self.client.get(reverse('sounds-search') + '?only_p=1')

# Also check packs when displaying in grid mode
cache.clear()
with self.assertNumQueries(6):
self.client.get(reverse('sounds-search') + '?only_p=1&cm=1')

with override_settings(USE_SEARCH_ENGINE_SIMILARITY=False):
# When not using search engine similarity, there'll be one less query performed as similarity state is retrieved directly from sound object

# Now check number of queries when displaying results as packs (i.e., searching for packs)
cache.clear()
with self.assertNumQueries(5):
self.client.get(reverse('sounds-search') + '?only_p=1')

# Also check packs when displaying in grid mode
cache.clear()
with self.assertNumQueries(5):
self.client.get(reverse('sounds-search') + '?only_p=1&cm=1')

@mock.patch('search.views.perform_search_engine_query')
def test_search_page_with_filters(self, perform_search_engine_query):
5 changes: 3 additions & 2 deletions search/views.py
@@ -131,6 +131,7 @@ def search_view_helper(request, tags_mode=False):
'filter_query': query_params['query_filter'],
'filter_query_split': filter_query_split,
'search_query': query_params['textual_query'],
'similar_to': query_params['similar_to'],
'group_by_pack_in_request': "1" if group_by_pack_in_request else "",
'disable_group_by_pack_option': disable_group_by_pack_option,
'only_sounds_with_pack': only_sounds_with_pack,
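The new `similar_to` entry above exposes the request parameter driving the Solr-based similarity search; a hedged sketch of exercising it with the Django test client (passing a sound id as the value is an assumption):

```python
# Illustrative only: hit the search view with the new similar_to parameter.
# The parameter name comes from query_params['similar_to'] above; the value
# format (a sound id here) is an assumption.
from django.test import Client
from django.urls import reverse

client = Client()
response = client.get(reverse("sounds-search"), {"similar_to": "1234"})
print(response.status_code)
```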
@@ -152,7 +153,6 @@ def search_view_helper(request, tags_mode=False):
'has_advanced_search_settings_set': contains_active_advanced_search_filters(request, query_params, extra_vars),
'advanced_search_closed_on_load': settings.ADVANCED_SEARCH_MENU_ALWAYS_CLOSED_ON_PAGE_LOAD
}

tvars.update(advanced_search_params_dict)

try:
@@ -205,7 +205,8 @@ def search_view_helper(request, tags_mode=False):
# sure to remove the filters for the corresponding facet field that are already active (so we remove
# redundant information)
if tags_in_filter:
results.facets['tag'] = [(tag, count) for tag, count in results.facets['tag'] if tag not in tags_in_filter]
if 'tag' in results.facets:
results.facets['tag'] = [(tag, count) for tag, count in results.facets['tag'] if tag not in tags_in_filter]

tvars.update({
'paginator': paginator,
