
Search-a-licious

NOTE: this is a prototype that will evolve heavily to become more generic, more robust, and to offer many more features.

This API is currently in development. Read the Search-a-licious roadmap architecture notes to understand where the project is headed.

Organization

The main file is api.py, and the schema is in models/product.py.

A CLI is available to perform common tasks.

Running locally

Note: the Makefile aligns the user id used in the containers with your own uid for a smooth editing experience.

Before running the services, you need to make sure that your system mmap count is high enough for Elasticsearch to run. You can do this by running:

sudo sysctl -w vm.max_map_count=262144
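Note that this setting does not persist across reboots. One way to make it permanent is to append it to /etc/sysctl.conf:

echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf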

Then build the services with:

make build

Start docker:

docker compose up -d

Note: You may encounter a permission error if your user is not part of the docker group, in which case you should either add your user to that group or modify the Makefile to prefix all docker and docker compose commands with sudo.
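On most Linux distributions, adding your user to the docker group looks like this (you will need to log out and back in for the change to take effect):

sudo usermod -aG docker $USER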

Docker spins up:

  • Two Elasticsearch nodes
  • Elasticvue
  • The search service on port 8000
  • Redis on port 6379
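Once the containers are up, you can check their status and follow their logs with the standard Docker Compose commands; Elasticvue gives you a browser UI for inspecting the Elasticsearch indices.

docker compose ps
docker compose logs -f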

You will then need to import from a JSONL dump (see instructions below).

Development

For development, you have two options for running the service:

  1. Docker
  2. Locally

To develop on Docker, make the changes you need, then rebuild the images and bring the services up by running:

make up

However, this tends to be slower than developing locally.

To develop locally, create a venv, install dependencies, then run the service:

virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn app.api:app --reload --port=8001 --workers=4

Note that it's important to use port 8001, as port 8000 will be used by the docker version of the search service.
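To check that the development server is responding, you can hit the interactive API documentation that FastAPI exposes by default (assuming it has not been disabled in this app):

curl http://localhost:8001/docs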

Pre-Commit

This repo uses pre-commit to enforce code styling, etc. To use it:

pre-commit install

To run the checks without committing:

pre-commit run
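Note that pre-commit run only checks files that are currently staged; to run the hooks against the whole repository, use:

pre-commit run --all-files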

Running the import

To import data from the JSONL export, download the dataset into the data folder, then run:

make import-dataset filepath='products.jsonl.gz'
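If you don't have a dump yet, the full Open Food Facts JSONL export can be fetched from the static data server (URL given as an assumption; check the Open Food Facts data page for the current location), then pass the name of the downloaded file as filepath:

wget -P data https://static.openfoodfacts.org/data/openfoodfacts-products.jsonl.gz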

If you get errors, try allocating more RAM (12 GB works well if you have it to spare), or slow down the indexing process by setting num_processes to 1 in the command above.

Typical import time is 45-60 minutes.

If you want to skip updates (e.g. because you don't have Redis installed), use:

make import-dataset filepath='products.jsonl.gz' args="--skip-updates"

Funding

This project has received financial support from the NGI Search (Next Generation Internet) program, funded by the European Commission.

NGI-search logo

European flag