NOTE: this is a prototype that will evolve heavily to become more generic, more robust, and to offer much more functionality.
This API is currently in development. Read the Search-a-licious roadmap architecture notes to understand where we are headed.
The main file is api.py, and the schema is in models/product.py.
A CLI is available to perform common tasks.
Note: the Makefile will align the user id with your own uid for a smooth editing experience.
Before running the services, you need to make sure that your system's mmap count is high enough for Elasticsearch to run. You can do this by running:
sudo sysctl -w vm.max_map_count=262144
To make the setting persist across reboots, add vm.max_map_count=262144 to /etc/sysctl.conf.
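To verify the current value before starting, you can read it from procfs; a minimal check, assuming a Linux host:

```python
# Read the current vm.max_map_count from procfs (Linux only).
# Elasticsearch requires at least 262144.
with open("/proc/sys/vm/max_map_count") as f:
    current = int(f.read().strip())

print("vm.max_map_count =", current)
if current < 262144:
    print("Too low for Elasticsearch; raise it with sysctl as shown above.")
```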
Then build the services with:
make build
Start the containers:
docker compose up -d
Note
You may encounter a permission error if your user is not part of the docker
group; in that case, either add your user to the group or modify the Makefile
to prefix all docker and docker compose commands with sudo.
Docker spins up:
- Two Elasticsearch nodes
- Elasticvue
- The search service on port 8000
- Redis on port 6379
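Once the stack is up, you can check that the published ports are reachable. The port numbers below come from the list above; probing localhost is an assumption for a default local setup. A minimal stdlib sketch:

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Ports published by the compose stack (per the list above)
for name, port in [("search service", 8000), ("redis", 6379)]:
    print(name, "up" if port_open("localhost", port) else "down")
```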
You will then need to import from a JSONL dump (see instructions below).
For development, you have two options for running the service:
- Docker
- Locally
To develop on Docker, make the changes you need, then rebuild the image and restart the stack by running:
make up
However, this tends to be slower than developing locally.
To develop locally, create a virtual environment, install the dependencies, then run the service:
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn app.api:app --reload --port=8001 --workers=4
Note that it's important to use port 8001, as port 8000 will be used by the docker version of the search service.
This repo uses pre-commit to enforce code styling, etc. To use it:
pre-commit install
To run the checks on staged files without committing:
pre-commit run
To import data from the JSONL export, download the dataset in the data
folder, then run:
make import-dataset filepath='products.jsonl.gz'
If you get errors, try adding more RAM (12GB works well if you have it to spare), or slow down the indexing process by setting num_processes to 1 in the command above.
Typical import time is 45-60 minutes.
If you want to skip updates (e.g. because you don't have Redis installed),
use make import-dataset filepath='products.jsonl.gz' args="--skip-updates"
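For reference, the dump is JSONL: one JSON document per line, gzip-compressed. A minimal sketch of iterating over it with the standard library (the file name matches the command above; the fields of each document are whatever the dump contains):

```python
import gzip
import json

def iter_products(path: str):
    """Yield one product dict per non-empty line of a gzip-compressed JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Example: count documents without loading the whole dump into memory
# total = sum(1 for _ in iter_products("products.jsonl.gz"))
```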
This project has received financial support from the NGI Search (Next Generation Internet) program, funded by the European Commission.