Fork of dep_search with additional features like random search, xpos search or search for short sentences.
Requires Docker.
Run docker build -t drevesnik_base -f Dockerfile_service_base .
in the root of this project.
To run this project, you first need to build indexes from conllu databases. The location of the indexes at run time is set in docker-compose.yml by the INDEXES_FOLDER volume.
Create a folder with the conllu database files. For each conllu file, add a JSON metadata file with the same name as the conllu file.
The metadata JSON file should contain:
- name - Name of the database.
- _desc - HTML name and description of the database, shown on the first page for each language. Slovenian (sl) should be supported by default. Proposed form:
<b>name</b><br> description
- priority - (optional) Defines how high the database is shown on the first page. A lower number gives higher priority. If no number is given, the database gets the lowest priority.
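The metadata format above can be sketched in Python. This is a minimal sketch, not the project's own tooling: the helper names are hypothetical, the pairing of corpus.conllu with corpus.json assumes that "same name" means the same file stem, and the description key is written literally as _desc as listed above (if the project expects a language-prefixed key, pass it through the desc_key argument).

```python
import json
from pathlib import Path


def build_metadata(name, description, priority=None, desc_key="_desc"):
    """Return the metadata dict for one database (hypothetical helper)."""
    metadata = {
        "name": name,
        # Proposed form from the README: <b>name</b><br> description
        desc_key: f"<b>{name}</b><br> {description}",
    }
    # priority is optional; leaving it out means lowest priority.
    if priority is not None:
        metadata["priority"] = priority
    return metadata


def metadata_path(conllu_path):
    """Assumption: corpus.conllu is paired with corpus.json in the same folder."""
    return Path(conllu_path).with_suffix(".json")


def write_metadata(conllu_path, name, description, priority=None):
    """Write the metadata file next to the conllu file and return its path."""
    path = metadata_path(conllu_path)
    text = json.dumps(
        build_metadata(name, description, priority),
        ensure_ascii=False,
        indent=2,
    )
    path.write_text(text, encoding="utf-8")
    return path
```

Calling write_metadata("corpora/ssj.conllu", "SSJ", "Example description", priority=1) would write corpora/ssj.json, provided the corpora folder exists.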
Set up the corpora mounted folder in Dockerfile_build_dataset.yml to point to the conllu folder, and the output folder to point to the folder where the database indexes will be generated.
Run docker-compose -f docker-compose-build-dataset.yml up --build
in the root of this project.
The config folder location is set in docker-compose.yml by the CONFIG_FOLDER volume.
Each line of <config_folder>/cache_calls.txt contains a call that is executed when the service starts. Its results are cached permanently so that these queries return almost instantly. Each line contains the query parameters separated by tab characters.
Parameters:
- Databases: Names of the databases to be queried are separated with a comma. ex: SSJ,SST
- Query: Query string. ex: _ <nsubj _
- Language: Language of the query (should be sl)
- Limit: Maximum number of sentences returned.
- Case: true for case-insensitive search, otherwise false.
- Random: true for random hits, otherwise false.
- Short sentence search: true to return only sentences with 15 or fewer tokens, otherwise false.
- Ticket: Name of the ticket under which the query will be saved (it determines the URL where the query results appear). It should not contain any of the characters . / _
Each of these queries can then be accessed at http://localhost/drevesnik/show/Ticket/Language/0/10.
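As an illustration, one line of cache_calls.txt and its result URL could be assembled as below. This is a sketch under assumptions: the function names are hypothetical, the fields are assumed to appear in the order they are listed above, and the booleans are assumed to be written as lowercase true/false.

```python
FORBIDDEN_TICKET_CHARS = (".", "/", "_")


def cache_call_line(databases, query, language, limit,
                    case_insensitive, random_hits, short_sentences, ticket):
    """Build one tab-separated line for cache_calls.txt (hypothetical helper)."""
    bad = [c for c in FORBIDDEN_TICKET_CHARS if c in ticket]
    if bad:
        raise ValueError(f"ticket contains forbidden characters: {bad}")
    fields = [
        ",".join(databases),            # e.g. SSJ,SST
        query,                          # e.g. _ <nsubj _
        language,                       # should be sl
        str(limit),                     # maximum number of sentences
        str(case_insensitive).lower(),  # true / false
        str(random_hits).lower(),
        str(short_sentences).lower(),
        ticket,
    ]
    # A tab inside a field would shift every later column.
    if any("\t" in field for field in fields):
        raise ValueError("fields must not contain tab characters")
    return "\t".join(fields)


def result_url(ticket, language="sl", base="http://localhost/drevesnik"):
    """URL where the cached results appear, following the pattern above."""
    return f"{base}/show/{ticket}/{language}/0/10"
```

For example, cache_call_line(["SSJ", "SST"], "_ <nsubj _", "sl", 100, False, False, True, "nsubj-demo") gives a line whose results would then be served at result_url("nsubj-demo"). The ticket name nsubj-demo is made up for the example.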
For Slovenian and English, there is a help page at http://localhost/ for Slovenian and http://localhost// for other languages.
The help page is generated from the markdown file <config_folder>/dep-search_query-lang_original_<lang>.md.
Create a <config_folder>/statistics_translations_<lang> file for each language. There are examples for Slovenian and English.
Create a <config_folder>/html_translations/<lang> file for each language. There are examples for Slovenian and English.
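Before starting the service, it can help to check that the per-language config files are in place. The sketch below is a hypothetical helper, not part of the project. It builds the expected paths from the patterns named above, using the file names exactly as written here, so real files may carry an extension that is not shown.

```python
from pathlib import Path


def expected_config_files(config_folder, languages):
    """List the config paths described above for the given language codes."""
    root = Path(config_folder)
    paths = [root / "cache_calls.txt"]
    for lang in languages:
        paths.append(root / f"dep-search_query-lang_original_{lang}.md")
        paths.append(root / f"statistics_translations_{lang}")
        paths.append(root / "html_translations" / lang)
    return paths


def missing_config_files(config_folder, languages):
    """Return the expected paths that do not exist on disk."""
    return [
        path
        for path in expected_config_files(config_folder, languages)
        if not path.exists()
    ]
```

Running missing_config_files("config", ["sl", "en"]) returns an empty list when everything is present. The folder name config is only an example.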
Put all brands in <config_folder>/branding.json, following the example file.
Each branding entry should have:
- url - Web page link.
- image - Image URL. It can also be an image from the api_gui/static folder, as in the examples in <config_folder>/branding.json.
- alt - Text shown if the image URL is not accessible.
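A single branding entry can be checked against the three required keys with a few lines of Python. This is only a sketch: the helper name is hypothetical, and it inspects one entry at a time because the overall layout of branding.json is defined by the example file rather than described here.

```python
REQUIRED_BRANDING_KEYS = ("url", "image", "alt")


def missing_branding_keys(entry):
    """Return the required keys that are absent or empty in one branding entry."""
    return [key for key in REQUIRED_BRANDING_KEYS if not entry.get(key)]
```

An entry such as {"url": "https://example.org", "image": "logo.png", "alt": "Example"} (made-up values) yields an empty list.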
Run
docker-compose up --build
to build and run this project. The home page of this service can be accessed at http://localhost/drevesnik.
The dep_search folder contains all the logic behind querying parsed sentences. The core script is dep_search/query.py. To run locally, the Cython scripts need to be built with setup.py by running pip install -e . in the root of this project.
The api_gui folder contains the Flask web application. The core script is api_gui/api.py, where you can find all the endpoints.
The api_gui/templates folder contains all the HTML templates, and the api_gui/static folder contains CSS and JS scripts and images.