Offline indexing for documentation to make it searchable.
- Clone this repository
- Clone https://github.com/cloudfoundry/docs-bosh (leave the name docs-bosh)
- cd python
- make (assumes you have python3 installed)
This will index the documentation, creating index_*.json files in the current directory.
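To confirm that indexing produced something, the generated files can be inspected with a short script. This is only a sketch: the function name summarize_indexes is hypothetical, and it assumes each index_*.json file holds a JSON object or array at the top level, which this README does not specify.

```python
import glob
import json
import os


def summarize_indexes(directory="."):
    """Return {filename: number of top-level entries} for each index_*.json file.

    Assumes the top-level JSON value is an object or an array.
    """
    summary = {}
    for path in sorted(glob.glob(os.path.join(directory, "index_*.json"))):
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
        summary[os.path.basename(path)] = len(data)
    return summary


if __name__ == "__main__":
    for name, count in summarize_indexes().items():
        print(f"{name}: {count} top-level entries")
```

An empty result means no index files were found in the directory, which usually points to make having failed or having been run somewhere else.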
Now that we have the indexes, we can feed them into a search engine to test things out. Let's copy the indexes to a place where the web app can find them.
$ make sync
Now that the data is copied, we prepare and launch the web app. First, we set up the Python libraries needed to run the app:
$ pip install -r requirements.txt
Next we need to configure some essential variables for the app:
$ source dev_variables
(if you type echo $DOCS_URL and get nothing, something went wrong)
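The same check can be done from Python before launching the app. This is a minimal sketch: missing_variables is a hypothetical helper, and DOCS_URL is the only variable name this README confirms. A Python process only sees variables that dev_variables exports, so a variable that echo prints in the shell can still be reported as missing here.

```python
import os


def missing_variables(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables(["DOCS_URL"])
    if missing:
        print("Not set: " + ", ".join(missing))
    else:
        print("DOCS_URL is set")
```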
We're now ready to launch the app:
$ make run
If you get an error about ports, edit the Makefile and change 8001 to another port. If all went well, you can now navigate to http://localhost:8001 (or the port you chose) to play with the search engine.
Currently two types of search are possible:
- search/title/[query]
- search/content/[query]
As the names suggest, the first matches documents based on title, the second based on the entire contents.
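The two routes above can be turned into request URLs with a small helper. This is a sketch under assumptions: build_search_url is a hypothetical name, the base URL is the default from the launch step, and percent-encoding the query is a guess at what the app accepts. The response format is not described here, so the example stops at building the URL.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8001"  # default port from the launch step


def build_search_url(kind, query, base_url=BASE_URL):
    """Build a URL for the title or content search route."""
    if kind not in ("title", "content"):
        raise ValueError("kind must be 'title' or 'content'")
    # safe="" also encodes "/" so the query stays a single path segment.
    return f"{base_url}/search/{kind}/{quote(query, safe='')}"


if __name__ == "__main__":
    print(build_search_url("title", "bosh deploy"))
    print(build_search_url("content", "stemcell"))
```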
- 01/15/17
  - started project in Golang
  - rewrote initial project in Python
  - introduced stop words
- 01/16/17
  - introduced punctuation
  - finished a basic prototype
  - introduced a search client
- 01/20/17
  - introduced LocalRepository, Document
  - tokenize entire contents of docs, not just titles
  - store positional info on tokens
    - useful for efficient phrasal queries
    - supports induction of titles (line_num == 2)
  - decided not to do stemming
- 01/21/17
  - introduced TitleIndexer, ContextIndexer
  - added options to search by title or by content in the web app
  - refactored repo module
  - extracted dict to json routine into a utility function
- The set of stop words is from Kevin Bougé.
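
Several ideas from the log above (stop words, punctuation handling, positional info for phrasal queries, no stemming) can be illustrated with a small positional index. This is a sketch, not the project's code: tokenize, build_index, and phrase_search are hypothetical names, the stop-word set is a tiny illustrative sample rather than Kevin Bougé's list, and the real indexers may store positions differently.

```python
import string
from collections import defaultdict

# Tiny illustrative sample, not the stop-word list the project uses.
STOP_WORDS = {"a", "an", "the", "of", "to", "and"}

# Map every punctuation character to a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def tokenize(text):
    """Lowercase, strip punctuation, drop stop words, keep original positions."""
    words = text.lower().translate(_PUNCT_TABLE).split()
    return [(pos, word) for pos, word in enumerate(words) if word not in STOP_WORDS]


def build_index(docs):
    """Build {word: {doc_id: [positions]}} from a {doc_id: text} mapping."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in tokenize(text):
            index[word][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return sorted doc ids where the phrase's words occur at matching offsets."""
    terms = tokenize(phrase)
    if not terms:
        return []
    first_offset, first_word = terms[0]
    hits = []
    for doc_id, starts in index.get(first_word, {}).items():
        for start in starts:
            # Every later term must sit at the same relative offset as in the phrase.
            if all(
                start + offset - first_offset in index.get(word, {}).get(doc_id, [])
                for offset, word in terms[1:]
            ):
                hits.append(doc_id)
                break
    return sorted(hits)


if __name__ == "__main__":
    docs = {
        "a.md": "Deploy the director, then upload a release.",
        "b.md": "The director can deploy releases.",
    }
    index = build_index(docs)
    print(phrase_search(index, "deploy the director"))
```

Positions are assigned before stop words are removed, so the gaps they leave are preserved and a phrase matches only when its words sit at the same relative offsets in the document. Without stemming, "release" and "releases" are distinct terms.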