
greenelab/word-lapse


Word Lapse

Explore how a word changes over time

✨ OPEN THE APP ✨

How does it work?

Type in a word, and Word Lapse will show you how its associated words and frequency of use have changed over the years, along with some other interesting details. This information was generated by training multiple machine-learning models on 30+ million documents from PubTator Central and 160+ thousand preprints from bioRxiv and medRxiv.
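As a rough illustration of "frequency of use over the years", the sketch below counts how often a word appears in each year's documents and normalizes by that year's token count, so that years with more publications are not automatically favored. The per-million-tokens normalization, the naive whitespace tokenizer, and the `yearly_frequency` name are assumptions for illustration; Word Lapse's actual counting and normalization may differ.

```python
def yearly_frequency(docs_by_year, word):
    """Return {year: occurrences of `word` per million tokens} (assumed normalization)."""
    target = word.lower()
    result = {}
    for year in sorted(docs_by_year):
        hits = total = 0
        for doc in docs_by_year[year]:
            tokens = doc.lower().split()  # naive whitespace tokenizer, for illustration only
            hits += tokens.count(target)
            total += len(tokens)
        # Guard against years with no tokens to avoid dividing by zero.
        result[year] = hits / total * 1_000_000 if total else 0.0
    return result


# Tiny hypothetical corpus keyed by publication year.
corpus = {
    2019: ["the virus spreads", "a new virus"],
    2020: ["virus virus pandemic", "pandemic response"],
}
print(yearly_frequency(corpus, "virus"))
```

Normalizing per year matters here because the number of indexed documents grows over time, so raw counts alone would suggest that almost every word is becoming more popular.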

Specifically, we used the Word2Vec natural-language-processing (NLP) technique, which represents each word as a dense, 300-dimensional vector. Word2Vec learns these vectors by training a shallow neural network on a prediction task: predicting a word from its neighboring words. Once training is complete, the vectors encode enough contextual information to distinguish one word from another, which lets us perform downstream tasks such as changepoint detection.
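The task of "predicting a word from its neighboring words" matches Word2Vec's continuous bag-of-words (CBOW) objective; we have not confirmed whether the project used CBOW or the skip-gram variant. The minimal sketch below shows two pieces under that assumption: how (context, target) training examples are cut from a sentence with a sliding window, and how trained vectors can then be ranked by cosine similarity to find a word's closest words. Treating "associated words" as cosine-similarity neighbors is also an assumption, and the 3-dimensional toy vectors are made-up stand-ins for real 300-dimensional embeddings.

```python
import math


def cbow_pairs(tokens, window=2):
    """Yield (context_words, target_word) pairs, the training examples for CBOW."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield left + right, target


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(word, vectors, k=3):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


tokens = "the virus infects the host cell".split()
for context, target in cbow_pairs(tokens, window=1):
    print(context, "->", target)

# Hypothetical toy vectors standing in for 300-dimensional Word2Vec output.
toy_vectors = {
    "virus": [0.9, 0.1, 0.0],
    "pathogen": [0.8, 0.2, 0.1],
    "cell": [0.1, 0.9, 0.2],
    "protein": [0.2, 0.7, 0.6],
}
print(nearest_neighbors("virus", toy_vectors, k=2))
```

Cosine similarity ignores vector length and compares only direction, which is the usual choice for comparing word embeddings.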

For more technical information about our approach and how we generated this data, see this paper.
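Changepoint detection asks in which year a word's time series shifts abruptly. The sketch below uses one simple technique, a least-squares search for a single mean shift: every split point is tried, each side is modeled by its mean, and the split with the lowest total squared error wins. This is a swapped-in illustration, not necessarily the method described in the paper, and the "drift" series (a per-year distance between a word's vector and its earliest vector) is hypothetical data.

```python
def best_changepoint(series):
    """Return (index, cost) for the split that best models `series` as two constant segments.

    Returns (None, inf) when the series is too short to split.
    """
    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    best_index, best_cost = None, float("inf")
    for i in range(1, len(series)):
        cost = sse(series[:i]) + sse(series[i:])
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index, best_cost


years = list(range(2010, 2020))
# Hypothetical per-year semantic drift values with an obvious jump.
drift = [0.10, 0.12, 0.11, 0.09, 0.10, 0.45, 0.47, 0.44, 0.46, 0.48]
index, cost = best_changepoint(drift)
print("changepoint year:", years[index])
```

Applying this search recursively to each segment (binary segmentation) extends it to multiple changepoints. Production tools typically also add a penalty or significance test so that noise alone does not produce a split.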

API

The API for this application can be used directly at https://api-wl.greenelab.com/.

See the API documentation
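For scripted access, a small standard-library client like the one below can build query URLs and decode JSON responses from https://api-wl.greenelab.com/. The `neighbors` path and `tok` parameter in the example are hypothetical placeholders; consult the API documentation for the real routes and parameter names. Only the base URL comes from this README.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

API_BASE = "https://api-wl.greenelab.com/"


def build_url(path, **params):
    """Join an endpoint path onto the API base and URL-encode any query parameters."""
    url = urljoin(API_BASE, path)
    return f"{url}?{urlencode(params)}" if params else url


def get_json(path, **params):
    """Fetch and decode a JSON response (requires network access)."""
    with urlopen(build_url(path, **params), timeout=30) as response:
        return json.load(response)


# "neighbors" and "tok" are hypothetical; check the API documentation for real routes.
print(build_url("neighbors", tok="pandemic"))
```

Keeping URL construction separate from the network call makes the client easy to test offline and to point at a local development server by changing `API_BASE`.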

License

Everything in this repo, including the code, data, submodules, and app, is licensed under BSD-3. See the license file.

Development

To separate concerns and make cloning and developing this repo easier, the model data (26+ GB) for this project is stored in a separate submodule repo. See SUBMODULES.md for more information.

The backend for this app (under /server) consists of three components:

  • a RESTful API implemented in FastAPI
  • a Redis in-memory cache with write-through to disk
  • a set of RQ workers that process word statistic lookups
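The three backend components can be pictured with a standard-library imitation of the request flow. In this sketch a dict persisted to a JSON file stands in for the Redis cache with write-through to disk, worker threads reading from `queue.Queue` stand in for RQ workers, and `lookup` stands in for an API handler. We assume the API checks the cache first and enqueues a worker job only on a miss; `WriteThroughCache`, `word_stats`, `worker`, `lookup`, and `demo` are hypothetical names, and none of this is the repo's actual FastAPI, Redis, or RQ code.

```python
import json
import queue
import tempfile
import threading
from pathlib import Path


class WriteThroughCache:
    """In-memory dict whose every write is also persisted to a JSON file."""

    def __init__(self, path):
        self.path = Path(path)
        self.lock = threading.Lock()
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def get(self, key):
        with self.lock:
            return self.data.get(key)

    def set(self, key, value):
        with self.lock:
            self.data[key] = value
            self.path.write_text(json.dumps(self.data))  # write through to disk


def word_stats(word):
    """Hypothetical stand-in for an expensive per-word statistics lookup."""
    return {"word": word, "length": len(word)}


def worker(jobs, cache):
    """Process queued words until a None sentinel arrives."""
    while True:
        word = jobs.get()
        if word is None:
            jobs.task_done()
            break
        if cache.get(word) is None:
            cache.set(word, word_stats(word))
        jobs.task_done()


def lookup(word, jobs, cache):
    """API-handler stand-in: serve from cache, otherwise enqueue and wait."""
    cached = cache.get(word)
    if cached is not None:
        return cached
    jobs.put(word)
    jobs.join()  # a real API would more likely return a job id than block
    return cache.get(word)


def demo(words, n_workers=2):
    """Run lookups through the cache and queue, then reload the cache from disk."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "cache.json"
        cache, jobs = WriteThroughCache(path), queue.Queue()
        threads = [threading.Thread(target=worker, args=(jobs, cache)) for _ in range(n_workers)]
        for t in threads:
            t.start()
        results = [lookup(w, jobs, cache) for w in words]
        for _ in threads:
            jobs.put(None)  # one stop sentinel per worker
        for t in threads:
            t.join()
        on_disk = WriteThroughCache(path).data
    return results, on_disk


print(demo(["pandemic", "virus", "pandemic"]))
```

Reloading the cache from its file at the end shows the point of write-through: results computed by the workers survive a restart instead of living only in memory.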

The front-facing app (under /app) is made with React, bootstrapped with create-react-app.