Warning
This project is a 🚧 work in progress 🚧. We do not recommend you use the code at this stage. Please contact phs.edris@phs.scot with any queries.
Tip
📓 Documentation can be found at https://public-health-scotland.github.io/dose_instruction_parser/
This repository contains code for parsing dose instructions. These are short pieces of free text written on prescriptions to tell patients how to use their medication. An example prescription is shown to the left, with the dose instruction "125mg three times daily" highlighted.
The code is written primarily in Python and consists of two main phases:
1. Named entity recognition (NER), using a model trained via the spacy package to identify phrases linked to key information, e.g. "three times daily" is tagged as FREQUENCY
2. Rules to extract structured output from the recognised entities, e.g. frequencyMin=3.0 frequencyMax=3.0 frequencyType='Day'
Code to create the model (1.) can be found in the model folder. Code to parse dose instructions given a model (2.) can be found in the dose_instruction_parser folder.
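To illustrate the second phase, here is a minimal sketch of how the text of a FREQUENCY entity might be turned into structured fields. It is not the code used in di_parser: the function name parse_frequency and the two lookup tables are hypothetical, and the real rules handle far more phrasing than this.

```python
import re
from typing import Optional

# Hypothetical lookup tables for this sketch only.
NUMBER_WORDS = {"once": 1.0, "twice": 2.0, "one": 1.0, "two": 2.0, "three": 3.0, "four": 4.0}
PERIOD_WORDS = {"daily": "Day", "day": "Day", "weekly": "Week", "week": "Week"}


def parse_frequency(entity_text: str) -> Optional[dict]:
    """Map the text of a FREQUENCY entity to frequencyMin/Max/Type, or None."""
    # Split into lowercase words and numbers, e.g. ["three", "times", "daily"].
    words = re.findall(r"[a-z]+|\d+(?:\.\d+)?", entity_text.lower())
    count = None
    period = None
    for word in words:
        if count is None and word in NUMBER_WORDS:
            count = NUMBER_WORDS[word]
        elif count is None and word[0].isdigit():
            count = float(word)
        elif period is None and word in PERIOD_WORDS:
            period = PERIOD_WORDS[word]
    if count is None or period is None:
        return None  # this sketch gives up rather than guessing
    return {"frequencyMin": count, "frequencyMax": count, "frequencyType": period}


print(parse_frequency("three times daily"))
```

For "three times daily" this yields the same three frequency fields as the example in the list above. A range such as "two to three times daily" would need separate handling for the minimum and maximum, which this sketch does not attempt.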
When the code is installed, dose instructions can be parsed from the command line in the following way:
(di-dev)$ parse_dose_instructions -di "125mg three times daily" -mod "en_edris9"
StructuredDI(inputID=None, text='125mg three times daily', form='mg', dosageMin=125.0, dosageMax=125.0, frequencyMin=3.0, frequencyMax=3.0, frequencyType='Day', durationMin=None, durationMax=None, durationType=None, asRequired=False, asDirected=False)
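The fields of the output above can be mirrored in a small dataclass, which is handy when working with parsed results downstream. This is a sketch only: the field names are copied from the output shown, but the type annotations are assumptions, the helper daily_dose_range is hypothetical, and the actual StructuredDI class in di_parser may be defined differently.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StructuredDI:
    # Field names taken from the command-line output; types are assumed.
    inputID: Optional[int]
    text: str
    form: Optional[str]
    dosageMin: Optional[float]
    dosageMax: Optional[float]
    frequencyMin: Optional[float]
    frequencyMax: Optional[float]
    frequencyType: Optional[str]
    durationMin: Optional[float]
    durationMax: Optional[float]
    durationType: Optional[str]
    asRequired: bool
    asDirected: bool


def daily_dose_range(di: StructuredDI) -> Optional[Tuple[float, float]]:
    """Hypothetical helper: (min, max) dose per day, or None if not computable."""
    needed = (di.dosageMin, di.dosageMax, di.frequencyMin, di.frequencyMax)
    if di.frequencyType != "Day" or any(value is None for value in needed):
        return None
    return (di.dosageMin * di.frequencyMin, di.dosageMax * di.frequencyMax)


example = StructuredDI(
    inputID=None, text="125mg three times daily", form="mg",
    dosageMin=125.0, dosageMax=125.0,
    frequencyMin=3.0, frequencyMax=3.0, frequencyType="Day",
    durationMin=None, durationMax=None, durationType=None,
    asRequired=False, asDirected=False,
)
print(daily_dose_range(example))
```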
Note
Code in the model folder was used to generate a model for phase 1, called edris9. This is based on the med7 model, further trained using examples specific to the prescribing information system data held by Public Health Scotland. Due to information governance, the edris9 model is not public. Please contact phs.edris@phs.scot if you wish to use the model.
Important
The code for the di_parser package is based on the parsigs package. We recommend you have a look at this package as it may be better suited to your needs.
📦dose_instructions_parser
┣ 📂.github
┃ ┣ 📂workflows
┣ 📂coverage # code coverage information
┣ 📂doc # documentation
┃ ┣ 📂examples # -- example scripts
┃ ┗ 📂sphinx # -- source behind github pages docs
┃ ┃ ┣ 📂source
┃ ┃ ┃ ┣ 📂doc_pages
┃ ┃ ┃ ┣ 📂modules
┃ ┃ ┃ ┃ ┗ 📂di_parser
┃ ┃ ┃ ┣ 📂_static
┣ 📂dose_instruction_parser # package for parsing dose instructions
┃ ┣ 📂di_parser
┃ ┃ ┣ 📂data
┃ ┃ ┣ 📂tests
┣ 📂model # code for creating NER model
┃ ┣ 📂config # -- model configuration
┃ ┣ 📂data # -- processed .spacy data created here
┃ ┣ 📂preprocess # -- code for pre-processing training data
┃ ┃ ┣ 📂processed # ---- intermediate processing carried out here
┃ ┃ ┣ 📂tagged # ---- put tagged .json training data here
┗ ┗ 📂setup # -- script for setting up conda for model development
There are several different ways to set up the project. Please choose the one which is right for you.
If you are a PHS analyst and just want to parse dose instructions you can do this directly using R. You will need to follow the dose instructions SOP, which you can obtain from colleagues in eDRIS.
If you are an analyst wishing to develop the model or code, see below.
Warning
This package is 🚧 not yet available 🚧 on PyPI. This functionality is coming soon!
conda create -n di # setup new conda env
conda activate di # activate
pip install di_parser # install di_parser from PyPI
parse_dose_instructions -h # get help on parsing dose instructions
1. Clone this repository
2. Add a file called secrets.env in the top level of the cloned repository with the following contents:
   export DI_FILEPATH="</path/to/model/folder>"
   This sets the environment variable DI_FILEPATH, where the code will read/write models. If you are working within Public Health Scotland please contact phs.edris@phs.scot to receive the filepath.
3. Run
   cd model/setup/
   source ./set_up_conda.sh
   to set up the conda environment (default name model)
4. Activate the environment with e.g.
   conda activate model
1. Clone this repository
2. Add a file called secrets.env in the top level of the cloned repository with the following contents:
   export DI_FILEPATH="</path/to/model/folder>"
   This sets the environment variable DI_FILEPATH, where the code will read/write models. If you are working within Public Health Scotland please contact phs.edris@phs.scot to receive the filepath.
3. Create a new conda environment and activate it:
   conda create -n di-dev
   conda activate di-dev
4. Install the package using an editable pip install with development dependencies:
   python -m pip install -e dose_instruction_parser[dev]
   Important
   Make sure you run this from the top directory of the repository
5. Get developing!
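Before running anything that reads or writes models, it can help to confirm that DI_FILEPATH is visible. The sketch below is one way to check, assuming secrets.env contains a line of the form shown in the steps above. The function name find_di_filepath is hypothetical, and the project itself may load secrets.env in a different way.

```python
import os
from pathlib import Path
from typing import Optional


def find_di_filepath(env_file: str = "secrets.env") -> Optional[str]:
    """Return DI_FILEPATH from the environment, else from env_file, else None."""
    value = os.environ.get("DI_FILEPATH")
    if value:
        return value
    path = Path(env_file)
    if not path.is_file():
        return None
    for line in path.read_text().splitlines():
        line = line.strip()
        # Accept both `export DI_FILEPATH=...` and `DI_FILEPATH=...`.
        if line.startswith("export "):
            line = line[len("export "):].strip()
        if line.startswith("DI_FILEPATH="):
            # Drop surrounding single or double quotes, if any.
            return line.split("=", 1)[1].strip().strip("\"'")
    return None


if __name__ == "__main__":
    found = find_di_filepath()
    print(found if found else "DI_FILEPATH is not set and no secrets.env entry was found")
```

Run it from the top level of the cloned repository so that the default secrets.env path resolves.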
- 📓 Check out the documentation at https://public-health-scotland.github.io/dose_instruction_parser/ for more information on how to use and develop the code
- 💊 See the README in the dose_instruction_parser folder for information on the di_parser package
- 🔧 See the README in the doc/sphinx folder for information on adding to the documentation
- 👷 See the README in the .github/workflows folder for information on GitHub workflows for this repository
- 📧 Contact phs.edris@phs.scot with any queries