disorder-normalizer

A sieve-based system for normalizing disorder mentions in biomedical data.

The disorder-normalizer tool has been written in Java and is released as free software.

You can find more explanation about our normalization system here.

Usage:

If using the source code available in "src" folder. First, copy the "resources" folder into the "src" folder, then run the program as below.

java tool.Main <terminology/ontology-file> max-sieve-level

Sieve levels:
1 for exact match
2 for abbreviation expansion
3 for subject<->object conversion
4 for numbers replacement
5 for hyphenation
6 for affixation
7 for disorder synonyms replacement
8 for stemming
9 for composite disorder mentions
10 for partial match

An example execution using the "training", "test", and ontology "TERMINOLOGY.txt" files provided in the "ncbi-data" folder is shown below.

C:\disorder-normalizer\src>java tool.Main ..\ncbi-data\training\ ..\ncbi-data\test\ ..\ncbi-data\TERMINOLOGY.txt 10

If using the executable jar file "disorder-normalizer.jar". First, copy the "resources" folder into the same folder as the jar file, then run the program as show in example below.

C:\disorder-normalizer>java -jar disorder-normalizer.jar ..\ncbi-data\training\ ..\ncbi-data\test\ ..\ncbi-data\TERMINOLOGY.txt 10

On executing the program, a new folder called "output" will be automatically created in the same folder as the <test-data-dir> to which the result from normalizing the test mentions will be written. In addition, the system performance on normalizing the test disorder mentions will be printed to the terminal.

Please Note:
* In order to run the tool on new data, please ensure that your training, test, and terminology files are in the same format as the data provided in the "ncbi-data" folder.
* The training data files are used to train the normalizer. However, since disorder-normalizer is not a learning-based system, this folder can be empty.
* The tool will attempt to normalize the mentions in the test data folder files to the terminology.
* The TERMINOLOGY file is the ontology/knowledge-base used to which the test data mentions are normalized.

Detailed explanation about our normalization system can be found on webpage here.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
ncbi-data		ncbi-data
resources		resources
src/tool		src/tool
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
disorder-normalizer.jar		disorder-normalizer.jar

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ncbi-data

ncbi-data

resources

resources

src/tool

src/tool

.gitattributes

.gitattributes

.gitignore

.gitignore

README.md

README.md