BMKEG/sciDT-pipeline: UimaBioC-based pipelines to drive sciDT analysis.

This system is specifically designed to prepare data for the 'Science Discourse Tagger' (https://github.com/BMKEG/sciDT), originally developed by Ed Hovy's group at CMU. It consists of UIMA pipelines that process open access articles from PubMed Central to generate split-clause data for sciDT. We also provide scripts to facilitate gathering training data from human annotations, along with functions to export results to our preferred output formats.

Instructions for Mac

Install the necessary dependencies for nxml2txt (using MacPorts):

sudo port install texlive-latex texlive-latex-recommended texlive-latex-extra py-lxml

Install nxml2txt and add the binary to your $PATH:

git clone https://github.com/spyysalo/nxml2txt.git
cd nxml2txt
chmod 755 nxml2txt nxml2txt.sh
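The commands above clone nxml2txt and make its scripts executable, but they do not change $PATH. One way to do that for the current shell session is sketched below. The function name add_to_path is hypothetical (it is not part of nxml2txt or this repository), and the snippet assumes you run it from inside the cloned nxml2txt directory.

```shell
# Hypothetical helper (not part of nxml2txt or this repository):
# append a directory to PATH for the current shell session, unless it is already there.
add_to_path() {
  local dir="$1"
  case ":$PATH:" in
    *":$dir:"*) ;;                           # already on PATH: nothing to do
    *) export PATH="${PATH:+$PATH:}$dir" ;;  # append; avoids a leading ':' if PATH was empty
  esac
}

# Assumption: this is run from inside the cloned nxml2txt directory.
add_to_path "$(pwd)"
command -v nxml2txt || echo "nxml2txt not found on PATH" >&2
```

Source the snippet (or paste it into your shell) rather than executing it as a separate script, since a child process cannot change its parent's environment. To make the change permanent, add an equivalent export PATH line to your shell's startup file (for example ~/.zshrc or ~/.bash_profile).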

Clone and build this library:

git clone https://github.com/BMKEG/sciDT-pipeline
cd sciDT-pipeline
mvn -DskipTests clean assembly:assembly

This will build a fully assembled jar file here:

The easiest way to run the system is with the provided shell script, which executes the edu.isi.bmkeg.sciDT.bin.SciDT_0_Nxml2SciDT class:

./runPipeline.sh /path/to/folder/ nThreads /path/to/nxml2txt/executable

where nThreads is the number of threads used to run the preprocessing pipeline.
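Before launching a long run, it can help to sanity-check the three arguments. The function below is a hypothetical pre-flight check, not part of the repository; it assumes the first argument must be an existing directory, the second a positive integer, and the third an executable file.

```shell
# Hypothetical pre-flight check for the three pipeline arguments (not part of this repository).
# Returns 0 if all arguments look valid, 1 otherwise (with a message on stderr).
check_pipeline_args() {
  local folder="$1" threads="$2" nxml2txt="$3"

  if [ ! -d "$folder" ]; then
    echo "error: input folder '$folder' does not exist" >&2
    return 1
  fi

  # The thread count must be a positive integer: digits only, no leading zero.
  case "$threads" in
    ''|*[!0-9]*|0*)
      echo "error: nThreads must be a positive integer, got '$threads'" >&2
      return 1 ;;
  esac

  # Require a regular file with the execute bit set (directories are rejected).
  if [ ! -f "$nxml2txt" ] || [ ! -x "$nxml2txt" ]; then
    echo "error: '$nxml2txt' is not an executable file" >&2
    return 1
  fi
  return 0
}
```

For example, check_pipeline_args /path/to/folder/ 4 /path/to/nxml2txt/executable && ./runPipeline.sh /path/to/folder/ 4 /path/to/nxml2txt/executable only starts the pipeline if the checks pass.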

Running the pipeline generates files in the following subfolders:

nxml2txt
bioc
preprocessed_bioc_results
scidt
tsv

The input files for the main sciDT system are in the scidt and tsv folders.
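A quick way to confirm that a run finished is to check that each of the subfolders listed above exists and holds files. The sketch below assumes the subfolders are created directly inside the folder passed to the pipeline, which is not stated explicitly here; the function name summarize_outputs is hypothetical.

```shell
# Hypothetical helper (not part of this repository): report how many files each
# expected output subfolder holds. Assumes the subfolders sit directly under $1.
# Returns 1 if any subfolder is missing, 0 otherwise.
summarize_outputs() {
  local root="$1" sub n status=0
  for sub in nxml2txt bioc preprocessed_bioc_results scidt tsv; do
    if [ -d "$root/$sub" ]; then
      # Count regular files directly inside the subfolder (not recursive);
      # tr strips the leading spaces that BSD wc prints on macOS.
      n=$(find "$root/$sub" -maxdepth 1 -type f | wc -l | tr -d ' ')
      echo "$sub: $n files"
    else
      echo "$sub: MISSING"
      status=1
    fi
  done
  return "$status"
}
```

The scidt and tsv counts are the ones that matter for the main sciDT system; the other folders hold intermediate results.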

Repository files:

.idea
src/main
.gitignore
LICENSE
README.md
drools_exptClassification.xls
pom.xml
runPipeline.sh
