This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Project files for pattern recognition group assignment


fqixiang/PatternRecognition


PatternRecognition


Files

Currently contains the following files:

  1. data/WikiEssentials_L4.7z: output file of the WikiVitalArticles program. Each document is included in its entirety (but split by paragraph).
  2. preprocess_utils.py: preprocessing functions for Wiki data.
  3. model_utils.py: various utility functions used for modeling (e.g. loading embeddings).
  4. 1_preprocess_raw_data.py: preprocessing of the raw input data. Currently shortens each article to its first 8 sentences.
  5. 2_baseline_model.py: tokenization and vectorization of the input data, plus the baseline model (a 1-layer neural network with a softmax classifier).
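As a rough illustration of the preprocessing and baseline-model steps listed above, here is a minimal pure-Python sketch. `shorten_article`, `softmax` and `one_layer_forward` are hypothetical names, not functions from preprocess_utils.py or 2_baseline_model.py. The regex sentence splitter is a naive assumption, and the actual scripts may use a proper tokenizer, vectorizer and trained weights instead.

```python
import math
import re
from typing import List

# Assumption: a naive sentence boundary is ".", "!" or "?" followed by whitespace.
# The repository's own splitting logic may differ.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def shorten_article(paragraphs: List[str], max_sentences: int = 8) -> str:
    """Join an article's paragraphs and keep only its first `max_sentences` sentences."""
    text = " ".join(p.strip() for p in paragraphs if p.strip())
    sentences = [s for s in _SENTENCE_END.split(text) if s]
    return " ".join(sentences[:max_sentences])


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_layer_forward(x: List[float],
                      weights: List[List[float]],
                      bias: List[float]) -> List[float]:
    """Single linear layer (one weight row per class) followed by softmax."""
    logits = [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


if __name__ == "__main__":
    article = ["First sentence. Second one!", "Third? Fourth."]
    print(shorten_article(article, max_sentences=3))
    print(one_layer_forward([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

With `max_sentences=3`, the example article is cut to "First sentence. Second one! Third?". The class probabilities returned by `one_layer_forward` always sum to 1, which is what lets the softmax output be read as a distribution over article categories.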

Setup

  1. Download and install Anaconda for Python 3.
  2. Download the latest version of RStudio; it is needed to run the Python scripts from within RStudio.
  3. In a terminal, go to this repository's folder and set up the Conda environment:
conda env create -f environment.yml
  4. Activate the environment and install PyTorch with CUDA 9.2 support:
conda activate VitalWikiClassifier
conda install pytorch torchvision cudatoolkit=9.2 -c pytorch -c defaults -c numba/label/dev
  5. In R, install the reticulate library:
install.packages("reticulate")
  6. Check the .Rprofile file to ensure that R knows where to find your Anaconda distribution.
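After completing the setup steps, a quick sanity check can be run from a Python session in the activated environment. This is a hypothetical helper, not part of the repository; it assumes that `conda activate` sets the `CONDA_DEFAULT_ENV` variable and that the environment is named `VitalWikiClassifier`, as in the commands above.

```python
import importlib.util
import os
import sys


def check_environment(expected_env: str = "VitalWikiClassifier") -> dict:
    """Report whether the expected Conda env is active and PyTorch is importable."""
    active_env = os.environ.get("CONDA_DEFAULT_ENV")  # set by `conda activate`
    report = {
        "python": sys.version.split()[0],
        "conda_env": active_env,
        "env_ok": active_env == expected_env,
        # find_spec returns None when the package is not installed
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        import torch  # imported only when present
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `env_ok` is False, the environment was probably not activated; if `cuda_available` is False, PyTorch was installed without working CUDA support on this machine.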
