guangyuzh/nlu-hmrnn

Word and Constituent Boundaries in Hierarchical Multiscale Recurrent Neural Networks

Branches under development

  • Parser benchmark on Penn Treebank: ptb
  • Question Answering: train_qa

References

Corpus

  • Text8
  • Penn Treebank, partially available from NLTK:
>>> import nltk
>>> nltk.download()
...
Identifier> treebank
  • Generate ground-truth boundary labels from Penn Treebank under treebank/: python convert_boundary.py --path TARGET_PATH --threshold MIN_TOKENS
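
As a rough illustration of what ground-truth boundary labels can look like, the sketch below marks the last token of every constituent with 1 and every other token with 0. It is not the code in convert_boundary.py: the function names parse_brackets and boundary_labels are hypothetical, and reading MIN_TOKENS as a minimum constituent length is an assumption made only for this example. The parser expects a labeled root node, so an unlabeled outer bracket pair, as found in raw PTB files, would need to be stripped first.

```python
import re


def parse_brackets(text):
    """Parse a PTB-style bracketed string into nested lists.

    Each node is [label, child, ...]; a preterminal is [tag, word].
    Assumes a single, labeled root node (hypothetical helper).
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])           # open a new node
        elif tok == ")":
            node = stack.pop()         # close the node and attach it to its parent
            stack[-1].append(node)
        else:
            stack[-1].append(tok)      # label, tag, or word
    return stack[0][0]


def boundary_labels(tree, min_tokens=2):
    """Return (words, labels), where labels[i] == 1 if a constituent
    spanning at least min_tokens tokens ends at word i.

    Treating min_tokens as a minimum constituent length is an assumption.
    """
    words, labels = [], []

    def visit(node):
        # Preterminal node: [tag, word]
        if len(node) == 2 and isinstance(node[1], str):
            words.append(node[1])
            labels.append(0)
            return 1
        # Internal node: count the tokens covered by its children
        n = sum(visit(child) for child in node[1:])
        if n >= min_tokens:
            labels[-1] = 1             # the constituent ends at the latest token
        return n

    visit(tree)
    return words, labels


tree = parse_brackets("(S (NP (DT the) (NN cat)) (VP (VBD sat)))")
words, labels = boundary_labels(tree, min_tokens=2)
print(list(zip(words, labels)))
```

Raising min_tokens suppresses boundaries of short constituents, which may be useful when comparing against the coarser upper layers of the model.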

Usages

Parser Benchmark

  • End-to-end training, testing, and evaluation on NYU HPC clusters:
sbatch ptb_pipe.sbt
  • To tune the configuration, edit hierarchical-rnn/config.yml
  • Relax, wait, and collect pickled output(s)

Updated Progress

  1. F1 score of HM-RNN boundary detection:

    1. (finished) Convert PTB parses to binary (1/0) boundary indicators, and use them as ground-truth boundaries
    2. (finished) Test trained HM-LSTM models on PTB, and store layer-wise boundary indicators
    3. (finished) Calculate F1 scores of HM-LSTM for some layers' boundary indicators (TODO: plot fancy figures)
    4. (finished) Calculate bits per character (BPC, a language-modeling evaluation metric) of these HM-LSTM models on PTB
    5. Train more models; compare how F1 and BPC correlate and trend
  2. Statistically analyze with PCFG from PTB:

    1. (finished) Compute PCFGs from PTB
    2. Pick the model whose HM-LSTM boundary indicators are the most syntactically meaningful, or the one with the highest F1 score
    3. Find out whether, and which, constituents detected by HM-LSTM boundaries coincide with the PCFGs
  3. QA on a children's book dataset

    1. (finished) Set up data preprocessing and the pipeline to the HM-LSTM model
    2. (finished) Tune to improve test precision
    3. Replace the self-trained embedding networks with GloVe pre-trained word embeddings
    4. Beat the baseline performance of vanilla LSTM
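
The two metrics tracked in item 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's evaluation code: boundary_f1 and bits_per_character are hypothetical names, the boundary sequences are assumed to be aligned lists of 0/1 values of equal length, and BPC is taken to be the average negative base-2 log probability that the model assigns to each correct character.

```python
import math


def boundary_f1(gold, pred):
    """Precision, recall, and F1 of predicted boundaries (1s) against gold boundaries.

    gold and pred are assumed to be equal-length sequences of 0/1 indicators.
    """
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def bits_per_character(char_probs):
    """Average negative log2 probability assigned to each correct character."""
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)


# Toy data, made up for illustration only
gold = [0, 1, 1, 0, 1]
pred = [0, 1, 0, 1, 1]
print(boundary_f1(gold, pred))
print(bits_per_character([0.5, 0.25]))
```

Lower BPC means a better language model, while higher F1 means closer agreement with the treebank boundaries, so a negative correlation between the two would suggest that better language models also recover more syntactic structure.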

About

Hierarchical Multiscale RNN, a course project for "NLU and Computational Semantics" at NYU
