dlpbc/gru-language-modeling

Language Modeling on a Locally Curated Dataset

A project that trains a neural network language model on a locally curated dataset. See below for a description of the dataset.

Dataset: 5k featured links on Nairaland

  • 5000 sentences
  • One sentence per line
  • Each sentence is the title of a featured link thread on Nairaland (see the loading sketch below)
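
Since the corpus is one sentence per line with a capped vocabulary (500 words, per the configuration below), loading it amounts to reading lines, tokenizing, and mapping out-of-vocabulary words to an unknown token. The snippet below is a minimal sketch; the file name featured_links.txt and the unknown-token symbol are assumptions, not names taken from the repository.

```python
from collections import Counter

VOCAB_SIZE = 500            # matches the vocabulary size in the configuration below
UNKNOWN_TOKEN = "<unk>"     # hypothetical placeholder for out-of-vocabulary words

def load_corpus(path="featured_links.txt"):
    """Read one sentence per line and build a capped vocabulary."""
    with open(path, encoding="utf-8") as f:
        sentences = [line.strip().lower().split() for line in f if line.strip()]

    # Keep the most frequent words; everything else maps to the unknown token.
    counts = Counter(word for sent in sentences for word in sent)
    vocab = [w for w, _ in counts.most_common(VOCAB_SIZE - 1)] + [UNKNOWN_TOKEN]
    word_to_index = {w: i for i, w in enumerate(vocab)}

    tokenized = [
        [word_to_index.get(w, word_to_index[UNKNOWN_TOKEN]) for w in sent]
        for sent in sentences
    ]
    return tokenized, word_to_index
```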

Network Description

A 2-layer Gated Recurrent Unit (GRU) network, a slightly modified version of an existing implementation. A rough code sketch of the configuration appears after the list below.

Network Configuration

  • Word embedding layer: 500 units
  • GRU units per GRU layer: 400
  • Vocabulary size: 500
  • Initial learning rate: 0.008 (annealed over epochs down to 0.00001)
  • Number of epochs: 115
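
The training script train_gru_rnn.py implements the network; purely for illustration, the sketch below restates the configuration above in PyTorch. The class name, the choice of optimizer, and the geometric annealing schedule are assumptions, not details taken from the repository.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 500      # vocabulary size
EMBED_DIM = 500       # word embedding layer units
HIDDEN_DIM = 400      # GRU units per layer
NUM_LAYERS = 2        # two stacked GRU layers

class GRULanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.gru = nn.GRU(EMBED_DIM, HIDDEN_DIM, num_layers=NUM_LAYERS, batch_first=True)
        self.out = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, tokens, hidden=None):
        # tokens: (batch, seq_len) of word indices
        emb = self.embed(tokens)
        output, hidden = self.gru(emb, hidden)
        return self.out(output), hidden  # logits over the vocabulary at every step

model = GRULanguageModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.008)

# Anneal the learning rate from 0.008 down to 0.00001 over 115 epochs
# (geometric decay shown here; the repository's exact schedule may differ).
scheduler = torch.optim.lr_scheduler.ExponentialLR(
    optimizer, gamma=(0.00001 / 0.008) ** (1 / 114)
)
```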

Network Performance

  • 0.92 loss after 115 epochs (pre-trained model located in the trained_model_param directory)
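
If the 0.92 figure is the mean per-word cross-entropy in nats (an assumption; the loss definition is not stated in the repository), it corresponds to a perplexity of roughly exp(0.92) ≈ 2.5.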

Usage

  • Train the model:
python train_gru_rnn.py
  • Generate sentences (featured links) from the pre-trained model (see the sampling sketch below):
python sentence_generator.py
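
sentence_generator.py produces new titles from the trained model. One common approach is to sample the next word repeatedly from the model's output distribution until an end-of-sentence token is drawn. The sketch below illustrates this against the PyTorch model sketched above; the start/end token names, and the assumption that such boundary tokens exist in the vocabulary, are illustrative rather than taken from the repository.

```python
import torch

def generate_sentence(model, word_to_index, index_to_word,
                      start_token="<s>", end_token="</s>", max_len=20):
    """Sample one sentence by repeatedly drawing the next word from the model."""
    model.eval()
    tokens = [word_to_index[start_token]]
    hidden = None
    with torch.no_grad():
        for _ in range(max_len):
            inp = torch.tensor([[tokens[-1]]])           # shape (batch=1, seq_len=1)
            logits, hidden = model(inp, hidden)
            probs = torch.softmax(logits[0, -1], dim=-1)  # distribution over the vocabulary
            next_word = torch.multinomial(probs, num_samples=1).item()
            if next_word == word_to_index[end_token]:
                break
            tokens.append(next_word)
    return " ".join(index_to_word[t] for t in tokens[1:])
```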

Some Sample Sentences (featured links) Generated by the Trained Model

  • what to do during asuu strike
  • nigeria vs cameroon : who is your man of the strike ?
  • see new fayose for president 2019 today
  • a nairalander 's pre-wedding photos
  • how much do you members ways things ?
  • do you has caught phone ?
  • who is a his son ?
  • quit to what youths to buhari , in they about
  • president buhari receives state 2017
  • 5 things you need to know when building a new home
