tf_encdec_seq2seq

Configurable advanced Encoder-Decoder Sequence-to-Sequence model. Built with TensorFlow.

Features

Easily configurable.
Unidirectional-RNN, Bidirectional-RNN
Attention models.
Bucketing.
Embedding models.

Requirements

pip install -r requirements.txt

Preparing Data

Put your TSV file under the data/ directory as all_data.txt. Each line in the file is an input-output pair, separated by a tab.
Then run python build_data_matrix.py
It will build the data matrices from your raw data. If a pre-trained Embedding model is not used, it will train one.
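
The file layout described above can be sketched in Python. This is a minimal illustration, not code from the repository: `write_pairs` and `read_pairs` are hypothetical helper names, and the sample sentences are made up. The demo writes to a temporary directory; in the repository the target path would be data/all_data.txt.

```python
import tempfile
from pathlib import Path


def write_pairs(pairs, path):
    """Write (input, output) pairs as one tab-separated line each."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8", newline="") as f:
        for source, target in pairs:
            # A tab or newline inside a sentence would break the format.
            for text in (source, target):
                if "\t" in text or "\n" in text:
                    raise ValueError(f"tab or newline inside sentence: {text!r}")
            f.write(f"{source}\t{target}\n")


def read_pairs(path):
    """Read the file back and check that every line has exactly two fields."""
    pairs = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 2:
                raise ValueError(f"line {line_no}: expected 2 fields, got {len(fields)}")
            pairs.append((fields[0], fields[1]))
    return pairs


# Made-up sample data, for illustration only.
examples = [
    ("how are you", "i am fine"),
    ("what is your name", "my name is bot"),
]

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "data" / "all_data.txt"
    write_pairs(examples, out_path)
    loaded = read_pairs(out_path)

print(loaded)
```

Running a check like `read_pairs` before build_data_matrix.py can catch lines with a missing or extra tab early.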

Train

python train.py

Interactive mode & Inference through file

python interactive.py
python test.py my_input_sentences.txt

Config

Name | Type | Description
rnn_unit | List | Unit count of each Encoder and Decoder layer.
rnn_cell | String | RNN cell type of the Encoder and Decoder. Options: [LSTM, GRU]
encoder_rnn_type | String | Encoder's RNN type. Options: [unidirectional, bidirectional]
attention_mechanism | String | Attention mechanism of the model. Options: [luong, bahdanau, None]
attention_size | Int | Attention size of the model. (If not specified, defaults to the last element of rnn_unit.)
dense_layers | List | Unit count of each FC (fully connected) layer.
dense_activation | String | Activation function used on the FC layers. Options: [relu, sigmoid, tanh, None]
optimizer | String | Optimizer function. Options: [sgd, adam, rmsprop]
learning_rate | Float | Learning rate.
dropout_keep_prob_dense | Float | Dropout keep probability on the FC layers. (> 0.0, <= 1.0)
dropout_keep_prob_rnn_input | Float | Dropout keep probability on the RNN input. (> 0.0, <= 1.0)
dropout_keep_prob_rnn_output | Float | Dropout keep probability on the RNN output. (> 0.0, <= 1.0)
dropout_keep_prob_rnn_state | Float | Dropout keep probability on the RNN state. (> 0.0, <= 1.0)
bucket_use_padding | Bool | If true, adds <pad> tags to the input and output sentences, which reduces the number of buckets.
bucket_padding_input | List | Bucket sizes of the input.
bucket_padding_output | List | Bucket sizes of the output.
train_epochs | Int | Number of epochs to train for. (The model is saved to disk after each epoch.)
train_steps | Int | Number of training steps to run.
train_batch_size | Int | Batch size during training.
log_per_step_percent | Int | Percentage value used as the progress logging point.
embedding_use_pretrained | Bool | Whether to use a pre-trained Embedding.
embedding_pretrained_path | String | Path to the pre-trained Embedding files.
embedding_type | String | Embedding type of the model. Options: [word2vec, fasttext]
embedding_size | Int | Embedding size of the model.
embedding_negative_sample | Int | Negative sampling value of the Embedding model.
vocab_limit | Int | Vocabulary limit when building the Embedding model.
vocab_special_token | List | Special vocabulary tokens used as the padding tag, the unknown-word tag, and the start and end of sentences.
ngram | Int | N-gram value of the Embedding model.
reverse_input_sequence | Bool | If true, reverses the words of the input sentence.
seq2seq_loss | Bool | Use seq2seq loss, meaning that tags like <pad> are ignored during loss calculation.
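
To make the interaction between bucket_use_padding and seq2seq_loss more concrete, here is a small pure-Python sketch of the general idea: pad a sentence up to the smallest bucket that fits it, then average the per-token loss while skipping <pad> positions. The function names, the sample loss values, and the choice to raise an error for sentences longer than the largest bucket are assumptions made for illustration; the repository's TensorFlow implementation may handle these details differently.

```python
PAD = "<pad>"  # padding tag as named in the option descriptions


def pad_to_bucket(tokens, bucket_sizes):
    """Pad tokens with <pad> up to the smallest bucket size that fits."""
    for size in sorted(bucket_sizes):
        if len(tokens) <= size:
            return tokens + [PAD] * (size - len(tokens))
    # Assumption: oversized sentences are rejected here; the real
    # project may truncate or skip them instead.
    raise ValueError(f"sequence of length {len(tokens)} exceeds the largest bucket")


def masked_mean_loss(token_losses, target_tokens):
    """Average per-token losses, ignoring positions whose target is <pad>."""
    kept = [loss for loss, token in zip(token_losses, target_tokens) if token != PAD]
    return sum(kept) / len(kept) if kept else 0.0


# A 3-token target falls into the bucket of size 5.
target = pad_to_bucket(["how", "are", "you"], bucket_sizes=[5, 10])

# Made-up per-token losses; the last two belong to <pad> positions.
losses = [0.5, 1.5, 1.0, 9.0, 9.0]

print(target)                            # ['how', 'are', 'you', '<pad>', '<pad>']
print(masked_mean_loss(losses, target))  # 1.0
```

Without the mask, the two padded positions would pull the average up to 4.2, so the loss would mostly measure how well the model predicts padding.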

