Neural-Machine-Translation

NLP Application Project

2.2.3 Build an NMT (Neural MT) system when training data (parallel sentences in the concerned source and target languages) is available in a domain, but only in small quantities. Machine learning is to be used in such a way that the small-sized domain data can be combined with the large amount of general data.
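
To make the combination concrete, below is a minimal sketch (not the project's actual code) of one common way to do this in PyTorch, the framework used in the main tutorial listed under References: pretrain a seq2seq model on the large general-domain corpus, then fine-tune the same weights on the small in-domain corpus with a lower learning rate. The Seq2Seq model and the toy batch generator are hypothetical placeholders standing in for the real model and data pipeline.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Tiny GRU encoder-decoder, only to illustrate the two-stage training."""
    def __init__(self, src_vocab, tgt_vocab, emb=256, hid=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.decoder = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src, tgt_in):
        _, h = self.encoder(self.src_emb(src))               # encode the source sentence
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), h)   # teacher-forced decoding
        return self.out(dec_out)                             # logits over the target vocabulary

def toy_batches(n_batches, batch=32, src_len=12, tgt_len=13, vocab=8000):
    """Random stand-in batches; in practice these come from the parallel corpora."""
    data = []
    for _ in range(n_batches):
        src = torch.randint(1, vocab, (batch, src_len))
        tgt = torch.randint(1, vocab, (batch, tgt_len))
        data.append((src, tgt[:, :-1], tgt[:, 1:]))          # (source, decoder input, decoder target)
    return data

def run_stage(model, batches, lr, epochs):
    """One training stage: plain cross-entropy over the target tokens."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for src, tgt_in, tgt_out in batches:
            opt.zero_grad()
            logits = model(src, tgt_in)
            loss_fn(logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1)).backward()
            opt.step()

model = Seq2Seq(src_vocab=8000, tgt_vocab=8000)
general_batches = toy_batches(50)   # stands in for the large general-domain corpus
domain_batches = toy_batches(5)     # stands in for the small in-domain corpus

# Stage 1: learn general translation behaviour from the large out-of-domain corpus.
run_stage(model, general_batches, lr=1e-3, epochs=2)
# Stage 2: adapt to the domain; a lower learning rate (and fewer epochs) helps the
# model keep what it learned from the general data while fitting the domain data.
run_stage(model, domain_batches, lr=1e-4, epochs=2)
```

Other ways of combining the two corpora, such as simply concatenating them or oversampling the domain sentences, fit the same skeleton by changing what goes into each stage.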

Contributors:

  1. Arushi Singhal 201516178
  2. Simran Singhal 201516190

Presentation: https://docs.google.com/presentation/d/1UgQXnST6rxZpctD8Atuaus7-2tdmhHMxCvMiZXemXck/edit?usp=sharing

Interim Report: https://docs.google.com/document/d/1n1o2qPxLaCnB0E83i_ZiPZCA_8fN_uMCrQ-CQCzlql4/edit?usp=sharing

Report: https://docs.google.com/document/d/10rAypGzTKjiJOw9Xe0qi9jYFTlNq8AQitohGlploK44/edit?usp=sharing

References

  1. https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html (main)
  2. https://arxiv.org/abs/1409.3215 (Research Paper)
  3. http://www.manythings.org/anki/
  4. https://machinelearningmastery.com/encoder-decoder-recurrent-neural-network-models-neural-machine-translation/
  5. https://machinelearningmastery.com/encoder-decoder-long-short-term-memory-networks/
  6. https://machinelearningmastery.com/develop-neural-machine-translation-system-keras/
  7. http://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/
  8. https://towardsdatascience.com/nlp-sequence-to-sequence-networks-part-1-processing-text-data-d141a5643b72
  9. https://towardsdatascience.com/nlp-sequence-to-sequence-networks-part-2-seq2seq-model-encoderdecoder-model-6c22e29fd7e1
  10. https://nlp.stanford.edu/~johnhew/public/14-seq2seq.pdf
  11. https://www.analyticsvidhya.com/blog/2018/03/essentials-of-deep-learning-sequence-to-sequence-modelling-with-attention-part-i/
  12. https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
  13. https://www.coursera.org/learn/nlp-sequence-models/lecture/ftkzt/recurrent-neural-network-model
  14. https://machinelearningmastery.com/encoder-decoder-attention-sequence-to-sequence-prediction-keras/ (important)
  15. https://github.com/bentrevett/pytorch-seq2seq/blob/master/1%20-%20Sequence%20to%20Sequence%20Learning%20with%20Neural%20Networks.ipynb
  16. https://towardsdatascience.com/word-level-english-to-marathi-neural-machine-translation-using-seq2seq-encoder-decoder-lstm-model-1a913f2dc4a7
  17. https://discuss.pytorch.org/t/are-the-outputs-of-bidirectional-gru-concatenated/15103
  18. https://towardsdatascience.com/attention-seq2seq-with-pytorch-learning-to-invert-a-sequence-34faf4133e53
  19. https://github.com/spro/practical-pytorch/blob/master/seq2seq-translation/seq2seq-translation-batched.ipynb
  20. https://towardsdatascience.com/understanding-bidirectional-rnn-in-pytorch-5bd25a5dd66
  21. https://discuss.pytorch.org/t/cuda-changes-expected-lstm-hidden-dimensions/10765/6
  22. https://github.com/A-Jacobson/minimal-nmt/blob/master/nmt_tutorial.ipynb (important)
  23. https://medium.com/@martinpella/how-to-use-pre-trained-word-embeddings-in-pytorch-71ca59249f76 (GloVe in PyTorch)

Hindi Text Normalization

  1. http://talukdar.net/papers/KBCS04_HPL-1.pdf
  2. https://medium.com/lingvo-masino/do-you-know-about-text-normalization-a19fe3090694
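
The links above describe why normalization matters for Devanagari text. As a purely illustrative sketch (not the pipeline from those papers), a few standard-library steps already remove common inconsistencies in the Hindi side of the corpus:

```python
import re
import unicodedata

# Map Devanagari digits (U+0966..U+096F) to ASCII so numbers are written one way.
DEVANAGARI_DIGITS = {chr(0x0966 + i): str(i) for i in range(10)}

def normalize_hindi(text: str) -> str:
    # Canonical Unicode composition, so visually identical strings
    # (e.g. base character + nukta vs. precomposed form) share one representation.
    text = unicodedata.normalize("NFC", text)
    # Unify the Devanagari danda sentence terminator with an ASCII full stop,
    # which simplifies sentence splitting of the parallel corpus (a design choice,
    # not a requirement).
    text = text.replace("\u0964", ".")
    # Normalize digits.
    text = "".join(DEVANAGARI_DIGITS.get(ch, ch) for ch in text)
    # Collapse runs of whitespace.
    return re.sub(r"\s+", " ", text).strip()

print(normalize_hindi("यह  वाक्य है।  इसमें १२३ अंक हैं।"))
```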

The IIT Bombay English-Hindi Parallel Corpus

https://www.cse.iitb.ac.in/~pb/papers/lrec18-iitbparallel.pdf

Document Link to the Errors found in the Dataset

https://docs.google.com/document/d/1zz67TTlVi0YuH7zUjD3up4O_7qKd8lCtElhxcH1bMWk/edit

Data Generator

https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
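
The post above explains how to build batches on the fly with keras.utils.Sequence instead of holding the whole dataset in memory. A minimal sketch of that idea for a parallel corpus is below; the class name and the assumption that sentences are already integer-encoded are illustrative, not taken from the post.

```python
import numpy as np
from tensorflow.keras.utils import Sequence

class ParallelCorpusGenerator(Sequence):
    """Yields one padded (source, target) batch at a time, built lazily in __getitem__."""
    def __init__(self, src_lines, tgt_lines, batch_size=64):
        self.src_lines = src_lines      # tokenized and integer-encoded source sentences
        self.tgt_lines = tgt_lines      # tokenized and integer-encoded target sentences
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch.
        return int(np.ceil(len(self.src_lines) / self.batch_size))

    def __getitem__(self, idx):
        # Build one padded batch on demand instead of precomputing everything.
        lo, hi = idx * self.batch_size, (idx + 1) * self.batch_size
        return self._pad(self.src_lines[lo:hi]), self._pad(self.tgt_lines[lo:hi])

    @staticmethod
    def _pad(seqs):
        maxlen = max(len(s) for s in seqs)
        out = np.zeros((len(seqs), maxlen), dtype="int64")
        for i, s in enumerate(seqs):
            out[i, :len(s)] = s
        return out
```

Such a generator can then be passed to model.fit (or fit_generator in older Keras versions) so only one batch is in memory at a time.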

PyTorch Neural Network Colab Notebook (shared via the flow group)

https://colab.research.google.com/drive/1DgkVmi6GksWOByhYVQpyUB4Rk3PUq0Cp?fbclid=IwAR076PTAKeD99mN-htpMxCY4FaJNadF_OfCNry02rBwwixadJ-n1rygnW7I#scrollTo=6Q1AhoIB-pkp

Anaconda installation

https://www.digitalocean.com/community/tutorials/how-to-install-anaconda-on-ubuntu-18-04-quickstart

Multiple GPUs

https://www.pyimagesearch.com/2017/10/30/how-to-multi-gpu-training-with-keras-python-and-deep-learning/
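
The post above covers multi-GPU training on the Keras side. For the PyTorch parts of the project, a minimal illustrative equivalent is to wrap the model in torch.nn.DataParallel so each batch is split across the visible GPUs (the Linear layer below is just a stand-in for the real NMT model):

```python
import torch
import torch.nn as nn

# Stand-in for the real NMT model (e.g. the Seq2Seq sketch earlier in this README).
model = nn.Linear(512, 512)

if torch.cuda.device_count() > 1:
    # Replicates the module on every visible GPU and splits each batch between them.
    model = nn.DataParallel(model)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
```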

  1. https://github.com/ZhenYangIACAS/NMT
  2. https://github.com/tuzhaopeng/nmt
  3. https://paperswithcode.com/paper/modeling-coverage-for-neural-machine#code

Thesis on translation between Hindi and English using a dataset of almost the same size

https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=4&cad=rja&uact=8&ved=2ahUKEwj72dn33fXhAhWNbn0KHYNnDNUQFjADegQIBBAC&url=http%3A%2F%2Fweb2py.iiit.ac.in%2Fresearch_centres%2Fpublications%2Fdownload%2Fmastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf&usg=AOvVaw2PZO-pochZDvz7x-4t49pa

ResearchGate publication on Hindi-to-English machine translation

https://www.researchgate.net/publication/228783817_Machine_translation_of_bi-lingual_hindi-english_hinglish_text
