# ML - Architecture

| Paper | Conference | Remarks |
| --- | --- | --- |
| Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling | arXiv 2014 | Evaluates LSTM and GRU units on polyphonic music modeling and speech signal modeling; results show that they indeed perform better than traditional recurrent units. |
| Neural Turing Machines | arXiv 2014 | Couples neural networks to external memory resources, which they can interact with through attentional processes. |
| End-To-End Memory Networks | NIPS 2015 | 1. Introduces a neural network with a recurrent attention model over a possibly large external memory. 2. The proposed model is trained end-to-end and hence requires significantly less supervision during training. |
| Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks | NIPS 2015 | To alleviate the discrepancy between training and inference in sequence prediction models, proposes a curriculum learning strategy that gently changes the training process from a fully guided scheme using the true previous token towards a less guided scheme that mostly uses the generated token instead (see the scheduled sampling sketch below the table). |
| Professor Forcing: A New Algorithm for Training Recurrent Networks | NIPS 2016 | 1. Uses adversarial domain adaptation to encourage the dynamics of the recurrent network to be the same when training the network and when sampling from it over multiple time steps. 2. Presents t-SNE visualizations showing that Professor Forcing successfully makes the network dynamics during training and sampling more similar. |
| Hybrid Computing Using a Neural Network with Dynamic External Memory | Nature 2016 | Traditional neural networks are limited in their ability to represent variables and data structures and to store data over long timescales, owing to the lack of an external memory. This paper proposes a machine learning model called a differentiable neural computer (DNC), which consists of a neural network that can read from and write to an external memory matrix, analogous to the RAM in a conventional computer. |
| Style Transfer from Non-Parallel Text by Cross-Alignment | NIPS 2017 | 1. Focuses on style transfer on the basis of non-parallel text. 2. Assumes a shared latent content distribution across different text corpora and proposes a method that leverages refined alignment of latent representations to perform style transfer. |
| TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency | ICLR 2017 | 1. Proposes TopicRNN, a recurrent neural network (RNN)-based language model designed to directly capture the global semantic meaning relating words in a document via latent topics. 2. TopicRNN integrates the merits of RNNs and latent topic models: it captures local (syntactic) dependencies using an RNN and global (semantic) dependencies using latent topics. |
| Convolutional Sequence to Sequence Learning | arXiv 2017 | 1. Introduces a seq2seq architecture based entirely on convolutional neural networks. 2. Compared to recurrent models, the proposed architecture allows full parallelization during training and is easier to optimize, since the number of non-linearities is fixed and independent of the input length. 3. Uses gated linear units to ease gradient propagation and equips each decoder layer with a separate attention module (see the GLU sketch below the table). |
| Neural Text Generation: A Practical Guide | arXiv 2017 | 1. Current neural generative models suffer from generating truncated or repetitive outputs, outputting bland and generic responses, or in some cases producing ungrammatical gibberish. 2. Gives a practical guide for resolving such undesired behavior in text generation models, with the aim of enabling real-world applications. |
| Recent Advances in Recurrent Neural Networks | arXiv 2017 | 1. Presents a survey of RNNs and several recent advances. 2. Explains the fundamentals and recent advances and introduces the research challenges. |
| Dynamic Evaluation of Neural Sequence Models | ICML 2018 | 1. Explores dynamic evaluation, where sequence models are adapted to the recent sequence history using gradient descent, assigning higher probabilities to re-occurring sequential patterns (see the dynamic evaluation sketch below the table). 2. Achieves state-of-the-art results in language modelling. |
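
Below is a minimal PyTorch sketch of the scheduled sampling idea from the NIPS 2015 entry above: at each decoding step the ground-truth token is fed with a probability that is decayed over training, and the model's own prediction is fed otherwise. The single-layer GRU decoder, the linear decay schedule, and names such as `decode_with_scheduled_sampling` are illustrative assumptions, not the paper's code.

```python
# Sketch of scheduled sampling (assumed single-layer GRU decoder).
import random
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim = 1000, 32, 64
embed = nn.Embedding(vocab_size, emb_dim)
cell = nn.GRUCell(emb_dim, hidden_dim)
readout = nn.Linear(hidden_dim, vocab_size)

def decode_with_scheduled_sampling(targets, teacher_forcing_prob):
    """targets: LongTensor (batch, seq_len) of gold tokens; returns per-step logits."""
    batch, seq_len = targets.shape
    h = torch.zeros(batch, hidden_dim)
    inp = targets[:, 0]                      # start token
    logits_per_step = []
    for t in range(1, seq_len):
        h = cell(embed(inp), h)
        logits = readout(h)
        logits_per_step.append(logits)
        # With probability `teacher_forcing_prob` feed the gold token
        # (fully guided); otherwise feed the model's own prediction.
        if random.random() < teacher_forcing_prob:
            inp = targets[:, t]
        else:
            inp = logits.argmax(dim=-1)
    # These logits would be trained with the usual cross-entropy loss.
    return torch.stack(logits_per_step, dim=1)

# The guidance probability is decayed over training (linear decay is one of
# the schedules discussed in the paper).
for epoch in range(10):
    teacher_forcing_prob = max(0.0, 1.0 - 0.1 * epoch)
    fake_targets = torch.randint(0, vocab_size, (4, 12))
    logits = decode_with_scheduled_sampling(fake_targets, teacher_forcing_prob)
```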
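
The Convolutional Sequence to Sequence Learning entry relies on gated linear units (GLUs). The sketch below shows only the gating, under the assumption of a simple 1-D convolutional block with a residual connection; the real ConvS2S blocks additionally include attention, weight normalization, and positional embeddings.

```python
# Sketch of a gated-linear-unit convolutional block (channel sizes illustrative).
import torch
import torch.nn as nn

class GLUConvBlock(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        # The convolution produces 2 * channels outputs: one half carries
        # the values, the other half the gates.
        self.conv = nn.Conv1d(channels, 2 * channels,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, x):                     # x: (batch, channels, length)
        a, b = self.conv(x).chunk(2, dim=1)   # split into values and gates
        out = a * torch.sigmoid(b)            # GLU gating eases gradient flow
        return out + x                        # residual connection

block = GLUConvBlock(channels=64)
y = block(torch.randn(8, 64, 20))             # (batch, channels, length)
```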
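
Finally, a rough sketch of dynamic evaluation as described in the ICML 2018 entry: the language model keeps taking gradient steps on the chunks it has just scored, so recently seen patterns become more probable. The `model` and `token_stream` inputs, the chunk size, and the plain SGD update are assumptions for illustration; the paper's full update rule also uses RMS-style gradient scaling and decay back toward the original parameters.

```python
# Sketch of dynamic evaluation; `model` maps a token chunk to per-token logits.
import copy
import torch
import torch.nn.functional as F

def dynamic_evaluation(model, token_stream, chunk_size=32, lr=1e-4):
    model = copy.deepcopy(model)              # leave the trained weights untouched
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    total_nll, total_tokens = 0.0, 0
    for start in range(0, len(token_stream) - 1, chunk_size):
        chunk = token_stream[start:start + chunk_size + 1]
        inputs = torch.tensor(chunk[:-1]).unsqueeze(0)    # (1, chunk_len)
        targets = torch.tensor(chunk[1:]).unsqueeze(0)
        logits = model(inputs)                            # (1, chunk_len, vocab)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))
        # Score the chunk *before* adapting to it, then take a gradient step
        # so re-occurring patterns in the recent history get higher probability.
        total_nll += loss.item() * targets.numel()
        total_tokens += targets.numel()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return total_nll / total_tokens           # average NLL under dynamic evaluation
```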

Back to index