
General discussion on State-of-the-Art Research #82

Open

JaCoderX opened this issue Nov 29, 2018 · 13 comments

@JaCoderX
Contributor

A lot of research in the field of RL is being done nowadays.
I thought it could be both interesting and productive to have a thread that brings in new research from time to time that might be relevant to this project.

@JaCoderX
Contributor Author

Curiosity-Driven Learning – Exploration by Random Network Distillation

OpenAI has recently published a paper describing a new architecture extension for dealing with the 'hard exploration' problem in Atari games, by strongly rewarding the policy for exploring 'states of interest' that would normally be ignored due to their complexity.

This paper introduces an exploration bonus that is particularly simple to implement, works well with high-dimensional observations, can be used with any policy optimization algorithm, and is efficient to compute as it requires only a single forward pass of a neural network on a batch of experience. Our exploration bonus is based on the observation that neural networks tend to have significantly lower prediction errors on examples similar to those on which they have been trained. This motivates the use of prediction errors of networks trained on the agent's past experience to quantify the novelty of new experience.
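
In code, the idea boils down to something like this (a minimal sketch; the network shapes and names are my own, not the paper's):

```python
import torch
import torch.nn as nn

def make_net(obs_dim, out_dim=64):
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))

class RNDBonus(nn.Module):
    """Random Network Distillation: the intrinsic reward is the predictor's
    error against a fixed, randomly initialized target network."""
    def __init__(self, obs_dim):
        super().__init__()
        self.target = make_net(obs_dim)      # fixed random network
        self.predictor = make_net(obs_dim)   # trained on visited states
        for p in self.target.parameters():
            p.requires_grad = False
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=1e-4)

    def bonus(self, obs):
        # high prediction error => novel state => large exploration bonus
        with torch.no_grad():
            target_feat = self.target(obs)
        return (self.predictor(obs) - target_feat).pow(2).mean(dim=-1)

    def update(self, obs):
        # training the predictor on visited states shrinks the bonus
        # for familiar states, leaving it high only for novel ones
        loss = self.bonus(obs).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
```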

For a better overview of the paper, this blog also offers a nice diagram of the network.

On the same topic, Uber has announced on their blog that they have achieved significantly better results on the 'hard exploration' problem, but no paper has been published yet.

@mysl

mysl commented Dec 3, 2018

@Kismuz, it looks like Uber's new Go-Explore algorithm has made a breakthrough:
https://eng.uber.com/go-explore/
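
From the blog, the core Phase-1 loop is roughly as follows (a rough sketch; `cell_of`, `get_state`, and `set_state` are hypothetical stand-ins for the cell abstraction and emulator save/restore described in the post):

```python
import random

def go_explore(env, cell_of, n_iters=10000, explore_steps=100):
    """Go-Explore Phase 1 (sketch): keep an archive of 'cells' (coarse
    state abstractions), return to a stored cell by restoring the
    emulator state, then explore randomly from it."""
    obs = env.reset()
    archive = {cell_of(obs): env.get_state()}  # cell -> saved emulator state
    for _ in range(n_iters):
        _, saved = random.choice(list(archive.items()))
        env.set_state(saved)                   # 'go' back to the cell
        for _ in range(explore_steps):         # then 'explore' from it
            obs, reward, done, _ = env.step(env.action_space.sample())
            cell = cell_of(obs)
            if cell not in archive:            # new cell -> add to archive
                archive[cell] = env.get_state()
            if done:
                break
    return archive
```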

@JaCoderX
Contributor Author

JaCoderX commented Dec 6, 2018

Population Based Training (PBT) of Neural Networks

DeepMind published a paper last year on 'lazy' hyperparameter tuning by self-discovery of an optimal hyperparameter set. Each worker trains with a small perturbation of the hyperparameters, and during training the framework evaluates the best-performing worker(s) and updates the other workers accordingly to keep exploring the optimal set (the algorithm was tested on A3C).

PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training. With just a small modification to a typical distributed hyperparameter training framework, our method allows robust and reliable training of models. We demonstrate the effectiveness of PBT on deep reinforcement learning problems, showing faster wall-clock convergence and higher final performance of agents by optimising over a suite of hyperparameters

An implementation of PBT can be found in Ray's Tune library.
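
For intuition, the exploit/explore step reduces to something like this (a self-contained sketch with my own helper names; Tune provides a production version of this scheduler):

```python
import copy
import random

def pbt_step(population):
    """One PBT iteration over a list of workers, each a dict with
    'params' (model weights), 'hypers' (hyperparameters) and 'score'."""
    population.sort(key=lambda w: w['score'])
    n = max(1, len(population) // 5)            # bottom/top 20%
    for loser, winner in zip(population[:n], population[-n:]):
        # exploit: copy weights and hyperparameters of a top worker
        loser['params'] = copy.deepcopy(winner['params'])
        loser['hypers'] = dict(winner['hypers'])
        # explore: perturb each hyperparameter by a random factor
        for k in loser['hypers']:
            loser['hypers'][k] *= random.choice([0.8, 1.2])
```

Each worker trains for a fixed interval, reports its score, and then this step runs across the whole population, so the hyperparameter schedule is discovered during training rather than fixed up front.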

@JaCoderX
Contributor Author

Unsupervised Predictive Memory in a Goal-Directed Agent

DeepMind recently published a paper in which they present a new external memory architecture (MERLIN) based on research from neuroscience.
External memory drastically enhances the model's ability to access relevant temporal context, far beyond LSTM capabilities.

We propose MERLIN, an integrated AI agent architecture that acts in partially observed virtual reality environments and stores information in memory based on different principles from existing end-to-end AI systems: it learns to process high-dimensional sensory streams, compress and store them, and recall events with less dependence on task reward. We bring together ingredients from external memory systems, reinforcement learning, and state estimation (inference) models and combine them into a unified system using inspiration from three ideas originating in psychology and neuroscience: predictive sensory coding, the hippocampal representation theory of Gluck and Myers, and the temporal context model and successor representation.
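
At its core, the memory component is a differentiable content-based read; a toy version might look like this (shapes and names are mine, not MERLIN's actual interface):

```python
import torch
import torch.nn.functional as F

def content_read(memory, query):
    """Content-based addressing: cosine-similarity attention over stored
    memory rows, in the spirit of MERLIN-style external memories.

    memory: (num_slots, slot_dim) matrix of stored vectors
    query:  (slot_dim,) read key produced by the controller
    """
    sims = F.cosine_similarity(memory, query.unsqueeze(0), dim=-1)
    weights = torch.softmax(sims, dim=0)   # soft attention over slots
    return weights @ memory                # weighted read vector
```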

@Kismuz
Owner

Kismuz commented Dec 14, 2018

Soft Actor-Critic (SAC) algorithm from UC Berkeley and Google Brain:

Blog post: https://bair.berkeley.edu/blog/2018/12/14/sac/
Paper: https://drive.google.com/file/d/1J8gZXJN0RqH-TkTh4UEikYSy8AqPTy9x/view
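
The key idea is the maximum-entropy objective; as a rough sketch, the actor loss looks something like this (assuming a reparameterized policy; function names are mine, not the repo's API):

```python
import torch

def sac_actor_loss(policy, q_net, obs, alpha=0.2):
    """Soft Actor-Critic policy objective (sketch): maximize expected
    Q-value plus policy entropy, i.e. minimize alpha*log_pi - Q.

    Assumes `policy(obs)` returns (action, log_prob) via the
    reparameterization trick, and `q_net(obs, action)` returns Q-values.
    """
    action, log_prob = policy(obs)               # reparameterized sample
    q_value = q_net(obs, action)
    return (alpha * log_prob - q_value).mean()   # entropy-regularized loss
```

The entropy term (weighted by `alpha`) is what gives SAC its robustness and sample efficiency relative to plain actor-critic methods.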

@JaCoderX
Contributor Author

Wow, this is really impressive. This algorithm has some very nice properties in addition to showing great results.

@JaCoderX
Contributor Author

JaCoderX commented Dec 25, 2018

@Kismuz, do you have any thoughts on bringing the Soft Actor-Critic algorithm to BTGym?

GitHub repo by Berkeley

@Kismuz
Owner

Kismuz commented Dec 25, 2018

@JacobHanouna, yes in general, but not at the moment.
There are many exciting things that could (and should) be done here: novel algorithms, network architectures, proper GPU support, live trading APIs, and backtest parser presets for various types of assets, to mention a few.
But in my own belief, and partially due to limited resources (a single head and a pair of hands), those are secondary objectives to implementing at least one algorithmic solution which can be justified as 'stable performing', at least on an out-of-sample backtest.
At the moment, implementing a model-based mean-reverting pairs trading setup is my priority.
I think of the features implemented in the package as 'least acceptable baseline' supporting blocks for such research. After any such result is established and proven effective, one can go down and improve it by refining the base components, as that is more of a software engineering job.

Of course I do welcome any contribution regarding all the aspects mentioned.

@Kismuz
Owner

Kismuz commented Jan 22, 2019

Recent high-level review from JPMorgan research group:
Idiosyncrasies and challenges of data driven learning in electronic trading

@JaCoderX
Contributor Author

Not state-of-the-art per se, but an interesting blog:
Using the latest advancements in deep learning to predict stock price movements

One of the papers in the blog is also interesting:
Simple random search provides a competitive approach to reinforcement learning
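
The paper's method is strikingly simple; here is a minimal sketch of basic random search over linear policy weights (`rollout` is a hypothetical evaluator returning the episode return for a parameter vector):

```python
import numpy as np

def basic_random_search(rollout, dim, n_iters=100, n_dirs=8,
                        step=0.02, noise=0.03):
    """Basic random search (sketch of the paper's idea): probe random
    directions in parameter space and step along the ones whose
    positive perturbation beats the negative one."""
    theta = np.zeros(dim)
    for _ in range(n_iters):
        deltas = np.random.randn(n_dirs, dim)
        # evaluate each perturbation in both directions
        r_plus = np.array([rollout(theta + noise * d) for d in deltas])
        r_minus = np.array([rollout(theta - noise * d) for d in deltas])
        # move along directions weighted by their return difference
        grad = ((r_plus - r_minus)[:, None] * deltas).mean(axis=0)
        theta += step * grad
    return theta
```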

@JaCoderX
Contributor Author

While learning a bit about meta-learning, I came across the topic of Deep Neuroevolution, which belongs to the field of genetic algorithms (a minimal sketch of the idea follows the links below).

Paper Repro: Deep Neuroevolution

Deep Neuroevolution: Genetic Algorithms are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

Using Evolutionary AutoML to Discover Neural Network Architectures
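
For intuition, the GA in the Uber paper is essentially this loop (a minimal sketch; `evaluate` is a hypothetical fitness function returning episode return for a flat parameter vector):

```python
import numpy as np

def simple_ga(evaluate, dim, pop_size=50, n_elite=10,
              sigma=0.02, n_gens=100):
    """Simple genetic algorithm over flat NN parameter vectors, in the
    spirit of Deep Neuroevolution: no crossover or gradients, mutation
    is additive Gaussian noise on a randomly chosen elite parent."""
    population = [np.random.randn(dim) * sigma for _ in range(pop_size)]
    for _ in range(n_gens):
        fitness = np.array([evaluate(theta) for theta in population])
        elite_idx = fitness.argsort()[-n_elite:]       # top performers
        elite = [population[i] for i in elite_idx]
        # next generation: mutated copies of random elite parents
        population = [elite[np.random.randint(n_elite)]
                      + sigma * np.random.randn(dim)
                      for _ in range(pop_size - 1)]
        population.append(elite[-1])                   # keep best unchanged
    return population[-1]
```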

@mysl

mysl commented Feb 17, 2019

Google/DeepMind's new paper, "Learning Latent Dynamics for Planning from Pixels":
https://github.com/google-research/planet

PlaNet is a purely model-based reinforcement learning algorithm that solves control tasks from images by efficient planning in a learned latent space. PlaNet competes with top model-free methods in terms of final performance and training time while using substantially less interaction with the environment.
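
For intuition, the planner is roughly the cross-entropy method rolled out in the learned latent space; a minimal sketch (`dynamics` and `reward_fn` are hypothetical stand-ins for the learned latent transition and reward models):

```python
import numpy as np

def cem_plan(dynamics, reward_fn, state, horizon=12, act_dim=2,
             n_candidates=1000, n_elite=100, n_iters=10):
    """Cross-entropy-method planning in a learned latent space (sketch):
    sample action sequences, score them with the learned models, and
    refit the sampling distribution to the best sequences."""
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(n_iters):
        # sample candidate action sequences from the current belief
        plans = mean + std * np.random.randn(n_candidates, horizon, act_dim)
        returns = np.zeros(n_candidates)
        for i, plan in enumerate(plans):
            s = state
            for a in plan:                     # roll out in latent space
                s = dynamics(s, a)
                returns[i] += reward_fn(s)
        elite = plans[returns.argsort()[-n_elite:]]   # refit to elites
        mean, std = elite.mean(axis=0), elite.std(axis=0)
    return mean[0]                             # execute first action only
```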
