
General discussion on State-of-the-Art Research #82

Open

JaCoderX opened this issue Nov 29, 2018 · 13 comments

@JaCoderX
Contributor

A lot of research in the field of RL is being done nowadays.
I thought it could be both interesting and productive to have a thread that brings in new research from time to time that might be relevant to this project.

@JaCoderX
Contributor Author

Curiosity-Driven Learning – Exploration by Random Network Distillation

OpenAI has recently published a paper describing a new architecture extension for dealing with the 'hard exploration' problem in Atari games, by strongly rewarding the policy for exploring 'states of interest' that would normally be ignored due to their complexity.

This paper introduces an exploration bonus that is particularly simple to implement, works well with high-dimensional observations, can be used with any policy optimization algorithm, and is efficient to compute as it requires only a single forward pass of a neural network on a batch of experience. Our exploration bonus is based on the observation that neural networks tend to have significantly lower prediction errors on examples similar to those on which they have been trained. This motivates the use of prediction errors of networks trained on the agent's past experience to quantify the novelty of new experience.
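
In code, the idea boils down to something like this (a minimal sketch; the network shapes and names are my own, not the paper's):

```python
import torch
import torch.nn as nn

def make_net(obs_dim, out_dim=64):
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))

class RNDBonus(nn.Module):
    """Random Network Distillation: the intrinsic reward is the predictor's
    error against a fixed, randomly initialized target network."""
    def __init__(self, obs_dim):
        super().__init__()
        self.target = make_net(obs_dim)      # fixed random network
        self.predictor = make_net(obs_dim)   # trained on visited states
        for p in self.target.parameters():
            p.requires_grad = False
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=1e-4)

    def bonus(self, obs):
        # high prediction error => novel state => large exploration bonus
        with torch.no_grad():
            target_feat = self.target(obs)
        return (self.predictor(obs) - target_feat).pow(2).mean(dim=-1)

    def update(self, obs):
        # training the predictor on visited states shrinks the bonus
        # for familiar states, leaving it high only for novel ones
        loss = self.bonus(obs).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
```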

For a better overview of the paper, this blog also offers a nice diagram of the network.

On the same topic, Uber has announced on their blog that they have achieved significantly better results on the 'hard exploration' problem, but no paper has been published yet.

@mysl

mysl commented Dec 3, 2018

@Kismuz, it looks like Uber's new Go-Explore algorithm has made a breakthrough:
https://eng.uber.com/go-explore/
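
From the blog, the core Phase-1 loop is roughly as follows (a rough sketch; `cell_of`, `get_state`, and `set_state` are hypothetical stand-ins for the cell abstraction and emulator save/restore described in the post):

```python
import random

def go_explore(env, cell_of, n_iters=10000, explore_steps=100):
    """Go-Explore Phase 1 (sketch): keep an archive of 'cells' (coarse
    state abstractions), return to a stored cell by restoring the
    emulator state, then explore randomly from it."""
    obs = env.reset()
    archive = {cell_of(obs): env.get_state()}  # cell -> saved emulator state
    for _ in range(n_iters):
        _, saved = random.choice(list(archive.items()))
        env.set_state(saved)                   # 'go' back to the cell
        for _ in range(explore_steps):         # then 'explore' from it
            obs, reward, done, _ = env.step(env.action_space.sample())
            cell = cell_of(obs)
            if cell not in archive:            # new cell -> add to archive
                archive[cell] = env.get_state()
            if done:
                break
    return archive
```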

@JaCoderX
Contributor Author

JaCoderX commented Dec 6, 2018

Population Based Training (PBT) of Neural Networks

DeepMind published a paper last year on 'lazy' hyperparameter tuning by self-discovery of an optimal hyperparameter set. Each worker trains with a small perturbation of the hyperparameters, and during training the framework evaluates the best-performing worker(s) and updates the other workers accordingly to keep exploring the optimal set (the algorithm was tested on A3C).

PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training. With just a small modification to a typical distributed hyperparameter training framework, our method allows robust and reliable training of models. We demonstrate the effectiveness of PBT on deep reinforcement learning problems, showing faster wall-clock convergence and higher final performance of agents by optimising over a suite of hyperparameters

An implementation of PBT can be found in Ray's Tune library.
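
For intuition, the exploit/explore step reduces to something like this (a self-contained sketch with my own helper names; Tune provides a production version of this scheduler):

```python
import copy
import random

def pbt_step(population):
    """One PBT iteration over a list of workers, each a dict with
    'params' (model weights), 'hypers' (hyperparameters) and 'score'."""
    population.sort(key=lambda w: w['score'])
    n = max(1, len(population) // 5)            # bottom/top 20%
    for loser, winner in zip(population[:n], population[-n:]):
        # exploit: copy weights and hyperparameters of a top worker
        loser['params'] = copy.deepcopy(winner['params'])
        loser['hypers'] = dict(winner['hypers'])
        # explore: perturb each hyperparameter by a random factor
        for k in loser['hypers']:
            loser['hypers'][k] *= random.choice([0.8, 1.2])
```

Each worker trains for a fixed interval, reports its score, and then this step runs across the whole population, so the hyperparameter schedule is discovered during training rather than fixed up front.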

@JaCoderX
Contributor Author

Unsupervised Predictive Memory in a Goal-Directed Agent

DeepMind recently published a paper in which they present a new external memory architecture (MERLIN) based on research from neuroscience.
External memory drastically enhances the model's ability to access relevant temporal context, far beyond LSTM capabilities.

We propose MERLIN, an integrated AI agent architecture that acts in partially observed virtual reality environments and stores information in memory based on different principles from existing end-to-end AI systems: it learns to process high-dimensional sensory streams, compress and store them, and recall events with less dependence on task reward. We bring together ingredients from external memory systems, reinforcement learning, and state estimation (inference) models and combine them into a unified system using inspiration from three ideas originating in psychology and neuroscience: predictive sensory coding, the hippocampal representation theory of Gluck and Myers, and the temporal context model and successor representation.
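
At its core, the memory component is a differentiable content-based read; a toy version might look like this (shapes and names are mine, not MERLIN's actual interface):

```python
import torch
import torch.nn.functional as F

def content_read(memory, query):
    """Content-based addressing: cosine-similarity attention over stored
    memory rows, in the spirit of MERLIN-style external memories.

    memory: (num_slots, slot_dim) matrix of stored vectors
    query:  (slot_dim,) read key produced by the controller
    """
    sims = F.cosine_similarity(memory, query.unsqueeze(0), dim=-1)
    weights = torch.softmax(sims, dim=0)   # soft attention over slots
    return weights @ memory                # weighted read vector
```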

@Kismuz
Owner

Kismuz commented Dec 14, 2018

Soft Actor-Critic (SAC) algorithm from UC Berkeley and Google Brain:

Blog post: https://bair.berkeley.edu/blog/2018/12/14/sac/
Paper: https://drive.google.com/file/d/1J8gZXJN0RqH-TkTh4UEikYSy8AqPTy9x/view
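
The key idea is the maximum-entropy objective; as a rough sketch, the actor loss looks something like this (assuming a reparameterized policy; function names are mine, not the repo's API):

```python
import torch

def sac_actor_loss(policy, q_net, obs, alpha=0.2):
    """Soft Actor-Critic policy objective (sketch): maximize expected
    Q-value plus policy entropy, i.e. minimize alpha*log_pi - Q.

    Assumes `policy(obs)` returns (action, log_prob) via the
    reparameterization trick, and `q_net(obs, action)` returns Q-values.
    """
    action, log_prob = policy(obs)               # reparameterized sample
    q_value = q_net(obs, action)
    return (alpha * log_prob - q_value).mean()   # entropy-regularized loss
```

The entropy term (weighted by `alpha`) is what gives SAC its robustness and sample efficiency relative to plain actor-critic methods.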

@JaCoderX
Contributor Author

Wow, this is really impressive. This algorithm has some very nice properties in addition to showing great results.

@JaCoderX
Contributor Author

JaCoderX commented Dec 25, 2018

@Kismuz, do you have any thoughts on bringing the Soft Actor-Critic algorithm to BTGym?

GitHub repo by Berkeley

@Kismuz
Owner

Kismuz commented Dec 25, 2018

@JacobHanouna, yes in general, but not at the moment.
There are many exciting things that could (and should) be done here: novel algorithms, network architectures, proper GPU support, live trading APIs, and backtest parser presets for various types of assets, to mention a few.
But in my own belief, and partially due to limited resources (a single head and a pair of hands), those are secondary objectives to implementing at least one algorithmic solution which can be justified as 'stable performing', at least on an out-of-sample backtest.
At the moment, implementing a model-based mean-reverting pairs trading setup is my priority.
I think of the features implemented in the package as 'least acceptable baseline' supporting blocks for such research. After any such result is established and proven effective, one can go down and improve it by refining the base components, as that is more of a software engineering job.

Of course I do welcome any contribution regarding all the aspects mentioned.

@Kismuz
Owner

Kismuz commented Jan 22, 2019

Recent high-level review from JPMorgan research group:
Idiosyncrasies and challenges of data driven learning in electronic trading

@JaCoderX
Contributor Author

Not state-of-the-art per se, but an interesting blog:
Using the latest advancements in deep learning to predict stock price movements

One of the papers in the blog is also interesting:
Simple random search provides a competitive approach to reinforcement learning
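
The paper's method is strikingly simple; here is a minimal sketch of basic random search over linear policy weights (`rollout` is a hypothetical evaluator returning the episode return for a parameter vector):

```python
import numpy as np

def basic_random_search(rollout, dim, n_iters=100, n_dirs=8,
                        step=0.02, noise=0.03):
    """Basic random search (sketch of the paper's idea): probe random
    directions in parameter space and step along the ones whose
    positive perturbation beats the negative one."""
    theta = np.zeros(dim)
    for _ in range(n_iters):
        deltas = np.random.randn(n_dirs, dim)
        # evaluate each perturbation in both directions
        r_plus = np.array([rollout(theta + noise * d) for d in deltas])
        r_minus = np.array([rollout(theta - noise * d) for d in deltas])
        # move along directions weighted by their return difference
        grad = ((r_plus - r_minus)[:, None] * deltas).mean(axis=0)
        theta += step * grad
    return theta
```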

@JaCoderX
Contributor Author

While learning a bit about meta-learning, I came across the topic of Deep Neuroevolution, which belongs to the field of genetic algorithms (a minimal sketch of the idea follows the links below).

Paper Repro: Deep Neuroevolution

Deep Neuroevolution: Genetic Algorithms are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

Using Evolutionary AutoML to Discover Neural Network Architectures
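
For intuition, the GA in the Uber paper is essentially this loop (a minimal sketch; `evaluate` is a hypothetical fitness function returning episode return for a flat parameter vector):

```python
import numpy as np

def simple_ga(evaluate, dim, pop_size=50, n_elite=10,
              sigma=0.02, n_gens=100):
    """Simple genetic algorithm over flat NN parameter vectors, in the
    spirit of Deep Neuroevolution: no crossover or gradients, mutation
    is additive Gaussian noise on a randomly chosen elite parent."""
    population = [np.random.randn(dim) * sigma for _ in range(pop_size)]
    for _ in range(n_gens):
        fitness = np.array([evaluate(theta) for theta in population])
        elite_idx = fitness.argsort()[-n_elite:]       # top performers
        elite = [population[i] for i in elite_idx]
        # next generation: mutated copies of random elite parents
        population = [elite[np.random.randint(n_elite)]
                      + sigma * np.random.randn(dim)
                      for _ in range(pop_size - 1)]
        population.append(elite[-1])                   # keep best unchanged
    return population[-1]
```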

@mysl

mysl commented Feb 17, 2019

Google/DeepMind's new paper, "Learning Latent Dynamics for Planning from Pixels":
https://github.com/google-research/planet

PlaNet is a purely model-based reinforcement learning algorithm that solves control tasks from images by efficient planning in a learned latent space. PlaNet competes with top model-free methods in terms of final performance and training time while using substantially less interaction with the environment.
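
For intuition, the planner is roughly the cross-entropy method rolled out in the learned latent space; a minimal sketch (`dynamics` and `reward_fn` are hypothetical stand-ins for the learned latent transition and reward models):

```python
import numpy as np

def cem_plan(dynamics, reward_fn, state, horizon=12, act_dim=2,
             n_candidates=1000, n_elite=100, n_iters=10):
    """Cross-entropy-method planning in a learned latent space (sketch):
    sample action sequences, score them with the learned models, and
    refit the sampling distribution to the best sequences."""
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(n_iters):
        # sample candidate action sequences from the current belief
        plans = mean + std * np.random.randn(n_candidates, horizon, act_dim)
        returns = np.zeros(n_candidates)
        for i, plan in enumerate(plans):
            s = state
            for a in plan:                     # roll out in latent space
                s = dynamics(s, a)
                returns[i] += reward_fn(s)
        elite = plans[returns.argsort()[-n_elite:]]   # refit to elites
        mean, std = elite.mean(axis=0), elite.std(axis=0)
    return mean[0]                             # execute first action only
```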
