This is a remarkable achievement, since Go has long been considered a game unbreakable by artificial intelligence, mainly because it was thought to rely in large part on human intuition to handle the extremely large number of possible states during gameplay. The technique DeepMind used was a combination of deep neural networks and reinforcement learning. In machine learning, deep neural networks have over the past few years achieved remarkable results in a number of different fields such as image recognition, speech recognition and language processing. For some cool examples, see The Unreasonable Effectiveness of Recurrent Neural Networks.
Reinforcement learning has not had the same amount of academic and public attention and has often been used to solve various toy problems. Recently, however, the combination of deep neural nets and reinforcement learning has proven to be very powerful. Before DeepMind turned their attention to Go, they showed that a combination of these techniques could achieve better-than-human results in a number of Atari computer games, using only the input and output that a regular human player would have access to.
The reason for combining a neural net with reinforcement learning is that a neural net can handle a large number of possible states. In plain reinforcement learning you often use a lookup table, and as long as the number of possible states is finite and not too large this is fine. But when the number of possible states grows, or continuous inputs are used, something that can handle a large state space is needed. To go beyond toy examples, video games and board games, this post is a tutorial on combining deep neural nets and self reinforcement learning with some real data, to see if it is possible to create a simple self learning quant, or algorithmic financial trader.
Whatever the result will be in the end, a real algorithmic trader will be a very different beast to implement as there are numerous other factors that must be handled in live trading with real assets. Again, my goal is to explain and show the concept of self reinforcement learning combined with a neural network.
If you think you understand the basic concepts, then just search the internet for better and more mathematically correct explanations. In reinforcement learning there are a few basic notations and concepts. Now, I will go through a few different cases where the complexity of the data and the algorithm gradually increases.
Some key parts of the code are copied here, but to keep the post readable and at a reasonable length, the entire code for each example is not explained or copied. Instead the code can be found on Github at https: In the first example I will see if I can teach the system to recognise an asset with a linearly increasing price. In terms of a quant trader this means that the trader should always buy or go long.
The data is simply created by a function that returns a straight line. So what is this self reinforcement thing then? What we have here is a pretty basic system with a set of states, some actions that can be taken, and some way of measuring rewards based on these actions.
We also have a Q function that should learn to approximate the reward. In a simple world we could just let Q be a table of all possible states and then find a way to explore all possible states, actions and rewards, save these to the table and then look up the best action for a given state when needed.
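The lookup-table version of Q described above can be sketched in a few lines. This is only an illustration and not the code from the post: the action names, the state label `"price_rising"` and the step size `alpha` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical action set, chosen only for this illustration.
ACTIONS = ["buy", "sell", "hold"]


def make_q_table():
    """Q as a plain table: state -> {action: estimated reward}."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def update(q, state, action, reward, alpha=0.5):
    """Move the stored estimate a fraction alpha towards the observed reward."""
    q[state][action] += alpha * (reward - q[state][action])


def best_action(q, state):
    """Look up the action with the highest estimated reward for a state."""
    return max(q[state], key=q[state].get)


q = make_q_table()
update(q, "price_rising", "buy", 1.0)    # buying in a rising market paid off
update(q, "price_rising", "sell", -1.0)  # selling did not
print(best_action(q, "price_rising"))    # buy
```

A table like this needs one entry per state, which is exactly what stops working once the states are continuous or too many to enumerate.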
In a more complex world we need a way to generalise our knowledge and to be able to handle a very large number of different states. The self learning comes from looping through a number of different states and actions many times, each time updating the Q function a little bit.
So in each loop the Q function will know a little bit more about the world around it and should be able to approximate the real reward a little bit better for each possible action. Also, one very important thing in the learning process is to add a bit of randomness in order to explore as much as possible of the world. In our case we do this by adding a chance of selecting a random action instead of the action suggested by the Q function; this is the epsilon value in the code.
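The random exploration described above is commonly called epsilon-greedy action selection. A minimal sketch follows, assuming the Q function returns one value per action; the function name `choose_action` and the example values are hypothetical and not taken from the post's code.

```python
import random


def choose_action(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: pick any action
    # exploit: pick the action with the highest Q value
    return max(range(len(q_values)), key=lambda i: q_values[i])


q_values = [0.1, 0.7, 0.2]  # made-up Q values for three actions
print(choose_action(q_values, epsilon=0.0))  # 1, always the greedy choice
```

A common pattern is to start with a high epsilon and lower it as training proceeds, so that the system explores early and exploits what it has learned later.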
In my code this is implemented as the main loop. There is also a Q function, in this case the neural net. This is a simple three layer neural network with just 4 neurons in each layer, which should be sufficient to learn what a straight line looks like.
OK, those are the basic components of the system. After one epoch (one training loop in the main loop) I ask the system to suggest trades for each time step. The result: clearly our self learning quant has no clue what it is doing.
With more training, and without modelling anything or giving any prior knowledge, we have a system that has learnt what a straight line looks like.
Now we can make things a little more complicated by replacing the straight line with a sine wave shaped line.
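The two synthetic price series used in these first examples, a straight line and a sine wave, could be generated along the following lines. The function names and all parameter values (slope, period, amplitude, offset) are assumptions for illustration; the post's own data functions are in the linked code.

```python
import math


def straight_line(n, slope=1.0, start=0.0):
    """Linearly increasing price series with n points."""
    return [start + slope * t for t in range(n)]


def sine_wave(n, period=20, amplitude=1.0, offset=2.0):
    """Sine shaped price series, shifted up by offset so prices stay positive."""
    return [offset + amplitude * math.sin(2 * math.pi * t / period)
            for t in range(n)]


print(straight_line(3))  # [0.0, 1.0, 2.0]
```

On the straight line the correct signal never changes, while on the sine wave it flips twice per period, which is what makes the second example harder.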
In examples 1 and 2 there is no separate training and testing set of data. That is of course outrageous for anyone interested in machine learning, but it is done only to keep things simple. Remember that we evaluate each action based on its reward. By default this means that on each time step the system will learn which action is the best choice to maximize its reward at the next time step.
Gamma, the discount factor, is chosen between 0 and 1. By setting a large gamma we also value a high long term reward, so the system can learn to value a path of choices that will give a high reward several time steps into the future. The reward function gives a reward if the action or signal is in the same direction as the price movement, and it also gives a small extra reward if the action is the same as the last action. So what does this look like after one epoch?
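As a rough sketch of the two ideas above, the snippet below shows a reward function of the kind described and the standard Q-learning target that uses gamma. The reward sizes (1.0 and a bonus of 0.1), the encoding of actions as +1 and -1, and the function names are assumptions for illustration, not values from the post's code.

```python
def reward(action, last_action, price_change, same_action_bonus=0.1):
    """Reward for one step; action and last_action are +1 (long) or -1 (short)."""
    # Base reward when the signal points the same way as the price movement.
    r = 1.0 if action * price_change > 0 else 0.0
    # Small extra reward for repeating the previous action.
    if action == last_action:
        r += same_action_bonus
    return r


def q_target(r, next_q_values, gamma):
    """Standard Q-learning target: immediate reward plus discounted best future value."""
    return r + gamma * max(next_q_values)


print(q_target(1.0, [0.0, 2.0], gamma=0.0))  # 1.0, only the next step counts
print(q_target(1.0, [0.0, 2.0], gamma=0.5))  # 2.0, future value is included
```

With gamma at 0 the target is just the immediate reward, and the closer gamma gets to 1 the more weight rewards further into the future receive.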
Again, the system has no clue what is going on. What it has learned in one epoch is that it gets a reward if the action is the same as the previous action. After more training: ahh, it can learn what a smooth wavy line looks like.
With this simple neural network the result will not be much better even if we increase the number of epochs far beyond this.
In this final example I have made a few changes to the basic code used above (remember, the full code is available at https:):
1. Daily Bitcoin price data is used as input data (source: Kraken via Quandl).
2. The neural network is now a two layer recurrent neural network (LSTM) with 64 neurons in each layer.
3. The self reinforcement learning loop uses a trick called experience replay, which greatly improves the speed of learning by making each update batch bigger, which is computationally efficient when updating a neural network.
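Experience replay, as mentioned above, usually means storing past transitions and training on random batches of them instead of on one step at a time. A minimal sketch of such a buffer follows; the class name `ReplayBuffer`, the capacity and the transition layout are assumptions for illustration and not the post's implementation.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity, seed=None):
        self.memory = deque(maxlen=capacity)  # oldest entries fall out first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Return a random batch, at most as large as the current memory."""
        k = min(batch_size, len(self.memory))
        return self.rng.sample(list(self.memory), k)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buffer.add(state=t, action=1, reward=0.0, next_state=t + 1)
print(len(buffer))       # 3, the two oldest transitions were dropped
batch = buffer.sample(2)  # a random batch of two stored transitions
```

Each sampled batch would then be turned into inputs and Q targets and passed to the network in a single update.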
OK, so in the end the best result for this system using this input data was to buy and hold rather than to make shorter trades during this time frame. This is a very basic example using just a few common financial indicators; real traders use much more sophisticated tools. For those interested in more information and a tutorial-based approach to learning the concepts of self reinforcement learning, I recommend reading the 3 part blog post series at http:
In reinforcement learning there are a few basic notations and concepts: Action A, one of the possible actions that can be taken at time step S. The Q function can be written as Q(s, a). In our case Q is a neural network.