Combining Two Buzzwords Together

Peter Yu
Dec 15, 2017

Cryptocurrency Robo Trader via Reinforcement Learning

Can we teach computers to do this for us?

About a month ago, I was looking for project ideas to practice AI programming using PyTorch when a friend of mine who’s been very excited about the whole cryptocurrency craze started telling me about Ethereum. I’m quite conservative when it comes to my money and investments, so I usually stay away from this kind of hubbub, but I thought it’d make a good practice project if I somehow combined cryptocurrency and machine learning together. And this is how I came up with the idea of creating a cryptocurrency robo trader using reinforcement learning.

The idea is simple: let’s reformulate trading of cryptocurrency into a game with states, actions and rewards, and use reinforcement learning algorithms to train an agent that would learn to play the game and ultimately make some money!

Game Version 1

I started with the most naive game: You start with a pile of cash and maybe some cryptocurrency (let’s say we have bitcoins). At every time step, you get some price data, and you can either buy or sell some bitcoins. More formally, we have:

State: (current cash, current crypto, price data)
Action: A real number. If positive, we buy bitcoins; if negative, we sell.
Reward: Δ(total asset value) = Δ(current cash + current crypto * price)
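To make the setup concrete, here is a minimal sketch of Game Version 1 as a step-based environment. The class name, starting balances, and prices are all illustrative, not the project's actual code. Note that this sketch clips the action to what the agent can afford; the original formulation had no such guard, which becomes relevant later.

```python
class CryptoGameV1:
    """Sketch of Game Version 1: state = (cash, crypto, price),
    action = signed amount of coins to buy (+) or sell (-),
    reward = change in total asset value after the price moves."""

    def __init__(self, prices, cash=1000.0, crypto=0.0):
        self.prices = prices
        self.t = 0
        self.cash = cash
        self.crypto = crypto

    def step(self, action):
        price = self.prices[self.t]
        # Clip so we never spend cash or sell coins we don't have.
        coins = max(min(action, self.cash / price), -self.crypto)
        self.cash -= coins * price
        self.crypto += coins
        value_before = self.cash + self.crypto * price
        self.t += 1
        new_price = self.prices[self.t]
        reward = (self.cash + self.crypto * new_price) - value_before
        done = self.t == len(self.prices) - 1
        return (self.cash, self.crypto, new_price), reward, done

env = CryptoGameV1([100.0, 110.0, 105.0], cash=1000.0)
state, reward, done = env.step(5.0)  # buy 5 coins at $100, price rises to $110
# reward == 50.0: total value went from 1000 to 1050
```

Since buying or selling at the current price does not change total value, the reward comes entirely from the price move on whatever position you hold.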

For training, I used the REINFORCE algorithm, a simple but effective policy gradient reinforcement learning algorithm. As for the data, I used the minute-by-minute Coinbase price data from Kaggle. This dataset is huge! So I decided to use one week's worth of price data, from 10/10/2017 to 10/18/2017, as my training set. The policy network was a recurrent (LSTM) network with a 64-dimensional hidden layer and 3 recurrent layers, and the whole training set formed a single time series. Let's see how it did during training:
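The setup above can be sketched in PyTorch as follows. The hidden size (64) and layer count (3) come from the text; the number of input features, the output head, and the fixed standard deviation are assumptions for illustration:

```python
import torch
import torch.nn as nn

class LSTMPolicy(nn.Module):
    """Recurrent policy: maps a price-data sequence to the mean of a
    normal distribution over the buy/sell action at each time step."""

    def __init__(self, n_features=3, hidden=64, layers=3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=layers,
                            batch_first=True)
        self.head = nn.Linear(hidden, 1)  # mean of the action distribution

    def forward(self, x, state=None):
        out, state = self.lstm(x, state)
        return self.head(out), state

policy = LSTMPolicy()
x = torch.randn(1, 10, 3)                  # 1 series, 10 ticks, 3 features
mu, _ = policy(x)
dist = torch.distributions.Normal(mu, 0.1)  # predefined std, as in the text
action = dist.sample()

# REINFORCE: push up the log-probability of actions weighted by reward.
reward = torch.ones_like(action)            # placeholder rewards
loss = -(dist.log_prob(action) * reward).mean()
loss.backward()
```

The gradient flows through `log_prob` into the network, not through the sampled action itself, which is the core trick of REINFORCE.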

Top: the blue line represents the price, and each black vertical line represents an action. Bottom: total asset value at each time step.
Final asset value at each epoch.

As you can see, the training results are quite volatile, with no sign of convergence. There is a slight upward trend, but I doubt it would get anywhere with more epochs. It was a disappointing result, so let's see what was wrong with Game Version 1 and whether we can improve upon it.

What was wrong with Game Version 1?

The first problem I noticed was the reward function. My intuition was that my naive reward function does not give the agent strong enough signals to learn from. I read some research papers regarding reward functions, and I found two papers that used the Sharpe Ratio and its variations.
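For reference, the Sharpe ratio over a window of per-step returns is just the mean excess return divided by the volatility of returns. A minimal version might look like this (the risk-free rate and the idea of windowing are assumptions, not details from the papers):

```python
import statistics

def sharpe_ratio(returns, risk_free=0.0):
    """Sharpe ratio of a window of per-step returns: mean excess
    return over the risk-free rate, divided by the standard
    deviation of returns. Higher means better risk-adjusted reward."""
    excess = [r - risk_free for r in returns]
    mean = statistics.mean(excess)
    std = statistics.stdev(excess)
    return mean / std if std > 0 else 0.0
```

Used as a reward, this penalizes strategies that earn the same average return with wilder swings, which is a much stronger learning signal than raw change in asset value.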

Another problem I noticed is that the agent made some illegal moves (buying or selling more than it could afford). This was because, during the policy gradient search, I used a normal distribution as my stochastic policy (the value produced by the network being the mean, with a predefined standard deviation), which naturally produced some illegal values. I'm still not sure how much impact this had, but it certainly didn't help. A possible fix I came up with was the truncated normal distribution.
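One simple way to get a truncated normal without extra dependencies is rejection sampling: draw from the normal and retry until the sample lands inside the legal range. This is a sketch of the idea, not the project's code, and it assumes the bounds overlap the bulk of the distribution (otherwise it can loop for a long time):

```python
import random

def sample_truncated_normal(mu, sigma, low, high):
    """Rejection-sample a normal restricted to [low, high], so the
    action can never exceed what the agent can afford to buy or sell."""
    while True:
        x = random.gauss(mu, sigma)
        if low <= x <= high:
            return x
```

With the bounds set to (-current crypto, current cash / price), every sampled action would be a legal trade by construction.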

The biggest problem of all, though, was that in Game Version 1, you cannot "short": you can limit losses by selling as the price starts to go down, but you can never profit from a falling price. So I set out to come up with a new game that would specifically solve this problem.

Game Version 2

For Game Version 2, I first researched ways to short bitcoins. Unfortunately, it's not easy. Due to bitcoin's volatility, financial institutions like bitcoin exchanges are unwilling to lend you bitcoins to short, and governments, especially the US government, have tight regulations around it. In America, GDAX is the only exchange that lets you short bitcoins via margin trading, but you need at least 5 million dollars to qualify (and I certainly don't have that much). Bitcoin futures are now available through the Chicago Mercantile Exchange, but they were not yet available at the time of coding.

After some further research, I discovered a new exchange called Whaleclub, based in Hong Kong, where you can only trade using cryptocurrency. You can also bet on some foreign exchange rates and stocks, and it allows margin trading.

The most interesting aspect of Whaleclub, though, was its product called Turbo Trading: a simple product where you bet only on the direction of the price over the next tick (1 minute or 5 minutes). If you get it right, you earn a return (20% for bitcoin at the time of writing); if you get it wrong, you lose your betting money. This was very attractive to me, because it drastically simplified the design of the game and the agent.

To simplify even further, I decided to fix the size of the bet. The resulting game is:

State: (price data)
Action: Long, Short or Hold
Reward: If correct, bet * return; otherwise, -bet.
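This reward is simple enough to write down directly. Here is a sketch using an illustrative bet size and the 20% bitcoin return mentioned above:

```python
LONG, SHORT, HOLD = 0, 1, 2

def turbo_reward(action, price_now, price_next, bet=10.0, ret=0.20):
    """Reward for one Turbo-style tick: win bet * ret if the
    direction call is right, lose the bet if wrong, zero on hold.
    The bet size is made up; ret=0.20 is the 20% bitcoin return."""
    if action == HOLD:
        return 0.0
    went_up = price_next > price_now
    correct = (action == LONG) == went_up
    return bet * ret if correct else -bet
```

Note the asymmetry when ret < 1: a correct call earns only a fraction of what a wrong call loses, which matters for the results below.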

Now, the agent outputs a multinomial distribution from which we can draw the next discrete action. During training, I also tried a few different return values to see how the agent would behave. Let’s see how the training went:
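Before looking at the results, here is what that sampling step boils down to — a pure-Python stand-in for drawing a discrete action from the policy's multinomial output (in PyTorch this would be `torch.distributions.Categorical`; the logit values in the test are made up):

```python
import math
import random

def sample_action(logits, actions=("long", "short", "hold")):
    """Softmax the policy's output logits into a multinomial
    distribution and draw one discrete action from it."""
    m = max(logits)                              # for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(actions, weights=probs, k=1)[0], probs
```

During training, REINFORCE then weights the log-probability of the drawn action by the reward, exactly as in the continuous case.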

Return 120%

Top: Price line, and colored dots represent actions. Bottom: Resulting rewards at every tick.
Final rewards through epochs.

Return 100%

Return 20%

You’ll notice that when the return is at least as big as the loss (120% and 100%), the agent behaves bullishly and takes a long position at almost every tick during training. On the other hand, when the return is smaller than the loss (20%), the agent is very risk-averse, and holds at almost every tick, pretty much refusing to play the game. The test results reflect this:

Conclusion

I trained various agents for two different cryptocurrency trading “games” using basic reinforcement learning techniques to see if it’s possible to create a profitable robo trader. It was successful in the sense that the agent learned that the best way to trade cryptocurrency (or any securities for that matter) based on past price data is not to trade. The reason is that a liquid market like cryptocurrency exchanges adheres to the efficient market hypothesis, and the price of a security in an efficient market is the agreement of the market based on all the information, both public and private. As a result, the previous prices are meaningless in terms of price predictions, because the market ultimately acts like an information black hole that sucks up all the information and outputs a meaningless number (akin to Hawking radiation). I might as well have tried to predict random noise, and this is why Michael Bloomberg is a billionaire.

The corollary is that if you do have access to some of this information, you may be able to make a reasonable prediction on the price of a security. I was thinking about applying NLP on news articles and tweets to figure out some kind of sentiment about cryptocurrencies, but for now I think I’m going to move on to a new project, as I’ve accomplished my initial goal of getting familiar with AI programming with PyTorch. Can’t get too greedy!


Peter Yu

PhD Student at UMich Researching NLP and Cognitive Architectures • Previously a Real-time Distributed Systems Engineer turned NLP Research Engineer at ASAPP