Automated FX Trading System Using ARL

Authors: M. Dempster & V. Leemans
Date: 2004
Website: http://www.cfr.statslab.cam.ac.uk/publications/papers.html

Introduction

Adaptive Reinforcement Learning ⇒ ARL
Three-layer structure:

Machine Learning Algorithm:
- Using Recurrent Reinforcement Learning (RRL)

Risk Management Overlay:
- Restrains or shuts down trading when uncertainty is high.

Dynamic Utility Optimization:
- Makes manual selection of fixed meta-parameters unnecessary.
- Lets the user control the risk-return trade-off.

Summary of previous work: Add link to Dempster - 2004 - Adaptive Systems for Foreign Exchange Trading [Quantitative Finance 4]

Find definition of the Sharpe Ratio ?

Previous work: Trading system based on 2 superimposed AI algorithms: Add link to Dempster - 2002 - Intraday FX Trading: An Evolutionary Reinforcement Learning Approach

Machine Learning layer + Dynamic optimization layer ⇒ Adaptive Reinforcement Learning.

Machine Learning Algorithm

Basic initial model: Add link to Moody & Saffell - 1999 - Minimizing downside risk via stochastic dynamic programming

Mathematical model and estimation procedure: add link to Moody & Saffell - 2001 - Learning to trade via direct reinforcement

Model used: \(F_t = sign\left( \sum\limits_{i=0}^M (w_{i,t} \cdot r_{t-i}) + w_{M+1,t} \cdot F_{t-1} + \nu_t\right)\)

\(F_t \in \{-1, 1\}\) is the position to take at time t.
\(w_t\) is the weight vector at time t.
\(\nu_t\) is the threshold at time t.
\(r_t = p_t - p_{t-1}\) is the raw return (price change) at time t, with \(p_t\) the price at time t.

See all previous Dempster refs [1-5] for the continuous-time case.
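The decision rule above can be sketched in a few lines of Python. This is only an illustration: the function name `position`, the argument layout, and the convention of mapping a zero signal to +1 (so that \(F_t\) stays in \(\{-1, 1\}\)) are assumptions, not taken from the paper.

```python
def position(weights, threshold, returns_window, prev_position):
    """Return F_t in {-1, +1}.

    weights        : [w_0, ..., w_M, w_{M+1}] (assumed layout)
    threshold      : nu_t
    returns_window : [r_t, r_{t-1}, ..., r_{t-M}]
    prev_position  : F_{t-1}
    """
    m_plus_1 = len(returns_window)
    signal = sum(w * r for w, r in zip(weights[:m_plus_1], returns_window))
    signal += weights[m_plus_1] * prev_position + threshold
    # Assumption: a signal of exactly zero is mapped to +1.
    return 1 if signal >= 0 else -1
```

With weights `[1.0, 0.0, 0.5]` and returns `[0.2, -0.1]`, the previous position alone decides the outcome: the recurrent term \(w_{M+1} \cdot F_{t-1}\) makes the system reluctant to flip.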

At every trade we buy/sell 1 unit of the currency pair.
Profit at time T can be calculated as (ignoring interest rates):

\[\begin{align}P_T & = \sum\limits_{t=0}^T R_t \\ R_t & = F_{t-1} \cdot r_t - \delta |F_t - F_{t-1}|\end{align}\]

Where \(\delta\) is the transaction cost per trade.
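A minimal sketch of the profit calculation, assuming prices \(p_0..p_T\) and positions \(F_0..F_T\) are given as equal-length lists and that the sum effectively starts at t = 1 (since \(R_0\) would need \(F_{-1}\)). The function name `trade_returns` is hypothetical.

```python
def trade_returns(prices, positions, delta):
    """Per-step returns R_t = F_{t-1} * r_t - delta * |F_t - F_{t-1}| for t = 1..T."""
    returns = []
    for t in range(1, len(prices)):
        r_t = prices[t] - prices[t - 1]                       # raw price change
        cost = delta * abs(positions[t] - positions[t - 1])   # 2 * delta on a flip
        returns.append(positions[t - 1] * r_t - cost)
    return returns


R = trade_returns([1.0, 1.1, 1.05, 1.2], [1, 1, -1, -1], delta=0.01)
P_T = sum(R)  # cumulative profit
```

Note that a reversal from +1 to -1 gives \(|F_t - F_{t-1}| = 2\), so it costs \(2\delta\): one unit to close and one to reopen.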

We then define a differential Sharpe ratio, based on a moving-average version of the classical Sharpe ratio:

\[\begin{align}\hat{S}(t) & = \frac{A_t}{B_t} \\ A_t & = A_{t-1} + \eta(R_t - A_{t-1}) \\ B_t & = B_{t-1} + \eta (R_t^2 - B_{t-1})\end{align}\]

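We then expand \(\hat{S}(t)\) into a Taylor series in \(\eta\) and take the first-order coefficient as an instantaneous performance measure. Writing \(\Delta A_t = R_t - A_{t-1}\) and \(\Delta B_t = R_t^2 - B_{t-1}\) for the increments in the updates above: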

\[D_t = \left.\frac{d\hat{S}(t)}{d\eta}\right|_{\eta=0} = \frac{B_{t-1}\Delta A_t - \frac{1}{2} A_{t-1} \Delta B_t}{(B_{t-1} - A_{t-1}^2)^{\frac{3}{2}}}\]
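The differential Sharpe ratio can be computed incrementally, as in the sketch below. It takes \(\Delta A_t = R_t - A_{t-1}\) and \(\Delta B_t = R_t^2 - B_{t-1}\), the increments that appear in the moving-average updates. The function name and the choice to return 0 when \(B_{t-1} - A_{t-1}^2 \le 0\) are assumptions made for this illustration.

```python
def differential_sharpe(R_t, A_prev, B_prev, eta):
    """Return (D_t, A_t, B_t) for one new return R_t."""
    dA = R_t - A_prev
    dB = R_t ** 2 - B_prev
    variance = B_prev - A_prev ** 2
    if variance > 0:
        D_t = (B_prev * dA - 0.5 * A_prev * dB) / variance ** 1.5
    else:
        D_t = 0.0  # assumption: skip the signal while the variance estimate is degenerate
    A_t = A_prev + eta * dA  # moving average of returns
    B_t = B_prev + eta * dB  # moving average of squared returns
    return D_t, A_t, B_t
```

Because only the previous values of A and B are needed, the measure suits online learning: each new return yields a performance signal without revisiting the history.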

We then use a simple gradient ascent method (the step increases \(D_t\)) to update the weights: \(w_{i,t} = w_{i,t-1} + \rho \Delta w_{i,t}\)

In case of online learning we can consider only the term that depends on the most recent return \(R_t\), so we get:

\[\Delta w_{i,t} = \frac{dD_t}{dw_i} \approx \frac{dD_t}{dR_t} \left( \frac{dR_t}{dF_t} \cdot \frac{dF_t}{dw_{i,t}} + \frac{dR_t}{dF_{t-1}} \cdot \frac{dF_{t-1}}{dw_{i,t-1}}\right)\]

Because the neural network is recurrent, \(F_t\) also depends on the weights through \(F_{t-1}\), which gives:

\[\frac{dF_t}{dw_{i,t}} \approx \frac{\partial F_t}{\partial w_{i,t}} + \frac{\partial F_t}{\partial F_{t-1}} \cdot \frac{dF_{t-1}}{dw_{i,t}}\]
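One online update step can be sketched as follows. Several choices here are assumptions rather than details from these notes: `tanh` replaces `sign()` so that the derivatives exist (a common choice in the RRL literature), the threshold \(\nu\) is held fixed, \(B_{t-1} - A_{t-1}^2\) is assumed positive, and \(dD_t/dR_t = (B_{t-1} - A_{t-1} R_t)/(B_{t-1} - A_{t-1}^2)^{3/2}\) is obtained by differentiating the expression for \(D_t\). The function name `rrl_online_step` is hypothetical.

```python
import math


def rrl_online_step(w, nu, returns_window, F_prev, dF_prev,
                    delta, A_prev, B_prev, rho):
    """One gradient-ascent step; returns (w_new, F_t, dF_t, R_t).

    w              : [w_0, ..., w_M, w_{M+1}]
    returns_window : [r_t, ..., r_{t-M}]
    dF_prev        : stored dF_{t-1}/dw from the previous step (same length as w)
    """
    x = list(returns_window) + [F_prev]            # network inputs
    F_t = math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + nu)
    # Recurrent total derivative: dF_t/dw = dF/dw (direct) + dF/dF_{t-1} * dF_{t-1}/dw
    dF_t = [(1.0 - F_t ** 2) * (xi + w[-1] * dpi) for xi, dpi in zip(x, dF_prev)]
    r_t = returns_window[0]
    diff = F_t - F_prev
    s = (diff > 0) - (diff < 0)                    # sign of the position change
    R_t = F_prev * r_t - delta * abs(diff)
    dR_dF = -delta * s
    dR_dFprev = r_t + delta * s
    dD_dR = (B_prev - A_prev * R_t) / (B_prev - A_prev ** 2) ** 1.5
    grad = [dD_dR * (dR_dF * a + dR_dFprev * b) for a, b in zip(dF_t, dF_prev)]
    w_new = [wi + rho * gi for wi, gi in zip(w, grad)]
    return w_new, F_t, dF_t, R_t
```

The caller keeps `dF_t` and passes it back as `dF_prev` at the next tick, which is how the recursion is carried forward without backpropagating through the whole history. When trading, the position actually taken would be the sign of `F_t`.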

RRL is well suited to rolling-window use: add link to [Gold - 2003 - FX trading via recurrent reinforcement learning] and [Dempster - 2001 - Realtime adaptive trading system using genetic programming]

The system is trained for \(n_e = 10\) epochs on a training set of length \(L_{train} = 2000\) ticks [optimal value].
It is then tested on a test set of length \(L_{test} = 500\) ticks [optimal value].
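A rolling-window split with these lengths might look like the sketch below. The step size is an assumption: the window is advanced by \(L_{test}\) ticks so that test sets do not overlap; the notes do not state how far the window moves.

```python
def rolling_windows(n_ticks, l_train=2000, l_test=500):
    """Yield (train_slice, test_slice) pairs over a series of n_ticks ticks."""
    start = 0
    while start + l_train + l_test <= n_ticks:
        train = slice(start, start + l_train)
        test = slice(start + l_train, start + l_train + l_test)
        yield train, test
        start += l_test  # assumption: advance by one test window


windows = list(rolling_windows(3000))
```

For 3000 ticks this gives two windows: train on ticks 0-1999 and test on 2000-2499, then train on 500-2499 and test on 2500-2999.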

The Trading System

Extensions to Machine Learning layer

The model was extended to take other inputs into account, such as signals from 14 technical indicators. ⇒ This did not improve the results.

See [Dempster - 2001 - Computational learning techniques for intraday FX trading using popular technical indicators]

During the training phase the transaction cost \(\delta\) is left as a tuning parameter.

To keep the weights from growing too large, all weights are rescaled by a factor \(f \lt 1\) as soon as a threshold value is hit.
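The rescaling step could be sketched as below. The notes do not say which quantity is compared with the threshold, so this sketch assumes it is the largest absolute weight; the default values of `max_abs` and `f` are placeholders, not values from the paper.

```python
def rescale_weights(w, max_abs=10.0, f=0.9):
    """Shrink all weights by f (< 1) once the largest magnitude exceeds max_abs."""
    if max(abs(wi) for wi in w) > max_abs:
        return [wi * f for wi in w]
    return list(w)
```

Scaling every weight by the same factor keeps their relative sizes, so the direction of the weight vector is preserved while its magnitude is bounded.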

Improved position updating scheme: the output \(F_t\) is calculated twice:
1. As before.
2. After the weights are updated ⇒ provides a more accurate value, which might differ from the first.

Risk and performance management layer

Build a trailing stop-loss that is always kept x points below (long position) or above (short position) the best price reached during the life of the position.

If a position is closed by the stop-loss, a cool-down period [⇒ using 1 minute] is applied before trading again.
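A trailing stop of this kind can be sketched as a small class. The class name and interface are hypothetical, prices are expressed in integer pips to avoid floating-point comparisons, and the cool-down period is not modelled here.

```python
class TrailingStop:
    """Trailing stop kept x points away from the best price seen so far."""

    def __init__(self, position, entry_price, x):
        self.position = position   # +1 for long, -1 for short
        self.x = x                 # stop distance in points
        self.best = entry_price    # best price reached during the position

    def update(self, price):
        """Record a new price; return True if the stop is hit."""
        if self.position > 0:
            self.best = max(self.best, price)
            return price <= self.best - self.x
        self.best = min(self.best, price)
        return price >= self.best + self.x


stop = TrailingStop(position=1, entry_price=11000, x=10)
hits = [stop.update(p) for p in (11020, 11012, 11010)]
```

In this example the long position rises to 11020, which moves the stop up to 11010; the later fall to 11010 triggers it, locking in part of the gain.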

This layer can evaluate the strength of the signal received from the NN using the non-thresholded value (inside the sign() function).

A threshold y can be provided by the optimization layer so that a position is only entered when the signal is sufficiently strong.

A maximum draw-down system [parameter z] is implemented to prevent complete failure of the trading system.
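The signal-strength filter and the draw-down check can be sketched as two small functions. What happens when the signal is too weak is not specified in these notes; keeping the current position is an assumption of this sketch, as are the function names and the definition of draw-down as the drop from the running peak of cumulative profit.

```python
def gated_position(signal, current_position, y):
    """Change position only when the raw (pre-sign) signal exceeds y in magnitude."""
    if abs(signal) <= y:
        return current_position  # assumption: weak signal, keep the current position
    return 1 if signal > 0 else -1


def drawdown_exceeded(cumulative_profit, z):
    """Return True if the drop from the running peak ever exceeds z."""
    peak = float("-inf")
    for wealth in cumulative_profit:
        peak = max(peak, wealth)
        if peak - wealth > z:
            return True
    return False
```

For a profit path of `[0, 10, 4, 12]` the largest draw-down is 6, so the check fires for z = 5 but not for z = 7.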

Dynamic optimization of utility layer

Definition of the risk measure:

\[\Sigma = \frac{\sum_{i=0}^N (R_i)^2 I\{R_i \lt 0\}}{\sum_{i=0}^N (R_i)^2 I\{R_i \gt 0\}}\]

Here \(R_i\) is the raw return at time i: \(R_i = W_i - W_{i-1}\), with \(W_i\) the cumulative profit at time i.

Then we define the utility function: \(U(\bar{R},\Sigma,\nu) = a \cdot (1 - \nu) \bar{R} - \nu \cdot \Sigma\), with:
- \(\nu\): Risk aversion parameter.
- \(\bar{R} = \frac{W_N}{N}\) : average profit per frequency interval.
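The risk measure and utility can be computed from a list of per-interval returns, as in this sketch. It assumes \(W_0 = 0\) (so \(W_N\) is the sum of the returns), treats `a` as a plain scaling constant because the notes do not define it, and returns infinity for \(\Sigma\) when there are no positive returns; all three are assumptions.

```python
def risk_measure(returns):
    """Sigma: sum of squared losses divided by sum of squared gains."""
    losses = sum(r ** 2 for r in returns if r < 0)
    gains = sum(r ** 2 for r in returns if r > 0)
    if gains == 0:
        return float("inf")  # assumption: no gains means unbounded risk
    return losses / gains


def utility(returns, nu, a=1.0):
    """U = a * (1 - nu) * R_bar - nu * Sigma."""
    r_bar = sum(returns) / len(returns)  # W_N / N with W_0 = 0
    return a * (1.0 - nu) * r_bar - nu * risk_measure(returns)
```

With returns `[1.0, -1.0, 2.0]`, \(\Sigma = 1/5\) and \(\bar{R} = 2/3\). Setting \(\nu = 0\) ignores risk entirely, while \(\nu = 1\) ignores profit.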

Standard theoretical work on risk measures (not applicable above): [Ruszczynski - 2002 - Dual Stochastic dominance and related mean-risk models] and [Ruszczynski - 2003 - Frontiers of stochastically non-dominated portfolios]

The parameter optimization problem becomes:

\[\max_{\delta, \eta, \rho, x, y} U(\bar{R},\Sigma : \delta, \eta, \rho, x, y)\]

⇒ Implemented as a one-at-a-time random search optimization (using 15 random evaluations around the current value of each parameter).
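A one-at-a-time random search can be sketched as below. The sampling distribution (uniform within a per-parameter `scales` range), the seeding, and the function name are assumptions; only the idea of 15 random evaluations per parameter comes from the notes. In the trading system `utility_fn` would run the layers below it and return U; here it is any callable that takes a parameter dictionary.

```python
import random


def one_at_a_time_search(utility_fn, params, scales, n_samples=15, seed=0):
    """Optimize each parameter in turn, holding the others fixed."""
    rng = random.Random(seed)
    best = dict(params)
    best_u = utility_fn(best)
    for name in list(params):
        center = best[name]  # current value of the parameter being tuned
        for _ in range(n_samples):
            candidate = dict(best)
            candidate[name] = center + rng.uniform(-scales[name], scales[name])
            u = utility_fn(candidate)
            if u > best_u:  # keep a candidate only if utility improves
                best, best_u = candidate, u
    return best, best_u


def toy_utility(p):
    # Hypothetical stand-in for the trading system's utility U.
    return -(p["x"] - 1.0) ** 2 - (p["y"] + 2.0) ** 2


best, best_u = one_at_a_time_search(toy_utility, {"x": 0.0, "y": 0.0},
                                    {"x": 0.5, "y": 0.5})
```

Because a candidate is only accepted when it improves the utility, the result is never worse than the starting point, which matters when each evaluation means re-running the trading system.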

Provide typical range of the parameters here.

The Trading System

Tested on the EURUSD pair:
- 1-minute data from January 2000 to January 2002.
- Spread of 2 pips.
- Trading between 9 am and 5 pm (London time).
- Interdealer platform: EBS / Reuters 3000.
- Risk aversion parameter \(\nu = 0.5\).
⇒ Earned 5104 pips over 2 years.

Conclusion

Future work:
- It has been demonstrated in [2,7] that order book or order flow information could improve performance.
- The risk management layer could control multiple trading systems for different currencies.
