# Automated FX Trading System Using ARL

Authors: M. Dempster & V. Leemans
Date: 2004

Website: http://www.cfr.statslab.cam.ac.uk/publications/papers.html

## Introduction

• Adaptive Reinforcement Learning ⇒ ARL
• Three-layer structure:

• Machine Learning Algorithm: uses Recurrent Reinforcement Learning (RRL).
• Risk Management Overlay: restrains or shuts down trading under high uncertainty.
• Dynamic Utility Optimization: removes the need to select fixed meta-parameters and lets the user control the risk-return trade-off.
Find the definition of the Sharpe ratio?
Previous work: trading system based on 2 superimposed AI algorithms: Add link to Dempster - 2002 - Intraday FX trading: an evolutionary reinforcement learning approach
• Machine Learning layer + Dynamic optimization layer ⇒ Adaptive Reinforcement Learning.

## Machine Learning Algorithm

Basic initial model: Add link to Moody & Saffell - 1999 - Minimizing downside risk via stochastic dynamic programming
Mathematical model and estimation procedure: Add link to Moody & Saffell - 2001 - Learning to trade via direct reinforcement
• Model used: $F_t = \operatorname{sign}\left( \sum\limits_{i=0}^M w_{i,t}\, r_{t-i} + w_{M+1,t}\, F_{t-1} + \nu_t\right)$
• $F_t \in \{-1, 1\}$ is the position to take at time t.
• $w_t$ is the weight vector at time t.
• $\nu_t$ is the threshold at time t.
• $r_t = p_t - p_{t-1}$ is the raw return (price change) at time t.
See all previous Dempster references [1-5] for the continuous-time setting.
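The decision rule above can be sketched as a single function. This is a minimal illustration, assuming the caller supplies the M + 1 most recent returns (newest first) and the M + 2 weights in the order of the formula; the function name `position` is hypothetical, and mapping a zero activation to +1 is a convention of this sketch, since the notes only require $F_t \in \{-1, 1\}$.

```python
from typing import Sequence


def position(weights: Sequence[float], recent_returns: Sequence[float],
             prev_position: int, threshold: float) -> int:
    """Return F_t in {-1, 1} from the linear recurrent trader.

    weights:        w_{0,t}, ..., w_{M+1,t}; the last one multiplies F_{t-1}
    recent_returns: r_t, r_{t-1}, ..., r_{t-M} (newest first)
    prev_position:  F_{t-1}
    threshold:      nu_t
    """
    n = len(recent_returns)
    if len(weights) != n + 1:
        raise ValueError("expected len(weights) == len(recent_returns) + 1")
    # Weighted sum of lagged returns, plus the recurrent term and the threshold.
    activation = sum(w * r for w, r in zip(weights[:n], recent_returns))
    activation += weights[n] * prev_position + threshold
    # sign(): a zero activation is mapped to +1 (arbitrary tie-break).
    return 1 if activation >= 0 else -1
```

With weights `[0.5, -0.2, 0.1]` and returns `[0.001, -0.002]` the return part contributes 0.0009, so the recurrent weight 0.1 dominates and the trader keeps its previous position.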
• At every trade we buy or sell 1 unit of the currency pair.
• Profit at time T can be calculated as (ignoring interest rates):

\begin{align}P_T & = \sum\limits_{t=0}^T R_t \\ R_t & = F_{t-1} \cdot r_t - \delta |F_t - F_{t-1}|\end{align}

• Where $\delta$ is the transaction cost per trade.
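The profit formula translates directly into code. This sketch assumes prices $p_0, \dots, p_T$ and positions $F_0, \dots, F_T$ are given as lists and starts the sum at $t = 1$, since $R_0$ would need $F_{-1}$; the helper names are hypothetical.

```python
from typing import List, Sequence


def trading_returns(prices: Sequence[float], positions: Sequence[int],
                    cost: float) -> List[float]:
    """R_t = F_{t-1} * r_t - cost * |F_t - F_{t-1}| for t = 1..T.

    prices:    p_0, ..., p_T
    positions: F_0, ..., F_T, each in {-1, 1}
    cost:      delta, the transaction cost per unit traded
    """
    if len(prices) != len(positions):
        raise ValueError("prices and positions must have the same length")
    returns = []
    for t in range(1, len(prices)):
        r_t = prices[t] - prices[t - 1]  # raw return r_t = p_t - p_{t-1}
        returns.append(positions[t - 1] * r_t
                       - cost * abs(positions[t] - positions[t - 1]))
    return returns


def cumulative_profit(prices: Sequence[float], positions: Sequence[int],
                      cost: float) -> float:
    """P_T: the sum of the per-step returns R_t."""
    return sum(trading_returns(prices, positions, cost))
```

Note that reversing from +1 to -1 costs $2\delta$, because $|F_t - F_{t-1}| = 2$: for prices `[1.0, 1.5, 1.25, 1.0]`, positions `[1, 1, -1, -1]` and $\delta = 0.125$, the returns are `[0.5, -0.5, 0.25]`.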
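• We then define a differential Sharpe ratio, starting from a moving-average version of the classical Sharpe ratio, where $A_t$ and $B_t$ are exponential moving estimates of the first and second moments of $R_t$ and $\eta$ is the adaptation rate:

\begin{align}\hat{S}(t) & = \frac{A_t}{\sqrt{B_t - A_t^2}} \\ A_t & = A_{t-1} + \eta(R_t - A_{t-1}) \\ B_t & = B_{t-1} + \eta (R_t^2 - B_{t-1})\end{align}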
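The moving moments can be maintained incrementally. This sketch uses the standard moving-average Sharpe estimate $A_t / \sqrt{B_t - A_t^2}$ (the form from which $D_t$ below is derived); starting both moments at 0 and returning NaN while the variance estimate is not yet positive are assumptions of the sketch, and `MovingSharpe` is a hypothetical name.

```python
import math
from dataclasses import dataclass


@dataclass
class MovingSharpe:
    """Exponential moving estimates of the first two moments of R_t."""
    eta: float       # adaptation rate (eta in the text)
    a: float = 0.0   # A_t: moving average of R_t
    b: float = 0.0   # B_t: moving average of R_t ** 2

    def update(self, r: float) -> None:
        """Apply A_t = A_{t-1} + eta (R_t - A_{t-1}) and the same for B_t."""
        self.a += self.eta * (r - self.a)
        self.b += self.eta * (r * r - self.b)

    def sharpe(self) -> float:
        """Moving Sharpe estimate A_t / sqrt(B_t - A_t^2)."""
        variance = self.b - self.a ** 2
        if variance <= 0.0:
            return math.nan  # undefined until the estimates have some spread
        return self.a / math.sqrt(variance)
```

For example, with $\eta = 0.5$ and returns 1.0 then 0.0, both moments end at 0.25 and the estimate is $0.25 / \sqrt{0.1875} = 1/\sqrt{3}$.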

• We then expand $\hat{S}(t)$ as a Taylor series in $\eta$ around $\eta = 0$; the first-order coefficient serves as an instantaneous performance measure, with $\Delta A_t = R_t - A_{t-1}$ and $\Delta B_t = R_t^2 - B_{t-1}$:

$D_t = \left.\frac{d\hat{S}(t)}{d\eta}\right|_{\eta=0} = \frac{B_{t-1}\Delta A_t - \frac 12 A_{t-1} \Delta B_t}{(B_{t-1} - A_{t-1}^2)^{\frac 32}}$
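The closed form for $D_t$ can be checked numerically. The sketch below, with hypothetical function names, computes $D_t$ from $A_{t-1}$, $B_{t-1}$ and $R_t$, and also evaluates the Sharpe estimate after a step of size $\eta$ so that a finite difference at $\eta = 0$ can be compared with the formula.

```python
import math


def differential_sharpe(r: float, a_prev: float, b_prev: float) -> float:
    """D_t from the previous moments A_{t-1}, B_{t-1} and the new return R_t."""
    delta_a = r - a_prev          # Delta A_t
    delta_b = r * r - b_prev      # Delta B_t
    variance = b_prev - a_prev ** 2
    if variance <= 0.0:
        raise ValueError("need B_{t-1} > A_{t-1}**2")
    return (b_prev * delta_a - 0.5 * a_prev * delta_b) / variance ** 1.5


def sharpe_after_step(r: float, a_prev: float, b_prev: float, eta: float) -> float:
    """Moving Sharpe estimate A_t / sqrt(B_t - A_t^2) after one update of size eta."""
    a = a_prev + eta * (r - a_prev)
    b = b_prev + eta * (r * r - b_prev)
    return a / math.sqrt(b - a * a)
```

With $A_{t-1} = 0.1$, $B_{t-1} = 0.05$ and $R_t = 0.3$, the formula gives $D_t = (0.05 \cdot 0.2 - 0.5 \cdot 0.1 \cdot 0.04) / 0.04^{3/2} = 1$, and a central difference of `sharpe_after_step` around $\eta = 0$ agrees.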

• We then use simple gradient ascent (the aim is to maximize $D_t$) to update the weights: $w_{i,t} = w_{i,t-1} + \rho \Delta w_{i,t}$
• For online learning, only the terms that depend on the most recent return $R_t$ are kept, which gives:

$\Delta w_{i,t} = \frac{dD_t}{dw_i} \approx \frac{dD_t}{dR_t} \left( \frac{dR_t}{dF_t} \cdot \frac{dF_t}{dw_{i,t}} + \frac{dR_t}{dF_{t-1}} \cdot \frac{dF_{t-1}}{dw_{i,t-1}}\right)$

• Because the network is recurrent ($F_t$ depends on $F_{t-1}$), the total derivative is computed recursively:

$\frac{dF_t}{dw_{i,t}} \approx \frac{\partial F_t}{\partial w_{i,t}} + \frac{\partial F_t}{\partial F_{t-1}} \cdot \frac{dF_{t-1}}{dw_{i,t-1}}$
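The update rule, chain rule and recursive derivative can be combined into one online training step. This is a sketch under several assumptions that are not in these notes: tanh replaces sign() during training so that $F_t$ is differentiable (a common choice in the RRL literature); the threshold $\nu_t$ is learned as a bias weight; weights start at small random values; and the update is skipped while $B_{t-1} - A_{t-1}^2$ is still (near) zero. Differentiating $D_t$ through $\Delta A_t$ and $\Delta B_t$ gives $dD_t/dR_t = (B_{t-1} - A_{t-1} R_t)/(B_{t-1} - A_{t-1}^2)^{3/2}$, and from $R_t$ we get $dR_t/dF_t = -\delta\,\operatorname{sign}(F_t - F_{t-1})$ and $dR_t/dF_{t-1} = r_t + \delta\,\operatorname{sign}(F_t - F_{t-1})$. The class `OnlineRRL` and its parameters are hypothetical.

```python
import math
import random
from typing import Sequence


class OnlineRRL:
    """Hypothetical online RRL trader: one gradient-ascent step on D_t per tick.

    Assumptions: tanh replaces sign() during training, and nu_t is a bias weight.
    """

    def __init__(self, n_lags: int, rho: float, eta: float, cost: float, seed: int = 0):
        rng = random.Random(seed)
        # Weights: w_0..w_M (lagged returns), w_{M+1} (previous position), nu (bias).
        # Small random start: with all-zero weights F_t = tanh(0) = 0 and nothing is learned.
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(n_lags + 3)]
        self.rho, self.eta, self.cost = rho, eta, cost
        self.a = 0.0                        # A_{t-1}
        self.b = 0.0                        # B_{t-1}
        self.f_prev = 0.0                   # F_{t-1}
        self.df_prev = [0.0] * len(self.w)  # dF_{t-1}/dw

    def step(self, lagged_returns: Sequence[float]) -> float:
        """lagged_returns = [r_t, r_{t-1}, ..., r_{t-M}]; returns the trading return R_t."""
        inputs = list(lagged_returns) + [self.f_prev, 1.0]
        if len(inputs) != len(self.w):
            raise ValueError("expected n_lags + 1 lagged returns")
        f = math.tanh(sum(w * z for w, z in zip(self.w, inputs)))
        slope = 1.0 - f * f                 # d tanh(u) / du
        # Recursive total derivative: partial dF_t/dw_i + dF_t/dF_{t-1} * dF_{t-1}/dw_i,
        # where dF_t/dF_{t-1} = slope * w_{M+1}.
        df = [slope * z + slope * self.w[-2] * d for z, d in zip(inputs, self.df_prev)]

        r_t = lagged_returns[0]
        move = f - self.f_prev
        sgn = (move > 0) - (move < 0)
        ret = self.f_prev * r_t - self.cost * abs(move)   # R_t
        dr_df = -self.cost * sgn                           # dR_t/dF_t
        dr_df_prev = r_t + self.cost * sgn                 # dR_t/dF_{t-1}

        variance = self.b - self.a ** 2
        if variance > 1e-12:  # D_t is undefined until the moving moments have spread
            dd_dr = (self.b - self.a * ret) / variance ** 1.5  # dD_t/dR_t
            # Gradient ascent: w_i += rho * dD_t/dR_t * (dR_t/dF_t dF_t/dw_i + dR_t/dF_{t-1} dF_{t-1}/dw_i)
            self.w = [w + self.rho * dd_dr * (dr_df * d + dr_df_prev * dp)
                      for w, d, dp in zip(self.w, df, self.df_prev)]

        # Update the moving moments A_t, B_t with adaptation rate eta.
        self.a += self.eta * (ret - self.a)
        self.b += self.eta * (ret * ret - self.b)
        self.f_prev, self.df_prev = f, df
        return ret
```

At trading time the sign of the same activation would give the $\pm 1$ position; tanh is used here only to make the gradient well defined.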

RRL is well suited to rolling-window training: Add link to [Gold - 2003 - FX trading via recurrent reinforcement learning] and [Dempster - 2001 - Realtime adaptive trading system using genetic programming]
• The system is then trained for $n_e = 10$ epochs on a training set of $L_{train} = 2000$ ticks [optimal value].
• It is then tested on a test set of $L_{test} = 500$ ticks [optimal value].
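The rolling-window schedule can be sketched as a generator of index ranges. The notes do not state how far the window moves between retrainings; this sketch assumes it slides forward by one test block, so consecutive test sets are contiguous and do not overlap. `rolling_windows` is a hypothetical helper; each training range would be passed over $n_e = 10$ times before trading on the following test range.

```python
from typing import Iterator, Tuple


def rolling_windows(n_ticks: int, train_len: int = 2000,
                    test_len: int = 500) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges; each test block directly follows
    its training block, and the window then slides forward by test_len."""
    start = 0
    while start + train_len + test_len <= n_ticks:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + test_len)
        yield train, test
        start += test_len
```

For 3000 ticks this yields two windows: train on ticks 0 to 1999 and test on 2000 to 2499, then train on 500 to 2499 and test on 2500 to 2999.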

### Extensions to Machine Learning layer

• Extended to take other inputs into account, such as signals from 14 technical indicators ⇒ this did not improve the results.
See [Dempster - 2001 - Computational learning techniques for intraday FX trading using popular technical indicators]
• During the training phase, the transaction cost $\delta$ is left as a tuning parameter.
• To prevent the weights from growing too large, all weights are rescaled by a factor $f \lt 1$ as soon as a threshold value is hit.
• Improved position-updating scheme: the output $F_t$ is computed twice:
1. As before.
2. After the weights are updated ⇒ provides a more accurate value, which may differ from the first.
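A minimal sketch of the weight-rescaling safeguard. The notes do not say which quantity is compared with the threshold; this sketch assumes it is the largest absolute weight, and `rescale_if_large` is a hypothetical name.

```python
from typing import List, Sequence


def rescale_if_large(weights: Sequence[float], threshold: float,
                     factor: float) -> List[float]:
    """Shrink all weights by `factor` once the largest |w| reaches `threshold`."""
    if not 0.0 < factor < 1.0:
        raise ValueError("factor must lie strictly between 0 and 1")
    if max(abs(w) for w in weights) >= threshold:
        return [factor * w for w in weights]
    return list(weights)
```

Rescaling all weights by the same factor leaves their relative sizes unchanged, so the direction of the signal is preserved while its magnitude is damped.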

### Risk and performance management layer

• A trailing stop-loss is kept x points below (long position) or above (short position) the best price reached during the life of the position.
• If a position is closed by the stop-loss, a cool-down period [⇒ 1 minute] is observed before trading again.
• This layer can evaluate the strength of the signal received from the network using the non-thresholded value (the argument of the sign() function).
• A threshold y, provided by the optimization layer, restricts entries to cases of higher certainty.
• A maximum drawdown limit [parameter z] prevents complete failure of the trading system.
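The three risk controls can be combined in a small per-bar state machine. This is a sketch, not the paper's implementation: positions are -1, 0 (flat) or +1; the drawdown is measured on cumulative profit starting from 0; a weak signal (|activation| < y) leaves the current position unchanged; and the cool-down is counted in bars. The class `RiskOverlay` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RiskOverlay:
    """Hypothetical risk layer between the network output and the market."""
    stop_distance: float     # x: trailing-stop distance in price units
    entry_threshold: float   # y: minimum |activation| needed to (re)enter
    max_drawdown: float      # z: halt trading once equity falls this far below its peak
    cooldown_bars: int = 1   # bars to stay flat after a stop-out (1 bar = 1 minute here)
    position: int = 0
    best_price: float = 0.0
    cooldown_left: int = 0
    peak_equity: float = 0.0
    halted: bool = False

    def on_bar(self, price: float, activation: float, equity: float) -> int:
        """activation: network output before sign(); equity: cumulative profit."""
        # Maximum-drawdown switch (z): once hit, the system stays flat.
        self.peak_equity = max(self.peak_equity, equity)
        if self.halted or self.peak_equity - equity >= self.max_drawdown:
            self.halted, self.position = True, 0
            return 0

        # Trailing stop (x): follow the best price since entry, exit x points behind it.
        if self.position > 0:
            self.best_price = max(self.best_price, price)
            if price <= self.best_price - self.stop_distance:
                self.position, self.cooldown_left = 0, self.cooldown_bars
                return 0
        elif self.position < 0:
            self.best_price = min(self.best_price, price)
            if price >= self.best_price + self.stop_distance:
                self.position, self.cooldown_left = 0, self.cooldown_bars
                return 0

        # Cool-down after a stop-out: stay flat.
        if self.cooldown_left > 0:
            self.cooldown_left -= 1
            return 0

        # Enter or reverse only when the signal is strong enough (y).
        signal = 1 if activation >= 0 else -1
        if abs(activation) >= self.entry_threshold and signal != self.position:
            self.position, self.best_price = signal, price
        return self.position
```

In a long trade entered at 100 with x = 0.5, a rise to 101 moves the stop to 100.5, so a subsequent drop to 100.4 closes the position and triggers one bar of cool-down.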

### Dynamic optimization of utility layer

• Definition of the risk measure:

$\Sigma = \frac{\sum_{i=0}^N (R_i)^2 I\{R_i \lt 0\}}{\sum_{i=0}^N (R_i)^2 I\{R_i \gt 0\}}$

Here $R_i$ is the raw return at time i: $R_i = W_i - W_{i-1}$, with $W_i$ the cumulative profit at time i.
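The risk measure can be computed from a cumulative-profit series. In this sketch the sums start at $i = 1$, since $R_0$ would need $W_{-1}$, and returning infinity when there is no positive return (or 0 when there are no returns of either sign) is a convention of the sketch; the function name is hypothetical.

```python
import math
from typing import Sequence


def downside_upside_ratio(cum_profit: Sequence[float]) -> float:
    """Sigma: sum of squared negative returns over sum of squared positive returns,
    with R_i = W_i - W_{i-1} taken from the cumulative profit W_0..W_N."""
    returns = [w1 - w0 for w0, w1 in zip(cum_profit, cum_profit[1:])]
    down = sum(r * r for r in returns if r < 0)
    up = sum(r * r for r in returns if r > 0)
    if up == 0.0:
        return math.inf if down > 0 else 0.0
    return down / up
```

For $W = (0, 1, 0.5, 2.5, 2)$ the returns are $(1, -0.5, 2, -0.5)$, so $\Sigma = 0.5 / 5 = 0.1$. Unlike a standard deviation, this ratio penalizes only losses, scaled by the size of the gains.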

• Then we define the utility function: $U(\bar{R},\Sigma,\nu) = a \cdot (1 - \nu) \bar{R} - \nu \cdot \Sigma$, with:
• $\nu$: Risk aversion parameter.
• $\bar{R} = \frac{W_N}{N}$: average profit per sampling interval.
Standard theoretical work on risk measures (not directly applicable here): [Ruszczynski - 2002 - Dual stochastic dominance and related mean-risk models] and [Ruszczynski - 2003 - Frontiers of stochastically non-dominated portfolios]
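The utility function is a one-liner once $\bar{R}$ and $\Sigma$ are known. Here $\nu$ is the risk-aversion parameter, not the model threshold $\nu_t$. The constant $a$ appears in the formula but its value is not given in these notes, so the sketch takes it as an explicit argument. `average_profit` follows the notes literally ($W_N / N$, which assumes the cumulative profit starts from 0); both names are hypothetical.

```python
from typing import Sequence


def average_profit(cum_profit: Sequence[float]) -> float:
    """R_bar = W_N / N for a cumulative-profit series W_0..W_N (W_0 assumed 0)."""
    n = len(cum_profit) - 1
    if n <= 0:
        raise ValueError("need at least two points W_0, W_1")
    return cum_profit[-1] / n


def utility(r_bar: float, sigma: float, nu: float, a: float) -> float:
    """U = a (1 - nu) R_bar - nu Sigma, with risk aversion nu in [0, 1]."""
    if not 0.0 <= nu <= 1.0:
        raise ValueError("nu must lie in [0, 1]")
    return a * (1.0 - nu) * r_bar - nu * sigma
```

Setting $\nu = 0$ optimizes average profit only, $\nu = 1$ only the risk measure; the tested value $\nu = 0.5$ weights both.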
• The parameter optimization problem becomes:

$\max_{\delta, \eta, \rho, x, y} U(\bar{R},\Sigma : \delta, \eta, \rho, x, y)$

⇒ Implemented as a one-at-a-time random search (15 random evaluations around the current value of each parameter).
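A sketch of one sweep of one-at-a-time random search, assuming the objective returns the utility $U$ for a given parameter set (for example from a validation run). The sampling distribution is not given in these notes: this sketch draws candidates uniformly within a per-parameter spread around the current best value, clips them to bounds, and keeps a candidate only if it improves $U$. The function and its arguments are hypothetical.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def one_at_a_time_search(objective: Callable[[Params], float], params: Params,
                         bounds: Dict[str, Tuple[float, float]], spread: Params,
                         n_trials: int = 15, seed: int = 0) -> Tuple[Params, float]:
    """Maximise `objective` by perturbing one parameter at a time.

    For each parameter in turn, n_trials candidates are drawn uniformly within
    +/- spread[name] of the current best value (clipped to bounds[name]) while the
    other parameters stay fixed; a candidate is kept only if it improves the value.
    """
    rng = random.Random(seed)
    best = dict(params)
    best_value = objective(best)
    for name in list(params):
        lo, hi = bounds[name]
        for _ in range(n_trials):
            candidate = dict(best)
            step = rng.uniform(-spread[name], spread[name])
            candidate[name] = min(hi, max(lo, best[name] + step))
            value = objective(candidate)
            if value > best_value:
                best, best_value = candidate, value
    return best, best_value
```

In the trading system, `params` would hold $\delta, \eta, \rho, x, y$. Because acceptance is greedy, a sweep can never lower $U$, but it can stall in a local optimum.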

Provide the typical ranges of the parameters here.

• Tested on the EURUSD pair:
• 1-minute frequency from January 2000 to January 2002.
• Risk aversion value used: $\nu = 0.5$.