====== Automated FX Trading System Using ARL ======

Authors: M. Dempster & V. Leemans\\ Date: 2004
Website: http://www.cfr.statslab.cam.ac.uk/publications/papers.html

===== Introduction =====

  * Adaptive Reinforcement Learning => ARL
  * Three-layer structure:
<graphviz dot center 200x200>
digraph finite_state_machine {
  rankdir=BT;
  size="14";
  node [shape = box];
  "Dynamic Utility Optimization";
  "Risk Management Overlay" -> "Dynamic Utility Optimization";
  "Machine Learning Algorithm" -> "Risk Management Overlay";
}
</graphviz> 

  * **Machine Learning Algorithm**: 
    * Using **Recurrent Reinforcement Learning** (RRL)

  * **Risk Management Overlay**:
    * Restrain/shutdown trading when in high uncertainty.

  * **Dynamic Utility Optimization**:
    * Makes manual selection of fixed meta-parameters unnecessary.
    * Allow risk-return trade-off control by user.


<note todo>Summary of previous work: Add link to Dempster - 2004 - Adaptive Systems For Foreign Exchange Trading [Quantitative Finance 4]</note>

<note todo>Find definition of the Sharpe Ratio ?</note>

<note todo>Previous work: Trading system based on 2 superimposed AI algorithms: Add link to Dempster - 2002 - Intraday FX Trading: an evolutionary reinforcement learning approach</note>


  * Machine Learning layer + Dynamic optimization layer => Adaptive Reinforcement Learning.

===== Machine Learning Algorithm =====

<note todo>Basic initial model: Add link to Moody & Saffell - 1999 - Minimizing downside risk via stochastic dynamic programming</note>

<note todo>Mathematical model and estimation procedure: add link to Moody & Saffell - 2001 - Learning to trade via direct reinforcement</note>

  * Model used: \(F_t = sign\left( \sum\limits_{i=0}^M (w_{i,t} \cdot r_{t-i}) + w_{M+1,t} \cdot F_{t-1} + \nu_t\right)\)

  * \(F_t \in \{-1, 1\}\) is the position to take at time t.
  * \(w_t\) is the weight vector at time t.
  * \(\nu_t\) is the threshold at time t.
  * \(r_t = p_t - p_{t-1}\) are the raw return values at time t.
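
As a minimal sketch of the decision rule above, assuming NumPy and a hypothetical helper name ''position'' (neither comes from the paper), the position can be computed from the last \(M+1\) returns, the previous position and the threshold. The tie-break when the activation is exactly zero is an assumption, since sign(0) is not in \(\{-1, 1\}\).

```python
import numpy as np

def position(weights, returns_window, prev_position, threshold):
    """Return F_t in {-1, +1} for the model above.

    weights:        length M+2; weights[:M+1] multiply r_t ... r_{t-M},
                    weights[M+1] multiplies F_{t-1}.
    returns_window: [r_t, r_{t-1}, ..., r_{t-M}] (most recent first).
    """
    weights = np.asarray(weights, dtype=float)
    returns_window = np.asarray(returns_window, dtype=float)
    activation = (weights[:-1] @ returns_window
                  + weights[-1] * prev_position
                  + threshold)
    # Assumption: an activation of exactly 0 is mapped to +1 (arbitrary tie-break).
    return 1 if activation >= 0 else -1
```

With small returns the recurrent term dominates, so the previous position tends to persist unless the weighted returns or the threshold outweigh it.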

<note todo>See all previous Dempster refs [1-5] for continuous time situation</note>

  * At every trade we buy/sell 1 unit of the currency pair.
  * Profit at time T can be calculated as (ignoring interest rates):
\[\begin{align}P_T & = \sum\limits_{t=0}^T R_t \\ R_t & = F_{t-1} \cdot r_t - \delta |F_t - F_{t-1}|\end{align}\]

  * Where \(\delta\) is the transaction cost per trade.
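
The profit formula can be illustrated with a short sketch. The function names are hypothetical, and the sum is started at \(t = 1\) here (an assumption) because \(R_0\) would require a position \(F_{-1}\) that the notes do not define. Note that a reversal from +1 to -1 gives \(|F_t - F_{t-1}| = 2\), so it costs \(2\delta\).

```python
import numpy as np

def trading_returns(positions, price_changes, delta):
    """R_t = F_{t-1} * r_t - delta * |F_t - F_{t-1}| for t = 1..T.

    positions:     [F_0, ..., F_T]
    price_changes: [r_0, ..., r_T] (r_0 is unused in this sketch)
    """
    F = np.asarray(positions, dtype=float)
    r = np.asarray(price_changes, dtype=float)
    return F[:-1] * r[1:] - delta * np.abs(F[1:] - F[:-1])

def cumulative_profit(positions, price_changes, delta):
    """P_T as the sum of the per-step returns R_t."""
    return float(trading_returns(positions, price_changes, delta).sum())
```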

  * We then define a **differential Sharpe ratio**, starting from a moving-average version of the classical Sharpe ratio (moving-average return divided by moving-average standard deviation):

\[\begin{align}\hat{S}(t) & = \frac{A_t}{\sqrt{B_t - A_t^2}} \\ A_t & = A_{t-1} + \eta(R_t - A_{t-1}) \\ B_t & = B_{t-1} + \eta (R_t^2 - B_{t-1})\end{align}\]

  * We then expand \(\hat{S}(t)\) into a Taylor series in \(\eta\); the first-order coefficient can be used as an instantaneous performance measure, with \(\Delta A_t = R_t - A_{t-1}\) and \(\Delta B_t = R_t^2 - B_{t-1}\):

\[D_t = \left.\frac{d\hat{S}(t)}{d\eta}\right|_{\eta=0} = \frac{B_{t-1}\Delta A_t - \frac 12 A_{t-1} \Delta B_t}{(B_{t-1} - A_{t-1}^2)^{\frac 32}}\]
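
A possible sketch of one update step for \(D_t\), taking \(\Delta A_t = R_t - A_{t-1}\) and \(\Delta B_t = R_t^2 - B_{t-1}\). The function name is hypothetical, and returning 0 when the variance estimate \(B_{t-1} - A_{t-1}^2\) is not positive is an assumption made only to keep the sketch well defined at start-up.

```python
def differential_sharpe_step(R_t, A_prev, B_prev, eta):
    """Return (D_t, A_t, B_t) given the new return R_t and the previous averages."""
    delta_A = R_t - A_prev
    delta_B = R_t ** 2 - B_prev
    variance = B_prev - A_prev ** 2
    if variance <= 0:
        # Assumption: D_t is undefined without a positive variance estimate; use 0.
        D_t = 0.0
    else:
        D_t = (B_prev * delta_A - 0.5 * A_prev * delta_B) / variance ** 1.5
    # Exponential moving averages of the first and second moments of R_t.
    A_t = A_prev + eta * delta_A
    B_t = B_prev + eta * delta_B
    return D_t, A_t, B_t
```

A return above the running average with a second moment below the running one (as in ''differential_sharpe_step(0.02, 0.01, 0.0005, 0.1)'') yields a positive \(D_t\).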

  * We then use a simple gradient ascent method (\(D_t\) is maximized) to update the weights, with learning rate \(\rho\): \(w_{i,t} = w_{i,t-1} + \rho \Delta w_{i,t}\)

  * In case of online learning we can consider only the term that depends on the most recent return \(R_t\), so we get:

\[\Delta w_{i,t} = \frac{dD_t}{dw_i} \approx \frac{dD_t}{dR_t} \left( \frac{dR_t}{dF_t} \cdot \frac{dF_t}{dw_{i,t}} + \frac{dR_t}{dF_{t-1}} \cdot \frac{dF_{t-1}}{dw_{i,t-1}}\right)\]

  * Because the neural network is recurrent, the total derivative is approximated by:

\[\frac{dF_t}{dw_{i,t}} \approx \frac{\partial F_t}{\partial w_{i,t}} + \frac{\partial F_t}{\partial F_{t-1}} \cdot \frac{dF_{t-1}}{dw_{i,t-1}}\]
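
The online update can be sketched as follows. Several things here are assumptions and not taken from the notes: ''sign()'' is replaced by ''tanh()'' so that the derivatives exist, the threshold \(\nu\) is held fixed, and all names are hypothetical. The factor \(\frac{dD_t}{dR_t} = \frac{B_{t-1} - A_{t-1} R_t}{(B_{t-1} - A_{t-1}^2)^{3/2}}\) follows from differentiating the expression for \(D_t\) with respect to \(R_t\).

```python
import numpy as np

def rrl_online_step(w, nu, r_window, F_prev, dF_prev, A_prev, B_prev, delta, rho):
    """One online weight update.

    w:        weights for [r_t, ..., r_{t-M}, F_{t-1}] (length M+2)
    r_window: [r_t, ..., r_{t-M}] (most recent first)
    dF_prev:  dF_{t-1}/dw carried over from the previous step (length M+2)
    Returns (new weights, F_t, dF_t/dw, gradient).
    """
    w = np.asarray(w, dtype=float)
    dF_prev = np.asarray(dF_prev, dtype=float)
    x = np.append(np.asarray(r_window, dtype=float), F_prev)
    F_t = np.tanh(w @ x + nu)            # assumption: smooth stand-in for sign()
    r_t = x[0]
    R_t = F_prev * r_t - delta * abs(F_t - F_prev)

    # dD_t/dR_t from the differential Sharpe ratio
    dD_dR = (B_prev - A_prev * R_t) / (B_prev - A_prev ** 2) ** 1.5

    # Partial derivatives of R_t with respect to F_t and F_{t-1}
    s = np.sign(F_t - F_prev)
    dR_dF = -delta * s
    dR_dFprev = r_t + delta * s

    # Recurrent approximation: partial derivative plus the path through F_{t-1}
    dF_t = (1.0 - F_t ** 2) * (x + w[-1] * dF_prev)

    grad = dD_dR * (dR_dF * dF_t + dR_dFprev * dF_prev)
    return w + rho * grad, F_t, dF_t, grad
```

The returned ''dF_t'' is fed back as ''dF_prev'' on the next tick, which is what makes the gradient recurrent.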

<note todo>RRL is very interesting for rolling window usage: add link to [Gold - 2003 - FX trading via recurrent reinforcement learning] and [Dempster - 2001 - Real-time adaptive trading system using genetic programming]</note>

  * The system is then trained for \(n_e = 10\) epochs on a training set of length \(L_{train} = 2000\) ticks [optimal value].
  * It is then tested on a test set of length \(L_{test} = 500\) ticks [optimal value].
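
A rolling train/test scheme with these lengths could be sketched as below. Advancing the window by one test length after each evaluation is an assumption (the notes only give the two window lengths), and the function name is hypothetical.

```python
def rolling_windows(n_ticks, train_len=2000, test_len=500):
    """Yield (train_slice, test_slice) pairs over a series of n_ticks ticks."""
    start = 0
    while start + train_len + test_len <= n_ticks:
        train = slice(start, start + train_len)
        test = slice(start + train_len, start + train_len + test_len)
        yield train, test
        # Assumption: slide forward by one test window, so test sets do not overlap.
        start += test_len
```

Each pair can be applied directly to a price or return array, e.g. ''returns[train]'' and ''returns[test]''.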

===== The Trading System =====

==== Extensions to Machine Learning layer ====

  * Extended to take into account other inputs, such as signals from 14 technical indicators. => This did not improve the results.

<note todo>See [Dempster - 2001 - Computational learning techniques for intraday FX trading using popular technical indicators]</note>

  * During the training phase the transaction cost \(\delta\) is left as a tuning parameter.

  * To prevent the weights from growing too large, all weights are rescaled by a factor \(f \lt 1\) as soon as a threshold value is hit.
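
The rescaling rule might look like the sketch below. The notes do not say what exactly must hit the threshold, so testing the largest absolute weight is an assumption, as are the default values and the function name.

```python
import numpy as np

def rescale_weights(w, max_abs=10.0, factor=0.5):
    """Shrink all weights by `factor` (< 1) once any weight reaches `max_abs`."""
    w = np.asarray(w, dtype=float)
    # Assumption: the threshold is tested on the largest absolute weight.
    if np.max(np.abs(w)) >= max_abs:
        return w * factor
    return w
```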

  * Improved position updating scheme: the output \(F_t\) is calculated twice:
    - As before.
    - Again after the weights are updated => provides a more accurate value, which might differ from the first one.

==== Risk and performance management layer ====

  * Builds a trailing stop-loss that is always kept **x points** below (long position) or above (short position) the best price reached during the life of a position.

  * If a position is closed by the stop-loss, a **cool-down** period (1 minute here) is observed before trading again.

  * This layer can evaluate the strength of the signal received from the NN using the non-thresholded value (inside the sign() function).

  * A **threshold y** can be provided by the optimization layer so that a position is only entered when the signal gives a higher certainty.

  * A **maximum draw-down** system [**parameter z**] is implemented to prevent complete failure of the trading system.
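
The trailing stop-loss from this layer can be sketched as a small class. The class and method names are hypothetical, prices are taken in integer points in the usage below to avoid floating-point edge cases, and the cool-down, threshold y and maximum draw-down checks are left out of this sketch.

```python
class TrailingStop:
    """Trailing stop-loss kept `x` points behind the best price of a position."""

    def __init__(self, x, entry_price, direction):
        self.x = x
        self.direction = direction   # +1 for a long position, -1 for a short one
        self.best = entry_price      # best price seen since the position was opened

    def update(self, price):
        """Record a new price; return True if the stop-loss is hit."""
        if self.direction > 0:
            self.best = max(self.best, price)
            return price <= self.best - self.x
        self.best = min(self.best, price)
        return price >= self.best + self.x
```

For a long position opened at 11000 with ''x = 10'', a rise to 11020 moves the stop to 11010, so a later price of 11009 closes the position.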

==== Dynamic optimization of utility layer ====

  * Definition of the **risk measure**:
\[\Sigma = \frac{\sum_{i=0}^N (R_i)^2 I\{R_i \lt 0\}}{\sum_{i=0}^N (R_i)^2 I\{R_i \gt 0\}}\]

Here \(R_i\) is the raw return at time i: \(R_i = W_i - W_{i-1}\), with \(W_i\) the cumulative profit at time i.

  * Then we define the **utility function**: \(U(\bar{R},\Sigma,\nu) = a \cdot (1 - \nu) \bar{R} - \nu \cdot \Sigma\), with:
    * \(\nu\): Risk aversion parameter.
    * \(\bar{R} = \frac{W_N}{N}\) : average profit per frequency interval.
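
The risk measure and utility can be sketched from a cumulative profit series \(W_0, \dots, W_N\). Assumptions in this sketch: \(W_0 = 0\), the returns run over \(i = 1..N\), \(\Sigma\) is set to infinity when there is no positive return, and the constant \(a\) (not defined in these notes) defaults to 1. Function names are hypothetical.

```python
import numpy as np

def risk_measure(cumulative_profit):
    """Sigma: sum of squared negative returns over sum of squared positive returns."""
    W = np.asarray(cumulative_profit, dtype=float)
    R = np.diff(W)                        # R_i = W_i - W_{i-1}
    downside = np.sum(R[R < 0] ** 2)
    upside = np.sum(R[R > 0] ** 2)
    # Assumption: with no positive return the ratio is treated as infinite risk.
    return downside / upside if upside > 0 else np.inf

def utility(cumulative_profit, nu, a=1.0):
    """U = a * (1 - nu) * R_bar - nu * Sigma, with R_bar = W_N / N."""
    W = np.asarray(cumulative_profit, dtype=float)
    N = len(W) - 1
    mean_profit = W[-1] / N
    return a * (1.0 - nu) * mean_profit - nu * risk_measure(W)
```

With \(\nu = 0\) only the average profit counts, and with \(\nu = 1\) only the risk measure does.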

<note todo>Standard theoretical work on risk measure (Not applicable above): [Ruszczynski - 2002 - Dual Stochastic dominance and related mean-risk models] and [Ruszczynski - 2003 - Frontiers of stochastically non-dominated portfolios]</note>

  * The parameter optimization problem becomes:
\[\max_{\delta, \eta, \rho, x, y} U(\bar{R},\Sigma : \delta, \eta, \rho, x, y)\]

=> Implemented as a one-at-a-time random search optimization (using 15 random evaluations around the current value of each parameter).
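
A one-at-a-time random search could be sketched as follows. The Gaussian perturbation, the per-parameter scales, the number of sweeps and all names are assumptions; the notes only state that 15 random evaluations are made around the current value of each parameter.

```python
import random

def one_at_a_time_search(objective, params, scales, n_evals=15, n_sweeps=1, seed=0):
    """Maximize objective(params_dict) by perturbing one parameter at a time."""
    rng = random.Random(seed)
    best = dict(params)
    best_value = objective(best)
    for _ in range(n_sweeps):
        for name in list(params):
            for _ in range(n_evals):
                candidate = dict(best)
                # Assumption: Gaussian perturbation around the current best value.
                candidate[name] = best[name] + rng.gauss(0.0, scales[name])
                value = objective(candidate)
                if value > best_value:
                    best, best_value = candidate, value
    return best, best_value
```

In the trading system the objective would be the utility \(U\) obtained by running the system with the candidate values of \(\delta, \eta, \rho, x, y\); any callable taking a parameter dictionary works in this sketch.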

<note todo>Provide typical range of the parameters here.</note>

===== Evaluation =====

  * Tested with the EURUSD pair:
    * Frequency of 1 minute, from January 2000 to January 2002.
    * Spread of 2 pips.
  * Trading between 9 am and 5 pm (London time)
  * Interdealer platform: EBS / Reuters3000
  * Used risk aversion value \(\nu = 0.5\)
  * => Earned 5104 pips over 2 years

===== Conclusion =====

  * Future work:
    * In [2,7] it has been demonstrated that order book or order flow information could enhance performance.
    * The risk management layer could control multiple trading systems for different currencies.