Goal: find correlations over windows from the same or different streams, in both synchronous and asynchronous (a.k.a. lagged) variants.
Synchronous correlation: given \(N_s\) streams, a start time \(t_{start}\), and a window size \(w\), find, for each time window \(W\) of size \(w\), all pairs of streams \(S_1\) and \(S_2\) such that \(S_1\) during \(W\) is highly correlated (correlation over 0.95) with \(S_2\) during the same window.
Asynchronous correlation: allow for different windows \(W_1\) and \(W_2\) (each of size \(w\)).
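For reference, a brute-force baseline for the synchronous case (toy code, all names mine). It computes \(O(N_s^2)\) correlations per window, which is exactly the cost the sketch machinery below is designed to avoid:

<code python>
import numpy as np

def synchronous_pairs(streams, w, threshold=0.95):
    """Brute-force baseline: for each length-w window, report all stream
    pairs whose Pearson correlation over that window exceeds threshold.

    streams: array of shape (n_streams, n_points).
    Returns a list of (window_start, i, j) triples.
    """
    n_streams, n_points = streams.shape
    hits = []
    for start in range(0, n_points - w + 1, w):   # non-overlapping windows
        corr = np.corrcoef(streams[:, start:start + w])
        for i in range(n_streams):
            for j in range(i + 1, n_streams):
                if corr[i, j] > threshold:
                    hits.append((start, i, j))
    return hits
</code>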
Cooperative: time series that exhibit a substantial degree of regularity, so that standard reduction techniques apply: Fourier transforms 4)5)6), wavelet transforms 7)8), Singular Value Decomposition 9), Piecewise Constant Approximations 10).
Uncooperative: time series with no such periodic regularities.
For uncooperative time series (for instance, the returns of a stock), sketch-based approaches 11) can be used: the time series are projected onto random vectors, exploiting the bounds provided by the Johnson-Lindenstrauss lemma 12).
The StatStream framework rests on three notions of time: the time-point (the smallest unit at which data arrive), the basic window (bw consecutive time-points, over which digests are maintained), and the sliding window (sw consecutive time-points, a multiple of bw, over which correlations are reported).
Benefit of the basic window (bw): results are refreshed every bw time units; after reporting correlations over \([t_i, t_{i+sw}]\), the next report covers \([t_{i+bw}, t_{i+sw+bw}]\).
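A toy illustration of that schedule (names mine):

<code python>
def report_windows(t_start, sw, bw, n_reports):
    """Successive reporting windows: each result shifts by one basic window."""
    return [(t_start + k * bw, t_start + sw + k * bw) for k in range(n_reports)]

print(report_windows(0, sw=8, bw=2, n_reports=3))   # [(0, 8), (2, 10), (4, 12)]
</code>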
Given a point \(x \in \mathbb{R}^m\), we compute its dot product with \(d\) random vectors \(r_i \in \{1,-1\}^m\).
The first random projection of \(x\) is \(y_1 = (x \cdot r_1, x \cdot r_2, \dots, x \cdot r_d)\).
We compute 2b more such random projections, so that we get \(y_1, y_2,\dots,y_{2b+1}\).
If \(w \in \mathbb{R}^m\) is another point and \(z_1, z_2, \dots, z_{2b+1}\) are its projections using the same random vectors, then the median of \(\|y_1 - z_1\|, \|y_2 - z_2\|, \dots, \|y_{2b+1} - z_{2b+1}\|\) is a good estimate of \(\|x - w\|\).
The estimate lies within a \(\Theta(\frac{1}{d})\) factor of \(\|x - w\|\) with probability \(1 - \left(\frac{1}{2}\right)^b\).
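A minimal numerical sketch of the scheme (toy code; the division by \(\sqrt{d}\) is my addition, so the raw ±1 projections land on the same scale as \(\|x - w\|\)):

<code python>
import numpy as np

rng = np.random.default_rng(0)

m, d, b = 256, 16, 4
n_groups = 2 * b + 1                                   # the 2b+1 projections
rs = rng.choice([-1.0, 1.0], size=(n_groups, d, m))    # shared random vectors

def sketch(x):
    """y_k = (x . r_1, ..., x . r_d) for each of the 2b+1 groups."""
    return np.einsum('gdm,m->gd', rs, x)

x = rng.standard_normal(m)
w = rng.standard_normal(m)
ys, zs = sketch(x), sketch(w)

# Median of the per-group sketch distances, rescaled by 1/sqrt(d).
estimate = np.median(np.linalg.norm(ys - zs, axis=1)) / np.sqrt(d)
print(estimate, np.linalg.norm(x - w))   # the two values are close
</code>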
The sketch implementation in StatStream uses a structured random-vector approach, providing a 30- to 40-fold improvement in runtime.
Example:
Build a random vector \(r_{bw} = (r_1, r_2, \dots, r_{bw})\), where each \(r_i\) is 1 or -1 with probability 0.5.
Build a control vector \(b = (b_1, b_2, \dots, b_{nb})\), where each \(b_i\) is 1 or -1 with probability 0.5, and \(nb = \frac{sw}{bw}\).
Build the random vector for the sliding window as the concatenation \(r = (r_{bw} \cdot b_1, r_{bw} \cdot b_2, \dots, r_{bw} \cdot b_{nb})\).
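A minimal sketch of this construction (function name and seeding mine). The payoff of the structure is that a dot product with \(r\) decomposes into per-basic-window dot products with \(r_{bw}\), which can be computed once per basic window and reused across sliding windows:

<code python>
import numpy as np

rng = np.random.default_rng(1)

def structured_random_vector(sw, bw):
    """Concatenate b_i * r_bw for i = 1..nb to get a length-sw vector."""
    nb = sw // bw                               # number of basic windows
    r_bw = rng.choice([-1.0, 1.0], size=bw)     # basic-window +/-1 vector
    b = rng.choice([-1.0, 1.0], size=nb)        # control vector
    return np.concatenate([b_i * r_bw for b_i in b])

r = structured_random_vector(sw=12, bw=4)
print(r)   # length sw = 12, built from only bw + nb = 7 random signs
</code>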
The StatStream implementation uses 60 random vectors.
Pearson correlation is related to Euclidean distance by \(D^2(\hat{x},\hat{y}) = 2(1-\mathrm{corr}(x,y))\), where \(\hat{x}\) and \(\hat{y}\) denote the normalized series.
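The identity is easy to check numerically once \(\hat{x}\), \(\hat{y}\) are taken to be the zero-mean, unit-norm versions of the series (toy check, names mine):

<code python>
import numpy as np

rng = np.random.default_rng(2)

def normalize(x):
    """Zero mean, unit L2 norm: the dot product of two normalized
    series then equals their Pearson correlation."""
    x = x - x.mean()
    return x / np.linalg.norm(x)

x, y = rng.standard_normal(100), rng.standard_normal(100)
xh, yh = normalize(x), normalize(y)

d2 = np.sum((xh - yh) ** 2)        # D^2(x_hat, y_hat)
corr = np.corrcoef(x, y)[0, 1]     # corr(x, y)
print(d2, 2 * (1 - corr))          # identical up to rounding
</code>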
Sketches work much better than the SVD method (see results in section 2.2.9).
2.3 - StatStream renewed
The standard error of the mean is related to the standard deviation by \(SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\).
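A quick numerical check of the formula (toy example, mine):

<code python>
import numpy as np

rng = np.random.default_rng(3)

n, trials = 50, 20000
samples = rng.standard_normal((trials, n))   # unit-variance data

empirical_se = samples.mean(axis=1).std()    # spread of the sample mean
formula_se = 1.0 / np.sqrt(n)                # sigma / sqrt(n), sigma = 1
print(empirical_se, formula_se)              # both close to 0.141
</code>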
Cole, R., Shasha, D., Zhao, X.: Fast window correlations over uncooperative time series. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. KDD ’05, New York, NY, USA, ACM (2005) 743–749
Agrawal, R., Faloutsos, C., Swami, A.N.: Efficient similarity search in sequence databases. In: Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms. FODO ’93, London, UK, Springer-Verlag (1993)
Chan, K.P., Fu, A.W.C.: Efficient time series matching by wavelets. In: Proceedings of the 15th International Conference on Data Engineering. ICDE ’99, Washington, DC, USA, IEEE Computer Society (1999)
Keogh, E., Chakrabarti, K., Pazzani, M., Mehrotra, S.: Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems 3 (2000) 263–286
Kushilevitz, E., Ostrovsky, R., Rabani, Y.: Efficient search for approximate nearest neighbor in high dimensional spaces. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing. STOC ’98, New York, NY, USA, ACM (1998) 614–623
Bottou, L.: Stochastic learning. In Bousquet, O., von Luxburg, U., eds.: Advanced Lectures on Machine Learning. Lecture Notes in Artificial Intelligence, LNAI 3176. Springer Verlag, Berlin (2004)
Bottou, L., LeCun, Y.: Large scale online learning. In Thrun, S., Saul, L., Schölkopf, B., eds.: Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA (2004)
Bottou, L.: Large-scale machine learning with stochastic gradient descent. In Lechevallier, Y., Saporta, G., eds.: Proceedings of the 19th International Conference on Computational Statistics (COMPSTAT’2010), Paris, France, Springer (August 2010) 177–187
Gasso, G., Pappaioannou, A., Spivak, M., Bottou, L.: Batch and online learning algorithms for nonconvex Neyman-Pearson classification. ACM Transactions on Intelligent Systems and Technology 2(3) (2011)
Raiko, T., Valpola, H., LeCun, Y.: Deep learning made easier by linear transformations in perceptrons. Journal of Machine Learning Research - Proceedings Track 22 (2012) 924–932
Velazco, M., Oliveira, A., Lyra, C.: Neural networks give a warm start to linear optimization problems. In: Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN ’02). Volume 2. (2002) 1871–1876
Sutton, R.S., Whitehead, S.D.: Online learning with random representations. In: Proceedings of the Tenth International Conference on Machine Learning, Morgan Kaufmann (1993) 314–321