By Tristan Webb, Ph.D., FP Complete Technical Staff
Several clients in the financial markets wanted to detect tradeable sequences of price movements in the public markets. This is an extremely competitive space, in which most professional traders already use domain-specific software. Typically machines are used to detect fairly simple patterns -- for example, a stock that is moving with subnormal correlation to a family of related stocks -- which are presented to humans or trading bots as transaction opportunities. Human traders are already completely saturated with information and cognitive tasks. The goal, then, was to further detect useful hidden information (signal categorization) with no increase in human workload.
We assigned one of our Ph.D. computer scientists, who has a background in machine learning, to research recent breakthroughs in the relevant mathematics that could be applied to create a more powerful solution, and to build a practical implementation. After some study, he identified an algorithm in the recent scientific literature that requires a level of computing hardware only recently made available to average traders, and that to our knowledge had never before been used on stock market data.
Unsupervised shapelet learning algorithms perform time series classification using a technique that can scale to a large number of CPUs. This family of algorithms has a broad range of applications, in finance, health, and energy, to name a few fields. Because they use an unsupervised learning model, such approaches can save hundreds of hours of expert human time, and they can fit inside larger machine learning systems that provide cutting-edge analytics.
These algorithms function by detecting "time series shapelets": representative subsequences of the time series data. Because shapelets are found by locating patterns that recur in the time series, they give a fine-grained view of the data.
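To make the idea of a shapelet as a representative subsequence concrete, here is a minimal sketch of the comparison that shapelet methods commonly rely on: sliding a candidate shapelet along a series and keeping the closest window. It is written in Python for brevity and is not the Haskell implementation described in this article. The function name `best_match` and the use of plain Euclidean distance are assumptions made for illustration.

```python
import math


def best_match(series, shapelet):
    """Return (start_index, distance) of the window of `series` closest to `shapelet`.

    Distance is the Euclidean distance between the shapelet and a window of
    the same length. Both the name and the distance choice are illustrative
    assumptions, not taken from the article.
    """
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")

    best_start, best_dist = 0, math.inf
    # Try every window of length m; the first window wins on ties.
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        if dist < best_dist:
            best_start, best_dist = start, dist
    return best_start, best_dist
```

For example, `best_match([0, 0, 1, 3, 1, 0, 0], [1, 3, 1])` returns `(2, 0.0)`, because the shapelet occurs exactly at index 2. This brute-force scan costs time proportional to the series length times the shapelet length for each comparison, which is one reason parallel hardware matters for this family of algorithms.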
Computational finance poses a number of challenges for the development of new algorithms:
- The computational workload of the algorithms demanded a high-performance solution.
- The learning algorithm must be able to operate on different data sources.
- Extreme throughput is required to analyze a large number of securities, so the solution must scale reliably to parallel computation.
- The algorithms would need to be implemented with a larger system in mind, for application in real trading platforms.
The algorithm was implemented in the Haskell language. In less than a week of development time, two versions were written: an imperative one based on the pseudo-code provided in the research literature, and one in a functional style, which allows it to compose more easily with various data sources and to scale with the number of cores used. The algorithm was then run on a database of historical stock market data to extract shapelets from the price and volatility time series.
After shapelet discovery, the extracted shapelets were examined visually and used in further analysis of the historical data. The purpose of extracting the shapelets was to provide a set of time series “primitives” that could be compared to events in the stock market preceding large price movements.
Technology transfer from research to real-world data analysis often involves some extra work, and this case was no exception. We dealt with these issues, among others:
- The algorithms were published as research work and were not available in off-the-shelf software packages, so a proprietary re-implementation was required.
- Stock market data required cleansing and normalization to make it amenable to the algorithm.
- The computational complexity of the algorithm had to be optimized.
- The predictive power of the shapelets had to be verified in further analysis.
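The article does not say which cleansing or normalization steps were used. As an illustration only, the sketch below shows two steps that are common when preparing price data for shape-based comparison: converting prices to log returns, and z-normalizing a sequence so that its shape matters rather than its level or scale. It is in Python rather than Haskell, and the function names `log_returns` and `z_normalize` are hypothetical.

```python
import math
import statistics


def log_returns(prices):
    """Convert a price series to log returns: log(p[t+1] / p[t]).

    Assumes strictly positive prices; this step is an illustrative choice,
    not something the article confirms.
    """
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def z_normalize(values):
    """Rescale a sequence to zero mean and unit (population) standard deviation.

    A constant sequence has no shape to compare, so it maps to all zeros
    instead of dividing by zero.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]
```

After z-normalization, a stock trading near 10 and one trading near 1000 can be compared on the shape of their movements alone.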
The resulting system is able to automatically detect clusters of patterns that tend to recur in the movements of particular stocks. Without human intervention it reads a series of stock quotes, analyzes them on multi-core hardware, and outputs (numerically and graphically) a set of clusters showing which sequences of price movements occur more often than chance would suggest. It also groups the actual price sequences into these clusters for further examination.
Future observed price sequences can then easily be compared to the identified historical clusters, in order to categorize a newly observed price movement and to flag irregularities and opportunities. Possible future work could develop better inference models based on the extracted shapelets, which might in turn improve the accuracy of real-time predictive stock price models.
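One simple way to picture this comparison is nearest-cluster assignment with a distance cutoff: a new sequence takes the label of the closest cluster representative, or is flagged as irregular when nothing is close enough. The sketch below is in Python and rests on assumptions not stated in the article: each cluster is summarized by a single representative sequence, Euclidean distance is the measure, and the name `categorize`, the cluster labels, and the `max_distance` parameter are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def categorize(sequence, representatives, max_distance):
    """Assign `sequence` to the nearest cluster representative.

    `representatives` maps a cluster label to one representative sequence
    (an assumed summary of each cluster). Returns (label, distance); the
    label is None when even the nearest cluster is farther than
    `max_distance`, which marks the sequence as irregular.
    """
    if not representatives:
        raise ValueError("at least one cluster representative is required")
    label, dist = min(
        ((name, euclidean(sequence, rep)) for name, rep in representatives.items()),
        key=lambda pair: pair[1],
    )
    if dist > max_distance:
        return None, dist
    return label, dist
```

With representatives `{"spike": [0, 1, 0], "ramp": [0, 1, 2]}` and a cutoff of 0.5, the sequence `[0, 0.9, 0.1]` is assigned to `"spike"`, while `[5, 5, 5]` gets the label `None` because it is far from both. Choosing the cutoff is a modelling decision that would need validation against historical data.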
Based on our observation of the algorithm’s performance on historical data, it was able to detect price movement patterns that an expert human would recognize as important indicators of price movement (Head and Shoulders, Multi-Wedge, etc.). This approach could therefore become a useful computational aid for stock analysts, and may help in the discovery of new price patterns not previously identified.