
Making Big Bucks with Big Data

by Werner Broennimann

The perennial question of how one can easily outperform the stock market may finally have been answered: Just google it.

The recent Scientific Reports paper “Quantifying Trading Behavior in Financial Markets Using Google Trends” [1] reports that the search frequency of certain terms has some predictive power for the short-term movements of the Dow Jones Industrial Average (DJIA). The authors state that “...Using historic data from the period between January 2004 and February 2011, we detect increases in Google search volumes for key-words relating to financial markets before stock market falls. Our results suggest that these warning signs in search volume data could have been exploited in the construction of profitable trading strategies...”.

The paper simulates a long/short strategy on the DJIA. It uses the relative frequency of specific keywords on Google Trends to determine whether the market will go up or down. The best predictor of market performance turned out to be ‘debt’. Using this keyword, their Google Trends Strategy would have returned 326% from 2004 to 2011, compared to 16% for a buy-and-hold strategy (see graph below). To be clear, the paper does not actually claim to have discovered a great trading strategy. The authors’ conclusions are much broader: they concluded “...that these results further illustrate the exciting possibilities offered by new big data sets to advance our understanding of complex collective behavior in our society...”. However, it looks like most attention has been focused on the market-predicting aspect.
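To make the idea of such a long/short rule concrete, here is a minimal sketch in Python. The details are assumptions made for illustration and are not taken from this post: the rule compares last week's search volume with the average of the three weeks before it, goes short for one week if the volume rose, and goes long otherwise. The function names and the toy numbers are hypothetical, and transaction costs are ignored.

```python
from statistics import mean
from typing import Sequence


def trend_strategy_return(prices: Sequence[float],
                          volume: Sequence[float],
                          window: int = 3) -> float:
    """Cumulative return of a weekly long/short rule driven by search volume.

    prices[t] is the index level at the start of week t and volume[t] is the
    search volume recorded for week t. The 3-week window is an assumption.
    """
    if len(prices) != len(volume):
        raise ValueError("prices and volume must have the same length")
    wealth = 1.0
    for t in range(window + 1, len(prices) - 1):
        # Only information available before week t is used for the decision.
        baseline = mean(volume[t - 1 - window:t - 1])
        if volume[t - 1] > baseline:
            # Search volume rose: short the index for one week.
            wealth *= prices[t] / prices[t + 1]
        else:
            # Search volume flat or falling: hold the index for one week.
            wealth *= prices[t + 1] / prices[t]
    return wealth - 1.0


def buy_and_hold_return(prices: Sequence[float], window: int = 3) -> float:
    """Return from holding the index over the same weeks as the strategy."""
    return prices[-1] / prices[window + 1] - 1.0


# Made-up toy series, only to show how the functions are called.
toy_prices = [100, 100, 100, 100, 100, 110, 99]
toy_volume = [1, 1, 1, 1, 2, 1, 1]
strategy = trend_strategy_return(toy_prices, toy_volume)
benchmark = buy_and_hold_return(toy_prices)
```

In the toy series the volume spike in week 4 flips the position to short just before the index falls, so the rule beats buy and hold. With hand-picked numbers that is by construction, which is exactly why the caveats below matter.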

Big data and its application in financial markets have been a hot topic for quite a while now. Harvesting the predictive power of the frequency of certain keywords in newspapers, Google searches, Twitter hashtags and other social media channels is the focus of a number of software packages. Certain quant hedge funds have been quick to adopt this new technology to squeeze that extra bit of performance out of the markets. The ‘Twitter Flash Crash’ of April 23 gives some insight into the impact of social media on the markets.

In this context the findings of the paper are not surprising, and the results do look impressive. However, in my opinion, they should be taken with a grain of salt. Such findings tend to suffer from three main problems:

  1. Data mining
  2. Selection bias
  3. Sustainability

Data mining: Given the vast amount of data in today’s social networks, it is not surprising that some words in the English language happen to correlate well with the weekly moves of the Dow Jones index. The study was actually quite good in this regard; after all, its best search keyword turned out to be ‘debt’. The authors tried to quantify financial relevance by calculating the frequency of each search term in the online edition of the Financial Times from August 2004 to June 2011, normalized by the number of Google hits for each search term. They then complemented these terms with semantically related suggestions from Google Sets. Starting from an economically sound model of potential cause and effect tends to reduce the possibility of finding spurious relationships. However, completely avoiding random results when searching through financial time series is difficult. In practice it is often very tempting to calibrate one’s original idea with feedback from the calculations.
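The data-mining effect is easy to reproduce with pure noise. The following sketch is a hypothetical simulation, not an analysis of the paper's data: it generates random weekly market returns, lets a number of 'keywords' take random long/short positions that carry no information at all, and reports the best total return among them next to the average. The number of keywords, the number of weeks and the volatility are arbitrary assumptions.

```python
import random
from typing import Tuple


def best_of_random_keywords(n_keywords: int = 100,
                            n_weeks: int = 350,
                            weekly_vol: float = 0.02,
                            seed: int = 0) -> Tuple[float, float]:
    """Return (best, average) summed weekly return over random 'keywords'.

    Every keyword trades on coin flips, so none has any predictive power.
    """
    rng = random.Random(seed)
    # Random market returns with zero mean: nothing here can be predicted.
    market = [rng.gauss(0.0, weekly_vol) for _ in range(n_weeks)]
    totals = []
    for _ in range(n_keywords):
        # +1 means long for the week, -1 means short for the week.
        positions = [rng.choice((-1, 1)) for _ in range(n_weeks)]
        totals.append(sum(p * r for p, r in zip(positions, market)))
    return max(totals), sum(totals) / len(totals)


best, average = best_of_random_keywords()
print(f"best keyword: {best:+.1%}, average keyword: {average:+.1%}")
```

The average keyword ends up close to zero, as it should, while the best one typically shows a sizeable positive return purely by chance. The more candidates are searched, the better the winner looks.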

Selection bias: The back-test of the Google Trends investment strategy looks very impressive. Having said that, back-tests tend to have a strong selection bias, and typically they look just great. Had the strategy not performed, it would not be big news; possibly there would not even be a research paper - and definitely not this blog post. We will never know how many similar attempts the authors of the study, or anyone else, made with less impressive results that were never published. From this perspective it is easy to attach an exaggerated value to the published strategy.
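One common way to gauge how much of a back-test's appeal comes from selection is to choose the winner on one part of the history and judge it only on the remainder. The sketch below illustrates that bookkeeping. The function name and the two toy return series are invented for the example and have nothing to do with the paper's data.

```python
from typing import Dict, List, Tuple


def pick_then_verify(weekly_returns: Dict[str, List[float]],
                     split: int) -> Tuple[str, float, float]:
    """Select the best keyword on weeks [0, split) and score it afterwards.

    Returns (winner, in-sample summed return, out-of-sample summed return).
    """
    in_sample = {name: sum(r[:split]) for name, r in weekly_returns.items()}
    # The 'published' strategy is whichever keyword looked best in-sample.
    winner = max(in_sample, key=in_sample.get)
    out_of_sample = sum(weekly_returns[winner][split:])
    return winner, in_sample[winner], out_of_sample


# Made-up weekly strategy returns for two hypothetical keywords.
toy_returns = {
    "keyword_a": [0.10, 0.10, -0.20, -0.20],
    "keyword_b": [0.00, 0.01, 0.02, 0.03],
}
winner, seen, unseen = pick_then_verify(toy_returns, split=2)
```

Here the keyword that looks best on the first two weeks gives back more than it gained on the last two. Real data will not be this tidy, but the mechanics are the same: only the performance on weeks that played no part in the selection says anything about the strategy.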

Sustainability: The obvious problem in a zero-sum game like stock trading is staying ahead of the pack. Once these methods become more widely adopted, the market participants that trade on these patterns will sooner or later make them disappear by virtue of their actions. The implicit value of any such strategy therefore tends to behave like a diminishing asset that can be mined for a while until it is gone. A lot of attractive new trading strategies that are easy to replicate eventually get so crowded that the easy money is long gone by the time they start making the headlines. This may be the case for this particular strategy as well. Whatever one’s view on the efficient market hypothesis, it is typically a good starting point. Unless there are significant barriers to entry, the easy money tends to be gone rather quickly.

There are probably further potential pitfalls with simple strategies that use big data from social networks. They may fall prey to attacks with false data, especially if they run at high frequency. The above-mentioned ‘Twitter Flash Crash’, which was triggered by false news coming from a hacked Associated Press account, is probably just one of more to come. There may be more information pollution with deceptive data on a regular basis. It is not a big mental leap to imagine the social network equivalent of a DDoS attack - with hacked or newly created accounts blasting certain hashtags to wreak havoc with unprepared trading algorithms.

All these points are important when considering whether past performance is in fact a good predictor of future performance. Without making any firm prediction, I suspect that the returns described in the paper will be hard to replicate with exactly the same setup. It would be more interesting to know how the strategy would have performed using out-of-sample data. Those willing to pursue this way of trading in the hope of generating great returns will eventually face two choices: either get busy setting up a serious quant trading operation, or find an alternative way to monetise the big data intelligence from social networks, typically by writing a book about the topic.
