Forecasting financial time series with Twitter
The aim is to assess whether a sentiment index (SI) constructed from Twitter data can improve the forecasting of financial time series. To this end we have collected and processed a large number of Twitter messages, from March 2011 to the present date, related to companies listed on NASDAQ, and have built a real-valued time series that reflects the positive or negative mood of the public. We first tested for non-linearity and causality relationships between this SI and the stock time series. Based on the observed relations, we trained a wide variety of forecasting models (linear regression, neural networks, support vector machines, and others) under many parameter settings, both with and without the SI, thus building a database on which we can test the hypothesis that including the SI improves forecasting. To cope with the hundreds of results obtained under the different experimental settings, we have developed a decision tree-based method for summarising this information and have implemented it with the open source data mining toolkit Weka. We found that the Twitter-based sentiment index is especially helpful when paired with support vector machines.
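The with/without comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a one-lag linear regression, synthetic data in which the sentiment value leads the return by one step, and out-of-sample mean squared error as the score. All names (`solve`, `fit_ols`, `mse`, `compare`) and all numeric settings are hypothetical and chosen only for illustration.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(rows, targets):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)

def mse(rows, targets, beta):
    """Mean squared error of the linear predictions rows . beta."""
    errs = [(y - sum(b * x for b, x in zip(beta, r))) ** 2
            for r, y in zip(rows, targets)]
    return sum(errs) / len(errs)

def compare(returns, sentiment, train_frac=0.8):
    """Out-of-sample MSE of a one-lag model, (without SI, with SI)."""
    y = returns[1:]  # target: next-step return
    base = [[1.0, returns[t]] for t in range(len(returns) - 1)]
    full = [[1.0, returns[t], sentiment[t]] for t in range(len(returns) - 1)]
    cut = int(train_frac * len(y))  # chronological split, no shuffling
    scores = []
    for feats in (base, full):
        beta = fit_ols(feats[:cut], y[:cut])
        scores.append(mse(feats[cut:], y[cut:], beta))
    return tuple(scores)

# Synthetic data (an assumption, not real Twitter or NASDAQ data):
# each return depends on the previous step's sentiment plus noise.
rng = random.Random(0)
sentiment = [rng.gauss(0, 1) for _ in range(500)]
returns = [0.0]
for t in range(1, 500):
    returns.append(0.8 * sentiment[t - 1] + rng.gauss(0, 0.5))

mse_without, mse_with = compare(returns, sentiment)
print("MSE without SI:", mse_without)
print("MSE with SI:   ", mse_with)
```

On data built this way the model that includes the SI should score a lower error by construction; on real series the difference may be small or absent, which is why a comparison across many models and parameter settings is needed.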