PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Forecasting financial time series with Twitter
Ramon Xuriguera, Marta Arias and Argimiro Arratia
In: 4th International Conference of the ERCIM WG on COMPUTING & STATISTICS (ERCIM'11), 17-19 Dec 2011, London, UK.


The aim is to assess whether a sentiment index (SI) constructed from Twitter data can improve the forecasting of financial time series. To this end, we collected and processed a large number of Twitter messages, from March 2011 to the present date, related to companies listed on NASDAQ, and built a real-valued time series that reflects the positive or negative mood of the public. We first tested for non-linearity and causality relationships between this SI and the stock time series. Based on the observed relations, we trained a wide variety of forecasting models (linear regression, neural networks, support vector machines, and others) under many parameter settings, both with and without the SI, thus building a database of results against which to test the hypothesis that including the SI improves forecasting. To cope with the hundreds of results obtained under the different experimental settings, we developed a decision tree-based method for summarising this information and implemented it with the open source data mining toolkit Weka. We found that the Twitter-based sentiment index is especially helpful when paired with support vector machines.
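
The "with and without the SI" comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses synthetic data in which the sentiment series is constructed to lead the returns, covers only linear regression (one of the model families named in the abstract), and all names (`fit_ols`, `mse`, `compare`) are hypothetical.

```python
import random


def fit_ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def mse(X, y, beta):
    """Mean squared error of the linear predictions X @ beta against y."""
    preds = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return sum((p - yi) ** 2 for p, yi in zip(preds, y)) / len(y)


def compare(returns, sentiment, n_train):
    """Out-of-sample MSE of one-step-ahead forecasts without and with the SI."""
    y = returns[1:]
    steps = range(len(returns) - 1)
    # Baseline: intercept plus the previous return.
    X_base = [[1.0, returns[t]] for t in steps]
    # Augmented: the same features plus the previous sentiment value.
    X_si = [[1.0, returns[t], sentiment[t]] for t in steps]
    results = {}
    for name, X in (("without_si", X_base), ("with_si", X_si)):
        beta = fit_ols(X[:n_train], y[:n_train])
        results[name] = mse(X[n_train:], y[n_train:], beta)
    return results


# Synthetic data (an assumption for illustration only): sentiment at time
# t-1 is built to influence the return at time t.
rng = random.Random(0)
sentiment = [rng.gauss(0, 1) for _ in range(400)]
returns = [0.0]
for t in range(1, 400):
    returns.append(0.1 * returns[t - 1] + 0.8 * sentiment[t - 1] + rng.gauss(0, 0.2))

scores = compare(returns, sentiment, n_train=300)
print(scores)
```

Because the synthetic returns depend on lagged sentiment by construction, the model with the SI has the lower test error here. With real data, whether the SI helps is exactly the hypothesis under test.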

EPrint Type: Conference or Workshop Item (Talk)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation; Information Retrieval & Textual Information Access
ID Code: 9232
Deposited By: Marta Arias
Deposited On: 21 February 2012