PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

What's in a Hashtag? Content based Prediction of the Spread of Ideas in Microblogging Communities
Oren Tsur and Ari Rappoport
Proceedings of the Fifth ACM International Conference on Web Search and Data Mining (WSDM 2012).


Current social media research mainly focuses on temporal trends in the information flow and on the topology of the social graph that facilitates the propagation of information. In this paper we study the effect of an idea's content on information propagation. We present an efficient hybrid approach, based on linear regression, for predicting the spread of an idea within a given time frame. We show that combining content features with temporal and topological features minimizes prediction error. Our algorithm is evaluated on Twitter hashtags extracted from a dataset of more than 400 million tweets. We analyze the contributions and limitations of the various feature types in predicting the spread of information, demonstrating that content aspects can serve as strong predictors and thus should not be disregarded. We also study the dependencies between global features, such as graph topology, and content features.
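The abstract describes a linear regression over a hybrid feature vector that combines content, temporal, and topological features. Below is a minimal Python sketch of that general idea, assuming ordinary least squares with an intercept solved via the normal equations. The feature names (`is_acronym`, `early_count`, `early_users`), the helper functions, and the toy data are hypothetical illustrations. They are not the paper's actual features, data, or results.

```python
from typing import List, Sequence

Vector = List[float]


def hybrid_features(content: Sequence[float], temporal: Sequence[float],
                    topological: Sequence[float]) -> Vector:
    """Concatenate the three (hypothetical) feature groups into one vector."""
    return [float(v) for v in (*content, *temporal, *topological)]


def fit_linear_regression(X: Sequence[Sequence[float]], y: Sequence[float]) -> Vector:
    """Ordinary least squares with an intercept, via the normal equations.

    Returns weights [b0, b1, ..., bd]; b0 is the intercept.
    """
    # Prepend a constant 1 to each row so the first weight acts as the intercept.
    rows = [[1.0] + [float(v) for v in x] for x in X]
    d = len(rows[0])
    # Augmented matrix for (A^T A) w = A^T y.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(d)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(d)]
    # Gaussian elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, d):
            factor = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    w = [0.0] * d
    for i in range(d - 1, -1, -1):
        w[i] = (m[i][d] - sum(m[i][j] * w[j] for j in range(i + 1, d))) / m[i][i]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Intercept plus the weighted sum of the features."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def mean_squared_error(w: Sequence[float], X: Sequence[Sequence[float]],
                       y: Sequence[float]) -> float:
    """Average squared difference between predicted and observed spread."""
    return sum((predict(w, x) - t) ** 2 for x, t in zip(X, y)) / len(y)


# Toy data (hypothetical): each tuple is (is_acronym, early_count, early_users).
raw = [(0, 10, 4), (1, 3, 2), (0, 7, 9), (1, 12, 5), (0, 1, 1), (1, 6, 8)]
X = [hybrid_features([c], [t], [u]) for c, t, u in raw]
# Noise-free synthetic target, so the fitted weights can be checked exactly.
y = [2.0 + 5.0 * c + 0.5 * t + 1.5 * u for c, t, u in raw]

w = fit_linear_regression(X, y)
print([round(v, 6) for v in w])
print(mean_squared_error(w, X, y))
```

Because the toy target is an exact linear function of the features, the fit should recover the generating weights up to floating-point error. Comparing the error of a model trained on only one feature group against one trained on the concatenated vector is one simple way to probe each group's contribution. This sketch does not reproduce the paper's evaluation.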

EPrint Type: Article
Project Keyword: UNSPECIFIED
Subjects: Natural Language Processing; Information Retrieval & Textual Information Access
ID Code: 9315
Deposited By: Amir Globerson
Deposited On: 16 March 2012