Modelling and Predicting News Popularity
We explore the problem of learning and predicting the popularity of articles in online news media. We exploit the articles' textual content, together with information on whether or not they became popular, as measured by users clicking on them. First, we show that this problem cannot be solved satisfactorily by modelling it naively as a binary classification problem. Next, we cast it as a Learning to Rank task over pairs of popular and non-popular articles, and show that this approach can reach an accuracy of up to 75%. We then explore how prediction performance can be improved by adding more content-based features, which represent prior topic knowledge available to human users. For both approaches, different flavours of Support Vector Machines are used. Furthermore, we try a different technique, the Lasso, which aims at sparse solutions. This allows us to generate keyword lists of manageable size, containing the terms most likely to attract readers' attention. Finally, we present an in-depth investigation and application example for the outlet "BBC".
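To make the pairwise formulation concrete, the following is a minimal sketch and not the implementation used in this work. It assumes bag-of-words features over whitespace tokens and trains a linear scoring function by subgradient descent on a regularised hinge loss over the difference vectors of (popular, non-popular) pairs, which is a simple stand-in for a ranking SVM. The headlines, function names, and hyperparameters are all hypothetical and chosen for illustration only.

```python
import random
from collections import Counter


def bag_of_words(text):
    """Map a text to term counts (lowercased whitespace tokens)."""
    return Counter(text.lower().split())


def difference(popular, non_popular):
    """Sparse difference vector x_popular - x_non_popular for one pair."""
    diff = Counter(popular)
    diff.subtract(non_popular)
    return {term: value for term, value in diff.items() if value != 0}


def score(weights, features):
    """Linear score w . x for a sparse feature dictionary."""
    return sum(weights.get(term, 0.0) * value for term, value in features.items())


def train_pairwise(pairs, epochs=50, learning_rate=0.1, reg=0.01, seed=0):
    """Subgradient descent on the L2-regularised hinge loss over pair differences.

    Each pair is (popular_text, non_popular_text); the learner tries to make
    the popular article score higher than the non-popular one by a margin of 1.
    """
    rng = random.Random(seed)
    diffs = [difference(bag_of_words(p), bag_of_words(n)) for p, n in pairs]
    weights = {}
    for _ in range(epochs):
        rng.shuffle(diffs)
        for diff in diffs:
            margin = score(weights, diff)
            # Shrink all weights (gradient step on the L2 penalty).
            for term in weights:
                weights[term] *= 1.0 - learning_rate * reg
            # Hinge loss is active only when the pair violates the margin.
            if margin < 1.0:
                for term, value in diff.items():
                    weights[term] = weights.get(term, 0.0) + learning_rate * value
    return weights


def prefers_first(weights, first, second):
    """Return True if the model ranks `first` above `second`."""
    return score(weights, bag_of_words(first)) > score(weights, bag_of_words(second))


# Hypothetical toy data: (popular headline, non-popular headline).
toy_pairs = [
    ("royal wedding scandal", "council budget meeting"),
    ("celebrity scandal photos", "quarterly budget report"),
    ("wedding photos leaked", "committee meeting minutes"),
]

weights = train_pairwise(toy_pairs)
print(prefers_first(weights, "scandal photos", "budget meeting"))
```

Training on differences means the model never has to assign an absolute "popular" label to a single article; it only has to order two articles relative to each other, which is the distinction between the ranking and the binary classification formulations described above.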