PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Refining causality: who copied from whom?
Tristan Snowsill, Nick Fyson, Tijl De Bie and Nello Cristianini
In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (2011) Association for Computing Machinery , New York, USA , pp. 466-474. ISBN 978-1-4503-0813-7

Abstract

Inferring causal networks behind observed data is an active area of research with wide applicability to areas such as epidemiology, microbiology and social science. In particular recent research has focused on identifying how information propagates through the Internet. This research has so far only used temporal features of observations, and while reasonable results have been achieved, there is often further information which can be used. In this paper we show that additional features of the observed data can be used very effectively to improve an existing method. Our particular example is one of inferring an underlying network for how text is reused in the Internet, although the general approach is applicable to other inference methods and information sources. We develop a method to identify how a piece of text evolves as it moves through an underlying network and how substring information can be used to narrow down where in the evolutionary process a particular observation at a node lies. Hence we narrow down the number of ways the node could have acquired the infection. Text reuse is detected using a suffix tree which is also used to identify the substring relations between chunks of reused text. We then use a modification of the NetCover method to infer the underlying network. Experimental results -- on both synthetic and real life data -- show that using more information than just timing leads to greater accuracy in the inferred networks.

PDF - Requires Adobe Acrobat Reader or other PDF viewer.
EPrint Type:Book Section
Project Keyword:Project Keyword UNSPECIFIED
Subjects:Learning/Statistics & Optimisation
Theory & Algorithms
ID Code:8593
Deposited By:Tristan Snowsill
Deposited On:13 February 2012