PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A Stochastic Model for XML Information Retrieval: Searching and Learning with the INEX collection
Benjamin Piwowarski and Patrick Gallinari
Journal of Information Retrieval 2004.

Abstract

Most recent document standards, such as XML, rely on structured representations. Current information retrieval systems, however, were developed for flat document representations and cannot easily be extended to cope with more complex document types; the design of retrieval systems for structured documents is still an open problem. We present a new model for structured document retrieval that computes scores for document parts. The model is based on Bayesian networks whose conditional probabilities are learned from the labelled INEX collection (corpus, queries, and assessments). Training these models is a complex, non-standard machine learning task, and it is the focus of the paper. Preliminary results are presented on a small collection that is a subset of the INEX corpus.

EPrint Type: Article
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Information Retrieval & Textual Information Access
ID Code: 198
Deposited By: Patrick Gallinari
Deposited On: 06 June 2004