PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A probabilistic learning method for XML annotation of documents
Boris Chidlovskii and Jérôme Fuselier
In: IJCAI 2005, Edinburgh, UK (2005).


We consider the problem of semantic annotation of semi-structured documents according to a target XML schema. The task is to annotate a document in a tree-like manner, where the annotation tree is an instance of a tree class defined by a DTD or a W3C XML Schema description. In the probabilistic setting, we treat tree annotation as generalized probabilistic context-free parsing of an observation sequence, where each observation comes with a probability distribution over terminals supplied by a probabilistic classifier associated with the document content. We determine the most probable tree annotation by maximizing the joint probability of selecting a terminal sequence for the observation sequence and the most probable parse for the selected terminal sequence.
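
The joint maximization described above can be illustrated with a small sketch. It is not the authors' algorithm, which the abstract does not spell out; it is a Viterbi-style CYK parser that assumes the schema has already been converted to a probabilistic grammar in Chomsky normal form. The function name `best_annotation`, the rule tables `lexical` and `binary`, and the toy grammar and classifier scores are all hypothetical.

```python
from collections import defaultdict


def best_annotation(observations, lexical, binary, start):
    """Viterbi-style CYK over uncertain terminals (illustrative sketch).

    observations: list of dicts, terminal -> classifier probability
    lexical: dict (nonterminal, terminal) -> rule probability
    binary: dict (parent, left, right) -> rule probability
    Returns (probability, tree) for the best joint choice of terminals
    and parse, or (0.0, None) if no parse rooted at `start` exists.
    """
    n = len(observations)
    # chart[(i, j)][A] = (prob, tree): best derivation of span i..j from A
    chart = defaultdict(dict)

    # Width-1 spans: combine the rule probability with the classifier's
    # probability for each candidate terminal and keep the best choice.
    for i, dist in enumerate(observations):
        for (nt, term), rule_p in lexical.items():
            p = rule_p * dist.get(term, 0.0)
            if p > chart[(i, i + 1)].get(nt, (0.0, None))[0]:
                chart[(i, i + 1)][nt] = (p, (nt, term))

    # Wider spans: standard CYK recursion, keeping the maximum per cell.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (parent, left, right), rule_p in binary.items():
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        lp, lt = chart[(i, k)][left]
                        rp, rt = chart[(k, j)][right]
                        p = rule_p * lp * rp
                        if p > chart[(i, j)].get(parent, (0.0, None))[0]:
                            chart[(i, j)][parent] = (p, (parent, lt, rt))

    return chart[(0, n)].get(start, (0.0, None))


# Hypothetical toy schema: a Doc is a Title followed by one or more paragraphs.
lexical = {("Title", "title"): 1.0, ("Para", "para"): 1.0, ("Body", "para"): 0.6}
binary = {("Doc", "Title", "Body"): 1.0, ("Body", "Para", "Body"): 0.4}

# Hypothetical classifier output for three document fragments.
observations = [
    {"title": 0.7, "para": 0.3},
    {"title": 0.6, "para": 0.4},
    {"title": 0.1, "para": 0.9},
]

prob, tree = best_annotation(observations, lexical, binary, "Doc")
print(prob, tree)
```

In this toy run the classifier alone would label the second fragment as a title, but the grammar admits only one title per document, so the jointly best annotation labels it as a paragraph. This is the effect of maximizing over terminal choices and parses together rather than fixing the classifier's top labels first.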

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Information Retrieval & Textual Information Access
ID Code: 2073
Deposited By: Boris Chidlovskii
Deposited On: 04 February 2006