PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Wrapper induction and maintenance in Documentum ECI
Boris Chidlovskii, Bruno Roustant and Marc Brette
In: ACM SIGMOD 2006, 26-29 June 2006, Chicago, USA.

Abstract

Documentum Enterprise Content Integration (ECI) services is a content integration middleware that provides one-query access to intranet and Internet content resources. The ECI Adapter technology offers any application an interface for extracting data and metadata from unstructured Web pages. It provides a unique framework for wrapper production, automatic recovery and maintenance, developed at Xerox Research Center Europe and based on state-of-the-art algorithms from machine learning and grammatical inference. In this presentation we analyze the performance of ECI adapters deployed in current commercial installations. We draw on daily test reports for all commercially deployed ECI adapters, collected from June 2003 to September 2005. Using these daily reports, we analyze different aspects of the wrapper technology.

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Information Retrieval & Textual Information Access
ID Code: 3041
Deposited By: Boris Chidlovskii
Deposited On: 16 September 2007