PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Learning Schema Mappings
Victor Dalmau
In: EDBT/ICDT 2012 Joint Conference, 26-30 March 2012, Berlin, Germany.

Abstract

A schema mapping is a high-level specification of the relationship between a source schema and a target schema. Recently, a line of research has emerged that aims at deriving schema mappings automatically or semi-automatically with the help of data examples, i.e., pairs consisting of a source instance and a target instance that depict, in some precise sense, the intended behavior of the schema mapping. Several different uses of data examples for deriving, refining, or illustrating a schema mapping have already been proposed and studied.

In this paper, we use the lens of computational learning theory to systematically investigate the problem of algorithmically obtaining a schema mapping from data examples. Our aim is to leverage the rich body of work on learning theory in order to develop a framework for exploring the power and the limitations of the various algorithmic methods for obtaining schema mappings from data examples. We focus on GAV schema mappings, that is, schema mappings specified by GAV (Global-As-View) constraints. GAV constraints are the most basic and the most widely supported language for specifying schema mappings.

We present an efficient algorithm for learning GAV schema mappings using Angluin’s model of exact learning with membership and equivalence queries. This is optimal, since we show that neither membership queries nor equivalence queries suffice, unless the source schema consists of unary relations only. We also obtain results concerning the learnability of schema mappings in the context of Valiant’s well-known PAC (Probably-Approximately-Correct) learning model. Finally, as a byproduct of our work, we show that there is no efficient algorithm for approximating the shortest GAV schema mapping fitting a given set of examples, unless the source schema consists of unary relations only.
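
As a concrete illustration of the objects the abstract refers to, the following minimal Python sketch represents a GAV constraint as a conjunction of source atoms implying a single target atom, and checks whether a (source instance, target instance) pair satisfies a set of such constraints. The encoding, the relation names Edge and Path2, and the helper names body_matches, derived_facts and satisfies are assumptions made for this example and are not taken from the paper. The check shown is ordinary constraint satisfaction; the paper's precise notion of a data example fitting a schema mapping may differ.

```python
# Illustrative encoding (an assumption, not the paper's notation):
#   atom       = (relation_name, tuple_of_variable_names)
#   constraint = (list_of_source_atoms, single_target_atom)
#   instance   = dict mapping relation_name -> set of tuples of constants


def body_matches(body, source):
    """Yield every variable assignment mapping all body atoms to facts of `source`."""

    def extend(atoms, assignment):
        if not atoms:
            yield dict(assignment)
            return
        (rel, variables), rest = atoms[0], atoms[1:]
        for fact in source.get(rel, set()):
            if len(fact) != len(variables):
                continue
            candidate = dict(assignment)
            # Bind each variable, rejecting facts that clash with earlier bindings.
            if all(candidate.setdefault(v, c) == c for v, c in zip(variables, fact)):
                yield from extend(rest, candidate)

    yield from extend(list(body), {})


def derived_facts(mapping, source):
    """Return the target facts that the GAV constraints force, given `source`."""
    facts = set()
    for body, (head_rel, head_vars) in mapping:
        for assignment in body_matches(body, source):
            facts.add((head_rel, tuple(assignment[v] for v in head_vars)))
    return facts


def satisfies(mapping, source, target):
    """True if every fact forced by the constraints is present in `target`."""
    return all(
        args in target.get(rel, set())
        for rel, args in derived_facts(mapping, source)
    )


# Hypothetical example: Edge(x, y) AND Edge(y, z) -> Path2(x, z)
MAPPING = [([("Edge", ("x", "y")), ("Edge", ("y", "z"))], ("Path2", ("x", "z")))]
SOURCE = {"Edge": {(1, 2), (2, 3)}}

if __name__ == "__main__":
    print(derived_facts(MAPPING, SOURCE))
    print(satisfies(MAPPING, SOURCE, {"Path2": {(1, 3)}}))
    print(satisfies(MAPPING, SOURCE, {"Path2": set()}))
```

In this sketch, a target instance may contain facts beyond those forced by the constraints and still satisfy the mapping, since GAV constraints only require facts to be present.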

EPrint Type: Conference or Workshop Item (Paper)
Additional Information: Best paper award
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Theory & Algorithms
ID Code: 8744
Deposited By: Victor Dalmau
Deposited On: 21 February 2012