Learning Schema Mappings
A schema mapping is a high-level specification of the relationship between a source schema and a target schema. Recently, a line of research has emerged that aims at deriving schema mappings automatically or semi-automatically with the help of data examples, i.e., pairs consisting of a source instance and a target instance that depict, in some precise sense, the intended behavior of the schema mapping. Several different uses of data examples for deriving, refining, or illustrating a schema mapping have already been proposed and studied.

In this paper, we use the lens of computational learning theory to systematically investigate the problem of algorithmically obtaining a schema mapping from data examples. Our aim is to leverage the rich body of work on learning theory in order to develop a framework for exploring the power and the limitations of the various algorithmic methods for obtaining schema mappings from data examples. We focus on GAV schema mappings, that is, schema mappings specified by GAV (Global-As-View) constraints. GAV constraints are the most basic and the most widely supported language for specifying schema mappings.

We present an efficient algorithm for learning GAV schema mappings using Angluin's model of exact learning with membership and equivalence queries. This is optimal, since we show that neither membership queries alone nor equivalence queries alone suffice, unless the source schema consists of unary relations only. We also obtain results concerning the learnability of schema mappings in the context of Valiant's well-known PAC (Probably-Approximately-Correct) learning model. Finally, as a byproduct of our work, we show that there is no efficient algorithm for approximating the shortest GAV schema mapping fitting a given set of examples, unless the source schema consists of unary relations only.
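To make the notions of a GAV constraint and of a data example fitting a mapping concrete, here is a minimal Python sketch. It treats a GAV constraint as a conjunction of source atoms implying a single target atom whose variables all occur in the body (no existential variables). It assumes, for illustration only, that a data example (I, J) fits a mapping when J is exactly the set of target facts obtained by applying the constraints to I. The abstract says only that examples depict the intended behavior "in some precise sense", so this criterion and every name below (`apply_gav`, `fits`, the relations `E` and `P`) are hypothetical.

```python
# Hypothetical representation: an instance maps relation names to sets of tuples;
# an atom is (relation_name, tuple_of_variable_names).
Instance = dict[str, set[tuple]]
Atom = tuple[str, tuple[str, ...]]
Constraint = tuple[list[Atom], Atom]  # (body over source schema, head over target schema)


def body_matches(body: list[Atom], source: Instance) -> list[dict]:
    """Return every variable assignment that maps all body atoms into the source instance."""
    assignments: list[dict] = [{}]
    for rel, variables in body:
        extended = []
        for assignment in assignments:
            for tup in source.get(rel, set()):
                if len(tup) != len(variables):
                    continue
                candidate = dict(assignment)
                # setdefault binds a fresh variable; a clash means the tuple is inconsistent.
                if all(candidate.setdefault(v, c) == c for v, c in zip(variables, tup)):
                    extended.append(candidate)
        assignments = extended
    return assignments


def apply_gav(mapping: list[Constraint], source: Instance) -> Instance:
    """Compute the target facts forced by the GAV constraints on the source instance."""
    target: Instance = {}
    for body, (head_rel, head_vars) in mapping:
        body_vars = {v for _, vs in body for v in vs}
        if not set(head_vars) <= body_vars:
            raise ValueError("GAV heads may only use variables that occur in the body")
        for assignment in body_matches(body, source):
            target.setdefault(head_rel, set()).add(tuple(assignment[v] for v in head_vars))
    return target


def fits(mapping: list[Constraint], source: Instance, target: Instance) -> bool:
    """Assumed fitting criterion: target is exactly the result of applying the mapping to source."""
    nonempty = {rel: set(tups) for rel, tups in target.items() if tups}
    return apply_gav(mapping, source) == nonempty


# Hypothetical example mapping with one GAV constraint: E(x, y) ∧ E(y, z) → P(x, z)
MAPPING: list[Constraint] = [([("E", ("x", "y")), ("E", ("y", "z"))], ("P", ("x", "z")))]
I: Instance = {"E": {(1, 2), (2, 3)}}

print(apply_gav(MAPPING, I))                      # {'P': {(1, 3)}}
print(fits(MAPPING, I, {"P": {(1, 3)}}))          # True
print(fits(MAPPING, I, {"P": {(1, 3), (2, 3)}}))  # False
```

The body is evaluated by a naive join over variable assignments, which can blow up with the size of the body; the sketch is meant only to illustrate the semantics of GAV constraints, not the learning algorithms the paper studies.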