Efficient semantic inference over language expressions
According to the traditional formal semantics approach, inference is conducted at the logical level: texts are first translated into some logical form, and new propositions are then inferred from the interpreted texts by a logical theorem prover. However, practical text understanding systems usually employ shallower lexical and lexical-syntactic representations, sometimes augmented with partial semantic annotations. While practical semantic inference is mostly performed directly over linguistic rather than logical representations, such practices are typically partial and quite ad hoc, and lack a clear formalism specifying how inference knowledge should be represented and applied. Our work proposes a step towards filling this gap by defining a principled semantic inference mechanism over parse-based representations. We define a proof system that operates over syntactic parse trees. New trees are derived using entailment rules, which provide a principled and uniform mechanism for incorporating a wide variety of inference knowledge types. Notably, this approach allows easy incorporation of rules learned by unsupervised methods, which seems essential for scaling inference systems. Interpretation into stipulated semantic representations, which is often difficult and inherently requires supervised learning, is thus circumvented. Our overall research goal is to explore how far we can get with such an inference approach, and to identify the scope in which semantic interpretation may not be needed.

Within the textual entailment setting, a system is required to recognize whether a hypothesized statement h can be inferred from an asserted text t. Given t and h, represented as parse trees, our inference system tries to generate h from t by applying entailment rules that aim to transform t into h through a sequence of intermediate parse trees (consequents), similar to a proof process in logic.
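As a toy illustration of deriving a consequent by applying an entailment rule to a parse tree (hypothetical code, not the thesis implementation; the tree representation and rule format are deliberately simplified to a single lexical rewrite):

```python
# Sketch: a lexical entailment rule (e.g. "purchase -> buy") applied to a
# simplified dependency-style parse tree, producing a new consequent tree.
# Node, apply_rule, and the example sentence are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str
    children: list = field(default_factory=list)

def apply_rule(tree: Node, lhs_word: str, rhs_word: str) -> Node:
    """Return a new consequent tree in which every node matching the
    rule's left-hand side is rewritten to its right-hand side; the
    original tree is left unchanged."""
    new_word = rhs_word if tree.word == lhs_word else tree.word
    return Node(new_word,
                [apply_rule(c, lhs_word, rhs_word) for c in tree.children])

# t: "IBM purchased Cognos"; entailment rule: purchased -> bought
t = Node("purchased", [Node("IBM"), Node("Cognos")])
consequent = apply_rule(t, "purchased", "bought")
print(consequent.word)  # bought
```

In the actual framework, rules match and rewrite whole subtree templates with variables rather than single words, but the derivation step (tree in, consequent tree out) has the same shape.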
Finding such a "proof" is a search problem characterized by a high branching factor. To cope with this huge search space, we developed a novel data structure, termed a compact forest, which allows efficient generation and representation of inferred trees. We show how all inference operations defined in our framework can be implemented over compact forests. The resulting structure is an equivalent, compact representation of all the inferred consequents (trees), from which each individual consequent can be recovered. Since common subtrees are shared among different consequents, entailment rules need be applied only once to these subtrees, improving search efficiency. Overall, compact forests reduce time and space from exponential to linear, compared to explicit generation of each consequent.

Another crucial source of entailments is co-reference relations. Our inference framework allows either substitution of co-referring expressions or merging of information from co-referring predicates. It provides uniform treatment of co-reference links, whether they were recognized externally or generated as part of the proof process.

Finally, while our research focuses on knowledge-based inference, in practice the available knowledge is incomplete, and it is often impossible to complete a proof based on knowledge alone. We briefly describe our machine-learning approach for quantifying the remaining knowledge gaps and determining entailment.
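The subtree sharing behind the compact forest can be sketched with hash-consing: identical subtrees are interned once, so different consequents point at the same node and a rule applied to that node is applied only once. This is a minimal toy sketch of the sharing idea (the `Forest` class and node encoding are illustrative assumptions; the real compact forest additionally supports rule application and consequent recovery):

```python
# Sketch: interning identical subtrees so that multiple consequents
# share one physical node, the core space-saving idea of a compact
# forest. Nodes are (word, children) tuples; Forest is a toy assumption.

class Forest:
    def __init__(self):
        self._interned = {}  # (word, child identities) -> unique shared node

    def node(self, word, children=()):
        """Return the unique node for this (word, children) combination,
        creating it on first request."""
        key = (word, tuple(id(c) for c in children))
        if key not in self._interned:
            self._interned[key] = (word, tuple(children))
        return self._interned[key]

f = Forest()
# Two consequents of "IBM purchased Cognos": the original tree and the
# one derived by the rule purchased -> bought.
t1 = f.node("purchased", (f.node("IBM"), f.node("Cognos")))
t2 = f.node("bought",    (f.node("IBM"), f.node("Cognos")))
# Both consequents reference the very same argument nodes:
print(t1[1][0] is t2[1][0])  # True  (shared "IBM" node)
print(t1[1][1] is t2[1][1])  # True  (shared "Cognos" node)
```

With n independent rewrite sites, explicitly materializing every consequent costs exponential space, while the shared representation grows only linearly, matching the exponential-to-linear reduction stated above.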