XRCE’s Participation in the CLEF 2007 Domain-Specific Track
Our participation in CLEF 2007 (Domain-Specific Track) was motivated this year by the assessment of several query translation and expansion strategies that we recently designed and developed. One line of research and development was to use our own Statistical Machine Translation system (called Matrax) and its intermediate outputs to perform query translation and disambiguation. Our idea was to benefit from Matrax’s flexibility to output more than one plausible translation, and to train its Language Model component on the CLEF 2007 target corpora. The second line of research consisted of designing algorithms to adapt an initial, general probabilistic dictionary to a particular (query, target corpus) pair; this constitutes an extreme viewpoint on the “bilingual lexicon extraction and adaptation” topic that we have been investigating for more than six years. For this strategy, our main contributions lie in a pseudo-feedback algorithm and an EM-like optimisation algorithm that realise this adaptation. A third axis was to evaluate the potential impact of “Lexical Entailment” models in a cross-lingual framework, as they had previously been used only in monolingual settings. Experimental results on the CLEF 2007 corpora (domain-specific track) show that the dictionary adaptation mechanisms are quite effective in the CLIR framework, in certain cases exceeding the performance of much more complex Machine Translation systems and even that of the monolingual baseline. Lexical Entailment models, used as query expansion mechanisms, also turned out to be beneficial in most cases.
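To make the second line of research more concrete, the following is a minimal Python sketch of an EM-style adaptation of translation probabilities p(t|s) to one (query, target corpus) pair, where the evidence comes from term counts in pseudo-feedback documents. This is an illustration under our own assumptions, not the authors’ actual algorithm: the function name, the data shapes (plain dictionaries of counts and probabilities), and the fixed uniform mixture weight over source terms are all hypothetical simplifications.

```python
from collections import defaultdict

def adapt_dictionary(query_terms, feedback_counts, trans_probs, iters=10):
    """EM-like re-estimation of p(t|s) for the source terms of one query,
    fitting a mixture of per-source-term translation distributions to the
    term counts observed in pseudo-feedback documents.

    query_terms:     list of source-language query terms
    feedback_counts: {target_term: count} from the top retrieved documents
    trans_probs:     {source_term: {target_term: p(t|s)}} general dictionary
    """
    # Hypothetical simplification: the mixture weight over the query's
    # source terms is kept uniform and never re-estimated.
    prior = {s: 1.0 / len(query_terms) for s in query_terms}
    probs = {s: dict(trans_probs.get(s, {})) for s in query_terms}
    for _ in range(iters):
        expected = {s: defaultdict(float) for s in query_terms}
        for t, count in feedback_counts.items():
            # E-step: responsibility of each source term for target term t
            weights = {s: prior[s] * probs[s].get(t, 0.0) for s in query_terms}
            z = sum(weights.values())
            if z <= 0.0:
                continue  # t is not a known translation of any query term
            for s in query_terms:
                expected[s][t] += count * weights[s] / z
        # M-step: renormalise expected counts into updated p(t|s)
        for s in query_terms:
            total = sum(expected[s].values())
            if total > 0.0:
                probs[s] = {t: v / total for t, v in expected[s].items()}
    return probs

# Toy example: "bank" is ambiguous between "banque" and "rive"; the
# pseudo-feedback counts favour the "river bank" reading.
adapted = adapt_dictionary(
    ["river", "bank"],
    {"fleuve": 3, "rive": 4, "banque": 1},
    {"river": {"fleuve": 1.0}, "bank": {"banque": 0.5, "rive": 0.5}},
)
# p("rive" | "bank") moves from 0.5 to 0.8, reflecting the corpus evidence
```

The point of the sketch is the adaptation direction described in the abstract: starting from a general-purpose dictionary, the translation mass of each query term is redistributed toward the translations actually supported by the target corpus for this query.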