On the use of different loss functions in statistical pattern recognition applied to machine translation
In pattern recognition, an elegant and powerful way to deal with classification problems is based on the minimisation of the classification risk. The risk function is defined in terms of loss functions that measure the penalty for wrong decisions. However, in practice a trivial loss function (the so-called 0–1 loss function) is usually adopted, which does not make the most of this framework. This work focuses on the study of different loss functions, and especially on those loss functions that do not depend on the class proposed by the system. Loss functions of this kind have allowed us to theoretically explain heuristics that are successfully used in very complex pattern recognition problems, such as (statistical) machine translation. A comparative experimental study has also been carried out to compare different proposals of loss functions in the practical scenario of machine translation.
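The risk-minimisation framework summarised above can be sketched in a few lines of code. The following is a minimal illustration, not taken from the paper: it assumes a hypothetical 3-class problem with made-up posterior probabilities, and shows that under the 0–1 loss the minimum-risk decision reduces to choosing the maximum-posterior class, while a non-trivial loss matrix can shift the decision away from it.

```python
import numpy as np

# Hypothetical posteriors p(c | x) for a 3-class problem (illustrative values).
posterior = np.array([0.5, 0.3, 0.2])

def bayes_decision(posterior, loss):
    """Return the class c minimising the expected risk
    R(c | x) = sum over c' of loss[c', c] * p(c' | x)."""
    risk = posterior @ loss  # risk[c] = sum_c' p(c' | x) * loss[c', c]
    return int(np.argmin(risk))

# The trivial 0-1 loss: penalty 1 for any wrong decision, 0 otherwise.
zero_one = 1 - np.eye(3)
# With the 0-1 loss, minimising the risk is equivalent to maximising the posterior.
assert bayes_decision(posterior, zero_one) == int(np.argmax(posterior))

# A non-trivial loss matrix (rows: true class, columns: proposed class):
# here, proposing class 0 when the true class is 1 or 2 is heavily penalised.
loss = np.array([[0., 1., 1.],
                 [4., 0., 1.],
                 [4., 1., 0.]])
# The minimum-risk decision now differs from the maximum-posterior class.
print(bayes_decision(posterior, zero_one), bayes_decision(posterior, loss))
```

Under the 0–1 loss the expected risk of deciding class c is simply 1 − p(c | x), which is why the rule collapses to maximum-posterior classification; richer loss functions, such as those studied in this work, break this equivalence.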