Selecting Hidden Markov Model State Number with Cross-Validated Likelihood
The problem of estimating the number of hidden states in a hidden Markov chain model is considered, with emphasis on cross-validated likelihood criteria. Using cross-validation to assess the number of hidden states makes it possible to circumvent the well-documented technical difficulties of the order identification problem in mixture models. Moreover, from a predictive perspective, it does not require that the sampling distribution belong to one of the competing models. However, computing cross-validated likelihood for hidden Markov chains is difficult because the data are not independent. Two approaches are proposed to compute cross-validated likelihood for a hidden Markov chain. The first uses a deterministic half-sampling procedure. The second adapts the EM algorithm for hidden Markov chains to take into account the randomly missing values induced by cross-validation. Numerical experiments on both simulated and real data sets compare different versions of the cross-validated likelihood criterion with penalised likelihood criteria, including BIC and a penalised marginal likelihood criterion. These experiments highlight the promising behaviour of the deterministic half-sampling criterion.
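To illustrate why randomly missing values are the natural way to handle held-out observations in a dependent sequence, here is a minimal sketch. It is not the authors' implementation and assumes discrete emissions with parameters already estimated. In the scaled forward algorithm, a held-out observation is treated as missing by setting its emission probability to 1 in every state, which marginalises it out. The held-out log-likelihood is then log p(y_test | y_train) = log p(y_train, y_test) - log p(y_train). The function names `forward_loglik` and `heldout_loglik`, the alternating test indices, and the example parameters are all hypothetical. Parameter estimation (the adapted EM step) is not shown.

```python
import math
from typing import List, Optional, Sequence


def forward_loglik(
    pi: Sequence[float],
    A: Sequence[Sequence[float]],
    B: Sequence[Sequence[float]],
    obs: Sequence[Optional[int]],
) -> float:
    """Log-likelihood of a discrete-emission HMM via the scaled forward algorithm.

    pi: initial state distribution, A[i][j]: transition probability i -> j,
    B[j][y]: probability of emitting symbol y in state j.
    Entries of obs equal to None are treated as missing: their emission
    probability is set to 1 in every state, which marginalises them out.
    """
    K = len(pi)
    loglik = 0.0
    alpha: List[float] = []
    for t, y in enumerate(obs):
        # One-step state prediction p(S_t = j | observed data before t).
        if t == 0:
            pred = list(pi)
        else:
            pred = [sum(alpha[i] * A[i][j] for i in range(K)) for j in range(K)]
        emit = [1.0] * K if y is None else [B[j][y] for j in range(K)]
        unnorm = [pred[j] * emit[j] for j in range(K)]
        c = sum(unnorm)  # p(y_t | observed data before t); equals 1 if y_t is missing
        loglik += math.log(c)
        alpha = [u / c for u in unnorm]  # filtered distribution of S_t
    return loglik


def heldout_loglik(pi, A, B, obs, test_idx) -> float:
    """log p(y_test | y_train) = log p(y_train, y_test) - log p(y_train)."""
    test = set(test_idx)
    masked = [None if t in test else y for t, y in enumerate(obs)]
    return forward_loglik(pi, A, B, obs) - forward_loglik(pi, A, B, masked)


# Usage with hypothetical two-state parameters and a short symbol sequence.
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 0, 1, 2, 2, 1, 0, 2]
test_idx = range(1, len(obs), 2)  # e.g. hold out every other observation
print(heldout_loglik(pi, A, B, obs, test_idx))
```

In an actual cross-validation loop, the parameters would first be re-estimated from the masked sequence for each candidate number of states, for instance by an EM algorithm that handles missing values. The held-out scores would then be compared across candidates.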