PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A Bernstein-von Mises Theorem for discrete probability distributions
Stéphane Boucheron and Elisabeth Gassiat
Electronic Journal of Statistics 2009.


We investigate the asymptotic normality of the posterior distribution in the discrete setting, when the model dimension increases with the sample size. We consider a probability mass function $\theta_0$ on $\mathbb{N} \setminus \{0\}$ and a sequence of truncation levels $(k_n)_n$ satisfying $k_n^3 \leq n \inf_{i\leq k_n}\theta_0(i)$. Let $\hat{\theta}_n$ denote the maximum likelihood estimate of $(\theta_0(i))_{i\leq k_n}$ and let $\Delta_n(\theta_0)$ denote the $k_n$-dimensional vector whose $i$-th coordinate is $\sqrt{n} \left(\hat{\theta}_n(i)-\theta_0(i) \right)$ for $1\leq i\leq k_n$. We show that, under mild conditions on $\theta_0$ and on the sequence of prior probabilities on the $k_n$-dimensional simplices, the variation distance between the posterior distribution, recentered around $\hat{\theta}_n$ and rescaled by $\sqrt{n}$, and the $k_n$-dimensional Gaussian distribution $\mathcal{N}(\Delta_n(\theta_0), I^{-1}(\theta_0))$ converges in probability to $0$. This theorem can be used to prove the asymptotic normality of Bayesian estimators of Shannon and Rényi entropies. The proofs are based on concentration inequalities for centered and non-centered chi-square (Pearson) statistics. The latter make it possible to establish posterior concentration rates with respect to the Fisher distance rather than with respect to the Hellinger distance, which is the commonplace choice in non-parametric Bayesian statistics.
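
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper. It assumes that the maximum likelihood estimate of the first $k_n$ cell probabilities is the vector of empirical frequencies (as in the multinomial model) and uses the textbook form of the Pearson statistic. The function names and the example distribution $\theta_0(i) \propto 2^{-i}$ are hypothetical choices made for illustration only.

```python
import math
import random


def truncation_ok(theta0, k, n):
    """Check the condition k^3 <= n * min_{i <= k} theta0(i) from the abstract."""
    return k ** 3 <= n * min(theta0[:k])


def mle_truncated(sample, k):
    """Empirical frequencies of the symbols 1..k.

    Assumption: this is the MLE of (theta0(i))_{i <= k}, as in the multinomial model.
    """
    n = len(sample)
    counts = [0] * k
    for x in sample:
        if 1 <= x <= k:
            counts[x - 1] += 1
    return [c / n for c in counts]


def delta_n(theta_hat, theta0, n):
    """Coordinates sqrt(n) * (theta_hat(i) - theta0(i)) for 1 <= i <= k."""
    return [math.sqrt(n) * (h - t) for h, t in zip(theta_hat, theta0)]


def pearson_chi2(theta_hat, theta0, n):
    """Textbook Pearson statistic n * sum_i (theta_hat(i) - theta0(i))^2 / theta0(i)."""
    return n * sum((h - t) ** 2 / t for h, t in zip(theta_hat, theta0))


# Hypothetical example: theta0(i) proportional to 2^{-i} on {1, ..., 30}.
support = list(range(1, 31))
weights = [2.0 ** -i for i in support]
total = sum(weights)
theta0 = [w / total for w in weights]

n, k = 10_000, 5
rng = random.Random(0)  # fixed seed so the run is reproducible
sample = rng.choices(support, weights=theta0, k=n)

theta_hat = mle_truncated(sample, k)
print("truncation condition holds:", truncation_ok(theta0, k, n))
print("Delta_n:", delta_n(theta_hat, theta0, n))
print("Pearson statistic:", pearson_chi2(theta_hat, theta0, n))
```

With these hypothetical values, `truncation_ok` compares $k_n^3 = 125$ with $n \inf_{i \leq k_n} \theta_0(i) \approx 312$, so the truncation level $k_n = 5$ is admissible for $n = 10\,000$, whereas $k_n = 10$ would not be.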

EPrint Type: Article
Subjects: Computational, Information-Theoretic Learning with Statistics
ID Code: 4872
Deposited By: Elisabeth Gassiat
Deposited On: 24 March 2009