A Bernstein-von Mises Theorem for discrete probability distributions
Stéphane Boucheron and Elisabeth Gassiat
Electronic Journal of Statistics 2009.

## Abstract

We investigate the asymptotic normality of the posterior distribution in the discrete setting, when the model dimension grows with the sample size. We consider a probability mass function $\theta_0$ on $\mathbb{N} \setminus \{0\}$ and a sequence of truncation levels $(k_n)_n$ satisfying $k_n^3 \leq n \inf_{i\leq k_n}\theta_0(i)$. Let $\hat{\theta}_n$ denote the maximum likelihood estimate of $(\theta_0(i))_{i\leq k_n}$ and let $\Delta_n(\theta_0)$ denote the $k_n$-dimensional vector whose $i$-th coordinate is $\sqrt{n}\left(\hat{\theta}_n(i)-\theta_0(i)\right)$ for $1\leq i\leq k_n$. We show that, under mild conditions on $\theta_0$ and on the sequence of prior probabilities on the $k_n$-dimensional simplices, the variation distance between the posterior distribution, recentered around $\hat{\theta}_n$ and rescaled by $\sqrt{n}$, and the $k_n$-dimensional Gaussian distribution $\mathcal{N}(\Delta_n(\theta_0), I^{-1}(\theta_0))$ converges in probability to $0$. This theorem can be used to prove the asymptotic normality of Bayesian estimators of the Shannon and Rényi entropies. The proofs are based on concentration inequalities for centered and non-centered chi-square (Pearson) statistics. The latter allow us to establish posterior concentration rates with respect to the Fisher distance rather than the Hellinger distance, as is commonplace in non-parametric Bayesian statistics.
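The Bernstein-von Mises phenomenon stated above can be checked numerically. The sketch below (not from the paper; the truncation level `k`, the geometric-like pmf, the uniform Dirichlet prior, and all variable names are illustrative assumptions) simulates a multinomial sample on a fixed finite support, forms the conjugate Dirichlet posterior, recenters around the MLE and rescales by $\sqrt{n}$, and compares the per-coordinate posterior spread with the Gaussian prediction, whose $i$-th diagonal entry of $I^{-1}(\theta_0)$ is $\theta_0(i)(1-\theta_0(i))$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: a fixed truncation level k and a geometric-like
# pmf theta0 on {1, ..., k} (assumptions, not the paper's choices).
k = 5
theta0 = np.array([2.0 ** -(i + 1) for i in range(k)])
theta0 /= theta0.sum()

n = 100_000                       # sample size
counts = rng.multinomial(n, theta0)
theta_hat = counts / n            # maximum likelihood estimate

# Under a uniform Dirichlet prior, the posterior is Dirichlet(1 + counts).
post = rng.dirichlet(1 + counts, size=20_000)

# Recenter around the MLE and rescale by sqrt(n), as in the theorem.
z = np.sqrt(n) * (post - theta_hat)

# Gaussian prediction for the i-th marginal standard deviation:
# sqrt(theta0(i) * (1 - theta0(i))), estimated here via theta_hat.
empirical_sd = z.std(axis=0)
predicted_sd = np.sqrt(theta_hat * (1 - theta_hat))
print(np.max(np.abs(empirical_sd - predicted_sd) / predicted_sd))
```

With $n$ this large, the printed maximum relative discrepancy between the rescaled posterior spread and the Gaussian prediction should be on the order of a few percent, consistent with the convergence in variation distance claimed by the theorem.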