A Reversible Jump Method for Bayesian Phylogenetic Inference with a Nonhomogeneous Substitution Model
Nonhomogeneous substitution models have been introduced for phylogenetic inference when the substitution process is nonstationary, for example, when sequence composition differs between lineages. Existing models can have many parameters, and it is then difficult and computationally expensive to learn the parameters and to select the optimal model complexity. We extend an existing nonhomogeneous substitution model by introducing a reversible jump Markov chain Monte Carlo method for efficient Bayesian inference of the model order along with other phylogenetic parameters of interest. We also introduce a new hierarchical prior which leads to more reasonable results when only a small number of lineages share a particular substitution process. The method is implemented in the PHASE software, which includes specialized substitution models for RNA genes with conserved secondary structure. We apply an RNA-specific nonhomogeneous model to a structure-based alignment of rRNA sequences spanning the entire tree of life. A previous study of the same genes from a similar set of species found robust evidence for a mesophilic last universal common ancestor (LUCA) by inference of the G + C composition at the root of the tree. In the present study, we find that the helical GC composition at the root is strongly dependent on the root position. With a bacterial rooting, we find that there is no longer strong support for either a mesophile or a thermophile LUCA, although a hyperthermophile LUCA remains unlikely. We discuss reasons why results using only RNA helices may differ from results using all aligned sites when applying nonhomogeneous models to RNA genes.