Leveraging Sequence Classification by Taxonomy-based Multitask Learning
In a previous publication at last year’s NIPS, we compared a number of recent domain adaptation algorithms in a scenario that assumes one source domain with an abundance of data and one target domain with only a small amount of training data. As the prediction problem, we considered the supervised classification task of mRNA splice site recognition, which is representative of many other prediction tasks in sequence biology. We observed that considerable improvements over baseline methods are possible, which encouraged us to pursue this direction of research further. Hence, in our current research, we move from one source and one target organism to a scenario in which we consider transfer learning among a larger number of organisms, whose relationships to each other are given by a hierarchical structure, or phylogeny. We explore several extensions of domain adaptation algorithms that allow the exploitation of hierarchical task relations for transfer learning. These algorithms were designed with large-scale applications in mind, allowing for a large number of training examples. The performance of the presented methods is demonstrated in an experiment in which we combine splice-site data from 15 eukaryotic genomes. In general, we argue that transfer learning is well suited for applications in computational biology, as different organisms can be regarded as different domains, which enables us to cast a wide range of prediction problems into the transfer learning framework.
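To make the idea of exploiting a hierarchy concrete, here is a minimal sketch of one possible top-down scheme. It is an illustration under stated assumptions, not the method evaluated in this work. Each node of a taxonomy is fitted on all data beneath it while being pulled toward its parent's parameters. For brevity the model is a toy one-dimensional least-squares predictor with a closed-form solution, rather than a sequence classifier. The names `Node`, `collect`, `fit_toward`, `train_top_down`, and the trade-off parameter `lam` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A taxon in the hierarchy; leaves are organisms with their own data."""
    name: str
    data: list = field(default_factory=list)      # (x, y) pairs (assumed toy data)
    children: list = field(default_factory=list)
    w: float = 0.0                                # fitted parameter for this node


def collect(node):
    """Return all examples stored at or below this node."""
    examples = list(node.data)
    for child in node.children:
        examples.extend(collect(child))
    return examples


def fit_toward(examples, w_parent, lam):
    """Closed-form minimiser of
         lam/2 * (w - w_parent)^2 + 1/2 * sum((w*x - y)^2),
    i.e. least squares regularised toward the parent's parameter."""
    sxx = sum(x * x for x, _ in examples)
    sxy = sum(x * y for x, y in examples)
    return (lam * w_parent + sxy) / (lam + sxx)   # lam > 0 keeps this well defined


def train_top_down(node, w_parent=0.0, lam=1.0):
    """Fit the root first, then pass each fitted parameter down as a prior."""
    node.w = fit_toward(collect(node), w_parent, lam)
    for child in node.children:
        train_top_down(child, node.w, lam)


# Toy usage: a data-rich organism A and a data-poor organism B under one root.
a = Node("A", data=[(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
b = Node("B", data=[(1.0, 3.0)])
root = Node("root", children=[a, b])
train_top_down(root, lam=1.0)
for node in (root, a, b):
    print(node.name, round(node.w, 4))
```

In this toy setting the estimate for the data-poor leaf B ends up between the pooled root estimate and the value its single example alone would suggest, which is the kind of information sharing a hierarchy is meant to provide. A larger `lam` ties each node more tightly to its parent.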