Hierarchical multilabel classification trees for gene function prediction (Extended abstract)
Hendrik Blockeel, Leander Schietgat, Jan Struyf, Amanda Clare and Saso Dzeroski
In: Probabilistic Modeling and Machine Learning in Structural and Systems Biology, 17-18 June 2006, Tuusula, Finland.
Prediction of gene function is a hierarchical multilabel classification (HMC) task: a single instance can be labelled with multiple classes rather than just one (i.e., a gene can have multiple functions), and these classes are organized in a hierarchy. Many machine learning methods focus on learning predictive models with a single target variable. With such methods, one can learn a separate model for each class and combine the predictions afterwards. An alternative is to extend these methods to the HMC setting, so that a single model predicts all classes at once. In this paper we explore this alternative for classification trees. A comparison of HMC trees with standard classification trees shows that the former have clear advantages with respect to accuracy, efficiency, and interpretability. It seems worth investigating to what extent these results carry over to other machine learning methods.
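
To make the HMC label structure concrete, the following is a minimal sketch of how one gene's set of functions might be encoded as a single target vector that respects the class hierarchy. It is an illustration only, not the method used in the paper: the class identifiers, the `PARENT` map, and the helper names `with_ancestors` and `encode` are all hypothetical.

```python
# Hypothetical class hierarchy: child -> parent (None marks a top-level class).
# The identifiers are made up for illustration; they are not from the paper.
PARENT = {
    "1": None,
    "1/1": "1",
    "1/2": "1",
    "2": None,
    "2/1": "2",
}
CLASSES = sorted(PARENT)  # fixed class order used for the label vector


def with_ancestors(labels):
    """Close a label set under the hierarchy: a class implies all its ancestors."""
    closed = set()
    for label in labels:
        # Walk up towards the root; stop early if this part is already covered.
        while label is not None and label not in closed:
            closed.add(label)
            label = PARENT[label]
    return closed


def encode(labels):
    """Encode a label set as a 0/1 vector over CLASSES."""
    closed = with_ancestors(labels)
    return [1 if c in closed else 0 for c in CLASSES]


# A gene annotated with two functions from different branches (multilabel).
gene_labels = {"1/2", "2/1"}
vector = encode(gene_labels)
print(dict(zip(CLASSES, vector)))
```

Under this encoding, the single-target approach would train one binary model per position of the vector, whereas an HMC model would take the whole vector as its target.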