Application of Lexical Topic Models to Protein Interaction Sentence Prediction
Tamara Polajnar and Mark Girolami
In: NIPS 2009 Workshop on Applications for Topic Models: Text and Beyond, 11 Dec 2009, Whistler, BC, Canada.
Topic models can improve the classification of protein-protein interactions (PPIs) by condensing lexical knowledge available in unannotated biomedical text into a semantically informed kernel smoothing matrix. Detecting sentences that describe PPIs is difficult because annotated data is scarce. Furthermore, each sentence generally contains only a small percentage of the features, which leads to sparse training vectors. By exploiting the contextual similarity of words, we are able to improve classification performance. This contextual data is gathered from a large unannotated corpus and incorporated through a semantic kernel. We use the Hyperspace Analogue to Language (HAL) and Bound Encoding of the Aggregate Language Environment (BEAGLE) semantic models to create the kernels. The modularity of the method lends itself to further exploration along several avenues, including experimentation with other word and topic models.
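The idea of a semantic kernel built from unannotated text can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the kernel's exact form, so the sketch assumes the common smoothing form k(x, y) = x^T S S^T y, where S holds one HAL-style context vector per vocabulary word. The window size, the symmetric (direction-free) co-occurrence weighting, the toy corpus, and all function names are assumptions chosen for illustration, and BEAGLE is not shown.

```python
from math import sqrt


def hal_vectors(corpus, vocab, window=2):
    """Build simplified HAL-style vectors from weighted co-occurrence counts.

    Assumption: counts are symmetric, i.e. left and right context are merged,
    which is a simplification of the direction-sensitive HAL matrix.
    """
    index = {w: i for i, w in enumerate(vocab)}
    vectors = [[0.0] * len(vocab) for _ in vocab]
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            if word not in index:
                continue
            for d in range(1, window + 1):
                if i + d >= len(tokens):
                    break
                other = tokens[i + d]
                if other not in index:
                    continue
                weight = window - d + 1  # closer words get larger weights
                vectors[index[word]][index[other]] += weight
                vectors[index[other]][index[word]] += weight
    return vectors


def normalize(vec):
    """Scale a vector to unit length; all-zero vectors are returned unchanged."""
    norm = sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def bag_of_words(sentence, vocab):
    """Sparse term-count vector for one sentence over a fixed vocabulary."""
    index = {w: i for i, w in enumerate(vocab)}
    x = [0.0] * len(vocab)
    for token in sentence.lower().split():
        if token in index:
            x[index[token]] += 1.0
    return x


def project(x, S):
    """Map a bag-of-words vector into context space: S^T x."""
    out = [0.0] * len(S[0])
    for xi, row in zip(x, S):
        if xi:
            for j, s in enumerate(row):
                out[j] += xi * s
    return out


def semantic_kernel(x, y, S):
    """Assumed smoothing kernel k(x, y) = x^T S S^T y."""
    px, py = project(x, S), project(y, S)
    return sum(a * b for a, b in zip(px, py))


# Toy unannotated corpus (hypothetical data, for illustration only).
corpus = [
    "the protein binds the receptor",
    "the protein interacts with the receptor",
]
vocab = ["protein", "binds", "interacts", "receptor"]
S = [normalize(v) for v in hal_vectors(corpus, vocab)]

# Two sentences with no words in common.
x_a = bag_of_words("binds", vocab)
x_b = bag_of_words("interacts", vocab)

k_linear = sum(a * b for a, b in zip(x_a, x_b))
k_semantic = semantic_kernel(x_a, x_b, S)
print("linear kernel:", k_linear)
print("semantic kernel:", k_semantic)
```

In this toy setting the two sentences share no words, so the plain linear kernel is zero, while the semantic kernel is positive because "binds" and "interacts" both co-occur with "protein" in the unannotated corpus. This is the sense in which contextual similarity can offset sparse training vectors.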