PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Application of Lexical Topic Models to Protein Interaction Sentence Prediction
Tamara Polajnar and Mark Girolami
In: NIPS 2009 Workshop on Applications for Topic Models: Text and Beyond, 11 Dec 2009, Whistler, BC, Canada.

Abstract

Topic models can be used to improve classification of protein-protein interactions (PPIs) by condensing lexical knowledge available in unannotated biomedical text into a semantically-informed kernel smoothing matrix. Detection of sentences that describe PPIs is difficult due to the lack of annotated data. Furthermore, each sentence generally contains only a small percentage of the features, which leads to sparse training vectors. By exploiting the contextual similarity of words, we are able to improve classification performance. This contextual data is gathered from a large unannotated corpus and incorporated through a semantic kernel. We use the Hyperspace Analogue to Language (HAL) and Bound Encoding of the Aggregate Language Environment (BEAGLE) semantic models to create the kernels. The modularity of the method lends itself to further exploration along several avenues, including experimentation with any number of other word and topic models.
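
To illustrate the idea of a semantic kernel, the following is a minimal sketch and not the authors' implementation. It assumes a simplified, symmetric HAL-style co-occurrence model with a small sliding window, and a toy unannotated corpus invented for the example. All function names (hal_vectors, smooth, semantic_kernel, bow_kernel) are hypothetical. Each sentence is smoothed by summing the unit-length context vectors of its words, so two sentences that share no words can still have a non-zero kernel value when their words occur in similar contexts.

```python
from collections import Counter
from math import sqrt


def hal_vectors(corpus, window=2):
    """Build unit-length, HAL-style context vectors from tokenised sentences.

    Simplification (assumption): co-occurrences are counted symmetrically,
    whereas HAL proper keeps preceding and following contexts separate.
    """
    vocab = sorted({w for sent in corpus for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    counts = {w: [0.0] * len(vocab) for w in vocab}
    for sent in corpus:
        for i, w in enumerate(sent):
            for d in range(1, min(window, i) + 1):
                weight = window - d + 1  # closer neighbours weigh more
                prev = sent[i - d]
                counts[w][index[prev]] += weight
                counts[prev][index[w]] += weight
    vecs = {}
    for w, v in counts.items():
        norm = sqrt(sum(c * c for c in v))
        vecs[w] = [c / norm for c in v] if norm > 0 else v
    return vecs


def smooth(tokens, vecs):
    """Map a sentence to the count-weighted sum of its word context vectors."""
    dim = len(next(iter(vecs.values())))
    out = [0.0] * dim
    for w, count in Counter(tokens).items():
        if w in vecs:  # words unseen in the corpus are ignored
            for j, c in enumerate(vecs[w]):
                out[j] += count * c
    return out


def semantic_kernel(x, y, vecs):
    """k(x, y) = x^T S y, with S the cosine similarity between word vectors."""
    sx, sy = smooth(x, vecs), smooth(y, vecs)
    return sum(a * b for a, b in zip(sx, sy))


def bow_kernel(x, y):
    """Plain bag-of-words linear kernel, for comparison."""
    cx, cy = Counter(x), Counter(y)
    return sum(cx[w] * cy[w] for w in cx)


# Toy unannotated corpus (invented for illustration only).
corpus = [
    ["protein", "binds", "receptor"],
    ["protein", "interacts", "receptor"],
]
vecs = hal_vectors(corpus)
print(bow_kernel(["binds"], ["interacts"]))
print(semantic_kernel(["binds"], ["interacts"], vecs))
```

Because the smoothing matrix in this sketch is a Gram matrix of word vectors, the resulting kernel is symmetric and positive semi-definite, so it could be passed to any kernel classifier that accepts a precomputed kernel.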

PDF - Requires Adobe Acrobat Reader or other PDF viewer.
EPrint Type: Conference or Workshop Item (Poster)
Project Keyword: UNSPECIFIED
Subjects: Natural Language Processing; Information Retrieval & Textual Information Access
ID Code: 5815
Deposited By: Tamara Polajnar
Deposited On: 08 March 2010