Unsupervised domain adaptation based on text relatedness
In: Recent Advances in Natural Language Processing (RANLP 2011), 12-14 Sep 2011, Hissar, Bulgaria.
This paper presents an unsupervised approach to domain adaptation that exploits external knowledge sources to port a classification model to a new thematic domain. The approach extracts a new feature set from documents of the target domain and tries to align the new features with the original ones by exploiting text relatedness from external knowledge sources, such as WordNet. The approach was evaluated on a document classification task: classifying newsgroup postings into 20 newsgroups.
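The feature alignment step can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the threshold, and the hand-written relatedness table (a stand-in for a WordNet-based relatedness measure) are all assumptions made for illustration. Each target-domain feature is mapped to its most related source-domain feature, so that a target document can be expressed in the feature space the original classifier was trained on.

```python
from typing import Callable, Dict, List

# Hypothetical relatedness scores standing in for a WordNet-based measure.
TOY_RELATEDNESS: Dict[frozenset, float] = {
    frozenset(("automobile", "car")): 0.9,
    frozenset(("physician", "doctor")): 0.8,
    frozenset(("galaxy", "car")): 0.1,
}


def toy_relatedness(a: str, b: str) -> float:
    """Return a symmetric relatedness score in [0, 1] (illustrative only)."""
    if a == b:
        return 1.0
    return TOY_RELATEDNESS.get(frozenset((a, b)), 0.0)


def align_features(
    target_features: List[str],
    source_features: List[str],
    relatedness: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Dict[str, str]:
    """Map each target feature to its most related source feature.

    Target features whose best score falls below the threshold stay unaligned.
    """
    alignment: Dict[str, str] = {}
    for t in target_features:
        best = max(source_features, key=lambda s: relatedness(t, s))
        if relatedness(t, best) >= threshold:
            alignment[t] = best
    return alignment


def project(
    doc_tokens: List[str],
    alignment: Dict[str, str],
    source_features: List[str],
) -> List[int]:
    """Count tokens of a target-domain document in the source feature space."""
    counts = {f: 0 for f in source_features}
    for tok in doc_tokens:
        mapped = tok if tok in counts else alignment.get(tok)
        if mapped is not None:
            counts[mapped] += 1
    return [counts[f] for f in source_features]


source_features = ["car", "doctor"]
target_features = ["automobile", "physician", "galaxy"]
alignment = align_features(target_features, source_features, toy_relatedness)
vector = project(
    ["automobile", "car", "galaxy", "physician"], alignment, source_features
)
```

Here "galaxy" has no sufficiently related source feature and is dropped, while "automobile" and "physician" contribute to the counts for "car" and "doctor". The projected vector could then be passed to the unchanged source-domain classifier.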