PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Self-organized ordering of terms and documents in NSF awards data
Mikaela Klami and Timo Honkela
In: WSOM 2007, 3-6 September 2007, Bielefeld, Germany.


We present the results of an analysis of a text corpus of 129,000 abstracts of NSF-sponsored basic research projects from 1990 to 2003. The methods used in the analysis include term extraction based on a reference corpus and an entropy measure, and the Self-Organizing Map algorithm for the formation of a term map and a document map. Methodologically, the basic approach builds on earlier developments, such as word category maps and the WEBSOM method, but at the level of detail this article reports several new aspects and quantitative comparisons between methodological variants. The data covers a fairly large proportion of US-based scientific research during recent years. The analysis results indicate the basic patterns discernible in the data, both at the level of the awards and at the level of the terminology used in them.
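
The two method families named in the abstract can be illustrated with a minimal sketch. This record does not give the paper's exact entropy measure, its reference-corpus procedure, or its SOM parameters, so everything below is an assumption made for illustration: `term_entropy` computes plain Shannon entropy of a term's occurrences over documents, `train_som` is a generic rectangular Self-Organizing Map with a Gaussian neighborhood and linearly decaying learning rate, and all function and parameter names are hypothetical.

```python
import math
import random


def term_entropy(term, docs):
    """Shannon entropy (bits) of a term's occurrence distribution over documents.

    Assumption: each document is a list of tokens. A term concentrated in
    few documents gets low entropy; an evenly spread term gets high entropy.
    """
    counts = [doc.count(term) for doc in docs]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def best_matching_unit(grid, v):
    """Return the grid coordinate whose weight vector is closest to v."""
    return min(grid, key=lambda k: sum((a - b) ** 2 for a, b in zip(grid[k], v)))


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM (illustrative parameters only).

    Returns a dict mapping (row, col) -> weight vector.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)    # neighborhood radius shrinks
        for v in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(grid, v)
            for (r, c), w in grid.items():
                d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return grid
```

In a term-map or document-map setting, each input vector would be a feature vector for a term or a document, and `best_matching_unit` would give the map position at which that item is displayed.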

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Natural Language Processing
ID Code: 3487
Deposited By: Timo Honkela
Deposited On: 11 February 2008