PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Machine learning for resolving researcher affiliation
Marjan Sterk, Damijel Vladusic, Eva Milosevic, Jure Ferlez, Dunja Mladenić and Marko Grobelnik
In: IS-2007, 8-12 October 2007, Ljubljana, Slovenia.


This paper describes the Institution Finder, a simple web mining procedure for finding the internet domain of the institution(s) with which a given researcher is affiliated. The Institution Finder issues several queries to public Web search engines and tries to extract from the hits the institution names and internet domains that are likely to be related to the given researcher. A simple procedure based on machine learning is used to improve the ranking of the hits. The system can also reject a researcher if the corresponding domain cannot be found reliably. Performance is quantified by accuracy, i.e. the conditional probability P(correct | not rejected), and by the reject rate. The hits obtained from various queries can be combined in different ways, enabling a trade-off between reasonable accuracy with almost no rejection (about 44% of the 363 test examples are correctly classified) and high accuracy with a high reject rate (for example, 55% of the test examples rejected and 75% of the rest correctly classified).
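
The two performance measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name, the use of None to mark a rejected researcher, and the toy domains are all assumptions made for illustration.

```python
import math
from typing import Optional, Sequence, Tuple


def accuracy_and_reject_rate(
    predicted: Sequence[Optional[str]],
    truth: Sequence[str],
) -> Tuple[float, float]:
    """Return (accuracy, reject_rate) for a batch of researchers.

    predicted[i] is the domain proposed for researcher i, or None if the
    system rejected that researcher (an assumed convention, not from the paper).
    accuracy is P(correct | not rejected); reject_rate is the rejected share.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    total = len(truth)
    if total == 0:
        raise ValueError("at least one example is required")

    # Keep only the researchers for whom the system gave an answer.
    answered = [(p, t) for p, t in zip(predicted, truth) if p is not None]
    reject_rate = (total - len(answered)) / total

    # Accuracy is undefined when every example is rejected.
    if not answered:
        return math.nan, reject_rate
    correct = sum(1 for p, t in answered if p == t)
    return correct / len(answered), reject_rate


# Toy data with hypothetical domains: one rejection, two of three answers correct.
toy_predicted = ["a.example.org", None, "b.example.edu", "c.example.org"]
toy_truth = ["a.example.org", "x.example.org", "b.example.edu", "d.example.org"]
toy_accuracy, toy_reject_rate = accuracy_and_reject_rate(toy_predicted, toy_truth)
```

Under these definitions, the share of all test examples that end up correctly resolved is accuracy × (1 − reject rate). For the high-accuracy setting quoted above, that is 0.75 × 0.45, or roughly 34% of the test examples.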

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics
Information Retrieval & Textual Information Access
ID Code: 3750
Deposited By: Dunja Mladenić
Deposited On: 16 February 2008