Machine learning for resolving researcher affiliation
This paper describes the Institution Finder, a simple web mining procedure that finds the internet domain of the institution(s) with which a given researcher is affiliated. The Institution Finder issues several queries to public Web search engines and extracts from the hits the institution names and internet domains that are likely to be related to the researcher. A simple machine learning procedure is used to improve the ranking of the hits. The system can also reject a researcher if the corresponding domain cannot be found reliably. Performance is quantified by accuracy, i.e. the conditional probability P(correct | not rejected), and by the reject rate. The hits obtained from the various queries can be combined in different ways, enabling a trade-off between reasonable accuracy with almost no rejects (about 44% of the 363 test examples are correctly classified) and high accuracy with a high reject rate (for example, 55% of the test examples rejected and 75% of the rest correctly classified).
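
To make the two performance measures concrete, the following minimal Python sketch computes accuracy as P(correct | not rejected) together with the reject rate. It is an illustration only, not the paper's implementation: the `Prediction` record, the `score` field, the `threshold` rejection rule and the example domains are all hypothetical and were chosen purely to show how raising a rejection threshold trades rejects for accuracy.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Prediction:
    """One researcher's top-ranked candidate (hypothetical record layout)."""
    domain: Optional[str]  # top-ranked domain, or None if no candidate was found
    score: float           # assumed confidence score of the top-ranked hit
    truth: str             # known correct domain for this test example


def evaluate(predictions: List[Prediction], threshold: float) -> Tuple[float, float]:
    """Return (accuracy, reject_rate) for a given rejection threshold.

    A researcher is rejected when no domain was found or when the score
    falls below the threshold (an assumed rejection rule). Accuracy is
    computed over the accepted examples only, i.e. P(correct | not rejected).
    """
    total = len(predictions)
    accepted = [p for p in predictions
                if p.domain is not None and p.score >= threshold]
    correct = sum(1 for p in accepted if p.domain == p.truth)
    accuracy = correct / len(accepted) if accepted else float("nan")
    reject_rate = (total - len(accepted)) / total if total else float("nan")
    return accuracy, reject_rate


# Made-up test examples using reserved ".example" domains.
examples = [
    Prediction("uni-a.example", 0.9, "uni-a.example"),
    Prediction("portal.example", 0.4, "uni-b.example"),
    Prediction("uni-c.example", 0.7, "uni-c.example"),
    Prediction("uni-d.example", 0.2, "uni-d.example"),
]

low_reject = evaluate(examples, threshold=0.0)   # accept everything
high_reject = evaluate(examples, threshold=0.5)  # accept only confident hits
print(low_reject, high_reject)
```

With this toy data, accepting every example gives an accuracy of 0.75 at a reject rate of 0.0, while a threshold of 0.5 gives an accuracy of 1.0 at a reject rate of 0.5. These figures describe the toy data only and are unrelated to the results reported above.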