Author disambiguation: a nonparametric topic and co-authorship model
Andrew Dai and Amos Storkey
In: NIPS workshop on Applications for Topic Models: Text and Beyond, 11 Dec 2009, Whistler, Canada.
A fully generative model is provided for the problem of author disambiguation. This approach infers the topics for each author and combines them with co-author information. The problems involved are similar to those in other entity resolution settings, where differing references may refer to one author entity and identical references may refer to different author entities. We extend the hierarchical Dirichlet process and nonparametric latent Dirichlet allocation models to tackle this problem in a nonparametric, generative manner, making no prior assumptions about the number of author entities, topics or research groups in the corpus. The model defines a hierarchical Dirichlet process over author-topic combinations and conditions it, at the document level, on another hierarchical Dirichlet process over research groups. This enables the authors and topics to be suitably coupled. We perform joint inference to sample the author entities, topics and their group memberships. We present results from our approach on real-world datasets.
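To illustrate what "no prior assumptions on the number of author entities" can mean in practice, the sketch below draws a partition from a Chinese restaurant process, the standard marginal view of a single Dirichlet process. It is not the model described in the abstract, which couples hierarchical Dirichlet processes over author-topic combinations and research groups. The function name `sample_crp` and the concentration parameter `alpha` are assumptions made for illustration only.

```python
import random


def sample_crp(num_items, alpha, rng):
    """Draw cluster assignments for num_items from a Chinese restaurant process.

    Illustrative only: each item could stand for an author reference and each
    cluster for an author entity. The number of clusters is not fixed ahead of
    time; it grows as the data require.
    """
    assignments = []    # cluster index chosen for each item, in order
    cluster_sizes = []  # number of items currently in each cluster
    for _ in range(num_items):
        # An existing cluster k is chosen with weight cluster_sizes[k];
        # a brand-new cluster is opened with weight alpha.
        weights = cluster_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(1)   # open a new cluster
        else:
            cluster_sizes[k] += 1     # join an existing cluster
        assignments.append(k)
    return assignments, cluster_sizes


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same draw
    assignments, sizes = sample_crp(50, alpha=1.0, rng=rng)
    print("number of clusters:", len(sizes))
    print("cluster sizes:", sizes)
```

Larger values of `alpha` tend to produce more clusters. A hierarchical Dirichlet process, as used in the abstract, additionally shares clusters across groups of data such as documents, which this single-level sketch does not attempt.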