PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Scaling the iHMM: Parallelization versus Hadoop
Sebastien Bratieres, Jurgen van Gael, Andreas Vlachos and Zoubin Ghahramani
In: Workshop on Scalable Machine Learning and Applications, IEEE International Conference on Computing and Information Technology, 29 June - 1 July 2010, Bradford, UK.

Abstract

This paper compares parallel and distributed implementations of an iterative machine learning algorithm based on Gibbs sampling. The distributed implementations run under Hadoop on facility computing clouds. The probabilistic model under study is the infinite hidden Markov model (iHMM), whose parameters are learnt using an instance-blocked Gibbs sampler in which one step consists of a dynamic program. We apply this model to learn part-of-speech tags from newswire text in an unsupervised fashion. However, our focus here is on runtime performance rather than on NLP-relevant scores, as embodied by iteration duration and by ease of development, deployment and debugging.
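An instance-blocked step resamples all hidden states of one sentence jointly, and a standard dynamic program for this is forward filtering, backward sampling (FFBS). The Python/NumPy sketch below is illustrative only. It assumes the state space has already been truncated to a finite K for the current sweep (the iHMM itself has unboundedly many states). The function names `ffbs` and `count_transitions` and the toy parameters are hypothetical and are not taken from the paper. Given the current transition and emission matrices, each sentence is resampled independently. That independence suggests, as an assumption rather than a description of the authors' code, treating per-sentence sampling as a map task and count aggregation as a reduce step.

```python
import numpy as np

def ffbs(obs, pi0, A, B, rng):
    """Jointly sample the hidden state sequence of one sentence (hypothetical helper).

    obs: observation (word) ids, length T
    pi0: initial state distribution, shape (K,)
    A:   transitions, A[i, j] = p(s_t = j | s_{t-1} = i), shape (K, K)
    B:   emissions,   B[k, w] = p(w | s = k), shape (K, V)
    """
    T, K = len(obs), len(pi0)
    alpha = np.zeros((T, K))
    # Forward filtering: alpha[t] = p(s_t | obs[0..t]), normalised to avoid underflow
    alpha[0] = pi0 * B[:, obs[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        alpha[t] /= alpha[t].sum()
    # Backward sampling: draw the last state, then each earlier state given its successor
    states = np.zeros(T, dtype=int)
    states[-1] = rng.choice(K, p=alpha[-1])
    for t in range(T - 2, -1, -1):
        w = alpha[t] * A[:, states[t + 1]]
        states[t] = rng.choice(K, p=w / w.sum())
    return states

def count_transitions(state_seqs, K):
    """Aggregate transition counts over independently sampled sentences."""
    counts = np.zeros((K, K), dtype=int)
    for s in state_seqs:
        for a, b in zip(s[:-1], s[1:]):
            counts[a, b] += 1
    return counts

# Toy usage with made-up parameters (K = 2 states, vocabulary of 3 words)
rng = np.random.default_rng(0)
pi0 = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
corpus = [[0, 0, 2], [2, 1, 0, 0]]
# "Map": each sentence is sampled independently given A and B
samples = [ffbs(s, pi0, A, B, rng) for s in corpus]
# "Reduce": summed counts from which the parameters would then be resampled
print(count_transitions(samples, K=2))
```

Normalising the forward messages at every position keeps the sketch numerically stable on long sentences. Because only the summed counts need to be gathered between sweeps, the per-sentence work can be spread across workers with little communication.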

EPrint Type: Conference or Workshop Item (Talk)
Subjects: Learning/Statistics & Optimisation
ID Code: 6994
Deposited By: Sebastien Bratieres
Deposited On: 05 September 2010