PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Mining the blogosphere: age, gender, and the varieties of self-expression
Shlomo Argamon, Moshe Koppel, James Pennebaker and Jonathan Schler
First Monday, Volume 12, Number 9, 2007.


The growth of the blogosphere offers an unprecedented opportunity to study language and how people use it on a large scale. We present an analysis of over 140 million words of English text drawn from the blogosphere, exploring if and how age and gender affect writing style and topic. Our primary result is that a number of stylistic and content-based indicators are significantly affected by both age and gender, and that the main difference between older and younger bloggers, and between male and female bloggers, lies in the extent to which their discourse is outer- or inner-directed. In fact, the linguistic factors that increase in use with age are just those used more by males of any age, and conversely, those that decrease in use with age are those used more by females of any age.

EPrint Type: Article
Subjects: Natural Language Processing; Information Retrieval & Textual Information Access
ID Code: 3406
Deposited By: Moshe Koppel
Deposited On: 10 February 2008