CASA Working Paper 116

1 March 2007

The Cultural, Ethnic and Linguistic Classification of Populations and Neighbourhoods using Personal Names

There is a growing need to understand the nature and detailed composition of ethnic groups in today's increasingly multicultural societies. Ethnicity classifications are often hotly contested, but still greater problems arise from the quality and availability of classifications, with knock-on consequences for our ability to subdivide populations meaningfully. Name analysis and classification has been proposed as one efficient method of achieving such subdivisions in the absence of ethnicity data, and may be especially pertinent to public health and demographic applications. However, previous approaches to name analysis have been designed to identify one or a small number of ethnic minorities, and not complete populations.

This working paper presents a new methodology to classify the UK population and neighbourhoods into groups of common origin using surnames and forenames. It proposes a new ontology of ethnicity that combines some of its multidimensional facets: language, religion, geographical region, and culture. It uses data collected at very fine temporal and spatial scales, and made available, subject to safeguards, at the level of the individual. These individuals are classified into 185 independently assigned categories of Cultural, Ethnic and Linguistic (CEL) groups, based on the probable origins of names. We include a justification of the need to classify ethnicity, a proposed CEL taxonomy, a description of how the CEL classification was built and applied, a preliminary external validation, and some examples of current and potential applications.

This working paper is available as a PDF. The file size is 684KB.

Authors: Pablo Mateos, Paul Longley, Richard Webber

Publication Date: 1 March 2007
