This project marks our first attempt at mining large data sets to extract their geography. It reflects our interest both in citations and in the geography of scientific activity and technology. The work is now part of our SECSE project, where we have moved on to much larger data sets. This project is based on an accessible data set, and it illustrates what can be done with very simple data.
The Institute for Scientific Information (ISI) first produced a list of the most highly cited scientists in 8 fields in 2001. The list was then expanded to 14 fields, with details of the top 100 scientists by citation listed for each field; the raw data are taken from an analysis of the ISI's various citation counts. These are available from 1981 to date and are updated weekly. The intention is to expand this database to cover a much larger number of disciplines and fields, including the social sciences, with up to 250 highly cited individuals.
The data in the ISIHighlyCited database are based on the series from 1981 to 1999. This is the data that we have examined to explore the geographical distribution of these scientists. We can aggregate the data by institution, by place or location, by country, or indeed by any other classification that might yield interesting patterns. Our focus here is on the spatial or geographical distribution because we are interested in the pattern of concentration in the data.
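To make the idea of aggregation concrete, here is a minimal Python sketch. It is an illustration only, not the method used in the project: the records are invented placeholders rather than ISI data, the names `RECORDS`, `FIELDS`, `aggregate` and `herfindahl` are hypothetical, and the Herfindahl-style sum of squared shares is just one possible way of summarising concentration.

```python
from collections import Counter

# Hypothetical records: (scientist, institution, place, country).
# These rows are invented for illustration; they are not ISI data.
RECORDS = [
    ("Scientist A", "Institution 1", "City X", "Country P"),
    ("Scientist B", "Institution 1", "City X", "Country P"),
    ("Scientist C", "Institution 2", "City Y", "Country P"),
    ("Scientist D", "Institution 3", "City Z", "Country Q"),
]

# Position of each aggregation level within a record.
FIELDS = {"institution": 1, "place": 2, "country": 3}


def aggregate(records, level):
    """Count the scientists in each unit at the chosen level."""
    index = FIELDS[level]
    return Counter(record[index] for record in records)


def herfindahl(counts):
    """Sum of squared shares: 1/n for an even spread over n units,
    1.0 when every scientist falls in a single unit."""
    total = sum(counts.values())
    return sum((count / total) ** 2 for count in counts.values())


by_country = aggregate(RECORDS, "country")
by_institution = aggregate(RECORDS, "institution")
print(by_country.most_common())
print(herfindahl(by_country), herfindahl(by_institution))
```

The same `aggregate` call works for any level listed in `FIELDS`, so comparing the index across institution, place and country gives a rough sense of how the degree of concentration changes with the scale of aggregation.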
We are also interested in the extent to which knowledge of these patterns explains government policy on concentrating resources, and in whether the world is becoming more or less geographically concentrated in terms of the knowledge industries to which these data relate.
- Michael Batty
- Rui Carvalho