The module teaches quantitative skills, with an emphasis on the context and use of data. Students learn to work with datasets that allow them to explore questions in society: in the arts, humanities, sports, criminal justice, economics, inequality, or policy. They will be expected to use Python to carry out data manipulation (cleaning and segmentation), analysis (for example, deriving descriptive statistics) and visualisation (graphing, mapping and other forms). They will engage with the literature around a topic, use their datasets and analyses to explore and evaluate wider arguments, and link their results to these contextual considerations.
The module is assessed by a group research project, using data analysis and visualisation to explore a “real-world” question. The project moves through literature, research question, data, analysis, presentation and conclusion, following the path of typical data-driven research projects at postgraduate and postdoctoral level.
How is data structured, and how does this affect the ways in which we work with it? This strand covers numerical data, text analysis, geographical data, network science, linked data (and the semantic web), normalisation and scaling, and social media data.
How is data accessed, analysed and communicated, and what are its impacts? This strand covers principles of design for mapping and data visualisation, graphing, the Open Data movement, Government and Census data, private data and ethics, aggregation and uncertainty.
These strands are covered by weekly lectures and colloquia based on set readings, giving students the chance to discuss and develop an understanding of where to find data, what techniques to use, what this tells us, and issues around communication and extraction in a societal context.
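To make one of the techniques named in the first strand concrete, the sketch below shows two common ways of normalising and scaling a numerical variable: min-max scaling to the range 0 to 1, and standardising to z-scores. It is a minimal illustration using only the Python standard library; the function names and the sample figures are hypothetical and are not taken from the module materials.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """Express each value as a number of standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        raise ValueError("cannot standardise a constant series")
    return [(v - mu) / sigma for v in values]


# Hypothetical figures, invented purely for illustration
incomes = [10.0, 20.0, 30.0, 40.0, 50.0]

scaled = min_max_scale(incomes)
standardised = z_scores(incomes)

print(scaled)
print(standardised)
```

Min-max scaling keeps the shape of the distribution but is sensitive to extreme values, while z-scores make variables measured in different units comparable. Which is appropriate depends on the question being asked of the data.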
Students will be taught the use of Python libraries (matplotlib, pandas, basemap and nltk) for importing and working with raw data to produce tables, scatter graphs, histograms, summary statistics (range, mean, median, mode, IQR, standard deviation, geometric mean, centroids of geographical data) and maps, and how these techniques facilitate insight and understanding. Through IPython notebooks, we build a literate programming framework which allows narrative, explanation, images, weblinks and equations to complement the exercises.
This will be delivered through a weekly two-hour workshop and associated homework.
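As a rough idea of what the summary statistics listed above look like in practice, here is a minimal sketch using pandas and NumPy. The column names and all of the values are invented for illustration, and the centroid is calculated simply as the unweighted mean of the point coordinates, which is an assumption that only suits small areas and point data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: every value here is invented for illustration
df = pd.DataFrame({
    "value": [2.0, 4.0, 4.0, 8.0, 16.0],
    "lon": [-0.10, -0.20, -0.15, -0.05, -0.25],
    "lat": [51.50, 51.52, 51.48, 51.51, 51.49],
})

values = df["value"]

summary = {
    "range": values.max() - values.min(),
    "mean": values.mean(),
    "median": values.median(),
    "mode": values.mode().iloc[0],  # first mode if there are several
    "iqr": values.quantile(0.75) - values.quantile(0.25),
    "std": values.std(),  # sample standard deviation (divides by n - 1)
    # Geometric mean: exponential of the mean of the logs (positive values only)
    "geometric_mean": float(np.exp(np.log(values).mean())),
}

# Simple centroid: unweighted mean of the longitude and latitude columns
centroid = (df["lon"].mean(), df["lat"].mean())

for name, stat in summary.items():
    print(name, round(stat, 3))
print("centroid", centroid)
```

Note that pandas reports the sample standard deviation by default. Passing `ddof=0` to `std` gives the population version instead.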
At the end of this module, we want students to have a bedrock of technical skill paired with an understanding of the context of social data: practical, political, communicative, ethical and transformative.
To ensure the delivery of quantitative skills, there will be a weekly 2-hour Python workshop for all students, and we will develop out-of-class exercises to give students further experience with the Python programming language and the underlying concepts with which they will be working.
This module is taught in Term 1 of Year 2. There are 4 compulsory contact hours each week, split as follows:
|Lecturers:||Steven Gray (CASA)|
|Lecture:||1-2pm on Tuesday|
|Colloquium:||Group 1: 11am-12pm on Thursday; Group 2: 12-1pm on Thursday|
|Workshop:||Both groups: 2-4pm on Friday|
|Module level:||Level 5|
|Credit value:||15 credits|
The module will be assessed through a group project which integrates context, data analysis and visualisation, and communication and contextualisation of outputs.
- Group website - 60%
- 1,000 word individual report - 30%
- Group presentation - 10%
Students enrolled on the module can view more information on Moodle.
The following are recommended prerequisites which you may find useful to complete before embarking on the module.
The course expects a basic familiarity with Python and IPython. You won’t need to be an expert – this is not a programming course – but we will be using Python to manipulate, analyse and display data. We will be doing this by presenting data manipulations in IPython Notebooks – environments which include text description, code, and visual and numerical outputs in one place. You may have come across these in your first year.
These will provide scripts for working with data that you can use and adapt for your group projects. You’ll be working with the Enthought Canopy environment, which allows interactive execution of these notebooks.
When you arrive at the module, we expect you to:
- Be familiar with notebooks and the IPython environment. You can do this by downloading Enthought Canopy from https://www.enthought.com/products/canopy/ and running through our introductory notebook [LINK]. You’ll need to sign up as a student on the Enthought site to get access to the full range of libraries.
- Become familiar with some of the basic elements of Python. Python is a friendly language with approachable syntax, and if you haven’t used it before (either as part of your first year QM project or elsewhere) we recommend that you complete some introductory classes. Start by taking a look at Codecademy, one of the more accessible and friendly courses: http://www.codecademy.com/en/tracks/python. Concentrate on Section 1 (Variables, whitespace, comments, arithmetic and formatting), Section 2a (Strings) and especially Section 5 (Lists and Dicts), which will be really useful in this course. The material on functions and conditionals is less vital. Some of the other sections are quite hard, so don’t feel too discouraged if you find them tricky.
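As a rough indication of the level meant by "variables, strings, lists and dicts", the short sketch below touches each of them. It assumes Python 3 syntax, and all names and figures are made up for illustration; if you can read it and predict what it prints, you are in good shape.

```python
# Variables and arithmetic (all figures are invented for illustration)
population = 12000
area_km2 = 8.0
density = population / area_km2

# Strings and formatting
town = "Exampletown"
label = "{} has about {:.0f} people per square km".format(town, density)

# Lists: ordered collections that you can index, extend and summarise
scores = [62, 71, 58, 90]
scores.append(77)
highest = max(scores)
average = sum(scores) / len(scores)

# Dicts: look values up by key rather than by position
visits = {"museum": 3, "library": 5}
visits["gallery"] = 2
total_visits = sum(visits.values())

print(label)
print(highest, average)
print(sorted(visits), total_visits)
```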
If you don’t get on with Codecademy – maybe you’re fatigued by the constant Monty Python references, or maybe you’d prefer to take a course, or learn from a book or PDF – one of the options below should meet your needs. Remember, we want you to have some basic confidence and familiarity with Python when you start the course, not to be a Python ninja. Focus on variables, basic syntax, lists and dicts.
- Code First Girls offers a range of courses: http://www.codefirstgirls.org.uk/courses.html
- Think Python is a more traditional book to lead you into the world of Python. You can download a free copy here: http://www.greenteapress.com/thinkpython/ It goes into more depth, and might be more useful if you already have a little programming experience, but try it out nevertheless.
- Software Carpentry offer some useful tutorials, geared towards data manipulation: http://swcarpentry.github.io/python-novice-inflammation/
- Learn Python looks a little bit creaky but has an online interface: http://www.learnpython.org
- Learn Python the Hard Way is designed to create good habits, but if you’re a casual programmer, it might be a bit… hard. There are links to a free online version as well as a paid-for book and video series here: http://learnpythonthehardway.org