- Module code
- Taught during: Session 2
- Module leader: Dr Philip Lewis
- GPA of around 3.3/4.0 (US) or equivalent; please see below for subject-specific pre-requisites
- Assessment method: Group presentation (15%), in-class test (20%), written report (65%)
This module introduces the fundamental tools and techniques of data analytics and teaches students how to use specialised software to analyse real-world data and answer policy-relevant questions.
Data Science is an exciting new area that combines scientific inquiry, statistical knowledge, substantive expertise, and computer programming. One of the main challenges for businesses and policy makers when using big data is finding people with the appropriate skills. This module will cover classic topics in data analysis (regression, binary models, and panel data) and introduce more specialised techniques, such as classification and decision trees, clustering and pattern recognition, and dimensionality reduction. It will also cover data preparation and processing, including working with structured, key-value (JSON), and unstructured data.
Upon successful completion of this module, students will:
- Have a sound understanding of the field of data science and have developed the ability to analyse real-world data using some of its main methods
- Be comfortable with descriptive and predictive analytics, and their application to big data problems
- Have gained a solid foundation for more advanced or more specialised study in this area.
Students should have successfully completed a first-year undergraduate-level module in statistics and have experience of using statistical software.
Classes take place on the Bloomsbury campus, Monday to Thursday, 9:00 am to 5:00 pm. Off-campus site visits and supervised fieldwork may also take place during these hours. There are no classes on the first two Fridays of the module, but assessment and a plenary event will take place on the last Friday. The module offers 45 contact hours, and students are expected to spend an additional 100 hours on assignments and self-study.
Dr Philip Lewis works in the Department of Cell and Developmental Biology at UCL but originally studied for his PhD in High Energy Physics. He worked on the analysis of the massive datasets generated by the Tevatron collider, and on the computing infrastructure needed to store, retrieve and analyse the data. For the last five years he has worked in Computational Biology, and he currently helps to deliver the SysMIC course, which trains doctoral students across the UK in the computational skills increasingly necessary for cutting-edge biology research.