Unsupervised learning methods are essential tools for data scientists and statisticians alike. They are often applied as a pre-processing step for feature selection and dimensionality reduction in statistical learning tasks. Cluster analysis is the most popular example of a stand-alone unsupervised learning method, and is often seen as an exploratory, hypothesis-generating approach.
Despite its wide use in many fields, cluster analysis is challenging to design, implement and evaluate. The challenge stems from the exploratory nature of the clustering process and from the multiple ways in which its outputs can be interpreted.
This course will outline the fundamentals of cluster analysis and dimensionality reduction, with the aim of enabling learners to design and implement such methods confidently and independently on a variety of datasets.
Learning Objectives
By the end of this course, participants should be able to:
- Understand the concept of representations in data
- Decide when to use dimensionality reduction
- Select an appropriate dimensionality reduction method
- Select an appropriate dissimilarity metric
- Outline the generalised clustering pipeline
- Describe in detail the k-means algorithm
- Understand the basic principles of hierarchical, density-based, and Gaussian mixture clustering
- Evaluate clustering algorithm outputs
- Outline the challenges and opportunities of applying cluster analysis for the discovery of disease subtypes
Time | Session Title | Lead Tutor |
---|---|---|
09:00 - 09:30 | Registration, coffee and welcome | Maria Pikoula |
09:30 - 11:00 | | Maria Pikoula |
11:00 - 11:15 | Coffee break | |
11:15 - 12:45 | | Maria Pikoula and Lucy Pembrey |
12:45 - 13:45 | Lunch | |
13:45 - 15:15 | | Maria Pikoula and Nonie Alexander |
15:15 - 15:30 | Coffee break | |
15:30 - 17:00 | Tutorial: evaluating clustering results | Maria Pikoula and Nonie Alexander |
- Dr Maria Pikoula
- Maria is a data scientist by training and currently works in the field of health informatics and electronic health record mining for research. Her work at the Institute for Health Informatics focuses on disease prediction and disease sub-typing via clustering methods. Maria teaches Python for data analysis and machine learning on the MSc programme Data Science for Research in Health and Biomedicine.