Next Farr seminar: An Automated Data Science Assistant

9 May 2017

Date: Tuesday 13 June 2017, 13:00–14:00

Venue: Room G01, Farr Institute of Health Informatics Research, 222 Euston Road, London, NW1 2DA

Speaker: Professor Jason Moore

Title: 'An Automated Data Science Assistant'

Biography: Professor Moore is the Edward Rose Professor of Informatics and Director of the Penn Institute for Biomedical Informatics. He also serves as Senior Associate Dean for Informatics and Director of the Division of Informatics in the Department of Biostatistics, Epidemiology, and Informatics. He came to Penn in 2015 from Dartmouth, where he was Director of the Institute for Quantitative Biomedical Sciences. Before Dartmouth, he served as Director of the Advanced Computing Center for Research and Education at Vanderbilt University. He has a PhD in Human Genetics and an MSc in Applied Statistics from the University of Michigan. He leads an active NIH-funded research program developing artificial intelligence and machine learning algorithms for the analysis of complex biomedical data, with a focus on genetics and genomics. He is an elected fellow of the American Association for the Advancement of Science (AAAS), the American College of Medical Informatics (ACMI), and the American Statistical Association, and was selected as a Kavli fellow of the National Academy of Sciences.

Abstract: Machine learning is commonly described as a 'field of study that gives computers the ability to learn without being explicitly programmed' (Simon, 2013). Despite this common claim, practitioners know that designing effective machine learning pipelines is often a tedious endeavor, and typically requires considerable experience with machine learning algorithms, expert knowledge of the problem domain, and brute force search to accomplish. Thus, contrary to what machine learning enthusiasts would have us believe, machine learning still requires considerable explicit programming and expertise. In response to this challenge, we have developed an automated machine learning (AutoML) method called the tree-based pipeline optimisation tool (TPOT). The TPOT method will be presented and discussed in the context of developing automated data science assistants for the analysis of complex biomedical data.

Light refreshments and a sandwich lunch will be served from 12:30.