SLMS Academic Careers Office

Grand Challenges

8. Using social media big data to understand the genetic and environmental aetiology of mental health and disorder in emerging adulthood

Supervisor Pair: Dr Oliver Davis and Dr Sebastian Riedel
Potential Student’s Home Department: Division of Biosciences / Department of Genetics, Evolution and Environment

Born around the same time as the commercial Internet, today’s emerging adults are the Internet generation, and the vast majority engage frequently with their real-life peer groups through online social networks. Emerging adulthood is a critical period for the development of psychiatric disorders, so learning about these interactions is central to our understanding of the origins of mental health and wellbeing. If we are to understand social influences on mental health and disorder in this and future generations of adults, we must take account of online as well as offline social activity. Fortunately, whereas offline social networks are difficult to assess and track, online social networks are detailed, ecologically valid databases of real-time social activity.

Last year, funding from the Wellcome Trust and an MRC Centenary Award allowed us to collaborate with Dr Claire Haworth at the University of Warwick to recruit over 2,400 eighteen-year-old participants from the longitudinal Twins Early Development Study (TEDS) for a study of online social network activity. This pilot project demonstrated the feasibility of collecting high-resolution longitudinal data on phenotypes and environmental exposures through social media. We have linked this growing dataset of over four million tweets to concurrent questionnaire data, historical information collected from birth to adulthood, and genome-wide genotype data funded by the Wellcome Trust Case Control Consortium (WTCCC).

This interdisciplinary Grand Challenges project will use this globally unique dataset to answer questions about the dynamic genetic and environmental aetiology of wellbeing and psychiatric disorders during emerging adulthood. It will do so by developing new approaches to the coding, analysis and visualisation of complex multidimensional big data from the fields of genetics and computational social science.
For example, what can text analysis of tweets tell us about an individual’s thoughts and feelings? Can we characterise Twitter behaviour, and does the way people behave on Twitter relate to their behaviour offline? Can we use Twitter to track individuals’ mood across time and monitor their reactions to real-world events? And how can we incorporate twin and genotype data to explore the changing relationship between genotype and phenotype during this critical developmental period?

The project will be based in Dr Oliver Davis’s Dynamic Genetics Lab in the new Computational Biology Laboratory at the UCL Genetics Institute and the Department of Genetics, Evolution and Environment (GEE), in close collaboration with Dr Sebastian Riedel’s Machine Reading Lab at the new London Media Technology Campus and the Department of Computer Science. Skills developed through this project will include the analysis and visualisation of two very different types of big data (genetic and social media data), as well as statistical genetics, psychometrics, programming in R and Python, natural language processing, and machine learning.