UCL Great Ormond Street Institute of Child Health



Statistical models for periodic data in clinical and epidemiological child health studies

Supervisors: Professor Mario Cortina-Borja, Professor Angie Wade, Dr Eirini Koutoumanou

Periodic data, also known as circular, cyclical, angular, directional or seasonal data, consist of directions, angles, times or dates for which the measurement scale is cyclic, so that each observation can be expressed as an angle on the unit circle (1). Statistical analyses of cyclical patterns require specific mathematical frameworks to summarise, visualise and model seasonal observations. Well-known examples of biomedical and epidemiological angular data include: (i) yearly incidence of infections; (ii) frequencies of hours and dates of birth (2) or death, and of presentation of weather-related conditions; (iii) circadian rhythms; (iv) angles related to developmental dysplasia of the hip, or the popliteal angle in children with cerebral palsy; and (v) in bioinformatics, protein shapes, which can be described using coordinates of the backbone chain expressed in terms of two torsional angles (3). There is also increasing evidence that seasonal changes experienced in utero or in very early life affect adult health and mortality; for example, the incidence of coronary heart disease, insulin resistance and suicide has been shown to be affected by the subject's date of birth (4). Analysing circular data typically requires complex parametric forms, and the class of generalised additive models for location, scale and shape (gamlss) offers a convenient way to develop inferential procedures for appropriate regression models. A related problem is the modelling of bivariate angular outcome variables, for instance dates of birth and death from SIDS, or month of birth and presentation of menarche. Investigating such joint variables might shed light on aetiological or physiological aspects of a condition. Copula models, fitted in the Bayesian framework (5), are a natural way to construct conditional models for such data, but this class of models has not yet been fully developed.
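As a minimal illustration of why a cyclic scale needs its own summaries, the sketch below maps hours of the day to angles on the unit circle and computes the mean direction and the mean resultant length. The hours are made-up values chosen for illustration, not project data, and the function names are hypothetical; this is a sketch under those assumptions rather than part of the proposed methodology.

```python
import math

def to_angle(value, period):
    """Map a value on a cyclic scale (e.g. hour of day, period 24) to radians in [0, 2*pi)."""
    return 2.0 * math.pi * (value % period) / period

def circular_mean(angles):
    """Return (mean direction in radians, mean resultant length between 0 and 1)."""
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n  # mean cosine component
    s = sum(math.sin(a) for a in angles) / n  # mean sine component
    r = math.hypot(c, s)                      # values near 1 indicate strong concentration
    mean = math.atan2(s, c) % (2.0 * math.pi)
    return mean, r

# Hypothetical hours of birth clustered around midnight (illustrative values only).
hours = [23.0, 1.0, 22.0, 2.0]
angles = [to_angle(h, 24.0) for h in hours]

mean_angle, r = circular_mean(angles)
mean_hour = 24.0 * mean_angle / (2.0 * math.pi)  # back-transform to the hour scale
arithmetic_mean = sum(hours) / len(hours)        # ignores the cyclic scale

print("circular mean hour:", mean_hour, "resultant length:", r)
print("arithmetic mean hour:", arithmetic_mean)
```

For these values the arithmetic mean falls at midday, on the opposite side of the clock from every observation, whereas the circular mean lies at midnight (reported as 0 or, equivalently, 24, depending on floating-point rounding).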

There is a need to extend modern statistical methods to the domain of circular statistics. For instance, there are no clear methodological guidelines for dealing with censored and truncated observations, missing values, longitudinal designs, transformations, latent class mixtures, multivariate responses and regression diagnostics, and there is a marked lack of inferential procedures in the Bayesian framework for circular data settings. This project will explore these areas in the context of clinical and epidemiological child health studies, and will disseminate the applications of these methods among child health researchers and practitioners.

The project involves the development and implementation of statistical methodology to analyse circular variation; this includes writing R packages to facilitate data visualisation and the fitting of univariate and multivariate regression models for circular data in the Bayesian framework. Existing datasets include: (i) regional data available from national statistical offices on hours and dates of birth, and on dates of presentation of specific conditions, e.g. suicide; (ii) data from the Hospital Episode Statistics (HES) database for England on hours and dates of admission to hospital for specific conditions; and (iii) clinical data from our partner hospital on angles relevant to gait analysis.

In their first year, the student will undertake a literature review of statistical methods for circular data, seasonal health problems, and the software available for implementing these methods. We are particularly interested in extending the gamlss class to the circular setting in order to construct flexible regression models, including finite latent class mixtures. In their second and third years, the student will develop two comprehensive packages within the R environment for statistical computing. One package will extend the gamlss class to cover non-standard univariate circular probability distributions, including random effects, mixtures, truncated and censored distributions, and goodness-of-fit procedures; the second package will allow bivariate circular regression models to be fitted in the Bayesian framework. Throughout the project, the student will analyse clinical and epidemiological seasonal data using circular regression models.

1. Cremers J; Mulder KT; Klugkist I. (2018) Circular interpretation of regression coefficients. Br. J. Math. Stat. Psychol. 71, 75–95.
2. Martin P; Cortina-Borja M; Newburn M; et al. (2018) Timing of singleton births by onset of labour and mode of birth in NHS maternity units in England, 2005–2014: A study of linked birth registration, birth notification, and hospital episode data. PLoS ONE 13(6), e0198183.
3. Mardia KV; Taylor CC; Subramaniam GK. (2007) Protein bioinformatics and mixtures of bivariate von Mises distributions for angular data. Biometrics 63, 505–512.
4. Salib E; Cortina-Borja M. (2006) Effect of month of birth on the risk of suicide in England and Wales. Br. J. Psychiatry 188, 416–422.
5. Stander J; Dalla Valle L; Taglioni C; Liseo B; Wade A; Cortina-Borja M. (2019) Analysis of paediatric visual acuity using Bayesian copula models with sinh-arcsinh marginal densities. Statistics in Medicine 38, 3421–3443.