"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write" H.G. Wells
This introductory statistics and research methods course is concerned with all areas of statistics, from data collection through to interpretation. It is impossible to consider any of these in isolation. For example, trying to interpret the results of formal analyses without considering the data collection process and the form of the data (as shown by the initial summaries) would be foolhardy and liable to error.
Where analyses are described, the emphasis is on understanding the principles rather than on the mechanics of calculation. Where the intricacies of the calculations are given this is to enable a better understanding of the outcomes and limitations of the analyses. Excel spreadsheets are incorporated into the web-based material to assist in simple calculations.
Researchers sometimes feel that statistics is an over-rated subject. However, medicine and many other areas of research have become increasingly quantitative. There has been a rapid expansion in the ability to collect and analyse large amounts of data, aided by the increasing availability of electronic data collection systems and software packages. Furthermore, practitioners have much greater access to research data than ever before. Even if they are never going to perform a study themselves, physicians and scientists must be able to read reports intelligently and critically evaluate the numerical evidence presented. There has been a growth in scientific journal articles discussing the misuse of statistics and giving pointers towards improvement. Often, however, statistics is learnt during training as part of a very hectic schedule and gets pushed aside in the mind 'to be of use later'. It is difficult to learn statistical techniques in isolation from a textbook or journal articles.
The statistical analysis section of most grant applications, ethics forms and published papers tends to be small, so the statistical content of research can appear to be minor. At the mention of statistics most people tend to think only of equations and the numerical analysis (or manipulation) of data. However, statistics is a much larger science that encompasses the entire research process: from data collection through to summarising, presenting and analysing that data, and finally interpreting the results of those analyses.
Definition of statistics: The science of collecting, summarising, analysing and interpreting data
- Supplementary Reading
For supplementary reading if required, the following textbooks are recommended:
- Practical Statistics for Medical Research, DG Altman, Chapman & Hall, 2006. ISBN 1584880392
- Medical Statistics: A guide to data analysis and critical appraisal, J Peat & B Barton, BMJ Books, Blackwell Publishing, 2007. ISBN 978-0-7279-1812-3
- Essential Medical Statistics, BR Kirkwood & JAC Sterne, Blackwell Science, 2005. ISBN 978-0-86542-871-3
- An Introduction to Medical Statistics, M Bland, Oxford University Press, 2008. ISBN 978-0-19-263269-2
- Medical Statistics at a Glance, A Petrie & C Sabin, Blackwell Publishing, 2005. ISBN 978-1-4051-2780-6
- Statistics with Confidence, DG Altman, D Machin, TN Bryant & MJ Gardner, BMJ Books, 2005. ISBN 0-7279-1375-1
- Presenting Medical Statistics from Proposal to Publication: A step-by-step guide, J Peacock & S Kerry, Oxford University Press, 2007. ISBN 0-19-859966-8
- Medical Statistics: A textbook for the health sciences, D Machin, MJ Campbell & SJ Walters, Wiley & Sons, 2007. ISBN 978-0-470-02519-2
- Planning a Study
When a study is undertaken, inconclusive results may arise because of avoidable flaws. These flaws can be in the:
- Study design
- Data analysis
The latter is less serious than the former, since flaws in the analysis are correctable.
Flaws in the study design may make it impossible to salvage anything of use from the research undertaken.
The aim of this section is to provide a framework for designing a research study. Different study designs are introduced and some published research is considered within the practical session.
- Defining and documenting the research question
The research study should be designed to answer a specific research question. A careful review of existing published and unpublished data should justify the need to undertake the research necessary to answer the question.
The question should be specific enough to be answerable. For example, questions such as "How do I cure diabetes?" or "What are the problems associated with diabetes?" are too broad.
An acronym that is sometimes used to help make sure a question is specific is PICO. This stands for Patient, Intervention, Comparison, Outcome.
Whilst this cannot be applied to all research questions, it is a useful breakdown that may assist. An explanation of each part is given below:
P: What is the Patient or Problem or Population? - Describe the group or groups to which the question applies.
I: Intervention or exposure or test being considered.
C: Is there a comparative intervention? Or perhaps the comparison is with healthy individuals.
O: What is the outcome measure? What are you trying to identify differences in?

Research questions can be categorized according to whether the question relates to measurements:
a) From a single group.
(i) What is the prevalence of pre-school asthma?
(ii) What is the average height of 5 year olds?
(iii) What are the population centiles of blood pressure in healthy 20-30 year olds?
(iv) What is the average height of 5 year olds in Cornwall?
(v) What is the likely prognosis for newly diagnosed lung cancer patients?
Sometimes the single group is measured more than once under different conditions. For example, a crossover (or within patient) trial of treatment versus placebo where each patient is measured after a course of treatment and again after a placebo period. The order of treatment/placebo should be randomized and there may need to be a washout period between to ensure that the patient is similar at the commencement of each study phase.
b) Compared between two or more groups.
Comparisons of diseased and healthy individuals
The ways in which they differ may highlight current differences that indicate particular needs, or differences in past practice that may have led to disease (potential causality).
They may be given the same test to evaluate its predictive value (diagnostic study).
(vi) Do children with epilepsy have different average BMIs compared to non-epileptic children?
(vii) Are children who develop epilepsy between 5 and 15 years of age more likely to have been prescribed antibiotics in the first year of life?
(viii) Are fetuses with head circumferences above the 99th centile at any ultrasound scan more likely to have a congenital abnormality? (i.e., is a large head circumference diagnostic of congenital abnormality?)
Comparison of diseased individuals given different treatments (or treatment versus none)
(ix) Do steroids improve lung function in mild to moderate asthmatics?
(x) Does increasing patient contact with or without further therapy improve skin condition for patients with severe eczema?
The groups of individuals may be individually matched to ensure that they are similar with respect to specified features. For example, each individual with disease may be matched to someone without disease of the same age and sex so that any differences found cannot be due to age and sex differences between the groups.
Sometimes it is not obvious that more than one group is required.
(xi) Do diabetic adults have lower average blood pressures than healthy adults?
(xii) Are children who develop diabetes shorter than average at age 5?
Although it would be possible to compare values with established reference ranges, it is better to have two groups measured concurrently.
Cost-effectiveness studies may be used in either case i.e. to determine how much cost can be attributed to having/treating a disease or to compare the relative costs of different treatments.
- Target and Sample Populations
For study results to be generalisable, the study should ideally be performed on a random sample of the relevant individuals.
However, usually the research question relates to a large population of individuals (for example, those with diabetes or asthma) and even though this may be reduced or refined within the specific research question (e.g., adult male diabetics, or 1-5 year old severe asthmatics), the population will still be too wide-ranging to truly obtain a random sample from it. What tends to happen is that a random sample of the local eligible population is taken and inferences drawn from this. Sometimes a multi-centre study is used to enlist a random sample from a wider (but still not completely comprehensive) area.
Statistical analyses allow valid inferences to be made from a random sample to the population that the random sample was selected from. For the inferences to be valid, the sample must be random. If a random sample is taken of adult male diabetics from the local health authority district, then statistical inference can only be made about adult male diabetics within that district and the results of any statistical analyses apply only to that population.
To make statements about a wider population than that randomly sampled from requires clinical inference. This is the process whereby the researcher argues that results found in the local population will also apply elsewhere. So, for example, a treatment may be trialled and found to be effective in local adult male diabetics and it may be inferred (clinically) that there is no reason why it should not therefore be effective for similar adult male diabetics in other districts (indeed even in other countries). One important step in this process would be to decide what 'similar' constitutes. The original trial should ideally give enough information for a comparable population to be identified elsewhere.
Sampling bias occurs if some members of the eligible population are more likely to be included in the sample than others. If this happens then the sample is not random and statistical inferences cannot validly be drawn directly, although it may be possible to adjust for the bias. Such adjustment can only give unbiased answers if the process by which the bias arose is known.

For example, suppose a population consists of 50% males and 50% females, and that 50% of males respond positively to a question compared to only 10% of females. The purpose of the study is to estimate the prevalence of positive response in the population irrespective of gender (this will be 30%), but males are much more likely than females to respond. Hence in our sample the estimate of positive response will be too large, since it is inflated by the relatively large proportion of male respondents. If we know that the non-responders were mostly female, we can take into account the extent to which females (who are less likely to respond positively) are under-represented in the sample and adjust our estimate accordingly.
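The adjustment described above amounts to reweighting each subgroup back to its known population share (post-stratification). A minimal sketch using the hypothetical numbers from the example, with an assumed 80% male sample standing in for the over-response of males:

```python
# Post-stratification sketch (hypothetical numbers from the text).
# Population: 50% male, 50% female; 50% of males and 10% of females
# respond positively, so the true prevalence is 30%.
pop_share = {"male": 0.5, "female": 0.5}

# Within-group positive-response rates as observed among responders
# (assumed here to equal the true rates for each sex).
observed_rate = {"male": 0.5, "female": 0.1}

# Assume males are over-represented among responders: 80% of the sample.
sample_share = {"male": 0.8, "female": 0.2}

# Naive estimate: weight each group's rate by its share of the sample.
naive = sum(sample_share[g] * observed_rate[g] for g in sample_share)

# Adjusted estimate: reweight each group back to its population share.
adjusted = sum(pop_share[g] * observed_rate[g] for g in pop_share)

print(round(naive, 2), round(adjusted, 2))  # 0.42 0.3
```

The naive estimate (42%) over-states the prevalence because male respondents dominate the sample; reweighting to the known 50/50 population split recovers the true 30%.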
- Refusal to Participate
If subjects are approached and then refuse to enter the study, as much information as possible should be recorded about them. It will then be possible to indicate the extent to which the study sample represents the target population. The possible implications for the study results can be discussed. For example:
(i) Ref: Turnbull FM et al. The Australian Measles Control Campaign, 1998. Bulletin of the World Health Organization, 2001 79(9).
Of 1601 parents randomly selected for interview to determine vaccination coverage and to investigate associations between socioeconomic variables and vaccination uptake, 43% could not be contacted. It is possible that the responses received from the remaining 57% might not be representative. The non-contactable group were probably more transient and therefore less likely to be vaccinated; this would mean that an estimate of uptake gained from the group who were contacted would over-estimate the percentage coverage in the whole population. In this study, information was available on the age and sex of the children and whether they were from a metropolitan area. The contacted and non-contacted groups had children of similar age and sex but a larger proportion of the non-contactable individuals were from metropolitan areas.
(ii) Ref: Osaki Holm S et al, Comparison of two azithromycin distribution strategies for controlling trachoma in Nepal. Bulletin of the World Health Organization, 2001, 79(3)
In this study, treatment refusals were rare and not recorded. The authors however estimate that well over 95% of the children offered treatment received it.
- Comparison Between Groups
Most studies are concerned with comparing two or more groups of individuals or the same individuals under two or more different conditions.
There will ultimately be a comparison of some outcome measurement between treatment or disease groups. The aim is to see whether the outcome differs between groups, and ideally we would like to be able to attribute any differences seen to the grouping variable (e.g., disease status, treatment presence or type).
Although the way in which groups are selected and/or followed up may vary according to the selected study design, the aim is always the same:
We want the groups (disease/healthy; those given different treatments) to be as alike as possible with respect to other factors that may influence outcome.
If we achieve this, then we can be sure that any differences in outcomes found between treatment or disease groupings are not due to other factors.
Often we will have to tailor study collection to ensure such similarity. For example we may select groups of individuals matched for age and sex.
- Confounding Factors
A background factor is something that we are not directly interested in. For example, when comparing IgG levels in various ethnic groups we are not interested in age, sex, social class, eating habits and so on.
Sometimes background factors can get in the way and make addressing the research question more complicated.
Background factors for which...
- The groups differ on the background factor AND
- The background factor itself influences outcome
…are known as confounding factors
Confounding factors 'get in the way' of the comparison between groups that we want to make. Confounding is defined as "a situation in which the effects of two processes are not separated". The word comes from the Latin 'confundere' which means 'to mix together'.
a) Suppose IgG levels differ between ethnic groups but some ethnic groups tend to be older and we know that IgG is also associated with age; we don't know if differences in IgG are due to age differences or ethnicity, since:
- The (ethnic) groups differ on the background factor (age) AND
- The background factor (age) itself influences outcome (IgG level)
age is a confounding factor.
Both criteria need to be fulfilled for age to be a confounder: if the ethnic groups differed in their age distributions but age did not affect IgG level (criterion (2) not satisfied), then age would not get in the way of our comparison of IgG levels in different ethnic groups.
Similarly, if age affected IgG levels, but the ethnic groups were of the same age (criterion (1) not satisfied), there would be no problem.
b) In a study to compare respiratory compliance in preterm infants who require assisted ventilation with those who do not, birthweight may be a confounding factor in that it may:
- Differ between the groups AND
- Influence outcome (respiratory compliance)
c) In a randomized controlled trial to compare two treatments for eczema, severity of rash at presentation will be a confounder if:
- Patients allocated to one treatment tended to have worse severity at presentation AND
- Severity at presentation is linked to the final outcome (severity after treatment).
The effect of confounding can be avoided by appropriate study design, or by adjusting for these factors in the analysis. Sometimes studies are designed specifically to avoid potential confounders becoming actual confounders by ensuring that criterion (1) is not satisfied. For example, if age-matched ethnic groups were selected, then any differences in IgG level between the ethnic groups could not be due to age differences and age could not be a confounding factor in our comparison of IgG levels.
When planning a study, it is not always clear which factors are potential confounders. If there is doubt as to whether a factor is a confounder or not then information should be collected on it so that adjustment can be made if necessary. For example if we collect details of the birthweights and a measure of initial disease severity for the examples (b) and (c) given above, then we may be able to correct for differences between groups (assisted ventilation/ not and treatment 1/treatment 2 respectively) in the analysis. If the birthweight or severity information is not recorded then it will be impossible to discount their influence on the results or to correct for any influence they may have.
Whatever type of comparative study is undertaken, whether it is observational or experimental, ideally the groups being compared should not differ in ways that may affect outcome, apart from the grouping variable (usually disease or treatment). It will then be possible to attribute differences in outcome to that grouping variable.
Any feature that differs between the groups and is associated with outcome will act as a confounder.
Failure to recognise confounders can lead to wrong conclusions. Confounding factors can mask associations or create spurious ones.
- Types of Study
There are many different types of study. A particular research question may often be approached using a variety of study designs. Although each different design may give an answer to the question, some designs are less open to confounding (see Confounding Factors above) and hence we can be more sure that any effects seen are due to the variables we are interested in (usually treatment or disease status).
Studies can be classified according to:
a) Whether they are based on a single sample or compare two or more groups of individuals
This will relate to the number of groups and/or measurements outlined in the research question.
b) Whether they are observational or experimental
- with an observational study there is no intervention; the researcher merely observes what happens in one or more groups of individuals/items.
- with an experimental study the researcher applies different treatments or interventions to individuals (either different individuals per treatment/intervention/condition or the same individual measured at different times under different conditions) and then observes the outcome (i.e., they change what is happening rather than merely observing).
The choice of random samples to compare and avoidance, as far as possible, of confounding are equally important for both experimental and observational studies.
c) Whether they are prospective or retrospective
- with a prospective study the measurements are made at the time of the study.
- with a retrospective study, there is an element of recall of previously made assessments.
The distinction between prospective and retrospective studies is not as clear-cut as the distinction between one or more groups, or between experimental and observational designs, because a study may be essentially prospective but collect some information retrospectively.
Some types of study are quite common and have specific names...
Common types of observational studies:
Historical comparison: In some studies, data may be compared with some previously known value. For example, the weights of a group of children with diabetes may be compared to weight centiles (which were based on a healthy group of randomly selected children).
If the comparison group consists of infants previously tested or treated in the same centre as the proposed new group, they are sometimes referred to as historical controls. For example, the adverse event rate when administering yellow fever vaccination at 6 months to all newly presenting infants could be compared with what has been recorded previously when the vaccination was given at 9 months of age. The group of infants vaccinated at 9 months would be called historical controls.
These types of study are rarely justified as they are so prone to confounding. Because the comparison group (whether summarized as a reference range or not) was sampled previously, there will probably be confounding factors that have not been measured and cannot be accounted for in the analysis. It is always preferable to collect the relevant information from the comparison group concurrently with the group of interest.
Ecological: The unit of analysis is a population rather than an individual and association across different populations is investigated. For example, an ecological study may look at the association between prematurity and childhood cancer rates in different countries to see whether those countries with higher prematurity rates also have higher levels of childhood cancers.
Case-control: Individuals with disease and a healthy group are compared with respect to things that they have done differently in the past that may have led to disease. This is the most common type of retrospective study and may be open to recall bias (whereby those with disease are perhaps more likely to recall potentially causal agents than those without disease in the absence of any true differences in previous practice).
Cross-sectional: Individuals in two groups (usually those with and without disease) are compared with respect to their current habits. The aim is to identify features that may be causally associated with disease.
Cohort: Currently healthy individuals are classified according to some feature (such as drinking alcohol or not, smoking or not, eating a high fat diet or not) and followed forward in time to see whether one group is more likely to develop disease.
Common types of experimental study:
Randomised controlled trial (RCT): A trial in which a treatment is compared to some control and allocation is made using randomisation. The control may be an alternative active treatment, no treatment, or a placebo.
It is important that allocation is random and not systematic otherwise bias in the groups may be introduced.
Crossover trial: An RCT in which individuals are randomised to receive the different treatments (or placebo and treatment) in random order (as opposed to different groups of individuals receiving the two treatments, or treatment and placebo).
The design is only practical if the treatment is ongoing (for example, not a surgical procedure) and the underlying disease state is fairly stable. In addition, there should be no likelihood of treatment in the first period 'carrying-over' into the second, and this may be reduced, if necessary, by introducing a 'wash-out' period.
Crossover trials may be very efficient, since each individual acts as their own control and there is less chance of hidden confounding factors. The difference in outcome between the period on treatment and the comparison period should be calculated for each individual and these differences used in the analysis, which should also examine for period and carry-over effects.
Occasionally, different treatments or interventions can be simultaneously tested and evaluated on the same individual. This will be even more efficient than a crossover trial. The design is appropriate when the treatment or intervention is for eyes, legs, arms, kidneys etc. (that is, anything of which an individual has at least 2), and the treatments/interventions will not interact. The units (eyes, kidneys, etc.) should be randomly allocated to groups. For example, if all left eyes were assigned to treatment 1 and all right eyes to treatment 2, then the allocation would be systematic and not random.
- Removal of Confounding
Observational studies are more likely to be confounded. Information may not have been collected on the confounders, and some may not even have been considered as potential confounders. For this reason, causality cannot be ascribed in observational studies. However, this does not mean that it is not still useful to aim to identify potential confounders so that they can be avoided via design (for example by selecting pairs of individuals who are similar with respect to potential confounders) or via adjustment in subsequent analyses. We must always be aware that hidden confounders may well remain.
With experimental studies causality can be ascribed as confounding can theoretically be removed. Some things that can be done to remove confounding are:
a) The use of placebo controls.
b) Sham treatment periods so that those in the intervention group do not receive more attention regardless of additional treatment.
c) Blinding: whereby individuals, their assessors, carers etc. are not aware which treatment group the individual is in.
d) Randomisation of individuals to treatments: Randomisation may be simple (like tossing a coin), but this will often lead to unequal group sizes and does not ensure that confounders are removed, merely that they will be balanced on average. These issues are particularly problematic for small sample sizes. Specific confounders can be avoided by the use of:
i) Pair matching prior to randomization
ii) Stratified randomization
iii) Minimization incorporating a random element.
e) Allocation concealment - whereby group allocation is not known prior to patient recruitment.
These aspects are discussed in more detail in the following sections.
The comparison group for a new treatment is often known as a control group. This is either a group that receives no active treatment or receives the current standard. Hence a control group might actually be a set of individuals with disease who are given intense treatment. The appropriate control group to use will depend on the research question being asked. For example, "Is the new treatment doing anything?" will require a different control group to "Is the new treatment better than the current standard?"
Every attempt should be made to ensure that the groups are treated as similarly as possible apart from the treatment difference. This may involve the use of placebo injection or tablet. Both groups should have equal amounts of clinician and other care time.
If the subject is aware of the treatment that they are receiving, then this may have an impact on outcome. For example, there may be a psychological benefit to receiving an active treatment (as opposed to placebo) or the new treatment ('new' generally being perceived as 'better' regardless of the current equipoise allowing a randomised trial to be undertaken). Alternatively, those who are not randomised to new and/or active treatment, having been informed of the options during the consent process, may feel aggrieved and this may influence their decision to complete their allocated treatment.
Hence, knowledge of treatment may influence behaviour and/or outcome and create a difference between the treatment groups that is unrelated to treatment efficacy per se. Therefore, it is always preferred that the patient/study participant is unaware of the treatment group to which they have been allocated within a randomised trial. A trial in which the subject (patient) is unaware of their treatment group is known as a blind trial.
Biased assessments may also result when those responsible for treatment and/or evaluation are aware of the allocated intervention or the disease group.
A study is described as double blind when both the subject and the clinician or researcher are unaware of the treatment allocation. Studies in which only the subject (patient) is unaware of the treatment group are described as single blind. Occasionally a study will be undertaken in which the patient knows the treatment group but the assessor/clinician does not; this would also be known as single blind. Sometimes the clinician administering treatment is different from the individual making assessments; the study may then be triple blind, with neither patient, nor assessor, nor clinician being aware of treatment.
Whatever the scenario, the important point to note is that:
If anyone (patient, assessor, carers, clinicians etc.) is aware of the treatment group to which an individual in a randomised trial has been allocated, then this may potentially influence or bias the outcome of the study. This influence or bias is best avoided if at all possible. This avoidance is achieved by blinding the study.
Whether a study can be blinded or not may depend on cost and ethics as well as feasibility. For example, if a trial treatment for babies consists of several injections of drug, then placebo injections, whilst feasible, may be deemed unethical. The pharmacy department within the hospital may often be responsible for blinded allocation of treatments.
Studies which rely on either historical or non-randomly allocated comparison groups have the potential to provide distorted estimates of the effectiveness of an intervention.
In some types of study, random allocation may not be possible. For example:
- It is not possible to allocate disease status at random
- It would be neither ethical nor feasible to consider randomising mothers to smoking or non-smoking during pregnancy.
However, in studies where it is possible, random allocation of individuals to groups removes the effects of individual choice and outside biases. One important, but often disregarded, possible bias is the investigator's conscious or subconscious desire for a particular patient to receive a particular treatment.
Sometimes systematic assignment to groups is used instead of proper randomisation. Common examples are:
- Using different days for different treatments,
- Assigning odd and even birthdates to different groups
- Assigning patients in turn to each treatment.
The problem with these is that the investigator knows which treatment his/her next patient will receive, which may influence their decision to enter them into a trial.
Tactics like assigning patients according to the first letter of their surname may introduce confounding factors into the study (all the Joneses will be in one group, the MacTavishes in another). Similarly, choosing to treat the first 5 mice removed from a box of 10 may result in the fattest, oldest and slowest mice being treated. Random does not mean haphazard.
Apart from differences in treatment or intervention, we want the comparison group to be as similar as possible to the treated group. Adequate randomisation is often no more difficult than the problematic techniques used in the examples given. For example, it would not take much more effort to assign numbers to the 10 mice in the box, choose 5 numbers (between 1 and 10) at random and then select the corresponding 5 mice for treatment.
There are many ways to randomly allocate individuals to treatment groups. Randomisation can be tailored to ensure that groups are of similar size and/or have a similar distribution with respect to potential confounders. The sections below describe the main forms of random allocation.
Simple randomisation: This is the most elementary form of randomisation and is equivalent to tossing a coin. On a practical level an allocation list may be constructed using random number tables or a random number generator within a computer package. For example, odd numbers may be used to indicate allocation to group 1 and even numbers to indicate allocation to group 2.
One problem with simple randomisation is that it may result in an imbalance in the numbers assigned to each group even if the overall sample sizes are relatively large. For example, if 200 individuals are randomly allocated into two groups using simple randomisation then there is approximately a 14% chance that one of the groups will contain more than 110 of the individuals; if only 40 individuals are randomised then there is a similar (15%) chance that one group will contain more than 60% (25 or more) of the individuals. Such imbalances can be avoided by using block randomisation.
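Imbalance probabilities of this kind can be checked with an exact binomial tail calculation; the sketch below uses only the standard library.

```python
from math import comb

def p_imbalance(n, k):
    """Chance that, when n individuals are allocated to two groups by
    simple randomisation (a fair coin), one of the two groups ends up
    with at least k members (requires k > n / 2)."""
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return 2 * tail  # either group could be the large one

# 200 individuals: chance one group contains more than 110 (at least 111)
print(round(p_imbalance(200, 111), 2))  # about 0.14
# 40 individuals: chance one group contains more than 60% (at least 25)
print(round(p_imbalance(40, 25), 2))  # about 0.15
```

Doubling the single tail is valid here because the two tails (group 1 large, group 2 large) cannot overlap when k exceeds half the sample.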
Block randomisation: This guarantees that at no time will the imbalance be large and at certain points the numbers in each group will be equal. For example, if blocks of size 4 are chosen to allocate individuals into two groups (A & B), then there are 6 possible sequences within each block of 4:
AABB, ABAB, ABBA, BABA, BAAB, BBAA
One of these blocks is chosen at random and the next 4 individuals allocated accordingly. The process is repeated until the required sample size is obtained.
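A minimal sketch of this scheme in Python, assuming blocks of size 4 and two groups labelled A and B (the function name is our own):

```python
import itertools
import random

def block_randomisation(n_subjects, block_size=4, seed=None):
    """Build an allocation list from randomly chosen permuted blocks.

    Each block contains equal numbers of A and B, so the imbalance
    between the groups can never exceed half a block."""
    rng = random.Random(seed)
    half = block_size // 2
    # The distinct orderings of 'AABB' -- six of them for blocks of size 4
    blocks = sorted(set(itertools.permutations("A" * half + "B" * half)))
    allocations = []
    while len(allocations) < n_subjects:
        allocations.extend(rng.choice(blocks))
    return allocations[:n_subjects]

alloc = block_randomisation(20, seed=1)
print("".join(alloc), "| A:", alloc.count("A"), "B:", alloc.count("B"))
```

With 20 subjects and blocks of 4, the final list always contains exactly 10 As and 10 Bs.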
Either of the above methods of randomisation may result in an imbalance in potential confounding factors. For example, suppose we want to know the effect of primrose oil in the treatment of eczema. We may randomly allocate primrose oil or a placebo treatment to our eczema patients and, by chance, end up with the placebo group tending to be younger, or having more severe disease, or more often being male, etc. If any of these factors is associated with the outcome (improvement or change in eczema reported) then the imbalance will make the results of the trial difficult to interpret, and is best avoided.
Stratification: Separate randomisation lists may be used for each potentially confounding subgroup i.e. separate lists for different age bands, or severity groups, or for males and females. Within each subgroup, block randomisation should be used. Note that using simple randomisation within each subgroup would be no different to using simple randomisation without subgrouping.
If there are two or more potential confounders then separate lists could be used for each combination, for example males and females within each age band. However, this may result in an unmanageable number of lists and the numbers within each list may be small.
Minimisation: This method of allocation achieves balance on a set of variables although not for each combination. It is especially suitable for smaller trials or where there are a number of potential confounders. The individuals are considered in turn and the randomisation is weighted so that the most likely group allocation will minimise differences between the groups with respect to the potential confounders. The first individual is allocated using simple randomisation (i.e., a 50:50 chance of being in either group). The weightings used are updated as each additional individual is assigned to the groups.
The following example is taken from Altman 'Practical Statistics for Medical Research':
Breast cancer patients are to be randomised to receive either mustine or talc as a treatment for pleural effusions. There are 4 potential confounders and for each of these the possible values are divided into two groups:
- Age (years): ≤ 50 or > 50
- Stage of disease: (1 or 2) or (3 or 4)
- Time from diagnosis of cancer to diagnosis of effusions (months): ≤ 30 or > 30
- Menopausal status: Pre or Post
Suppose that after 29 patients the numbers in each subgroup are as shown:

| Factor | Level | Mustine (n=15) | Talc (n=14) |
|---|---|---|---|
| Age (years) | ≤50 | 7 | 6 |
|  | >50 | 8 | 8 |
| Stage | 1 or 2 | 11 | 11 |
|  | 3 or 4 | 4 | 3 |
| Time interval (months) | ≤30 | 6 | 4 |
|  | >30 | 9 | 10 |
| Menopausal status | Pre | 7 | 5 |
|  | Post | 8 | 9 |
We now wish to enter into the trial a patient with the following characteristics:
57 years old; stage 3; time interval 22 months; postmenopausal. The numbers of women with this patient's characteristics who are already in the two treatment groups are shown below:
| Factor (this patient's level) | Mustine (n=15) | Talc (n=14) |
|---|---|---|
| Age >50 | 8 | 8 |
| Stage 3 or 4 | 4 | 3 |
| Time interval ≤30 months | 6 | 4 |
| Menopausal status Post | 8 | 9 |
| Total | 26 | 24 |
As we wish to have the two groups as similar as possible, the preferable treatment for the new patient is that with the smaller total. Here we would use weighted randomisation with a weighting in favour of talc. For example, we might use a weighting of 4 to 1, so there is an 80% chance that the patient receives talc and only a 20% chance that they receive mustine.
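The weighted-allocation step for this patient can be sketched as follows, with the subgroup counts hard-coded from the table above and the illustrative 4:1 weighting (the variable names are our own):

```python
import random

# Numbers already in each group who share the new patient's characteristics
# (aged >50, stage 3 or 4, time interval <=30 months, postmenopausal)
counts = {
    "mustine": {"age >50": 8, "stage 3-4": 4, "time <=30m": 6, "post": 8},
    "talc":    {"age >50": 8, "stage 3-4": 3, "time <=30m": 4, "post": 9},
}

# Sum over the four minimisation factors for each group
totals = {group: sum(factors.values()) for group, factors in counts.items()}
print(totals)  # {'mustine': 26, 'talc': 24}

# Prefer the group with the smaller total, using a 4:1 weighting
preferred = min(totals, key=totals.get)
other = max(totals, key=totals.get)
allocation = random.choices([preferred, other], weights=[4, 1])[0]
print("preferred:", preferred, "| allocated:", allocation)
```

After this patient is allocated, the counts table would be updated before the next patient is considered.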
When using minimisation for allocation there are several criteria to specify/consider:
- The number of variables to minimise with respect to
- How to categorise continuous minimisation factors - number and cut-points
- Whether some minimisation variables are more important than others and should therefore be weighted more heavily
- How heavily to bias in favour of the preferred group
At the Institute of Child Health we have developed a minimisation/simulation package which allows the effect of changing these criteria to be quantified in terms of the likely resultant discrepancies between groups. This package (SiMiN) also provides automatic allocation of patients according to specified minimisation criteria.
Cluster randomisation: For practical or ethical reasons it may be necessary to form clusters (or groups or sets) of subjects and assign each cluster, as a whole, to either the study or comparison group using any of the above randomisation methods. If the cluster is retained as the unit of analysis then no further adjustment is required. However if individuals within clusters are to be the units of analysis then account needs to be taken of the non-independence of individuals within each cluster.
(i) A trial wishes to assess the effectiveness of putting anti-smoking posters in schools on the uptake of smoking amongst schoolchildren. Suppose 10 schools are assigned to the intervention and 10 schools act as controls; the prevalence of smoking after 4 years could then be compared between groups using schools as the unit of analysis. Note that, taking this approach, the total sample size for statistical purposes is 20 schools, regardless of the number of children within each of the schools. However, some account may need to be taken in the analysis of the numbers of children on which each of the school estimates is based.
(ii) Ref: Osaki Holm S et al, Comparison of two azithromycin distribution strategies for controlling trachoma in Nepal. Bulletin of the World Health Organization, 2001, 79(3).
In order to compare the effectiveness of mass treatment versus targeted treatment of only those children found to be clinically active, it was necessary to perform randomisation by ward. Outcome was the prevalence with which individual children had clinically active trachoma and infection six months after the treatment period. The children were clustered within ward.
The key points to note are that because of the greater similarity of individuals or items within a cluster than between clusters the total sample size, for statistical purposes, is NOT the total number of individuals or items and that the clustered nature of the data MUST be taken account of in any analysis.
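One standard way to quantify this, offered here as a supplementary sketch rather than something described in the text above, uses the intraclass correlation coefficient (ICC): the "design effect" 1 + (m - 1) × ICC inflates the variance for clusters of size m, so the effective sample size is the total divided by this factor. The numbers below are purely illustrative:

```python
def design_effect(cluster_size, icc):
    """DEFF = 1 + (m - 1) * ICC for equal-sized clusters of size m."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent individuals the clustered sample is 'worth'."""
    return n_total / design_effect(cluster_size, icc)

# Purely illustrative: 20 schools of 100 children, modest within-school ICC
print(design_effect(100, 0.05))                       # ~5.95
print(round(effective_sample_size(2000, 100, 0.05)))  # far fewer than 2000
```

Even a small ICC can shrink the effective sample size dramatically, which is why the clustering must be reflected both in the sample size calculation and in the analysis.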
In the worst case scenario, one cluster (a district, hospital or ward etc.) is given one treatment or intervention and one other cluster (another district, hospital or ward etc.) is given some alternative treatment or intervention. In this case it is usually impossible to determine whether any observed difference is due to the treatment or intervention or merely to between-cluster differences that may have occurred anyway (even if a baseline measure indicated similarity). Hence this is a very bad research design and should never be used.
If it is necessary to allocate groups of individuals collectively to treatments, then there should be at least several groups randomly allocated to each treatment arm. The exact number of clusters to allocate should be formally decided using appropriate sample size estimations.
Randomisation in practice
Consideration also needs to be given to the practicalities of randomisation. The process used should be concealed so that allocation cannot be inferred before randomisation takes place (known as allocation concealment). Ideally, randomisation should take place off-site, be carried out by someone unconnected with the rest of the study, and be fully automated.
Note that if participants are required to give consent to entering the trial then this should be obtained prior to randomisation unless there are strong ethical reasons why this is not possible.
Randomisation in groups
Note that it may be necessary to randomise treatments or interventions within groups. For example, suppose increasing dosages of some preparation are to be added to an assay and compared to control assays prepared on the same days/at the same times. The control assays allow the effect of some potential confounders (e.g. temperature, light intensity) to be accounted for. However, if all assays at the smallest dose are prepared first, then all at the next largest, and so on until the largest dosage is studied last, there may be a 'learning' effect that is difficult to distinguish from any genuine trend associated with increasing dose. If it is possible to randomise the order of the dosages (rather than running each dose 'en bloc') then this potential problem is removed.
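A minimal sketch of randomising the run order, assuming four illustrative dosages with three replicates each:

```python
import random

# Planned assay runs: three replicates at each of four illustrative dosages
runs = [dose for dose in [0, 1, 2, 4] for _ in range(3)]

rng = random.Random(7)
rng.shuffle(runs)  # randomise the run order so that dose is not confounded
                   # with time-related effects such as learning or drift
print(runs)
```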
- Intention to Treat Analysis
In some instances, individuals are randomised to treatments that they do not complete (for example, they fail to take prescribed tablets) but do provide outcome data. For analysis purposes, these individuals should be considered to have remained in the group to which they were originally randomised. This is known as an intention-to-treat analysis.
If they are removed from the comparison, or their grouping is changed from their allocated to their actual treatment (or lack of) prior to comparison, then the results will be biased.
For example, when comparing treatment to placebo, some of those in the treatment arm may fail to comply with the treatment regime. It may be tempting to analyse them alongside the placebo group (who also did not have treatment), but this may introduce bias. The group who did not comply may have had side effects or found the regime too taxing or have recovered prior to the treatment being needed. Hence the bias could go in either direction.
If some do not adhere to their allocated treatment a per-protocol analysis may be presented. This is a comparison of only the subgroups that adhered to their allocated treatment. Those who did not adhere fully to their allocated treatment are omitted from the analyses.
A per protocol analysis should ideally be ADDITIONAL to the intention-to-treat analysis, which should be given if at all possible.
- Competing Study Designs
A particular research question may often be approached using a variety of study designs. Although each different design may give an answer to the question, some designs are less open to confounding and hence we can be more confident that any effects seen are due to the variables we are interested in (usually treatment or disease status).
For example, to determine whether smoking is causally associated with lung cancer various studies were undertaken:
(1) Historical comparisons would look at how levels of lung cancer changed as smoking levels increased.
(2) Ecological studies would show a relationship between levels of smoking and lung cancer rates in different countries/regions.
(3) Cross-sectional studies would show that those with lung cancer are more likely to be current smokers.
(4) Case-control studies would select a group of lung cancer patients and a group of healthy controls to see how they differed in previous behaviours (i.e. smoking).
(5) A cohort study would select groups of currently healthy smokers and non-smokers and follow these forward in time to see whether one group was more likely to develop lung cancer.
(6) A randomised controlled trial would randomly allocate healthy individuals to smoke or not and then see who developed lung cancer.
This latter study would of course be unethical!
In all of the study types (1) to (6) there is the potential for confounding; for example, the smokers may be more likely to drink alcohol. In the observational studies (1) to (5) there is more likelihood of confounding factors that may not even be considered or measured. It may be that the tendency to smoke is associated with some other factor (such as alcohol consumption or diet) that is the true cause of lung cancer, in which case smoking would be falsely declared the causal agent. The RCT (6) is not immune from confounding, but it is less likely that there will be hidden confounders.
If a potential confounder such as alcohol consumption is known about, then it can be dealt with by suitable design. For the observational studies this will involve selecting groups that are similar with respect to alcohol consumption. For the RCT, the randomisation can be stratified according to alcohol consumption, minimisation can be used (incorporating alcohol consumption as a minimisation factor), or randomisation to smoking or not can take place within pairs matched for alcohol consumption.
- Losses to Follow-Up
Representative samples may be chosen initially but some individuals may not complete the treatment or return for follow-up assessment. The individuals who do not provide information may be biased in some way and this could complicate interpretation of the results.
(i) Ref: Kapur et al. Iron status of children aged 9-36 months in an urban slum integrated child development services project in Delhi. Indian Pediatrics, 2002, 39, 136-144.
Mothers lost to follow-up (n=44) had lower ferritin levels and attended fewer antenatal visits than the 58 women who were available for the 3-month follow-up.
(ii) Ref: McAlister et al. Attitudes towards war, killing, and punishment of children among young people in Estonia, Finland, Romania, the Russian Federation, and in the USA. Bulletin of the World Health Organisation, 2001, 79(5), 382-387.
Study samples were selected to reflect diverse populations, represent average socioeconomic levels in the city concerned and to cover a range of ethnic groups. Response rates varied from 0.67 to 0.81.
As much information as possible should be collected on those for whom outcome is not recorded, and the groups who do and do not provide outcome data should be compared with respect to the available information. It may then be possible to ascertain ways in which the group providing information may not be representative and temper the interpretation of the generalisability of the results.
It is often useful and informative to present a flowchart that details the exclusions, drop-outs and losses to follow up. Examples are given below.
1) Juul-Kristensen et al. Motor competence and physical activity in 8-year old school children with generalized joint hypermobility. Pediatrics. 2009; Vol 124 (5), 1380-1387.
The aim of this study was to determine the prevalence of GJH in Danish primary school children. All children in the second grade in a midsize Danish municipality were invited to participate via a letter to their parents. The flow chart shows how the number approached (524) was reduced to the 349 who were used in the final analyses:
2) Watson et al. Clinical and economic effects of iNO premature newborns with respiratory failure at 1 year. Pediatrics. 2009; Vol 124 (5), 1333-1343.
This study is a 1-year follow up of infants participating in a randomized controlled trial. The flowchart shows the attrition in each arm of the trial:
- Written Study Protocols
All research should follow a written protocol. A protocol has a similar structure to a grant application or an ethics committee submission. It should:
(1) Clearly specify the RESEARCH QUESTION that is to be addressed.
(2) Give a scientific BACKGROUND to the problem, justifying the need for this research. Describe the TARGET POPULATION.
(3) Give precise details of the DESIGN of the proposed study and the METHODS used to collect the data. In particular:
- How the sample(s) will be chosen from the target population(s). State any exclusion criteria.
- How refusers will be treated (any data recorded)
- What will be recorded for each participant and precisely how will this information be collected (laboratory methods, questionnaires etc.). State the form that the outcome measures will take and be precise about who will make these measurements and when. For example, 'recordings will be made of disease severity before and after treatment' is not sufficient, whereas 'disease severity will be assessed by the consultant in charge as either very bad, bad, average, good or very good just prior to treatment and 6 months after commencement', is acceptable.
(4) Contain a detailed description of the conduct and TIMETABLE for the work proposed. This will clarify whether sufficient time has been allotted to the various stages of the study. It may reveal an over-ambitious subject accrual rate, given the likely subject availability.
(5) Give details of the required SAMPLE SIZE and the STATISTICAL METHODS to be used. It should be clear how the analysis will address the research question. If information has been collected on potential confounders, then it should be stated how these will be incorporated into the analysis. Describe the policy for dealing with subjects who fail to complete the study.
(6) Outline the research and POLICY IMPLICATIONS of the possible findings of the study, together with details of how and to whom the results will be disseminated.
Plan of Investigation
The stages which an investigation should follow are given in the boxes below.
- Sample Size and Pilot Studies
The larger the sample, the greater the confidence with which we can make inferences about the population. For example, in a study to determine the prevalence of visual defects in pre-school children, an estimate obtained from a sample of 1000 children will be more precise than that obtained from a sample of only 200.
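The gain in precision can be sketched with the usual normal-approximation confidence interval for a proportion; the 10% prevalence used below is purely illustrative:

```python
import math

def ci_halfwidth(p, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a
    proportion p estimated from a sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative prevalence of 10%: the larger sample gives a narrower interval
for n in (200, 1000):
    print(n, "+/-", round(ci_halfwidth(0.10, n), 3))
```

Quintupling the sample size shrinks the interval by a factor of sqrt(5), roughly 2.2: precision improves with the square root of n, not with n itself.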
When comparing outcome measures between two groups, larger samples allow us to detect smaller differences between the groups. Too small a sample will probably lead to inconclusive results and may make it harder to justify a larger trial. Proceeding with a study that cannot recruit (usually because of practical or time limitations) a sample large enough to answer the proposed research question may be considered unethical. Conversely, it is also unacceptable to waste time, both the patients' and our own, collecting data from larger samples than necessary.
Note that it is important that adequate sample size is determined at the design stage for planning purposes to ensure that the research question is answerable with the available resources. We do not cover the topic in this course and a further 1-day course is devoted to sample size estimation.
A pilot study IS NOT a small or improperly designed version of a larger trial.
A pilot study IS an initial investigation to give information that will be necessary when designing a future trial or study. For example a pilot may be used to:
- Assess the time required to examine each patient
- Determine the quality of a proposed questionnaire
- Estimate the variability of key variables (for sample size calculations)
There should be an outline of the future study for which the pilot is being used to gather information. As the sample size of a pilot study is seldom sufficient to draw reliable conclusions, the pilot should not be an end in itself.