UCL Great Ormond Street Institute of Child Health

Chapter 5: Making Inferences

The previous chapter considered ways of quantifying differences between groups and quantifying associations between variables. In this section we look at how to make inferences about a population from a sample of data in order to answer research questions.

Statistical Inference

Data should always be collected in order to answer a pre-specified research question. Usually a sample of data is collected in order to make inferences about some larger population.

For example:

We may wish to know whether having asthma during the pre-school years is associated with reduced height at age 16. By observing heights in random samples of 16-year-olds who did and did not have pre-school asthma and looking at the between-group differences, we would hope to be able to make general statements about all children who have suffered from pre-school asthma (or at least about a population wider than the selected sample, which may be racially, socially or geographically restricted).

Statistical inference is the process by which we make inferences from our random sample to the population from which that sample was taken. We use statistical methods to do this. All inferences depend on the sample being randomly selected from the inference population. If the sample is not random then any inferences may be of little, or limited, use.

Data are collected in order to answer a research question. The question usually refers to a large population of individuals, from all of whom it is not practical to collect data.

For example:

  1. Does administering yellow fever immunization at 6 months of age cause more significant adverse events compared to giving it at 9 months?
  2. Do infants born prematurely have impaired reading ability in later life?
  3. Is there a relationship between cord concentration of vitamin A and neonatal anthropometry?
  4. Does administration of FGF alter cell growth patterns?
  5. Does a new surgical procedure lead to reduced morbidity?

To address these research questions, a sample is taken and used to make inferences about the larger population. Ideally, the sample(s) should be randomly selected from the target population(s), that is, the population about which we wish to make inferences. The generalisability of the results will depend on the target population and the selection criteria.

Target and sample populations were discussed previously in Chapter 1.

Standard Error

What a sample tells us

To illustrate the power of a random sample to give information about the population, consider the following example.

Example:

Shown below is the distribution of birthweights of 17,333 babies born consecutively in 3 West London hospitals. The mean birthweight was 3263.57g (7lb 3oz), standard deviation 551.71g (1lb 3oz):

Birth Weights Population Histogram

This is the TOTAL POPULATION of birthweights in those hospitals in the chosen time period. The distribution of birthweights is clearly very close to normal.

Suppose that we had wished to know the average birthweight in the area served by the hospitals over the relevant time period, but were only able to record the weights of 30 random births.

A random sample of 30 birthweights (chosen by computer) from the births at the hospitals is shown below:

First Random Sample Table

We can show this sample as a histogram:

First Random-Sample Histogram

This sample of 30 birthweights randomly selected from the 17,333 has a mean of 3379.0g and standard deviation 596.22g.

If we had taken only this sample from the population then our BEST GUESS of the average birthweight of that population is given by the MEAN OF THE SAMPLE (i.e., 3379g). This guess is unlikely to be completely accurate; we would not expect a sample of 30 to give us the exact mean of the total population of 17,333. However, we do have a better idea of the mean in the population after having sampled the 30 values.

If we repeat the process and take another sample of 30 babies then we will have another estimate of the population mean. The mean of the second sample is highly unlikely to be the same as that in the first sample. The means of the samples from repeating the process will vary. To illustrate the variability of sample means obtained by repeatedly taking samples of the same size (in this case 30 birthweights), the process was repeated 5 more times and the results are shown below.

Sample Histogram 2

Sample Histogram 3

Sample Histogram 4

Sample Histogram 5

Sample Histogram 6

The sample means vary, sometimes over-estimating the population mean (for example, samples 1,3,4,5 and 6) and sometimes under-estimating (for example, sample 2). It is very unlikely that we would ever obtain the true population mean exactly from a sample of 30.

However, if we consider where the sample means lie on the distribution of population birthweights, we can see that the sample estimates are actually relatively near to the population mean:

Multiple Samples Plotted Against Population

A sample as small as 30 from this population of 17,333 is actually very informative about the mean of the population. The sample means cluster around the true 'population' mean and are just as likely to over-estimate as to under-estimate it.

If we continued to take samples of size 30, we could form a distribution of the sample means that we obtain. This distribution would be normally distributed around the population mean and less spread out than the individual birthweights.

Standard error

A general result is that if the population has a mean µ and standard deviation σ, then the sample means of samples of size n will be normally distributed with mean µ and standard deviation

σ/√n

The standard deviation of the sample means is known as the standard error. This is a measure of how precisely any one sample is likely to estimate the population mean.

NOTE that the standard deviation describes the spread of a sample; standard error is a measure of the precision with which the sample statistic approximates the true population value. The standard error must, by definition, become smaller as the sample size increases.
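The behaviour of repeated sample means can be checked by simulation. The following is a minimal sketch, assuming a synthetic normal population with the mean and standard deviation quoted above (the real birthweight data are not available here); the seed and the number of repeated samples are arbitrary choices.

```python
import random
import statistics

# Assumption: a synthetic normal population standing in for the real birthweights.
POP_MEAN, POP_SD = 3263.57, 551.71
N_POP, SAMPLE_SIZE, N_SAMPLES = 17333, 30, 2000

rng = random.Random(1)  # fixed seed so the run is repeatable
population = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N_POP)]

mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# Draw many random samples of 30 and record the mean of each one.
sample_means = [
    statistics.fmean(rng.sample(population, SAMPLE_SIZE))
    for _ in range(N_SAMPLES)
]

theoretical_se = sigma / SAMPLE_SIZE ** 0.5
observed_se = statistics.stdev(sample_means)

print(f"population mean       {mu:.1f}")
print(f"mean of sample means  {statistics.fmean(sample_means):.1f}")
print(f"sigma / sqrt(n)       {theoretical_se:.1f}")
print(f"SD of sample means    {observed_se:.1f}")
```

The last two printed values should be close to each other: the spread of the sample means is the standard error.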

Big CLT Diagram

Population Distribution and Sample Means

Larger samples will give a smaller standard error and hence a less spread out distribution. The less spread the distribution, the closer any one sample mean is likely to be to the population mean.

The STANDARD ERROR is the name given to the precision with which a sample estimate approximates the population value. The standard error is not an estimate of any quantity in the population, but is a measure of the uncertainty of a single sample value as an estimate of the population value. 

In real life situations we would only have one estimate of the population mean from our single random sample.

We do not usually know the population standard deviation, so we estimate it using the standard deviation of the sample. The estimate from the sample can be assumed to be adequate if:

  • The sample is at least approximately normally distributed and
  • The sample consists of at least 20 measurements.

Even when we have a large sample that is approximately normally distributed we know that our estimate is still unlikely to be exactly correct. However, any error in the sample estimate of the population standard deviation will be reduced when it is divided by the square root of the sample size in construction of the standard error estimate. If the sample is large, then the erroneous value is divided by a relatively large number and any error will be diminished further than if a small sample were used. Hence, there is a trade-off between the degree of non-normality that can be tolerated before the sample estimate becomes too inaccurate and the size of the sample.

For example:

The 6 random samples of 30 birthweights selected by the computer had standard deviations 596.17, 527.11, 520.72, 550.91, 569.87 and 543.18, which are up to 44.46g away from the true value of the population standard deviation (551.71g).

These 6 samples, based on 30 measurements each, lead to standard error estimates of 108.85, 96.24, 95.07, 100.59, 104.05 and 99.17 respectively, all within 8.12g of the true value.

Hence the errors in the standard error estimates are smaller than the errors in the standard deviation estimates. If the samples used had been larger, then the standard deviation estimates would have been divided by a larger quantity and the errors in the standard errors reduced further.
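The arithmetic in this example can be reproduced with a short sketch. It uses only the six sample standard deviations and the population standard deviation quoted above; the variable names are illustrative.

```python
import math

POP_SD = 551.71   # population standard deviation (g), from the text
N = 30            # size of each sample
sample_sds = [596.17, 527.11, 520.72, 550.91, 569.87, 543.18]

true_se = POP_SD / math.sqrt(N)

for sd in sample_sds:
    se = sd / math.sqrt(N)
    print(f"SD {sd:7.2f} (error {sd - POP_SD:+6.2f})   "
          f"SE {se:6.2f} (error {se - true_se:+5.2f})")

# Largest absolute error on each scale.
max_sd_error = max(abs(sd - POP_SD) for sd in sample_sds)
max_se_error = max(abs(sd / math.sqrt(N) - true_se) for sd in sample_sds)
print(f"largest SD error {max_sd_error:.2f}, largest SE error {max_se_error:.2f}")
```

Dividing by the square root of 30 shrinks every error by the same factor, which is why the standard error estimates sit much closer to their true value.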

Confidence Intervals

The means of repeated samples of the same size taken from a population are normally distributed around the population mean, and hence (as previously shown) 95% of the sample means will lie in the interval (mean ± 1.96 standard errors).

Distribution of Sample Means

Because 95% of the sample means lie within 1.96 se of the population mean, 95% of the time we will choose a random sample whose mean is less than, or equal to, 1.96 standard errors away from the population mean.

Hence, an interval spanning 1.96 se either side of our sample mean will contain the population mean for 95% of all random samples:

Confidence Intervals Diagram

The interval (sample mean ± 1.96 standard errors) is a 95% confidence interval for the population mean.

For 95% of samples, the 95% confidence interval will contain the population mean.

BUT we only have one sample and one confidence interval. We will not know whether this interval contains the population mean or not. We only know that for 95% of random samples it will.

NOTE: For 5% of random samples the interval will NOT contain the population mean.
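The "95% of random samples" statement can be illustrated by simulation. This is a sketch under stated assumptions: samples of 30 are drawn from a normal population with the birthweight mean and standard deviation quoted earlier, and each interval uses the sample standard deviation to estimate the standard error. The seed and number of repetitions are arbitrary.

```python
import random
import statistics

POP_MEAN, POP_SD = 3263.57, 551.71   # assumed normal population
SAMPLE_SIZE, N_SAMPLES = 30, 2000

rng = random.Random(2)  # fixed seed so the run is repeatable
covered = 0
for _ in range(N_SAMPLES):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / SAMPLE_SIZE ** 0.5
    # Does this sample's interval contain the population mean?
    if mean - 1.96 * se <= POP_MEAN <= mean + 1.96 * se:
        covered += 1

coverage = covered / N_SAMPLES
print(f"{coverage:.1%} of the intervals contain the population mean")
```

The proportion should be close to 95%. With samples as small as 30 it tends to fall slightly short, because the standard error is itself estimated from each sample.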

Usually a 95% confidence interval is given. We are 95% confident that the 95% confidence interval contains the population parameter.

The edges of the confidence interval are known as confidence limits.

(i) A 99% confidence interval is constructed by taking the interval:

(Sample mean ± 2.58 standard errors)

We are 99% confident that this interval will contain the population mean.

(ii) An 80% confidence interval is constructed by taking the interval:

(Sample mean ± 1.28 standard errors)

We are 80% confident that this interval will contain the population mean.

Note that the more confident we are that the interval contains the population value the wider the confidence interval.
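The multipliers 1.28, 1.96 and 2.58 come from the standard normal distribution. A minimal sketch using Python's standard library (the function name z_multiplier is illustrative):

```python
from statistics import NormalDist

def z_multiplier(confidence: float) -> float:
    """Number of standard errors either side of the estimate
    for a two-sided interval with the given confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} interval: estimate ± {z_multiplier(level):.2f} standard errors")
```

Higher confidence levels give larger multipliers and therefore wider intervals.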

For example:

Ref: Landau H et al, Cross-sectional and longitudinal study of the pituitary-thyroid axis in patients with thalassaemia major, Clinical Endocrinology, 1993; 38, 55-61.

Ferritin levels (µg/l) were measured in 37 patients with confirmed β-thalassaemia major. The mean value was 2745, standard deviation 1030.

Standard error = 1030/√37 = 169.33

A 95% confidence interval for the population mean is given by:

(2745 ± 1.96(169.33)) = (2745 ± 331.89) = (2413, 3077)

A 99% confidence interval for the population mean ferritin level is wider (and more likely to contain the true population value) and is given by:

(2745 ± 2.58(169.33)) = (2745 ± 436.87) = (2308, 3182)

For normal individuals ferritin levels rarely rise above 100µg/l, hence using either confidence interval it can be seen that mean ferritin levels appear to be grossly elevated in this group of patients.
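The ferritin intervals can be reproduced from the three summary values alone. A minimal sketch (mean_ci is an illustrative helper, not part of any library):

```python
import math

def mean_ci(mean: float, sd: float, n: int, z: float = 1.96):
    """Return (standard error, lower limit, upper limit) for a sample mean."""
    se = sd / math.sqrt(n)
    return se, mean - z * se, mean + z * se

# Ferritin example: n = 37, mean 2745, standard deviation 1030.
se, lower95, upper95 = mean_ci(2745, 1030, 37)
_, lower99, upper99 = mean_ci(2745, 1030, 37, z=2.58)

print(f"standard error: {se:.2f}")
print(f"95% CI: ({lower95:.0f}, {upper95:.0f})")
print(f"99% CI: ({lower99:.0f}, {upper99:.0f})")
```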

Confidence intervals attach a level of precision to a sample estimate to help facilitate interpretation. By considering the confidence limits (ends of the confidence interval), the researcher is made aware of the extreme population scenarios with which the sample is compatible.

For example:

The sample of 37 ferritin levels allowed an estimate to be made with 95% confidence that the average level for the population lay somewhere between 2413 and 3077. If this interval is too wide to be clinically useful then the sample size could be increased, leading to the standard error being reduced and hence a narrower confidence interval.

Standard errors and confidence intervals for other population parameters

Standard errors for other population parameters

For most of the population estimates mentioned so far, a standard error can be estimated from the sample data and this can be used as a measure of precision of the estimate and to construct a confidence interval. The exceptions are the median, centiles, inter-quartile range and difference in medians. Where a valid standard error can be calculated, taking an interval of 1.96 standard errors either side of the sample estimate will give a 95% confidence interval.

Sample estimates of standard errors for population parameters and their validity are given below. For a single sample, n is the sample size; for comparisons of 2 samples, n1 and n2 are the sample sizes.

1. Single mean

Standard error = S/√n, where S is the sample standard deviation

This formula is valid if:

  1. The sample is approximately normally distributed AND
  2. n > 20.

2. Difference between two means

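Standard error = Sp × √(1/n1 + 1/n2), where Sp = √(((n1 - 1)S1² + (n2 - 1)S2²)/(n1 + n2 - 2)) is the pooled standard deviation of the two samples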

This formula is valid if:

  1. The samples are approximately normally distributed AND
  2. n1 > 20 and n2 > 20 AND
  3. S1 and S2 are approximately equal (one not more than twice as large as the other)

3. Single proportion

Standard error = √(p(1 - p)/n), where p is the sample proportion

This formula is valid if:

  1. 0.1 < p < 0.9 AND
  2. n > 20

4. Difference between two proportions

Standard error = √(p1(1 - p1)/n1 + p2(1 - p2)/n2), where p1 and p2 are the sample proportions

This formula is valid if:

  1. 0.1 < p1 < 0.9 and 0.1 < p2 < 0.9 AND
  2. n1 > 20 and n2 > 20

5. Single percentage

Standard error = √(p(100 - p)/n), where p is the sample percentage

This formula is valid if:

  1. 10 < p% < 90 AND
  2. n > 20

6. Difference between two percentages

Standard error = √(p1(100 - p1)/n1 + p2(100 - p2)/n2), where p1 and p2 are the sample percentages

This formula is valid if:

  1. 10 < p1% < 90 and 10 < p2% < 90 AND
  2. n1 > 20 and n2 > 20

For relative risk (RR), odds ratios (OR) and correlation coefficients, it is only possible to calculate standard error on a transformed scale. The scale involved is the natural log. The natural log of a value, X, is written ln(X).

7. Relative Risk

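Standard error of ln(RR) = √(1/a - 1/n1 + 1/c - 1/n2), where a and c are the numbers of events observed in groups of size n1 and n2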

8. Odds ratio

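Standard error of ln(OR) = √(1/a + 1/b + 1/c + 1/d), where a, b, c and d are the four cell counts of the 2 × 2 table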

9. Correlation coefficient, r

The transformed value is z = ½ ln((1 + r)/(1 - r)), which has standard error 1/√(n - 3)

This formula is valid if the distributions of both variables are approximately normal.

Note that although the population parameter may vary, the principle is always the same:

  • The sample is used to calculate a measure of expected precision which is known as the standard error.
  • The standard error is always directly affected by the sample size (it is smaller, indicating greater precision, for larger samples).
  • The standard error is then used to construct a confidence interval by taking the appropriate number of standard errors either side of the sample estimate or some transformation of it.

Confidence intervals for other population parameters

For estimates 1-6 (means, proportions and percentages), a 95% confidence interval can be constructed by taking the sample estimate ± 1.96 standard errors.

Examples:

(i) Ref: Elsherif et al. Indicators of a more complicated clinical course for pediatric patients with retropharyngeal abscess. International Journal of Pediatric Otorhinolaryngology, 2010: 74; 198-201.

115 patients with a smooth clinical course (SCC) had average hospital duration of 5.4 days (S=2.9).

The standard error for the mean hospital duration is 2.9/√115 = 0.27.

A 95% confidence interval for the mean hospital duration in this group of patients is therefore (5.4 ± 1.96(0.27)) = (5.4 ± 0.53) = (4.87, 5.93 days)

(ii) Ref: As (i)

Patients with a complicated clinical course (CCC) were compared to those with a smooth clinical course (SCC). The 115 with SCC had an average hospital duration of 5.4 days (sd 2.9) compared to an average 7.6 days (sd 4.4) for the 15 with CCC.

The difference in mean hospital stay of 2.2 days has standard error:

Pooled standard deviation = √((114 × 2.9² + 14 × 4.4²)/128) = 3.10

Standard error = 3.10 × √(1/115 + 1/15) = 0.85

Hence, a 95% confidence interval for the difference in average stay is given by:

(2.2 ± 1.96(0.85)) = (2.2 ± 1.7) = (0.5, 3.9 days)

A 99% ci for the difference is given by (2.2 ± 2.58(0.85)) = (0.007, 4.393 days)
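Examples (i) and (ii) can be checked with a few lines of code. This is a sketch that assumes the standard error of the difference is based on a pooled standard deviation, an assumption that reproduces the value 0.85 used above; the function names are illustrative.

```python
import math

def se_mean(s: float, n: int) -> float:
    """Standard error of a single mean."""
    return s / math.sqrt(n)

def se_diff_means(s1: float, n1: int, s2: float, n2: int) -> float:
    """Standard error of a difference in means, using a pooled SD (assumed)."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def ci(estimate: float, se: float, z: float = 1.96):
    """Confidence interval: estimate plus or minus z standard errors."""
    return estimate - z * se, estimate + z * se

# (i) Smooth clinical course: n = 115, mean 5.4 days, S = 2.9.
se_scc = se_mean(2.9, 115)
ci_scc = ci(5.4, se_scc)

# (ii) Difference between CCC (n = 15, mean 7.6, S = 4.4) and SCC.
se_diff = se_diff_means(2.9, 115, 4.4, 15)
ci_diff = ci(7.6 - 5.4, se_diff)

print(f"(i)  SE = {se_scc:.2f}, 95% CI = ({ci_scc[0]:.2f}, {ci_scc[1]:.2f}) days")
print(f"(ii) SE = {se_diff:.2f}, 95% CI = ({ci_diff[0]:.1f}, {ci_diff[1]:.1f}) days")
```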

(iii) Ref: Dai et al. Time trends in oral clefts in Chinese newborns: data from the Chinese national birth defects monitoring network. Birth Defects Research (Part A), 2010; 88;41-47.

Of 6961 non-syndromic births between 1996 and 2005 with some form of clefting, 976 had a cleft palate alone (i.e. no cleft lip).

This is a proportion 976/6961 = 0.140 with standard error:

√(0.140 × 0.860/6961) = 0.004

Hence a 95% ci for the proportion with cleft palate alone is:

(0.14 ± 1.96(0.004)) = (0.14 ± 0.0078) = (0.132, 0.148)

(iv) Ref: As (iii)

Of the syndromic births, a higher proportion (279/1172 = 0.238) were cleft palate only.

The difference in proportions of 0.098 (=0.238 - 0.140), has standard error:

√(0.238 × 0.762/1172 + 0.140 × 0.860/6961) = 0.013

And 95% confidence interval (0.098 ± 1.96(0.013)) = (0.098 ± 0.025) = (0.073, 0.123).

(v) Ref: As (iii)

The percentage of non-syndromic births between 1996 and 2005 with some form of clefting who had a cleft palate alone (i.e., no cleft lip) is 0.140 × 100 = 14% with standard error:

√(14 × 86/6961) = 0.4

Hence a 95% ci for the percentage with cleft palate alone is:

(14 ± 1.96(0.4)) = (14 ± 0.78) = (13.2, 14.8%)

(vi) Ref: As (iii)

Of the syndromic births 23.8% were cleft palate only.

The difference in percentages of 9.8% has standard error:

√(23.8 × 76.2/1172 + 14 × 86/6961) = 1.3

And 95% confidence interval (9.8 ± 1.96(1.3)) = (9.8 ± 2.5) = (7.3, 12.3%).
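The standard errors in examples (iii) and (iv) can be recomputed from the raw counts. A minimal sketch with illustrative function names; the percentage versions in (v) and (vi) are the same values multiplied by 100.

```python
import math

def se_proportion(p: float, n: int) -> float:
    """Standard error of a single proportion."""
    return math.sqrt(p * (1 - p) / n)

def se_diff_proportions(p1: float, n1: int, p2: float, n2: int) -> float:
    """Standard error of the difference between two proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Counts quoted from Dai et al. in the examples above.
p_non = 976 / 6961    # non-syndromic births with cleft palate alone
p_syn = 279 / 1172    # syndromic births with cleft palate alone

se_non = se_proportion(p_non, 6961)
ci_non = (p_non - 1.96 * se_non, p_non + 1.96 * se_non)
se_diff = se_diff_proportions(p_syn, 1172, p_non, 6961)

print(f"(iii) p = {p_non:.3f}, SE = {se_non:.3f}, "
      f"95% CI = ({ci_non[0]:.3f}, {ci_non[1]:.3f})")
print(f"(iv)  difference = {p_syn - p_non:.3f}, SE = {se_diff:.3f}")
print(f"(v)   as a percentage: {100 * p_non:.1f}%, SE = {100 * se_non:.1f}")
```

Because the code works from unrounded proportions, interval limits computed this way can differ in the third decimal place from hand calculations that round the standard error first.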

For estimates 7-9, the estimate is transformed appropriately and a 95% confidence interval on the transformed scale is constructed by taking the transformed estimate ± 1.96 standard errors on the transformed scale. The confidence limits can then be back-transformed to give a confidence interval on the original scale.

Back transformation is the converse of the transformation we originally used. For the natural logarithmic transformation (ln), taking the exponential back-transforms the data:

Natural Log and Exponential Diagram

This is analogous to squaring the original data and then square-rooting those values to get back to the original.

Square and Square Root Diagram

The table and graph shown below present numerically and graphically how some values change when they are naturally logged or exponentiated.

Notice that the natural logarithm is not defined for zero or negative values. Values between 0 and 1 yield negative natural log values, and values above 1 give positive natural log values.

In addition, the exponentiated value is always greater than zero. Negative values give exponentiated values less than 1; positive values give exponentiated values greater than 1.

28a_ln_exp_functions_table

28b_ln_exp_functions_graph
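The same behaviour can be seen with a few lines of code. A minimal sketch using Python's math module (round_trip is an illustrative helper):

```python
import math

def round_trip(x: float) -> float:
    """Log-transform x, then back-transform with the exponential."""
    return math.exp(math.log(x))

for x in (0.1, 0.5, 1.0, 2.0, 10.0):
    print(f"x = {x:5.1f}   ln(x) = {math.log(x):+.3f}   "
          f"exp(ln(x)) = {round_trip(x):.1f}")

# The natural log is undefined for zero and negative values:
# math.log raises ValueError in both cases.
for bad in (0.0, -1.0):
    try:
        math.log(bad)
    except ValueError:
        print(f"ln({bad}) is not defined")

# Exponentiated values are always positive, even for negative inputs.
print(f"exp(-2) = {math.exp(-2):.3f}, exp(2) = {math.exp(2):.3f}")
```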

Examples

(vii) Ref: Kiani et al. Prevention of soccer-related knee injuries in teenaged girls. Archives of Internal Medicine, 2010; 170(1):43-49.

Players were either given a training program of exercises designed to reduce knee injury or acted as controls with no intervention. The main outcome was the number of new knee injuries in the following 9 month period. The intervention group sustained 3 injuries within 66,981 player hours compared to 13 in the controls during 66,505 player hours. These equated to rates of 0.044 and 0.195 per 1000 player hours respectively, which gives a relative risk of 0.044/0.195 = 0.226.

The standard error of the logged RR, ln(0.226) = -1.49, is given by:

29_lnRR_example

95% ci for ln(RR) = (-2.708, -0.272); 99% ci for ln(RR) = (-3.09, 0.114)

95% ci for RR, 0.226, is (0.07, 0.76); 99% ci for RR = (0.0455, 1.12)
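The back-transformation step in this example can be sketched as follows. The limits on the log scale are taken as given from the text rather than recomputed, and back_transform is an illustrative name.

```python
import math

def back_transform(lower_ln: float, upper_ln: float):
    """Exponentiate confidence limits calculated on the natural log scale."""
    return math.exp(lower_ln), math.exp(upper_ln)

# Injury rates per 1000 player hours, from the counts quoted above.
rate_intervention = 3 / 66981 * 1000
rate_control = 13 / 66505 * 1000
rr = rate_intervention / rate_control

# Confidence limits for ln(RR) as given in the text.
ci95 = back_transform(-2.708, -0.272)
ci99 = back_transform(-3.09, 0.114)

print(f"RR from unrounded rates: {rr:.3f}")
print(f"95% CI for RR: ({ci95[0]:.2f}, {ci95[1]:.2f})")
print(f"99% CI for RR: ({ci99[0]:.4f}, {ci99[1]:.2f})")
```

The relative risk computed from the unrounded rates is slightly larger than the 0.226 obtained from the rounded rates 0.044 and 0.195.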

(viii) Ref: Kabir et al. Active smoking and second-hand-smoke exposure at home among Irish children, 1995-2007; Archives of Disease in Childhood; 2010: 95, 42-45.

The prevalence of active smoking fell from 17.1% to 15.5% between 1995 and 1998. The corresponding odds were therefore 0.206 and 0.183, and the odds ratio comparing years was 0.89.

Ln(0.89) = -0.12 and the standard error of this is given by:

30_lnOR_example

Hence a 95% confidence interval for ln(OR) = -0.12 ± 1.96(0.1146) = (-0.3446, 0.1046)

Exponentiating these limits gives a 95% ci for the OR, 0.89, of (0.71, 1.11)
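The odds ratio example can be followed step by step in code. This is a sketch that takes the standard error 0.1146 from the text as given, since the underlying sample sizes are not quoted here.

```python
import math

def odds(p: float) -> float:
    """Convert a proportion to odds."""
    return p / (1 - p)

odds_1995 = odds(0.171)
odds_1998 = odds(0.155)
odds_ratio = odds_1998 / odds_1995

SE_LN_OR = 0.1146  # standard error of ln(OR), as given in the text

ln_or = math.log(odds_ratio)
lower = math.exp(ln_or - 1.96 * SE_LN_OR)
upper = math.exp(ln_or + 1.96 * SE_LN_OR)

print(f"odds: {odds_1995:.3f} and {odds_1998:.3f}")
print(f"OR = {odds_ratio:.2f}, ln(OR) = {ln_or:.2f}")
print(f"95% CI for OR: ({lower:.2f}, {upper:.2f})")
```

The interval includes 1, so the sample is compatible with no change in the odds of active smoking between the two years.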

(ix) Ref: Oliviero et al. Effects of long-term L-thyroxine treatment on endothelial function and arterial distensibility in young adults with congenital hypothyroidism. European Journal of Endocrinology (2010), 162:289-294

The correlation between Flow Mediated Dilation (FMD) and pubertal mean TSH in 32 patients with congenital hypothyroidism was -0.81:

Scatter Plot of FMD Against Pubertal Mean TSH

32_correlation_SE_CI_calculation
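A confidence interval for this correlation can be sketched in code. The sketch assumes the transformation is Fisher's z, that is z = ½ ln((1 + r)/(1 - r)) with standard error 1/√(n - 3); this is the usual log-based method for correlations, but it is an assumption about the calculation shown here. The function name is illustrative.

```python
import math

def correlation_ci(r: float, n: int, z_mult: float = 1.96):
    """Confidence interval for a correlation via Fisher's z (assumed method)."""
    fisher_z = math.atanh(r)        # equals 0.5 * ln((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)       # standard error on the transformed scale
    lower_z = fisher_z - z_mult * se
    upper_z = fisher_z + z_mult * se
    # Back-transform the limits to the correlation scale.
    return math.tanh(lower_z), math.tanh(upper_z)

lower, upper = correlation_ci(-0.81, 32)
print(f"95% CI for r = -0.81 with n = 32: ({lower:.2f}, {upper:.2f})")
```

The interval is not symmetric about -0.81 because it is symmetric on the transformed scale, not on the original one.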

Confidence intervals when standard error cannot be estimated

The previous sections gave formulae for standard errors and the conditions under which they are valid. When the samples are not large enough to make the necessary calculations, or the sample distributions are not suitable, alternatives must be used.

Numeric data

In Chapter 3 we saw that medians offered a better alternative than means when data were skewed. The previous section of this chapter showed that calculation of the standard error for the mean or difference in means was also dependent on having sample sizes of greater than 20.

Hence if the data are not normally distributed and/or the samples are relatively small, the median, or difference in medians, should be used as a summary. As with any sample estimate of a population parameter, medians (or differences) should always be presented with a confidence interval to show their precision and the range of population scenarios that the sample is compatible with.

The calculation of confidence intervals for medians is given in Chapter 8, where the inferences that can be made using non-parametric statistics are discussed.

Categorical data

The calculation of standard errors for proportions, percentages and their differences required the proportions/percentages to be not extreme and for them to be based on samples of at least 20.

If this is not the case, then there are formulae that can be used to calculate the confidence limits directly and these are valid even for extreme values and small samples. If the samples are not small and the proportions/percentages are not extreme, these formulae will give very nearly the same limits as using sample estimate ± 1.96 standard errors. Hence the following formulae can be used whenever a confidence interval is required for a single proportion or percentage, or the difference between two proportions or percentages.

1 SAMPLE: small samples and extreme proportions

For a 95% confidence interval, calculate:

A = 2r + 1.96²
B = 1.96 × √(1.96² + 4r(1 - r/n))
C = 2(n + 1.96²)

Where r is the number of events observed in a sample of size n.

A 95% confidence interval for the population proportion is given by:

(A - B)/C to (A + B)/C

When there are zero events (r=0), then A=B and the confidence interval has lower limit zero and upper limit 3.84/ (n + 3.84)

Examples:

(i) Out of a sample of 15 children, 4 test positive for asthma. This is a proportion 0.2667 (=4/15) or 26.67%

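A = 2(4) + 3.84 = 11.84; B = 1.96 × √(3.84 + 16(1 - 4/15)) = 7.74; C = 2(15 + 3.84) = 37.68

A 95% confidence interval for the population proportion is therefore (11.84 - 7.74)/37.68 to (11.84 + 7.74)/37.68 = (0.109, 0.520), or (10.9%, 52.0%).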

(ii) A particular problem is to calculate a confidence interval when the percentage is zero. It is not uncommon to find statements in the literature like 'none of the x children in our study suffered side effects .... Therefore, the treatment is safe'. This logic is wrong: the fact that none of the x children suffered side effects does not mean that the treatment is completely free from such effects.

What is needed is a confidence interval for the percentage that have side effects i.e., the range of population scenarios that zero out of x is compatible with. The formula given above can be used to find this.

Ref: WF Paterson, E McNeill, S Reid, AS Hollman, MDC Donaldson. Efficacy of Zoladex LA (goserelin) in the treatment of girls with central precocious or early puberty. Archive of Disease in Childhood, 1998; 79, 323-327.

Twelve (12) girls were treated and there were no serious side effects. A 95% confidence interval for the population proportion who would suffer side effects is from zero to 3.84/(12+3.84) = 0.242

So, (0, 24.2%) forms a 95% confidence interval for the population percentage.

i.e., if just under a quarter of all such girls given the treatment suffered serious side effects, then we would not be very surprised to find no occurrences in our random sample of 12 girls.

The Excel spreadsheet linked below will use the above methodology to calculate confidence intervals for single proportions and percentages when samples are small and/or the proportion or percentage is extreme.

The spreadsheet can be used to calculate confidence intervals other than 95% and can also be validly used to calculate confidence intervals for larger samples and non-extreme proportions/percentages.

 

Example:

Entering the numbers 4 and 15 in the spreadsheet tells us that a 95% confidence interval for the proportion with asthma (4/15 or 0.2667) is (0.11, 0.52) or, alternatively, for the percentage is (10.90, 51.95%).

The 90% confidence interval (12.58, 47.88%) is, as expected, narrower, and the 80% interval narrower still at (14.89, 43.05%).

If the numbers were doubled at 8 cases out of 30, then the proportion remains unchanged at 0.2667 but the 95% confidence interval is narrower at (0.14, 0.44).
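For readers without the spreadsheet, the same calculation can be sketched in code. This is an assumption-laden stand-in, not the spreadsheet itself: it implements the A, B, C method described above with 1.96 replaced by the normal multiplier for the chosen confidence level, and the function name proportion_ci is illustrative.

```python
import math
from statistics import NormalDist

def proportion_ci(r: int, n: int, confidence: float = 0.95):
    """Confidence interval for a proportion of r events out of n,
    valid for small samples and extreme proportions."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    a = 2 * r + z ** 2
    b = z * math.sqrt(z ** 2 + 4 * r * (1 - r / n))
    c = 2 * (n + z ** 2)
    # Guard against tiny floating-point overshoots at r = 0 or r = n.
    return max(0.0, (a - b) / c), min(1.0, (a + b) / c)

for level in (0.95, 0.90, 0.80):
    lower, upper = proportion_ci(4, 15, level)
    print(f"4/15, {level:.0%} CI: ({100 * lower:.2f}, {100 * upper:.2f}%)")

lower, upper = proportion_ci(8, 30)
print(f"8/30, 95% CI: ({lower:.2f}, {upper:.2f})")

lower, upper = proportion_ci(0, 12)
print(f"0/12, 95% CI: ({lower:.3f}, {upper:.3f})")
```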

2 SAMPLES: Small samples and extreme proportions

For a 95% confidence interval for the difference, first calculate the limits for each sample separately as given in the previous section.

If the confidence limits for the proportion in group 1 are l1 and u1, and for group 2 they are l2 and u2, then a 95% confidence interval for the difference is given by:

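D - √((p1 - l1)² + (u2 - p2)²) to D + √((u1 - p1)² + (p2 - l2)²)

where p1 and p2 are the sample proportions and D = p1 - p2 is the difference between them.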

Example:

(i) Ref: Yung M et al. Outcome of cardiopulmonary resuscitation in hospitalized African children. Journal of Tropical Pediatrics, 2001, 47, 108-109.

None of 52 admissions with cardiorespiratory arrest died (0%) compared to 8 of the 54 respiratory arrests (14.8%).

37_small_sample_extreme_prop_difference_CI_example
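The two-sample calculation can be sketched in code for this example. The sketch assumes the difference interval combines the single-sample limits as D - √((p1 - l1)² + (u2 - p2)²) to D + √((u1 - p1)² + (p2 - l2)²), with D = p1 - p2; the function names are illustrative, and the difference is taken as group 1 (cardiorespiratory arrest) minus group 2 (respiratory arrest).

```python
import math

Z = 1.96  # multiplier for a 95% interval

def single_limits(r: int, n: int, z: float = Z):
    """Small-sample confidence limits for one proportion (r events out of n)."""
    a = 2 * r + z ** 2
    b = z * math.sqrt(z ** 2 + 4 * r * (1 - r / n))
    c = 2 * (n + z ** 2)
    return max(0.0, (a - b) / c), min(1.0, (a + b) / c)

def difference_ci(r1: int, n1: int, r2: int, n2: int, z: float = Z):
    """Difference p1 - p2 with confidence limits built from each group's limits."""
    p1, p2 = r1 / n1, r2 / n2
    l1, u1 = single_limits(r1, n1, z)
    l2, u2 = single_limits(r2, n2, z)
    d = p1 - p2
    lower = d - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return d, lower, upper

# 0 deaths out of 52 versus 8 deaths out of 54.
d, lower, upper = difference_ci(0, 52, 8, 54)
print(f"difference = {100 * d:.1f}%, "
      f"95% CI = ({100 * lower:.1f}, {100 * upper:.1f}%)")
```

Reversing the order of the groups changes the sign of the difference and of both limits but not the width of the interval.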

The Excel spreadsheet linked below will use the above methodology to calculate confidence intervals for the difference between two proportions or percentages when samples are small and/or the proportions or percentages are extreme.

The spreadsheet can be used to calculate confidence intervals other than 95% and can also be validly used to calculate confidence intervals for larger samples and non-extreme proportions/percentages.