There are a variety of statistical tests available to formally test normality.
However, it is NEVER possible to prove that a variable is normally distributed, only to show that the sample data is compatible with normality. The formal tests available answer the question:
"Could this data have come from a normal distribution?"
Rather than what we wanted to know, which was…
"Does this data come from a normal distribution?"
- The data may also be compatible with other, distinctly non-normal, distributional shapes.
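- The result of a formal test of normality often depends more on sample size than on distributional shape: small samples rarely give significant evidence of non-normality, while large samples can flag trivial departures from it.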
- Simpler methods are to be preferred.
Simple methods are:
i) Examine a bar chart, or histogram, of the data.
ii) Use the mean and standard deviation of the data values to construct the interval within which 95% of the values would be expected to lie if the data were normally distributed (i.e., the interval mean ± 1.96 standard deviations). This interval should exclude approximately 2.5% of the sample values on either side. If it does not, and/or if the interval has limits that are unfeasible (for example, negative ages), then this implies that the data is non-normal.
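The check described in (ii) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name normal_range_check, the lower_bound argument and the sample values are all hypothetical and are not taken from any dataset discussed here.

```python
import statistics

def normal_range_check(values, lower_bound=None):
    """Compare a sample with the interval mean ± 1.96 sd (hypothetical helper)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    low, high = mean - 1.96 * sd, mean + 1.96 * sd
    n = len(values)
    return {
        "interval": (low, high),
        # Under normality each of these fractions should be roughly 0.025
        "fraction_below": sum(v < low for v in values) / n,
        "fraction_above": sum(v > high for v in values) / n,
        # The lower limit is unfeasible if it is below the smallest possible value
        "lower_limit_feasible": lower_bound is None or low >= lower_bound,
    }

# Made-up, upwardly skew measurements that cannot be negative
sample = [1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 8, 12, 20, 45, 90]
result = normal_range_check(sample, lower_bound=0)
print(result)
```

For this made-up sample the lower limit is negative, so the check flags it as unfeasible even though no value falls below it.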
For example: Shown is the bar chart of the serum bilirubin measurements in 216 patients with primary biliary cirrhosis:
The distribution is distinctly upwardly skew. Putting all 216 values into an appropriate statistical package, the mean value is calculated as 56.43 and the standard deviation as 64.018.
If the data were normally distributed then the interval bounded by (mean ± 1.96 sd) would contain about 95% of the sample values. ('About' because there may be some sampling variability - there would be exactly 95% within this interval in the population but a sample of 216 might yield slightly less or slightly more.)
The interval limits are calculated as:
Mean - 1.96sd = -69.045 and Mean + 1.96sd = 181.905
If the data were normally distributed we would expect approximately 2.5% of the values (about 5.4 of the 216) to be less than -69.045 and a similar number to be greater than 181.905, with most of the values lying in the interval (-69.045, 181.905). However, no values can be lower than zero and there are more than 6 values above the upper limit. The lower limit of -69.045 is biologically unfeasible. These findings indicate that the data is non-normal and, in fact, upwardly skew.
It is not uncommon to find non-normal data summarised using the mean and standard deviation. Some statistical tests are invalid if the data samples are non-normal.
When reviewing publications that have data summarised as the mean and standard deviation, if the interval (mean ± 2 sd) has unfeasible limits then this may indicate that the statistical tests subsequently performed are invalid.
Ref: Rautanen T et al, Randomised double blind trial of hypotonic oral rehydration solutions with and without citrate, Archives of Disease in Childhood, 1994; 70: 44-46.
In the first table, age, duration of vomiting and weight loss have unreasonable lower limits for the interval (mean ± 2 sd). For instance, the average age of the citrate ORS group is 13.5 months with a standard deviation of 6.9 months. If the ages were normally distributed, we would expect about 95% of the patients to have been aged between 13.5 - 2(6.9) and 13.5 + 2(6.9), i.e. between -0.3 and 27.3 months. Clearly an age of -0.3 months is not possible. The implication is that these variables are upwardly skew and the chosen means of presentation and analysis are invalid.
In the second table, weight increase and the durations of vomiting, diarrhoea and stay also appear to be upwardly skew and inappropriately presented.
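The same check can be applied to any published mean and standard deviation. The sketch below reproduces the age calculation for the citrate ORS group; the function name implied_range is a hypothetical helper, and the only figures used are the mean (13.5 months) and standard deviation (6.9 months) quoted above.

```python
def implied_range(mean, sd, k=2.0):
    """Interval mean ± k*sd that would hold about 95% of values under normality."""
    return mean - k * sd, mean + k * sd

# Age (months) in the citrate ORS group, as quoted in the text
low, high = implied_range(13.5, 6.9)
print(round(low, 1), round(high, 1))  # -0.3 27.3

# Age cannot be negative, so a lower limit below zero is unfeasible
print("lower limit feasible:", low >= 0)  # lower limit feasible: False
```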
Upward skew distributions of values are far more common than downward skew. This is because there is often a lower limit (usually zero) below which values cannot fall. What was found in the above example fits in with a plausible clinical scenario. For instance, patients cannot have a negative duration of stay, but it is likely that most stayed for a relatively short period while a few stayed much longer. Those few form an upward tail to the distribution and heavily influence the mean.
Sometimes non-normally distributed data can be transformed to normality.
The transformations used should not change the relative ordering of the values but alter the distance between successively ordered values to change the overall shape of the distribution.
For example: If a dataset is transformed by squaring each of the values, the larger values will be pulled further apart than the smaller values.
- There is a difference of 1 between 2 and 3 prior to transformation; after squaring, 2 becomes 4 and 3 becomes 9, and the difference between the transformed measures is 5 (= 9 - 4).
- There is a difference of 1 between 10 and 11 prior to transformation; after squaring, 10 becomes 100 and 11 becomes 121, and the difference between the transformed measures is 21 (= 121 - 100).
After transformation the higher measurements (10 and 11) are further apart than the smaller (2 and 3).
Squaring data values can therefore be used to normalise downward skew data: by pulling apart the higher measurements, an upward tail is created to balance the downward tail, giving a more symmetric distribution that is closer to normal.
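The arithmetic of the squaring example can be checked directly. The snippet below is only a small illustration using the four values from the text (2, 3, 10 and 11); the variable names are arbitrary.

```python
values = [2, 3, 10, 11]
squared = [v ** 2 for v in values]  # [4, 9, 100, 121]

# Gaps between neighbouring values before the transformation
gap_low_before = values[1] - values[0]    # 3 - 2
gap_high_before = values[3] - values[2]   # 11 - 10

# Gaps between the same pairs after squaring
gap_low_after = squared[1] - squared[0]   # 9 - 4
gap_high_after = squared[3] - squared[2]  # 121 - 100

print(gap_low_before, gap_high_before)  # 1 1
print(gap_low_after, gap_high_after)    # 5 21
```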
There are a variety of transformations that can be used to correct for skewing to a greater or lesser extent. The correct transformation to use will depend on both the direction and extent of skew. It is possible to over-correct by using too powerful a transformation and change the direction of the skew. For example, a small amount of downward skewing might be over-balanced by squaring the measurements and result in an upward skew distribution.
Tukey's ladder of transformations (shown below) gives several common transformations to correct skew in each direction and illustrates the relative effectiveness of these.
For example, the ladder shows that squaring corrects downward skew and that cubing the data gives an even stronger correction; i.e., if we cube rather than square the values then the right hand (higher) values are pulled apart even more, creating a more extreme upper tail.
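The relative strength of squaring and cubing can be illustrated numerically. The sketch below rests on two assumptions that are not part of the text: the sample is made up, and skew is measured by the moment coefficient of skewness (negative for downward skew, positive for upward skew), which is only one of several possible measures.

```python
def skewness(values):
    """Moment coefficient of skewness: negative = downward, positive = upward."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up downward skew sample: most values high, a tail of low values
sample = [1, 5, 7, 8, 8, 9, 9, 9, 10, 10]

skews = {
    "raw": skewness(sample),
    "squared": skewness([v ** 2 for v in sample]),
    "cubed": skewness([v ** 3 for v in sample]),
}
for name, value in skews.items():
    print(f"{name:8s} skewness = {value:.2f}")
```

For this sample the skewness is negative throughout but moves closer to zero at each step, with cubing giving the stronger correction. On a less skew sample the same transformations could over-correct.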
Upwardly skew data is not uncommon in medical applications and many measurements which display upward skewing are what is known as 'lognormally distributed'. When data is lognormally distributed, taking logarithms (or logs) of the data values will normalise the data.
Serum triglyceride values in cord-blood are lognormal:
Choosing a suitable transformation can be a matter of trial and error. Logging corrects upward skew; if data is downwardly skew then logging will make the skewness worse. Downward skew may be corrected (to varying extents) by squaring, cubing or anti-logging.
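Both effects of logging can be seen in a small numerical sketch. The samples are made up, and the moment coefficient of skewness is an assumed measure of skew rather than one prescribed by the text. The upward skew sample is built by exponentiating a symmetric set of values, so logging it recovers a symmetric shape by construction.

```python
import math

def skewness(values):
    """Moment coefficient of skewness: negative = downward, positive = upward."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up upward skew sample: the exponential of a symmetric set of values
upward = [math.exp(z) for z in (-2, -1, -1, 0, 0, 0, 1, 1, 2)]
# Made-up downward skew sample
downward = [1, 5, 7, 8, 8, 9, 9, 9, 10, 10]

up_before = skewness(upward)
up_after = skewness([math.log(v) for v in upward])
down_before = skewness(downward)
down_after = skewness([math.log(v) for v in downward])

print(f"upward skew sample:   {up_before:.2f} -> {up_after:.2f} after logging")
print(f"downward skew sample: {down_before:.2f} -> {down_after:.2f} after logging")
```

Logging removes the skew from the first sample and makes the second sample more strongly downward skew.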
Often it is possible to use a transformation that has some biological basis. For example, taking square roots of areas or cube roots of volumes may be effective. Taking logarithms may not seem intuitive but this transformation is particularly useful when there are different groups to be transformed and compared. The particular properties of logarithmic transformation are illustrated later in this course.
All of these transformations change the magnitude of the data values, some more than others, to reduce the skew. Note that they never change the relative ORDER OF THE VALUES.
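That the ordering is preserved can be checked by comparing ranks before and after each transformation. In this sketch the ranks helper and the measurements are hypothetical. The values are all positive, which logging requires and which also guarantees that squaring keeps the order.

```python
import math

def ranks(values):
    """Rank of each value (0 = smallest); assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result

values = [4.0, 0.5, 9.0, 2.0, 30.0]  # made-up positive measurements

transforms = {
    "square root": math.sqrt,
    "log": math.log,
    "square": lambda v: v ** 2,
    "cube": lambda v: v ** 3,
}

# For each transformation, record whether the ranks are unchanged
order_kept = {
    name: ranks([f(v) for v in values]) == ranks(values)
    for name, f in transforms.items()
}
print(ranks(values))  # [2, 0, 3, 1, 4]
print(order_kept)
```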
Some distributions show skewing so extreme that a large percentage of measurements are at one of the extremes.
For example, psychological tests often consist of a rating scale on which 'normal' people are expected to score zero, with higher scores indicating deviations from 'normal' behaviour or emotions. It is not uncommon to collect a sample which consists mostly of zeros or ones ('normal' people). The result is what is known as a J-shaped distribution.
The J-shape may be skewed to the left or the right depending on whether the majority of measurements are at the lower or upper extreme.
Examples of each type are shown below:
Ref: Thornton A, Morley C, Green S, Cole T, Walker K and Bonnett J, Field trials of the Baby Check score card: mothers scoring their babies at home, Archives of Disease in Childhood, 1991; 66: 106-110.
Figure: Profile of daily scores (n=701). The numbers of babies with each score are shown at the top of each column.
Most babies are completely healthy (implied by a score of 1) and this is shown by the majority of measurements at the lower extreme.
Ref: Tibrewal S and Foss M, Day care surgery for the correction of hallux valgus, Health Trends, 1991; Vol 23 No. 3: 117-119.
Figure: Linear analogue scores for pain relief after Wilson's Osteotomy.
Most individuals had 100% (full) pain relief after the operation.
Since transformations never change the order of the sample, any transformation of a J-shaped distribution will still be J-shaped. The extreme measurements will all transform to the same new value, and will always be at one extreme of the transformed sample.
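A short sketch makes the point concrete. The scores below are made up, and share_at_minimum is a hypothetical helper. Because log(0) is undefined, the logarithmic transformation is applied as log(x + 1), which is an assumption of this example.

```python
import math
from collections import Counter

# Made-up J-shaped sample: 14 of the 20 scores sit at the lower extreme
scores = [0] * 14 + [1, 1, 2, 3, 5, 8]

def share_at_minimum(values):
    """Proportion of the sample equal to its smallest value."""
    counts = Counter(values)
    return counts[min(values)] / len(values)

transforms = {
    "untransformed": lambda v: v,
    "square root": math.sqrt,
    "log(x + 1)": math.log1p,
}

shares = {
    name: share_at_minimum([f(v) for v in scores])
    for name, f in transforms.items()
}
print(shares)  # every transformation leaves 0.7 of the sample at the minimum
```

Whatever the transformation, the tied values stay tied at one end of the distribution, so the spike remains.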
J-shaped distributions cannot be transformed to normality.