Many statistical tools assume that the data is normally distributed. This page gives some information about how to deal with data that is not normally distributed.
Step 1
Check normality with the Anderson-Darling normality test. With a high p-value you can assume that the data is normally distributed. Develve treats data with a p-value above 0.10 as normally distributed. This is on the safe side: some people say that a p-value above 0.05 is enough to assume normality.
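The check in Step 1 can be sketched in Python with only the standard library. This is a minimal illustration, not Develve's implementation: the function name anderson_darling_normal is hypothetical, the p-value uses the commonly cited D'Agostino and Stephens approximation, and the two data sets are made-up quantile samples.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(data):
    """Return (A2, p_value) of the Anderson-Darling test for normality."""
    x = sorted(data)
    n = len(x)
    mu, sigma = mean(x), stdev(x)  # parameters estimated from the sample
    std_normal = NormalDist()
    eps = 1e-12  # clamp the CDF so that log() stays finite
    cdf = [min(max(std_normal.cdf((v - mu) / sigma), eps), 1 - eps) for v in x]
    s = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
            for i in range(n))
    a2 = -n - s / n
    # Small-sample correction, then the piecewise p-value approximation
    a = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    if a < 0.2:
        p = 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a ** 2)
    elif a < 0.34:
        p = 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a ** 2)
    elif a < 0.6:
        p = math.exp(0.9177 - 4.279 * a - 1.38 * a ** 2)
    elif a <= 13:
        p = math.exp(1.2937 - 5.709 * a + 0.0186 * a ** 2)
    else:
        p = 0.0
    return a2, p


# Illustrative data (assumption): evenly spaced normal and exponential quantiles
normal_like = [NormalDist(10, 2).inv_cdf((i + 0.5) / 50) for i in range(50)]
skewed = [-math.log(1 - (i + 0.5) / 50) for i in range(50)]

for name, sample in (("normal_like", normal_like), ("skewed", skewed)):
    a2, p = anderson_darling_normal(sample)
    verdict = "assume normal" if p > 0.10 else "do not assume normal"
    print(f"{name}: A2={a2:.3f} p={p:.4f} -> {verdict}")
```

The 0.10 threshold in the last loop mirrors the limit mentioned above. Replace it with 0.05 if you prefer the less strict rule.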
Step 2
Find out why the data is possibly not normally distributed.
Mixture of various distributions
Samples from different batches
Samples from different dates
Samples from different mold cavities
Try to sort the data in subgroups. This is possible in the DOE mode in Develve.
Example
In this example the data is sorted by production line (1 and 2). The original data is in column A. After sorting, the data of each production line (columns B and C) is normally distributed. Data file
For more information on how to sort data in subgroups, see here.
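Outside Develve, the same idea of sorting data into subgroups can be sketched in a few lines of Python. The records, the grouping key (production line) and the helper name split_by_group are hypothetical and are not taken from the example data file.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical measurements as (production_line, value) pairs
records = [
    (1, 10.1), (2, 12.0), (1, 9.9), (2, 12.2), (1, 10.0), (2, 11.9),
    (1, 10.2), (2, 12.1), (1, 9.8), (2, 11.8),
]


def split_by_group(rows):
    """Collect the values of each group label into its own list."""
    groups = defaultdict(list)
    for label, value in rows:
        groups[label].append(value)
    return dict(groups)


subgroups = split_by_group(records)
for line, values in sorted(subgroups.items()):
    print(f"line {line}: n={len(values)} mean={mean(values):.2f} "
          f"stdev={stdev(values):.2f}")
```

Clearly different subgroup means, as in this made-up data, suggest a mixture. Each subgroup can then be tested for normality on its own.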
Example 2
Sometimes a mixture of two different distributions is not clearly visible in the histogram, but the normality plot shows a bend in the line (see graph below). Data file
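A rough numerical counterpart of the normality plot is the correlation between the sorted data and the matching normal quantiles: a straight line gives a value close to 1, and a bend lowers it. The sketch below assumes the plotting positions (i - 0.5)/n and uses made-up data. It is only a blunt summary, and the plot itself remains the better diagnostic.

```python
import math
from statistics import NormalDist


def normal_plot_points(data):
    """Return (theoretical quantile, observed value) pairs for a normal plot."""
    x = sorted(data)
    n = len(x)
    z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(z, x))


def pearson_r(points):
    """Pearson correlation of the plotted points; 1.0 means a straight line."""
    n = len(points)
    mx = sum(a for a, _ in points) / n
    my = sum(b for _, b in points) / n
    sxy = sum((a - mx) * (b - my) for a, b in points)
    sxx = sum((a - mx) ** 2 for a, _ in points)
    syy = sum((b - my) ** 2 for _, b in points)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data (assumption): one population versus a 50/50 mixture of two
single = [NormalDist(10, 1).inv_cdf((i + 0.5) / 40) for i in range(40)]
mixture = ([NormalDist(10, 1).inv_cdf((i + 0.5) / 20) for i in range(20)]
           + [NormalDist(16, 1).inv_cdf((i + 0.5) / 20) for i in range(20)])

r_single = pearson_r(normal_plot_points(single))
r_mixture = pearson_r(normal_plot_points(mixture))
print(f"single population: r={r_single:.4f}")
print(f"mixture:           r={r_mixture:.4f}")
```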
Extreme values (outliers)
Too many outliers will result in non-normality. If the outliers have special causes, it is wise to filter these data points out. But be aware that some outliers can be expected in a normally distributed data set: an outlier in normally distributed data is not always the result of a special cause.
When filtering the data, you should analyze and explain why these outliers can be removed.
Example
In this example the original data is in column A, the filtered data in column B, and the outliers in column C. After filtering, the data is normally distributed. Data file
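Splitting a column into filtered data and outliers can be sketched as follows. The rule used here, Tukey's fences at 1.5 times the interquartile range, is an assumption chosen for illustration and is not necessarily the rule behind the example above. The measurements are made up.

```python
from statistics import quantiles


def split_outliers(data, k=1.5):
    """Split data into (kept, outliers) using Tukey's fences (k * IQR)."""
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in data if low <= v <= high]
    outliers = [v for v in data if v < low or v > high]
    return kept, outliers


# Hypothetical measurements with one suspicious value
original = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9, 15.0]
filtered, removed = split_outliers(original)
print("filtered:", filtered)
print("outliers:", removed)
```

A rule like this only flags candidates. Whether a flagged point may be removed still depends on finding and documenting its special cause.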
Cases that are not solvable by rearranging the data.
Sorted data
The data set is only a part of all the data: all the data outside the tolerance borders has been filtered out. Data file
From left to right: the original data, the data without the values above the upper tolerance border, the data without the values outside the minimum and maximum tolerance borders, and only the data above the upper tolerance border.
This can happen when analyzing
Field returns
Line rejects
Data without the rejects
Data is close to zero or another limit
Data close to zero or to the optimum tends to be skewed: the values pile up against the limit and the tail points away from it.
Low resolution of the measurement
Due to the low resolution of the measurement, the data is rounded to the nearest digit. As a result the data is grouped in a small number of discrete values (see graph). To solve this, try to increase the measurement resolution. Use the histogram or the individual dot plot to see whether there is a rounding effect in the data. Data file
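A quick numerical check for a rounding effect is to count how many distinct values the data contains and how far apart they are. The sketch below uses made-up readings and a hypothetical helper resolution_summary. It also prints a small text dot plot.

```python
from collections import Counter


def resolution_summary(data):
    """Return (distinct value count, smallest gap between distinct values)."""
    distinct = sorted(set(data))
    gaps = [b - a for a, b in zip(distinct, distinct[1:])]
    return len(distinct), (min(gaps) if gaps else None)


# Hypothetical readings from a gauge that only resolves 0.1 units
readings = [5.1, 5.2, 5.1, 5.3, 5.2, 5.2, 5.1, 5.4, 5.3, 5.2, 5.2, 5.3]
n_distinct, step = resolution_summary(readings)
counts = Counter(readings)

print(f"{len(readings)} readings, {n_distinct} distinct values, "
      f"smallest step {step:.3f}")
for value, count in sorted(counts.items()):
    print(f"{value:>5}: {'*' * count}")  # text version of a dot plot
```

Few distinct values compared to the number of readings, all separated by the same step, point to a resolution problem rather than to a property of the process.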
Data follows another distribution
Lifetime data (wear-out) is often not normally distributed. This data often follows the Weibull or lognormal distribution. For this data use Weibull analysis.
Data is close to zero or another limit
Proportional data
Example
Use the distribution fitting function: Tools=>Distribution fitting. The graph with the highest correlation coefficient (r²) shows the best-fitting distribution. Data file
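The idea of ranking distributions by the r² of their probability plots can be sketched as below. This is an illustration under stated assumptions, not Develve's algorithm: only three candidate distributions are compared, the plotting positions are (i - 0.5)/n, the data must be strictly positive, and the lifetimes are made-up Weibull quantiles.

```python
import math
from statistics import NormalDist


def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy * sxy / (sxx * syy)


def fit_ranking(data):
    """Return {distribution: r2 of its probability plot} for positive data."""
    x = sorted(data)
    n = len(x)
    p = [(i + 0.5) / n for i in range(n)]         # plotting positions
    z = [NormalDist().inv_cdf(q) for q in p]      # normal quantiles
    w = [math.log(-math.log(1 - q)) for q in p]   # Weibull plot axis
    log_x = [math.log(v) for v in x]
    return {
        "normal": r_squared(z, x),
        "lognormal": r_squared(z, log_x),
        "weibull": r_squared(w, log_x),
    }


# Illustrative lifetime data (assumption): Weibull quantiles, shape 2, scale 100
lifetimes = [100 * (-math.log(1 - (i + 0.5) / 30)) ** 0.5 for i in range(30)]
scores = fit_ranking(lifetimes)
best = max(scores, key=scores.get)
for name, r2 in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:<10} r2={r2:.4f}")
print("best fit:", best)
```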
Step 3
If the case is not solvable by rearranging the data, there are two options: transform the data, or use a test that is not based on the assumption of normality.
Transform
With the Box-Cox transformation it is possible to transform non-normally distributed data into a more normally distributed data set; see Box-Cox transformation.
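As a minimal sketch of what the transformation does: the Box-Cox transform maps each strictly positive value x to (x^lambda - 1)/lambda, or to ln(x) when lambda is 0. The lambda below is chosen by maximizing the profile log-likelihood over a coarse grid. The grid, the helper names and the sample data are assumptions made for illustration and do not describe how Develve selects lambda.

```python
import math
from statistics import NormalDist, pvariance


def box_cox(data, lam):
    """Box-Cox transform of strictly positive data for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in data]
    return [(v ** lam - 1) / lam for v in data]


def log_likelihood(data, lam):
    """Profile log-likelihood of lambda (up to an additive constant)."""
    n = len(data)
    transformed = box_cox(data, lam)
    return (-n / 2 * math.log(pvariance(transformed))
            + (lam - 1) * sum(math.log(v) for v in data))


def best_lambda(data, grid=None):
    """Pick the lambda on a coarse grid that maximizes the log-likelihood."""
    grid = grid or [i / 10 for i in range(-20, 21)]
    return max(grid, key=lambda lam: log_likelihood(data, lam))


# Illustrative right-skewed data (assumption): lognormal quantiles
skewed = [math.exp(NormalDist(1.0, 0.8).inv_cdf((i + 0.5) / 50))
          for i in range(50)]
lam = best_lambda(skewed)
transformed = box_cox(skewed, lam)
print(f"best lambda on the grid: {lam:.1f}")
```

After the transformation, check the transformed data for normality again. Any specification limits must be transformed with the same lambda before they are compared with the transformed data.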