

What to do with not normally distributed Data




Many statistical tools assume that the data is normally distributed. This page gives some information about how to deal with data that is not normally distributed.

Step 1

Check the data with the Anderson Darling normality test. With a high p value you can assume normality of the data. Develve treats data with a p value above 0.10 as normally distributed. This is on the safe side; some people say that 0.05 is enough to assume normality.
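The check in this step can be sketched in Python using only the standard library. This is a minimal illustration, not Develve's implementation: the function name `anderson_darling_normal` is hypothetical, and the p value comes from the commonly cited D'Agostino and Stephens approximation, which may differ slightly from what Develve reports.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(data):
    """Return (adjusted A2 statistic, approximate p value).

    Hypothetical helper for illustration. Mean and standard deviation
    are estimated from the sample itself.
    """
    x = sorted(data)
    n = len(x)
    m, s = mean(x), stdev(x)
    std_normal = NormalDist()
    cdf = [std_normal.cdf((v - m) / s) for v in x]

    # Anderson Darling statistic A2 (0-based index i, so 2i+1 = 2k-1).
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    # Small-sample adjustment for estimated mean and standard deviation.
    a2_adj = a2 * (1 + 0.75 / n + 2.25 / n ** 2)

    # Piecewise p value approximation (assumption, see lead-in).
    if a2_adj >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_adj + 0.0186 * a2_adj ** 2)
    elif a2_adj > 0.34:
        p = math.exp(0.9177 - 4.279 * a2_adj - 1.38 * a2_adj ** 2)
    elif a2_adj > 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a2_adj - 59.938 * a2_adj ** 2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a2_adj - 223.73 * a2_adj ** 2)
    return a2_adj, p


# Illustrative data only: evenly spaced normal quantiles.
sample = [NormalDist(10, 2).inv_cdf((i + 0.5) / 40) for i in range(40)]
a2, p = anderson_darling_normal(sample)
print(f"A2* = {a2:.3f}, p = {p:.3f}, treated as normal at 0.10: {p > 0.10}")
```

The 0.10 threshold in the last line mirrors the limit described above; replace it with 0.05 if that is the convention you follow.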

Step 2

Find out why the data is possibly not normally distributed.

Mixture of various distributions

Try to sort the data. This is possible in the DOE mode in Develve.

Example

In this example the original data is in column A. After sorting the data by production line (1 and 2), the data of each production line is normally distributed (columns B and C).

Data file
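Outside Develve, the same idea of sorting a mixed data set by its source can be sketched as follows. The measurements and line labels are made up for illustration. Each group would then be checked for normality on its own.

```python
from collections import defaultdict
from statistics import mean, stdev

# Made-up measurements, each tagged with the production line it came from.
records = [
    (1, 9.8), (2, 12.1), (1, 10.1), (2, 11.9), (1, 10.0), (2, 12.3),
    (1, 9.9), (2, 12.0), (1, 10.2), (2, 11.8), (1, 10.0), (2, 12.2),
]

# Sort the values into one list per production line.
by_line = defaultdict(list)
for line, value in records:
    by_line[line].append(value)

all_values = [value for _, value in records]
print(f"all data: mean={mean(all_values):.2f} sd={stdev(all_values):.2f}")
for line, values in sorted(by_line.items()):
    print(f"line {line}: n={len(values)} "
          f"mean={mean(values):.2f} sd={stdev(values):.2f}")
```

A spread per line that is much smaller than the spread of the combined data is a hint that the original column was a mixture.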

Example 2

Sometimes a mixture of 2 different distributions is not clearly visible in the histogram, but the normality plot shows a bend in the line (see graph below).

Data file

Extreme values (outliers)

Too many outliers will result in non-normality. If the outliers have special causes, it is wise to filter out these data points. But be aware that even in a normally distributed data set you can expect some outliers, so an outlier is not always the result of a special cause.

When filtering the data you should analyze and explain why these outliers can be removed.
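One possible way to flag candidate outliers for review is sketched below. It uses Tukey's 1.5 x IQR fences, which is an assumption made here for illustration; the page does not prescribe a rule, and the function name `split_outliers` and the data are hypothetical. Flagged points still need an explanation before they are removed.

```python
from statistics import quantiles


def split_outliers(data, k=1.5):
    """Split data into (kept, flagged) using Tukey fences (assumed rule)."""
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in data if low <= v <= high]
    flagged = [v for v in data if v < low or v > high]
    return kept, flagged


# Made-up measurements with one suspicious value.
data = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 10.0, 9.9, 14.5, 10.0]
kept, flagged = split_outliers(data)
print("kept:", kept)
print("flagged for review:", flagged)
```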

Example

In the example the original data is in column A, the filtered data is in column B and the outliers are in column C. After filtering, the data is normally distributed.

Data file

Drift in measurement system

Look at the Time graph.

Data file
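Besides looking at the graph, a drift can be estimated numerically. The sketch below fits a least-squares line through the measurements in the order they were taken; the function name `drift_slope` and the data are made up for illustration, and equal time steps between measurements are assumed.

```python
def drift_slope(values):
    """Least-squares slope of value versus measurement order (0, 1, 2, ...)."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    return sxy / sxx


# Made-up data: a stable series and the same series with 0.05 drift per step.
stable = [10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0]
drifting = [v + 0.05 * t for t, v in enumerate(stable)]

print(f"stable slope:   {drift_slope(stable):+.4f} per measurement")
print(f"drifting slope: {drift_slope(drifting):+.4f} per measurement")
```

A slope that is large compared with the scatter of the data suggests that the measurement system or the process is drifting.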

Cases that are not solvable by rearranging the data.

Sorted data

The data set is only a part of all the data: all the data outside the tolerance limits has been filtered out.

On the left is the original data, in the middle the data without the values above the upper tolerance limit, and on the right the data without the values outside the min and max tolerance.

Data is close to zero or another limit

Data close to zero or to the optimum tends to be skewed: the values pile up against that limit and the tail points away from it.

Low resolution of the measurement

Due to the low resolution of the measurement the data is rounded to the nearest digit. As a result the data is grouped in a small number of distinct values (see graph). To solve this, try to increase the measurement resolution. Use the histogram or the individual dot plot to see if there is a rounding effect in the data.


Data file
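A rounding effect can also be spotted by counting how many distinct values the data contains. The following is a small sketch with made-up readings; the function name `resolution_summary` is hypothetical.

```python
from collections import Counter


def resolution_summary(values):
    """Return (number of distinct values, smallest step, counts per value)."""
    counts = Counter(values)
    distinct = sorted(counts)
    steps = [b - a for a, b in zip(distinct, distinct[1:])]
    smallest_step = min(steps) if steps else None
    return len(distinct), smallest_step, counts


# Made-up readings from a gauge that only shows one decimal.
readings = [2.1, 2.2, 2.1, 2.3, 2.2, 2.2, 2.1, 2.3, 2.2, 2.4, 2.2, 2.1]
n_distinct, step, counts = resolution_summary(readings)
print(f"{len(readings)} readings, {n_distinct} distinct values, "
      f"smallest step about {step:.2f}")
for value, count in sorted(counts.items()):
    print(f"{value}: {'*' * count}")
```

Only a handful of distinct values in a larger sample suggests that the measurement resolution is too coarse for the variation being measured.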

Data is following another distribution

Example

Use the Distribution fitting function (Tools=>Distribution fitting). The graph with the highest correlation coefficient (r²) shows the best fitting distribution.

Data file
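The idea of comparing candidate distributions by r² can be sketched with probability plots: for each candidate, the sorted data is compared with the quantiles that distribution would predict, and r² measures how straight the resulting line is. This is an illustration under stated assumptions, not necessarily how Develve computes its value; the plotting positions (i - 0.5)/n, the three candidates, and the function names are choices made here.

```python
import math
from statistics import NormalDist


def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def fit_scores(data):
    """Return r2 of the probability plot for a few candidate distributions."""
    x = sorted(data)
    n = len(x)
    probs = [(i + 0.5) / n for i in range(n)]
    z = [NormalDist().inv_cdf(p) for p in probs]
    scores = {
        "normal": r_squared(z, x),
        "exponential": r_squared([-math.log(1 - p) for p in probs], x),
    }
    if x[0] > 0:  # the lognormal plot needs strictly positive data
        scores["lognormal"] = r_squared(z, [math.log(v) for v in x])
    return scores


# Made-up data: exact lognormal quantiles.
data = [math.exp(NormalDist(1, 0.5).inv_cdf((i + 0.5) / 30)) for i in range(30)]
scores = fit_scores(data)
for name, r2 in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:12s} r2 = {r2:.4f}")
```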

Step 3

If the case is not solvable by rearranging the data there are two options: transform the data, or use a test that is not based on a normality assumption.

Transform

With the Box-Cox transformation it is possible to transform data that is not normally distributed into a more normally distributed data set, see Box-Cox transformation.
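As a rough sketch of how such a transformation works: for strictly positive data x, the Box-Cox transform is (x^λ - 1)/λ, or ln(x) when λ = 0, and λ is commonly chosen by maximizing a log-likelihood. The grid search, the function names, and the data below are assumptions for illustration; Develve may select λ differently.

```python
import math
from statistics import NormalDist


def box_cox(data, lam):
    """Box-Cox transform of strictly positive data for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in data]
    return [(v ** lam - 1) / lam for v in data]


def log_likelihood(data, lam):
    """Profile log-likelihood of lambda (up to a constant)."""
    n = len(data)
    y = box_cox(data, lam)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    return -0.5 * n * math.log(var_y) + (lam - 1) * sum(math.log(v) for v in data)


def best_lambda(data, grid=None):
    """Pick the lambda with the highest log-likelihood on a simple grid."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]  # -2.0 ... 2.0 in steps of 0.1
    return max(grid, key=lambda lam: log_likelihood(data, lam))


# Made-up right-skewed data: exact lognormal quantiles.
data = [math.exp(NormalDist(1, 0.5).inv_cdf((i + 0.5) / 30)) for i in range(30)]
lam = best_lambda(data)
transformed = box_cox(data, lam)
print(f"selected lambda: {lam}")
```

After transforming, the normality check from Step 1 should be repeated on the transformed data.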

Test not based on normal assumption

No normal assumption | Based on normal assumption
1 sample Wilcoxon median test | One sample t-test
2 sample Mann-Whitney median test | 2 sample t-test
Variation Levene test | Variation F-test
Kruskal-Wallis test | One way Anova
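To give an impression of how a test without a normality assumption works, the sketch below computes the Mann-Whitney U statistic for two samples with a two-sided p value from the normal approximation. It is a simplified illustration: there is no tie or continuity correction, the approximation is only reasonable for samples that are not very small, and the function name and data are made up.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U statistic of x, approximate two-sided p value)."""
    # U counts the pairs where the x value is larger; ties count as half.
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p


# Made-up samples: clearly shifted versus interleaved.
shifted = mann_whitney_u(list(range(1, 11)), list(range(11, 21)))
similar = mann_whitney_u([1, 3, 5, 7, 9, 11, 13, 15], [2, 4, 6, 8, 10, 12, 14, 16])
print(f"shifted samples: U={shifted[0]:.1f}, p={shifted[1]:.4f}")
print(f"similar samples: U={similar[0]:.1f}, p={similar[1]:.4f}")
```

Because the test only uses the order of the values, it does not depend on the shape of the distribution in the way the 2 sample t-test does.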