Many statistical tools assume that the data is normally distributed. This page gives some information on how to deal with data that is not normally distributed.

Step 1

First check the data with the Anderson-Darling normality test. With a high p-value you can assume that the data is normally distributed. Develve treats a p-value above 0.10 as normally distributed. This is on the safe side; some people say that a p-value above 0.05 is enough to assume normality.
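The check in this step can be sketched in Python. This is a minimal illustration and not Develve's implementation: the function name anderson_darling_p is hypothetical, the mean and standard deviation are estimated from the sample, and the p-value comes from the D'Agostino and Stephens approximation, so results may differ slightly from what Develve reports. Both samples are synthetic.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_p(data):
    """Return (adjusted A^2, approximate p-value) for a normality test."""
    x = sorted(data)
    n = len(x)
    m, s = mean(x), stdev(x)  # parameters estimated from the sample
    std_normal = NormalDist()
    cdf = [std_normal.cdf((v - m) / s) for v in x]

    # A^2 = -n - (1/n) * sum (2i - 1) * [ln F(x_i) + ln(1 - F(x_(n+1-i)))]
    total = sum(
        (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1.0 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    a2 = -n - total / n
    a2_adj = a2 * (1 + 0.75 / n + 2.25 / n**2)  # small-sample correction

    # Piecewise p-value approximation (assumption: D'Agostino and Stephens)
    if a2_adj > 13:
        p = 0.0
    elif a2_adj >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_adj + 0.0186 * a2_adj**2)
    elif a2_adj >= 0.34:
        p = math.exp(0.9177 - 4.279 * a2_adj - 1.38 * a2_adj**2)
    elif a2_adj >= 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a2_adj - 59.938 * a2_adj**2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a2_adj - 223.73 * a2_adj**2)
    return a2_adj, min(max(p, 0.0), 1.0)


# Synthetic samples: evenly spaced quantiles of a normal and an exponential shape
normal_like = [NormalDist(50, 2).inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
skewed = [-math.log(1 - (i - 0.5) / 50) for i in range(1, 51)]

for name, sample in (("normal-like", normal_like), ("skewed", skewed)):
    a2, p = anderson_darling_p(sample)
    verdict = "assume normal" if p > 0.10 else "do not assume normal"
    print(f"{name}: A2*={a2:.3f}, p={p:.4f} -> {verdict}")
```

The 0.10 threshold in the last lines mirrors the limit mentioned above; replace it with 0.05 to apply the less strict rule.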

Step 2

Find out why the data is possibly not normally distributed.

Mixture of various distributions

Samples from different batches

Samples from different dates

Samples from different mold cavities

Try to sort the data into subgroups. This is possible in the DOE mode in Develve.

Example

In this example the data is sorted by production line (1 and 2). The original data is in column A; after sorting, the data of each production line (columns B and C) is normally distributed.

Sometimes a mixture of two different distributions is not clearly visible in the histogram, but the normal probability plot shows a bend in the line (see graph below).
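The effect of a mixture on the normal probability plot can be illustrated with a small Python sketch. It assumes two hypothetical production lines with means 10 and 20 and a standard deviation of 1 (synthetic values, not the data from the example), sorts the records into subgroups, and compares the straightness of the probability plot, expressed as a correlation coefficient r, for the combined data and for each line. The function name normal_plot_r is hypothetical.

```python
from statistics import NormalDist, mean


def normal_plot_r(data):
    """Correlation between the sorted data and normal quantiles.

    A value close to 1 means the normal probability plot is a straight line.
    """
    x = sorted(data)
    n = len(x)
    q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mq = mean(x), mean(q)
    sxy = sum((a - mx) * (b - mq) for a, b in zip(x, q))
    sxx = sum((a - mx) ** 2 for a in x)
    sqq = sum((b - mq) ** 2 for b in q)
    return sxy / (sxx * sqq) ** 0.5


# Synthetic measurements tagged with a hypothetical production line number
line_1 = [NormalDist(10, 1).inv_cdf((i - 0.5) / 30) for i in range(1, 31)]
line_2 = [NormalDist(20, 1).inv_cdf((i - 0.5) / 30) for i in range(1, 31)]
records = [(1, v) for v in line_1] + [(2, v) for v in line_2]

# Sort the data into subgroups by production line
groups = {}
for line, value in records:
    groups.setdefault(line, []).append(value)

r_all = normal_plot_r([value for _, value in records])
print(f"all data: r = {r_all:.3f}")
for line, values in sorted(groups.items()):
    print(f"line {line}: r = {normal_plot_r(values):.3f}")
```

A clearly lower r for the combined data than for the subgroups corresponds to the bend in the line described above.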

Too many outliers will result in non-normality. If the outliers have special causes, it is wise to filter out these data points. But be aware that in a normally distributed data set you can expect some outliers: an outlier in normally distributed data is not always the result of a special cause.

When filtering the data you should analyze and explain why these outliers can be removed.

Example

In the example the original data is in column A, the filtered data in column B and the outliers in column C. After filtering, the data is normally distributed.
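Splitting a data set into filtered data and outliers can be sketched as follows. The page does not say which rule is used to flag outliers, so this sketch assumes Tukey's fences (1.5 times the interquartile range); the function name split_outliers and the measurements are hypothetical. Flagged points still need an explanation before they are removed.

```python
from statistics import quantiles


def split_outliers(data, k=1.5):
    """Split data into (kept, outliers) using Tukey's fences (assumed rule)."""
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in data if low <= v <= high]
    outliers = [v for v in data if v < low or v > high]
    return kept, outliers


# Hypothetical measurements with one suspicious reading
original = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9, 14.5]
filtered, outliers = split_outliers(original)
print("original:", original)   # column A
print("filtered:", filtered)   # column B
print("outliers:", outliers)   # column C
```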

Cases that are not solvable by rearranging the data.

Sorted data

The data set is only part of the full data, because all values outside the tolerance limits have been filtered out.

Data file
From left to right: the original data, the data without the values above the upper tolerance limit, the data without the values outside the minimum and maximum tolerance limits, and only the data above the upper tolerance limit.

This can happen when analyzing:

Field returns

Line rejects

Data without the rejects
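Why such filtered data stops being normally distributed can be shown with a short sketch. It assumes a synthetic process with mean 100 and standard deviation 5 and a hypothetical upper tolerance limit of 102.5, and uses the sample skewness as a simple indicator of asymmetry. None of these values come from the data file above.

```python
from statistics import NormalDist, mean


def skewness(data):
    """Sample skewness (third standardized moment, population form)."""
    m = mean(data)
    n = len(data)
    m2 = sum((v - m) ** 2 for v in data) / n
    m3 = sum((v - m) ** 3 for v in data) / n
    return m3 / m2**1.5


# Synthetic process data (hypothetical mean and standard deviation)
full = [NormalDist(100, 5).inv_cdf((i - 0.5) / 200) for i in range(1, 201)]
upper_tol = 102.5  # hypothetical upper tolerance limit

accepted = [v for v in full if v <= upper_tol]  # data without the rejects
rejects = [v for v in full if v > upper_tol]    # line rejects only

for name, sample in (("all", full), ("accepted", accepted), ("rejects", rejects)):
    print(f"{name:9s} n={len(sample):3d} skewness={skewness(sample):+.2f}")
```

The complete data is symmetric, while both truncated parts are skewed, which is why rearranging the data does not restore normality in these cases.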

Data is close to zero or another limit

Data close to zero or to the optimum tends to be skewed: the values pile up near the limit and the tail extends away from it.

Low resolution of the measurement

Due to the low resolution of the measurement, the data is rounded to the nearest digit. As a result the data is grouped in a small number of discrete values (see graph). To solve this, try to increase the measurement resolution. Use the histogram or the individual dot plot to see whether there is a rounding effect in the data.
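A rounding effect can also be checked numerically. The sketch below counts the distinct values and the smallest step between them and prints a simple text dot plot; the function name rounding_summary and the readings are hypothetical.

```python
from collections import Counter


def rounding_summary(data, digits=6):
    """Return (number of distinct values, smallest step, value counts)."""
    counts = Counter(data)
    levels = sorted(counts)
    # Round the differences to suppress floating-point noise
    steps = [round(b - a, digits) for a, b in zip(levels, levels[1:])]
    smallest_step = min(steps) if steps else None
    return len(levels), smallest_step, counts


# Hypothetical readings from a gauge with a resolution of 0.1
sample = [10.0, 10.1, 10.1, 10.2, 10.0, 10.1, 10.2, 10.1, 10.0, 10.1]
n_levels, step, counts = rounding_summary(sample)
print(f"{len(sample)} readings, {n_levels} distinct values, smallest step {step}")
for value in sorted(counts):
    print(f"{value:6.1f} | {'*' * counts[value]}")  # text dot plot
```

Few distinct values relative to the number of readings, separated by a constant step, point to a measurement resolution that is too coarse for the spread of the data.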

Lifetime data (for example wear-out) is often not normally distributed. It often follows a Weibull or lognormal distribution. For this kind of data use Weibull analysis.
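As an illustration of what a Weibull analysis estimates, the sketch below fits the shape (beta) and scale (eta) parameters by median rank regression on a Weibull plot. This method is an assumption chosen for the example and is not necessarily the one Develve uses; the function name weibull_fit and the lifetimes are hypothetical, and the data contains complete failures only (no censoring).

```python
import math


def weibull_fit(times):
    """Estimate Weibull shape (beta) and scale (eta) by median rank regression."""
    t = sorted(times)
    n = len(t)
    # Bernard's approximation of the median rank of the i-th failure
    f = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    x = [math.log(v) for v in t]                      # ln(time)
    y = [math.log(-math.log(1.0 - p)) for p in f]     # ln(-ln(1 - F))
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    beta = sxy / sxx                   # slope of the Weibull plot (y on x)
    eta = math.exp(mx - my / beta)     # from intercept = -beta * ln(eta)
    r2 = sxy**2 / (sxx * syy)
    return beta, eta, r2


# Synthetic lifetimes in hours, generated from a Weibull shape with
# beta = 2 and eta = 100 (hypothetical values)
n = 20
lifetimes = [
    100 * (-math.log(1 - (i - 0.3) / (n + 0.4))) ** (1 / 2)
    for i in range(1, n + 1)
]
beta, eta, r2 = weibull_fit(lifetimes)
print(f"beta = {beta:.2f}, eta = {eta:.1f} h, r2 = {r2:.4f}")
```

A shape parameter above 1 indicates a failure rate that increases with time, which is the wear-out behavior mentioned above.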

Data is close to zero or another limit

Proportional data

Example

Use the distribution fitting function: Tools=>Distribution fitting. The graph with the highest correlation coefficient (r²) shows the best-fitting distribution.
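The idea of comparing distributions by r² can be sketched in Python. This is an illustration under stated assumptions, not a description of how Develve computes its fits: r² is taken from the probability plot of each candidate, only the normal, lognormal and Weibull distributions are compared, the data must be strictly positive, and the function names and the sample are hypothetical.

```python
import math
from statistics import NormalDist


def r_squared(x, y):
    """Squared correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy**2 / (sxx * syy)


def fit_distributions(data):
    """Return {distribution name: r^2 of its probability plot}."""
    x = sorted(data)
    n = len(x)
    p = [(i - 0.5) / n for i in range(1, n + 1)]     # plotting positions
    z = [NormalDist().inv_cdf(v) for v in p]         # normal quantiles
    w = [math.log(-math.log(1.0 - v)) for v in p]    # Weibull plot axis
    log_x = [math.log(v) for v in x]
    return {
        "normal": r_squared(x, z),
        "lognormal": r_squared(log_x, z),
        "weibull": r_squared(log_x, w),
    }


# Synthetic right-skewed sample with a lognormal shape (hypothetical values)
sample = [
    math.exp(1.0 + 0.8 * NormalDist().inv_cdf((i - 0.5) / 40))
    for i in range(1, 41)
]
fits = fit_distributions(sample)
best = max(fits, key=fits.get)
for name, value in fits.items():
    print(f"{name:10s} r2 = {value:.4f}")
print("best fit:", best)
```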

If the case is not solvable by rearranging the data, there are two options: transform the data or use a test that is not based on the normality assumption.

Transform

With the Box-Cox transformation it is possible to transform non-normally distributed data into a more normally distributed data set; see Box-Cox transformation.
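A minimal sketch of the Box-Cox transformation follows. It assumes strictly positive data and picks lambda by a grid search over the profile log-likelihood; how Develve selects lambda is not stated here, so treat the search as an assumption. The function names and the sample are hypothetical.

```python
import math
from statistics import NormalDist


def box_cox(data, lam):
    """Box-Cox transform: ln(x) for lambda = 0, else (x^lambda - 1) / lambda."""
    if lam == 0:
        return [math.log(v) for v in data]
    return [(v**lam - 1.0) / lam for v in data]


def box_cox_loglik(data, lam):
    """Profile log-likelihood of lambda for normally distributed transformed data."""
    y = box_cox(data, lam)
    n = len(y)
    m = sum(y) / n
    var = sum((v - m) ** 2 for v in y) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in data)


def best_lambda(data):
    """Grid search for lambda between -2.0 and 2.0 in steps of 0.1."""
    grid = [k / 10 for k in range(-20, 21)]
    return max(grid, key=lambda lam: box_cox_loglik(data, lam))


# Synthetic right-skewed sample with a lognormal shape (hypothetical values)
sample = [
    math.exp(1.0 + 0.5 * NormalDist().inv_cdf((i - 0.5) / 100))
    for i in range(1, 101)
]
lam = best_lambda(sample)
transformed = box_cox(sample, lam)
print(f"selected lambda = {lam}")
```

For data with a lognormal shape the selected lambda should be close to 0, which corresponds to a plain logarithmic transformation. After transforming, check the result again with a normality test.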