# The Law of Large Numbers and Six Sigma Data

## Introduction

The **Law of Large Numbers** is one of several mathematical theorems expressing the idea that, as the number of trials (N) of a random event increases, the relative difference between the observed average and the expected value tends toward zero.

That is, according to the Law of Large Numbers, the average of the values obtained over a large number of trials will be close to the expected value, and it tends to move closer still as the number of trials increases.
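As a rough illustration of this convergence, the sketch below simulates rolls of a fair six-sided die and tracks the running average, whose expected value is 3.5. The die example, the function name `running_average_of_die_rolls`, and the fixed seed are assumptions chosen for illustration; they are not part of the original discussion.

```python
import random


def running_average_of_die_rolls(num_rolls, seed=0):
    """Return the average of the first k rolls for k = 1..num_rolls."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    total = 0
    averages = []
    for k in range(1, num_rolls + 1):
        total += rng.randint(1, 6)  # one roll of a fair six-sided die
        averages.append(total / k)
    return averages


EXPECTED_VALUE = 3.5  # (1 + 2 + 3 + 4 + 5 + 6) / 6

averages = running_average_of_die_rolls(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    gap = abs(averages[n - 1] - EXPECTED_VALUE)
    print(f"N = {n:>7}: average = {averages[n - 1]:.4f}, gap = {gap:.4f}")
```

The gap does not have to shrink at every single step, but it tends to become small as N grows, which is the behaviour the Law of Large Numbers describes.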

## Importance to the Project Manager

Two concepts that are very important to the project manager are the Law of Large Numbers (described above) and the Central Limit Theorem. Although they sound like purely mathematical terms, they have widespread applications and are especially important when determining the sample size for data collection.

For the mathematics behind these two concepts, please refer to the Law of Large Numbers and Central Limit Theorem articles on Wikipedia.

## Applications in Six Sigma

Both the DMAIC and the DMADV methodologies of Six Sigma have Define and Measure as their first two phases. Data collection and measurement therefore play a vital role in Six Sigma projects.

Factors that influence the sample size include:

- Degree of precision required
- Available time and effort
- Acceptable error tolerance

All else being equal, increasing the sample size reduces the sampling error and so makes the measurement a more precise estimate of the true value.
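One common way to make this trade-off concrete, assuming the measurement is roughly normally distributed with a known standard deviation `sigma`, is the standard error of the mean, `sigma / sqrt(n)`. Solving the margin-of-error relation `E = z * sigma / sqrt(n)` for `n` gives `n = (z * sigma / E)^2`. The sketch below applies that textbook formula; the function names and the numbers are hypothetical, and `z = 1.96` corresponds to roughly 95% confidence.

```python
import math


def standard_error(sigma, n):
    """Standard error of the sample mean for a sample of size n."""
    return sigma / math.sqrt(n)


def required_sample_size(sigma, margin_of_error, z=1.96):
    """Smallest n whose margin of error z * sigma / sqrt(n) is at most the target."""
    return math.ceil((z * sigma / margin_of_error) ** 2)


# Illustrative numbers only: a process with an assumed standard deviation of 10 units.
sigma = 10.0
for margin in (4.0, 2.0, 1.0):
    n = required_sample_size(sigma, margin)
    print(f"margin of error {margin}: need n = {n}, standard error = {standard_error(sigma, n):.3f}")
```

Under these assumptions, halving the acceptable margin of error roughly quadruples the required sample size, which is why the required precision has to be weighed against the available time and effort.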

## Real World Analogy

The following real-world analogy illustrates the terms “sample” and “population” for a layperson who does not have much background in statistics.

**Six Sigma Project**

In a Six Sigma project, the data is typically assumed to be normally distributed. A sample of a certain size is taken from this data set, and the population parameters are estimated from the sample in the Define and Measure phases.

**Real World Analogy**

Imagine a trial version of a software product that you can use for a limited period of time. Using it helps you form an opinion of the software: you can explore its features and see how useful it is. The longer you use the software, and the more features the trial version includes, the more accurate your judgement of the final software will be. This in turn helps you decide whether or not to purchase it.

A sample and its size are analogous to this scenario. A sample is like the trial software: it helps you estimate some parameters of the population, just as the trial helps you form a judgement about the final software. The features and duration of the trial period are akin to the sample size: the more features and the longer the duration, the more accurate your judgement, just as a larger sample size gives a more precise measurement.

Here, the underlying idea is that, as the sample size increases, the sample becomes **more representative** of the population; that is, it reflects the population more accurately. Conversely, if you pick a sample size that’s too small or study a process for too short a time period, the measurements you obtain may not give you a true picture of what’s really taking place.
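A minimal simulation can make "more representative" tangible. The sketch below repeatedly draws samples of different sizes from a normally distributed population and reports how far, on average, the sample mean lands from the true population mean. The population mean of 50, standard deviation of 10, number of repetitions, seed, and the function name are all assumptions chosen for illustration.

```python
import random
import statistics


def mean_abs_error_of_sample_mean(sample_size, population_mean=50.0,
                                  population_sd=10.0, num_samples=500, seed=1):
    """Average distance between the sample mean and the true population mean."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    errors = []
    for _ in range(num_samples):
        # Draw one sample of the requested size from the assumed normal population.
        sample = [rng.gauss(population_mean, population_sd) for _ in range(sample_size)]
        errors.append(abs(statistics.fmean(sample) - population_mean))
    return statistics.fmean(errors)


for n in (5, 50, 500):
    error = mean_abs_error_of_sample_mean(n)
    print(f"sample size {n:>3}: typical error of the sample mean = {error:.3f}")
```

With these settings the typical error falls as the sample size grows, which mirrors the point above: a very small sample can paint a misleading picture of the process.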