All About The Empirical Rule In Statistics

The empirical rule, also known as the three-sigma rule or the 68-95-99.7 rule, is a statistical rule that states that, for a normal distribution, nearly all observed data will fall within three standard deviations (denoted by σ) of the mean, or average (denoted by µ).

Specifically, the empirical rule predicts that about 68% of observations fall within one standard deviation of the mean (µ ± σ), about 95% within two standard deviations (µ ± 2σ), and about 99.7% within three standard deviations (µ ± 3σ).
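The three percentages are rounded values. As a minimal sketch, the exact share of a normal distribution within k standard deviations of the mean can be computed from the error function in Python's standard library. The function name coverage_within is chosen here for illustration and is not part of any library.

```python
import math

def coverage_within(k: float) -> float:
    """Probability that a normally distributed value lies within
    k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {coverage_within(k):.2%}")
```

The unrounded values are roughly 68.27%, 95.45%, and 99.73%, which is where the 68-95-99.7 shorthand comes from.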

Important points

The Empirical Rule states that 99.7% of the observed data following a normal distribution is within 3 standard deviations of the mean.

Under this rule, 68% of the data is within one standard deviation, 95% is within two standard deviations, and 99.7% is within three standard deviations of the mean.

Three sigma limits following the Empirical Rule are used to define upper and lower control limits in statistical quality control charts and risk analyses such as VaR.
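As a rough sketch of the control-limit idea, the snippet below places limits three standard deviations either side of the mean of some measurements and flags values outside them. The data and the function name three_sigma_limits are made up for illustration, and real control charts often estimate the standard deviation differently (for example, from subgroup ranges).

```python
import statistics

def three_sigma_limits(measurements):
    """Return (lower, upper) limits at mean - 3*sd and mean + 3*sd."""
    center = statistics.fmean(measurements)
    sd = statistics.stdev(measurements)  # sample standard deviation
    return center - 3 * sd, center + 3 * sd

# Hypothetical baseline measurements, invented for this example.
baseline = [9.0, 10.0, 11.0, 10.0, 9.5, 10.5]
lower, upper = three_sigma_limits(baseline)

# Check new readings (also hypothetical) against the limits.
new_readings = [10.2, 14.2, 9.7]
out_of_control = [x for x in new_readings if not lower <= x <= upper]
print(f"limits: {lower:.2f} to {upper:.2f}; flagged: {out_of_control}")
```

Under the empirical rule, only about 0.3% of readings from a stable, normally distributed process should land outside such limits, so a flagged value is worth investigating.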

Understanding the Empirical Rule

The empirical rule is often used in statistics to forecast outcomes. Once the standard deviation has been calculated, and before exact data can be collected, the rule can be used as a rough estimate of how the data still to be collected and analyzed will turn out.

This probability distribution can thus be used as a provisional heuristic since collecting relevant data may be time-consuming or even impossible in some cases.

Such considerations come into play when a company reviews its quality control measures or evaluates its risk exposure. For example, a popular risk tool known as value-at-risk (VaR) assumes that the likelihood of risk events follows a normal distribution.

The empirical rule is also used as a rough way to test the “normality” of a distribution. If too many data points fall outside the three standard deviation limits, this suggests that the distribution is not normal and may be skewed or follow another distribution.
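One way to sketch this rough normality check is to count what share of a data set falls within one, two, and three standard deviations of its mean and compare the result with 68%, 95%, and 99.7%. The helper name sigma_coverage and both simulated data sets are assumptions made for this example. It is a heuristic, not a formal test of normality.

```python
import random
import statistics

def sigma_coverage(data):
    """Share of values within 1, 2, and 3 standard deviations of the mean."""
    center = statistics.fmean(data)
    sd = statistics.pstdev(data)
    n = len(data)
    return {k: sum(abs(x - center) <= k * sd for x in data) / n
            for k in (1, 2, 3)}

rng = random.Random(0)  # fixed seed so the run is repeatable
normal_sample = [rng.gauss(0, 1) for _ in range(10_000)]
skewed_sample = [rng.expovariate(1.0) for _ in range(10_000)]

print("normal:", sigma_coverage(normal_sample))
print("skewed:", sigma_coverage(skewed_sample))
```

The simulated normal data should land close to the 68-95-99.7 pattern, while the skewed (exponential) data should show a clearly different share within one standard deviation and more points beyond three.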

The empirical rule is also known as the three-sigma rule, where “three-sigma” refers to the spread of data within three standard deviations of the mean in a normal distribution (bell curve), as shown in the figure below.

Examples of the Empirical Rule

Let’s assume the lifespans of a population of animals in a zoo are known to be normally distributed. The animals live an average of 13.1 years (the mean), and the standard deviation of their lifespan is 1.5 years.

Someone who wants to know the probability that an animal will live longer than 14.6 years can use the empirical rule. Knowing that the mean of the distribution is 13.1 years, the following age ranges occur for each standard deviation:

1. One standard deviation (µ ± σ): 13.1 – 1.5 to 13.1 + 1.5, or 11.6 to 14.6

2. Two standard deviations (µ ± 2σ): 13.1 – (2 x 1.5) to 13.1 + (2 x 1.5), or 10.1 to 16.1

3. Three standard deviations (µ ± 3σ): 13.1 – (3 x 1.5) to 13.1 + (3 x 1.5), or 8.6 to 17.6

The person solving this problem needs to calculate the total probability of the animals living 14.6 years or more. The empirical rule suggests that 68% of the distribution is within one standard deviation, in this case, from 11.6 to 14.6 years.

So, the remaining 32% of the distribution falls outside this range. Because the normal distribution is symmetrical, half of that is above 14.6 and half is below 11.6. So, the probability of an animal living longer than 14.6 years is 16% (calculated as 32% divided by two).
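As a quick check of this arithmetic, the sketch below compares the empirical-rule estimate with the tail probability computed from the normal distribution itself, using NormalDist from Python's standard statistics module. The variable names are chosen for this example.

```python
from statistics import NormalDist

# Lifespan model from the example: mean 13.1 years, sd 1.5 years.
lifespan = NormalDist(mu=13.1, sigma=1.5)

# Empirical rule: 68% within one sd, so half of the rest is above 14.6.
rule_of_thumb = (1 - 0.68) / 2

# Tail probability taken directly from the normal distribution.
exact = 1 - lifespan.cdf(14.6)

print(f"empirical rule estimate: {rule_of_thumb:.1%}")
print(f"normal distribution:     {exact:.2%}")
```

The direct calculation gives roughly 15.87%, so the 16% from the empirical rule is a close approximation.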

As another example, assume the animals in a zoo live for an average of 10 years, with a standard deviation of 1.4 years, and a zookeeper is trying to determine the probability that an animal lives for more than 7.2 years. The ranges for each standard deviation are as follows:

1. One standard deviation (µ ± σ): 8.6 to 11.4 years

2. Two standard deviations (µ ± 2σ): 7.2 to 12.8 years

3. Three standard deviations (µ ± 3σ): 5.8 to 14.2 years

The empirical rule states that 95% of the distribution is within two standard deviations. So, 5% is outside two standard deviations: half above 12.8 years and half below 7.2 years. The probability of living more than 7.2 years is therefore the 95% inside the range plus the 2.5% above it:

95% + (5% / 2) = 97.5%
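The same reasoning can be sketched as a small helper that works for any threshold sitting exactly one, two, or three standard deviations from the mean. The function name prob_above and the lookup table are assumptions made for this illustration. It uses the rounded empirical-rule percentages, not exact normal probabilities.

```python
# Rounded empirical-rule coverage for 1, 2, and 3 standard deviations.
EMPIRICAL_COVERAGE = {1: 0.68, 2: 0.95, 3: 0.997}

def prob_above(mean, sd, threshold):
    """Empirical-rule estimate of P(X > threshold).

    Only defined when the threshold is exactly 1, 2, or 3 standard
    deviations above or below the mean.
    """
    z = (threshold - mean) / sd
    k = round(abs(z))
    if k not in EMPIRICAL_COVERAGE or abs(abs(z) - k) > 1e-9:
        raise ValueError("threshold must be 1, 2, or 3 sd from the mean")
    tail = (1 - EMPIRICAL_COVERAGE[k]) / 2  # one tail outside the band
    return tail if z > 0 else 1 - tail

print(f"{prob_above(10, 1.4, 7.2):.1%}")     # second zoo example
print(f"{prob_above(13.1, 1.5, 14.6):.1%}")  # first zoo example
```

A threshold above the mean returns the upper tail alone, while a threshold below the mean returns everything except the lower tail, which matches the two worked examples.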

Normal Distribution

Probability distributions come in various forms, such as the Binomial, Uniform, and Exponential Distributions. We will focus on one form of probability distribution, namely the Normal Distribution.

The Normal Distribution or Gaussian Distribution, also known as the Bell Curve, is a form of probability distribution that is very important in statistics. Why?

Because many real-world phenomena involve random variables whose data conform, at least approximately, to the Normal Distribution. Also, the distribution of the mean of sample data with a large enough sample size will be approximately a Normal Distribution.
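The point about sample means can be illustrated with a small simulation. In this sketch, the raw values come from a skewed exponential distribution, while each sample mean averages 50 such values. The distribution, the sample size, and the helper names are all assumptions chosen for the example.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

def sample_means(n_samples, sample_size):
    """Means of n_samples samples drawn from a skewed exponential distribution."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

def share_within_one_sd(values):
    """Share of values within one standard deviation of their mean."""
    center = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - center) <= sd for v in values) / len(values)

raw = [rng.expovariate(1.0) for _ in range(5_000)]
means = sample_means(n_samples=5_000, sample_size=50)

print(f"raw values within 1 sd:   {share_within_one_sd(raw):.1%}")
print(f"sample means within 1 sd: {share_within_one_sd(means):.1%}")
```

The raw exponential values should sit well above 68% within one standard deviation, while the sample means should come out close to the 68% expected of a normal distribution.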

The Normal Distribution is a probability function that shows how the values of a variable are distributed. The Normal Distribution has a symmetrical shape, not skewed to the right or left.

Parameters of the Normal Distribution

The Mean and Standard Deviation are the parameters of the Normal Distribution. These two values determine the shape of the histogram or the curve of the distribution.

The Mean determines where the midpoint (the peak) of the Normal Distribution sits. Meanwhile, the Standard Deviation determines the width and height of the histogram or curve. The larger the Standard Deviation, the more widely the data values are spread, so the curve becomes wider and flatter.
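For reference, the roles of the two parameters can be read off the standard formula for the normal density, written here with µ for the Mean and σ for the Standard Deviation. The second expression is the height of the curve at its midpoint.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}
```

Changing µ shifts the curve left or right without changing its shape, while doubling σ halves the peak height and doubles the width.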

Properties of the Normal Distribution

All Normal Distributions, even though they have different shapes, share several properties or characteristics, including:

The shape is always symmetrical between right and left.
Mean, Median, and Mode are the same.
Half of the population of data values in the distribution are less than the Mean, and the other half are greater than the Mean.

Now how do we understand the data through this Normal Distribution?

Empirical Rule in the Normal Distribution

In a Normal Distribution, certain proportions always hold. That is, a fixed percentage of the data lies within a given number of Standard Deviations of the Mean.

What does the Empirical Rule say?
About 68% of the data values lie within 1 Standard Deviation of the Mean.
About 95% of the data values lie within 2 Standard Deviations of the Mean.
About 99.7% of the data values lie within 3 Standard Deviations of the Mean.

Conclusion

The Empirical Rule is a statistical tool that provides an estimate of how likely different observations are within a large, normally distributed population. It is important to note that these estimates are approximate, and outliers may exist that do not fit within the distribution.

Therefore, it is important to exercise caution when making predictions or decisions based on the Empirical Rule.