Variance Formula: Definition and Examples

By Indeed Editorial Team

Updated April 7, 2021 | Published September 25, 2020

The variance formula tells statisticians how widely the values in a data set are spread around their mean. Typically, you'll use two slightly different formulas: one for calculating the variance of an entire data set and one for calculating the variance of only a sample of that data set. Additionally, the variance is closely tied to the standard deviation (the variance is the standard deviation squared), and both statistical concepts are useful in a variety of settings.

In this article, we'll explore what the variance formula is, why it's important, how it differs from the standard deviation and how to use each formula to calculate the variance of a population and a small sample.

What is variance?

Variance is the average of the squared differences from the mean. Its square root is the standard deviation. Simply put, the variance is a statistical measure of how spread apart data points are within a sample or data set. In addition to the mean and standard deviation, the variance of a sample set allows statisticians to make sense of, organize and evaluate data they collect for research purposes.

Essentially, the variance has two formulas you can use depending on the group of data you're measuring. For instance, if you are measuring data from an entire population set, such as an entire college class's grades, you will calculate the variance using this formula:

Variance = (The sum of (each term - the mean)^2) / n

Here are the elements of the formula:

  • The variance of your entire population will be the square of the standard deviation.

  • Each term represents each of the values or numbers in your data set.

  • You will need to know the mean of your data set.

  • The expression ^2 represents the squaring function, or in other words, multiplying a number by itself.

  • The variable n represents the number of values you have in your population.

When calculating the variance of just a sample of the population, you'll use this formula:

Variance = (The sum of (each term - the mean)^2) / (n - 1)

Here are the elements of the formula:

  • Variance is what you want to find for your sample set.

  • Each term is one of the values in your sample. You subtract the sample mean from each term, so you'll need to calculate the mean before calculating the variance.

  • The variable n represents the number of values in your sample.

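You divide by n-1 rather than n because you are calculating variance for a sample of the whole population rather than the entire population itself. A sample tends to understate the spread of the population it comes from, and the smaller divisor corrects for that.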
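
For readers who prefer symbolic notation, the two word formulas above can be written as follows. This is the conventional notation rather than anything defined in this article, so the symbols are assumptions: x_i stands for each term, μ for the population mean, x̄ for the sample mean, σ^2 for the population variance and s^2 for the sample variance.

```latex
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2
\qquad\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
```

In both cases, each difference is squared first and the squares are then added together. The only difference between the two formulas is the divisor.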

Variance vs. standard deviation

Simply put, the standard deviation is the square root of the variance. The variance measures the average squared distance of each data point from the mean, so it is expressed in squared units (for example, dollars squared). Taking the square root returns the measure to the same units as the data, which makes the standard deviation easier to read as a typical distance from the mean.

Although there is a difference between these two concepts, variance and standard deviation depend on one another. When you find the standard deviation within a sample set or an entire population, you can square this result to get the variance, and taking the square root of the variance gives you the standard deviation. Understanding how both calculations work gives you insight into different aspects of the data you study.
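
The relationship between the two measures can be checked with a few lines of Python. This is a minimal sketch that uses the standard library's statistics module. The data set is hypothetical and was chosen only because its arithmetic is easy to follow.

```python
import math
import statistics

# Hypothetical data set (not from this article); its mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

std_dev = statistics.pstdev(data)      # population standard deviation
variance = statistics.pvariance(data)  # population variance

# Squaring the standard deviation recovers the variance,
# and the square root of the variance is the standard deviation.
assert math.isclose(std_dev ** 2, variance)
assert math.isclose(math.sqrt(variance), std_dev)

print("standard deviation:", std_dev)
print("variance:", variance)
```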

Additionally, both measures take every value in the data set into account, including outliers on either side of the mean. Because each difference is squared, values far from the mean contribute disproportionately to the variance, and therefore to the standard deviation as well. When both measures are small, the values in the data set are clustered around the mean. When they are large, the values are more spread out.

How to calculate the variance of a data set

In statistics, you can calculate the variance of an entire set of data, such as an annual sales report that lists each day's total net sales during the year. You can also calculate the variance of just a sample of the data points. In the example of a simple yearly sales report, a sample could be summer sales totals. In this case, statisticians would measure the sample set within a specific date range. In both of these examples, you can calculate the variance using one of the two formulas:

Calculating variance of an entire data set

If you're measuring the entire data set, use the population formula and follow the steps below:

Variance = (The sum of (each term - the mean)^2) / n

  1. Subtract the mean from each value in your data set. Your first step is to find the mean of your population and subtract it from each of the terms in your set. For instance, assume you have a population of three data points: 108, 100 and 78. Their mean is (108 + 100 + 78) / 3 = 286 / 3, or about 95.33. Subtract this mean from each of the three terms: (108-95.33), (100-95.33), (78-95.33).

  2. Square each of these differences. Once you have subtracted the mean from all of your terms, square each of these results by multiplying the value by itself. Using the example from above, the differences are about (12.67), (4.67) and (-17.33), and each of these terms squared results in about (160.44), (21.78) and (300.44), respectively.

  3. Add up all the resulting squares. Add up these new values to come to a total sum, like this: (160.44) + (21.78) + (300.44) = about 482.67 when you carry the unrounded values through.

  4. Divide the resulting sum by the number of values in your data set. Now you can divide the sum from step three by the total number of values you have in the population you're measuring. Using the example values from the previous steps, the sum you divide is 482.67 and the value you use for n is three, since there are only three terms in the example population. Here's what it would look like: (482.67) / (3) = about 160.89. So the variance of the entire population is about 160.89.

Here is a condensed version of the example above:

σ^2 = ((108-95.33)^2 + (100-95.33)^2 + (78-95.33)^2) / 3
= (12.67^2 + 4.67^2 + (-17.33)^2) / 3
≈ (160.44 + 21.78 + 300.44) / 3
≈ 482.67 / 3
≈ 160.89
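
The four steps above can be sketched as a small Python function. This is an illustrative sketch rather than a definitive implementation. The function name and the sample values are hypothetical, and the function calculates the mean from the data itself before subtracting it.

```python
def population_variance(values):
    """Return the variance of an entire data set (divides by n)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    differences = [x - mean for x in values]  # step 1: subtract the mean
    squares = [d ** 2 for d in differences]   # step 2: square each difference
    total = sum(squares)                      # step 3: add up the squares
    return total / n                          # step 4: divide by n


# Hypothetical data: the mean is 3 and the squared differences are 4, 1, 0, 1, 4.
example = [1, 2, 3, 4, 5]
result = population_variance(example)
print(result)  # 10 / 5 = 2.0
```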

Calculating variance within a sample of the data

If you are measuring only a sample of the entire data set, you'll rely on the formula that accounts for this with the n-1 term. Just like the variance formula for an entire population, you'll start off this formula in the same way. Follow the steps below:

Variance = (The sum of (each term - the mean)^2) / (n - 1)

  1. Subtract the mean from each value in your sample set. Just as you would with an entire data set, find the mean of your sample and subtract it from each of the terms. Here is an example with three values in your sample: 33, 16 and 45. Their mean is (33 + 16 + 45) / 3 = 94 / 3, or about 31.33. Subtract this mean from each term: (33-31.33), (16-31.33), (45-31.33). Your differences will result in about (1.67), (-15.33) and (13.67), respectively.

  2. Square each of these differences. After you get each difference, go ahead and square each of these values. Using the example values from the previous step, here are the resulting products: about (2.78), (235.11) and (186.78). With this example, you can see how the (-15.33) value squared to give you a positive value. This is essential for the variance: without squaring, the positive and negative differences would cancel each other out when you add them up.

  3. Add up all the resulting squares. Just like the previous variance formula, add up all your resulting products from step two: (2.78) + (235.11) + (186.78) = 424.67.

  4. Subtract one from the total number of values in your sample set. Before you divide, subtract one from the number of values in your sample set. Using the previous example, you only have three terms. Plug three into the n-1 part of the formula: n-1 = (3) - 1. The result is two.

  5. Divide the sum by the resulting difference of n-1. Finally, divide the sum from step three by two, since this is the resulting difference you arrived at in step four. Use the prior example values to divide: (424.67) / (2) = about 212.33. So the variance of the example sample set is about 212.33.

s^2 = ((33-31.33)^2 + (16-31.33)^2 + (45-31.33)^2) / (3-1)
= (1.67^2 + (-15.33)^2 + 13.67^2) / (3-1)
≈ (2.78 + 235.11 + 186.78) / (3-1)
≈ 424.67 / (3-1)
≈ 424.67 / 2
≈ 212.33
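
The sample version of the calculation can be sketched in Python in the same way. The function name and data below are hypothetical. The sketch calculates the mean from the sample itself and compares its result with statistics.variance from the standard library, which also divides by n-1.

```python
import statistics


def sample_variance(values):
    """Return the variance of a sample (divides by n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample needs at least two values")
    mean = sum(values) / n
    # Steps 1 to 3: subtract the mean, square, and add up.
    total = sum((x - mean) ** 2 for x in values)
    # Steps 4 and 5: subtract one from n, then divide.
    return total / (n - 1)


# Hypothetical sample: the mean is 4 and the squared differences are 4, 0, 4.
sample = [2, 4, 6]
result = sample_variance(sample)
print(result)  # 8 / 2 = 4.0

# The standard library gives the same answer.
assert abs(result - statistics.variance(sample)) < 1e-9
```

A sample needs at least two values because a single value would make the divisor n-1 equal to zero.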

Population variance vs. sample variance

The variance of a small sample is only an estimate of the variance of the population it was drawn from, so it gives researchers and statisticians a limited perspective of what's really going on in the entire population. The variance of the population, when every member can be measured, describes the spread of the data around the mean exactly. Here are some examples of how this works:

Example of population variance

Assume a statistician wants to measure the variance in weights of a population of zebras in a wildlife preserve. The statistician will first find the mean of the population's weights, and then subtract that value from each weight value. Assume there are five zebras currently being held at the preserve. The statistician measures each zebra's weight at the following values:

  • Zebra 1: 670 pounds

  • Zebra 2: 765 pounds

  • Zebra 3: 780 pounds

  • Zebra 4: 820 pounds

  • Zebra 5: 735 pounds

The statistician then adds up all of these values to get 3,770 total pounds. They divide this value by five, since five is the number of zebras in the entire population. The resulting mean is 754. This means the average weight of the preserve's five zebras is 754 pounds. The statistician then subtracts this mean value from each zebra's weight:

  • 670 - 754 = -84

  • 765 - 754 = 11

  • 780 - 754 = 26

  • 820 - 754 = 66

  • 735 - 754 = -19

The statistician then squares each of these differences before adding up the resulting products:

  • (-84)^2 = 7,056

  • (11)^2 = 121

  • (26)^2 = 676

  • (66)^2 = 4,356

  • (-19)^2 = 361

(7,056) + (121) + (676) + (4,356) + (361) = 12,570

The statistician then divides this sum by the number of zebras in the population: (12,570) / (5) = 2,514. This value represents the variance of the entire population.
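
The zebra figures above can be double-checked with a short Python sketch that follows the same order of operations. The variable names are illustrative. The weights are the five values listed in the example.

```python
# Zebra weights in pounds, taken from the example above.
weights = [670, 765, 780, 820, 735]

mean = sum(weights) / len(weights)          # 3,770 / 5
differences = [w - mean for w in weights]   # subtract the mean from each weight
squares = [d ** 2 for d in differences]     # square each difference
sum_of_squares = sum(squares)               # add up the squares
population_variance = sum_of_squares / len(weights)  # divide by n

print("mean:", mean)
print("differences:", differences)
print("squares:", squares)
print("sum of squares:", sum_of_squares)
print("population variance:", population_variance)
```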

Example of sample variance

If the example set of five zebras represents a sample of a larger population, the statistician will subtract one from five before dividing. Here's what that will look like:

(12,570) / (5-1) = 12,570 / 4 = 3,142.5. This means that the variance of just that small sample would then be 3,142.5.
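
To see both divisors side by side, here is a minimal sketch using Python's standard statistics module, where pvariance divides by n and variance divides by n-1. The variable names are illustrative.

```python
import statistics

# Zebra weights in pounds, taken from the example above.
weights = [670, 765, 780, 820, 735]

# Treat the five zebras as the whole population: divide by n = 5.
as_population = statistics.pvariance(weights)

# Treat the same five zebras as a sample: divide by n - 1 = 4.
as_sample = statistics.variance(weights)

print("population variance:", as_population)
print("sample variance:", as_sample)
```

The sample variance is always the larger of the two for the same data, because the same sum of squares is divided by a smaller number.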

What is the importance of variance?

The variance allows statisticians to understand the breadth of diversity in a sample or entire population, because the variance takes every value into account, including any outliers within the population. The variance formula is also useful in many business situations, including measuring and assessing sales numbers, developing products based on market research and many other applicable uses that can benefit businesses and organizations.

In addition to business uses, statisticians rely on the variance to compare how spread out different sets of data are. Because each difference from the mean is squared, the variance is especially sensitive to outliers, that is, data points that lie far from the mean. The closer to zero the variance gets, the more clustered together the data set is. The higher the variance, the more spread apart (and thus diverse) the data points are.
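
A short Python sketch can illustrate how the variance responds to clustering, spread and outliers. All four data sets below are hypothetical and exist only to show the effect.

```python
import statistics

# Hypothetical data sets. The first three all have a mean of 50.
identical = [50, 50, 50, 50]        # no spread at all
clustered = [49, 50, 50, 51]        # tightly grouped around the mean
spread = [10, 30, 70, 90]           # widely spread around the mean
with_outlier = clustered + [150]    # clustered data plus one distant value

var_identical = statistics.pvariance(identical)
var_clustered = statistics.pvariance(clustered)
var_spread = statistics.pvariance(spread)
var_outlier = statistics.pvariance(with_outlier)

print("identical:", var_identical)
print("clustered:", var_clustered)
print("spread:", var_spread)
print("with outlier:", var_outlier)

# The variance grows as the data move away from the mean.
assert var_identical < var_clustered < var_spread
# A single distant value raises the variance sharply.
assert var_outlier > var_clustered
```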
