5 Ways To Find Outliers in Statistics (With Examples)

By Indeed Editorial Team

Updated February 15, 2022

Published June 29, 2021

An outlier in statistics is any data point that differs significantly from the other data points. Outliers may be errors or significant observations, so it’s necessary to find them and understand them. In this article, we discuss why outliers matter in statistics, explain five ways to find them in your data and provide examples.

Why is it important to find outliers in statistics?

Outliers can significantly change the results of your analysis, especially if you’re calculating the average, or mean, of a data set in which all of the other data points fall within a much narrower range. For example, the mean of 3, 6, 7 and 10 is 6.5, but adding a single value of 54 raises the mean to 16.

You might ultimately end up removing an outlier from your results if you find it was recorded in error, but it’s necessary to first analyze it in order to understand its meaning.

Outliers can also expose inconsistencies in research and data gathering techniques and help you refine your procedures.

5 ways to find outliers

Here are five ways to find outliers in your data set:

1. Sort your data

An easy way to identify outliers is to sort your data, which allows you to see any unusual data points within your information. Try sorting your data in ascending or descending order, then examine the values at each end of the list. An unusually high or low piece of data could be an outlier.

For example, if you have these numbers in ascending order: 3, 6, 7, 10 and 54, you can see that 54 is much larger than the rest of the data points, which makes it a likely outlier.
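
The sorting step can be sketched in a few lines of Python. This is only an illustration that reuses the numbers from the example above. The gap calculation is an added convenience, not part of the method as described, and it simply makes the jump to 54 easier to spot.

```python
# Values from the example above, deliberately listed out of order.
data = [54, 7, 3, 10, 6]

# sorted() returns a new list in ascending order.
ascending = sorted(data)

# Differences between neighboring values (an illustrative extra step).
# A gap much larger than the others points to a value worth examining.
gaps = [b - a for a, b in zip(ascending, ascending[1:])]

print(ascending)
print(gaps)
```

This prints [3, 6, 7, 10, 54] and [3, 1, 3, 44]. The final gap of 44 is what sets 54 apart from the rest of the data.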

2. Graph your data

You can also use graphs, such as scatter plots or histograms, to find outliers. Graphs present your data visually, making it easy to see when a piece of data differs from the rest of the data set. A scatter plot displays your points of data as dots on a graph based on two variables, plotted on your x-axis and y-axis. Scatter plots are useful for visualizing outliers because you can see when one dot is far away from the other dots, which are usually clustered together. Therefore, the data point that is far away from the group is the outlier.

A histogram displays data in groups called "bins." Histograms usually group data in ranges, which is what differentiates histograms from bar graphs. The ranges of your data usually run along the x-axis, and the y-axis typically shows how many data points fall into each range. This can help identify unusual data points. For example, if most of your data points are on the right side of the graph and one small bin sits by itself on the far left side, then you can deduce that the data points in the far left bin are likely outliers.
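
To see how binning exposes an isolated value, here is a minimal text-only sketch in Python. The measurements and the bin width of 10 are hypothetical and chosen only for illustration. A charting tool would draw the same counts as bars.

```python
from collections import Counter

# Hypothetical measurements: most fall between 60 and 79, one is far lower.
data = [62, 65, 67, 70, 71, 72, 74, 75, 78, 12]
bin_width = 10  # assumed bin width, chosen only for this illustration

# Map each value to the lower edge of its bin, for example 67 -> 60.
counts = Counter((value // bin_width) * bin_width for value in data)

# Print every bin from the lowest to the highest, including empty ones,
# so the gap between the main cluster and the isolated bin is visible.
for lower in range(min(counts), max(counts) + bin_width, bin_width):
    label = f"{lower}-{lower + bin_width - 1}"
    print(f"{label:>7} | {'#' * counts[lower]}")
```

In the printed chart, the bins for 60-69 and 70-79 hold nine of the ten values, while the 10-19 bin holds a single value separated from the rest by several empty bins.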

3. Calculate the z-score

A z-score, or standard score, shows how many standard deviations a data point is from the mean of the data. To calculate the z-score, you subtract the mean from the raw measurement and divide the result by the standard deviation.

The equation for calculating the z-score is:

Z = (X−µ) ÷ σ

where:

X = raw measurement

µ = the mean

σ = the standard deviation

The further the z-score is from 0, the more unusual the data point is. A common rule of thumb treats data points with a z-score above 3 or below -3 as outliers. For example, if the z-scores for some of your data points are -0.35, -0.26, -0.21, -0.18 and 4.7, you can tell that the data point with a z-score of 4.7 is the furthest away from 0 and is the outlier.
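
The z-score formula above can be applied to a whole data set with Python's standard statistics module. The data below is hypothetical, and the cutoff of 3 is an assumption based on a widely used rule of thumb rather than a fixed standard. The sketch uses the population standard deviation to match the σ in the formula.

```python
from statistics import mean, pstdev

# Hypothetical data set: twelve similar values and one much larger value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 11, 45]
threshold = 3  # assumed cutoff (rule of thumb), adjust to suit your data

mu = mean(data)       # the mean
sigma = pstdev(data)  # population standard deviation

# Z = (X - mu) / sigma for every data point.
z_scores = [(x - mu) / sigma for x in data]

# Keep the data points whose z-score is further than the cutoff from 0.
outliers = [x for x, z in zip(data, z_scores) if abs(z) > threshold]
print(outliers)
```

Here only the value 45 is flagged. In very small data sets no z-score can get far from 0, so this approach works best when you have more than a handful of data points.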

4. Calculate the interquartile range

The interquartile range (IQR) measures the dispersion of the data points between the first and third quartile marks. The general rule for using it to identify outliers is that a data point is an outlier if it is more than 1.5 times the IQR below the first quartile or more than 1.5 times the IQR above the third quartile.

To calculate the IQR, you need to know the first and third quartiles. The first quartile is the 25th percentile, which you can find as the median of the lower half of the sorted data set. The third quartile is the 75th percentile, which you can find as the median of the upper half of the sorted data set.

To find the IQR, you subtract the first quartile from the third quartile:

IQR = Q3 − Q1

where:

Q3 = the third quartile = the median of the upper half of the data set

Q1 = the first quartile = the median of the lower half of the data set

You can then use the IQR to find any outliers in your data set. A data point is a low or high outlier if it satisfies one of these inequalities:

High outlier > Q3 + (1.5 × IQR)

Low outlier < Q1 − (1.5 × IQR)
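
The IQR steps above can be sketched as a short Python function. The function name iqr_outliers and the sample data are hypothetical. The sketch finds the quartiles as the medians of the lower and upper halves, as described above, and it assumes the data set has at least two values. Other tools may use a different quartile convention and return slightly different results.

```python
from statistics import median

def iqr_outliers(values):
    """Return (q1, q3, outliers) using the median-of-halves quartile method."""
    ordered = sorted(values)
    half = len(ordered) // 2  # assumes at least two values

    # For an odd number of values, the middle value is left out of both halves.
    q1 = median(ordered[:half])   # median of the lower half
    q3 = median(ordered[-half:])  # median of the upper half
    iqr = q3 - q1

    low_limit = q1 - 1.5 * iqr
    high_limit = q3 + 1.5 * iqr
    outliers = [v for v in ordered if v < low_limit or v > high_limit]
    return q1, q3, outliers

# Hypothetical data set used only for illustration.
q1, q3, flagged = iqr_outliers([3, 6, 7, 8, 9, 10, 12, 54])
print(q1, q3, flagged)
```

For this data, Q1 is 6.5 and Q3 is 11, so the IQR is 4.5 and the limits are -0.25 and 17.75. Only 54 falls outside them.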

5. Use a hypothesis test

If you would like to try more advanced options to find outliers, consider trying hypothesis tests such as Grubbs' test, the generalized ESD test or Peirce's criterion. A hypothesis test calculates a test statistic from your data and compares it to the value you would expect if the data contained no outliers. Grubbs' test can be used when you suspect just one outlier in a normally distributed data set.

The generalized extreme studentized deviate (ESD) test works on data with only one variable and can detect more than one outlier, as long as you specify the maximum number of outliers to test for. Statisticians use Peirce's criterion to find and eliminate outliers by checking whether a data point's deviation from the mean exceeds a limit based on the standard deviation, the number of data points and the number of suspected outliers.

Since it's difficult to select the correct hypothesis test unless you know a lot about your data set, these tests can produce inaccurate results or be challenging to complete. You can study them in advance to help you select the right one, or consider whether simpler methods could allow you to find outliers in your data.
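
As an illustration of the simplest of these tests, the sketch below calculates the Grubbs' test statistic, which is the largest absolute deviation from the mean divided by the sample standard deviation. The function name grubbs_statistic is hypothetical. The critical value of 1.715 is an assumption: it is the approximate two-sided value for five data points at a 5% significance level, and you should confirm the value for your own sample size and significance level in a published table or statistics package. The test also assumes the data is approximately normally distributed.

```python
from statistics import mean, stdev

def grubbs_statistic(values):
    """Return (G, suspect), where suspect is the value furthest from the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation
    suspect = max(values, key=lambda v: abs(v - m))
    return abs(suspect - m) / s, suspect

# Hypothetical data set used only for illustration.
data = [3, 6, 7, 10, 54]

# Assumed critical value for n = 5, two-sided test, 5% significance level.
critical_value = 1.715

g, suspect = grubbs_statistic(data)
is_outlier = g > critical_value
print(suspect, round(g, 3), is_outlier)
```

With this data, the statistic for 54 is about 1.777, which is above the assumed critical value, so the test would flag 54 as an outlier.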
