You are here:

Choosing the appropriate data display

Selecting the appropriate data display method is crucial for accurately representing and interpreting data. Different types of data—whether univariate, bivariate, categorical, or numerical—require specific display methods to reveal meaningful insights. Each method has its unique features that emphasise different aspects of the data. Key display methods, including bar charts, histograms, dot plots, stem-and-leaf plots, and box plots, and their statistical measure are explored to effectively interpret data.

Use this page to revise the following concepts within choosing the appropriate data display:

Bar Charts
Histogram
Dot Plot
Stem Plot
Box Plot

Bar Charts

A Bar chart is a graphical display of data using bars of different heights. Bar charts are specifically designed to display categorical data. In a bar chart, each bar’s height (or length) is directly proportional to the value it represents, allowing quick visual comparisons among categories.

A variation on the standard bar chart is the segmented bar chart. It is a compact display that is particularly useful when comparing two or more categorical variables. In a percentage segmented bar chart, the lengths of each segment in a bar are determined by the percentages, and the total height of the bar is 100%.

Bar charts are particularly useful for identifying the mode or modal category (the most frequently occurring value or category), which is shown as the tallest bar or the longest segment in a bar chart.

Bar Chart
Segmented Bar Chart
Percentage Segmented Bar Chart

Check your understanding View

Histogram

A histogram is a graphical representation of data using columns, where the height of each column indicates the frequency of data points within a specific range. Unlike bar charts, which display categorical data, histograms are used for numerical data. Histograms are particularly useful for identifying the most and least common values and analysing the shape of data distribution.

Mean

The mean is calculated by dividing the sum of all data values by the total number of data points. It represents the average value and helps summarise the overall centre of the data distribution.

In a histogram, the data is often grouped into intervals, covering a range of values. The frequency of exact values within each interval are not known, so the mean can often only be estimated approximately.

Median

The median in a histogram is the point that divides the histogram into two equal parts, with 50% of the data represented by the columns to the left and 50% to the right. Similar to the mean, median can often only be estimated approximately in a histogram.

Mode/Modal Interval

In a histogram with grouped continuous data, the tallest column represents the modal interval, indicating the most frequent range of values. To clearly define the distribution, the full range should be specified (e.g., 10–<15 rather than just the starting value).

In a histogram with ungrouped discrete data, the mode is the tallest column, showing the most frequently occurring value.

The mode (or modal interval) is useful for understanding central tendencies and identifying skewness in a distribution, helping to interpret patterns in the data.

Histogram with Grouped Continuous Data
Histogram with Ungrouped Discrete Data

Features of Data Distribution from Histograms

A histogram visually represents the distribution of numerical data, highlighting key features including shape and outliers. These features help understand patterns, identify trends, and draw meaningful conclusions from the data.

Shape

A histogram can be positively skewed, negatively skewed, or symmetric. Identifying its shape is essential for selecting appropriate summary statistics and understanding data distribution patterns.

Positively skewed: the distribution has a short left tail and a long right tail, indicating that most values are clustered towards the left lower end, while a few larger values extend toward the positive side. This often causes the mean to be greater than the median or mode.
Negatively skewed: the distribution has a short right tail and a long left tail, indicating that most values are higher, but a few smaller values extend toward the negative side. This often causes the mean to be less than the median or mode.
Symmetric: the distribution appears balanced around the central vertical axis, forming a mirror image.
- In a single-peaked symmetric distribution, there is one clear peak, the mean, median, and mode are usually similar.
- In a bimodal symmetric distribution has two distinct peaks, each representing a mode, with the mean and median usually falling in between the two peaks.

Positively Skewed Histogram
Negatively Skewed Histogram
Single-peaked Symmetric Distribution
Bimodal Symmetric Distribution

Outliers

Outliers are data points that stand out from the main body of the data, appearing unusually high or low. Identifying and investigating outliers is important, as they may indicate rare occurrences, experimental anomalies, or potential errors in data collection.

Check your understanding View

Dot Plot

A dot plot is a graphical representation to display discrete numerical data using a number line, with each data point represented by a dot. When multiple data points have the same value, the dots are stacked on top of each other. Dot plots are particularly useful for small data sets, providing a visualisation of the distribution and frequency of values.

In a dot plot, the mode, which is the value that appears most frequently, can be easily identified by the tallest stack of dots.

Shape and Outliers

The shape of a dot plot, like a histogram, shows the overall distribution of the data.

Positively skewed: the dots are clustered on the left side, with a long tail extending to the right, indicating a few higher values.
Negatively skewed: the dots are clustered on the right side, with a long tail extending to the left, indicating a few lower values.
Symmetric: the left and right side of the dots are approximately distributed around a central peak, forming a mirror image of each other.
Outliers: Individual dots that are stand-out from the main cluster of data points, indicating values that are significantly high or low.

Summary Table with Examples

Positively skewed	Negatively skewed	Symmetric	Outliers

Measures of Centre

The mean and the median are both measures of the centre of a distribution.

When the distribution is symmetric and there are no outliers, either the mean or the median can be used to indicate the centre of the distribution. However, if the distribution is clearly skewed and/or there are outliers, the median is the preferred measure, as it is less affected by extreme values.

The mean is the average of all data points, calculated by summing the data values and dividing by the total number of data values.
Worked Example
The dot plot shows the distribution of the number of children in 16 families. Find the mean number of children in these families, correct to 2 decimal places.
\(\textrm{Mean}\) \(=\) \(\frac{0\times 3+1\times 2+2\times 6+3\times 2+4\times 1+7\times 1+8\times 1}{16}\) \(=\) \(2.4375\approx 2.44\)

Median

Steps	Solution
1. There is an even number of data values (16), use the rule to find the positions of the middle two values.	The median is the average of the \(\frac{16}{2}=8\)th and \((\frac{16}{2}+1)=9\)th values.
2. Locate the middle data values on the dot plot.
3. Write down the median.	\(\textrm{Median}=\frac{2+2}{2}=2\) Note: You can verify your answer by counting the number of data values on each side of the median. They should be equal.

Measures of Spread

Measures of spread in a dot plot describe how the data is distributed around the centre. The range, interquartile range, and standard deviation are commonly used to measure spread.

To measure the spread of a data distribution around the median, the interquartile range is used.
To measure the spread of a data distribution about the mean, the standard deviation is used.

Range measures the difference between the largest and smallest values in the data set and can be affected by outliers.
\(R=\) largest data value - smallest data value.
Worked Example
The dot plot shows the distribution of the number of rainy days recorded in a high rainfall area for 14 months. Determine the range of the number of rainy days over 14 months.
\(R=20-8=12\)

Interquartile Range

Interquartile range (IQR) measures the spread of the middle 50% of data in a dot plot. It is calculated as the difference between the third quartile \((Q_{3})\),which marks the 75th percentile, and the first quartile \((Q_{1})\), which marks the 25th percentile.

\(IQR=Q_{3}-Q_{1}\)

Unlike the range, the IQR is generally not affected by the presence of outliers, making it a more reliable measure of spread.

Worked Example 1

The dot plot shows the distribution of the number of rainy days recorded in a high rainfall area for 14 months. Determine the interquartile range of the number of rainy days over 14 months.

The dot plot shows the number of rainy days on the horizontal axis, ranging from 8 to 20. There are 3 dots above 15, 2 dots each above 13 and 17, and 1 dot each above 8, 11, 12, 14, 19, and 20.

Steps	Solution
1. There are 14 values in total. This means that there are 7 values in the lower half, and 7 in the upper half.
2. The median of the lower half is the median of lower 7 values, which is the \(\frac{7+1}{2}=4\textrm{th}\) value.	Lower half: 8, 11, 12, 13, 13, 14, 15
3. The median of the upper half is the median of upper 7 values, which is also the \(\frac{7+1}{2}=4\textrm{th}\) value.	Upper half: 15, 15, 16, 16, 17, 19, 20
4. Calculate \(IQR\) using the formula.	\(IQR=16-3=3\)

Hidden break text

Worked Example 2

The dot plot shows the age distribution (in years) of the 19 members of a local soccer team. Determine the interquartile range of their ages.

The dot plot displays age (in years) on the horizontal axis, ranging from 17 to 29. There are 4 dots above 22, 3 dots above 20 and 23, 2 dots above 18 and 31, and 1 dot above 19, 24, 25, 26, and 28.

Steps	Solution
1. There are 19 values in total. Calculate the median first.	The median is the \(\frac{19+1}{2}=10\textrm{th}\) value, which is 22. Mark the value 22 on the dot plot.
2. Since \(n\) is odd, to find the quartiles, the median value is excluded. This leaves 9 values below the median and 9 values above the median.
3. \(Q_{1}=\) median of the lower 9 values.	Lower half: 18, 18, 19, 20, 20, 20, 21, 21, 22
4. \(Q_{3}=\) median of the top 9 values.	Upper half: 22, 22, 23, 23, 23, 24, 25, 26, 28
5. Calculate \(IQR\) using the formula.	\(IQR=23-20=3\)

Standard deviation uses the mean as centre and is most appropriate for symmetrical distribution. It measures, on average, how far the data points are from the mean. The formula for calculating the standard deviation, s, is as follows:
\(s=\sqrt{\frac{\sum(x-\bar x)^{2}}{n-1}}\)
where \(n\) is the number of data points in the set and \(\bar x\) is the mean of the \(x-\)values.
However, a calculator or statistical software is typically used to determine the value of a standard deviation.
A large standard deviation indicates that the data points are more spread out, whereas a small standard deviation means the data points are closer to the mean. Therefore, the standard deviation is especially useful when comparing two datasets to assess their variability.

Check your understanding View

Stem Plot

A stem plot (or stem-and-leaf plot), is a graphical representation of numerical data, which works best for small to medium-sized datasets (up to about 50 data values). Like dot plots, stem plots are designed as pen-and-paper techniques.

In a stem plot, each data value is divided into two parts: the leading digits form the ‘stem’, and the last digit is the ‘leaf’. A key (e.g., \(1|2=12\)) should always be included to clarify how the numbers in the plot should be interpreted.

Splitting the stems is particularly useful when there are few different values for the stem, and many data values share the same one or two leading digits. This technique helps to prevent overcrowding in the plot and allows for an interpretation of the underlying shape of the distribution.

Stem Plot
Stem plot with split stems

Shape and Outliers

Stem plots, like histograms, show the overall distribution of data. To visualise the shape more easily, imagine rotating the stem plot 90 degrees counterclockwise, its shape now can be interpreted in the same way as for a histogram or dot plot - look for symmetry, peaks, or long tails to determine if the distribution is symmetric or skewed.

Outliers in a stem plot are data points that are separated from the main cluster of data, appearing as isolated stems with leaves.

Summary Table with Examples

Positively skewed	Negatively skewed

Symmetric	Outliers

Mode, Mean, and Median

The mode, mean, and median are key measures that provide valuable insights into the central tendency of the data. The methods for calculating these measures are similar to those used in dot plots.

Worked Example

The stem plot shows the duration of the calls that Lucy makes each day.

The stem plot has key 3|0 = 30 minutes. The stems are as follows: Stem 0: Leaves 2, 2, 3, 5; Stem 1: Leaves 0, 0, 1, 2, 3; Stem 2: Leaves 6, 6, 6; Stem 3: Leaves 0, 1, 1, 1, 2, 3; Stem 6: Leaf 2.

a) Find the most frequent call duration that Lucy makes each day.

In this stem plot, the number 6 with stem 2 and the number 1 with stem 3 both appear three times, making them the most frequent values. Therefore the modes of this dataset are 26 and 31.

Note that a dataset can have multiple modes if there are several values with the same highest frequency.

b) Find the mean duration of the calls Lucy makes each day, rounded to the nearest whole number.

\(\textrm{Mean}=\frac{2\times 2+3+5+10\times 2+11+12+13+26\times 3+30+31\times 3+32+33+62}{19}=20.8421\cdots\approx 21\)

c) Find the median call durations Lucy make each day.

The total number of data values is 19, so the median is located at the \(\frac{19+1}{2}=10\textrm{th}\) value, which is 26.

Measure of spread

The measure of spread including range, interquartile range, and standard deviation for a stem plot are similar to those used in a dot plot and provide insight into how the data is distributed around the centre.

Worked Example

The stem plot shows the average number of hours per week a group of 26 people spent on the internet.

The stem plot has key 2|3 = 23 hours. The stems are as follows: Stem 1: Leaf 4; Stem 4: Leaves 0, 0, 7; Stem 5: Leaves 2, 3, 4; Stem 6: Leaves 1, 1, 2, 3, 6; Stem 7: Leaves 2, 2, 4, 7; Stem 8: Leaves 0, 1, 1, 2, 2, 3, 5, 5, 5; Stem 9: Leaf 3.

a) Find the range of the average number of hours per week spent on the internet by this group.

\(R=93-14=79\)

b) Find the interquartile range of the average number of hours per week spent on the internet by this group.

There are 26 values in total. This means that there are 13 values in the lower ‘half’ and 13 values in the upper ‘half’.

Lower half: 14, 40, 40, 47, 52, 53, 54, 61, 61, 62, 63, 66, 72

Upper half: 72, 74, 77, 80, 81, 81, 82, 82, 83, 85, 85, 85, 93

\(Q_{1}\) is the \(\frac{13+1}{2}=7\textrm{th}\) value of the lower half, 54.

\(Q_{3}\) is also 7th value of the upper half, 82.

\(IQR=Q_{3}-Q{1}=82-54=28\).

c) A calculator is used to determine the standard deviation of this stem plot to be 18.42, rounded to two decimal places. Interpret this value in the context of the stem plot.

A standard deviation of 18.42 indicates that, on average, the data points are 18.42 units away from the mean. This suggests that the data is relatively spread out. However, since the stem plot is negatively skewed and includes a potential outlier of 14, it would be more appropriate to use the median and Interquartile range to measure the centre and spread, as these are less affected by extreme values.

Check your understanding View

Box Plot

A box plot (or a box-and-whisker plot) is a graphical representation of numerical data that visually displays the five-number summary, which consists of:

Minimum - the smallest value in the dataset
Lower quartile \(Q_{1}\) - the median of the lower half of the data (25th percentile)
Median \(Q_{2}\) - the middle value of the dataset (50th percentile)
Upper quartile \(Q_{3}\) - the median of the upper half of the data (75th percentile)
Maximum - the largest value in the dataset

An outlier is a data point that lies more than 1.5 times the interquartile range (IQR) beyond the quartiles. The boundaries for identifying outliers are:

Upper fence: \(Q_{3}+1.5\times IQR\)
Lower fence: \(Q_{1}-1.5\times IQR\)

Data values outside these fences are classified as possible outliers and are plotted separately. However, if a data point lies exactly on an upper or lower fence, it is not considered an outlier.

Box plots are extremely useful for summarising large datasets in concise plots. Box plots provide a visual summary of key statistical measures, making it easier to identify patterns and potential outliers.

Check your understanding View

Distributions of Box plots

Box plots provide a concise and effective graphical representation for describing and comparing the shape, centre, spread, and outliers of a distribution.

Shape and Outliers

The shape of the distribution can be identified by the position and length of the whiskers, as well as the location of the median.

Symmetric distribution: The median is located near the middle of the box, and the whiskers are approximately equal in length. In this case, the mean and median are usually similar.
Positively skewed distribution: The median is off-centre towards the left hand side of the box, and the right whisker is longer than the left. This often causes the mean to be greater than the median.
Negatively skewed distribution: The median is off-centre towards the right hand side of the box, and the left whisker is longer than the right. This often causes the mean to be less than the median.
Distribution with outliers: The box plot with outliers usually has large gaps between the main body and the data values in the tails, with outliers represented as dots separate from the box and whiskers.

Centre

The centre of the distribution in a box plot is indicated by the position of the median.

Spread

The spread refers to the variability of the distribution, it reflects how spread out the data values are within the set. The spread can be measured using the IQR or range in a box plot. However, IQR is less affected by outliers.

Summary Table with Examples

Positively skewed	Negatively skewed

Symmetric	Distribution with outliers

Worked Example

Describe the distribution represented by the boxplot in terms of shape and outliers, centre and spread. Give appropriate values.

The box plot has a horizontal axis from 0 to 50, with a dot above 5. A horizontal line spans from 12.5 to 15, followed by a box with vertical lines at 15, 20, and 30. The horizontal line extends from 30 to 40, and a dot appears above 50.

The distribution is positively skewed with outliers. The distribution is centred at 20, the median value. The spread of the distribution, as measured by the IQR, is 15, as measured by the range, 45. There are two outliers: 5 and 50.

Check your understanding View

Taking it further

Your feedback matters

We want to hear from you! Let us know what you found most useful or share your suggestions for improving this resource.

Consolidate your knowledge: Practice questions View

Back

Choosing the appropriate data display

Bar Charts

Check your understanding View

Histogram

Mean

Median

Mode/Modal Interval

Features of Data Distribution from Histograms

Shape

Outliers

Check your understanding View

Dot Plot

Shape and Outliers

Summary Table with Examples

Measures of Centre

Worked Example

Worked Example

Measures of Spread

Worked Example

Worked Example 1

Worked Example 2

Check your understanding View

Stem Plot

Shape and Outliers

Summary Table with Examples

Mode, Mean, and Median

Worked Example

Measure of spread

Worked Example

Check your understanding View

Box Plot

Check your understanding View

Distributions of Box plots

Shape and Outliers

Centre

Spread

Summary Table with Examples

Worked Example

Check your understanding View

Taking it further

Prepare for timed assessments

Responsible and ethical use of AI

Science lab report

Chemistry

Physics

Biology

Your feedback matters

Consolidate your knowledge: Practice questions View