What's the difference?

Figure: Comparing the heights of 13 year-olds to 14 year-olds (data sampled from https://new.censusatschool.org.nz)

Take a look at the picture above and compare the two images.

What is different? What is the same? These kinds of questions get asked in statistics classrooms all over the world, but they hide a level of nuance and complexity that deserves further consideration.

Because of the way statistics is often taught, it is easy for teachers to ignore – and therefore easy for students to miss – some of the subtleties that are apparent when handling data. When students are first introduced to statistics at school they may investigate data sourced from within their classroom and engage in descriptive statistics – making direct comparisons between complete sets of data with some sense of certainty. In the data above, the median height for 14 year-olds is greater than the median height for 13 year-olds, the range for 14 year-olds is smaller than the range for 13 year-olds, and so on. In the descriptive situation, where the data is taken from all students in the two year groups for example, this is sufficient and comparisons can be made definitively.

Later, students begin to work with samples and the game radically changes. Any comparison made now depends on an inference to the population that is merely conjectured from the data in the sample. If we treat the data above as samples, the facts about the median and the range remain true, but now we can no longer be certain what they mean. The true values for the population are masked and our conclusions are no longer obvious. Is the median truly larger for 14 year-olds? Or might that only be true for these samples?
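
To see why a sample can mislead, it may help to simulate the situation. The sketch below is only an illustration: the two "populations" are invented (normally distributed heights with made-up means and spreads, not the CensusAtSchool data), and the names `median_difference` and `simulate` are hypothetical. It repeatedly draws a sample of 30 from each population and records the difference in sample medians.

```python
import random
import statistics


def median_difference(pop_a, pop_b, n, rng):
    """Difference in sample medians (b minus a) for one pair of random samples."""
    sample_a = rng.sample(pop_a, n)
    sample_b = rng.sample(pop_b, n)
    return statistics.median(sample_b) - statistics.median(sample_a)


def simulate(n=30, repeats=1000, seed=1):
    """Draw many pairs of samples and collect the difference in medians."""
    rng = random.Random(seed)
    # ASSUMPTION: invented populations for illustration only,
    # NOT the CensusAtSchool data shown in the figure.
    pop_13 = [rng.gauss(160, 8) for _ in range(5000)]  # heights in cm
    pop_14 = [rng.gauss(163, 8) for _ in range(5000)]  # about 3 cm taller
    diffs = [median_difference(pop_13, pop_14, n, rng) for _ in range(repeats)]
    # Count the samples in which the 13 year-olds' median came out larger
    reversed_count = sum(d < 0 for d in diffs)
    return diffs, reversed_count


diffs, reversed_count = simulate()
print(f"Smallest difference in medians: {min(diffs):.1f} cm")
print(f"Largest difference in medians:  {max(diffs):.1f} cm")
print(f"Samples where 13 year-olds had the larger median: "
      f"{reversed_count} of {len(diffs)}")
```

Even though the invented 14 year-old population really is taller, some pairs of samples point the other way. That is exactly the uncertainty a single pair of samples conceals.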

Formal inferential techniques provide tools for statisticians to quantify the level of certainty that can be attributed to decisions made from samples, but what might this look like in a secondary or even a primary school classroom, where only a more informal approach may be accessible to students?
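
As one illustration of what such a formal tool can look like, the sketch below runs a simple permutation test on the difference in medians. It is not a method taken from the research cited here, only a common technique swapped in for illustration. The heights are invented, and the function name `permutation_p_value` is hypothetical. The idea is to shuffle the group labels many times and see how often chance alone produces a difference at least as large as the one observed.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, repeats=5000, seed=0):
    """One-sided permutation test for median(group_b) - median(group_a)."""
    rng = random.Random(seed)
    observed = statistics.median(group_b) - statistics.median(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(repeats):
        # Shuffling breaks any real link between age group and height
        rng.shuffle(pooled)
        diff = statistics.median(pooled[n_a:]) - statistics.median(pooled[:n_a])
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / repeats


# ASSUMPTION: invented heights in cm, for illustration only
heights_13 = [150, 152, 155, 157, 158, 160, 161, 163, 165, 170]
heights_14 = [151, 154, 156, 159, 161, 162, 164, 166, 168, 172]

observed, p_value = permutation_p_value(heights_13, heights_14)
print(f"Observed difference in medians: {observed} cm")
print(f"Proportion of shuffles with a difference at least this large: {p_value:.3f}")
```

A large proportion suggests the observed difference could easily arise by chance; a small one suggests it is less likely to be an accident of these particular samples.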

There is a growing body of research (for example, Pratt et al., 2008) that suggests teachers should encourage students to consider whether they are playing game 1 (describing or comparing a population) or game 2 (working with a sample from a larger population). It may seem obvious, but often this is not clear to students. A population of thirty students from a single class may have little that obviously distinguishes it from a sample of thirty students from the whole school. Worse, the phrasing of the question under consideration may alter the game being played by students on the same data set. A simple example of this can be seen in the data above. Game 1 “Are these 14 year-olds taller than these 13 year-olds?” becomes game 2 when reworded as “Are 14 year-olds taller than 13 year-olds?”

This distinction is subtle, and if teachers don’t create opportunities to highlight it, there is no reason why students should notice it, let alone choose an appropriate approach for each situation.

A second important approach that has been recommended by researchers is to use a model for informal inference that can be adapted to the students’ current understanding. Makar and Rubin (2009) suggest that when making statistical statements, students should provide:

a generalisation
a justification from the data
a probabilistic statement referring to confidence in the prediction

For the height data this could be something like: ‘14 year-olds are slightly taller than 13 year-olds in this sample, as the two medians suggest; however, the difference is small, so this may well be true only for this sample’. It doesn’t matter that the assessment of confidence in the conclusion is subjective and potentially unreliable – the key is promoting the idea that reasoning from samples is not deterministic, laying the groundwork for more formal techniques later.

By building informal inference into student experiences of statistics education from an early age, a more robust concept of statistical techniques can be supported, improving data literacy for all students – even if they need never progress to more formal techniques.

References:

Makar, K., & Rubin, A. (2009). A framework for thinking about informal statistical inference. Statistics Education Research Journal, 8(1), 82–105.

Pratt, D., Johnston-Wilder, P., Ainley, J., & Mason, J. (2008). Local and global thinking in statistical inference. Statistics Education Research Journal, 7(2), 107–129.
