# Using Excel to explore the variability of samples

- Cambridge Mathematics
- Mathematical Salad
- Using Excel to explore the variability of samples

## Using Excel to explore the variability of samples

Students are expected to learn how to collect a sample of data and discover that bigger samples are more representative. Without direct experience it is difficult to quantify the degree to which size affects the variability of the samples when compared to the population. Technology provides opportunities to explore this relationship further, and even begin to demonstrate the concept of multi-sampling to students before they go on to encounter the more technical concepts of formal statistical inference in further study.

It is straightforward in Excel to set up a spreadsheet that allows teachers and students repeatedly to take samples of differing sizes from a population and graph the results; all that is required are some data, a computer and a few simple commands.

**Setup**

First you will need some data. In this example we will use bivariate data concerning student grades (US) in a mid-term exam against the time it took to complete the test. It is good practice to encourage students to discuss what they think the relationship might be before showing them any graphs.

Record the data in a spreadsheet in two adjacent columns and create a third column called “Random”. In the first available row enter the formula “=rand()” into the cell and hit enter. This will generate a random number between 0 and 1. Copy this cell down the column for all the data by double clicking in the bottom right of the cell (your cursor should change from an arrow to a small cross when you are in the right place).

You should now have three columns in your spreadsheet: two for the data and a third containing random numbers. The F9 key causes Excel to refresh all formulae; if you hit it a few times you will see the random numbers change.

**Graphing **

Now add a graph. Decide on your sample, for example 10 data points, and highlight it, then choose Insert->Scatter from the menu tools. Don’t worry too much about the sample size for now as it is easy to change later.

At this point you will have a scatter graph based on the first ten items of data, but clearly this is not a random sample. To get some random data, simply click on the column index and choose sort and Filter->Sort smallest to largest, then “Expand the selection”

You should immediately see the data and the graph change. So what is going on here? When you select sort, Excel orders the rows by the random numbers from smallest to largest, and hence new values are put in the cells from which the scatter graph is constructed. If you look at the random numbers now though, they do not appear to be in order – this is because after Excel has ordered the list it recalculates all its formulae, generating a new list of random numbers.

To create a new sample simply repeat the process of sorting the “Random” column. If you wish to streamline the process, it is easy to record a Macro and create a button that can be pressed to select new data. Simply select “Record Macro” in the “View” tab, give it a name and run through the column sorting process once, then choose “Stop recording”. Insert a shape into the spreadsheet to act as a button, then right click and choose “Assign Macro” using the Macro you just recorded. Now each time you click on the button you generate a new sample.

**Tidying up **

To avoid confusion for students, it is possible to hide the column of random numbers by right-clicking on the column index and selecting “hide”.

You may also wish to add a line of best fit to the scatter graph, which can be done by right-clicking one of the data points on the graph and choosing “Add Trendline”.

Finally, if you want to explore the effect of changing the sample size, when the chart is highlighted, the data source is shown by coloured boxes on the spread sheet. These can be dragged in or out to create larger or smaller samples.

We have used bivariate data and scatter graphs here, but the same basic setup could be used to generate means (=AVERAGE), medians (=MEDIAN) or frequency charts for discrete data (using =COUNTIF several times to create a frequency table from the sample and using this as the data source). See what you can create and share your ideas with us on Twitter @Cambridgemaths.