Lies, damned lies and littered with letters all over the place

If you’re starting to read research papers and don’t have a background in statistics, you may be finding some of the many letters, tables and symbols somewhat off-putting. It is an unfortunate fact that many really great research papers look and feel inaccessible to those who might find them most useful because they are impenetrable at first, second and even third glance (we certainly find this, sometimes). Because part of our work at Cambridge Maths is reading and pondering over oodles of research, an observer might smile at the amount of time we spend frowning over tables and symbols as we try to puzzle out exactly what is going on. This happens to everyone, in our experience. But, please, be undaunted: one place to start is to consider some of the most common statistical techniques and letters used when doing quantitative research (counting stuff) – here are some of the ones we come across most often.

n is the number of something, sometimes called frequency – usually the total number of people or things in the study. An n=1 study would be a single-subject study with just one person in it; n = 66,000,000 would be a study of approximately all the current inhabitants of the UK.

s, sd or σ is standard deviation, which is a measure of how spread out the data is. A low value means the data is mostly close to the middle, while a high value means it is more scattered. (It is important to remember that ‘low’ is in the context of the data being used.) It’s ‘standard’ because if the underlying data is shaped in a bell curve (the shape that data makes the most often), we can say the same thing for any dataset: about 68% of the data lies within one standard deviation of the mean, about 95% within two, and about 99.7% within three.

This means you can in some way compare spread across all kinds of different data sets (very cool). Data further than two standard deviations away from the mean (the middle) are sometimes said to be ‘significant’ (stats term for ‘worth poking around in’). Look out for the sample standard deviation (the spread of the data in the researcher’s sample) versus the population standard deviation (the spread of data in the whole population as far as is known).
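To see the difference between the sample and population standard deviation, here is a minimal sketch using Python's built-in statistics module. The eight data values are made up for illustration, and share_within is a hypothetical helper name of ours; it reports how much of a bell curve sits within a given number of standard deviations of the mean.

```python
from statistics import NormalDist, mean, pstdev, stdev

# Hypothetical sample of eight measurements, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

centre = mean(data)
population_sd = pstdev(data)  # divides by n: treats the data as the whole population
sample_sd = stdev(data)       # divides by n - 1: treats the data as a sample


def share_within(k):
    """Share of a bell curve lying within k standard deviations of the mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print(f"mean = {centre}, population sd = {population_sd:.3f}, sample sd = {sample_sd:.3f}")
for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.1%}")
```

The sample version always comes out a little larger than the population version for the same numbers, because dividing by n - 1 allows for the fact that a sample tends to underestimate the spread of the whole population.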

significance level or α value is the cut-off point that the researcher has used to try to tell whether something is likely to have been an actual effect of the thing they’re studying, or just happened by chance. Significance levels tend to be 0.05, 0.01 or 0.001, which means the probability of this particular event (or a more extreme example) happening by chance is around 5%, 1% or 0.1%. A smaller significance level therefore suggests a higher degree of confidence that the effect is genuine, rather than just chance. Good research should usually set a significance level before it starts, otherwise the researchers run the risk of ‘adjusting down’ to make it look like the results say something. A result that is very close to a boundary is not generally very reliable.

T or the test statistic is a value calculated from the data so that the researcher can test for significance (‘is this data saying anything important?’). The researcher chooses a model to fit based on key assumptions about the data and then does the test by comparing to that model. Common tests are the t-test or z-test.

p or p values are probabilities (hence p) of getting a test statistic at least as extreme as the one observed if nothing but chance were at work. They are found from the T values of different tests and compared to the chosen significance level to decide if a result is significant: a p value below the significance level counts as significant. A massive p value (like 0.5) would mean that the experiment is pretty useless at telling you anything.
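The way α, T and p fit together can be sketched with a z-test, one of the common tests mentioned above. This is only an illustration under stated assumptions: the numbers (a population mean of 100 with standard deviation 15, and a sample of 25 people averaging 106) are invented, z_test is a hypothetical helper name, and the test is two-sided with a known population standard deviation.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, n, pop_mean, pop_sd):
    """Return the z test statistic and its two-sided p value."""
    standard_error = pop_sd / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Probability of a result at least this far from the mean, in either direction.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Invented numbers, for illustration only.
z, p = z_test(sample_mean=106, n=25, pop_mean=100, pop_sd=15)

for alpha in (0.05, 0.01):  # significance levels chosen before looking at the data
    verdict = "significant" if p < alpha else "not significant"
    print(f"z = {z:.2f}, p = {p:.4f}, alpha = {alpha}: {verdict}")
```

With these made-up numbers the result clears the 0.05 cut-off but not the 0.01 one, which is exactly the kind of close-to-the-boundary result that deserves caution.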

F or F-test is a way of comparing two groups (often a small subset of the population to the whole population) to see if the amount of variation in one is noticeably different from the amount of variation in the other.
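In its simplest form, the F statistic is just the ratio of two variances (a variance is the standard deviation squared). Here is a rough sketch with two invented groups; f_ratio is a hypothetical helper name, and the sketch stops at the statistic itself, since turning F into a p value needs the F distribution from a statistics library such as SciPy.

```python
from statistics import variance

# Two invented groups with the same mean but very different spread.
group_a = [2, 4, 4, 4, 5, 5, 7, 9]
group_b = [4, 5, 5, 5, 5, 5, 5, 6]


def f_ratio(first, second):
    """Ratio of the two sample variances, larger on top so that F >= 1.

    Assumes neither group has zero variance.
    """
    var_first, var_second = variance(first), variance(second)
    return max(var_first, var_second) / min(var_first, var_second)


f = f_ratio(group_a, group_b)
print(f"F = {f:.2f}")
```

An F value near 1 suggests the two groups vary by about the same amount; a much larger value suggests one group is considerably more spread out than the other.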

χ² (chi-squared) or χ² test (often looks very intimidating with lots of tables) is a more general test and can be used for data that is not numerical. For a set of data sorted into discrete groups, a chi-squared test asks ‘Are these results very different to what we would expect to happen if there was no relationship here?’ You calculate ‘expected frequencies’ (how many items would we expect in each group if it were to happen by chance?), count the ‘actual frequencies’ (how many times has this happened in our data?) and compare the two. Big differences might suggest a relationship between categories; small differences might suggest nothing but chance is going on and the things probably aren’t related. Look out for cells in the table that have small frequencies – if lots of them are less than 5, there probably isn’t enough data to draw any real conclusions.
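The expected-versus-actual comparison can be sketched in a few lines of Python. The 2×2 table of counts below is invented for illustration, and the function names are our own; the sketch uses the usual rule that each expected frequency is (row total × column total) ÷ grand total, and adds up (actual − expected)² ÷ expected over every cell.

```python
# Invented counts: rows are two groups of people, columns are two answers.
observed = [
    [30, 10],
    [20, 40],
]


def expected_frequencies(observed):
    """Counts we would expect in each cell if rows and columns were unrelated."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


def chi_squared_statistic(observed, expected):
    """Sum of (actual - expected)^2 / expected over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(observed, expected)
        for o, e in zip(observed_row, expected_row)
    )


expected = expected_frequencies(observed)
chi_squared = chi_squared_statistic(observed, expected)
# Count cells whose expected frequency is under 5 (a warning sign of too little data).
small_cells = sum(1 for row in expected for e in row if e < 5)

print("expected:", expected)
print(f"chi-squared = {chi_squared:.2f}, cells with expected frequency below 5: {small_cells}")
```

The bigger the chi-squared value, the further the actual counts are from what chance alone would predict; a statistics library or a printed table then turns that value into a p value.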

r is also known as Pearson’s correlation coefficient, which is a very fancy way of saying ‘how related two things are to one another according to the data’. (r technically means the correlation coefficient for the sample, rather than the whole population.) It can take any value between -1 and 1, where values around 0 mean ‘there probably isn’t a relationship’ and values nearer -1 or 1 mean ‘there probably is some kind of relationship’. For example, you would expect a correlation between a person’s handspan and their height (a larger handspan means a greater height is a bit more likely), although it wouldn’t be a very strong one. In one example group of data, r = 0.56, which is not that strong. Roughly, you might call an r value of 0.6 or greater ‘strong’ and 0.8 or greater ‘very strong’. Negative r values work in the same way; they just suggest a different kind of relationship – as one thing increases, the other tends to decrease, like the miles you travel and the amount of patience you have for your children asking if you’re there yet. BEWARE the very common fallacy of ‘correlation implies causation’, though! People sometimes (wrongly) suggest that their large r value means that one thing ‘causes’ another, when actually it just means they might be related in some way. You can have a lot of fun with sites like this and this, which play with correlations.
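If you would like to see where an r value comes from, here is a minimal sketch of the calculation. The handspan and height figures are invented (they are not the example data mentioned above), pearson_r and describe are hypothetical helper names, and the ‘strong’ and ‘very strong’ labels simply reuse the rough thresholds given above.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists of numbers."""
    x_bar, y_bar = mean(xs), mean(ys)
    # How the two variables move together, and how much each moves on its own.
    together = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    x_spread = sum((x - x_bar) ** 2 for x in xs)
    y_spread = sum((y - y_bar) ** 2 for y in ys)
    return together / sqrt(x_spread * y_spread)


def describe(r):
    """Rough label for the strength of a correlation, ignoring its sign."""
    size = abs(r)
    if size >= 0.8:
        return "very strong"
    if size >= 0.6:
        return "strong"
    return "not that strong"


# Invented measurements in centimetres, for illustration only.
handspans = [18, 19, 20, 21, 22, 23]
heights = [160, 172, 165, 178, 170, 182]

r = pearson_r(handspans, heights)
print(f"r = {r:.2f} ({describe(r)})")
```

Whatever value comes out, it only says how closely the two lists move together in this sample; it says nothing about whether one thing causes the other.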