Jargon Buster
Margin for sampling error
- What does this mean?
-
A statistical term representing the degree to which a sample or analytical finding may be inaccurate due to the size of the sample used.
- How did we get this definition?
-
In an ideal world, the statistics used to make and shape decisions – or to provide an understanding of audiences – would be completely accurate and reliable. A sense of accuracy and robustness inevitably inspires confidence in the reader, and so enhances the credibility of any observations or claims being made.
But sadly, life’s not always like that. Findings from market research – or from the analysis of data – are a striking instance of this when they are based on an examination of a sample. For instance, an audience survey based on a sample of 500 people might suggest that ‘47% of attenders at a particular event came because they saw a poster.’ Yet it’s always wise to ask: how big a ‘pinch of salt’ should this finding be taken with?
Statistical theory recognises the inaccuracy inherent in such findings and provides a way of dealing with it. This enables us to use less-than-perfect data and still draw conclusions about the real, overall situation in a way that is convincingly robust. This is thanks to the set of techniques known as ‘statistical inference’.
Achieving the absolute perfection of completely robust findings means examining every instance of the issue under consideration (in the example used here, everyone who came to the specific event in question). A research exercise that deals with the entire relevant population is known as a ‘census’.
However, practical issues (such as cost, resource and time factors) mean that it’s not always possible to conduct a census. Instead we might resort to what the terminology calls a sample-based survey.
A sample-based survey, as the name suggests, questions only a sample drawn from a wider population. As already noted, complete accuracy in any findings (such as the proportion of people who do a particular thing or who fit a particular profile) would only be available if we surveyed the entire population (i.e. carried out a census). So a survey based on a sample can only ever be a snapshot of the relevant population (see diagram).
![[Diagram representing a sample within a population]](http://www.aduk.org/images/jargon/large/sample.jpg)
And because we are working with a snapshot here, this means that any findings drawn from that sample-based survey will potentially suffer from a degree of inaccuracy – because the findings may be slightly unrepresentative of the population as a whole. Hence any sample-based finding is only an approximation of the true state of the population.
One branch of statistical knowledge (known as ‘sampling theory’) enables us to calculate the degree to which a sample-based finding (such as ‘47% of people who came, came because they saw a poster’) could be inaccurate. This degree of inaccuracy is stated as a plus or minus (a +/- or a ‘give or take’) figure and is known as ‘the margin for sampling error on a proportion finding’. It’s worth noting that the size of this margin gets bigger as the size of the sample used gets smaller.
- Related and similar definitions
-
Typically calculations of this kind are carried out at the 95% confidence level (largely because 95% is a sort of ‘industry standard’ for statisticians). Working at the 95% confidence level means that if the same survey were repeated 100 times, each time with a fresh sample, we would expect the range given by the finding plus or minus its margin to contain the true figure for the whole population on about 95 of those 100 occasions.
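What the 95% confidence level means in practice can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the ‘true’ figure of 47%, the sample size of 500, the number of repeats and the function names are all made up for the purpose, and the margin is calculated with the formula given in the formulae and worked examples section.

```python
import math
import random


def margin_for_sampling_error(p, n, z=1.96):
    """Margin (in percentage points) for a finding of p% from a sample of n."""
    return z * math.sqrt(p * (100 - p) / n)


def coverage(true_pct, n, repeats, seed=0):
    """Share of simulated surveys whose range (finding +/- margin)
    contains the true population percentage."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(repeats):
        # One simulated survey: each respondent answers "yes"
        # with probability true_pct / 100.
        yes = sum(rng.random() < true_pct / 100 for _ in range(n))
        finding = 100 * yes / n
        margin = margin_for_sampling_error(finding, n)
        if finding - margin <= true_pct <= finding + margin:
            hits += 1
    return hits / repeats


# Hypothetical population in which 47% truly came because of a poster.
print(coverage(true_pct=47, n=500, repeats=2000))
```

The printed share should come out close to 0.95, although the exact figure varies a little with the seed and the number of repeats.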
To find the margin for sampling error (at the 95% confidence level) relating to a given survey finding that is a percentage, you could: EITHER look up the margin for the finding you get and the size of the sample involved, by using the ready-reckoner table shown below OR for greater precision, you could work it out using the formula shown in the formulae and worked examples section (D2).
Margin for sampling error (+/-) by finding and size of sample:

| Finding | 100 | 250 | 500 | 1,000 | 1,500 | 2,000 |
|---|---|---|---|---|---|---|
| 5% | 4.3 | 2.7 | 1.9 | 1.4 | 1.1 | 1.0 |
| 10% | 5.9 | 3.7 | 2.6 | 1.9 | 1.5 | 1.3 |
| 15% | 7.0 | 4.4 | 3.1 | 2.2 | 1.8 | 1.6 |
| 20% | 7.8 | 5.0 | 3.5 | 2.5 | 2.0 | 1.8 |
| 25% | 8.5 | 5.4 | 3.8 | 2.7 | 2.2 | 1.9 |
| 30% | 9.0 | 5.7 | 4.0 | 2.8 | 2.3 | 2.0 |
| 35% | 9.3 | 5.9 | 4.2 | 3.0 | 2.4 | 2.1 |
| 40% | 9.6 | 6.1 | 4.3 | 3.0 | 2.5 | 2.1 |
| 45% | 9.8 | 6.2 | 4.4 | 3.1 | 2.5 | 2.2 |
| 50% | 9.8 | 6.2 | 4.4 | 3.1 | 2.5 | 2.2 |
| 55% | 9.8 | 6.2 | 4.4 | 3.1 | 2.5 | 2.2 |
| 60% | 9.6 | 6.1 | 4.3 | 3.0 | 2.5 | 2.1 |
| 65% | 9.3 | 5.9 | 4.2 | 3.0 | 2.4 | 2.1 |
| 70% | 9.0 | 5.7 | 4.0 | 2.8 | 2.3 | 2.0 |
| 75% | 8.5 | 5.4 | 3.8 | 2.7 | 2.2 | 1.9 |
| 80% | 7.8 | 5.0 | 3.5 | 2.5 | 2.0 | 1.8 |
| 85% | 7.0 | 4.4 | 3.1 | 2.2 | 1.8 | 1.6 |
| 90% | 5.9 | 3.7 | 2.6 | 1.9 | 1.5 | 1.3 |
| 95% | 4.3 | 2.7 | 1.9 | 1.4 | 1.1 | 1.0 |

(Table shows the percentage to add or subtract from a finding to infer the likely case for the overall population in question at the sample sizes shown. Idea for ready reckoner drawn from E. Hill, C. O’Sullivan and T. O’Sullivan [1995] Creative Arts Marketing, Butterworth Heinemann, with calculation of above figures performed by current author.)
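The figures in the ready reckoner can be reproduced from the formula in the formulae and worked examples section. The following is a minimal sketch rather than the method actually used to build the table; the names `SAMPLE_SIZES` and `reckoner_row` are made up for illustration, and rounding to one decimal place is an assumption based on how the table is presented.

```python
import math

# Sample sizes used as the columns of the ready reckoner.
SAMPLE_SIZES = [100, 250, 500, 1000, 1500, 2000]


def margin_for_sampling_error(p, n, z=1.96):
    """Margin (in percentage points) for a finding of p% from a sample of n."""
    return z * math.sqrt(p * (100 - p) / n)


def reckoner_row(p):
    """One row of the ready reckoner: the margin at each sample size,
    rounded to one decimal place."""
    return [round(margin_for_sampling_error(p, n), 1) for n in SAMPLE_SIZES]


for finding in (5, 40, 50):
    print(f"{finding}%", reckoner_row(finding))
```

Each row shrinks from left to right, which is the pattern noted earlier: the bigger the sample, the smaller the margin.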
For instance, suppose a survey finding suggests that 40% of an audience travels to an arts facility by car, and this is based on a sample of 500 people. The likely proportion of all attenders at that facility who come by car is then 40% +/- 4.3%, i.e. between 35.7% (40% - 4.3%) and 44.3% (40% + 4.3%), estimated at the 95% level of confidence (see diagram).
![[Diagram representing margin for sampling error]](http://www.aduk.org/images/jargon/large/margin.jpg)
- When to use
-
This set of techniques, procedures and protocols is particularly useful when evidence of the robustness of the finding is required: for instance, when any decision based upon it is one of high importance, or when you want the reader to be convinced of the finding arrived at.
Formulae & Worked Examples
The formula for finding the margin for sampling error (referred to here as ‘MFSE’ for short) relating to a finding that is a percentage, done at the 95% confidence level is:
MFSE = 1.96 × √(P × (100 - P) ÷ n)
Where:
- MFSE is the margin for sampling error
- 1.96 is the constant used for doing this calculation at the 95% confidence level (i.e. a figure of 1.96 is always used when working at the 95% confidence level)
- √ means the square root of the part of the calculation shown to the right of the square root sign
- P is the finding being tested, expressed as a percentage, and
- n is the sample size.
Here please note that the square root found is that of everything that comes after the square root sign (i.e. (P × (100-P)) ÷ n ) .
Also please note that this formula only works for samples that are larger than 30.
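For anyone who would rather calculate than look up, the formula can be written as a short function. This is a sketch only: the function name `mfse` and the decision to raise an error for samples of 30 or fewer (following the note above) are choices made for illustration.

```python
import math


def mfse(p, n, z=1.96):
    """Margin for sampling error, in percentage points.

    p -- the finding being tested, as a percentage (0 to 100)
    n -- the sample size
    z -- 1.96 for the 95% confidence level
    """
    if not 0 <= p <= 100:
        raise ValueError("p must be a percentage between 0 and 100")
    if n <= 30:
        # The text notes the formula only works for samples larger than 30.
        raise ValueError("sample size must be larger than 30")
    # The square root covers everything after the sign: (p * (100 - p)) / n
    return z * math.sqrt(p * (100 - p) / n)


# A 40% finding from a sample of 500, as in the ready-reckoner example.
print(round(mfse(40, 500), 1))  # 4.3
```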
Worked example
An art gallery carries out a visitor survey to which 1,000 people respond. This finds that 23% of responding visitors are aged 15 to 24 years old.
The gallery wants to know what proportion of all visitors are likely to be aged 15 to 24 years old, and it wants to know this at the 95% level of confidence.
So the calculation here will be:
MFSE = 1.96 × √(P × (100 - P) ÷ n)
= 1.96 × √(23 × (100-23) ÷ 1,000)
= 1.96 × √(23 × 77 ÷ 1,000)
= 1.96 × √(1,771 ÷ 1,000)
= 1.96 × √1.771
= 1.96 × 1.33
= 2.6
Hence the proportion of visitors aged 15 to 24 years old among the entire population of gallery visitors is likely to be 23% +/- 2.6%. That is, it will fall between 20.4% and 25.6%.
One last thing to bear in mind. If your aim in performing such a calculation is to inspire enhanced confidence in the reader, it’s always advisable to annotate the finding with the details that have led to it. That is, don’t just say 'Between 20.4% and 25.6% of attenders at the Gallery are likely to be aged 15 to 24 years old', but add a note (in brackets, as a footnote or endnote) that also says: '(Sample size = 1,000, margin for sampling error = 2.6% at the 95% confidence level)'.
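If findings are reported regularly, the range and its annotation can be produced together so that the two never drift apart. The sketch below is one possible way of doing this; the function name `annotated_finding` and the exact wording of the output are illustrative assumptions, and it is fixed at the 95% confidence level.

```python
import math

Z_95 = 1.96  # constant for the 95% confidence level


def annotated_finding(p, n):
    """Return the likely range for a finding of p% from a sample of n,
    with the sample size and margin attached as a note."""
    margin = Z_95 * math.sqrt(p * (100 - p) / n)
    low, high = p - margin, p + margin
    return (
        f"Between {low:.1f}% and {high:.1f}% "
        f"(sample size = {n:,}, margin for sampling error = {margin:.1f}% "
        f"at the 95% confidence level)"
    )


# The gallery example: 23% of 1,000 respondents aged 15 to 24.
print(annotated_finding(23, 1000))
```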
