Jargon Buster
Sample size
- What does this mean?
The total number of cases or instances considered as part of a research exercise.
The robustness of a finding based on a sample will depend on the size of the sample used and on the way in which it was collected.
- Related and similar definitions
Unfortunately, there is no hard and fast rule as to the optimum sample size to use.
Some writers do, however, suggest certain ‘rules of thumb’. Thus Curwin and Slater [1991] recommend using a sample of more than 30 instances. (This is because the maths involved behave differently for samples that are less than 30 in size.) Furthermore the Faculty of Computing, Information and English at the University of Central England Business School point out that: ‘Where the number of results is under 50 it will not be feasible to analyse any sub-categories of the data.’
Deborah Rumsey [2003] also points out that, for a finding to be reliable, the sample size multiplied by the expected finding (expressed as a proportion) needs to be at least 5. The sample size multiplied by one minus the expected finding (again expressed as a proportion) also needs to be at least 5.
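As a rough illustration, the two conditions attributed to Rumsey above could be checked as follows. This is only a sketch: the function name `meets_minimum_counts` and the example figures are made up for illustration and do not come from the source.

```python
def meets_minimum_counts(sample_size, expected_finding, minimum=5):
    """Check both conditions: n * p >= 5 and n * (1 - p) >= 5.

    sample_size      -- number of cases in the sample (n)
    expected_finding -- expected finding as a proportion (p), e.g. 0.1 for 10%
    minimum          -- threshold quoted in the text (5)
    """
    expected_yes = sample_size * expected_finding
    expected_no = sample_size * (1 - expected_finding)
    return expected_yes >= minimum and expected_no >= minimum


# Hypothetical example: a sample of 30 with an expected finding of 10%
# gives 30 * 0.1 = 3 expected cases, which falls short of 5.
print(meets_minimum_counts(30, 0.1))
print(meets_minimum_counts(100, 0.1))
```

The further the expected finding is from 50%, the larger the sample has to be before both conditions hold.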
Elsewhere other authorities recommend different minimum sample sizes. For instance, the data analysts group from the national audience development agencies recommend using a sample of at least 150. The Gallup Organization – famed for its opinion polls in the United States – uses an average sample size of 1,000 for its national surveys. And in an extremely useful guidance note on collecting data using surveys (which forms part of its Strategy Survival Guide), the Prime Minister’s Strategy Unit says that:
‘The sample size needs to be an adequate size, in order to generalise from the survey's findings. Provided that the sample size is representative of the population, the larger the sample size, the more confident you can be that the results are an accurate reflection of the population as a whole. The key factor is the absolute size of the sample, rather than the proportion of the population that gets included in the sample. Adequate samples can be estimated from the expected variation in the major variables of interest, and will therefore depend on the specific question or hypothesis to be tested. As a general rule of thumb, adequate samples will generally involve more than 30 events or people. Most market research companies use samples of around 1000–2000.’ (current author’s emphasis).
So what – in the face of such varied advice – is an under-resourced arts organisation to do?
The point is that starting with the size of the sample you wish to use is rather like trying to push a piece of string along: it starts the process from an unhelpful place. It is far more helpful to begin by deciding on four basic but essential factors, namely:
- What you want to find out (i.e. your research objective)
- The level of confidence you want the sample to work at
- The ± degree of tolerance you are willing to have for sampling error (the ‘limit of error’) and
- The finding you expect to get (if in doubt use 50% here).
Once you have established all of this, it is possible to work out the optimum sample size to use. The actual formula involved is provided in the formulae section below. But in case you prefer to avoid the algebra, the ready reckoner below shows the optimum sample sizes for a range of expected findings and tolerable limits of error. These are all worked out assuming that the sample exercise will work at the 95% confidence level.
Ready reckoner for Sample Sizes working at the 95% confidence level
Tolerable limit of error across the top; expected finding down the side.

| Expected finding | ±1% | ±2% | ±3% | ±4% | ±5% |
|---|---|---|---|---|---|
| 95% | 1,825 | 456 | 203 | 114 | 73 |
| 90% | 3,457 | 864 | 384 | 216 | 138 |
| 80% | 6,147 | 1,537 | 683 | 384 | 246 |
| 70% | 8,067 | 2,017 | 896 | 504 | 323 |
| 60% | 9,220 | 2,305 | 1,024 | 576 | 369 |
| 50% | 9,604 | 2,401 | 1,067 | 600 | 384 |
| 40% | 9,220 | 2,305 | 1,024 | 576 | 369 |
| 30% | 8,067 | 2,017 | 896 | 504 | 323 |
| 20% | 6,147 | 1,537 | 683 | 384 | 246 |
| 10% | 3,457 | 864 | 384 | 216 | 138 |
| 5% | 1,825 | 456 | 203 | 114 | 73 |

For instance, if you want to work at the 95% confidence level, can tolerate a margin for sampling error of ± 3%, and are using 50% as the value for the expected finding, you should use a minimum sample of 1,067 or more.
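The figures in the ready reckoner can be reproduced with a few lines of code. The sketch below assumes the usual formula for estimating a proportion (the multiplier squared, times the expected finding, times one minus the expected finding, divided by the limit of error squared) with a multiplier of 1.96 for the 95% confidence level. The function name `sample_size` is illustrative only, and rounding to the nearest whole number is an assumption chosen because it matches the table; some texts round up instead.

```python
def sample_size(expected, error, z=1.96):
    """Approximate sample size for estimating a proportion.

    expected -- expected finding as a proportion (0.5 for 50%)
    error    -- tolerable limit of error as a proportion (0.03 for +/-3%)
    z        -- standard multiplier for the confidence level (1.96 for 95%)
    """
    # Rounded to the nearest whole number (assumption: matches the table).
    return round(z ** 2 * expected * (1 - expected) / error ** 2)


findings = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
errors = [0.01, 0.02, 0.03, 0.04, 0.05]

# Print one row per expected finding, one column per limit of error.
for p in findings:
    row = [f"{sample_size(p, e):>6,}" for e in errors]
    print(f"{p:>4.0%}", *row)
```

Calling `sample_size(0.5, 0.03)` gives the 1,067 quoted in the example above. Passing a different `z` would give figures for other confidence levels.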
There’s one further thing to bear in mind. Generally speaking, the larger the sample size used, the more reliable the findings that can be generated (as is borne out by the ready reckoner, since it can be seen that the smaller the tolerable limit of error – and thus the greater the desired reliability – gets, the larger the sample size needed becomes).
However, here it’s worth noting that, while in the case of samples, size is important, the relationship between sample size and the reliability of the sample findings is not that of a ‘straight line’. In other words, doubling the sample size does not result in a doubling of the findings’ reliability.
The diagram below shows how there is a ‘diminishing returns’ effect here. The chart shows the extent to which reliability improves (in terms of the incremental reduction in the margin for sampling error) as a consequence of an increase in sample size. (The situation depicted here assumes that the confidence level being used is 95% and the expected answer used is 50%.) The scale down the left-hand side shows the reduction of the margin for sampling error when compared to the previous sample size charted, while the horizontal scale along the bottom shows the relevant sample sizes. It can be seen that increasing the sample size from 250 to 500 narrows the margin for sampling error by 1.82 percentage points, but increasing the sample size from 500 to 750 only reduces it by a further 0.8 percentage points. And so on. So it is always worth identifying the point at which the improvements in accuracy are outweighed by the costs and effort required to increase the sample size to a particular level.
![[Graph showing reductions in MISE as sample size increases]](http://www.aduk.org/images/jargon/large/sample-size-graph.jpg)
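The diminishing returns effect can also be checked numerically. The sketch below assumes the standard margin-of-error formula for a proportion (the multiplier times the square root of the expected finding, times one minus the expected finding, divided by the sample size), at the 95% confidence level and with an expected answer of 50%. The function name `margin_of_error` and the list of sample sizes beyond 750 are illustrative choices, not taken from the chart.

```python
import math


def margin_of_error(n, expected=0.5, z=1.96):
    """Margin for sampling error, as a proportion, for a sample of size n."""
    return z * math.sqrt(expected * (1 - expected) / n)


sizes = [250, 500, 750, 1000, 1250, 1500]  # illustrative steps of 250
previous = None
for n in sizes:
    moe = margin_of_error(n) * 100  # convert to percentage points
    if previous is None:
        print(f"n={n:>5}: ±{moe:.2f}%")
    else:
        # Show how much the margin narrowed compared with the previous size.
        print(f"n={n:>5}: ±{moe:.2f}% (narrowed by {previous - moe:.2f} points)")
    previous = moe
```

Each extra 250 cases buys a smaller improvement than the last, because the margin shrinks with the square root of the sample size rather than with the sample size itself.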
- When to use
Samples and sampling theory can be applied to a range of research activities. They are particularly useful for surveys and questionnaires, and for analysing data extracted from a wider set.
Formulae & Worked Examples
Rules of thumb and ‘ready reckoners’ can be used to give an approximate indication of the sample size to use. Yet even doing this requires that the desired confidence level, the tolerable limits of error, and the expected research finding are known.
And if these issues have been thought about, then researchers are well placed to use them in a calculation that will give an exact figure for the sample size needed.
This formula can be found in a number of online resources. It is:
Sample size = [Z² × P × (1 - P)] ÷ E²
Where:
- Z is the standard multiplier for the desired confidence level
- P is the expected finding (expressed as a proportion)
- E is the limit for error (again expressed as a proportion).
Hence if you wanted to work at the 95% confidence level (where the standard multiplier to use is always 1.96), with an expected finding of 30% (or 0.3 when expressed as a proportion) and a tolerable limit of error of ± 5% (or ± 0.05 when expressed as a proportion), the calculation would be as follows:
Sample size = [1.96² × 0.3 × (1 - 0.3)] ÷ 0.05² = (3.8416 × 0.21) ÷ 0.0025 = 322.69
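Hence the optimum sample size here would be 323 or more.

Size of sample across the top; finding down the side.

| Finding | 100 | 250 | 500 | 1,000 | 1,500 | 2,000 |
|---|---|---|---|---|---|---|
| 5% | 4.3 | 2.7 | 1.9 | 1.4 | 1.1 | 1.0 |
| 10% | 5.9 | 3.7 | 2.6 | 1.9 | 1.5 | 1.3 |
| 15% | 7.0 | 4.4 | 3.1 | 2.2 | 1.8 | 1.6 |
| 20% | 7.8 | 5.0 | 3.5 | 2.5 | 2.0 | 1.8 |
| 25% | 8.5 | 5.4 | 3.8 | 2.7 | 2.2 | 1.9 |
| 30% | 9.0 | 5.7 | 4.0 | 2.8 | 2.3 | 2.0 |
| 35% | 9.3 | 5.9 | 4.2 | 3.0 | 2.4 | 2.1 |
| 40% | 9.6 | 6.1 | 4.3 | 3.0 | 2.5 | 2.1 |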
