Jargon Buster
Sample
- What does this mean?
An extract from an overall population, usually used as a part of a survey or analysis exercise. There is more than one sort of sample.
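A random sample is a sample that is drawn in a way that ensures every member of the relevant population has an equal chance of being selected, for instance by picking telephone bookers by lot. (Choosing every fifth telephone booker is strictly speaking a ‘systematic’ sample, although it behaves much like a random one when the list is in no particular order.)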
A stratified sample is a sample that has been designed to reflect the nature of the relevant population. Thus if a particular audience is 60% female and 40% male, a stratified sample of that audience would be composed of 60% women and 40% men.
See also below and SAMPLE SIZE.
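The difference between the two sorts of sample can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended tool: the audience list, the function names and the sample size of 100 are all assumptions made up for the example, and the stratified version simply gives each group a share of the sample equal to its share of the population.

```python
import random
from collections import Counter

def simple_random_sample(population, n, seed=None):
    """Draw n members so that every member has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(population, n, key, seed=None):
    """Draw about n members so that each group keeps its population share."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(key(member), []).append(member)
    sample = []
    for members in strata.values():
        # Proportional allocation: the group's quota mirrors its population share.
        quota = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical audience: 600 women and 400 men (the 60/40 split used above).
audience = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]

random_draw = simple_random_sample(audience, 100, seed=1)
stratified_draw = stratified_sample(audience, 100, key=lambda m: m[0], seed=1)

print(Counter(g for g, _ in random_draw))      # gender mix varies from draw to draw
print(Counter(g for g, _ in stratified_draw))  # 60 women and 40 men by construction
```

The random draw will usually land near 60/40 but is not guaranteed to, whereas the stratified draw matches the population split by design.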
- How did we get this definition?
A sample is an extract from an overall population. Such an extract is like taking a ‘snapshot’. It is used to find things out about the population, and to make valid generalisations about it (for instance, about the population’s average age, its social background, or its attendance patterns).
Hence ‘sampling’ is the process of taking such a sample. Typically this is done using a survey, a questionnaire or through other analysis of data extracted from a wider data set.
While not quite as definitive as examining the entire population being considered (when the relevant exercise would be a ‘census’), using samples has three particular virtues.
- It tends to cost less, and to take less time and effort, than surveying the entire population.
- It tends to be more easily done.
- Thanks to a technique known as ‘statistical inference’, it is still possible to draw conclusions about the population on the basis of findings generated from the sample.
But there are also a number of potential drawbacks, complications and vital considerations involved in using samples.
- Related and similar definitions
A sample, then, is a snapshot of a larger population. As a result, although it is still possible to make generalisations about that population based on sample findings, these findings are inevitably less firm (i.e. less accurate) than ones built on an examination of the entire relevant population. Some important procedures for reducing the inherent inaccuracies of samples are outlined below.
Avoiding bias
For a start, the robustness of sample findings can be undermined if the sample has been devised and surveyed without taking care to reduce bias.
Deborah Rumsey warns that:
'Bias is systematic favouritism that is present in the data collection process, resulting in lopsided, misleading results.'
In fact such bias is so insidious that it can creep into a survey process in three ways.
- Sample bias (also sometimes called design bias) happens when a researcher has not taken care to use a sample that is representative of the entire population. This danger can be reduced either by using a random sample (where every case has an equal chance of being included - e.g. by drawing names by lot, or by taking every fifth person from a list that is in no particular order) or by using a 'stratified sample' (i.e. one that has been specially devised to reflect the population from which it has been taken - for instance, in terms of age groups, gender mix, or social background).
- Measurement bias results from not asking the appropriate questions that would actually help in meeting the research objective. Here, in a useful online tutorial, The Roper Center [sic.] at the University of Connecticut observes: 'Measurement Error or bias that occurs when surveys do not survey what they are intended to measure. This type of error results from flaws in the instrument, question wording, question order, interviewer error, timing, question response options, etc. this is perhaps the most common and most problematic of errors facing the polling industry.' This sort of bias can be avoided by taking pains to ensure that the questions asked of a sample during a research exercise are a logical consequence of what you are trying to find out.
- Then non-response bias can also undermine survey findings. This arises when low or uneven response rates leave gaps in the survey's coverage of the population. Although there is no definitive industry standard as to what is a 'good' response rate, if one part of the population responds in greater proportion than another (for instance, if men respond more readily than women), then the sample and its findings will carry a further built-in bias and will be unrepresentative of the population. The safeguard here is to stay alert to the patterns of response and non-response, both to a survey as a whole and to the individual questions within it, and to compare the make-up of the respondents with the make-up of the population.
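One simple way of staying alert to patterns of response is to compare each group's share of the respondents with its share of the population. The sketch below does just that. The function name and all of the figures are hypothetical, chosen only to show a case in which men have replied more readily than women.

```python
def response_mix_gap(population_counts, respondent_counts):
    """Return each group's share of respondents minus its share of the population."""
    pop_total = sum(population_counts.values())
    resp_total = sum(respondent_counts.values())
    gaps = {}
    for group, pop_n in population_counts.items():
        pop_share = pop_n / pop_total
        resp_share = respondent_counts.get(group, 0) / resp_total
        gaps[group] = resp_share - pop_share
    return gaps

# Hypothetical figures: the population is 60% women, but men replied more often.
population = {"women": 600, "men": 400}
respondents = {"women": 90, "men": 110}

gaps = response_mix_gap(population, respondents)
for group, gap in gaps.items():
    # A positive gap means the group is over-represented among respondents.
    print(f"{group}: {gap:+.1%} relative to population share")
```

Here women make up 45% of respondents against 60% of the population, so any finding that differs between men and women would be skewed unless the imbalance is addressed.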
Deciding how confident you want to be
Once everything that can be done to avoid bias has been done, the next stage in making the most of a sampling exercise is to decide which level of confidence you want to work at.
Confidence levels are an indication of how much trust and certainty can be placed in a set of sample findings. In other words, this is a measure of how likely it is that the findings will represent the true state of the population. (A confidence level should not be confused with a 'confidence interval', which is the range of values around a sample finding to which the level applies.)
These levels are usually written as a percentage, and as a result make a definite statement about how far a sample finding can be relied upon. However, take care how such levels are understood and interpreted. When a sample is working at the 99% level of confidence, commonsensical usage might suggest that you were more or less certain of the findings. However, a confidence level of 99% means something more precise than 'you can be more or less certain of something'. What it is actually saying is that if the sampling exercise were repeated 100 times, and a range (the finding plus or minus its margin of error) were calculated each time, around 99 of those 100 ranges would be expected to contain the true figure for the population (i.e. 99 ÷ 100 = 99%).
Statisticians typically prefer to use a 95% level of confidence as an industry standard. And although the choice of 95% (rather than 94% or 96%) is largely a matter of convention, it is here recommended that - wherever possible - 95% is used.
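A common reading of a 95% confidence level is that, if the same sampling exercise were repeated many times, roughly 95% of the ranges calculated from those samples would contain the true population figure. The simulation below is a rough sketch of that idea. The true figure of 40%, the sample size of 400 and the 2,000 repeats are all assumptions for the example, and the range is calculated with the usual normal approximation for a proportion.

```python
import math
import random

def interval_95(successes, n):
    """Normal-approximation 95% range for a sample proportion."""
    p_hat = successes / n
    half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def coverage(true_p, n, repeats, seed=0):
    """Fraction of repeated samples whose range contains the true figure."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # Simulate one survey: each respondent says 'yes' with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = interval_95(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / repeats

rate = coverage(true_p=0.40, n=400, repeats=2000)
print(f"Ranges containing the true value: {rate:.1%}")
```

The printed share should sit close to 95%, although it will not be exactly 95% because the simulation is itself a sample.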
Other numerical issues
Even if painstaking steps have been taken to avoid the effects of bias, and to set a desired level of confidence, the findings resulting from a survey based on a sample can still be compromised by a number of other numerical considerations.
This is because the robustness of the sample results is influenced by a trio of factors. These are:
- the give or take (i.e. plus or minus [±]) tolerance of error you are willing to accept for your results - this is the tolerable limit of error you would be happy to work with
- the expected findings you think may result from a particular question and
- the actual size of the sample
![[Diagram representing factors impinging on the reliability of survey findings]](http://www.aduk.org/images/jargon/large/sample-reliability.jpg)
Here 'limit of error' is actually about how wide a margin for sampling error you would be happy to live with, before your findings were undermined.
Then the matter of 'expected findings' is essentially about what result you would expect to get from a question. Let's say that in previous surveys your organisation asked people to rate its services, and that on average 40% of people said the services were 'good'. Consequently you should use 40% as the predicted response here.
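The way the three factors pull on one another can be shown with the standard textbook formula for the margin of error of a proportion. This formula is not given in the text above, so treat the sketch as an assumption-laden illustration: it supposes a simple random sample, a large population, and a 95% level of confidence (for which the multiplier is about 1.96). The function names are hypothetical.

```python
import math

Z_95 = 1.96  # multiplier for a 95% level of confidence (normal approximation)

def margin_of_error(expected_p, n, z=Z_95):
    """Plus-or-minus limit of error for an expected finding and a sample size."""
    return z * math.sqrt(expected_p * (1 - expected_p) / n)

def required_sample_size(expected_p, margin, z=Z_95):
    """Smallest sample size that keeps the limit of error within 'margin'."""
    return math.ceil(z**2 * expected_p * (1 - expected_p) / margin**2)

# Expected finding of 40% 'good' (as in the example above), tolerance of ±5 points.
n = required_sample_size(expected_p=0.40, margin=0.05)
print(n)  # 369
print(f"±{margin_of_error(0.40, n):.1%}")
```

Tightening the tolerance to ±3 points, or moving the expected finding closer to 50%, would each push the required sample size up.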
- When to use
Samples and sampling theory can be applied to a range of research activities. They are particularly useful for surveys and questionnaires, as well as for analysis of data extracted from a wider data set.
