Sample Sizes and Representativeness in Psychology and the Social Sciences
One of the biggest issues in psychology and the various social sciences (e.g., sociology) is that they involve people, and people are not always easy to work with. To apply the scientific method correctly requires devotion to both adequate repetition and adequate samples.
Consider a coin. If we assume that a coin is unbiased, then flipping it should result in heads coming up 50% of the time and tails coming up the other 50% of the time. Coin flipping is also an excellent way to illustrate the danger of small samples. What happens if you flip the coin twice and it comes up heads on both occasions? Would you consider the coin biased? I doubt that anyone would. Certainly, if you accused someone of using a rigged coin just because the coin came up heads twice in a row, most people would think you were crazy.
But how many times would the coin have to come up heads before you started to wonder? If you flipped the coin fifteen times, and it came up heads all fifteen times, I’m sure most people would begin to wonder. If you flipped the coin two hundred times, and it came up heads each time, then almost everyone would guess that the coin is biased.
Why the change of heart?
Even if most people can’t express it mathematically, most people do have an intuitive grasp of what’s going on. In small samples, seemingly extreme outcomes are often not that unlikely. The probability of a coin that is flipped twice coming up heads on both occasions is 25% (50% x 50%). But as the sample grows larger, the probability of seeing extreme events becomes increasingly small. For instance, the probability of a coin being flipped fifteen times and coming up heads on each occasion is 0.003%.
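The arithmetic above is easy to check directly: the probability that n flips of a fair coin all come up heads is 0.5 raised to the power n. A few lines of Python reproduce the figures in the text:

```python
def p_all_heads(n: int) -> float:
    """Probability that n flips of a fair coin all land heads."""
    return 0.5 ** n

print(f"{p_all_heads(2):.0%}")    # two flips: 25%
print(f"{p_all_heads(15):.3%}")   # fifteen flips: 0.003%
print(f"{p_all_heads(200):.2e}")  # two hundred flips: vanishingly small
```

The two-hundred-flip case is why "heads every time" convinces almost everyone the coin is rigged: the probability under a fair coin is so small it is effectively zero.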
How does this relate to humans?
Obviously, humans are a lot more complex than coins. However, the basic principle still applies. If you want to learn about general human behaviour, then you need to look at a lot of humans.
Consider the question of intelligence. If you want to know how intelligent the average person is, how many people should you test? I think everyone would agree that testing one person is a terrible idea. Ten? That doesn’t sound very good either since there are billions of people in the world. What about a hundred? How about a thousand? What about ten thousand?
But it gets more complicated.
If we assume that the manufacturing process is good, then coins are, by and large, extremely similar in terms of their weight, composition, and shape. In contrast, humans are known to vary along a great many physical (e.g., height, weight, body shape) and psychological (e.g., intelligence, personality) dimensions. How meaningful is the mathematical average calculated by ‘averaging’ across different people?
The answer, as it so often is with psychology and the social sciences, is that it depends.
It depends on how much error you’re willing to accept and how general you want your conclusions to be. For instance, if you want to rank order the mathematical ability of students in a particular class, then giving them a mathematics test isn’t actually that bad an idea.
But what if you want to examine the cultural knowledge of an entire country? In many countries, there are large differences in what different areas consider to be culturally important (see e.g., the cultural differences between different parts of the United States, as well as the different parts of places like France, China, and even South Africa). Even within the same area, different groups of people may consider different things to be culturally important. Just look at the differences in culture between different neighbourhoods of a large city like New York or even cities like Sydney, Tokyo, or Hong Kong.
The point of this discussion should be getting clearer. When dealing with people, it is not enough to have a large sample. That sample must also be representative. In other words, the demographics of the sample should reflect the demographics about which you wish to draw conclusions.
If you want to talk about the personality of the average Australian, then you need not only a large sample of Australians but one that reflects the characteristics of the general Australian population. Confining your sample only to people from Sydney or only to people above the age of fifty would be experimental malpractice. Likewise, focusing only on Melbourne and young people would also be extremely poor conduct.
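The effect of an unrepresentative sample can be made concrete with a toy simulation. Everything below is invented for illustration: a hypothetical population in which two age groups differ on some trait score. A large sample drawn from only one group gives a confidently wrong answer; a sample drawn from the whole population recovers the truth.

```python
import random

random.seed(0)

# Hypothetical population: two age groups with different average trait
# scores. All numbers are made up purely for illustration.
population = (
    [("young", random.gauss(60, 10)) for _ in range(30_000)] +
    [("older", random.gauss(45, 10)) for _ in range(70_000)]
)

true_mean = sum(score for _, score in population) / len(population)

# Convenience sample: only young people (like sampling only students).
young_only = [s for g, s in population if g == "young"]
biased_mean = sum(random.sample(young_only, 1000)) / 1000

# Representative sample: drawn uniformly from the whole population, so
# each group appears roughly in proportion to its actual size.
rep_mean = sum(s for _, s in random.sample(population, 1000)) / 1000

print(f"population mean:     {true_mean:.1f}")
print(f"young-only sample:   {biased_mean:.1f}")  # systematically too high
print(f"representative mean: {rep_mean:.1f}")     # close to the truth
```

Note that the biased sample is large (a thousand people) and still wrong: sample size cannot repair a sampling frame that excludes most of the population.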
As you can imagine, this has implications for psychology and the social sciences, which often rely on university student volunteers as participants in their experiments. Strictly speaking, it is not valid to generalise findings from studies that rely only on university students to the general population. University students are not representative of the general population. They are typically young, and their mere presence at university generally indicates a certain level of education and intelligence. It is entirely possible that university students do not psychologically resemble the general population. Why should they when simply getting into a university requires going through a selection process? It would be like trying to estimate the fitness of the average person by looking only at athletes.
Most researchers have at least passing familiarity with these issues, and the more astute have railed against them for decades. However, the simple fact of the matter is that obtaining a truly representative sample for human research would essentially require a form of conscription. Simply relying on volunteers (even including volunteers from outside of university) would not fix the problem because it is quite possible that there is something psychologically different about people willing to volunteer for research.
Of course, I am not advocating that we forcibly conscript people into participating in human research. That would be crazy. One approach that has been tried is offering to pay people for participation. Unfortunately, this runs into the same problem. Not everyone is willing to participate in research, even for money, so how do we know that the people who are willing to be paid to participate are the same, psychologically speaking, as those who are not?
These problems sound horrible, but they do not invalidate psychology or the social sciences. What they do suggest, however, is that more caution should be exercised when interpreting the results of research that relies on samples that are relatively small and less than representative (e.g., a large quantity of research relies on samples of fewer than two hundred participants made up entirely of university students). Likewise, efforts should be made to expand sample sizes and their representativeness. Indeed, many statistical techniques (e.g., Factor Analysis and Structural Equation Modelling) are essentially worthless with small samples, making their growing use in areas plagued by them (e.g., abnormal/clinical psychology) quite worrying.
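How badly small samples behave can be seen even for something as simple as a correlation coefficient. The simulation below is hypothetical (the true correlation of 0.3 and the study sizes are invented): it repeats an imaginary study two hundred times at each sample size and shows how widely the estimates scatter when n is small.

```python
import random

random.seed(1)

def sample_correlation(n: int, true_r: float = 0.3) -> float:
    """Pearson correlation computed from n pairs whose true correlation is true_r."""
    xs, ys = [], []
    for _ in range(n):
        x = random.gauss(0, 1)
        noise = random.gauss(0, 1)
        xs.append(x)
        # Construct y so that corr(x, y) = true_r in the population.
        ys.append(true_r * x + (1 - true_r ** 2) ** 0.5 * noise)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

# Repeat the "study" 200 times at each sample size and look at the spread.
spreads = {}
for n in (30, 2000):
    estimates = sorted(sample_correlation(n) for _ in range(200))
    spreads[n] = estimates[-1] - estimates[0]
    print(f"n={n}: estimates range from {estimates[0]:.2f} to {estimates[-1]:.2f}")
```

With thirty participants, individual studies can land anywhere from near zero (or even negative) to far above the true value; with two thousand, the estimates cluster tightly around it. Techniques like factor analysis estimate many such quantities at once, which is why they demand even larger samples.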
Research into psychology and the social sciences has the potential to bring about great good but it also has the potential to do great harm. Care should thus be taken to ensure that the best possible samples are used: samples of sufficient size that are also representative of the populations the research wishes to generalise to.