
Sample Sizes and Representativeness in Psychology and the Social Sciences

One of the biggest issues in psychology and the various social sciences (e.g., sociology) is that they involve people, and people are not always easy to work with. Applying the scientific method correctly requires a commitment to both adequate repetition and adequate samples.

Consider a coin. If we assume that a coin is unbiased, then flipping it should result in heads coming up 50% of the time and tails coming up the other 50% of the time. Coin flipping is also an excellent way to illustrate the danger of small samples. What happens if you flip the coin twice and it comes up heads on both occasions? Would you consider the coin biased? I doubt that anyone would. Certainly, if you accused someone of using a rigged coin just because the coin came up heads twice in a row, most people would think you were crazy.

But how many times would the coin have to come up heads before you started to wonder? If you flipped the coin fifteen times, and it came up heads all fifteen times, I’m sure most people would begin to wonder. If you flipped the coin two hundred times, and it came up heads each time, then almost everyone would guess that the coin is biased.

Why the change of heart?

Even if most people can’t express it mathematically, most people do have an intuitive grasp of what’s going on. In small samples, seemingly extreme outcomes are often not that unlikely. The probability of a coin that is flipped twice coming up heads on both occasions is 25% (50% x 50%). But as the sample grows larger, the probability of seeing extreme events becomes increasingly small. For instance, the probability of a coin being flipped fifteen times and coming up heads on each occasion is 0.003%.
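
To make the arithmetic concrete, here is a quick Python sketch (the function name and trial count are my own choices) that computes the exact probability of an all-heads run and sanity-checks the fifteen-flip case by simulation:

```python
# Exact probability of n heads in a row with a fair coin, plus a
# simulation to sanity-check the fifteen-flip case.
import random

def prob_all_heads(n: int) -> float:
    """Probability that a fair coin lands heads n times in a row."""
    return 0.5 ** n

for n in (2, 15, 200):
    print(f"{n} heads in a row: {prob_all_heads(n):.3e}")

# Simulate 200,000 fifteen-flip sessions and count the all-heads ones.
trials = 200_000
all_heads = sum(
    1 for _ in range(trials)
    if all(random.random() < 0.5 for _ in range(15))
)
print(f"simulated rate: {all_heads / trials:.6f} (exact: {0.5 ** 15:.6f})")
```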

How does this relate to humans?

Obviously, humans are a lot more complex than coins. However, the basic principle still applies. If you want to learn about general human behaviour, then you need to look at a lot of humans.

Consider the question of intelligence. If you want to know how intelligent the average person is, how many people should you test? I think everyone would agree that testing one person is a terrible idea. Ten? That doesn’t sound very good either since there are billions of people in the world. What about a hundred? How about a thousand? What about ten thousand?
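
As a rough illustration of why the answer matters, here is a toy simulation of my own (it assumes an IQ-like score distribution with mean 100 and standard deviation 15; the numbers are assumptions for illustration, not data from any real study). It draws repeated samples of different sizes and shows how much the sample average wobbles at each size:

```python
# How much does the sample mean wobble at different sample sizes?
# Draw 500 samples of each size from a notional population with
# mean 100 and SD 15, and measure the spread of the sample means.
import random
import statistics

random.seed(42)
TRUE_MEAN, TRUE_SD = 100, 15

for n in (10, 100, 1_000, 10_000):
    sample_means = [
        statistics.fmean(random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n))
        for _ in range(500)
    ]
    print(f"n = {n:>6}: sample means vary with SD {statistics.stdev(sample_means):.2f}")
# The spread shrinks roughly as 15 / sqrt(n): ten people give averages
# that wander by several points, while ten thousand pin the average
# down to a fraction of a point.
```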

But it gets more complicated.

If we assume that the manufacturing process is good, then coins are, by and large, extremely similar in terms of their weight, composition, and shape. In contrast, humans are known to vary along a great many physical (e.g., height, weight, body shape) and psychological (e.g., intelligence, personality) dimensions. How meaningful is the mathematical average calculated by ‘averaging’ across different people?

The answer, as it so often is with psychology and the social sciences, is that it depends.

It depends on how much error you’re willing to accept and how general you want your conclusions to be. For instance, if you want to rank order the mathematical ability of students in a particular class, then giving them a mathematics test isn’t actually that bad an idea.

But what if you want to examine the cultural knowledge of an entire country? In many countries, there are large differences in what different areas consider to be culturally important (see, e.g., the cultural differences between different parts of the United States, as well as between different parts of places like France, China, and even South Africa). Even within the same area, different groups of people may consider different things to be culturally important. Just look at the differences in culture between different neighbourhoods of a large city like New York, or even cities like Sydney, Tokyo, or Hong Kong.

The point of this discussion should be getting clearer. When dealing with people, it is not enough to have a large sample. That sample must also be representative. In other words, the demographics of the sample should reflect the demographics about which you wish to draw conclusions.

If you want to talk about the personality of the average Australian, then you need a sample of Australians that is not only large but that also reflects the characteristics of the general Australian population. Confining your sample to people from Sydney or to people above the age of fifty would be experimental malpractice. Likewise, focusing only on Melbourne and young people would be extremely poor conduct.
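
In code, matching the sample to the population is just quota arithmetic. Here is a minimal sketch of proportional stratified sampling; the strata and population shares are invented for illustration, not actual Australian demographics:

```python
# Proportional stratified sampling: force the sample's demographic mix
# to match the population's instead of taking whoever turns up.
# The strata and shares below are invented for illustration.

population_shares = {
    ("Sydney", "under 50"): 0.15,
    ("Sydney", "50 and over"): 0.06,
    ("Melbourne", "under 50"): 0.14,
    ("Melbourne", "50 and over"): 0.05,
    ("elsewhere", "under 50"): 0.40,
    ("elsewhere", "50 and over"): 0.20,
}

def stratified_quotas(total_n: int) -> dict:
    """How many participants to recruit from each stratum."""
    return {stratum: round(total_n * share)
            for stratum, share in population_shares.items()}

for stratum, quota in stratified_quotas(1_000).items():
    print(stratum, quota)
# A 1,000-person sample should contain about 150 Sydneysiders under
# fifty, not 1,000 of them.
```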

As you can imagine, this has implications for psychology and the social sciences, which often rely on university student volunteers as participants in their experiments. Strictly speaking, it is not valid to generalise findings from studies that rely only on university students to the general population. University students are not representative of the general population. They are typically young, and their mere presence at university generally indicates a certain level of education and intelligence. It is entirely possible that university students do not psychologically resemble the general population. Why should they, when simply getting into a university requires going through a selection process? It would be like trying to estimate the fitness of the average person by looking only at athletes.

Most researchers have at least passing familiarity with these issues, and the more astute have railed against them for decades. However, the simple fact of the matter is that obtaining a truly representative sample for human research would essentially require a form of conscription. Simply relying on volunteers (even including volunteers from outside of university) would not fix the problem because it is quite possible that there is something psychologically different about people willing to volunteer for research.

Of course, I am not advocating that we forcibly conscript people into participating in human research. That would be crazy. One approach that has been tried is offering to pay people for participation. Unfortunately, this runs into the same problem. Not everyone is willing to participate in research, even for money, so how do we know that the people who are willing to be paid to participate are the same, psychologically speaking, as those who are not?

These problems sound horrible, but they do not invalidate psychology or the social sciences. What they do suggest, however, is that more caution should be exercised when interpreting the results of research that relies on samples that are relatively small and less than representative (e.g., a large quantity of research relies on sample sizes of less than two hundred, made up entirely of university students). Likewise, efforts should be made to expand sample sizes and their representativeness. Indeed, many statistical techniques (e.g., Factor Analysis and Structural Equation Modelling) are essentially worthless on small sample sizes, making their growing use in areas plagued by small samples (e.g., abnormal/clinical psychology) quite worrying.
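
A small demonstration of the sample-size half of that warning: the sketch below is my own toy example (the true correlation of 0.3 and the sample sizes are arbitrary choices, and statistics.correlation needs Python 3.10+). It shows how wildly a correlation estimated from a small sample can swing:

```python
# How noisy is a correlation estimated from a small sample? The
# population correlation is fixed at 0.3 by construction; we then look
# at the range of estimates from 200 samples at n=50 versus n=2,000.
import random
import statistics  # statistics.correlation requires Python 3.10+

random.seed(0)
RHO = 0.3  # true population correlation (arbitrary choice)

def sample_correlation(n: int) -> float:
    xs, ys = [], []
    for _ in range(n):
        x = random.gauss(0, 1)
        # Mixing x into y this way gives corr(x, y) = RHO in the population.
        y = RHO * x + (1 - RHO ** 2) ** 0.5 * random.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return statistics.correlation(xs, ys)

for n in (50, 2_000):
    estimates = sorted(sample_correlation(n) for _ in range(200))
    print(f"n = {n:>5}: estimates span {estimates[0]:+.2f} to {estimates[-1]:+.2f}")
```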

Research into psychology and the social sciences has the potential to bring about great good, but it also has the potential to do great harm. Care should thus be taken to ensure that the best possible samples are used: samples of sufficient size that are also representative of the populations the research wishes to generalise to.

If you want to read more about my thoughts on writing, education, and other subjects, you can find those here.

I also write original fiction, which you can find here.

On The Relationship Between Theory And Experiment

A theory is a set of propositions or beliefs that is used to account for or explain a particular phenomenon. However, multiple theories can be (and often are) proposed to account for a given phenomenon, which makes being able to distinguish between competing theories extremely important.

But how are we to choose between competing theories? Presumably, there is one theory that is better than the others, which is the one we should select. One possibility is to simply accept whichever theory appears to be the most plausible. However, just because a theory is implausible does not mean that it is wrong. Indeed, some of the most important theories in science seem quite implausible at first glance (consider e.g., some of the stranger implications of relativity).

What is required is a systematic way of comparing competing theories that allows us to identify their relative merits. The method that science typically uses for this purpose is the experimental method. At the heart of the experimental method is the idea that a good theory should not only explain existing data but also predict the results of future experiments. This is a subtle but exceedingly important point.

Imagine that we have several competing theories that all seek to explain a particular phenomenon. Assuming that all of these theories were developed in good faith, each of them should be able to account for existing data. Otherwise, they would be unable to explain the phenomenon at all.

What separates these competing theories are the mechanisms and processes that each uses to explain the phenomenon. And it is these mechanisms and processes that can be used to make predictions, which can then be tested by conducting experiments. Whichever theory is best supported by the results of the experiments is the one we accept as being closer to the truth.

To see why this must be so, consider a theory that offers a sublimely elegant explanation for a particular phenomenon while also making several key predictions. Experiments are carried out, and every single one of those key predictions is not only wrong but incredibly wrong. Regardless of how elegant the theory appears, it would be difficult to say it was actually any good.

Remember, the predictions a theory makes should be based on the mechanisms and processes it uses to explain the phenomenon in question. If those proposed mechanisms and processes cannot make any accurate predictions, then how can they be correct? And if they are not correct, what good is the theory? The answer is that the theory is not good because the processes and mechanisms it relies upon to explain the phenomenon are not supported by the available evidence.

The point I am trying to get at is this: the experiments used to test a theory are, in essence, attempts to determine whether or not there is evidence in favour of the mechanisms and processes that theory relies upon to explain a particular phenomenon. Failure to find this evidence suggests that the theory may be false.

We must also be careful even when experiments appear to support a particular theory (i.e., the results of the experiments match the predictions made by the theory). This is because different theories that rely upon different mechanisms and processes can sometimes make the same predictions. However, the more predictions are tested, the less likely it is that two theories will make the same predictions for all those tests (i.e., all of those experiments). When two theories make many similar predictions, the key is to develop an experiment around a prediction in which they differ.

Now, the explanation I’ve given so far is, in many ways, a simplification, one that assumes that the experiment was carried out properly. If we do not make this assumption, then there may be several reasons that a theory’s predictions are not supported by the results of experiments.

Let us assume that a theory’s predictions are not supported by experiments. Here are two ways that result could be interpreted.

  1. The experiments were conducted properly; the theory is not supported by the evidence.
  2. The experiments were conducted improperly; it is not possible to say if the theory is supported by the evidence or not.

The second outcome is of particular importance. Imagine that you are testing a particular theory’s prediction about the amount of radiation given off by a nuclear reactor. The theory predicts that the nuclear reactor should give off a certain amount of radiation, but the experiment suggests that it actually gives off considerably less radiation. But what if the instrument used to detect radiation is faulty? If this is the case, then the theory has not been properly tested by the experiment, and no firm conclusions can be drawn about it.

This may seem like a trivial example, but it has massive implications in areas like sociology and psychology. In the physical sciences (e.g., physics and chemistry), it is generally possible to measure a particular attribute directly. For instance, we can use a ruler to measure length or a Geiger counter to measure radiation. Thus, when a theory makes a prediction about something like length or radiation, we can be fairly confident that the results of our experiment are meaningful (presuming we use proper equipment and a well-designed experiment).

Now consider an attribute like intelligence or extraversion. Setting aside definitional issues, how are we to measure intelligence or extraversion? Intellect and personality attributes like extraversion do not have obvious physical correlates that are directly amenable to measurement. Instead, we must infer the level of someone’s intelligence or extraversion from their behaviour (e.g., a smarter person should be better at problem solving, and an extraverted person should be more outgoing).

What this means is that instead of using things like rulers or Geiger counters, we are forced to use things like intelligence tests and personality surveys. But using such instruments involves a very, very big assumption: that these instruments adequately reflect the constructs (i.e., concepts) they are trying to measure. If our intelligence test does not actually measure intelligence, then people’s scores on it are essentially meaningless.

And this is where things get ugly.

Constructs like intelligence are, by their nature, highly contentious. What is intelligence? If you ask different people you are likely to get different answers, so how are we to decide which definition is correct? The definition matters because the definition decides what we include in an intelligence test. If we believe problem solving is part of intelligence, our intelligence tests will include problem solving. If we do not believe problem solving is part of intelligence, our intelligence tests won’t include it. How we see intelligence (or virtually any psychological construct/concept) heavily influences how we attempt to measure it.

Let us return now to my earlier remarks about experimental testing, and in particular, let us return to what I said about what happens when an experiment does not support the predictions made by a theory. Imagine that a theory of intelligence has made a prediction that intelligence should be associated with workplace performance.

According to this prediction, intelligence scores should be positively correlated with workplace performance (i.e., higher intelligence scores should be associated with better workplace performance). Now imagine if this prediction is not supported by an experiment.

On one hand, we could interpret this to mean that intelligence has nothing to do with workplace performance. On the other hand, it is also possible that the researcher’s conception of intelligence is flawed. That is, the way they view intelligence is wrong. If this is the case, then their intelligence tests are founded on the wrong model of intelligence, which explains why no correlation was found (i.e., the instrument they are using to measure intelligence is faulty, so the experiment was not conducted properly).
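
This possibility can be made concrete with a simulation. In the sketch below (entirely my own construction; every coefficient is invented for illustration), intelligence genuinely drives workplace performance, yet a test that mostly measures something else produces a near-zero correlation:

```python
# By construction, intelligence strongly predicts performance here.
# A good test mostly reflects intelligence; a flawed test mostly
# reflects an unrelated trait. All coefficients are invented.
import random
import statistics  # statistics.correlation requires Python 3.10+

random.seed(1)
N = 5_000
intelligence = [random.gauss(0, 1) for _ in range(N)]
performance = [0.7 * t + 0.3 * random.gauss(0, 1) for t in intelligence]

good_test = [0.9 * t + 0.1 * random.gauss(0, 1) for t in intelligence]
flawed_test = [0.1 * t + 0.9 * random.gauss(0, 1) for t in intelligence]

print("good test vs performance:  ",
      round(statistics.correlation(good_test, performance), 2))   # ~0.9
print("flawed test vs performance:",
      round(statistics.correlation(flawed_test, performance), 2)) # ~0.1
# The flawed test reports almost no relationship even though, in this
# simulated world, intelligence is the main driver of performance.
```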

If we assume that it is the researcher’s conception of intelligence that is wrong, what can be done? The logical next step is to find the correct model of intelligence and use that to build a better intelligence test. But how are we to know that we’ve found the correct model of intelligence? That is where things get tricky (and potentially ugly) since we have to make more assumptions. For example, if we assume that intelligence is involved in certain behaviours (e.g., school performance, workplace performance, memory, etc.), then the best model of intelligence is the one that properly predicts performance across those behaviours.

If this sounds complicated and weird compared to experimentation in the physical sciences, it is because it is. Sociology and psychology, by their very nature, often deal with phenomena that are not currently accessible to direct physical measurement (e.g., it is not inconceivable that neural correlates of intelligence might one day be measured directly, but that is not yet possible).

This results in a curious change in the relationship between theory and experiment. In the physical sciences, a theory makes predictions that can often be verified through relatively direct measurement. Lack of support for those predictions can thus be interpreted as lack of support for the theory (presuming that the tests and measurements involved were accurately performed). In a field like psychology, the lack of direct measurement means that a theory’s failure to make accurate predictions may be the result of either the theory being wrong or the constructs involved in the theory being conceptualised incorrectly.

In other words, the following hypothetical situation can be true: intelligence may indeed predict workplace performance very well, but experiments to examine that issue may show the opposite if the incorrect conception of intelligence is used. This situation exists because the physical sciences can, more often than not, rely upon relatively direct measures of phenomena (e.g., length, mass, etc. can be measured with reasonable directness and accuracy).

In contrast, psychological phenomena (and similar phenomena) do not yet have direct physical correlates, making their measurement much more difficult since researchers cannot be sure that the instruments (e.g., psychological tests) they are using in their experiments are accurate or meaningful. This is why psychology and related fields often discuss notions of validity and reliability to a far greater extent than other sciences. They simply cannot be as confident in their instruments as fields like physics or chemistry (with some exceptions, e.g., the psychology of perception can generally rely upon direct physical measurements).
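
Reliability, at least, has a standard and easily computed index. Cronbach’s alpha estimates how consistently a set of test items measure the same thing; here is a minimal implementation (the toy data, five respondents by three items, is invented):

```python
# Cronbach's alpha for a k-item test:
#   alpha = k / (k - 1) * (1 - sum(item variances) / variance(totals))
# Higher alpha means the items hang together more consistently.
import statistics

def cronbach_alpha(scores: list[list[float]]) -> float:
    """scores[respondent][item] -> alpha reliability estimate."""
    k = len(scores[0])
    item_vars = [statistics.variance([row[i] for row in scores])
                 for i in range(k)]
    total_var = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

toy_scores = [  # invented responses: 5 people x 3 items
    [4, 5, 4], [2, 2, 3], [5, 4, 5], [3, 3, 3], [1, 2, 1],
]
print(f"alpha = {cronbach_alpha(toy_scores):.2f}")  # ~0.95 for this toy data
```

Note that alpha speaks only to reliability (consistency): a test can be highly reliable while still measuring the wrong construct, which is exactly the validity problem described above.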

The relationship between theory and experiment is thus simple at times and complicated at others. At its core, however, is a fairly simple but powerful suggestion: a theory should make predictions, and a good theory’s predictions should be supported by the results of experiment.
