"

Evaluating Experiments and Quasi-Experiments

The strength of experimental and experiment-related designs lies in the logic behind the design: if all differences are removed or controlled other than the independent variable(s) and then we see a difference in a dependent variable, we have found evidence that the independent variable “causes” (using this word very carefully here!) the dependent variable. It’s a great idea, but there are many ways it can go wrong and the logic can fall apart, so it’s very important to carefully design and evaluate experimental and quasi-experimental research with a critical eye.

The logic of experimental design

As we discussed at the beginning of this chapter, experimental design is commonly misunderstood but informally implemented in everyday life. We often say that we are conducting an experiment when we try a new restaurant or date a new person. As you’ve learned from the last two sections, you must rigorously apply the various components of experimental design for something to be a true experiment, or even a quasi- or pre-experiment. If you wanted trying a new restaurant to be a true experiment, you would need to recruit a large sample, randomly assign participants to control and experimental groups, pretest and posttest them, and use clearly and objectively defined measures of restaurant satisfaction.

Social scientists use this level of rigor and control because they are trying to maximize the internal validity of their experiment. Internal validity is the confidence researchers have that their intervention, and not some other factor, produced the variation in their dependent variable. Thus, experiments are attempts to establish causality between two variables—your treatment and its intended outcome. As we’ve talked about before, nomothetic causal relationships must establish four criteria: covariation, plausibility, temporality, and nonspuriousness.

The logic and rigor of experimental designs allow causal relationships to be established. Experimenters can assess covariation through pre- and posttests of the dependent variable. The use of experimental and control conditions ensures that some people receive the intervention and others do not, providing variation in the independent variable. Moreover, since the researcher controls when the intervention is administered, they can be assured that changes in the independent variable (the treatment) happened before changes in the dependent variable (the outcome). In this way, experiments assure temporality. In our restaurant experiment, the assignment to experimental and control groups would show us that people varied in the restaurant they attended. The use of pre- and posttest measures would allow us to know whether their level of satisfaction changed, and our design would assure us that the changes in our diners’ satisfaction occurred after they left the restaurant.
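
To make this logic concrete, here is a minimal sketch in Python. The satisfaction ratings (on a 1–10 scale) and group sizes are entirely hypothetical; the point is that a larger mean change in the experimental group than in the control group is the covariation the design looks for, and temporality is built in because the pretest precedes the visit.

    from statistics import mean

    # Hypothetical satisfaction ratings (1-10) for five diners per group.
    pretest = {
        "experimental": [6, 5, 7, 6, 5],   # measured before the visit
        "control":      [6, 6, 5, 7, 5],
    }
    posttest = {
        "experimental": [8, 7, 9, 8, 7],   # measured after the visit
        "control":      [6, 7, 5, 7, 6],
    }

    for group in ("experimental", "control"):
        changes = [post - pre for pre, post in zip(pretest[group], posttest[group])]
        print(group, "mean change:", round(mean(changes), 2))
    # A clearly larger mean change in the experimental group is evidence of
    # covariation between the treatment and the outcome.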

Additionally, experimenters will have a plausible reason why their intervention would cause changes in the dependent variable. Either theory or previous empirical evidence should indicate the potential for a causal relationship. Perhaps we discover a national poll that found pizza, the type of food our experimental restaurant served, is the most popular food in America. Perhaps this restaurant has good reviews on Yelp or Google. This evidence would give us a plausible reason to expect that our restaurant causes satisfaction.

One of the most important features of experiments is that they allow researchers to eliminate spurious variables. True experiments are usually conducted under strictly controlled laboratory conditions. The intervention must be given to each person in the same way, with a minimal number of other variables that might cause their posttest scores to change. In our restaurant example, this level of control might prove difficult. We cannot control how many people are waiting for a table, whether participants saw someone famous there, or if there is bad weather. Any of these factors might cause a diner to be less satisfied with their meal. These spurious variables may cause changes in satisfaction that have nothing to do with the restaurant itself, which is an important problem in real-world research. For this reason, experiments use the laboratory environment to try to control as many aspects of the research process as possible. Researchers in large experiments often employ clinicians or other research staff to help them. Researchers train their staff members exhaustively, provide pre-scripted responses to common questions, and control the physical environment of the lab so each person who participates receives the exact same treatment.

Experimental researchers also document their procedures so others can review how well they controlled for spurious variables. A powerful example of this is Bruce Alexander’s Rat Park (1981) experiments. Much of the early research conducted on addictive drugs like heroin and cocaine was conducted on non-human animals, usually mice or rats. While this may seem strange, the biological systems of our mammalian relatives are similar enough to those of humans that causal inferences can be made from animal studies to humans. It is certainly unethical to deliberately cause humans to become addicted to cocaine and measure them for weeks in a laboratory, but it is currently more ethically acceptable to do so with animals. There are specific ethical processes for animal research, similar to an IRB review.

Before Alexander’s experiments, the scientific consensus was that cocaine and heroin were so addictive that rats would repeatedly consume the drug until they perished. Researchers claimed that this behavior in rats explained how addiction worked in humans; Alexander, however, was not so sure. He knew rats were social animals, and the procedure used in previous experiments did not allow them to socialize. Instead, rats were isolated in small cages with only food, water, and metal walls. To Alexander, social isolation was a spurious variable that was causing changes in addictive behavior not due to the drug itself. Alexander created an experiment of his own, in which rats were allowed to run freely in an interesting environment, socialize and mate with other rats, and, of course, drink from a solution that contained an addictive drug. In this environment, rats did not become hopelessly addicted to drugs. In fact, they had little interest in the substance.

The results of Alexander’s experiment demonstrated to him that social isolation was more of a causal factor for addiction than the drug itself. This makes intuitive sense to most of us in the social sciences. If you were in solitary confinement for most of your life, the escape of an addictive drug would seem more tempting than if you were in your natural environment with friends, family, and activities. One challenge with Alexander’s findings is that subsequent researchers have had mixed success replicating them (e.g., Petrie, 1996; Solinas et al., 2009). Replication involves conducting another researcher’s experiment in the same manner and seeing if it produces the same results. If the causal relationship is real, it should occur in all (or at least most) replications of the experiment.

One of the defining features of experiments is that they diligently report their procedures, which allows for easier replication. Recently, researchers at the Reproducibility Project have sparked significant controversy in social science fields like psychology (Open Science Collaboration, 2015). In one study, researchers attempted to reproduce the results of 100 experiments published in major psychology journals in 2008. Their findings were shocking: only 36% of the studies had reproducible results. Despite close coordination with the original researchers, the Reproducibility Project found that nearly two-thirds of psychology experiments published in respected journals were not reproducible. The implications of the Reproducibility Project are staggering, and social scientists are developing new ways to ensure researchers do not cherry-pick data or change their hypotheses simply to get published.

Returning to our discussion of Alexander’s Rat Park study: Consider the implications of his experiment for substance abuse professionals. The conclusions he drew from experimenting on rats were meant to generalize to the population of people with substance use disorders with whom these professionals work. Experiments seek to establish external validity, or the degree to which their conclusions generalize to larger populations and different situations. Alexander contends that his conclusions about addiction and social isolation help us understand why people living in deprived, isolated environments are more likely to become addicted to drugs when compared to people living in more enriching environments. Similarly, earlier rat researchers contended that their results showed these drugs to be instantly addictive, often to the point of death.

Neither study will match up perfectly with real life. In practice, substance use disorder counselors likely meet many individuals who may fit Alexander’s social isolation model, but social isolation is complex for humans. Clients may live in environments with other sociable humans, work jobs, and have romantic relationships, so it may be difficult to consider them “isolated.” On the other hand, many face structural racism, poverty, trauma, and other challenges that may contribute to social isolation. Alexander’s work can help us understand part of clients’ experiences, but the explanation will always be incomplete. The real world is much more complicated than the experimental conditions in Rat Park, just as humans are more complex than rats.

Social scientists must thus be attentive to how social context shapes social life. We are likely to point out that experiments are rather artificial. How often do real-world social interactions occur in a lab? Experiments that are conducted in community settings may be less subject to artificiality, though their conditions are less easily controlled. This relationship demonstrates the tension between internal and external validity. The more tightly researchers control the environment to ensure internal validity, the less they can claim external validity, or that their results are applicable to different populations and circumstances. Correspondingly, researchers whose settings are just like the real world will be less able to ensure internal validity, as there are many factors that could pollute the research process. This is not to suggest that experimental research cannot have external validity, but experimental researchers must always be aware that external validity problems can occur and must be forthcoming about this potential weakness when reporting their findings. In some fields this is more challenging than in others, but experiments can still be extremely helpful and rewarding when done well (Doan et al., 2024).

Threats to validity

Internal validity and external validity are conceptually linked. Internal validity refers to the degree to which the intervention causes its intended outcomes, and external validity refers to how well that relationship applies to different groups and circumstances. There are a number of factors that may influence a study’s validity. You might consider these threats to all be spurious variables, as discussed earlier. Each threat proposes another factor that is changing the relationship between intervention and outcome. The threats introduce error and bias into the experiment.

Throughout this chapter, we reviewed the importance of experimental and control groups. These groups must be comparable for the experimental design to work. Comparable groups are groups that are similar across factors important for the study. Researchers can help establish comparable groups by using probability sampling, random assignment, or matching techniques. Control or comparison groups pose a counterfactual consideration: what would have happened to the experimental group had they not been given the intervention? Two very different groups would not allow you to answer that question. Intuitively, we know that no two people are the same, so no two groups will ever be perfectly comparable. Importantly, we must ensure that groups are comparable along the variables that are relevant to our research project.

If one of the groups in our restaurant example had numerous vegetarians or gluten-free individuals, their satisfaction with the restaurant might be influenced by their dietary needs. In that case, our groups would not be comparable. Researchers also account for these effects by measuring other variables like dietary preference, and by statistically controlling for their effects after the data are collected. We discussed control variables like these before. Similarly, if we were to pick people that we thought would “really like” our restaurant and assign them to the experimental group, or if we were to allow participants to choose to be in the new restaurant group or the control group (such that “foodies” might self-select to try the new joint), we would be introducing selection bias into our sample. Experimenters use random assignment so that conscious and unconscious bias do not influence the group to which a participant is assigned.
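
Random assignment itself is simple to implement. Here is a minimal sketch in Python, assuming a hypothetical list of twenty recruited diners; letting a seeded random number generator, rather than the researcher or the participants, decide group membership is what keeps “foodies” from self-selecting into the new restaurant group.

    import random

    participants = [f"diner_{i:02d}" for i in range(1, 21)]  # hypothetical IDs

    rng = random.Random(42)    # fixed seed so the assignment can be documented
    rng.shuffle(participants)  # chance, not the researcher, orders the list

    half = len(participants) // 2
    experimental = participants[:half]  # assigned to the new restaurant
    control = participants[half:]       # assigned to their usual restaurant
    print("experimental:", experimental)
    print("control:", control)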

Experimenters themselves are often the source of threats to validity. They may choose measures that do not accurately capture what they intend to measure, or implement a measure in a way that biases participant responses in one direction or another. The act of simply conducting an experiment may cause researchers to influence participants to perform differently. Experiments are different from participants’ normal routines, so the novelty of a research environment or experimental treatment may cause them to expect to feel differently, independent of the actual intervention. You have likely heard of the placebo effect, in which a participant feels better despite having received no intervention at all.

Researchers may also introduce error by expecting participants in each group to behave differently. They may expect the experimental group to feel better and may give off conscious or unconscious cues to participants that influence their outcomes. Control groups may be expected to fare worse, and research staff could cue participants that they should feel worse than they otherwise would. For this reason, researchers often use double-blind designs, in which research staff who interact with participants are unaware of who is in the control group and who is in the experimental group. Proper training and supervision are also necessary to account for these and other threats to validity. If proper supervision is not applied, research staff administering the control group may try to equalize treatment or engage in a rivalry with research staff administering the experimental group (Engel & Schutt, 2016).

No matter how tightly the researcher controls the experiment, participants are humans and are therefore curious, problem-solving creatures. Participants who learn they are in the control group may react by trying to outperform the experimental group or by becoming demoralized. In either case, their outcomes in the study would differ from what they would have been had they remained unaware of their group assignment. Participants in the experimental group may begin to behave differently or share insights from the intervention with individuals in the control group. Whether through social learning or conversation, participants in the control group may receive parts of the intervention of which they were supposed to be unaware, a result called contamination. As a result, experimenters try to keep experimental and control groups as separate as possible. This is significantly easier inside a laboratory study, as the researchers control access and timing at the facility. The problem is more complicated in agency-based research. If your intervention is effective, then your experimental group participants may affect the control group by behaving differently and sharing the insights they’ve learned with their peers. Agency-based researchers may locate experimental and control conditions at separate offices with separate treatment staff to minimize interaction between their participants.

Thus, in experiments we have to ask ourselves some important questions: how much control do we want over the participants’ experiences and the data we collect, how much control could we have, and how much control do we actually have? Experiments, by nature, can have very high control: the researcher manipulates who is exposed to what, when they’re exposed, and where. That high control lends incredible internal validity when done right! That high internal validity, however, comes with the trade-off described regarding external validity: sometimes an experimental setting is just too unnatural to believe that results from that setting will actually carry over to real life. That’s not a failing of research; it’s simply a potential weakness of the design. Recognizing the weaknesses in one design simply means that you also recognize the need for other studies with other designs to help build knowledge on a given topic. How do you make sure experiments, as the design most open to control, are well controlled and have the highest validity possible? There are a few methods.

Practical Considerations

The information presented so far in this chapter is enough to design a basic experiment. When it comes time to conduct that experiment, however, several additional practical issues arise. In this section, we consider some of these issues and how to deal with them. Much of this information applies to nonexperimental studies as well as experimental ones.

Recruiting Participants

Of course, at the start of any research project, you should be thinking about how you will obtain your participants. Unless you have access to people with schizophrenia or incarcerated juvenile offenders, for example, there is no point in designing a study that focuses on these populations. But even if you plan to use a convenience sample, you will have to recruit participants for your study.

There are several approaches to recruiting participants. One is to use participants from a formal subject pool—an established group of people who have agreed to be contacted about participating in research studies. For example, at many colleges and universities, there is a subject pool consisting of students enrolled in introductory psychology courses who must participate in a certain number of studies to meet a course requirement. Researchers post descriptions of their studies and students sign up to participate, usually via an online system. Participants who are not in subject pools can also be recruited by posting or publishing advertisements or making personal appeals to groups that represent the population of interest. For example, a researcher interested in studying older adults could arrange to speak at a meeting of the residents at a retirement community to explain the study and ask for volunteers.

The Volunteer Subject

Even if the participants in a study receive compensation in the form of course credit, a small amount of money, or a chance at being treated for a psychological problem, they are still essentially volunteers. This is worth considering because people who volunteer to participate in psychological research have been shown to differ in predictable ways from those who do not volunteer. Specifically, there is good evidence that on average, volunteers have the following characteristics compared with non-volunteers (Rosenthal & Rosnow, 1976):

  • They are more interested in the topic of the research.
  • They are more educated.
  • They have a greater need for approval.
  • They have higher intelligence quotients (IQs).
  • They are more sociable.
  • They are higher in social class.

This difference can be an issue of external validity if there is a reason to believe that participants with these characteristics are likely to behave differently than the general population. For example, in testing different methods of persuading people, a rational argument might work better on volunteers than it does on the general population because of their generally higher educational level and IQ.

In many field experiments, the task is not recruiting participants but selecting them. For example, researchers Nicolas Guéguen and Marie-Agnès de Gail conducted a field experiment on the effect of being smiled at on helping, in which the participants were shoppers at a supermarket. A confederate (an actor working with the researcher) walking down a stairway gazed directly at a shopper walking up the stairway and either smiled or did not smile. Shortly afterward, the shopper encountered another confederate, who dropped some computer diskettes on the ground. The dependent variable was whether or not the shopper stopped to help pick up the diskettes (Guéguen & de Gail, 2003). Notice that these participants were not “recruited,” but the researchers still had to select them from among all the shoppers taking the stairs that day. It is extremely important that this kind of selection be done according to a well-defined set of rules that are established before the data collection begins and can be explained clearly afterward. In this case, with each trip down the stairs, the confederate was instructed to gaze at the first person he encountered who appeared to be between the ages of 20 and 50. Only if the person gazed back did he or she become a participant in the study. The point of having a well-defined selection rule is to avoid bias in the selection of participants. For example, if the confederate was free to choose which shoppers he would gaze at, he might choose friendly-looking shoppers when he was set to smile and unfriendly-looking ones when he was not set to smile. As we will see shortly, such biases can be entirely unintentional.
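
One way to see what “well-defined” means is to notice that such a selection rule can be written as a mechanical procedure. The sketch below encodes the rule described above in Python; the field names and age estimates are hypothetical stand-ins for the confederate’s on-the-spot judgments.

    def select_participant(shoppers_encountered):
        """Apply the selection rule for one trip down the stairs."""
        for shopper in shoppers_encountered:
            if 20 <= shopper["estimated_age"] <= 50:
                # Only the FIRST age-eligible person counts, and only if
                # they returned the confederate's gaze.
                return shopper if shopper["gazed_back"] else None
        return None

    trip = [
        {"estimated_age": 14, "gazed_back": True},   # too young: skipped
        {"estimated_age": 35, "gazed_back": True},   # first eligible shopper
        {"estimated_age": 42, "gazed_back": False},  # never reached
    ]
    print(select_participant(trip))  # the 35-year-old becomes the participant

Because the rule fires on the first age-eligible shopper regardless of appearance, the confederate has no room to favor friendly-looking shoppers in one condition and not the other.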

Treatment and Control Conditions

We’ve talked about assigning participants to different groups (treatment vs. control), but what should the control group actually do instead of receiving the treatment? There are different types of control conditions. In a no-treatment control condition, participants receive no treatment whatsoever. One problem with this approach, however, is the existence of placebo effects. A placebo is a simulated treatment that lacks any active ingredient or element that should make it effective, and a placebo effect is a positive effect of such a treatment. Many folk remedies that seem to work—such as eating chicken soup for a cold or placing soap under the bed sheets to stop nighttime leg cramps—are probably nothing more than placebos. Although placebo effects are not well understood, they are probably driven primarily by people’s expectations that they will improve. Having the expectation to improve can result in reduced stress, anxiety, and depression, which can alter perceptions and even improve immune system functioning (Price et al., 2008).

Placebo effects are interesting in their own right (see Note “The Powerful Placebo”), but they also pose a serious problem for researchers who want to determine whether a treatment works. The Figure below shows some hypothetical results in which participants in a treatment condition improved more on average than participants in a no-treatment control condition. If these conditions (the two leftmost bars in the figure) were the only conditions in this experiment, however, one could not conclude that the treatment worked. It could be instead that participants in the treatment group improved more because they expected to improve, while those in the no-treatment control condition did not.

[Figure: bar graph of hypothetical results. The y-axis shows level of improvement; the x-axis shows three conditions: treatment, no-treatment, and placebo. Improvement is highest in the treatment condition, lowest in the no-treatment condition, and intermediate in the placebo condition, showing that expectations alone can influence outcomes.]

Hypothetical Results From a Study Including Treatment, No-Treatment, and Placebo Conditions

Fortunately, there are several solutions to this problem. One is to include a placebo control condition, in which participants receive a placebo that looks much like the treatment but lacks the active ingredient or element thought to be responsible for the treatment’s effectiveness. When participants in a treatment condition take a pill, for example, then those in a placebo control condition would take an identical-looking pill that lacks the active ingredient in the treatment (a “sugar pill”). In research on psychotherapy effectiveness, the placebo might involve going to a psychotherapist and talking in an unstructured way about one’s problems. The idea is that if participants in both the treatment and the placebo control groups expect to improve, then any improvement in the treatment group over and above that in the placebo control group must have been caused by the treatment and not by participants’ expectations. This difference is what is shown by a comparison of the two outer bars in the Figure above.

Of course, the principle of informed consent requires that participants be told that they will be assigned to either a treatment or a placebo control condition—even though they cannot be told which until the experiment ends. In many cases the participants who had been in the control condition are then offered an opportunity to have the real treatment. An alternative approach is to use a wait-list control condition, in which participants are told that they will receive the treatment but must wait until the participants in the treatment condition have already received it. This disclosure allows researchers to compare participants who have received the treatment with participants who are not currently receiving it but who still expect to improve (eventually). A final solution to the problem of placebo effects is to leave out the control condition completely and compare any new treatment with the best available alternative treatment. For example, a new treatment for simple phobia could be compared with standard exposure therapy. Because participants in both conditions receive a treatment, their expectations about improvement should be similar. This approach also makes sense because once there is an effective treatment, the interesting question about a new treatment is not simply “Does it work?” but “Does it work better than what is already available?”

The Powerful Placebo

Many people are not surprised that placebos can have a positive effect on disorders that seem fundamentally psychological, including depression, anxiety, and insomnia. However, placebos can also have a positive effect on disorders that most people think of as fundamentally physiological. These include asthma, ulcers, and warts (Shapiro & Shapiro, 1999). There is even evidence that placebo surgery—also called “sham surgery”—can be as effective as actual surgery.

Medical researcher J. Bruce Moseley and his colleagues conducted a study on the effectiveness of two arthroscopic surgery procedures for osteoarthritis of the knee (Moseley et al., 2002). The control participants in this study were prepped for surgery, received a tranquilizer, and even received three small incisions in their knees. But they did not receive the actual arthroscopic surgical procedure. The surprising result was that all participants improved in terms of both knee pain and function, and the sham surgery group improved just as much as the treatment groups. According to the researchers, “This study provides strong evidence that arthroscopic lavage with or without débridement [the surgical procedures used] is not better than and appears to be equivalent to a placebo procedure in improving knee pain and self-reported function” (p. 85).

Standardizing the Procedure

It is surprisingly easy to introduce extraneous variables during the procedure. For example, the same experimenter might give clear instructions to one participant but vague instructions to another. Or one experimenter might greet participants warmly while another barely makes eye contact with them. To the extent that such variables affect participants’ behavior, they add noise to the data and make the effect of the independent variable more difficult to detect. If they vary systematically across conditions, they become confounding variables and provide alternative explanations for the results. For example, if participants in a treatment group are tested by a warm and friendly experimenter and participants in a control group are tested by a cold and unfriendly one, then what appears to be an effect of the treatment might actually be an effect of experimenter demeanor. When there are multiple experimenters, the possibility of introducing extraneous variables is even greater, but multiple experimenters are often necessary for practical reasons.

Experimenter’s Sex as an Extraneous Variable

It is well known that whether research participants are male or female can affect the results of a study. But what about whether the experimenter is male or female? There is plenty of evidence that this matters too. Male and female experimenters have slightly different ways of interacting with their participants, and of course, participants also respond differently to male and female experimenters (Rosenthal, 1976).

For example, in a study on pain perception, participants immersed their hands in icy water for as long as they could (Ibolya, Brake, & Voss, 2004). Male participants tolerated the pain longer when the experimenter was a woman, and female participants tolerated it longer when the experimenter was a man.

Researcher Robert Rosenthal has spent much of his career showing that this kind of unintended variation in the procedure does, in fact, affect participants’ behavior. Furthermore, one important source of such variation is the experimenter’s expectations about how participants “should” behave in the experiment. This outcome is referred to as an experimenter expectancy effect (Rosenthal, 1976). For example, if an experimenter expects participants in a treatment group to perform better on a task than participants in a control group, then he or she might unintentionally give the treatment group participants clearer instructions or more encouragement or allow them more time to complete the task. In a striking example, Rosenthal and Kermit Fode had several students in a laboratory course in psychology train rats to run through a maze. Although the rats were genetically similar, some of the students were told that they were working with “maze-bright” rats that had been bred to be good learners, and other students were told that they were working with “maze-dull” rats that had been bred to be poor learners. Sure enough, over five days of training, the “maze-bright” rats made more correct responses, made the correct response more quickly, and improved more steadily than the “maze-dull” rats (Rosenthal & Fode, 1963). Clearly, it had to have been the students’ expectations about how the rats would perform that made the difference. But how? Some clues come from data gathered at the end of the study, which showed that students who expected their rats to learn quickly felt more positively about their animals and reported behaving toward them in a more friendly manner (e.g., handling them more).

The way to minimize unintended variation in the procedure is to standardize it as much as possible so that it is carried out in the same way for all participants regardless of the condition they are in. Here are several ways to do this:

  • Create a written protocol that specifies everything that the experimenters are to do and say from the time they greet participants to the time they dismiss them.
  • Create standard instructions that participants read themselves or that are read to them word for word by the experimenter.
  • Automate the rest of the procedure as much as possible by using software packages for this purpose or even simple computer slide shows.
  • Anticipate participants’ questions and either raise and answer them in the instructions or develop standard answers for them.
  • Train multiple experimenters on the protocol together and have them practice on each other.
  • Be sure that each experimenter tests participants in all conditions (see the sketch after this list).
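
As referenced in the last point, here is a minimal sketch of one way to build such a testing schedule, so that experimenter identity cannot become confounded with condition. The names, conditions, and session counts are hypothetical.

    import random
    from itertools import product

    experimenters = ["experimenter_A", "experimenter_B"]
    conditions = ["treatment", "control"]
    sessions_per_cell = 3   # each experimenter runs each condition 3 times

    schedule = [
        {"experimenter": e, "condition": c}
        for e, c in product(experimenters, conditions)
        for _ in range(sessions_per_cell)
    ]
    random.Random(7).shuffle(schedule)  # randomize the running order

    for slot, session in enumerate(schedule, start=1):
        print(slot, session["experimenter"], session["condition"])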

Another good practice is to arrange for the experimenters to be “blind” to the research question or to the condition in which each participant is tested. The idea is to minimize experimenter expectancy effects by minimizing the experimenters’ expectations. For example, in a drug study in which each participant receives the drug or a placebo, it is often the case that neither the participants nor the experimenter who interacts with the participants knows which condition he or she has been assigned to complete. Because both the participants and the experimenters are blind to the condition, this technique is referred to as a double-blind study. (A single-blind study is one in which only the participant is blind to the condition.) Of course, there are many times this blinding is not possible. For example, if you are both the investigator and the only experimenter, it is not possible for you to remain blind to the research question. Also, in many studies, the experimenter must know the condition because he or she must carry out the procedure in a different way in the different conditions.
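
The bookkeeping behind a double-blind study can be sketched simply. In this hypothetical arrangement, a coordinator who never meets participants generates the randomization key, and the experimenters who run the sessions see only coded participant IDs attached to identical-looking kits.

    import random

    participants = [f"P{i:03d}" for i in range(1, 9)]   # hypothetical IDs
    conditions = ["drug", "placebo"] * (len(participants) // 2)
    random.Random(2024).shuffle(conditions)

    # The coordinator's key; it stays locked away until data collection ends.
    blinding_key = dict(zip(participants, conditions))

    # What the experimenter works from: IDs only, no condition information.
    for pid in participants:
        print(f"Session for {pid}: administer the kit labeled '{pid}'")

    # Only after the last session is the key consulted for analysis,
    # e.g., blinding_key["P001"] -> "drug" or "placebo".

Keeping the key with someone outside the testing room is what blinds the experimenter; the identical-looking kits are what blind the participants.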

Record Keeping

It is essential to keep good records when you conduct any study, but especially when you do an experiment. As discussed earlier, it is typical for experimenters to generate a written sequence of conditions before the study begins and then to test each new participant in the next condition in the sequence. As you test them, it is a good idea to add to this list basic demographic information; the date, time, and place of testing; and the name of the experimenter who did the testing. It is also a good idea to have a place for the experimenter to write down comments about unusual occurrences (e.g., a confused or uncooperative participant) or questions that come up. This kind of information can be useful later if you decide to analyze sex differences or effects of different experimenters, or if a question arises about a particular participant or testing session.

Since participants’ identities should be kept as confidential (or anonymous) as possible, their names and other identifying information should not be included with their data. In order to identify individual participants, it can, therefore, be useful to assign an identification number to each participant as you test them. Simply numbering them consecutively beginning with 1 is usually sufficient. This number can then also be written on any response sheets or questionnaires that participants generate, making it easier to keep them together.
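
A running log of this kind is easy to maintain with Python’s csv module. The file name and fields below are hypothetical, but they capture the essentials just described: a consecutive ID in place of a name, the session details, and a comments column for unusual occurrences.

    import csv
    from datetime import datetime

    FIELDS = ["id", "condition", "age", "gender", "tested_at",
              "location", "experimenter", "comments"]

    def log_session(path, record):
        """Append one participant's record, stamping the date and time."""
        record = {"tested_at": datetime.now().isoformat(timespec="minutes"),
                  **record}
        with open(path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            if f.tell() == 0:          # brand-new file: write the header row
                writer.writeheader()
            writer.writerow(record)

    log_session("sessions.csv", {
        "id": 1, "condition": "treatment", "age": 21, "gender": "F",
        "location": "Lab 2", "experimenter": "experimenter_A",
        "comments": "asked for the instructions to be repeated",
    })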

Manipulation Check

In many experiments, the independent variable is a construct that can only be manipulated indirectly. For example, a researcher might try to manipulate participants’ stress levels indirectly by telling some of them that they have five minutes to prepare a short speech that they will then have to give to an audience of other participants. In such situations, researchers often include a manipulation check in their procedure. A manipulation check is a separate measure of the construct the researcher is trying to manipulate. The purpose of a manipulation check is to confirm that the independent variable was, in fact, successfully manipulated. For example, researchers trying to manipulate participants’ stress levels might give them a paper-and-pencil stress questionnaire or take their blood pressure—perhaps right after the manipulation or at the end of the procedure—to verify that they successfully manipulated this variable.

Manipulation checks are particularly important when the results of an experiment turn out to be null. In cases where the results show no significant effect of the manipulation of the independent variable on the dependent variable, a manipulation check can help the experimenter determine whether the null result is due to a real absence of an effect of the independent variable on the dependent variable or to a problem with the manipulation of the independent variable. Imagine, for example, that you exposed participants to happy or sad movie music—intending to put them in happy or sad moods—but you found that this had no effect on the number of happy or sad childhood events they recalled. This could be because being in a happy or sad mood has no effect on memories for childhood events. But it could also be that the music was ineffective at putting participants in happy or sad moods. A manipulation check—in this case, a measure of participants’ moods—would help resolve this uncertainty. If it showed that you had successfully manipulated participants’ moods, then it would appear that there is indeed no effect of mood on memory for childhood events. But if it showed that you did not successfully manipulate participants’ moods, then it would appear that you need a more effective manipulation to answer your research question.
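
For instance, the mood manipulation check just described might be analyzed with a simple independent-samples t-test. The sketch below uses SciPy and invented mood ratings; a large, reliable group difference would indicate that the music manipulation succeeded.

    from scipy import stats

    # Hypothetical post-manipulation mood ratings (1 = very sad, 9 = very happy).
    happy_music_group = [7, 8, 6, 7, 8, 7, 6, 8]
    sad_music_group   = [3, 4, 2, 4, 3, 5, 3, 4]

    result = stats.ttest_ind(happy_music_group, sad_music_group)
    print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
    # A significant difference here means the manipulation worked, so a null
    # result on the memory task would more plausibly reflect a real absence
    # of an effect of mood on childhood memories.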

Manipulation checks are usually done at the end of the procedure to be sure that the effect of the manipulation lasted throughout the entire procedure and to avoid calling unnecessary attention to the manipulation (to avoid a demand characteristic). However, researchers are wise to include a manipulation check in a pilot test of their experiment so that they avoid spending a lot of time and resources on an experiment that is doomed to fail and instead spend that time and energy finding a better manipulation of the independent variable.

Pilot Testing

It is always a good idea to conduct a pilot test of your experiment (and this advice applies to any study, not just experiments). A pilot test is a small-scale study conducted to make sure that a new procedure works as planned. In a pilot test, you can recruit participants formally (e.g., from an established participant pool) or you can recruit them informally from among family, friends, classmates, and so on. The number of participants can be small, but it should be enough to give you confidence that your procedure works as planned. There are several important questions that you can answer by conducting a pilot test:

  • Do participants understand the instructions?
  • What kind of misunderstandings do participants have, what kind of mistakes do they make, and what kind of questions do they ask?
  • Do participants become bored or frustrated?
  • Is an indirect manipulation effective? (You will need to include a manipulation check.)
  • Can participants guess the research question or hypothesis (are there demand characteristics)?
  • How long does the procedure take?
  • Are computer programs or other automated procedures working properly?
  • Are data being recorded correctly?

Of course, to answer some of these questions you will need to observe participants carefully during the procedure and talk with them about it afterward. Participants are often hesitant to criticize a study in front of the researcher, so be sure they understand that their participation is part of a pilot test and you are genuinely interested in feedback that will help you improve the procedure. If the procedure works as planned, then you can proceed with the actual study. If there are problems to be solved, you can solve them, pilot test the new procedure, and continue with this process until you are ready to proceed.

Image Attributes

Graph of treatment, no-treatment control, and placebo group copied from Price, P. C., Jhangiani, R. S., Chiang, I-C. A., Leighton, D. C., & Cuttler, C. (2017). Research methods in psychology (3rd American ed.). Pressbooks, under a CC-BY-NC-SA license.

References

Alexander, B. (2010). Addiction: The view from rat park. Retrieved from: http://www.brucekalexander.com/articles-speeches/rat-park/148-addiction-the-view-from-rat-park

Doan, L., Quadlin, N., & Khanna, K. (2024). Using experiments to study families and intimate relationships. Journal of Marriage and Family, 86(5), 1251-1271. https://doi.org/10.1111/jomf.12959

Engel, R. J. & Schutt, R. K. (2016). The practice of research in social work (4th ed.). Washington, DC: SAGE Publishing.

Guéguen, N., & de Gail, M.-A. (2003). The effect of smiling on helping behavior: Smiling and good Samaritan behavior. Communication Reports, 16, 133–140.

Ibolya, K., Brake, A., & Voss, U. (2004). The effect of experimenter characteristics on pain reports in women and men. Pain, 112, 142–147.

Moseley, J. B., O’Malley, K., Petersen, N. J., Menke, T. J., Brody, B. A., Kuykendall, D. H., … Wray, N. P. (2002). A controlled trial of arthroscopic surgery for osteoarthritis of the knee. The New England Journal of Medicine, 347, 81–88.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.

Petrie, B. F. (1996). Environment is not the most important variable in determining oral morphine consumption in Wistar rats. Psychological Reports, 78(2), 391–400.

Price, D. D., Finniss, D. G., & Benedetti, F. (2008). A comprehensive review of the placebo effect: Recent advances and current thought. Annual Review of Psychology, 59, 565–590.

Rosenthal, R. (1976). Experimenter effects in behavioral research (enlarged ed.). New York, NY: Wiley.

Rosenthal, R., & Fode, K. (1963). The effect of experimenter bias on performance of the albino rat. Behavioral Science, 8, 183-189.

Rosenthal, R., & Rosnow, R. L. (1976). The volunteer subject. New York, NY: Wiley.

Shapiro, A. K., & Shapiro, E. (1999). The powerful placebo: From ancient priest to modern physician. Baltimore, MD: Johns Hopkins University Press.

Solinas, M., Thiriet, N., El Rawas, R., Lardeux, V., & Jaber, M. (2009). Environmental enrichment during early stages of life reduces the behavioral, neurochemical, and molecular effects of cocaine. Neuropsychopharmacology, 34(5), 1102.


License


Understanding Research Design in the Social Science Copyright © by Utah Valley University is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.
