What Is Correlational Research?
Correlational research is a type of non-experimental research in which the researcher measures two variables and assesses the statistical relationship (i.e., the correlation) between them, with little or no effort to control extraneous variables. There are many reasons that researchers interested in statistical relationships between variables would choose to conduct a correlational study rather than an experiment, in which they would take steps to control other influences on the outcome. The first is that they do not believe the statistical relationship is a causal one, or they are not interested in causal relationships. Recall that two goals of science are to describe and to predict, and the correlational research strategy allows researchers to achieve both. Specifically, this strategy can be used to describe the strength and direction of the relationship between two variables, and if there is a relationship between the variables, then researchers can use scores on one variable to predict scores on the other (using a statistical technique called regression).
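As a concrete sketch, these two uses — describing the strength and direction of a relationship, and predicting one variable from the other via regression — can be illustrated with made-up data. The numbers, variable names, and the choice of Python/NumPy here are purely illustrative, not drawn from any real study:

```python
import numpy as np

# Hypothetical data: weekly hours of exercise and a happiness score
# for ten participants (illustrative numbers, not real data).
exercise = np.array([0, 1, 2, 2, 3, 4, 5, 5, 6, 7], dtype=float)
happiness = np.array([4, 5, 5, 6, 6, 7, 7, 8, 8, 9], dtype=float)

# Describe: Pearson's r gives the strength and direction
# of the linear relationship (here, strong and positive).
r = np.corrcoef(exercise, happiness)[0, 1]

# Predict: fit a least-squares regression line, then use it
# to predict happiness for a new exercise value.
slope, intercept = np.polyfit(exercise, happiness, 1)
predicted = slope * 3.5 + intercept  # predicted happiness at 3.5 hours
```

Note that nothing here establishes causation; the regression line only summarizes the observed association well enough to make predictions from it.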
Another reason that researchers would choose to use a correlational study rather than an experiment is that the statistical relationship of interest is thought to be causal, but the researcher cannot manipulate the independent variable because it is impossible, impractical, or unethical. For example, while we might be interested in the relationship between the frequency with which people use cannabis and their memory abilities, we cannot ethically manipulate how often people use cannabis. As such, we must rely on the correlational research strategy: measure the frequency with which people use cannabis, measure their memory abilities using a standardized test of memory, and then determine whether frequency of cannabis use is statistically related to memory test performance.
Correlation is also used to establish the reliability and validity of measurements. For example, a researcher might evaluate the validity of a brief extraversion test by administering it to a large group of participants along with a longer extraversion test that has already been shown to be valid. This researcher might then check to see whether participants’ scores on the brief test are strongly correlated with their scores on the longer one. Neither test score is thought to cause the other, so there is no independent variable to manipulate. In fact, the terms independent variable and dependent variable do not apply to this kind of research in the traditional sense, though you’ll sometimes still see them used (so you should always ask the question, “was there any manipulation going on or just measuring what was already happening?”).
A strength of correlational research is that it is often higher in external validity than experimental research. Recall that there is typically a trade-off between internal validity and external validity: as greater control is exerted over an experiment, internal validity increases, but often at the expense of external validity. In contrast, correlational studies typically have low internal validity because nothing is manipulated or controlled, but they often have high external validity. Because nothing is manipulated or controlled by the experimenter, the results are more likely to reflect relationships that exist in the real world.
Finally, building on this trade-off between internal and external validity, correlational research can help provide converging evidence for a theory. If a theory is supported by a true experiment that is high in internal validity as well as by a correlational study that is high in external validity, then researchers can have more confidence in the validity of their theory. As a concrete example, correlational studies establishing that there is a relationship between watching violent television and aggressive behavior have been complemented by experimental studies confirming that the relationship is a causal one (Bushman & Huesmann, 2001)[1]. These converging results provide strong evidence that there is a real (indeed, causal) relationship between watching violent television and aggressive behavior.
Data Collection in Correlational Research
Again, the defining feature of correlational research is that neither variable is manipulated. It does not matter how or where the variables are measured. A researcher could have participants come to a laboratory to complete a computerized backward digit span task and a computerized risky decision-making task and then assess the relationship between participants’ scores on the two tasks. Or a researcher could go to a shopping mall to ask people about their attitudes toward the environment and their shopping habits and then assess the relationship between these two variables. Both of these studies would be correlational because no independent variable is manipulated.
Why can’t correlation establish causality?
There are two reasons that correlation does not imply causation. The first is called the directionality problem. Two variables, X and Y, can be statistically related because X causes Y or because Y causes X. Consider, for example, a study showing that whether or not people exercise is statistically related to how happy they are—such that people who exercise are happier on average than people who do not. This statistical relationship is consistent with the idea that exercising causes happiness, but it is also consistent with the idea that happiness causes exercise. Perhaps being happy gives people more energy or leads them to seek opportunities to socialize with others by going to the gym.

The second reason that correlation does not imply causation is called the third-variable problem. Two variables, X and Y, can be statistically related not because X causes Y, or because Y causes X, but because some third variable, Z, causes both X and Y. For example, the fact that nations that have won more Nobel prizes tend to have higher chocolate consumption probably reflects geography: European countries tend to have higher rates of per capita chocolate consumption and to invest more per capita in education and technology than many other countries in the world. Similarly, the statistical relationship between exercise and happiness could mean that some third variable, such as physical health, causes both of the others. Being physically healthy could cause people to exercise and cause them to be happier. Correlations that are the result of a third variable are often referred to as spurious correlations.
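The third-variable problem can be made concrete with a small simulation. In this hypothetical sketch, Z stands in for something like physical health: it drives both X (exercise) and Y (happiness), and there is no direct causal link between X and Y at all — yet X and Y still come out correlated. All values are simulated:

```python
import numpy as np

# Simulate a spurious correlation: Z causes both X and Y,
# but X and Y have no direct causal connection.
rng = np.random.default_rng(0)
n = 10_000

z = rng.normal(size=n)                   # third variable (e.g., health)
x = z + rng.normal(scale=0.5, size=n)    # X depends only on Z (+ noise)
y = z + rng.normal(scale=0.5, size=n)    # Y depends only on Z (+ noise)

# X and Y are strongly correlated anyway, purely because of Z.
r_xy = np.corrcoef(x, y)[0, 1]
```

A researcher who measured only X and Y would see a sizable correlation and might be tempted to infer a causal link, even though, by construction, none exists here.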
Control and Confounding Variables
When designing a study, especially one you will analyze as correlational, it’s important to consider other factors that might influence the relationship between your variables of interest. X and Y are both just variables, but they get their titles based on their position in this question: X is the independent variable (IV). It is “independent” because it can vary – it’s just that, in this case, you’re not making it vary. Y is the dependent variable (DV), so named because the value of Y for any given person is hypothesized to depend on the value of X for that person, even though we’re not controlling Y.
Which variable is dependent and which is independent varies based on a number of factors that you might consider as you plan the study. One is time. If one thing occurs before another, you’d look at the earlier occurrence as a predictor of the later one. For example, let’s say you want to design a study to assess how substance abuse in adulthood is influenced by exposure to trauma in early childhood. In that case, child trauma exposure would be your independent variable (or predictor), and adult substance abuse would be your dependent (or outcome) variable. You want to look at how events in early childhood shape adult experiences. It wouldn’t make sense to look at how adult experiences influence child outcomes (unless you have a time machine). In this example, you’re assuming adult substance abuse depends (to an extent) on childhood trauma. The dependent variable is always assumed to be dependent upon the independent variable.
The determination of which variable is independent and which is dependent can also vary based on your research design. This is often the case when two things happen at the same time, or when both happen over time. For example, let’s imagine you want to design a study examining whether there is a connection between child behavior and parenting styles. Your two variables are child behavior and parenting styles – but which is causing which? If you hypothesize that parenting style influences subsequent child behavior, then parenting style would be the independent variable, and child behavior would be the dependent variable. In contrast, if you hypothesize that child behavior shapes later parenting style, child behavior would be the independent variable and parenting style would be the dependent variable. This is called directionality – if two variables are correlated, which direction do we think the influence is occurring in? It all comes down to your research question and what you think will happen. Remember that your hypotheses should be based on empirical and theoretical evidence, not on your own opinions! (In reality, research suggests that the influence between child behavior and parenting styles is bidirectional, meaning it goes both ways. Both influence the other! In that case, you might use a more complex, longitudinal research design to explore how the two variables are associated and predict each other over time).
In the example above, we talked about the influence of child behavior on parenting style. What other factors might influence how a parent raises their child? The parent’s personality, perhaps? Marital conflict? Stress from working multiple jobs? There are many influences on parenting style, but we want to isolate the effects of one variable on parenting style: child behavior. How do we know that the parenting styles of our participants are due to child behavior, and not a multitude of other factors?
Confounding variables are factors that can seriously undermine the validity of a study; they are a common concern in experiments, but they also show up in non-experimental research. A confounding variable is a variable other than the independent variable that varies systematically with it and could be contributing to the dependent variable. In an experiment, we’d work to remove confounding variables through added control (making sure all the subjects are tested in the same room and at the same time of day, for example). For non-experimental designs, confounders are often uncontrollable in a literal sense, so you have to make do with what you can and get fancy on the back end. One way to do this is by accounting for the effects of other variables – control variables – on your dependent variable. Control variables are variables that are measured (this is key – if you don’t measure a variable that’s hypothesized to be associated, you can’t isolate the relationship from it) and that can be added to the statistical model along with the IV and DV of interest. By controlling for the additional variables, the statistical relationship can be clarified. We won’t get into the math here, but the assumption becomes that we’re seeing the association between X and Y when the additional variable is held constant (meaning it is not influencing either X or Y). If you have reason to believe that there might be a third variable at play in a correlational relationship you’re interested in, plan to measure it too so that you can control for it in the statistical sense.