Sampling Bias
In all research designs, your goal is to get the best information possible with the fewest resources. Balance this goal with practical limitations (can you actually do a simple random sample if you don’t have any way to get a sampling frame, for example?), and you can begin to see that the process of determining and implementing a sampling design is not simple.
Basically, to help with the “best information possible” part of the goal, we need to think about reducing potential sampling bias as much as possible. “Bias” sounds like a bad word, but it’s a natural part of research, and it likely doesn’t mean what you might think at first glance. Bias, in this case, simply means that a sample doesn’t look like the population it is meant to represent in one way or another. Our goal as researchers is to acknowledge sampling bias, limit it as much as possible, and document what we can’t limit. If a sample gets biased enough, it might not be applicable to the population we were originally going for; that’s a problem, but even then, we can usually acknowledge the bias, define the population the sample does generalize to, and be careful about our interpretations and generalizations from the research.
Who sampled, how, and for what purpose?
Have you ever been a participant in someone’s research? If you have ever taken an introductory psychology or sociology class at a large university, that’s probably a silly question to ask. Social science researchers on college campuses have access to a bunch of (presumably) willing and able human guinea pigs, but that luxury comes at the cost of sample representativeness. One study of top academic journals in psychology found that over two-thirds (68%) of the samples in studies published by those journals were drawn in the United States (Arnett, 2008). [1] Further, the study found that two-thirds of the work derived from US samples published in the Journal of Personality and Social Psychology was based on samples made up entirely of American undergraduates taking psychology courses.
These findings certainly raise the question: What do we actually learn from social scientific studies, and about whom do we learn it? That is exactly the concern raised by Joseph Henrich and colleagues (Henrich et al., 2010), [2] authors of the article “The Weirdest People in the World?” In their piece, Henrich and colleagues point out that behavioral scientists commonly make sweeping claims about human nature based on samples drawn only from WEIRD (Western, Educated, Industrialized, Rich, and Democratic) societies. These claims are often based on even narrower samples, as is the case with many studies relying on samples drawn from college classrooms. As it turns out, many robust findings about human behavior, such as fairness, cooperation, visual perception, and trust, are based on studies that excluded participants from outside the United States and sometimes excluded anyone outside the college classroom (Begley, 2010). [3] This certainly raises questions about how much we know about human behavior in general, as opposed to the behavior of US residents or US undergraduates. Of course, not all research findings are based on samples of WEIRD folks like college students, but we should always pay attention to the population a study is based on and the claims the study makes about populations.
In the preceding discussion, the concern is with researchers making claims about populations other than those from which their samples were drawn. A related, but slightly different, potential concern is sampling bias. Bias in sampling occurs when the elements selected for inclusion in a study do not represent the larger population from which they were drawn. For example, if you were to sample people walking into the social science building on campus during each weekday, your sample would include too many social science majors and not enough non-social science majors. Furthermore, you would completely exclude graduate students if graduate classes are held at night. Bias may be introduced by the sampling method itself or by the researcher’s conscious or unconscious choices (Rubin & Babbie, 2017). [4] A researcher might select people who “look like good research participants,” and thereby transfer their unconscious biases to their sample.
A sample may be representative in all respects that a researcher thinks are relevant, but there may be relevant aspects that didn’t occur to the researcher when they were drawing their sample. For example, you might not think that a person’s phone would have much to do with their voting preferences. However, if the pollsters making predictions about the 2008 presidential election results had not been careful to include both cell phone-only and landline households in their surveys, it is possible that their predictions would have underestimated Barack Obama’s lead over John McCain, because Obama was much more popular among cell-only users than McCain was (Keeter, Dimock, & Christian, 2008). [5]
So how do we know when we can count on results that are being reported to us? While there might not be any magic or always-true rules we can apply, there are a couple of things we can keep in mind as we read the claims researchers make about their findings.
First, remember that sample quality is determined only by the sample actually obtained, not by the sampling method itself. A researcher may set out to administer a survey to a representative sample by correctly employing a random selection technique, but if they only receive a handful of responses, then they will have to be very careful about the claims they can make about their survey findings. Nonresponse bias is exactly what it sounds like: bias that enters your sample because of who doesn’t enter the sample. We’ll talk more about this when we get to survey design because surveys are especially prone to having problems with nonresponse, but for any design it’s important to do all we can to recruit those who were selected for our study (within ethical guidelines, of course – we can’t force or coerce someone to join the study!) and keep participants in the study until completion. We can best do this through careful study and recruitment design, considering respondent burden (not making it hard for them to participate), and anticipating and regularly assessing participant risks and safety (we don’t want to scare them away!). It’s important to note that nonresponse is not itself biasing; it’s when nonresponse is systematic, meaning that certain people are more likely to drop out of the study (which is specifically called attrition) or refuse to participate that it biases the sample. If you are trying to study both men and women, for example, but you create your protocol in such a way that it makes women less likely to participate (maybe you only post recruitment flyers near urinals? That’s a pretty bad example, but it would be an effective way to screw this up), then the nonresponse of women will bias your sample.
Intentionality and Inclusion/Exclusion Criteria
Sometimes bias occurs because of the recruitment technique or some part of the study design itself. For example, if you need to recruit couples because you want to study something that requires both people in the couple (as opposed to sampling “married individuals,” which would only require one member of the couple), you’re likely to end up with different types of relationships (namely, higher-quality relationships) than if you were recruiting only one member of the dyad (Barton et al., 2020). It’s always worth thinking very carefully about who you’re recruiting and how, and whether that “how” biases the “who” that ends up in your study.
It’s important that researchers are intentional about who they include in their samples. Common explicit exclusion criteria are rules regarding safety, age (if you’ve seen a study that says you must be 18+, it’s explicitly excluding the vulnerable population of children), and language (if the research team only speaks one language, it’s probably not a good idea for them to recruit participants who don’t speak that language, though diversifying the team can open up possibilities). Limiting the sample this way does not cause sampling bias, per se, because the restriction is part of how you define your population. It still must be documented, however.
We read and hear about research results so often that we might overlook questioning where the research participants came from and how they were identified for inclusion. It is easy to focus solely on findings when we are busy and when the most interesting information is in a study’s conclusions rather than its procedures. Now that you have some familiarity with the variety of procedures for selecting study participants, you are equipped to ask some very important questions about the findings you read, and you are ready to be a more responsible consumer of research. Another thing to keep in mind is that researchers may want to talk about the implications of their findings as though they apply to some group other than the population that was sampled. This tendency is usually quite innocent, and it is a tempting way to talk about findings. As consumers of those findings, it is our responsibility to be attentive to this sort of (likely unintentional) bait and switch. At their core, questions about sample quality should address who has been sampled, how they were sampled, and for what purpose they were sampled. Being able to answer those questions will help you better understand, and more responsibly read, research results.
Sample Size
Finally, it’s worth discussing sample size as a factor in a research study’s quality, though the relationship may not be as straightforward as you might expect. The number of people you aim to recruit and the number of people you end up recruiting are both important considerations for how we can interpret the results of a study. It’s common to hear studies criticized for “too small of a sample,” but what’s actually “too small”? For some methods, very small samples, even as small as 1 or 2 individuals, are completely appropriate! Case studies often use data from only one subject to delve into detail. Many qualitative studies can reach saturation (a metric of how much new information is added with new recruitment; when no new information is coming in, recruitment can stop) with only a couple dozen participants.

On the other hand, quantitative methods usually do suffer when sample sizes are too small to produce enough statistical power for the tests being conducted; for some statistical models, samples of hundreds or even thousands of participants are needed for the statistics to work correctly. Determining an appropriate sample size for a given study, then, is not always easy. For quantitative methods, an a priori power analysis can be used to calculate, before starting recruitment, the minimum sample needed to detect a statistical effect using the planned tests and anticipated effect size (a brief sketch of this calculation appears at the end of this section). More and more quantitative researchers are using these analyses to plan their studies. However, this isn’t always possible, and even when it is, it’s not always done. Instead, a post hoc power analysis can be used to determine what power is present in a completed study.

For both qualitative and quantitative methods, though, sample size is more often determined by precedent (that is, what is most often done for a given discipline and method) or simply by what is feasible with a given set of resources. Look through the studies you’ve cited in your literature review and compare their methods and sample sizes. Do you see a lot of surveys being done? Are they usually with samples in the hundreds? Thousands? Millions? You’ll likely want to propose something similar for your study, given adequate ability to do so. Likewise, if you’re seeing more examples of interview studies using a dozen, twenty, or even one hundred participants but not many more, then proposing a study in line with what you’re seeing will likely be acceptable. Remember, just like in other areas of sampling (and research in general), there’s rarely one right or wrong answer to a question; rather, it’s about justifying decisions and documenting the process to make sure the research is as transparent and honest as possible.
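To make the idea of an a priori power analysis a bit more concrete, here is a minimal sketch in Python using the statsmodels library. The scenario and the specific numbers (a “medium” effect size, the conventional .05 alpha, and 80% power for an independent-samples t-test) are illustrative assumptions, not values tied to any study discussed above.

```python
# A minimal sketch of an a priori power analysis, assuming the
# statsmodels package is available. The effect size, alpha, and power
# values below are common conventions used for illustration, not
# recommendations for any particular study.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Anticipated effect size (Cohen's d = 0.5, a "medium" effect),
# significance level of .05, and desired power of 80% for a
# two-sided independent-samples t-test.
n_per_group = analysis.solve_power(
    effect_size=0.5,
    alpha=0.05,
    power=0.80,
    alternative='two-sided',
)

print(f"Minimum participants needed per group: {n_per_group:.1f}")
# Roughly 64 per group, or about 128 participants in total.
```

Note that changing any of the assumed inputs, such as a smaller anticipated effect, a stricter alpha, or a higher desired power, raises the required sample size, which is exactly why this kind of analysis is most useful when it is run before recruitment begins.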
Image attributions
men women apparel couple by 5688709 CC-0
References
- Arnett, J. J. (2008). The neglected 95%: Why American psychology needs to become less American. American Psychologist, 63, 602–614.
- Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33, 61–135.
- Begley, S. (2010). What’s really human? The trouble with student guinea pigs. Newsweek. Retrieved from http://www.newsweek.com/2010/07/23/what-s-really-human.html
- Rubin, A., & Babbie, E. (2017). Research methods for social work (9th edition). Boston, MA: Cengage.
- Keeter, S., Dimock, M., & Christian, L. (2008). Calling cell phones in ’08 pre-election polls. The Pew Research Center for the People and the Press. Retrieved from http://people-press.org/files/legacy-pdf/cell-phone-commentary.pdf
- Barton, A. W., Lavner, J. A., Stanley, S. M., Johnson, M. D., & Rhoades, G. K. (2020). “Will you complete this survey too?” Differences between individual versus dyadic samples in relationship research. Journal of Family Psychology, 34(2), 196–203. https://doi.org/10.1037/fam0000583