When conducting research, it’s essential to consider the external validity of your study. External validity refers to the extent to which the results of a study can be generalized to other populations, settings, and situations beyond the specific conditions of the study itself. In other words, it determines whether the findings hold true in the real world and can be applied to different contexts.

Types of external validity

There are two main types of external validity that researchers need to take into account:

Population validity

Population validity refers to the degree to which the study sample accurately represents the entire population of interest. If the sample is not representative of the target population, the results may not be generalizable to that larger group.

Example: low population validity

You want to study if people view themselves as more academically capable than others. The target population is 10,000 undergrads at your university. You recruit over 200 participants who are mostly American, male, 18-20 years old, from high socioeconomic backgrounds, and studying science/engineering. In a lab, they take a math/science test and rate their perceived performance.

The results show the average participant believes they performed better than 66% of peers.

Can you conclude most people see themselves as smarter than others in math/science? No, because your sample lacks population validity. It only represents a specific demographic (young, affluent, male, STEM students), not the diverse undergraduate population. To generalize findings, your sample must include students of different genders, ages, backgrounds, and majors.  

Ecological validity

Ecological validity refers to the extent to which the study’s findings can be applied to real-life situations and settings outside of the controlled research environment. If the study conditions are too artificial or contrived, it may be difficult to generalize the results to more naturalistic settings.

Example: low ecological validity

You want to study the effects of sleep deprivation on emotional regulation abilities. In a highly controlled laboratory setting, you ask participants to stay awake for 36 hours straight. During this period, you strictly regulate their environment, including lighting, temperature, noise levels, and access to stimulants like caffeine. At the end of the 36-hour period, you measure participants’ ability to regulate their emotions using computerized tasks and self-report questionnaires.

While this study allows you to isolate the effects of sleep deprivation under tightly controlled conditions, it lacks ecological validity because the artificial laboratory environment does not reflect the real-world situations and challenges that individuals typically face when sleep-deprived.

Trade-off between external and internal validity

There is often a trade-off between external and internal validity. Internal validity refers to the degree to which a study’s design and procedures minimize potential confounding variables and ensure that the observed effects are truly caused by the independent variable. Increasing internal validity, typically by tightening experimental control, can come at the expense of external validity.

Internal vs. external validity example

In a tightly controlled laboratory experiment investigating the effects of sleep deprivation on cognitive performance, researchers may ensure high internal validity by carefully controlling factors such as lighting, temperature, and noise levels. However, this highly controlled environment may lack ecological validity, as it may not accurately reflect the real-world conditions and distractions that individuals typically experience when sleep-deprived.

Threats to external validity and how to counter them

Several factors can threaten the external validity of a study, making it challenging to generalize the findings to other populations or settings. Below is a research example, followed by some common threats to external validity and how each one applies to that study. Strategies for countering these threats are discussed in the next section.

Research example

A team of researchers wants to investigate the effect of a new cognitive training program on improving memory and attention skills in older adults. They recruit a sample of 50 participants aged 65-75 from a local retirement community.

At the beginning of the study, participants complete a battery of cognitive tests to assess their baseline memory and attention abilities. They are then randomly assigned to either the experimental group, which receives the new cognitive training program, or the control group, which does not receive any intervention.
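
Random assignment like this is simple to implement in practice. The following is a minimal sketch in Python, assuming 50 numbered participants and an even split; the IDs and group sizes are illustrative, not details from the study itself.

```python
# Minimal sketch of simple random assignment: 50 hypothetical participants
# split evenly into an experimental and a control group.
import random

random.seed(42)  # fixed seed so the assignment is reproducible

participant_ids = list(range(1, 51))  # hypothetical participant IDs 1-50
random.shuffle(participant_ids)       # put the IDs in random order

experimental_group = sorted(participant_ids[:25])
control_group = sorted(participant_ids[25:])

print("Experimental group:", experimental_group)
print("Control group:", control_group)
```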

The cognitive training program involves engaging in various computer-based exercises and activities designed to challenge and enhance memory and attention skills. Participants in the experimental group attend weekly 1-hour training sessions at the research facility for 12 weeks.

After the 12-week intervention period, all participants complete the same cognitive tests again to measure any changes in their memory and attention performance. The researchers find that the experimental group, which received the cognitive training program, scored significantly better on the memory and attention tests than the control group.
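
The write-up doesn’t specify which statistical test the researchers use. One common choice for comparing two independent groups, shown here purely as an illustrative assumption with simulated scores, is an independent-samples t-test.

```python
# Minimal sketch of one common way to compare the groups' post-test scores:
# an independent-samples t-test (the data below are simulated, not real results).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
experimental_scores = rng.normal(loc=78, scale=8, size=25)  # hypothetical post-test scores
control_scores = rng.normal(loc=72, scale=8, size=25)

t_stat, p_value = stats.ttest_ind(experimental_scores, control_scores)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```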

Based on these results, the researchers conclude that the new cognitive training program is effective in improving memory and attention abilities in older adults.

  • Sampling bias: When the sample is not representative of the target population due to systematic errors in the sampling process. Example: the sample was drawn from a single retirement community, which may not be representative of the broader population of older adults in terms of factors such as socioeconomic status, education level, and health conditions.
  • History: When external events or occurrences during the study period may influence the results, making it difficult to attribute the observed effects solely to the independent variable. Example: if a nationwide campaign promoting brain health and cognitive exercises for seniors were launched during the study period, it could influence the participants’ engagement with cognitive activities, confounding the results of the cognitive training program.
  • Observer bias: When the researchers’ expectations or biases influence the way they observe and interpret the participants’ behavior or responses. Example: the researchers may unconsciously provide more encouragement or support to participants in the experimental group during the training sessions, potentially influencing their performance on the cognitive tests.
  • Hawthorne effect: When participants alter their behavior or responses simply because they are aware of being observed or studied. Example: participants may put forth extra effort or motivation during the cognitive training sessions and assessments due to the novelty of being part of a research study.
  • Testing effect: When the act of taking a pre-test or being exposed to the study’s procedures influences the participants’ performance or responses on subsequent tests or measures. Example: the pre-test cognitive assessments could sensitize participants to the specific cognitive domains being measured, influencing their performance on the post-test assessments regardless of the cognitive training intervention.
  • Aptitude-treatment interaction: When the effects of the independent variable (treatment) vary depending on the participants’ characteristics or abilities. Example: the effectiveness of the cognitive training program may vary depending on individual factors such as baseline cognitive abilities, motivation, and technology proficiency, which could limit the generalizability of the findings.
  • Situation effect: When the specific characteristics of the study setting or context influence the participants’ behavior or responses, making it difficult to generalize the results to other settings. Example: the study was conducted in a controlled research setting where participants attended weekly training sessions; this artificial environment may not reflect the conditions and distractions individuals would face when engaging in cognitive training exercises at home or in other real-world settings.

How to counter threats to external validity

Researchers can employ various strategies to enhance the external validity of their studies and increase the generalizability of their findings:

  • Replications: Replicating the study across different populations, settings, and contexts can help assess the robustness and generalizability of the findings. If similar results are obtained in multiple replications, it increases confidence in the external validity of the study.
  • Field experiments: Conducting research in naturalistic, real-world settings rather than highly controlled laboratory environments can improve ecological validity and increase the likelihood that the findings will generalize to other real-world situations.
  • Probability sampling: Employing probability sampling techniques, such as simple random sampling or stratified random sampling, can help ensure that the sample is representative of the target population, thereby enhancing population validity.
  • Recalibration: Researchers can adjust or recalibrate their findings based on information about the characteristics of their sample and the target population. This process involves statistically correcting for known biases or discrepancies between the sample and the broader population (for example, through post-stratification weighting), allowing for more accurate generalization of the results. A brief sketch of stratified sampling and this kind of reweighting follows this list.
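
To make the last two strategies concrete, here is a minimal sketch in Python using pandas and NumPy. The sampling frame, the “major” stratum, and the outcome scores are all hypothetical assumptions for illustration, not data from the studies above.

```python
# Minimal sketch: stratified random sampling plus post-stratification
# reweighting, on a hypothetical sampling frame of 10,000 undergraduates.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical population frame with one stratifying variable, 'major'.
population = pd.DataFrame({
    "student_id": np.arange(10_000),
    "major": rng.choice(["STEM", "humanities", "social_science"],
                        size=10_000, p=[0.4, 0.3, 0.3]),
})

# Stratified random sampling: draw the same fraction from every stratum,
# so the sample mirrors the population's composition on 'major'.
sample = (population
          .groupby("major")
          .sample(frac=0.02, random_state=42)
          .copy())
print(sample["major"].value_counts(normalize=True))  # ~ population shares

# Recalibration (post-stratification weighting): if the achieved sample is
# still skewed, weight each respondent by population share / sample share
# before estimating population-level quantities.
pop_share = population["major"].value_counts(normalize=True)
samp_share = sample["major"].value_counts(normalize=True)
sample["weight"] = sample["major"].map(pop_share / samp_share)

# Hypothetical outcome variable; the weighted mean generalizes to the
# population better than the raw sample mean when the sample is skewed.
sample["score"] = rng.normal(loc=70, scale=10, size=len(sample))
print(round(np.average(sample["score"], weights=sample["weight"]), 1))
```

In this stratified design the weights all come out close to 1; the reweighting step matters most when the achieved sample deviates from the population’s composition, for example because of nonresponse or convenience recruitment.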

Other strategies to counter threats to external validity include:

  • Using multiple measures and data sources (e.g., observations, self-reports, physiological measures) to triangulate the findings and increase confidence in their generalizability.
  • Providing rich and detailed descriptions of the study context, participants, and procedures to enable readers to assess the potential for generalizability to other settings or populations.
  • Involving relevant stakeholders, such as practitioners or community members, in the research process to ensure that the study addresses real-world issues and is relevant to the intended audience.
  • Conducting follow-up assessments over an extended period to evaluate the long-term effects and generalizability of the findings.