As a teacher, ensuring the validity of your assessments is crucial for accurately evaluating student learning and proficiency. One key aspect of assessment validity is criterion validity – the degree to which test scores or other assessment results correlate with a relevant external measure or outcome.
Example: Criterion Validity
You have developed a new “Math Aptitude Test” (MAT) to predict students’ performance in a college-level mathematics course. To establish criterion validity, you want to compare the scores on the MAT with the actual grades students achieve in the math course at the end of the semester.
You administer the MAT to 100 students at the beginning of the semester. After the semester ends, you collect their final grades in the math course. Then, you calculate the correlation coefficient between the MAT scores and the final grades.
Suppose the calculation yields a correlation of 0.85 between the MAT scores and the final grades. This high positive correlation indicates that the MAT scores are strongly related to actual performance in the math course.
Interpreting the results:
A correlation coefficient of 0.85 suggests a strong positive relationship between the MAT scores and the final grades. This means that students who scored higher on the MAT tended to achieve higher grades in the math course. The strong correlation provides evidence of criterion validity, indicating that the MAT is a valid measure for predicting success in the college-level mathematics course.
In this example, the criterion (the variable being predicted) is the final grade in the math course, and the MAT scores serve as the predictor variable. The high correlation between the two variables supports the criterion validity of the Math Aptitude Test.
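As a sketch of the computation described above, the correlation can be worked out in plain Python. The scores and grades below are invented for illustration (the 0.85 figure in the example is likewise illustrative), so the exact value of r here is not meaningful beyond showing the mechanics:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: MAT scores at the start of term, final course grades (GPA points).
mat_scores   = [55, 62, 70, 78, 85, 90]
final_grades = [2.1, 2.5, 2.9, 3.1, 3.6, 3.8]

r = pearson_r(mat_scores, final_grades)
print(round(r, 2))
```

In practice one would use a statistics package rather than hand-rolled code, but the arithmetic is exactly this: covariance of the two score lists divided by the product of their standard deviations.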
What is Criterion Validity?
Criterion validity refers to the extent to which performance on a given assessment or evaluation instrument is related to or predicts performance on some other measure or outcome considered a relevant criterion. In other words, it examines how well the assessment results correspond to a separate, independent indicator of the knowledge, skills, or abilities being measured.
Criterion validity is important because it demonstrates an assessment’s practical usefulness and real-world applicability. If an exam or evaluation tool shows strong criterion validity, it suggests the assessment is a good predictor of examinees’ performance in relevant, authentic contexts.
Types of Criterion Validity
There are two primary types of criterion validity, which differ based on the timing of when the assessment results and the criterion measure are obtained.
Concurrent Validity
Concurrent validity is demonstrated when the scores on a new assessment or test correlate highly with those on an existing, established instrument that is considered a valid measure of the same construct. This form of criterion validity is assessed by simultaneously administering the new test and criterion measure.
Establishing concurrent validity is particularly important when a new assessment claims to be superior to existing measures in some way, such as being more objective, efficient, or cost-effective.
Example: Concurrent Validity
A psychologist creates a new self-report measure of body image dissatisfaction and wants to evaluate its concurrent validity. They can do this by administering the new test and comparing the scores to a clinical diagnosis of body image issues made at the same time. A strong positive correlation between the two measures would provide evidence of the new test’s concurrent validity.
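One way to quantify that comparison, assuming the concurrent diagnosis is coded as a simple 0/1 variable (an assumption for illustration, since the example does not specify how the diagnosis is recorded), is to correlate test scores with diagnostic status. Pearson's r computed against a binary variable is known as the point-biserial correlation:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation; with one binary variable this is the point-biserial r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) *
                           sum((y - my) ** 2 for y in ys))

# Hypothetical data: new self-report scores and concurrent diagnosis (1 = diagnosed).
scores    = [10, 12, 14, 20, 22, 24]
diagnosis = [0, 0, 0, 1, 1, 1]

r = pearson_r(scores, diagnosis)
print(round(r, 2))
```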
Predictive Validity
Predictive validity refers to an assessment’s ability to forecast future performance or outcomes. In this case, the assessment scores are examined in relation to a criterion variable that is measured at some point in the future after the assessment has been administered.
Researchers often investigate the predictive validity of tests by looking at how well the initial test results predict a relevant future outcome, such as using an IQ test to predict future academic achievement.
Example: Predictive Validity
Suppose a researcher wants to examine the predictive validity of a college entrance math exam in relation to performance in an engineering degree program. They would compare students’ scores on the math entrance exam to their GPA after the first semester of the engineering program. If high math test scores are associated with strong academic performance in the engineering program, as indicated by a high GPA, then the math exam would demonstrate good predictive validity.
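A minimal sketch of that analysis, with invented entrance-exam scores and first-semester GPAs: beyond the correlation itself, fitting a simple least-squares line shows how the exam would actually be used to forecast a new applicant's GPA, which is the practical point of predictive validity.

```python
import math

# Hypothetical data: entrance-exam math scores and GPA one semester later.
exam = [58.0, 64.0, 71.0, 77.0, 83.0, 91.0]
gpa  = [2.4, 2.6, 3.0, 3.1, 3.5, 3.7]

n = len(exam)
mx, my = sum(exam) / n, sum(gpa) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(exam, gpa))
sxx = sum((x - mx) ** 2 for x in exam)
syy = sum((y - my) ** 2 for y in gpa)

r = sxy / math.sqrt(sxx * syy)          # predictive-validity coefficient
slope = sxy / sxx                        # least-squares regression slope
intercept = my - slope * mx
predicted_gpa = intercept + slope * 80   # forecast for a hypothetical score of 80

print(round(r, 2), round(predicted_gpa, 2))
```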
Note: The appropriate type of criterion validity depends on the assessment instrument’s purpose and intended use. Both concurrent and predictive validity provide valuable insights into an assessment’s real-world applicability and practical usefulness.
Criterion Validity Example
Employers often use various assessments and tests as part of the hiring process to evaluate candidates’ skills, abilities, and fit for a particular role. Criterion validity is an important consideration when validating the use of these hiring assessments.
Example: Assessing the Criterion Validity of a Customer Service Aptitude Test
A retail company wants to implement a new customer service aptitude test as part of their hiring process for entry-level sales associate positions. The goal is to use the test to identify candidates most likely to excel in providing high-quality customer service.
To establish the criterion validity of this new aptitude test, the company could take the following steps:
- Administer the customer service aptitude test to a group of current sales associates.
- Gather performance ratings or metrics for those same sales associates, as evaluated by their managers. These performance measures serve as the criterion variables.
- Analyze the correlation between the sales associates’ aptitude test scores and their actual job performance ratings.
If the test scores show a strong positive correlation with the criterion performance measures, it would provide evidence of the test’s criterion validity. This would indicate that the aptitude assessment accurately predicts how candidates are likely to perform in the customer service-focused sales associate role.
Establishing this type of criterion-related validity is crucial, as it demonstrates the hiring assessment’s practical relevance and real-world applicability. It gives the company confidence that using the customer service aptitude test will help identify candidates with the necessary skills and abilities to succeed.
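The three steps above might be sketched as follows, with the aptitude scores and manager ratings invented for illustration:

```python
import math

# Step 1: aptitude test scores for current sales associates (hypothetical).
test_scores = [60, 65, 72, 75, 81, 88, 90, 94]
# Step 2: the same associates' manager performance ratings, on a 1-5 scale
# (the criterion variable).
ratings     = [2.8, 3.0, 3.4, 3.2, 3.9, 4.1, 4.4, 4.6]

# Step 3: correlate test scores with the criterion ratings.
n = len(test_scores)
mx, my = sum(test_scores) / n, sum(ratings) / n
cov = sum((x - mx) * (y - my) for x, y in zip(test_scores, ratings))
r = cov / math.sqrt(sum((x - mx) ** 2 for x in test_scores) *
                    sum((y - my) ** 2 for y in ratings))
print(round(r, 2))
```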
How to Measure Criterion Validity
Evaluating the criterion validity of an assessment or measurement instrument involves quantifying the statistical relationship between the scores on the new measure and an established external criterion.
The most common approach is to calculate the Pearson correlation coefficient, denoted as “r,” which ranges from -1 to +1. This correlation coefficient represents the strength and direction of the linear relationship between the two variables.
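For reference, the Pearson coefficient for paired observations $(x_i, y_i)$ is defined as:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

that is, the covariance of the two variables divided by the product of their standard deviations.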
Here’s how you can use correlational analysis to assess criterion validity:
Identify the Criterion Variable: Determine the appropriate external measure or outcome you want to use as the criterion. This should be a well-established, validated indicator of the construct you are trying to assess.
Collect Concurrent/Predictive Data: Administer your new assessment instrument and the criterion measure, either simultaneously (concurrent validity) or with a time lag (predictive validity).
Calculate the Correlation Coefficient: Use statistical software to compute the Pearson correlation coefficient between the scores on your new measure and the criterion variable.
Interpret the Results: Examine the magnitude and sign of the correlation coefficient to judge the strength and direction of the relationship:
- r = 1.0: Perfect positive correlation, indicating maximal criterion validity
- r = 0.0: No correlation, suggesting lack of criterion validity
- r = -1.0: Perfect negative correlation, meaning the assessment is inversely related to the criterion
As a rule of thumb, a correlation coefficient above 0.70 is generally taken as evidence of strong criterion validity, while values below 0.40 suggest poor alignment with the criterion.
Evaluate the Significance: Assess whether the obtained correlation coefficient is statistically significant, meaning the relationship is unlikely to have occurred by chance.
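Putting these steps together, the sketch below (with invented data) computes r and its t statistic for the significance check. For n paired observations, t = r·√(n − 2)/√(1 − r²) with n − 2 degrees of freedom; here, rather than computing an exact p-value, the statistic is compared against the two-tailed critical value from a t table (about 2.306 for df = 8 at α = .05):

```python
import math

# Hypothetical paired measurements: new assessment vs. established criterion.
new_scores = [12, 15, 17, 20, 22, 25, 27, 30, 33, 35]
criterion  = [40, 44, 50, 49, 55, 60, 58, 66, 70, 73]

n = len(new_scores)
mx, my = sum(new_scores) / n, sum(criterion) / n
cov = sum((x - mx) * (y - my) for x, y in zip(new_scores, criterion))
r = cov / math.sqrt(sum((x - mx) ** 2 for x in new_scores) *
                    sum((y - my) ** 2 for y in criterion))

# t statistic for H0: true correlation is zero, with n - 2 degrees of freedom.
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# For df = 8, the two-tailed critical value at alpha = .05 is about 2.306.
significant = abs(t) > 2.306
print(round(r, 2), round(t, 1), significant)
```

Statistical packages report the exact p-value directly alongside r, so the table lookup above is only needed when working by hand.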