Correlational research is a type of non-experimental research that investigates the relationship between two or more variables without manipulating them. The main goal of correlational research is to determine whether there is a significant association between variables and to what extent they are related. Correlational research can be used to explore the direction and strength of relationships between variables, but it cannot establish a cause-and-effect relationship.

There are three types of correlations that can be observed in correlational research:

  • Positive correlation: This occurs when an increase in one variable is associated with an increase in another variable. For example, as the number of hours spent studying increases, students’ test scores also tend to increase.
  • Negative correlation: This occurs when an increase in one variable is associated with a decrease in another variable. For example, as the number of hours spent watching television increases, the amount of time spent on physical activity decreases.
  • Zero correlation: This occurs when there is no relationship between two variables. For example, there may be no correlation between a person’s height and their favorite color.
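
The three cases can be illustrated with a minimal Python sketch. All of the numbers below are invented for illustration, and `pearson_r` is a hypothetical helper written here rather than a library function. Because favorite color is not numeric, the zero-correlation case uses a made-up, unrelated numeric score in its place.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: more study hours go with higher test scores.
hours_studied = [1, 2, 3, 4, 5, 6]
test_scores = [52, 58, 61, 70, 74, 83]

# Hypothetical data: more TV hours go with less physical activity.
tv_hours = [1, 2, 3, 4, 5, 6]
activity_hours = [5.0, 4.5, 3.5, 3.0, 2.0, 1.5]

# Hypothetical data: the two variables move independently.
rank_by_height = [1, 2, 3, 4, 5]
unrelated_score = [2, 1, 3, 1, 2]

r_positive = pearson_r(hours_studied, test_scores)
r_negative = pearson_r(tv_hours, activity_hours)
r_zero = pearson_r(rank_by_height, unrelated_score)

print(f"positive: {r_positive:.2f}")  # sign is positive
print(f"negative: {r_negative:.2f}")  # sign is negative
print(f"zero:     {r_zero:.2f}")      # approximately zero
```

The sign of each coefficient gives the direction of the relationship, and its distance from zero gives the strength.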

Correlational vs. experimental research

Correlational research and experimental research are two different approaches to investigating relationships between variables. The main differences between these two types of research are:

| | Correlational research | Experimental research |
| --- | --- | --- |
| Purpose | To investigate the relationship between variables without manipulating them | To investigate the causal relationship between variables by manipulating one variable and observing its effect on another |
| Variables | Variables are not manipulated, and the researcher observes the naturally occurring relationships between them | One or more variables are manipulated by the researcher to observe their effect on another variable |
| Control | There is no control over the variables, and the researcher cannot establish cause-and-effect relationships | The researcher has control over the variables and can establish cause-and-effect relationships |
| Validity | Correlational research has lower internal validity than experimental research because it cannot establish cause-and-effect relationships | Experimental research has higher internal validity than correlational research because it can establish cause-and-effect relationships |

When to use correlational research

Correlational research can be used in various situations, such as:

To investigate non-causal relationships

Correlational research is often used to examine relationships between variables without inferring causality. In other words, researchers can use correlational studies to determine if there is a significant association between two or more variables, but they cannot conclude that one variable causes changes in another.

Example: A researcher may investigate the relationship between a person’s daily coffee consumption and their self-reported anxiety levels. While the study may find a correlation between the two variables (e.g., higher coffee consumption is associated with higher anxiety levels), it cannot conclude that drinking coffee causes anxiety or vice versa. Other factors, such as stress or genetics, may influence both coffee consumption and anxiety levels.

To explore potential causal relationships between variables

Although correlational research cannot directly establish causality, it can be used as a preliminary step to explore potential causal relationships between variables. If a strong correlation is found, researchers may follow up with experimental studies to investigate the causal nature of the relationship.

Example: A correlational study may find a strong negative correlation between the amount of time students spend on social media and their academic performance. While this finding does not prove that social media use causes poor academic performance, it provides a basis for conducting an experimental study. In the experimental study, researchers could randomly assign students to different groups (e.g., limited social media use vs. unlimited social media use) and compare their academic performance to determine if social media use has a causal effect.

To test new measurement tools

Correlational research can be used to assess the validity and reliability of new measurement tools, such as questionnaires or scales. By correlating the results of a new measurement tool with those of established tools that measure similar constructs, researchers can gather evidence that the new tool measures the variable of interest accurately (validity). By correlating scores from repeated administrations of the same tool, they can check whether it measures that variable consistently (reliability).

Example: A researcher develops a new questionnaire to measure employee job satisfaction. To test the validity of the questionnaire, the researcher administers both the new questionnaire and an established job satisfaction scale to a sample of employees. The researcher then correlates the scores from the two measures. If there is a strong positive correlation between the scores, it provides evidence that the new questionnaire is a valid measure of job satisfaction. If the correlation is weak or non-existent, it suggests that the new questionnaire may not accurately measure job satisfaction and requires further refinement.

How to collect correlational data

There are several methods for collecting correlational data, including:

Surveys

Surveys are a common method for collecting correlational data. They involve asking participants a series of questions about their attitudes, beliefs, behaviors, or experiences. Surveys can be administered online, in person, or through the mail. Researchers can use surveys to collect data on a wide range of variables and to examine the relationships between them.

Example: A researcher is interested in investigating the relationship between job satisfaction and employee turnover intentions. The researcher creates an online survey that asks employees to rate their level of job satisfaction and their likelihood of seeking a new job in the next year. The survey also collects demographic information, such as age, gender, and job tenure. By analyzing the survey data, the researcher can determine if there is a significant correlation between job satisfaction and turnover intentions and if any demographic variables moderate this relationship.

Naturalistic observation

Naturalistic observation involves observing and recording the behavior of individuals in their natural environment. This method allows researchers to collect data on behavior as it occurs in real-world settings, without the artificial constraints of a laboratory. Researchers can use naturalistic observation to examine the relationships between variables that are difficult to measure through surveys or experiments.

Example: A researcher is interested in investigating the relationship between parenting styles and children’s social skills. The researcher observes parent-child interactions at a local playground and records the parenting behaviors (e.g., warmth, control) and children’s social behaviors (e.g., cooperation, aggression). By analyzing the observational data, the researcher can determine if there is a significant correlation between specific parenting styles and children’s social skills.

Secondary data

Secondary data refers to data that has been collected by someone else for a different purpose. Researchers can use secondary data sources, such as government databases, medical records, or online repositories, to investigate relationships between variables without having to collect new data. Secondary data analysis is often less expensive and time-consuming than primary data collection.

Example: A researcher is interested in investigating the relationship between socioeconomic status and health outcomes. Rather than collecting new data, the researcher accesses a national health survey database that contains information on participants’ income, education, and various health measures (e.g., blood pressure, BMI). By analyzing the secondary data, the researcher can determine if there is a significant correlation between socioeconomic status and specific health outcomes and if any demographic variables moderate this relationship.

How to analyze correlational data

There are two main methods for analyzing correlational data:

Correlation analysis

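This involves calculating a correlation coefficient, which is a numerical value that indicates the strength and direction of the relationship between two variables. The most commonly used correlation coefficient is Pearson’s r, which measures linear association and ranges from -1 to +1. A value of +1 indicates a perfect positive linear correlation, a value of -1 indicates a perfect negative linear correlation, and a value of 0 indicates no linear correlation. The closer the value is to -1 or +1, the stronger the relationship. Note that two variables can have a strong non-linear relationship and still produce an r close to 0.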
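
As a rough sketch of how the coefficient behaves, the Python snippet below computes Pearson’s r from its definition (the covariance of the two variables divided by the product of their standard deviations). The helper name `pearson_r` and the data are assumptions made for illustration. The third dataset shows that r describes only linear association: a perfectly curved relationship can still yield r = 0.

```python
from math import sqrt

def pearson_r(x, y):
    """r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2))"""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

x = [-3, -2, -1, 0, 1, 2, 3]

y_linear = [2 * v + 1 for v in x]       # rises in a straight line with x
y_inverse = [-0.5 * v + 4 for v in x]   # falls in a straight line with x
y_curved = [v ** 2 for v in x]          # depends on x, but not linearly

r_linear = pearson_r(x, y_linear)    # perfect positive correlation
r_inverse = pearson_r(x, y_inverse)  # perfect negative correlation
r_curved = pearson_r(x, y_curved)    # zero, despite a clear relationship

print(round(r_linear, 3), round(r_inverse, 3), round(r_curved, 3))
```

For this reason it is usually worth plotting the data before interpreting a coefficient near zero.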

Regression analysis

This involves using one or more predictor variables to predict the value of a dependent variable. Regression analysis can be used to determine the strength and direction of the relationship between variables and to predict the value of the dependent variable based on the values of the predictor variables.
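
A minimal sketch of simple linear regression with a single predictor, fitted by ordinary least squares, might look like the following. The function `fit_line` and the study-hours data are hypothetical and serve only to show the mechanics; real analyses typically rely on a statistics package and include more than one predictor.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: predictor = hours studied, dependent variable = test score.
hours_studied = [1, 2, 3, 4, 5, 6]
test_scores = [52, 58, 61, 70, 74, 83]

slope, intercept = fit_line(hours_studied, test_scores)

# Predicted score for a student who studies 7 hours (an extrapolation).
predicted = intercept + slope * 7

print(f"score = {intercept:.2f} + {slope:.2f} * hours")
print(f"predicted score for 7 hours: {predicted:.1f}")
```

The slope estimates how much the dependent variable changes, on average, for each one-unit increase in the predictor. As with correlation, a good fit describes an association and does not by itself show that the predictor causes the outcome.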

Correlation and causation

It is important to note that correlation does not imply causation. Just because two variables are correlated does not mean that one variable causes the other. There are two main problems with inferring causation from correlation:

Directionality problem

The directionality problem refers to the difficulty in determining the direction of the relationship between two variables based on correlation alone. In other words, if variable A is correlated with variable B, it is unclear whether A causes B, B causes A, or if there is a bidirectional relationship between the two.

Example: A study finds a positive correlation between the number of hours students spend on social media and their levels of anxiety. Based on this correlation, it might be tempting to conclude that social media use causes anxiety. However, it is also plausible that students with higher levels of anxiety spend more time on social media as a coping mechanism. Without further investigation, such as an experimental study, it is impossible to determine the direction of the causal relationship between social media use and anxiety.

Third variable problem

The third variable problem refers to the possibility that the relationship between two variables is produced by a third, unmeasured variable (often called a confounding variable). In this case, the correlation between the two variables is spurious, meaning that neither variable causes the other and the association arises only because both are related to the third variable.

Example: A study finds a positive correlation between ice cream sales and the number of drowning incidents at the beach. Based on this correlation, it might be tempting to conclude that eating ice cream causes people to drown. However, the third variable problem suggests that a more likely explanation is that both ice cream sales and drowning incidents are related to a third variable: hot weather. When the weather is hot, more people buy ice cream and more people go swimming, which increases the likelihood of drowning incidents. In this case, the correlation between ice cream sales and drowning incidents is spurious and does not reflect a causal relationship between the two variables.
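
The ice cream example can be reproduced with a small simulation. The sketch below is illustrative only: the data are synthetic, the coefficients are arbitrary assumptions, and `drowning_risk` is a made-up continuous index, not real incident counts. Temperature drives both variables, and neither affects the other. The snippet then computes a first-order partial correlation, one common way to check whether an association remains once a measured third variable is held constant.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic data: temperature influences both outcomes independently.
temperature = [rng.uniform(15, 35) for _ in range(500)]
ice_cream_sales = [20 + 5 * t + rng.gauss(0, 10) for t in temperature]
drowning_risk = [0.2 * t + rng.gauss(0, 1) for t in temperature]

r_raw = pearson_r(ice_cream_sales, drowning_risk)
r_sales_temp = pearson_r(ice_cream_sales, temperature)
r_risk_temp = pearson_r(drowning_risk, temperature)
r_partial = partial_r(r_raw, r_sales_temp, r_risk_temp)

print(f"raw correlation:               {r_raw:.2f}")
print(f"controlling for temperature:   {r_partial:.2f}")
```

In this setup the raw correlation is clearly positive, while the partial correlation is close to zero, which is the pattern expected when a third variable accounts for the association. This check only works for third variables that were actually measured; unmeasured confounders remain a limitation of correlational designs.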