J Bras Pneumol, v.43(1); Jan-Feb 2017
Types of outcomes in clinical research

Tipos de desfecho em pesquisa clínica

Juliana Carvalho Ferreira

1. Methods in Epidemiologic, Clinical and Operations Research-MECOR-program, American Thoracic Society/Asociación Latinoamericana del Tórax, Montevideo, Uruguay.

2. Divisão de Pneumologia, Instituto do Coração - InCor - Hospital das Clínicas, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brasil.

Cecilia Maria Patino

3. Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.

PRACTICAL SCENARIO

In a randomized trial evaluating the efficacy of a new drug for pulmonary arterial hypertension (PAH), patients were randomly assigned to receive the new drug or a placebo. The primary composite outcome was the time to the first PAH-related event (worsening of symptoms, initiation of treatment with prostanoids, lung transplantation, or atrial septostomy) or to death. Secondary outcomes included changes in the 6-minute walk distance (6MWD) and adverse events.

DEFINITIONS

Outcomes (also called events or endpoints) are variables that are monitored during a study to document the impact that a given intervention or exposure has on the health of a given population. Typical examples of outcomes are cure, clinical worsening, and mortality. The primary outcome is the variable that is the most relevant to answer the research question. Ideally, it should be patient-centered (i.e., an outcome that matters to patients, such as quality of life and survival).

Secondary outcomes are additional outcomes monitored to help interpret the results of the primary outcome: in our example, an increase in the 6MWD is inversely associated with the need for lung transplantation. They can also provide preliminary data for a larger study. For example, a preliminary trial that uses 6MWD as the primary outcome may include mortality as a secondary outcome if the power of the study to detect a difference in mortality is low. Although investigators may be tempted to monitor several outcomes, the effort and cost to monitor various outcomes may be prohibitive. Therefore, it is essential to decide which outcome(s) to monitor ( Table 1 ).

Table 1. Examples of patient-centered, composite, and surrogate outcomes.

Outcome | Patient-centered | Composite | Surrogate
Asthma | Asthma control (questionnaire) | Hospitalization or a > 20% decline in asthma control | FEV1, peak flow, eosinophils
PAH | 2-year survival | Lung transplantation or death | 6MWD, PASP
ARDS | Hospital survival | Time to extubation or tracheotomy | PaO2/FiO2 ratio, ventilator-free days

PAH: pulmonary arterial hypertension; 6MWD: six-minute walk distance; PASP: pulmonary artery systolic pressure; FEV1: forced expiratory volume in one second; and ARDS: acute respiratory distress syndrome.

Surrogate outcomes are biomarkers intended to substitute for a clinical outcome, for example, the 6MWD as a marker of disease severity in PAH. Surrogate outcomes are typically continuous variables and occur earlier than the clinical outcome does, reducing the cost, duration, and size of the study. Surrogates are commonly used as the primary outcome in phase I and II clinical trials. However, they may lead to false conclusions about the efficacy of the intervention if the surrogate is not a good predictor of the clinical outcome.

Composite outcomes are made up of multiple variables. In our practical scenario, the primary outcome was composed of several clinical outcomes related to disease progression. Using composite outcomes has the advantage of increasing the power of the study when each of the events is rare and when events are competitive (patients who die cannot have a lung transplant). However, the interpretation of results can be misleading: if the intervention reduces the occurrence of the composite outcome, it does not necessarily mean that it reduces the occurrence of all of its components.
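The way a time-to-first-event composite is assembled from its components can be sketched in code. This is a hypothetical illustration, not the trial's actual outcome definition; all field names and data below are made up.

```python
# Sketch: deriving a time-to-first-event composite outcome from its
# component events. All field names and data are hypothetical.

def composite_time_to_event(component_times):
    """component_times maps event name -> time (e.g. days from
    randomization), or None if the event was never observed.
    Returns (time of first event, True), or (None, False) if the
    patient had none of the component events (i.e. is censored)."""
    observed = [t for t in component_times.values() if t is not None]
    if not observed:
        return None, False
    return min(observed), True

# One hypothetical patient: symptoms worsened at day 210, death at day 540.
patient = {
    "worsening_of_symptoms": 210,
    "prostanoid_initiation": None,
    "lung_transplantation": None,
    "atrial_septostomy": None,
    "death": 540,
}
print(composite_time_to_event(patient))  # first event wins: (210, True)
```

Note that the composite fires on the earliest component, so a reduction in the composite can be driven by just one of its parts, which is exactly the interpretation pitfall described above.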

IMPORTANT CONSIDERATIONS

  • The study outcomes should be stated a priori (before the researcher looks at the results) in order to avoid the risk of drawing false conclusions by testing every possible variable until one is statistically significant.
  • The sample size calculation should be carried out to detect a clinically relevant effect of the intervention on the primary outcome, although calculations can also be made for secondary outcome variables, which may increase the sample size but also increase trial validity.
  • More importantly, the choice of the most suitable outcome should be based on the research question and the corresponding hypothesis.
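A minimal sketch of the sample-size side of this, using the standard normal approximation for comparing two group means. This is a simplification (real trials often use t-based or simulation-based calculations), and all numbers here are purely illustrative.

```python
# Sketch: per-group sample size for a two-arm trial comparing means,
# using the usual normal approximation. A simplification; real
# calculations are often t-based or simulation-based, and the
# numbers here are illustrative only.
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """effect_size is the standardized difference d = (m1 - m2) / sd;
    the test is two-sided at level alpha."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # ~0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# A moderate effect (d = 0.5), e.g. on 6MWD, needs about 63 patients
# per group; a small effect (d = 0.2) needs about 393.
print(n_per_group(0.5), n_per_group(0.2))
```

The formula makes the trade-off discussed above concrete: halving the detectable effect size roughly quadruples the required sample.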


Which variable is the outcome variable?

If you intend to use statistical models to test your research hypotheses, you need to start by choosing which variables you are going to treat as your ‘outcome’, and which as the independent, or `predictor’ variables. The outcome is the attribute that you think might be predicted, or affected, by other attributes – for example, a disease that is affected by lifestyle factors. In a mathematical model, it is normally placed on the left hand side of the equation. The variables that you think might have an effect on the outcome are placed on the right hand side of the model equation. Some different jargon people use for outcomes and predictors are:

Outcome: dependent variable, response, regressand.
Predictor: independent variable, explanatory variable, regressor, covariate, exposure variable.
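The left-hand-side/right-hand-side convention can be made concrete with a minimal simple-linear-regression sketch (pure Python with made-up data; in practice you would use R, Stata, or a similar package):

```python
# Sketch: "outcome on the left, predictor on the right" as code,
# i.e. outcome = intercept + slope * predictor, fitted by ordinary
# least squares. Pure Python; the data are made up.

def fit_line(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

predictor = [1, 2, 3, 4, 5]
outcome = [3.1, 4.9, 7.2, 9.0, 10.8]
intercept, slope = fit_line(predictor, outcome)
print(round(intercept, 2), round(slope, 2))  # 1.15 1.95
```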

A note on causality:

When interpreting the results of your statistical model, bear in mind that picking one of your variables to be an outcome and another to be a predictor, and running a model which gives a strong result, unfortunately does not necessarily prove that the predictor 'caused' the outcome. If you switched the variables around, so that the predictor was now treated as depending on the outcome, your result would be just as strong. The model gives evidence that the two variables are related; to be sure that the variable you've chosen as the predictor is the one doing the affecting, and not the other way around, you will normally need further information.

Types of variables

The type of analysis you run will be dictated partly by the outcome variable: is it continuous or discrete/categorical? Continuous variables can take on almost any value within a range, as the name suggests. Examples of continuous variables are height, age, BMI, and blood pressure. Even if the values are restricted (for example, a measurement device with coarse gradations, so that there are gaps between possible values), we can usually model the variable as continuous.

Categorical, or discrete, variables are those with only a few possible values. They are typically created to describe categories, e.g. male/female, nationality, or level of education. Those with only two categories are known as binary variables (or sometimes dummy or Boolean variables). If your outcome variable is categorical, you also need to decide whether it is ordered: that is, whether the categories have some sort of meaningful order. Examples of ordered categories are level of education, low/medium/high SES categories, and responses to a survey such as "never", "sometimes", and "always". Ordered categorical variables are known as ordinal variables. Examples of categorical variables with no order are nationality, job type, and marital status; these are sometimes called nominal variables.

There are some situations where the line between ordinal and continuous is blurry. A common one is survey responses with ordered levels (e.g. Strongly Agree, Agree Somewhat, and so on). Whether to treat such data as continuous, with a mean and standard deviation, has been debated extensively (e.g. Jamieson, Brown), as has the broader question of treating ordinal variables as continuous, and vice versa.

If your outcome variable is measured at more than one point in time, you will need to consider employing longitudinal data techniques… we’re hoping to post something on this soon…

Time-to-event, or survival outcomes

If your outcome is the time elapsed before an event occurs (such as death or heart attack), you can expect to employ one of the methods of ‘survival analysis’, because of the special nature of such a variable (see Cochrane and Cornell University for a start).
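To see why censoring makes these outcomes special, here is a bare-bones Kaplan-Meier estimator. This is an illustrative sketch with hypothetical data; a real analysis would use a dedicated survival package.

```python
# Sketch: a bare-bones Kaplan-Meier estimator, to show why censored
# time-to-event outcomes need special handling. Data are hypothetical.

def kaplan_meier(times, events):
    """times: follow-up times; events: True if the event was observed,
    False if the subject was censored at that time.
    Returns a list of (time, estimated survival probability)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival, curve = 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e)
        leaving = sum(1 for tt, _ in data if tt == t)  # events + censored
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
        i += leaving
    return curve

# 7 hypothetical patients; False marks censoring (lost to follow-up).
times = [2, 3, 3, 5, 8, 8, 9]
events = [True, True, False, True, True, False, False]
for t, s in kaplan_meier(times, events):
    print(f"t = {t}: S(t) = {s:.3f}")
```

Censored subjects still contribute to the risk set up to the time they drop out, which a naive average of observed event times would get wrong.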


When there are multiple outcomes

Many similar outcomes: see our posts on principal components analysis and cluster analysis.


13 Predictor and Outcome Variable Examples


A predictor variable is used to predict the occurrence and/or level of another variable, called the outcome variable.

A researcher will measure both variables in a scientific study and then use statistical software to determine if the predictor variable is associated with the outcome variable. If there is a strong correlation, we say the predictor variable has high predictive validity .
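The degree of association is often summarized with Pearson's correlation coefficient, which can be sketched in a few lines of pure Python. The data below are made up; statistical software would normally do this calculation.

```python
# Sketch: measuring the predictor-outcome association with Pearson's
# correlation coefficient. Pure Python; the data are hypothetical.
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical: weekly hours of exercise vs. resting heart rate.
exercise = [0, 1, 2, 4, 5, 7]
heart_rate = [78, 74, 72, 66, 64, 60]
r = pearson_r(exercise, heart_rate)
print(round(r, 3))  # a strong negative association
```

An r near +1 or -1 indicates high predictive validity in the sense described above; an r near 0 indicates little linear association.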

This methodology is often used in epidemiological research. Researchers will measure both variables in a given population and then determine the degree of association between the predictor and outcome variable.

This allows scientists to examine the connection between many meaningful variables, such as exercise and health or personality type and depression, just to give a few examples.

Although this type of research can provide significant insights that help us understand a phenomenon, we cannot say that the predictor variable causes the outcome variable.

In order to use the term ‘cause and effect’, the researcher must be able to control and manipulate the level of a variable and then observe the changes in the other variable.

Definition of Predictor and Outcome Variables

In reality, many variables usually affect the outcome variable. So, researchers will measure numerous predictor variables in the population under study and then determine the degree of association that each one has with the outcome variable.

It sounds a bit complicated, but fortunately, the use of a statistical technique called multiple regression analysis simplifies the process.

As long as the variables are measured accurately and the population size is large, the software will be able to determine which of the predictor variables are associated with the outcome variable and the degree of association.

Not all predictors will have an equal influence on the outcome variable. Some may have a very small impact, some may have a substantial impact, and others may have no impact at all.
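As a rough sketch of what the software does under the hood, here is a two-predictor multiple regression solved via the normal equations. The data are made up so that the true coefficients are known in advance; a real analysis would use a statistics package.

```python
# Sketch of what regression software does under the hood: a
# two-predictor multiple regression solved via the normal equations
# (X'X) b = X'y with Gaussian elimination. Pure Python; the data are
# made up so the true coefficients are known in advance.

def multiple_regression(rows, y):
    """rows: list of [x1, x2] predictor values; returns [b0, b1, b2]."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    n, k = len(X), len(X[0])
    XtX = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(k)]
           for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    # Solve the k x k system by Gaussian elimination with pivoting.
    A = [XtX[a] + [Xty[a]] for a in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    b = [0.0] * k
    for r in range(k - 1, -1, -1):
        b[r] = (A[r][k] - sum(A[r][c] * b[c] for c in range(r + 1, k))) / A[r][r]
    return b

# Outcome generated exactly as y = 2 + 3*x1 - 1*x2, so the fit
# should recover intercept 2 and slopes 3 and -1.
rows = [[1, 2], [2, 1], [3, 4], [4, 2], [5, 5], [6, 3]]
y = [2 + 3 * x1 - x2 for x1, x2 in rows]
print([round(c, 6) for c in multiple_regression(rows, y)])
```

The fitted slopes are exactly the "degree of association" of each predictor, holding the other predictors constant.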

Predictor and outcome are not to be confused with independent and dependent variables.

Examples of Predictor and Outcome Variables

1. Diet and Health

Does the food you eat have any impact on your physical health? This is a question that a lot of people want to know the answer to.

Many of us have very poor diets, with lots of fast food and salty snacks. Other people, however, almost never make a run through the drive-thru, and consume mostly fruits and veggies.

Thankfully, epidemiological research can give us a relatively straightforward answer. First, researchers measure the quality of diet of each person in a large population.

So, they will track how much fast food and fruits and veggies people consume. There are a lot of different ways to measure this.

Secondly, researchers will measure some aspects of health. This could involve checking cholesterol levels, for example. There are a lot of different ways to measure health. The final step is to input all of the data into the statistical software program and perform the regression analysis to see the results.

Quality of diet is the predictor variable, and health is the outcome variable.

2. Noise Pollution and IQ

One scientist speculates that living in a noisy environment will affect a person’s ability to concentrate, which will then affect their mental acuity and subsequent cognitive development.

So, they decide to conduct a study examining the relationship between noise pollution and IQ.

First, they travel through lots of different neighborhoods and use a sound level meter to assess noise pollution. Some neighborhoods are in the suburbs, and some are near busy highways or construction sites.

Next, they collect data on SAT scores of the children living in those neighborhoods.

They then conduct a regression analysis to determine the connection between the sound level meter data and the SAT scores.

In this example, the predictor variable is the sound levels, and the outcome variable is the SAT scores.

Surprisingly, the results revealed a direct relationship between noise and SAT scores. That is, the more noise in the environment, the higher the SAT score. Any idea why?

3. Family Income and Achievement Test Scores

In this study, sociologists conducted a study examining the relationship between how much income a family has and the achievement test scores of their children.

The researchers collected data from schools on the achievement test scores of hundreds of students and then estimated the household income of the families based on the occupation of the parents.

The results revealed a strong relationship between family income and test scores, such that the higher the family income, the higher the test score of the child.

In this example, family income is the predictor variable, and test score is the outcome variable.

4. Parental Utterances and Children’s Vocabulary

A team of child psychologists is interested in the impact of how much parents talk to their child on that child’s verbal skills.

So, they design a study that involves observing families in the home environment. They randomly choose 50 families to study that live nearby.

A research assistant visits each family, makes recordings, and later counts the number of utterances the mother directs at her only child.

On a different occasion, a second research assistant administers a verbal skills test to every child. Yes, this type of study takes a lot of time.

The regression analysis reveals a direct relationship between the number of utterances from the mother and the child’s verbal skills test score. The more utterances, the higher the score.

In this example, the predictor variable is the number of utterances directed at the child, and the outcome variable is the child’s verbal skills test score.

5. Video Games and Aggressiveness

The debate about the effects of TV violence and video games has been raging for nearly 70 years. There have been hundreds, maybe even thousands of studies conducted on the issue.

One type of study involves assessing how frequently a group of people play certain video games and then tracking their level of aggressiveness over a period of time.

Of course, there are other factors involved in whether a person is aggressive or not, so the researchers might assess those variables as well.

In this type of study, the predictor variable is the frequency of playing video games, and the outcome variable is the level of aggressiveness.

6. Chemicals in Food Products and Puberty

In many countries, farmers may inject various antibiotics and growth hormones into their cattle to ward off infection and increase body mass and milk production.

Unfortunately, those chemicals do not disappear once the food hits the supermarket shelves. Some parents, educators, and food scientists began to notice an association between these agricultural practices and the onset of puberty in young children.

Numerous scientific studies were conducted examining the relationship between these practices and puberty.

So, the researchers studied the relationship between the predictor variable (chemicals in food) and the outcome variable (onset of puberty).

7. Full Moon and Craziness

Who hasn’t heard that a full moon brings out the crazies? A lot of people have theorized that when the moon is full, people get a little bit wild and uninhibited.

That can lead to people doing things they would not normally do.

To put this theory to the test, a group of criminologists decides to examine the police records of numerous large cities and compare that with the lunar cycle.

The researchers input all of the data into a stats program to examine the degree of association between police incidents and the moon.

In this study, the lunar cycle is the predictor variable, and contravention of the law is the outcome variable.  

8. Testosterone and Leadership Style

There are many types of leadership styles. Some leaders are very people-oriented and try to help their employees prosper and feel good about their jobs.

Other leaders are more task-driven and prefer to clearly define objectives, set deadlines, and push their staff to work hard.

To examine the relationship between leadership style and testosterone, a researcher first administers a questionnaire to hundreds of employees in several types of companies. The questionnaire asks the employees to describe the leadership style of their primary supervisor.

At the same time, the researcher also collects data on the testosterone levels of those supervisors and matches them with the questionnaire data.

By examining the association between the two, it will be possible to determine if there is a link between leadership style and testosterone.

The predictor variable is testosterone, and the outcome variable is leadership style.  

9. Personality Type and Driver Safety

A national bus company wants to hire the safest drivers possible. Fewer accidents mean passengers will be safe and their insurance rates will be lower. 

So, the HR staff begin collecting data on the safety records of their drivers over the last 3 years. At the same time, they administer a personality inventory that assesses Type A and Type B personalities.

The Type A personality is intense, impatient, and highly competitive. The Type B personality is easygoing and relaxed. People have varying levels of each type.

The HR department wants to know if there is a relationship between personality type (A or B) and accidents among their drivers.

The predictor variable is personality type, and the outcome variable is the number of accidents.

10. Vitamins and Health

Americans take a lot of vitamins. However, there is some debate about whether vitamins actually do anything to improve health.

There are so many factors that affect health, will taking a daily supplement really count?

So, a group of small vitamin companies pool their resources and hire an outside consulting firm to conduct a large-scale scientific study.

The firm randomly selects thousands of people from throughout the country to participate in the study. The people selected come from a wide range of SES backgrounds, ethnicities, and ages.

Each person is asked to go to a nearby hospital and have a basic health screening that includes cholesterol and blood pressure. They also respond to a questionnaire that asks if they take a multi-vitamin, how many and how often.

The consulting firm then compares the degree of association between multi-vitamins and health.

Multi-vitamin use is the predictor variable, and health is the outcome variable.

11. Automobiles and Climate Change

A group of climatologists has received funding from the EU to conduct a large-scale study on climate change.

The researchers collect data on a wide range of variables that are suspected of affecting the climate. Some of those variables include automobile production, industrial output, size of cattle herds, and deforestation, just to name a few.

The researchers proceed by gathering the data beginning with the 1970s all the way to the current year. They also collect data on yearly temperature fluctuations.

Once all the data is collected, it is put into a stats program, and a few minutes later, the results are revealed.

In this example, there are many predictor variables, such as automobile production, and one primary outcome variable (yearly temperature fluctuations).

12. Smartphone Use and Eye Strain

If you’ve ever noticed, people spend a lot of time looking at their smartphones.

When they are reading, when they are waiting in line, in bed at night, and even when walking from point A to point B.

Many optometrists are concerned that all of this screen time is doing harm to people’s eyesight. So, they decide to conduct a study.

Fortunately, they all work for a nationwide optometry company with offices located in Wal-Marts.

When patients come into their office, they give each one a standard eye exam. They also put a question on the in-take form asking each person to estimate how many hours a day they spend looking at their smartphone screen. 

Then they examine the relation between screen-time usage and the results of the eye exams.

In this study, the predictor variable is screen-time, and the outcome variable is the eye-exam results.

13. Soil Composition and Agricultural Yields

Although farming looks easy, it can be a very scientific enterprise. Agriculturalists study the composition of soil to help determine what type of food will grow best.

Today, they know a lot about which soil nutrients affect the growth of different plant varieties because there have been decades of studies.

The research involves collecting soil samples, measuring crop yields, and then examining the association between the two.

For example, scientists will measure the pH levels, mineral composition, as well as water and air content over many acres of land and relate that to the amount harvested of a particular crop (e.g., corn).

In this example, there are numerous predictor variables, all of which have some effect on crop growth, which is the outcome variable.

Even though there are so many variables to consider, the regression analysis will be able to tell us how important each one is in predicting the outcome variable.

There can be a lot of reasons why something happens. More often than not, nothing happens as a result of just one factor. Our physical health, climate change, and a person’s level of aggressiveness are all the result of numerous factors.

Fortunately for science, there is a brilliant way of determining which factors are connected to a phenomenon and how strong is each and every one of them.

By collecting data on a predictor variable (or variables) and then examining the association with the outcome variable, we can gain valuable insights into just about any subject matter we wish to study.


This article was written by Dave Cornell (PhD) and peer-reviewed and edited by Chris Drew (PhD).

Variables in Research – Definition, Types and Examples

Definition:

In research, variables are characteristics or attributes that can be measured, manipulated, or controlled. They are the factors that researchers observe or manipulate to understand the relationship between them and the outcomes of interest.

Types of Variables in Research

Types of Variables in Research are as follows:

Independent Variable

This is the variable that is manipulated by the researcher. It is also known as the predictor variable, as it is used to predict changes in the dependent variable. Examples of independent variables include age, gender, dosage, and treatment type.

Dependent Variable

This is the variable that is measured or observed to determine the effects of the independent variable. It is also known as the outcome variable, as it is the variable that is affected by the independent variable. Examples of dependent variables include blood pressure, test scores, and reaction time.

Confounding Variable

This is a variable that can affect the relationship between the independent variable and the dependent variable. It is a variable that is not being studied but could impact the results of the study. For example, in a study on the effects of a new drug on a disease, a confounding variable could be the patient’s age, as older patients may have more severe symptoms.

Mediating Variable

This is a variable that explains the relationship between the independent variable and the dependent variable. It is a variable that comes in between the independent and dependent variables and is affected by the independent variable, which then affects the dependent variable. For example, in a study on the relationship between exercise and weight loss, the mediating variable could be metabolism, as exercise can increase metabolism, which can then lead to weight loss.

Moderator Variable

This is a variable that affects the strength or direction of the relationship between the independent variable and the dependent variable. It is a variable that influences the effect of the independent variable on the dependent variable. For example, in a study on the effects of caffeine on cognitive performance, the moderator variable could be age, as older adults may be more sensitive to the effects of caffeine than younger adults.

Control Variable

This is a variable that is held constant or controlled by the researcher to ensure that it does not affect the relationship between the independent variable and the dependent variable. Control variables are important to ensure that any observed effects are due to the independent variable and not to other factors. For example, in a study on the effects of a new teaching method on student performance, the control variables could include class size, teacher experience, and student demographics.

Continuous Variable

This is a variable that can take on any value within a certain range. Continuous variables can be measured on a scale and are often used in statistical analyses. Examples of continuous variables include height, weight, and temperature.

Categorical Variable

This is a variable that can take on a limited number of values or categories. Categorical variables can be nominal or ordinal. Nominal variables have no inherent order, while ordinal variables have a natural order. Examples of categorical variables include gender, race, and educational level.

Discrete Variable

This is a variable that can only take on specific values. Discrete variables are often used in counting or frequency analyses. Examples of discrete variables include the number of siblings a person has, the number of times a person exercises in a week, and the number of students in a classroom.

Dummy Variable

This is a variable that takes on only two values, typically 0 and 1, and is used to represent categorical variables in statistical analyses. Dummy variables are often used when a categorical variable cannot be used directly in an analysis. For example, in a study on the effects of gender on income, a dummy variable could be created, with 0 representing female and 1 representing male.
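A minimal sketch of dummy coding in plain Python, using hypothetical records:

```python
# Hypothetical records: encode the categorical variable "gender" as a 0/1
# dummy (here 0 = female, 1 = male) so it can enter a regression directly.
people = [
    {"gender": "female", "income": 52_000},
    {"gender": "male", "income": 48_000},
    {"gender": "female", "income": 61_000},
]

for person in people:
    person["gender_male"] = 1 if person["gender"] == "male" else 0

print([p["gender_male"] for p in people])  # → [0, 1, 0]
```

For a categorical variable with k categories, the same idea extends to k − 1 dummy columns, with the remaining category serving as the reference level.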

Extraneous Variable

This is a variable that is not of interest in the study but can nonetheless affect the outcome. Extraneous variables can lead to erroneous conclusions and can be controlled through random assignment or statistical techniques.

Latent Variable

This is a variable that cannot be directly observed or measured, but is inferred from other variables. Latent variables are often used in psychological or social research to represent constructs such as personality traits, attitudes, or beliefs.

Moderator-mediator Variable

This is a variable that acts as both a moderator and a mediator: it can moderate the relationship between the independent and dependent variables while also mediating it. Such variables appear in more complex statistical analyses, for example in moderated mediation models.

Variables Analysis Methods

There are different methods to analyze variables in research, including:

  • Descriptive statistics: This involves analyzing and summarizing data using measures such as mean, median, mode, range, standard deviation, and frequency distribution. Descriptive statistics are useful for understanding the basic characteristics of a data set.
  • Inferential statistics: This involves making inferences about a population based on sample data. Inferential statistics use techniques such as hypothesis testing, confidence intervals, and regression analysis to draw conclusions from data.
  • Correlation analysis: This involves examining the relationship between two or more variables. Correlation analysis can determine the strength and direction of the relationship between variables, and can be used to make predictions about future outcomes.
  • Regression analysis: This involves examining the relationship between an independent variable and a dependent variable. Regression analysis can be used to predict the value of the dependent variable based on the value of the independent variable, and can also determine the significance of the relationship between the two variables.
  • Factor analysis: This involves identifying patterns and relationships among a large number of variables. Factor analysis can be used to reduce the complexity of a data set and identify underlying factors or dimensions.
  • Cluster analysis: This involves grouping data into clusters based on similarities between variables. Cluster analysis can be used to identify patterns or segments within a data set, and can be useful for market segmentation or customer profiling.
  • Multivariate analysis: This involves analyzing multiple variables simultaneously. Multivariate analysis can be used to understand complex relationships between variables, and can be useful in fields such as social science, finance, and marketing.

Examples of Variables

  • Age: This is a continuous variable that represents the age of an individual in years.
  • Gender: This is a categorical variable that represents the biological sex of an individual and can take on values such as male and female.
  • Education level: This is a categorical variable that represents the level of education completed by an individual and can take on values such as high school, college, and graduate school.
  • Income: This is a continuous variable that represents the amount of money earned by an individual in a year.
  • Weight: This is a continuous variable that represents the weight of an individual in kilograms or pounds.
  • Ethnicity: This is a categorical variable that represents the ethnic background of an individual and can take on values such as Hispanic, African American, and Asian.
  • Time spent on social media: This is a continuous variable that represents the amount of time an individual spends on social media in minutes or hours per day.
  • Marital status: This is a categorical variable that represents the marital status of an individual and can take on values such as married, divorced, and single.
  • Blood pressure: This is a continuous variable that represents the force of blood against the walls of arteries in millimeters of mercury.
  • Job satisfaction: This represents an individual’s level of satisfaction with their job. Measured on a Likert scale it is strictly ordinal, though it is often treated as continuous in analysis.

Applications of Variables

Variables are used in many different applications across various fields. Here are some examples:

  • Scientific research: Variables are used in scientific research to understand the relationships between different factors and to make predictions about future outcomes. For example, scientists may study the effects of different variables on plant growth or the impact of environmental factors on animal behavior.
  • Business and marketing: Variables are used in business and marketing to understand customer behavior and to make decisions about product development and marketing strategies. For example, businesses may study variables such as consumer preferences, spending habits, and market trends to identify opportunities for growth.
  • Healthcare : Variables are used in healthcare to monitor patient health and to make treatment decisions. For example, doctors may use variables such as blood pressure, heart rate, and cholesterol levels to diagnose and treat cardiovascular disease.
  • Education : Variables are used in education to measure student performance and to evaluate the effectiveness of teaching strategies. For example, teachers may use variables such as test scores, attendance, and class participation to assess student learning.
  • Social sciences : Variables are used in social sciences to study human behavior and to understand the factors that influence social interactions. For example, sociologists may study variables such as income, education level, and family structure to examine patterns of social inequality.

Purpose of Variables

Variables serve several purposes in research, including:

  • To provide a way of measuring and quantifying concepts: Variables help researchers measure and quantify abstract concepts such as attitudes, behaviors, and perceptions. By assigning numerical values to these concepts, researchers can analyze and compare data to draw meaningful conclusions.
  • To help explain relationships between different factors: Variables help researchers identify and explain relationships between different factors. By analyzing how changes in one variable affect another variable, researchers can gain insight into the complex interplay between different factors.
  • To make predictions about future outcomes : Variables help researchers make predictions about future outcomes based on past observations. By analyzing patterns and relationships between different variables, researchers can make informed predictions about how different factors may affect future outcomes.
  • To test hypotheses: Variables help researchers test hypotheses and theories. By collecting and analyzing data on different variables, researchers can test whether their predictions are accurate and whether their hypotheses are supported by the evidence.

Characteristics of Variables

Characteristics of Variables are as follows:

  • Measurement : Variables can be measured using different scales, such as nominal, ordinal, interval, or ratio scales. The scale used to measure a variable can affect the type of statistical analysis that can be applied.
  • Range : Variables have a range of values that they can take on. The range can be finite, such as the number of students in a class, or infinite, such as the range of possible values for a continuous variable like temperature.
  • Variability : Variables can have different levels of variability, which refers to the degree to which the values of the variable differ from each other. Highly variable variables have a wide range of values, while low variability variables have values that are more similar to each other.
  • Validity and reliability : Variables should be both valid and reliable to ensure accurate and consistent measurement. Validity refers to the extent to which a variable measures what it is intended to measure, while reliability refers to the consistency of the measurement over time.
  • Directionality: Some variables have directionality, meaning that the relationship between the variables is not symmetrical. For example, in a study of the relationship between smoking and lung cancer, smoking is the independent variable and lung cancer is the dependent variable.

Advantages of Variables

Here are some of the advantages of using variables in research:

  • Control : Variables allow researchers to control the effects of external factors that could influence the outcome of the study. By manipulating and controlling variables, researchers can isolate the effects of specific factors and measure their impact on the outcome.
  • Replicability : Variables make it possible for other researchers to replicate the study and test its findings. By defining and measuring variables consistently, other researchers can conduct similar studies to validate the original findings.
  • Accuracy : Variables make it possible to measure phenomena accurately and objectively. By defining and measuring variables precisely, researchers can reduce bias and increase the accuracy of their findings.
  • Generalizability : Variables allow researchers to generalize their findings to larger populations. By selecting variables that are representative of the population, researchers can draw conclusions that are applicable to a broader range of individuals.
  • Clarity : Variables help researchers to communicate their findings more clearly and effectively. By defining and categorizing variables, researchers can organize and present their findings in a way that is easily understandable to others.

Disadvantages of Variables

Here are some of the main disadvantages of using variables in research:

  • Simplification : Variables may oversimplify the complexity of real-world phenomena. By breaking down a phenomenon into variables, researchers may lose important information and context, which can affect the accuracy and generalizability of their findings.
  • Measurement error : Variables rely on accurate and precise measurement, and measurement error can affect the reliability and validity of research findings. The use of subjective or poorly defined variables can also introduce measurement error into the study.
  • Confounding variables: Confounding variables are factors, often unmeasured, that influence both the variables of interest. If confounding variables are not accounted for, they can distort or obscure the relationship between the variables of interest.
  • Limited scope: Variables are defined by the researcher, and the scope of the study is therefore limited by the researcher’s choice of variables. This can lead to a narrow focus that overlooks important aspects of the phenomenon being studied.
  • Ethical concerns: The selection and measurement of variables may raise ethical concerns, especially in studies involving human subjects. For example, using variables that are related to sensitive topics, such as race or sexuality, may raise concerns about privacy and discrimination.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer



Types of Variables in Research & Statistics | Examples

Published on September 19, 2022 by Rebecca Bevans. Revised on June 21, 2023.

In statistical research, a variable is defined as an attribute of an object of study. Choosing which variables to measure is central to good experimental design.

If you want to test whether some plant species are more salt-tolerant than others, some key variables you might measure include the amount of salt you add to the water, the species of plants being studied, and variables related to plant health like growth and wilting.

You need to know which types of variables you are working with in order to choose appropriate statistical tests and interpret the results of your study.

You can usually identify the type of variable by asking two questions:

  • What type of data does the variable contain?
  • What part of the experiment does the variable represent?

Table of contents

  • Types of data: quantitative vs categorical variables
  • Parts of the experiment: independent vs dependent variables
  • Other common types of variables
  • Frequently asked questions about variables

Types of data: quantitative vs categorical variables

Data is a specific measurement of a variable – it is the value you record in your data sheet. Data is generally divided into two categories:

  • Quantitative data represents amounts
  • Categorical data represents groupings

A variable that contains quantitative data is a quantitative variable; a variable that contains categorical data is a categorical variable. Each of these types of variables can be broken down into further types.

Quantitative variables

When you collect quantitative data, the numbers you record represent real amounts that can be added, subtracted, divided, etc. There are two types of quantitative variables: discrete and continuous.

Discrete vs continuous variables

  • Discrete variables (aka integer variables): counts of individual items or values, e.g., the number of students in a class or the number of coin flips.
  • Continuous variables (aka ratio variables): measurements of continuous or non-finite values, e.g., height, temperature, or water volume.

Categorical variables

Categorical variables represent groupings of some kind. They are sometimes recorded as numbers, but the numbers represent categories rather than actual amounts of things.

There are three types of categorical variables: binary, nominal, and ordinal variables.

Binary vs nominal vs ordinal variables

  • Binary variables (aka dichotomous variables): yes or no outcomes, e.g., heads or tails in a coin flip.
  • Nominal variables: groups with no rank or order between them, e.g., brands of cereal or species names.
  • Ordinal variables: groups that are ranked in a specific order, e.g., finishing places in a race or star ratings.*

*Note that sometimes a variable can work as more than one type! An ordinal variable can also be used as a quantitative variable if the scale is numeric and doesn’t need to be kept as discrete integers. For example, star ratings on product reviews are ordinal (1 to 5 stars), but the average star rating is quantitative.
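The star-rating case is easy to see in code: the same (hypothetical) ratings can be summarised as ordered categories or averaged into a quantitative score.

```python
# Star ratings are ordinal (1-5), but their average is quantitative.
ratings = [5, 4, 4, 3, 5, 2, 4]

# As an ordinal variable: count how many reviews fall in each category.
counts = {star: ratings.count(star) for star in range(1, 6)}

# As a quantitative summary: the mean star rating.
average = sum(ratings) / len(ratings)

print(counts)
print(round(average, 2))  # → 3.86
```

Which treatment is appropriate depends on the analysis: a frequency table or ordinal test respects the ranking, while the mean assumes the gaps between stars are comparable.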

Example data sheet

To keep track of your salt-tolerance experiment, you make a data sheet where you record information about the variables in the experiment, like salt addition and plant health.

To gather information about plant responses over time, you can fill out the same data sheet every few days until the end of the experiment. This example sheet is color-coded according to the type of variable: nominal, continuous, ordinal, and binary.

Example data sheet showing types of variables in a plant salt tolerance experiment

Parts of the experiment: independent vs dependent variables

Experiments are usually designed to find out what effect one variable has on another – in our example, the effect of salt addition on plant growth.

You manipulate the independent variable (the one you think might be the cause) and then measure the dependent variable (the one you think might be the effect) to find out what this effect might be.

You will probably also have variables that you hold constant (control variables) in order to focus on your experimental treatment.

Independent vs dependent vs control variables

  • Independent variables (aka treatment variables): variables you manipulate in order to affect the outcome of an experiment. Example: the amount of salt added to each plant’s water.
  • Dependent variables (aka response variables): variables that represent the outcome of the experiment. Example: any measurement of plant health and growth; in this case, plant height and wilting.
  • Control variables: variables that are held constant throughout the experiment. Example: the temperature and light in the room the plants are kept in, and the volume of water given to each plant.

In this experiment, we have one independent and three dependent variables.

The other variables in the sheet can’t be classified as independent or dependent, but they do contain data that you will need in order to interpret your dependent and independent variables.

Example of a data sheet showing dependent and independent variables for a plant salt tolerance experiment.

What about correlational research?

When you do correlational research, the terms “dependent” and “independent” don’t apply, because you are not trying to establish a cause and effect relationship (causation).

However, there might be cases where one variable clearly precedes the other (for example, rainfall leads to mud, rather than the other way around). In these cases you may call the preceding variable (i.e., the rainfall) the predictor variable and the following variable (i.e., the mud) the outcome variable.

Once you have defined your independent and dependent variables and determined whether they are categorical or quantitative, you will be able to choose the correct statistical test.

But there are many other ways of describing variables that help with interpreting your results. Some useful types of variables are listed below.

  • Confounding variables: variables that hide the true effect of another variable in your experiment. This can happen when another variable is closely related to a variable you are interested in, but you haven’t controlled it in your experiment. Be careful with these, because confounding variables run a high risk of introducing bias into your work, particularly omitted variable bias. Example: pot size and soil type might affect plant survival as much as or more than salt additions; in an experiment you would control these potential confounders by holding them constant.
  • Latent variables: variables that can’t be directly measured, but that you represent via a proxy. Example: salt tolerance in plants cannot be measured directly, but can be inferred from measurements of plant health in our salt-addition experiment.
  • Composite variables: variables made by combining multiple variables in an experiment. These variables are created when you analyze data, not when you measure it. Example: the three plant health variables could be combined into a single plant-health score to make it easier to present your findings.

Frequently asked questions about variables

You can think of independent and dependent variables in terms of cause and effect: an independent variable is the variable you think is the cause, while a dependent variable is the effect.

In an experiment, you manipulate the independent variable and measure the outcome in the dependent variable. For example, in an experiment about the effect of nutrients on crop growth:

  • The  independent variable  is the amount of nutrients added to the crop field.
  • The  dependent variable is the biomass of the crops at harvest time.

Defining your variables, and deciding how you will manipulate and measure them, is an important part of experimental design.

A confounding variable, also called a confounder or confounding factor, is a third variable in a study examining a potential cause-and-effect relationship.

A confounding variable is related to both the supposed cause and the supposed effect of the study. It can be difficult to separate the true effect of the independent variable from the effect of the confounding variable.

In your research design , it’s important to identify potential confounding variables and plan how you will reduce their impact.

Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).

Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).

You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results.

Discrete and continuous variables are two types of quantitative variables:

  • Discrete variables represent counts (e.g. the number of objects in a collection).
  • Continuous variables represent measurable amounts (e.g. water volume or weight).


Grad Coach

Research Variables 101

Independent variables, dependent variables, control variables and more

By: Derek Jansen (MBA) | Expert Reviewed By: Kerryn Warren (PhD) | January 2023

If you’re new to the world of research, especially scientific research, you’re bound to run into the concept of variables, sooner or later. If you’re feeling a little confused, don’t worry – you’re not the only one! Independent variables, dependent variables, confounding variables – it’s a lot of jargon. In this post, we’ll unpack the terminology surrounding research variables using straightforward language and loads of examples.

Overview: Variables In Research

1. What (exactly) is a variable?
2. Independent variables
3. Dependent variables
4. Control variables
5. Moderating variables
6. Mediating variables
7. Confounding variables
8. Latent variables

What (exactly) is a variable?

The simplest way to understand a variable is as any characteristic or attribute that can experience change or vary over time or context – hence the name “variable”. For example, the dosage of a particular medicine could be classified as a variable, as the amount can vary (i.e., a higher dose or a lower dose). Similarly, gender, age or ethnicity could be considered demographic variables, because each person varies in these respects.

Within research, especially scientific research, variables form the foundation of studies, as researchers are often interested in how one variable impacts another, and the relationships between different variables. For example:

  • How someone’s age impacts their sleep quality
  • How different teaching methods impact learning outcomes
  • How diet impacts weight (gain or loss)

As you can see, variables are often used to explain relationships between different elements and phenomena. In scientific studies, especially experimental studies, the objective is often to understand the causal relationships between variables. In other words, the role of cause and effect between variables. This is achieved by manipulating certain variables while controlling others – and then observing the outcome. But, we’ll get into that a little later…

The “Big 3” Variables

Variables can be a little intimidating for new researchers because there are a wide variety of variables, and oftentimes, there are multiple labels for the same thing. To lay a firm foundation, we’ll first look at the three main types of variables, namely:

  • Independent variables (IV)
  • Dependent variables (DV)
  • Control variables

What is an independent variable?

Simply put, the independent variable is the “cause” in the relationship between two (or more) variables. In other words, when the independent variable changes, it has an impact on another variable.

For example:

  • Increasing the dosage of a medication (Variable A) could result in better (or worse) health outcomes for a patient (Variable B)
  • Changing a teaching method (Variable A) could impact the test scores that students earn in a standardised test (Variable B)
  • Varying one’s diet (Variable A) could result in weight loss or gain (Variable B).

It’s useful to know that independent variables can go by a few different names, including explanatory variables (because they explain an event or outcome) and predictor variables (because they predict the value of another variable). Terminology aside though, the most important takeaway is that independent variables are assumed to be the “cause” in any cause-effect relationship. As you can imagine, these types of variables are of major interest to researchers, as many studies seek to understand the causal factors behind a phenomenon.


What is a dependent variable?

While the independent variable is the “cause”, the dependent variable is the “effect” – or rather, the affected variable. In other words, the dependent variable is the variable that is assumed to change as a result of a change in the independent variable.

Keeping with the previous example, let’s look at some dependent variables in action:

  • Health outcomes (DV) could be impacted by dosage changes of a medication (IV)
  • Students’ scores (DV) could be impacted by teaching methods (IV)
  • Weight gain or loss (DV) could be impacted by diet (IV)

In scientific studies, researchers will typically pay very close attention to the dependent variable (or variables), carefully measuring any changes in response to hypothesised independent variables. This can be tricky in practice, as it’s not always easy to reliably measure specific phenomena or outcomes – or to be certain that the actual cause of the change is in fact the independent variable.

As the adage goes, correlation is not causation. In other words, just because two variables have a relationship doesn’t mean that it’s a causal relationship – they may just happen to vary together. For example, you could find a correlation between the number of people who own a certain brand of car and the number of people who hold a certain type of job. That correlation doesn’t mean that owning the car causes someone to have that job, or vice versa; it could instead be driven by another factor, such as income level or age group, that affects both car ownership and job type.
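This kind of spurious correlation is easy to reproduce in a simulation. In the hypothetical sketch below, income drives both car-brand ownership and job type; the two outcomes correlate overall, but the association largely vanishes once income is held approximately constant:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Hypothetical common cause: income drives both car-brand ownership and
# job type, so the two correlate without either causing the other.
income = rng.normal(size=n)
owns_brand = (income + rng.normal(size=n) > 1).astype(float)
white_collar_job = (income + rng.normal(size=n) > 1).astype(float)

# Raw correlation between the two outcomes is clearly positive...
r_raw = np.corrcoef(owns_brand, white_collar_job)[0, 1]

# ...but within a narrow income band it largely disappears.
band = np.abs(income) < 0.2
r_within = np.corrcoef(owns_brand[band], white_collar_job[band])[0, 1]

print(round(r_raw, 2), round(abs(r_within), 2))
```

Stratifying on the common cause, as done crudely here, is one of the standard ways of checking whether a correlation survives once a suspected confounder is controlled for.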

To confidently establish a causal relationship between an independent variable and a dependent variable (i.e., X causes Y), you’ll typically need an experimental design, where you have complete control over the environment and the variables of interest. But even so, this doesn’t always translate into the “real world”. Simply put, what happens in the lab sometimes stays in the lab!

As an alternative to pure experimental research, correlational or “quasi-experimental” research (where the researcher cannot manipulate or change variables) can be done on a much larger scale more easily, allowing one to understand specific relationships in the real world. These types of studies also assume some causality between independent and dependent variables, but it’s not always clear. So, if you go this route, you need to be cautious in terms of how you describe the impact and causality between variables and be sure to acknowledge any limitations in your own research.


What is a control variable?

In an experimental design, a control variable (or controlled variable) is a variable that is intentionally held constant to ensure it doesn’t have an influence on any other variables. As a result, this variable remains unchanged throughout the course of the study. In other words, it’s a variable that’s not allowed to vary – tough life 🙂

As we mentioned earlier, one of the major challenges in identifying and measuring causal relationships is that it’s difficult to isolate the impact of variables other than the independent variable. Simply put, there’s always a risk that there are factors beyond the ones you’re specifically looking at that might be impacting the results of your study. So, to minimise the risk of this, researchers will attempt (as best possible) to hold other variables constant. These factors are then considered control variables.

Some examples of variables that you may need to control include:

  • Temperature
  • Time of day
  • Noise or distractions

Which specific variables need to be controlled for will vary tremendously depending on the research project at hand, so there’s no generic list of control variables to consult. As a researcher, you’ll need to think carefully about all the factors that could vary within your research context and then consider how you’ll go about controlling them. A good starting point is to look at previous studies similar to yours and pay close attention to which variables they controlled for.

Of course, you won’t always be able to control every possible variable, and so, in many cases, you’ll just have to acknowledge their potential impact and account for them in the conclusions you draw. Every study has its limitations , so don’t get fixated or discouraged by troublesome variables. Nevertheless, always think carefully about the factors beyond what you’re focusing on – don’t make assumptions!


Other types of variables

As we mentioned, independent, dependent and control variables are the most common variables you’ll come across in your research, but they’re certainly not the only ones you need to be aware of. Next, we’ll look at a few “secondary” variables that you need to keep in mind as you design your research.

  • Moderating variables
  • Mediating variables
  • Confounding variables
  • Latent variables

Let’s jump into it…

What is a moderating variable?

A moderating variable is a variable that influences the strength or direction of the relationship between an independent variable and a dependent variable. In other words, moderating variables affect how much (or how little) the IV affects the DV, or whether the IV has a positive or negative relationship with the DV (i.e., moves in the same or opposite direction).

For example, in a study about the effects of sleep deprivation on academic performance, gender could be used as a moderating variable to see if there are any differences in how men and women respond to a lack of sleep. In such a case, one may find that gender has an influence on how much students’ scores suffer when they’re deprived of sleep.

It’s important to note that while moderators can have an influence on outcomes, they don’t necessarily cause them; rather, they modify or “moderate” existing relationships between other variables. This means that it’s possible for two different groups with similar characteristics, but different levels of the moderator, to experience very different results from the same experiment or study design.
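As a toy illustration (all numbers invented), the sleep-deprivation example above could be simulated like this; the moderator shows up as clearly different slopes in the two groups:

```python
import random
import statistics

def slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

random.seed(1)
n = 500

# Invented data: hours of sleep lost (IV) vs. test score (DV), with
# group membership as the moderator. In group A each lost hour costs
# about 8 points; in group B only about 2.
hours_lost = [random.uniform(0, 6) for _ in range(n)]
group = [random.choice("AB") for _ in range(n)]
score = [80 - (8 if g == "A" else 2) * h + random.gauss(0, 4)
         for h, g in zip(hours_lost, group)]

slopes = {}
for g in "AB":
    xs = [h for h, gg in zip(hours_lost, group) if gg == g]
    ys = [s for s, gg in zip(score, group) if gg == g]
    slopes[g] = slope(xs, ys)

print({g: round(s, 1) for g, s in slopes.items()})  # roughly -8 for A, -2 for B
```

In a regression framework the same idea is usually tested with an interaction term (IV × moderator); the separate slopes here are just the most transparent way to see it.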

What is a mediating variable?

Mediating variables are often used to explain the relationship between the independent and dependent variable(s). For example, if you were researching the effects of age on job satisfaction, then education level could be considered a mediating variable, as it may explain why older people have higher job satisfaction than younger people – they may have more experience or better qualifications, which lead to greater job satisfaction.

Mediating variables also help researchers understand how different factors interact to influence outcomes. For instance, if you wanted to study the effect of stress on academic performance, then coping strategies might act as a mediating factor: stress shapes which coping strategies students adopt, and those strategies in turn affect academic performance. For example, students who develop effective coping strategies may buffer the impact of stress and so perform better academically.

In addition, mediating variables can provide insight into causal relationships between two variables by helping researchers determine whether changes in one factor directly cause changes in another – or whether the relationship is indirect, running through some third factor(s). For instance, if you wanted to investigate the impact of parental involvement on student achievement, you might consider family dynamics as a potential mediator: parental involvement may shape family dynamics, which in turn affect student achievement.

Mediating variables can explain the relationship between the independent and dependent variable, including whether it's causal or not.
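A minimal sketch of the mediation idea (purely invented numbers): when the effect of X on Y runs entirely through a mediator M, the product of the two path slopes (X→M and M→Y) approximately equals the total X→Y effect.

```python
import random
import statistics

def slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

random.seed(2)
n = 2000

# Invented chain: X (say, age) affects M (say, education level), which
# in turn affects Y (say, job satisfaction). There is no direct X -> Y
# path in this simulation.
x = [random.gauss(0, 1) for _ in range(n)]
m = [0.7 * xi + random.gauss(0, 0.5) for xi in x]   # path a: X -> M
y = [0.6 * mi + random.gauss(0, 0.5) for mi in m]   # path b: M -> Y

a = slope(x, m)        # estimated path a
b = slope(m, y)        # estimated path b
total = slope(x, y)    # total X -> Y effect
print(round(a * b, 2), round(total, 2))  # product of paths ~ total effect
```

In real analyses this "product of paths" logic is formalised in mediation models; the simulation just shows the arithmetic behind the intuition.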

What is a confounding variable?

A confounding variable (also known as a third variable or lurking variable) is an extraneous factor that can influence the relationship between two variables being studied. Specifically, for a variable to be considered a confounding variable, it needs to meet two criteria:

  • It must be correlated with the independent variable (this can be causal or not)
  • It must have a causal impact on the dependent variable (i.e., influence the DV)

Some common examples of confounding variables include demographic factors such as gender, ethnicity, socioeconomic status, age, education level, and health status. In addition to these, there are also environmental factors to consider. For example, air pollution could confound the impact of the variables of interest in a study investigating health outcomes.

Naturally, it’s important to identify as many confounding variables as possible when conducting your research, as they can heavily distort the results and lead you to draw incorrect conclusions. So, always think carefully about what factors may have a confounding effect on your variables of interest and try to manage these as best you can.
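Here is a toy simulation of the classic confounding story (all numbers invented): temperature drives both ice cream sales and drownings, so the two look strongly related even though neither causes the other. Stratifying on the confounder, i.e. looking only within a narrow temperature band, makes the apparent relationship largely vanish.

```python
import random
import statistics

def corr(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(42)
n = 1000

# Temperature is the confounder: it drives both ice cream sales and the
# number of swimmers (and hence drownings). Sales and drownings have no
# direct causal link in this simulation.
temp = [random.uniform(10, 35) for _ in range(n)]
ice_cream = [2.0 * t + random.gauss(0, 3) for t in temp]
drownings = [0.5 * t + random.gauss(0, 2) for t in temp]

naive = corr(ice_cream, drownings)  # looks like a strong relationship

# Stratify: within a narrow temperature band the confounder is
# (almost) held fixed.
band = [(i, d) for t, i, d in zip(temp, ice_cream, drownings) if 20 <= t <= 22]
stratified = corr([i for i, _ in band], [d for _, d in band])
print(round(naive, 2), round(stratified, 2))
```

Stratification is one of the standard defences against confounding, alongside randomisation and statistical adjustment.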

What is a latent variable?

Latent variables are unobservable factors that can influence the behaviour of individuals and explain certain outcomes within a study. They’re also known as hidden or underlying variables, and what makes them rather tricky is that they can’t be directly observed or measured. Instead, latent variables must be inferred from other observable data points, such as responses to surveys or experiments.

For example, in a study of mental health, the variable “resilience” could be considered a latent variable. It can’t be directly measured, but it can be inferred from measures of mental health symptoms, stress, and coping mechanisms. The same applies to a lot of concepts we encounter every day – for example:

  • Emotional intelligence
  • Quality of life
  • Business confidence
  • Ease of use

One way in which we overcome the challenge of measuring the immeasurable is latent variable models (LVMs). An LVM is a type of statistical model that describes the relationship between observed variables and one or more unobserved (latent) variables. These models allow researchers to uncover patterns in their data that may not have been visible before, and those patterns can then inform hypotheses about previously unknown cause-and-effect relationships among the same variables. Powerful stuff, we say!

Latent variables are unobservable factors that can influence the behaviour of individuals and explain certain outcomes within a study.
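To build intuition for the latent-variable idea (this is a deliberately crude sketch with invented data, not a real factor-analytic LVM): simulate an unobservable "resilience" trait, observe only three noisy indicators of it, and note that a standardised composite of the indicators tracks the hidden trait better than any single indicator does.

```python
import random
import statistics

def corr(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def zscores(xs):
    """Standardise a list to mean 0, standard deviation 1."""
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [(x - m) / s for x in xs]

random.seed(7)
n = 2000

# "resilience" is the unobservable latent trait; in a real study we
# would only ever see noisy indicators of it (e.g., survey scales).
resilience = [random.gauss(0, 1) for _ in range(n)]
indicators = [[r + random.gauss(0, 1) for r in resilience] for _ in range(3)]

# A crude latent estimate: average the standardised indicators.
composite = [sum(vals) / 3 for vals in zip(*(zscores(ind) for ind in indicators))]

print(round(corr(indicators[0], resilience), 2),  # one indicator alone
      round(corr(composite, resilience), 2))      # composite tracks the trait better
```

Proper LVMs (e.g., factor analysis or structural equation models) estimate the latent scores and their loadings jointly rather than by simple averaging, but the pooling intuition is the same.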

Let’s recap

In the world of scientific research, there’s no shortage of variable types, some of which have multiple names and some of which overlap with each other. In this post, we’ve covered some of the popular ones, but remember that this is not an exhaustive list.

To recap, we’ve explored:

  • Independent variables (the “cause”)
  • Dependent variables (the “effect”)
  • Control variables (the variable that’s not allowed to vary)
  • Moderating, mediating, confounding and latent variables

If you’re still feeling a bit lost and need a helping hand with your research project, check out our 1-on-1 coaching service, where we guide you through each step of the research journey. Also, be sure to check out our free dissertation writing course and our collection of free, fully-editable chapter templates.




1.10: The role of variables — predictors and outcomes


  • Matthew J. C. Crump
  • Brooklyn College of CUNY


Okay, I’ve got one last piece of terminology that I need to explain to you before moving away from variables. Normally, when we do some research we end up with lots of different variables. Then, when we analyse our data we usually try to explain some of the variables in terms of some of the other variables. It’s important to keep the two roles “thing doing the explaining” and “thing being explained” distinct. So let’s be clear about this now. Firstly, we might as well get used to the idea of using mathematical symbols to describe variables, since it’s going to happen over and over again. Let’s denote the “to be explained” variable \(Y\), and denote the variables “doing the explaining” as \(X_1\), \(X_2\), etc.

Now, when we do an analysis, we have different names for \(X\) and \(Y\), since they play different roles in the analysis. The classical names for these roles are independent variable (IV) and dependent variable (DV). The IV is the variable that you use to do the explaining (i.e., \(X\)) and the DV is the variable being explained (i.e., \(Y\)). The logic behind these names goes like this: if there really is a relationship between \(X\) and \(Y\) then we can say that \(Y\) depends on \(X\), and if we have designed our study “properly” then \(X\) isn’t dependent on anything else. However, I personally find those names horrible: they’re hard to remember and they’re highly misleading, because (a) the IV is never actually “independent of everything else” and (b) if there’s no relationship, then the DV doesn’t actually depend on the IV. And in fact, because I’m not the only person who thinks that IV and DV are just awful names, there are a number of alternatives that I find more appealing.

For example, in an experiment the IV refers to the manipulation, and the DV refers to the measurement. So, we could use manipulated variable (independent variable) and measured variable (dependent variable).

The terminology used to distinguish between different roles that a variable can play when analysing a data set:

role of the variable | classical name | modern name
“to be explained” | dependent variable (DV) | measurement
“to do the explaining” | independent variable (IV) | manipulation

We could also use predictors and outcomes. The idea here is that what you’re trying to do is use \(X\) (the predictors) to make guesses about \(Y\) (the outcomes). This is summarized in the table:

role of the variable | classical name | modern name
“to be explained” | dependent variable (DV) | outcome
“to do the explaining” | independent variable (IV) | predictor

Common Misteaks Mistakes in Using Statistics: Spotting and Avoiding Them

Choosing an Outcome Variable

Example 1: How to measure “big”

Example 2: How to measure “unemployment rate”

  • Do not assume you understand what a measure is just because the name makes sense to you. Be sure to find and read the definition carefully; it may not be what you think.
  • Be especially careful when making comparisons. The same term might be used differently by different authors or in different places. For example, different countries have different definitions of unemployment rate. (See http://www.bls.gov/fls/flsfaqs.htm#laborforcedefinitions )

Example 3: What is a good outcome variable for deciding whether cancer treatment in a country has been improving?

Example 4: What is a good outcome variable for answering the question, “Do males or females suffer more traffic fatalities?”

Example 5: What is a good outcome variable for research on the effect of medication on bone fractures?

Statistical considerations


Variables in Research | Types, Definition & Examples


Introduction

What is a variable?

What are the 5 types of variables in research?

Other variables in research

Variables are fundamental components of research that allow for the measurement and analysis of data. They can be defined as characteristics or properties that can take on different values. In research design , understanding the types of variables and their roles is crucial for developing hypotheses , designing methods , and interpreting results .

This article outlines the types of variables in research, including their definitions and examples, to provide a clear understanding of their use and significance in research studies. By categorizing variables into distinct groups based on their roles in research, the types of data they represent, and their relationships with other variables, researchers can more effectively structure their studies and achieve more accurate conclusions.


A variable represents any characteristic, number, or quantity that can be measured or quantified. The term encompasses anything that can vary or change, ranging from simple concepts like age and height to more complex ones like satisfaction levels or economic status. Variables are essential in research as they are the foundational elements that researchers manipulate, measure, or control to gain insights into relationships, causes, and effects within their studies. They enable the framing of research questions, the formulation of hypotheses, and the interpretation of results.

Variables can be categorized based on their role in the study (such as independent and dependent variables ), the type of data they represent (quantitative or categorical), and their relationship to other variables (like confounding or control variables). Understanding what constitutes a variable and the various variable types available is a critical step in designing robust and meaningful research.


Variables are crucial components in research, serving as the foundation for data collection , analysis , and interpretation . They are attributes or characteristics that can vary among subjects or over time, and understanding their types is essential for any study. Variables can be broadly classified into five main types, each with its distinct characteristics and roles within research.

This classification helps researchers in designing their studies, choosing appropriate measurement techniques, and analyzing their results accurately. The five types of variables include independent variables, dependent variables, categorical variables, continuous variables, and confounding variables. These categories not only facilitate a clearer understanding of the data but also guide the formulation of hypotheses and research methodologies.

Independent variables

Independent variables are foundational to the structure of research, serving as the factors or conditions that researchers manipulate or vary to observe their effects on dependent variables. These variables are considered "independent" because their variation does not depend on other variables within the study. Instead, they are the cause or stimulus that directly influences the outcomes being measured. For example, in an experiment to assess the effectiveness of a new teaching method on student performance, the teaching method applied (traditional vs. innovative) would be the independent variable.

The selection of an independent variable is a critical step in research design, as it directly correlates with the study's objective to determine causality or association. Researchers must clearly define and control these variables to ensure that observed changes in the dependent variable can be attributed to variations in the independent variable, thereby affirming the reliability of the results. In experimental research, the independent variable is what differentiates the control group from the experimental group, thereby setting the stage for meaningful comparison and analysis.

Dependent variables

Dependent variables are the outcomes or effects that researchers aim to explore and understand in their studies. These variables are called "dependent" because their values depend on the changes or variations of the independent variables.

Essentially, they are the responses or results that are measured to assess the impact of the independent variable's manipulation. For instance, in a study investigating the effect of exercise on weight loss, the amount of weight lost would be considered the dependent variable, as it depends on the exercise regimen (the independent variable).

The identification and measurement of the dependent variable are crucial for testing the hypothesis and drawing conclusions from the research. It allows researchers to quantify the effect of the independent variable , providing evidence for causal relationships or associations. In experimental settings, the dependent variable is what is being tested and measured across different groups or conditions, enabling researchers to assess the efficacy or impact of the independent variable's variation.

To ensure accuracy and reliability, the dependent variable must be defined clearly and measured consistently across all participants or observations. This consistency helps in reducing measurement errors and increases the validity of the research findings. By carefully analyzing the dependent variables, researchers can derive meaningful insights from their studies, contributing to the broader knowledge in their field.

Categorical variables

Categorical variables, also known as qualitative variables, represent types or categories that are used to group observations. These variables divide data into distinct groups or categories that lack a numerical value but hold significant meaning in research. Examples of categorical variables include gender (male, female, other), type of vehicle (car, truck, motorcycle), or marital status (single, married, divorced). These categories help researchers organize data into groups for comparison and analysis.

Categorical variables can be further classified into two subtypes: nominal and ordinal. Nominal variables are categories without any inherent order or ranking among them, such as blood type or ethnicity. Ordinal variables, on the other hand, imply a sort of ranking or order among the categories, like levels of satisfaction (high, medium, low) or education level (high school, bachelor's, master's, doctorate).

Understanding and identifying categorical variables is crucial in research as it influences the choice of statistical analysis methods. Since these variables represent categories without numerical significance, researchers employ specific statistical tests designed for a nominal or ordinal variable to draw meaningful conclusions. Properly classifying and analyzing categorical variables allow for the exploration of relationships between different groups within the study, shedding light on patterns and trends that might not be evident with numerical data alone.

Continuous variables

Continuous variables are quantitative variables that can take an infinite number of values within a given range. These variables are measured along a continuum and can represent very precise measurements. Examples of continuous variables include height, weight, temperature, and time. Because they can assume any value within a range, continuous variables allow for detailed analysis and a high degree of accuracy in research findings.

The ability to measure continuous variables at very fine scales makes them invaluable for many types of research, particularly in the natural and social sciences. For instance, in a study examining the effect of temperature on plant growth, temperature would be considered a continuous variable since it can vary across a wide spectrum and be measured to several decimal places.

When dealing with continuous variables, researchers often use methods incorporating a particular statistical test to accommodate a wide range of data points and the potential for infinite divisibility. This includes various forms of regression analysis, correlation, and other techniques suited for modeling and analyzing nuanced relationships between variables. The precision of continuous variables enhances the researcher's ability to detect patterns, trends, and causal relationships within the data, contributing to more robust and detailed conclusions.

Confounding variables

Confounding variables are those that can cause a false association between the independent and dependent variables, potentially leading to incorrect conclusions about the relationship being studied. These are extraneous variables that were not considered in the study design but can influence both the supposed cause and effect, creating a misleading correlation.

Identifying and controlling for a confounding variable is crucial in research to ensure the validity of the findings. This can be achieved through various methods, including randomization, stratification, and statistical control. Randomization helps to evenly distribute confounding variables across study groups, reducing their potential impact. Stratification involves analyzing the data within strata or layers that share common characteristics of the confounder. Statistical control allows researchers to adjust for the effects of confounders in the analysis phase.

Properly addressing confounding variables strengthens the credibility of research outcomes by clarifying the direct relationship between the dependent and independent variables, thus providing more accurate and reliable results.


Beyond the primary categories of variables commonly discussed in research methodology , there exists a diverse range of other variables that play significant roles in the design and analysis of studies. Below is an overview of some of these variables, highlighting their definitions and roles within research studies:

  • Discrete variables : A discrete variable is a quantitative variable that can only take on specific, separate values, such as the number of children in a family or the number of cars in a parking lot.
  • Categorical variables : A categorical variable categorizes subjects or items into groups that do not have a natural numerical order. Categorical data includes nominal variables, like country of origin, and ordinal variables, such as education level.
  • Predictor variables : Often used in statistical models, a predictor variable is used to forecast or predict the values of other variables, not necessarily with a causal implication.
  • Outcome variables : These variables represent the results or outcomes that researchers aim to explain or predict through their studies. An outcome variable is central to understanding the effects of predictor variables.
  • Latent variables : Not directly observable, latent variables are inferred from other, directly measured variables. Examples include psychological constructs like intelligence or socioeconomic status.
  • Composite variables : Created by combining multiple variables, composite variables can measure a concept more reliably or simplify the analysis. An example would be a composite happiness index derived from several survey questions.
  • Preceding variables : These variables come before other variables in time or sequence, potentially influencing subsequent outcomes. A preceding variable is crucial in longitudinal studies for determining causality or sequences of events.



1.1.2 - Explanatory & Response Variables

In some research studies one variable is used to predict or explain differences in another variable. In those cases, the  explanatory variable  is used to predict or explain differences in the  response variable . In an experimental study, the explanatory variable is the variable that is manipulated by the researcher. 

Explanatory variable: Also known as the independent or predictor variable, it explains variations in the response variable; in an experimental study, it is manipulated by the researcher.

Response variable: Also known as the dependent or outcome variable, its value is predicted or its variation is explained by the explanatory variable; in an experimental study, this is the outcome that is measured following manipulation of the explanatory variable.

Example: Panda Fertility Treatments

A team of veterinarians wants to compare the effectiveness of two fertility treatments for pandas in captivity. The two treatments are in-vitro fertilization and male fertility medications. This experiment has one  explanatory variable : type of fertility treatment. The  response variable  is a measure of fertility rate.

Example: Public Speaking Approaches

A public speaking teacher has developed a new lesson that she believes decreases student anxiety in public speaking situations more than the old lesson. She designs an experiment to test if her new lesson works better than the old lesson. Public speaking students are randomly assigned to receive either the new or old lesson; their anxiety levels during a variety of public speaking experiences are measured.  This experiment has one  explanatory variable : the lesson received. The  response variable  is anxiety level.

Example: Coffee Bean Origin

A researcher believes that the origin of the beans used to make a cup of coffee affects hyperactivity. He wants to compare coffee from three different regions: Africa, South America, and Mexico. The explanatory variable is the origin of the coffee bean; this has three levels: Africa, South America, and Mexico. The response variable is hyperactivity level.

Example: Height & Age

A group of middle school students wants to know if they can use height to predict age. They take a random sample of 50 people at their school, both students and teachers, and record each individual's height and age. This is an observational study. The students want to use height to predict age so the  explanatory variable  is height and the  response variable  is age.

Example: Grade & Height

Research question:  Do fourth graders tend to be taller than third graders?

This is an observational study. The researcher wants to use grade level to explain differences in height. The  explanatory variable  is grade level. The  response variable  is height. 
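To make the height-and-age example concrete, here is a small sketch (the data points are invented for illustration) of using the explanatory variable to predict the response variable with a least-squares line:

```python
import statistics

def fit_line(xs, ys):
    """Least-squares intercept and slope of the response on the explanatory variable."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Hypothetical (height in cm, age in years) pairs like the students might collect.
heights = [120, 130, 140, 150, 155, 160, 165, 170, 175, 180]
ages = [7, 9, 11, 13, 14, 16, 25, 30, 38, 45]

intercept, slope = fit_line(heights, ages)
predicted_age = intercept + slope * 162  # predict age for a 162 cm person
print(round(predicted_age, 1))
```

Note the asymmetry: swapping the roles (using age to predict height) fits a different line, which is why identifying the explanatory and response variables up front matters.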

Types of Variable

All experiments examine some kind of variable(s). A variable is not only something that we measure, but also something that we can manipulate and something we can control for. To understand the characteristics of variables and how we use them in research, this guide is divided into three main sections. First, we illustrate the role of dependent and independent variables. Second, we discuss the difference between experimental and non-experimental research. Finally, we explain how variables can be characterised as either categorical or continuous.

Dependent and Independent Variables

An independent variable, sometimes called an experimental or predictor variable, is a variable that is being manipulated in an experiment in order to observe the effect on a dependent variable, sometimes called an outcome variable.

Imagine that a tutor asks 100 students to complete a maths test. The tutor wants to know why some students perform better than others. Whilst the tutor does not know the answer to this, she thinks that it might be because of two reasons: (1) some students spend more time revising for their test; and (2) some students are naturally more intelligent than others. As such, the tutor decides to investigate the effect of revision time and intelligence on the test performance of the 100 students. The dependent and independent variables for the study are:

Dependent Variable: Test Mark (measured from 0 to 100)

Independent Variables: Revision time (measured in hours); Intelligence (measured using IQ score)

The dependent variable is simply that: a variable that is dependent on an independent variable(s). For example, in our case the test mark that a student achieves is dependent on revision time and intelligence. Whilst revision time and intelligence (the independent variables) may (or may not) cause a change in the test mark (the dependent variable), the reverse is implausible; in other words, whilst more hours of revision and a higher IQ score may (or may not) change the test mark that a student achieves, a change in a student's test mark has no bearing on whether a student revises more or is more intelligent (this simply doesn't make sense).

Therefore, the aim of the tutor's investigation is to examine whether these independent variables - revision time and IQ - result in a change in the dependent variable, the students' test scores. However, it is also worth noting that whilst this is the main aim of the experiment, the tutor may also be interested to know if the independent variables - revision time and IQ - are also connected in some way.
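A quick simulation of the tutor's study (all coefficients and data are invented, and the two independent variables are generated independently of each other so that simple slopes recover each effect) shows how the independent variables drive the dependent variable:

```python
import random
import statistics

def slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

random.seed(5)
n = 100

# Invented data for the tutor's study.
revision = [random.uniform(0, 20) for _ in range(n)]   # hours of revision
iq = [random.gauss(100, 15) for _ in range(n)]         # IQ score
mark = [40 + 1.5 * r + 0.4 * (i - 100) + random.gauss(0, 5)
        for r, i in zip(revision, iq)]                 # test mark

print(round(slope(revision, mark), 2))  # near the simulated 1.5 marks per hour
print(round(slope(iq, mark), 2))        # near the simulated 0.4 marks per IQ point
```

If the two independent variables were correlated with each other, the simple slopes would be biased and a multiple regression would be needed to separate their effects; that is the connection the tutor is hinting at in the last sentence above.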

In the section on experimental and non-experimental research that follows, we find out a little more about the nature of independent and dependent variables.

Experimental and Non-Experimental Research

  • Experimental research : In experimental research, the aim is to manipulate an independent variable(s) and then examine the effect that this change has on a dependent variable(s). Since it is possible to manipulate the independent variable(s), experimental research has the advantage of enabling a researcher to identify a cause and effect between variables. For example, take our example of 100 students completing a maths exam where the dependent variable was the exam mark (measured from 0 to 100), and the independent variables were revision time (measured in hours) and intelligence (measured using IQ score). Here, it would be possible to use an experimental design and manipulate the revision time of the students. The tutor could divide the students into two groups, each made up of 50 students. In "group one", the tutor could ask the students not to do any revision. Alternately, "group two" could be asked to do 20 hours of revision in the two weeks prior to the test. The tutor could then compare the marks that the students achieved.
  • Non-experimental research : In non-experimental research, the researcher does not manipulate the independent variable(s). This is not to say that it is impossible to do so, but it will either be impractical or unethical to do so. For example, a researcher may be interested in the effect of illegal, recreational drug use (the independent variable(s)) on certain types of behaviour (the dependent variable(s)). However, whilst possible, it would be unethical to ask individuals to take illegal drugs in order to study what effect this had on certain behaviours. As such, a researcher could ask both drug and non-drug users to complete a questionnaire that had been constructed to indicate the extent to which they exhibited certain behaviours. Whilst it is not possible to identify the cause and effect between the variables, we can still examine the association or relationship between them. In addition to understanding the difference between dependent and independent variables, and experimental and non-experimental research, it is also important to understand the different characteristics amongst variables. This is discussed next.

Categorical and Continuous Variables

Categorical variables are also known as discrete or qualitative variables. Categorical variables can be further categorized as either nominal , ordinal or dichotomous .

  • Nominal variables are variables that have two or more categories, but which do not have an intrinsic order. For example, a real estate agent could classify their types of property into distinct categories such as houses, condos, co-ops or bungalows. So "type of property" is a nominal variable with 4 categories called houses, condos, co-ops and bungalows. Of note, the different categories of a nominal variable can also be referred to as groups or levels of the nominal variable. Another example of a nominal variable would be classifying where people live in the USA by state. In this case there will be many more levels of the nominal variable (50 in fact).
  • Dichotomous variables are nominal variables which have only two categories or levels. For example, if we were looking at gender, we would most probably categorize somebody as either "male" or "female". This is an example of a dichotomous variable (and also a nominal variable). Another example might be if we asked a person if they owned a mobile phone. Here, we may categorise mobile phone ownership as either "Yes" or "No". In the real estate agent example, if type of property had been classified as either residential or commercial then "type of property" would be a dichotomous variable.
  • Ordinal variables are variables that have two or more categories, just like nominal variables, but the categories can also be ordered or ranked. So if you asked someone if they liked the policies of the Democratic Party and they could answer either "Not very much", "They are OK" or "Yes, a lot", then you have an ordinal variable. Why? Because you have 3 categories, namely "Not very much", "They are OK" and "Yes, a lot", and you can rank them from the most positive (Yes, a lot), to the middle response (They are OK), to the least positive (Not very much). However, whilst we can rank the levels, we cannot place a "value" on them; we cannot say that "They are OK" is twice as positive as "Not very much", for example.
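The nominal/ordinal distinction can be made concrete in code. The party-policy responses come from the example above; the variable names and the rank mapping are illustrative:

```python
# Ordinal responses can be ranked, but the rank numbers carry no
# arithmetic meaning ("They are OK" is not twice "Not very much").
ORDER = ["Not very much", "They are OK", "Yes, a lot"]
rank = {label: i for i, label in enumerate(ORDER)}

responses = ["Yes, a lot", "Not very much", "They are OK"]

# Sorting by rank is valid for an ordinal variable...
ranked = sorted(responses, key=rank.get)
print(ranked)  # → ['Not very much', 'They are OK', 'Yes, a lot']

# ...but a nominal variable such as property type has no such order:
property_types = {"house", "condo", "co-op", "bungalow"}  # just labels
```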


Continuous variables are also known as quantitative variables. Continuous variables can be further categorized as either interval or ratio variables.

  • Interval variables are variables whose central characteristic is that they can be measured along a continuum and have a numerical value (for example, temperature measured in degrees Celsius or Fahrenheit). So the difference between 20°C and 30°C is the same as the difference between 30°C and 40°C. However, temperature measured in degrees Celsius or Fahrenheit is NOT a ratio variable.
  • Ratio variables are interval variables, but with the added condition that 0 (zero) of the measurement indicates that there is none of that variable. So, temperature measured in degrees Celsius or Fahrenheit is not a ratio variable because 0°C does not mean there is no temperature. However, temperature measured in Kelvin is a ratio variable as 0 Kelvin (often called absolute zero) indicates that there is no temperature whatsoever. Other examples of ratio variables include height, mass, distance and many more. The name "ratio" reflects the fact that you can use the ratio of measurements. So, for example, a distance of ten metres is twice the distance of 5 metres.
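The Celsius/Kelvin contrast above can be checked numerically. The conversion constant 273.15 is standard; everything else here is a minimal sketch:

```python
def c_to_k(celsius):
    """Convert degrees Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15

# Ratios are meaningless on an interval scale: 20°C is not "twice as hot"
# as 10°C, because 0°C is not the absence of temperature.
print(20 / 10)                  # → 2.0, but physically meaningless
print(c_to_k(20) / c_to_k(10))  # ≈ 1.035 — the actual ratio of temperatures
```

On a true ratio scale (Kelvin, height, distance), statements like "twice as much" are valid; on an interval scale they are not.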

Ambiguities in classifying a type of variable

In some cases, the measurement scale for data is ordinal, but the variable is treated as continuous. For example, a Likert scale that contains five values - strongly agree, agree, neither agree nor disagree, disagree, and strongly disagree - is ordinal. However, where a Likert scale contains seven or more values - strongly agree, moderately agree, agree, neither agree nor disagree, disagree, moderately disagree, and strongly disagree - the underlying scale is sometimes treated as continuous (although whether you should do this is a cause of great dispute).

It is worth noting that how we categorise variables is somewhat of a choice. Whilst we categorised gender as a dichotomous variable (you are either male or female), social scientists may disagree with this, arguing that gender is a more complex variable involving more than two distinctions and including measurement levels such as genderqueer, intersex and transgender. At the same time, some researchers would argue that a Likert scale, even with seven values, should never be treated as a continuous variable.

USC Libraries Research Guides

Organizing Your Social Sciences Research Paper: Independent and Dependent Variables

Definitions

Dependent Variable The variable that depends on other factors that are measured. These variables are expected to change as a result of an experimental manipulation of the independent variable or variables. It is the presumed effect.

Independent Variable The variable that is stable and unaffected by the other variables you are trying to measure. It refers to the condition of an experiment that is systematically manipulated by the investigator. It is the presumed cause.

Cramer, Duncan and Dennis Howitt. The SAGE Dictionary of Statistics . London: SAGE, 2004; Penslar, Robin Levin and Joan P. Porter. Institutional Review Board Guidebook: Introduction . Washington, DC: United States Department of Health and Human Services, 2010; "What are Dependent and Independent Variables?" Graphic Tutorial.

Identifying Dependent and Independent Variables

Don't feel bad if you are confused about which is the dependent variable and which is the independent variable in social and behavioral sciences research. However, it's important that you learn the difference because framing a study using these variables is a common approach to organizing the elements of a social sciences research study in order to discover relevant and meaningful results. Specifically, it is important for these two reasons:

  • You need to understand and be able to evaluate their application in other people's research.
  • You need to apply them correctly in your own research.

A variable in research simply refers to a person, place, thing, or phenomenon that you are trying to measure in some way. The best way to understand the difference between a dependent and independent variable is that the meaning of each is implied by what the words tell us about the variable you are using. You can do this with a simple exercise from the website, Graphic Tutorial. Take the sentence, "The [independent variable] causes a change in [dependent variable] and it is not possible that [dependent variable] could cause a change in [independent variable]." Insert the names of variables you are using in the sentence in the way that makes the most sense. This will help you identify each type of variable. If you're still not sure, consult with your professor before you begin to write.
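As a small illustration, the substitution exercise above can be written as a helper function. The name `causal_sentence` is mine, not part of the Graphic Tutorial:

```python
def causal_sentence(independent, dependent):
    """Fill the Graphic Tutorial test sentence with candidate variable names."""
    return (f"The {independent} causes a change in {dependent} and it is "
            f"not possible that {dependent} could cause a change in "
            f"{independent}.")

# If the filled-in sentence makes sense, the labels are probably right:
print(causal_sentence("amount of revision time", "exam mark"))
```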

Fan, Shihe. "Independent Variable." In Encyclopedia of Research Design. Neil J. Salkind, editor. (Thousand Oaks, CA: SAGE, 2010), pp. 592-594; "What are Dependent and Independent Variables?" Graphic Tutorial; Salkind, Neil J. "Dependent Variable." In Encyclopedia of Research Design , Neil J. Salkind, editor. (Thousand Oaks, CA: SAGE, 2010), pp. 348-349;

Structure and Writing Style

The process of examining a research problem in the social and behavioral sciences is often framed around methods of analysis that compare, contrast, correlate, average, or integrate relationships between or among variables . Techniques include associations, sampling, random selection, and blind selection. Designation of the dependent and independent variable involves unpacking the research problem in a way that identifies a general cause and effect and classifying these variables as either independent or dependent.

The variables should be outlined in the introduction of your paper and explained in more detail in the methods section . There are no rules about the structure and style for writing about independent or dependent variables but, as with any academic writing, clarity and being succinct is most important.

After you have described the research problem and its significance in relation to prior research, explain why you have chosen to examine the problem using a method of analysis that investigates the relationships between or among independent and dependent variables . State what it is about the research problem that lends itself to this type of analysis. For example, if you are investigating the relationship between corporate environmental sustainability efforts [the independent variable] and dependent variables associated with measuring employee satisfaction at work using a survey instrument, you would first identify each variable and then provide background information about the variables. What is meant by "environmental sustainability"? Are you looking at a particular company [e.g., General Motors] or are you investigating an industry [e.g., the meat packing industry]? Why is employee satisfaction in the workplace important? How does a company make their employees aware of sustainability efforts and why would a company even care that its employees know about these efforts?

Identify each variable for the reader and define each . In the introduction, this information can be presented in a paragraph or two when you describe how you are going to study the research problem. In the methods section, you build on the literature review of prior studies about the research problem to describe in detail background about each variable, breaking each down for measurement and analysis. For example, what activities do you examine that reflect a company's commitment to environmental sustainability? Levels of employee satisfaction can be measured by a survey that asks about things like volunteerism or a desire to stay at the company for a long time.

The structure and writing style of describing the variables and their application to analyzing the research problem should be stated and unpacked in such a way that the reader obtains a clear understanding of the relationships between the variables and why they are important. This is also important so that the study can be replicated in the future using the same variables but applied in a different way.

Fan, Shihe. "Independent Variable." In Encyclopedia of Research Design. Neil J. Salkind, editor. (Thousand Oaks, CA: SAGE, 2010), pp. 592-594; "What are Dependent and Independent Variables?" Graphic Tutorial; “Case Example for Independent and Dependent Variables.” ORI Curriculum Examples. U.S. Department of Health and Human Services, Office of Research Integrity; Salkind, Neil J. "Dependent Variable." In Encyclopedia of Research Design , Neil J. Salkind, editor. (Thousand Oaks, CA: SAGE, 2010), pp. 348-349; “Independent Variables and Dependent Variables.” Karl L. Wuensch, Department of Psychology, East Carolina University [posted email exchange]; “Variables.” Elements of Research. Dr. Camille Nebeker, San Diego State University.

Last Updated: May 30, 2024. URL: https://libguides.usc.edu/writingguide


Types of Variables – A Comprehensive Guide

Published by Carmen Troy at August 14th, 2021 , Revised On October 26, 2023

A variable is any qualitative or quantitative characteristic that can change and have more than one value, such as age, height, weight, gender, etc.

Before conducting research, it’s essential to know what needs to be measured or analysed and choose a suitable statistical test to present your study’s findings. 

In most cases, you can do it by identifying the key issues/variables related to your research’s main topic.

Example: Suppose you want to test whether the hybridisation of plants harms human health. You could use key variables such as agricultural techniques, type of soil, environmental factors, types of pesticides used, the hybridisation process, the yield obtained with hybridisation, the yield obtained without hybridisation, etc.

Variables are broadly categorised into:

  • Independent variables
  • Dependent variable
  • Control variable

Independent Vs. Dependent Vs. Control Variable

  • Independent variable (stimulus): the variable that influences other variables.
  • Dependent variable (response): the outcome of the influence of the independent variable.

Example: You want to identify how refined carbohydrates affect the health of human beings.

Independent variable: refined carbohydrates

Dependent variable: the health of human beings

You can manipulate the consumption of refined carbohydrates in your human participants and measure how those levels of consumption influence human health.

Control Variables
Control variables are variables that are not changed and are kept constant throughout the experiment.

The research includes finding ways:

  • To change the independent variables.
  • To prevent the controlled variables from changing.
  • To measure the dependent variables.

Note: The terms dependent and independent are not applicable in  correlational research,  as this is not a  controlled experiment:  the researcher doesn't have control over the variables. Instead, the association between two or more variables is measured. If one variable affects another, the first is called the predictor variable and the second the outcome variable.

Example:  Correlation between investment (predictor variable) and profit (outcome variable)
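The predictor/outcome example can be quantified with a Pearson correlation coefficient. The investment and profit figures below are invented for illustration, and the correlation is computed from first principles:

```python
# Pearson correlation between investment (predictor) and profit (outcome).
investment = [10, 20, 30, 40, 50]  # hypothetical figures
profit = [12, 25, 31, 45, 52]

def pearson(xs, ys):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

print(f"r = {pearson(investment, profit):.3f}")  # → r = 0.993
```

A strong correlation like this describes an association only; without a controlled experiment it does not establish that investment causes profit.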


Types of Variables Based on the Types of Data

Data refers to the information and statistics gathered for the analysis of a research topic. Data is broadly divided into two categories:

Quantitative/Numerical data  is associated with the aspects of measurement, quantity, and extent. 

Categorical data  is associated with groupings.

A qualitative variable consists of qualitative data, and a quantitative variable consists of quantitative data.


Quantitative Variable

A quantitative variable is associated with measurement, quantity, and extent - with questions like "how much" or "how many". It is analysed with statistical, mathematical, and computational techniques applied to numerical data such as percentages. Such research is typically conducted on a large population.

Example:  Find out the weight of students of the fifth standard studying in government schools.

The quantitative variable can be further categorised into continuous and discrete.

  • Continuous variable: a quantitative variable that can take any value between two specific values. Example: height, weight, or temperature.
  • Discrete variable: a quantitative variable whose values are separated from each other, so it can only take certain values (typically counts). Example: the number of students in a class, or the number of cars a family owns.

Categorical Variable

The categorical variable includes measurements that vary in categories such as names but not in terms of rank or degree. It means one level of a categorical variable cannot be considered better or greater than another level. 

Example: Gender, brands, colors, zip codes

The categorical variable is further categorised into three types:

  • Dichotomous (binary) variable: a categorical variable with only two possible values. Example: alcoholic (yes/no).
  • Nominal variable: a categorical variable whose values are not organised in terms of group, degree, or rank. Example: gender, brands, colours, zip codes.
  • Ordinal variable: a categorical variable whose values can be logically ordered or ranked. Example: a rating scale running from "below average" to "above average".

Note:  Sometimes, an ordinal variable also acts as a quantitative variable. Ordinal data has an order, but the intervals between scale points may be uneven.

Example: Numbers on a rating scale represent the reviews' rank, ranging from below average to above average. However, the same scale can also be treated as a quantitative variable showing how many stars were given.
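A short sketch of the rating-scale point above: the median respects the ordinal ranking, while taking a mean is the step that treats the scale as quantitative. The ratings below are invented:

```python
from statistics import median, mean

# Star ratings (1-5) are ordinal: the order matters, but the gap between
# 1 and 2 stars need not equal the gap between 4 and 5 stars.
ratings = [5, 4, 4, 3, 5, 2, 4]

print(median(ratings))          # → 4 — always valid for ordinal data
# Treating the scale as quantitative (taking a mean) is the disputed step:
print(round(mean(ratings), 2))  # → 3.86
```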


Other Types of Variables

It’s important to understand the difference between dependent and independent variables and know whether they are quantitative or categorical to choose the appropriate statistical test.

There are several other types of variables; knowing them helps you differentiate and understand the variables in your own research.


  • Confounding variable: a hidden variable that produces an association between two unrelated variables because it affects both of them. Example: there is an association between water consumption and cold-drink sales. The confounding variable could be the hot weather, which compels people to drink a lot of water and buy cold drinks to reduce the heat and thirst it causes.
  • Latent variable: a variable that cannot be observed or measured directly. Example: self-confidence and motivation cannot be measured directly, but they can be inferred through other variables such as habits, achievements, perceptions, and lifestyle.
  • Composite variable: a combination of multiple variables, used to measure multidimensional aspects that are difficult to observe directly.
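The water-consumption and cold-drink example can be simulated to show how a confounder manufactures a correlation between two variables that never influence each other directly. All figures and coefficients below are invented:

```python
import random

random.seed(1)  # reproducible illustrative data

# The hidden confounder: daily temperature drives BOTH water consumption
# and cold-drink sales (coefficients and noise levels are arbitrary).
temps = [random.uniform(15, 40) for _ in range(200)]
water = [t * 2 + random.gauss(0, 3) for t in temps]
sales = [t * 5 + random.gauss(0, 8) for t in temps]

def pearson(xs, ys):
    """Pearson's r computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Water and sales never influence each other in this simulation, yet they
# correlate strongly because the confounder (temperature) affects both.
print(f"corr(water, sales) = {pearson(water, sales):.2f}")
```

Controlling for temperature (for example, by comparing days with similar temperatures) would make the spurious association shrink towards zero.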

Frequently Asked Questions

What are the 10 types of variables in research?

The 10 types of variables in research are:

  • Independent
  • Dependent
  • Control
  • Confounding
  • Categorical (nominal, ordinal, or dichotomous)
  • Continuous
  • Discrete
  • Latent
  • Composite
  • Extraneous

What is an independent variable?

An independent variable, often termed the predictor or explanatory variable, is the variable manipulated or categorized in an experiment to observe its effect on another variable, called the dependent variable. It’s the presumed cause in a cause-and-effect relationship, determining if changes in it produce changes in the observed outcome.

What is a variable?

In research, a variable is any attribute, quantity, or characteristic that can be measured or counted. It can take on various values, making it “variable.” Variables can be classified as independent (manipulated), dependent (observed outcome), or control (kept constant). They form the foundation for hypotheses, observations, and data analysis in studies.

What is a dependent variable?

A dependent variable is the outcome or response being studied in an experiment or investigation. It’s what researchers measure to determine the effect of changes in the independent variable. In a cause-and-effect relationship, the dependent variable is presumed to be influenced or caused by the independent variable.

What is a variable in programming?

In programming, a variable is a symbolic name for a storage location that holds data or values. It allows data storage and retrieval for computational operations. Variables have types, like integer or string, determining the nature of data they can hold. They’re fundamental in manipulating and processing information in software.

What is a control variable?

A control variable in research is a factor that’s kept constant to ensure that it doesn’t influence the outcome. By controlling these variables, researchers can isolate the effects of the independent variable on the dependent variable, ensuring that other factors don’t skew the results or introduce bias into the experiment.

What is a controlled variable in science?

In science, a controlled variable is a factor that remains constant throughout an experiment. It ensures that any observed changes in the dependent variable are solely due to the independent variable, not other factors. By keeping controlled variables consistent, researchers can maintain experiment validity and accurately assess cause-and-effect relationships.

How many independent variables should an investigation have?

Ideally, an investigation should have one independent variable to clearly establish cause-and-effect relationships. Manipulating multiple independent variables simultaneously can complicate data interpretation.

However, in advanced research, experiments with multiple independent variables (factorial designs) are used, but they require careful planning to understand interactions between variables.
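A factorial design of the kind mentioned above simply crosses every level of each independent variable. This sketch, with invented factor levels, enumerates the experimental conditions:

```python
from itertools import product

# A 2x3 factorial design: each condition combines one level from each of
# two independent variables (the factor labels are illustrative).
revision_time = ["0 hours", "20 hours"]
sleep = ["6 hours", "7 hours", "8 hours"]

conditions = list(product(revision_time, sleep))
print(len(conditions))  # → 6 experimental conditions
for cond in conditions:
    print(cond)
```

Each participant (or group) is assigned to one of the six conditions, which is what lets the analysis separate main effects from interactions between the factors.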



Doing Quantitative Research with Outcome Measures

  • First Online: 24 December 2020

Charlie Duncan & Barry McInnes

This chapter explores some of the contextual factors and the key drivers shaping the movement towards routine outcome measurement. It touches upon the philosophies behind some of these drivers and how they differentially shape the way in which outcome measurement may be implemented. It also considers the body of research demonstrating that, while different therapeutic approaches are broadly similar in their outcomes, there is considerable variance at the service and practitioner level. We provide examples of outcome measurement from research and practice settings, inviting you to consider options for engaging with outcome measurement in therapy settings, both to learn and to evidence effective practice.




About this chapter

Duncan, C., McInnes, B. (2020). Doing Quantitative Research with Outcome Measures. In: Bager-Charleson, S., McBeath, A. (eds) Enjoying Research in Counselling and Psychotherapy. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-030-55127-8_11


Outcome variables - Meaning & Definition

What are outcome variables?

Outcome variables are usually the dependent variables, which are observed and measured while the independent variables are changed. These variables capture the effect of the cause (independent) variables across their different values. The dependent variables are the outcomes of the experiment, determining what was caused or what changed as a result of the study.

For a simple example, consider the marks a student obtains in an exam. Hard work, measured in the number of hours spent studying, and intelligence, measured by IQ score, are the independent variables; the marks obtained represent the dependent, or outcome, variable. When the values of the independent variables change, the marks may or may not change; the dependent variable thus depends on the independent variables, while the opposite is implausible: changing the marks does not change the number of hours of study or the IQ of the student.
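A minimal sketch of the exam example, assuming an invented linear relationship between the independent variables and the outcome:

```python
# Hypothetical model: exam marks (outcome variable) as a function of
# study hours and IQ (independent variables). Coefficients are invented.
def predicted_marks(hours_studied, iq):
    marks = 10 + 2.0 * hours_studied + 0.3 * iq
    return min(100, max(0, marks))  # marks are bounded between 0 and 100

# Changing an independent variable changes the outcome...
print(predicted_marks(10, 100))  # → 60.0
print(predicted_marks(20, 100))  # → 80.0
# ...but the reverse makes no sense: editing a student's mark cannot
# change their hours of study or their IQ.
```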

The response variable is also called the dependent variable because it depends on the causal factor, the independent variable. Depending on the various input values of the experimental variables, the responses are recorded.

This article has been researched & authored by the Business Concepts Team . It has been reviewed & published by the MBA Skool Team. The content on MBA Skool has been created for educational & academic purpose only.



Research Hypothesis In Psychology: Types & Examples

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


A research hypothesis, in its plural form “hypotheses,” is a specific, testable prediction about the anticipated results of a study, established at its outset. It is a key component of the scientific method .

Hypotheses connect theory to data and guide the research process towards expanding scientific understanding.

Some key points about hypotheses:

  • A hypothesis expresses an expected pattern or relationship. It connects the variables under investigation.
  • It is stated in clear, precise terms before any data collection or analysis occurs. This makes the hypothesis testable.
  • A hypothesis must be falsifiable. It should be possible, even if unlikely in practice, to collect data that disconfirms rather than supports the hypothesis.
  • Hypotheses guide research. Scientists design studies to explicitly evaluate hypotheses about how nature works.
  • For a hypothesis to be valid, it must be testable against empirical evidence. The evidence can then confirm or disprove the testable predictions.
  • Hypotheses are informed by background knowledge and observation, but go beyond what is already known to propose an explanation of how or why something occurs.
Predictions typically arise from a thorough knowledge of the research literature, curiosity about real-world problems or implications, and integrating this to advance theory. They build on existing literature while providing new insight.

Types of Research Hypotheses

Alternative Hypothesis

The research hypothesis is often called the alternative or experimental hypothesis in experimental research.

It typically suggests a potential relationship between two key variables: the independent variable, which the researcher manipulates, and the dependent variable, which is measured based on those changes.

The alternative hypothesis states a relationship exists between the two variables being studied (one variable affects the other).


An experimental hypothesis predicts what change(s) will occur in the dependent variable when the independent variable is manipulated.

It states that the results are not due to chance and are significant in supporting the theory being investigated.

The alternative hypothesis can be directional, indicating a specific direction of the effect, or non-directional, suggesting a difference without specifying its nature. It’s what researchers aim to support or demonstrate through their study.

Null Hypothesis

The null hypothesis states no relationship exists between the two variables being studied (one variable does not affect the other). There will be no changes in the dependent variable due to manipulating the independent variable.

It states results are due to chance and are not significant in supporting the idea being investigated.

The null hypothesis, positing no effect or relationship, is a foundational contrast to the research hypothesis in scientific inquiry. It establishes a baseline for statistical testing, promoting objectivity by initiating research from a neutral stance.

Many statistical methods are tailored to test the null hypothesis, determining the likelihood of observed results if no true effect exists.

This dual-hypothesis approach provides clarity, ensuring that research intentions are explicit, and fosters consistency across scientific studies, enhancing the standardization and interpretability of research outcomes.

Nondirectional Hypothesis

A non-directional hypothesis, also known as a two-tailed hypothesis, predicts that there is a difference or relationship between two variables but does not specify the direction of this relationship.

It merely indicates that a change or effect will occur without predicting which group will have higher or lower values.

For example, “There is a difference in performance between Group A and Group B” is a non-directional hypothesis.

Directional Hypothesis

A directional (one-tailed) hypothesis predicts the nature of the effect of the independent variable on the dependent variable. It predicts in which direction the change will take place (e.g., greater, smaller, more, less).

It specifies whether one variable is greater, lesser, or different from another, rather than just indicating that there’s a difference without specifying its nature.

For example, “Exercise increases weight loss” is a directional hypothesis.
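The directional/non-directional distinction maps onto one-tailed vs two-tailed statistical tests. A minimal sketch with SciPy, using simulated weight-loss data (the numbers are invented; the `alternative=` argument assumes scipy >= 1.6):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical weight-loss data (kg) for two groups, for illustration only.
exercise = rng.normal(4.0, 1.0, 30)   # exercise group
control  = rng.normal(3.0, 1.0, 30)   # control group

# Non-directional (two-tailed): "there is a difference between groups".
two_tailed = stats.ttest_ind(exercise, control, alternative='two-sided')
# Directional (one-tailed): "the exercise group loses MORE weight".
one_tailed = stats.ttest_ind(exercise, control, alternative='greater')

print(f"two-tailed p = {two_tailed.pvalue:.4f}")
print(f"one-tailed p = {one_tailed.pvalue:.4f}")
```

When the observed effect lies in the predicted direction, the one-tailed p-value is half the two-tailed one, which is why a directional hypothesis should only be chosen when prior evidence justifies it.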


Falsifiability

The Falsification Principle, proposed by Karl Popper, is a way of demarcating science from non-science. It suggests that for a theory or hypothesis to be considered scientific, it must be testable and refutable.

Falsifiability emphasizes that scientific claims shouldn’t just be confirmable but should also have the potential to be proven wrong.

It means that there should exist some potential evidence or experiment that could prove the proposition false.

No matter how many confirming instances exist for a theory, it takes only one counter-observation to falsify it. For example, the hypothesis that “all swans are white” can be falsified by observing a black swan.

For Popper, science should attempt to disprove a theory rather than attempt to continually provide evidence to support a research hypothesis.

Can a Hypothesis be Proven?

Hypotheses make probabilistic predictions. They state the expected outcome if a particular relationship exists. However, a study result supporting a hypothesis does not definitively prove it is true.

All studies have limitations. There may be unknown confounding factors or issues that limit the certainty of conclusions. Additional studies may yield different results.

In science, hypotheses can realistically only be supported with some degree of confidence, not proven. The process of science is to incrementally accumulate evidence for and against hypothesized relationships in an ongoing pursuit of better models and explanations that best fit the empirical data. But hypotheses remain open to revision and rejection if that is where the evidence leads.
  • Disproving a hypothesis is definitive. Solid disconfirmatory evidence will falsify a hypothesis and require altering or discarding it based on the evidence.
  • However, confirming evidence is always open to revision. Other explanations may account for the same results, and additional or contradictory evidence may emerge over time.

We can never 100% prove the alternative hypothesis. Instead, we see if we can disprove, or reject the null hypothesis.

If we reject the null hypothesis, this does not mean that our alternative hypothesis is correct, but it does lend support to the alternative/experimental hypothesis.

Upon analysis of the results, an alternative hypothesis can be rejected or supported, but it can never be proven to be correct. We must avoid any reference to results proving a theory as this implies 100% certainty, and there is always a chance that evidence may exist which could refute a theory.
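The decision logic above — reject or fail to reject the null, never "prove" the alternative — can be written as a tiny helper. The function name and the conventional 0.05 threshold are illustrative:

```python
def decide(p_value, alpha=0.05):
    """Classical decision rule: we never prove H1; we either reject H0
    (the data support H1) or fail to reject H0."""
    if p_value < alpha:
        return "reject H0 (data support H1)"
    return "fail to reject H0"

print(decide(0.03))  # → reject H0 (data support H1)
print(decide(0.20))  # → fail to reject H0
```

Note that neither branch says "H1 is proven" or "H0 is true": failing to reject H0 is an absence of evidence, not evidence of absence.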

How to Write a Hypothesis

  • Identify variables . The researcher manipulates the independent variable and the dependent variable is the measured outcome.
  • Operationalize the variables being investigated . Operationalization refers to the process of making the variables physically measurable or testable, e.g., if you are about to study aggression, you might count the number of punches given by participants.
  • Decide on a direction for your prediction . If there is evidence in the literature to support a specific effect of the independent variable on the dependent variable, write a directional (one-tailed) hypothesis. If there are limited or ambiguous findings in the literature regarding the effect of the independent variable on the dependent variable, write a non-directional (two-tailed) hypothesis.
  • Make it Testable : Ensure your hypothesis can be tested through experimentation or observation. It should be possible to prove it false (principle of falsifiability).
  • Clear & concise language . A strong hypothesis is concise (typically one to two sentences long), and formulated using clear and straightforward language, ensuring it’s easily understood and testable.

Consider a hypothesis many teachers might subscribe to: students work better on Monday morning than on Friday afternoon (IV = day, DV = standard of work).

Now, if we decide to study this by giving the same group of students a lesson on a Monday morning and a Friday afternoon and then measuring their immediate recall of the material covered in each session, we would end up with the following:

  • The alternative hypothesis states that students will recall significantly more information on a Monday morning than on a Friday afternoon.
  • The null hypothesis states that there will be no significant difference in the amount recalled on a Monday morning compared to a Friday afternoon. Any difference will be due to chance or confounding factors.
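Because the design measures the same students twice, a related-samples (paired) t-test fits this example. A sketch with made-up recall scores, using SciPy (the `alternative=` argument assumes scipy >= 1.6):

```python
from scipy import stats

# Hypothetical recall scores for the same 8 students (illustration only).
monday = [18, 21, 16, 22, 19, 24, 17, 20]
friday = [16, 19, 15, 20, 19, 21, 16, 18]

# Same students measured twice → paired (related-samples) t-test.
# Directional alternative: Monday recall is greater than Friday recall.
result = stats.ttest_rel(monday, friday, alternative='greater')
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

A small p-value here would let us reject the null hypothesis of no difference, supporting (not proving) the directional alternative.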

More Examples

  • Memory : Participants exposed to classical music during study sessions will recall more items from a list than those who studied in silence.
  • Social Psychology : Individuals who frequently engage in social media use will report higher levels of perceived social isolation compared to those who use it infrequently.
  • Developmental Psychology : Children who engage in regular imaginative play have better problem-solving skills than those who don’t.
  • Clinical Psychology : Cognitive-behavioral therapy will be more effective in reducing symptoms of anxiety over a 6-month period compared to traditional talk therapy.
  • Cognitive Psychology : Individuals who multitask between various electronic devices will have shorter attention spans on focused tasks than those who single-task.
  • Health Psychology : Patients who practice mindfulness meditation will experience lower levels of chronic pain compared to those who don’t meditate.
  • Organizational Psychology : Employees in open-plan offices will report higher levels of stress than those in private offices.
  • Behavioral Psychology : Rats rewarded with food after pressing a lever will press it more frequently than rats who receive no reward.


  • Research article
  • Open access
  • Published: 15 February 2023

The impact of food insecurity on health outcomes: empirical evidence from sub-Saharan African countries

  • Sisay Demissew Beyene   ORCID: orcid.org/0000-0001-7347-4168 1  

BMC Public Health, volume 23, Article number: 338 (2023)


Food insecurity adversely affects human health, which means food security and nutrition are crucial to improving people’s health outcomes. Both food insecurity and health outcomes are on the policy agenda of the 2030 Sustainable Development Goals (SDGs). However, there is a lack of macro-level empirical studies (a macro-level study works at the broadest level, using variables that represent a given country, its whole population, or the economy as a whole; for example, if the urban population of country XYZ is 30% of the total population, that figure serves as a proxy for the country's urbanization level. An empirical study here means one that employs econometric methods, i.e., the application of mathematics and statistics.) concerning the relationship between food insecurity and health outcomes in sub-Saharan African (SSA) countries, even though the region is highly affected by food insecurity and its related health problems. Therefore, this study aims to examine the impact of food insecurity on life expectancy and infant mortality in SSA countries.

The study was conducted for the whole population of 31 sampled SSA countries selected based on data availability. It uses secondary data collected online from the databases of the United Nations Development Programme (UNDP), the Food and Agricultural Organization (FAO), and the World Bank (WB), with yearly balanced data from 2001 to 2018. This study employs a multicountry panel data analysis and several estimation techniques: Driscoll-Kraay standard errors (DKSE), the generalized method of moments (GMM), fixed effects (FE), and the Granger causality test.

A 1% increase in the prevalence of undernourishment reduces life expectancy by 0.00348 percentage points (PPs), whereas life expectancy rises by 0.00317 PPs with every 1% increase in average dietary energy supply. Likewise, a 1% rise in the prevalence of undernourishment increases infant mortality by 0.0119 PPs, while a 1% increase in average dietary energy supply reduces infant mortality by 0.0139 PPs.
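For illustration, the reported long-run coefficients can be applied as simple linear extrapolations. The sketch below is hypothetical code (the dictionary and function names are mine, not from the paper) that scales each reported percentage-point effect by a given percentage change in the regressor:

```python
# Long-run coefficients reported in the paper: percentage-point (PP)
# change in the outcome per 1% change in the regressor.
EFFECTS = {
    ("undernourishment", "life_expectancy"): -0.00348,
    ("dietary_energy",   "life_expectancy"): +0.00317,
    ("undernourishment", "infant_mortality"): +0.0119,
    ("dietary_energy",   "infant_mortality"): -0.0139,
}

def predicted_change(regressor, outcome, pct_change):
    """Linear extrapolation of the reported effect (illustration only)."""
    return EFFECTS[(regressor, outcome)] * pct_change

# e.g., a 10% rise in undernourishment → infant mortality up ~0.119 PPs.
print(predicted_change("undernourishment", "infant_mortality", 10))
```

Such extrapolation assumes the estimated relationship is linear over the range considered, which holds only locally around the sample values.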

Conclusions

Food insecurity harms the health status of SSA countries, while food security has the opposite effect. This implies that to meet SDG 3.2, SSA countries should ensure food security.

Peer Review reports

Food security is essential to people’s health and well-being [ 1 ]. Further, the World Health Organization (WHO) argues that health is wealth and poor health is an integral part of poverty; governments should actively seek to preserve their people’s lives and reduce the incidence of unnecessary mortality and avoidable illnesses [ 2 ]. However, lack of food is one of the factors which affect health outcomes. Concerning this, the Food Research and Action Center noted that the social determinants of health, such as poverty and food insecurity, are associated with some of the most severe and costly health problems in a nation [ 3 ].

According to the FAO, the International Fund for Agricultural Development (IFAD), and the World Food Programme (WFP), food insecurity is defined as "A situation that exists when people lack secure access to sufficient amounts of safe and nutritious food for normal growth and development and an active and healthy life" ([ 4 ]; p50). It is generally believed that food security and nutrition are crucial to improving human health and development. Studies show that millions of people live in food insecurity, which is one of the main risks to human health. Around one in four people globally (1.9 billion people) were moderately or severely food insecure in 2017 and the greatest numbers were in SSA and South Asia. Around 9.2% of the world's population was severely food insecure in 2018. Food insecurity is highest in SSA countries, where nearly one-third are defined as severely insecure [ 5 ]. Similarly, 11% (820 million) of the world's population was undernourished in 2018, and SSA countries still share a substantial amount [ 5 ]. Even though globally the number of people affected by hunger has been decreasing since 1990, in recent years (especially since 2015) the number of people living in food insecurity has increased. It will be a huge challenge to achieve the SDGs of zero hunger by 2030 [ 6 ]. FAO et al. [ 7 ] projected that one in four individuals in SSA were undernourished in 2017. Moreover, FAO et al. [ 8 ] found that, between 2014 and 2018, the prevalence of undernourishment worsened. Twenty percent of the continent's population, or 256 million people, are undernourished today, of which 239 million are in SSA. Hidden hunger is also one of the most severe types of malnutrition (micronutrient deficiencies). One in three persons suffers from inadequacies related to hidden hunger, which impacts two billion people worldwide [ 9 ]. Similarly, SSA has a high prevalence of hidden hunger [ 10 , 11 ].

An important consequence of food insecurity is that around 9 million people die yearly worldwide due to hunger and hunger-related diseases. This is more than from Acquired Immunodeficiency Syndrome (AIDS), malaria, and tuberculosis combined [ 6 ]. Even though the hunger crisis affects many people of all genders and ages, children are particularly affected in Africa. There are too many malnourished children in Africa, and malnutrition is a major factor in the high infant mortality rates and causes physical and mental development delays and disorders in SSA [ 12 ]. According to UN statistics, chronic malnutrition globally accounts for 165 million stunted or underweight children. Around 75% of these children are from SSA and South Asia, and 40% of children in SSA are affected. In SSA, about 3.2 million children under the age of five die yearly, which is about half of all deaths in this age group worldwide. Malnutrition is responsible for almost one child under the age of five dying every two minutes worldwide. The child mortality rate in SSA is among the highest in the world: about one in nine children dies before the age of five [ 12 ].

In addition to the direct impact of food insecurity on health outcomes, it also indirectly contributes to disordered eating patterns, higher or lower blood cholesterol levels, lower serum albumin, lower hemoglobin, vitamin A levels, and poor physical and mental health [ 13 , 14 , 15 ]. Iodine, iron, and zinc deficiency are the most often identified micronutrient deficiencies across all age groups. A deficiency in vitamin A affects an estimated 190 million pre-schoolers and 19 million pregnant women [ 16 ]. Even though it is frequently noted that hidden hunger mostly affects pregnant women, children, and teenagers, it further affects people’s health at all stages of life [ 17 ].

With the above information, researchers and policymakers should focus on the issue of food insecurity and health status. The SDGs developed in 2015 include ending hunger by 2030 as one of their primary targets. However, a growing number of people live with hunger and food insecurity, leading to millions of deaths. Hence, this study asks: what is the impact of food insecurity on people's health outcomes in SSA countries? In addition, despite the evidence implicating food insecurity in poor health status, there is a lack of macro-level empirical studies concerning the impact of food insecurity on people’s health status in SSA countries, which leaves a knowledge (literature) gap. Therefore, this study aims to examine the impact of food insecurity on life expectancy and infant mortality in SSA countries for the period 2001-2018 using panel mean regression approaches.

Theoretical and conceptual framework

Structural factors, such as climate, socio-economic, social, and local food availability, affect people’s food security. People’s health condition is impacted by food insecurity through nutritional, mental health, and behavioral channels [ 18 ]. Under the nutritional channel, food insecurity has an impact on total caloric intake, diet quality, and nutritional status [ 19 , 20 , 21 ]. Hunger and undernutrition may develop when food supplies are scarce, and these conditions may potentially lead to wasting, stunting, and immunological deficiencies [ 22 ]. However, food insecurity also negatively influences health due to its effects on obesity, women's disordered eating patterns [ 23 ], and poor diet quality [ 24 ].

Under the mental health channel, Whitaker et al. [ 25 ] noted that food insecurity is related to poor mental health conditions (stress, sadness, and anxiety), which have also been linked to obesity and cardiovascular risk [ 26 ]. The effects of food insecurity on mental health can worsen the health of people who are already sick as well as lead to disease acquisition [ 18 ]. Similarly, the behavioral channel argues that there is a connection between food insecurity and health practices that affect disease management, prevention, and treatment. For example, lack of access to household food might force people to make bad decisions that may raise their risk of sickness, such as relying too heavily on cheap, calorically dense, nutrient-poor meals or participating in risky sexual conduct. In addition, food insecurity and other competing demands for survival are linked to poorer access and adherence to general medical treatment in low-income individuals once they become sick [ 27 , 28 , 29 , 30 ].

Food insecurity increases the likelihood of exposure to HIV and worsens the health of HIV-positive individuals [ 18 ]. Weiser et al. [ 31 ] found that food insecurity increases the likelihood of unsafe sexual activities, aggravating the spread of HIV. It can also raise the possibility of transmission through unsafe newborn feeding practices and worsening maternal health [ 32 ]. In addition, food insecurity has been linked to decreased antiretroviral adherence, declines in physical health status, worse immunologic status [ 33 ], decreased viral suppression [ 34 , 35 ], increased incidence of serious illness [ 36 ], and increased mortality [ 37 ] among people living with HIV.

With the above theoretical relationship between target variables and since this study focuses on the impact of food insecurity on health outcomes, and not on the causes, it adopted the conceptual framework of Weiser et al. [ 18 ] and constructed Fig.  1 .

figure 1

A conceptual framework of food insecurity and health. Source: Modified and constructed by the author using the conceptual framework of Weiser et al. [ 18 ]. Permission was granted by Taylor & Francis to use their original Figs. 2.2, 2.3, and 2.4 to develop the above figure. Permission number: 1072954

Several findings associate food insecurity with poorer health, worse disease management, and a higher risk of premature mortality even though they used microdata. For instance, Stuff et al. [ 38 ] found that food insecurity is related to poor self-reported health status, obesity [ 39 ], abnormal blood lipids [ 40 ], a rise in diabetes [ 24 , 40 ], increased gestational diabetes[ 41 ], increased perceived stress, depression and anxiety among women [ 25 , 42 ], Human Immunodeficiency Virus (HIV) acquisition risk [ 43 , 44 , 45 ], childhood stunting [ 46 ], poor health [ 47 ], mental health and behavioral problem [ 25 , 48 , 49 ].

The above are micro-level empirical studies; since the scope of this study is macro-level, Table 1 provides only the existing macro-level empirical findings related to the current study.

The empirical findings in Table 1 are few, reflecting the limited number of macro-level studies. Even the existing macro-level studies have several limitations. For instance, most either employed conventional estimation techniques or overlooked basic econometric tests; thus, their results and policy implications may mislead policy implementers. Except for Hameed et al. [ 53 ], most studies’ data are either outdated or unbalanced; hence, their results and policy implications may be less valuable in a dynamic world and less accurate than those from balanced data. Besides, some studies used a limited (single-country) sample; however, a small number of sampled countries and observations does not attain the asymptotic properties of an estimator [ 56 ]. Therefore, this study tries to fill the existing gaps by employing robust estimation techniques with initial diagnostic and post-estimation tests, basic panel econometric tests and robustness checks, updated data, and a large sample.

Study setting and participants

According to Smith and Meade [ 57 ], the highest rates of both food insecurity and severe food insecurity were found in Sub-Saharan Africa in 2017 (55 and 28%, respectively), followed by Latin America and the Caribbean (32 and 12%, respectively) and South Asia (30 and 13%). Similarly, SSA countries have the worst health outcomes compared with other regions: in 2020, the region had the lowest life expectancy [ 58 ] and the highest infant mortality [ 59 ]. Given this, the study's target population is SSA countries, chosen purposively. Although SSA comprises 49 of Africa's 55 countries lying entirely or partially south of the Sahara Desert, this study is conducted for a sample of 31 SSA countries (Angola, Benin, Botswana, Burkina Faso, Cameroon, Cabo Verde, Chad, Congo Rep., Côte d'Ivoire, Ethiopia, Gabon, The Gambia, Ghana, Kenya, Lesotho, Liberia, Madagascar, Malawi, Mali, Mauritania, Mauritius, Mozambique, Namibia, Nigeria, Rwanda, Senegal, Sierra Leone, South Africa, Sudan, Tanzania, and Togo). The sampled countries were selected based on data accessibility for each variable included in the empirical models from 2001 to 2018. Since SSA countries suffer from food insecurity and related health problems, this study believes the sampled countries are appropriate and represent the region. Moreover, the large sample size improves the estimator’s precision.

Data type, sources, and scope

This study uses secondary data collected in December 2020 online from the databases of the Food and Agricultural Organization (FAO), the United Nations Development Programme (UNDP), and the World Bank (WB) (see Table 2 ). In addition, the study uses yearly balanced data from 2001 to 2018, which is appropriate because it captures the Millennium Development Goals, SDGs, and other economic conditions, such as the rise of SSA countries’ economies and the global financial crisis of the 2000s. Therefore, this study considers various global development programs and events. Generally, the scope of this study (sampled countries and time) is sufficient to represent SSA countries. In other words, the study has n*T = 558 observations, which fulfills the large sample size criteria recommended by Kennedy [ 56 ].
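As a quick arithmetic check, a balanced panel of 31 countries observed yearly over 2001-2018 does give the stated sample size:

```python
countries = 31
years = len(range(2001, 2019))    # 2001–2018 inclusive → 18 years

observations = countries * years  # n * T for a balanced panel
print(observations)  # → 558
```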

The empirical model

Model specification is vital for conducting basic panel data econometric tests and estimating the relationship between the target variables. Besides social factors, the study includes economic factors that determine people's health status. Moreover, it uses two proxy indicators each to measure food insecurity and health status; hence, it specifies the general model as follows:

The study uses four models to analyze the impact of food insecurity on health outcomes.

where LNLEXP and LNINFMOR (dependent variables) refer to the natural logarithm of life expectancy at birth and infant mortality used as proxy variables for health outcomes. Similarly, PRUND and AVRDES are the prevalence of undernourishment and average dietary energy supply adequacy – proxy and predictor variables for food insecurity.

Moreover, to control for countries’ socio-economic conditions and to account for time-varying bias that can contribute to changes in the dependent variable, the study included control variables: GDPPC, GOVEXP, MNSCHOOL, and URBAN. GDPPC is GDP per capita, GOVEXP refers to domestic general government health expenditure, MNSCHOOL is mean years of schooling, and URBAN refers to urbanization. Further, \(n_{it}\), \(v_{it}\), \(\varepsilon_{it}\), and \(\mu_{it}\) are the stochastic error terms at period t. The parameters \(\alpha_0\), \(\beta_0\), \(\theta_0\), and \(\delta_0\) are intercept terms, and \(\alpha_1\)-\(\alpha_5\), \(\beta_1\)-\(\beta_5\), \(\theta_1\)-\(\theta_5\), and \(\delta_1\)-\(\delta_5\) are the long-run estimation coefficients. Since health outcomes and food insecurity each have two proxy indicators, this study estimates different alternative models and robustness checks of the main results. Furthermore, the above models do not address heterogeneity problems; hence, this study accounts for unobserved heterogeneity by introducing cross-section and time heterogeneity in the models. This is accomplished by assuming a two-way error component for the disturbances:

From Eq. 2, the unobservable individual (cross-section) and time heterogeneities are described by \(\delta_i\) and \(\tau_t\) (within components), respectively, while the remaining random error term is \(\gamma_{it}\) (the panel, or between, component). The error terms in models 1A-1D are therefore substituted by the right-hand-side elements of Eq. 2.

Depending on whether the error components are assumed to be fixed or random, two kinds of models can be evaluated: FE and RE. Equation (2) yields a two-way FE error component model (or simply an FE model) if \(\delta_i\) and \(\tau_t\) are assumed to be fixed parameters to be estimated and the random error component \(\gamma_{it}\) is independently and identically distributed with zero mean and constant variance (homoscedasticity).

Equation (2), on the other hand, provides a two-way RE error component model (or simply an RE model) if we suppose that \(\delta_i\) and \(\tau_t\) are random, just like the random error term; that \(\delta_i\), \(\tau_t\), and \(\gamma_{it}\) are all independently and identically distributed with zero mean and constant variance; and that they are independent of each other and of the independent variables [ 60 ].

Rather than considering both error components, \(\delta_i\) and \(\tau_t\), we can examine only one at a time (fixed or random), yielding a one-way error component model, FE or RE. The stochastic error term \(\varpi_{it}\) in Eq. 2 then becomes:
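The two-way error component model above can be illustrated numerically. The sketch below simulates a balanced panel with both country and time fixed effects (all names and values are hypothetical, not the study's data) and shows that the two-way within (demeaning) transformation removes \(\delta_i\) and \(\tau_t\) and recovers the slope:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 31, 18                        # countries × years, as in this study
delta = rng.normal(0, 1, N)          # country fixed effects (delta_i)
tau = rng.normal(0, 1, T)            # time fixed effects (tau_t)
x = rng.normal(0, 1, (N, T))
beta = 0.5                           # true slope (hypothetical)
y = 2.0 + beta * x + delta[:, None] + tau[None, :] + rng.normal(0, 0.1, (N, T))

def two_way_demean(a):
    """Within transformation: subtract country means and time means,
    then add back the grand mean (valid for a balanced panel)."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

# OLS on the demeaned data recovers beta without estimating any dummies.
beta_hat = (two_way_demean(x) * two_way_demean(y)).sum() / (two_way_demean(x) ** 2).sum()
print(round(beta_hat, 3))
```

The within transformation is numerically equivalent to including the full set of country and time dummies, but avoids estimating N + T extra parameters.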

Statistical analysis

This study conducted descriptive statistics, correlation analysis, and initial diagnosis tests (cross-sectional and time-specific fixed effect, outliers and influential observations, multicollinearity, normality, heteroscedasticity, and serial correlation test). Moreover, it provides basic panel econometric tests and panel data estimation techniques. For consistency, statistical software (STATA) version 15 was used for all analyses.

Descriptive statistics and correlation analysis

Descriptive statistics is essential to know the behavior of the variables in the model. Therefore, it captures information, such as the mean, standard deviation, minimum, maximum, skewness, and kurtosis. Similarly, the study conducted Pearson correlation analysis to assess the degree of relationship between the variables.
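As a sketch, the descriptive statistics and Pearson correlations described here can be produced with pandas; the data below are simulated placeholders, not the study's data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Simulated placeholder data (NOT the study's variables).
df = pd.DataFrame({
    "life_expectancy": rng.normal(60, 5, 100),
    "undernourishment": rng.normal(20, 6, 100),
})

print(df.describe())                 # count, mean, std, min, max, quartiles
print(df.skew())                     # skewness per column
print(df.kurtosis())                 # (excess) kurtosis per column
print(df.corr(method="pearson"))     # pairwise Pearson correlations
```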

Initial diagnosis

Cross-sectional and time-specific fixed effect.

Because the panel data set comprises repeated observations on the same units gathered over many periods, one can anticipate differences arising over time or across the cross-sectional units. Therefore, before estimation, this study accounted for unexplained heterogeneity in the models. A fundamental limitation of pooled cross-section and time-series regressions is that they do not account for country and time heterogeneity [60]. These unobserved differences across nations and over time are crucial to how the error term is represented and the model is estimated. They can be captured by including both country and time dummies in the regression, although the estimation fails if the number of parameters exceeds the number of observations [60]; in this study, the models can be estimated. Including both country and time dummies amounts to assuming that the slope coefficients are constant while the intercept varies across countries and time, yielding the two-way error-components model. Accordingly, this study tests the null hypothesis that the intercepts are homogeneous across nations and over time.

Detecting outliers and influential observations

In regression analysis, outliers and influential observations may bias the findings. Therefore, Cook's D test for outliers and influential observations was used. Each observation was evaluated against Cook's D statistic [61], which measures how strongly a single observation affects the fitted model and its predicted values. On this basis, the study tested for the existence of outliers.
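Cook's D as described here can be sketched in a few lines; the simulated data, threshold handling, and variable names below are illustrative assumptions, not the study's implementation (which used Stata):

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's D for each observation of an OLS fit (X includes an intercept)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # leverage (hat-matrix diagonal)
    s2 = resid @ resid / (n - p)                   # residual variance
    return resid ** 2 / (p * s2) * h / (1 - h) ** 2

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=200)     # clean simulated data
D = cooks_distance(X, y)
flagged = np.where(D > 4 / len(y))[0]              # conventional 4/N cut-off
```

Observations with D above 4/N deserve inspection; D > 1 signals a severe problem.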

Normality, heteroscedasticity, multicollinearity, and serial correlation test

Before the final regression result, the data used for the variables were tested for normality, heteroscedasticity, multicollinearity, and serial correlation to examine the characteristics of the sample.

Regression models should be checked for non-normal error terms, because a lack of Gaussianity (normal distribution) can compromise the accuracy of estimation and testing procedures; the validity of inference techniques, specification tests, and forecasting depends critically on the normality assumption [62]. Similarly, multicollinearity among the regressors makes the estimates highly sensitive to minor changes in the data, destabilizes the regression model, and yields skewed and unreliable results. Therefore, this study tested normality using the command proposed by Alejo et al. [62] and multicollinearity using variance inflation factors (VIF).
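A minimal sketch of the VIF computation (regress each regressor on the others and take 1/(1 − R²)); the simulated regressors below are hypothetical:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of regressor matrix X."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(Z, y, rcond=None)
        e = y - Z @ b
        r2 = 1 - (e @ e) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(2)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)
x3 = x1 + 0.1 * rng.normal(size=300)   # nearly collinear with x1
v = vif(np.column_stack([x1, x2, x3]))
```

Values below 5 (the cut-off used later in the paper) indicate no multicollinearity concern; the deliberately collinear pair here produces large VIFs.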

Most conventional panel data estimation methods assume homoscedastic individual error variances and no serial correlation. Because the error component is often associated with variances that are not constant across observations and are serially correlated across periods, these assumptions limit the applicability of many panel data models. Heteroskedasticity and serial correlation are estimation issues most commonly associated with cross-sectional and time-series data, respectively; panel data, which combines cross-sections and time series, is not free from either issue, which renders the estimated parameters inefficient and the resulting inferences invalid [63]. Therefore, this study used the Wooldridge [63] test for serial correlation in linear panel models and the modified Wald test for heteroskedasticity.
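The logic of the Wooldridge [63] serial-correlation test can be sketched as follows: estimate the first-differenced equation, then regress the first-difference residuals on their own lag; serially uncorrelated level errors imply a lag coefficient near −0.5. This is an illustrative simplification on simulated data, not the full Stata test with its cluster-robust inference:

```python
import numpy as np

def fd_residual_autocorr(y, X):
    """Regress Δy on ΔX, then the FD residuals on their own lag.
    Under no serial correlation in levels, the slope tends to -0.5.
    y: (N, T) array; X: (N, T, K) array."""
    dy = np.diff(y, axis=1).ravel()
    dX = np.diff(X, axis=1).reshape(-1, X.shape[2])
    b, *_ = np.linalg.lstsq(dX, dy, rcond=None)
    e = (dy - dX @ b).reshape(y.shape[0], -1)      # FD residuals, per unit
    e_t, e_lag = e[:, 1:].ravel(), e[:, :-1].ravel()
    return (e_lag @ e_t) / (e_lag @ e_lag)         # no-intercept OLS slope

rng = np.random.default_rng(3)
N, T = 40, 18
X = rng.normal(size=(N, T, 1))
y = 2 * X[..., 0] + rng.normal(size=(N, T))        # iid level errors
rho = fd_residual_autocorr(y, X)                   # expect roughly -0.5
```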

Basic panel econometric tests

Basic panel econometric tests are prerequisites for estimating the panel data. The three main ones are the cross-sectional dependence, unit root, and cointegration tests.

Cross-sectional dependence (CD)

A growing body of the panel data literature concludes that panel data models are likely to exhibit substantial CD in the errors, resulting from common shocks, unobserved components, spatial dependence, and idiosyncratic pairwise dependence. Although the impact of CD on estimation depends on several factors, it is more severe in dynamic panel estimators than in static models [64]. Moreover, Pesaran [65] notes that recessions and economic or financial crises potentially affect all countries, even if they start in just one or two; such events inevitably introduce cross-sectional interdependence across the units, their regressors, and the error terms. Overlooking CD in panel data therefore leads to biased estimates and spurious results [64, 66]. Further, the CD test determines which panel unit root and cointegration tests should be applied. Examining CD is thus vital in panel data econometrics.

In the literature, there are several tests for CD, such as the Breusch and Pagan [67] Lagrange multiplier (LM) test, the Pesaran [68] scaled LM test, the Pesaran [68] CD test, and the Baltagi et al. [69] bias-corrected scaled LM test (for more detail, see Tugcu and Tiwari [70]); Friedman [71] and Frees [72, 73] provide further CD tests (see De Hoyos and Sarafidis [64]). Among these, this study employs the Frees [72] and Pesaran [68] tests because, unlike the Breusch and Pagan [67] test, they do not require infinite T and fixed N and are applicable for both large N and large T. Additionally, Frees's CD test overcomes the irregular signs associated with correlation. The study also employs the Friedman [71] CD test to adjudicate mixed results from the other two tests.
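Pesaran's [68] CD statistic itself is simple to compute from the pairwise residual correlations; the sketch below simulates both an independent panel and one hit by a common shock (all data hypothetical):

```python
import numpy as np

def pesaran_cd(E):
    """Pesaran (2004) CD statistic from an (N, T) matrix of residuals.
    Under cross-sectional independence, CD is asymptotically N(0, 1)."""
    N, T = E.shape
    R = np.corrcoef(E)                   # N x N pairwise correlation matrix
    iu = np.triu_indices(N, k=1)         # upper-triangle (i < j) pairs
    return np.sqrt(2 * T / (N * (N - 1))) * R[iu].sum()

rng = np.random.default_rng(4)
N, T = 45, 18
indep = rng.normal(size=(N, T))          # cross-sectionally independent panel
common = rng.normal(size=T)              # a common shock hitting every unit
dep = indep + common                     # induces cross-sectional dependence
cd_indep, cd_dep = pesaran_cd(indep), pesaran_cd(dep)
```

A |CD| value far outside ±1.96 rejects cross-sectional independence; the common-shock panel produces a very large statistic.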

Unit root test

The panel unit root and cointegration tests are common steps following the CD test. Generally, there are two types of panel unit root tests: (1) the first-generation panel unit root tests, such as Im et al. [ 74 ], Maddala and Wu [ 75 ], Choi [ 76 ], Levin et al. [ 77 ], Breitung [ 78 ] and Hadri [ 79 ], and (2) the second-generation panel unit root tests, such as [ 66 , 80 , 81 , 82 , 83 , 84 , 85 , 86 , 87 , 88 , 89 ].

The first-generation panel unit root tests have been criticized because they assume cross-sectional independence [90, 91, 92, 93]. This assumption is restrictive and unrealistic, as macroeconomic time series exhibit significant cross-sectional correlation among countries in a panel [92], and co-movements of economies are observed in the majority of macroeconomic applications of unit root tests [91]; cross-sectional correlation of errors in economic panel data applications is likely to be the rule rather than the exception [93]. Moreover, applying first-generation unit root tests in the presence of CD can generate substantial size distortions [90], so that the null hypothesis of non-stationarity is rejected too readily [66, 94]. As a result, second-generation panel unit root tests have been proposed to take CD into account. Among these, this study employs Pesaran's [66] cross-sectionally augmented panel unit root test (CIPS) for models 1A–1C. The rationale is that, unlike other unit root tests allowing for CD, such as Bai and Ng [80], Moon and Perron [87], and Phillips and Sul [84], Pesaran's [66] test is simple and clear. It is also robust when the unobserved common factor exhibits heteroscedasticity over time [95], and although Moon and Perron [87], Choi [96], and Pesaran [66] all theoretically require large N and T, Pesaran [66] remains uniquely robust in small samples [97]. The CIPS test therefore accommodates CD, heteroscedasticity in the unobserved common factor, and both large and small country samples. Since there is no CD in model 1D, this study instead employs the first-generation unit root tests, namely Levin, Lin, and Chu (LLC), Im, Pesaran, and Shin (IPS), and the Fisher-type augmented Dickey-Fuller (ADF), for that model.
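The CIPS idea can be sketched as follows: each unit's Dickey-Fuller regression is augmented with cross-section averages, and the unit-level t-statistics are averaged. This toy version omits lag-order selection and the simulated critical values needed for formal inference:

```python
import numpy as np

def cadf_t(y, ybar):
    """t-statistic on y_{t-1} in the cross-sectionally augmented DF regression
    Δy_t = a + b·y_{t-1} + c·ybar_{t-1} + d·Δybar_t + e_t."""
    dy, dybar = np.diff(y), np.diff(ybar)
    Z = np.column_stack([np.ones(len(dy)), y[:-1], ybar[:-1], dybar])
    b, *_ = np.linalg.lstsq(Z, dy, rcond=None)
    e = dy - Z @ b
    s2 = e @ e / (len(dy) - Z.shape[1])
    cov = s2 * np.linalg.inv(Z.T @ Z)
    return b[1] / np.sqrt(cov[1, 1])

def cips(Y):
    """CIPS = average of the unit-level CADF t-statistics; Y is (N, T)."""
    ybar = Y.mean(axis=0)
    return np.mean([cadf_t(Y[i], ybar) for i in range(Y.shape[0])])

rng = np.random.default_rng(5)
N, T = 40, 30
rw = rng.normal(size=(N, T)).cumsum(axis=1)      # unit-root (non-stationary) panels
ar = np.zeros((N, T))
eps = rng.normal(size=(N, T))
for t in range(1, T):
    ar[:, t] = 0.4 * ar[:, t - 1] + eps[:, t]    # stationary AR(1) panels
stat_rw, stat_ar = cips(rw), cips(ar)            # stationary panel: far more negative
```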

Cointegration test

The most common panel cointegration tests under CD are Westerlund [98], Westerlund and Edgerton [99], Westerlund and Edgerton [100], Groen and Kleibergen [101], Westerlund's [102] Durbin-Hausman test, Gengenbach et al. [103], and Banerjee and Carrion-i-Silvestre [104]. However, except for a few, most of these tests are not coded in Stata and are hampered by insufficient observations. The current study primarily uses Westerlund [98] and Banerjee and Carrion-i-Silvestre [104] for models 1A–1C, and additionally uses the McCoskey and Kao [105] cointegration test to resolve uncertain results for model 1C. The rationale for using Westerlund's [98] test is that many panel cointegration tests fail to reject the null hypothesis of no cointegration because of the failure of the common-factor restriction [106]; Westerlund [98] does not require any common-factor restriction [107] and allows for a large degree of heterogeneity (e.g., individual-specific short-run dynamics, intercepts, linear trends, and slope parameters) [92, 107, 108]. Its command is also coded and readily available in Stata. However, it suffers from insufficient observations, especially as the number of independent variables increases; the Banerjee and Carrion-i-Silvestre [104] and McCoskey and Kao [105] tests are employed to overcome this limitation. The two Engle-Granger-based cointegration tests that are applicable when there is no CD, widely used, and available in Stata are Pedroni [109, 110] and Kao [111]. The Pedroni test has two benefits over Kao: it allows for cross-sectional dependence and accounts for heterogeneity through unit-specific parameters [112]. Hence, this study uses the Pedroni cointegration test for model 1D.

Panel data estimation techniques

Panel data analysis can be conducted using different estimation techniques, with the choice mainly determined by the results of the basic panel econometric tests. Thus, this study mainly employs the Driscoll-Kraay [113] standard error (DKSE) estimator (for models 1A and 1B), FE (for model 1C), and two-step GMM (for model 1D) to examine the impact of food insecurity on health outcomes, together with the Granger causality test. For robustness checks, it uses fully modified ordinary least squares (FMOLS), panel-corrected standard error (PCSE), and feasible generalized least squares (FGLS) methods (for models 1A and 1B), a random effects (RE) model for model 1C, and the panel dynamic fixed effect (DFE) technique for model 1D.
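For reference, the two-way within (FE) transformation underlying the fixed-effects estimates can be sketched directly; the simulated panel and coefficient value below are hypothetical:

```python
import numpy as np

def two_way_within(y, X):
    """Two-way fixed-effects (within) estimator.
    y: (N, T); X: (N, T, K). Demeans by unit, by period, plus the grand mean."""
    def demean(A):
        return (A - A.mean(axis=1, keepdims=True)
                  - A.mean(axis=0, keepdims=True) + A.mean())
    yd = demean(y).ravel()
    Xd = np.stack([demean(X[..., k]) for k in range(X.shape[2])], axis=-1)
    beta, *_ = np.linalg.lstsq(Xd.reshape(-1, X.shape[2]), yd, rcond=None)
    return beta

rng = np.random.default_rng(6)
N, T = 44, 18
alpha = rng.normal(size=(N, 1))                      # country effects
tau = rng.normal(size=(1, T))                        # time effects
# Regressor correlated with the country effect: pooled OLS would be biased,
# but the within transformation removes alpha and tau before estimation.
X = rng.normal(size=(N, T, 1)) + 0.5 * alpha[..., None]
y = 1.5 * X[..., 0] + alpha + tau + 0.3 * rng.normal(size=(N, T))
beta = two_way_within(y, X)                          # close to the true 1.5
```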

Even though several panel estimation techniques allow for CD, most of them, such as the cross-section augmented autoregressive distributed lag (CS-ARDL), cross-section augmented distributed lag (CS-DL), common correlated effects pooled (CCEP), and common correlated effects mean group (CCEMG) estimators, require a large number of observations over groups and periods. Similarly, the continuously updated fully modified (CUP-FM) and continuously updated bias-corrected (CUP-BC) estimators are not coded in Stata. Others, like the PCSE, FGLS, and seemingly unrelated regression (SUR) estimators, are feasible only when T (the number of time periods) exceeds N (the number of cross-sectional units) [114, 115], whereas the DKSE estimator is feasible for N > T [114]. Therefore, depending on the CD and cointegration test results, availability in Stata, and the relative sizes of N and T, this study mainly employs DKSE regression for models 1A and 1B, the FE model for model 1C, and GMM for model 1D.

Finally, to check the robustness of the main results, this study employs the FMOLS, FGLS, and PCSE estimation techniques for models 1A and 1B. Furthermore, even though the Hausman test indicates that FE is more efficient, the study also reports RE estimates for model 1C, because Firebaugh et al. [116] note that both the RE and FE models perform well in panel data and, unlike FE, RE assumes that individual differences are random. In addition, this study uses the panel DFE estimator for model 1D (selected based on the Hausman test). A final robustness check uses an alternative specification (the dependent variable without a natural log) and the Granger causality test.

Table 3 shows that the overall mean of LNLEXP in the region is 4.063, corresponding to a life expectancy of only 57.43 years (obtained as x = e^4.063, with e ≈ 2.718). This is very low compared with other regions. The values of LNLEXP range between 3.698 and 4.345 (40–76 years), implying high variation. Similarly, the mean value of LNINFMOR is 3.969, implying that SSA countries recorded 52 infant deaths per 1,000 live births; LNINFMOR ranges between 2.525 and 4.919 (12–135 infant deaths per 1,000), again implying high variation within the region. The mean prevalence of undernourishment is 21.26, indicating that 21% of the population is undernourished. However, the mean value of AVRDES is 107.826; being greater than 100, this implies that the calorie supply would be adequate for all consumers if food were distributed according to individual requirements. Regarding skewness and kurtosis, all variables except LNLEXP and LNINFMOR are positively skewed, and all variables have positive kurtosis, with values between 2.202 and 6.092.

Table 3 also shows the degree of association between the variables: most coefficients are below the rule-of-thumb threshold (0.7) for a strong association [117]. The correlations between LNINFMOR and LNLEXP, and between PRUNP and AVRDES, exceed the threshold and suggest a potential multicollinearity issue. However, these variable pairs never appear together in the same model, so multicollinearity is not a concern.

Table 4 shows whether the cross-sectional and time-specific fixed effects in the extended models (models 1A–1D plus Eq. 2) are valid. The null hypothesis that the captured unobserved heterogeneity is homogeneous across countries and time is rejected at the 1% level, implying that the extended models are correctly specified. Besides, to check the robustness of the two-way error-component model relative to the pooled OLS estimator, this study conducted an additional poolability test. The null hypothesis of intercept homogeneity (poolability) is rejected at the 1% level; thus, the FE model is applicable, whereas pooled OLS would be biased.

Cook's D is an indicator of high leverage and large residuals. An observation is influential when D exceeds 4/N (N = number of observations), and D > 1 implies a severe outlier problem. The Cook's D results of this study confirm the absence of an outlier problem (see Supplementary File 1).

Normality, heteroscedasticity, serial correlation, and multicollinearity tests

The results in Table 5 indicate that the probability values of the joint normality test on e and u are above 0.01, implying that the residuals are normally distributed. The heteroscedasticity results show that the probability value of the chi-square statistic is less than 0.01 in all models, so the null hypothesis of constant variance is rejected at the 1% significance level. In other words, the modified Wald test for groupwise heteroskedasticity presented in Table 5 rejects the null hypothesis of groupwise homoskedasticity (probability value 0.0000), implying the presence of heteroscedasticity in the residuals. Similarly, all models suffer from serial correlation: the probability value of 0.0000 rejects the null hypothesis of no first-order serial correlation, indicating autocorrelation in all panel models. Finally, the multicollinearity test reveals no multicollinearity problem, since all variance inflation factor (VIF) values are below 5.

Cross-sectional dependence test

The results in Table 6 strongly reject the null hypothesis of cross-sectional independence for models 1A–1C. For model 1D, the study found mixed results: Pesaran [68] fails to reject the null hypothesis of no CD, while Frees [72] strongly rejects it. To decide, this study employs the Friedman [71] CD test, which fails to reject the null hypothesis of cross-sectional independence. Thus, two out of three tests fail to reject cross-sectional independence, and, unlike the other models, model 1D exhibits no CD (see Table 6).

Unit root tests

Table 7 shows that all variables are significant at the 1% level either in levels (I(0)) or in first differences (I(1)), implying that all variables are stationary. In other words, the null hypothesis of a unit root (non-stationarity) is rejected for all variables at the 1% significance level, either in levels or in first differences. Thus, we might expect a long-run relationship among these variables collectively.

Cointegration tests

The results in Table 8 show that both the Westerlund [98] and the Banerjee and Carrion-i-Silvestre [104] cointegration tests strongly reject the null hypothesis of no cointegration in models 1A and 1B. Model 1C, however, yields a mixed result: the Banerjee and Carrion-i-Silvestre [104] test rejects the null hypothesis of no cointegration, while the Westerlund [98] test does not. Therefore, this study conducted a further cointegration test for model 1C. Although the Westerlund and Edgerton [99] test suffers from insufficient observations, it builds on the McCoskey and Kao [105] LM test [118]; we can therefore use the residual-based cointegration test in the heterogeneous panel framework proposed by McCoskey and Kao [105]. This requires an efficient estimation technique for cointegrated variables, for which the FMOLS and DOLS estimators are recommended: the residuals derived from FMOLS or DOLS are tested for stationarity under the null hypothesis of no cointegration among the regressors. Because the McCoskey and Kao [105] test averages the individual LM statistics across the cross-sections, it is in the spirit of the IPS test (Im et al. [74]) [119].

Though both FMOLS and DOLS are recommended for the residual-based cointegration test, DOLS performs better than FMOLS (for more detail, see Kao and Chiang [120]); therefore, this study uses the residual test derived from DOLS. The result fails to reject the null hypothesis of no cointegration. Since two of the three tests (Westerlund [98] and McCoskey and Kao [105]) fail to reject the null hypothesis of no cointegration, we conclude that there is no long-run relationship among the variables in model 1C.

Unlike the other models, since there is no CD in model 1D, this study employs the Pedroni [109] and Kao [111] cointegration tests for it. The results strongly reject the null hypothesis of no cointegration, so, as in models 1A and 1B, a long-run relationship exists among the variables in model 1D (see Table 8).

Panel data estimation results

Table 9 provides the long-run regression results of all models using the appropriate estimation techniques (DKSE, FE, and two-step GMM), along with the Granger causality test. The DKSE regression can be estimated in three ways: FE with DKSE, RE with DKSE, and pooled OLS/WLS with DKSE. Hence, we must choose the most efficient specification using the Hausman test and the Breusch-Pagan LM test for RE (see Supplementary File 2). As a result, this study employed FE with DKSE for models 1A and 1B. Further, given the Hausman result, the absence of cointegration, and the need to deal with heterogeneity and spatial dependence in the dynamic panel, this study employs FE for model 1C (see Supplementary File 2). However, given the absence of CD, the presence of cointegration, and N > T, this study uses GMM for model 1D. According to Roodman [121], the GMM approach can also address heteroskedasticity and autocorrelation problems. Furthermore, although two-step GMM produces only short-run estimates, long-run coefficients can be derived from the short-run results [122, 123].

The DKSE result for model 1A shows that a 1% increase in the prevalence of undernourishment reduces life expectancy by 0.00348 percentage points (PPs), while in model 1C a 1% rise in the prevalence of undernourishment increases infant mortality by 0.0119 PPs. The DKSE estimates for model 1B reveal that life expectancy rises by 0.00317 PPs with every 1% increase in average dietary energy supply, and the GMM result for model 1D confirms that a 1% increase in average dietary energy supply reduces infant mortality by 0.0139 PPs. Moreover, this study conducted a panel Granger causality test to confirm whether food insecurity has predictive causality for health outcomes. The null hypothesis that changes in the prevalence of undernourishment and average dietary energy supply do not homogeneously cause health outcomes is rejected at the 1% level, implying that changes in food insecurity do Granger-cause the health outcomes of SSA countries (see Table 9).
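The panel Granger causality logic (averaging unit-level F statistics, in the spirit of Dumitrescu-Hurlin-type tests) can be sketched on simulated data; the lag order, data, and names below are illustrative assumptions, not the study's Stata procedure:

```python
import numpy as np

def granger_F(y, x, p=1):
    """Per-unit F-statistic for H0: lags of x do not help predict y (lag order p)."""
    T = len(y)
    Y = y[p:]
    lags_y = [y[p - j - 1:T - j - 1] for j in range(p)]
    lags_x = [x[p - j - 1:T - j - 1] for j in range(p)]
    Zr = np.column_stack([np.ones(T - p)] + lags_y)    # restricted model
    Zu = np.column_stack([Zr] + lags_x)                # unrestricted model
    def ssr(Z):
        b, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        e = Y - Z @ b
        return e @ e
    num = (ssr(Zr) - ssr(Zu)) / p
    den = ssr(Zu) / (len(Y) - Zu.shape[1])
    return num / den

rng = np.random.default_rng(7)
N, T = 40, 18
F_causal, F_none = [], []
for i in range(N):
    x = rng.normal(size=T)
    e = rng.normal(size=T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + e[t]  # x Granger-causes y
    F_causal.append(granger_F(y, x))
    F_none.append(granger_F(y, rng.normal(size=T)))    # unrelated series
Wbar_causal, Wbar_none = np.mean(F_causal), np.mean(F_none)
```

A large average statistic across units signals rejection of the homogeneous no-causality null; the formal test standardizes this average against its null distribution.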

In addition to the main results, Table 9 reports post-estimation statistics to ascertain the consistency of the estimates. For the DKSE and FE models, validity is judged by the R² and F statistics. R² quantifies the proportion of the variance in the dependent variable explained by the independent variables, representing the model's quality; the results in Table 9 show that the explanatory variables explain more than 62% of this variance. Cohen [125] classifies an R² of 2% as a small effect in the social and behavioral sciences, with 13% and 26% considered medium and large effects, respectively; the explanatory variables therefore have a substantial impact in this study's models. The F statistic confirms that all independent variables jointly explain the dependent variable. For the two-step system GMM, the results fail to reject the null hypotheses of no first-order (AR(1)) and no second-order (AR(2)) serial correlation. In addition, the Hansen [126] and Sargan [127] tests fail to reject the null hypothesis that the instruments are valid overall, implying that instrument proliferation does not weaken the model.

Robustness checks

The author believes the above findings may not be sufficient for policy recommendations unless robustness checks are undertaken. Hence, the study re-estimated all models without the natural logarithm of the dependent variables (see Table 10). The model 1A result reveals, consistent with the results above, that the prevalence of undernourishment significantly reduces life expectancy in SSA countries: a 1% increase in the prevalence of undernourishment reduces life expectancy by 0.1924 PPs. In model 1B, life expectancy rises by 0.1763 PPs with every 1% increase in average dietary energy supply. In model 1C, a rise in the prevalence of undernourishment has a positive and significant effect on the infant mortality rate in SSA countries; the FE result implies that a 1% rise in the prevalence of undernourishment increases the infant mortality rate by 0.9785 PPs. The GMM result for model 1D indicates that improvement in average dietary energy supply significantly reduces infant mortality. Further, the Granger causality results confirm that the null hypothesis that changes in the prevalence of undernourishment and average dietary energy supply do not homogeneously cause health outcomes is rejected at the 1% significance level, implying that changes in food insecurity do Granger-cause health outcomes in SSA countries (see Table 10).

The study also conducted further robustness checks using the same dependent variables as in Table 9 but different estimation techniques. The results confirm that the prevalence of undernourishment has a negative and significant effect on life expectancy, whereas improvement in average dietary energy supply significantly increases life expectancy in SSA countries. Likewise, the prevalence of undernourishment contributes to infant mortality, while progress in average dietary energy supply significantly reduces it (see Table 11).

The main objective of this study is to examine the impact of food insecurity on the health outcomes of SSA countries. The DKSE result for model 1A confirms that a rise in the prevalence of undernourishment significantly reduces life expectancy in SSA countries, while the FE result for model 1C shows that an increase in the prevalence of undernourishment significantly raises infant mortality. This indicates that the percentage of the population whose food intake is insufficient to meet dietary energy requirements is high, which reduces life expectancy and increases infant mortality in SSA countries. This result is linked to the insufficient food supply in SSA, caused by low production and yields, primitive tools, weak support for smallholder farms, limited investment in infrastructure, and government policies. Moreover, even where food is available, it is not distributed according to individual requirements, and inadequate access to food, poor nutrition, and the absence of well-balanced diets give rise to chronic illness. Many of these countries are also affected by poverty, making it difficult for citizens to afford nutritious food. Together, these issues create an environment in which individuals are more likely to suffer malnutrition-related illnesses, resulting in lower life expectancy. The DKSE estimate for model 1B reveals that improvement in average dietary energy supply has a positive impact on life expectancy in SSA countries, and such improvement also reduces infant mortality.

Based on the above results, we can conclude that food insecurity harms SSA nations' health outcomes. The prevalence of undernourishment increases infant mortality by increasing the vulnerability to, and the severity and duration of, infectious diseases such as diarrhea, pneumonia, malaria, and measles; through the same mechanism, it reduces life expectancy. Food security, by contrast, improves health outcomes: a rise in average dietary energy supply reduces infant mortality and increases the life expectancy of individuals.

Several facts and theories support the above findings. Consistent with the theoretical and conceptual framework section, food insecurity in SSA countries can affect health outcomes through nutritional, mental health, and behavioral channels. According to FAO et al. [128], the prevalence of undernourishment in Africa increased from 17.6% of the population in 2014 to 19.1% in 2019, more than twice the global average and the highest of all world regions. Similarly, SSA is the world region most at risk of food insecurity [129]. According to the Global Nutrition Report [130], anemia affects an estimated 39.325% of women of reproductive age, and some 13.825% of infants in the SSA region have a low weight at birth. Excluding middle African countries (due to lack of data), the estimated average prevalence of infants aged 0 to 5 months who are exclusively breastfed is 35.73%, below the global average of 44.0%. Moreover, SSA still experiences a malnutrition burden among children aged under five: the average prevalence of overweight is 8.15% (above the global average of 5.7%), and the prevalence of stunting is 30.825% (above the worldwide average of 22%). The prevalence of wasting, at 5.375%, is higher than in most regions, including Central Asia, Eastern Asia, Western Asia, Latin America and the Caribbean, and North America. The region's adult population also faces a malnutrition burden: on average, 9.375% of adult women (aged 18 and over) live with diabetes, compared with 8.25% of men, while 20.675% of women and 7.85% of men live with obesity.

According to Saltzman et al. [17], micronutrient deficiencies can affect people's health throughout the life cycle: in infancy (low birth weight, higher mortality rate, and impaired mental development); in childhood (stunting, reduced mental capacity, frequent infections, reduced learning capacity, and higher mortality rate); in adolescence (stunting, reduced mental capacity, fatigue, and increased vulnerability to infection); during pregnancy (increased mortality and perinatal complications); in adulthood (reduced productivity, poor socio-economic status, malnutrition, and increased risk of chronic disease); and in old age (increased morbidity, including osteoporosis and mental impairment, and higher mortality rate).

Though this study attempts to fill the existing gaps, it also has limitations. It examined the impact of food insecurity on infant mortality, yet part of this association operates indirectly through other health outcomes. Future studies can therefore extend this work by examining the indirect effects of food insecurity on infant mortality, allowing an in-depth look at the relationships between the variables. Moreover, this study measured infant mortality as deaths below one year of age; future studies can broaden the scope by decomposing infant mortality into neonatal and postnatal components and by considering under-five mortality.

Millions of people die every year from hunger and hunger-related diseases worldwide, especially in SSA countries. The link between food insecurity and health status is now on researchers' and policymakers' agendas, yet macro-level findings for the most affected regions, such as SSA, have received only limited attention. Therefore, this study examined the impact of food insecurity on life expectancy and infant mortality rates. It mainly employs the DKSE, FE, two-step GMM, and Granger causality approaches, along with other estimation techniques for robustness checks, for the years 2001 to 2018. The results confirm that food insecurity harms health outcomes, while food security improves the health status of SSA nations: a rise in undernourishment increases the infant mortality rate and reduces life expectancy, whereas an improvement in average dietary energy supply reduces infant mortality and increases life expectancy. Therefore, SSA countries need to guarantee food accessibility in both quality and quantity, which improves health status. Development experts and political leaders agree that Africa has the agricultural potential to feed the continent and improve socio-economic growth; indeed, more than half of the world's unused arable land is in Africa. Effective utilization of natural resources is therefore essential to achieving food security. Moreover, since the majority of food in SSA is produced by smallholder farmers [131], who are themselves the most vulnerable to food insecurity and poverty [132, 133], special focus and support should be given to smallholder farmers to enhance food self-sufficiency.
Further, greater investment in agricultural research; improved markets, infrastructure, and institutions; sound macroeconomic policies and political stability; and sub-regional strategies tailored to agroecological zones are crucial to overcoming food insecurity and improving health status. Finally, filling the stomach is not sufficient: a person's diet must be comprehensive and safe, balanced (including all necessary nutrients), and both available and accessible. SSA countries should therefore ensure availability, accessibility, usability, and sustainability to achieve food and nutrition security.

Availability of data and materials

The datasets used and/or analyzed during the current study are available in supplementary materials.

Abbreviations

  • Augmented Dickey–Fuller
  • Acquired Immunodeficiency Syndrome
  • Average Dietary Energy Supply
  • Common Correlated Effects Mean Group
  • Common Correlated Effects Pooled
  • Cross-Sectional Dependence
  • Cross-Sectionally Augmented Panel Unit Root Test
  • Cross-Section Augmented Autoregressive Distributed Lag
  • Cross-Section Augmented Distributed Lag
  • Continuously Updated Bias-Corrected
  • Continuously Updated Full Modified
  • Dynamic Fixed Effect
  • Driscoll-Kraay Standard Errors
  • Dynamic Ordinary Least Square
  • Error Correction Model
  • Food and Agricultural Organization
  • Fixed Effect
  • Feasible Generalised Least Squares
  • Fully Modified Ordinary Least Square
  • Gross Domestic Product (GDP) per capita
  • Generalised Method of Momentum
  • Domestic General Government Health Expenditure
  • Human Immunodeficiency Virus
  • Integration at First Difference
  • International Fund for Agricultural Development
  • Infant Mortality Rate
  • Im, Pesaran, Shin
  • Lag of Infant Mortality Rate
  • Lag of Natural Logarithm of Infant Mortality Rate
  • Life Expectancy at Birth
  • Levin, Lin, and Chu
  • Lagrange Multiplier
  • Natural Logarithm of Infant Mortality Rate
  • Natural Logarithm of Life Expectancy at Birth
  • Mean Years of Schooling
  • Ordinary Least Squares
  • Panel-Corrected Standard Error
  • Pooled Mean Group
  • Prevalence of Undernourishment
  • Random Effect
  • Sustainable Development Goals
  • Sub-Saharan African
  • Statistical Software
  • Seemingly Unrelated Regression
  • Urbanisation
  • World Food Programme
  • World Health Organization
  • Weighted Least Squares

Giller KE. The food security conundrum of sub-Saharan Africa. Glob Food Sec. 2020;2020(26): 100431.

WHO. Reducing risks, promoting healthy life. World Health Report. Switzerland: WHO; 2002. Available from: https://apps.who.int/iris/bitstream/handle/10665/42510/WHR_2002.pdf?sequence=1 .

Food Research and Action Center. Hunger and health: the impact of poverty, food insecurity, and poor nutrition on health and well-being. Washington, DC: Food Research & Action Center; 2017.  https://frac.org/research/resource-library/hunger-health-impact-poverty-food-insecurity-poor-nutrition-health-well .

FAO, IFAD, WFP. The State of Food Insecurity in the World 2013. The multiple dimensions of food security. Rome: Food and Agriculture Organization of the United Nations; 2013. Available from:  http://www.fao.org/3/i3434e/i3434e00.htm .

Roser M, Ritchie H. Hunger, and Undernourishment. Oxford: University of Oxford; 2013. Available from:  https://ourworldindata.org/hunger-and-undernourishment .

Word Count. How many people die from hunger each year. Denmark: The World Counts; 2020. Available from:  https://www.theworldcounts.com/challenges/people-and-poverty/hunger-and-obesity/how-many-people-die-from-hunger-each-year/story .

FAO IFAD, UNICEF, WFP, and WHO. The State of Food Security and Nutrition in the World 2018. Building Climate Resilience for Food Security and Nutrition. Technical report. FAO. 11 September 2018. Available from: https://www.fao.org/3/I9553EN/i9553en.pdf or https://www.wfp.org/publications/2018-state-food-security-and-nutrition-world-sofi-report .

FAO, ECA, AUC. Africa regional overview of food security and nutrition 2019 -in brief. 2020.

UNHCR. UNHCR describes the alarming health and nutrition situation in South Sudan camps. News Stories, 24 August 2012. Available from: http://www.unhcr.org/503881659.html .

Laxmaiah A, Arlappa N, Balakrishna N, Mallikarjuna RK, Galreddy C, Kumar S, Ravindranath M, Brahmam GN. Prevalence and determinants of micronutrient deficiencies among rural children of eight states in India. Ann Nutr Metab. 2013;62(3):231–41.

Muthayya S, Rah JH, Sugimoto JD, Roos FF, Kraemer K, Black RE. The global hidden hunger indices and maps: an advocacy tool for action. PLoS ONE. 2013;8(6): e67860.

SOS. Hunger and Food Scarceness in Africa. Washington, DC: SOS Children’s Villages; 2018. Available from:  https://www.sos-usa.org/about-us/where-we-work/africa/hunger-in-africa .

Gulliford MC, Mahabir D, Rocke B. Food insecurity, food choices, and body mass index in adults: nutrition transition in Trinidad and Tobago. Int J Epidemiol. 2003;32:508–16.

Stuff JE, Casey PH, Szeto K, Gossett J, Weber J, Simpson P, et al. Household food insecurity and adult chronic disease in the lower Mississippi delta. J Federation Am Soc Experiment Biol. 2005;19:A986.

Parker ED, Widome R, Nettleton JA, Pereira MA. Food security and metabolic syndrome in U.S. adults and adolescents: findings from the National Health and Nutrition Examination Survey 1999-2006. Ann Epidemiol. 2010;20:364–70.

WHO. Global Prevalence of Vitamin A Deficiency in Populations at Risk 1995–2005: WHO Global Database on Vitamin A Deficiency. 2009. Available from: www.who.int/vmnis/vitamina/en/ .

Saltzman A, Birol E, Wiesman D, Prasai N, Yohannes Y, Menon P, Thompson J. 2014 global hunger index: The challenge of hidden hunger. Washington, DC: International Food Policy Research Institute; 2014.

Weiser SD, Palar K, Hatcher AM, Young SL, Frongillo EA. Food insecurity and health: a conceptual framework. In: Ivers L, editors. Food insecurity and public health (pp. 23-50). 1st ed. Boca Raton: CRC Press; 2015.

Rose D. Economic determinants and dietary consequences of food insecurity in the United States. J Nutr. 1999;129(2):517S-S520.

Dixon LB, Winkleby MA, Radimer KL. Dietary intakes and serum nutrients differ between adults from food-insufficient and food-sufficient families: Third National Health and Nutrition Examination Survey, 1988–1994. J Nutr. 2001;131(4):1232–46.

Kirkpatrick SI, Tarasuk V. Food insecurity is associated with nutrient inadequacies among Canadian adults and adolescents. J Nutr. 2008;138(3):604–12.

Schaible UE, Kaufmann SH. Malnutrition and infection: complex mechanisms and global impacts. PLoS Med. 2007;4(5): e115.

Laraia B, Epel E, Siega-Riz AM. Food insecurity with the experience of restrained eating is a recipe for increased gestational weight gain. Appetite. 2013;65:178–84.

Seligman HK, Bindman AB, Vittinghoff E, Kanaya AM, Kushel MB. Food insecurity is associated with diabetes mellitus: results from the National Health Examination and Nutrition Examination Survey (NHANES) 1999–2002. J Gen Intern Med. 2007;22(7):1018–23.

Whitaker RC, Phillips SM, Orzol SM. Food insecurity and the risks of depression and anxiety in mothers and behavior problems in their preschool-aged children. Pediatrics. 2006;118(3):e859–68.

Black PH, Garbutt LD. Stress, inflammation, and cardiovascular disease. J Psychosom Res. 2002;52(1):1–23.

Cunningham WE, Andersen RM, Katz MH, Stein MD, Turner BJ, Crystal S, Zierler S, Kuromiya K, Morton SC, St. Clair P, Bozzette SA. The impact of competing subsistence needs and barriers on access to medical care for persons with human immunodeficiency virus receiving care in the United States. Med Care. 1999;37(12):1270–81.

Kushel MB, Gupta R, Gee L, Haas JS. Housing instability and food insecurity as barriers to health care among low-income Americans. J Gen Intern Med. 2006;21(1):71–7.

Weiser SD, Tuller DM, Frongillo EA, Senkungu J, Mukiibi N, Bangsberg DR. Food insecurity as a barrier to sustained antiretroviral therapy adherence in Uganda. PLoS ONE. 2010;5(4): e10340.

Bengle R, Sinnett S, Johnson T, Johnson MA, Brown A, Lee JS. Food insecurity is associated with cost-related medication non-adherence in community-dwelling, low-income older adults in Georgia. J Nutr Elder. 2010;29(2):170–91.

Weiser SD, Leiter K, Bangsberg DR, Butler LM, Percy-de Korte F, Hlanze Z, Phaladze N, Iacopino V, Heisler M. Food insufficiency is associated with high-risk sexual behavior among women in Botswana and Swaziland. PLoS med. 2007;4(10): e260.

Mehta S, Manji KP, Young AM, Brown ER, Chasela C, Taha TE, Read JS, Goldenberg RL, Fawzi WW. Nutritional indicators of adverse pregnancy outcomes and mother-to-child transmission of HIV among HIV-infected women. Am J Clin Nutr. 2008;87(6):1639–49.

Weiser SD, Bangsberg DR, Kegeles S, Ragland K, Kushel MB, Frongillo EA. Food insecurity among homeless and marginally housed individuals living with HIV/AIDS in San Francisco. AIDS Behav. 2009;13(5):841–8.

Weiser SD, Frongillo EA, Ragland K, Hogg RS, Riley ED, Bangsberg DR. Food insecurity is associated with incomplete HIV RNA suppression among homeless and marginally housed HIV-infected individuals in San Francisco. J Gen Intern Med. 2009;24(1):14–20.

Kalichman SC, Cherry C, Amaral C, White D, Kalichman MO, Pope H, Swetsze C, Jones M, Macy R. Health and treatment implications of food insufficiency among people living with HIV/AIDS, Atlanta. Georgia J Urban Health. 2010;87(4):631–41.

Weiser SD, Gupta R, Tsai AC, Frongillo EA, Grede N, Kumbakumba E, Kawuma A, Hunt PW, Martin JN, Bangsberg DR. Changes in food insecurity, nutritional status, and physical health status after antiretroviral therapy initiation in rural Uganda. J Acquir Immune Defic Syndr (1999). 2012;61(2):179.

Weiser SD, Fernandes KA, Brandson EK, Lima VD, Anema A, Bangsberg DR, Montaner JS, Hogg RS. The association between food insecurity and mortality among HIV-infected individuals on HAART. J Acquir Immune Defic Syndr (1999). 2009;52(3):342.

Stuff JE, Casey PH, Szeto KL, Gossett JM, Robbins JM, Simpson PM, et al. Household food insecurity is associated with adult health status. J Nutr. 2004;134(9):2330–5.

Dinour LM, Bergen D, Yeh MC. The food insecurity–obesity paradox: a review of the literature and the role food stamps may play. J Am Diet Assoc. 2007;107(11):1952–61.

Seligman HK, Davis TC, Schillinger D, Wolf MS. Food insecurity is associated with hypoglycemia and poor diabetes self-management in a low-income sample with diabetes. J Health Care Poor U. 2010;21(4):1227.

Laraia B, Siega-Riz AM, Gundersen C. Household food insecurity is associated with self-reported pregravid weight status, gestational weight gain, and pregnancy complications. J Am Diet Assoc. 2010;110:692–701.

Laraia BA, Siega-Riz AM, Gundersen C, Dole N. Psychosocial factors and socioeconomic indicators are associated with household food insecurity among pregnant women. J Nutr. 2006;136:177–82.

Miller CL, Bangsberg DR, Tuller DM, Senkungu J, Kawuma A, Frongillo EA, et al. Food insecurity and sexual risk in an HIV endemic community in Uganda. AIDS Behav. 2011;15(7):1512–9.

Tsai AC, Bangsberg DR, Frongillo EA, Hunt PW, Muzoora C, Martin JN, et al. Food insecurity, depression and the modifying role of social support among people living with HIV/AIDS in rural Uganda. Soc Sci Med. 2012;74(12):2012–9.

Vogenthaler NS, Kushel MB, Hadley C, Frongillo EA, Riley ED, Bangsberg DR, et al. Food insecurity and risky sexual behaviors among homeless and marginally housed HIV-infected individuals in San Francisco. AIDS Behav. 2013;17(5):1688–93.

Baig-Ansari N, Rahbar MH, Bhutta ZA, Badruddin SH. Child’s gender and household food insecurity are associated with stunting among young Pakistani children residing in urban squatter settlements. Food Nutr Bull. 2006;27(2):114–27.

Gundersen C, Kreider B. Bounding the effects of food insecurity on children’s health outcomes. J Health Econ. 2009;28(5):971–83.

Cole SM, Tembo G. The effect of food insecurity on mental health: panel evidence from rural Zambia. Soc Sci Med. 2011;73(7):1071–9.

Weaver LJ, Owens C, Tessema F, Kebede A, Hadley C. Unpacking the “black box” of global food insecurity and mental health. Soc Sci Med. 2021;282: 114042.

Uchendu FN. Hunger influenced life expectancy in war-torn Sub-Saharan African countries. J Health Popul Nutr. 2018;37(1):1–4.

Asiseh F, Naanwaab C, Quaicoe O. The association between food insecurity and child health outcomes in low and middle-income countries. J Afr Dev. 2018;20(2):79–90.

Justice AE, Louis AA. The nexus between food security and infant mortality-further evidence from Nigeria. Amity J Econ. 2018;3(1):1–5.

Hameed S, Wei W, Chaudhary N. A dynamics appraisal of association among food Insecurity, women and child health: Evidence from developing countries. 2020. Available from:  https://www.preprints.org/manuscript/202007.0291/v1 .

Banerjee S, Radak T, Khubchandani J, Dunn P. Food insecurity and mortality in American adults: results from the NHANES-linked mortality study. Health Promot Pract. 2021;22(2):204–14.

Cassidy-Vu L, Way V, Spangler J. The correlation between food insecurity and infant mortality in North Carolina. Public Health Nutr. 2022;25(4):1038–44.

Kennedy P. A guide to econometrics. 6th ed. Toronto: Wiley-Blackwell; 2008.

Smith MD, Meade B. Who Are the World’s Food Insecure? Identifying the Risk Factors of Food Insecurity around the World. Amber Waves: 2019. The Economics of Food, Farming, Natural Resources, and Rural America. Available from: https://www.ers.usda.gov/amber-waves/2019/june/who-are-the-world-s-food-insecure-identifying-the-risk-factors-of-food-insecurity-around-the-world/ .

The World Bank. Life expectancy at birth, total (years). 2022a. Available from: https://data.worldbank.org/indicator/SP.DYN.LE00.IN .

The World Bank. Mortality rate, infant (per 1,000 live births). 2022b. Available from: https://data.worldbank.org/indicator/SP.DYN.IMRT.IN

Vijayamohanan PN. Panel data analysis with Stata Part 1 fixed effects and random effects models. 2016. MPRA Paper. Available from: https://mpra.ub.uni-muenchen.de/76869/1/MPRA_paper_76869.pdf .

Cook RD. Detection of influential observation in linear regression. Technometrics. 1977;19:15–8.

Alejo J, Galvao A, Montes-Rojas G, Sosa-Escudero W. Tests for normality in linear panel-data models. Stand Genomic Sci. 2015;15(3):822–32.

Wooldridge JM. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press; 2010.

De Hoyos RE, Sarafidis V. Testing for cross-sectional dependence in panel-data models. STATA J. 2006;6(4):482–96.

Pesaran MH. Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica. 2006;74(4):967–1012.

Pesaran MH. A simple panel unit root test in the presence of cross-section dependence. J Appl Economet. 2007;22(2):265–312.

Breusch TS, Pagan AR. The Lagrange multiplier test and its applications to model specification in econometrics. Rev Econ Stud. 1980;47(1):239–53.

Pesaran MH. General diagnostic tests for cross-section dependence in panels. UK: IZA Discussion Paper No. 1240, University of Cambridge; 2004.

Baltagi BH, Feng Q, Kao C. A Lagrange Multiplier test for cross-sectional dependence in a fixed effects panel data model. J Econometrics. 2012;170(1):164–77.

Tugcu CT, Tiwari AK. Does renewable and/or non-renewable energy consumption matter for total factor productivity (TFP) growth? Evidence from the BRICS. Renew Sust Energ Rev. 2016;65:610–6.

Friedman M. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc. 1937;32(200):675–701.

Frees EW. Assessing cross-sectional correlation in panel data. J Econometrics. 1995;69:393–414.

Frees EW. Longitudinal and panel data: analysis and applications in the social sciences. Illustrated ed. Cambridge: Cambridge University Press; 2004.

Im KS, Pesaran MH, Shin Y. Testing for Unit Roots in Heterogeneous Panels. J Econometrics. 2003;115(1):53–74.

Maddala GS, Wu S. A comparative study of unit root tests with panel data and a new simple test. Oxford B Econ Stat. 1999;61(S1):631–52.

Choi I. Unit root tests for panel data. J Int Money Finance. 2001;20:249–72.

Levin A, Lin CF, Chu CSJ. Unit root tests in panel data: asymptotic and finite-sample properties. J Econometrics. 2002;108(1):1–24.

Breitung J. The local power of some unit root tests for panel data. In: B. Baltagi (ed.), Nonstationary Panels, Panel Cointegration, and Dynamic Panels. Adv Econom. 2000;15(JAI):161–178.

Hadri K. Testing for stationarity in heterogeneous panel data. Economist J. 2000;3(2):148–61.

Bai J, Ng S. A panic attack on unit roots and Cointegration. Econometrica. 2004;72:1127–77.

Chang Y, Non-linear IV. Unit root tests in panels with cross-sectional dependency. J Econometrics. 2002;110:261–92.

Chang Y. Bootstrap unit root tests in panels with cross-sectional dependency. J Econometrics. 2004;120:263–93.

Choi I. Combination unit root tests for cross-sectionally correlated panels. 2002.

Phillips PCB, Sul D. Dynamic panel estimation and homogeneity testing under cross section dependence. Economist J. 2003;6:217–59.

Harris R, Sollis R. Applied time series modeling and forecasting. 1st ed. Hoboken, New Jersey: Wiley; 2003.

Smith LV, Leybourne S, Kim TH, Newbold P. More powerful panel data unit root tests with an application to mean reversion in real exchange rates. J Appl Economet. 2004;19(2):147–70.

Moon HR, Perron B. Testing for unit root in panels with dynamic factors. J Econometrics. 2004;122:81–126.

Cerrato M, Sarantis N. A bootstrap panel unit root test under cross-sectional dependence, with an application to PPP. Comput Stat Data An. 2007;51(8):4028–37.

Palm FC, Smeekes S, Urbain JP. Cross-sectional dependence robust block bootstrap panel unit root tests. J Econometrics. 2011;163(1):85–104.

O’Connell PGJ. The overvaluation of purchasing power parity. J Int Econ. 1998;44:1–19.

Hurlin C, Mignon V. Une Synthèse des Tests de Racine Unitaire sur Données de panel. Economie et Prévision. 2005;3–4(169):253–94.

Baltagi BH. Econometric Analysis of Panel Data. 3rd ed. Chichester: John Wiley & Sons; 2008.

Chudik A, Pesaran MH. Large panel data models with cross-sectional dependence: a survey. In: Baltagi BH, editor. The Oxford Handbook of panel data. New York: rd University Press; 2015. p. 3–45.

Eberhardt M, Presbitero AF. Public debt and growth: Heterogeneity and non-linearity. J Int Econ. 2015;97(1):45–58.

Hashiguchi Y, Hamori S. Small sample properties of CIPS panel unit root test under conditional and unconditional heteroscedasticity. 2010. MPRA Paper No. 24053. Available from: https://mpra.ub.uni-muenchen.de/24053/ .

Choi I. Unit root tests for cross-sectionally correlated panels. In: Econometric theory and practice: frontiers of analysis and applied research. 2006.

Albulescu CT, Pépin D, Tiwari AK. A re-examination of real interest parity in CEECs using ‘old’ and ‘new’ second-generation panel unit root tests. B Econ Res. 2016;68(2):133–50.

Westerlund J. Testing for error correction in panel data. Oxford B Econ Stat. 2007;69(6):709–48.

Westerlund J, Edgerton DL. A panel bootstrap cointegration test. Econ Lett. 2007;97(3):185–90.

Westerlund J, Edgerton DL. A simple test for cointegration in dependent panels with structural breaks. Oxford B Econ Stat. 2008;70(5):665–704.

Groen JJJ, Kleibergen F. Likelihood-based cointegration analysis in panels of vector error-correction models. J Bus Econ Stat. 2003;21(2):295–318.

Westerlund J. Panel cointegration tests of the Fisher effect. J Appl Economet. 2008;23(2):193–233.

Gengenbach C, Urbain JP, Westerlund J. Error correction testing in panels with common stochastic trends. J Appl Economet. 2016;31(6):982–1004.

Banerjee A, Carrion-i-Silvestre JL. Testing for panel cointegration using common correlated effects estimators. J Time Ser Anal. 2017;38(4):610–36.

McCoskey S, Kao C. A residual-based test of the null of cointegration in panel data. Economet Rev. 1998;17(1):57–84.

Banerjee A, Dolado J, Mestre R. Error-correction mechanism tests for cointegration in a single-equation framework. J Time Ser Anal. 1998;19(3):267–83.

Abdullah SM, Siddiqua S, Huque R. Is health care a necessary or luxury product for Asian countries? An answer using the panel approach. Health Econ Rev. 2017;7(1):1–12.

Martins PM. Aid absorption and spending in Africa: a panel cointegration approach. J Dev Stud. 2011;47(12):1925–53.

Pedroni P. Critical values for cointegration tests in heterogeneous panels with multiple regressors. Oxford B Econ Stat. 1999;61:653–70.

Pedroni P. Panel cointegration: asymptotic and finite sample properties of pooled time series tests with an application to the PPP hypothesis. Economet Theor. 2004;20(3):597–625.

Kao C. Spurious regression and residual-based tests for cointegration in panel data. J Econometrics. 1999;90:1–44.

Beyene SD, Kotosz B. Testing the environmental Kuznets curve hypothesis: an empirical study for East African countries. Int J Environ Stud. 2020;77(4):636–54.

Driscoll JC, Kraay AC. Consistent covariance matrix estimation with spatially dependent panel data. Rev Econ Stat. 1998;80(4):549–60.

Hoechle D. Robust standard errors for panel regressions with cross-sectional dependence. STATA J. 2007;7(3):281–312.

Breitung J, Pesaran MH. Unit roots and cointegration in panels. In: The econometrics of panel data. Berlin, Heidelberg: Springer; 2008.

Firebaugh G, Warner C, Massoglia M. Fixed effects, random effects, and hybrid models for causal analysis. In: Morgan S, editor. Handbook of causal analysis for social research. Dordrecht: Springer; 2013. p. 113–32.

Allard A, Takman J, Uddin GS, Ahmed A. The N-shaped environmental Kuznets curve: an empirical evaluation using a panel quantile regression approach. Environ Sci Pollut Res. 2018;25(6):5848–61.

Bayar Y, Odabas H, Sasmaz MU, Ozturk OF. Corruption and shadow economy in transition economies of European Union countries: a panel cointegration and causality analysis. Econ Res-Ekon Istraz. 2018;31(1):1940–52.

Barbieri L. Panel Cointegration Tests: A Survey. Rivista Internazionale Di Scienze Sociali. 2008;116(1):3–36.

Kao C, Chiang MH. On the estimation and inference of a cointegrated regression in panel data. In Baltagi BH, Fomby TB, Hill RC (Ed.) Nonstationary panels, panel cointegration, and dynamic panels (Advances in Econometrics, Vol. 15, pp.179–222). Bingley: Emerald Group Publishing Limited; 2001.

Roodman D. How to do xtabond2: an introduction to difference and system GMM in Stata. Stand Genomic Sci. 2009;9(1):86–136.

Eslamloueyan K, Jokar Z. Energy consumption and economic growth in the Middle East and north Africa: a multivariate causality test. Iran J Econ Stud. 2014;18(57):27–46.

CrunchEconometrix. Econometrics and Data Analysis Resources: (Stata13): How to Generate Long-run GMM Coefficients. 2022. Available from: https://www.youtube.com/watch?v=01wUyHVZnTY&ab_channel=CrunchEconometrix

Dumitrescu EI, Hurlin C. Testing for granger non-causality in heterogeneous panels. Econ Model. 2012;29(4):1450–60.

Cohen J. Statistical Power Analysis for the Behavioural Sciences. 2nd ed. New York: Psychology Press; 1988.

Hansen LP. Large sample properties of generalized method of moments estimators. Econometrica. 1982;50(4):1029–54.

Sargan JD. The estimation of economic relationships using instrumental variables. Econometrica. 1958;26(3):393–415.

FAO, IFAD, UNICEF, WFP, WHO The State of Food Security and Nutrition in the World. Transforming Food Systems to Deliver Affordable Healthy Diets for All FAO. 2020. Available from: https://www.fao.org/publications/sofi/2020/en/ .

Van Ittersum MK, Van Bussel LG, Wolf J, Grassini P, Van Wart J, Guilpart N, Claessens L, De Groot H, Wiebe K, Mason-D’Croz D, Yang H. Can sub-Saharan Africa feed itself? Proc Natl Acad Sci. 2016;113(52):14964–9.

Global Nutrition Report. Country Nutrition Profiles. 2021.  https://globalnutritionreport.org/resources/nutrition-profiles/africa/ .

Herrero M, Thornton PK, Power B, Bogard JR, Remans R, Fritz S, Gerber JS, Nelson G, See L, Waha K, Watson RA. Farming and the geography of nutrient production for human use: a transdisciplinary analysis. Lancet Planet Health. 2017;1(1):e33-42.

Sibhatu KT, Qaim M. Rural food security, subsistence agriculture, and seasonality. PLoS ONE. 2017;12(10): e0186406.

Fanzo J. The role of farming and rural development is central to our diets. Physiol Behav. 2018;193:291–7.

Acknowledgements

The author sincerely thanks the Editor, the three anonymous reviewers, and Balázs Kotosz (Adjunct Professor) for their comments and advice.

This study did not receive any specific grant.

Author information

Authors and affiliations.

College of Business and Economics, Department of Economics, Arsi University, Asella, Ethiopia

Sisay Demissew Beyene

Contributions

SDB collected, analyzed, and interpreted the data, and wrote and approved the paper for submission.

Corresponding author

Correspondence to Sisay Demissew Beyene .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The author declares that there are no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:.

  Table S1. Cook’s D results

Additional file 2:

  Table S2. Hausman and Breusch-Pagan LM for RE tests.

Additional file 3.

Rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Beyene, S.D. The impact of food insecurity on health outcomes: empirical evidence from sub-Saharan African countries. BMC Public Health 23 , 338 (2023). https://doi.org/10.1186/s12889-023-15244-3

Received : 07 March 2022

Accepted : 08 February 2023

Published : 15 February 2023

DOI : https://doi.org/10.1186/s12889-023-15244-3


  • Food insecurity
  • Life expectancy
  • Infant mortality
  • Panel data estimations
  • SSA countries

BMC Public Health

ISSN: 1471-2458


What is generative AI? Artificial intelligence that creates

Generative AI models can carry on conversations, answer questions, write stories, produce source code, and create images and videos of almost any description. Here’s how generative AI works, how it’s being used, and why it’s more limited than you might think.

Josh Fruhlinger

Contributing writer, InfoWorld

  • The emergence of generative AI
  • How does generative AI work?
  • What is an AI model?
  • Is generative AI sentient?
  • Testing the limits of computer intelligence
  • Why does AI art have too many fingers?
  • Potential negative impacts of generative AI
  • Use cases for generative AI

Generative AI is a kind of artificial intelligence that creates new content, including text, images, audio, and video, based on patterns it has learned from existing content. Today’s generative AI models have been trained on enormous volumes of data using deep learning, or deep neural networks, and they can carry on conversations, answer questions, write stories, produce source code, and create images and videos of any description, all based on brief text inputs or “prompts.”

Generative AI is called generative because the AI creates something that didn’t previously exist. That’s what makes it different from discriminative AI, which draws distinctions between different kinds of input. To put it differently, discriminative AI tries to answer a question like “Is this image a drawing of a rabbit or a lion?” whereas generative AI responds to prompts like “Draw me a picture of a lion and a rabbit sitting next to each other.”

This article introduces you to generative AI and its uses with popular models like ChatGPT and DALL-E. We’ll also consider the limitations of the technology, including why “too many fingers” has become a dead giveaway for artificially generated art.

Generative AI has been around for years, arguably since ELIZA, a chatbot that simulates talking to a therapist, was developed at MIT in 1966. But years of work on AI and machine learning have recently come to fruition with the release of new generative AI systems. You’ve almost certainly heard about ChatGPT, a text-based AI chatbot that produces remarkably human-like prose. DALL-E and Stable Diffusion have also drawn attention for their ability to create vibrant and realistic images based on text prompts.

Output from these systems is so uncanny that it has many people asking philosophical questions about the nature of consciousness—and worrying about the economic impact of generative AI on human jobs. But while all of these artificial intelligence creations are undeniably big news, there is arguably less going on beneath the surface than some may assume. We’ll get to some of those big-picture questions in a moment. First, let’s look at what’s going on under the hood.

Generative AI uses machine learning to process a huge amount of visual or textual data, much of which is scraped from the internet, and then determines what things are most likely to appear near other things. Much of the programming work of generative AI goes into creating algorithms that can distinguish the “things” of interest to the AI’s creators—words and sentences in the case of chatbots like ChatGPT, or visual elements for DALL-E. But fundamentally, generative AI creates its output by assessing an enormous corpus of data, then responding to prompts with something that falls within the realm of probability as determined by that corpus.

Autocomplete—when your cell phone or Gmail suggests what the remainder of the word or sentence you’re typing might be—is a low-level form of generative AI. ChatGPT and DALL-E just take the idea to significantly more advanced heights.
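The autocomplete analogy can be made concrete with a toy next-word model, sketched below. The corpus, function, and variable names are invented for illustration; real systems learn from vastly larger data and richer models, but the core idea of suggesting the most probable continuation is the same.

```python
from collections import Counter, defaultdict

# Toy "autocomplete": count which word most often follows each word
# in a tiny corpus, then suggest the likeliest continuation.
corpus = (
    "the cat sat on the mat . "
    "the cat chased the mouse . "
    "the dog sat on the rug ."
).split()

following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def suggest(word):
    """Return the word seen most often after `word`, or None if unseen."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(suggest("the"))  # "cat": it follows "the" more often than any other word
print(suggest("sat"))  # "on": the only word ever seen after "sat"
```

Scaling this from bigram counts to deep networks over billions of documents is, very roughly, the leap from phone autocomplete to ChatGPT.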

ChatGPT and DALL-E are interfaces to underlying AI functionality that is known in AI terms as a model. An AI model is a mathematical representation—implemented as an algorithm, or practice—that generates new data that will (hopefully) resemble a set of data you already have on hand. You’ll sometimes see ChatGPT and DALL-E themselves referred to as models; strictly speaking this is incorrect, as ChatGPT is a chatbot that gives users access to several different versions of the underlying GPT model. But in practice, these interfaces are how most people will interact with the models, so don’t be surprised to see the terms used interchangeably.

AI developers assemble a corpus of data of the type that they want their models to generate. This corpus is known as the model’s training set, and the process of developing the model is called training . The GPT models, for instance, were trained on a huge corpus of text scraped from the internet, and the result is that you can feed it natural language queries and it will respond in idiomatic English (or any number of other languages, depending on the input).

AI models treat different characteristics of the data in their training sets as vectors—mathematical structures made up of multiple numbers. Much of the secret sauce underlying these models is their ability to translate real-world information into vectors in a meaningful way, and to determine which vectors are similar to one another in a way that will allow the model to generate output that is similar to, but not identical to, its training set.
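One common way to compare such vectors is cosine similarity, which measures whether two vectors point in a similar direction. The three-dimensional vectors below are invented for illustration; real embeddings are learned during training and run to hundreds or thousands of dimensions.

```python
import math

# Hypothetical toy "embeddings" — real models learn these values.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Similarity of direction, ignoring magnitude: 1.0 means identical."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# "king" and "queen" point in nearly the same direction; "apple" does not.
print(cosine_similarity(vectors["king"], vectors["queen"]))  # close to 1.0
print(cosine_similarity(vectors["king"], vectors["apple"]))  # much smaller
```

A model that can generate "similar but not identical" output is, in effect, sampling new points near familiar ones in this vector space.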

There are a number of different types of AI models out there, but keep in mind that the various categories are not necessarily mutually exclusive. Some models can fit into more than one category.

Probably the AI model type receiving the most public attention today is the large language model, or LLM. LLMs are based on the concept of a transformer, first introduced in “Attention Is All You Need,” a 2017 paper from Google researchers. A transformer derives meaning from long sequences of text to understand how different words or semantic components might be related to one another, then determines how likely they are to occur in proximity to one another. The GPT models are LLMs, and the T stands for transformer. These transformers are run unsupervised on a vast corpus of natural language text in a process called pretraining (that’s the P in GPT), before being fine-tuned by human beings interacting with the model.

Diffusion is commonly used in generative AI models that produce images or video. In the diffusion process, the model adds noise—randomness, basically—to an image, then slowly removes it iteratively, all the while checking against its training set to attempt to match semantically similar images. Diffusion is at the core of AI models that perform text-to-image magic like Stable Diffusion and DALL-E.
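Here is a deliberately simplified sketch of the forward (noising) half of that process. Real diffusion models follow a carefully tuned noise schedule rather than the crude linear blend used here; the point is only that a clean image is gradually drowned in randomness, and training teaches the model to reverse each step so that generation can start from pure noise.

```python
import random

def add_noise(pixels, noise_level):
    """One forward diffusion step: blend each pixel toward Gaussian noise."""
    return [(1 - noise_level) * p + noise_level * random.gauss(0, 1)
            for p in pixels]

random.seed(0)                    # deterministic for demonstration
image = [0.2, 0.8, 0.5, 0.9]      # a tiny stand-in "image"
noisy = image
for step in range(10):            # repeatedly add a little noise
    noisy = add_noise(noisy, 0.1)
print(noisy)  # has drifted away from the original toward randomness
```

The generative direction runs this in reverse: a trained network predicts, at each step, what noise to subtract, nudging static toward a plausible image.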

A generative adversarial network, or GAN, is based on a type of reinforcement learning, in which two algorithms compete against one another. One generates text or images based on probabilities derived from a big data set. The other—a discriminative AI—assesses whether that output is real or AI-generated. The generative AI repeatedly tries to “trick” the discriminative AI, automatically adapting to favor outcomes that are successful. Once the generative AI consistently “wins” this competition, the discriminative AI gets fine-tuned by humans and the process begins anew.
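The adversarial loop can be caricatured in a few lines. In the toy below, the “generator” is a single number and the “discriminator” is a fixed scoring function, whereas a real GAN trains both networks against each other; what survives the simplification is the feedback loop of generate, judge, adjust.

```python
# Toy adversarial loop. "Real" data clusters near 5.0; the generator's
# only job is to produce output the discriminator scores as real.
real_mean = 5.0
g = 0.0  # the generator's single "learnable parameter"

def discriminator(x):
    """High score means 'looks real' (i.e., close to the true data)."""
    return 1.0 / (1.0 + abs(x - real_mean))

for _ in range(200):
    # The generator probes both directions and keeps whichever
    # output the discriminator finds more convincing.
    if discriminator(g + 0.05) >= discriminator(g - 0.05):
        g += 0.05
    else:
        g -= 0.05

print(round(g, 1))  # g has moved close to real_mean: the critic is fooled
```

In an actual GAN the discriminator is itself a network being trained to catch fakes, which is what forces the generator to keep improving rather than settling on one easy trick.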

One of the most important things to keep in mind here is that, while there is human intervention in the training process, most of the learning and adapting happens automatically. Many, many iterations are required to get the models to the point where they produce interesting results, so automation is essential. The process is quite computationally intensive, and much of the recent explosion in AI capabilities has been driven by advances in GPU computing power and techniques for implementing parallel processing on these chips .

The mathematics and coding that go into creating and training generative AI models are quite complex, and well beyond the scope of this article. But if you interact with the models that are the end result of this process, the experience can be decidedly uncanny. You can get DALL-E to produce things that look like real works of art. You can have conversations with ChatGPT that feel like a conversation with another human. Have researchers truly created a thinking machine?

Chris Phipps, a former IBM natural language processing lead who worked on Watson AI products, says no. He describes ChatGPT as a “very good prediction machine.”

It’s very good at predicting what humans will find coherent. It’s not always coherent (it mostly is) but that’s not because ChatGPT “understands.” It’s the opposite: humans who consume the output are really good at making any implicit assumption we need in order to make the output make sense.

Phipps, who’s also a comedy performer, draws a comparison to a common improv game called Mind Meld.

Two people each think of a word, then say it aloud simultaneously—you might say “boot” and I say “tree.” We came up with those words completely independently and at first, they had nothing to do with each other. The next two participants take those two words and try to come up with something they have in common and say that aloud at the same time. The game continues until two participants say the same word.
Maybe two people both say “lumberjack.” It seems like magic, but really it’s that we use our human brains to reason about the input (“boot” and “tree”) and find a connection. We do the work of understanding, not the machine. There’s a lot more of that going on with ChatGPT and DALL-E than people are admitting. ChatGPT can write a story, but we humans do a lot of work to make it make sense.

Certain prompts that we can give to these AI models will make Phipps’ point fairly evident. For instance, consider the riddle “What weighs more, a pound of lead or a pound of feathers?” The answer, of course, is that they weigh the same (one pound), even though our instinct or common sense might tell us that the feathers are lighter.

ChatGPT will answer this riddle correctly, and you might assume it does so because it is a coldly logical computer that doesn’t have any “common sense” to trip it up. But that’s not what’s going on under the hood. ChatGPT isn’t logically reasoning out the answer; it’s just generating output based on its predictions of what should follow a question about a pound of feathers and a pound of lead. Since its training set includes a bunch of text explaining the riddle, it assembles a version of that correct answer.

However, if you ask ChatGPT whether two pounds of feathers are heavier than a pound of lead, it will confidently tell you they weigh the same amount, because that’s still the most likely output to a prompt about feathers and lead, based on its training set. It can be fun to tell the AI that it’s wrong and watch it flounder in response; I got it to apologize to me for its mistake and then suggest that two pounds of feathers weigh four times as much as a pound of lead.


