Hypothesis tests about the variance

by Marco Taboga, PhD

This page explains how to perform hypothesis tests about the variance of a normal distribution, called Chi-square tests.

We analyze two different situations:

when the mean of the distribution is known;

when it is unknown.

Depending on the situation, the Chi-square statistic used in the test has a different distribution.

At the end of the page, we propose some solved exercises.

Table of contents

  • Normal distribution with known mean
  • The null hypothesis
  • The test statistic
  • The critical region
  • The decision
  • The power function
  • The size of the test
  • How to choose the critical value
  • Normal distribution with unknown mean
  • Solved exercises

Normal distribution with known mean

The assumptions are the same as those previously made in the lecture on confidence intervals for the variance: the sample is drawn from a normal distribution whose mean μ is known.

To test the null hypothesis that the variance is equal to a given value σ₀², we use the statistic

χ² = Σ (Xᵢ − μ)² / σ₀²,

where the sum runs over the n observations. When the null hypothesis is true, this statistic has a Chi-square distribution with n degrees of freedom. A test of hypothesis based on it is called a Chi-square test. The null is rejected when the statistic falls in the critical region delimited by the chosen critical values; otherwise the null is not rejected.

The critical values are chosen so that the test has the desired size (the probability of rejecting the null when it is true). We explain how to do this in the page on critical values.
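As a concrete illustration (not part of the original lecture), the following Python sketch carries out the known-mean test with scipy; the data, the known mean, and the hypothesized variance are invented for the example.

```python
import numpy as np
from scipy import stats

# Invented example: n observations from a normal distribution whose mean mu is known.
rng = np.random.default_rng(0)
mu, sigma2_0 = 5.0, 2.0                      # known mean and hypothesized variance (H0: sigma^2 = 2)
x = rng.normal(loc=mu, scale=np.sqrt(1.5), size=30)

# Chi-square statistic for the known-mean case: sum of squared deviations from mu over sigma2_0.
chi2_stat = np.sum((x - mu) ** 2) / sigma2_0
df = len(x)                                   # n degrees of freedom when the mean is known

# Two-sided p-value: twice the smaller tail area of the Chi-square(n) distribution.
p_value = 2 * min(stats.chi2.cdf(chi2_stat, df), stats.chi2.sf(chi2_stat, df))
print(f"chi2 = {chi2_stat:.2f}, df = {df}, two-sided p-value = {p_value:.4f}")
```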

Normal distribution with unknown mean

We now relax the assumption that the mean of the distribution is known. In this case the test statistic is

χ² = n S² / σ₀²,

where S² is the unadjusted sample variance. When the null hypothesis is true, this statistic has a Chi-square distribution with n − 1 degrees of freedom: one degree of freedom is lost because the mean is estimated from the sample. See the comments on the choice of the critical value made for the case of known mean.

Solved exercises

Below you can find some exercises with explained solutions.

Exercise 1

Suppose that we observe 40 independent realizations of a normal random variable, and we run a Chi-square test of the null hypothesis that the variance is equal to 1.

[eq38]

Exercise 2

Make the same assumptions as in Exercise 1 above. If the unadjusted sample variance is equal to 0.9, is the null hypothesis rejected?
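A quick numerical check of Exercise 2 (our addition, in Python; the two-sided form of the test and the 5% size are assumptions made for this illustration, since the exercise's own critical values are not reproduced above):

```python
from scipy import stats

n, s2_unadjusted, sigma2_0 = 40, 0.9, 1.0

# Test statistic for the unknown-mean case: n times the unadjusted sample variance over sigma_0^2.
chi2_stat = n * s2_unadjusted / sigma2_0      # = 36.0
df = n - 1                                    # 39 degrees of freedom

# Assuming a two-sided test of size 5%, compare the statistic with the Chi-square critical values.
lower, upper = stats.chi2.ppf([0.025, 0.975], df)
print(f"chi2 = {chi2_stat:.1f}, acceptance region = ({lower:.1f}, {upper:.1f})")
# 36.0 lies well inside the acceptance region, so under these assumptions the null is not rejected.
```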

How to cite

Please cite as:

Taboga, Marco (2021). "Hypothesis tests about the variance", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/hypothesis-testing-variance.


10.3 Statistical Inference for a Single Population Variance

Learning Objectives

  • Calculate and interpret a confidence interval for a population variance.
  • Conduct and interpret a hypothesis test on a single population variance.

The mean of a population is important, but in many cases the variance of the population is just as important.  In most production processes, quality is measured by how closely the process matches the target (i.e. the mean) and by the variability (i.e. the variance) of the process.  For example, if a process is to fill bags of coffee beans, we are interested in both the average weight of the bag and how much variation there is in the weight of the bags.  The quality is considered poor if the average weight of the bags is accurate but the variance of the weight of the bags is too high—a variance that is too large means some bags would be too full and some bags would be almost empty.

As with other population parameters, we can construct a confidence interval to capture the population variance and conduct a hypothesis test on the population variance.  In order to construct a confidence interval or conduct a hypothesis test on a population variance [latex]\sigma^2[/latex], we need to use the distribution of [latex]\displaystyle{\frac{(n-1) \times s^2}{\sigma^2}}[/latex].  Suppose we have a normal population with population variance [latex]\sigma^2[/latex] and a sample of size [latex]n[/latex] is taken from the population.  The sampling distribution of [latex]\displaystyle{\frac{(n-1) \times s^2}{\sigma^2}}[/latex] follows a [latex]\chi^2[/latex]-distribution with [latex]n-1[/latex] degrees of freedom.

Constructing a Confidence Interval for a Population Variance

To construct the confidence interval, take a random sample of size [latex]n[/latex] from a normally distributed population.  Calculate the sample variance [latex]s^2[/latex].  The limits for the confidence interval with confidence level [latex]C[/latex] for an unknown population variance [latex]\sigma^2[/latex] are

[latex]\begin{eqnarray*} \mbox{Lower Limit} & = & \frac{(n-1) \times s^2}{\chi^2_R} \\ \\ \mbox{Upper Limit} & = & \frac{(n-1) \times s^2}{\chi^2_L} \\ \\  \end{eqnarray*}[/latex]

where [latex]\chi^2_L[/latex] is the [latex]\chi^2[/latex]-score so that the area in the left-tail of the [latex]\chi^2[/latex]-distribution is [latex]\displaystyle{\frac{1-C}{2}}[/latex],  [latex]\chi^2_R[/latex] is the [latex]\chi^2[/latex]-score so that the area in the right-tail of the [latex]\chi^2[/latex]-distribution is [latex]\displaystyle{\frac{1-C}{2}}[/latex] and the [latex]\chi^2[/latex]-distribution has [latex]n-1[/latex] degrees of freedom.

Figure: χ²-distribution with χ²_L and χ²_R marked on the horizontal axis; the area under the curve between them is C%, and the area in each tail beyond them is (1 − C)/2.

  • Like the other confidence intervals we have seen, the [latex]\chi^2[/latex]-scores are the values that trap [latex]C\%[/latex] of the observations in the middle of the distribution so that the area of each tail is [latex]\displaystyle{\frac{1-C}{2}}[/latex].
  • Because the [latex]\chi^2[/latex]-distribution is not symmetrical, the confidence interval for a population variance requires that we calculate two different [latex]\chi^2[/latex]-scores:  one for the left tail and one for the right tail.  In Excel, we will need to use both the chisq.inv function (for the left tail) and the chisq.inv.rt function (for the right tail) to find the two different [latex]\chi^2[/latex]-scores.
  • The [latex]\chi^2[/latex]-score for the left tail is part of the formula for the upper limit and the [latex]\chi^2[/latex]-score for the right tail is part of the formula for the lower limit.  This is not a mistake.  It follows from the formula used to determine the limits for the confidence interval.

A local telecom company conducts broadband speed tests to measure how much data per second passes between a customer’s computer and the internet compared to what the customer pays for as part of their plan.  The company needs to estimate the variance in the broadband speed.  A sample of 15 ISPs is taken and the amount of data per second is recorded.  The variance in the sample is 174.

  • Construct a 97% confidence interval for the variance in the amount of data per second that passes between a customer’s computer and the internet.
  • Interpret the confidence interval found in part 1.

To find the [latex]\chi^2_L[/latex]-score for the 97% confidence interval, we need the [latex]\chi^2[/latex]-score so that the area in the left tail is [latex]\displaystyle{\frac{1-0.97}{2}=0.015}[/latex].  The degrees of freedom for the [latex]\chi^2[/latex]-distribution is [latex]n-1=15-1=14[/latex].

Figure: χ²-distribution with χ²_L marked on the horizontal axis; the area in the left tail, to the left of χ²_L, is 1.5%.

chisq.inv(0.015, 14) = 5.0572…

We also need to find the [latex]\chi^2_R[/latex]-score for the 97% confidence interval.  This means that we need to find the [latex]\chi^2_R[/latex]-score so that the area in the right tail is [latex]\displaystyle{\frac{1-0.97}{2}=0.015}[/latex].  The degrees of freedom for the [latex]\chi^2[/latex]-distribution is [latex]n-1=15-1=14[/latex].

Figure: χ²-distribution with χ²_R marked on the horizontal axis; the area in the right tail, to the right of χ²_R, is 1.5%.

chisq.inv.rt(0.015, 14) = 27.826…

So [latex]\chi^2_L=5.0572...[/latex] and [latex]\chi^2_R=27.826...[/latex].  From the sample data supplied in the question [latex]s^2=174[/latex] and [latex]n=15[/latex].  The 97% confidence interval is

[latex]\begin{eqnarray*} \mbox{Lower Limit} & = & \frac{(n-1) \times s^2}{\chi^2_R} \\ & = & \frac{(15-1) \times 174}{27.826...} \\ & = & 87.54  \\ \\ \mbox{Upper Limit} & = & \frac{(n-1) \times s^2}{\chi^2_L} \\ & = & \frac{(15-1) \times 174}{5.0572...} \\ & = & 481.69 \\ \\ \end{eqnarray*}[/latex]

  • We are 97% confident that the variance in the amount of data per second that passes between a customer’s computer and the internet is between 87.54 and 481.69.
  • When calculating the limits for the confidence interval keep all of the decimals in the [latex]\chi^2[/latex]-scores and other values throughout the calculation.  This will ensure that there is no round-off error in the answer.  You can use Excel to do the calculations of the limits, clicking on the cells containing the [latex]\chi^2[/latex]-scores and any other values.
  • When writing down the interpretation of the confidence interval, make sure to include the confidence level and the actual population variance captured by the confidence interval (i.e. be specific to the context of the question).  In this case, there are no units for the limits because variance does not have any units.  (The [latex]\chi^2[/latex]-scores and limits are verified numerically in the sketch below.)
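The sketch below uses Python's scipy (our choice of tool; the text itself works in Excel) to reproduce the χ²-scores and the 97% interval.

```python
from scipy import stats

n, s2, C = 15, 174, 0.97
df = n - 1
tail_area = (1 - C) / 2                       # area in each tail = 0.015

chi2_L = stats.chi2.ppf(tail_area, df)        # left-tail score, ~5.0572 (Excel: chisq.inv)
chi2_R = stats.chi2.ppf(1 - tail_area, df)    # right-tail score, ~27.826 (Excel: chisq.inv.rt)

lower = (n - 1) * s2 / chi2_R                 # ~87.54
upper = (n - 1) * s2 / chi2_L                 # ~481.69
print(f"97% confidence interval for the variance: ({lower:.2f}, {upper:.2f})")
```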

Steps to Conduct a Hypothesis Test for a Population Variance

  • Write down the null and alternative hypotheses in terms of the population variance [latex]\sigma^2[/latex].
  • Use the form of the alternative hypothesis to determine if the test is left-tailed, right-tailed, or two-tailed.
  • Collect the sample information for the test and identify the significance level [latex]\alpha[/latex].
  • Find the  p -value (the area in the corresponding tail) for the test using the [latex]\chi^2[/latex]-distribution, where

[latex]\begin{eqnarray*}\chi^2=\frac{(n-1) \times s^2}{\sigma^2} & \; \; \; \; \; \; \; \; & df=n-1 \\ \\ \end{eqnarray*}[/latex]

  • Compare the  p -value to the significance level and state the outcome of the test.  If the  p -value is less than or equal to the significance level, reject the null hypothesis in favour of the alternative hypothesis:  the results of the sample data are significant.  There is sufficient evidence to conclude that the null hypothesis [latex]H_0[/latex] is an incorrect belief and that the alternative hypothesis [latex]H_a[/latex] is most likely correct.
  • If the  p -value is greater than the significance level, do not reject the null hypothesis:  the results of the sample data are not significant.  There is not sufficient evidence to conclude that the alternative hypothesis [latex]H_a[/latex] may be correct.
  • Write down a concluding sentence specific to the context of the question.  (These steps are sketched in code after this list.)
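As a sketch of these steps in code (Python with scipy rather than the Excel functions used in this chapter; the helper function and its name are ours), the test statistic and p-value for each form of the alternative hypothesis can be computed as follows.

```python
from scipy import stats

def chi2_variance_test(n, s2, sigma2_0, tail):
    """One-sample chi-square test for a population variance.

    n: sample size, s2: sample variance, sigma2_0: variance under H0,
    tail: 'left', 'right', or 'two' (form of the alternative hypothesis).
    """
    chi2_stat = (n - 1) * s2 / sigma2_0
    df = n - 1
    if tail == "right":                       # H_a: sigma^2 > sigma2_0
        p = stats.chi2.sf(chi2_stat, df)
    elif tail == "left":                      # H_a: sigma^2 < sigma2_0
        p = stats.chi2.cdf(chi2_stat, df)
    else:                                     # H_a: sigma^2 != sigma2_0
        p = 2 * min(stats.chi2.cdf(chi2_stat, df), stats.chi2.sf(chi2_stat, df))
    return chi2_stat, df, p

# Reproduces the exam-score example below: chi2 = 32.48, df = 29, p ≈ 0.2992.
print(chi2_variance_test(30, 28, 25, "right"))
```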

A statistics instructor at a local college claims that the variance for the final exam scores was 25.  After speaking with his classmates, one of the class’s best students thinks that the variance for the final exam scores is higher than the instructor claims.  The student challenges the instructor to prove her claim.  The instructor takes a sample of 30 final exams and finds the variance of the scores is 28.  At the 5% significance level, test if the variance of the final exam scores is higher than the instructor claims.

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & \sigma^2=25  \\ H_a: & & \sigma^2 \gt 25 \end{eqnarray*}[/latex]

From the question, we have [latex]n=30[/latex], [latex]s^2=28[/latex], and [latex]\alpha=0.05[/latex].

Because the alternative hypothesis is a [latex]\gt[/latex], the p -value is the area in the right tail of the [latex]\chi^2[/latex]-distribution.

Figure: χ²-distribution with the χ²-score marked on the horizontal axis; the shaded area in the right tail to its right is the p-value.

To use the chisq.dist.rt function, we need to calculate out the [latex]\chi^2[/latex]-score and the degrees of freedom:

[latex]\begin{eqnarray*} \chi^2 & = &\frac{(n-1) \times s^2}{\sigma^2} \\ & = & \frac{(30-1) \times 28}{25} \\ & = & 32.48 \\ \\ df & = & n-1 \\ & = & 30-1 \\ & = & 29 \end{eqnarray*}[/latex]

chisq.dist.rt(32.48, 29) = 0.2992

So the p -value[latex]=0.2992[/latex].

Conclusion:

Because p -value[latex]=0.2992 \gt 0.05=\alpha[/latex], we do not reject the null hypothesis.  At the 5% significance level there is not enough evidence to suggest that the variance of the final exam scores is higher than 25.

  • The null hypothesis [latex]\sigma^2=25[/latex] is the claim that the variance on the final exam is 25.
  • The alternative hypothesis [latex]\sigma^2 \gt 25[/latex] is the claim that the variance on the final exam is greater than 25.
  • There are no units included with the hypotheses because variance does not have any units.
  • The function is chisq.dist.rt because we are finding the area in the right tail of a [latex]\chi^2[/latex]-distribution.
  • Field 1 is the value of [latex]\chi^2[/latex].
  • Field 2 is the degrees of freedom.
  • The p -value of 0.2992 is a large probability compared to the significance level, and so is likely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely correct, and so the conclusion of the test is to not reject the null hypothesis.  In other words, the variance of the scores on the final exam is most likely 25.

With individual lines at its various windows, a post office finds that the standard deviation for normally distributed waiting times for customers is 7.2 minutes. The post office experiments with a single, main waiting line and finds that for a random sample of 25 customers the waiting times for customers have a standard deviation of 4.5 minutes.  At the 5% significance level, determine if the single line changed the variation among the wait times for customers.

[latex]\begin{eqnarray*} H_0: & & \sigma^2=51.84  \\ H_a: & & \sigma^2 \neq 51.84 \end{eqnarray*}[/latex]

From the question, we have [latex]n=25[/latex], [latex]s^2=20.25[/latex], and [latex]\alpha=0.05[/latex].

Because the alternative hypothesis is a [latex]\neq[/latex], the p -value is the sum of the areas in the tails of the [latex]\chi^2[/latex]-distribution.

Figure: χ²-distribution with χ²_L and χ²_R on the horizontal axis; the shaded area to the left of χ²_L and the shaded area to the right of χ²_R each equal one half of the p-value, and the p-value is their sum.

We need to calculate out the [latex]\chi^2[/latex]-score and the degrees of freedom:

[latex]\begin{eqnarray*} \chi^2 & = &\frac{(n-1) \times s^2}{\sigma^2} \\ & = & \frac{(25-1) \times 20.25}{51.84} \\ & = & 9.375 \\ \\ df & = & n-1 \\ & = & 25-1 \\ & = & 24 \end{eqnarray*}[/latex]

Because this is a two-tailed test, we need to know which tail (left or right) we have the [latex]\chi^2[/latex]-score for so that we can use the correct Excel function.  If [latex]\chi^2 \gt df-2[/latex], the [latex]\chi^2[/latex]-score corresponds to the right tail.  If the [latex]\chi^2 \lt df-2[/latex], the [latex]\chi^2[/latex]-score corresponds to the left tail.  In this case, [latex]\chi^2=9.375 \lt 22=df-2[/latex], so the [latex]\chi^2[/latex]-score corresponds to the left tail.  We need to use chisq.dist to find the area in the left tail.

chisq.dist(9.375, 24, true) = 0.0033

So the area in the left tail is 0.0033, which means that [latex]\frac{1}{2}[/latex]( p -value)=0.0033.  This is also the area in the right tail, so

p -value=[latex]0.0033+0.0033=0.0066[/latex]

Because p -value[latex]=0.0066 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 5% significance level there is enough evidence to suggest that the variation among the wait times for customers has changed.

  • The null hypothesis [latex]\sigma^2=51.84[/latex] is the claim that the variance in the wait times is 51.84.  Note that we were given the standard deviation ([latex]\sigma=7.2[/latex]) in the question.  But this is a test on variance, so we must write the hypotheses in terms of the variance [latex]\sigma^2=7.2^2=51.84[/latex].
  • The alternative hypothesis [latex]\sigma^2 \neq 51.84[/latex] is the claim that the variance in the wait times has changed from 51.84.
  • We use chisq.dist to find the area in the left tail.  The area in the right tail equals the area in the left tail, so we can find the  p -value by adding the output from this function to itself.  (This two-tailed calculation is checked in the code sketch after these notes.)
  • Had the [latex]\chi^2[/latex]-score corresponded to the right tail instead, we would use chisq.dist.rt to find the area in the right tail and, because the two tail areas are equal, double that output to get the  p -value.
  • The p -value of 0.0066 is a small probability compared to the significance level, and so is unlikely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis.  In other words, the variance in the wait times has most likely changed.
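A numerical check of this two-tailed calculation (in Python with scipy rather than Excel; taking twice the smaller tail area gives the same p-value without having to decide which tail the χ²-score falls in):

```python
from scipy import stats

n, s2, sigma2_0 = 25, 4.5 ** 2, 7.2 ** 2      # s^2 = 20.25, sigma_0^2 = 51.84
chi2_stat = (n - 1) * s2 / sigma2_0           # = 9.375
df = n - 1                                    # = 24

# Two-tailed p-value: twice the smaller of the left- and right-tail areas.
p_value = 2 * min(stats.chi2.cdf(chi2_stat, df), stats.chi2.sf(chi2_stat, df))
print(f"chi2 = {chi2_stat:.3f}, p-value = {p_value:.4f}")   # p is approximately 0.0066
```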

A scuba instructor wants to record the depths to which each of his students dives during their checkout dive.  He is interested in how the depths vary, even though everyone should have been at the same depth.  He believes the standard deviation of the depths is 1.2 meters.  But his assistant thinks the standard deviation is less than 1.2 meters.  The instructor wants to test this claim.  The scuba instructor uses his most recent class of 20 students as a sample and finds that the standard deviation of the depths is 0.85 meters.  At the 1% significance level, test if the variability in the depths of the student scuba divers is less than claimed.

[latex]\begin{eqnarray*} H_0: & & \sigma^2=1.44  \\ H_a: & & \sigma^2 \lt 1.44  \end{eqnarray*}[/latex]

From the question, we have [latex]n=20[/latex], [latex]s^2=0.7225[/latex], and [latex]\alpha=0.01[/latex].

Because the alternative hypothesis is a [latex]\lt[/latex], the p -value is the area in the left tail of the [latex]\chi^2[/latex]-distribution.

Figure: χ²-distribution with the χ²-score marked on the horizontal axis; the shaded area in the left tail to its left is the p-value.

To use the chisq.dist function, we need to calculate out the [latex]\chi^2[/latex]-score and the degrees of freedom:

[latex]\begin{eqnarray*} \chi^2 & = &\frac{(n-1) \times s^2}{\sigma^2} \\ & = & \frac{(20-1) \times 0.7225}{1.44} \\ & = & 9.5329... \\ \\ df & = & n-1 \\ & = & 20-1 \\ & = & 19 \end{eqnarray*}[/latex]

chisq.dist(9.5329…, 19, true) = 0.0365

So the p -value[latex]=0.0365[/latex].

Because p -value[latex]=0.0365 \gt 0.01=\alpha[/latex], we do not reject the null hypothesis.  At the 1% significance level there is not enough evidence to suggest that the variation in the depths of the students is less than claimed.

Watch this video: Hypothesis Tests for One Population Variance by jbstatistics [8:51]

Concept Review

To construct a confidence interval or conduct a hypothesis test on a population variance, we use the sampling distribution of [latex]\displaystyle{\frac{(n-1) \times s^2}{\sigma^2}}[/latex], which follows a [latex]\chi^2[/latex]-distribution with [latex]n-1[/latex] degrees of freedom.

The hypothesis test for a population variance is a well-established process:

  • Write down the null and alternative hypotheses in terms of the population variance [latex]\sigma^2[/latex].
  • Use the form of the alternative hypothesis to determine if the test is left-tailed, right-tailed, or two-tailed.
  • Collect the sample information for the test and identify the significance level.
  • Find the  p -value (the area in the corresponding tail) for the test using the [latex]\chi^2[/latex]-distribution where [latex]\displaystyle{\chi^2=\frac{(n-1) \times s^2}{\sigma^2}}[/latex] and [latex]df=n-1[/latex].
  • Compare the  p -value to the significance level and state the outcome of the test.

The limits for the confidence interval with confidence level [latex]C[/latex] for an unknown population variance [latex]\sigma^2[/latex] are

[latex]\begin{eqnarray*} \mbox{Lower Limit} & = & \frac{(n-1) \times s^2}{\chi^2_R} \\ \\ \mbox{Upper Limit} & = & \frac{(n-1) \times s^2}{\chi^2_L} \end{eqnarray*}[/latex]

where [latex]\chi^2_L[/latex] is the [latex]\chi^2[/latex]-score so that the area in the left-tail of the [latex]\chi^2[/latex]-distribution is [latex]\displaystyle{\frac{1-C}{2}}[/latex],  [latex]\chi^2_R[/latex] is the [latex]\chi^2[/latex]-score so that the area in the right-tail of the [latex]\chi^2[/latex]-distribution is [latex]\displaystyle{\frac{1-C}{2}}[/latex], and the [latex]\chi^2[/latex]-distribution has [latex]n-1[/latex] degrees of freedom.

Attribution

“11.6 Test of a Single Variance” in Introductory Statistics by OpenStax is licensed under a Creative Commons Attribution 4.0 International License.

Introduction to Statistics Copyright © 2022 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Hypothesis Testing - Analysis of Variance (ANOVA)

Lisa Sullivan, PhD

Professor of Biostatistics

Boston University School of Public Health


Introduction

This module will continue the discussion of hypothesis testing, where a specific statement or hypothesis is generated about a population parameter, and sample statistics are used to assess the likelihood that the hypothesis is true. The hypothesis is based on available information and the investigator's belief about the population parameters. The specific test considered here is called analysis of variance (ANOVA) and is a test of hypothesis that is appropriate to compare means of a continuous variable in two or more independent comparison groups. For example, in some clinical trials there are more than two comparison groups. In a clinical trial to evaluate a new medication for asthma, investigators might compare an experimental medication to a placebo and to a standard treatment (i.e., a medication currently being used). In an observational study such as the Framingham Heart Study, it might be of interest to compare mean blood pressure or mean cholesterol levels in persons who are underweight, normal weight, overweight and obese.  

The technique to test for a difference in more than two independent means is an extension of the two independent samples procedure discussed previously which applies when there are exactly two independent comparison groups. The ANOVA technique applies when there are two or more than two independent groups. The ANOVA procedure is used to compare the means of the comparison groups and is conducted using the same five step approach used in the scenarios discussed in previous sections. Because there are more than two groups, however, the computation of the test statistic is more involved. The test statistic must take into account the sample sizes, sample means and sample standard deviations in each of the comparison groups.

If one is examining the means observed among, say, three groups, it might be tempting to perform three separate group-to-group comparisons, but this approach is incorrect because each of these comparisons fails to take into account the total data, and it increases the likelihood of incorrectly concluding that there are statistically significant differences, since each comparison adds to the probability of a type I error. Analysis of variance avoids these problems by asking a more global question, i.e., whether there are significant differences among the groups, without addressing differences between any two groups in particular (although there are additional tests that can do this if the analysis of variance indicates that there are differences among the groups).

The fundamental strategy of ANOVA is to systematically examine variability within groups being compared and also examine variability among the groups being compared.

Learning Objectives

After completing this module, the student will be able to:

  • Perform analysis of variance by hand
  • Appropriately interpret results of analysis of variance tests
  • Distinguish between one and two factor analysis of variance tests
  • Identify the appropriate hypothesis testing procedure based on type of outcome variable and number of samples

The ANOVA Approach

Consider an example with four independent groups and a continuous outcome measure. The independent groups might be defined by a particular characteristic of the participants such as BMI (e.g., underweight, normal weight, overweight, obese) or by the investigator (e.g., randomizing participants to one of four competing treatments, call them A, B, C and D). Suppose that the outcome is systolic blood pressure, and we wish to test whether there is a statistically significant difference in mean systolic blood pressures among the four groups. The sample data are organized as follows:

 

                             Group 1    Group 2    Group 3    Group 4
Sample size                  n1         n2         n3         n4
Sample mean                  X̄1         X̄2         X̄3         X̄4
Sample standard deviation    s1         s2         s3         s4

The hypotheses of interest in an ANOVA are as follows:

  • H 0 : μ 1 = μ 2 = μ 3 ... = μ k
  • H 1 : Means are not all equal.

where k = the number of independent comparison groups.

In this example, the hypotheses are:

  • H 0 : μ 1 = μ 2 = μ 3 = μ 4
  • H 1 : The means are not all equal.

The null hypothesis in ANOVA is always that there is no difference in means. The research or alternative hypothesis is always that the means are not all equal and is usually written in words rather than in mathematical symbols. The research hypothesis captures any difference in means and includes, for example, the situation where all four means are unequal, where one is different from the other three, where two are different, and so on. The alternative hypothesis, as shown above, capture all possible situations other than equality of all means specified in the null hypothesis.

Test Statistic for ANOVA

The test statistic for testing H 0 : μ 1 = μ 2 = ... = μ k is

F = MSB/MSE

(the ratio of the "between treatment" mean square to the "residual or error" mean square, defined below), and the critical value is found in a table of probability values for the F distribution with (degrees of freedom) df 1 = k-1, df 2 =N-k. The table can be found in "Other Resources" on the left side of the pages.

NOTE: The test statistic F assumes equal variability in the k populations (i.e., the population variances are equal, or σ 1 2 = σ 2 2 = ... = σ k 2 ). This means that the outcome is equally variable in each of the comparison populations. This assumption is the same as that assumed for appropriate use of the test statistic to test equality of two independent means. It is possible to assess the likelihood that the assumption of equal variances is true and the test can be conducted in most statistical computing packages. If the variability in the k comparison groups is not similar, then alternative techniques must be used.

The F statistic is computed by taking the ratio of what is called the "between treatment" variability to the "residual or error" variability. This is where the name of the procedure originates. In analysis of variance we are testing for a difference in means (H 0 : means are all equal versus H 1 : means are not all equal) by evaluating variability in the data. The numerator captures between treatment variability (i.e., differences among the sample means) and the denominator contains an estimate of the variability in the outcome. The test statistic is a measure that allows us to assess whether the differences among the sample means (numerator) are more than would be expected by chance if the null hypothesis is true. Recall in the two independent sample test, the test statistic was computed by taking the ratio of the difference in sample means (numerator) to the variability in the outcome (estimated by Sp).  

The decision rule for the F test in ANOVA is set up in a similar way to decision rules we established for t tests. The decision rule again depends on the level of significance and the degrees of freedom. The F statistic has two degrees of freedom. These are denoted df 1 and df 2 , and called the numerator and denominator degrees of freedom, respectively. The degrees of freedom are defined as follows:

df 1 = k-1 and df 2 =N-k,

where k is the number of comparison groups and N is the total number of observations in the analysis.  If the null hypothesis is true, the between treatment variation (numerator) will not exceed the residual or error variation (denominator) and the F statistic will be small. If the null hypothesis is false, then the F statistic will be large. The rejection region for the F test is always in the upper (right-hand) tail of the distribution as shown below.

Rejection Region for F Test with α = 0.05, df 1 = 3 and df 2 = 36 (k = 4, N = 40)

Figure: rejection region in the right tail of the F distribution, with α = 0.05.

For the scenario depicted here, the decision rule is: Reject H 0 if F > 2.87.
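The critical value quoted here can be reproduced numerically; the short sketch below uses Python's scipy, which is our choice of tool (the module itself points to printed F tables).

```python
from scipy import stats

alpha, df1, df2 = 0.05, 3, 36
f_crit = stats.f.ppf(1 - alpha, df1, df2)   # upper-tail critical value of the F distribution
print(round(f_crit, 2))                     # approximately 2.87, so reject H0 if F > 2.87
```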

The ANOVA Procedure

We will next illustrate the ANOVA procedure using the five step approach. Because the computation of the test statistic is involved, the computations are often organized in an ANOVA table. The ANOVA table breaks down the components of variation in the data into variation between treatments and error or residual variation. Statistical computing packages also produce ANOVA tables as part of their standard output for ANOVA, and the ANOVA table is set up as follows: 

Source of Variation    Sums of Squares (SS)        Degrees of Freedom (df)    Mean Squares (MS)    F

Between Treatments     SSB = Σ n_j (X̄_j − X̄)²      k-1                        MSB = SSB/(k-1)      F = MSB/MSE

Error (or Residual)    SSE = ΣΣ (X − X̄_j)²         N-k                        MSE = SSE/(N-k)

Total                  SST = ΣΣ (X − X̄)²           N-1

where  

  • X = individual observation,
  • k = the number of treatments or independent comparison groups, and
  • N = total number of observations or total sample size.

The ANOVA table above is organized as follows.

  • The first column is entitled "Source of Variation" and delineates the between treatment and error or residual variation. The total variation is the sum of the between treatment and error variation.
  • The second column is entitled "Sums of Squares (SS)" . The between treatment sums of squares is

SSB = Σ n_j ( X̄_j − X̄ )²

and is computed by summing the squared differences between each treatment (or group) mean and the overall mean. The squared differences are weighted by the sample sizes per group (n j ). The error sums of squares is:

SSE = ΣΣ ( X − X̄_j )²

and is computed by summing the squared differences between each observation and its group mean (i.e., the squared differences between each observation in group 1 and the group 1 mean, the squared differences between each observation in group 2 and the group 2 mean, and so on). The double summation (ΣΣ) indicates summation of the squared differences within each treatment and then summation of these totals across treatments to produce a single value. (This will be illustrated in the following examples). The total sums of squares is:

SST = ΣΣ ( X − X̄ )²

and is computed by summing the squared differences between each observation and the overall sample mean. In an ANOVA, data are organized by comparison or treatment groups. If all of the data were pooled into a single sample, SST would reflect the numerator of the sample variance computed on the pooled or total sample. SST does not figure into the F statistic directly. However, SST = SSB + SSE, thus if two sums of squares are known, the third can be computed from the other two.

  • The third column contains degrees of freedom . The between treatment degrees of freedom is df 1 = k-1. The error degrees of freedom is df 2 = N - k. The total degrees of freedom is N-1 (and it is also true that (k-1) + (N-k) = N-1).
  • The fourth column contains "Mean Squares (MS)" which are computed by dividing sums of squares (SS) by degrees of freedom (df), row by row. Specifically, MSB=SSB/(k-1) and MSE=SSE/(N-k). Dividing SST by N-1 produces the variance of the total sample. The F statistic is in the rightmost column of the ANOVA table and is computed by taking the ratio of MSB/MSE.

A clinical trial is run to compare weight loss programs and participants are randomly assigned to one of the comparison programs and are counseled on the details of the assigned program. Participants follow the assigned program for 8 weeks. The outcome of interest is weight loss, defined as the difference in weight measured at the start of the study (baseline) and weight measured at the end of the study (8 weeks), measured in pounds.  

Three popular weight loss programs are considered. The first is a low calorie diet. The second is a low fat diet and the third is a low carbohydrate diet. For comparison purposes, a fourth group is considered as a control group. Participants in the fourth group are told that they are participating in a study of healthy behaviors with weight loss only one component of interest. The control group is included here to assess the placebo effect (i.e., weight loss due to simply participating in the study). A total of twenty patients agree to participate in the study and are randomly assigned to one of the four diet groups. Weights are measured at baseline and patients are counseled on the proper implementation of the assigned diet (with the exception of the control group). After 8 weeks, each patient's weight is again measured and the difference in weights is computed by subtracting the 8 week weight from the baseline weight. Positive differences indicate weight losses and negative differences indicate weight gains. For interpretation purposes, we refer to the differences in weights as weight losses and the observed weight losses are shown below.

Low Calorie    Low Fat    Low Carbohydrate    Control
8              2          3                   2
9              4          5                   2
6              3          4                   -1
7              5          2                   0
3              1          3                   3

Is there a statistically significant difference in the mean weight loss among the four diets?  We will run the ANOVA using the five-step approach.

  • Step 1. Set up hypotheses and determine level of significance

H 0 : μ 1 = μ 2 = μ 3 = μ 4 H 1 : Means are not all equal              α=0.05

  • Step 2. Select the appropriate test statistic.  

The test statistic is the F statistic for ANOVA, F=MSB/MSE.

  • Step 3. Set up decision rule.  

The appropriate critical value can be found in a table of probabilities for the F distribution(see "Other Resources"). In order to determine the critical value of F we need degrees of freedom, df 1 =k-1 and df 2 =N-k. In this example, df 1 =k-1=4-1=3 and df 2 =N-k=20-4=16. The critical value is 3.24 and the decision rule is as follows: Reject H 0 if F > 3.24.

  • Step 4. Compute the test statistic.  

To organize our computations we complete the ANOVA table. In order to compute the sums of squares we must first compute the sample means for each group and the overall mean based on the total sample.  

 

              Low Calorie    Low Fat    Low Carbohydrate    Control
n             5              5          5                   5
Group mean    6.6            3.0        3.4                 1.2

We can now compute the overall mean:  if we pool all N=20 observations, the overall mean is 3.55.

So, in this case:

SSB = 5(6.6 − 3.55)² + 5(3.0 − 3.55)² + 5(3.4 − 3.55)² + 5(1.2 − 3.55)² = 75.8

Next we compute the error sums of squares, SSE.

SSE requires computing the squared differences between each observation and its group mean. We will compute SSE in parts. For the participants in the low calorie diet:  

Low Calorie (group mean = 6.6)

X        X − 6.6    (X − 6.6)²
8        1.4        2.0
9        2.4        5.8
6        -0.6       0.4
7        0.4        0.2
3        -3.6       13.0
Totals   0          21.4

For the participants in the low fat diet:  

Low Fat (group mean = 3.0)

X        X − 3.0    (X − 3.0)²
2        -1.0       1.0
4        1.0        1.0
3        0.0        0.0
5        2.0        4.0
1        -2.0       4.0
Totals   0          10.0

For the participants in the low carbohydrate diet:  

Low Carbohydrate (group mean = 3.4)

X        X − 3.4    (X − 3.4)²
3        -0.4       0.2
5        1.6        2.6
4        0.6        0.4
2        -1.4       2.0
3        -0.4       0.2
Totals   0          5.4

For the participants in the control group:

Control (group mean = 1.2)

X        X − 1.2    (X − 1.2)²
2        0.8        0.6
2        0.8        0.6
-1       -2.2       4.8
0        -1.2       1.4
3        1.8        3.2
Totals   0          10.6

Therefore, SSE = 21.4 + 10.0 + 5.4 + 10.6 = 47.4.

We can now construct the ANOVA table.

Source of Variation    Sums of Squares (SS)    Degrees of Freedom (df)    Mean Squares (MS)    F

Between Treatments     75.8                    4-1=3                      75.8/3=25.3          25.3/3.0=8.43

Error (or Residual)    47.4                    20-4=16                    47.4/16=3.0

Total                  123.2                   20-1=19

  • Step 5. Conclusion.  

We reject H 0 because 8.43 > 3.24. We have statistically significant evidence at α=0.05 to show that there is a difference in mean weight loss among the four diets.    

ANOVA is a test that provides a global assessment of a statistical difference in more than two independent means. In this example, we find that there is a statistically significant difference in mean weight loss among the four diets considered. In addition to reporting the results of the statistical test of hypothesis (i.e., that there is a statistically significant difference in mean weight losses at α=0.05), investigators should also report the observed sample means to facilitate interpretation of the results. In this example, participants in the low calorie diet lost an average of 6.6 pounds over 8 weeks, as compared to 3.0 and 3.4 pounds in the low fat and low carbohydrate groups, respectively. Participants in the control group lost an average of 1.2 pounds which could be called the placebo effect because these participants were not participating in an active arm of the trial specifically targeted for weight loss. Are the observed weight losses clinically meaningful?
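As an optional check on this example (our addition; the module's computations were done by hand), the F statistic can be reproduced in Python with scipy.stats.f_oneway using the weight-loss data above.

```python
from scipy import stats

low_cal  = [8, 9, 6, 7, 3]
low_fat  = [2, 4, 3, 5, 1]
low_carb = [3, 5, 4, 2, 3]
control  = [2, 2, -1, 0, 3]

# One-way ANOVA: F = MSB/MSE with df1 = k - 1 = 3 and df2 = N - k = 16.
f_stat, p_value = stats.f_oneway(low_cal, low_fat, low_carb, control)
print(f"F = {f_stat:.2f}, p-value = {p_value:.4f}")
# F is approximately 8.56 (the 8.43 in the table reflects rounding of the mean squares);
# either way F > 3.24, so H0 is rejected at the 5% level.
```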

Another ANOVA Example

Calcium is an essential mineral that regulates the heart, is important for blood clotting and for building healthy bones. The National Osteoporosis Foundation recommends a daily calcium intake of 1000-1200 mg/day for adult men and women. While calcium is contained in some foods, most adults do not get enough calcium in their diets and take supplements. Unfortunately some of the supplements have side effects such as gastric distress, making them difficult for some patients to take on a regular basis.  

 A study is designed to test whether there is a difference in mean daily calcium intake in adults with normal bone density, adults with osteopenia (a low bone density which may lead to osteoporosis) and adults with osteoporosis. Adults 60 years of age with normal bone density, osteopenia and osteoporosis are selected at random from hospital records and invited to participate in the study. Each participant's daily calcium intake is measured based on reported food intake and supplements. The data are shown below.   

Normal Bone Density    Osteopenia    Osteoporosis
1200                   1000          890
1000                   1100          650
980                    700           1100
900                    800           900
750                    500           400
800                    700           350

Is there a statistically significant difference in mean calcium intake in patients with normal bone density as compared to patients with osteopenia and osteoporosis? We will run the ANOVA using the five-step approach.

H 0 : μ 1 = μ 2 = μ 3 H 1 : Means are not all equal                            α=0.05

In order to determine the critical value of F we need degrees of freedom, df 1 =k-1 and df 2 =N-k.   In this example, df 1 =k-1=3-1=2 and df 2 =N-k=18-3=15. The critical value is 3.68 and the decision rule is as follows: Reject H 0 if F > 3.68.

To organize our computations we will complete the ANOVA table. In order to compute the sums of squares we must first compute the sample means for each group and the overall mean.  

              Normal Bone Density    Osteopenia    Osteoporosis
n             6                      6             6
Group mean    938.3                  800.0         715.0

 If we pool all N=18 observations, the overall mean is 817.8.

We can now compute:

SSB = 6(938.3333 − 817.7778)² + 6(800 − 817.7778)² + 6(715 − 817.7778)²

Substituting:

SSB = 6(120.5556)² + 6(−17.7778)² + 6(−102.7778)² = 152,477.7

SSE requires computing the squared differences between each observation and its group mean. We will compute SSE in parts. For the participants with normal bone density:

Normal Bone Density (group mean = 938.3)

X        X − 938.3    (X − 938.3)²
1200     261.6667     68,486.9
1000     61.6667      3,806.9
980      41.6667      1,738.9
900      -38.3333     1,466.9
750      -188.333     35,456.9
800      -138.333     19,126.9
Total    0            130,083.3

For participants with osteopenia:

Osteopenia (group mean = 800.0)

X        X − 800    (X − 800)²
1000     200        40,000
1100     300        90,000
700      -100       10,000
800      0          0
500      -300       90,000
700      -100       10,000
Total    0          240,000

For participants with osteoporosis:

Osteoporosis (group mean = 715.0)

X        X − 715    (X − 715)²
890      175        30,625
650      -65        4,225
1100     385        148,225
900      185        34,225
400      -315       99,225
350      -365       133,225
Total    0          449,750

Therefore, SSE = 130,083.3 + 240,000 + 449,750 = 819,833.3.

We can now construct the ANOVA table.

Source of Variation    Sums of Squares (SS)    Degrees of Freedom (df)    Mean Squares (MS)    F

Between Treatments     152,477.7               2                          76,238.6             1.395

Error or Residual      819,833.3               15                         54,655.5

Total                  972,311.0               17

We do not reject H 0 because 1.395 < 3.68. We do not have statistically significant evidence at α=0.05 to show that there is a difference in mean calcium intake in patients with normal bone density as compared to osteopenia and osteoporosis. Are the differences in mean calcium intake clinically meaningful? If so, what might account for the lack of statistical significance?
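Again as an optional check (our addition, in Python with scipy), the one-way ANOVA on the calcium data gives the same F statistic.

```python
from scipy import stats

normal       = [1200, 1000, 980, 900, 750, 800]
osteopenia   = [1000, 1100, 700, 800, 500, 700]
osteoporosis = [890, 650, 1100, 900, 400, 350]

# One-way ANOVA with k = 3 groups and N = 18 observations (df1 = 2, df2 = 15).
f_stat, p_value = stats.f_oneway(normal, osteopenia, osteoporosis)
print(f"F = {f_stat:.3f}, p-value = {p_value:.3f}")   # F is about 1.395, p is about 0.28; do not reject H0
```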

One-Way ANOVA in R

The video below by Mike Marin demonstrates how to perform analysis of variance in R. It also covers some other statistical issues, but the initial part of the video will be useful to you.

Two-Factor ANOVA

The ANOVA tests described above are called one-factor ANOVAs. There is one treatment or grouping factor with k > 2 levels and we wish to compare the means across the different categories of this factor. The factor might represent different diets, different classifications of risk for disease (e.g., osteoporosis), different medical treatments, different age groups, or different racial/ethnic groups. There are situations where it may be of interest to compare means of a continuous outcome across two or more factors. For example, suppose a clinical trial is designed to compare five different treatments for joint pain in patients with osteoarthritis. Investigators might also hypothesize that there are differences in the outcome by sex. This is an example of a two-factor ANOVA where the factors are treatment (with 5 levels) and sex (with 2 levels). In the two-factor ANOVA, investigators can assess whether there are differences in means due to the treatment, by sex or whether there is a difference in outcomes by the combination or interaction of treatment and sex. Higher order ANOVAs are conducted in the same way as one-factor ANOVAs presented here and the computations are again organized in ANOVA tables with more rows to distinguish the different sources of variation (e.g., between treatments, between men and women). The following example illustrates the approach.

Consider the clinical trial outlined above in which three competing treatments for joint pain are compared in terms of their mean time to pain relief in patients with osteoarthritis. Because investigators hypothesize that there may be a difference in time to pain relief in men versus women, they randomly assign 15 participating men to one of the three competing treatments and randomly assign 15 participating women to one of the three competing treatments (i.e., stratified randomization). Participating men and women do not know to which treatment they are assigned. They are instructed to take the assigned medication when they experience joint pain and to record the time, in minutes, until the pain subsides. The data (times to pain relief) are shown below and are organized by the assigned treatment and sex of the participant.

Table of Time to Pain Relief by Treatment and Sex

               Male    Female
Treatment A    12      21
               15      19
               16      18
               17      24
               14      25
Treatment B    14      21
               17      20
               19      23
               20      27
               17      25
Treatment C    25      37
               27      34
               29      36
               24      26
               22      29

The analysis in two-factor ANOVA is similar to that illustrated above for one-factor ANOVA. The computations are again organized in an ANOVA table, but the total variation is partitioned into that due to the main effect of treatment, the main effect of sex and the interaction effect. The results of the analysis are shown below (and were generated with a statistical computing package - here we focus on interpretation). 
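For readers who want to reproduce this kind of table themselves, here is a minimal sketch in Python with pandas and statsmodels (our choice of tools; the module does not say which package it used), using the data from the table above. Up to rounding, it should reproduce the sums of squares shown below.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Times to pain relief from the table above, organized by treatment and sex (5 observations per cell).
data = {
    ("A", "Male"):   [12, 15, 16, 17, 14],
    ("A", "Female"): [21, 19, 18, 24, 25],
    ("B", "Male"):   [14, 17, 19, 20, 17],
    ("B", "Female"): [21, 20, 23, 27, 25],
    ("C", "Male"):   [25, 27, 29, 24, 22],
    ("C", "Female"): [37, 34, 36, 26, 29],
}
df = pd.DataFrame(
    [{"treatment": t, "sex": s, "time": y} for (t, s), ys in data.items() for y in ys]
)

# Two-factor ANOVA: main effects of treatment and sex plus their interaction.
model = ols("time ~ C(treatment) * C(sex)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
# Expected (up to rounding): treatment SS of about 651.5, sex SS of about 313.6,
# interaction SS of about 1.9, error SS of about 224.4.
```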

 ANOVA Table for Two-Factor ANOVA

Source of Variation    Sums of Squares (SS)    Degrees of Freedom (df)    Mean Squares (MS)    F       P-Value

Model                  967.0                   5                          193.4                20.7    0.0001

Treatment              651.5                   2                          325.7                34.8    0.0001

Sex                    313.6                   1                          313.6                33.5    0.0001

Treatment * Sex        1.9                     2                          0.9                  0.1     0.9054

Error or Residual      224.4                   24                         9.4

Total                  1191.4                  29

There are 4 statistical tests in the ANOVA table above. The first test is an overall test to assess whether there is a difference among the 6 cell means (cells are defined by treatment and sex). The F statistic is 20.7 and is highly statistically significant with p=0.0001. When the overall test is significant, focus then turns to the factors that may be driving the significance (in this example, treatment, sex or the interaction between the two). The next three statistical tests assess the significance of the main effect of treatment, the main effect of sex and the interaction effect. In this example, there is a highly significant main effect of treatment (p=0.0001) and a highly significant main effect of sex (p=0.0001). The interaction between the two does not reach statistical significance (p=0.91). The table below contains the mean times to pain relief in each of the treatments for men and women (Note that each sample mean is computed on the 5 observations measured under that experimental condition).  

Mean Time to Pain Relief by Treatment and Gender

Treatment    Men     Women
A            14.8    21.4
B            17.4    23.2
C            25.4    32.4

Treatment A appears to be the most efficacious treatment for both men and women. The mean times to relief are lower in Treatment A for both men and women and highest in Treatment C for both men and women. Across all treatments, women report longer times to pain relief (See below).  

Graph of two-factor ANOVA

Notice that there is the same pattern of time to pain relief across treatments in both men and women (treatment effect). There is also a sex effect - specifically, time to pain relief is longer in women in every treatment.  

Suppose that the same clinical trial is replicated in a second clinical site and the following data are observed.

Table - Time to Pain Relief by Treatment and Sex - Clinical Site 2

               Male    Female
Treatment A    22      21
               25      19
               26      18
               27      24
               24      25
Treatment B    14      21
               17      20
               19      23
               20      27
               17      25
Treatment C    15      37
               17      34
               19      36
               14      26
               12      29

The ANOVA table for the data measured in clinical site 2 is shown below.

Table - Summary of Two-Factor ANOVA - Clinical Site 2

Source of Variation    Sums of Squares (SS)    Degrees of freedom (df)    Mean Squares (MS)    F       P-Value

Model                  907.0                   5                          181.4                19.4    0.0001

Treatment              71.5                    2                          35.7                 3.8     0.0362

Sex                    313.6                   1                          313.6                33.5    0.0001

Treatment * Sex        521.9                   2                          260.9                27.9    0.0001

Error or Residual      224.4                   24                         9.4

Total                  1131.4                  29

Notice that the overall test is significant (F=19.4, p=0.0001), there is a significant treatment effect, sex effect and a highly significant interaction effect. The table below contains the mean times to relief in each of the treatments for men and women.  

Table - Mean Time to Pain Relief by Treatment and Gender - Clinical Site 2

Treatment    Men     Women
A            24.8    21.4
B            17.4    23.2
C            15.4    32.4

Notice that now the differences in mean time to pain relief among the treatments depend on sex. Among men, the mean time to pain relief is highest in Treatment A and lowest in Treatment C. Among women, the reverse is true. This is an interaction effect (see below).  

Graphic display of the results in the preceding table

Notice above that the treatment effect varies depending on sex. Thus, we cannot summarize an overall treatment effect (in men, treatment C is best, in women, treatment A is best).    

When interaction effects are present, some investigators do not examine main effects (i.e., do not test for treatment effect because the effect of treatment depends on sex). This issue is complex and is discussed in more detail in a later module. 

13.4 Test of Two Variances

Another use of the F distribution is testing two variances. It is often desirable to compare two variances rather than two averages. For instance, college administrators would like two college professors grading exams to have the same variation in their grading. For a lid to fit a container, the variation in the lid and the container should be the same. A supermarket might be interested in the variability of check-out times for two checkers.

To perform an F test of two variances, it is important that the following are true:

  • The populations from which the two samples are drawn are normally distributed.
  • The two populations are independent of each other.

Unlike most other tests in this book, the F test for equality of two variances is very sensitive to deviations from normality. If the two distributions are not normal, the test can give higher p -values than it should, or lower ones, in ways that are unpredictable. Many texts suggest that students not use this test at all, but in the interest of completeness we include it here.

Suppose we sample randomly from two independent normal populations. Let σ₁² and σ₂² be the population variances and s₁² and s₂² be the sample variances. Let the sample sizes be n₁ and n₂. Since we are interested in comparing the two sample variances, we use the F ratio

F = [ s₁² / σ₁² ] / [ s₂² / σ₂² ].

F has the distribution F ~ F(n₁ – 1, n₂ – 1),

where n₁ – 1 are the degrees of freedom for the numerator and n₂ – 1 are the degrees of freedom for the denominator.

If the null hypothesis is σ₁² = σ₂², then the F ratio becomes F = [ s₁² / σ₁² ] / [ s₂² / σ₂² ] = s₁² / s₂².

The F ratio could also be s₂² / s₁². It depends on Hₐ and on which sample variance is larger.

If the two populations have equal variances, then s₁² and s₂² are close in value and F = s₁² / s₂² is close to 1. But if the two population variances are very different, s₁² and s₂² tend to be very different, too. Choosing s₁² as the larger sample variance causes the ratio s₁² / s₂² to be greater than 1. If s₁² and s₂² are far apart, then F = s₁² / s₂² is a large number.

Therefore, if F is close to 1, the evidence favors the null hypothesis (the two population variances are equal). But if F is much larger than 1, then the evidence is against the null hypothesis. A test of two variances may be left-tailed, right-tailed, or two-tailed.

Example 13.5

Two college instructors are interested in whether there is any variation in the way they grade math exams. They each grade the same set of 30 exams. The first instructor’s grades have a variance of 52.3. The second instructor’s grades have a variance of 89.9. Test the claim that the first instructor’s variance is smaller. In most colleges, it is desirable for the variances of exam grades to be nearly the same among instructors. The level of significance is 10 percent.

Let 1 and 2 be the subscripts that indicate the first and second instructor, respectively.

n₁ = n₂ = 30.

H₀: σ₁² = σ₂² and Hₐ: σ₁² < σ₂².

Calculate the test statistic: By the null hypothesis (σ₁² = σ₂²), the F statistic is

F = [ s₁² / σ₁² ] / [ s₂² / σ₂² ] = s₁² / s₂² = 52.3 / 89.9 = 0.5818.

Distribution for the test: F₂₉,₂₉, where n₁ – 1 = 29 and n₂ – 1 = 29.

Graph: This test is left-tailed.

Draw the graph, labeling and shading appropriately.

Probability statement: p -value = P ( F < 0.5818) = 0.0753.

Compare α and the p -value: α = 0.10; α > p -value.

Make a decision: Since α > p -value, reject H 0 .

Conclusion: With a 10 percent level of significance from the data, there is sufficient evidence to conclude that the variance in grades for the first instructor is smaller.

Using the TI-83, 83+, 84, 84+ Calculator

Press STAT and arrow over to TESTS. Arrow down to D:2-SampFTest. Press ENTER. Arrow to Stats and press ENTER. For Sx1, n1, Sx2, and n2, enter √(52.3), 30, √(89.9), and 30. Press ENTER after each. Arrow to σ1: and select < σ2. Press ENTER. Arrow down to Calculate and press ENTER. F = 0.5818 and p-value = 0.0753. Do the procedure again and try Draw instead of Calculate.
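A quick check of this example in Python with scipy (our addition; the calculator steps above are the text's own method):

```python
from scipy import stats

s1_sq, s2_sq = 52.3, 89.9
n1, n2 = 30, 30

F = s1_sq / s2_sq                               # = 0.5818
p_value = stats.f.cdf(F, n1 - 1, n2 - 1)        # left-tailed test: P(F < 0.5818)
print(f"F = {F:.4f}, p-value = {p_value:.4f}")  # should agree with the text: F = 0.5818, p of about 0.0753
```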

Try It 13.5

The New York Choral Society divides male singers into four categories from highest voices to lowest: Tenor1, Tenor2, Bass1, and Bass2. In the table are heights of the men in the Tenor1 and Bass2 groups. One suspects that taller men will have lower voices, and that the variance of height may go up with the lower voices as well. Do we have good evidence that the variances of the heights of singers in these two groups (Tenor1 and Bass2) are different?

Tenor1 Bass2 Tenor1 Bass2 Tenor1 Bass2
69 72 67 72 68 67
72 75 70 74 67 70
71 67 65 70 64 70
66 75 72 66 69
76 74 70 68 72
74 72 68 75 71
71 72 64 68 74
66 74 73 70 75
68 72 66 72


Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute Texas Education Agency (TEA). The original material is available at: https://www.texasgateway.org/book/tea-statistics . Changes were made to the original material, including updates to art, structure, and other content updates.

Access for free at https://openstax.org/books/statistics/pages/1-introduction
  • Authors: Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Statistics
  • Publication date: Mar 27, 2020
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/statistics/pages/1-introduction
  • Section URL: https://openstax.org/books/statistics/pages/13-4-test-of-two-variances

© Jan 23, 2024 Texas Education Agency (TEA). The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.


Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans . Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics . It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis (H 0 ) and an alternate hypothesis (H a or H 1 ).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test .
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

  • Step 1: State your null and alternate hypothesis
  • Step 2: Collect data
  • Step 3: Perform a statistical test
  • Step 4: Decide whether to reject or fail to reject your null hypothesis
  • Step 5: Present your findings
  • Frequently asked questions about hypothesis testing

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H o ) and alternate (H a ) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H 0 : Men are, on average, not taller than women. H a : Men are, on average, taller than women.


For a statistical test to be valid , it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p -value . This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p -value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data .

For example, a t test comparing the average height of men and women will give you:

  • an estimate of the difference in average height between the two groups.
  • a p -value showing how likely you are to see this difference if the null hypothesis of no difference is true (both outputs are illustrated in the sketch below).
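The sketch below (Python with scipy; the height data are synthetic and purely illustrative) shows both of these outputs for the men-versus-women height example.

```python
import numpy as np
from scipy import stats

# Synthetic heights in cm, for illustration only.
rng = np.random.default_rng(42)
men = rng.normal(loc=178, scale=7, size=50)
women = rng.normal(loc=165, scale=6, size=50)

estimate = men.mean() - women.mean()            # estimated difference in average height
t_stat, p_value = stats.ttest_ind(men, women)   # two-sample t test of the null of no difference
print(f"difference = {estimate:.1f} cm, p-value = {p_value:.2g}")
```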

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p -value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).
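To make the decision rule concrete, here is a minimal sketch, not taken from the original article, of a one-sided two-sample t-test on made-up height data. It assumes Python with numpy and scipy; the sample sizes, means, and spreads are hypothetical.

```python
# Illustrative sketch: two-sample t-test on hypothetical height data,
# with the usual alpha = 0.05 decision rule.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
men = rng.normal(loc=175, scale=7, size=50)    # hypothetical male heights (cm)
women = rng.normal(loc=170, scale=7, size=50)  # hypothetical female heights (cm)

# One-sided test: H0: men are not taller on average, Ha: men are taller on average
t_stat, p_value = stats.ttest_ind(men, women, alternative="greater")

alpha = 0.05
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject the null hypothesis")
```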


The results of hypothesis testing will be presented in the results and discussion sections of your research paper , dissertation or thesis .

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p -value). In the discussion , you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis . This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis . But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis .

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Cite this Scribbr article


Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved June 9, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/


Module 13: F-Distribution and One-Way ANOVA

Test of Two Variances

Learning Outcomes

  • Conduct and interpret hypothesis tests of two variances

Another of the uses of the F distribution is testing two variances. It is often desirable to compare two variances rather than two averages. For instance, college administrators would like two college professors grading exams to have the same variation in their grading. In order for a lid to fit a container, the variation in the lid and the container should be the same. A supermarket might be interested in the variability of check-out times for two checkers.

In order to perform an F test of two variances, it is important that the following are true: (1) the populations from which the two samples are drawn are normally distributed, and (2) the two populations are independent of each other.

Unlike most other tests in this book, the F test for equality of two variances is very sensitive to deviations from normality. If the two distributions are not normal, the test can give higher p -values than it should, or lower ones, in ways that are unpredictable. Many texts suggest that students not use this test at all, but in the interest of completeness we include it here.

Suppose we sample randomly from two independent normal populations. Let [latex]\displaystyle{\sigma}_{1}^{2}[/latex] and [latex]\displaystyle{\sigma}_{2}^{2}[/latex] be the unknown population variances and [latex]\displaystyle{s}_{1}^{2}[/latex] and [latex]\displaystyle{s}_{2}^{2}[/latex] be the sample variances. Let the sample sizes be n 1 and n 2 . Since we are interested in comparing the two sample variances, we use the F ratio:

[latex]\displaystyle{F}=\frac{\left[\frac{{s}_{1}^{2}}{{\sigma}_{1}^{2}}\right]}{\left[\frac{{s}_{2}^{2}}{{\sigma}_{2}^{2}}\right]}[/latex]

F has the distribution F ~ F ( n 1 – 1, n 2 – 1)

where n 1 – 1 are the degrees of freedom for the numerator and n 2 – 1 are the degrees of freedom for the denominator.

If the null hypothesis is [latex]\displaystyle{\sigma}_{1}^{2}={\sigma}_{2}^{2}[/latex], then the F ratio becomes [latex]\displaystyle{F}=\frac{\left[\frac{{s}_{1}^{2}}{{\sigma}_{1}^{2}}\right]}{\left[\frac{{s}_{2}^{2}}{{\sigma}_{2}^{2}}\right]}=\frac{{s}_{1}^{2}}{{s}_{2}^{2}}[/latex].

The F ratio could also be [latex]\displaystyle\frac{{s}_{2}^{2}}{{s}_{1}^{2}}[/latex]. It depends on H a and on which sample variance is larger. If the two populations have equal variances, then [latex]\displaystyle{s}_{1}^{2}[/latex] and [latex]\displaystyle{s}_{2}^{2}[/latex] are close in value and F = [latex]\displaystyle\frac{{s}_{1}^{2}}{{s}_{2}^{2}}[/latex] is close to one. But if the two population variances are very different, [latex]\displaystyle{s}_{1}^{2}[/latex] and [latex]\displaystyle{s}_{2}^{2}[/latex] tend to be very different, too. Choosing [latex]\displaystyle{s}_{1}^{2}[/latex] as the larger sample variance causes the ratio [latex]\displaystyle\frac{{s}_{1}^{2}}{{s}_{2}^{2}}[/latex] to be greater than one. If [latex]\displaystyle{s}_{1}^{2}[/latex] and [latex]\displaystyle{s}_{2}^{2}[/latex] are far apart, then F = [latex]\displaystyle\frac{{s}_{1}^{2}}{{s}_{2}^{2}}[/latex] is a large number.

Therefore, if F is close to one, the evidence favors the null hypothesis (the two population variances are equal). But if F is much larger than one, then the evidence is against the null hypothesis. A test of two variances may be left, right, or two-tailed.

Two college instructors are interested in whether or not there is any variation in the way they grade math exams. They each grade the same set of 30 exams. The first instructor’s grades have a variance of 52.3. The second instructor’s grades have a variance of 89.9. Test the claim that the first instructor’s variance is smaller. (In most colleges, it is desirable for the variances of exam grades to be nearly the same among instructors.) The level of significance is 10%.

[latex]\displaystyle{F}=\frac{\left[\frac{{s}_{1}^{2}}{{\sigma}_{1}^{2}}\right]}{\left[\frac{{s}_{2}^{2}}{{\sigma}_{2}^{2}}\right]}=\frac{{s}_{1}^{2}}{{s}_{2}^{2}}=\frac{{52.3}}{{89.9}}={0.5818}[/latex]

Distribution for the test: F 29,29 where n 1 – 1 = 29 and n 2 – 1 = 29.

Graph: This test is left-tailed. Draw the graph, labeling and shading appropriately.

[Figure: left-tailed F test with the area to the left of F = 0.5818 shaded; p-value = P(F < 0.5818) = 0.0753]

Decision: Since the p-value (0.0753) is less than the level of significance (0.10), reject the null hypothesis. There is sufficient evidence to conclude that the variance in the first instructor's grades is smaller.
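The example can be reproduced with statistical software. Below is a minimal sketch, assuming Python with scipy (an F table or any other statistics package gives the same values).

```python
# Sketch of the exam-grading example: left-tailed F test of two variances.
from scipy import stats

s1_sq, s2_sq = 52.3, 89.9          # sample variances of the two instructors
n1, n2 = 30, 30                    # each instructor graded 30 exams

F = s1_sq / s2_sq                  # F ratio under H0: sigma1^2 = sigma2^2
p_value = stats.f.cdf(F, n1 - 1, n2 - 1)   # left tail, df = (29, 29)

print(round(F, 4), round(p_value, 4))      # approximately 0.5818 and 0.0753, as above

alpha = 0.10
print("reject H0" if p_value < alpha else "do not reject H0")
```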

The New York Choral Society divides male singers up into four categories from highest voices to lowest: Tenor1, Tenor2, Bass1, Bass2. In the table are heights of the men in the Tenor1 and Bass2 groups. One suspects that taller men will have lower voices, and that the variance of height may go up with the lower voices as well. Do we have good evidence that the variance of the heights of singers in each of these two groups (Tenor1 and Bass2) are different?

Tenor1  Bass2   Tenor1  Bass2   Tenor1  Bass2
69      72      67      72      68      67
72      75      70      74      67      70
71      67      65      70      64      70
66      75      72      66              69
76      74      70      68              72
74      72      68      75              71
71      72      64      68              74
66      74      73      70              75
68      72      66      72

The histograms are not as normal as one might like. Plot them to verify. However, we proceed with the test in any case.

Subscripts: T1= tenor1 and B2 = bass 2

The standard deviations of the samples are s T 1 = 3.3302 and s B 2 = 2.7208.

The hypotheses are

[latex]\displaystyle{H}_{{o}}:{\sigma}_{{T1}}^{{2}}={\sigma}_{{B2}}^{{2}}[/latex] and [latex]\displaystyle{H}_{{a}}:{\sigma}_{{T1}}^{{2}}\neq{\sigma}_{{B2}}^{{2}}[/latex] (two-tailed test)

The F statistic is 1.4894 with 20 and 25 degrees of freedom.

The p -value is 0.3430. If we assume alpha is 0.05, then we cannot reject the null hypothesis.

We have no good evidence from the data that the heights of Tenor1 and Bass2 singers have different variances (despite there being a significant difference in mean heights of about 2.5 inches.)
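As a cross-check, the test can be recomputed from the raw heights in the table above. The sketch below assumes Python with numpy and scipy; because the reported standard deviations are rounded, the output may differ from them in the last decimal place.

```python
# Sketch: two-tailed F test of equal variances for the Tenor1 and Bass2 heights.
import numpy as np
from scipy import stats

tenor1 = [69, 72, 71, 66, 76, 74, 71, 66, 68,
          67, 70, 65, 72, 70, 68, 64, 73, 66,
          68, 67, 64]
bass2 = [72, 75, 67, 75, 74, 72, 72, 74, 72,
         72, 74, 70, 66, 68, 75, 68, 70, 72,
         67, 70, 70, 69, 72, 71, 74, 75]

var_t1 = np.var(tenor1, ddof=1)    # sample variance, Tenor1
var_b2 = np.var(bass2, ddof=1)     # sample variance, Bass2

F = var_t1 / var_b2
dfn, dfd = len(tenor1) - 1, len(bass2) - 1     # 20 and 25 degrees of freedom
# Two-tailed p-value: twice the smaller tail area
p_value = 2 * min(stats.f.cdf(F, dfn, dfd), stats.f.sf(F, dfn, dfd))

print(round(F, 4), round(p_value, 4))  # close to the reported F = 1.4894, p = 0.3430
```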

  • Introductory Statistics, Test of Two Variances. Located at : http://cnx.org/contents/[email protected] . License : CC BY: Attribution
  • Introductory Statistics . Authored by : Barbara Illowski, Susan Dean. Provided by : Open Stax. Located at : http://cnx.org/contents/[email protected] . License : CC BY: Attribution . License Terms : Download for free at http://cnx.org/contents/[email protected]

Chi-Square test for One Pop. Variance

Instructions: This calculator conducts a Chi-Square test for one population variance (\(\sigma^2\)). Please select the null and alternative hypotheses, type the hypothesized variance, the significance level, the sample variance, and the sample size, and the results of the Chi-Square test will be presented for you:


Chi-Square test for One Population Variance

More about the Chi-Square test for one variance so you can better understand the results provided by this solver: A Chi-Square test for one population variance is a hypothesis test that attempts to make a claim about the population variance (\(\sigma^2\)) based on sample information.

Main Properties of the Chi-Square Distribution

The test, as every other well formed hypothesis test, has two non-overlapping hypotheses, the null and the alternative hypothesis. The null hypothesis is a statement about the population variance which represents the assumption of no effect, and the alternative hypothesis is the complementary hypothesis to the null hypothesis.

The main properties of a one sample Chi-Square test for one population variance are:

  • The distribution of the test statistic is the Chi-Square distribution, with n-1 degrees of freedom
  • The Chi-Square distribution is one of the most important distributions in statistics, together with the normal distribution and the F-distribution
  • Depending on our knowledge about the "no effect" situation, the Chi-Square test can be two-tailed, left-tailed or right-tailed
  • The main principle of hypothesis testing is that the null hypothesis is rejected if the test statistic obtained is sufficiently unlikely under the assumption that the null hypothesis is true
  • The p-value is the probability of obtaining sample results as extreme or more extreme than the sample results obtained, under the assumption that the null hypothesis is true
  • In a hypothesis test there are two types of errors. A Type I error occurs when we reject a true null hypothesis, and a Type II error occurs when we fail to reject a false null hypothesis

Chi-Square test for one variance

Can you use Chi-square for one variable?

Absolutely! The Chi-Square statistic is very versatile: it can be used in a one-way situation (one variable), for example for testing one variance or for a goodness of fit test .

But it can also be used for a two-way situation (two variables) for example for a Chi-Square test of independence .

How do you do a hypothesis test for a single population variance?

The sample variance \(s^2\) has some very interesting distributional properties. In fact, based on how the variance is constructed, we can think of it as a sum of pieces, each of which is a squared standard normal variable.

Without getting into much detail, the sum of squared standard normal distributions is tightly related to the Chi-Square distribution, as we will see in the next section.

What is the Chi-Square Formula?

The formula for a Chi-Square statistic for testing one population variance is

\(\chi^2 = \displaystyle\frac{(n-1)s^2}{\sigma_0^2}\)

where \(s^2\) is the sample variance, \(\sigma_0^2\) is the hypothesized population variance, and \(n\) is the sample size.

The null hypothesis is rejected when the Chi-Square statistic lies on the rejection region, which is determined by the significance level (\(\alpha\)) and the type of tail (two-tailed, left-tailed or right-tailed).

To compute critical values directly, please go to our Chi-Square critical values calculator
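As an illustration (separate from the calculator itself), here is a minimal sketch of the test in Python with scipy; the sample size, sample variance, and hypothesized variance are hypothetical numbers chosen only to show the mechanics.

```python
# Sketch: Chi-Square test for one population variance with hypothetical inputs.
from scipy import stats

n = 25              # hypothetical sample size
s_sq = 12.4         # hypothetical sample variance
sigma0_sq = 9.0     # hypothesized population variance under H0

df = n - 1
chi_sq = df * s_sq / sigma0_sq     # test statistic (n-1)s^2 / sigma0^2

p_right = stats.chi2.sf(chi_sq, df)   # right-tailed: Ha: sigma^2 > sigma0^2
p_left = stats.chi2.cdf(chi_sq, df)   # left-tailed:  Ha: sigma^2 < sigma0^2
p_two = 2 * min(p_left, p_right)      # two-tailed:   Ha: sigma^2 != sigma0^2

print(round(chi_sq, 4), round(p_right, 4), round(p_two, 4))
```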



Hypothesis testing for the variance of a population

Someone is trying to introduce a new process in the production of a precision instrument for industrial use. The new process keeps the average weight but hopes to reduce the variability, which until now has been characterized by $\sigma^2 = 14.5$ . Because the complete introduction of the new process has costs, a test has been done and 16 instruments have been produced with this new method. For $\alpha = 0.05$ and knowing that the sample variance is $s^2 = 6.8$ , what is the decision to take? Suppose that the universe can be considered approximately normal.

I have the population variance, so I can use the normal distribution. My hypothesis is

$$H_0 : \sigma ^2 = 14.5$$

The test value:

$$X^2_0 = \frac{15 \cdot 6.8^2}{14.5^2} = 3.2989 $$

$$X^2_{\alpha,n-1} = X^2_{0.05,15} = 25.00$$

$X^2_0 > X^2_{\alpha,n-1}$ is false, so I fail to reject $H_0$?

  • probability
  • hypothesis-testing


  • $\begingroup$ Some difficulties here, also mentioned in my Answer. (1) No alternative hypothesis is stated. Important to know if it's one-sided or two-sided. (2) And why do you write $6.8^2$ when $S^2 = 6.8$ is already a sample variance? (3) You need to clarify your notation for numbers associated with a chi-squared distribution. $\endgroup$ –  BruceET May 27, 2020 at 3:03
  • $\begingroup$ @BruceET The OP may have failed to state an alternative hypothesis, but I think it is clear from the quoted question what it is supposed to be. $\endgroup$ –  StubbornAtom May 27, 2020 at 6:52
  • 1 $\begingroup$ @StubbornAtom. You're probably right. But in my experience, if engineers can make a process better they can (often do) make it worse. But then I suppose the data would be hidden rather than analyzed. $\endgroup$ –  BruceET May 27, 2020 at 7:25
  • $\begingroup$ @BruceET "But in my experience, if engineers can make a process better they can (often do) make it worse. " sounds like my statistics professor. He's an engineer. $\endgroup$ –  Segmentation fault May 28, 2020 at 12:33

2 Answers

"Suppose that the universe can be considered approximately normal."

If you are willing to make the assumption that the underlying data are IID normal then you can derive a confidence interval from the pivotal quantity $(n-1) S^2/\sigma^2 \sim \text{ChiSq}(n-1)$ . The other answer by BruceET shows you how to do this, and this will solve your immediate question.

However, if you want to learn sampling theory properly, I strongly recommend you go beyond the immediate question here, and think about what you would need to do if you are not willing to make the (ridiculous) assumption that "the universe is approximately normal". You can find a more general discussion of moment results in sampling theory, and a solution to this broader problem, in O'Neill (2014) . The main thing to note is that the variance of the sample variance depends on the kurtosis of the underlying population distribution. If you assume a normal distribution then you are assuming knowledge of the kurtosis of the population, without reference to the actual data. This means that you will often over or underestimate the variability of the sample variance and get bad results for confidence intervals for the true variance.

As a general rule, if you want to look at the true moments of sample moments, you need to add their orders to find out how many moments of the underlying population affect this. Thus, determination of the variance (second order) of the sample variance (second order) requires knowledge up to the fourth moment of the underlying population distribution. It is generally a bad idea to ignore the data and instead assume specific values for underlying moments that affect the problem at hand. Thus, when formulating a confidence interval for the true variance of the population, it is generally a bad idea to assume normality of the population, since that assumes a specific value for the kurtosis, which affects the variance of the sample variance.


  • 1 $\begingroup$ Good to have this (+1). // I have just added some reminders of the normal assumption to my Answer. (Also fixed some typos.) $\endgroup$ –  BruceET May 27, 2020 at 2:58

Because $\frac{(n-1)S^2}{\sigma^2} \sim \mathsf{Chisq}(\nu=n-1),$ a 95% confidence interval (CI) for $\sigma^2$ based on a sample variance $S^2$ is of the form $\left(\frac{(n-1)S^2}{U},\, \frac{(n-1)S^2}{L}\right),$ where $L$ and $U$ cut probability $0.025$ from the lower and upper tails of $\mathsf{Chisq}(\nu=n-1),$ respectively. [This confidence interval is based on the assumption that weights are normal.]

Thus a 95% CI for $\sigma^2$ based on $S^2 = 6.8$ for a normal sample of size $n = 16$ is $(3.71, 16.29).$ This CI represents non-rejectable values of $\sigma_0^2.$ [The computation can be done with statistical software, as sketched below, or verified using printed tables of the chi-squared distribution.]
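A minimal sketch of that computation, assuming Python with scipy (the same quantiles can be read from a printed chi-squared table):

```python
# Sketch: 95% confidence interval for sigma^2 from S^2 = 6.8, n = 16.
from scipy import stats

n, s_sq = 16, 6.8
df = n - 1

L = stats.chi2.ppf(0.025, df)   # cuts 2.5% from the lower tail of ChiSq(15)
U = stats.chi2.ppf(0.975, df)   # cuts 2.5% from the upper tail of ChiSq(15)

ci_two_sided = (df * s_sq / U, df * s_sq / L)
print(ci_two_sided)             # approximately (3.71, 16.29)

# One-sided 95% upper confidence bound for sigma^2 (used for the one-sided test)
upper_bound = df * s_sq / stats.chi2.ppf(0.05, df)
print(upper_bound)              # approximately 14.05
```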

Because $\sigma_0^2 = 14.5$ lies in the CI, you cannot reject $H_0: \sigma^2 = 14.5$ against $H_a: \sigma^2 \ne 14.5$ at the 5% level.

Can you find critical values for the test statistic $Q = \frac{(n-1)S^2}{\sigma_0^2},$ beyond which the null hypothesis would be rejected (in either direction)?

Notes: There is some confusion in your question: (1) You do not state null and alternative hypotheses, so it is not clear whether you are doing a one- or two-sided test. (2) You seem to be mixing up sample standard deviation $S$ with $S^2$ ---you are squaring $S^2.$

If you want a one-sided test, you need a one-sided CI, which gives an upper confidence bound: the interval $(0, 14.05),$ which does not contain $14.5.$

In this case, can you find a critical value for the test statistic $Q =\frac{(n-1)S^2}{\sigma_0^2},$ below which the null hypothesis $H_0: \sigma^2 = 14.5$ would be rejected in favor of the one-sided alternative $H_a: \sigma^2 < 14.5?$




9.2: Hypothesis Testing



All hypothesis tests have the same basic steps:

  • Determine the hypothesis : What are we trying to figure out? This is formally written as the null and alternative hypotheses.
  • Calculate the evidence : This will be a test statistic and either a critical value or a p-value.
  • Make a decision : The options will be Reject the Null Hypothesis or Do not Reject the Null Hypothesis.
  • Determine the conclusion : What does the decision mean in terms of the problem given?

Null and Alternative Hypotheses

The actual test begins by considering two hypotheses . They are called the null hypothesis and the alternative hypothesis . These hypotheses contain opposing viewpoints.

\(H_0\): The null hypothesis: It is a statement of no difference between the variables; they are not related. This can often be considered the status quo and, as a result, if the null hypothesis is rejected, some action is required.

\(H_a\): The alternative hypothesis: It is a claim about the population that is contradictory to \(H_0\) and what we conclude when we reject \(H_0\). This is usually what the researcher is trying to prove.

Since the null and alternative hypotheses are contradictory, you must examine evidence to decide if you have enough evidence to reject the null hypothesis or not. The evidence is in the form of sample data.

After you have determined which hypothesis the sample supports, you make a decision. There are two options for a decision. They are "reject \(H_0\)" if the sample information favors the alternative hypothesis or "do not reject \(H_0\)" or "decline to reject \(H_0\)" if the sample information is insufficient to reject the null hypothesis.

Table \(\PageIndex{1}\): Mathematical Symbols Used in \(H_{0}\) and \(H_{a}\):

  • If \(H_{0}\) has equal (=), then \(H_{a}\) has not equal \((\neq)\), greater than (>), or less than (<).
  • If \(H_{0}\) has greater than or equal to \((\geq)\), then \(H_{a}\) has less than (<).
  • If \(H_{0}\) has less than or equal to \((\leq)\), then \(H_{a}\) has greater than (>).

\(H_{0}\) always has a symbol with an equal in it. \(H_{a}\) never has a symbol with an equal in it. The choice of symbol depends on the wording of the hypothesis test. However, be aware that many researchers (including one of the co-authors in research work) use = in the null hypothesis, even with > or < as the symbol in the alternative hypothesis. This practice is acceptable because we only make the decision to reject or not reject the null hypothesis.

Example \(\PageIndex{1}\)

  • \(H_{0}\): No more than 30% of the registered voters in Santa Clara County voted in the primary election. \(p \leq 0.30\)
  • \(H_{a}\): More than 30% of the registered voters in Santa Clara County voted in the primary election. \(p > 0.30\)

Exercise \(\PageIndex{1}\)

A medical trial is conducted to test whether or not a new medicine reduces cholesterol by 25%. State the null and alternative hypotheses.

  • \(H_{0}\): The drug reduces cholesterol by 25%. \(p = 0.25\)
  • \(H_{a}\): The drug does not reduce cholesterol by 25%. \(p \neq 0.25\)

Example \(\PageIndex{2}\)

We want to test whether the mean GPA of students in American colleges is different from 2.0 (out of 4.0). The null and alternative hypotheses are:

  • \(H_{0}: \mu = 2.0\)
  • \(H_{a}: \mu \neq 2.0\)

Exercise \(\PageIndex{2}\)

We want to test whether the mean height of eighth graders is 66 inches. State the null and alternative hypotheses. Fill in the correct symbol \((=, \neq, \geq, <, \leq, >)\) for the null and alternative hypotheses.

  • \(H_{0}: \mu\) ___ 66
  • \(H_{a}: \mu\) ___ 66
  • \(H_{0}: \mu = 66\)
  • \(H_{a}: \mu \neq 66\)

Example \(\PageIndex{3}\)

We want to test if college students take less than five years to graduate from college, on the average. The null and alternative hypotheses are:

  • \(H_{0}: \mu \geq 5\)
  • \(H_{a}: \mu < 5\)

Exercise \(\PageIndex{3}\)

We want to test if it takes fewer than 45 minutes to teach a lesson plan. State the null and alternative hypotheses. Fill in the correct symbol ( =, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • \(H_{0}: \mu\) ___ 45
  • \(H_{a}: \mu\) ___ 45
  • \(H_{0}: \mu \geq 45\)
  • \(H_{a}: \mu < 45\)

Example \(\PageIndex{4}\)

In an issue of U. S. News and World Report , an article on school standards stated that about half of all students in France, Germany, and Israel take advanced placement exams and a third pass. The same article stated that 6.6% of U.S. students take advanced placement exams and 4.4% pass. Test if the percentage of U.S. students who take advanced placement exams is more than 6.6%. State the null and alternative hypotheses.

  • \(H_{0}: p \leq 0.066\)
  • \(H_{a}: p > 0.066\)

Exercise \(\PageIndex{4}\)

On a state driver’s test, about 40% pass the test on the first try. We want to test if more than 40% pass on the first try. Fill in the correct symbol (\(=, \neq, \geq, <, \leq, >\)) for the null and alternative hypotheses.

  • \(H_{0}: p\) ___ 0.40
  • \(H_{a}: p\) ___ 0.40
  • \(H_{0}: p = 0.40\)
  • \(H_{a}: p > 0.40\)

COLLABORATIVE EXERCISE

Bring to class a newspaper, some news magazines, and some Internet articles . In groups, find articles from which your group can write null and alternative hypotheses. Discuss your hypotheses with the rest of the class.

Outcomes and the Type I and Type II Errors

When you perform a hypothesis test, there are four possible outcomes depending on the actual truth (or falseness) of the null hypothesis \(H_{0}\) and the decision to reject or not. The outcomes are summarized in the following table:

                         \(H_{0}\) is actually true    \(H_{0}\) is actually false
Do not reject \(H_{0}\)   Correct outcome               Type II error
Reject \(H_{0}\)          Type I error                  Correct outcome

The four possible outcomes in the table are:

  • The decision is not to reject \(H_{0}\) when \(H_{0}\) is true (correct decision).
  • The decision is to reject \(H_{0}\) when \(H_{0}\) is true (incorrect decision known as a Type I error).
  • The decision is not to reject \(H_{0}\) when, in fact, \(H_{0}\) is false (incorrect decision known as a Type II error).
  • The decision is to reject \(H_{0}\) when \(H_{0}\) is false ( correct decision whose probability is called the Power of the Test ).

Each of the errors occurs with a particular probability. The Greek letters \(\alpha\) and \(\beta\) represent the probabilities.

  • \(\alpha =\) probability of a Type I error \(= P(\text{Type I error}) =\) probability of rejecting the null hypothesis when the null hypothesis is true.
  • \(\beta =\) probability of a Type II error \(= P(\text{Type II error}) =\) probability of not rejecting the null hypothesis when the null hypothesis is false.

\(\alpha\) and \(\beta\) should be as small as possible because they are probabilities of errors. They are rarely zero.

The Power of the Test is \(1 - \beta\). Ideally, we want a high power that is as close to one as possible. Increasing the sample size can increase the Power of the Test. The following are examples of Type I and Type II errors.

Example \(\PageIndex{5}\): Type I vs. Type II errors

Suppose the null hypothesis, \(H_{0}\), is: Frank's rock climbing equipment is safe.

  • Type I error : Frank thinks that his rock climbing equipment may not be safe when, in fact, it really is safe.
  • Type II error : Frank thinks that his rock climbing equipment may be safe when, in fact, it is not safe.

\(\alpha =\) probability that Frank thinks his rock climbing equipment may not be safe when, in fact, it really is safe.

\(\beta =\) probability that Frank thinks his rock climbing equipment may be safe when, in fact, it is not safe.

Notice that, in this case, the error with the greater consequence is the Type II error. (If Frank thinks his rock climbing equipment is safe, he will go ahead and use it.)

Exercise \(\PageIndex{5}\)

Suppose the null hypothesis, \(H_{0}\), is: the blood cultures contain no traces of pathogen \(X\). State the Type I and Type II errors.

  • Type I error : The researcher thinks the blood cultures do contain traces of pathogen \(X\), when in fact, they do not.
  • Type II error : The researcher thinks the blood cultures do not contain traces of pathogen \(X\), when in fact, they do.

Example \(\PageIndex{6}\)

Suppose the null hypothesis, \(H_{0}\), is: The victim of an automobile accident is alive when he arrives at the emergency room of a hospital.

  • Type I error : The emergency crew thinks that the victim is dead when, in fact, the victim is alive.
  • Type II error : The emergency crew does not know if the victim is alive when, in fact, the victim is dead.

\(\alpha =\) probability that the emergency crew thinks the victim is dead when, in fact, he is really alive \(= P(\text{Type I error})\).

\(\beta =\) probability that the emergency crew does not know if the victim is alive when, in fact, the victim is dead \(= P(\text{Type II error})\).

The error with the greater consequence is the Type I error. (If the emergency crew thinks the victim is dead, they will not treat him.)

Exercise \(\PageIndex{6}\)

Suppose the null hypothesis, \(H_{0}\), is: a patient is not sick. Which type of error has the greater consequence, Type I or Type II?

The error with the greater consequence is the Type II error: the patient will be thought well when, in fact, he is sick, so he will not get treatment.

Example \(\PageIndex{7}\)

It’s a Boy Genetic Labs claim to be able to increase the likelihood that a pregnancy will result in a boy being born. Statisticians want to test the claim. Suppose that the null hypothesis, \(H_{0}\), is: It’s a Boy Genetic Labs has no effect on gender outcome.

  • Type I error : This results when a true null hypothesis is rejected. In the context of this scenario, we would state that we believe that It’s a Boy Genetic Labs influences the gender outcome, when in fact it has no effect. The probability of this error occurring is denoted by the Greek letter alpha, \(\alpha\).
  • Type II error : This results when we fail to reject a false null hypothesis. In context, we would state that It’s a Boy Genetic Labs does not influence the gender outcome of a pregnancy when, in fact, it does. The probability of this error occurring is denoted by the Greek letter beta, \(\beta\).

The error of greater consequence would be the Type I error since couples would use the It’s a Boy Genetic Labs product in hopes of increasing the chances of having a boy.

Exercise \(\PageIndex{7}\)

“Red tide” is a bloom of poison-producing algae–a few different species of a class of plankton called dinoflagellates. When the weather and water conditions cause these blooms, shellfish such as clams living in the area develop dangerous levels of a paralysis-inducing toxin. In Massachusetts, the Division of Marine Fisheries (DMF) monitors levels of the toxin in shellfish by regular sampling of shellfish along the coastline. If the mean level of toxin in clams exceeds 800 μg (micrograms) of toxin per kg of clam meat in any area, clam harvesting is banned there until the bloom is over and levels of toxin in clams subside. Describe both a Type I and a Type II error in this context, and state which error has the greater consequence.

In this scenario, an appropriate null hypothesis would be \(H_{0}\): the mean level of toxins is at most \(800 \mu\text{g}\), \(H_{0}: \mu_{0} \leq 800 \mu\text{g}\).

Example \(\PageIndex{8}\)

A certain experimental drug claims a cure rate of at least 75% for males with prostate cancer. Describe both the Type I and Type II errors in context. Which error is the more serious?

  • Type I : A cancer patient believes the cure rate for the drug is less than 75% when it actually is at least 75%.
  • Type II : A cancer patient believes the experimental drug has at least a 75% cure rate when it has a cure rate that is less than 75%.

In this scenario, the Type II error contains the more severe consequence. If a patient believes the drug works at least 75% of the time, this most likely will influence the patient’s (and doctor’s) choice about whether to use the drug as a treatment option.

Exercise \(\PageIndex{8}\)

Determine both Type I and Type II errors for the following scenario:

Assume a null hypothesis, \(H_{0}\), that states the percentage of adults with jobs is at least 88%. Identify the Type I and Type II errors from these four statements.

  • Not to reject the null hypothesis that the percentage of adults who have jobs is at least 88% when that percentage is actually less than 88%
  • Not to reject the null hypothesis that the percentage of adults who have jobs is at least 88% when the percentage is actually at least 88%.
  • Reject the null hypothesis that the percentage of adults who have jobs is at least 88% when the percentage is actually at least 88%.
  • Reject the null hypothesis that the percentage of adults who have jobs is at least 88% when that percentage is actually less than 88%.

Type I error: c

Type II error: a

Distribution Needed for Hypothesis Testing

Earlier in the course, we discussed sampling distributions. Particular distributions are associated with hypothesis testing. Perform tests of a population mean using a normal distribution or a Student's \(t\)-distribution. (Remember, use a Student's \(t\)-distribution when the population standard deviation is unknown and the distribution of the sample mean is approximately normal.) We perform tests of a population proportion using a normal distribution (usually when the sample size \(n\) is large).

If you are testing a single population mean, the distribution for the test is for means :

\[\bar{X} \sim N\left(\mu_{X}, \frac{\sigma_{X}}{\sqrt{n}}\right)\]

The population parameter is \(\mu\). The estimated value (point estimate) for \(\mu\) is \(\bar{x}\), the sample mean.

If you are testing a single population proportion, the distribution for the test is for proportions or percentages:

\[P' \sim N\left(p, \sqrt{\frac{pq}{n}}\right)\]

The population parameter is \(p\). The estimated value (point estimate) for \(p\) is \(p′\). \(p' = \frac{x}{n}\) where \(x\) is the number of successes and n is the sample size.

Assumptions

When you perform a hypothesis test of a single population mean \(\mu\) using a Student's \(t\)-distribution (often called a \(t\)-test), there are fundamental assumptions that need to be met in order for the test to work properly. Your data should be a simple random sample that comes from a population that is approximately normally distributed. You use the sample standard deviation to approximate the population standard deviation. (Note that if the sample size is sufficiently large, a \(t\)-test will work even if the population is not approximately normally distributed).

When you perform a hypothesis test of a single population mean \(\mu\) using a normal distribution (often called a \(z\)-test), you take a simple random sample from the population. The population you are testing is normally distributed or your sample size is sufficiently large. You know the value of the population standard deviation which, in reality, is rarely known.

When you perform a hypothesis test of a single population proportion \(p\), you take a simple random sample from the population. You must meet the conditions for a binomial distribution which are: there are a certain number \(n\) of independent trials, the outcomes of any trial are success or failure, and each trial has the same probability of a success \(p\). The shape of the binomial distribution needs to be similar to the shape of the normal distribution. To ensure this, the quantities \(np\) and \(nq\) must both be greater than five \((np > 5\) and \(nq > 5)\). Then the binomial distribution of a sample (estimated) proportion can be approximated by the normal distribution with \(\mu = p\) and \(\sigma = \sqrt{\frac{pq}{n}}\). Remember that \(q = 1 – p\).
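To make these conditions concrete, here is a minimal sketch of a one-proportion z-test in Python with scipy; the sample size, number of successes, and hypothesized proportion are hypothetical.

```python
# Illustrative sketch: one-proportion z-test with the np > 5 and nq > 5 checks.
import math
from scipy import stats

n, x = 200, 93        # hypothetical sample size and number of successes
p0 = 0.40             # hypothesized population proportion under H0
q0 = 1 - p0

# Check the normal-approximation conditions described above
assert n * p0 > 5 and n * q0 > 5, "normal approximation is not appropriate"

p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * q0 / n)   # standardized test statistic
p_value = stats.norm.sf(z)                  # right-tailed test: Ha: p > 0.40

print(round(z, 3), round(p_value, 4))
```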

Rare Events, the Sample, Decision and Conclusion

Establishing the type of distribution, sample size, and known or unknown standard deviation can help you figure out how to go about a hypothesis test. However, there are several other factors you should consider when working out a hypothesis test.

Rare Events

Suppose you make an assumption about a property of the population (this assumption is the null hypothesis). Then you gather sample data randomly. If the sample has properties that would be very unlikely to occur if the assumption is true, then you would conclude that your assumption about the population is probably incorrect. (Remember that your assumption is just an assumption—it is not a fact and it may or may not be true. But your sample data are real and the data are showing you a fact that seems to contradict your assumption.)

For example, Didi and Ali are at a birthday party of a very wealthy friend. They hurry to be first in line to grab a prize from a tall basket that they cannot see inside because they will be blindfolded. There are 200 plastic bubbles in the basket and Didi and Ali have been told that there is only one with a $100 bill. Didi is the first person to reach into the basket and pull out a bubble. Her bubble contains a $100 bill. The probability of this happening is \(\frac{1}{200} = 0.005\). Because this is so unlikely, Ali is hoping that what the two of them were told is wrong and there are more $100 bills in the basket. A "rare event" has occurred (Didi getting the $100 bill) so Ali doubts the assumption about only one $100 bill being in the basket.

Using the Sample to Test the Null Hypothesis

Use the sample data to calculate the actual probability of getting the test result, called the \(p\)-value. The \(p\)-value is the probability that, if the null hypothesis is true, the results from another randomly selected sample will be as extreme or more extreme as the results obtained from the given sample.

A large \(p\)-value calculated from the data indicates that we should not reject the null hypothesis. The smaller the \(p\)-value, the more unlikely the outcome, and the stronger the evidence is against the null hypothesis. We would reject the null hypothesis if the evidence is strongly against it.

Draw a graph that shows the \(p\)-value. The hypothesis test is easier to perform if you use a graph because you see the problem more clearly.

Example \(\PageIndex{9}\)

Suppose a baker claims that his bread height is more than 15 cm, on average. Several of his customers do not believe him. To persuade his customers that he is right, the baker decides to do a hypothesis test. He bakes 10 loaves of bread. The mean height of the sample loaves is 17 cm. The baker knows from baking hundreds of loaves of bread that the standard deviation for the height is 0.5 cm. and the distribution of heights is normal.

  • The null hypothesis could be \(H_{0}: \mu \leq 15\)
  • The alternate hypothesis is \(H_{a}: \mu > 15\)

The words "is more than" translates as a "\(>\)" so "\(\mu > 15\)" goes into the alternate hypothesis. The null hypothesis must contradict the alternate hypothesis.

Since \(\sigma\) is known (\(\sigma = 0.5\) cm), the sampling distribution of the sample mean is known to be normal with mean \(\mu = 15\) and standard deviation

\[\dfrac{\sigma}{\sqrt{n}} = \frac{0.5}{\sqrt{10}} = 0.16. \nonumber\]

Suppose the null hypothesis is true (the mean height of the loaves is no more than 15 cm). Then is the mean height (17 cm) calculated from the sample unexpectedly large? The hypothesis test works by asking the question how unlikely the sample mean would be if the null hypothesis were true. The graph shows how far out the sample mean is on the normal curve. The p -value is the probability that, if we were to take other samples, any other sample mean would fall at least as far out as 17 cm.

The \(p\) -value, then, is the probability that a sample mean is the same or greater than 17 cm. when the population mean is, in fact, 15 cm. We can calculate this probability using the normal distribution for means.


\(p\text{-value} = P(\bar{x} > 17)\) which is approximately zero.

A \(p\)-value of approximately zero tells us that it is highly unlikely that a loaf of bread rises no more than 15 cm, on average. That is, almost 0% of all loaves of bread would be at least as high as 17 cm. purely by CHANCE had the population mean height really been 15 cm. Because the outcome of 17 cm. is so unlikely (meaning it is happening NOT by chance alone) , we conclude that the evidence is strongly against the null hypothesis (the mean height is at most 15 cm.). There is sufficient evidence that the true mean height for the population of the baker's loaves of bread is greater than 15 cm.
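A minimal sketch of the baker's calculation, assuming Python with scipy and using the numbers from the example:

```python
# Sketch: z-test for a single mean with known sigma (the baker's bread heights).
import math
from scipy import stats

mu0, sigma, n, x_bar = 15, 0.5, 10, 17

se = sigma / math.sqrt(n)        # standard deviation of the sample mean, about 0.16
z = (x_bar - mu0) / se           # how many standard errors 17 cm is above 15 cm
p_value = stats.norm.sf(z)       # P(sample mean > 17) if H0 were true

print(round(se, 2), round(z, 2), p_value)   # p-value is approximately zero
```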

Exercise \(\PageIndex{9}\)

A normal distribution has a standard deviation of 1. We want to verify a claim that the mean is greater than 12. A sample of 36 is taken with a sample mean of 12.5.

  • \(H_{0}: \mu \leq 12\)
  • \(H_{a}: \mu > 12\)

The \(p\)-value is 0.0013

Draw a graph that shows the \(p\)-value.

\(p\text{-value} = 0.0013\)


Decision and Conclusion

A systematic way to make a decision of whether to reject or not reject the null hypothesis is to compare the \(p\)-value and a preset or preconceived \(\alpha\) (also called a " significance level "). A preset \(\alpha\) is the probability of a Type I error (rejecting the null hypothesis when the null hypothesis is true). It may or may not be given to you at the beginning of the problem.

When you make a decision to reject or not reject \(H_{0}\), do as follows:

  • If \(\alpha > p\text{-value}\), reject \(H_{0}\). The results of the sample data are significant. There is sufficient evidence to conclude that \(H_{0}\) is an incorrect belief and that the alternative hypothesis, \(H_{a}\), may be correct.
  • If \(\alpha \leq p\text{-value}\), do not reject \(H_{0}\). The results of the sample data are not significant. There is not sufficient evidence to conclude that the alternative hypothesis, \(H_{a}\), may be correct.

When you "do not reject \(H_{0}\)", it does not mean that you should believe that H 0 is true. It simply means that the sample data have failed to provide sufficient evidence to cast serious doubt about the truthfulness of \(H_{0}\).

Conclusion: After you make your decision, write a thoughtful conclusion about the hypotheses in terms of the given problem.

Example \(\PageIndex{10}\)

When using the \(p\)-value to evaluate a hypothesis test, it is sometimes useful to use the following memory device

  • If the \(p\)-value is low, the null must go.
  • If the \(p\)-value is high, the null must fly.

This memory aid relates a \(p\)-value less than the established alpha (the \(p\) is low) as rejecting the null hypothesis and, likewise, relates a \(p\)-value higher than the established alpha (the \(p\) is high) as not rejecting the null hypothesis.

Fill in the blanks.

Reject the null hypothesis when ______________________________________.

The results of the sample data _____________________________________.

Do not reject the null when hypothesis when __________________________________________.

The results of the sample data ____________________________________________.

Reject the null hypothesis when the \(p\) -value is less than the established alpha value . The results of the sample data support the alternative hypothesis .

Do not reject the null hypothesis when the \(p\) -value is greater than the established alpha value . The results of the sample data do not support the alternative hypothesis .

Exercise \(\PageIndex{10}\)

It’s a Boy Genetics Labs claim their procedures improve the chances of a boy being born. The results for a test of a single population proportion are as follows:

  • \(H_{0}: p = 0.50, H_{a}: p > 0.50\)
  • \(\alpha = 0.01\)
  • \(p\text{-value} = 0.025\)

Interpret the results and state a conclusion in simple, non-technical terms.

Since the \(p\)-value is greater than the established alpha value (the \(p\)-value is high), we do not reject the null hypothesis. There is not enough evidence to support It’s a Boy Genetics Labs' stated claim that their procedures improve the chances of a boy being born.

In a hypothesis test , sample data is evaluated in order to arrive at a decision about some type of claim. If certain conditions about the sample are satisfied, then the claim can be evaluated for a population. In a hypothesis test, we:

  • Evaluate the null hypothesis , typically denoted with \(H_{0}\). The null is not rejected unless the hypothesis test shows otherwise. The null statement must always contain some form of equality \((=, \leq, \text{ or } \geq)\).
  • Always write the alternative hypothesis , typically denoted with \(H_{a}\) or \(H_{1}\), using less than, greater than, or not equals symbols, i.e., \((\neq, >, \text{ or } <)\).
  • If we reject the null hypothesis, then we can assume there is enough evidence to support the alternative hypothesis.
  • Never state that a claim is proven true or false. Keep in mind the underlying fact that hypothesis testing is based on probability laws; therefore, we can talk only in terms of non-absolute certainties.

In every hypothesis test, the outcomes are dependent on a correct interpretation of the data. Incorrect calculations or misunderstood summary statistics can yield errors that affect the results. A Type I error occurs when a true null hypothesis is rejected. A Type II error occurs when a false null hypothesis is not rejected. The probabilities of these errors are denoted by the Greek letters \(\alpha\) and \(\beta\), for a Type I and a Type II error respectively. The power of the test, \(1 - \beta\), quantifies the likelihood that a test will yield the correct result of a true alternative hypothesis being accepted. A high power is desirable.

In order for a hypothesis test’s results to be generalized to a population, certain requirements must be satisfied.

When testing for a single population mean:

  • A Student's \(t\)-test should be used if the data come from a simple, random sample and the population is approximately normally distributed, or the sample size is large, with an unknown standard deviation.
  • The normal test will work if the data come from a simple, random sample and the population is approximately normally distributed, or the sample size is large, with a known standard deviation.

When testing a single population proportion use a normal test for a single population proportion if the data comes from a simple, random sample, fill the requirements for a binomial distribution, and the mean number of successes and the mean number of failures satisfy the conditions: \(np > 5\) and \(nq > 5\) where \(n\) is the sample size, \(p\) is the probability of a success, and \(q\) is the probability of a failure.

When the probability of an event occurring is low, and it happens, it is called a rare event. Rare events are important to consider in hypothesis testing because they can inform your willingness not to reject or to reject a null hypothesis. To test a null hypothesis, find the p -value for the sample data and graph the results. When deciding whether or not to reject the null hypothesis, keep these two parameters in mind:

  • If \(\alpha > p\text{-value}\), reject the null hypothesis
  • If \(\alpha \leq p\text{-value}\), do not reject the null hypothesis

Formula Review

\(H_{0}\) and \(H_{a}\) are contradictory.

  • If \(H_{0}\) has equal \((=)\), then \(H_{a}\) has not equal \((\neq)\), greater than \((>)\), or less than \((<)\).
  • If \(H_{0}\) has greater than or equal to \((\geq)\), then \(H_{a}\) has less than \((<)\).
  • If \(H_{0}\) has less than or equal to \((\leq)\), then \(H_{a}\) has greater than \((>)\).
  • If \(\alpha \leq p\)-value, then do not reject \(H_{0}\).
  • If \(\alpha > p\)-value, then reject \(H_{0}\).

\(\alpha\) is preconceived. Its value is set before the hypothesis test starts. The \(p\)-value is calculated from the data.

If there is no given preconceived \(\alpha\), then use \(\alpha = 0.05\).

Types of Hypothesis Tests

  • Single population mean, known population variance (or standard deviation): Normal test .
  • Single population mean, unknown population variance (or standard deviation): Student's \(t\)-test .
  • Single population proportion: Normal test .
  • For a single population mean , we may use a normal distribution with the following mean and standard deviation. Means: \(\mu = \mu_{\bar{x}}\) and \(\sigma_{\bar{x}} = \frac{\sigma_{x}}{\sqrt{n}}\)
  • For a single population proportion , we may use a normal distribution with the following mean and standard deviation. Proportions: \(\mu = p\) and \(\sigma = \sqrt{\frac{pq}{n}}\).


Properties of the Student's \(t\)-distribution (used when the population standard deviation is unknown):

  • It is continuous and assumes any real values.
  • The pdf is symmetrical about its mean of zero. However, it is more spread out and flatter at the apex than the normal distribution.
  • It approaches the standard normal distribution as \(n\) gets larger.
  • There is a "family" of \(t\)-distributions: every representative of the family is completely defined by the number of degrees of freedom which is one less than the number of data items.

Contributors and Attributions

Barbara Illowsky and Susan Dean (De Anza College) with many other contributing authors. Content produced by OpenStax College is licensed under a Creative Commons Attribution License 4.0 license. Download for free at http://cnx.org/contents/[email protected] .

P-Value And Statistical Significance: What It Is & Why It Matters

Saul Mcleod, PhD


The p-value in statistics quantifies the evidence against a null hypothesis. A low p-value suggests data is inconsistent with the null, potentially favoring an alternative hypothesis. Common significance thresholds are 0.05 or 0.01.

P-Value Explained in Normal Distribution

Hypothesis testing

When you perform a statistical test, a p-value helps you determine the significance of your results in relation to the null hypothesis.

The null hypothesis (H0) states no relationship exists between the two variables being studied (one variable does not affect the other). It states the results are due to chance and are not significant in supporting the idea being investigated. Thus, the null hypothesis assumes that whatever you try to prove did not happen.

The alternative hypothesis (Ha or H1) is the one you would believe if the null hypothesis is concluded to be untrue.

The alternative hypothesis states that the independent variable affected the dependent variable, and the results are significant in supporting the theory being investigated (i.e., the results are not due to random chance).

What a p-value tells you

A p-value, or probability value, is a number describing how likely it is that your data would have occurred by random chance (i.e., that the null hypothesis is true).

The level of statistical significance is often expressed as a p-value between 0 and 1.

The smaller the p -value, the less likely the results occurred by random chance, and the stronger the evidence that you should reject the null hypothesis.

Remember, a p-value doesn’t tell you if the null hypothesis is true or false. It just tells you how likely you’d see the data you observed (or more extreme data) if the null hypothesis was true. It’s a piece of evidence, not a definitive proof.

Example: Test Statistic and p-Value

Suppose you’re conducting a study to determine whether a new drug has an effect on pain relief compared to a placebo. If the new drug has no impact, your test statistic will be close to the one predicted by the null hypothesis (no difference between the drug and placebo groups), and the resulting p-value will be close to 1. It may not be precisely 1 because real-world variations may exist. Conversely, if the new drug indeed reduces pain significantly, your test statistic will diverge further from what’s expected under the null hypothesis, and the p-value will decrease. The p-value will never reach zero because there’s always a slim possibility, though highly improbable, that the observed results occurred by random chance.

P-value interpretation

The significance level (alpha) is a set probability threshold (often 0.05), while the p-value is the probability you calculate based on your study or analysis.

A p-value less than or equal to your predetermined significance level (commonly 0.05 or 0.01) is considered statistically significant: the observed data provide strong evidence against the null hypothesis.

This suggests the effect under study likely represents a real relationship rather than just random chance.

For instance, if you set α = 0.05, you would reject the null hypothesis if your p -value ≤ 0.05. 

It indicates strong evidence against the null hypothesis, as there is less than a 5% probability of obtaining results at least this extreme if the null hypothesis were true.

Therefore, we reject the null hypothesis in favor of the alternative hypothesis.

Example: Statistical Significance

Upon analyzing the pain relief effects of the new drug compared to the placebo, the computed p-value is less than 0.01, which falls well below the predetermined alpha value of 0.05. Consequently, you conclude that there is a statistically significant difference in pain relief between the new drug and the placebo.
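
As a minimal sketch (the numbers are hypothetical), the decision rule amounts to a single comparison in code:

    alpha = 0.05      # significance level chosen before running the test
    p_value = 0.008   # hypothetical p-value from the drug vs. placebo analysis

    if p_value <= alpha:
        print("Statistically significant: reject the null hypothesis.")
    else:
        print("Not statistically significant: fail to reject the null hypothesis.")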

What does a p-value of 0.001 mean?

A p-value of 0.001 is highly statistically significant beyond the commonly used 0.05 threshold. It indicates strong evidence of a real effect or difference, rather than just random variation.

Specifically, a p-value of 0.001 means there is only a 0.1% chance of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is correct.

Such a small p-value provides strong evidence against the null hypothesis, leading to rejecting the null in favor of the alternative hypothesis.

A p-value greater than the significance level (typically p > 0.05) is not statistically significant: the data do not provide sufficient evidence against the null hypothesis.

This means we fail to reject the null hypothesis. Note that failing to reject is not the same as accepting the null hypothesis; a hypothesis test can only reject the null or fail to reject it.

Note : a statistically significant p-value (e.g., p < 0.05) does not mean there is a 95% probability that the alternative hypothesis is true, and a p-value above your threshold does not mean the null hypothesis is true.

One-Tailed Test

In a one-tailed test, the alternative hypothesis specifies the direction of the effect, so the entire rejection region lies in one tail of the distribution and the p-value is the area in that single tail.

Two-Tailed Test

In a two-tailed test, the alternative hypothesis is non-directional, so the rejection region is split between the two tails and the p-value counts the area in both tails.

How do you calculate the p-value?

Most statistical software packages like R, SPSS, and others automatically calculate your p-value. This is the easiest and most common way.

Online resources and tables are available to estimate the p-value based on your test statistic and degrees of freedom.

These tables help you understand how often you would expect to see your test statistic under the null hypothesis.
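
For example, assuming scipy is available, a two-sided p-value can be recovered from a t statistic and its degrees of freedom by doubling the upper-tail area (the values below are hypothetical):

    from scipy import stats

    t_stat, df = 2.30, 28                           # hypothetical test statistic and degrees of freedom
    p_two_sided = 2 * stats.t.sf(abs(t_stat), df)   # area in both tails of the t distribution
    print(round(p_two_sided, 3))                    # roughly 0.03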

Understanding the Statistical Test:

Different statistical tests are designed to answer specific research questions or hypotheses. Each test has its own underlying assumptions and characteristics.

For example, you might use a t-test to compare means, a chi-squared test for categorical data, or a correlation test to measure the strength of a relationship between variables.

Be aware that the number of independent variables you include in your analysis can influence the magnitude of the test statistic needed to produce the same p-value.

This factor is particularly important to consider when comparing results across different analyses.

Example: Choosing a Statistical Test

If you’re comparing the effectiveness of just two different drugs in pain relief, a two-sample t-test is a suitable choice for comparing these two groups. However, when you’re examining the impact of three or more drugs, it’s more appropriate to employ an Analysis of Variance ( ANOVA) . Utilizing multiple pairwise comparisons in such cases can lead to artificially low p-values and an overestimation of the significance of differences between the drug groups.
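
A brief sketch of that choice in Python, using invented scores for three drug groups and scipy's one-way ANOVA (scipy.stats.f_oneway) rather than three separate pairwise t-tests:

    from scipy import stats

    # Hypothetical pain scores for three drugs; invented for illustration
    drug_a = [3.1, 2.8, 3.5, 4.0, 2.9]
    drug_b = [4.2, 3.9, 4.5, 4.1, 4.4]
    drug_c = [5.0, 4.6, 5.4, 4.9, 5.2]

    # One overall F-test across the three groups avoids inflating significance
    f_stat, p_value = stats.f_oneway(drug_a, drug_b, drug_c)
    print(round(f_stat, 2), round(p_value, 4))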

How to report

A statistically significant result cannot prove that a research hypothesis is correct, since proof implies 100% certainty.

Instead, we may state our results “provide support for” or “give evidence for” our research hypothesis (as there is still a slight probability that the results occurred by chance and the null hypothesis was correct – e.g., less than 5%).

Example: Reporting the results

In our comparison of the pain relief effects of the new drug and the placebo, we observed that participants in the drug group experienced a significant reduction in pain ( M = 3.5; SD = 0.8) compared to those in the placebo group ( M = 5.2; SD  = 0.7), resulting in an average difference of 1.7 points on the pain scale (t(98) = -9.36; p < 0.001).

The 6th edition of the APA style manual (American Psychological Association, 2010) states the following on the topic of reporting p-values:

“When reporting p values, report exact p values (e.g., p = .031) to two or three decimal places. However, report p values less than .001 as p < .001.

The tradition of reporting p values in the form p < .10, p < .05, p < .01, and so forth, was appropriate in a time when only limited tables of critical values were available.” (p. 114)

  • Do not use a 0 before the decimal point for the statistical value p, as it cannot be greater than 1. In other words, write p = .001 instead of p = 0.001.
  • Please pay attention to issues of italics ( p is always italicized) and spacing (either side of the = sign).
  • p = .000 (as outputted by some statistical packages such as SPSS) is impossible and should be written as p < .001.
  • The opposite of significant is “nonsignificant,” not “insignificant.”

Why is the p -value not enough?

A lower p-value  is sometimes interpreted as meaning there is a stronger relationship between two variables.

However, a small p-value only means that the observed data would be unlikely (e.g., less than a 5% chance) if the null hypothesis were true; it says nothing about the size or strength of the effect.

To understand the strength of the difference between the two groups (control vs. experimental) a researcher needs to calculate the effect size .
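
As an illustration, Cohen's d can be computed from the summary statistics in the reporting example above; the equal group sizes of 50 are an assumption, since the text only reports the degrees of freedom:

    import math

    # Summary statistics from the reporting example; equal group sizes of 50 are assumed
    m1, sd1, n1 = 3.5, 0.8, 50   # drug group
    m2, sd2, n2 = 5.2, 0.7, 50   # placebo group

    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    cohens_d = (m1 - m2) / pooled_sd
    print(round(cohens_d, 2))    # about -2.3, a very large effect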

When do you reject the null hypothesis?

In statistical hypothesis testing, you reject the null hypothesis when the p-value is less than or equal to the significance level (α) you set before conducting your test. The significance level is the probability of rejecting the null hypothesis when it is true. Commonly used significance levels are 0.01, 0.05, and 0.10.

Remember, rejecting the null hypothesis doesn’t prove the alternative hypothesis; it just suggests that the alternative hypothesis may be plausible given the observed data.

The p -value is conditional upon the null hypothesis being true but is unrelated to the truth or falsity of the alternative hypothesis.

What does p-value of 0.05 mean?

If your p-value is less than or equal to 0.05 (the significance level), you would conclude that your result is statistically significant. This means the evidence is strong enough to reject the null hypothesis in favor of the alternative hypothesis.

Are all p-values below 0.05 considered statistically significant?

Not necessarily. The 0.05 threshold is a widely used convention, not a rule; whether a result is treated as statistically significant depends on the significance level chosen for the study, and how meaningful that result is depends on the study design, sample size, and the magnitude of the observed effect.

A p-value below 0.05 means there is evidence against the null hypothesis, suggesting a real effect. However, it’s essential to consider the context and other factors when interpreting results.

Researchers also look at effect size and confidence intervals to determine the practical significance and reliability of findings.

How does sample size affect the interpretation of p-values?

Sample size can impact the interpretation of p-values. A larger sample size provides more reliable and precise estimates of the population, leading to narrower confidence intervals.

With a larger sample, even small differences between groups or effects can become statistically significant, yielding lower p-values. In contrast, smaller sample sizes may not have enough statistical power to detect smaller effects, resulting in higher p-values.

Therefore, a larger sample size increases the chances of finding statistically significant results when there is a genuine effect, making the findings more trustworthy and robust.
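
A small sketch of this, using scipy's summary-statistics t-test with the same (hypothetical) mean difference and spread but two different sample sizes:

    from scipy import stats

    # Same hypothetical mean difference (0.3) and standard deviation (1.0), different n
    for n in (10, 200):
        t_stat, p_value = stats.ttest_ind_from_stats(mean1=5.3, std1=1.0, nobs1=n,
                                                     mean2=5.0, std2=1.0, nobs2=n)
        print(n, round(p_value, 3))   # the larger sample gives a much smaller p-value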

Can a non-significant p-value indicate that there is no effect or difference in the data?

No, a non-significant p-value does not necessarily indicate that there is no effect or difference in the data. It means that the observed data do not provide strong enough evidence to reject the null hypothesis.

There could still be a real effect or difference, but it might be smaller or more variable than the study was able to detect.

Other factors like sample size, study design, and measurement precision can influence the p-value. It’s important to consider the entire body of evidence and not rely solely on p-values when interpreting research findings.

Can P values be exactly zero?

While a p-value can be extremely small, it cannot be exactly zero. When a p-value is reported as p = 0.000, the actual p-value is simply too small for the software to display at that precision. This is often interpreted as strong evidence against the null hypothesis. For p-values less than 0.001, report p < .001.
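
For instance, assuming scipy, a test statistic far out in the tail still yields a tiny but nonzero p-value, which display-limited software may round to .000:

    from scipy import stats

    z = 10                               # hypothetical, very extreme z statistic
    p_two_sided = 2 * stats.norm.sf(z)   # about 1.5e-23: tiny, but not zero
    print(p_two_sided)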

Further Information

  • P-values and significance tests (Khan Academy)
  • Hypothesis testing and p-values (Khan Academy)
  • Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond “p < 0.05”.
  • Criticism of using the “p < 0.05” threshold.
  • Publication manual of the American Psychological Association
  • Statistics for Psychology Book Download





7.4 - Comparing Two Population Variances

So far, we considered inference to compare two proportions and inference to compare two means. In this section, we will present how to compare two population variances.

Why would we want to compare two population variances? There are many situations, such as in quality control problems, where you may want to choose the process with smaller variability for a variable of interest.

One essential use of a test for comparing two population variances is checking the equal-variances assumption when you want to use the pooled variance. Many people use this test as a guide to see whether there are any clear violations, much like using the rule of thumb.

When we introduced inference for two parameters earlier, we started with the sampling distribution. We will not do that here; the theoretical details of this test are omitted, and we simply present how to use it.

F-Test to Compare Two Population Variances

To compare the variances of two quantitative variables, the hypotheses of interest are:

Null hypothesis:

\(H_0\colon \dfrac{\sigma^2_1}{\sigma^2_2}=1\)

Alternative hypothesis (one of the following):

\(H_a\colon \dfrac{\sigma^2_1}{\sigma^2_2}\ne1\)

\(H_a\colon \dfrac{\sigma^2_1}{\sigma^2_2}>1\)

\(H_a\colon \dfrac{\sigma^2_1}{\sigma^2_2}<1\)

The last two alternatives are determined by how you arrange the ratio of the two sample statistics.

We will rely on Minitab to conduct this test for us. Minitab offers three (3) different methods to test equal variances.

  • The F -test : This test assumes the two samples come from populations that are normally distributed.
  • Bonett's test : this assumes only that the two samples are quantitative.
  • Levene's test : similar to Bonett's in that the only assumption is that the data is quantitative. Best to use if one or both samples are heavily skewed, and your two sample sizes are both under 20.

Bonett’s test and Levene’s test are both considered nonparametric tests. In our case, since the tests we are considering are based on a normal distribution, we expect to use the F-test. Again, we will need to confirm the normality assumption by plotting the sample data (i.e., using a probability plot).

Caution!  To use the F-test, the samples must come from a normal distribution. The Central Limit Theorem applies to sample means, not to the raw data, so a large sample size does not by itself justify assuming the data come from a normal distribution.

Example 7-8: Comparing Packing Time Variances

Using the data in the packaging time from our previous discussion on two independent samples, we want to check whether it is reasonable to assume that the two machines have equal population variances.

Recall that the data ( machine.txt ) are given below:

New machine: 42.1 41.3 42.4 43.2 41.8 41.0 41.8 42.8 42.3 42.7
Old machine: 42.7 43.8 42.5 43.1 44.0 43.6 43.3 43.5 41.7 44.1
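
Before relying on the F-test, the normality assumption can be checked; as one possible approach (not the probability plot used in the lesson), here is a sketch using the Shapiro-Wilk test from scipy:

    from scipy import stats

    new_machine = [42.1, 41.3, 42.4, 43.2, 41.8, 41.0, 41.8, 42.8, 42.3, 42.7]
    old_machine = [42.7, 43.8, 42.5, 43.1, 44.0, 43.6, 43.3, 43.5, 41.7, 44.1]

    # Shapiro-Wilk test of normality for each sample; large p-values give
    # no evidence against the normality assumption required by the F-test
    for label, sample in (("new", new_machine), ("old", old_machine)):
        stat, p = stats.shapiro(sample)
        print(label, round(p, 3))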

  Minitab: F-test to Compare Two Population Variances

In Minitab...

  • Choose Stat  >  Basic Statistics  >  2 Variances  and complete the dialog boxes.
  • In the dialog box, check 'Use test and confidence intervals based on normal distribution' when we are confident the two samples come from a normal distribution.

Minitab 2 variance test dialog box.

Notes on using Minitab :

  • Minitab will compare the two variances using the popular F-test method.
  • If we only have summarized data (e.g. the sample sizes and sample variances or sample standard deviations), then the two variance test in Minitab will only provide an F-test.
  • Minitab will also provide the Bonett and Levene tests, which are more robust when normality is not assumed.
  • Minitab calculates the ratio based on Sample 1 divided by Sample 2.

The Minitab Output for the test for equal variances is as follows (a graph is also given in the output that provides confidence intervals and p -value for the test. This is not shown here):

Test and CI for Two Variances: New machine, Old machine

σ 1 : standard deviation of New machine

σ 2 : standard deviation of Old machine

Ratio: σ 1 /σ 2

F method was used. This method is accurate for normal data only.

Descriptive Statistics

Variable N StDev Variance 95% CI for σ
New Machine 10 0.683 0.467 (0.470, 1.248)
Old Machine 10 0.750 0.562 (0.516, 1.369)

Ratio of standard deviations

Estimated Ratio 95% CI for Ratio using F
0.911409 (0.454, 1.829)

Null hypothesis: H 0 : σ 1 /σ 2 = 1

Alternative hypothesis: H 1 : σ 1 /σ 2 ≠ 1

Method Test Statistic DF1 DF2 P-Value
F 0.83 9 9 0.787

How do we interpret the Minitab output?

Note that \(s_{new}=0.683\) and \(s_{old}=0.750\). The test statistic \(F\) is computed as...

\(F=\dfrac{s^2_{new}}{s^2_{old}}=0.83\)

The p-value provided is for the alternative selected, i.e., the two-sided alternative. If the alternative were one-sided, for example if the alternative in the example above were "ratio less than 1", then the p-value would be half the reported two-sided p-value, or about 0.393.
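
A minimal Python sketch (assuming scipy) that reproduces these numbers from the raw data; the two-sided p-value is obtained by doubling the smaller tail area of the F distribution:

    from statistics import variance
    from scipy import stats

    new_machine = [42.1, 41.3, 42.4, 43.2, 41.8, 41.0, 41.8, 42.8, 42.3, 42.7]
    old_machine = [42.7, 43.8, 42.5, 43.1, 44.0, 43.6, 43.3, 43.5, 41.7, 44.1]

    f_stat = variance(new_machine) / variance(old_machine)   # about 0.83
    df1, df2 = len(new_machine) - 1, len(old_machine) - 1    # 9 and 9

    p_two_sided = 2 * min(stats.f.cdf(f_stat, df1, df2),
                          stats.f.sf(f_stat, df1, df2))      # about 0.787
    print(round(f_stat, 2), round(p_two_sided, 3))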

Minitab provided the results only from the F -test since we checked the box to assume normal distribution. Regardless, the hypotheses would be the same for any of the test options and the decision method is the same: if the p -value is less than alpha, we reject the null and conclude the two population variances are not equal. Otherwise, if the p -value is large (i.e. greater than alpha) then we fail to reject the null and can assume the two population variances are equal.

In this example, the p-value for the F-test is very large (larger than 0.1). Therefore, we fail to reject the null hypothesis: there is not enough evidence to conclude that the variances differ.

Note!  Remember, if there is doubt about normality, the better choice is to NOT use the F -test. You need to check whether the normal assumption holds before you can use the F -test since the F -test is very sensitive to departure from the normal assumption.
