Hypothesis Testing

Hypothesis testing is a tool for making statistical inferences about the population data. It is an analysis tool that tests assumptions and determines how likely something is within a given standard of accuracy. Hypothesis testing provides a way to verify whether the results of an experiment are valid.

A null hypothesis and an alternative hypothesis are set up before performing the hypothesis testing. This helps to arrive at a conclusion regarding the sample obtained from the population. In this article, we will learn more about hypothesis testing, its types, steps to perform the testing, and associated examples.


What is Hypothesis Testing in Statistics?

Hypothesis testing uses sample data from the population to draw useful conclusions regarding the population probability distribution. It tests an assumption made about the data using different types of hypothesis testing methodologies. A hypothesis test results in either rejecting or not rejecting the null hypothesis.

Hypothesis Testing Definition

Hypothesis testing can be defined as a statistical tool that is used to identify if the results of an experiment are meaningful or not. It involves setting up a null hypothesis and an alternative hypothesis. These two hypotheses will always be mutually exclusive. This means that if the null hypothesis is true then the alternative hypothesis is false and vice versa. An example of hypothesis testing is setting up a test to check if a new medicine works on a disease in a more efficient manner.

Null Hypothesis

The null hypothesis is a concise mathematical statement that is used to indicate that there is no difference between two possibilities. In other words, there is no difference between certain characteristics of data. This hypothesis assumes that the outcomes of an experiment are based on chance alone. It is denoted as \(H_{0}\). Hypothesis testing is used to conclude if the null hypothesis can be rejected or not. Suppose an experiment is conducted to check if girls are shorter than boys at the age of 5. The null hypothesis will say that they are the same height.

Alternative Hypothesis

The alternative hypothesis is an alternative to the null hypothesis. It is used to show that the observations of an experiment are due to some real effect. It indicates that there is a statistically significant difference between two possible outcomes and can be denoted as \(H_{1}\) or \(H_{a}\). For the above-mentioned example, the alternative hypothesis would be that girls are shorter than boys at the age of 5.

Hypothesis Testing P Value

In hypothesis testing, the p value is used to indicate whether the results obtained after conducting a test are statistically significant or not. It is the probability of obtaining results at least as extreme as those observed, assuming that the null hypothesis is true. This value is always a number between 0 and 1. The p value is compared to an alpha level, \(\alpha\), also called the significance level. The alpha level can be defined as the acceptable risk of incorrectly rejecting the null hypothesis. The alpha level is usually chosen between 1% and 5%. If the p value is less than or equal to \(\alpha\), the null hypothesis is rejected.
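As a rough illustration of how a p value can be obtained from a z test statistic, the sketch below uses Python's standard-library `statistics.NormalDist`. The function name `p_value_from_z` and its `tail` argument are illustrative choices that do not come from the text, and the sketch assumes the test statistic follows a standard normal distribution when the null hypothesis is true.

```python
from statistics import NormalDist


def p_value_from_z(z, tail="two"):
    """Return the p value for a z statistic under a standard normal null distribution.

    tail is "right", "left", or "two" (hypothetical argument values).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        # Area to the right of the observed statistic.
        return 1.0 - std_normal.cdf(z)
    if tail == "left":
        # Area to the left of the observed statistic.
        return std_normal.cdf(z)
    if tail == "two":
        # Area in both tails beyond |z|.
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'right', 'left', or 'two'")


alpha = 0.05
p = p_value_from_z(2.5, tail="right")
print(p <= alpha)  # the null hypothesis is rejected when this is True
```

The comparison in the last line is the decision rule described above: a p value at or below \(\alpha\) leads to rejecting the null hypothesis.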

Hypothesis Testing Critical region

All sets of values that lead to rejecting the null hypothesis lie in the critical region. Furthermore, the value that separates the critical region from the non-critical region is known as the critical value.
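For a z test, the critical value can be read from a normal distribution table or computed directly. The following is a minimal sketch, assuming a standard normal test statistic; the helper name `critical_z` is hypothetical, while `NormalDist.inv_cdf` is part of Python's standard library.

```python
from statistics import NormalDist


def critical_z(alpha, tail="two"):
    """Return the critical value that bounds the critical region of a z test."""
    std_normal = NormalDist()
    if tail == "right":
        # Critical region: z values greater than this value.
        return std_normal.inv_cdf(1.0 - alpha)
    if tail == "left":
        # Critical region: z values less than this (negative) value.
        return std_normal.inv_cdf(alpha)
    if tail == "two":
        # Critical region: |z| greater than this value (alpha / 2 in each tail).
        return std_normal.inv_cdf(1.0 - alpha / 2.0)
    raise ValueError("tail must be 'right', 'left', or 'two'")


print(round(critical_z(0.05, "right"), 3))
print(round(critical_z(0.05, "two"), 2))
```

With \(\alpha\) = 0.05 this gives approximately 1.645 for a right-tailed test and 1.96 for a two-tailed test.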

Hypothesis Testing Formula

Depending upon the type of data available and the sample size, different types of hypothesis tests are used to determine whether the null hypothesis can be rejected or not. The formulas for some important test statistics are given below:

  • z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\). \(\overline{x}\) is the sample mean, \(\mu\) is the population mean, \(\sigma\) is the population standard deviation and n is the size of the sample.
  • t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\). s is the sample standard deviation.
  • \(\chi ^{2} = \sum \frac{(O_{i}-E_{i})^{2}}{E_{i}}\). \(O_{i}\) is the observed value and \(E_{i}\) is the expected value.

We will learn more about these test statistics in the upcoming section.

Types of Hypothesis Testing

Selecting the correct test for performing hypothesis testing can be confusing. These tests are used to determine a test statistic on the basis of which the null hypothesis can either be rejected or not rejected. Some of the important tests used for hypothesis testing are given below.

Hypothesis Testing Z Test

A z test is a hypothesis test that is typically used for a large sample size (n ≥ 30) when the population standard deviation is known. It is used to determine whether there is a difference between the population mean and the sample mean. It can also be used to compare the means of two samples. The z test statistic is computed as follows:

  • One sample: z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\).
  • Two samples: z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\).
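The two z formulas above translate directly into code. This is a small sketch with hypothetical function and parameter names; `hypothesized_diff` stands for \(\mu_{1}-\mu_{2}\) under the null hypothesis and defaults to 0.

```python
import math


def z_one_sample(sample_mean, pop_mean, pop_sd, n):
    """One-sample z statistic: (x_bar - mu) / (sigma / sqrt(n))."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))


def z_two_sample(mean1, mean2, pop_sd1, pop_sd2, n1, n2, hypothesized_diff=0.0):
    """Two-sample z statistic with known population standard deviations."""
    standard_error = math.sqrt(pop_sd1 ** 2 / n1 + pop_sd2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


# Illustrative inputs: x_bar = 88, mu = 80, sigma = 10, n = 36.
z = z_one_sample(88, 80, 10, 36)
print(round(z, 2))
```

For these inputs the statistic works out to 8 / (10 / 6) = 4.8.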

Hypothesis Testing t Test

The t test is another method of hypothesis testing that is used for a small sample size (n < 30). It is also used to compare the sample mean and population mean. However, the population standard deviation is not known. Instead, the sample standard deviation is known. The mean of two samples can also be compared using the t test.

  • One sample: t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\).
  • Two samples: t = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}}\).
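A sketch of the t statistics above is given below. The names are hypothetical. The one-sample helper also returns the degrees of freedom n - 1 needed to look up the critical value; the two-sample helper returns only the statistic, because the appropriate degrees of freedom depend on the variance assumptions and are not covered by the formula above.

```python
import math


def t_one_sample(sample_mean, pop_mean, sample_sd, n):
    """Return (t, degrees_of_freedom) for a one-sample t test."""
    t = (sample_mean - pop_mean) / (sample_sd / math.sqrt(n))
    return t, n - 1


def t_two_sample(mean1, mean2, sample_sd1, sample_sd2, n1, n2, hypothesized_diff=0.0):
    """Two-sample t statistic using the unpooled standard error shown above."""
    standard_error = math.sqrt(sample_sd1 ** 2 / n1 + sample_sd2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


# Illustrative inputs: x_bar = 110, mu = 90, s = 18, n = 5.
t, df = t_one_sample(110, 90, 18, 5)
print(round(t, 3), df)
```

With these inputs, t is about 2.484 with 4 degrees of freedom.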

Hypothesis Testing Chi Square

The Chi square test is a hypothesis testing method that is used to check whether the variables in a population are independent or not. It is used when the test statistic is chi-squared distributed.
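The chi square statistic \(\chi ^{2} = \sum \frac{(O_{i}-E_{i})^{2}}{E_{i}}\) can be sketched as follows for a test of independence. The function names and the 2 × 2 table of counts are hypothetical and used only for illustration. The expected count for each cell is taken as (row total × column total) / grand total, which is the usual choice when testing independence.

```python
def expected_counts(table):
    """Expected counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over paired observed and expected values."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed_table = [[20, 30], [30, 20]]  # hypothetical observed counts
expected_table = expected_counts(observed_table)

observed_flat = [o for row in observed_table for o in row]
expected_flat = [e for row in expected_table for e in row]
chi2 = chi_square_statistic(observed_flat, expected_flat)
print(chi2)
```

Every expected count in this made-up table is 25, so each cell contributes (5)² / 25 = 1 and the statistic is 4.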

One Tailed Hypothesis Testing

One tailed hypothesis testing is done when the rejection region is only in one direction. It can also be known as directional hypothesis testing because the effects can be tested in one direction only. This type of testing is further classified into the right tailed test and left tailed test.

Right Tailed Hypothesis Testing

The right tail test is also known as the upper tail test. This test is used to check whether the population parameter is greater than some value. The null and alternative hypotheses for this test are given as follows:

\(H_{0}\): The population parameter is ≤ some value

\(H_{1}\): The population parameter is > some value.

If the test statistic has a greater value than the critical value then the null hypothesis is rejected.

Figure: Right tail hypothesis testing

Left Tailed Hypothesis Testing

The left tail test is also known as the lower tail test. It is used to check whether the population parameter is less than some value. The hypotheses for this hypothesis testing can be written as follows:

\(H_{0}\): The population parameter is ≥ some value

\(H_{1}\): The population parameter is < some value.

The null hypothesis is rejected if the test statistic has a value lesser than the critical value.

Figure: Left tail hypothesis testing

Two Tailed Hypothesis Testing

In this hypothesis testing method, the critical region lies on both sides of the sampling distribution. It is also known as a non-directional hypothesis testing method. The two-tailed test is used to determine whether the population parameter is different from some value. The hypotheses can be set up as follows:

\(H_{0}\): the population parameter = some value

\(H_{1}\): the population parameter ≠ some value

The null hypothesis is rejected if the absolute value of the test statistic is greater than the critical value, that is, if the test statistic falls in either tail of the distribution.

Figure: Two tail hypothesis testing
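The rejection rules for the three kinds of tests can be summarized in one small function. This is only a sketch with hypothetical names. It assumes the left-tailed critical value is supplied as a negative number, and that a two-tailed test compares the absolute value of the test statistic with the positive critical value.

```python
def reject_null(test_statistic, critical_value, tail):
    """Return True when the test statistic lies in the critical region."""
    if tail == "right":
        return test_statistic > critical_value
    if tail == "left":
        return test_statistic < critical_value
    if tail == "two":
        # Critical region lies in both tails of the distribution.
        return abs(test_statistic) > abs(critical_value)
    raise ValueError("tail must be 'right', 'left', or 'two'")


print(reject_null(2.10, 1.645, "right"))   # statistic beyond the upper critical value
print(reject_null(-1.20, -1.645, "left"))  # statistic not beyond the lower critical value
print(reject_null(-2.50, 1.96, "two"))     # statistic in the lower tail of a two-tailed test
```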

Hypothesis Testing Steps

Hypothesis testing can be easily performed in five simple steps. The most important step is to correctly set up the hypotheses and identify the right method for hypothesis testing. The basic steps to perform hypothesis testing are as follows:

  • Step 1: Set up the null hypothesis by correctly identifying whether it is the left-tailed, right-tailed, or two-tailed hypothesis testing.
  • Step 2: Set up the alternative hypothesis.
  • Step 3: Choose the correct significance level, \(\alpha\), and find the critical value.
  • Step 4: Calculate the correct test statistic (z, t or \(\chi^{2}\)) and p-value.
  • Step 5: Compare the test statistic with the critical value or compare the p-value with \(\alpha\) to arrive at a conclusion. In other words, decide if the null hypothesis is to be rejected or not.

Hypothesis Testing Example

The best way to solve a problem on hypothesis testing is by applying the 5 steps mentioned in the previous section. Suppose a researcher claims that the mean weight of men is greater than 100 kg, with a population standard deviation of 15 kg. A sample of 30 men is chosen and has a mean weight of 112.5 kg. Using hypothesis testing, check if there is enough evidence to support the researcher's claim. The confidence level is given as 95%.

Step 1: This is an example of a right-tailed test. Set up the null hypothesis as \(H_{0}\): \(\mu\) = 100.

Step 2: The alternative hypothesis is given by \(H_{1}\): \(\mu\) > 100.

Step 3: As this is a one-tailed test, \(\alpha\) = 100% - 95% = 5%. This can be used to determine the critical value.

1 - \(\alpha\) = 1 - 0.05 = 0.95

0.95 gives the required area under the curve. Now using a normal distribution table, the area 0.95 is at z = 1.645. A similar process can be followed for a t-test. The only additional requirement is to calculate the degrees of freedom given by n - 1.

Step 4: Calculate the z test statistic. This is because the sample size is 30. Furthermore, the sample and population means are known along with the standard deviation.

z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\).

\(\mu\) = 100, \(\overline{x}\) = 112.5, n = 30, \(\sigma\) = 15

z = \(\frac{112.5-100}{\frac{15}{\sqrt{30}}}\) = 4.56

Step 5: Conclusion. As 4.56 > 1.645, the null hypothesis can be rejected. There is enough evidence to support the researcher's claim.
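The five steps of this example can be reproduced with a few lines of Python. This is a sketch using only the standard library; the variable names are illustrative, and the critical value comes from `NormalDist.inv_cdf` rather than a printed table.

```python
import math
from statistics import NormalDist

# Steps 1 and 2: H0: mu = 100 against H1: mu > 100 (right-tailed test).
mu0 = 100.0
x_bar = 112.5
sigma = 15.0
n = 30

# Step 3: significance level and critical value.
alpha = 0.05
critical_value = NormalDist().inv_cdf(1.0 - alpha)

# Step 4: z test statistic and its right-tail p value.
z = (x_bar - mu0) / (sigma / math.sqrt(n))
p_value = 1.0 - NormalDist().cdf(z)

# Step 5: compare the statistic with the critical value.
reject = z > critical_value
print(round(z, 2), round(critical_value, 3), reject)
```

The script reports z ≈ 4.56 and a critical value of about 1.645, so the null hypothesis is rejected. The p value comparison leads to the same decision, since the p value is far below 0.05.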

Hypothesis Testing and Confidence Intervals

Confidence levels are closely tied to hypothesis testing because the alpha level can be determined from a given confidence level. Suppose the confidence level is given as 95%. Subtract the confidence level from 100%. This gives 100% - 95% = 5%, or 0.05, which is the alpha level. In a one-tailed test, the entire 0.05 lies in a single tail. In a two-tailed test, the rejection region is split between both tails, so the area in each tail is 0.05 / 2 = 0.025.
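The link between confidence intervals and two-tailed tests can be made concrete. For a z test with known population standard deviation, the null hypothesis \(\mu\) = \(\mu_{0}\) is rejected at level \(\alpha\) exactly when \(\mu_{0}\) falls outside the (1 - \(\alpha\)) confidence interval for the mean. The sketch below assumes this setting; the function name and the sample values are illustrative.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, pop_sd, n, confidence_level=0.95):
    """Two-sided confidence interval for the mean with known population sd."""
    alpha = 1.0 - confidence_level
    # alpha / 2 in each tail, as in a two-tailed test.
    z_star = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z_star * pop_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Illustrative values: x_bar = 88, sigma = 10, n = 36, hypothesized mean 80.
low, high = z_confidence_interval(88, 10, 36, confidence_level=0.95)
mu0 = 80
reject_two_tailed = not (low <= mu0 <= high)
print(round(low, 2), round(high, 2), reject_two_tailed)
```

Here the interval is roughly (84.73, 91.27), which does not contain 80, so the two-tailed test at the 0.05 level rejects the null hypothesis.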

Important Notes on Hypothesis Testing

  • Hypothesis testing is a technique that is used to verify whether the results of an experiment are statistically significant.
  • It involves the setting up of a null hypothesis and an alternate hypothesis.
  • Three commonly used tests in hypothesis testing are the z test, t test, and chi square test.
  • Hypothesis testing can be classified as right tail, left tail, and two tail tests.

Examples on Hypothesis Testing

  • Example 1: The average weight of a dumbbell in a gym is 90 lbs. However, a physical trainer believes that the average weight might be higher. A random sample of 5 dumbbells has an average weight of 110 lbs and a standard deviation of 18 lbs. Using hypothesis testing, check if the physical trainer's claim can be supported at a 95% confidence level. Solution: As the sample size is less than 30 and only the sample standard deviation is known, the t test is used. \(H_{0}\): \(\mu\) = 90, \(H_{1}\): \(\mu\) > 90. \(\overline{x}\) = 110, \(\mu\) = 90, n = 5, s = 18, \(\alpha\) = 0.05. Using the t-distribution table with n - 1 = 4 degrees of freedom, the critical value is 2.132. t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\) = \(\frac{110-90}{\frac{18}{\sqrt{5}}}\) = 2.484. As 2.484 > 2.132, the null hypothesis is rejected. Answer: There is enough evidence to support the claim that the average weight of the dumbbells is greater than 90 lbs.
  • Example 2: The average score on a test is 80 with a standard deviation of 10. With a new teaching curriculum introduced, it is believed that this score will change. On randomly testing the scores of 36 students, the mean was found to be 88. With a 0.05 significance level, is there any evidence to support this claim? Solution: This is an example of two-tail hypothesis testing. The z test will be used. \(H_{0}\): \(\mu\) = 80, \(H_{1}\): \(\mu\) ≠ 80. \(\overline{x}\) = 88, \(\mu\) = 80, n = 36, \(\sigma\) = 10. \(\alpha\) = 0.05, so the area in each tail is 0.05 / 2 = 0.025. The critical value using the normal distribution table is 1.96. z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\) = \(\frac{88-80}{\frac{10}{\sqrt{36}}}\) = 4.8. As 4.8 > 1.96, the null hypothesis is rejected. Answer: There is a difference in the scores after the new curriculum was introduced.
  • Example 3: The average score of a class is 90. However, a teacher believes that the average score might be lower. The scores of 6 students were randomly measured. The mean was 82 with a standard deviation of 18. With a 0.05 significance level, use hypothesis testing to check if there is evidence to support this claim. Solution: The t test will be used. \(H_{0}\): \(\mu\) = 90, \(H_{1}\): \(\mu\) < 90. \(\overline{x}\) = 82, \(\mu\) = 90, n = 6, s = 18. The critical value from the t table with n - 1 = 5 degrees of freedom is -2.015. t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\) = \(\frac{82-90}{\frac{18}{\sqrt{6}}}\) = -1.089. As -1.089 > -2.015, we fail to reject the null hypothesis. Answer: There is not enough evidence to support the claim.

FAQs on Hypothesis Testing

What is Hypothesis Testing?

Hypothesis testing in statistics is a tool that is used to make inferences about the population data. It is also used to check if the results of an experiment are valid.

What is the z Test in Hypothesis Testing?

The z test in hypothesis testing is used to find the z test statistic for normally distributed data. The z test is used when the standard deviation of the population is known and the sample size is greater than or equal to 30.

What is the t Test in Hypothesis Testing?

The t test in hypothesis testing is used when the test statistic follows a Student's t distribution. It is used when the sample size is less than 30 and the standard deviation of the population is not known.

What is the formula for z test in Hypothesis Testing?

The formula for a one sample z test in hypothesis testing is z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\) and for two samples is z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\).

What is the p Value in Hypothesis Testing?

The p value helps to determine if the test results are statistically significant or not. In hypothesis testing, the null hypothesis can either be rejected or not rejected based on the comparison between the p value and the alpha level.

What is One Tail Hypothesis Testing?

When the rejection region is only on one side of the distribution curve then it is known as one tail hypothesis testing. The right tail test and the left tail test are two types of directional hypothesis testing.

What is the Alpha Level in Two Tail Hypothesis Testing?

In two tail hypothesis testing, the alpha level is split equally between the two rejection regions in the curve. Divide \(\alpha\) by 2 to get the area in each tail, which is then used to find the critical values.

11.4 One-Way ANOVA and Hypothesis Tests for Three or More Population Means

Learning Objectives

  • Conduct and interpret hypothesis tests for three or more population means using one-way ANOVA.

The purpose of a one-way ANOVA (analysis of variance) test is to determine the existence of a statistically significant difference among the means of three or more populations.  The test actually uses variances to help determine if the population means are equal or not.

Throughout this section, we will use subscripts to identify the values for the means, sample sizes, and standard deviations for the populations:


[latex]k[/latex] is the number of populations under study, [latex]n[/latex] is the total number of observations in all of the samples combined, and [latex]\overline{\overline{x}}[/latex] is the overall mean of all of the observations (the mean of the sample means, weighted by sample size).

[latex]\begin{eqnarray*} n & = & n_1+n_2+\cdots+n_k \\ \\ \overline{\overline{x}} & = & \frac{n_1 \times \overline{x}_1 +n_2 \times \overline{x}_2 +\cdots+n_k \times \overline{x}_k}{n} \end{eqnarray*}[/latex]


A predictor variable is called a factor or independent variable.  For example, age, temperature, and gender are factors.  The groups or samples are often referred to as treatments.  This terminology comes from the use of ANOVA procedures in medical and psychological research to determine if there is a difference in the effects of different treatments.

A local college wants to compare the mean GPA for players on four of its sports teams:  basketball, baseball, hockey, and lacrosse.  A random sample of players was taken from each team and their GPA recorded in the table below.

Basketball  Baseball  Hockey  Lacrosse
3.6         2.1       4.0     2.0
2.9         2.6       2.0     3.6
2.5         3.9       2.6     3.9
3.3         3.1       3.2     2.7
3.8         3.4       3.2     2.5

In this example, the factor is the sports team.

             Basketball  Baseball  Hockey  Lacrosse
Sample size  5           5         5       5
Sample mean  3.22        3.02      3       2.94

[latex]\begin{eqnarray*} k & = & 4 \\ \\ n & = & n_1+n_2+n_3+n_4 \\ & = & 5+5+5+5 \\ & = & 20 \\ \\ \overline{\overline{x}} & = & \frac{n_1 \times \overline{x}_1+n_2 \times \overline{x}_2+n_3 \times \overline{x}_3+n_4 \times \overline{x}_4}{n} \\ & = & \frac{5 \times 3.22+5 \times 3.02+5 \times 3+5 \times 2.94}{20}  \\& = & 3.045 \end{eqnarray*}[/latex]
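The calculation of [latex]\overline{\overline{x}}[/latex] above can be checked with a short Python sketch. The function name `grand_mean` is hypothetical. The second calculation averages all 20 raw GPA values directly, which gives the same result because [latex]\overline{\overline{x}}[/latex] weights each sample mean by its sample size.

```python
from statistics import fmean


def grand_mean(sample_sizes, sample_means):
    """Overall mean: sample means weighted by their sample sizes."""
    n = sum(sample_sizes)
    return sum(n_i * m for n_i, m in zip(sample_sizes, sample_means)) / n


overall = grand_mean([5, 5, 5, 5], [3.22, 3.02, 3.0, 2.94])

# Same quantity computed from the raw GPA data in the table.
gpa_by_team = [
    [3.6, 2.9, 2.5, 3.3, 3.8],  # basketball
    [2.1, 2.6, 3.9, 3.1, 3.4],  # baseball
    [4.0, 2.0, 2.6, 3.2, 3.2],  # hockey
    [2.0, 3.6, 3.9, 2.7, 2.5],  # lacrosse
]
overall_from_raw = fmean(value for team in gpa_by_team for value in team)
print(round(overall, 3), round(overall_from_raw, 3))
```

Both calculations give 3.045, up to floating-point rounding.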

The following assumptions are required to use a one-way ANOVA test:

  • Each population from which a sample is taken is normally distributed.
  • All samples are randomly selected and independently taken from the populations.
  • The populations are assumed to have equal variances.
  • The population data is numerical (interval or ratio level).

The logic behind one-way ANOVA is to compare population means based on two independent estimates of the (assumed) equal variance [latex]\sigma^2[/latex] between the populations:

  • One estimate of the equal variance [latex]\sigma^2[/latex] is based on the variability among the sample means themselves (called the between-groups estimate of population variance).
  • One estimate of the equal variance [latex]\sigma^2[/latex] is based on the variability of the data within each sample (called the within-groups estimate of population variance).

The one-way ANOVA procedure compares these two estimates of the population variance [latex]\sigma^2[/latex] to determine if the population means are equal or if there is a difference in the population means.  Because ANOVA involves the comparison of two estimates of variance, an [latex]F[/latex]-distribution is used to conduct the ANOVA test.  The test statistic is an [latex]F[/latex]-score that is the ratio of the two estimates of population variance:

[latex]\displaystyle{F=\frac{\mbox{variance between groups}}{\mbox{variance within groups}}}[/latex]

The degrees of freedom for the [latex]F[/latex]-distribution are [latex]df_1=k-1[/latex] and [latex]df_2=n-k[/latex] where [latex]k[/latex] is the number of populations and [latex]n[/latex] is the total number of observations in all of the samples combined.

The variance between groups estimate of the population variance is called the mean square due to treatment, [latex]MST[/latex].  The [latex]MST[/latex] is the estimate of the population variance determined by the variance of the sample means from the overall sample mean [latex]\overline{\overline{x}}[/latex].  When the population means are equal, [latex]MST[/latex] provides an unbiased estimate of the population variance.  When the population means are not equal, [latex]MST[/latex] provides an overestimate of the population variance.

[latex]\begin{eqnarray*} SST & = & n_1 \times (\overline{x}_1-\overline{\overline{x}})^2+n_2\times (\overline{x}_2-\overline{\overline{x}})^2+ \cdots +n_k \times (\overline{x}_k-\overline{\overline{x}})^2 \\  \\ MST & =& \frac{SST}{k-1} \end{eqnarray*}[/latex]

The variance within groups estimate of the population variance is called the mean square due to error, [latex]MSE[/latex].  The [latex]MSE[/latex] is the pooled estimate of the population variance using the sample variances as estimates for the population variance.  The [latex]MSE[/latex] always provides an unbiased estimate of the population variance because it is not affected by whether or not the population means are equal.

[latex]\begin{eqnarray*} SSE & = & (n_1-1) \times s_1^2+ (n_2-1) \times s_2^2+ \cdots + (n_k-1) \times s_k^2\\  \\ MSE & =& \frac{SSE}{n -k} \end{eqnarray*}[/latex]
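The formulas for [latex]MST[/latex] and [latex]MSE[/latex] can be sketched in Python when only summary statistics (sample sizes, sample means, and sample variances) are available. The function name and the small input values below are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
def mean_squares(sample_sizes, sample_means, sample_variances):
    """Return (MST, MSE) computed from per-group summary statistics."""
    k = len(sample_sizes)
    n = sum(sample_sizes)
    overall_mean = sum(n_i * m for n_i, m in zip(sample_sizes, sample_means)) / n

    # SST: variability of the sample means around the overall mean.
    sst = sum(n_i * (m - overall_mean) ** 2
              for n_i, m in zip(sample_sizes, sample_means))
    # SSE: pooled variability of the data within each sample.
    sse = sum((n_i - 1) * v for n_i, v in zip(sample_sizes, sample_variances))

    return sst / (k - 1), sse / (n - k)


# Hypothetical summary statistics for two groups of three observations each.
mst, mse = mean_squares([3, 3], [2.0, 4.0], [1.0, 1.0])
print(mst, mse)
```

For these made-up inputs the overall mean is 3, so SST = 3 × 1 + 3 × 1 = 6 and MST = 6 / (2 - 1) = 6, while SSE = 2 × 1 + 2 × 1 = 4 and MSE = 4 / (6 - 2) = 1.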

The one-way ANOVA test depends on the fact that the variance between groups [latex]MST[/latex] is influenced by differences between the population means, which results in [latex]MST[/latex] being either an unbiased or overestimate of the population variance.  Because the variance within groups [latex]MSE[/latex] compares values of each group to its own group mean, [latex]MSE[/latex] is not affected by differences between the population means and is always an unbiased estimate of the population variance.

The null hypothesis in a one-way ANOVA test is that the population means are all equal and the alternative hypothesis is that there is a difference in the population means.  The [latex]F[/latex]-score for the one-way ANOVA test is [latex]\displaystyle{F=\frac{MST}{MSE}}[/latex] with [latex]df_1=k-1[/latex] and [latex]df_2=n-k[/latex].  The p-value for the test is the area in the right tail of the [latex]F[/latex]-distribution, to the right of the [latex]F[/latex]-score.

  • When the variance between groups [latex]MST[/latex] and variance within groups [latex]MSE[/latex] are close in value, the [latex]F[/latex]-score is close to 1 and results in a large p-value.  In this case, there is not sufficient evidence to conclude that the population means differ.
  • When the variance between groups [latex]MST[/latex] is significantly larger than the variance within groups [latex]MSE[/latex], the [latex]F[/latex]-score is large and results in a small p-value.  In this case, the conclusion is that there is a difference in the population means.

Steps to Conduct a Hypothesis Test for Three or More Population Means

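  • Verify that the one-way ANOVA assumptions are met.
  • Write down the null and alternative hypotheses in terms of the population means:

[latex]\begin{eqnarray*} \\ H_0: &  &  \mu_1=\mu_2=\cdots=\mu_k\end{eqnarray*}[/latex]

[latex]\begin{eqnarray*} \\ H_a: &  & \mbox{at least one population mean is different from the others} \\ \\ \end{eqnarray*}[/latex]

  • Collect the sample information for the test and identify the significance level [latex]\alpha[/latex].
  • Find the p-value, which is the area in the right tail of the [latex]F[/latex]-distribution to the right of the [latex]F[/latex]-score, where:

[latex]\begin{eqnarray*}F & = & \frac{MST}{MSE} \\ \\ df_1 & = & k-1 \\ \\ df_2 &  = & n-k \\ \\ \end{eqnarray*}[/latex]

  • Compare the p-value to the significance level [latex]\alpha[/latex] and state the outcome of the test:
      • If the p-value is less than or equal to [latex]\alpha[/latex], reject [latex]H_0[/latex] in favour of [latex]H_a[/latex].  The results of the sample data are significant.  There is sufficient evidence to conclude that the null hypothesis [latex]H_0[/latex] is an incorrect belief and that the alternative hypothesis [latex]H_a[/latex] is most likely correct.
      • If the p-value is greater than [latex]\alpha[/latex], do not reject [latex]H_0[/latex].  The results of the sample data are not significant.  There is not sufficient evidence to conclude that the alternative hypothesis [latex]H_a[/latex] may be correct.
  • Write down a concluding sentence specific to the context of the question.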
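The computational part of these steps (the [latex]F[/latex]-score and its degrees of freedom) can be sketched from raw data using only Python's standard library. The function name `one_way_anova` is hypothetical, and the data are the sports-team GPA values given earlier in this section. Finding the p-value additionally requires the [latex]F[/latex]-distribution, which this sketch does not cover.

```python
from statistics import mean, variance


def one_way_anova(groups):
    """Return (F, df1, df2) for a list of samples, one list of numbers per group."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    means = [mean(g) for g in groups]
    variances = [variance(g) for g in groups]  # sample variances (n_i - 1 denominator)
    overall_mean = sum(n_i * m for n_i, m in zip(sizes, means)) / n

    sst = sum(n_i * (m - overall_mean) ** 2 for n_i, m in zip(sizes, means))
    sse = sum((n_i - 1) * v for n_i, v in zip(sizes, variances))
    mst = sst / (k - 1)   # variance between groups
    mse = sse / (n - k)   # variance within groups
    return mst / mse, k - 1, n - k


gpa_by_team = [
    [3.6, 2.9, 2.5, 3.3, 3.8],  # basketball
    [2.1, 2.6, 3.9, 3.1, 3.4],  # baseball
    [4.0, 2.0, 2.6, 3.2, 3.2],  # hockey
    [2.0, 3.6, 3.9, 2.7, 2.5],  # lacrosse
]
f_score, df1, df2 = one_way_anova(gpa_by_team)
print(round(f_score, 4), df1, df2)
```

For the GPA data this gives an [latex]F[/latex]-score of about 0.1517 with 3 and 16 degrees of freedom.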

Assume the populations are normally distributed and have equal variances.  At the 5% significance level, is there a difference in the average GPA between the sports teams?

Let basketball be population 1, let baseball be population 2, let hockey be population 3, and let lacrosse be population 4. From the question we have the following information:

Sample sizes: [latex]n_1=5[/latex], [latex]n_2=5[/latex], [latex]n_3=5[/latex], [latex]n_4=5[/latex]
Sample means: [latex]\overline{x}_1=3.22[/latex], [latex]\overline{x}_2=3.02[/latex], [latex]\overline{x}_3=3[/latex], [latex]\overline{x}_4=2.94[/latex]
Sample variances: [latex]s_1^2=0.277[/latex], [latex]s_2^2=0.487[/latex], [latex]s_3^2=0.56[/latex], [latex]s_4^2=0.623[/latex]

Previously, we found [latex]k=4[/latex], [latex]n=20[/latex], and [latex]\overline{\overline{x}}=3.045[/latex].


[latex]\begin{eqnarray*} H_0: & & \mu_1=\mu_2=\mu_3=\mu_4 \\   H_a: & & \mbox{at least one population mean is different from the others} \end{eqnarray*}[/latex]

To calculate the [latex]F[/latex]-score, we need to find [latex]MST[/latex] and [latex]MSE[/latex].

[latex]\begin{eqnarray*} SST & = & n_1 \times (\overline{x}_1-\overline{\overline{x}})^2+n_2\times (\overline{x}_2-\overline{\overline{x}})^2+n_3 \times (\overline{x}_3-\overline{\overline{x}})^2  +n_4 \times (\overline{x}_4-\overline{\overline{x}})^2\\  & = & 5 \times (3.22-3.045)^2+5 \times (3.02-3.045)^2+5 \times (3-3.045)^2 \\ &  & +5 \times (2.94 -3.045)^2 \\ & = & 0.2215 \\ \\ MST & = & \frac{SST}{k-1} \\ & = & \frac{0.2215 }{4-1} \\ & = & 0.0738...\\ \\  SSE & = & (n_1-1) \times s_1^2+ (n_2-1) \times s_2^2+  (n_3-1) \times s_3^2+ (n_4-1) \times s_4^2\\  & = &( 5-1) \times 0.277+(5-1) \times 0.487+(5-1) \times 0.56 +(5-1)\times 0.623 \\ & = & 7.788 \\ \\ MSE & = & \frac{SSE}{n-k} \\ & = & \frac{7.788 }{20-4} \\ & = & 0.48675\end{eqnarray*}[/latex]

The p-value is the area in the right tail of the [latex]F[/latex]-distribution.  To use the f.dist.rt function, we need to calculate the [latex]F[/latex]-score and the degrees of freedom:

[latex]\begin{eqnarray*} F & = &\frac{MST}{MSE} \\ & = & \frac{0.0738...}{0.48675} \\ & = & 0.15168... \\ \\ df_1 & = & k-1 \\ & = & 4-1 \\ & = & 3 \\ \\df_2 & = & n-k \\ & = & 20-4 \\ & = & 16\end{eqnarray*}[/latex]

f.dist.rt(0.15168…, 3, 16) = 0.9271

So the p-value[latex]=0.9271[/latex].
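Outside of Excel, the same right-tail area can be approximated in Python. The standard library has no [latex]F[/latex]-distribution, so the sketch below evaluates the regularized incomplete beta function with a continued fraction, a standard numerical approach; it is not the method Excel uses internally as far as this text is concerned. All function names are hypothetical. If SciPy is available, `scipy.stats.f.sf(F, df1, df2)` should return the same quantity.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_right_tail(f_score, df1, df2):
    """Area to the right of f_score under an F-distribution with (df1, df2)."""
    x = df2 / (df2 + df1 * f_score)
    return regularized_incomplete_beta(df2 / 2.0, df1 / 2.0, x)


p_value = f_right_tail(0.151686, 3, 16)
print(round(p_value, 4))
```

For the [latex]F[/latex]-score and degrees of freedom of this example, the sketch agrees with the p-value of 0.9271 obtained from f.dist.rt.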


Because p-value[latex]=0.9271 \gt 0.05=\alpha[/latex], we do not reject the null hypothesis.  At the 5% significance level there is not enough evidence to suggest that there is a difference in the mean GPA for the sports teams.

  • The null hypothesis [latex]\mu_1=\mu_2=\mu_3=\mu_4[/latex] is the claim that the mean GPA for the sports teams are all equal.
  • The alternative hypothesis is the claim that at least one of the population means is not equal to the others.  The alternative hypothesis does not say that all of the population means are not equal, only that at least one of them is not equal to the others.
  • The function is f.dist.rt because we are finding the area in the right tail of an [latex]F[/latex]-distribution.
  • Field 1 is the value of [latex]F[/latex].
  • Field 2 is the value of [latex]df_1[/latex].
  • Field 3 is the value of [latex]df_2[/latex].
  • The p-value of 0.9271 is a large probability compared to the significance level, and so the sample results are likely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is reasonable, and so the conclusion of the test is to not reject the null hypothesis.  In other words, the sample data are consistent with the population means all being equal.

ANOVA Summary Tables

The calculation of the [latex]MST[/latex], [latex]MSE[/latex], and the [latex]F[/latex]-score for a one-way ANOVA test can be time consuming, even with the help of software like Excel.  However, Excel has a built-in one-way ANOVA summary table that not only generates the averages, variances, [latex]MST[/latex] and [latex]MSE[/latex], but also calculates the required [latex]F[/latex]-score and p-value for the test.


In order to create a one-way ANOVA summary table, we need to use the Analysis ToolPak.  Follow these instructions to add the Analysis ToolPak.

  • Enter the data into an Excel worksheet.
  • Go to the Data tab and click on Data Analysis.  If you do not see Data Analysis in the Data tab, you will need to install the Analysis ToolPak.
  • In the Data Analysis window, select Anova: Single Factor.  Click OK.
  • In the Input Range box, enter the cell range for the data.
  • In the Grouped By box, select rows if your data is entered as rows (the default is columns).
  • Click on Labels in first row if you included the column headings in the input range.
  • In the Alpha box, enter the significance level for the test.
  • From the Output Options, select the location where you want the output to appear.

This website provides additional information on using Excel to create a one-way ANOVA summary table.

Because we are using the p-value approach to hypothesis testing, it is not crucial that we enter the actual significance level we are using for the test.  The p-value (the area in the right tail of the [latex]F[/latex]-distribution) is not affected by the significance level.  For the critical-value approach to hypothesis testing, we must enter the correct significance level for the test because the critical value does depend on the significance level.

Let basketball be population 1, let baseball be population 2, let hockey be population 3, and let lacrosse be population 4.

The ANOVA summary table generated by Excel is shown below:

SUMMARY
Groups      Count  Sum   Average  Variance
Basketball  5      16.1  3.22     0.277
Baseball    5      15.1  3.02     0.487
Hockey      5      15    3        0.56
Lacrosse    5      14.7  2.94     0.623

ANOVA
Source of Variation  SS      df  MS        F         P-value   F crit
Between Groups       0.2215  3   0.073833  0.151686  0.927083  3.238872
Within Groups        7.788   16  0.48675
Total                8.0095  19

The p-value for the test is in the P-value column of the between groups row.  So the p-value[latex]=0.9271[/latex].

  • In the top part of the ANOVA summary table (under the Summary heading), we have the averages and variances for each of the groups (basketball, baseball, hockey, and lacrosse).
  • In the bottom part of the ANOVA summary table (under the ANOVA heading), we have:
      • The value of [latex]SST[/latex] (in the SS column of the between groups row).
      • The value of [latex]MST[/latex] (in the MS column of the between groups row).
      • The value of [latex]SSE[/latex] (in the SS column of the within groups row).
      • The value of [latex]MSE[/latex] (in the MS column of the within groups row).
      • The value of the [latex]F[/latex]-score (in the F column of the between groups row).
      • The p-value (in the P-value column of the between groups row).

A fourth grade class is studying the environment.  One of the assignments is to grow bean plants in different soils.  Tommy chose to grow his bean plants in soil found outside his classroom mixed with dryer lint.  Tara chose to grow her bean plants in potting soil bought at the local nursery.  Nick chose to grow his bean plants in soil from his mother’s garden.  No chemicals were used on the plants, only water.  They were grown inside the classroom next to a large window.  Each child grew five plants.  At the end of the growing period, each plant was measured, producing the data (in inches) in the table below.

Tommy’s Plants  Tara’s Plants  Nick’s Plants
24              25             23
21              31             27
23              23             22
30              20             30
23              28             20

Assume the heights of the plants are normally distributed and have equal variance.  At the 5% significance level, does it appear that the three media in which the bean plants were grown produced the same mean height?

Let Tommy’s plants be population 1, let Tara’s plants be population 2, and let Nick’s plants be population 3.

[latex]\begin{eqnarray*} H_0: & & \mu_1=\mu_2=\mu_3 \\   H_a: & & \mbox{at least one population mean is different from the others} \end{eqnarray*}[/latex]

SUMMARY
Groups          Count  Sum  Average  Variance
Tommy’s Plants  5      121  24.2     11.7
Tara’s Plants   5      127  25.4     18.3
Nick’s Plants   5      122  24.4     16.3

ANOVA
Source of Variation  SS        df  MS        F         P-value   F crit
Between Groups       4.133333  2   2.066667  0.133909  0.875958  3.885294
Within Groups        185.2     12  15.43333
Total                189.3333  14

So the p-value[latex]=0.8760[/latex].

Because p-value[latex]=0.8760 \gt 0.05=\alpha[/latex], we do not reject the null hypothesis.  At the 5% significance level there is not enough evidence to suggest that there is a difference in the mean heights of the plants grown in the three media.

  • The null hypothesis [latex]\mu_1=\mu_2=\mu_3[/latex] is the claim that the mean heights of the plants grown in the three different media are all equal.
  • The p-value of 0.8760 is a large probability compared to the significance level, and so the sample results are likely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is reasonable, and so the conclusion of the test is to not reject the null hypothesis.  In other words, the sample data are consistent with the population means all being equal.

A statistics professor wants to study the average GPA of students in four different programs: marketing, management, accounting, and human resources.  The professor took a random sample of GPAs of students in those programs at the end of the past semester.  The data is recorded in the table below.

Marketing | Management | Accounting | Human Resources
2.17 | 2.63 | 3.21 | 3.27
1.85 | 1.77 | 3.78 | 3.45
2.83 | 3.25 | 4.00 | 2.85
1.69 | 1.86 | 2.95 | 2.26
3.33 | 2.21 | 2.65 | 3.18

Assume the GPAs of the students are normally distributed and have equal variance.  At the 5% significance level, is there a difference in the average GPA of the students in the different programs?

Let marketing be population 1, let management be population 2, let accounting be population 3, and let human resources be population 4.

[latex]\begin{eqnarray*} H_0: & & \mu_1=\mu_2=\mu_3=\mu_4\\   H_a: & & \mbox{at least one population mean is different from the others} \end{eqnarray*}[/latex]

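Groups | Count | Sum | Average | Variance
Marketing | 5 | 11.87 | 2.374 | 0.47648
Management | 5 | 11.72 | 2.344 | 0.37108
Accounting | 5 | 16.59 | 3.318 | 0.31797
Human Resources | 5 | 15.01 | 3.002 | 0.21947

Source of Variation | SS | df | MS | F | p-value | F crit
Between Groups | 3.459895 | 3 | 1.153298 | 3.330826 | 0.046214 | 3.238872
Within Groups | 5.54 | 16 | 0.34625 | | |
Total | 8.999895 | 19 | | | |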

So the p-value[latex]=0.0462[/latex].

Because p-value[latex]=0.0462 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 5% significance level there is enough evidence to suggest that there is a difference in the average GPA of the students in the different programs.
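The F-score in this example can also be recovered from the group summaries alone (count, average, variance), without the raw GPAs. The sketch below is one way to do that in Python; the variable names are hypothetical, and the decision uses the critical value 3.238872 reported in the summary table rather than computing a p-value.

```python
# Summary rows from the table: (count, average, sample variance)
summary = {
    "Marketing": (5, 2.374, 0.47648),
    "Management": (5, 2.344, 0.37108),
    "Accounting": (5, 3.318, 0.31797),
    "Human Resources": (5, 3.002, 0.21947),
}
F_CRIT = 3.238872  # critical value reported in the table (alpha = 0.05, df = 3 and 16)

n = sum(count for count, _, _ in summary.values())
k = len(summary)
grand_mean = sum(count * avg for count, avg, _ in summary.values()) / n

# Between-groups sum of squares from the group averages
ss_between = sum(count * (avg - grand_mean) ** 2 for count, avg, _ in summary.values())
# Within-groups sum of squares from the group variances
ss_within = sum((count - 1) * var for count, _, var in summary.values())

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n - k)
f_stat = ms_between / ms_within
reject_null = f_stat > F_CRIT
print(f"F = {f_stat:.4f}, F crit = {F_CRIT}, reject H0: {reject_null}")
```

Comparing F with the critical value leads to the same decision as comparing the p-value with the significance level, because both describe the same right-tail area of the F-distribution.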

A manufacturing company runs three different production lines to produce one of its products.  The company wants to know if the average production rate is the same for the three lines.  For each production line, a sample of eight hour shifts was taken and the number of items produced during each shift was recorded in the table below.

Line 1 | Line 2 | Line 3
35 | 21 | 31
35 | 36 | 34
36 | 22 | 24
39 | 38 | 21
37 | 28 | 27
36 | 34 | 29
31 | 35 | 33
38 | 39 | 20
33 | 40 | 24

Assume the numbers of items produced on each line during an eight hour shift are normally distributed and have equal variance.  At the 1% significance level, is there a difference in the average production rate for the three lines?

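Let Line 1 be population 1, let Line 2 be population 2, and let Line 3 be population 3.

[latex]\begin{eqnarray*} H_0: & & \mu_1=\mu_2=\mu_3 \\   H_a: & & \mbox{at least one population mean is different from the others} \end{eqnarray*}[/latex]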

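Groups | Count | Sum | Average | Variance
Line 1 | 9 | 320 | 35.55556 | 6.027778
Line 2 | 9 | 293 | 32.55556 | 51.52778
Line 3 | 9 | 243 | 27 | 26

Source of Variation | SS | df | MS | F | p-value | F crit
Between Groups | 339.1852 | 2 | 169.5926 | 6.089096 | 0.007264 | 5.613591
Within Groups | 668.4444 | 24 | 27.85185 | | |
Total | 1007.63 | 26 | | | |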

So the p-value[latex]=0.0073[/latex].

Because p-value[latex]=0.0073 \lt 0.01=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 1% significance level there is enough evidence to suggest that there is a difference in the average production rate of the three lines.

Concept Review

A one-way ANOVA hypothesis test determines if several population means are equal.  In order to conduct a one-way ANOVA test, the following assumptions must be met:

  • Each population from which a sample is taken is assumed to be normal.
  • All samples are randomly selected and independent.

The analysis of variance procedure compares the variation between groups [latex]MST[/latex] to the variation within groups [latex]MSE[/latex]. The ratio of these two estimates of variance is the [latex]F[/latex]-score from an [latex]F[/latex]-distribution with [latex]df_1=k-1[/latex] and [latex]df_2=n-k[/latex].  The p-value for the test is the area in the right tail of the [latex]F[/latex]-distribution.  The statistics used in an ANOVA test are summarized in the ANOVA summary table generated by Excel.
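As a worked instance of this ratio, the F-score can be written out with the sums of squares from the bean plant example. The symbols [latex]SS_{between}[/latex] and [latex]SS_{within}[/latex] are labels assumed here for the SS entries in the Between Groups and Within Groups rows of the summary table; they are not notation used elsewhere in this text.

[latex]\begin{eqnarray*} F & = & \frac{MST}{MSE}=\frac{SS_{between}/(k-1)}{SS_{within}/(n-k)} \\ & = & \frac{4.133333/2}{185.2/12}=\frac{2.066667}{15.43333}\approx 0.1339 \end{eqnarray*}[/latex]

Here [latex]k=3[/latex] groups and [latex]n=15[/latex] plants give [latex]df_1=2[/latex] and [latex]df_2=12[/latex], matching the df column of that table.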

The one-way ANOVA hypothesis test for three or more population means is a well established process:

  • Write down the null and alternative hypotheses in terms of the population means.  The null hypothesis is the claim that the population means are all equal and the alternative hypothesis is the claim that at least one of the population means is different from the others.
  • Collect the sample information for the test and identify the significance level.
  • The p-value is the area in the right tail of the [latex]F[/latex]-distribution.  Use the ANOVA summary table generated by Excel to find the p-value.
  • Compare the p-value to the significance level and state the outcome of the test.


" 13.1   One-Way ANOVA "  and " 13.2   The F Distribution and the F-Ratio " in Introductory Statistics by OpenStax  is licensed under a  Creative Commons Attribution 4.0 International License .

Introduction to Statistics Copyright © 2022 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.

Choosing the Right Statistical Test | Types & Examples

Published on January 28, 2020 by Rebecca Bevans. Revised on June 22, 2023.

Statistical tests are used in hypothesis testing. They can be used to:

  • determine whether a predictor variable has a statistically significant relationship with an outcome variable.
  • estimate the difference between two or more groups.

Statistical tests assume a null hypothesis of no relationship or no difference between groups. Then they determine whether the observed data fall outside of the range of values predicted by the null hypothesis.

If you already know what types of variables you’re dealing with, you can use the flowchart to choose the right statistical test for your data.

Statistical tests flowchart

Table of contents

  • What does a statistical test do?
  • When to perform a statistical test
  • Choosing a parametric test: regression, comparison, or correlation
  • Choosing a nonparametric test
  • Flowchart: choosing a statistical test
  • Other interesting articles
  • Frequently asked questions about statistical tests

Statistical tests work by calculating a test statistic – a number that describes how much the relationship between variables in your test differs from the null hypothesis of no relationship.

It then calculates a p-value (probability value). The p-value estimates how likely it is that you would see a difference at least as large as the one described by the test statistic if the null hypothesis of no relationship were true.

If the value of the test statistic is more extreme than the critical value expected under the null hypothesis (equivalently, if the p-value falls below the chosen significance level), then you can infer a statistically significant relationship between the predictor and outcome variables.

If the value of the test statistic is less extreme than that critical value, then you cannot infer a statistically significant relationship between the predictor and outcome variables.

You can perform statistical tests on data that have been collected in a statistically valid manner – either through an experiment, or through observations made using probability sampling methods.

For a statistical test to be valid, your sample size needs to be large enough to approximate the true distribution of the population being studied.

To determine which statistical test to use, you need to know:

  • whether your data meets certain assumptions.
  • the types of variables that you’re dealing with.

Statistical assumptions

Statistical tests make some common assumptions about the data they are testing:

  • Independence of observations (a.k.a. no autocorrelation): The observations/variables you include in your test are not related (for example, multiple measurements of a single test subject are not independent, while measurements of multiple different test subjects are independent).
  • Homogeneity of variance: the variance within each group being compared is similar among all groups. If one group has much more variation than others, it will limit the test’s effectiveness.
  • Normality of data: the data follows a normal distribution (a.k.a. a bell curve). This assumption applies only to quantitative data.

If your data do not meet the assumptions of normality or homogeneity of variance, you may be able to perform a nonparametric statistical test, which allows you to make comparisons without any assumptions about the data distribution.

If your data do not meet the assumption of independence of observations, you may be able to use a test that accounts for structure in your data (repeated-measures tests or tests that include blocking variables).

Types of variables

The types of variables you have usually determine what type of statistical test you can use.

Quantitative variables represent amounts of things (e.g. the number of trees in a forest). Types of quantitative variables include:

  • Continuous (aka ratio variables): represent measures and can usually be divided into units smaller than one (e.g. 0.75 grams).
  • Discrete (aka integer variables): represent counts and usually can’t be divided into units smaller than one (e.g. 1 tree).

Categorical variables represent groupings of things (e.g. the different tree species in a forest). Types of categorical variables include:

  • Ordinal : represent data with an order (e.g. rankings).
  • Nominal : represent group names (e.g. brands or species names).
  • Binary : represent data with a yes/no or 1/0 outcome (e.g. win or lose).

Choose the test that fits the types of predictor and outcome variables you have collected (if you are doing an experiment, these are the independent and dependent variables). Consult the tables below to see which test best matches your variables.

Parametric tests usually have stricter requirements than nonparametric tests, and are able to make stronger inferences from the data. They can only be conducted with data that adheres to the common assumptions of statistical tests.

The most common types of parametric test include regression tests, comparison tests, and correlation tests.

Regression tests

Regression tests look for cause-and-effect relationships. They can be used to estimate the effect of one or more continuous variables on another variable.

Test | Research question example
Simple linear regression | What is the effect of income on longevity?
Multiple linear regression | What is the effect of income and minutes of exercise per day on longevity?
Logistic regression | What is the effect of drug dosage on the survival of a test subject?

Comparison tests

Comparison tests look for differences among group means. They can be used to test the effect of a categorical variable on the mean value of some other characteristic.

T-tests are used when comparing the means of precisely two groups (e.g., the average heights of men and women). ANOVA and MANOVA tests are used when comparing the means of more than two groups (e.g., the average heights of children, teenagers, and adults).

Test | Research question example
Paired t-test | What is the effect of two different test prep programs on the average exam scores for students from the same class?
Independent t-test | What is the difference in average exam scores for students from two different schools?
ANOVA | What is the difference in average pain levels among post-surgical patients given three different painkillers?
MANOVA | What is the effect of flower species on petal length, petal width, and stem length?

Correlation tests

Correlation tests check whether variables are related without hypothesizing a cause-and-effect relationship.

These can be used to test whether two variables you want to use in (for example) a multiple regression test are correlated with each other.

Test | Research question example
Pearson’s r | How are latitude and temperature related?

Non-parametric tests don’t make as many assumptions about the data, and are useful when one or more of the common statistical assumptions are violated. However, the inferences they make aren’t as strong as with parametric tests.

Test | Use in place of…
Sign test | One-sample t-test
Kruskal–Wallis H | ANOVA
Wilcoxon Rank-Sum test | Independent t-test
Wilcoxon Signed-rank test | Paired t-test

This flowchart helps you choose among parametric tests. For nonparametric alternatives, check the table above.

Choosing the right statistical test

Statistical tests commonly assume that:

  • the data are normally distributed
  • the groups that are being compared have similar variance
  • the data are independent

If your data do not meet these assumptions, you might still be able to use a nonparametric statistical test, which has fewer requirements but also makes weaker inferences.

A test statistic is a number calculated by a statistical test. It describes how far your observed data are from the null hypothesis of no relationship between variables or no difference among sample groups.

The test statistic tells you how different two or more groups are from the overall population mean, or how different a linear slope is from the slope predicted by a null hypothesis. Different test statistics are used in different statistical tests.

Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test. Significance is usually denoted by a p-value, or probability value.

Statistical significance is arbitrary – it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that data at least this extreme would occur less than 5% of the time under the null hypothesis.

When the p-value falls below the chosen alpha value, then we say the result of the test is statistically significant.

Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).

Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).

You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results.

Discrete and continuous variables are two types of quantitative variables:

  • Discrete variables represent counts (e.g. the number of objects in a collection).
  • Continuous variables represent measurable amounts (e.g. water volume or weight).

Compare t-test of difference in means of 3 samples

I am comparing the statistical significance of the difference in means (say average age) using three samples (say classes) a, b, and c.

The results of the t-tests show that there is no significant difference between the means of samples a and b, or between samples b and c (the average age of students in class a is not different from that in class b, and similarly for classes b and c). However, there is a significant difference between samples a and c.

From the first two results, we can conclude that a=b and b=c, which means that a=c. However, this contradicts the third result.

What is the best way of analyzing this?

– M_M

Comment from Glen_b (Nov 17, 2015 at 4:54): Failure to reject a null does not mean the null is actually true; it simply means your sample sizes were too small to detect whatever small difference there may have been. Imagine for example that $\mu_a<\mu_b<\mu_c$ but only the outer two sample means were far enough apart to detect a difference at the sample sizes you had.

2 Answers

I would make a recommendation that, rather than conducting three $t$-tests, you conduct an ANOVA test. This is a test designed to assess equality of means of three or more groups and, if memory serves correctly, requires the same assumptions as the $t$-test but for three or more groups.

In addition, your statement had a subtle flaw. The idea that two parameters are equal really means that we cannot detect a statistically significant difference between the means. Consider an example where you are evaluating the cost of a gallon of gasoline in three different cities.

City A has a cost of \$1.99 per gallon. City B has a cost of \$2.19 per gallon. City C has a cost of \$2.39 per gallon. Assume the standard deviation for each city is 11 cents (.11 dollars).

Thus A and B do not have statistically different means. B and C do not have statistically different means. However, A and C do have statistically different means. You could generate confidence intervals or execute three $t$-tests to confirm this.
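The three comparisons can be checked numerically. The sketch below is only an illustration under assumptions that the answer does not spell out: the 11 cents is treated as the standard error of each city's mean, the three estimates are treated as independent, and a two-sided normal approximation is used in place of a $t$-test.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

prices = {"A": 1.99, "B": 2.19, "C": 2.39}  # mean cost per gallon, from the example
STD_ERROR = 0.11  # assumption: 0.11 is the standard error of each city's mean
ALPHA = 0.05

results = {}
for city_1, city_2 in combinations(prices, 2):
    diff = prices[city_2] - prices[city_1]
    se_diff = sqrt(2) * STD_ERROR  # independent estimates: variances add
    z = diff / se_diff
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided normal approximation
    results[(city_1, city_2)] = p_value
    verdict = "significant" if p_value < ALPHA else "not significant"
    print(f"{city_1} vs {city_2}: z = {z:.2f}, p = {p_value:.3f} ({verdict})")
```

Under these assumptions only the A versus C comparison falls below the 5 percent threshold, which is the pattern described in the question.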

Does this make sense?

– Matt Brems

Let $a$, $b$, and $c$ denote the estimated means of group A, B, and C respectively. Let: $$V = \left[\begin{array}{ccc} v_{aa} & v_{ab}&v_{ac} \\v_{ba} & v_{bb}&v_{bc} \\v_{ca} & v_{cb}&v_{cc} \end{array} \right] $$ be the estimated covariance matrix of your estimates $[a, b, c]'$. The standard error for the estimate $b-a$ would be: $$SE_{b-a} = \sqrt{v_{bb} - 2 v_{ab} + v_{aa}}$$ The t-stat would be: $$ \frac{b - a}{\sqrt{v_{bb} - 2 v_{ab} + v_{aa}}}$$ A t-stat with absolute value greater than 2 roughly corresponds to significance at the 5 percent level, so a t-stat below 2 in absolute value is not significant.

$ \frac{c - b}{\sqrt{v_{cc} - 2 v_{bc} + v_{bb}}} < 2 \quad$ and $ \frac{b - a}{\sqrt{v_{bb} - 2 v_{ab} + v_{aa}}} < 2 \quad$ does not imply that $ \frac{c - a}{\sqrt{v_{cc} - 2 v_{ac} + v_{aa}}} < 2 \quad$

Hence it is possible that: $c-b$ is not significant at the 5 percent level, $b-a$ is not significant at the 5 percent level, but that $c-a$ is significant at the 5 percent level.
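A small numerical sketch of this point follows. The estimates and the covariance matrix `V` are hypothetical numbers chosen for illustration, and `t_stat_difference` is a hypothetical helper that applies the standard-error formula above.

```python
from math import sqrt

def t_stat_difference(estimates, cov, i, j):
    """t-statistic for estimates[j] - estimates[i], given the covariance matrix."""
    se = sqrt(cov[j][j] - 2 * cov[i][j] + cov[i][i])
    return (estimates[j] - estimates[i]) / se

# Hypothetical estimates [a, b, c] and covariance matrix V, for illustration only.
estimates = [10.0, 11.5, 13.0]
V = [
    [0.50, 0.10, 0.10],
    [0.10, 0.50, 0.10],
    [0.10, 0.10, 0.50],
]

t_ba = t_stat_difference(estimates, V, 0, 1)
t_cb = t_stat_difference(estimates, V, 1, 2)
t_ca = t_stat_difference(estimates, V, 0, 2)
print(f"b - a: t = {t_ba:.2f}")
print(f"c - b: t = {t_cb:.2f}")
print(f"c - a: t = {t_ca:.2f}")
```

With these numbers the two adjacent differences have t-stats below 2 while the difference between the outer groups exceeds 2, so non-significance is not transitive.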

– Matthew Gunn

Mathematics LibreTexts

8.6: Hypothesis Test of a Single Population Mean with Examples

Steps for performing Hypothesis Test of a Single Population Mean

Step 1: State your hypotheses about the population mean.

Step 2: Summarize the data. State a significance level. State and check the conditions required for the procedure.

  • Find or identify the sample size, n, the sample mean, \(\bar{x}\), and the sample standard deviation, s.

The sampling distribution of the one-mean test statistic is approximately a t-distribution with \(n-1\) degrees of freedom if the following conditions are met:

  • The sample is random with independent observations.
  • The population is Normal, or the sample is large (sample size at least 30).

Step 3: Perform the procedure based on the assumption that \(H_{0}\) is true

  • Find the Estimated Standard Error: \(SE=\frac{s}{\sqrt{n}}\).
  • Compute the observed value of the test statistic: \(T_{obs}=\frac{\bar{x}-\mu_{0}}{SE}\).
  • Check the type of the test (right-, left-, or two-tailed)
  • Find the p-value in order to measure your level of surprise.

Step 4: Make a decision about \(H_{0}\) and \(H_{a}\)

  • Do you reject or not reject your null hypothesis?

Step 5: Make a conclusion

  • What does this mean in the context of the data?
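The arithmetic in Steps 2 and 3 can be sketched in a few lines of Python. The sample values and the hypothesized mean below are hypothetical, and `one_mean_t_statistic` is an illustrative helper name rather than a library function. Finding the p-value additionally requires the t-distribution with \(n-1\) degrees of freedom, which this sketch leaves to a table or calculator.

```python
from math import sqrt
from statistics import mean, stdev

def one_mean_t_statistic(sample, mu_0):
    """Return (n, x_bar, s, SE, T_obs) for a one-mean t-test of H0: mu = mu_0."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)            # sample standard deviation (n - 1 denominator)
    se = s / sqrt(n)             # estimated standard error
    t_obs = (x_bar - mu_0) / se  # observed test statistic
    return n, x_bar, s, se, t_obs

# Hypothetical measurements, for illustration only.
sample = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.2, 5.0]
n, x_bar, s, se, t_obs = one_mean_t_statistic(sample, mu_0=5.0)
print(f"n = {n}, mean = {x_bar:.4f}, s = {s:.4f}, SE = {se:.4f}, T_obs = {t_obs:.4f}")
```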

The following examples illustrate a left-, right-, and two-tailed test.

Example \(\PageIndex{1}\)

\(H_{0}: \mu = 5, H_{a}: \mu < 5\)

Test of a single population mean. \(H_{a}\) tells you the test is left-tailed. The picture of the \(p\)-value is as follows:

Normal distribution curve of a single population mean with a value of 5 on the x-axis and the p-value points to the area on the left tail of the curve.

Exercise \(\PageIndex{1}\)

\(H_{0}: \mu = 10, H_{a}: \mu < 10\)

Assume the \(p\)-value is 0.0935. What type of test is this? Draw the picture of the \(p\)-value.

left-tailed test


Example \(\PageIndex{2}\)

\(H_{0}: p \leq 0.2, H_{a}: p > 0.2\)

This is a test of a single population proportion. \(H_{a}\) tells you the test is right-tailed. The picture of the \(p\)-value is as follows:

Normal distribution curve of a single population proportion with the value of 0.2 on the x-axis. The p-value points to the area on the right tail of the curve.

Exercise \(\PageIndex{2}\)

\(H_{0}: \mu \leq 1, H_{a}: \mu > 1\)

Assume the \(p\)-value is 0.1243. What type of test is this? Draw the picture of the \(p\)-value.

right-tailed test


Example \(\PageIndex{3}\)

\(H_{0}: \mu = 50, H_{a}: \mu \neq 50\)

This is a test of a single population mean. \(H_{a}\) tells you the test is two-tailed. The picture of the \(p\)-value is as follows.

Normal distribution curve of a single population mean with a value of 50 on the x-axis. The p-value formulas, 1/2(p-value), for a two-tailed test is shown for the areas on the left and right tails of the curve.

Exercise \(\PageIndex{3}\)

\(H_{0}: \mu = 0.5, H_{a}: \mu \neq 0.5\)

Assume the \(p\)-value is 0.2564. What type of test is this? Draw the picture of the \(p\)-value.

two-tailed test
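The three pictures differ only in which tail area is counted. As a minimal sketch, suppose the cumulative probability \(P(T \leq t_{obs})\) under \(H_{0}\) is already known from a table or software; the hypothetical helper `p_value_from_cdf` then turns it into a p-value for each type of test, assuming a symmetric distribution for the two-tailed case.

```python
def p_value_from_cdf(cdf_at_statistic, tail):
    """Convert P(T <= t_obs) under H0 into a p-value for the given tail type."""
    if tail == "left":    # H_a: mu < mu_0
        return cdf_at_statistic
    if tail == "right":   # H_a: mu > mu_0
        return 1 - cdf_at_statistic
    if tail == "two":     # H_a: mu != mu_0, half of the p-value in each tail
        return 2 * min(cdf_at_statistic, 1 - cdf_at_statistic)
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Hypothetical cumulative probability for an observed statistic.
cdf_value = 0.90
for tail in ("left", "right", "two"):
    print(tail, round(p_value_from_cdf(cdf_value, tail), 4))
```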


Full Hypothesis Test Examples

Example \(\PageIndex{4}\)

Statistics students believe that the mean score on the first statistics test is 65. A statistics instructor thinks the mean score is higher than 65. He samples ten statistics students and obtains the scores 65 65 70 67 66 63 63 68 72 71. He performs a hypothesis test using a 5% level of significance. The data are assumed to be from a normal distribution.

Set up the hypothesis test:

A 5% level of significance means that \(\alpha = 0.05\). This is a test of a single population mean.

\(H_{0}: \mu = 65, H_{a}: \mu > 65\)

Since the instructor thinks the average score is higher, use a "\(>\)". The "\(>\)" means the test is right-tailed.

Determine the distribution needed:

Random variable: \(\bar{X} =\) average score on the first statistics test.

Distribution for the test: If you read the problem carefully, you will notice that there is no population standard deviation given. You are only given \(n = 10\) sample data values. Notice also that the data come from a normal distribution. This means that the distribution for the test is a Student's \(t\).

Use \(t_{df}\). Therefore, the distribution for the test is \(t_{9}\) where \(n = 10\) and \(df = 10 - 1 = 9\).

The sample mean and sample standard deviation are calculated as 67 and 3.1972 from the data.

Calculate the \(p\)-value using the Student's \(t\)-distribution:

\[t_{obs} = \dfrac{\bar{x}-\mu_{0}}{\left(\dfrac{s}{\sqrt{n}}\right)}=\dfrac{67-65}{\left(\dfrac{3.1972}{\sqrt{10}}\right)} \approx 1.9781\]

Use the T-table or Excel's T.DIST() function to find the p-value:

\(p\text{-value} = P(\bar{x} > 67) =P(T >1.9781 )= 1-0.9604=0.0396\)

Interpretation of the p-value: If the null hypothesis is true, then there is a 0.0396 probability (3.96%) that the sample mean is 67 or more.

Normal distribution curve of average scores on the first statistic tests with 65 and 67 values on the x-axis. A vertical upward line extends from 67 to the curve. The p-value points to the area to the right of 67.

Compare \(\alpha\) and the \(p-\text{value}\):

Since \(\alpha = 0.05\) and \(p\text{-value} = 0.0396\), \(\alpha > p\text{-value}\).

Make a decision: Since \(\alpha > p\text{-value}\), reject \(H_{0}\).

This means you reject \(\mu = 65\). In other words, you believe the average test score is more than 65.

Conclusion: At a 5% level of significance, the sample data show sufficient evidence that the mean (average) test score is more than 65, just as the statistics instructor thinks.

The \(p\text{-value}\) can easily be calculated on a TI-83/84 calculator.

Put the data into a list. Press STAT and arrow over to TESTS. Press 2:T-Test. Arrow over to Data and press ENTER. Arrow down and enter 65 for \(\mu_{0}\), the name of the list where you put the data, and 1 for Freq:. Arrow down to \(\mu\): and arrow over to \(> \mu_{0}\). Press ENTER. Arrow down to Calculate and press ENTER. The calculator not only calculates the \(p\text{-value}\) (p = 0.0396) but it also calculates the test statistic (t-score) for the sample mean, the sample mean, and the sample standard deviation. \(\mu > 65\) is the alternative hypothesis. Do this set of instructions again except arrow to Draw (instead of Calculate). Press ENTER. A shaded graph appears with \(t = 1.9781\) (test statistic) and \(p = 0.0396\) (\(p\text{-value}\)). Make sure when you use Draw that no other equations are highlighted in \(Y =\) and the plots are turned off.
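The same test can be sketched in Python with only the standard library. Because the standard library has no t-distribution, the right-tail area is approximated here by numerically integrating the Student's \(t\) density with Simpson's rule; the helper names `t_pdf` and `t_right_tail` are hypothetical, and this is one possible approach rather than the method used by the calculator or Excel.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_right_tail(t_obs, df, steps=2000):
    """P(T > t_obs) for t_obs >= 0, using Simpson's rule on [0, t_obs]."""
    h = t_obs / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t_obs, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the area lies to the right of 0; subtract the part up to t_obs.
    return 0.5 - total * h / 3

scores = [65, 65, 70, 67, 66, 63, 63, 68, 72, 71]
mu_0 = 65
n = len(scores)
se = stdev(scores) / sqrt(n)          # estimated standard error
t_obs = (mean(scores) - mu_0) / se    # observed test statistic
p_value = t_right_tail(t_obs, df=n - 1)
print(f"t = {t_obs:.4f}, p-value = {p_value:.4f}")
```

The result should agree with the calculator output quoted above (a test statistic near 1.9781 and a p-value near 0.0396) up to rounding.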

Exercise \(\PageIndex{4}\)

It is believed that a stock price for a particular company will grow at a rate of $5 per week with a standard deviation of $1. An investor believes the stock won’t grow as quickly. The changes in stock price are recorded for ten weeks and are as follows: $4, $3, $2, $3, $1, $7, $2, $1, $1, $2. Perform a hypothesis test using a 5% level of significance. State the null and alternative hypotheses, find the p-value, state your conclusion, and identify the Type I and Type II errors.

  • \(H_{0}: \mu = 5\)
  • \(H_{a}: \mu < 5\)
  • \(p = 0.0082\)

Because \(p < \alpha\), we reject the null hypothesis. There is sufficient evidence to suggest that the stock price of the company grows at a rate less than $5 a week.

  • Type I Error: To conclude that the stock price is growing slower than $5 a week when, in fact, the stock price is growing at $5 a week (reject the null hypothesis when the null hypothesis is true).
  • Type II Error: To conclude that the stock price is growing at a rate of $5 a week when, in fact, the stock price is growing slower than $5 a week (do not reject the null hypothesis when the null hypothesis is false).

Example \(\PageIndex{5}\)

The National Institute of Standards and Technology provides exact data on conductivity properties of materials. Following are conductivity measurements for 11 randomly selected pieces of a particular type of glass.

1.11; 1.07; 1.11; 1.07; 1.12; 1.08; .98; .98; 1.02; .95; .95

Is there convincing evidence that the average conductivity of this type of glass is greater than one? Use a significance level of 0.05. Assume the population is normal.

Let’s follow a four-step process to answer this statistical question.

1. State the Hypotheses :
  • \(H_{0}: \mu \leq 1\)
  • \(H_{a}: \mu > 1\)

2. Plan : We are testing a sample mean without a known population standard deviation. Therefore, we need to use a Student's-t distribution. Assume the underlying population is normal.

3. Do the calculations : \(p\text{-value} = 0.036\)

4. State the Conclusions : Since the \(p\text{-value} (= 0.036)\) is less than our alpha value, we will reject the null hypothesis. It is reasonable to state that the data supports the claim that the average conductivity level is greater than one.
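The calculation step can be sketched with a critical-value comparison instead of a p-value. The code below computes the test statistic from the eleven measurements and compares it with 1.812, the one-tailed Student's \(t\) table value for a 0.05 significance level and 10 degrees of freedom. That table value is an assumption brought in for this sketch; it does not appear in the example itself.

```python
from math import sqrt
from statistics import mean, stdev

conductivity = [1.11, 1.07, 1.11, 1.07, 1.12, 1.08, 0.98, 0.98, 1.02, 0.95, 0.95]
MU_0 = 1.0
T_CRIT = 1.812  # assumption: one-tailed t-table value for alpha = 0.05, df = 10

n = len(conductivity)
x_bar = mean(conductivity)
se = stdev(conductivity) / sqrt(n)  # estimated standard error
t_obs = (x_bar - MU_0) / se         # observed test statistic
reject_null = t_obs > T_CRIT        # right-tailed test
print(f"mean = {x_bar:.4f}, t = {t_obs:.4f}, reject H0: {reject_null}")
```

A test statistic above the critical value corresponds to a p-value below 0.05, so this comparison leads to the same decision as the p-value quoted in the example.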

The hypothesis test itself has an established process. This can be summarized as follows:

  • Determine \(H_{0}\) and \(H_{a}\). Remember, they are contradictory.
  • Determine the random variable.
  • Determine the distribution for the test.
  • Draw a graph, calculate the test statistic, and use the test statistic to calculate the \(p\text{-value}\). (A t-score is an example of a test statistic.)
  • Compare the preconceived \(\alpha\) with the p-value, make a decision (reject or do not reject \(H_{0}\)), and write a clear conclusion using English sentences.

Notice that in performing the hypothesis test, you use \(\alpha\) and not \(\beta\). \(\beta\) is needed to help determine the sample size of the data that is used in calculating the \(p\text{-value}\). Remember that the quantity \(1 - \beta\) is called the Power of the Test. A high power is desirable. If the power is too low, statisticians typically increase the sample size while keeping \(\alpha\) the same. If the power is low, the null hypothesis might not be rejected when it should be.

Chapter 3: Hypothesis Testing

The previous two chapters introduced methods for organizing and summarizing sample data, and using sample statistics to estimate population parameters. This chapter introduces the next major topic of inferential statistics: hypothesis testing.

A hypothesis is a statement or claim about a property of a population.

The Fundamentals of Hypothesis Testing

When conducting scientific research, typically there is some known information, perhaps from some past work or from a long accepted idea. We want to test whether this claim is believable. This is the basic idea behind a hypothesis test:

  • State what we think is true.
  • Quantify how confident we are about our claim.
  • Use sample statistics to make inferences about population parameters.

For example, past research tells us that the average life span for a hummingbird is about four years. You have been studying the hummingbirds in the southeastern United States and find a sample mean lifespan of 4.8 years. Should you reject the known or accepted information in favor of your results? How confident are you in your estimate? At what point would you say that there is enough evidence to reject the known information and support your alternative claim? How far from the known mean of four years can the sample mean be before we reject the idea that the average lifespan of a hummingbird is four years?

Hypothesis testing is a procedure, based on sample evidence and probability, used to test claims regarding a characteristic of a population.

A hypothesis is a claim or statement about a characteristic of a population of interest to us. A hypothesis test is a way for us to use our sample statistics to test a specific claim.

The population mean weight is known to be 157 lb. We want to test the claim that the mean weight has increased.

Two years ago, the proportion of infected plants was 37%. We believe that a treatment has helped, and we want to test the claim that there has been a reduction in the proportion of infected plants.

Components of a Formal Hypothesis Test

The null hypothesis is a statement about the value of a population parameter, such as the population mean (µ) or the population proportion (p). It contains the condition of equality and is denoted as H₀ (H-naught).

H₀: µ = 157 or H₀: p = 0.37

The alternative hypothesis is the claim to be tested, the opposite of the null hypothesis. It contains the value of the parameter that we consider plausible and is denoted as H₁.

H₁: µ > 157 or H₁: p < 0.37

The test statistic is a value computed from the sample data that is used in making a decision about the rejection of the null hypothesis. The test statistic converts the sample mean (x̄) or sample proportion (p̂) to a Z- or t-score under the assumption that the null hypothesis is true. It is used to decide whether the difference between the sample statistic and the hypothesized claim is significant.

The p-value is the area under the curve beyond the test statistic: to the left or right of it for a one-sided test, and in both tails for a two-sided test. It is compared to the level of significance (α).

The critical value is the value that defines the rejection zone (the test statistic values that would lead to rejection of the null hypothesis). It is defined by the level of significance.

The level of significance (α) is the probability that the test statistic will fall into the critical region when the null hypothesis is true. This level is set by the researcher.

The conclusion is the final decision of the hypothesis test. The conclusion must always be clearly stated, communicating the decision based on the components of the test. It is important to realize that we never prove or accept the null hypothesis. We are merely saying that the sample evidence is not strong enough to warrant the rejection of the null hypothesis. The conclusion is made up of two parts:

1) Reject or fail to reject the null hypothesis, and 2) there is or is not enough evidence to support the alternative claim.

Option 1) Reject the null hypothesis (H 0 ). This means that you have enough statistical evidence to support the alternative claim (H 1 ).

Option 2) Fail to reject the null hypothesis (H 0 ). This means that you do NOT have enough evidence to support the alternative claim (H 1 ).

Another way to think about hypothesis testing is to compare it to the US justice system. A defendant is innocent until proven guilty (Null hypothesis—innocent). The prosecuting attorney tries to prove that the defendant is guilty (Alternative hypothesis—guilty). There are two possible conclusions that the jury can reach. First, the defendant is guilty (Reject the null hypothesis). Second, the defendant is not guilty (Fail to reject the null hypothesis). This is NOT the same thing as saying the defendant is innocent! In the first case, the prosecutor had enough evidence to reject the null hypothesis (innocent) and support the alternative claim (guilty). In the second case, the prosecutor did NOT have enough evidence to reject the null hypothesis (innocent) and support the alternative claim of guilty.

The Null and Alternative Hypotheses

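There are three different pairs of null and alternative hypotheses:

  • H 0 : μ = c vs. H 1 : μ ≠ c (two-sided)
  • H 0 : μ = c vs. H 1 : μ > c (right-sided)
  • H 0 : μ = c vs. H 1 : μ < c (left-sided)

where c is some known value.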

A Two-sided Test

This tests whether the population parameter is equal to, versus not equal to, some specific value.

H o : μ = 12 vs. H 1 : μ ≠ 12

The critical region is divided equally into the two tails and the critical values are ± values that define the rejection zones.


A forester studying diameter growth of red pine believes that the mean diameter growth will be different if a fertilization treatment is applied to the stand.

  • H o : μ = 1.2 in./ year
  • H 1 : μ ≠ 1.2 in./ year

This is a two-sided question, as the forester doesn’t state whether population mean diameter growth will increase or decrease.

A Right-sided Test

This tests whether the population parameter is equal to, versus greater than, some specific value.

H o : μ = 12 vs. H 1 : μ > 12

The critical region is in the right tail and the critical value is a positive value that defines the rejection zone.


A biologist believes that there has been an increase in the mean number of lakes infected with milfoil, an invasive species, since the last study five years ago.

  • H o : μ = 15 lakes
  • H 1 : μ >15 lakes

This is a right-sided question, as the biologist believes that there has been an increase in population mean number of infected lakes.

A Left-sided Test

This tests whether the population parameter is equal to, versus less than, some specific value.

H o : μ = 12 vs. H 1 : μ < 12

The critical region is in the left tail and the critical value is a negative value that defines the rejection zone.


A scientist’s research indicates that there has been a change in the proportion of people who support certain environmental policies. He wants to test the claim that there has been a reduction in the proportion of people who support these policies.

  • H o : p = 0.57
  • H 1 : p < 0.57

This is a left-sided question, as the scientist believes that there has been a reduction in the true population proportion.

Statistically Significant

When the observed results (the sample statistics) are unlikely (a low probability) under the assumption that the null hypothesis is true, we say that the result is statistically significant, and we reject the null hypothesis. This result depends on the level of significance, the sample statistic, sample size, and whether it is a one- or two-sided alternative hypothesis.

Types of Errors

When testing, we arrive at a conclusion of rejecting the null hypothesis or failing to reject the null hypothesis. Such conclusions are sometimes correct and sometimes incorrect (even when we have followed all the correct procedures). We use incomplete sample data to reach a conclusion and there is always the possibility of reaching the wrong conclusion. There are four possible outcomes of a hypothesis test. Of the four, two are correct and two are NOT correct:

  • H 0 is true and we fail to reject it: correct decision.
  • H 0 is true and we reject it: Type I error.
  • H 0 is false and we reject it: correct decision.
  • H 0 is false and we fail to reject it: Type II error.


A Type I error is when we reject the null hypothesis when it is true. The symbol α (alpha) represents the probability of a Type I error. This is the same alpha we use as the level of significance. By setting alpha as low as reasonably possible, we try to control the Type I error through the level of significance.

A Type II error is when we fail to reject the null hypothesis when it is false. The symbol β (beta) represents the probability of a Type II error.

In general, Type I errors are considered more serious. One step in the hypothesis test procedure involves selecting the significance level ( α ), which is the probability of rejecting the null hypothesis when it is correct. So the researcher can select the level of significance that minimizes Type I errors. However, there is a mathematical relationship between α, β , and n (sample size).

  • As α increases, β decreases
  • As α decreases, β increases
  • As sample size (n) increases, β decreases for a fixed α , so both α and β can be made smaller

The natural inclination is to select the smallest possible value for α, thinking to minimize the possibility of causing a Type I error. Unfortunately, this forces an increase in Type II errors. By making the rejection zone too small, you may fail to reject the null hypothesis, when, in fact, it is false. Typically, we select the best sample size and level of significance, automatically setting β .


Power of the Test

The probability of a Type II error, β , is the probability of failing to reject a false null hypothesis. It follows that 1 – β is the probability of rejecting a false null hypothesis. This probability is identified as the power of the test, and is often used to gauge the test’s effectiveness in recognizing that a null hypothesis is false.

The probability that a significance test at a fixed level α will reject H 0 when a particular alternative value of the parameter is true is called the power of the test.

Power is also directly linked to sample size. For example, suppose the null hypothesis is that the mean fish weight is 8.7 lb. Given sample data, a level of significance of 5%, and an alternative weight of 9.2 lb., we can compute the power of the test to reject μ = 8.7 lb. If we have a small sample size, the power will be low. However, increasing the sample size will increase the power of the test. Increasing the level of significance will also increase power. A 5% test of significance will have a greater chance of rejecting the null hypothesis than a 1% test because the strength of evidence required for the rejection is less. Decreasing the standard deviation has the same effect as increasing the sample size: there is more information about μ .
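The relationships described above can be checked numerically. The following is a minimal sketch for a right-sided Z-test of H 0 : μ = 8.7 lb against the alternative value of 9.2 lb. The text gives no standard deviation or sample sizes for the fish example, so σ = 1.5 lb and the sample sizes below are assumed values chosen only for illustration, and `power_right_sided` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power_right_sided(mu0, mu_alt, sigma, n, alpha):
    """Power of a right-sided Z-test of H0: mu = mu0 when the true mean is mu_alt."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)       # critical value Z_alpha
    shift = (mu_alt - mu0) / (sigma / sqrt(n))   # true mean in standard-error units
    beta = STD_NORMAL.cdf(z_crit - shift)        # P(fail to reject H0 | mu = mu_alt)
    return 1 - beta


# sigma = 1.5 lb and the sample sizes are assumptions, not values from the text.
for n in (10, 30, 100):
    for alpha in (0.01, 0.05):
        power = power_right_sided(8.7, 9.2, sigma=1.5, n=n, alpha=alpha)
        print(f"n = {n:3d}  alpha = {alpha:.2f}  power = {power:.3f}")
```

Under these assumptions, the printed power rises with the sample size and with the level of significance, which is the pattern the paragraph describes.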

Hypothesis Test about the Population Mean ( μ ) when the Population Standard Deviation ( σ ) is Known

We are going to examine two equivalent ways to perform a hypothesis test: the classical approach and the p-value approach. The classical approach is based on standard deviations. This method compares the test statistic (Z-score) to a critical value (Z-score) from the standard normal table. If the test statistic falls in the rejection zone, you reject the null hypothesis. The p-value approach is based on area under the normal curve. This method compares the area associated with the test statistic to alpha ( α ), the level of significance (which is also area under the normal curve). If the p-value is less than alpha, you would reject the null hypothesis.

As a past student poetically said: If the p-value is a wee value, Reject Ho

Both methods must have:

  • Data from a random sample.
  • Verification of the assumption of normality.
  • A null and alternative hypothesis.
  • A criterion that determines if we reject or fail to reject the null hypothesis.
  • A conclusion that answers the question.

There are four steps required for a hypothesis test:

  • State the null and alternative hypotheses.
  • State the level of significance and the critical value.
  • Compute the test statistic.
  • State a conclusion.

The Classical Method for Testing a Claim about the Population Mean ( μ ) when the Population Standard Deviation ( σ ) is Known

A forester studying diameter growth of red pine believes that the mean diameter growth will be different from the known mean growth of 1.35 inches/year if a fertilization treatment is applied to the stand. He conducts his experiment, collects data from a sample of 32 plots, and gets a sample mean diameter growth of 1.6 in./year. The population standard deviation for this stand is known to be 0.46 in./year. Does he have enough evidence to support his claim?

Step 1) State the null and alternative hypotheses.

  • H o : μ = 1.35 in./year
  • H 1 : μ ≠ 1.35 in./year

Step 2) State the level of significance and the critical value.

  • We will choose a level of significance of 5% ( α = 0.05).
  • For a two-sided question, we need a two-sided critical value – Z α /2 and + Z α /2 .
  • The level of significance is divided by 2 (since we are only testing “not equal”). We must have two rejection zones that can deal with either a greater than or less than outcome (to the right (+) or to the left (-)).
  • We need to find the Z-score associated with the area of 0.025. The red areas are equal to α /2 = 0.05/2 = 0.025 or 2.5% of the area under the normal curve.
  • Go into the body of values and find the negative Z-score associated with the area 0.025.


  • The negative critical value is -1.96. Since the curve is symmetric, we know that the positive critical value is 1.96.
  • ±1.96 are the critical values. These values set up the rejection zone. If the test statistic falls within these red rejection zones, we reject the null hypothesis.

Step 3) Compute the test statistic.

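  • The test statistic is the number of standard deviations the sample mean is from the known mean. It is also a Z-score, just like the critical value.

\[ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]

  • For this problem, the test statistic is

\[ z = \frac{1.6 - 1.35}{0.46 / \sqrt{32}} = 3.07 \]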


Step 4) State a conclusion.

  • Compare the test statistic to the critical value. If the test statistic falls into the rejection zones, reject the null hypothesis. In other words, if the test statistic is greater than +1.96 or less than -1.96, reject the null hypothesis.


In this problem, the test statistic falls in the red rejection zone. The test statistic of 3.07 is greater than the critical value of 1.96. We will reject the null hypothesis. We have enough evidence to support the claim that the mean diameter growth is different from (not equal to) 1.35 in./year.
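The four steps of the classical method can be sketched in a few lines of Python for the red pine example. This is an illustrative sketch using only the standard library, with `NormalDist().inv_cdf` standing in for the standard normal table lookup. The variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Values from the red pine example
mu0, x_bar, sigma, n, alpha = 1.35, 1.6, 0.46, 32, 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))          # Step 3: test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # Step 2: positive two-sided critical value
reject = abs(z) > z_crit                       # Step 4: is z in either rejection zone?

print(f"z = {z:.2f}, critical values = ±{z_crit:.2f}, reject H0: {reject}")
```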

A researcher believes that there has been an increase in the average farm size in his state since the last study five years ago. The previous study reported a mean size of 450 acres with a population standard deviation ( σ ) of 167 acres. He samples 45 farms and gets a sample mean of 485.8 acres. Is there enough information to support his claim?

  • H o : μ = 450 acres
  • H 1 : μ >450 acres
  • For a one-sided question, we need a one-sided positive critical value Z α .
  • The level of significance is all in the right side (the rejection zone is just on the right side).
  • We need to find the Z-score associated with the 5% area in the right tail.


  • Go into the body of values in the standard normal table and find the Z-score that separates the lower 95% from the upper 5%.
  • The critical value is 1.645. This value sets up the rejection zone.
  • The test statistic is

\[ z = \frac{485.8 - 450}{167 / \sqrt{45}} = 1.44 \]

  • Compare the test statistic to the critical value.


  • The test statistic does not fall in the rejection zone. It is less than the critical value.

We fail to reject the null hypothesis. We do not have enough evidence to support the claim that the mean farm size has increased from 450 acres.

A researcher believes that there has been a reduction in the mean number of hours that college students spend preparing for final exams. A national study stated that students at a 4-year college spend an average of 23 hours preparing for 5 final exams each semester with a population standard deviation of 7.3 hours. The researcher sampled 227 students and found a sample mean study time of 19.6 hours. Does this indicate that the average study time for final exams has decreased? Use a 1% level of significance to test this claim.

  • H o : μ = 23 hours
  • H 1 : μ < 23 hours
  • This is a left-sided test so alpha (0.01) is all in the left tail.


  • Go into the body of values in the standard normal table and find the Z-score that defines the lower 1% of the area.
  • The critical value is -2.33. This value sets up the rejection zone.
  • The test statistic is

\[ z = \frac{19.6 - 23}{7.3 / \sqrt{227}} = -7.02 \]

  • The test statistic falls in the rejection zone. The test statistic of -7.02 is less than the critical value of -2.33.

We reject the null hypothesis. We have sufficient evidence to support the claim that the mean final exam study time has decreased below 23 hours.
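The two one-sided examples (farm size and study time) follow the same recipe, differing only in the tail that holds the rejection zone. A minimal sketch, assuming a hypothetical helper `z_test_one_sided` and using the standard library's normal distribution in place of the table:

```python
from math import sqrt
from statistics import NormalDist


def z_test_one_sided(x_bar, mu0, sigma, n, alpha, alternative):
    """Classical one-sided Z-test; alternative is 'greater' or 'less'."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # positive critical value Z_alpha
    if alternative == "greater":                # rejection zone in the right tail
        return z, z_alpha, z > z_alpha
    if alternative == "less":                   # rejection zone in the left tail
        return z, -z_alpha, z < -z_alpha
    raise ValueError("alternative must be 'greater' or 'less'")


farm = z_test_one_sided(485.8, 450, 167, 45, 0.05, "greater")
study = z_test_one_sided(19.6, 23, 7.3, 227, 0.01, "less")
for name, (z, crit, reject) in (("farm size", farm), ("study time", study)):
    print(f"{name}: z = {z:.2f}, critical value = {crit:.3f}, reject H0: {reject}")
```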

Testing a Hypothesis using P-values

The p-value is the probability of observing a sample mean at least as extreme as ours, given that the null hypothesis is true. It is the area under the curve to the left or right of the test statistic. If this probability is very small (less than the level of significance), we reject the null hypothesis. Computations for the p-value depend on whether it is a one- or two-sided test.

Steps for a hypothesis test using p-values:

  • State the null and alternative hypotheses.
  • State the level of significance.
  • Compute the test statistic and find the area associated with it (this is the p-value).
  • Compare the p-value to alpha ( α ) and state a conclusion.

Instead of comparing the Z-score test statistic to the Z-score critical value, as in the classical method, we compare the area associated with the test statistic to the area of the level of significance.

The Decision Rule: If the p-value is less than alpha, we reject the null hypothesis.

Computing P-values

If it is a two-sided test (the alternative claim is “≠”), the p-value is equal to two times the area in the tail beyond the absolute value of the test statistic. If the test is a left-sided test (the alternative claim is “<”), then the p-value is equal to the area to the left of the test statistic. If the test is a right-sided test (the alternative claim is “>”), then the p-value is equal to the area to the right of the test statistic.
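The three cases can be written as one small function. This is a sketch rather than a definitive implementation: `z_p_value` is a hypothetical helper name, and the Z-scores passed to it are the test statistics from the three worked examples in this section (3.07, 1.44, and -7.02).

```python
from statistics import NormalDist


def z_p_value(z, alternative):
    """p-value for a Z statistic; alternative is 'two-sided', 'less', or 'greater'."""
    cdf = NormalDist().cdf             # cumulative area from the left
    if alternative == "less":
        return cdf(z)                  # area to the left of z
    if alternative == "greater":
        return 1 - cdf(z)              # area to the right of z
    if alternative == "two-sided":
        return 2 * (1 - cdf(abs(z)))   # both tails beyond |z|
    raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")


for z, alt in ((3.07, "two-sided"), (1.44, "greater"), (-7.02, "less")):
    print(f"z = {z:+.2f} ({alt}): p-value = {z_p_value(z, alt):.4g}")
```

Software computes the area directly, so the results can differ from table-based values in the last digit, and a statistic as extreme as -7.02 gets an actual (very small) p-value instead of a bound.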

Let’s look at Example 6 again.

A forester studying diameter growth of red pine believes that the mean diameter growth will be different from the known mean growth of 1.35 in./year if a fertilization treatment is applied to the stand. He conducts his experiment, collects data from a sample of 32 plots, and gets a sample mean diameter growth of 1.6 in./year. The population standard deviation for this stand is known to be 0.46 in./year. Does he have enough evidence to support his claim?

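Step 1) State the null and alternative hypotheses.

  • H 0 : μ = 1.35 in./year
  • H 1 : μ ≠ 1.35 in./year

Step 2) State the level of significance.

  • We will choose a level of significance of 5% ( α = 0.05).

Step 3) Compute the test statistic and find the area associated with it (the p-value).

  • For this problem, the test statistic is:

\[ z = \frac{1.6 - 1.35}{0.46 / \sqrt{32}} = 3.07 \]

The p-value is two times the area in the tail beyond the absolute value of the test statistic (because the alternative claim is “not equal”).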


  • Look up the area for the Z-score 3.07 in the standard normal table. The table gives the cumulative area to the left (0.9989), so the area (probability) to the right is 1 – 0.9989 = 0.0011.
  • Multiply this by 2 to get the p-value = 2 * 0.0011 = 0.0022.

Step 4) Compare the p-value to alpha and state a conclusion.

  • Use the Decision Rule (if the p-value is less than α , reject H 0 ).
  • In this problem, the p-value (0.0022) is less than alpha (0.05).
  • We reject the H 0 . We have enough evidence to support the claim that the mean diameter growth is different from 1.35 inches/year.

Let’s look at Example 7 again.


The p-value is the area to the right of the Z-score 1.44 (the hatched area).

  • This is equal to 1 – 0.9251 = 0.0749.
  • The p-value is 0.0749.


  • Use the Decision Rule.
  • In this problem, the p-value (0.0749) is greater than alpha (0.05), so we fail to reject H 0 .
  • The area associated with the test statistic is greater than the area of alpha ( α ).

We fail to reject the null hypothesis. We do not have enough evidence to support the claim that the mean farm size has increased.

Let’s look at Example 8 again.

  • H 0 : μ = 23 hours
  • H 1 : μ < 23 hours


The p-value is the area to the left of the test statistic (the little black area to the left of -7.02). The Z-score of -7.02 is not on the standard normal table. The smallest probability on the table is 0.0002. We know that the area for the Z-score -7.02 is smaller than this area (probability). Therefore, the p-value is <0.0002.


  • In this problem, the p-value (p<0.0002) is less than alpha (0.01), so we reject H 0 .
  • The area associated with the test statistic is much less than the area of alpha ( α ).

We reject the null hypothesis. We have enough evidence to support the claim that the mean final exam study time has decreased below 23 hours.

Both the classical method and p-value method for testing a hypothesis will arrive at the same conclusion. In the classical method, the critical Z-score is the number on the z-axis that defines the level of significance ( α ). The test statistic converts the sample mean to units of standard deviation (a Z-score). If the test statistic falls in the rejection zone defined by the critical value, we will reject the null hypothesis. In this approach, two Z-scores, which are numbers on the z-axis, are compared. In the p-value approach, the p-value is the area associated with the test statistic. In this method, we compare α (which is also area under the curve) to the p-value. If the p-value is less than α , we reject the null hypothesis. The p-value is the probability of observing such a sample mean when the null hypothesis is true. If the probability is too small (less than the level of significance), then we believe we have enough statistical evidence to reject the null hypothesis and support the alternative claim.

Software Solutions

(referring to Ex. 8)


One-Sample Z

Test of mu = 23 vs. < 23
The assumed standard deviation = 7.3

  N    Mean  SE Mean  99% Upper Bound      Z      P
227  19.600    0.485           20.727  -7.02  0.000

Excel does not offer 1-sample hypothesis testing.

Hypothesis Test about the Population Mean ( μ ) when the Population Standard Deviation ( σ ) is Unknown

Frequently, the population standard deviation (σ) is not known. We can estimate the population standard deviation (σ) with the sample standard deviation (s). However, the test statistic will no longer follow the standard normal distribution. We must rely on the Student’s t-distribution with n-1 degrees of freedom. Because we use the sample standard deviation (s), the test statistic will change from a Z-score to a t-score.

\[ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \]

Steps for a hypothesis test are the same as those covered in Section 2.

Just as with the hypothesis test from the previous section, the data for this test must come from a random sample, and either the population from which the sample was drawn must be normal or the sample size must be sufficiently large (n≥30). A t-test is robust, so small departures from normality will not adversely affect the results of the test. That being said, if the sample size is smaller than 30, it is always good to verify the assumption of normality through a normal probability plot.

We will still have the same three pairs of null and alternative hypotheses and we can still use either the classical approach or the p-value approach.


Selecting the correct critical value from the student’s t-distribution table depends on three factors: the type of test (one-sided or two-sided alternative hypothesis), the sample size, and the level of significance.

For a two-sided test (“not equal” alternative hypothesis), the critical value (t α /2 ), is determined by alpha ( α ), the level of significance, divided by two, to deal with the possibility that the result could be less than OR greater than the known value.

  • If your level of significance was 0.05, you would use the 0.025 column to find the correct critical value (0.05/2 = 0.025).
  • If your level of significance was 0.01, you would use the 0.005 column to find the correct critical value (0.01/2 = 0.005).

For a one-sided test (“a less than” or “greater than” alternative hypothesis), the critical value (t α ) , is determined by alpha ( α ), the level of significance, being all in the one side.

  • If your level of significance was 0.05, you would use the 0.05 column to find the correct critical value for either a left- or right-sided question. If you are asking a “less than” (left-sided) question, your critical value will be negative. If you are asking a “greater than” (right-sided) question, your critical value will be positive.

Find the critical value you would use to test the claim that μ ≠ 112 with a sample size of 18 and a 5% level of significance.

In this case, the critical value (t α /2 ) would be 2.110. This is a two-sided question (≠) so you would divide alpha by 2 (0.05/2 = 0.025) and go down the 0.025 column to 17 degrees of freedom.

What would the critical value be if you wanted to test that μ < 112 for the same data?

In this case, the critical value would be -1.740. This is a one-sided question (<), so alpha is not divided: all of it (0.05) is in the left tail, which makes the critical value negative. You would go down the 0.05 column with 17 degrees of freedom to get the correct critical value.

In 2005, the mean pH level of rain in a county in northern New York was 5.41. A biologist believes that the rain acidity has changed. He takes a random sample of 11 rain dates in 2010 and obtains the following data. Use a 1% level of significance to test his claim.

4.70, 5.63, 5.02, 5.78, 4.99, 5.91, 5.76, 5.54, 5.25, 5.18, 5.01

The sample size is small and we don’t know anything about the distribution of the population, so we examine a normal probability plot. The distribution looks normal so we will continue with our test.


The sample mean is 5.343 with a sample standard deviation of 0.397.

  • H o : μ = 5.41
  • H 1 : μ ≠ 5.41
  • This is a two-sided question so alpha is divided by two.


  • t α /2 is found by going down the 0.005 column with 10 degrees of freedom (n – 1 = 11 – 1).
  • t α /2 = ±3.169.
  • The test statistic is a t-score.

\[ t = \frac{5.343 - 5.41}{0.397 / \sqrt{11}} = -0.560 \]

  • The test statistic does not fall in the rejection zone.

We will fail to reject the null hypothesis. We do not have enough evidence to support the claim that the mean rain pH has changed.
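The rain pH test can be reproduced from the raw data. The sketch below uses only the Python standard library, which has no t-distribution, so the critical value 3.169 is copied from the example rather than computed. It is the table entry for the 0.005 column with n – 1 = 10 degrees of freedom. The standard deviation computed from the eleven listed values may differ slightly in the third decimal place from the rounded summary quoted in the text.

```python
from math import sqrt
from statistics import mean, stdev

ph = [4.70, 5.63, 5.02, 5.78, 4.99, 5.91, 5.76, 5.54, 5.25, 5.18, 5.01]
mu0 = 5.41
t_crit = 3.169   # table value quoted in the example (alpha/2 = 0.005)

n = len(ph)
x_bar = mean(ph)
s = stdev(ph)                       # sample standard deviation (n - 1 denominator)
t = (x_bar - mu0) / (s / sqrt(n))   # t-score with n - 1 degrees of freedom
reject = abs(t) > t_crit            # two-sided rejection zones

print(f"n = {n}, mean = {x_bar:.3f}, s = {s:.3f}, t = {t:.2f}, reject H0: {reject}")
```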

A One-sided Test

Cadmium, a heavy metal, is toxic to animals. Mushrooms, however, are able to absorb and accumulate cadmium at high concentrations. The government has set safety limits for cadmium in dry vegetables at 0.5 ppm. Biologists believe that the mean level of cadmium in mushrooms growing near strip mines is greater than the recommended limit of 0.5 ppm, negatively impacting the animals that live in this ecosystem. A random sample of 51 mushrooms gave a sample mean of 0.59 ppm with a sample standard deviation of 0.29 ppm. Use a 5% level of significance to test the claim that the mean cadmium level is greater than the acceptable limit of 0.5 ppm.

The sample size is greater than 30, so the sampling distribution of the mean is approximately normal.

  • H o : μ = 0.5 ppm
  • H 1 : μ > 0.5 ppm
  • This is a right-sided question so alpha is all in the right tail.


  • t α is found by going down the 0.05 column with 50 degrees of freedom.
  • t α = 1.676
  • The test statistic is

\[ t = \frac{0.59 - 0.5}{0.29 / \sqrt{51}} = 2.216 \]

Step 4) State a Conclusion.


The test statistic falls in the rejection zone. We will reject the null hypothesis. We have enough evidence to support the claim that the mean cadmium level is greater than the acceptable safe limit.

BUT, what happens if the significance level changes to 1%?

The critical value is now found by going down the 0.01 column with 50 degrees of freedom. The critical value is 2.403. The test statistic is now LESS THAN the critical value. The test statistic does not fall in the rejection zone. The conclusion will change. We do NOT have enough evidence to support the claim that the mean cadmium level is greater than the acceptable safe limit of 0.5 ppm.
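The effect of changing the level of significance can be seen by comparing one test statistic against both critical values. A minimal sketch for the cadmium example follows. The two critical values are the t-table entries quoted above for 50 degrees of freedom, typed in by hand because the standard library has no t-distribution.

```python
from math import sqrt

# Summary statistics from the cadmium example
mu0, x_bar, s, n = 0.5, 0.59, 0.29, 51
t = (x_bar - mu0) / (s / sqrt(n))   # t-score with n - 1 = 50 degrees of freedom

# Right-tail critical values for 50 df, as quoted in the text
t_table_50df = {0.05: 1.676, 0.01: 2.403}

# Right-sided test: reject H0 when t exceeds the critical value
decisions = {alpha: t > t_crit for alpha, t_crit in t_table_50df.items()}
for alpha, reject in decisions.items():
    print(f"alpha = {alpha:.2f}: t = {t:.3f}, "
          f"critical value = {t_table_50df[alpha]}, reject H0: {reject}")
```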

The level of significance is the probability that you, as the researcher, set to decide if there is enough statistical evidence to support the alternative claim. It should be set before the experiment begins.

P-value Approach

We can also use the p-value approach for a hypothesis test about the mean when the population standard deviation ( σ ) is unknown. However, when using a student’s t-table, we can only estimate the range of the p-value, not a specific value as when using the standard normal table. The student’s t-table has area (probability) across the top row in the table, with t-scores in the body of the table.

  • To find the p-value (the area associated with the test statistic), you would go to the row with the number of degrees of freedom.
  • Go across that row until you find the two values that your test statistic is between, then go up those columns to find the estimated range for the p-value.

Estimating P-value from a Student’s T-table


If your test statistic is 3.789 with 3 degrees of freedom, you would go across the 3 df row. The value 3.789 falls between the values 3.482 and 4.541 in that row. Therefore, the p-value is between 0.02 and 0.01. The p-value will be greater than 0.01 but less than 0.02 (0.01<p<0.02).

If your level of significance is 5%, you would reject the null hypothesis as the p-value (0.01-0.02) is less than alpha ( α ) of 0.05.

If your level of significance is 1%, you would fail to reject the null hypothesis as the p-value (0.01-0.02) is greater than alpha ( α ) of 0.01.
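The table lookup described above amounts to finding the two neighbouring entries in one row. The sketch below hard-codes the 3 df row of a one-tailed Student’s t-table. Only 3.482 and 4.541 are quoted in the text, so the remaining entries should be treated as assumed values from a standard table, and `p_value_range` is a hypothetical helper name.

```python
from bisect import bisect_right

# Right-tail areas (top row of the table) and critical values for 3 df.
# Only 3.482 and 4.541 appear in the text; the others are assumed table entries.
AREAS = [0.10, 0.05, 0.025, 0.02, 0.01, 0.005]
T_ROW_3DF = [1.638, 2.353, 3.182, 3.482, 4.541, 5.841]


def p_value_range(t, areas=AREAS, row=T_ROW_3DF):
    """Return (lower, upper) bounds on the one-tailed p-value for a positive t."""
    i = bisect_right(row, t)                     # number of table entries <= t
    upper = areas[i - 1] if i > 0 else 0.5       # left of the row: p below 0.5
    lower = areas[i] if i < len(row) else 0.0    # right of the row: p above 0
    return lower, upper


low, high = p_value_range(3.789)
print(f"{low} < p < {high}")
```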

Software packages typically output p-values. It is easy to use the Decision Rule to answer your research question by the p-value method.

(referring to Ex. 12)


One-Sample T

Test of mu = 0.5 vs. > 0.5

[Output table not recovered; the surviving column headings are “SE Mean” and “95% Lower Bound”.]

Additional example: www.youtube.com/watch?v=WwdSjO4VUsg .

Hypothesis Test for a Population Proportion ( p )

Frequently, the parameter we are testing is the population proportion.

  • We are studying the proportion of trees with cavities for wildlife habitat.
  • We need to know if the proportion of people who support green building materials has changed.
  • Has the proportion of wolves that died last year in Yellowstone increased from the year before?

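Recall that the best point estimate of p , the population proportion, is given by

\[ \hat{p} = \frac{x}{n} \]

where x is the number of individuals in the sample with the characteristic of interest and n is the sample size. The sampling distribution of p̂ is approximately normal when np (1 – p )≥10. We can use both the classical approach and the p-value approach for testing.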

The steps for a hypothesis test are the same as those covered in Section 2.

The test statistic follows the standard normal distribution:

\[ z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} \]

Notice that the standard error (the denominator) uses p instead of p̂ , which was used when constructing a confidence interval about the population proportion. In a hypothesis test, the null hypothesis is assumed to be true, so the known proportion is used.


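  • The critical value comes from the standard normal table, just as in Section 2. We will still use the same three pairs of null and alternative hypotheses as we used in the previous sections, but the parameter is now p instead of μ :

  • H 0 : p = c vs. H 1 : p ≠ c (two-sided)
  • H 0 : p = c vs. H 1 : p > c (right-sided)
  • H 0 : p = c vs. H 1 : p < c (left-sided)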


  • For a two-sided test, alpha will be divided by 2 giving a ± Z α /2 critical value.
  • For a left-sided test, alpha will be all in the left tail giving a – Z α critical value.
  • For a right-sided test, alpha will be all in the right tail giving a Z α critical value.

A botanist has produced a new variety of hybrid soy plant that is better able to withstand drought than other varieties. The botanist knows the seed germination for the parent plants is 75%, but does not know the seed germination for the new hybrid. He tests the claim that it is different from the parent plants. To test this claim, 450 seeds from the hybrid plant are tested and 321 have germinated. Use a 5% level of significance to test this claim that the germination rate is different from 75%.

  • H o : p = 0.75
  • H 1 : p ≠ 0.75

This is a two-sided question so alpha is divided by 2.

  • Alpha is 0.05 so the critical values are ± Z α /2 = ± Z .025 .
  • Look on the negative side of the standard normal table, in the body of values for 0.025.
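  • The critical values are ± 1.96.
  • The sample proportion is p̂ = 321/450 ≈ 0.713, so the test statistic is

\[ z = \frac{0.713 - 0.75}{\sqrt{0.75(1 - 0.75)/450}} = -1.81 \]

The test statistic does not fall in the rejection zone. We fail to reject the null hypothesis. We do not have enough evidence to support the claim that the germination rate of the hybrid plant is different from the parent plants.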

Let’s answer this question using the p-value approach. Remember, for a two-sided alternative hypothesis (“not equal”), the p-value is two times the tail area beyond the test statistic. The test statistic is -1.81 and we want to find the area to the left of -1.81 from the standard normal table.

  • On the negative page, find the Z-score -1.81. Find the area associated with this Z-score.
  • The area = 0.0351.
  • This is a two-sided test so multiply the area times 2 to get the p-value = 0.0351 x 2 = 0.0702.

Now compare the p-value to alpha. The Decision Rule states that if the p-value is less than alpha, reject the H 0 . In this case, the p-value (0.0702) is greater than alpha (0.05) so we will fail to reject H 0 . We do not have enough evidence to support the claim that the germination rate of the hybrid plant is different from the parent plants.
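The germination example can be checked with a short calculation. This sketch uses only the standard library and carries p̂ at full precision, so it gives a Z-score of about -1.80 rather than the -1.81 obtained above with p̂ rounded to 0.713. The p-value shifts slightly as well, but the conclusion is the same.

```python
from math import sqrt
from statistics import NormalDist

# Values from the hybrid soy germination example
x, n, p0, alpha = 321, 450, 0.75, 0.05

p_hat = x / n                                 # sample proportion
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)    # standard error uses p0, not p_hat
p_value = 2 * NormalDist().cdf(-abs(z))       # two-sided: area in both tails
reject = p_value < alpha                      # the Decision Rule

print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```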

You are a biologist studying the wildlife habitat in the Monongahela National Forest. Cavities in older trees provide excellent habitat for a variety of birds and small mammals. A study five years ago stated that 32% of the trees in this forest had suitable cavities for this type of wildlife. You believe that the proportion of cavity trees has increased. You sample 196 trees and find that 79 trees have cavities. Does this evidence support your claim that there has been an increase in the proportion of cavity trees?

Use a 10% level of significance to test this claim.

  • H o : p = 0.32
  • H 1 : p > 0.32

This is a one-sided question so alpha is all in one tail (the right tail).

  • Alpha is 0.10 so the critical value is Z α = Z .10
  • Look on the positive side of the standard normal table, in the body of values for 0.90.
  • The critical value is 1.28.


  • The test statistic is the number of standard deviations the sample proportion is from the known proportion. It is also a Z-score, just like the critical value.

\[ z = \frac{0.403 - 0.32}{\sqrt{0.32(1 - 0.32)/196}} = 2.49 \]


The test statistic is larger than the critical value (it falls in the rejection zone). We will reject the null hypothesis. We have enough evidence to support the claim that there has been an increase in the proportion of cavity trees.

Now use the p-value approach to answer the question. This is a right-sided question (“greater than”), so the p-value is equal to the area to the right of the test statistic. Go to the positive side of the standard normal table and find the area associated with the Z-score of 2.49. The area is 0.9936. Remember that this table is cumulative from the left. To find the area to the right of 2.49, we subtract from one.

p-value = (1 – 0.9936) = 0.0064

The p-value is less than the level of significance (0.10), so we reject the null hypothesis. We have enough evidence to support the claim that the proportion of cavity trees has increased.
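Both approaches for the cavity tree example can be combined in one function that also checks the np (1 – p )≥10 condition before using the normal approximation. This is a sketch under the assumptions of this section. `proportion_test_right` is a hypothetical helper name, not a library function.

```python
from math import sqrt
from statistics import NormalDist


def proportion_test_right(x, n, p0, alpha):
    """Right-sided one-proportion Z-test using the normal approximation."""
    if n * p0 * (1 - p0) < 10:
        raise ValueError("np(1 - p) < 10: normal approximation not appropriate")
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard error uses p0
    z_crit = NormalDist().inv_cdf(1 - alpha)     # classical approach: Z_alpha
    p_value = 1 - NormalDist().cdf(z)            # p-value approach: right-tail area
    return {"p_hat": p_hat, "z": z, "z_crit": z_crit,
            "p_value": p_value, "reject": p_value < alpha}


result = proportion_test_right(x=79, n=196, p0=0.32, alpha=0.10)
print({key: round(value, 4) for key, value in result.items()})
```

The classical comparison (z against z_crit) and the p-value comparison always agree, so the function reports a single decision.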

(referring to Ex. 15)

Test and CI for One Proportion

Test of p = 0.32 vs. p > 0.32

Sample   X    N  Sample p  90% Lower Bound  Z-Value  p-Value
1       79  196  0.403061         0.358160     2.49    0.006

Using the normal approximation.

Hypothesis Test about a Variance

When people think of statistical inference, they usually think of inferences involving population means or proportions. However, the particular population parameter needed to answer an experimenter’s practical questions varies from one situation to another, and sometimes a population’s variability is more important than its mean. Thus, product quality is often defined in terms of low variability.

Sample variance S² can be used for inferences concerning a population variance σ². For a random sample of n measurements drawn from a normal population with mean μ and variance σ², the value S² provides a point estimate for σ². In addition, the quantity ( n – 1) S² / σ² follows a Chi-square ( χ² ) distribution, with df = n – 1.

The properties of Chi-square ( χ² ) distribution are:

  • Unlike Z and t distributions, the values in a chi-square distribution are all positive.
  • The chi-square distribution is asymmetric, unlike the Z and t distributions.
  • There are many chi-square distributions. We obtain a particular one by specifying the degrees of freedom (df = n – 1) associated with the sample variance S².


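One-sample χ² test for testing the hypotheses:

Null hypothesis: H₀: σ² = σ₀² (a specified constant)

Alternative hypothesis: H₁: σ² > σ₀² or H₁: σ² < σ₀² (one-sided tests), or H₁: σ² ≠ σ₀² (two-sided test)

Test statistic: χ² = (n – 1)S² / σ₀²

where the χ² critical value in the rejection region is based on degrees of freedom df = n – 1 and a specified significance level of α.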


As with previous sections, if the test statistic falls in the rejection zone set by the critical value, you will reject the null hypothesis.

A forester wants to use a mist blower to apply an herbicide treatment that controls a dense understory of striped maple, which is interfering with desirable hardwood regeneration. She wants to make sure that the treatment has a consistent application rate, in other words, low variability, with a standard deviation not exceeding 0.25 gal./acre (a variance of about 0.06 gal.²). She collects sample data (n = 11) on this type of mist blower and gets a sample variance of 0.064 gal.². Using a 5% level of significance, test the claim that the variance is significantly greater than 0.06 gal.².

H₀: σ² = 0.06

H₁: σ² > 0.06

The critical value is 18.307. Any test statistic greater than this value will cause you to reject the null hypothesis.

The test statistic is χ² = (n – 1)S² / σ₀² = (11 – 1)(0.064) / 0.06 = 10.667.

We fail to reject the null hypothesis. The forester does NOT have enough evidence to support the claim that the variance is greater than 0.06 gal.².

You can also estimate the p-value using the same method as for the Student’s t-table. Go across the row for the degrees of freedom until you find the two values that your test statistic falls between. In this case, going across the row for df = 10, the two table values are 4.865 and 15.987. Now go up those two columns to the top row to read off the bounds on the p-value: it is greater than 0.1 and less than 0.9. Both bounds are greater than the level of significance (0.05), so we fail to reject the null hypothesis.

(referring to Ex. 16)


Test and CI for One Variance


Null hypothesis Sigma-squared = 0.06
Alternative hypothesis Sigma-squared > 0.06

The chi-square method is only for the normal distribution.


Method       Statistic   DF   P-Value
Chi-Square   10.67       10   0.384
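
The variance test above can also be checked in Python using only the standard library. This is a rough sketch, not the method used by the software shown: the helper `chi2_upper_tail` is a hypothetical name, and it computes the upper-tail area with a textbook series for the regularized incomplete gamma function. The critical value 18.307 is the table value quoted in the example.

```python
import math

def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma
    function P(a, z) with a = df / 2 and z = x / 2.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of the series
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

n, s2, sigma0_sq = 11, 0.064, 0.06         # sample size, sample variance, H0 variance
chi2_stat = (n - 1) * s2 / sigma0_sq       # test statistic
p_value = chi2_upper_tail(chi2_stat, n - 1)

critical = 18.307                          # table value for df = 10, alpha = 0.05
reject = chi2_stat > critical

print(f"chi-square = {chi2_stat:.2f}, p-value = {p_value:.3f}")
print("reject H0" if reject else "fail to reject H0")
```

The computed p-value falls between the table bounds of 0.1 and 0.9 and agrees with the software output.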

Excel does not offer 1-sample χ² testing.

Putting it all Together Using the Classical Method

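To Test a Claim about μ When σ is Known

  • Write the null and alternative hypotheses.
  • State the level of significance and get the critical value from the standard normal table.
  • Compute the test statistic: Z = (x̄ – μ₀) / (σ / √n).
  • Compare the test statistic to the critical value (Z-score) and write the conclusion.

To Test a Claim about μ When σ is Unknown

  • Write the null and alternative hypotheses.
  • State the level of significance and get the critical value from the Student’s t-table with n – 1 degrees of freedom.
  • Compute the test statistic: t = (x̄ – μ₀) / (s / √n).
  • Compare the test statistic to the critical value (t-score) and write the conclusion.

To Test a Claim about p

  • Write the null and alternative hypotheses.
  • State the level of significance and get the critical value from the standard normal distribution.
  • Compute the test statistic: Z = (p̂ – p₀) / √(p₀(1 – p₀) / n).
  • Compare the test statistic to the critical value (Z-score) and write the conclusion.

To Test a Claim about Variance

  • Write the null and alternative hypotheses.
  • State the level of significance and get the critical value from the chi-square table using n – 1 degrees of freedom.
  • Compute the test statistic: χ² = (n – 1)S² / σ₀².
  • Compare the test statistic to the critical value and write the conclusion.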
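
The final step in each procedure, comparing the test statistic to the critical value, follows the same logic every time. The sketch below is one way to express it; the function `classical_decision` and its `tail` argument are hypothetical names, and the inputs are the values from the proportion and variance examples earlier in this chapter.

```python
from statistics import NormalDist

def classical_decision(test_stat, critical_value, tail="right"):
    """Return the conclusion of the classical (critical value) method."""
    if tail == "right":
        reject = test_stat > critical_value
    elif tail == "left":
        reject = test_stat < critical_value
    elif tail == "two-sided":
        # Assumes a symmetric distribution (Z or t), not chi-square.
        reject = abs(test_stat) > abs(critical_value)
    else:
        raise ValueError("tail must be 'right', 'left', or 'two-sided'")
    return "reject H0" if reject else "fail to reject H0"

# Proportion example: Z = 2.49, right-sided test at alpha = 0.10.
z_critical = NormalDist().inv_cdf(0.90)
proportion_result = classical_decision(2.49, z_critical, tail="right")

# Variance example: chi-square = 10.67, critical value 18.307 (df = 10, alpha = 0.05).
variance_result = classical_decision(10.67, 18.307, tail="right")

print(proportion_result)
print(variance_result)
```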

Natural Resources Biometrics Copyright © 2014 by Diane Kiernan is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.

What is a Confidence Interval?

A confidence interval (CI) is a range of values that is likely to contain the value of an unknown population parameter. These intervals represent a plausible domain for the parameter given the characteristics of your sample data. Confidence intervals are derived from sample statistics and are calculated using a specified confidence level.

Population parameters are typically unknown because it is usually impossible to measure entire populations. By using a sample, you can estimate these parameters. However, the estimates rarely equal the parameter precisely thanks to random sampling error. Fortunately, inferential statistics procedures can evaluate a sample and incorporate the uncertainty inherent when using samples. Confidence intervals place a margin of error around the point estimate to help us understand how wrong the estimate might be.

You’ll frequently use confidence intervals to bound the population mean and standard deviation. But you can also create them for regression coefficients, proportions, rates of occurrence (Poisson), and the differences between populations.

Related post: Populations, Parameters, and Samples in Inferential Statistics

What is the Confidence Level?

The confidence level is the long-run probability that a series of confidence intervals will contain the true value of the population parameter.

Different random samples drawn from the same population are likely to produce slightly different intervals. If you draw many random samples and calculate a confidence interval for each sample, a percentage of them will contain the parameter.

The confidence level is the percentage of the intervals that contain the parameter. For 95% confidence intervals, an average of 19 out of 20 include the population parameter, as shown below.

Interval plot that displays 20 confidence intervals. 19 of them contain the population parameter.

The image above shows a hypothetical series of 20 confidence intervals from a study that draws multiple random samples from the same population. The horizontal red dashed line is the population parameter, which is usually unknown. Each blue dot is a sample’s point estimate for the population parameter. Green lines represent CIs that contain the parameter, while the red line is a CI that does not contain it. The graph illustrates how confidence intervals are not perfect but usually correct.

The CI procedure provides meaningful estimates because it produces ranges that usually contain the parameter. Hence, they present plausible values for the parameter.
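
A small simulation can make the long-run meaning of the confidence level concrete. The sketch below is illustrative only: the population mean, standard deviation, sample size, and seed are made-up values, and it uses a Z-interval with a known population standard deviation so that the theoretical coverage is exactly the stated level.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def coverage_rate(mu=50.0, sigma=10.0, n=25, level=0.95, trials=2000, seed=1):
    """Fraction of simulated confidence intervals that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # two-sided critical Z-value
    moe = z * sigma / sqrt(n)                   # margin of error (sigma known)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - moe <= mu <= x_bar + moe:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"Share of intervals containing the parameter: {rate:.3f}")
```

With many trials, the share should land close to the chosen confidence level, even though any single interval either contains the parameter or does not.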

Technically, you can create CIs using any confidence level between 0 and 100%. However, the most common confidence level is 95%. Analysts occasionally use 99% and 90%.

Related posts: Populations and Samples and Parameters vs. Statistics

How to Interpret Confidence Intervals

A confidence interval indicates where the population parameter is likely to reside. For example, a 95% confidence interval of the mean [9 11] suggests you can be 95% confident that the population mean is between 9 and 11.

Confidence intervals also help you navigate the uncertainty of how well a sample estimates a value for an entire population.

These intervals start with the point estimate for the sample and add a margin of error around it. The point estimate is the best guess for the parameter value. The margin of error accounts for the uncertainty involved when using a sample to estimate an entire population.

The width of the confidence interval around the point estimate reveals the precision. If the range is narrow, the margin of error is small, and there is only a tiny range of plausible values. That’s a precise estimate. However, if the interval is wide, the margin of error is large, and the actual parameter value is likely to fall somewhere within that more extensive range. That’s an imprecise estimate.

Ideally, you’d like a narrow confidence interval because you’ll have a much better idea of the actual population value!

For example, imagine we have two different samples with a sample mean of 10. It appears both estimates are the same. Now let’s assess the 95% confidence intervals. One interval is [5 15] while the other is [9 11]. The latter range is narrower, suggesting a more precise estimate.

That’s how CIs provide more information than the point estimate (e.g., sample mean) alone.

Related post: Precision vs. Accuracy

Confidence Intervals for Effect Sizes

Confidence intervals are similarly helpful for understanding an effect size. For example, if you assess a treatment and control group, the mean difference between these groups is the estimated effect size. A 2-sample t-test can construct a confidence interval for the mean difference.

In this scenario, consider both the size and precision of the estimated effect. Ideally, an estimated effect is both large enough to be meaningful and sufficiently precise for you to trust. CIs allow you to assess both of these considerations! Learn more about this distinction in my post about Practical vs. Statistical Significance.

Learn more about how confidence intervals and hypothesis tests are similar.

Related post: Effect Sizes in Statistics

Avoid a Common Misinterpretation of Confidence Intervals

A frequent misuse is applying confidence intervals to the distribution of sample values. Remember that these ranges apply only to population parameters, not the data values.

For example, a 95% confidence interval [10 15] indicates that we can be 95% confident that the parameter is within that range.

However, it does NOT indicate that 95% of the sample values occur in that range.

If you need to use your sample to find the proportion of data values likely to fall within a range, use a tolerance interval instead.

Related post: See how confidence intervals compare to prediction intervals and tolerance intervals.

What Affects the Widths of Confidence Intervals?

Ok, so you want narrower CIs for their greater precision. What conditions produce tighter ranges?

Sample size, variability, and the confidence level affect the widths of confidence intervals. The first two are characteristics of your sample, which I’ll cover first.

Sample Variability

Variability present in your data affects the precision of the estimate. Your confidence intervals will be broader when your sample standard deviation is high.

It makes sense when you think about it. When there is a lot of variability present in your sample, you’re going to be less sure about the estimates it produces. After all, a high standard deviation means your sample data are really bouncing around! That’s not conducive for finding precise estimates.

Unfortunately, you often don’t have much control over data variability. You can institute measurement and data collection procedures that reduce outside sources of variability, but after that, you’re at the mercy of the variability inherent in your subject area. But, if you can reduce external sources of variation, that’ll help you reduce the width of your confidence intervals.

Sample Size

Increasing your sample size is the primary way to reduce the widths of confidence intervals because, in most cases, you can control it more than the variability. If you don’t change anything else and only increase the sample size, the ranges tend to narrow. Need even tighter CIs? Just increase the sample size some more!

Theoretically, there is no limit, and you can dramatically increase the sample size to produce remarkably narrow ranges. However, logistics, time, and cost issues will constrain your maximum sample size in the real world.

In summary, larger sample sizes and lower variability reduce the margin of error around the point estimate and create narrower confidence intervals. I’ll point out these factors again when we get to the formula later in this post.

Related post: Sample Statistics Are Always Wrong (to Some Extent)!

Changing the Confidence Level

The confidence level also affects the confidence interval width. However, this factor is a methodology choice separate from your sample’s characteristics.

If you increase the confidence level (e.g., 95% to 99%) while holding the sample size and variability constant, the confidence interval widens. Conversely, decreasing the confidence level (e.g., 95% to 90%) narrows the range.

I’ve found that many students find the effect of changing the confidence level on the width of the range to be counterintuitive.

Imagine you take your knowledge of a subject area and indicate you’re 95% confident that the correct answer lies between 15 and 20. Then I ask you to give me your confidence for it falling between 17 and 18. The correct answer is less likely to fall within the narrower interval, so your confidence naturally decreases.

Conversely, I ask you about your confidence that it’s between 10 and 30. That’s a much wider range, and the correct value is more likely to be in it. Consequently, your confidence grows.

Confidence levels involve a tradeoff between confidence and the interval’s spread. To have more confidence that the parameter falls within the interval, you must widen the interval. Conversely, your confidence necessarily decreases if you use a narrower range.

Confidence Interval Formula

Confidence intervals account for sampling uncertainty by using critical values, sampling distributions, and standard errors. The precise formula depends on the type of parameter you’re evaluating. The most common type is for the mean, so I’ll stick with that.

You’ll use critical Z-values or t-values to calculate your confidence interval of the mean. T-values produce more accurate confidence intervals when you do not know the population standard deviation. That’s particularly true for sample sizes smaller than 30. For larger samples, the two methods produce similar results. In practice, you’d usually use a t-value.

Below are the confidence interval formulas for both Z and t. However, you’d only use one of them.

Using a critical Z-value: x̄ ± Z × (s / √n)

Using a critical t-value: x̄ ± t × (s / √n)

  • x̄ = the sample mean, which is the point estimate.
  • Z = the critical z-value
  • t = the critical t-value
  • s = the sample standard deviation
  • s / √n = the standard error of the mean

The only difference between the two formulas is the critical value. If you’re using the critical z-value, you’ll always use 1.96 for 95% confidence intervals. However, for the t-value, you’ll need to know the degrees of freedom and then look up the critical value in a t-table or online calculator.

To calculate a confidence interval, take the critical value (Z or t) and multiply it by the standard error of the mean (SEM). This value is known as the margin of error (MOE). Then add and subtract the MOE from the sample mean (x̄) to produce the upper and lower limits of the range.

Related posts: Critical Values, Standard Error of the Mean, and Sampling Distributions

Interval Widths Revisited

Think back to the discussion about the factors affecting the confidence interval widths. The formula helps you understand how that works. Recall that the critical value * SEM = MOE.

Smaller margins of error produce narrower confidence intervals. By looking at this equation, you can see that the following conditions create a smaller MOE:

  • Smaller critical values, which you obtain by decreasing the confidence level.
  • Smaller standard deviations, because they’re in the numerator of the SEM.
  • Larger sample sizes, because the square root of the sample size is in the denominator of the SEM.
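
The three conditions above can be checked numerically. The following sketch uses hypothetical inputs (a standard deviation of 10 and a sample size of 25 as the baseline) and a critical Z-value for simplicity; the function name `margin_of_error` is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(s, n, level=0.95):
    """MOE = critical value * SEM, using a two-sided critical Z-value."""
    critical = NormalDist().inv_cdf(0.5 + level / 2)
    sem = s / sqrt(n)
    return critical * sem

baseline = margin_of_error(s=10, n=25)                 # 95% level
lower_level = margin_of_error(s=10, n=25, level=0.90)  # smaller critical value
smaller_sd = margin_of_error(s=5, n=25)                # smaller standard deviation
larger_n = margin_of_error(s=10, n=100)                # larger sample size

for label, moe in [("baseline", baseline), ("90% level", lower_level),
                   ("s halved", smaller_sd), ("n quadrupled", larger_n)]:
    print(f"{label:>13}: MOE = {moe:.2f}")
```

Note that halving the standard deviation and quadrupling the sample size shrink the MOE by the same factor, because the sample size enters through its square root.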

How to Find a Confidence Interval

Let’s move on to using these formulas to find a confidence interval! For this example, I’ll use a fuel cost dataset that I’ve used in other posts: FuelCosts. The dataset contains a random sample of 25 fuel costs. We want to calculate the 95% confidence interval of the mean.

However, imagine we have only the following summary information instead of the dataset.

  • Sample mean: 330.6
  • Standard deviation: 154.2

Fortunately, that’s all we need to calculate our 95% confidence interval of the mean.

We need to decide on using the critical Z or t-value. I’ll use a critical t-value because the sample size (25) is less than 30. However, if the summary didn’t provide the sample size, we could use the Z-value method for an approximation.

My next step is to look up the critical t-value using my t-table. In the table, I’ll choose the alpha that equals 1 – the confidence level (1 – 0.95 = 0.05) for a two-sided test. Below is a truncated version of the t-table.

Portion of the t-table.

In the table, I see that for a two-sided interval with 25 – 1 = 24 degrees of freedom and an alpha of 0.05, the critical value is 2.064.

Entering Values into the Confidence Interval Formula

Let’s enter all of this information into the formula.

First, I’ll calculate the margin of error:

MOE = t × (s / √n) = 2.064 × (154.2 / √25) = 2.064 × 30.84 ≈ 63.65, which the limits below carry as 63.6.

Next, I’ll take the sample mean and add and subtract the margin of error from it:

  • 330.6 + 63.6 = 394.2
  • 330.6 – 63.6 = 267.0

The 95% confidence interval of the mean for fuel costs is 267.0 – 394.2. We can be 95% confident that the population mean falls within this range.

If you had used the critical z-value (1.96), you would enter that into the formula instead of the t-value (2.064) and obtain a slightly different confidence interval. However, t-values produce more accurate results, particularly for smaller samples like this one.

As an aside, the Z-value method always produces narrower confidence intervals than t-values when your sample size is less than infinity. So, basically always! However, that’s not good because Z-values underestimate the uncertainty when you’re using a sample estimate of the standard deviation rather than the actual population value. And you practically never know the population standard deviation.
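
The fuel cost calculation can be reproduced in a few lines of Python. This is a minimal sketch using the summary values from the example (mean 330.6, standard deviation 154.2, n = 25) and the critical t-value of 2.064 read from the t-table; the helper name `mean_ci` is hypothetical. It also computes the Z-based interval for comparison.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(x_bar, s, n, critical):
    """Return (lower, upper, margin of error) for a CI of the mean."""
    moe = critical * s / sqrt(n)      # critical value * SEM
    return x_bar - moe, x_bar + moe, moe

# t-based interval: critical value 2.064 for 24 degrees of freedom.
t_low, t_high, t_moe = mean_ci(330.6, 154.2, 25, critical=2.064)

# Z-based interval for comparison (critical value close to 1.96).
z_crit = NormalDist().inv_cdf(0.975)
z_low, z_high, z_moe = mean_ci(330.6, 154.2, 25, critical=z_crit)

print(f"t interval: {t_low:.1f} to {t_high:.1f} (MOE {t_moe:.2f})")
print(f"Z interval: {z_low:.1f} to {z_high:.1f} (MOE {z_moe:.2f})")
```

The Z-based interval comes out narrower than the t-based one, which matches the point above about Z-values understating the uncertainty for a sample of this size.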

If I take a sample and create a confidence interval for the mean, can I say that 95% of the means of other samples I take will fall in this range?

February 23, 2024 at 8:40 pm

Unfortunately, that would be an invalid statement. The CI formula uses your sample to estimate the properties of the population to construct the CI. Your estimates are bound to be off by at least a little bit. If you knew the precise properties of the population, you could determine the range in which 95% of random samples from that population would fall. However, again, you don’t know the precise properties of the population. You just have estimates based on your sample.

September 29, 2023 at 6:55 pm

Hi Jim, My confusion is similar to one comment. What I cannot seem to understand is the concept of individual and many CIs and therefore statements such as X% of the CIs.

For a sampling distribution, which itself requires many samples to produce, we try to find a confidence interval. Then how come there are multiple CIs? More specifically, “Different random samples drawn from the same population are likely to produce slightly different intervals. If you draw many random samples and calculate a confidence interval for each sample, a percentage of them will contain the parameter.” This is what confuses me. Does “interval” here represent the range of the samples drawn? If that is true, why is the term CI or interval used for the sample range? If not, could you please explain what is meant by an individual CI, or how we calculate a confidence interval for each sample? In the image depicting that 19 out of 20 will contain the population parameter, is the green line the range of an individual sample or the confidence interval?

Please try to sort this confusion out for me. I find your website really helpful for clearing my statistical concepts. Thank you in advance for helping out. Regards.

September 30, 2023 at 1:52 am

A key point to remember is that inferential statistics occur in the context of drawing many random samples from the same population. Of course, a single study typically draws a single sample. However, if that study were to draw another random sample, it would be somewhat different than the first sample. A third sample would be somewhat different as well. That produces the sampling distribution, which helps you calculate p-values and construct CIs. Inferential statistics procedures use the idea of many samples to incorporate random sampling error into the results.

For CIs, if you were to collect many random samples and calculate a CI from each one, a certain percentage of those CIs would contain the population parameter. That percentage is the confidence level. Again, a single study will only collect a single sample. However, picturing many CIs helps you understand the concept of the confidence level. In practice, a study generates one CI per parameter estimate. But the graph with multiple CIs is just to help you understand the concept of confidence level.

Alternatively, you can think of CIs as an object class. Suppose 100 disparate studies produce 95% CIs. You can assume that about 95 of those CIs actually contain the population parameter. Using statistical procedures, you can estimate the sampling distribution using the sample itself without collecting many samples.

I don’t know what you mean by “Interval here represents the range of samples drawn.” As I write in this article, the CI is an interval of values that likely contain the population parameter. Reread the section titled How to Interpret Confidence Intervals to understand what each one means.

Each CI is estimated from a single sample, and a study generates one CI per parameter estimate. However, again, understanding the concept of the confidence level is easier when you picture multiple CIs. If a single study were to collect multiple samples and produce multiple CIs, that graph is what you’d expect to see. In the real world, though, you never know for sure whether a CI actually contains the parameter or not.

The green lines represent CIs that contain the population parameter. Red lines represent CIs that do not contain the population parameter. The graph illustrates how CIs are not perfect but they are usually correct. I’ve added text to the article to clarify that image.

I also show you how to calculate the CI for a mean in this article. I’m not sure what more you need to understand there? I’m happy to clarify any part of that.

I hope that helps!

Hi Jim, This was an excellent article, thank you! I have a question: when computing a CI in its single-sample t-test module, SPSS appears to use the difference between population and sample means as a starting point (so the formula would be (X-bar-mu) +/- tcv(SEM)). I’ve consulted multiple stats books, but none of them compute a CI that way for a single-sample t-test. Maybe I’m just missing something and this is a perfectly acceptable way of doing things (I mean, SPSS does it :-)), but it yields substantially different lower and upper bounds from a CI that uses the traditional X-bar as a starting point. Do you have any insights? Many thanks in advance! Stephen

July 7, 2023 at 2:56 am

Hi Stephen,

I’m not an SPSS user but that formula is confusing. They presented this formula as being for the CI of a sample mean?

I’m not sure why they’re subtracting Mu. For one thing, you almost never know what Mu is because you’d have to measure the entire population. And, if you knew Mu, you wouldn’t need to perform a t-test! Why would you use a sample mean (X-bar) if you knew the population mean? None of it makes sense to me. It must be an error of some kind even if just of documentation.

Are there strict distinctions between the terms “confident”, “likely”, and “probability”? I’ve seen a number of other sources exclaim that for a given calculated confidence interval, the frequentist interpretation of that is the parameter is either in or not in that interval. They say another frequent misinterpretation is that the parameter lies within a calculated interval with a 95% probability.

It’s very confusing to balance that notion with practical casual communication of data in non-research settings.

October 13, 2022 at 5:43 pm

It is a confusing issue.

In the strictest technical sense, the confidence level is a probability that applies to the process but NOT to an individual confidence interval. There are several reasons for that.

In the frequentist framework, the probability that an individual CI contains the parameter is either 100% or 0%. It’s either in it or out. The parameter is not a random variable. However, because you don’t know the parameter value, you don’t know which of those two conditions is correct. That’s the conceptual approach. And the mathematics behind the scenes are complementary to that. There’s just no way to calculate the probability that an individual CI contains the parameter.

On the other hand, the process behind creating the intervals will cause X% of the CIs at the Xth confidence level to include that parameter. So, for all 95% CIs, you’d expect 95% of them to contain the parameter value. The confidence level applies to the process, not the individual CIs. Statisticians intentionally used the term “confidence” to describe that as opposed to “probability” hoping to make that distinction.

So, the 95% confidence applies to the process but not to individual CIs.

However, you might be thinking that if 95% of many CIs contain the parameter, then surely a single CI has a 95% probability of containing it. From a technical standpoint, that is NOT true. However, it sure sounds logical. Most statistics make intuitive sense to me, but I struggle with that one myself. I’ve asked other statisticians to get their take on it. The basic gist of their answers is that there might be other information available which can alter the actual probability. Not all CIs produced by the process have the same probability. For example, if an individual CI is a bit higher or lower than most other CIs for the same thing, the CIs with the unusual values will have lower probabilities of containing the parameter.

I think that makes sense. The only problem is that you often don’t know where your individual CI fits in. That means you don’t know the probability for it specifically. But you do know the overall probability for the process.

The answer for this question is never totally satisfying. Just remember that there is no mathematical way in the frequentist framework to calculate the probability that an individual CI contains the parameter. However, the overall process is designed such that all CIs using a particular confidence level will have the specified proportion containing the parameter. However, you can’t apply that overall proportion to your individual CI because on the technical side there’s no mathematical way to do that and conceptually, you don’t know where your individual CI fits in the entire distribution of CIs.

9.2: Null and Alternative Hypotheses

Some of the following statements refer to the null hypothesis, some to the alternate hypothesis.

State the null hypothesis, \(H_{0}\), and the alternative hypothesis, \(H_{a}\), in terms of the appropriate parameter \((\mu \text{ or } p)\).

  a. The mean number of years Americans work before retiring is 34.
  b. At most 60% of Americans vote in presidential elections.
  c. The mean starting salary for San Jose State University graduates is at least $100,000 per year.
  d. Twenty-nine percent of high school seniors get drunk each month.
  e. Fewer than 5% of adults ride the bus to work in Los Angeles.
  f. The mean number of cars a person owns in her lifetime is not more than ten.
  g. About half of Americans prefer to live away from cities, given the choice.
  h. Europeans have a mean paid vacation each year of six weeks.
  i. The chance of developing breast cancer is under 11% for women.
  j. Private universities' mean tuition cost is more than $20,000 per year.

Answers:

  a. \(H_{0}: \mu = 34; H_{a}: \mu \neq 34\)
  b. \(H_{0}: p \leq 0.60; H_{a}: p > 0.60\)
  c. \(H_{0}: \mu \geq 100{,}000; H_{a}: \mu < 100{,}000\)
  d. \(H_{0}: p = 0.29; H_{a}: p \neq 0.29\)
  e. \(H_{0}: p = 0.05; H_{a}: p < 0.05\)
  f. \(H_{0}: \mu \leq 10; H_{a}: \mu > 10\)
  g. \(H_{0}: p = 0.50; H_{a}: p \neq 0.50\)
  h. \(H_{0}: \mu = 6; H_{a}: \mu \neq 6\)
  i. \(H_{0}: p \geq 0.11; H_{a}: p < 0.11\)
  j. \(H_{0}: \mu \leq 20{,}000; H_{a}: \mu > 20{,}000\)

Over the past few decades, public health officials have examined the link between weight concerns and teen girls' smoking. Researchers surveyed a group of 273 randomly selected teen girls living in Massachusetts (between 12 and 15 years old). After four years the girls were surveyed again. Sixty-three said they smoked to stay thin. Is there good evidence that more than thirty percent of the teen girls smoke to stay thin? The alternative hypothesis is:

  • \(p < 0.30\)
  • \(p \leq 0.30\)
  • \(p \geq 0.30\)
  • \(p > 0.30\)

A statistics instructor believes that fewer than 20% of Evergreen Valley College (EVC) students attended the opening night midnight showing of the latest Harry Potter movie. She surveys 84 of her students and finds that 11 attended the midnight showing. An appropriate alternative hypothesis is:

  • \(p = 0.20\)
  • \(p > 0.20\)
  • \(p < 0.20\)
  • \(p \leq 0.20\)

Previously, an organization reported that teenagers spent 4.5 hours per week, on average, on the phone. The organization thinks that, currently, the mean is higher. Fifteen randomly chosen teenagers were asked how many hours per week they spend on the phone. The sample mean was 4.75 hours with a sample standard deviation of 2.0. Conduct a hypothesis test. The null and alternative hypotheses are:

  • \(H_{0}: \bar{x} = 4.5, H_{a}: \bar{x} > 4.5\)
  • \(H_{0}: \mu \geq 4.5, H_{a}: \mu < 4.5\)
  • \(H_{0}: \mu = 4.75, H_{a}: \mu > 4.75\)
  • \(H_{0}: \mu = 4.5, H_{a}: \mu > 4.5\)

9.3: Outcomes and the Type I and Type II Errors

State the Type I and Type II errors in complete sentences given the following statements.

  • The mean number of cars a person owns in his or her lifetime is not more than ten.
  • Private universities' mean tuition cost is more than $20,000 per year.
  • Type I error: We conclude that the mean is not 34 years, when it really is 34 years. Type II error: We conclude that the mean is 34 years, when in fact it really is not 34 years.
  • Type I error: We conclude that more than 60% of Americans vote in presidential elections, when the actual percentage is at most 60%. Type II error: We conclude that at most 60% of Americans vote in presidential elections when, in fact, more than 60% do.
  • Type I error: We conclude that the mean starting salary is less than $100,000, when it really is at least $100,000. Type II error: We conclude that the mean starting salary is at least $100,000 when, in fact, it is less than $100,000.
  • Type I error: We conclude that the proportion of high school seniors who get drunk each month is not 29%, when it really is 29%. Type II error: We conclude that the proportion of high school seniors who get drunk each month is 29% when, in fact, it is not 29%.
  • Type I error: We conclude that fewer than 5% of adults ride the bus to work in Los Angeles, when the percentage that do is really 5% or more. Type II error: We conclude that 5% or more adults ride the bus to work in Los Angeles when, in fact, fewer than 5% do.
  • Type I error: We conclude that the mean number of cars a person owns in his or her lifetime is more than 10, when in reality it is not more than 10. Type II error: We conclude that the mean number of cars a person owns in his or her lifetime is not more than 10 when, in fact, it is more than 10.
  • Type I error: We conclude that the proportion of Americans who prefer to live away from cities is not about half, though the actual proportion is about half. Type II error: We conclude that the proportion of Americans who prefer to live away from cities is half when, in fact, it is not half.
  • Type I error: We conclude that the duration of paid vacations each year for Europeans is not six weeks, when in fact it is six weeks. Type II error: We conclude that the duration of paid vacations each year for Europeans is six weeks when, in fact, it is not.
  • Type I error: We conclude that the proportion is less than 11%, when it is really at least 11%. Type II error: We conclude that the proportion of women who develop breast cancer is at least 11%, when in fact it is less than 11%.
  • Type I error: We conclude that the average tuition cost at private universities is more than $20,000, though in reality it is at most $20,000. Type II error: We conclude that the average tuition cost at private universities is at most $20,000 when, in fact, it is more than $20,000.

For statements a-j in Exercise 9.109, answer the following in complete sentences.

  • State a consequence of committing a Type I error.
  • State a consequence of committing a Type II error.

When a new drug is created, the pharmaceutical company must subject it to testing before receiving the necessary permission from the Food and Drug Administration (FDA) to market the drug. Suppose the null hypothesis is “the drug is unsafe.” What is the Type II Error?

  • To conclude the drug is safe when, in fact, it is unsafe.
  • Not to conclude the drug is safe when, in fact, it is safe.
  • To conclude the drug is safe when, in fact, it is safe.
  • Not to conclude the drug is unsafe when, in fact, it is unsafe.

A statistics instructor believes that fewer than 20% of Evergreen Valley College (EVC) students attended the opening midnight showing of the latest Harry Potter movie. She surveys 84 of her students and finds that 11 of them attended the midnight showing. The Type I error is to conclude that the percent of EVC students who attended is ________.

  • at least 20%, when in fact, it is less than 20%.
  • 20%, when in fact, it is 20%.
  • less than 20%, when in fact, it is at least 20%.
  • less than 20%, when in fact, it is less than 20%.

It is believed that Lake Tahoe Community College (LTCC) Intermediate Algebra students get less than seven hours of sleep per night, on average. A survey of 22 LTCC Intermediate Algebra students generated a mean of 7.24 hours with a standard deviation of 1.93 hours. At a level of significance of 5%, do LTCC Intermediate Algebra students get less than seven hours of sleep per night, on average?

The Type II error is not to reject that the mean number of hours of sleep LTCC students get per night is at least seven when, in fact, the mean number of hours

  • is more than seven hours.
  • is at most seven hours.
  • is at least seven hours.
  • is less than seven hours.

Previously, an organization reported that teenagers spent 4.5 hours per week, on average, on the phone. The organization thinks that, currently, the mean is higher. Fifteen randomly chosen teenagers were asked how many hours per week they spend on the phone. The sample mean was 4.75 hours with a sample standard deviation of 2.0. Conduct a hypothesis test. The Type I error is:

  • to conclude that the current mean hours per week is higher than 4.5, when in fact, it is higher
  • to conclude that the current mean hours per week is higher than 4.5, when in fact, it is the same
  • to conclude that the mean hours per week currently is 4.5, when in fact, it is higher
  • to conclude that the mean hours per week currently is no higher than 4.5, when in fact, it is not higher

9.4: Distribution Needed for Hypothesis Testing

It is believed that Lake Tahoe Community College (LTCC) Intermediate Algebra students get less than seven hours of sleep per night, on average. A survey of 22 LTCC Intermediate Algebra students generated a mean of 7.24 hours with a standard deviation of 1.93 hours. At a level of significance of 5%, do LTCC Intermediate Algebra students get less than seven hours of sleep per night, on average? The distribution to be used for this test is \(\bar{X} \sim\) ________________

  • \(N\left(7.24, \frac{1.93}{\sqrt{22}}\right)\)
  • \(N\left(7.24, 1.93\right)\)

9.5: Rare Events, the Sample, Decision and Conclusion

The National Institute of Mental Health published an article stating that in any one-year period, approximately 9.5 percent of American adults suffer from depression or a depressive illness. Suppose that in a survey of 100 people in a certain town, seven of them suffered from depression or a depressive illness. Conduct a hypothesis test to determine if the true proportion of people in that town suffering from depression or a depressive illness is lower than the percent in the general adult American population.

  • Is this a test of one mean or proportion?
  • State the null and alternative hypotheses. \(H_{0}\) : ____________________ \(H_{a}\) : ____________________
  • Is this a right-tailed, left-tailed, or two-tailed test?
  • What symbol represents the random variable for this test?
  • In words, define the random variable for this test.
  • \(x =\) ________________
  • \(n =\) ________________
  • \(p′ =\) _____________
  • Calculate \(\sigma_{x} =\) __________. Show the formula set-up.
  • State the distribution to use for the hypothesis test.
  • Find the \(p\text{-value}\).
  • Reason for the decision:
  • Conclusion (write out in a complete sentence):

9.6: Additional Information and Full Hypothesis Test Examples

For each of the word problems, use a solution sheet to do the hypothesis test. The solution sheet is found in [link] . Please feel free to make copies of the solution sheets. For the online version of the book, it is suggested that you copy the .doc or the .pdf files.

If you are using a Student's \(t\)-distribution for one of the following homework problems, you may assume that the underlying population is normally distributed. (In general, you must first prove that assumption, however.)

A particular brand of tires claims that its deluxe tire averages at least 50,000 miles before it needs to be replaced. From past studies of this tire, the standard deviation is known to be 8,000. A survey of owners of that tire design is conducted. From the 28 tires surveyed, the mean lifespan was 46,500 miles with a standard deviation of 9,800 miles. Using \(\alpha = 0.05\), is the data highly inconsistent with the claim?

  • \(H_{0}: \mu \geq 50,000\)
  • \(H_{a}: \mu < 50,000\)
  • Let \(\bar{X} =\) the average lifespan of a brand of tires.
  • normal distribution
  • \(z = -2.315\)
  • \(p\text{-value} = 0.0103\)
  • Check student’s solution.
  • alpha: 0.05
  • Decision: Reject the null hypothesis.
  • Reason for decision: The \(p\text{-value}\) is less than 0.05.
  • Conclusion: There is sufficient evidence to conclude that the mean lifespan of the tires is less than 50,000 miles.
  • \((43,537, 49,463)\)
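The figures in this solution can be reproduced with a short script. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the exercise, and it assumes the known population standard deviation (8,000) is the one to use, as the normal-distribution answer implies.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the tire exercise above.
mu_0 = 50_000    # claimed mean lifespan under H0 (miles)
sigma = 8_000    # known population standard deviation (miles)
n = 28           # number of tires surveyed
x_bar = 46_500   # sample mean lifespan (miles)

standard_error = sigma / sqrt(n)
z = (x_bar - mu_0) / standard_error

# Left-tailed test: the p-value is the area to the left of z.
p_value = NormalDist().cdf(z)

# 95% confidence interval for the mean, built from the known sigma.
z_crit = NormalDist().inv_cdf(0.975)
ci = (x_bar - z_crit * standard_error, x_bar + z_crit * standard_error)

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print(f"95% CI = ({ci[0]:,.0f}, {ci[1]:,.0f})")
```

Because the p-value falls below \(\alpha = 0.05\), the script leads to the same decision as the solution: reject the null hypothesis.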

From generation to generation, the mean age when smokers first start to smoke varies. However, the standard deviation of that age remains constant at around 2.1 years. A survey of 40 smokers of this generation was done to see if the mean starting age is at least 19. The sample mean was 18.1 with a sample standard deviation of 1.3. Do the data support the claim at the 5% level?

The cost of a daily newspaper varies from city to city. However, the variation among prices remains steady with a standard deviation of 20¢. A study was done to test the claim that the mean cost of a daily newspaper is $1.00. Twelve costs yield a mean cost of 95¢ with a standard deviation of 18¢. Do the data support the claim at the 1% level?

  • \(H_{0}: \mu = $1.00\)
  • \(H_{a}: \mu \neq $1.00\)
  • Let \(\bar{X} =\) the average cost of a daily newspaper.
  • \(z = –0.866\)
  • \(p\text{-value} = 0.3865\)
  • \(\alpha: 0.01\)
  • Decision: Do not reject the null hypothesis.
  • Reason for decision: The \(p\text{-value}\) is greater than 0.01.
  • Conclusion: There is insufficient evidence to reject the claim that the mean cost of daily papers is $1. The mean cost could be $1.
  • \(($0.84, $1.06)\)

An article in the San Jose Mercury News stated that students in the California state university system take 4.5 years, on average, to finish their undergraduate degrees. Suppose you believe that the mean time is longer. You conduct a survey of 49 students and obtain a sample mean of 5.1 with a sample standard deviation of 1.2. Do the data support your claim at the 1% level?

The mean number of sick days an employee takes per year is believed to be about ten. Members of a personnel department do not believe this figure. They randomly survey eight employees. The number of sick days they took for the past year are as follows: 12; 4; 15; 3; 11; 8; 6; 8. Let \(x =\) the number of sick days they took for the past year. Should the personnel team believe that the mean number is ten?

  • \(H_{0}: \mu = 10\)
  • \(H_{a}: \mu \neq 10\)
  • Let \(\bar{X} =\) the mean number of sick days an employee takes per year.
  • Student’s \(t\)-distribution
  • \(t = –1.12\)
  • \(p\text{-value} = 0.300\)
  • \(\alpha: 0.05\)
  • Reason for decision: The \(p\text{-value}\) is greater than 0.05.
  • Conclusion: At the 5% significance level, there is insufficient evidence to conclude that the mean number of sick days is not ten.
  • \((4.9443, 11.806)\)
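When the raw data are given, as in the sick-day exercise, the test statistic can be computed directly from the sample. This is a sketch using only the Python standard library, with illustrative variable names; it stops at the \(t\) statistic because the standard library has no Student's \(t\) distribution, so the p-value would still come from a table, a calculator, or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev

# Sick days taken by the eight surveyed employees (from the exercise).
sick_days = [12, 4, 15, 3, 11, 8, 6, 8]
mu_0 = 10  # hypothesized mean under H0

n = len(sick_days)
x_bar = mean(sick_days)
s = stdev(sick_days)  # sample standard deviation (divides by n - 1)

t = (x_bar - mu_0) / (s / sqrt(n))
df = n - 1  # degrees of freedom for the one-sample t-test

print(f"x_bar = {x_bar:.3f}, s = {s:.3f}, t = {t:.2f}, df = {df}")
```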

In 1955, Life Magazine reported that the 25 year-old mother of three worked, on average, an 80 hour week. Recently, many groups have been studying whether or not the women's movement has, in fact, resulted in an increase in the average work week for women (combining employment and at-home work). Suppose a study was done to determine if the mean work week has increased. 81 women were surveyed with the following results. The sample mean was 83; the sample standard deviation was ten. Does it appear that the mean work week has increased for women at the 5% level?

Your statistics instructor claims that 60 percent of the students who take her Elementary Statistics class go through life feeling more enriched. For some reason that she can't quite figure out, most people don't believe her. You decide to check this out on your own. You randomly survey 64 of her past Elementary Statistics students and find that 34 feel more enriched as a result of her class. Now, what do you think?

  • \(H_{0}: p \geq 0.6\)
  • \(H_{a}: p < 0.6\)
  • Let \(P′ =\) the proportion of students who feel more enriched as a result of taking Elementary Statistics.
  • normal for a single proportion
  • \(p\text{-value} = 0.1308\)
  • Conclusion: There is insufficient evidence to conclude that less than 60 percent of her students feel more enriched.

The “plus-4s” confidence interval is \((0.411, 0.648)\)
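The p-value for a single-proportion test like this one comes from the normal approximation, with the standard error computed from the hypothesized proportion. A minimal sketch, standard library only, with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the Elementary Statistics exercise above.
p_0 = 0.60   # hypothesized proportion under H0
n = 64       # past students surveyed
x = 34       # students who feel more enriched

p_hat = x / n
# Under H0 the standard error uses p_0, not the sample proportion.
standard_error = sqrt(p_0 * (1 - p_0) / n)
z = (p_hat - p_0) / standard_error

# Left-tailed test (Ha: p < 0.6).
p_value = NormalDist().cdf(z)

print(f"p' = {p_hat:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
```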

A Nissan Motor Corporation advertisement read, “The average man’s I.Q. is 107. The average brown trout’s I.Q. is 4. So why can’t man catch brown trout?” Suppose you believe that the brown trout’s mean I.Q. is greater than four. You catch 12 brown trout. A fish psychologist determines the I.Q.s as follows: 5; 4; 7; 3; 6; 4; 5; 3; 6; 3; 8; 5. Conduct a hypothesis test of your belief.

Refer to Exercise 9.119. Conduct a hypothesis test to see if your decision and conclusion would change if your belief were that the brown trout’s mean I.Q. is not four.

  • \(H_{0}: \mu = 4\)
  • \(H_{a}: \mu \neq 4\)
  • Let \(\bar{X} =\) the average I.Q. of a set of brown trout.
  • two-tailed Student's t-test
  • \(t = 1.95\)
  • \(p\text{-value} = 0.076\)
  • Reason for decision: The \(p\text{-value}\) is greater than 0.05
  • Conclusion: There is insufficient evidence to conclude that the average IQ of brown trout is not four.
  • \((3.8865,5.9468)\)

According to an article in Newsweek, the natural ratio of girls to boys is 100:105. In China, the birth ratio is 100:114 (46.7% girls). Suppose you don’t believe the reported figures of the percent of girls born in China. You conduct a study. In this study, you count the number of girls and boys born in 150 randomly chosen recent births. There are 60 girls and 90 boys born of the 150. Based on your study, do you believe that the percent of girls born in China is 46.7?

A poll done for Newsweek found that 13% of Americans have seen or sensed the presence of an angel. A contingent doubts that the percent is really that high. It conducts its own survey. Out of 76 Americans surveyed, only two had seen or sensed the presence of an angel. As a result of the contingent’s survey, would you agree with the Newsweek poll? In complete sentences, also give three reasons why the two polls might give different results.

  • \(H_{a}: p < 0.13\)
  • Let \(P′ =\) the proportion of Americans who have seen or sensed angels
  • –2.688
  • \(p\text{-value} = 0.0036\)
  • Reason for decision: The \(p\text{-value}\) is less than 0.05.
  • Conclusion: There is sufficient evidence to conclude that the percentage of Americans who have seen or sensed an angel is less than 13%.

The “plus-4s” confidence interval is \((0.0022, 0.0978)\)

The mean work week for engineers in a start-up company is believed to be about 60 hours. A newly hired engineer hopes that it’s shorter. She asks ten engineering friends in start-ups for the lengths of their mean work weeks. Based on the results that follow, should she count on the mean work week to be shorter than 60 hours?

Data (length of mean work week): 70; 45; 55; 60; 65; 55; 55; 60; 50; 55.

Use the “Lap time” data for Lap 4 (see [link] ) to test the claim that Terri finishes Lap 4, on average, in less than 129 seconds. Use all twenty races given.

  • \(H_{0}: \mu \geq 129\)
  • \(H_{a}: \mu < 129\)
  • Let \(\bar{X} =\) the average time in seconds that Terri finishes Lap 4.
  • Student's \(t\)-distribution
  • \(t = 1.209\)
  • Conclusion: There is insufficient evidence to conclude that Terri’s mean lap time is less than 129 seconds.
  • \((128.63, 130.37)\)

Use the “Initial Public Offering” data (see [link] ) to test the claim that the mean offer price was $18 per share. Do not use all the data. Use your random number generator to randomly survey 15 prices.

The following questions were written by past students. They are excellent problems!

"Asian Family Reunion," by Chau Nguyen

Every two years it comes around.

We all get together from different towns.

In my honest opinion,

It's not a typical family reunion.

Not forty, or fifty, or sixty,

But how about seventy companions!

The kids would play, scream, and shout

One minute they're happy, another they'll pout.

The teenagers would look, stare, and compare

From how they look to what they wear.

The men would chat about their business

That they make more, but never less.

Money is always their subject

And there's always talk of more new projects.

The women get tired from all of the chats

They head to the kitchen to set out the mats.

Some would sit and some would stand

Eating and talking with plates in their hands.

Then come the games and the songs

And suddenly, everyone gets along!

With all that laughter, it's sad to say

That it always ends in the same old way.

They hug and kiss and say "good-bye"

And then they all begin to cry!

I say that 60 percent shed their tears

But my mom counted 35 people this year.

She said that boys and men will always have their pride,

So we won't ever see them cry.

I myself don't think she's correct,

So could you please try this problem to see if you object?

  • \(H_{0}: p = 0.60\)
  • \(H_{a}: p < 0.60\)
  • Let \(P′ =\) the proportion of family members who shed tears at a reunion.
  • –1.71
  • Reason for decision: \(p\text{-value} < \alpha\)
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the proportion of family members who shed tears at a reunion is less than 0.60. However, the test is weak because the \(p\text{-value}\) and alpha are quite close, so other tests should be done.
  • We are 95% confident that between 38.29% and 61.71% of family members will shed tears at a family reunion. \((0.3829, 0.6171)\). The “plus-4s” confidence interval (see chapter 8) is \((0.3861, 0.6139)\)

Note that here the “large-sample” 1-PropZTest provides the approximate \(p\text{-value}\) of 0.0438. Whenever a \(p\text{-value}\) based on a normal approximation is close to the level of significance, the exact \(p\text{-value}\) based on binomial probabilities should be calculated whenever possible. This is beyond the scope of this course.
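For readers who want to see what the exact calculation looks like, the sketch below computes both the normal-approximation p-value and the exact binomial p-value for the reunion data (35 of 70, with \(p = 0.60\) under the null hypothesis). It uses only the Python standard library; the variable names are illustrative, and the exact p-value is taken to be \(P(X \leq 35)\) for a binomial variable with \(n = 70\) and \(p = 0.60\), which is the natural left-tailed definition.

```python
from math import comb, sqrt
from statistics import NormalDist

# Values taken from the family reunion problem above.
n = 70       # family members at the reunion
x = 35       # family members counted crying
p_0 = 0.60   # hypothesized proportion under H0

# Normal approximation (no continuity correction).
z = (x / n - p_0) / sqrt(p_0 * (1 - p_0) / n)
p_approx = NormalDist().cdf(z)

# Exact left-tailed p-value: P(X <= x) when X ~ Binomial(n, p_0).
p_exact = sum(
    comb(n, k) * p_0**k * (1 - p_0) ** (n - k) for k in range(x + 1)
)

print(f"normal approximation: {p_approx:.4f}")
print(f"exact binomial:       {p_exact:.4f}")
```

Comparing the two printed values against \(\alpha = 0.05\) shows how sensitive the decision is to the choice of method when the approximate p-value sits this close to the significance level.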

"The Problem with Angels," by Cyndy Dowling

Although this problem is wholly mine,

The catalyst came from the magazine, Time.

On the magazine cover I did find

The realm of angels tickling my mind.

Inside, 69% I found to be

In angels, Americans do believe.

Then, it was time to rise to the task,

Ninety-five high school and college students I did ask.

Viewing all as one group,

Random sampling to get the scoop.

So, I asked each to be true,

"Do you believe in angels?" Tell me, do!

Hypothesizing at the start,

Totally believing in my heart

That the proportion who said yes

Would be equal on this test.

Lo and behold, seventy-three did arrive,

Out of the sample of ninety-five.

Now your job has just begun,

Solve this problem and have some fun.

"Blowing Bubbles," by Sondra Prull

Studying stats just made me tense,

I had to find some sane defense.

Some light and lifting simple play

To float my math anxiety away.

Blowing bubbles lifts me high

Takes my troubles to the sky.

POIK! They're gone, with all my stress

Bubble therapy is the best.

The label said each time I blew

The average number of bubbles would be at least 22.

I blew and blew and this I found

From 64 blows, they all are round!

But the number of bubbles in 64 blows

Varied widely, this I know.

20 per blow became the mean

They deviated by 6, and not 16.

From counting bubbles, I sure did relax

But now I give to you your task.

Was 22 a reasonable guess?

Find the answer and pass this test!

  • \(H_{0}: \mu \geq 22\)
  • \(H_{a}: \mu < 22\)
  • Let \(\bar{X} =\) the mean number of bubbles per blow.
  • –2.667
  • \(p\text{-value} = 0.00486\)
  • Conclusion: There is sufficient evidence to conclude that the mean number of bubbles per blow is less than 22.
  • \((18.501, 21.499)\)

"Dalmatian Darnation," by Kathy Sparling

A greedy dog breeder named Spreckles

Bred puppies with numerous freckles

The Dalmatians he sought

Possessed spot upon spot

The more spots, he thought, the more shekels.

His competitors did not agree

That freckles would increase the fee.

They said, “Spots are quite nice

But they don't affect price;

One should breed for improved pedigree.”

The breeders decided to prove

This strategy was a wrong move.

Breeding only for spots

Would wreak havoc, they thought.

His theory they want to disprove.

They proposed a contest to Spreckles

Comparing dog prices to freckles.

In records they looked up

One hundred one pups:

Dalmatians that fetched the most shekels.

They asked Mr. Spreckles to name

An average spot count he'd claim

To bring in big bucks.

Said Spreckles, “Well, shucks,

It's for one hundred one that I aim.”

Said an amateur statistician

Who wanted to help with this mission.

“Twenty-one for the sample

Standard deviation's ample:

They examined one hundred and one

Dalmatians that fetched a good sum.

They counted each spot,

Mark, freckle and dot

And tallied up every one.

Instead of one hundred one spots

They averaged ninety-six dots

Can they muzzle Spreckles’

Obsession with freckles

Based on all the dog data they've got?

"Macaroni and Cheese, please!!" by Nedda Misherghi and Rachelle Hall

As a poor starving student I don't have much money to spend for even the bare necessities. So my favorite and main staple food is macaroni and cheese. It's high in taste and low in cost and nutritional value.

One day, as I sat down to determine the meaning of life, I got a serious craving for this, oh, so important, food of my life. So I went down the street to Greatway to get a box of macaroni and cheese, but it was SO expensive! $2.02 !!! Can you believe it? It made me stop and think. The world is changing fast. I had thought that the mean cost of a box (the normal size, not some super-gigantic-family-value-pack) was at most $1, but now I wasn't so sure. However, I was determined to find out. I went to 53 of the closest grocery stores and surveyed the prices of macaroni and cheese. Here are the data I wrote in my notebook:

Price per box of Mac and Cheese:

  • 5 stores @ $2.02
  • 15 stores @ $0.25
  • 3 stores @ $1.29
  • 6 stores @ $0.35
  • 4 stores @ $2.27
  • 7 stores @ $1.50
  • 5 stores @ $1.89
  • 8 stores @ $0.75

I could see that the cost varied but I had to sit down to figure out whether or not I was right. If it does turn out that this mouth-watering dish is at most $1, then I'll throw a big cheesy party in our next statistics lab, with enough macaroni and cheese for just me. (After all, as a poor starving student I can't be expected to feed our class of animals!)

  • \(H_{0}: \mu \leq 1\)
  • \(H_{a}: \mu > 1\)
  • Let \(\bar{X} =\) the mean cost in dollars of macaroni and cheese in a certain town.
  • Student's \(t\)-distribution
  • \(t = 0.340\)
  • \(p\text{-value} = 0.36756\)
  • Conclusion: The mean cost could be $1, or less. At the 5% significance level, there is insufficient evidence to conclude that the mean price of a box of macaroni and cheese is more than $1.
  • \((0.8291, 1.241)\)
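Because the macaroni and cheese prices are given as a frequency list, the sample mean and standard deviation have to be weighted by the number of stores at each price. The following sketch, standard library only and with illustrative names, shows one way to obtain the \(t\) statistic from the grouped data; the p-value itself requires a Student's \(t\) distribution with 52 degrees of freedom, which is not computed here.

```python
from math import sqrt

# (number of stores, price in dollars) pairs from the notebook data.
price_counts = [
    (5, 2.02), (15, 0.25), (3, 1.29), (6, 0.35),
    (4, 2.27), (7, 1.50), (5, 1.89), (8, 0.75),
]
mu_0 = 1.00  # hypothesized mean price under H0 (mu <= 1)

n = sum(count for count, _ in price_counts)
x_bar = sum(count * price for count, price in price_counts) / n

# Sample variance from grouped data, dividing by n - 1.
variance = sum(
    count * (price - x_bar) ** 2 for count, price in price_counts
) / (n - 1)
s = sqrt(variance)

t = (x_bar - mu_0) / (s / sqrt(n))

print(f"n = {n}, x_bar = {x_bar:.4f}, s = {s:.4f}, t = {t:.3f}")
```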

"William Shakespeare: The Tragedy of Hamlet, Prince of Denmark," by Jacqueline Ghodsi

THE CHARACTERS (in order of appearance):

  • HAMLET, Prince of Denmark and student of Statistics
  • POLONIUS, Hamlet’s tutor
  • HORATIO, friend to Hamlet and fellow student

Scene: The great library of the castle, in which Hamlet does his lessons

(The day is fair, but the face of Hamlet is clouded. He paces the large room. His tutor, Polonius, is reprimanding Hamlet regarding the latter’s recent experience. Horatio is seated at the large table at right stage.)

POLONIUS: My Lord, how cans’t thou admit that thou hast seen a ghost! It is but a figment of your imagination!

HAMLET: I beg to differ; I know of a certainty that five-and-seventy in one hundred of us, condemned to the whips and scorns of time as we are, have gazed upon a spirit of health, or goblin damn’d, be their intents wicked or charitable.

POLONIUS: If thou doest insist upon thy wretched vision then let me invest your time; be true to thy work and speak to me through the reason of the null and alternate hypotheses. (He turns to Horatio.) Did not Hamlet himself say, “What piece of work is man, how noble in reason, how infinite in faculties?” Then let not this foolishness persist. Go, Horatio, make a survey of three-and-sixty and discover what the true proportion be. For my part, I will never succumb to this fantasy, but deem man to be devoid of all reason should thy proposal of at least five-and-seventy in one hundred hold true.

HORATIO (to Hamlet): What should we do, my Lord?

HAMLET: Go to thy purpose, Horatio.

HORATIO: To what end, my Lord?

HAMLET: That you must teach me. But let me conjure you by the rights of our fellowship, by the consonance of our youth, by the obligation of our ever-preserved love, be even and direct with me, whether I am right or no.

(Horatio exits, followed by Polonius, leaving Hamlet to ponder alone.)

(The next day, Hamlet awaits anxiously the presence of his friend, Horatio. Polonius enters and places some books upon the table just a moment before Horatio enters.)

POLONIUS: So, Horatio, what is it thou didst reveal through thy deliberations?

HORATIO: In a random survey, for which purpose thou thyself sent me forth, I did discover that one-and-forty believe fervently that the spirits of the dead walk with us. Before my God, I might not this believe, without the sensible and true avouch of mine own eyes.

POLONIUS: Give thine own thoughts no tongue, Horatio. (Polonius turns to Hamlet.) But look to’t I charge you, my Lord. Come Horatio, let us go together, for this is not our test. (Horatio and Polonius leave together.)

HAMLET: To reject, or not reject, that is the question: whether ‘tis nobler in the mind to suffer the slings and arrows of outrageous statistics, or to take arms against a sea of data, and, by opposing, end them. (Hamlet resignedly attends to his task.)

(Curtain falls)

"Untitled," by Stephen Chen

I've often wondered how software is released and sold to the public. Ironically, I work for a company that sells products with known problems. Unfortunately, most of the problems are difficult to create, which makes them difficult to fix. I usually use the test program X, which tests the product, to try to create a specific problem. When the test program is run to make an error occur, the likelihood of generating an error is 1%.

So, armed with this knowledge, I wrote a new test program Y that will generate the same error that test program X creates, but more often. To find out if my test program is better than the original, so that I can convince the management that I'm right, I ran my test program to find out how often I can generate the same error. When I ran my test program 50 times, I generated the error twice. While this may not seem much better, I think that I can convince the management to use my test program instead of the original test program. Am I right?

  • \(H_{0}: p = 0.01\)
  • \(H_{a}: p > 0.01\)
  • Let \(P′ =\) the proportion of errors generated
  • Normal for a single proportion
  • Decision: Reject the null hypothesis
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the proportion of errors generated is more than 0.01.

The “plus-4s” confidence interval is \((0.004, 0.144)\).
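The “plus-4s” interval quoted in these solutions can be checked with a few lines of code. The sketch below assumes the usual plus-four adjustment (add two successes and two failures, then apply the standard normal-based interval to the adjusted proportion) and applies it to the test-program data above, 2 errors in 50 runs. It uses only the Python standard library, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the test-program problem above.
x = 2              # runs of program Y that generated the error
n = 50             # total runs
confidence = 0.95

# Plus-four adjustment: two extra successes and two extra failures.
x_adj = x + 2
n_adj = n + 4
p_adj = x_adj / n_adj

z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_crit * sqrt(p_adj * (1 - p_adj) / n_adj)
ci = (p_adj - margin, p_adj + margin)

print(f"plus-four {confidence:.0%} CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Changing `x` and `n` reproduces the plus-four intervals for the other proportion problems in this section.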

"Japanese Girls’ Names"

by Kumi Furuichi

It used to be very typical for Japanese girls’ names to end with “ko.” (The trend might have started around my grandmothers’ generation and its peak might have been around my mother’s generation.) “Ko” means “child” in Chinese characters. Parents would name their daughters with “ko” attaching to other Chinese characters which have meanings that they want their daughters to become, such as Sachiko—happy child, Yoshiko—a good child, Yasuko—a healthy child, and so on.

However, I noticed recently that only two out of nine of my Japanese girlfriends at this school have names which end with “ko.” More and more, parents seem to have become creative, modernized, and, sometimes, westernized in naming their children.

I have a feeling that, while 70 percent or more of my mother’s generation would have names with “ko” at the end, the proportion has dropped among my peers. I wrote down all my Japanese friends’, ex-classmates’, co-workers’, and acquaintances’ names that I could remember. Following are the names. (Some are repeats.) Test to see if the proportion has dropped for this generation.

Ai, Akemi, Akiko, Ayumi, Chiaki, Chie, Eiko, Eri, Eriko, Fumiko, Harumi, Hitomi, Hiroko, Hiroko, Hidemi, Hisako, Hinako, Izumi, Izumi, Junko, Junko, Kana, Kanako, Kanayo, Kayo, Kayoko, Kazumi, Keiko, Keiko, Kei, Kumi, Kumiko, Kyoko, Kyoko, Madoka, Maho, Mai, Maiko, Maki, Miki, Miki, Mikiko, Mina, Minako, Miyako, Momoko, Nana, Naoko, Naoko, Naoko, Noriko, Rieko, Rika, Rika, Rumiko, Rei, Reiko, Reiko, Sachiko, Sachiko, Sachiyo, Saki, Sayaka, Sayoko, Sayuri, Seiko, Shiho, Shizuka, Sumiko, Takako, Takako, Tomoe, Tomoe, Tomoko, Touko, Yasuko, Yasuko, Yasuyo, Yoko, Yoko, Yoko, Yoshiko, Yoshiko, Yoshiko, Yuka, Yuki, Yuki, Yukiko, Yuko, Yuko.

"Phillip’s Wish," by Suzanne Osorio

My nephew likes to play

Chasing the girls makes his day.

He asked his mother

If it is okay

To get his ear pierced.

She said, “No way!”

To poke a hole through your ear,

Is not what I want for you, dear.

He argued his point quite well,

Says even my macho pal, Mel,

Has gotten this done.

It’s all just for fun.

C’mon please, mom, please, what the hell.

Again Phillip complained to his mother,

Saying half his friends (including their brothers)

Are piercing their ears

And they have no fears

He wants to be like the others.

She said, “I think it’s much less.

We must do a hypothesis test.

And if you are right,

I won’t put up a fight.

But, if not, then my case will rest.”

We proceeded to call fifty guys

To see whose prediction would fly.

Nineteen of the fifty

Said piercing was nifty

And earrings they’d occasionally buy.

Then there’s the other thirty-one,

Who said they’d never have this done.

So now this poem’s finished.

Will his hopes be diminished,

Or will my nephew have his fun?

  • \(H_{0}: p = 0.50\)
  • \(H_{a}: p < 0.50\)
  • Let \(P′ =\) the proportion of friends that has a pierced ear.
  • –1.70
  • \(p\text{-value} = 0.0448\)
  • Reason for decision: The \(p\text{-value}\) is less than 0.05. (However, they are very close.)
  • Conclusion: There is sufficient evidence to support the claim that less than 50% of his friends have pierced ears.
  • Confidence Interval: \((0.245, 0.515)\): The “plus-4s” confidence interval is \((0.259, 0.519)\).

"The Craven," by Mark Salangsang

Once upon a morning dreary

In stats class I was weak and weary.

Pondering over last night’s homework

Whose answers were now on the board

This I did and nothing more.

While I nodded nearly napping

Suddenly, there came a tapping.

As someone gently rapping,

Rapping my head as I snore.

Quoth the teacher, “Sleep no more.”

“In every class you fall asleep,”

The teacher said, his voice was deep.

“So a tally I’ve begun to keep

Of every class you nap and snore.

The percentage being forty-four.”

“My dear teacher I must confess,

While sleeping is what I do best.

The percentage, I think, must be less,

A percentage less than forty-four.”

This I said and nothing more.

“We’ll see,” he said and walked away,

And fifty classes from that day

He counted till the month of May

The classes in which I napped and snored.

The number he found was twenty-four.

At a significance level of 0.05,

Please tell me am I still alive?

Or did my grade just take a dive

Plunging down beneath the floor?

Upon thee I hereby implore.

Toastmasters International cites a report by Gallup Poll that 40% of Americans fear public speaking. A student believes that less than 40% of students at her school fear public speaking. She randomly surveys 361 schoolmates and finds that 135 report they fear public speaking. Conduct a hypothesis test to determine if the percent at her school is less than 40%.

  • \(H_{0}: p = 0.40\)
  • \(H_{a}: p < 0.40\)
  • Let \(P′ =\) the proportion of schoolmates who fear public speaking.
  • –1.01
  • \(p\text{-value} = 0.1563\)
  • Conclusion: There is insufficient evidence to support the claim that less than 40% of students at the school fear public speaking.
  • Confidence Interval: \((0.3241, 0.4240)\): The “plus-4s” confidence interval is \((0.3257, 0.4250)\).

Sixty-eight percent of online courses taught at community colleges nationwide were taught by full-time faculty. To test if 68% also represents California’s percent for full-time faculty teaching the online classes, Long Beach City College (LBCC) in California, was randomly selected for comparison. In the same year, 34 of the 44 online courses LBCC offered were taught by full-time faculty. Conduct a hypothesis test to determine if 68% represents California. NOTE: For more accurate results, use more California community colleges and this past year's data.

According to an article in Bloomberg Businessweek, New York City's most recent adult smoking rate is 14%. Suppose that a survey is conducted to determine this year’s rate. Nine out of 70 randomly chosen N.Y. City residents reply that they smoke. Conduct a hypothesis test to determine if the rate is still 14% or if it has decreased.

  • \(H_{0}: p = 0.14\)
  • \(H_{a}: p < 0.14\)
  • Let \(P′ =\) the proportion of NYC residents that smoke.
  • –0.2756
  • \(p\text{-value} = 0.3914\)
  • At the 5% significance level, there is insufficient evidence to conclude that the proportion of NYC residents who smoke is less than 0.14.
  • Confidence Interval: \((0.0502, 0.2070)\): The “plus-4s” confidence interval (see chapter 8) is \((0.0676, 0.2297)\).

The mean age of De Anza College students in a previous term was 26.6 years old. An instructor thinks the mean age for online students is older than 26.6. She randomly surveys 56 online students and finds that the sample mean is 29.4 with a standard deviation of 2.1. Conduct a hypothesis test.

Registered nurses earned an average annual salary of $69,110. For that same year, a survey was conducted of 41 California registered nurses to determine if the annual salary is higher than $69,110 for California nurses. The sample average was $71,121 with a sample standard deviation of $7,489. Conduct a hypothesis test.

  • \(H_{0}: \mu = 69,110\)
  • \(H_{a}: \mu > 69,110\)
  • Let \(\bar{X} =\) the mean salary in dollars for California registered nurses.
  • \(t = 1.719\)
  • \(p\text{-value}: 0.0466\)
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the mean salary of California registered nurses exceeds $69,110.
  • \(($68,757, $73,485)\)
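
As a rough check on the solution above, the following Python sketch recomputes the t statistic from the summary statistics and approximates the right-tail p-value. The standard library has no t distribution, so the tail area is obtained here by Simpson's rule on the t density; that numerical approach and the function names are assumptions made for this sketch, not the method used by the exercise.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3   # integral of the density from 0 to |t|
    tail = 0.5 - area      # P(T > |t|), using symmetry about zero
    return tail if t >= 0 else 1.0 - tail

def one_sample_t(xbar, mu0, s, n):
    """t statistic and degrees of freedom from summary statistics."""
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

t_stat, df = one_sample_t(71121, 69110, 7489, 41)
p_value = t_upper_tail(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.4f}")
```

With a statistics package available, a library t distribution would normally replace the hand-rolled integration.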

La Leche League International reports that the mean age of weaning a child from breastfeeding is age four to five worldwide. In America, most nursing mothers wean their children much earlier. Suppose a random survey is conducted of 21 U.S. mothers who recently weaned their children. The mean weaning age was nine months (3/4 year) with a standard deviation of 4 months. Conduct a hypothesis test to determine if the mean weaning age in the U.S. is less than four years old.

Over the past few decades, public health officials have examined the link between weight concerns and teen girls' smoking. Researchers surveyed a group of 273 randomly selected teen girls living in Massachusetts (between 12 and 15 years old). After four years the girls were surveyed again. Sixty-three said they smoked to stay thin. Is there good evidence that more than thirty percent of the teen girls smoke to stay thin?

After conducting the test, your decision and conclusion are

  • Reject \(H_{0}\): There is sufficient evidence to conclude that more than 30% of teen girls smoke to stay thin.
  • Do not reject \(H_{0}\): There is not sufficient evidence to conclude that less than 30% of teen girls smoke to stay thin.
  • Do not reject \(H_{0}\): There is not sufficient evidence to conclude that more than 30% of teen girls smoke to stay thin.
  • Reject \(H_{0}\): There is sufficient evidence to conclude that less than 30% of teen girls smoke to stay thin.

A statistics instructor believes that fewer than 20% of Evergreen Valley College (EVC) students attended the opening night midnight showing of the latest Harry Potter movie. She surveys 84 of her students and finds that 11 of them attended the midnight showing.

At a 1% level of significance, an appropriate conclusion is:

  • There is insufficient evidence to conclude that the percent of EVC students who attended the midnight showing of Harry Potter is less than 20%.
  • There is sufficient evidence to conclude that the percent of EVC students who attended the midnight showing of Harry Potter is more than 20%.
  • There is sufficient evidence to conclude that the percent of EVC students who attended the midnight showing of Harry Potter is less than 20%.
  • There is insufficient evidence to conclude that the percent of EVC students who attended the midnight showing of Harry Potter is at least 20%.

Previously, an organization reported that teenagers spent 4.5 hours per week, on average, on the phone. The organization thinks that, currently, the mean is higher. Fifteen randomly chosen teenagers were asked how many hours per week they spend on the phone. The sample mean was 4.75 hours with a sample standard deviation of 2.0. Conduct a hypothesis test.

At a significance level of \(\alpha = 0.05\), what is the correct conclusion?

  • There is enough evidence to conclude that the mean number of hours is more than 4.75
  • There is enough evidence to conclude that the mean number of hours is more than 4.5
  • There is not enough evidence to conclude that the mean number of hours is more than 4.5
  • There is not enough evidence to conclude that the mean number of hours is more than 4.75

Hypothesis testing: For the following ten exercises, answer each question.

State the null and alternate hypothesis.

State the \(p\text{-value}\).

State \(\alpha\).

What is your decision?

Write a conclusion.

Answer any other questions asked in the problem.

According to the Center for Disease Control website, in 2011 at least 18% of high school students have smoked a cigarette. An Introduction to Statistics class in Davies County, KY conducted a hypothesis test at the local high school (a medium sized–approximately 1,200 students–small city demographic) to determine if the local high school’s percentage was lower. One hundred fifty students were chosen at random and surveyed. Of the 150 students surveyed, 82 have smoked. Use a significance level of 0.05 and using appropriate statistical evidence, conduct a hypothesis test and state the conclusions.

A recent survey in the N.Y. Times Almanac indicated that 48.8% of families own stock. A broker wanted to determine if this survey could be valid. He surveyed a random sample of 250 families and found that 142 owned some type of stock. At the 0.05 significance level, can the survey be considered to be accurate?

  • \(H_{0}: p = 0.488\) \(H_{a}: p \neq 0.488\)
  • \(p\text{-value} = 0.0114\)
  • \(\alpha = 0.05\)
  • Reject the null hypothesis.
  • At the 5% level of significance, there is enough evidence to conclude that the proportion of families that own stock is not 48.8%.
  • The survey does not appear to be accurate.
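
For a two-tailed test like this one, the p-value counts both tails of the normal curve. The sketch below, with an illustrative function name, recomputes it under the normal approximation for a one-proportion z-test.

```python
import math

def two_sided_proportion_test(successes, n, p0):
    """z statistic and two-sided p-value for H0: p = p0 against Ha: p != p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    z = (p_hat - p0) / se
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p_value = two_sided_proportion_test(142, 250, 0.488)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Because the resulting p-value is below 0.05, the null hypothesis is rejected, which matches the decision listed above.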

Driver error can be listed as the cause of approximately 54% of all fatal auto accidents, according to the American Automobile Association. Thirty randomly selected fatal accidents are examined, and it is determined that 14 were caused by driver error. Using \(\alpha = 0.05\), is the AAA proportion accurate?

The US Department of Energy reported that 51.7% of homes were heated by natural gas. A random sample of 221 homes in Kentucky found that 115 were heated by natural gas. Does the evidence support the claim for Kentucky at the \(\alpha = 0.05\) level in Kentucky? Are the results applicable across the country? Why?

  • \(H_{0}: p = 0.517\) \(H_{a}: p \neq 0.517\)
  • \(p\text{-value} = 0.9203\).
  • \(\alpha = 0.05\).
  • Do not reject the null hypothesis.
  • At the 5% significance level, there is not enough evidence to conclude that the proportion of homes in Kentucky that are heated by natural gas differs from 0.517.
  • However, we cannot generalize this result to the entire nation. First, the sample’s population is only the state of Kentucky. Second, it is reasonable to assume that homes in the extreme north and south will have extreme high usage and low usage, respectively. We would need to expand our sample base to include these possibilities if we wanted to generalize this claim to the entire nation.

For Americans using library services, the American Library Association claims that at most 67% of patrons borrow books. The library director in Owensboro, Kentucky feels this is not true, so she asked a local college statistics class to conduct a survey. The class randomly selected 100 patrons and found that 82 borrowed books. Did the class demonstrate that the percentage was higher in Owensboro, KY? Use \(\alpha = 0.01\) level of significance. What is the possible proportion of patrons that do borrow books from the Owensboro Library?

The Weather Underground reported that the mean amount of summer rainfall for the northeastern US is at least 11.52 inches. Ten cities in the northeast are randomly selected and the mean rainfall amount is calculated to be 7.42 inches with a standard deviation of 1.3 inches. At the \(\alpha = 0.05\) level, can it be concluded that the mean rainfall was below the reported average? What if \(\alpha = 0.01\)? Assume the amount of summer rainfall follows a normal distribution.

  • \(H_{0}: \mu \geq 11.52\) \(H_{a}: \mu < 11.52\)
  • \(p\text{-value} = 0.000002\) which is almost 0.
  • At the 5% significance level, there is enough evidence to conclude that the mean amount of summer rain in the northeastern US is less than 11.52 inches, on average.
  • We would make the same conclusion if alpha was 1% because the \(p\text{-value}\) is almost 0.

A survey in the N.Y. Times Almanac finds the mean commute time (one way) is 25.4 minutes for the 15 largest US cities. The Austin, TX chamber of commerce feels that Austin’s commute time is less and wants to publicize this fact. The mean for 25 randomly selected commuters is 22.1 minutes with a standard deviation of 5.3 minutes. At the \(\alpha = 0.10\) level, is the Austin, TX commute significantly less than the mean commute time for the 15 largest US cities?

A report by the Gallup Poll found that a woman visits her doctor, on average, at most 5.8 times each year. A random sample of 20 women results in these yearly visit totals

3; 2; 1; 3; 7; 2; 9; 4; 6; 6; 8; 0; 5; 6; 4; 2; 1; 3; 4; 1

At the \(\alpha = 0.05\) level, can it be concluded that the population mean is higher than 5.8 visits per year?

  • \(H_{0}: \mu \leq 5.8\) \(H_{a}: \mu > 5.8\)
  • \(p\text{-value} = 0.9987\)
  • At the 5% level of significance, there is not enough evidence to conclude that a woman visits her doctor, on average, more than 5.8 times a year.
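
When raw data are given, the sample mean and standard deviation have to be computed first. A minimal Python sketch for the visit totals above follows; the variable names are illustrative.

```python
import math
import statistics

# Yearly visit totals from the exercise
visits = [3, 2, 1, 3, 7, 2, 9, 4, 6, 6, 8, 0, 5, 6, 4, 2, 1, 3, 4, 1]
mu0 = 5.8  # value of the mean under the null hypothesis

n = len(visits)
xbar = statistics.mean(visits)
s = statistics.stdev(visits)                # sample standard deviation (n - 1 denominator)
t_stat = (xbar - mu0) / (s / math.sqrt(n))  # one-sample t statistic, n - 1 degrees of freedom

print(f"n = {n}, mean = {xbar:.2f}, s = {s:.3f}, t = {t_stat:.3f}")
```

The sample mean falls below 5.8, so the t statistic is negative and the right-tail p-value must exceed 0.5, which is consistent with the large p-value reported in the solution.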

According to the N.Y. Times Almanac the mean family size in the U.S. is 3.18. A sample of a college math class resulted in the following family sizes:

5; 4; 5; 4; 4; 3; 6; 4; 3; 3; 5; 5; 6; 3; 3; 2; 7; 4; 5; 2; 2; 2; 3; 2

At \(\alpha = 0.05\) level, is the class’ mean family size greater than the national average? Does the Almanac result remain valid? Why?

The student academic group on a college campus claims that freshman students study at least 2.5 hours per day, on average. One Introduction to Statistics class was skeptical. The class took a random sample of 30 freshman students and found a mean study time of 137 minutes with a standard deviation of 45 minutes. At α = 0.01 level, is the student academic group’s claim correct?

  • \(H_{0}: \mu \geq 150\) \(H_{a}: \mu < 150\)
  • \(p\text{-value} = 0.0622\)
  • \(\alpha = 0.01\)
  • At the 1% significance level, there is not enough evidence to conclude that freshmen students study less than 2.5 hours per day, on average.
  • The student academic group’s claim appears to be correct.

9.7: Hypothesis Testing of a Single Mean and Single Proportion

Research Hypothesis In Psychology: Types, & Examples

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education.

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology.

A research hypothesis (plural: hypotheses) is a specific, testable prediction about the anticipated results of a study, established at its outset. It is a key component of the scientific method.

Hypotheses connect theory to data and guide the research process towards expanding scientific understanding.

Some key points about hypotheses:

  • A hypothesis expresses an expected pattern or relationship. It connects the variables under investigation.
  • It is stated in clear, precise terms before any data collection or analysis occurs. This makes the hypothesis testable.
  • A hypothesis must be falsifiable. It should be possible, even if unlikely in practice, to collect data that disconfirms rather than supports the hypothesis.
  • Hypotheses guide research. Scientists design studies to explicitly evaluate hypotheses about how nature works.
  • For a hypothesis to be valid, it must be testable against empirical evidence. The evidence can then confirm or disprove the testable predictions.
  • Hypotheses are informed by background knowledge and observation, but go beyond what is already known to propose an explanation of how or why something occurs.
Predictions typically arise from a thorough knowledge of the research literature, curiosity about real-world problems or implications, and integrating this to advance theory. They build on existing literature while providing new insight.

Types of Research Hypotheses

Alternative Hypothesis

The research hypothesis is often called the alternative or experimental hypothesis in experimental research.

It typically suggests a potential relationship between two key variables: the independent variable, which the researcher manipulates, and the dependent variable, which is measured based on those changes.

The alternative hypothesis states a relationship exists between the two variables being studied (one variable affects the other).

A hypothesis is a testable statement or prediction about the relationship between two or more variables. It is a key component of the scientific method. Some key points about hypotheses:

  • Important hypotheses lead to predictions that can be tested empirically. The evidence can then confirm or disprove the testable predictions.

In summary, a hypothesis is a precise, testable statement of what researchers expect to happen in a study and why. Hypotheses connect theory to data and guide the research process towards expanding scientific understanding.

An experimental hypothesis predicts what change(s) will occur in the dependent variable when the independent variable is manipulated.

It states that the results are not due to chance and are significant in supporting the theory being investigated.

The alternative hypothesis can be directional, indicating a specific direction of the effect, or non-directional, suggesting a difference without specifying its nature. It’s what researchers aim to support or demonstrate through their study.

Null Hypothesis

The null hypothesis states no relationship exists between the two variables being studied (one variable does not affect the other). There will be no changes in the dependent variable due to manipulating the independent variable.

It states results are due to chance and are not significant in supporting the idea being investigated.

The null hypothesis, positing no effect or relationship, is a foundational contrast to the research hypothesis in scientific inquiry. It establishes a baseline for statistical testing, promoting objectivity by initiating research from a neutral stance.

Many statistical methods are tailored to test the null hypothesis, determining the likelihood of observed results if no true effect exists.

This dual-hypothesis approach provides clarity, ensuring that research intentions are explicit, and fosters consistency across scientific studies, enhancing the standardization and interpretability of research outcomes.

Nondirectional Hypothesis

A non-directional hypothesis, also known as a two-tailed hypothesis, predicts that there is a difference or relationship between two variables but does not specify the direction of this relationship.

It merely indicates that a change or effect will occur without predicting which group will have higher or lower values.

For example, “There is a difference in performance between Group A and Group B” is a non-directional hypothesis.

Directional Hypothesis

A directional (one-tailed) hypothesis predicts the nature of the effect of the independent variable on the dependent variable. It predicts in which direction the change will take place (i.e., greater, smaller, less, more).

It specifies whether one variable is greater or lesser than another, rather than just indicating that there’s a difference without specifying its nature.

For example, “Exercise increases weight loss” is a directional hypothesis.
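
The choice between a directional and a non-directional hypothesis changes how a p-value is computed from the same test statistic. The sketch below illustrates this for a z statistic; the value `z = 1.8` is hypothetical and chosen only for illustration, and the function name is not from the source.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = 1.8  # hypothetical test statistic, not taken from any study

# Directional (one-tailed): the hypothesis predicts an increase, so only the upper tail counts
p_directional = upper_tail(z)

# Non-directional (two-tailed): a difference in either direction counts, so both tails are included
p_nondirectional = 2 * upper_tail(abs(z))

print(f"one-tailed p = {p_directional:.4f}, two-tailed p = {p_nondirectional:.4f}")
```

For this hypothetical value, the one-tailed p-value falls below 0.05 while the two-tailed p-value does not, which is one reason the direction should be decided before the data are collected.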



The Falsification Principle, proposed by Karl Popper, is a way of demarcating science from non-science. It suggests that for a theory or hypothesis to be considered scientific, it must be testable and refutable.

Falsifiability emphasizes that scientific claims shouldn’t just be confirmable but should also have the potential to be proven wrong.

It means that there should exist some potential evidence or experiment that could prove the proposition false.

However many confirming instances exist for a theory, it only takes one counter observation to falsify it. For example, the hypothesis that “all swans are white,” can be falsified by observing a black swan.

For Popper, science should attempt to disprove a theory rather than attempt to continually provide evidence to support a research hypothesis.

Can a Hypothesis be Proven?

Hypotheses make probabilistic predictions. They state the expected outcome if a particular relationship exists. However, a study result supporting a hypothesis does not definitively prove it is true.

All studies have limitations. There may be unknown confounding factors or issues that limit the certainty of conclusions. Additional studies may yield different results.

In science, hypotheses can realistically only be supported with some degree of confidence, not proven. The process of science is to incrementally accumulate evidence for and against hypothesized relationships in an ongoing pursuit of better models and explanations that best fit the empirical data. But hypotheses remain open to revision and rejection if that is where the evidence leads.
  • Disproving a hypothesis is definitive. Solid disconfirmatory evidence will falsify a hypothesis and require altering or discarding it based on the evidence.
  • However, confirming evidence is always open to revision. Other explanations may account for the same results, and additional or contradictory evidence may emerge over time.

We can never 100% prove the alternative hypothesis. Instead, we see if we can disprove, or reject the null hypothesis.

If we reject the null hypothesis, this doesn’t mean that our alternative hypothesis is correct but does support the alternative/experimental hypothesis.

Upon analysis of the results, an alternative hypothesis can be rejected or supported, but it can never be proven to be correct. We must avoid any reference to results proving a theory as this implies 100% certainty, and there is always a chance that evidence may exist which could refute a theory.

How to Write a Hypothesis

  • Identify variables. The researcher manipulates the independent variable, and the dependent variable is the measured outcome.
  • Operationalize the variables being investigated. Operationalization of a hypothesis refers to the process of making the variables physically measurable or testable, e.g., if you are about to study aggression, you might count the number of punches given by participants.
  • Decide on a direction for your prediction. If there is evidence in the literature to support a specific effect of the independent variable on the dependent variable, write a directional (one-tailed) hypothesis. If there are limited or ambiguous findings in the literature regarding the effect of the independent variable on the dependent variable, write a non-directional (two-tailed) hypothesis.
  • Make it testable. Ensure your hypothesis can be tested through experimentation or observation. It should be possible to prove it false (principle of falsifiability).
  • Use clear and concise language. A strong hypothesis is concise (typically one to two sentences long) and formulated using clear and straightforward language, ensuring it’s easily understood and testable.

Consider a hypothesis many teachers might subscribe to: students work better on Monday morning than on Friday afternoon (IV = day, DV = standard of work).

Now, if we decide to study this by giving the same group of students a lesson on a Monday morning and a Friday afternoon and then measuring their immediate recall of the material covered in each session, we would end up with the following:

  • The alternative hypothesis states that students will recall significantly more information on a Monday morning than on a Friday afternoon.
  • The null hypothesis states that there will be no significant difference in the amount recalled on a Monday morning compared to a Friday afternoon. Any difference will be due to chance or confounding factors.

More Examples

  • Memory: Participants exposed to classical music during study sessions will recall more items from a list than those who studied in silence.
  • Social Psychology: Individuals who frequently engage in social media use will report higher levels of perceived social isolation compared to those who use it infrequently.
  • Developmental Psychology: Children who engage in regular imaginative play have better problem-solving skills than those who don’t.
  • Clinical Psychology: Cognitive-behavioral therapy will be more effective in reducing symptoms of anxiety over a 6-month period compared to traditional talk therapy.
  • Cognitive Psychology: Individuals who multitask between various electronic devices will have shorter attention spans on focused tasks than those who single-task.
  • Health Psychology: Patients who practice mindfulness meditation will experience lower levels of chronic pain compared to those who don’t meditate.
  • Organizational Psychology: Employees in open-plan offices will report higher levels of stress than those in private offices.
  • Behavioral Psychology: Rats rewarded with food after pressing a lever will press it more frequently than rats who receive no reward.

5.3 - Hypothesis Testing for One-Sample Mean

In the previous section, we learned how to perform a hypothesis test for one proportion. The concepts of hypothesis testing remain constant for any hypothesis test. In these next few sections, we will present the hypothesis test for one mean. We start with our knowledge of the sampling distribution of the sample mean.

Hypothesis Test for One-Sample Mean

Recall that under certain conditions, the sampling distribution of the sample mean, \(\bar{x} \), is approximately normal with mean, \(\mu \), standard error \(\dfrac{\sigma}{\sqrt{n}} \), and estimated standard error \(\dfrac{s}{\sqrt{n}} \).

\(H_0\colon \mu=\mu_0\)


  • The distribution of the population is Normal
  • The sample size is large (\(n > 30\)).

Test Statistic:

If at least one of the conditions is satisfied, then...

\( t=\dfrac{\bar{x}-\mu_0}{\frac{s}{\sqrt{n}}} \)

will follow a t-distribution with \(n-1 \) degrees of freedom.

Notice when working with continuous data we are going to use a t statistic as opposed to the z statistic. This is due to the fact that the sample size impacts the sampling distribution and needs to be taken into account. We do this by recognizing “degrees of freedom”. We will not go into too much detail about degrees of freedom in this course.

Let’s look at an example.

Example 5-1

This depends on the standard deviation of \(\bar{x}\).

\begin{align} t^*&=\dfrac{\bar{x}-\mu}{\frac{s}{\sqrt{n}}}\\&=\dfrac{8.3-8.5}{\frac{1.2}{\sqrt{61}}}\\&=-1.3 \end{align} 

Thus, we are asking if \(-1.3\) is very far away from zero, since that corresponds to the case when \(\bar{x}\) is equal to \(\mu_0 \). If it is far away, then it is unlikely that the null hypothesis is true and one rejects it. Otherwise, one cannot reject the null hypothesis. 
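
The arithmetic in this example can be checked with a few lines of Python. The function name `t_statistic` is illustrative; the inputs are the values appearing in the calculation above.

```python
import math

def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic: (xbar - mu0) divided by the estimated standard error."""
    return (xbar - mu0) / (s / math.sqrt(n))

t_star = t_statistic(xbar=8.3, mu0=8.5, s=1.2, n=61)
df = 61 - 1  # degrees of freedom for the reference t-distribution

print(f"t* = {t_star:.2f} with {df} degrees of freedom")
```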

5.3.1 - Steps in Conducting a Hypothesis Test for \(\mu\)

\( H_0\colon \mu=\mu_0 \)

\( H_a\colon \mu\ne \mu_0 \)

Conditions : The data comes from an approximately normal distribution or the sample size is at least 30

Typically, 5%. If \(\alpha\) is not specified, use 5%.

One Mean t-test: \( t^*=\dfrac{\bar{x}-\mu_0}{\frac{s}{\sqrt{n}}} \)

Typically we will let Minitab handle this for us. But if you are really interested, you can look up p-values in the probability tables found in the appendix of your textbook!

If the p-value is less than the significance level, \(\alpha\), then reject \(H_0\) (and conclude \(H_a \)). If it is greater than the significance level, then do not reject \(H_0 \).
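
The decision rule can be written as a small function. This is a minimal sketch: the name `decide` and the returned strings are illustrative, and the default of 0.05 follows the convention stated above for an unspecified \(\alpha\).

```python
def decide(p_value, alpha=0.05):
    """Apply the p-value decision rule at significance level alpha."""
    if p_value < alpha:
        return "reject H0"
    return "do not reject H0"

# A p-value of 0.0114 tested at the default 5% level
print(decide(0.0114))
# A p-value of 0.0622 tested at the stricter 1% level
print(decide(0.0622, alpha=0.01))
```

Note that the same p-value can lead to different decisions at different significance levels, so \(\alpha\) should be fixed before looking at the data.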

State an overall conclusion.

Conduct a one-sample mean t-test.

Note that these steps are very similar to those for one-mean confidence interval. The differences occur in steps 4 through 8.

To conduct the one sample mean t-test in Minitab...

  • Choose Stat > Basic Stat > 1 Sample t.
  • In the drop-down box use "One or more samples, each in a column" if you have the raw data, otherwise select "Summarized data" if you only have the sample statistics.
  • If using the raw data, enter the column of interest into the blank variable window below the drop-down selection. If using summarized data, enter the sample size, sample mean, and sample standard deviation in their respective fields.
  • Choose the check box for "Perform hypothesis test" and enter the null hypothesis value.
  • Choose Options.
  • Enter the confidence level associated with alpha (e.g. 95% for alpha of 5%).
  • From the drop-down list for "Alternative hypothesis" select the correct alternative.
  • Click OK and OK.


  1. Hypothesis Testing and Its Types. Learning Series I:

    hypothesis test with 3 samples

  2. Hypothesis Testing: Upper, Lower, and Two Tailed Tests

    hypothesis test with 3 samples

  3. Pin on Ciencias

    hypothesis test with 3 samples

  4. Hypothesis testing of frequency-based samples

    hypothesis test with 3 samples

  5. Hypothesis Testing Solved Problems

    hypothesis test with 3 samples

  6. Simpler & Easy Guide on Hypothesis Testing in R

    hypothesis test with 3 samples


  1. Hypothesis Testing

    Table of contents. Step 1: State your null and alternate hypothesis. Step 2: Collect data. Step 3: Perform a statistical test. Step 4: Decide whether to reject or fail to reject your null hypothesis. Step 5: Present your findings. Other interesting articles. Frequently asked questions about hypothesis testing.

  2. S.3.3 Hypothesis Testing Examples

    If the biologist set her significance level \(\alpha\) at 0.05 and used the critical value approach to conduct her hypothesis test, she would reject the null hypothesis if her test statistic t* were less than -1.6939 (determined using statistical software or a t-table):s-3-3. Since the biologist's test statistic, t* = -4.60, is less than -1.6939, the biologist rejects the null hypothesis.

  3. hypothesis testing

    However, it is possible to do the F-test if you have the 'sufficient statistics', which consist of the three sample sizes, the three sample means, and the three sample variances. My Answer gives elementary versions of the formulas (for equal sample sizes) and shows how to do the F-test using only the sufficient statistics. $\endgroup$ -

  4. 7.1: Basics of Hypothesis Testing

    Figure 7.1.1. Before calculating the probability, it is useful to see how many standard deviations away from the mean the sample mean is. Using the formula for the z-score from chapter 6, you find. z = ¯ x − μo σ / √n = 490 − 500 25 / √30 = − 2.19. This sample mean is more than two standard deviations away from the mean.

  5. A Complete Guide to Hypothesis Testing

    Photo from StepUp Analytics. Hypothesis testing is a method of statistical inference that considers the null hypothesis H₀ vs. the alternative hypothesis Ha, where we are typically looking to assess evidence against H₀. Such a test is used to compare data sets against one another, or compare a data set against some external standard. The former being a two sample test (independent or ...

  6. S.3 Hypothesis Testing

    S.3 Hypothesis Testing. In reviewing hypothesis tests, we start first with the general idea. Then, we keep returning to the basic procedures of hypothesis testing, each time adding a little more detail. The general idea of hypothesis testing involves: Making an initial assumption. Collecting evidence (data).

  7. 6a.2

    Below these are summarized into six such steps to conducting a test of a hypothesis. Set up the hypotheses and check conditions: Each hypothesis test includes two hypotheses about the population. One is the null hypothesis, notated as H 0, which is a statement of a particular parameter value. This hypothesis is assumed to be true until there is ...

  8. Statistical Hypothesis Testing Overview

    Hypothesis testing is a crucial procedure to perform when you want to make inferences about a population using a random sample. These inferences include estimating population properties such as the mean, differences between means, proportions, and the relationships between variables. This post provides an overview of statistical hypothesis testing.

  9. Significance tests (hypothesis testing)

    Significance tests give us a formal process for using sample data to evaluate the likelihood of some claim about a population value. Learn how to conduct significance tests and calculate p-values to see how likely a sample result is to occur by random chance. You'll also see how we use p-values to make conclusions about hypotheses.

  10. Hypothesis Testing

    Hypothesis testing is a technique that is used to verify whether the results of an experiment are statistically significant. It involves the setting up of a null hypothesis and an alternate hypothesis. There are three types of tests that can be conducted under hypothesis testing - z test, t test, and chi square test.

  11. 11.4 One-Way ANOVA and Hypothesis Tests for Three or More Population

    The null hypothesis is the claim that the population means are all equal and the alternative hypothesis is the claim that at least one of the population means is different from the others. Collect the sample information for the test and identify the significance level. The p-value is the area in the right tail of the [latex]F[/latex]-distribution.

  12. Hypothesis testing for equality of proportions with 3 samples

    In general, we can test for equality of proportions using a χ2 χ 2 test where the typical null hypothesis, H0 H 0, is the following: H0:p1 = p2 =... = pk H 0: p 1 = p 2 =... = p k. i.e., all of the proportions are equal to each other. Now in your case you null hypothesis is the following: H0: p1 = p2 = p3 H 0: p 1 = p 2 = p 3.

  13. Choosing the Right Statistical Test

    Choosing the Right Statistical Test | Types & Examples. Published on January 28, 2020 by Rebecca Bevans. Revised on June 22, 2023. ... A Step-by-Step Guide with Easy Examples Hypothesis testing is a formal procedure for investigating our ideas about the world. It allows you to statistically test your predictions. 2219.

  14. Compare t-test of difference in means of 3 samples

    2 Answers. I would make a recommendation that, rather than conducting three t t -tests, you conduct an ANOVA test. This is a test designed to assess equality of means of three or more groups and, if memory serves correctly, requires the same assumptions as the t t -test but for three or more groups. In addition, your statement had a subtle flaw.

  15. 2.3.4: Practice! Movies and Mood

    We have good reason to believe that people leaving the comedy will be in a better mood, so we use a one-tailed test at \(α\) = 0.05 to test our hypothesis. Our calculated test statistic has a value of \(t\) = 2.48, and in step 2 we found that the critical value is \(t*\) = 1.671.

  16. 8.6: Hypothesis Test of a Single Population Mean with Examples

    Full Hypothesis Test Examples. Example 8.6.4. Statistics students believe that the mean score on the first statistics test is 65. A statistics instructor thinks the mean score is higher than 65. He samples ten statistics students and obtains the scores 65 65 70 67 66 63 63 68 72 71.

  17. PDF 12 Hypothesis Testing With Three or More Population Means

    hypothesis tests. Explain measures of association and why they are necessary. Use SPSS to run analysis of variance and interpret the output. Chapter 12: Hypothesis Testing With Three or More Population Means: Analysis of Variance. In Chapter 11, you learned how to determine whether a two-class categorical variable exerts an impact on a continuous

  18. ANOVA 3: Hypothesis test with F-statistic

    ANOVA is inherently a 2-sided test. Say you have two groups, A and B, and you want to run a 2-sample t-test on them, with the alternative hypothesis being: Ha: µ.a ≠ µ.b. You will get some test statistic, call it t, and some p-value, call it p1. If you then run an ANOVA on these two groups, you will get a test statistic, f, and a p-value p2.
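
    The link between the two statistics can be checked numerically. The sketch below assumes the pooled-variance (equal-variance) form of the 2-sample t-test and uses made-up data for groups A and B; under that assumption the one-way ANOVA statistic f equals t squared, which is consistent with ANOVA behaving as a 2-sided test.

```python
from math import sqrt
from statistics import mean


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    mean_a, mean_b = mean(a), mean(b)
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (mean_a - mean_b) / sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


def one_way_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements for the two groups.
group_a = [5.1, 4.9, 6.0, 5.5, 5.8]
group_b = [6.2, 6.8, 5.9, 7.1, 6.5]

t = pooled_t(group_a, group_b)
f = one_way_f(group_a, group_b)
print(f"t^2 = {t ** 2:.6f}, f = {f:.6f}")
```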

  19. Chapter 3: Hypothesis Testing

    Components of a Formal Hypothesis Test. The null hypothesis is a statement about the value of a population parameter, such as the population mean (µ) or the population proportion (p). It contains the condition of equality and is denoted as \(H_{0}\) (H-naught). \(H_{0}\): µ = 157 or \(H_{0}\): p = 0.37. The alternative hypothesis is the claim to be tested, the opposite of the null hypothesis.

  20. Hypothesis test for difference in proportions example

    Since we're subtracting the two samples, the mean would be the 1st sample mean minus the 2nd sample mean (µ1 - µ2). Sal finds that to be 0.38 - 0.33 = 0.05 at 6:46. In this video, Sal is figuring out if there is convincing evidence that the difference in population means is actually 0.

  21. S.3.2 Hypothesis Testing (P-Value Approach)

    Two-Tailed. In our example concerning the mean grade point average, suppose again that our random sample of n = 15 students majoring in mathematics yields a test statistic t* equal to -2.5 instead. The P-value for conducting the two-tailed test \(H_{0}: \mu = 3\) versus \(H_{A}: \mu \neq 3\) is the probability that we would observe a test statistic less than -2.5 or greater than 2.5 if the population mean ...
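
    A two-tailed P-value of this kind can be approximated without a statistics package by integrating the t density numerically. The sketch below is one way to do it, assuming n - 1 = 14 degrees of freedom for the n = 15 sample; the helper names and the use of Simpson's rule are illustrative choices, not part of the original example.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|), via Simpson's rule on [0, |t_stat|] (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # area between -|t| and +|t|, by symmetry
    return 1 - central_area


p_value = two_tailed_p(-2.5, df=14)
print(f"two-tailed P-value = {p_value:.4f}")
```

    The result is the combined area below -2.5 and above 2.5, which is what the two-tailed test compares with the significance level.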

  22. Confidence Intervals: Interpreting, Finding & Formulas

    A confidence interval (CI) is a range of values that is likely to contain the value of an unknown population parameter. These intervals represent a plausible domain for the parameter given the characteristics of your sample data. Confidence intervals are derived from sample statistics and are calculated using a specified confidence level.

  23. 9.E: Hypothesis Testing with One Sample (Exercises)

    Fifteen randomly chosen teenagers were asked how many hours per week they spend on the phone. The sample mean was 4.75 hours with a sample standard deviation of 2.0. Conduct a hypothesis test. The null and alternative hypotheses are: \(H_{0}: \bar{x} = 4.5\), \(H_{a}: \bar{x} > 4.5\). \(H_{0}: \mu \geq 4.5\), \(H_{a}: \mu < 4.5\).

  24. What Is Chi Square Test & How To Calculate Formula Equation

    Formula Calculation. Calculate the chi-square statistic (χ²) by completing the following steps: Calculate the expected frequencies and the observed frequencies. For each observed number in the table, subtract the corresponding expected number (O - E). Square the difference (O - E)². Divide each squared difference by the corresponding expected number, (O - E)² / E. Sum all the values for (O - E)² / E.
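
    The steps above translate almost directly into code. This is a minimal sketch with made-up observed and expected frequencies; the function name is hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical frequencies for three categories.
observed = [10, 20, 30]
expected = [20, 20, 20]

chi2 = chi_square_statistic(observed, expected)
print(f"chi-square = {chi2}")
```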

  25. Choosing the Right Statistical Test: A Step-by-Step Guide

    Choosing the right statistical test for your hypothesis is a pivotal step in data analysis. It can make the ...

  26. Calculating Sample Size for BI Hypothesis Tests

    Power analysis is a statistical method used to determine the sample size needed to detect an effect of a given size with a certain degree of confidence. It involves specifying the significance ...
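
    As one concrete case, the sample size for a two-sided, one-sample z-test with known standard deviation is often approximated by \(n = \left( (z_{1-\alpha/2} + z_{\text{power}})\,\sigma / \delta \right)^{2}\). The sketch below applies that formula; the choice of test, the function name, and the input values (effect 0.5, standard deviation 1.0, 5% significance, 80% power) are all assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_one_sample_z(effect, sigma, alpha=0.05, power=0.80):
    """Approximate n for a two-sided one-sample z-test to detect a mean shift of `effect`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile matching the desired power
    return ceil(((z_alpha + z_power) * sigma / effect) ** 2)


n_required = sample_size_one_sample_z(effect=0.5, sigma=1.0)
print(f"required sample size: {n_required}")
```

    Smaller effects or higher power targets push the required sample size up, since the effect size enters the formula squared in the denominator.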

  27. What Is The Null Hypothesis & When To Reject It

    The insignificance of null hypothesis significance testing. Political research quarterly, 52(3), 647-674. Krueger, J. (2001). Null hypothesis significance testing: On the survival of a flawed method. American Psychologist, 56(1), 16. Masson, M. E. (2011). A tutorial on a practical Bayesian alternative to null-hypothesis significance testing.

  28. Research Hypothesis In Psychology: Types, & Examples

    Examples. A research hypothesis, in its plural form "hypotheses," is a specific, testable prediction about the anticipated results of a study, established at its outset. It is a key component of the scientific method. Hypotheses connect theory to data and guide the research process towards expanding scientific understanding.

  29. 5.3

    If using the raw data, enter the column of interest into the blank variable window below the drop down selection. If using summarized data, enter the sample size, sample mean, and sample standard deviation in their respective fields. Choose the check box for "Perform hypothesis test" and enter the null hypothesis value. Choose Options.