What Is a Confirmed Hypothesis?

A hypothesis is a provisional idea or explanation requiring evaluation. It is a key component of the scientific method. Every scientific study, whether experimental or descriptive, begins with a hypothesis that the study is designed to test -- that is, depending on the results of the study, the hypothesis will be either confirmed or disconfirmed.

Unknown Outcome

Every well-designed study tests something that is not already known and is reasonably open to investigation. Though hypotheses are often “best guesses” about the outcome of a study, the outcome itself should not be something the researcher already knows. Outcomes that the researcher already knows are called “consequences” and should be taken into consideration when forming a study’s hypothesis. A consequence cannot in any real sense be confirmed, since it is already known.

Reducing the Question to Variables

As a first step in creating a hypothesis, researchers reduce the question that they're investigating to variables -- that is, measurable values. If a given question cannot be reduced to variables, it most likely is not a question answerable by scientific study. In an experimental study, which is one that attempts to show that one thing causes or affects another, these variables are directional -- the hypothesis will claim that if there is a certain change in one variable, there will be a corresponding change in another, but not necessarily the other way around. In a descriptive study, which attempts to show correlation but not necessarily causation between two or more things, there is no directionality to the variables.

Falsifiability

Falsifiability refers to the idea that there must be some set of conditions that could occur that would show that a given hypothesis is false. For example, if a hypothesis states, “If mice eat twice as many calories, they will show weight gain,” there is a set of conditions under which the hypothesis would be false -- the mice eat twice as many calories but do not show weight gain. If there is no such set of conditions, then the hypothesis has been poorly designed -- because it cannot be disconfirmed, it cannot in any real sense be confirmed.

Confirmation

If a well-designed study delivers the results predicted by the hypothesis, then that hypothesis is confirmed. Note, however, that there is a difference between a confirmed hypothesis and a “proven” hypothesis. Scientific studies can support a given hypothesis, but they do not claim to absolutely prove hypotheses -- there could always be some other explanation for why a given study obtained the results it did. Generally speaking, however, the more often a study or experiment obtains the same results, the more heavily supported -- and thus, more likely to be correct -- a given hypothesis is.


Research Hypothesis In Psychology: Types, & Examples

A research hypothesis (plural: hypotheses) is a specific, testable prediction about the anticipated results of a study, established at its outset. It is a key component of the scientific method.

Hypotheses connect theory to data and guide the research process towards expanding scientific understanding.

Some key points about hypotheses:

  • A hypothesis expresses an expected pattern or relationship. It connects the variables under investigation.
  • It is stated in clear, precise terms before any data collection or analysis occurs. This makes the hypothesis testable.
  • A hypothesis must be falsifiable. It should be possible, even if unlikely in practice, to collect data that disconfirms rather than supports the hypothesis.
  • Hypotheses guide research. Scientists design studies to explicitly evaluate hypotheses about how nature works.
  • For a hypothesis to be valid, it must be testable against empirical evidence. The evidence can then confirm or disprove the testable predictions.
  • Hypotheses are informed by background knowledge and observation, but go beyond what is already known to propose an explanation of how or why something occurs.

Predictions typically arise from a thorough knowledge of the research literature, curiosity about real-world problems or implications, and integrating this knowledge to advance theory. They build on existing literature while providing new insight.

Types of Research Hypotheses

Alternative Hypothesis

The research hypothesis is often called the alternative or experimental hypothesis in experimental research.

It typically suggests a potential relationship between two key variables: the independent variable, which the researcher manipulates, and the dependent variable, which is measured based on those changes.

The alternative hypothesis states a relationship exists between the two variables being studied (one variable affects the other).

An experimental hypothesis predicts what change(s) will occur in the dependent variable when the independent variable is manipulated.

It states that the results are not due to chance and are significant in supporting the theory being investigated.

The alternative hypothesis can be directional, indicating a specific direction of the effect, or non-directional, suggesting a difference without specifying its nature. It’s what researchers aim to support or demonstrate through their study.

Null Hypothesis

The null hypothesis states no relationship exists between the two variables being studied (one variable does not affect the other). There will be no changes in the dependent variable due to manipulating the independent variable.

It states that results are due to chance and are not significant in supporting the idea being investigated.

The null hypothesis, positing no effect or relationship, is a foundational contrast to the research hypothesis in scientific inquiry. It establishes a baseline for statistical testing, promoting objectivity by initiating research from a neutral stance.

Many statistical methods are tailored to test the null hypothesis, determining the likelihood of observed results if no true effect exists.

This dual-hypothesis approach provides clarity, ensuring that research intentions are explicit, and fosters consistency across scientific studies, enhancing the standardization and interpretability of research outcomes.
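
To make this dual-hypothesis logic concrete, here is a minimal sketch in Python (the scores, the group labels, and the 0.05 threshold are illustrative assumptions, not data from any real study). It runs a standard two-sample t-test of the null hypothesis that two group means are equal:

    # Null-hypothesis significance test on made-up data.
    from scipy import stats

    # Hypothetical scores for two groups (e.g., treatment vs. control).
    group_a = [12, 15, 14, 10, 13, 16, 14, 12]
    group_b = [9, 11, 10, 12, 8, 10, 11, 9]

    # H0: the group means are equal (any observed difference is due to chance).
    # H1: the group means differ.
    t_stat, p_value = stats.ttest_ind(group_a, group_b)

    alpha = 0.05  # conventional significance threshold
    if p_value < alpha:
        print(f"p = {p_value:.4f}: reject H0; the data support H1.")
    else:
        print(f"p = {p_value:.4f}: fail to reject H0.")

A small p-value licenses rejecting the null hypothesis, not “proving” the alternative, in line with the points above.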

Nondirectional Hypothesis

A non-directional hypothesis, also known as a two-tailed hypothesis, predicts that there is a difference or relationship between two variables but does not specify the direction of this relationship.

It merely indicates that a change or effect will occur without predicting which group will have higher or lower values.

For example, “There is a difference in performance between Group A and Group B” is a non-directional hypothesis.

Directional Hypothesis

A directional (one-tailed) hypothesis predicts the nature of the effect of the independent variable on the dependent variable, specifying in which direction the change will take place (i.e., greater, smaller, more, or less).

It specifies whether one variable is greater, lesser, or different from another, rather than just indicating that there’s a difference without specifying its nature.

For example, “Exercise increases weight loss” is a directional hypothesis.
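
Statistically, a non-directional hypothesis corresponds to a two-tailed test and a directional hypothesis to a one-tailed test. Here is a minimal sketch of the difference, again on invented data (the alternative parameter of scipy.stats.ttest_ind assumes SciPy 1.6 or later):

    # Two-tailed vs. one-tailed tests of the same comparison, on invented data.
    from scipy import stats

    exercise = [2.1, 3.4, 2.8, 3.9, 3.1, 2.6]  # hypothetical weight loss (kg)
    control = [1.8, 2.0, 2.5, 1.6, 2.2, 1.9]

    # Non-directional: "there is a difference between the groups".
    _, p_two = stats.ttest_ind(exercise, control, alternative='two-sided')

    # Directional: "exercise increases weight loss".
    _, p_one = stats.ttest_ind(exercise, control, alternative='greater')

    print(f"two-tailed p = {p_two:.4f}; one-tailed p = {p_one:.4f}")

When the observed effect lies in the predicted direction, the one-tailed p-value is half the two-tailed one, which is why the direction should be fixed in advance on the basis of prior evidence rather than after seeing the data.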

Falsifiability

The Falsification Principle, proposed by Karl Popper, is a way of demarcating science from non-science. It suggests that for a theory or hypothesis to be considered scientific, it must be testable and refutable.

Falsifiability emphasizes that scientific claims shouldn’t just be confirmable but should also have the potential to be proven wrong.

It means that there should exist some potential evidence or experiment that could prove the proposition false.

However many confirming instances exist for a theory, it takes only one counterobservation to falsify it. For example, the hypothesis that “all swans are white” can be falsified by observing a black swan.

For Popper, science should attempt to disprove a theory rather than attempt to continually provide evidence to support a research hypothesis.

Can a Hypothesis be Proven?

Hypotheses make probabilistic predictions. They state the expected outcome if a particular relationship exists. However, a study result supporting a hypothesis does not definitively prove it is true.

All studies have limitations. There may be unknown confounding factors or issues that limit the certainty of conclusions. Additional studies may yield different results.

In science, hypotheses can realistically only be supported with some degree of confidence, not proven. The process of science is to incrementally accumulate evidence for and against hypothesized relationships in an ongoing pursuit of better models and explanations that best fit the empirical data. But hypotheses remain open to revision and rejection if that is where the evidence leads.
  • Disproving a hypothesis is definitive. Solid disconfirmatory evidence will falsify a hypothesis and require altering or discarding it based on the evidence.
  • However, confirming evidence is always open to revision. Other explanations may account for the same results, and additional or contradictory evidence may emerge over time.

We can never prove the alternative hypothesis with 100% certainty. Instead, we see whether we can disprove, or reject, the null hypothesis.

If we reject the null hypothesis, this does not prove that our alternative hypothesis is correct, but it does support the alternative/experimental hypothesis.

Upon analysis of the results, an alternative hypothesis can be rejected or supported, but it can never be proven to be correct. We must avoid any reference to results proving a theory as this implies 100% certainty, and there is always a chance that evidence may exist which could refute a theory.

How to Write a Hypothesis

  • Identify variables. The independent variable is what the researcher manipulates; the dependent variable is the measured outcome.
  • Operationalize the variables being investigated. Operationalization means making the variables physically measurable or testable, e.g., if you are studying aggression, you might count the number of punches given by participants.
  • Decide on a direction for your prediction. If there is evidence in the literature to support a specific effect of the independent variable on the dependent variable, write a directional (one-tailed) hypothesis. If the findings in the literature are limited or ambiguous, write a non-directional (two-tailed) hypothesis.
  • Make it testable. Ensure your hypothesis can be tested through experimentation or observation, and that it is possible to prove it false (the principle of falsifiability).
  • Use clear and concise language. A strong hypothesis is concise (typically one to two sentences long) and formulated in clear, straightforward language, ensuring it is easily understood and testable.

Consider a hypothesis many teachers might subscribe to: students work better on Monday morning than on Friday afternoon (IV = day of the week, DV = standard of work).

Now, if we decide to study this by giving the same group of students a lesson on a Monday morning and a Friday afternoon and then measuring their immediate recall of the material covered in each session, we would end up with the following (a code sketch of the corresponding test follows the list):

  • The alternative hypothesis states that students will recall significantly more information on a Monday morning than on a Friday afternoon.
  • The null hypothesis states that there will be no significant difference in the amount recalled on a Monday morning compared to a Friday afternoon. Any difference will be due to chance or confounding factors.
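
As an illustration only, here is how that comparison might be run in Python, assuming invented recall scores for the same eight students measured in both sessions (a paired design):

    # Paired test for the Monday-vs-Friday example; the scores are invented.
    from scipy import stats

    monday = [18, 22, 17, 20, 24, 19, 21, 16]  # items recalled per student
    friday = [15, 21, 14, 19, 20, 17, 18, 15]

    # Alternative hypothesis (directional): recall is higher on Monday morning.
    # Null hypothesis: no difference beyond chance.
    t_stat, p_value = stats.ttest_rel(monday, friday, alternative='greater')
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")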

More Examples

  • Memory : Participants exposed to classical music during study sessions will recall more items from a list than those who studied in silence.
  • Social Psychology : Individuals who frequently engage in social media use will report higher levels of perceived social isolation compared to those who use it infrequently.
  • Developmental Psychology : Children who engage in regular imaginative play have better problem-solving skills than those who don’t.
  • Clinical Psychology : Cognitive-behavioral therapy will be more effective in reducing symptoms of anxiety over a 6-month period compared to traditional talk therapy.
  • Cognitive Psychology : Individuals who multitask between various electronic devices will have shorter attention spans on focused tasks than those who single-task.
  • Health Psychology : Patients who practice mindfulness meditation will experience lower levels of chronic pain compared to those who don’t meditate.
  • Organizational Psychology : Employees in open-plan offices will report higher levels of stress than those in private offices.
  • Behavioral Psychology : Rats rewarded with food after pressing a lever will press it more frequently than rats who receive no reward.

Confirmation

Human cognition and behavior rely heavily on the notion that evidence (data, premises) can affect the credibility of hypotheses (theories, conclusions). This general idea seems to underlie sound and effective inferential practices in all sorts of domains, from everyday reasoning up to the frontiers of science. Yet it is also clear that, even with extensive and truthful evidence available, drawing a mistaken conclusion is more than a mere possibility. For painfully tangible examples, one only has to consider missed medical diagnoses (see Winters et al. 2012) or judicial errors (see Liebman et al. 2000). The Scottish philosopher David Hume (1711–1776) is usually credited for having disclosed the theoretical roots of these considerations in a particularly transparent way (see Howson 2000, Lange 2011, and Varzi 2008). In most cases of interest, Hume pointed out, many alternative candidate hypotheses remain logically compatible with all the relevant information at one’s disposal, so that none of the former can be singled out by the latter with full certainty. Thus, under usual circumstances, reasoning from evidence must remain fallible.

This fundamental insight has been the source of a lasting theoretical challenge: if amenable to analysis, the role of evidence as supporting (or infirming) hypotheses has to be grasped by more nuanced tools than plain logical entailment. As emphasized in a joke attributed to American philosopher Morris Raphael Cohen (1880–1947), logic textbooks had to be divided into two parts: in the first part, on deductive logic, unwarranted forms of inference (deductive fallacies) are exposed; in the second part, on inductive logic, they are endorsed (see Meehl 1990, 110). In contemporary philosophy, confirmation theory can be roughly described as the area where efforts have been made to take up the challenge of defining plausible models of non-deductive reasoning. Its central technical term—confirmation—has often been used more or less interchangeably with “evidential support”, “inductive strength”, and the like. Here we will generally comply with this liberal usage (although more subtle conceptual and terminological distinctions are sometimes drawn).

Confirmation theory has proven a rather difficult endeavour. In principle, it would aim at providing understanding and guidance for tasks such as diagnosis, prediction, and learning in virtually any area of inquiry. Yet popular accounts of confirmation have often been taken to run into trouble even when faced with toy philosophical examples. Be that as it may, there is at least one real-world kind of activity which has remained a prevalent target and benchmark, i.e., scientific reasoning, and especially key episodes from the history of modern and contemporary natural science. The motivation for this is easily figured out. Mature sciences seem to have been uniquely effective in relying on observed evidence to establish extremely general, powerful, and sophisticated theories. Indeed, being capable of receiving genuine support from empirical evidence is itself a very distinctive trait of scientific hypotheses as compared to other kinds of statements. A philosophical characterization of what science is would then seem to require an understanding of the logic of confirmation. And so, traditionally, confirmation theory has come to be a central concern of philosophers of science.

In the following, major approaches to confirmation theory are overviewed according to a classification that is relatively standard (see Earman and Salmon 1992; Norton 2005): confirmation by instances (Section 1), hypothetico-deductivism and its variants (Section 2), and probabilistic (Bayesian) approaches (Section 3).

1. Confirmation by instances

In a seminal essay on induction, Jean Nicod (1924) offered the following important remark:

Consider the formula or the law: \(F\) entails \(G\). How can a particular proposition, or more briefly, a fact affect its probability? If this fact consists of the presence of \(G\) in a case of \(F\), it is favourable to the law […]; on the contrary, if it consists of the absence of \(G\) in a case of \(F\), it is unfavourable to this law. (219, notation slightly adapted)

Nicod’s work was an influential source for Carl Gustav Hempel’s (1943, 1945) early studies in the logic of confirmation. In Hempel’s view, the key valid message of Nicod’s statement is that the observation report that an object \(a\) displays properties \(F\) and \(G\) (e.g., that \(a\) is a swan and is white) confirms the universal hypothesis that all \(F\)-objects are \(G\)-objects (namely, that all swans are white). Apparently, it is by means of this kind of confirmation by instances that one can obtain supporting evidence for statements such as “sodium salts burn yellow”, “wolves live in a pack”, or “planets move in elliptical orbits” (also see Russell 1912, Ch. 6). We will now see the essential features of Hempel’s analysis of confirmation.

Hempel’s theory addresses the non-deductive relation of confirmation between evidence and hypothesis, but relies thoroughly on standard logic for its full technical formulation. As a consequence, it also goes beyond Nicod’s idea in terms of clarity and rigor.

Let \(\mathbf{L}\) be the set of the closed sentences of a first-order logical language \(L\) (finite, for simplicity) and consider \(h, e \in \mathbf{L}\). Also let \(e\), the evidence statement, be consistent and contain individual constants only (no quantifiers), and let \(I(e)\) be the set of all constants occurring (non-vacuously) in \(e\). So, for example, if \(e = Qa \wedge Ra\), then \(I(e) = \{a\}\), and if \(e = Qa \wedge Qb\), then \(I(e) = \{a,b\}\). (The non-vacuity clause is meant to ensure that if sentence \(e\) happens to be, say, \(Qa \wedge Qb \wedge (Rc \vee \neg Rc)\), then \(I(e)\) still is \(\{a, b\}\), for \(e\) does not really state anything non-trivial about the individual denoted by \(c\). See Sprenger 2011a, 241–242.) Hempel’s theory relies on the technical construct of the development of hypothesis \(h\) for evidence \(e\), or the \(e\)-development of \(h\), indicated by \(dev_{e}(h)\). Intuitively, \(dev_{e}(h)\) is all that (and only what) \(h\) says once restricted to the individuals mentioned (non-vacuously) in \(e\), i.e., exactly those denoted by the elements of \(I(e)\).

The notion of the \(e\)-development of hypothesis \(h\) can be given an entirely general and precise definition, but we’ll not need this level of detail here. Suffice it to say that the \(e\)-development of a universally quantified material conditional \(\forall x(Fx \rightarrow Gx)\) is just as expected, that is: \(Fa \rightarrow Ga\) in case \(I(e) = \{a\}\); \((Fa \rightarrow Ga) \wedge (Fb \rightarrow Gb)\) in case \(I(e) = \{a,b\}\), and so on. Following Hempel, we will take universally quantified material conditionals as canonical logical representations of relevant hypotheses. So, for instance, we will count a statement of the form \(\forall x(Fx \rightarrow Gx)\) as an adequate rendition of, say, “all pieces of copper conduct electricity”.

In Hempel’s theory, evidence statement \(e\) is said to confirm hypothesis \(h\) just in case it entails, not \(h\) in its full extension, but suitable instantiations of \(h\). The technical notion of the \(e\)-development of \(h\) is devised to identify precisely those relevant instantiations, that is, the consequences of \(h\) as restricted to the individuals involved in \(e\). More precisely, Hempelian confirmation can be defined as follows:

  • (i) evidence \(e\) directly Hempel-confirms hypothesis \(h\) if and only if \(e \vDash dev_{e}(h)\); \(e\) Hempel-confirms \(h\) if and only if, for some \(s \in \mathbf{L}\), \(e \vDash dev_{e}(s)\) and \(s \vDash h\);
  • (ii) evidence \(e\) directly Hempel-disconfirms hypothesis \(h\) if and only if \(e \vDash dev_{e}(\neg h)\); \(e\) Hempel-disconfirms \(h\) if and only if, for some \(s \in \mathbf{L}\), \(e \vDash dev_{e}(s)\) and \(s \vDash \neg h\);
  • (iii) evidence \(e\) is Hempel-neutral for hypothesis \(h\) otherwise.

In each of clauses (i) and (ii), Hempelian confirmation (disconfirmation, respectively) is a generalization of direct Hempelian confirmation (disconfirmation). To retrieve the latter as a special case of the former, one only has to posit \(s = h\) (\(\neg h\), respectively, for disconfirmation).

By direct Hempelian confirmation, evidence statement \(e\) that, say, object \(a\) is a white swan, \(swan(a) \wedge white(a)\), confirms hypothesis \(h\) that all swans are white, \(\forall x(swan(x) \rightarrow white(x))\), because the former entails the \(e\)-development of the latter, that is, \(swan(a) \rightarrow white(a)\). This is a desired result, according to Hempel’s reading of Nicod. By (indirect) Hempelian confirmation, moreover, \(swan(a) \wedge white(a)\) also confirms that a particular further object \(b\) will be white, if it’s a swan, i.e., \(swan(b) \rightarrow white(b)\) (to see this, just set \(s = \forall x(swan(x) \rightarrow white(x))\)).

The second possibility considered by Nicod (“the absence of \(G\) in a case of \(F\,\)”) can be accounted for by Hempelian disconfirmation. For the evidence statement \(e\) that \(a\) is a non-white swan—\(swan(a) \wedge \neg white(a)\)—entails (in fact, is identical to) the \(e\)-development of the hypothesis that there exist non-white swans—\(\exists x(swan(x) \wedge \neg white(x))\)—which in turn is just the negation of \(\forall x(swan(x) \rightarrow white(x))\). So the latter is disconfirmed by the evidence in this case. And finally, \(e = swan(a) \wedge \neg white(a)\) also Hempel-disconfirms that a particular further object \(b\) will be white, if it’s a swan, i.e., \(swan(b) \rightarrow white(b)\), because the negation of the latter, \(swan(b) \wedge \neg white(b)\), is entailed by \(s = \forall x(swan(x) \wedge \neg white(x))\) and \(e \vDash dev_{e}(s)\).

So, to sum up, we have four illustrations of how Hempel’s theory articulates Nicod’s basic ideas, to wit:

  • (the observation report of) a white swan (directly) Hempel-confirms that all swans are white;
  • (the observation report of) a white swan also Hempel-confirms that a further swan will be white;
  • (the observation report of) a non-white swan (directly) Hempel-disconfirms that all swans are white;
  • (the observation report of) a non-white swan also Hempel-disconfirms that a further swan will be white.

The ravens paradox (Hempel 1937, 1945). Consider the following statements:

\(h = \forall x(raven(x) \rightarrow black(x))\) (all ravens are black);

\(e = raven(a) \wedge black(a)\) (\(a\) is a black raven);

\(e^* = \neg black(a) \wedge \neg raven(a)\) (\(a\) is a non-black non-raven).

Is hypothesis \(h\) confirmed by \(e\) and \(e^*\) alike? That is, is the claim that all ravens are black equally confirmed by the observation of a black raven and by the observation of a non-black non-raven (e.g., a green apple)? One would want to say no, but Hempel’s theory is unable to draw this distinction. Let’s see why.

As we know, \(e\) (directly) Hempel-confirms \(h\), according to Hempel’s reconstruction of Nicod. By the same token, \(e^*\) (directly) Hempel-confirms the hypothesis that all non-black objects are non-ravens, i.e., \(h^* = \forall x(\neg black(x) \rightarrow \neg raven(x))\). But \(h^* \vDash h\) (\(h\) and \(h^*\) are just logically equivalent). So, \(e^*\) (the observation report of a non-black non-raven), like \(e\) (black raven), does (indirectly) Hempel-confirm \(h\) (all ravens are black). Indeed, as \(\neg raven(a)\) entails \(raven(a) \rightarrow black(a)\), it can be shown that \(h\) is (directly) Hempel-confirmed by the observation of any object that is not a raven (an apple, a cat, a shoe), apparently disclosing puzzling “prospects for indoor ornithology” (Goodman 1955, 71).
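
These verdicts can be checked mechanically. The following toy Python sketch (an illustration of the definitions above, not Hempel’s own formalism) handles direct Hempel-confirmation for hypotheses of the form \(\forall x(Fx \rightarrow Gx)\), with evidence restricted to a consistent conjunction of literals; the \(e\)-development is then just the conjunction of \(Fa \rightarrow Ga\) over the constants in \(I(e)\):

    # Toy check of direct Hempel-confirmation for h = "all F are G".
    # Evidence e is a conjunction of literals: (predicate, constant) -> True/False.

    def constants(e):
        """The individuals mentioned in e, i.e., I(e)."""
        return {c for (_, c) in e}

    def entails_conditional(e, F, G, a):
        """Does e entail F(a) -> G(a)?  Yes iff e states not-F(a) or states G(a)."""
        return e.get((F, a)) is False or e.get((G, a)) is True

    def hempel_verdict(e, F, G):
        """Classify e against h = 'for all x, F(x) -> G(x)'."""
        # Direct disconfirmation: e entails dev_e(not-h), i.e., e exhibits a
        # counterinstance F(a) & not-G(a) for some mentioned individual a.
        if any(e.get((F, a)) is True and e.get((G, a)) is False
               for a in constants(e)):
            return "disconfirms"
        # Direct confirmation: e entails dev_e(h), the conjunction of
        # F(a) -> G(a) over all mentioned individuals a.
        if all(entails_conditional(e, F, G, a) for a in constants(e)):
            return "confirms"
        return "neutral"

    # A white swan confirms "all swans are white"...
    print(hempel_verdict({("swan", "a"): True, ("white", "a"): True}, "swan", "white"))
    # ...a non-white swan disconfirms it...
    print(hempel_verdict({("swan", "a"): True, ("white", "a"): False}, "swan", "white"))
    # ...and, as in the ravens paradox, a non-black non-raven confirms "all ravens are black".
    print(hempel_verdict({("raven", "a"): False, ("black", "a"): False}, "raven", "black"))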

\(Blite\) (Goodman 1955). Consider the peculiar predicate “\(blite\)”, defined as follows: an object is blite just in case (i) it is black if examined at some moment \(t\) up to some future time \(T\) (say, the next expected appearance of Halley’s comet, in 2061) and (ii) it is white if examined afterwards. So we posit \(blite(x) \equiv (ex_{t\le T}(x) \rightarrow black(x)) \wedge (\neg ex_{t\le T}(x) \rightarrow white(x))\). Now consider the following statements:

\(h = \forall x(raven(x) \rightarrow black(x))\) (all ravens are black);

\(h^* = \forall x(raven(x) \rightarrow blite(x))\) (all ravens are blite);

\(e = raven(a) \wedge ex_{t\le T}(a) \wedge black(a)\) (\(a\) is a raven examined no later than \(T\) and found to be black).

Does \(e\) confirm hypotheses \(h\) and \(h^*\) alike? That is, does the observation of a black raven before \(T\) equally confirm the claim that all ravens are black and the claim that all ravens are blite? Here again, one would want to say no, but Hempel’s theory is unable to draw the distinction. For one can check that the \(e\)-developments of \(h\) and \(h^*\) are both entailed by \(e\). Thus, \(e\) (the report of a raven examined no later than \(T\) and found to be black) does Hempel-confirm \(h^*\) (all ravens are blite) just as it confirms \(h\) (all ravens are black). Moreover, \(e\) also Hempel-confirms the statement that a raven will be white if examined after \(T\), because this is a logical consequence of \(h^*\) (which is directly Hempel-confirmed by \(e\)). And finally, suppose that \(blurple(x) \equiv (ex_{t\le T}(x) \rightarrow black(x)) \wedge (\neg ex_{t\le T}(x) \rightarrow purple(x)).\) We then have that the very same evidence statement \(e\) Hempel-confirms the hypothesis that all ravens are blurple, and thus also its implication that a raven will be \(purple\) if examined after \(T\)!

A seemingly obvious idea, here, is that there must be something inherently wrong with predicates such as \(blite\) or \(blurple\) (and perhaps non-raven and non-black, too) and thus a principled way to rule them out as “unnatural”. Then one could restrict confirmation theory accordingly, i.e., to “natural kinds” only (see, e.g., Quine 1970). Yet this point turns out to be very difficult to pursue coherently and it has not borne much fruit in this discussion (Rinard 2014 is a recent exception). After all, for all we know, it is a perfectly “natural” feature of a token of the “natural kind” water that it is found in one physical state for temperatures below 0 degrees Celsius and in an entirely different state for temperatures above that threshold. So why should the time threshold \(T\) in \(blite\) or \(blurple\) be a reason to dismiss those predicates? (The water example comes from Howson 2000, 31–32. See Schwartz 2011, 399 ff., for a more general assessment of this issue.)

The above, widely known “paradoxes” then suggest that Hempel’s analysis of confirmation is too liberal: it sanctions the existence of confirmation relations that are intuitively very unsound (see Earman and Salmon 1992, 54, and Sprenger 2011a, 243, for more on this). Yet the Hempelian notion of confirmation turns out to be very restrictive, too, on other accounts. For suppose that hypothesis \(h\) and evidence \(e\) do not share any piece of non-logical vocabulary. \(h\) might be, say, Newton’s law of universal gravitation (connecting force, distances and masses), while \(e\) could be the description of certain spots on a telescopic image. Throughout modern physics, significant relations of confirmation and disconfirmation were taken to obtain between statements like these. Indeed, telescopic sightings have been crucial evidence for Newton’s law as applied to celestial bodies. However, as their non-logical vocabularies are disjoint, \(e\) and \(h\) must simply be logically independent, and so must be \(e\) and \(dev_{e}(h)\) (with very minor caveats, this follows from Craig’s so-called interpolation theorem, see Craig 1957). In such circumstances, there can be nothing but Hempel-neutrality between evidence and hypothesis. So Hempel’s original theory seems to lack the resources to capture a key feature of inductive inference in science as well as in several other domains, i.e., the kind of “vertical” relationships of confirmation (and disconfirmation) between the description of observed phenomena and hypotheses concerning underlying structures, causes, and processes.

To overcome the latter difficulty, Clark Glymour (1980a) embedded a refined version of Hempelian confirmation by instances in his analysis of scientific reasoning. In Glymour’s revision, hypothesis \(h\) is confirmed by some evidence \(e\) even if appropriate auxiliary hypotheses and assumptions must be involved for \(e\) to entail the relevant instances of \(h\). This important theoretical move turns confirmation into a three-place relation concerning the evidence, the target hypothesis, and (a conjunction of) auxiliaries. Originally, Glymour presented his sophisticated neo-Hempelian approach in stark contrast with the competing traditional view of so-called hypothetico-deductivism (HD). Despite his explicit intentions, however, several commentators have pointed out that, partly because of the due recognition of the role of auxiliary assumptions, Glymour’s proposal and HD end up being plagued by similar difficulties (see, e.g., Horwich 1983, Woodward 1983, and Worrall 1982). In the next section, we will discuss the HD framework for confirmation and also compare it with Hempelian confirmation. It will thus be convenient to have a suitable extended definition of the latter, following the remarks above. Here is one that serves our purposes:

  • (i) \(e\) directly Hempel-confirms \(h\) relative to \(k\) if and only if \(e\wedge k \vDash dev_{e}(h)\); \(e\) Hempel-confirms \(h\) relative to \(k\) if and only if, for some \(s \in \mathbf{L}\), \(e\wedge k \vDash dev_{e}(s)\) and \(s\wedge k \vDash h\);
  • (ii) \(e\) directly Hempel-disconfirms \(h\) relative to \(k\) if and only if \(e\wedge k \vDash dev_{e}(\neg h)\); \(e\) Hempel-disconfirms \(h\) relative to \(k\) if and only if, for some \(s \in \mathbf{L}\), \(e\wedge k \vDash dev_{e}(s)\) and \(s\wedge k \vDash \neg h\);
  • (iii) \(e\) is Hempel-neutral for \(h\) relative to \(k\) otherwise;

where \(k = dev_{e}(\alpha)\) for some closed and constant-free statement \(\alpha\) such that \(\alpha \not\vDash h\).

One can see that in the above definition the auxiliary assumptions in \(k\) are the \(e\)-development of further closed constant-free hypotheses (in fact, equations as applied to specific measured values, in typical examples from Glymour 1980a), where such hypotheses are meant to be conjoined in a single statement (\(\alpha\)) for convenience. This implies that the only terms occurring (non-vacuously) in \(k\) are individual constants already occurring (non-vacuously) in \(e\). For an empty \(\alpha\) (that is, tautologous: \(\alpha = \top\)), \(k\) must be empty too, and the original (restricted) definition of Hempelian confirmation applies. As for the proviso that \(\alpha \not\vDash h\), it rules out undesired cases of circularity—akin to so-called “macho” bootstrap confirmation, as discussed in Earman and Glymour 1988 (for more on Glymour’s theory and its developments, see Douven and Meijs 2006, and references therein).

2. Hypothetico-deductivism

The central idea of hypothetico-deductive (HD) confirmation can be roughly described as “deduction-in-reverse”: evidence is said to confirm a hypothesis in case the latter, while not entailed by the former, is able to entail it, with the help of suitable auxiliary hypotheses and assumptions. The basic version (sometimes labelled “naïve”) of the HD notion of confirmation can be spelled out thus:

  • (i) \(e\) HD-confirms \(h\) relative to \(k\) if and only if \(h\wedge k \vDash e\) and \(k \not\vDash e\);
  • (ii) \(e\) HD-disconfirms \(h\) relative to \(k\) if and only if \(h\wedge k \vDash \neg e\) and \(k \not\vDash \neg e\);
  • (iii) \(e\) is HD-neutral for hypothesis \(h\) relative to \(k\) otherwise.

Note that clause (ii) above represents HD-disconfirmation as plain logical inconsistency of the target hypothesis with the data (given the auxiliaries) (see Hempel 1945, 98).
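
For a concrete check of these clauses, here is a toy propositional sketch in Python (my own illustration; entailment is decided by brute-force enumeration of truth assignments, and the swan example and atom names are assumptions of the sketch):

    # Toy propositional check of the HD clauses: e HD-confirms h relative to k
    # iff h & k |= e while k alone does not |= e (and dually for disconfirmation).
    from itertools import product

    ATOMS = ["swan_a", "black_a"]

    def entails(premise, conclusion):
        """premise |= conclusion, checked over all truth assignments."""
        for values in product([True, False], repeat=len(ATOMS)):
            v = dict(zip(ATOMS, values))
            if premise(v) and not conclusion(v):
                return False
        return True

    def hd_verdict(h, e, k):
        h_and_k = lambda v: h(v) and k(v)
        not_e = lambda v: not e(v)
        if entails(h_and_k, e) and not entails(k, e):
            return "HD-confirms"
        if entails(h_and_k, not_e) and not entails(k, not_e):
            return "HD-disconfirms"
        return "HD-neutral"

    # h: "all swans are black", restricted to a one-individual toy domain.
    h = lambda v: (not v["swan_a"]) or v["black_a"]
    k = lambda v: v["swan_a"]           # auxiliary: a was sampled from swans
    e = lambda v: v["black_a"]          # observation: a is black

    print(hd_verdict(h, e, k))                           # HD-confirms
    print(hd_verdict(h, lambda v: not v["black_a"], k))  # HD-disconfirms
    print(hd_verdict(h, e, lambda v: True))              # HD-neutral for empty k

The last line shows that, with an empty \(k\), the same observation is HD-neutral, anticipating the reading of Nicod’s insight discussed below.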

HD-confirmation and Hempelian confirmation convey different intuitions (see Huber 2008a for an original analysis). They are, in fact, distinct and strictly incompatible notions. This will be effectively illustrated by the consideration of the following conditions.

Entailment condition (EC) For any \(h, e, k \in \mathbf{L}\), if \(e\wedge k\) is consistent, \(e\wedge k \vDash h\) and \(k \not\vDash h\), then \(e\) confirms \(h\) relative to \(k\).

Confirmation complementarity (CC) For any \(h, e, k \in \mathbf{L}\), \(e\) confirms \(h\) relative to \(k\) if and only if \(e\) disconfirms \(\neg h\) relative to \(k\).

Special consequence condition (SCC) For any \(h, e, k \in \mathbf{L}\), if \(e\) confirms \(h\) relative to \(k\) and \(h\wedge k \vDash h^*\), then \(e\) confirms \(h^*\) relative to \(k\).

On the implicit proviso that \(k\) is empty (that is, tautologous: \(k = \top\)), Hempel (1943, 1945) himself had put forward (EC) and (SCC) as compelling adequacy conditions for any theory of confirmation, and devised his own proposal accordingly. As for (CC), he took it as a plain definitional truth (1943, 127). Moreover, Hempelian confirmation (extended) satisfies all conditions above (of course, for arguments \(h\), \(e\) and \(k\) for which it is defined). HD-confirmation, on the contrary, violates all of them. Let us briefly discuss each one in turn.

It is rather common for a theory of ampliative (non-deductive) reasoning to retain classical logical entailment as a special case (a feature sometimes called “super-classicality”; see Strasser and Antonelli 2019). That’s essentially what (EC) implies for confirmation. Now given appropriate \(e\), \(h\) and \(k\), if \(e\wedge k\) entails \(h\), we readily get that \(e\) Hempel-confirms \(h\) relative to \(k\) in two simple steps. First, given that \(e\) and \(k\) are both quantifier-free, \(dev_{e}(e\wedge k) = e\wedge k\) according to Hempel’s full definition of \(dev\) (see Hempel 1943, 131). Then it trivially follows that \(e\wedge k \vDash dev_{e}(e\wedge k)\), so \(e\wedge k\) is (directly) Hempel-confirmed and its logical consequence \(h\) is likewise confirmed (indirectly). Logical entailment is thus retained as an instance of Hempelian confirmation in a fairly straightforward way. HD-confirmation, on the contrary, does not fulfil (EC). Here is one odd example (see Sprenger 2011a, 234). With \(k = \top\), just let \(e\) be the observation report that object \(a\) is a black swan, \(swan(a) \wedge black(a)\), and \(h\) be the hypothesis that black swans exist, \(\exists x(swan(x) \wedge black(x))\). Evidence \(e\) verifies \(h\) conclusively, and yet it does not HD-confirm it, simply because \(h \not\vDash e\). So the observation of a black swan turns out to be HD-neutral for the hypothesis that black swans exist! The same example shows how HD-confirmation violates (CC), too. In fact, while HD-neutral for \(h\), \(e\) HD-disconfirms its negation \(\neg h\) that no swan is black, \(\forall x(swan(x) \rightarrow \neg black(x))\), because the latter is obviously inconsistent with (refuted by) \(e\).

The violation of (EC) and (CC) by HD-confirmation is arguably a reason for concern, for those conditions seem highly plausible. The special consequence condition (SCC), on the other hand, deserves separate and careful consideration. As we will see later on, (SCC) is a strong constraint, and far from sacrosanct. For now, let us point out one major philosophical motivation in its favor. (SCC) has often been invoked as a means to ensure the fulfilment of the following condition (see, e.g., Hesse 1975, 88; Horwich 1983, 57):

Predictive inference condition (PIC) For any \(e, k \in \mathbf{L}\), if \(e\) confirms \(\forall x(Fx \rightarrow Gx)\) relative to \(k\), then \(e\) confirms \(Fa \rightarrow Ga\) relative to \(k\).

In fact, (PIC) readily follows from (SCC) and so it is satisfied by Hempel’s theory. It says that, if evidence \(e\) confirms “all \(F\)s are \(G\)s”, then it also confirms that a further object will be \(G\), if it is \(F\). Notably, this does not hold for HD-confirmation. Here is why. Given \(k = Fa\) (i.e., the assumption that \(a\) comes from the \(F\) population), we have that \(e = Ga\) HD-confirms \(h = \forall x(Fx \rightarrow Gx)\), because the latter entails the former (given \(k\)). (That’s the HD reconstruction of Nicod’s insight, see below.) We also have, of course, that \(h\) entails \(h^* = Fb \rightarrow Gb\). And yet, contrary to (PIC), since \(h^*\) does not entail \(e\) (given \(k\)), it is not HD-confirmed by it either. The troubling conclusion is that the observation that a swan is white (or that a million of them are, for that matter) does not HD-confirm the prediction that a further swan will be found to be white.

One attractive feature of HD-confirmation is that it largely eludes the ravens paradox. As the hypothesis \(h\) that all ravens are black does not entail that some generally sampled object \(a\) will be a black raven, the HD view of confirmation is not committed to the eminently Hempelian implication that \(e = raven(a) \wedge black(a)\) confirms \(h\). Likewise, \(\neg black(a) \wedge \neg raven(a)\) does not HD-confirm that all non-black objects are non-ravens. The derivation of the paradox, as presented above, is thus blocked.

Indeed, HD-confirmation yields a substantially different reading of Nicod’s insight as compared to Hempel’s theory (Okasha 2011 has an important discussion of this distinction). Here is how it goes. If object \(a\) is assumed to have been taken among ravens —so that, crucially, the auxiliary assumption \(k = raven(a)\) is made—and \(a\) is checked for color and found to be black, then, yes, the latter evidence, \(black(a)\), HD-confirms that all ravens are black \((h)\) relative to \(k\). By the same token, \(\neg black(a)\) HD-disconfirms \(h\) relative to the same assumption \(k = raven(a)\). And, again, this is as it should be, in line with Nicod’s mention of “the absence of \(G\) [here, non-black as evidence] in a case of \(F\) [here, raven as an auxiliary assumption]”. It is also true that an object that is found not to be a raven HD-confirms \(h\), but only relative to \(k = \neg black(a)\), that is, if \(a\) is assumed to have been taken among non-black objects to begin with; and this seems acceptable too (after all, while sampling from non-black objects, one might have found the counterinstance of a raven, but didn’t). Unlike Hempel’s theory, moreover, HD-confirmation does not yield the debatable implication that, by itself (that is, given \(k = \top\)), the observation of a non-raven \(a\), \(\neg raven(a)\), must confirm \(h\).

Interestingly, the introduction of auxiliary hypotheses and assumptions shows that the issues surrounding Nicod’s remarks can become surprisingly subtle. Consider the following statements (Maher’s 2006 example):

\(\alpha_1 = \forall x \neg(white(x) \wedge black(x))\);

\(\alpha_2 = \exists x(swan(x)) \rightarrow \exists x(swan(x) \wedge black(x))\).

\(\alpha_1\) simply specifies that no object is both white and black, while \(\alpha_2\) says that, if there are swans at all, then there also is some black swan. Also posit, again, \(e = swan(a) \wedge white(a)\). Under \(\alpha_1\) and \(\alpha_2\), the observation of a white swan clearly disconfirms (indeed, refutes) the hypothesis \(h\) that all swans are white. Hempel’s theory (extended) faces difficulties here, because for \(k = dev_{e}(\alpha_1 \wedge \alpha_2)\) it turns out that \(e\wedge k\) is inconsistent. But HD-confirmation gets this case right, thus capturing appropriate boundary conditions for Nicod’s generally sensible claims. For, with \(k = \alpha_1 \wedge \alpha_2\), one has that \(h\wedge k\) is consistent and entails \(\neg e\) (for it entails that no swan exists), so that \(e\) HD-disconfirms (refutes) \(h\) relative to \(k\) (see Good 1967 for another famous counterexample to Nicod’s condition).

HD-confirmation, however, is also known to suffer from distinctive “paradoxical” implications. One of the most frustrating is surely the following (see Osherson, Smith, and Shafir 1986, 206, for further specific problems).

The irrelevant conjunction paradox. Suppose that \(e\) confirms \(h\) relative to (possibly empty) \(k\). Let statement \(q\) be logically consistent with \(e\wedge h\wedge k\), but otherwise entirely irrelevant for all of those conjuncts (perhaps belonging to a completely separate domain of inquiry). Does \(e\) confirm \(h\wedge q\) (relative to \(k\)) as it does with \(h\)? One would want to say no, and this implication can be suitably reconstructed in Hempel’s theory. HD-confirmation, on the contrary, cannot draw this distinction: it is easy to show that, on the conditions specified, if the HD clause for confirmation is satisfied for \(e\) and \(h\) (given \(k\)), so it is for \(e\) and \(h\wedge q\) (given \(k\)). (This is simply because, if \(h\wedge k \vDash e\), then \(h\wedge q\wedge k \vDash e\), too, by the monotonicity of classical logical entailment.)

Kuipers (2000, 25) suggested that one can live with the irrelevant conjunction problem because, on the conditions specified, \(e\) would still not HD-confirm \(q\) alone (given \(k\)), so that HD-confirmation can be “localized”: \(h\) is the only bit of the conjunction \(h\wedge q\) that gets any confirmation on its own, as it were. Other authors have been reluctant to bite the bullet and have engaged in technical refinements of the “naïve” HD view. In these proposals, the spread of HD-confirmation upon frivolous conjunctions can be blocked at the expense of some additional logical machinery (see Gemes 1993, 1998; Schurz 1991, 1994).

Finally, it should be noted that HD-confirmation offers no substantial relief from the blite paradox. On the one hand, \(e = raven(a) \wedge ex_{t\le T}(a) \wedge black(a)\) does not, as such, HD-confirm either \(h = \forall x(raven(x) \rightarrow black(x))\) or \(h^* = \forall x(raven(x) \rightarrow blite(x))\), that is, for empty \(k\). On the other hand, if object \(a\) is assumed to have been sampled from ravens before \(T\) (that is, given \(k = raven(a) \wedge ex_{t\le T}(a)\)), then \(black(a)\) is entailed by both “all ravens are black” and “all ravens are blite” and therefore HD-confirms each of them. So HD-confirmation, too, sanctions the existence of confirmation relations that seem intuitively unsound (indeed, indefinitely many of them: as we know, other variations of \(h^*\) can be conceived at will, like the “blurple” hypothesis). One could insist that HD does handle the blite paradox after all, because \(black(a)\) (given \(k\) as above) does not HD-confirm that a raven will be white if examined after \(T\) (Kuipers 2000, 29 ff.). Unfortunately (as pointed out by Schurz 2005, 148) \(black(a)\) does not HD-confirm that a raven will be black if examined after \(T\) either (again, given \(k\) as above). That’s because, as already pointed out, HD-confirmation fails the predictive inference condition (PIC) in general. So, all in all, HD-confirmation cannot tell black from blite any more than Hempel-confirmation can.

The issues above look contrived and artificial to some people’s taste—even among philosophers. Many have suggested a closer look at real-world inferential practices in the sciences as a more appropriate benchmark for assessment. For one thing, the very idea of hypothetico-deductivism has often been said to stem from the origins of Western science. As reported by Simplicius of Cilicia (sixth century A.D.) in his commentary on Aristotle’s De Caelo , Plato had challenged his pupils to identify combinations of “ordered” motions by which one could account for (namely, deduce) the planets’ wandering trajectories across the heavens as observed by the Earth. As a matter of historical fact, mathematical astronomy has engaged in just this task for centuries: scholars have been trying to define geometrical models from which the apparent motion of celestial bodies would derive.

It is fair to say that, at its roots, the kind of challenges that the HD framework faces with scientific reasoning is not so different from the main puzzles that arise from philosophical considerations of a more formal kind. Still, the two areas turn out to be complementary in important ways. The following statement will serve as a useful starting point to extend the scope of our discussion.

Underdetermination Theorem (UT) for “naïve” HD-confirmation For any contingent \(h, e \in \mathbf{L}\), if \(h\) and \(e\) are logically consistent, there exists some \(k \in \mathbf{L}\) such that \(e\) HD-confirms \(h\) relative to \(k\).

(UT) is an elementary logical fact that has been long recognized (see, e.g., Glymour 1980a, 36). In purely formal terms, just positing \(k = h \rightarrow e\) will do for a proof. To appreciate how (UT) can spark any philosophical interest, one has to combine it with some insightful remarks first put forward by Pierre Duhem (1906) and then famously revived by Quine (1951) in a more radical style. (Indeed, (UT) essentially amounts to the “entailment version” of “Quinean underdetermination” in Laudan 1990, 274.)
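
Spelling the check out, on the further assumption (satisfied in typical cases) that at least one model falsifies both \(h\) and \(e\):

    \(h \wedge k = h \wedge (h \rightarrow e) \vDash e\)   (by modus ponens; \(h \wedge k\) is also consistent, since \(h \wedge e\) is)
    \(k = h \rightarrow e \not\vDash e\)   (any model falsifying both \(h\) and \(e\) satisfies \(k\) while falsifying \(e\))

so both clauses of naïve HD-confirmation are met.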

Duhem (he himself a supporter of the HD view) pointed out that in mature sciences such as physics most hypotheses or theories of real interest cannot be contradicted by any statement describing observable states of affairs. Taken in isolation, they simply do not logically imply, nor rule out, any observable fact, essentially because (unlike “all ravens are black”) they involve the mention of unobservable entities and processes. So, in effect, Duhem emphasized that, typically, scientific hypotheses or theories are logically consistent with any piece of checkable evidence. Unless, of course, the logical connection is underpinned by auxiliary hypotheses and assumptions suitably bridging the gap between the observational and non-observational vocabulary, as it were. But then, once auxiliaries are in play, logic alone guarantees that some \(k\) exists such that \(h\wedge k\) is consistent, \(h\wedge k \vDash e\), and \(k \not\vDash e\), so that confirmation holds in naïve HD terms (that’s just the UT result above). Apparently, when Duhem’s point applies, the uncritical supporter of whatever hypothesis \(h\) can legitimately claim (naïve HD) confirmation from any \(e\) by simply shaping \(k\) conveniently. In this sense, hypothesis assessment would be radically “underdetermined” by any amount of evidence practically available.

Influential authors such as Thomas Kuhn (1962/1970) (but see Laudan 1990, 268, for a more extensive survey) relied on Duhemian insights to suggest that confirmation by empirical evidence is too weak a force to drive the evaluation of theories in science, often inviting conclusions of a relativistic flavor (see Worrall 1996 for an illuminating reconstruction along these lines). Let us briefly consider a classical case, which Duhem himself thoroughly analyzed: the wave vs. particle theories of light in modern optics. Across the decades, wave theorists were able to deduce an impressive list of important empirical facts from their main hypothesis along with appropriate auxiliaries, diffraction phenomena being only one major example. But many particle theorists’ reaction was to retain their hypothesis nonetheless and to reshape other parts of the “theoretical maze” (i.e., \(k\); the term is Popper’s, 1963, p. 330) to recover those observed facts as consequences of their own proposal. And as we’ve seen, if the bare logic of naïve HD was to be taken strictly, surely they could have claimed their overall hypothesis to be confirmed too, just as much as their opponents.

Importantly, they didn’t. In fact, it was quite clear that particle theorists, unlike their wave-theory opponents, were striving to remedy weaknesses rather than scoring successes (see Worrall 1990). But why, then? Because, as Duhem himself clearly realized, the logic of naïve HD “is not the only rule for our judgments” (1906, 217). The lesson of (UT) and the Duhemian insight is not quite, it seems, that naïve HD is the last word and scientific inference is unconstrained by stringent rational principles, but rather that the HD view has to be strengthened in order to capture the real nature of evidential support in rational scientific inference. At least, that’s the position of a good deal of philosophers of science working within the HD framework broadly construed. It has even been maintained that “no serious twentieth-century methodologist” has ever subscribed to the naïve HD view above “without crucial qualifications” (Laudan 1990, 278; also see Laudan and Leplin 1991, 466).

So the HD approach to confirmation has yielded a number of more articulated variants to meet the challenge of underdetermination. Following (loosely) Norton (2005), we will now survey an instructive sample of them.

Naïve HD can be enriched by a resolute form of predictivism. According to this approach, the naïve HD clause for confirmation is too weak because \(e\) must have been predicted in advance from \(h\wedge k\). Karl Popper’s (1934/1959) account of the “corroboration” of hypotheses famously embedded this view, but squarely predictivist stances can be traced back to early modern thinkers like Christiaan Huygens (1629–1695) and Gottfried Wilhelm Leibniz (1646–1716), and to Duhem’s work itself. The predictivist sets a high bar for confirmation. Her favorite examples typically include stunning episodes in which the existence of previously unknown objects, phenomena, or whole classes of them is anticipated: the phases of Venus for Copernican astronomy or the discovery of Neptune for Newtonian physics, all the way up to the Higgs boson for the so-called standard model of subatomic particles.

The predictivist solution to the underdetermination problem is fairly radical: many of the relevant factual consequences of \(h\wedge k\) will be already known when this theory is articulated, and so unfit for confirmation. Critics have objected that predictivism is in fact far too restrictive. There seem to be many cases in which already known phenomena clearly do provide support to a new hypothesis or theory. Zahar (1973) first raised this problem of “old evidence”, then made famous by Glymour (1980a, 85 ff.) as a difficulty for Bayesianism (see Section 3 below). Examples of this kind abound in the history of science as elsewhere, but the textbook illustration has become the precession of Mercury’s perihelion, a lasting anomaly for Newtonian physics: Einstein’s general relativity calculations got this long-known fact right, thereby gaining a remarkable piece of initial support for the new theory. In addition to this problem with old evidence, HD predictivism also seems to lack a principled rationale. After all, the temporal order of the discovery of \(e\) and of the articulation of \(h\) and \(k\) may well be an entirely accidental historical contingency. Why should it bear on the confirmation relationship among them? (See Giere 1983 and Musgrave 1974 for classical discussions of these issues. Douglas and Magnus 2013 and Barnes 2018 offer more recent views and rich lists of further references.)

As a possible response to the difficulties above, naïve HD can be enriched by the use-novelty criterion (UN) instead. The UN reaction to the underdetermination problem is more conservative than the temporal predictivist strategy. According to this view, to improve on the weak naïve HD clause for confirmation one only has to rule out one particular class of cases, i.e., those in which the description of a known fact, \(e\), served as a constraint in the construction of \(h\wedge k\). The UN view thus comes equipped with a rationale. If \(h\wedge k\) was shaped on the basis of \(e\), UN advocates point out, then it was bound to get that state of affairs right; the theory never ran any risk of failure, thus did not achieve any particularly significant success either. Precisely in these cases, and just for this reason, the evidence \(e\) must not be double-counted: once it has been used in the construction of the theory, its confirmational power is “dried out”, so to speak.

The UN completion of naïve HD originated from Lakatos and some of his collaborators (see Lakatos and Zahar 1975 and Worrall 1978; also see Giere 1979, 161–162, and Gillies 1989 for similar views), although important hints in the same direction can be found at least in the work of William Whewell (1840/1847). Consider the touchstone example of Mercury again. According to Zahar (1973), Einstein did not need to rely on the Mercury data to define theory and auxiliaries so as to match observationally correct values for the perihelion precession (also see Norton 2011a; and Earman and Janssen 1993 for a very detailed, and more nuanced, account). Being already known, the fact was not of course predicted in a strictly temporal sense, and yet, on Zahar’s reading, it could have been: it was “use-novel” and thus fresh for use to confirm the theory. For a more mundane illustration, so-called cross-validation techniques represent a routine application of the UN idea in statistical settings (as pointed out by Schurz 2014, 92; also see Forster 2007, 592 ff.). According to some commentators, however, the UN criterion needs further elaboration (see Hitchcock and Sober 2004 and Lipton 2005), while others have criticized it as essentially wrong-headed (see Howson 1990 and Mayo 1991, 2014; also see Votsis 2014).

Yet another way to enrich naïve HD is to combine it with eliminativism. According to this view, the naïve HD clause for confirmation is too weak because, in addition, there must have been a low (enough) objective chance of getting the outcome \(e\) (favorable to \(h\)) if \(h\) was false, so that there remain few ways in which \(e\) may have occurred for some reason other than the truth of \(h\). Briefly put, the occurrence of \(e\) must be such that most alternatives to \(h\) can be safely ruled out. The founding figure of eliminativism is Francis Bacon (1561–1626). John Stuart Mill (1843/1872) is a major representative in later times, and Deborah Mayo’s “error-statistical” approach to hypothesis testing arguably develops this tradition (Mayo 1996 and Mayo and Spanos 2010; see Bird 2010, Kitcher 1993, 219 ff., and Meehl 1990 for other contemporary variations).

Eliminativism is most credible when experimentation is at issue (see, e.g., Guala 2012). Indeed, the appeal to Bacon’s idea of a crucial experiment (instantia crucis) and related notions (e.g., “severe testing”) is a fairly reliable mark of eliminativist inclinations. Experimentation is, to a large extent, precisely an array of techniques to keep undesired interfering factors at a minimum by active manipulation and deliberate control (think of the blinding procedure in medical trials, with \(h\) the hypothesized effectiveness of a novel treatment and \(e\) a relative improvement in clinical endpoints for a target subsample of patients thus treated). When this kind of control obtains, popular statistical tools are supposed to allow for the calculation of the probability of \(e\) in case \(h\) is false, meant as a “relative frequency in a (real or hypothetical) series of test applications” (Mayo 1991, 529), and to secure a sufficiently low value to validate the positive outcome of the test. It is much less clear how firm a grip this approach can retain when inference takes place at higher levels of generality and theoretical commitment, where the hypothesis space is typically much too poorly ordered to fit routine error-statistical analyses. Indeed, Laudan (1997, 315; also see Musgrave 2010) spotted in this approach the risk of a “balkanization” of scientific reasoning, namely, a restricted focus on scattered pieces of experimental inference (but see Mayo 2010 for a defense).

Naïve HD can also be enriched by the notion of simplicity. According to this view, the naïve HD clause for confirmation is too weak because \(h\wedge k\) must be a simple (enough), unified way to account for evidence \(e\). A classic reference for the simplicity view is Newton’s first rule of philosophizing in the Principia (“admit no more causes of natural things than such as are both true and sufficient to explain their appearances”), echoing very closely Ockham’s razor. This basic idea has never lost its appeal—even up to recent times (see, e.g., Quine and Ullian 1970, 69 ff.; Sober 1975; Zellner, Keuzenkamp, and McAleer 2002; Scorzato 2013).

Despite Thomas Kuhn’s (1957, 181) suggestions to the contrary, the success of Copernican astronomy over Ptolemy’s system has remained an influential case study fostering the simplicity view (Martens 2009). Moreover, in ordinary scientific problems such as curve fitting, formal criteria of model selection are applied where the paucity of parameters can be interpreted naturally as a key dimension of simplicity (Forster and Sober 1994). Traditionally, two main problems have proven pressing, and frustrating, for the simplicity approach. First, how to provide a sufficiently coherent and illuminating explication of this multifaceted and elusive notion (see Riesch 2010); and second, how to justify the role of simplicity as a properly epistemic (rather than merely pragmatic) virtue (see Kelly 2007, 2008).

Finally, naïve HD can be enriched by the appeal to explanation. Here, the naïve HD clause for confirmation is meant to be too weak because \(h\wedge k\) must be able (not only to entail, but) to explain \(e\). By this move, the HD approach embeds the slogan of the so-called inference to the best explanation view: “observations support the hypothesis precisely because it would explain them” (Lipton 2000, 185; also see Lipton 2004). Historically, the main source for this connection between explanation and support is found in the work of Charles Sanders Peirce (1839–1914). Janssen (2003) offers a particularly neat contemporary exhibit, explicitly aimed at “curing cases of the Duhem-Quine disease” (484; also see Thagard 1978, and Douven 2017 for a relevant survey). Quite unlike eliminativist approaches, explanationist analyses tend to focus on large-scale theories and relatively high-level kinds of evidence. Dealing with Einstein’s general relativity, for instance, Janssen (2003) greatly emphasizes its explanation of the equivalence of inertial and gravitational mass (essentially a brute fact in Newtonian physics) over the resolution of the puzzle of Mercury’s perihelion. Explanationist accounts are also distinctively well-equipped to address inference patterns from non-experimental sciences (Cleland 2011).

The problems faced by these approaches are similar to those affecting the simplicity view. Agreement is still lacking on the nature of scientific explanation (see Woodward 2019) and it is not clear how far an explanationist variant of HD can go without a sound analysis of that notion. Moreover, some critics have wondered why the relationship of confirmation should be affected by an explanatory connection with the evidence per se (see Salmon 2001).

The above discussion is not an exhaustive survey (nor are the listed options mutually exclusive, for that matter: see, e.g., Baker 2003; also see Worrall 2010 for some overlapping implications in an applied setting of real practical value). And our sketched presentation hardly allows for any conclusive assessment. It does suggest, however, that reports of the death of hypothetico-deductivism (see Earman 1992, 64, and Glymour 1980b) might have been exaggerated. For all its difficulties, HD has proven fairly resilient, at least as a basic framework to elucidate some key aspects of how hypotheses can be confirmed by the evidence (see Betz 2013, Gemes 2005, and Sprenger 2011b for consonant points of view).

3. Bayesian confirmation theories

Bayes’s theorem is a central element of the probability calculus (see Joyce 2019). For historical reasons, Bayesian has become a standard label for a range of approaches and positions sharing the common idea that probability (in its modern, mathematical sense) plays a crucial role in rational belief, inference, and behavior. According to Bayesian epistemologists and philosophers of science, (i) rational agents have credences differing in strength, which moreover (ii) satisfy the probability axioms, and can thus be represented in probabilistic form. (In non-Bayesian models (ii) is rejected, but (i) may well be retained: see Huber and Schmidt-Petri 2009, Levi 2008, and Spohn 2012.) Well-known arguments exist in favor of this position (see, e.g., Easwaran 2011a; Pettigrew 2016; Skyrms 1987; Vineberg 2016), although there is no lack of difficulties and criticism (see, e.g., Easwaran 2011b; Hájek 2008; Kelly and Glymour 2004; Norton 2011b).

Beyond the core ideas above, however, the theoretical landscape of Bayesianism is as hopelessly diverse as it is fertile. Surveys and state-of-the-art presentations are already numerous, and ostensibly growing (see, e.g., Good 1971; Joyce 2011; Oaksford and Chater 2007; Sprenger and Hartmann 2020; Weisberg 2015). For the present purposes, attention can be restricted to a classification that is still fairly coarse-grained, and based on just two dimensions or criteria.

First, there is a distinction between permissivism and impermissivism (see Meacham 2014 and Kopec and Titelbaum 2016 for this terminology). For permissive Bayesians (often otherwise labelled “subjectivists”), accordance with the probability axioms is the only clear-cut constraint on the credences of a rational agent. In impermissive forms of Bayesianism (often otherwise called “objective”), further constraints are put forward that significantly restrict the range of rational credences, possibly up to one single “right” probability function in any given setting. Second, there are different attitudes towards the so-called principle of total evidence (TE) for the credences on which a reasoner relies. TE Bayesians maintain that the relevant credences should be represented by a probability function \(P\) which conveys the totality of what is known to the agent. For non-TE approaches, depending on the circumstances, \(P\) may (or should) be set up so that portions of the evidence available are in fact bracketed. (Unsurprisingly, further subtleties arise as soon as one delves a bit further into the precise meaning and scope of TE; see Fitelson 2008 and Williamson 2002, Chs. 9–10, for important discussions.)

Of course, many intermediate positions exist between the extreme forms of permissivism and impermissivism just outlined, and more or less the same applies to the TE issue. The above distinctions are surely rough enough, but useful nonetheless. Impermissive TE Bayesianism served as the received view in early Bayesian philosophy of science (e.g., Carnap 1950/1962). But impermissivism is easily found in combination with non-TE positions, too (see, e.g., Maher 1996). TE permissivism seems a good approximation of De Finetti’s (2008) stance, while non-TE permissivism is arguably close to a standard view nowadays (see, e.g., Howson and Urbach 2006). No more than this will be needed to begin our exploration of Bayesian confirmation theories.

Let us posit a set \(\bP\) of probability functions representing possible states of belief about a domain that is described in a finite language \(L\) with \(\bL\) the set of its closed sentences. From now on, unless otherwise specified, whenever considering some \(h, e, k \in \bL\) and \(P \in \bP\), we will invariably rely on the following provisos:

  • both \(e\wedge k\) and \(h\wedge k\) are consistent;
  • \(P(e\wedge k), P(h\wedge k) \gt 0;\)
  • \(P(k) \gt P(h\wedge k)\) (unless \(k \vDash h\));
  • \(P(e\wedge k) \gt P(e\wedge h\wedge k)\) (unless \(e\wedge k \vDash h\)); and
  • \(P(e\wedge h\wedge k) \gt 0\), as long as \(e\wedge h\wedge k\) is consistent.

(These assumptions are convenient and critical for technical reasons, but not entirely innocent. Festa 1999 and Kuipers 2000, 44 ff., discuss some limiting cases that are left aside here owing to these constraints.)

A probabilistic theory of confirmation can be spelled out through the definition of a function \(C_{P}(h, e\mid k): \{\bL^3 \times \bP\} \rightarrow \Re\) representing the degree of confirmation that hypothesis \(h\) receives from evidence \(e\) relative to \(k\) and probability function \(P\). \(C_{P}(h,e\mid k)\) will then have relevant probabilities as its building blocks, according to the following basic postulate of probabilistic confirmation:

(P0) Formality There exists a function \(g\) such that, for any \(h, e, k \in \bL\) and any \(P \in \bP\), \(C_{P}(h,e\mid k) = g[P(h\wedge e\mid k),P(h\mid k),P(e\mid k)]\).

Note that the probability distribution over the algebra generated by \(h\) and \(e\), conditional on \(k\), is entirely determined by \(P(h\wedge e\mid k)\), \(P(h\mid k)\) and \(P(e\mid k)\). Hence, (P0) simply states that \(C_{P}(h, e\mid k)\) depends on that distribution, and nothing else. (The label for this assumption is taken from Tentori, Crupi, and Osherson 2007, 2010.)

Hempelian and HD confirmation, as discussed above, are qualitative theories of confirmation. They only tell us whether evidence \(e\) confirms (disconfirms) hypothesis \(h\) given \(k\). However, assessments of the amount of support that some evidence brings to a hypothesis are commonly involved in scientific reasoning, as well as in other domains, if only in the form of comparative judgments such as “hypothesis \(h\) is more strongly confirmed by \(e_{1}\) than by \(e_{2}\)” or “\(e\) confirms \(h_{1}\) to a greater extent than \(h_{2}\)”. Consider, for instance, the following principle, a veritable cornerstone of probabilistic confirmation in all of its variations (see Crupi, Chater, and Tentori 2013 for a list of references):

(P1) Final probability For any \(h,e_{1},e_{2},k \in \bL\) and any \(P \in \bP\), \(C_{P}(h,e_{1}\mid k) \gtreqless C_{P}(h, e_{2}\mid k)\) if and only if \(P(h\mid e_{1} \wedge k) \gtreqless P(h\mid e_{2} \wedge k).\)

(P1) is itself a comparative, or ordinal, principle, stating that, for any fixed hypothesis \(h\), the final (or posterior) probability and confirmation always move in the same direction in the light of data \(e\) (given \(k\)). Interestingly, (P0) and (P1) are already sufficient to single out one traditional class of measures of probabilistic confirmation, if conjoined with the following (see Crupi and Tentori 2016, 656, Schippers 2017, and also Törnebohm 1966, 81):

(P2) Local equivalence For any \(h_{1},h_{2},e,k \in \bL\) and any \(P\in \bP\), if \(h_{1}\) and \(h_{2}\) are logically equivalent given \(e\) and \(k\), then \(C_{P}(h_{1},e\mid k) = C_{P}(h_{2}, e\mid k).\)

The following can then be shown:

Theorem 1 (P0), (P1) and (P2) hold if and only if there exists a strictly increasing function \(f\) such that, for any \(h, e, k \in \bL\) and any \(P \in \bP\), \(C_{P}(h, e\mid k) = f[P(h\mid e\wedge k)]\).

Theorem 1 provides a simple axiomatic characterization of the class of confirmation functions that are strictly increasing with the final probability of the hypothesis given the evidence (and \(k\)) (proven in Schippers 2017). All the functions in this class are ordinally equivalent, meaning that they imply the same rank order of \(C_{P}(h, e\mid k)\) and \(C_{P^*}(h^*,e^*\mid k^*)\) for any \(h, h^*,e, e^*,k, k^* \in \bL\) and any \(P, P^* \in \bP.\)
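
To see what ordinal equivalence amounts to in practice, here is a minimal Python sketch of ours (an illustration only, not part of the formal results): it checks, on randomly generated posterior values, that the identity function and the log-odds function, two strictly increasing transformations of the posterior, rank a collection of cases identically.

    import math, random

    # Ordinal equivalence, illustrated: two strictly increasing functions
    # of the posterior (identity and log-odds) rank any collection of
    # evidence-hypothesis pairs identically.
    random.seed(0)
    posteriors = [random.uniform(0.01, 0.99) for _ in range(1000)]

    def ranking(f):
        # Indices of the cases, sorted by the chosen confirmation function
        return sorted(range(len(posteriors)), key=lambda i: f(posteriors[i]))

    same = ranking(lambda p: p) == ranking(lambda p: math.log(p / (1 - p)))
    print(same)   # True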

By (P0), (P1) and (P2), we thus have \(C_{P}(h, e\mid k) = f[P(h\mid e \wedge k)]\), implying that the more likely \(h\) is given the evidence the more it is confirmed. This approach explicates confirmation precisely as the overall credibility of a hypothesis (firmness is Carnap’s 1950/1962 telling term, xvi). In this view, “Bayesian confirmation theory is little more than the examination of [the] properties” of the posterior probability function (Howson 2000, 179).

As we will see, the ordinal level of analysis is a solid and convenient middle ground between a purely qualitative and a thoroughly quantitative (metric) notion of confirmation. To begin with, ordinal notions are in general sufficient to move “upwards” to the qualitative level, through the following definitions (QC):

  • \(e\) \(C_{P}\)- confirms \(h\) relative to \(k\) if and only if \(C_{P}(h, e\mid k) \gt C_{P}(\neg h, e\mid k);\)
  • \(e\) \(C_{P}\)- disconfirms \(h\) relative to \(k\) if and only if \(C_{P}(h, e\mid k) \lt C_{P}(\neg h, e\mid k);\)
  • \(e\) is \(C_{P}\)- neutral for \(h\) relative to \(k\) if and only if \(C_{P}(h, e\mid k) = C_{P}(\neg h, e\mid k).\)

Given Theorem 1, (P0), (P1) and (P2) can be combined with the definitions in (QC) to derive the following qualitative notion of probabilistic confirmation as firmness:

  • \(e\) \(F\)- confirms \(h\) relative to \(k\) if and only if \(P(h\mid e \wedge k) \gt \bfrac{1}{2};\)
  • \(e\) \(F\)- disconfirms \(h\) relative to \(k\) if and only if \(P(h\mid e \wedge k) \lt \bfrac{1}{2};\)
  • \(e\) is \(F\)- neutral for \(h\) relative to \(k\) if and only if \(P(h\mid e \wedge k) = \bfrac{1}{2}.\)

The point of qualitative \(F\)-confirmation is thus straightforward: \(h\) is said to be (dis)confirmed by \(e\) (given \(k\)) if it is more likely than not to be true (false). (Sometimes a probability threshold higher than \(\bfrac{1}{2}\) is identified, but this complication would add little for our present purposes.)

The ordinal notion of confirmation is of high theoretical significance because ordinal divergences, unlike purely quantitative differences, imply opposite comparative judgments for some evidence-hypothesis pairs. A refinement from the ordinal to a properly quantitative level is also of interest, however, and quite useful for tractability and applications. For example, one can have 0 as a convenient neutrality threshold for confirmation as firmness, provided that the following log-odds functional representation is adopted (see Peirce 1878 for an early occurrence):

\(F(h, e\mid k) = \log\left[\dfrac{P(h\mid e \wedge k)}{P(\neg h\mid e \wedge k)}\right]\)

(The base of the logarithm can be chosen at convenience, as long as it is strictly greater than 1.)
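
By way of illustration, the log-odds representation can be computed directly. The following minimal Python sketch (with an illustrative function name, firmness, and made-up posterior values) exhibits the neutrality threshold at 0 and the qualitative verdicts on either side of it.

    import math

    def firmness(posterior, base=2):
        # Log-odds representation of confirmation as firmness, as a
        # function of the posterior P(h | e & k); any base > 1 will do.
        return math.log(posterior / (1 - posterior), base)

    print(firmness(0.5))       # 0.0: neutrality exactly at P(h | e & k) = 1/2
    print(firmness(0.8) > 0)   # True: F-confirmation above the threshold
    print(firmness(0.2) < 0)   # True: F-disconfirmation below it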

A quantitative requirement that is often put forward is the following stringent form of additivity:

Strict additivity (SA) For any \(h, e_{1},e_{2},k \in \bL\) and any \(P \in \bP\), \(\ \ \ C_{P}(h, e_{1} \wedge e_{2}\mid k) = C_{P}(h, e_{1}\mid k) + C_{P}(h, e_{2}\mid e_{1} \wedge k).\)

Although extraneous to \(F\)-confirmation, Strict Additivity will prove of use later on for the discussion of further variants of Bayesian confirmation theory.

Confirmation as firmness shares a number of structural properties with Hempelian confirmation. It satisfies the Special Consequence Condition, thus the Predictive Inference Condition too. It satisfies the Entailment Condition and, in virtue of (P1), extends it smoothly to the following ordinal counterpart (EC-Ord):

  • if \(e_{1}\wedge k \vDash h\) and \(e_{2}\wedge k \not\vDash h\), then \(h\) is more confirmed by \(e_{1}\) than by \(e_{2}\) relative to \(k\), that is, \(C_{P}(h, e_{1}\mid k) \gt C_{P}(h, e_{2}\mid k);\)
  • if \(e_{1}\wedge k\vDash h\) and \(e_{2}\wedge k\vDash h,\) then \(h\) is equally confirmed by \(e_{1}\) and by \(e_{2}\) relative to \(k\), that is, \(C_{P}(h, e_{1}\mid k) = C_{P}(h, e_{2}\mid k).\)

According to (EC-Ord) not only is classical entailment retained as a case of confirmation, it also represents a limiting case: it is the strongest possible form of confirmation that a fixed hypothesis \(h\) can receive.

\(F\)-confirmation also satisfies Confirmation Complementarity and, moreover, extends it to its appealing ordinal counterpart (see Crupi, Festa, and Buttasi 2010, 85–86), that is:

Confirmation complementarity (ordinal extension) (CC-Ord) \(C_{P}(\neg h, e\mid k)\) is a strictly decreasing function of \(C_{P}(h, e\mid k)\), that is, for any \(h, h^*,e, e^*,k \in \bL\) and any \(P\in \bP,\) \(C_{P}(h, e\mid k)\gtreqless C_{P}(h^*,e^*\mid k)\) if and only if \(C_{P}(\neg h, e\mid k) \lesseqgtr C_{P}(\neg h^*,e^*\mid k).\)

(CC-Ord) neatly reflects Keynes’ (1921, 80) remark that “an argument is always as near to proving or disproving a proposition, as it is to disproving or proving its contradictory”. Indeed, quantitatively, the measure \(F(h, e\mid k)\) instantiates Confirmation Complementarity in a simple and elegant way, that is, it satisfies \(C_{P}(h, e\mid k) = -C_{P}(\neg h, e\mid k).\)

\(F\)-confirmation also implies another attractive quantitative result, alleviating the ailments of the irrelevant conjunction paradox. In the statement below, which expresses this result, the irrelevance of \(q\) for hypothesis \(h\) and evidence \(e\) (relative to \(k\)) is meant to amount to the probabilistic independence of \(q\) from \(h, e\) and their conjunction (given \(k\)), that is, to \(P(h \wedge q\mid k) = P(h\mid k)P(q\mid k),\) \(P(e \wedge q\mid k) = P(e\mid k)P(q\mid k)\), and \(P(h \wedge e \wedge q\mid k) = P(h \wedge e\mid k)P(q\mid k)\), respectively.

Confirmation upon irrelevant conjunction (ordinal solution) (CIC) For any \(h, e, q, k \in \bL\) and any \(P \in \bP,\) if \(e\) confirms \(h\) relative to \(k\) and \(q\) is irrelevant for \(h\) and \(e\) relative to \(k\), then \(\ \ \ C_{P}(h, e\mid k) \gt C_{P}(h \wedge q, e\mid k).\)

So even when it is qualitatively preserved across the tacking of \(q\) onto \(h\), the positive confirmation afforded by \(e\) is bound at least to decrease quantitatively thereby.
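
A quick numerical sketch may make the point vivid. Under the independence conditions above, \(P(h \wedge q\mid e \wedge k) = P(h\mid e \wedge k)P(q\mid k)\), so tacking \(q\) onto \(h\) deflates the posterior and hence the log-odds measure of firmness; all numbers below are made up for illustration.

    import math

    def log_odds(p):
        # F(h, e | k) as a function of the posterior p = P(h | e & k)
        return math.log(p / (1 - p))

    post_h = 0.9            # assumed P(h | e & k): e F-confirms h
    p_q = 0.7               # assumed P(q | k), with q irrelevant as above
    post_hq = post_h * p_q  # then P(h & q | e & k) = P(h | e & k) * P(q | k)

    print(log_odds(post_h) > log_odds(post_hq) > 0)   # True: confirmation
                                                      # shrinks but survives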

Partly because of appealing formal features such as those mentioned so far, there is a long list of distinguished scholars advocating the firmness view of confirmation, from Keynes (1921) and Hosiasson-Lindenbaum (1940) onwards, most often coupled with some form of impermissive Bayesianism (see Hawthorne 2011 and Williamson 2011 for contemporary variations). In fact, \(F\)-confirmation fits most neatly a classical form of TE impermissivism à la Carnap, where one assumes that \(k = \top,\) that \(P\) is an “objective” initial probability based on essentially logical considerations, and that all the non-logical information available is collected in \(e\). The spirit of the Carnapian project never lost its appeal entirely (see, e.g., Festa 2003, Franklin 2001, Maher 2010, Paris 2011). However, the idea of a “logical” interpretation of \(P\) ran into difficulties that are often seen as insurmountable (e.g., Earman and Salmon 1992, 85–89; Gillies 2000, Ch. 3; Hájek 2019; Howson and Urbach 2006, 59–72; van Fraassen 1989, Ch. 12; Zabell 2011). And arguably, lacking some robust and effective impermissivist policy, the account of confirmation as firmness ends up losing much of its philosophical momentum. The issues surrounding the ravens and blite paradoxes provide a useful illustration.

Consider again \(h = \forall x(raven(x) \rightarrow black(x))\), and the main analyses of “the observation that \(a\) is a black raven” encountered so far, that is:

  • (i) \(k = \top\) and \(e = raven(a) \wedge black(a)\), and
  • (ii) \(k = raven(a)\) and \(e = black(a).\)

In both cases, whether \(e\) \(F\)-confirms \(h\) or not (relative to \(k\)) critically depends on \(P\): if the prior \(P(h\mid k)\) is low enough, \(e\) won’t do, no matter what, under either (i) or (ii); and if it is high enough, \(h\) will be \(F\)-confirmed either way. As a consequence, the \(F\)-confirmation view, by itself, does not offer any definite hint as to when, how, and why Nicod’s remarks apply or not.

For the purposes of our discussion, the following condition reveals another debatable aspect of the firmness explication of confirmation.

Consistency condition (Cons) For any \(h, h^*,e, k \in \bL\) and any \(P \in \bP\), if \(k \vDash \neg(h\wedge h^*)\) then \(e\) confirms \(h\) given \(k\) if and only if \(e\) disconfirms \(h^*\) given \(k\).

(Cons) says that evidence \(e\) can never confirm incompatible hypotheses. But consider, by way of illustration, a clinical case of an infectious disease of unknown origin, and suppose that \(e\) is the failure of antibiotic treatment. Arguably, there is nothing wrong in saying that, by discrediting bacteria as possible causes, the evidence confirms (viz. provides some support for) any of a number of alternative viral diagnoses. This judgment clashes with (Cons), though, which then seems an overly strong constraint.

Notably, (Cons) was defended by Hempel (1945) and, in fact, one can show that it follows from the conjunction of (qualitative) Confirmation Complementarity and the Special Consequence Condition, and so from both Hempelian and \(F\)-confirmation. This is but one sign of how stringent the Special Consequence Condition is. Mainly because of the latter, both the Hempelian and the firmness views of confirmation must depart from the plausible HD idea that hypotheses are generally confirmed by their verified consequences (see Hempel 1945, 103–104). We will come back to this while discussing our next topic: a very different Bayesian explication of confirmation, based on the notion of probabilistic relevance.

We’ve seen that the firmness notion of probabilistic confirmation can be singled out through one ordinal constraint, (P2), in addition to the fundamental principles (P0)–(P1). The counterpart condition for the so-called relevance notion of probabilistic confirmation is the following:

(P3) Tautological evidence For any \(h_{1},h_{2},k\in \bL\) and any \(P\in \bP\), \(C_{P}(h_{1},\top \mid k) = C_{P}(h_{2},\top \mid k).\)

(P3) implies that any hypothesis is equally “confirmed” by empty evidence. We will say that \(C_{P}(h, e\mid k)\) represents the probabilistic relevance notion of confirmation, or relevance-confirmation, if and only if it satisfies (P0), (P1) and (P3). These conditions are sufficient to derive the following, purely qualitative principle, according to the definitional method in (QC) above (see Crupi and Tentori 2014, 82, and Crupi 2015).

  • \(e\) relevance-confirms \(h\) relative to \(k\) if and only if \(P(h\mid e \wedge k)\gt P(h\mid k);\)
  • \(e\) relevance-disconfirms \(h\) relative to \(k\) if and only if \(P(h\mid e \wedge k)\lt P(h\mid k);\)
  • \(e\) is relevance-neutral for \(h\) relative to \(k\) if and only if \(P(h\mid e \wedge k) = P(h\mid k).\)

The point of relevance confirmation is that the credibility of a hypothesis can be changed in either a positive (confirmation in a strict sense) or negative way (disconfirmation) by the evidence concerned (given \(k\)). Confirmation (in the strict sense) thus reflects an increase from initial to final probability, whereas disconfirmation reflects a decrease (see Achinstein 2005 for some diverging views on this very idea).

The qualitative notions of confirmation as firmness and as relevance are demonstrably distinct. Unlike firmness, relevance confirmation cannot be formalized by the final probability alone, or any increasing function thereof. To illustrate, the probability of an otherwise very rare disease \((h)\) can be quite low even after a relevant positive test result \((e)\); yet \(h\) is relevance-confirmed by \(e\) to the extent that its probability rises thereby. By the same token, the probability of the absence of the disease \((\neg h)\) can be quite high despite the positive test result \((e)\), yet \(\neg h\) is relevance-disconfirmed by \(e\) to the extent that its probability decreases thereby. Perhaps surprisingly, the distinction between firmness and relevance confirmation—“extremely fundamental” and yet “sometimes unnoticed”, as Salmon (1969, 48–49) put it—had to be stressed time and again to achieve theoretical clarity in philosophy (e.g., Popper 1954; Peijnenburg 2012) as well as in other domains concerned, such as artificial intelligence and the psychology of reasoning (see Horvitz and Heckerman 1986; Crupi, Fitelson, and Tentori 2008; Shogenji 2012).
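
The diagnostic illustration is easily made concrete. In the following Python sketch, the prior, sensitivity, and false positive rate are made-up but realistic-looking values; Bayes’ theorem then delivers a posterior that is raised well above the prior while remaining far below \(\bfrac{1}{2}\).

    # A stylized diagnostic example with made-up numbers: a rare disease h
    # and a positive result e from a fairly reliable test.
    prior = 0.001         # P(h | k)
    sens = 0.99           # P(e | h & k)
    false_pos = 0.05      # P(e | ~h & k)

    p_e = sens * prior + false_pos * (1 - prior)
    posterior = sens * prior / p_e        # Bayes' theorem: P(h | e & k)

    print(round(posterior, 4))   # ~0.0194: h remains very improbable...
    print(posterior > prior)     # ...yet True: e relevance-confirms h
    print(posterior > 0.5)       # False: e does not F-confirm h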

The qualitative notion of relevance confirmation already has some interesting consequences. It implies, for instance, the following remarkable fact:

Complementary Evidence (CompE) For any \(h, e, k\in \bL\) and any \(P\in \bP,\) \(e\) confirms \(h\) relative to \(k\) if and only if \(\neg e\) disconfirms \(h\) relative to \(k.\)

The importance of (CompE) can be illustrated as follows. Consider the case of a father suspected of abusing his son. Suppose that the child does claim that s/he has been abused (label this evidence \(e\)). A forensic psychiatrist, when consulted, declares that this confirms guilt \((h)\). Alternatively, suppose that the child is asked and does not report having been abused \((\neg e).\) As pointed out by Dawes (2001), it may well happen that a forensic psychiatrist will nonetheless interpret this as evidence confirming guilt (suggesting that violence has prompted the child’s denial). One might want to argue that, other things being equal, this kind of “heads I win, tails you lose” judgment would be inconsistent, and thus in principle untenable. Whoever concurs with this line of argument (as Dawes 2001 himself did) is likely to be relying on the relevance notion of confirmation. In fact, no other notion of confirmation considered so far provides a general foundation for this judgment. \(F\)-confirmation, in particular, would not do, for it does allow that both \(e\) and \(\neg e\) confirm \(h\) (relative to \(k\)). This is because, mathematically, it is perfectly possible for both \(P(h\mid e \wedge k)\) and \(P(h\mid \neg e \wedge k)\) to be arbitrarily high above \(\bfrac{1}{2}.\) Condition (CompE), on the contrary, ensures that only one of the complementary statements \(e\) and \(\neg e\) can confirm hypothesis \(h\) (relative to \(k\)). (To be precise, HD-confirmation also satisfies condition CompE, yet it would fail the above example all the same, although for a different reason, that is, because the connection between \(h\) and \(e\) is plausibly one of probabilistic dependence but not of logical entailment.)
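
The underlying mathematical point is elementary: by the law of total probability, \(P(h\mid k)\) is a weighted average of \(P(h\mid e \wedge k)\) and \(P(h\mid \neg e \wedge k)\), so the two conditional probabilities cannot both exceed the prior. A small Python sketch with made-up numbers displays the contrast with \(F\)-confirmation.

    # The structure of the Dawes case, in made-up numbers. P(h) is a
    # weighted average of P(h | e) and P(h | ~e), so e and ~e can never
    # both raise it above its prior value.
    p_e = 0.3
    p_h_given_e, p_h_given_not_e = 0.95, 0.80
    p_h = p_e * p_h_given_e + (1 - p_e) * p_h_given_not_e   # 0.845

    print(p_h_given_e > 0.5, p_h_given_not_e > 0.5)  # True True: both would
                                                     # F-confirm h
    print(p_h_given_e > p_h, p_h_given_not_e > p_h)  # True False: only e
                                                     # relevance-confirms h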

Remarks such as the foregoing have induced some contemporary Bayesian theorists to dismiss the notion of confirmation as firmness altogether, concluding with I.J. Good (1968, 134) that “if you had \(P(h\mid e \wedge k)\) close to unity, but less than \(P(h\mid k)\), you ought not to say that \(h\) was confirmed by \(e\)” (also see Salmon 1975, 13). Let us follow this suggestion and proceed to consider the ordinal (and quantitative) notions of relevance confirmation.

Just as with firmness, the ordinal analysis of relevance confirmation can be characterized axiomatically. With the relevance notion, however, a larger set of options arises. Consider the following principles.

(P4) Disjunction of alternative hypotheses For any \(e, h_{1},h_{2},k\in \bL\) and any \(P\in \bP,\) if \(k\vDash \neg (h_{1} \wedge h_{2})\), then \(C_{P}(h_{1},e\mid k) \gtreqless C_{P}(h_{1} \vee h_{2},e\mid k)\) if and only if \(P(h_{2}\mid e \wedge k)\gtreqless P(h_{2}\mid k).\)

(P5) Law of likelihood For any \(e, h_{1}, h_{2}, k\in \bL\) and any \(P\in \bP,\) \(C_{P}(h_{1}, e\mid k)\gtreqless C_{P}(h_{2}, e\mid k)\) if and only if \(P(e\mid h_{1} \wedge k)\gtreqless P(e\mid h_{2} \wedge k).\)

(P6) Modularity (for conditionally independent data) For any \(e_{1},e_{2},h, k\in \bL\) and any \(P\in \bP,\) if \(P(e_{1}\mid \pm h \wedge e_{2} \wedge k)=P(e_{1}\mid \pm h \wedge k),\) then \(C_{P}(h, e_{1}\mid e_{2} \wedge k) = C_{P}(h, e_{1}\mid k).\)

All the above conditions occur more or less widely in the literature (see Crupi, Chater, and Tentori 2013 and Crupi and Tentori 2016 for references and discussion). Interestingly, they are pairwise incompatible against the background of the Formality and Final Probability principles (P0 and P1 above). Indeed, they sort out the relevance notion of confirmation into three distinct, classical families of measures, as follows (Crupi, Chater, and Tentori 2013; Crupi and Tentori 2016; Heckerman 1988; Sprenger and Hartmann 2020, Ch. 1):

  • (P4) holds if and only if \(C_{P}(h, e\mid k)\) is a probability difference measure , that is, if there exists a strictly increasing function \(f\) such that, for any \(h, e, k\in \bL\) and any \(P\in \bP,\) \(C_{P}(h, e\mid k) = f[P(h\mid e \wedge k) - P(h\mid k)];\)
  • (P5) holds if and only if \(C_{P}(h, e\mid k)\) is a probability ratio measure , that is, if there exists a strictly increasing function \(f\) such that, for any \(h, e, k\in \bL\) and any \(P\in \bP,\) \(C_{P}(h, e\mid k) =f[\frac{P(h\mid e \wedge k)}{P(h\mid k)}];\)
  • (P6) holds if and only if \(C_{P}(h, e\mid k)\) is a likelihood ratio measure , that is, if there exists a strictly increasing function \(f\) such that, for any \(h, e, k\in \bL\) and any \(P\in \bP,\) \(C_{P}(h, e\mid k) =f[\frac{P(e\mid h \wedge k)}{P(e\mid \neg h \wedge k)}].\)

If a strictly additive behavior (SA above) is imposed, one functional form is singled out for the quantitative representation of confirmation corresponding to each of the clauses above:

  • \(D_{P}(h, e\mid k) = P(h\mid e \wedge k) - P(h\mid k);\)
  • \(R_{P}(h, e\mid k) = \log[\frac{P(h\mid e \wedge k)}{P(h\mid k)}];\)
  • \(L_{P}(h, e\mid k) = \log[\frac{P(e\mid h \wedge k)}{P(e\mid \neg h \wedge k)}].\)

(The bases of the logarithms are assumed to be strictly greater than 1.)
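
These closed forms can be checked mechanically. The following Python sketch builds an arbitrary (made-up) joint distribution over \(h\), \(e_{1}\), \(e_{2}\), implements \(D\), \(R\), and \(L\), and verifies that each satisfies Strict Additivity; any coherent probability assignment with non-extreme values would serve equally well.

    import math
    from itertools import product

    # A made-up joint distribution over the eight atoms of (h, e1, e2).
    atoms = list(product([True, False], repeat=3))        # (h, e1, e2)
    weights = [0.10, 0.05, 0.08, 0.02, 0.20, 0.15, 0.25, 0.15]
    P = dict(zip(atoms, weights))

    def pr(pred):
        # Probability of the event picked out by predicate pred
        return sum(w for a, w in P.items() if pred(a))

    def cpr(pred, given):
        # Conditional probability P(pred | given)
        return pr(lambda a: pred(a) and given(a)) / pr(given)

    H = lambda a: a[0]
    E1 = lambda a: a[1]
    E2 = lambda a: a[2]
    E12 = lambda a: a[1] and a[2]
    TOP = lambda a: True

    def D(hyp, ev, given=TOP):
        return cpr(hyp, lambda a: ev(a) and given(a)) - cpr(hyp, given)

    def R(hyp, ev, given=TOP):
        return math.log(cpr(hyp, lambda a: ev(a) and given(a)) / cpr(hyp, given))

    def L(hyp, ev, given=TOP):
        num = cpr(ev, lambda a: hyp(a) and given(a))
        den = cpr(ev, lambda a: (not hyp(a)) and given(a))
        return math.log(num / den)

    # Strict Additivity: C(h, e1 & e2 | k) = C(h, e1 | k) + C(h, e2 | e1 & k)
    for C in (D, R, L):
        assert abs(C(H, E12) - (C(H, E1) + C(H, E2, given=E1))) < 1e-12
    print("Strict additivity holds for D, R, and L")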

Before briefly discussing this set of alternative quantitative measures of relevance confirmation, we will address one further related issue. It is a long-standing idea, going back to Carnap at least, that confirmation theory should yield an inductive logic that is analogous to classical deductive logic in some suitable sense, thus providing a theory of partial entailment and partial refutation. Now, the deductive-logical notions of entailment and refutation (contradiction) exhibit the following well-known properties:

Contraposition of entailment Entailment is contrapositive, but not commutative. That is, it holds that \(e\) entails \(h\) \((e\vDash h)\) if and only if \(\neg h\) entails \(\neg e\) \((\neg h\vDash \neg e),\) while it does not hold that \(e\) entails \(h\) if and only if \(h\) entails \(e\) \((h\vDash e).\)

Commutativity of refutation Refutation, on the contrary, is commutative, but not contrapositive. That is, it holds that \(e\) refutes \(h\) \((e\vDash \neg h)\) if and only if \(h\) refutes \(e\) \((h\vDash \neg e)\), while it does not hold that \(e\) refutes \(h\) if and only if \(\neg h\) refutes \(\neg e\) \((\neg h \vDash \neg\neg e).\)

The confirmation-theoretic counterparts are fairly straightforward:

(P7) Contraposition of confirmation For any \(e, h, k\in \bL\) and any \(P\in \bP,\) if \(e\) relevance-confirms \(h\) relative to \(k,\) then \(C_{P}(h, e\mid k) = C_{P}(\neg e,\neg h\mid k).\)

(P8) Commutativity of disconfirmation For any \(e, h, k \in \bL\) and any \(P \in \bP,\) if \(e\) relevance-disconfirms \(h\) relative to \(k\), then \(C_{P}(h, e\mid k) = C_{P}(e, h\mid k).\)

The following can then be proven (Crupi and Tentori 2013):

Theorem 3 Given (P0) and (P1), (P7) and (P8) hold if and only if \(C_{P}(h, e\mid k)\) is a relative distance measure , that is, if there exists a strictly increasing function \(f\) such that, for any \(h, e, k\in \bL\) and any \(P\in \bP,\) \(C_{P}(h, e\mid k) = f[Z(h, e\mid k)],\) where:

\( Z(h,e\mid k)= \begin{cases} \dfrac{P(h\mid e \wedge k) - P(h\mid k)}{1-P(h\mid k)} & \mbox{if } P(h\mid e \wedge k) \ge P(h\mid k) \\ \\ \dfrac{P(h\mid e \wedge k) - P(h\mid k)}{P(h\mid k)} & \mbox{if } P(h\mid e \wedge k) \lt P(h\mid k) \end{cases} \)

So, despite some pessimistic suggestions (see, e.g., Hawthorne 2018, and the discussion in Crupi and Tentori 2013), a neat confirmation-theoretic generalization of logical entailment (and refutation) is possible after all. Interestingly, relative distance measures can be additive, but only for uniform pairs of arguments – both confirmatory or both disconfirmatory (see Milne 2014, 259). (Note: Crupi, Tentori, and Gonzalez 2007; Crupi, Festa, and Buttasi 2010; and Crupi and Tentori 2013, 2014, provide further discussions of the properties of relative distance measures and their intuitive motivations. Also see Mura 2008 for a related analysis.)
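
As a spot check on contraposition, one can compute \(Z\) on a small made-up joint distribution over \(h\) and \(e\) in which \(e\) is positively relevant to \(h\): the values of \(Z(h, e\mid k)\) and \(Z(\neg e, \neg h\mid k)\) then coincide, as (P7) requires. The Python sketch below takes prior and posterior probabilities as plain numbers.

    # A spot check of (P7) on Z, with a made-up joint distribution over
    # (h, e) in which e is positively relevant to h.
    def Z(prior, posterior):
        if posterior >= prior:
            return (posterior - prior) / (1 - prior)
        return (posterior - prior) / prior

    p_he, p_hne, p_nhe, p_nhne = 0.30, 0.10, 0.15, 0.45   # joint of (h, e)
    p_h, p_e = p_he + p_hne, p_he + p_nhe

    print(round(Z(p_h, p_he / p_e), 9))              # 0.444444444: Z(h, e)
    print(round(Z(1 - p_e, p_nhne / (1 - p_h)), 9))  # 0.444444444: Z(~e, ~h)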

The plurality of alternative probabilistic measures of relevance confirmation has prompted some scholars to be skeptical or dismissive of the prospects for a quantitative theory of confirmation (see, e.g., Howson 2000, 184–185, and Kyburg and Teng 2001, 98 ff.). However, as we will see shortly, quantitative analyses of relevance confirmation have proved important for handling a number of puzzles and issues that plagued competing approaches. Moreover, various arguments in the philosophy of science and beyond have been shown to depend critically (and sometimes unwittingly) on the choice of one confirmation measure (or some of them) rather than others (see Festa and Cevolani 2017, Fitelson 1999, Brössel 2013, Glass 2013, Roche and Shogenji 2014, Rusconi et al. 2014, and van Enk 2014).

Recently, arguments have been offered by Huber (2008b) in favor of \(D\), by Park (2014), Pruss (2014), and Vassend (2015) in favor of \(L\) (also see Morey, Romeijn, and Rouder 2016 for an important connection with statistics), and by Crupi and Tentori (2010) in favor of \(Z\). Hájek and Joyce (2008, 123), on the other hand, have seen different measures as possibly capturing “distinct, complementary notions of evidential support” (also see Schlosshauer and Wheeler 2011, Sprenger and Hartmann 2020, Ch. 1, and Steel 2007 for tempered forms of pluralism). The case of measure \(R\) deserves some more specific comments, however. Following Fitelson (2007), one could see \(R\) as conveying key tenets of the so-called “likelihoodist” position about evidential reasoning (see Royall 1997 for a classical statement, and Chandler 2013 and Sober 1990 for consonant arguments and inclinations). There seems to be some consensus, however, that compelling objections can be raised against the adequacy of \(R\) as a proper measure of relevance confirmation (see, in particular, Crupi, Festa, and Buttasi 2010, 85–86; Eells and Fitelson 2002; Gillies 1986, 112; and compare Milne 1996 with Milne 2010, Other Internet Resources). In what follows, too, it will be convenient to restrict our discussion to \(D, L\) and \(Z\) as candidate measures. All the results to be presented below are invariant for whatever choice among these three options, and across ordinal equivalence with each of them (but those results do not always extend to measures ordinally equivalent to \(R\)).

Let us go back to a classical HD case, where the (consistent) conjunction \(h \wedge k\) (but not \(k\) alone) entails \(e.\) The following can then be proven (SP):

  • (i) if \(P(e\mid k)\lt 1,\) then \(e\) relevance-confirms \(h\) relative to \(k\) and \(C_{P}(h, e\mid k)\) is a decreasing function of \(P(e\mid k);\)
  • (ii) if \(P(e\mid k) = 1,\) then \(e\) is relevance-neutral for \(h\) relative to \(k.\)

Formally, it is fairly simple to show that (SP) characterizes relevance confirmation (see, e.g., Crupi, Festa, and Buttasi 2010, 80; Hájek and Joyce 2008, 123), but the philosophical import of this result is nonetheless remarkable. For illustrative purposes, it is useful to assume the endorsement of the principle of total evidence (TE) as a default position for the Bayesian. This means that \(P\) is assumed to represent actual degrees of belief of a rational agent, that is, given all the background information available. Then, by clause (i) of (SP), we have that the occurrence of \(e\), a consequence of \(h \wedge k\) (but not of \(k\) alone), confirms \(h\) relative to \(k\) provided that \(e\) was initially uncertain to some degree (even given \(k\)). In other words: \(e\) must have been predicted on the basis of \(h \wedge k\). Moreover, again by (i), the confirmatory impact will be stronger the more surprising (unlikely) the evidence was before \(h\) was brought in, that is, given \(k\) alone. So, under TE, relevance confirmation turns out to embed a squarely predictivist version of hypothetico-deductivism! As we know, this neutralizes the charge of underdetermination, yet it comes at the usual cost, namely, the old evidence problem. In fact, if TE is in force, then clause (ii) of (SP) implies that no statement that is known to be true (thus assigned probability 1) can ever have confirmatory import.
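
Clause (i) of (SP) is easy to exhibit numerically. When \(h \wedge k\) entails \(e\), one has \(P(h\mid e \wedge k) = P(h\mid k)/P(e\mid k)\); the Python sketch below (with made-up numbers, and the difference measure \(D\) for concreteness) holds the prior fixed and lets \(P(e\mid k)\) shrink.

    # (SP) in action, with made-up numbers: h & k entails e, so that
    # P(h | e & k) = P(h | k)/P(e | k); the prior P(h | k) is held fixed
    # while P(e | k) varies (it can never fall below P(h | k)).
    p_h = 0.2                                # P(h | k)
    for p_e in (1.0, 0.9, 0.5, 0.3, 0.2):    # P(e | k)
        posterior = p_h / p_e                # P(h | e & k)
        print(p_e, round(posterior - p_h, 3))
    # Output: the difference measure D is 0 when P(e | k) = 1 (clause (ii))
    # and grows steadily as e gets more surprising, up to the limit where
    # P(e | k) = P(h | k) and e conclusively verifies h.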

Interestingly, the Bayesian predictivist has an escape (neatly anticipated, and criticized, by Glymour 1980a, 91–92). Consider Einstein and Mercury once again. As effectively pointed out by Norton (2011a, 7), Einstein was extremely careful to emphasize that the precession phenomenon had been derived “without having to posit any special [auxiliary] hypotheses at all”. Why? Well, presumably because if one had allowed herself to arbitrarily devise ad hoc auxiliaries (within \(k\), in our notation) then one could have been pretty much certain in advance to find a way to get Mercury’s data right (remember: that’s the lesson of the underdetermination theorem). But getting those data right with auxiliaries \(k\) that were not thus adjusted—that would have been a natural consequence had the theory of general relativity been true, and it would have been surprising otherwise. Arguably, this line of argument exploits much of the use-novelty idea within a predictivist framework. The crucial points are (i) that the evidence implied is not a verified empirical statement \(e\) but the logical fact that \(h \wedge k\) entails \(e\), and (ii) that the existence of this connection of entailment was not to be obviously anticipated at all, precisely because \(h \wedge k\) and \(e\) are such that the latter did not serve as a constraint to specify the former. On these conditions, it seems that \(h\) can be confirmed by this kind of “second-order” (logical) evidence in line with (SP) while TE is concurrently preserved.

At least two main problems arise, however. The first one is more technical in nature. Modelling rational uncertainty concerning logical facts (such as \(h \wedge k \vDash e\)) by probabilistic means is no trivial task. Garber (1983) put forward an influential proposal, but doubts have been raised about whether it is well-behaved (e.g., van Fraassen 1988; a careful survey with further references can be found in Eva and Hartmann forthcoming). Second, and more substantially, this solution of the old evidence problem can be charged with being an elusive change of subject: for it was Mercury’s data, not anything else, that had to be recovered as having confirmed (and still confirming, some would add) Einstein’s theory. That’s the kind of judgment that confirmation theory must capture, and which remains unattainable for the predictivist Bayesian. (Earman 1992, 131, voiced this complaint forcefully. Hints for a possible rejoinder appear in Eells’s 1990 thorough discussion; see also Skyrms 1983.)

Bayesians who are unconvinced by the predictivist position are naturally led to dismiss TE and allow for the assignment of initial probabilities lower than 1 even to statements that were known all along. Of course, this brings the underdetermination problem back, for now \(k\) can still be concocted ad hoc to have known evidence \(e\) following from \(h \wedge k\), and moreover \(P(e\mid k)\lt 1\) is not prevented by TE anymore, thus potentially licensing arbitrary confirmation relations. Two moves can be combined to handle this problem. First, unlike HD, the Bayesian framework has the formal resources to characterize the auxiliaries themselves as more or less likely and thus their adoption as relatively safe or suspicious (the standard Bayesian treatment of auxiliary hypotheses is developed along these lines in Dorling 1979 and Howson and Urbach 2006, 92–102, and it is critically discussed in Rowbottom 2010, Strevens 2001, and Worrall 1993; also see Christensen 1997 for an important analysis of related issues). Second, one has to provide indications as to how TE should be relaxed. Non-TE Bayesians of the impermissivist strand often suggest that objective likelihood values concerning the outcome \(e\)—\(P(e\mid h \wedge k)\)—can be specified for the competing hypotheses at issue quite apart from the fact that \(e\) may have already occurred. Such values would typically be diverse for different hypotheses (thus mathematically implying \(P(e\mid k)\lt 1\)) and serve as a basis to capture formally the confirmatory impact of \(e\) (see Hawthorne 2005 for an argument along these lines). Permissivists, on the other hand, cannot coherently rely on these considerations to articulate a non-TE position. They must invoke counterfactual degrees of belief instead, suggesting that \(P\) should be reconstructed as representing the beliefs that the agent would have, had she not known that \(e\) was true (see Howson 1991 for a statement and discussion, and Sprenger 2015 for an original recent variant; also see Jeffrey 1995 and Wagner 2001 for relevant technical results, and Steele and Werndl 2013 for an intriguing case-study from climate science).

The theory of Bayesian confirmation as relevance indicates when and why the HD idea works: if \(h \wedge k\) (but not \(k\)) entails \(e\), then \(h\) is relevance-confirmed by \(e\) (relative to \(k\)) because the latter increases the probability of the former—provided that \(P(e\mid k) \lt 1\). Admittedly, the meaning of the latter proviso partly depends on how one handles the problem of old evidence. Yet it seems legitimate to say that Bayesian relevance confirmation (unlike the firmness view) retains a key point of ordinary scientific practice which is embedded in HD and yields further elements of clarification. Consider the following illustration.

Let \(h\) be the hypothesis that all mammalian species possess a certain (unspecified) genetic trait, and let \(e_{1}\), \(e_{2}\), and \(e_{2}^*\) state that tigers, elephants, and lions, respectively, possess it. Qualitative confirmation theories comply with the idea that \(h\) is confirmed both by \(e_{1} \wedge e_{2}\) and by \(e_{1} \wedge e_{2}^*.\) In the HD case, it is clear that \(h\) entails both conjunctions, given of course \(k\) stating that tigers, lions, and elephants are all mammals (a Hempelian account could also be given easily). Bayesian relevance confirmation unequivocally yields the same qualitative verdict. There is more, however. Presumably, one might also want to say that \(h\) is more strongly confirmed by \(e_{1} \wedge e_{2}\) than by \(e_{1} \wedge e_{2}^*,\) because the former offers a more varied and diverse body of positive evidence (interestingly, on experimental investigation, this pattern prevails in most people’s judgment, including children; see Lo et al. 2002). Indeed, the variety of evidence is a fairly central issue in the analysis of confirmation (see, e.g., Bovens and Hartmann 2002, Schlosshauer and Wheeler 2011, and Viale and Osherson 2000). In the illustrative case above, higher variety is readily captured by lower probability: it just seems a priori less likely that species as diverse as tigers and elephants share some unspecified genetic trait as compared to tigers and lions, that is, \(P(e_{1} \wedge e_{2}\mid k)\lt P(e_{1} \wedge e_{2}^*\mid k).\) By (SP) above, then, one immediately gets from the relevance confirmation view the sound implication that \(C_{P}(h, e_{1} \wedge e_{2}\mid k)\gt C_{P}(h, e_{1} \wedge e_{2}^*\mid k).\)

Principle (SP) is also of much use in the ravens problem. Posit \(h = \forall x(raven(x)\rightarrow black(x))\) once again. Just like HD, Bayesian relevance confirmation directly implies that \(e = black(a)\) confirms \(h\) given \(k = raven(a)\) and \(e^* =\neg raven(a)\) confirms \(h\) given \(k^* =\neg black(a)\) (provided, as we know, that \(P(e\mid k)\lt 1\) and \(P(e^*\mid k^*)\lt 1).\) That’s because \(h \wedge k\vDash e\) and \(h \wedge k^*\vDash e^*.\) But of course, to have \(h\) confirmed, sampling ravens and finding a black one is intuitively more significant than failing to find a raven while sampling the enormous set of the non-black objects. That is, it seems, because the latter is very likely to obtain anyway, whether or not \(h\) is true, so that \(P(e^*\mid k^*)\) is actually quite close to unity. Accordingly, (SP) implies that \(h\) is indeed more strongly confirmed by \(black(a)\) given \(raven(a)\) than it is by \(\neg raven(a)\) given \(\neg black(a)\)—that is, \(C_{P}(h, e\mid k)\gt C_{P}(h, e^*\mid k^*)\)—as long as the assumption \(P(e\mid k)\lt P(e^*\mid k^*)\) applies.

What then if the sampling is not constrained \((k = \top)\) and the evidence now amounts to the finding of a black raven, \(e = raven(a) \wedge black(a)\), versus a non-black non-raven, \(e^* =\neg black(a) \wedge \neg raven(a)\)? We’ve already seen that, for either Hempelian or HD-confirmation, \(e\) and \(e^*\) are on a par: both Hempel-confirm \(h\), while neither HD-confirms it. In the former case, the original Hempelian version of the ravens paradox immediately arises; in the latter, it is avoided, but at a cost: \(e\) is declared flatly irrelevant for \(h\)—a bit of a radical move. Can the Bayesian do any better? Quite so. Consider the following conditions:

  • (i) \(P[raven(a)\mid h] = P[raven(a)] \gt 0\)
  • (ii) \(P[\neg raven(a) \wedge black(a)\mid h] = P[\neg raven(a) \wedge black(a)]\)

Roughly, (i) says that the size of the ravens population does not depend on their color (in fact, on \(h\)), and (ii) that the size of the population of black non-raven objects also does not depend on the color of ravens. Note that both (i) and (ii) seem fairly sound as far as our best understanding of our actual world is concerned. It is easy to show that, in relevance-confirmation terms, (i) and (ii) are sufficient to imply that \(e = raven(a) \wedge black(a)\), but not \(e^* = \neg raven(a) \wedge \neg black(a)\), confirms \(h\), that is, \(C_{P}(h,e) \gt C_{P}(h,e^*) = 0\) (this observation is due to Mat Coakley). So the Bayesian relevance approach to confirmation can make a principled difference between \(e\) and \(e^*\) in both ordinal and qualitative terms. (A much broader analysis is provided by Fitelson and Hawthorne 2010, and Hawthorne and Fitelson 2010 [Other Internet Resources]. Notably, their results include the full specification of the sufficient and necessary conditions for the main inequality \(C_{P}(h, e) \gt C_{P}(h, e^*)\).)
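
For a mechanical check of Coakley’s observation, one can build a toy credence model in which conditions (i) and (ii) hold by construction; in the Python sketch below the parameter values are made up, and \(b\) stands for the assumed chance that a raven is black if \(h\) is false.

    # A toy credence model for the observation above, with made-up numbers.
    # Under h all ravens are black; under ~h only a fraction b of them is.
    # Conditions (i) and (ii) are built in: P(raven) = r and
    # P(~raven & black) = s whether or not h is true.
    p_h = 0.5                  # prior P(h)
    r, s, b = 0.01, 0.30, 0.9  # b = P(black(a) | raven(a) & ~h)

    def joint(h_true):
        # Joint probabilities of (raven?, black?) given h, or given ~h
        black_raven = r if h_true else r * b
        return {('raven', 'black'): black_raven,
                ('raven', 'nonblack'): r - black_raven,
                ('nonraven', 'black'): s,
                ('nonraven', 'nonblack'): 1 - r - s}

    def posterior(ev):
        num = p_h * joint(True)[ev]
        return num / (num + (1 - p_h) * joint(False)[ev])

    e, e_star = ('raven', 'black'), ('nonraven', 'nonblack')
    print(posterior(e) - p_h)       # > 0: the black raven confirms h
    print(posterior(e_star) - p_h)  # = 0: the non-black non-raven is neutral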

In general, Bayesian (relevance) confirmation theory implies that the evidential import of an instance of some generalization will often depend on the credence structure, and relies on its formal representation, \(P\), as a tool for more systematic analyses. Consider another instructive example. Assume that \(a\) denotes some company from some (otherwise unspecified) sector of the economy, and label the latter predicate \(S\). So, \(k = Sa\). You are informed that \(a\) increased revenues in 2019, represented as \(e = Ra\). Does this confirm \(h = \forall x(Sx \rightarrow Rx)\)? It does, at least to some degree, one would say. For an expansion of the whole sector (recall that you have no clue what this is) surely would account for the data. That’s a straightforward HD kind of reasoning (and a suitable Hempelian counterpart reconstruction would concur). But does \(e\) also confirm \(h^* = Sb \rightarrow Rb\) for some further company \(b\)? Well, another obvious account of the data \(e\) would be that company \(a\) has gained market shares at the expense of some competitor, so that \(e\) might well seem to support \(\neg h^*,\) if anything (the revenues example is inspired by a remark in Blok, Medin, and Osherson 2007, 1362).

It can be shown that the Bayesian notion of relevance confirmation allows for this pattern of judgments, because (given \(k\)) evidence \(e\) above increases the probability of \(h\) but may well have the opposite effect on \(h^*\) (see Sober 1994 for important remarks along similar lines). Notably, \(h\) entails \(h^*\) by plain instantiation, and so contradicts \(\neg h^*\). As a consequence, the implication that \(C_{P}(h,e\mid k)\) is positive while \(C_{P}(h^*,e\mid k)\) is not clashes with each of the following, and proves them unduly restrictive: the Special Consequence Condition (SCC), the Predictive Inference Condition (PIC), and the Consistency Condition (Cons). Note that these principles were all evaded by HD-confirmation, but all implied by confirmation as firmness (see above).

At the same time, the most compelling features of \(F\)-confirmation, which the HD model was unable to capture, are retained by confirmation as relevance. In fact, all our measures of relevance confirmation (\(D, L\), and \(Z\)) entail the ordinal extension of the Entailment Condition (EC) as well as \(C_{P}(h, e\mid k) = -C_{P}(\neg h, e\mid k)\) and thereby Confirmation Complementarity in all of its forms (qualitative, ordinal, and quantitative). Moreover, the Bayesian confirmation theorist of either the firmness or the relevance strand can avail herself of the same quantitative strategy of “damage control” for the main specific paradox of HD-confirmation, i.e., the irrelevant conjunction problem. (See statement (CIC) above, and Crupi and Tentori 2010, Fitelson 2002. Also see Chandler 2007 for criticism, and Moretti 2006 for a related debate.)

We’re left with one last issue to conclude our discussion, to wit, the blite paradox. Recall that \(blite\) is so defined: an object is blite just in case it is either examined no later than time \(T\) and black, or not thus examined and white, that is:

\(\forall x[blite(x) \leftrightarrow ((ex_{t\le T}(x) \wedge black(x)) \vee (\neg ex_{t\le T}(x) \wedge white(x)))]\)

As always heretofore, we posit \(h = \forall x(raven(x)\rightarrow black(x)),\) \(h^* = \forall x(raven(x)\rightarrow blite(x)).\) We then consider the setup where \(k = raven(a) \wedge ex_{t\le T}(a),\) \(e= black(a),\) and \(P(e\mid k)\lt 1.\) Various authors have noted that, with Bayesian relevance confirmation, \(P(h\mid k)\gt P(h^*\mid k)\) is sufficient to imply that \(C_{P}(h, e\mid k)\gt C_{P}(h^*,e\mid k)\) (see Gaifman 1979, 127–128; Sober 1994, 229–230; and Fitelson 2008, 131). So, as long as the black hypothesis is perceived as initially more credible than its blite counterpart, the former will be more strongly confirmed than the latter. Of course, \(P(h\mid k)\gt P(h^*\mid k)\) is an entirely commonsensical assumption, yet these same authors have generally, and quite understandably, failed to see this result as philosophically illuminating. Lacking some interesting, non-question-begging story as to why that inequality should obtain, no solution of the paradox seems to emerge. More modestly, one could point out that a measure of relevance confirmation \(C_{P}(h, e\mid k)\) implies (i) and (ii) below.

  • (i) necessarily (that is, for any \(P\in \bP\)), \(e\) confirms \(h\) relative to \(k\); and
  • (ii) \(e\) confirms that a raven will be black if examined after \(T\), that is, \((raven(b)\wedge \neg ex_{t\le T}(b)) \rightarrow black(b),\) relative to \(k\), while it does not confirm that a raven will be white if examined after \(T\), that is, \((raven(b)\wedge \neg ex_{t\le T}(b)) \rightarrow white(b),\) relative to \(k\).

Without a doubt, (i) and (ii) fall far short of a satisfactory solution of the blite paradox. Yet it seems at least a legitimate minimal requirement for a compelling solution (if any exists) that it implies both. It is then of interest to note that confirmation as firmness is inconsistent with (i), while Hempelian and HD-confirmation are inconsistent with (ii).

  • Achinstein, P. (ed.), 2005, Scientific Evidence: Philosophical Theories and Applications , Baltimore: John Hopkins University Press.
  • Baker, A., 2003, “Quantitative Parsimony and Explanatory Power”, British Journal for the Philosophy of Science , 54: 245–259.
  • Barnes, E.C., 2018, “Prediction versus Accommodation”, The Stanford Encyclopedia of Philosophy (Fall 2018 Edition), E.N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/fall2018/entries/prediction-accommodation/ >.
  • Betz, G., 2013, “Revamping Hypothetico-Deductivism: A Dialectic Account of Confirmation”, Erkenntnis , 78: 991–1009.
  • Bird, A., 2010, “Eliminative Abduction—Examples from Medicine”, Studies in History and Philosophy of Science , 41: 345–352.
  • Blok, S.V., D.L. Medin, and D. Osherson, 2007, “Induction as Conditional Probability Judgment”, Memory & Cognition , 35: 1353–1364.
  • Bovens, L. and S. Hartmann, 2002, “Bayesian Networks and the Problem of Unreliable Instruments”, Philosophy of Science , 69: 29–72.
  • Brössel, P., 2013, “The Problem of Measure Sensitivity Redux”, Philosophy of Science , 80: 378–397.
  • Carnap, R., 1950/1962, Logical Foundations of Probability , Chicago: University of Chicago Press.
  • Chandler, J., 2007, “Solving the Tacking Problem with Contrast Classes”, British Journal for the Philosophy of Science , 58: 489–502.
  • –––, 2013, “Contrastive Confirmation: Some Competing Accounts”, Synthese , 190: 129–138.
  • Christensen, D., 1997, “What Is Relative Confirmation?”, Noûs , 3: 370–384.
  • Cleland, C.E., 2011, “Prediction and Explanation in Historical Natural Science”, British Journal for the Philosophy of Science , 62: 551–582.
  • Craig, W., 1957, “The Uses of the Herbrand-Gentzen Theorem in Relating Model Theory and Proof Theory”, Journal of Symbolic Logic , 3: 269–285.
  • Crupi, V., 2015, “Inductive Logic”, Journal of Philosophical Logic , 44 (40th anniversary issue): 641–650.
  • Crupi, V. and K. Tentori, 2010, “Irrelevant Conjunction: Statement and Solution of a New Paradox”, Philosophy of Science , 77: 1–13.
  • –––, 2013, “Confirmation as Partial Entailment: A Representation Theorem in Inductive Logic”, Journal of Applied Logic , 11: 364–372 [Erratum in Journal of Applied Logic , 12: 230–231].
  • –––, 2014, “Measuring Information and Confirmation”, Studies in the History and Philosophy of Science , 47: 81–90.
  • –––, 2016, “Confirmation Theory”, in A. Hájek and C. Hitchcock (eds.), Oxford Handbook of Philosophy and Probability , Oxford: Oxford University Press, pp. 650–665.
  • Crupi, V., N. Chater, and K. Tentori, 2013, “New Axioms for Probability and Likelihood Ratio Measures”, British Journal for the Philosophy of Science , 64: 189–204.
  • Crupi, V., R. Festa, and C. Buttasi, 2010, “Towards a Grammar of Bayesian Confirmation”, in M. Suárez, M. Dorato, and M. Rédei (eds.), Epistemology and Methodology of Science , Dordrecht: Springer, pp. 73–93.
  • Crupi, V., B. Fitelson, and K. Tentori, 2008, “Probability, Confirmation, and the Conjunction Fallacy”, Thinking & Reasoning , 14: 182–199.
  • Crupi, V., K. Tentori, and M. Gonzalez, 2007, “On Bayesian Measures of Evidential Support: Theoretical and Empirical Issues”, Philosophy of Science 74: 229–52.
  • Dawes, R.M., 2001, Everyday Irrationality , Boulder (CO): Westview.
  • De Finetti, B., 2008, Philosophical Lectures on Probability (edited by A. Mura), Dordrecht: Springer.
  • Dorling, J., 1979, “Bayesian Personalism, the Methodology of Scientific Research Programmes, and Duhem’s Problem”, Studies in History and Philosophy of Science , 10: 177–187.
  • Douglas, H. and P.D. Magnus, 2013, “Why Novel Prediction Matters”, Studies in History and Philosophy of Science , 44: 580–589.
  • Douven, I., 2017, “Abduction”, in E.N. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Summer 2017 Edition), URL = < https://plato.stanford.edu/archives/sum2017/entries/abduction/ >.
  • Douven, I. and W. Meijs, 2006, “Bootstrap Confirmation Made Quantitative”, Synthese , 149: 97–132.
  • Duhem, P., 1906, The Aim and Structure of Physical Theory , Princeton (NJ): Princeton University Press, 1991.
  • Earman, J., 1992, Bayes or Bust? , Cambridge (MA): MIT Press.
  • ––– (ed.), 1983, Minnesota Studies in Philosophy of Science , Vol. 10: Testing Scientific Theories , Minneapolis: University of Minnesota Press.
  • Earman, J. and C. Glymour, 1988, “What Revisions Does Bootstrap Testing Need?”, Philosophy of Science , 55: 260–264.
  • Earman, J. and M. Janssen, 1993, “Einstein’s Explanation of the Motion of Mercury’s Perihelion”, in J. Earman, M. Janssen, and J. Norton (eds.), The Attraction of Gravity: New Studies in the History of General Relativity , Boston: Birkhäuser, pp. 129–172.
  • Earman, J. and W.C. Salmon, 1992, “The Confirmation of Scientific Hypotheses”, in M.H. Salmon et al., Introduction to the Philosophy of Science , Englewood Cliffs: Prentice Hall, pp. 42–103.
  • Easwaran, K., 2011a, “Bayesianism I: Introduction and Arguments in Favor”, Philosophy Compass , 6: 312–320.
  • –––, 2011b, “Bayesianism II: Criticisms and Applications”, Philosophy Compass , 6: 321–332.
  • Eells, E., 1990, “Bayesian Problems of Old Evidence”, in Savage, 1990, pp. 205–223.
  • Eells, E. and B. Fitelson, 2002, “Symmetries and Asymmetries in Evidential Support”, Philosophical Studies , 107: 129–42.
  • Eva, B. and S. Hartmann, forthcoming, “On the Origins of Old Evidence”, Australasian Journal of Philosophy , first online 14 October 2019. doi:10.1080/00048402.2019.1658210
  • Festa, R., 1999, “Bayesian Confirmation”, in M. Galavotti and A. Pagnini (eds.), Experience, Reality, and Scientific Explanation , Dordrecht: Kluwer, pp. 55–87.
  • –––, 2003, “Induction, Probability, and Bayesian Epistemology”, Poznan Studies in the Philosophy of Science and the Humanities , 80: 251–284.
  • Festa, R. and G. Cevolani, 2017, “Unfolding the Grammar of Bayesian Confirmation: Likelihood and Anti-Likelihood Principles”, Philosophy of Science , 84: 56–81.
  • Fitelson, B., 1999, “The Plurality of Bayesian Measures of Confirmation and the Problem of Measure Sensitivity”, Philosophy of Science , 66: S362–78.
  • –––, 2002, “Putting the Irrelevance Back into the Problem of Irrelevant Conjunction”, Philosophy of Science , 69: 611–622.
  • –––, 2007, “Likelihoodism, Bayesianism, and Relational Confirmation”, Synthese 156: 473–89.
  • –––, 2008, “Goodman’s New Riddle”, Journal of Philosophical Logic , 37: 613–643.
  • Fitelson, B. and J. Hawthorne, 2010, “How Bayesian Confirmation Theory Handles the Paradox of the Ravens”, in E. Eells and J. Fetzer (eds.), The Place of Probability in Science , Dordrecht: Springer, pp. 247–276.
  • Forster, M., 2007, “A Philosopher’s Guide to Empirical Success”, Philosophy of Science , 74: 588–600.
  • Forster, M. and E. Sober, 1994, “How to Tell When Simpler, More Unified, or Less Ad Hoc Theories Will Provide More Accurate Predictions”, British Journal for the Philosophy of Science , 45: 1–35.
  • Franklin, J., 2001, “Resurrecting Logical Probability”, Erkenntnis , 55: 277–305.
  • Gabbay, D., S. Hartmann, and J. Woods (eds.), 2011, Handbook of the History of Logic , Vol. 10: Inductive Logic , Amsterdam: Elsevier.
  • Gaifman, H., 1979, “Subjective Probability, Natural Predicates, and Hempel’s Ravens”, Erkenntnis , 14: 105–147.
  • Garber, D., 1983, “Old Evidence and Logical Omniscience”, in Earman, 1983, pp. 99–131.
  • Gemes, K. 1993, “Hypothetico-Deductivism, Content and the Natural Axiomatization of Theories”, Philosophy of Science , 60: 477–487.
  • –––, 1998, “Hypothetico-Deductivism: The Current State of Play; The Criterion of Empirical Significance: Endgame”, Erkenntnis , 49: 1–20.
  • –––, 2005, “Hypothetico-Deductivism: Incomplete But Not Hopeless”, Erkenntnis , 63: 139–147.
  • Giere, R.N., 1983, “Testing Theoretical Hypotheses”, in Earman, 1983, pp. 269–298.
  • –––, 1979, Understanding Scientific Reasoning , Belmont (CA): Thomson / Wadsworth, 2006.
  • Gillies, D., 1986, “In Defense of the Popper-Miller Argument”, Philosophy of Science , 53: 110–113.
  • –––, 1989, “Non-Bayesian Confirmation Theory and the Principle of Explanatory Surplus”, in A. Fine and J. Leplin (eds.), Proceedings of the 1988 Biennial Meeting of Philosophy of Science Association , Vol. 2, East Lansing (MI): Philosophy of Science Association, pp. 381–392.
  • –––, 2000, Philosophical Theories of Probability , London: Routledge.
  • Glass, D.H., 2013, “Confirmation Measures of Association Rule Interestingness”, Knowledge-Based Systems , 44: 65–77.
  • Glymour, C., 1980a, Theory and Evidence , Princeton (NJ): Princeton University Press.
  • –––, 1980b, “Hypothetico-Deductivism Is Hopeless”, Philosophy of Science , 47: 322–325.
  • Good, I.J., 1967, “The White Shoe Is a Red Herring”, British Journal for the Philosophy of Science , 17: 322.
  • –––, 1968, “Corroboration, Explanation, Evolving Probabilities, Simplicity, and a Sharpened Razor”, British Journal for the Philosophy of Science , 19: 123–43.
  • –––, 1971, “46656 Varieties of Bayesians”, American Statistician , 25 (5): 62–63.
  • Goodman, N., 1955, Fact, Fiction, and Forecast , Cambridge (MA): Harvard University Press, 1983.
  • Guala, F., 2012, “Experimentation in Economics”, in D. Gabbay, P. Thagard, J. Woods, and U. Mäki (eds.), Handbook of the Philosophy of Science: Philosophy of Economics , Amsterdam: Elsevier, pp. 597–640.
  • Hájek, A., 2008. “Arguments For—Or Against—Probabilism?”, British Journal for the Philosophy of Science , 59: 793–819.
  • –––, 2019, “Interpretations of Probability”, E.N. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Fall 2019 Edition), URL = < https://plato.stanford.edu/archives/fall2019/entries/probability-interpret/ >.
  • Hájek, A. and J. Joyce, 2008, “Confirmation”, in S. Psillos and M. Curd (eds.), Routledge Companion to the Philosophy of Science , New York: Routledge, pp. 115–29.
  • Hawthorne, J., 2005, “Degrees-of-Belief and Degrees-of-Support: Why Bayesians Need Both Notions”, Mind , 114: 277–320.
  • –––, 2011, “Confirmation Theory”, in D. Gabbay, P. Thagard, J. Woods, P.S. Bandyopadhyay, and M. Forster (eds.), Handbook of the Philosophy of Science: Philosophy of Statistics , Dordrecht: Elsevier, pp. 333–389.
  • –––, 2018, “Inductive Logic”, in E.N. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Spring 2018 Edition), URL = < https://plato.stanford.edu/archives/spr2018/entries/logic-inductive/ >.
  • Heckerman, D., 1988, “An Axiomatic Framework for Belief Updates”, in J.F. Lemmer and L.N. Kanal (eds.), Uncertainty in Artificial Intelligence 2, Amsterdam: North-Holland, pp. 11–22.
  • Hempel, C.G., 1937, “Le problème de la vérité”, Theoria , 3: 206–246.
  • –––, 1943, “A Purely Syntactical Definition of Confirmation”, Journal of Symbolic Logic , 8: 122–143.
  • –––, 1945, “Studies in the Logic of Confirmation”, Mind , 54: 1–26, 97–121.
  • Hesse, M. 1975, “Bayesian Methods and the Initial Probabilities of Theories”, in Maxwell and Anderson, 1975, pp. 50–105.
  • Hitchcock, C. and E. Sober, 2004, “Prediction versus Accommodation and the Risk of Overfitting”, British Journal for the Philosophy of Science , 55: 1–34.
  • Horvitz, E. and D. Heckerman, 1986, “The Inconsistent Use of Measures of Certainty in Artificial Intelligence Research”, in L.N. Kanal and J.F. Lemmer (eds.), Uncertainty in Artificial Intelligence , Amsterdam: North-Holland, pp. 137–151.
  • Horwich, P., 1983, “Explanations of Irrelevance”, in Earman, 1983, pp. 55–65.
  • Hosiasson-Lindenbaum, J., 1940, “On Confirmation”, Journal of Symbolic Logic , 5: 133–148.
  • Howson, C., 1990, “Fitting Theory to the Facts: Probably Not Such a Bad Idea After All”, in Savage, 1990, pp. 224–244.
  • –––, 1991, “The ‘Old Evidence’ Problem”, British Journal for the Philosophy of Science , 42: 547–555.
  • –––, 2000, Hume’s Problem: Induction and the Justification of Belief , Oxford: Oxford University Press.
  • Howson, C. and P. Urbach, 2006, Scientific Reasoning. The Bayesian Approach , La Salle (IL): Open Court.
  • Huber, F., 2008a, “Hempel’s Logic of Confirmation”, Philosophical Studies , 139: 181–189.
  • –––, 2008b, “Assessing Theories, Bayes Style”, Synthese , 161: 89–118.
  • Huber, F. and C. Schmidt-Petri, 2009, Degrees of Belief , Dordrecht: Springer.
  • Janssen, M., 2003, “COI Stories: Explanation and Evidence in the History of Science”, Perspectives on Science , 10: 457–522.
  • Jeffrey, R., 1995, “Probability Reparation: The Problem of New Explanation”, Philosophical Studies , 77: 97–101.
  • Joyce, J., 2019, “Bayes’ Theorem”, in E.N. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Spring 2019 Edition), URL = < https://plato.stanford.edu/archives/spr2019/entries/bayes-theorem/ >.
  • –––, 2011, “The Development of Subjective Bayesianism”, in Gabbay, Hartmann, and Woods, 2011, pp. 415–476.
  • Kelly, K., 2007, “A New Solution to the Puzzle of Simplicity”, Philosophy of Science , 74: 561–573.
  • –––, 2008, “Ockham’s Razor, Truth, and Information”, in D. Gabbay, P. Thagard, J. Woods, P. Adriaans, and J. van Benthem (eds.), Handbook of the Philosophy of Science: Philosophy of Information , Dordrecht: Elsevier, pp. 321–360.
  • Kelly, K. and C. Glymour, 2004, “Why Probability Does Not Capture the Logic of Scientific Justification”, in C. Hitchcock (ed.), Contemporary Debates in the Philosophy of Science , London: Blackwell, pp. 94–114.
  • Keynes, J., 1921, A Treatise on Probability , London: Macmillan.
  • Kitcher, P., 1993. The Advancement of Science , Oxford: Oxford University Press.
  • Kopec, M. and M. Titelbaum, 2016, “The Uniqueness Thesis”, The Philosophy Compass , 11: 189–200.
  • Kuhn, T., 1957, The Copernican Revolution , Cambridge (MA): Harvard University Press.
  • –––, 1962/1970, The Structure of Scientific Revolutions , Chicago: University of Chicago Press.
  • Kuipers, T., 2000, From Instrumentalism to Constructive Realism , Dordrecht: Reidel.
  • Kyburg, H.E. and C.M. Teng, 2001, Uncertain Inference , New York: Cambridge University Press.
  • Lakatos, I. and E. Zahar, 1975, “Why did Copernicus’ Research Programme Supersede Ptolemy’s?”, in R.S. Westman (ed.), The Copernican Achievement , Berkeley (CA): University of California Press, pp. 354–383 (reprinted in Lakatos, I., Philosophical Papers I: The Methodology of Scientific Research Programmes , Cambridge: Cambridge University Press, 1978, pp. 168–192).
  • Lange, M., 2011, “Hume and the Problem of Induction”, in Gabbay, Hartmann, and Woods, 2011, pp. 43–91.
  • Laudan, L., 1990, “Demystifying Underdetermination”, in Savage, 1990, pp. 267–297.
  • –––, 1997, “How About Bust? Factoring Explanatory Power Back into Theory Evaluation”, Philosophy of Science , 64: 206–216.
  • Laudan, L. and J. Leplin, 1991, “Empirical Equivalence and Underdetermination”, Journal of Philosophy , 88: 449–472.
  • Levi, I., 2008, “Degrees of Belief”, Journal of Logic and Computation , 18: 699–719.
  • Liebman, J., J. Fagan, V. West, and J. Lloyd, 2000, “Capital Attrition: Error Rates in Capital Cases, 1973–1995”, Texas Law Review , 78: 1839–1865.
  • Lipton, P., 2000, “Inference to the Best Explanation”, in W.H. Newton-Smith (ed.), A Companion to the Philosophy of Science , Oxford: Blackwell, pp. 184–193.
  • –––, 2004, Inference to the Best Explanation , London: Routledge.
  • –––, 2005, “Testing Hypotheses: Prediction and Prejudice”, Science , 307: 219–221.
  • Lo, Y., A. Sides, J. Rozelle, and D.N. Osherson, 2002, “Evidential Diversity and Premise Probability in Young Children’s Inductive Judgment”, Cognitive Science , 26: 181–206.
  • Maher, P., 1996, “Subjective and Objective Confirmation”, Philosophy of Science , 63: 149–174.
  • –––, 2006, “Confirmation Theory”, in D.M. Borchert (ed.), Encyclopedia of Philosophy (2nd edition), Detroit (MI): Macmillan Reference.
  • –––, 2010, “Explication of Inductive Probability”, Journal of Philosophical Logic , 39: 593–616.
  • Martens, R., 2009, “Harmony and Simplicity: Aesthetic Virtues and the Rise of Testability”, Studies in History and Philosophy of Science , 40: 258–266.
  • Maxwell, G. and R.M. Anderson, Jr. (eds.), 1975, Minnesota Studies in Philosophy of Science , Vol. 6: Induction, Probability, and Confirmation , Minneapolis: University of Minnesota Press.
  • Mayo, D., 1991, “Novel Evidence and Severe Tests”, Philosophy of Science , 58: 523–553.
  • –––, 1996, Error and the Growth of Experimental Knowledge , Chicago: University of Chicago Press.
  • –––, 2010, “Learning from Error: The Theoretical Significance of Experimental Knowledge”, The Modern Schoolman , 87: 191–217.
  • –––, 2014, “Some Surprising Facts about (the Problem of) Surprising Facts”, Studies in the History and Philosophy of Science , 45: 79–86.
  • Mayo, D. and A. Spanos (eds.), 2010, Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science , Cambridge and London: Cambridge University Press.
  • Meacham, C.J.G., 2014, “Impermissive Bayesianism”, Erkenntnis , 79: 1185–1217.
  • Meehl, P.E., 1990, “Appraising and Amending Theories: The Strategy of Lakatosian Defense and Two Principles That Warrant Using It”, Psychological Inquiry , 1: 108–141.
  • Mill, J.S., 1843/1872, A System of Logic , Honolulu: University Press of the Pacific, 2002.
  • Milne, P., 1996, “log[P(h∣eb)/P(h∣b)] Is the One True Measure of Confirmation”, Philosophy of Science , 63: 21–26.
  • –––, 2014, “Information, Confirmation, and Conditionals”, Journal of Applied Logic , 12: 252–262.
  • Moretti, L., 2006, “The Tacking by Disjunction Paradox: Bayesianism versus Hypothetico-Deductivism”, Erkenntnis , 64: 115–138.
  • Morey, R.D., J.-W. Romeijn, and J.N. Rouder, 2016, “The Philosophy of Bayes Factors and the Quantification of Statistical Evidence”, Journal of Mathematical Psychology , 72: 6–18.
  • Mura, A., 2008, “Can Logical Probability Be Viewed as a Measure of Degrees of Partial Entailment?”, Logic & Philosophy of Science 6: 25–33.
  • Musgrave, A., 1974, “Logical versus Historical Theories of Confirmation”, British Journal for the Philosophy of Science , 25: 1–23.
  • –––, 2010, “Critical Rationalism, Explanation, and Severe Tests”, in D. Mayo and A. Spanos, 2010, pp. 88–112.
  • Nicod, J., 1924, Le problème logique de l’induction , Paris: Alcan. (Engl. transl. “The Logical Problem of Induction”, in Foundations of Geometry and Induction , London: Routledge, 2000.)
  • Norton, J., 2005, “A Little Survey on Induction”, in P. Achinstein (ed.), Scientific Evidence: Philosophical Theories and Applications , Baltimore: Johns Hopkins University Press, pp. 9–34.
  • –––, 2011a, “History of Science and the Material Theory of Induction: Einstein’s Quanta, Mercury’s Perihelion”, European Journal for Philosophy of Science , 1: 3–27.
  • –––, 2011b, “Challenges to Bayesian Confirmation Theory”, in D. Gabbay, P. Thagard, J. Woods, S. Bandyopadhyay, and M. Forster (eds.), Handbook of the Philosophy of Science: Philosophy of Statistics , Amsterdam: Elsevier, pp. 391–440.
  • Oaksford, M. and N. Chater, 2007, Bayesian Rationality , Oxford: Clarendon Press.
  • Okasha, S., 2011, “Experiment, Observation, and the Confirmation of Laws”, Analysis , 71: 222–232.
  • Osherson, D., E.E. Smith, and E. Shafir, 1986, “Some Origins of Belief”, Cognition , 24: 197–224.
  • Paris, J., 2011, “Pure Inductive Logic”, in L. Horsten and R. Pettigrew (eds.), The Continuum Companion to Philosophical Logic , London: Continuum, pp. 428–449.
  • Park, I., 2014, “Confirmation Measures and Collaborative Belief Updating”, Synthese , 191: 3955–3975.
  • Peijnenburg, J., 2012, “A Case of Confusing Probability and Confirmation”, Synthese , 184: 101–107.
  • Peirce, C.S., 1878, “The Probability of Induction”, in Philosophical Writings of Peirce (edited by J. Buchler), New York: Dover, pp. 174–189.
  • Pettigrew, R., 2016, Accuracy and the Laws of Credence , Oxford: Oxford University Press.
  • Popper, K.R., 1934/1959, The Logic of Scientific Discovery , London: Routledge, 2002.
  • –––, 1954, “Degree of Corroboration”, British Journal for the Philosophy of Science , 5: 143–149.
  • –––, 1963, Conjectures and Refutations , London: Routledge, 2002.
  • Pruss, A.R., 2014, “Independent Tests and the Log-Likelihood Ratio Measure of Confirmation”, Thought , 3: 124–135.
  • Quine, W.v.O., 1951, “Two Dogmas of Empiricism”, in From a Logical Point of View , Cambridge (MA): Harvard University Press, pp. 20–46.
  • –––, 1970, “Natural Kinds”, in N. Rescher (ed.), Essays in Honor of Carl G. Hempel , Dordrecht: Reidel, pp. 41–56.
  • Quine, W.v.O. and J. Ullian, 1970, The Web of Belief , New York: Random House.
  • Riesch, H., 2010, “Simple or Simplistic? Scientists’ Views on Occam’s Razor”, Theoria , 67: 75–90.
  • Rinard, S., 2014, “A New Bayesian Solution to the Problem of the Ravens”, Philosophy of Science , 81: 81–100.
  • Roche, W. and T. Shogenji, 2014, “Dwindling Confirmation”, Philosophy of Science , 81: 114–137.
  • Rowbottom, D.P., 2010, “Corroboration and Auxiliary Hypotheses: Duhem’s Thesis Revisited”, Synthese , 177: 139–149.
  • Royall, R., 1997, Statistical Evidence: A Likelihood Paradigm , London: Chapman & Hall.
  • Rusconi, P., M. Marelli, M. D’Addario, S. Russo, and P. Cherubini, 2014, “Evidence Evaluation: Measure Z Corresponds to Human Utility Judgments Better than Measure L and Optimal-Experimental-Design Models”, Journal of Experimental Psychology: Learning, Memory, and Cognition , 40: 703–723.
  • Russell, B., 1912, The Problems of Philosophy , Oxford: Oxford University Press, 1997.
  • Salmon, W.C., 1969, “Partial Entailment as a Basis for Inductive Logic”, in N. Rescher (ed.), Essays in Honor of Carl G. Hempel , Dordrecht: Reidel, pp. 47–81.
  • –––, 1975, “Confirmation and Relevance”, in Maxwell and Anderson, 1975, pp. 3–36.
  • –––, 2001, “Explanation and Confirmation: A Bayesian Critique of Inference to the Best Explanation”, in G. Hon and S.S. Rakover (eds.), Explanation: Theoretical Approaches and Applications , Dordrecht: Kluwer, pp. 61–91.
  • Savage, C.W. (ed.), 1990, Minnesota Studies in the Philosophy of Science , Vol. 14: Scientific Theories , Minneapolis: University of Minnesota Press.
  • Schippers, M., 2017, “A Representation Theorem for Absolute Confirmation”, Philosophy of Science , 84: 82–91.
  • Schlosshauer, M. and G. Wheeler, 2011, “Focussed Correlation, Confirmation, and the Jigsaw Puzzle of Variable Evidence”, Philosophy of Science , 78: 376–392.
  • Schurz, G., 1991, “Relevant Deduction”, Erkenntnis , 35: 391–437.
  • –––, 1994, “Relevant Deduction and Hypothetico-Deductivism: A Reply to Gemes”, Erkenntnis , 41: 183–188.
  • –––, 2005, “Bayesian H-D Confirmation and Structuralistic Truthlikeness: Discussion and Comparison with the Relevant-Element and the Content-Part Approach”, in R. Festa, A. Aliseda, and J. Peijnenburg (eds.), Confirmation, Empirical Progress, and Truth Approximation. Essays in Debate with Theo Kuipers , Vol. I, Amsterdam: Rodopi, pp. 141–159.
  • –––, 2014, “Bayesian Pseudo-Confirmation, Use-Novelty, and Genuine Confirmation”, Studies in the History and Philosophy of Science , 45: 87–96.
  • Schwartz, R., 2011, “Goodman and the Demise of Syntactic and Semantic Models”, in Gabbay, Hartmann, and Woods, 2011, pp. 391–413.
  • Scorzato, L., 2013, “On the Role of Simplicity in Science”, Synthese , 190: 2867–2895.
  • Shogenji, T., 2012, “The Degree of Epistemic Justification and the Conjunction Fallacy”, Synthese , 184: 29–48.
  • Skyrms, B., 1983, “Three Ways to Give a Probability Assignment a Memory”, in Earman, 1983, pp. 157–162.
  • –––, 1987, “Coherence”, in N. Rescher (ed.) Scientific Inquiry in Philosophical Perspective , Pittsburgh: University of Pittsburgh Press, pp. 225–242.
  • Sober, E., 1975, Simplicity , Oxford: Clarendon Press.
  • –––, 1990, “Contrastive Empiricism”, in Savage, 1990, pp. 392–412.
  • –––, 1994, “No Model, No Inference: A Bayesian Primer on the Grue Problem”, in D. Stalker (ed.), Grue! The New Riddle of Induction , Chicago (IL): Open Court, pp. 225–240.
  • Spohn, W., 2012, The Laws of Belief , Oxford: Oxford University Press.
  • Sprenger, J., 2011a, “Hempel and the Paradoxes of Confirmation”, in Gabbay, Hartmann, and Woods, 2011, pp. 231–260.
  • –––, 2011b, “Hypothetico-Deductive Confirmation”, Philosophy Compass , 6: 497–508.
  • –––, 2015, “A Novel Solution to the Problem of Old Evidence”, Philosophy of Science , 82: 383–401.
  • Sprenger, J. and S. Hartmann, 2019, Bayesian Philosophy of Science , Oxford: Oxford University Press.
  • Steel, D., 2007, “Bayesian Confirmation Theory and the Likelihood Principle”, Synthese , 156: 55–77.
  • Steele, K. and C. Werndl, 2013, “Climate Models, Calibration, and Confirmation”, British Journal for the Philosophy of Science , 64: 609–635.
  • Strasser, C. and G.A. Antonelli, 2019, “Non-monotonic Logic”, The Stanford Encyclopedia of Philosophy (Summer 2019 Edition), E.N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/sum2019/entries/logic-nonmonotonic/ >.
  • Strevens, M., 2001, “The Bayesian Treatment of Auxiliary Hypotheses”, British Journal for the Philosophy of Science , 52: 515–537.
  • Tentori, K., V. Crupi, and D. Osherson, 2007, “Determinants of Confirmation”, Psychonomic Bulletin & Review , 14: 877–83.
  • –––, 2010, “Second-order Probability Affects Hypothesis Confirmation”, Psychonomic Bulletin & Review , 17: 129–34.
  • Thagard, P.R., 1978, “The Best Explanation: Criteria for Theory Choice”, Journal of Philosophy , 75: 76–92.
  • Törnebohm, H., 1966, “Two Measures of Evidential Strength”, in J. Hintikka and P. Suppes (eds.), Aspects of Inductive Logic , Amsterdam: North-Holland, pp. 81–95.
  • van Enk, S.J., 2014, “Bayesian Measures of Confirmation from Scoring Rules”, Philosophy of Science , 81: 101–113.
  • van Fraassen, B.C., 1988, “The Problem of Old Evidence”, in D.F. Austin (ed.), Philosophical Analysis , Dordrecht: Reidel, pp. 153–165.
  • –––, 1989, Laws and Symmetry , Oxford: Oxford University Press.
  • Varzi, A.C., 2008, “Patterns, Rules, and Inferences”, in J.A. Adler and L.J. Rips (eds.), Reasoning: Studies in Human Inference and Its Foundations , Cambridge: Cambridge University Press, pp. 282–290.
  • Vassend, O.B., 2015, “Confirmation Measures and Sensitivity”, Philosophy of Science , 82: 892–904.
  • Viale, R. and D. Osherson, 2000, “The Diversity Principle and the Little Scientist Hypothesis”, Foundations of Science , 5: 239–253.
  • Vineberg, S., 2016, “Dutch Book Arguments”, in E.N. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Spring 2016 Edition), URL = < https://plato.stanford.edu/archives/spr2016/entries/dutch-book/ >.
  • Votsis, I., 2014, “Objectivity in Confirmation: Post Hoc Monsters and Novel Predictions”, Studies in the History and Philosophy of Science , 45: 70–78.
  • Wagner, C.G., 2001, “Old Evidence and New Explanation III”, Philosophy of Science , 68: S165–S175.
  • Weisberg, J., 2015, “You’ve Come a Long Way, Bayesians”, Journal of Philosophical Logic , 44 (The Fortieth Anniversary Issue): 817–834.
  • Whewell, W., 1840/1847, The Philosophy of the Inductive Sciences, Founded Upon Their History , Charleston (SC): BiblioBazaar, 2011.
  • Williamson, J., 2011, “An Objective Bayesian Account of Confirmation”, in D. Dieks, W.J. Gonzalez, S. Hartmann, T. Uebel, and M. Weber (eds.), Explanation, Prediction, and Confirmation , Berlin: Springer, 2011, pp. 53–81.
  • Williamson, T., 2002, Knowledge and Its Limits , Oxford: Oxford University Press.
  • Winters, B., J. Custer, S.M. Galvagno Jr, E. Colantuoni, S.G. Kapoor, H. Lee, V. Goode, K. Robinson, A. Nakhasi, P. Pronovost, and D. Newman-Toker, 2012, “Diagnostic Errors in the Intensive Care Unit: A Systematic Review of Autopsy Studies”, BMJ Quality and Safety , 21: 894–902.
  • Woodward, J., 1983, “Glymour on Theory Confirmation”, Philosophical Studies , 43: 147–152.
  • –––, 2019, “Scientific Explanation”, The Stanford Encyclopedia of Philosophy (Winter 2019 Edition), E.N. Zalta (ed.), URL= < https://plato.stanford.edu/archives/win2019/entries/scientific-explanation/ >.
  • Worrall, J., 1978, “The Ways in Which the Methodology of Scientific Research Programmes Improves on Popper’s Methodology”, in G. Radnitzky and G. Andersson (eds.), Progress and Rationality in Science , Dordrecht: Reidel, 1978, pp. 45–70.
  • –––, 1982, “Broken Bootstraps”, Erkenntnis , 18: 105–130.
  • –––, 1990, “Scientific Revolutions and Scientific Rationality: The Case of the ‘Elderly Holdout’”, in Savage, 1990, pp. 319–354.
  • –––, 1993, “Falsification, Rationality, and the Duhem Problem”, in J. Earman, G.J. Massey, and N. Rescher (eds.), Philosophical Problems of the Internal and External World: Essays on the Philosophy of A. Grünbaum , Pittsburgh: University of Pittsburgh Press, pp. 329–370.
  • –––, 1996, “‘Revolution in Permanence’: Popper on Theory-Change in Science”, in A. O’Hear (ed.), Karl Popper: Philosophy and Problems , Cambridge: Cambridge University Press, pp. 75–102.
  • –––, 2010, “Evidence: Philosophy of Science Meets Medicine”, Journal of Evaluation in Clinical Practice , 16: 356–362.
  • Zabell, S., 2011, “Carnap and the Logic of Induction”, in Gabbay, Hartmann, and Woods, 2011, pp. 265–310.
  • Zahar, E., 1973, “Why Did Einstein’s Programme Supersede Lorentz’s?”, British Journal for the Philosophy of Science , 24: 95–123, 223–262.
  • Zellner, A., H. Keuzenkamp, and M. McAleer (eds.), 2002, Simplicity, Inference, and Modelling: Keeping It Sophisticatedly Simple , Cambridge: Cambridge University Press.
Other Internet Resources

  • Hawthorne, J. and B. Fitelson, 2010, An Even Better Solution to the Paradox of the Ravens , manuscript available online (PDF).
  • Milne, P., 2010, Measuring Confirmation (PDF), slides for a talk at the 7th Annual Formal Epistemology Workshop, Konstanz, 2–4 September 2010.

Related Entries

Carnap, Rudolf | epistemology: Bayesian | evidence | Hempel, Carl | induction: problem of | logic: inductive | probability, interpretations of | statistics, philosophy of

Acknowledgments

I would like to thank Gustavo Cevolani, Paul Dicken, and Jan Sprenger for useful comments on previous drafts of this entry, and Prof. Wonbae Choi for helping me correct a mistake.

Copyright © 2020 by Vincenzo Crupi <vincenzo.crupi@unito.it>



How to Write a Great Hypothesis

Hypothesis Definition, Format, Examples, and Tips

Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."


  • The Scientific Method
  • Hypothesis Format
  • Falsifiability of a Hypothesis
  • Operationalization
  • Hypothesis Types
  • Hypothesis Examples
  • Collecting Data

A hypothesis is a tentative statement about the relationship between two or more variables. It is a specific, testable prediction about what you expect to happen in a study. It is a preliminary answer to your question that helps guide the research process.

Consider a study designed to examine the relationship between sleep deprivation and test performance. The hypothesis might be: "Sleep-deprived people will perform worse on a test than individuals who are not sleep-deprived."

At a Glance

A hypothesis is crucial to scientific research because it offers a clear direction for what the researchers are looking to find. This allows them to design experiments to test their predictions and add to our scientific knowledge about the world. This article explores how a hypothesis is used in psychology research, how to write a good hypothesis, and the different types of hypotheses you might use.

The Hypothesis in the Scientific Method

In the scientific method , whether it involves research in psychology, biology, or some other area, a hypothesis represents what the researchers think will happen in an experiment. The scientific method involves the following steps:

  • Forming a question
  • Performing background research
  • Creating a hypothesis
  • Designing an experiment
  • Collecting data
  • Analyzing the results
  • Drawing conclusions
  • Communicating the results

The hypothesis is a prediction, but it involves more than a guess. Most of the time, the hypothesis begins with a question that is then explored through background research. Only at that point do researchers begin to develop a testable hypothesis.

Unless you are creating an exploratory study, your hypothesis should always explain what you  expect  to happen.

In a study exploring the effects of a particular drug, the hypothesis might be that researchers expect the drug to have some type of effect on the symptoms of a specific illness. In psychology, the hypothesis might focus on how a certain aspect of the environment might influence a particular behavior.

Remember, a hypothesis does not have to be correct. While the hypothesis predicts what the researchers expect to see, the goal of the research is to determine whether this guess is right or wrong. When conducting an experiment, researchers might explore numerous factors to determine which ones might contribute to the ultimate outcome.

In many cases, researchers may find that the results of an experiment  do not  support the original hypothesis. When writing up these results, the researchers might suggest other options that should be explored in future studies.

In many cases, researchers might draw a hypothesis from a specific theory or build on previous research. For example, prior research has shown that stress can impact the immune system. So a researcher might hypothesize: "People with high-stress levels will be more likely to contract a common cold after being exposed to the virus than people who have low-stress levels."

In other instances, researchers might look at commonly held beliefs or folk wisdom. "Birds of a feather flock together" is one example of a folk adage that a psychologist might try to investigate. The researcher might pose a specific hypothesis: "People tend to select romantic partners who are similar to them in interests and educational level."

Elements of a Good Hypothesis

So how do you write a good hypothesis? When trying to come up with a hypothesis for your research or experiments, ask yourself the following questions:

  • Is your hypothesis based on your research on a topic?
  • Can your hypothesis be tested?
  • Does your hypothesis include independent and dependent variables?

Before you come up with a specific hypothesis, spend some time doing background research. Once you have completed a literature review, start thinking about potential questions you still have. Pay attention to the discussion section in the journal articles you read. Many authors will suggest questions that still need to be explored.

How to Formulate a Good Hypothesis

To form a hypothesis, you should take these steps:

  • Collect as many observations about a topic or problem as you can.
  • Evaluate these observations and look for possible causes of the problem.
  • Create a list of possible explanations that you might want to explore.
  • After you have developed some possible hypotheses, think of ways that you could confirm or disprove each hypothesis through experimentation. This is known as falsifiability.

In the scientific method, falsifiability is an important part of any valid hypothesis. In order to test a claim scientifically, it must be possible that the claim could be proven false.

Students sometimes confuse falsifiability with the claim that something is false, which is not the case. Falsifiability means that if something were false, then it would be possible to demonstrate that it is false.

One of the hallmarks of pseudoscience is that it makes claims that cannot be refuted or proven false.

The Importance of Operational Definitions

A variable is a factor or element that can be changed and manipulated in ways that are observable and measurable. However, the researcher must also define how the variable will be manipulated and measured in the study.

Operational definitions are specific definitions for all relevant factors in a study. This process helps make vague or ambiguous concepts detailed and measurable.

For example, a researcher might operationally define the variable " test anxiety " as the results of a self-report measure of anxiety experienced during an exam. A "study habits" variable might be defined by the amount of studying that actually occurs as measured by time.

These precise descriptions are important because many things can be measured in various ways. Clearly defining these variables and how they are measured helps ensure that other researchers can replicate your results.

Replicability

One of the basic principles of any type of scientific research is that the results must be replicable.

Replication means repeating an experiment in the same way to produce the same results. By clearly detailing the specifics of how the variables were measured and manipulated, other researchers can better understand the results and repeat the study if needed.

Some variables are more difficult than others to define. For example, how would you operationally define a variable such as aggression ? For obvious ethical reasons, researchers cannot create a situation in which a person behaves aggressively toward others.

To measure this variable, the researcher must devise a measurement that assesses aggressive behavior without harming others. The researcher might utilize a simulated task to measure aggressiveness in this situation.
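To make this concrete, here is a minimal sketch in Python of what an operational definition can look like in practice. The five-item questionnaire and its scoring rule are hypothetical, invented purely for illustration.

```python
# A hypothetical operationalization of "test anxiety": the total of five
# 1-5 Likert ratings collected immediately before an exam, giving each
# participant a score between 5 and 25.

N_ITEMS, LIKERT_MIN, LIKERT_MAX = 5, 1, 5

def test_anxiety_score(responses):
    """Return the operationalized test-anxiety score for one participant."""
    if len(responses) != N_ITEMS:
        raise ValueError("expected exactly %d item responses" % N_ITEMS)
    if any(not (LIKERT_MIN <= r <= LIKERT_MAX) for r in responses):
        raise ValueError("each response must be on the 1-5 scale")
    return sum(responses)

# One participant's (made-up) responses:
print(test_anxiety_score([4, 3, 5, 2, 4]))  # -> 18
```

Because the scoring rule is spelled out exactly, another researcher could administer the same items and reproduce the measurement, which is what replicability requires.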

Hypothesis Checklist

  • Does your hypothesis focus on something that you can actually test?
  • Does your hypothesis include both an independent and dependent variable?
  • Can you manipulate the variables?
  • Can your hypothesis be tested without violating ethical standards?

Hypothesis Types

The hypothesis you use will depend on what you are investigating and hoping to find. Some of the main types of hypotheses that you might use include:

  • Simple hypothesis : This type of hypothesis suggests there is a relationship between one independent variable and one dependent variable.
  • Complex hypothesis : This type suggests a relationship between three or more variables, such as two independent and dependent variables.
  • Null hypothesis : This hypothesis suggests no relationship exists between two or more variables.
  • Alternative hypothesis : This hypothesis states the opposite of the null hypothesis.
  • Statistical hypothesis : This hypothesis uses statistical analysis to evaluate a representative population sample and then generalizes the findings to the larger group.
  • Logical hypothesis : This hypothesis assumes a relationship between variables without collecting data or evidence.

Hypothesis Format

A hypothesis often follows a basic format of "If {this happens} then {this will happen}." One way to structure your hypothesis is to describe what will happen to the dependent variable if you change the independent variable .

The basic format might be: "If {these changes are made to a certain independent variable}, then we will observe {a change in a specific dependent variable}."

A few examples of simple hypotheses:

  • "Students who eat breakfast will perform better on a math exam than students who do not eat breakfast."
  • "Students who experience test anxiety before an English exam will get lower scores than students who do not experience test anxiety."​
  • "Motorists who talk on the phone while driving will be more likely to make errors on a driving course than those who do not talk on the phone."
  • "Children who receive a new reading intervention will have higher reading scores than students who do not receive the intervention."

Examples of a complex hypothesis include:

  • "People with high-sugar diets and sedentary activity levels are more likely to develop depression."
  • "Younger people who are regularly exposed to green, outdoor areas have better subjective well-being than older adults who have limited exposure to green spaces."

Examples of a null hypothesis include:

  • "There is no difference in anxiety levels between people who take St. John's wort supplements and those who do not."
  • "There is no difference in scores on a memory recall task between children and adults."
  • "There is no difference in aggression levels between children who play first-person shooter games and those who do not."

Examples of an alternative hypothesis:

  • "People who take St. John's wort supplements will have less anxiety than those who do not."
  • "Adults will perform better on a memory task than children."
  • "Children who play first-person shooter games will show higher levels of aggression than children who do not." 

Collecting Data on Your Hypothesis

Once a researcher has formed a testable hypothesis, the next step is to select a research design and start collecting data. The research method depends largely on exactly what the researcher is studying. There are two basic types of research methods: descriptive research and experimental research.

Descriptive Research Methods

Descriptive research methods such as case studies , naturalistic observations , and surveys are often used when conducting an experiment is difficult or impossible. These methods are best used to describe different aspects of a behavior or psychological phenomenon.

Once a researcher has collected data using descriptive methods, a  correlational study  can examine how the variables are related. This research method might be used to investigate a hypothesis that is difficult to test experimentally.

Experimental Research Methods

Experimental methods  are used to demonstrate causal relationships between variables. In an experiment, the researcher systematically manipulates a variable of interest (known as the independent variable) and measures the effect on another variable (known as the dependent variable).

Unlike correlational studies, which can only be used to determine if there is a relationship between two variables, experimental methods can be used to determine the actual nature of the relationship—whether changes in one variable actually  cause  another to change.
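The contrast can be made vivid with a small simulation (a sketch with made-up numbers, not real data): in observational data, a hidden third variable can produce a correlation between two variables that have no causal link, whereas random assignment of the "treatment" breaks that pattern. The variable names here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# --- Correlational setting: a hidden confounder drives both variables.
motivation = rng.normal(size=n)              # unmeasured third variable
studying = motivation + rng.normal(size=n)   # influenced by motivation
grades = motivation + rng.normal(size=n)     # also influenced by motivation
print(np.corrcoef(studying, grades)[0, 1])   # clearly positive (~0.5)

# --- Experimental setting: "studying" is assigned at random, so it cannot
# correlate with motivation; any remaining association with grades would
# have to be causal. Here there is none by construction.
studying_assigned = rng.normal(size=n)       # randomized treatment
grades2 = motivation + rng.normal(size=n)    # unaffected by the treatment
print(np.corrcoef(studying_assigned, grades2)[0, 1])  # near 0
```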

The hypothesis is a critical part of any scientific exploration. It represents what researchers expect to find in a study or experiment. In situations where the hypothesis is unsupported by the research, the research still has value. Such research helps us better understand how different aspects of the natural world relate to one another. It also helps us develop new hypotheses that can then be tested in the future.

Thompson WH, Skau S. On the scope of scientific hypotheses .  R Soc Open Sci . 2023;10(8):230607. doi:10.1098/rsos.230607

Taran S, Adhikari NKJ, Fan E. Falsifiability in medicine: what clinicians can learn from Karl Popper [published correction appears in Intensive Care Med. 2021 Jun 17;:].  Intensive Care Med . 2021;47(9):1054-1056. doi:10.1007/s00134-021-06432-z

Eyler AA. Research Methods for Public Health . 1st ed. Springer Publishing Company; 2020. doi:10.1891/9780826182067.0004

Nosek BA, Errington TM. What is replication?  PLoS Biol . 2020;18(3):e3000691. doi:10.1371/journal.pbio.3000691

Aggarwal R, Ranganathan P. Study designs: Part 2 - Descriptive studies .  Perspect Clin Res . 2019;10(1):34-36. doi:10.4103/picr.PICR_154_18

Nevid J. Psychology: Concepts and Applications. Wadsworth, 2013.


What is a scientific hypothesis?

It's the initial building block in the scientific method.


  • Hypothesis Basics
  • What Makes a Hypothesis Testable
  • Types of Hypotheses
  • Hypothesis Versus Theory
  • Additional Resources
  • Bibliography

A scientific hypothesis is a tentative, testable explanation for a phenomenon in the natural world. It's the initial building block in the scientific method . Many describe it as an "educated guess" based on prior knowledge and observation. While this is true, a hypothesis is more informed than a guess: whereas an "educated guess" suggests an off-the-cuff prediction drawn merely from a person's expertise, developing a hypothesis requires active observation and background research.

The basic idea of a hypothesis is that there is no predetermined outcome. For a solution to be termed a scientific hypothesis, it has to be an idea that can be supported or refuted through carefully crafted experimentation or observation. This concept, called falsifiability and testability, was advanced in the mid-20th century by Austrian-British philosopher Karl Popper in his famous book "The Logic of Scientific Discovery" (Routledge, 1959).

A key function of a hypothesis is to derive predictions about the results of future experiments and then perform those experiments to see whether they support the predictions.

A hypothesis is usually written in the form of an if-then statement, which gives a possibility (if) and explains what may happen because of the possibility (then). The statement could also include "may," according to California State University, Bakersfield .

Here are some examples of hypothesis statements:

  • If garlic repels fleas, then a dog that is given garlic every day will not get fleas.
  • If sugar causes cavities, then people who eat a lot of candy may be more prone to cavities.
  • If ultraviolet light can damage the eyes, then maybe this light can cause blindness.

A useful hypothesis should be testable and falsifiable. That means that it should be possible to prove it wrong. A theory that can't be proved wrong is nonscientific, according to Karl Popper's 1963 book " Conjectures and Refutations ."

An example of an untestable statement is, "Dogs are better than cats." That's because the definition of "better" is vague and subjective. However, an untestable statement can be reworded to make it testable. For example, the previous statement could be changed to this: "Owning a dog is associated with higher levels of physical fitness than owning a cat." With this statement, the researcher can take measures of physical fitness from dog and cat owners and compare the two.

Types of scientific hypotheses


In an experiment, researchers generally state their hypotheses in two ways. The null hypothesis predicts that there will be no relationship between the variables tested, or no difference between the experimental groups. The alternative hypothesis predicts the opposite: that there will be a difference between the experimental groups. This is usually the hypothesis scientists are most interested in, according to the University of Miami .

For example, a null hypothesis might state, "There will be no difference in the rate of muscle growth between people who take a protein supplement and people who don't." The alternative hypothesis would state, "There will be a difference in the rate of muscle growth between people who take a protein supplement and people who don't."

If the results of the experiment show a relationship between the variables, then the null hypothesis has been rejected in favor of the alternative hypothesis, according to the book " Research Methods in Psychology " (​​BCcampus, 2015). 

There are other ways to describe an alternative hypothesis. The alternative hypothesis above does not specify a direction of the effect, only that there will be a difference between the two groups. That type of prediction is called a two-tailed hypothesis. If a hypothesis specifies a certain direction — for example, that people who take a protein supplement will gain more muscle than people who don't — it is called a one-tailed hypothesis, according to William M. K. Trochim , a professor of Policy Analysis and Management at Cornell University.
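As a rough sketch of how such competing predictions are tested in practice, the protein-supplement example might be run through a two-sample t-test. The data here are simulated, and the `alternative` keyword assumes SciPy 1.6 or later; it switches between the two-tailed and one-tailed versions of the test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated muscle growth (kg) for two groups of 40; the supplement group
# is generated with a higher true mean, so the null hypothesis is false
# by construction here.
supplement = rng.normal(loc=1.2, scale=0.8, size=40)
control = rng.normal(loc=0.8, scale=0.8, size=40)

# Two-tailed test: H1 says only that the group means differ.
stat, p_two = stats.ttest_ind(supplement, control)

# One-tailed test: H1 says the supplement group gains MORE muscle.
stat, p_one = stats.ttest_ind(supplement, control, alternative="greater")

print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
# Conventionally, the null hypothesis is rejected when p < 0.05.
```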

Sometimes, errors take place during an experiment. These errors can happen in one of two ways. A type I error is when the null hypothesis is rejected when it is true. This is also known as a false positive. A type II error occurs when the null hypothesis is not rejected when it is false. This is also known as a false negative, according to the University of California, Berkeley . 
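A short simulation makes the type I error rate tangible: if the null hypothesis is in fact true and we reject it whenever p < 0.05, we will be wrong (a false positive) on roughly 5% of repeated experiments. This is only an illustrative sketch with simulated data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
alpha, n_experiments, false_positives = 0.05, 2000, 0

for _ in range(n_experiments):
    # The null hypothesis is TRUE here: both samples come from the
    # same distribution, so any "significant" result is a fluke.
    a = rng.normal(size=30)
    b = rng.normal(size=30)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:              # rejecting a true null hypothesis ...
        false_positives += 1   # ... is a type I error (false positive)

print(false_positives / n_experiments)  # close to alpha, i.e. about 0.05
```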

A hypothesis can be rejected or modified, but it can never be proved correct 100% of the time. For example, a scientist can form a hypothesis stating that if a certain type of tomato has a gene for red pigment, that type of tomato will be red. During research, the scientist then finds that each tomato of this type is red. Though the findings confirm the hypothesis, there may be a tomato of that type somewhere in the world that isn't red. Thus, the findings support the hypothesis, but they cannot prove it true with absolute certainty.

Scientific theory vs. scientific hypothesis

The best hypotheses are simple. They deal with a relatively narrow set of phenomena. But theories are broader; they generally combine multiple hypotheses into a general explanation for a wide range of phenomena, according to the University of California, Berkeley . For example, a hypothesis might state, "If animals adapt to suit their environments, then birds that live on islands with lots of seeds to eat will have differently shaped beaks than birds that live on islands with lots of insects to eat." After testing many hypotheses like these, Charles Darwin formulated an overarching theory: the theory of evolution by natural selection.

"Theories are the ways that we make sense of what we observe in the natural world," Tanner said. "Theories are structures of ideas that explain and interpret facts." 

  • Read more about writing a hypothesis, from the American Medical Writers Association.
  • Find out why a hypothesis isn't always necessary in science, from The American Biology Teacher.
  • Learn about null and alternative hypotheses, from Prof. Essa on YouTube .

Encyclopedia Britannica. Scientific Hypothesis. Jan. 13, 2022. https://www.britannica.com/science/scientific-hypothesis

Karl Popper, "The Logic of Scientific Discovery," Routledge, 1959.

California State University, Bakersfield, "Formatting a testable hypothesis." https://www.csub.edu/~ddodenhoff/Bio100/Bio100sp04/formattingahypothesis.htm  

Karl Popper, "Conjectures and Refutations," Routledge, 1963.

Price, P., Jhangiani, R., & Chiang, I., "Research Methods of Psychology — 2nd Canadian Edition," BCcampus, 2015.‌

University of Miami, "The Scientific Method" http://www.bio.miami.edu/dana/161/evolution/161app1_scimethod.pdf  

William M.K. Trochim, "Research Methods Knowledge Base," https://conjointly.com/kb/hypotheses-explained/  

University of California, Berkeley, "Multiple Hypothesis Testing and False Discovery Rate" https://www.stat.berkeley.edu/~hhuang/STAT141/Lecture-FDR.pdf  

University of California, Berkeley, "Science at multiple levels" https://undsci.berkeley.edu/article/0_0_0/howscienceworks_19



Internet Encyclopedia of Philosophy

Confirmation and Induction

The term “confirmation” is used in epistemology and the philosophy of science whenever observational data and evidence “speak in favor of” or support scientific theories and everyday hypotheses. Historically, confirmation has been closely related to the problem of induction, the question of what to believe regarding the future in the face of knowledge that is restricted to the past and present. One view of the relation between confirmation and induction is that the conclusion H of an inductively strong argument with premise E is confirmed by E . If inductive strength comes in degrees and the inductive strength of the argument with premise E and conclusion H is equal to r , then the degree of confirmation of H by E is likewise said to be equal to r .

This article begins by briefly reviewing Hume's formulation of the problem of the justification of induction. Then it jumps to the middle of the twentieth century and Hempel's pioneering work on confirmation. After looking at Popper's falsificationism and the hypothetico-deductive method of hypothesis testing, the notion of probability, as it was defined by Kolmogorov, is introduced. Probability theory is the main mathematical tool for Carnap's inductive logic as well as for Bayesian confirmation theory. Carnap's inductive logic is based on a logical interpretation of probability, which is discussed at some length. However, his heroic efforts to construct a logical probability measure in purely syntactical terms can be considered to have failed. Goodman's new riddle of induction serves to illustrate the shortcomings of such a purely syntactical approach to confirmation. Carnap's work is nevertheless important because today's most popular theory of confirmation—Bayesian confirmation theory—is to a great extent the result of replacing Carnap's logical interpretation of probability with a subjective interpretation as degree of belief qua fair betting ratio. The rest of the article is mainly concerned with Bayesian confirmation theory, although the final section mentions some alternative views on confirmation and induction.

Table of Contents

  • Introduction: Confirmation and Induction
  • The Ravens Paradox
  • The Logic of Confirmation
  • Popper’s Falsificationism
  • Hypothetico-Deductive Confirmation
  • Kolmogorov’s Axiomatization
  • Logical Probability and Degree of Confirmation
  • Absolute and Incremental Confirmation
  • Carnap’s Analysis of Hempel’s Conditions
  • The New Riddle of Induction and the Demise of the Syntactic Approach
  • Subjective Probability and the Dutch Book Argument
  • Confirmation Measures
  • Some Success Stories
  • Taking Stock
  • References and Further Reading

1. Introduction: Confirmation and Induction

Whenever observational data and evidence speak in favor of, or support, scientific theories or everyday hypotheses, the latter are said to be confirmed by the former. The positive result of an allergy test speaks in favor of, or confirms, the hypothesis that the tested person has the allergy that is tested for. The dark clouds on the sky support, or confirm, the hypothesis that it will be raining soon.

Confirmation takes a qualitative and a quantitative form. Qualitative confirmation is usually construed as a relation, among other things, between three sentences or propositions: evidence E confirms hypothesis H relative to background information B. Quantitative confirmation is, among other things, a relation between evidence E, hypothesis H, background information B, and a number r: E confirms H relative to B to degree r. (Comparative confirmation—H1 is more confirmed by E1 relative to B1 than H2 by E2 relative to B2—is usually derived from a quantitative notion of confirmation, and is not discussed in this article.)
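For a concrete, if simplified, taste of the quantitative notion (anticipating the Bayesian machinery introduced later in this article), one standard explication measures the degree to which E confirms H by how much E raises the probability of H, for instance the difference P(H | E) − P(H). The allergy-test numbers below are invented for illustration.

```python
# A toy Bayesian reading of "E confirms H": the positive allergy test (E)
# raises the probability of the allergy hypothesis (H). All numbers are
# hypothetical.
p_h = 0.10               # prior probability of the allergy
p_e_given_h = 0.90       # test sensitivity: P(E | H)
p_e_given_not_h = 0.05   # false-positive rate: P(E | not-H)

# Law of total probability, then Bayes' theorem.
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
p_h_given_e = p_e_given_h * p_h / p_e

print(f"P(H)     = {p_h:.3f}")
print(f"P(H | E) = {p_h_given_e:.3f}")                # 0.667: E raises P(H)
print(f"difference measure = {p_h_given_e - p_h:.3f}")  # positive: confirmed
```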

Historically, confirmation has been closely related to the problem of induction, the question of what to believe regarding the future in the face of knowledge that is restricted to the past and present. David Hume gives the classic formulation of the problem of the justification of induction in A Treatise of Human Nature :

Let men be once fully persuaded of these two principles, that there is nothing in any object, consider’d in itself, which can afford us a reason for drawing a conclusion beyond it ; and, that even after the observation of the frequent or constant conjunction of objects, we have no reason to draw any inference concerning any object beyond those of which we have had experience ; (Hume 1739/2000, book 1, part 3, section 12)

The reason is that any such inference beyond those objects of which we had experience needs to be justified—and, according to Hume, this is not possible.

In order to justify induction one has to provide a deductively valid argument, or an inductively strong argument, whose premises we know to be true, and whose conclusion says that inductively strong arguments lead from true premises to true conclusions (most of the time). (An argument consists of a list of premises P1, …, Pn and a conclusion C. An argument is deductively valid just in case the truth of the premises logically guarantees the truth of the conclusion. There is no standard definition of an inductively strong argument, but the idea is that the truth of all premises speaks in favor of, or supports, the truth of the conclusion.) However, there is no deductively valid argument whose premises we know to be true and whose conclusion says that inductively strong arguments lead from true premises to true conclusions (most of the time). This is so because all our knowledge is restricted to the past and present, the relevant conclusion is in part about the future, and it is a fact of logic that there are no deductively valid arguments whose premises are restricted to the past and present and whose conclusion is in part about the future. Furthermore, any inductively strong argument presumably has to be inductively strong in the sense of the very principle of induction that is to be justified—and thus begs the question: it is a petitio principii , an argument that presupposes the principle that it derives. For more, see the introductory Skyrms (2000), the intermediate Hacking (2001), and the advanced Howson (2000a).

Neglecting the background information B , as we will mostly do in the following, we can state the link between induction and confirmation as follows. The conclusion H of an inductively strong argument with premise E is confirmed by E . If r quantifies the strength of the inductive argument in question, the degree of confirmation of H by E is equal to r . Let us then start the discussion of confirmation by the first serious attempts to define the notion, and to develop a corresponding logic of confirmation.

2. Hempel and the Logic of Confirmation

A. the ravens paradox.

According to the Nicod criterion of confirmation (Hempel 1945), universal generalizations of the form “All F s are G s,” in symbols ∀ x ( Fx   →  Gx ), are confirmed by their instances “This particular object a is both F and G ,” or in symbols  Fa  ∧  Ga . (It would be more appropriate to call Fa → Ga rather than Fa  ∧  Ga an instance of ∀ x ( Fx  → Gx ).) The universal generalization “All ravens are black” is thus said to be confirmed by its instance “ a is a black raven.” As “ a is a non-black non-raven” is an instance of “All non-black things are non-ravens,” the Nicod criterion says that “ a is a non-black non-raven” confirms “All non-black things are non-ravens.” (It is sometimes said that a black raven confirms the ravens hypothesis “All ravens are black.” In this case, confirmation is a relation between a non-linguistic entity—namely, a black raven—and a hypothesis. Conformation is construed as a relation between, among other things, evidential propositions and hypotheses, and so we have to state the above in a clumsier way.)

One of Hempel’s conditions of adequacy for any relation of confirmation is the equivalence condition . It says that logically equivalent hypotheses are confirmed by the same evidential propositions. “All ravens are black” is logically equivalent to “All non-black things are non-ravens.” Therefore a non-black non-raven like a white shoe or a red herring can be used to confirm the ravens-hypothesis “All ravens are black.” Surely, this is absurd—and this is known as the ravens paradox.

Even worse, “All ravens are black,” ∀ x ( Rx → Bx ), is logically equivalent to “All things that are green or not green are not ravens or black,”∀ x [( Gx  ∨ ¬ Gx ) → (¬ Rx  ∨  Bx )]. “ a is green or not green, and a is not raven or black” is an instance of this hypothesis. Furthermore, it is logically equivalent to “ a is not a raven or a is black.” As everything is green or not green, we get the similarly paradoxical result that an object which is not a raven or which is black—anything but a non-black raven which could be used to falsify the ravens hypothesis is such an object—can be used to confirm the ravens hypothesis that all ravens are black.

Hempel (1945), who discussed these cases of the ravens, concluded that non-black non-ravens (as well as any other object that is not a raven or black) can indeed be used to confirm the ravens hypothesis. He attributed the paradoxical character of this alleged paradox to the psychological fact that we assume there to be far more non-black objects than ravens. However, the notion of confirmation he was explicating was supposed to presuppose no background knowledge whatsoever. An example by Good (1967) shows that such an unrelativized notion of confirmation is not useful (see Hempel 1967, Good 1968).

Others have been led to the rejection of the Nicod criterion. Howson (2000b, 113) considers the hypothesis “Everybody in the room leaves with somebody else’s hat,” which he attributes to Rosenkrantz (1981). If the background contains the information that there are only three individuals a , b , c in the room, then the evidence consisting of the two instances “ a leaves with b ‘s hat” and “ b leaves with a ‘s hat” falsifies rather than confirms the hypothesis. Besides pointing to the role played by the background information in this example, Hempel would presumably have stressed that the Nicod criterion has to be restricted to universal generalization in one variable only. Already in his (1945, 13: fn. 1) he notes that R ( a , b ) ∧ ¬ R ( a , b ) falsifies ∀ x ∀ y (¬[ R ( x , y ) ∧  R ( y , x )] → [ R ( x , y ) ∧ ¬ R ( x , y )]), which is equivalent to ∀ x ∀x R ( x , y ), although it satisfies both the antecedent and the consequent of the universal generalization (compare also Carnap 1950/1962, 469f).

b. The Logic of Confirmation

After discussing the ravens, Hempel (1945) considers the following conditions of adequacy for any relation of confirmation:

  • Entailment Condition : If an evidential proposition E logically implies some hypothesis H , then E confirms H .
  • Special Consequence Condition : If an evidential proposition E confirms some hypothesis H , and if H logically implies some hypothesis H’, then E also confirms H’.
  • Special Consistency Condition : If an evidential proposition E confirms some hypothesis H , and if H is not compatible with some hypothesis H’, then E does not confirm H’.
  • Converse Consequence Condition : If an evidential proposition E confirms some hypothesis H , and if H is logically implied by some hypothesis H’, then E also confirms H’.

(The equivalence condition mentioned above follows from 2 as well as from 4). Hempel then shows that any relation of confirmation satisfying 1, 2, and 4 is trivial in the sense that every evidential proposition E confirms every hypothesis H . This is easily seen as follows. As E logically implies itself, E confirms E according to the entailment condition. The conjunction of E and H , E  ∧  H , logically implies E , and so the converse consequence condition entails that E confirms E  ∧  H . But E  ∧  H logically implies H ; thus E confirms H by the special consequence condition. In fact, it suffices that confirmation satisfies 1 and 4 in order to be trivial: E logically implies and, by 1, confirms the disjunction of E and H , E  ∨  H . As H logically implies E  ∨  H , E confirms H by 4.

Hempel (1945) rejects the converse consequence condition as the culprit rendering trivial any relation of confirmation satisfying 1-4. The latter condition has nevertheless gained popularity in the philosophy of science—partly because it seems to be at the core of the account of confirmation we will discuss next.

3. Popper’s Falsificationism and Hypothetico-Deductive Confirmation

A. popper’s falsificationism.

Although Popper was an opponent of any kind of induction, his falsificationism gave rise to a qualitative account of confirmation. Popper started by observing that many scientific hypotheses have the form of universal generalizations, say “All metals conduct electricity.” Now there can be no amount of observational data that would verify a universal generalization. After all, the next piece of metal could be such that it does not conduct electricity. In order to verify this hypothesis we would have to investigate all pieces of metal there are—and even if there were only finitely many such pieces, we would never know this (unless there were only finitely many space-time regions we would have to search). Popper’s basic insight is that these universal generalizations can be falsified , though. We only need to find a piece of metal that does not conduct electricity in order to know that our hypothesis is false (supposing we can check this). Popper then generalized this. He suggested that all science should put forth bold hypotheses, which are then severely tested (where ‘bold’ means to have many observational consequences). As long as these hypotheses survive their tests, scientists should stick to them. However, once they are falsified, they should be put aside if there are competing hypotheses that remain unfalsified.

This is not the place to list the numerous problems of Popper’s falsificationism. Suffice it to say that there are many scientific hypotheses that are neither verifiable nor falsifiable, and that falsifying instances are often taken to be indicators of errors that lie elsewhere, say errors of measurement or errors in auxiliary hypotheses. As Duhem and Quine noted, confirmation is holistic in the sense that it is always a whole battery of hypotheses that is put to test, and the arrow of error usually does not point to a single hypothesis (Duhem 1906/1974, Quine 1953).

According to Popper’s falsificationism (see Popper 1935/1994) the hallmark of scientific (rather than meaningful, as in the early days of logical positivism) hypotheses is that they are falsifiable: scientific hypotheses must have consequences whose truth or falsity can in principle (and with a grain of salt) be ascertained by observation (with a grain of salt, because for Popper there is always an element of convention in stipulating the basis of science). If there are no conditions under which a given hypothesis is false, this hypothesis is not scientific (though it may very well be meaningful).

b. Hypothetico-Deductive Confirmation

The hypothetico-deductive notion of confirmation says that an evidential proposition E confirms a hypothesis H relative to background information B if and only if the conjunction of H and B , H ∧ B , logically implies E in some suitable way (which depends on the particular version of hypothetic-deductivism under consideration). The intuition here is that scientific hypotheses are tested; and if a hypothesis H survives a severe test, then, intuitively, this is evidence in favor of H . Furthermore, scientific hypotheses are often used for predictions. If a hypothesis H correctly predicts some experimental outcome E by logically implying it, then, intuitively, this is again evidence for the truth of H . Both of these related aspects are covered by the above definition, if surviving a test is tantamount to entailing the correct outcome.

Note that hypthetico-deductive confirmation—henceforth HD-confirmation—satisfies Hempel’s converse consequence condition. Suppose an evidential proposition E HD-confirms some hypothesis H . This means that H logically implies E in some suitable way. Now any hypothesis H’ which logically implies H also logically implies E . But this means—at least under most conditions fixing the “suitable way” of entailment—that E HD-confirms H’ .

Hypothetico-deductivism has run into serious difficulties. To mention just two, there is the problem of irrelevant conjunctions and the problem of irrelevant disjunctions. Suppose an evidential proposition E HD-confirms some hypothesis H . Then, by the converse consequence condition, E also HD-confirms H  ∧  H’, for any hypothesis H’ whatsoever. Assuming that the anomalous perihelion of Mercury confirms the general theory of relativity GTR (Earman 1992), it also confirms the conjunction of GTR and, say, that there is life on Mars—which seems to be wrong. Similarly, if E HD-confirms H , then E  ∨  E’ HD-confirms H , for any evidential proposition E’ whatsoever. For instance, the disjunctive proposition of the anomalous perihelion of Mercury or the moon’s being made of cheese HD-confirms GTR (Grimes 1990, Moretti 2004).

Another worry with HD-confirmation is that it is not clear how it should be applied to statistical hypotheses that do not entail anything that is not probabilistic, and hence they entail nothing that is observable (see, however, Albert 1992). The treatment of statistical hypotheses is no problem for probabilistic theories of confirmation, which we will turn to now.

4. Inductive Logic

For overview articles see Fitelson (2005) and Hawthorne (2005).

a. Kolmogorov’s Axiomatization

Before we turn to inductive logic, let us define the notion of probability as it was axiomatized by Kolmogorov (1933; 1956).

Let W be a non-empty set (of outcomes or possibilities), and let A be a field over W , that is, a set of subsets of W that contains the whole set W and is closed under complementation (with respect to W ) and finite unions. That is, A is a field over W if and only if A is a set of subsets of W such that

(i) W ∈ A (ii) if A ∈ A , then ( W\ A ) = – A ∈ A (iii) if A ∈ A and B ∈ A , then ( A ∪  B ) ∈ A

where “W\A” is the complement of A with respect to W. If (iii) is strengthened to

(iv) if A 1 ∈ A , … A n ∈ A , …, then ( A 1 ∪…∪ A n ∪…) ∈ A ,

so that A is closed under countable (and not only finite) unions, A is called a σ-field over W .

A function Pr : A → ℜ from the field A over W into the real numbers ℜ is a ( finitely additive ) probability measure on A if and only if it is a non-negative, normalized, and (finitely) additive measure; that is, if and only if for all A , B ∈ A

(K1) Pr ( A ) ≥ 0 (K2) Pr ( W ) = 1 (K3) if A ∩ B = ∅, then Pr ( A ∪  B ) = Pr ( A ) + Pr ( B )

The triple < W , A , Pr > with W a non-empty set, A a field over W , and Pr a probability measure on A is called a ( finitely additive ) probability space . If A is a σ-field over W and Pr : A → ℜ additionally satisfies

(K4) if A 1 ⊇ A 2 ⊇ … ⊇ A n … is a decreasing sequence of elements of A , i.e. A 1 ∈ A , … A n ∈ A , …, such that A 1 ∩ A 2 ∩…∩ A n ∩… = ∅, then lim n→∞ Pr ( A n ) = 0,

Pr is a σ-additive probability measure on A and < W , A , Pr > is a σ-additive probability space (Kolmogorov 1933; 1956, ch. 2). (K4) asserts that

lim n→∞ Pr ( A n ) = Pr ( A 1 ∩ A 2 ∩…∩ A n ∩…) = Pr (∅) = 0

for a decreasing sequence of elements of A . Given (K1-3), (K4) is equivalent to

(K5) if A 1 ∈ A , … A n ∈ A , …, and if A i ∩ A j = ∅ for all natural numbers i , j with i ≠ j , then Pr ( A 1 ∪…∪ A n ∪…) = Pr ( A 1 ) + … + Pr ( A n ) + …

A probability measure Pr : A → ℜ on A is regular just in case Pr ( A ) > 0 for every non-empty A ∈ A . Let < W , A , Pr > be a probability space, and define A* to be the set of all A ∈ A that have positive probability according to Pr , that is, A* = { A ∈ A : Pr ( A ) > 0}. The conditional probability measure Pr (•|-): A x A* → ℜ on A (based on the unconditional probability measure Pr ) is defined for all A ∈ A and B ∈ A* by the fraction

(K6) Pr ( A | B ) = Pr ( A ∩ B )/ Pr ( B )

(Kolmogorov 1933; 1956, ch. 1, §4). The domain of the second argument place of Pr (•|-) has to be restricted to A* , since the fraction Pr ( A ∩ B )/ Pr ( B ) is not defined when Pr ( B ) = 0. Note that Pr (•| B ): A → ℜ is a probability measure on A , for every B ∈ A* .

Here are some immediate consequences of the Kolmogorov axioms and the definition of conditional probability. For every probability space < W , A , Pr > and all A , B ∈ A ,

  • Law of Negation : Pr (- A )= 1 – Pr ( A )
  • Law of Conjunction : Pr ( A ∩ B ) = Pr ( B )• Pr ( A | B ) whenever Pr ( B ) > 0
  • Law of Disjunction : Pr ( A ∪ B ) = Pr ( A ) + Pr ( B ) – Pr ( A ∩ B )
  • Law of Total Probability : Pr ( B ) = Σ i Pr ( B | A i )• Pr ( A i ),

where the A i form a countable partition of W , i.e. A 1 , … A n , … is a sequence of mutually exclusive ( A i ∩ A j = ∅ for all i , j with i ≠ j ) and jointly exhaustive ( A 1 ∪…∪ A n ∪… = W ) elements of A . A special case of the Law of Total Probability is

Pr ( B ) = Pr ( B | A )• Pr ( A ) + Pr ( B |- A )• Pr (- A ).

Finally the definition of conditional probability is easily turned into

Bayes’s Theorem : Pr ( A | B ) = Pr ( B | A )• Pr ( A )/ Pr ( B ) = Pr ( B | A )• Pr ( A )/[ Pr ( B | A )• Pr ( A ) + Pr ( B |- A )• Pr (- A )] = Pr ( B | A )• Pr ( A )/Σ i Pr ( B | A i )• Pr ( A i ),

where the A i form a countable partition of W . The important role played by Bayes’s Theorem (in combination with some principle linking objective chances and subjective probabilities) for confirmation will be discussed below. For more on Bayes’s Theorem see Joyce (2003).

The names of the first three laws above indicate that probability measures can also be defined on formal languages. Instead of defining probability on a field A over some non-empty set W , we can take its domain to be a formal language L , that is, a set of (possibly open) well-formed formulas that contains the tautological sentence τ (corresponding to the whole set W ) and is closed under negation ¬ (corresponding to complementation) and disjunction ∨ (corresponding to finite union). That is, L is a language if and only if L is a set of well-formed formulas such that

(i) τ ∈ L (ii) if α ∈ L , then ¬α ∈ L (iii) if α ∈ L and β ∈ L , then (α∨ β) ∈ L

If L additionally satisfies

(iv) if α ∈ L , then ∃ xα ∈ L ,

L is called a quantificational language.

A function Pr : L → ℜ from the language L into the reals ℜ is a probability on L if and only if for all α , β ∈ L ,

(L0) Pr ( α ) = Pr ( β ) if α is logically equivalent (in the sense of classical logic CL) to β (L1) Pr ( α ) ≥ 0, (L2) Pr ( τ ) = 1, (L3) Pr (α∨ β) = Pr ( α ) + Pr ( β ), if α∧ β is logically inconsistent (in the sense of CL).

(L0) is not necessary, if (L2) is strengthened to: (L2 + ) Pr ( α ) = 1, if α is logically valid. If L is a quantificational language with an individual constant “ a i ” for each individual a i in the envisioned countable domain, i = 1, 2, …, n , …, and Pr : L → ℜ additionally satisfies

(L4) lim n→∞ Pr ( α [ a 1 / x ]∧…∧ α [ a n / x ]) = Pr (∀ xα ),

Pr is called a Gaifman-Snir probability. Here “ α [ a i / x ]” results from “ α [ x ]” by substituting the individual constant “ a i ” for all occurrences of the individual variable “ x ” in “α .” “ x ” in “ α [ x ]” indicates that “ x ” occurs free in “ α ,” that is to say, “ x ” is not bound in “ α ” by a quantifier like it is in “∀ xα .”

Given (L0-3) and the restriction to countable domains, (L4) is equivalent to

(L5) lim n→∞ Pr ( α [ a 1 / x ]∨…∨ α [ a n / x ]) = sup{ Pr ( α [ a 1 / x ]∨…∨ α [ a n / x ]): n ∈ N } = Pr (∃ xα ),

where the equation on the right-hand side is the slightly more general definition adopted by Gaifman & Snir (1982, 501). A probability Pr : L → ℜ on L is regular just in case Pr ( α ) > 0 for every consistent α ∈ L . For L* = { α ∈ L : Pr ( α ) > 0} the conditional probability Pr (•|-): L x L* → ℜ on L (based on Pr ) is defined for all α ∈ L and all β ∈ L* by the fraction

(L6) Pr ( α | β ) = Pr ( α ∧  β )/ Pr ( β ).

As before, Pr (•| β ): L → ℜ is a probability on L , for every β ∈ L .

Each probability Pr on a language L induces a probability space < W , A , Pr* > with W being the set Mod of all models for L , A being the smallest σ-field containing the field { Mod ( α ) ⊆ Mod : α ∈ L }, and Pr* being the unique σ-additive probability measure on A such that Pr* ( Mod ( α )) = Pr ( α ) for all α ∈ L . (A model for a language L with an individual constant for each individual in the envisioned domain can be represented by a function w : L → {0,1} from L into the set {0,1} such that for all α , β ∈ L : w (¬α) = 1 – w ( α ), w ( α ∨ β ) = max{ w ( α ), w ( β )}, and w (∃ xα ) = max{ w ( α [ a / x ]): “ a ” is an individual constant of L }.)

Some authors take conditional probability Pr (• given -) as primitive and define probability as Pr (• given W ) or Pr (• given τ ) (see Hájek 2003b). For more on probability and its interpretations see Hájek (2003a), Hájek & Hall (2000), Fitelson & Hájek & Hall (2005).

b. Logical Probability and Degree of Confirmation

There has always been a close connection between probability and induction. Probability was thought to provide the basis for an inductive logic. Early proponents of a logical conception of probability include Keynes (1921/1973) and Jeffreys (1939/1967). However, by far the biggest effort to construct an inductive logic was undertaken by Carnap in his Logical Foundations of Probability (1950/1962). Carnap starts from a simple formal language with countably many individual constants (such as “Carl Gustav Hempel”) denoting individuals (namely, Carl Gustav Hempel) and finitely many monadic predicates (such as “is a great philosopher of science”) denoting properties (namely, being a great philosopher of science), but not relations (such as being a better philosopher of science than). Then he defines a state-description to be a complete description of each individual with respect to all the predicates. For instance, if the language contains three individual constants “ a ,” “ b ,” and “ c ” (denoting the individuals a , b , and c , respectively), and four monadic predicates “ P ,” “ Q ,” “ R ,” and “ S ” (denoting the properties P ,   Q ,   R , and S , respectively), then there are 2 3•4 state descriptions of the form:

± Pa  ∧ ± Qa  ∧ ± Ra  ∧ ± Sa  ∧ ± Pb  ∧ ± Qb  ∧ ± Rb  ∧ ± Sb  ∧ ± Pc  ∧ ± Qc  ∧ ± Rc  ∧ ± Sc ,

where “±” indicates that the predicate in question is either unnegated as in “ Pa ” or negated as in “¬ Pa .” That is, a state description determines for each individual constant “ a ” and each predicate “ P ” whether or not Pa . Based on the notion of a state description, Carnap then introduces the notion of a structure description, a maximal disjunction of state descriptions which can be obtained from each other by uniformly substituting individual constants for each other. In the above example there are, among others, the following two structure descriptions:

( Pa  ∧  Qa  ∧  Ra  ∧ Sa ) ∧ ( Pb ∧  Qb  ∧  Rb  ∧  Sb ) ∧ ( Pc  ∧  Qc  ∧  Rc  ∧  Sc )
(( Pa  ∧  Qa ∧ Ra ∧ Sa ) ∧ ( Pb ∧ Qb ∧ Rb ∧ ¬ Sb ) ∧ ( Pc ∧ Qc ∧ ¬ Rc ∧ Sc )) ∨(( Pb ∧ Qb ∧ Rb ∧ Sb ) ∧ ( Pa ∧ Qa ∧ Ra ∧ ¬ Sa ) ∧ ( Pc ∧ Qc ∧ ¬ Rc ∧ Sc )) ∨(( Pc ∧ Qc ∧ Rc ∧ Sc ) ∧ ( Pb ∧ Qb ∧ Rb ∧ ¬ Sb ) ∧ ( Pa ∧ Qa ∧ ¬ Ra ∧ Sa )) ∨(( Pa ∧ Qa ∧ Ra ∧ Sa ) ∧ ( Pc ∧ Qc ∧ Rc ∧ ¬ Sc ) ∧ ( Pb ∧ Qb ∧ ¬ Rb ∧ Sb ))

So a structure description is a disjunction of one or more state descriptions. It says how many individuals satisfy the maximally consistent predicates (Carnap calls them Q -predicates) that can be formulated in the language. It may, but need not, say which individuals. The first structure description above says that all three individuals a , b , and c have the maximally consistent property Px  ∧ Qx  ∧ Rx  ∧ Sx . The second structure description says that exactly one individual has the maximally consistent property Px  ∧ Qx  ∧ Rx  ∧ Sx , exactly one individual has the maximally consistent property Px  ∧ Qx  ∧ Rx  ∧ ¬ Sx , and exactly one individual has the maximally consistent property Px  ∧ Qx  ∧ ¬ Rx  ∧ Sx . It does not say which of a , b , and c has the property in question.

Each function that assigns non-negative weights w i to the state descriptions z i whose sum Σ i w i equals 1 induces a probability on the language in question. Carnap then argues—by postulating various principles of symmetry and invariance—that each of the finitely many structure (not state) descriptions s j should be assigned the same weight v j such that their sum Σ j v j is equal to 1. This weight v j should then be divided equally among the state descriptions whose disjunction constitutes the structure description s j . The probability so obtained is Carnap’s favorite m * , which, like any other probability, induces what Carnap calls a confirmation function (and what we have called a conditional probability): c * ( H , E ) = m * ( H  ∧ E )/ m * ( E )

(In case the language contains countably infinitely many individual constants, some structure descriptions are disjunctions of infinitely many state descriptions. These state descriptions cannot all get the same positive weight. Therefore Carnap considers the limit of the measures m * n for the languages L n containing the first n individual constants in some enumeration of the individual constants, provided this limit exists.)

c * allows learning from experience in the sense that

c * (the n + 1st individual is P , k of the first n individuals are P ) > c * (the n + 1st individual is P , τ )

= m * (the n + 1st individual is P ),

where τ is the tautological sentence. If we assigned equal weights to the state descriptions instead of the structure descriptions, no such learning would be possible. Let us check that c * allows learning from experience for n = 2 in a language with three individual constants “ a ,” “ b ,” and “ c ” and one predicate “ P .” There are eight state descriptions and four structure descriptions:

z 1 = Pa  ∧ Pb  ∧ Pc s 1 = Pa  ∧ Pb  ∧ Pc : z 2 = Pa  ∧ Pb  ∧ ¬ Pc All three individuals are P . z 3 = Pa  ∧ ¬ Pb  ∧ Pc s 2 = ( Pa  ∧ Pb  ∧ ¬ Pc )∨( Pa  ∧ ¬ Pb  ∧ Pc )∨(¬ Pa  ∧ Pb  ∧ Pc ): z 4 = Pa  ∧ ¬ Pb  ∧ ¬ Pc Exactly two individuals are P . z 5 = ¬ Pa  ∧ Pb  ∧ Pc s 3 = ( Pa  ∧ ¬ Pb  ∧ ¬ Pc )∨(¬ Pa  ∧ Pb  ∧ ¬ Pc )∨(¬ Pa  ∧ ¬ Pb  ∧ Pc ): z 6 = ¬ Pa  ∧ Pb  ∧ ¬ Pc Exactly one individual is P . z 7 = ¬ Pa  ∧ ¬ Pb  ∧ Pc s 4 = ¬ Pa  ∧ ¬ Pb  ∧ ¬ Pc : z 8 = ¬ Pa  ∧ ¬ Pb  ∧ ¬ Pc None of the three individuals is P .

Each structure description s 1 – s 4 gets weight v j = 1/4 ( j = 1, …, 4).

s 1 = z 1 : v 1 = m * ( Pa  ∧ Pb  ∧ Pc ) = 1/4 s 2 = z 2 ∨ z 3 ∨ z 5 : v 2 = m * (( Pa  ∧ Pb  ∧ ¬ Pc )∨( Pa  ∧ ¬ Pb  ∧ Pc )∨(¬ Pa  ∧ Pb  ∧ Pc )) = 1/4 s 3 = z 4 ∨ z 6 ∨ z 7 : v 3 = m * (( Pa  ∧ ¬ Pb  ∧ ¬ Pc )∨(¬ Pa  ∧ Pb  ∧ ¬ Pc )∨(¬ Pa  ∧ ¬ Pb  ∧ Pc )) = 1/4 s 4 = z 8 : v 4 = m * (¬ Pa  ∧ ¬ Pb  ∧ ¬ Pc ) = 1/4

These weights are equally divided among the state descriptions z 1 – z 8 .

z 1 : w 1 = m * ( Pa  ∧ Pb  ∧ Pc ) = 1/4 z 5 : w 5 = m * (¬ Pa  ∧ Pb ∧ Pc ) = 1/12 z 2 : w 2 = m * ( Pa  ∧ Pb  ∧ ¬ Pc ) = 1/12 z 6 : w 6 = m * (¬ Pa  ∧ Pb  ∧ ¬ Pc ) = 1/12 z 3 : w 3 = m * ( Pa  ∧ ¬ Pb  ∧ Pc ) = 1/12 z 7 : w 7 = m * (¬ Pa  ∧ ¬ Pb  ∧ Pc ) = 1/12 z 4 : w 4 = m * ( Pa  ∧ ¬ Pb  ∧ ¬ Pc ) = 1/12 z 8 : w 8 = m * (¬ Pa  ∧ ¬ Pb  ∧ ¬ Pc ) = 1/4

Let us now compute the values of the confirmation function c * .

c * (the 3 rd individual is P , 2 of the first 2 individuals are P ) =

= m * (the 3 rd individual is P,  the first 2 individuals are P )/ m * (the first 2 individuals are P ) = m * (the first 3 individuals are P )/ m * (the first 2 individuals are P ) = m * ( Pa  ∧ Pb  ∧ Pc )/ m * ( Pa  ∧ Pb ) = (1/4)/(1/4 + 1/12) = 3/4 > 1/2 = m * ( Pc ) = c* (the 3 rd individual is P )

The general formula is (Carnap 1950/1962, 568)

c * (the n + 1st individual is P , k of the first n individuals are P ) = ( k + ϖ)/( n + κ) = ( k + (ϖ/κ)•κ)/( n + κ),

where ϖ is the “logical width” of the predicate “ P ” (Carnap 1950/1962, 127), that is, the number of maximally consistent properties or Q -predicates whose disjunction is logically equivalent to “ P ” (ϖ = 1 in our example: “ P” ). κ = 2 π is the total number of Q -predicates (κ = 2 1 = 2 in our example: “ P ” and “¬ P” ) with π being the number of primitive predicates (π = 1 in our example: “ P” ). This formula is dependent on the logical factor ϖ/κ of the “relative width” of the predicate “ P ,” and the empirical factor k / n of the relative frequency of P s.

Later on, Carnap (1952) generalizes this to a whole continuum of confirmation functions C λ where the parameter λ is inversely proportional to the impact of evidence. λ specifies how the confirmation function C λ weighs between the logical factor ϖ/κ and the empirical factor k / n . For λ = ∞, C λ is independent of the empirical factor k / n : C λ (the n + 1st individual is P , k of the first n individuals are P ) = ϖ/κ (Carnap 1952, §13). For λ = 0, C λ is independent of the logical factor ϖ/κ: C λ (the n + 1st individual is P , k of the first n individuals are P ) = k / n and thus coincides with what is known as the straight rule (Carnap 1952, §14). c * is the special case with λ = κ (Carnap 1952, §15). The general formula is (Carnap 1952, §9)

C λ (the n + 1st individual is P , k of the first n individuals are P ) = ( k + λ/κ)/( n + λ).

In his (1963) Carnap slightly modifies the set up and considers families of monadic predicates {“ P 1 ,” …, “ P p “} like the family of color predicates {“red,” “green,” …, “blue”}. For a given family {“ P 1 ,” …, “ P p “} and each individual constant “ a ” there is exactly one predicate “ P j ” such that P j a . Families thus generalize {“ P ,” “¬ P “} and correspond to random variables. Given his axioms (including A15 ), Carnap (1963, 976) can show that for each family {“ P 1 ,” …, “ P p “}, p ≥ 2,

C λ (the n + 1st individual is P j , k of the first n individuals are P j ) = ( k + λ/ p )/( n + λ).

One of the peculiar features of Carnap’s systems is that universal generalizations get degree of confirmation (alias conditional probability) 0. Hintikka (1966) generalizes Carnap’s project in this respect. For a neo-Carnapian approach see Maher (2004a).

Of more interest to us is Carnap’s discussion of “the controversial problem of the justification of induction ” (1963, 978, emphasis in the original). For Carnap, the justification of induction boils down to justifying the axioms specifying a set of confirmation functions. The “reasons are based upon our intuitive judgments concerning inductive validity”. Therefore “[i]t is impossible to give a purely deductive justification of induction,” and these “reasons are a priori” (Carnap 1963, 978). So according to Carnap, induction is justified by appeals to intuition about inductive validity. We will see below that Goodman, who is otherwise very skeptical about the prospects of Carnap’s project, shares this view of the justification of induction. The view also seems to be widely accepted among current Bayesian confirmation theorists and their desideratum/explicatum approach (see Fitelson 2001 for an example). [According to Carnap (1962), an explication is “the transformation of an inexact, prescientific concept, the explicandum , into a new exact concept, the explicatum .” (Carnap 1962, 3) The desideratum/explicatum approach consists in stating various “intuitively plausible desiderata” the explicatum is supposed to satisfy. Proposals for explicata that do not satisfy these desiderata are rejected. This appeal to intuitions is fine as long as we are engaging in conceptual analysis. However, contemporary confirmation theorists also sell their accounts as normative theories. Normative theories are not justified by appeal to intuitions. They are justified relative to a goal by showing that the norms in question further the goal at issue. See section 7.]

First, however, we will have a look at what Carnap has to say about Hempel’s conditions of adequacy.

c. Absolute and Incremental Confirmation

As we saw in the preceding section, one of Carnap’s goals was to define a quantitative notion of confirmation, explicated by a confirmation function in the manner indicated above. It is important to note that this quantitative concept of confirmation is a relation between two propositions H and E (three, if we include the background information B ), a number r , and a confirmation function c . In chapters VI and VII of his (1950/1962) Carnap discusses comparative and qualitative concepts of confirmation. The explicans for qualitative confirmation he offers is that of positive probabilistic relevance in the sense of some logical probability m . That is, E qualitatively confirms H in the sense of some logical measure m just in case E is positively relevant to H in the sense of m , that is,

m ( H ∧ E ) > m ( H )• m ( E ).

If both m ( H ) and m ( E ) are positive—which is the case whenever both H and E are not logically false, because Carnap assumes m to be regular—this is equivalently expressed by the following inequality:

c ( H , E ) > c ( H , τ ) = m ( H )

So provided both H and E have positive probability, E confirms H if and only if E raises the conditional probability (degree of confirmation in the sense of c ) of H . Let us call this concept incremental confirmation. Again, note that qualitative confirmation is a relation between two propositions H and E , and a conditional probability or confirmation function c . Incremental confirmation, or positive probabilistic relevance, is a qualitative notion. It says whether E raises the conditional probability (degree of confirmation in the sense of c ) of H . Its natural quantitative counterpart measures how much E raises the conditional probability of H . This measure may take several forms which will be discussed below.

Incremental confirmation is different from the concept of absolute confirmation on which it is based. The quantitative explication of absolute confirmation is given by one of Carnap’s confirmation functions c . The qualitative counterpart is to say that E absolutely confirms H in the sense of c if and only if the degree of absolute confirmation of H by E is sufficiently high, c ( H , E ) > r . So Carnap, who offers degree of absolute confirmation c ( H , E ) as explication for the quantitative notion of confirmation of H by E , and who offers incremental confirmation or positive probabilistic relevance between E and H as explication of the qualitative notion of confirmation, is, to say the least, not fully consistent in his terminology. He switches between absolute confirmation (for the quantitative notion) and incremental confirmation (for the qualitative notion). This is particularly peculiar, because Carnap (1950/1962, §87) is the locus classicus for the discussion of Hempel’s conditions of adequacy mentioned in section 2b.

d. Carnap’s Analysis of Hempel’s Conditions

In analyzing the special consequence condition, Carnap argues that

Hempel has in mind as explicandum the following relation: “the degree of confirmation of H by E is greater than r , where r is a fixed value, perhaps 0 or 1/2 (Carnap 1962, 475; notation adapted);

that is, the qualitative concept of absolute confirmation. Similarly when discussing the special consistency condition:

Hempel regards it as a great advantage of any explicatum satisfying [a more general form of the special consistency condition 3] “that it sets a limit, so to speak, to the strength of the hypotheses which can be confirmed by given evidence” … This argument does not seem to have any plausibility for our explicandum, (Carnap 1962, 477; emphasis in original)

which is the qualitative concept of incremental confirmation,

[b]ut it is plausible for the second explicandum mentioned earlier: the degree of [absolute] confirmation exceeding a fixed value r . Therefore we may perhaps assume that Hempel’s acceptance of [a more general form of 3] is due again to an inadvertent shift to the second explicandum. (Carnap 1962, 477-478)

Carnap’s analysis can be summarized as follows. In presenting his first three conditions of adequacy, Hempel was mixing up two distinct concepts of confirmation, two distinct explicanda in Carnap’s terminology, namely,

(i) the qualitative concept of incremental confirmation (positive probabilistic relevance) according to which E confirms H if and only if E (has non-zero probability and) increases the degree of absolute confirmation (conditional probability) of H , and (ii) the qualitative concept of absolute confirmation according to which E confirms H if and only if the degree of absolute confirmation (conditional probability) of H by E is greater than some value r .

Hempel’s second and third condition, 2 and 3, respectively, hold true for the second explicandum (for r ≥ 1/2), but they do not hold true for the first explicandum. On the other hand, Hempel’s first condition holds true for the first explicandum, but it does so only in a qualified form (Carnap 1950/1962, 473)—namely only if E is not assigned probability 0, and H is not already assigned probability 1.

This, however, means that, according to Carnap’s analysis, Hempel first had in mind the explicandum of incremental confirmation for the entailment condition. Then he had in mind the explicandum of absolute confirmation for the special consequence and the special consistency conditions 2 and 3, respectively. And then, when Hempel presented the converse consequence condition, he got completely confused and had in mind still another explicandum or concept of confirmation (neither the first nor the second explicandum satisfies the converse consequence condition). This is not a very charitable analysis. It is not a good one either, because the qualitative concept of absolute confirmation, which Hempel is said to have had in mind for 2 and 3, also satisfies 1—and it does so without the second qualification that H be assigned a probability smaller than 1. So there is no need to accuse Hempel of mixing up two concepts of confirmation. Indeed, the analysis is bad, because Carnap’s reading of Hempel also leaves open the question of what the third explicandum for the converse consequence condition might have been. For a different analysis of Hempel’s conditions and a corresponding logic of confirmation see Huber (2007a).

5. The New Riddle of Induction and the Demise of the Syntactic Approach

According to Goodman (1983, ch. III), the problem of justifying induction boils down to defining valid inductive rules, and thus to a definition of confirmation. The reason is that an inductive inference is justified by conformity to an inductive rule, and inductive rules are justified by their conformity to accepted inductive practices. One does not have to follow Goodman in this respect, however, in order to appreciate his insight that whether a hypothesis is confirmed by a piece of evidence depends on features other than their syntactical form.

In his (1946) he asks us to suppose a marble has been drawn from a certain bowl on each of the ninety-nine days up to and including VE day, and that each marble drawn was red. Our evidence can be described by the conjunction “Marble 1 is red and … and marble 99 is red,” in symbols: Ra 1 ∧ …∧ Ra 99 . Whatever the details of our theory of confirmation, this evidence will confirm the hypothesis “Marble 100 is red,” R 100 . Now consider the predicate S = “is drawn by VE day and is red, or is drawn after VE day and is not red.” In terms of S rather than R our evidence is described by the conjunction “Marble 1 is drawn by VE day and is red or it is drawn after VE day and is not red, and …, and marble 99 is drawn by VE day and is red or it is drawn after VE day and is not red,” Sa 1 ∧ …∧ Sa 99 . If our theory of confirmation relies solely on syntactical features of the evidence and the hypothesis, our evidence will confirm the conclusion “Marble 100 is drawn by VE and is red, or it is drawn after VE day and is not red,” S 100 . But we know that the next marble will be drawn after VE day. Given this, S 100 is logically equivalent to the negation of R 100 . So one and the same piece of evidence can be used to confirm a hypothesis and its negation, which is certainly absurd.

One might object to this example that the two formulations do not describe one and the same piece of evidence after all. The first formulation in terms of R should be the conjunction “Marble 1 is drawn by VE day and is red, and …, and marble 99 is drawn by VE day and is red,” ( Da 1 ∧ R a 1 )∧ …∧ ( Da 99 ∧ R a 99 ). The second formulation in terms of S should be “Marble 1 is drawn by VE day and it is drawn by VE day and red or drawn after VE and not red, and …, and marble 99 is drawn by VE day and it is drawn by VE day and red or drawn after VE day and not red,” ( Da 1 ∧ Sa 1 )∧ …∧ ( Da 99 ∧ Sa 99 ). Now the two formulations really describe one and the same piece of evidence in the sense of being logically equivalent. But then the problem is whether any interesting statement can ever be confirmed. The syntactical form of the evidence now seems to confirm Da 100 ∧ R a 100 , equivalently Da 100 ∧ Sa 100 . But we know that the next marble is drawn after VE day; that is, we know ¬ D a 100 . That the future resembles the past in all respects is thus false. That it resembles the past in some respects is trivial. The new riddle of induction is the question in which respects the future resembles the past, and in which it does not.

It has been suggested that the puzzling character of Goodman’s example is due to its mentioning a particular point of time, namely, VE day. A related reaction has been that gerrymandered predicates, whether or not they involve a particular point of time, cannot be used in inductive inferences. But there are plenty of similar examples (Stalker 1994), and it is commonly agreed that Goodman has succeeded in showing that a purely syntactical definition of (degree of) confirmation won’t do. Goodman himself sought to solve his new riddle of induction by distinguishing between “projectible” predicates such as “red” and unprojectible predicates such as “is drawn by VE day and is red, or is drawn after VE day and is not red.” The projectibility of a predicate is in turn determined by its entrenchment in natural language. This comes very close to saying that the projectible predicates are the ones that we do in fact project (that is, use in inductive inferences). (Quine’s 1969 “natural kinds” are special cases of what can be described by projectible predicates.)

6. Bayesian Confirmation Theory

Bayesian confirmation theory is by far the most popular and elaborated theory of confirmation. It has its origins in Rudolf Carnap’s work on inductive logic (Carnap 1950/1962), but relieves itself from defining confirmation in terms of logical probability. More or less any subjective degree of belief function satisfying the Kolmogorov axioms is considered to be an admissible probability measure.

a. Subjective Probability and the Dutch Book Argument

In Bayesian confirmation theory, a probability measure on a field of propositions is usually interpreted as an agent’s degree of belief function. There is disagreement about how broad the class of admissible probability measures is to be construed. Some objective Bayesians such as the early Carnap insist that the class consist of a single logical probability measure, whereas subjective Bayesians admit any probability measure. Most Bayesians will be somewhere in the middle of this spectrum when it comes to the question which particular degree of belief functions it is reasonable to adopt in a particular situation. However, they will agree that from a purely logical point of view any (regular) probability measure is acceptable. The standard argument for this position is the Dutch Book Argument.

The Dutch Book Argument starts with the assumption that there is a link between subjective degrees of belief and betting ratios. It is further assumed that it is pragmatically defective to accept a series of bets which guarantees a sure loss, that is, a Dutch Book. By appealing to the Dutch Book Theorem that an agent’s betting ratios satisfy the probability axioms just in case they do not make the agent vulnerable to such a Dutch Book, it is inferred that it is epistemically defective to have degrees of belief that violate the probability axioms. The strength of this inference is, of course, dependent on the link between degrees of belief and betting ratios. If this link is identity—as it is when one defines degrees of belief as betting ratios—the distinction between pragmatic and epistemic defectiveness disappears, and the Dutch Book Argument is a deductively valid argument. But this comes at the cost of rendering the link between degrees of belief and betting ratios implausible. If the link is weaker than identity—as it is when degrees of belief are only measured by betting ratios—the Dutch Book Argument is not deductively valid anymore, but it has more plausible assumptions.

The pragmatic nature of the Dutch Book Argument has led to so called depragmatized versions. A depragmatized Dutch Book Argument starts with a link between degrees of belief and fair betting ratios, and it assumes that it is epistemically defective to consider a series of bets that guarantees a sure loss as fair . Using the depragmatized Dutch Book Theorem that an agent’s fair betting ratios obey the probability calculus if and only if the agent never considers a Dutch Book as fair, it is then inferred that it is epistemically defective to have degrees of belief that do not obey the probability calculus. The thesis that an agent’s degree of belief function should obey the probability calculus is called probabilism. For more on the Dutch Book Argument see Hájek (2005) and Vineberg (2005). For a different justification of probabilism in terms of the accuracy of degrees of belief see Joyce (1998).

b. Confirmation Measures

Let A be a field of propositions over some set of possibilities W , let H , E , B be propositions from A , and let Pr be a probability measure on A . We already know that H is incrementally confirmed by E relative to B in the sense of Pr if and only if Pr( H ∩ E | B ) > Pr( H | B )•Pr( E | B ), and that this is a relation between three propositions and a probability space whose field contains the propositions. The central notion in Bayesian confirmation theory is that of a confirmation measure. A real valued function c : P → ℜ from the set P of all probability spaces < W , A , Pr > into the reals ℜ is a confirmation measure if and only if for every probability space < W , A , Pr > and all H , E , B ∈ A :

c ( H , E , B ) > 0 ↔  Pr ( H ∩ E | B ) > Pr ( H | B )• Pr ( E | B ) c ( H , E , B ) = 0 ↔  Pr ( H ∩ E | B ) = Pr ( H | B )• Pr ( E | B ) c ( H , E , B ) < 0 ↔  Pr ( H ∩ E | B ) < Pr ( H | B )• Pr ( E | B )

The six most popular confirmation measures are (what I now call) the Carnap measure c (Carnap 1962), the distance measure d (Earman 1992), the log-likelihood or Good-Fitelson measure l (Fitelson 1999 and Good 1983), the log-ratio or Milne measure r (Milne 1996), the Joyce-Christensen measure s (Christensen 1999, Joyce 1999, ch. 6), and the relative distance measure z (Crupi & Tentori & Gonzalez 2007) .

c ( H , E , B ) = Pr ( H ∩ E | B ) – Pr ( H | B )• Pr ( E | B ) d ( H , E , B ) = Pr ( H | E ∩ B ) – Pr ( H | B ) l ( H , E , B ) = log [ Pr ( E | H ∩ B )/ Pr ( E |- H ∩ B )] r ( H , E , B ) = log [ Pr ( H | E ∩ B )/ Pr ( H | B )] s ( H , E , B ) = Pr ( H | E ∩ B ) – Pr ( H |- E ∩ B ) z ( H , E , B ) = [ Pr ( H | E ∩ B ) – Pr ( H | B )]/ Pr (- H | B ) if Pr ( H | E ∩ B ) ≥ Pr ( H | B ) = [ Pr ( H | E ∩ B ) – Pr ( H | B )]/ Pr ( H | B ) if Pr ( H | E ∩ B ) < Pr ( H | B )

(Mathematically speaking, there are uncountably many confirmation measures.) For an overview article, see Eells (2005). Book length expositions are Earman (1992) and Howson & Urbach (1989/2005).

c. Some Success Stories

Bayesian confirmation theory captures the insights of Popper’s falsificationism and hypothetico-deductive confirmation. Suppose evidence E falsifies hypothesis H relative to background information B in the sense that B ∩ H ∩ E = ∅. Then Pr ( E ∩ H | B ) = 0, and so Pr ( E ∩ H | B ) = 0 < Pr ( H | B )• Pr ( E | B ), provided both Pr ( H | B ) and Pr ( E | B ) are positive. So as long as H is not already known to be false (in the sense of having probability 0 conditional on B ) and E is a possible outcome (one with positive probability conditional on B ), falsifying E incrementally disconfirms H relative to B in the sense of Pr .

Remember, E HD-confirms H relative to B if and only if the conjunction of H and B logically implies E (in some suitable way). In this case Pr ( E ∩ H | B ) = Pr ( H | B ), provided Pr ( B ) > 0. Hence as long as Pr( E | B ) < 1, we have

Pr( E ∩ H | B ) > Pr( H | B )•Pr( E | B ),

which means that E incrementally confirms H relative to B in the sense of Pr (Kuipers 2000).

If the conjunction of H and B logically implies E , but E is already known to be true in the sense of having probability 1 conditional on B , E does not incrementally confirm H relative to B in the sense of Pr . In fact, no E which receives probability 1 conditional on B can incrementally confirm any H whatsoever. This is the so called problem of old evidence (Glymour 1980). It is a special case of a more general phenomenon. The following is true for many confirmation measures ( d , l , and r , but not s ). If H is positively relevant to E given B , the degree to which E incrementally confirms H relative to B is greater, the smaller the probability of E given B . Similarly, if H is negatively relevant for E given B , the degree to which E disconfirms H relative to B is greater, the smaller the probability of E given B (Huber 2005a). If Pr( E | B ) = 1 we have the problem of old evidence. If Pr( E | B ) = 0 we have the above mentioned problem that E cannot disconfirm hypotheses it falsifies.

Some people simply deny that the problem of old evidence is a problem. Bayesian confirmation theory, it is said, does not explicate whether and how much E confirms H relative to B . It explicates whether E is additional evidence for H relative to B , and how much additional confirmation E provides for H relative to B . If E already has probability 1 conditional on B , it is part of the background knowledge, and so does not provide any additional evidence for H . More generally, the more we already believe in E , the less additional (dis)confirmation this provides for positively (negatively) relevant H . This reply does not work in case E is a falsifier of H with probability 0 conditional on B , for in this case Pr ( H | E ∩ B ) is not defined. It also does not agree with the fact that the problem of old evidence is taken seriously in the literature on Bayesian confirmation theory (Earman 1992, ch. 5). An alternative view (Joyce 1999, ch. 6) sees several different, but equally legitimate, concepts of confirmation at work. The intuition behind one concept is the reason for the implausibility of the explication of another.

In contrast to hypothetico-deductivism, Bayesian confirmation theory has no problem with assigning degrees of incremental confirmation to statistical hypotheses. Such alternative statistical hypotheses H 1 , … H n , … are taken to specify the probability of an outcome E . The probabilities Pr( E | H 1 ), …Pr( E | H n ), … are called the likelihoods of the hypotheses H i . Together with their prior probabilities Pr( H i ) the likelihoods determine the posterior probabilities of the H i via Bayes’s Theorem:

Pr( H i | E ) = Pr( E | H i )•Pr( H i )/[Σ j Pr( E | H j )•Pr( H j ) + Pr( E | H )•Pr( H )]

The so called “catchall” hypothesis H is the negation of the disjunction or union of all the alternative hypotheses H i , and so it is equivalent to -( H 1 ∪…∪ H n ∪…). It is important to note the implicit use of something like the principal principle (Lewis 1980) in such an application of Bayes’ Theorem. The probability measure Pr figuring in the above equation is an agent’s degree of belief function. The statistical hypotheses H i specify the objective chance of the outcome E as Ch i ( E ). Without a principle linking objective chances to subjective degrees of belief, nothing guarantees that the agent’s conditional degree of belief in E given H i , Pr( E | H i ), is equal to the chance of E as specified by H i , Ch i ( E ). The principal principle says that an agent’s conditional degree of belief in a proposition A given the information that the chance of A is equal to r (and no further inadmissible information) should be r , Pr( A |Ch( A ) = r ) = r . For more on the principal principle see Hall (1994), Lewis (1994), Thau (1994), as well as Briggs (2009a). Spohn (2010) shows that the principal principle is a special case of the reflection principle (van Fraassen 1984; 1995, Briggs 2009b). The latter principle says that an agent’s current conditional degree of belief in A given that her future degree of belief in A equals r should be r ,

Pr now (A|Pr later (A) = r) = r provided Pr now (Pr later (A)=r) > 0.

Bayesian confirmation theory can also handle the ravens paradox. As we have seen, Hempel thought that “ a is neither black nor a raven” confirms “All ravens are black” relative to no or tautological background information. He attributed the unintuitive character of this claim to a conflation of it and the claim that “ a is neither black nor a raven” confirms “All ravens are black” relative to our actual background knowledge A— and the fact that A contains the information that there are more non-black objects than ravens. The latter information is reflected in our degree of belief function Pr by the inequality

Pr (¬ Ba | A ) > Pr ( Ra | A ).

If we further assume that the probabilities of finding a non-black object as well as finding a raven are independent of whether or not all ravens are black,

Pr (¬ Ba |∀ x ( Rx → Bx )∧ A ) = Pr (¬ Ba | A ), Pr ( Ra |∀ x ( Rx → Bx )∧ A ) = Pr ( Ra | A ),

we can infer (when we assume all probabilities to be defined) that

Pr (∀ x ( Rx → Bx )| Ra ∧ Ba ∧ A ) > Pr (∀ x ( Rx → Bx )|¬ Ra ∧¬ Ba ∧ A ) > Pr (∀ x ( Rx → Bx )| A ).

So Hempel’s intuitions are vindicated by Bayesian confirmation theory to the extent that the above independence assumptions are plausible (or there are weaker assumptions entailing a similar result), and to the extent that he also took non-black non-ravens to confirm the ravens hypothesis relative to our actual background knowledge. For more, see Vranas (2004).

Let us finally consider the problem of irrelevant conjunction in Bayesian confirmation theory. HD-confirmation satisfies the converse consequence condition, and so has the undesirable feature that E confirms H ∧ H’ relative to B whenever E confirms H relative to B , for any H’ whatsoever. This is not true for incremental confirmation. Even if Pr( E ∧ H | B ) > Pr( E | B )•Pr( H | B ), it need not be the case that Pr( E ∧ H ∧ H’ | B ) > Pr( E | B )•Pr( H ∧ H’ | B ). However, the following special case is also true for incremental confirmation.

If H ∧ B logically implies E , then E incrementally confirms H ∧ H’ relative to B , for any H’ whatsoever (whenever the relevant probabilities are defined).

In the spirit of the last paragraph, one can, however, show that H ∧ H’ is less confirmed by E relative to B than H alone (in the sense of the distance measure d and the Good-Fitelson measure l ) if H’ is an irrelevant conjunct to H given B with respect to E in the sense that

Pr( E | H ∧ H’ ∧ B ) = Pr( E | H ∧ B )

(Hawthorne & Fitelson 2004). If H ∧ B logically implies E , then every H’ such that Pr( H ∧ H’ ∧ B ) > 0 is irrelevant in this sense. For more see Fitelson (2002), Hawthorne & Fitelson (2004), Maher (2004b).

7. Taking Stock

Let us grant that Bayesian confirmation theory adequately explicates the concept of confirmation. If so, then this is the concept scientists use when they say that the anomalous perihelion of Mercury confirms the general theory of relativity. It is also the concept more ordinary epistemic agents use when they say that, relative to what they have experienced so far, the dark clouds on the sky are evidence that it will rain soon. The question remains what happened to Hume’s problem of the justification of induction. We know—by definition—that the conclusion of an inductively strong argument is well-confirmed by its premises. But does that also justify our acceptance of that conclusion? Don’t we first have to justify our definition of confirmation before we can use it to justify our inductive inferences?

It seems we would have to, but, as Hume argued, such a justification of induction is not possible. All we could hope for is an adequate description of our inductive practices. As we have seen, Goodman took the task of adequately describing induction as being tantamount to its justification (Goodman 1983, ch. III, ascribes a similar view to Hume, which is somehow peculiar, because Hume argued that a justification of induction is impossible ). In doing so he appealed to deductive logic, which he claimed to be justified by its conformity to accepted practices of deductive reasoning. But that is not so. Deductive logic is not justified because it adequately describes our practices of deductive reasoning—it doesn’t. The rules of deductive logic are justified relative to the goal of truth preservation in all possible worlds. The reasons are that (i) in going from the premises of a deductively valid argument to its conclusion, truth is preserved in all possible worlds (this is known as soundness); and that (ii) any argument with that property is a deductively valid argument (this is known as completeness). Similarly for the rules of nonmonotonic logic, which are justified relative to the goal of truth preservation in all “normal” worlds (for normality see e.g. Koons 2005). The reason is that all and only nonmonotonically valid inferences are such that truth is preserved in all normal worlds when one jumps from the premises to the conclusion (Kraus & Lehmann & Magidor 1990, for a survey see Makinson 1994). More generally, the justification of a canon of normative principles—such as the rules of deductive logic, the rules of nonmonotonic logic, or the rules of inductive logic—are only justified relative to a certain goal when one can show that adhering to these normative principles in some sense furthers the goal in question.

Much like Goodman, Carnap sought to justify the principles of his inductive logic by appeals to intuition (cf. the quote in section 4b). Contemporary Bayesian confirmation theorists with their desideratum/explicatum approach follow Carnap and Goodman at least insofar as they apparently do not see the need for justifying their accounts of confirmation by more than appeals to intuition. These are supposed to show that their definitions of confirmation are adequate. But the alleged impossibility of justifying induction does not entail that its adequate description or explication in form of a particular theory of confirmation is sufficient to justify inductive inferences based on that theory. Moreover, as noted by Reichenbach (1938; 1940), a justification of induction is not impossible after all. Hume was right in claiming that there is no deductively valid argument with knowable premises and the conclusion that inductively strong arguments lead from true premises to true conclusions. But this is not the only conclusion that would justify induction. Reichenbach was mainly interested in the limiting relative frequencies of particular types of events in various sequences of events. He could show that a particular inductive rule—the straight rule that conjectures that the limiting relative frequency is equal to the observed relative frequency—will converge to the true limiting relative frequency, if any inductive rule does. However, the straight rule is not the only rule with this property. Therefore its justification relative to the goal of converging to limiting relative frequencies is at least incomplete. If we want to keep the analogy to deductive logic, we can put things as follows: Reichenbach was able to establish the soundness, but not the completeness, of his inductive logic (that is, the straight rule) with respect to the goal of converging to the true limiting relative frequency. (Reichenbach himself provides an example that proves the incompleteness of the straight rule with respect to this goal.)

While soundness in this sense is not sufficient for a justification of the straight rule, such results provide more reasons than appeals to intuition. They are necessary conditions for the justification of a normative rule of inference relative to a particular goal of inquiry. A similar view about the justification of induction is held by formal learning theory. Here one considers the objective reliability with which a particular method (such as the straight rule or a particular confirmation measure) finds out the correct answer to a given question. The use of a method to answer a question is only justified when the method reliably answers the question, if any method does. As different questions differ in their complexity, there are different senses of reliability. A method may correctly answer a question after finitely many steps and with a sign that the question is answered correctly—as when we answer the question whether the first observed raven is black by saying “yes” if it is, and “no” otherwise. Or it may answer the question after finitely many steps and with a sign that it has done so when the answer is “yes,” but not when the answer is “no”—as when we answer the question whether there exists a black raven by saying “yes” when we first observe a black raven, and by saying “no” otherwise. Or it may stabilize to the correct answer in the sense that the method conjectures the right answer after finitely many steps and continues to do so forever without necessarily giving a sign that it has arrived at the correct answer—as when we answer the question whether the limiting relative frequency of black ravens among all ravens is greater than .5 by saying “yes” as long as the observed relative frequency is greater than .5, and by saying “no” otherwise (under the assumption that this limit exists). And so on. This provides a classification of all problems in terms of their complexity. The use of a particular method for answering a question of a certain complexity is only justified if the method reliably answers the question in the sense of reliability determined by the complexity of the question. A discussion of Bayesian confirmation theory from the point of view of formal learning theory can be found in Kelly & Glymour (2004). Schulte (2002) gives an introduction to the main philosophical ideas of formal learning theory. A technically advanced book-length exposition is Kelly (1996). The general idea is the same as before. A rule is justified relative to a certain goal to the extent that the rule furthers achieving the goal.
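As an illustration, here is a sketch of a method of the third, weakest kind: it answers the question whether the limiting relative frequency of black ravens exceeds .5 by conjecturing “yes” just in case the observed relative frequency does. The data stream is made up for the example.

```python
def limit_exceeds_half(stream):
    """After each observation, conjecture whether the limiting
    relative frequency of 1s exceeds .5, by answering 'yes' iff
    the observed relative frequency exceeds .5."""
    ones = 0
    for n, x in enumerate(stream, start=1):
        ones += x
        yield "yes" if ones / n > 0.5 else "no"

# On a stream whose limiting relative frequency is above .5, the
# method stabilizes to "yes" after finitely many steps, but it never
# signals that it has reached its final answer.
stream = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
print(list(limit_exceeds_half(stream)))
```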

So can we justify particular inductive rules in the form of confirmation measures along these lines? We had better, for otherwise there might be inductive rules that would reliably lead us to the correct answer to a question where our inductive rules won’t (cf. Putnam 1963a; see also his 1963b). Before answering this question, let us first be clear which goal confirmation is supposed to further. In other words, why should we accept well-confirmed hypotheses rather than any other hypotheses? A natural answer is that science and our more ordinary epistemic enterprises aim at true hypotheses. The justification for confirmation would then be that we should accept well-confirmed hypotheses, because we are in some sense guaranteed to arrive at true hypotheses if (and only if) we stick to well-confirmed hypotheses. Something along these lines is true for absolute confirmation, according to which degree of confirmation is equal to probability conditional on the data. More precisely, the Gaifman and Snir convergence theorem (Gaifman & Snir 1982) says that for almost every world or model w for the underlying language—that is, all worlds w except, possibly, for those in a set of measure 0 (in the sense of the measure Pr* on the σ-field A from section 4a)—the probability of a hypothesis conditional on the first n data sentences from w converges to its truth value in w (1 for true, 0 for false). It is assumed here that the set of all data sentences separates the set of all worlds (in the sense that for any two distinct worlds there is a data sentence which is true in the one and false in the other world). If we accept a hypothesis as true as soon as its probability is greater than .5 (or any other positive threshold value < 1), and reject it as false otherwise, we are guaranteed to almost surely arrive at true hypotheses after finitely many steps. That does not mean that no other method can do equally well. But it is more than a mere appeal to our intuitions, and it is a necessary condition for the justification of absolute confirmation relative to the goal of truth. See also Earman (1992, ch. 9) and Juhl (1997).
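The convergence phenomenon can be illustrated with a toy simulation; this is not the Gaifman and Snir theorem itself (which concerns probabilities over rich languages), just a sketch in which one of two made-up statistical hypotheses about a coin is true and the posterior is computed by Bayes’ theorem.

```python
import random

random.seed(1)
bias_true, bias_alt = 0.7, 0.4   # H says the bias is 0.7; it is true
posterior = 0.5                  # prior Pr(H)

for n in range(1, 1001):
    toss = 1 if random.random() < bias_true else 0
    like_h = bias_true if toss else 1 - bias_true
    like_alt = bias_alt if toss else 1 - bias_alt
    # Bayes' theorem: condition on the new data point.
    posterior = posterior * like_h / (
        posterior * like_h + (1 - posterior) * like_alt)
    if n in (1, 10, 100, 1000):
        print(n, round(posterior, 4))
# The probability of H conditional on the data converges to H's
# truth value in this world, namely 1.
```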

A more limited result is true for incremental confirmation. Based on the Gaifman and Snir convergence theorem one can show for every confirmation measure c and almost all worlds w that there is an n such that for all later m : the conjunction of the first m data sentences confirms hypotheses that are true in w to a non-negative degree, and it confirms hypotheses that are false in w to a non-positive degree (the set of all data sentences is again assumed to separate the set of all worlds). Even if this more limited result were a satisfying justification for the claim that incremental confirmation furthers the goal of truth, the question remains why one has to go to incremental confirmation in order to arrive at true theories. It also remains unclear what degrees of incremental confirmation are supposed to indicate, for it is completely irrelevant for the above result whether a positive degree of confirmation is high or low—all that matters is that it is positive. This is in contrast to absolute confirmation. There a high number represents a high probability—that is, a high probability of being true—which almost surely converges to the truth value itself. To make these vague remarks more vivid, let us consider an example.

Suppose I know I get a bottle of wine for my birthday, and I am curious as to whether it is a bottle of red wine (A), white wine (B), or rosé (C). It is common knowledge that I like red wine, and so my initial degree of belief function Pr is such that

Pr(A) = .9, Pr(B) = Pr(C) = .05, Pr(A ∧ B) = Pr(A ∧ C) = Pr(B ∧ C) = 0, Pr(A ∨ B) = Pr(A ∨ C) = .95, Pr(B ∨ C) = .1, Pr(A ∨ B ∨ C) = 1, Pr(A ∧ G) = .4, Pr(B ∧ G) = .03, Pr(C ∧ G) = .03, Pr(G) = .46,

where G is the proposition that I will get a bottle of Austrian wine. [More precisely, the probability space is <L, Pr> with L the propositional language over the set of propositional variables {A, B, C, G} and Pr such that Pr(A ∧ G) = .4, Pr(B ∧ G) = .03, Pr(C ∧ G) = .03, Pr(A ∧ ¬G) = .5, Pr(B ∧ ¬G) = .02, Pr(C ∧ ¬G) = .02, Pr(A ∧ B) = Pr(A ∧ C) = Pr(B ∧ C) = Pr(¬A ∧ ¬B ∧ ¬C) = 0.] This is a fairly reasonable degree of belief function. Most wine from Austria is white wine or rosé, although there are some Austrian red wines as well. Furthermore, I tend to use the principal principle whenever I can (assuming a close connection between objective chances and relative frequencies). Now suppose I learn that I will get a bottle of Austrian wine, G. My new degrees of belief are

Pr(A | G) = 40/46, Pr(B | G) = Pr(C | G) = 3/46, Pr(A ∨ B | G) = Pr(A ∨ C | G) = 43/46, Pr(B ∨ C | G) = 6/46, Pr(A ∨ B ∨ C | G) = 1.

G incrementally confirms B, C, and B ∨ C; it neither incrementally confirms nor incrementally disconfirms A ∨ B ∨ C; and it incrementally disconfirms A, A ∨ B, and A ∨ C.
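These verdicts can be checked mechanically. The sketch below assumes nothing beyond the atom probabilities given in brackets above; it compares each prior with the corresponding posterior.

```python
# Atoms: (wine, Austrian?) with the probabilities from the example.
atoms = {("A", True): 0.40, ("A", False): 0.50,
         ("B", True): 0.03, ("B", False): 0.02,
         ("C", True): 0.03, ("C", False): 0.02}

def pr(wines, austrian=None):
    return sum(p for (w, g), p in atoms.items()
               if w in wines and (austrian is None or g == austrian))

pr_g = pr("ABC", austrian=True)  # Pr(G) = 0.46
for h in ("A", "B", "C", "AB", "AC", "BC", "ABC"):
    prior, post = pr(h), pr(h, austrian=True) / pr_g
    verdict = ("confirms" if post > prior else
               "disconfirms" if post < prior else "is neutral on")
    print(f"G {verdict} {' v '.join(h)}: {prior:.3f} -> {post:.3f}")
```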

However, my degree of belief in A is still more than thirteen times my degree of belief in B and my degree of belief in C . And whether I have to bet on these propositions or whether I am just curious what bottle of wine I will get, all I care about after having received evidence G will be my new degrees of belief in the various answers—and my utilities, including my desire to answer the question. I will be willing to bet on A at less favorable odds than on either B or C or even their disjunction; and should I buy new wine glasses for the occasion, I would buy red wine glasses. In this situation, incremental confirmation and degrees of incremental confirmation are at best misleading.

[What is important is a way of updating my old degree of belief function by the incoming evidence. The above example assumes evidence to come in the form of a proposition that I become certain of. In this case, probabilism says I should update my degree of belief function by Strict Conditionalization (see Vineberg 2000):

If Pr is your subjective probability at time t, and between t and t’ you learn E and no logically stronger proposition (in the sense that your new degree of belief in E is 1), then your new subjective probability at time t’ should be Pr(•|E).
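On a finite space, Strict Conditionalization is a two-line operation. A minimal sketch, reusing the wine example’s atoms (the world representation is an illustrative choice):

```python
def strict_conditionalize(pr, e):
    """Return Pr(.|E): zero out the non-E worlds and renormalize."""
    pr_e = sum(p for w, p in pr.items() if e(w))
    return {w: (p / pr_e if e(w) else 0.0) for w, p in pr.items()}

# Worlds are (wine, Austrian?) pairs with the example's probabilities.
pr = {("A", True): 0.40, ("A", False): 0.50,
      ("B", True): 0.03, ("B", False): 0.02,
      ("C", True): 0.03, ("C", False): 0.02}
print(strict_conditionalize(pr, lambda w: w[1]))  # learn G
```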

As Jeffrey (1983) observes, we usually do not learn by becoming certain of a proposition. Evidence often merely changes our degrees of belief in various propositions. Jeffrey Conditionalization is a more general update rule than Strict Conditionalization:

If Pr is your subjective probability at time t, and between t and t’ your degrees of belief in the countable partition {E1, …, En, …} change from Pr(Ei) to pi ∈ [0,1] (with pi = Pr(Ei) whenever Pr(Ei) ∈ {0,1}), and your positive degrees of belief do not change on any superset thereof, then your new subjective probability at time t’ should be Pr*, where Pr*(A) = Σi Pr(A|Ei)•pi for all A.
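A sketch of Jeffrey Conditionalization on a finite space. The example (experience by candlelight shifts my degree of belief that the cloth is green from .3 to .7 without making anything certain) is loosely modeled on Jeffrey’s own; the numbers are made up, and every cell of the partition is assumed to have positive prior probability.

```python
def jeffrey_conditionalize(pr, cells):
    """cells: list of (predicate E_i, new probability p_i) forming a
    partition of the worlds. Then Pr*(w) = Pr(w|E_i) * p_i for the
    cell E_i containing w, so Pr*(A) = sum_i Pr(A|E_i) * p_i."""
    weights = [(e, p_i / sum(q for w, q in pr.items() if e(w)))
               for e, p_i in cells]
    return {w: p * next(k for e, k in weights if e(w))
            for w, p in pr.items()}

pr = {("green", "sold"): 0.12, ("green", "kept"): 0.18,
      ("blue", "sold"): 0.42, ("blue", "kept"): 0.28}
post = jeffrey_conditionalize(pr, [(lambda w: w[0] == "green", 0.7),
                                   (lambda w: w[0] == "blue", 0.3)])
print(post)  # Pr*(green) = .7; Strict Conditionalization is p_i = 1
```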

For evidential input of the above form, Jeffrey Conditionalization turns regular probability measures into regular probability measures, provided no contingent evidential proposition receives an extreme value p ∈ {0,1}. Radical probabilism (Jeffrey 2004) urges you not to assign such extreme values and to adopt a regular initial degree of belief function whenever you can (but you cannot always). Field (1978) proposes an update rule for evidence of a different format.

This is also the place to mention different formal frameworks besides probability theory. For an overview, see Huber (2008a).]

More generally, degrees of belief are important to us because, together with our desires, they determine which acts it is rational for us to perform. The usual recommendation of rational choice theory for choosing one’s acts is to maximize one’s expected utility (utility being the mathematical representation of one’s desires), that is, the quantity

EU(a) = Σs∈S u(a(s))•Pr(s).

Here S is an exclusive and exhaustive set of states, u is the agent’s utility function over the set of outcomes a(s), which are the results of an act a in a state s (acts are identified with functions from states s to outcomes), and Pr is the agent’s probability measure on a field over S (Savage 1972, Joyce 1999, Buchak 2014). From this decision-theoretic point of view all we need—besides our utilities—are our degrees of belief encoded in Pr. Degrees of confirmation encoding how much one proposition increases the probability of another are of no use here.
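A minimal sketch of the recommendation, with made-up states, acts, and utilities:

```python
# Pr over the states S, and u(a(s)) for each act a and state s.
pr = {"rain": 0.3, "shine": 0.7}
acts = {"take umbrella": {"rain": 5, "shine": 4},
        "leave umbrella": {"rain": -2, "shine": 6}}

def eu(outcomes, pr):
    """EU(a) = sum over states s of u(a(s)) * Pr(s)."""
    return sum(u * pr[s] for s, u in outcomes.items())

for a, outcomes in acts.items():
    print(a, eu(outcomes, pr))
# take umbrella: 4.3 > leave umbrella: 3.6, so taking it maximizes EU.
```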

In the above example I only consider the propositions A, B, C, because they are sufficiently informative to answer my question. If truth were the only thing I was interested in, I would be happy with the tautological answer that I will get some bottle of wine, A ∨ B ∨ C. But I am not. The reason is that I want to know what is going on out there—not only in the sense of having true beliefs, but also in the sense of having informative beliefs. In terms of decision theory, my decisions do not only depend on my degrees of belief—they also depend on my utilities. This is the idea behind the plausibility-informativeness theory (Huber 2008b), according to which epistemic utilities reduce to informativeness values. If we take as our epistemic utilities in the above example the informativeness values of the various answers (with positive probability) to our question, we get

I(A) = I(B) = I(C) = 1, I(A ∨ B) = I(A ∨ C) ≈ 40/83, I(B ∨ C) = 60/83, I(A ∨ B ∨ C) = 0,

where the question “What bottle of wine will I get for my birthday?” is represented by the partition Q = {A, B, C} and the informativeness values of the various answers are calculated according to

I(A) = 1 – [1 – Σi Pr*(Xi|A)²]/[1 – Σi Pr*(Xi)²],

a measure proposed by Hilpinen (1970). Contrary to what Hilpinen (1970, 112) claims, I(A) does not increase with the logical strength of A. The probability Pr* is the posterior degree of belief function from our example, Pr(•|G). If we insert these values into the expected utility formula,

EU(a) = Σs∈S u(a(s))•Pr*(s) = ΣX∈Q u(a(X))•Pr*(X) = ΣX∈Q I(X)•Pr*(X),

we get the result that the act of accepting A as answer to our question maximizes our expected epistemic utility.

Not all is lost, however. The distance measure d turns out to measure the expected utility of accepting H when utility is identified with informativeness measured according to a measure proposed by Carnap & Bar-Hillel (1953) (one can think of this measure as measuring how much an answer informs about the most difficult question, namely: which world is the actual one?). Similarly, the Joyce-Christensen measure s turns out to measure the expected utility of accepting H when utility is identified with informativeness about the data measured according to a proposal by Hempel & Oppenheim (1948). So far, this is merely interesting. It becomes important once we note that d and s can also be justified relative to the goal of informative truth—and not just by appealing to our intuitions about maximizing expected utility. When based on a regular probability, there almost surely is an n such that for all later m: relative to the conjunction of the first m data sentences, contingently true hypotheses get a positive value and contingently false hypotheses get a negative value. Moreover, within the true hypotheses, logically stronger hypotheses get a higher value than logically weaker hypotheses. The logically strongest true hypothesis (the complete true theory about the world w) gets the highest value, followed by all logically weaker true hypotheses all the way down to the logically weakest true hypothesis, the tautology, which is sent to 0. Similarly within the false hypotheses: the logically strongest false hypothesis, the contradiction, is sent to 0, followed by all logically weaker false hypotheses all the way down to the logically weakest false hypothesis (the negation of the complete theory about w). As informativeness increases with logical strength, we can put this as follows (assuming that the underlying probability measure is regular): d and s do not only distinguish between true and false theories, as do all confirmation measures (as well as all conditional probabilities). They additionally distinguish between informative and uninformative true theories, as well as between informative and uninformative false theories. In this sense, they reveal the following structure of almost every world w [w(p) = w(q) = 1 in the toy example]:

  • informative and contingently true in w (value > 0): p ∧ q
  • contingently true in w (value > 0): p, q, p ↔ q
  • uninformative and contingently true in w (value > 0): p ∨ q, ¬p ∨ q, p ∨ ¬q
  • logically determined (value = 0): p ∨ ¬p, p ∧ ¬p
  • informative and contingently false in w (value < 0): ¬p ∧ ¬q, p ∧ ¬q, ¬p ∧ q
  • contingently false in w (value < 0): ¬p, ¬q, p ↔ ¬q
  • uninformative and contingently false in w (value < 0): ¬p ∨ ¬q

This result is also true for the Carnap measure c, but it does not extend to all confirmation measures. It is false for the Milne measure r, which does not distinguish between informative and uninformative false theories. And it is false for the Good-Fitelson measure l, which distinguishes neither between informative and uninformative true theories nor between informative and uninformative false theories. For more see Huber (2005b).

The reason c , d , and s have this property of distinguishing between informative and uninformative truth and falsehood is that they are probabilistic assessment functions in the sense of the plausibility-informativeness theory (Huber 2008b)—and the above result is true for all probabilistic assessment functions (not only those that can be expressed as expected utilities). The plausibility-informativeness theory agrees with traditional philosophy that truth is an epistemic goal. Its distinguishing thesis is that there is a second epistemic goal besides truth, namely, informativeness, which has to be taken into account when we evaluate hypotheses. Like confirmation theory, the plausibility-informativeness theory assigns numbers to hypotheses in the light of evidence. But unlike confirmation theory, it does not appeal to intuitions when it comes to the question why one is justified in accepting hypotheses with high assessment values. The plausibility-informativeness theory answers this question by showing that accepting hypotheses according to the recommendation of an assessment function almost surely leads one to (the most) informative (among all) true hypotheses.

It is idle to speculate what Hume would have said to all this. Suffice it to note that his problem would not have gotten off the ground without our desire for informativeness.

8. References and Further Reading

  • Albert, Max (1992), “Die Falsifikation Statistischer Hypothesen.” Journal for General Philosophy of Science 23, 1-32.
  • Alchourrón, Carlos E. & Gärdenfors, Peter & Makinson, David (1985), “On the Logic of Theory Change: Partial Meet Contraction and Revision Functions.” Journal of Symbolic Logic 50, 510-530.
  • Briggs, Rachael (2009a), “The Big Bad Bug Bites Anti-Realists About Chance.” Synthese 167, 81-92.
  • Briggs, Rachael (2009b), “Distorted Reflection.” Philosophical Review 118, 59-85.
  • Buchak, Lara (2014), Risk and Rationality. Oxford: Oxford University Press.
  • Carnap, Rudolf (1950/1962), Logical Foundations of Probability. 2nd ed. Chicago: University of Chicago Press.
  • Carnap, Rudolf (1952), The Continuum of Inductive Methods. Chicago: University of Chicago Press.
  • Carnap, Rudolf (1963), “Replies and Systematic Expositions. Probability and Induction.” In P.A. Schilpp (ed.), The Philosophy of Rudolf Carnap. La Salle, IL: Open Court, 966-998.
  • Carnap, Rudolf & Bar-Hillel, Yehoshua (1953), An Outline of a Theory of Semantic Information. Technical Report 247. Research Laboratory of Electronics, MIT. Reprinted in Y. Bar-Hillel (1964), Language and Information. Selected Essays on Their Theory and Application. Reading, MA: Addison-Wesley, 221-274.
  • Christensen, David (1999), “Measuring Confirmation.” Journal of Philosophy 96, 437-461.
  • Crupi, Vincenzo & Tentori, Katya & Gonzalez, Michel (2007), “On Bayesian Measures of Evidential Support: Theoretical and Empirical Issues.” Philosophy of Science 74, 229-252.
  • Duhem, Pierre (1906/1974), The Aim and Structure of Physical Theory. New York: Atheneum.
  • Earman, John (1992), Bayes or Bust? A Critical Examination of Bayesian Confirmation Theory. Cambridge, MA: MIT Press.
  • Eells, Ellery (2005), “Confirmation Theory.” In J. Pfeifer & S. Sarkar (eds.), The Philosophy of Science. An Encyclopedia. Oxford: Routledge.
  • Field, Hartry (1978), “A Note on Jeffrey Conditionalization.” Philosophy of Science 45, 361-367.
  • Fitelson, Branden (1999), “The Plurality of Bayesian Measures of Confirmation and the Problem of Measure Sensitivity.” Philosophy of Science 66 (Proceedings), S362-S378.
  • Fitelson, Branden (2001), Studies in Bayesian Confirmation Theory. PhD Dissertation. Madison, WI: University of Wisconsin-Madison.
  • Fitelson, Branden (2002), “Putting the Irrelevance Back Into the Problem of Irrelevant Conjunction.” Philosophy of Science 69, 611-622.
  • Fitelson, Branden (2005), “Inductive Logic.” In J. Pfeifer & S. Sarkar (eds.), The Philosophy of Science. An Encyclopedia. Oxford: Routledge.
  • Fitelson, Branden & Hájek, Alan & Hall, Ned (2005), “Probability.” In J. Pfeifer & S. Sarkar (eds.), The Philosophy of Science. An Encyclopedia. Oxford: Routledge.
  • Gaifman, Haim & Snir, Marc (1982), “Probabilities over Rich Languages, Testing, and Randomness.” Journal of Symbolic Logic 47, 495-548.
  • Gärdenfors, Peter (1988), Knowledge in Flux. Modeling the Dynamics of Epistemic States. Cambridge, MA: MIT Press.
  • Gärdenfors, Peter & Rott, Hans (1995), “Belief Revision.” In D.M. Gabbay & C.J. Hogger & J.A. Robinson (eds.), Handbook of Logic in Artificial Intelligence and Logic Programming. Vol. 4. Epistemic and Temporal Reasoning. Oxford: Clarendon Press, 35-132.
  • Glymour, Clark (1980), Theory and Evidence. Princeton: Princeton University Press.
  • Good, Irving John (1967), “The White Shoe is a Red Herring.” British Journal for the Philosophy of Science 17, 322.
  • Good, Irving John (1968), “The White Shoe qua Herring is Pink.” British Journal for the Philosophy of Science 19, 156-157.
  • Good, Irving John (1983), Good Thinking: The Foundations of Probability and Its Applications. Minneapolis: University of Minnesota Press.
  • Goodman, Nelson (1946), “A Query on Confirmation.” Journal of Philosophy 43, 383-385.
  • Goodman, Nelson (1983), Fact, Fiction, and Forecast. 4th ed. Cambridge, MA: Harvard University Press.
  • Grimes, Thomas R. (1990), “Truth, Content, and the Hypothetico-Deductive Method.” Philosophy of Science 57, 514-522.
  • Hacking, Ian (2001), An Introduction to Probability and Inductive Logic. Cambridge: Cambridge University Press.
  • Hájek, Alan (2003a), “Interpretations of Probability.” In E.N. Zalta (ed.), Stanford Encyclopedia of Philosophy.
  • Hájek, Alan (2003b), “What Conditional Probability Could Not Be.” Synthese 137, 273-323.
  • Hájek, Alan (2005), “Scotching Dutch Books?” Philosophical Perspectives 19 (Epistemology), 139-151.
  • Hájek, Alan & Hall, Ned (2000), “Induction and Probability.” In P. Machamer & M. Silberstein (eds.), The Blackwell Guide to the Philosophy of Science. Oxford: Blackwell, 149-172.
  • Hall, Ned (1994), “Correcting the Guide to Objective Chance.” Mind 103, 505-518.
  • Hawthorne, James (2005), “Inductive Logic.” In E.N. Zalta (ed.), Stanford Encyclopedia of Philosophy.
  • Hawthorne, James & Fitelson, Branden (2004), “Re-solving Irrelevant Conjunction with Probabilistic Independence.” Philosophy of Science 71, 505-514.
  • Hempel, Carl Gustav (1945), “Studies in the Logic of Confirmation.” Mind 54, 1-26, 97-121.
  • Hempel, Carl Gustav (1962), “Deductive-Nomological vs. Statistical Explanation.” In H. Feigl & G. Maxwell (eds.), Scientific Explanation, Space and Time. Minnesota Studies in the Philosophy of Science 3. Minneapolis: University of Minnesota Press, 98-169.
  • Hempel, Carl Gustav (1967), “The White Shoe: No Red Herring.” British Journal for the Philosophy of Science 18, 239-240.
  • Hempel, Carl Gustav & Oppenheim, Paul (1948), “Studies in the Logic of Explanation.” Philosophy of Science 15, 135-175.
  • Hilpinen, Risto (1970), “On the Information Provided by Observations.” In J. Hintikka & P. Suppes (eds.), Information and Inference. Dordrecht: D. Reidel, 97-122.
  • Hintikka, Jaakko (1966), “A Two-Dimensional Continuum of Inductive Methods.” In J. Hintikka & P. Suppes (eds.), Aspects of Inductive Logic. Amsterdam: North-Holland, 113-132.
  • Hitchcock, Christopher R. (2001), “The Intransitivity of Causation Revealed in Graphs and Equations.” Journal of Philosophy 98, 273-299.
  • Howson, Colin (2000a), Hume’s Problem: Induction and the Justification of Belief. Oxford: Oxford University Press.
  • Howson, Colin (2000b), “Evidence and Confirmation.” In W.H. Newton-Smith (ed.), A Companion to the Philosophy of Science. Oxford: Blackwell, 108-116.
  • Howson, Colin & Urbach, Peter (1989/2005), Scientific Reasoning: The Bayesian Approach. 3rd ed. La Salle, IL: Open Court.
  • Huber, Franz (2005a), “Subjective Probabilities as Basis for Scientific Reasoning?” British Journal for the Philosophy of Science 56, 101-116.
  • Huber, Franz (2005b), “What Is the Point of Confirmation?” Philosophy of Science 72, 1146-1159.
  • Huber, Franz (2008a), “Formal Epistemology.” In E.N. Zalta (ed.), Stanford Encyclopedia of Philosophy.
  • Huber, Franz (2008b), “Assessing Theories, Bayes Style.” Synthese 161, 89-118.
  • Hume, David (1739/2000), A Treatise of Human Nature. Ed. by D.F. Norton & M.J. Norton. Oxford: Oxford University Press.
  • Jeffrey, Richard C. (1965/1983), The Logic of Decision. 2nd ed. Chicago: University of Chicago Press.
  • Jeffrey, Richard C. (2004), Subjective Probability: The Real Thing. Cambridge: Cambridge University Press.
  • Jeffreys, Harold (1939/1967), Theory of Probability. 3rd ed. Oxford: Clarendon Press.
  • Joyce, James M. (1998), “A Non-Pragmatic Vindication of Probabilism.” Philosophy of Science 65, 575-603.
  • Joyce, James M. (1999), The Foundations of Causal Decision Theory. Cambridge: Cambridge University Press.
  • Joyce, James M. (2003), “Bayes’s Theorem.” In E.N. Zalta (ed.), Stanford Encyclopedia of Philosophy.
  • Juhl, Cory (1997), “Objectively Reliable Subjective Probabilities.” Synthese 109, 293-309.
  • Kelly, Kevin T. (1996), The Logic of Reliable Inquiry. Oxford: Oxford University Press.
  • Kelly, Kevin T. & Glymour, Clark (2004), “Why Probability Does Not Capture the Logic of Scientific Justification.” In C. Hitchcock (ed.), Contemporary Debates in the Philosophy of Science. Oxford: Blackwell, 94-114.
  • Keynes, John Maynard (1921/1973), A Treatise on Probability. The Collected Writings of John Maynard Keynes. Vol. III. New York: St. Martin’s Press.
  • Kolmogoroff, Andrej N. (1933), Grundbegriffe der Wahrscheinlichkeitsrechnung. Berlin: Springer.
  • Kolmogorov, Andrej N. (1956), Foundations of the Theory of Probability. 2nd ed. New York: Chelsea Publishing Company.
  • Koons, Robert (2005), “Defeasible Reasoning.” In E.N. Zalta (ed.), Stanford Encyclopedia of Philosophy.
  • Kraus, Sarit & Lehmann, Daniel & Magidor, Menachem (1990), “Nonmonotonic Reasoning, Preferential Models, and Cumulative Logics.” Artificial Intelligence 40, 167-207.
  • Kuipers, Theo A.F. (2000), From Instrumentalism to Constructive Realism. On Some Relations between Confirmation, Empirical Progress, and Truth Approximation. Dordrecht: Kluwer.
  • Kyburg, Henry E. Jr. (1961), Probability and the Logic of Rational Belief. Middletown, CT: Wesleyan University Press.
  • Lewis, David (1980), “A Subjectivist’s Guide to Objective Chance.” In R.C. Jeffrey (ed.), Studies in Inductive Logic and Probability. Vol. II. Berkeley: University of California Press, 263-293. Reprinted in D. Lewis (1986), Philosophical Papers. Vol. II. Oxford: Oxford University Press, 83-113.
  • Lewis, David (1994), “Humean Supervenience Debugged.” Mind 103, 473-490.
  • Maher, Patrick (1999), “Inductive Logic and the Ravens Paradox.” Philosophy of Science 66, 50-70.
  • Maher, Patrick (2004a), “Probability Captures the Logic of Scientific Confirmation.” In C. Hitchcock (ed.), Contemporary Debates in Philosophy of Science. Oxford: Blackwell, 69-93.
  • Maher, Patrick (2004b), “Bayesianism and Irrelevant Conjunction.” Philosophy of Science 71, 515-520.
  • Makinson, David (1994), “General Patterns in Nonmonotonic Logic.” In D.M. Gabbay & C.J. Hogger & J.A. Robinson (eds.), Handbook of Logic in Artificial Intelligence and Logic Programming. Vol. 3. Nonmonotonic Reasoning and Uncertain Reasoning. Oxford: Clarendon Press, 35-110.
  • Milne, Peter (1996), “log[P(h/eb)/P(h/b)] is the One True Measure of Confirmation.” Philosophy of Science 63, 21-26.
  • Moretti, Luca (2004), “Grimes on the Tacking by Disjunction Problem.” Disputatio 17, 16-20.
  • Pearl, Judea (2000), Causality: Models, Reasoning, and Inference. Cambridge: Cambridge University Press.
  • Popper, Karl R. (1935/1994), Logik der Forschung. Tübingen: J.C.B. Mohr.
  • Putnam, Hilary (1963a), “Degree of Confirmation and Inductive Logic.” In P.A. Schilpp (ed.), The Philosophy of Rudolf Carnap. La Salle, IL: Open Court, 761-784. Reprinted in H. Putnam (1975/1979), Mathematics, Matter and Method. 2nd ed. Cambridge: Cambridge University Press, 270-292.
  • Putnam, Hilary (1963b), “Probability and Confirmation.” The Voice of America, Forum Philosophy of Science 10, U.S. Information Agency. Reprinted in H. Putnam (1975/1979), Mathematics, Matter and Method. 2nd ed. Cambridge: Cambridge University Press, 293-304.
  • Quine, Willard Van Orman (1953), “Two Dogmas of Empiricism.” The Philosophical Review 60, 20-43.
  • Quine, Willard Van Orman (1969), “Natural Kinds.” In N. Rescher et al. (eds.), Essays in Honor of Carl G. Hempel. Dordrecht: Reidel, 5-23.
  • Reichenbach, Hans (1938), Experience and Prediction. An Analysis of the Foundations and the Structure of Knowledge. Chicago: University of Chicago Press.
  • Reichenbach, Hans (1940), “On the Justification of Induction.” Journal of Philosophy 37, 97-103.
  • Rosenkrantz, Roger (1981), Foundations and Applications of Inductive Probability. New York: Ridgeview.
  • Roush, Sherrilyn (2005), “Problem of Induction.” In J. Pfeifer & S. Sarkar (eds.), The Philosophy of Science. An Encyclopedia. Oxford: Routledge.
  • Savage, Leonard J. (1954/1972), The Foundations of Statistics. 2nd ed. New York: Dover.
  • Schulte, Oliver (2002), “Formal Learning Theory.” In E.N. Zalta (ed.), Stanford Encyclopedia of Philosophy.
  • Skyrms, Brian (2000), Choice and Chance. An Introduction to Inductive Logic. 4th ed. Belmont, CA: Wadsworth Thomson Learning.
  • Spohn, Wolfgang (1988), “Ordinal Conditional Functions: A Dynamic Theory of Epistemic States.” In W.L. Harper & B. Skyrms (eds.), Causation in Decision, Belief Change, and Statistics II. Dordrecht: Kluwer, 105-134.
  • Spohn, Wolfgang (2010), “Chance and Necessity: From Humean Supervenience to Humean Projection.” In E. Eells & J. Fetzer (eds.), The Place of Probability in Science. Boston Studies in the Philosophy of Science 284. Dordrecht: Springer, 101-131.
  • Stalker, Douglas F. (ed.) (1994), Grue! The New Riddle of Induction. Chicago: Open Court.
  • Thau, Michael (1994), “Undermining and Admissibility.” Mind 103, 491-504.
  • van Fraassen, Bas C. (1984), “Belief and the Will.” Journal of Philosophy 81, 235-256.
  • van Fraassen, Bas C. (1995), “Belief and the Problem of Ulysses and the Sirens.” Philosophical Studies 77, 7-37.
  • Vineberg, Susan (2000), “The Logical Status of Conditionalization and its Role in Confirmation.” In N. Shanks & R.B. Gardner (eds.), Logic, Probability and Science. Poznan Studies in the Philosophy of the Sciences and the Humanities 71. Amsterdam: Rodopi, 77-94.
  • Vineberg, Susan (2005), “Dutch Book Argument.” In J. Pfeifer & S. Sarkar (eds.), The Philosophy of Science. An Encyclopedia. Oxford: Routledge.
  • Vranas, Peter B.M. (2004), “Hempel’s Raven Paradox: A Lacuna in the Standard Bayesian Solution.” British Journal for the Philosophy of Science 55, 545-560.
  • Woodward, James F. (2003), Making Things Happen. A Theory of Causal Explanation. Oxford: Oxford University Press.

Author Information

Franz Huber, Email: [email protected], California Institute of Technology, U.S.A.


Grad Coach

What Is A Research (Scientific) Hypothesis? A plain-language explainer + examples

By:  Derek Jansen (MBA)  | Reviewed By: Dr Eunice Rautenbach | June 2020

If you’re new to the world of research, or it’s your first time writing a dissertation or thesis, you’re probably noticing that the words “research hypothesis” and “scientific hypothesis” are used quite a bit, and you’re wondering what they mean in a research context.

“Hypothesis” is one of those words that people use loosely, thinking they understand what it means. However, it has a very specific meaning within academic research. So, it’s important to understand the exact meaning before you start hypothesizing. 

Research Hypothesis 101

  • What is a hypothesis?
  • What is a research hypothesis (scientific hypothesis)?
  • Requirements for a research hypothesis
  • Definition of a research hypothesis
  • The null hypothesis

What is a hypothesis?

Let’s start with the general definition of a hypothesis (not a research hypothesis or scientific hypothesis), according to the Cambridge Dictionary:

Hypothesis: an idea or explanation for something that is based on known facts but has not yet been proved.

In other words, it’s a statement that provides an explanation for why or how something works, based on facts (or some reasonable assumptions), but that has not yet been specifically tested. For example, a hypothesis might look something like this:

Hypothesis: sleep impacts academic performance.

This statement predicts that academic performance will be influenced by the amount and/or quality of sleep a student engages in – sounds reasonable, right? It’s based on reasonable assumptions, underpinned by what we currently know about sleep and health (from the existing literature). So, loosely speaking, we could call it a hypothesis, at least by the dictionary definition.

But that’s not good enough…

Unfortunately, that’s not quite sophisticated enough to describe a research hypothesis (also sometimes called a scientific hypothesis), and it wouldn’t be acceptable in a dissertation, thesis or research paper. In the world of academic research, a statement needs a few more criteria to constitute a true research hypothesis.

What is a research hypothesis?

A research hypothesis (also called a scientific hypothesis) is a statement about the expected outcome of a study (for example, a dissertation or thesis). To constitute a quality hypothesis, the statement needs to have three attributes – specificity, clarity and testability.

Let’s take a look at these more closely.


Hypothesis Essential #1: Specificity & Clarity

A good research hypothesis needs to be extremely clear and articulate about both what’s being assessed (who or what variables are involved) and the expected outcome (for example, a difference between groups, a relationship between variables, etc.).

Let’s stick with our sleepy students example and look at how this statement could be more specific and clear.

Hypothesis: Students who sleep at least 8 hours per night will, on average, achieve higher grades in standardised tests than students who sleep less than 8 hours a night.

As you can see, the statement is very specific as it identifies the variables involved (sleep hours and test grades), the parties involved (two groups of students), as well as the predicted relationship type (a positive relationship). There’s no ambiguity or uncertainty about who or what is involved in the statement, and the expected outcome is clear.

Contrast that to the original hypothesis we looked at – “Sleep impacts academic performance” – and you can see the difference. “Sleep” and “academic performance” are both comparatively vague, and there’s no indication of the expected relationship direction (does more sleep help or hurt performance?). As you can see, specificity and clarity are key.

A good research hypothesis needs to be very clear about what’s being assessed and very specific about the expected outcome.

Hypothesis Essential #2: Testability (Provability)

A statement must be testable to qualify as a research hypothesis. In other words, there needs to be a way to prove (or disprove) the statement. If it’s not testable, it’s not a hypothesis – simple as that.

For example, consider the hypothesis we mentioned earlier:

Hypothesis: Students who sleep at least 8 hours per night will, on average, achieve higher grades in standardised tests than students who sleep less than 8 hours a night.  

We could test this statement by undertaking a quantitative study involving two groups of students, one that gets 8 or more hours of sleep per night for a fixed period, and one that gets less. We could then compare the standardised test results for both groups to see if there’s a statistically significant difference. 
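In practice, that comparison would often be an independent-samples t-test. Here’s a hypothetical sketch with made-up scores, using the scipy library:

```python
from scipy import stats

# Made-up standardised test scores for the two groups.
scores_8h_plus = [78, 85, 90, 74, 88, 82, 91, 79]   # slept >= 8 hours
scores_under_8h = [70, 75, 80, 68, 72, 77, 74, 71]  # slept < 8 hours

# Independent-samples t-test of the null hypothesis that the two
# group means are equal.
t_stat, p_value = stats.ttest_ind(scores_8h_plus, scores_under_8h)
print(t_stat, p_value)
# A p-value below the chosen significance level (say .05) would lead
# us to reject the null hypothesis of no difference.
```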

Again, if you compare this to the original hypothesis we looked at – “Sleep impacts academic performance” – you can see that it would be quite difficult to test that statement, primarily because it isn’t specific enough. How much sleep? By who? What type of academic performance?

So, remember the mantra – if you can’t test it, it’s not a hypothesis 🙂

A good research hypothesis must be testable. In other words, you must be able to collect observable data in a scientifically rigorous fashion to test it.

Defining A Research Hypothesis

You’re still with us? Great! Let’s recap and pin down a clear definition of a hypothesis.

A research hypothesis (or scientific hypothesis) is a statement about an expected relationship between variables, or explanation of an occurrence, that is clear, specific and testable.

So, when you write up hypotheses for your dissertation or thesis, make sure that they meet all these criteria. If you do, you’ll not only have rock-solid hypotheses but you’ll also ensure a clear focus for your entire research project.

What about the null hypothesis?

You may have also heard the terms null hypothesis, alternative hypothesis, or H-zero thrown around. At a simple level, the null hypothesis is the counter-proposal to the original hypothesis.

For example, if the hypothesis predicts that there is a relationship between two variables (for example, sleep and academic performance), the null hypothesis would predict that there is no relationship between those variables.

At a more technical level, the null hypothesis proposes that no statistical significance exists in a set of given observations and that any differences are due to chance alone.

And there you have it – hypotheses in a nutshell. 

If you have any questions, be sure to leave a comment below and we’ll do our best to help you. If you need hands-on help developing and testing your hypotheses, consider our private coaching service, where we hold your hand through the research journey.


16 Comments

Lynnet Chikwaikwai

Very useful information. I benefit more from getting more information in this regard.

Dr. WuodArek

Very great insight,educative and informative. Please give meet deep critics on many research data of public international Law like human rights, environment, natural resources, law of the sea etc

Afshin

In a book I read a distinction is made between null, research, and alternative hypothesis. As far as I understand, alternative and research hypotheses are the same. Can you please elaborate? Best Afshin

GANDI Benjamin

This is a self explanatory, easy going site. I will recommend this to my friends and colleagues.

Lucile Dossou-Yovo

Very good definition. How can I cite your definition in my thesis? Thank you. Is nul hypothesis compulsory in a research?

Pereria

It’s a counter-proposal to be proven as a rejection

Egya Salihu

Please what is the difference between alternate hypothesis and research hypothesis?

Mulugeta Tefera

It is a very good explanation. However, it limits hypotheses to statistically testable ideas. What about for qualitative research, or other research that involves quantitative data that doesn’t need statistical tests?

Derek Jansen

In qualitative research, one typically uses propositions, not hypotheses.

Samia

could you please elaborate it more

Patricia Nyawir

I’ve benefited greatly from these notes, thank you.

Hopeson Khondiwa

This is very helpful

Dr. Andarge

well articulated ideas are presented here, thank you for being reliable sources of information

TAUNO

Excellent. Thanks for being clear and sound about the research methodology and hypothesis (quantitative research)

I have only a simple question regarding the null hypothesis. – Is the null hypothesis (Ho) known as the reversible hypothesis of the alternative hypothesis (H1)? – How to test it in academic research?

Tesfaye Negesa Urge

this is very important note help me much more



What Is the Next Step if an Experiment Fails to Confirm Your Hypothesis?


You had a question, formulated a hypothesis, designed an experiment, and the hypothesis was not supported. First, it is important to note that you can definitively disprove something, but you cannot absolutely prove or confirm a hypothesis; you can only support it. Whether your hypothesis was supported or not, the experiment added to what you know about your initial question, and there are always more questions to answer.

The Scientific Method

The scientific method is a set of logical steps that helps scientists understand the world. It is a process of inductive reasoning, or using the observation of phenomena to make generalizations about how the world works. This contrasts with deductive reasoning, which takes a theory or rule and applies it to an individual situation. The scientific method goes through a progression of steps to create new knowledge. You make an observation, ask a question about your observation, form a hypothesis about how it works, test your hypothesis and then form new questions and hypotheses based on the results. The results of your experiment represent a single iteration of the scientific method.

The Importance of Replication

You have the results of your experiment, but you may wonder if you made a mistake somewhere in your methods that caused the results to come out the way they did. Repetition and replication are both important for verifying your findings. Repetition is when you run the experiment several times yourself. When scientists do this, they often get slightly different answers and then use statistics to decide whether their hypothesis is generally supported or not. Replication is when you give your methods to someone else and see if they get the same results. Both are ways of taking human error out of the equation.

What You Do Know

When you have replicated results that don’t support your hypothesis, you have still created knowledge. You can now exclude your initial hypothesis from the possible ways to answer your question; you know how the system doesn’t work. When you have a hypothesis that you are excited about, it can be disappointing to have it disproved, but it is important to keep in mind that you are still one step closer to the most reasonable explanation. "No" can be an important answer.

Formulating a New Hypothesis

You can go through the steps of the scientific method, but the process is never really complete. If the initial hypothesis is not supported, you can go back to the drawing board and hypothesize a new answer to the question and a new way to test it. If your hypothesis is supported, you might think of ways to refine your hypothesis and test those. Either way, the process of experimentation often leads to whole new questions to explore. The possibilities are infinite, and the search for knowledge is never-ending.


Based in Wenatchee, Wash., Andrea Becker specializes in biology, ecology and environmental sciences. She has written peer-reviewed articles in the "Journal of Wildlife Management," policy documents, and educational materials. She holds a Master of Science in wildlife management from Iowa State University. She was once charged by a grizzly bear while on the job.


Frequently asked questions

What is a hypothesis?

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Frequently asked questions: Methodology

Attrition refers to participants leaving a study. It always happens to some extent—for example, in randomized controlled trials for medical research.

Differential attrition occurs when attrition or dropout rates differ systematically between the intervention and the control group . As a result, the characteristics of the participants who drop out differ from the characteristics of those who stay in the study. Because of this, study results may be biased .

Action research is conducted in order to solve a particular issue immediately, while case studies are often conducted over a longer period of time and focus more on observing and analyzing a particular ongoing phenomenon.

Action research is focused on solving a problem or informing individual and community-based knowledge in a way that impacts teaching, learning, and other related processes. It is less focused on contributing theoretical input, instead producing actionable input.

Action research is particularly popular with educators as a form of systematic inquiry because it prioritizes reflection and bridges the gap between theory and practice. Educators are able to simultaneously investigate an issue as they solve it, and the method is very iterative and flexible.

A cycle of inquiry is another name for action research . It is usually visualized in a spiral shape following a series of steps, such as “planning → acting → observing → reflecting.”

To make quantitative observations , you need to use instruments that are capable of measuring the quantity you want to observe. For example, you might use a ruler to measure the length of an object or a thermometer to measure its temperature.

Criterion validity and construct validity are both types of measurement validity . In other words, they both show you how accurately a method measures something.

While construct validity is the degree to which a test or other measurement method measures what it claims to measure, criterion validity is the degree to which a test can predictively (in the future) or concurrently (in the present) measure something.

Construct validity is often considered the overarching type of measurement validity . You need to have face validity , content validity , and criterion validity in order to achieve construct validity.

Convergent validity and discriminant validity are both subtypes of construct validity . Together, they help you evaluate whether a test measures the concept it was designed to measure.

  • Convergent validity indicates whether a test that is designed to measure a particular construct correlates with other tests that assess the same or similar construct.
  • Discriminant validity indicates whether two tests that should not be highly related to each other are indeed not related. This type of validity is also called divergent validity .

You need to assess both in order to demonstrate construct validity. Neither one alone is sufficient for establishing construct validity.


Content validity shows you how accurately a test or other measurement method taps into the various aspects of the specific construct you are researching.

In other words, it helps you answer the question: “does the test measure all aspects of the construct I want to measure?” If it does, then the test has high content validity.

The higher the content validity, the more accurate the measurement of the construct.

If the test fails to include parts of the construct, or irrelevant parts are included, the validity of the instrument is threatened, which brings your results into question.

Face validity and content validity are similar in that they both evaluate how suitable the content of a test is. The difference is that face validity is subjective, and assesses content at surface level.

When a test has strong face validity, anyone would agree that the test’s questions appear to measure what they are intended to measure.

For example, looking at a 4th grade math test consisting of problems in which students have to add and multiply, most people would agree that it has strong face validity (i.e., it looks like a math test).

On the other hand, content validity evaluates how well a test represents all the aspects of a topic. Assessing content validity is more systematic and relies on expert evaluation of each question, analyzing whether each one covers the aspects that the test was designed to cover.

A 4th grade math test would have high content validity if it covered all the skills taught in that grade. Experts (in this case, math teachers) would have to evaluate the content validity by comparing the test to the learning objectives.

Snowball sampling is a non-probability sampling method . Unlike probability sampling (which involves some form of random selection ), the initial individuals selected to be studied are the ones who recruit new participants.

Because not every member of the target population has an equal chance of being recruited into the sample, selection in snowball sampling is non-random.

Snowball sampling is a non-probability sampling method , where there is not an equal chance for every member of the population to be included in the sample .

This means that you cannot use inferential statistics and make generalizations —often the goal of quantitative research . As such, a snowball sample is not representative of the target population and is usually a better fit for qualitative research .

Snowball sampling relies on the use of referrals. Here, the researcher recruits one or more initial participants, who then recruit the next ones.

Participants share similar characteristics and/or know each other. Because of this, not every member of the population has an equal chance of being included in the sample, giving rise to sampling bias .

Snowball sampling is best used in the following cases:

  • If there is no sampling frame available (e.g., people with a rare disease)
  • If the population of interest is hard to access or locate (e.g., people experiencing homelessness)
  • If the research focuses on a sensitive topic (e.g., extramarital affairs)

The reproducibility and replicability of a study can be ensured by writing a transparent, detailed method section and using clear, unambiguous language.

Reproducibility and replicability are related terms.

  • Reproducing research entails reanalyzing the existing data in the same manner.
  • Replicating (or repeating ) the research entails reconducting the entire analysis, including the collection of new data . 
  • A successful reproduction shows that the data analyses were conducted in a fair and honest manner.
  • A successful replication shows that the reliability of the results is high.

Stratified sampling and quota sampling both involve dividing the population into subgroups and selecting units from each subgroup. The purpose in both cases is to select a representative sample and/or to allow comparisons between subgroups.

The main difference is that in stratified sampling, you draw a random sample from each subgroup ( probability sampling ). In quota sampling you select a predetermined number or proportion of units, in a non-random manner ( non-probability sampling ).

Purposive and convenience sampling are both sampling methods that are typically used in qualitative data collection.

A convenience sample is drawn from a source that is conveniently accessible to the researcher. Convenience sampling does not distinguish characteristics among the participants. On the other hand, purposive sampling focuses on selecting participants possessing characteristics associated with the research study.

The findings of studies based on either convenience or purposive sampling can only be generalized to the (sub)population from which the sample is drawn, and not to the entire population.

Random sampling or probability sampling is based on random selection. This means that each unit has an equal chance (i.e., equal probability) of being included in the sample.

On the other hand, convenience sampling involves selecting whoever happens to be available (for example, stopping passers-by on the street), which means that not everyone has an equal chance of being selected, depending on the place, time, or day you are collecting your data.

Convenience sampling and quota sampling are both non-probability sampling methods. They both use non-random criteria like availability, geographical proximity, or expert knowledge to recruit study participants.

However, in convenience sampling, you continue to sample units or cases until you reach the required sample size.

In quota sampling, you first need to divide your population of interest into subgroups (strata) and estimate their proportions (quota) in the population. Then you can start your data collection, using convenience sampling to recruit participants, until the proportions in each subgroup coincide with the estimated proportions in the population.

A sampling frame is a list of every member in the entire population . It is important that the sampling frame is as complete as possible, so that your sample accurately reflects your population.

Stratified and cluster sampling may look similar, but bear in mind that groups created in cluster sampling are heterogeneous , so the individual characteristics in the cluster vary. In contrast, groups created in stratified sampling are homogeneous , as units share characteristics.

Relatedly, in cluster sampling you randomly select entire groups and include all units of each group in your sample. However, in stratified sampling, you select some units of all groups and include them in your sample. In this way, both methods can ensure that your sample is representative of the target population .
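Here’s a small sketch of the difference in code, using a made-up population in which students are grouped into classes (the groups double as clusters):

```python
import random

# Made-up population: 6 classes of 30 students each.
population = {f"class_{c}": [f"student_{c}_{i}" for i in range(30)]
              for c in range(6)}
random.seed(42)

# Stratified sampling: randomly draw some units from EVERY group.
stratified = [s for group in population.values()
              for s in random.sample(group, 5)]

# Cluster sampling: randomly draw whole groups, keep ALL their units.
chosen = random.sample(sorted(population), 1)
clustered = [s for c in chosen for s in population[c]]

print(len(stratified), len(clustered))  # 30 units each, drawn differently
```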

A systematic review is secondary research because it uses existing research. You don’t collect new data yourself.

The key difference between observational studies and experimental designs is that a well-done observational study does not influence the responses of participants, while experiments do have some sort of treatment condition applied to at least some participants by random assignment .

An observational study is a great choice for you if your research question is based purely on observations. If there are ethical, logistical, or practical concerns that prevent you from conducting a traditional experiment , an observational study may be a good choice. In an observational study, there is no interference or manipulation of the research subjects, as well as no control or treatment groups .

It’s often best to ask a variety of people to review your measurements. You can ask experts, such as other researchers, or laypeople, such as potential participants, to judge the face validity of tests.

While experts have a deep understanding of research methods , the people you’re studying can provide you with valuable insights you may have missed otherwise.

Face validity is important because it’s a simple first step to measuring the overall validity of a test or technique. It’s a relatively intuitive, quick, and easy way to start checking whether a new measure seems useful at first glance.

Good face validity means that anyone who reviews your measure says that it seems to be measuring what it’s supposed to. With poor face validity, someone reviewing your measure may be left confused about what you’re measuring and why you’re using this method.

Face validity is about whether a test appears to measure what it’s supposed to measure. This type of validity is concerned with whether a measure seems relevant and appropriate for what it’s assessing only on the surface.

Statistical analyses are often applied to test validity with data from your measures. You test convergent validity and discriminant validity with correlations to see if results from your test are positively or negatively related to those of other established tests.

You can also use regression analyses to assess whether your measure is actually predictive of outcomes that you expect it to predict theoretically. A regression analysis that supports your expectations strengthens your claim of construct validity.
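
As an illustration, here is a minimal sketch using SciPy on simulated (hypothetical) scores; the measure names and the relationships between them are invented for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical scores for 200 participants: a new anxiety questionnaire,
# an established anxiety scale (related construct), and a sociability
# scale (distinct construct).
established = rng.normal(50, 10, size=200)
new_measure = established + rng.normal(0, 5, size=200)  # tracks the construct
sociability = rng.normal(50, 10, size=200)              # unrelated construct

# Convergent validity: the new measure should correlate with the established one.
r_conv, p_conv = stats.pearsonr(new_measure, established)

# Discriminant validity: its correlation with the distinct construct should be weak.
r_disc, p_disc = stats.pearsonr(new_measure, sociability)

# Regression check: does the new measure predict an outcome it theoretically should?
outcome = 0.5 * established + rng.normal(0, 5, size=200)
result = stats.linregress(new_measure, outcome)

print(f"convergent r = {r_conv:.2f}, discriminant r = {r_disc:.2f}")
print(f"regression slope = {result.slope:.2f} (p = {result.pvalue:.3g})")
```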

When designing or evaluating a measure, construct validity helps you ensure you’re actually measuring the construct you’re interested in. If you don’t have construct validity, you may inadvertently measure unrelated or distinct constructs and lose precision in your research.

Construct validity is often considered the overarching type of measurement validity, because it covers all of the other types. You need to have face validity, content validity, and criterion validity to achieve construct validity.

Construct validity is about how well a test measures the concept it was designed to evaluate. It’s one of four types of measurement validity; the other three are face validity, content validity, and criterion validity.

There are two subtypes of construct validity.

  • Convergent validity: The extent to which your measure corresponds to measures of related constructs
  • Discriminant validity: The extent to which your measure is unrelated or negatively related to measures of distinct constructs

Naturalistic observation is a valuable tool because of its flexibility, external validity, and suitability for topics that can’t be studied in a lab setting.

The downsides of naturalistic observation include its lack of scientific control, ethical considerations, and potential for bias from observers and subjects.

Naturalistic observation is a qualitative research method where you record the behaviors of your research subjects in real-world settings. You avoid interfering with or influencing anything in a naturalistic observation.

You can think of naturalistic observation as “people watching” with a purpose.

A dependent variable is what changes as a result of the independent variable manipulation in experiments. It’s what you’re interested in measuring, and it “depends” on your independent variable.

In statistics, dependent variables are also called:

  • Response variables (they respond to a change in another variable)
  • Outcome variables (they represent the outcome you want to measure)
  • Left-hand-side variables (they appear on the left-hand side of a regression equation)

An independent variable is the variable you manipulate, control, or vary in an experimental study to explore its effects. It’s called “independent” because it’s not influenced by any other variables in the study.

Independent variables are also called:

  • Explanatory variables (they explain an event or outcome)
  • Predictor variables (they can be used to predict the value of a dependent variable)
  • Right-hand-side variables (they appear on the right-hand side of a regression equation)

As a rule of thumb, questions related to thoughts, beliefs, and feelings work well in focus groups. Take your time formulating strong questions, paying special attention to phrasing. Be careful to avoid leading questions, which can bias your responses.

Overall, your focus group questions should be:

  • Open-ended and flexible
  • Impossible to answer with “yes” or “no” (questions that start with “why” or “how” are often best)
  • Unambiguous, getting straight to the point while still stimulating discussion
  • Unbiased and neutral

A structured interview is a data collection method that relies on asking questions in a set order to collect data on a topic. Structured interviews are often quantitative in nature. They are best used when:

  • You already have a very clear understanding of your topic. Perhaps significant research has already been conducted, or you have done some prior research yourself, so you already possess a baseline for designing strong structured questions.
  • You are constrained in terms of time or resources and need to analyze your data quickly and efficiently.
  • Your research question depends on strong parity between participants, with environmental conditions held constant.

More flexible interview options include semi-structured interviews, unstructured interviews, and focus groups.

Social desirability bias is the tendency for interview participants to give responses that will be viewed favorably by the interviewer or other participants. It occurs in all types of interviews and surveys, but is most common in semi-structured interviews, unstructured interviews, and focus groups.

Social desirability bias can be mitigated by ensuring participants feel at ease and comfortable sharing their views. Make sure to pay attention to your own body language and any physical or verbal cues, such as nodding or widening your eyes.

This type of bias can also occur in observations if the participants know they’re being observed. They might alter their behavior accordingly.

The interviewer effect is a type of bias that emerges when a characteristic of an interviewer (race, age, gender identity, etc.) influences the responses given by the interviewee.

There is a risk of an interviewer effect in all types of interviews, but it can be mitigated by carefully writing high-quality interview questions.

A semi-structured interview is a blend of structured and unstructured types of interviews. Semi-structured interviews are best used when:

  • You have prior interview experience. Spontaneous questions are deceptively challenging, and it’s easy to accidentally ask a leading question or make a participant uncomfortable.
  • Your research question is exploratory in nature. Participant answers can guide future research questions and help you develop a more robust knowledge base for future research.

An unstructured interview is the most flexible type of interview, but it is not always the best fit for your research topic.

Unstructured interviews are best used when:

  • You are an experienced interviewer and have a very strong background in your research topic, since it is challenging to ask spontaneous, colloquial questions.
  • Your research question is exploratory in nature. While you may have developed hypotheses, you are open to discovering new or shifting viewpoints through the interview process.
  • You are seeking descriptive data, and are ready to ask questions that will deepen and contextualize your initial thoughts and hypotheses.
  • Your research depends on forming connections with your participants and making them feel comfortable revealing deeper emotions, lived experiences, or thoughts.

The four most common types of interviews are:

  • Structured interviews: The questions are predetermined in both topic and order.
  • Semi-structured interviews: A few questions are predetermined, but other questions aren’t planned.
  • Unstructured interviews: None of the questions are predetermined.
  • Focus group interviews: The questions are presented to a group instead of one individual.

Deductive reasoning is commonly used in scientific research, and it’s especially associated with quantitative research.

In research, you might have come across something called the hypothetico-deductive method. It’s the scientific method of testing hypotheses to check whether your predictions are substantiated by real-world data.

Deductive reasoning is a logical approach where you progress from general ideas to specific conclusions. It’s often contrasted with inductive reasoning, where you start with specific observations and form general conclusions.

Deductive reasoning is also called deductive logic.

There are many different types of inductive reasoning that people use formally or informally.

Here are a few common types:

  • Inductive generalization: You use observations about a sample to come to a conclusion about the population it came from.
  • Statistical generalization: You use specific numbers about samples to make statements about populations.
  • Causal reasoning: You make cause-and-effect links between different things.
  • Sign reasoning: You make a conclusion about a correlational relationship between different things.
  • Analogical reasoning: You make a conclusion about something based on its similarities to something else.

Inductive reasoning is a bottom-up approach, while deductive reasoning is top-down.

Inductive reasoning takes you from the specific to the general, while in deductive reasoning, you make inferences by going from general premises to specific conclusions.

In inductive research, you start by making observations or gathering data. Then, you take a broad scan of your data and search for patterns. Finally, you make general conclusions that you might incorporate into theories.

Inductive reasoning is a method of drawing conclusions by going from the specific to the general. It’s usually contrasted with deductive reasoning, where you proceed from general information to specific conclusions.

Inductive reasoning is also called inductive logic or bottom-up reasoning.

Triangulation can help:

  • Reduce research bias that comes from using a single method, theory, or investigator
  • Enhance validity by approaching the same topic with different tools
  • Establish credibility by giving you a complete picture of the research problem

But triangulation can also pose problems:

  • It’s time-consuming and labor-intensive, often involving an interdisciplinary team.
  • Your results may be inconsistent or even contradictory.

There are four main types of triangulation:

  • Data triangulation: Using data from different times, spaces, and people
  • Investigator triangulation: Involving multiple researchers in collecting or analyzing data
  • Theory triangulation: Using varying theoretical perspectives in your research
  • Methodological triangulation: Using different methodologies to approach the same topic

Many academic fields use peer review, largely to determine whether a manuscript is suitable for publication. Peer review enhances the credibility of the published manuscript.

However, peer review is also common in non-academic settings. The United Nations, the European Union, and many individual nations use peer review to evaluate grant applications. It is also widely used in medical and health-related fields as a teaching or quality-of-care measure. 

Peer assessment is often used in the classroom as a pedagogical tool. Both receiving feedback and providing it are thought to enhance the learning process, helping students think critically and collaboratively.

Peer review can stop obviously problematic, falsified, or otherwise untrustworthy research from being published. It also represents an excellent opportunity to get feedback from renowned experts in your field. It acts as a first defense, helping you ensure your argument is clear and that there are no gaps, vague terms, or unanswered questions for readers who weren’t involved in the research process.

Peer-reviewed articles are considered a highly credible source due to the stringent process they go through before publication.

In general, the peer review process follows these steps:

  • First, the author submits the manuscript to the editor.
  • Next, the editor decides whether to reject the manuscript and send it back to the author, or to send it onward to the selected peer reviewer(s).
  • The peer review process then occurs. The reviewer provides feedback, addressing any major or minor issues with the manuscript, and gives their advice regarding what edits should be made.
  • Lastly, the edited manuscript is sent back to the author. They input the edits and resubmit it to the editor for publication.

Exploratory research is often used when the issue you’re studying is new or when the data collection process is challenging for some reason.

You can use exploratory research if you have a general idea or a specific question that you want to study but there is no preexisting knowledge or paradigm with which to study it.

Exploratory research is a methodology approach that explores research questions that have not previously been studied in depth. It is often used when the issue you’re studying is new, or the data collection process is challenging in some way.

Explanatory research is used to investigate how or why a phenomenon occurs. Exploratory research, by contrast, is often one of the first stages in the research process, serving as a jumping-off point for future research.

Exploratory research aims to explore the main aspects of an under-researched problem, while explanatory research aims to explain the causes and consequences of a well-defined problem.

Explanatory research is a research method used to investigate how or why something occurs when only a small amount of information is available pertaining to that topic. It can help you increase your understanding of a given topic.

Clean data are valid, accurate, complete, consistent, unique, and uniform. Dirty data include inconsistencies and errors.

Dirty data can come from any part of the research process, including poor research design, inappropriate measurement materials, or flawed data entry.

Data cleaning takes place between data collection and data analyses. But you can use some methods even before collecting data.

For clean data, you should start by designing measures that collect valid data. Data validation at the time of data entry or collection helps you minimize the amount of data cleaning you’ll need to do.

After data collection, you can use data standardization and data transformation to clean your data. You’ll also deal with any missing values, outliers, and duplicate values.

Every dataset requires different techniques to clean dirty data, but you need to address these issues in a systematic way. You focus on finding and resolving data points that don’t agree or fit with the rest of your dataset.

These data might include missing values, outliers, duplicate values, incorrectly formatted entries, or irrelevant observations. You’ll start with screening and diagnosing your data. Then, you’ll often standardize and accept or remove data to make your dataset consistent and valid.

Data cleaning is necessary for valid and appropriate analyses. Dirty data contain inconsistencies or errors, but cleaning your data helps you minimize or resolve these.

Without data cleaning, you could end up with a Type I or Type II error in your conclusion. These types of erroneous conclusions can be practically significant, with important consequences, because they can lead to misplaced investments or missed opportunities.

Data cleaning involves spotting and resolving potential data inconsistencies or errors to improve your data quality. An error is any value (e.g., recorded weight) that doesn’t reflect the true value (e.g., actual weight) of something that’s being measured.

In this process, you review, analyze, detect, modify, or remove “dirty” data to make your dataset “clean.” Data cleaning is also called data cleansing or data scrubbing.
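
Here is a minimal pandas sketch of that process on a small, hypothetical dataset; the column names and the plausibility range are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw survey data with typical "dirty" issues.
df = pd.DataFrame({
    "participant_id": [1, 2, 2, 3, 4],
    "weight_kg": [68.0, 72.5, 72.5, np.nan, 640.0],  # missing value + outlier
    "country": ["US", "us", "U.S.", "US", "us"],     # inconsistent formatting
})

# Remove duplicate entries for the same participant.
df = df.drop_duplicates(subset="participant_id")

# Standardize inconsistent text values.
df["country"] = df["country"].str.upper().str.replace(".", "", regex=False)

# Flag implausible values (outside an assumed 30-250 kg range) as missing.
df.loc[~df["weight_kg"].between(30, 250), "weight_kg"] = np.nan

# Decide how to handle missing values; here, incomplete rows are dropped.
df = df.dropna(subset=["weight_kg"])
print(df)
```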

Research misconduct means making up or falsifying data, manipulating data analyses, or misrepresenting results in research reports. It’s a form of academic fraud.

These actions are committed intentionally and can have serious consequences; research misconduct is not a simple mistake or a point of disagreement but a serious ethical failure.

Anonymity means you don’t know who the participants are, while confidentiality means you know who they are but remove identifying information from your research report. Both are important ethical considerations.

You can only guarantee anonymity by not collecting any personally identifying information—for example, names, phone numbers, email addresses, IP addresses, physical characteristics, photos, or videos.

You can keep data confidential by using aggregate information in your research report, so that you only refer to groups of participants rather than individuals.

Research ethics matter for scientific integrity, human rights and dignity, and collaboration between science and society. These principles make sure that participation in studies is voluntary, informed, and safe.

Ethical considerations in research are a set of principles that guide your research designs and practices. These principles include voluntary participation, informed consent, anonymity, confidentiality, potential for harm, and results communication.

Scientists and researchers must always adhere to a certain code of conduct when collecting data from others.

These considerations protect the rights of research participants, enhance research validity, and maintain scientific integrity.

In multistage sampling, you can use probability or non-probability sampling methods.

For a probability sample, you have to conduct probability sampling at every stage.

You can mix it up by using simple random sampling, systematic sampling, or stratified sampling to select units at different stages, depending on what is applicable and relevant to your study.

Multistage sampling can simplify data collection when you have large, geographically spread samples, and you can obtain a probability sample without a complete sampling frame.

But multistage sampling may not lead to a representative sample, and larger samples are needed for multistage samples to achieve the statistical properties of simple random samples.

These are four of the most common mixed methods designs:

  • Convergent parallel: Quantitative and qualitative data are collected at the same time and analyzed separately. After both analyses are complete, the results are compared to draw overall conclusions.
  • Embedded: Quantitative and qualitative data are collected at the same time, but within a larger quantitative or qualitative design. One type of data is secondary to the other.
  • Explanatory sequential: Quantitative data is collected and analyzed first, followed by qualitative data. You can use this design if you think your qualitative data will explain and contextualize your quantitative findings.
  • Exploratory sequential: Qualitative data is collected and analyzed first, followed by quantitative data. You can use this design if you think the quantitative data will confirm or validate your qualitative findings.

Triangulation in research means using multiple datasets, methods, theories and/or investigators to address a research question. It’s a research strategy that can help you enhance the validity and credibility of your findings.

Triangulation is mainly used in qualitative research, but it’s also commonly applied in quantitative research. Mixed methods research always uses triangulation.

In multistage sampling, or multistage cluster sampling, you draw a sample from a population using smaller and smaller groups at each stage.

This method is often used to collect data from a large, geographically spread group of people in national surveys, for example. You take advantage of hierarchical groupings (e.g., from state to city to neighborhood) to create a sample that’s less expensive and time-consuming to collect data from.

No, the steepness or slope of the line isn’t related to the correlation coefficient value. The correlation coefficient only tells you how closely your data fit on a line, so two datasets with the same correlation coefficient can have very different slopes.

To find the slope of the line, you’ll need to perform a regression analysis.

Correlation coefficients always range between -1 and 1.

The sign of the coefficient tells you the direction of the relationship: a positive value means the variables change together in the same direction, while a negative value means they change together in opposite directions.

The absolute value of a number is equal to the number without its sign. The absolute value of a correlation coefficient tells you the magnitude of the correlation: the greater the absolute value, the stronger the correlation.
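
A short NumPy sketch on simulated data illustrates these points: rescaling a variable changes the regression slope but not r, and flipping its sign reverses the direction while leaving the magnitude unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)  # simulated linear relationship

r = np.corrcoef(x, y)[0, 1]
r_scaled = np.corrcoef(x, 10 * y)[0, 1]   # same r, but a much steeper line
r_flipped = np.corrcoef(x, -y)[0, 1]      # same magnitude, opposite direction

slope = np.polyfit(x, y, 1)[0]
slope_scaled = np.polyfit(x, 10 * y, 1)[0]

print(f"r = {r:.3f}, scaled r = {r_scaled:.3f}, flipped r = {r_flipped:.3f}")
print(f"slope = {slope:.2f} vs scaled slope = {slope_scaled:.2f}")
```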

These are the assumptions your data must meet if you want to use Pearson’s r:

  • Both variables are on an interval or ratio level of measurement
  • Data from both variables follow normal distributions
  • Your data have no outliers
  • Your data are from a random or representative sample
  • You expect a linear relationship between the two variables

Quantitative research designs can be divided into two main categories:

  • Correlational and descriptive designs are used to investigate characteristics, averages, trends, and associations between variables.
  • Experimental and quasi-experimental designs are used to test causal relationships.

Qualitative research designs tend to be more flexible. Common types of qualitative design include case study, ethnography, and grounded theory designs.

A well-planned research design helps ensure that your methods match your research aims, that you collect high-quality data, and that you use the right kind of analysis, drawing on credible sources, to answer your questions. This allows you to draw valid, trustworthy conclusions.

The priorities of a research design can vary depending on the field, but you usually have to specify:

  • Your research questions and/or hypotheses
  • Your overall approach (e.g., qualitative or quantitative)
  • The type of design you’re using (e.g., a survey, experiment, or case study)
  • Your sampling methods or criteria for selecting subjects
  • Your data collection methods (e.g., questionnaires, observations)
  • Your data collection procedures (e.g., operationalization, timing, and data management)
  • Your data analysis methods (e.g., statistical tests or thematic analysis)

A research design is a strategy for answering your research question. It defines your overall approach and determines how you will collect and analyze data.

Questionnaires can be self-administered or researcher-administered.

Self-administered questionnaires can be delivered online or in paper-and-pen formats, in person or through mail. All questions are standardized so that all respondents receive the same questions with identical wording.

Researcher-administered questionnaires are interviews that take place by phone, in person, or online between researchers and respondents. You can gain deeper insights by clarifying questions for respondents or asking follow-up questions.

You can organize the questions logically, with a clear progression from simple to complex, or randomly between respondents. A logical flow helps respondents process the questionnaire more easily and quickly, but it may lead to bias. Randomization can minimize the bias from order effects.

Closed-ended, or restricted-choice, questions offer respondents a fixed set of choices to select from. These questions are easier to answer quickly.

Open-ended or long-form questions allow respondents to answer in their own words. Because there are no restrictions on their choices, respondents can answer in ways that researchers may not have otherwise considered.

A questionnaire is a data collection tool or instrument, while a survey is an overarching research method that involves collecting and analyzing data from people using questionnaires.

The third variable and directionality problems are two main reasons why correlation isn’t causation.

The third variable problem means that a confounding variable affects both variables to make them seem causally related when they are not.

The directionality problem is when two variables correlate and might actually have a causal relationship, but it’s impossible to conclude which variable causes changes in the other.

Correlation describes an association between variables: when one variable changes, so does the other. A correlation is a statistical indicator of the relationship between variables.

Causation means that changes in one variable bring about changes in the other (i.e., there is a cause-and-effect relationship between variables). The two variables are correlated with each other, and there’s also a causal link between them.

While causation and correlation can exist simultaneously, correlation does not imply causation. In other words, correlation is simply a relationship where A relates to B, but A doesn’t necessarily cause B to happen (or vice versa). Mistaking correlation for causation is a common error and can lead to the false cause fallacy.

Controlled experiments establish causality, whereas correlational studies only show associations between variables.

  • In an experimental design, you manipulate an independent variable and measure its effect on a dependent variable. Other variables are controlled so they can’t impact the results.
  • In a correlational design, you measure variables without manipulating any of them. You can test whether your variables change together, but you can’t be sure that one variable caused a change in another.

In general, correlational research is high in external validity, while experimental research is high in internal validity.

A correlation is usually tested for two variables at a time, but you can test correlations between three or more variables.

A correlation coefficient is a single number that describes the strength and direction of the relationship between your variables.

Different types of correlation coefficients might be appropriate for your data based on their levels of measurement and distributions. The Pearson product-moment correlation coefficient (Pearson’s r) is commonly used to assess a linear relationship between two quantitative variables.

A correlational research design investigates relationships between two variables (or more) without the researcher controlling or manipulating any of them. It’s a non-experimental type of quantitative research.

A correlation reflects the strength and/or direction of the association between two or more variables.

  • A positive correlation means that both variables change in the same direction.
  • A negative correlation means that the variables change in opposite directions.
  • A zero correlation means there’s no relationship between the variables.

Random error is almost always present in scientific studies, even in highly controlled settings. While you can’t eradicate it completely, you can reduce random error by taking repeated measurements, using a large sample, and controlling extraneous variables.

You can avoid systematic error through careful design of your sampling, data collection, and analysis procedures. For example, use triangulation to measure your variables using multiple methods; regularly calibrate instruments or procedures; use random sampling and random assignment; and apply masking (blinding) where possible.

Systematic error is generally a bigger problem in research.

With random error, multiple measurements will tend to cluster around the true value. When you’re collecting data from a large sample, the errors in different directions will cancel each other out.

Systematic errors are much more problematic because they can skew your data away from the true value. This can lead you to false conclusions (Type I and II errors) about the relationship between the variables you’re studying.

Random and systematic error are two types of measurement error.

Random error is a chance difference between the observed and true values of something (e.g., a researcher misreading a weighing scale records an incorrect measurement).

Systematic error is a consistent or proportional difference between the observed and true values of something (e.g., a miscalibrated scale consistently records weights as higher than they actually are).
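
A small simulation (with a hypothetical true value and error sizes) illustrates the difference: random errors average out across many measurements, while a systematic offset does not.

```python
import numpy as np

rng = np.random.default_rng(1)
true_weight = 70.0  # hypothetical true value, in kg

# Random error: noisy readings scatter symmetrically around the true value.
random_readings = true_weight + rng.normal(0, 0.5, size=10_000)

# Systematic error: a miscalibrated scale adds a constant +2 kg offset.
systematic_readings = true_weight + 2.0 + rng.normal(0, 0.5, size=10_000)

print(f"mean with random error only: {random_readings.mean():.2f}")      # ~70.00
print(f"mean with systematic offset: {systematic_readings.mean():.2f}")  # ~72.00
```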

On graphs, the explanatory variable is conventionally placed on the x-axis, while the response variable is placed on the y-axis.

  • If both of your variables are quantitative, use a scatterplot or a line graph.
  • If your explanatory variable is categorical, use a bar graph.

The term “explanatory variable” is sometimes preferred over “independent variable” because, in real-world contexts, independent variables are often influenced by other variables. This means they aren’t totally independent.

Multiple independent variables may also be correlated with each other, so “explanatory variables” is a more appropriate term.

The difference between explanatory and response variables is simple:

  • An explanatory variable is the expected cause, and it explains the results.
  • A response variable is the expected effect, and it responds to other variables.

In a controlled experiment, all extraneous variables are held constant so that they can’t influence the results. Controlled experiments require:

  • A control group that receives a standard treatment, a fake treatment, or no treatment.
  • Random assignment of participants to ensure the groups are equivalent.

Depending on your study topic, there are various other methods of controlling variables.

There are four main types of extraneous variables:

  • Demand characteristics: Environmental cues that encourage participants to conform to researchers’ expectations.
  • Experimenter effects: Unintentional actions by researchers that influence study outcomes.
  • Situational variables: Environmental variables that alter participants’ behaviors.
  • Participant variables: Any characteristic or aspect of a participant’s background that could affect study results.

An extraneous variable is any variable that you’re not investigating that can potentially affect the dependent variable of your research study.

A confounding variable is a type of extraneous variable that not only affects the dependent variable, but is also related to the independent variable.

In a factorial design, multiple independent variables are tested.

If you test two variables, each level of one independent variable is combined with each level of the other independent variable to create different conditions.
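
For example, a sketch using Python’s itertools shows how crossing the levels of two hypothetical independent variables generates the full set of conditions.

```python
from itertools import product

# Hypothetical 2 x 3 factorial design with two independent variables.
sleep = ["4 hours", "8 hours"]
caffeine = ["none", "low", "high"]

conditions = list(product(sleep, caffeine))
for i, (sleep_level, caffeine_level) in enumerate(conditions, start=1):
    print(f"condition {i}: sleep={sleep_level}, caffeine={caffeine_level}")

print(len(conditions), "conditions in total")  # 2 levels x 3 levels = 6
```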

Within-subjects designs have many potential threats to internal validity, but they are also very statistically powerful.

Advantages:

  • Only requires small samples
  • Statistically powerful
  • Removes the effects of individual differences on the outcomes

Disadvantages:

  • Internal validity threats reduce the likelihood of establishing a direct relationship between variables
  • Time-related effects, such as growth, can influence the outcomes
  • Carryover effects mean that the specific order of different treatments affects the outcomes

While a between-subjects design has fewer threats to internal validity, it also requires more participants for high statistical power than a within-subjects design.

Advantages:

  • Prevents carryover effects of learning and fatigue
  • Shorter study duration

Disadvantages:

  • Needs larger samples for high power
  • Uses more resources to recruit participants, administer sessions, cover costs, etc.
  • Individual differences may be an alternative explanation for results

Yes. Between-subjects and within-subjects designs can be combined in a single study when you have two or more independent variables (a factorial design). In a mixed factorial design, one variable is altered between subjects and another is altered within subjects.

In a between-subjects design, every participant experiences only one condition, and researchers assess group differences between participants in various conditions.

In a within-subjects design, each participant experiences all conditions, and researchers test the same participants repeatedly for differences between conditions.

The word “between” means that you’re comparing different conditions between groups, while the word “within” means you’re comparing different conditions within the same group.

Random assignment is used in experiments with a between-groups or independent measures design. In this research design, there’s usually a control group and one or more experimental groups. Random assignment helps ensure that the groups are comparable.

In general, you should always use random assignment in this type of experimental design when it is ethically possible and makes sense for your study topic.

To implement random assignment, assign a unique number to every member of your study’s sample.

Then, you can use a random number generator or a lottery method to randomly assign each number to a control or experimental group. You can also do so manually, by flipping a coin or rolling a die to randomly assign participants to groups.
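
As an illustration, here is a minimal Python sketch of a shuffle-based version of this procedure, using a hypothetical sample of 40 numbered participants.

```python
import random

participant_ids = list(range(1, 41))  # hypothetical sample of 40 numbered participants

random.shuffle(participant_ids)       # randomize the order of the numbers
half = len(participant_ids) // 2
control_group = participant_ids[:half]
experimental_group = participant_ids[half:]

print("control:", sorted(control_group))
print("experimental:", sorted(experimental_group))
```

Because the order is fully randomized before the split, every participant has the same chance of ending up in either group.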

Random selection, or random sampling, is a way of selecting members of a population for your study’s sample.

In contrast, random assignment is a way of sorting the sample into control and experimental groups.

Random sampling enhances the external validity or generalizability of your results, while random assignment improves the internal validity of your study.

In experimental research, random assignment is a way of placing participants from your sample into different groups using randomization. With this method, every member of the sample has a known or equal chance of being placed in a control group or an experimental group.

“Controlling for a variable” means measuring extraneous variables and accounting for them statistically to remove their effects on other variables.

Researchers often model control variable data along with independent and dependent variable data in regression analyses and ANCOVAs. That way, you can isolate the control variable’s effects from the relationship between the variables of interest.

Control variables help you establish a correlational or causal relationship between variables by enhancing internal validity.

If you don’t control relevant extraneous variables, they may influence the outcomes of your study, and you may not be able to demonstrate that your results are really an effect of your independent variable.

A control variable is any variable that’s held constant in a research study. It’s not a variable of interest in the study, but it’s controlled because it could influence the outcomes.

Including mediators and moderators in your research helps you go beyond studying a simple relationship between two variables for a fuller picture of the real world. They are important to consider when studying complex correlational or causal relationships.

Mediators are part of the causal pathway of an effect, and they tell you how or why an effect takes place. Moderators usually help you judge the external validity of your study by identifying the limitations of when the relationship between variables holds.

If something is a mediating variable:

  • It’s caused by the independent variable.
  • It influences the dependent variable.
  • When it’s taken into account, the statistical correlation between the independent and dependent variables is weaker than when it isn’t considered.

A confounder is a third variable that affects variables of interest and makes them seem related when they are not. In contrast, a mediator is the mechanism of a relationship between two variables: it explains the process by which they are related.

A mediator variable explains the process through which two variables are related, while a moderator variable affects the strength and direction of that relationship.

There are three key steps in systematic sampling:

  • Define and list your population, ensuring that it is not arranged in a cyclical or periodic pattern.
  • Decide on your sample size and calculate your sampling interval, k, by dividing the population size by your target sample size.
  • Choose every kth member of the population as your sample.

Systematic sampling is a probability sampling method where researchers select members of the population at a regular interval – for example, by selecting every 15th person on a list of the population. If the population is in a random order, this can imitate the benefits of simple random sampling.
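
Here is a minimal Python sketch of those three steps on a hypothetical population list, with a random starting point chosen inside the first interval.

```python
import random

population = [f"person_{i}" for i in range(1, 1001)]  # hypothetical list of 1,000 members
sample_size = 50

k = len(population) // sample_size  # sampling interval: 1000 // 50 = 20
start = random.randrange(k)         # random starting point within the first interval
sample = population[start::k][:sample_size]

print(f"k = {k}, first three sampled: {sample[:3]}")
```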

Yes, you can create a stratified sample using multiple characteristics, but you must ensure that every participant in your study belongs to one and only one subgroup. In this case, you multiply the numbers of subgroups for each characteristic to get the total number of groups.

For example, if you were stratifying by location with three subgroups (urban, rural, or suburban) and marital status with five subgroups (single, divorced, widowed, married, or partnered), you would have 3 x 5 = 15 subgroups.

You should use stratified sampling when your sample can be divided into mutually exclusive and exhaustive subgroups that you believe will take on different mean values for the variable that you’re studying.

Using stratified sampling will allow you to obtain more precise (with lower variance) statistical estimates of whatever you are trying to measure.

For example, say you want to investigate how income differs based on educational attainment, but you know that this relationship can vary based on race. Using stratified sampling, you can ensure you obtain a large enough sample from each racial group, allowing you to draw more precise conclusions.

In stratified sampling, researchers divide subjects into subgroups called strata based on characteristics that they share (e.g., race, gender, educational attainment).

Once divided, each subgroup is randomly sampled using another probability sampling method.
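
As a sketch, the pandas snippet below stratifies a hypothetical population frame by educational attainment and draws a 10% simple random sample within each stratum (proportional allocation).

```python
import pandas as pd

# Hypothetical sampling frame with one stratifying characteristic.
population = pd.DataFrame({
    "id": range(1, 1001),
    "education": ["high school"] * 500 + ["bachelor"] * 300 + ["graduate"] * 200,
})

# Proportional allocation: draw a 10% simple random sample within each stratum.
sample = (
    population
    .groupby("education", group_keys=False)
    .sample(frac=0.10, random_state=42)
)

print(sample["education"].value_counts())  # 50 / 30 / 20, mirroring the strata
```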

Cluster sampling is more time- and cost-efficient than other probability sampling methods, particularly when it comes to large samples spread across a wide geographical area.

However, it provides less statistical certainty than other methods, such as simple random sampling, because it is difficult to ensure that your clusters properly represent the population as a whole.

There are three types of cluster sampling: single-stage, double-stage, and multi-stage clustering. In all three types, you first divide the population into clusters, then randomly select clusters for use in your sample.

  • In single-stage sampling, you collect data from every unit within the selected clusters.
  • In double-stage sampling, you select a random sample of units from within the clusters.
  • In multi-stage sampling, you repeat the procedure of randomly sampling elements from within the clusters until you have reached a manageable sample.

Cluster sampling is a probability sampling method in which you divide a population into clusters, such as districts or schools, and then randomly select some of these clusters as your sample.

The clusters should ideally each be mini-representations of the population as a whole.

If properly implemented, simple random sampling is usually the best sampling method for ensuring both internal and external validity. However, it can sometimes be impractical and expensive to implement, depending on the size of the population to be studied.

If you have a list of every member of the population and the ability to reach whichever members are selected, you can use simple random sampling.

The American Community Survey is an example of simple random sampling. In order to collect detailed data on the population of the US, Census Bureau officials randomly select 3.5 million households per year and use a variety of methods to convince them to fill out the survey.

Simple random sampling is a type of probability sampling in which the researcher randomly selects a subset of participants from a population. Each member of the population has an equal chance of being selected. Data is then collected from as large a percentage as possible of this random subset.

Quasi-experimental design is most useful in situations where it would be unethical or impractical to run a true experiment.

Quasi-experiments have lower internal validity than true experiments, but they often have higher external validity, as they can use real-world interventions instead of artificial laboratory settings.

A quasi-experiment is a type of research design that attempts to establish a cause-and-effect relationship. The main difference with a true experiment is that the groups are not randomly assigned.

Blinding is important to reduce research bias (e.g., observer bias, demand characteristics) and ensure a study’s internal validity.

If participants know whether they are in a control or treatment group, they may adjust their behavior in ways that affect the outcome that researchers are trying to measure. If the people administering the treatment are aware of group assignment, they may treat participants differently and thus directly or indirectly influence the final results.

  • In a single-blind study, only the participants are blinded.
  • In a double-blind study, both participants and experimenters are blinded.
  • In a triple-blind study, the assignment is hidden not only from participants and experimenters, but also from the researchers analyzing the data.

Blinding means hiding who is assigned to the treatment group and who is assigned to the control group in an experiment.

A true experiment (a.k.a. a controlled experiment) always includes at least one control group that doesn’t receive the experimental treatment.

However, some experiments use a within-subjects design to test treatments without a control group. In these designs, you usually compare one group’s outcomes before and after a treatment (instead of comparing outcomes between different groups).

For strong internal validity, it’s usually best to include a control group if possible. Without a control group, it’s harder to be certain that the outcome was caused by the experimental treatment and not by other variables.

An experimental group, also known as a treatment group, receives the treatment whose effect researchers wish to study, whereas a control group does not. They should be identical in all other ways.

Individual Likert-type questions are generally considered ordinal data, because the items have clear rank order, but don’t have an even distribution.

Overall Likert scale scores are sometimes treated as interval data. These scores are considered to have directionality and even spacing between them.

The type of data determines what statistical tests you should use to analyze your data.

A Likert scale is a rating scale that quantitatively assesses opinions, attitudes, or behaviors. It is made up of 4 or more questions that measure a single attitude or trait when response scores are combined.

To use a Likert scale in a survey, you present participants with Likert-type questions or statements, and a continuum of items, usually with 5 or 7 possible responses, to capture their degree of agreement.
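
A tiny Python example shows how item responses are combined into an overall scale score; the items and responses here are hypothetical.

```python
# Hypothetical responses from one participant to a 5-item Likert scale
# (1 = strongly disagree ... 5 = strongly agree).
responses = {
    "I enjoy working in teams": 4,
    "Group discussions energize me": 5,
    "I prefer collaborative projects": 4,
    "I seek out others' input": 3,
    "I communicate openly with teammates": 5,
}

# Each item on its own is ordinal; the combined scale score is what is
# sometimes treated as interval data.
total_score = sum(responses.values())
mean_score = total_score / len(responses)
print(f"total = {total_score}, mean = {mean_score:.1f}")
```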

In scientific research, concepts are the abstract ideas or phenomena that are being studied (e.g., educational achievement). Variables are properties or characteristics of the concept (e.g., performance at school), while indicators are ways of measuring or quantifying variables (e.g., yearly grade reports).

The process of turning abstract concepts into measurable variables and indicators is called operationalization.

There are various approaches to qualitative data analysis, but they all share five steps in common:

  • Prepare and organize your data.
  • Review and explore your data.
  • Develop a data coding system.
  • Assign codes to the data.
  • Identify recurring themes.

The specifics of each step depend on the focus of the analysis. Some common approaches include textual analysis, thematic analysis, and discourse analysis.

There are five common approaches to qualitative research:

  • Grounded theory involves collecting data in order to develop new theories.
  • Ethnography involves immersing yourself in a group or organization to understand its culture.
  • Narrative research involves interpreting stories to understand how people make sense of their experiences and perceptions.
  • Phenomenological research involves investigating phenomena through people’s lived experiences.
  • Action research links theory and practice in several cycles to drive innovative changes.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.
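
For instance, here is a minimal SciPy sketch of one common hypothesis test, an independent-samples t-test, run on simulated (hypothetical) group scores.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical test scores for a treatment group and a control group.
treatment = rng.normal(78, 10, size=50)
control = rng.normal(72, 10, size=50)

# Independent-samples t-test: how likely is a difference this large by chance?
result = stats.ttest_ind(treatment, control)

alpha = 0.05
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
if result.pvalue < alpha:
    print("reject the null hypothesis")
else:
    print("fail to reject the null hypothesis")
```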

Operationalization means turning abstract conceptual ideas into measurable observations.

For example, the concept of social anxiety isn’t directly observable, but it can be operationally defined in terms of self-rating scores, behavioral avoidance of crowded places, or physical anxiety symptoms in social situations.

Before collecting data, it’s important to consider how you will operationalize the variables that you want to measure.

When conducting research, collecting original data has significant advantages:

  • You can tailor data collection to your specific research aims (e.g. understanding the needs of your consumers or user testing your website)
  • You can control and standardize the process for high reliability and validity (e.g. choosing appropriate measurements and sampling methods)

However, there are also some drawbacks: data collection can be time-consuming, labor-intensive and expensive. In some cases, it’s more efficient to use secondary data that has already been collected by someone else, but the data might be less reliable.

Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organizations.

There are several methods you can use to decrease the impact of confounding variables on your research: restriction, matching, statistical control, and randomization.

In restriction, you restrict your sample by only including certain subjects that have the same values of potential confounding variables.

In matching, you match each of the subjects in your treatment group with a counterpart in the comparison group. The matched subjects have the same values on any potential confounding variables, and only differ in the independent variable.

In statistical control, you include potential confounders as variables in your regression.

In randomization, you randomly assign the treatment (or independent variable) in your study to a sufficiently large number of subjects, which allows you to control for all potential confounding variables.
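
The sketch below illustrates statistical control with NumPy on simulated data: a hypothetical confounder (age) drives both exercise and blood pressure, and including it in the regression removes the spurious exercise effect.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

# Hypothetical data: age confounds the exercise-blood pressure relationship.
age = rng.uniform(20, 70, size=n)                            # confounder
exercise = 10 - 0.1 * age + rng.normal(0, 1, size=n)         # older people exercise less
blood_pressure = 100 + 0.8 * age + rng.normal(0, 5, size=n)  # driven by age alone

# Naive regression: blood_pressure ~ exercise (confounded by age).
X_naive = np.column_stack([np.ones(n), exercise])
coef_naive, *_ = np.linalg.lstsq(X_naive, blood_pressure, rcond=None)

# Controlled regression: blood_pressure ~ exercise + age.
X_controlled = np.column_stack([np.ones(n), exercise, age])
coef_controlled, *_ = np.linalg.lstsq(X_controlled, blood_pressure, rcond=None)

print(f"exercise effect, naive:      {coef_naive[1]:+.2f}")       # spuriously large
print(f"exercise effect, controlled: {coef_controlled[1]:+.2f}")  # near zero
```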

A confounding variable is closely related to both the independent and dependent variables in a study. An independent variable represents the supposed cause, while the dependent variable is the supposed effect. A confounding variable is a third variable that influences both the independent and dependent variables.

Failing to account for confounding variables can cause you to wrongly estimate the relationship between your independent and dependent variables.

To ensure the internal validity of your research, you must consider the impact of confounding variables. If you fail to account for them, you might over- or underestimate the causal relationship between your independent and dependent variables, or even find a causal relationship where none exists.

Yes, but including more than one of either type requires multiple research questions.

For example, if you are interested in the effect of a diet on health, you can use multiple measures of health: blood sugar, blood pressure, weight, pulse, and many more. Each of these is its own dependent variable with its own research question.

You could also choose to look at the effect of exercise levels as well as diet, or even the additional effect of the two combined. Each of these is a separate independent variable.

To ensure the internal validity of an experiment, you should only change one independent variable at a time.

No. The value of a dependent variable depends on an independent variable, so a variable cannot be both independent and dependent at the same time. It must be either the cause or the effect, not both!

You want to find out how blood sugar levels are affected by drinking diet soda and regular soda, so you conduct an experiment.

  • The type of soda – diet or regular – is the independent variable.
  • The level of blood sugar that you measure is the dependent variable – it changes depending on the type of soda.

Determining cause and effect is one of the most important parts of scientific research. It’s essential to know which is the cause – the independent variable – and which is the effect – the dependent variable.

In non-probability sampling, the sample is selected based on non-random criteria, and not every member of the population has a chance of being included.

Common non-probability sampling methods include convenience sampling, voluntary response sampling, purposive sampling, snowball sampling, and quota sampling.

Probability sampling means that every member of the target population has a known chance of being included in the sample.

Probability sampling methods include simple random sampling, systematic sampling, stratified sampling, and cluster sampling.

Using careful research design and sampling procedures can help you avoid sampling bias. Oversampling can be used to correct undercoverage bias.

Some common types of sampling bias include self-selection bias, nonresponse bias, undercoverage bias, survivorship bias, pre-screening or advertising bias, and healthy user bias.

Sampling bias is a threat to external validity – it limits the generalizability of your findings to a broader group of people.

A sampling error is the difference between a population parameter and a sample statistic.

A statistic refers to measures about the sample, while a parameter refers to measures about the population.

Populations are used when a research question requires data from every member of the population. This is usually only feasible when the population is small and easily accessible.

Samples are used to make inferences about populations. Samples are easier to collect data from because they are practical, cost-effective, convenient, and manageable.

There are seven threats to external validity: selection bias, history, experimenter effect, Hawthorne effect, testing effect, aptitude-treatment interaction, and situation effect.

The two types of external validity are population validity (whether you can generalize to other groups of people) and ecological validity (whether you can generalize to other situations and settings).

The external validity of a study is the extent to which you can generalize your findings to different groups of people, situations, and measures.

Cross-sectional studies cannot establish a cause-and-effect relationship or analyze behavior over a period of time. To investigate cause and effect, you need to do a longitudinal study or an experimental study.

Cross-sectional studies are less expensive and time-consuming than many other types of study. They can provide useful insights into a population’s characteristics and identify correlations for further research.

Sometimes only cross-sectional data is available for analysis; other times your research question may only require a cross-sectional study to answer it.

Longitudinal studies can last anywhere from weeks to decades, although they tend to be at least a year long.

The 1970 British Cohort Study, which has collected data on the lives of 17,000 Brits since their births in 1970, is one well-known example of a longitudinal study.

Longitudinal studies are better to establish the correct sequence of events, identify changes over time, and provide insight into cause-and-effect relationships, but they also tend to be more expensive and time-consuming than other types of studies.

Longitudinal studies and cross-sectional studies are two different types of research design. In a cross-sectional study you collect data from a population at a specific point in time; in a longitudinal study you repeatedly collect data from the same sample over an extended period of time.

Longitudinal study:

  • Repeated observations
  • Observes the same group multiple times
  • Follows changes in participants over time

Cross-sectional study:

  • Observations at a single point in time
  • Observes different groups (a “cross-section”) in the population
  • Provides a snapshot of society at a given point

There are eight threats to internal validity: history, maturation, instrumentation, testing, selection bias, regression to the mean, social interaction, and attrition.

Internal validity is the extent to which you can be confident that a cause-and-effect relationship established in a study cannot be explained by other factors.

In mixed methods research, you use both qualitative and quantitative data collection and analysis methods to answer your research question.

The research methods you use depend on the type of data you need to answer your research question.

  • If you want to measure something or test a hypothesis, use quantitative methods. If you want to explore ideas, thoughts, and meanings, use qualitative methods.
  • If you want to analyze a large amount of readily available data, use secondary data. If you want data specific to your purposes with control over how it is generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables, use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

A confounding variable, also called a confounder or confounding factor, is a third variable in a study examining a potential cause-and-effect relationship.

A confounding variable is related to both the supposed cause and the supposed effect of the study. It can be difficult to separate the true effect of the independent variable from the effect of the confounding variable.

In your research design, it’s important to identify potential confounding variables and plan how you will reduce their impact.

Discrete and continuous variables are two types of quantitative variables:

  • Discrete variables represent counts (e.g. the number of objects in a collection).
  • Continuous variables represent measurable amounts (e.g. water volume or weight).

Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).

Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).

You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results.

You can think of independent and dependent variables in terms of cause and effect: an independent variable is the variable you think is the cause, while a dependent variable is the effect.

In an experiment, you manipulate the independent variable and measure the outcome in the dependent variable. For example, in an experiment about the effect of nutrients on crop growth:

  • The independent variable is the amount of nutrients added to the crop field.
  • The dependent variable is the biomass of the crops at harvest time.

Defining your variables, and deciding how you will manipulate and measure them, is an important part of experimental design.

Experimental design means planning a set of procedures to investigate a relationship between variables. To design a controlled experiment, you need:

  • A testable hypothesis
  • At least one independent variable that can be precisely manipulated
  • At least one dependent variable that can be precisely measured

When designing the experiment, you decide:

  • How you will manipulate the variable(s)
  • How you will control for any potential confounding variables
  • How many subjects or samples will be included in the study
  • How subjects will be assigned to treatment levels

Experimental design is essential to the internal and external validity of your experiment.

Internal validity is the degree of confidence that the causal relationship you are testing is not influenced by other factors or variables.

External validity is the extent to which your results can be generalized to other contexts.

The validity of your experiment depends on your experimental design .

Reliability and validity are both about how well a method measures something:

  • Reliability refers to the  consistency of a measure (whether the results can be reproduced under the same conditions).
  • Validity   refers to the  accuracy of a measure (whether the results really do represent what they are supposed to measure).

If you are doing experimental research, you also have to consider the internal and external validity of your experiment.

A sample is a subset of individuals from a larger population. Sampling means selecting the group that you will actually collect data from in your research. For example, if you are researching the opinions of students in your university, you could survey a sample of 100 students.

In statistics, sampling allows you to test a hypothesis about the characteristics of a population.
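
Drawing that sample of 100 students could look like the following minimal Python sketch; the roster of student IDs is hypothetical.

```python
# Simple random sampling without replacement from a hypothetical roster.
import random

population = [f"student_{i}" for i in range(10_000)]  # made-up student IDs
random.seed(1)  # fixed seed so the example is reproducible

sample = random.sample(population, k=100)  # the 100 students you will survey
print(len(sample), sample[:3])
```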

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to systematically measure variables and test hypotheses. Qualitative methods allow you to explore concepts and experiences in more detail.

Methodology refers to the overarching strategy and rationale of your research project. It involves studying the methods used in your field and the theories or principles behind them, in order to develop an approach that matches your objectives.

Methods are the specific tools and procedures you use to collect and analyze data (for example, experiments, surveys, and statistical tests).

In shorter scientific papers, where the aim is to report the findings of a specific study, you might simply describe what you did in a methods section.

In a longer or more complex research project, such as a thesis or dissertation, you will probably include a methodology section, where you explain your approach to answering the research questions and cite relevant sources to support your choice of methods.


What Is a Hypothesis? (Science)


A hypothesis (plural hypotheses) is a proposed explanation for an observation. The definition depends on the subject.

In science, a hypothesis is part of the scientific method. It is a prediction or explanation that is tested by an experiment. Observations and experiments may disprove a scientific hypothesis, but can never entirely prove one.

In the study of logic, a hypothesis is an if-then proposition, typically written in the form, "If X, then Y."

In common usage, a hypothesis is simply a proposed explanation or prediction, which may or may not be tested.

Writing a Hypothesis

Most scientific hypotheses are proposed in the if-then format because it's easy to design an experiment to see whether or not a cause and effect relationship exists between the independent variable and the dependent variable. The hypothesis is written as a prediction of the outcome of the experiment.

Null Hypothesis and Alternative Hypothesis

Statistically, it's easier to show there is no relationship between two variables than to support their connection. So, scientists often propose the null hypothesis. The null hypothesis assumes changing the independent variable will have no effect on the dependent variable.

In contrast, the alternative hypothesis suggests changing the independent variable will have an effect on the dependent variable. Designing an experiment to test this hypothesis can be trickier because there are many ways to state an alternative hypothesis.

For example, consider a possible relationship between getting a good night's sleep and getting good grades. The null hypothesis might be stated: "The number of hours of sleep students get is unrelated to their grades" or "There is no correlation between hours of sleep and grades."

An experiment to test this hypothesis might involve collecting data on each student's average hours of sleep and their grades. If students who get eight hours of sleep generally do better than students who get four or ten hours of sleep, the null hypothesis might be rejected.

But the alternative hypothesis is harder to propose and test. The most general statement would be: "The amount of sleep students get affects their grades." The hypothesis might also be stated as "If you get more sleep, your grades will improve" or "Students who get nine hours of sleep have better grades than those who get more or less sleep."

In an experiment, you can collect the same data, but the statistical analysis is less likely to give you a high level of confidence in the result.

Usually, a scientist starts out with the null hypothesis. From there, it may be possible to propose and test an alternative hypothesis, to narrow down the relationship between the variables.
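
To illustrate, the null hypothesis "hours of sleep are unrelated to grades" could be tested with a correlation test, sketched here in Python with SciPy. The data are fabricated, and note that a plain linear correlation would not capture the non-monotonic pattern (eight hours beating both four and ten) described above.

```python
# Testing the null hypothesis of no correlation between sleep and grades.
# All numbers below are fabricated for illustration.
from scipy import stats

hours_of_sleep = [4, 5, 6, 6, 7, 7, 8, 8, 9, 10]
grade_percent = [62, 65, 70, 72, 74, 78, 82, 85, 80, 76]

r, p_value = stats.pearsonr(hours_of_sleep, grade_percent)
print(f"r = {r:.2f}, p = {p_value:.3f}")

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: sleep and grades appear related.")
else:
    print("Fail to reject the null hypothesis: no detectable relationship.")
```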

Examples of a Hypothesis

Examples of a hypothesis include:

  • If you drop a rock and a feather, (then) they will fall at the same rate.
  • Plants need sunlight in order to live. (if sunlight, then life)
  • Eating sugar gives you energy. (if sugar, then energy)

PrepScholar

What Is a Hypothesis and How Do I Write One?

Think about something strange and unexplainable in your life. Maybe you get a headache right before it rains, or maybe you think your favorite sports team wins when you wear a certain color. If you wanted to see whether these are just coincidences or scientific fact, you would form a hypothesis, then create an experiment to see whether that hypothesis is true or not.

But what is a hypothesis, anyway? If you’re not sure about what a hypothesis is--or how to test for one!--you’re in the right place. This article will teach you everything you need to know about hypotheses, including: 

  • Defining the term “hypothesis” 
  • Providing hypothesis examples 
  • Giving you tips for how to write your own hypothesis

So let’s get started!


What Is a Hypothesis?

Merriam Webster defines a hypothesis as “an assumption or concession made for the sake of argument.” In other words, a hypothesis is an educated guess. Scientists make a reasonable assumption--or a hypothesis--then design an experiment to test whether it's true or not. Keep in mind that in science, a hypothesis should be testable. You have to be able to design an experiment that tests your hypothesis in order for it to be valid.

As you could assume from that statement, it's easy to make a bad hypothesis. But when you're running an experiment, it's even more important that your guesses be good...after all, you're spending time (and maybe money!) to figure out more about your observation. That's why we refer to a hypothesis as an educated guess--good hypotheses are based on existing data and research to make them as sound as possible.

Hypotheses are one part of what's called the scientific method. Every (good) experiment or study is based on the scientific method. The scientific method gives order and structure to experiments and ensures that interference from scientists or outside influences does not skew the results. It's important that you understand the concepts of the scientific method before running your own experiment. Though it may vary among scientists, the scientific method is generally made up of six steps (in order):

  • Observation
  • Asking questions
  • Forming a hypothesis
  • Conducting an experiment
  • Analyzing the data
  • Communicating your results

You’ll notice that the hypothesis comes pretty early on when conducting an experiment. That’s because experiments work best when they’re trying to answer one specific question. And you can’t conduct an experiment until you know what you’re trying to prove!

Independent and Dependent Variables 

After doing your research, you're ready for another important step in forming your hypothesis: identifying variables. Variables are basically any factor that could influence the outcome of your experiment. Variables have to be measurable and related to the topic being studied.

There are two types of variables: independent variables and dependent variables. Independent variables stand alone: they are not changed by the other variables in your study. For example, age is an independent variable; it will stay the same regardless of other factors, and researchers can look at different ages to see if age has an effect on the dependent variable.

Speaking of dependent variables... dependent variables are subject to the influence of the independent variable, meaning that they are not constant. Let's say you want to test whether a person's age affects how much sleep they need. In that case, the independent variable is age (like we mentioned above), and the dependent variable is how much sleep a person gets.

Variables will be crucial in writing your hypothesis. You need to be able to identify which variable is which, as both the independent and dependent variables will be written into your hypothesis. For instance, in a study about exercise, the independent variable might be the speed at which the respondents walk for thirty minutes, and the dependent variable would be their heart rate. In your study and in your hypothesis, you’re trying to understand the relationship between the two variables.

Elements of a Good Hypothesis

The best hypotheses start by asking the right questions. For instance, if you've observed that the grass is greener when it rains twice a week, you could ask what kind of grass it is, what elevation it's at, and if the grass across the street responds to rain in the same way. Any of these questions could become the backbone of experiments to test why the grass gets greener when it rains fairly frequently.

As you're asking more questions about your first observation, make sure you're also making more observations. If it doesn't rain for two weeks and the grass still looks green, that's an important observation that could influence your hypothesis. You'll continue observing all throughout your experiment, but until the hypothesis is finalized, every observation should be noted.

Finally, you should consult secondary research before writing your hypothesis. Secondary research consists of results found and published by other people. You can usually find this information online or at your library. Additionally, make sure the research you find is credible and related to your topic. If you're studying the correlation between rain and grass growth, it would help you to research rain patterns over the past twenty years for your county, published by a local agricultural association. You should also research the types of grass common in your area, the type of grass in your lawn, and whether anyone else has conducted experiments about your hypothesis. Also be sure you're checking the quality of your research. Research done by a middle school student about what minerals can be found in rainwater would be less useful than an article published by a local university.


Writing Your Hypothesis

Once you’ve considered all of the factors above, you’re ready to start writing your hypothesis. Hypotheses usually take a certain form when they’re written out in a research report.

When you boil down your hypothesis statement, you are writing down your best guess and not the question at hand. This means that your statement should be written as if it is fact already, even though you are simply testing it.

The reason for this is that, after you have completed your study, you'll either accept or reject your if-then or your null hypothesis. All hypothesis testing examples should be measurable and able to be confirmed or denied. You cannot confirm a question, only a statement! 

In fact, you come up with hypothesis examples all the time! For instance, when you guess on the outcome of a basketball game, you don’t say, “Will the Miami Heat beat the Boston Celtics?” but instead, “I think the Miami Heat will beat the Boston Celtics.” You state it as if it is already true, even if it turns out you’re wrong. You do the same thing when writing your hypothesis.

Additionally, keep in mind that hypotheses can range from very specific to very broad. If your hypothesis testing examples involve a narrow cause and effect, your hypothesis can be specific; if they involve a broad range of causes and effects, your hypothesis can be broad as well.


The Two Types of Hypotheses

Now that you understand what goes into a hypothesis, it’s time to look more closely at the two most common types of hypothesis: the if-then hypothesis and the null hypothesis.

#1: If-Then Hypotheses

First of all, if-then hypotheses typically follow this formula:

If ____ happens, then ____ will happen.

The goal of this type of hypothesis is to test the causal relationship between the independent and dependent variable. It’s fairly simple, and each hypothesis can vary in how detailed it can be. We create if-then hypotheses all the time with our daily predictions. Here are some examples of hypotheses that use an if-then structure from daily life: 

  • If I get enough sleep, I’ll be able to get more work done tomorrow.
  • If the bus is on time, I can make it to my friend’s birthday party. 
  • If I study every night this week, I’ll get a better grade on my exam. 

In each of these situations, you’re making a guess on how an independent variable (sleep, time, or studying) will affect a dependent variable (the amount of work you can do, making it to a party on time, or getting better grades). 

You may still be asking, “What is an example of a hypothesis used in scientific research?” Take one of the hypothesis examples from a real-world study on whether using technology before bed affects children's sleep patterns. The hypothesis reads:

“We hypothesized that increased hours of tablet- and phone-based screen time at bedtime would be inversely correlated with sleep quality and child attention.”

It might not look like it, but this is an if-then statement. The researchers basically said, “If children have more screen usage at bedtime, then their quality of sleep and attention will be worse.” The sleep quality and attention are the dependent variables and the screen usage is the independent variable. (Usually, the independent variable comes after the “if” and the dependent variable comes after the “then,” as it is the independent variable that affects the dependent variable.) This is an excellent example of how flexible hypothesis statements can be, as long as the general idea of “if-then” and the independent and dependent variables are present.

#2: Null Hypotheses

Your if-then hypothesis is not the only one needed to complete a successful experiment, however. You also need a null hypothesis to test it against. In its most basic form, the null hypothesis is the opposite of your if-then hypothesis. When you write your null hypothesis, you are writing a hypothesis that suggests that your guess is not true, and that the independent and dependent variables have no relationship.

One null hypothesis for the cell phone and sleep study from the last section might say: 

“If children have more screen usage at bedtime, their quality of sleep and attention will not be worse.” 

In this case, this is a null hypothesis because it states the opposite of the original hypothesis!

Conversely, if your if-then hypothesis suggests that your two variables have no relationship, then your null hypothesis would suggest that there is one. So, pretend that there is a study that is asking the question, “Does the number of followers on Instagram influence how long people spend on the app?” The independent variable is the number of followers, and the dependent variable is the time spent. But if you, as the researcher, don't think there is a relationship between the number of followers and time spent, you might write an if-then hypothesis that reads:

“If people have many followers on Instagram, they will not spend more time on the app than people who have less.”

In this case, the if-then suggests there isn’t a relationship between the variables. In that case, one of the null hypothesis examples might say:

“If people have many followers on Instagram, they will spend more time on the app than people who have less.”

You then test both the if-then and the null hypothesis to gauge if there is a relationship between the variables, and if so, how much of a relationship. 
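
As a sketch of how that gauging might look in practice, here is a Python example with fabricated numbers. A rank correlation (Spearman's rho) is used so that a handful of very large accounts cannot dominate the result.

```python
# Testing whether follower count and time spent on the app are related.
# The follower counts and minutes per day below are invented.
from scipy import stats

followers = [120, 450, 900, 1500, 3200, 8000, 20000, 55000]
minutes_per_day = [35, 42, 38, 51, 40, 47, 39, 44]

rho, p_value = stats.spearmanr(followers, minutes_per_day)
print(f"rho = {rho:.2f}, p = {p_value:.3f}")
# A large p-value here means you fail to reject the null hypothesis of
# no relationship between followers and time spent on the app.
```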


4 Tips to Write the Best Hypothesis

If you're going to take the time to run an experiment, whether in school or by yourself, you're also going to want to take the time to make sure your hypothesis is a good one. The best hypotheses have four major elements in common: plausibility, defined concepts, observability, and generalizability.

#1: Plausibility

At first glance, this quality of a hypothesis might seem obvious. When your hypothesis is plausible, that means it’s possible given what we know about science and general common sense. However, improbable hypotheses are more common than you might think. 

Imagine you're studying weight gain and television watching habits. If you hypothesize that people who watch more than twenty hours of television a week will gain two hundred pounds or more over the course of a year, this is improbable (though not strictly impossible). Consequently, common sense can tell us the results of the study before the study even begins.

Improbable hypotheses generally go against science, as well. Take this hypothesis example:

“If a person smokes one cigarette a day, then they will have lungs just as healthy as the average person’s.” 

This hypothesis is obviously untrue, as studies have shown again and again that cigarettes negatively affect lung health. You must be careful that your hypotheses do not reflect your own personal opinion more than they do scientifically supported findings. The need for plausibility is another reason to do your research before writing your hypothesis: you want to make sure your hypothesis has not already been disproven.

#2: Defined Concepts

The more advanced you are in your studies, the more likely that the terms you’re using in your hypothesis are specific to a limited set of knowledge. One of the hypothesis testing examples might include the readability of printed text in newspapers, where you might use words like “kerning” and “x-height.” Unless your readers have a background in graphic design, it’s likely that they won’t know what you mean by these terms. Thus, it’s important to either write what they mean in the hypothesis itself or in the report before the hypothesis.

Here’s what we mean. Which of the following sentences makes more sense to the common person?

If the kerning is greater than average, more words will be read per minute.

If the space between letters is greater than average, more words will be read per minute.

For people reading your report who are not experts in typography, simply adding a few more words will be helpful in clarifying exactly what the experiment is all about. It's always a good idea to make your research and findings as accessible as possible.


Good hypotheses ensure that you can observe the results. 

#3: Observability

In order to measure the truth or falsity of your hypothesis, you must be able to see your variables and the way they interact. For instance, if your hypothesis is that the flight patterns of satellites affect the strength of certain television signals, yet you don’t have a telescope to view the satellites or a television to monitor the signal strength, you cannot properly observe your hypothesis and thus cannot continue your study.

Some variables may seem easy to observe, but if you do not have a system of measurement in place, you cannot observe your hypothesis properly. Here’s an example: if you’re experimenting on the effect of healthy food on overall happiness, but you don’t have a way to monitor and measure what “overall happiness” means, your results will not reflect the truth. Monitoring how often someone smiles for a whole day is not reasonably observable, but having the participants state how happy they feel on a scale of one to ten is more observable. 

In writing your hypothesis, always keep in mind how you'll execute the experiment.

#4: Generalizability 

Perhaps you’d like to study what color your best friend wears the most often by observing and documenting the colors she wears each day of the week. This might be fun information for her and you to know, but beyond you two, there aren’t many people who could benefit from this experiment. When you start an experiment, you should note how generalizable your findings may be if they are confirmed. Generalizability is basically how common a particular phenomenon is to other people’s everyday life.

If you're asking a question about the health benefits of eating an apple for one day only, you need to realize that the experiment may be too specific to be helpful. It does not help to explain a phenomenon that many people experience. If you find yourself with too specific of a hypothesis, go back to asking the big question: what is it that you want to know, and what do you think will happen between your two variables?


Hypothesis Testing Examples

We know it can be hard to write a good hypothesis unless you’ve seen some good hypothesis examples. We’ve included four hypothesis examples based on some made-up experiments. Use these as templates or launch pads for coming up with your own hypotheses.

Experiment #1: Students Studying Outside (Writing a Hypothesis)

You are a student at PrepScholar University. When you walk around campus, you notice that, when the temperature is above 60 degrees, more students study in the quad. You want to know when your fellow students are more likely to study outside. With this information, how do you make the best hypothesis possible?

You must remember to make additional observations and do secondary research before writing your hypothesis. In doing so, you notice that no one studies outside when it’s 75 degrees and raining, so this should be included in your experiment. Also, studies done on the topic beforehand suggested that students are more likely to study in temperatures less than 85 degrees. With this in mind, you feel confident that you can identify your variables and write your hypotheses:

If-then: “If the temperature in Fahrenheit is less than 60 degrees, significantly fewer students will study outside.”

Null: “If the temperature in Fahrenheit is less than 60 degrees, the same number of students will study outside as when it is more than 60 degrees.”

These hypotheses are plausible, as the temperatures are reasonably within the bounds of what is possible. The number of people in the quad is also easily observable. It is also not a phenomenon specific to only one person or at one time, but instead can explain a phenomenon for a broader group of people.

To complete this experiment, you pick the month of October to observe the quad. Every day (except on the days when it's raining) from 3 to 4 PM, when most classes have released for the day, you observe how many people are on the quad. You measure how many people come and how many leave. You also write down the temperature on the hour.

After writing down all of your observations and putting them on a graph, you find that the most students study on the quad when it is 70 degrees outside, and that the number of students drops a lot once the temperature reaches 60 degrees or below. In this case, your research report would state that your findings support your first (if-then) hypothesis and that you reject the null hypothesis that temperature makes no difference.
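
For illustration, the quad counts could be analyzed with a two-sample t-test comparing days below 60 degrees with days at or above 60 degrees; the daily counts below are invented.

```python
# Comparing students observed on cold days vs. warmer days (made-up counts).
from scipy import stats

below_60 = [4, 6, 3, 5, 7, 4, 5]               # students per day below 60 °F
at_or_above_60 = [14, 18, 22, 17, 20, 25, 19]  # students per day at/above 60 °F

t_stat, p_value = stats.ttest_ind(below_60, at_or_above_60)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A very small p-value supports the if-then hypothesis that significantly
# fewer students study outside when it is below 60 degrees.
```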

Experiment #2: The Cupcake Store (Forming a Simple Experiment)

Let’s say that you work at a bakery. You specialize in cupcakes, and you make only two colors of frosting: yellow and purple. You want to know what kind of customers are more likely to buy what kind of cupcake, so you set up an experiment. Your independent variable is the customer’s gender, and the dependent variable is the color of the frosting. What is an example of a hypothesis that might answer the question of this study?

Here’s what your hypotheses might look like: 

If-then: “If customers’ gender is female, then they will buy more yellow cupcakes than purple cupcakes.”

Null: “If customers’ gender is female, then they will be just as likely to buy purple cupcakes as yellow cupcakes.”

This is a pretty simple experiment! It passes the test of plausibility (there could easily be a difference), defined concepts (there's nothing complicated about cupcakes!), observability (both color and gender can be easily observed), and generalizability (this would potentially help you make better business decisions).
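
Because both variables here are categorical, a chi-square test of independence is a natural way to analyze the results; the contingency table below is made up for the sketch.

```python
# Chi-square test of independence for the cupcake experiment (invented data).
from scipy import stats

#            yellow  purple
observed = [[45, 30],   # female customers
            [25, 40]]   # male customers

chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
# If p < 0.05, reject the null hypothesis that frosting choice is
# independent of customer gender.
```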


Experiment #3: Backyard Bird Feeders (Integrating Multiple Variables and Rejecting the If-Then Hypothesis)

While watching your backyard bird feeder, you realized that different birds come on the days when you change the types of seeds. You decide that you want to see more cardinals in your backyard, so you decide to see what type of food they like the best and set up an experiment. 

However, one morning, you notice that, while some cardinals are present, blue jays are eating out of your backyard feeder filled with millet. You decide that, of all of the other birds, you would like to see the blue jays the least. This means you'll have more than one variable in your hypothesis. Your new hypotheses might look like this: 

If-then: “If sunflower seeds are placed in the bird feeders, then more cardinals will come than blue jays. If millet is placed in the bird feeders, then more blue jays will come than cardinals.”

Null: “If either sunflower seeds or millet are placed in the bird feeders, equal numbers of cardinals and blue jays will come.”

Through simple observation, you actually find that cardinals come as often as blue jays when either sunflower seeds or millet is in the bird feeder. In this case, you would reject your “if-then” hypothesis and “fail to reject” your null hypothesis. You cannot accept your first hypothesis, because it's clearly not true. Instead you found that there was actually no relationship between your different variables. Consequently, you would need to run more experiments with different variables to see if the new variables impact the results.
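
The same kind of chi-square test illustrates this "fail to reject" outcome; the counts below are fabricated so that no significant seed preference shows up.

```python
# Chi-square test for the bird feeder experiment (invented counts chosen
# so that the test finds no significant preference).
from scipy import stats

#            cardinals  blue jays
observed = [[52, 49],   # sunflower seeds
            [47, 51]]   # millet

chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
# With a large p-value you fail to reject the null hypothesis: the data
# provide no evidence that seed type shifts the cardinal/blue-jay mix.
```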

Experiment #4: In-Class Survey (Including an Alternative Hypothesis)

You’re about to give a speech in one of your classes about the importance of paying attention. You want to take this opportunity to test a hypothesis you’ve had for a while: 

If-then: If students sit in the first two rows of the classroom, then they will listen better than students who do not.

Null: If students sit in the first two rows of the classroom, then they will not listen better or worse than students who do not.

You give your speech and then ask your teacher if you can hand out a short survey to the class. On the survey, you’ve included questions about some of the topics you talked about. When you get back the results, you’re surprised to see that not only do the students in the first two rows not pay better attention, but they also scored worse than students in other parts of the classroom! Here, both your if-then and your null hypotheses are not representative of your findings. What do you do?

This is when you reject both your if-then and null hypotheses and instead create an alternative hypothesis. This type of hypothesis is used in the rare circumstance that neither of your hypotheses is able to capture your findings. Now you can use what you've learned to draft new hypotheses and test again!

Key Takeaways: Hypothesis Writing

The more comfortable you become with writing hypotheses, the better they will become. The structure of hypotheses is flexible and may need to be changed depending on what topic you are studying. The most important thing to remember is the purpose of your hypothesis and the difference between the if-then and the null. From there, in forming your hypothesis, you should constantly be asking questions, making observations, doing secondary research, and considering your variables. After you have written your hypothesis, be sure to edit it so that it is plausible, clearly defined, observable, and helpful in explaining a general phenomenon.

Writing a hypothesis is something that everyone, from elementary school children competing in a science fair to professional scientists in a lab, needs to know how to do. Hypotheses are vital in experiments and in properly executing the scientific method . When done correctly, hypotheses will set up your studies for success and help you to understand the world a little better, one experiment at a time.



Encyclopedia Britannica


confirmation bias


Confirmation bias is people's tendency to process information by looking for, or interpreting, information that is consistent with their existing beliefs. This biased approach to decision making is largely unintentional, and it results in a person ignoring information that is inconsistent with their beliefs. These beliefs can include a person's expectations in a given situation and their predictions about a particular outcome. People are especially likely to process information to support their own beliefs when an issue is highly important or self-relevant.

Confirmation bias is one example of how humans sometimes process information in an illogical, biased manner. The manner in which a person knows and understands the world is often affected by factors that are simply unknown to that person. Philosophers note that people have difficulty processing information in a rational, unbiased manner once they have developed an opinion about an issue. Humans are better able to rationally process information, giving equal weight to multiple viewpoints, if they are emotionally distant from the issue (although a low level of confirmation bias can still occur when an individual has no vested interests).


One explanation for why people are susceptible to confirmation bias is that it is an efficient way to process information. Humans are incessantly bombarded with information and cannot possibly take the time to carefully process each piece of information to form an unbiased conclusion. Human decision making and information processing is often biased because people are limited to interpreting information from their own viewpoint. People need to process information quickly to protect themselves from harm. It is adaptive for humans to rely on instinctive, automatic behaviours that keep them out of harm’s way.

Another reason why people show confirmation bias is to protect their self-esteem . People like to feel good about themselves, and discovering that a belief that they highly value is incorrect makes them feel bad about themselves. Therefore, people will seek information that supports their existing beliefs. Another closely related motive is wanting to be correct. People want to feel that they are intelligent, but information that suggests that they are wrong or that they made a poor decision suggests they are lacking intelligence—and thus confirmation bias will encourage them to disregard this information.

Research has shown that confirmation bias is strong and widespread and that it occurs in several contexts. In the context of decision making, once an individual makes a decision, they will look for information that supports it. Information that conflicts with a person's decision may cause discomfort, and the person will therefore ignore it or give it little consideration. People give special treatment to information that supports their personal beliefs. In studies examining my-side bias, people were able to generate and remember more reasons supporting their side of a controversial issue than the opposing side. Only when a researcher directly asked people to generate arguments against their own beliefs were they able to do so. It is not that people are incapable of generating arguments that are counter to their beliefs, but, rather, people are not motivated to do so.


Confirmation bias also surfaces in people’s tendency to look for positive instances. When seeking information to support their hypotheses or expectations, people tend to look for positive evidence that confirms that a hypothesis is true rather than information that would prove the view is false (if it is false).

Confirmation bias also operates in impression formation. If people are told what to expect from a person they are about to meet, such as that the person is warm, friendly, and outgoing, people will look for information that supports their expectations. When interacting with people whom perceivers think have certain personalities, the perceivers will ask questions of those people that are biased toward supporting the perceivers’ beliefs. For example, if Maria expects her roommate to be friendly and outgoing, Maria may ask her if she likes to go to parties rather than asking if she often studies in the library.

Confirmation bias is important because it may lead people to hold strongly to false beliefs or to give more weight to information that supports their beliefs than is warranted by the evidence. People may be overconfident in their beliefs because they have accumulated evidence to support them, when in reality they have overlooked or ignored a great deal of evidence refuting their beliefs—evidence which, if they had considered it, should lead them to question their beliefs. These factors may lead to risky decision making and lead people to overlook warning signs and other important information. In this manner, confirmation bias is often a component of black swan events, which are high-impact events that are unexpected but, in retrospect, appear to be inevitable.

Confirmation bias has important implications in the real world, including in medicine, law, and interpersonal relationships. Research has shown that medical doctors are just as likely to have confirmation biases as everyone else. Doctors often have a preliminary hunch regarding the diagnosis of a medical condition early in the treatment process. This hunch can interfere with the doctor's ability to assess information that may indicate an alternative diagnosis is more likely. Another related outcome is how patients react to diagnoses. Patients are more likely to agree with a diagnosis that supports their preferred outcome than a diagnosis that goes against their preferred outcome. Both of these examples demonstrate that confirmation bias has implications for individuals' health and well-being.

In the context of law, judges and jurors sometimes form an opinion about a defendant’s guilt or innocence before all of the evidence is known. Once a judge or juror forms an opinion, confirmation bias will interfere with their ability to process new information that emerges during a trial, which may lead to unjust verdicts.

In interpersonal relations, confirmation bias can be problematic because it may lead a person to form inaccurate and biased impressions of others. This may result in miscommunication and conflict in intergroup settings. In addition, when someone treats a person according to their expectations, that person may unintentionally change their behavior to conform to the other person’s expectations, thereby providing further support for the perceiver’s confirmation bias.


Confirmation is Key to Reducing Risk and Increasing Accuracy

Updated: February 3, 2023 by iSixSigma Staff


Confirmation is a key concept in Six Sigma and lean manufacturing. It's also one of the most misunderstood elements of these two disciplines. Confirmation is simply the act of confirming your hypothesis about results, or making sure that what you think happened actually happened. For example, if you want to confirm that an improvement has been made on a process by looking at the data from that process, you need to make sure that it's actually better before declaring victory.

Overview: What is Confirmation?

Confirmation is one of the most important steps in a Lean process. It's a systematic way to confirm that you're on the right track and working toward achieving an optimized solution. Confirmation doesn't mean that everything is perfect, but it does mean that you've identified any major issues with your initial hypothesis and tackled them head-on.

Confirmation differs from validation and verification because it’s not as black-and-white. Validation and verification are tools used to make sure your data collection methods are sound; they check whether or not what you’re measuring matches up with what you believe should be measured (i.e., if you’re measuring sales figures accurately). Confirmation, on the other hand, takes place after all this has been done—and focuses on determining how much confidence you should have in your results. For certain tasks or specific projects within an organization, confirmation depends on testing different approaches or theories against each other until one stands out as being more effective than others.

3 Benefits to Confirmation

Confirmation is a key tool in Lean Six Sigma because it helps you to verify whether or not the results you are getting are accurate. It provides evidence that your assumptions and decisions (or hypotheses) are correct.

There are three benefits to confirmation:

1. Confirmation allows you to verify the accuracy of your process, which can help identify issues that may be affecting it.

This means that you can make changes to improve the process, saving time and money.

2. Confirmation gives you confidence in the results, which makes it easier for others to trust them too.

When everyone knows they can rely on the data, it’s easier for everyone to work together towards their goals.

3. Confirmation allows you to reduce risk by ensuring that there aren’t any mistakes in your calculations or analysis.

Doing this before moving forward with any decisions based on those results (such as purchasing new equipment) is in the best interest of the organization.

Why is Confirmation Important to Understand?

In the scientific method, confirmation is the process of confirming that a hypothesis is true. It can be performed using statistical methods or other means.

In Six Sigma and Lean Six Sigma, confirmation is an important part of the methodology because it's necessary to confirm your hypothesis before running an experiment. If you don't confirm your hypothesis first, then you may waste time running experiments that aren't even close to what you're trying to test. It helps you make sure that your assumptions are correct, and that you're not over-optimizing for a small part of the process.

Confirmation is also a key component of the DMAIC model, which stands for Define, Measure, Analyze, Improve and Control.

In the case of Six Sigma, this means that confirmation is additionally used to verify that a problem has been solved or an improvement has been made. It's important because it helps you make sure that your solution is actually working.
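
As a concrete, hypothetical example of this kind of confirmation, the sketch below compares process cycle times sampled before and after a change using a two-sample t-test; all numbers are invented, and a real confirmation run would use measured process data.

```python
# Confirming an improvement: did the change actually reduce cycle time?
from scipy import stats

before = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 12.5]  # minutes
after = [11.2, 11.0, 11.5, 11.1, 11.3, 10.9, 11.4, 11.2]   # minutes

t_stat, p_value = stats.ttest_ind(before, after)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value confirms that the observed improvement is unlikely to be
# chance variation, so the change can be adopted with more confidence.
```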

An Industry Example of Confirmation

The automotive industry offers a good example of LSS confirmation.

The auto industry is one of the most data-driven industries in the world, and it uses Lean Six Sigma to confirm that its processes are working as intended. When a car manufacturer wants to streamline its process for making one particular part, it will first identify the part that takes up the most time and effort during manufacturing. It will then create a new process for making that part, using only the equipment already on hand, and test the new process by making a small batch of parts before shifting over to large batches. Once the manufacturer has confirmed that the new system works properly with small batches, it can scale up production without losing efficiency or running into quality control issues.

3 Best Practices When Thinking About Confirmation

Confirmation is a key concept in Lean Six Sigma. It’s the process of taking action on some data, and then checking whether that action was effective. It’s important to remember that confirming an action doesn’t mean you know it was effective—it just means that you’ve done something and want to learn more about whether or not it worked.

In order to confirm an action, there are three best practices:

1. Do it quickly.

You should be able to move forward with your project without waiting long – if at all – for results.

2. Capture all relevant data.

The data you collect should be useful in determining whether or not your action was effective.

3. Be scientific.

You should keep track of what happens after taking an action so that you can draw conclusions about whether or not it worked.

Frequently Asked Questions (FAQs) About Confirmation

What is confirmation?

Confirmation is a way of checking that a process works as intended and produces desired results. For example, if you have a customer survey process, you can confirm that it’s working by comparing your results with those from previous surveys. You could also use the process to check that customer satisfaction has improved over time.

Is there a difference between confirmation and validation in the context of Six Sigma?

Confirmation is an activity that is performed when a process is suspected to be in control, but its status has not yet been confirmed. Validation is a similar activity that is used to determine whether the process is capable of achieving its target. Validation can be thought of as confirmation with one important difference: validation seeks to determine whether or not a process is capable of achieving its target by measuring the capability of the process while it is under control.

Is there a difference between confirmation and verification in the context of Six Sigma?

Verification is also an activity that seeks to determine whether or not a process is capable of achieving its target, but it does so by measuring the capability of the process while it is NOT under control. Confirmation and verification are similar processes that seek to answer different questions: confirmation seeks to answer whether or not a process is capable of achieving its target while verification seeks to determine if that capability exists regardless of whether or not the process is under control at any given time.

Confirmation Before Action

So, what is confirmation? It’s a simple concept, but it can be confusing for some people. Confirmation is when you gather more information about something you think may be true before making major decisions that affect others or your company. Confirming before acting could save time in the long run by preventing errors or delays later on. Even though we try, nobody knows everything 100% at any given moment.



Definition of hypothesis noun from the Oxford Advanced Learner's Dictionary

  • to formulate/confirm a hypothesis
  • a hypothesis about the function of dreams
  • There is little evidence to support these hypotheses.
  • formulate/advance a theory/hypothesis
  • build/construct/create/develop a simple/theoretical/mathematical model
  • develop/establish/provide/use a theoretical/conceptual framework
  • advance/argue/develop the thesis that…
  • explore an idea/a concept/a hypothesis
  • make a prediction/an inference
  • base a prediction/your calculations on something
  • investigate/evaluate/accept/challenge/reject a theory/hypothesis/model
  • design an experiment/a questionnaire/a study/a test
  • do research/an experiment/an analysis
  • make observations/measurements/calculations
  • carry out/conduct/perform an experiment/a test/a longitudinal study/observations/clinical trials
  • run an experiment/a simulation/clinical trials
  • repeat an experiment/a test/an analysis
  • replicate a study/the results/the findings
  • observe/study/examine/investigate/assess a pattern/a process/a behaviour
  • fund/support the research/project/study
  • seek/provide/get/secure funding for research
  • collect/gather/extract data/information
  • yield data/evidence/similar findings/the same results
  • analyse/examine the data/soil samples/a specimen
  • consider/compare/interpret the results/findings
  • fit the data/model
  • confirm/support/verify a prediction/a hypothesis/the results/the findings
  • prove a conjecture/hypothesis/theorem
  • draw/make/reach the same conclusions
  • read/review the records/literature
  • describe/report an experiment/a study
  • present/publish/summarize the results/findings
  • present/publish/read/review/cite a paper in a scientific journal
  • Her hypothesis concerns the role of electromagnetic radiation.
  • Her study is based on the hypothesis that language simplification is possible.
  • It is possible to make a hypothesis on the basis of this graph.
  • None of the hypotheses can be rejected at this stage.
  • Scientists have proposed a bold hypothesis.
  • She used this data to test her hypothesis.
  • The hypothesis predicts that children will perform better on task A than on task B.
  • The results confirmed his hypothesis on the use of modal verbs.
  • These observations appear to support our working hypothesis.
  • a speculative hypothesis concerning the nature of matter
  • an interesting hypothesis about the development of language
  • Advances in genetics seem to confirm these hypotheses.
  • His hypothesis about what dreams mean provoked a lot of debate.
  • Research supports the hypothesis that language skills are centred in the left side of the brain.
  • The survey will be used to test the hypothesis that people who work outside the home are fitter and happier.
  • This economic model is really a working hypothesis.
  • adjective: speculative
  • hypothesis + verb: concern something, be based on something, predict something
  • preposition: on a/the hypothesis, hypothesis about, hypothesis concerning


  • It would be pointless to engage in hypothesis before we have the facts.



Definition of confirm

transitive verb

  • authenticate
  • corroborate
  • substantiate

confirm, corroborate, substantiate, verify, authenticate, validate mean to attest to the truth or validity of something.

confirm implies the removing of doubts by an authoritative statement or indisputable fact.

corroborate suggests the strengthening of what is already partly established.

substantiate implies the offering of evidence that sustains the contention.

verify implies the establishing of correspondence of actual facts or details with those proposed or guessed at.

authenticate implies establishing genuineness by adducing legal or official documents or expert opinion.

validate implies establishing validity by authoritative affirmation or by factual proof.


Word History

Middle English, from Anglo-French cunfermer, from Latin confirmare, from com- + firmare to make firm, from firmus firm

13th century, in the meaning defined at sense 1


“Confirm.” Merriam-Webster.com Dictionary , Merriam-Webster, https://www.merriam-webster.com/dictionary/confirm. Accessed 18 Jun. 2024.


  • Press Releases
  • The Impeachment of DHS Secretary Alejandro Mayorkas
  • Border Startling Stats
  • Border Security and Enforcement
  • Cybersecurity and Infrastructure Protection
  • Emergency Management and Technology
  • Counterterrorism, Law Enforcement, and Intelligence
  • Oversight, Investigations, and Accountability
  • Transportation and Maritime Security
  • Blow The Whistle

ICYMI: Microsoft President Testifies on Past Security Failures, Accountability Measures in Wake of Chinese Hack of Government Accounts

June 17, 2024

WASHINGTON, D.C. ––  Last week, the House Committee on Homeland Security, led by Chairman Mark E. Green, MD (R-TN), held  a hearing  to examine Microsoft’s security culture in the wake of the  Cyber Safety Review Board’s (CSRB) report  on the Microsoft Online Exchange 2023 cyber intrusion by Storm-0558, a threat actor affiliated with the People’s Republic of China (PRC). Witness testimony was provided by Microsoft Vice Chairman and President Brad Smith, who accepted Microsoft’s responsibility in his opening statement for the intrusion that successfully compromised 22 enterprise organizations and over 500 individuals globally, including federal government accounts, due to what the CSRB described as “a cascade of failures” by Microsoft. Chairman Green and Ranking Member Bennie Thompson (D-MS) formally requested Smith’s testimony  on May 9 .    In the hearing, members highlighted the risks associated with Microsoft’s presence in China, its approach to artificial intelligence (AI) development and deployment, Microsoft’s current and future approaches to business decisions, and the company’s plans to strengthen cybersecurity measures following the intrusion. Members also discussed the January 2024 cyber intrusion by “Midnight Blizzard,” a state-sponsored cyber actor affiliated with the Russian Foreign Intelligence Agency that was also responsible for the attack on SolarWinds in 2020.   Although the Committee commends Microsoft for announcing steps to reform its security practices, ensuring follow-through on the company’s stated commitments will be crucial for ensuring U.S. government networks and Americans––including U.S. officials––are not exposed to further risk.

In his opening statement, Chairman Green highlighted the broader questions the Committee must examine regarding the mitigation of economic and national security risks: “To be clear, the U.S. government would never expect a private company to work alone in protecting itself against nation-state attacks. We need to do more work to define roles and responsibilities for public and private sector actors in the event of nation-state attacks on our networks. Our nation’s adversaries possess advanced cyber capabilities and substantial resources, often exceeding the defensive cybersecurity measures available to even the most sophisticated companies. However, we do expect government vendors to implement basic cybersecurity practices.”

Green continued: “First, closing the cyber workforce gap—my top priority for the Committee this year. The security challenges we face as a nation are compounded by the persistent shortage of cybersecurity professionals. As Microsoft continues its work to invest in our cyber workforce, we must harken back to the lessons from the CSRB report. Our cyber professionals must be trained to think of security first. We must equip them with the right skills to protect our networks and to build our systems securely. Second, we need to define the role of public and private sector entities in protecting our networks against nation-state actors. These attacks have become increasingly common, rather than anomalies. We need clearly defined responsibilities so that we can effectively respond to nation-state attacks on our networks. Finally, we must address a fundamental issue: the economic incentives that drive cybersecurity investments. As the CSRB’s report recently revealed, underinvestment in essential security measures exposed critical vulnerabilities.”

Subcommittee on Transportation and Maritime Security Chairman Carlos Gimenez (R-FL) highlighted the dangers of doing business in Communist China and asked Smith if Microsoft shares critical information on cybersecurity with the Chinese Communist Party (CCP)––as companies are required to do under Chinese law: “This law requires all organizations and citizens to cooperate with China’s intelligence agencies, including the People’s Liberation Army, in matters of national security. While the law does not specifically mention companies working in China, it does apply to all organizations operating within the country, including foreign companies. [Do] you operate in China?”

Smith answered: “Yes, we do.”

Gimenez continued: “Do you comply with this law?”

Smith answered: “No, we do not.”

Gimenez continued: “How is it you got away with not complying with the law? Do you have a waiver from the Chinese government saying you don’t have to comply with this law?”

Smith answered: “But there are many laws––there are two types of countries in the world: those that apply every law they enact, and those that enact certain laws but don’t always apply them. And in this context, China, for that law, is in the second category.”

Gimenez continued: “Do you really believe that? Because––look, I sit on the Select Committee on China, and that’s not the information that we get––that all companies in China have to cooperate with the intelligence agencies of China and the People’s Liberation Army. You operate in China, and you’re sitting there telling me that you don’t have to comply with the laws of China?”

After pressing Smith further, Gimenez concluded: “I’m sorry, I just––for some reason, I just don’t trust what you’re saying.”

Subcommittee on Border Security and Enforcement Chairman Clay Higgins (R-LA) asked Smith why Microsoft did not correct, in a timely manner, its inaccurate public statements about the 2023 cyber intrusion: “After the hack, the 2023 Microsoft Online Exchange intrusion, why did it take six months for Microsoft to update the means by which most Americans would sort of be made aware of such a hack?”

Smith answered: “First of all, I appreciate the question; it’s one that I asked our team when I read the CSRB report. It’s the part of the report that surprised me the most. You know, we had five versions of that blog, the original, and then four updates. And we do a lot of updates of these reports. And when I asked the team, they said the specific thing that had changed, namely a theory, a hypothesis about the cause of the intrusion, changed over time. But it didn’t change in a way that would give anyone useful or actionable information that they could apply—”

Higgins continued: “Mr. Smith, respectfully, that answer does not encourage trust. And regular Americans listening are going to have to move the tape back on the Microsoft instrument and listen to what you said again. But you didn’t do it, I mean, you’re Microsoft, [you] had a major thing happen, and the means by which you communicate with your customers was not updated for six months. So I’m just going to say that I don’t really accept your answer as thoroughly honest.”

Smith answered: “I said the same thing, and we had the same conversation inside the company.”

Congresswoman Laurel Lee (R-FL) asked Smith how to improve the victim notification process in the wake of the challenges that Microsoft faced in notifying those impacted by the 2023 Storm-0558 hack: “I’d like to hear more about one of the things that was identified in the report as an area in need of improvement––victim notification. So, I’d like for you to elaborate a little bit more on your thoughts and going-forward plan on how to improve victim notification.”

Smith answered: “When we find that someone has been a victim of an attack, it doesn’t mean that the fault was ours; it’s just that our threat detection system may have found it. We need to let them know. Well, how do you let somebody know? If it’s an enterprise, we probably have a connection; there’s probably somebody there we can call. But if it’s a consumer, like a consumer-based email system, we don’t necessarily know who the human is, we just have an email address. So, we send an email.

“There was a member of Congress we sent an email to last year. That member of Congress did what you sort of expect; they said, well, that’s not really Microsoft, is it? It’s spam. […] That’s the world in which we live. And so, the CSRB has a great recommendation on this. It’s to create the equivalent of the Amber Alert. But it will require support from Congress that CISA lead this, that the tech sector, and probably the telecommunications companies, and the phone makers, and the phone operating system makers all come together. This would be a huge step forward.”

Congressman Dale Strong (R-AL) pressed Smith on any vulnerabilities still present in Microsoft’s products due to the length of time the threat actor had access to stolen credentials: “What are the security implications of China and other potential threat actors having access into your network for so long? What is the threat of that? You know, thank goodness it was discovered, but what threat do you see from them being in your system for so long without being noticed?”

Smith answered: “I would just like to qualify a little bit of the premise, because I noticed in some of the questions that were floating around this week that people suggested that because the Chinese had acquired this key in 2021 and we didn’t find it until 2023, they must’ve had access for two years. I think that in fact they kept it in storage until they were ready to use it, knowing that once they did, it would likely be discovered quickly.”

Strong continued: “Thank you, and that leads to my next question. Are the Chinese still able to access Microsoft’s corporate network today?”

Smith answered: “No, not with anything they did before, and [we] do everything we can do to ensure they don’t get in any other way.”

Subcommittee on Emergency Management and Technology Chairman Anthony D’Esposito (R-NY) asked Smith why the government should continue using Microsoft’s products after the CSRB questioned the company’s ability to prevent future hacks without an “overhaul” of its security culture: “Are you confident that moving forward Microsoft has the ability to quickly detect and react to an intrusion like this?”

Smith answered: “I feel very confident that we have the strongest threat detection system that you’re going to find in, quite possibly, any organization private or public on the planet. Will that always mean we will be the first to find everything? Well, no, it doesn’t work that way. But I feel very good about what we have, and I feel very confident about what we’re building.”

In his closing remarks, Chairman Green highlighted the importance of public-private partnerships in cybersecurity and of harmonizing regulations in order to prevent future intrusions: “Sometimes government, in this public-private partnership that we talked about a couple times … sometimes the government can get in the way too, and I want to ask that you educate us as much as possible. I will give you an example: the SEC ruling, the four-day report for a breach. Some of the big cybersecurity companies, I mean the biggest in the nation, told me it [takes] seven or eight days to fix a breach. We are announcing to the world that, at four days, we have a hole in the wall, and it takes seven days to close a hole––this is the government forcing companies to invite the enemy to come in. That is a stupid regulation.

“We need help on understanding where the government also creates problems, so I would appreciate anything that comes to mind. One of the initiatives here, we talked about cyber workforce; one of the other initiatives is the synchronization of the regulations that are out there, making sure we are not duplicative and we aren’t contradictory, because as I understand there are some regulations that are.

“If we are causing you to have duplicative effort, that is money that could be spent on real cybersecurity. In this partnership, we need communication, not just on the issues that were brought up here––the breach that was identified––but on how we make things better and work better in how we regulate and create compliance requirements.”

COMMENTS

  1. Don't talk about hypotheses as being "either confirmed, partially

    Kevin Lewis points us to this article by Paige Shaffer et al., "Gambling Research and Funding Biases," which reports, "Gambling industry funded studies were no more likely than studies not funded by the gambling industry to report either confirmed, partially confirmed, or rejected hypotheses." The paradox is that this particular study was itself funded by the gambling industry!

  2. What Is a Confirmed Hypothesis?

    A hypothesis is a provisional idea or explanation requiring evaluation. It is a key component of the scientific method. Every scientific study, whether experimental or descriptive, begins with a hypothesis that the study is designed to test -- that is, depending on the results of the study, the hypothesis will be either confirmed or disconfirmed.

  3. Hypothesis Testing

    Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of the same steps: state your null and alternate hypotheses, collect data, perform a statistical test, and present the findings in your results and discussion section.
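
    As an illustration, here is a minimal Python sketch of those steps using SciPy; the group labels, sample values, and the 0.05 significance threshold are illustrative assumptions, not part of the excerpt above.

        # Minimal sketch of the hypothesis-testing steps above (hypothetical data).
        from scipy import stats

        # Step 1: State the hypotheses.
        # H0 (null): the two groups have the same mean score.
        # H1 (alternate): the group means differ.

        # Step 2: Collect data (made-up scores for two small groups).
        group_a = [88, 92, 85, 91, 87, 90]
        group_b = [78, 82, 80, 85, 79, 83]

        # Step 3: Perform a statistical test (two-sample t-test).
        t_stat, p_value = stats.ttest_ind(group_a, group_b)

        # Present the finding against a conventional 0.05 threshold.
        alpha = 0.05
        print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
        if p_value < alpha:
            print("Reject H0: the data support a difference between groups.")
        else:
            print("Fail to reject H0: the data do not confirm a difference.")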

  4. Research Hypothesis In Psychology: Types, & Examples

    A research hypothesis (plural: "hypotheses") is a specific, testable prediction about the anticipated results of a study, established at its outset. It is a key component of the scientific method. Hypotheses connect theory to data and guide the research process toward expanding scientific understanding.

  5. Confirmation

    Human cognition and behavior rely heavily on the notion that evidence (data, premises) can affect the credibility of hypotheses (theories, conclusions). This general idea seems to underlie sound and effective inferential practices in all sorts of domains, from everyday reasoning up to the frontiers of science.

  6. Hypothesis: Definition, Examples, and Types

    A hypothesis is a tentative statement about the relationship between two or more variables. It is a specific, testable prediction about what you expect to happen in a study. It is a preliminary answer to your question that helps guide the research process. Consider a study designed to examine the relationship between sleep deprivation and test ...
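
    On the assumption that the truncated example above concerns sleep deprivation and test scores, here is a small illustrative sketch of how such a hypothesis could be checked with a correlation test; every number below is made up.

        # Hypothetical sketch: does sleep deprivation predict lower test scores?
        from scipy import stats

        # Hours of sleep lost before the test, one value per participant (made up).
        sleep_lost = [0, 1, 2, 3, 4, 5, 6, 7]
        # Corresponding test scores (made up).
        test_score = [92, 90, 88, 85, 84, 80, 76, 73]

        # The hypothesis predicts a negative correlation between the two variables.
        r, p_value = stats.pearsonr(sleep_lost, test_score)
        print(f"r = {r:.2f}, p = {p_value:.4f}")
        # A strongly negative r with a small p-value would support the hypothesis;
        # anything else would fail to confirm it.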

  7. What is a scientific hypothesis?

    A scientific hypothesis is a tentative, testable explanation for a phenomenon in the natural world. It's the initial building block in the scientific method. Many describe it as an "educated guess ...

  8. How to Write a Strong Hypothesis

    Step 1: Ask a question. Writing a hypothesis begins with a research question that you want to answer. The question should be focused, specific, and researchable within the constraints of your project.

  9. The scientific method (article)

    The scientific method. At the core of biology and other sciences lies a problem-solving approach called the scientific method. The scientific method has five basic steps, plus one feedback step: Make an observation. Ask a question. Form a hypothesis, or testable explanation. Make a prediction based on the hypothesis.

  10. Confirmation and Induction

    The term "confirmation" is used in epistemology and the philosophy of science whenever observational data and evidence "speak in favor of" or support scientific theories and everyday hypotheses. Historically, confirmation has been closely related to the problem of induction, the question of what to believe ...
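
    One standard way to make "evidence speaks in favor of a hypothesis" precise is Bayesian: evidence E confirms hypothesis H exactly when P(H|E) > P(H). Below is a minimal sketch of that update, with all probabilities chosen purely for illustration.

        # Bayesian confirmation sketch: E confirms H when P(H | E) > P(H).
        # All numbers are illustrative assumptions, not from the excerpt above.
        p_h = 0.30              # prior probability of hypothesis H
        p_e_given_h = 0.80      # probability of evidence E if H is true
        p_e_given_not_h = 0.20  # probability of E if H is false

        # Law of total probability: overall probability of observing E.
        p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

        # Bayes' rule: posterior probability of H given E.
        p_h_given_e = p_e_given_h * p_h / p_e

        print(f"P(H) = {p_h:.2f}, P(H|E) = {p_h_given_e:.2f}")  # 0.30 -> ~0.63
        # Since P(H|E) > P(H), observing E raises the credibility of H,
        # i.e., E confirms H in the Bayesian sense.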

  11. What Is A Research Hypothesis? A Simple Definition

    A research hypothesis (also called a scientific hypothesis) is a statement about the expected outcome of a study (for example, a dissertation or thesis). To constitute a quality hypothesis, the statement needs to have three attributes - specificity, clarity and testability. Let's take a look at these more closely.

  12. Hypothesis

    [Image: The hypothesis of Andreas Cellarius, showing the planetary motions in eccentric and epicyclical orbits.] A hypothesis (pl.: hypotheses) is a proposed explanation for a phenomenon. For a hypothesis to be a scientific hypothesis, the scientific method requires that one can test it. Scientists generally base scientific hypotheses on previous observations that cannot satisfactorily be explained with ...

  13. Scientific hypothesis

    A scientific hypothesis is an idea that proposes a tentative explanation about a phenomenon or a narrow set of phenomena observed in the natural world. The two primary features of a scientific hypothesis are falsifiability and testability, which are reflected in an "If…then" statement summarizing the idea and in the ...

  14. What Is the Next Step if an Experiment Fails to Confirm Your Hypothesis

    The scientific method goes through a progression of steps to create new knowledge. You make an observation, have a question about your observation, form a hypothesis about how it works, test your hypothesis and then form new questions and hypotheses based on the results. The results of your experiment represent a single iteration of the ...

  15. What a Hypothesis Is and How to Formulate One

    A hypothesis is a prediction of what will be found at the outcome of a research project and is typically focused on the relationship between two different variables studied in the research. It is usually based on both theoretical expectations about how things work and already existing scientific evidence. Within social science, a hypothesis can ...

  16. What is a hypothesis?

    A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question. A hypothesis is not just a guess — it should be based on ...

  17. What Is a Hypothesis? The Scientific Method

    A hypothesis (plural hypotheses) is a proposed explanation for an observation. The definition depends on the subject. In science, a hypothesis is part of the scientific method. It is a prediction or explanation that is tested by an experiment. Observations and experiments may disprove a scientific hypothesis, but can never entirely prove one.
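
    The asymmetry in that last sentence can be made concrete with a toy check; the "all swans are white" hypothesis and the observation list below are illustrative stand-ins, not from the excerpt.

        # Falsification asymmetry sketch (illustrative data).
        # Hypothesis H: "All swans are white."
        observed_swans = ["white", "white", "white", "white"]  # made-up observations

        if any(color != "white" for color in observed_swans):
            print("H is falsified: a single non-white swan disproves it.")
        else:
            # Every observation so far is consistent with H, but H is not proven:
            # the very next swan observed could still be black.
            print("H is supported by all observations so far, but not proven.")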

  18. What Is a Hypothesis and How Do I Write One?

    Merriam-Webster defines a hypothesis as "an assumption or concession made for the sake of argument." In other words, a hypothesis is an educated guess. Scientists make a reasonable assumption--or a hypothesis--then design an experiment to test whether it's true or not.

  19. Confirmation bias

    Confirmation bias also surfaces in people's tendency to look for positive instances. When seeking information to support their hypotheses or expectations, people tend to look for positive evidence that confirms that a hypothesis is true rather than information that would prove the view is false (if it is false). Confirmation bias also operates in impression formation.

  20. Confirmation is Key to Reducing Risk and Increasing Accuracy

    Confirmation is a key tool in Lean Six Sigma because it helps you to verify whether or not the results you are getting are accurate. It provides evidence that your assumptions and decisions (or hypotheses) are correct. There are three benefits to confirmation: 1. Confirmation allows you to verify the accuracy of your process, which can help ...

  21. Hypothesis Definition & Meaning

    hypothesis: [noun] an assumption or concession made for the sake of argument. an interpretation of a practical situation or condition taken as the ground for action.

  22. hypothesis noun

    a speculative hypothesis concerning the nature of matter; an interesting hypothesis about the development of language; Advances in genetics seem to confirm these hypotheses. His hypothesis about what dreams mean provoked a lot of debate. Research supports the hypothesis that language skills are centred in the left side of the brain.

  23. Confirm Definition & Meaning

    The meaning of CONFIRM is to give approval to : ratify. How to use confirm in a sentence. Synonym Discussion of Confirm.
