The term natural experiment is used instead when a study takes advantage of an “exogenous assignment” mechanism, such as an error in implementation (as in the case of Morris et al.), rather than explicit allocation by an experimenter or other decision maker who may be able to bias decisions about recruitment/participation.
Quasi-experimental methods are used increasingly to evaluate programs in health systems research. Gaarder et al. [11], Baird et al. [12], and Kabeer and Waddington [13] have published reviews incorporating quasi-experimental studies on conditional cash transfer (CCT) programs, which make welfare benefits conditional upon beneficiaries taking specified actions like attending a health facility during the pre/post-natal period or enrolling children in school. Other reviews including quasi-experimental studies have evaluated health insurance schemes [14], [15] and maternal and child health programs [16]. Other papers in this themed issue of the Journal of Clinical Epidemiology describe how quasi-experimental studies can be identified for evidence synthesis [17], how data are best collected from quasi-experimental studies [18], and how the global capacity for including quasi-experimental studies in evidence synthesis can best be expanded [19], [20]. In this paper, we use studies from the reviews on the effects of CCT programs to illustrate the wide range of quasi-experimental methods used to quantify causal effects of the programs (Table 1).
Experimental and quasi-experimental approaches applied in studies evaluating the effects of conditional cash transfer (CCT) programs
Study design label | Method of analysis | CCT program example |
---|---|---|
Randomized assignment | Bivariate (means comparison), multivariable regression | PROGRESA, Mexico |
Regression discontinuity design | Regression analysis | Programme of Advancement Through Health and Education (PATH), Jamaica |
 | Instrumental variables regression (“fuzzy” discontinuity) | Bono de Desarrollo Humano (BDH), Ecuador |
Natural experiment | Instrumental variables (e.g., two-stage least squares) regression analysis | Bolsa Alimentação, Brazil |
Interrupted time series | Time-series regression analysis | Safe Delivery Incentive Programme (SDIP), Nepal |
Difference study | Difference-in-differences (DID) regression analysis | Familias en Accion, Colombia |
 | Triple differences (DDD) regression analysis | Cambodia Education Sector Support Project (CESSP) |
Cohort study | Propensity score matching (PSM), retrospective cohort | Tekoporã, Paraguay |
Cross-sectional study | Propensity score matching (PSM), regression analysis | Bolsa Familia, Brazil |
Some of the earliest CCT programs randomly assigned clusters (communities of households) and used longitudinal household survey data collected by researchers to estimate the effects of CCTs on the health of both adults and children [21]. The design and analysis of a cluster-randomized controlled trial of this kind is familiar to health care researchers [29].
In other cases, it was not possible to assign beneficiaries randomly. In Jamaica's PATH program [22], benefits were allocated to people with scores below a criterion level on a multidimensional deprivation index, and the effects of the program were estimated using a regression discontinuity analysis. This study involved recruiting a cohort of participants being considered for benefits, to whom a policy decision was applied (i.e., benefits were assigned or not on the basis of the specified deprivation threshold). In such studies, because the intervention is assigned on the basis of a cutoff value for a covariate, the assignment mechanism (usually correlated with the outcome of interest) is completely known and can provide a strong basis for inference, although usually less efficiently than in randomized controlled trials (RCTs). The treatment effect is estimated as the difference ("discontinuity") between two predictions of the outcome based on the covariate (the average treatment effect at the cutoff): one for individuals just above the covariate cutoff (control group) and one for individuals just below the cutoff (intervention group) [30]. The covariate is often a test score (e.g., to decide who receives a health or education intervention) [31] but can also be distance from a geographic boundary [32]. Challenges of this design include assignment that is determined approximately, but not perfectly, by the cutoff [33], and circumstances in which participants can influence the factors that determine their assignment status, such as their score or location.
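To make the mechanics concrete, the sketch below estimates a sharp discontinuity from simulated data using a local linear regression with separate slopes on either side of the cutoff. The variable names, cutoff, bandwidth, and effect size are all hypothetical and are not taken from the PATH evaluation; a real analysis would also involve bandwidth selection and checks for manipulation of the forcing variable.

```python
# Sharp regression discontinuity: illustrative sketch with simulated data.
# Benefits are assumed to go to those who score below a cutoff on a deprivation
# index (as in the PATH example); all names and numbers here are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
score = rng.uniform(0, 100, n)            # deprivation index (the "forcing" variable)
cutoff = 50.0
treated = (score < cutoff).astype(int)    # benefits assigned below the cutoff
# Outcome depends smoothly on the score, plus a true treatment effect of 4.0.
outcome = 20 + 0.3 * score + 4.0 * treated + rng.normal(0, 3, n)

df = pd.DataFrame({"outcome": outcome, "score": score, "treated": treated})
df["centered"] = df["score"] - cutoff

# Local linear regression within a bandwidth around the cutoff, allowing different
# slopes on each side; the coefficient on `treated` estimates the average
# treatment effect at the cutoff (the "discontinuity").
bandwidth = 15.0
local = df[df["centered"].abs() <= bandwidth]
model = smf.ols("outcome ~ treated + centered + treated:centered", data=local).fit()
print(model.params["treated"], model.conf_int().loc["treated"].values)
```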
As with health care evaluation, many studies in health systems research combine multiple methods. In Ecuador's Bono de Desarrollo Humano program, leakages in implementation caused ineligible families to receive the program, compromising the original discontinuity assignment. To compensate for this problem, the effects of the program were estimated as a "fuzzy discontinuity" using instrumental variable estimation (IVE) [23]. An instrument (in this case, a dichotomous variable taking the value 1 or 0 depending on whether the participating family scored below or above the cutoff on the proxy means test used to determine eligibility for the program) must be associated with the assignment of interest, unrelated to potential confounding factors, and related to the outcome of interest only by virtue of its relationship with the assignment of interest (and not, e.g., through eligibility for another program that may affect the outcome of interest). If these conditions hold, then an unbiased effect of assignment can be estimated using two-stage regression methods [10]. The challenge lies not in the analysis itself (although such analyses are typically inefficient) but in demonstrating that the conditions for a good instrument are met.
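The sketch below illustrates the two-stage logic on simulated data, using an indicator for scoring below the eligibility cutoff as the instrument for actual receipt. It is a minimal illustration, not the published BDH analysis: all variable names and values are invented, and the second stage is run as an ordinary regression on predicted receipt, so in practice a dedicated IV routine would be used to obtain correct standard errors.

```python
# "Fuzzy" discontinuity estimated by instrumental variables: illustrative sketch.
# The instrument z indicates scoring below the eligibility cutoff; actual receipt d
# follows z imperfectly (leakage) and also depends on an unobserved confounder.
# All data are simulated and the variable names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 5000
score = rng.uniform(0, 100, n)
z = (score < 50).astype(int)                     # eligibility by the cutoff rule
unobserved = rng.normal(0, 1, n)                 # confounds receipt and outcome
p_receive = 0.10 + 0.70 * z + 0.15 * (unobserved > 0)
d = (rng.uniform(size=n) < p_receive).astype(int)
y = 10 + 2.0 * d + 0.1 * score + unobserved + rng.normal(0, 1, n)
df = pd.DataFrame({"y": y, "d": d, "z": z, "score": score})

# Stage 1: predict receipt from the instrument, controlling for the forcing variable.
stage1 = smf.ols("d ~ z + score", data=df).fit()
df["d_hat"] = stage1.fittedvalues

# Stage 2: regress the outcome on predicted receipt. The coefficient on d_hat is the
# two-stage least squares point estimate of the program effect. (The second-stage
# standard errors shown here are not corrected; use a proper IV routine in practice.)
stage2 = smf.ols("y ~ d_hat + score", data=df).fit()
print(stage2.params["d_hat"])
```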
In the case of Bolsa Alimentação in Brazil, a computer error led eligible participants whose names contained nonstandard alphabetical characters to be excluded from the program. Because there is no reason to believe that these individuals would have had systematically different characteristics from others, the exclusion of these individuals was considered "as good as random" (i.e., a true natural experiment based on quasi-random assignment) [9].
Comparatively few studies in this review used interrupted time series (ITS) estimation, and we are not aware of any studies in this literature that have been able to draw on sufficiently long time series with longitudinal data for individual units of observation for the design to qualify as "as good as randomized." An evaluation of Nepal's Safe Delivery Incentive Programme (SDIP) drew on multiple cohorts of eligible households before and after implementation over a 7-year period [24]. The outcome (neonatal mortality) for each household was available at points in time that could be related to the inception of the program. Unfortunately, comparison group data were not available for nonparticipants, so an analysis of secular trends due to general improvements in maternal and child health care (i.e., not due to SDIP) was not possible. However, the authors were able to implement a regression "placebo test" (sometimes called a "negative control"), in which SDIP treatment was linked to an outcome (use of antenatal care) that was not expected to be affected by the program; the rationale was that the lack of an estimated spike in antenatal care at the time of the expected change in mortality might suggest that other confounding factors were not at play. Ultimately, because comparison group data were lacking, the authors themselves note that the study is only able to provide "plausible evidence of an impact" rather than probabilistic evidence (p. 224).
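A minimal sketch of a segmented (interrupted time series) regression, including a placebo outcome, is shown below. The monthly series, outcome names, and effect sizes are simulated and purely illustrative; they are not the SDIP data, which were analyzed at the household level.

```python
# Interrupted time series via segmented regression: illustrative sketch with
# simulated monthly data. Variable names and the placebo outcome are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
months = np.arange(84)                     # 7 years of monthly observations
post = (months >= 48).astype(int)          # program starts at month 48
time_since = np.where(post == 1, months - 48, 0)

# Outcome with a pre-existing downward trend plus a level drop after the program.
mortality = 30 - 0.05 * months - 3.0 * post - 0.02 * time_since + rng.normal(0, 1, 84)
# Placebo ("negative control") outcome that the program should not affect.
antenatal = 55 + 0.10 * months + rng.normal(0, 2, 84)

df = pd.DataFrame({"months": months, "post": post, "time_since": time_since,
                   "mortality": mortality, "antenatal": antenatal})

# Segmented regression: baseline trend, level change at the interruption, and change
# in trend afterwards. Newey-West (HAC) standard errors allow for autocorrelation.
its = smf.ols("mortality ~ months + post + time_since", data=df).fit(
    cov_type="HAC", cov_kwds={"maxlags": 12})
print(its.params[["post", "time_since"]])

# Placebo test: the same model on an outcome the program should not affect;
# a null "post" coefficient supports (but does not prove) the causal interpretation.
placebo = smf.ols("antenatal ~ months + post + time_since", data=df).fit()
print(placebo.params["post"])
```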
Individual-level DID analyses use participant-level panel data (i.e., information collected in a consistent manner over time for a defined cohort of individuals). The Familias en Accion program in Colombia was evaluated using a DID analysis in which eligible and ineligible administrative clusters were first matched using propensity scores. The effect of the intervention was estimated as the difference between groups of clusters that were or were not eligible for the intervention, taking into account the propensity scores on which they were matched [25]. DID analysis is only credible when unobservable factors that determine outcomes can be expected to affect both groups equally over time (the "common trends" assumption). In the absence of common trends across groups, the growth in the outcome cannot be attributed to the program using DID analysis. The problem is that we rarely have multiple periods of baseline data with which to compare variation in outcomes between groups before implementation, so the assumption is not usually verifiable. In such cases, placebo tests on outcomes that are related to possible confounders, but not to the program of interest, can be investigated (see also above). Where multiple periods of baseline data are available, it may be possible to test for common trends directly and, where common trends in outcome levels are not supported, to undertake a "difference-in-difference-in-differences" (DDD) analysis. In Cambodia, the evaluators used DDD analysis to evaluate the Cambodia Education Sector Support Project, overcoming the observed lack of common trends in preprogram outcomes between beneficiaries and nonbeneficiaries [26].
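The sketch below shows the basic DID estimator on a simulated two-period panel: the coefficient on the group-by-period interaction is the DID estimate. All names and values are hypothetical, and the sketch omits the propensity score matching step used in the Familias en Accion evaluation.

```python
# Difference-in-differences: illustrative sketch with a simulated two-period panel.
# Names and values are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 1500
unit = np.arange(n)
treated_group = (rng.uniform(size=n) < 0.5).astype(int)   # eligible vs ineligible
unit_effect = rng.normal(0, 2, n) + 1.5 * treated_group   # time-invariant differences

rows = []
for period in (0, 1):                                     # before / after
    effect = 2.0 * treated_group * period                 # true program effect = 2.0
    y = 10 + unit_effect + 1.0 * period + effect + rng.normal(0, 1, n)
    rows.append(pd.DataFrame({"unit": unit, "period": period,
                              "treated_group": treated_group, "y": y}))
panel = pd.concat(rows, ignore_index=True)

# The interaction term is the DID estimate; clustering standard errors by unit
# accounts for repeated observations on the same individuals.
did = smf.ols("y ~ treated_group * period", data=panel).fit(
    cov_type="cluster", cov_kwds={"groups": panel["unit"]})
print(did.params["treated_group:period"])
```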
As in the case of Attanasio et al. above [25], difference studies are usually made more credible when combined with methods of statistical matching, because such studies are then restricted to (or weighted by) individuals and groups with similar probabilities of participation based on observed characteristics (that is, observations "in the region of common support"). However, where panel or multiple time series cohort data are not available, statistical matching methods are often used alone. By contrast with the above examples, a conventional cohort study design was used to evaluate Tekoporã in Paraguay, relying on PSM and propensity-weighted regression analysis of beneficiaries and nonbeneficiaries at entry into the cohort to control for confounding [27]. Similarly, for Bolsa Familia in Brazil, evaluators applied PSM to cross-sectional (census) data [28]. Variables used to match observations in the treatment and comparison groups should not be determined by program participation and are therefore best collected at baseline. However, this type of analysis alone does not satisfy the criterion of enabling adjustment for unobservable sources of confounding, because it cannot rule out confounding of health outcomes by unmeasured factors, even when participants are well characterized at baseline.
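The sketch below illustrates a propensity score workflow on simulated data: a logit model for participation, restriction to the region of common support, inverse-probability weighting, and simple one-to-one nearest-neighbour matching. Covariate names and effect sizes are hypothetical and far simpler than those used in the Tekoporã or Bolsa Familia evaluations.

```python
# Propensity score analysis: illustrative sketch with simulated cross-sectional data.
# Covariate names are hypothetical; participation depends only on observed covariates.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 4000
age = rng.normal(35, 10, n)
income = rng.normal(200, 50, n)
p_true = 1 / (1 + np.exp(-(-2 + 0.03 * age - 0.01 * (income - 200))))
d = (rng.uniform(size=n) < p_true).astype(int)
y = 5 + 1.5 * d + 0.05 * age - 0.01 * income + rng.normal(0, 1, n)
df = pd.DataFrame({"y": y, "d": d, "age": age, "income": income})

# Step 1: estimate propensity scores with a logit model.
ps_model = smf.logit("d ~ age + income", data=df).fit(disp=0)
df["ps"] = ps_model.predict(df)

# Step 2a: inverse-probability weighting (propensity-weighted regression),
# restricted to the region of common support; estimates an average treatment effect.
support = df[(df["ps"] > 0.05) & (df["ps"] < 0.95)].copy()
support["w"] = np.where(support["d"] == 1, 1 / support["ps"], 1 / (1 - support["ps"]))
ipw = smf.wls("y ~ d", data=support, weights=support["w"]).fit()
print(ipw.params["d"])

# Step 2b: 1:1 nearest-neighbour matching on the propensity score (with replacement);
# estimates the effect of treatment in the treated.
treated = support[support["d"] == 1]
control = support[support["d"] == 0].reset_index(drop=True)
idx = np.abs(control["ps"].values[None, :] - treated["ps"].values[:, None]).argmin(axis=1)
print((treated["y"].values - control["y"].values[idx]).mean())
```

As the main text notes, this kind of analysis adjusts only for observed covariates; unmeasured confounding remains possible however well the propensity scores balance the measured characteristics.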
The term "quasi-experimental" is also used by health care evaluation and social science researchers to describe studies in which assignment is nonrandom and influenced by the researchers. At first appearance, many of the designs seem similar, although they are often labeled differently. Although an assignment rule may be known, it may not be exploitable in the way described above for health system evaluations; for example, quasi-random allocation may be biased by a lack of concealment, even when the allocation rule is "as good as random."
Researchers also use more conventional epidemiological designs, sometimes called observational, that exploit naturally occurring variation. Sometimes, the effects of interventions can be estimated in these cohorts using instrumental variables (e.g., prescribing preference, surgical volume, geographic variation, or distance from a health care facility), quantifying the effects of an intervention in a way that is considered to be unbiased [34], [35], [36]. Instrumental variable estimation using data from a randomized controlled trial to estimate the effect of treatment in the treated, when there is substantial nonadherence to the allocated intervention, is a particular instance of this approach [37], [38].
Nonrandomized study design labels commonly used by health care evaluation researchers include: nonrandomized controlled trial (NRCT), controlled before-and-after study (CBA), interrupted time series (ITS) and controlled interrupted time series (CITS) studies, prospective, retrospective, or historically controlled cohort studies (PCS, RCS, and HCS, respectively), nested case–control study, case–control study, cross-sectional study, and before-after study. Thumbnail sketches of these study designs are given in Box 2. In addition, researchers sometimes report findings for uncontrolled cohorts or individuals ("case" series or reports), which only describe outcomes after an intervention [54]; these are not considered further because such studies do not collect data for an explicit comparator. It should be noted that these sketches are the authors' interpretations of the labels; studies that other researchers describe using these labels may not conform to these descriptions.
Studies are cited which correspond to the way in which we conceive studies described with these labels.
Randomized controlled trial (RCT) | Individual participants, or clusters of participants, are randomly allocated to intervention or comparator. This design is the same as the RCT design described in Box 1. |
Quasi-randomized controlled trial (Q-RCT) | Individual participants, or clusters of participants, are allocated to intervention or comparator in a quasi-random manner. In health care evaluation studies, the allocation rule is often by alternation, day of the week, or odd/even hospital or social security number. The allocation rule may be as good as random but, typically, gives rise to a less credible study (compared to health system studies, where the allocation rule is applied by a higher level decision maker); if allocation is not concealed, research personnel who know the rule can recruit selectively or allocate participants in a biased way. This design is essentially the same as the Q-RCT design described in Box 1 but with different mechanisms for allocation. |
Controlled before-and-after study (CBA) | Study in which outcomes are assessed at two time periods for several clusters (usually geographic). Clusters are classified into intervention and comparator groups. All clusters are studied without the intervention during period 1. Between periods 1 and 2, clusters in the intervention group implement the intervention of interest whereas clusters in the comparator group do not. The outcome for clusters receiving the intervention is compared to the outcome for comparator clusters during period 2, adjusted for the outcomes observed during period 1 (when no clusters had had the intervention). Observations usually represent episodes of care, so may or may not correspond to the same individuals during the two time periods. Data at either an aggregate or individual level can be analyzed. This design has similarities to the DID design described in Box 1. |
Nonrandomized controlled trial (NRCT) | This is usually a prospective cohort study in which allocation to intervention and comparator is not random or quasi-random and is applied by research personnel. The involvement of research personnel in the allocation rule may be difficult to discern; such studies may be labeled observational if the personnel responsible for the allocation rule are not clearly described or some personnel have both health care decision making and researcher roles. Individual-level data are usually analyzed. Note that nonrandom allocation of a health care intervention is often defined in relation to organizational factors (ward, clinic, doctor, provider organization), and the analysis should take account of the data hierarchy if one exists. |
Interrupted time series (ITS) | When used to study health care interventions, observations usually represent episodes of care or events; the cohorts studied may or may not correspond to the same individuals at different time points and are often clustered in organizational units (e.g., a health facility or district). (Such studies may be considered to consist of multiple cross-sectional "snapshots.") The analysis may be aggregated at the level of the clusters or at the level of individual episodes of care. If ITS do not have the benefit of analyzing multiple measurements from the same cohort over time, confounding by secular trends needs to be assessed, for example, with reference to a contemporaneous comparison group (controlled interrupted time series, CITS, below). NB. Entries in Table 2 are for ITS as defined in Box 1; for ITS as defined here, entries for some cells would change. This design is similar to the ITS design described in Box 1. |
Controlled interrupted time series (CITS) | As above for an ITS but with data for a contemporaneous comparison group in which the intervention was not implemented. Measurements for the comparison group should be collected using the same methods. This design is similar to the CITS design described in Box 1. |
Concurrently controlled prospective cohort study (PCS) | A cohort study in which subjects are identified prospectively and classified as having received the intervention or comparator of interest on the basis of the prospectively collected information. Data for individuals are usually analyzed. However, it is important to note that nonrandom receipt of a health care intervention is almost always defined in relation to organizational factors (ward, clinic, doctor, provider organization), and the analysis should take into account the data hierarchy. This is equivalent to a "pipeline design" used in health systems program evaluation. It is very similar to an NRCT, except with respect to the method of allocation. |
Concurrently controlled retrospective cohort study (RCS) | A cohort study in which subjects are identified from historic records and classified as having received the intervention or comparator of interest on the basis of the historic information. As for a PCS, data for individuals are usually analyzed, but the analysis should take account of the data hierarchy. |
Historically controlled cohort study (HCS) | This type of cohort study is a combination of an RCS (for one group, usually receiving the comparator) and a PCS (for the second group, usually receiving the intervention). Thus, the comparison between groups is not contemporaneous. The analysis should take into account the data hierarchy. |
Case–control study (CC) | Consecutive individuals experiencing an outcome of interest are identified, preferably prospectively, from within a defined population (but for whom relevant data have not been collected) and form a group of "cases." Individuals, sometimes matched to the cases, who did not experience the outcome of interest are also identified from within the defined population and form the group of "controls." Data characterizing the intervention or comparator received in the past are collected retrospectively from existing records or by interviewing participants. The receipt of the intervention or comparator of interest is compared among cases and controls. If applicable, the analysis should take into account the data hierarchy. |
Nested case–control study (NCC) | Individuals experiencing an outcome of interest are identified from within a defined cohort (for which some data have already been collected) and form a group of "cases." Individuals, often matched to the cases, who did not experience the outcome of interest are also identified from within the defined cohort and form the group of "controls." Additional data required for the study, characterizing the intervention or comparator received in the past, are collected retrospectively from existing records or by interviewing participants. The receipt of the intervention or comparator of interest is compared among cases and controls. If applicable, the analysis should take into account the data hierarchy. |
Before-after study (BA) | As for CBA but without data for a control group of clusters. An uncontrolled comparison is made between frequencies of outcomes at the two time points. This term may also be applied to a study in which a cohort of individuals have the outcome (e.g., function, symptoms, or quality of life) measured before an intervention and after the intervention. This type of study comprises a single "exposed" cohort (often called a "case series"), with the outcome measured before and after exposure. If applicable, the analysis should take into account the data hierarchy. |
Cross-sectional study (XS) | The feature of this study design is that the information required to classify individuals according to receipt of the intervention or comparator of interest and according to outcome is collected at the same time, sometimes preventing researchers from knowing whether the intervention preceded the outcome. In cross-sectional studies of health interventions, despite data about the intervention/comparator and outcome being collected at one point in time, the nature of the intervention and outcome may allow one to be confident about whether the intervention preceded the outcome. This design is similar to the XS design described in Box 1. |
The designs can have diverse features, despite having the same label. Particular features are often chosen to address the logistical challenges of evaluating particular research questions and settings. Therefore, it is not possible to illustrate them with examples drawn from a single review as in part 1; instead, studies exemplifying each design are cited across a wide range of research questions and settings. The converse also occurs, that is, study design labels are often inconsistently applied. This can present great difficulties when trying to classify studies, for example, to describe eligibility for inclusion in a review. Relying on the study design labels used by primary researchers themselves to describe their studies can lead to serious misclassifications.
For some generic study designs, there are distinct study types. For example, a cohort study can study intervention and comparator groups concurrently, with information about the intervention and comparator collected prospectively (PCS) or retrospectively (RCS), or study one group retrospectively and the other group prospectively (HCS). These different kinds of cohort study are conventionally distinguished according to the time when intervention and comparator groups are formed, in relation to the conception of the study. Some studies are termed PCS when data are collected prospectively, for example, for a clinical database, but the definitions of intervention and comparator required for the evaluation are applied retrospectively; in our view, such studies should be classified as RCS.
Some of the study designs described in parts 1 and 2 may seem similar, for example, DID and CBA, although they are labeled differently. Some other study design labels, for example, CITS/ITS, are used in both types of literature. In our view, these labels obscure some of the detailed features of the study designs that affect the robustness of causal attribution. Therefore, we have extended the checklist of features to highlight these differences. Where researchers use the same label to describe studies with subtly different features, we do not intend to imply that one or other use is incorrect; we merely wish to point out that studies referred to by the same labels may differ in ways that affect the robustness of an inference about the causal effect of the intervention of interest.
The checklist now includes seven questions (Table 2). The table also sets out our responses for the range of study designs described in Box 1 and Box 2. The response "possibly" (P) is prevalent in the table, even given the descriptions in these boxes. We regard this as evidence of the ambiguity/inadequate specificity of the study design labels.
Quasi-experimental taxonomy features checklist
 | RCT | Q-RCT | IV | RD | CITS | ITS | DID | CBA | NRCT | PCS | RCS | HCT | NCC | CC | XS | BA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1. Was the intervention/comparator: (answer “yes” to more than 1 item, if applicable) | ||||||||||||||||
Allocated to (provided for/administered to/chosen by) individuals? | P | P | Y | Y | P | P | P | P | P | P | P | P | Y | Y | P | P |
Allocated to (provided for/administered to/chosen by) clusters of individuals? | P | P | N | N | P | P | P | P | P | P | P | P | N | N | P | P |
Clustered in the way it was provided (by practitioner or organizational unit)? | P | P | P | P | P | P | P | P | P | P | P | P | P | P | P | P |
2. Were outcome data available: (answer “yes” to only 1 item) | ||||||||||||||||
After intervention/comparator only (same individuals)? | P | P | P | P | N | N | N | N | P | P | P | P | Y | Y | Y | N |
After intervention/comparator only (not all same individuals)? | N | N | N | N | P | P | N | P | P | P | P | P | N | N | N | P |
Before (once) AND after intervention/comparator (same individuals)? | P | P | P | P | N | N | N | P | P | P | P | P | N | N | P | Y |
Before (once) AND after intervention/comparator (not all same individuals)? | N | N | N | N | P | P | P | P | P | P | P | P | N | N | N | P |
Multiple times before AND multiple times after intervention/comparator (same individuals)? | P | P | P | P | P | P | P | P | P | P | P | P | N | N | P | P |
Multiple times before AND multiple times after intervention/comparator (not all same individuals)? | N | N | N | N | P | P | P | P | N | N | N | N | N | N | N | N |
3. Was the intervention effect estimated by: (answer “yes” to only one item) | ||||||||||||||||
Change over time (same individuals at different time points)? | N | N | N | N | N | Y | N | N | N | N | N | N | N | N | N | P |
Change over time (not all same individuals at different time points)? | N | N | N | N | N | Y | N | N | N | N | N | N | N | N | N | P |
Difference between groups (of individuals or clusters receiving either intervention or comparator)? | Y | Y | Y | Y | Y | N | Y | Y | Y | Y | Y | Y | Y | Y | Y | N |
4. Did the researchers aim to control for confounding (design or analysis) (answer “yes” to only one item) | ||||||||||||||||
Using methods that control in principle for any confounding? | Y | Y | Y | Y | Y | Y | N | N | N | N | N | N | N | N | N | N |
Using methods that control in principle for time-invariant unobserved confounding? | N | N | N | N | N | N | Y | Y | N | N | N | N | N | N | N | N |
Using methods that control only for confounding by observed covariates? | P | P | P | P | P | P | P | P | Y | Y | Y | Y | Y | Y | Y | N |
5. Were groups of individuals or clusters formed by (answer “yes” to more than one item, if applicable) | ||||||||||||||||
Randomization? | Y | N | N | N | N | na | N | N | N | N | N | N | N | N | N | na |
Quasi-randomization? | N | Y | N | N | N | na | N | N | N | N | N | N | N | N | N | na |
Explicit rule for allocation based on a threshold for a variable measured on a continuous or ordinal scale or boundary (in conjunction with identifying the variable dimension, below)? | N | N | Y | Y | N | na | N | N | N | N | N | N | N | N | N | na |
Some other action of researchers? | N | N | P | P | P | na | N | N | Y | P | P | P | N | N | N | na |
Time differences? | N | N | N | N | Y | na | N | N | N | N | N | Y | N | N | N | na |
Location differences? | N | N | P | P | P | na | P | P | P | P | P | P | N | N | P | na |
Health care decision makers/practitioners? | N | N | P | P | P | na | P | P | P | P | P | P | N | N | P | na |
Participants' preferences? | N | N | P | N | N | na | P | P | P | P | P | P | N | N | P | na |
Policy maker | N | N | P | P | P | na | P | P | P | P | P | P | N | N | P | na |
On the basis of outcome? | N | N | N | N | N | na | N | N | N | N | N | N | Y | Y | N | na |
Some other process? (specify) | N | N | P | P | P | na | P | P | P | P | P | P | N | N | P | na |
6. Were the following features of the study carried out after the study was designed (answer “yes” to more than one item, if applicable) | ||||||||||||||||
Characterization of individuals/clusters before intervention? | Y | Y | P | P | P | P | P | P | Y | Y | P | P | N | N | N | P |
Actions/choices leading to an individual/cluster becoming a member of a group? | Y | Y | P | P | P | na | P | P | Y | Y | P | P | N | N | N | na |
Assessment of outcomes? | Y | Y | P | P | P | P | P | P | Y | Y | P | P | P | P | N | P |
7. Were the following variables measured before intervention: (answer “yes” to more than one item, if applicable) | ||||||||||||||||
Potential confounders? | P | P | P | P | P | N | P | P | P | P | P | P | P | P | N | N |
Outcome variable(s)? | P | P | P | P | Y | Y | Y | Y | P | P | P | P | N | N | N | P |
Abbreviations: RCT, randomized controlled trial; Q-RCT, quasi-randomized controlled trial; IV, instrumental variable; RD, regression discontinuity; CITS, controlled interrupted time series; ITS, interrupted time series; DID, difference-in-difference; CBA, controlled before-and-after study; NRCT, nonrandomized controlled trial; PCS, prospective cohort study; RCS, retrospective cohort study; HCT, historically controlled study; NCC, nested case–control study; CC, case–control study; XS, cross-sectional study; BA, before-after study; Y, yes; N, no; P, possibly; na, not applicable.
Cells in the table are completed with respect to the thumbnail sketches of the corresponding designs described in Box 1 , Box 2 .
Question 1 is new and addresses the issue of clustering, either by design or through the organizational structure responsible for delivering the intervention (Box 3). This question avoids the need for separate checklists for designs based on assigning individuals and clusters. A "yes" response can be given to more than one response item; the different types of clustering may both occur in a single study, and implicit clustering can occur in an individually allocated nonrandomized study.
Clustering is a potentially important consideration in both RCTs and nonrandomized studies. Clusters exist when observations are nested within higher level organizational units or structures through which an intervention is implemented or data are collected; typically, observations within a cluster will be more similar with respect to outcomes of interest than observations from different clusters. Clustering is a natural consequence of many methods of nonrandomized assignment/designation because of the way in which many interventions are implemented. Analyses of clustered data that do not take clustering into account will tend to overestimate the precision of effect estimates.
Clustering occurs when implementation of an intervention is explicitly at the level of a cluster/organizational unit (as in a cluster-randomized controlled trial, in which each cluster is explicitly allocated to control or intervention). Clustering can also arise implicitly, from naturally occurring hierarchies in the data set being analyzed, that reflect clusters that are intrinsically involved in the delivery of the intervention or comparator. Both explicit and implicit clustering can be present in a single study.
Question 1 in the checklist distinguishes individual allocation, cluster allocation (explicit clustering), and clustering due to the organizational hierarchy involved in the delivery of the interventions being compared (implicit clustering). Users should respond factually, that is, with respect to the presence of clustering, without making a judgment about the likely importance of clustering (degree of dependence between observations within clusters).
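The following sketch, using simulated data with hypothetical values, illustrates the point about precision: when an intervention is allocated to whole clusters and outcomes are correlated within clusters, the naive standard error is far smaller than the cluster-robust one.

```python
# Illustrative sketch: ignoring clustering overstates precision. Simulated data in
# which an intervention is allocated to whole clusters and outcomes are correlated
# within clusters; names and parameter values are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n_clusters, m = 40, 50                                  # 40 clusters of 50 individuals
cluster = np.repeat(np.arange(n_clusters), m)
treated = np.repeat(np.arange(n_clusters) % 2, m)       # allocation at cluster level
cluster_effect = np.repeat(rng.normal(0, 1.0, n_clusters), m)  # within-cluster correlation
y = 10 + 0.5 * treated + cluster_effect + rng.normal(0, 1.0, n_clusters * m)
df = pd.DataFrame({"y": y, "treated": treated, "cluster": cluster})

naive = smf.ols("y ~ treated", data=df).fit()
robust = smf.ols("y ~ treated", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["cluster"]})

# The cluster-robust standard error is typically much larger than the naive one,
# reflecting an effective sample size closer to the number of clusters.
print(naive.bse["treated"], robust.bse["treated"])
```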
Questions 2–4 are also new, replacing the first question (“Was there a relevant comparison?”) in the original checklist [1] , [2] . These questions are designed to tease apart the nature of the research question and the basis for inferring causality.
Question 2 classifies studies according to the number of times outcome assessments were available. In each case, the response items distinguish whether or not the outcome is assessed in the same or different individuals at different times. Only one response item can be answered “yes.”
Treatment effects can be estimated as changes over time or between groups. Question 3 aims to classify studies according to the parameter being estimated. Response items distinguish changes over time for the same or different individuals. Only one response item can be answered “yes.”
Question 4 asks about the principle through which the primary researchers aimed to control for confounding. Three response items distinguish methods that control, in principle, for any confounding (observed or unobserved); methods that control, in principle, for time-invariant unobserved confounding; and methods that control only for confounding by observed covariates.
The choice between these items (again, only one can be answered “yes”) is key to understanding the basis for inferring causality.
Questions 5–7 are essentially the same as in the original checklist [1] , [2] . Question 5 asks about how groups (of individuals or clusters) were formed because treatment effects are most frequently estimated from between group comparisons. An additional response option, namely by a forcing variable, has been included to identify credible quasi-experimental studies that use an explicit rule for assignment based on a threshold for a variable measured on a continuous or ordinal scale or in relation to a spatial boundary. When answering “yes” to this item, the review author should also identify the nature of the variable by answering “yes” to another item. Possible assignment rules are identified: the action of researchers, time differences, location differences, health care decision makers/practitioners, policy makers, on the basis of the outcome, or some other process. Other, nonexperimental, study designs should be classified by the method of assignment (same list of variables) but without there being an explicit assignment rule.
Question 6 asks about important features of a study in relation to the timing of their implementation. Studies are classified according to whether three key steps were carried out after the study was designed, namely: acquisition of source data to characterize individuals/clusters before intervention; actions or choices leading to an individual or cluster becoming a member of a group; and the assessment of outcomes. One or more of these items can be answered “yes,” as would be the case for all steps in a conventional RCT.
Question 7 asks about the variables that were measured and available to control for confounding in the analysis. The two broad classes of variables that are important are the identification and collection of potential confounder variables and baseline assessment of the outcome variable(s). The answers to this question will be less important if the researchers of the original study used a method to control for any confounding, that is, used a credible quasi-experimental design.
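As a purely hypothetical illustration of how a review author might record responses, the sketch below encodes the checklist answers for a study as a small data structure; the item names used here are our own shorthand and are not labels taken from the published checklist.

```python
# Hypothetical encoding of the seven-question checklist (Table 2) so that review
# authors can record and compare responses across studies. Field names are our own
# shorthand for the checklist items, not labels from the published paper.
from dataclasses import dataclass, field
from typing import Dict

RESPONSES = {"Y", "N", "P", "na"}   # yes / no / possibly / not applicable

@dataclass
class ChecklistEntry:
    study_id: str
    design_label: str                        # e.g., "DID", "CBA", "ITS"
    responses: Dict[str, str] = field(default_factory=dict)

    def answer(self, item: str, response: str) -> None:
        if response not in RESPONSES:
            raise ValueError(f"response must be one of {RESPONSES}")
        self.responses[item] = response

# Example: recording a few question 1 and question 4 items for a hypothetical study.
entry = ChecklistEntry(study_id="example_did_study", design_label="DID")
entry.answer("q1_allocated_to_clusters", "Y")
entry.answer("q1_clustered_by_provider", "P")
entry.answer("q4_controls_time_invariant_unobserved", "Y")
print(entry)
```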
The health care evaluation community has historically been much harder to persuade of the potential value of nonrandomized studies for evaluating interventions. We think that the checklist helps to explain why: when study features are examined carefully, the designs used in health care evaluation do not often control for unobservables. To the extent that these features are immutable, the skepticism is justified. However, to the extent that studies with features that promote the credibility of causal inference may be possible, health care evaluation researchers may be missing an opportunity to provide high-quality evidence.
Reflecting on the circumstances of nonrandomized evaluations of health care and health system interventions may provide some insight into why these different groups have disagreed about the credibility of effects estimated in quasi-experimental studies. The checklist shows that quasi-experimental studies gain credibility from using high-quality longitudinal/panel data; such data characterizing health care are rare, leading to evaluations that "make do" with the data available in existing information systems.
The risk of confounding in health care settings is inherently greater because participants' characteristics are fundamental to choices about interventions in usual care; mitigating this risk requires high-quality clinical data to characterize participants at baseline and, for pharmaco-epidemiological studies of safety, often over time. Important questions about health care for which quasi-experimental methods of evaluation are typically considered often concern the outcomes of discrete episodes of care, usually binary, rather than long-term outcomes for a cohort of individuals; this can lead to a focus on the invariant nature of the organizations providing the care rather than the varying nature of the individuals receiving care. These contrasts are apparent, for example, between DID studies using panel data to evaluate an intervention such as CCTs among individuals and CBA studies of an intervention implemented at an organizational level studying multiple cross-sections of health care episodes, or between credible and less credible interrupted time series.
A recent article in the field of hospital epidemiology also highlights various features of what it terms quasi-experimental designs [56]. The list of features appears to be aimed at researchers designing a quasi-experimental study, acting more as a prompt (e.g., "consider options for …") than as a checklist for a researcher appraising a study in order to communicate clearly to others about the nature of a published study, which is our perspective (e.g., that of a review author). There is some overlap with our checklist, but the list described also includes several study attributes intended to reduce the risk of bias, for example, blinding. By contrast, we consider that an assessment of the risk of bias in a study is essential and needs to be carried out as a separate task.
The primary intention of the checklist is to help review authors to set eligibility criteria for studies to include in a review that relate directly to the intrinsic strength of the studies in inferring causality. The checklist should also illuminate the debate between researchers in different fields about the strength of studies with different features—a debate which has to date been somewhat obscured by the use of different terminology by researchers working in different fields of investigation. Furthermore, where disagreements persist, the checklist should allow researchers to inspect the basis for these differences, for example, the principle through which researchers aimed to control for confounding, and to shift their attention to clarifying the basis for their respective responses to particular items.
Authors' contributions: All three authors collaborated to draw up the extended checklist. G.A.W. prepared the first draft of the paper. H.W. contributed text for Part 1. B.C.R. revised the first draft and created the current structure. All three authors approved submission of the final manuscript.
Funding: B.C.R is supported in part by the U.K. National Institute for Health Research Bristol Cardiovascular Biomedical Research Unit. H.W. is supported by 3ie.
Quasi-experimental design is a research method that seeks to evaluate the causal relationships between variables, but without the full control over the independent variable(s) that is available in a true experimental design.
In a quasi-experimental design, the researcher uses existing groups of participants that are not randomly assigned to experimental and control conditions. Instead, the groups are selected based on pre-existing characteristics or conditions, such as age, gender, or the presence of a certain medical condition.
There are several types of quasi-experimental designs that researchers use to study causal relationships between variables. Here are some of the most common types:
One common type is the nonequivalent groups design, which involves selecting two groups of participants that are similar in every way except for the independent variable(s) the researcher is testing. One group receives the treatment or intervention being studied, while the other group does not. The two groups are then compared to see whether there are any significant differences in the outcomes.
An interrupted time series design involves collecting data on the dependent variable(s) over a period of time, both before and after an intervention or event. The researcher can then determine whether there was a significant change in the dependent variable(s) following the intervention or event.
A one-group pretest-posttest design involves measuring the dependent variable(s) before and after an intervention or event, but without a control group. This design can be useful for determining whether the intervention or event had an effect, but it does not allow for control over other factors that may have influenced the outcomes.
A regression discontinuity design involves selecting participants based on a specific cutoff point on a continuous variable, such as a test score. Participants on either side of the cutoff point are then compared to determine whether the intervention or event had an effect.
A natural experiment involves studying the effects of an intervention or event that occurs naturally, without the researcher's intervention. For example, a researcher might study the effects of a new law or policy that affects certain groups of people. This design is useful when true experiments are not feasible or ethical.
Here are some data analysis methods that are commonly used in quasi-experimental designs:
Descriptive statistics summarize the data collected during a study using measures such as the mean, median, mode, range, and standard deviation. They can help researchers identify trends or patterns in the data, and can also be useful for identifying outliers or anomalies.
Inferential statistics use statistical tests to determine whether the results of a study are statistically significant, helping researchers make generalizations about a population based on the sample data collected during the study. Common statistical tests used in quasi-experimental designs include t-tests, ANOVA, and regression analysis.
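As a minimal illustration of these tests, the sketch below compares outcomes between a nonrandomized intervention group and a comparison group, first with a two-sample t-test and then with a regression that adjusts for an observed baseline covariate; the data and variable names are simulated and hypothetical.

```python
# Minimal illustration of inferential tests for a nonrandomized comparison:
# a two-sample t-test and a covariate-adjusted regression. Data are simulated;
# all names and values are hypothetical.
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 600
group = (rng.uniform(size=n) < 0.5).astype(int)      # 1 = intervention, 0 = comparison
baseline = rng.normal(50, 10, n) + 3 * group         # groups differ at baseline
outcome = 0.5 * baseline + 2.0 * group + rng.normal(0, 5, n)
df = pd.DataFrame({"group": group, "baseline": baseline, "outcome": outcome})

# Unadjusted comparison: two-sample t-test (ignores the baseline imbalance).
t, p = stats.ttest_ind(df.loc[df.group == 1, "outcome"], df.loc[df.group == 0, "outcome"])
print(t, p)

# Adjusted comparison: regression controlling for the observed baseline covariate.
adj = smf.ols("outcome ~ group + baseline", data=df).fit()
print(adj.params["group"], adj.pvalues["group"])
```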
Propensity score matching is used to reduce bias in quasi-experimental designs by matching participants in the intervention group with participants in the control group who have similar characteristics. This can help to reduce the impact of confounding variables that may affect the study's results.
Difference-in-differences analysis is used to compare the change in outcomes between two groups over time. Researchers can use this method to determine whether a particular intervention has had an impact on the target population over time.
Interrupted time series analysis is used to examine the impact of an intervention or treatment over time by comparing data collected before and after the intervention or treatment. This method can help researchers determine whether an intervention had a significant impact on the target population.
Regression discontinuity analysis is used to compare the outcomes of participants who fall on either side of a predetermined cutoff point. This method can help researchers determine whether an intervention had a significant impact on the target population.
The purpose of quasi-experimental design is to investigate the causal relationship between two or more variables when it is not feasible or ethical to conduct a randomized controlled trial (RCT). Quasi-experimental designs attempt to emulate the RCT by constructing intervention and comparison groups that resemble randomized groups as closely as possible.
The key purpose of quasi-experimental design is to evaluate the impact of an intervention, policy, or program on a targeted outcome while controlling for potential confounding factors that may affect the outcome. Quasi-experimental designs aim to answer questions such as: Did the intervention cause the change in the outcome? Would the outcome have changed without the intervention? And was the intervention effective in achieving its intended goals?
Quasi-experimental designs are useful in situations where randomized controlled trials are not feasible or ethical. They provide researchers with an alternative method to evaluate the effectiveness of interventions, policies, and programs in real-life settings. Quasi-experimental designs can also help inform policy and practice by providing valuable insights into the causal relationships between variables.
Overall, the purpose of quasi-experimental design is to provide a rigorous method for evaluating the impact of interventions, policies, and programs while controlling for potential confounding factors that may affect the outcome.
Quasi-experimental designs have several advantages over other research designs: they can be used when random assignment is not feasible or ethical, they can often draw on existing groups and routinely collected data, and they evaluate interventions in real-world settings, which supports the generalizability of findings.
They also have important limitations: without random assignment, intervention and comparison groups may differ systematically at baseline, so selection bias and confounding by unmeasured variables can threaten internal validity, and causal conclusions are correspondingly weaker than those from well-conducted randomized controlled trials.
from Part III - Data Collection
Published online by Cambridge University Press: 25 May 2023
In this chapter, we discuss the logic and practice of quasi-experimentation. Specifically, we describe four quasi-experimental designs – one-group pretest–posttest designs, non-equivalent group designs, regression discontinuity designs, and interrupted time-series designs – and their statistical analyses in detail. Both simple quasi-experimental designs and embellishments of these simple designs are presented. Potential threats to internal validity are illustrated along with means of addressing their potentially biasing effects so that these effects can be minimized. In contrast to quasi-experiments, randomized experiments are often thought to be the gold standard when estimating the effects of treatment interventions. However, circumstances frequently arise where quasi-experiments can usefully supplement randomized experiments or when quasi-experiments can fruitfully be used in place of randomized experiments. Researchers need to appreciate the relative strengths and weaknesses of the various quasi-experiments so they can choose among pre-specified designs or craft their own unique quasi-experiments.