Chapter 3: Defining the criteria for including studies and how they will be grouped for the synthesis

Joanne E McKenzie, Sue E Brennan, Rebecca E Ryan, Hilary J Thomson, Renea V Johnston, James Thomas

Key Points:

  • The scope of a review is defined by the types of population (participants), types of interventions (and comparisons), and the types of outcomes that are of interest. The acronym PICO (population, interventions, comparators and outcomes) helps to serve as a reminder of these.
  • The population, intervention and comparison components of the question, with the additional specification of types of study that will be included, form the basis of the pre-specified eligibility criteria for the review. It is rare to use outcomes as eligibility criteria: studies should be included irrespective of whether they report outcome data, but may legitimately be excluded if they do not measure outcomes of interest, or if they explicitly aim to prevent a particular outcome.
  • Cochrane Reviews should include all outcomes that are likely to be meaningful and not include trivial outcomes. Critical and important outcomes should be limited in number and include adverse as well as beneficial outcomes.
  • Review authors should plan at the protocol stage how the different populations, interventions, outcomes and study designs within the scope of the review will be grouped for analysis.

Cite this chapter as: McKenzie JE, Brennan SE, Ryan RE, Thomson HJ, Johnston RV, Thomas J. Chapter 3: Defining the criteria for including studies and how they will be grouped for the synthesis [last updated August 2023]. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.5. Cochrane, 2024. Available from www.training.cochrane.org/handbook.

3.1 Introduction

One of the features that distinguishes a systematic review from a narrative review is that systematic review authors should pre-specify criteria for including and excluding studies in the review (eligibility criteria, see MECIR Box 3.2.a).

When developing the protocol, one of the first steps is to determine the elements of the review question (including the population, intervention(s), comparator(s) and outcomes, or PICO elements) and how the intervention, in the specified population, produces the expected outcomes (see Chapter 2, Section 2.5.1 and Chapter 17, Section 17.2.1 ). Eligibility criteria are based on the PICO elements of the review question plus a specification of the types of studies that have addressed these questions. The population, interventions and comparators in the review question usually translate directly into eligibility criteria for the review, though this is not always a straightforward process and requires a thoughtful approach, as this chapter shows. Outcomes usually are not part of the criteria for including studies, and a Cochrane Review would typically seek all sufficiently rigorous studies (most commonly randomized trials) of a particular comparison of interventions in a particular population of participants, irrespective of the outcomes measured or reported. It should be noted that some reviews do legitimately restrict eligibility to specific outcomes. For example, the same intervention may be studied in the same population for different purposes; or a review may specifically address the adverse effects of an intervention used for several conditions (see Chapter 19 ).

Eligibility criteria do not exist in isolation, but should be specified with the synthesis of the studies they describe in mind. This will involve making plans for how to group variants of the PICO elements for synthesis. This chapter describes the processes by which the structure of the synthesis can be mapped out at the beginning of the review, and the interplay between the review question, considerations for the analysis and their operationalization in terms of eligibility criteria. Decisions about which studies to include (and exclude), and how they will be combined in the review’s synthesis, should be documented and justified in the review protocol.

A distinction between three different stages in the review at which the PICO construct might be used is helpful for understanding the decisions that need to be made. In Chapter 2, Section 2.3 , we introduced the ideas of a review PICO (on which eligibility of studies is based), the PICO for each synthesis (defining the question that each specific synthesis aims to answer) and the PICO of the included studies (what was actually investigated in the included studies). In this chapter, we focus on the review PICO and the PICO for each synthesis as a basis for specifying which studies should be included in the review and planning its syntheses. These PICOs should relate clearly and directly to the questions or hypotheses that are posed when the review is formulated (see Chapter 2 ) and will involve specifying the population in question, and a set of comparisons between the intervention groups.

An integral part of the process of setting up the review is to specify which characteristics of the interventions (e.g. individual compounds of a drug), populations (e.g. acute and chronic conditions), outcomes (e.g. different depression measurement scales) and study designs, will be grouped together. Such decisions should be made independent of knowing which studies will be included and the methods of synthesis that will be used (e.g. meta-analysis). There may be a need to modify the comparisons and even add new ones at the review stage in light of the data that are collected. For example, important variations in the intervention may be discovered only after data are collected, or modifying the comparison may facilitate the possibility of synthesis when only one or few studies meet the comparison PICO. Planning for the latter scenario at the protocol stage may lead to less post-hoc decision making ( Chapter 2, Section 2.5.3 ) and, of course, any changes made during the conduct of the review should be recorded and documented in the final report.

3.2 Articulating the review and comparison PICO

3.2.1 Defining types of participants: which people and populations?

The criteria for considering types of people included in studies in a review should be sufficiently broad to encompass the likely diversity of studies and the likely scenarios in which the interventions will be used, but sufficiently narrow to ensure that a meaningful answer can be obtained when studies are considered together; they should be specified in advance (see MECIR Box 3.2.a ). As discussed in Chapter 2, Section 2.3.1 , the degree of breadth will vary, depending on the question being asked and the analytical approach to be employed. A range of evidence may inform the choice of population characteristics to examine, including theoretical considerations, evidence from other interventions that have a similar mechanism of action, and in vitro or animal studies. Consideration should be given to whether the population characteristic is at the level of the participant (e.g. age, severity of disease) or the study (e.g. care setting, geographical location), since this has implications for grouping studies and for the method of synthesis ( Chapter 10, Section 10.11.5 ). It is often helpful to consider the types of people that are of interest in three steps.

MECIR Box 3.2.a Relevant expectations for conduct of intervention reviews

Predefining unambiguous criteria for participants

Predefined, unambiguous eligibility criteria are a fundamental prerequisite for a systematic review. The criteria for considering types of people included in studies in a review should be sufficiently broad to encompass the likely diversity of studies, but sufficiently narrow to ensure that a meaningful answer can be obtained when studies are considered in aggregate. Considerations when specifying participants include setting, diagnosis or definition of condition and demographic factors. Any restrictions to study populations must be based on a sound rationale, since it is important that Cochrane Reviews are widely relevant.

Predefining a strategy for studies with a subset of eligible participants

Sometimes a study includes some ‘eligible’ participants and some ‘ineligible’ participants, for example when an age cut-off is used in the review’s eligibility criteria. If data from the eligible participants cannot be retrieved, a mechanism for dealing with this situation should be pre-specified.

First, the diseases or conditions of interest should be defined using explicit criteria for establishing their presence (or absence). Criteria that will force the unnecessary exclusion of studies should be avoided. For example, diagnostic criteria that were developed more recently – which may be viewed as the current gold standard for diagnosing the condition of interest – will not have been used in earlier studies. Expensive or recent diagnostic tests may not be available in many countries or settings, and time-consuming tests may not be practical in routine healthcare settings.

Second, the broad population and setting of interest should be defined. This involves deciding whether a specific population group is within scope, determined by factors such as age, sex, race, educational status or the presence of a particular condition such as angina or shortness of breath. Interest may focus on a particular setting such as a community, hospital, nursing home, chronic care institution, or outpatient setting. Box 3.2.a outlines some factors to consider when developing population criteria.

Whichever criteria are used for defining the population and setting of interest, it is common to encounter studies that only partially overlap with the review’s population. For example, in a review focusing on children, a cut-point of less than 16 years might be desirable, but studies may be identified with participants aged from 12 to 18. Unless the study reports separate data from the eligible section of the population (in which case data from the eligible participants can be included in the review), review authors will need a strategy for dealing with these studies (see MECIR Box 3.2.a ). This will involve balancing concerns about reduced applicability by including participants who do not meet the eligibility criteria, against the loss of data when studies are excluded. Arbitrary rules (such as including a study if more than 80% of the participants are under 16) will not be practical if detailed information is not available from the study. A less stringent rule, such as ‘the majority of participants are under 16’ may be sufficient. Although there is a risk of review authors’ biases affecting post-hoc inclusion decisions (which is why many authors endeavour to pre-specify these rules), this may be outweighed by a common-sense strategy in which eligibility decisions keep faith with the objectives of the review rather than with arbitrary rules. Difficult decisions should be documented in the review, checked with the advisory group (if available, see Chapter 1 ), and sensitivity analyses can assess the impact of these decisions on the review’s findings (see Chapter 10, Section 10.14 and MECIR Box 3.2.b ).
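The handling of partially overlapping populations described above can be sketched as a small decision function. This is an illustration only, not a rule endorsed by the Handbook: the `Study` fields, the function name, the returned labels and the 0.5 'majority' threshold are all assumptions introduced here, and real decisions of this kind still call for documented judgement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Study:
    name: str
    # True if the report gives separate data for the eligible participants.
    reports_eligible_subset: bool
    # Share of participants meeting the criterion (e.g. under 16); None if not reported.
    proportion_eligible: Optional[float]


def eligibility_decision(study: Study, threshold: float = 0.5) -> str:
    """Hypothetical pre-specified strategy for studies that only partly overlap
    with the review population. The threshold is an assumed 'majority' rule."""
    if study.reports_eligible_subset:
        # Data for the eligible section of the population can be used directly.
        return "include eligible subset"
    if study.proportion_eligible is None:
        # A numeric rule cannot be applied when detail is missing from the report.
        return "refer for judgement"
    if study.proportion_eligible >= 1.0:
        return "include"
    if study.proportion_eligible > threshold:
        # Included despite some ineligible participants; test the impact later.
        return "include whole study; flag for sensitivity analysis"
    return "exclude; document reason"
```

For instance, a study of participants aged 12 to 18 that reports 70% under 16 would be included and flagged, whereas one that reports no age breakdown would be referred for a documented judgement.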

Box 3.2.a Factors to consider when developing criteria for ‘Types of participants’

MECIR Box 3.2.b Relevant expectations for conduct of intervention reviews

Changing eligibility criteria

Following pre-specified eligibility criteria is a fundamental attribute of a systematic review. However, unanticipated issues may arise. Review authors should make sensible post-hoc decisions about exclusion of studies, and these should be documented in the review, possibly accompanied by sensitivity analyses. Changes to the protocol must not be made on the basis of the findings of the studies or the synthesis, as this can introduce bias.

Third, there should be consideration of whether there are population characteristics that might be expected to modify the size of the intervention effects (e.g. different severities of heart failure). Identifying subpopulations may be important for implementation of the intervention. If relevant subpopulations are identified, two courses of action are possible: limiting the scope of the review to exclude certain subpopulations; or maintaining the breadth of the review and addressing subpopulations in the analysis.

Restricting the review with respect to specific population characteristics or settings should be based on a sound rationale. It is important that Cochrane Reviews are globally relevant, so the rationale for the exclusion of studies based on population characteristics should be justified. For example, focusing a review of the effectiveness of mammographic screening on women between 40 and 50 years old may be justified based on biological plausibility, previously published systematic reviews and existing controversy. On the other hand, focusing a review on a particular subgroup of people on the basis of their age, sex or ethnicity simply because of personal interests, when there is no underlying biologic or sociological justification for doing so, should be avoided, as these reviews will be less useful to decision makers and readers of the review.

Maintaining the breadth of the review may be best when it is uncertain whether there are important differences in effects among various subgroups of people, since this allows investigation of these differences (see Chapter 10, Section 10.11.5 ). Review authors may combine the results from different subpopulations in the same synthesis, examining whether a given subdivision explains variation (heterogeneity) among the intervention effects. Alternatively, the results may be synthesized in separate comparisons representing different subpopulations. Splitting by subpopulation risks there being too few studies to yield a useful synthesis (see Table 3.2.a and Chapter 2, Section 2.3.2 ). Consideration needs to be given to the subgroup analysis method, particularly for population characteristics measured at the participant level (see Chapter 10 and Chapter 26 , Fisher et al 2017). All subgroup analyses should ideally be planned a priori and stated as a secondary objective in the protocol, and not driven by the availability of data.

In practice, it may be difficult to assign included studies to defined subpopulations because of missing information about the population characteristic, variability in how the population characteristic is measured across studies (e.g. variation in the method used to define the severity of heart failure), or because the study does not wholly fall within (or report the results separately by) the defined subpopulation. The latter issue applies mainly to participant characteristics but can also arise for settings or geographic locations where these vary within studies. Review authors should consider planning for these scenarios (see example reviews Hetrick et al 2012, Safi et al 2017; Table 3.2.a, column 3).
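One way to plan for studies that straddle subpopulations is to define, in advance, a combined category and a category for missing information. The sketch below loosely follows the age groups reported for Hetrick et al 2012 in Table 3.2.a (children approximately 6 to 12 years, adolescents approximately 13 to 18 years). The function name, the use of a study's minimum and maximum participant age, and the 'unclear' label for missing data are assumptions made for illustration, and the study is assumed to have already met the review's eligibility criteria.

```python
from typing import Optional


def age_subgroup(min_age: Optional[float], max_age: Optional[float]) -> str:
    """Assign a study to a pre-specified age subgroup from its reported age range.

    Cut-offs follow the example review (children up to 12, adolescents from 13);
    the 'unclear' category is a hypothetical addition for missing information.
    """
    if min_age is None or max_age is None:
        return "unclear"
    if max_age <= 12:
        return "children"
    if min_age >= 13:
        return "adolescents"
    # The study spans both groups and results are not available separately.
    return "children and adolescents"
```

A study recruiting participants aged 12 to 18 would therefore fall into the combined 'children and adolescents' group rather than being forced into one subgroup.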

Table 3.2.a Examples of population attributes and characteristics

Intended recipient of intervention

Patient, carer, healthcare provider (general practitioners, nurses, allied health professionals), health system, policy maker, community

In a review of e-learning programmes for health professionals, a subgroup analysis was planned to examine if the effects were modified by the type of health professional (doctors, nurses or physiotherapists). The authors hypothesized that e-learning programmes for doctors would be more effective than for other health professionals, but did not provide a rationale (Vaona et al 2018).

Disease/condition (to be treated or prevented)

Type and severity of a condition

In a review of platelet-rich therapies for musculoskeletal soft tissue injuries, a subgroup analysis was undertaken to examine if the effects of platelet-rich therapies were modified by the type of injury (e.g. rotator cuff tear, anterior cruciate ligament reconstruction, chronic Achilles tendinopathy) (Moraes et al 2014).

In planning a review of beta-blockers for heart failure, subgroup analyses were specified to examine if the effects of beta-blockers are modified by the cause of heart failure (e.g. idiopathic dilated cardiomyopathy, ischaemic heart disease, valvular heart disease, hypertension) and the severity of heart failure (‘reduced left ventricular ejection fraction (LVEF)’ ≤ 40%, ‘mid-range LVEF’ > 40% and < 50%, ‘preserved LVEF’ ≥ 50%, mixed, not specified). Studies have shown that patient characteristics and comorbidities differ by heart failure severity, and that therapies reduce morbidity in ‘reduced LVEF’ patients, but the benefits in the other groups are uncertain (Safi et al 2017).

Participant characteristics

Age (neonate, child, adolescent, adult, older adult)

Race/ethnicity

Sex/gender

PROGRESS-Plus equity characteristics (e.g. place of residence, socio-economic status, education) (O’Neill et al 2014)

In a review of newer-generation antidepressants for depressive disorders in children and adolescents, a subgroup analysis was undertaken to examine if the effects of the antidepressants were modified by age. The rationale was based on the findings of another review that suggested that children and adolescents may respond differently to antidepressants. The age groups were defined as ‘children’ (aged approximately 6 to 12 years), ‘adolescents’ (aged approximately 13 to 18 years), and ‘children and adolescents’ (when the study included both children and adolescents, and results could not be obtained separately by these subpopulations) (Hetrick et al 2012).

Setting

Setting of care (primary care, hospital, community)

Rurality (urban, rural, remote)

Socio-economic setting (low and middle-income countries, high-income countries)

Hospital ward (e.g. intensive care unit, general medical ward, outpatient)

In a review of hip protectors for preventing hip fractures in older people, separate comparisons were specified based on setting (institutional care or community-dwelling) for the critical outcome of hip fracture (Santesso et al 2014).
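Numeric subgroup definitions such as the LVEF categories in the beta-blocker example (Safi et al 2017) can be written down unambiguously before any studies are seen. The sketch below encodes the thresholds quoted in the table (≤ 40%, > 40% and < 50%, ≥ 50%) together with the ‘mixed’ and ‘not specified’ categories. The function names, and the use of a study's lower and upper LVEF inclusion bounds to detect a ‘mixed’ population, are assumptions made for illustration rather than the review authors' procedure.

```python
from typing import Optional


def lvef_category(lvef_percent: float) -> str:
    """Map a single LVEF value (in %) to the category thresholds quoted above."""
    if lvef_percent <= 40:
        return "reduced LVEF"
    if lvef_percent < 50:
        return "mid-range LVEF"
    return "preserved LVEF"


def study_lvef_group(lower: Optional[float], upper: Optional[float]) -> str:
    """Assign a study to a subgroup from its LVEF inclusion range (assumed inputs).

    Because the categories are contiguous intervals, a range whose two bounds
    fall in the same category lies wholly within that category.
    """
    if lower is None or upper is None:
        return "not specified"
    low_cat, high_cat = lvef_category(lower), lvef_category(upper)
    return low_cat if low_cat == high_cat else "mixed"
```

Writing the boundaries as code makes edge cases explicit: a value of exactly 40% is ‘reduced’, while exactly 50% is ‘preserved’.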

3.2.2 Defining interventions and how they will be grouped

In some reviews, predefining the intervention (MECIR Box 3.2.c) may be straightforward. For example, in a review of the effect of a given anticoagulant on deep vein thrombosis, the intervention can be defined precisely. A more complicated definition might be required for a multi-component intervention composed of dietary advice, training and support groups to reduce rates of obesity in a given population.

The inherent complexity present when defining an intervention often comes to light when considering how it is thought to achieve its intended effect and whether the effect is likely to differ when variants of the intervention are used. In the first example, the anticoagulant warfarin is thought to reduce blood clots by blocking an enzyme that depends on vitamin K to generate clotting factors. In the second, the behavioural intervention is thought to increase individuals’ self-efficacy in their ability to prepare healthy food. In both examples, we cannot assume that all forms of the intervention will work in the same way. When defining drug interventions, such as anticoagulants, factors such as the drug preparation, route of administration, dose, duration, and frequency should be considered. For multi-component interventions (such as interventions to reduce rates of obesity), the common or core features of the interventions must be defined, so that the review authors can clearly differentiate them from other interventions not included in the review.

MECIR Box 3.2.c Relevant expectations for conduct of intervention reviews

Predefining unambiguous criteria for interventions and comparators

Predefined, unambiguous eligibility criteria are a fundamental prerequisite for a systematic review. Specification of comparator interventions requires particular clarity: are the experimental interventions to be compared with an inactive control intervention (e.g. placebo, no treatment, standard care, or a waiting list control), or with an active control intervention (e.g. a different variant of the same intervention, a different drug, a different kind of therapy)? Any restrictions on interventions and comparators, for example, regarding delivery, dose, duration, intensity, co-interventions and features of complex interventions should also be predefined and explained.

In general, it is useful to consider exactly what is delivered, who delivers it, how it is delivered, where it is delivered, when and how much is delivered, and whether the intervention can be adapted or tailored, and to consider this for each type of intervention included in the review (see the TIDieR checklist (Hoffmann et al 2014)). As argued in Chapter 17, separating interventions into ‘simple’ and ‘complex’ is a false dichotomy; all interventions can be complex in some ways. The critical issue for review authors is to identify the most important factors to be considered in a specific review. Box 3.2.b outlines some factors to consider when developing broad criteria for the ‘Types of interventions’ (and comparisons).

Box 3.2.b Factors to consider when developing criteria for ‘Types of interventions’

Once interventions eligible for the review have been broadly defined, decisions should be made about how variants of the intervention will be handled in the synthesis. Differences in intervention characteristics across studies occur in all reviews. If these reflect minor differences in the form of the intervention used in practice (such as small differences in the duration or content of brief alcohol counselling interventions), then an overall synthesis can provide useful information for decision makers. Where differences in intervention characteristics are more substantial (such as delivery of brief alcohol counselling by nurses versus doctors), and are expected to have a substantial impact on the size of intervention effects, these differences should be examined in the synthesis. What constitutes an important difference requires judgement, but in general differences that alter decisions about how an intervention is implemented or whether the intervention is used or not are likely to be important. In such circumstances, review authors should consider specifying separate groups (or subgroups) to examine in their synthesis.

Clearly defined intervention groups serve two main purposes in the synthesis. First, the way in which interventions are grouped for synthesis (meta-analysis or other synthesis) is likely to influence review findings. Careful planning of intervention groups makes best use of the available data, avoids decisions that are influenced by study findings (which may introduce bias), and produces a review focused on questions relevant to decision makers. Second, the intervention groups specified in a protocol provide a standardized terminology for describing the interventions throughout the review, overcoming the varied descriptions used by study authors (e.g. where different labels are used for the same intervention, or similar labels used for different techniques) (Michie et al 2013). This standardization enables comparison and synthesis of information about intervention characteristics across studies (common characteristics and differences) and provides a consistent language for reporting that supports interpretation of review findings.

Table 3.2.b outlines a process for planning intervention groups as a basis for, or precursor to, synthesis, and the decision points and considerations at each step. The table is intended to guide, rather than to be prescriptive and, although it is presented as a sequence of steps, the process is likely to be iterative, and some steps may be done concurrently or in a different sequence. The process aims to minimize data-driven approaches that can arise once review authors have knowledge of the findings of the included studies. It also includes principles for developing a flexible plan that maximizes the potential to synthesize in circumstances where there are few studies, many variants of an intervention, or where the variants are difficult to anticipate. In all stages, review authors should consider how to categorize studies whose reports contain insufficient detail.

Table 3.2.b A process for planning intervention groups for synthesis

1. Identify intervention characteristics that may modify the effect of the intervention.

Consider whether differences in intervention characteristics might importantly modify the size of the intervention effect. Content-specific research literature and expertise should inform this step.

The TIDieR checklist – a tool for describing interventions – outlines the characteristics across which an intervention might differ (Hoffmann et al 2014). These include ‘what’ materials and procedures are used, ‘who’ provides the intervention, ‘when and how much’ intervention is delivered. The iCAT-SR tool provides equivalent guidance for complex interventions (Lewin et al 2017).

Exercise interventions differ across multiple characteristics, which vary in importance depending on the review.

In a review of exercise for osteoporosis, whether the exercise is weight-bearing or non-weight-bearing may be a key characteristic, since the mechanism by which exercise is thought to work is by placing stress or mechanical load on bones (Howe et al 2011).

Different mechanisms apply in reviews of exercise for knee osteoarthritis (muscle strengthening), falls prevention (gait and balance), and cognitive function (cardiovascular fitness).

The differing mechanisms might suggest different ways of grouping interventions (e.g. by intensity, mode of delivery) according to potential modifiers of the intervention effects.

2a. Label and define intervention groups to be considered in the synthesis.

For each intervention group, provide a short label (e.g. supportive psychotherapy) and describe the core characteristics (criteria) that will be used to assign each intervention from an included study to a group.

Groups are often defined by intervention content (especially the active components), such as materials, procedures or techniques (e.g. a specific drug, an information leaflet, a behaviour change technique). Other characteristics may also be used, although some are more commonly used to define subgroups (see ): the purpose or theoretical underpinning, mode of delivery, provider, dose or intensity, duration or timing of the intervention (Hoffmann et al 2014).

In specifying groups:

Logic models may help structure the synthesis (see and ).

In a review of psychological therapies for coronary heart disease, a single group was specified for meta-analysis that included all types of therapy. Subgroups were defined to examine whether intervention effects were modified by intervention components (e.g. cognitive techniques, stress management) or mode of delivery (e.g. individual, group) (Richards et al 2017).

In a review of psychological therapies for panic disorder (Pompoli et al 2016), eight types of therapy were specified:

1. psychoeducation;

2. supportive psychotherapy (with or without a psychoeducational component);

3. physiological therapies;

4. behaviour therapy;

5. cognitive therapy;

6. cognitive behaviour therapy (CBT);

7. third-wave CBT; and

8. psychodynamic therapies.

Groups were defined by the theoretical basis of each therapy (e.g. CBT aims to modify maladaptive thoughts through cognitive restructuring) and the component techniques used.

2b. Define levels for groups based on dose or intensity.

For groups based on ‘how much’ of an intervention is used (e.g. dose or intensity), criteria are needed to quantify each group. This may be straightforward for easy-to-quantify characteristics, but more complex for characteristics that are hard to quantify (e.g. duration or intensity of rehabilitation or psychological therapy).

The levels should be based on how the intervention is used in practice (e.g. cut-offs for low and high doses of a supplement based on recommended nutrient intake), or on a rationale for how the intervention might work.

In reviews of exercise, intensity may be defined by training time (session length, frequency, program duration), amount of work (e.g. repetitions), and effort/energy expenditure (exertion, heart rate) (Regnaux et al 2015).

In a review of organized inpatient care for stroke, acute stroke units were categorized as ‘intensive’, ‘semi-intensive’ or ‘non-intensive’ based on whether the unit had continuous monitoring, high nurse staffing, and life support facilities (Stroke Unit Trialists Collaboration 2013).

3. Determine whether there is an existing system for grouping interventions.

In some fields, intervention taxonomies and frameworks have been developed for labelling and describing interventions, and these can make it easier for those using a review to interpret and apply findings.

Using an agreed system is preferable to developing new groupings. Existing systems should be assessed for relevance and usefulness. The most useful systems:

Systems for grouping interventions may be generic, widely applicable across clinical areas, or specific to a condition or intervention type. Some Cochrane Groups recommend specific taxonomies.

The behaviour change technique (BCT) taxonomy (Michie et al 2013) categorizes intervention elements such as goal setting, self-monitoring and social support. A protocol for a review of social media interventions used this taxonomy to describe interventions and examine different BCTs as potential effect modifiers (Welch et al 2018).

The behaviour change wheel has been used to group interventions (or components) by function (e.g. to educate, persuade, enable) (Michie et al 2011). This system was used to describe the components of dietary advice interventions (Desroches et al 2013).

Multiple reviews have used the consensus-based taxonomy developed by the Prevention of Falls Network Europe (ProFaNE) (e.g. Verheyden et al 2013, Kendrick et al 2014). The taxonomy specifies broad groups (e.g. exercise, medication, environment/assistive technology) within which are more specific groups (e.g. exercise: gait, balance and functional training; flexibility; strength and resistance) (Lamb et al 2011).

4. Plan how the specified groups will be used in synthesis and reporting.

Decide whether it is useful to pool all interventions in a single meta-analysis (‘lumping’), within which specific characteristics can be explored as effect modifiers (e.g. in subgroups). Alternatively, if pooling all interventions is unlikely to address a useful question, separate synthesis of specific interventions may be more appropriate (‘splitting’).

Determining the right analytic approach is discussed further in .

In a review of exercise for knee osteoarthritis, the different categories of exercise were combined in a single meta-analysis, addressing the question ‘what is the effect of exercise on knee osteoarthritis?’. The categories were also analysed as subgroups within the meta-analysis to explore whether the effect size varied by type of exercise (Fransen et al 2015). Other subgroup analyses examined mode of delivery and dose.

5. Decide how to group interventions with multiple components or co-interventions.

Some interventions, especially those considered ‘complex’, include multiple components that could also be implemented independently (Guise et al 2014, Lewin et al 2017). These components might be eligible for inclusion in the review alone, or eligible only if used alongside an eligible intervention.

Options for considering multi-component interventions may include the following.

and Welton et al 2009, Caldwell and Welton 2016, Higgins et al 2019).

The first two approaches may be challenging but are likely to be most useful (Caldwell and Welton 2016).

See Section . for the special case of when a co-intervention is administered in both treatment arms.

In a review of psychological therapies for panic disorder, two of the eight eligible therapies (psychoeducation and supportive psychotherapy) could be used alone or as part of a multi-component therapy. When accompanied by another eligible therapy, the intervention was categorized as the other therapy (i.e. psychoeducation + cognitive behavioural therapy was categorized as cognitive behavioural therapy) (Pompoli et al 2016).

 

In a review of psychosocial interventions for smoking cessation in pregnancy, two approaches were used. All intervention types were included in a single meta-analysis with subgroups for multi-component, single and tailored interventions. Separate meta-analyses were also performed for each intervention type, with categorization of multi-component interventions based on the ‘main’ component (Chamberlain et al 2017).
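A categorization rule like the one in the panic disorder example can be written down explicitly, which makes it easier to apply consistently across studies. The sketch below is an assumption-laden illustration: the function name `categorize`, the fallback label, and the handling of combinations not covered by the example are hypothetical choices, not the review authors' method.

```python
# Therapies that could be delivered alone or alongside another eligible therapy
# (the two named in the panic disorder example).
ADJUNCT_THERAPIES = {"psychoeducation", "supportive psychotherapy"}

def categorize(components):
    """Return the intervention group for a set of component therapies.

    Assumed rule: if exactly one component is not an adjunct therapy, the
    intervention takes that component's category. Anything else is flagged
    for manual review rather than guessed.
    """
    components = set(components)
    if len(components) == 1:
        return next(iter(components))
    main = components - ADJUNCT_THERAPIES
    if len(main) == 1:
        return next(iter(main))
    return "unclassified (needs review)"
```

For instance, `categorize(["psychoeducation", "cognitive behavioural therapy"])` returns `"cognitive behavioural therapy"`, mirroring the example in the text.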

6. Build in contingencies by specifying both specific and broader intervention groups.

Consider grouping interventions at more than one level, so that studies of a broader group of interventions can be synthesized if too few studies are identified for synthesis in more specific groups. This will provide flexibility where review authors anticipate few studies contributing to specific groups (e.g. in reviews with diverse interventions, additional diversity in other PICO elements, or few studies overall, see also ).

In a review of psychosocial interventions for smoking cessation, the authors planned to group any psychosocial intervention in a single comparison (addressing the higher level question of whether, on average, psychosocial interventions are effective). Given that sufficient data were available, they also presented separate meta-analyses to examine the effects of specific types of psychosocial interventions (e.g. counselling, health education, incentives, social support) (Chamberlain et al 2017).

3.2.3 Defining which comparisons will be made

When articulating the PICO for each synthesis, defining the intervention groups alone is not sufficient for complete specification of the planned syntheses. The next step is to define the comparisons that will be made between the intervention groups. Setting aside for a moment more complex analyses such as network meta-analyses, which can simultaneously compare many groups ( Chapter 11 ), standard meta-analysis ( Chapter 10 ) aims to draw conclusions about the comparative effects of two groups at a time (i.e. which of two intervention groups is more effective?). These comparisons form the basis for the syntheses that will be undertaken if data are available. Cochrane Reviews sometimes include one comparison, but most often include multiple comparisons. Three commonly identified types of comparisons include the following (Davey et al 2011).

  • Intervention versus placebo, for example:
      • newer generation antidepressants versus placebo (Hetrick et al 2012); and
      • vertebroplasty for osteoporotic vertebral compression fractures versus placebo (sham procedure) (Buchbinder et al 2018).
  • Intervention versus control (e.g. best supportive care, usual care), for example:
      • chemotherapy or targeted therapy plus best supportive care (BSC) versus BSC for palliative treatment of esophageal and gastroesophageal-junction carcinoma (Janmaat et al 2017); and
      • personalized care planning versus usual care for people with long-term conditions (Coulter et al 2015).
  • Intervention A versus intervention B, for example:
      • early (commenced at less than two weeks of age) versus late (two weeks of age or more) parenteral zinc supplementation in term and preterm infants (Taylor et al 2017);
      • high intensity versus low intensity physical activity or exercise in people with hip or knee osteoarthritis (Regnaux et al 2015); and
      • multimedia education versus other education for consumers about prescribed and over the counter medications (Ciciriello et al 2013).

The first two types of comparisons aim to establish the effectiveness of an intervention, while the last aims to compare the effectiveness of two interventions. However, the distinction between placebo and control is often arbitrary, since any differences in the care provided between trials with a control arm and those with a placebo arm may be unimportant, especially where ‘usual care’ is provided to both. Therefore, placebo and control groups may be determined to be similar enough to be combined for synthesis.

In reviews including multiple intervention groups, many comparisons are possible. In some of these reviews, authors seek to synthesize evidence on the comparative effectiveness of all their included interventions, including where there may be only indirect comparison of some interventions across the included studies ( Chapter 11, Section 11.2.1 ). However, in many reviews including multiple intervention groups, a limited subset of the possible comparisons will be selected. The chosen subset of comparisons should address the most important clinical and research questions. For example, if an established intervention (or dose of an intervention) is used in practice, then the synthesis would ideally compare novel or alternative interventions to this established intervention, and not, for example, to no intervention.

3.2.3.1 Dealing with co-interventions

Planning is needed for the special case where the same supplementary intervention is delivered to both the intervention and comparator groups. A supplementary intervention is an additional intervention delivered alongside the intervention of interest, such as massage in a review examining the effects of aromatherapy (i.e. aromatherapy plus massage versus massage alone). In many cases, the supplementary intervention will be unimportant and can be ignored. In other situations, the effect of the intervention of interest may differ according to whether participants receive the supplementary therapy. For example, the effect of aromatherapy among people who receive a massage may differ from the effect of the aromatherapy given alone. This will be the case if the intervention of interest interacts with the supplementary intervention leading to larger (synergistic) or smaller (dysynergistic/antagonistic) effects than the intervention of interest alone (Squires et al 2013). While qualitative interactions are rare (where the effect of the intervention is in the opposite direction when combined with the supplementary intervention), it is possible that there will be more variation in the intervention effects (heterogeneity) when supplementary interventions are involved, and it is important to plan for this. Approaches for dealing with this in the statistical synthesis may include fitting a random-effects meta-analysis model that encompasses heterogeneity ( Chapter 10, Section 10.10.4 ), or investigating whether the intervention effect is modified by the addition of the supplementary intervention through subgroup analysis ( Chapter 10, Section 10.11.2 ).
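To make the random-effects option concrete, the sketch below pools study effects while allowing for between-study heterogeneity. It uses the DerSimonian and Laird moment estimator of the between-study variance, which is one common choice and is an assumption here, not a method prescribed by this section. The data and the function name `random_effects_dl` are hypothetical.

```python
import math

def random_effects_dl(effects, std_errors):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2.

    Requires at least two studies. Returns (pooled estimate, standard error, tau^2).
    """
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2

# Hypothetical effects from studies with and without a supplementary intervention.
effects = [-0.50, -0.10, -0.45, 0.05]
std_errors = [0.12, 0.15, 0.20, 0.18]
pooled, se, tau2 = random_effects_dl(effects, std_errors)
```

The subgroup alternative would instead pool studies with and without the supplementary intervention separately and compare the two estimates.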

3.2.4 Selecting, prioritizing and grouping review outcomes

3.2.4.1 Selecting review outcomes

Broad outcome domains are decided at the time of setting up the review PICO (see Chapter 2 ). Once the broad domains are agreed, further specification is required to define the domains to facilitate reporting and synthesis (i.e. the PICO for comparison) (see Chapter 2, Section 2.3 ). The process for specifying and grouping outcomes largely parallels that used for specifying intervention groups.

Reporting of outcomes should rarely determine study eligibility for a review. In particular, studies should not be excluded because they do not report results of an outcome they may have measured, or provide ‘no usable data’ ( MECIR Box 3.2.d ). This is essential to avoid bias arising from selective reporting of findings by the study authors (see Chapter 13 ). However, in some circumstances, the measurement of certain outcomes may be a study eligibility criterion. This may be the case, for example, when the review addresses the potential for an intervention to prevent a particular outcome, or when the review addresses a specific purpose of an intervention that can be used in the same population for different purposes (such as hormone replacement therapy, or aspirin).

MECIR Box 3.2.d Relevant expectations for conduct of intervention reviews

Clarifying role of outcomes ( )

Outcome measures should not always form part of the criteria for including studies in a review. However, some reviews do legitimately restrict eligibility to specific outcomes. For example, the same intervention may be studied in the same population for different purposes (e.g. hormone replacement therapy, or aspirin); or a review may address specifically the adverse effects of an intervention used for several conditions. If authors do exclude studies on the basis of outcomes, care should be taken to ascertain that relevant outcomes are not available because they have not been measured rather than simply not reported.

Predefining outcome domains ( )

Full specification of the outcomes includes consideration of outcome domains (e.g. quality of life) and outcome measures (e.g. SF-36). Predefinition of outcomes reduces the risk of selective outcome reporting. The critical outcomes should be as few as possible and should normally reflect at least one potential benefit and at least one potential area of harm. It is expected that the review should be able to synthesize these outcomes if eligible studies are identified, and that the conclusions of the review will be based largely on the effects of the interventions on these outcomes. Additional important outcomes may also be specified. Up to seven critical and important outcomes will form the basis of the GRADE assessment and be summarized in the review’s abstract and other summary formats, although the review may measure more than seven outcomes.

Choosing outcomes ( )

Cochrane Reviews are intended to support clinical practice and policy, and should address outcomes that are critical or important to consumers. These should be specified at protocol stage. Where available, established sets of core outcomes should be used. Patient-reported outcomes should be included where possible. It is also important to judge whether evidence of resource use and costs might be an important component of decisions to adopt the intervention or alternative management strategies around the world. Large numbers of outcomes, while sometimes necessary, can make reviews unfocused, unmanageable for the user, and prone to selective outcome reporting bias. Biochemical, interim and process outcomes should be considered where they are important to decision makers. Any outcomes that would not be described as critical or important can be left out of the review.

Predefining outcome measures ( )

Having decided what outcomes are of interest to the review, authors should clarify acceptable ways in which these outcomes can be measured. It may be difficult, however, to predefine adverse effects.

C17: Predefining choices from multiple outcome measures ( )

Prespecification guards against selective outcome reporting, and allows users to confirm that choices were not overly influenced by the results. A predefined hierarchy of outcome measures may be helpful. It may be difficult, however, to predefine adverse effects. A rationale should be provided for the choice of outcome measure.

C18: Predefining time points of interest ( )

Prespecification guards against selective outcome reporting, and allows users to confirm that choices were not overly influenced by the results. Authors may consider whether all time frames or only selected time points will be included in the review. These decisions should be based on outcomes important for making healthcare decisions. One strategy to make use of the available data could be to group time points into prespecified intervals to represent ‘short-term’, ‘medium-term’ and ‘long-term’ outcomes and to take no more than one from each interval from each study for any particular outcome.

In general, systematic reviews should aim to include outcomes that are likely to be meaningful to the intended users and recipients of the reviewed evidence. This may include clinicians, patients (consumers), the general public, administrators and policy makers. Outcomes may include survival (mortality), clinical events (e.g. strokes or myocardial infarction), behavioural outcomes (e.g. changes in diet, use of services), patient-reported outcomes (e.g. symptoms, quality of life), adverse events, burdens (e.g. demands on caregivers, frequency of tests, restrictions on lifestyle) and economic outcomes (e.g. cost and resource use). It is critical that outcomes used to assess adverse effects as well as outcomes used to assess beneficial effects are among those addressed by a review (see Chapter 19 ).

Outcomes that are trivial or meaningless to decision makers should not be included in Cochrane Reviews. Inclusion of outcomes that are of little or no importance risks overwhelming and potentially misleading readers. Interim or surrogate outcome measures, such as laboratory results or radiologic results (e.g. loss of bone mineral content as a surrogate for fractures in hormone replacement therapy), while potentially helpful in explaining effects or determining intervention integrity (see Chapter 5, Section 5.3.4.1 ), can also be misleading since they may not predict clinically important outcomes accurately. Many interventions reduce the risk for a surrogate outcome but have no effect or have harmful effects on clinically relevant outcomes, and some interventions have no effect on surrogate measures but improve clinical outcomes.

Various sources can be used to develop a list of relevant outcomes, including input from consumers and advisory groups (see Chapter 2 ), the clinical experiences of the review authors, and evidence from the literature (including qualitative research about outcomes important to those affected (see Chapter 21 )). A further driver of outcome selection is consideration of outcomes used in related reviews. Harmonization of outcomes across reviews addressing related questions facilitates broader evidence synthesis questions being addressed through the use of Overviews of reviews (see Chapter V ).

Outcomes considered to be meaningful, and therefore addressed in a review, may not have been reported in the primary studies. For example, quality of life is an important outcome, perhaps the most important outcome, for people considering whether or not to use chemotherapy for advanced cancer, even if the available studies are found to report only survival (see Chapter 18 ). A further example arises with timing of the outcome measurement, where time points determined as clinically meaningful in a review are not measured in the primary studies. Including and discussing all important outcomes in a review will highlight gaps in the primary research and encourage researchers to address these gaps in future studies.

3.2.4.2 Prioritizing review outcomes

Once a full list of relevant outcomes has been compiled for the review, authors should prioritize the outcomes and select the outcomes of most relevance to the review question. The GRADE approach to assessing the certainty of evidence (see Chapter 14 ) suggests that review authors separate outcomes into those that are ‘critical’, ‘important’ and ‘not important’ for decision making.

The critical outcomes are the essential outcomes for decision making, and are those that would form the basis of a ‘Summary of findings’ table or other summary versions of the review, such as the Abstract or Plain Language Summary. ‘Summary of findings’ tables provide key information about the amount of evidence for important comparisons and outcomes, the quality of the evidence and the magnitude of effect (see Chapter 14, Section 14.1 ). There should be no more than seven outcomes included in a ‘Summary of findings’ table, and those outcomes that will be included in summaries should be specified at the protocol stage. They should generally not include surrogate or interim outcomes. They should not be chosen on the basis of any anticipated or observed magnitude of effect, or because they are likely to have been addressed in the studies to be reviewed. Box 3.2.c summarizes the principal factors to consider when selecting and prioritizing review outcomes.

Box 3.2.c Factors to consider when selecting and prioritizing review outcomes

3.2.4.3 Defining and grouping outcomes for synthesis

Table 3.2.c outlines a process for planning for the diversity in outcome measurement that may be encountered in the studies included in a review and which can complicate, and sometimes prevent, synthesis. Research has repeatedly documented inconsistency in the outcomes measured across trials in the same clinical areas (Harrison et al 2016, Williamson et al 2017). This inconsistency occurs across all aspects of outcome measurement, including the broad domains considered, the outcomes measured, the way these outcomes are labelled and defined, and the methods and timing of measurement. For example, a review of outcome measures used in 563 studies of interventions for dementia and mild cognitive impairment found that 321 unique measurement methods were used for 1278 assessments of cognitive outcomes (Harrison et al 2016). Initiatives like COMET ( Core Outcome Measures in Effectiveness Trials ) aim to encourage standardization of outcome measurement across trials (Williamson et al 2017), but these initiatives are comparatively new and review authors will inevitably encounter diversity in outcomes across studies.

The process begins by describing the scope of each outcome domain in sufficient detail to enable outcomes from included studies to be categorized ( Table 3.2.c Step 1). This step may be straightforward in areas for which core outcome sets (or equivalent systems) exist ( Table 3.2.c Step 2). The methods and timing of outcome measurement also need to be specified, giving consideration to how differences across studies will be handled ( Table 3.2.c Steps 3 and 4). Subsequent steps consider options for dealing with studies that report multiple measures within an outcome domain ( Table 3.2.c Step 5), planning how outcome domains will be used in synthesis ( Table 3.2.c Step 6), and building in contingencies to maximize potential to synthesize ( Table 3.2.c Step 7).

Table 3.2.c A process for planning outcome groups for synthesis

1. Fully specify outcome domains.

For each outcome domain, provide a short label (e.g. cognition, consumer evaluation of care) and describe the domain in sufficient detail to enable eligible outcomes from each included study to be categorized. The definition should be based on the concept (or construct) measured, that is ‘what’ is measured. ‘When’ and ‘how’ the outcome is measured will be considered in subsequent steps.

Outcomes can be defined hierarchically, starting with very broad groups (e.g. physiological/clinical outcomes, life impact, adverse events), then outcome domains (e.g. functioning and perceived health status are domains within ‘life impact’). Within these may be narrower domains (e.g. physical function, cognitive function), and then specific outcome measures (Dodd et al 2018). The level at which outcomes are grouped for synthesis alters the question addressed, and so decisions should be guided by the review objectives.

In specifying outcome domains:

In a review of computer-based interventions for sexual health promotion, three broad outcome domains were defined (cognitions, behaviours, biological) based on a conceptual model of how the intervention might work. Each domain comprised more specific domains and outcomes (e.g. condom use, seeking health services such as STI testing); listing these helped define the broad domains and guided categorization of the diverse outcomes reported in included studies (Bailey et al 2010).

In a protocol for a review of social media interventions for improving health, the rationale for synthesizing broad groupings of outcomes (e.g. health behaviours, physical health) was based on prediction of a common underlying mechanism by which the intervention would work, and the review objective, which focused on overall health rather than specific outcomes (Welch et al 2018).

2. Determine whether there is an existing system for identifying and grouping important outcomes.

Systems for categorizing outcomes include core outcome sets, including the COMET and ICHOM initiatives, and outcome taxonomies (Dodd et al 2018). These systems define agreed outcomes that should be measured for specific conditions (Williamson et al 2017). They can be used to standardize the varied outcome labels used across studies and enable grouping and comparison (Kirkham et al 2013). Agreed terminology may help decision makers interpret review findings.

The COMET website provides a database of core outcome sets agreed or in development. Some Cochrane Groups have developed their own outcome sets. While the availability of outcome sets and taxonomies varies across clinical areas, several taxonomies exist for specifying broad outcome domains (e.g. Dodd et al 2018, ICHOM 2018).

In a review of combined diet and exercise for preventing gestational diabetes mellitus, a core outcome set agreed by the Cochrane Pregnancy and Childbirth group was used (Shepherd et al 2017).

In a review of decision aids for people facing health treatment or screening decisions (Stacey et al 2017), outcome domains were based on criteria for evaluating decision aids agreed in the International Patient Decision Aid Standards (IPDAS). Doing so helped to assess the use of aids across diverse clinical decisions.

The Cochrane Consumers and Communication Group has an agreed taxonomy to guide specification of outcomes of importance in evaluating communication interventions (Cochrane Consumers & Communication Group).

3. Define the outcome time points.

A key attribute of defining an outcome is specifying the time of measurement. In reviews, time frames, and not specific time points, are often specified to handle the likely diversity in timing of outcome measurement across studies (e.g. a ‘medium-term’ time frame might be defined as including outcomes measured between 6 and 12 months).

In specifying outcome timing:

In a review of psychological therapies for panic disorder, the main outcomes were ‘short-term’ (≤6 months from treatment commencement). ‘Long-term’ outcomes (>6 months from treatment commencement) were considered important, but not specified as critical because of concerns of participant attrition (Pompoli et al 2016).

In contrast, in a review of antidepressants, a clinically meaningful time frame of 6 to 12 months might be specified for the critical outcome ‘depression’, since this is the recommended treatment duration. However, it may be anticipated that many studies will be of shorter duration with short-term follow-up, so an additional important outcome of ‘depression (<3 months)’ might also be specified.
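Grouping time points into prespecified time frames, and taking no more than one time point per study from each frame, can be expressed as a simple rule. In the sketch below the cut-points, the boundary handling, the function names and the rule of keeping the longest follow-up within a frame are all illustrative assumptions; a protocol would state its own.

```python
# Prespecified time frames in months (lower bound exclusive, upper bound inclusive).
# The cut-points are illustrative assumptions, not recommendations.
TIME_FRAMES = {
    "short-term": (0, 6),
    "medium-term": (6, 12),
    "long-term": (12, float("inf")),
}

def time_frame(months):
    """Map a measurement time (in months) to its prespecified time frame."""
    for label, (low, high) in TIME_FRAMES.items():
        if low < months <= high:
            return label
    raise ValueError(f"time point {months} months is outside all time frames")

def select_one_per_frame(measurements):
    """Keep one time point per (study, time frame): the longest follow-up.

    `measurements` is a list of (study_id, months) pairs. Choosing the longest
    follow-up within a frame is an assumed rule that does not depend on results.
    """
    selected = {}
    for study_id, months in measurements:
        key = (study_id, time_frame(months))
        if key not in selected or months > selected[key]:
            selected[key] = months
    return selected
```

Because the rule depends only on the timing of measurement, it can be fixed in the protocol before any results are seen.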

4. Specify the measurement tool or measurement method.

For each outcome domain, specify:

Minimum criteria for inclusion of a measure may include:

evidence of reliability (e.g. consistent scores across time and raters when the outcome is unchanged), and validity (e.g. comparable results to similar measures, including a gold standard if available); and

Measures may be identified from core outcome sets (e.g. Williamson et al 2017, ICHOM 2018) or systematic reviews of instruments (see COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative for a database of examples).

In a review of interventions to support women to stop smoking, objective (biochemically validated) and subjective (self-report) measures of smoking cessation were specified separately to examine bias due to the method used to measure the outcome (Step 6) (Chamberlain et al 2017).

In a review of high-intensity versus low-intensity exercise for osteoarthritis, measures of pain were selected based on relevance of the content and properties of the measurement tool (i.e. evidence of validity and reliability) (Regnaux et al 2015).

5. Specify how multiplicity of outcomes will be handled.

For a particular domain, multiple outcomes within a study may be available for inclusion. This may arise from:

Effects of the intervention calculated from these different sources of multiplicity are statistically dependent, since they have been calculated using the same participants. To deal with this dependency, select only one outcome per study for a particular comparison, or use a meta-analysis method that accounts for the dependency (see Step 6).

Pre-specify the method of selection from multiple outcomes or measures in the protocol, using an approach that is independent of the result (see ) (López-López et al 2018). Document all eligible outcomes or measures in the ‘Characteristics of included studies’ table, noting which was selected and why.

Multiplicity can arise from the reporting of multiple analyses of the same outcome (e.g. analyses that do and do not adjust for prognostic factors; intention-to-treat and per-protocol analyses) and multiple reports of the same study (e.g. journal articles, conference abstracts). Approaches for dealing with this type of multiplicity should also be specified in the protocol (López-López et al 2018).

It may be difficult to anticipate all forms of multiplicity when developing a protocol. Any post-hoc approaches used to select outcomes or results should be noted at the beginning of the Methods section, or if extensive, within additional supplementary material.
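Selecting one measure per study from a pre-specified hierarchy is straightforward to make explicit. The following sketch assumes a hypothetical hierarchy for a pain domain; the ordering, the study data and the function name `select_measure` are invented for illustration and do not reproduce any hierarchy from the reviews cited in this section.

```python
# Hypothetical pre-specified hierarchy for a 'pain' domain, most preferred first.
PAIN_HIERARCHY = ["WOMAC pain", "VAS pain", "NRS pain"]

def select_measure(reported, hierarchy):
    """Return the highest-ranked measure a study reports, or None if none is eligible.

    Selection depends only on which measures were reported, never on their results.
    """
    for measure in hierarchy:
        if measure in reported:
            return measure
    return None

# Hypothetical record of the measures each study reported for the domain.
studies = {
    "Study A": {"VAS pain", "WOMAC pain"},
    "Study B": {"NRS pain"},
    "Study C": {"SF-36 bodily pain"},
}
chosen = {study: select_measure(reported, PAIN_HIERARCHY)
          for study, reported in studies.items()}
```

Keeping the full `studies` record alongside `chosen` parallels documenting all eligible measures and noting which one was selected.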

The following hierarchy was specified to select one outcome per domain in a review examining the effects of portion, package or tableware size (Hollands et al 2015):

Selection of the outcome was made blinded to the results. All available outcome measures were documented in the ‘Characteristics of included studies’ table.

In a review of audit and feedback for healthcare providers, the outcome domains were ‘provider performance’ (e.g. compliance with recommended use of a laboratory test) and ‘patient health outcomes’ (e.g. smoking status, blood pressure) (Ivers et al 2012). For each domain, outcomes were selected using the following hierarchy:

6. Plan how the specified outcome domains will be used in the synthesis.

When different measurement methods or tools have been used across studies, consideration must be given to how these will be synthesized. Options include the following.

and ). There may be increased heterogeneity, warranting use of a random-effects model ( ).

In a review of interventions to support women to stop smoking, separate outcome domains were specified for biochemically validated measures of smoking and self-report measures. The two domains were meta-analysed together, but sensitivity analyses were undertaken restricting the meta-analyses to studies with only biochemically validated outcomes, to examine if the results were robust to the method of measurement (Chamberlain et al 2017).

In a review of psychological therapies for youth internalizing and externalizing disorders, most studies contributed multiple effects (e.g. in one meta-analysis of 443 studies, there were 5139 included measures). The authors used multilevel modelling to address the dependency among multiple effects contributed from each study (Weisz et al 2017).

7. Where possible, build in contingencies by specifying both specific and broader outcome domains.

Consider building in flexibility to group outcomes at different levels or time intervals. Inflexible approaches can undermine the potential to synthesize, especially when few studies are anticipated, or there is likely to be diversity in the way outcomes are defined and measured and the timing of measurement. If insufficient studies report data for meaningful synthesis using the narrower domains, the broader domains can be used (see also ).

Consider a hypothetical review aiming to examine the effects of behavioural psychological interventions for the treatment of overweight and obese adults. A specific outcome is body mass index (BMI). However, also specifying a broader outcome domain ‘indicator of body mass’ will facilitate synthesis in the circumstance where few studies report BMI, but most report an indicator of body mass (such as weight or waist circumference). This is particularly important when few studies may be anticipated or there is expected diversity in the measurement methods or tools.
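The contingency in this hypothetical review can be written as a simple decision rule. The sketch below is illustrative only: the minimum number of studies, the function name `choose_synthesis_level` and the input format are assumptions that a protocol would need to define for itself.

```python
MIN_STUDIES = 2  # assumed threshold for attempting synthesis; set in the protocol

# Specific measures nested within the broader domain (from the hypothetical review).
BROADER_DOMAIN = {"indicator of body mass": ["BMI", "weight", "waist circumference"]}

def choose_synthesis_level(studies_by_measure, specific="BMI",
                           broader="indicator of body mass"):
    """Return (level, study_ids): the specific outcome if enough studies report it,
    otherwise the broader domain.

    `studies_by_measure` maps a measure name to the ids of studies reporting it.
    """
    specific_studies = set(studies_by_measure.get(specific, []))
    if len(specific_studies) >= MIN_STUDIES:
        return specific, sorted(specific_studies)
    pooled = set()
    for measure in BROADER_DOMAIN[broader]:
        pooled.update(studies_by_measure.get(measure, []))
    return broader, sorted(pooled)
```

A study reporting several indicators of body mass appears only once in the broader group; which of its measures contributes would still be decided by the pre-specified rule for handling multiplicity (Step 5).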

3.3 Determining which study designs to include

Some study designs are more appropriate than others for answering particular questions. Authors need to consider a priori what study designs are likely to provide reliable data with which to address the objectives of their review ( MECIR Box 3.3.a ). Sections 3.3.1 and 3.3.2 cover randomized and non-randomized designs for assessing treatment effects; Chapter 17, Section 17.2.5  discusses other study designs in the context of addressing intervention complexity.

MECIR Box 3.3.a Relevant expectations for conduct of intervention reviews

Predefining study designs ( )

Predefined, unambiguous eligibility criteria are a fundamental prerequisite for a systematic review. This is particularly important when non-randomized studies are considered. Some labels commonly used to define study designs can be ambiguous. For example, a ‘double blind’ study may not make it clear who was blinded; a ‘case-control’ study may be nested within a cohort, or be undertaken in a cross-sectional manner; or a ‘prospective’ study may have only some features defined or undertaken prospectively.

Justifying choice of study designs ( )

It might be difficult to address some interventions or some outcomes in randomized trials. Authors should be able to justify why they have chosen either to restrict the review to randomized trials or to include non-randomized studies. The particular study designs included should be justified with regard to appropriateness to the review question and with regard to potential for bias.

3.3.1 Including randomized trials

Because Cochrane Reviews address questions about the effects of health care, they focus primarily on randomized trials, and randomized trials should be included if they are feasible for the interventions of interest ( MECIR Box 3.3.b ). Randomization is the only way to prevent systematic differences between baseline characteristics of participants in different intervention groups in terms of both known and unknown (or unmeasured) confounders (see Chapter 8 ), and claims about cause and effect can be based on the findings of randomized trials with far more confidence than almost any other type of study. For clinical interventions, deciding who receives an intervention and who does not is influenced by many factors, including prognostic factors. Empirical evidence suggests that, on average, non-randomized studies produce effect estimates that indicate more extreme benefits of health care than randomized trials. However, the extent, and even the direction, of the bias is difficult to predict. These issues are discussed at length in Chapter 24 , which provides guidance on when it might be appropriate to include non-randomized studies in a Cochrane Review.

Practical considerations also motivate the restriction of many Cochrane Reviews to randomized trials. In recent decades there has been considerable investment internationally in establishing infrastructure to index and identify randomized trials. Cochrane has contributed to these efforts, including building up and maintaining a database of randomized trials, developing search filters to aid their identification, working with MEDLINE to improve tagging and identification of randomized trials, and using machine learning and crowdsourcing to reduce author workload in identifying randomized trials ( Chapter 4, Section 4.6.6.2 ). The same scale of organizational investment has not (yet) been matched for the identification of other types of studies. Consequently, identifying and including other types of studies may require additional efforts to identify studies and to keep the review up to date, and might increase the risk that the result of the review will be influenced by publication bias. This issue and other bias-related issues that are important to consider when defining types of studies are discussed in detail in Chapter 7 and Chapter 13 .

Specific aspects of study design and conduct should be considered when defining eligibility criteria, even if the review is restricted to randomized trials. For example, review authors should decide whether cluster-randomized trials (Chapter 23, Section 23.1) and crossover trials (Chapter 23, Section 23.2) are eligible, and whether to apply other eligibility criteria such as use of a placebo comparison group, evaluation of outcomes blinded to allocation sequence, or a minimum period of follow-up. There will always be a trade-off between restrictive study design criteria (which might result in the inclusion of studies that are at low risk of bias, but very few in number) and more liberal design criteria (which might result in the inclusion of more studies, but at a higher risk of bias). Furthermore, excessively broad criteria might result in the inclusion of misleading evidence. If, for example, interest focuses on whether a therapy improves survival in patients with a chronic condition, it might be inappropriate to look at studies of very short duration, except to make explicit the point that they cannot address the question of interest.
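The trade-off between restrictive and liberal design criteria can be made concrete by writing the criteria down explicitly and recording why each study fails them. The following is a minimal Python sketch, not a Cochrane tool: the `Study` and `DesignCriteria` classes, their fields, and the 12-month threshold are hypothetical names and values chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    """Hypothetical record of design features extracted for one study."""
    study_id: str
    randomized: bool
    design: str               # e.g. "parallel", "cluster", "crossover"
    follow_up_months: float


@dataclass(frozen=True)
class DesignCriteria:
    """Hypothetical pre-specified design criteria from a review protocol."""
    allowed_designs: frozenset
    min_follow_up_months: float


def screen(study: Study, criteria: DesignCriteria) -> list:
    """Return the reasons for exclusion; an empty list means eligible."""
    reasons = []
    if not study.randomized:
        reasons.append("not randomized")
    if study.design not in criteria.allowed_designs:
        reasons.append(f"design not eligible: {study.design}")
    if study.follow_up_months < criteria.min_follow_up_months:
        reasons.append("follow-up too short")
    return reasons


# Restrictive criteria: parallel-group trials with at least 12 months of follow-up.
strict = DesignCriteria(frozenset({"parallel"}), 12)
# More liberal criteria: cluster and crossover trials also accepted, any follow-up.
liberal = DesignCriteria(frozenset({"parallel", "cluster", "crossover"}), 0)

trial = Study("trial-A", randomized=True, design="cluster", follow_up_months=6)
print(screen(trial, strict))
print(screen(trial, liberal))
```

Returning the list of reasons, rather than a bare yes/no, keeps a record of why each study was excluded, which is useful when reporting the selection process.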

MECIR Box 3.3.b Relevant expectations for conduct of intervention reviews

Including randomized trials

if it is feasible to conduct them to evaluate the interventions and outcomes of interest.

Randomized trials are the best study design for evaluating the efficacy of interventions. If it is feasible to conduct them to evaluate questions that are being addressed by the review, they must be considered eligible for the review. However, appropriate exclusion criteria may be put in place, for example regarding length of follow-up.

3.3.2 Including non-randomized studies

Whether non-randomized studies (and which types) will be included is decided alongside the formulation of the review PICO. The main drivers that may lead to the inclusion of non-randomized studies are: (i) randomized trials being unable to address the effects of the intervention on harm and long-term outcomes, or in specific populations or settings; and (ii) interventions that cannot be randomized (e.g. a policy change introduced in a single jurisdiction or a small number of jurisdictions) (see Chapter 24). Cochrane, in collaboration with others, has developed guidance for review authors to support their decision about when to look for and include non-randomized studies (Schünemann et al 2013).

Non-randomized designs have the commonality of not using randomization to allocate units to comparison groups, but their different design features mean that they are variable in their susceptibility to bias. Eligibility criteria should be based on explicit study design features, and not the study labels applied by the primary researchers (e.g. case-control, cohort), which are often used inconsistently (Reeves et al 2017; see Chapter 24 ).
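Basing eligibility on design features rather than study labels can be sketched as follows. This is an illustration only: the `DesignFeatures` fields below are hypothetical examples of features a review team might pre-specify, and they are not the items of the checklist described by Reeves et al 2017.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignFeatures:
    """Illustrative design features; not the official checklist items."""
    author_label: str                    # label used by the primary researchers
    has_concurrent_comparison: bool      # comparison group studied over the same period
    outcome_measured_at_baseline: bool   # outcome assessed before the intervention


def eligible_by_features(study: DesignFeatures) -> bool:
    """Decide eligibility from design features only; the author label is ignored."""
    return study.has_concurrent_comparison and study.outcome_measured_at_baseline


# Two studies carrying the same label but with different design features.
study_a = DesignFeatures("cohort", has_concurrent_comparison=True,
                         outcome_measured_at_baseline=True)
study_b = DesignFeatures("cohort", has_concurrent_comparison=False,
                         outcome_measured_at_baseline=True)

print(eligible_by_features(study_a), eligible_by_features(study_b))
```

Both studies are labelled "cohort", yet only the first meets the assumed criteria, which is the inconsistency that feature-based eligibility is meant to avoid.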

When non-randomized studies are included, review authors should consider how the studies will be grouped and used in the synthesis. The Cochrane Non-randomized Studies Methods Group taxonomy of design features (see Chapter 24 ) may provide a basis for grouping together studies that are expected to have similar inferential strength and for providing a consistent language for describing the study design.

Once decisions have been made about grouping study designs, review authors need to plan how these groups will be used in the synthesis. They need to decide whether it is useful to synthesize results from non-randomized studies and, if so, whether results from randomized trials and non-randomized studies should be included in the same synthesis (for the purpose of examining whether study design explains heterogeneity among the intervention effects), or whether the effects should be synthesized in separate comparisons (Valentine and Thompson 2013). Decisions should be made for each of the different types of non-randomized studies under consideration. Review authors might anticipate increased heterogeneity when non-randomized studies are synthesized, and adoption of a meta-analysis model that encompasses heterogeneity, such as a random-effects model (see Chapter 10, Section 10.10.4), is wise (Valentine and Thompson 2013). For further discussion of non-randomized studies, see Chapter 24.
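To illustrate synthesizing effects in separate comparisons with a model that allows for heterogeneity, the sketch below pools effect estimates within each study design group using a random-effects model. It assumes the DerSimonian-Laird estimator of the between-study variance, which is one common choice and is not prescribed by the text above; the function names and the example numbers are hypothetical.

```python
import math
from collections import defaultdict


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate, its standard error, and tau^2."""
    if not effects or len(effects) != len(variances):
        raise ValueError("need equally long, non-empty effects and variances")
    k = len(effects)
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Between-study variance, truncated at zero; set to zero for a single study.
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


def pool_by_design(results):
    """Pool (design, effect, variance) triples separately for each design."""
    grouped = defaultdict(lambda: ([], []))
    for design, effect, variance in results:
        grouped[design][0].append(effect)
        grouped[design][1].append(variance)
    return {d: dersimonian_laird(e, v) for d, (e, v) in grouped.items()}


# Made-up effect estimates (e.g. log odds ratios) and their variances.
results = [
    ("randomized", 0.5, 0.1),
    ("randomized", 0.5, 0.1),
    ("non-randomized", 0.8, 0.2),
]
for design, (pooled, se, tau2) in pool_by_design(results).items():
    print(design, round(pooled, 3), round(se, 3), round(tau2, 3))
```

Keeping the designs in separate groups mirrors the option of separate comparisons; pooling all studies in one call to `dersimonian_laird` would correspond to including them in the same synthesis.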

3.4 Eligibility based on publication status and language

Chapter 4 contains detailed guidance on how to identify studies from a range of sources including, but not limited to, those in peer-reviewed journals. In general, a strategy to include studies reported in all types of publication will reduce bias ( Chapter 7 ). There would need to be a compelling argument for the exclusion of studies on the basis of their publication status ( MECIR Box 3.4.a ), including unpublished studies, partially published studies, and studies published in ‘grey’ literature sources. Given the additional challenge in obtaining unpublished studies, it is possible that any unpublished studies identified in a given review may be an unrepresentative subset of all the unpublished studies in existence. However, the bias this introduces is of less concern than the bias introduced by excluding all unpublished studies, given what is known about the impact of reporting biases (see Chapter 13 on bias due to missing studies, and Chapter 4, Section 4.3 for a more detailed discussion of searching for unpublished and grey literature).

Likewise, while searching for, and analysing, studies in any language can be extremely resource-intensive, review authors should consider carefully the implications for bias (and equity, see Chapter 16 ) if they restrict eligible studies to those published in one specific language (usually English). See Chapter 4, Section 4.4.5 , for further discussion of language and other restrictions while searching.

MECIR Box 3.4.a Relevant expectations for conduct of intervention reviews

Excluding studies based on publication status

Obtaining and including data from unpublished studies (including grey literature) can reduce the effects of publication bias. However, the unpublished studies that can be located may be an unrepresentative sample of all unpublished studies.

3.5 Chapter information

Authors: Joanne E McKenzie, Sue E Brennan, Rebecca E Ryan, Hilary J Thomson, Renea V Johnston, James Thomas

Acknowledgements: This chapter builds on earlier versions of the Handbook, in particular Version 5, Chapter 5, edited by Denise O’Connor, Sally Green and Julian Higgins.

Funding: JEM is supported by an Australian National Health and Medical Research Council (NHMRC) Career Development Fellowship (1143429). SEB and RER’s positions are supported by the NHMRC Cochrane Collaboration Funding Program. HJT is funded by the UK Medical Research Council (MC_UU_12017-13 and MC_UU_12017-15) and Scottish Government Chief Scientist Office (SPHSU13 and SPHSU15). RVJ’s position is supported by the NHMRC Cochrane Collaboration Funding Program and Cabrini Institute. JT is supported by the National Institute for Health Research (NIHR) Collaboration for Leadership in Applied Health Research and Care North Thames at Barts Health NHS Trust. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

3.6 References

Bailey JV, Murray E, Rait G, Mercer CH, Morris RW, Peacock R, Cassell J, Nazareth I. Interactive computer-based interventions for sexual health promotion. Cochrane Database of Systematic Reviews 2010; 9 : CD006483.

Bender R, Bunce C, Clarke M, Gates S, Lange S, Pace NL, Thorlund K. Attention should be given to multiplicity issues in systematic reviews. Journal of Clinical Epidemiology 2008; 61 : 857–865.

Buchbinder R, Johnston RV, Rischin KJ, Homik J, Jones CA, Golmohammadi K, Kallmes DF. Percutaneous vertebroplasty for osteoporotic vertebral compression fracture. Cochrane Database of Systematic Reviews 2018; 4 : CD006349.

Caldwell DM, Welton NJ. Approaches for synthesising complex mental health interventions in meta-analysis. Evidence-Based Mental Health 2016; 19 : 16–21.

Chamberlain C, O’Mara-Eves A, Porter J, Coleman T, Perlen S, Thomas J, McKenzie J. Psychosocial interventions for supporting women to stop smoking in pregnancy. Cochrane Database of Systematic Reviews 2017; 2 : CD001055.

Ciciriello S, Johnston RV, Osborne RH, Wicks I, deKroo T, Clerehan R, O’Neill C, Buchbinder R. Multimedia educational interventions for consumers about prescribed and over-the-counter medications. Cochrane Database of Systematic Reviews 2013; 4 : CD008416.

Cochrane Consumers & Communication Group. Outcomes of Interest to the Cochrane Consumers & Communication Group: taxonomy. http://cccrg.cochrane.org/ .

COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative. COSMIN database of systematic reviews of outcome measurement instruments. https://database.cosmin.nl/ .

Coulter A, Entwistle VA, Eccles A, Ryan S, Shepperd S, Perera R. Personalised care planning for adults with chronic or long-term health conditions. Cochrane Database of Systematic Reviews 2015; 3 : CD010523.

Davey J, Turner RM, Clarke MJ, Higgins JPT. Characteristics of meta-analyses and their component studies in the Cochrane Database of Systematic Reviews: a cross-sectional, descriptive analysis. BMC Medical Research Methodology 2011; 11 : 160.

Desroches S, Lapointe A, Ratte S, Gravel K, Legare F, Turcotte S. Interventions to enhance adherence to dietary advice for preventing and managing chronic diseases in adults. Cochrane Database of Systematic Reviews 2013; 2 : CD008722.

Deyo RA, Dworkin SF, Amtmann D, Andersson G, Borenstein D, Carragee E, Carrino J, Chou R, Cook K, DeLitto A, Goertz C, Khalsa P, Loeser J, Mackey S, Panagis J, Rainville J, Tosteson T, Turk D, Von Korff M, Weiner DK. Report of the NIH Task Force on research standards for chronic low back pain. Journal of Pain 2014; 15 : 569–585.

Dodd S, Clarke M, Becker L, Mavergames C, Fish R, Williamson PR. A taxonomy has been developed for outcomes in medical research to help improve knowledge discovery. Journal of Clinical Epidemiology 2018; 96 : 84–92.

Fisher DJ, Carpenter JR, Morris TP, Freeman SC, Tierney JF. Meta-analytical methods to identify who benefits most from treatments: daft, deluded, or deft approach? BMJ 2017; 356 : j573.

Fransen M, McConnell S, Harmer AR, Van der Esch M, Simic M, Bennell KL. Exercise for osteoarthritis of the knee. Cochrane Database of Systematic Reviews 2015; 1 : CD004376.

Guise JM, Chang C, Viswanathan M, Glick S, Treadwell J, Umscheid CA. Systematic reviews of complex multicomponent health care interventions. Report No. 14-EHC003-EF . Rockville, MD: Agency for Healthcare Research and Quality; 2014.

Harrison JK, Noel-Storr AH, Demeyere N, Reynish EL, Quinn TJ. Outcomes measures in a decade of dementia and mild cognitive impairment trials. Alzheimer’s Research and Therapy 2016; 8 : 48.

Hedges LV, Tipton E, Johnson MC. Robust variance estimation in meta-regression with dependent effect size estimates. Research Synthesis Methods 2010; 1 : 39–65.

Hetrick SE, McKenzie JE, Cox GR, Simmons MB, Merry SN. Newer generation antidepressants for depressive disorders in children and adolescents. Cochrane Database of Systematic Reviews 2012; 11 : CD004851.

Higgins JPT, López-López JA, Becker BJ, Davies SR, Dawson S, Grimshaw JM, McGuinness LA, Moore THM, Rehfuess E, Thomas J, Caldwell DM. Synthesizing quantitative evidence in systematic reviews of complex health interventions. BMJ Global Health 2019; 4 : e000858.

Hoffmann T, Glasziou P, Barbour V, Macdonald H. Better reporting of interventions: template for intervention description and replication (TIDieR) checklist and guide. BMJ 2014; 1687 : 1-13.

Hollands GJ, Shemilt I, Marteau TM, Jebb SA, Lewis HB, Wei Y, Higgins JPT, Ogilvie D. Portion, package or tableware size for changing selection and consumption of food, alcohol and tobacco. Cochrane Database of Systematic Reviews 2015; 9 : CD011045.

Howe TE, Shea B, Dawson LJ, Downie F, Murray A, Ross C, Harbour RT, Caldwell LM, Creed G. Exercise for preventing and treating osteoporosis in postmenopausal women. Cochrane Database of Systematic Reviews 2011; 7 : CD000333.

ICHOM. The International Consortium for Health Outcomes Measurement 2018. http://www.ichom.org/ .

IPDAS. International Patient Decision Aid Standards Collaboration (IPDAS) standards. www.ipdas.ohri.ca .

Ivers N, Jamtvedt G, Flottorp S, Young JM, Odgaard-Jensen J, French SD, O’Brien MA, Johansen M, Grimshaw J, Oxman AD. Audit and feedback: effects on professional practice and healthcare outcomes. Cochrane Database of Systematic Reviews 2012; 6 : CD000259.

Janmaat VT, Steyerberg EW, van der Gaast A, Mathijssen RH, Bruno MJ, Peppelenbosch MP, Kuipers EJ, Spaander MC. Palliative chemotherapy and targeted therapies for esophageal and gastroesophageal junction cancer. Cochrane Database of Systematic Reviews 2017; 11 : CD004063.

Kendrick D, Kumar A, Carpenter H, Zijlstra GAR, Skelton DA, Cook JR, Stevens Z, Belcher CM, Haworth D, Gawler SJ, Gage H, Masud T, Bowling A, Pearl M, Morris RW, Iliffe S, Delbaere K. Exercise for reducing fear of falling in older people living in the community. Cochrane Database of Systematic Reviews 2014; 11 : CD009848.

Kirkham JJ, Gargon E, Clarke M, Williamson PR. Can a core outcome set improve the quality of systematic reviews? A survey of the Co-ordinating Editors of Cochrane Review Groups. Trials 2013; 14 : 21.

Konstantopoulos S. Fixed effects and variance components estimation in three-level meta-analysis. Research Synthesis Methods 2011; 2 : 61–76.

Lamb SE, Becker C, Gillespie LD, Smith JL, Finnegan S, Potter R, Pfeiffer K. Reporting of complex interventions in clinical trials: development of a taxonomy to classify and describe fall-prevention interventions. Trials 2011; 12 : 125.

Lewin S, Hendry M, Chandler J, Oxman AD, Michie S, Shepperd S, Reeves BC, Tugwell P, Hannes K, Rehfuess EA, Welch V, McKenzie JE, Burford B, Petkovic J, Anderson LM, Harris J, Noyes J. Assessing the complexity of interventions within systematic reviews: development, content and use of a new tool (iCAT_SR). BMC Medical Research Methodology 2017; 17 : 76.

López-López JA, Page MJ, Lipsey MW, Higgins JPT. Dealing with multiplicity of effect sizes in systematic reviews and meta-analyses. Research Synthesis Methods 2018; 9 : 336–351.

Mavridis D, Salanti G. A practical introduction to multivariate meta-analysis. Statistical Methods in Medical Research 2013; 22 : 133–158.

Michie S, van Stralen M, West R. The Behaviour Change Wheel: a new method for characterising and designing behaviour change interventions. Implementation Science 2011; 6 : 42.

Michie S, Richardson M, Johnston M, Abraham C, Francis J, Hardeman W, Eccles MP, Cane J, Wood CE. The behavior change technique taxonomy (v1) of 93 hierarchically clustered techniques: building an international consensus for the reporting of behavior change interventions. Annals of Behavioral Medicine 2013; 46 : 81–95.

Moraes VY, Lenza M, Tamaoki MJ, Faloppa F, Belloti JC. Platelet-rich therapies for musculoskeletal soft tissue injuries. Cochrane Database of Systematic Reviews 2014; 4 : CD010071.

O'Neill J, Tabish H, Welch V, Petticrew M, Pottie K, Clarke M, Evans T, Pardo Pardo J, Waters E, White H, Tugwell P. Applying an equity lens to interventions: using PROGRESS ensures consideration of socially stratifying factors to illuminate inequities in health. Journal of Clinical Epidemiology 2014; 67 : 56–64.

Pompoli A, Furukawa TA, Imai H, Tajika A, Efthimiou O, Salanti G. Psychological therapies for panic disorder with or without agoraphobia in adults: a network meta-analysis. Cochrane Database of Systematic Reviews 2016; 4 : CD011004.

Pompoli A, Furukawa TA, Efthimiou O, Imai H, Tajika A, Salanti G. Dismantling cognitive-behaviour therapy for panic disorder: a systematic review and component network meta-analysis. Psychological Medicine 2018; 48 : 1–9.

Reeves BC, Wells GA, Waddington H. Quasi-experimental study designs series-paper 5: a checklist for classifying studies evaluating the effects on health interventions – a taxonomy without labels. Journal of Clinical Epidemiology 2017; 89 : 30–42.

Regnaux J-P, Lefevre-Colau M-M, Trinquart L, Nguyen C, Boutron I, Brosseau L, Ravaud P. High-intensity versus low-intensity physical activity or exercise in people with hip or knee osteoarthritis. Cochrane Database of Systematic Reviews 2015; 10 : CD010203.

Richards SH, Anderson L, Jenkinson CE, Whalley B, Rees K, Davies P, Bennett P, Liu Z, West R, Thompson DR, Taylor RS. Psychological interventions for coronary heart disease. Cochrane Database of Systematic Reviews 2017; 4 : CD002902.

Safi S, Korang SK, Nielsen EE, Sethi NJ, Feinberg J, Gluud C, Jakobsen JC. Beta-blockers for heart failure. Cochrane Database of Systematic Reviews 2017; 12 : CD012897.

Santesso N, Carrasco-Labra A, Brignardello-Petersen R. Hip protectors for preventing hip fractures in older people. Cochrane Database of Systematic Reviews 2014; 3 : CD001255.

Shepherd E, Gomersall JC, Tieu J, Han S, Crowther CA, Middleton P. Combined diet and exercise interventions for preventing gestational diabetes mellitus. Cochrane Database of Systematic Reviews 2017; 11 : CD010443.

Squires J, Valentine J, Grimshaw J. Systematic reviews of complex interventions: framing the review question. Journal of Clinical Epidemiology 2013; 66 : 1215–1222.

Stacey D, Légaré F, Lewis K, Barry MJ, Bennett CL, Eden KB, Holmes-Rovner M, Llewellyn-Thomas H, Lyddiatt A, Thomson R, Trevena L. Decision aids for people facing health treatment or screening decisions. Cochrane Database of Systematic Reviews 2017; 4 : CD001431.

Stroke Unit Trialists Collaboration. Organised inpatient (stroke unit) care for stroke. Cochrane Database of Systematic Reviews 2013; 9 : CD000197.

Taylor AJ, Jones LJ, Osborn DA. Zinc supplementation of parenteral nutrition in newborn infants. Cochrane Database of Systematic Reviews 2017; 2 : CD012561.

Valentine JC, Thompson SG. Issues relating to confounding and meta-analysis when including non-randomized studies in systematic reviews on the effects of interventions. Research Synthesis Methods 2013; 4 : 26–35.

Vaona A, Banzi R, Kwag KH, Rigon G, Cereda D, Pecoraro V, Tramacere I, Moja L. E-learning for health professionals. Cochrane Database of Systematic Reviews 2018; 1 : CD011736.

Verheyden GSAF, Weerdesteyn V, Pickering RM, Kunkel D, Lennon S, Geurts ACH, Ashburn A. Interventions for preventing falls in people after stroke. Cochrane Database of Systematic Reviews 2013; 5 : CD008728.

Weisz JR, Kuppens S, Ng MY, Eckshtain D, Ugueto AM, Vaughn-Coaxum R, Jensen-Doss A, Hawley KM, Krumholz Marchette LS, Chu BC, Weersing VR, Fordwood SR. What five decades of research tells us about the effects of youth psychological therapy: a multilevel meta-analysis and implications for science and practice. American Psychologist 2017; 72 : 79–117.

Welch V, Petkovic J, Simeon R, Presseau J, Gagnon D, Hossain A, Pardo Pardo J, Pottie K, Rader T, Sokolovski A, Yoganathan M, Tugwell P, DesMeules M. Interactive social media interventions for health behaviour change, health outcomes, and health equity in the adult population. Cochrane Database of Systematic Reviews 2018; 2 : CD012932.

Welton NJ, Caldwell DM, Adamopoulos E, Vedhara K. Mixed treatment comparison meta-analysis of complex interventions: psychological interventions in coronary heart disease. American Journal of Epidemiology 2009; 169 : 1158–1165.

Williamson PR, Altman DG, Bagley H, Barnes KL, Blazeby JM, Brookes ST, Clarke M, Gargon E, Gorst S, Harman N, Kirkham JJ, McNair A, Prinsen CAC, Schmitt J, Terwee CB, Young B. The COMET Handbook: version 1.0. Trials 2017; 18 : 280.



Inclusion and Exclusion Criteria | Examples & Definition

Published on September 17, 2022 by Kassiani Nikolopoulou . Revised on June 22, 2023.

Inclusion and exclusion criteria determine which members of the target population can or can’t participate in a research study. Collectively, they’re known as eligibility criteria , and establishing them is critical when seeking study participants for clinical trials.

This allows researchers to study the needs of a relatively homogeneous group (e.g., people with liver disease) with precision. Examples of common inclusion and exclusion criteria are:

  • Demographic characteristics: Age, gender identity, ethnicity
  • Study-specific variables: Type and stage of disease, previous treatment history, presence of chronic conditions, ability to attend follow-up study appointments, technological requirements (e.g., internet access)
  • Control variables : Fitness level, tobacco use, medications used

Failure to properly define inclusion and exclusion criteria can undermine your confidence that causal relationships exist between treatment and control groups, affecting the internal validity of your study and the generalizability ( external validity ) of your findings.

Table of contents

  • What are inclusion criteria?
  • What are exclusion criteria?
  • Examples of inclusion and exclusion criteria
  • Why are inclusion and exclusion criteria important?
  • Other interesting articles
  • Frequently asked questions

Inclusion criteria comprise the characteristics or attributes that prospective research participants must have in order to be included in the study. Common inclusion criteria can be demographic, clinical, or geographic in nature.

  • 18 to 80 years of age
  • Diagnosis of chronic heart failure at least 6 months before trial
  • On stable doses of heart failure therapies
  • Willing to return for required follow-up (posttest) visits

People who meet the inclusion criteria are then eligible to participate in the study.


Exclusion criteria comprise characteristics used to identify potential research participants who should not be included in a study. These can also include those that lead to participants withdrawing from a research study after being initially included.

In other words, individuals who meet the inclusion criteria may also possess additional characteristics that can interfere with the outcome of the study. For this reason, they must be excluded.

Typical exclusion criteria can be:

  • Ethical considerations , such as being a minor or being unable to give informed consent
  • Practical considerations, such as not being able to read

If potential participants possess any additional characteristics that can affect the results, such as another medical condition or a pregnancy, these are also often grounds for exclusion.

  • The patient requires valve or other cardiac surgery
  • The patient is unable to carry out any physical activity without discomfort
  • The patient had a stroke within three months prior to enrollment
  • The patient refuses to give informed consent
  • The patient is a candidate for coronary bypass surgery or something similar

People who meet one or more of the exclusion criteria must be disqualified. This means that they can’t participate in the study even if they meet the inclusion criteria.
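The rule just described (every inclusion criterion must be met and no exclusion criterion may apply) can be sketched in Python. The field names in the participant record and the subset of criteria shown are hypothetical, loosely based on the heart failure examples above.

```python
def is_eligible(participant, inclusion, exclusion):
    """Eligible only if every inclusion check passes and no exclusion check does."""
    meets_all_inclusion = all(check(participant) for check in inclusion)
    meets_any_exclusion = any(check(participant) for check in exclusion)
    return meets_all_inclusion and not meets_any_exclusion


# Hypothetical checks loosely based on the heart failure examples.
inclusion = [
    lambda p: 18 <= p["age"] <= 80,
    lambda p: p["months_since_diagnosis"] >= 6,
    lambda p: p["on_stable_therapy"],
    lambda p: p["willing_to_attend_follow_up"],
]
exclusion = [
    lambda p: p["needs_cardiac_surgery"],
    lambda p: p["stroke_within_3_months"],
    lambda p: not p["gives_informed_consent"],
]

candidate = {
    "age": 64,
    "months_since_diagnosis": 14,
    "on_stable_therapy": True,
    "willing_to_attend_follow_up": True,
    "needs_cardiac_surgery": False,
    "stroke_within_3_months": False,
    "gives_informed_consent": True,
}

print(is_eligible(candidate, inclusion, exclusion))
# The same person with a recent stroke meets an exclusion criterion.
print(is_eligible({**candidate, "stroke_within_3_months": True}, inclusion, exclusion))
```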

It is important that researchers clearly define the appropriate inclusion and exclusion criteria prior to recruiting participants for their experiment or trial.

Here are some examples of effective and ineffective ways to phrase your criteria:

Inclusion criteria

Bad example: “Subjects will be included in the study if they have insomnia.”

This is too vague. How are you going to establish that participants have insomnia?

Good example: “Subjects will be included in the study if they have been diagnosed with insomnia by a physician and have had symptoms (i.e., trouble falling and/or staying asleep) for at least 3 nights a week for a minimum of 3 months.”

Here, the diagnosis and symptoms are clear. Specifying the time frame ensures that the condition (insomnia) is more likely to be stable throughout the study.

Exclusion criteria

Bad example: “Subjects will be excluded from the study if they are taking medications.”

This is too broad. There are many forms of medication, and some surely will not interfere with your study results. Excluding anyone who is using any type of medication—be it painkillers, birth control, or antidepressants—makes recruitment of study participants for your sample difficult. This, in turn, affects the feasibility of your study.

Good example: “Subjects will be excluded from the study if they are currently on any medication affecting sleep, prescription drugs, or other drugs that in the opinion of the research team may interfere with the results of the study.”

Researchers review inclusion and exclusion criteria with each potential participant to determine their eligibility.

Defining inclusion and exclusion criteria is important in any type of research that examines characteristics of a specific subset of a population . This helps researchers identify the study population in a consistent, reliable, and objective manner. As a result, study participants are more likely to have the attributes that will make it possible to robustly answer the research question .

In clinical trials, establishing inclusion and exclusion criteria minimizes the likelihood of harming participants (e.g., excluding pregnant women) and safeguards vulnerable individuals from exploitation (e.g., excluding individuals who are unable to comprehend what the research entails). Ethical considerations like these are critical in human-based research.

The main goal of clinical trials is to prove that a medication is safe and effective when used by the target population it was designed for. Therefore, ensuring that study participants are representative of the target population is crucial to the success of the study.

By applying inclusion and exclusion criteria to recruit participants, researchers can ensure that participants are indeed representative of the target population, ensuring external validity . Relatedly, defining robust inclusion and exclusion criteria strengthens your claim that causal relationships exist between your treatment and control groups , ensuring internal validity .

Strong inclusion and exclusion criteria also help other researchers, because they can follow what you did and how you selected participants, allowing them to accurately replicate or reproduce your study.

Ethnographies and a few other types of qualitative research do not usually specify exclusion criteria. However, inclusion criteria help researchers define the community of interest—for example, users of Apple watches. In this way, they can find individuals who have attributes that can help them meet the research objectives .


Internal validity is the degree of confidence that the causal relationship you are testing is not influenced by other factors or variables.

External validity is the extent to which your results can be generalized to other contexts.

The validity of your experiment depends on your experimental design .

Inclusion and exclusion criteria are predominantly used in non-probability sampling . In purposive sampling and snowball sampling , restrictions apply as to who can be included in the sample .

Inclusion and exclusion criteria are typically presented and discussed in the methodology section of your thesis or dissertation .

Cite this Scribbr article


Nikolopoulou, K. (2023, June 22). Inclusion and Exclusion Criteria | Examples & Definition. Scribbr. Retrieved September 16, 2024, from https://www.scribbr.com/methodology/inclusion-exclusion-criteria/


Systematic Reviews for Health Sciences and Medicine


Inclusion and Exclusion Criteria

Inclusion and exclusion criteria set the boundaries for the systematic review. They are determined after setting the research question and usually before the search is conducted; however, scoping searches may need to be undertaken to determine appropriate criteria. Many different factors can be used as inclusion or exclusion criteria. Information about the inclusion and exclusion criteria is usually recorded as a paragraph or table within the methods section of the systematic review. It may also be necessary to give the definitions, and the source of each definition, used for particular concepts in the research question (e.g. adolescence, depression).


Other inclusion/exclusion criteria can include the sample size, method of sampling or availability of a relevant comparison group in the study. Where a single study is reported across multiple papers, the findings from the papers may be merged or only the latest data may be included.
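The second option, keeping only the latest data when one study is reported in several papers, can be sketched as below. The record fields (`study_id`, `year`, `title`) and the example reports are hypothetical; in practice, linking papers to a study usually relies on trial registration numbers, author teams and settings, which this sketch assumes has already been done.

```python
def latest_report_per_study(reports):
    """Keep, for each study, the report with the most recent publication year."""
    latest = {}
    for report in reports:
        current = latest.get(report["study_id"])
        if current is None or report["year"] > current["year"]:
            latest[report["study_id"]] = report
    return latest


# Hypothetical reports; two of them describe the same underlying study.
reports = [
    {"study_id": "S1", "year": 2015, "title": "Main results"},
    {"study_id": "S1", "year": 2018, "title": "Three-year follow-up"},
    {"study_id": "S2", "year": 2016, "title": "Main results"},
]

selected = latest_report_per_study(reports)
print({study: report["title"] for study, report in selected.items()})
```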

  • Last Updated: Aug 27, 2024 2:17 PM
  • URL: https://unimelb.libguides.com/sysrev


  • Volume 19, Issue 1
  • Reviewing the literature

  • Joanna Smith 1,
  • Helen Noble 2
  • 1 School of Healthcare, University of Leeds, Leeds, UK
  • 2 School of Nursing and Midwifery, Queen's University Belfast, Belfast, UK
  • Correspondence to Dr Joanna Smith, School of Healthcare, University of Leeds, Leeds LS2 9JT, UK; j.e.smith1{at}leeds.ac.uk

https://doi.org/10.1136/eb-2015-102252


Implementing evidence into practice requires nurses to identify, critically appraise and synthesise research. This may require a comprehensive literature review: this article aims to outline the approaches and stages required and provides a working example of a published review.

Are there different approaches to undertaking a literature review?

What stages are required to undertake a literature review?

The rationale for the review should be established; consider why the review is important and relevant to patient care/safety or service delivery. For example, Noble et al's 4 review sought to understand and make recommendations for practice and research in relation to dialysis refusal and withdrawal in patients with end-stage renal disease, an area of care previously poorly described. If appropriate, highlight relevant policies and theoretical perspectives that might guide the review. Once the key issues related to the topic, including the challenges encountered in clinical practice, have been identified, formulate a clear question, and/or develop an aim and specific objectives. The type of review undertaken is influenced by the purpose of the review and resources available. However, the stages or methods used to undertake a review are similar across approaches and include:

Formulating clear inclusion and exclusion criteria, for example, patient groups, ages, conditions/treatments, sources of evidence/research designs;

Justifying databases and years searched, and whether strategies including hand searching of journals, conference proceedings and research not indexed in databases (grey literature) will be undertaken;

Developing search terms; the PICO (P: patient, problem or population; I: intervention; C: comparison; O: outcome) framework is a useful guide when developing search terms;

Developing search skills (eg, understanding Boolean operators, in particular the use of AND/OR) and knowledge of how databases index topics (eg, MeSH headings). Working with a librarian experienced in undertaking health searches is invaluable when developing a search.
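To illustrate how AND/OR combine when search terms are organised by PICO, here is a minimal sketch. It assumes the common convention of joining synonyms for one concept with OR and joining the concepts with AND; the function name and the example terms are hypothetical, and real databases add their own syntax (field tags, subject headings, truncation) that this sketch does not attempt to reproduce.

```python
def build_boolean_query(concepts):
    """Join synonyms with OR inside each concept, then join concepts with AND."""
    groups = []
    for terms in concepts.values():
        if not terms:
            continue  # a PICO element with no search terms is simply left out
        # Quote multi-word phrases so they are searched as a unit.
        quoted = [f'"{term}"' if " " in term else term for term in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical terms for illustration only.
pico_terms = {
    "population": ["adolescent", "teenager"],
    "intervention": ["cognitive behavioural therapy", "CBT"],
    "comparison": [],
    "outcome": ["depression"],
}

query = build_boolean_query(pico_terms)
print(query)
```

OR widens the search within a concept, while AND narrows it by requiring every concept to be present. A librarian can help translate a draft like this into the syntax of a specific database.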

Once studies are selected, the quality of the research/evidence requires evaluation. Using a quality appraisal tool, such as the Critical Appraisal Skills Programme (CASP) tools, 5 results in a structured approach to assessing the rigour of studies being reviewed. 3 Approaches to data synthesis for quantitative studies may include a meta-analysis (statistical analysis of data from multiple studies of similar designs that have addressed the same question), or findings can be reported descriptively. 6 Methods applicable for synthesising qualitative studies include meta-ethnography (themes and concepts from different studies are explored and brought together using approaches similar to qualitative data analysis methods), narrative summary, thematic analysis and content analysis. 7 Table 1 outlines the stages undertaken for a published review that summarised research about parents’ experiences of living with a child with a long-term condition. 8

An example of rapid evidence assessment review

In summary, the type of literature review depends on the review purpose. For the novice reviewer, undertaking a review can be a daunting and complex process; by following the stages outlined and being systematic, a robust review is achievable. The importance of literature reviews should not be underestimated: they help summarise and make sense of an increasingly vast body of research, promoting best evidence-based practice.

  • Centre for Reviews and Dissemination. Guidance for undertaking reviews in health care. 3rd edn. York: CRD, York University, 2009.
  • Canadian Best Practices Portal. http://cbpp-pcpe.phac-aspc.gc.ca/interventions/selected-systematic-review-sites/ (accessed 7.8.2015).
  • Bridges J, et al
  • Critical Appraisal Skills Programme (CASP). http://www.casp-uk.net/ (accessed 7.8.2015).
  • Dixon-Woods M,
  • Shaw R, et al
  • Agarwal S,
  • Jones D, et al
  • Cheater F,

Competing interests None declared.

Systematic Review Overview : 4. Apply Inclusion and Exclusion Criteria

Step 4. Apply Inclusion and Exclusion Criteria

At the beginning of large systematic reviews, researchers discuss and develop a series of inclusion and exclusion criteria to fit in with their review question and/or the brief provided by whoever is funding the project.

Systematic reviews often exclude studies if they do not conform to specific study designs, are not written in English, or were not published within a certain time frame. As a researcher, you should be cautious of any bias you might introduce into the review by adding certain inclusion or exclusion criteria. For example, limiting the review to studies in English may miss important studies published in other languages, leading to language bias.

All decisions to include or exclude certain studies or groups of studies should be documented in the methods section of the research proposal/protocol - this way it can be demonstrated that a systematic process has been followed.

In large systematic reviews, the inclusion/exclusion criteria are applied by at least 2 reviewers to all the studies retrieved by the literature search. A strategy to resolve any disagreements between the reviewers should be outlined in the protocol, such as bringing in a third screener.

There are two levels of the screening process. The first level of screening involves scanning the titles and abstracts of the articles; those that are clearly irrelevant can be excluded.

Full-text papers are obtained for the remaining articles and the criteria are applied again for the second level of screening on the full text. Those that meet the criteria are included in the review (although sometimes, if too many papers are obtained, the question and criteria are refined and the process repeated). At this stage of screening, the reason(s) for exclusion must be recorded. This process is represented by the following flow diagram (see PRISMA Flow Diagram).
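The two levels of screening can be sketched as a small pipeline that also keeps the counts a flow diagram needs. This is an illustration under stated assumptions rather than a description of any screening tool: the record fields (`abstract`, `design`), the two decision functions, and the count labels are all hypothetical.

```python
from collections import Counter


def screen(records, is_clearly_irrelevant, full_text_exclusion_reason):
    """Level 1 on title/abstract, then level 2 on full text with recorded reasons."""
    passed_level1 = [r for r in records if not is_clearly_irrelevant(r)]

    included, reasons = [], Counter()
    for record in passed_level1:
        reason = full_text_exclusion_reason(record)  # None means "meets the criteria"
        if reason is None:
            included.append(record)
        else:
            reasons[reason] += 1

    counts = {
        "records_screened": len(records),
        "excluded_title_abstract": len(records) - len(passed_level1),
        "full_text_assessed": len(passed_level1),
        "excluded_full_text": sum(reasons.values()),
        "included": len(included),
    }
    return included, counts, dict(reasons)


# Hypothetical records and decision rules, for illustration only.
records = [
    {"id": 1, "abstract": "pressure ulcer prevention trial", "design": "RCT"},
    {"id": 2, "abstract": "crop irrigation methods", "design": "RCT"},
    {"id": 3, "abstract": "pressure ulcer case series", "design": "case series"},
]


def is_clearly_irrelevant(record):
    return "pressure ulcer" not in record["abstract"]


def full_text_exclusion_reason(record):
    return None if record["design"] == "RCT" else "ineligible study design"


included, counts, reasons = screen(records, is_clearly_irrelevant, full_text_exclusion_reason)
print(counts)
print(reasons)
```

In practice both decision functions stand in for human judgement by the reviewers; the point of the sketch is only that a reason is stored for every full-text exclusion and that the counts at each stage add up.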

Key Points Regarding Study Selection

  • Section 1.3.2. Process for Study selection (http://www.york.ac.uk/inst/crd/pdf/Systematic_Reviews.pdf, actual page #35)
  • Studies should be selected in an unbiased way, based on selection criteria that flow directly from the review questions, and that have been piloted to check that they can be reliably applied.
  • Study selection is a staged process involving sifting through the citations located by the search, retrieving full reports of potentially relevant citations and, from their assessment, identifying those studies that fulfill the inclusion criteria.
  • Parallel independent assessments should be conducted to minimize the risk of errors of judgment. If disagreements occur between reviewers, they should be resolved according to a predefined strategy using consensus and arbitration as appropriate.
  • The study selection process should be documented, detailing reasons for inclusion and exclusion.
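Parallel independent assessment, as described in the key points above, implies comparing two sets of decisions and setting aside the disagreements for the predefined resolution strategy. A minimal sketch follows, assuming each reviewer's decisions are stored as a mapping from record ID to "include" or "exclude"; the names and data are hypothetical.

```python
def compare_decisions(reviewer_a, reviewer_b):
    """Split record IDs into agreed decisions and conflicts needing resolution."""
    agreed, conflicts = {}, []
    # Only records assessed by both reviewers can be compared.
    for record_id in sorted(reviewer_a.keys() & reviewer_b.keys()):
        if reviewer_a[record_id] == reviewer_b[record_id]:
            agreed[record_id] = reviewer_a[record_id]
        else:
            conflicts.append(record_id)
    return agreed, conflicts


# Hypothetical decisions from two independent screeners.
reviewer_a = {"r1": "include", "r2": "exclude", "r3": "include"}
reviewer_b = {"r1": "include", "r2": "include", "r3": "include"}

agreed, conflicts = compare_decisions(reviewer_a, reviewer_b)
print(conflicts)
```

The records listed in `conflicts` are the ones to take to consensus discussion or to a third screener, as set out in the protocol.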

Tips to Improve Inter-Rater Reliability / Screener Selection Accuracy

While awaiting search strategy development and final citation results:

  • Provide clear and explicit inclusion and exclusion criteria, with definitions and explanations where warranted.
  • Conduct thorough training for all involved.
  • Provide clear guidelines which should be reviewed by all prior to starting the activity.
  • Provide pilot testing or beta testing of screening tools/procedures, using samples/subsets of real data (with test inter-rater reliability calculations to determine preliminary agreement or variability).
  • Optional: pilot or beta test screeners in pairs: one screener with previous experience paired with a more novice screener.
  • Conduct ongoing, active surveillance/auditing of activities (can see if/when going off course)
  • Provide ongoing opportunities for discussion, education, and training.
  • The Screening Phase for Reviews Tutorial (5 min+) This tutorial presents information on the screening process for systematic reviews or other knowledge syntheses, and contains a variety of resources including guidelines, best practices, tips, and tools for successfully preparing to complete this important research stage.
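The tips above mention inter-rater reliability calculations during pilot testing without naming a statistic. One widely used option is Cohen's kappa, which adjusts the observed agreement between two screeners for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below assumes that statistic and uses hypothetical pilot decisions.

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(ratings_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of each rater's label proportions, summed over labels.
    labels = set(ratings_a) | set(ratings_b)
    expected = sum(
        (ratings_a.count(label) / n) * (ratings_b.count(label) / n) for label in labels
    )

    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical pilot screening decisions on ten records.
screener_a = ["include", "include", "exclude", "exclude", "include",
              "exclude", "exclude", "exclude", "include", "exclude"]
screener_b = ["include", "include", "exclude", "exclude", "exclude",
              "exclude", "exclude", "exclude", "include", "include"]

kappa = cohens_kappa(screener_a, screener_b)
print(round(kappa, 3))
```

A low value in the pilot suggests that the criteria or their definitions need clarifying before full screening begins.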
  • Last Updated: Sep 12, 2024 1:14 PM
  • URL: https://guides.hsict.library.utoronto.ca/c.php?g=699108

Scoping Reviews

Eligibility Criteria

Specify study characteristics (e.g., PICOS, length of follow-up) and report characteristics (e.g., years considered, language, publication status) used as criteria for eligibility, giving rationale. 

Think about criteria that will be used to select articles for your literature review based on your research question.  These are commonly known as  inclusion criteria  and  exclusion criteria .  You may introduce bias into the final review if these are not used thoughtfully. 

According to the PRISMA-ScR Checklist, item 6, authors should "specify characteristics of the sources of evidence used as eligibility criteria (e.g., years considered, language, and publication status), and provide a rationale."

Inclusion criteria are the elements of an article that must be present in order for it to be eligible for inclusion in a literature review.  

For example, included studies must:

  • have compared certain treatments
  • be experimental or observational or both
  • have been published in a certain timeframe (must have compelling reason)
  • be certain publication type(s)
  • have recruited a certain population

Exclusion criteria are the elements of an article that disqualify the study from inclusion in a literature review.  

For example, excluded studies: 

  • used qualitative methodology
  • used a certain study design (e.g., observational)
  • are a certain publication type (e.g., systematic reviews)
  • were published before a certain year (must have compelling reason)
  • used animal models
  • were published in a language other than English
  • Last Updated: Sep 12, 2024 4:02 PM
  • URL: https://musc.libguides.com/scopingreviews

Systematic Reviews: Inclusion and Exclusion Criteria

Defining Inclusion/Exclusion Criteria

An important part of the SR process is defining what will and will not be included in your review. 

Inclusion and exclusion criteria are developed after a research question is finalized but before a search is carried out. They determine the limits for the evidence synthesis and are typically reported in the methods section of the publication. For unfamiliar or unclear concepts, a definition may be necessary to adequately describe the criterion for readers. 

Some examples of common inclusion/exclusion criteria might be:

  • Date of publication : only articles published in the last ten years
  • Exposure to an intervention or a specific health condition : only people who have followed the DASH diet
  • Language of Publication* : only English-language articles
  • Settings : Hospitals, nursing homes, schools
  • Geography : specific locations such as states, countries, or specific populations

*Note of caution: research is published all over the world and in multiple languages. Limiting the review to English can introduce bias into your research.

  • Common Inclusion/Exclusion Criteria from the University of Melbourne

What happens if no study meets my inclusion/exclusion criteria?

Empty reviews occur when no studies meet the inclusion criteria for a SR. Empty reviews are more likely to be subject to publication bias; however, they are important in identifying gaps in the literature.

  • Slyer, Jason T. Unanswered questions: implications of an empty review. JBI Database of Systematic Reviews and Implementation Reports: June 2016 - Volume 14 - Issue 6 - p 1-2. doi: 10.11124/JBISRIR-2016-002934
  • Schlosser, R.W., Hemsley, B., Shane, H. et al. Rapid Prompting Method and Autism Spectrum Disorder: Systematic Review Exposes Lack of Evidence. Rev J Autism Dev Disord 6, 403–412 (2019).
  • Last Updated: Sep 6, 2024 1:05 PM
  • URL: https://guides.lib.lsu.edu/Systematic_Reviews

Evidence-Based Practice (EBP)

Selection Criteria

Inclusion and exclusion are two sides of the same coin.

Inclusion and exclusion criteria are determined after formulating the research question but usually before the search is conducted (although preliminary scoping searches may need to be undertaken to determine appropriate criteria).  It may be helpful to determine the inclusion criteria and exclusion criteria for each PICO component.

Be aware that you may  introduce bias  into the final review if these are not used thoughtfully. 

Inclusion and exclusion are two sides of the same coin, so—depending on your perspective—a single database filter can be said to either include or exclude. For instance, if articles must be published within the last 3 years, that is inclusion. If articles cannot be more than 3 years old, that is exclusion. 
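The three-year example can be written both ways to show that the two framings select the same articles. This is a minimal sketch; the function names, the fixed reference year, and the sample publication years are assumptions made for illustration.

```python
def include_if_recent(pub_year, current_year, window=3):
    """Inclusion framing: the article must be published within the last `window` years."""
    return current_year - pub_year <= window


def exclude_if_old(pub_year, current_year, window=3):
    """Exclusion framing: the article cannot be more than `window` years old."""
    return current_year - pub_year > window


publication_years = [2019, 2021, 2022, 2024]  # hypothetical search results
current_year = 2024                            # fixed so the example is reproducible

kept_by_inclusion = [y for y in publication_years if include_if_recent(y, current_year)]
kept_by_exclusion = [y for y in publication_years if not exclude_if_old(y, current_year)]

print(kept_by_inclusion)
print(kept_by_inclusion == kept_by_exclusion)
```

Each function is the logical negation of the other, so either wording in the methods section describes the same filter.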

The most straightforward way to include or exclude results is to use database limiters (filters), usually found on the left side of the search results page.

Inclusion Criteria

Inclusion criteria are the elements of an article  that must be present  in order for it to be eligible for inclusion in a literature review. Some examples are:

  • Included studies must have compared certain treatments
  • Included studies must be a certain type (e.g., only Randomized Controlled Trials)
  • Included studies must be located in a certain geographic area
  • Included studies must have been published in the last 5 years

Exclusion Criteria

Exclusion criteria are the elements of an article that  disqualify the study from inclusion  in a literature review. Some examples are:

  • Study used an observational design
  • Study used a qualitative methodology
  • Study was published more than 5 years ago
  • Study was published in a language other than English
  • Last Updated: May 16, 2024 2:44 PM
  • URL: https://libguides.umsl.edu/ebp

Developing the Review Question and Inclusion Criteria

Stern, Cindy PhD; Jordan, Zoe PhD; McArthur, Alexa MPHC, MClinSc

Cindy Stern is a senior research fellow in communication science at the Joanna Briggs Institute in Adelaide, South Australia, where Zoe Jordan is the acting executive director and Alexa McArthur is a senior research fellow. Stern is also the coordinator of the Cochrane Nursing Care Field, one of 12 fields within the Cochrane Collaboration supporting systematic reviews. Contact author: Cindy Stern, [email protected] . The authors have disclosed no potential conflicts of interest, financial or otherwise.

The Joanna Briggs Institute aims to inform health care decision making globally through the use of research evidence. It has developed innovative methods for appraising and synthesizing evidence; facilitating the transfer of evidence to health systems, health care professionals, and consumers; and creating tools to evaluate the impact of research on outcomes. For more on the institute's approach to weighing the evidence for practice, go to http://joannabriggs.org/jbi-approach.html .

Overview 

This article is the second in a new series on the systematic review from the Joanna Briggs Institute, an international collaborative supporting evidence-based practice in nursing, medicine, and allied health fields. The purpose of the series is to show nurses how to conduct a systematic review—one step at a time. This article details the process of articulating a review question to guide the search for relevant studies and discusses how to define inclusion criteria for the study-selection phase of the review.

This second article in a series from the Joanna Briggs Institute describes how to construct a well-built clinical question for a systematic review.

What constitutes appropriate “evidence” for evidence-based practice? This question has been the subject of considerable discussion for many years. It's also of critical importance when conducting a systematic review. The first article in this series on systematic reviews from the Joanna Briggs Institute (JBI), published last month, presented an overview and definitions of the systematic review. Briefly, a systematic review is research undertaken to identify, evaluate, and synthesize the results of individual studies on a particular topic, making reliable data available in a usable form. 1 Alan Pearson and colleagues at the JBI have written that when making decisions “clinicians (often quite subconsciously) are frequently trying to select an appropriate activity or intervention and to assess the degree to which the decision will meet the four practice interests of health professionals”—namely, whether it's feasible, appropriate, meaningful, and effective (FAME). 2

But the evidence-based practice movement has focused largely on just one of these interests, effectiveness. Pearson and colleagues have argued for a pluralistic approach when considering what counts as evidence in health care; they write that not all questions can be answered from studies measuring effectiveness alone. 3 To meet the wide array of problems health care professionals encounter, a wide range of research methodologies and a broad definition of evidence are warranted.

Constructing a well-built clinical question for a systematic review is a skill that can be learned. Reviewers can ask themselves a number of questions, such as: Is the information we seek analytical? Is it focused on a particular therapy? If so, will we examine its preventive effects in terms of quality of life? Or, conversely, will we look at its economic viability? Will the outcomes measured be meaningful enough to justify the high costs of conducting the review? Remember, the question puts the review process in motion and forms the basis for the inclusion and exclusion criteria. It therefore merits careful consideration.

THE REVIEW QUESTION

A clear question will not only guide researchers in conducting a review, it will also help readers to discern whether or not they should read it. The question also facilitates indexing in online databases such as PubMed or the Cumulative Index to Nursing and Allied Health Literature (CINAHL) and will show a clear relationship to the inclusion criteria.

If we compare the two following questions, we can see that the first is clearer in its intentions and contains more information for both reviewer and reader than the second.

  • What are the effects of turning long-term care residents every two hours compared with every four hours in preventing pressure ulcers?
  • What is the best way to prevent pressure ulcers?

Determining the question is one of the first steps in planning a systematic review because it largely establishes the conduct of the review; for example, inclusion criteria are developed as a result of the question. A good question should incorporate the four elements included in the PICO mnemonic:

  • P: Population
  • I: Intervention
  • C: Comparison intervention
  • O: Outcome measures

A variety of mnemonics exists to help reviewers structure the review question. PICO is most frequently used in quantitative reviews (those incorporating research based on traditional scientific methods that generate numerical data). 4 Its variants PICOS and PICOT, where S stands for study designs (indicating which study designs, such as randomized controlled trial [RCT] or diagnostic study, are eligible to answer your question) and T stands for time frame (a period over which outcomes are assessed, such as 24 hours after surgery), can also be used. Such mnemonics aid in the clarification of the structure of the review and its question. At the JBI 4 and the Cochrane Collaboration, 5 PICO is the preferred choice for question development. Furthermore, PICO may be used in the systematic review process to guide concept mapping when designing the search strategy. (Concept mapping is used to help identify relevant keywords and search terms for your review.)

PICo (with a lowercase o) can be equally useful for qualitative reviews (those seeking to analyze human experience and social phenomena). 4 With qualitative evidence there is no outcome or comparator to be considered. The core elements of PICo are:

  • P: Population
  • I: phenomenon of Interest
  • Co: Context

The phenomenon of interest differs from an intervention in its focus. Quantitative reviews are concerned with an intervention and seek to isolate it from the happenings and influences of study participants. Reviews containing qualitative evidence focus on the engagement between the participant and the intervention. A qualitative review may describe an intervention, but its question focuses on the perspective of the individuals experiencing it as part of a larger phenomenon.

Other mnemonics useful for qualitative reviews include SPICE (Setting, Perspective, Intervention, Comparison, Evaluation) and SPIDER (Sample, Phenomenon of Interest, Design, Evaluation, Research type). 6

Perhaps you want to ask both a quantitative and qualitative question on the same topic. For example, if you were interested in the effectiveness of compression stockings in preventing deep vein thrombosis (DVT), you might want to find studies that compare the stockings to placebo (a quantitative review), as well as explore the experiences of those who use them (a qualitative review). Although compression stockings may be effective in preventing DVT, they may be uncomfortable to use, and compliance rates may be low. Reviews that incorporate more than one type of data are called “comprehensive” or “mixed methods” systematic reviews.

Let's look at both quantitative and qualitative review questions in more detail.

THE QUANTITATIVE REVIEW: A QUESTION OF EFFECT

A solid objective will inform the identification and subsequent inclusion of studies, the data extraction, and the data synthesis. The objective can also help readers in a preliminary assessment of the review's relevance to them. Quantitative reviews conducted by JBI researchers will specify the population, the intervention, and the outcomes of interest.

When developing the review question, reviewers should consider how general the review will be with regard to the characteristics of the population (for example, “nurses” versus “female RNs with a minimum of five years’ experience”), the type of intervention (such as any drug therapy used for depression of any dosage for any duration), and the outcomes of interest (such as adverse effects or depression measured by any validated scale at any time). These details can then be added when completing the inclusion criteria. The JBI's reviewers’ manual and the Cochrane Collaboration's reviewer's handbook recommend that the following features be considered when developing a question for a quantitative review 4, 5 :

  • the most significant features of the population under investigation (such as age or illness)
  • the experimental and control interventions
  • any variations in the intervention (such as administration method or dosage) and whether studies involving such variations will be included
  • whether RCTs addressing only part of the intervention or combined with another intervention will be included

Regarding outcomes, the Cochrane Collaboration makes the following suggestions 5 :

“Outcomes may include survival (mortality), clinical events (e.g. strokes or myocardial infarction), patient-reported outcomes (e.g. symptoms, quality of life), adverse events, burdens (e.g. demands on caregivers, frequency of tests, restrictions on lifestyle) and economic outcomes (e.g. cost and resource use). It is critical that outcomes used to assess adverse effects as well as outcomes used to assess beneficial effects are among those addressed by a review…. If combinations of outcomes will be considered, these need to be specified. For example, if a study fails to make a distinction between non-fatal and fatal strokes, will these data be included in a meta-analysis if the question specifically relates to stroke death?”

“Review authors should consider how outcomes may be measured, both in terms of the type of scale likely to be used and the timing of measurement. Outcomes may be measured objectively (e.g. blood pressure, number of strokes) or subjectively as rated by a clinician, patient, or carer (e.g. disability scales). It may be important to specify whether measurement scales have been published or validated. When defining the timing of outcome measurement, authors may consider whether all time frames or only selected time-points will be included in the review.”

An example of a quantitative review question is this from the JBI database: “What is the effect of an individualized survivorship care plan as compared to usual care on quality of life on the adult female breast cancer survivor?” 7 The review question clearly satisfies all four of the PICO elements: population (adult female breast cancer survivors), intervention (individualized survivorship care plan), comparison intervention (usual care), and outcome measure (quality of life).
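The mapping from PICO elements to a review question can be made explicit with a small data structure. The sketch below is illustrative only: the class name, the question template, and the completeness check are assumptions, while the four element values are those identified in the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PICOQuestion:
    population: str
    intervention: str
    comparison: str
    outcome: str
    study_designs: Optional[str] = None  # the S in PICOS, if used
    time_frame: Optional[str] = None     # the T in PICOT, if used

    def missing_elements(self):
        """Names of the four core elements that have been left blank."""
        core = {
            "population": self.population,
            "intervention": self.intervention,
            "comparison": self.comparison,
            "outcome": self.outcome,
        }
        return [name for name, value in core.items() if not value.strip()]

    def as_question(self):
        """Render the elements with one simple, hypothetical template."""
        return (
            f"What is the effect of {self.intervention} compared with "
            f"{self.comparison} on {self.outcome} in {self.population}?"
        )


question = PICOQuestion(
    population="adult female breast cancer survivors",
    intervention="an individualized survivorship care plan",
    comparison="usual care",
    outcome="quality of life",
)

print(question.missing_elements())
print(question.as_question())
```

A draft question that leaves elements blank, such as one naming only the outcome, would show up in `missing_elements()` before any inclusion criteria are written.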

THE QUALITATIVE REVIEW: A QUESTION OF EXPERIENCE

Qualitative reviews seek “to understand the meaning of phenomena and their relationships” 8 and use the PICo mnemonic. Specifications on the population (either for inclusion or exclusion criteria) must be delineated. Although the term population is also used in qualitative reviews, its use doesn't imply that all of the features relevant to quantitative reviews such as sampling methods or homogeneity (which refers to similarity among included studies’ results) are appropriate here. Rather, population characteristics in a qualitative review relate to people’s subjective experience or the meaning that a disease or an intervention holds for them.

A phenomenon of interest is the experience, event, or process under study. Examples might include patients’ responses to pain or how they cope with breast cancer. The level of detail ascribed to the phenomenon will differ depending on the nature or intricacy of the subject. A question on the experience of older adults exercising may be rather straightforward, for example, whereas a question on the experiences of women who were sexually abused as children may lend itself to a more complex kind of detail. Regardless, the question may be clarified, expanded, or revised as the protocol develops.

In reviews containing qualitative evidence, context will also vary; it will depend on the review's objective and questions. When determining context, reviewers may consider factors such as geographic location, interests based on race or gender, and clinical setting (such as long-term care). Remember that in qualitative reviews there is no need to list outcomes; the focus is on the experiences of the participants.

An example of a qualitative review question from the JBI database is: “What is the experience of the adult neutropenic patient with cancer being nursed in the isolation room?” 9 The review question identifies the population (adult neutropenic patients with cancer), the phenomenon of interest (the patients’ experiences while being cared for by nurses), and the context (in isolation).

THE REVIEW PROTOCOL

A good review question lays the foundation for the development of a robust protocol—that is, where you flesh out the elements of PICO or PICo in making a plan that ensures scientific rigor and minimizes bias. Regardless of whether the review involves quantitative or qualitative research (or both), criteria exist that must be addressed in the protocol (such as inclusion criteria and methods).

Inclusion criteria determine which research articles will be selected. In order for the reader to understand the focus of the review (and its limitations), the reviewers need to be precise in outlining the inclusion criteria. The following aspects should be addressed 2 :

  • the types of studies to be included (such as cohort or ethnographic studies)
  • the intervention, activity, or phenomenon under investigation (such as drug therapy for smoking cessation or the experience of smokers undertaking hypnotherapy)
  • the outcome (for quantitative questions; for example, the effectiveness of drug therapy for smoking cessation)
  • the population (such as females ages 16 years or older who have smoked for at least three years)
  • publication language (such as English only or English, simplified Chinese, and Japanese)
  • the time period (such as studies published between 1999 and 2013)

The clarity of the inclusion criteria also ensures the replicability of the review.
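
One way to make inclusion criteria precise enough to replicate is to write them down as data and apply them mechanically to every candidate record. The following is a minimal Python sketch, not part of the JBI method itself. The `Study` and `InclusionCriteria` classes, their field names, and the example values (borrowed from the illustrative bullets above) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    """A candidate record; the fields are hypothetical and mirror the list above."""
    design: str
    language: str
    year: int
    min_participant_age: int


@dataclass(frozen=True)
class InclusionCriteria:
    """Pre-specified criteria, fixed in the protocol before screening starts."""
    designs: frozenset
    languages: frozenset
    first_year: int
    last_year: int
    min_age: int

    def reasons_for_exclusion(self, study: Study) -> list:
        """Return every criterion the study fails; an empty list means 'include'."""
        reasons = []
        if study.design not in self.designs:
            reasons.append("study design")
        if study.language not in self.languages:
            reasons.append("publication language")
        if not self.first_year <= study.year <= self.last_year:
            reasons.append("time period")
        if study.min_participant_age < self.min_age:
            reasons.append("population")
        return reasons


# Example values taken from the illustrative bullets; a real protocol sets its own.
criteria = InclusionCriteria(
    designs=frozenset({"cohort", "ethnographic"}),
    languages=frozenset({"English"}),
    first_year=1999,
    last_year=2013,
    min_age=16,
)

candidate = Study(design="cohort", language="English", year=2015, min_participant_age=18)
print(criteria.reasons_for_exclusion(candidate))
```

Returning the list of failed criteria, rather than a bare yes or no, keeps a record of why each study was excluded, which is the kind of detail a reader needs to repeat the selection.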

Methods. It is important to clarify the methods you will use to search the literature, appraise the studies retrieved, and extract and synthesize the data. (These steps will be discussed in later articles in this series.)

Conclusion. While health care workers frequently want to answer very general questions, it is often easier to conduct a systematic review on a narrow, more focused question. In doing so, the final product is also more likely to present useful results that can be applied when making clinical decisions. If a reviewer is interested in a broad topic such as managing heart disease, which covers several factors (pharmacologic or surgical treatment and lifestyle modifications, for example), it is better to establish a series of questions related to that topic and conduct a series of reviews than try to cover all of them in a single review.

clinical question; inclusion criteria; qualitative review; quantitative review; review question; systematic review

Literature searching and finding evidence.

Establish your Inclusion and Exclusion criteria

These criteria help you decide which pieces of evidence (for example, which primary research studies) will/will not be included in your work. Using specific criteria will help make sure your final review is as unbiased, transparent and ethical as possible.

How to establish your Inclusion and Exclusion criteria

To establish your criteria you need to define each aspect of your question to clarify what you are focusing on, and consider if there are any variations you also wish to explore. This is where using frameworks like PICO helps:

Example:   Alternatives to drugs for controlling headaches in children.

Using the PICO structure you clarify what aspects you are most interested in. Here are some examples to consider:

    Children

A specific age group? Teenagers and adolescents?

    Alternatives to drugs

What alternatives are there? Complementary therapies? Alternative medicines? Changes in lifestyle? All three?

If you decide to focus on 'complementary therapies' do you want to examine all therapies or a specific therapy like holistic therapy?

    Drugs

All drugs that treat headaches, or a group of drugs, or a specific drug?

   Headaches

All types of headaches, or a specific type such as tension headaches or migraines?

The aspects of the topic you decide to focus on are the  Inclusion  criteria.

The aspects you don't wish to include are the  Exclusion  criteria.
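
The scoping decisions above can be recorded explicitly so that the inclusion and exclusion criteria follow directly from them. This is a small Python sketch under assumed choices: the `options` and `chosen` values are hypothetical answers to the example questions, not recommendations from this guide, and the aspect labels are illustrative.

```python
# Hypothetical options considered for each aspect of the headache example.
options = {
    "population": {"children under 12", "teenagers and adolescents"},
    "intervention": {"complementary therapies", "alternative medicines", "lifestyle changes"},
    "comparison": {"all headache drugs", "a specific drug"},
    "condition": {"tension headaches", "migraines"},
}

# Hypothetical decisions about which options the review will focus on.
chosen = {
    "population": {"teenagers and adolescents"},
    "intervention": {"complementary therapies"},
    "comparison": {"all headache drugs"},
    "condition": {"tension headaches", "migraines"},
}


def split_criteria(options, chosen):
    """Split each aspect's options into inclusion and exclusion criteria."""
    inclusion, exclusion = {}, {}
    for aspect, candidates in options.items():
        picked = chosen.get(aspect, set())
        unknown = picked - candidates
        if unknown:
            raise ValueError(f"{aspect}: {sorted(unknown)} not among the options considered")
        inclusion[aspect] = sorted(picked)
        exclusion[aspect] = sorted(candidates - picked)
    return inclusion, exclusion


inclusion, exclusion = split_criteria(options, chosen)
for aspect in options:
    print(f"{aspect}: include {inclusion[aspect]}, exclude {exclusion[aspect]}")
```

Writing the options down before choosing among them makes it visible what was deliberately left out, not only what was kept.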

  • Last Updated: Aug 23, 2024 3:26 PM
  • URL: https://libguides.city.ac.uk/SHS-Litsearchguide

Inclusion and exclusion criteria in research studies: definitions and why they matter

Affiliation.

  • 1 Methods in Epidemiologic, Clinical, and Operations Research-MECOR-program, American Thoracic Society/Asociación Latinoamericana del Tórax, Montevideo, Uruguay.
  • PMID: 29791550
  • PMCID: PMC6044655
  • DOI: 10.1590/s1806-37562018000000088

STEM Outside of School: a Meta-Analysis of the Effects of Informal Science Education on Students' Interests and Attitudes for STEM

  • Open access
  • Published: 17 September 2024

  • Xin Xia   ORCID: orcid.org/0009-0009-1717-8511 1 ,
  • Lillian R. Bentley 2 ,
  • Xitao Fan 3 &
  • Robert H. Tai 4  

This meta-analysis explores the impact of informal science education experiences (such as after-school programs, enrichment activities, etc.) on students' attitudes towards, and interest in, STEM disciplines (Science, Technology, Engineering, and Mathematics). The research addresses two primary questions: (1) What is the overall effect size of informal science learning experiences on students' attitudes towards and interest in STEM? (2) How do various moderating factors (e.g., types of informal learning experience, student grade level, academic subjects, etc.) impact student attitudes and interests in STEM? The studies included in this analysis were conducted within the United States in K-12 educational settings, over a span of thirty years (1992–2022). The findings indicate a positive association between informal science education programs and student interest in STEM. Moreover, the variability in these effects is contingent upon several moderating factors, including the nature of the informal science program, student grade level, STEM subjects, publication type, and publication year. Summarized effects of informal science education on STEM interest are delineated, and the implications for research, pedagogy, and practice are discussed.

STEM careers and opportunities, encompassing Science, Technology, Engineering, and Mathematics, are fundamental drivers of economic stability within contemporary society (Xie & Killewald, 2012 ; Xie et al., 2015 ). The United States has demonstrated a strategic commitment to fostering successive generations of STEM-focused individuals to maintain global competitiveness within the modern market (National Research Council [NRC], 2007 ). An educational priority has thus been the cultivation of student interest in STEM disciplines. It is widely acknowledged that students exhibiting such interest possess heightened prospects for navigating the STEM pipeline toward attaining a career in the field (Beier & Rittmayer, 2008 ; Business-Higher Education Forum, 2011 ; Tai et al., 2006 ). Attainment of a STEM career not only bolsters individual economic stability but also amplifies the potential for national STEM-driven innovation (Xie et al., 2015 ).

Students gain STEM experiences through interacting in formal learning environments (such as schools, colleges, and universities) and informal learning environments (everywhere else; Krishnamurthi & Rennie, 2013). On average, children spend approximately 20% of their time learning in formal educational environments (Eshach, 2007; Falk & Dierking, 2010; Sacco et al., 2014). This suggests that informal learning experiences could contain enormous potential to strengthen and enrich school STEM experiences (Bevan et al., 2010; NRC, 2009; Phillips et al., 2007). The United States has a variety of informal learning institutions for people to engage in STEM learning opportunities, including public libraries, zoos, aquariums, and museums (Falk & Dierking, 2010). Nevertheless, there is limited understanding regarding the specific types of informal experiences that ignite and sustain children's STEM learning (National Academies of Sciences, Engineering, and Medicine [NASEM], 2016). Therefore, understanding how these opportunities support STEM interest is an important area of research.

Research into the effects of informal science programs highlights that situational factors can create conflicting results. On the one hand, informal science learning has been shown to promote and strengthen student understanding of school science and interest in STEM (Bevan et al., 2010; Maiorca et al., 2021; Phillips et al., 2007). On the other hand, informal science experiences could have negative effects on students’ attitudes toward STEM (Migliarese, 2011; Shields, 2010). The inconsistency of these findings suggests that further research is needed to understand how informal science learning may affect students’ attitudes toward STEM.

The aim of this investigation was to conduct a current quantitative meta-analysis examining the impacts of informal science programs on students' attitudes and interests towards STEM disciplines. This meta-analysis offers fresh perspectives on potential determinants underlying the variation in effect sizes observed across diverse studies. These nuanced insights offer valuable guidance for educators, policymakers, and practitioners in their endeavor to create informal science learning initiatives aimed at cultivating and nurturing students' interest and attitudes towards STEM.

Literature Review

Formal and informal learning experiences.

Formal and informal learning environments can support STEM-focused learning experiences (Decoito, 2014 ; Frantz et al., 2011 ). Formal STEM learning is any experience or activity that happens within a normal class period and at a school. Informal science learning is any activity or experience that occurs outside of the regular school day, and it includes activities that are not developed as part of an ongoing school curriculum (Crane, 1994 ). Formal STEM learning experiences tend to be mandatory, whereas informal science learning experiences are mostly voluntary (Crane, 1994 ). For this meta-analysis, we include school-based field trips, student out-of-school projects (after-school camps), community-based science youth programs, visits to museums and zoos, after-school programs, summer camps, and weekend school camps as informal science activities (Hofstein & Rosenfeld, 1996 ; So et al., 2018 ).

Attitude and Interest Toward STEM

Attitude and interest in STEM have been defined in several ways. Gauld and Hukins (1980) proposed that attitudes toward scientists, scientific inquiry, science learning, science-related activities, science careers, and the adoption of scientific attitudes were all affective behaviors representing interests in science. Osborne et al. (2003) proposed that attitudes toward science are composed of feelings, beliefs, and values held about science and its impact on society. Potvin and Hasni (2014) further categorized attitudes into dimensions such as motivation, enjoyment, interest, self-efficacy in completing STEM tasks, and STEM career aspirations.

Drawing from a comprehensive examination of pertinent scholarly literature concerning attitudes toward science, and extending this framework to encompass STEM disciplines, we operationalized attitude toward STEM across four distinct dimensions: interest, self-efficacy, overall attitude toward STEM, and career interests (Wiebe et al., 2018 ). Interest in STEM pertains to the affective domain, encapsulating individuals' emotions and sentiments regarding the learning of scientific subjects (e.g., Zhang & Tang, 2017 ). Self-efficacy refers to students' perceptions of their own capabilities to excel in STEM domains, as articulated by Bandura ( 1995 ). Attitude toward STEM denotes individuals' perspectives on the value, utility, and societal ramifications of science (e.g., Dowey, 2013 ). Lastly, STEM career interest is delineated as the extent to which students can furnish meaningful insights into their aspirations for future careers within STEM disciplines (Maltese & Tai, 2011 ).

This meta-analysis extends the scope of prior research conducted by Young et al. ( 2017 ). Young et al. completed a meta-analysis spanning 2009–2015 on the impact of out-of-school time learning on students' interests in STEM. The authors operationally defined out-of-school time as encompassing summer enrichment programs and after-school activities. Their findings indicated a favorable influence of out-of-school time on students' inclination towards STEM subjects. Furthermore, the variability in these effects was found to be moderated by factors such as the methodological rigor of the research design, the thematic focus of the programs, and the grade level of the participants.

Our analysis built on this foundation in several key ways. First, we extended the time span by including 30 years of empirical studies (1992–2022), compared to six from Young et al. (2017). We also broadened the types of programs evaluated by looking at all informal programs, including field trips, outreach programs, and mobile labs. Additionally, we looked at the moderating effects of STEM as an entire discipline, and then of Science, Technology, Engineering, and Mathematics separately. We included the additional moderators of publication type and publication year in the overall analysis. Finally, we extended the outcome variable by examining student interest, attitude, and self-efficacy toward STEM. The additional moderators and broader scope of the study will considerably increase the breadth and depth of the empirical knowledge base about the effects of informal science education experiences on students' interests and attitudes toward STEM.

Potential Moderators

The following literature review summarizes salient findings concerning the moderators used in this analysis. Specifically, it covers scholarship on the distinct categories of informal science programs, the influence of grade level on students' STEM interests, the different types of STEM activities, and the impact of publication type and publication year on the overall effect of informal programming on students' interests and attitudes toward STEM. It also summarizes pertinent empirical conclusions concerning student interest, attitude, and self-efficacy in relation to informal science programs.

Type of Program

There are many different types of informal science programs. In this analysis, we consider after-school programs, summer camps, outreach programs, weekend school camps, field trips, mobile labs, and a mixture of these different out-of-school experiences to be included in informal education. Young et al. ( 2017 ) concluded that out-of-school programming (after school or summer school) did not produce significant effects on students' attitudes toward STEM. We are interested in re-examining the effects of after-school programs on students’ attitudes towards STEM and will also include other types of informal science experiences in our analysis. Variability in the outcomes of studies examining the association between informal STEM programs and students' interest levels in STEM may be attributable to the diversity of program structures investigated. Consequently, there exists potential for enriching our comprehension of informal science programming through the exploration of alternative forms of informal science experiences.

After-School Programs

After-school programs are any type of informal program that takes place after school, on or off school property. The programs can be administered by the school or by community-based organizations (Dryfoos, 1999). In this analysis, we define after-school programs as programs administered by the school.

Summer Enrichment Programs and Weekend School Camps

Summer camps and weekend school camps represent extracurricular educational endeavors that occur beyond the confines of the regular academic calendar, particularly during weekends and summer (Young et al., 2017 ). In the context of our investigation, we defined summer enrichment programs and weekend school camps as initiatives with a distinct focus on Science, Technology, Engineering, and Mathematics (STEM). Summer enrichment programs have been associated with accelerated learning trajectories including early graduation (Berliner, 2009 ), while concurrently catering to the educational needs of culturally and linguistically diverse students, as well as those hailing from economically disadvantaged backgrounds (Keiler, 2011 ; Matthews & Mellom, 2012 ).

In contrast to these positive associations, Young et al. found no significant moderating effect of summer camps on student interest in STEM disciplines. Empirical work on the impact of weekend school camps remains scarce, leaving a gap in research on how these activities support interest in and attitudes toward STEM.

Outreach Programs

We define outreach programs as any STEM activity that is organized by an outside institution. There can be various formats to the program, from lecture-style outreach where a visiting professor will talk to a group of students about STEM, to a field trip to a university where STEM career pathways are highlighted (Tillinghast et al., 2020 ). Limited studies have measured the effects of these programs on students' interests and attitudes toward STEM, highlighting a need for research expansion, especially using meta-analysis techniques.

Field Trips and Virtual Museums

Alternative forms of informal science program engagements include field trips and virtual museum experiences. We delineate field trips and virtual museums as occasions wherein students embark on guided tours of STEM facilities, either physically or virtually, during school hours but outside the structured curriculum timeframe. These experiences have the potential to influence students' academic achievement and foster interest in STEM careers. For instance, secondary school students who interacted with scientists during field trips demonstrated heightened awareness regarding STEM career pathways (Jensen & Sjaastad, 2013 ). Furthermore, research indicates that students exhibited enhanced performance in mathematics subsequent to participating in a field trip to the New York Hall of Science (Alliance, 2011 ). Notably, despite these findings, there remains a paucity of systematic investigations utilizing meta-analytical approaches to scrutinize the collective impact of such field trip experiences.

Mobile Labs

The last type of informal science experience that we analyzed was the use of mobile labs. Mobile labs are vehicles, such as buses, that bring STEM labs to schools so that students can experience hands-on science there. They first became popular in the late 1990s and currently have 29 member programs in 17 different states (Jones & Stapleton, 2017).

Mixed Studies

Some programs are a combination of informal experiences. When we could not categorize an informal experience as one separate group, we analyzed the effect of the program in a mixed category.

Recent empirical scholarship on the association between students' grade level and their interest in and attitudes toward STEM has produced conflicting conclusions. Several scholars have reported that students' attitudes and perceptions towards science and STEM disciplines decline with age as they progress through their schooling (George, 2000; Morrell & Lederman, 1998; Murphy & Beggs, 2003; Osborne et al., 2003; Silver & Rushton, 2008). These scholars also reported a discernible decline in student interest and enjoyment of science from intermediate to high school (George, 2000; Morrell & Lederman, 1998). Conversely, research conducted by Maltese and Tai (2011) revealed a positive association between eighth-grade students' perceptions of science's utility for their future and their likelihood of pursuing STEM degrees. Additionally, Sadler et al. (2012) established that students' career aspirations in STEM fields upon entering high school emerged as robust predictors of their vocational interests upon completing high school.

In summary, student attitudes towards STEM and their STEM career inclinations undergo a dynamic evolution throughout their elementary and secondary educational experiences. Once students reach early high school, research indicates that students’ interests and attitudes towards STEM solidify and can become predictors of their career choices later in college. Leveraging meta-analytic methodologies, our study aimed to augment the existing body of research by exploring the differential impacts of elementary and secondary education levels on students' overarching attitudes and interests in STEM disciplines as a result of their informal STEM experiences.

Research on how students' attitudes and interests change when they study Science, Technology, Engineering, and Mathematics (STEM) as individual subjects and in combination has so far yielded limited conclusions. Looking at STEM experiences in combination, Wiebe et al. (2018) concluded that there was a reciprocal relationship between students' STEM experiences and the development of specific STEM career aspirations. Notably, success in mathematics (independent of STEM) was positively associated with the likelihood of students pursuing advanced education in STEM fields (Wang, 2012). Informal STEM learning activities have been shown to affect students' self-efficacy related to mathematics (Jiang et al., 2024). Likewise, heightened self-efficacy related to science increases students' propensity to embark on a career in the STEM domain. Additionally, throughout high school, an elevated interest in STEM is associated with a student's sustained commitment to undertaking advanced coursework in both mathematics and science (Simpkins et al., 2006). Overall, when students have positive experiences in STEM, whether in interdisciplinary settings or in individual subjects, these experiences are positively associated with their STEM career interests and higher education aspirations.

Most STEM research focuses on science and mathematics, neglecting the impact of technology and engineering on student attitudes and interest. This leaves a gap in the research: the moderating effect of STEM as an entire discipline, and of science, technology, engineering, and mathematics as separate subjects, on students' interests and attitudes in informal science experiences remains to be investigated.

Learning Outcomes

As delineated earlier, we operationalized attitude in four distinct facets: interest in STEM, self-efficacy related to STEM, attitude towards STEM, and STEM career interests. Interest in STEM refers to the emotions and sentiments associated with learning STEM subjects (Zhang & Tang, 2017). Self-efficacy concerns students' beliefs about their ability to excel in STEM endeavors, as well as their perseverance within STEM domains (Bandura, 1995). Attitude towards STEM signifies individuals' perceptions of the inherent value and significance of STEM disciplines (Dowey, 2013). Lastly, STEM career interest denotes students' inclination towards prospective careers within STEM domains (Maltese & Tai, 2011).

Publication Type

Within the realm of meta-analysis, publication type stands out as a prominent moderator variable. Our study notably incorporates both peer-reviewed journal articles and dissertations. It is well-established that studies reporting statistically significant findings are more likely to be published compared to those reporting non-significant results, a phenomenon commonly termed publication bias (Rosenthal, 1979 ). The presence of heterogeneous findings across studies may arise from differences in publication types rather than alternative moderator variables. Consequently, we systematically examined publication type as a potential moderator variable within this meta-analytic inquiry, classifying it into two categories: "journal article" and "thesis_dissertation," with the latter encompassing both theses and dissertations.

Publication Year

Given the dynamic nature of education policies spanning from 1992 to 2022, exemplified by the publication of the Next Generation Science Standards (NGSS), the formats and instructional methodologies employed in science education have undergone significant transformations over time (NRC, 2011 ). Consequently, these alterations can change long-term attitudes towards and interest in STEM. Considering the possible changes in STEM education over the past 30 years, we included publication year as a potential moderator.

Aim of the Meta-analysis

Informal science learning plays an important role in science learning and contains various formats. Nonetheless, the effect of informal science learning in general, and the possible difference among different informal science learning settings in particular, have not been fully examined. The purpose of this meta-analysis is to explore whether informal science education is effective in increasing students’ learning interests and attitudes toward STEM. The following research questions were explored:

What is the overall effect size of informal science learning experiences on students' attitudes towards and interest in STEM?

How do various moderating factors, including the type of informal learning experience (such as museum visits, out-of-school programs, after-school activities), student grade level, academic subjects (science, technology, engineering, mathematics, or the broader STEM domain), type of publication (dissertation or peer-reviewed journal article), and publication year, impact student attitudes and interests in STEM?

Literature Search for Primary Studies

In this study, we sought primary studies from ProQuest, EBSCO, Web of Science, and ScienceDirect. We concentrated on empirical studies exploring the effect of informal science education on K-12 students' interest in science. This encompassed peer-reviewed journal articles as well as unpublished dissertations and conference papers. The keywords were as follows: (“Interest” OR “attitude”) AND (“out-of-school” OR “informal” OR “after school”) AND (“science education”). In addition, we reviewed articles that were cited in a previous meta-analysis (Young et al., 2017) and examined each article for potential addition to our analysis using Google Scholar (scholar.google.com). Because we were interested in the effects of informal science programs within the United States, articles were filtered by timeframe (1992–2022), language (English), and location (United States). We conducted our search using the keywords individually or in various combinations and did not restrict the publication status, so the search could return both gray literature and journal articles.

Inclusion and Exclusion Criteria

To be included in this meta-analysis, a study had to satisfy the following criteria:

The study had to examine an informal science learning setting and explicitly document the informal activities involved. The activities had to align with the criteria for informal learning activities described previously, which were based on those used by Hofstein and Rosenfeld (1996).

The study had to use a quasi-experimental design with one or more experimental groups (e.g., groups of students involved in an informal science learning program or activities) and a control or comparison group (business as usual, or not involved in an informal science program or activities). Studies that lacked a control/comparison group of participants who did not engage in informal science learning programs were excluded from the analysis. For instance, Knapp and Barrie (2001) compared two field trips to a science center, and Wilson et al. (2012) compared children's learning and attitude changes in response to two versions of a film shown to part of a science center planetarium's audience. These studies lacked a control group and were not included in this study.

The study had to use a pretest–posttest design, collecting data from participants at two distinct time points: before the informal science learning intervention or treatment was implemented (pretest) and after it had been administered (posttest). The same group of participants had to be measured on the same variables at both the pretest and posttest stages. Studies that did not collect pretest data were excluded.

A study had to include students’ interests, attitudes, and/or self-efficacy as outcomes. As mentioned above, we define attitude towards STEM in four ways: interest, self-efficacy, attitude toward STEM, and career interests. Studies that investigated the impacts of informal science learning on achievement were excluded from our study.

To be eligible for inclusion, a study had to provide sufficiently detailed quantitative data that facilitated the calculation and extraction of the relevant relationship as an effect size. This criterion ensured that the selected studies provide the necessary information required for effect size estimation, thereby enabling a rigorous analysis of the relationship under investigation. If a study only contained a qualitative interview for analysis, we did not include this study in our analysis because we could not calculate effect sizes.

The present study exclusively focused on samples comprising students from grades K-12, excluding college students or other participant groups.

A study must be published or available in English.

Study Selection

As illustrated in the PRISMA flowchart (Fig. 1), our initial search yielded 1042 potentially relevant studies as previously used by Maltese and Tai (2011). We included studies from published journal articles and from gray literature, which included conference papers and theses/dissertations. In the first round of screening, we removed 291 duplicate articles, leaving 751 studies. In the second round, we reviewed the title and abstract of each article to determine whether the study focused on the impact of informal science education on students' interests and attitudes. At this stage, 192 studies were retained for full-text examination.

figure 1

Flowchart of the inclusion and exclusion in the meta-analysis

During the third round of screening, a comprehensive review of all articles was conducted independently by both the first and second authors to assess the eligibility of the 192 studies against the pre-established inclusion and exclusion criteria. Although an initial discrepancy emerged between the two reviewers, resolution was achieved through deliberation among the entire research team to determine the final inclusion status of each study.

As is common in many meta-analytic reviews, among the included primary studies, some, or even many, contain multiple effect sizes due to the following reasons: a) testing students’ attitudes and interests in different subjects (e.g. science, mathematics, technology, or engineering, etc.); b) measuring students’ attitudes in various dimensions (e.g. attitudes towards STEM content learning, attitudes towards STEM career). Subsequently, a total of 19 studies, comprising 68 effect sizes, were deemed to meet the predetermined inclusion criteria and were consequently incorporated into the meta-analysis. The cumulative sample size across these 19 studies amounted to 6160 participants.
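The screening counts reported above can be checked with simple arithmetic. The following Python sketch is illustrative only: the stage labels and the `screening_flow` helper are hypothetical names, and the per-stage removal counts other than the 291 duplicates are derived here by subtraction rather than reported in the text.

```python
def screening_flow(stages):
    """Return (label, remaining, removed_since_previous_stage) for each stage.

    `stages` is an ordered list of (label, records_remaining) pairs.
    """
    flow = []
    previous = None
    for label, remaining in stages:
        removed = 0 if previous is None else previous - remaining
        if removed < 0:
            raise ValueError(f"stage {label!r} has more records than the previous stage")
        flow.append((label, remaining, removed))
        previous = remaining
    return flow


# Counts taken from the text; the labels are hypothetical.
stages = [
    ("records identified", 1042),
    ("after duplicate removal", 751),
    ("retained for full-text review", 192),
    ("included in the meta-analysis", 19),
]

for label, remaining, removed in screening_flow(stages):
    print(f"{label}: {remaining} remaining ({removed} removed at this stage)")
```

The first derived value (291) matches the number of duplicates reported in the text, which is a quick consistency check on the flowchart numbers.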

Coding Process

Two authors of the article completed the coding. In an initial “trial coding” phase, the two coders jointly coded 25% of the primary studies. Disagreements between the two coders, and other issues identified during coding, were discussed and resolved within the research team. Following the trial coding phase, the research team established a definitive coding table. The two coders then independently coded the remaining primary studies. After this final coding phase, no significant disparities were observed, and minor differences were resolved through discussion.

Coding of Study Characteristics as Variables

Information from each study was coded about informal science learning program characteristics, student sample, publication year, and publication type as detailed below.

Studies were conducted on informal science learning in different formats. We divided the type of programs into eight categories: after-school, summer camp, outreach program, weekend school camp, field trip or a virtual museum, mobile lab, and mixed program. When a study contained more than one category of informal activities, we coded it as a mixed program.

Four categories of grades were coded: elementary (grade K-5), middle (grade 6–8), high (grade 9–12), and mixed grade in which studies included participants across school levels.

Research studies focusing on the examination of students' learning interests or attitudes in individual subjects, including science, mathematics, engineering and technology, and in STEM as a whole, were specifically considered in this analysis. Only those studies that encompassed all four domains of science, mathematics, engineering, and technology were coded and categorized as STEM. In this study, primary studies examined engineering and technology together, separate from science and mathematics. Therefore, we included an engineering and technology category.

Studies were categorized into four outcomes: self-efficacy, attitude, interest (i.e., in subject content), and career interest (i.e., in STEM careers).

Another frequently used moderator variable in meta-analysis is publication type (Cai et al., 2017). In this study, we coded studies into two categories: journal articles or theses/dissertations.

The last moderator in this study is publication year, which records the year in which each study was published.

Meta-analytic Procedures

Calculation of effect size.

As described previously, the studies included in this meta-analytic review were based on the inclusion criteria mentioned above, and these studies had pre- and post-measures of interest/attitude of STEM subjects, thus providing interest/attitude change scores between pre- and post-measures from both experimental and control groups. Such a design was described as pretest–posttest-control (PPC) in the literature (Morris, 2008), which could be either experimental design with randomized assignment, or quasi-experimental design without randomized assignment (Morris, 2008). Becker (1988) presented an effect size measure for such PPC design, which was essentially the difference of standardized mean change scores between the two groups (treatment vs. control). Built upon Becker’s work on the effect size measure for the PPC design, Morris (2008) provided an in-depth discussion and empirical assessment about several alternative forms of effect size measures in such a pretest–posttest-control (PPC) design. The empirical findings in Morris (2008) showed that one effect size measure (labeled \(d_{ppc2}\) in Morris, 2008, p. 369), has better and more robust performance than other alternatives. Guided by the empirical findings and conclusions of Morris (2008), in the current meta-analytical review, we used \(d_{ppc2}\) as the effect size measure between the students in an informal science learning program vs. those with no informal science learning. \(d_{ppc2}\) is based on the difference of standardized change scores (i.e., posttest score – pretest score) of the two groups, and it is conceptually equivalent to Hedges’ g (Hedges & Olkin, 1985). The technical details of \(d_{ppc2}\) and other alternative forms are available in Morris (2008). For the sake of simplicity for our readers, in the following, we will use the well-known g, instead of \(d_{ppc2}\), in our presentation and discussion.
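To make the effect size concrete, the sketch below computes the pretest–posttest-control effect size in the form given by Morris (2008): the difference in mean change between the treatment and control groups, divided by the pooled pretest standard deviation and multiplied by a small-sample bias correction. The function and argument names are hypothetical, and the numbers in the usage example are invented for illustration; they are not data from any included study.

```python
import math


def d_ppc2(m_pre_t, m_post_t, sd_pre_t, n_t,
           m_pre_c, m_post_c, sd_pre_c, n_c):
    """Pretest-posttest-control effect size (d_ppc2 form of Morris, 2008).

    d = c_p * [(M_post,T - M_pre,T) - (M_post,C - M_pre,C)] / SD_pre,pooled
    """
    df = n_t + n_c - 2
    if df <= 0:
        raise ValueError("need at least three participants across both groups")
    # Pooled pretest standard deviation across treatment and control groups.
    sd_pre_pooled = math.sqrt(
        ((n_t - 1) * sd_pre_t ** 2 + (n_c - 1) * sd_pre_c ** 2) / df
    )
    # Small-sample bias correction, analogous to the Hedges' g correction.
    c_p = 1.0 - 3.0 / (4.0 * df - 1.0)
    change_t = m_post_t - m_pre_t
    change_c = m_post_c - m_pre_c
    return c_p * (change_t - change_c) / sd_pre_pooled


# Invented illustration: the treatment group gains 2 points, the control gains 1.
example = d_ppc2(m_pre_t=10.0, m_post_t=12.0, sd_pre_t=2.0, n_t=11,
                 m_pre_c=10.0, m_post_c=11.0, sd_pre_c=2.0, n_c=11)
print(round(example, 3))
```

The sampling variance of this estimator also depends on the pretest–posttest correlation, which is not covered in this sketch; Morris (2008) gives the details.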

If a study did not directly provide the components needed for calculating the effect size as described above, but provided other statistics (e.g., t-statistic, F-statistic, odds ratio) sufficient to obtain the effect size measure through conversion formulas available in the literature (e.g., Hedges & Olkin, 1985), we computed the effect size from those statistics. In this context, a positive value of g indicates a more favorable outcome for the treatment group of students who participated in an informal science learning program than for the control group of students.
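As an illustration of such conversions, the sketch below implements three textbook formulas: an independent-groups t-statistic to a standardized mean difference, a one-degree-of-freedom F-statistic (whose sign must be supplied, since F discards direction), and an odds ratio via the logit method. The text does not state which conversions the authors actually applied, so the function names and the choice of formulas here are assumptions, and the formulas hold only when the reported statistic compares two independent groups.

```python
import math


def hedges_correction(df):
    """Small-sample correction factor J = 1 - 3 / (4 * df - 1)."""
    return 1.0 - 3.0 / (4.0 * df - 1.0)


def g_from_t(t, n_treat, n_ctrl):
    """Bias-corrected standardized mean difference from an independent-groups t."""
    d = t * math.sqrt(1.0 / n_treat + 1.0 / n_ctrl)
    return hedges_correction(n_treat + n_ctrl - 2) * d


def g_from_f(f, n_treat, n_ctrl, sign=1):
    """Same conversion for a two-group F with 1 numerator df (t = sign * sqrt(F))."""
    if f < 0:
        raise ValueError("F must be non-negative")
    return g_from_t(sign * math.sqrt(f), n_treat, n_ctrl)


def d_from_odds_ratio(odds_ratio):
    """Logit-method conversion: d = ln(OR) * sqrt(3) / pi."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio) * math.sqrt(3.0) / math.pi


print(round(g_from_t(2.0, 50, 50), 4))
print(round(d_from_odds_ratio(2.0), 4))
```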

Random Effect Meta-Analysis Model

Some primary studies included in this meta-analysis presented multiple effect sizes within one study; we therefore used a random-effects model for analyzing the effect sizes. A random-effects model assumes that the effects are random and can vary across studies, depending on different study conditions. It is suitable when there is unobserved heterogeneity between the units or individuals. The weighting scheme in the random-effects model incorporates within-study variance alongside a constant value (T²), representing between-study variance, reducing the relative differences among the weights. Consequently, a random-effects model promotes a more balanced distribution of relative weights across studies compared to a fixed-effect model (Borenstein et al., 2010). This study reported and interpreted the weighted average effect sizes, confidence intervals (lower and upper limits), and z-test results. We used the R statistical platform (R Core Team, 2019) with Viechtbauer’s (2010) metafor package for analysis. The random-effects coefficients were estimated using the maximum likelihood estimation method. Q statistics addressed the homogeneity of effect sizes.
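The weighting scheme described above can be sketched in a few lines of Python. Note the substitution: the authors estimated the model by maximum likelihood in metafor, whereas this sketch uses the simpler DerSimonian-Laird moment estimator for the between-study variance, so its numbers would not exactly reproduce theirs. The function name and the toy inputs are hypothetical.

```python
import math


def random_effects_dl(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird estimate of T^2."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effects with matching variances")
    # Fixed-effect weights use only the within-study variance.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add the constant T^2 to every study's variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum_w_re
    se = math.sqrt(1.0 / sum_w_re)
    return {
        "pooled": pooled,
        "se": se,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "z": pooled / se,
        "q": q,
        "df": df,
        "tau2": tau2,
        # Higgins' I^2 computed from Q; a tau^2-based definition can differ.
        "i2_percent": max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0,
        "fe_weights": [wi / sum_w for wi in w],
        "re_weights": [wi / sum_w_re for wi in w_re],
    }


# Toy inputs (not study data): one precise and one imprecise effect size.
result = random_effects_dl([0.0, 0.4], [0.01, 0.04])
print([round(x, 3) for x in result["fe_weights"]])
print([round(x, 3) for x in result["re_weights"]])
```

In the toy example the precise study dominates under fixed-effect weighting, while the random-effects weights are closer to each other, which is the balancing effect the paragraph describes. The I² in this sketch is computed from Q, whereas metafor derives I² from the estimated between-study variance, so the two need not agree.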

Moderator Analysis

When the Q statistic showed statistical heterogeneity across the studies, moderator analyses were conducted to test the study features that could have contributed to the inconsistency among the effect sizes. This study extracted four categorical variables (i.e., type of program, grade, disciplines, and type of publication) and one continuous variable (i.e., publication year). Moderation effects were assessed with two approaches: an omnibus test for each categorical moderator, and a slope analysis for the continuous moderator (Cai et al., 2022).

Publication Bias

Publication bias poses a significant challenge to the validity of meta-analytic findings. Research demonstrating large effect sizes or statistically significant findings may have a higher likelihood of being published and consequently included in a meta-analysis compared to studies with small effect sizes or statistically nonsignificant results (Mao et al., 2021; Rosenthal, 1979). As publication bias has the capacity to distort estimations of the true effect being investigated, it poses a pervasive challenge when conducting a meta-analysis (Thornton & Lee, 2000). This study used the funnel plot and Egger’s regression to assess potential publication bias. As Rothstein et al. (2005) discussed, the absence of bias can be inferred when the funnel plot demonstrates a symmetrical distribution. Egger et al. (1997) proposed a linear regression method for evaluating publication bias, in which a p-value greater than 0.05 suggests the absence of publication bias.
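A minimal sketch of Egger's regression in its classical form follows: each effect size is divided by its standard error, the result is regressed on precision (one over the standard error), and the intercept is tested against zero. The function name is hypothetical, the toy data are invented, and the p-value uses a normal approximation to keep the example dependency-free; an exact test would use the t distribution with k − 2 degrees of freedom. This is not necessarily the variant implemented in the software the authors used.

```python
import math
from statistics import NormalDist


def egger_test(effects, std_errors):
    """Classical Egger regression: (effect / SE) regressed on (1 / SE)."""
    k = len(effects)
    if k < 3 or k != len(std_errors):
        raise ValueError("need at least three effects with matching standard errors")
    x = [1.0 / se for se in std_errors]                    # precision
    z = [y / se for y, se in zip(effects, std_errors)]     # standardized effect
    x_bar = sum(x) / k
    z_bar = sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    sse = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_intercept = math.sqrt(sse / (k - 2) * (1.0 / k + x_bar ** 2 / sxx))
    if se_intercept == 0:
        raise ValueError("perfect fit: the intercept test is undefined")
    t_stat = intercept / se_intercept
    # Normal approximation; reasonable only when k is large.
    p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return {"intercept": intercept, "slope": slope, "t": t_stat,
            "df": k - 2, "p_approx": p_approx}


# Invented asymmetric data: less precise studies report larger effects.
toy = egger_test([0.31, 0.39, 0.46, 0.69], [0.1, 0.2, 0.25, 0.5])
print(round(toy["intercept"], 3), toy["df"])
```

An intercept far from zero, as in the toy data, signals funnel-plot asymmetry; an intercept near zero with a large p-value gives no evidence of asymmetry.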

Viechtbauer and Cheung (2010) demonstrated that meta-analyses need to include influential case diagnostics to identify outliers or extreme effect sizes and separate them from the rest of the data. We used Cook's distances (Fig. 2), DFBETAS (Fig. 3), and standardized deleted residuals (Fig. 4) to detect potential outliers. As Figs. 2, 3, and 4 show, two effect sizes (i.e., the 30th and 58th effect sizes) were identified as influential outliers. After excluding the influential outliers, the revised pooled effect size was 0.21 (95% CI [0.12, 0.30], p < 0.001), closely matching the original pooled effect size of 0.21 (95% CI [0.10, 0.32], p < 0.001). This sensitivity analysis showed that the overall effect size remained virtually unchanged after removing the influential outliers. Therefore, we can conclude that including influential outliers did not change the main results of our meta-analysis.

figure 2

Cook’s distances of the effect sizes

figure 3

DFBETAS values of effect sizes

figure 4

Standardized deleted residuals of effect sizes

The independent studies included in the analysis spanned from 1997 to 2022. A total of 19 studies were incorporated, yielding 68 effect sizes. The data sheet is available from the corresponding author on request. The effect sizes ranged from -1.15 to 1.59, with a median value of 0.17. The magnitude of the effect sizes and the precision of their estimation differ across studies. The majority of effect sizes were positive (81%), while 10 effect sizes were negative and three were close to zero.

The Q test was statistically significant (as shown in Table 1), indicating significant heterogeneity across the effect sizes (Q(df = 67) = 1663.24, p < 0.001), with a total heterogeneity (I²) of 91.63%. This means that the variation in effect sizes across the studies cannot be explained solely by sampling error (Borenstein et al., 2017). We can reject the null hypothesis that the true effect sizes are homogeneous. The true effectiveness of informal science programs appears to differ across the studies.

As shown in Table 1, the estimated effect size of 0.21 with a 95% confidence interval ranging from 0.10 to 0.32 suggests a statistically significant effect. Kraft (2020) put forth a framework for interpreting effect sizes of educational interventions targeting the academic achievement of pre-K-12 students. Specifically, the author categorized effect sizes by magnitude and argued that effect sizes below 0.05 denote a small effect, effect sizes from 0.05 to 0.2 represent a medium effect, and effect sizes above 0.2 indicate a large effect. Interpreted within this framework, the 0.21 in our study was characterized as a statistically significant medium to large effect size in terms of the overall effect of informal science learning programs on students' interests or attitudes in STEM.
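The thresholds described above amount to a simple lookup, sketched below. The function name is hypothetical, and the handling of a value exactly equal to 0.2 (treated here as medium, following the wording "above 0.2" for large) is an assumption about the boundary.

```python
def kraft_category(effect_size):
    """Classify an effect size using the thresholds described in the text."""
    if effect_size < 0.05:
        return "small"
    if effect_size <= 0.2:   # boundary handling is an assumption
        return "medium"
    return "large"


for value in (0.04, 0.10, 0.21):
    print(value, kraft_category(value))
```

Applied to the pooled estimate of 0.21, the lookup returns "large", with the value sitting just above the medium boundary; the lower confidence limit of 0.10 falls in the medium band, which fits the "medium to large" description.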

A forest plot is a graphic tool that presents the effect sizes of each study and the overall effect size derived from random-effects modeling in one graph. Figure 5 presents all 68 effect sizes. The position of the square along the horizontal axis represents the estimated effect size for a given study, and the size of the square indicates the weight or precision of the study’s estimate. Confidence intervals (CI) are shown by the horizontal bars extending from the square, with the bar's length showing the confidence interval's width. The overall effect size is shown below all individual effect sizes as a diamond, whose width represents the 95% confidence interval around the summary estimate. The vertical dotted line is the “null” effect, meaning that there is zero effect of informal science learning activities on students' learning interests and attitudes. Since the “null” effect line did not cross the diamond, we concluded that the overall effect was statistically significant.

figure 5

Forest plot

Specifically, in Fig. 5, the predominant trend is positive, with 55 positive effect sizes positioned to the right of the "null" effect line. Conversely, 10 effect sizes were negative, while three were close to zero. Certain effect sizes had exceptionally broad confidence intervals, as in Garvin (2015) and Parker and Gerber (2000), while others had narrower confidence intervals, as in Crawford and Huscroft-D’Angelo (2015) and Roberson (2010). The sample size associated with an effect size affected the confidence interval width, with smaller sample sizes associated with wider confidence intervals. Consequently, to limit the influence of imprecise estimates on the overall outcomes of the study, it was necessary to assign appropriate weights to the effect sizes within the meta-analysis. The accumulated overall effect size is based on weighted individual effect sizes across the studies, with effect sizes from larger samples weighted more than the effect sizes from smaller samples.

Moderators Analysis

We conducted random-effects analyses to explore how moderators influence the effects of informal science learning programs. Table 2 summarizes the results, and each moderator is discussed separately below.

The effect of informal science learning was found to be moderated by the type of program. As the omnibus test shows, the type of program explained a statistically significant share of the effect-size heterogeneity (Q moderators (df = 7) = 31.80, p < 0.0001), indicating that the effect sizes from the studies based on different programs could differ statistically. Under the moderator Type of Program, Outreach Program showed the largest average effect size, which is also statistically significant (Mean g = 0.72; 95% CI [0.43, 1.02]; p = 0.0043). Studies under other types of programs showed smaller effect sizes in general, although all positive.

Similar to the results for program type above, the grade was also a statistically significant moderator ( Q moderators ( df  = 4) = 30.48, p  < 0.0001). The studies involving middle school students (Mean g  = 0.43; 95% CI [0.12, 0.59]; p  = 0.0033) and high school students (Mean g  = 0.42; 95% CI [0.15, 0.69]; p  = 0.0024) showed larger effect sizes than those involving elementary school students (Mean g  =   −  0.24; 95% CI [ −  0.77, 0.29]; p  = 0.3657) and mixed grade students (Mean g  = 0.08; 95% CI [ −  0.06, 0.21]; p  = 0.2490).

We found a statistically significant moderating effect of subjects (Q moderators (df = 4) = 19.53, p = 0.0006). Studies that examined interests or attitudes toward science (Mean g = 0.14; 95% CI [0.01, 0.26]; p = 0.0318), mathematics (Mean g = 0.44; 95% CI [0.04, 0.84]; p = 0.03), technology and engineering (Mean g = 0.48; 95% CI [0.06, 0.90]; p = 0.03), or full STEM (Mean g = 0.36; 95% CI [0.05, 0.68]; p = 0.02) all showed statistically significant positive effect sizes, but the effect sizes varied by subject area, with studies focusing on interests or attitudes toward science showing smaller effect sizes.

Similarly, we found a significant moderating effect of outcome types ( Q moderators ( df  = 4) = 22.08, p  = 0.0002). More specifically, the studies that examined self-efficacy and attitude showed larger effect sizes than those that examined interest or career interest. However, in regard to self-efficacy, there was only one article with two effect sizes that addressed this outcome. For this reason, the reliability of the finding is questionable because of the small sample size, and caution is warranted for the interpretation of this sub-group finding. In general, the finding suggests that the informal science learning activities showed a significant positive effect on students’ attitudes toward STEM areas.

The dataset used in this meta-analysis consisted of 38 effect sizes from journal articles, and 30 effect sizes from dissertations or theses. The results revealed a statistically significant effect of the publication type moderator ( Q moderators ( df  = 2) = 22.50, p  < 0.0001), indicating publication type explains heterogeneity in the observed effect sizes. Specifically, the effect sizes derived from journal articles exhibited a larger magnitude (Mean g  = 0.34; 95% CI [0.205, 0.48]; p  < 0.0001) compared to those obtained from dissertations or theses (Mean g  = 0.05; 95% CI [ −  0.10, 0.21]; p  = 0.4913).

No statistically significant moderating effect was found for publication year (Q moderators (df = 1) = 2.80, p = 0.09). The slope of publication year was statistically non-significant (\(\beta\) = −0.01; 95% CI [−0.03, 0.00]; p = 0.0941), suggesting that the variable of publication year did not have a significant impact on the effect sizes.

To assess the potential publication bias, we used both the funnel plot and Egger’s regression method (Egger et al., 1997 ) in this study. As Fig.  6 shows, the funnel plot had an approximately symmetrical distribution, which indicates a general absence of publication bias. In addition, Egger’s regression test (p = 0.28) is not statistically significant, indicating a lack of evidence for publication bias.

figure 6

Funnel plot for the effect sizes

This meta-analysis served as a comprehensive summary of quantitative studies carried out within the context of informal science learning in the United States. Building upon previous meta-analytic research, we aimed to conduct a systematic meta-analysis to examine the impact of informal science learning by focusing on changes in interests or attitudes toward STEM areas before and after students participated in informal science activities. More specifically, we compared the interest or attitude change scores across treatment groups (i.e., with informal science learning activities) and control groups (i.e., without informal science learning activities). By addressing these aspects, our meta-analysis provides valuable insights into the role of informal science learning and its effects on students' interests and attitudes toward STEM areas.

Specifically, the random-effects modeling revealed a statistically significant overall mean effect size (Mean g = 0.21), indicating that informal science learning opportunities had a positive effect on students' STEM interests, and this positive effect, as discussed in Kraft (2020), could be characterized as a medium to large effect. The result was consistent with previous findings by Young et al. (2017) and with empirical findings on informal science learning toward STEM interests (e.g., Crawford & Huscroft-D’Angelo, 2015; Havasy, 1997; Yang & Chittoori, 2022).

The statistically significant outcomes of the heterogeneity test conducted among the effect sizes revealed notable statistical diversity in the impacts of informal science learning opportunities on students' attitudes and interests in STEM across the encompassed studies. Consequently, an examination of potential moderators was initiated to address the second research question. The significant moderators include the type of informal science program, grade level, publication type, and sample size. However, due to our small sample size in this meta-analysis, we highlight that the overall effect of informal STEM programming was large (0.21) and had a significant impact on students' interests and attitudes toward STEM, rather than focusing on the differences between moderators. The heterogeneity could be due to the program’s focus. Laurer et al. ( 2006 ) concluded that informal program focus was a significant moderator of mathematics achievement among at-risk youth. Since STEM interest has been associated with achievement (Maltese & Tai, 2011 ), it is important to determine how program focus affects student interest in STEM (Young et al., 2017 ). Shaby et al. ( 2024 ) pointed out the importance of pedagogical design of Laboratory Group Activity in a Science Museum for students’ interaction and learning outcomes. The various pedagogical designs of the same type of program might result in different effects. Future research in program focus might uncover why there is a difference between different types of informal science experiences.

Grade level emerged as a significant moderator on student interest in and attitudes toward STEM. However, the available data are insufficient to draw a statistical conclusion regarding the disparity between grade levels. Specifically, only a very limited number of studies conducted at the elementary school level were available for this meta-analytic review, with only 2 out of 19 studies involving elementary students. Nevertheless, there is accumulating evidence suggesting that early engagement in STEM learning, particularly during elementary school, may yield long-term benefits for sustained interest in STEM (Curran & Kitchin, 2019 ; Morgan et al., 2016 ). The scant representation of elementary-level studies in the research literature underscores a notable gap in the research landscape, emphasizing the necessity for further investigation to elucidate the impact of informal STEM experiences on elementary school students.

Although there was a lack of evidence for publication bias for studies included in this meta-analysis, the type of publications (i.e., journal articles vs. thesis/dissertations) showed different magnitude of effect sizes. Publication bias is influenced by various factors such as language bias, time lag bias, and selective reporting. The publication bias test as implemented in this study may not be sufficiently sensitive to capture all aspects of bias. Therefore, it is possible for the publication bias test to indicate no bias while the moderator test reveals an impact of publication type.

Conclusions and Future Directions

Students acquire STEM-related knowledge through formal and informal education experiences (Goldstein, 2015 ). Formal STEM experiences are part of the national curriculum provided to the students in their schools. Researchers can gauge the effectiveness of formal STEM programs by tracking students’ progress in classrooms and with national testing data. However, it is more difficult to assess the impact of informal science learning experiences, since they happen outside of school. Based on this meta-analysis, informal STEM experiences have a positive effect on students' interest in and attitudes toward STEM, and should be incorporated into students’ educational experiences.

Despite our exhaustive efforts in locating relevant studies conducted over a span of thirty years (1992–2022), we were only able to find a very limited number of studies that quantitatively assessed the effects of informal science learning on students’ interest in and attitude toward science and STEM subjects. The majority of studies in this area were not appropriate for our quantitative synthesis for several reasons: some lacked sufficient information for effect size calculations; some used qualitative approaches and had no quantifiable data on STEM interest changes; and some were quantitative studies with only one-time data collection, which could not be used to assess the effect of informal science learning on students’ interests/attitudes toward science/STEM. Future research may consider these and other similar issues in designing methodologically rigorous quantitative studies that examine how informal science learning experiences contribute to students’ interest in and attitude toward STEM.

Limitations

Several limitations were identified during the evaluation and analysis process of this meta-analysis. During data collection, several articles were not available for the researchers to review because of restrictions on our database access and the limits of our university library's holdings. Another limitation is the apparent conflict between our publication type analysis and our publication bias assessment. Publication type was used as a moderator in our analysis, and journal articles showed a statistically significant effect size whereas dissertations and theses did not. This contrasted with our publication bias assessment, which suggested a lack of sufficient statistical evidence for publication bias. The very limited number of studies (N = 19) available for this meta-analysis made it difficult to explore this issue in a more meaningful way. Because of these considerations, caution is warranted in the interpretation of the findings related to this issue.

Small sample size (in terms of both the number of studies and number of effect sizes) is a general limitation in this meta-analysis, especially for some moderator analysis. Obviously, this meta-analysis, like other synthesis studies, is at the mercy of what is available in the research literature. Because of the small sample size issue, we need to exercise caution when interpreting the findings, especially the findings in the moderator analysis as shown in Table  2 . As discussed above, future research may further examine some of the issues revealed in the study.

We did not include some other possible variables, such as dosage, duration, gender, race, school type, or sampling method. Although these could be potential moderators, the lack of relevant information in the primary studies made it impossible to consider them in the meta-analysis. Furthermore, this study did not address interaction effects among the moderator variables on interest and attitudes, because the limited sample size made interaction analysis statistically impractical.

In addition, we were not able to explore which components of the different types of programs actually affected students’ interests and attitudes. Previous studies have shown that hands-on activities and challenge assessments can enhance students’ interest and motivation (Hamari et al., 2016; Parsons & Taylor, 2011; Poudel et al., 2005). It is possible that the outreach programs included in this meta-analysis contained more challenging activities, which might explain why those studies showed larger effect sizes than some others. However, the lack of relevant information in the primary studies made it impossible to explore such potentially relevant issues.

Data Availability

The datasets used in this meta-analysis are derived from publicly available sources (ProQuest, EBSCO, Web of Science, and ScienceDirect) and previously published studies. The references for all included studies are provided in the reference list of this publication, where they are marked with an asterisk.

*: A primary study with effect size(s) included in this meta-analysis.

Afterschool Alliance. (2011). STEM learning in afterschool: An analysis of impact and outcomes. Afterschool Alliance.

Bandura, A. (1995). Exercise of personal and collective efficacy in changing societies (A. Bandura, Ed.). Cambridge University Press.

Becker, B. J. (1988). Synthesizing standardized mean-change measures. British Journal of Mathematical and Statistical Psychology, 41, 257–278.

Beier, M., & Rittmayer, A. (2008). Literature overview: Motivational factors in STEM: Interest and self-concept. Assessing Women and Men in Engineering.

Berliner, D. C. (2009). Poverty and potential: Out-of-school-factors and school success. Education and the Public Interest Center, University of Colorado/Education Policy Research Unit, Arizona State University. Retrieved October 26, 2015, from http://epicpolicy.org/publication/poverty-and-potential

Bevan, B., Dillon, J., Hein, G. E., Macdonald, M., Michalchik, V., Miller, D., Root, D., Rudder-Kilkenny, L., Xanthoudaki, M., & Yoon, S. (2010). Making science matter: Collaborations between informal science education organizations and schools. Center for Advancement of Informal Science Education.

Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2010). A basic introduction to fixed-effect and random-effects models for meta-analysis. Research Synthesis Methods, 1 (2), 97–111. https://doi.org/10.1002/jrsm.12

Borenstein, M., Higgins, J. P., Hedges, L. V., & Rothstein, H. R. (2017). Basics of meta-analysis: I^2 is not an absolute measure of heterogeneity. Research Synthesis Methods, 8 (1), 5–18.

Business-Higher Education Forum. (2011). Creating the workforce of the future: The STEM interest and proficiency challenge. BHEF Research Brief. Business-Higher Education Forum.

Cai, Z., Fan, X., & Du, J. (2017). Gender and attitudes toward technology use: A meta-analysis. Computers & Education, 105, 1–13.

Cai, Z., Mao, P., Wang, D., He, J., Chen, X., & Fan, X. (2022). Effects of scaffolding in digital game-based learning on student’s achievement: A three-level meta-analysis. Educational Psychology Review, 34 (2), 537–574.

Crane, V. (1994). Informal science learning: What the research says about television, science museums, and community-based projects. Research Communications, Limited.

*Crawford, L., & Huscroft-D’Angelo, J. (2015). Mission to space: Evaluating one type of informal science education. The Electronic Journal for Research in Science & Mathematics Education, 19 (1). https://ejrsme.icrsme.com/article/view/13753

Curran, F. C., & Kitchin, J. (2019). Early elementary science instruction: Does more time on science or science topics/skills predict science achievement in the early grades? AERA Open, 5, 1–18. https://doi.org/10.1177/2332858419861081

DeCoito, I. (2014). Focusing on science, technology, engineering, and mathematics (STEM) in the 21st century. Ontario Professional Surveyor, 57 (1), 34–36.

Dowey, A. L. (2013). Attitudes, Interests, and Perceived Self-Efficacy toward Science of Middle School Minority Female Students: Considerations for Their Low Achievement and Participation in STEM Disciplines (Unpublished doctoral dissertation). University of California.

Dryfoos, J. G. (1999). The role of the school in children’s out-of-school time. The Future of Children, 9 (2), 117–134.

Egger, M., Smith, G. D., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. BMJ, 315 (7109), 629–634. https://doi.org/10.1136/bmj.315.7109.629

Eshach, H. (2007). Bridging in-school and out-of-school learning: Formal, non-formal, and informal education. Journal of Science Education and Technology, 16, 171–190.

Falk, J. H., & Dierking, L. D. (2010). The 95 percent solution. American Scientist, 98 (6), 486–493.

Frantz, T. D., Miranda, M., & Siller, T. (2011). Knowing what engineering and technology teachers need to know: An analysis of pre-service teachers’ engineering design problems. International Journal of Technology and Design Education, 21, 307–320.

*Garvin, B. A. (2015). An investigation of a culturally responsive approach to science education in a summer program for marginalized youth (Doctoral dissertation). University of South Carolina. Retrieved from ProQuest Dissertation Theses Global.

Gauld, C. F., & Hukins, A. A. (1980). Scientific attitudes: A review. Studies in Science Education, 7 (1), 129–161.

George, R. (2000). Measuring change in students’ attitudes toward science over time: An application of latent variable growth modeling. Journal of Science Education and Technology, 9 (3), 213–225.

Goldstein, D. (2015). The teacher wars: A history of America’s most embattled profession. Anchor.

Hamari, J., Shernoff, D. J., Rowe, E., Coller, B., Asbell-Clarke, J., & Edwards, T. (2016). Challenging games help students learn: An empirical study on engagement, flow and immersion in game-based learning. Computers in Human Behavior, 54, 170–179.

*Havasy, R. A. D. P. (1997). The effect of informal science experiences on science achievement and attitude of high school biology students (Ed.D.). Teachers College, Columbia University. Retrieved from ProQuest Dissertation Theses Global.

Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Academic Press.

Hofstein, A., & Rosenfeld, S. (1996). Bridging the gap between formal and informal science learning. Studies in Science Education, 28, 87–112.

Jensen, F., & Sjaastad, J. (2013). A Norwegian out-of-school mathematics project’s influence on secondary students’ STEM motivation. International Journal of Science and Mathematics Education, 11 (6), 1437–1461. https://doi.org/10.1007/s10763-013-9401-4

Jiang, H., Chugh, R., Turnbull, D., Wang, X., & Chen, S. (2024). Exploring the effects of technology-related informal mathematics learning activities: A structural equation modeling analysis. International Journal of Science and Mathematics Education, 1–21. https://doi.org/10.1007/s10763-024-10456-4

Jones, A. L., & Stapleton, M. K. (2017). 1.2 million kids and counting—Mobile science laboratories drive student interest in STEM. PLoS biology, 15 (5), e2001692.

Keiler, L. S. (2011). An effective urban summer school: Students’ perspectives on their success. The Urban Review, 43 (3), 358–378.

Knapp, D., & Barrie, E. (2001). Content evaluation of an environmental science field trip. Journal of Science Education and Technology, 10 (4), 351–357. https://doi.org/10.1023/A:1012247203157

Kraft, M. A. (2020). Interpreting effect sizes of education interventions. Educational Researcher, 49 (4), 241–253. https://doi.org/10.3102/0013189X20912798

Krishnamurthi, A., & Rennie, L. J. (2013). Informal science learning and education: Definition and goals. Afterschool Alliance.

Lauer, P. A., Akiba, M., Wilkerson, S. B., Apthorp, H. S., Snow, D., & Martin-Glenn, M. L. (2006). Out-of-school-time programs: A meta-analysis of effects for at-risk students. Review of Educational Research, 76 (2), 275–313.

Maiorca, C., Roberts, T., Jackson, C., Bush, S., Delaney, A., Mohr-Schroeder, M. J., & Soledad, S. Y. (2021). Informal learning environments and impact on interest in STEM careers. International Journal of Science and Mathematics Education, 19, 45–64.

Maltese, A. V., & Tai, R. H. (2011). Pipeline persistence: Examining the association of educational experiences with earned degrees in STEM among U.S. students. Science Education, 95 (5), 877–907.

Mao, P., Cai, Z., He, J., Chen, X., & Fan, X. (2021). The relationship between attitude toward science and academic achievement in science: A three-level meta-analysis. Frontiers in Psychology, 12, 784068.

Matthews, P. H., & Mellom, P. J. (2012). Shaping aspirations, awareness, academics, and action outcomes of summer enrichment programs for English-learning secondary students. Journal of Advanced Academics, 23 (2), 105–124.

*Migliarese, N. L. (2011). "That's my kind of animal!" Designing and assessing an outdoor science education program with children's megafaunaphilia in mind. (Doctoral dissertation). University of California, Berkeley. Retrieved from ProQuest Dissertation Theses Global.

Morgan, P. L., Farkas, G., Hillemeier, M. M., & Maczuga, S. (2016). Science achievement gaps begin very early, persist, and are largely explained by modifiable factors. Educational Researcher, 45 (1), 18–35.

Morrell, P. D., & Lederman, N. G. (1998). Student’s attitudes toward school and classroom science: Are they independent phenomena? School Science and Mathematics, 98 (2), 76–83.

Morris, S. B. (2008). Estimating effect sizes from pretest-posttest-control group designs. Organizational Research Methods, 11 (2), 364–386.

Murphy, C., & Beggs, J. (2003). Children’s perceptions of school science. School Science Review, 84, 109–116.

National Research Council. (2007). Taking science to school: Learning and teaching science in grades K-8. National Academies Press.

National Research Council. (2009). Learning science in informal environments: People, places, and pursuits. National Academies Press.

National Research Council, Division of Behavioral, Board on Testing, Assessment, Board on Science Education, & Committee on Highly Successful Schools or Programs for K-12 STEM Education. (2011). Successful K-12 STEM education: Identifying effective approaches in science, technology, engineering, and mathematics. National Academies Press.

National Academies of Sciences, Engineering, and Medicine. (2016). Parenting matters: Supporting parents of children ages 0–8. The National Academies Press.

Osborne, J., Simon, S., & Collins, S. (2003). Attitudes towards science: A review of the literature and its implications. International Journal of Science Education, 25 (9), 1049–1079.

*Parker, V., & Gerber, B. (2000). Effects of a science intervention program on middle-grade student achievement and attitudes. School Science and Mathematics, 100 (5), 236–242.

Parsons, J., & Taylor, L. (2011). Improving student engagement. Current Issues in Education, 14 (1).

Phillips, M., Finkelstein, D., & Wever-Frerichs, S. (2007). School site to museum floor: How informal science institutions work with schools. International Journal of Science Education, 29 (12), 1489–1507.

Potvin, P., & Hasni, A. (2014). Interest, motivation and attitude towards science and technology at K-12 levels: A systematic review of 12 years of educational research. Studies in Science Education, 50 (1), 85–129.

Poudel, D. D., Vincent, L. M., Anzalone, C., Huner, J., Wollard, D., Clement, T., DeRamus, A., & Blakewood, G. (2005). Hands-on activities and challenge tests in agricultural and environmental education. The Journal of Environmental Education, 36 (4), 10–22.

R Core Team. (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved March 10, 2020, from https://www.R-project.org/

*Roberson, S. V. (2010). Science skills on wheels: The exploration of a mobile science lab's influence on teacher and student attitudes and beliefs about science. (Ed.D dissertation). University of Pennsylvania. Retrieved from ProQuest Dissertation Theses Global.

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86, 638–641. https://doi.org/10.1037/0033-2909.86.3.638

Rothstein, H., Sutton, A. J., & Borenstein, M. (Eds.). (2005). Publication bias in meta-analysis: Prevention, assessment and adjustments. Wiley.

Sadler, P. M., Sonnert, G., Hazari, Z., & Tai, R. (2012). Stability and volatility of STEM career interest in high school: A gender study. Science Education, 96 (3), 411–427.

Sacco, K., Falk, J. H., & Bell, J. (2014). Informal science education: Lifelong, life-wide, life-deep. PLoS Biology, 12 (11), e1001986.

Shaby, N., Assaraf, O. B. Z., & Koch, N. P. (2024). Students’ interactions during laboratory group activity in a science museum. International Journal of Science and Mathematics Education, 22 (4), 703–720.

*Shields, N. C. (2010). Elementary students' knowledge and interests related to active learning in a summer camp at a zoo (Doctoral dissertation). Purdue University. Retrieved from ProQuest Dissertation Theses Global.

Silver, A., & Rushton, B. (2008). Primary-school children’s attitudes towards science, engineering and technology and their images of scientists and engineers. Education 3–13, 36 (1), 51–67.

Simpkins, S. D., Davis-Kean, P. E., & Eccles, J. S. (2006). Math and science motivation: A longitudinal examination of the links between choices and beliefs. Developmental Psychology, 42 (1), 70–83.

So, W. W. M., Zhan, Y., Chow, S. C. F., & Leung, C. F. (2018). Analysis of STEM activities in primary students’ science projects in an informal learning environment. International Journal of Science and Mathematics Education, 16, 1003–1023.

Tai, R. H., Liu, C. Q., Maltese, A. V., & Fan, X. (2006). Career choice: Planning early for careers in science. Science, 312 (5777), 1143–1144.

Thornton, A., & Lee, P. (2000). Publication bias in meta-analysis: Its causes and consequences. Journal of Clinical Epidemiology, 53 (2), 207–216. https://doi.org/10.1016/S0895-4356(99)00161-4

Tillinghast, R. C., Appel, D. C., Winsor, C., & Mansouri, M. (2020, August). STEM outreach: A literature review and definition. Paper presented at 2020 IEEE Integrated STEM Education Conference (ISEC).

Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36 (3), 1–48. https://doi.org/10.18637/jss.v036.i03

Viechtbauer, W., & Cheung, M. W. L. (2010). Outlier and influence diagnostics for meta-analysis. Research Synthesis Methods, 1 (2), 112–125. https://doi.org/10.1002/jrsm.11

Wang, M.-T. (2012). Educational and career interests in math: A longitudinal examination of the links between classroom environment, motivational beliefs, and interests. Developmental Psychology, 48 (6), 1643–1657.

Wiebe, E., Unfried, A., & Faber, M. (2018). The relationship of STEM attitudes and career interest. EURASIA Journal of Mathematics, Science and Technology Education, 14 (10), 1–17.

Wilson, A. C., Gonzalez, L. L., & Pollock, J. A. (2012). Evaluating learning and attitudes on tissue engineering: A study of children viewing animated digital dome shows detailing the biomedicine of tissue engineering. Tissue Engineering Part A, 18 (5–6), 576–586. https://doi.org/10.1089/ten.tea.2011.0242

Xie, Y., Fang, M., & Shauman, K. (2015). STEM education. Annual Review of Sociology, 41, 331–357.

Xie, Y., & Killewald, A. A. (2012). Is American science in decline? Harvard University Press.

*Yang, D., & Chittoori, B. (2022). Investigating Title I school student STEM attitudes and experience in an after-school problem-based bridge building project. Journal of STEM Education: Innovations and Research, 23 (1), 17–24.

Young, J., Ortiz, N., & Young, J. (2017). STEMulating interest: A meta-analysis of the effects of out-of-school time on student STEM interest. International Journal of Education in Mathematics, Science and Technology, 5 (1), 62–74.

Zhang, D., & Tang, X. (2017). The influence of extracurricular activities on middle school students’ science learning in China. International Journal of Science Education, 39, 1381–1402. https://doi.org/10.1080/09500693.2017.1332797

Parts of this work have been supported by the US National Science Foundation (NSF DRL 1811265). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the US National Science Foundation.

Author information

Authors and affiliations.

University of Virginia, 405 Emmet St S, Charlottesville, VA, USA

Georgia State University, 33 Gilmer Street SE, Atlanta, GA, 30303, USA

Lillian R. Bentley

The Chinese University of Hong Kong, Hong Kong SAR, China

Australian Catholic University, 25A Barker Road, Strathfield, NSW, 2135, Australia

Robert H. Tai

Contributions

All authors contributed to the study conception and design. Material preparation and data collection were performed by Xin Xia and Lillian Bentley, and the analysis was performed by Xin Xia. The first draft of the manuscript was written by Xin Xia and Lillian Bentley, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xin Xia.

Ethics declarations

Ethical approval.

Not applicable.

Competing Interests

We declare that there are no competing interests relevant to this meta-analysis. No financial or personal relationships with individuals or organizations have influenced the study design, data collection, analysis, interpretation, or the decision to publish the findings. The research was conducted with impartiality and objectivity to ensure the integrity of the meta-analysis process and the credibility of its outcomes.

Conflict of Interest

The authors have no conflicts of interest to declare. All co-authors have read and agree with the contents of the manuscript. All co-authors have no financial interest to report.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Xia, X., Bentley, L.R., Fan, X. et al. STEM Outside of School: a Meta-Analysis of the Effects of Informal Science Education on Students' Interests and Attitudes for STEM. Int J of Sci and Math Educ (2024). https://doi.org/10.1007/s10763-024-10504-z

Received: 12 February 2024

Accepted: 06 September 2024

Published: 17 September 2024

DOI: https://doi.org/10.1007/s10763-024-10504-z

  • Informal science education
  • Meta-analysis
  • Student interests and attitudes

Methodology for research I

Rakesh Garg

Department of Onco-anaesthesiology and Palliative Medicine, Dr. BRAIRCH, All India Institute of Medical Sciences, New Delhi, India

The conduct of research requires a systematic approach involving diligent planning and execution as planned. It comprises essential predefined components such as aims, population, conduct/technique, outcome and statistical considerations, which need to be objective, reliable and repeatable. Hence, an understanding of the basic aspects of methodology is essential for any researcher. This narrative review focuses on various aspects of the methodology for the conduct of clinical research. Relevant keywords were used to search the literature in various databases and in the bibliographies of the retrieved articles.

INTRODUCTION

Research is a process of acquiring new knowledge through a systematic approach involving diligent planning and interventions for the discovery or interpretation of newly gained information.[1, 2] The reliability and validity of a study’s outcome depend on a well-designed study with an objective, reliable and repeatable methodology, appropriate conduct, data collection and analysis, and logical interpretation. Inappropriate or faulty methodology would make a study unacceptable and may even give clinicians faulty information. Hence, an understanding of the basic aspects of methodology is essential.

This is a narrative review based on a search of the existing literature. It focuses on specific aspects of the methodology for the conduct of research/clinical trials. The keywords used for the literature search included ‘research’, ‘study design’, ‘study controls’, ‘study population’, ‘inclusion/exclusion criteria’, ‘variables’, ‘sampling’, ‘randomisation’, ‘blinding’, ‘masking’, ‘allocation concealment’, ‘sample size’, ‘bias’ and ‘confounders’, alone and in combination. The databases and search engines used were PubMed/MEDLINE, Google Scholar and Cochrane. The bibliographies of the retrieved articles were searched for manuscripts missed by the search engines, and print journals in the library were searched manually.

The following text describes the basic essentials of methodology that need to be adopted to conduct good research.

Aims and objectives of study

The aims and objectives of research need to be thoroughly understood and should be specified before the start of the study, based on a thorough literature search and inputs from professional experience. Aims and objectives state whether the nature of a problem (formulated as a research question or research problem) is to be investigated or whether its solution is to be sought by a different, more appropriate method. Lacunae in existing knowledge help in formulating a research question. These statements have to be objective and specific, with all required details such as population, intervention, control and outcome variables, along with the timing of interventions.[3, 4, 5] This helps in formulating a hypothesis, which is a scientifically derived statement about a particular problem in the defined population. Hypothesis generation also depends on the type of study. A researcher’s observation of any aspect initiates hypothesis generation. A cross-sectional survey generates a hypothesis. An observational study establishes associations and supports or rejects the hypothesis. An experiment finally tests the hypothesis.[5, 6, 7]

STUDY POPULATION AND PATIENT SELECTION, STUDY AREA, STUDY PERIOD

The flow of a study in an experimental design has various sequential steps [Figure 1].[1, 2, 6] Population refers to an aggregate of individuals, things, cases, etc., i.e., the observation units that are of interest and remain the focus of investigation. This reference or target population is the group to which the study outcome will be extrapolated.[6] Once the target population is identified, the researcher needs to assess whether it is possible to study all its individuals for an outcome. Usually, not all can be included, so a study population is sampled. The important attribute of a sample is that every individual should have an equal and non-zero chance of being included in the study. The sample should be drawn independently, i.e., the selection of one individual should not influence the inclusion or exclusion of another. In clinical practice, sampling is restricted to a particular place (patients attending clinics or posted for surgery) or includes multiple centres, rather than sampling the universe. Hence, the researcher should be cautious in generalising the outcomes. For example, patients in a tertiary care hospital are referred and may have more risk factors than those in primary centres, where patients with less severe disease are managed. Hence, researchers must disclose details of the study area. The study period also needs to be disclosed, as it helps readers understand the population characteristics and judge the relevance of the study to the present period.
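As an illustration of the equal-chance requirement described above, the following minimal Python sketch draws a simple random sample from a sampling frame. The function name `draw_simple_random_sample`, the patient identifiers and the fixed seed are hypothetical and are not taken from the cited literature. The sketch draws without replacement, which is only one common way of giving every unit the same chance of inclusion.

```python
import random


def draw_simple_random_sample(sampling_frame, sample_size, seed=None):
    """Return `sample_size` units drawn without replacement, so that every
    unit in the frame has the same, non-zero chance of inclusion."""
    units = list(sampling_frame)
    if not 0 < sample_size <= len(units):
        raise ValueError("sample_size must be between 1 and the frame size")
    rng = random.Random(seed)  # seeded only to make the illustration reproducible
    return rng.sample(units, sample_size)


# Hypothetical sampling frame of 200 patient identifiers (assumption for illustration)
frame = [f"patient-{i:03d}" for i in range(1, 201)]
sample = draw_simple_random_sample(frame, 20, seed=42)
```

In practice the sampling frame would be the list of eligible individuals at the study site, which is why the generalisability caveats above still apply to any sample drawn this way.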

Figure 1: Flow of an experimental study

The sample size has to be pre-determined, analytically derived and sufficiently large to represent the population.[7, 8, 9] An unnecessarily large sample would waste resources, be time-consuming and carry the risk that the true treatment effect is missed because of the heterogeneity of a large population.[6] If a study is too small, it will not provide a suitable answer to the research question. The main determinants of sample size include the clinical hypothesis, primary end-point, study design, probability of Type I and Type II errors, power and the minimum treatment difference of clinical importance.[7] Attrition of patients should be accounted for in the sample size calculation.[6, 9]
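To show how these determinants combine, the sketch below uses one widely taught normal-approximation formula for comparing two group means, n per group = 2(z<sub>1−α/2</sub> + z<sub>1−β</sub>)²σ²/δ². This formula is an assumption chosen for illustration and is not given in the source; other designs and end-points need other formulas. The function names, the standard deviation of 10, the clinically important difference of 5 and the 20% attrition rate are all hypothetical.

```python
import math
from statistics import NormalDist


def per_group_sample_size(sigma, delta, alpha=0.05, power=0.80):
    """Per-group n for a two-sided comparison of two means (normal approximation).

    sigma: assumed common standard deviation of the primary end-point
    delta: minimum treatment difference of clinical importance
    alpha: probability of a Type I error
    power: 1 minus the probability of a Type II error
    """
    if sigma <= 0 or delta <= 0:
        raise ValueError("sigma and delta must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)


def inflate_for_attrition(n, expected_attrition):
    """Number to recruit per group so that about n remain after the expected dropout."""
    if not 0 <= expected_attrition < 1:
        raise ValueError("expected_attrition must be in [0, 1)")
    return math.ceil(n / (1 - expected_attrition))


# Hypothetical inputs: SD of 10 units, clinically important difference of 5 units
n = per_group_sample_size(sigma=10, delta=5)
n_recruit = inflate_for_attrition(n, expected_attrition=0.20)
```

Halving the clinically important difference roughly quadruples the required sample, which is why the minimum treatment difference should be fixed on clinical grounds before the calculation.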

SELECTION OF STUDY DESIGN

An appropriate study design is essential for obtaining the best possible and most reliable estimate of the intervention outcome. The choice of study design is based on parameters such as the objectives, therapeutic area, treatment comparison, outcome and phase of the trial.[6] Study designs may be broadly classified as:[5, 6, 7]

  • Descriptive: Case report, case series, survey
  • Analytical: Case-control, cohort, cross-sectional
  • Experimental: Randomised controlled trial (RCT), quasi-experiment
  • Qualitative.

For studying causality, analytical observational studies would be prudent to avoid posing risk to subjects. For clinical drugs or techniques, an experimental study would be more appropriate.[6] In an RCT, the treatments are concurrent, i.e. the active and control interventions take place over the same period. The trial may have a parallel-group design, wherein the treatment and control are allocated to different individuals. This requires comparing a placebo group or a gold standard intervention (control) with the newer agent or technique.[6] In a matched-design RCT, randomisation is between matched pairs. In a cross-over design, two or more treatments are administered sequentially to the same subject, so that each subject acts as their own control. However, researchers should be aware of the ‘carryover effect’ of the previous intervention, and a suitable washout period needs to be ensured. In a cohort study design, subjects with the disease/symptom or free of the study variable are followed for a particular period. Cross-sectional studies are used to examine the prevalence of disease, to conduct surveys and to validate instruments, tools and questionnaires. Qualitative research is a study design in which a health-related issue in the population is explored with regard to its description, exploration and explanation.[6]

Selection of controls

A control is required because the disease may be self-remitting and because of the Hawthorne effect (a change in the response or behaviour of subjects when they are included in a study), the placebo effect (patients feel improvement even with placebo), the effects of confounders and co-interventions, and regression to the mean (for example, white coat hypertension, in which patients have a higher value of the study parameter at recruitment that may subsequently return to normal).[2, 6, 7] The control could be a placebo, no treatment, a different dose, regimen or intervention, or the standard (gold standard) treatment. Withholding routine care in favour of placebo is undesirable and unethical. For instance, when studying an analgesic regimen, it would be unethical not to administer analgesics in the control group. It is advisable to continue the standard of care, i.e. to provide routine analgesics even in the control group. The use of placebo or no treatment may be considered where no proven intervention currently exists, or where placebo is required to evaluate the efficacy or safety of an intervention and its use does not entail serious or irreversible harm.

The comparisons to be made among groups in the study also need to be specified.[6, 7, 9] These comparisons may aim to demonstrate superiority, non-inferiority or equivalence among groups. Superiority trials demonstrate superiority either to a placebo in a placebo-controlled trial or to an active control treatment. Non-inferiority trials aim to show that the efficacy of an intervention is no worse than that of the active comparator. Equivalence trials demonstrate that the outcomes of two or more interventions differ by a clinically unimportant margin, so that either technique or drug may be clinically acceptable.
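The three types of comparison can be sketched in terms of a confidence interval for the treatment difference and a pre-specified margin. This confidence-interval reading is a common convention and an assumption here; the source does not prescribe it. The function name `interpret_difference`, the sign convention (new minus control, higher is better) and the example numbers are hypothetical.

```python
def interpret_difference(ci_lower, ci_upper, margin):
    """Classify a confidence interval for (new - control), where higher is better.

    margin: the largest difference regarded as clinically unimportant (> 0).
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    return {
        # whole interval lies above zero
        "superior": ci_lower > 0,
        # interval excludes a deficit as large as the margin
        "non_inferior": ci_lower > -margin,
        # whole interval lies within (-margin, +margin)
        "equivalent": ci_lower > -margin and ci_upper < margin,
    }


# Hypothetical result: difference CI of (-1.0, 2.0) with a margin of 3.0
result = interpret_difference(-1.0, 2.0, margin=3.0)
```

The margin has to be chosen on clinical grounds and fixed before the data are analysed; the same interval can support different conclusions under different margins.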

STUDY TOOLS

Study tools such as measurement scales, questionnaires and scoring systems need to be specified with an objective definition. These tools should be validated before use, and their appropriate use by the research staff is mandatory to avoid bias. They should be simple and easily understood by everyone involved in the study.

Inclusion/exclusion criteria

In clinical research, a specific, relatively homogeneous group of patients needs to be selected.[6] Inclusion and exclusion criteria define who can be included in or excluded from the study sample. The inclusion criteria identify the study population in a consistent, reliable, uniform and objective manner. The exclusion criteria comprise factors or characteristics that make otherwise eligible individuals ineligible for the study. These factors may be confounders for the outcome parameter. For example, patients with liver disease would be excluded if coagulation parameters would affect the outcome. The exclusion criteria are applied to individuals who already meet the inclusion criteria.

VARIABLES: PRIMARY AND SECONDARY

Variables are definite characteristics/parameters that are being studied. Clear, precise and objective definition for measurement of these characteristics needs to be defined.[ 2 ] These should be measurable and interpretable, sensitive to the objective of the study and clinically relevant. The most common end-point is related to efficacy, safety and quality of life. The study variables could be primary or secondary.[ 6 ] The primary end-point, usually one, provides the most relevant, reliable and convincing evidence related to the aim and objective. It is the characteristic on the basis of which research question/hypothesis has been formulated. It reflects clinically relevant and important treatment benefits. It determines the sample size. Secondary end-points are the other objectives indirectly related to primary objective with regard to its close association or they may be some associated effects/adverse effects related to intervention. The measurement timing of the variables must be defined a priori . These are usually done at screening, baseline and completion of trial.

The study end-point parameter may be clinical or surrogate in nature. A clinical end-point is related directly to the clinical implications of the intervention, that is, to its beneficial outcome. A surrogate end-point is indirectly related to patient clinical benefit and is usually a laboratory measurement or physical sign used as a substitute for a clinically meaningful end-point. Surrogate end-points are more convenient, easily measurable, repeatable and faster.

SAMPLING TECHNIQUES: RANDOMISATION, BLINDING/MASKING AND ALLOCATION CONCEALMENT

Randomisation

Randomisation or random allocation is a method to allocate individuals into one of the groups (arms) of a study.[ 1 , 2 ] It is the basic assumption required for statistical analysis of data. Randomisation maximises statistical power, especially in subgroup analyses, and minimises selection bias and allocation bias (or confounding). It distributes all characteristics, measured or non-measured, visible or invisible and known or unknown, equally across the groups. Randomisation uses various strategies as per the study design and outcome.

Probability sampling/randomisation

  • Simple/unrestricted: Each individual of the population has the same chance of being included in the sample. This is used when the population is small and homogeneous and a sampling frame is available. Examples are the lottery method, a table of random numbers or computer-generated random numbers
  • Stratified: It is used in a non-homogeneous population. The population is divided into homogeneous groups (strata), and the sample is drawn from each stratum at random. It keeps the ‘characteristics’ of the participants (for example, age, weight or physical status) as similar as possible across the study groups. The allocation to strata can be by equal or proportional allocation
  • Systematic: This is used when a complete and up-to-date sampling frame is available. The first unit is selected at random and the rest are selected automatically according to some pre-designed pattern
  • Cluster: This applies to a large geographical area. The population is divided into a finite number of distinct and identifiable units (sampling units/elements). A group of such elements is a cluster, and sampling of these clusters is done. All units of the selected clusters are included in the study
  • Multistage: This applies to large nationwide surveys. Sampling is done in stages using random sampling. Here, sub-sampling within the selected clusters is done. If the procedure is repeated over a larger number of stages, it is termed multistage sampling
  • Multiphase: Here, some data are collected from all the units of a sample, and other data are collected from a sub-sample of the units constituting the original sample (two-phase sampling). If three or more phases are used, it is termed multiphase sampling.
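
The first three strategies in the list above can be illustrated with a minimal Python sketch. The function names, the `fraction` parameter (proportional allocation) and the fixed seeds are assumptions made for illustration only; a real study would follow the sampling plan in its protocol.

```python
import random


def simple_random_sample(frame, n, seed=0):
    """Simple/unrestricted: every unit in the frame has the same chance."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def systematic_sample(frame, n, seed=0):
    """Systematic: random first unit, then every k-th unit of the frame."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the size of the frame")
    rng = random.Random(seed)
    k = len(frame) // n          # sampling interval
    start = rng.randrange(k)     # random start within the first interval
    return [frame[start + i * k] for i in range(n)]


def stratified_sample(frame, stratum_of, fraction, seed=0):
    """Stratified: split into strata, then sample each stratum at random.

    Uses proportional allocation: the same fraction is drawn from each stratum.
    """
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample
```

For example, `stratified_sample(list(range(100)), lambda u: u < 30, 0.1)` draws three units from the smaller stratum and seven from the larger one, so the sample mirrors the strata sizes in the frame.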

Non-probability sampling/randomisation

This technique does not give equal and non-zero chances to all the individuals in the population to be selected in the sample.

  • Convenience: Sampling is done as per the convenience of the investigator, i.e., easily available
  • Purposive/judgemental/selective/subjective: The sample is selected as per judgement of investigator
  • Quota: It is done as per judgement of the interviewer based on some specified characteristics such as sex and physical status.

ALLOCATION CONCEALMENT

Allocation concealment refers to the process of ensuring that the person who enrols participants remains unaware of the arm to which the next participant will be allotted.[ 8 , 9 , 10 ] It is a strategy to avoid ascertainment or selection bias. For example, anticipating a particular outcome, a researcher may recruit less sick patients to one group and sicker patients to the other. This selective recruitment would underestimate (if the treatment group is sicker) or overestimate (if the control group is sicker) the intervention effect.[ 9 ] The allocation should be concealed from the investigator until the initiation of the intervention. Hence, randomisation should be performed by an independent person who is not involved in the conduct of the study or its monitoring. The randomisation list is kept secret. The methods of allocation concealment include:[ 9 , 10 ]

  • Central randomisation: Some centrally independent authority performs randomisation and informs the investigators via telephone, E-mail or fax
  • Pharmacy controlled: Here, pharmacy provides coded drugs for use
  • Sequentially numbered containers: Identical containers equal in weight, similar in appearance and tamper-proof are used
  • Sequentially numbered, opaque, sealed envelopes: The randomised allocations are concealed in opaque envelopes that are opened just before the intervention. This is the most common method and the easiest to perform.
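
The idea behind central randomisation can be sketched in a few lines of Python. This is a hedged illustration, not a validated trial system: the class name `CentralAllocator`, its methods and the use of simple (unrestricted) randomisation are all assumptions. The point it shows is that the allocation list stays hidden, and an arm is revealed only after a participant has been irreversibly registered.

```python
import random


class CentralAllocator:
    """Hypothetical central randomisation service kept by an independent party."""

    def __init__(self, n_participants, arms=("treatment", "control"), seed=None):
        rng = random.Random(seed)
        # The full sequence is generated up front and never exposed to investigators.
        self._sequence = [rng.choice(arms) for _ in range(n_participants)]
        self._log = []  # (participant_id, arm) pairs, in order of enrolment

    def enrol(self, participant_id):
        """Register a participant first, then reveal that participant's arm."""
        if any(pid == participant_id for pid, _ in self._log):
            raise ValueError("participant already enrolled")
        if len(self._log) >= len(self._sequence):
            raise RuntimeError("allocation list exhausted")
        arm = self._sequence[len(self._log)]
        self._log.append((participant_id, arm))
        return arm

    def audit_log(self):
        """Copy of the enrolment log, for monitoring after the fact."""
        return list(self._log)
```

Because the investigator only ever calls `enrol`, the next assignment cannot be seen before the decision to recruit a given patient has been made, which is the property that sealed envelopes and coded containers also aim for.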

BLINDING/MASKING

Blinding ensures that the group to which the study subjects are assigned is not known or easily ascertained by those who are ‘masked’, i.e., participants, investigators, evaluators or statisticians, to limit the occurrence of bias.[ 1 , 2 ] It requires that the intervention and the standard or placebo treatment appear the same. Blinding is different from allocation concealment. Allocation concealment is done before, whereas blinding is done at and after, the initiation of treatment. In situations such as study drugs with different formulations or medical versus surgical interventions, blinding may not be feasible.[ 8 ] Sham blocks or needling in subjects may not be ethical. In such situations, the outcome measurement should be made as objective as possible to avoid bias, and whoever can be masked should be blinded. The research manuscript must mention the details of blinding, including who was blinded after assignment to interventions and the process or technique used. Blinding could be:[ 8 , 9 ]

  • Unblinded: The group assignment is not concealed
  • Single blind: One of the participants, investigators or evaluators remains masked
  • Double-blind: The investigator and participants remain masked
  • Triple blind: Not only the investigator and participants but also those analysing the data remain masked.

BIAS AND CONFOUNDERS

Bias is a systematic deviation from the real, true effect (towards a better or worse outcome) resulting from faulty study design.[ 1 , 2 ] The various steps of a study, such as randomisation, concealment, blinding, objective measurement and strict protocol adherence, reduce bias.

The various possible and potential biases in a trial can be:[ 7 ]

  • Investigator bias: An investigator either consciously or subconsciously favours one group over the other
  • Evaluator bias: The investigator taking the end-point variable measurement intentionally or unintentionally favours one group over the other. It is more common with subjective or quality of life end-points
  • Performance bias: It occurs when the participant knows of the exposure to the intervention or its response, be it inactive or active
  • Selection bias: This occurs due to the sampling method, such as admission bias (selective factors for admission), non-response bias (those who refuse to participate may differ from those who participate) or a sample that is not representative of the population
  • Ascertainment or information bias: It occurs due to measurement error or misclassification of patients. Examples are diagnostic bias (more diagnostic procedures performed in cases than in controls) and recall bias (error of categorisation, or the investigator searching more aggressively for exposure variables in cases)
  • Allocation bias: Allocation bias occurs when the measured treatment effect differs from the true treatment effect
  • Detection bias: It occurs when observations in one group are not as vigilantly sought as in the other
  • Attrition bias/loss-to-follow-up bias: It occurs when patient is lost to follow-up preferentially in a particular group.

Confounding occurs when outcome parameters are affected by the effects of other factors not directly relevant to the research question.[ 1 , 7 ] For example, if the impact of a drug on haemodynamics is studied in hypertensive patients, then diabetes mellitus would be a confounder, as it also affects the haemodynamic response to autonomic disturbances. Hence, it is prudent to consider all potential confounders carefully at the design stage of a study. If the confounders are known, they can be adjusted for statistically, but with a loss of precision (statistical power). Confounding can therefore be controlled either by preventing it or by adjusting for it in the statistical analysis. It can be controlled by restriction in the study design (for example, restricting the age range to 2-6 years), matching (use of constraints in the selection of the comparison group so that the study and comparison groups have a similar distribution with regard to the potential confounder), stratification in the analysis without matching (restriction of the analysis to narrow ranges of the extraneous variable) and mathematical modelling in the analysis (use of advanced statistical methods such as multiple linear regression and logistic regression). Strategies during data analysis include stratified analysis using the Mantel-Haenszel method to adjust for confounders, a matched design approach, data restriction and model fitting using regression techniques.
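
As an illustration of stratified adjustment, the Mantel-Haenszel pooled odds ratio combines the 2×2 table from each stratum of the confounder as OR_MH = Σ(a·d/n) / Σ(b·c/n), where a and b are exposed subjects with and without the outcome, c and d are unexposed subjects with and without the outcome, and n is the stratum total. The sketch below is a minimal Python version; the function names and the example counts are hypothetical and chosen only to show the calculation.

```python
def mantel_haenszel_odds_ratio(strata):
    """Pooled odds ratio across strata of a confounder.

    Each stratum is a tuple (a, b, c, d):
      a = exposed with outcome,   b = exposed without outcome,
      c = unexposed with outcome, d = unexposed without outcome.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


def crude_odds_ratio(strata):
    """Odds ratio after collapsing all strata into one table (no adjustment)."""
    a = sum(s[0] for s in strata)
    b = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    d = sum(s[3] for s in strata)
    return (a * d) / (b * c)


# Hypothetical counts: one table per level of the confounder
# (for instance, patients without and with diabetes mellitus).
example_strata = [(10, 90, 5, 95), (40, 60, 25, 75)]
adjusted = mantel_haenszel_odds_ratio(example_strata)
crude = crude_odds_ratio(example_strata)
```

Comparing `adjusted` with `crude` gives a rough indication of how much the stratifying variable distorts the unadjusted estimate. With a single stratum the two values coincide.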

A basic understanding of the methodology is essential for a reliable, repeatable and clinically acceptable outcome. The study plan, including all its components, needs to be designed before the start of the study, and the study protocol should be strictly adhered to during the conduct of the study.

Financial support and sponsorship

Conflicts of interest

There are no conflicts of interest.

  • Open access
  • Published: 14 September 2024

"No Papers, No Treatment": a scoping review of challenges faced by undocumented immigrants in accessing emergency healthcare

  • Sezer Kisa &
  • Adnan Kisa

International Journal for Equity in Health volume 23, Article number: 184 (2024)

Undocumented immigrants face many obstacles in accessing emergency healthcare. Legal uncertainties, economic constraints, language differences, and cultural disparities lead to delayed medical care and thereby exacerbate health inequities. Addressing the healthcare needs of this vulnerable group is crucial for both humanitarian and public health reasons. Comprehensive strategies are needed to ensure equitable health outcomes.

This study aimed to identify and analyze the barriers undocumented immigrants face in accessing emergency healthcare services and the consequences on health outcomes.

We used a scoping review methodology that adhered to established frameworks. Searching MEDLINE/PubMed, Embase, Web of Science, PsycINFO, and the Cumulative Index to Nursing and Allied Health Literature (CINAHL), we identified 153 studies, of which 12 met the inclusion and exclusion criteria and focused on the specific challenges that undocumented immigrants encounter when accessing emergency healthcare services.

The results show that undocumented immigrants encounter significant barriers to emergency healthcare, including legal, financial, linguistic, and cultural challenges. Key findings were the extensive use of emergency departments as primary care due to lack of insurance and knowledge of alternatives, challenges faced by health professionals in providing care to undocumented migrants, increased hospitalizations due to severe symptoms and lack of healthcare access among undocumented patients, and differences in emergency department utilization between irregular migrants and citizens. The findings also serve as a call for enhanced healthcare accessibility and the dismantling of existing barriers to mitigate the adverse effects on undocumented immigrants' health outcomes.

Conclusions

Undocumented immigrants' barriers to emergency healthcare services are complex and multifaceted and therefore require multifaceted solutions. Policy reforms, increased healthcare provider awareness, and community-based interventions are crucial for improving access and outcomes for this vulnerable population. Further research should focus on evaluating the effectiveness of these interventions and exploring the broader implications of healthcare access disparities.

Introduction

People who live without legal authorization in a foreign country form a significant global demographic [ 1 ]. The terms "immigrant" and "migrant" are often used interchangeably in this context; however, "immigrant" typically refers to individuals who move to another country with the intention of permanent settlement, whereas "migrant" can refer to those who move temporarily, often for work, and may not intend to stay permanently [ 2 ]. Estimates suggest there are approximately 281 million international migrants worldwide, a substantial portion of whom lack legal status in their host countries [ 3 ]. For instance, in the United States alone, it is estimated that there are around 10.5 million undocumented immigrants, representing about 3.2% of the total U.S. population [ 4 ]. Similarly, in the European Union, there are an estimated 3.9 to 4.8 million undocumented migrants [ 5 ]. These individuals face many obstacles in accessing healthcare. Such obstacles include lack of health insurance, fear of deportation, ineligibility for government programs, and language and cultural differences [ 1 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 ]. Addressing their healthcare needs is crucial not only from a humanitarian perspective but also for public health, as their exclusion from healthcare systems has serious consequences [ 15 , 16 ].

Studies found that financial barriers to healthcare included high out-of-pocket payments, high service prices, fragmented financial support, limited funding capacity, fear of deportation, and delayed referral [ 12 , 17 ]. Geographic challenges also play a role, with many migrants living in areas where healthcare facilities are either overwhelmed or scarce. These barriers hinder not only access to routine care but also emergency services, contributing to wider public health concerns [ 7 , 12 , 17 , 18 , 19 ].

In emergency care situations, undocumented immigrants face even greater challenges. They often avoid essential treatment due to financial problems and fear of legal actions [ 1 , 6 , 10 , 12 , 17 , 18 ]. Even when they do seek emergency care, they often encounter language and cultural differences that can lead to misunderstandings and inappropriate treatment [ 7 , 12 ]. This avoidance of essential care not only endangers their health but also affects the health of the community [ 10 , 11 , 13 ].

Although extensive searches were conducted, no systematic reviews were found that specifically addressed the difficulties undocumented immigrants have in accessing emergency care. The phrase "No Papers, No Treatment," used in the title of this study, reflects the harsh reality that undocumented immigrants often face when seeking healthcare. This phrase, which has been echoed in various advocacy platforms and public discussions, encapsulates the severe barriers to care that this population experiences. This scoping review aims to bridge this gap by examining those very challenges. The objectives of this review are threefold: 1) to identify the specific barriers encountered; 2) to understand the reported consequences of these barriers on undocumented immigrants; and 3) to examine the solutions that have been proposed to improve their access to emergency care. By undertaking this study, we aim to provide a foundational understanding of the complexities involved in access to emergency healthcare for undocumented immigrants, thereby contributing to the body of knowledge and suggesting pathways for future research and policy development. This is the first study to address this neglected issue in healthcare research and policy.

Methodology

This scoping review was designed by integrating the methodologies described by Arksey and O'Malley (2005) [ 20 ] and further refined by Levac et al. (2010) [ 21 ]. The research team consisted of two reviewers, who are also the authors of this work. These reviewers formulated the main research objectives and outlined the review by defining the search terms, identifying the databases for the literature search, and establishing the inclusion and exclusion criteria. We selected the MEDLINE/PubMed, Embase, Web of Science, PsycINFO, and Cumulative Index to Nursing and Allied Health Literature (CINAHL) databases due to their extensive coverage of medical, psychological, and health literature. The search terms were chosen to cover a wide array of relevant components: ("emergency" OR "emergency care") AND ("undocumented immigrants" OR "illegal immigrants" OR "unauthorized immigrants" OR "undocumented migrants" OR "irregular migrants"). This ensured the inclusion of literature that specifically addressed barriers faced by undocumented immigrants in accessing emergency care.
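
The Boolean structure of this search string (synonyms joined with OR inside each concept, concepts joined with AND) can be reproduced with a short Python sketch. The helper names `or_block` and `build_query` are hypothetical, and the output is a generic string: each database has its own field tags and syntax, which are not shown here.

```python
def or_block(terms):
    """Join the synonyms for one concept with OR, each as a quoted phrase."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*concept_blocks):
    """Join the concept blocks with AND so every concept must be present."""
    return " AND ".join(or_block(block) for block in concept_blocks)


# Terms taken from the search strategy described in the text.
setting_terms = ["emergency", "emergency care"]
population_terms = [
    "undocumented immigrants",
    "illegal immigrants",
    "unauthorized immigrants",
    "undocumented migrants",
    "irregular migrants",
]

query = build_query(setting_terms, population_terms)
```

Keeping the term lists as data makes it easier to report the exact strategy and to rerun it unchanged in each database.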

The search and selection processes were conducted by both reviewers. Duplicates were removed, followed by two parallel and separate screenings of titles and abstracts by each reviewer. The full-text review and data extraction were also performed independently by each reviewer, with any disagreements resolved through discussion. Our scoping review did not include a formal quality assessment of the included studies, in line with Arksey & O'Malley's (2005) [ 20 ] recommendations for scoping reviews. We limited our review to peer-reviewed research articles that examined undocumented immigrants' barriers to emergency care and were published in English up to February 29, 2024. Studies were excluded if they did not focus on undocumented immigrants in accessing emergency care, were not related to undocumented immigrants, were not based on empirical research, or were published in languages other than English. This extensive selection process resulted in a total of 12 studies for the final review (Fig.  1 ).
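
The selection logic described above (remove duplicates, then apply the eligibility criteria) can be sketched as follows. This is an illustrative outline only: the `Record` fields are hypothetical, and the Boolean flags stand in for judgements that the two reviewers made by reading titles, abstracts and full texts.

```python
from dataclasses import dataclass
from datetime import date

CUTOFF = date(2024, 2, 29)  # last publication date stated in the text


@dataclass(frozen=True)
class Record:
    title: str
    language: str
    published: date
    peer_reviewed: bool
    empirical: bool
    undocumented_focus: bool       # reviewer judgement
    emergency_access_focus: bool   # reviewer judgement


def deduplicate(records):
    """Keep the first record for each title, ignoring case and spacing."""
    seen, unique = set(), []
    for record in records:
        key = " ".join(record.title.casefold().split())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def exclusion_reasons(record):
    """Return the list of criteria a record fails (empty means eligible)."""
    reasons = []
    if not record.undocumented_focus:
        reasons.append("not related to undocumented immigrants")
    if not record.emergency_access_focus:
        reasons.append("not focused on access to emergency care")
    if not record.empirical:
        reasons.append("not based on empirical research")
    if not record.peer_reviewed:
        reasons.append("not a peer-reviewed research article")
    if record.language != "English":
        reasons.append("not published in English")
    if record.published > CUTOFF:
        reasons.append("published after the cut-off date")
    return reasons


def screen(records):
    """Split de-duplicated records into included and excluded-with-reasons."""
    included, excluded = [], []
    for record in deduplicate(records):
        reasons = exclusion_reasons(record)
        if reasons:
            excluded.append((record, reasons))
        else:
            included.append(record)
    return included, excluded
```

Recording a reason for every exclusion is what allows the numbers at each stage of a flow diagram to be reconstructed later.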

figure 1

All findings were entered in EndNote (version 21). The data from the included studies, which related to characteristics such as author, publication year, study design and participants, sample size, study purpose, and key findings were extracted and charted by the first author in Excel to address the research objectives.

This review uncovered 12 studies on emergency care use by undocumented individuals in the United States [ 13 , 18 , 22 , 23 , 24 ], Switzerland [ 25 ], Denmark [ 9 ], French Guiana [ 10 ], Israel [ 19 ], Norway [ 15 , 26 ], and Spain [ 16 ]. The methodologies of the studies varied. They encompassed six cross-sectional surveys [ 10 , 13 , 18 , 19 , 22 , 24 ], one prospective cohort design [ 25 ], one historical cohort study [ 15 ], one case-control study [ 23 ], one observational cross-sectional study [ 26 ], and two qualitative studies [ 9 , 16 ]. Notably, the study by Jiménez-Lasserrotte et al. (2023) included valuable insights from nurses who were directly involved in the care of child migrants, highlighting their critical role in health and social triage, as well as in addressing the immediate health needs of this vulnerable population. Sample sizes varied significantly across these studies, ranging from small-scale qualitative interviews with 12 participants [ 9 ] to large-scale analyses involving over half a million individuals [ 19 ]. The studies were published between 1996 and 2023.

Key findings were the excessive use of emergency departments for primary care due to lack of insurance and knowledge of alternatives [ 22 ], challenges faced by health professionals in providing care to undocumented migrants [ 9 ], increased hospitalizations due to severe symptoms, and lack of healthcare access [ 10 , 23 ], and differences in emergency department utilization between irregular migrants and citizens [ 19 ] (Table  1 ).

Barriers to accessing emergency healthcare

Barriers to accessing emergency care were broadly categorized under six themes: linguistic, financial, legal, cultural, health literacy, and other (Table  2 ).

Lack of health insurance [ 9 , 10 , 13 , 19 , 22 , 23 , 24 , 25 ], restricted medical benefits [ 22 ], high costs associated with healthcare [ 10 , 25 ], financial constraints due to unemployment or underemployment [ 19 ], and exclusion from general practitioner and reimbursement schemes [ 15 ] were reported as the financial barriers to emergency care.

Most of the legal barriers were related to one's undocumented status and lack of entitlements, such as a health insurance card or identity number [ 9 , 10 , 15 , 16 , 19 , 22 , 23 , 25 , 26 ]. Fear of being reported to authorities [ 13 , 22 , 24 ] was mentioned in three studies. Administrative hurdles and systemic healthcare challenges, which include complications due to lack of proper documentation or previous medical records and the inefficiencies within the healthcare system itself, were also reported [ 9 , 15 , 26 ].

Transportation issues and lack of childcare were among the other barriers that prevented timely access to emergency healthcare [ 18 ]. Geographical remoteness and the complexity of health insurance systems [ 10 ], the patchwork system of safety net care (which is especially relevant to emergency renal disease care and the inconsistency in healthcare policies) [ 23 ], and structural vulnerabilities such as poor working and living conditions [ 15 , 26 ] were other assorted factors affecting migrants’ access to and utilization of healthcare services.

Consequences of barriers

The costs of these identified barriers were increased reliance on emergency departments as primary care sources, higher rates of unfunded visits, and delays in treatment [ 22 ]; unintended pregnancies, delayed prenatal care, increased exposure to violence during pregnancy [ 25 ]; and limited access resulting in neglect of preventive care and excessive emergency service use [ 13 , 18 ]. The researchers also identified disparities such as: unequal access to primary care, delayed treatment, and administrative burdens [ 9 ]; fears leading to delayed healthcare access and higher emergency severity [ 24 ]; extended emergency department stays and lower hospitalization rates for non-severe conditions [ 19 ]; substandard antenatal care and related risks [ 15 , 26 ]; more severe conditions upon hospital arrival and higher hospitalization rates [ 10 ]; and specific issues such as increased emergent dialysis usage and associated costs [ 23 ] (Table  3 ).

Suggested solutions

The studies advocate for systemic changes to improve healthcare accessibility and quality for undocumented immigrants. Free or low-cost services and culturally appropriate education [ 25 ], increased social and economic resources [ 13 ], information dissemination through trusted sources [ 18 ], legal clarification and language support [ 9 ], patient education about confidentiality and health rights [ 24 ], initiatives to improve healthcare access for undocumented migrants and affordable insurance options [ 10 ], and inclusive Medicaid policies [ 23 ] were all recommended. Furthermore, comprehensive care that addresses health, social, and emotional aspects, with culturally adapted and coordinated approaches, was also suggested [ 16 , 19 ] (Table  3 ).

Research gaps and future directions

The studies identified several significant gaps and future research needs in healthcare access for undocumented immigrants. These include understanding the impacts of legislative measures [ 22 ], access to care without documentation [ 13 , 25 ], improving prenatal care, variations in emergency room use, effects of information sources, and structural impacts on healthcare-seeking behaviors [ 18 ]. Other urgent areas for research are the impact of fear on healthcare access, ensuring understanding of a patient's rights and confidentiality, exploring health needs in regions with significant migrant populations, understanding intersections of immigration status with ethnicity in care disparities, and focusing on healthcare access and community care strategies for migrants [ 9 , 19 , 23 ]. Finally, investigating comprehensive care pathways, uncovering structural vulnerabilities that affect health coverage, and developing enhanced protocols for vulnerable migrant populations are imperative for future healthcare improvement and policy development [ 10 , 24 ] (Table  3 ). Additionally, there is a notable lack of qualitative insight from undocumented immigrants/migrants themselves regarding their experiences and perspectives on accessing emergency healthcare. Future research should prioritize capturing these first-hand accounts to better understand the nuanced challenges faced by this population and to inform more effective and empathetic policy interventions.

This scoping review aimed to identify and synthesize research on the challenges faced by undocumented immigrants in accessing emergency healthcare. The objectives were to identify specific barriers to care, understand the consequences of those barriers, and explore proposed solutions to improve access. Despite differences in methodologies, participants, and regional focus, the studies highlighted the urgent need for systemic reform to improve healthcare accessibility for undocumented populations.

Barriers to accessing emergency care

Ensuring equitable access to safe, well-organized, and high-quality emergency care services for all individuals in need can help mitigate health disparities [ 27 ]. However, several barriers were found that prevent undocumented immigrants from accessing emergency care. Most significantly, the fear of deportation led immigrants to avoid healthcare facilities [ 23 , 24 ]. Asch et al. found that individuals who feared seeing a doctor lest they get reported to the immigration authorities were nearly four times more prone to delaying care for over two months, increasing the risk of disease transmission [ 28 ]. Brenner et al. noted that deportation fears forced undocumented immigrants with end-stage renal disease (ESRD) to seek emergency care only when their condition became life-threatening [ 29 ].

Cultural and linguistic barriers further complicate these challenges. Many immigrants rely on social media or friends for health information due to a lack of trust in healthcare systems [ 24 ]. Granero-Molina et al. [ 30 ] note that health providers struggle to provide care due to language barriers and cultural misunderstandings. Additionally, transportation issues, childcare responsibilities, and systemic inefficiencies hinder timely access to care, particularly in emergencies [ 15 , 18 , 26 ].

Structural vulnerabilities also play a role, as immigrants often live and work in environments that limit their access to healthcare [ 15 , 26 ]. DuBard and Massing emphasize that healthcare access for undocumented immigrants is further impeded by the complexity of health insurance systems [ 31 ]. These systemic barriers result in a system where undocumented immigrants rely on emergency departments, leading to overcrowding and increased costs [ 22 , 23 ]. Hsia and Gil-González note that legal ambiguities and administrative barriers exacerbate challenges in providing consistent healthcare access to undocumented immigrants [ 32 ].

Barriers to emergency care have many consequences for undocumented immigrants. Relying on emergency departments for primary care leads to delays in treatment, worsening conditions, and higher hospitalization rates [ 10 , 22 ]. Pregnant undocumented women risk delayed prenatal care and exposure to violence [ 15 , 25 , 26 ]. Limited access to primary care results in untreated conditions becoming acute emergencies [ 19 ]. For patients with chronic conditions such as ESRD, limited access to regular hemodialysis forces them to rely on emergency departments for emergency-only hemodialysis (EOHD), resulting in higher morbidity, mortality, and costs [ 23 , 33 ]. Patients receiving EOHD often experience severe symptoms such as hyperkalemia and uremia before seeking emergency care [ 34 ]. Clinicians providing EOHD also report significant moral distress due to the substandard care they have to provide [ 33 , 35 ]. In addition, cultural barriers during emergency triage contribute to inadequate care for undocumented immigrants, particularly those arriving by small boats in Europe [ 30 ]. Although our study did not specifically examine mental health conditions, it is well-documented that undocumented immigrants frequently experience significant mental health challenges due to the stress of living in uncertain conditions. This is particularly concerning in emergency department settings, where overcrowding and limited resources often result in inadequate mental health care for this vulnerable population.

Proposed solutions

Addressing these challenges requires systemic improvements to healthcare access and quality for undocumented immigrants. Cervantes et al. [ 34 ] argue that enhancing access to primary and preventive care through free or low-cost services and culturally appropriate education can help reduce the reliance on emergency departments for non-emergency conditions. Nandi et al. (2008) [ 13 ] emphasized the need for increased social and economic resources.

Legal clarification and policy changes that explicitly include undocumented immigrants in healthcare systems are essential. Improved access to primary care, coupled with patient education about their rights and the confidentiality of healthcare services, can alleviate fears related to immigration status [ 9 , 24 ]. Affordable health insurance options and inclusive Medicaid (a joint federal and state program in the United States that provides health coverage to eligible low-income individuals and families) policies would significantly improve access to care and reduce the financial burden on safety-net programs [ 10 , 23 ]. Brenner et al. (2021) [ 29 ] argue that systemic efforts to improve public health, reduce the effects of injury and illness, and secure access to emergency and basic health care for all must involve policies that prioritize care over immigration enforcement.

Programs that enhance access to primary care and consider broader inclusion policies can improve outcomes for undocumented immigrants [ 19 ]. The inclusion of diverse healthcare provider perspectives, such as those of nurses, as seen in Jiménez-Lasserrotte et al. (2023), is crucial for developing comprehensive care strategies that address the unique needs of undocumented populations. Addressing structural vulnerabilities, including working and living conditions, is essential for improving healthcare access and quality. Accessible antenatal care and comprehensive healthcare that addresses physical, social, and emotional needs are crucial for vulnerable populations [ 16 ]. Addressing legislative barriers and reducing administrative burdens, as highlighted by the challenges faced in Spain, is also essential for ensuring equitable healthcare access [ 32 ]. By focusing on these systemic changes, healthcare systems can better accommodate the needs of undocumented immigrants, ensuring they receive the necessary care without unnecessary legal and administrative obstacles. Cultural mediation can help to bridge gaps in understanding between healthcare providers and undocumented immigrants [ 30 ].

Significant research gaps remain in understanding the full extent of healthcare challenges faced by undocumented immigrants. Further research is needed to understand the impact of legislative measures on healthcare access [ 22 ]. Additionally, studies should explore the influence of one's undocumented status on healthcare access and outcomes, especially in prenatal care [ 13 , 25 ]. Comprehensive studies on emergency room use, information sources, and structural barriers to healthcare are needed [ 18 ].

More comprehensive studies on healthcare access and quality for undocumented immigrants are required to inform effective policies [ 9 ]. Addressing the impact of fear on healthcare access, along with strategies to ensure that immigrants understand their rights, is critical [ 24 ]. Research should focus on developing effective community care strategies to overcome healthcare barriers for migrant populations [ 19 ]. Understanding the structural vulnerabilities affecting health coverage is imperative for future care improvement and policy development [ 15 , 26 ]. Further research should also explore the impact of administrative barriers and the challenges of policy implementation, as seen in Spain, to develop more effective solutions [ 32 ]. Additionally, research should prioritize examining the mental health challenges faced by undocumented immigrants, particularly in emergency settings. Given the limited resources in emergency departments, there is a critical need for targeted interventions that address these mental health needs to improve care and outcomes for this population.

Limitations

This review has several limitations. First, a restriction to English-language publications may have excluded important studies published in other languages and limited the global representativeness of our findings. Second, the exclusion of gray literature sources, such as reports and conference abstracts, may have overlooked valuable insights, restricting the breadth and depth of our review. Third, the heterogeneous methodologies employed across included studies introduced variability and could have complicated direct comparison and synthesis of findings. These limitations emphasize the need for careful interpretation and draw attention to areas where methodological improvements are needed in future research.

In conclusion, this comprehensive review found a diverse range of barriers faced by undocumented immigrants in accessing emergency healthcare services. Legal, financial, linguistic, cultural, and systemic factors collectively contribute to adverse health outcomes and strain emergency healthcare systems. Proposed solutions encompass policy initiatives such as enacting inclusive healthcare policies, together with community-based interventions like culturally tailored education and improved information dissemination. Further research is needed to understand the intersectionality of barriers, evaluate the effectiveness of proposed interventions, and assess the impact of legislative measures on healthcare access. By dismantling structural barriers, fostering cultural competency, and prioritizing the healthcare needs of undocumented immigrants, policymakers and practitioners can advance health equity agendas and foster a more inclusive healthcare landscape. Overall, addressing the diverse barriers to emergency healthcare access for undocumented immigrants is crucial for promoting health equity and improving public health outcomes. We will only achieve a truly healthy society when all its members, documented and otherwise, receive the care they need and deserve.

Availability of data and materials

All data generated or analyzed during this study are included in this published article.

Data availability

No datasets were generated or analysed during the current study.

Martinez O, Wu E, Sandfort T, Dodge B, Carballo-Dieguez A, Pinto R, et al. Evaluating the impact of immigration policies on health status among undocumented immigrants: a systematic review. J Immigr Minor Health. 2015;17(3):947–70.

International Organization for Migration (IOM). Glossary on Migration. 2019. Available from: https://publications.iom.int/books/international-migration-law-ndeg34-glossary-migration .

United Nations Department of Economic and Social Affairs. International migration 2020 highlights. United Nations. 2020. Available from: https://www.un.org/development/desa/pd/news/international-migration-2020 .

Pew Research Center. Facts on U.S. immigrants. 2020. Available from: https://www.pewresearch.org/race-and-ethnicity/2020/08/20/facts-on-u-s-immigrants/ .

International Organization for Migration. World migration report 2020. 2020.

Clifford N, Blanco N, Bang SH, Heitkemper E, Garcia AA. Barriers and facilitators to healthcare for people without documentation status: a systematic integrative literature review. J Adv Nurs. 2023;79(11):4164–95.

Hacker K, Anies M, Folb BL, Zallman L. Barriers to health care for undocumented immigrants: a literature review. Risk Manag Healthc Policy. 2015;8:175–83.

Jacquez F, Vaughn L, Zhen-Duan J, Graham C. Health care use and barriers to care among latino immigrants in a new migration area. J Health Care Poor Underserved. 2016;27(4):1761–78.

Jensen NK, Norredam M, Draebel T, Bogic M, Priebe S, Krasnik A. Providing medical care for undocumented migrants in Denmark: what are the challenges for health professionals? BMC Health Serv Res. 2011;11:154.

Jolivet A, Cadot E, Angénieux O, Florence S, Lesieur S, Lebas J, Chauvin P. Use of an emergency department in Saint-Laurent du Maroni, French guiana: does being undocumented make a difference? J Immigr Minor Health. 2014;16(4):586–94.

Metcalf M, Comey D, Hines D, Chavez-Reyes G, Moyce S. "Because We Are Afraid": voices of the undocumented in a new immigrant destination in the United States. J Public Health Policy. 2024;45(2):367–77.

Molina RL, Beecroft A, Pazos Herencia Y, Bazan M, Wade C, DiMeo A, et al. Pregnancy care utilization, experiences, and outcomes among undocumented immigrants in the United States: a scoping review. Womens Health Issues. 2024;34(4):370–80.

Nandi A, Galea S, Lopez G, Nandi V, Strongarone S, Ompad DC. Access to and use of health services among undocumented Mexican immigrants in a US urban area. Am J Public Health. 2008;98(11):2011–20.

Vargas Bustamante A, Fang H, Garza J, Carter-Pokras O, Wallace SP, Rizzo JA, Ortega AN. Variations in healthcare access and utilization among Mexican immigrants: the role of documentation status. J Immigr Minor Health. 2012;14(1):146–55.

Eick F, Vallersnes OM, Fjeld HE, Sørbye IK, Storkås G, Ekrem M, et al. Use of non-governmental maternity services and pregnancy outcomes among undocumented women: a cohort study from Norway. BMC Pregnancy Childbirth. 2022;22(1):789.

Jiménez-Lasserrotte MDM, Artés-Navarro R, Granero-Molina J, Fernández-Medina IM, Ruiz-Fernández MD, Ventura-Miranda MI. Experiences of healthcare providers who provide emergency care to migrant children who arriving in Spain by small boats (Patera): a qualitative study. Children (Basel). 2023;10(6):1079.

Cabral J, Cuevas AG. Health inequities among latinos/hispanics: documentation status as a determinant of health. J Racial Ethn Health Disparities. 2020;7(5):874–9.

Akincigil A, Mayers RS, Fulghum FH. Emergency room use by undocumented Mexican immigrants. J Sociol Soc Welf. 2011;38(4):33–50.

Shachaf S, Davidovitch N, Halpern P, Mor Z. Utilization profile of emergency department by irregular migrants and hospitalization rates: lessons from a large urban medical center in Tel Aviv, Israel. Int J Equity Health. 2020;19(1):56.

Arksey H, O’Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol. 2005;8(1):19–32.

Levac D, Colquhoun H, O’Brien KK. Scoping studies: advancing the methodology. Implement Sci. 2010;5:69.

Chan TC, Krishel SJ, Bramwell KJ, Clark RF. Survey of illegal immigrants seen in an emergency department. West J Med. 1996;164(3):212–6.

Madden EF, Qeadan F. Dialysis hospitalization inequities by hispanic ethnicity and immigration status. J Health Care Poor Underserved. 2017;28(4):1509–21.

Maldonado CZ, Rodriguez RM, Torres JR, Flores YS, Lovato LM. Fear of discovery among Latino immigrants presenting to the emergency department. Acad Emerg Med. 2013;20(2):155–61.

Wolff H, Epiney M, Lourenco AP, Costanza MC, Delieutraz-Marchand J, Andreoli N, et al. Undocumented migrants lack access to pregnancy care and prevention. BMC Public Health. 2008;8:93.

Eick F, Vallersnes OM, Fjeld HE, Sørbye IK, Ruud SE, Dahl C. Use of emergency primary care among pregnant undocumented migrants over ten years: an observational study from Oslo, Norway. Scand J Prim Health Care. 2023;41(3):317–25.

World Health Organization. Emergency care systems for universal health coverage: ensuring timely care for the acutely ill and injured. Draft resolution proposed by Argentina, Ecuador, Eswatini, Ethiopia, Israel, the European Union and its Member States and the United States of America. 2019.

Asch S, Frayne S, Waitzkin H. To discharge or not to discharge: ethics of care for an undocumented immigrant. J Health Care Poor Underserved. 1995;6(1):3–9.

Brenner JM, Blutinger E, Ricke B, Vearrier L, Kluesner NH, Moskop JC. Ethical issues in the access to emergency care for undocumented immigrants. J Am Coll Emerg Phys Open. 2021;2(3):e12461.

Granero-Molina J, Jiménez-Lasserrotte MDM, Fernández-Sola C, Hernández-Padilla JM, Sánchez Hernández F, López DE. Cultural issues in the provision of emergency care to irregular migrants who arrive in Spain by small boats. J Transcult Nurs. 2019;30(4):371–9.

DuBard CA, Massing MW. Trends in emergency Medicaid expenditures for recent and undocumented immigrants. JAMA. 2007;297(10):1085–92.

Hsia RY, Gil-González D. Perspectives on Spain’s legislative experience providing access to healthcare to irregular migrants: a qualitative interview study. BMJ Open. 2021;11(8):e050204.

Cervantes L, Richardson S, Raghavan R, Hou N, Hasnain-Wynia R, Wynia MK, et al. Clinicians’ perspectives on providing emergency-only hemodialysis to undocumented immigrants: a qualitative study. Ann Intern Med. 2018;169(2):78–86.

Cervantes L, Tuot D, Raghavan R, Linas S, Zoucha J, Sweeney L, et al. Association of emergency-only vs standard hemodialysis with mortality and health care use among undocumented immigrants with end-stage renal disease. JAMA Intern Med. 2018;178(2):188–95.

Welles CC, Cervantes L. Barriers to providing optimal dialysis care for undocumented immigrants: policy challenges and solutions. Semin Dial. 2020;33(1):52–7.

Open access funding provided by OsloMet - Oslo Metropolitan University

Author information

Sezer Kisa and Adnan Kisa contributed equally to this work.

Authors and Affiliations

Department of Nursing and Health Promotion, Faculty of Health Sciences, Oslo Metropolitan University, Oslo, Norway

School of Health Sciences, Kristiania University College, Oslo, Norway

Department of International Health and Sustainable Development, Tulane University, New Orleans, USA

Contributions

AK, SK: conceptualization, methodology, acquisition of data, interpretation of data; AK: writing (original draft preparation); SK: reviewing and editing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Sezer Kisa .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Kisa, S., Kisa, A. "No Papers, No Treatment": a scoping review of challenges faced by undocumented immigrants in accessing emergency healthcare. Int J Equity Health 23, 184 (2024). https://doi.org/10.1186/s12939-024-02270-9

Received: 12 June 2024

Accepted: 06 September 2024

Published: 14 September 2024

DOI: https://doi.org/10.1186/s12939-024-02270-9

  • Emergency healthcare
  • Health equity
  • Public health
  • Undocumented immigrants

International Journal for Equity in Health

ISSN: 1475-9276

  • Study protocol
  • Open access
  • Published: 17 September 2024

Synbiotics in patients at risk for spontaneous preterm birth: protocol for a multi-centre, double-blind, randomised placebo-controlled trial (PRIORI)

  • Katrien Nulens   ORCID: orcid.org/0000-0002-6718-1454 1 , 2 ,
  • Els Papy 1 ,
  • Katrien Tartaglia 3 ,
  • Isabelle Dehaene 4 ,
  • Hilde Logghe 5 , 6 ,
  • Joachim Van Keirsbilck 6 ,
  • Frédéric Chantraine 7 ,
  • Veronique Masson 7 ,
  • Eva Simoens 8 ,
  • Willem Gysemans 9 ,
  • Liesbeth Bruckers 10 ,
  • Sarah Lebeer 11 ,
  • Camille Nina Allonsius 11 ,
  • Eline Oerlemans 11 ,
  • Deborah Steensels 12 , 13 ,
  • Marijke Reynders 14 ,
  • Dirk Timmerman 2 , 15 ,
  • Roland Devlieger 2 , 15 &
  • Caroline Van Holsbeke 1 , 5 , 6  

Trials volume 25, Article number: 615 (2024)

Prematurity remains one of the main causes of neonatal morbidity and mortality. Approximately two thirds of preterm births are spontaneous, i.e. secondary to preterm labour, preterm prelabour rupture of membranes (PPROM) or cervical insufficiency. Etiologically, the vaginal microbiome plays an important role in spontaneous preterm birth (sPTB). Vaginal dysbiosis and bacterial vaginosis are well-known risk factors for ascending lower genital tract infections and sPTB, while a Lactobacillus crispatus-dominated vaginal microbiome is associated with term deliveries. Synbiotics may help to achieve and/or maintain a normal, Lactobacillus-dominated vaginal microbiome.

We will perform a multi-centre, double-blind, randomised, placebo-controlled trial. Women aged 18 years or older with a singleton pregnancy are eligible for inclusion at 8 0/7 –10 6/7  weeks gestational age if they have one or more of the following risk factors for sPTB: previous sPTB at 24 0/7 –35 6/7  weeks, prior PPROM before 36 0/7  weeks, or spontaneous pregnancy loss at 14 0/7 –23 6/7  weeks of gestation. Exclusion criteria are multiple gestation, cervix conisation, inflammatory bowel disease, uterine anomaly, and the use of pro-/pre-/synbiotics. Patients will be randomised to oral synbiotics or placebo, starting before 11 weeks of gestation until delivery. The oral synbiotic consists of eight Lactobacillus species (including L. crispatus) and prebiotics. The primary outcome is the gestational age at delivery. Vaginal microbiome analysis once per trimester (at approximately 9, 20, and 30 weeks) and delivery will be performed using metataxonomic sequencing (16S rRNA gene) and microbial culture. Secondary outcomes include PPROM, the use of antibiotics, antenatal admission information, and neonatal outcomes.

This study will evaluate the effect of oral synbiotics on the vaginal microbiome during pregnancy in a high-risk population and correlate the microbial changes with the gestational age at delivery and relevant pregnancy outcomes.

Trial registration

ClinicalTrials.gov, NCT05966649. Registered on April 5, 2024.

Administrative information

Note: the numbers in curly brackets in this protocol refer to SPIRIT checklist item numbers. The order of the items has been modified to group similar items (see http://www.equator-network.org/reporting-guidelines/spirit-2013-statement-defining-standard-protocol-items-for-clinical-trials/ ).

Title {1}

Synbiotics in patients at risk for preterm birth: a multi-centre double-blind randomised placebo-controlled trial (PRIORI).

Trial registration {2a and 2b}.

ClinicalTrials.gov ID: NCT05966649

Protocol version {3}

Version 3.1 (April 2024)

Funding {4}

Belgian Health Care Knowledge Centre (KCE)

Author details {5a}

Katrien NULENS , Els PAPY , Katrien TARTAGLIA , Isabelle DEHAENE , Hilde LOGGHE , Joachim VAN KEIRSBILCK , Frédéric CHANTRAINE , Veronique MASSON , Eva SIMOENS , Willem GYSEMANS , Liesbeth BRUCKERS , Sarah LEBEER , Camille Nina ALLONSIUS , Eline OERLEMANS , Deborah STEENSELS , Marijke REYNDERS , Dirk TIMMERMAN , Roland DEVLIEGER , Caroline VAN HOLSBEKE

Ziekenhuis Oost-Limburg, Department of Obstetrics and Gynaecology, Genk, Belgium

KULeuven, Department of Development and Regeneration, Cluster Woman and Child, Leuven, Belgium

Clinical Trial Unit, Ziekenhuis Oost-Limburg, Genk, Belgium

Ghent University Hospital, Department of Obstetrics and Gynaecology, Ghent, Belgium

AZ Sint-Lucas Bruges, Department of Obstetrics and Gynaecology, Bruges, Belgium

AZ Sint-Jan Bruges, Department of Obstetrics and Gynaecology, Bruges, Belgium

Hopital Citadelle, CHU Liège, Department of Obstetrics and Gynaecology, Liège, Belgium

AZ Groeninge, Department of Obstetrics and Gynaecology, Kortrijk, Belgium

Ziekenhuis Oost-Limburg, Department of Paediatrics and Neonatal Intensive Care Unit, Genk, Belgium

I-Biostat, Data Science Institute, Hasselt University, Diepenbeek, Belgium

Department of Bioscience Engineering, Research Group Applied Microbiology and Biotechnology, University of Antwerp, Antwerp, Belgium

Ziekenhuis Oost-Limburg, Department of Microbiology, Genk, Belgium

Université Libre de Bruxelles, Faculty of Medicine, Brussels, Belgium

AZ Sint-Jan Bruges, Department of Microbiology, Bruges, Belgium

University Hospitals Leuven, Department of Obstetrics and Gynaecology, Leuven, Belgium

Name and contact information for the trial sponsor {5b}

Ziekenhuis Oost Limburg Autonome Verzorgingsinstelling (ZOL AV), Campus Sint-Jan, Synaps Park 1, 3600 Genk, Belgium.

Main contact person: Katrien Tartaglia. Phone: + 32 89 21 2020 / E-mail: [email protected]

Legal contact person: Myriam Goemans. Phone: + 32 89 80 8012 / E-mail: [email protected]

Role of sponsor {5c}

ZOL AV acts as sponsor of the clinical trial, as defined in the Law of 2004, and has all responsibilities and liabilities in connection therewith and procures the mandatory liability insurance coverage in accordance with the Law of 2004.

Introduction

Background and rationale {6a}.

Preterm birth (PTB), defined as delivery before 37 weeks of gestation, remains the main cause of neonatal mortality and severe, potentially lifelong morbidity [ 1 ]. Moreover, the costs associated with the care for premature neonates, as well as long-term health problems, place an important economic burden on parents and health care systems. The overall PTB rate is around 10% worldwide, with large regional differences ranging from 4 to 16% [ 2 ]. Approximately two-thirds of all premature deliveries are non-iatrogenic or spontaneous, following preterm labour, preterm prelabour rupture of membranes (PPROM), or cervical insufficiency without prodromal labour [ 3 , 4 , 5 ].

Etiologically, spontaneous PTB (sPTB) is a multifactorial condition of which the exact cause and mechanism cannot be identified in most patients. However, it is hypothesised that intrauterine inflammation secondary to an ascending infection from the lower genital tract plays an important role in a significant proportion of sPTB cases [ 3 , 6 , 7 ]. The knowledge about the vaginal microbiome is evolving quickly. Besides microscopy, bacterial cultures, and polymerase chain reaction (PCR) tests, next-generation sequencing (NGS) is nowadays able to map a vaginal microbiome and provide more insights in physiologic versus pathologic microbiome changes during pregnancy. Hence it is known that a disturbed vaginal microbiome, especially bacterial vaginosis (BV), early in pregnancy is an important risk factor for (subclinical) chorioamnionitis, PPROM, and sPTB [ 8 , 9 , 10 ]. On the other hand, a Lactobacillus—particularly L. crispatus—dominated vaginal microbiome in the first trimester is strongly predictive for term delivery [ 11 ]. While microbial diversity decreases throughout pregnancy in patients with term deliveries, a progressive depletion of lactobacilli and increasing diversity is observed in pregnancies complicated by PPROM and subsequent PTB, with a maximal diversity, and thus microbial instability, between 24 and 30 weeks of gestation [ 12 ]. This new evidence suggests that a progressively disturbed microbiome from early pregnancy on triggers the sPTB cascade and precedes clinically evident and culture/PCR-detectable infections. Consequently, instead of treating (asymptomatic) infections or secondary complications such as preterm labour and PPROM, interventions to prolong pregnancy duration in patients at risk for sPTB should start early.

A personal history of sPTB is a major risk factor for sPTB in subsequent pregnancies [ 4 , 13 , 14 , 15 ]. However, interventions that interfere with the pathophysiology of sPTB in these high-risk patients are currently lacking. While antibiotics are effective in treating infections and BV in pregnancy, they have no effect on PTB rates [ 16 , 17 ]. Recent literature suggests that supporting the vaginal microbiome with probiotics, rather than treating infections, could be a promising strategy [ 12 ]. Probiotics are ‘live microorganisms that, when administered in adequate amounts, confer a health benefit on the host’ (WHO definition [ 18 ], modified in the International Scientific Association for Probiotics and Prebiotics (ISAPP) consensus statement [ 19 ]), while a prebiotic is ‘a substrate that is selectively utilised by host microorganisms conferring a health benefit’ (ISAPP consensus statement [ 20 ]). Probiotic lactobacilli significantly increase vaginal Lactobacillus counts and stabilise the vaginal flora in patients with BV or dysbiosis without causing adverse effects [ 21 , 22 ]. Previous studies with probiotics in pregnancy could not consistently demonstrate a favourable effect on pregnancy duration. However, clinical trials were very heterogeneous in terms of patient population, primary outcome, study design, probiotic composition (bacterial species and strains), route of administration (oral or vaginal), timing, and duration of probiotic intake and were often underpowered to assess an effect on PTB rate or gestational age at delivery [ 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 ]. Based on the available literature, probiotics should probably be started early and continued throughout pregnancy to create a stable, Lactobacillus-(preferably L. crispatus-)dominated vaginal microbiome. Lactobacillus crispatus-containing probiotics are promising, as L. crispatus is considered a biomarker of a healthy vaginal ecosystem and predictor for term birth [ 11 , 38 , 39 , 40 ].

Objectives {7}

The aim of this study is to investigate the effect of oral synbiotics, a combination of probiotic bacteria and prebiotics, on the vaginal microbiome composition and on pregnancy duration in patients at risk for sPTB. The main hypothesis states that Lactobacillus crispatus containing synbiotics, when started early in pregnancy, support and maintain a healthy vaginal microbiome that is more resistant to ascending infections and associated with term delivery. It is expected that synbiotics can prolong pregnancy duration through favourable microbiome changes and that this higher gestational age (GA) at delivery translates into improved neonatal outcomes.

Trial design {8}

The PRIORI trial is a double-blind, randomised placebo-controlled trial wherein pregnant women at risk for sPTB will be recruited early in the first trimester of pregnancy and randomised in a 1:1 ratio into two parallel groups: an intervention (synbiotic) group and a control (placebo) group. The primary analysis of the primary endpoint (GA at delivery) is a superiority analysis comparing the intervention group with the control group.

An internal pilot study will be integrated in the full randomised controlled trial (RCT) to assess the effect of the oral synbiotics on the vaginal microbiome in the first 154 consecutively recruited patients. If the vaginal microbiome analyses after 4 weeks of treatment are in favour of the synbiotic, an independent data monitoring committee will decide to continue to the second phase (completion of the trial).

Methods: participants, interventions, and outcomes

Study setting {9}.

The recruiting hospitals are mainly tertiary-level centres, both community and academic hospitals with a high-risk antenatal ward (also called maternal intensive care unit or MIC) and a neonatal intensive care unit (NICU) or N*. Only centres that have fulfilled all the duties with regard to study selection and training will be allowed to randomise patients. Recruitment will start in seven Belgian teaching hospitals: Ziekenhuis Oost-Limburg, University Hospitals Leuven, Ghent University Hospital, Citadelle Hospital CHU Liège, AZ Sint-Lucas Bruges, AZ Sint-Jan Bruges, and AZ Groeninge Kortrijk. An up-to-date list of all study sites can be found on the PRIORI website: https://www.prioritrial.be/en/deelnemende-ziekenhuizen .

Eligibility criteria {10}

The study population consists of pregnant women at risk for sPTB based on their obstetric history. Inclusion and exclusion criteria are summarised in Table 1. Spontaneous preterm birth is defined as delivery at viable preterm gestation (24 0/7 until 35 6/7 weeks) following preterm labour, PPROM, or cervical insufficiency. Spontaneous late pregnancy loss is defined as delivery at previable gestation (14 0/7 –23 6/7 weeks) following preterm labour, PPROM, or cervical insufficiency. Written informed consent must be obtained from the participant or authorised surrogate before inclusion.
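The eligibility rules can be expressed as a simple screening check. The following Python sketch is illustrative only: it encodes the criteria as summarised in the abstract of this protocol (Table 1 is not reproduced here), and the `Candidate` record, its field names, and the helper functions are hypothetical, not part of the trial's data systems. Gestational ages written as "weeks days/7" are converted to total days.

```python
from dataclasses import dataclass, field


def ga_days(weeks: int, days: int = 0) -> int:
    """Convert a gestational age written as 'weeks days/7' into total days."""
    if not 0 <= days <= 6:
        raise ValueError("days must be between 0 and 6")
    return weeks * 7 + days


@dataclass
class Candidate:
    # Hypothetical screening record; all field names are illustrative.
    age_years: int
    singleton: bool
    ga_at_inclusion: int  # gestational age in days, see ga_days()
    consent_signed: bool = True
    prior_spontaneous_deliveries: list = field(default_factory=list)  # GA in days
    prior_pprom: list = field(default_factory=list)  # GA in days
    cervix_conisation: bool = False
    inflammatory_bowel_disease: bool = False
    uterine_anomaly: bool = False
    uses_pro_pre_or_synbiotics: bool = False


def has_risk_factor(c: Candidate) -> bool:
    """At least one obstetric-history risk factor for sPTB is present."""
    sptb = any(ga_days(24, 0) <= g <= ga_days(35, 6)
               for g in c.prior_spontaneous_deliveries)
    late_loss = any(ga_days(14, 0) <= g <= ga_days(23, 6)
                    for g in c.prior_spontaneous_deliveries)
    pprom = any(g < ga_days(36, 0) for g in c.prior_pprom)
    return sptb or late_loss or pprom


def is_eligible(c: Candidate) -> bool:
    """Apply the inclusion criteria, then the exclusion criteria."""
    meets_inclusion = (
        c.consent_signed
        and c.age_years >= 18
        and c.singleton
        and ga_days(8, 0) <= c.ga_at_inclusion <= ga_days(10, 6)
        and has_risk_factor(c)
    )
    meets_exclusion = (
        not c.singleton  # multiple gestation
        or c.cervix_conisation
        or c.inflammatory_bowel_disease
        or c.uterine_anomaly
        or c.uses_pro_pre_or_synbiotics
    )
    return meets_inclusion and not meets_exclusion
```

For example, a 31-year-old with a singleton pregnancy at 9 2/7 weeks and one previous spontaneous delivery at 31 4/7 weeks passes this check, whereas the same patient presenting at 11 0/7 weeks does not.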

Who will take informed consent? {26a}

Written informed consent will be obtained from eligible patients by signing an informed consent form after detailed information is provided by a trained and delegated physician (maternal-foetal medicine specialist, gynaecologist or resident in obstetrics and gynaecology). The Principal Investigator retains overall responsibility for the informed consent of participants at their site and ensures that any person with delegated responsibility to participate in the informed consent process is duly authorised, trained, and competent according to the ethically approved protocol, principles of Good Clinical Practice (GCP) and Declaration of Helsinki.

Additional consent provisions for collection and use of participant data and biological specimens {26b}

By signing the informed consent form, the study participants agree with collecting both maternal and neonatal clinical data and with taking and analysing biological specimens as described in the protocol (vaginal swabs, placental pathology, urine culture, and placental swabs in sPTB cases).

Interventions

Explanation for the choice of comparators {6b}.

Patients in the control group will take an oral placebo that visually and physically matches the investigational product. All study participants will receive standard care (e.g. vaginal progesterone, serial cervical length measurements, etc.).

Intervention description {11a}

Pregnant women at risk for sPTB who are eligible for inclusion will be included and randomised into the intervention or control group at 8 0/7 –10 6/7 weeks of gestation. All study participants will start taking synbiotics or placebo, respectively, immediately after treatment allocation and continue until delivery. The daily dose is two capsules, one in the morning and one in the evening. Patients will be instructed about normal personal intimate hygiene and advised to avoid excessive genital cleaning or vaginal washings.

The investigational product (IP) is an oral synbiotic containing eight probiotic Lactobacillus strains, in total 2 × 10¹⁰ colony-forming units (CFU) per daily dose of two capsules, together with the prebiotics inulin, fructooligosaccharides (FOS), and D-mannose; see Table 2. The excipients are magnesium bisglycinate, magnesium stearate, and silicon dioxide. Capsules with enteric coating will be used to ensure delayed release for both the IP and placebo. The placebo capsules will only contain the excipients mentioned above. IP and placebo will be stored below 25°C.

Criteria for discontinuing or modifying allocated interventions {11b}

Previous studies demonstrated the safety of pro- and synbiotics in pregnancy [ 41 , 42 ]. Therefore, we do not anticipate serious adverse events or complications secondary to taking either the IP or the placebo that could necessitate treatment discontinuation or modification.

Strategies to improve adherence to interventions {11c}

Patients will be given a diary that includes reporting the intake of the IP or placebo, a visit schedule, and contact information. The diary will be reviewed by a delegated team member and discussed with the patient on every study visit. Adherence to the intervention will be checked based on the patient’s diary entries and the number of returned, unused capsules of IP or placebo.
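The capsule-count part of this check can be sketched as a simple ratio. The protocol does not state a formula or an adherence threshold, so the calculation below is an assumption based on the conventional pill-count approach; the function name and parameters are hypothetical. Only the daily dose of two capsules comes from the protocol.

```python
from datetime import date

CAPSULES_PER_DAY = 2  # daily dose stated in the protocol


def adherence_by_capsule_count(dispensed: int, returned: int,
                               start: date, end: date) -> float:
    """Fraction of expected capsules that were presumably taken.

    Assumes every capsule that was dispensed but not returned was taken,
    which is the usual (and imperfect) pill-count assumption.
    """
    days = (end - start).days
    if days <= 0:
        raise ValueError("end must be after start")
    if not 0 <= returned <= dispensed:
        raise ValueError("returned must be between 0 and dispensed")
    expected = CAPSULES_PER_DAY * days
    return (dispensed - returned) / expected
```

A value well below 1 suggests missed doses, and a value above 1 suggests lost capsules or a counting error. In either case the diary entries would be needed to interpret the result.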

Relevant concomitant care permitted or prohibited during the trial {11d}

At each study visit (including unscheduled visits such as hospital admissions), concomitant medication is checked and documented by the principal investigator (PI) or delegated team member. Because certain interventions may influence the study outcomes, standardised treatment protocols for indications that are relevant for the research question were made in collaboration with all participating centres. Recommendations were made for the use of vaginal progesterone (200 mg once daily for the indication of sPTB prevention), tocolysis, corticosteroids for foetal lung maturation, magnesium sulphate for neuroprotection, cervical cerclage or pessary placement, antifungal medication for symptomatic Candidiasis, and antibiotics for PPROM, Group B Streptococci (GBS) prophylaxis, symptomatic BV, and common sexually transmitted infections. The use of any pre-, pro-, or synbiotic other than the IP is not allowed during the study.

Provisions for post-trial care {30}

Not applicable. Post-trial care consists of routine postpartum follow-up, which is not influenced by trial participation. We do not anticipate any harm from trial participation.

Outcomes {12}

The primary outcome is the gestational age at delivery in weeks plus days, expressed as mean and standard deviation (SD) for both groups. Secondary outcomes include PTB rates in subcategories based on the GA at delivery (extreme PTB from 24 0/7 until 27 6/7 weeks, severe PTB from 28 0/7 until 31 6/7 weeks, and moderate to late PTB from 32 0/7 until 36 6/7 weeks of gestation) expressed as number ( n ) and proportion (%), PPROM rates ( n , %) and GA at PPROM (weeks + days), vaginal microbiome analysis (see further), midstream urine culture at 16 weeks of gestation, GBS screening at 35 to 37 weeks of gestation, placental pathology, and neonatal outcomes (see further).
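The PTB subcategories above map directly onto gestational age at delivery. A minimal Python sketch of that mapping follows; the function name and the returned labels are illustrative, and the "term" label for deliveries from 37 0/7 weeks onward follows from the definition of PTB as delivery before 37 weeks given in the background section.

```python
def classify_delivery(weeks: int, days: int = 0) -> str:
    """Classify gestational age at delivery ('weeks days/7') into the
    PTB subcategories used for the secondary outcomes."""
    if not 0 <= days <= 6:
        raise ValueError("days must be between 0 and 6")
    ga = weeks * 7 + days  # gestational age in days
    if ga >= 37 * 7:
        return "term"
    if ga >= 32 * 7:
        return "moderate to late PTB"  # 32 0/7 to 36 6/7 weeks
    if ga >= 28 * 7:
        return "severe PTB"            # 28 0/7 to 31 6/7 weeks
    if ga >= 24 * 7:
        return "extreme PTB"           # 24 0/7 to 27 6/7 weeks
    return "previable (before 24 0/7 weeks)"
```

Working in whole days avoids rounding problems at the category boundaries, for example between 27 6/7 and 28 0/7 weeks.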

The vaginal microbiome will be examined once per trimester in order to correlate the effect of the oral synbiotic on the microbiome with pregnancy duration and duration of intake. Vaginal swabs will be taken at inclusion (8 0/7 to 10 6/7 weeks), at 19 0/7 to 21 0/7 weeks, at 29 0/7 to 31 0/7 weeks, at delivery, and upon admission at the high-risk antenatal ward for threatened preterm birth (PPROM, preterm labour or short cervix). The vaginal microbiome will be analysed by bacterial culture and metataxonomic 16S rRNA gene sequencing. Because of potential interference with NGS analysis, patients on vaginal progesterone for PTB prevention are instructed to hold the dose of progesterone the evening before that study visit and to resume immediately after the study visit. Furthermore, the following swabs will be sampled during the PRIORI trial, frozen, and stored for metataxonomic sequencing with alternative funding: one vaginal swab for NGS at day 0, 3, 6, 9, 14, and 28 as long as the patient is admitted after PPROM, placental swabs in sPTB cases, and neonatal meconium swabs in PPROM cases. The results of the swabs mentioned above, sampled in the context of this study, will stay blind until the end of the trial and will not influence the patient’s care.

During the internal pilot phase of the RCT, additional vaginal swabs will be taken during one extra study visit 4 weeks after the start of treatment. The primary outcomes of the internal pilot study are (1) the difference in total Lactobacillus abundance after 4 weeks of treatment compared to baseline by metataxonomic sequencing and (2) the vaginal detection of the Lactobacillus gasseri strain of the IP after 4 weeks of treatment by quantitative PCR (qPCR). The choice for L. gasseri as qPCR target is based upon the relatively low prevalence of natural L. gasseri dominance (< 10%), as compared to L. crispatus (around 40%) [ 43 ], in our European patient population (recently confirmed in the large-scale Belgian Isala project, Antwerp University), and the technical limitations in strain-specificity of qPCR.

Significant differences in pregnancy duration may be reflected in improved neonatal outcomes. Based on the Delphi consensus and in line with the Core Outcome Measures in Effectiveness Trials (COMET) initiative [ 44 ], we selected the following neonatal outcomes: neonatal mortality, birth weight, necrotising enterocolitis, bronchopulmonary dysplasia, intraventricular haemorrhage, encephalopathy of prematurity, infectious parameters (duration and number of antibiotic courses, early and late-onset sepsis), duration and type of respiratory support, surfactant administration, retinopathy, other neonatal morbidity, and duration of neonatal admission.

The economic impact and quality of life (QoL) will be assessed using the Work Productivity and Activity Impairment (WPAI) and EQ-5D questionnaires, respectively, during visits 2, 3, 4, and 5, at unscheduled visits (at admission and after 1 week), at delivery, and during the neonatal follow-up period.

For continuous variables, mean and SD will be presented by study group and the difference (treatment effect) will also be presented with a 95% confidence interval. For binary variables, counts and percentages will be presented by the study group. The odds ratio, comparing the intervention group with the control group, and 95% confidence interval will also be presented.
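As an illustration of the summary measures described above, the sketch below computes an unadjusted difference in means and an unadjusted odds ratio, each with a 95% confidence interval. It is not the trial's analysis code (the protocol specifies mixed models in SAS). The function names are hypothetical, the mean-difference interval uses a large-sample normal approximation, and the odds ratio interval uses the standard Wald method on the log scale.

```python
import math
from statistics import NormalDist, mean, stdev

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value


def mean_difference_ci(intervention, control):
    """Difference in means (intervention - control) with a 95% CI.

    Uses a normal approximation with unpooled variances (assumption).
    """
    diff = mean(intervention) - mean(control)
    se = math.sqrt(
        stdev(intervention) ** 2 / len(intervention)
        + stdev(control) ** 2 / len(control)
    )
    return diff, diff - Z95 * se, diff + Z95 * se


def odds_ratio_ci(events_int, n_int, events_ctl, n_ctl):
    """Odds ratio (intervention vs control) with a Wald 95% CI."""
    a, b = events_int, n_int - events_int
    c, d = events_ctl, n_ctl - events_ctl
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for a Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - Z95 * se_log)
    upper = math.exp(math.log(odds_ratio) + Z95 * se_log)
    return odds_ratio, lower, upper
```

With purely illustrative counts of 10/100 events in the intervention group and 20/100 in the control group, `odds_ratio_ci(10, 100, 20, 100)` gives an odds ratio of 4/9, that is, lower odds of the event in the intervention group.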

Participant timeline {13}

The schedule of enrolment, interventions, and assessments can be found in the schematic diagram in Fig. 1 .

Fig. 1 Schedule of enrolment, interventions, and assessments. (1) Visit number and gestational age in weeks (w). (2) Admission on the high-risk antenatal ward for preterm labour, PPROM or short cervix. (3) New informed consent forms will be signed by the mother for the collection of neonatal data. (4) Height is only measured on visit 1 to calculate start Body Mass Index. (5) Microbial culture and metataxonomic sequencing. (6) Transvaginal ultrasound (TVUS). (7) Group B Streptococci (GBS) rectovaginal swab. (8) Quality of life questionnaire. (9) Work Productivity and Activity Impairment questionnaire.

Sample size {14}

The sample size calculation is derived from the primary outcome: gestational age at delivery (continuous variable). To detect a clinically relevant difference in pregnancy duration of 1 week between the intervention and control group with 90% statistical power, assuming an SD of 3 weeks [ 27 , 28 ] and an alpha of 0.05, 382 patients are required in a 1:1 randomisation. These power calculations were based on a two-sided two-sample t -test. Information regarding the correlation between patients from the same centre, quantified by the intraclass correlation coefficient, is currently unavailable and thus could not be factored into the sample size calculation. However, it is anticipated that the intraclass correlation coefficient for gestational age at delivery is small, and therefore, the conducted power calculations are deemed applicable.

Anticipating that the primary outcome may not be available for a small proportion of the randomised patients due to reasons such as lost to follow-up or withdrawal of consent, a dropout rate of 5% is accounted for to maintain a power of 90%. Consequently, the total number of patients to be recruited for the trial is calculated as 402.
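The figures above can be approximately reproduced outside SAS. The sketch below is not the authors' calculation: it uses the normal-approximation formula for a two-sample comparison of means with Guenther's small-sample correction for the t -test, and it assumes the 5% dropout allowance was applied by inflating the total by 5% and rounding up (the text does not state the exact rule). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta: float, sd: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Approximate per-group sample size for a two-sided two-sample t-test.

    Normal approximation plus Guenther's correction term z_a**2 / 4.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    n = 2 * ((z_a + z_b) * sd / delta) ** 2 + z_a ** 2 / 4
    return math.ceil(n)


per_group = n_per_group(delta=1.0, sd=3.0)   # difference of 1 week, SD of 3 weeks
total = 2 * per_group                         # 1:1 randomisation
total_with_dropout = math.ceil(total * 1.05)  # assumed 5% inflation rule

print(per_group, total, total_with_dropout)
```

Under these assumptions the sketch yields 191 patients per group, 382 in total, and 402 after the dropout allowance, in line with the numbers reported in the text.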

The sample size calculation was conducted using SAS for Windows, version 9.4.

Recruitment {15}

Patients will be recruited in seven Belgian teaching hospitals within a period of approximately 36 months. Depending on the recruitment speed, more sites will be activated to enrol 402 participants. The initial approach for pre-screening potential patients will be made by a member of the patient’s existing clinical care team. Only physicians who are members of the PRIORI study team will inform the patient. If the treating physician is not a member of the PRIORI study team, he or she can refer the patient to a PRIORI investigator. The PRIORI investigator will confirm the eligibility of the patient, and the screening process can start once written informed consent has been obtained.

Assignment of interventions: allocation

Sequence generation {16a}

An automated web-based system is used to randomly assign patients in a 1:1 ratio with variable block sizes, stratified by smoking status (‘yes’ versus ‘no’ for smoking in the last 12 weeks before randomisation) within each study centre.
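To illustrate the general technique, the sketch below implements stratified permuted-block randomisation with randomly varying block sizes. It is not the trial's web-based system. The block sizes (2, 4, 6), the class name, and the reading of the strata as smoking status within study centre are all assumptions made for illustration.

```python
import random
from collections import defaultdict

ARMS = ("IP", "placebo")
BLOCK_SIZES = (2, 4, 6)  # hypothetical; the protocol does not state the sizes


class StratifiedBlockRandomiser:
    """1:1 permuted-block randomisation, one allocation queue per stratum."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._queues = defaultdict(list)  # (centre, smoker) -> pending allocations

    def _new_block(self):
        # Each block holds equal numbers of both arms in random order.
        size = self._rng.choice(BLOCK_SIZES)
        block = list(ARMS) * (size // 2)
        self._rng.shuffle(block)
        return block

    def assign(self, centre: str, smoker: bool) -> str:
        """Return the next allocation for the given stratum."""
        queue = self._queues[(centre, smoker)]
        if not queue:
            queue.extend(self._new_block())
        return queue.pop()


randomiser = StratifiedBlockRandomiser(seed=2024)
allocations = [randomiser.assign(centre="01", smoker=False) for _ in range(20)]
```

Because every block is balanced, the difference between arm counts within a stratum never exceeds half the largest block size, while the varying block size makes the next allocation harder to predict.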

Concealment mechanism {16b}

An automated web-based randomisation system will be used (randomized.net). The allocated treatment number will be checked by two delegated study members before the treatment (IP or placebo) is given to the patient.

Implementation {16c}

Only the PI or a qualified and delegated person (study nurse, midwife, or physician) to whom he/she has delegated this study task can enrol and randomise participants in the automated web-based system.

Assignment of interventions: blinding

Who will be blinded {17a}

As this is a double-blind randomised placebo-controlled trial, both the investigators and the patients will be blinded. The study team includes all care providers, site and sponsor staff, outcome assessors, and data analysts.

Procedure for unblinding if needed {17b}

Participants, site and sponsor staff, care providers, and data analysts will remain blinded from the time of treatment allocation until database lock. Although we do not foresee serious adverse events related to the IP, the study code will only be broken for valid medical or safety reasons. Unblinding of a patient can be performed by a physician of the study team. On receipt of the treatment allocation details, the PI or treating health care professional will continue to deal with the participant’s medical emergency as appropriate. The PI/investigation team documents the breaking of the code and the reasons for doing so in the medical notes and CRF. It will also be documented at the end of the trial in any final study report. Unblinded data are to be kept strictly confidential until the time of unblinding of the trial and will not be accessible to anyone else involved in the trial, with one exception: the project manager of the company responsible for the labelling and packaging of the IP.

Data collection and management

Plans for assessment and collection of outcomes {18a}

Individual patient data, included in the sponsor database and recorded in an electronic case report form (eCRF), will be handled in compliance with all applicable laws and regulations. The data collected will be pseudonymised and the data will only be used for the purpose(s) of this trial. All missing and ambiguous data will be queried. The study data will be transcribed from the source documents onto an eCRF by study staff within five working days of the participant’s visit. Worksheets may be used for the capture of some data to facilitate completion of the eCRF. Any such worksheets will become part of the subjects’ source documentation. Every effort should be made to ensure assessments susceptible to rater effects, which are to be recorded in the eCRF, are carried out by the same individual who conducted the initial screening assessment. The investigator must verify that all data entries in the eCRF are accurate and correct. All eCRF entries, corrections, and alterations must be made by the Investigator or other authorised study-site personnel. In case of a query, the investigator or an authorised member of the investigational staff must adjust the eCRF (if applicable) and complete the query.

PRIORI uses an eCRF to collect the data that will be used for the statistical analysis of the trial. The CRF has been constructed to ensure (1) adequate data collection (only the data required by the protocol are captured in the CRF) and (2) proper audit trails to demonstrate the validity of the trial (both during and after the trial). An annotated CRF has been developed using the same coding conventions as the database. At the end of the trial, a copy of the CRF of each enrolled patient will be provided to the PI for archiving. The PI is responsible for keeping records of all participating patients (sufficient information to link records, e.g. CRFs, hospital records and samples), all original signed informed consent forms, and copies of the CRF pages.

Swabs and requisition forms for NGS or PCR analysis will be labelled and frozen until shipment (once per 3 months) to the expert laboratory at Antwerp University. Swabs for classic microbial culture are analysed in the local lab.

Plans to promote participant retention and complete follow-up {18b}

Participants will be encouraged to remain on the assigned treatment and in close pregnancy monitoring for the total duration of the study. However, at any time during the study and without giving reasons, subjects may withdraw from the study at their own request or at the request of their legally acceptable representative. The subject will not suffer any disadvantage as a result of their withdrawal or discontinuation. In cases where subjects indicate they do not want to continue, investigators must determine whether this refers to discontinuation of study treatment, unwillingness to attend the follow-up visit, unwillingness to have telephone contact, unwillingness to have any contact with study staff, or unwillingness to allow contact with a third party (e.g., family member, doctor). In all cases, the reason for discontinuation (including ‘at the subject’s request’) will be recorded in the eCRF and in the participant’s medical records.

Data management {19}

Data management will comply with the EU General Data Protection Regulation. All collected study data will be recorded and stored in the CRF created with the CASTOR© software. To protect the privacy of the participants, all collected data will be encoded. Following the creation of a new study record in the eCRF, a study-specific patient code will be created. The study code, e.g. 01-PRIORI-023, will consist of a code specific for the site of recruitment (i.e. 01, 02, etc.), followed by the abbreviation of the study (PRIORI), and an incremental three-digit number per centre (starting from 001 in order of inclusion). CASTOR© complies with all applicable medical data privacy laws and regulations: GCP, 21 CFR Part 11, EU Annex 11, the European Data Protection Directive, ISO9001, and ISO27001/NEN7510. The principal investigator will be responsible for data entry and the quality of the data at their hospital. The sponsor will be responsible for the data analysis. Detailed information regarding data handling and record keeping is provided in the data management plan.
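The study code format described above (two-digit site code, study abbreviation, three-digit sequence number) can be sketched as follows. This is an illustration only and is unrelated to how CASTOR generates codes; the function names and the accepted numeric ranges are assumptions.

```python
import re
from typing import Tuple

STUDY = "PRIORI"
_CODE_RE = re.compile(r"^(\d{2})-PRIORI-(\d{3})$")


def format_study_code(site: int, sequence: int) -> str:
    """Build a code such as 01-PRIORI-023 from site and inclusion number."""
    if not 1 <= site <= 99:
        raise ValueError("site must be between 1 and 99")
    if not 1 <= sequence <= 999:
        raise ValueError("sequence must be between 1 and 999")
    return f"{site:02d}-{STUDY}-{sequence:03d}"


def parse_study_code(code: str) -> Tuple[int, int]:
    """Split a study code back into (site, sequence)."""
    match = _CODE_RE.match(code)
    if match is None:
        raise ValueError(f"not a valid study code: {code!r}")
    return int(match.group(1)), int(match.group(2))
```

For instance, the 23rd participant included at site 01 would receive `format_study_code(1, 23)`, which matches the example code given in the text.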

Confidentiality {27}

Personal information will be collected, kept secure, and maintained in a way that conforms to all regulations concerning the protection of privacy. Encoded, depersonalised data, in which the participant’s identifying information is replaced by an unrelated sequence of characters, will be created. The data and the linking code will be kept in separate locations using encrypted digital files within password-protected folders and storage media. Access will be limited to the minimum number of specific individuals necessary for quality control, audit, and analysis. The confidentiality of data will be preserved when the data are transmitted to sponsors and co-investigators. The data will be stored for at least 25 years and the sponsor is the data custodian.

Plans for collection, laboratory evaluation, and storage of biological specimens for genetic or molecular analysis in this trial/future use {33}

Vaginal, placental, and neonatal meconium swabs for microbiome analysis will be sampled in duplicate: one swab for local testing, i.e. bacterial culture in the local lab of each study site, and one swab for central testing, i.e. molecular analysis in a specialised laboratory (University of Antwerp). Furthermore, the midstream urine culture and placenta pathology will be analysed locally as part of standard care. Biological specimens will be labelled appropriately. All results will stay blinded, except those that are considered standard care.

Swabs for next-generation sequencing will be stored in a −20 °C freezer within 4 h after collection, until shipment to the central laboratory. The freezer temperature will be monitored every working day, and large deviations (normal range −15 °C to −40 °C) will be reported to the study team. The samples will be shipped on dry ice by Inter Healthcare Transport once every 3 months.

Statistical methods

Statistical methods for primary and secondary outcomes {20a}

Statistical analysis will be conducted using SAS for Windows, version 9.4 or higher.

The flow of patients will be described using a CONSORT-statement flow diagram.

Descriptive statistics will be presented for patient baseline characteristics to assess baseline comparability between the intervention and control group. For continuous variables, means and standard deviations will be reported or median and interquartile range (IQR) if the distribution is skewed. For binary and categorical variables, numbers and proportions will be given for each category.

The primary analysis of the primary endpoint, GA at delivery, is a superiority analysis comparing the intervention and control groups using a two-sided test at a 5% significance level. Mean GA and standard deviation will be reported for both groups. The treatment effect, along with a 95% confidence interval and p -value, will be evaluated using a linear mixed model. The statistical model will include a fixed treatment effect and a random centre effect to correct for potential correlations between patients recruited in the same centre. The analyses will be performed for the intention-to-treat (ITT) population, employing multiple imputation techniques to address missing data. No covariate adjustment will be made in the primary analysis. If necessary, a transformation will be applied to the primary endpoint to achieve an approximately normal distribution, as visually assessed through diagnostic plots. Secondary analysis of the primary endpoint includes a frailty model for time-to-event outcomes.
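In conventional notation, a linear mixed model with a fixed treatment effect and a random centre intercept can be written as below. The symbols and the normality assumptions are the usual textbook ones and are not taken from the protocol or its statistical analysis plan.

```latex
Y_{ij} = \beta_0 + \beta_1 T_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim N(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma_e^2)
```

Here $Y_{ij}$ is the GA at delivery of patient $i$ in centre $j$, $T_{ij}$ equals 1 for the intervention group and 0 for the control group, $\beta_1$ is the treatment effect (difference in mean GA), $u_j$ is the random effect of centre $j$, and $\varepsilon_{ij}$ is the residual error. Under this formulation, the intraclass correlation coefficient mentioned in the sample size section corresponds to $\sigma_u^2 / (\sigma_u^2 + \sigma_e^2)$.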

For the secondary outcomes, the treatment effect will be investigated using a mixed model approach: a linear mixed model for continuous variables, a generalized mixed model (logistic/proportional odds) for binary/count outcome variables, and a frailty model for time-to-event outcomes. The treatment effect and 95% confidence interval will be obtained from this model. The difference in means will be estimated for continuous outcomes, the odds ratio for binary outcomes, and the hazard ratio for time-to-event outcomes. No covariate adjustments will be made. Continuous variables with skewed distributions will be presented using the median and IQR, and the treatment effect will be explored through endpoint transformation and/or nonparametric methods. All secondary efficacy analyses will be performed using the ITT population.

Interim analyses {21b}

N/A. Interim analyses will not be performed.

Methods for additional analyses (e.g. subgroup analyses) {20b}

N/A. Subgroup analyses will not be performed.

Methods in analysis to handle protocol non-adherence and any statistical methods to handle missing data {20c}

The primary endpoint will be analysed on the ITT population and multiple imputation techniques will be used to account for the missing data. Furthermore, we will perform a per-protocol analysis. Patients with an overall compliance of less than 50% (defined as taking the recommended dose on less than 50% of the days during the treatment period), and participants who start taking pre-, pro-, or synbiotics other than the study medication for more than 14 consecutive days, will be excluded in the per-protocol analysis.
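The per-protocol exclusion rules above can be expressed as a simple check. The sketch below is illustrative only: the record fields and function name are hypothetical, and it assumes that compliance is computed as the share of treatment days on which the recommended dose was taken.

```python
from dataclasses import dataclass


@dataclass
class ParticipantRecord:
    treatment_days: int             # days in the treatment period
    days_recommended_dose: int      # days on which the recommended dose was taken
    longest_other_biotic_run: int   # longest run of consecutive days on non-study pre-, pro-, or synbiotics


def in_per_protocol_population(record: ParticipantRecord) -> bool:
    """Apply the two per-protocol exclusion rules described in the text."""
    if record.treatment_days <= 0:
        raise ValueError("treatment_days must be positive")
    compliance = record.days_recommended_dose / record.treatment_days
    low_compliance = compliance < 0.5                           # less than 50% of days
    prolonged_other_use = record.longest_other_biotic_run > 14  # more than 14 consecutive days
    return not (low_compliance or prolonged_other_use)
```

A participant with exactly 50% compliance and no other pre-, pro-, or synbiotic use stays in the per-protocol population under this reading, because the exclusion applies only to compliance below 50%.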

Plans to give access to the full protocol, participant level-data, and statistical code {31c}

The PRIORI trial is registered on ClinicalTrials.gov with ID NCT05966649. We do not plan to grant public access to the full protocol.

Oversight and monitoring

Composition of the coordinating centre and trial steering committee {5d}

The trial management group includes those individuals responsible for the day-to-day management of the trial, such as the chief investigator, statistician, trial coordinator, and data manager. The role of the trial management group is to monitor all aspects of the conduct and progress of the trial, ensure that the protocol is adhered to, and take appropriate action to safeguard participants and the quality of the trial itself.

The role of the Trial Steering Committee (TSC) is to provide the overall supervision of the trial. The TSC is composed of the chief investigator, a statistician, the trial coordinator, an independent expert, a neonatologist, representatives of other participating centres, up to two patient representatives, one representative of the sponsor, and one representative of the funder. The TSC will monitor trial progress and conduct and advise on scientific credibility. It will consider and act as appropriate, and it ultimately carries the responsibility for deciding whether the trial needs to be stopped on grounds of safety or efficacy. The TSC will meet approximately three times in the first year and twice per year thereafter and will send reports to the sponsor and funder. KCE shall have the right (but not the obligation) to be present at each TSC meeting.

Composition of the data monitoring committee, its role and reporting structure {21a}

The independent Data Monitoring Committee (iDMC) is an independent group of experts that will be responsible for the follow-up of the data of the internal pilot study. They will review the unblinded data periodically and recommend whether the results of the pilot study are favourable to proceed with the full study, based on preset, well-defined cutoffs of the pilot endpoints (change in Lactobacillus dominance and PCR detection of L. gasseri).

Adverse event reporting and harms {22}

Pregnancy complications will be collected at every study visit. Only specific gastrointestinal complaints will be collected as AEs when there is a causal relationship with the IP: bloating, nausea, vomiting, diarrhoea, constipation, flatulence, abdominal pain, or intestinal cramps. Participants are instructed to report any new gastrointestinal symptom in the diary, which will be reviewed by a delegated team member at the next study visit. Serious adverse events will also only be collected if they are likely related to the IP and should be reported within 24 h after awareness of the event; however, these are not anticipated since previous studies with synbiotics or probiotics have demonstrated safety in pregnancy [ 41 ].

Frequency and plans for auditing trial conduct {23}

The investigator will permit direct access to trial data and documents for the purpose of monitoring, audits, and/or inspections by authorised entities such as, but not limited to, the sponsor or its designees and competent regulatory or health authorities. As such, eCRFs, source records, and other trial-related documentation (e.g. the TMF, pharmacy records, etc.) will be kept current, complete, and accurate at all times. The frequency of audits has not been defined at this stage.

Plans for communicating important protocol amendments to relevant parties (e.g. trial participants, ethical committees) {25}

Substantial amendments that require review by the ethical committee (EC) will not be implemented until the EC grants a favourable opinion for the study. All correspondence with the EC will be retained in the Trial Master File/Investigator Site File.

During the study, the opinion of patient representatives will be sought whenever changes need to be made. For example, participants will be actively involved in the review of protocol amendments and changes to the ICF, and their opinion and input will be valuable for recruitment. Patients from both Flanders and Wallonia and with different cultural backgrounds are part of the TSC.

Dissemination plans {31a}

Upon completion of the trial, the data arising from the trial will be analysed and tabulated to create the Final Study Report, which can be accessed online as well as on ClinicalTrials.gov. Upon approval of the TSC, the investigators will publish the primary study results within 6 months after the statistical analysis. Funding by KCE will be acknowledged in publications.

The trial participants will be notified about the outcome of the trial by a newsletter containing a reference to the published manuscript.

The primary study results of the PRIORI study will be reported fully and made publicly available when the research has been completed. All researchers shall ensure that the outcome of the research is prepared as a research paper for publication in a suitable peer-reviewed, preferably open-access, journal. The CONSORT guidelines and checklist will be reviewed prior to generating any publications for the trial to ensure they meet the standards required for submission to high-quality peer-reviewed journals.

The PRIORI study is designed to investigate the effectiveness of oral synbiotic supplements in supporting the vaginal microbiome and prolonging gestation in high-risk pregnant women. To our knowledge, this is the first randomised placebo-controlled trial in which synbiotics are started before 12 weeks of gestation and that is powered to detect a clinically relevant difference in pregnancy duration in a population at risk for spontaneous preterm birth. Common limitations of previous clinical studies were a low-risk patient population, a small sample size, the use of probiotic species (other than L. crispatus) that are less likely to improve vaginal health, and a late start of pro/synbiotics (e.g. only in the third trimester or after PPROM).

Furthermore, serial vaginal swabs for metataxonomic analysis will allow us to understand how the vaginal microbiome changes throughout pregnancy in patients receiving synbiotics versus placebo, as well as in those with preterm versus term deliveries. This is important for interpreting clinical outcomes and for learning more about the pathophysiologic mechanism of sPTB.

To recruit a sufficiently large population that meets the eligibility criteria, patients will be recruited in seven Belgian hospitals, all with a high-risk antenatal ward and neonatal intensive care unit. If recruitment is slower than anticipated, additional hospitals can participate in this trial.

The IP is a synbiotic containing eight Lactobacillus strains, including L. crispatus. Multi-strain and multi-species probiotic mixtures may be more effective than single-strain probiotics because of the synergistic effects of multiple strains (e.g. increased inhibition of pathogens, production of substances that facilitate adhesion of other Lactobacillus strains) [ 45 , 46 ]. Moreover, different lactobacilli have different modes of action and interactions with host microorganisms.

We chose the oral route of administration. Oral probiotics have been shown to significantly increase vaginal Lactobacillus counts and restore the vaginal microbiome in patients with BV and vaginal dysbiosis without causing adverse effects [ 22 , 47 ]. Like urogenital pathogens, lactobacilli ascend to the vagina from the rectum [ 47 , 48 ]. After oral intake, probiotic bacteria are able to colonise the colonic and rectal microbiome, which functions as a reservoir for both lactobacilli and pathogens ascending to the vagina [ 49 , 50 , 51 ]. Therefore, we hypothesise that parallel changes to the intestinal microbiome after oral intake may contribute to a more sustainable and stable vaginal microbiome. Furthermore, oral intake is simple and easy to implement into the patient’s daily routine, which might be an advantage in terms of compliance compared to vaginal administration, particularly for an intervention that requires long-term daily administration in a patient population already using daily vaginal progesterone.

This trial has some limitations inherent to the protocol. First, the PRIORI trial is a pragmatic clinical study powered to detect a difference in pregnancy duration, but not in (composite) neonatal outcome. There is no long-term follow-up of the neonate after discharge or during childhood. Nevertheless, multiple relevant neonatal outcomes, consistent with the COMET consensus [ 44 ], will be collected after birth and data collection will continue until discharge from the NICU, when applicable. Second, we do not plan a cost-effectiveness analysis. However, data about the total duration of antenatal and neonatal hospital admissions will be collected, which will indirectly allow for economic comparisons.

Trial status

The current protocol version 3.1 was issued on April 5, 2024. Recruitment will start in May 2024 and will be completed by 2028.

Availability of data and materials {29}

Only the TSC has access to the final trial dataset in order to ensure that the overall results are not disclosed by an individual trial site prior to manuscript publication. Site investigators will be allowed to access the full dataset if a formal request describing their plans is approved by the TSC.

Abbreviations

Adverse event

Bronchopulmonary dysplasia

Bacterial vaginosis

Colony-forming unit

Chief investigator

Core Outcome Measures in Effectiveness Trials

Case report form

Clinical study report

Ethics committee

Electronic case report form

EuroQol five dimension (questionnaire)

Fructooligosaccharides

Gestational age

Group B Streptococcus

Good Clinical Practice

Good manufacturing practice

Independent Data Monitoring Committee

Investigational product

Belgian Healthcare Knowledge Centre

Lactobacillus

Large loop excision of the transformation zone

Maternal intensive care

Mid-trimester pregnancy loss

Necrotizing enterocolitis

Next-generation sequencing

Neonatal intensive care unit

Polymerase chain reaction

Principal investigator

Preterm prelabour rupture of membranes

Preterm birth

Quality of life

Quantitative polymerase chain reaction

Randomised controlled trial

Serious adverse event

Socio-economic status

Spontaneous preterm birth

Sexually transmitted infections

Trial Master File

Trial Management Group

Trial Steering Committee

Work Productivity and Activity Impairment questionnaire

Ziekenhuis Oost Limburg Autonome Verzorgingsinstelling

Kramer MS, Demissie K, Yang H, Platt RW, Sauvé R, Liston R. The contribution of mild and moderate preterm birth to infant mortality. Fetal and infant health study group of the canadian perinatal surveillance system. JAMA. 2000;284(7):843–9. https://doi.org/10.1001/jama.284.7.843 .

World Health Organization. Born too soon: decade of action on preterm birth. 2023. https://creativecommons.org/licenses/by-nc-sa/3.0/igo/ .

Goldenberg RL, Culhane JF, Iams JD, Romero R. Epidemiology and causes of preterm birth. Lancet. 2008;371(9606):75–84. https://doi.org/10.1016/S0140-6736(08)60074-4 .

Purisch SE, Turitz AL, Elovitz MA, Levine LD. The effect of prior term birth on risk of recurrent spontaneous preterm birth. Am J Perinatol. 2018;35(4):380–4. https://doi.org/10.1055/s-0037-1607317 .

Dehaene I, Scheire E, Steen J, et al. Obstetrical characteristics and neonatal outcome according to aetiology of preterm birth: a cohort study. Arch Gynecol Obstet. 2020;302(4):861–71. https://doi.org/10.1007/s00404-020-05673-5 .

Goldenberg RL, Hauth JC, Andrews WW. Intrauterine infection and preterm delivery. N Engl J Med. 2000;342(20):1500–7. https://doi.org/10.1056/NEJM200005183422007 .

Romero R, Gómez R, Chaiworapongsa T, Conoscenti G, Kim JC, Kim YM. The role of infection in preterm labour and delivery. Paediatr Perinat Epidemiol. 2001;15(Suppl 2):41–56. https://doi.org/10.1046/j.1365-3016.2001.00007.x .

Freitas AC, Bocking A, Hill JE, Money DM, VOGUE Research Group. Increased richness and diversity of the vaginal microbiota and spontaneous preterm birth. Microbiome. 2018;6(1):117. https://doi.org/10.1186/s40168-018-0502-8 .

Kosti I, Lyalina S, Pollard KS, Butte AJ, Sirota M. Meta-analysis of vaginal microbiome data provides new insights into preterm birth. Front Microbiol. 2020;11: 476. https://doi.org/10.3389/fmicb.2020.00476 .

Donders GG, Van Calsteren K, Bellen G, et al. Predictive value for preterm birth of abnormal vaginal flora, bacterial vaginosis and aerobic vaginitis during the first trimester of pregnancy. BJOG. 2009;116(10):1315–24. https://doi.org/10.1111/j.1471-0528.2009.02237.x .

Kindinger LM, Bennett PR, Lee YS, et al. The interaction between vaginal microbiota, cervical length, and vaginal progesterone treatment for preterm birth risk. Microbiome. 2017;5(1):6. https://doi.org/10.1186/s40168-016-0223-9 .

Brown RG, Al-Memar M, Marchesi JR, et al. Establishment of vaginal microbiota composition in early pregnancy and its association with subsequent preterm prelabor rupture of the fetal membranes. Transl Res. 2019;207:30–43. https://doi.org/10.1016/j.trsl.2018.12.005 .

McManemy J, Cooke E, Amon E, Leet T. Recurrence risk for preterm delivery. Am J Obstet Gynecol. 2007;196(6):576.e1–6. https://doi.org/10.1016/j.ajog.2007.01.039 . discussion 576.e6–7.

Yang J, Baer RJ, Berghella V, et al. Recurrence of preterm birth and early term birth. Obstet Gynecol. 2016;128(2):364–72. https://doi.org/10.1097/AOG.0000000000001506 .

Edlow AG, Srinivas SK, Elovitz MA. Second-trimester loss and subsequent pregnancy outcomes: what is the real risk? Am J Obstet Gynecol. 2007;197(6):581.e1–581.e6. https://doi.org/10.1016/j.ajog.2007.09.016 .

Brocklehurst P, Gordon A, Heatley E, Milan SJ. Antibiotics for treating bacterial vaginosis in pregnancy. Cochrane Database Syst Rev. 2013;1:CD000262. https://doi.org/10.1002/14651858.CD000262.pub4 .

McDonald HM, Brocklehurst P, Gordon A. Antibiotics for treating bacterial vaginosis in pregnancy. Cochrane Database Syst Rev. 2007;1:CD000262.  https://doi.org/10.1002/14651858.CD000262.pub3 .

Joint FAO/WHO. Expert consultation on evaluation of health and nutritional properties of probiotics in food including powder milk with live lactic acid bacteria. Published online 1 Oct 2001.

Hill C, Guarner F, Reid G, et al. Expert consensus document. The international scientific association for probiotics and prebiotics consensus statement on the scope and appropriate use of the term probiotic. Nat Rev Gastroenterol Hepatol. 2014;11(8):506–14. https://doi.org/10.1038/nrgastro.2014.66 .

Gibson GR, Hutkins R, Sanders ME, et al. Expert consensus document: the International Scientific Association for Probiotics and Prebiotics (ISAPP) consensus statement on the definition and scope of prebiotics. Nat Rev Gastroenterol Hepatol. 2017;14(8):491–502. https://doi.org/10.1038/nrgastro.2017.75 .

Reid G, Bruce AW, Fraser N, Heinemann C, Owen J, Henning B. Oral probiotics can resolve urogenital infections. FEMS Immunol Med Microbiol. 2001;30(1):49–52. https://doi.org/10.1111/j.1574-695X.2001.tb01549.x .

Reznichenko H, Henyk N, Maliuk V, et al. Oral intake of lactobacilli can be helpful in symptomatic bacterial vaginosis: a randomized clinical study. J Low Genit Tract Dis. 2020;24(3):284–9. https://doi.org/10.1097/LGT.0000000000000518 .

Yadav J, Das V, Kumar N, Agrawal S, Pandey A, Agrawal A. Vaginal probiotics as an adjunct to antibiotic prophylaxis in the management of preterm premature rupture of the membranes. J Obstet Gynaecol. 2022;42(5):1037–42. https://doi.org/10.1080/01443615.2021.1993803 .

Pérez-Castillo ÍM, Fernández-Castillo R, Lasserrot-Cuadrado A, Gallo-Vallejo JL, Rojas-Carvajal AM, Aguilar-Cordero MJ. Reporting of perinatal outcomes in probiotic randomized controlled trials. A systematic review and meta-analysis. Nutrients. 2021;13(1):256. https://doi.org/10.3390/nu13010256 .

Husain S, Allotey J, Drymoussi Z, et al. Effects of oral probiotic supplements on vaginal microbiota during pregnancy: a randomised, double-blind, placebo-controlled trial with microbiome analysis. BJOG. 2020;127(2):275–84. https://doi.org/10.1111/1471-0528.15675 .

Jarde A, Lewis-Mikhael AM, Moayyedi P, et al. Pregnancy outcomes in women taking probiotics or prebiotics: a systematic review and meta-analysis. BMC Pregnancy Childbirth. 2018;18(1):14. https://doi.org/10.1186/s12884-017-1629-5 .

Kirihara N, Kamitomo M, Tabira T, Hashimoto T, Taniguchi H, Maeda T. Effect of probiotics on perinatal outcome in patients at high risk of preterm birth. J Obstet Gynaecol Res. 2018;44(2):241–7. https://doi.org/10.1111/jog.13497 .

Daskalakis GJ, Karambelas AK. Vaginal probiotic administration in the management of preterm premature rupture of membranes. Fetal Diagn Ther. 2017;42(2):92–8. https://doi.org/10.1159/000450995 .

Gille C, Böer B, Marschal M, et al. Effect of probiotics on vaginal health in pregnancy. EFFPRO, a randomized controlled trial. Am J Obstet Gynecol. 2016;215(5):608.e1–608.e7. https://doi.org/10.1016/j.ajog.2016.06.021 .

Othman M, Neilson JP, Alfirevic Z. Probiotics for preventing preterm labour. Cochrane Database Syst Rev. 2007;1:CD005941.  https://doi.org/10.1002/14651858.CD005941.pub2 .

Krauss-Silva L, Moreira MEL, Alves MB, et al. Randomized controlled trial of probiotics for the prevention of spontaneous preterm delivery associated with intrauterine infection: study protocol. Reprod Health. 2010;7:14.  https://doi.org/10.1186/1742-4755-7-14 .

Wickens KL, Barthow CA, Murphy R, et al. Early pregnancy probiotic supplementation with Lactobacillus rhamnosus HN001 may reduce the prevalence of gestational diabetes mellitus: a randomised controlled trial. Br J Nutr. 2017;117(6):804–13. https://doi.org/10.1017/S0007114517000289 .

Callaway LK, McIntyre HD, Barrett HL, et al. Probiotics for the prevention of gestational diabetes mellitus in overweight and obese women: findings from the SPRING double-blind randomized controlled trial. Diabetes Care. 2019;42(3):364–71. https://doi.org/10.2337/dc18-2248 .

Pellonperä O, Mokkala K, Houttu N, et al. Efficacy of fish oil and/or probiotic intervention on the incidence of gestational diabetes mellitus in an at-risk group of overweight and obese women: a randomized, placebo-controlled, double-blind clinical trial. Diabetes Care. 2019;42(6):1009–17. https://doi.org/10.2337/dc18-2591 .

Okesene-Gafa KA, Moore AE, Jordan V, McCowan L, Crowther CA. Probiotic treatment for women with gestational diabetes to improve maternal and infant health and well-being. Cochrane Database Syst Rev. 2020;6:CD012970. https://doi.org/10.1002/14651858.CD012970.pub2 .

Yang S, Reid G, Challis JRG, et al. Effect of oral probiotic Lactobacillus rhamnosus GR-1 and Lactobacillus reuteri RC-14 on the vaginal microbiota, cytokines and chemokines in pregnant women. Nutrients. 2020;12(2):368. https://doi.org/10.3390/nu12020368 .

Halkjær SI, de Knegt VE, Lo B, et al. Multistrain probiotic increases the gut microbiota diversity in obese pregnant women: results from a randomized, double-blind placebo-controlled study. Curr Dev Nutr. 2020;4(7):nzaa095. https://doi.org/10.1093/cdn/nzaa095 .

Lepargneur JP. Lactobacillus crispatus as biomarker of the healthy vaginal tract. Ann Biol Clin (Paris). 2016;74(4):421–7. https://doi.org/10.1684/abc.2016.1169 .

Almeida MO, Carmo FLR do, Gala-García A, et al. Lactobacillus crispatus protects against bacterial vaginosis. Genet Mol Res. 2019;18(4). https://doi.org/10.4238/gmr18475

Verstraelen H, Verhelst R, Claeys G, De Backer E, Temmerman M, Vaneechoutte M. Longitudinal analysis of the vaginal microflora in pregnancy suggests that L. crispatus promotes the stability of the normal vaginal microflora and that L. gasseri and/or L. iners are more conducive to the occurrence of abnormal vaginal microflora. BMC Microbiol. 2009;9:116. https://doi.org/10.1186/1471-2180-9-116 .

Dugoua JJ, Machado M, Zhu X, Chen X, Koren G, Einarson TR. Probiotic safety in pregnancy: a systematic review and meta-analysis of randomized controlled trials of Lactobacillus, Bifidobacterium, and Saccharomyces spp. J Obstet Gynaecol Can. 2009;31(6):542–52. https://doi.org/10.1016/S1701-2163(16)34218-9 .

Yang S, Reid G, Challis JRG, Kim SO, Gloor GB, Bocking AD. Is there a role for probiotics in the prevention of preterm birth? Front Immunol. 2015;6:62.  https://doi.org/10.3389/fimmu.2015.00062 .

Ravel J, Gajer P, Abdo Z, et al. Vaginal microbiome of reproductive-age women. Proc Natl Acad Sci USA. 2011;108(Suppl 1):4680–7. https://doi.org/10.1073/pnas.1002611107 .

van ’t Hooft J, Duffy JMN, Daly M, et al. A core outcome set for evaluation of interventions to prevent preterm birth. Obstet Gynecol. 2016;127(1):49–58. https://doi.org/10.1097/AOG.0000000000001195 .

McFarland LV. Efficacy of single-strain probiotics versus multi-strain mixtures: systematic review of strain and disease specificity. Dig Dis Sci. 2021;66(3):694–704. https://doi.org/10.1007/s10620-020-06244-z .

Timmerman HM, Koning CJM, Mulder L, Rombouts FM, Beynen AC. Monostrain, multistrain and multispecies probiotics–a comparison of functionality and efficacy. Int J Food Microbiol. 2004;96(3):219–33. https://doi.org/10.1016/j.ijfoodmicro.2004.05.012 .

Reid G, Beuerman D, Heinemann C, Bruce AW. Probiotic Lactobacillus dose required to restore and maintain a normal vaginal flora. FEMS Immunol Med Microbiol. 2001;32(1):37–41. https://doi.org/10.1111/j.1574-695X.2001.tb00531.x .

Reid G, Bocking A. The potential for probiotics to prevent bacterial vaginosis and preterm labor. Am J Obstet Gynecol. 2003;189(4):1202–8. https://doi.org/10.1067/s0002-9378(03)00495-2 .

Cribby S, Taylor M, Reid G. Vaginal microbiota and the use of probiotics. Interdiscip Perspect Infect Dis. 2008;2008:256490.  https://doi.org/10.1155/2008/256490 .

Antonio MAD, Rabe LK, Hillier SL. Colonization of the rectum by Lactobacillus species and decreased risk of bacterial vaginosis. J Infect Dis. 2005;192(3):394–8. https://doi.org/10.1086/430926 .

Petricevic L, Domig KJ, Nierscher FJ, et al. Characterisation of the oral, vaginal and rectal Lactobacillus flora in healthy pregnant and postmenopausal women. Eur J Obstet Gynecol Reprod Biol. 2012;160(1):93–9. https://doi.org/10.1016/j.ejogrb.2011.10.002 .

Acknowledgements

N/A. Everyone who contributed to the study protocol and manuscript meets the criteria for authorship.

This study (KCE20-1273) is funded by the Belgian Health Care Knowledge Centre (KCE) under the KCE Trials Programme.

Author information

Authors and affiliations.

Department of Obstetrics and Gynaecology, Ziekenhuis Oost-Limburg, Genk, Belgium

Katrien Nulens, Els Papy & Caroline Van Holsbeke

Department of Development and Regeneration, KULeuven, Cluster Woman and Child, Leuven, Belgium

Katrien Nulens, Dirk Timmerman & Roland Devlieger

Clinical Trial Unit, Ziekenhuis Oost-Limburg, Genk, Belgium

Katrien Tartaglia

Department of Obstetrics and Gynaecology, Ghent University Hospital, Ghent, Belgium

Isabelle Dehaene

Department of Obstetrics and Gynaecology, AZ Sint-Lucas, Bruges, Belgium

Hilde Logghe & Caroline Van Holsbeke

Department of Obstetrics and Gynaecology, AZ Sint-Jan, Bruges, Belgium

Hilde Logghe, Joachim Van Keirsbilck & Caroline Van Holsbeke

Department of Obstetrics and Gynaecology, Hopital Citadelle, CHU Liège, Liège, Belgium

Frédéric Chantraine & Veronique Masson

Department of Obstetrics and Gynaecology, AZ Groeninge, Kortrijk, Belgium

Eva Simoens

Department of Paediatrics and Neonatal Intensive Care Unit, Ziekenhuis Oost-Limburg, Genk, Belgium

Willem Gysemans

Data Science Institute, I-Biostat, Hasselt University, Diepenbeek, Belgium

Liesbeth Bruckers

Department of Bioscience Engineering, Research Group Applied Microbiology and Biotechnology, University of Antwerp, Antwerp, Belgium

Sarah Lebeer, Camille Nina Allonsius & Eline Oerlemans

Department of Microbiology, Ziekenhuis Oost-Limburg, Genk, Belgium

Deborah Steensels

Faculty of Medicine, Université Libre de Bruxelles, Brussels, Belgium

Department of Microbiology, AZ Sint-Jan, Bruges, Belgium

Marijke Reynders

Department of Obstetrics and Gynaecology, University Hospitals Leuven, Leuven, Belgium

Dirk Timmerman & Roland Devlieger

Contributions

CVH is the Chief Investigator; she conceived the study, led the proposal and protocol development, and wrote the manuscript. KN is co-investigator; she conceived the study, designed the study protocol, provided clinical input, and wrote the manuscript. EP is the project manager; she designed the study protocol, provided support with grant application and clinical trial management, and wrote the manuscript. KT designed the study protocol, provided support with grant application and clinical trial management, and wrote the manuscript. FC is the co-Chief Investigator; he provided domain knowledge expertise and clinical input, and revised the manuscript. RD is co-investigator; he provided domain knowledge expertise, methodological and clinical input, and revised the manuscript. JVK is co-investigator; he provided domain knowledge expertise, methodological and clinical input, and revised the manuscript. ID is co-investigator; she provided domain knowledge expertise, methodological and clinical input, and revised the manuscript. VM is co-investigator; she provided domain knowledge expertise and revised the manuscript. ES is co-investigator; she provided domain knowledge expertise and revised the manuscript. DT provided domain knowledge expertise, methodological and clinical input, and revised the manuscript. WG provided domain knowledge expertise and clinical input, and revised the manuscript. HL provided domain knowledge expertise and clinical input, and revised the manuscript. LB provided statistical and epidemiological support, and revised the manuscript. DS provided domain knowledge expertise and clinical input, and revised the manuscript. MR provided domain knowledge expertise and clinical input, and revised the manuscript. SL provided domain knowledge expertise and clinical input, and revised the manuscript. CA provided domain knowledge expertise and clinical input, and revised the manuscript. 
EO provided domain knowledge expertise and clinical input, and revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Katrien Nulens.

Ethics declarations

Ethics approval and consent to participate {24}.

Ethics Committee of University of Leuven (KU Leuven), Leuven, Belgium, number S67065. Written, informed consent to participate will be obtained from all participants.

Consent for publication {32}

Not applicable.

Competing interests {28}

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Cite this article.

Nulens, K., Papy, E., Tartaglia, K. et al. Synbiotics in patients at risk for spontaneous preterm birth: protocol for a multi-centre, double-blind, randomised placebo-controlled trial (PRIORI). Trials 25, 615 (2024). https://doi.org/10.1186/s13063-024-08444-8

Received: 29 April 2024

Accepted: 02 September 2024

Published: 17 September 2024

DOI: https://doi.org/10.1186/s13063-024-08444-8

  • Prematurity
  • Vaginal microbiome
  • Lactobacilli

ISSN: 1745-6215

The relationship between training load and injury risk in basketball: a systematic review.

1. Introduction
2. Materials and Methods
2.1. Literature Search
2.2. Selection Criteria
2.3. Quality Assessment
2.4. Data Extraction and Analysis
3.1. Article Identification
3.2. Description of the Included Articles
3.3. Definitions of Injury
3.4. Measures of Load
3.5. Assessment of Article Quality, Level of Evidence, and Conflict of Interest
4. Discussion
4.1. Load Monitoring in Basketball
4.2. Training and/or Competition Time and Injury Risk
4.3. Relative Load, Rapid Changes in Load, and Injury Risk
4.4. Minutes Played per Game (MPG) and Injury Risk
4.5. Sleep and Injury Risk
4.6. Competition Calendar Congestion and Injury Risk
4.7. Limitation
4.8. Practical Applications and Future Direction
5. Conclusions
Author Contributions
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest

  • Soligard, T.; Schwellnus, M.; Alonso, J.-M.; Bahr, R.; Clarsen, B.; Dijkstra, H.P.; Gabbett, T.; Gleeson, M.; Hägglund, M.; Hutchinson, M.R.; et al. How much is too much? (Part 1) International Olympic Committee consensus statement on load in sport and risk of injury. Br. J. Sports Med. 2016, 50, 1030–1041.
  • Moreno-Pérez, V.; Ruiz, J.; Vazquez-Guerrero, J.; Rodas, G.; Del Coso, J. Training and competition injury epidemiology in professional basketball players: A prospective observational study. Physician Sportsmed. 2023, 51, 121–128.
  • Gabbett, T.J. Debunking the myths about training load, injury and performance: Empirical evidence, hot topics and recommendations for practitioners. Br. J. Sports Med. 2020, 54, 58–66.
  • Lewis, M. It’s a hard-knock life: Game load, fatigue, and injury risk in the National Basketball Association. J. Athl. Train. 2018, 53, 503–509.
  • Bullock, G.S.; Ferguson, T.; Vaughan, J.; Gillespie, D.; Collins, G.; Kluzek, S. Temporal trends and severity in injury and illness incidence in the National Basketball Association over 11 seasons. Orthop. J. Sports Med. 2021, 9, 23259671211004094.
  • Torres-Ronda, L.; Gámez, I.; Robertson, S.; Fernández, J. Epidemiology and injury trends in the National Basketball Association: Pre- and per-COVID-19 (2017–2021). PLoS ONE 2022, 17, e0263354.
  • Esteves, P.T.; Mikolajec, K.; Schelling, X.; Sampaio, J. Basketball performance is affected by the schedule congestion: NBA back-to-backs under the microscope. Eur. J. Sport Sci. 2021, 21, 26–35.
  • Wang, X.; Zhang, S.; Gasperi, L.; Robertson, S.; Ruano, M.A.G. Rest or rust? Complex influence of schedule congestion on the home advantage in the National Basketball Association. Chaos Solit. Fractals 2023, 174, 113698.
  • Morikawa, L.H.; Tummala, S.V.; Brinkman, J.C.; Buckner Petty, S.A.; Chhabra, A. Effect of a condensed NBA season on injury risk: An analysis of the 2020 season and player safety. Orthop. J. Sports Med. 2022, 10, 23259671221121116.
  • Teramoto, M.; Cross, C.L.; Cushman, D.M.; Maak, T.G.; Petron, D.J.; Willick, S.E. Game injuries in relation to game schedules in the National Basketball Association. J. Sci. Med. Sport 2017, 20, 230–235.
  • Eckard, T.G.; Padua, D.A.; Hearn, D.W.; Pexa, B.S.; Frank, B.S. The relationship between training load and injury in athletes: A systematic review. Sports Med. 2018, 48, 1929–1961.
  • Jones, C.M.; Griffiths, P.C.; Mellalieu, S.D. Training load and fatigue marker associations with injury and illness: A systematic review of longitudinal studies. Sports Med. 2017, 47, 943–974.
  • Drew, M.K.; Finch, C.F. The relationship between training load and injury, illness and soreness: A systematic and literature review. Sports Med. 2016, 46, 861–883.
  • Perrett, C.; Lamb, P.; Bussey, M. Is there an association between external workload and lower-back injuries in cricket fast bowlers? A systematic review. Phys. Ther. Sport 2020, 41, 71–79.
  • Coughlin, R.P.; Lee, Y.; Horner, N.S.; Simunovic, N.; Cadet, E.R.; Ayeni, O.R. Increased pitch velocity and workload are common risk factors for ulnar collateral ligament injury in baseball players: A systematic review. J. ISAKOS 2019, 4, 41–47.
  • Wells, G.; Shea, B.; O’Connell, D.; Peterson, J.; Welch, V.; Losos, M.; Tugwell, P. The Newcastle-Ottawa Scale (NOS) for Assessing the Quality of Nonrandomised Studies in Meta-Analyses; University of Ottawa: Ottawa, ON, Canada, 2000.
  • Ferioli, D.; La Torre, A.; Tibiletti, E.; Dotto, A.; Rampinini, E. Determining the relationship between load markers and non-contact injuries during the competitive season among professional and semi-professional basketball players. Res. Sports Med. 2021, 29, 265–276.
  • Garcia, L.; Planas, A.; Peirau, X. Analysis of the injuries and workload evolution using the RPE and s-RPE method in basketball. Apunt. Sports Med. 2022, 57, 100372.
  • Caparrós, T.; Alentorn-Geli, E.; Myer, G.D.; Capdevila, L.; Samuelsson, K.; Hamilton, B.; Rodas, G. The relationship of practice exposure and injury rate on game performance and season success in professional male basketball. J. Sports Sci. Med. 2016, 15, 397–402.
  • Caparrós, T.; Casals, M.; Solana, Á.; Peña, J. Low external workloads are related to higher injury risk in professional male basketball games. J. Sports Sci. Med. 2018, 17, 289–297.
  • Piedra, A.; Peña, J.; Ciavattini, V.; Caparrós, T. Relationship between injury risk, workload, and rate of perceived exertion in professional women’s basketball. Apunt. Sports Med. 2020, 55, 71–79.
  • Doeven, S.H.; Brink, M.S.; Huijgen, B.C.H.; de Jong, J.; Lemmink, K.A.P.M. Managing load to optimize well-being and recovery during short-term match congestion in elite basketball. Int. J. Sports Physiol. Perform. 2021, 16, 45–50.
  • Benson, L.C.; Owoeye, O.B.A.; Räisänen, A.M.; Stilling, C.; Edwards, W.B.; Emery, C.A. Magnitude, frequency, and accumulation: Workload among injured and uninjured youth basketball players. Front. Sports Act. Living 2021, 3, 607205.
  • Anderson, L.; Triplett-Mcbride, T.; Foster, C.; Doberstein, S.; Brice, G. Impact of training patterns on incidence of illness and injury during a women’s collegiate basketball season. J. Strength Cond. Res. 2003, 17, 734–738.
  • Watson, A.; Johnson, M.; Sanfilippo, J. Decreased sleep is an independent predictor of in-season injury in male collegiate basketball players. Orthop. J. Sports Med. 2020, 8, 2325967120964481.
  • Orringer, M.J.; Pandya, N.K. Acutely increased workload is correlated with significant injuries among National Basketball Association players. Int. J. Sports Sci. Coach. 2022, 17, 568–575.
  • Menon, S.; Morikawa, L.; Tummala, S.V.; Buckner-Petty, S.; Chhabra, A. The primary risk factors for season-ending injuries in professional basketball are minutes played per game and later season games. Arthrosc. J. Arthrosc. Relat. Surg. 2024, in press.
  • Senbel, S.; Sharma, S.; Raval, M.S.; Taber, C.; Nolan, J.; Artan, N.S.; Ezzeddine, D.; Kaya, T. Impact of sleep and training on game performance and injury in division-1 women’s basketball amidst the pandemic. IEEE Access 2022, 10, 15516–15527.
  • Weiss, K.J.; Allen, S.V.; McGuigan, M.R.; Whatman, C.S. The relationship between training load and injury in men’s professional basketball. Int. J. Sports Physiol. Perform. 2017, 12, 1238–1242.
  • Gianoudis, J.; Webster, K.E.; Cook, J. Volume of physical activity and injury occurrence in young basketball players. J. Sports Sci. Med. 2008, 7, 139–143.
  • Fuller, C.W.; Ekstrand, J.; Junge, A.; Andersen, T.E.; Bahr, R.; Dvorak, J.; Hägglund, M.; McCrory, P.; Meeuwisse, W.H. Consensus statement on injury definitions and data collection procedures in studies of football (soccer) injuries. Br. J. Sports Med. 2006, 40, 193–201.
  • Hagglund, M.; Walden, M.; Bahr, R.; Ekstrand, J. Methods for epidemiological study of injuries to professional football players: Developing the UEFA model. Br. J. Sports Med. 2005, 39, 340–346.
  • Clarsen, B.; Myklebust, G.; Bahr, R. Development and validation of a new method for the registration of overuse injuries in sports injury epidemiology: The Oslo Sports Trauma Research Centre (OSTRC) overuse injury questionnaire. Br. J. Sports Med. 2013, 47, 495–502.
  • Noyes, F.R.; Lindenfeld, T.N.; Marshall, M.T. What determines an athletic injury (definition)? Who determines an injury (occurrence)? Am. J. Sports Med. 1988, 16 (Suppl. S1), S-65–S-68.
  • Foster, C.; Florhaug, J.A.; Franklin, J.; Gottschall, L.; Hrovatin, L.A.; Parker, S.; Doleshal, P.; Dodge, C. A new approach to monitoring exercise training. J. Strength Cond. Res. 2001, 15, 109–115.
  • Borg, G. Borg’s Perceived Exertion and Pain Scales; Human Kinetics: Champaign, IL, USA, 1988.
  • Foster, C. Monitoring training in athletes with reference to overtraining syndrome. Med. Sci. Sports Exerc. 1998, 30, 1164–1168.
  • Russell, J.L.; McLean, B.D.; Impellizzeri, F.M.; Strack, D.S.; Coutts, A.J. Measuring physical demands in basketball: An explorative systematic review of practices. Sports Med. 2021, 51, 81–112.
  • Piedra, A.; Peña, J.; Caparrós, T. Monitoring Training Loads in Basketball: A Narrative Review and Practical Guide for Coaches and Practitioners. Strength Cond. J. 2021, 43, 12–35.
  • Svilar, L.; Castellano, J.; Jukic, I.; Casamichana, D. Positional differences in elite basketball: Selecting appropriate training-load measures. Int. J. Sports Physiol. Perform. 2018, 13, 947–952.
  • Koyama, T.; Rikukawa, A.; Nagano, Y.; Sasaki, S.; Ichikawa, H.; Hirose, N. High-acceleration movement, muscle damage, and perceived exertion in basketball games. Int. J. Sports Physiol. Perform. 2022, 17, 16–21.
  • Vázquez-Guerrero, J.; Suarez-Arrones, L.; Gómez, D.C.; Rodas, G. Comparing external total load, acceleration and deceleration outputs in elite basketball players across positions during match play. Kinesiology 2018, 50, 228–234.
  • Petway, A.J.; Freitas, T.T.; Calleja-González, J.; Medina Leal, D.; Alcaraz, P.E. Training load and match-play demands in basketball based on competition level: A systematic review. PLoS ONE 2020, 15, e0229212.
  • Griffin, A.; Kenny, I.C.; Comyns, T.M.; Lyons, M. The association between the acute: Chronic workload ratio and injury and its application in team sports: A systematic review. Sports Med. 2020, 50, 561–580.
  • Andrade, R.; Wik, E.H.; Rebelo-Marques, A.; Blanch, P.; Whiteley, R.; Espregueira-Mendes, J.; Gabbett, T.J. Is the acute: Chronic workload ratio (ACWR) associated with risk of time-loss injury in professional team sports? A systematic review of methodology, variables and injury risk in practical situations. Sports Med. 2020, 50, 1613–1635.
  • Tummala, S.V.; Morikawa, L.; Brinkman, J.; Crijns, T.J.; Economopoulos, K.; Chhabra, A. Knee Injuries and associated risk factors in National Basketball Association athletes. Arthrosc. Sports Med. Rehabil. 2022, 4, e1639–e1645.
  • Tummala, S.V.; Morikawa, L.; Brinkman, J.C.; Crijns, T.J.; Vij, N.; Gill, V.; Kile, T.A.; Patel, K.; Chhabra, A. Characterization of ankle injuries and associated risk factors in the National Basketball Association: Minutes per game and usage rate associated with time loss. Orthop. J. Sports Med. 2023, 11, 23259671231184459.
  • Okoroha, K.R.; Marfo, K.; Meta, F.; Matar, R.; Shehab, R.; Thompson, T.; Moutzouros, V.; Makhni, E.C. Amount of minutes played does not contribute to anterior cruciate ligament injury in National Basketball Association athletes. Orthopedics 2017, 40, e658–e662.
  • Page, R.M.; Field, A.; Langley, B.; Harper, L.D.; Julian, R. The effects of fixture congestion on injury in professional male soccer: A systematic review. Sports Med. 2023, 53, 667–685.
  • Willberg, C.; Wieland, B.; Rettenmaier, L.; Behringer, M.; Zentgraf, K. The relationship between external and internal load parameters in 3 x 3 basketball tournaments. BMC Sports Sci. Med. Rehabil. 2022, 14, 152.

Variable | Search Strings
Training load and/or competition load | “load *” OR “workload *” OR “train *” OR “compet *” OR “recovery” OR “volume *” OR “intensit *” OR “duration *” OR “stress *” OR “congestion” OR “saturation” OR “distance” OR “exposure *” OR “hours” OR “days” OR “weeks” OR “jump *” OR “psychosocial *” OR “travel” OR “acute:chronic load ratio” OR “acute: chronic workload ratio” OR “ACWR” OR “exponentially weighted moving average” OR “EWMA” OR “perception of effort” OR “rating of perceived exertion” OR “RPE”
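
As a rough illustration of how a search block like the one above can be assembled, the following Python sketch joins a list of terms into a quoted OR block and then combines blocks for different concepts. The helper names (`build_or_block`, `combine_blocks`) are hypothetical, the second concept block is invented for the example, and combining blocks with AND is an assumption about common practice rather than something stated in the table.

```python
def build_or_block(terms):
    """Join search terms into one parenthesised OR block, quoting each term."""
    cleaned = [t.strip() for t in terms if t.strip()]
    return "(" + " OR ".join(f'"{t}"' for t in cleaned) + ")"


def combine_blocks(*blocks):
    """Combine one OR block per concept with AND (assumed, not from the source)."""
    return " AND ".join(blocks)


# A handful of the load-related terms listed in the search strings above.
load_terms = ["recovery", "congestion", "ACWR", "EWMA", "RPE"]

# Hypothetical second concept block, included only to show the combination step.
sport_terms = ["basketball"]

query = combine_blocks(build_or_block(load_terms), build_or_block(sport_terms))
print(query)
```

Keeping each concept as its own list makes it easier to report the exact terms used and to rerun the search when a term is added or dropped.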
Articles | Participants | Injury Definition | Internal Load | External Load | Summary of Findings
Anderson et al. [ ]
Prospective
USA
1 season
n = 12
NCAA D3
All female
Age 18–22
Injury definition: An injury was defined as a circumstance in which the athlete received an evaluation from the team’s student athletic trainer and ATC and required limiting their practice for at least 1 day
External load: Nil
Summary of findings: A moderately positive correlation was found between weekly injuries and total weekly training load (p ≤ 0.01; r = 0.675) and between strain and monotony (p ≤ 0.01; r = 0.668) in a Pearson product-moment correlation
Gianoudis et al. [ ]
Prospective
Australia
1 season
n = 46
28 males
18 females
High school
Mean age 16.0
Injury definition: An injury was defined as an incident related to physical activity that resulted in either time lost from athletic participation, medical diagnosis and treatment, or the presence of pain or discomfort
Internal load: Nil
Summary of findings: No significant difference was found in the average weekly hours of physical activity participation between injured and uninjured players in an independent t test (p = 0.67)
Caparrós et al. [ ]
Retrospective
Spain
7 seasons
n = 44
F.C. Barcelona
No gender information
Age 27.6 ± 4.1
Injury definition: Time-loss injury: any injury occurring during a practice session or match that caused an absence for at least the next practice session or match
Internal load: Nil
Summary of findings: A strong positive correlation was found between exposure (total number of practices and hours of exposure) and the total number of injuries in Pearson’s correlation (r = 0.77; p = 0.04)
Weiss et al. [ ]
Prospective
Australia
1 season
n = 13
Australia New Zealand Basketball League
All male
Age 24.4 ± 4.7
Self-reported injury: Oslo Sports Trauma Research Center Injury Questionnaire
Nil
Proportions of injured squad members at workload ratios between 1.0–1.49 were substantially less than those observed at all other ratios by clear small to moderate amounts. Workload ratios ≤ 0.5, between 0.5–0.99, and ≥1.5 resulted in 1.5, 1.4, and 1.7 times more injured players, respectively. Comparisons between all other workload ratio ranges were trivial-to-small in magnitude and unclear, using the 90% CI to determine the significance
Caparrós et al. [ ]
Retrospective
Spain
3 seasons
n = 33
Professional team
All male
Age 24.9 ± 2.9
Time-loss injury: any injury (contact and non-contact) occurring during a practice session or game that caused an absence for at least the next practice session or competition
Nil
A significantly higher risk of injury during games was found in athletes with ≤3 decelerations with 2 m/s (IRR, 4.36; 95% CI, 1.78–10.6) and in those running ≤ 1.3 miles (lower workload) (IRR, 6.42; 95% CI, 2.52–16.3) (p < 0.01 in both cases)
Piedra et al. [ ]
Prospective
Spain
1 season
n = 11
Women’s league 1
All female
Age 23.36 ± 2.99
Muscular pain/injuries that required the attention of the team physiotherapist/time-loss injury: any injury that occurred during training or a game and that led to absence for at least the following session or game
Several significant differences were observed between the injury risk values and the morning RPE (F = 5.0811; p = 0.032), the sRPE of the morning practices (F = 7.3585; p = 0.010) and the total time of exposure (F = 3.5055; p = 0.064) in the one-way ANOVA test. A significant negative relationship was observed between total training time and the number of time-loss injuries (rho = −0.797; p = 0.003) in the Spearman rho test, and a possible association was observed between exposure time and a lower risk of time-loss injury (R = 0.645) in linear regression analysis
Watson et al. [ ]
Prospective
USA
2 seasons
n = 19
NCAA D1
All male
No age information
Time-loss injury: recorded by the team athletic trainer
Nil
In the initial prediction models that were conducted separately, several factors were found to be significantly predictive of in-season injury. These factors included mood, fatigue, stress, soreness, and sleep duration (p < 0.001 for all), with odds ratios ranging from 0.41 to 0.57. However, in the subsequent multivariable models, only sleep duration and soreness remained significant, independent predictors, with odds ratios ranging from 0.52 to 0.69 and 0.65, respectively (p < 0.001 and p = 0.024, respectively). Mood, fatigue, and stress were no longer significant predictors, with odds ratios ranging from 1.1 to 1.2 and p values ranging from 0.43 to 0.69.
Benson et al. [ ]
Prospective
Canada
1 season
n = 49
25 males
24 females
High school teams in Calgary, AB
Age 16.5 ± 0.6
Medical attention/time-loss injury: any physical complaint, including pain, ache, joint instability, stiffness, or any other complaint resulting from participating in basketball-related activities
A low workload accumulation over 3 and 4 weeks coupled with a high 1-week workload could contribute to injury risk
Doeven et al. [ ]
Prospective
Netherlands
1 season
n = 16
Dutch Basketball League
All male
Age 24.8 ± 2.0
Self-reported injury: Oslo Sports Trauma Research Center Questionnaire
No significant differences for severity scores and time loss were observed between short-term match congestion and regular competition
Ferioli et al. [ ]
Prospective
Italy
2 seasons
n = 35
Italian Basketball League (D1-D3)
All male
Age 24 ± 6
Time-loss injury (non-contact injuries only): when a player was unable to fully take part in future basketball training or match due to physical complaints
The study did not find any significant associations between the load markers and non-contact injuries (all p > 0.05). Additionally, the load markers exhibited no ability to predict injuries, as evidenced by the low Area Under the Curve (AUC) range of 0.468 to 0.537 and Youden index range of 0.019 to 0.132.
Garcia et al. [ ]
Prospective
Spain
1 season
n = 8
Pardinyes competed in the “Leb plata” category
All male
Age 23.5 ± 2.56
A time-loss injury in basketball refers to a physical ailment sustained by a player during a match or training, caused by excessive transfer of energy that surpasses the body’s ability to maintain its structural and/or functional integrity. Such injuries result in the player being unable to fully participate in future basketball training or match play.
A directly proportional but statistically non-significant relationship was observed between microtrauma injuries and RPE (F = 3.492; p = 0.112), but there was a directly proportional and statistically significant association between the team’s RPE and the one perceived by the coach (r = 0.775; p < 0.001)
Orringer & Pandya [ ]
Retrospective
USA
3 seasons
n = 34
NBA
All male
Age 26.6 ± 4.89
Significant in-game injury leading to missing at least 10 consecutive games from (accessed on 21 July 2021)
Nil
A higher number of minutes played per game in the three (4.9% increase, p = 0.04), five (5.8% increase, p = 0.004), and ten (4.0% increase, p = 0.02) games prior to the injury was significantly associated with a greater likelihood of injury occurrence.
Senbel et al. [ ]
Prospective
USA
1 season
n = 16
NCAA D1
All female
No age information
Injury data were extracted from medical injury reports generated as injuries occurred
The study found that rapid eye movement (REM) sleep was the most significant contributor to injuries, with a 0.11 correlation coefficient for CORR, 2.7% for XGB, and 12.9% for RFC models. Additionally, low (<20%) and high (>30%) percentages of REM sleep increase the likelihood of injury. The partial dependence plots (PDPs) indicated that sleep disturbances increase when the respiratory rate falls outside the typical range of 12–18 repetitions per minute. Consequently, this increases the risk of injury.
Menon et al. [ ]
Retrospective
USA
5 seasons
n = 196
NBA
All male
No age information
Season-ending injuries (SEIs) from Pro Sports Transactions: any injury that resulted in failure to return at least 5 games before the end of the team’s game schedule
Nil
An SEI was significantly associated with minutes per game (odds ratio, 1.06; 95% confidence interval, 1.04–1.08; p < 0.001)
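
Several findings in the table above are expressed as workload ratios (for example, the bands ≤ 0.5, 0.5–0.99, 1.0–1.49 and ≥ 1.5 in the Weiss et al. row). As a rough illustration only, the sketch below computes an acute:chronic workload ratio from daily loads and assigns it to one of those bands. The 7-day acute and 28-day chronic rolling-average windows, the function names, and the handling of band boundaries are assumptions made for this example; the included studies may have defined and calculated their ratios differently.

```python
def acwr(daily_loads, acute_days=7, chronic_days=28):
    """Acute:chronic workload ratio from a list of daily loads (most recent last).

    Assumes simple rolling averages; the window lengths are illustrative defaults.
    """
    if len(daily_loads) < chronic_days:
        raise ValueError("need at least chronic_days of load data")
    acute = sum(daily_loads[-acute_days:]) / acute_days
    chronic = sum(daily_loads[-chronic_days:]) / chronic_days
    if chronic == 0:
        raise ValueError("chronic load is zero; ratio is undefined")
    return acute / chronic


def ratio_band(ratio):
    """Map a ratio to the bands quoted in the table (boundary handling is assumed)."""
    if ratio <= 0.5:
        return "<=0.5"
    if ratio < 1.0:
        return "0.5-0.99"
    if ratio < 1.5:
        return "1.0-1.49"
    return ">=1.5"


if __name__ == "__main__":
    # Hypothetical loads: three steady weeks followed by one doubled week.
    loads = [100] * 21 + [200] * 7
    ratio = acwr(loads)
    print(round(ratio, 2), ratio_band(ratio))
```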
Study | NOS Selection | NOS Comparability | NOS Outcome | NOS Total Score | Level of Evidence
Anderson et al. (2003) [ ] | 3 | 1 | 2 | 6 | G1
Gianoudis et al. (2008) [ ] | 2 | 0 | 2 | 4 | P0
Caparrós et al. (2016) [ ] | 4 | 1 | 2 | 7 | G1
Weiss et al. (2017) [ ] | 3 | 1 | 1 | 5 | P1
Caparrós et al. (2018) [ ] | 3 | 1 | 3 | 7 | G1
Piedra et al. (2020) [ ] | 4 | 1 | 2 | 7 | G1
Watson et al. (2020) [ ] | 3 | 2 | 3 | 8 | G2
Benson et al. (2021) [ ] | 3 | 2 | 3 | 8 | G2
Doeven et al. (2021) [ ] | 3 | 1 | 1 | 5 | P1
Ferioli et al. (2021) [ ] | 3 | 1 | 3 | 7 | G1
Garcia et al. (2022) [ ] | 3 | 1 | 1 | 5 | P1
Orringer and Pandya (2022) [ ] | 3 | 1 | 3 | 7 | G1
Senbel et al. (2022) [ ] | 3 | 2 | 3 | 8 | G2
Menon et al. (2024) [ ] | 3 | 1 | 2 | 6 | G1
Median (range) | 3 (2–4) | 1 (0–2) | 2 (1–3) | 6 (4–8) | 1 (0–2)
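
Each NOS total in the table above should equal the sum of its selection, comparability and outcome scores. The sketch below is a simple consistency check over values transcribed by hand from that table; the names `NOS_ROWS`, `inconsistent_rows` and `column_summary` are illustrative and not taken from the review.

```python
from statistics import median

# (study, selection, comparability, outcome, reported total), transcribed from the table above.
NOS_ROWS = [
    ("Anderson et al. (2003)", 3, 1, 2, 6),
    ("Gianoudis et al. (2008)", 2, 0, 2, 4),
    ("Caparrós et al. (2016)", 4, 1, 2, 7),
    ("Weiss et al. (2017)", 3, 1, 1, 5),
    ("Caparrós et al. (2018)", 3, 1, 3, 7),
    ("Piedra et al. (2020)", 4, 1, 2, 7),
    ("Watson et al. (2020)", 3, 2, 3, 8),
    ("Benson et al. (2021)", 3, 2, 3, 8),
    ("Doeven et al. (2021)", 3, 1, 1, 5),
    ("Ferioli et al. (2021)", 3, 1, 3, 7),
    ("Garcia et al. (2022)", 3, 1, 1, 5),
    ("Orringer and Pandya (2022)", 3, 1, 3, 7),
    ("Senbel et al. (2022)", 3, 2, 3, 8),
    ("Menon et al. (2024)", 3, 1, 2, 6),
]


def inconsistent_rows(rows):
    """Return the studies whose component scores do not add up to the reported total."""
    return [name for name, sel, comp, out, total in rows if sel + comp + out != total]


def column_summary(rows, index):
    """Return (median, minimum, maximum) of one numeric column, selected by tuple index."""
    values = [row[index] for row in rows]
    return median(values), min(values), max(values)


if __name__ == "__main__":
    print("Inconsistent rows:", inconsistent_rows(NOS_ROWS))
    for label, index in [("Selection", 1), ("Comparability", 2), ("Outcome", 3)]:
        print(label, column_summary(NOS_ROWS, index))
```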

Share and Cite

Chan, C.-C.; Yung, P.S.-H.; Mok, K.-M. The Relationship between Training Load and Injury Risk in Basketball: A Systematic Review. Healthcare 2024 , 12 , 1829. https://doi.org/10.3390/healthcare12181829

COMMENTS

  1. Inclusion and exclusion criteria in research studies: definitions and

    Establishing inclusion and exclusion criteria for study participants is a standard, required practice when designing high-quality research protocols. Inclusion criteria are defined as the key features of the target population that the investigators will use to answer their research question. 2 Typical inclusion criteria include demographic ...

  2. Avoiding Bias in Selecting Studies

    The EPC should carefully consider whether PICOTS criteria are effect modifiers and how inclusion and exclusion criteria may potentially skew the studies and thus results reported in the review. Table 2 below suggests potential implications or biases that may result from specific hypothetical examples of inclusion and exclusion criteria.

  3. Chapter 3: Defining the criteria for including studies and how they

    First, the diseases or conditions of interest should be defined using explicit criteria for establishing their presence (or absence). Criteria that will force the unnecessary exclusion of studies should be avoided. For example, diagnostic criteria that were developed more recently - which may be viewed as the current gold standard for diagnosing the condition of interest - will not have ...

  4. Selecting Studies for Systematic Review: Inclusion and Exclusion Criteria

    The eligibility criteria are liberally applied in the beginning to ensure that relevant studies are included and no study is excluded without thorough evaluation. At the outset, studies are only excluded if they clearly meet one or more of the exclusion criteria. For example, if the focus of review is children, then studies with adult ...

  5. Inclusion and Exclusion Criteria

    Step 1: Developing and testing criteria. Developing the inclusion and exclusion criteria may involve an iterative process of refinement during review conceptualization and construction (see Chapter 2). During conceptualization, criteria may be adjusted as reviewers scope the likely literature base, consult stakeholders, and explore what questions may be feasible or relevant.

  6. LibGuides: Systematic Reviews : Inclusion and Exclusion Criteria

    The inclusion and exclusion criteria must be decided before you start the review. Inclusion criteria are everything a study must have to be included. Exclusion criteria are the factors that would make a study ineligible to be included. Criteria that should be considered include: Type of studies: It is important to select articles with an ...

  7. Inclusion and Exclusion Criteria

    Example: Inclusion and exclusion criteria. Let's say you are studying the effect of a relaxation therapy on women with insomnia. Here are some examples of effective and ineffective ways to phrase your criteria: Inclusion criteria. Bad example: "Subjects will be included in the study if they have insomnia.".

  8. How to Conduct a Systematic Review: A Narrative Literature Review

    Inclusion and exclusion criteria. Establishing inclusion and exclusion criteria come after formulating research questions. The concept of inclusion and exclusion of data in a systematic review provides a basis on which the reviewer draws valid and reliable conclusions regarding the effect of the intervention for the disorder under consideration ...

  9. Setting Inclusion and Exclusion Criteria

    Fig. 6.5. Scoping study for setting inclusion and exclusion criteria for archetype systematic literature review. Building on the position of the scoping study in Figure 4.6, this figure depicts that a scoping study informs the inclusion and exclusion criteria that should be integrated in the protocol. Full size image.

  10. PDF Designing Inclusion and Exclusion Criteria

    Carefully look through key studies from your literature review. The inclusion/exclusion criteria from other studies similar to yours may be relevant to your study. They provide insight you may have overlooked or not thought of. Those studies, since they are most reflective of your study goal, are also great resources for the specifici-

  11. Inclusion and exclusion criteria

    Inclusion and exclusion criteria set the boundaries for the systematic review. They are determined after setting the research question usually before the search is conducted, however scoping searches may need to be undertaken to determine appropriate criteria. Many different factors can be used as inclusion or exclusion criteria.

  12. Reviewing the literature

    This may require a comprehensive literature review: this article aims to outline the approaches and stages required and provides a working example of a published review. ... Formulating clear inclusion and exclusion criteria, for example, patient groups, ages, conditions/treatments, sources of evidence/research designs; ... The importance of ...

  13. 4. Apply Inclusion and Exclusion Criteria

    In large systematic reviews, the inclusion/exclusion criteria are applied by at least 2 reviewers to all the studies retrieved by the literature search. A strategy to resolve any disagreements between the reviewers should be outlined in the protocol, such as bringing in a third screener. There are two levels of the screening process.

  14. Inclusion/Exclusion Criteria

    Exclusion criteria are the elements of an article that disqualify the study from inclusion in a literature review. For example, excluded studies: used qualitative methodology; used a certain study design (e.g., observational); are a certain publication type (e.g., systematic reviews); were published before a certain year (must have compelling reason)

  15. Define Inclusion/Exclusion Criteria

    Tip: Choose your criteria carefully to avoid bias. For example, if you exclude non-English language articles, you may be ignoring relevant studies. The following 6-minute video explains the relationship between inclusion and exclusion criteria and database searches.

  16. Systematic Reviews: Inclusion and Exclusion Criteria

    An important part of the SR process is defining what will and will not be included in your review. Inclusion and exclusion criteria are developed after a research question is finalized but before a search is carried out. They determine the limits for the evidence synthesis and are typically reported in the methods section of the publication.

  17. Inclusion & Exclusion Criteria

    You may want to think about criteria that will be used to select articles for your literature review based on your research question. These are commonly known as inclusion criteria and exclusion criteria, and they set the boundaries for the literature review. Inclusion and exclusion criteria are determined after formulating the research question but usually before the search is conducted ...

  18. Inclusion and exclusion criteria in research studies: definitions and

    Europe PMC is an archive of life sciences journal literature. PRACTICAL SCENARIO. A cross-sectional multicenter study evaluated self-reported adherence to inhaled therapies among patients with COPD in Latin America. 1 Inclusion and exclusion criteria for the study are shown in Chart 1. The authors found that self-reported adherence was low in 20% of the patients, intermediate in 29%, and high ...

  19. Sample Selection in Systematic Literature Reviews of Management

    I used somewhat relaxed inclusion criteria to avoid excluding many of the earlier systematic reviews that did not necessarily refer to the Tranfield et al. (2003) article or did not use the term systematic literature review. That is, I used the question of whether the articles disclosed their inclusion or exclusion criteria as my overriding ...

  20. Developing the Review Question and Inclusion Criteria

    The clarity of the inclusion criteria also ensures the replicability of the review. Methods. It is important to clarify the methods you will use to search the literature, appraise the studies retrieved, and extract and synthesize the data. (These steps will be discussed in later articles in this series.) Conclusion.

  21. Establish your Inclusion and Exclusion criteria

    Using specific criteria will help make sure your final review is as unbiased, transparent and ethical as possible. How to establish your Inclusion and Exclusion criteria To establish your criteria you need to define each aspect of your question to clarify what you are focusing on, and consider if there are any variations you also wish to explore.

  22. The Effectiveness of Psychological Intervention for Women Who Committed

    The clear limitation of this review is evident in its failure to yield any studies that met the inclusion criteria and formally addressed the research question. It is important to note that this result does not necessarily indicate overly narrow inclusion criteria; rather, it suggests that the field of research on this topic remains underexplored.

  23. Inclusion and exclusion criteria in research studies ...

    Inclusion and exclusion criteria in research studies: definitions and why they matter. J Bras Pneumol. 2018 Apr;44 (2):84. doi: 10.1590/s1806-37562018000000088. [Article in Portuguese, English]

  24. Case report & review: Bilateral NIFTP harboring concomitant HRAS and

    NIFTP's inclusion in the most recent WHO endocrine tumor classification indicates that this lesion, identified by stringent inclusion and exclusion criteria, is widely accepted. According to the above categorization, patients will avoid the psychological toll of receiving a cancer diagnosis and the adverse effects of thyroidectomy or RAI therapy.

  25. STEM Outside of School: a Meta-Analysis of the Effects of Informal

    Inclusion and Exclusion Criteria. For inclusion in this meta-analysis, we listed the criteria that studies must satisfy below: 1. A study necessitates the exploration of an informal science learning setting wherein explicit documentation of informal activities is provided. The study should align with the established criteria for informal ...

  26. Methodology for research I

    The exclusion criteria include factors or characteristics that make the recruited population ineligible for the study. These factors may be confounders for the outcome parameter. For example, patients with liver disease would be excluded if coagulation parameters would impact the outcome. The exclusion criteria are inclusive of inclusion criteria.

  27. "No Papers, No Treatment": a scoping review of challenges faced by

    The research team consisted of two reviewers, who are also the authors of this work. These reviewers formulated the main research objectives and outlined the review by defining the search terms, identifying the databases for the literature search, and establishing the inclusion and exclusion criteria.

  28. Synbiotics in patients at risk for spontaneous preterm birth: protocol

    The study population consists of pregnant women at risk for sPTB based on their obstetric history. Inclusion and exclusion criteria are summarised in Table 1. Spontaneous preterm birth is defined as delivery at viable preterm gestation (24 0/7 until 35 6/7 weeks) following preterm labour, PPROM, or cervical insufficiency.

  29. Full article: Newer Modalities and Updates in the Management of Sickle

    Inclusion Criteria. ... Exclusion Criteria. We excluded review articles, letter-to-editor, abstracts, and studies that were not directly related to recently approved drugs. ... Citation 35 Another important pathophysiological event in sickle cell disease is the polymerization of hemoglobin under deoxygenation, which causes red blood cell ...

  30. The Relationship between Training Load and Injury Risk in ...

    The present study employed specific inclusion criteria to identify studies for analysis. Only prospective or retrospective cohort designs were considered, with the exclusion of case studies, case series, case-control studies, review papers, or purely epidemiology [11,13]. The study population consisted of basketball athletes participating at ...