Just for emphasis, the means from Table 1 are presented in the next two figures (Fig. 1 and Fig. 2).
Figure 1. Age of subjects by groups (A = blue, B = red) with and without randomized assignment of subjects to treatment groups
Figure 2. BMI of subjects by groups (A = blue, B = red) with and without randomized assignment of subjects to treatment groups
Note that the apparent difference between A and B for BMI disappears once proper randomization of subjects is accomplished. In conclusion, randomization is an approach to experimental design that helps to reduce the influence other factors may have on the outcome variable (e.g., change in blood pressure after 16 weeks of exercise). In principle, randomization should protect a project because, on average, these influences will be distributed evenly across the two groups of individuals. This reasoning extends to unmeasured and unknown causal factors as well.
This discussion was illustrated by random assignment of subjects to treatment groups. The same logic applies to how subjects are selected from a population. If the sample is large enough, then a random sample of subjects will tend to be representative of the variability of the outcome variable in the population, and representative also of the additional, unmeasured cofactors that may contribute to that variability.
However, if you cannot obtain a random sample, then the conclusions you reach may be sample-specific and biased. Perhaps the group of individuals that likes to exercise on treadmills just happens to have higher cardiac output because they are larger than the individuals who like to exercise on bicycles. Such a nonrandom sample will bias your results and can lead to incorrect interpretation of results. Random sampling is CRUCIAL in epidemiology, opinion survey work, most aspects of health research, drug studies, and medical work with human subjects. It is difficult and very costly to do, so most surveys you hear about, especially polls reported from Internet sites, are NOT conducted using random sampling (included in the catch-all term “probability sampling”)! As an aside, most opinion survey work involves complex sample designs with some form of geographic clustering (e.g., all phone numbers in a city, random sampling among neighborhoods).
Random sampling is the ideal if generalizations are to be made about data, but strictly random sampling is not appropriate for all kinds of studies. Consider the question of whether or not EMF exposure is a risk factor for developing cancer (Pool 1990). These kinds of studies are observational: at least in principle, we would not expect that housing, and therefore exposure to EMF, is manipulated (cf. discussion in Walker 2009). Thus, epidemiologists look for patterns: if EMF exposure is linked to cancer, then more cases of cancer should occur near EMF sources than in areas distant from EMF sources. The hypothesis, then, is that an association between EMF exposure and cancer occurs non-randomly, whereas cancers in people not exposed to EMF occur at random. Unfortunately, clusters can occur even if the process that generates the data is random.
Compare Graph A and Graph B (Fig. 3). One of the graphs resulted from a random process and the other was generated by a non-random process. Note that the claim can be rephrased as a statement about the probability that each grid cell contains a point; it is like the heads/tails outcomes of 16 tosses of a coin. We can see clusters of points in Graph B; Graph A lacks obvious clusters of points: there is a point in each of the 16 cells of the grid. Although both patterns could be random, the correct answer in this case is Graph B.
Figure 3. An example of clustering resulting from a random sampling process (Graph B). In contrast, Graph A was generated so that a point was located within each grid cell.
The graphic below shows the transmission grid of the continental United States (Fig. 4). How would one design a random sampling scheme overlaid on the obviously heterogeneous distribution of the grid itself? If a random sample were drawn, chances are good that in many of the western states no sampled population would be near the grid; the likelihood would increase in the eastern portion of the United States, where the population, and therefore the transmission grid, is more densely placed.
Figure 4. Map of electrical transmission grid for continental United States of America. Image source https://openinframap.org/#3/24.61/-101.16
For example, suppose you want to test whether or not EMF affects human health, and your particular interest is whether there is a relationship between living close to high-voltage towers or transfer stations and brain cancer. How does one design such a study, keeping in mind the importance of randomization for our ability to generalize and assign causation? This is the part of epidemiology that strives to detect whether clusters of disease are related to some environmental source, and it is an extremely difficult challenge. For the record, no clear link between EMF and cancer has been found, but reports do appear from time to time (e.g., a report on a cluster of breast cancer in men working in an office adjacent to high EMF; Milham 2004).
1. I claimed that Graph B in Figure 3 was generated by a random process while Graph A was not. The results are: in Graph A, each cell in the grid has a point; in Graph B, ten cells have at least one point and six cells are empty. Which probability distribution applies? A. beta B. binomial C. normal D. poisson
2. True or False. If sampling with replacement is used, a subject may be included more than once.
3. Use the sample() function with and without replacement on an object (see the R example below)
a) set of 3
b) set of 4
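One way to approach question 3 in R; the object is not specified here, so a small numbered vector is assumed:
x3 <- 1:3
sample(x3, size = 3, replace = FALSE) # a permutation of 1:3, no repeats possible
sample(x3, size = 3, replace = TRUE)  # repeats possible, e.g., 3 3 1
x4 <- 1:4
sample(x4, size = 4, replace = FALSE) # a permutation of 1:4
sample(x4, size = 4, replace = TRUE)  # e.g., 2 4 4 1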
4. Confirm the claim by calculating the probability of the Graph A result vs the Graph B result (see R script below).
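A sketch for question 4, assuming the 16 points of Fig. 3 are dropped uniformly at random into the 4 x 4 grid. The Graph A pattern (every cell occupied) can be calculated exactly and is vanishingly rare; a Graph B-like pattern (ten occupied cells) is common:
factorial(16) / 16^16 # P(all 16 cells occupied) = 16!/16^16, about 1.1e-06
set.seed(1)
occupied <- replicate(100000, {
  cells <- sample(1:16, 16, replace = TRUE) # a random cell for each of 16 points
  length(unique(cells))                     # number of occupied cells
})
mean(occupied == 16) # Graph A pattern: essentially never
mean(occupied == 10) # exactly ten occupied cells, as in Graph B: quite common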
Code you type is shown in red; responses or output from R are shown in blue. Recall that statements preceded by the hash # are comments and are not read by R (i.e., no need for you to type them).
First, create some variables. Vectors aa and bb contain my two age sequences.
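The original listing is not reproduced here, so the following sketch uses assumed age values; any two sequences of 30 ages each will do:
aa <- 20:49 # first age sequence, n = 30
bb <- 31:60 # second age sequence, n = 30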
Second, append vector bb to the end of vector aa
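Continuing the sketch:
age <- c(aa, bb) # bb appended to the end of aa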
Third, get the average age for the first group (the aa sequence) and for the second group (the bb sequence). There are lots of ways to do this; I made two subsets from the combined age variable, but could have just as easily taken the mean of aa and the mean of bb (same thing!).
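Continuing the sketch, both approaches:
mean(age[1:30])    # first group (the aa sequence)
mean(age[31:60])   # second group (the bb sequence)
mean(aa); mean(bb) # same thing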
Fourth, start building a data frame, then sort it by age. We will be adding additional variables to this data frame.
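Continuing the sketch (the name mydata is an assumption):
mydata <- data.frame(age = age)
mydata <- mydata[order(mydata$age), , drop = FALSE] # sort the data frame by age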
Fifth, divide the variable again into two subsets of 30 and get the averages
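Continuing the sketch:
mean(mydata$age[1:30])  # average age of the 30 youngest subjects
mean(mydata$age[31:60]) # average age of the 30 oldest subjects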
Sixth, create an index variable, random order without replacement
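Continuing the sketch:
index <- sample(1:60, size = 60, replace = FALSE) # random order, no replacement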
Add the new variable to our existing data frame, then print it to check that all is well
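Continuing the sketch:
mydata$index <- index
mydata # print the data frame to check that all is well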
Seventh, select for our first treatment group the first 30 subjects from the randomized index. There are again other ways to do this, but sorting on the index variable means that the subject order will be changed too.
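Continuing the sketch (mydata.r is an assumed name for the re-sorted data frame):
mydata.r <- mydata[order(mydata$index), ] # sorting on index shuffles the subjects too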
Print the new data frame to confirm that the sorting worked. It did: we can see that the rows have been sorted in ascending order based on the index variable.
Eighth, create our new treatment groups, again of n = 30 each, then get the mean ages for each group.
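Continuing the sketch:
groupA <- mydata.r$age[1:30]  # first treatment group: first 30 subjects after randomization
groupB <- mydata.r$age[31:60] # second treatment group
mean(groupA); mean(groupB)    # the two means should now be similar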
Get the minimum and maximum values for the groups
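Continuing the sketch:
range(groupA) # minimum and maximum age, first treatment group
range(groupB) # minimum and maximum age, second treatment group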
Ninth, create a BMI variable drawn from a normal distribution with coefficient of variation equal to 20%. The first group we will call cc.
The second group we will call dd.
Create a new variable called BMI by joining cc and dd
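A sketch of this step; the group means of 27.5 and 37.5 are taken from the discussion below, and CV = sd/mean = 20% implies sd = 0.2 * mean:
cc <- rnorm(30, mean = 27.5, sd = 0.2 * 27.5) # first group
dd <- rnorm(30, mean = 37.5, sd = 0.2 * 37.5) # second group
BMI <- c(cc, dd) # join cc and dd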
Add the BMI variable to our data frame.
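Continuing the sketch (BMI is added in the age-sorted order, so the first 30 rows carry the cc values):
mydata$BMI <- BMI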
Tenth, repeat our protocol from before: Set up two groups each with 30 subjects, calculate the means for the variables and then sort by the random index and get the new group means.
All we did was confirm that the unsorted groups had mean BMI of around 27.5 and 37.5, respectively. Now, proceed to sort by the random index variable. Go ahead and create a new data frame.
Get the means of the new groups
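A sketch of the whole tenth step:
mean(mydata$BMI[1:30]); mean(mydata$BMI[31:60])       # about 27.5 and 37.5 before randomization
mydata.r2 <- mydata[order(mydata$index), ]            # new data frame, sorted by the random index
mean(mydata.r2$BMI[1:30]); mean(mydata.r2$BMI[31:60]) # new group means: much closer together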
That’s all of the work!
In experimental research, random assignment is a way of placing participants from your sample into different treatment groups using randomization.
With simple random assignment, every member of the sample has a known or equal chance of being placed in a control group or an experimental group. Studies that use simple random assignment are also called completely randomized designs.
Random assignment is a key part of experimental design. It helps you ensure that all groups are comparable at the start of a study: any differences between them are due to random factors, not research biases like sampling bias or selection bias.
Random assignment is an important part of control in experimental research, because it helps strengthen the internal validity of an experiment and avoid biases.
In experiments, researchers manipulate an independent variable to assess its effect on a dependent variable, while controlling for other variables. To do so, they often use different levels of an independent variable for different groups of participants.
This is called a between-groups or independent measures design.
For example, you might use three groups of participants, each given a different level of the independent variable.
Random assignment helps you make sure that the treatment groups don't differ in systematic ways at the start of the experiment, as such differences can seriously affect (and even invalidate) your work.
If you don’t use random assignment, you may not be able to rule out alternative explanations for your results.
Without random assignment (say, if participants self-select into groups based on where they are recruited), it's hard to tell whether the participant characteristics are the same across all groups at the start of the study. Gym-users may tend to engage in more healthy behaviors than people who frequent cafes or community centers, and this would introduce a healthy user bias in your study.
Although random assignment helps even out baseline differences between groups, it doesn’t always make them completely equivalent. There may still be extraneous variables that differ between groups, and there will always be some group differences that arise from chance.
Most of the time, the random variation between groups is low, and, therefore, it’s acceptable for further analysis. This is especially true when you have a large sample. In general, you should always use random assignment in experiments when it is ethically possible and makes sense for your study topic.
Random sampling and random assignment are both important concepts in research, but it’s important to understand the difference between them.
Random sampling (also called probability sampling or random selection) is a way of selecting members of a population to be included in your study. In contrast, random assignment is a way of sorting the sample participants into control and experimental groups.
While random sampling is used in many types of studies, random assignment is only used in between-subjects experimental designs.
Some studies use both random sampling and random assignment, while others use only one or the other.
Random sampling enhances the external validity or generalizability of your results, because it helps ensure that your sample is unbiased and representative of the whole population. This allows you to make stronger statistical inferences.
For example, suppose you want to survey a company's 8000 employees. Because you have access to the whole population, you can use a simple random sample: assign all 8000 employees a number and use a random number generator to select 300 of them. These 300 employees are your full sample.
Random assignment enhances the internal validity of the study, because it ensures that there are no systematic differences between the participants in each group. This helps you conclude that the outcomes can be attributed to the independent variable.
You use random assignment to place participants into the control or experimental group. To do so, you take your list of participants and assign each participant a number. Again, you use a random number generator to place each participant in one of the two groups.
To use simple random assignment, you start by giving every member of the sample a unique number. Then, you can use computer programs or manual methods to randomly assign each participant to a group.
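As an illustration, a minimal sketch in R (the participant labels and sample size are invented):
participants <- paste0("P", 1:20) # unique identifiers for 20 hypothetical participants
group <- sample(rep(c("control", "experimental"), each = 10)) # random, equal-sized groups
data.frame(participant = participants, group = group)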
This type of random assignment is the most powerful method of placing participants in conditions, because each individual has an equal chance of being placed in any one of your treatment groups.
In more complicated experimental designs, random assignment is only used after participants are grouped into blocks based on some characteristic (e.g., test score or demographic variable). These groupings mean that you need a larger sample to achieve high statistical power.
For example, a randomized block design involves placing participants into blocks based on a shared characteristic (e.g., college students versus graduates), and then using random assignment within each block to assign participants to every treatment condition. This helps you assess whether the characteristic affects the outcomes of your treatment.
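A sketch of this idea, with invented blocks and sizes:
# Randomized block design: random assignment within each block.
assign_block <- function(ids) {
  data.frame(id = ids, group = sample(rep(c("control", "treatment"), each = length(ids) / 2)))
}
blocks <- split(paste0("P", 1:12), rep(c("graduate", "student"), each = 6))
do.call(rbind, lapply(blocks, assign_block))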
In an experimental matched design, you use blocking and then match up individual participants from each block based on specific characteristics. Within each matched pair or group, you randomly assign each participant to one of the conditions in the experiment and compare their outcomes.
Sometimes, it’s not relevant or ethical to use simple random assignment, so groups are assigned in a different way.
Sometimes, differences between participants are the main focus of a study, for example, when comparing men and women or people with and without health conditions. Participants are not randomly assigned to different groups, but instead assigned based on their characteristics.
In this type of study, the characteristic of interest (e.g., gender) is an independent variable, and the groups differ based on the different levels (e.g., men, women, etc.). All participants are tested the same way, and then their group-level outcomes are compared.
When studying unhealthy or dangerous behaviors, it’s not possible to use random assignment. For example, if you’re studying heavy drinkers and social drinkers, it’s unethical to randomly assign participants to one of the two groups and ask them to drink large amounts of alcohol for your experiment.
When you can’t assign participants to groups, you can also conduct a quasi-experimental study. In a quasi-experiment, you study the outcomes of pre-existing groups who receive treatments that you may not have any control over (e.g., heavy drinkers and social drinkers). These groups aren’t randomly assigned, but may be considered comparable when some other variables (e.g., age or socioeconomic status) are controlled for.
In experimental research, random assignment is a way of placing participants from your sample into different groups using randomization. With this method, every member of the sample has a known or equal chance of being placed in a control group or an experimental group.
Random selection, or random sampling, is a way of selecting members of a population for your study’s sample.
In contrast, random assignment is a way of sorting the sample into control and experimental groups.
Random sampling enhances the external validity or generalizability of your results, while random assignment improves the internal validity of your study.
Random assignment is used in experiments with a between-groups or independent measures design. In this research design, there’s usually a control group and one or more experimental groups. Random assignment helps ensure that the groups are comparable.
In general, you should always use random assignment in this type of experimental design when it is ethically possible and makes sense for your study topic.
To implement random assignment, assign a unique number to every member of your study’s sample.
Then, you can use a random number generator or a lottery method to randomly assign each number to a control or experimental group. You can also do so manually, by flipping a coin or rolling a die to randomly assign participants to groups.
Bhandari, P. (2023, June 22). Random Assignment in Experiments | Introduction & Examples. Scribbr. Retrieved August 12, 2024, from https://www.scribbr.com/methodology/random-assignment/
Single-case experimental designs are rapidly growing in popularity. This popularity needs to be accompanied by transparent and well-justified methodological and statistical decisions. Appropriate experimental design including randomization, proper data handling and adequate reporting are needed to ensure reproducibility and internal validity. The degree of generalizability can be assessed through replication.
Tanious, R., Manolov, R., Onghena, P. et al. Single-case experimental designs: the importance of randomization and replication. Nat Rev Methods Primers 4, 27 (2024). https://doi.org/10.1038/s43586-024-00312-8
Zoë Hoare, Randomisation: What, Why and How?, Significance, Volume 7, Issue 3, September 2010, Pages 136–138, https://doi.org/10.1111/j.1740-9713.2010.00443.x
Randomisation is a fundamental aspect of randomised controlled trials, but how many researchers fully understand what randomisation entails or what needs to be taken into consideration to implement it effectively and correctly? Here, for students or for those about to embark on setting up a trial, Zoë Hoare gives a basic introduction to help approach randomisation from a more informed direction.
Most trials of new medical treatments, and most other trials for that matter, now implement some form of randomisation. The idea sounds so simple that defining it becomes almost a joke: randomisation is “putting participants into the treatment groups randomly”. If only it were that simple. Randomisation can be a minefield, and not everyone understands what exactly it is or why they are doing it.
A key feature of a randomised controlled trial is that it is genuinely not known whether the new treatment is better than what is currently offered. The researchers should be in a state of equipoise; although they may hope that the new treatment is better, there is no definitive evidence to back this hypothesis up. This evidence is what the trial is trying to provide.
You will have, at its simplest, two groups: patients who are getting the new treatment, and those getting the control or placebo. You do not hand-select which patient goes into which group, because that would introduce selection bias. Instead you allocate your patients randomly. In its simplest form this can be done by the tossing of a fair coin: heads, the patient gets the trial treatment; tails, he gets the control. Simple randomisation is a fair way of ensuring that any differences that occur between the treatment groups arise completely by chance. But – and this is the first but of many here – simple randomisation can lead to unbalanced groups, that is, groups of unequal size. This is particularly true if the trial is only small. For example, tossing a fair coin 10 times will only result in five heads and five tails about 25% of the time. We would have a 66% chance of getting 6 heads and 4 tails, 5 and 5, or 4 and 6; 33% of the time we would get an even larger imbalance, with 7, 8, 9 or even all 10 patients in one group and the other group correspondingly undersized.
The impact of an imbalance like this is far greater for a small trial than for a larger one. Tossing a fair coin 100 times will result in an imbalance worse than 60–40 only about 3.5% of the time. One important part of the trial design process is stating the intention to use randomisation; then we need to establish which method to use, when it will be used, and whether or not it is in fact random.
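These binomial probabilities are easy to check in R:
dbinom(5, 10, 0.5)             # exactly 5 heads in 10 tosses: ~0.246
sum(dbinom(4:6, 10, 0.5))      # a 6-4, 5-5 or 4-6 split: ~0.656
1 - sum(dbinom(4:6, 10, 0.5))  # 7 or more patients in one group: ~0.344
2 * (1 - pbinom(60, 100, 0.5)) # imbalance worse than 60-40 in 100 tosses: ~0.035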
It is partly true to say that we do it because we have to. The Consolidated Standards of Reporting Trials (CONSORT) [1], to which we should all adhere, tells us: “Ideally, participants should be assigned to comparison groups in the trial on the basis of a chance (random) process characterized by unpredictability.” The requirement is there for a reason. Randomisation of the participants is crucial because it allows the principles of statistical theory to stand and as such allows a thorough analysis of the trial data without bias. The exact method of randomisation can have an impact on the trial analyses, and this needs to be taken into account when writing the statistical analysis plan.
Ideally, simple randomisation would always be the preferred option. However, in practice there often needs to be some control of the allocations to avoid severe imbalances within treatments or within categories of patient. You would not want, for example, all the males under 30 to be in one group and all the females over 70 in the other. This is where restricted or stratified randomisation comes in.
Restricted randomisation relates to using any method to control the split of allocations to each of the treatment groups based on certain criteria. This can be as simple as generating a random list, such as AAABBBABABAABB …, and allocating each participant as they arrive to the next treatment on the list. At certain points within the allocations we know that the groups will be balanced in numbers – here at the sixth, eighth, tenth and 14th participants – and we can control the maximum imbalance at any one time.
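A minimal sketch of one such pre-generated restricted list in R (this is one simple variant, not the only way to build one):
alloc <- sample(rep(c("A", "B"), each = 7)) # 14 allocations, exactly 7 of each
paste(alloc, collapse = "")                 # e.g., "AABABBABABBAAB"
cumsum(alloc == "A") - cumsum(alloc == "B") # running imbalance; 0 wherever balanced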
Stratified randomisation sets out to control the balance in certain baseline characteristics of the participants – such as sex or age. This can be thought of as producing an individual randomisation list for each of the characteristics concerned.
Stratification variables are the baseline characteristics that you think might influence the outcome your trial is trying to measure. For example, if you thought gender was going to have an effect on the efficacy of the treatment then you would use it as one of your stratification variables. A stratified randomisation procedure would aim to ensure a balance of the two gender groups between the two treatment groups.
If you also thought age would be affecting the treatment then you could also stratify by age (young/old) with some sensible limits on what old and young are. Once you start stratifying by age and by gender, you have to start taking care. You will need to use a stratified randomisation process that balances at the stratum level (i.e. at the level of those characteristics) to ensure that all four strata (male/young, male/old, female/young and female/old) have equivalent numbers of each of the treatment groups represented.
“Great”, you might think. “I'll just stratify by all my baseline characteristics!” Better not. Stop and consider what this would mean. As the number of stratification variables increases linearly, the number of strata increases exponentially. This reduces the number of participants that would appear in each stratum. In our example above, with our two stratification variables of age and sex we had four strata; if we added, say “blue-eyed” and “overweight” to our criteria to give four stratification variables each with just two levels we would get 16 represented strata. How likely is it that each of those strata will be represented in the population targeted by the trial? In other words, will we be sure of finding a blue-eyed young male who is also overweight among our patients? And would one such overweight possible Adonis be statistically enough? It becomes evident that implementing pre-generated lists within each stratification level or stratum and maintaining an overall balance of group sizes becomes much more complicated with many stratification variables and the uncertainty of what type of participant will walk through the door next.
Does it matter? There are a wide variety of methods for randomisation, and which one you choose does actually matter. It needs to be able to do everything that is required of it. Ask yourself these questions, and others:
Can the method accommodate enough treatment groups? Some methods are limited to two treatment groups; many trials involve three or more.
What type of randomness, if any, is injected into the method? The level of randomness dictates how predictable a method is.
A deterministic method has no randomness, meaning that with all the previous information you can tell in advance which group the next patient to appear will be allocated to. Allocating alternate participants to the two treatments using ABABABABAB … would be an example.
A static random element means that each allocation is made with a pre-defined probability. The coin-toss method does this.
With a dynamic element the probability of allocation is always changing in relation to the information received, meaning that the probability of allocation can only be worked out with knowledge of the algorithm together with all its settings. A biased coin toss does this where the bias is recalculated for each participant.
Can the method accommodate stratification variables, and if so how many? Not all of them can. And can it cope with continuous stratification variables? Most variables are divided into mutually exclusive categories (e.g. male or female), but sometimes it may be necessary (or preferable) to use a continuous scale of the variable – such as weight, or body mass index.
Can the method use an unequal allocation ratio? Not all trials require equal-sized treatment groups. There are many reasons why it might be wise to have more patients receiving treatment A than treatment B [2]. However, an allocation ratio being something other than 1:1 does impact on the study design and on the calculation of the sample size, so is not something to be changing mid-trial. Not all allocation methods can cope with this inequality.
Is thresholding used in the method? Thresholding handles imbalances in allocation. A threshold is set and if the imbalance becomes greater than the threshold then the allocation becomes deterministic to reduce the imbalance back below the threshold.
Can the method be implemented sequentially? In other words, does it require that the total number of participants be known at the beginning of the allocations? Some methods generate lists requiring exactly N participants to be recruited in order to be effective – and recruiting participants is often one of the more problematic parts of a trial.
Is the method complex? If so, then its practical implementation becomes an issue for the day-to-day running of the trial.
Is the method suitable to apply to a cluster randomisation? Cluster randomisations are used when randomising groups of individuals to a treatment rather than the individuals themselves. This can be due to the nature of the treatment, such as a new teaching method for schools or a dietary intervention for families. Using clusters is a big part of the trial design and the randomisation needs to be handled slightly differently.
Should a response-adaptive method be considered? If there is some evidence that one treatment is better than another, then a response-adaptive method works by taking into account the outcomes of previous allocations and works to minimise the number of participants on the “wrong” treatment.
For multi-centred trials, how to handle the randomisations across the centres should be considered at this point. Do all centres need to be completely balanced? Are all centres the same size? Considering the various centres as stratification variables is one way of dealing with more than one centre.
Once the method of randomisation has been established the next important step is to consider how to implement it. The recommended way is to enlist the services of a central randomisation office that can offer robust, validated techniques with the security and back-up needed to implement many of the methods proposed today. How the method is implemented must be as clearly reported as the method chosen. As part of the implementation it is important to keep the allocations concealed, both those already done and any future ones, from as many people as possible. This helps prevent selection bias: a clinician may withhold a participant if he believes that based on previous allocations the next allocations would not be the “preferred” ones – see the section below on subversion.
Part of the trial design will be to note exactly who should know what about how each participant has been allocated. Researchers and participants may be equally blinded, but that is not always the case.
For example, in a blinded trial there may be researchers who do not know which group the participants have been allocated to. This enables them to conduct the assessments without any bias for the allocation. They may, however, start to guess, on the basis of the results they see. A measure of blinding may be incorporated for the researchers to indicate whether they have remained blind to the treatment allocated. This can be in the form of a simple scale tool for the researcher to indicate how confident they are in knowing which allocated group the participant is in by the end of an assessment. With psychosocial interventions it is often impossible to hide from the participants, let alone the clinicians, which treatment group they have been allocated to.
In a drug trial where a placebo can be prescribed a coded system can ensure that neither patients nor researchers know which group is which until after the analysis stage.
With any level of blinding there may be a requirement to unblind participants or clinicians at any point in the trial, and there should be a documented procedure drawn up on how to unblind a particular participant without risking the unblinding of a trial. For drug trials in particular, the methods for unblinding a participant must be stated in the trial protocol. Wherever possible the data analysts and statisticians should remain blind to the allocation until after the main analysis has taken place.
Blinding should not be confused with allocation concealment. Blinding prevents performance and ascertainment bias within a trial, while allocation concealment prevents selection bias. Bias introduced by poor allocation concealment may be thought of as a predictive bias, trying to influence the results from the outset, while the biases introduced by non-blinding can be thought of as a reactive bias, creating causal links in outcomes because of being in possession of information about the treatment group.
In the literature on randomisation there are numerous tales of how allocation schemes have been subverted by clinicians trying to do the best for the trial or for their patient or both. This includes anecdotal tales of clinicians holding sealed envelopes containing the allocations up to X-ray lights and confessing to breaking into locked filing cabinets to get at the codes [3]. This type of behaviour has many explanations and reasons, but does raise the question whether these clinicians were in a state of equipoise with regard to the trial, and whether therefore they should really have been involved with the trial. Randomisation schemes and their implications must be signed up to by the whole team and are not something that only the participants need to consent to.
1. The 2010 CONSORT statement can be found at http://www.consort-statement.org/consort-statement/.
2. Dumville, J. C., Hahn, S., Miles, J. N. V. and Torgerson, D. J. (2006) The use of unequal randomisation ratios in clinical trials: a review. Contemporary Clinical Trials, 27, 1–12.
3. Schulz, K. F. (1995) Subverting randomisation in controlled trials. Journal of the American Medical Association, 274, 1456–1458.
BMC Medical Research Methodology, volume 21, Article number: 168 (2021).
Randomization is the foundation of any clinical trial involving treatment comparison. It helps mitigate selection bias, promotes similarity of treatment groups with respect to important known and unknown confounders, and contributes to the validity of statistical tests. Various restricted randomization procedures with different probabilistic structures and different statistical properties are available. The goal of this paper is to present a systematic roadmap for the choice and application of a restricted randomization procedure in a clinical trial.
We survey available restricted randomization procedures for sequential allocation of subjects in a randomized, comparative, parallel group clinical trial with equal (1:1) allocation. We explore statistical properties of these procedures, including balance/randomness tradeoff, type I error rate and power. We perform head-to-head comparisons of different procedures through simulation under various experimental scenarios, including cases when common model assumptions are violated. We also provide some real-life clinical trial examples to illustrate the thinking process for selecting a randomization procedure for implementation in practice.
Restricted randomization procedures targeting 1:1 allocation vary in the degree of balance/randomness they induce, and more importantly, they vary in terms of validity and efficiency of statistical inference when common model assumptions are violated (e.g. when outcomes are affected by a linear time trend; measurement error distribution is misspecified; or selection bias is introduced in the experiment). Some procedures are more robust than others. Covariate-adjusted analysis may be essential to ensure validity of the results. Special considerations are required when selecting a randomization procedure for a clinical trial with very small sample size.
The choice of randomization design, data analytic technique (parametric or nonparametric), and analysis strategy (randomization-based or population model-based) are all very important considerations. Randomization-based tests are robust and valid alternatives to likelihood-based tests and should be considered more frequently by clinical investigators.
Various research designs can be used to acquire scientific medical evidence. The randomized controlled trial (RCT) has been recognized as the most credible research design for investigations of the clinical effectiveness of new medical interventions [ 1 , 2 ]. Evidence from RCTs is widely used as a basis for submissions of regulatory dossiers requesting marketing authorization for new drugs, biologics, and medical devices. Three important methodological pillars of the modern RCT include blinding (masking), randomization, and the use of a control group [ 3 ].
While RCTs provide the highest standard of clinical evidence, they are laborious and costly, in terms of both time and material resources. There are alternative designs, such as observational studies with either a cohort or case–control design, and studies using real world evidence (RWE). When properly designed and implemented, observational studies can sometimes produce similar estimates of treatment effects to those found in RCTs, and furthermore, such studies may be viable alternatives to RCTs in many settings where RCTs are not feasible and/or not ethical. In the era of big data, the sources of clinically relevant data are increasingly rich and include electronic health records, data collected from wearable devices, health claims data, etc. Big data creates vast opportunities for development and implementation of novel frameworks for comparative effectiveness research [ 4 ], and RWE studies nowadays can be implemented rapidly and relatively easily. But how credible are the results from such studies?
In 1980, D. P. Byar issued warnings and highlighted potential methodological problems with comparison of treatment effects using observational databases [ 5 ]. Many of these issues still persist and actually become paramount during the ongoing COVID-19 pandemic when global scientific efforts are made to find safe and efficacious vaccines and treatments as soon as possible. While some challenges pertinent to RWE studies are related to the choice of proper research methodology, some additional challenges arise from increasing requirements of health authorities and editorial boards of medical journals for the investigators to present evidence of transparency and reproducibility of their conducted clinical research. Recently, two top medical journals, the New England Journal of Medicine and the Lancet, retracted two COVID-19 studies that relied on observational registry data [ 6 , 7 ]. The retractions were made at the request of the authors who were unable to ensure reproducibility of the results [ 8 ]. Undoubtedly, such cases are harmful in many ways. The already approved drugs may be wrongly labeled as “toxic” or “inefficacious”, and the reputation of the drug developers could be blemished or destroyed. Therefore, the highest standards for design, conduct, analysis, and reporting of clinical research studies are now needed more than ever. When treatment effects are modest, yet still clinically meaningful, a double-blind, randomized, controlled clinical trial design helps detect these differences while adjusting for possible confounders and adequately controlling the chances of both false positive and false negative findings.
Randomization in clinical trials has been an important area of methodological research in biostatistics since the pioneering work of A. Bradford Hill in the 1940’s and the first published randomized trial comparing streptomycin with a non-treatment control [ 9 ]. Statisticians around the world have worked intensively to elaborate the value, properties, and refinement of randomization procedures with an incredible record of publication [ 10 ]. In particular, a recent EU-funded project ( www.IDeAl.rwth-aachen.de ) on innovative design and analysis of small population trials has “randomization” as one work package. In 2020, a group of trial statisticians around the world from different sectors formed a subgroup of the Drug Information Association (DIA) Innovative Designs Scientific Working Group (IDSWG) to raise awareness of the full potential of randomization to improve trial quality, validity and rigor ( https://randomization-working-group.rwth-aachen.de/ ).
The aims of the current paper are three-fold. First, we describe major recent methodological advances in randomization, including different restricted randomization designs that have superior statistical properties compared to some widely used procedures such as permuted block designs. Second, we discuss different types of experimental biases in clinical trials and explain how a carefully chosen randomization design can mitigate risks of these biases. Third, we provide a systematic roadmap for evaluating different restricted randomization procedures and selecting an “optimal” one for a particular trial. We also showcase application of these ideas through several real life RCT examples.
The target audience for this paper would be clinical investigators and biostatisticians who are tasked with the design, conduct, analysis, and interpretation of clinical trial results, as well as regulatory and scientific/medical journal reviewers. Recognizing the breadth of the concept of randomization, in this paper we focus on a randomized, comparative, parallel group clinical trial design with equal (1:1) allocation, which is typically implemented using some restricted randomization procedure, possibly stratified by some important baseline prognostic factor(s) and/or study center. Some of our findings and recommendations are generalizable to more complex clinical trial settings. We shall highlight these generalizations and outline additional important considerations that fall outside the scope of the current paper.
The paper is organized as follows. The “ Methods ” section provides some general background on the methodology of randomization in clinical trials, describes existing restricted randomization procedures, and discusses some important criteria for comparison of these procedures in practice. In the “ Results ” section, we present our findings from four simulation studies that illustrate the thinking process when evaluating different randomization design options at the study planning stage. The “ Conclusions ” section summarizes the key findings and important considerations on restricted randomization procedures, and it also highlights some extensions and further topics on randomization in clinical trials.
Randomization is an essential component of an experimental design in general and clinical trials in particular. Its history goes back to R. A. Fisher and his classic book “The Design of Experiments” [ 11 ]. Implementation of randomization in clinical trials is due to A. Bradford Hill who designed the first randomized clinical trial evaluating the use of streptomycin in treating tuberculosis in 1946 [ 9 , 12 , 13 ].
Reference [ 14 ] provides a good summary of the rationale and justification for the use of randomization in clinical trials. The randomized controlled trial (RCT) has been referred to as “the worst possible design (except for all the rest)” [ 15 ], indicating that the benefits of randomization should be evaluated in comparison to what we are left with if we do not randomize. Observational studies suffer from a wide variety of biases that may not be adequately addressed even using state-of-the-art statistical modeling techniques.
The RCT in the medical field has several features that distinguish it from experimental designs in other fields, such as agricultural experiments. In the RCT, the experimental units are humans, and in the medical field often diagnosed with a potentially fatal disease. These subjects are sequentially enrolled for participation in the study at selected study centers, which have relevant expertise for conducting clinical research. Many contemporary clinical trials are run globally, at multiple research institutions. The recruitment period may span several months or even years, depending on a therapeutic indication and the target patient population. Patients who meet study eligibility criteria must sign the informed consent, after which they are enrolled into the study and, for example, randomized to either experimental treatment E or the control treatment C according to the randomization sequence. In this setup, the choice of the randomization design must be made judiciously, to protect the study from experimental biases and ensure validity of clinical trial results.
The first virtue of randomization is that, in combination with allocation concealment and masking, it helps mitigate selection bias due to an investigator’s potential to selectively enroll patients into the study [ 16 ]. A non-randomized, systematic design such as a sequence of alternating treatment assignments has a major fallacy: an investigator, knowing an upcoming treatment assignment in a sequence, may enroll a patient who, in their opinion, would be best suited for this treatment. Consequently, one of the groups may contain a greater number of “sicker” patients and the estimated treatment effect may be biased. Systematic covariate imbalances may increase the probability of false positive findings and undermine the integrity of the trial. While randomization alleviates the fallacy of a systematic design, it does not fully eliminate the possibility of selection bias (unless we consider complete randomization for which each treatment assignment is determined by a flip of a coin, which is rarely, if ever used in practice [ 17 ]). Commonly, RCTs employ restricted randomization procedures which sequentially balance treatment assignments while maintaining allocation randomness. A popular choice is the permuted block design that controls imbalance by making treatment assignments at random in blocks. To minimize potential for selection bias, one should avoid overly restrictive randomization schemes such as permuted block design with small block sizes, as this is very similar to alternating treatment sequence.
The second virtue of randomization is its tendency to promote similarity of treatment groups with respect to important known, but even more importantly, unknown confounders. If treatment assignments are made at random, then by the law of large numbers, the average values of patient characteristics should be approximately equal in the experimental and the control groups, and any observed treatment difference should be attributed to the treatment effects, not the effects of the study participants [ 18 ]. However, one can never rule out the possibility that the observed treatment difference is due to chance, e.g. as a result of random imbalance in some patient characteristics [ 19 ]. Despite that random covariate imbalances can occur in clinical trials of any size, such imbalances do not compromise the validity of statistical inference, provided that proper statistical techniques are applied in the data analysis.
Several misconceptions on the role of randomization and balance in clinical trials were documented and discussed by Senn [ 20 ]. One common misunderstanding is that balance of prognostic covariates is necessary for valid inference. In fact, different randomization designs induce different extent of balance in the distributions of covariates, and for a given trial there is always a possibility of observing baseline group differences. A legitimate approach is to pre-specify in the protocol the clinically important covariates to be adjusted for in the primary analysis, apply a randomization design (possibly accounting for selected covariates using pre-stratification or some other approach), and perform a pre-planned covariate-adjusted analysis (such as analysis of covariance for a continuous primary outcome), verifying the model assumptions and conducting additional supportive/sensitivity analyses, as appropriate. Importantly, the pre-specified prognostic covariates should always be accounted for in the analysis, regardless whether their baseline differences are present or not [ 20 ].
It should be noted that some randomization designs (such as covariate-adaptive randomization procedures) can achieve very tight balance of covariate distributions between treatment groups [ 21 ]. While we address randomization within pre-specified stratifications, we do not address more complex covariate- and response-adaptive randomization in this paper.
Finally, randomization plays an important role in statistical analysis of the clinical trial. The most common approach to inference following the RCT is the invoked population model [ 10 ]. With this approach, one posits that there is an infinite target population of patients with the disease, from which \(n\) eligible subjects are sampled in an unbiased manner for the study and are randomized to the treatment groups. Within each group, the responses are assumed to be independent and identically distributed (i.i.d.), and inference on the treatment effect is performed using some standard statistical methodology, e.g. a two sample t-test for normal outcome data. The added value of randomization is that it makes the assumption of i.i.d. errors more feasible compared to a non-randomized study because it introduces a real element of chance in the allocation of patients.
An alternative approach is the randomization model, in which the implemented randomization itself forms the basis for statistical inference [ 10 ]. Under the null hypothesis of the equality of treatment effects, individual outcomes (which are regarded as not influenced by random variation, i.e. are considered as fixed) are not affected by treatment. Treatment assignments are permuted in all possible ways consistent with the randomization procedure actually used in the trial. The randomization-based p-value is the sum of null probabilities of the treatment assignment permutations in the reference set that yield test statistic values greater than or equal to the experimental value. A randomization-based test can be a useful supportive analysis, free of assumptions of parametric tests and protective against spurious significant results that may be caused by temporal trends [ 14 , 22 ].
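As an illustration, a Monte Carlo sketch of such a randomization-based test in R, with invented outcome data and the reference set approximated by re-sampling re-randomizations rather than enumerating them all:
set.seed(42)
y <- c(5.1, 6.3, 4.8, 7.0, 5.9, 6.1) # fixed individual outcomes
g <- c("E", "E", "C", "C", "E", "C") # the assignments actually made (3 E, 3 C)
obs <- mean(y[g == "E"]) - mean(y[g == "C"]) # observed test statistic
perm <- replicate(10000, {
  gs <- sample(g) # a re-randomization consistent with the 1:1 design
  mean(y[gs == "E"]) - mean(y[gs == "C"])
})
mean(abs(perm) >= abs(obs)) # randomization-based two-sided p-value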
It is important to note that Bayesian inference has also become a common statistical analysis in RCTs [ 23 ]. Although the inferential framework relies upon subjective probabilities, a study analyzed through a Bayesian framework still relies upon randomization for the other aforementioned virtues [ 24 ]. Hence, the randomization considerations discussed herein have broad application.
Randomization is not a single methodology, but a very broad class of design techniques for the RCT [ 10 ]. In this paper, we consider only randomization designs for sequential enrollment clinical trials with equal (1:1) allocation in which randomization is not adapted for covariates and/or responses. The simplest procedure for an RCT is complete randomization design (CRD) for which each subject’s treatment is determined by a flip of a fair coin [ 25 ]. CRD provides no potential for selection bias (e.g. based on prediction of future assignments) but it can result, with non-negligible probability, in deviations from the 1:1 allocation ratio and covariate imbalances, especially in small samples. This may lead to loss of statistical efficiency (decrease in power) compared to the balanced design. In practice, some restrictions on randomization are made to achieve balanced allocation. Such randomization designs are referred to as restricted randomization procedures [ 26 , 27 ].
Suppose we plan to randomize an even number of subjects \(n\) sequentially between treatments E and C. Two basic designs that equalize the final treatment numbers are the random allocation rule (Rand) and the truncated binomial design (TBD), which were discussed in the 1957 paper by Blackwell and Hodges [ 28 ]. For Rand, any sequence of exactly \(n/2\) E’s and \(n/2\) C’s is equally likely. For TBD, treatment assignments are made with probability 0.5 until one of the treatments receives its quota of \(n/2\) subjects; thereafter all remaining assignments are made deterministically to the opposite treatment.
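A minimal sketch of both rules in R (function names are ours; n is assumed even):
rand_rule <- function(n) sample(rep(c("E", "C"), each = n / 2)) # every balanced sequence equally likely
truncated_binomial <- function(n) {
  alloc <- character(n); nE <- 0; nC <- 0
  for (i in 1:n) {
    if (nE == n / 2) alloc[i] <- "C"        # E quota met: deterministic C
    else if (nC == n / 2) alloc[i] <- "E"   # C quota met: deterministic E
    else alloc[i] <- sample(c("E", "C"), 1) # otherwise a fair coin
    if (alloc[i] == "E") nE <- nE + 1 else nC <- nC + 1
  }
  alloc
}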
A common feature of both Rand and TBD is that they aim at the final balance, whereas at intermediate steps it is still possible to have substantial imbalances, especially if \(n\) is large. A long run of a single treatment in a sequence may be problematic if there is a time drift in some important covariate, which can lead to chronological bias [ 29 ]. To mitigate this risk, one can further restrict randomization so that treatment assignments are balanced over time. One common approach is the permuted block design (PBD) [ 30 ], for which random treatment assignments are made in blocks of size \(2b\) ( \(b\) is some small positive integer), with exactly \(b\) allocations to each of the treatments E and C. The PBD is perhaps the oldest (it can be traced back to A. Bradford Hill’s 1951 paper [ 12 ]) and the most widely used randomization method in clinical trials. Often its choice in practice is justified by simplicity of implementation and the fact that it is referenced in the authoritative ICH E9 guideline on statistical principles for clinical trials [ 31 ]. One major challenge with PBD is the choice of the block size. If \(b=1\) , then every pair of allocations is balanced, but every even allocation is deterministic. Larger block sizes increase allocation randomness. The use of variable block sizes has been suggested [ 31 ]; however, PBDs with variable block sizes are also quite predictable [ 32 ]. Another problematic feature of the PBD is that it forces periodic return to perfect balance, which may be unnecessary from the statistical efficiency perspective and may increase the risk of prediction of upcoming allocations.
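A sketch of the PBD, under the same conventions:
permuted_blocks <- function(n, b) { # blocks of size 2b, with b of each treatment per block
  n_blocks <- ceiling(n / (2 * b))
  blocks <- replicate(n_blocks, sample(rep(c("E", "C"), each = b)), simplify = FALSE)
  head(unlist(blocks), n)
}
permuted_blocks(12, b = 2) # e.g., "C" "E" "E" "C" ...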
More recent and better alternatives to the PBD are the maximum tolerated imbalance (MTI) procedures [ 33 , 34 , 35 , 36 , 37 , 38 , 39 , 40 , 41 ]. These procedures provide stronger encryption of the randomization sequence (i.e. they make it more difficult to predict future treatment allocations in the sequence even knowing the current sizes of the treatment groups) while controlling treatment imbalance at a pre-defined threshold throughout the experiment. A general MTI procedure specifies a certain boundary for treatment imbalance, say \(b>0\) , that cannot be exceeded. If, at a given allocation step, the absolute value of imbalance equals \(b\) , then the next allocation is deterministically forced toward balance. This is in contrast to the PBD, which, after reaching the target quota of allocations for either treatment within a block, forces all subsequent allocations to achieve perfect balance at the end of the block. Some notable MTI procedures are the big stick design (BSD) proposed by Soares and Wu in 1983 [ 37 ], the maximal procedure proposed by Berger, Ivanova and Knoll in 2003 [ 35 ], and the block urn design (BUD) proposed by Zhao and Weng in 2011 [ 40 ], just to name a few. These designs control treatment imbalance within pre-specified limits and are more immune to selection bias than the PBD [ 42 , 43 ].
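A minimal sketch of the big stick design, the simplest MTI procedure (illustrative only; the function name is ours):

```python
import numpy as np

rng = np.random.default_rng(2024)

def bsd_sequence(n, mti):
    """Big stick design BSD(mti): toss a fair coin unless the absolute
    imbalance has reached the MTI boundary, in which case the next
    assignment is forced toward balance."""
    seq = []
    for _ in range(n):
        d = 2 * sum(seq) - len(seq)          # imbalance D(i) = N_E - N_C
        if d >= mti:
            seq.append(0)                    # force C
        elif d <= -mti:
            seq.append(1)                    # force E
        else:
            seq.append(int(rng.random() < 0.5))
    return np.array(seq)
```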
Another important class of restricted randomization procedures is biased coin designs (BCDs). Starting with the seminal 1971 paper of Efron [ 44 ], BCDs have been a hot research topic in biostatistics for 50 years. Efron’s BCD is very simple: at any allocation step, if treatment numbers are balanced, the next assignment is made with probability 0.5; otherwise, the underrepresented treatment is assigned with probability \(p\) , where \(0.5<p\le 1\) is a fixed and pre-specified parameter that determines the tradeoff between balance and randomness. Note that \(p=1\) corresponds to PBD with block size 2. If we set \(p<1\) (e.g. \(p=2/3\) ), then the procedure has no deterministic assignments and treatment allocation will be concentrated around 1:1 with high probability [ 44 ]. Several extensions of Efron’s BCD providing better tradeoff between treatment balance and allocation randomness have been proposed [ 45 , 46 , 47 , 48 , 49 ]; for example, a class of adjustable biased coin designs introduced by Baldi Antognini and Giovagnoli in 2004 [ 49 ] unifies many BCDs in a single framework. A comprehensive simulation study comparing different BCDs has been published by Atkinson in 2014 [ 50 ].
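Efron’s BCD admits an equally compact sketch (illustrative; \(p=2/3\) is the classic choice):

```python
import numpy as np

rng = np.random.default_rng(2024)

def efron_bcd_sequence(n, p=2/3):
    """Efron's biased coin design BCD(p): fair coin at perfect balance;
    otherwise assign the underrepresented arm with probability p."""
    seq = []
    for _ in range(n):
        d = 2 * sum(seq) - len(seq)          # imbalance D(i)
        if d == 0:
            prob_e = 0.5
        elif d < 0:
            prob_e = p                       # E is underrepresented
        else:
            prob_e = 1 - p                   # C is underrepresented
        seq.append(int(rng.random() < prob_e))
    return np.array(seq)
```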
Finally, urn models provide a useful mechanism for RCT designs [ 51 ]. Urn models apply probabilistic rules to sequentially add or remove balls (representing different treatments) in an urn, to balance treatment assignments while maintaining the randomized nature of the experiment [ 39 , 40 , 52 , 53 , 54 , 55 ]. A randomized urn design for balancing treatment assignments was proposed by Wei in 1977 [ 52 ]. Newer urn designs, such as the drop-the-loser urn design developed by Ivanova in 2003 [ 55 ], have reduced variability and can attain the target treatment allocation more efficiently. Many urn designs involve parameters that can be fine-tuned to obtain randomization procedures with a desirable balance/randomness tradeoff [ 56 ].
A “good” randomization procedure is one that helps successfully achieve the study objective(s). Kalish and Begg [ 57 ] state that the major objective of a comparative clinical trial is to provide a precise and valid comparison. To achieve this, the trial design should be such that it: 1) prevents bias; 2) ensures an efficient treatment comparison; and 3) is simple to implement to minimize operational errors. Table 1 elaborates on these considerations, focusing on restricted randomization procedures for 1:1 randomized trials.
Before delving into a detailed discussion, let us introduce some important definitions. Following [ 10 ], a randomization sequence is a random vector \({{\varvec{\updelta}}}_{n}=({\delta }_{1},\dots ,{\delta }_{n})\) , where \({\delta }_{i}=1\) , if the i th subject is assigned to treatment E or \({\delta }_{i}=0\) , if the \(i\) th subject is assigned to treatment C. A restricted randomization procedure can be defined by specifying a probabilistic rule for the treatment assignment of the ( i +1)st subject, \({\delta }_{i+1}\) , given the past allocations \({{\varvec{\updelta}}}_{i}\) for \(i\ge 1\) . Let \({N}_{E}\left(i\right)={\sum }_{j=1}^{i}{\delta }_{j}\) and \({N}_{C}\left(i\right)=i-{N}_{E}\left(i\right)\) denote the numbers of subjects assigned to treatments E and C, respectively, after \(i\) allocation steps. Then \(D\left(i\right)={N}_{E}\left(i\right)-{N}_{C}(i)\) is treatment imbalance after \(i\) allocations. For any \(i\ge 1\) , \(D\left(i\right)\) is a random variable whose probability distribution is determined by the chosen randomization procedure.
Treatment balance and allocation randomness are two competing requirements in the design of an RCT. Restricted randomization procedures that provide a good tradeoff between these two criteria are desirable in practice.
Consider a trial with sample size \(n\) . The absolute value of imbalance, \(\left|D(i)\right|\) \((i=1,\dots,n)\) , provides a measure of deviation from equal allocation after \(i\) allocation steps. \(\left|D(i)\right|=0\) indicates that the trial is perfectly balanced. One can also consider \(\Pr(\vert D\left(i\right)\vert=0)\) , the probability of achieving exact balance after \(i\) allocation steps. In particular \(\Pr(\vert D\left(n\right)\vert=0)\) is the probability that the final treatment numbers are balanced. Two other useful summary measures are the expected imbalance at the \(i\mathrm{th}\) step, \(E\left|D(i)\right|\) and the expected value of the maximum imbalance of the entire randomization sequence, \(E\left(\underset{1\le i\le n}{\mathrm{max}}\left|D\left(i\right)\right|\right)\) .
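These summary measures rarely have closed forms for all designs, but they are easy to estimate by simulation. A sketch (our own helper, written for any generator that returns a 0/1 assignment sequence):

```python
import numpy as np

rng = np.random.default_rng(2024)

def imbalance_summaries(generator, n, n_sim=10_000):
    """Monte Carlo estimates of E|D(i)| at every step i and of the
    expected maximum absolute imbalance over the whole sequence."""
    abs_d = np.empty((n_sim, n))
    for s in range(n_sim):
        seq = generator(n)
        abs_d[s] = np.abs(np.cumsum(2 * seq - 1))   # |D(i)|, i = 1..n
    return abs_d.mean(axis=0), abs_d.max(axis=1).mean()

# Example with complete randomization (CRD):
crd = lambda n: (rng.random(n) < 0.5).astype(int)
mean_abs_d, exp_max_imb = imbalance_summaries(crd, n=50)
```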
Greater forcing of balance implies lack of randomness. A procedure that lacks randomness may be susceptible to selection bias [ 16 ], which is a prominent issue in open-label trials with a single center or with randomization stratified by center, where the investigator knows the sequence of all previous treatment assignments. A classic approach to quantify the degree of susceptibility of a procedure to selection bias is the Blackwell-Hodges model [ 28 ]. Let \({G}_{i}=1\) (or 0), if at the \(i\mathrm{th}\) allocation step an investigator makes a correct (or incorrect) guess on treatment assignment \({\delta }_{i}\) , given past allocations \({{\varvec{\updelta}}}_{i-1}\) . Then the predictability of the design at the \(i\mathrm{th}\) step is the expected value of \({G}_{i}\) , i.e. \(E\left(G_i\right)=\Pr(G_i=1)\) . Blackwell and Hodges [ 28 ] considered the expected bias factor , the difference between expected total number of correct guesses of a given sequence of random assignments and the similar quantity obtained from CRD for which treatment assignments are made independently with equal probability: \(E(F)=E\left({\sum }_{i=1}^{n}{G}_{i}\right)-n/2\) . This quantity is zero for CRD, and it is positive for restricted randomization procedures (greater values indicate higher expected bias). Matts and Lachin [ 30 ] suggested taking expected proportion of deterministic assignments in a sequence as another measure of lack of randomness.
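The expected bias factor can likewise be estimated by simulating an investigator who uses the convergence guessing strategy (guess the arm assigned less often so far; guess at random under perfect balance). A sketch under these assumptions, with function names of our own:

```python
import numpy as np

rng = np.random.default_rng(2024)

def expected_bias_factor(generator, n, n_sim=10_000):
    """Monte Carlo estimate of the Blackwell-Hodges expected bias factor
    E(F) = E(total correct guesses) - n/2, under convergence guessing."""
    total_correct = 0.0
    for _ in range(n_sim):
        seq = generator(n)
        d = 0                                # running imbalance N_E - N_C
        for delta in seq:
            if d < 0:
                guess = 1                    # E is underrepresented
            elif d > 0:
                guess = 0                    # C is underrepresented
            else:
                guess = int(rng.random() < 0.5)
            total_correct += (guess == delta)
            d += 2 * delta - 1
    return total_correct / n_sim - n / 2
```

For CRD this estimate hovers around zero; the more restrictive the design, the larger the value.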
In the literature, various restricted randomization procedures have been compared in terms of balance and randomness [ 50 , 58 , 59 ]. For instance, Zhao et al. [ 58 ] performed a comprehensive simulation study of 14 restricted randomization procedures with different choices of design parameters, for sample sizes in the range of 10 to 300. The key criteria were the maximum absolute imbalance and the correct guess probability. The authors found that the performance of the designs was within a closed region with the boundaries shaped by Efron’s BCD [ 44 ] and the big stick design [ 37 ], signifying that the latter procedure with a suitably chosen MTI boundary can be superior to other restricted randomization procedures in terms of balance/randomness tradeoff. Similar findings confirming the utility of the big stick design were recently reported by Hilgers et al. [ 60 ].
Validity of a statistical procedure essentially means that the procedure provides correct statistical inference following an RCT. In particular, a chosen statistical test is valid, if it controls the chance of a false positive finding, that is, the pre-specified probability of a type I error of the test is achieved but not exceeded. The strong control of type I error rate is a major prerequisite for any confirmatory RCT. Efficiency means high statistical power for detecting meaningful treatment differences (when they exist), and high accuracy of estimation of treatment effects.
Both validity and efficiency are major requirements of any RCT, and both of these aspects are intertwined with treatment balance and allocation randomness. Restricted randomization designs, when properly implemented, provide solid ground for valid and efficient statistical inference. However, a careful consideration of different options can help an investigator to optimize the choice of a randomization procedure for their clinical trial.
Let us start with statistical efficiency. Equal (1:1) allocation frequently maximizes power and estimation precision. To illustrate this, suppose the primary outcomes in the two groups are normally distributed with respective means \({\mu }_{E}\) and \({\mu }_{C}\) and common standard deviation \(\sigma >0\) . Then the variance of an efficient estimator of the treatment difference \({\mu }_{E}-{\mu }_{C}\) is equal to \(V=\frac{4{\sigma }^{2}}{n-{L}_{n}}\) , where \({L}_{n}=\frac{{\left|D(n)\right|}^{2}}{n}\) is referred to as loss [ 61 ]. Clearly, \(V\) is minimized when \({L}_{n}=0\) , or equivalently, \(D\left(n\right)=0\) , i.e. the balanced trial.
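A quick numerical illustration of this formula (our own numbers, with \(\sigma =1\) and \(n=50\)) shows that modest final imbalance costs little efficiency, while large imbalance is more damaging:

```python
# V = 4*sigma^2 / (n - L_n), with loss L_n = |D(n)|^2 / n
n, sigma2 = 50, 1.0
for abs_dn in (0, 2, 6, 10):
    loss = abs_dn**2 / n
    v = 4 * sigma2 / (n - loss)
    print(f"|D(n)| = {abs_dn:2d}: loss = {loss:.2f}, variance = {v:.4f}")
```

For instance, \(\left|D(n)\right|=2\) (26 vs. 24 subjects) inflates the variance by less than 0.2%, whereas \(\left|D(n)\right|=10\) (30 vs. 20) inflates it by about 4%.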
When the primary outcome follows a more complex statistical model, optimal allocation may be unequal across the treatment groups; however, 1:1 allocation is still nearly optimal for binary outcomes [ 62 , 63 ], survival outcomes [ 64 ], and possibly more complex data types [ 65 , 66 ]. Therefore, a randomization design that balances treatment numbers frequently promotes efficiency of the treatment comparison.
As regards inferential validity, it is important to distinguish two approaches to statistical inference after the RCT – an invoked population model and a randomization model [ 10 ]. For a given randomization procedure, these two approaches generally produce similar results when the assumption of normal random sampling (and some other assumptions) are satisfied, but the randomization model may be more robust when model assumptions are violated; e.g. when outcomes are affected by a linear time trend [ 67 , 68 ]. Another important issue that may interfere with validity is selection bias. Some authors showed theoretically that PBDs with small block sizes may result in serious inflation of the type I error rate under a selection bias model [ 69 , 70 , 71 ]. To mitigate risk of selection bias, one should ideally take preventative measures, such as blinding/masking, allocation concealment, and avoidance of highly restrictive randomization designs. However, for already completed studies with evidence of selection bias [ 72 ], special statistical adjustments are warranted to ensure validity of the results [ 73 , 74 , 75 ].
With the current state of information technology, implementation of randomization in RCTs should be straightforward. Validated randomization systems are emerging, and they can handle randomization designs of increasing complexity for clinical trials that are run globally. However, some important points merit consideration.
The first point has to do with how a randomization sequence is generated and implemented. One should distinguish between advance and adaptive randomization [ 16 ]. Here, by “adaptive” randomization we mean “in-real-time” randomization, i.e. when a randomization sequence is generated not upfront, but rather sequentially, as eligible subjects enroll into the study. Restricted randomization procedures are “allocation-adaptive”, in the sense that the treatment assignment of an individual subject is adapted to the history of previous treatment assignments. While in practice the majority of trials with restricted and stratified randomization use randomization schedules pre-generated in advance, there are some circumstances under which “in-real-time” randomization schemes may be preferred; for instance, clinical trials with high cost of goods and/or shortage of drug supply [ 76 ].
The advance randomization approach includes the following steps: 1) for the chosen randomization design and sample size \(n\) , specify the probability distribution on the reference set by enumerating all feasible randomization sequences of length \(n\) and their corresponding probabilities; 2) select a sequence at random from the reference set according to the probability distribution; and 3) implement this sequence in the trial. While enumeration of all possible sequences and their probabilities is feasible and may be useful for trials with small sample sizes, the task becomes computationally prohibitive (and unnecessary) for moderate or large samples. In practice, Monte Carlo simulation can be used to approximate the probability distribution of the reference set of all randomization sequences for a chosen randomization procedure.
A limitation of advance randomization is that a sequence of treatment assignments must be generated upfront, and proper security measures (e.g. blinding/masking) must be in place to protect confidentiality of the sequence. With the adaptive or “in-real-time” randomization, a sequence of treatment assignments is generated dynamically as the trial progresses. For many restricted randomization procedures, the randomization rule can be expressed as \(\Pr(\delta_{i+1}=1)=F\left\{D\left(i\right)\right\}\) , where \(F\left\{\cdot \right\}\) is some non-increasing function of \(D\left(i\right)\) for any \(i\ge 1\) . This is referred to as the Markov property [ 77 ], which makes a procedure easy to implement sequentially. Some restricted randomization procedures, e.g. the maximal procedure [ 35 ], do not have the Markov property.
The second point has to do with how the final data analysis is performed. With an invoked population model, the analysis is conditional on the design and the randomization is ignored in the analysis. With a randomization model, the randomization itself forms the basis for statistical inference. Reference [ 14 ] provides a contemporaneous overview of randomization-based inference in clinical trials. Several other papers provide important technical details on randomization-based tests, including justification for control of type I error rate with these tests [ 22 , 78 , 79 ]. In practice, Monte Carlo simulation can be used to estimate randomization-based p- values [ 10 ].
The design of any RCT starts with formulation of the trial objectives and research questions of interest [ 3 , 31 ]. The choice of a randomization procedure is an integral part of the study design. A structured approach for selecting an appropriate randomization procedure for an RCT was proposed by Hilgers et al. [ 60 ]. Here we outline the thinking process one may follow when evaluating different candidate randomization procedures. Our presented roadmap is by no means exhaustive; its main purpose is to illustrate the logic behind some important considerations for finding an “optimal” randomization design for the given trial parameters.
Throughout, we shall assume that the study is designed as a randomized, two-arm comparative trial with 1:1 allocation, with a fixed sample size \(n\) that is pre-determined based on budgetary and statistical considerations to obtain a definitive assessment of the treatment effect via the pre-defined hypothesis testing. We start with some general considerations which determine the study design:
Sample size ( \(n\) ). For small or moderate studies, exact attainment of the target numbers per group may be essential, because even slight imbalance may decrease study power. Therefore, a randomization design in such studies should equalize the final treatment numbers as closely as possible. For large trials, the risk of major imbalances is less of a concern, and more random procedures may be acceptable.
The length of the recruitment period and the trial duration. Many studies are short-term and enroll participants fast, whereas some other studies are long-term and may have slow patient accrual. In the latter case, there may be time drifts in patient characteristics, and it is important that the randomization design balances treatment assignments over time.
Level of blinding (masking): double-blind, single-blind, or open-label. In double-blind studies with properly implemented allocation concealment the risk of selection bias is low. By contrast, in open-label studies the risk of selection bias may be high, and the randomization design should provide strong encryption of the randomization sequence to minimize prediction of future allocations.
Number of study centers. Many modern RCTs are implemented globally at multiple research institutions, whereas some studies are conducted at a single institution. In the former case, the randomization is often stratified by center and/or clinically important covariates. In the latter case, especially in single-institution open-label studies, the randomization design should be chosen very carefully, to mitigate the risk of selection bias.
An important point to consider is calibration of the design parameters. Many restricted randomization procedures involve parameters, such as the block size in the PBD, the coin bias probability in Efron’s BCD, the MTI threshold, etc. By fine-tuning these parameters, one can obtain designs with desirable statistical properties. For instance, references [ 80 , 81 ] provide guidance on how to justify the block size in the PBD to mitigate the risk of selection bias or chronological bias. Reference [ 82 ] provides a formal approach to determine the “optimal” value of the parameter \(p\) in Efron’s BCD in both finite and large samples. The calibration of design parameters can be done using Monte Carlo simulations for the given trial setting.
Another important consideration is the scope of randomization procedures to be evaluated. As we mentioned already, even one method may represent a broad class of randomization procedures that can provide different levels of balance/randomness tradeoff; e.g. Efron’s BCD covers a wide spectrum of designs, from PBD(2) (if \(p=1\) ) to CRD (if \(p=0.5\) ). One may either prefer to focus on finding the “optimal” parameter value for the chosen design, or be more general and include various designs (e.g. MTI procedures, BCDs, urn designs, etc.) in the comparison. This should be done judiciously, on a case-by-case basis, focusing only on the most reasonable procedures. References [ 50 , 58 , 60 ] provide good examples of simulation studies to facilitate comparisons among various restricted randomization procedures for a 1:1 RCT.
In parallel with the decision on the scope of randomization procedures to be assessed, one should decide upon the performance criteria against which these designs will be compared. Among others, one might think about the two competing considerations: treatment balance and allocation randomness. For a trial of size \(n\) , at each allocation step \(i=1,\dots ,n\) one can calculate expected absolute imbalance \(E\left|D(i)\right|\) and the probability of correct guess \(\Pr(G_i=1)\) as measures of lack of balance and lack of randomness, respectively. These measures can be either calculated analytically (when formulae are available) or through Monte Carlo simulations. Sometimes it may be useful to look at cumulative measures up to the \(i\mathrm{th}\) allocation step ( \(i=1,\dots ,n\) ); e.g. \(\frac{1}{i}{\sum }_{j=1}^{i}E\left|D(j)\right|\) and \(\frac1i\sum\nolimits_{j=1}^i\Pr(G_j=1)\) . For instance, \(\frac{1}{n}{\sum }_{j=1}^{n}{\mathrm{Pr}}({G}_{j}=1)\) is the average correct guess probability for a design with sample size \(n\) . It is also helpful to visualize the selected criteria. Visualizations can be done in a number of ways; e.g. plots of a criterion vs. allocation step, admissibility plots of two chosen criteria [ 50 , 59 ], etc. Such visualizations can help evaluate design characteristics, both overall and at intermediate allocation steps. They may also provide insights into the behavior of a particular design for different values of the tuning parameter, and/or facilitate a comparison among different types of designs.
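A small helper along these lines (our own sketch; it takes per-step estimates such as those produced by the simulation helpers above):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_criterion_vs_step(per_step_values, label):
    """Plot a per-step criterion (e.g. E|D(i)| or Pr(G_i = 1)) together
    with its cumulative average against the allocation step."""
    steps = np.arange(1, len(per_step_values) + 1)
    cumulative = np.cumsum(per_step_values) / steps
    plt.plot(steps, per_step_values, label=label)
    plt.plot(steps, cumulative, "--", label=f"{label} (cumulative average)")
    plt.xlabel("Allocation step i")
    plt.legend()
```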
Another way to compare the merits of different randomization procedures is to study their inferential characteristics such as type I error rate and power under different experimental conditions. Sometimes this can be done analytically, but a more practical approach is to use Monte Carlo simulation. The choice of the modeling and analysis strategy will be context-specific. Here we outline some considerations that may be useful for this purpose:
Data generating mechanism . To simulate individual outcome data, some plausible statistical model must be posited. The form of the model will depend on the type of outcomes (e.g. continuous, binary, time-to-event, etc.), covariates (if applicable), the distribution of the measurement error terms, and possibly some additional terms representing selection and/or chronological biases [ 60 ].
True treatment effects . At least two scenarios should be considered: under the null hypothesis ( \({H}_{0}\) : treatment effects are the same) to evaluate the type I error rate, and under an alternative hypothesis ( \({H}_{1}\) : there is some true clinically meaningful difference between the treatments) to evaluate statistical power.
Randomization designs to be compared . The choice of candidate randomization designs and their parameters must be made judiciously.
Data analytic strategy . For any study design, one should pre-specify the data analysis strategy to address the primary research question. Statistical tests of significance to compare treatment effects may be parametric or nonparametric, with or without adjustment for covariates.
The approach to statistical inference: population model-based or randomization-based . These two approaches are expected to yield similar results when the population model assumptions are met, but they may be different if some assumptions are violated. Randomization-based tests following restricted randomization procedures will control the type I error at the chosen level if the distribution of the test statistic under the null hypothesis is fully specified by the randomization procedure that was used for patient allocation. This is always the case unless there is a major flaw in the design (such as selection bias whereby the outcome of any individual participant is dependent on treatment assignments of the previous participants).
Overall, there should be a well-thought plan capturing the key questions to be answered, the strategy to address them, the choice of statistical software for simulation and visualization of the results, and other relevant details.
In this section we present four examples that illustrate how one may approach evaluation of different randomization design options at the study planning stage. Example 1 is based on a hypothetical 1:1 RCT with \(n=50\) and a continuous primary outcome, whereas Examples 2, 3, and 4 are based on some real RCTs.
Our first example is a hypothetical RCT in which the primary outcome is assumed to be normally distributed with mean \({\mu }_{E}\) for treatment E, mean \({\mu }_{C}\) for treatment C, and common variance \({\sigma }^{2}\) . A total of \(n\) subjects are to be randomized equally between E and C, and a two-sample t-test is planned for data analysis. Let \(\Delta ={\mu }_{E}-{\mu }_{C}\) denote the true mean treatment difference. We are interested in testing a hypothesis \({H}_{0}:\Delta =0\) (treatment effects are the same) vs. \({H}_{1}:\Delta \ne 0\) .
The total sample size \(n\) to achieve given power at some clinically meaningful treatment difference \({\Delta }_{c}\) while maintaining the chance of a false positive result at level \(\alpha\) can be obtained using standard statistical methods [ 83 ]. For instance, if \({\Delta }_{c}/\sigma =0.95\) , then a design with \(n=50\) subjects (25 per arm) provides approximately 91% power for a two-sample t-test to detect this difference at the 2-sided \(\alpha =\) 5% significance level. We shall consider 12 randomization procedures to sequentially randomize \(n=50\) subjects in a 1:1 ratio.
Random allocation rule – Rand.
Truncated binomial design – TBD.
Permuted block design with block size of 2 – PBD(2).
Permuted block design with block size of 4 – PBD(4).
Big stick design [ 37 ] with MTI = 3 – BSD(3).
Biased coin design with imbalance tolerance [ 38 ] with p = 2/3 and MTI = 3 – BCDWIT(2/3, 3).
Efron’s biased coin design [ 44 ] with p = 2/3 – BCD(2/3).
Adjustable biased coin design [ 49 ] with a = 2 – ABCD(2).
Generalized biased coin design (GBCD) with \(\gamma =1\) [ 45 ] – GBCD(1).
GBCD with \(\gamma =2\) [ 46 ] – GBCD(2).
GBCD with \(\gamma =5\) [ 47 ] – GBCD(5).
Complete randomization design – CRD.
These 12 procedures can be grouped into five major types. I) Procedures 1, 2, 3, and 4 achieve exact final balance for a chosen sample size (provided the total sample size is a multiple of the block size). II) Procedures 5 and 6 ensure that at any allocation step the absolute value of imbalance is capped at MTI = 3. III) Procedures 7 and 8 are biased coin designs that sequentially adjust randomization according to imbalance measured as the difference in treatment numbers. IV) Procedures 9, 10, and 11 (GBCD’s with \(\gamma =\) 1, 2, and 5) are adaptive biased coin designs, for which randomization probability is modified according to imbalance measured as the difference in treatment allocation proportions (larger \(\gamma\) implies greater forcing of balance). V) Procedure 12 (CRD) is the most random procedure that achieves balance for large samples.
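Of these five types, only the GBCD family has not been sketched above. A common form is a Smith-type rule, which we assume here for illustration (it is consistent with the limiting loss constants quoted in the next paragraph):

```python
import numpy as np

rng = np.random.default_rng(2024)

def gbcd_sequence(n, gamma):
    """Generalized biased coin design GBCD(gamma), Smith-type rule:
    Pr(next assignment is E) = N_C^gamma / (N_E^gamma + N_C^gamma).
    Larger gamma forces balance more strongly; gamma = 0 gives CRD."""
    seq = []
    for i in range(n):
        n_e = sum(seq)
        n_c = i - n_e
        if i == 0:
            prob_e = 0.5
        else:
            prob_e = n_c**gamma / (n_e**gamma + n_c**gamma)
        seq.append(int(rng.random() < prob_e))
    return np.array(seq)
```

Dividing numerator and denominator by \(i^{\gamma}\) shows that the allocation probability depends on the treatment allocation proportions rather than the raw difference in counts, which is what makes the design progressively more random as the trial grows.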
We first compare the procedures with respect to treatment balance and allocation randomness. To quantify imbalance after \(i\) allocations, we consider two measures: expected value of absolute imbalance \(E\left|D(i)\right|\) , and expected value of loss \(E({L}_{i})=E{\left|D(i)\right|}^{2}/i\) [ 50 , 61 ]. Importantly, for procedures 1, 2, and 3 the final imbalance is always zero, thus \(E\left|D(n)\right|\equiv 0\) and \(E({L}_{n})\equiv 0\) , but at intermediate steps one may have \(E\left|D(i)\right|>0\) and \(E\left({L}_{i}\right)>0\) , for \(1\le i<n\) . For procedures 5 and 6 with MTI = 3, \(E\left({L}_{i}\right)\le 9/i\) . For procedures 7 and 8, \(E\left({L}_{n}\right)\) tends to zero as \(n\to \infty\) [ 49 ]. For procedures 9, 10, 11, and 12, as \(n\to \infty\) , \(E\left({L}_{n}\right)\) tends to the positive constants 1/3, 1/5, 1/11, and 1, respectively [ 47 ]. We take the cumulative average loss after \(n\) allocations as an aggregate measure of imbalance: \(Imb\left(n\right)=\frac{1}{n}{\sum }_{i=1}^{n}E\left({L}_{i}\right)\) , which takes values in the 0–1 range.
To measure lack of randomness, we consider two measures: expected proportion of correct guesses up to the \(i\mathrm{th}\) step, \(PCG\left(i\right)=\frac1i\sum\nolimits_{j=1}^i\Pr(G_j=1)\) , \(i=1,\dots ,n\) , and the forcing index [ 47 , 84 ], \(FI(i)=\frac{{\sum }_{j=1}^{i}E\left|{\phi }_{j}-0.5\right|}{i/4}\) , where \(E\left|{\phi }_{j}-0.5\right|\) is the expected deviation of the conditional probability of treatment E assignment at the \(j\mathrm{th}\) allocation step ( \({\phi }_{j}\) ) from the unconditional target value of 0.5. Note that \(PCG\left(i\right)\) takes values in the range from 0.5 for CRD to 0.75 for PBD(2) assuming \(i\) is even, whereas \(FI(i)\) takes values in the 0–1 range. At the one extreme, we have CRD for which \(FI(i)\equiv 0\) because for CRD \({\phi }_{i}=0.5\) for any \(i\ge 1\) . At the other extreme, we have PBD(2) for which every odd allocation is made with probability 0.5, and every even allocation is deterministic, i.e. made with probability 0 or 1. For PBD(2), assuming \(i\) is even, there are exactly \(i/2\) pairs of allocations, and so \({\sum }_{j=1}^{i}E\left|{\phi }_{j}-0.5\right|=0.5\cdot i/2=i/4\) , which implies that \(FI(i)=1\) for PBD(2). For all other restricted randomization procedures one has \(0<FI(i)<1\) .
A “good” randomization procedure should have low values of both loss and forcing index. Different randomization procedures can be compared graphically. As a balance/randomness tradeoff metric, one can calculate the quadratic distance to the origin (0,0) for the chosen sample size, e.g. \(d(n)=\sqrt{{\left\{Imb(n)\right\}}^{2}+{\left\{FI(n)\right\}}^{2}}\) (in our example \(n=50\) ), and the randomization designs can then be ranked such that designs with lower values of \(d(n)\) are preferable.
We ran a simulation study of the 12 randomization procedures for an RCT with \(n=50\) . Monte Carlo average values of absolute imbalance, loss, \(Imb\left(i\right)\) , \(FI\left(i\right)\) , and \(d(i)\) were calculated for each intermediate allocation step ( \(i=1,\dots ,50\) ), based on 10,000 simulations.
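A condensed version of this simulation loop, for any procedure expressed through its conditional allocation probability \(F\left\{D\left(i\right)\right\}\) (a sketch under the Markov property; exact numbers will vary with the seed):

```python
import numpy as np

rng = np.random.default_rng(2024)

def simulate_tradeoff(prob_e_rule, n=50, n_sim=10_000):
    """Estimate Imb(n), FI(n), and d(n) for a randomization procedure
    given by prob_e_rule(d) = Pr(assign E | current imbalance d)."""
    exp_loss = np.zeros(n)                   # E(L_i), i = 1..n
    phi_dev = np.zeros(n)                    # E|phi_i - 0.5|
    for _ in range(n_sim):
        d = 0
        for i in range(n):
            phi = prob_e_rule(d)
            phi_dev[i] += abs(phi - 0.5)
            delta = int(rng.random() < phi)
            d += 2 * delta - 1               # now d = D(i+1)
            exp_loss[i] += d * d / (i + 1)   # L_{i+1} = D(i+1)^2 / (i+1)
    exp_loss /= n_sim
    phi_dev /= n_sim
    imb = exp_loss.mean()                    # Imb(n)
    fi = phi_dev.sum() / (n / 4)             # FI(n)
    return imb, fi, np.hypot(imb, fi)        # d(n)

# Example: big stick design with MTI = 3
bsd3 = lambda d: 0.0 if d >= 3 else (1.0 if d <= -3 else 0.5)
print(simulate_tradeoff(bsd3))
```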
Figure 1 is a plot of expected absolute imbalance vs. allocation step. CRD, GBCD(1), and GBCD(2) show increasing patterns. For TBD and Rand, the final imbalance (when \(n=50\) ) is zero; however, at intermediate steps it can be quite large. For the other designs, absolute imbalance is expected to be below 2 at any allocation step up to \(n=50\) . Note the periodic patterns of PBD(2) and PBD(4); for instance, for PBD(2) imbalance is 0 (or 1) after any even (or odd) allocation.
Simulated expected absolute imbalance vs. allocation step for 12 restricted randomization procedures for n = 50. Note: PBD(2) and PBD(4) make forced periodic returns to an absolute imbalance of 0, which distinguishes them from MTI procedures
Figure 2 is a plot of expected proportion of correct guesses vs. allocation step. One can observe that for CRD it is a flat pattern at 0.5; for PBD(2) it fluctuates while reaching the upper limit of 0.75 at even allocation steps; and for the ten other designs the values of proportion of correct guesses fall between those of CRD and PBD(2). The TBD behaves like CRD (flat at 0.5) up to roughly the 40th allocation step, at which point the pattern starts increasing. Rand exhibits an increasing pattern with overall fewer correct guesses compared to the other restricted randomization procedures. Interestingly, BSD(3) is uniformly better (less predictable) than ABCD(2), BCD(2/3), and BCDWIT(2/3, 3). For the three GBCD procedures, there is a rapid initial increase followed by a gradual decrease; this makes good sense, because GBCD procedures force greater balance early in the trial and become more random (and less prone to correct guessing) as the sample size increases.
Simulated expected proportion of correct guesses vs. allocation step for 12 restricted randomization procedures for n = 50
Table 2 shows the ranking of the 12 designs with respect to the overall performance metric \(d(n)=\sqrt{{\left\{Imb(n)\right\}}^{2}+{\left\{FI(n)\right\}}^{2}}\) for \(n=50\) . BSD(3), GBCD(2) and GBCD(1) are the top three procedures, whereas PBD(2) and CRD are at the bottom of the list.
Figure 3 is a plot of \(FI\left(n\right)\) vs. \(Imb\left(n\right)\) for \(n=50\) . One can see the two extremes: CRD that takes the value (0,1), and PBD(2) with the value (1,0). The other ten designs are closer to (0,0).
Simulated forcing index (x-axis) vs. aggregate expected loss (y-axis) for 12 restricted randomization procedures for n = 50
Figure 4 is a heat map plot of the metric \(d(i)\) for \(i=1,\dots ,50\) . BSD(3) seems to provide the overall best tradeoff between randomness and balance throughout the study.
Heatmap of the balance/randomness tradeoff \(d\left(i\right)=\sqrt{{\left\{Imb(i)\right\}}^{2}+{\left\{FI(i)\right\}}^{2}}\) vs. allocation step ( \(i=1,\dots ,50\) ) for 12 restricted randomization procedures. The procedures are ordered by value of d(50), with smaller values (more red) indicating more optimal performance
Our next goal is to compare the chosen randomization procedures in terms of validity (control of the type I error rate) and efficiency (power). For this purpose, we assumed the following data generating mechanism: for the \(i\mathrm{th}\) subject, conditional on the treatment assignment \({\delta }_{i}\) , the outcome \({Y}_{i}\) is generated according to the model
\({Y}_{i}={\delta }_{i}{\mu }_{E}+\left(1-{\delta }_{i}\right){\mu }_{C}+{u}_{i}+{\varepsilon }_{i},\quad i=1,\dots ,n,\)
where \({u}_{i}\) is an unknown term associated with the \(i\mathrm{th}\) subject and \({\varepsilon }_{i}\) ’s are i.i.d. measurement errors. We shall explore the following four models (a simulation sketch follows the list):
M1: Normal random sampling : \({u}_{i}\equiv 0\) and \({\varepsilon }_{i}\sim\) i.i.d. N(0,1), \(i=1,\dots ,n\) . This corresponds to a standard setup for a two-sample t-test under a population model.
M2: Linear trend : \({u}_{i}=\frac{5i}{n+1}\) and \({\varepsilon }_{i}\sim\) i.i.d. N(0,1), \(i=1,\dots ,n\) . In this model, the outcomes are affected by a linear trend over time [ 67 ].
M3: Cauchy errors : \({u}_{i}\equiv 0\) and \({\varepsilon }_{i}\sim\) i.i.d. Cauchy(0,1), \(i=1,\dots ,n\) . In this setup, we have a misspecification of the distribution of measurement errors.
M4: Selection bias : \({u}_{i+1}=-\nu \cdot sign\left\{D\left(i\right)\right\}\) , \(i=0,\dots ,n-1\) , with the convention that \(D\left(0\right)=0\) . Here, \(\nu >0\) is the “bias effect” (in our simulations we set \(\nu =0.5\) ). We also assume that \({\varepsilon }_{i}\sim\) i.i.d. N(0,1), \(i=1,\dots ,n\) . In this setup, at each allocation step the investigator attempts to intelligently guess the upcoming treatment assignment and selectively enroll a patient who, in their view, would be most suitable for the upcoming treatment. The investigator uses the “convergence” guessing strategy [ 28 ], that is, guess the treatment as one that has been less frequently assigned thus far, or make a random guess in case the current treatment numbers are equal. Assuming that the investigator favors the experimental treatment and is interested in demonstrating its superiority over the control, the biasing mechanism is as follows: at the \((i+1)\) st step, a “healthier” patient is enrolled, if \(D\left(i\right)<0\) ( \({u}_{i+1}=0.5\) ); a “sicker” patient is enrolled, if \(D\left(i\right)>0\) ( \({u}_{i+1}=-0.5\) ); or a “regular” patient is enrolled, if \(D\left(i\right)=0\) ( \({u}_{i+1}=0\) ).
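The four outcome models can be generated with a single function; a sketch under the stated assumptions ( \({\mu }_{C}=0\) without loss of generality, so the treatment effect enters as \(\Delta\) ; function names are ours):

```python
import numpy as np

rng = np.random.default_rng(2024)

def generate_outcomes(seq, effect, model, nu=0.5):
    """Outcomes Y_i = Delta*delta_i + u_i + eps_i under models M1-M4
    (mu_C is set to 0, so 'effect' is the mean difference Delta)."""
    seq = np.asarray(seq)
    n = len(seq)
    y = effect * seq.astype(float)
    if model == "M2":                        # linear time trend
        y += 5 * np.arange(1, n + 1) / (n + 1)
    elif model == "M4":                      # selection bias (convergence)
        d_prev = np.concatenate(([0], np.cumsum(2 * seq - 1)))[:-1]
        y += -nu * np.sign(d_prev)           # u_{i+1} = -nu * sign(D(i))
    if model == "M3":
        y += rng.standard_cauchy(n)          # Cauchy measurement errors
    else:
        y += rng.standard_normal(n)          # N(0,1) errors (M1, M2, M4)
    return y
```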
We consider three statistical test procedures (a sketch of the randomization-based approach follows the list):
T1: Two-sample t-test : The test statistic is \(t=\frac{{\overline{Y} }_{E}-{\overline{Y} }_{C}}{\sqrt{{S}_{p}^{2}\left(\frac{1}{{N}_{E}\left(n\right)}+\frac{1}{{N}_{C}\left(n\right)}\right)}}\) , where \({\overline{Y} }_{E}=\frac{1}{{N}_{E}\left(n\right)}{\sum }_{i=1}^{n}{{\delta }_{i}Y}_{i}\) and \({\overline{Y} }_{C}=\frac{1}{{N}_{C}\left(n\right)}{\sum }_{i=1}^{n}{(1-\delta }_{i}){Y}_{i}\) are the treatment sample means, \({N}_{E}\left(n\right)={\sum }_{i=1}^{n}{\delta }_{i}\) and \({N}_{C}\left(n\right)=n-{N}_{E}\left(n\right)\) are the observed group sample sizes, and \({S}_{p}^{2}\) is a pooled estimate of variance, where \({S}_{p}^{2}=\frac{1}{n-2}\left({\sum }_{i=1}^{n}{\delta }_{i}{\left({Y}_{i}-{\overline{Y} }_{E}\right)}^{2}+{\sum }_{i=1}^{n}(1-{\delta }_{i}){\left({Y}_{i}-{\overline{Y} }_{C}\right)}^{2}\right)\) . Then \({H}_{0}:\Delta =0\) is rejected at level \(\alpha\) , if \(\left|t\right|>{t}_{1-\frac{\alpha }{2}, n-2}\) , the 100( \(1-\frac{\alpha }{2}\) )th percentile of the t-distribution with \(n-2\) degrees of freedom.
T2: Randomization-based test using mean difference : Let \({{\varvec{\updelta}}}_{obs}\) and \({{\varvec{y}}}_{obs}\) denote, respectively, the observed sequence of treatment assignments and the vector of responses obtained from the trial using randomization procedure \(\mathfrak{R}\) . We first compute the observed mean difference \({S}_{obs}=S\left({{\varvec{\updelta}}}_{obs},{{\varvec{y}}}_{obs}\right)={\overline{Y} }_{E}-{\overline{Y} }_{C}\) . Then we use Monte Carlo simulation to generate \(L\) randomization sequences of length \(n\) using procedure \(\mathfrak{R}\) , where \(L\) is some large number. For the \(\ell\mathrm{th}\) generated sequence, \({{\varvec{\updelta}}}_{\ell}\) , compute \({S}_{\ell}=S({{\varvec{\updelta}}}_{\ell},{{\varvec{y}}}_{obs})\) , where \({\ell}=1,\dots ,L\) . The proportion of sequences for which \({S}_{\ell}\) is at least as extreme as \({S}_{obs}\) is computed as \(\widehat{P}=\frac{1}{L}{\sum }_{{\ell}=1}^{L}1\left\{\left|{S}_{\ell}\right|\ge \left|{S}_{obs}\right|\right\}\) . Statistical significance is declared if \(\widehat{P}<\alpha\) .
T3: Randomization-based test based on ranks : This test procedure follows the same logic as T2, except that the test statistic is calculated based on ranks. Given the vector of observed responses \({{\varvec{y}}}_{obs}=({y}_{1},\dots ,{y}_{n})\) , let \({a}_{jn}\) denote the rank of \({y}_{j}\) among the elements of \({{\varvec{y}}}_{obs}\) . Let \({\overline a}_n\) denote the average of the \({a}_{jn}\) ’s, and let \({\boldsymbol a}_n=\left(a_{1n}-{\overline a}_n,\dots ,a_{nn}-{\overline a}_n\right)^{\prime}\) . Then a linear rank test statistic has the form \({S}_{obs}={{\varvec{\updelta}}}_{obs}^{\prime}{{\varvec{a}}}_{n}={\sum }_{i=1}^{n}{\delta }_{i}({a}_{in}-{\overline{a} }_{n})\) .
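A sketch of the Monte Carlo randomization-based test (T2); T3 is obtained by replacing the mean difference with the linear rank statistic. The helper (our own code) assumes both arms are represented in every generated sequence, which is virtually certain here but not strictly guaranteed for CRD:

```python
import numpy as np

rng = np.random.default_rng(2024)

def randomization_test(seq_obs, y_obs, generator, n_mc=10_000):
    """Re-randomize with the same procedure, recompute the mean
    difference on the fixed observed outcomes, and return the
    proportion of statistics at least as extreme (the p-value)."""
    y_obs = np.asarray(y_obs, dtype=float)

    def mean_diff(seq):
        return y_obs[seq == 1].mean() - y_obs[seq == 0].mean()

    s_obs = mean_diff(np.asarray(seq_obs))
    extreme = sum(abs(mean_diff(generator(len(y_obs)))) >= abs(s_obs)
                  for _ in range(n_mc))
    return extreme / n_mc
```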
We consider four scenarios of the true mean difference \(\Delta ={\mu }_{E}-{\mu }_{C}\) , which correspond to the Null case ( \(\Delta =0\) ), and three choices of \(\Delta >0\) which correspond to Alternative 1 (power ~ 70%), Alternative 2 (power ~ 80%), and Alternative 3 (power ~ 90%). In all cases, \(n=50\) was used.
Figure 5 summarizes the results of a simulation study comparing 12 randomization designs, under 4 models for the outcome (M1, M2, M3, and M4), 4 scenarios for the mean treatment difference (Null, and Alternatives 1, 2, and 3), using 3 statistical tests (T1, T2, and T3). The operating characteristics of interest are the type I error rate under the Null scenario and the power under the Alternative scenarios. Each scenario was simulated 10,000 times, and each randomization-based test was computed using \(L=\mathrm{10,000}\) sequences.
Simulated type I error rate and power of 12 restricted randomization procedures. Four models for the data generating mechanism of the primary outcome (M1: Normal random sampling; M2: Linear trend; M3: Cauchy errors; and M4: Selection bias). Four scenarios for the treatment mean difference (Null; Alternatives 1, 2, and 3). Three statistical tests (T1: two-sample t-test; T2: randomization-based test using mean difference; T3: randomization-based test using ranks)
From Fig. 5 , under the normal random sampling model (M1), all considered randomization designs have similar performance: they maintain the type I error rate and have similar power, with all tests. In other words, when population model assumptions are satisfied, any combination of design and analysis should work well and yield reliable and consistent results.
Under the “linear trend” model (M2), the designs have differential performance. First of all, under the Null scenario, only Rand and CRD maintain the type I error rate at 5% with all three tests. For TBD, the t-test is anticonservative, with type I error rate ~ 20%, whereas for nine other procedures the t-test is conservative, with type I error rate in the range 0.1–2%. At the same time, for all 12 designs the two randomization-based tests maintain the nominal type I error rate at 5%. These results are consistent with some previous findings in the literature [ 67 , 68 ]. As regards power, it is reduced significantly compared to the normal random sampling scenario. The t-test seems to be most affected and the randomization-based test using ranks is most robust for a majority of the designs. Remarkably, for CRD the power is similar with all three tests. This signifies the usefulness of randomization-based inference in situations when outcome data are subject to a linear time trend, and the importance of applying randomization-based tests at least as supplemental analyses to likelihood-based test procedures.
Under the “Cauchy errors” model (M3), all designs perform similarly: the randomization-based tests maintain the type I error rate at 5%, whereas the t-test deflates the type I error to 2%. As regards power, all designs also have similar, consistently degraded performance: the t-test is least powerful, and the randomization-based test using ranks has highest power. Overall, under misspecification of the error distribution a randomization-based test using ranks is most appropriate; yet one should acknowledge that its power is still lower than expected.
Under the “selection bias” model (M4), the 12 designs have differential performance. The only procedure that maintained the type I error rate at 5% with all three tests was CRD. For eleven other procedures, inflations of the type I error were observed. In general, the more random the design, the less it was affected by selection bias. For instance, the type I error rate for TBD was ~ 6%; for Rand, BSD(3), and GBCD(1) it was ~ 7.5%; for GBCD(2) and ABCD(2) it was ~ 8–9%; for Efron’s BCD(2/3) it was ~ 12.5%; and the most affected design was PBD(2) for which the type I error rate was ~ 38–40%. These results are consistent with the theory of Blackwell and Hodges [ 28 ] which posits that TBD is least susceptible to selection bias within a class of restricted randomization designs that force exact balance. Finally, under M4, statistical power is inflated by several percentage points compared to the normal random sampling scenario without selection bias.
We performed additional simulations to assess the impact of the bias effect \(\nu\) under the selection bias model. The same 12 randomization designs and three statistical tests were evaluated for a trial with \(n=50\) under the Null scenario ( \(\Delta =0\) ), for \(\nu\) in the range of 0 (no bias) to 1 (strong bias). Figure S1 in the Supplementary Materials shows that for all designs but CRD, the type I error rate is increasing in \(\nu\) , with all three tests. The magnitude of the type I error inflation differs across the restricted randomization designs; e.g. for TBD it is minimal, whereas for more restrictive designs it may be large, especially for \(\nu \ge 0.4\) . PBD(2) is particularly vulnerable: for \(\nu\) in the range 0.4–1, its type I error rate is in the range 27–90% (for the nominal \(\alpha =5\) %).
In summary, our Example 1 includes most of the key ingredients of the roadmap for assessment of competing randomization designs which was described in the “ Methods ” section. For the chosen experimental scenarios, we evaluated CRD and several restricted randomization procedures, some of which belonged to the same class but with different values of the parameter (e.g. GBCD with \(\gamma =1, 2, 5\) ). We assessed two measures of imbalance, two measures of lack of randomness (predictability), and a metric that quantifies balance/randomness tradeoff. Based on these criteria, we found that BSD(3) provides overall best performance. We also evaluated type I error and power of selected randomization procedures under several treatment response models. We have observed important links between balance, randomness, type I error rate and power. It is beneficial to consider all these criteria simultaneously as they may complement each other in characterizing statistical properties of randomization designs. In particular, we found that a design that lacks randomness, such as PBD with blocks of 2 or 4, may be vulnerable to selection bias and lead to inflations of the type I error. Therefore, these designs should be avoided, especially in open-label studies. As regards statistical power, since all designs in this example targeted 1:1 allocation ratio (which is optimal if the outcomes are normally distributed and have between-group constant variance), they had very similar power of statistical tests in most scenarios except for the one with chronological bias. In the latter case, randomization-based tests were more robust and more powerful than the standard two-sample t-test under the population model assumption.
Overall, while Example 1 is based on a hypothetical 1:1 RCT, its true purpose is to showcase the thinking process in the application of our general roadmap. The following three examples are considered in the context of real RCTs.
Selection bias can arise if the investigator can intelligently guess at least part of the randomization sequence yet to be allocated and, on that basis, preferentially and strategically assigns study subjects to treatments. Although it is generally not possible to prove that a particular study has been infected with selection bias, there are published RCTs that show some evidence of having been affected by it. Suspect trials are, for example, those with strong observed baseline covariate imbalances that consistently favor the active treatment group [ 16 ]. In what follows we describe an example of an RCT in which the stratified block randomization procedure used was vulnerable to potential selection bias, and we discuss alternatives that may reduce this vulnerability.
Etanercept was studied in patients aged 4 to 17 years with polyarticular juvenile rheumatoid arthritis [ 85 ]. The trial consisted of two parts. During the first, open-label part of the trial, patients received etanercept twice weekly for up to three months. Responders from this initial part of the trial were then randomized, at a 1:1 ratio, in the second, double-blind, placebo-controlled part of the trial to receive etanercept or placebo for four months or until a flare of the disease occurred. The primary efficacy outcome, the proportion of patients with disease flare, was evaluated in the double-blind part. Among the 51 randomized patients, 21 of the 26 placebo patients (81%) withdrew because of disease flare, compared with 7 of the 25 etanercept patients (28%), yielding a p- value of 0.003.
Regulatory review by the Food and Drug Administration (FDA) identified vulnerability to selection bias in the study design of the double-blind part and potential issues in study conduct. These findings were succinctly summarized in [ 16 ] (pp. 51–52).
Specifically, randomization was stratified by study center and number of active joints (≤ 2 vs. > 2, referred to as “few” or “many” in what follows), with blocked randomization within each stratum using a block size of two. Furthermore, randomization codes in corresponding “few” and “many” blocks within each study center were mirror images of each other. For example, if the first block within the “few” active joints stratum of a given center was “placebo followed by etanercept”, then the first block within the “many” stratum of the same center would be “etanercept followed by placebo”. While this appears to be an attempt to improve treatment balance in this small trial, unblinding of one treatment assignment may lead to deterministic predictability of three upcoming assignments. The double-blind nature of the trial alleviated this concern to some extent, but all patients had received etanercept in the initial open-label part of the trial, so the chance of unblinding may not be negligible if etanercept and placebo have readily distinguishable effects or side effects. The randomized withdrawal design was appropriate in this context to improve statistical power in identifying efficacious treatments, but the specific randomization procedure used in the trial increased vulnerability to selection bias if blinding could not be completely maintained.
FDA review also identified that four patients were randomized from the wrong “few” or “many” stratum; for three of them (3/51 = 5.9%), the treatment received could foreseeably have been the reverse of what the patient would have received if randomized in the correct stratum. There were also some patients randomized out of order. Imbalances in baseline characteristics were observed in age (mean age of 8.9 years in the etanercept arm vs. 12.2 years in the placebo arm) and in corticosteroid use at baseline (50% vs. 24%).
While the authors [ 85 ] concluded that “The unequal randomization did not affect the study results”, and indeed it is unknown whether the imbalance was a chance occurrence or partly caused by selection bias, the trial could have used alternative randomization procedures to reduce vulnerability to potential selection bias. To illustrate this point, let us compare the predictability of two randomization procedures – the permuted block design (PBD) and the big stick design (BSD) – for several values of the maximum tolerated imbalance (MTI). We use BSD here for illustration because it was found to provide a very good balance/randomness tradeoff in our simulations in Example 1 . In essence, BSD provides the same level of imbalance control as PBD but with stronger encryption.
Table 3 reports two metrics for PBD and BSD: the proportion of deterministic assignments within a randomization sequence, and the excess correct guess probability. The latter metric is the absolute increase in the proportion of correct guesses for a given procedure over CRD, which has a 50% probability of correct guesses under the “optimal guessing strategy”. Note that for MTI = 1, BSD is equivalent to PBD with blocks of two. However, by increasing the MTI, one can substantially decrease predictability. For instance, going from MTI = 1 to an MTI of 2 or 3 (two bottom rows), the proportion of deterministic assignments decreases from 50% to 25% and 16.7%, respectively, and the excess correct guess probability decreases from 25% to 12.5% and 8.3%, a substantial reduction in the risk of selection bias. In addition to simplicity and lower predictability for the same level of imbalance control, BSD has another important advantage: investigators are not accustomed to it (as they are to the PBD), so it can thwart early prediction attempts and thereby potentially eliminate prediction altogether.
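The Table 3 metrics are easy to reproduce by simulation; a sketch for BSD under the convergence guessing strategy (our own code; it should reproduce the 50%/25%, 25%/12.5%, and 16.7%/8.3% values quoted above for MTI = 1, 2, 3):

```python
import numpy as np

rng = np.random.default_rng(2024)

def bsd_predictability(mti, n=50, n_sim=10_000):
    """Proportion of deterministic assignments and excess correct guess
    probability (over the 50% of CRD) for BSD(mti)."""
    det, correct = 0, 0
    for _ in range(n_sim):
        d = 0
        for _ in range(n):
            if abs(d) >= mti:                # forced assignment
                det += 1
                delta = 0 if d > 0 else 1
                guess = delta                # forced moves are predictable
            else:
                delta = int(rng.random() < 0.5)
                if d == 0:
                    guess = int(rng.random() < 0.5)
                else:
                    guess = 1 if d < 0 else 0
            correct += (guess == delta)
            d += 2 * delta - 1
    total = n * n_sim
    return det / total, correct / total - 0.5

for mti in (1, 2, 3):
    print(mti, bsd_predictability(mti))
```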
Our observations here also generalize to other MTI randomization methods, such as the maximal procedure [ 35 ], Chen’s designs [ 38 , 39 ], and the block urn design [ 40 ], just to name a few. MTI randomization procedures can also be used as building elements for more complex stratified randomization schemes [ 86 ].
Chronological bias may occur if a trial recruitment period is long, and there is a drift in some covariate over time that is subsequently not accounted for in the analysis [ 29 ]. To mitigate risk of chronological bias, treatment assignments should be balanced over time. In this regard, the ICH E9 guideline has the following statement [ 31 ]:
“...Although unrestricted randomisation is an acceptable approach, some advantages can generally be gained by randomising subjects in blocks. This helps to increase the comparability of the treatment groups, particularly when subject characteristics may change over time, as a result, for example, of changes in recruitment policy. It also provides a better guarantee that the treatment groups will be of nearly equal size...”
While randomization in blocks of two ensures the best balance, it is highly predictable. In practice, a sensible tradeoff between balance and randomness is desirable. In the following example, we illustrate the issue of chronological bias in the context of a real RCT.
Altman and Royston [ 87 ] gave several examples of clinical studies with hidden time trends. For instance, an RCT comparing azathioprine versus placebo with respect to overall survival in patients with primary biliary cirrhosis (PBC) was an international, double-blind, randomized trial including 248 patients, of whom 127 received azathioprine and 121 placebo [ 88 ]. The study had a recruitment period of 7 years. A major prognostic factor for survival was the serum bilirubin level on entry to the trial. Altman and Royston [ 87 ] provided a cusum plot of log bilirubin which showed a strong decreasing trend over time – patients who entered the trial later had, on average, lower bilirubin levels and therefore a better prognosis. Although the trial was randomized, there was some evidence of baseline imbalance in serum bilirubin between the azathioprine and placebo groups. The analysis using Cox regression adjusted for serum bilirubin showed that the treatment effect of azathioprine was statistically significant ( p = 0.01), with azathioprine reducing the risk of dying to 59% of that observed during placebo treatment.
The azathioprine trial [ 88 ] provides a very good example for illustrating importance of both the choice of a randomization design and a subsequent statistical analysis. We evaluated several randomization designs and analysis strategies under the given time trend through simulation. Since we did not have access to the patient level data from the azathioprine trial, we simulated a dataset of serum bilirubin values from 248 patients that resembled that in the original paper (Fig. 1 in [ 87 ]); see Fig. 6 below.
Cusum plot of baseline log serum bilirubin level of 248 subjects from the azathioprine trial, reproduced from Fig. 1 of Altman and Royston [ 87 ]
For the survival outcomes, we use the following data generating mechanism [ 71 , 89 ]: let \({h}_{i}(t,{\delta }_{i})\) denote the hazard function of the \(i\mathrm{th}\) patient at time \(t\) such that
\({h}_{i}\left(t,{\delta }_{i}\right)={h}_{c}\left(t\right)\mathrm{exp}\left({\delta }_{i}\,\mathrm{log}HR+{u}_{i}\right),\)
where \({h}_{c}(t)\) is an unspecified baseline hazard, \(\log HR\) is the true value of the log-transformed hazard ratio, and \({u}_{i}\) is the log serum bilirubin of the \(i\mathrm{th}\) patient at study entry.
Our main goal is to evaluate the impact of the time trend in bilirubin on the type I error rate and power. We consider seven randomization designs: CRD, Rand, TBD, PBD(2), PBD(4), BSD(3), and GBCD(2). The latter two designs were found to be the top two performing procedures based on our simulation results in Example 1 (cf. Table 2 ). PBD(4) is the most commonly used procedure in clinical trial practice. Rand and TBD are two designs that ensure exact balance in the final treatment numbers. CRD is the most random design, and PBD(2) is the most balanced design.
To evaluate both type I error and power, we consider two values for the true treatment effect: \(HR=1\) (Null) and \(HR=0.6\) (Alternative). For data analysis, we use the Cox regression model, either with or without adjustment for serum bilirubin. Furthermore, we assess two approaches to statistical inference: population model-based and randomization-based. For the sake of simplicity, we let \({h}_{c}\left(t\right)\equiv 1\) (exponential distribution) and assume no censoring when simulating the data.
For each combination of the design, experimental scenario, and data analysis strategy, a trial with 248 patients was simulated 10,000 times. Each randomization-based test was computed using \(L=\mathrm{1,000}\) sequences. In each simulation, we used the same time trend in serum bilirubin as described. Through simulation, we estimated the probability of a statistically significant baseline imbalance in serum bilirubin between azathioprine and placebo groups, type I error rate, and power.
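A sketch of one simulation replicate (our own code; we assume the lifelines package for Cox regression, and the bilirubin trend u below is a hypothetical decreasing sequence standing in for the simulated cusum data):

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter      # assumed available

rng = np.random.default_rng(2024)

n = 248
u = np.linspace(1.5, 0.5, n)           # hypothetical decreasing log-bilirubin
seq = (rng.random(n) < 0.5).astype(int)          # CRD, for illustration

# Exponential survival with h_c(t) = 1:
# T_i ~ Exp(rate = exp(logHR * delta_i + u_i)), no censoring.
rate = np.exp(np.log(0.6) * seq + u)
df = pd.DataFrame({"time": rng.exponential(1 / rate),
                   "event": np.ones(n, dtype=int),
                   "treatment": seq,
                   "bilirubin": u})

adjusted = CoxPHFitter().fit(df, duration_col="time", event_col="event")
unadjusted = CoxPHFitter().fit(df.drop(columns="bilirubin"),
                               duration_col="time", event_col="event")
```

Repeating this over many replicates (and over the seven candidate designs) yields the type I error and power estimates summarized in Table 4.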
First, we observed that the designs differ with respect to their potential to achieve baseline covariate balance under the time trend. For instance, probability of a statistically significant group difference on serum bilirubin (two-sided P < 0.05) is ~ 24% for TBD, ~ 10% for CRD, ~ 2% for GBCD(2), ~ 0.9% for Rand, and ~ 0% for BSD(3), PBD(4), and PBD(2).
Second, a failure to adjust for serum bilirubin in the analysis can negatively impact statistical inference. Table 4 shows the type I error and power of statistical analyses unadjusted and adjusted for serum bilirubin, using population model-based and randomization-based approaches.
If we look at the type I error for the population model-based, unadjusted analysis, we can see that only CRD and Rand are valid (maintain the type I error rate at 5%), whereas TBD is anticonservative (~ 15% type I error) and PBD(2), PBD(4), BSD(3), and GBCD(2) are conservative (~ 1–2% type I error). These findings are consistent with the ones for the two-sample t-test described earlier in the current paper, and they agree well with other findings in the literature [ 67 ]. By contrast, population model-based covariate-adjusted analysis is valid for all seven randomization designs. Looking at the type I error for the randomization-based analyses, all designs yield consistent valid results (~ 5% type I error), with or without adjustment for serum bilirubin.
As regards statistical power, unadjusted analyses are substantially less powerful than the corresponding covariate-adjusted analyses, for all designs and with either population model-based or randomization-based approaches. For the population model-based, unadjusted analysis, the designs have ~ 59–65% power, whereas the corresponding covariate-adjusted analyses have ~ 97% power. The most striking results are observed with the randomization-based approach: the power of the unadjusted analysis differs substantially across the seven designs: it is ~ 37% for TBD, ~ 60–61% for CRD and Rand, ~ 80–87% for BSD(3), GBCD(2), and PBD(4), and ~ 90% for PBD(2). Thus, PBD(2) is the most powerful approach if a time trend is present, the statistical analysis strategy is randomization-based, and no adjustment for the time trend is made. Furthermore, randomization-based covariate-adjusted analyses have ~ 97% power for all seven designs. Remarkably, the power of the covariate-adjusted analysis is identical for the population model-based and randomization-based approaches.
Overall, this example highlights the importance of covariate-adjusted analysis, which should be straightforward if a covariate affected by a time trend is known (e.g. serum bilirubin in our example). If a covariate is unknown or hidden, then an unadjusted analysis based on a conventional test may have reduced power and a distorted type I error rate (although designs such as CRD and Rand do ensure valid statistical inference). Alternatively, randomization-based tests can be applied. The resulting analysis will be valid but potentially less powerful. The degree of power loss with a randomization-based test depends on the randomization design: designs that force greater treatment balance over time will be more powerful. In fact, PBD(2) is shown to be the most powerful under such circumstances; however, as we have seen in Example 1 and Example 2, a major deficiency of PBD(2) is its vulnerability to selection bias. From Table 4 , and taking into account the earlier findings in this paper, BSD(3) seems to provide a very good risk mitigation strategy against unknown time trends.
In our last example, we illustrate the importance of the careful choice of randomization design and subsequent statistical analysis in a nonstandard RCT with a small sample size. Due to confidentiality, and because this study is still being conducted, we do not disclose all details here except that the study is an ongoing phase II RCT in a very rare and devastating autoimmune disease in children.
The study includes three periods: an open-label, single-arm active treatment period of 28 weeks to identify treatment responders (Period 1), a 24-week randomized treatment withdrawal period to primarily assess the efficacy of the active treatment vs. placebo (Period 2), and a 3-year open-label, long-term safety period on active treatment (Period 3). Because of the challenging indication and the rarity of the disease, the study plans to enroll up to 10 male or female pediatric patients in order to randomize 8 patients (4 per treatment arm) in Period 2 of the study. The primary endpoint for assessing the efficacy of active treatment versus placebo is the proportion of patients with disease flare during the 24-week randomized withdrawal phase. The two groups will be compared using Fisher’s exact test. In case of a successful outcome, evidence of clinical efficacy from this study will also be used as part of a package to support the claim for drug effectiveness.
Very small sample sizes are not uncommon in clinical trials of rare diseases [ 90 , 91 ]. Naturally, there are several methodological challenges for this type of study. A major challenge is generalizability of the results from the RCT to a population. In this particular indication, no approved treatment exists, and there is uncertainty about the disease epidemiology and the exact number of patients with the disease who would benefit from treatment (the patient horizon). Another challenge is the choice of the randomization procedure and the primary statistical analysis. In this study, one can enumerate upfront all 25 possible outcomes ({0, 1, 2, 3, 4} responders on active treatment by {0, 1, 2, 3, 4} responders on placebo) and create a chart quantifying the level of evidence ( p- value) for each experimental outcome, together with the corresponding decision. Before the trial starts, a discussion with the regulatory agency is warranted to agree on the level of evidence that must be achieved in order to declare the study a “success”.
Let us perform a hypothetical planning exercise for the given study. Suppose we go with a standard population-based approach, for which we test the hypothesis \({H}_{0}:{p}_{E}={p}_{C}\) vs. \({H}_{1}:{p}_{E}>{p}_{C}\) (where \({p}_{E}\) and \({p}_{C}\) stand for the true success rates for the experimental and control group, respectively) using Fisher’s exact test. Table 5 provides 1-sided p- values for all possible experimental outcomes. One could argue that a p- value < 0.1 may be viewed as a convincing level of evidence for this study. There are only 3 outcomes that meet this criterion: 3/4 vs. 0/4 successes ( p = 0.0714); 4/4 vs. 0/4 successes ( p = 0.0143); and 4/4 vs. 1/4 successes ( p = 0.0714). For all other outcomes, p ≥ 0.2143, and thus the study would be regarded as a “failure”.
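The chart of all 25 outcomes is easy to reproduce. Below is a minimal sketch using SciPy's fisher_exact (one-sided, in the direction favoring the experimental arm); the printed values should match the p- values quoted above, e.g., p = 0.0714 for 3/4 vs. 0/4.

```python
from scipy.stats import fisher_exact

# Enumerate all 25 outcomes of a 4-vs-4 trial and compute 1-sided p-values.
for x in range(5):          # successes on experimental (out of 4)
    for y in range(5):      # successes on control (out of 4)
        table = [[x, 4 - x], [y, 4 - y]]
        _, p = fisher_exact(table, alternative="greater")
        print(f"E {x}/4 vs C {y}/4: p = {p:.4f}")
```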
Now let us consider a randomization-based inference approach. For illustration purposes, we consider four restricted randomization procedures—Rand, TBD, PBD(4), and PBD(2)—that exactly achieve 4:4 allocation. These procedures are legitimate choices because all of them provide exact sample sizes (4 per treatment group), which is essential in this trial. The reference set of either Rand or TBD includes \(70=\left(\begin{array}{c}8\\ 4\end{array}\right)\) unique sequences, though with different probabilities of observing each sequence. For Rand, these sequences are equiprobable, whereas for TBD, some sequences are more likely than others. For PBD( \(2b\) ), the size of the reference set is \({\left\{\left(\begin{array}{c}2b\\ b\end{array}\right)\right\}}^{B}\) , where \(B=n/2b\) is the number of blocks of length \(2b\) for a trial of size \(n\) (in our example \(n=8\) ). This results in a reference set of \({2}^{4}=16\) unique sequences with equal probability of 1/16 for PBD(2), and of \({6}^{2}=36\) unique sequences with equal probability of 1/36 for PBD(4).
In practice, the study statistician picks a treatment sequence at random from the reference set according to the chosen design. The details (randomization seed, chosen sequence, etc.) are carefully documented and kept confidential. For the chosen sequence and the observed outcome data, a randomization-based p- value is the sum of the probabilities of all sequences in the reference set that yield a result at least as extreme in favor of the experimental treatment as the one observed. This p- value will depend on the randomization design, the observed randomization sequence, and the observed outcomes, and it may differ from the population-based analysis p- value.
To illustrate this, suppose the chosen randomization sequence is CEECECCE (C stands for control and E stands for experimental), and the observed responses are FSSFFFFS (F stands for failure and S stands for success). Thus, we have 3/4 successes on experimental treatment and 0/4 successes on control. Then, the randomization-based p- value is 0.0714 for Rand, 0.0469 for TBD, 0.1250 for PBD(2), and 0.0833 for PBD(4); it is 0.0714 for the population-based analysis. The coincidence of the randomization-based p- value for Rand and the p- value of the population-based analysis is not surprising: Fisher's exact test is a permutation test, and when Rand is the randomization procedure, the p- values of a permutation test and of a randomization test are always equal. However, despite the numerical equality, we should be mindful of the different underlying assumptions (population vs. randomization model).
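For Rand, this calculation can be reproduced by brute-force enumeration of the 70 equiprobable balanced sequences. The sketch below uses the difference in success proportions as one natural choice of test statistic (which here orders outcomes the same way as Fisher's exact test) and recovers the 0.0714 quoted above; the analogous computations for TBD and PBD would use the same loop with each design's sequence probabilities.

```python
from itertools import combinations

responses = [0, 1, 1, 0, 0, 0, 0, 1]     # FSSFFFFS (S = 1, F = 0)
observed_E = frozenset({1, 2, 4, 7})     # 0-based E positions in CEECECCE

def diff(e_positions):
    """Difference in success proportions: experimental minus control."""
    s_e = sum(responses[i] for i in e_positions)
    s_c = sum(responses) - s_e
    return s_e / 4 - s_c / 4

obs = diff(observed_E)                   # 0.75, i.e., 3/4 vs 0/4
# Rand: all C(8,4) = 70 balanced sequences are equiprobable
ref = [frozenset(c) for c in combinations(range(8), 4)]
p = sum(diff(e) >= obs for e in ref) / len(ref)
print(p)                                 # 5/70 = 0.0714
```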
Likewise, randomization-based p- values can be derived for other combinations of observed randomization sequences and responses. All these details (the chosen randomization design, the analysis strategy, and corresponding decisions) would have to be fully specified upfront (before the trial starts) and agreed upon by both the sponsor and the regulator. This would remove any ambiguity when the trial data become available.
As the example shows, the level of evidence in the randomization-based inference approach depends on the chosen randomization procedure, and the resulting decisions may differ depending on the specific procedure. For instance, if the level of significance is set to 10% as the criterion for a “successful trial”, then with the observed data (3/4 vs. 0/4), there would be a significant test result for TBD, Rand, and PBD(4), but not for PBD(2).
Randomization is the foundation of any RCT involving treatment comparison. Randomization is not a single technique, but a very broad class of statistical methodologies for design and analysis of clinical trials [ 10 ]. In this paper, we focused on the randomized controlled two-arm trial designed with equal allocation, which is the gold standard research design to generate clinical evidence in support of regulatory submissions. Even in this relatively simple case, there are various restricted randomization procedures with different probabilistic structures and different statistical properties, and the choice of a randomization design for any RCT must be made judiciously.
For the 1:1 RCT, there is a dual goal of balancing treatment assignments while maintaining allocation randomness. Final balance in treatment totals frequently maximizes statistical power for treatment comparison. It is also important to maintain balance at intermediate steps during the trial, especially in long-term studies, to mitigate the potential for chronological bias. At the same time, a procedure should have a high degree of randomness so that treatment assignments within the sequence are not easily predictable; otherwise, the procedure may be vulnerable to selection bias, especially in open-label studies. While balance and randomness are competing criteria, it is possible to find restricted randomization procedures that provide a sensible tradeoff between them, e.g. the MTI procedures, of which the big stick design (BSD) [ 37 ] with a suitably chosen MTI limit, such as BSD(3), has very appealing statistical properties. In practice, the choice of a randomization procedure should be made after a systematic evaluation of different candidate procedures under different experimental scenarios for the primary outcome, including cases when model assumptions are violated.
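As an illustration of this balance-randomness tradeoff, here is a minimal sketch of the general big stick rule: a fair coin decides each assignment unless the absolute treatment imbalance has reached the MTI limit, in which case the next assignment is forced to the under-represented arm. This is a sketch of the BSD logic as described in [ 37 ], not of any specific implementation used in this paper's simulations.

```python
import random

def big_stick_sequence(n, mti=3, seed=None):
    """One treatment sequence under the big stick design BSD(mti)."""
    rng = random.Random(seed)
    seq, imbalance = [], 0           # imbalance = N_A - N_B
    for _ in range(n):
        if imbalance >= mti:
            arm = "B"                # forced assignment restores balance
        elif imbalance <= -mti:
            arm = "A"
        else:
            arm = rng.choice("AB")   # fair coin within the MTI corridor
        imbalance += 1 if arm == "A" else -1
        seq.append(arm)
    return "".join(seq)

print(big_stick_sequence(16, mti=3))  # |N_A - N_B| never exceeds 3
```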
In the examples considered, we showed that the choice of randomization design, the data analytic technique (e.g. parametric or nonparametric model, with or without covariate adjustment), and the decision on whether to include randomization in the analysis (e.g. randomization-based or population model-based analysis) are all very important considerations. Furthermore, these examples highlight the importance of using randomization designs that provide strong encryption of the randomization sequence, the importance of covariate adjustment in the analysis, and the value of statistical thinking in nonstandard RCTs with very small sample sizes and a small patient horizon. Finally, in this paper we have discussed randomization-based tests as robust and valid alternatives to likelihood-based tests. Randomization-based inference is a useful approach in clinical trials and should be considered by clinical researchers more frequently [ 14 ].
Given the breadth of the subject of randomization, many important topics have been omitted from the current paper. Here we outline just a few of them.
In this paper, we have focused on the 1:1 RCT. However, clinical trials may involve more than two treatment arms. Extension of equal randomization to the case of multiple treatment arms is relatively straightforward for many restricted randomization procedures [ 10 ]. Some trials with two or more treatment arms use unequal allocation (e.g. 2:1). Randomization procedures with unequal allocation ratios require careful consideration. For instance, an important and desirable feature is the allocation ratio preserving (ARP) property. A randomization procedure targeting unequal allocation is said to be ARP if, at each allocation step, the unconditional probability of a particular treatment assignment is the same as the target allocation proportion for this treatment [ 92 ]. Non-ARP procedures may have fluctuations in the unconditional randomization probability from allocation to allocation, which may be problematic [ 93 ]. Fortunately, some randomization procedures naturally possess the ARP property, and there are approaches to correct for a non-ARP deficiency – these should be considered in the design of RCTs with unequal allocation ratios [ 92 , 93 , 94 ].
In many RCTs, investigators may wish to prospectively balance treatment assignments with respect to important prognostic covariates. For a small number of categorical covariates, one can use stratified randomization by applying separate MTI randomization procedures within strata [ 86 ]. However, the potential advantage of stratified randomization decreases as the number of stratification variables increases [ 95 ]. In trials where balance over a large number of covariates is sought and the sample size is small or moderate, one can consider covariate-adaptive randomization procedures that achieve balance within covariate margins, such as the minimization procedure [ 96 , 97 ], optimal model-based procedures [ 46 ], or some other covariate-adaptive randomization technique [ 98 ]. To achieve valid and powerful results, a covariate-adaptive randomization design must be followed by a covariate-adjusted analysis [ 99 ]. Special considerations are required for covariate-adaptive randomization designs with more than two treatment arms and/or unequal allocation ratios [ 100 ].
In some clinical research settings, such as trials for rare and/or life-threatening diseases, there is a strong ethical imperative to increase the chance of a trial participant receiving an empirically better treatment. Response-adaptive randomization (RAR) has been increasingly considered in practice, especially in oncology [ 101 , 102 ]. Extensive methodological research on RAR has been conducted [ 103 , 104 ]. RAR is increasingly viewed as an important ingredient of complex clinical trials such as umbrella and platform trial designs [ 105 , 106 ]. While RAR, when properly applied, has its merits, the topic has generated much controversial discussion over the years [ 107 , 108 , 109 , 110 , 111 ]. Amid the ongoing COVID-19 pandemic, RCTs evaluating various experimental treatments for critically ill COVID-19 patients do incorporate RAR in their design; see, for example, the I-SPY COVID-19 trial ( https://clinicaltrials.gov/ct2/show/NCT04488081 ).
Randomization can also be applied more broadly than in conventional RCT settings where randomization units are individual subjects. For instance, in a cluster randomized trial, not individuals but groups of individuals (clusters) are randomized among one or more interventions or the control [ 112 ]. Observations from individuals within a given cluster cannot be regarded as independent, and special statistical techniques are required to design and analyze cluster-randomized experiments. In some clinical trial designs, randomization is applied within subjects. For instance, the micro-randomized trial (MRT) is a novel design for development of mobile treatment interventions in which randomization is applied to select different treatment options for individual participants over time to optimally support individuals’ health behaviors [ 113 ].
Finally, beyond the scope of the present paper are the regulatory perspectives on randomization and practical implementation aspects, including statistical software and information systems to generate randomization schedules in real time. We hope to cover these topics in subsequent papers.
All results reported in this paper are based either on theoretical considerations or simulation evidence. The computer code (using R and Julia programming languages) is fully documented and is available upon reasonable request.
Footnote 1: Guess the next allocation as the treatment with the fewest allocations in the sequence thus far, or make a random guess if the treatment numbers are equal.
Byar DP, Simon RM, Friedewald WT, Schlesselman JJ, DeMets DL, Ellenberg JH, Gail MH, Ware JH. Randomized clinical trials—perspectives on some recent ideas. N Engl J Med. 1976;295:74–80.
Collins R, Bowman L, Landray M, Peto R. The magic of randomization versus the myth of real-world evidence. N Engl J Med. 2020;382:674–8.
ICH Harmonised Tripartite Guideline. General considerations for clinical trials E8. 1997.
Hernán MA, Robins JM. Using big data to emulate a target trial when a randomized trial is not available. Am J Epidemiol. 2016;183(8):758–64.
Byar DP. Why data bases should not replace randomized clinical trials. Biometrics. 1980;36:337–42.
Mehra MR, Desai SS, Kuy SR, Henry TD, Patel AN. Cardiovascular disease, drug therapy, and mortality in Covid-19. N Engl J Med. 2020;382:e102. https://www.nejm.org/doi/10.1056/NEJMoa2007621 .
Mehra MR, Desai SS, Ruschitzka F, Patel AN. Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis. Lancet. 2020. https://www.sciencedirect.com/science/article/pii/S0140673620311806?via%3Dihub .
Mehra MR, Desai SS, Kuy SR, Henry TD, Patel AN. Retraction: Cardiovascular disease, drug therapy, and mortality in Covid-19. N Engl J Med. 2020. https://doi.org/10.1056/NEJMoa2007621 . https://www.nejm.org/doi/10.1056/NEJMc2021225 .
Medical Research Council. Streptomycin treatment of pulmonary tuberculosis. BMJ. 1948;2:769–82.
Rosenberger WF, Lachin J. Randomization in clinical trials: theory and practice. 2nd ed. New York: Wiley; 2015.
Fisher RA. The design of experiments. Edinburgh: Oliver and Boyd; 1935.
Hill AB. The clinical trial. Br Med Bull. 1951;7(4):278–82.
Hill AB. Memories of the British streptomycin trial in tuberculosis: the first randomized clinical trial. Control Clin Trials. 1990;11:77–9.
Rosenberger WF, Uschner D, Wang Y. Randomization: The forgotten component of the randomized clinical trial. Stat Med. 2019;38(1):1–30 (with discussion).
Berger VW. Trials: the worst possible design (except for all the rest). Int J Person Centered Med. 2011;1(3):630–1.
Berger VW. Selection bias and covariate imbalances in randomized clinical trials. New York: Wiley; 2005.
Berger VW. The alleged benefits of unrestricted randomization. In: Berger VW, editor. Randomization, masking, and allocation concealment. Boca Raton: CRC Press; 2018. p. 39–50.
Altman DG, Bland JM. Treatment allocation in controlled trials: why randomise? BMJ. 1999;318:1209.
Senn S. Testing for baseline balance in clinical trials. Stat Med. 1994;13:1715–26.
Senn S. Seven myths of randomisation in clinical trials. Stat Med. 2013;32:1439–50.
Rosenberger WF, Sverdlov O. Handling covariates in the design of clinical trials. Stat Sci. 2008;23:404–19.
Proschan M, Dodd L. Re-randomization tests in clinical trials. Stat Med. 2019;38:2292–302.
Spiegelhalter DJ, Freedman LS, Parmar MK. Bayesian approaches to randomized trials. J R Stat Soc A Stat Soc. 1994;157(3):357–87.
Berry SM, Carlin BP, Lee JJ, Muller P. Bayesian adaptive methods for clinical trials. Boca Raton: CRC Press; 2010.
Lachin J. Properties of simple randomization in clinical trials. Control Clin Trials. 1988;9:312–26.
Pocock SJ. Allocation of patients to treatment in clinical trials. Biometrics. 1979;35(1):183–97.
Simon R. Restricted randomization designs in clinical trials. Biometrics. 1979;35(2):503–12.
Blackwell D, Hodges JL. Design for the control of selection bias. Ann Math Stat. 1957;28(2):449–60.
Matts JP, McHugh R. Analysis of accrual randomized clinical trials with balanced groups in strata. J Chronic Dis. 1978;31:725–40.
Matts JP, Lachin JM. Properties of permuted-block randomization in clinical trials. Control Clin Trials. 1988;9:327–44.
ICH Harmonised Tripartite Guideline. Statistical principles for clinical trials E9. 1998.
Shao H, Rosenberger WF. Properties of the random block design for clinical trials. In: Kunert J, Müller CH, Atkinson AC, editors. mODa 11 – Advances in model-oriented design and analysis. Springer International Publishing Switzerland; 2016. p. 225–33.
Zhao W. Evolution of restricted randomization with maximum tolerated imbalance. In: Berger VW, editor. Randomization, masking, and allocation concealment. Boca Raton: CRC Press; 2018. p. 61–81.
Bailey RA, Nelson PR. Hadamard randomization: a valid restriction of random permuted blocks. Biom J. 2003;45(5):554–60.
Berger VW, Ivanova A, Knoll MD. Minimizing predictability while retaining balance through the use of less restrictive randomization procedures. Stat Med. 2003;22:3017–28.
Zhao W, Berger VW, Yu Z. The asymptotic maximal procedure for subject randomization in clinical trials. Stat Methods Med Res. 2018;27(7):2142–53.
Soares JF, Wu CFJ. Some restricted randomization rules in sequential designs. Commun Stat Theory Methods. 1983;12(17):2017–34.
Chen YP. Biased coin design with imbalance tolerance. Commun Stat Stochastic Models. 1999;15(5):953–75.
Chen YP. Which design is better? Ehrenfest urn versus biased coin. Adv Appl Probab. 2000;32:738–49.
Zhao W, Weng Y. Block urn design—A new randomization algorithm for sequential trials with two or more treatments and balanced or unbalanced allocation. Contemp Clin Trials. 2011;32:953–61.
van der Pas SL. Merged block randomisation: A novel randomisation procedure for small clinical trials. Clin Trials. 2019;16(3):246–52.
Zhao W. Letter to the Editor – Selection bias, allocation concealment and randomization design in clinical trials. Contemp Clin Trials. 2013;36:263–5.
Berger VW, Bejleri K, Agnor R. Comparing MTI randomization procedures to blocked randomization. Stat Med. 2016;35:685–94.
Efron B. Forcing a sequential experiment to be balanced. Biometrika. 1971;58(3):403–17.
Wei LJ. The adaptive biased coin design for sequential experiments. Ann Stat. 1978;6(1):92–100.
Atkinson AC. Optimum biased coin designs for sequential clinical trials with prognostic factors. Biometrika. 1982;69(1):61–7.
Smith RL. Sequential treatment allocation using biased coin designs. J Roy Stat Soc B. 1984;46(3):519–43.
Ball FG, Smith AFM, Verdinelli I. Biased coin designs with a Bayesian bias. J Stat Planning Infer. 1993;34(3):403–21.
Baldi Antognini A, Giovagnoli A. A new ‘biased coin design’ for the sequential allocation of two treatments. Appl Stat. 2004;53(4):651–64.
Atkinson AC. Selecting a biased-coin design. Stat Sci. 2014;29(1):144–63.
Rosenberger WF. Randomized urn models and sequential design. Sequential Anal. 2002;21(1&2):1–41 (with discussion).
Wei LJ. A class of designs for sequential clinical trials. J Am Stat Assoc. 1977;72(358):382–6.
Wei LJ, Lachin JM. Properties of the urn randomization in clinical trials. Control Clin Trials. 1988;9:345–64.
Schouten HJA. Adaptive biased urn randomization in small strata when blinding is impossible. Biometrics. 1995;51(4):1529–35.
Ivanova A. A play-the-winner-type urn design with reduced variability. Metrika. 2003;58:1–13.
Kundt G. A new proposal for setting parameter values in restricted randomization methods. Methods Inf Med. 2007;46(4):440–9.
Kalish LA, Begg CB. Treatment allocation methods in clinical trials: a review. Stat Med. 1985;4:129–44.
Zhao W, Weng Y, Wu Q, Palesch Y. Quantitative comparison of randomization designs in sequential clinical trials based on treatment balance and allocation randomness. Pharm Stat. 2012;11:39–48.
Flournoy N, Haines LM, Rosenberger WF. A graphical comparison of response-adaptive randomization procedures. Statistics in Biopharmaceutical Research. 2013;5(2):126–41.
Hilgers RD, Uschner D, Rosenberger WF, Heussen N. ERDO – a framework to select an appropriate randomization procedure for clinical trials. BMC Med Res Methodol. 2017;17:159.
Burman CF. On sequential treatment allocations in clinical trials. PhD Thesis Dept. Mathematics, Göteborg. 1996.
Azriel D, Mandel M, Rinott Y. Optimal allocation to maximize the power of two-sample tests for binary response. Biometrika. 2012;99(1):101–13.
Begg CB, Kalish LA. Treatment allocation for nonlinear models in clinical trials: the logistic model. Biometrics. 1984;40:409–20.
Kalish LA, Harrington DP. Efficiency of balanced treatment allocation for survival analysis. Biometrics. 1988;44(3):815–21.
Sverdlov O, Rosenberger WF. On recent advances in optimal allocation designs for clinical trials. J Stat Theory Practice. 2013;7(4):753–73.
Sverdlov O, Ryeznik Y, Wong WK. On optimal designs for clinical trials: an updated review. J Stat Theory Pract. 2020;14:10.
Rosenkranz GK. The impact of randomization on the analysis of clinical trials. Stat Med. 2011;30:3475–87.
Galbete A, Rosenberger WF. On the use of randomization tests following adaptive designs. J Biopharm Stat. 2016;26(3):466–74.
Proschan M. Influence of selection bias on type I error rate under random permuted block design. Stat Sin. 1994;4:219–31.
Kennes LN, Cramer E, Hilgers RD, Heussen N. The impact of selection bias on test decisions in randomized clinical trials. Stat Med. 2011;30:2573–81.
Rückbeil MV, Hilgers RD, Heussen N. Assessing the impact of selection bias on test decisions in trials with a time-to-event outcome. Stat Med. 2017;36:2656–68.
Berger VW, Exner DV. Detecting selection bias in randomized clinical trials. Control Clin Trials. 1999;20:319–27.
Ivanova A, Barrier RC, Berger VW. Adjusting for observable selection bias in block randomized trials. Stat Med. 2005;24:1537–46.
Kennes LN, Rosenberger WF, Hilgers RD. Inference for blocked randomization under a selection bias model. Biometrics. 2015;71:979–84.
Hilgers RD, Manolov M, Heussen N, Rosenberger WF. Design and analysis of stratified clinical trials in the presence of bias. Stat Methods Med Res. 2020;29(6):1715–27.
Hamilton SA. Dynamically allocating treatment when the cost of goods is high and drug supply is limited. Control Clin Trials. 2000;21(1):44–53.
Zhao W. Letter to the Editor – A better alternative to the inferior permuted block design is not necessarily complex. Stat Med. 2016;35:1736–8.
Berger VW. Pros and cons of permutation tests in clinical trials. Stat Med. 2000;19:1319–28.
Simon R, Simon NR. Using randomization tests to preserve type I error with response adaptive and covariate adaptive randomization. Statist Probab Lett. 2011;81:767–72.
Tamm M, Cramer E, Kennes LN, Hilgers RD. Influence of selection bias on the test decision. Methods Inf Med. 2012;51:138–43.
Tamm M, Hilgers RD. Chronological bias in randomized clinical trials arising from different types of unobserved time trends. Methods Inf Med. 2014;53:501–10.
Baldi Antognini A, Rosenberger WF, Wang Y, Zagoraiou M. Exact optimum coin bias in Efron’s randomization procedure. Stat Med. 2015;34:3760–8.
Chow SC, Shao J, Wang H, Lokhnygina Y. Sample size calculations in clinical research. 3rd ed. Boca Raton: CRC Press; 2018.
Heritier S, Gebski V, Pillai A. Dynamic balancing randomization in controlled clinical trials. Stat Med. 2005;24:3729–41.
Lovell DJ, Giannini EH, Reiff A, et al. Etanercept in children with polyarticular juvenile rheumatoid arthritis. N Engl J Med. 2000;342(11):763–9.
Zhao W. A better alternative to stratified permuted block design for subject randomization in clinical trials. Stat Med. 2014;33:5239–48.
Altman DG, Royston JP. The hidden effect of time. Stat Med. 1988;7:629–37.
Christensen E, Neuberger J, Crowe J, et al. Beneficial effect of azathioprine and prediction of prognosis in primary biliary cirrhosis. Gastroenterology. 1985;89:1084–91.
Rückbeil MV, Hilgers RD, Heussen N. Randomization in survival trials: An evaluation method that takes into account selection and chronological bias. PLoS ONE. 2019;14(6):e0217964.
Hilgers RD, König F, Molenberghs G, Senn S. Design and analysis of clinical trials for small rare disease populations. J Rare Dis Res Treatment. 2016;1(3):53–60.
Miller F, Zohar S, Stallard N, Madan J, Posch M, Hee SW, Pearce M, Vågerö M, Day S. Approaches to sample size calculation for clinical trials in rare diseases. Pharm Stat. 2017;17:214–30.
Kuznetsova OM, Tymofyeyev Y. Preserving the allocation ratio at every allocation with biased coin randomization and minimization in studies with unequal allocation. Stat Med. 2012;31(8):701–23.
Kuznetsova OM, Tymofyeyev Y. Brick tunnel and wide brick tunnel randomization for studies with unequal allocation. In: Sverdlov O, editor. Modern adaptive randomized clinical trials: statistical and practical aspects. Boca Raton: CRC Press; 2015. p. 83–114.
Kuznetsova OM, Tymofyeyev Y. Expansion of the modified Zelen’s approach randomization and dynamic randomization with partial block supplies at the centers to unequal allocation. Contemp Clin Trials. 2011;32:962–72.
EMA. Guideline on adjustment for baseline covariates in clinical trials. 2015.
Taves DR. Minimization: A new method of assigning patients to treatment and control groups. Clin Pharmacol Ther. 1974;15(5):443–53.
Pocock SJ, Simon R. Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics. 1975;31(1):103–15.
Hu F, Hu Y, Ma Z, Rosenberger WF. Adaptive randomization for balancing over covariates. Wiley Interdiscipl Rev Computational Stat. 2014;6(4):288–303.
Senn S. Statistical issues in drug development. 2nd ed. Wiley-Interscience; 2007.
Kuznetsova OM, Tymofyeyev Y. Covariate-adaptive randomization with unequal allocation. In: Sverdlov O, editor. Modern adaptive randomized clinical trials: statistical and practical aspects. Boca Raton: CRC Press; 2015. p. 171–97.
Berry DA. Adaptive clinical trials: the promise and the caution. J Clin Oncol. 2011;29(6):606–9.
Trippa L, Lee EQ, Wen PY, Batchelor TT, Cloughesy T, Parmigiani G, Alexander BM. Bayesian adaptive randomized trial design for patients with recurrent glioblastoma. J Clin Oncol. 2012;30(26):3258–63.
Hu F, Rosenberger WF. The theory of response-adaptive randomization in clinical trials. New York: Wiley; 2006.
Atkinson AC, Biswas A. Randomised response-adaptive designs in clinical trials. Boca Raton: CRC Press; 2014.
Rugo HS, Olopade OI, DeMichele A, et al. Adaptive randomization of veliparib–carboplatin treatment in breast cancer. N Engl J Med. 2016;375:23–34.
Berry SM, Petzold EA, Dull P, et al. A response-adaptive randomization platform trial for efficient evaluation of Ebola virus treatments: a model for pandemic response. Clin Trials. 2016;13:22–30.
Ware JH. Investigating therapies of potentially great benefit: ECMO. (with discussion). Stat Sci. 1989;4(4):298–340.
Hey SP, Kimmelman J. Are outcome-adaptive allocation trials ethical? (with discussion). Clin Trials. 2015;12(2):102–27.
Proschan M, Evans S. Resist the temptation of response-adaptive randomization. Clin Infect Dis. 2020;71(11):3002–4. https://doi.org/10.1093/cid/ciaa334 .
Villar SS, Robertson DS, Rosenberger WF. The temptation of overgeneralizing response-adaptive randomization. Clin Infect Dis. 2020:ciaa1027. https://doi.org/10.1093/cid/ciaa1027 .
Proschan M. Reply to Villar et al. Clin Infect Dis. 2020:ciaa1029. https://doi.org/10.1093/cid/ciaa1029 .
Donner A, Klar N. Design and Analysis of Cluster Randomization Trials in Health Research. London: Arnold Publishers Limited; 2000.
Klasnja P, Hekler EB, Shiffman S, Boruvka A, Almirall D, Tewari A, Murphy SA. Micro-randomized trials: An experimental design for developing just-in-time adaptive interventions. Health Psychol. 2015;34:1220–8.
The authors are grateful to Robert A. Beckman for his continuous efforts coordinating the Innovative Design Scientific Working Groups, which is also a networking research platform for the Randomization ID SWG. We would also like to thank the editorial board and the two anonymous reviewers for their valuable comments, which helped to substantially improve the original version of the manuscript.
Funding: None. The opinions expressed in this article are those of the authors and may not reflect the opinions of the organizations that they work for.
Authors and Affiliations
National Institutes of Health, Bethesda, MD, USA
Vance W. Berger
Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach, Germany
Louis Joseph Bour
Boehringer-Ingelheim Pharmaceuticals Inc, Ridgefield, CT, USA
Kerstine Carter
Population Health Sciences, University of Utah School of Medicine, Salt Lake City, UT, USA
Jonathan J. Chipman
Cancer Biostatistics, University of Utah Huntsman Cancer Institute, Salt Lake City, UT, USA
Clinical Trials Research Unit, University of Leeds, Leeds, UK
Colin C. Everett
RWTH Aachen University, Aachen, Germany
Nicole Heussen & Ralf-Dieter Hilgers
Medical School, Sigmund Freud University, Vienna, Austria
Nicole Heussen
York Trials Unit, Department of Health Sciences, University of York, York, UK
Catherine Hewitt
Food and Drug Administration, Silver Spring, MD, USA
Yuqun Abigail Luo
Open University of Catalonia (UOC) and the University of Barcelona (UB), Barcelona, Spain
Jone Renteria
Department of Human Development and Quantitative Methodology, University of Maryland, College Park, MD, USA
BioPharma Early Biometrics & Statistical Innovations, Data Science & AI, R&D BioPharmaceuticals, AstraZeneca, Gothenburg, Sweden
Yevgen Ryeznik
Early Development Analytics, Novartis Pharmaceuticals Corporation, East Hanover, NJ, USA
Oleksandr Sverdlov
Biostatistics Center & Department of Biostatistics and Bioinformatics, George Washington University, Washington, DC, USA
Diane Uschner
Conception: VWB, KC, NH, RDH, OS. Writing of the main manuscript: OS, with contributions from VWB, KC, JJC, CE, NH, and RDH. Design of simulation studies: OS, YR. Development of code and running simulations: YR. Digitization and preparation of data for Fig. 5: JR. All authors reviewed the original manuscript and the revised version. The authors read and approved the final manuscript.
Correspondence to Oleksandr Sverdlov .
Ethics approval and consent to participate.
Not applicable.
Publisher's note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file 1: Figure S1. Type I error rate under the selection bias model with bias effect ( \(\nu\) ) in the range 0 (no bias) to 1 (strong bias) for 12 randomization designs and three statistical tests.
Berger, V., Bour, L., Carter, K. et al. A roadmap to using randomization in clinical trials. BMC Med Res Methodol 21, 168 (2021). https://doi.org/10.1186/s12874-021-01303-z
Received: 24 December 2020. Accepted: 14 April 2021. Published: 16 August 2021.
Randomization in an experiment is where you choose your experimental participants randomly. For example, you might use simple random sampling, where participants' names are drawn randomly from a pool in which everyone has an equal probability of being chosen. You can also assign treatments randomly to participants, for example by assigning random numbers from a random number table.
If you use randomization in your experiments, you guard against bias. For example, selection bias (where some groups are underrepresented) is eliminated and accidental bias (where chance imbalances happen) is minimized. You can also run a variety of statistical tests on your data (to test your hypotheses) if your sample is random.
The word “random” has a very specific meaning in statistics. Arbitrarily choosing names from a list might seem random, but it actually isn't. Hidden biases (like a subconscious preference for English names, names that sound like friends, or names that roll off the tongue) mean that what you think is a random selection probably isn't. Because these biases are often hidden or overlooked, specific randomization techniques have been developed for researchers; a sketch of the simplest of them follows.
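As a minimal sketch of the idea (the pool of participant names is hypothetical), both simple random sampling and random treatment assignment can be done with a pseudo-random number generator instead of a printed random number table:

```python
import random

random.seed(7)  # record the seed so the selection can be reproduced

pool = [f"participant_{i:03d}" for i in range(1, 101)]  # hypothetical pool of 100
sample = random.sample(pool, k=20)                      # simple random sample of 20
# random assignment of each sampled participant to a condition
assignment = {p: random.choice(["treatment", "control"]) for p in sample}
```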
Part of the book series: Handbook of Experimental Pharmacology (HEP, volume 257).
Most, if not all, guidelines, recommendations, and other texts on Good Research Practice emphasize the importance of blinding and randomization. There is, however, very limited specific guidance on when and how to apply blinding and randomization. This chapter aims to disambiguate these two terms by discussing what they mean, why they are applied, and how to conduct the acts of randomization and blinding. We discuss the use of blinding and randomization as the means against existing and potential risks of bias rather than a mandatory practice that is to be followed under all circumstances and at any cost. We argue that, in general, experiments should be blinded and randomized if (a) this is a confirmatory research that has a major impact on decision-making and that cannot be readily repeated (for ethical or resource-related reasons) and/or (b) no other measures can be applied to protect against existing and potential risks of bias.
‘When I use a word,’ Humpty Dumpty said in rather a scornful tone, ‘it means just what I choose it to mean – neither more nor less.’
Lewis Carroll (1871)
Through the Looking-Glass, and What Alice Found There
In various fields of science, the outcomes of experiments can be intentionally or unintentionally distorted if potential sources of bias are not properly controlled. There are a number of recognized risks of bias, such as selection bias, performance bias, detection bias, and attrition bias (Hooijmans et al. 2014 ). Some sources of bias can be efficiently controlled through research rigor measures such as randomization and blinding.
Existing guidelines and recommendations assign a significant value to adequate control over various factors that can bias the outcome of scientific experiments (chapter “Guidelines and Initiatives for Good Research Practice”). Among internal validity criteria, randomization and blinding are two commonly recognized bias-reducing instruments that need to be considered when planning a study and are to be reported when the study results are disclosed in a scientific publication.
For example, the editorial policy of the Nature journals requires authors in the life sciences field to submit a checklist along with the manuscripts to be reviewed. This checklist includes questions on randomization and blinding. More specifically, for randomization, the checklist asks for the following information: “If a method of randomization was used to determine how samples/animals were allocated to experimental groups and processed, describe it.” A recent analysis by the NPQIP Collaborative group indicated that only 11.2% of analyzed publications disclosed which method of randomization was used to determine how samples or animals were allocated to experimental groups (Macleod, The NPQIP Collaborative Group 2017 ). Meanwhile, the proportion of studies mentioning randomization was much higher – 64.2%. Do these numbers suggest that authors strongly motivated to have their work published in a highly prestigious scientific journal ignore the instructions? It is more likely that, for many scientists (authors, editors, reviewers), a statement such as “subjects were randomly assigned to one of the N treatment conditions” is considered sufficient to describe the randomization procedure.
For the field of life sciences, and drug discovery in particular, the discussion of sources of bias, their impact, and protective measures largely follows examples from clinical research (chapter “Learning from Principles of Evidence-Based Medicine to Optimize Nonclinical Research Practices”). However, clinical research is typically conducted by research teams that are larger than those involved in basic and applied preclinical work. In clinical research teams, there are professionals (including statisticians) trained to design experiments and apply bias-reducing measures such as randomization and blinding. In contrast, preclinical experiments are often designed, conducted, analyzed, and reported by scientists lacking training in, or access to, the information and specialized resources necessary for proper administration of bias-reducing measures.
As a result, researchers may design and apply procedures that reflect their own understanding of what randomization and blinding are. These may or may not be the correct procedures. For example, driven by a good intention to randomize 4 different treatment conditions (A, B, C, and D) applied to a group of 16 mice, a scientist may design the experiment in the following way (Table 1 ).
The above example reflects a fairly common practice of conducting “randomization” in a simple and convenient way. Another example of common practice is, upon the animals’ arrival, to pick them up haphazardly from the supplier’s transport box and place them into two (or more) cages, which then constitute the control and experimental group(s). However, both methods of assigning subjects to experimental treatment conditions violate the randomness principle (see below) and, therefore, should not be reported as randomization.
Similarly, the use of blinding in experimental work typically cannot be described solely by stating that “experimenters were blinded to the treatment conditions.” For both randomization and blinding, it is essential to provide details on what exactly was applied and how.
The purpose of this chapter is to disambiguate these two terms by discussing what they mean, why they are applied, and how to conduct the acts of randomization and blinding. We discuss the use of blinding and randomization as the means against existing and potential risks of bias rather than a mandatory practice that is to be followed under all circumstances and at any cost.
Randomization can serve several purposes that need to be recognized individually as one or more of them may become critical when considering study designs and conditions exempt from the randomization recommendation.
First, randomization permits the use of probability theory to express the likelihood of chance as a source for the difference between outcomes. In other words, randomization enables the application of statistical tests that are common in biology and pharmacology research. For example, the central limit theorem states that the sampling distribution of the mean of any independent, random variable will be normal or close to normal, if the sample size is large enough. The central limit theorem assumes that the data are sampled randomly and that the sample values are independent of each other (i.e., occurrence of one event has no influence on the next event). Usually, if we know that subjects or items were selected randomly, we can assume that the independence assumption is met. If the study results are to be subjected to conventional statistical analyses dependent on such assumptions, adequate randomization method becomes a must.
Second, randomization helps to prevent a potential impact of the selection bias due to differing baseline or confounding characteristics of the subjects. In other words, randomization is expected to transform any systematic effects of an uncontrolled factor into a random, experimental noise. A random sample is one selected without bias: therefore, the characteristics of the sample should not differ in any systematic or consistent way from the population from which the sample was drawn. But random sampling does not guarantee that a particular sample will be exactly representative of a population. Some random samples will be more representative of the population than others. Random sampling does ensure, however, that, with a sufficiently large number of subjects, the sample becomes more representative of the population.
There are characteristics of the subjects that can be readily assessed and controlled (e.g., by using stratified randomization, see below). But there are certainly characteristics that are not known and for which randomization is the only way to control their potentially confounding influence. It should be noted, however, that the impact of randomization can be limited when the sample size is low. This needs to be kept in mind given that most nonclinical studies are conducted using small sample sizes. Thus, when designing nonclinical studies, one should invest extra effort into the analysis of possible confounding factors or characteristics in order to judge whether or not experimental and control groups are similar before the start of the experiment.
Third, randomization interacts with other means to reduce risks of bias. Most importantly, randomization is used together with blinding to conceal the allocation sequence. Without an adequate randomization procedure, efforts to introduce and maintain blinding may not always be fully successful.
There are several randomization methods that can be applied to study designs of differing complexities. The tools used to apply these methods range from random number tables to specialized software. Irrespective of the tools used, reporting on the randomization schedule applied should also answer the following two questions:
Is the randomization schedule based on an algorithm or a principle that can be written down and, based on the description, be reapplied by anyone at a later time point resulting in the same group composition? If yes, we are most likely dealing with a “pseudo-randomization” (e.g., see below comments about the so-called Latin square design).
Does the randomization schedule exclude any subjects and groups that belong to the experiment? If yes, one should be aware of the risks associated with excluding some groups or subjects such as a positive control group (see chapter “Out of Control? Managing Baseline Variability in Experimental Studies with Control Groups”).
An answer “yes” to either of the above questions does not automatically mean that something incorrect or inappropriate is being done. In fact, a scientist may take a decision that is well justified by their experience with, and the needs of, the particular experimental situation. However, in any case, the answer “yes” to either or both of the questions above mandates a complete and transparent description of the study design and the subject allocation schedule.
One of the common randomization strategies used for between-subject study designs is called simple (or unrestricted) randomization. Simple random sampling is defined as the process of selecting subjects from a population such that just the following two criteria are satisfied:
The probability of assignment to any of the experimental groups is equal for each subject.
The assignment of one subject to a group does not affect the assignment of any other subject to that same group.
With simple randomization, a single sequence of random values is used to guide the assignment of subjects to groups. Simple randomization is easy to perform and can be done by anyone without a need to involve professional statistical help. However, simple randomization can be problematic for studies with small sample sizes. In the example below, 16 subjects had to be allocated to 4 treatment conditions. Using Microsoft Excel’s function RANDBETWEEN(0.5;4.5), 16 random integers from 1 to 4 were generated. Obviously, this method has resulted in an unequal number of subjects among groups (e.g., there is only one subject assigned to group 2). This problem may occur irrespective of whether one uses machine-generated random numbers or simply tosses a coin.
Subject ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
Group ID | 4 | 1 | 1 | 3 | 3 | 1 | 4 | 4 | 3 | 4 | 3 | 3 | 4 | 2 | 3 | 1 |
An alternative approach would be to generate a list of all treatments to be administered (top row in the table below) and generate a list of random numbers (as many as the total number of subjects in a study) using Microsoft Excel’s function RAND(), which returns random real numbers greater than or equal to 0 and less than 1 (this function requires no argument):
Treatment | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 4 |
Random number | 0.76 | 0.59 | 0.51 | 0.90 | 0.64 | 0.10 | 0.50 | 0.48 | 0.22 | 0.37 | 0.05 | 0.09 | 0.73 | 0.83 | 0.50 | 0.43 |
The next step would be to sort the treatment row based on the values in the random number row (in an ascending or descending manner) and add a Subject ID row:
Subject ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
Treatment | 3 | 3 | 2 | 3 | 3 | 4 | 2 | 2 | 4 | 1 | 1 | 2 | 4 | 1 | 4 | 1 |
Random number | 0.05 | 0.09 | 0.10 | 0.22 | 0.37 | 0.43 | 0.48 | 0.50 | 0.50 | 0.51 | 0.59 | 0.64 | 0.73 | 0.76 | 0.83 | 0.90 |
There is an equal number of subjects (four) assigned to each of the four treatment conditions, and the assignment is random. This method can also be used when group sizes are not equal (e.g., when a study is conducted with different numbers of genetically modified animals and animals of wild type).
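The two approaches above translate directly into code. The following minimal Python sketch mirrors (rather than reproduces) the Excel steps, contrasting the naive draw, which does not guarantee equal group sizes, with the sort-by-random-key method, which preserves the prespecified balance:

```python
import random

subjects = list(range(1, 17))

# Naive RANDBETWEEN-style draw: group sizes are NOT guaranteed to be equal.
naive = [random.randint(1, 4) for _ in subjects]

# Sort-by-random-key method: start from a balanced treatment list,
# attach a random number to each entry, and sort by it (like sorting by RAND()).
treatments = [1, 2, 3, 4] * 4                 # exactly 4 subjects per group
keyed = sorted((random.random(), t) for t in treatments)
allocation = dict(zip(subjects, (t for _, t in keyed)))
```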
However, such a randomization schedule may still be problematic for some types of experiments. For example, if the subjects are tested one by one over the course of 1 day, the first few subjects could be tested in the morning hours and the last ones in the afternoon. In the example above, none of the first eight subjects is assigned to group 1, while the second half does not include any subject from group 3. To avoid such problems, block randomization may be applied.
Blocking is used to supplement randomization in situations such as the one described above – when one or more external factors change or may change during the period when the experiment is run. Blocks are balanced with predetermined group assignments, which keeps the numbers of subjects in each group similar at all times. All blocks of one experiment have equal size, and each block represents all independent variables that are being studied in the experiment.
The first step in block randomization is to define the block size. The minimum block size is the number obtained by multiplying the numbers of levels of all independent variables. For example, an experiment may compare the effects of a vehicle and three doses of a drug in male and female rats. The minimum block size in such a case would be eight rats per block (i.e., 4 drug dose levels × 2 sexes). All subjects can be divided into N blocks of size X∗Y, where X is the number of groups or treatment conditions (i.e., 8 for the example given) and Y is the number of subjects per treatment condition per block. In other words, there may be one or more subjects per treatment condition per block, so that the actual block size is a multiple of the minimum block size (i.e., 8, 16, 24, and so on for the example given above).
The second step is, after block size has been determined, to identify all possible combinations of assignment within the block. For instance, if the study is evaluating effects of a drug (group A) or its vehicle (group B), the minimum block size is equal to 2. Thus, there are just two possible treatment allocations within a block: (1) AB and (2) BA. If the block size is equal to 4, there is a greater number of possible treatment allocations: (1) AABB, (2) BBAA, (3) ABAB, (4) BABA, (5) ABBA, and (6) BAAB.
The third step is to randomize these blocks with varying treatment allocations:
Block number | 4 | 3 | 1 | 6 | 5 | 2 |
Random number | 0.015 | 0.379 | 0.392 | 0.444 | 0.720 | 0.901 |
And, finally, the randomized blocks can be used to determine the subjects’ assignment to the groups. In the example above, there are 6 blocks with 4 treatment conditions in each block, but this does not mean that the experiment must include 24 subjects. This random sequence of blocks can be applied to experiments with a total number of subjects smaller or greater than 24. Further, the total number of subjects does not have to be a multiple of 4 (block size) as in the example below with a total of 15 subjects:
Block number | 4 | 3 | 1 | 6 | ||||||||||||
Random number | 0.015 | 0.379 | 0.392 | 0.444 | ||||||||||||
Subject ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | – |
Treatment | B | A | B | A | A | B | A | B | A | A | B | B | B | A | A | – |
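A minimal sketch of this procedure follows. Shuffling a balanced block uniformly at random is equivalent to drawing one of the enumerated within-block allocations (e.g., one of the six patterns AABB through BAAB above) with a random number, and truncation handles totals that are not a multiple of the block size:

```python
import random

def block_randomize(n_subjects, treatments=("A", "B"), reps_per_block=2):
    """Block randomization: concatenate independently shuffled balanced
    blocks of size len(treatments) * reps_per_block, truncated to n_subjects."""
    schedule = []
    while len(schedule) < n_subjects:
        block = list(treatments) * reps_per_block  # e.g., A, A, B, B
        random.shuffle(block)                      # one random balanced block
        schedule.extend(block)
    return schedule[:n_subjects]

print(block_randomize(15))  # 15 subjects, blocks of 4, as in the example above
```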
It is generally recommended to blind the block size to avoid any potential selection bias. Given the low sample sizes typical for preclinical research, this recommendation becomes a mandatory requirement at least for confirmatory experiments (see chapter “Resolving the Tension Between Exploration and Confirmation in Preclinical Biomedical Research”).
Simple and block randomization are well suited when the main objective is to balance the subjects’ assignment to the treatment groups defined by the independent variables whose impact is to be studied in an experiment. With sample sizes that are large enough, simple and block randomization may also balance the treatment groups in terms of the unknown characteristics of the subjects. However, in many experiments, there are baseline characteristics of the subjects that do get measured and that may have an impact on the dependent (measured) variables (e.g., subjects’ body weight). Potential impact of such characteristics may be addressed by specifying inclusion/exclusion criteria, by including them as covariates into a statistical analysis, and (or) may be minimized by applying stratified randomization schedules.
It is always up to the researcher to decide whether there are such potentially impactful covariates that need to be controlled and what the best way of dealing with them is. In case of doubt, the rule of thumb is to avoid any risk, apply stratified randomization, and declare an intention to conduct a statistical analysis that will isolate the potential contribution of the covariate(s).
It is important to acknowledge that, in many cases, information about such covariates may not be available when a study is conceived and designed. Thus, a decision to take covariates into account often affects the timing of the randomization. One common example of such a covariate is body weight. A study is planned, and the sample size is estimated, before the animals are ordered or bred, but the body weights will not be known until the animals are ready. Another example is the size of tumors that are inoculated and grow at different rates for a pre-specified period of time before the subjects start to receive experimental treatments.
For most situations in preclinical research, an efficient way to conduct stratified randomization is to run simple (or block) randomization several times (e.g., 100 times) and, for each iteration, calculate the mean of the covariate for each group (e.g., body weights for groups A and B in the example in the previous section). The randomization schedule that yields the lowest between-group difference for the covariate would then be chosen for the experiment. Running a large number of iterations does not mean saving excessively large volumes of data. In fact, several tools used to support randomization allow saving the seed for the random number generator and re-creating the randomization schedule later using this seed value.
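A minimal sketch of this iterate-and-pick-the-best strategy, using body weight as the covariate and 100 iterations of simple randomization; fixing the seed makes the chosen schedule re-creatable, as noted above:

```python
import random

def best_balanced_schedule(weights, n_iter=100, seed=2023):
    """Rerun simple randomization n_iter times and keep the schedule with
    the smallest between-group difference in mean body weight."""
    rng = random.Random(seed)  # saved seed re-creates the chosen schedule
    n = len(weights)
    groups = ["A"] * (n // 2) + ["B"] * (n - n // 2)
    best, best_diff = None, float("inf")
    for _ in range(n_iter):
        rng.shuffle(groups)
        mean_a = sum(w for w, g in zip(weights, groups) if g == "A") / groups.count("A")
        mean_b = sum(w for w, g in zip(weights, groups) if g == "B") / groups.count("B")
        if abs(mean_a - mean_b) < best_diff:
            best, best_diff = list(groups), abs(mean_a - mean_b)
    return best, best_diff
```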
Although stratified randomization is a relatively simple technique that can be of great help, there are some limitations that need to be acknowledged. First, stratified randomization can be extended to two or more stratifying variables; however, given the typically small sample sizes of preclinical studies, it may become complicated to implement if many covariates must be controlled. Second, stratified randomization works only when all subjects have been identified before group assignment. While this is often not a problem in preclinical research, there may be situations when a large study sample is divided into smaller batches that are taken into the study sequentially. In such cases, more sophisticated procedures such as covariate adaptive randomization may need to be applied, similar to what is done in clinical research (Kalish and Begg 1985). With this method, subjects are assigned to treatment groups by taking into account their specific covariates and the assignments of the subjects already allocated to treatment groups. We intentionally do not provide further examples or guidance on such advanced randomization methods, as they should preferably be developed and applied in consultation with, or by, biostatisticians.
The above discussion of randomization schedules referred to study designs known as between-subject designs. A different approach is required if a study is designed as within-subject. In such designs, also known as crossover designs, subjects may be given sequences of treatments with the intent of studying the differences between the effects produced by the individual treatments. One should keep in mind that such a sequence of testing always bears the danger that the first test might affect the following ones. If there are reasons to expect such interference, within-subject designs should be avoided.
In the simplest case of a crossover design, there are only two treatments and only two possible sequences in which to administer them (i.e., A-B and B-A). In nonclinical research, and particularly in pharmacological studies, there is a strong trend to include at least three doses of a test drug plus its vehicle. A Latin square design is commonly used to allocate subjects to treatment conditions. The Latin square is a very simple technique, but it is often applied in a way that does not result in proper randomization (Table 2).
In this example, each subject receives each of the four treatments over four consecutive study periods, and, for any given study period, each treatment is equally represented. If there are more than four subjects participating in a study, then the above schedule is copied as many times as needed to cover all study subjects.
Despite its apparent convenience (such schedules can be generated without any tools), the resulting allocation schedules are predictable and, what is even worse, are not balanced with respect to first-order carry-over effects (e.g., except for the first test period, D always comes after C). Therefore, such Latin square designs are not an example of properly conducted randomization.
One solution would be to use a complete set of orthogonal Latin squares. For example, when the number of treatments equals three, there are six (i.e., 3!) possible sequences – ABC, ACB, BAC, BCA, CAB, and CBA. If the sample size is a multiple of six, all six sequences can be applied. As preclinical studies typically involve small sample sizes, this approach becomes problematic for larger numbers of treatments, such as four, where there are already 24 (i.e., 4!) possible sequences.
The Williams design is a special case of a Latin square in which every treatment follows every other treatment the same number of times (Table 3).
The Williams design maintains all the advantages of the Latin square but is balanced (see Jones and Kenward 2003 for a detailed discussion of Williams squares, including generation algorithms). There are six possible Williams squares in the case of four treatments. Thus, if there are more than four subjects, more than one Williams square would be applied (e.g., two squares for eight subjects).
Constructing the Williams squares is not yet a randomization. In studies based on within-subject designs, subjects are not randomized to treatments in the same sense as in between-subject designs; rather, the treatment sequences are randomized. In other words, after the Williams squares are constructed and selected, the individual sequences are randomly assigned to the subjects.
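For illustration, here is a sketch in R of one standard construction of a Williams square for an even number of treatments, followed by random assignment of the sequences to subjects; the function name and labels are ours (see Jones and Kenward 2003 for the underlying algorithm):

```r
# Williams square for an even number of treatments t (here t = 4, labels A-D).
williams_square <- function(t) {
  stopifnot(t %% 2 == 0)                      # this simple rule needs even t
  # First row alternates low and high labels: 0, t-1, 1, t-2, ...
  first <- numeric(t)
  first[seq(1, t, by = 2)] <- 0:(t / 2 - 1)   # odd positions: 0, 1, 2, ...
  first[seq(2, t, by = 2)] <- (t - 1):(t / 2) # even positions: t-1, t-2, ...
  sq <- outer(0:(t - 1), first, "+") %% t     # remaining rows: cyclic shifts
  matrix(LETTERS[sq + 1], nrow = t)           # rows = sequences, columns = periods
}

sq <- williams_square(4)       # 4 sequences x 4 periods; each treatment
                               # follows every other treatment exactly once
sq[sample(nrow(sq)), ]         # randomly assign the sequences to 4 subjects
```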
The most common and basic method of simple randomization is flipping a coin. For example, with two treatment groups (control versus treatment), the side of the coin (i.e., heads, control; tails, treatment) determines the assignment of each subject. Other similar methods include using a shuffled deck of cards (e.g., even, control; odd, treatment), throwing a die (e.g., 3 and below, control; over 3, treatment), or writing numbers on pieces of paper, folding them, mixing them, and then drawing them one by one. A random number table found in a statistics book, online random number generators (random.org or randomizer.org), or computer-generated random numbers (e.g., using Microsoft Excel) can also be used for simple randomization of subjects. As explained above, simple randomization may result in an unbalanced design, and one should therefore pay attention to the number of subjects assigned to each treatment group. More advanced randomization techniques, however, may require dedicated tools and, whenever possible, should be supported by professional biostatisticians.
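For example, a minimal sketch of computer-based simple randomization in R (ours, with an arbitrary seed) also shows why the group counts should be checked:

```r
# Simple randomization of 20 subjects to two groups.
set.seed(7)
assignment <- sample(c("control", "treatment"), size = 20, replace = TRUE)
table(assignment)   # group sizes are not guaranteed to be equal, e.g., 12 vs. 8
```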
Randomization tools are typically included in study design software; for in vivo research, the most noteworthy example is the NC3Rs’ Experimental Design Assistant (www.eda.nc3rs.org.uk). This freely available online resource can generate and share a spreadsheet with the randomized allocation report after the study has been designed (i.e., variables defined, sample size estimated, etc.). Similar functionality may be provided by Electronic Laboratory Notebooks that integrate study design support (see chapter “Electronic Lab Notebooks and Experimental Design Assistants”).
Randomization is also supported by many data analysis software packages commonly used in research. In some cases, free tools even allow certain types of randomization to be conducted online (e.g., QuickCalcs at www.graphpad.com/quickcalcs/randMenu/).
Anyone interested in nearly unlimited freedom in designing and executing different types of randomization will benefit from the resources generated by the R community (see https://paasp.net/resource-center/r-scripts/). Besides being free and supported by a large community of experts, R allows the scripts used to obtain randomization schedules to be saved (along with the seed numbers), which makes the overall process not only reproducible and verifiable but also maximally transparent.
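As a small illustration of seed-based reproducibility (the seed value here is arbitrary):

```r
# Re-running the same script with the same seed re-creates the schedule exactly.
set.seed(20191107)
first_run  <- sample(rep(c("A", "B"), each = 8))
set.seed(20191107)
second_run <- sample(rep(c("A", "B"), each = 8))
identical(first_run, second_run)   # TRUE: the schedule is fully verifiable
```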
Randomization is not and should never be seen as a goal per se. The goal is to minimize the risks of bias that may affect the design, conduct, and analysis of a study and to enable application of other research methods (e.g., certain statistical tests). Randomization is merely a tool to achieve this goal.
If not dictated by the needs of data analysis or by an intention to implement blinding, pseudo-randomizations such as the schedules described in Tables 1 and 2 may in some cases be sufficient. For example, animals delivered by a qualified animal supplier come from large batches where the breeding schemes themselves help to minimize the risk of systematic differences in baseline characteristics. This is in contrast to clinical research, where human populations are generally much more heterogeneous than the animal populations typically used in research.
Randomization becomes mandatory when animals are not received from major suppliers, are bred in-house, are not standard animals (e.g., transgenic), or when they are exposed to an intervention before the initiation of a treatment. Examples of such interventions are surgery, administration of a substance inducing long-term effects, grafts, or infections. In these cases, animals should certainly be randomized after the intervention.
When planning a study, one should also consider the risk of between-subject cross-contamination that may affect the study outcome if animals receiving different treatments are housed within the same cage. In such cases, the best approach is to reduce the number of subjects per cage to the minimum acceptable from the animal care and use perspective and to adjust the randomization schedule accordingly (i.e., so that all animals in a cage receive the same treatment).
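A sketch of such cage-level randomization in R (cage and group sizes are illustrative) might be:

```r
# Randomize treatments at the cage level so that all animals within a
# cage receive the same treatment.
set.seed(11)
n_cages  <- 8
per_cage <- 3
cage_treatment <- sample(rep(c("A", "B"), each = n_cages / 2))  # randomize cages

schedule <- data.frame(
  cage      = rep(seq_len(n_cages), each = per_cage),
  animal    = seq_len(n_cages * per_cage),
  treatment = rep(cage_treatment, each = per_cage)   # same treatment per cage
)
head(schedule)
```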
There are situations when randomization becomes impractical or generates other significant risks that outweigh its benefits. In such cases, it is essential to recognize the reasons why randomization is applied (e.g., the ability to apply certain statistical tests, prevention of selection bias, and support of blinding). For example, for an in vitro study with multi-well plates, randomization is usually technically possible, but one would need to recognize the risk of errors introduced during manual pipetting into a 96- or 384-well plate. With proper controls and machine-read experimental readouts, the risk of bias in such cases may not be seen as strong enough to accept the risk of human error.
Another common example is provided by studies where incremental drug doses or concentrations are applied during the course of a single experiment involving just one subject. During cardiovascular safety studies, animals first receive an infusion of a vehicle (e.g., over a period of 30 min), followed by two or three concentrations of the test drug, while hemodynamics are assessed and blood samples taken. As the goal of such studies is to establish concentration-effect relationships, one has no choice but to accept the lack of randomization. The only alternatives would be to give up on the within-subject design or to conduct the study over many days to allow enough time to wash the drug out between test days. Needless to say, neither of these options is ideal for a study where the baseline characteristics are a critical factor in keeping the sample size low. In this example, the desire to conduct a properly randomized study comes into conflict with ethical considerations.
A similar design is often used in electrophysiological experiments (in vitro or ex vivo), where a test system needs to be equilibrated and baselined for extended periods of time (sometimes hours) before test drugs can be applied (at ascending concentrations). Because washout cannot be easily controlled, such studies also do not follow randomized schedules of testing the various drug doses.
Low-throughput studies such as those in electrophysiology typically run over many days, with a small number of subjects or data points added each day. While one may accept a lack of randomization in some such cases, it is important to stress that other measures must be in place to control potential sources of bias. It is a common but usually unacceptable practice to analyze the results each time a new data point has been added in order to decide whether a magic P value has sunk below 0.05 so that the experiment can stop. For example, one recent publication stated: “For optogenetic activation experiments, cell-type-specific ablation experiments, and in vivo recordings (optrode recordings and calcium imaging), we continuously increased the number of animals until statistical significance was reached to support our conclusions.” Such an approach should be avoided through clear experimental planning and definition of study endpoints.
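A small simulation (ours, not from the chapter) illustrates why such sequential testing is problematic: even when both groups are drawn from the same distribution, testing after each added subject and stopping at P < 0.05 produces far more false positives than the nominal 5%:

```r
# Optional stopping: test after every added pair of subjects, stop at p < 0.05.
set.seed(123)
optional_stop <- function(n_max = 30, n_min = 3) {
  a <- rnorm(n_min)                      # both groups sampled from the
  b <- rnorm(n_min)                      # same distribution (no true effect)
  for (n in n_min:n_max) {
    if (t.test(a, b)$p.value < 0.05) return(TRUE)   # declared "significant"
    a <- c(a, rnorm(1))
    b <- c(b, rnorm(1))
  }
  FALSE
}
mean(replicate(2000, optional_stop()))   # well above the nominal 0.05
```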
The above examples are provided only to illustrate that there may be special cases when randomization cannot be done. This is usually not an easy decision to make and an even more difficult one to defend later. Therefore, one is always well advised to seek professional advice (i.e., interaction with biostatisticians or colleagues specializing in risk assessment and study design issues). Needless to say, this advice should be obtained before the studies are conducted.
In the ideal case, once randomization has been applied to allocate subjects to treatment conditions, it should be maintained throughout study conduct and analysis to control against potential performance and outcome detection bias, respectively. In other words, it would not be appropriate to first assign the subjects to, for example, groups A and B and then perform all experimental manipulations first with group A and then with group B.
In clinical research, blinding and randomization are recognized as the most important design techniques for avoiding bias (ICH Harmonised Tripartite Guideline 1998; see also chapter “Learning from Principles of Evidence-Based Medicine to Optimize Nonclinical Research Practices”). In the preclinical domain, there are a number of instruments for assessing risks of bias, and the criteria most often included are randomization and blinding (83% and 77%, respectively, of the 30 instruments analyzed by Krauth et al. 2013).
While randomization and blinding are often discussed together and serve highly overlapping objectives, attitudes towards these two research rigor measures are strikingly different. The reason for the higher acceptance of randomization compared to blinding is obvious: randomization can be implemented essentially at no cost, while blinding requires at least some investment of resources and may therefore have a negative impact on a research unit’s apparent capacity (measured by the number of completed studies, irrespective of quality).
Since costs and resources are not an acceptable argument in discussions of the ethical conduct of research, we often engage a defense mechanism called rationalization, which helps to justify, in a seemingly rational or logical manner, why blinding should not be applied, thereby avoiding the true explanation. Arguments against the use of blinding can be divided into two groups.
One group comprises a range of factors that are essentially psychological barriers and can be effectively addressed. For example, one may believe that one’s research area or a specific research method has an innate immunity against any risk of bias. Or, alternatively, one may believe that one’s scientific excellence and ability to supervise the activities in the lab make blinding unnecessary. There is a great example illustrating that there is no place for such beliefs and that one should rather rely on empirical evidence. For decades, female musicians were underrepresented in major symphonic orchestras compared to their male colleagues, despite having equal access to high-quality education. The situation started to change in the mid-1970s, when blind auditions were introduced and the proportion of female orchestra members went up (Goldin and Rouse 2000). In preclinical research, there are also examples of the impact of blinding (or a lack thereof); more specifically, studies have revealed substantially higher effect sizes in experiments that were not randomized or blinded (Macleod et al. 2008).
Another potential barrier is related to “trust” within the lab. Bench scientists need to have the purpose of blinding explained to them and, in the ideal case, should be actively involved in the development and implementation of blinding and other research rigor measures. With proper explanation and engagement, blinding will not be seen as an unfriendly act whereby a PI or lab head communicates a lack of trust.
The second group of arguments against the use of blinding is actually composed of legitimate questions that need to be addressed when designing an experiment. As mentioned above in the section on randomization, a decision to apply blinding should be justified by the needs of a specific experiment and correctly balanced against the existing and potential risks.
It requires no explanation that, in preclinical research, there are no double-blinded studies in the sense used in the clinic. However, similar to clinical research, blinding in preclinical experiments serves to protect against two potential sources of bias: bias arising from the personnel involved in study conduct, including the application of treatments (performance bias), and bias arising from the personnel involved in outcome assessment (detection bias).
Analysis of the risks of bias in a particular research environment or for a specific experiment makes it possible to decide which type of blinding should be applied and whether blinding is an appropriate measure against those risks.
There are three types or levels of blinding, and each one of them has its use: assumed blinding, partial blinding, and full blinding. With each type of blinding, experimenters allocate subjects to groups, replace the group names with blind codes, save the coding information in a secure place, and do not access this information until a certain pre-defined time point (e.g., until the data are collected or the study is completed and analyzed).
With assumed blinding, experimenters have access to the group or treatment codes at all times, but they do not know the correspondence between groups and treatments before the end of the study. With partial or full blinding, experimenters do not have access to the coding information until a certain pre-defined time point.
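As a minimal sketch of how blind codes might be generated and stored (the group labels and file name are illustrative, not prescribed by the chapter):

```r
# Replace group labels with random letter codes; the key is written to a
# separate file that is kept securely and not opened until the
# pre-defined unblinding point.
set.seed(3)
groups <- c("vehicle", "low dose", "high dose")
codes  <- sample(LETTERS, length(groups))      # random one-letter blind codes
key    <- data.frame(group = groups, code = codes)
write.csv(key, "blinding_key.csv", row.names = FALSE)  # store in a secure place
rm(key)   # experimenters work only with the codes, not the key
```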
The main advantage of assumed blinding is that an experiment can be conducted by one person who plans, performs, and analyzes the study. The risk of bias may be relatively low if the experiments are routine – e.g., lead optimization research in drug discovery or fee-for-service studies conducted using well-established standardized methods.
The efficiency of assumed blinding is enhanced if there is a sufficient time gap between application of a treatment and the outcome recording/assessment. It also usually helps if access to the blinding codes is intentionally made more difficult (e.g., blinding codes are kept in the study design assistant or in a file on an office computer that is not too close to the lab where the outcomes will be recorded).
If introduced properly, assumed blinding can guard against certain unwanted practices such as remeasurement, removal, and reclassification of individual observations or data points (the three “evil Rs” according to Shun-Shin and Francis 2013). In preclinical studies with small sample sizes, such practices have particularly deleterious consequences; in some cases, remeasurement of even a single subject may skew the results in a direction suggested by the knowledge of group allocation. One should emphasize that blinding is not necessarily an instrument against remeasurement as such (it is often needed or unavoidable) but rather helps to avoid the risks associated with it.
There are various situations where blinding (with no access to the blinding codes) is implemented not for the entire experiment but only for a certain part of it, e.g.:
No blinding during the application of experimental treatment (e.g., injection of a test drug) but proper blinding during the data collection and analysis
No blinding during the conduct of an experiment but proper blinding during analysis
For example, in behavioral pharmacology, there are experiments where the subjects’ behavior is video recorded after a test drug is applied. In such cases, blinding is applied to the analysis of the video recordings but not to the drug application phase. Needless to say, the blinded analysis typically has to be performed by someone who was not involved in the drug application phase.
A decision to apply partial blinding is based on (a) the confidence that the risks of bias are properly controlled during the unblinded parts of the experiment and/or (b) a rational assessment of the risks associated with maintaining blinding throughout the experiment. As an illustration of such a decision-making process, one may imagine an experiment conducted in a small lab (two or three people) by adequately trained personnel who are not under pressure to deliver results of a certain pattern, where data collection is automatic and data integrity is maintained at every step. Supported by various risk reduction measures, such an experiment may deliver robust and reliable data even if not fully blinded.
Importantly, while partial blinding can adequately limit the risk of some forms of bias, it may be less effective against the performance bias.
For important decision-enabling studies (including confirmatory research, see chapter “Resolving the Tension Between Exploration and Confirmation in Preclinical Biomedical Research”), it is usually preferable to implement full blinding rather than to explain why it was not done and argue that all the risks were properly controlled.
It is particularly advisable to apply full blinding in experiments that are for some reason difficult to repeat. For example, these could be studies running over significant periods of time (e.g., many months), studies using unique resources, or studies that may not be repeated for ethical reasons. In such cases, it is more rational to apply full blinding than to leave a chance that the results will be questioned on the grounds of lacking research rigor.
As implied by the name, full blinding requires complete allocation concealment from the beginning until the end of the experiment. This requirement may translate into substantial costs of resources. In the ideal scenario, each study should be supported by at least three independent people responsible for:
(De)coding and randomization
Conduct of the experiment, such as handling of the subjects and application of test drugs (possibly also outcome recording and assessment)
Outcome recording and assessment (if not done by the person conducting the experiment) and final analysis
The main reason for separating the conduct of the experiment from the final analysis is to protect against potential unintended unblinding (see below). If there is no risk of unblinding, or if it is not possible to have three independent people supporting the blinding of an experiment, one may consider a single person being responsible for every step from the conduct of the experiment to the final analysis. In other words, the study would then be supported by two independent people responsible for:
(De)coding and randomization
Conduct of the experiment, such as handling of the subjects and application of test drugs, outcome recording and assessment, and final analysis
Successful blinding is closely tied to adequate randomization. This does not mean that the two must always be performed in the sequence randomization first, then blinding; in fact, the order may be reversed. For example, one may work with the offspring of female rats that received experimental and control treatments while pregnant. As litter size may differ substantially between dams, randomization may be conducted after the pups are born, and this does not require the allocation concealment to be broken.
The blinding procedure has to be carefully thought through. Several factors, listed below, can turn a well-intentioned effort into a waste of resources.
First, blinding should, as far as possible, cover the entire experimental setup – i.e., all groups and subjects. There is an unacceptable practice of excluding positive controls from blinding, which is often justified by nothing other than an intention to introduce a detection bias in order to reduce the risk of running an invalid experiment (i.e., an experiment in which a positive control failed).
In some cases, positive controls cannot be administered by the same route or with the same pretreatment time as the other groups. Typically, such a situation requires a separate negative (vehicle) control treated in the same way as the positive control group. The study is then only partially blinded, as the experimenter is able to identify the groups needed to “validate” the study (the negative and positive control groups) but remains blind to the exact nature of the treatment received by each of these two groups. For better control over the risk of unblinding, one may apply a “double-dummy” approach in which all animals receive the same number of administrations via the same routes and with the same pretreatment times.
Second, experiments may be unintentionally unblinded. For example, drugs may have specific, easily observed physicochemical characteristics, or drug treatments may change the appearance of the subjects or produce obvious adverse effects. Perhaps even more common is unblinding due to concentration-dependent differences in the appearance of the drug solution or suspension. In such cases, there is not much that can be done, but it is essential to take corresponding notes and to acknowledge the issue in the study report or publication. It is interesting to note that unblinding is often cited as an argument against the use of blinding (Fitzpatrick et al. 2018); however, this argument reveals another problem – partial blinding schemes are often applied as a normative response without any proper risk-of-bias assessment.
Third, blinding codes should be kept in a secure place, avoiding any risk that the codes are lost. For in vivo experiments, this is an ethical requirement, as the study will be wasted if it cannot be unblinded at the end.
Fourth, blinding can significantly increase the risk of mistakes. One particular situation to prepare for is the inaccessibility of blinding codes in case of emergency. There are situations when a scientist conducting a study falls ill and the treatment schedules or outcome assessment protocols are not available, or when a drug treatment causes disturbing adverse effects and attending veterinarians or caregivers call for a decision in the absence of the scientist responsible for the study. It usually helps in making the right decision to know whether an adverse effect is observed in a treatment group where it can be expected. Such situations should be foreseen, and appropriate guidance should be made available to anyone directly or indirectly involved in an experiment. A proper study design should define a backup person with access to the blinding codes and include clear definitions of endpoints.
Several practical tips can help to reduce the risk of human error. For example, study conduct can be greatly facilitated if each treatment group is assigned its own color. This color coding is then applied to the vials with the test drugs, the syringes used to apply the drugs, and the subjects (e.g., apply solution from a green-labeled vial using a green-labeled syringe to an animal from a green-labeled cage or with a green mark on its tail). When following such a practice, one should not forget to randomly assign the color codes to the treatment conditions – otherwise, for example, yellow ends up always being used for the vehicle control, green for the lowest dose, and so forth.
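A one-line illustration in R of randomly mapping colors to treatments (the labels are ours):

```r
# Randomly assign color codes to treatment conditions so that, e.g.,
# yellow does not always mean vehicle.
set.seed(5)
treatments <- c("vehicle", "dose 1", "dose 2", "dose 3")
colors     <- c("yellow", "green", "blue", "red")
data.frame(treatment = treatments, color = sample(colors))
```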
To sum up, it is not always a lack of resources that makes full blinding impossible to apply. Further, similar to what was described above for randomization, there are clear exceptions where the application of blinding is made problematic by the very nature of the experiment itself.
Most, if not all, guidelines, recommendations, and other texts on Good Research Practice emphasize the importance of blinding and randomization (chapters “Guidelines and Initiatives for Good Research Practice”, and “General Principles of Preclinical Study Design”). There is, however, very limited specific guidance on when and how to apply blinding and randomization. The present chapter aims to close this gap.
Generally speaking, experiments should be blinded and randomized if:
The experiment is confirmatory research (see chapter “Resolving the Tension Between Exploration and Confirmation in Preclinical Biomedical Research”) that has a major impact on decision-making and cannot be readily repeated (for ethical or resource-related reasons).
No other measures can be applied to protect against existing and potential risks of bias.
There are various sources of bias that affect the outcomes of experimental studies, and these sources are unique and specific to each research unit. There is usually no one who knows these risks better than the scientists working in the research unit, and it is always up to the scientist to decide if, when, and how blinding and randomization should be implemented. However, several recommendations can help one decide and act in the most effective way:
Conduct a risk assessment for your research environment and, if you do not know how to do that, ask for professional support or advice.
Involve your team in developing and implementing the blinding/randomization protocols, and seek the team members’ feedback regarding the performance of these protocols (and revise them, as needed).
Provide training not only on how to administer blinding and randomization but also to preempt any questions related to the rationale behind these measures (i.e., experiments are blinded not because of the suspected misconduct or lack of trust).
Describe blinding and randomization procedures in dedicated protocols with as many details as possible (including emergency plans and accident reporting, as discussed above).
Ensure maximal transparency when reporting blinding and randomization (e.g., in a publication). When deciding to apply blinding and randomization, be maximally clear about the details (Table 4). When deciding against them, be open about the reasons for such a decision. Transparency is also essential when conducting multi-laboratory collaborative projects or when a study is outsourced to another laboratory. To avoid any misunderstanding, collaborators should specify expectations and reach alignment on study design prior to the experiment and communicate all important details in study reports.
Blinding and randomization should always be part of a more general effort to introduce and maintain research rigor. Just as randomization increases the likelihood that blinding will not be omitted (van der Worp et al. 2010), other Good Research Practices, such as proper documentation, are also highly instrumental in making blinding and randomization effective.
To conclude, blinding and randomization may be associated with some effort and additional costs but, under all circumstances, the decision to apply these research rigor techniques should not be based on general statements and arguments from those who do not want to leave their comfort zone. Instead, the decision should be based on an applicable risk assessment and a careful review of the potential implementation burden. In many cases, this leads to the relieving discovery that the devil is not so black as he is painted.
https://stats.stackexchange.com/questions/74350/is-randomization-reliable-with-small-samples
Carroll L (1871) Through the looking-glass, and what Alice found there. ICU Publishing
Fitzpatrick BG, Koustova E, Wang Y (2018) Getting personal with the “reproducibility crisis”: interviews in the animal research community. Lab Anim 47:175–177
Goldin C, Rouse C (2000) Orchestrating impartiality: the impact of “blind” auditions on female musicians. Am Econ Rev 90:715–741
Hooijmans CR, Rovers MM, de Vries RB, Leenaars M, Ritskes-Hoitinga M, Langendam MW (2014) SYRCLE’s risk of bias tool for animal studies. BMC Med Res Methodol 14:43
ICH Harmonised Tripartite Guideline (1998) Statistical principles for clinical trials (E9). CPMP/ICH/363/96, March 1998
Jones B, Kenward MG (2003) Design and analysis of cross-over designs, 2nd edn. Chapman and Hall, London
Kalish LA, Begg GB (1985) Treatment allocation methods in clinical trials a review. Stat Med 4:129–144
Krauth D, Woodruff TJ, Bero L (2013) Instruments for assessing risk of bias and other methodological criteria of published animal studies: a systematic review. Environ Health Perspect 121:985–992
Macleod MR, The NPQIP Collaborative Group (2017) Findings of a retrospective, controlled cohort study of the impact of a change in Nature journals’ editorial policy for life sciences research on the completeness of reporting study design and execution. bioRxiv:187245. https://doi.org/10.1101/187245
Macleod MR, van der Worp HB, Sena ES, Howells DW, Dirnagl U, Donnan GA (2008) Evidence for the efficacy of NXY-059 in experimental focal cerebral ischaemia is confounded by study quality. Stroke 39:2824–2829
Shun-Shin MJ, Francis DP (2013) Why even more clinical research studies may be false: effect of asymmetrical handling of clinically unexpected values. PLoS One 8(6):e65323
van der Worp HB, Howells DW, Sena ES, Porritt MJ, Rewell S, O’Collins V, Macleod MR (2010) Can animal models of disease reliably inform human studies? PLoS Med 7(3):e1000245
The authors would like to thank Dr. Thomas Steckler (Janssen), Dr. Kim Wever (Radboud University), and Dr. Jan Vollert (Imperial College London) for reading the earlier version of the manuscript and providing comments and suggestions.
Authors and Affiliations
Anton Bespalov, Partnership for Assessment and Accreditation of Scientific Practice, Heidelberg, Germany, and Pavlov Medical University, St. Petersburg, Russia
Karsten Wicke, AbbVie, Ludwigshafen, Germany
Vincent Castagné, Porsolt, Le Genest-Saint-Isle, France
Correspondence to Anton Bespalov.
Editors and Affiliations
Anton Bespalov, Partnership for Assessment & Accreditation of Scientific Practice, Heidelberg, Germany
Martin C. Michel, Department of Pharmacology, Johannes Gutenberg University, Mainz, Germany
Thomas Steckler, Janssen Pharmaceutica N.V., Beerse, Belgium
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
© 2019 The Author(s)
Bespalov, A., Wicke, K., Castagné, V. (2019). Blinding and Randomization. In: Bespalov, A., Michel, M., Steckler, T. (eds) Good Research Practice in Non-Clinical Pharmacology and Biomedicine. Handbook of Experimental Pharmacology, vol 257. Springer, Cham. https://doi.org/10.1007/164_2019_279
Published: 07 November 2019
Publisher: Springer, Cham
Print ISBN: 978-3-030-33655-4
Online ISBN: 978-3-030-33656-1
An overview of randomization techniques: an unbiased assessment of outcome in clinical research
Department of Biostatistics, National Institute of Animal Nutrition & Physiology (NIANP), Adugodi, Bangalore, India
Randomization as a method of experimental control has been extensively used in human clinical trials and other biological experiments. It prevents selection bias and insures against accidental bias. It produces comparable groups and eliminates sources of bias in treatment assignments. Finally, it permits the use of probability theory to express the likelihood that chance is responsible for any difference in the end outcome. This paper discusses the different methods of randomization and the use of online statistical computing web programming (www.graphpad.com/quickcalcs or www.randomization.com) to generate randomization schedules. Issues related to randomization are also discussed in this paper.
A good experiment or trial minimizes the variability of the evaluation and provides unbiased evaluation of the intervention by avoiding confounding from other factors, both known and unknown. Randomization ensures that each patient has an equal chance of receiving any of the treatments under study and generates comparable intervention groups that are alike in all important aspects except for the intervention each group receives. It also provides a basis for the statistical methods used in analyzing the data. The basic benefits of randomization are as follows: it eliminates selection bias, balances the groups with respect to many known and unknown confounding or prognostic variables, and forms the basis for statistical tests, i.e., a basis for an assumption-free statistical test of the equality of treatments. In general, a randomized experiment is an essential tool for testing the efficacy of a treatment.
In practice, randomization requires generating randomization schedules, which should be reproducible. Generating a randomization schedule usually includes obtaining random numbers and assigning them to each subject or treatment condition. Random numbers can be generated by computers or taken from random number tables found in most statistics textbooks. For simple experiments with a small number of subjects, randomization can be performed easily by assigning random numbers from random number tables to the treatment conditions. However, for large sample sizes, or if restricted or stratified randomization is to be performed, or if an unbalanced allocation ratio will be used, it is better to do the randomization with computer software such as SAS or the R environment.[1–6]
Researchers in life science research demand randomization for several reasons. First, subjects in the various groups should not differ in any systematic way. In clinical research, if treatment groups are systematically different, the results will be biased. Suppose that subjects are assigned to control and treatment groups in a study examining the efficacy of a surgical intervention. If a greater proportion of older subjects is assigned to the treatment group, then the outcome of the surgical intervention may be influenced by this imbalance. The effects of the treatment would be indistinguishable from the influence of the imbalance of covariates, thereby requiring the researcher to control for the covariates in the analysis to obtain an unbiased result.[7,8]
Second, proper randomization ensures no a priori knowledge of group assignment (i.e., allocation concealment). That is, researchers, subjects, patients, and other participants should not know to which group a subject will be assigned. Knowledge of group assignment creates a layer of potential selection bias that may taint the data.[9] Schulz and Grimes stated that trials with inadequate or unclear randomization tended to overestimate treatment effects by up to 40% compared with those that used proper randomization. The outcome of the research can be negatively influenced by such inadequate randomization.
Statistical techniques such as analysis of covariance (ANCOVA) and multivariate ANCOVA are often used to adjust for covariate imbalance in the analysis stage of clinical research. However, the interpretation of this post-adjustment approach is often difficult because imbalance of covariates frequently leads to unanticipated interaction effects, such as unequal slopes among subgroups of covariates.[1] One of the critical assumptions of ANCOVA is that the slopes of the regression lines are the same for each group of covariates. The adjustment needed for each covariate group may vary, which is problematic because ANCOVA uses the average slope across the groups to adjust the outcome variable. Thus, the ideal way of balancing covariates among groups is to apply sound randomization in the design stage of clinical research (before the adjustment procedure) rather than after data collection. In such instances, random assignment is necessary and guarantees the validity of the statistical tests of significance that are used to compare treatments.
Many procedures have been proposed for the random assignment of participants to treatment groups in clinical trials. In this article, common randomization techniques, including simple randomization, block randomization, stratified randomization, and covariate adaptive randomization, are reviewed. Each method is described along with its advantages and disadvantages. It is very important to select a method that will produce interpretable and valid results for your study. The use of online software to generate randomization codes using the block randomization procedure is also presented.
Randomization based on a single sequence of random assignments is known as simple randomization.[3] This technique maintains complete randomness of the assignment of a subject to a particular group. The most common and basic method of simple randomization is flipping a coin. For example, with two treatment groups (control versus treatment), the side of the coin (i.e., heads, control; tails, treatment) determines the assignment of each subject. Other methods include using a shuffled deck of cards (e.g., even, control; odd, treatment) or throwing a die (e.g., 3 and below, control; over 3, treatment). A random number table found in a statistics book or computer-generated random numbers can also be used for simple randomization of subjects.
This randomization approach is simple and easy to implement in clinical research. In large trials, simple randomization can be trusted to generate similar numbers of subjects among groups. However, randomization results could be problematic in clinical research with relatively small sample sizes, resulting in an unequal number of participants among groups.
The block randomization method is designed to randomize subjects into groups that result in equal sample sizes. This method is used to ensure a balance in sample size across groups over time. Blocks are small and balanced with predetermined group assignments, which keeps the numbers of subjects in each group similar at all times.[1,2] The block size is determined by the researcher and should be a multiple of the number of groups (e.g., with two treatment groups, a block size of 4, 6, or 8). Blocks are best used in smaller increments, as researchers can more easily control the balance.[10]
After block size has been determined, all possible balanced combinations of assignment within the block (i.e., equal number for all groups within the block) must be calculated. Blocks are then randomly chosen to determine the patients’ assignment into the groups.
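As a complement to the online tools discussed below, a minimal R sketch of this enumerate-then-draw procedure (block size 4, two groups; all names are ours) might be:

```r
# Enumerate all balanced within-block assignments (2 x A, 2 x B),
# then randomly draw blocks to build the schedule.
grid <- expand.grid(rep(list(c("A", "B")), 4), stringsAsFactors = FALSE)
balanced <- as.matrix(grid[rowSums(grid == "A") == 2, ])   # the 6 balanced blocks

set.seed(1)
picks <- sample(nrow(balanced), size = 3, replace = TRUE)  # choose 3 blocks
schedule <- as.vector(t(balanced[picks, ]))                # 12 patients, block by block
schedule
```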
Although balance in sample size may be achieved with this method, groups may be generated that are rarely comparable in terms of certain covariates. For example, one group may have more participants with secondary diseases (e.g., diabetes, multiple sclerosis, cancer, hypertension, etc.) that could confound the data and may negatively influence the results of the clinical trial.[ 11 ] Pocock and Simon stressed the importance of controlling for these covariates because of serious consequences to the interpretation of the results. Such an imbalance could introduce bias in the statistical analysis and reduce the power of the study. Hence, sample size and covariates must be balanced in clinical research.
The stratified randomization method addresses the need to control and balance the influence of covariates. This method can be used to achieve balance among groups in terms of subjects’ baseline characteristics (covariates). Specific covariates must be identified by the researcher who understands the potential influence each covariate has on the dependent variable. Stratified randomization is achieved by generating a separate block for each combination of covariates, and subjects are assigned to the appropriate block of covariates. After all subjects have been identified and assigned into blocks, simple randomization is performed within each block to assign subjects to one of the groups.
The stratified randomization method controls for the possible influence of covariates that would jeopardize the conclusions of the clinical research. For example, a clinical research of different rehabilitation techniques after a surgical procedure will have a number of covariates. It is well known that the age of the subject affects the rate of prognosis. Thus, age could be a confounding variable and influence the outcome of the clinical research. Stratified randomization can balance the control and treatment groups for age or other identified covariates. Although stratified randomization is a relatively simple and useful technique, especially for smaller clinical trials, it becomes complicated to implement if many covariates must be controlled.[ 12 ] Stratified randomization has another limitation; it works only when all subjects have been identified before group assignment. However, this method is rarely applicable because clinical research subjects are often enrolled one at a time on a continuous basis. When baseline characteristics of all subjects are not available before assignment, using stratified randomization is difficult.[ 10 ]
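For illustration, a minimal R sketch of stratified randomization with a single hypothetical stratifying covariate (age class; all names and data are ours):

```r
# Group subjects by stratum, then perform simple randomization within each.
set.seed(8)
subjects <- data.frame(
  id      = 1:12,
  stratum = rep(c("young", "old"), each = 6)   # hypothetical covariate strata
)
subjects$group <- NA
for (s in unique(subjects$stratum)) {
  idx <- which(subjects$stratum == s)
  # balanced simple randomization within the stratum
  subjects$group[idx] <- sample(rep(c("A", "B"), each = length(idx) / 2))
}
subjects
```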
One potential problem with small to moderate-sized clinical research is that simple randomization (with or without taking stratification of prognostic variables into account) may result in an imbalance of important covariates among treatment groups. Imbalance of covariates is important because of its potential to influence the interpretation of research results. Covariate adaptive randomization has been recommended by many researchers as a valid alternative randomization method for clinical research.[8,13] In covariate adaptive randomization, a new participant is sequentially assigned to a particular treatment group by taking into account the specific covariates and previous assignments of participants.[7] Covariate adaptive randomization uses the method of minimization, assessing the imbalance of sample size among several covariates.
Using the online randomization tool at http://www.graphpad.com/quickcalcs/index.cfm, a researcher can generate a randomization plan for assigning treatments to patients. This online software is very simple and easy to use. Up to 10 treatments can be allocated to patients, and the replication of treatments can be performed up to 9 times. The major limitation of this software is that, once a randomization plan has been generated, the same plan cannot be regenerated, because the seed is taken from the local computer clock and is not displayed for later use. Another limitation is that a maximum of only 10 treatments can be assigned to patients. On entering the web address http://www.graphpad.com/quickcalcs/index.cfm in the address bar of any browser, the GraphPad page appears with a number of options. Select the option “Random Numbers” and press continue; the Random Number Calculator appears with three options. Select the tab “Randomly assign subjects to groups” and press continue. On the next page, enter the number of subjects in each group in the “Assign” tab, select the number of groups from the “Subjects to each group” tab, and keep the number 1 in the repeat tab if there is no replication in the study. For example, if the total number of patients in a three-group experimental study is 30 and each group is to be assigned 10 patients, type 10 in the “Assign” tab, select 3 in the “Subjects to each group” tab, and then press the “do it” button. The result is obtained as shown below (partial output presented).
Another online tool that can be used to generate randomization plans is http://www.randomization.com. The seed for the random number generator[14,15] (Wichmann and Hill, 1982, as modified by McLeod, 1985) is obtained from the clock of the local computer and is printed at the bottom of the randomization plan. If a seed is included in the request, it overrides the value obtained from the clock and can be used to reproduce or verify a particular plan. Up to 20 treatments can be specified. The randomization plan is not affected by the order in which the treatments are entered or by the particular boxes left blank if not all are needed. The program begins by sorting the treatment names internally. The sorting is case sensitive, however, so the same capitalization should be used when recreating an earlier plan. For example, to allocate 10 patients to two groups of 5 patients each, first enter the treatment labels in the boxes, then enter the total number of patients (10) in the “Number of subjects per block” tab and 1 in the “Number of blocks” tab for simple randomization (or more than one block for block randomization). The output of this online software is presented as follows.
The benefits of randomization are numerous. It insures against accidental bias in the experiment and produces comparable groups in all respects except the intervention each group receives. The purpose of this paper was to introduce randomization, including its concept and significance, and to review several randomization techniques to guide researchers and practitioners in better designing their randomized clinical trials. The use of online randomization tools was also demonstrated for the benefit of researchers. Simple randomization works well for large clinical trials (n > 100); for small to moderate clinical trials (n < 100) without covariates, block randomization helps to achieve balance. For small to moderate-sized clinical trials with several prognostic factors or covariates, the covariate adaptive randomization method may be more useful in providing a means to achieve treatment balance.
Source of Support: Nil
Conflict of Interest: None declared.
Chapter 1 Principles of Experimental Design
1.1 Introduction
The validity of conclusions drawn from a statistical analysis crucially hinges on the manner in which the data are acquired, and even the most sophisticated analysis will not rescue a flawed experiment. Planning an experiment and thinking about the details of data acquisition is so important for a successful analysis that R. A. Fisher—who single-handedly invented many of the experimental design techniques we are about to discuss—famously wrote
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ( Fisher 1938 )
(Statistical) design of experiments provides the principles and methods for planning experiments and tailoring the data acquisition to an intended analysis. Design and analysis of an experiment are best considered as two aspects of the same enterprise: the goals of the analysis strongly inform an appropriate design, and the implemented design determines the possible analyses.
The primary aim of designing experiments is to ensure that valid statistical and scientific conclusions can be drawn that withstand the scrutiny of a determined skeptic. Good experimental design also considers that resources are used efficiently, and that estimates are sufficiently precise and hypothesis tests adequately powered. It protects our conclusions by excluding alternative interpretations or rendering them implausible. Three main pillars of experimental design are randomization , replication , and blocking , and we will flesh out their effects on the subsequent analysis as well as their implementation in an experimental design.
An experimental design is always tailored towards predefined (primary) analyses, and an efficient analysis and unambiguous interpretation of the experimental data often follow directly from a good design. This does not prevent us from doing additional analyses of interesting observations after the data are acquired, but these analyses can be subjected to more severe criticisms, and conclusions are more tentative.
In this chapter, we provide the wider context for using experiments in a larger research enterprise and informally introduce the main statistical ideas of experimental design. We use a comparison of two samples as our main example to study how design choices affect an analysis, but postpone a formal quantitative analysis to the next chapters.
For illustrating some of the issues arising in the interplay of experimental design and analysis, we consider a simple example. We are interested in comparing the enzyme levels measured in processed blood samples from laboratory mice, when the sample processing is done either with a kit from a vendor A, or a kit from a competitor B. For this, we take 20 mice and randomly select 10 of them for sample preparation with kit A, while the blood samples of the remaining 10 mice are prepared with kit B. The experiment is illustrated in Figure 1.1 A and the resulting data are given in Table 1.1 .
Table 1.1: Measured enzyme levels in samples prepared with kit A or kit B (10 mice per kit).
Kit A | 8.96 | 8.95 | 11.37 | 12.63 | 11.38 | 8.36 | 6.87 | 12.35 | 10.32 | 11.99
Kit B | 12.68 | 11.37 | 12.00 | 9.81 | 10.35 | 11.76 | 9.01 | 10.83 | 8.76 | 9.99
One option for comparing the two kits is to look at the difference in average enzyme levels, and we find an average level of 10.32 for vendor A and 10.66 for vendor B. We would like to interpret their difference of -0.34 as the difference due to the two preparation kits and conclude whether the two kits give equal results or if measurements based on one kit are systematically different from those based on the other kit.
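These averages can be reproduced directly from Table 1.1, for instance with a few lines of R (ours, not part of the original text):

```r
# Enzyme levels from Table 1.1; values rounded to two decimals as in the text.
kit_A <- c(8.96, 8.95, 11.37, 12.63, 11.38, 8.36, 6.87, 12.35, 10.32, 11.99)
kit_B <- c(12.68, 11.37, 12.00, 9.81, 10.35, 11.76, 9.01, 10.83, 8.76, 9.99)
round(mean(kit_A), 2)                 # 10.32
round(mean(kit_B), 2)                 # 10.66
round(mean(kit_A) - mean(kit_B), 2)   # -0.34
```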
Such interpretation, however, is only valid if the two groups of mice and their measurements are identical in all aspects except the sample preparation kit. If we use one strain of mice for kit A and another strain for kit B, any difference might also be attributed to inherent differences between the strains. Similarly, if the measurements using kit B were conducted much later than those using kit A, any observed difference might be attributed to changes in, e.g., mice selected, batches of chemicals used, device calibration, or any number of other influences. None of these competing explanations for an observed difference can be excluded from the given data alone, but good experimental design allows us to render them (almost) arbitrarily implausible.
A second aspect for our analysis is the inherent uncertainty in our calculated difference: if we repeat the experiment, the observed difference will change each time, and this will be more pronounced for a smaller number of mice, among others. If we do not use a sufficient number of mice in our experiment, the uncertainty associated with the observed difference might be too large, such that random fluctuations become a plausible explanation for the observed difference. Systematic differences between the two kits, of practically relevant magnitude in either direction, might then be compatible with the data, and we can draw no reliable conclusions from our experiment.
In each case, the statistical analysis—no matter how clever—was doomed before the experiment was even started, while simple ideas from statistical design of experiments would have provided correct and robust results with interpretable conclusions.
By an experiment we understand an investigation where the researcher has full control over selecting and altering the experimental conditions of interest, and we only consider investigations of this type. The selected experimental conditions are called treatments . An experiment is comparative if the responses to several treatments are to be compared or contrasted. The experimental units are the smallest subdivision of the experimental material to which a treatment can be assigned. All experimental units given the same treatment constitute a treatment group . Especially in biology, we often compare treatments to a control group to which some standard experimental conditions are applied; a typical example is using a placebo for the control group, and different drugs for the other treatment groups.
The values observed are called responses and are measured on the response units ; these are often identical to the experimental units but need not be. Multiple experimental units are sometimes combined into groupings or blocks , such as mice grouped by litter, or samples grouped by batches of chemicals used for their preparation. More generally, we call any grouping of the experimental material (even with group size one) a unit .
In our example, we selected the mice, used a single sample per mouse, deliberately chose the two specific vendors, and had full control over which kit to assign to which mouse. In other words, the two kits are the treatments and the mice are the experimental units. We took the measured enzyme level of a single sample from a mouse as our response, and samples are therefore the response units. The resulting experiment is comparative, because we contrast the enzyme levels between the two treatment groups.
Figure 1.1: Three designs to determine the difference between two preparation kits A and B based on four mice. A: One sample per mouse. Comparison between averages of samples with same kit. B: Two samples per mouse treated with the same kit. Comparison between averages of mice with same kit requires averaging responses for each mouse first. C: Two samples per mouse each treated with different kit. Comparison between two samples of each mouse, with differences averaged.
In this example, we can coalesce experimental and response units, because we have a single response per mouse and cannot distinguish a sample from a mouse in the analysis, as illustrated in Figure 1.1 A for four mice. Responses from mice with the same kit are averaged, and the kit difference is the difference between these two averages.
By contrast, if we take two samples per mouse and use the same kit for both samples, then the mice are still the experimental units, but each mouse now groups the two response units associated with it. Now, responses from the same mouse are first averaged, and these averages are used to calculate the difference between kits; even though eight measurements are available, this difference is still based on only four mice (Figure 1.1 B).
If we take two samples per mouse, but apply each kit to one of the two samples, then the samples are both the experimental and response units, while the mice are blocks that group the samples. Now, we calculate the difference between kits for each mouse, and then average these differences (Figure 1.1 C).
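The three analyses differ only in how responses are grouped and averaged. A sketch of the blocked analysis of design C, with invented paired measurements for four mice:

```python
import numpy as np

# Hypothetical design C data: one row per mouse,
# columns = (sample prepared with kit A, sample prepared with kit B)
paired = np.array([[10.1, 10.6],
                   [12.3, 12.9],
                   [ 8.7,  9.0],
                   [11.0, 11.3]])

# Within-mouse differences: mouse-to-mouse variation cancels out
per_mouse_diff = paired[:, 0] - paired[:, 1]
print(per_mouse_diff.mean())  # estimated kit difference, averaged over mice
```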
If we only use one kit and determine the average enzyme level, then this investigation is still an experiment, but is not comparative.
To summarize, the design of an experiment determines the logical structure of the experiment; it consists of (i) a set of treatments (the two kits); (ii) a specification of the experimental units, such as animals, cell lines, or samples (the mice in Figure 1.1 A,B and the samples in Figure 1.1 C); (iii) a procedure for assigning treatments to units; and (iv) a specification of the response units and the quantity to be measured as a response (the samples and associated enzyme levels).
Before we embark on the more technical aspects of experimental design, we discuss three components for evaluating an experiment’s validity: construct validity , internal validity , and external validity . These criteria are well-established in areas such as educational and psychological research, and have more recently been discussed for animal research ( Würbel 2017 ) where experiments are increasingly scrutinized for their scientific rationale and their design and intended analyses.
Construct validity concerns the choice of the experimental system for answering our research question. Is the system even capable of providing a relevant answer to the question?
Studying the mechanisms of a particular disease, for example, might require careful choice of an appropriate animal model that shows a disease phenotype and is accessible to experimental interventions. If the animal model is a proxy for drug development for humans, biological mechanisms must be sufficiently similar between animal and human physiologies.
Another important aspect of the construct is the quantity that we intend to measure (the measurand ), and its relation to the quantity or property we are interested in. For example, we might measure the concentration of the same chemical compound once in a blood sample and once in a highly purified sample, and these constitute two different measurands, whose values might not be comparable. Often, the quantity of interest (e.g., liver function) is not directly measurable (or even quantifiable) and we measure a biomarker instead. For example, pre-clinical and clinical investigations may use concentrations of proteins or counts of specific cell types from blood samples, such as the CD4+ cell count used as a biomarker for immune system function.
The internal validity of an experiment concerns the soundness of the scientific rationale, statistical properties such as precision of estimates, and the measures taken against risk of bias. It refers to the validity of claims within the context of the experiment. Statistical design of experiments plays a prominent role in ensuring internal validity, and we briefly discuss the main ideas before providing the technical details and an application to our example in the subsequent sections.
The scientific rationale of a study is (usually) not immediately a statistical question. Translating a scientific question into a quantitative comparison amenable to statistical analysis is no small task and often requires careful consideration. It is a substantial, if non-statistical, benefit of using experimental design that we are forced to formulate a precise-enough research question and decide on the main analyses required for answering it before we conduct the experiment. For example, the question “Is there a difference between placebo and drug?” is insufficiently precise for planning a statistical analysis and determining an adequate experimental design. What exactly is the drug treatment? What should the drug’s concentration be and how is it administered? How do we make sure that the placebo group is comparable to the drug group in all other aspects? What do we measure, and what do we mean by “difference”: a shift in average response, a fold-change, a change in response before and after treatment?
The scientific rationale also enters the choice of a potential control group to which we compare responses. The quote
The deep, fundamental question in statistical analysis is ‘Compared to what?’ ( Tufte 1997 )
highlights the importance of this choice.
There are almost never enough resources to answer all relevant scientific questions. We therefore define a few questions of highest interest, and the main purpose of the experiment is answering these questions in the primary analysis . This intended analysis drives the experimental design to ensure relevant estimates can be calculated and have sufficient precision, and tests are adequately powered. This does not preclude us from conducting additional secondary analyses and exploratory analyses , but we are not willing to enlarge the experiment to ensure that strong conclusions can also be drawn from these analyses.
Experimental bias is a systematic difference in response between experimental units in addition to the difference caused by the treatments. The experimental units in the different groups are then not equal in all aspects other than the treatment applied to them. We saw several examples in Section 1.2 .
Minimizing the risk of bias is crucial for internal validity and we look at some common measures to eliminate or reduce different types of bias in Section 1.5 .
Another aspect of internal validity is the precision of estimates and the expected effect sizes. Is the experimental setup, in principle, able to detect a difference of relevant magnitude? Experimental design offers several methods for answering this question based on the expected heterogeneity of samples, the measurement error, and other sources of variation: power analysis is a technique for determining the number of samples required to reliably detect a relevant effect size and provide estimates of sufficient precision. More samples yield more precision and more power, but we have to be careful that replication is done at the right level: simply measuring a biological sample multiple times as in Figure 1.1 B yields more measured values, but is pseudo-replication for analyses. Replication should also ensure that the statistical uncertainties of estimates can be gauged from the data of the experiment itself, without additional untestable assumptions. Finally, the technique of blocking , shown in Figure 1.1 C, can remove a substantial proportion of the variation and thereby increase power and precision if we find a way to apply it.
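A power analysis along these lines can be carried out with standard software; a minimal sketch using statsmodels, assuming (as an illustration only) a standardized effect size of d = 1 as the smallest difference of interest:

```python
from statsmodels.stats.power import TTestIndPower

# Mice per group needed to detect a standardized effect of d = 1
# with a two-sided 5% test at 80% power
n_per_group = TTestIndPower().solve_power(effect_size=1.0, alpha=0.05,
                                          power=0.8, alternative='two-sided')
print(n_per_group)  # roughly 17 mice per group
```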
The external validity of an experiment concerns its replicability and the generalizability of inferences. An experiment is replicable if its results can be confirmed by an independent new experiment, preferably by a different lab and researcher. Experimental conditions in the replicate experiment usually differ from the original experiment, which provides evidence that the observed effects are robust to such changes. A much weaker condition on an experiment is reproducibility , the property that an independent researcher draws equivalent conclusions based on the data from this particular experiment, using the same analysis techniques. Reproducibility requires publishing the raw data, details on the experimental protocol, and a description of the statistical analyses, preferably with accompanying source code. Many scientific journals subscribe to reporting guidelines to ensure reproducibility and these are also helpful for planning an experiment.
A main threat to replicability and generalizability are too tightly controlled experimental conditions, when inferences only hold for a specific lab under the very specific conditions of the original experiment. Introducing systematic heterogeneity and using multi-center studies effectively broadens the experimental conditions and therefore the inferences for which internal validity is available.
For systematic heterogeneity , experimental conditions are systematically altered in addition to the treatments, and treatment differences estimated for each condition. For example, we might split the experimental material into several batches and use a different day of analysis, sample preparation, batch of buffer, measurement device, and lab technician for each batch. A more general inference is then possible if effect size, effect direction, and precision are comparable between the batches, indicating that the treatment differences are stable over the different conditions.
In multi-center experiments , the same experiment is conducted in several different labs and the results compared and merged. Multi-center approaches are very common in clinical trials and often necessary to reach the required number of patient enrollments.
Generalizability of randomized controlled trials in medicine and animal studies can suffer from overly restrictive eligibility criteria. In clinical trials, patients are often included or excluded based on co-medications and co-morbidities, and the resulting sample of eligible patients might no longer be representative of the patient population. For example, Travers et al. ( 2007 ) used the eligibility criteria of 17 randomized controlled trials of asthma treatments and found that, out of 749 patients, only a median of 6% (45 patients) would have been eligible for an asthma-related randomized controlled trial. This puts a question mark on the relevance of the trials’ findings for asthma patients in general.
1.5.1 Randomization of treatment allocation
If systematic differences other than the treatment exist between our treatment groups, then the effect of the treatment is confounded with these other differences and our estimates of treatment effects might be biased.
We remove such unwanted systematic differences from our treatment comparisons by randomizing the allocation of treatments to experimental units. In a completely randomized design , each experimental unit has the same chance of being subjected to any of the treatments, and any differences between the experimental units other than the treatments are distributed over the treatment groups. Importantly, randomization is the only method that also protects our experiment against unknown sources of bias: we do not need to know all or even any of the potential differences and yet their impact is eliminated from the treatment comparisons by random treatment allocation.
Randomization has two effects: (i) differences unrelated to treatment become part of the ‘statistical noise’ rendering the treatment groups more similar; and (ii) the systematic differences are thereby eliminated as sources of bias from the treatment comparison.
Randomization transforms systematic variation into random variation.
In our example, a proper randomization would select 10 out of our 20 mice fully at random, such that each mouse has the same chance of 1/2 of ending up in the kit A group. These ten mice are then assigned to kit A, and the remaining mice to kit B. This allocation is entirely independent of the treatments and of any properties of the mice.
To ensure random treatment allocation, some kind of random process needs to be employed. This can be as simple as shuffling a pack of 10 red and 10 black cards or using a software-based random number generator. Randomization is slightly more difficult if the number of experimental units is not known at the start of the experiment, such as when patients are recruited for an ongoing clinical trial (sometimes called rolling recruitment ), and we want to have reasonable balance between the treatment groups at each stage of the trial.
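With a software random number generator, a completely randomized allocation takes only a few lines; a minimal sketch for the 20-mice example:

```python
import numpy as np

rng = np.random.default_rng(seed=2024)  # fixed seed makes the allocation reproducible
mice = np.arange(1, 21)                 # mouse identifiers 1..20
shuffled = rng.permutation(mice)        # random order; every allocation equally likely
kit_a, kit_b = shuffled[:10], shuffled[10:]
print("Kit A:", sorted(kit_a))
print("Kit B:", sorted(kit_b))
```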
Seemingly random assignments “by hand” are usually no less complicated than fully random assignments, but are always inferior. If surprising results ensue from the experiment, such assignments are subject to unanswerable criticism and suspicion of unwanted bias. Even worse are systematic allocations; they can only remove bias from known causes, and immediately raise red flags under the slightest scrutiny.
Even with a fully random treatment allocation procedure, we might end up with an undesirable allocation. For our example, the treatment group of kit A might—just by chance—contain mice that are all bigger or more active than those in the other treatment group. Statistical orthodoxy recommends using the design nevertheless, because only full randomization guarantees valid estimates of residual variance and unbiased estimates of effects. This argument, however, concerns the long-run properties of the procedure and seems of little help in this specific situation. Why should we care if the randomization yields correct estimates under replication of the experiment, if the particular experiment is jeopardized?
Another solution is to create a list of all possible allocations that we would accept and randomly choose one of these allocations for our experiment. The analysis should then reflect this restriction in the possible randomizations, which often renders this approach difficult to implement.
The most pragmatic method is to reject highly undesirable designs and compute a new randomization ( Cox 1958 ) . Undesirable allocations are unlikely to arise for large sample sizes, and we might accept a small bias in estimation for small sample sizes, when uncertainty in the estimated treatment effect is already high. In this approach, whenever we reject a particular outcome, we must also be willing to reject the outcome if we permute the treatment level labels. If we reject eight big and two small mice for kit A, then we must also reject two big and eight small mice. We must also be transparent and report a rejected allocation, so that critics may come to their own conclusions about potential biases and their remedies.
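A sketch of this re-randomization idea, assuming hypothetical body weights and an arbitrary imbalance threshold; note that the acceptance criterion is symmetric in the group labels, as required:

```python
import numpy as np

rng = np.random.default_rng(7)
weights = rng.normal(25.0, 3.0, 20)  # hypothetical body weights (g) of the 20 mice

while True:
    perm = rng.permutation(20)
    group_a, group_b = weights[perm[:10]], weights[perm[10:]]
    # Reject allocations whose groups differ too much in mean weight;
    # |mean A - mean B| is unchanged if the labels A and B are swapped,
    # so rejecting one allocation also rejects its label-permuted twin.
    if abs(group_a.mean() - group_b.mean()) < 1.0:
        break
print(perm[:10], perm[10:])
```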
Bias in treatment comparisons is also introduced if treatment allocation is random, but responses cannot be measured entirely objectively, or if knowledge of the assigned treatment affects the response. In clinical trials, for example, patients might react differently when they know they are on a placebo treatment, an effect known as cognitive bias . In animal experiments, caretakers might report more abnormal behavior for animals on a more severe treatment. Cognitive bias can be eliminated by concealing the treatment allocation from technicians or participants of a clinical trial, a technique called single-blinding .
If response measures are partially based on professional judgement (such as a clinical scale), patients or physicians might unconsciously report lower scores for a placebo treatment, a phenomenon known as observer bias . Its removal requires double blinding , where treatment allocations are additionally concealed from the experimentalist.
Blinding requires randomized treatment allocation to begin with and substantial effort might be needed to implement it. Drug companies, for example, have to go to great lengths to ensure that a placebo looks, tastes, and feels similar enough to the actual drug. Additionally, blinding is often done by coding the treatment conditions and samples, and effect sizes and statistical significance are calculated before the code is revealed.
In clinical trials, double-blinding creates a conflict of interest. The attending physicians do not know which patient received which treatment, and thus accumulation of side-effects cannot be linked to any treatment. For this reason, clinical trials have a data monitoring committee, not involved in the final analysis, that performs intermediate analyses of efficacy and safety at predefined intervals. If severe problems are detected, the committee might recommend altering or aborting the trial. The same might happen if one treatment already shows overwhelming evidence of superiority, such that it becomes unethical to withhold this treatment from the other patients.
An often overlooked source of bias has been termed the researcher degrees of freedom or garden of forking paths in the data analysis. For any set of data, there are many different options for its analysis: some results might be considered outliers and discarded, assumptions are made on error distributions and appropriate test statistics, different covariates might be included into a regression model. Often, multiple hypotheses are investigated and tested, and analyses are done separately on various (overlapping) subgroups. Hypotheses formed after looking at the data require additional care in their interpretation; almost never will \(p\) -values for these ad hoc or post hoc hypotheses be statistically justifiable. Many different measured response variables invite fishing expeditions , where patterns in the data are sought without an underlying hypothesis. Only reporting those sub-analyses that gave ‘interesting’ findings invariably leads to biased conclusions and is called cherry-picking or \(p\) -hacking (or much less flattering names).
The statistical analysis is always part of a larger scientific argument and we should consider the necessary computations in relation to building our scientific argument about the interpretation of the data. In addition to the statistical calculations, this interpretation requires substantial subject-matter knowledge and includes (many) non-statistical arguments. Two quotes highlight that experiment and analysis are a means to an end and not the end in itself.
There is a boundary in data interpretation beyond which formulas and quantitative decision procedures do not go, where judgment and style enter. ( Abelson 1995 )
Often, perfectly reasonable people come to perfectly reasonable decisions or conclusions based on nonstatistical evidence. Statistical analysis is a tool with which we support reasoning. It is not a goal in itself. ( Bailar III 1981 )
There is often a grey area between exploiting researcher degrees of freedom to arrive at a desired conclusion, and creative yet informed analyses of data. One way to navigate this area is to distinguish between exploratory studies and confirmatory studies . The former have no clearly stated scientific question, but are used to generate interesting hypotheses by identifying potential associations or effects that are then further investigated. Conclusions from these studies are very tentative and must be reported honestly as such. In contrast, standards are much higher for confirmatory studies, which investigate a specific predefined scientific question. Analysis plans and pre-registration of an experiment are accepted means for demonstrating lack of bias due to researcher degrees of freedom, and separating primary from secondary analyses allows emphasizing the main goals of the study.
The analysis plan is written before conducting the experiment and details the measurands and estimands, the hypotheses to be tested together with a power and sample size calculation, a discussion of relevant effect sizes, detection and handling of outliers and missing data, as well as steps for data normalization such as transformations and baseline corrections. If a regression model is required, its factors and covariates are outlined. Particularly in biology, handling measurements below the limit of quantification and saturation effects require careful consideration.
In the context of clinical trials, the problem of estimands has become a recent focus of attention. An estimand is the target of a statistical estimation procedure, for example the true average difference in enzyme levels between the two preparation kits. A main problem in many studies is post-randomization events that can change the estimand, even if the estimation procedure remains the same. For example, if kit B fails to produce usable samples for measurement in five out of ten cases because the enzyme level was too low, while kit A could handle these enzyme levels perfectly fine, then this might severely exaggerate the observed difference between the two kits. Similar problems arise in drug trials, when some patients stop taking one of the drugs due to side-effects or other complications.
Registration of experiments is an even more stringent measure used in conjunction with an analysis plan and is becoming standard in clinical trials. Here, information about the trial, including the analysis plan, the procedure to recruit patients, and stopping criteria, is registered in a public database. Publications based on the trial then refer to this registration, so that reviewers and readers can compare what the researchers intended to do with what they actually did. Similar portals for pre-clinical and translational research are also available.
The problem of measurements and measurands is further discussed for statistics in Hand ( 1996 ) and specifically for biological experiments in Coxon, Longstaff, and Burns ( 2019 ) . A general review of methods for handling missing data is Dong and Peng ( 2013 ) . The different roles of randomization are emphasized in Cox ( 2009 ) .
Two well-known reporting guidelines are the ARRIVE guidelines for animal research ( Kilkenny et al. 2010 ) and the CONSORT guidelines for clinical trials ( Moher et al. 2010 ) . Guidelines describing the minimal information required for reproducing experimental results have been developed for many types of experimental techniques, including microarrays (MIAME), RNA sequencing (MINSEQE), metabolomics (MSI) and proteomics (MIAPE) experiments; the FAIRSHARE initiative provides a more comprehensive collection ( Sansone et al. 2019 ) .
The problems of experimental design in animal experiments and particularly translational research are discussed in Couzin-Frankel ( 2013 ) . Multi-center studies are now considered for these investigations, and using a second laboratory already increases reproducibility substantially ( Richter et al. 2010 ; Richter 2017 ; Voelkl et al. 2018 ; Karp 2018 ) and allows standardizing the treatment effects ( Kafkafi et al. 2017 ) . First attempts at using designs similar to clinical trials have been reported ( Llovera and Liesz 2016 ) . Exploratory-confirmatory research and external validity for animal studies are discussed in Kimmelman, Mogil, and Dirnagl ( 2014 ) and Pound and Ritskes-Hoitinga ( 2018 ) . Further information on pilot studies is found in Moore et al. ( 2011 ) , Sim ( 2019 ) , and Thabane et al. ( 2010 ) .
The deliberate use of statistical analyses and their interpretation for supporting a larger argument was called statistics as principled argument ( Abelson 1995 ) . Employing useless statistical analysis without reference to the actual scientific question is surrogate science ( Gigerenzer and Marewski 2014 ) and adaptive thinking is integral to meaningful statistical analysis ( Gigerenzer 2002 ) .
In an experiment, the investigator has full control over the experimental conditions applied to the experiment material. The experimental design gives the logical structure of an experiment: the units describing the organization of the experimental material, the treatments and their allocation to units, and the response. Statistical design of experiments includes techniques to ensure internal validity of an experiment, and methods to make inference from experimental data efficient.
Completely Randomized Design (CRD) is a research methodology in which experimental units are randomly assigned to treatments without any systematic bias. CRD gained prominence in the early 20th century, largely attributed to the pioneering work of statistician Ronald A. Fisher . His method addressed the inherent variability in experimental units by randomly assigning treatments, thus countering potential biases. Today, CRD serves as an indispensable tool in various domains, including agriculture, medicine, industrial engineering, and quality control analysis.
CRD is particularly favored in situations with limited control over external variables. By leveraging its inherent randomness, CRD neutralizes potentially confounding factors. As a result, each experimental unit has an equal likelihood of receiving any specific treatment, ensuring a level playing field. Such random allocation is pivotal in eliminating systematic bias and bolstering the validity of experimental conclusions.
While CRD may sometimes necessitate larger sample sizes , the improved accuracy and consistency it introduces to results often justify this requirement.
At its core, CRD is centered on harnessing randomness to achieve objective experimental outcomes. This approach effectively addresses unanticipated extraneous variables —those not included in the study design but that can still influence the response variable. In the context of CRD, these extraneous variables are expected to be uniformly distributed across treatments, thereby mitigating their potential influence.
A key aspect of CRD is the single-factor experiment. This means that the experiment revolves around changing or manipulating one primary independent variable (or factor) to ascertain its effect on the dependent variable .
Whatever the field of application, only one key factor or independent variable is intentionally varied, while any changes or outcomes in another variable (the dependent variable) are observed and recorded. This distinct focus on a single variable, while keeping all others constant or controlled, underscores the essence of the single-factor experiment in CRD.
Understanding the strengths of Completely Randomized Design is pivotal for effectively applying this research tool and interpreting results accurately; the comparisons below set these strengths against those of other common designs.
While CRD is marked by simplicity, flexibility, robustness, and enhanced generalizability, it is essential to carefully consider its limitations. A thoughtful analysis of these aspects will guide researchers in making informed decisions about the applicability of CRD to their specific research context.
CRD stands out in the realm of research designs due to its foundational simplicity. While its essence lies in the random assignment of experimental units to treatments without any systematic bias, other designs introduce varying layers of complexity tailored to specific experimental needs.
For instance, consider the Randomized Block Design (RBD) . Unlike the straightforward approach of CRD, RBD divides experimental units into homogenous blocks, based on known sources of variability, before assigning treatments. This method is especially useful when there's an identifiable source of variability that researchers wish to control for. Similarly, the Latin Square Design , while also involving random assignment, operates on a grid system to simultaneously control for two lurking variables , adding another dimension of complexity not found in CRD.
Factorial Design investigates the effects and interactions of multiple independent variables. This design can reveal interactions that might be overlooked in simpler designs. Then there's the Crossover Design , often used in medical trials. Unlike CRD, where each unit experiences only one treatment, in Crossover Design, participants receive multiple treatments over different periods, allowing each participant to serve as their own control.
The choice of research design, whether it be CRD, RBD, Latin Square, or any of the other methods available, is fundamentally guided by the nature of the research question , the characteristics of the experimental units, and the specific objectives the study aims to achieve. However, it's the inherent simplicity and flexibility of CRD that often makes it the go-to choice, especially in scenarios with many units or treatments, where intricate stratification or blocking isn't necessary.
Let us further explore the advantages and disadvantages of each method.
Research Design | Description | Key Features | Advantages | Disadvantages |
---|---|---|---|---|
Completely Randomized Design (CRD) | Employs random assignment of experimental units to treatments without any systematic bias. | Simple and flexible; each unit experiences only one treatment | Simple structure makes it easy to implement | Does not control for any other variables; may require a larger sample size |
Randomized Block Design (RBD) | Divides experimental units into homogenous blocks based on known sources of variability before assigning treatments. | Controls for one source of variability; more complex than CRD | Controls for known variability, potentially increasing the precision of the experiment | More complex to implement and analyze |
Latin Square Design | Uses a grid system to control for two lurking variables. | Controls for two sources of variability; adds complexity not found in CRD | Controls for two sources of variability | Complex design; may not be practical for all experiments |
Factorial Design | Investigates the effects and interactions of multiple independent variables. | Reveals interactions; more complex design | Can assess interactions between factors | Complex and may require a large sample size |
Crossover Design | Participants receive multiple treatments over different periods. | Each participant serves as their own control; often used in medical trials | Each participant can serve as their own control, potentially reducing variability | Period effects and carryover effects can complicate results |
While CRD's simplicity and flexibility make it a popular choice for many research scenarios, the optimal design depends on the specific needs, objectives, and contexts of the study. Researchers must carefully consider these factors to select the most suitable research design method.
Within the framework of experimental research, extraneous variables persistently challenge the validity of findings, potentially compromising the established relationship between independent and dependent variables . CRD serves as a methodological safeguard that systematically addresses these extraneous variables.
The foundational principle underpinning the Completely Randomized Design—randomization—serves as a bulwark against the influences of extraneous variables. By uniformly distributing these variables across experimental conditions, CRD enhances the validity and reliability of experimental outcomes. However, researchers should exercise caution and continuously evaluate potential extraneous influences, even in randomized designs.
The selection of the independent variable is crucial for research design . This pivotal step not only shapes the direction and quality of the research but also underpins the understanding of causal relationships within the studied system, influencing the dependent variable or response. Several critical considerations therefore accompany the choice of this essential component of experimental design .
Identifying the independent variable calls for a methodical and structured approach in which each step aligns with the overarching research objective.
In academic discourse, while CRD is praised for its rigor and clarity, the effectiveness of the design relies heavily on the meticulous selection of the independent variable. Making this choice with thorough consideration ensures the research offers valuable insights with both academic and wider societal implications.
CRD has found wide and varied applications in several areas of research. Its versatility and fundamental simplicity make it an attractive option for scientists and researchers across a multitude of disciplines.
Agricultural research was among the earliest fields to adopt the use of Completely Randomized Design. The broad application of CRD within agriculture not only encompasses crop improvement but also the systematic analysis of various fertilizers, pesticides, and cropping techniques. Agricultural scientists leverage the CRD framework to scrutinize the effects on yield enhancement and bolstered disease resistance. The fundamental randomization in CRD effectively mitigates the influence of nuisance variables such as soil variations and microclimate differences, ensuring more reliable and valid experimental outcomes.
Additionally, CRD in agricultural research paves the way for robust testing of new agricultural products and methods. The unbiased allocation of treatments serves as a solid foundation for accurately determining the efficacy and potential downsides of innovative fertilizers, genetically modified seeds, and novel pest control methods, contributing to informed decision-making and policy formulation in agricultural development.
However, the limitations of CRD within the agricultural context warrant acknowledgment. While it offers an efficient and straightforward approach for experimental design, CRD may not always capture spatial variability within large agricultural fields adequately. Such unaccounted variations can potentially skew results, underscoring the necessity for employing more intricate experimental designs, such as the Randomized Complete Block Design (RCBD), where necessary. This adaptation enhances the reliability and generalizability of the research findings, ensuring their applicability to real-world agricultural challenges.
The fields of medical and health research substantially benefit from the application of Completely Randomized Design, especially in executing randomized control trials. Within this context, participants, whether patients or others, are randomly assigned to either the treatment or control groups. This structured random allocation minimizes the impact of extraneous variables, ensuring that the groups are comparable. It fortifies the assertion that any discernible differences in outcomes are genuinely attributable to the treatment being analyzed, enhancing the robustness and reliability of the research findings.
CRD's randomized nature in medical research allows for a more objective assessment of varied medical treatments and interventions. By mitigating the influence of extraneous variables, researchers can more accurately gauge the effectiveness and potential side effects of novel medical approaches, including pharmaceuticals and surgical techniques. This precision is crucial for the continual advancement of medical science, offering a solid empirical foundation for the refinement of treatments that improve health outcomes and patient quality of life.
However, like other fields, the application of CRD in medical research has its limitations. Despite its effectiveness in controlling various factors, CRD may not always consider the complexity of human health conditions where multiple variables often interact in intricate ways. Hence, while CRD remains a valuable tool for medical research, it is crucial to apply it judiciously and alongside other research designs to ensure comprehensive and reliable insights into medical treatments and interventions.
In industrial engineering, Completely Randomized Design plays a significant role in process and product testing, offering a reliable structure for the evaluation and improvement of industrial systems. Engineers often employ CRD in single-factor experiments to analyze the effects of a particular factor on a certain outcome, enhancing the precision and objectivity of the assessment.
For example, to discern the impact of varying temperatures on the strength of a metal alloy, engineers might utilize CRD. In this scenario, the different temperatures represent the single factor, and the alloy samples are randomly allocated to be tested at each designated temperature. This random assignment minimizes the influence of extraneous variables, ensuring that the observed effects on alloy strength are primarily attributable to the temperature variations.
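A sketch of such a single-factor CRD, with invented temperatures and strength data; the allocation is randomized and the groups are then compared with a one-way ANOVA:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(3)
temperatures = [200, 300, 400]       # hypothetical test temperatures (°C)
order = rng.permutation(15)          # 15 alloy samples, completely randomized
groups = [order[i * 5:(i + 1) * 5] for i in range(3)]  # 5 samples per temperature

# After running the experiment: invented strength measurements (MPa) per group
strengths = [rng.normal(mu, 5.0, 5) for mu in (400.0, 410.0, 395.0)]
F, p = f_oneway(*strengths)          # one-way ANOVA across the three temperatures
print(F, p)
```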
CRD's implementation in industrial engineering also assists in the optimization of manufacturing processes. Through random assignment and structured testing, engineers can effectively evaluate process parameters, such as production speed, material quality, and machine settings. By accurately assessing the influence of these factors on production efficiency and product quality, engineers can implement informed adjustments and enhancements, promoting optimal operational performance and superior product standards. This systematic approach, anchored by CRD, facilitates consistent and robust industrial advancements, bolstering overall productivity and innovation in industrial engineering.
Despite these advantages, it's crucial to acknowledge the limitations of CRD in industrial engineering contexts. The design is efficient for single-factor experiments but may falter with experiments involving multiple factors and interactions, common in industrial settings. This limitation underscores the importance of combining CRD with other experimental designs. Doing so navigates the complex landscape of industrial engineering research, ensuring insights are comprehensive, accurate, and actionable for continuous innovation in industrial operations.
Completely Randomized Design is also beneficial in quality control analysis, where ensuring the consistency of products is paramount.
For instance, a manufacturer keen on minimizing product defects may deploy CRD to empirically assess the effectiveness of various inspection techniques. By randomly assigning different inspection methods to identical or similar production batches, the manufacturer can gather data regarding the most effective techniques for identifying and mitigating defects, bolstering overall product quality and consumer satisfaction.
Furthermore, the utility of CRD in quality control extends to the analysis of materials, machinery settings, or operational processes that are pivotal to final product quality. This design enables organizations to rigorously test and compare assorted conditions or settings, ensuring the selection of parameters that optimize both quality and efficiency. This approach to quality analysis not only bolsters the reliability and performance of products but also significantly augments the optimization of organizational resources, curtailing wastage and improving profitability.
However, similar to other CRD applications, it is crucial to understand its limitations. While CRD can significantly aid in the analysis and optimization of various aspects of quality control, its effectiveness may be constrained when dealing with multi-factorial scenarios with complex interactions. In such situations, other experimental designs, possibly in tandem with CRD, might offer more robust and comprehensive insights, ensuring that quality control measures are not only effective but also adaptable to evolving industrial and market demands.
The breadth of applications for Completely Randomized Design continues to expand. Emerging fields such as data science, business analytics, and environmental studies are increasingly recognizing the value of CRD in conducting reliable and uncomplicated experiments. In the realm of data science, CRD can be invaluable in assessing the performance of different algorithms, models, or data processing techniques. It enables researchers to randomize the variables, minimizing biases and providing a clearer understanding of the real-world applicability and effectiveness of various data-centric solutions.
In the domain of business analytics, CRD is paving the way for robust analysis of business strategies and initiatives. Businesses can employ CRD to randomly assign strategies or processes across various departments or teams, allowing for a comprehensive assessment of their impact. The insights from such assessments empower organizations to make data-driven decisions, optimizing their operations, and enhancing overall productivity and profitability. This approach is particularly crucial in the business environment of today, characterized by rapid changes, intense competition, and escalating customer expectations, where informed and timely decision-making is a key determinant of success.
Moreover, in environmental studies, CRD is increasingly being used to evaluate the impact of various factors on environmental health and sustainability. For example, researchers might use CRD to study the effects of different pollutants, conservation strategies, or land use patterns on ecosystem health. The randomized design ensures that the conclusions drawn are robust and reliable, providing a solid foundation for the development of policies and initiatives. As environmental concerns continue to mount, the role of reliable experimental designs like CRD in facilitating meaningful research and informed policy-making cannot be overstated.
A CRD experiment involves meticulous planning and execution. In broad strokes, the standard sequence is to define the research question and the single factor under study, select the treatment levels, determine an adequate sample size, randomly assign experimental units to treatments, run the experiment under otherwise uniform conditions, and collect and analyze the resulting data. Each phase, from the preparatory steps to data collection and analysis, plays a pivotal role in bolstering the integrity and success of the experiment, ensuring that the findings stand as a valuable contribution to scientific knowledge and understanding.
While the Completely Randomized Design offers numerous advantages, researchers often encounter specific challenges when implementing it in real-world experiments. Recognizing these challenges early, such as the larger sample sizes CRD may require and its inability to control for known sources of variability, and preparing strategies to address them can significantly improve the integrity and success of the CRD experiment.
While CRD is a powerful tool in experimental research, its successful implementation hinges on the researcher's ability to anticipate, recognize, and navigate challenges that might arise. By being proactive and employing strategies to mitigate potential pitfalls, researchers can maximize the reliability and validity of their CRD experiments, ensuring meaningful and impactful results.
In summary, the Completely Randomized Design holds a pivotal place in the field of research owing to its simplicity and straightforward approach. Its essence lies in the unbiased random assignment of experimental units to various treatments, ensuring the reliability and validity of the results. Although it may not control for other variables and often requires larger sample sizes, its ease of implementation frequently outweighs these drawbacks, solidifying it as a preferred choice for researchers across many fields.
Looking ahead, the future of CRD remains bright. As research continues to evolve, we anticipate the integration of CRD with more sophisticated design techniques and advanced analytical tools. This synergy will likely enhance the efficiency and applicability of CRD in varied research contexts, perpetuating its legacy as a fundamental research design method. While other designs might offer more control and complexity, the fundamental simplicity of CRD will continue to hold significant value in the rapidly evolving research landscape.
Moving forward, it is imperative to champion continuous learning and exploration in the field of CRD. Engaging in educational opportunities, staying abreast of the latest research and advancements, and actively participating in pertinent discussions and forums can markedly enrich understanding and expertise in CRD. Embracing this ongoing learning journey will not only bolster individual research skills but also make a significant contribution to the broader scientific community, fueling innovation and discovery in numerous fields of study.
Randomization in an experiment refers to the random assignment of participants to treatments; equivalently, it is the random assignment of treatments to participants.
For example, a teacher who decides to hold a viva in class and starts questioning students at random is randomizing.
Here, every participant has an equal chance of entering the experiment; in our example, every student has an equal chance of being asked a question by the teacher. Randomization protects against bias: when a group is selected according to some category, personal or accidental biases can creep in, but when the selection is random, no participant can be singled out, and the groups are divided fairly.
As mentioned earlier, randomization minimizes bias, but it also provides various other benefits when adopted as a selection method in experiments.
Randomization can be subject to error when it comes to “randomly” selecting the participants. In our example, the teacher may well intend to ask questions of random students, yet subconsciously target the mischievous ones. We may think a selection is random when, much of the time, it is not.
Hence, to avoid these unintended biases, there are three techniques that researchers use commonly:
In simple random sampling, the selection of participants is based entirely on chance: every participant has an equal probability of getting into the sample.
This method is theoretically easy to understand and works best with a sample size of 100 or more. The key property is that every participant has an equal chance of being included in a treatment, which is why it is also called the method of chance.
Common methods of simple random sampling include the lottery method (drawing numbered chits) and the use of random number tables or generators.
Example: A teacher wants to know how good her class is at mathematics. She gives each student a number and draws numbers from a bowl of chits. This yields a randomly selected sample, free of any bias from the teacher’s interference.
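A sketch of the chit-drawing example in Python, assuming a class of 30 students and a sample of 5:

```python
import random

random.seed(11)                        # for a reproducible draw
students = list(range(1, 31))          # numbered chits for a hypothetical class of 30
sample = random.sample(students, k=5)  # every student equally likely to be drawn
print(sample)
```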
Permuted block randomization is a method of randomly assigning participants to treatment groups. A block is a randomly ordered sequence of treatment assignments, and across blocks the treatment assignment remains balanced.
Example: A teacher wants to enroll students into two treatments, A and B, and plans to enroll 6 students per week. The blocks would look like this:
Week 1- AABABA
Week 2- BABAAB
Week 3- BBABAB
Across the three blocks there are 9 A and 9 B assignments: the two treatments are balanced overall even though their ordering within each block is random.
There are two types of block assignment in permuted block randomization:
Random numbers: generate a random number for each treatment slot assigned in the block. In our example, the block “Week 1” would look like A(4), A(5), B(56), A(33), B(40), A(10).
Then arrange these treatments in ascending order of their numbers; the new treatment order becomes AAAABB.
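A sketch of this random-number method, here for a balanced block of three A’s and three B’s (the random keys play the role of the numbers in parentheses above):

```python
import random

random.seed(5)
block = list("AAABBB")                   # balanced block of size 6
keys = [random.random() for _ in block]  # one random number per treatment slot
# Sorting the treatments by their random keys yields a random ordering of the block
permuted = [t for _, t in sorted(zip(keys, block))]
print("".join(permuted))
```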
Permutation listing: list all possible arrangements (variations) of the block, then pick one at random.
The number of possible arrangements of a block of size \(b\) with equal numbers of the two treatments is \(b! / ((b/2)! \, (b/2)!)\). For our example the block size is 6, so there are \(6! / (3! \times 3!) = 720 / 36 = 20\) possible arrangements.
The word “strata” refers to characteristics. Every population has characteristics such as gender, caste, age, or background. Stratified random sampling lets you take these strata into account when sampling the population. The strata can be pre-defined, or you can define them yourself in whatever way best suits your study.
Example: you want to categorize the population of a state by literacy. Your categories would be (1) literate, (2) intermediate, and (3) illiterate.
Stratified random sampling proceeds in three broad steps: divide the population into mutually exclusive strata, decide how many units to draw from each stratum (for example, proportionally to stratum size), and draw a simple random sample within each stratum, as sketched below.
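A sketch of these steps, with an invented population and a 10% sample drawn independently from each stratum:

```python
import random

random.seed(9)
# Hypothetical population, already divided into literacy strata
strata = {
    "literate":     list(range(600)),
    "intermediate": list(range(300)),
    "illiterate":   list(range(100)),
}
# Simple random sample of 10% within each stratum
sample = {name: random.sample(units, k=len(units) // 10)
          for name, units in strata.items()}
print({name: len(s) for name, s in sample.items()})
# {'literate': 60, 'intermediate': 30, 'illiterate': 10}
```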
Random assignment of treatments is an essential feature of experimental design in general and clinical trials in particular. It provides broad comparability of treatment groups and validates the use of statistical methods for the analysis of results. Various devices are available for improving the balance of prognostic factors across treatment groups. Several recent initiatives to diminish the role of randomization are seen as being potentially misleading. Randomization is entirely compatible with medical ethics in circumstances when the treatment of choice is not clearly identified.
Causal role of immune cells in bipolar disorder: a Mendelian randomization study
Background: The understanding of the immunological mechanisms underlying bipolar disorder (BD) has improved in recent years, owing to the extensive use of high-density genetic markers for genotyping and advancements in genome-wide association studies (GWAS). However, studies on the relationship between immune cells and the risk of BD remain limited, necessitating further investigation.
Methods: Bidirectional two-sample Mendelian randomization (MR) analysis was employed to investigate the causal association between immune cell phenotypes and bipolar disorder. Immune cell traits were collected from a research cohort in Sardinia, whereas the GWAS summary statistics for BD were obtained from the Psychiatric Genomics Consortium. Sensitivity analyses were conducted, and the combination of MR-Egger and MR-PRESSO was used to assess horizontal pleiotropy. Cochran’s Q test was employed to evaluate heterogeneity, and the results were adjusted for the false discovery rate (FDR).
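The authors’ pipeline (bidirectional two-sample MR with MR-Egger and MR-PRESSO) is more involved, but the core inverse-variance weighted (IVW) estimate at the heart of such analyses can be sketched from per-variant summary statistics; the numbers below are invented for illustration:

```python
import numpy as np

# Invented summary statistics for three instrumental SNPs:
# per-SNP effects on the exposure (immune trait) and the outcome (BD), with SEs
beta_exp = np.array([0.12, 0.09, 0.15])
beta_out = np.array([0.010, 0.008, 0.014])
se_out   = np.array([0.004, 0.003, 0.005])

# Wald ratio per SNP and its first-order approximate standard error
ratio = beta_out / beta_exp
se_ratio = se_out / np.abs(beta_exp)

# Fixed-effect inverse-variance weighted (IVW) causal estimate
w = 1.0 / se_ratio**2
ivw = np.sum(w * ratio) / np.sum(w)
se_ivw = np.sqrt(1.0 / np.sum(w))
print(ivw, se_ivw)
```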
Results: The study identified six immune cell phenotypes significantly associated with BD incidence ( P < 0.01). These phenotypes include IgD- CD27- %lymphocyte, CD33br HLA DR+ CD14- AC, CD8 on CD28+ CD45RA+ CD8br, CD33br HLA DR+ AC, CD14 on CD14+ CD16+ monocyte, and HVEM on CD45RA- CD4+. After adjusting the FDR to 0.2, two immune cell phenotypes remained statistically significant: IgD-CD27-% lymphocyte (OR=1.099, 95% CI: 1.051-1.149, P = 3.51E-05, FDR=0.026) and CD33br HLA DR+ CD14-AC (OR=0.981, 95% CI: 0.971-0.991, P = 2.17E-04, FDR=0.079). In the reverse MR analysis, BD significantly impacted the phenotypes of four monocytes ( P < 0.01), including CD64 on CD14+ CD16+ monocyte, CD64 on monocyte, CX3CR1 on CD14- CD16-, CD64 on CD14+ CD16- monocyte. However, after applying the FDR correction (FDR < 0.2), no statistically significant results were observed.
Conclusions: This MR investigation reveals associations between immune cell phenotypes, bipolar disorder, and genetics, providing novel perspectives on prospective therapeutic targets for bipolar disorder.
Bipolar disorder (BD), a lifelong and recurrent disorder, is characterized by alternating episodes of mania or hypomania and depression (1). Symptoms typically begin between ages 15 and 25 (2), and in severe cases the disorder can result in cognitive impairment (3), disability (4), and potentially suicide (5). It has been reported that individuals with BD have a suicide risk ranging from 4% to 19%, with 20% to 60% attempting suicide at least once in their lives (6). The etiology of BD remains uncertain, encompassing genetic predispositions, neurotransmitter activity, environmental influences, and immunological mechanisms (7, 8). Currently, lithium is the preferred treatment for BD (9), with cognitive therapy and electroconvulsive therapy as alternatives (10, 11). Timely identification and intervention help individuals with BD manage symptoms before severe problems arise, hence improving the prognosis (12). Recently, there has been a notable rise in BD incidence, resulting in an escalating worldwide burden (13).
The immune system and inflammation play significant roles in the pathology of various mental disorders (14). Recent research has demonstrated that the pathophysiology of BD is significantly influenced by inflammatory responses and immune modulation (11). There is a complex association between BD and immune-inflammatory mechanisms. Meta-analyses have shown increased levels of inflammatory markers, including C-reactive protein, tumor necrosis factor (TNF)-α, and interleukin (IL)-6, in individuals with BD (15). However, one study indicated that inflammatory markers such as the interleukin-1 receptor antagonist and TNF-α may vary with the emotional state of BD, such as normal mood, manic episodes, or depression (16). A longitudinal study reported that elevated baseline levels of circulating inflammatory markers are associated with an increased risk of developing BD during follow-up (17). Additionally, an MR study has provided weak evidence suggesting that IL-13 and IL-17 may have a protective effect in BD patients (18).
Previous studies have demonstrated a significant association between BD and lymphocytes. Significant differences in T cell subgroups exist between BD patients and healthy controls (19). Additionally, the percentages of total T cells, CD4+ T cells, and CD71+ B cells in BD patients were significantly higher than those in the healthy control group (20). Nevertheless, the cause-and-effect connection between T cell imbalance and BD remains uncertain (21). Furthermore, a large-scale cross-sectional survey suggests that the ratios of neutrophils to lymphocytes and of monocytes to lymphocytes are associated with an increased risk of developing BD (22). Existing evidence also indicates a higher prevalence of autoimmune disorders, such as autoimmune thyroiditis, among individuals with BD, as well as an increased risk of developing BD among those with autoimmune disorders (23). Laboratory findings indicate that lithium treatment is associated with increased production of natural killer cells induced by circulating cytokines (24). Randomized controlled trials have demonstrated the efficacy of therapies that boost the T cell system in addressing the immune-inflammatory abnormalities linked to BD and improving antidepressant responsiveness (25). A review article suggests utilizing inflammatory and immune response indicators to monitor the progression of bipolar disorder in patients (26). However, most existing evidence is based on observational studies, and research on the relationship between BD and immune cells may be limited by confounding variables and reverse causality.
As one of the most heritable psychiatric disorders, BD is expected to see a substantial increase in the number of relevant genetic loci identified in future large-scale studies, owing to advances in technology and methodology, the growth of international consortia, and extensive population biobanks that enhance genetic prediction for BD (27). Additionally, evidence of familial heritability has been found for BD (28), with candidate-gene work focusing on genes such as BDNF, CLOCK, COMT, and DAOA to explore the clinical genetics of BD (7, 29). A review summarizes approaches aimed at identifying safer and more effective medications for individuals with BD through pharmacogenetics research (30). Hence, this study employs GWAS data to investigate the complex interplay among BD, immune inflammation, and genetic predisposition, with the goal of identifying potential therapeutic targets to ameliorate BD symptoms.
MR is a method of causal inference that uses genetic variation as an instrumental variable. It exploits the natural random allocation of genotypes to infer the effect of biological factors on disease phenotypes (31). MR methods have gained extensive application in causal inference within observational studies because genetic variations are innate and less susceptible to common confounding factors such as environmental and social influences (32). Prior observational research has suggested an association between immune cell characteristics and BD. This study therefore conducted a comprehensive two-sample MR analysis to investigate the causal relationship between immune cells and BD.
2.1 Study design
A two-sample MR analysis was used to assess the causal association between 731 immune cell traits (7 panels) and BD. Genetic variations are used as instrumental variables (IVs) to represent risk factors in MR. This approach requires three essential assumptions to establish causal inference: (1) the IVs are genetic variations significantly associated with the exposure; (2) the IVs are independent of both known and unknown confounders; and (3) the IVs affect the outcome solely through the exposure, not through alternative pathways. Figure 1 illustrates the analysis workflow. The relevant institutional review board granted approval for this study.
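Under these three assumptions, the causal effect can be written down explicitly; the following is a standard formulation in our notation, added for clarity rather than taken from the authors. For instrument $j$, with SNP-exposure association $\hat{\alpha}_j$ and SNP-outcome association $\hat{\gamma}_j$, the single-instrument Wald ratio and its inverse-variance weighted (IVW) combination over $J$ instruments are

$$
\hat{\beta}_j = \frac{\hat{\gamma}_j}{\hat{\alpha}_j},
\qquad
\hat{\beta}_{\mathrm{IVW}} = \frac{\sum_{j=1}^{J} w_j\, \hat{\beta}_j}{\sum_{j=1}^{J} w_j},
\quad
w_j = \operatorname{se}(\hat{\beta}_j)^{-2}.
$$

The IVW estimate is thus a precision-weighted average of the per-SNP causal estimates, which is why it serves as the primary estimator when all instruments are assumed valid.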
Figure 1. The design of the bidirectional Mendelian randomization (MR) study (created with Figdraw).
The GWAS summary statistics for BD were obtained from the Psychiatric Genomics Consortium (PGC) (33). The study conducted GWAS on 413,466 individuals of European descent (Ncase = 41,917, Ncontrol = 371,549). BD patients met international consensus criteria for lifetime BD (DSM-IV, ICD-9, or ICD-10). The GWAS meta-analysis identified 64 independent genetic loci associated with BD at genome-wide significance (P < 5 × 10⁻⁸), 33 of which were newly discovered.
The GWAS summary statistics for each immunophenotype are available from the GWAS Catalog (accession numbers GCST90001391 to GCST90002121) and derive from a cohort of 3,757 Sardinians (34). Genotypes from a high-density array were imputed to approximately 22 million SNPs using a reference panel of Sardinian sequences (35) and tested for association after adjusting for covariates such as age, age², and sex. A total of 731 immunophenotypes were examined, comprising relative cell counts (192), morphological parameters (32), absolute cell counts (118), and median fluorescence intensities reflecting surface antigen levels (389).
Based on recent studies, we set the significance threshold for SNPs related to immune cell phenotypes at P < 1 × 10⁻⁵ (34, 36). For SNPs related to BD, we used a threshold of P < 5 × 10⁻⁸ (33, 37). We then performed linkage disequilibrium (LD) clumping on these SNPs (r² = 0.001 within a 10,000 kb window) (38). Each IV was assessed by its F-statistic, and only those with F-statistics exceeding 10 were retained for subsequent analysis, to ensure robustness and minimize bias from weak instruments (39). SNP harmonization was conducted between the exposure and outcome datasets to maintain consistency of effect-allele estimation. Furthermore, SNPs potentially confounded by other variables were filtered out using the PhenoScanner V2 website ( http://www.phenoscanner.medschl.cam.ac.uk/ ).
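To illustrate the instrument-strength filter, the per-SNP F-statistic can be approximated from each variant's effect estimate and standard error as F ≈ (β/se)². A minimal R sketch (the column names follow TwoSampleMR conventions and are assumptions here, not the authors' code):

```r
# Approximate the per-SNP F-statistic as (beta / se)^2 and keep only
# instruments with F > 10, mirroring the weak-instrument filter above.
# `exposure_dat` is assumed to be a data frame in TwoSampleMR format.
exposure_dat$F_stat <- (exposure_dat$beta.exposure /
                          exposure_dat$se.exposure)^2
exposure_dat <- subset(exposure_dat, F_stat > 10)
```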
To investigate the causal association between immune cell phenotypes and the risk of bipolar disorder, this study used the TwoSampleMR package in R (version 4.3.2). The MR analysis methods were MR-Egger, weighted median, inverse variance weighted (IVW), simple mode, weighted mode, and MR-PRESSO. Among these, the IVW method served as the principal causal-effect estimator, owing to its robust ability to detect causality and its high testing efficiency (40).
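The pipeline described here maps naturally onto TwoSampleMR calls. The sketch below is illustrative only: the file names are hypothetical placeholders, and the input files would need the column layout that read_exposure_data()/read_outcome_data() expect.

```r
library(TwoSampleMR)

# Load exposure (immune trait) and outcome (BD) GWAS summary statistics;
# both file paths are hypothetical placeholders.
exposure_dat <- read_exposure_data("immune_trait_gwas.tsv", sep = "\t")
exposure_dat <- clump_data(exposure_dat,
                           clump_r2 = 0.001, clump_kb = 10000)

outcome_dat <- read_outcome_data("bd_pgc_gwas.tsv", sep = "\t",
                                 snps = exposure_dat$SNP)

# Align effect alleles between the two datasets.
dat <- harmonise_data(exposure_dat, outcome_dat)

# Run the estimators named in the text; IVW is the primary method.
res <- mr(dat, method_list = c("mr_ivw", "mr_egger_regression",
                               "mr_weighted_median",
                               "mr_simple_mode", "mr_weighted_mode"))
res
```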
In this study, we used a combination of MR-Egger and MR-PRESSO to assess horizontal pleiotropy (P < 0.05 indicating horizontal pleiotropy). Heterogeneity of the IVs was measured using Cochran's Q statistic for the IVW estimates (P < 0.05 indicating heterogeneity). Scatter plots and funnel plots were also examined, and leave-one-out sensitivity analyses were conducted to determine whether any single SNP drove the observed causal relationship. Because multiple testing increases the likelihood of type I errors, FDR correction was performed using the Bioladder web tool (41). To investigate reverse causation, identical techniques were applied in a reverse MR analysis of BD on the immune cell phenotypes. Results were considered highly significant when P < 0.01 (42), and, based on prior research, an FDR < 0.2 was considered indicative of a causal relationship (43).
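These sensitivity checks correspond to standard TwoSampleMR helpers plus base R's p.adjust for the Benjamini-Hochberg FDR step. A sketch continuing from the `dat` object above (MR-PRESSO lives in the separate MRPRESSO package and is omitted here; `ivw_pvals` is an assumed vector of IVW p-values collected across all 731 phenotypes):

```r
# Horizontal pleiotropy: MR-Egger intercept test.
mr_pleiotropy_test(dat)

# Heterogeneity: Cochran's Q for the IVW and Egger estimates.
mr_heterogeneity(dat)

# Leave-one-out: re-estimate the IVW effect dropping one SNP at a time.
loo <- mr_leaveoneout(dat)
mr_leaveoneout_plot(loo)

# Benjamini-Hochberg FDR across the exposure-level IVW p-values.
fdr <- p.adjust(ivw_pvals, method = "BH")
which(fdr < 0.2)   # phenotypes passing the FDR < 0.2 criterion
```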
Initially, the causal impact of the 731 immunophenotypes as exposure variables on BD was investigated, with findings displayed in Figure 2. Before FDR correction, six immunophenotypes were found to affect the occurrence of BD (P < 0.01). Specifically, significant negative associations with BD risk were observed for CD33br HLA DR+ CD14- AC on monocytes (OR = 0.981, 95% CI: 0.971-0.991, P = 2.17E-04), CD8 on CD28+ CD45RA+ CD8br on Treg cells (OR = 0.965, 95% CI: 0.943-0.987, P = 0.002), CD33br HLA DR+ AC on monocytes (OR = 0.979, 95% CI: 0.965-0.993, P = 0.003), CD14 on CD14+ CD16+ monocytes (OR = 0.933, 95% CI: 0.889-0.979, P = 0.005), and HVEM on CD45RA- CD4+ on maturation stages of T cells (OR = 0.966, 95% CI: 0.942-0.991, P = 0.007), while IgD- CD27- %lymphocyte on B cells (OR = 1.099, 95% CI: 1.051-1.149, P = 3.51E-05) showed a significant positive association. After adjusting the FDR to 0.2, the risk of BD remained significantly associated with IgD- CD27- %lymphocyte (P = 3.51E-05, FDR = 0.026) and CD33br HLA DR+ CD14- AC (P = 2.17E-04, FDR = 0.079).
Figure 2. Forest plot showing the causal associations between immune cell traits and BD. nSNP, number of single-nucleotide polymorphisms used as instruments; OR, odds ratio; CI, confidence interval; PFDR, P value corrected for FDR.
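As a quick check on how such estimates are reported, the odds ratio and its 95% CI come from the log-odds effect β and its standard error as OR = exp(β) and CI = exp(β ± 1.96·se). Working backwards from the IgD- CD27- %lymphocyte estimate above reproduces the published interval (a worked example, not the authors' code):

```r
# Recover beta and se from the reported OR = 1.099, 95% CI 1.051-1.149,
# then reproduce the interval.
beta <- log(1.099)                               # ~0.094
se   <- (log(1.149) - log(1.051)) / (2 * 1.96)   # ~0.023
exp(c(lower = beta - 1.96 * se,
      or    = beta,
      upper = beta + 1.96 * se))
# ~ 1.051 1.099 1.149, matching the published interval
```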
Next, we analyzed the causal impact of BD as an exposure variable on the 731 immunophenotypes. The results indicated that BD influences four specific immunophenotypes of monocytes, as shown in Figure 3. BD exhibited a negative association with CD64 on CD14+ CD16+ monocytes (OR = 0.812, 95% CI: 0.713-0.925, P = 0.001), CD64 on monocytes (OR = 0.834, 95% CI: 0.735-0.947, P = 0.005), CX3CR1 on CD14- CD16- (OR = 0.838, 95% CI: 0.740-0.950, P = 0.006), and CD64 on CD14+ CD16- monocytes (OR = 0.837, 95% CI: 0.738-0.950, P = 0.006). However, after FDR correction, no statistically significant results remained.
Figure 3. Forest plot showing the causal associations between BD and immune cell traits. nSNP, number of single-nucleotide polymorphisms used as instruments; OR, odds ratio; CI, confidence interval; PFDR, P value corrected for FDR.
For the analyses in sections 3.1 and 3.2, the MR-Egger and MR-PRESSO methods indicated no evidence of horizontal pleiotropy, and Cochran's Q test showed no heterogeneity; these findings are presented in Supplementary Tables 1 and 2. Scatter plots (Supplementary Figures 1, 2) and funnel plots (Supplementary Figures 3, 4) further substantiated these findings. The leave-one-out analysis demonstrated the robustness of the MR results (Supplementary Figures 5, 6): excluding any single SNP associated with the immunophenotypes or BD did not significantly alter the overall findings. Additionally, all immune cell phenotypes with positive results under the IVW method (P < 0.05) were identified, and a heatmap was used to visualize them (Figure 4).
Figure 4. Heatmap depicting the IDs of immune cell phenotypes with positive results and the p-values from the sensitivity analyses. The outer circle represents the IDs of the immune cell phenotypes, while the inner circle uses colors to indicate the p-values of the different sensitivity analyses. (A) IDs showing the causal effect of immune cells on BD; (B) IDs showing the causal effect of BD on immune cells.
Through a bidirectional two-sample MR study, we identified six immune cell phenotypes significantly associated with the risk of BD (P < 0.01). After adjusting for FDR < 0.20, two of these immune cell phenotypes remained associated with the risk of BD. In the reverse MR analysis, our findings suggested a potential causal relationship between BD and four monocyte immune cell phenotypes (P < 0.01); however, after FDR correction, none of these associations retained statistical significance.
Current research indicates that B cell-activating factor and a proliferation-inducing ligand, critical growth factors for B cells and for B cell-driven autoimmunity, exhibit aberrant plasma levels in BD patients, suggesting a pivotal role for B cells in BD (44). Neuroinflammation is one of the causes of BD (45), with B cells facilitating inflammatory responses by expressing inflammation-related genes and ribosomal protein genes (46). B cells can be categorized into four principal subsets based on the differential expression of the immunoglobulins IgD and CD27. IgD- CD27- B cells constitute a heterogeneous population that has been associated with aging and systemic lupus erythematosus (47). Our findings indicate that the IgD- CD27- %lymphocyte trait in B cells is positively associated with BD risk (OR = 1.099). Consistent with a risk-increasing role, the CD27- IgD- B cell subset exhibited pronounced pro-inflammatory activity in nonagenarians, producing pro-inflammatory cytokines such as TNF, IL-6, and IL-8 (48). However, an analysis of immunosenescence markers and clinical characteristics in BD found only a tenuous correlation between the percentage of late-differentiated B cells (CD3- CD19+ IgD- CD27-) and immunological age factors (49). Consequently, the connection between CD27- IgD- cell subsets and BD warrants further exploration.
Treg cells are essential for modulating peripheral immune responses, monitoring the brain's immune system, and facilitating neuroimmune interactions and coordination (50). Dysfunctional Tregs can lead to abnormal immune activation and neuroinflammation. Increased Treg levels are associated with symptom improvement in mental disorders, while decreased levels correlate with symptom worsening and heightened neuroinflammation (51). In BD patients, TLR-2 and other TLR signaling pathways impair Treg functionality, contributing to neuroinflammation and immune dysregulation (52). Meta-analytic evidence supports that heightened IL-10 levels are associated with increased Treg cell activity in BD (53). Conversely, literature reviews suggest that reductions in Tregs and diminished IL-10 secretion in BD may account for the elevated incidence of autoimmune disorders (54). These observations underscore the potential impact of Treg cells on BD. Our analysis indicates a negative association between CD8 on CD28+ CD45RA+ CD8br Treg cells and BD. Emerging studies reveal that at different stages of BD there is a marked decline in the functional capacities of both Treg cells and effector T cells: individuals with BD show a pronounced reduction in Treg cells expressing CD152 and GARP, along with decreased numbers of CD4+ and CD8+ cells marked by CD71+ (55). CD4+ T cells secrete cytokines to facilitate immune responses, while CD8+ T cells execute cytotoxic activities; their dysregulation may play a significant role in the pathophysiology of BD (21). Research further indicates that BD patients affected by childhood maltreatment exhibit elevated CD8+ T cell levels compared with their non-maltreated counterparts (56). Studies have also shown correlations between modifications in brain structure and changes in CD4+ and CD8+ cells in individuals with BD (19). Moreover, research suggests that latent human cytomegalovirus (HCMV) infection may intensify the immune risk phenotypes associated with BD (55), which is consistent with our finding that markers of T cell maturation stages, particularly HVEM expression on CD45RA- CD4+ cells, are causally associated with BD.
This analysis also identified CD33br HLA DR+ AC and CD33br HLA DR+ CD14- AC on myeloid cells as negatively associated with BD risk (ORs 0.979 and 0.981). Previous studies have indicated that common genetic variation at the HLA loci is implicated in susceptibility to BD (57). Moreover, gene expression patterns related to immature neutrophils, a component of the myeloid lineage, have been observed in peripheral blood cells from patients diagnosed with schizophrenia and bipolar disorder, indicating alterations in the innate immune system (58). Investigations of brain tissue have corroborated some genetic findings associated with BD, such as reduced expression of HLA-DPA1 (59), which encodes an α-chain of the HLA class II molecule, with HLA-DR genes encoding a β-chain. Empirical evidence suggests that the percentages of CD11b/CD33+ hi and CD11b/CD33+ lo cells differed significantly between T carriers and healthy controls in patients with bipolar I disorder in the Han Chinese population (22), suggesting for the first time that myeloid-derived suppressor cells might play a role in these patients. Thus, further exploration of the association between HLA-DR on myeloid cells and BD is warranted.
Additionally, this study shows that CD14 expression on CD14+ CD16+ monocytes may affect BD risk, while BD may influence four phenotypic characteristics of monocytes: CD64 on CD14+ CD16- monocytes, CX3CR1 on CD14- CD16- monocytes, CD64 on monocytes, and CD64 on CD14+ CD16+ monocytes. A flow cytometry study detected differential protein signaling in monocytes between treatment responders and non-responders (60). Pediatric BD patients displayed significantly higher monocyte counts than controls, suggesting potential clinical benefits of early BD detection through monocyte counts (61). Monocytes are classified into three categories based on the surface markers CD14 and CD16: classical (CD14+ CD16-), non-classical (CD14- CD16+), and intermediate (CD14+ CD16+) (34). Monocytes can exacerbate inflammation by releasing cytokines such as IL-10 and IL-15, and this inflammatory activation is closely linked to BD (62). BD patients exhibit alterations in the expression of the monocyte marker CD14, and lithium treatment has been noted to exert an immunomodulatory effect on CD14 monocytes and dendritic cells in these patients (20). However, one study found no changes in CD64 mRNA expression among BD patients (63), a possible contradiction of these observations.
In conclusion, this MR analysis has elucidated immune cell phenotypes associated with BD, highlighting immune system activation, inflammation, and bidirectional communication between the brain and the immune system. Neuroinflammation represents one of the key biological mechanisms in BD (64). Immune dysfunction may activate the hypothalamic-pituitary-adrenal axis through various pro-inflammatory cytokines and chemokines, thereby altering cerebral pathways associated with serotonin and catecholamines and leading to mood alterations (65). Furthermore, immune dysregulation may advance the progression of BD by affecting neurotransmitter synthesis, synaptic functionality, neural plasticity (61), brain regions, and neural circuits (66). Immune dysfunction might also disrupt central nervous system operations by modifying the permeability of the blood-brain barrier (67). In turn, the brain can regulate immune responses by controlling the production and release of inflammatory mediators (such as IL-6 and TNF), activating microglia that influence oxidative stress, and modulating the autonomic nervous system (68, 69). Recent studies indicate that epigenetic alterations, including DNA methylation, histone modification, and non-coding RNAs (e.g., miRNA and lncRNA), can induce gene expression variability (70-72). These modifications can be triggered by environmental factors such as lifestyle and traumatic experiences, consequently activating pertinent genes and fostering the development of BD (72). Therefore, further investigation of the interplay between inflammation, the immune system, and the brain against a genetic backdrop is crucial for a deeper understanding of the pathophysiological mechanisms of BD and for identifying novel therapeutic targets.
However, this research has several limitations. First, owing to constraints of the dataset, certain immune cell phenotypes could not be analyzed. Second, the cohort, drawn from a Sardinian population, lacks stratification by sex and age, limiting the generalizability and precision of the conclusions. Finally, the instrument-selection threshold for immune cell phenotypes (P < 1 × 10⁻⁵) and the significance criterion for interpreting the results (FDR < 0.2) are relatively lenient. Future research should therefore incorporate larger sample sizes and employ more comprehensive causal-analysis techniques (such as CAUSE) to further elucidate the relationship between immune cell phenotypes and BD.
This study reveals a causal relationship between immune cell phenotypes and bipolar disorder, providing new insights into the pathogenic mechanisms of BD and potential therapeutic targets.
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.
Ethical review and approval were not required for the study of human participants, in accordance with local legislation and institutional requirements.
MW: Writing – review & editing, Methodology, Software, Writing – original draft. SW: Methodology, Writing – review & editing. GY: Methodology, Writing – original draft. MG: Writing – review & editing. XZ: Investigation, Writing – review & editing. ZC: Data curation, Writing – review & editing. DG: Funding acquisition, Supervision, Writing – review & editing.
The author(s) declare financial support was received for the research, authorship, and/or publication of this article. The study received funding from the National Natural Science Foundation of China (No. 81473558) and the Shandong Provincial Natural Science Foundation (No. ZR2022MH065).
Thanks to GWAS database contributors for sharing the data from the publicly accessible online platform.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyt.2024.1411280/full#supplementary-material
1. Husain SF, Tang TB, Tam WW, Tran BX, Ho CS, Ho RC. Cortical haemodynamic response during the verbal fluency task in patients with bipolar disorder and borderline personality disorder: a preliminary functional near-infrared spectroscopy study. BMC Psychiatry . (2021) 21:201. doi: 10.1186/s12888-021-03195-1
2. Voelker R. What is bipolar disorder? JAMA . (2024) 331:894. doi: 10.1001/jama.2023.24844
3. Sole B, Jimenez E, Torrent C, Reinares M, Bonnin C, Torres I, et al. Cognitive impairment in bipolar disorder: treatment and prevention strategies. Int J Neuropsychopharmacol . (2017) 20:670–80. doi: 10.1093/ijnp/pyx032
4. Harrison PJ, Geddes JR, Tunbridge EM. The emerging neurobiology of bipolar disorder. Trends Neurosci . (2018) 41:18–30. doi: 10.1016/j.tins.2017.10.006
5. Dome P, Rihmer Z, Gonda X. Suicide risk in bipolar disorder: a brief review. Med (Kaunas) . (2019) 55. doi: 10.3390/medicina55080403
6. Rihmer Z, Gonda X, Döme P. The assessment and management of suicide risk in bipolar disorder. In: The treatment of bipolar disorder: Integrative clinical strategies and future directions . (New York, NY: Oxford University Press) (2017). p. 207–24.
7. Freund N, Juckel G. Bipolar disorder: its etiology and how to model in rodents. Methods Mol Biol . (2019) 2011:61–77. doi: 10.1007/978-1-4939-9554-7_4
8. Oliveira J, Oliveira-Maia AJ, Tamouza R, Brown AS, Leboyer M. Infectious and immunogenetic factors in bipolar disorder. Acta Psychiatr Scand . (2017) 136:409–23. doi: 10.1111/acps.12791
9. Rybakowski JK. Lithium. Eur Neuropsychopharmacol . (2022) 57:86–7. doi: 10.1016/j.euroneuro.2022.01.111
10. Smith S, Falk A, Joseph R, Wilk A. Mood and anxiety disorders: bipolar disorder. FP Essent . (2023) 527:13–8.
11. Madireddy S, Madireddy S. Therapeutic interventions to mitigate mitochondrial dysfunction and oxidative stress-induced damage in patients with bipolar disorder. Int J Mol Sci . (2022) 23. doi: 10.3390/ijms23031844
12. Brietzke E, Rosa AR, Pedrini M, Noto MN, Kapczinski F, Scott J. Challenges and developments in research of the early stages of bipolar disorder. Braz J Psychiatry . (2016) 38:329–37. doi: 10.1590/1516-4446-2016-1975
13. Zhong Y, Chen Y, Su X, Wang M, Li Q, Shao Z, et al. Global, regional and national burdens of bipolar disorders in adolescents and young adults: a trend analysis from 1990 to 2019. Gen Psychiatr . (2024) 37. doi: 10.1136/gpsych-2023-101255
14. Bhatt S, Dhar AK, Samanta MK, Suttee A. Effects of current psychotropic drugs on inflammation and immune system. Adv Exp Med Biol . (2023) 1411:407–34. doi: 10.1007/978-981-19-7376-5_18
15. Solmi M, Suresh SM, Osimo EF, Fornaro M, Bortolato B, Croatto G, et al. Peripheral levels of c-reactive protein, tumor necrosis factor-alpha, interleukin-6, and interleukin-1beta across the mood spectrum in bipolar disorder: a meta-analysis of mean differences and variability. Brain Behav Immun . (2021) 97:193–203. doi: 10.1016/j.bbi.2021.07.014
16. Rowland T, Perry BI, Upthegrove R, Barnes N, Chatterjee J, Gallacher D, et al. Neurotrophins, cytokines, oxidative stress mediators and mood state in bipolar disorder: systematic review and meta-analyses. Br J Psychiatry . (2018) 213:514–25. doi: 10.1192/bjp.2018.144
17. Hayes JF, Khandaker GM, Anderson J, Mackay D, Zammit S, Lewis G, et al. Childhood interleukin-6, c-reactive protein and atopic disorders as risk factors for hypomanic symptoms in young adulthood: a longitudinal birth cohort study. Psychol Med . (2017) 47:23–33. doi: 10.1017/S0033291716001574
18. Perry BI, Upthegrove R, Kappelmann N, Jones PB, Burgess S, Khandaker GM. Associations of immunological proteins/traits with schizophrenia, major depression and bipolar disorder: a bi-directional two-sample mendelian randomization study. Brain Behav Immun . (2021) 97:176–85. doi: 10.1016/j.bbi.2021.07.009
19. Escelsior A, Inuggi A, Sterlini B, Bovio A, Marenco G, Bode J, et al. T-cell immunophenotype correlations with cortical thickness and white matter microstructure in bipolar disorder. J Affect Disord . (2024) 348:179–90. doi: 10.1016/j.jad.2023.12.054
20. Wu TN, Lee CS, Wu BJ, Sun HJ, Chang CH, Chen CY, et al. Immunophenotypes associated with bipolar disorder and lithium treatment. Sci Rep . (2019) 9:17453. doi: 10.1038/s41598-019-53745-7
21. Chen Z, Huang Y, Wang B, Peng H, Wang X, Wu H, et al. T cells: an emerging cast of roles in bipolar disorder. Transl Psychiatry . (2023) 13:153. doi: 10.1038/s41398-023-02445-y
22. Wei Y, Feng J, Ma J, Chen D, Chen J. Neutrophil/lymphocyte, platelet/lymphocyte and monocyte/lymphocyte ratios in patients with affective disorders. J Affect Disord . (2022) 309:221–8. doi: 10.1016/j.jad.2022.04.092
23. Li H, Hong W, Zhang C, Wu Z, Wang Z, Yuan C, et al. Il-23 and tgf-beta1 levels as potential predictive biomarkers in treatment of bipolar I disorder with acute manic episode. J Affect Disord . (2015) 174:361–6. doi: 10.1016/j.jad.2014.12.033
24. Furlan R, Melloni E, Finardi A, Vai B, Di Toro S, Aggio V, et al. Natural killer cells protect white matter integrity in bipolar disorder. Brain Behav Immun . (2019) 81:410–21. doi: 10.1016/j.bbi.2019.06.037
25. Poletti S, Zanardi R, Mandelli A, Aggio V, Finardi A, Lorenzi C, et al. Low-dose interleukin 2 antidepressant potentiation in unipolar and bipolar depression: safety, efficacy, and immunological biomarkers. Brain Behav Immun . (2024) 118:52–68. doi: 10.1016/j.bbi.2024.02.019
26. Sayana P, Colpo GD, Simoes LR, Giridharan VV, Teixeira AL, Quevedo J, et al. A systematic review of evidence for the role of inflammatory biomarkers in bipolar patients. J Psychiatr Res . (2017) 92:160–82. doi: 10.1016/j.jpsychires.2017.03.018
27. O’Connell KS, Coombes BJ. Genetic contributions to bipolar disorder: current status and future directions. Psychol Med . (2021) 51:2156–67. doi: 10.1017/S0033291721001252
28. Song J, Bergen SE, Kuja-Halkola R, Larsson H, Landen M, Lichtenstein P. Bipolar disorder and its relation to major psychiatric disorders: a family-based study in the swedish population. Bipolar Disord . (2015) 17:184–93. doi: 10.1111/bdi.12242
29. Ahmadi L, Kazemi NS, Behbahani P, Khajeddin N, Pourmehdi-Boroujeni M. Genetic variations of daoa (rs947267 and rs3918342) and comt genes (rs165599 and rs4680) in schizophrenia and bipolar i disorder. Basic Clin Neurosci . (2018) 9:429–38. doi: 10.32598/bcn.9.6.429
30. Gordovez F, Mcmahon FJ. The genetics of bipolar disorder. Mol Psychiatry . (2020) 25:544–59. doi: 10.1038/s41380-019-0634-7
31. Xue H, Liu S, Zeng L, Fan W. Causal effect of systemic lupus erythematosus on psychiatric disorders: a two-sample mendelian randomization study. J Affect Disord . (2024) 347:422–8. doi: 10.1016/j.jad.2023.11.033
32. Plotnikov D, Guggenheim JA. Mendelian randomisation and the goal of inferring causation from observational studies in the vision sciences. Ophthalmic Physiol Opt . (2019) 39:11–25. doi: 10.1111/opo.12596
33. Mullins N, Forstner AJ, O’Connell KS, Coombes B, Coleman J, Qiao Z, et al. Genome-wide association study of more than 40,000 bipolar disorder cases provides new insights into the underlying biology. Nat Genet . (2021) 53:817–29. doi: 10.1038/s41588-021-00857-4
34. Orru V, Steri M, Sidore C, Marongiu M, Serra V, Olla S, et al. Complex genetic signatures in immune cells underlie autoimmunity and inform therapy. Nat Genet . (2020) 52:1036–45. doi: 10.1038/s41588-020-0684-4
35. Sidore C, Busonero F, Maschio A, Porcu E, Naitza S, Zoledziewska M, et al. Genome sequencing elucidates sardinian genetic architecture and augments association analyses for lipid and blood inflammatory markers. Nat Genet . (2015) 47:1272–81. doi: 10.1038/ng.3368
36. Wang A, Zhang J. Causal role of immune cells in psoriasis: a mendelian randomization analysis. Front Immunol . (2024) 15:1326717. doi: 10.3389/fimmu.2024.1326717
37. Hu Y, Xiong Z, Huang P, He W, Zhong M, Zhang D, et al. Association of mental disorders with sepsis: a bidirectional mendelian randomization study. Front Public Health . (2024) 12:1327315. doi: 10.3389/fpubh.2024.1327315
38. Ma Z, Zhao M, Zhao H, Qu N. Causal role of immune cells in generalized anxiety disorder: mendelian randomization study. Front Immunol . (2023) 14:1338083. doi: 10.3389/fimmu.2023.1338083
39. Sun K, Gao Y, Wu H, Huang X. The causal relationship between gut microbiota and type 2 diabetes: a two-sample mendelian randomized study. Front Public Health . (2023) 11:1255059. doi: 10.3389/fpubh.2023.1255059
40. Bowden J, Del GMF, Minelli C, Davey SG, Sheehan N, Thompson J. A framework for the investigation of pleiotropy in two-sample summary data mendelian randomization. Stat Med . (2017) 36:1783–802. doi: 10.1002/sim.7221
41. Burgess S, Bowden J, Fall T, Ingelsson E, Thompson SG. Sensitivity analyses for robust causal inference from mendelian randomization analyses with multiple genetic variants. Epidemiology . (2017) 28:30–42. doi: 10.1097/EDE.0000000000000559
42. Aru N, Yang C, Chen Y, Liu J. Causal association of immune cells and polycystic ovarian syndrome: a mendelian randomization study. Front Endocrinol (Lausanne) . (2023) 14:1326344. doi: 10.3389/fendo.2023.1326344
43. Wang C, Zhu D, Zhang D, Zuo X, Yao L, Liu T, et al. Causal role of immune cells in schizophrenia: mendelian randomization (mr) study. BMC Psychiatry . (2023) 23:590. doi: 10.1186/s12888-023-05081-4
44. Engh JA, Ueland T, Agartz I, Andreou D, Aukrust P, Boye B, et al. Plasma levels of the cytokines b cell-activating factor (baff) and a proliferation-inducing ligand (april) in schizophrenia, bipolar, and major depressive disorder: a cross sectional, multisite study. Schizophr Bull . (2022) 48:37–46. doi: 10.1093/schbul/sbab106
45. Rantala MJ, Luoto S, Borraz-Leon JI, Krams I. Bipolar disorder: an evolutionary psychoneuroimmunological approach. Neurosci Biobehav Rev . (2021) 122:28–37. doi: 10.1016/j.neubiorev.2020.12.031
46. Qi L, Qiu Y, Li S, Yi N, Li C, Teng Z, et al. Single-cell immune profiling reveals broad anti-inflammation response in bipolar disorder patients with quetiapine and valproate treatment. iScience . (2023) 26:107057. doi: 10.1016/j.isci.2023.107057
47. Beckers L, Somers V, Fraussen J. IgD (-) CD27(-) double negative (dn) b cells: origins and functions in health and disease. Immunol Lett . (2023) 255:67–76. doi: 10.1016/j.imlet.2023.03.003
48. Nevalainen T, Autio A, Kummola L, Salomaa T, Junttila I, Jylha M, et al. Cd27- igd- b cell memory subset associates with inflammation and frailty in elderly individuals but only in males. Immun Ageing . (2019) 16:19. doi: 10.1186/s12979-019-0159-6
49. Rizzo LB, Swardfager W, Maurya PK, Graiff MZ, Pedrini M, Asevedo E, et al. An immunological age index in bipolar disorder: a confirmatory factor analysis of putative immunosenescence markers and associations with clinical characteristics. Int J Methods Psychiatr Res . (2018) 27:e1614. doi: 10.1002/mpr.1614
50. Debnath M, Raison CL, Maes M, Berk M. Role of the t-cell network in psychiatric disorders. Immuno-Psychiatry: Facts Prospects . (Cham, Switzerland: Springer Nature Switzerland AG) (2021), 109–32.
51. Corsi-Zuelli F, Deakin B, de Lima M, Qureshi O, Barnes NM, Upthegrove R, et al. T regulatory cells as a potential therapeutic target in psychosis? Current challenges and future perspectives. Brain Behav Immun Health . (2021) 17:100330. doi: 10.1016/j.bbih.2021.100330
52. Wieck A, Grassi-Oliveira R, Do PC, Viola TW, Petersen LE, Porto B, et al. Toll-like receptor expression and function in type I bipolar disorder. Brain Behav Immun . (2016) 54:110–21. doi: 10.1016/j.bbi.2016.01.011
53. Modabbernia A, Taslimi S, Brietzke E, Ashrafi M. Cytokine alterations in bipolar disorder: a meta-analysis of 30 studies. Biol Psychiatry . (2013) 74:15–25. doi: 10.1016/j.biopsych.2013.01.007
54. Qiu R, Zhou L, Ma Y, Zhou L, Liang T, Shi L, et al. Regulatory t cell plasticity and stability and autoimmune diseases. Clin Rev Allergy Immunol . (2020) 58:52–70. doi: 10.1007/s12016-018-8721-0
55. Maes M, Nani JV, Noto C, Rizzo L, Hayashi M, Brietzke E. Impairments in peripheral blood t effector and t regulatory lymphocytes in bipolar disorder are associated with staging of illness and anti-cytomegalovirus igg levels. Mol Neurobiol . (2021) 58:229–42. doi: 10.1007/s12035-020-02110-1
56. Foiselle M, Lajnef M, Hamdani N, Boukouaci W, Wu CL, Naamoune S, et al. Immune cell subsets in patients with bipolar disorder or schizophrenia with history of childhood maltreatment. Brain Behav Immun . (2023) 112:42–50. doi: 10.1016/j.bbi.2023.05.015
57. Purcell SM, Wray NR, Stone JL, Visscher PM, O’Donovan MC, Sullivan PF, et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature . (2009) 460:748–52. doi: 10.1038/nature08185
58. Torsvik A, Brattbakk HR, Trentani A, Holdhus R, Stansberg C, Bartz-Johannessen CA, et al. Patients with schizophrenia and bipolar disorder display a similar global gene expression signature in whole blood that reflects elevated proportion of immature neutrophil cells with association to lipid changes. Transl Psychiatry . (2023) 13:147. doi: 10.1038/s41398-023-02442-1
59. Morgan LZ, Rollins B, Sequeira A, Byerley W, Delisi LE, Schatzberg AF, et al. Quantitative trait locus and brain expression of hla-dpa1 offers evidence of shared immune alterations in psychiatric disorders. Microarrays (Basel) . (2016) 5. doi: 10.3390/microarrays5010006
60. Gao K, Ayati M, Koyuturk M, Calabrese JR, Ganocy SJ, Kaye NM, et al. Protein biomarkers in monocytes and cd4(+) lymphocytes for predicting lithium treatment response of bipolar disorder: a feasibility study with tyramine-based signal-amplified flow cytometry. Psychopharmacol Bull . (2022) 52:8–35.
61. Ceylan MF, Tural HS, Kasak M, Senat A, Erel O. Increased prolidase activity and high blood monocyte counts in pediatric bipolar disorder. Psychiatry Res . (2019) 271:360–4. doi: 10.1016/j.psychres.2018.11.066
62. Hughes HK, Yang H, Lesh TA, Carter CS, Ashwood P. Evidence of innate immune dysfunction in first-episode psychosis patients with accompanying mood disorder. J Neuroinflammation . (2022) 19:287. doi: 10.1186/s12974-022-02648-y
63. North HF, Weissleder C, Fullerton JM, Sager R, Webster MJ, Weickert CS. A schizophrenia subgroup with elevated inflammation displays reduced microglia, increased peripheral immune cell and altered neurogenesis marker gene expression in the subependymal zone. Transl Psychiatry . (2021) 11:635. doi: 10.1038/s41398-021-01742-8
64. Jakobsson J, Bjerke M, Sahebi S, Isgren A, Ekman CJ, Sellgren C, et al. Monocyte and microglial activation in patients with mood-stabilized bipolar disorder. J Psychiatry Neurosci . (2015) 40:250–8. doi: 10.1503/jpn.140183
65. Barichello T, Giridharan VV, Bhatti G, Sayana P, Doifode T, Macedo D, et al. Inflammation as a mechanism of bipolar disorder neuroprogression. Curr Top Behav Neurosci . (2021) 48:215–37. doi: 10.1007/7854_2020_173
66. Goldsmith DR, Bekhbat M, Mehta ND, Felger JC. Inflammation-related functional and structural dysconnectivity as a pathway to psychopathology. Biol Psychiatry . (2023) 93:405–18. doi: 10.1016/j.biopsych.2022.11.003
67. Zhao NO, Topolski N, Tusconi M, Salarda EM, Busby CW, Lima C, et al. Blood-brain barrier dysfunction in bipolar disorder: molecular mechanisms and clinical implications. Brain Behav Immun Health . (2022) 21:100441. doi: 10.1016/j.bbih.2022.100441
68. Pape K, Tamouza R, Leboyer M, Zipp F. Immunoneuropsychiatry - novel perspectives on brain disorders. Nat Rev Neurol . (2019) 15:317–28. doi: 10.1038/s41582-019-0174-4
69. Rosenblat JD, Mcintyre RS. Bipolar disorder and immune dysfunction: epidemiological findings, proposed pathophysiology and clinical implications. Brain Sci . (2017) 7. doi: 10.3390/brainsci7110144
70. Cattane N, Courtin C, Mombelli E, Maj C, Mora C, Etain B, et al. Transcriptomics and mirnomics data integration in lymphoblastoid cells highlights the key role of immune-related functions in lithium treatment response in bipolar disorder. BMC Psychiatry . (2022) 22:665. doi: 10.1186/s12888-022-04286-3
71. Misiak B, Ricceri L, Sasiadek MM. Transposable elements and their epigenetic regulation in mental disorders: current evidence in the field. Front Genet . (2019) 10:580. doi: 10.3389/fgene.2019.00580
72. Fries GR, Walss-Bass C, Bauer ME, Teixeira AL. Revisiting inflammation in bipolar disorder. Pharmacol Biochem Behav . (2019) 177:12–9. doi: 10.1016/j.pbb.2018.12.006
Keywords: bipolar disorder, immune cells, Mendelian randomization study, causal association, SNP
Citation: Wang M, Wang S, Yuan G, Gao M, Zhao X, Chu Z and Gao D (2024) Causal role of immune cells in bipolar disorder: a Mendelian randomization study. Front. Psychiatry 15:1411280. doi: 10.3389/fpsyt.2024.1411280
Received: 02 April 2024; Accepted: 30 July 2024; Published: 16 August 2024.
Copyright © 2024 Wang, Wang, Yuan, Gao, Zhao, Chu and Gao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Dongmei Gao, [email protected]
Zinc and Diabetes: A Connection between Micronutrient and Metabolism.
Ahmad, R.; Shaju, R.; Atfi, A.; Razzaque, M.S. Zinc and Diabetes: A Connection between Micronutrient and Metabolism. Cells 2024, 13, 1359. https://doi.org/10.3390/cells13161359
Title: An Offline Meta Black-Box Optimization Framework for Adaptive Design of Urban Traffic Light Management Systems
Abstract: Complex urban road networks with high vehicle occupancy frequently face severe traffic congestion. Designing an effective strategy for managing multiple traffic lights plays a crucial role in managing congestion. However, most current traffic light management systems rely on human-crafted decisions, which may not adapt well to diverse traffic patterns. In this paper, we delve into two pivotal design components of the traffic light management system that can be dynamically adjusted to various traffic conditions: phase combination and phase time allocation. While numerous studies have sought an efficient strategy for managing traffic lights, most of these approaches consider a fixed traffic pattern and are limited to relatively small road networks. To overcome these limitations, we introduce a novel and practical framework that formulates the optimization of such design components as an offline meta black-box optimization problem. We then present a simple yet effective method to efficiently find a solution for this problem. In our framework, we first collect an offline meta dataset consisting of pairs of design choices and corresponding congestion measures from various traffic patterns. After collecting the dataset, we employ the Attentive Neural Process (ANP) to predict the impact of a proposed design on congestion across various traffic patterns with well-calibrated uncertainty. Finally, Bayesian optimization, with the ANP as a surrogate model, is used to find an optimal design for unseen traffic patterns through limited online simulations. Our experimental results show that our method outperforms state-of-the-art baselines on complex road networks in terms of the number of waiting vehicles. Notably, deploying our method in a real-world traffic system improved traffic throughput by 4.80% compared with the original strategy.
Comments: 12 pages, 7 figures, 10 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Randomization is an essential component of experimental design in general and clinical trials in particular. ... Several misconceptions about the role of randomization and balance in clinical trials were documented and discussed by Senn. One common misunderstanding is that balance of prognostic covariates is necessary for valid inference.
Single-case experimental designs are rapidly growing in popularity. This popularity needs to be accompanied by transparent and well-justified methodological and statistical decisions. Appropriate ...
Random sampling (also called probability sampling or random selection) is a way of selecting members of a population to be included in your study. In contrast, random assignment is a way of sorting the sample participants into control and experimental groups. While random sampling is used in many types of studies, random assignment is only used ...
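The distinction is easy to demonstrate in code. A minimal base-R sketch (the population size and group labels are made up for illustration):

```r
set.seed(42)
population <- 1:10000                 # hypothetical population IDs

# Random sampling: which members of the population are studied.
sample_ids <- sample(population, 100)

# Random assignment: which group each sampled participant joins.
groups <- sample(rep(c("control", "treatment"), each = 50))
table(groups)   # 50 control, 50 treatment
```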
Single-case experimental designs (SCEDs) involve repeated measurements of the dependent variable under different experimental conditions within a single case, for example, a patient or a classroom ...
A general review is given of the role of randomization in experimental design. Three objectives are distinguished: the avoidance of bias, the establishment of a secure base for the estimation of ... The usual explicit formulation, stemming from Neyman (1923), takes ...
The idea sounds so simple that defining it becomes almost a joke: randomisation is "putting participants into the treatment groups randomly". If only it were that simple. Randomisation can be a minefield, and not everyone understands what exactly it is or why they are doing it. A key feature of a randomised controlled trial is that it is ...
We introduce the statistical design of experiments and put the topic into the larger context of scientific experimentation. We give a non-technical discussion of some key ideas of experimental design, including the role of randomization, replication, and the basic idea of blocking for increasing precision and power.
Various research designs can be used to acquire scientific medical evidence. The randomized controlled trial (RCT) has been recognized as the most credible research design for investigations of the clinical effectiveness of new medical interventions [1, 2]. Evidence from RCTs is widely used as a basis for regulatory submissions requesting marketing authorization for new drugs ...
The approximate randomization theory associated with analysis of covariance is outlined and conditionality considerations are used to explain the limited role of randomization in experiments with very small numbers of experimental units. The relation between the so-called design-based and model-based analyses is discussed.
Permuted block randomization is a way to randomly allocate a participant to a treatment group while keeping a balance across treatment groups; each "block" has a specified number of randomly ordered treatment assignments, as sketched below. Stratified random sampling, by contrast, is useful when you can subdivide the population into distinct strata.
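A permuted-block scheme is short enough to sketch directly: each block contains an equal number of each treatment label in random order, so group sizes stay balanced at the end of every block. An illustrative R sketch (block size 4 and two arms are assumptions, not fixed by the snippet above):

```r
set.seed(1)
permuted_blocks <- function(n_blocks, block_size = 4, arms = c("A", "B")) {
  # Each block holds an equal count of each arm, randomly permuted.
  one_block <- rep(arms, each = block_size / length(arms))
  unlist(lapply(seq_len(n_blocks), function(i) sample(one_block)))
}
assignments <- permuted_blocks(n_blocks = 5)   # 20 assignments
table(assignments)                             # balanced: 10 A, 10 B
```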
Statistics textbooks usually introduce the core aspects of experimental design as three key elements, four principles, and a set of design types; these run through the whole of scientific research design and determine the overall success of the research. This article discusses the principle of randomization, which is one of the four principles ...
Randomization tools are typically included in study design software; for in vivo research, the most noteworthy example is the NC3Rs' Experimental Design Assistant (www.eda.nc3rs.org.uk). This freely available online resource allows researchers to generate and share a spreadsheet with the randomized allocation report after the study has been designed ...
In a randomized experimental design, objects or individuals are randomly assigned (by chance) to an experimental group. Using randomization is the most reliable method of creating homogeneous treatment groups, without involving any potential biases or judgments. There are several variations of randomized experimental designs, two of which are ...
Let's delve into the pivotal role randomization plays and its overarching importance in maintaining the rigor of experimental endeavors. Eliminating Bias: ... The central tenets of experimental design—control, randomization, replication—though fundamental, are being complemented by sophisticated techniques that ensure richer insights and ...
Randomization as a method of experimental control has been used extensively in human clinical trials and other biological experiments. It prevents selection bias and insures against accidental bias. ... Thus, the ideal way of balancing covariates among groups is to apply sound randomization at the design stage of clinical research ...
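A small simulation makes the covariate-balance point concrete: repeatedly randomize subjects with a skewed baseline covariate into two arms, and the difference in arm means is centered on zero (all numbers below are illustrative):

```r
set.seed(7)
covariate <- rlnorm(200, meanlog = 3.2, sdlog = 0.15)  # skewed baseline

diffs <- replicate(1000, {
  arm <- sample(rep(c("A", "B"), each = 100))  # fresh randomization
  mean(covariate[arm == "A"]) - mean(covariate[arm == "B"])
})
c(mean = mean(diffs), sd = sd(diffs))  # mean near 0: balanced on average
```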
Statistical design of experiments plays a prominent role in ensuring internal validity, and we briefly discuss the main ideas before providing the technical details and an application to our example in the subsequent sections. ... In a completely randomized design, each experimental unit has the same chance of being subjected to any of the ...
How to analyze a blocked design in JMP (Method 2): open the Fit Model tab; enter the y-variable; add treatment, block, and (if desired) treatment × block to "Effects"; click on block in the effects box and change its attribute to random; change the Method option to EMS (not REML).
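For readers working outside JMP, the same analysis with block treated as a random effect can be approximated in R. The sketch below assumes a data frame `d` with columns `y`, `treatment`, and `block`; the lme4 route is one common choice, not the snippet's own method:

```r
library(lme4)

# Treatment as a fixed effect, block as a random intercept -- the R
# analogue of changing the block attribute to "random" in JMP.
fit <- lmer(y ~ treatment + (1 | block), data = d)
summary(fit)

# Classical randomized-complete-block ANOVA in base R, for comparison.
fit_aov <- aov(y ~ treatment + Error(block), data = d)
summary(fit_aov)
```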
The randomized design ensures that the conclusions drawn are robust and reliable, providing a solid foundation for the development of policies and initiatives. As environmental concerns continue to mount, the role of reliable experimental designs like CRD in facilitating meaningful research and informed policy-making cannot be overstated.
Randomizing an experiment helps you obtain valid cause-and-effect relationships between the variables. It ensures that selection is spread across genders, castes, races, and other characteristics, so that the groups are not too different from each other. Researchers control the values of the explanatory variable with a randomization procedure.
impossible. Experiments without random assignment should nonetheless be understood to be experiments because they possess most of the strengths of a classical experimental design. Foregoing the benefits of random assignment of respondents is not a trivial loss because random assignment is an important contributor to the strong internal validity ...
Although additional large-scale randomized clinical trials are needed to establish zinc's clinical utility further, efforts should be made to increase awareness of its potential benefits on human health and disease. ... By comprehensively analyzing the available evidence from both human and experimental studies, the critical roles of zinc in ...