
Statistical Treatment of Data – Explained & Example


  • By DiscoverPhDs
  • September 8, 2020

Statistical Treatment of Data in Research

‘Statistical treatment’ is when you apply a statistical method to a data set to draw meaning from it. Statistical treatment can involve either descriptive statistics, which summarise and describe the characteristics of a data set and the relationships between its variables, or inferential statistics, which test hypotheses by making inferences from the collected data.

Introduction to Statistical Treatment in Research

Every research student, regardless of whether they are a biologist, computer scientist or psychologist, must have a basic understanding of statistical treatment if their study is to be reliable.

This is because designing experiments and collecting data are only a small part of conducting research. The other components, which are often less well understood by new researchers, are the analysis, interpretation and presentation of the data. These are just as important, if not more so, as this is where meaning is extracted from the study.

What is Statistical Treatment of Data?

Statistical treatment of data is when you apply some form of statistical method to a data set to transform it from a group of meaningless numbers into meaningful output.

Statistical treatment of data involves the use of statistical methods such as:

  • regression,
  • conditional probability,
  • standard deviation and
  • distribution range.

These statistical methods allow us to investigate the statistical relationships within the data and to identify possible errors in the study.

In addition to being able to identify trends, statistical treatment also allows us to organise and process our data in the first place. This is because when carrying out statistical analysis of our data, it is generally more useful to draw several conclusions for each subgroup within our population than to draw a single, more general conclusion for the whole population. However, to do this, we need to be able to classify the population into different subgroups so that we can later break down our data in the same way before analysing it.

Statistical Treatment Example – Quantitative Research

For an example of statistical treatment of data, consider a medical study investigating the effect of a drug on the human population. As the drug can affect different people in different ways based on parameters such as gender, age and race, the researchers would want to group the data into subgroups based on these parameters to determine how each one affects the drug's effectiveness. Categorising the data in this way is an example of basic statistical treatment.
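As a rough illustration, the following Python sketch (the column names and effectiveness scores are hypothetical) groups trial data into subgroups and summarises each one, which is the most basic form of statistical treatment:

    # Hypothetical trial data: subgroup labels and an effectiveness score.
    import pandas as pd

    trial = pd.DataFrame({
        "gender":   ["F", "M", "F", "M", "F", "M"],
        "age_band": ["18-40", "18-40", "41-65", "41-65", "65+", "65+"],
        "response": [0.82, 0.75, 0.64, 0.58, 0.47, 0.52],
    })

    # Summarise effectiveness for each subgroup rather than one overall figure
    print(trial.groupby("gender")["response"].agg(["mean", "std", "count"]))
    print(trial.groupby("age_band")["response"].agg(["mean", "std", "count"]))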

Types of Errors

A fundamental part of statistical treatment is using statistical methods to identify possible outliers and errors. No matter how careful we are, all experiments are subject to inaccuracies resulting from two types of errors: systematic errors and random errors.

Systematic errors are errors associated with either the equipment being used to collect the data or with the method in which it is used. Random errors are errors that occur unknowingly or unpredictably in the experimental configuration, such as internal deformations within specimens or small voltage fluctuations in measurement instruments.

These experimental errors can, in turn, lead to two types of conclusion errors: type I errors and type II errors. A type I error is a false positive, which occurs when a researcher rejects a true null hypothesis. A type II error is a false negative, which occurs when a researcher fails to reject a false null hypothesis.
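To make the type I error concrete, here is a small simulation sketch in Python (the sample sizes and seed are arbitrary): when the null hypothesis is true, a test at a 0.05 significance level still rejects it in roughly 5% of repeated experiments.

    # Both samples come from the same population, so every rejection is a
    # false positive (type I error). Expect a rate close to alpha = 0.05.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    alpha, trials, false_positives = 0.05, 2000, 0
    for _ in range(trials):
        a = rng.normal(0, 1, 30)
        b = rng.normal(0, 1, 30)
        if stats.ttest_ind(a, b).pvalue < alpha:
            false_positives += 1
    print(f"Observed type I error rate: {false_positives / trials:.3f}")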


Research Paper Statistical Treatment of Data: A Primer

We can all agree that analyzing and presenting data effectively in a research paper is critical, yet often challenging.

This primer on statistical treatment of data will equip you with the key concepts and procedures to accurately analyze and clearly convey research findings.

You'll discover the fundamentals of statistical analysis and data management, the common quantitative and qualitative techniques, how to visually represent data, and best practices for writing the results - all framed specifically for research papers.

If you are curious about how AI can help you with statistical analysis for research, check out Hepta AI.

Introduction to Statistical Treatment in Research

Statistical analysis is a crucial component of both quantitative and qualitative research. Properly treating data enables researchers to draw valid conclusions from their studies. This primer provides an introductory guide to fundamental statistical concepts and methods for manuscripts.

Understanding the Importance of Statistical Treatment

Careful statistical treatment demonstrates the reliability of results and ensures findings are grounded in robust quantitative evidence. From determining appropriate sample sizes to selecting accurate analytical tests, statistical rigor adds credibility. Both quantitative and qualitative papers benefit from precise data handling.

Objectives of the Primer

This primer aims to equip researchers with best practices for:

Statistical tools to apply during different research phases

Techniques to manage, analyze, and present data

Methods to demonstrate the validity and reliability of measurements

By covering fundamental concepts ranging from descriptive statistics to measurement validity, it enables both novice and experienced researchers to incorporate proper statistical treatment.

Navigating the Primer: Key Topics and Audience

The primer spans introductory topics including:

Research planning and design

Data collection, management, analysis

Result presentation and interpretation

While useful for researchers at any career stage, earlier-career scientists with limited statistical exposure will find it particularly valuable as they prepare manuscripts.

How do you write a statistical method in a research paper?

Statistical methods are a critical component of research papers, allowing you to analyze, interpret, and draw conclusions from your study data. When writing the statistical methods section, you need to provide enough detail so readers can evaluate the appropriateness of the methods you used.

Here are some key things to include when describing statistical methods in a research paper:

Type of Statistical Tests Used

Specify the types of statistical tests performed on the data, including:

Parametric vs nonparametric tests

Descriptive statistics (means, standard deviations)

Inferential statistics (t-tests, ANOVA, regression, etc.)

Statistical significance level (often p < 0.05)

For example: We used t-tests and one-way ANOVA to compare means across groups, with statistical significance set at p < 0.05.
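As a minimal sketch of how such an analysis might be run, the snippet below applies an independent-samples t-test and a one-way ANOVA to hypothetical group scores using scipy:

    from scipy import stats

    group_a = [78, 74, 81, 79, 76]
    group_b = [71, 69, 74, 70, 72]
    group_c = [65, 68, 66, 70, 64]

    t, p = stats.ttest_ind(group_a, group_b)          # compare two means
    print(f"t-test: t = {t:.2f}, p = {p:.4f}")

    f, p = stats.f_oneway(group_a, group_b, group_c)  # compare three means
    print(f"ANOVA: F = {f:.2f}, p = {p:.4f}")         # significant if p < 0.05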

Analysis of Subgroups

If you examined subgroups or additional variables, describe the methods used for these analyses.

For example: We stratified data by gender and used chi-square tests to analyze differences between subgroups.
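A sketch of that stratified analysis, assuming hypothetical counts of an outcome by gender, might look like this:

    from scipy.stats import chi2_contingency

    #        improved  not improved
    table = [[30, 20],   # female
             [22, 28]]   # male
    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p:.4f}")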

Software and Versions

List any statistical software packages used for analysis, including version numbers. Common programs include SPSS, SAS, R, and Stata.

For example: Data were analyzed using SPSS version 25 (IBM Corp, Armonk, NY).

The key is to give readers enough detail to assess the rigor and appropriateness of your statistical methods. The methods should align with your research aims and design. Keep explanations clear and concise using consistent terminology throughout the paper.

What are the 5 statistical treatments in research?

The five most common statistical treatments used in academic research papers include:

Mean

The mean, or average, is used to describe the central tendency of a dataset. It provides a single value that represents the middle of a distribution of numbers. Calculating means allows researchers to characterize typical observations within a sample.

Standard Deviation

Standard deviation measures the amount of variability in a dataset. A low standard deviation indicates observations are clustered closely around the mean, while a high standard deviation signifies the data is more spread out. Reporting standard deviations helps readers contextualize means.
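For instance, a mean and its standard deviation can be computed and reported together, as in this short sketch (the values are hypothetical; note ddof=1, which gives the sample rather than the population standard deviation):

    import numpy as np

    scores = np.array([12.1, 11.8, 12.4, 13.0, 11.5, 12.7])
    print(f"M = {scores.mean():.2f}, SD = {scores.std(ddof=1):.2f}")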

Regression Analysis

Regression analysis models the relationship between independent and dependent variables. It generates an equation that predicts changes in the dependent variable based on changes in the independent variables. Regressions are useful for hypothesizing causal connections between variables.
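A minimal sketch of a simple linear regression, using hypothetical dose and response values and the statsmodels package:

    import numpy as np
    import statsmodels.api as sm

    dose = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])        # independent variable
    response = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])  # dependent variable

    X = sm.add_constant(dose)           # include an intercept term
    model = sm.OLS(response, X).fit()
    print(model.params)                 # fitted intercept and slope
    print(f"R-squared: {model.rsquared:.3f}")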

Hypothesis Testing

Hypothesis testing evaluates assumptions about population parameters based on statistics calculated from a sample. Common hypothesis tests include t-tests, ANOVA, and chi-squared tests. These quantify the likelihood that observed differences are due to chance.

Sample Size Determination

Sample size calculations identify the minimum number of observations needed to detect effects of a given size at a desired statistical power. Appropriate sampling ensures studies can uncover true relationships within the constraints of resource limitations.
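As an illustrative sketch, statsmodels can solve for the minimum group size needed to detect a medium effect (Cohen's d = 0.5, an assumed value) at alpha = 0.05 with 80% power:

    from statsmodels.stats.power import TTestIndPower

    n = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
    print(f"Minimum sample size per group: {n:.0f}")  # roughly 64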

These five statistical analysis methods form the backbone of most quantitative research processes. Correct application allows researchers to characterize data trends, model predictive relationships, and make probabilistic inferences regarding broader populations. Expertise in these techniques is fundamental for producing valid, reliable, and publishable academic studies.

How do you know what statistical treatment to use in research?

The selection of appropriate statistical methods for the treatment of data in a research paper depends on three key factors:

The Aim and Objective of the Study

The aim and objectives that the study seeks to achieve will determine the type of statistical analysis required.

Descriptive research presenting characteristics of the data may only require descriptive statistics like measures of central tendency (mean, median, mode) and dispersion (range, standard deviation).

Studies aiming to establish relationships or differences between variables need inferential statistics like correlation, t-tests, ANOVA, regression etc.

Predictive modeling research requires methods like regression, discriminant analysis, logistic regression etc.

Thus, clearly identifying the research purpose and objectives is the first step in planning appropriate statistical treatment.

Type and Distribution of Data

The type of data (categorical, numerical) and its distribution (normal, skewed) also guide the choice of statistical techniques.

Parametric tests have assumptions related to normality and homogeneity of variance.

Non-parametric methods are distribution-free and better suited for non-normal or categorical data.

Testing data distribution and characteristics is therefore vital.
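One common check is the Shapiro-Wilk test, sketched below on hypothetical measurements; a small p-value suggests the data depart from normality and a non-parametric test may be the safer choice:

    from scipy import stats

    data = [4.2, 5.1, 4.8, 5.5, 4.9, 5.0, 4.7, 5.3, 4.6, 5.2]
    stat, p = stats.shapiro(data)
    if p < 0.05:
        print("Evidence of non-normality: consider a non-parametric test")
    else:
        print("No evidence against normality: a parametric test may be suitable")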

Nature of Observations

Statistical methods also differ based on whether the observations are paired or unpaired.

Analyzing changes within one group requires paired tests like paired t-test, Wilcoxon signed-rank test etc.

Comparing between two or more independent groups needs unpaired tests like independent t-test, ANOVA, Kruskal-Wallis test etc.

Thus the nature of observations is pivotal in selecting suitable statistical analyses.
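The distinction is easy to see in code; this sketch (with hypothetical blood-pressure readings) applies a paired t-test to one group measured twice and an unpaired t-test to two independent groups:

    from scipy import stats

    # Same participants measured before and after treatment: paired test
    before = [140, 138, 150, 148, 135]
    after = [132, 135, 141, 140, 130]
    print(stats.ttest_rel(before, after))

    # Two independent groups: unpaired test
    group1 = [140, 138, 150, 148, 135]
    group2 = [129, 134, 138, 141, 127]
    print(stats.ttest_ind(group1, group2))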

In summary, clearly defining the research objectives, testing the collected data, and understanding the observational units guides proper statistical treatment and interpretation.

What are statistical techniques in a research paper?

Statistical methods are essential tools in scientific research papers. They allow researchers to summarize, analyze, interpret and present data in meaningful ways.

Some key statistical techniques used in research papers include:

Descriptive statistics: These provide simple summaries of the sample and the measures. Common examples include measures of central tendency (mean, median, mode), measures of variability (range, standard deviation) and graphs (histograms, pie charts).

Inferential statistics: These help make inferences and predictions about a population from a sample. Common techniques include estimation of parameters, hypothesis testing, correlation and regression analysis.

Analysis of variance (ANOVA): This technique allows researchers to compare means across multiple groups and determine statistical significance.

Factor analysis: This technique identifies underlying relationships between variables and latent constructs. It allows reducing a large set of variables into fewer factors.

Structural equation modeling: This technique estimates causal relationships using both latent and observed factors. It is widely used for testing theoretical models in social sciences.

Proper statistical treatment and presentation of data are crucial for the integrity of any quantitative research paper. Statistical techniques help establish validity, account for errors, test hypotheses, build models and derive meaningful insights from the research.

Fundamental Concepts and Data Management

Exploring basic statistical terms.

Understanding key statistical concepts is essential for effective research design and data analysis. This includes defining key terms like:

Statistics: The science of collecting, organizing, analyzing, and interpreting numerical data to draw conclusions or make predictions.

Variables: Characteristics or attributes of the study participants that can take on different values.

Measurement: The process of assigning numbers to variables based on a set of rules.

Sampling: Selecting a subset of a larger population to estimate characteristics of the whole population.

Data types: Quantitative (numerical) or qualitative (categorical) data.

Descriptive vs. inferential statistics: Descriptive statistics summarize data, while inferential statistics allow drawing conclusions about the larger population from the sample.

Ensuring Validity and Reliability in Measurement

When selecting measurement instruments, it is critical they demonstrate:

Validity: The extent to which the instrument measures what it intends to measure.

Reliability: The consistency of measurement over time and across raters.

Researchers should choose instruments aligned to their research questions and study methodology.

Data Management Essentials

Proper data management requires:

Ethical collection procedures respecting autonomy, justice, beneficence and non-maleficence.

Handling missing data through deletion, imputation or modeling procedures.

Data cleaning by identifying and fixing errors, inconsistencies and duplicates.

Data screening via visual inspection and statistical methods to detect anomalies.
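A brief pandas sketch of these steps, using a hypothetical data frame with a duplicate record, a missing score, and an implausible value:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "id":    [1, 2, 2, 3, 4],
        "score": [85.0, np.nan, np.nan, 990.0, 78.0],
    })

    df = df.drop_duplicates(subset="id")             # drop duplicate records
    df.loc[df["score"] > 100, "score"] = np.nan      # screen implausible values
    df["score"] = df["score"].fillna(df["score"].median())  # simple imputation
    print(df)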

Data Management Techniques and Ethical Considerations

Ethical data management includes:

Obtaining informed consent from all participants.

Anonymization and encryption to protect privacy.

Secure data storage and transfer procedures.

Responsible use of statistical tools free from manipulation or misrepresentation.

Adhering to ethical guidelines preserves public trust in the integrity of research.

Statistical Methods and Procedures

This section provides an introduction to key quantitative analysis techniques and guidance on when to apply them to different types of research questions and data.

Descriptive Statistics and Data Summarization

Descriptive statistics summarize and organize data characteristics such as central tendency, variability, and distributions. Common descriptive statistical methods include:

Measures of central tendency (mean, median, mode)

Measures of variability (range, interquartile range, standard deviation)

Graphical representations (histograms, box plots, scatter plots)

Frequency distributions and percentages

These methods help describe and summarize the sample data so researchers can spot patterns and trends.

Inferential Statistics for Generalizing Findings

While descriptive statistics summarize sample data, inferential statistics help generalize findings to the larger population. Common techniques include:

Hypothesis testing with t-tests, ANOVA

Correlation and regression analysis

Nonparametric tests

These methods allow researchers to draw conclusions and make predictions about the broader population based on the sample data.

Selecting the Right Statistical Tools

Choosing the appropriate analyses involves assessing:

The research design and questions asked

Type of data (categorical, continuous)

Data distributions

Statistical assumptions required

Matching the correct statistical tests to these elements helps ensure accurate results.

Statistical Treatment of Data for Quantitative Research

For quantitative research, common statistical data treatments include:

Testing data reliability and validity

Checking assumptions of statistical tests

Transforming non-normal data

Identifying and handling outliers

Applying appropriate analyses for the research questions and data type

Examples and case studies help demonstrate correct application of statistical tests.
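As one small example, the sketch below (hypothetical, right-skewed data) applies a log transform and flags outliers falling outside 1.5 times the interquartile range:

    import numpy as np

    values = np.array([1.2, 1.5, 1.1, 2.0, 1.8, 1.4, 9.5])

    logged = np.log(values)    # transform positively skewed data

    # Flag values outside 1.5 * IQR
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]
    print(outliers)            # flags 9.5; investigate before excluding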

Approaches to Qualitative Data Analysis

Qualitative data is analyzed through methods like:

Thematic analysis

Content analysis

Discourse analysis

Grounded theory

These help researchers discover concepts and patterns within non-numerical data to derive rich insights.

Data Presentation and Research Method

Crafting effective visuals for data presentation.

When presenting analyzed results and statistics in a research paper, well-designed tables, graphs, and charts are key for clearly showcasing patterns in the data to readers. Adhering to formatting standards like APA helps ensure professional data presentation. Consider these best practices:

Choose the appropriate visual type based on the type of data and relationship being depicted. For example, bar charts for comparing categorical data, line graphs to show trends over time.

Label the x-axis, y-axis, and legends clearly. Include informative captions.

Use consistent, readable fonts and sizing. Avoid clutter with unnecessary elements. White space can aid readability.

Order data logically, for example from largest to smallest values or chronologically.

Include clear statistical notations, like error bars, where applicable.

Following academic standards for visuals lends credibility while making interpretation intuitive for readers.
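As a sketch of these practices with matplotlib (the group names, means, and standard deviations are hypothetical), a labelled bar chart with error bars might be produced like this:

    import matplotlib.pyplot as plt

    groups = ["Control", "Drug A", "Drug B"]
    means = [71, 78, 83]
    sds = [4.1, 3.2, 3.8]

    fig, ax = plt.subplots()
    ax.bar(groups, means, yerr=sds, capsize=4)   # error bars: +/- 1 SD
    ax.set_xlabel("Treatment group")
    ax.set_ylabel("Mean score")
    ax.set_title("Mean scores by treatment group (error bars: 1 SD)")
    plt.tight_layout()
    plt.show()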

Writing the Results Section with Clarity

When writing the quantitative Results section, aim for clarity by balancing statistical reporting with interpretation of findings. Consider this structure:

Open with an overview of the analysis approach and measurements used.

Break down results by logical subsections for each hypothesis, construct measured etc.

Report exact statistics first, followed by interpretation of their meaning. For example, “Participants exposed to the intervention had significantly higher average scores (M=78, SD=3.2) compared to controls (M=71, SD=4.1), t(115)=3.42, p = 0.001. This suggests the intervention was highly effective for increasing scores.”

Use a consistent verb tense and formal, scientific language.

Include tables/figures where they aid understanding or visualization.

Writing results clearly gives readers deeper context around statistical findings.

Highlighting Research Method and Design

With a results section full of statistics, it's vital to communicate key aspects of the research method and design. Consider including:

Brief overview of study variables, materials, apparatus used. Helps reproducibility.

Descriptions of study sampling techniques, data collection procedures. Supports transparency.

Explanations around approaches to measurement, data analysis performed. Bolsters methodological rigor.

Noting control variables, attempts to limit biases etc. Demonstrates awareness of limitations.

Covering these methodological details shows readers the care taken in designing the study and analyzing the results obtained.

Acknowledging Limitations and Addressing Biases

Honestly recognizing methodological weaknesses and limitations goes a long way in establishing credibility within the published discussion section. Consider transparently noting:

Measurement errors and biases that may have impacted findings.

Limitations around sampling methods that constrain generalizability.

Caveats related to statistical assumptions, analysis techniques applied.

Attempts made to control/account for biases and directions for future research.

Rather than detracting value, acknowledging limitations demonstrates academic integrity regarding the research performed. It also gives readers deeper insight into interpreting the reported results and findings.

Conclusion: Synthesizing Statistical Treatment Insights

Recap of statistical treatment fundamentals.

Statistical treatment of data is a crucial component of high-quality quantitative research. Proper application of statistical methods and analysis principles enables valid interpretations and inferences from study data. Key fundamentals covered include:

Descriptive statistics to summarize and describe the basic features of study data

Inferential statistics to make judgments of the probability and significance based on the data

Using appropriate statistical tools aligned to the research design and objectives

Following established practices for measurement techniques, data collection, and reporting

Adhering to these core tenets ensures research integrity and allows findings to withstand scientific scrutiny.

Key Takeaways for Research Paper Success

When incorporating statistical treatment into a research paper, keep these best practices in mind:

Clearly state the research hypothesis and variables under examination

Select reliable and valid quantitative measures for assessment

Determine appropriate sample size to achieve statistical power

Apply correct analytical methods suited to the data type and distribution

Comprehensively report methodology procedures and statistical outputs

Interpret results in context of the study limitations and scope

Following these guidelines will bolster confidence in the statistical treatment and strengthen the research quality overall.

Encouraging Continued Learning and Application

As statistical techniques continue advancing, it is imperative for researchers to actively further their statistical literacy. Regularly reviewing new methodological developments and learning advanced tools will augment analytical capabilities. Persistently putting enhanced statistical knowledge into practice through research projects and manuscript preparations will cement competencies. Statistical treatment mastery is a journey requiring persistent effort, but one that pays dividends in research proficiency.


Statistical Analysis in Research: Meaning, Methods and Types


The scientific method is an empirical approach to acquiring new knowledge by making skeptical observations and analyses to develop a meaningful interpretation. It is the basis of research and the primary pillar of modern science. Researchers seek to understand the relationships between factors associated with the phenomena of interest. In some cases, research works with vast amounts of data, making it difficult to observe or manipulate each data point. As a result, statistical analysis in research becomes a means of evaluating relationships and interconnections between variables, with tools and analytical techniques for working with large data sets. Because researchers can use statistical power analysis to assess the probability of detecting an effect in such an investigation, the method is relatively reliable. Hence, statistical analysis simplifies research by focusing on the quantifiable aspects of phenomena.

What is Statistical Analysis in Research? A Simplified Definition

Statistical analysis uses quantitative data to investigate patterns and relationships in order to understand real-life and simulated phenomena. The approach is a key analytical tool in various fields, including academia, business, government, and science in general. This definition implies that the primary focus of the scientific method is quantitative research. Notably, the investigator targets the constructs developed from general concepts, as researchers can quantify their hypotheses and present their findings in simple statistics.

When a business needs to learn how to improve its product, it collects statistical data about the production line and customer satisfaction. Qualitative data is valuable and often identifies the most common themes in the stakeholders' responses. The quantitative data, on the other hand, establishes a level of importance, comparing the themes based on their criticality to the affected persons. For instance, descriptive statistics highlight tendency, frequency, variation, and position information. While the mean shows the average response for a certain aspect, the variance indicates how spread out the responses are. In any case, statistical analysis creates simplified concepts used to understand the phenomenon under investigation. It is also a key component in academia as the primary approach to data representation, especially in research projects, term papers and dissertations.

Most Useful Statistical Analysis Methods in Research

Using statistical analysis methods in research is inevitable, especially in academic assignments, projects, and term papers. It is always advisable to seek assistance from your professor or a statistical expert before you start your academic project or write the statistical analysis section of a research paper. Consulting an expert when developing a topic for your thesis or a short mid-term assignment increases your chances of getting a better grade. Most importantly, it improves your understanding of research methods. An expert can also help you select the most suitable statistical analysis method for your thesis, which in turn influences the choice of data and type of study.

Descriptive Statistics

Descriptive statistics is a statistical method summarizing quantitative figures to understand critical details about the sample and population. A descriptive statistic is a figure that quantifies a specific aspect of the data. For instance, instead of analyzing the behavior of a thousand students individually, a researcher can identify the most common actions among them. By doing this, the researcher utilizes statistical analysis in research, particularly descriptive statistics.

  • Measures of central tendency. These are the mean, median, and mode: averages denoting specific data points. They assess the centrality of the probability distribution, hence the name, and describe the data in relation to the center.
  • Measures of frequency. These statistics document the number of times an event happens. They include frequency, count, ratios, rates, and proportions. Measures of frequency can also show how often a score occurs.
  • Measures of dispersion/variation. These descriptive statistics assess the intervals between the data points. The objective is to view the spread or disparity between the specific inputs. Measures of variation include the standard deviation, variance, and range. They indicate how the spread may affect other statistics, such as the mean.
  • Measures of position. Sometimes researchers investigate relationships between scores. Measures of position, such as percentiles, quartiles, and ranks, demonstrate this association. They are often useful when comparing the data to normalized information.
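For measures of position in particular, a short sketch with hypothetical test scores shows how quartiles and a percentile rank can be computed:

    import numpy as np

    scores = np.array([55, 62, 68, 71, 74, 78, 81, 85, 90, 96])
    q1, median, q3 = np.percentile(scores, [25, 50, 75])
    print(f"Q1 = {q1}, median = {median}, Q3 = {q3}")

    rank = (scores < 81).mean() * 100    # share of scores below 81
    print(f"A score of 81 sits at about the {rank:.0f}th percentile")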

Inferential Statistics

Inferential statistics are critical in the statistical analysis of quantitative research. This approach uses statistical tests to draw conclusions about the population. Examples of inferential statistics include t-tests, F-tests, ANOVA, p-values, the Mann-Whitney U test, and the Wilcoxon W test.
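For instance, a non-parametric comparison of two independent samples can be sketched with scipy's Mann-Whitney U test (the ratings below are hypothetical):

    from scipy.stats import mannwhitneyu

    group1 = [3, 5, 4, 6, 7, 5]
    group2 = [8, 9, 7, 10, 9, 8]
    u, p = mannwhitneyu(group1, group2)
    print(f"U = {u}, p = {p:.4f}")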

Common Statistical Analysis in Research Types

Although inferential and descriptive statistics can be classified as types of statistical analysis in research, they are mostly considered analytical methods. Types of research are distinguishable by the differences in the methodology employed in analyzing, assembling, classifying, manipulating, and interpreting data. The categories may also depend on the type of data used.

Predictive Analysis

Predictive research analyzes past and present data to assess trends and predict future events. An excellent example of predictive analysis is a market survey that seeks to understand customers’ spending habits to weigh the possibility of a repeat or future purchase. Such studies assess the likelihood of an action based on trends.

Prescriptive Analysis

On the other hand, a prescriptive analysis targets likely courses of action. It’s decision-making research designed to identify optimal solutions to a problem. Its primary objective is to test or assess alternative measures.

Causal Analysis

Causal research investigates the explanation behind the events. It explores the relationship between factors for causation. Thus, researchers use causal analyses to analyze root causes, possible problems, and unknown outcomes.

Mechanistic Analysis

This type of research investigates the mechanism of action. Instead of focusing only on the causes or possible outcomes, researchers may seek an understanding of the processes involved. In such cases, they use mechanistic analyses to document, observe, or learn the mechanisms involved.

Exploratory Data Analysis

By contrast, an exploratory study is extensive, with a wider scope and minimal limitations. This type of research seeks insight into the topic of interest. An exploratory researcher does not try to generalize or predict relationships. Instead, they look for information about the subject before conducting an in-depth analysis.

The Importance of Statistical Analysis in Research

Statistical analysis provides critical information for decision-making. Decision-makers require past trends and predictive assumptions to inform their actions. In most cases, the raw data is too complex or lacks meaningful inferences on its own. Statistical tools for analyzing such details help save time and money by deriving only valuable information for assessment. An excellent example of statistical analysis in research is a randomized controlled trial (RCT) for a Covid-19 vaccine. You can download a sample of such a document online to understand the significance such analyses have for the stakeholders. A vaccine RCT assesses the effectiveness, side effects, duration of protection, and other benefits. Hence, statistical analysis in research is a helpful tool for understanding data.


StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan-.

Exploratory Data Analysis: Frequencies, Descriptive Statistics, Histograms, and Boxplots

Jacob Shreffler; Martin R. Huecker

Last Update: November 3, 2023.

Definition/Introduction

Researchers must utilize exploratory data techniques to present findings to a target audience and create appropriate graphs and figures. Researchers can determine if outliers exist, data are missing, and statistical assumptions will be upheld by understanding data. Additionally, it is essential to comprehend these data when describing them in conclusions of a paper, in a meeting with colleagues invested in the findings, or while reading others’ work.

Issues of Concern

This comprehension begins with exploring these data through the outputs discussed in this article. Individuals who do not conduct research must still comprehend new studies, and knowledge of fundamentals in analyzing data and interpretation of histograms and boxplots facilitates the ability to appraise recent publications accurately. Without this familiarity, decisions could be implemented based on inaccurate delivery or interpretation of medical studies.

Frequencies and Descriptive Statistics

Effective presentation of study results, in presentation or manuscript form, typically starts with frequencies and descriptive statistics (ie, means, medians, standard deviations). One can get a better sense of the variables by examining these data to determine whether a balanced and sufficient research design exists. Frequencies also inform on missing data and give a sense of outliers (discussed below).

Luckily, software programs are available to conduct exploratory data analysis. For this chapter, we will be examining the following research question.

RQ: Are there differences in drug life (length of effect) for Drug 23 based on the administration site?

A more precise hypothesis could be: Is drug 23 longer-lasting when administered via site A compared to site B?

To address this research question, exploratory data analysis is conducted. First, it is essential to start with the frequencies of the variables. To keep things simple, only the variables of minutes (drug life effect) and administration site (A vs B) are included. See Figure 1 for the frequency outputs.

Figure 1 shows that the administration site appears to be a balanced design with 50 individuals in each group. The excerpt for minutes frequencies is the bottom portion of Figure 1 and shows how many cases fell into each time frame, with the cumulative percent on the right-hand side. In examining Figure 1, one suspiciously low measurement (135) was observed, considering the time variables. If a data point seems inaccurate, a researcher should find this case and confirm whether it was an entry error. For the sake of this review, the authors state that this was an entry error and should have been entered as 535, not 135. Had the analysis occurred without checking this, the data analysis, results, and conclusions would have been invalid. When finding any entry errors and determining how groups are balanced, potential missing data is explored. If not responsibly evaluated, missing values can nullify results.

After replacing the incorrect 135 with 535, descriptive statistics, including the mean, median, mode, minimum/maximum scores, and standard deviation, were examined. Output for the research example for the variable of minutes can be seen in Figure 2. Observe each variable to ensure that the mean seems reasonable and that the minimum and maximum are within an appropriate range based on medical competence or an available codebook. One assumption common in statistical analyses is a normal distribution. Figure 2 shows that the mode differs from the mean and the median. We have visualization tools such as histograms to examine these scores for normality and outliers before making decisions.
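Assuming the minutes are held in a pandas column, the correction and re-check described above might be sketched as follows (the values are hypothetical apart from the 135/535 example):

    import pandas as pd

    df = pd.DataFrame({"minutes": [540, 535, 525, 550, 135, 530]})
    print(df["minutes"].describe())                # the minimum of 135 stands out

    df.loc[df["minutes"] == 135, "minutes"] = 535  # confirmed entry error
    print(df["minutes"].describe())                # statistics now plausible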

Histograms are useful in assessing normality, as many statistical tests (eg, ANOVA and regression) assume the data have a normal distribution. When data deviate from a normal distribution, the deviation is quantified using skewness and kurtosis. [1] Skewness occurs when one tail of the curve is longer. If the tail is lengthier on the left side of the curve (more cases on the higher values), the distribution is negatively skewed, whereas if the tail is longer on the right side, it is positively skewed. Kurtosis is another facet of normality. Positive kurtosis occurs when the distribution has a sharper peak and heavier tails than a normal distribution, whereas negative kurtosis indicates a flatter distribution with lighter tails. [2]

Additionally, histograms reveal outliers: data points either entered incorrectly or truly very different from the rest of the sample. When there are outliers, one must determine accuracy based on random chance or error in the experiment and provide strong justification if the decision is to exclude them. [3] Outliers require attention to ensure the data analysis accurately reflects the majority of the data and is not influenced by extreme values; cleaning these outliers can result in better quality decision-making in clinical practice. [4] A common approach to determining if a variable is approximately normally distributed is converting values to z scores and determining if any scores are less than -3 or greater than 3. For a normal distribution, about 99.7% of scores should lie within three standard deviations of the mean. [5] Importantly, one should not automatically throw out any values outside of this range but consider them in corroboration with the other factors mentioned above. Outliers are relatively common, so when these are prevalent, one must assess the risks and benefits of exclusion. [6]
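These checks are straightforward to script; the sketch below (simulated data with one extreme value) computes skewness, excess kurtosis, and z-scores, flagging values beyond three standard deviations:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    minutes = np.append(rng.normal(535, 10, 99), 700.0)  # one extreme value

    print(f"skewness: {stats.skew(minutes):.2f}")
    print(f"excess kurtosis: {stats.kurtosis(minutes):.2f}")

    z = stats.zscore(minutes)
    print(minutes[np.abs(z) > 3])  # candidates only; inspect before excluding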

Figure 3 provides examples of histograms. In Figure 3A, 2 possible outliers causing kurtosis are observed. If only values within 3 standard deviations are used, the result in Figure 3B is observed. This histogram appears much closer to an approximately normal distribution with the kurtosis treated. Remember, all evidence should be considered before eliminating outliers. When reporting outliers in scientific paper outputs, account for the number of outliers excluded and justify why they were excluded.

Boxplots can reveal outliers, show the range of data, and illustrate differences among groups. Boxplots provide a visual representation of ranges and medians, illustrating differences amongst groups, and are useful in various outlets, including evidence-based medicine. [7] Boxplots provide a picture of the data distribution when there are numerous values and all of them cannot be displayed individually (as they would be in a scatterplot). [8] Figure 4 illustrates the differences between drug administration sites and the length of drug life from the above example.

Figure 4 shows differences with potential clinical impact. Had any outliers existed (data from the histogram were cleaned), they would appear outside the line endpoints. The red boxes represent the middle 50% of scores. The lines within each red box represent the median number of minutes within each administration site. The horizontal lines at the top and bottom of each line connected to the red box represent the 25th and 75th percentiles. In examining the boxplots, an overlap in minutes between the 2 administration sites was observed: the approximate top 25 percent from site B had the same times as the bottom 25 percent at site A. Site B had a median minute amount under 525, whereas administration site A had a length greater than 550. If there were no differences in adverse reactions at site A, analysis of this figure provides evidence that healthcare providers should administer the drug via site A. Researchers could follow by testing a third administration site, site C. Figure 5 shows what would happen if site C led to a longer drug life compared to site A.

Figure 5 displays the same site A data as Figure 4, but something looks different. The significant variance at site C makes site A's variance appear smaller. In other words, patients who were administered the drug via site C had a larger range of scores. Thus, some patients experience a longer drug life when the drug is administered via site C than the median of site A; however, the broad range (lack of precision) and lower median should be the focus. The spread of minutes is much narrower at site A; the median is higher, and the range is more precise. One may conclude that this makes site A the more desirable site.
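A boxplot of this kind can be sketched with matplotlib; the two samples below are simulated to mimic the pattern described (site A tight around a higher median, site C widely spread):

    import matplotlib.pyplot as plt
    import numpy as np

    rng = np.random.default_rng(2)
    site_a = rng.normal(555, 10, 50)   # higher median, narrow spread
    site_c = rng.normal(540, 40, 50)   # lower median, wide spread

    fig, ax = plt.subplots()
    ax.boxplot([site_a, site_c], labels=["Site A", "Site C"])
    ax.set_ylabel("Drug life (minutes)")
    plt.show()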

Clinical Significance

Ultimately, by understanding basic exploratory data methods, medical researchers and consumers of research can make quality and data-informed decisions. These data-informed decisions will result in the ability to appraise the clinical significance of research outputs. By overlooking these fundamentals in statistics, critical errors in judgment can occur.

Nursing, Allied Health, and Interprofessional Team Interventions

All interprofessional healthcare team members need to be at least familiar with, if not well-versed in, these statistical analyses so they can read and interpret study data and apply the data implications in their everyday practice. This approach allows all practitioners to remain abreast of the latest developments and provides valuable data for evidence-based medicine, ultimately leading to improved patient outcomes.


Figure 1. Exploratory data analysis. Contributed by Martin Huecker, MD, and Jacob Shreffler, PhD.

Figure 2. Exploratory data analysis. Contributed by Martin Huecker, MD, and Jacob Shreffler, PhD.

Figure 3. Exploratory data analysis. Contributed by Martin Huecker, MD, and Jacob Shreffler, PhD.

Figure 4. Exploratory data analysis. Contributed by Martin Huecker, MD, and Jacob Shreffler, PhD.

Figure 5. Exploratory data analysis. Contributed by Martin Huecker, MD, and Jacob Shreffler, PhD.



Evidence‐based statistical analysis and methods in biomedical research (SAMBR) checklists according to design features

Alok Kumar Dwivedi, Rakesh Shukla


Received 2019 Apr 10; Revised 2019 Jun 11; Accepted 2019 Jul 16; Collection date 2020 Aug.

Background

Statistical analysis according to design features and objectives is essential to ensure the validity and reliability of study findings and conclusions in biomedical research. Heterogeneity in reporting study design elements and conducting statistical analyses is often observed for the same study design and study objective in the medical literature. Researchers sometimes struggle to use the statistical approaches recommended by methodologists for a specific study design, whether because of limited access to or understanding of statistical methods or because no checklist relating design to analysis is available in a concise format. The purpose of this review is to provide the checklist of statistical analysis and methods in biomedical research (SAMBR) to applied researchers.

Recent findings

We initially identified the important steps in reporting design features that may influence the choice of statistical analysis in biomedical research, as well as the essential steps of data analysis in common studies. We subsequently searched for the statistical approaches employed for each study design/study objective available in publications and other resources. Compiling these steps produced the SAMBR guidance document, which includes three parts. Applied researchers can use part (A) and part (B) of SAMBR to describe or evaluate research design features and the quality of statistical analysis, respectively, when reviewing studies or designing protocols. Part (C) of SAMBR can be used to perform essential and preferred evidence‐based data analysis specific to the study design and objective.

Conclusions

We believe that the statistical methods checklists may improve the reporting of research design, standardize methodological practices, and promote consistent application of statistical approaches, thus improving the quality of research studies. The checklists do not enforce the use of the suggested statistical methods but rather highlight and encourage best statistical practices. There is a need to develop an interactive web‐based application of the checklists for wide use.

Keywords: checklists, evidence‐based statistical practice, statistical analysis, statistical methods

1. INTRODUCTION

The overall quality and utility of biomedical research in generating proper evidence depend, in part, on the appropriate execution of the research design, statistical methods, and interpretation of results, as well as their quality reporting. Recently, a systematic review identified nonadherence to the methodological standards required by the Agency for Healthcare Research and Quality for research based on the National Inpatient Sample database, even in high‐quality publications. 1 It has been found that the appropriate use of methods and their standardized reporting help improve the quality of studies. 2 However, inconsistencies exist in methodological practices for similar study designs with the same objective/hypothesis. As a result, the quality of methodological standards in biomedical studies is often questionable.

Guidelines and recommendations exist for assessing the quality of a study, or appropriate reporting and interpretation of results ( www.equator‐network.org ). Similarly, numerous statistical guidelines were developed for biomedical researchers to minimize misconduct of statistical approaches and improve the quality of biomedical studies. 3 , 4 , 5 , 6 However, these statistical guidelines mainly focus on improving the reporting of statistical methods used in studies. Unfortunately, guidance support is nonexistent for assessing best statistical practices of different types of studies as per the design features. Due to the lack of methodological standards checklist, misuse and abuse of statistical approaches in biomedical research have been noticed for a long time. 7 , 8

In recent years, novel statistical methods, computational program codes to analyze complex problems, and statistical software for the easy application of statistical methods and reporting have grown substantially. Numerous studies have proposed alternative, more efficient and accurate approaches for specific study designs or distributional conditions and provided up‐to‐date statistical methods by comparing their performance on real data and in extensive simulation studies. 9, 10 However, the use of state‐of‐the‐art appropriate statistical methods in the design and analysis of research studies is minimal in practice due to a lack of guidance for applied statisticians and applied researchers, as recognized in the strengthening analytical thinking for observational studies (STRATOS) initiative. 11 For example, prediction intervals are rarely computed and reported in published meta‐analyses, 12, 13 risk ratio models are rarely used for the analysis of cross‐sectional or interventional studies even in high‐impact clinical journals, 14, 15, 16 inappropriate use and presentation of statistical modeling depending on the objective of the model building is common in published works, 17 and inappropriate uses of graphs in animal studies and inappropriate interpretations of the results have also been noticed in biomedical studies. 18 Such examples and many more like these demonstrate that the use of appropriate statistical methods, accurate interpretation of results, and their reporting are often not according to evidence‐based statistical methods and analysis. Thus, there is a need to develop checklists for evaluating the quality of statistical practices and a guidance document for promoting evidence‐based statistical analysis.

2. AIMS OF THE SAMBR

In the era of reproducible research, to increase the reproducibility, validity, and integrity of the research findings, we suggest following evidence‐based statistical practices in publications by use of appropriate statistical methods and their reporting relevant for specific objectives. Specifically, we (a) summarize the reporting elements of design features in studies to determine appropriate statistical analysis, (b) develop essential steps to be conducted in data analysis of common studies for promoting best statistical practices, and (c) provide evidence‐based essential and preferred choices of statistical methods for data analysis in different studies. Overall, the intention of the review is to provide checklists of statistical analysis and methods in biomedical research (SAMBR) according to specific objectives in different studies.

3. DEVELOPMENT OF THE CHECKLISTS

Initially, we identified the purpose and objectives of commonly employed study designs such as clinical trials, observational studies, and laboratory studies in biomedical research through various resources that may influence the choice of statistical analysis in studies. We also identified the essential steps to be followed in common studies to evaluate adherence to the best statistical practice in biomedical research. State‐of‐the‐art available statistical methods were identified for analyses (both unadjusted and adjusted and sensitivity) and reporting from high‐quality publications of biostatistics/epidemiology journals and other resources. The identified statistical methods were classified and linked with study designs and study objectives. When a clear choice did not exist, the decision was based on the qualitative evaluation of the statistical methods by comparing with other competing approaches in terms of statistical properties, assumptions, interpretation, and recommendations suggested by the researchers. The essential and preferred statistical procedures and appropriate references for employing each statistical method were provided under each study design and objective. Altogether, these procedures set the checklists for evidence‐based statistical analysis and their reporting for a specific study design in view of study purpose and objectives. Figure  1 shows the components of SAMBR and provides navigation to appropriate SAMBR checklist table as per study design and objective. Figure  2 summarizes the essential steps of data analysis according to common study designs/objectives.

Figure 1

Flow chart for selecting the appropriate checklist table specific to study design and objective

Figure 2

Flow diagram of checklists for common clinical studies. ITT, intention to treat analysis; PP, per protocol analysis; AT, as treated analysis; IV, instrument variable analysis; PSMA, propensity score matched analysis; PSS, propensity score stratified analysis; IPTW, inverse probability treatment weight analysis; IPTWRA, doubly robust inverse probability treatment weight and regression adjustment analysis; IV, instrument variable analysis; DAG, directed acyclic graph

4. COMPONENTS OF SAMBR

The SAMBR checklists have three parts as follows: Part (A) focuses on the reporting elements of common biomedical research designs, part (B) shows the essential steps to be followed in common studies for quality assessment of statistical analysis, and part (C) comprises the steps for data analysis and related essential and preferred choices of statistical approaches and methods for statistical analysis strictly linked to the type of research design and study objectives.

4.1. Part A: Research design

The items related to part (A) are displayed in Table  1 . Table  1 may help the investigators to describe the essential features of their study design and objectives of the study. The detail provided in Table  1 along with study setting, study population, eligibility criteria, and data collection procedures and methods can be used to develop materials and methods section of a study protocol.

Reporting elements for research design

4.2. Part B: General quality assessment tool for statistical analysis in common studies

Table 2 provides a tool to assess the quality of methodological standards in published studies. Our review identified 10 essential steps to be followed in the data analysis and reporting of any common study. Each of the 10 steps may be rated as no/low, medium, or high adherence, or not applicable. A larger number of items rated medium or high indicates good or excellent quality of the statistical analysis.

Statistical analysis and methods in biomedical research (SAMBR) checklist for assessing data analysis practice in biomedical studies

The explanation and use of each of the 10 steps in statistical analysis and reporting in biomedical studies are described in the following subsections:

4.2.1. Statistical analysis in view of study design, objective, and hypothesis

The choice of statistical methods and steps in data analysis is closely linked to the study design features. The statistical analysis depends on the study design type (randomized clinical trial [RCT], nonrandomized clinical trial [NRCT], or observational study), the study design method (matched study, two-group pre-post study, cross-over study, repeated measures study, etc), the study hypothesis (superiority, non-inferiority, equivalence), the study purpose (inferential, predictive, or descriptive), and the type of outcome. RCTs mostly require adjustment for prognostic variables in multivariable analysis, whereas observational studies require adjustment for confounding variables. Matched studies typically require paired data analysis, unlike unmatched studies. The selection of screening variables in a multivariable model depends on the purpose of that model. The statistical test and the design conditions (sampling design, level of significance, etc) used in computing the sample size and statistical power should be carried through to the primary data analysis. Research characteristics that may affect the choice of statistical analysis should be clearly described in research studies and publications.
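As a small illustration of how one design feature changes the analysis, the hedged Python sketch below analyzes the same simulated measurements once as matched pairs and once as independent groups; a matched design calls for the paired test. The data and effect sizes are invented.

```python
# Simulated pre/post measurements for 30 matched subjects.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
before = rng.normal(120, 10, size=30)        # e.g., baseline blood pressure
after = before - rng.normal(5, 8, size=30)   # correlated post-treatment values

t_paired, p_paired = stats.ttest_rel(before, after)  # respects the matching
t_indep, p_indep = stats.ttest_ind(before, after)    # wrongly ignores matching
print(f"paired p = {p_paired:.4f}, unpaired p = {p_indep:.4f}")
```

The paired test uses the within-subject correlation and is typically more powerful here, which is why matched designs call for paired analysis.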

4.2.2. Evidence‐based statistical methods

A variety of statistical tests with varying efficiency is available for the data analysis of any specific problem. Methodologists continually work to rank statistical methods by efficiency and power so that they are properly used, interpreted, and reported. The continuous development of advanced statistical methods argues for the use of evidence-based, state-of-the-art methods in data analysis. Methods shown in the literature to be superior for the given sample size and the distributions of the outcome and independent variables should be preferred.
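For instance, one common, simplified way to let the observed distribution guide the choice between a parametric and a nonparametric two-group test is sketched below on simulated data; SAMBR itself defers to the referenced literature for fuller guidance, so treat this rule of thumb only as an illustration.

```python
# Simulated skewed outcomes in two small groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.exponential(2.0, size=25)
group_b = rng.exponential(2.5, size=25)

# Check each group for evidence against normality (Shapiro-Wilk).
normal_a = stats.shapiro(group_a).pvalue > 0.05
normal_b = stats.shapiro(group_b).pvalue > 0.05

if normal_a and normal_b:
    stat, p = stats.ttest_ind(group_a, group_b)
    test = "two-sample t test"
else:
    stat, p = stats.mannwhitneyu(group_a, group_b)
    test = "Mann-Whitney U test"
print(f"{test}: p = {p:.4f}")
```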

4.2.3. Eliminate known and unknown confounding effects or screen important variables that predict the outcome

In association studies, efforts should be made to reduce confounding effects either at the design phase, through randomization, matching, restriction, or stratification, or at the analysis phase, through multivariable regression analysis or propensity score analysis. In prediction studies, proper selection of variables is required to develop a parsimonious model. The use and reporting of statistical analysis should properly reflect these efforts.
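The following Python sketch illustrates the analysis-phase option of propensity scores: a logistic model of treatment on measured confounders yields scores that can feed matched, stratified, or inverse-probability-weighted analyses, such as the IPTW weights computed at the end. The data and confounder names are hypothetical, and this is a toy example rather than a SAMBR-prescribed implementation.

```python
# Simulated observational data with two measured confounders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 500
age = rng.normal(60, 10, n)
severity = rng.normal(0, 1, n)
# Treatment assignment depends on the confounders (hypothetical mechanism).
p_treat = 1 / (1 + np.exp(-(0.03 * (age - 60) + 0.8 * severity)))
treated = rng.binomial(1, p_treat)

# Propensity score: P(treated | measured confounders).
X = np.column_stack([age, severity])
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Inverse probability of treatment weights (IPTW), as in the checklist tables.
weights = np.where(treated == 1, 1 / ps, 1 / (1 - ps))
print("propensity score range:", round(ps.min(), 3), "to", round(ps.max(), 3))
```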

4.2.4. Multivariable analysis for any studies by including factors that might confound or interact or predict the outcome

Data analysis usually requires the exploration of interaction effects and the inclusion of confounding or prognostic variables through multivariable regression analysis.
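A minimal sketch of such a multivariable model with an interaction term, using the statsmodels formula interface on simulated data (the variable names are hypothetical):

```python
# Simulated outcome with a treatment-by-age interaction built in.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 300
df = pd.DataFrame({
    "treatment": rng.binomial(1, 0.5, n),
    "age": rng.normal(55, 12, n),
})
df["outcome"] = (2.0 * df["treatment"] + 0.05 * df["age"]
                 + 0.04 * df["treatment"] * df["age"]
                 + rng.normal(0, 1, n))

# 'treatment * age' expands to both main effects plus their interaction.
model = smf.ols("outcome ~ treatment * age", data=df).fit()
print(model.summary().tables[1])
```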

4.2.5. Assessment of the stability, validity, and robustness of the multivariable model

In inferential and descriptive models, assessing the assumptions of the multivariable model and the stability of the developed model is critically important for drawing appropriate inferences. In predictive models, what matters most is selecting the appropriate regression analysis in view of the outcome distribution and assessing the validity of the developed model.

4.2.6. Adjustment for the multiplicity of outcomes

Adjustment for the multiplicity of outcomes may be applied in inferential and descriptive studies with multiple outcomes.
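For example, the Bonferroni and Holm procedures, two standard options among several, can be applied to a set of outcome-level P values as in this illustrative Python sketch; the P values themselves are made up.

```python
# Adjust a small set of hypothetical per-outcome p values.
from statsmodels.stats.multitest import multipletests

p_values = [0.012, 0.049, 0.003, 0.210, 0.038]  # one p value per outcome
for method in ("bonferroni", "holm"):
    reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method=method)
    print(method, [round(p, 3) for p in p_adj], "reject:", list(reject))
```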

4.2.7. Reproducibility measures for statistical methods

The study should describe its statistical procedures in sufficient detail, as required by the study objectives and design. It should also justify the robustness of the chosen statistical methods in view of the study design features, following evidence-based statistical analysis and reporting practice. The presentation of the study results should match the statistical methods used and the design and objectives of the study.

4.2.8. Reproducibility measures for results

The study should provide reproducibility measures for its results by reporting confidence intervals, internal validity, and the robustness of the findings established through sensitivity and validation analyses.
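One simple way to report the robustness of a finding is a bootstrap confidence interval. The sketch below, on simulated data, illustrates the idea; it is not a prescribed SAMBR method.

```python
# Percentile bootstrap CI for a mean difference between two groups.
import numpy as np

rng = np.random.default_rng(4)
treated = rng.normal(5.0, 2.0, 40)
control = rng.normal(4.0, 2.0, 40)

# Resample each group with replacement and recompute the difference.
boot = [rng.choice(treated, 40).mean() - rng.choice(control, 40).mean()
        for _ in range(5000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"mean difference = {treated.mean() - control.mean():.2f}, "
      f"95% bootstrap CI ({lo:.2f}, {hi:.2f})")
```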

4.2.9. Reproducibility measures for inference

The study should provide reproducibility measures for its inference by reporting prediction intervals, external validation of the estimate/effect/prediction, heterogeneity analysis, or alternatives to the P value 19 as appropriate, to ensure the generalizability and accuracy of the inference made in the study.
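As a numeric illustration, a 95% prediction interval for a random-effects meta-analysis can be computed from the pooled estimate, its standard error, the heterogeneity estimate tau-squared, and the number of studies, following the formulation advocated in the prediction-interval literature cited in the references. All input values below are made up.

```python
# PI = mu +/- t_{k-2, 0.975} * sqrt(tau^2 + SE(mu)^2)
import math
from scipy import stats

mu, se, tau2, k = 0.35, 0.08, 0.04, 10  # pooled effect, SE, tau^2, no. of studies
t_crit = stats.t.ppf(0.975, df=k - 2)
half_width = t_crit * math.sqrt(tau2 + se**2)
print(f"95% prediction interval: ({mu - half_width:.2f}, {mu + half_width:.2f})")
```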

4.2.10. Interpret the results in view of study design, limitations, and methods used for data analysis

The interpretation of the study findings, their generalizability, and their limitations should be made in view of the study design features and the statistical analysis, after evaluating the study setting, population, nature of the data, and accuracy of the results. Findings should accordingly be classified as significant or nonsignificant and as conclusive or inconclusive.

4.3. Part C: Evidence‐based statistical analysis and methods as per study design and objective

Part (C) of the SAMBR extends part (B) by linking each step to statistical methods for a specific design and set of objectives. The choices of statistical procedures and their reporting for an RCT are shown in Table 3, which indicates that the statistical analysis of an RCT should follow the specific hypothesis and sub-design of the study. Table 4 shows the statistical procedures and the evidence-based suggested and preferred methods for an NRCT; the statistical analysis of a nonrandomized study should demonstrate appropriate attempts to minimize the known and unknown effects of confounding factors. Table 5 displays the steps involved in analyzing predictive studies and the related statistical approaches. It indicates that the most important steps in the data analysis of a predictive model study are selecting an appropriate model based on the outcome distribution, choosing the appropriate link function and form of the covariates, and screening important variables for predicting the outcome. In a predictive model study, the statistical approaches should provide ample evidence of the external validity of the developed model. Table 6 describes the suggested methods for laboratory studies; approaches for analyzing small sample size studies, handling paired or unpaired data structures, adjusting for multiple comparisons, and appropriately reporting experimental data are critical for fundamental studies. Table 7 shows the statistical procedures required for an inferential study, where the rigor of the procedures involving all study design elements and the model diagnostics are the most important steps in producing reliable inference.

Interventional randomized study

Abbreviations: AT, as treated analysis; ITT, intention to treat analysis; IV, instrument variable analysis; PPA, per protocol analysis.

Nonrandomized intervention study

Abbreviations: IV, instrument variable analysis; IPTW, inverse probability treatment weight analysis; IPTWRA, doubly robust inverse probability treatment weight and regression adjustment analysis; PSMA, propensity score matched analysis; PSS, propensity score stratified analysis.

Observational predictive study for diagnosis or prognosis

Note. Internal validation refers to assessing model performance by randomly splitting the study sample into a test/development/derivation dataset and a validation dataset, whereas external validation refers to assessing model performance either on a nonrandom split of the study sample into development and validation datasets or on independent datasets from settings different from the study sample.
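A hedged sketch of the internal-validation procedure defined in this note, using a random development/validation split on simulated data; the logistic model and the AUC performance measure are illustrative choices, not SAMBR requirements.

```python
# Random split of one study sample into development and validation parts.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
X = rng.normal(size=(400, 3))  # three hypothetical predictors
true_coef = np.array([1.0, -0.5, 0.3])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_coef)))

X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.3,
                                              random_state=0)
model = LogisticRegression().fit(X_dev, y_dev)          # develop
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])  # validate
print("validation AUC:", round(auc, 3))
```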

Laboratory study

Abbreviation: CI, confidence interval.

Observational inferential/etiologic study

Abbreviations: ANCOVA, analysis of covariance; CI, confidence interval; GEE, generalized estimating equations.

Table 8 delineates the steps involved in the statistical analysis of a descriptive or risk factor study. The choice of an appropriate model, along with intensive exploration of interacting variables and the stability of the developed model, is critical for descriptive studies. Table 9 displays the data analysis steps for an exploratory study, especially one with high dimensional data; here the selection of variables for the final multivariable model, again with intensive exploration of interacting variables and attention to model stability, is critical. Tables 10, 11, and 12 show the statistical analysis procedures for diagnostic studies according to the objective and the type of reference test. The analysis of a diagnostic study should place sufficient emphasis on developing a simple, robust, and user-friendly tool for screening and diagnosis. Table 13 provides methods for meta-analysis according to the number of studies and the heterogeneity across studies; the statistical procedure should provide ample evidence that the various sources of bias have been minimized in obtaining a pooled estimate from multiple studies.

Observational descriptive study

Observational exploratory study for high dimensional data

Abbreviation: GEE, generalized estimating equations.

Diagnostic accuracy or comparison study for binary tests in the presence of a binary reference test or an imperfect reference test

Diagnostic accuracy or comparison study for continuous/ordinal diagnostic markers and predictive study

Diagnostic agreement study

Meta‐analysis
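To make the pooling step behind Table 13 concrete, the sketch below implements the classical DerSimonian-Laird random-effects estimator, one of the between-study variance estimators compared in the references, on invented study effects; it is an illustration, not the only acceptable choice.

```python
# Random-effects pooling with the DerSimonian-Laird tau^2 estimator.
import numpy as np

y = np.array([0.30, 0.10, 0.45, 0.25, 0.05])  # hypothetical study effects
v = np.array([0.02, 0.03, 0.05, 0.01, 0.04])  # within-study variances

w = 1 / v
y_fixed = np.sum(w * y) / np.sum(w)           # fixed-effect pooled estimate
Q = np.sum(w * (y - y_fixed) ** 2)            # Cochran's Q heterogeneity
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - (len(y) - 1)) / c)       # DerSimonian-Laird estimate

w_re = 1 / (v + tau2)                         # random-effects weights
y_random = np.sum(w_re * y) / np.sum(w_re)
se_random = np.sqrt(1 / np.sum(w_re))
print(f"tau^2 = {tau2:.4f}, pooled effect = {y_random:.3f} (SE {se_random:.3f})")
```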

To use the SAMBR checklists, researchers may first describe all the items in part A (Table 1, items 1-10) and part B (Table 2) in their study. Part (B) may then be elaborated using the appropriate section of part (C) for their study design; on this basis, a statistical analysis plan can be developed for a proposal or grant and the statistical analyses executed following the referenced papers.

5. APPLICATION

To illustrate the SAMBR checklists, we evaluated three recently published articles in oncology: an RCT, 88 an NRCT, 89 and a predictive study. 90 Adherence to the SAMBR checklists was evaluated for these studies and is reported in Table 14. According to the checklists, each study could have reported additional information to further improve its quality and reproducibility. The RCT applied an intention-to-treat analysis with a two-sided test and determined the unadjusted effect of treatment using a stratified Cox model and a log rank test, in line with the study design, objective, and SAMBR checklists. However, it did not report randomization accuracy, did not report the adjusted effect of treatment after controlling for prognostic factors, and did not assess heterogeneity of the composite outcome or treatment effect, as the preferred methods in the SAMBR checklists suggest. It also did not report the number-needed-to-treat or years-needed-to-treat or classify superiority using a clinically meaningful limit, as suggested in the checklists. The study concluded that the palbociclib-fulvestrant group had longer overall survival than the placebo-fulvestrant group among advanced breast cancer patients, although the difference was not statistically significant (hazard ratio = 0.81, P = .09). 88 Adjusting for prognostic factors may change such findings, especially when the results are of borderline significance. Similarly, the published NRCT 89 did not fully adhere to the analysis steps of the SAMBR checklists. The predictive study 90 did not report selecting a parsimonious model from among competing models using a bootstrap approach, as the checklists suggest. Furthermore, internal validation of the developed predictive models and the elements needed to apply them (the model equation, the baseline survival probability at a specific time, etc) were not provided, which makes the models harder to use for predicting the risk of different cancers in a batch of at-risk subjects.

Application of SAMBR‐part (C) for evaluating statistical analysis and methods in three published articles

Abbreviations: AIC, Akaike information criteria; CI, confidence interval; IPTW, inverse probability treatment weighting; MSPE, mean square prediction error; NA, not applicable; NNT, number‐needed‐to‐treat; NRCT, nonrandomized controlled trial; RCT, randomized controlled trial; SAMBR, statistical analysis and methods in biomedical research; YNT, years‐needed‐to‐treat.

6. CONCLUSIONS

SAMBR is a modest proposal for a concise resource document to support evidence-based analytic practice. SAMBR (a) links study objectives, design, and methods to guide the proper selection and application of statistical methods; (b) suggests preferred ways of reporting and summarizing the research question, sample size, and statistical analysis plan; (c) facilitates the choice of statistical approaches, with references for their execution, classified by study design and objective in a concise format; and (d) highlights and encourages uniform practice in data analysis and reporting. SAMBR has three components: the first helps report the essential design features needed to determine the appropriate checklist for statistical analysis specific to the study objective and design; the second helps reviewers assess the quality of reported statistical analyses; and the third comprises the checklists themselves, specific to study design and objectives. We have provided flow charts to guide researchers to the appropriate checklist for their study design and objectives. Ideally, these flow charts should be implemented on the web with skip patterns that direct researchers to the target checklist. We are developing a web-based application of the SAMBR checklists, which will be the subject of a future publication. We plan to include not only the checklists but also step-by-step guidance for conducting the analyses in commercial analytic software as well as in freely available software such as R.

The SAMBR checklists were not developed from expert opinion; rather, we conducted an extensive review of published studies on research design, statistics, and epidemiology to develop them. The purpose of the checklists is to promote the use and critical appraisal of evidence-based statistical approaches by study design type and objective. Although the checklists provide comprehensive evidence-based statistical approaches for commonly used research designs, they exclude advanced statistical approaches and designs (such as Bayesian methods, structural equation modeling, methods for mixed study designs, multiple time-to-event data analysis, sequential or adaptive clinical trials, futility analysis, and survey designs), which will need to be incorporated in periodic revisions. The SAMBR checklists provide a quality reporting tool for ensuring methodological standards in common study designs. We believe the checklists may reduce statistical controversies and promote the consistent application of statistical approaches, thereby improving the quality of research studies. The suggested methods need to be updated periodically as the evidence evolves, and an interactive web-based application of the checklists is needed to make them widely available and easy to apply. The checklists do not compel researchers to use the suggested statistical methods; rather, they encourage adherence to, and reporting of, the minimum statistical analysis steps required for the data analysis of common studies according to their objectives and hypotheses. Researchers may use alternative statistical procedures, with proper justification, in place of the methods suggested for each step of data analysis.

AUTHORS' CONTRIBUTIONS

All authors had full access to the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Conceptualization , A.K.D.; Methodology , A.K.D. & R.S.; Investigation , A.K.D.; Formal Analysis , A.K.D.; Resources , A.K.D.; Writing ‐ Original Draft , A.K.D.; Writing ‐ Review & Editing , A.K.D. & R.S.; Visualization , A.K.D. & R.S.; Supervision , A.K.D. & R.S.; Funding Acquisition , F.M.L.

CONFLICT OF INTERESTS

The authors declare that they have no competing interests or financial disclosures. All authors have completed the ICMJE disclosure form.

FUNDING INFORMATION

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ACKNOWLEDGEMENTS

The authors would like to thank Pallavi Dubey and Muditha Perera for formatting references as per the journal criteria and providing their useful comments and insights.

Dwivedi AK, Shukla R. Evidence‐based statistical analysis and methods in biomedical research (SAMBR) checklists according to design features. Cancer Reports. 2020;3:e1211. 10.1002/cnr2.1211

  • 1. Khera R, Angraal S, Couch T, et al. Adherence to methodological standards in research using the national inpatient sample. JAMA. 2017;318(20):2011‐2018. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 2. Motheral B, Brooks J, Clark MA, et al. A checklist for retrospective database studies—report of the ISPOR task force on retrospective databases. Value Health. 2003;6(2):90‐97. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 3. Altman DG, Gore SM, Gardner MJ, Pocock SJ. Statistical guidelines for contributors to medical journals. Ann Clin Biochem. 1992;29(1):1‐8. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 4. Lang TA, Altman DG. Basic statistical reporting for articles published in biomedical journals: the "Statistical Analyses and Methods in the Published Literature" or the SAMPL Guidelines. Int J Nurs Stud. 2015;52(1):5. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 5. Charan J, Saxena D. Suggested statistical reporting guidelines for clinical trials data. Indian J Psychol Med. 2012;34(1):25. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 6. Hess A, Shardell M, Johnson J, et al. Methods and recommendations for evaluating and reporting a new diagnostic test. Eur J Clin Microbiol Infect Dis. 2012;31(9):2111‐2116. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 7. Ercan I, Yazıcı B, Yang Y, et al. Misuse of statistics in medical research. Eur J Gen Med. 2007;4(3):128‐134. [ Google Scholar ]
  • 8. Thiese MS, Arnold ZC, Walker SD. The misuse and abuse of statistics in biomedical research. Biochem Med. 2015;25(1):5‐11. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 9. Dwivedi AK, Mallawaarachchi I, Alvarado LA. Analysis of small sample size studies using nonparametric bootstrap test with pooled resampling method. Stat Med. 2017;36(14):2187‐2205. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 10. Nelson KP, Edwards D. Measures of agreement between many raters for ordinal classifications. Stat Med. 2015;34(23):3116‐3132. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 11. Sauerbrei W, Abrahamowicz M, Altman DG, Cessie S, Carpenter J. Strengthening analytical thinking for observational studies: The STRATOS initiative. Stat Med. 2014;33(30):5413‐5432. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 12. IntHout J, Ioannidis JP, Rovers MM, Goeman JJ. Plea for routinely presenting prediction intervals in meta‐analysis. BMJ Open. 2016;6(7):e010247. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 13. Yu J, Zhou Z, McEvoy RD, et al. Association of positive airway pressure with cardiovascular events and death in adults with sleep apnea: a systematic review and meta‐analysis. JAMA. 2017;318(2):156‐166. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 14. Rose CE, Pan Y, Baughman AL. Bayesian logistic regression modeling as a flexible alternative for estimating adjusted risk ratios in studies with common outcomes. J Biom Biostat. 2015;6(4):1‐6. [ Google Scholar ]
  • 15. Dwivedi AK, Mallawaarachchi I, Lee S, Tarwater P. Methods for estimating relative risk in studies of common binary outcomes. J Appl Stat. 2014;41(3):484‐500. [ Google Scholar ]
  • 16. Pi‐Sunyer X, Astrup A, Fujioka K, et al. A randomized, controlled trial of 3.0 mg of liraglutide in weight management. N Engl J Med. 2015;373(1):11‐22. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 17. Galit S. To Explain or to predict? Stat Sci. 2010;25(3):289‐310. [ Google Scholar ]
  • 18. Weissgerber TL, Milic NM, Winham SJ, Garovic VD. Beyond bar and line graphs: time for a new data presentation paradigm. PLoS Biol. 2015;13(4):e1002128. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 19. Benjamin DJ, Berger JO. Three recommendations for improving the use of p‐values. Am Stat. 2019;73(1):1537‐2731. [ Google Scholar ]
  • 20. Ten Have TR, Normand SL, Marcus SM, Brown CH, Lavori P, Duan N. Intent‐to‐treat vs. non‐intent‐to‐treat analyses under treatment non‐adherence in mental health randomized trials. Psychiatr Ann. 2008;38(12):772‐783. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 21. Little RJA, Rubin DB. Statistical analysis with missing data. New York: John Wiley & Sons, Inc; 1987. [ Google Scholar ]
  • 22. Heniksen JMT, Geersing GJ, Moons KGM, de Groot JAH. Diagnostic and prognostic prediction models. J Thromb Haemost. 2013;11(Suppl 1):129‐141. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 23. Sauerbrei W, Royston P, Binder H. Selection of important variables and determination of functional form for continuous predictors in multivariable model building. Stat Med. 2007;26(30):5512‐5528. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 24. Genuer R, Poggi J‐M, Tuleau‐Malot C. Variable selection using random forests. Pattern Recogn Lett. 2010;31(14):2225‐2236. [ Google Scholar ]
  • 25. Molinaro AM, Wrensch MR, Jenkins RB, Eckel‐Passow JE. Statistical considerations on prognostic models for glioma. Neuro Oncol. 2016;18(5):609‐623. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 26. Bradburn MJ, Clark TG, Love SB, Altman DG. Survival analysis part II: Multivariate data analysis—an introduction to concepts and methods. Br J Cancer. 2003;89(3):431‐436. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 27. Sullivan LM, Massaro JM, D'Agostino RB. Presentation of multivariate data for clinical use: The Framingham Study risk score functions. Stat Med. 2004;23(10):1631‐1660. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 28. Elze MC, Gregson J, Baber U, et al. Comparison of propensity score methods and covariate adjustment: evaluation in 4 cardiovascular studies. J Am Coll Cardiol. 2017;69(3):345‐357. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 29. Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivar Behav Res. 2011;46(3):399‐424. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 30. Ertefaie A, Small DS, Flory JH, Hennessy S. A tutorial on the use of instrumental variables in pharmacoepidemiology. Pharmacoepidemiol Drug Saf. 2017;26(4):357‐367. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 31. Austin PC, Stuart EA. Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Stat Med. 2015;34(28):3661‐3679. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 32. Austin PC, Small DS. The use of bootstrapping when using propensity‐score matching without replacement: a simulation study. Stat Med. 2014;33(24):4306‐4319. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 33. Castelloe J, Watts D. Equivalence and noninferiority testing using SAS/STAT® software. 2015.
  • 34. Bonett DG, Price RM. Adjusted Wald confidence interval for a difference of binomial proportions based on paired data. J Educ Behav Stat. 2012;37(4):479‐488. [ Google Scholar ]
  • 35. Goeman JJ, Solari A, Stijnen T. Three‐sided hypothesis testing: simultaneous testing of superiority, equivalence and inferiority. Stat Med. 2010;29(20):2117‐2125. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 36. Jeffrey DB, Lucy D'Agostino M, William DD, Robert AG Jr. Second‐generation p‐values: improved rigor, reproducibility, & transparency in statistical analyses. PLoS ONE. 2018;13(3):e0188299. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 37. Pocock SJ, McMurray JJV, Collier TJ. Statistical controversies in reporting of clinical trials: part 2 of a 4‐Part series on Statistics for clinical trials. J Am Coll Cardiol. 2015;66(23):2648. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 38. O'Connell NS, Dai L, Jiang Y, et al. Methods for analysis of pre‐post data in clinical research: a comparison of five common methods. J Biom Biostat. 2017;8(1):1. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 39. Szmaragd C, Clarke P, Steele F. Subject specific and population average models for binary longitudinal data: A tutorial. 2013.
  • 40. McNeish DM, Harring JR. Clustered data with small sample sizes: comparing the performance of model‐based and design‐based approaches. Commun Stat Simul Comput. 2017;46(2):855‐869. [ Google Scholar ]
  • 41. Ye Y, Li A, Liu L, Yao B. A group sequential Holm procedure with multiple primary endpoints. Stat Med. 2013;32(7):1112‐1124. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 42. Guyatt G, Rennie D, Meade M, Cook D. Users' guides to the medical literature. 3rd ed. New York, N.Y: McGraw‐Hill Medical; 2015. [ Google Scholar ]
  • 43. Laubender RP, Bender R. Estimating adjusted risk difference (RD) and number needed to treat (NNT) measures in the Cox regression model. Stat Med. 2010;29(7–8):851‐859. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 44. Austin PC. Absolute risk reductions, relative risks, relative risk reductions, and numbers needed to treat can be obtained from a logistic regression model. J Clin Epidemiol. 2010;63(1):2‐6. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 45. Page P. Beyond statistical significance: clinical interpretation of rehabilitation research literature. Int J Sports Phys Ther. 2014;9(5):726. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 46. Saunders R, Cape J, Fearon P, Pilling S. Predicting treatment outcome in psychological treatment services by identifying latent profiles of patients. J Affect Disord. 2016;197:107‐115. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 47. Kent DM, Nelson J, Dahabreh IJ, Rothwell PM, Altman DG, Hayward RA. Risk and treatment effect heterogeneity: re‐analysis of individual participant data from 32 large clinical trials. Int J Epidemiol. 2016;45(6):2075‐2088. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 48. Pogue J, Thabane L, Devereaux PJ, Yusuf S. Testing for heterogeneity among the components of a binary composite outcome in a clinical trial. BMC Med Res Methodol. 2010;10(1):49. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 49. Pogue J, Devereaux PJ, Thabane L, Yusuf S. Designing and analyzing clinical trials with composite outcomes: consideration of possible treatment differences between the individual outcomes. PLoS ONE. 2012;7(4):e34785. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 50. Fagerland MW, Lydersen S, Laake P. Recommended confidence intervals for two independent binomial proportions. Stat Methods Med Res. 2015;24(2):224‐254. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 51. Kim H‐Y. Statistical notes for clinical researchers: post‐hoc multiple comparisons. Restor Dent Endod. 2015;40(2):172‐176. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 52. Galbraith S, Daniel JA, Vissel B. A study of clustered data and approaches to its analysis. J Neurosci. 2010;30(32):10601‐10608. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 53. Laajala TD, Jumppanen M, Huhtaniemi R, et al. Optimized design and analysis of preclinical intervention studies in vivo. Sci Rep. 2016;6(1):30723. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 54. Pearce N. Analysis of matched case‐control studies. BMJ. 2016;352:i969. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 55. Sjölander A, Greenland S. Ignoring the matching variables in cohort studies—when is it valid and why? Stat Med. 2013;32(27):4696‐4708. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 56. Sterne JAC, Tilling K. G‐estimation of causal effects, allowing for time‐varying confounding. Stata J. 2002;2(2):164‐182. [ Google Scholar ]
  • 57. Sander G, Judea P, James MR. Causal diagrams for epidemiologic research. Epidemiology. 1999;10(1):37‐48. [ PubMed ] [ Google Scholar ]
  • 58. Koller MT, Raatz H, Steyerberg EW, Wolbers M. Competing risks and the clinical community: irrelevance or ignorance? Stat Med. 2012;31(11–12):1089‐1097. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 59. Austin PC, Merlo J. Intermediate and advanced topics in multilevel logistic regression analysis. Stat Med. 2017;36(20):3257‐3277. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 60. Figueiras A, Domenech‐Massons JM, Cadarso C. Regression models: calculating the confidence interval of effects in the presence of interactions. Stat Med. 1998;17(18):2099‐2105. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 61. Heinze G, Wallisch C, Dunkler D. Variable selection—a review and recommendations for the practicing Statistician. Biom J. 2018;60(3):431‐449. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 62. Hubbard AE, Ahern J, Fleischer NL, et al. To GEE or not to GEE: Comparing population average and mixed models for estimating the associations between neighborhood risk factors and health. Epidemiology. 2010;21(4):467‐474. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 63. Chavent M, Kuentz Simonet V, Liquet B, Saracco J. ClustOfVar: An R package for the clustering of variables. J Stat Softw. 2012;50(13):1‐16. [ Google Scholar ]
  • 64. Lu F, Petkova E. A comparative study of variable selection methods in the context of developing psychiatric screening instruments. Stat Med. 2014;33(3):401‐421. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 65. Dan S, Cardell NS. The hybrid CART‐Logit model in classification and data mining. 1998.
  • 66. Habibzadeh F, Yadollahie M. Number needed to misdiagnose: a measure of diagnostic test effectiveness. Epidemiology. 2013;24(1):170. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 67. Reitsma JB, Rutjes AW, Khan KS, Coomarasamy A, Bossuyt PM. A review of solutions for diagnostic accuracy studies with an imperfect or missing reference standard. J Clin Epidemiol. 2009;62(8):797‐806. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 68. Reiczigel J. Confidence intervals for the binomial parameter: some new considerations. Stat Med. 2003;22:611‐621. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 69. Dwivedi AK, Mallawaarachchi I, Figueroa‐Casas JB, Morales AM, Tarwater P. Multinomial logistic regression approach for the evaluation of binary diagnostic test in medical research. Stat Transition New Ser. 2015;16(2):203‐222. [ Google Scholar ]
  • 70. Elie C, Coste J. A methodological framework to distinguish spectrum effects from spectrum biases and to assess diagnostic and screening test accuracy for patient populations: application to the Papanicolaou cervical cancer smear test. BMC Med Res Methodol. 2008;8:7. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 71. Takaya S, Marc R. The precision‐recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE. 2015;10(3):e0118432. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 72. Ray P, Le Manach Y, Riou B, Houle TT. Statistical evaluation of a biomarker. Anesthesiology. 2010;112(4):1023‐1040. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 73. Kumar R, Indrayan A. Receiver operating characteristic (ROC) curve for medical researchers. Indian Pediatr. 2011;48(4):277‐287. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 74. Dwivedi DK, Kumar R, Dwivedi AK, et al. Prebiopsy multiparametric MRI‐based risk score for predicting prostate cancer in biopsy‐naive men with prostate‐specific antigen between 4–10 ng/mL. J Magn Reson Imaging. 2018;47(5):1227‐1236. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 75. Hajian‐Tilaki KO, Hanley JA, Joseph L, Collet J‐P. A Comparison of parametric and nonparametric approaches to ROC analysis of quantitative diagnostic tests. Med Decis Making. 1997;17(1):94‐102. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 76. Bland JM, Altman DG. Applying the right statistics: analyses of measurement studies. Ultrasound Obstet Gynecol. 2003;22(1):85‐93. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 77. Hallgren KA. Computing inter‐rater reliability for observational data: an overview and tutorial. Tutor Quant Methods Psychol. 2012;8(1):23‐34. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 78. Barnhart HX, Haber M, Song J. Overall concordance correlation coefficient for evaluating agreement among multiple observers. Biometrics. 2002;58(4):1020‐1027. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 79. Banerjee M, Capozzoli M, McSweeney L, Sinha D. Beyond kappa: a review of interrater agreement measures. Can J Stat. 1999;27(1):3‐23. [ Google Scholar ]
  • 80. Mitani AA, Nelson KP. Modeling agreement between binary classifications of multiple raters in R and SAS. J Mod Appl Stat Methods. 2017;16(2):277‐309. [ Google Scholar ]
  • 81. Zeng X, Zhang Y, Kwong JS, et al. The methodological quality assessment tools for preclinical and clinical studies, systematic review and meta‐analysis, and clinical practice guideline: a systematic review. J Evid Based Med. 2015;8(1):2‐10. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 82. Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. Introduction to meta‐analysis. 1. Aufl. ed. West Sussex, England: Wiley; 2009. [ Google Scholar ]
  • 83. Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. A basic introduction to fixed‐effect and random‐effects models for meta‐analysis. Res Synth Methods. 2010;1(2):97‐111. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 84. Bradburn MJ, Deeks JJ, Berlin JA, Russell LA. Much ado about nothing: a comparison of the performance of meta‐analytical methods with rare events. Stat Med. 2007;26(1):53‐77. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 85. IntHout J, Ioannidis JP, Borm GF. The Hartung‐Knapp‐Sidik‐Jonkman method for random effects meta‐analysis is straightforward and considerably outperforms the standard DerSimonian‐Laird method. BMC Med Res Methodol. 2014;14:25. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 86. Veroniki AA, Jackson D, Viechtbauer W, et al. Methods to estimate the between‐study variance and its uncertainty in meta‐analysis. Res Synth Methods. 2016;7(1):55‐79. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 87. Burke DL, Ensor J, Riley RD. Meta‐analysis using individual participant data: one‐stage and two‐stage approaches, and why they may differ. Stat Med. 2017;36(5):855‐875. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • 88. Turner NC, Slamon DJ, Ro J, et al. Overall Survival with palbociclib and fulvestrant in advanced breast cancer. N Engl J Med. 2018;379(20):1926‐1936. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 89. Spera G, Fresco R, Fung H, et al. Beta blockers and improved progression‐free survival in patients with advanced HER2 negative breast cancer: a retrospective analysis of the ROSE/TRIO‐012 study. Ann Oncol. 2017;28(8):1836‐1841. [ DOI ] [ PubMed ] [ Google Scholar ]
  • 90. Hippisley‐Cox J, Coupland C. Development and validation of risk prediction algorithms to estimate future risk of common cancers in men and women: prospective cohort study. BMJ Open. 2015;5(3):e007825. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]

Statistical Treatment Of Data

Statistical treatment of data is essential for putting data into a usable form. Collecting raw data is only one aspect of any experiment; organizing and analyzing the data are equally important, so that appropriate conclusions can be drawn. This is what statistical treatment of data is all about.


Statistics offers many techniques for treating data appropriately. Statistical treatment of data is essential in all experiments, whether social, scientific, or any other form, and the treatment chosen depends largely on the kind of experiment and the result desired from it.

For example, in a survey regarding the election of a mayor, parameters like age, gender, and occupation would be important in influencing a person's decision to vote for a particular candidate. Therefore, the data need to be analyzed within these reference frames.

An important aspect of statistical treatment of data is the handling of errors. All experiments invariably produce errors and noise, and both systematic and random errors need to be taken into account.

Depending on the type of experiment being performed, Type-I and Type-II errors also need to be handled. These are the cases of false positives and false negatives, which must be understood and controlled in order to draw sound conclusions from the experiment.
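A brief simulation can make these two error rates tangible. The hedged Python sketch below repeatedly applies a two-sample t test, first when no true difference exists (counting false positives) and then when a true difference of half a standard deviation exists (counting false negatives); all numbers are illustrative.

```python
# Monte Carlo estimates of Type I and Type II error rates for a t test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
alpha, n, reps = 0.05, 30, 2000

# Type I: both groups drawn from the same distribution.
type1 = np.mean([
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).pvalue < alpha
    for _ in range(reps)])

# Type II: a true difference of 0.5 SD that the test fails to detect.
type2 = np.mean([
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0.5, 1, n)).pvalue >= alpha
    for _ in range(reps)])

print(f"Type I rate: {type1:.3f}, Type II rate: {type2:.3f}")
```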


Treatment of Data and Distribution

Trying to classify data into commonly known patterns is a tremendous help and is intricately related to statistical treatment of data. This is because distributions such as the normal probability distribution occur so commonly in nature that they are the underlying distributions in most medical, social, and physical experiments.

Therefore, if a given sample is known to be normally distributed, the statistical treatment of the data becomes much easier, since the researcher can draw on a large body of established theory. Care should be taken, however, not to assume that all data are normally distributed; the assumption should always be confirmed with appropriate testing.
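For example, a formal check such as the Shapiro-Wilk test can be used instead of assuming normality. The sketch below applies it to deliberately skewed simulated data; the data and threshold are illustrative.

```python
# Shapiro-Wilk normality check on a skewed (lognormal) sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample = rng.lognormal(mean=0, sigma=0.8, size=100)  # skewed, not normal

stat, p = stats.shapiro(sample)
print(f"Shapiro-Wilk p = {p:.4f} ->",
      "no evidence against normality" if p > 0.05 else "normality rejected")
```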

Statistical treatment of data also involves describing the data. The most common way to do this is through measures of central tendency such as the mean, median, and mode, which summarize where the data are concentrated. Measures of spread such as the range, uncertainty, and standard deviation describe how the data are distributed: two distributions with the same mean can have wildly different standard deviations, which indicates how tightly the data points cluster around the mean.
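These descriptive measures are straightforward to compute; the short sketch below uses Python's standard statistics module on a small invented data set.

```python
# Descriptive statistics for a small illustrative sample.
import statistics

data = [4, 7, 7, 8, 9, 10, 12, 12, 12, 15]
print("mean   :", statistics.mean(data))
print("median :", statistics.median(data))
print("mode   :", statistics.mode(data))
print("range  :", max(data) - min(data))
print("stdev  :", round(statistics.stdev(data), 2))
```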

Statistical treatment of data is an important aspect of all experimentation today and a thorough understanding is necessary to conduct the right experiments with the right inferences from the data obtained.


Siddharth Kalla (Apr 10, 2009). Statistical Treatment Of Data. Retrieved Nov 15, 2024 from Explorable.com: https://explorable.com/statistical-treatment-of-data



On Being a Scientist: A Guide to Responsible Conduct in Research: Third Edition (2009)

Chapter: The Treatment of Data


In order to conduct research responsibly, graduate students need to understand how to treat data correctly. In 2002, the editors of the Journal of Cell Biology began to test the images in all accepted manuscripts to see if they had been altered in ways that violated the journal's guidelines. About a quarter of the papers had images that showed evidence of inappropriate manipulation. The editors requested the original data for these papers, compared the original data with the submitted images, and required that figures be remade to accord with the guidelines. In about 1 percent of the papers, the editors found evidence for what they termed "fraudulent manipulation" that affected conclusions drawn in the paper, resulting in the papers' rejection.

Researchers who manipulate their data in ways that deceive others, even if the manipulation seems insignificant at the time, are violating both the basic values and widely accepted professional standards of science. Researchers draw conclusions based on their observations of nature. If data are altered to present a case that is stronger than the data warrant, researchers fail to fulfill all three of the obligations described at the beginning of this guide. They mislead their colleagues and potentially impede progress in their field or research. They undermine their own authority and trustworthiness as researchers. And they introduce information into the scientific record that could cause harm to the broader society, as when the dangers of a medical treatment are understated.

This is particularly important in an age in which the Internet allows for an almost uncontrollably fast and extensive spread of information to an increasingly broad audience. Misleading or inaccurate data can thus have far-reaching and unpredictable consequences of a magnitude not known before the Internet and other modern communication technologies.

Misleading data can arise from poor experimental design or careless measurements as well as from improper manipulation. Over time, researchers have developed and have continually improved methods and tools designed to maintain the integrity of research. Some of these methods and tools are used within specific fields of research, such as statistical tests of significance, double-blind trials, and proper phrasing of questions on surveys. Others apply across all research fields, such as describing to others what one has done so that research data and results can be verified and extended.

Because of the critical importance of methods, scientific papers must include a description of the procedures used to produce the data, sufficient to permit reviewers and readers of a scientific paper to evaluate not only the validity of the data but also the reliability of the methods used to derive those data. If this information is not available, other researchers may be less likely to accept the data and the conclusions drawn from them. They also may be unable to reproduce accurately the conditions under which the data were derived.

The best methods will count for little if data are recorded incorrectly or haphazardly. The requirements for data collection differ among disciplines and research groups, but researchers have a fundamental obligation to create and maintain an accurate, accessible, and permanent record of what they have done in sufficient detail for others to check and replicate their work. Depending on the field, this obligation may require entering data into bound notebooks with sequentially numbered pages using permanent ink, using a computer application with secure data entry fields, identifying when and where work was done, and retaining data for specified lengths of time. In much industrial research and in some academic research, data notebooks need to be signed and dated by a witness on a daily basis.

Unfortunately, beginning researchers often receive little or no formal training in recording, analyzing, storing, or sharing data. Regularly scheduled meetings to discuss data issues and policies maintained by research groups and institutions can establish clear expectations and responsibilities.

The Selection of Data

Deborah, a third-year graduate student, and Kamala, a postdoctoral fellow, have made a series of measurements on a new experimental semiconductor material using an expensive neutron test at a national laboratory. When they return to their own laboratory and examine the data, a newly proposed mathematical explanation of the semiconductor's behavior predicts results indicated by a curve.

During the measurements at the national laboratory, Deborah and Kamala observed electrical power fluctuations that they could not control or predict were affecting their detector. They suspect the fluctuations affected some of their measurements, but they don't know which ones.

When Deborah and Kamala begin to write up their results to present at a lab meeting, which they know will be the first step in preparing a publication, Kamala suggests dropping two anomalous data points near the horizontal axis from the graph they are preparing. She says that due to their deviation from the theoretical curve, the low data points were obviously caused by the power fluctuations. Furthermore, the deviations were outside the expected error bars calculated for the remaining data points.

Deborah is concerned that dropping the two points could be seen as manipulating the data. She and Kamala could not be sure that any of their data points were affected by the power fluctuations. They also did not know if the theoretical prediction was valid. She wants to do a separate analysis that includes the points and discuss the issue in the lab meeting. But Kamala says that if they include the data points in their talk, others will think the issue important enough to discuss in a draft paper, which will make it harder to get the paper published. Instead, she and Deborah should use their professional judgment to drop the points now.

1. What factors should Kamala and Deborah take into account in deciding how to present the data from their experiment?
2. Should the new explanation predicting the results affect their deliberations?
3. Should a draft paper be prepared at this point?
4. If Deborah and Kamala can't agree on how the data should be presented, should one of them consider not being an author of the paper?

Most researchers are not required to share data with others as soon as the data are generated, although a few disciplines have adopted this standard to speed the pace of research. A period of confidentiality allows researchers to check the accuracy of their data and draw conclusions.

However, when a scientific paper or book is published, other researchers must have access to the data and research materials needed to support the conclusions stated in the publication if they are to verify and build on that research. Many research institutions, funding agencies, and scientific journals have policies that require the sharing of data and unique research materials. Given the expectation that data will be accessible, researchers who refuse to share the evidentiary basis behind their conclusions, or the materials needed to replicate published experiments, fail to maintain the standards of science.

In some cases, research data or materials may be too voluminous, unwieldy, or costly to share quickly and without expense. Nevertheless, researchers have a responsibility to devise ways to share their data and materials in the best ways possible. For example, centralized facilities or collaborative efforts can provide a cost-effective way of providing research materials or information from large databases. Examples include repositories established to maintain and distribute astronomical images, protein sequences, archaeological data, cell lines, reagents, and transgenic animals.

New issues in the treatment and sharing of data continue to arise as scientific disciplines evolve and new technologies appear. Some forms of data undergo extensive analysis before being recorded; consequently, sharing those data can require sharing the software and sometimes the hardware used to analyze them. Because digital technologies are rapidly changing, some data stored electronically may be inaccessible in a few years unless provisions are made to transport the data from one platform to another. New forms of publication are challenging traditional practices associated with publication and the evaluation of scholarly work.

The scientific research enterprise is built on a foundation of trust. Scientists trust that the results reported by others are valid. Society trusts that the results of research reflect an honest attempt by scientists to describe the world accurately and without bias. But this trust will endure only if the scientific community devotes itself to exemplifying and transmitting the values associated with ethical scientific conduct.

On Being a Scientist was designed to supplement the informal lessons in ethics provided by research supervisors and mentors. The book describes the ethical foundations of scientific practices and some of the personal and professional issues that researchers encounter in their work. It applies to all forms of research, whether in academic, industrial, or governmental settings, and to all scientific disciplines.

This third edition of On Being a Scientist reflects developments since the publication of the original edition in 1989 and a second edition in 1995. A continuing feature of this edition is the inclusion of a number of hypothetical scenarios, along with guidance for thinking about and discussing them.

On Being a Scientist is aimed primarily at graduate students and beginning researchers, but its lessons apply to all scientists at all stages of their scientific careers.
