  • Published: 18 February 2021

Essentials of data management: an overview

  • Miren B. Dhudasia 1 , 2 ,
  • Robert W. Grundmeier 2 , 3 , 4 &
  • Sagori Mukhopadhyay 1 , 2 , 3  

Pediatric Research volume  93 ,  pages 2–3 ( 2023 ) Cite this article


What is data management?

Data management is a multistep process that involves obtaining, cleaning, and storing data to allow accurate analysis and produce meaningful results. While data management has broad applications (and meaning) across many fields and industries, in clinical research the term data management is frequently used in the context of clinical trials. 1 This editorial is written to introduce early career researchers to practices of data management more generally, as applied to all types of clinical research studies.

Outlining a data management strategy prior to initiation of a research study plays an essential role in ensuring that both scientific integrity (i.e., data generated can accurately test the hypotheses proposed) and regulatory requirements are met. Data management can be divided into three steps: data collection, data cleaning and transformation, and data storage. These steps are not necessarily chronological and often occur simultaneously. Different aspects of the process may require the expertise of different people, necessitating a team effort for the effective completion of all steps.

Data collection

Data source.

Data collection is a critical first step in the data management process and may be broadly classified as “primary data collection” (collection of data directly from the subjects specifically for the study) and “secondary use of data” (repurposing data that were collected for some other reason, either for clinical care in the subject’s medical record or for a different research study). While the terms retrospective and prospective data collection are occasionally used, 2 these terms are more applicable to how the data are utilized than to how they are collected. Data used in a retrospective study are almost always secondary data; data collected as part of a prospective study typically involve primary data collection, but may also involve secondary use of data collected as part of ongoing routine clinical care for study subjects. Primary data collected for a specific study may be categorized as secondary data when used to investigate a new hypothesis, different from the question for which the data were originally collected. Primary data collection has the advantages of being specific to the study question, minimizing missingness in key information, and providing an opportunity for data correction in real time. As a result, this type of data is considered more accurate, but collecting it increases the time and cost of study procedures. Secondary use of data includes data abstracted from medical records, administrative data such as from the hospital’s data warehouse or insurance claims, and secondary use of primary data collected for a different research study. Secondary use of data offers access to large amounts of data that are already collected but often requires further cleaning and codification to align the data with the study question.

A case report form (CRF) is a powerful tool for effective data collection. A CRF is a paper or electronic questionnaire designed to record pertinent information from study subjects as outlined in the study protocol. 3 CRFs are always required in primary data collection but can also be useful in secondary use of data to preemptively identify, define, and, if necessary, derive critical variables for the study question. For instance, medical records provide a wide array of information that may not be required or useful for the study question. A CRF with well-defined variables and parameters helps the chart reviewer focus only on the relevant data, makes data collection more objective and unbiased, and protects patient confidentiality by minimizing the amount of patient information abstracted. Tools like REDCap (Research Electronic Data Capture) provide electronic CRFs and offer some advanced features like setting validation rules to minimize errors during data collection. 4 Designing an effective CRF upfront during the study planning phase helps to streamline the data collection process and makes it more efficient. 3
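As a rough illustration of how validation rules on an electronic CRF can catch errors at entry time, the sketch below checks one entry against a small form definition. The field names, ranges, and choice lists are hypothetical assumptions made for this example; they are not taken from REDCap or from any real study protocol.

```python
from datetime import date

# Hypothetical CRF definition: field name -> validation rule.
# Names, ranges, and choices are illustrative assumptions only.
CRF_FIELDS = {
    "study_id": {"type": str, "required": True},
    "birth_weight_g": {"type": int, "required": True, "min": 300, "max": 6000},
    "sex": {"type": str, "required": True,
            "choices": {"female", "male", "unknown"}},
    "visit_date": {"type": date, "required": False},
}


def validate_entry(entry):
    """Return a list of human-readable problems found in one CRF entry."""
    problems = []
    for name, rule in CRF_FIELDS.items():
        value = entry.get(name)
        if value is None:
            if rule["required"]:
                problems.append(f"{name}: required value is missing")
            continue
        if not isinstance(value, rule["type"]):
            problems.append(f"{name}: expected {rule['type'].__name__}")
            continue
        if "min" in rule and value < rule["min"]:
            problems.append(f"{name}: {value} is below {rule['min']}")
        if "max" in rule and value > rule["max"]:
            problems.append(f"{name}: {value} is above {rule['max']}")
        if "choices" in rule and value not in rule["choices"]:
            problems.append(f"{name}: {value!r} is not an allowed choice")
    # Anything not defined on the CRF should not be abstracted at all.
    for name in entry:
        if name not in CRF_FIELDS:
            problems.append(f"{name}: not a CRF field")
    return problems
```

Rejecting fields that are not on the form mirrors the point above about abstracting only the information the study question needs.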

Data cleaning and transformation

Quality checks.

Data collected may have errors that arise from multiple sources—data manually entered in a CRF may have typographical errors, whereas data obtained from data warehouses or administrative databases may have missing data, implausible values, and nonrandom misclassification errors. Having a systematic approach to identify and rectify these errors, while maintaining a log of the steps performed in the process, can prevent many roadblocks during analysis.

First, it is important to check for missing data. Missing data are defined as values that are not available and that would be meaningful for analysis if they were observed. 5 Missing data can bias the results of the study, depending on how much data are missing and how the missing values are distributed across the study cohort. Many methods for handling missing data have been published. Kang 6 provides a practical review of methods for handling missing data. If missing data cannot be retrieved and are limited to only a small number of subjects, one approach is to exclude these subjects from the study. Missing data in different variables across many subjects often require more sophisticated approaches to account for the “missingness.” These may include creating a category of “missing” (for categorical variables), simple imputation (e.g., substituting missing values in a variable with an average of non-missing values in the variable), or multiple imputation (generating several plausible values for each missing value from other variables in the dataset and combining the results across the completed datasets). 7
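The simpler approaches described above can be sketched in a few lines. This is a minimal illustration that assumes records are held as Python dictionaries with `None` marking a missing value; the function and field names are hypothetical. Multiple imputation is not shown, since it is normally done with a dedicated statistical package.

```python
from statistics import mean


def drop_incomplete(records, fields):
    """Exclude subjects with a missing value in any of the given fields."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


def add_missing_category(records, field, label="missing"):
    """For a categorical variable, turn missing values into their own category."""
    return [{**r, field: label if r.get(field) is None else r[field]}
            for r in records]


def impute_mean(records, field):
    """Simple imputation: replace missing values with the mean of observed ones."""
    observed = [r[field] for r in records if r.get(field) is not None]
    if not observed:
        raise ValueError(f"no observed values for {field}")
    fill = mean(observed)
    return [{**r, field: fill if r.get(field) is None else r[field]}
            for r in records]


# Hypothetical three-subject cohort with one incomplete record.
cohort = [
    {"study_id": "S001", "weight_kg": 3.0, "feeding": "breast"},
    {"study_id": "S002", "weight_kg": None, "feeding": None},
    {"study_id": "S003", "weight_kg": 4.0, "feeding": "formula"},
]
```

Each function returns new records rather than editing the input, so the original data stay intact and every cleaning step can be logged and repeated.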

Second, errors in the data can be identified by running a series of data validation checks. Some examples of data validation rules for identifying implausible values are shown in Table 1. Automated algorithms for detection and correction of implausible values may be available for cleaning specific variables in large datasets (e.g., growth measurements). 8 After identification, data errors can either be corrected, if possible, or can be marked for deletion. Other approaches, similar to those for dealing with missing data, can also be used for managing data errors.
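A series of validation checks of this kind might be run as in the sketch below, which flags implausible values and writes each finding to a timestamped log. The rules and thresholds are assumptions chosen for illustration and are not the rules from the article's table.

```python
from datetime import datetime, timezone

# Hypothetical plausibility rules: (field, test, description of the failure).
# Thresholds are illustrative assumptions, not clinical recommendations.
RULES = [
    ("gestational_age_weeks", lambda v: 22 <= v <= 44, "outside 22 to 44 weeks"),
    ("heart_rate_bpm", lambda v: 30 <= v <= 300, "outside 30 to 300 bpm"),
]


def run_checks(records, rules=RULES):
    """Return a log of implausible values without changing the records."""
    log = []
    for record in records:
        for field, is_plausible, issue in rules:
            value = record.get(field)
            if value is None:
                continue  # missing values are handled in a separate step
            if not is_plausible(value):
                log.append({
                    "study_id": record["study_id"],
                    "field": field,
                    "value": value,
                    "issue": issue,
                    "checked_at": datetime.now(timezone.utc).isoformat(),
                })
    return log


# Hypothetical data: S002 has a gestational age that is likely a typing error.
findings = run_checks([
    {"study_id": "S001", "gestational_age_weeks": 39, "heart_rate_bpm": 140},
    {"study_id": "S002", "gestational_age_weeks": 93, "heart_rate_bpm": None},
])
```

Keeping detection separate from correction leaves the decision to correct or delete a value with the study team, and the log records what was flagged and when.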

Data transformation

The data collected may not be in the form required for analysis. The process of data transformation includes recategorization and recodification of the collected data, along with derivation of new variables, to align with the study analytic plan. Examples include categorizing body mass index collected as a continuous variable into under- and overweight categories, recoding free-text values such as “growth of an organism” or “no growth” into a binary “positive” or “negative,” or deriving new variables such as average weight per year from multiple weight values over time available in the dataset. Maintaining a code-book of definitions for all variables, predefined and derived, can help a data analyst better understand the data.
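The three transformation examples above could look roughly like this in code. The body mass index cutoffs are the commonly cited adult values and are used here only as an assumption; a pediatric study would more likely use age- and sex-specific percentiles. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def categorize_bmi(bmi):
    """Recategorize a continuous BMI value (assumed adult cutoffs)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    return "overweight"


def recode_culture(text):
    """Recode a free-text culture result into a binary value."""
    cleaned = text.strip().lower()
    if cleaned.startswith("no growth"):
        return "negative"
    if "growth" in cleaned:
        return "positive"
    return None  # unrecognized wording: leave for manual review


def mean_weight_by_year(measurements):
    """Derive average weight per year from (year, weight_kg) pairs."""
    by_year = defaultdict(list)
    for year, weight_kg in measurements:
        by_year[year].append(weight_kg)
    return {year: mean(values) for year, values in sorted(by_year.items())}


# A small code-book entry for each derived variable (hypothetical wording).
CODEBOOK = {
    "bmi_category": "underweight <18.5, normal 18.5 to <25, overweight >=25",
    "culture_result": "positive = any growth reported; negative = no growth",
    "mean_weight_kg_by_year": "arithmetic mean of all weights in a calendar year",
}
```

Checking for “no growth” before the more general “growth” match matters, because the second phrase is contained in the first.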

Data storage

Securely storing data is especially important in clinical research as the data may contain protected health information of the study subjects. 9 Most institutes that support clinical research have guidelines for safeguards to prevent accidental data breaches.

Data are collected in paper or electronic formats. Paper data should be stored in secure file cabinets inside a locked office at the site approved by the institutional review board. Electronic data should be stored on a secure approved institutional server, and should never be transported using unencrypted portable media devices (e.g., “thumb drives”). If not all study team members require access to study data, then selective access should be granted to the study team members based on their roles.

Another important aspect of data storage is data de-identification. Data de-identification is a process by which identifying characteristics of the study participants are removed from the data, in order to mitigate privacy risks to individuals. 10 Identifying characteristics of a study subject include name, medical record number, date of birth/death, and so on. To de-identify data, these characteristics should either be removed from the data or modified (e.g., changing the medical record number to study IDs, changing dates to age/duration, etc.). If feasible, study data should be de-identified when storing. If you anticipate that reidentification of the study participants may be required in the future, then the data can be separated into two files, one containing only the de-identified data of the study participants, and one containing all the identifying information, with both files containing a common linking variable (e.g., study ID), which is unique for every subject or record in the two files. The linking variable can be used to merge the two files when reidentification is required to carry out additional analyses or to get further data. The link key should be maintained in a secure institutional server accessible only to authorized individuals who need access to the identifiers.
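The two-file approach described above can be sketched as follows, assuming each source record is a dictionary holding a name, medical record number, and two dates. The field names and the study ID format are hypothetical, and the sketch does not address where or how the two files are stored.

```python
from datetime import date


def split_identifiers(records):
    """Split records into a de-identified file and a separate link-key file."""
    deidentified, link_key = [], []
    for n, rec in enumerate(records, start=1):
        study_id = f"S{n:04d}"  # arbitrary ID that encodes no patient detail
        link_key.append({
            "study_id": study_id,
            "name": rec["name"],
            "mrn": rec["mrn"],
            "date_of_birth": rec["date_of_birth"],
            "admission_date": rec["admission_date"],
        })
        deidentified.append({
            "study_id": study_id,
            # Dates are replaced by a duration, as suggested in the text.
            "age_at_admission_days":
                (rec["admission_date"] - rec["date_of_birth"]).days,
            "diagnosis": rec["diagnosis"],
        })
    return deidentified, link_key


def reidentify(deidentified, link_key):
    """Merge the two files on study_id when reidentification is authorized."""
    by_id = {row["study_id"]: row for row in link_key}
    return [{**by_id[row["study_id"]], **row} for row in deidentified]


# Hypothetical source record.
source = [{
    "name": "Example Patient",
    "mrn": "12345",
    "date_of_birth": date(2020, 1, 1),
    "admission_date": date(2020, 1, 11),
    "diagnosis": "sepsis evaluation",
}]
analytic_file, key_file = split_identifiers(source)
```

Only `analytic_file` would be shared with analysts; `key_file` is the link key that the text says should sit on a secure server with restricted access.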

To conclude, effective data management is important to the successful completion of research studies and to ensure the validity of the results. Outlining the steps of the data management process upfront will help streamline the process and reduce the time and effort subsequently required. Assigning team members responsible for specific steps and maintaining a log, with date/time stamp to document each action as it happens, whether you are collecting, cleaning, or storing data, can ensure all required steps are done correctly and identify any errors easily. Effective documentation is a regulatory requirement for many clinical trials and is helpful for ensuring all team members are on the same page. When interpreting results, it will serve as an important tool to assess if the interpretations are valid and unbiased. Last, it will ensure the reproducibility of the study findings.

References

1. Krishnankutty, B., Bellary, S., Kumar, N. B. & Moodahadu, L. S. Data management in clinical research: an overview. Indian J. Pharm. 44, 168–172 (2012).

2. Weinger, M. B. et al. Retrospective data collection and analytical techniques for patient safety studies. J. Biomed. Inf. 36, 106–119 (2003).

3. Avey, M. in Clinical Data Management 2nd edn. (eds Rondel, R. K., Varley, S. A. & Webb, C. F.) 47–73 (Wiley, 1999).

4. Harris, P. A. et al. Research electronic data capture (REDCap): a metadata-driven methodology and workflow process for providing translational research informatics support. J. Biomed. Inf. 42, 377–381 (2009).

5. Little, R. J. et al. The prevention and treatment of missing data in clinical trials. N. Engl. J. Med. 367, 1355–1360 (2012).

6. Kang, H. The prevention and handling of the missing data. Korean J. Anesthesiol. 64, 402 (2013).

7. Rubin, D. B. Inference and missing data. Biometrika 63, 581–592 (1976).

8. Daymont, C. et al. Automated identification of implausible values in growth data from pediatric electronic health records. J. Am. Med. Inform. Assoc. 24, 1080–1087 (2017).

9. Office for Civil Rights, Department of Health and Human Services. Health insurance portability and accountability act (HIPAA) privacy rule and the national instant criminal background check system (NICS). Final rule. Fed. Regist. 81, 382–396 (2016).

10. Office for Civil Rights (OCR). Methods for de-identification of PHI. https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html (2012).

Acknowledgements

This work was supported in part by the Eunice Kennedy Shriver National Institute of Child Health & Human Development of the National Institutes of Health grant (K23HD088753).

Author information

Authors and affiliations.

Division of Neonatology, Children’s Hospital of Philadelphia, Philadelphia, PA, USA

Miren B. Dhudasia & Sagori Mukhopadhyay

Center for Pediatric Clinical Effectiveness, Children’s Hospital of Philadelphia, Philadelphia, PA, USA

Miren B. Dhudasia, Robert W. Grundmeier & Sagori Mukhopadhyay

Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA

Robert W. Grundmeier & Sagori Mukhopadhyay

Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA

Robert W. Grundmeier


Corresponding author

Correspondence to Sagori Mukhopadhyay.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article.

Dhudasia, M.B., Grundmeier, R.W. & Mukhopadhyay, S. Essentials of data management: an overview. Pediatr Res 93 , 2–3 (2023). https://doi.org/10.1038/s41390-021-01389-7


Received: 11 December 2020

Revised: 27 December 2020

Accepted: 06 January 2021

Issue Date: January 2023




Life Sciences Analytics

Clinical Research Analytics

Advance clinical data science with regulatory-grade analytics.

How SAS® supports clinical research analytics

As the market leader in clinical research analytics, SAS provides a secure analytics foundation and scalable framework for clinical analysis and submission. Our robust analytic tools and techniques, including AI and machine learning, help you gain a competitive edge in the high-stakes world of clinical research analytics – from getting trials up and running, to modernizing trial designs, to delivering life-changing therapies to market faster and more efficiently. SAS also provides the leading platform for data transparency, allowing you to securely share historical trial data with third-party researchers for the betterment of medicine.

Clinical trial modernization

  • Adopt an end-to-end clinical analytics foundation so you can spend more time on data exploration, data quality monitoring, and executing advanced analytics and statistics.
  • Integrate your clinical ecosystem and collaborate with partners.
  • Apply streaming and edge IoT analytics to tackle unprecedented growth in data volume and velocity from the Internet of Medical Things (IoMT) devices.
  • Use real world data to drive insight into the clinical development process.

Clinical trial operational analytics

  • Use advanced analytics to mimic the behavior of the enrollment process, quickly predict likely outcomes and test different approaches using what-if scenarios.
  • Compare realized enrollment versus simulations, and adapt plans to stay on target.
  • Inform site supply more efficiently using enrollment predictions.
  • Use real world data to inform operational processes and make better decisions.

AI & machine learning

  • Drive automation and consistency across studies with machine learning.
  • Use machine learning to extract new insights and boost clinical development efficiency.
  • Use AI and machine learning to develop fit-for-future trial designs, e.g., virtual, pragmatic and adaptive trials.
  • Use clinical data mapping to easily store AI-powered transformation rules in a central database in alignment with actual trial data and CDISC data standards metadata.

Clinical data transparency

  • Use our industry-leading clinical data transparency to share clinical research with external researchers for secondary analysis and advancement of new discoveries.
  • Give researchers access to an unparalleled breadth of available data from more than 25 sponsors.
  • Empower researchers with maximum analytical depth and flexibility, including the use of open source and third-party licensed software.
  • Give researchers free access to data, and allow for transparency platform interoperability.

Why SAS ® for clinical research analytics?

SAS is the standard for clinical research analytics in the life sciences industry, helping you maximize value and reduce risks.

Gain insights faster

Adopt a single integrated platform for clinical analysis and submission, allowing your team to focus on insights and drive results. A wealth of technologies provides scalability, visualization and submission-grade statistics for understanding clinical trial results faster and detecting potential issues sooner.

Mitigate risk

Reduce risk with a centralized repository that meets all security requirements with auditable actions, traceability and repeatability. Deliver data to regulators using built-in CDISC standards. Decrease downtime with all-in hosted solutions. And provide technical support for all your global teams and collaborators.

Increase efficiency

Improve performance with mature, standards-driven processes. Streamline data and analytics building blocks for maximum reuse.

“The centralized system allows team members in different continents to look at data without having to transfer data to them. With the SAS Life Science Analytics Framework, everyone is viewing the same snapshot of the data. Global access to consistent data is the dream, and we’ve achieved it.” Nina Worden, Director of Statistical Programming, Santen

Related Products & Solutions

  • SAS for Transforming Clinical Trial Analysis & Submission | Powered by Azure: Advance clinical data science with regulatory-grade analytics in support of decentralized, patient-centric clinical trials.
  • SAS® Clinical Enrollment Simulation: Build and evaluate faster, more strategic enrollment plans using simulation.
  • SAS® Health Solutions: Accelerate time to value with industry-specific analytics and guided interactive visuals.
  • SAS® Life Science Analytics Framework: Efficiently manage the transformation, analysis and reporting of clinical trials data with a single, cloud-based pharma analytics framework.


  • Technical advance
  • Open access
  • Published: 24 June 2019

The Generalized Data Model for clinical research

  • Mark D. Danese   ORCID: orcid.org/0000-0002-7068-9603 1 ,
  • Marc Halperin 1 ,
  • Jennifer Duryea 1 &
  • Ryan Duryea 1  

BMC Medical Informatics and Decision Making volume  19 , Article number:  117 ( 2019 ) Cite this article


Background

Most healthcare data sources store information within their own unique schemas, making reliable and reproducible research challenging. Consequently, researchers have adopted various data models to improve the efficiency of research. Transforming and loading data into these models is a labor-intensive process that can alter the semantics of the original data. Therefore, we created a data model with a hierarchical structure that simplifies the transformation process and minimizes data alteration.

Methods

There were two design goals in constructing the tables and table relationships for the Generalized Data Model (GDM). The first was to focus on clinical codes in their original vocabularies to retain the original semantic representation of the data. The second was to retain hierarchical information present in the original data while retaining provenance. The model was tested by transforming synthetic Medicare data; Surveillance, Epidemiology, and End Results data linked to Medicare claims; and electronic health records from the Clinical Practice Research Datalink. We also tested a subsequent transformation from the GDM into the Sentinel data model.

Results

The resulting data model contains 19 tables, with the Clinical Codes, Contexts, and Collections tables serving as the core of the model, and containing most of the clinical, provenance, and hierarchical information. In addition, a Mapping table allows users to apply an arbitrarily complex set of relationships among vocabulary elements to facilitate automated analyses.

Conclusions

The GDM offers researchers a simpler process for transforming data, clear data provenance, and a path for users to transform their data into other data models. The GDM is designed to retain hierarchical relationships among data elements as well as the original semantic representation of the data, ensuring consistency in protocol implementation as part of a complete data pipeline for researchers.


Healthcare data contains useful information for clinical researchers across a wide range of disciplines, including pharmacovigilance, epidemiology, and health services research. However, most data sources throughout the world store information within their own unique schemas, making it difficult to develop software tools that ensure reliable and reproducible research. One solution to this problem is to create data models that standardize the storage of both the data and the relationships among data elements [ 1 ].

In healthcare, several commonly used data models include those supported by the following organizations: Informatics for Integrating Biology and the Bedside (i2b2) [ 2 , 3 , 4 ], Observational Health Data Sciences and Informatics (OHDSI, managing the OMOP [Observational Medical Outcomes Partnership] data model) [ 5 , 6 , 7 ], Sentinel [ 8 , 9 , 10 ], and PCORnet (Patient Centered Outcomes Research Network) [ 11 , 12 ], among others. The first, and biggest, challenge with any data model is the process of migrating the raw (source) data into the data model, referred to as the “extract, transform, and load” (ETL) process. The ETL process is particularly burdensome when one has to support multiple, large data sources, and to update them regularly [ 13 ].

Some aspects of transforming raw data into a particular data model are straightforward, including reorganizing variables and standardizing their names. However, the most challenging aspect is standardizing the relationships among data elements without changing their meaning. Since different healthcare data sources encode relationships in different ways, the ETL process can lose information or create inaccurate information. The best example is the process of creating a visit, a construct which, in most data models, is used to link information (e.g., diagnoses and procedures) on a per-patient basis.

Visits are challenging because administrative claims allow facilities and practitioners to invoice separately for their portions of the same medical encounter, and allow practitioners to bill for multiple interactions on a single invoice [ 14 ]. Within the practitioner bills, individual procedures are linked to diagnosis codes, procedure modifiers, and costs. Consequently, a visit should link both the facility and the practitioner information without changing the existing practitioner-specified relationships between procedures, modifiers, diagnoses, and costs. Even electronic medical records can be challenging when each interaction with a different provider (e.g., nurse, physician, pharmacist, etc.) is recorded separately, requiring decisions to be made about defining a visit.

To minimize the need to encode specific relationships that may not exist in the source data, we created a data model with a hierarchical structure that minimizes changes to the meaning of the original data. This data model can serve both as a stand-alone data model for clinical researchers using observational data, as well as a storage model for later conversion into other data models.

In designing the Generalized Data Model (GDM) the primary use case was to allow clinical researchers using commonly available observational datasets to conduct research efficiently using a common framework. In particular, the GDM was designed to allow researchers to reuse an extensive, published body of existing algorithms for identifying clinical research constructs, including visits, that are expressed in the native vocabularies of the raw data. These algorithms require code sets, and may also require temporal logic (e.g., before, after, during, etc.), sequencing information (e.g., first, last, etc.), and provenance information (e.g., inpatient, outpatient, etc.). The GDM specifically considered both oncology research, which has its own specific vocabularies, and health services research. However, the model was designed so that these specific focus areas would not limit the design or use of the model.

Design goals

We initiated development of the GDM to make ETL specification and implementation easier for users who work with data models. There were two primary goals in defining the standard tables and table relationships for the GDM, described below.

Focus on clinical codes in their original vocabularies

For clinical research, transparency and reproducibility are critically important. Therefore, the model is focused on the original (source) vocabularies to prevent the loss of the original semantic expression of the underlying clinical information. We also wanted all clinical codes (e.g., International Classification of Diseases [ICD], Current Procedural Terminology, National Drug Codes, etc.) to be easy to load into the data model and easy to query, because they represent the majority of electronic clinical information. Hence, the key organizing structure of the GDM is the placement of all clinical codes in a central “fact” table. This is not unlike the i2b2 data model that uses a fact table to store all “observations” from a source data set; however, the GDM was not designed as a star schema despite the similar idea of locating the most important data at the center of the data model.

We also considered interoperability as part of the design, but it was of secondary importance. Interoperability, like the construction of visits, requires establishing new connections (“mappings”) between the source vocabularies and a standard vocabulary such that a single query can operate across all data sources regardless of the source vocabulary. For international studies using different vocabularies, this might be a useful tool. However, given that not every code is yet mapped to a standard (e.g., OMOP has little in the way of procedure code mappings), and given the maintenance required to support and update mappings, we designed the GDM to incorporate reliable cross-vocabulary mappings where they exist.

Retain hierarchical information with provenance

The second goal was to capture important hierarchical relationships among data elements within a relational data structure. Based on the review of numerous data sources including Medicare, Surveillance Epidemiology and End Results (SEER) Medicare, Optum, Truven, JMDC (Japanese claims), and Clinical Practice Research Datalink (CPRD), we decided on a two-level hierarchy for grouping clinical codes, with the lower level table called Contexts and the higher-level table called Collections. This was based on common data structures where many related codes are recorded on a single record in the source data (Contexts table), and where these records are often grouped together (Collections table) based on clinical reporting or billing considerations. See Results for table definitions, and Fig.  1 for a visual depiction of the hierarchical structure of the Contexts and Collections tables.

Fig. 1 Relationships among the Collections, Contexts, and Clinical Codes tables. Note: EHR = electronic health record; HCPCS = Healthcare Common Procedure Coding System; NDC = National Drug Code; ICD = International Classification of Diseases. The figure does not contain specific data but is intended to show the conceptual relationships among data elements across tables.

Our review of data sources suggested that the data model needed to support relatively few relationship types. The primary relationship represents data that is reported together or collected at the same time. One example of this includes a “line”, which occurs in claims data when one or more diagnosis codes, a procedure code, and a cost are all reported together. Another example includes laboratory values assessed at the same time (e.g., systolic and diastolic blood pressure) which could be considered to be co-reported. Also, a set of prescription refills could represent a linked set of records. Even records that contain pre-coordinated expressions (i.e., a linked set of codes used to provide clinical information akin to an English sentence) could also be stored in order by associating the codes with a single Context record.

We also included the provenance for each clinical code as part of Contexts, recording not only the type of relationship among elements within a Context as discussed above, but also the source file from which the data was abstracted. To minimize the loss of information when converting from the GDM to a data model that uses visits for organizing and consolidating most data relationships, the GDM does not require explicit visits (see Results ). This is important because visits are not consistently defined among other data models, particularly for administrative claims data (see Discussion ).

Other considerations

Several other considerations went into building this data model, some of which were borrowed or adapted from other data models. For example, in addition to the cost table, we borrowed the OMOP idea to store all codes as “concept ids” (unique numeric identifiers for each code in each vocabulary to avoid conflicts between different vocabularies that use the same code). We also expanded upon the idea of OMOP “type_concept_ids” to track provenance within our data model. Finally, we allow flexibility in storing enrollment information in the Information Periods table using a “type_concept_id” so that the data can be used for different purposes (e.g., if a protocol does not require drug data, then enrollment in a drug plan should not be required). We also wanted to facilitate a straightforward, subsequent ETL process to other data models, including OMOP, Sentinel, and PCORnet.

We adapted the Payer Reimbursements table from the OMOP version 5.2 Cost table because it was the only data model with a cost table, and because we contributed substantially to its design. However, unlike the single OMOP cost table, we created two tables to accommodate both reimbursement-specific information, which has a well-defined structure, and all other kinds of economic information, which requires a very flexible structure. (The OMOP version 5.31 Cost table was redesigned to be more flexible, coincidentally resembling the GDM Costs table.)

We tested the data model on three very different types of commonly available data used by clinical researchers: administrative claims data, EHR data, and cancer registry data. Claims in the United States are generally submitted electronically by the provider to the insurer using the American National Standards Institute (ANSI) 837P and 837I file specifications, which correspond to the CMS-1500 and UB04 paper forms [ 15 ]. Remittance information is sent from the insurer to the provider using the 835P and 835I specifications. However, actual claims data used for research is provided in a much simpler format. Based on experience developing and supporting software for submitting claims to insurers as well as creating ETL specifications for multiple commercial claims and EHR datasets using the OMOP data model, we determined that Medicare data is the most stringent test for transforming claims data because it contains the most information from the 837 and 835 files. For EHR data, we used the Clinical Practice Research Datalink (CPRD) data, because it is widely used for clinical research [ 16 ]. Finally, as part of our focus on oncology research, we included Surveillance, Epidemiology, and End Results (SEER) data [ 17 ] because SEER provides some of the most detailed cancer registry data available globally to clinical researchers, and these data are challenging to incorporate into data models.

More specifically, we implemented a complete ETL process for the Medicare Synthetic Public Use Files (SynPUF). The SynPUF data were derived from a 2.1-million-patient sample of Medicare beneficiaries from 2008 who were followed for three years, and were created to facilitate software development using Medicare data [ 18 , 19 ]. We also implemented an ETL for SEER data linked to Medicare claims data [ 20 ] for 20,000 patients with small cell lung cancer, as part of an ongoing research project to describe patterns of care in that population. Finally, we developed a complete ETL for 140,000 CPRD patients for an ongoing research project evaluating outcomes associated with adherence to lipid-lowering medications. We also tested the feasibility of an ETL process to move SynPUF data from the GDM to the Sentinel data model (version 6.0) to ensure that the model did not contain any structural irregularities that would make it difficult to move data into other data model structures.

Finally, we conducted a test of information loss in the context of applying quality control to a study of mesothelioma patients. We conducted two analyses by separate people based on a written specification document using SEER Medicare data. The first was conducted using the source data and a combination of SAS and R code, and the second was conducted using the GDM version of the data and proprietary software. The analysis required the use of several SEER-specific fields, including the tumor sequence (first primary), histology, reporting type (microscopic confirmation), reporting source (not at death or autopsy), and tumor location data.

ETL software

Our ETL process focused on the extraction of the source data and the transformation to the GDM data model, and saved tables as .csv files (i.e., it focused primarily on the E and T parts of the ETL). The ETL processes were built using R (version 3.4.4) and the data.table package (version 1.11.6) [ 21 ]. R was selected because it is an open-source, cross-platform software package; because of its flexibility for composing ETL functions; and because of the availability of the data.table package as an in-memory database written in C for speed. The package itself is modular, and allows users to compose arbitrary ETL functions. Although the approach is different, the process is conceptually related to the dynamic ETL described by Ong et al. [ 22 ].
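
The extract-and-transform pattern described above can be illustrated in a few lines. The following is a minimal Python sketch, not the authors' R and data.table implementation: it composes small transformation functions and serializes the result as CSV text. All function names and column names (`src_patient`, `src_dx1`, `patient_id`, `clinical_code`) are hypothetical and chosen only for illustration.

```python
import csv
import io
from functools import reduce


def compose(*steps):
    """Chain ETL steps so each one receives the previous step's rows."""
    return lambda rows: reduce(lambda acc, step: step(acc), steps, rows)


def rename_columns(mapping):
    """Build a step that renames source columns (hypothetical names)."""
    def step(rows):
        return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]
    return step


def drop_empty_codes(rows):
    """Discard rows whose clinical code is missing or empty."""
    return [row for row in rows if row.get("clinical_code")]


def to_csv(rows):
    """Serialize rows to CSV text; assumes at least one row is present."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Two illustrative source rows; the second has no diagnosis code.
source = [
    {"src_patient": "A1", "src_dx1": "4019"},
    {"src_patient": "A2", "src_dx1": ""},
]
etl = compose(
    rename_columns({"src_patient": "patient_id", "src_dx1": "clinical_code"}),
    drop_empty_codes,
)
csv_text = to_csv(etl(source))
```

Because each step is an ordinary function from rows to rows, steps can be reordered or reused across source files, which is the sense in which such a pipeline is modular.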

The resulting data model contains 19 tables (see hierarchical view in Fig.  2 ). Details of the tables are provided in Additional file  1 , and the most up-to-date version is available on a GitHub repository [ 23 ]. This repository will also contain links to any publicly available ETL specifications that we develop.

Fig. 2. Hierarchical View of the Generalized Data Model. Note: Table names and key relationships among tables are depicted above. See Additional file 1 for more detail on tables. Tables in green serve as lookup tables across the database. There is a single Addresses table for unique addresses with relationships to Patients, Practitioners, and Facilities, and a single Practitioners table with relationships to Patients and Contexts Practitioners. The Contexts Practitioners table allows multiple practitioners to be associated with a Context record.

Clinical data

The Clinical Codes, Contexts, and Collections tables make up the core of the GDM (as shown in Fig. 1 ). All clinical codes are stored in the Clinical Codes table. Each row of the Clinical Codes table contains a single code from the source data. In addition, each row also contains a patient id, the associated start and end dates for the record, a provenance concept id, and a sequence number. The sequence number allows codes to retain their order from the source data, as necessary. The most obvious example from billing data is diagnosis codes that are stored in numbered fields (e.g., diagnosis 1, diagnosis 2, etc.). But any set of ordered records could be stored this way, including groups of codes in a pre-coordinated expression. Grouping together ordered records in the Clinical Codes table is accomplished by associating them with the same id from the Contexts table. The provenance id allows for the specification of the type of record (e.g., admitting diagnosis, problem list diagnosis, etc.).
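
As an illustration of how sequence numbers preserve source order, the following minimal Python sketch turns numbered source fields into ordered Clinical Codes rows that share one Context id. The field naming pattern (`diagnosis_1`, `diagnosis_2`, ...), the row keys, and the function name are assumptions made for the example; start and end dates are omitted for brevity.

```python
def ordered_code_rows(record, prefix, patient_id, context_id, provenance):
    """Create one Clinical Codes row per numbered source field.

    The sequence value is the number taken from the source field name, so
    the original ordering (diagnosis 1, diagnosis 2, ...) is retained.
    """
    numbered = sorted(
        (int(name[len(prefix):]), code)
        for name, code in record.items()
        if name.startswith(prefix) and code
    )
    return [
        {
            "patient_id": patient_id,
            "context_id": context_id,   # same Context id groups the codes
            "clinical_code": code,
            "provenance": provenance,   # stands in for a provenance concept id
            "sequence": position,
        }
        for position, code in numbered
    ]


# Hypothetical source record with fields listed out of order.
record = {
    "diagnosis_2": "25000",
    "diagnosis_1": "4019",
    "diagnosis_10": "2724",
    "procedure_1": "99213",
}
rows = ordered_code_rows(
    record, "diagnosis_", patient_id=1, context_id=7, provenance="claim diagnosis"
)
```

Sorting on the integer suffix rather than the field name keeps `diagnosis_10` after `diagnosis_2`, which a plain string sort would not.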

The Contexts table allows for grouping clinical codes and storing information about their origin. The record type concept id identifies the type of group that is stored. Examples might include lines from claims data where diagnoses, procedures, and other information are grouped, prescription and refill records that might be in electronic medical record or pharmacy data, or measurements of some kind from electronic health record or laboratory data (e.g., systolic and diastolic blood pressure, or a laboratory panel). In addition, the table stores the file name from the source data, the Centers for Medicare and Medicaid Services place of service values [ 27 ] (used for physician records since facility records do not have a place of service in claims data), and foreign keys to the care site and facility tables. The Contexts table also contains a patient id and both start and stop dates which could be different from the start and stop dates of the individual records from other tables to which the Contexts record is linked (e.g., a hospitalization may have different start and stop dates than the individual records within the hospitalization, as might occur with an in-hospital procedure performed on a single day of a multi-day hospitalization).

The Collections table represents a higher level of hierarchy for records in the Contexts table. That is, records in the Collections table represent groups of records from the Contexts table. This kind of grouping occurs when multiple billable units (“lines” or “details”) are combined into invoices (“claims”). It also occurs when prescriptions, laboratory measures, diagnoses and/or procedures are all recorded at a single office visit. In short, a Collection is typically a “claim” or a “visit” depending on whether the source data is administrative billing or electronic health record data. By using a hierarchical structure, the model avoids the requirement to construct “visits” from claims data which often leads to inaccuracy, loss of information, and complicated ETL processing. In the simplest possible case, it is possible to have a single record in the Clinical Codes table which is associated with a single Context record, which is associated with a single Collection record, as shown in Fig. 1 for a drug record. The critical part of the ETL process, moving data into the Clinical Codes, Contexts, and Collections tables, is described in Fig.  3 for the SynPUF data.

Fig. 3. Visualization of the ETL Process for SynPUF Data. Note: Clinical codes are derived from a single row in the source data set (SynPUF record). Colored arrows indicate how each group of codes is used to create records. Each code from the original record gets its own row in the Clinical Codes table. Codes that are grouped together (e.g., line diagnosis 1 and procedure 1 in yellow) share the same context. In the Contexts table, type concept id ending in “64” indicates a claim level context, and the id ending in “65” indicates a line level context. The three contexts (groups of codes) share the same collection id.
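
The flow in the figure can be approximated with a small Python sketch. It is an illustration under stated assumptions rather than the actual SynPUF ETL: the input layout (`claim_diagnoses`, `lines`), the row keys, and the string record types `"claim"` and `"line"` (standing in for the type concept ids) are all hypothetical.

```python
from itertools import count


def load_claim(claim, collection_id, context_ids):
    """Split one claim record into Collection, Contexts, and Clinical Codes rows."""
    collection = {"id": collection_id, "patient_id": claim["patient_id"]}
    contexts, clinical_codes = [], []

    def add_context(record_type, coded_items):
        # Every group of co-reported codes gets its own Context row.
        context_id = next(context_ids)
        contexts.append({
            "id": context_id,
            "collection_id": collection_id,
            "record_type": record_type,
            "source_file": claim["source_file"],  # provenance of the group
        })
        # Every code gets its own Clinical Codes row within that Context.
        for sequence, (vocabulary, code) in enumerate(coded_items, start=1):
            clinical_codes.append({
                "patient_id": claim["patient_id"],
                "context_id": context_id,
                "vocabulary": vocabulary,
                "clinical_code": code,
                "sequence": sequence,
            })

    # Claim-level diagnoses form one Context; each line forms another.
    add_context("claim", [("ICD-9", code) for code in claim["claim_diagnoses"]])
    for line in claim["lines"]:
        add_context(
            "line", [("ICD-9", line["diagnosis"]), ("HCPCS", line["procedure"])]
        )
    return collection, contexts, clinical_codes


claim = {
    "patient_id": 1,
    "source_file": "outpatient_claims.csv",
    "claim_diagnoses": ["4019", "25000"],
    "lines": [
        {"diagnosis": "4019", "procedure": "99213"},
        {"diagnosis": "25000", "procedure": "83036"},
    ],
}
collection, contexts, clinical_codes = load_claim(
    claim, collection_id=1, context_ids=count(1)
)
```

With two lines and one set of claim-level diagnoses, the example yields three Contexts that share a single Collection id, mirroring the structure in the caption.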

The Details tables capture domain-specific information related to hospitalizations, drugs, and measurements. The Admissions Details table stores admissions and emergency department information that doesn’t fit in the Clinical Codes, Contexts, or Collections tables. It is designed to hold one admission per row. Each record in the Collections table for an inpatient admission links to this table. The Drug Exposure Details and Measurement Details contain information about medications and measurements (e.g., laboratory values). The Clinical Codes table contains foreign keys to these tables. We should also note that these two tables could be combined with the Clinical Codes table to make one larger table and improve query times on some database platforms. While this might require some minor modifications to the query, it wouldn’t change the underlying logic of the data model.

Patient data

The Patients table includes information about birth date, sex, race, ethnicity, address (via the Addresses table) and primary care provider (via Practitioners table). The Patient Details table allows a more flexible structure for timeless information like family history or simple genetic information. The Information Periods table captures periods of time during which the information in each table is relevant. This can include multiple records for each patient, including records for different enrollment types (e.g., Medicare Part A, Medicare Part B, or Medicare Managed Care) or this can be something as simple as a single date range of “up-to-standard” data as provided by the Clinical Practice Research Datalink. This table includes one row per patient for each unique combination of information type and date range.
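
To show how typed Information Periods rows might be used, the following Python sketch checks whether a patient has a period of a given type that covers a study window. The row keys and type labels are assumptions for illustration, and the check deliberately looks for a single covering period; it does not merge adjacent periods.

```python
from datetime import date


def is_covered(information_periods, patient_id, information_type, start, end):
    """Return True if one period of the given type spans [start, end]."""
    return any(
        period["patient_id"] == patient_id
        and period["information_type"] == information_type
        and period["start_date"] <= start
        and end <= period["end_date"]
        for period in information_periods
    )


# One row per patient for each combination of information type and date range.
information_periods = [
    {"patient_id": 1, "information_type": "Medicare Part A",
     "start_date": date(2008, 1, 1), "end_date": date(2010, 12, 31)},
    {"patient_id": 1, "information_type": "drug plan",
     "start_date": date(2009, 1, 1), "end_date": date(2009, 12, 31)},
]

window = (date(2008, 6, 1), date(2009, 6, 1))
has_part_a = is_covered(information_periods, 1, "Medicare Part A", *window)
has_drug_plan = is_covered(information_periods, 1, "drug plan", *window)
```

A protocol that does not need drug data would simply not ask for the drug plan type, so the patient in this example would remain eligible on Part A coverage alone.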

The Deaths table captures mortality information at the patient level, including date and cause(s) of death. This is typically populated from beneficiary or similar administrative data associated with the medical record. However, it is useful to check discharge status in the Admissions Details table as part of the ETL process to ensure completeness. There are also diagnosis codes that indicate death. Deaths that are indicated by diagnosis codes should be in the Clinical Codes table and not be moved to the Deaths table. If needed, these codes can be identified using an appropriate algorithm (e.g., a set of ICD-9 codes, possibly with associated provenance specifications) to identify death as part of the identification of outcomes in an analysis.

There are two tables that store cost, charge, or payment data of some kind. The Payer Reimbursements table stores information from administrative claims data, with separate columns for each commonly used reimbursement element. All other financial information is stored in the Costs table, which is designed to support arbitrary cost types, and uses a “value_type_concept_id” to indicate the specific type. Costs may be present at a Context (line-item) or Collection (invoice) level. This led us to align costs with the Contexts table. By evaluating the type of the context record, users can determine whether a cost is an aggregated construct or not. In administrative claims data, this means that each “line” (diagnosis and procedure) can have a cost record. For records that have costs only at the claim/header level (e.g., inpatient hospitalizations), only Contexts that refer to “claims” (i.e., a record_type_concept_id for “claim”) will have costs. For data with costs at both the line and claim/header level, costs can be distinguished by the Context type. In our experience, the sum of the line costs does not always equal the total cost, so depending on the research question, the researcher will need to determine whether claim, line, or both should be used. It is possible that each Clinical Code record sharing a single Contexts record could have a different cost; therefore, the two cost-related tables include a column to indicate the specific Clinical Code record to which the cost belongs. This might occur, for example, if multiple laboratory tests have different costs, but share a common provenance (i.e., Contexts record).
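
The distinction between claim-level and line-level costs can be sketched as follows. This Python example is illustrative only: the row keys, the string record types, and the amounts are invented, and the record type strings stand in for a record_type_concept_id.

```python
from collections import defaultdict


def total_cost_by_level(contexts, costs):
    """Sum cost amounts separately for each Context record type."""
    record_type = {context["id"]: context["record_type"] for context in contexts}
    totals = defaultdict(float)
    for cost in costs:
        # The Context type tells us whether the cost is an aggregate (claim)
        # or belongs to a single billable unit (line).
        totals[record_type[cost["context_id"]]] += cost["amount"]
    return dict(totals)


contexts = [
    {"id": 1, "record_type": "claim"},
    {"id": 2, "record_type": "line"},
    {"id": 3, "record_type": "line"},
]
costs = [
    {"context_id": 1, "amount": 100.0},
    {"context_id": 2, "amount": 60.0},
    {"context_id": 3, "amount": 35.0},
]
totals = total_cost_by_level(contexts, costs)
lines_match_claim = totals["line"] == totals["claim"]
```

In this invented example the line costs do not add up to the claim total, which is the situation in which the researcher has to decide which level answers the research question.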

Facility and practitioner data

The Facilities table contains unique records for each facility where a patient is seen. The facility_type_concept_id should be used to describe the whole facility (e.g., Academic Medical Center or Community Medical Center). Specific departments in the facility should be entered in the Contexts table using the care_site_type_concept_id field. The Addresses table captures address information for practitioners and facilities, as well as patients.

The Contexts Practitioners table links one or more practitioners with a record in the Contexts table. Each record represents an encounter between a patient and a practitioner in a specific context. The role_type_concept_id in the table captures the role, if any, that the practitioner played in the context (e.g., attending physician).

Vocabulary data

The Concepts table provides a unique numeric identifier (“concept_id”) for each source code in each vocabulary used in the data (see Table  1 ). Since queries against the GDM are intended to use the source codes, the Vocabulary table functions as a lookup table; therefore, the Concepts table does not have to be consistent across databases. However, there may be efficiencies in using a consistent set of identifiers for all entries from commonly used vocabularies. The specific vocabularies used in the data are provided in the Vocabularies table. The idea of having both Concepts and Vocabularies tables was adapted from the OMOP data models. As mentioned in Methods, the Mappings table allows for the expression of consistent concepts across databases.
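
A minimal Python sketch of the concept id idea follows: each distinct pair of vocabulary and source code receives its own numeric identifier, so identical code strings from different vocabularies do not collide. The function name, the vocabulary labels, and the numbering scheme are assumptions for illustration, not the identifiers used in the GDM.

```python
def assign_concept_ids(source_codes, first_id=1):
    """Give every unique (vocabulary, code) pair a unique numeric id."""
    concept_ids = {}
    for vocabulary, code in source_codes:
        key = (vocabulary, code)
        if key not in concept_ids:
            concept_ids[key] = first_id + len(concept_ids)
    return concept_ids


# The code "1" appears in two hypothetical vocabularies and once as a repeat.
source_codes = [("sex", "1"), ("race", "1"), ("sex", "2"), ("sex", "1")]
concept_ids = assign_concept_ids(source_codes)
```

Because the ids are assigned per dataset, two databases loaded this way need not agree on the numbers, which matches the point that the lookup does not have to be consistent across databases.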

The Mappings table is designed to express relationships among data elements. It can also be used to facilitate translation into other data models (see Table  2 ). In a few very simple cases like sex and race/ethnicity, we recommend concept mappings to a core set of values to make it easier for users of a protocol implementation software to filter patients by age, gender, and race/ethnicity using a simpler representation of the underlying information. The Mappings table also permits an arbitrarily complex set of relationships, along the lines of the approach taken with the OMOP model and the use of standard concepts for all data elements. By using a Mappings table, we reduce the need to re-map and re-load the entire dataset when new mappings become available. Regardless of how the Mappings table is used, the GDM still retains the original codes from the raw dataset.

ETL results

We loaded SynPUF data and SEER Medicare data into the GDM. After downloading the data to a local server, the process of migrating the SynPUF data for 2.1 million patients to the GDM took approximately 8 h on a Windows server with 4 cores, 128 GB of RAM, and conventional hard drives (running two files at a time in parallel). Most of the time was spent loading files into RAM and writing files to disk since the process of ETL with the GDM is primarily about relocating data.

SEER Medicare data for SCLC included approximately 20,000 patients and took less than 1 h. Selected SEER data was included in the ETL process ignoring recoded versions of existing variables or variables used for consistency of interpretation over time. The ETL process focused on 31 key variables including histology, location, behavior, grade, stage, surgery, radiation, urban/rural status, and poverty indicators. Each SEER variable was included as a new vocabulary in the Concepts table (see Table 1 ).

CPRD data included approximately 140,000 patients and took approximately 2 h. For the Test file, which contains laboratory values and related measurements, we used Read codes in the Clinical Codes table; however, one could add the “entity types” (numeric values for laboratory values and other clinical measurements and assessments) to the Clinical Codes table as well, with both the Read code and the entity type associated with the same Context record and the same Measurement Details record. We used the entity types for all records in the CPRD Additional Clinical Details table. In all cases, the Mappings table allows for alternative relationships to be added to the data.

Information loss

After reconciling differences in interpretation and resolving coding errors, we identified the identical cohort of patients when using the source data compared to using the same data in the GDM.

ETL from the GDM to Sentinel

We conducted an exploratory transformation from the GDM to Sentinel to ensure that it was feasible. The process of moving the data was conducted as follows. The transformations from the GDM Patients, Deaths, and Information Periods tables to Sentinel’s Demographic, Death, and Enrollment tables required renaming variables and mapping a source data vocabulary to a Sentinel vocabulary (e.g., SynPUF sex coding to Sentinel sex coding). The Sentinel Diagnosis, Procedure, and Dispensing tables were populated by splitting the GDM Clinical Codes table by clinical_code_source_vocabulary (e.g., ICD-9 codes were moved to the Sentinel Diagnosis table).
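
Splitting the Clinical Codes table by source vocabulary can be sketched in Python as below. Only the ICD-9 to Diagnosis assignment comes from the text; the HCPCS to Procedure and NDC to Dispensing assignments, the row keys other than `clinical_code_source_vocabulary`, and the decision to skip unmapped vocabularies are assumptions made for the example.

```python
from collections import defaultdict

# Target Sentinel table per source vocabulary (partly assumed, see lead-in).
TARGET_TABLE = {"ICD-9": "Diagnosis", "HCPCS": "Procedure", "NDC": "Dispensing"}


def split_by_vocabulary(clinical_codes):
    """Route each Clinical Codes row to a target table by its vocabulary."""
    tables = defaultdict(list)
    for row in clinical_codes:
        target = TARGET_TABLE.get(row["clinical_code_source_vocabulary"])
        if target is not None:  # rows from unmapped vocabularies are skipped
            tables[target].append(row)
    return dict(tables)


clinical_codes = [
    {"patient_id": 1, "clinical_code": "4019",
     "clinical_code_source_vocabulary": "ICD-9"},
    {"patient_id": 1, "clinical_code": "99213",
     "clinical_code_source_vocabulary": "HCPCS"},
    {"patient_id": 1, "clinical_code": "00071015523",
     "clinical_code_source_vocabulary": "NDC"},
    {"patient_id": 1, "clinical_code": "histology-8041",
     "clinical_code_source_vocabulary": "SEER"},
]
sentinel_tables = split_by_vocabulary(clinical_codes)
```

A real conversion would need finer rules than one table per vocabulary; the sketch only shows that the split itself is a simple filter on one column.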

Populating the Sentinel Encounter table required records to be rolled up into a visit. To do this, the Contexts table was transformed into a “pre-Encounter” table with an encounter identifier set to the Contexts table identifier, with a similar process used for the Sentinel Procedure and Diagnosis tables. The “pre-Encounter” table was created with all of the specified columns and correctly mapped data, but had not yet grouped the records into visits. We applied logic based primarily on provenance information in the Contexts table to roll up records into visits, and we created a new identifier in the Encounter table. Finally, the Diagnosis and Procedure tables were updated with the new Encounter table identifier.

The remaining processing from the GDM to Sentinel involved vocabulary transformation since Sentinel has specific ways of representing concepts like sex which, in the GDM, are based on the source (e.g., male = 1 and female = 2) using a unique concept id in the Vocabulary table. We created records in the Mappings table from the SynPUF concepts to the Sentinel concepts (Table 2 ) to accomplish all needed mappings. Our ETL process then used those mappings to insert the correctly transformed variables from the GDM into the Sentinel tables during the ETL.
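
The role of the Mappings table in this step can be illustrated with a short Python sketch. The concept ids, the row keys, and the target codes `"M"` and `"F"` are assumptions for the example; only the source coding (male = 1, female = 2) is taken from the text.

```python
def translate(concept_id, mappings, concepts):
    """Follow a Mappings row from a source concept to the target code."""
    for mapping in mappings:
        if mapping["source_concept_id"] == concept_id:
            return concepts[mapping["target_concept_id"]]["code"]
    raise KeyError(f"no mapping for concept {concept_id}")


# Concepts for the source vocabulary and an assumed target vocabulary.
concepts = {
    101: {"vocabulary": "SynPUF sex", "code": "1"},
    102: {"vocabulary": "SynPUF sex", "code": "2"},
    201: {"vocabulary": "Sentinel sex", "code": "M"},
    202: {"vocabulary": "Sentinel sex", "code": "F"},
}
mappings = [
    {"source_concept_id": 101, "target_concept_id": 201},
    {"source_concept_id": 102, "target_concept_id": 202},
]

gdm_patients = [
    {"patient_id": 1, "sex_concept_id": 102},
    {"patient_id": 2, "sex_concept_id": 101},
]
# The GDM rows keep their source concepts; translation happens on the way out.
demographic = [
    {"patient_id": p["patient_id"],
     "sex": translate(p["sex_concept_id"], mappings, concepts)}
    for p in gdm_patients
]
```

Since the translation reads from the mapping rows at export time, adding or correcting a mapping changes the output without reloading the patient data.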

The GDM is designed to allow clinical researchers to identify the clinical, resource utilization, and cost constructs needed for a wide range of epidemiological and health services research areas without altering the data’s original semantics by creating visits or domains, or performing substantial vocabulary mapping. This provides flexibility for researchers to study not only clinical encounters like outpatient visits, hospitalizations, emergency room visits, and episodes of care, but also more basic constructs like conditions or medication use. Its main goal is to simplify the location of the most important information for creating analysis data sets, which has the benefit of making ETL easier. It does this by using a hierarchical structure instead of visits. It tracks the provenance of the original data elements to enhance the reproducibility of studies. It includes a table to store relationships among data elements for standardized analyses. And it allows for a subsequent ETL process to other data models to provide researchers access to the analytical tools and frameworks associated with those models.

Because other data models (e.g., OMOP, Sentinel, PCORnet, and i2b2) use visits to connect patient-related information within the data model, our emphasis on avoiding visits deserves comment. Visits are seldom required for clinical research, unless the enumeration of explicit visits is the research topic itself. For most research projects, protocols instead require retrieval of the dates of specific, clinically relevant codes, perhaps with provenance or temporal constraints. Satisfying these criteria does not require knowledge of a visit, per se. It is a research project in and of itself to define visits, and their definitions are specific to the health services research question being investigated [ 14 ]. For example, a study of “emergency department” visits would need to consider at least four options to define a visit [ 24 ]. Data models that pre-define visits do not allow such flexibility.
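
To make the point concrete, the following Python sketch retrieves the first date of a clinically relevant code directly from Clinical Codes style rows, with optional provenance and temporal constraints and no reference to visits. The row keys, the provenance labels, and the function name are hypothetical.

```python
from datetime import date


def first_code_date(clinical_codes, patient_id, codes,
                    provenance=None, on_or_after=None):
    """Return the earliest date on which any of the codes was recorded."""
    dates = [
        row["start_date"]
        for row in clinical_codes
        if row["patient_id"] == patient_id
        and row["clinical_code"] in codes
        and (provenance is None or row["provenance"] == provenance)
        and (on_or_after is None or row["start_date"] >= on_or_after)
    ]
    return min(dates, default=None)


clinical_codes = [
    {"patient_id": 1, "clinical_code": "25000",
     "provenance": "problem list diagnosis", "start_date": date(2008, 3, 1)},
    {"patient_id": 1, "clinical_code": "25000",
     "provenance": "admitting diagnosis", "start_date": date(2009, 7, 15)},
    {"patient_id": 1, "clinical_code": "4019",
     "provenance": "admitting diagnosis", "start_date": date(2008, 1, 10)},
]

any_source = first_code_date(clinical_codes, 1, {"25000"})
admitting_only = first_code_date(
    clinical_codes, 1, {"25000"}, provenance="admitting diagnosis"
)
```

The two calls return different dates for the same code, showing that the provenance constraint, not a visit, is what narrows the result.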

The challenges with visits can best be seen by inspecting the guidelines for creating visits from each data model. In the Sentinel version 6 data model [ 10 ], a visit is defined as a unique combination of patient, start date, provider and visit type. Visit types are defined as Ambulatory, Emergency Department, Inpatient Hospital, Non-acute Institutional, and Other. Furthermore, “Multiple visits to the same provider on the same day should be considered one visit and should include all diagnoses and procedures that were recorded during those visits. Visits to different providers on the same day, such as a physician appointment that leads to a hospitalization, should be considered multiple encounters.”
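
The Sentinel rule quoted above can be expressed as a grouping key. The Python sketch below assigns one encounter identifier per unique combination of patient, start date, provider, and visit type. It illustrates the quoted definition only and is not the provenance-based logic used in our GDM to Sentinel conversion; the row keys and visit type codes are hypothetical.

```python
from datetime import date


def roll_up(pre_encounters):
    """Assign one encounter id per (patient, start date, provider, visit type)."""
    encounter_ids = {}
    rolled_up = []
    for row in pre_encounters:
        key = (row["patient_id"], row["start_date"],
               row["provider_id"], row["visit_type"])
        # Reuse the id for a known combination, otherwise issue the next one.
        encounter_id = encounter_ids.setdefault(key, len(encounter_ids) + 1)
        rolled_up.append({**row, "encounter_id": encounter_id})
    return rolled_up


pre_encounters = [
    {"context_id": 10, "patient_id": 1, "start_date": date(2008, 5, 2),
     "provider_id": "P1", "visit_type": "Ambulatory"},
    {"context_id": 11, "patient_id": 1, "start_date": date(2008, 5, 2),
     "provider_id": "P1", "visit_type": "Ambulatory"},
    {"context_id": 12, "patient_id": 1, "start_date": date(2008, 5, 2),
     "provider_id": "H9", "visit_type": "Inpatient Hospital"},
]
encounters = roll_up(pre_encounters)
```

The first two records collapse into one encounter because they share all four key fields, while the hospitalization on the same day with a different provider becomes a separate encounter, as the quoted guideline prescribes.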

PCORnet version 4.1 is similar to Sentinel [ 12 ]. However, PCORnet allows more visit types compared to PCORnet version 3, OMOP, and Sentinel. It includes Emergency Department Admit to Inpatient Stay, Observation Stay, and Institutional Professional Consult.

In the OMOP version 5.31 data model, a visit is defined for each “visit to a healthcare facility.” According to the specifications [ 6 ], in any single day, there can be more than one visit. One visit may involve multiple providers, in which case the ETL must either specify how a single provider is selected or leave it null. One visit may involve multiple care sites, in which case the ETL must either specify how a single site is selected or leave it null. Visits must be given one of the following visit types: Inpatient Visit, Outpatient Visit, Emergency Room Visit, Long Term Care Visit and Combined ER and Inpatient Visit. OMOP added an optional Visit Detail table in version 5.3, recognizing the two-level hierarchy common in US claims data [ 6 ].

For i2b2, the specifications state a visit “... can involve a patient directly, such as a visit to a doctor’s office, or it can involve the patient indirectly, as in when several tests are run on a tube of the patient’s blood. More than one observation can be made during a visit. All visits must have a start date / time associated with them, but they may or may not have an end date. The visit record also contains specifics about the location of the session, such as the hospital or clinic the session occurred and whether the patient was an inpatient or an outpatient at the time of the visit.” There are no specified visit types, and the data model allows for an “unlimited number of optional columns but their data types and coding systems are specific to the local implementation” [ 4 ].

Clearly, each data model has different perspectives on the definition of a visit. Such ambiguity can lead to differences in how tables are created in the ETL process. As a result, inconsistencies within or across data models can lead to differences in results, as has already been demonstrated [ 25 , 26 ]. Laboratory records could be visits as with i2b2, or could be associated with visits as with other data models. Similarly, prescription, refill, and pharmacy dispensing records could be considered visits, or associated with visits. And other information, like family history, might not require a visit at all. In short, the most important structural component of other data models cannot be accurately and consistently defined, which affects the consistency of analyses across the data models, and makes translation among data models problematic. This also undermines provenance since each data model might answer the question of “where did this record come from” using different visit types. However, we note that these are semantic considerations and not technical limitations for record retrieval. For example, the i2b2 query platform recently has been extended to permit querying of OMOP and PCORnet data [ 28 ].

One important consideration in using data models is their stability. It can be labor-intensive to keep data updated, and if both the data and the data model are changing, maintenance may be prohibitively time-consuming [ 13 ]. One of our intentions is that the GDM should remain stable over time; therefore, we incorporated separate Vocabulary and Mappings tables, which can be updated without running the ETL from the beginning. Hence, the GDM may be a useful, harmonized approach for data providers, compared to their various proprietary solutions. This contrasts with the OMOP data model, which requires re-running the ETL when the vocabulary and domain mappings are updated.

The value of domains is that they allow data users to identify the necessary clinical information to extract for analysis and they facilitate interoperability. However, moving raw healthcare data into domains requires either mapping the entire vocabulary into a single domain, or mapping each individual code into a single domain. Placing codes in domain-specific tables can be particularly challenging when vocabularies cross domains (e.g., Read) or when individual codes are ambiguous (e.g., family history information). The GDM does not require domains or vocabulary mappings to be fully functional. The GDM only requires that users assign a unique number (concept id) to all unique source codes in a given dataset to ensure consistency in the data type for the codes. The vocabulary table is simply a look-up table for the codes and concept ids. Because of this, all codes in all vocabularies (e.g., ICD-9, HCPCS [ 29 ], etc.) in the source data will be retained unless there is an explicit decision to exclude a code. However, if needed, the GDM could support domains as an additional field in the Vocabulary table.

It is important to clarify the role of analyses in the ecosystem of data models. Neither the GDM nor any other data model is designed to support direct analyses of any sophistication on the entire database (excluding summary analyses to characterize the entire dataset). The role of the data model is to ease the extraction and organization of analysis data sets to address specific clinical research questions. The required analysis dataset structure depends on the specific analyses (e.g., prevalence, incidence, time to event, repeated measures, etc.), and the analysis itself is typically performed using R (OHDSI) or SAS (Sentinel). By starting with the GDM, researchers can develop tools to extract data directly, or implement the necessary transformations to migrate their data to other data models and make use of the tools for extraction and analysis offered by those models. While this requires another ETL process, or a database view to be created on the GDM, it facilitates access to existing analytical tools. Hence, the GDM can be used as a standardized waypoint in a data pipeline because the necessary information for other data models can be contained within the GDM, as we found in our test of a GDM to Sentinel conversion.

We should also note that our approach to incorporating relationships into the data (i.e., our Mappings table) is not unique. Others have designed approaches that rely on semantic mappings to organize and extract data [ 30 ]. There are even methods to eliminate the need for both database reorganization and semantic mapping [ 31 ]. While these approaches may be more flexible and avoid cumbersome ETL and/or mapping processes, it is unclear how they fare with respect to the sensitivity and specificity of their exposure and outcome definitions making it challenging to understand or assess bias in their results [ 32 , 33 ].

Information loss and data quality assessment are challenging subjects. We designed the GDM to minimize information loss in the sense that any codes in the source data can be incorporated by creating entries in the Concepts, Vocabularies, and Clinical Codes tables. We also retained database specific provenance information by indicating the source file from which each data element is derived as well as the type of information that was derived. While we tested information loss in the context of a cohort study and found no problems, this is not a guarantee that all necessary information is, or can be, retained. A more robust assessment of data quality will be the subject of future research. However, our use of the SEER data is illustrative because detailed oncology data does not fit naturally into any of the other data models mentioned. Cancer registry data relies heavily on very specific vocabularies for location, histology, grade, staging, behavior, reporting source, microscopic confirmation and many other factors. Many of these don’t fit easily into the existing domain-based tables. The OMOP data model has a further complication in that the International Classification of Diseases for Oncology version 3 (ICD-O-3) which covers location, histology, grade, and behavior is not a standard vocabulary. Therefore, while the OMOP data model stores the concatenated source codes, work remains to be done to map all combinations to the proper standard vocabulary based on SNOMED. (This work is ongoing at the time of this writing).

There are other limitations to the GDM. While we have tested it against data that is typically used by health services researchers and epidemiologists, there are likely to be specific data sets that will require modifications or improvements. The GDM does not yet include tables for patient reported outcomes, genomic data, or free text notes, which are becoming more widely available for researchers. If other data models add support for these or other fields, this might require changes to the GDM to retain compatibility. For example, more detailed location information may need to be added for those with access to additional data (which is often limited due to privacy issues). While we have considered data from Japan and the United Kingdom, there are many data sources to which we did not have access that might require changes in the data model. Finally, while we have developed tools to extract analysis data sets from the GDM based on a protocol, they are not yet available publicly. (However, the ConceptQL language on which the tools are based is open-source [ 34 ].)

The GDM is designed to retain the relationships among data elements to the extent possible, facilitating ETL and protocol implementation as part of a complete data pipeline for clinical researchers using commonly available observational data. Furthermore, by avoiding the requirements to create visits and to use domains, it offers researchers a simpler process of standardizing the location of data in a defined structure and may make it easier for users to transform their data into other data models.

Availability of data and materials

The data model is publicly available. The raw data is not available due to privacy reasons, except for the Medicare Synthetic Public Use data. See Ethics approval and consent to participate for details on SEER Medicare and CPRD data acquisition, and References for a specific hyperlink to the Synthetic Public Use data.

Abbreviations

ANSI: American National Standards Institute

CMS: Centers for Medicare and Medicaid Services

CPRD: Clinical Practice Research Datalink

EHR: Electronic Health Records

ETL: Extract, Transform, and Load

GDM: Generalized Data Model

i2b2: Informatics for Integrating Biology and the Bedside

ICD: International Classification of Diseases

NAACCR: North American Association of Central Cancer Registries

OHDSI: Observational Health Data Sciences and Informatics

OMOP: Observational Medical Outcomes Partnership

PCORnet: Patient Centered Outcomes Research Network

SEER: Surveillance, Epidemiology, and End Results

SynPUF: Synthetic Public Use Files

Kahn MG, Batson D, Schilling LM. Data model considerations for clinical effectiveness researchers. Med Care. 2012;50:S60–7.

Klann JG, Abend A, Raghavan VA, Mandl KD, Murphy SN. Data interchange using i2b2. J Am Med Informatics Assoc. 2016;23:909–15.

Murphy SN, Weber G, Mendis M, Gainer V, Chueh HC, Churchill S, et al. Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2). J Am Med Informatics Assoc. 2010;17:124–30.

i2b2 Common Data Model. https://i2b2.org/software/files/PDF/current/CRC_Design.pdf . Accessed 20 Apr 2017.

Overhage JM, Ryan PB, Reich CG, Hartzema AG, Stang PE. Validation of a common data model for active safety surveillance research. J Am Med Inform Assoc. 2012;19:54–60.

OHDSI. OMOP Common Data Model. http://www.ohdsi.org/web/wiki/doku.php?id=documentation:overview . Accessed 20 Apr 2017.

Voss EA, Makadia R, Matcho A, Ma Q, Knoll C, Schuemie M, et al. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases. J Am Med Informatics Assoc. 2015;22:553–64.

Psaty BM, Breckenridge AM. Mini-sentinel and regulatory science--big data rendered fit and functional. N Engl J Med. 2014;370:2165.

Curtis LH, Weiner MG, Boudreau DM, Cooper WO, Daniel GW, Nair VP, et al. Design considerations, architecture, and use of the mini-sentinel distributed data system. Pharmacoepidemiol Drug Saf. 2012;21(SUPPL. 1):23–31.

Sentinel Common Data Model. https://www.sentinelinitiative.org/sentinel/data/distributed-database-common-data-model . Accessed 20 Apr 2017.

Fleurence RL, Curtis LH, Califf RM, Platt R, Selby JV, Brown JS. Launching PCORnet, a national patient-centered clinical research network. J Am Med Inform Assoc. 2014;21:578–82.

PCORnet Common Data Model v 4.1. https://pcornet.org/data-driven-common-model/ . Accessed 28 Sept 2018.

Bourke A, Bate A, Sauer BC, Brown JS, Hall GC. Evidence generation from healthcare databases: recommendations for managing change. Pharmacoepidemiol Drug Saf. 2016;25:749–54.

Tyree PT, Lind BK, Lafferty WE. Challenges of using medical insurance claims data for utilization analysis. Am J Med Qual. 2006;21:269–75.

Centers for Medicare and Medicaid Services. Medicare fee-for-service companion guides. https://www.cms.gov/Medicare/Billing/ElectronicBillingEDITrans/CompanionGuides.html . Accessed 24 Oct 2017.

Herrett E, Gallagher AM, Bhaskaran K, Forbes H, Mathur R, van Staa T, et al. Data resource profile: clinical practice research datalink (CPRD). Int J Epidemiol. 2015;44:827–36.

Park HS, Lloyd S, Decker RH, Wilson LD, Yu JB. Overview of the surveillance, epidemiology, and end results database: evolution, data variables, and quality assurance. Curr Probl Cancer. 36:183–90.

Danese MD, Voss EA, Duryea J, Gleeson M, Duryea R, Matcho A, et al. Feasibility of converting the Medicare synthetic public use data into a standardized data model for clinical research informatics. In: AMIA 2015 annual symposium. San Francisco; 2015.

Centers for Medicare and Medicaid Services. Synthetic public use file. https://www.cms.gov/research-statistics-data-and-systems/downloadable-public-use-files/synpufs/ . Accessed 20 Apr 2017.

Warren JL, Klabunde CN, Schrag D, Bach PB, Riley GF. Overview of the SEER-Medicare data: content, research applications, and generalizability to the United States elderly population. Med Care. 2002;40(8 Suppl):IV–3-18.

Comprehensive R Archive Network. R.

Ong TC, Kahn MG, Kwan BM, Yamashita T, Brandt E, Hosokawa P, et al. Dynamic-ETL: a hybrid approach for health data extraction, transformation and loading. BMC Med Inform Decis Mak. 2017;17:134.

Outcomes Insights Inc. Generalized Data Model. https://github.com/outcomesinsights/generalized_data_model . Accessed 20 Apr 2017.

Venkatesh AK, Mei H, Kocher KE, Granovsky M, Obermeyer Z, Spatz ES, et al. Identification of emergency department visits in Medicare administrative claims: approaches and implications. Acad Emerg Med. 2017;24:422–31.

Xu Y, Zhou X, Suehs BT, Hartzema AG, Kahn MG, Moride Y, et al. A comparative assessment of observational medical outcomes partnership and mini-sentinel common data models and analytics: implications for active drug safety surveillance. Drug Saf. 2015;38:749–65.

Zhou X, Murugesan S, Bhullar H, Liu Q, Cai B, Wentworth C, et al. An evaluation of the THIN database in the OMOP common data model for active drug safety surveillance. Drug Saf. 2013;36:119–34.

Centers for Medicare and Medicaid Services. Place of service code set. https://www.cms.gov/Medicare/Coding/place-of-service-codes/Place_of_Service_Code_Set.html . Accessed 20 Sep 2018.

Klann JG, Phillips LC, Herrick C, Joss MAH, Wagholikar KB, Murphy SN. Web services for data warehouses: OMOP and PCORnet on i2b2. J Am Med Inform Assoc. 2018;25(10):1331–8.

Centers for Medicare and Medicaid Services. HCPCS.

Bradshaw RL, Matney S, Livne OE, Bray BE, Mitchell JA, Narus SP. Architecture of a federated query engine for heterogeneous resources. AMIA Annu Symp proceedings AMIA Symp. 2009;2009:70–4.

Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Liu PJ, et al. Scalable and accurate deep learning for electronic health records. npj Digit Med. 2018; January:1–10.

Lash TL, Fox MP, Cooney D, Lu Y, Forshee RA. Quantitative Bias analysis in regulatory settings. Am J Public Health. 2016;106:1227–30.

Duan R, Cao M, Wu Y, Huang J, Denny JC, Xu H, et al. An empirical study for impacts of measurement errors on EHR based association studies. AMIA Annu Symp proceedings AMIA Symp. 2016;2016:1764–73.

Outcomes Insights Inc. ConceptQL. https://github.com/outcomesinsights/conceptql . Accessed 30 Sep 2018.

Bodenreider O. The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004;32(suppl 1):D267–70.

Acknowledgements

We gratefully acknowledge the influence of the open-source OMOP model specifications on our thinking in creating our data model. In addition, we acknowledge the influence of Sentinel, PCORnet, and i2b2 on our approach, although most of our data model was designed prior to reviewing these models in detail. We also thank Chris Adamson for helpful discussions about organizing the data model in different ways. At the time of writing, all references to the concepts table refer to the OMOP version 5.20 vocabulary table maintained by OHDSI. However, there is no reason that a user could not create their own system of codes with unique identifiers across vocabularies, or use the codes from the National Library of Medicine Metathesaurus [ 35 ].

This research was self-funded.

Author information

Authors and affiliations.

Outcomes Insights, Inc., 2801 Townsgate Road, Suite 330, Westlake Village, CA, 91361, USA

Mark D. Danese, Marc Halperin, Jennifer Duryea & Ryan Duryea

Contributions

MD, MH, RD, and JD contributed to the design of the data model. MH wrote the software code for data transformations. All authors have read and approved the manuscript.

Authors’ information

MD is an epidemiologist who has worked with a wide variety of clinical data sources across therapeutic areas. JD and RD have extensive experience designing software for providers to submit medical bills to insurers. JD has constructed and/or substantially revised OMOP ETL specifications for many commercially available data sources. MD, JD, and RD are collaborators in the Observational Health Data Sciences and Informatics organization.

Corresponding author

Correspondence to Mark D. Danese .

Ethics declarations

Ethics approval and consent to participate.

A study protocol, an institutional review board exemption determination (Quorum IRB exemption determination #31309), and a data use agreement were required to access SEER-Medicare data. CPRD receives ethics approval to supply patient data for all protocols. Because of this, and the fact that all data are de-identified, no IRB approval or exemption is required. Our study protocol to access the CPRD data was reviewed by the CPRD Independent Scientific Advisory Committee. Provision of the CPRD data required a data use agreement. All transformations of the raw data were completed as part of the process of creating analysis data sets for approved study protocols. No clinical data was analyzed or reported for this study.

Consent for publication

Not applicable.

Competing interests

Outcomes Insights provides consulting services and licenses software for implementing research protocols using observational data.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional file

Additional file 1: The Generalized Data Model Table Specifications. (DOCX 75 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article.

Danese, M.D., Halperin, M., Duryea, J. et al. The Generalized Data Model for clinical research. BMC Med Inform Decis Mak 19 , 117 (2019). https://doi.org/10.1186/s12911-019-0837-5

Received : 12 November 2017

Accepted : 10 June 2019

Published : 24 June 2019

DOI : https://doi.org/10.1186/s12911-019-0837-5

  • Claims data
  • Electronic health records

BMC Medical Informatics and Decision Making

ISSN: 1472-6947

Data Analysis in Clinical Research (CLRS90010)

Graduate coursework Points: 12.5 On Campus (Parkville)

Contact information

Email: [email protected]

Phone: + 61 3 8344 0149

Contact hours : https://unimelb.edu.au/professional-development/contact-us

Please refer to the LMS for up-to-date subject information, including assessment and participation requirements, for subjects being offered in 2020.

Data analysis methods are an integral part of modern clinical research. They are powerful techniques that enable researchers to draw meaningful conclusions from data collected through observation, survey, or experimentation.

However, data analysis is a large discipline with different paradigms, schools of thought, and alternative methodologies. The choice of appropriate analysis methods must therefore be considered when designing a study and selecting variables and groups.

This subject introduces students to the basic principles of qualitative and quantitative data analysis techniques. It provides a functional grounding in the theoretical concepts behind each type of analysis, explores the interpretation of data, and examines the difference, where applicable, between clinical and statistical significance.

Intended learning outcomes

On completion of this subject students should be able to:

  • describe the theoretical concepts behind a range of qualitative and quantitative data analysis techniques
  • compare and contrast the strengths and weaknesses of different qualitative and quantitative data analysis techniques
  • describe a strategy for selecting an appropriate data analysis technique based on the study design selected and/or research data collected
  • competently perform a range of basic data analysis techniques using appropriate analysis software and interpret analysis output/s
  • provide a rationale for the importance of statistical power and perform power calculations
  • identify and discuss the key elements associated with ensuring data integrity including storage, management, collation and coding
  • critically compare and contrast statistical vs clinical significance and its relevance to clinical practice
  • demonstrate confidence in discussing the validity of data analysis outcomes reported in the scientific literature.
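As an illustration of the power calculations named in the learning outcomes above, the following is a minimal sketch for comparing two group means with equal group sizes. It uses the normal approximation rather than the t distribution, and the effect size and standard deviation in the usage note are made-up values, not figures from the subject materials.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sided, two-sample comparison.

    delta: smallest difference in means worth detecting
    sigma: assumed common standard deviation of the outcome
    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta) ** 2.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return ceil(n)


def achieved_power(n_per_group, delta, sigma, alpha=0.05):
    """Approximate power for a given group size (ignores the far rejection tail)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    standard_error = sigma * sqrt(2 / n_per_group)
    return std_normal.cdf(abs(delta) / standard_error - z_alpha)
```

For example, `sample_size_per_group(delta=5, sigma=10)` returns 63 under these assumptions. Halving the detectable difference roughly quadruples the required sample size, which is why the minimum clinically important difference should be fixed before recruitment begins.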

Generic skills

  • engage with unfamiliar problems and identify relevant data analysis strategies
  • construct and express logical arguments and work in abstract or general terms to increase the clarity and efficiency of data analysis
  • communicate advanced data analysis concepts in written and oral form
  • comprehend complex data analysis information
  • exercise responsibility for their own learning
  • manage their time effectively.

Last updated: 3 November 2022

Approaches to data analyses of clinical trials

Affiliation.

  • 1 Division of Public Health Sciences, Wake Forest University School of Medicine, Winston-Salem, NC, USA. [email protected]
  • PMID: 22225999
  • DOI: 10.1016/j.pcad.2011.07.002

There are two types of data analyses of randomized clinical trials (RCTs). The primary analyses are pre-specified in the protocol and the findings form the basis for recommendations and clinical decisions. They typically adhere to the intention-to-treat principle. Secondary analyses are supplemental and of various sorts. Although some may be pre-specified, many are not. We encourage the use of the rich sources of data from large RCTs for these secondary purposes. Depending on the kinds of secondary analyses, whether they are pre-specified, and whether intention-to-treat analysis is used, the results range from being quite conclusive to being hypothesis generating. In this article we answer four questions related to secondary analysis with emphasis on sharing of data primarily from NIH-sponsored trials: Who has access to this information? What questions can be asked? What are the requirements? What are the common challenges?
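The intention-to-treat principle mentioned above can be illustrated with a short sketch that contrasts it with an as-treated comparison. The participant records and arm names are hypothetical, and the risk difference is used only as a simple example of an effect measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    assigned: str  # arm assigned at randomization
    received: str  # arm actually received
    event: bool    # whether the outcome occurred


def event_rate(participants, arm, by):
    """Proportion with the event among participants whose `by` field equals `arm`."""
    group = [p for p in participants if getattr(p, by) == arm]
    if not group:
        raise ValueError(f"no participants with {by} == {arm!r}")
    return sum(p.event for p in group) / len(group)


def risk_difference(participants, by, treatment="treatment", control="control"):
    """Risk difference (treatment minus control).

    by="assigned" groups by randomized arm (intention-to-treat);
    by="received" groups by treatment actually taken (as-treated).
    """
    return event_rate(participants, treatment, by) - event_rate(participants, control, by)


# Synthetic records for illustration only: one participant assigned to
# treatment crossed over and received control.
TRIAL = [
    Participant("treatment", "treatment", False),
    Participant("treatment", "treatment", False),
    Participant("treatment", "control", True),
    Participant("treatment", "treatment", True),
    Participant("control", "control", True),
    Participant("control", "control", True),
    Participant("control", "control", False),
    Participant("control", "control", True),
]
```

Calling `risk_difference(TRIAL, by="assigned")` keeps every participant in the arm to which they were randomized, which preserves the comparability created by randomization. Grouping with `by="received"` moves the crossover participant and can give a different estimate.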

Copyright © 2012. Published by Elsevier Inc.

AI's role in Clinical Research and Drug Discovery

AI is revolutionizing healthcare by enhancing diagnostics, personalized treatments, and clinical trials through data analysis, predictive modeling, and patient recruitment.

Vera Ovanin

Artificial Intelligence (AI) is transforming clinical research by enhancing patient recruitment and streamlining drug development. In this article, we’ll aim to discover its transformative impact on clinical trials through advanced data analysis and predictive modeling.

AI’s impact on healthcare includes enhanced diagnostics, personalized treatments and operational efficiencies. What is less well known is the growing significance of machine learning in clinical trials, where it drives advancements through data analysis, predictive modeling and patient recruitment optimization.

AI accelerates drug discovery, enhances trial accuracy, and reduces costs while expediting treatments. For instance, AI algorithms efficiently analyze extensive data to identify potential drug candidates, predict treatment outcomes accurately, and optimize clinical trial designs for faster, more successful trials. Computer vision models like Ultralytics YOLOv8 have also been applied in healthcare, where they support object detection, instance segmentation, pose estimation and classification across a variety of datasets, provided that high-quality annotated data are available.

Additionally, AI-driven platforms like DeepMind's AlphaFold have demonstrated the capability to predict the 3-D structure of proteins, revolutionizing drug design and discovery processes.

Furthermore, Jimeng Sun's lab at the University of Illinois Urbana-Champaign introduced HINT (hierarchical interaction network) to forecast trial success based on drug molecules, target diseases and patient eligibility. Their SPOT system (sequential predictive modeling of clinical trial outcome) prioritizes recent data, influencing pharmaceutical trial designs and potential drug alternatives.

And yet, only a handful of established companies are deploying AI in their clinical development.

The Use of AI in Clinical Trials

AI is being applied across various domains in clinical research to improve efficiency, accuracy, and outcomes. Here’s a closer look at the key areas where AI in clinical trials is making a significant impact:

·   Data analysis and pattern recognition. AI can analyze extensive data from clinical trials, electronic health records, and other sources, uncovering patterns and correlations beyond human capacity. This enhances the pinpointing of treatment effects and patient responses with greater precision.

·   Patient recruitment and retention. AI algorithms can streamline participant selection for clinical trials, analyzing vast datasets to swiftly and accurately identify eligible patients. This accelerates recruitment and enhances retention rates by aligning participants more closely with trial criteria.

·   Predictive analytics for treatment outcomes. By analyzing historical and current patient data, predictive algorithms forecast treatment outcomes accurately. This aids in designing efficient trials and customizing treatments, potentially improving results and minimizing side effects for individual patients.

·  Automated data collection and management. AI can automate collection, organization, and analysis of data, minimizing human error and providing real-time insights. This streamlines processes, expediting research and advancing new treatments.

AI in Clinical Research: Navigating the Challenges

As AI continues to drive change in clinical research, it's essential to acknowledge the potential pitfalls alongside the promises. While AI offers enticing advantages such as improved efficiency, enhanced accuracy, streamlined patient recruitment and cost reduction, its implementation isn't without challenges. Here are some key considerations:

· Potential Biases in AI Algorithms . AI systems are trained on historical data, which may contain inherent biases such as selection, sampling, or measurement biases. For example, models may perform poorly on female patients due to predominantly male training data (selection bias), not generalize well to rural patients when trained on urban data (sampling bias), or perpetuate inaccuracies due to systematic errors in data collection (measurement bias). If left unchecked, these biases could lead to skewed outcomes, impacting patient care and research findings.

· Data Privacy and Security Concerns . With the massive amounts of sensitive patient data involved in clinical research, ensuring data privacy and security is paramount. AI systems are vulnerable to cyberattacks and breaches, raising concerns about the confidentiality and integrity of patient information.

· Regulatory and Ethical Challenges . The rapid advancement of AI technologies often outpaces regulatory frameworks and ethical guidelines. Questions arise regarding the appropriate use of AI in clinical research, including issues of informed consent, transparency, and accountability.

· Dependence on High-Quality Data . While AI thrives on data, its effectiveness is contingent on the quality, diversity, and extent of datasets. Inadequate, biased, or insufficient data can compromise the reliability and validity of AI-driven insights, hindering the progress of clinical research.

By addressing these concerns, stakeholders can pave the way for responsible integration of artificial intelligence in clinical research in the pursuit of advancing healthcare outcomes.
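One simple way to look for the selection bias described above is to compare a model's sensitivity across patient subgroups. The sketch below assumes binary labels and predictions are already available; the record layout, function names, and subgroup labels are hypothetical and not tied to any particular AI platform.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Compute sensitivity (true positive rate) per subgroup.

    records: iterable of (group, y_true, y_pred) tuples with 0/1 labels.
    Only groups with at least one true positive case are returned.
    """
    true_positives = defaultdict(int)
    false_negatives = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            if y_pred == 1:
                true_positives[group] += 1
            else:
                false_negatives[group] += 1
    groups = set(true_positives) | set(false_negatives)
    return {
        g: true_positives[g] / (true_positives[g] + false_negatives[g])
        for g in groups
    }


def largest_gap(metric_by_group):
    """Difference between the best and worst performing subgroup."""
    values = list(metric_by_group.values())
    return max(values) - min(values)
```

A large gap between subgroups does not prove that a model is biased, but it flags where the training data or the outcome definition deserves a closer look before the model is used for recruitment or outcome prediction.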

FDA Regulations: AI's Role in Clinical Research

The U.S. Food and Drug Administration (FDA) has observed a notable rise in drug and biologic application filings incorporating AI/machine learning elements in recent years, with over 100 submissions recorded in 2021. These filings span various stages of drug development, encompassing drug exploration, clinical investigation, post-market safety monitoring, and cutting-edge pharmaceutical production.

In support of innovation in drug development, the FDA has approved several AI tools and technologies for use in clinical research. They range from predictive analytics targeting patient recruitment to image analysis for diagnostic purposes.

In issuing guidance for clinical trials, the FDA recognizes that AI and machine learning present both opportunities and challenges in drug development. To address both, the agency is enhancing regulatory agility to foster innovation while prioritizing public health protection.

AI and machine learning will undoubtedly play a critical role in drug development, and FDA plans to develop and adopt a flexible risk-based regulatory framework that promotes innovation and protects patient safety.

AI Innovations in Healthcare: Key Players

Companies worldwide are increasingly leveraging AI to accelerate drug discovery and personalized treatment planning.  Here are some industry majors harnessing AI's potential:

· Pfizer: Leveraging AI for drug discovery and development, Pfizer is accelerating the identification and development of novel therapeutics, streamlining the process from research to market.

· Medidata Solutions: This cloud-based software solutions company utilizes AI to optimize clinical trials by streamlining data analysis, enhancing patient engagement and predicting outcomes in real-time. The end result is accelerated research and improved trial success rates.

· BenevolentAI: Utilizing AI for hypothesis generation and validation, BenevolentAI transforms vast datasets into actionable insights, driving innovation and discovery in biomedical research.

· Tempus: Through collaboration with GlaxoSmithKline, Tempus personalizes treatments, optimizes efficacy and minimizes side effects with its AI-enabled platforms. Together, they aim to accelerate R&D success and deliver faster, tailored therapies to patients.

· Exscientia: Pioneering AI for drug design and optimization, Exscientia accelerates drug development timelines and enhances the precision of therapeutic interventions, leading to more effective treatments.

Focal Points and Future Horizons for AI in Clinical Research

Cardiology, oncology, neurology and rare diseases have emerged as focal fields for AI implementation in clinical research due to several factors. Firstly, these areas often involve complex data sets, making them ripe for AI-driven analysis and prediction.

Secondly, the high-stakes nature of conditions in these fields, such as heart disease, cancer, neurological disorders and rare diseases, requires precise and personalized approaches to diagnosis and treatment, which AI excels at providing.

Additionally, advancements in AI technologies have enabled researchers to develop innovative solutions tailored to the unique challenges presented by each of these medical specialties. As a result, AI has become increasingly integrated into clinical research within these areas, paving the way for improved patient outcomes and more efficient healthcare delivery.

However, the horizon of AI’s applications extends far beyond these domains. As technology advances and data availability increases, there's immense potential for AI to revolutionize other medical fields. 

From dermatology to radiology and psychiatry, AI holds promise in enhancing diagnostics, treatment planning, and patient care across diverse specialties. As researchers continue to explore AI's capabilities, its role in clinical research is poised to expand into previously uncharted territories, ushering in a new era of precision medicine and improved healthcare outcomes. 

AI and Clinical Research: Key Takeaways

 AI's transformative impact on healthcare spans diagnostics, personalized treatments, and operational efficiencies. In clinical trials, machine learning plays a pivotal role by driving advancements in data analysis, predictive modeling, and optimizing patient recruitment. This accelerates drug discovery, enhances trial accuracy and effectively reduces costs.

For example, AI algorithms efficiently analyze extensive data to identify drug candidates and predict treatment outcomes. Additionally, AI platforms like DeepMind's AlphaFold predict molecular structures, revolutionizing drug design. 

Yet, AI's potential transcends these areas, promising advancements in diverse specialties. Despite challenges like bias and data privacy concerns, AI's integration in clinical research offers transformative possibilities, ushering in a new era of precision medicine and improved healthcare outcomes.


  • Open access
  • Published: 07 June 2024

Effects of intensive lifestyle changes on the progression of mild cognitive impairment or early dementia due to Alzheimer’s disease: a randomized, controlled clinical trial

  • Dean Ornish 1 , 2 ,
  • Catherine Madison 1 , 3 ,
  • Miia Kivipelto 4 , 5 , 6 , 7 ,
  • Colleen Kemp 8 ,
  • Charles E. McCulloch 9 ,
  • Douglas Galasko 10 ,
  • Jon Artz 11 , 12 ,
  • Dorene Rentz 13 , 14 , 15 ,
  • Jue Lin 16 ,
  • Kim Norman 17 ,
  • Anne Ornish 1 ,
  • Sarah Tranter 8 ,
  • Nancy DeLamarter 1 ,
  • Noel Wingers 1 ,
  • Carra Richling 1 ,
  • Rima Kaddurah-Daouk 18 ,
  • Rob Knight 19 ,
  • Daniel McDonald 20 ,
  • Lucas Patel 21 ,
  • Eric Verdin 22 , 23 ,
  • Rudolph E. Tanzi 13 , 24 , 25 , 26 &
  • Steven E. Arnold 13 , 27  

Alzheimer's Research & Therapy volume 16, Article number: 122 (2024)

28k Accesses

812 Altmetric

Evidence links lifestyle factors with Alzheimer’s disease (AD). We report the first randomized, controlled clinical trial to determine if intensive lifestyle changes may beneficially affect the progression of mild cognitive impairment (MCI) or early dementia due to AD.

A 1:1 multicenter randomized controlled phase 2 trial, ages 45-90 with MCI or early dementia due to AD and a Montreal Cognitive Assessment (MoCA) score of 18 or higher. The primary outcome measures were changes in cognition and function tests: Clinical Global Impression of Change (CGIC), Alzheimer’s Disease Assessment Scale (ADAS-Cog), Clinical Dementia Rating–Sum of Boxes (CDR-SB), and Clinical Dementia Rating Global (CDR-G) after 20 weeks of an intensive multidomain lifestyle intervention compared to a wait-list usual care control group. ADAS-Cog, CDR-SB, and CDR-Global scales were compared using a Mann-Whitney-Wilcoxon rank-sum test, and CGIC was compared using Fisher’s exact test. Secondary outcomes included plasma Aβ42/40 ratio, other biomarkers, and correlating lifestyle with the degree of change in these measures.

Fifty-one AD patients enrolled, mean age 73.5. No significant differences in any measures at baseline. Only two patients withdrew. All patients had plasma Aβ42/40 ratios <0.0672 at baseline, strongly supporting AD diagnosis. After 20 weeks, significant between-group differences in the CGIC ( p = 0.001), CDR-SB ( p = 0.032), and CDR Global ( p = 0.037) tests and borderline significance in the ADAS-Cog test ( p = 0.053). CGIC, CDR Global, and ADAS-Cog showed improvement in cognition and function and CDR-SB showed significantly less progression, compared to the control group which worsened in all four measures. Aβ42/40 ratio increased in the intervention group and decreased in the control group ( p = 0.003). There was a significant correlation between lifestyle and both cognitive function and the plasma Aβ42/40 ratio. The microbiome improved only in the intervention group ( p <0.0001).

Conclusions

Comprehensive lifestyle changes may significantly improve cognition and function after 20 weeks in many patients with MCI or early dementia due to AD.

Trial registration

Approved by Western Institutional Review Board on 12/31/2017 (#20172897) and by Institutional Review Boards of all sites. This study was registered retrospectively with clinicaltrials.gov on October 8, 2020 (NCT04606420, ID: 20172897).

Increasing evidence links lifestyle factors with the onset and progression of dementia, including AD. These include unhealthful diets, being sedentary, emotional stress, and social isolation.

For example, a Lancet commission on dementia prevention, intervention, and care listed 12 potentially modifiable risk factors that together account for an estimated 40% of the global burden of dementia [ 1 ]. Many of these factors (e.g., hypertension, smoking, depression, type 2 diabetes, obesity, physical inactivity, and social isolation) are also risk factors for coronary heart disease and other chronic illnesses because they share many of the same underlying biological mechanisms. These include chronic inflammation, oxidative stress, insulin resistance, telomere shortening, sympathetic nervous system hyperactivity, and others [ 2 ]. A recent study reported that the association of lifestyle with cognition is mostly independent of brain pathology, though a part, estimated to be only 12%, was through β-amyloid [ 3 ].

In one large prospective study of adults 65 or older in Chicago, the risk of developing AD was 38% lower in those eating high vs low amounts of vegetables and 60% lower in those consuming omega-3 fatty acids at least once/week, [ 4 ] whereas consuming saturated fat and trans fats more than doubled the risk of developing AD [ 5 ]. A systematic review and meta-analysis of 243 observational prospective studies and 153 randomized controlled trials found a similar relationship between these and similar risk factors and the onset of AD [ 6 ].

The multifactorial etiology and heterogeneity of AD suggest that multidomain lifestyle interventions may be more effective than single-domain ones for reducing the risk of dementia, and that more intensive multimodal lifestyle interventions may be more efficacious than moderate ones at preventing dementia [ 7 ].

For example, in the Finnish Geriatric Intervention Study (FINGER), an RCT of men and women aged 60-77 with Cardiovascular Risk Factors, Aging, and Incidence of Dementia (CAIDE) dementia risk scores of at least 6 points and cognition at mean or slightly lower, a multimodal intervention of diet, exercise, cognitive training, and vascular risk monitoring maintained cognitive function after 2 years in older adults at increased risk of dementia [ 8 ]. After 24 months, global cognition in the FINGER intervention group was 25% higher than in the control group, which declined. Moreover, the FINGER intervention was equally beneficial regardless of several demographic and socioeconomic risk factors [ 9 ] and apolipoprotein E (APOE) ε4 status [ 10 ].

The FINGER lifestyle intervention also resulted in a 13-20% reduction in rates of cardiovascular disease events (stroke, transient ischemic attack, or coronary), providing more evidence that “what’s good for the heart is good for the brain” (and vice versa) [ 11 ]. Other large-scale multidomain intervention studies to determine if this intervention can help prevent dementia are being conducted or planned in over 60 countries worldwide, as part of the World-Wide FINGERS network, including the POINTER study in the U.S. [ 12 , 13 ].

More recently, a similar dementia prevention-oriented RCT showed that a 2-year personalized multidomain intervention led to modest improvements in cognition and dementia risk factors in those at risk for (but not diagnosed with) dementia and AD [ 14 ].

All these studies showed that lifestyle changes may help prevent dementia. The study we are reporting here is the first randomized, controlled clinical trial to test whether intensive lifestyle changes may beneficially affect those already diagnosed with mild cognitive impairment (MCI) or early dementia due to AD.

In two earlier RCTs, we found that the same multimodal lifestyle intervention described in this article resulted in regression of coronary atherosclerosis as measured by quantitative coronary arteriography [ 15 ] and ventricular function, [ 16 ] improvements in myocardial perfusion as measured by cardiac PET scans, and 2.5 times fewer cardiac events after five years, all of which were statistically significant [ 17 ]. Until then, it was believed that coronary heart disease progression could only be slowed, not stopped or reversed, similar to how MCI or early dementia due to AD are viewed today.

Since AD and coronary heart disease share many of the same risk factors and biological mechanisms, and since moderate multimodal lifestyle changes may help prevent AD, [ 18 ] we hypothesized that a more intensive multimodal intervention proven to often reverse the progression of coronary heart disease and some other chronic diseases may also beneficially affect the progression of MCI or early dementia due to AD.

We report here results of a randomized controlled trial to determine if the progression of MCI or early dementia due to AD may be slowed, stopped, or perhaps even reversed by a comprehensive, multimodal, intensive lifestyle intervention after 20 weeks when compared to a usual-care randomized control group. This lifestyle intervention includes (1) a whole foods, minimally processed plant-based diet low in harmful fats and low in refined carbohydrates and sweeteners with selected supplements; (2) moderate exercise; (3) stress management techniques; and (4) support groups.

This intensive multimodal lifestyle modification RCT sought to address the following questions:

Can the specified multimodal intensive lifestyle changes beneficially affect the progression of MCI or early dementia due to AD as measured by the AD Assessment Scale–Cognitive Subscale (ADAS-Cog), CGIC (Clinical Global Impression of Change), CDR-SB (Clinical Dementia Rating Sum of Boxes), and CDR-G (Clinical Dementia Rating Global) testing?

Is there a significant correlation between the degree of lifestyle change and the degree of change in these measures of cognition and function?

Is there a significant correlation between the degree of lifestyle change and the degree of change in selected biomarkers (e.g., the plasma Aβ42/40 ratio)?

Participants and methods

This study was a 1:1 multicenter RCT; the findings from its first 20 weeks are reported here. Patients who met the clinical trial inclusion criteria were enrolled between September 2018 and June 2022.

Participants were enrolled who met the following inclusion criteria:

Male or female, ages 45 to 90

Current diagnosis of MCI or early dementia due to AD process, with a MoCA score of 18 or higher (National Institute on Aging–Alzheimer’s Association McKhann and Albert 2011 criteria) [ 19 , 20 ]

Physician shared this diagnosis with the patient and approved their participation in this clinical trial

Willingness and ability to participate in all aspects of the intervention

Availability of spouse or caregiver to provide collateral information and assist with study adherence

Patients were excluded if they had any of the following:

Moderate or severe dementia

Physical disability that precludes regular exercise

Evidence for other primary causes of neurodegeneration or dementia, e.g., significant cerebrovascular disease (i.e., dementia that is primarily vascular in origin), Lewy body disease, Parkinson's disease, or frontotemporal dementia (FTD)

Significant ongoing psychiatric or substance abuse problems

Fifty-one participants with MCI or early-stage dementia due to AD who met these inclusion criteria were enrolled between September 2018 and June 2022 and underwent baseline testing. Twenty-six of the enrolled participants were randomly assigned to an intervention group that received the multimodal lifestyle intervention for 20 weeks. The other 25 participants were randomly assigned to a usual habits and care control group that was asked not to make any lifestyle changes for 20 weeks, after which they would be offered the intervention. Patients in both groups received standard of care treatment managed by their own neurologist.

The intervention group received the lifestyle program for 20 weeks (initially in person, then via synchronous Zoom after March 2020 due to COVID-19). Two participants who did not want to continue these lifestyle changes withdrew during this time, both in the intervention group (one male, one female). Participants in both groups completed a follow-up visit at 20 weeks, where clinical and cognitive assessments were completed. Data were analyzed comparing the baseline and 20 week assessments between the groups.

In a drug trial, access to an investigational new drug can be restricted from participants in a randomized control group. However, we learned in our prior clinical trials of this lifestyle intervention with other diseases that it is often difficult to persuade participants who are randomly assigned to a usual-care control group to refrain from making these lifestyle changes for more than 20 weeks, which is why this time duration was chosen. If participants in both groups made similar lifestyle changes, then it would not be possible to show differences between the groups. Therefore, to encourage participants randomly assigned to the control group not to make lifestyle changes during the first 20 weeks, we offered them the same lifestyle program at no cost for 20 weeks once they had completed the usual-care control period and been tested at 20 weeks.

We initially planned to enroll 100 patients into this study based on power calculations of possible differences between groups in cognition and function after 20 weeks. However, recruitment was challenging, especially because of the COVID-19 emergency and because many pharma trials began recruiting patients with similar criteria, so it took longer to enroll patients than initially planned [ 21 ]. Because of this, we terminated recruitment after 51 patients were enrolled. This decision was based only on recruitment issues and limited funding, without reviewing the data at that time.

Patients were recruited from advertisements, presentations at neurology meetings, referrals from diverse groups of neurologists and other physicians, and a search of an online database of patients at UCSF. We put a special emphasis on recruiting diverse patients, although we were less successful in doing so than we hoped (Table 1 ).

This clinical trial was approved by the Western Institutional Review Board on 12/31/2017 (approval number: 20172897) and all participants and their study partners provided written informed consent. The trial protocol was also approved by the appropriate Institutional Review Board of all participating sites. Due to the COVID-19 emergency, planned MRI and amyloid PET scans were no longer feasible, and the number of cognition and function tests was decreased. An initial inclusion criterion of “current diagnosis of mild to moderate dementia due to AD (McKhann et al., 2011)” was further clarified to include a MoCA score of 18 or higher. This study was registered with clinicaltrials.gov on October 8, 2020 (NCT04606420, Unique Protocol ID: 20172897) retrospectively due to an administrative error. None of the sponsors who provided funding for this study participated in its design, conduct, management, or reporting of the results. Those providing the lifestyle intervention were separate from those performing testing and from those collecting and analyzing the data, who were blinded to group assignment. All authors contributed to manuscript draft revisions, provided critical comment, and approved submission for publication.

Any modifications in the protocol were approved in advance and in writing by the senior biostatistician (Charles McCulloch PhD) or the senior expert neuropsychologist (Dorene Rentz PsyD), and subsequently approved by the WIRB.

Patients were initially recruited only from the San Francisco Bay area beginning October 2018 and met in person until February 2020 when the COVID-19 pandemic began. Subsequently, this multimodal lifestyle intervention was offered to patients at home in real time via Zoom.

Offering this intervention virtually provided an opportunity to recruit patients from multiple sites, including the Massachusetts General Hospital/Harvard Medical School, Boston, MA; the University of California, San Diego; and Renown Regional Medical Center, Reno, NV, as well as with neurologists in the San Francisco Bay Area. These participants were recruited and tested locally at each site and the intervention was provided via Zoom and foods were sent directly to their home.

Patient recruitment

This is described in the Supplemental Materials section.

Intensive multimodal lifestyle intervention

Each patient received a copy of a book which describes this lifestyle medicine intervention for other chronic diseases. [ 2 ]

Diet

A whole foods minimally-processed plant-based (vegan) diet, high in complex carbohydrates (predominantly fruits, vegetables, whole grains, legumes, soy products, seeds and nuts) and especially low in harmful fats, sweeteners and refined carbohydrates. It was approximately 14-18% of calories as total fat, 16-18% protein, and 63-68% mostly complex carbohydrates. Calories were unrestricted. Those with higher caloric needs were given extra portions.

To assure the high adherence and standardization required to adequately test the hypothesis, 21 meals/week and snacks plus the daily supplements listed below were provided throughout the 40 weeks of this intervention to each study participant and his or her spouse or study partner at no cost to them. Twice/week, we overnight shipped to each patient as well as to their spouse or study partner three meals plus two snacks per day that met the nutritional guidelines as well as the prescribed nutritional supplements.

We asked participants to consume only the food and nutritional supplements we sent to them and no other foods. We reasoned that if adherence to the diet and lifestyle intervention was high, whatever outcomes we measured would be of interest. That is, if patients in the intervention group were adherent but showed no significant benefits, that would be a disappointing but important finding. If they showed improvement, that would also be an important finding. But if they did not follow the lifestyle intervention sufficiently, then we would not have been able to adequately test the hypotheses.

Exercise

Aerobic exercise (e.g., walking) for at least 30 minutes/day and mild strength training exercises at least three times per week, guided by an exercise physiologist in person or in virtual sessions. Patients were given a personalized exercise prescription based on age and fitness level. All sessions were overseen by a registered nurse.

Stress management

Meditation, gentle yoga-based poses, stretching, progressive relaxation, breathing exercises, and imagery for a total of one hour per day, supervised by a certified stress management specialist. The purpose of each technique was to increase the patient’s sense of relaxation, concentration, and awareness. They were also given access to online meditations. Patients had the option of using flashing-light glasses at a theta frequency of 7.83 Hz plus soothing music as an aid to meditation and insomnia [ 22 ]. They were also encouraged to get adequate sleep.

Group support

Participants and their spouses/study partners participated in a support group one hour/session, three days/week, supervised by a licensed mental health professional in a supportive, safe environment to increase emotional support and community as well as communication skills and strategies for maintaining adherence to the program. They also received a book with memory exercises used periodically during group sessions [ 23 ].

To reinforce this lifestyle intervention, each patient and their spouse or study partner met three times/week, four hours/session via Zoom:

one hour of supervised exercise (aerobic + strength training)

one hour of stress management practices (stretching, breathing, meditation, imagery)

one hour of a support group

one hour lecture on lifestyle

Additional optional exercise and stress management classes were provided.

Supplements

Omega-3 fatty acids with Curcumin (1680 mg omega-3 & 800 mg Curcumin, Nordic Naturals ProOmega CRP, 4 capsules/day). Omega-3 fatty acids: In those age 65 or older, those consuming omega-3 fatty acids once/week or more had a 60% lower risk of developing AD, and total intake of n-3 polyunsaturated fatty acids was associated with reduced risk of Alzheimer disease [ 24 ]. Curcumin targets inflammatory and antioxidant pathways as well as (directly) amyloid aggregation, [ 25 ] although there may be problems with bioavailability and crossing the blood-brain barrier [ 26 ].

Multivitamin and Minerals (Solgar VM-75 without iron, 1 tablet/day). Combinatorial formulations demonstrate improvement in cognitive performance and the behavioral difficulties that accompany AD [ 27 ].

Coenzyme Q10 (200 mg, Nordic Naturals, 2 soft gels/day). CoQ10 may reduce mitochondrial impairment in AD [ 28 ].

Vitamin C (1 gram, Solgar, 1 tablet/day): Maintaining healthy vitamin C levels may have a protective function against age-related cognitive decline and AD [ 29 ].

Vitamin B12 (500 mcg, Solgar, 1 tablet/day): B12 hypovitaminosis is linked to the development of AD pathology [ 30 ].

Magnesium L-Threonate (Mg) (144 mg, Magtein, 2 tablets/day). A meta-analysis found that Mg deficiency may be a risk factor of AD and Mg supplementation may be an adjunctive treatment for AD [ 31 ].

Hericium erinaceus (Lion’s Mane, Stamets Host Defense, 2 grams/day): Lion’s mane may produce significant improvements in cognition and function in healthy people over 50 [ 32 ] and in MCI patients compared to placebo [ 33 ].

Super Bifido Plus Probiotic (Flora, 1 tablet/day). A meta-analysis suggests that probiotics may benefit AD patients [ 34 ].

Primary outcome measures: cognition and function testing

Four tests were used to assess changes in cognition and function in these patients. These are standard measures of cognition and function included in many FDA drug trials: ADAS-Cog; Clinical Global Impression of Change (CGIC); Clinical Dementia Rating Sum of Boxes (CDR-SB); Clinical Dementia Rating Global (CDR Global). All cognition and function raters were trained psychometrists with experience in administering these tests in clinical trials. Efforts were made to have the same person perform cognitive testing at each visit to reduce inter-observer variability. Those doing ADAS-Cog assessments were certified raters and tested patients in person. The CGIC and CDR tests were administered for all patients via Zoom by different raters than the ADAS-Cog. Also, raters were blinded to treatment arm to the degree possible.

Secondary outcome measures: biomarkers and microbiome

These are described in the Supplemental Materials section. These include blood-based biomarkers (such as the plasma Aβ42/40 ratio) and microbiome taxa (organisms).

Statistical methods

These are described in the Supplemental Materials section.

The recruitment effort for this trial lasted from 01/23/2018 to 6/16/2022. The most effective recruitment method was referral from the subjects’ physician or healthcare provider. Additional recruitment efforts included advertising in print and digital media; speaking to community groups; mentioning the study during podcast and radio interviews; collaborating with research institutions that provide dementia diagnosis and treatment; and contracting a clinical trials recruitment service (Linea). A total of 1585 people contacted us; of these, 1300 did not meet the inclusion criteria, 102 declined participation, and 132 had incomplete screening when enrollment closed, resulting in the enrollment of 51 participants (Fig. 1 ).

Fig. 1 CONSORT flowchart: patients, demographics, and enrollment

The remaining 51 patients were randomized to an intervention group (26 patients) that received the lifestyle intervention for 20 weeks or to a usual-care control group (25 patients) that was asked not to make any lifestyle changes. Two patients in the intervention group withdrew during the intervention because they did not want to continue the diet and lifestyle changes. No patients in the control group withdrew prior to 20-week testing. Analyses were performed on the remaining 49 patients. No patients were lost to follow-up.
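The participant flow described above can be checked with simple arithmetic. The short sketch below only restates the counts given in the text; the variable names are illustrative and are not taken from the study materials.

```python
# Counts as stated in the text; variable names are illustrative assumptions.
contacted = 1585
not_eligible = 1300
declined = 102
screening_incomplete = 132

# People remaining after all exclusions are the enrolled participants.
enrolled = contacted - (not_eligible + declined + screening_incomplete)

intervention_randomized = 26
control_randomized = 25
withdrew_intervention = 2
withdrew_control = 0

# Participants contributing data to the 20-week analysis.
analyzed = (intervention_randomized - withdrew_intervention) + (
    control_randomized - withdrew_control
)

print(f"Enrolled: {enrolled}, analyzed at 20 weeks: {analyzed}")
```

The enrolled total equals the sum of the two randomized groups, and the analyzed total reflects the two withdrawals in the intervention group.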

All of these 49 patients had plasma Aβ42/40 ratios <0.089 (all were <0.0672), strongly supporting the diagnosis of Alzheimer’s disease [ 35 ].

At baseline, there were no statistically significant differences between the intervention group and the randomized control group in any measures, including demographic characteristics, cognitive function measures, or biomarkers (Table 1  and Table 2 ).

Cognition and function testing: primary analysis

Results after 20 weeks of a multimodal intensive lifestyle intervention in all patients showed overall statistically significant differences between the intervention group and the randomized control group in cognition and function in the CGIC ( p = 0.001), CDR-SB ( p = 0.032), and CDR Global ( p = 0.037) tests and of borderline significance in the ADAS-Cog test ( p = 0.053, Table 3 ). Three of these measures (CGIC, CDR Global, ADAS-Cog) showed improvement in cognition and function in the intervention group and worsening in the control group, and one test (CDR-SB) showed significantly less progression when compared to the randomized control group, which worsened in all four of these measures.

PRIMARY ANALYSIS (with outlier included), Table 3 :

CGIC (Clinical Global Impression of Change)

These scores improved in the intervention group and worsened in the control group.

(Fisher’s exact p -value = 0.001). 10 people in the intervention group showed improvement compared to none in the control group. 7 people in the intervention group and 8 people in the control group were unchanged. 7 people in the intervention group showed minimal worsening compared to 14 in the control group. None in the intervention group showed moderate worsening compared to 3 in the control group.
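As an illustration of how an exact test can be applied to these category counts, the following sketch computes a two-sided Fisher-Freeman-Halton exact p-value for the 4 x 2 table of CGIC outcomes. It is not the authors' analysis code: the function name and table layout are assumptions, the counts are taken from the sentences above, and the result may not match the reported p-value exactly if the original analysis grouped categories differently.

```python
from itertools import product
from math import comb

# CGIC counts as stated in the text. Rows: improved, unchanged,
# minimal worsening, moderate worsening.
intervention = [10, 7, 7, 0]
control = [0, 8, 14, 3]


def fisher_exact_rx2(group_a, group_b):
    """Two-sided exact p-value for an r x 2 table (hypothetical helper).

    Enumerates every table with the same row and column totals and sums
    the probabilities of tables that are no more likely than the observed one.
    """
    row_totals = [a + b for a, b in zip(group_a, group_b)]
    n_a = sum(group_a)
    total_tables = comb(sum(row_totals), n_a)

    def table_probability(cells):
        # Multivariate hypergeometric probability of the first column.
        ways = 1
        for row_total, cell in zip(row_totals, cells):
            ways *= comb(row_total, cell)
        return ways / total_tables

    p_observed = table_probability(group_a)
    p_value = 0.0
    for cells in product(*(range(t + 1) for t in row_totals)):
        if sum(cells) != n_a:
            continue  # column total must match the observed table
        p_table = table_probability(cells)
        # Small relative tolerance guards against floating-point ties.
        if p_table <= p_observed * (1 + 1e-9):
            p_value += p_table
    return p_value


p_cgic = fisher_exact_rx2(intervention, control)
print(f"Exact p-value for the CGIC table: {p_cgic:.4g}")
```

Full enumeration is practical here because the table is small; for larger tables a library routine or a Monte Carlo approximation would be the usual choice.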

CDR-Global (Clinical Dementia Rating-Global)

These scores improved in the intervention group (from 0.69 to 0.65) and worsened in the randomized control group (from 0.66 to 0.74), mean difference = 0.12, p = 0.037 (Table 3 and Fig. 2 ).

Fig. 2 Changes in CDR-Global (lower = improved)

ADAS-Cog (Alzheimer’s Disease Assessment Scale)

These scores improved in the intervention group (from 21.551 to 20.536) and worsened in the randomized control group (from 21.252 to 22.160), mean group difference of change = 1.923 points, p = 0.053 (Table 3 and Fig. 3 ). (ADAS-Cog testing in one intervention group patient was not administered properly so it was excluded.)

Fig. 3 Changes in ADAS-Cog (lower = improved)

CDR-SB (Clinical Dementia Rating Sum of Boxes)

These scores worsened significantly more in the control group (from 3.34 to 3.86) than in the intervention group (from 3.27 to 3.35), mean group difference = 0.44, p = 0.032 (Table 3 and Fig. 4 ).

Fig. 4 Changes in CDR-SB (lower = improved)
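The between-group differences quoted for CDR-Global, ADAS-Cog, and CDR-SB can be reproduced from the reported group means as the change in the control group minus the change in the intervention group. The sketch below does only that arithmetic; the dictionary layout and function name are illustrative assumptions, and it does not reproduce the rank-sum tests behind the p-values.

```python
# Group means (baseline, week 20) as reported in the text.
# Lower scores indicate better cognition and function on all three scales.
reported_means = {
    "CDR-Global": {"intervention": (0.69, 0.65), "control": (0.66, 0.74)},
    "ADAS-Cog": {"intervention": (21.551, 20.536), "control": (21.252, 22.160)},
    "CDR-SB": {"intervention": (3.27, 3.35), "control": (3.34, 3.86)},
}


def difference_in_change(means):
    """Control-group change minus intervention-group change (hypothetical helper)."""
    intervention_start, intervention_end = means["intervention"]
    control_start, control_end = means["control"]
    return (control_end - control_start) - (intervention_end - intervention_start)


differences = {
    scale: round(difference_in_change(means), 3)
    for scale, means in reported_means.items()
}

for scale, value in differences.items():
    print(f"{scale}: difference in change = {value}")
```

A positive value means the control group worsened more (or improved less) than the intervention group on that scale.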

There were no significant differences in depression scores as measured by PHQ-9 between the intervention and control groups.

Secondary sensitivity analyses

One patient in the intervention group was a clear statistical outlier in his cognitive function testing based on standard mathematical definitions (none was an outlier in the control group) [ 36 ]. Therefore, this patient’s data were excluded in a secondary sensitivity analysis. These results showed statistically significant differences in all four of these measures of cognition and function (Table 4 ). Three measures (ADAS-Cog, CGIC, and CDR Global) showed significant improvement in cognition and function and one (CDR-SB) showed significantly less worsening when compared to the randomized control group, which worsened in all four of these measures.

Sensitivity analysis (with outlier excluded)

There were no significant differences in depression scores as measured by PHQ-9 between the intervention and control groups in either analysis.

A reason why this patient might have been a statistical outlier is that he reported intense situational stress before his testing. As a second sensitivity analysis, this same outlier patient was retested when he was calmer, and all four measures (ADAS-Cog, CGIC, CDR Global, and CDR-SB) showed significant improvement in cognition and function, whereas the randomized control group worsened in all four of these measures.

Biomarker results

We selected biomarkers that have a known role in the pathophysiology of AD (Table 5 ). Of note is that the plasma Aβ42/40 ratio increased in the intervention group but decreased in the randomized control group ( p = 0.003, two-tailed).

Correlation of lifestyle index and cognitive function

In the current clinical trial, despite the inherent limitations of self-reported data, we found statistically significant correlations between the degree of lifestyle change (from baseline to 20 weeks) and the degree of change in three of four measures of cognition and function as well as correlations between the adherence to desired lifestyle changes at just the 20-week timepoint and the degree of change in two of the four measures of cognition and function and borderline significance in the fourth measure.

Correlation with lifestyle at 20 weeks: p = 0.052; correlation: 0.241

Correlation with degree of change in lifestyle: p = 0.015; correlation: 0.317

Correlation with lifestyle at 20 weeks: p = 0.043; correlation: 0.251

Correlation with degree of change in lifestyle: p = 0.081; correlation: 0.205

Correlation with lifestyle at 20 weeks: p = 0.065; correlation: 0.221

Correlation with degree of change in lifestyle: p = 0.024; correlation: 0.286

Correlation with lifestyle at 20 weeks: p = 0.002

Correlation with degree of change in lifestyle: p = 0.0005

(CGIC tests are non-parametric analyses, so standard effect size calculations are not included for this measure.)

We also found a significant correlation between dietary total fat intake and changes in the CGIC measure ( p = 0.001), but this was not significant for the other three measures.

Correlation of lifestyle index and biomarker data

In the current clinical trial, despite the inherent limitations of self-reported data, we found statistically significant correlations between the degree of lifestyle change (from baseline to 20 weeks) and the degree of change in many of the key biomarkers, as well as correlations between the degree of lifestyle change at 20 weeks and the degree of change in these biomarkers:

Plasma Aβ42/40 ratio

Correlation with lifestyle at 20 weeks: p = 0.035; correlation: 0.306

Correlation with degree of change in lifestyle: p = 0.068; correlation: 0.266

Correlation with lifestyle at 20 weeks: p = 0.011; correlation: 0.363

Correlation with degree of change in lifestyle: p = 0.007; correlation: 0.383

LDL-cholesterol

Correlation with lifestyle at 20 weeks: p < 0.0001; correlation: 0.678

Correlation with degree of change in lifestyle: p < 0.0001; correlation: 0.628

Beta-Hydroxybutyrate (ketones)

Correlation with lifestyle at 20 weeks: p = 0.013; correlation: 0.372

Correlation with degree of change in lifestyle: p = 0.034; correlation: 0.320

Correlation with lifestyle at 20 weeks: p = 0.228; correlation: 0.177

Correlation with degree of change in lifestyle: p = 0.135; correlation: 0.219

GFAP/glial fibrillary acidic protein

Correlation with lifestyle at 20 weeks: p = 0.096; correlation: 0.243

Correlation with degree of change in lifestyle: p = 0.351; correlation: 0.138

What degree of lifestyle change is correlated with improvement in cognitive function tests?

What degree of lifestyle change is needed to stop the worsening of, or to improve, MCI or early dementia due to AD? In other words, what percentage of adherence to the lifestyle intervention was correlated with no change in MCI or dementia across both groups? Adherence above this level was associated with improvement in MCI or dementia.

ADAS-Cog:

Correlation with lifestyle at 20 weeks: 71.4% adherence

CDR-SB:

Correlation with lifestyle at 20 weeks: 120.6% adherence

CDR-Global:

Correlation with lifestyle at 20 weeks: 95.6% adherence
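One way to read these thresholds is as the adherence level at which a fitted line relating lifestyle adherence to change in a cognitive score crosses zero. The Python sketch below illustrates that idea only. It assumes an ordinary least-squares line, which the text above does not specify, and the function name `zero_change_adherence` and all data points are hypothetical, not trial data.

```python
from statistics import mean


def zero_change_adherence(adherence, score_change):
    """Return the adherence (%) at which a least-squares line predicts zero change.

    Assumption: a simple linear fit. This is an illustration, not the trial's analysis.
    """
    x_bar, y_bar = mean(adherence), mean(score_change)
    sxx = sum((x - x_bar) ** 2 for x in adherence)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(adherence, score_change))
    if sxx == 0 or sxy == 0:
        raise ValueError("need a non-zero slope to locate the crossing point")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Solve intercept + slope * x = 0 for x.
    return -intercept / slope


# Hypothetical values: adherence in percent, change in score (positive = worsening).
adherence = [20.0, 40.0, 60.0, 80.0, 100.0]
score_change = [3.0, 2.0, 1.0, 0.0, -1.0]
threshold = zero_change_adherence(adherence, score_change)
```

A crossing point above 100% on such a fit would mean that, on average, even full adherence to the prescribed program was associated with some worsening on that measure.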

Microbiome results

There was a significant and beneficial change in the microbiome configuration in the intervention group but not in the control group.

Several taxa (groups of microorganisms) that increased only in the intervention group were consistent with those involved in reduced AD risk in other studies. For example, Blautia, which increased during the intervention in the intervention group, has previously been associated with a lower risk of AD, potentially due to its involvement in increasing γ-aminobutyric acid (GABA) production [ 37 ]. Eubacterium also increased during the intervention in the intervention group, and prior studies have identified Eubacterium genera (namely Eubacterium fissicatena) as a protective factor in AD [ 38 ].

Also, the relative abundance of taxa associated with increased AD risk decreased in the intervention group, e.g., Prevotella and Turicibacter, the latter of which has been associated with relevant biological processes such as 5-HT production. Prevotella and Turicibacter have previously been shown to increase with disease progression [ 39 ], and these taxa decreased over the course of the intervention.

These results support the hypothesis that the lifestyle intervention may beneficially modify specific microbial groups in the microbiome: increasing those that lower the risk of AD and decreasing those that increase the risk of AD. (Please see Supplement for more detailed information.)

We report the first randomized, controlled trial showing that an intensive multimodal lifestyle intervention may significantly improve cognition and function and may allay biological features in many patients with MCI or early dementia due to AD after 20 weeks.

After 20 weeks of a multimodal intensive lifestyle intervention, results of the primary analysis when all patients were included showed overall statistically significant differences between the intervention group and the randomized control group in cognition and function as measured by the CGIC ( p = 0.001), CDR-SB ( p = 0.032), and CDR Global ( p = 0.037) tests and of borderline significance in the ADAS-Cog test ( p = 0.053).

Three of these measures (CGIC, CDR Global, ADAS-Cog) showed improvement in cognition and function in the intervention group and worsening in the randomized control group. One test (CDR-SB) showed less progression in the intervention group than in the control group, which worsened in all four of these measures.

These differences were even clearer in a secondary sensitivity analysis when a mathematical outlier was excluded. These results showed statistically significant differences between groups in all four of these measures of cognition and function. Three of these measures showed improvement in cognition and function and one (CDR-SB) showed less deterioration when compared to the randomized control group, which worsened in all four of these measures.

The validity of these changes in cognition and function and possible biological mechanisms of improvement is supported by the observed changes in several clinically relevant biomarkers that showed statistically significant differences in a beneficial direction after 20 weeks when compared to the randomized control group.

One of the most clinically relevant biomarkers is the plasma Aβ42/40 ratio, which increased by 6.4% in the intervention group and decreased by 8.3% in the randomized control group after 20 weeks, and these differences were statistically significant ( p = 0.003, two-tailed).

In the lecanemab trial, plasma levels of the Aβ42/40 biomarker increased in the intervention group over 18 months with the presumption that this reflected amyloid moving from the brain to the plasma [ 40 ]. We found similar results in the direction of change in the plasma Aβ42/40 ratio from this lifestyle intervention but in only 20 weeks. Conversely, this biomarker decreased in the control group (as in the lecanemab trial), which may indicate increased cerebral uptake of amyloid.

Other clinically relevant biomarkers also showed statistically significant differences (two-tailed) in a beneficial direction after 20 weeks when compared to the randomized control group. These include hemoglobin A1c, insulin, glycoprotein acetyls (GlycA), LDL-cholesterol, and β-Hydroxybutyrate (ketone bodies).

Improvement in these biomarkers provides more biological plausibility for the observed improvements in cognition and function as well as more insight into the possible mechanisms of improvement. This information may also help in predicting which patients are more likely to show improvements in cognition and function by making these intensive lifestyle changes.

Other relevant biomarkers were in a beneficial direction of change in the intervention group compared with the randomized control group after 20 weeks. These include pTau181, GFAP, CRP, SAA, and C-peptide. Telomere length increased in the intervention group and was essentially unchanged in the control group. These differences were not statistically significant even when there was an order of magnitude difference between groups (as with GFAP and pTau181) or an almost four-fold difference (as with CRP), but these changes were in a beneficial direction. At least in part, these findings may be due to a relatively small sample size and/or a short duration of only 20 weeks.

We found a statistically significant dose-response correlation between the degree of lifestyle changes in both groups (“lifestyle index”) and the degree of change in many of these biomarkers. This correlation was found in both the degree of change in lifestyle from baseline to 20 weeks as well as the lifestyle measured at 20 weeks. These correlations also add to the biological plausibility of these findings.
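As an illustration of what a dose-response correlation of this kind measures, the Python sketch below computes a Pearson correlation coefficient between a lifestyle index and the change in a biomarker. The choice of Pearson's coefficient is an assumption, since the text above does not state whether a product-moment or rank-based coefficient was used, and the function name and all values are hypothetical rather than trial data.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical values: lifestyle index (0 to 1) and change in a biomarker.
lifestyle_index = [0.2, 0.4, 0.5, 0.7, 0.9, 1.0]
biomarker_change = [-0.03, -0.01, 0.00, 0.02, 0.03, 0.05]
r = pearson_r(lifestyle_index, biomarker_change)
```

A positive `r` here would indicate that larger lifestyle changes tend to accompany larger changes in the biomarker. It says nothing by itself about causation.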

We also found a statistically significant dose-response correlation between the degree of lifestyle changes in both groups (“lifestyle index”) and changes in most measures of cognition and function testing. In short, the more these AD patients changed their lifestyle in the prescribed ways, the greater was the beneficial impact on their cognition and function. These correlations also add to the biological plausibility of these findings. This variation in adherence helps to explain in part why some patients in the intervention group improved and others did not, but there are likely other mechanisms that we do not fully understand that may play a role. These statistically significant correlations are especially meaningful given the greater variability of self-reported data, the relatively small sample size, and the short duration of the intervention.

These findings are consistent with earlier clinical trials in which we used this same lifestyle intervention and the same measure of lifestyle index and found significant dose-response correlations between this lifestyle index (i.e., the degree of lifestyle changes) and changes in the degree of coronary atherosclerosis (percent diameter stenosis) in coronary heart disease; [ 41 , 45 ] changes in PSA levels and LNCaP cell growth in men with prostate cancer; [ 42 ] and changes in telomere length [ 43 ].

We also found significant differences between the intervention and control groups in several taxa (groups of micro-organisms) in the microbiome which may be beneficial.

There were no significant differences in depression scores as measured by PHQ-9 between the intervention and control groups. Therefore, reduction in depression is unlikely to account for the overall improvements in cognition and function seen in the intervention group patients.

We also found that substantial lifestyle changes were required to stop the progression of MCI in these patients. In the primary analysis, this ranged from 71.4% adherence for ADAS-Cog to 95.6% adherence for CDR-Global to 120.6% adherence for CDR-SB. In other words, extensive lifestyle changes were required to stop or improve cognition and function in these patients. This helps to explain why other studies of less-intensive lifestyle interventions may not have been sufficient to stop deterioration or improve cognition and function.

For example, comparing these results to those of the MIND-AD clinical trial provides more biological plausibility for both studies [ 44 ]. That is, more moderate multimodal lifestyle changes may slow the rate of worsening of cognition and function in MCI or early dementia due to early-stage AD, whereas more intensive multimodal lifestyle changes may result in overall average improvements in many measures of cognition and function when compared to a randomized usual-care control group in both clinical trials.

Lifestyle changes may provide additional benefits to patients on drug therapy. Anti-amyloid antibodies have shown modest effects on slowing progression, but they are expensive, have potential for adverse events, are not yet widely available, and do not result in overall cognitive improvement [ 40 ]. Perhaps there may be synergy from doing both.

Limitations

This study has several limitations. Only 51 patients were enrolled and randomized in our study, and two of these patients (both in the intervention group) withdrew during the trial. Showing statistically significant differences across different tests of cognition and function and other measures despite the relatively small sample size suggests that the lifestyle intervention may be especially effective and has strong internal validity.

However, the smaller sample size limits generalizability, especially since there was much less racial and ethnic diversity in this sample than we strived to achieve. Also, we measured these differences despite the relative insensitivity of these measures, which might have increased the likelihood of a type II error.

Raters were blinded to the group assignment of the participants. However, unlike a double-blind placebo-controlled drug trial, it is not possible to blind subjects in a lifestyle intervention about whether or not they are receiving the intervention. This might have affected outcome measures, although to reduce positive expectations and because it was true, patients were told during the study that we did not know whether or not this lifestyle intervention would be beneficial, and we said that whatever we showed would be useful.

Also, 20 weeks is a relatively short time for any intervention with MCI or early dementia due to AD. We did not include direct measures of brain structure in this trial, so we cannot determine whether there were direct impacts on markers of brain pathology relevant to AD. However, surrogate markers such as the plasma Aβ42/40 ratio are becoming more widely accepted.

Not all patients in the intervention group improved. Of the 24 patients in the intervention group, 10 showed improvement as measured by the CGIC test, 7 were unchanged, and 7 worsened. In the control group, none improved, 8 were unchanged, and 17 worsened. In part, this may be explained by variations in adherence to the lifestyle intervention, as there was a significant relationship between the degree of lifestyle change and the degree of change in cognition and function across both groups. We hope that further research may clarify other factors and mechanisms that help explain why cognition and function improved in some patients but not in others.
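The CGIC counts reported above can be tabulated to compare the two groups descriptively. The Python sketch below uses only those reported counts. The helper names are hypothetical, and the pairwise "probability of superiority" is an illustrative summary chosen here for an ordinal outcome, not an analysis taken from the trial.

```python
# CGIC outcome counts at 20 weeks, as reported in the text above.
cgic_counts = {
    "intervention": {"improved": 10, "unchanged": 7, "worsened": 7},
    "control": {"improved": 0, "unchanged": 8, "worsened": 17},
}


def proportions(counts):
    """Convert category counts into proportions of the group total."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def probability_of_superiority(a, b, order=("worsened", "unchanged", "improved")):
    """Chance that a random patient from `a` is in a better category than one from `b`.

    Ties count as half. `order` lists the categories from worst to best.
    """
    rank = {name: i for i, name in enumerate(order)}
    wins = ties = 0
    for cat_a, n_a in a.items():
        for cat_b, n_b in b.items():
            if rank[cat_a] > rank[cat_b]:
                wins += n_a * n_b
            elif rank[cat_a] == rank[cat_b]:
                ties += n_a * n_b
    return (wins + 0.5 * ties) / (sum(a.values()) * sum(b.values()))


intervention_props = proportions(cgic_counts["intervention"])
control_props = proportions(cgic_counts["control"])
p_sup = probability_of_superiority(cgic_counts["intervention"], cgic_counts["control"])
```

The group totals (24 and 25) sum to 49, which matches the 51 randomized patients minus the two withdrawals described under Limitations.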

The findings on the degree of lifestyle change required to stop the worsening or improve cognition and function need to be interpreted with caution. Since data from both groups were combined, it was no longer a randomized trial for this specific analysis, so there could be unknown confounding influences. Also, it is possible that patients whose cognition improved were better able to adhere to the intervention and thus had higher lifestyle indices.

In summary, in persons with mild cognitive impairment or early dementia due to Alzheimer’s disease, comprehensive lifestyle changes may improve cognition and function in several standard measures after 20 weeks. In contrast, patients in the randomized control group showed overall worsening in all four measures of cognition and function during this time.

The validity of these findings was supported by the observed changes in plasma biomarkers and microbiome; the dose-response correlation of the degree of lifestyle change with the degree of improvement in all four measures of cognition and function; and the correlation between the degree of lifestyle change and the degree of changes in the Aβ42/40 ratio and the changes in some other relevant biomarkers in a beneficial direction.

Our findings also have implications for helping to prevent AD. Newer technologies, some aided by artificial intelligence, enable the probable diagnosis of AD years before it becomes clinically apparent. However, many people do not want to know if they are likely to get AD if they do not believe they can do anything about it. If intensive lifestyle changes may cause improvement in cognition and function in MCI or early dementia due to AD, then it is reasonable to think that these lifestyle changes may also help to prevent MCI or early dementia due to AD. Also, it may take less-extensive lifestyle changes to help prevent AD than to treat it. Other studies cited earlier on the effects of these lifestyle changes on diseases such as coronary heart disease support this conclusion. Clearly, intensive lifestyle changes rather than moderate ones seem to be required to improve cognition and function in those suffering from early-stage AD.

These findings support longer follow-up and larger clinical trials to determine the longer-term outcomes of this intensive lifestyle medicine intervention in larger groups of more diverse AD populations; why some patients beneficially respond to a lifestyle intervention better than others besides differences in adherence; as well as the potential synergy of these lifestyle changes and some drug therapies.

Availability of data and materials

The datasets used and/or analyzed during the current study may be available from the corresponding author on reasonable request. Requesters will be asked to submit a study protocol, including the research question, planned analysis, and data required. The authors will evaluate this plan (i.e., relevance of the research question, suitability of the data, quality of the proposed analysis, planned or ongoing analysis, and other matters) on a case-by-case basis.

Livingston G, Huntley J, Sommerlad A, Ames D, Ballard C, Banerjee S, Brayne C, Burns A, Cohen-Mansfield J, Cooper C, Costafreda SG, Dias A, Fox N, Gitlin LN, Howard R, Kales HC, Kivimäki M, Larson EB, Ogunniyi A, Orgeta V, Ritchie K, Rockwood K, Sampson EL, Samus Q, Schneider LS, Selbæk G, Teri L, Mukadam N. Dementia prevention, intervention, and care: 2020 report of the Lancet Commission. Lancet. 2020;396(10248):413–46. https://doi.org/10.1016/S0140-6736(20)30367-6 . (Epub 2020 Jul 30. Erratum in: Lancet. 2023 Sep 30;402(10408):1132. PMID: 327389 PMCID: PMC7392084).

Ornish D, Ornish A. UnDo It. New York: Ballantine Books; 2019.

Dhana K, Agarwal P, James BD, Leurgans SE, Rajan KB, Aggarwal NT, Barnes LL, Bennett DA, Schneider JA. Healthy Lifestyle and Cognition in Older Adults With Common Neuropathologies of Dementia. JAMA Neurol. 2024. https://doi.org/10.1001/jamaneurol.2023.5491 . Epub ahead of print. PMID: 38315471.

Morris MC, Evans DA, Tangney CC, Bienias JL, Wilson RS. Associations of vegetable and fruit consumption with age-related cognitive change. Neurology. 2006;67(8):1370–6. https://doi.org/10.1212/01.wnl.0000240224.38978.d8 . (PMID: 17060562; PMCID: PMC3393520).

Morris MC, Evans DA, Bienias JL, Tangney CC, Bennett DA, Aggarwal N, Schneider J, Wilson RS. Dietary fats and the risk of incident Alzheimer disease. Arch Neurol. 2003;60(2):194–200. https://doi.org/10.1001/archneur.60.2.194 . (Erratum in: Arch Neurol. 2003 Aug;60(8):1072. PMID: 12580703).

Yu JT, Xu W, Tan CC, Andrieu S, Suckling J, Evangelou E, Pan A, Zhang C, Jia J, Feng L, Kua EH, Wang YJ, Wang HF, Tan MS, Li JQ, Hou XH, Wan Y, Tan L, Mok V, Tan L, Dong Q, Touchon J, Gauthier S, Aisen PS, Vellas B. Evidence-based prevention of Alzheimer’s disease: systematic review and meta-analysis of 243 observational prospective studies and 153 randomised controlled trials. J Neurol Neurosurg Psychiatry. 2020;91(11):1201–9. https://doi.org/10.1136/jnnp-2019-321913 . (Epub 2020 Jul 20. PMID: 32690803; PMCID: PMC7569385).

Blumenthal JA, Smith PJ, Mabe S, Hinderliter A, Lin PH, Liao L, et al. Lifestyle and neurocognition in older adults with cognitive impairments: A randomized trial. Neurology. 2019;92(3):e212–23. https://doi.org/10.1212/WNL.0000000000006784 . (Epub 2018/12/21. PubMed PMID: 30568005; PubMed Central PMCID: PMCPMC6340382).

Ngandu T, Lehtisalo J, Solomon A, Levälahti E, Ahtiluoto S, Antikainen R, Bäckman L, Hänninen T, Jula A, Laatikainen T, Lindström J, Mangialasche F, Paajanen T, Pajala S, Peltonen M, Rauramaa R, Stigsdotter-Neely A, Strandberg T, Tuomilehto J, Soininen H, Kivipelto M. A 2 year multidomain intervention of diet, exercise, cognitive training, and vascular risk monitoring versus control to prevent cognitive decline in at-risk elderly people (FINGER): a randomised controlled trial. Lancet. 2015;385(9984):2255–63. https://doi.org/10.1016/S0140-6736(15)60461-5 . (Epub 2015 Mar 12 PMID: 25771249).

Rosenberg A, Ngandu T, Rusanen M, Antikainen R, Backman L, Havulinna S, et al. Multidomain lifestyle intervention benefits a large elderly population at risk for cognitive decline and dementia regardless of baseline characteristics: The FINGER trial. Alzheimers Dement. 2018;14(3):263–70. https://doi.org/10.1016/j.jalz.2017.09.006 . (Epub 2017/10/23. PubMed PMID: 29055814).

Solomon A, Turunen H, Ngandu T, Peltonen M, Levalahti E, Helisalmi S, et al. Effect of the apolipoprotein e genotype on cognitive change during a multidomain lifestyle intervention: a subgroup analysis of a randomized clinical trial. JAMA Neurol. 2018;75(4):462–70. https://doi.org/10.1001/jamaneurol.2017.4365 . (Epub 2018/01/23. PubMed PMID: 29356827; PubMed Central PMCID: PMCPMC5885273).

Lehtisalo J, Rusanen M, Solomon A, Antikainen R, Laatikainen T, Peltonen M, et al. Effect of a multi-domain lifestyle intervention on cardiovascular risk in older people: the FINGER trial. Eur Heart J. 2022. https://doi.org/10.1093/eurheartj/ehab922 . Epub 2022/01/21. PubMed PMID: 35051281.

Kivipelto M, Mangialasche F, Snyder HM, Allegri R, Andrieu S, Arai H, et al. World-Wide FINGERS Network: a global approach to risk reduction and prevention of dementia. Alzheimers Dement. 2020;16(7):1078–94. https://doi.org/10.1002/alz.12123 . (Epub 2020/07/07. PubMed PMID: 32627328).

Kivipelto M, Mangialasche F, Snyder H M, Allegri R, Andrieu S, Arai H, Baker L, Belleville S, Brodaty H, Brucki SM, Calandri I, Caramelli P, Chen C, Chertkow H, Chew E, Choi S H, Chowdhary N, Crivelli L, De La Torre R, Du Y, Dua T, Espeland M, Feldman H H, Hartmanis M, Hartmann T, Heffernan M, Henry C J, Hong C H, Håkansson K, Iwatsubo T, Jeong J H, Jimenez‐Maggiora G, Koo E H, Launer L J, Lehtisalo J, Lopera F, Martínez‐Lage P, Martins R, Middleton L, Molinuevo J L, Montero‐Odasso M, Moon S Y, Morales‐Pérez K, Nitrini R, Nygaard H B, Park Y K, Peltonen M, Qiu C, Quiroz Y T, Raman R, Rao N, Ravindranath V, Rosenberg A, Sakurai T, Salinas R M, Scheltens P, Sevlever G, Soininen H, Sosa A L, Suemoto C K, Tainta‐Cuezva M, Velilla L, Wang Y, Whitmer R, Xu X, Bain L J, Solomon A, Ngandu T, Carillo, M C. World‐Wide FINGERS Network: A global approach to risk reduction and prevention of dementia. Alzheimer's Dement. 2020, https://doi.org/10.1002/alz.12123 .

Yaffe K, Vittinghoff E, Dublin S, Peltz CB, Fleckenstein LE, Rosenberg DE, Barnes DE, Balderson BH, Larson EB. Effect of personalized risk-reduction strategies on cognition and dementia risk profile among older adults: the SMARRT randomized clinical trial. JAMA Intern Med. 2023:e236279. https://doi.org/10.1001/jamainternmed.2023.6279 . Epub ahead of print. PMID: 38010725; PMCID: PMC10682943

Ornish D, Scherwitz LW, Billings JH, Brown SE, Gould KL, Merritt TA, Sparler S, Armstrong WT, Ports TA, Kirkeeide RL, Hogeboom C, Brand RJ. Intensive lifestyle changes for reversal of coronary heart disease. JAMA. 1998;280(23):2001–7. https://doi.org/10.1001/jama.280.23.2001 . (Erratum in: JAMA. 1999 Apr 21;281(15):1380. PMID: 9863851).

Ornish D, Scherwitz LW, Doody RS, Kesten D, McLanahan SM, Brown SE, DePuey E, Sonnemaker R, Haynes C, Lester J, McAllister GK, Hall RJ, Burdine JA, Gotto AM Jr. Effects of stress management training and dietary changes in treating ischemic heart disease. JAMA. 1983;249(1):54–9 (PMID: 6336794).

Gould KL, Ornish D, Scherwitz L, Brown S, Edens RP, Hess MJ, Mullani N, Bolomey L, Dobbs F, Armstrong WT, et al. Changes in myocardial perfusion abnormalities by positron emission tomography after long-term, intense risk factor modification. JAMA. 1995;274(11):894–901. https://doi.org/10.1001/jama.1995.03530110056036 . (PMID: 7674504).

Dhana K, Evans DA, Rajan KB, Bennett DA, Morris MC. Healthy lifestyle and the risk of Alzheimer dementia: Findings from 2 longitudinal studies. Neurology. 2020;95(4):e374–83. https://doi.org/10.1212/WNL.0000000000009816 . (Epub 2020 Jun 17. PMID: 32554763; PMCID: PMC7455318).

McKhann GM, Knopman DS, Chertkow H, Hyman BT, Jack CR Jr, Kawas CH, Klunk WE, Koroshetz WJ, Manly JJ, Mayeux R, Mohs RC, Morris JC, Rossor MN, Scheltens P, Carrillo MC, Thies B, Weintraub S, Phelps CH. The diagnosis of dementia due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement. 2011;7(3):263–9. https://doi.org/10.1016/j.jalz.2011.03.005 . (Epub 2011 Apr 21. PMID: 21514250; PMCID: PMC3312024).

Albert MS, DeKosky ST, Dickson D, Dubois B, Feldman HH, Fox NC, et al. The diagnosis of mild cognitive impairment due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement. 2011;7:270–9.

McDonald K, Seltzer E, Lu M, Gaisenband SD, Fletcher C, McLeroth P, Saini KS. Quantifying the impact of the COVID-19 pandemic on clinical trial screening rates over time in 37 countries. Trials. 2023;24(1):254. https://doi.org/10.1186/s13063-023-07277-1 . (PMID:37013558;PMCID:PMC10071259).

Tang HY, Vitiello MV, Perlis M, Mao JJ, Riegel B. A pilot study of audio-visual stimulation as a self-care treatment for insomnia in adults with insomnia and chronic pain. Appl Psychophysiol Biofeedback. 2014;39(3–4):219–25. https://doi.org/10.1007/s10484-014-9263-8 . (PMID:25257144;PMCID:PMC4221414).

Horsley K. Unlimited Memory. Granger Indiana: TCK Publishing; 2016.

Morris MC, Evans DA, Bienias JL, Tangney CC, Bennett DA, Wilson RS, et al. Consumption of fish and n-3 fatty acids and risk of incident Alzheimer disease. Arch Neurol. 2003;60(7):940–6. https://doi.org/10.1001/archneur.60.7.940 . (Epub 2003/07/23. PubMed PMID: 12873849).

Voulgaropoulou SD, van Amelsvoort T, Prickaerts J, Vingerhoets C. The effect of curcumin on cognition in Alzheimer’s disease and healthy aging: A systematic review of pre-clinical and clinical studies. Brain Res. 2019;1725:146476. https://doi.org/10.1016/j.brainres.2019.146476 . Epub 2019/09/29. PubMed PMID: 31560864.

Ringman JM, Frautschy SA, Teng E, Begum AN, Bardens J, Beigi M, Gylys KH, Badmaev V, Heath DD, Apostolova LG, Porter V, Vanek Z, Marshall GA, Hellemann G, Sugar C, Masterman DL, Montine TJ, Cummings JL, Cole GM. Oral curcumin for Alzheimer’s disease: tolerability and efficacy in a 24-week randomized, double blind, placebo-controlled study. Alzheimers Res Ther. 2012;4(5):43. https://doi.org/10.1186/alzrt146 . (PMID:23107780;PMCID:PMC3580400).

Shea TB, Remington R. Nutritional supplementation for Alzheimer’s disease? Curr Opin Psychiatry. 2015;28(2):141–7. https://doi.org/10.1097/YCO.0000000000000138 . (Epub 2015/01/21. PubMed PMID: 25602242).

Pradhan N, Singh C, Singh A. Coenzyme Q10 a mitochondrial restorer for various brain disorders. Naunyn Schmiedebergs Arch Pharmacol. 2021;394(11):2197–222. https://doi.org/10.1007/s00210-021-02161-8 . (Epub 2021/10/02 PubMed PMID: 34596729).

Harrison FE. A critical review of vitamin C for the prevention of age-related cognitive decline and Alzheimer’s disease. J Alzheimers Dis. 2012;29(4):711–26. https://doi.org/10.3233/JAD-2012-111853 . (Epub 2012/03/01. PubMed PMID: 22366772; PubMed Central PMCID: PMCPMC3727637).

Lauer AA, Grimm HS, Apel B, Golobrodska N, Kruse L, Ratanski E, et al. Mechanistic Link between Vitamin B12 and Alzheimer's Disease. Biomolecules. 2022;12(1). https://doi.org/10.3390/biom12010129 . Epub 2022/01/22. PubMed PMID: 35053277; PubMed Central PMCID: PMCPMC8774227.

Du K, Zheng X, Ma ZT, Lv JY, Jiang WJ, Liu MY. Association of Circulating Magnesium Levels in Patients With Alzheimer’s Disease From 1991 to 2021: A Systematic Review and Meta-Analysis. Front Aging Neurosci. 2021;13:799824. https://doi.org/10.3389/fnagi.2021.799824 . (Epub 2022/01/28. PubMed PMID: 35082658; PubMed Central PMCID: PMCPMC8784804).

Saitsu Y, Nishide A, Kikushima K, Shimizu K, Ohnuki K. Improvement of cognitive functions by oral intake of Hericium erinaceus. Biomed Res. 2019;40(4):125–31. https://doi.org/10.2220/biomedres.40.125 . (Epub 2019/08/16. PubMed PMID: 31413233).

Mori K, Inatomi S, Ouchi K, Azumi Y, Tuchida T. Improving effects of the mushroom Yamabushitake (Hericium erinaceus) on mild cognitive impairment: a double-blind placebo-controlled clinical trial. Phytother Res. 2009;23(3):367–72. https://doi.org/10.1002/ptr.2634 . (Epub 2008/10/11. PubMed PMID: 18844328).

Xiang S, Ji JL, Li S, Cao XP, Xu W, Tan L, et al. Efficacy and Safety of probiotics for the treatment of alzheimer’s disease, mild cognitive impairment, and Parkinson’s Disease: a systematic review and meta-analysis. Front Aging Neurosci. 2022;14:730036. https://doi.org/10.3389/fnagi.2022.730036 . (Epub 2022/02/22. PubMed PMID: 35185522; PubMed Central PMCID: PMCPMC8851038).

Fogelman I, West T, Braunstein JB, Verghese PB, Kirmess KM, Meyer MR, Contois JH, Shobin E, Ferber KL, Gagnon J, Rubel CE, Graham D, Bateman RJ, Holtzman DM, Huang S, Yu J, Yang S, Yarasheski KE. Independent study demonstrates amyloid probability score accurately indicates amyloid pathology. Ann Clin Transl Neurol. 2023;10(5):765–78. https://doi.org/10.1002/acn3.51763 . (Epub 2023 Mar 28. PMID: 36975407; PMCID: PMC10187729).

Tukey JW. Exploratory data analysis. Reading, MA: Addison-Wesley; 1977. https://doi.org/10.1002/bimj.4710230408 .

Zhuang Z, Yang R, Wang W, Qi L, Huang T. Associations between gut microbiota and Alzheimer’s disease, major depressive disorder, and schizophrenia. J Neuroinflammation. 2020;17(1):288. https://doi.org/10.1186/s12974-020-01961-8 . (PMID:33008395;PMCID:PMC7532639).

Cammann D, Lu Y, Cummings MJ, Zhang ML, Cue JM, Do J, Ebersole J, Chen X, Oh EC, Cummings JL, Chen J. Genetic correlations between Alzheimer’s disease and gut microbiome genera. Sci Rep. 2023;13(1):5258. https://doi.org/10.1038/s41598-023-31730-5 . (PMID:37002253;PMCID:PMC10066300).

Borsom EM, Conn K, Keefe CR, Herman C, Orsini GM, Hirsch AH, Palma Avila M, Testo G, Jaramillo SA, Bolyen E, Lee K, Caporaso JG, Cope EK. Predicting Neurodegenerative Disease Using Prepathology Gut Microbiota Composition: a Longitudinal Study in Mice Modeling Alzheimer’s Disease Pathologies. Microbiol Spectr. 2023;11(2):e0345822. https://doi.org/10.1128/spectrum.03458-22 . (Epub ahead of print. PMID: 36877047; PMCID: PMC10101110).

van Dyck CH, Swanson CJ, Aisen P, Bateman RJ, Chen C, Gee M, Kanekiyo M, Li D, Reyderman L, Cohen S, Froelich L, Katayama S, Sabbagh M, Vellas B, Watson D, Dhadda S, Irizarry M, Kramer LD, Iwatsubo T. Lecanemab in Early Alzheimer’s Disease. N Engl J Med. 2023;388(1):9–21. https://doi.org/10.1056/NEJMoa2212948 . (Epub 2022 Nov 29 PMID: 36449413).

Ornish D, Scherwitz LW, Billings JH, Brown SE, Gould KL, Merritt TA, Sparler S, Armstrong WT, Ports TA, Kirkeeide RL, Hogeboom C, Brand RJ. Intensive lifestyle changes for reversal of coronary heart disease. JAMA. 1998;280(23):2001–7. https://doi.org/10.1001/jama.280.23.2001 .

Ornish D, Weidner G, Fair WR, Marlin R, Pettengill EB, Raisin CJ, Dunn-Emke S, Crutchfield L, Jacobs FN, Barnard RJ, Aronson WJ, McCormac P, McKnight DJ, Fein JD, Dnistrian AM, Weinstein J, Ngo TH, Mendell NR, Carroll PR. Intensive lifestyle changes may affect the progression of prostate cancer. J Urol. 2005;174(3):1065–9. https://doi.org/10.1097/01.ju.0000169487.49018.73 . (discussion 1069-70. PMID: 16094059).

Ornish D, Lin J, Chan JM, Epel E, Kemp C, Weidner G, Marlin R, Frenda SJ, Magbanua MJM, Daubenmier J, Estay I, Hills NK, Chainani-Wu N, Carroll PR, Blackburn EH. Effect of comprehensive lifestyle changes on telomerase activity and telomere length in men with biopsy-proven low-risk prostate cancer: 5-year follow-up of a descriptive pilot study. Lancet Oncol. 2013;14(11):1112–20. https://doi.org/10.1016/S1470-2045(13)70366-8 . (Epub 2013 Sep 17 PMID: 24051140).

Kivipelto M et al. Multimodal preventive trial for Alzheimer’s disease. Alzheimer’s Dement. 2021;17(Suppl.10):e056105. https://alz-journals.onlinelibrary.wiley.com/doi/abs/10.1002/alz.056105 .

Ornish D, Brown SE, Scherwitz LW, Billings JH, Armstrong WT, Ports TA, McLanahan SM, Kirkeeide RL, Brand RJ, Gould KL. Can lifestyle changes reverse coronary heart disease? The Lifestyle Heart Trial. Lancet. 1990;336(8708):129–33. https://doi.org/10.1016/0140-6736(90)91656-u . (PMID: 1973470).

Acknowledgements

We are grateful to each of the following people who made this study possible. Paramount among these are all of the study participants and their spouse or support person. Their commitment was inspiring, and without them this study would not have been possible. Each of the staff who provided and supported this program is exceptionally caring and competent, and includes: Heather Amador, who coordinated and administered all grants and infrastructure; Tandis Alizadeh, who is chief of staff; as well as Lynn Sievers, Nikki Liversedge, Pamela Kimmel, Stacie Dooreck, Antonella Dewell, Stacey Dunn-Emke, Marie Goodell, Emily Dougherty, Kamala Berrio, Kristin Gottesman, Katie Mayers, Dennis Malone, Sarah & Mary Barber, Steven Singleton, Kevin Lane, Laurie Case, Amber O’Neill, Annie DiRocco, Alison Eastwood, Sara Henley, Sousha Naghshineh, Sarah Reinhard, Laura Kandell, Alison Haag, Sinead Lafferty, Haley Perkins, Chase Delaney, Danielle Marquez, Ava Hoffman, Sienna Lopez, and Sophia Gnuse. Dr. Caitlin Moore conducted much of the cognition and function testing along with Dr. Catherine Madison, Trevor Ragas, Andrea Espinosa, Lorraine Martinez, Davor Zink, Jeff Webb, Griffin Duffy, Lauren Sather, and others. Dr. Cecily Jenkins trained the ADAS-Cog rater. Dr. Jan Krumsiek and Dr. Richa Batra performed important analyses in Dr. Rima Kaddurah-Daouk’s lab. Dr. Pia Kivisåkk oversaw biomarker assays in Dr. Steven Arnold's lab. We are grateful to all of the referring neurologists. Board members of the nonprofit Preventive Medicine Research Institute provided invaluable oversight and support, including Henry Groppe, Jenard & Gail Gross, Ken Hubbard, Brock Leach, and Lee Stein, as well as Joel Goldman.

Author’s information

DO is the corresponding author. RT contributed as the senior author.

We are very grateful to Leonard A. Lauder & Judith Glickman Lauder; Gary & Laura Lauder; Howard Fillit and Mark Roithmayr of The Alzheimer’s Drug Discovery Foundation; Mary & Patrick Scanlan of the Mary Bucksbaum Scanlan Family Foundation; Laurene Powell Jobs/Silicon Valley Community Foundation; Pierre & Pamela Omidyar Fund/Silicon Valley Community Foundation (Pat Christen and Jeff Alvord); George Vradenburg Foundation/Us Against Alzheimer’s; American Endowment Foundation (Anna & James McKelvey); Arthur M. Blank Family Foundation/Around the Table Foundation (Elizabeth Brown, Natalie Gilbert, Christian Amica); John Paul & Eloise DeJoria Peace Love & Happiness Foundation (Constance Dykhuizen); Maria Shriver/Women’s Alzheimer’s Movement (Sandy Gleysteen, Laurel Ann Gonsecki, Erin Stein); Mark Pincus Family Fund/Silicon Valley Community Foundation; Christy Walton/Walton Family Foundation; Milken Family Foundation; The Cleveland Clinic Lou Ruvo Center for Brain Health (Larry Ruvo); Jim Greenbaum Foundation; R. Martin Chavez; Wonderful Company Foundation (Stewart & Lynda Resnick); Daniel Socolow; Anthony J. Robbins/Tony Robbins Foundation; John Mackey; John & Lisa Pritzker and the Lisa Stone Pritzker Family Foundation; Ken Hubbard; Greater Houston Community Foundation (Jenard & Gail Gross); Henry Groppe; Brock & Julie Leach Family Charitable Foundation; Bucksbaum/Baum Foundation (Glenn Bucksbaum & April Minnich); YPO Gold Los Angeles; Lisa Holland/Betty Robertson; the Each Foundation (Lionel Shaw); Moby Charitable Fund; California Relief Program; Gary & Lisa Schildhorn; McNabb Foundation (Ricky Rafner); Renaissance Charitable Foundation (Stephen & Karen Slinkard); Network for Good; Ken & Kim Raisler Foundation; Miner Foundation; Craigslist Charitable Fund (Jim Buckmaster and Annika Joy Quist); Gaurav Kapadia; Healing Works Foundation/Wayne Jonas; and the Center for Innovative Medicine (CIMED) at the Karolinska Institutet, Hjärnfonden, Stockholms Sjukhem, Research Council for Health Working Life and Welfare (FORTE). In-kind donations were received from Alan & Rob Gore of Body Craft Recreation Supply (exercise equipment), Dr. Andrew Abraham of Orgain, Paul Stamets of Fungi Perfecta (Host Defense Lion’s Mane), Nordic Naturals, and Flora. Dr. Rima Kaddurah-Daouk at Duke is PI of the Alzheimer Gut Microbiome Project (funded by NIA U19AG063744). She also received additional funding from NIA that has enabled her research (U01AG061359 & R01AG081322).

The funders had no role in the conceptualization; study design; data collection; analysis; and interpretation; writing of the report; or the decision to submit for publication.

Author information

Authors and affiliations.

Preventive Medicine Research Institute, 900 Bridgeway, Sausalito, CA, USA

Dean Ornish, Catherine Madison, Anne Ornish, Nancy DeLamarter, Noel Wingers & Carra Richling

University of California, San Francisco and University of California, San Diego, USA

Dean Ornish

Ray Dolby Brain Health Center, California Pacific Medical Center, San Francisco, CA, USA

Catherine Madison

Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institute, Karolinska vägen 37 A, SE-171 64, Solna, Sweden

Miia Kivipelto

Theme Inflammation and Aging, Karolinska University Hospital, Karolinska vägen 37 A, SE-171 64, Stockholm, Solna, Sweden

The Ageing Epidemiology (AGE) Research Unit, School of Public Health, Imperial College London, St Mary’s Hospital, Norfolk Place, London, W2 1PG, United Kingdom

Institute of Public Health and Clinical Nutrition, University of Eastern Finland, Yliopistonranta 8, 70210, Kuopio, Finland

Clinical Services, Preventive Medicine Research Institute, 900 Bridgeway, Sausalito, CA, USA

Colleen Kemp & Sarah Tranter

Division of Biostatistics, Department of Epidemiology & Biostatistics, UCSF, San Francisco, CA, USA

Charles E. McCulloch

Neurosciences, University of California, San Diego, CA, USA

Douglas Galasko

Clinical Neurology, School of Medicine, University of Nevada, Reno, USA

Renown Health Institute of Neurosciences, Reno, NV, USA

Harvard Medical School, Boston, MA, USA

Dorene Rentz, Rudolph E. Tanzi & Steven E. Arnold

Center for Alzheimer Research and Treatment, Boston, MA, USA

Dorene Rentz

Mass General Brigham Alzheimer Disease Research Center, Boston, MA, USA

Elizabeth Blackburn Lab, UCSF, San Francisco, CA, USA

UCSF, San Francisco, CA, USA

Departments of Medicine and Psychiatry, Duke University Medical Center and Member, Duke Institute of Brain Sciences, Durham, NC, USA

Rima Kaddurah-Daouk

Department of Pediatrics; Department of Computer Science & Engineering; Department of Bioengineering; Center for Microbiome Innovation, Halıcıoğlu Data Science Institute, University of California, San Diego, La Jolla, CA, USA

Department of Pediatrics and Scientific Director, American Gut Project and The Microsetta Initiative, University of California San Diego, La Jolla, CA, USA

Daniel McDonald

Bioinformatics and Systems Biology Program; Rob Knight Lab; Medical Scientist Training Program, University of California, San Diego, La Jolla, CA, USA

Lucas Patel

Buck Institute for Research on Aging, San Francisco, CA, USA

Eric Verdin

University of California, San Francisco, CA, USA

Genetics and Aging Research Unit, Boston, MA, USA

Rudolph E. Tanzi

McCance Center for Brain Health, Boston, MA, USA

Massachusetts General Hospital, Boston, MA, USA

Interdisciplinary Brain Center, Massachusetts General Hospital, Boston, MA, USA

Steven E. Arnold


Contributions

DO, CM, MK, CK, DG, JA, DR, CEM, JL, KN, AO, ST, ND, NW, CR, RKD, RK, EV, RT, and SEA were involved in the study design and conduct. DO conceptualized the study hypotheses (building on the work of MK), obtained funding, prepared the first draft of the manuscript, and is the principal investigator. CEM oversaw the statistical analyses and interpretation, and DR oversaw the cognition and function testing and interpretation. CK and ST oversaw all clinical operations and patient recruitment, including the IRB. JL conducted the telomere analyses. CM oversaw patient selection. AO developed the learning management system and community platform for patients and providers. KN managed an IRB. ND co-led most of the support groups, and CR oversaw all aspects involving nutrition. All authors participated in writing the manuscript. NW and ST oversaw data collection and prepared the databases other than the microbiome databases which were overseen by RK and prepared by DM and LP who helped design this part of the study. CM, CK, JL, RKD, RK, DM, and LP were involved in the acquisition of data. SA, RT, and RKD did biomarker analyses. All authors contributed to critical review of the manuscript and approved the final manuscript.

Corresponding author

Correspondence to Dean Ornish.

Ethics declarations

Competing interests.

MK is one of the Editors-in-Chief of this journal and has no relevant competing interests and recused herself from the review process. RKD is an inventor on key patents in the field of metabolomics and holds equity in Metabolon, a biotech company in North Carolina. In addition, she holds patents licensed to Chymia LLC and PsyProtix with royalties and ownership. DO and AO have consulted for Sharecare and have received book royalties and lecture honoraria and, with CK, have received equity in Ornish Lifestyle Medicine. RK is a scientific advisory board member and consultant for BiomeSense, Inc., has equity and receives income. He is a scientific advisory board member and has equity in GenCirq. He is a consultant and scientific advisory board member for DayTwo, and receives income. He has equity in and acts as a consultant for Cybele. He is a co-founder of Biota, Inc., and has equity. He is a cofounder of Micronoma, and has equity and is a scientific advisory board member. The terms of these arrangements have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. DM is a consultant for BiomeSense. RT is a co-founder and equity holder in Hyperion Rx, which produces the flashing-light glasses at a theta frequency of 7.83 Hz used as an optional aid to meditation. The rest of the authors declare that they have no competing interests.

Ethics approval and consent to participate

This clinical trial was approved by the Western Institutional Review Board on 12/31/2017 (approval number: 20172897) and all participants and their study partners provided written informed consent. The trial protocol was also approved by the appropriate Institutional Review Board of all participating sites; and all subjects provided informed consent.

Consent for publication

Informed consent was received from all patients. All data from research participants described in this paper is de-identified.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Ornish, D., Madison, C., Kivipelto, M. et al. Effects of intensive lifestyle changes on the progression of mild cognitive impairment or early dementia due to Alzheimer’s disease: a randomized, controlled clinical trial. Alz Res Therapy 16 , 122 (2024). https://doi.org/10.1186/s13195-024-01482-z


Received: 21 February 2024

Accepted: 15 May 2024

Published: 07 June 2024

DOI: https://doi.org/10.1186/s13195-024-01482-z


  • Alzheimer’s
  • Lifestyle medicine
  • Social support

Alzheimer's Research & Therapy

ISSN: 1758-9193


Data are from Epic Systems Corporation peer benchmarking. Center horizontal lines represent medians; lower and upper bounds of the boxes, 25th and 75th percentiles; vertical lines, 5th to 95th percentile; and dashed horizontal line, 4-hour standard set by The Joint Commission.

Data are from Epic Systems Corporation peer benchmarking. A, Hospital occupancy is the percentage of staffed beds; ED visits are from January 2020.

eTable. Sample Site Characteristics from the Epic Peer Benchmarking Service

  • Monthly Rates of Patients Who Left Before Accessing Care in US Emergency Departments JAMA Network Open Research Letter September 30, 2022 This cross-sectional study investigates rates of patients who left emergency departments without being seen from 2017 to 2021. Alexander T. Janke, MD; Edward R. Melnick, MD, MHS; Arjun K. Venkatesh, MD, MBA, MHS


Janke AT, Melnick ER, Venkatesh AK. Hospital Occupancy and Emergency Department Boarding During the COVID-19 Pandemic. JAMA Netw Open. 2022;5(9):e2233964. doi:10.1001/jamanetworkopen.2022.33964


Hospital Occupancy and Emergency Department Boarding During the COVID-19 Pandemic

  • 1 Department of Emergency Medicine, Yale University School of Medicine, New Haven, Connecticut
  • 2 VA Ann Arbor, University of Michigan, National Clinician Scholars Program, Ann Arbor
  • 3 Center for Outcomes Research and Evaluation, Yale University, New Haven, Connecticut

Emergency department (ED) boarding refers to holding admitted patients in the ED, often in hallways, while awaiting an inpatient bed. The Joint Commission identified boarding as a patient safety risk that should not exceed 4 hours. 1 Downstream harms include increased medical errors, compromises to patient privacy, and increased mortality. 2 Boarding is a key indicator of overwhelmed resources and may be more likely to occur when hospital occupancy exceeds 85% to 90%. 3

Hospital resource constraints have become more salient during the COVID-19 pandemic and have been associated with excess mortality. 4 Existing federal data fail to capture a comprehensive view of resource limitations inclusive of ED strain. 5 We used a national benchmarking database to examine hospital occupancy and ED boarding during the COVID-19 pandemic.

This cross-sectional study used aggregated hospital measures available through a voluntary peer benchmarking service offered by Epic Systems Corporation, an electronic health record vendor. Measures were collected monthly from January 2020 to December 2021. Annual ED visit volumes and total hospital beds for participating sites were included (eTable in the Supplement ). We reported median and 5th to 95th percentile for hospital occupancy (percentage of staffed inpatient beds occupied, calculated hourly and averaged over the month), ED boarding time (median time from admission order to ED departure to an inpatient bed), and ED visit count. The study was classified as exempt by the institutional review board at Yale University because the study did not use patient data. This study followed the STROBE reporting guideline.
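The summary statistics described above (median with 5th to 95th percentile) can be sketched as follows. This is only an illustration using the Python standard library: the study itself used R, the interpolation rule for percentiles is an assumption, and the sample values are hypothetical.

```python
from statistics import median, quantiles


def summarize(values):
    """Return (median, 5th percentile, 95th percentile) of a list of numbers."""
    # quantiles(..., n=20) returns 19 cut points at 5%, 10%, ..., 95%.
    # The "inclusive" interpolation method is an assumption, not the study's rule.
    cuts = quantiles(values, n=20, method="inclusive")
    return median(values), cuts[0], cuts[-1]


# Hypothetical boarding times in hours, one value per hospital for one month.
boarding_hours = [1.0, 1.5, 2.0, 2.5, 3.0, 4.5, 6.0, 8.0, 9.5]
med, p5, p95 = summarize(boarding_hours)
print(f"median {med:.2f} h (5th-95th percentile, {p5:.2f}-{p95:.2f} h)")
```

Applying the same function to each month's occupancy, boarding time, and visit count would give the three monthly series the study reports.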

Distribution of ED boarding time was examined across hospital occupancy levels, with a threshold of 85% or greater based on Kelen et al. 3 We plotted all 3 measures with new national daily COVID-19 cases. 6 The difference in median ED boarding time between high-occupancy and low-occupancy hospital-months was evaluated using the Wilcoxon rank sum test. Analyses were performed using R, version 4.0.2.
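The comparison described above can be sketched as a two-sided Wilcoxon rank sum test on hospital-months split at the 85% occupancy threshold. The version below is a minimal standard-library illustration, not the authors' R code: it uses the normal approximation with a tie correction and no continuity correction, and the `records` data and function name are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, z score, p value).
    """
    combined = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t**3 - t
        i = j
    w = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u = w - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Variance is zero if every value is tied; that case is not handled here.
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, z, p


# Hypothetical hospital-months: (occupancy in percent, median boarding hours).
records = [(90.1, 6.5), (87.3, 7.2), (92.0, 5.9), (70.2, 2.1), (65.4, 2.6), (80.0, 3.0)]
high = [hours for occ, hours in records if occ >= 85]
low = [hours for occ, hours in records if occ < 85]
u_stat, z_score, p_value = rank_sum_test(high, low)
```

With only a handful of observations an exact test would be preferable; the normal approximation is reasonable for samples the size of the study's, which included more than a thousand hospitals per month.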

Hospitals reporting benchmarking data increased from 1289 in January 2020 to 1769 in December 2021. Occupancy rates and boarding time had a threshold association: when occupancy exceeded 85%, boarding exceeded The Joint Commission 4-hour standard for 88.9% of hospital-months (Figure 1). In those hospital-months, median ED boarding time was 6.58 hours compared with 2.42 hours in other hospital-months (P < .001). Across all hospitals, the median ED boarding time was 2.00 hours (5th-95th percentile, 0.93-7.88 hours) in January 2020, 1.58 hours (5th-95th percentile, 0.90-3.51 hours) in April 2020, and 3.42 hours (5th-95th percentile, 1.27-9.14 hours) in December 2021. Median hospital occupancy was highest in January 2020 (69.6%; 5th-95th percentile, 44.3%-69.6%); it was 48.7% (5th-95th percentile, 28.7%-69.9%) in April 2020 and 65.8% (5th-95th percentile, 42.7%-84.8%) in December 2021 (Figure 2).

We found that hospital occupancy greater than 85% was associated with increased ED boarding beyond the 4-hour standard. Throughout 2020 and 2021, ED boarding increased even when hospital occupancy did not increase above January 2020 levels. The harms associated with ED boarding and crowding, long-standing before the pandemic, may have been further entrenched. Study limitations included the inability to differentiate occupancy for specific services, the likelihood that median measures of boarding underestimated the actual burden, and a sample anchored to specific data fields within the Epic peer benchmarking service. Future research should explore more complex measures like staffing variability and local outbreak burden. Policy makers should address acute care system strain in future pandemic waves and other disasters to avoid further hospital system capacity strain and unsafe patient care conditions.

Accepted for Publication: June 30, 2022.

Published: September 30, 2022. doi:10.1001/jamanetworkopen.2022.33964

Open Access: This is an open access article distributed under the terms of the CC-BY License . © 2022 Janke AT et al. JAMA Network Open .

Corresponding Author: Alexander T. Janke, MD, VA Ann Arbor, University of Michigan, National Clinician Scholars Program, NCRC Building 14, G100, 2800 Plymouth Rd, Ann Arbor MI 48109 ( [email protected] ).

Author Contributions: Dr Janke had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design: All authors.

Acquisition, analysis, or interpretation of data: Janke, Melnick.

Drafting of the manuscript: Janke, Melnick.

Critical revision of the manuscript for important intellectual content: All authors.

Statistical analysis: Janke.

Obtained funding: Janke.

Administrative, technical, or material support: Melnick, Venkatesh.

Supervision: Melnick, Venkatesh.

Conflict of Interest Disclosures: Dr Janke reported receiving support from the Veterans Affairs (VA) Office of Academic Affiliations through the VA/National Clinician Scholars Program and the University of Michigan and funding from an Emerging Infectious Diseases and Preparedness grant from the Society for Academic Emergency Medicine Foundation. Dr Melnick reported receiving grants from the National Institute on Drug Abuse, the American Medical Association, and the Agency for Healthcare Research & Quality outside the submitted work. Dr Venkatesh reported receiving grants from the Centers for Medicare & Medicaid Services and the American College of Emergency Physicians outside the submitted work; receiving funding from an Emerging Infectious Diseases and Preparedness grant from the Society for Academic Emergency Medicine Foundation; and having committee leadership roles with the American College of Emergency Physicians and the Society for Academic Emergency Medicine. No other disclosures were reported.

Funding/Support: Dr Venkatesh was supported by the American Board of Emergency Medicine–National Academy of Medicine Fellowship.

Role of the Funder/Sponsor: The funder had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Disclaimer: The contents of this article do not represent the views of the US Department of Veterans Affairs or the US Government.


Osteosarcopenia increases the risk of mortality: a systematic review and meta-analysis of prospective observational studies

  • Open access
  • Published: 18 June 2024
  • Volume 36 , article number  132 , ( 2024 )


  • Nicola Veronese 1 , 2 ,
  • Francesco Saverio Ragusa 1 ,
  • Shaun Sabico 2 ,
  • Ligia J. Dominguez 3 ,
  • Mario Barbagallo 1 ,
  • Gustavo Duque 4 , 5 &
  • Nasser Al-Daghri 2  

Background & aims

Osteosarcopenia is a recently recognized geriatric syndrome. The association between osteosarcopenia and mortality risk is still largely underexplored. In this systematic review with meta-analysis of prospective cohort studies, we aimed to explore whether osteosarcopenia could be associated with a higher mortality risk.

Several databases were searched from the inception to 16th February 2024 for prospective cohort studies dealing with osteosarcopenia and mortality. We calculated the mortality risk in osteosarcopenia vs. controls using the most adjusted estimate available and summarized the data as risk ratios (RRs) with their 95% confidence intervals (CIs). A random-effect model was considered for all analyses.

Among 231 studies initially considered, nine articles were included after exclusions for a total of 14,429 participants (mean age: 70 years; 64.5% females). The weighted prevalence of osteosarcopenia was 12.72%. Over a mean follow-up of 6.6 years and after adjusting for a mean of four covariates, osteosarcopenia was associated with approximately 53% increased risk of mortality (RR: 1.53; 95% CI: 1.28–1.78). After accounting for publication bias, the re-calculated RR was 1.48 (95%CI: 1.23–1.72). The quality of the studies was generally good, as determined by the Newcastle Ottawa Scale.

Conclusions

Osteosarcopenia was significantly linked with an increased risk of mortality in older people, indicating the need to consider the presence of osteoporosis in patients with sarcopenia, and vice versa, since the combination of these two conditions typical of older people may lead to further complications, such as mortality.


Introduction

Osteosarcopenia is a term derived from “osteo” (bone) and “sarcopenia” (loss of muscle mass and strength) [ 1 ]. This condition refers to the concurrent presence of osteoporosis and sarcopenia, two age-related musculoskeletal conditions with significant implications for health and functional independence in older adults [ 1 ]. While osteoporosis and sarcopenia have traditionally been viewed as distinct entities, emerging evidence suggests that they often coexist and share common pathophysiological mechanisms, leading to a synergistic decline in musculoskeletal health [ 2 ].

Nowadays, the importance of osteosarcopenia lies in its profound impact on overall health, mobility, and quality of life in older individuals [ 3 ]. On the one hand, osteoporosis, characterized by low bone mass and microarchitectural deterioration of bone tissue, increases the risk of fragility fractures, particularly in the spine, hip, and wrist, resulting in pain, disability, and loss of independence [ 4 ]. Sarcopenia, on the other hand, involves the progressive loss of muscle mass, strength, and function, leading to impaired physical performance, increased risk of falls, and functional decline [ 5 ].

The coexistence of osteoporosis and sarcopenia in osteosarcopenia probably further exacerbates these adverse outcomes, creating a vicious cycle of frailty, disability, and mortality in older adults [ 6 ]. Individuals with osteosarcopenia are at heightened risk of falls, fractures, hospitalizations, and institutionalization, placing a substantial burden on healthcare systems and society as a whole [ 7 ].

Understanding the etiology, epidemiology, and clinical consequences of osteosarcopenia is essential for developing effective prevention and management strategies to optimize musculoskeletal health and promote healthy aging. In this regard, the association between osteosarcopenia and mortality is still underexplored.

Given this background, with this systematic review and meta-analysis of prospective cohort studies, we aimed to explore whether osteosarcopenia could be associated with a higher mortality risk.

This systematic review and meta-analysis was conducted in accordance with the updated 2020 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [ 8 ]. The protocol has been registered in Open Science Framework ( https://osf.io/5drnu/ ).

Search strategy

Two independent reviewers (NV and FSR) searched PubMed, Web of Science, and Embase from inception until 16 February 2024. The full search strategy and the search terms used are described in Supplementary Table 1 . Discrepancies in the literature search process were resolved by a third investigator (SS).

Inclusion and exclusion criteria

Studies were included based on the following criteria: (i) baseline data from observational prospective studies; (ii) clear diagnostic criteria for osteosarcopenia, indicated as validated criteria for osteoporosis and for sarcopenia; (iii) reporting data regarding mortality and summarizing these data as hazard ratios (HRs) or risk ratios (RRs) deriving from multivariate analyses; and (iv) inclusion of adults both with and without osteosarcopenia. Published articles were excluded if they (i) were reviews, letters, in vivo or in vitro experiments, commentaries, or posters; or (ii) were not published as a full text or were not in English, since the literature has shown that excluding such papers has little impact on the effect estimates and conclusions of systematic reviews [ 9 ].

Data extraction and risk of bias

Two authors (NV and FSR) extracted data independently, which included name of first author, date of publication, country of origin, participant age, study design, population studied, number of participants, definition of sarcopenia and osteoporosis, tools and criteria for assessing sarcopenia and osteoporosis, follow-up time in years, main condition, number and type of adjustments in statistical analyses. Disagreements between reviewers were resolved by one independent reviewer (SS).

The Newcastle-Ottawa Scale (NOS) was used to assess the study quality/risk of bias [ 10 ]. The NOS assigns a maximum of 9 points based on three quality parameters: selection, comparability, and outcome. The evaluation was made by two investigators (FSR and NV) and checked by another (SS). The risk of bias was consequently categorized as high (< 5/9 points), moderate (6–7), or low (8–9) [ 11 ].
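The cut-offs above can be written as a small lookup. This is a sketch with a hypothetical function name; note that the stated bands do not assign a category to a score of exactly 5, so the sketch reports that score as unclassified instead of guessing.

```python
def nos_risk_of_bias(score):
    """Map a Newcastle-Ottawa Scale total (0-9) to the risk-of-bias bands in the text."""
    if not isinstance(score, int) or not 0 <= score <= 9:
        raise ValueError("NOS total must be an integer from 0 to 9")
    if score < 5:
        return "high"
    if 6 <= score <= 7:
        return "moderate"
    if score >= 8:
        return "low"
    # A score of 5 is not covered by the bands as stated (assumption: leave it unlabeled).
    return "unclassified"


# Hypothetical NOS totals for three studies.
for total in (4, 7, 9):
    print(total, nos_risk_of_bias(total))
```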

The outcome of our interest was mortality (overall or specific), reported using any method, including death certificates, medical records, administrative data, or other information, such as asking for information from relatives.

Statistical analysis

The primary analysis compared the cumulative incidence of mortality in patients with osteosarcopenia versus controls, summarizing the data derived from multivariate statistical analyses. In the case of univariate analyses, the number of confounders was set to zero. Then, we calculated the risk ratios (RRs) with their 95% confidence intervals (CIs). Statistical significance was assessed using the random effects model and inverse-variance method [ 12 ].
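A random-effects, inverse-variance pooling of study-level RRs can be sketched as below. The text does not name the between-study variance estimator, so the DerSimonian-Laird estimator used here is an assumption, as are the log-scale back-calculation of standard errors from the reported 95% CIs, the function names, and the example inputs. The analyses in the paper were run in STATA.

```python
import math


def log_rr_and_se(rr, ci_low, ci_high):
    """Convert an RR with its 95% CI to (log RR, standard error of log RR)."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    return math.log(rr), se


def pool_random_effects(studies):
    """Pool (rr, ci_low, ci_high) tuples; return pooled RR and its 95% CI."""
    effects = [log_rr_and_se(*s) for s in studies]
    y = [e for e, _ in effects]
    v = [se**2 for _, se in effects]
    w = [1 / vi for vi in v]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    # DerSimonian-Laird between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    w_star = [1 / (vi + tau2) for vi in v]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return math.exp(pooled), math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)


# Hypothetical adjusted estimates from three cohorts.
example = [(1.40, 1.05, 1.87), (1.62, 1.10, 2.39), (1.55, 1.20, 2.00)]
pooled_rr, ci_low, ci_high = pool_random_effects(example)
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effect inverse-variance weights, so both models give the same pooled estimate.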

Statistical heterogeneity of outcome measurements between studies was assessed using the overlap of their 95% confidence intervals (CIs) and expressed as I². Heterogeneity was classified as low for I² from 30 to 49%, moderate from 50 to 74%, and high from 75% and above [ 13 ]. In case of high heterogeneity, a random-effect meta-regression was planned to explore potential sources of variability that could affect estimates among studies [ 14 ]. We planned to consider as moderators the mean age of the population, the percentage of females, the number of adjustments in multivariate analyses (set to zero for univariate analyses), and follow-up in years, but the main outcome did not show any statistical heterogeneity.
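The heterogeneity statistic and the bands above can be illustrated with a short sketch. It uses the standard definition of I² from Cochran's Q and its degrees of freedom (number of studies minus one). The function names are hypothetical, values below 30% are not given a label in the text and are reported as unclassified, and treating the bands as contiguous for non-integer values is an assumption.

```python
def i_squared(q, df):
    """I² as a percentage: the share of total variation attributed to heterogeneity."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def classify_heterogeneity(i2):
    """Apply the bands stated in the text (low 30-49%, moderate 50-74%, high 75%+)."""
    if i2 >= 75:
        return "high"
    if i2 >= 50:
        return "moderate"
    if i2 >= 30:
        return "low"
    return "unclassified"  # below 30%: no label given in the text


# Hypothetical example: Q = 20 across 11 studies (df = 10).
i2 = i_squared(20, 10)
print(f"I2 = {i2:.0f}% ({classify_heterogeneity(i2)})")
```

Because I² is truncated at zero, any Q at or below its degrees of freedom yields I² = 0%.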

Publication bias was assessed by visually inspecting funnel plots and using the Egger bias test [ 15 ]. In case of statistically significant publication bias, the trim-and-fill analysis was used [ 15 ]. For all analyses, a P-value less than 0.05 was considered statistically significant. All analyses were performed using STATA version 14.0 (StataCorp).
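The regression behind the Egger test can be sketched as follows: each study's standardized effect (log RR divided by its standard error) is regressed on its precision (one over the standard error), and an intercept far from zero suggests funnel plot asymmetry. This is a minimal standard-library illustration with hypothetical names and inputs, not the STATA routine used in the paper. It returns the intercept and its standard error and leaves the p value (from a t distribution with n − 2 degrees of freedom) to a statistics package.

```python
import math


def egger_intercept(log_effects, standard_errors):
    """Return (intercept, standard error of the intercept) of Egger's regression."""
    x = [1 / se for se in standard_errors]  # precision
    y = [e / se for e, se in zip(log_effects, standard_errors)]  # standardized effect
    n = len(x)
    if n < 3:
        raise ValueError("at least three studies are needed")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residual_var = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_intercept = math.sqrt(residual_var * (1 / n + mean_x**2 / sxx))
    return intercept, se_intercept


# Hypothetical log RRs and standard errors for five studies.
log_rrs = [0.34, 0.48, 0.44, 0.60, 0.30]
ses = [0.15, 0.20, 0.13, 0.30, 0.10]
intercept, se_intercept = egger_intercept(log_rrs, ses)
```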

Literature search

Among the 231 studies initially identified, we screened 114 records and retrieved 13 full texts. At this stage, four studies were excluded: two were reviews [ 7 , 16 ], one did not report meta-analyzable data on mortality (mortality was only included in a composite outcome) [ 17 ], and one had limited data about the diagnosis of osteosarcopenia [ 18 ]. Finally, we included nine cohort studies [ 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 ]. The literature search selection is summarized in the PRISMA flowchart (Supplementary Fig. 1).

Descriptive characteristics

Table 1 shows the main descriptive characteristics of the studies included. Overall, the nine cohort studies included a total of 14,429 participants, followed up for a mean of 6.6 years. Participants had a mean age of 70 (SD = 6) years and were predominantly female (64.5%). The studies were conducted in Asia ( n = 4), Europe ( n = 2), South America ( n = 2), and Oceania ( n = 1); none was conducted in Africa. Three studies were conducted among community-dwelling older people, while the other six considered specific medical conditions, such as cirrhosis or hip fracture (see Table 1 for further details). Regarding the diagnosis of sarcopenia, five studies used the criteria proposed by international societies, which combine the evaluation of body composition parameters with muscle strength and/or physical performance, one study used phase angle parameters, and the other three studies used criteria specific to the population examined. Similarly, the diagnosis of osteoporosis was based on a T-score of less than −2.5 SD in six studies and less than −1 SD in two studies, while one study used criteria specific to the population included (Table 1).

Osteosarcopenia as a risk factor for mortality: meta-analysis

Figure 1 shows the prevalence of osteosarcopenia in the studies included. Overall, the studies reported that 1,147 of 14,429 participants had osteosarcopenia, for a weighted prevalence of 12.72% (95% CI: 9.65–15.78) (Fig. 1). The prevalence varied widely, from 2.78% [ 21 ] to 38.46% [ 20 ], leading to substantial heterogeneity (I² = 99%).
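A short arithmetic check helps to read these figures. The crude proportion obtained by dividing all cases by all participants is lower than the weighted prevalence reported above, presumably because the pooled estimate weights studies rather than individual participants. The variable names below are illustrative.

```python
cases, participants = 1147, 14429

# Crude (unweighted) prevalence: every participant counts equally
crude_prevalence = 100 * cases / participants
print(f"Crude prevalence: {crude_prevalence:.2f}%")

# Weighted prevalence as reported in the text (not recomputed here)
reported_weighted = 12.72
print(f"Difference: {reported_weighted - crude_prevalence:.2f} percentage points")
```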

Fig. 1. Prevalence of osteosarcopenia in the studies included

Figure 2 shows the association between osteosarcopenia at baseline and mortality. After adjusting the analyses for a mean of four potential confounders (see the list in Table 1), the presence of osteosarcopenia significantly increased the risk of mortality in the cohort studies included by 53% (RR = 1.53; 95% CI: 1.28–1.78). This analysis was not affected by any significant heterogeneity (I² = 0%), and all the studies except one [ 26 ] reported a significant association between osteosarcopenia and mortality.

Fig. 2. Meta-analysis of osteosarcopenia as a predictor of mortality

This outcome was, however, affected by the presence of publication bias (Egger’s test p-value < 0.0001): after using the trim-and-fill analysis, with four studies trimmed at the left of the mean, the association was only slightly reduced (RR = 1.48; 95%CI: 1.23–1.72).

Risk of bias

The risk of bias evaluation is reported in Supplementary Table 2 . Overall, the mean NOS was 8, with no study at possible high risk of bias. The main source of risk of bias was the short time of follow-up, less than 5 years.

In this systematic review with meta-analysis, including nine cohort studies with a total of 14,429 participants followed up for a mean of 6.6 years, we found that the presence of osteosarcopenia at baseline increased the risk of mortality by 53%, even after accounting for several potential confounders. Although the outcome was affected by publication bias, the trim-and-fill analysis only slightly attenuated our findings.

The first crucial epidemiological point is the high prevalence of osteosarcopenia found in our meta-analysis, i.e., about 12.7%. Osteosarcopenia represents a growing concern in aging populations. While individual prevalence estimates vary, studies suggest a substantial overlap between osteoporosis and sarcopenia, with prevalence rates ranging from 5 to 20% in older adults [ 28 ]. Importantly, the prevalence of osteosarcopenia is expected to rise in parallel with the aging of the population, placing a significant burden on healthcare systems and society [ 28 ]. Our review, using a meta-analytic approach, confirms the epidemiological importance of this entity in geriatrics across different clinical situations.

Overall, the pooled analysis indicated that osteosarcopenia significantly increased the risk of mortality, and the results were not affected by any heterogeneity, with practically all the studies reporting a significant positive association between osteosarcopenia and mortality. Our findings are in agreement with two previous reviews reporting that osteosarcopenia increased the risk of mortality [ 7 , 16 ]. Although these two systematic reviews increased our knowledge about this important topic, they included only three [ 7 ] and five studies [ 16 ], respectively, and were therefore based on more limited literature than our work. Indeed, according to several previous studies, both osteoporosis and sarcopenia individually increase the risk of mortality [ 5 , 29 ]. Thus, it is reasonable that osteosarcopenia could significantly increase the risk of mortality, as it involves the co-existence of these two conditions [ 7 ]. Importantly, the presence of osteosarcopenia significantly affected mortality regardless of the definition used, although the definitions of both sarcopenia and osteoporosis were clinically heterogeneous across studies. Altogether, our findings suggest that the importance of osteosarcopenia lies not in the diagnostic criteria used to define it but in identifying this entity in order to treat it effectively and prevent mortality.

Osteosarcopenia can increase the risk of mortality through different mechanisms. First, and most obviously, osteosarcopenia could increase the risk of falls and fractures, including hip fractures [ 7 , 30 ]. Both falls and fractures are widely known risk factors for mortality in older people [ 31 ]. In this regard, sarcopenia is a progressive and generalized skeletal muscle disorder characterized by the loss of muscle mass and function and is known to be associated with an increased risk of adverse outcomes such as fractures, falls, frailty, disability, and mortality [ 5 ]. Moreover, sarcopenia also represents a significant economic burden worldwide, which is projected to increase remarkably in the next 40 years [ 32 ]. At the same time, osteoporosis is a chronic skeletal disorder characterized by low bone mass and mineral density, along with the deterioration of bone tissue microarchitecture, leading to bone fragility and consequent susceptibility to fractures, disability, and mortality [ 29 ]. With the aging of the global population, these two conditions will become more prevalent, and the incidence of osteosarcopenia will thus increase dramatically in the upcoming decades [ 7 ]. Therefore, osteosarcopenia represents an important public health issue that deserves great attention globally, also because it significantly increases the risk of death independently of potential confounders.

The findings of this systematic review must be considered within its limitations. First, due to insufficient original data, we could not estimate whether the risk of mortality associated with osteosarcopenia was higher than that associated with sarcopenia or osteoporosis alone. Second, even if the I² was < 50%, the diagnostic criteria for osteosarcopenia differed among studies, which may have affected the results from a clinical point of view and prevented a univocal definition of this entity. For example, some studies included osteoporotic patients, but others involved osteopenic participants; similarly, sarcopenia was defined according to different criteria. Third, some studies explored osteosarcopenia among community dwellers, while others analyzed specific populations. Fourth, even if we used the results of multivariable analyses, the adjustment factors differed among studies.

In conclusion, our systematic review suggests that osteosarcopenia significantly increases the risk of mortality by about 53% compared to controls. Our results underline the need to consider the presence of osteoporosis in sarcopenic patients, and vice versa, since the combination of these two conditions, typical of older people, may lead to further adverse complications, such as mortality.

Data availability

Data are available from the corresponding author upon reasonable request.

Hirschfeld H, Kinsella R, Duque G (2017) Osteosarcopenia: where bone, muscle, and fat collide. Osteoporos Int 28:2781–2790

Zanker J, Duque G (2020) Osteosarcopenia: the path beyond controversy. Curr Osteoporos Rep 18:81–84

Kirk B, Miller S, Zanker J, Duque G (2020) A clinical guide to the pathophysiology, diagnosis and treatment of osteosarcopenia. Maturitas 140:27–33

Curtis EM, Reginster J-Y, Al-Daghri N, Biver E, Brandi ML, Cavalier E, Hadji P, Halbout P, Harvey NC, Hiligsmann M (2022) Management of patients at very high risk of osteoporotic fractures through sequential treatments. Aging Clin Exp Res 34:695–714

Veronese N, Demurtas J, Soysal P, Smith L, Torbahn G, Schoene D, Schwingshackl L, Sieber C, Bauer J, Cesari M (2019) Sarcopenia and health-related outcomes: an umbrella review of observational studies. Eur Geriatr Med 1–10

Inoue T, Maeda K, Nagano A, Shimizu A, Ueshima J, Murotani K, Sato K, Hotta K, Morishita S, Tsubaki A (2021) Related factors and clinical outcomes of osteosarcopenia: a narrative review. Nutrients 13:291

Teng Z, Zhu Y, Teng Y, Long Q, Hao Q, Yu X, Yang L, Lv Y, Liu J, Zeng Y (2021) The analysis of osteosarcopenia as a risk factor for fractures, mortality, and falls. Osteoporos Int 32:2173–2183

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE (2021) The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Syst Reviews 10:1–11

Dobrescu A, Nussbaumer-Streit B, Klerings I, Wagner G, Persad E, Sommer I, Herkner H, Gartlehner G (2021) Restricting evidence syntheses of interventions to English-language publications is a viable methodological shortcut for most medical topics: a systematic review. J Clin Epidemiol 137:209–217

Luchini C, Stubbs B, Solmi M, Veronese N (2017) Assessing the quality of studies in meta-analyses: advantages and limitations of the Newcastle Ottawa Scale. World J Meta-Analysis 5:80–84

Luchini C, Veronese N, Nottegar A, Shin JI, Gentile G, Granziol U, Soysal P, Alexinschi O, Smith L (2021) Assessing the quality of studies in meta-research: Review/guidelines on the most important quality assessment tools. 20:185–195

DerSimonian R, Laird N (1986) Meta-analysis in clinical trials. Control Clin Trials 7:177–188

Higgins JP, Thompson SG, Deeks JJ, Altman DG (2003) Measuring inconsistency in meta-analyses. BMJ 327:557–560

Cumpston M, Li T, Page MJ, Chandler J, Welch VA, Higgins JP, Thomas J (2019) Updated guidance for trusted systematic reviews: a new edition of the Cochrane Handbook for Systematic Reviews of Interventions. Cochrane Database Syst Rev 10:14651858

Duval S, Tweedie R (2000) A nonparametric trim and fill method of accounting for publication bias in meta-analysis. J Am Stat Assoc 95:89–98

Chen S, Xu X, Gong H et al (2024) Global epidemiological features and impact of osteosarcopenia: a comprehensive meta-analysis and systematic review. J Cachexia Sarcopenia Muscle 15:8–20

Nakano Y, Mandai S, Naito S et al (2024) Effect of osteosarcopenia on longitudinal mortality risk and chronic kidney disease progression in older adults. Bone 179:116975

de Cuevas KMM, Sepúlveda-Loyola W, Araya-Quintanilla F, Morselli JB, Molari M, Probst VS (2022) Association between clinical measures for the diagnosis of osteosarcopenia with functionality and mortality in older adults: longitudinal study. Nutricion Clinica y Dietetica Hospitalaria. 42:143–151

Balogun S, Winzenberg T, Wills K, Scott D, Callisaya M, Cicuttini F, Jones G, Aitken D (2019) Prospective associations of osteosarcopenia and osteodynapenia with incident fracture and mortality over 10 years in community-dwelling older adults. Arch Gerontol Geriatr 82:67–73

Kara GK, Ozturk C (2023) Effect of osteosarcopenia on the development of a second compression fracture and mortality in elderly patients after vertebroplasty. Acta Orthop Traumatol Turc 57:271–276

Paulin TK, Malmgren L, McGuigan FE, Akesson KE (2024) Osteosarcopenia: prevalence and 10-Year fracture and mortality risk - A Longitudinal, Population-based study of 75-Year-old women. Calcif Tissue Int 114:315–325

Saeki C, Kanai T, Ueda K, Nakano M, Oikawa T, Torisu Y, Saruta M, Tsubota A (2023) Osteosarcopenia predicts poor survival in patients with cirrhosis: a retrospective study. BMC Gastroenterol 23:196

Salech F, Marquez C, Lera L, Angel B, Saguez R, Albala C (2021) Osteosarcopenia predicts Falls, Fractures, and mortality in Chilean Community-Dwelling older adults. J Am Med Dir Assoc 22:853–858

Shimada H, Suzuki T, Doi T, Lee S, Nakakubo S, Makino K, Arai H (2023) Impact of osteosarcopenia on disability and mortality among Japanese older adults. J Cachexia Sarcopenia Muscle 14:1107–1116

Xiang T, Fu P, Zhou L (2023) Sarcopenia and Osteosarcopenia among patients undergoing hemodialysis. Front Endocrinol (Lausanne) 14:1181139

Yoo JI, Kim H, Ha YC, Kwon HB, Koo KH (2018) Osteosarcopenia in patients with hip fracture is related with high mortality. J Korean Med Sci 33:e27

Loyola WS, de Barros Morselli J, Quintanilla FA, Teixeira D, Bustos AA, Molari M, Fuenzalida JJV, Probst VS (2023) Clinical impact of osteosarcopenia on mortality, physical function and chronic inflammation: a 9-year follow up cohort study. Nutricion Clin Y Dietetica Hospitalaria 43:133–140

Kirk B, Zanker J, Duque G (2020) Osteosarcopenia: epidemiology, diagnosis, and treatment—facts and numbers. Wiley Online Library, pp 609–618

Leboime A, Confavreux CB, Mehsen N, Paccou J, David C, Roux C (2010) Osteoporosis and mortality. Joint bone Spine 77:S107–S112

Chen S, Xu X, Gong H, Chen R, Guan L, Yan X, Zhou L, Yang Y, Wang J, Zhou J (2024) Global epidemiological features and impact of osteosarcopenia: a comprehensive meta-analysis and systematic review. J Cachexia Sarcopenia Muscle 15:8–20

James SL, Lucchesi LR, Bisignano C, Castle CD, Dingels ZV, Fox JT, Hamilton EB, Henry NJ, Krohn KJ, Liu Z (2020) The global burden of falls: global, regional and national estimates of morbidity and mortality from the global burden of Disease Study 2017. Inj Prev 26:i3–i11

Bruyère O, Beaudart C, Ethgen O, Reginster J-Y, Locquet M (2019) The health economics burden of Sarcopenia: a systematic review. Maturitas 119:61–69

This work was supported by the Distinguished Scientist Fellowship Program (DSFP) of the King Saud University, Riyadh, Kingdom of Saudi Arabia.

Open access funding provided by Università degli Studi di Palermo within the CRUI-CARE Agreement.

Author information

Authors and affiliations.

Geriatric Unit, Department of Internal Medicine and Geriatrics, University of Palermo, Palermo, 90127, Italy

Nicola Veronese, Francesco Saverio Ragusa & Mario Barbagallo

Chair for Biomarkers of Chronic Diseases, Biochemistry Department, College of Science, King Saud University, Riyadh, 11451, Saudi Arabia

Nicola Veronese, Shaun Sabico & Nasser Al-Daghri

Department of Medicine and Surgery, Kore University of Enna, Enna, 94100, Italy

Ligia J. Dominguez

Bone, Muscle & Geroscience Group, Research Institute of the McGill University Health Centre, Montreal, QC, Canada

Gustavo Duque

Dr Joseph Kaufmann Chair in Geriatric Medicine, Department of Medicine, McGill University, Montreal, QC, Canada

Contributions

Preparation of the first draft: Veronese, Ragusa, Sabico; data analysis: Veronese, Ragusa; critical revision: Dominguez, Barbagallo, Duque, Al-Daghri. All authors approved the final version submitted to the journal.

Corresponding author

Correspondence to Nicola Veronese .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Conflict of interest

Additional information, publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Veronese, N., Ragusa, F.S., Sabico, S. et al. Osteosarcopenia increases the risk of mortality: a systematic review and meta-analysis of prospective observational studies. Aging Clin Exp Res 36 , 132 (2024). https://doi.org/10.1007/s40520-024-02785-9

Received : 05 April 2024

Accepted : 23 May 2024

Published : 18 June 2024

DOI : https://doi.org/10.1007/s40520-024-02785-9

  • Osteosarcopenia
  • Meta-analysis
  • Osteoporosis
  • Open access
  • Published: 11 June 2024

BUB1 regulates non-homologous end joining pathway to mediate radioresistance in triple-negative breast cancer

  • Sushmitha Sriramulu 1   na1 ,
  • Shivani Thoidingjam 1   na1 ,
  • Wei-Min Chen 2 ,
  • Oudai Hassan 3 ,
  • Farzan Siddiqui 1 , 4 , 5 ,
  • Stephen L. Brown 1 , 4 , 5 ,
  • Benjamin Movsas 1 , 4 , 5 ,
  • Michael D. Green 6 ,
  • Anthony J. Davis 2 ,
  • Corey Speers 6 , 7 ,
  • Eleanor Walker 1 , 4 , 5 &
  • Shyam Nyati 1 , 4 , 5  

Journal of Experimental & Clinical Cancer Research volume 43, Article number: 163 (2024)

Triple-negative breast cancer (TNBC) is a highly aggressive breast cancer subtype that is often treated with radiotherapy (RT). Due to its intrinsic heterogeneity and lack of effective targets, it is crucial to identify novel molecular targets that would increase RT efficacy. Here we demonstrate the role of BUB1 (cell cycle Ser/Thr kinase) in TNBC radioresistance and offer a novel strategy to improve TNBC treatment.

Gene expression analysis was performed to identify genes upregulated in TNBC patient samples compared to other subtypes. Cell proliferation and clonogenic survival assays determined the IC50 of the BUB1 inhibitor (BAY1816032) and the radiation enhancement ratio (rER) with pharmacologic and genomic BUB1 inhibition. Mammary fat pad xenograft experiments were performed in CB17/SCID mice. The mechanism through which the BUB1 inhibitor sensitizes TNBC cells to radiotherapy was delineated by γ-H2AX foci assays, BLRR, immunoblotting, qPCR, CHX chase, and cell fractionation assays.

BUB1 is overexpressed in BC and its expression is considerably elevated in TNBC with poor survival outcomes. Pharmacological or genomic ablation of BUB1 sensitized multiple TNBC cell lines to cell killing by radiation, although breast epithelial cells showed no radiosensitization with BUB1 inhibition. The kinase function of BUB1 is mainly responsible for this radiosensitization phenotype. BUB1 ablation also led to radiosensitization in TNBC tumor xenografts with significantly increased tumor growth delay and overall survival. Mechanistically, BUB1 ablation inhibited the repair of radiation-induced DNA double strand breaks (DSBs). BUB1 ablation stabilized phospho-DNAPKcs (S2056) following RT such that half-lives could not be estimated. In contrast, RT alone caused BUB1 stabilization, but pre-treatment with BUB1 inhibitor prevented stabilization (t1/2, ~8 h). Nuclear and chromatin-enriched fractionations illustrated an increase in recruitment of phospho- and total-DNAPK, and KAP1 to chromatin, indicating that BUB1 is indispensable in the activation and recruitment of non-homologous end joining (NHEJ) proteins to DSBs. Additionally, BUB1 staining of TNBC tissue microarrays demonstrated significant correlation of BUB1 protein expression with tumor grade.

Conclusions

BUB1 ablation sensitizes TNBC cell lines and xenografts to RT and BUB1 mediated radiosensitization may occur through NHEJ. Together, these results highlight BUB1 as a novel molecular target for radiosensitization in women with TNBC.

Breast cancer (BC) affects more than 2 million women worldwide each year. Triple-negative breast cancer (TNBC) is the most lethal subtype of BC, and while effective targeted therapies exist for the prevention and treatment of ER-positive breast cancer, no effective targeted therapy exists for TNBC. TNBCs tend to be more aggressive, occur in younger women, and are less likely to be cured by adjuvant therapy [ 1 ]. As radiotherapy is standard in the management of BC, there is a need to identify molecular targets with the potential to increase the efficacy of radiation therapy (RT). To this end, DNA damage repair pathways are of interest.

DNA damage is a critical determinant of radiation-induced cell death [ 2 ]. Radiation-mediated base damage and single strand breaks (SSBs) are repaired more efficiently by cells, whereas double strand breaks (DSBs) are more difficult to repair and, if unrepaired, are lethal to cells. The ability of cells to recognize and respond to DSBs is fundamental in determining the sensitivity (or resistance) of cells to radiation [ 3 ]. DSB repair comprises two major and mechanistically distinct processes: non-homologous end-joining (NHEJ) and homologous recombination (HR). NHEJ involves directly ligating two broken DNA ends and is initiated by binding of Ku70/Ku80 heterodimers at DSB sites [ 4 ]. Ku70/Ku80 localization recruits the DNA-dependent protein kinase catalytic subunit (DNAPKcs) to the DSB site, followed by Artemis-dependent end-processing, strand synthesis by DNA polymerase-beta (POLβ), and strand ligation by the XRCC4, ligase IV, and XLF complex [ 5 ]. HR, on the other hand, is initiated by lesion recognition by ATM and processing of DSB ends by the MRN complex (Mre11-Rad50-Nbs1). The 53BP1 protein may play a role in pathway choice between NHEJ and HR [ 6 , 7 ]. Target-based radiosensitization approaches increase radiotherapy efficiency by selectively sensitizing tumor tissue to ionizing radiation [ 8 ]. Several new molecular targets are currently being evaluated in clinical trials to measure their radiation sensitization potential [ 9 ].

Following DNA damage, cell cycle checkpoints are activated to block cell cycle progression and prevent propagation of cells with damaged DNA. Both DNA damage repair and cell cycle checkpoints are positively regulated by several kinases, including BUB1 (Budding uninhibited by benzimidazoles-1). BUB1 is a serine/threonine kinase implicated in chromosomal segregation during mitosis. BUB1 regulates the cell cycle and is known to impact DNA damage signaling. However, it is still uncertain how BUB1 contributes to radioresistance in TNBC. BUB1 is known to localize near DSB sites where early DNA damage sensor proteins such as phosphorylated H2AX are also recruited [ 10 ]. Moreover, BUB1 co-localizes with 53BP1, suggesting a role in the NHEJ pathway [ 10 ]. Knockdown of BUB1 results in prolonged γH2AX foci and comet tail formation as well as hypersensitivity in response to ionizing radiation [ 11 ]. Increased expression of BUB1 is associated with resistance to DNA-damaging agents (i.e., radiotherapy and some chemotherapies) [ 12 ], and we have shown that BUB1 inhibition reduces invasion and migration in cancer cell lines [ 13 ] through direct interaction with TGFβ receptors [ 14 , 15 ]. Moreover, BUB1 regulates the cell cycle through its roles in the spindle assembly checkpoint and chromosome alignment [ 16 , 17 , 18 ].

Here, we demonstrate that BUB1 is overexpressed in TNBC, and that its overexpression correlates with poorer outcome and radiation resistance. Moreover, we confirm that pharmacological or genomic ablation of BUB1 is cytotoxic to TNBC cell lines and leads to radiation sensitization. BUB1 ablation delays DSB repair as evident by prolonged γH2AX foci and affects NHEJ as evaluated by bioluminescent DNA damage repair reporters (BLRR). BUB1 inhibition causes significant decrease in tumor volume when combined with radiation in SUM159 mammary fat pad tumor xenograft models and demonstrates significant reduction in tumor cell proliferation as evaluated by Ki67 immunostaining of tumor sections. Additionally, our mechanistic studies show that BUB1 mediates radioresistance through impacting chromatin localization of core NHEJ proteins and increasing radiation mediated DNAPKcs phosphorylation and stability. Overall, our results provide evidence that BUB1 mediated radiation resistance takes place through NHEJ, specifically by regulating chromatin binding of key proteins and that combining BUB1 ablation with radiation could be an effective approach for radiosensitization of TNBC.

Gene expression data

Normalized expression data for the cell lines were downloaded from the EMBL-EBI ArrayExpress website as described in the original publication [ 19 ]. The Hatzis gene expression and survival data were downloaded from the Gene Expression Omnibus (GEO) database with series number GSE25066 [ 20 ]. A log-rank (Mantel-Cox) test was used for survival curve analyses. Data for the TCGA cohort were downloaded from http://tcga-data.nci.nih.gov . Expression levels were log transformed, median centered, and scaled; subtype calls were based on a previous description [ 21 ].
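The normalization described above can be sketched for a single gene as follows. The text does not state the logarithm base or the scaling factor, so the use of log2 and of the sample standard deviation are assumptions made only for illustration, as are the function name and the input values.

```python
import math
import statistics

def normalize_gene(values):
    """Log-transform, median-center, and scale one gene's expression values.
    Assumptions: log base 2 and scaling by the sample standard deviation."""
    logged = [math.log2(v) for v in values]
    median = statistics.median(logged)
    centered = [x - median for x in logged]
    sd = statistics.stdev(centered)
    return [x / sd for x in centered] if sd > 0 else centered

# Hypothetical raw intensities for one gene across five samples
normalized = normalize_gene([1, 2, 4, 8, 16])
print(normalized)
```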

A receiver operating characteristic (ROC) curve was generated as an alternate way to measure the performance of BUB1 as a biomarker, using the area under the curve (AUC) as a metric, with an AUC >0.65 considered to be of significant clinical value. BUB1 expression was evaluated as a continuous variable. BUB1 expression was measured using RNA isolated from patients’ tumors at the time of surgical excision, and log-transformed values from the Affymetrix Human Genome U133A Array were then assessed. Other clinical covariates included ER, PR, overall stage, size, nodal status, and PAM50 classification (p = 0.0003).
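The AUC for a continuous marker can be computed without drawing the curve, as the probability that a randomly chosen event case has a higher marker value than a randomly chosen non-event case (ties count as one half). The sketch below illustrates this with hypothetical expression values and outcome labels; only the 0.65 threshold is taken from the text.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

AUC_THRESHOLD = 0.65  # cut-off for clinical value quoted in the text

# Hypothetical marker values and outcomes (1 = event, 0 = no event)
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(auc, auc > AUC_THRESHOLD)
```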

Gene expression and metastasis correlation

In vivo screening for metastases was performed using Chick Chorioallantoic Membrane assays in 21 preclinical breast cancer models with data published previously [ 22 ]. Correlation coefficients were calculated using Pearson’s correlation methods.
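For reference, Pearson's correlation coefficient is the covariance of two variables divided by the product of their standard deviations. A minimal sketch follows; the paired values stand in for gene expression and a metastasis readout and are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical expression values and metastasis scores for five models
expression = [2.1, 3.4, 4.0, 5.2, 6.8]
metastasis = [0.8, 1.1, 1.9, 2.2, 3.0]
print(pearson_r(expression, metastasis))
```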

Cell culture

Triple-negative breast cancer cell lines (MDA-MB-231, MDA-MB-468, BT-549), a normal breast epithelial cell line (MCF10A), and the Estrogen Receptor (+), Progesterone Receptor (+) breast cancer cell line T-47D were obtained from the American Type Culture Collection (ATCC). SUM159 cells were originally sourced from Steve P. Ethier (University of Michigan) and were acquired from Sofia Merajver (University of Michigan). SUM159 cells were grown in HAM’S F-12 media (Catalog No. 31765035, Thermo Fisher Scientific) supplemented with 5% FBS, 10 mM HEPES, 1 μg/ml hydrocortisone, 6 μg/ml insulin, and 1% Penicillin-Streptomycin. MDA-MB-231 and MDA-MB-468 cells were grown in DMEM media (Catalog No. 30-2002, ATCC) supplemented with 10% FBS and 1% Penicillin-Streptomycin. BT-549 and T-47D cells were grown in RPMI-1640 media (Catalog No. 30-2001, ATCC) supplemented with 10% FBS, 0.023 U/ml insulin, and 1% Penicillin-Streptomycin. MCF10A cells were also grown in RPMI-1640 media supplemented with 10% FBS and 1% Penicillin-Streptomycin. All cell lines were maintained at 37 °C in a 5% CO2 incubator and passaged at 70% confluence. Cell lines were routinely tested for mycoplasma contamination. Mutations in key genes are listed in Supplementary Table S1 (see Additional file 1 for Supplementary figures and tables).

Drug treatment and irradiation

The BUB1 inhibitor (BUB1i) BAY1816032 (Catalog No. HY-103020, MedChemExpress) and the DNAPK inhibitor (DNAPKi) NU7441 (Catalog No. S2638, Selleckchem) were dissolved in DMSO (20 mM BUB1i and 15 mM DNAPKi) and stored at −80 °C. For each experiment, a fresh vial was thawed, and any remaining stock solution was discarded. Working concentrations were made in media with serum and supplements, and cells were exposed to a range of concentrations, from 125 nM to 1000 nM. Irradiation was performed 1 h after the drug treatment using a CIX-3 orthovoltage unit (Xstrahl Life Sciences) operating at 320 kV and 10 mA with a 1 mm Cu filter.
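The dilution from stock to working concentration follows C1 × V1 = C2 × V2. The sketch below computes the stock volume needed for a given final concentration; the 10 ml final volume and the function name are assumptions for illustration, while the 20 mM stock and the 125 nM to 1000 nM range are those stated above.

```python
def stock_volume_ul(stock_mM, final_nM, final_volume_ml):
    """Volume of stock (in µl) needed to reach final_nM in final_volume_ml,
    from C1 * V1 = C2 * V2."""
    stock_nM = stock_mM * 1e6                 # 1 mM = 1,000,000 nM
    final_volume_ul = final_volume_ml * 1000
    return final_nM * final_volume_ul / stock_nM

# 20 mM BUB1i stock diluted into a hypothetical 10 ml of medium
for target_nM in (125, 250, 500, 1000):
    print(target_nM, "nM:", stock_volume_ul(20, target_nM, 10), "µl of stock")
```

Because the resulting volumes are well below 1 µl, an intermediate dilution step would normally be needed in practice; the text does not describe one.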

Proliferation assay

To investigate the effect of BAY1816032 on cell proliferation in TNBC cell lines, 2 × 10³ cells were plated into a 96-well plate 24 h prior to treatment. Cells were exposed to different concentrations of BUB1 inhibitor (BUB1i) ranging from 1 nM to 10 μM and cultured for 72 h. Cell proliferation was measured using alamarBlue (Catalog No. DAL1025, Thermo Fisher Scientific) following the manufacturer’s protocols. Absorbance was read at 570 nm on a Synergy H1 Hybrid Reader (BioTek Instruments). Values were normalized to mock (DMSO/vehicle) treated cells. The IC50 values were estimated in GraphPad Prism (V9) using a non-linear regression best-fit equation.
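To make the IC50 estimation concrete, the sketch below fits a two-parameter Hill curve (top fixed at 1, bottom at 0) to vehicle-normalized values by a simple grid search over the IC50 and the slope. This is a stand-in for the non-linear regression performed in GraphPad Prism, not a reproduction of it; the model choice, the grid ranges, and the synthetic data are assumptions.

```python
import math

def hill(conc_nM, ic50_nM, slope):
    """Fraction of vehicle signal remaining at a given concentration."""
    return 1.0 / (1.0 + (conc_nM / ic50_nM) ** slope)

def fit_ic50(concs_nM, viability):
    """Least-squares grid search over IC50 (1 nM to 10 µM) and Hill slope."""
    best = None
    for i in range(401):                  # log10(IC50 / nM) from 0 to 4, step 0.01
        ic50 = 10 ** (i / 100)
        for j in range(5, 31):            # slope from 0.5 to 3.0, step 0.1
            slope = j / 10
            sse = sum((hill(c, ic50, slope) - v) ** 2
                      for c, v in zip(concs_nM, viability))
            if best is None or sse < best[0]:
                best = (sse, ic50, slope)
    return best[1], best[2]

# Synthetic, noise-free data generated with IC50 = 500 nM and slope = 1
concs = [1, 10, 100, 1000, 10000]
observed = [hill(c, 500.0, 1.0) for c in concs]
ic50, slope = fit_ic50(concs, observed)
print(round(ic50), slope)
```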

Clonogenic survival assay

Cells were plated in 6-well plates at different cell densities overnight. The next morning, cells were treated with BAY1816032 (125 nM to 1000 nM) for 1 h and irradiated (2 to 6 Gy). Cells were allowed to grow for 7-23 days until visible colonies formed before being fixed and stained with methanol and crystal violet. All colonies with >50 cells were manually counted, and cell survival was plotted using GraphPad Prism (V9). Plating efficiency (PE %) was estimated as: (Number of colonies formed / Number of cells plated) × 100. Radiation enhancement ratios (rER) were determined from the survival curves (Microsoft Excel) using the formula: D bar of vehicle (DMSO) / D bar of each inhibitor concentration, that is, the radiation dose that produces a given level of cell killing in the absence of inhibitor (i.e., vehicle) divided by the radiation dose that produces the same level of cell killing in the presence of the inhibitor. An rER >1 was considered to indicate radiation sensitization, while an rER <1 indicated radiation resistance/protection.
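The calculations in this assay can be sketched as follows. The mean inactivation dose (D bar) is approximated here as the area under the survival curve on a linear scale, computed with the trapezoidal rule up to the highest dose tested, and the rER is taken as vehicle over inhibitor, the direction in which a value above 1 corresponds to sensitization. Function names, the truncation at the highest dose, and all numbers are assumptions for illustration.

```python
def plating_efficiency(colonies, cells_plated):
    """PE (%) = colonies formed / cells plated x 100 (unirradiated control)."""
    return 100.0 * colonies / cells_plated

def surviving_fraction(colonies, cells_plated, pe_percent):
    """Colonies per cell plated, corrected for the plating efficiency."""
    return (colonies / cells_plated) / (pe_percent / 100.0)

def mean_inactivation_dose(doses_gy, fractions):
    """Trapezoidal area under the survival curve (linear scale), in Gy."""
    area = 0.0
    for i in range(1, len(doses_gy)):
        width = doses_gy[i] - doses_gy[i - 1]
        area += width * (fractions[i] + fractions[i - 1]) / 2.0
    return area

def radiation_enhancement_ratio(doses_gy, sf_vehicle, sf_inhibitor):
    """rER = D bar (vehicle) / D bar (inhibitor); > 1 suggests sensitization."""
    return (mean_inactivation_dose(doses_gy, sf_vehicle)
            / mean_inactivation_dose(doses_gy, sf_inhibitor))

# Hypothetical survival data
doses = [0, 2, 4, 6]
sf_vehicle = [1.0, 0.6, 0.3, 0.1]
sf_inhibitor = [1.0, 0.4, 0.15, 0.04]
rer = radiation_enhancement_ratio(doses, sf_vehicle, sf_inhibitor)
print(round(rer, 2))
```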

Transfections

Cells were seeded in 6-well plates overnight, and transfection was performed when cells were 60% confluent. The siGENOME SMARTPool siRNAs for human BUB1 and DNAPK (gene ID: PRKDC) were purchased from Dharmacon. The next morning, 100 nM siRNAs were diluted in Opti-MEM reduced serum media (Catalog No. 31985062, Thermo Fisher Scientific) and transfected using Lipofectamine RNAiMAX (Catalog No. 13778075, Thermo Fisher Scientific). Diluted siRNAs and Lipofectamine reagent were incubated separately for 5 min at room temperature before being combined and incubated for a further 20-30 min. The siRNA-lipid complex was added to the cells in plain media without serum and antibiotics. After 48 h, the transfected cells were used for further experiments. BUB1 wild-type (WT) and kinase-dead (KD) plasmids were a kind gift from Dr. Hongtao Yu (UT Southwestern). The BUB1 plasmids, or siRNA together with plasmids, were transfected with Lipofectamine 2000 (Catalog No. 11668500, Thermo Fisher Scientific) per the manufacturer’s protocol.

Generation of BUB1 CRISPR knockout cell lines

Cells were transfected with CRISPR/Cas9 ribonucleoprotein (RNP) to generate BUB1 knockout cell lines. BUB1 sgRNAs were designed with the CRISPR tool ( www.benchling.com ) and synthesized by Integrated DNA Technologies (IDT). We combined two sgRNAs (sgRNA1 and sgRNA2), each targeting a different exon (exons 2 and 3), for better knockout efficiency. Purified Cas9 protein was purchased from IDT (Catalog No. 1081058) while Lipofectamine RNAiMAX (Catalog No. 11668027) was from Thermo Fisher Scientific. TNBC cell lines were plated in 2 wells of a 24-well plate overnight. The next morning, the cells in one well were transfected with BUB1 sgRNA1 and sgRNA2 (300 ng/well each) and Cas9 protein (1 µg/well) using Lipofectamine RNAiMAX (3 µl/well), and the other well was used as a negative control without sgRNAs [ 23 , 24 , 25 ]. Twenty-four hours after transfection, cells were trypsinized and plated in 96-well plates at 1 cell/well. Cells were allowed to grow until colonies formed (2-4 weeks) and were then expanded into 24-well plates. Genomic DNA was isolated from these clones using the QuickExtract DNA extraction solution (Catalog No. QE09050, Lucigen) following the manufacturer's protocol. The extracted DNA was PCR-amplified under the following conditions: 98 °C for 30 s; 34 cycles of 98 °C for 10 s, 61.5 °C for 30 s, and 72 °C for 23 s; and a final extension at 72 °C for 10 min. The putative BUB1 null clones were sequence verified (Sanger sequencing, Azenta Life Sciences, NJ, USA) and absence of BUB1 protein was confirmed by western immunoblotting. The efficiency of BUB1 CRISPR knockout was estimated with Synthego ICE software. gRNA sequences for BUB1 knockout and primer sequences for PCR amplification and Sanger sequencing are listed in Supplementary Table S2.

Immunoblotting

Total protein was extracted using IP-lysis buffer (50 mM Tris pH 7.4, 1% NP40, 0.25% deoxycholate sodium salt, 150 mM NaCl, 10% glycerol, and 1 mM EDTA) supplemented with PhosStop (Roche), protease inhibitor (Roche), sodium orthovanadate, sodium fluoride, PMSF, and β-glycerol phosphate (2 µM each). Protein concentrations were determined using the Pierce BCA protein assay kit (Catalog No. 23225, Thermo Fisher Scientific), and equal amounts of samples were loaded on NuPAGE 4-12% Bis-Tris Midi protein gels (Catalog No. WG1402BOX, Thermo Fisher Scientific) along with SeeBlue Plus2 Pre-stained Protein Standard (Catalog No. LC5925, Thermo Fisher Scientific). Samples were transferred to Immobilon-P PVDF membranes (Catalog No. IPVH00010, Millipore). The blots were blocked using 5% non-fat dry milk (Catalog No. 1706404, BioRad) and/or 5% BSA and incubated with primary antibodies at 4 °C overnight. Membranes were then incubated with HRP-tagged secondary antibodies, and protein bands were detected using the ECL Prime western blotting system (Catalog No. GERPN2232, Millipore Sigma). Protein band density was measured using ImageJ 1.52a. Specific antibody information and dilutions are listed in Supplementary Table S3.

Animal studies

Fox Chase SCID female mice (CB17/lcr Prkdcscid/lcrlcoCrl; 8 weeks old) (N = 52) were procured from Charles River Laboratories through the Department of Bioresources, Henry Ford Health. Mice were acclimatized for a week and housed at the Animal Facility, E&R building, Henry Ford Hospital. Experimental animals were housed and handled in accordance with protocols approved by the IACUC of Henry Ford Health (protocol # 00001298). We used >9-10 mice per treatment group (2 tumors/mouse). After injecting SUM159 cells (1 x 10⁶ bilaterally) into the 4th mammary fat pads, animals were randomly assigned to receive treatment once the tumors reached a size of about 80 mm³. BAY1816032 (25 mg/kg, in vivo grade, Catalog No. CT-BAY181, Chemietek) dissolved in 50% PEG 400, 10% DMSO, and 40% saline was given orally twice daily (5 days per week) for four weeks. RT was administered in three 5 Gy fractions over 5 days (total 15 Gy) using the small animal radiation research platform (SARRP, Xstrahl Life Sciences). Animals in which tumors were generated with SUM159 BUB1 CRISPR KO cells were treated only with radiation or sham irradiated. Tumor size and animal body weight were measured twice a week, tumor dimensions with a digital vernier caliper, and tumor volume was calculated using the formula: (Length x Width²) x 3.14/6. When the tumor volume reached >1000 mm³, mice were euthanized according to IACUC guidelines. A linear mixed model (LMM) of log2 (tumor volume) was built on time and the time*arm interaction. The LMM was clustered by each tumor and nested within each mouse. 95% CI was 0.1 while p-value <0.001 was considered significant. Animal survival was estimated and depicted in a Kaplan-Meier survival plot. A log-rank test was performed to assess whether the arms (treatment groups) differed (p<0.0001). A Cox proportional hazards model with Firth’s penalized maximum likelihood bias reduction method was used to test whether experimental conditions resulted in significant differences.
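As a worked illustration of the tumor volume formula and the log2 transform used for modeling, the following Python sketch applies (Length x Width²) x 3.14/6 to a single caliper measurement and checks the >1000 mm³ euthanasia threshold. The function names and the example measurements are hypothetical.

```python
import math


def tumor_volume_mm3(length_mm, width_mm):
    """Tumor volume from caliper measurements: (Length x Width^2) x 3.14 / 6."""
    return (length_mm * width_mm ** 2) * 3.14 / 6


def log2_volume(volume_mm3):
    """log2-transformed volume, the scale on which the mixed model was built."""
    return math.log2(volume_mm3)


def reached_endpoint(volume_mm3, limit_mm3=1000.0):
    """True once the tumor exceeds the volume limit stated in the text."""
    return volume_mm3 > limit_mm3


# Hypothetical measurement: 6 mm long, 5 mm wide.
volume = tumor_volume_mm3(6.0, 5.0)
print(f"volume = {volume:.1f} mm^3, log2 = {log2_volume(volume):.2f}")
print("endpoint reached:", reached_endpoint(volume))
```

A 6 mm x 5 mm tumor gives 78.5 mm³ under this formula, close to the ~80 mm³ size at which treatment assignment took place.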

Immunohistochemical staining

Five random tumors from each treatment group were harvested, fixed in buffered formalin, and paraffin embedded. Histological sections from individual paraffin-embedded xenograft tumor tissues were first deparaffinized and rehydrated. These tumor sections were stained at the Histology core (Henry Ford Health) according to the manufacturer’s protocols (Ki-67 IHC MIB-1, Dako Omnis). Proliferating cells were immunostained with FLEX monoclonal mouse anti-human Ki-67 (Catalog No. GA626, Ready-to-use (Dako Omnis), Clone MIB-1, Agilent). Images of the slides were taken under a light microscope at 20x magnification in two to three random fields for each tumor ( N = 5, each arm). The % of Ki-67 positive cells was calculated using the formula: 100 x number of Ki-67 positive cells in treated / sham. H&E staining was also performed to assess structural changes in the tumor sections.

γH2AX foci formation assay

Cells (1 x 10⁵) were plated into a 6-well plate containing glass coverslips (12 mm, Catalog No. 633029, Carolina). After treatment with BUB1i or DNAPKi (1 μM) for an hour, cells were irradiated (4 Gy) and coverslips were collected at different time points. Coverslips were washed with ice-cold PBS, fixed with 2% w/v sucrose and 0.2% Triton X-100 in formaldehyde, and permeabilized with 0.5% Triton in PBS for 10 min. Coverslips were blocked for 30 min at 4 °C (2.5% horse serum, 2.5% FBS, 0.5% w/v BSA, 0.05% Triton X-100 in PBS) and incubated with anti-phospho-Histone H2A.X (Ser139; Millipore) for 1 h at room temperature. After washing, coverslips were incubated with goat anti-mouse Alexa Fluor 488 (Thermo Fisher Scientific) for 30 min at 4 °C. DAPI (1 µg/ml) was used as a nuclear counterstain, and the coverslips were mounted onto glass slides using ProLong Gold Antifade Mountant (Catalog No. P10144, Thermo Fisher Scientific) and observed under a microscope (Zeiss Axio Imager 2). At least 3 random fields were imaged for each condition. Cells with more than 10 γH2AX foci were scored as positive. The percentage of γH2AX foci positive cells was calculated as: (100 x number of γH2AX foci positive cells / Total number of cells counted).
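The scoring rule can be sketched in a few lines of Python: a cell counts as positive when it has more than 10 foci, and the percentage is taken over all cells counted. The function name and the per-cell foci counts below are hypothetical.

```python
def percent_foci_positive(foci_per_cell, threshold=10):
    """Percentage of cells with more than `threshold` foci."""
    if not foci_per_cell:
        raise ValueError("at least one cell must be counted")
    positive = sum(1 for foci in foci_per_cell if foci > threshold)
    return 100.0 * positive / len(foci_per_cell)


# Hypothetical foci counts for five cells pooled from the imaged fields.
counts = [0, 3, 11, 25, 10]
print(f"{percent_foci_positive(counts):.1f}% of cells are foci positive")
```

Note that a cell with exactly 10 foci is scored as negative under the "more than 10" rule.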

Bioluminescent NHEJ and HR reporter assays

The rate of DNA repair by NHEJ and HR was measured with the bioluminescent repair reporter (BLRR) kindly gifted by Dr. Christian E Badr, Harvard Medical School [ 26 ]. Cells (2.5 x 10⁵) were plated in a 6-well plate and transfected with pLenti-BLRR (Addgene # 158958), pLenti-trGluc (Addgene # 158959), and pX330-gRNA (Addgene # 158973) plasmids using Lipofectamine 2000. After 48 h of transfection, the cells were reseeded into a 96-well plate and treated with BUB1i or DNAPKi (1 μM) for 1 h followed by RT (4 Gy), after which the media was replaced with fresh media. After 48 h of treatment, the cell supernatant was collected and centrifuged, and 20 μl was transferred to a white opaque 96-well plate (Catalog No. IP-DP35F-96-W, Stellar Scientific). 1 mM Coelenterazine (Catalog No. 16123, Cayman Chemical) diluted to 80 μl was added to the supernatant and Gaussia luciferase activity (GLuc; HR efficiency) was measured for 0.8 s on a Synergy H1 Hybrid (Biotek Instruments). 6.16 mM Vargulin (Catalog No. 305, NanoLight Technology) diluted in 50 μl was added to measure Cypridina luciferase activity (VLuc; NHEJ efficiency) with an integration time of 1 s.

Quantitative PCR

Cells (1.5 x 10⁵) were seeded in 6-well plates 24 h prior to treatment with BUB1i or DNAPKi (1 μM) and irradiation (4 Gy). Cells were harvested after 72 h and stored at -80 °C. Total RNA was isolated with TRIzol (Catalog No. 15596026, Thermo Fisher Scientific) and its concentration was measured on a Nanodrop (Nanodrop 2000c, Thermo Fisher Scientific). RNA was reverse transcribed into cDNA using the SuperScript III Reverse Transcriptase kit (Catalog No. 18080044, Thermo Fisher Scientific), dNTPs (Catalog No. R0191, Thermo Fisher Scientific), and Random Primers (Catalog No. 48190011, Thermo Fisher Scientific). The qPCR was performed using Takyon Low ROX SYBR 2X MasterMix (Catalog No. UF-LSMT-B0701, Eurogentec) and KiCqStart pre-designed SYBR green gene-specific primers (Supplementary Table S4) in a QuantStudio 6 Flex Real-Time PCR System (Applied Biosystems). The expression level of each gene was normalized to GAPDH for each experiment. All qRT-PCR reactions were performed in triplicate and all experiments were repeated at least three times.
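The text states only that expression was normalized to GAPDH. One common way to carry out such a normalization is the 2^-ΔΔCt calculation, sketched below in Python as an assumption rather than as the confirmed procedure of this study. The function names and the Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(ct_target, ct_gapdh):
    """Ct of the gene of interest minus Ct of the GAPDH reference."""
    return ct_target - ct_gapdh


def fold_change(target_treated, gapdh_treated, target_control, gapdh_control):
    """Relative expression (2^-ddCt) from technical-replicate Ct values."""
    dct_treated = delta_ct(mean(target_treated), mean(gapdh_treated))
    dct_control = delta_ct(mean(target_control), mean(gapdh_control))
    return 2.0 ** -(dct_treated - dct_control)


# Hypothetical triplicate Ct values for one gene and GAPDH.
fc = fold_change(
    target_treated=[24.0, 24.1, 23.9],
    gapdh_treated=[18.0, 18.0, 18.0],
    target_control=[25.0, 25.1, 24.9],
    gapdh_control=[18.0, 18.0, 18.0],
)
print(f"fold change vs control = {fc:.2f}")
```

In this illustration the treated sample crosses threshold one cycle earlier than the control after GAPDH normalization, which corresponds to roughly a two-fold increase.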

Cycloheximide-chase assay

1.5 x 10⁵ cells were seeded in a 6-well plate overnight. The next morning, cells were treated with 50 μM cycloheximide (CHX; Catalog No. 14126, Cayman Chemical) to block nascent protein synthesis, followed by BAY1816032 (1 μM) and 4 Gy irradiation. BUB1 CRISPR KO cells were treated with CHX followed by 4 Gy irradiation. Proteins were eluted at different time points (0-24 h) by direct lysis (IP lysis buffer with 1.25X SDS protein loading buffer), sonicated, and boiled for 7-8 min before loading on SDS-PAGE gels. Protein band density was quantified using ImageJ 1.52a software, and fold change was calculated using Microsoft Excel. The graphs were plotted in GraphPad Prism 9 software and the average half-life of BUB1 protein (t1/2) was determined using Microsoft Excel.
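The text does not specify how the half-life was derived from band densities. One simple possibility, shown here only as a sketch, assumes first-order decay and fits a straight line to ln(remaining fraction) against time by least squares, giving t1/2 = ln 2 / k. The function name and the example densities are hypothetical.

```python
import math


def half_life_hours(times_h, remaining_fractions):
    """Half-life from a least-squares fit of ln(fraction) versus time.

    Assumes first-order decay: fraction(t) = exp(-k * t).
    """
    logs = [math.log(f) for f in remaining_fractions]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("protein level does not decay over time")
    return math.log(2) / -slope


# Hypothetical band densities normalized to the 0 h time point.
times = [0, 2, 4, 8]
fractions = [1.0, 0.5, 0.25, 0.0625]
print(f"t1/2 = {half_life_hours(times, fractions):.1f} h")
```

The example data halve every 2 h, so the fit returns a half-life of 2 h; real densitometry data would scatter around the fitted line.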

Subcellular fractionation

The effect of BUB1 ablation on localization and movement of key DNA repair proteins on break sites and chromatin was investigated by subcellular fractionation assays. Different protein fractions were collected using the Subcellular Protein Fractionation Kit (Catalog No. 78840, Thermo Fisher Scientific) according to the manufacturer’s protocols. Briefly, 2.5 x 10⁶ cells were plated in 100 mm petri dishes 48 h prior to treatment. BUB1i was added 1 h prior to irradiation (8 Gy) and cells were allowed to recover for 10 min. Cells were harvested, protein fractions were eluted as recommended, and 40 μg protein was loaded onto NuPAGE 4-12% Bis-Tris gels for western blot analysis.

Laser micro-irradiation

U2-OS cells expressing YFP-tagged Ku80 and YFP-tagged DNA-PKcs were generated in earlier studies (PMID: 22179609, PMID: 35580045). YFP-Ku80 and YFP-DNA-PKcs were transfected into U2-OS DNA-PKcs +/+ and −/− cells with JetPrime® (Polyplus transfection reagent, Catalog No. 101000027) following the manufacturer's instructions. To observe the role of BUB1 in the accumulation of DNA-PKcs and Ku80 at DNA DSBs, BUB1 was inhibited, and the cells were subjected to laser micro-irradiation. Twenty-four hours after transfection, laser micro-irradiation and real-time recruitment were carried out using a Carl Zeiss Axiovert 200M microscope with a Plan-Apochromat 63X/NA 1.40 oil immersion objective (Carl Zeiss) as described in previous studies [ 27 ]. A 365-nm pulsed nitrogen laser (Spectra-Physics) connected directly to the microscope's epifluorescence path was used to create DSBs [ 27 ]. During micro-irradiation, the cells were kept in Invitrogen CO2-independent medium at 37 °C. The fluorescence intensities of the micro-irradiated and control areas were measured using Carl Zeiss Axiovision software, version 4.5. The intensity of the irradiated area was then normalized to that of the non-irradiated control area as described earlier [ 28 , 29 ].

Tissue Microarray

Tissue microarray (TMA) panels of human breast carcinoma with adjacent normal breast tissues (BC081120f - 110 cores/110 cases and BR1191 – 119 cores/119 cases) were purchased from Tissue Array (formerly US Biomax, Derwood, MD). Breast TMAs were stained by the Histology Core-HFH with an anti-BUB1 antibody (Catalog # ab195268, Clone EPR18947, Abcam, 1:50 dilution) following standard protocols. The slides were scanned/imaged using an Aperio digital pathology slide scanner (Leica Biosystems). The TMAs were reviewed (manual scoring) by a blinded pathologist, who assigned a score of 0, 1+, 2+, or 3+ for the staining intensity of BUB1 and recorded the percentage of cells stained positive for BUB1. Graphs were plotted based on the staining intensity and % of cells positive for BUB1 to compare normal and breast cancer tissues, molecular subtypes, tumor grades, and stages.

Statistical analysis

For the analyses of in vitro data, Student’s t-test was used in GraphPad Prism 9 software. Results are presented as mean ± standard error of the mean (SEM). All experiments were performed in triplicate and were repeated at least three times. Correlation coefficients were calculated using Pearson’s correlation method. P < 0.05 was considered statistically significant. The statistical analysis of in vivo tumor growth data is presented under that section.
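For readers who want to reproduce the summary statistics outside GraphPad Prism, the following Python sketch computes mean ± SEM and Pearson's correlation coefficient using only the standard library. It is an illustration, not the software used in the study. The t-test itself is left to a statistics package, and the function names and example values are hypothetical.

```python
import math
import statistics


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), sem


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical triplicate measurement and paired observations.
avg, sem = mean_sem([1.0, 2.0, 3.0])
r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
print(f"mean ± SEM = {avg:.2f} ± {sem:.2f}; Pearson r = {r:.3f}")
```

Squaring r gives the R² value commonly reported alongside correlation plots.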

BUB1 is overexpressed in TNBC and correlates with poorer survival and metastatic potential

In an effort to identify novel therapeutic targets for radiosensitization, we performed a screen focused on the human kinome to identify kinases that were upregulated across 21 breast cancer cell lines and that also impacted radiation sensitivity in human breast tumors [ 30 ]. We identified a list of 52 kinases whose expression was significantly elevated in triple-negative breast cancer. We hypothesized that many of these kinases would govern mitogenic, metastatic, survival, or growth regulatory pathways critical to the development and dissemination of triple-negative breast cancer that could be readily targeted for the treatment of patients with triple-negative and basal-like breast cancer. To further characterize which of these 52 kinases played an important role in the aggressive features of triple-negative breast cancer, we combined expression, phenotypic, and clinical outcomes data to prioritize kinases that warranted further interrogation. We prioritized those kinases that had the highest level of differential expression in triple-negative breast cancer, showed limited to no expression in normal tissues (including the mammary gland, and thus were specific for breast cancer), were associated with clinically relevant outcomes, and for which we would be able to obtain or generate a specific inhibitor of clinical-grade quality to aid in translational efforts. To that end, BUB1 was one of the top-nominated of the 52 kinases, as it showed significantly elevated expression in triple-negative and basal-like breast cancers and limited expression in normal tissues (Fig. 1 A-B, data based on expression in over 1000 patient tumors from TCGA). Additionally, BUB1 expression is significantly associated with basal-like and luminal B tumors and with triple-negative breast cancers (Fig. 1 C-D). BUB1 expression is also much higher in breast cancer cell lines with basal-like characteristics and in cell lines with increased metastatic potential (Fig. 1 E) [ 19 ].
To further investigate the association of BUB1 expression with the metastatic potential of various breast cancer cell lines, we performed chick chorioallantoic membrane (CAM) assays on 21 breast cancer cell lines and quantitated the number of metastatic cells in the lungs and liver of chick embryos after injection of each of these 21 cell lines. These data were then correlated with BUB1 expression, and there was a significant association between BUB1 expression and metastatic potential in this in vivo system (Fig. 1 F, R² = 0.64, p-value 0.004).

figure 1

BUB1 is highly expressed in breast cancer compared to normal, non-malignant breast tissue and is associated with triple-negative and basal-like breast cancers. A-B , BUB1 expression is significantly increased in breast tumors compared to normal breast tissue. C , BUB1 expression is strongly associated with the PAM50-defined basal-like subtype of breast cancer and ( D ), is also significantly elevated in TNBC. E , BUB1 expression is significantly increased in basal-like breast cancer cell lines. F , BUB1 expression strongly correlates with metastatic potential to the lungs and liver as measured by CAM assay in vivo . All CAM assays were performed at least in triplicate. G , Kaplan-Meier survival plot demonstrates that high BUB1 levels are associated with worse overall survival in breast cancer patients (data from Hatzis et al, JAMA 2011). H , On multivariable analysis, BUB1 expression discriminates overall survival with high sensitivity and specificity (AUC: 0.68, P <0.01). I , Raw data that were used for the analysis of the receiver operating characteristic (ROC) curve. z statistic 3.631, *** P = 0.0003, a DeLong et al., 1988 [ 31 ], b Binomial exact

To investigate the clinical relevance of our findings, we assessed the impact of BUB1 expression on clinical outcomes. We found that BUB1 expression was significantly associated with poor outcomes (including higher mortality and increased rates of recurrence) both in women treated with chemotherapy and in women treated with radiation therapy, the two most common adjuvant treatment modalities for women with breast cancer, with high BUB1 expression being strongly associated with worse overall survival in women with breast cancer (Fig. 1 G) [ 20 , 32 ]. Furthermore, we demonstrate that BUB1 outperforms every other clinical or pathologic parameter (i.e., T-stage, grade, age, nodal status, ER, PR, Her2, margin, etc.) as a predictive biomarker of response (as measured by metastasis-free survival) to chemotherapy in a dataset of patients treated with paclitaxel and anthracycline-based chemotherapy, with an AUC of 0.68 (Fig. 1 H; *** P = 0.0003). The control group, tissue sample type, and detection method for Fig. 1 H are described in Fig. 1 I.

Pharmacological inhibition of BUB1 reduces viability of breast cancer cells

To study the effect of BUB1 inhibition in TNBC, we used the selective inhibitor of BUB1 kinase, BAY1816032. We assessed the effects of BAY1816032 on proliferation of TNBC (SUM159, MDA-MB-231, MDA-MB-468, BT-549) cells (Fig. 2 A-D), a luminal A subtype cell line (T-47D) (Fig. 2 E), and the non-tumorigenic human breast epithelial cell line MCF10A (Fig. 2 F). BAY1816032 was cytotoxic in all breast cancer cell lines tested, with IC50 values ranging from 1.6 μM to 3.9 μM. However, BAY1816032 had weaker cell killing and/or growth inhibitory effects in MCF10A, with an IC50 around 18 μM. This response correlated with differential BUB1 mRNA expression (Fig. 1 E) and BUB1 protein expression (Fig. 2 G). Based on these observations, we hypothesized that breast cancer cell lines that express high BUB1 would be radiosensitized by BAY1816032, while cell lines that express low to moderate BUB1 would not.

figure 2

Effect of BUB1 inhibitor on cell proliferation in TNBC cell lines. BAY1816032 is cytotoxic to cells at low micromolar range ( A ) SUM159, IC50 : 2.90 μM; ( B ) MDA-MB-231, IC50 : 2.10 μM; ( C ) MDA-MB-468, IC50 : 2.59 μM; ( D ) BT-549, IC50 : 1.59 μM; ( E ) T-47D, IC50 : 3.9 μM; ( F ) MCF10A, IC50 : 18 μM. ( G ) BUB1 protein expression in cell lines by immunoblotting; gray scale values of BUB1 are normalized over Actin for each cell line. Pharmacological inhibition of BUB1 induces radiosensitivity in TNBC cell lines: ( H ) SUM159, ( I ) MDA-MB-231, ( J ) MDA-MB-468, ( K ) BT-549, ( L ) T-47D, and ( M ) MCF10A. P -values were defined as * P ≤0.05, ** P ≤0.01, *** P ≤0.001, **** P ≤0.0001

BUB1 inhibition causes durable radiosensitization in TNBC cell lines

We evaluated the effect of BAY1816032 on radiation sensitivity in Basal A (MDA-MB-468), Basal B (MDA-MB-231, SUM159, BT-549) (Fig. 2 H-K) and Luminal A (T-47D) (Fig. 2 L) cell lines by clonogenic survival assays. BUB1 was expressed at high levels in the selected Basal A and B cell lines ( BUB1-high ) and at a low level in Luminal cells ( BUB1-low ). BUB1-high cells were radiosensitized by BAY1816032 (rER from 1.1 to 1.38), while the radiation sensitivity of BUB1-low cells did not increase with BAY1816032 (rER 0.91). As expected, BAY1816032 had no effect on radiosensitivity in MCF10A cells (Fig. 2 M). BAY1816032 led to a significant dose-dependent reduction in the surviving fraction at 2 Gy (SF-2 Gy) in BUB1-high cells, indicating that BUB1 kinase function is important for radioresistance. In contrast, BAY1816032 did not significantly impact SF-2 Gy in BUB1-low cells.

Genomic depletion of BUB1 is cytotoxic and makes TNBC cells radiosensitive

We evaluated the effect of BUB1 genomic depletion on cell survival and radiation sensitivity. SUM159 and MDA-MB-231 cells were transiently transfected with increasing concentrations of BUB1 siRNA (20, 60 and 100 nM) or control siRNA (100 nM), and cell viability was measured by alamarBlue assay (Fig. 3 A-B). siRNA-mediated BUB1 depletion reduced cell survival in a dose-dependent manner. Additionally, BUB1 was depleted in MDA-MB-468, BT-549 and T-47D cells, which also exhibited a significant reduction in cell viability compared to control siRNA (Fig. 3 C-E). DNAPKcs (gene ID: 5591, PRKDC ) siRNA was used as a positive control, since its inhibition or knockdown is known to reduce cell survival [ 33 ] because of the role it plays in the DNA DSB repair process [ 34 ].

figure 3

Effect of BUB1 genomic depletion on cell survival and radiation sensitivity. Cell viability after transient transfection of BUB1 siRNA (20, 60 and 100 nM) or control siRNA (100 nM) was measured using alamarBlue assay in ( A ) SUM159, ( B ) MDA-MB-231, ( C ) MDA-MB-468, ( D ) BT-549, and ( E ) T-47D. Effect of siRNA-mediated BUB1 depletion on radiosensitization was measured in these cell lines. Transient BUB1 siRNA transfection led to moderate radiosensitization with rER 1.0 to 1.2 in ( F ) SUM159 and ( G ) MDA-MB-231. After silencing of BUB1, BUB1-WT re-expression rescues the radiosensitization phenotype while BUB1-KD does not in ( H ) SUM159 and ( I ) MDA-MB-231. Genomic depletion of BUB1 by CRISPR/Cas9 leads to radiosensitization in ( J ) SUM159 and ( K ) MDA-MB-231 cells. Re-expression of BUB1-WT rescues the radiosensitization phenotype in BUB1 CRISPR KO ( L ) SUM159 and ( M ) MDA-MB-231 cells, but BUB1-KD does not. P -values were defined as * P ≤0.05, ** P ≤0.01, and *** P ≤0.001

The effect of siRNA-mediated BUB1 depletion on radiosensitization was measured in all the selected breast cancer cell lines (Fig. 3 F-G; Supplementary Fig. S5). We observed moderate radiosensitization (rER 1.0 to 1.2) when BUB1 was transiently depleted by siRNA. BUB1 depletion led to a significant reduction in the surviving fraction at 2 Gy (SF-2 Gy). Western blot analyses of total cell lysates following transfection of siRNA revealed that BUB1 could be efficiently repressed. To confirm that these effects are mediated by BUB1, we performed the same experiments in SUM159 and MDA-MB-231 (BUB1 depleted) cells with reintroduction of wild-type or kinase-dead BUB1 (BUB1-wt, BUB1-kd) (Fig. 3 H-I). Addition of BUB1-wt restored radioresistance in both cell lines (rER 0.9) while BUB1-kd addition did not (rER 1.0 to 1.1), and this response correlated with immunoblotting analyses.

Since transient BUB1 depletion by siRNA did not lead to significant radiation sensitization, we generated BUB1 knockout (BUB1 KO) SUM159 and MDA-MB-231 cell lines by CRISPR-Cas9 RNP transfection. Multiple BUB1 CRISPR clones were validated by Western blotting and Sanger sequencing to confirm complete BUB1 KO (Supplementary Fig. S6). Two different BUB1 KO clones for each cell line were used for subsequent experiments. SUM159 BUB1 KO clones demonstrated significant radiation sensitization (clone #18 rER 1.24, clone #48 rER 1.27) (Fig. 3 J). There was also a significant decrease in surviving fractions at 2 Gy (SF-2 Gy) in these clones. Similarly, significant radiation sensitization (clone #12 rER 1.57, clone #15 rER 1.37) and a significant reduction in surviving fractions at 2 Gy were observed in MDA-MB-231 BUB1 KO clones (Fig. 3 K). To further confirm a role for BUB1 in radiation sensitization, BUB1-wt and BUB1-kd plasmids were transfected into one BUB1 CRISPR KO clone each of SUM159 and MDA-MB-231, and clonogenic survival assays were performed (Fig. 3 L-M). In both cases, we observed significant radiation sensitization, which was reversed when BUB1-wt was expressed (rER 0.9) but not in BUB1-kd expressing cells (rER 1.0), with expression confirmed by immunoblotting. The rER values of the SUM159 and MDA-MB-231 BUB1 KO clones presented in Fig. 3 L-M are lower than those in Fig. 3 J-K because of the toxicity commonly observed with the Lipofectamine 2000 transfection reagent, which we used to transfect the BUB1-wt and BUB1-kd plasmids.

BUB1 inhibition radiosensitizes SUM159 tumor xenografts and prolongs animal survival

To determine the effects of BUB1 inhibition on radiosensitization in vivo , xenograft tumors ( N = >9-10/arm) were generated by injecting SUM159 cells into the 4th mammary fat pads of female CB17/SCID mice. Mice were randomized to different treatment groups once the tumors reached ~80 mm³. Mice received either the BUB1 inhibitor BAY1816032 (25 mg/kg, twice daily for 4 weeks, weekdays only), RT (5 Gy x 3, 2 days apart), the combination, or sham irradiation/vehicle (Fig. 4 A). We initially tested dose/fractionation schedules (2.5 Gy x 8, 5 Gy x 3, and 10 Gy x 1) that yielded similar equivalent doses in 2 Gy fractions (EQD2; 21.7-23.3 Gy) and biologically effective doses (BED; 32.5-35 Gy) using an alpha/beta ratio of 4, which enabled us to explore whether a high dose per fraction was more effective than standard fractionation. Adding 2.5 Gy x 8 or 10 Gy x 1 radiation to BUB1i yielded no significant benefit, while the 5 Gy x 3 schema demonstrated superior tumor control (Supplementary Fig. S7A) and was selected for the subsequent repeat experiments. In the combination treatment arm, RT started 24 h after the first treatment with BUB1i. BAY1816032 with RT significantly reduced tumor growth (Fig. 4 B) compared with inhibitor or RT alone and significantly extended animal survival (Fig. 4 C-E). No toxicity of the BUB1 inhibitor was evident, since the body weight of experimental animals remained constant during the study period (Supplementary Fig. S7B). Immunohistochemical staining of Ki67 (a marker of proliferation) in tumors collected at the study end point revealed significantly lower Ki67 positivity with combination treatment than with either treatment alone, which also correlated with the H&E staining pattern (Fig. 4 F-G).
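The BED and EQD2 values quoted for the three fractionation schedules can be checked with the standard linear-quadratic expressions, BED = n x d x (1 + d / (α/β)) and EQD2 = BED / (1 + 2 / (α/β)). The Python sketch below applies them with the stated alpha/beta ratio of 4; the text does not spell out the formulas used, so the expressions are an assumption, and the function names are hypothetical.

```python
def bed(fractions, dose_per_fraction_gy, alpha_beta=4.0):
    """Biologically effective dose under the linear-quadratic model."""
    total_dose = fractions * dose_per_fraction_gy
    return total_dose * (1.0 + dose_per_fraction_gy / alpha_beta)


def eqd2(fractions, dose_per_fraction_gy, alpha_beta=4.0):
    """Equivalent total dose delivered in 2 Gy fractions."""
    return bed(fractions, dose_per_fraction_gy, alpha_beta) / (1.0 + 2.0 / alpha_beta)


# Schedules from the text: 2.5 Gy x 8, 5 Gy x 3, and 10 Gy x 1.
for n, d in [(8, 2.5), (3, 5.0), (1, 10.0)]:
    print(f"{d} Gy x {n}: BED = {bed(n, d):.2f} Gy, EQD2 = {eqd2(n, d):.2f} Gy")
```

These expressions give BED values of 32.5, 33.75, and 35 Gy and EQD2 values of about 21.7, 22.5, and 23.3 Gy, which match the ranges reported above.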

figure 4

BUB1 inhibition sensitizes SUM159 tumor xenografts to radiation. ( A ) Timeline of the experiment; ( B ) Representative images of tumor growth in different treatment groups; ( C ) Combination treatment of BAY1816032 + RT reduces tumor volume in vivo; ( D, E ) Combination treatment increases tumor volume doubling time in Fox Chase SCID mice; ( F ) Representative images of H&E staining showing structural changes and of Ki67 staining (a proliferation marker), which revealed a significant reduction with combination treatment of SUM159 xenografts; ( G ) Ki-67 plot showing decrease in % of positive cells in combination treatment of BUB1i + RT. P-value was defined as **** P ≤0.0001

Additionally, we generated mammary fat pad tumor xenografts in CB17/SCID mice ( N = 4-10/arm) using the SUM159 BUB1 CRISPR KO cell line (clone #48). Animals were randomly divided into treatment groups once the tumors were established (~80 mm³) and were treated with RT (5 Gy x 3) or sham irradiated (Fig. 5 A). There was a significant increase in mouse survival in the combination treatment group (BUB1 KO + RT) as compared to sham irradiation (Fig. 5 B-E).

figure 5

Tumor xenografts of BUB1 CRISPR KO SUM159 cells are sensitive to irradiation. ( A ) Timeline of the experiment; ( B ) Representative images of tumor growth in different treatment groups; ( C ) Treatment of BUB1 KO + RT reduces tumor volume in vivo; ( D, E ) Treatment of BUB1 KO + RT increases tumor volume doubling time in Fox Chase SCID mice. P -value was defined as * P ≤0.05

BUB1 inhibition reduces radiation induced DSB repair as visualized by γH2AX foci

We next investigated the effect of BAY1816032 on dsDNA break repair. γH2AX foci (> 10 foci per cell), a marker of unresolved double-strand DNA damage, were assessed in cells treated with DMSO or 1 μM BAY1816032, with or without RT (4 Gy), at different time points (30 min, 4 h, 16 h, 24 h). NU7441 (DNAPK inhibitor) was used as a positive control. Representative images of γH2AX (16 h) in SUM159 and MDA-MB-231 cell lines are shown (Fig. 6 A and C). Fewer non-irradiated cells were γH2AX positive. RT induced the formation of γH2AX foci in approximately 40% of cells within 30 min post-irradiation; the proportion peaked at 4 h, gradually decreased by 16 h, and reached near-baseline levels by 24 h. However, pretreatment with BAY1816032 resulted in a slight increase in the number of foci (approximately 90% of cells) at 30 min post-irradiation, and γH2AX foci remained elevated thereafter, even at 16 and 24 h, with a significantly higher number of foci in the BAY1816032 pre-treated group than in the RT alone group (Fig. 6 B and D). Cells treated with RT alone repaired the RT-induced dsDNA damage more efficiently over time than cells given the combination, suggesting that BUB1 inhibition delayed the repair of RT-induced dsDNA breaks.

figure 6

BUB1 ablation radiosensitizes through NHEJ. Representative images of ( A ) SUM159 and ( C ) MDA-MB-231 γH2AX foci at 16 h. Original magnification, ×63; Combination treatment of BUB1i and RT leads to delayed resolution of γH2AX foci in ( B ) SUM159, and ( D ) MDA-MB-231 cell lines. Inhibition of BUB1 kinase function by BAY1816032, at 1 μM and 10 μM, decreases NHEJ efficiency (VLuc) and increases HR efficiency (GLuc) in ( E ) SUM159, and ( F ) MDA-MB-231. Effect of DNAPK inhibitor (NU7441) on cell proliferation in TNBC cell lines. NU7441 is cytotoxic to cells at low nanomolar range ( G ) SUM159, IC50 : 368 nM; ( H ) MDA-MB-231, IC50 : 503 nM; Combination of BAY1816032 and NU7441 does not increase DNAPKcs-mediated radiosensitization in ( I ) SUM159 and ( J ) MDA-MB-231 cell lines. Inhibition of BUB1 increased transcription of DNA damage genes after radiation. Significant upregulation of H2AFX and downregulation of PRKDC levels in ( K ) SUM159 and ( L ) SUM159 BUB1 CRISPR KO cells were observed. P -values were defined as * P ≤0.05, ** P ≤0.01, *** P ≤0.001, **** P ≤0.0001

We also assessed the dsDNA break repair using BUB1 siRNA in SUM159 and MDA-MB-231 cell lines. Representative images of γH2AX (16 h) are shown in Supplementary Fig. S8. Almost all cells treated with BUB1 siRNA were γH2AX foci positive in presence or absence of RT (4 Gy) at 30 min and 4 h. The largest differences were seen at subsequent time points (16 and 24 h) in which BUB1 depletion resulted in persistence of γH2AX foci, whereas the foci began to resolve in presence of BUB1. These results indicate that inhibition of BUB1 kinase activity most likely results in a slower rate of DNA damage repair.

BUB1 inhibition reduces non-homologous end joining (NHEJ) repair

The two major pathways for repair of DNA DSBs are HR and NHEJ [ 35 ]. Though either may be involved in repairing dsDNA breaks, earlier reports suggested a potential link between BUB1 expression and the NHEJ pathway [ 10 ]. Thus, we hypothesized that reduced NHEJ repair efficiency is partly responsible for BUB1-mediated radiosensitization and prolonged unresolved dsDNA breaks. We used the BLRR approach to simultaneously monitor NHEJ and HR dynamics following the induction of a DSB [ 26 ]. We aimed to confirm whether BUB1 inhibition impacted NHEJ or HR, since it has been previously demonstrated that knockdown of BUB1 reduces NHEJ efficiency [ 10 ]. In BLRR-transfected cells, treatment with BAY1816032 at two different concentrations (1 and 10 μM) in the presence of RT (4 Gy) led to a significant decrease in the NHEJ (VLuc activity) signal, while the GLuc signal (HR activity) increased reciprocally in a dose-dependent manner (Fig. 6 E-F). NU7441 was used as a positive control. These results indicate that BUB1 inhibition decreases NHEJ-mediated DNA damage repair efficiency and that BUB1-mediated radiosensitization may take place through the NHEJ pathway.

BUB1 inhibition does not increase DNAPKi-mediated radiosensitization

The above results encouraged us to further assess the effect of BUB1 inhibition in combination with the DNAPK-specific inhibitor NU7441, which is well known to impair NHEJ-mediated repair of radiation-induced DSBs [ 36 ]. Initially, we investigated the cytotoxicity of NU7441 in SUM159 and MDA-MB-231 cells at 72 h. The IC 50 value of NU7441 in these cells ranges from 300 to 500 nM (Fig. 6 G-H). The radiosensitization effects following treatment with a combination of BAY1816032 and NU7441 (250 nM each) in the presence of radiation (0, 2, 4, 6 Gy) were assessed using clonogenic survival assays. When combined, BUB1 inhibition does not increase DNAPK inhibitor-driven radiosensitization (combination rER ranges from 1 to 1.3), which further supports that BUB1-mediated radiosensitization takes place through the NHEJ pathway (Fig. 6 I-J). Furthermore, the surviving fraction at 2 Gy (SF-2 Gy) was significantly reduced by the combination of BAY1816032 and NU7441 (Fig. 6 I-J; inset plots), but it was not significantly different from that of either agent alone.
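The clonogenic readouts quoted above (plating efficiency and surviving fraction) can be illustrated with a short calculation. This is a minimal sketch that assumes the conventional definitions: plating efficiency is colonies formed divided by cells seeded in unirradiated controls, and surviving fraction is colonies divided by cells seeded times plating efficiency. The function names and colony counts are hypothetical and are not data from this study, and the rER calculation itself is not reproduced because its definition is not given in this section.

```python
def plating_efficiency(colonies: int, cells_seeded: int) -> float:
    """Fraction of seeded, unirradiated control cells that form colonies."""
    if cells_seeded <= 0:
        raise ValueError("cells_seeded must be positive")
    return colonies / cells_seeded


def surviving_fraction(colonies: int, cells_seeded: int, pe: float) -> float:
    """Colonies after treatment, corrected for control plating efficiency."""
    if cells_seeded <= 0 or pe <= 0:
        raise ValueError("cells_seeded and pe must be positive")
    return colonies / (cells_seeded * pe)


# Hypothetical counts, for illustration only (not data from the study).
pe = plating_efficiency(colonies=120, cells_seeded=200)
sf_2gy = surviving_fraction(colonies=90, cells_seeded=400, pe=pe)
print(f"PE = {pe:.3f}, SF at 2 Gy = {sf_2gy:.3f}")
```

Normalizing to the plating efficiency of the matching unirradiated control keeps drug-only cytotoxicity from being counted as radiosensitization.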

Pharmacological and genomic ablation of BUB1 causes increased transcription of DNA damage genes after radiation

Cells were pre-treated with BUB1i, irradiated (4 Gy) 1 h after BUB1i, and harvested 72 h post RT to examine the impact of BUB1 inhibition on NHEJ pathway-associated genes by qPCR. The expression of H2AFX, XRCC5, XRCC6, PRKDC, and BUB1 was measured and normalized against GAPDH . These results demonstrated an increase in H2AFX, XRCC5, and XRCC6 in BUB1i-treated SUM159 (Fig. 6 K) and MDA-MB-231 cells (Supplementary Fig. S9, top panel). We observed significant downregulation of PRKDC and BUB1 in BUB1i-treated cells (Fig. 6 K and Supplementary Fig. S9). Similar results were obtained in both BUB1 CRISPR KO cell lines (SUM159 KO #48; Fig. 6 L and MDA-MB-231 KO #12; Supplementary Fig. S9, bottom panel), further supporting a role for BUB1 in regulating mRNA levels of key NHEJ genes in response to radiation.
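Normalization of a target gene against GAPDH is commonly done with the comparative Ct (2^-ΔΔCt) calculation. The sketch below assumes that method and roughly 100% amplification efficiency; this section does not state which quantification method was actually used, and the function name and Ct values are hypothetical.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target gene by the comparative Ct (2^-ddCt) method.

    Assumes the target and the reference gene (here GAPDH) both amplify
    with about 100% efficiency, i.e. product doubles every cycle.
    """
    delta_treated = ct_target_treated - ct_ref_treated   # dCt, treated sample
    delta_control = ct_target_control - ct_ref_control   # dCt, control sample
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical Ct values, for illustration only.
fold = relative_expression(ct_target_treated=24.0, ct_ref_treated=18.0,
                           ct_target_control=25.0, ct_ref_control=18.0)
print(f"fold change vs control = {fold:.2f}")
```

A fold change above 1 corresponds to upregulation relative to the control sample and a value below 1 to downregulation.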

BUB1 ablation increases DNAPKcs phosphorylation and stabilizes it after irradiation

DNAPK catalytic subunit (DNAPKcs) is a well-known mediator of DNA DSB repair through the activation of NHEJ [ 37 , 38 ]. DNAPK autophosphorylates at Ser2056 (PQR cluster) and Thr2609 (ABCDE cluster) in response to DSB induction [ 39 , 40 , 41 ], which may limit or promote DNA end processing during NHEJ [ 40 , 42 ]. Thus, we evaluated whether BUB1 ablation had any effect on DNAPK phosphorylation at Ser2056 (S2056) in MDA-MB-231 (Fig. 7 A) and MDA-MB-468 (Supplementary Fig. S10A) cells. As expected, radiation treatment led to an increase in DNAPK phosphorylation (pDNAPKcs) at S2056, which was significantly increased further in samples that had been pre-treated with BUB1i (Fig. 7 A). The DNAPK inhibitor NU7441 was used as a positive control in parallel experiments. Not surprisingly, pre-treatment with NU7441 almost completely blocked radiation-induced DNAPKcs S2056 phosphorylation in these cells (Fig. 7 A and Supplementary Fig. S10A). There were no noticeable changes in the expression of KU70, KU80, or total DNAPKcs. Since radiation-induced DNAPKcs autophosphorylation can be observed within minutes [ 28 , 39 , 40 ] and phospho-DNAPKcs levels decrease afterwards [ 41 ], we next investigated whether BUB1 ablation changes pDNAPKcs dynamics following radiation. Cells were pre-treated with BUB1i for 1 h followed by 4 Gy radiation and collected at various intervals (0, 15, 30, and 120 min). We observed that pre-treatment with BAY1816032 augmented the level of pDNAPKcs (S2056), which remained noticeable up to 2 h, whereas in the radiation-alone group pDNAPKcs started to decrease after 30 minutes in MDA-MB-231 cells, while in MDA-MB-468 cells it was noticeable only at 120 minutes in the RT-only lanes (Fig. 7 B and Supplementary Fig. S10B). These data indicate that BUB1 ablation increases the amplitude and duration of radiation-induced pDNAPKcs at a PQR cluster site.

figure 7

BUB1 ablation leads to increased phosphorylation of DNAPKcs, alters chromatin localization of key NHEJ factors and induces apoptotic cell death upon irradiation. A MDA-MB-231 cells were treated with BUB1i or DNAPKi an hour prior to radiation treatment. Cells were harvested 30 minutes post RT (4Gy) and resolved on SDS-PAGE gels and probed with indicated antibodies. B MDA-MB-231 cells were treated as (A) and harvested at 1-, 15-, 30- and 120-minutes post-RT and immunoblotted as specified. C SUM159 (top panel) MDA-MB-231 cells (bottom panel) were treated with cycloheximide followed by BUB1i or DNAPKi and radiation (4Gy). Total protein lysates were made at the indicated time-points and resolved on gels. D BUB1 CRISPR KO SUM159 (left panel) or MDA-MB-231 (right panel) cells were treated with cycloheximide, and radiation and samples were harvested at different time points. E Quantitation of pDNAPKcs protein levels in SUM159 and MDA-MB-231 cells (from 7C and other experiments). F Quantitation of BUB1 protein levels in SUM159 and MDA-MB-231 cells (from above experiments). G Nuclear and chromatin fractions of SUM159 and MDA-MB-231 cells treated with BUB1i, DNAPKi and RT (left) and BUB1 CRISPR KO SUM159 and MDA-MB-231 cells treated with RT (right panels). H Effect of BUB1 inhibitor (red circles) on initial recruitment of YFP-tagged KU80 and YFP-DNAPKcs by laser microirradiation in U2OS cells. I effect of BUB1 inhibition on the accumulation of YFP-KU80 and YFP-DNAPKcs at laser-induced DSBs for up to 120 minutes. J QRT-PCR of BAX, BCL2, PCNA, CASP3 and CASP9 in SUM159 cells treated with BUB1i, DNAPKi and radiation (4 Gy, 72 hours). (I) QRT-PCR of BAX, BCL2, PCNA, CASP3 and CASP9 in SUM159 BUB1 CRISPR cells 72 hours post-irradiation (4 Gy). P -values were defined as * P ≤0.05, ** P ≤0.01, and *** P ≤0.001

To validate whether the observed increase in amplitude and duration of pDNAPKcs was due to the stabilization of pDNAPKcs-S2056, we carried out an experiment wherein nascent protein synthesis was blocked by cycloheximide (CHX). MDA-MB-231 and SUM159 cells were treated with CHX, followed by BUB1i, DNAPKi, or vehicle/mock and radiation (4 Gy). Protein samples were collected at various time points (0 min, 30 min, 2 h, 8 h, 16 h, and 24 h) and resolved on SDS-PAGE gels (Fig. 7 C). Densitometric analysis yielded a half-life (t 1/2 ) of pDNAPKcs of >24 h in radiation-treated samples, which significantly increased upon DNAPKi treatment in SUM159 cells. Surprisingly, combination of BUB1i with RT significantly stabilized pDNAPKcs up to the longest time point evaluated (24 h) such that t 1/2 could not be estimated (Fig. 7 C). These results demonstrate that BUB1 ablation stabilizes radiation-induced pDNAPKcs. In BUB1 CRISPR KO cell lines (SUM159 KO#48 and MDA-MB-231 KO#12), DNAPKcs phosphorylation was detectable up to 24 h in the presence of RT, further confirming a role for BUB1 in stabilizing pDNAPKcs (i.e., active DNAPKcs) in response to radiation (Fig. 7 D). Interestingly, we observed that BUB1 protein was stabilized upon radiation treatment (t 1/2 = ∞, Fig. 7 C and 7E), which was reversed in cells pre-treated with BAY1816032 (t 1/2 = 8 h, Fig. 7 F). BUB1 inhibitor at clonogenic concentrations did not affect BUB1 protein levels in MCF10A cells (Supplementary Fig. S11).
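A half-life can be estimated from cycloheximide-chase densitometry by fitting first-order decay. The sketch below assumes single-exponential decay and a log-linear least-squares fit, which is one common approach and not necessarily the procedure used here. The function name and band intensities are hypothetical, and only the time points are taken from the text.

```python
import math


def half_life_hours(times_h, band_intensity):
    """Fit ln(intensity) = a - k*t by least squares and return ln(2)/k.

    Returns math.inf when the fitted slope is not negative, i.e. when no
    decay is measurable over the sampled window.
    """
    if len(times_h) != len(band_intensity) or len(times_h) < 2:
        raise ValueError("need at least two matched time/intensity points")
    if any(v <= 0 for v in band_intensity):
        raise ValueError("intensities must be positive to take logarithms")
    logs = [math.log(v) for v in band_intensity]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx
    if slope >= 0:
        return math.inf
    return math.log(2) / -slope


# Time points from the text (0 min, 30 min, 2 h, 8 h, 16 h, 24 h), in hours.
times = [0.0, 0.5, 2.0, 8.0, 16.0, 24.0]
# Hypothetical normalized intensities following an exact 8 h half-life.
decaying = [0.5 ** (t / 8.0) for t in times]
# Hypothetical fully stable protein: no loss of signal over 24 h.
stable = [1.0] * len(times)

print(half_life_hours(times, decaying))
print(half_life_hours(times, stable))
```

Returning infinity for a flat or rising series mirrors the way a stabilized protein is reported as t 1/2 = ∞ when no decay is seen within the chase window.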

BUB1 ablation alters chromatin localization of NHEJ proteins

Chromatin remodeling increases the accessibility of the region surrounding a DNA lesion for proteins involved in DNA damage response and repair [ 43 ]. DNA damage sensors and early signal transducers are rapidly attracted to damaged DNA sites right after radiation exposure [ 44 ]. We postulated that the initial local chromatin relaxation brought about by BUB1 kinase activity is necessary for the rapid loading of the NHEJ machinery to DSBs. To examine this, nuclear and chromatin fractions were isolated 10 min post-DNA damage with 8 Gy RT in BUB1i-treated and BUB1 CRISPR KO SUM159 and MDA-MB-231 cell lines (Fig. 7 G). We observed an increased recruitment of phospho-DNAPKcs, total-DNAPKcs, and KAP1 in both nuclear and chromatin-enriched fractions, suggesting that BUB1 plays a crucial role in the activation and recruitment of key NHEJ proteins to DSBs. There was no change in the enrichment of KU70 and KU80 proteins in these fractions. The hypothesis that BUB1 is necessary for the quick recruitment of the NHEJ factors to DNA damage sites was supported by laser micro-irradiation experiments. Since BUB1 interacts with DNAPKcs just after DSB induction [ 10 ], we additionally examined whether BUB1 regulates DNAPKcs at DSBs. Inhibition of BUB1 does not affect the initial recruitment of YFP-KU80, as shown in Fig. 7 H (top, 200 sec), while BUB1i results in rapid recruitment of YFP-DNAPKcs to DSBs compared to vehicle-treated cells (Fig. 7 H, bottom). In contrast, BUB1 inhibition resulted in prolonged retention of KU80 and DNAPKcs at DSBs for up to 120 minutes (Fig. 7 I). Gene correlation analysis on the METABRIC dataset identified very strong correlation between BUB1 and H2AX (Spearman correlation 0.58), PRKDC (0.39), and moderate correlation with XRCC5 (0.05) and XRCC6 (0.29), further corroborating a strong link between BUB1 and NHEJ mediators (Supplementary Fig. S12).
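The Spearman coefficients reported above are rank correlations. As a minimal sketch of the statistic only (this is not the authors' METABRIC pipeline, which is not described in this section), the coefficient can be computed as the Pearson correlation of average ranks. The function names and expression values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with >= 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Hypothetical expression values for two genes across five samples.
gene_a = [2.1, 3.5, 1.0, 4.2, 3.9]
gene_b = [10.0, 30.0, 5.0, 50.0, 40.0]
print(spearman_rho(gene_a, gene_b))
```

Because only ranks enter the calculation, the coefficient captures monotonic association and is insensitive to the scale on which expression is measured.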

BUB1 ablation increases transcription of apoptotic genes after irradiation

Since BUB1 ablation increased radiation-induced cell death (Fig. 2 H-M, and Fig. 3 F-M) and led to increased loading of key NHEJ factors to chromatin fractions (i.e., DNA damage; Fig. 7 G), we next sought to elucidate the cell death mechanisms mediated by the combination treatment. qRT-PCR for pro-apoptotic, anti-apoptotic, and proliferation genes demonstrated significant upregulation of BAX , CASP3 , and CASP9 and significant downregulation of PCNA and BCL2 after BUB1 ablation in SUM159 and MDA-MB-231 cell lines (Fig. 7 J and Supplementary Fig. S13). Gene correlation studies using the METABRIC dataset identified very strong correlation between BUB1 and MKI67 (Spearman correlation 0.71), CASP3 (0.44), BAX (0.27), BCL2 (-0.42), and PCNA (0.48), all with P ≤ 0.001 (Supplementary Fig. S13), further supporting a role for BUB1 in facilitating radiation-induced apoptosis.

BUB1 is overexpressed in tumors and its expression correlates with tumor grade

We examined the expression of BUB1 in breast tumors (N = 202) and compared it with normal breast tissues (N = 15). Expression levels of BUB1 protein were graded based on staining intensity and percentage of cells positively stained for BUB1. Levels of immunopositivity were scored as follows: 0 (No staining); 1+ (Weak staining); 2+ (Moderate staining); 3+ (Strong staining). Scores of 0 were designated as negative, and scores of 1, 2, and 3 were designated as positive. Examples of BUB1 staining are illustrated in Fig. 8 A under 4x and 20x magnifications. Immunohistochemical analysis revealed significantly higher BUB1 protein expression in breast tumors compared to normal breast tissue. We observed significant correlation between BUB1 protein expression (staining intensity) and tumor grades and stages (Fig. 8 B). Furthermore, BUB1 was overexpressed in TNBC ( N = 50; P<0.05 ), ER+/PR+ ( N = 63; P<0.001 ), ER+/PR+/HER2+ ( N = 19; P<0.05 ), ER+ ( N = 37; P<0.05 ), ER+/HER2+ ( N = 12; P<0.01 ), HER2+ ( N = 18; P<0.0001 ), and PR+ ( N = 3; P<0.0001 ) compared to normal breast ( N = 15). Although we observed the highest BUB1 staining intensity in PR+ tumors, the number of PR+ samples in the current TMA is too small to statistically support this finding.
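The scoring rule described above (0 negative; 1+, 2+, and 3+ positive) can be written as a small lookup. This is an illustrative sketch only: the function names and example scores are hypothetical, and it does not reproduce how staining intensity and percentage of positive cells were combined into a score.

```python
INTENSITY_LABELS = {
    0: "No staining",
    1: "Weak staining",
    2: "Moderate staining",
    3: "Strong staining",
}


def classify_score(score: int) -> str:
    """Map an immunostaining score (0 to 3) to negative or positive."""
    if score not in INTENSITY_LABELS:
        raise ValueError(f"score must be one of {sorted(INTENSITY_LABELS)}")
    return "negative" if score == 0 else "positive"


def positive_fraction(scores) -> float:
    """Share of cores scored 1+, 2+, or 3+ among all scored cores."""
    scores = list(scores)
    if not scores:
        raise ValueError("no scores supplied")
    positives = sum(1 for s in scores if classify_score(s) == "positive")
    return positives / len(scores)


# Hypothetical scores for four tissue cores, for illustration only.
example_scores = [0, 1, 2, 3]
print([classify_score(s) for s in example_scores])
print(positive_fraction(example_scores))
```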

figure 8

BUB1 is overexpressed in breast tumors. A Representative images of BUB1 staining intensity at 4x and 20x magnifications in breast TMA. B Quantification of BUB1 staining in breast tumor TMA. C Proposed model for a role of BUB1 in mediating radiation induced NHEJ signaling. We propose that radiation induced DNA DSB are repaired efficiently when BUB1 is present (left panel) leading to radiation resistance. In the absence of BUB1 activity or availability, radiation induces hyper phosphorylation of DNAPKcs (Ser2056) and increased binding of NHEJ mediators at the DNA DSB sites (right panel). These NHEJ mediators may not stay on the extended chromatin thus hamper end processing causing radiation-sensitization. P -values were defined as * P ≤0.05, ** P ≤0.01, *** P ≤0.001, and **** P ≤0.0001

Discussion

BUB1 is a serine/threonine kinase required for optimal DNA damage response, and there is increasing evidence of crosstalk between DNA damage response elements and spindle assembly checkpoint components [ 11 ]. We identified BUB1 as a key kinase associated with radiosensitivity in a focused human kinome screen [ 30 ]. Nevertheless, there are no data linking BUB1 to radiation therapy or DNA damage repair in TNBC. Here, we demonstrate that the BUB1-specific inhibitor BAY1816032 radiosensitized models of TNBC, a subtype of breast cancer known to have limited treatment options and the poorest prognosis [ 45 ]. Previous studies have shown the advantage of radiation therapy in reducing local recurrence rates, and this was validated in a randomized controlled trial in patients with TNBC [ 46 ]. However, radioresistance is a major cause of treatment failure or locoregional relapse in TNBC. Here, we provide evidence that BUB1 mediates radiation resistance in TNBC by modulating DNA DSB repair.

Our study showed that BUB1 is overexpressed (differential mRNA levels) in breast cancer, with the highest expression in TNBC (Fig. 1 D). However, BUB1 protein expression differed slightly from the differential mRNA expression (Fig. 2 G). We hypothesize that several factors, including delayed protein synthesis, post-transcriptional and post-translational modifications, and different protein half-lives, reduce mRNA/protein correlations [ 47 , 48 , 49 ]. Generally, only 20–40% correlation is observed between protein expression and corresponding mRNA levels [ 50 , 51 ]. Our findings that BUB1i was effective at a log lower concentration in cancer cells (Fig. 2 A-E) compared to the normal breast epithelial MCF10A cell line (Fig. 2 F) demonstrate the selectivity and potentially minimal toxicity of BUB1i in future translational studies, given that it was found to be safe in large animal models [ 52 ]. Our observations that BUB1 ablation sensitizes TNBC cell lines (SUM159, MDA-MB-231, MDA-MB-468, and BT-549) but not the Luminal A subtype (T47D) further support a role for BUB1 in mediating the radiation-resistance phenotype in TNBC [ 30 ]. PI3K-related kinases (PIKKs), including ATM, ATR, and DNAPK, phosphorylate Ser139 in H2AX upon DNA damage, which is necessary to sustain the stable association of repair factors at DSB sites [ 53 ]. ATM phosphorylates BUB1 at Ser314, which activates BUB1 and results in an optimal DNA damage response [ 11 ]. By re-expressing BUB1-WT and BUB1-KD in BUB1 knockout cells, we confirmed that BUB1 activity plays a role in radiation (Fig. 3 ) and DDR responses (Fig. 6 A-D). Biochemical or genomic BUB1 ablation radiosensitized the SUM159 mouse xenograft model (Figs. 4 and 5 ), further corroborating a role for BUB1 in mediating radiation response. Prolonged presence of γH2AX foci after irradiation in BUB1-ablated cells supports earlier reports on delayed or unrepaired DSBs after BUB1 ablation [ 10 , 54 ]. The BLRR assay confirmed that BUB1i radiosensitizes TNBC through NHEJ (Fig. 6 E-F), which was further supported by the lack of any increase in DNAPKi-mediated radiosensitization by BUB1i (Fig. 6 I-J). Similar observations with DNAPKi have been reported [ 55 ].

Recent evidence has shown that DNAPKcs and DNA methyltransferase inhibitors are effective at sensitizing TNBC to PARPi and radiation [ 56 ]. Autophosphorylation of DNAPKcs at Ser2056, a known autophosphorylation site within the PQR cluster, regulates DNA end processing and possibly DSB repair pathway choice [ 57 , 58 ]. Surprisingly, we identified that BUB1 ablation increased the level and amplitude of pDNAPKcs-S2056 following radiation. The pDNAPKcs was stabilized until the longest time point evaluated (24 h). Our observations support previous studies that demonstrated higher or persistent pDNAPKcs following DNAPKi or ATMi with IR or other DNA-damaging agents [ 39 , 58 , 59 , 60 ]. Although we have not confirmed the mechanism of pDNAPKcs stabilization by BUB1, we are tempted to speculate that known (RNF144A [ 61 ], MARCH5 [ 62 ], CRL4ADTL [ 63 ]) or yet unrelated E3-ubiquitin ligase(s) may be involved. Although tumor suppressor protein P53 (p53) has been linked to BUB1 expression [ 64 ], data demonstrating BUB1 protein regulation by radiotherapy have been lacking. Our cycloheximide chase assays clearly demonstrate that BUB1 is stabilized upon RT, while pretreatment with BUB1i or DNAPKi reverses this and causes BUB1 degradation after irradiation (Fig. 7 C, 7F). Future studies will determine the mechanism of BUB1 regulation by radiation.

Chromatin remodeling increases the accessibility of the area around a DNA lesion to DNA damage response and repair proteins [ 43 ]. Phosphorylation of H2AX and KAP1 are key steps that enhance chromatin relaxation and allow the recruitment of the DDR machinery to a DSB [ 65 , 66 ]. Our findings (Fig. 7 G) that BUB1 ablation causes increased loading of pDNAPKcs, pKAP1, KAP1, and pATM to chromatin fractions and alters the recruitment of YFP-KU80 and YFP-DNAPKcs (Fig. 7 H-I) support a role for BUB1 in this step. Lu et al. identified a role for DNAPK kinase activity wherein attenuated chromatin recruitment of the MRN complex was detected in DNAPK-KD or null (-/-) cells [ 28 ]. Additionally, they observed decreased localization of NHEJ factors including LIG4, XRCC4, and XLF in chromatin fractions in these cells, further supporting a role for DNAPKcs activity in NHEJ. The above data support our hypothesis that BUB1 mediates radiation-induced NHEJ by regulating activation (phosphorylation) of DNAPKcs and thus chromatin relaxation and access of NHEJ factors. Taken together, our results provide evidence for the hypothesis that BUB1 is necessary for the quick recruitment of the NHEJ factors to DNA damage sites (Fig. 7 G-I). In future, we will perform in-depth mechanistic studies such as micronuclease digestion, immunoprecipitation (CO-IP), and proximity ligation assay (PLA) to confirm a role for BUB1 in this step of NHEJ. Since mutagenesis of DNAPKcs Ser2056 confirmed that it limits end-processing [ 58 ], we will additionally evaluate whether BUB1 ablation impacts this process. Because DNAPKcs inhibition or depletion leads to a reduction in NHEJ and reciprocally a shift to HR [ 67 , 68 ], and since certain phosphorylations in DNAPKcs promote HR while inhibiting NHEJ [ 67 ], it would be interesting to see whether these phospho-sites in DNAPKcs are affected by BUB1 and thus affect downstream signaling (HR). Moreover, since ATM phosphorylates members of the MRN complex (which initiates the HR cascade) [ 69 ] and NHEJ factors (including DNAPKcs [ 70 ] and H2AX), and is shown to phosphorylate BUB1 at Ser314 in response to irradiation [ 11 ], it would be fascinating to investigate whether BUB1 indeed affects the HR response through ATM or some other mechanism.

BUB1 transcripts are significantly higher in breast cancer cell lines and in high-grade primary breast cancer tissues compared to normal mammary epithelial cells or normal breast tissues [ 71 ]. High BUB1 expression (transcript) correlates with extremely poor outcome in breast cancer [ 72 , 73 ]. Our analysis showing that BUB1 expression significantly correlates with Ki67 (Supplementary Fig. S12) supports earlier findings [ 72 , 73 , 74 ] and is consistent with our in vivo observations that tumors harvested from mice treated with a combination of BUB1i and radiation have a statistically significant reduction in Ki67 (Fig. 4 F-G), and with the reduction in PCNA in cells (Fig. 7 J). Our TMA analysis found strong correlation between BUB1 protein expression and tumor grade (Fig. 8 A-B) and identified high BUB1 expression in TNBC samples. Our BUB1 immunostaining TMA data support earlier findings wherein nuclear BUB1 staining was found to strongly correlate with stage, pathological tumor factors, lymph node metastasis, distant metastasis, histological grade, and proliferation [ 74 ]. In future, it will be important to assess whether BUB1 protein expression differs between treatment-naïve and radioresistant recurrent cases. Based on our data that BUB1 is stabilized upon radiation treatment (Fig. 7 C), we speculate that BUB1 expression is higher in radiation-resistant, recurrent cases than in treatment-naïve cases.

Our results are consistent with previous reports where knockdown of BUB1 was demonstrated to prolong γH2AX foci and comet tails and to cause hypersensitivity in response to ionizing radiation [ 11 ]. Since BUB1 co-localizes with 53BP1 [ 10 ] and interacts with NHEJ factors [ 10 ], and since we identified that BUB1 ablation increases the amplitude and duration of DNAPKcs phosphorylation and increases chromatin localization of key NHEJ factors, we propose a model of BUB1’s role in NHEJ (Graphical Abstract, Fig. 8 C). In the presence of BUB1 (left panel), NHEJ is efficient and can repair radiation-induced DSBs, thereby causing radioresistance. On the other hand, BUB1 inhibition or depletion causes increased phosphorylation of DNAPKcs and increased binding of NHEJ factors at the DSB sites (right panel). These NHEJ mediators do not stay on the extended/open chromatin as required for proper end processing and ligation of the DNA ends. This reduces NHEJ repair, resulting in radiosensitization. DNAPKcs phosphorylation is essential for its dissociation from Ku-bound DNA [ 46 , 75 , 76 ]. Although we observed increased binding of NHEJ factors at the chromatin following BUB1i+RT (10 minutes post RT), we cannot rule out that these factors fall off at a later time without repairing broken DNA ends (limited end processing), as has been demonstrated using DNAPKcs phospho-site mutants [ 76 ]. Taken together, our data demonstrate that BUB1 is overexpressed in breast cancer including TNBC and that BUB1 ablation leads to radiosensitization by regulating DNAPKcs phosphorylation and chromatin localization of key NHEJ factors. Our findings strongly support nomination of BUB1 as a potential biomarker and a therapeutic target for radiosensitization in TNBC.

Availability of data and material

All the relevant data are already presented in the manuscript. Any additional data will be available upon request to the corresponding author.

Abbreviations

BC: Breast cancer

BLRR: Bioluminescent repair reporter

BUB1: Budding uninhibited by benzimidazoles-1

CAM: Chick chorioallantoic membrane

CHX: Cycloheximide

DMEM: Dulbecco's modified Eagle medium

DMSO: Dimethyl sulfoxide

DNAPK: DNA-dependent protein kinase

DSB: Double-strand breaks

ECL: Electrogenerated chemiluminescence

EDTA: Ethylenediaminetetraacetic acid

ER: Estrogen receptor

FBS: Fetal bovine serum

GEO: Gene Expression Omnibus

HER2: Human epidermal growth factor receptor 2

HR: Homologous recombination

IACUC: Institutional Animal Care and Use Committee

IDT: Integrated DNA Technologies

KAP1: KRAB-associated protein 1

KD: Kinase-dead

LMM: Linear mixed models

MRN: Mre11, Rad50 and Nbs1

NHEJ: Non-homologous end joining

PARPi: Poly (ADP-ribose) polymerase inhibitor

PE: Plating efficiency

PEG: Polyethylene glycol

PLA: Proximity ligation assay

POLB: DNA polymerase-beta

PR: Progesterone receptor

PVDF: Polyvinylidene difluoride

qPCR: Quantitative polymerase chain reaction

rER: Radiation enhancement ratio

RNP: Ribonucleoprotein

RT: Radiotherapy

SEM: Standard error of the mean

SF: Survival fraction

SSB: Single-strand breaks

TCGA: The Cancer Genome Atlas

TMA: Tissue microarray

TNBC: Triple-negative breast cancer

Kyndi M, Sorensen FB, Knudsen H, Overgaard M, Nielsen HM, Overgaard J, et al. Estrogen receptor, progesterone receptor, HER-2, and response to postmastectomy radiotherapy in high-risk breast cancer: the Danish Breast Cancer Cooperative Group. J Clin Oncol. 2008;26(9):1419–26.

Mladenov E, Magin S, Soni A, Iliakis G. DNA double-strand break repair as determinant of cellular radiosensitivity to killing and target in radiation therapy. Front Oncol. 2013;3:113.

Morgan MA, Lawrence TS. Molecular Pathways: Overcoming Radiation Resistance by Targeting DNA Damage Response Pathways. Clin Cancer Res. 2015;21(13):2898–904.

Britton S, Coates J, Jackson SP. A new method for high-resolution imaging of Ku foci to decipher mechanisms of DNA double-strand break repair. J Cell Biol. 2013;202(3):579–95.

Ahnesorg P, Smith P, Jackson SP. XLF interacts with the XRCC4-DNA ligase IV complex to promote DNA nonhomologous end-joining. Cell. 2006;124(2):301–13.

Chapman JR, Taylor MR, Boulton SJ. Playing the end game: DNA double-strand break repair pathway choice. Mol Cell. 2012;47(4):497–510.

Gupta A, Hunt CR, Chakraborty S, Pandita RK, Yordy J, Ramnarain DB, et al. Role of 53BP1 in the regulation of DNA double-strand break repair pathway choice. Radiat Res. 2014;181(1):1–8.

Berry MR, Fan TM. Target-Based Radiosensitization Strategies: Concepts and Companion Animal Model Outlook. Front Oncol. 2021;11:768692.

Sriramulu S, Thoidingjam S, Brown SL, Siddiqui F, Movsas B, Nyati S. Molecular targets that sensitize cancer to radiation killing: From the bench to the bedside. Biomed Pharmacother. 2022;158:114126.

Jessulat M, Malty RH, Nguyen-Tran DH, Deineko V, Aoki H, Vlasblom J, et al. Spindle Checkpoint Factors Bub1 and Bub2 Promote DNA Double-Strand Break Repair by Nonhomologous End Joining. Mol Cell Biol. 2015;35(14):2448–63.

Yang C, Wang H, Xu Y, Brinkman KL, Ishiyama H, Wong ST, et al. The kinetochore protein Bub1 participates in the DNA damage response. DNA Repair (Amst). 2012;11(2):185–91.

Komura K, Inamoto T, Tsujino T, Matsui Y, Konuma T, Nishimura K, et al. Increased BUB1B/BUBR1 expression contributes to aberrant DNA repair activity leading to resistance to DNA-damaging agents. Oncogene. 2021;40(43):6210–22.

Nyati S, Schinske-Sebolt K, Pitchiaya S, Chekhovskiy K, Chator A, Chaudhry N, et al. The kinase activity of the Ser/Thr kinase BUB1 promotes TGF-beta signaling. Sci Signal. 2015;8(358):ra1.

Nyati S, Gregg B, Xu JQ, Young G, Kimmel L, Mukesh N, et al. TGFBR2 mediated phosphorylation of BUB1 at Ser-318 is required for transforming growth factor-beta signaling. Cancer Res. 2019;79(13):3430.

Nyati S, Gregg BS, Xu J, Young G, Kimmel L, Nyati MK, et al. TGFBR2 mediated phosphorylation of BUB1 at Ser-318 is required for transforming growth factor-beta signaling. Neoplasia. 2020;22(4):163–78.

Tang ZY, Shu HJ, Oncel D, Chen S, Yu HT. Phosphorylation of Cdc20 by Bub1 provides a catalytic mechanism for APC/C inhibition by the spindle checkpoint. Mol Cell. 2004;16(3):387–97.

Tang ZY, Sun YX, Harley SE, Zou H, Yu HT. Human Bub1 protects centromeric sister-chromatid cohesion through Shugoshin during mitosis. Proc Natl Acad Sci U S A. 2004;101(52):18012–7.

Yu H, Tang Z. Bub1 multitasking in mitosis. Cell Cycle. 2005;4(2):262–5.

Neve RM, Chin K, Fridlyand J, Yeh J, Baehner FL, Fevr T, et al. A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell. 2006;10(6):515–27.

Hatzis C, Pusztai L, Valero V, Booser DJ, Esserman L, Lluch A, et al. A genomic predictor of response and survival following taxane-anthracycline chemotherapy for invasive breast cancer. JAMA. 2011;305(18):1873–81.

Speers C, Zhao SG, Kothari V, Santola A, Liu M, Wilder-Romans K, et al. Maternal Embryonic Leucine Zipper Kinase (MELK) as a Novel Mediator and Biomarker of Radioresistance in Human Breast Cancer. Clin Cancer Res. 2016;22(23):5864–75.

Zhao SG, Shilkrut M, Speers C, Liu M, Wilder-Romans K, Lawrence TS, et al. Development and validation of a novel platform-independent metastasis signature in human breast cancer. PLoS One. 2015;10(5):e0126631.

Zuris JA, Thompson DB, Shu Y, Guilinger JP, Bessen JL, Hu JH, et al. Cationic lipid-mediated delivery of proteins enables efficient protein-based genome editing in vitro and in vivo. Nat Biotechnol. 2015;33(1):73–80.

Liang X, Potter J, Kumar S, Zou Y, Quintanilla R, Sridharan M, et al. Rapid and highly efficient mammalian cell engineering via Cas9 protein transfection. J Biotechnol. 2015;208:44–53.

Serçin Ö, Reither S, Roidos P, Ballin N, Palikyras S, Baginska A, et al. A solid-phase transfection platform for arrayed CRISPR screens. Mol Syst Biol. 2019;15(12):e8983.

Chien JC, Tabet E, Pinkham K, da Hora CC, Chang JC, Lin S, et al. A multiplexed bioluminescent reporter for sensitive and non-invasive tracking of DNA double strand break repair dynamics in vitro and in vivo. Nucleic Acids Res. 2020;48(17):e100.

So S, Davis AJ, Chen DJ. Autophosphorylation at serine 1981 stabilizes ATM at DNA damage sites. J Cell Biol. 2009;187(7):977–90.

Lu H, Saha J, Beckmann PJ, Hendrickson EA, Davis AJ. DNA-PKcs promotes chromatin decondensation to facilitate initiation of the DNA damage response. Nucleic Acids Res. 2019;47(18):9467–79.

Lu H, Shamanna RA, de Freitas JK, Okur M, Khadka P, Kulikowicz T, et al. Cell cycle-dependent phosphorylation regulates RECQL4 pathway choice and ubiquitination in DNA double-strand break repair. Nat Commun. 2017;8(1):2039.

Speers C, Zhao S, Liu M, Bartelink H, Pierce LJ, Feng FY. Development and Validation of a Novel Radiosensitivity Signature in Human Breast Cancer. Clin Cancer Res. 2015;21(16):3667–77.

DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837–45.

Servant N, Bollet MA, Halfwerk H, Bleakley K, Kreike B, Jacob L, et al. Search for a gene expression signature of breast cancer local recurrence in young women. Clin Cancer Res. 2012;18(6):1704–15.

Goodwin JF, Knudsen KE. Beyond DNA repair: DNA-PK function in cancer. Cancer Discov. 2014;4(10):1126–39.

Yue X, Bai C, Xie D, Ma T, Zhou PK. DNA-PKcs: A Multi-Faceted Player in DNA Damage Response. Front Genet. 2020;11:607428.

Mao Z, Bozzella M, Seluanov A, Gorbunova V. DNA repair by nonhomologous end joining and homologous recombination during cell cycle in human cells. Cell Cycle. 2008;7(18):2902–6.

Dong J, Ren Y, Zhang T, Wang Z, Ling CC, Li GC, et al. Inactivation of DNA-PK by knockdown DNA-PKcs or NU7441 impairs non-homologous end-joining of radiation-induced double strand break repair. Oncol Rep. 2018;39(3):912–20.

Chang HHY, Pannunzio NR, Adachi N, Lieber MR. Non-homologous DNA end joining and alternative pathways to double-strand break repair. Nat Rev Mol Cell Biol. 2017;18(8):495–506.

Kurimasa A, Kumano S, Boubnov NV, Story MD, Tung CS, Peterson SR, et al. Requirement for the kinase activity of human DNA-dependent protein kinase catalytic subunit in DNA strand break rejoining. Mol Cell Biol. 1999;19(5):3877–84.

Neal JA, Sugiman-Marangos S, VanderVere-Carozza P, Wagner M, Turchi J, Lees-Miller SP, et al. Unraveling the complexities of DNA-dependent protein kinase autophosphorylation. Mol Cell Biol. 2014;34(12):2162–75.

Cui X, Yu Y, Gupta S, Cho YM, Lees-Miller SP, Meek K. Autophosphorylation of DNA-dependent protein kinase regulates DNA end processing and may also alter double-strand break repair pathway choice. Mol Cell Biol. 2005;25(24):10842–52.

Chan DW, Chen BP, Prithivirajsingh S, Kurimasa A, Story MD, Qin J, et al. Autophosphorylation of the DNA-dependent protein kinase catalytic subunit is required for rejoining of DNA double-strand breaks. Genes Dev. 2002;16(18):2333–8.

Block WD, Yu Y, Merkle D, Gifford JL, Ding Q, Meek K, et al. Autophosphorylation-dependent remodeling of the DNA-dependent protein kinase catalytic subunit regulates ligation of DNA ends. Nucleic Acids Res. 2004;32(14):4351–7.

Price BD, D’Andrea AD. Chromatin remodeling at DNA double-strand breaks. Cell. 2013;152(6):1344–54.

Harper JW, Elledge SJ. The DNA damage response: ten years after. Mol Cell. 2007;28(5):739–45.

Zagami P, Carey LA. Triple negative breast cancer: Pitfalls and progress. NPJ Breast Cancer. 2022;8(1):95.

Abdulkarim BS, Cuartero J, Hanson J, Deschenes J, Lesniak D, Sabri S. Increased risk of locoregional recurrence for women with T1–2N0 triple-negative breast cancer treated with modified radical mastectomy without adjuvant radiation therapy compared with breast-conserving therapy. J Clin Oncol. 2011;29(21):2852–8.

Gedeon T, Bokes P. Delayed protein synthesis reduces the correlation between mRNA and protein fluctuations. Biophys J. 2012;103(3):377–85.

Liu Y, Beyer A, Aebersold R. On the Dependency of Cellular Protein Levels on mRNA Abundance. Cell. 2016;165(3):535–50.

Greenbaum D, Colangelo C, Williams K, Gerstein M. Comparing protein abundance and mRNA expression levels on a genomic scale. Genome Biol. 2003;4(9):117.

Nie L, Wu G, Zhang W. Correlation between mRNA and protein abundance in Desulfovibrio vulgaris: a multiple regression to identify sources of variations. Biochem Biophys Res Commun. 2006;339(2):603–10.

Tian Q, Stepaniants SB, Mao M, Weng L, Feetham MC, Doyle MJ, et al. Integrated genomic and proteomic analyses of gene expression in Mammalian cells. Mol Cell Proteomics. 2004;3(10):960–9.

Siemeister G, Mengel A, Fernandez-Montalvan AE, Bone W, Schroder J, Zitzmann-Kolbe S, et al. Inhibition of BUB1 Kinase by BAY 1816032 Sensitizes Tumor Cells toward Taxanes, ATR, and PARP Inhibitors In Vitro and In Vivo. Clin Cancer Res. 2019;25(4):1404–14.

Griesbach E, Schlackow M, Marzluff WF, Proudfoot NJ. Dual RNA 3’-end processing of H2A.X messenger RNA maintains DNA damage repair throughout the cell cycle. Nat Commun. 2021;12(1):359.

Morales AG, Pezuk JA, Brassesco MS, de Oliveira JC, de Paula Queiroz RG, Machado HR, et al. BUB1 and BUBR1 inhibition decreases proliferation and colony formation, and enhances radiation sensitivity in pediatric glioblastoma cells. Childs Nerv Syst. 2013;29(12):2241–8.

Chandler BC, Moubadder L, Ritter CL, Liu M, Cameron M, Wilder-Romans K, Zhang A, Pesch AM, Michmerhuizen AR, Hirsh N, Androsiglio M, Ward T, Olsen E, Niknafs YS, Merajver S, Thomas DG, Brown PH, Lawrence TS, Nyati S, Pierce LJ, Chinnaiyan A, Speers C. TTK inhibition radiosensitizes basal-like breast cancer through impaired homologous recombination. J Clin Invest. 2020;130(2):958–73.

Fok JHL, Ramos-Montoya A, Vazquez-Chantada M, Wijnhoven PWG, Follia V, James N, et al. AZD7648 is a potent and selective DNA-PK inhibitor that enhances radiation, chemotherapy and olaparib activity. Nat Commun. 2019;10(1):5065.

Mohiuddin IS, Kang MH. DNA-PK as an Emerging Therapeutic Target in Cancer. Front Oncol. 2019;9:635.

Jiang W, Crowe JL, Liu X, Nakajima S, Wang Y, Li C, et al. Differential phosphorylation of DNA-PKcs regulates the interplay between end-processing and end-ligation during nonhomologous end-joining. Mol Cell. 2015;58(1):172–85.

Quanz M, Chassoux D, Berthault N, Agrario C, Sun JS, Dutreix M. Hyperactivation of DNA-PK by double-strand break mimicking molecules disorganizes DNA damage response. PLoS One. 2009;4(7):e6298.

Wang Y, Xu H, Liu T, Huang M, Butter PP, Li C, et al. Temporal DNA-PK activation drives genomic instability and therapy resistance in glioma stem cells. JCI Insight. 2018;3(3):e98096.

Ho SR, Mahanic CS, Lee YJ, Lin WC. RNF144A, an E3 ubiquitin ligase for DNA-PKcs, promotes apoptosis during DNA damage. Proc Natl Acad Sci USA. 2014;111(26):E2646-55.

Heo J, Park YJ, Kim Y, Lee HS, Kim J, Kwon SH, et al. Mitochondrial E3 ligase MARCH5 is a safeguard against DNA-PKcs-mediated immune signaling in mitochondria-damaged cells. Cell Death Dis. 2023;14(12):788.

Feng M, Wang Y, Bi L, Zhang P, Wang H, Zhao Z, et al. CRL4A(DTL) degrades DNA-PKcs to modulate NHEJ repair and induce genomic instability and subsequent malignant transformation. Oncogene. 2021;40(11):2096–111.

Elango R, Vishnubalaji R, Shaath H, Alajez NM. Transcriptional alterations of protein coding and noncoding RNAs in triple negative breast cancer in response to DNA methyltransferases inhibition. Cancer Cell Int. 2021;21(1):515.

Ziv Y, Bielopolski D, Galanty Y, Lukas C, Taya Y, Schultz DC, et al. Chromatin relaxation in response to DNA double-strand breaks is modulated by a novel ATM- and KAP-1 dependent pathway. Nat Cell Biol. 2006;8(8):870–6.

Nakamura AJ, Rao VA, Pommier Y, Bonner WM. The complexity of phosphorylated H2AX foci formation and DNA repair assembly at DNA double-strand breaks. Cell Cycle. 2010;9(2):389–97.

Neal JA, Dang V, Douglas P, Wold MS, Lees-Miller SP, Meek K. Inhibition of homologous recombination by DNA-dependent protein kinase requires kinase activity, is titratable, and is modulated by autophosphorylation. Mol Cell Biol. 2011;31(8):1719–33.

Neal JA, Meek K. Choosing the right path: does DNA-PK help make the decision? Mutat Res. 2011;711(1–2):73–86.

Blackford AN, Jackson SP. ATM, ATR, and DNA-PK: The Trinity at the Heart of the DNA Damage Response. Mol Cell. 2017;66(6):801–17.

Lu H, Zhang Q, Laverty DJ, Puncheon AC, Augustine MM, Williams GJ, et al. ATM phosphorylates the FATC domain of DNA-PKcs at threonine 4102 to promote non-homologous end joining. Nucleic Acids Res. 2023;51(13):6770–83.

Yuan B, Xu Y, Woo JH, Wang Y, Bae YK, Yoon DS, et al. Increased expression of mitotic checkpoint genes in breast cancer cells with chromosomal instability. Clin Cancer Res. 2006;12(2):405–10.

Dai H, van’t Veer L, Lamb J, He YD, Mao M, Fine BM, et al. A cell proliferation signature is a marker of extremely poor outcome in a subpopulation of breast cancer patients. Cancer Res. 2005;65(10):4059–66.

Chen DL, Cai JH, Wang CCN. Identification of Key Prognostic Genes of Triple Negative Breast Cancer by LASSO-Based Machine Learning and Bioinformatics Analysis. Genes (Basel). 2022;13(5):902.

Takagi K, Miki Y, Shibahara Y, Nakamura Y, Ebata A, Watanabe M, et al. BUB1 immunolocalization in breast carcinoma: its nuclear localization as a potent prognostic factor of the patients. Horm Cancer. 2013;4(2):92–102.

Meek K, Douglas P, Cui X, Ding Q, Lees-Miller SP. trans Autophosphorylation at DNA-dependent protein kinase’s two major autophosphorylation site clusters facilitates end processing but not end joining. Mol Cell Biol. 2007;27(10):3881–90.

Uematsu N, Weterings E, Yano K, Morotomi-Yano K, Jakob B, Taucher-Scholz G, et al. Autophosphorylation of DNA-PKCS regulates its dynamics at DNA double-strand breaks. J Cell Biol. 2007;177(2):219–29.

Download references

Acknowledgements

The authors thank Grahm Valadie for help with animal treatment using SARRP and Katheryn Meek (MSU) for help with data interpretation. We thank the Transgenics and CRISPR (TGEF) core at MSU for help with BUB1 gRNA design and the Histology core (HFH) for immunohistological staining. We thank Pin Li and Sunita Ghosh from Public Health Sciences for statistical analyses. Grant support to SN: NIH/NCI R21 CA252010-01A1, Henry Ford Cancer Institute (HFCI) and Henry Ford Health Research Administration Start Up grant, HFH Proposal Development Award, HFH Near Miss Award, HFH Radiation Oncology Start Up grant, and Game on Cancer award. HFCI Translational Oncology Postdoctoral Fellowship to SS.

This work was supported by NCI R21 (1R21CA252010-01A1), HFHS Research Administration Start up, HFHS Proposal Development Award, HFHS-Radiation Oncology Start Up, and Game on Cancer award to SN. We also thank HFCI for providing a Translational Oncology Postdoctoral Fellowship to SS.

Author information

Sushmitha Sriramulu and Shivani Thoidingjam contributed equally to this work.

Authors and Affiliations

Department of Radiation Oncology, Henry Ford Cancer Institute, Henry Ford Health, 1 Ford Place, Detroit, 5D-42, MI-48202, USA

Sushmitha Sriramulu, Shivani Thoidingjam, Farzan Siddiqui, Stephen L. Brown, Benjamin Movsas, Eleanor Walker & Shyam Nyati

Department of Radiation Oncology, UT Southwestern Medical School, Dallas, TX-75390, USA

Wei-Min Chen & Anthony J. Davis

Department of Surgical Pathology, Henry Ford Cancer Institute, Henry Ford Health, Detroit, MI-48202, USA

Oudai Hassan

Henry Ford Health + Michigan State University Health Sciences, Detroit, MI-48202, USA

Farzan Siddiqui, Stephen L. Brown, Benjamin Movsas, Eleanor Walker & Shyam Nyati

Department of Radiology, Michigan State University, East Lansing, MI-48824, USA

Department of Radiation Oncology, University of Michigan, Ann Arbor, MI-48109, USA

Michael D. Green & Corey Speers

Department of Radiation Oncology, UH Seidman Cancer Center, University Hospitals Case Medical Center, Case Western Reserve University, Cleveland, OH-44106, USA

Corey Speers


Contributions

SN conceived and designed the study; SS and ST performed the experiments; WMC performed laser micro irradiation experiments, CS performed the bioinformatic analysis; SN, SS, and ST wrote the original manuscript; OH analyzed the immunohistological staining of BUB1 on TMA slides; SS, ST, WMC, OH, FS, SLB, BM, MDG, AJD, CS, EW, and SN interpreted the data; SS, ST, WMC, OH, FS, SLB, BM, MDG, AJD, CS, EW, and SN revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Shyam Nyati.

Ethics declarations

Ethics approval and consent to participate

Experimental animals were housed and handled in accordance with protocols approved by IACUC of Henry Ford Health (protocol # 00001298).

Consent for publication

Not applicable.

Competing interests

SS, ST, WMC, OH, SLB, MDG, AJD, SN: No competing interests, FS: Varian Medical Systems Inc - Honorarium and travel reimbursement for lectures and talks, Varian Noona – Member of Medical Advisory Board - Honorarium (no direct conflict), BM: Research support from Varian, ViewRay, and Philips (no direct conflict), CS: Exact Sciences (paid consultant - no direct conflict), EW: Genentech research support for clinical trials.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

13046_2024_3086_moesm1_esm.pdf

Additional file 1:

  • Table S1. List of mutated genes in the TNBC cell lines.
  • Table S2. Guide RNA (gRNA) sequences used to knock out BUB1, primer sequences for PCR amplification of BUB1-edited section, and primer sequence for Sanger sequencing.
  • Table S3. List of antibodies used for Western Blotting/Immunohistochemical/Immunofluorescence studies.
  • Table S4. Primer sequences used in quantitative PCR (qRT-PCR) analysis.
  • Fig. S5. Clonogenic assays using BUB1 siRNA and RT in (A) MDA-MB-468, (B) BT-549, (C) T-47D cell lines. PRKDC siRNA is used as a positive control.
  • Fig. S6. (A) CRISPR-CAS9 RNP transfection method was utilized to knock out BUB1. (B) BUB1 knockouts were confirmed through Immunoblotting in SUM159 and MDA-MB-231 cell lines, followed by (C) PCR amplification and (D) Sanger Sequencing to further validate the BUB1 KOs. (E) Sanger Sequencing chromatograms of SUM159 BUB1 KO #48 and #18, and MDA-MB-231 KO #12 and #15.
  • Fig. S7. (A) Initial radiation dose-response studies in SUM159 tumor xenograft in CB17 SCID mice. SUM159 xenograft mammary fat pad tumors were conformally irradiated at 2.5 Gy × 8, 5 Gy × 3 or 10 Gy × 1 by SARRP (light blue curves). Additionally, mice were treated with a BUB1 inhibitor (25 mg/kg, orally, twice daily, 5 days/week for 4 weeks) along with radiation (red curves). (B) A spaghetti plot for animal body weight change during the treatment.
  • Fig. S8. Immunofluorescence studies using BUB1 siRNA and RT (16 h time point) in (a) SUM159 and (b) MDA-MB-231 cell lines.
  • Fig. S9. qRT-PCR of NHEJ pathway related genes in MDA-MB-231 cell line with BUB1i (top panel) and BUB1 CRISPR-KO #12.
  • Fig. S10. Effect of BUB1 inhibition with IR on DNAPKcs phosphorylation using Immunoblotting in (A) MDA-MB-468 cell line, and (B) shown at different time points up to 2 h.
  • Fig. S11. The effect of BUB1 inhibitor (BAY1816032) on BUB1 protein levels in normal mammary epithelial cell line MCF 10A. The cells were treated for 1 hour with the same doses of BUB1i that were used for the colony formation assays (250 nM, 500 nM and 1000 nM).
  • Fig. S12. mRNA expression plots showing correlation of BUB1 vs. NHEJ pathway-related genes, apoptotic, and proliferation genes in Breast cancer (METABRIC, 2509 samples) from cBIOPORTAL.
  • Fig. S13. qRT-PCR of apoptotic and proliferation genes in MDA-MB-231 cell line with (A) BUB1i and (B) BUB1 CRISPR-KO #12.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Sriramulu, S., Thoidingjam, S., Chen, WM. et al. BUB1 regulates non-homologous end joining pathway to mediate radioresistance in triple-negative breast cancer. J Exp Clin Cancer Res 43, 163 (2024). https://doi.org/10.1186/s13046-024-03086-9

Received: 14 February 2024

Accepted: 30 May 2024

Published: 11 June 2024

DOI: https://doi.org/10.1186/s13046-024-03086-9


  • DNA damage response
  • Radiation sensitization

Journal of Experimental & Clinical Cancer Research

ISSN: 1756-9966


data analysis clinical research


Exploratory data analysis of a clinical study group: Development of a procedure for exploring multidimensional data

Bogumil M. Konopka

1 Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland

Felicja Lwow

2 Department of Health Promotion, Faculty of Physiotherapy University School of Physical Education, Wroclaw, Poland

Magdalena Owczarz

3 Mossakowski Medical Research Centre, Polish Academy of Sciences, Warsaw, Poland

4 International Institute of Molecular and Cell Biology, Warsaw, Poland

Łukasz Łaczmański

5 Hirszfeld Institute of Immunology and Experimental Therapy, Polish Academy of Sciences, Wroclaw, Poland

Associated Data

All relevant data are within the paper and its Supporting Information files.

Thorough knowledge of the structure of the analyzed data makes it possible to form detailed scientific hypotheses and research questions. The structure of data can be revealed with methods for exploratory data analysis. Because of the multitude of available methods, selecting those that work well together and facilitate data interpretation is not an easy task. In this work we present a well-fitted set of tools for a complete exploratory analysis of a clinical dataset and perform a case-study analysis on a set of 515 patients. The proposed procedure comprises several steps: 1) robust data normalization, 2) outlier detection with Mahalanobis (MD) and robust Mahalanobis distances (rMD), 3) hierarchical clustering with Ward’s algorithm, 4) Principal Component Analysis with biplot vectors. The analyzed set comprised elderly patients who participated in the PolSenior project. Each patient was characterized by over 40 biochemical and socio-geographical attributes. Introductory analysis showed that the case-study dataset comprises two clusters separated along the axis of sex hormone attributes. Further analysis was carried out separately for male and female patients. The optimal partitioning of the male set resulted in five subgroups. Two of them were related to diseased patients: 1) diabetes and 2) hypogonadism patients. Analysis of the female set suggested that it was more homogeneous than the male dataset. No evidence of pathological patient subgroups was found. In the study we showed that outlier detection with MD and rMD not only identifies outliers but can also be used to assess the heterogeneity of a dataset. The case study demonstrated that our procedure is well suited for identification and visualization of biologically meaningful patient subgroups.

Introduction

Thorough knowledge of the structure of the analyzed data makes it possible to form detailed scientific hypotheses and research questions. It is crucial for correct interpretation of conducted experiments. This is especially important in investigations where the researcher does not directly control the conditions or the investigated objects, such as clinical or epidemiological studies. Here we present a case-study analysis of a group of 515 elderly participants of an epidemiological study. Participants of clinical studies usually go through a qualification procedure, fill in detailed questionnaires and need to meet requirements regarding biochemical parameters, age, health history etc. Even so, a gathered dataset may still contain individuals who should not take part in the study. Their presence in the dataset may significantly influence the final outcome of the study and lead to false conclusions.

The data structure and basic associations between parameters in the data can be revealed with methods for exploratory data analysis, such as clustering or Principal Component Analysis (PCA). Distance-based data analysis methods (including many types of clustering and PCA) are sensitive to data scaling. Therefore data normalization is often needed. Typically this is performed with Z-score normalization, which assumes a normal distribution of the values of an attribute. The Z-score indicates how many standard deviations an instance of the data is away from the sample mean. Another frequently used method is Min-max normalization, which scales an attribute to a 0–1 range. It is especially useful when the bottom and top values of the attribute are limited, for instance due to experimental design. Both of these normalization techniques are sensitive to outliers. The robust Z-score normalization is a modification of the classic Z-score normalization in which the median is used instead of the mean and the interquartile range is used instead of the standard deviation. These changes minimize the influence of extreme values on the resulting normalization.
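To make the difference between the three normalizations concrete, the following is a minimal sketch in plain Python (the study itself used R). The function names and the sample values are hypothetical, and the quartiles are computed with the "inclusive" method of Python's `statistics.quantiles`, which is an assumption about the quantile convention rather than something stated in the text.

```python
import statistics

def z_score(values):
    """Classic Z-score: centre on the mean, scale by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def min_max(values):
    """Min-max scaling of an attribute to the 0-1 range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def robust_z_score(values):
    """Robust Z-score: centre on the median, scale by the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    med = statistics.median(values)
    return [(v - med) / (q3 - q1) for v in values]

# Hypothetical readings: five ordinary values and one extreme value.
sample = [4.0, 5.0, 6.0, 7.0, 8.0, 200.0]

for name, func in [("z-score", z_score), ("min-max", min_max),
                   ("robust z-score", robust_z_score)]:
    print(name, [round(v, 3) for v in func(sample)])
```

With the classic Z-score and with Min-max scaling, the single extreme value compresses the five ordinary values into a very narrow band, whereas the robust variant keeps them spread out and pushes only the extreme value far from zero.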

Identification of outliers in the dataset is another important step in the analysis. Outliers are instances of data that are characterized by extreme attribute values in comparison to the core of the dataset. An outlier can be defined as an instance that was generated by a different process than the rest of the instances [ 1 ]. Outliers in one-dimensional data can be filtered out with univariate statistical methods [ 2 ]. However, for high-dimensional data more sophisticated methods need to be used. These methods can be divided into 1) model-based approaches, which assume a model of the data: if a data point does not fit the model, it is labelled as an outlier [ 3 ], [ 4 ], 2) proximity-based approaches, which calculate the distance between a data point and all other data: outliers are points that show significantly different distances [ 5 ], [ 6 ], 3) angle-based approaches, which calculate the angles between a data point and all other data: outliers are points that show small fluctuations of angles [ 7 ]. Thorough reviews of outlier detection techniques can be found in [ 8 ], [ 9 ] and [ 10 ].

The structure of pre-processed data can be investigated with clustering techniques. These fall into several main categories: 1) hierarchical clustering, 2) partitioning relocation methods (which include various versions of K-means and K-medoids), 3) density-based partitioning, and 4) grid-based partitioning, which performs segmentation of attribute space and agglomeration of similar segments. For a review see [ 11 ]. Among these, hierarchical clustering offers probably the clearest visualization, i.e. the dendrogram, also called the clustering tree, which allows detailed investigation of every clustering step. That is why it is especially useful in data exploration. Clustering quality can be verified quantitatively with clustering validation indices, such as the Dunn index [ 12 ], the Davies-Bouldin index [ 13 ] or silhouette values [ 14 ].

Data visualization is an extremely important element of exploratory data analysis. It makes it possible to connect facts and form conclusions based on the outcome of other steps of the analysis. A classical method for visualization of multidimensional data is PCA [ 15 ], which reduces the number of dimensions needed to depict a dataset without a significant loss of information. However, this can also be performed with multidimensional scaling [ 16 ] or other nonlinear dimensionality reduction techniques [ 17 ].

As can be seen from this short introduction, a researcher who needs to get to know a new dataset has a plethora of exploratory tools to choose from. Selecting methods that will work together and facilitate revealing the structure of the data is not an easy task. In this work we present a well-fitted set of tools for a complete exploratory analysis of a clinical study dataset. We perform a case-study analysis in which we address the most important questions that need to be asked prior to most studies: are there any significant outliers in the dataset? What subgroups make up the dataset? What are the characteristics of particular subgroups? And finally, what are the biological reasons that underlie such a dataset structure?

Dataset description

The presented analysis is part of a project that aims at investigating the relation between certain polymorphisms of the Vitamin D Receptor gene and sex hormone levels in elderly people. The research sample was chosen from the PolSenior study [ 18 ], a project that investigates the interrelations between health, genetics and social status in advanced age in the Polish population.

The dataset consisted of 515 participants (238 women and 277 men) aged 55–102 years. Each participant was described by 23 numeric and 21 nominal attributes ( S1 Table ). Numeric attributes contain biophysical and biochemical parameters, such as AGE, WEIGHT and BLOOD INSULIN CONCENTRATION. Nominal attributes include socio-geographical data such as COUNTRY REGION, CITY POPULATION, and also SEASON and MONTH. The full list of attributes and their descriptions is given in Tables 1 and 2. The study was approved by the Bioethical Committee of the Medical University of Silesia (KNW-6501-38/I/08) and informed written consent, including consent for genetic studies, was obtained from all of the subjects before testing.

Table 1. Numeric attributes.

Attribute name | Description
AGE | age in years
HEIGHT | height given in [cm]
WEIGHT | weight in [kg]
WAISTLINE | waistline given in [cm]
HIP.GIRTH | hip girth given in [cm]
BMI | body mass index [kg/m²]
FAT | amount of body fat as percentage of body weight [%]
CHOL.HDL | cholesterol serum level, High Density Lipoprotein [mg/dl]
CHOL.LDL | cholesterol serum level, Low Density Lipoprotein [mg/dl]
CHOL.TOTAL | total level of cholesterol [mg/dl]
TGC | serum level of triglycerides [mg/dl]
GLUCOSE | serum glucose level [mg/dl]
INS | serum level of insulin [μIU/ml]
TESTOSTERONE | serum level of testosterone [nmol/l]
ESTRADIOL | serum level of estradiol [pmol/l]
DHEA.S | serum level of dehydroepiandrosterone [ng/dl]
SHGB | serum level of sex hormone binding globulin [pmol/l]
FAI | Free Androgen Index defined as the ratio of total testosterone to SHBG × 100 [ ]
FEI | Free Estradiol Index defined as the ratio of total estradiol to SHBG × 100 [ ]
FSH | serum Follicle-Stimulating Hormone level [IU/l]
ICTP | serum level of carboxy-terminal cross-linked telopeptide of type I collagen [mg/l]
OPG | serum level of osteoprotegerin [pmol/l]
VITAMIN.D | serum level of Vitamin D [ng/ml]

Table 2. Nominal attributes.

Attribute name | Description
AGE.GROUP | age in discretized groups (5 year bins)
CG1.IDENTIFIED.DIABETES.YES | binary; 1 if observed
CITY.SIZE | city size bins: countryside, population < 20 thousand, 20–50 thousand, 50–200 thousand, 200–500 thousand, >500 thousand
HYPERANDROGENISM.YES | binary; 1 if observed
HYPERTENSION.YES | binary; 1 if observed
INSOLATION.YES | binary; 1 if in summer and spring
MACROREGION | 6 binary attributes: ‘north’, ‘east’, ‘south’, ‘central’, ‘north-west’, ‘south-west’
OBESTIY_PHENO_FLMHO | binary; obesity phenotype: metabolic healthy obesity [ ]
OBESTIY_PHENO_FLMONW | binary; obesity phenotype: metabolic obesity normal weight [ ]
OBESTIY_PHENO_FLOMWD | binary; obesity phenotype: obesity metabolic waist disease [ ]
OBSETITY_PHENOOBZM | binary; obesity phenotype: adjustment of FLMHO for Polish population [ ]
OBSETITY_PHENOOZZM | binary; obesity phenotype: adjustment of FLMONW for Polish population [ ]
YEAR_SEASON | 4 binary attributes: ‘winter’, ‘spring’, ‘summer’, ‘autumn’

Data exploration procedure

As mentioned in the introduction: data visualization and clustering are crucial for understanding the data at hand. These were key elements of the procedure proposed in the study. In order to visualize multidimensional data in a two dimensional space, dimension reduction has to be performed. We used PCA which is a classical method, available in most statistical packages. Using PCA requires data scaling, otherwise attributes with highest variance may dominate the outcome. For the same reason outliers need to be detected and removed.

The exploratory analysis was carried out in two stages. First, we conducted the exploratory analysis based on numeric attributes ( Table 1 ) using the following procedure: 1) normalization, 2) Principal Component Analysis, 3) Outlier detection and removal, 4) clustering. After that, clustering was repeated with the nominal/categorical attributes added ( Table 2 ). We performed the analysis in two stages because processing numerical data is more straightforward–most analysis algorithms were designed to treat numerical data. Processing nominal data requires additional actions to transform from the nominal attribute space to a numerical one and the results need to be analyzed with great caution.

Normalization

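All numerical attributes were normalized using Robust Z-Score Normalization ( Eq 1 ):

$$x_i' = \frac{x_i - median(x)}{IQR(x)} \tag{1}$$

where $x_i$ is the value of attribute x for the i-th sample, median(x) is the median of the attribute and IQR(x) is the interquartile range of the attribute. Applying Robust Z-score Normalization ensures that the influence of any potential outliers on the normalization is minimal.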

Principal Component Analysis (PCA)

The base R function prcomp was used for calculation of principal components (PCs). The PC biplot was used for visualization of PCs along with the variability and contributions of the original attributes [ 21 ]. PCA was carried out on normalized data.
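The reason PCA is run on normalized data can be illustrated with a small sketch. The code below is not prcomp; it is a plain-Python approximation that finds only the first principal component by power iteration on the sample covariance matrix. The function name, the iteration count and the toy data are all hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (unit direction, share of total variance) of the first PC."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix (divisor n - 1).
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(m)]
           for a in range(m)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    vec = [1.0] * m
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(vec[a] * sum(cov[a][b] * vec[b] for b in range(m))
                     for a in range(m))
    total_variance = sum(cov[a][a] for a in range(m))
    return vec, eigenvalue / total_variance

# Hypothetical unscaled data: the second attribute has a much larger variance.
unscaled = [(1.0, 100.0), (2.0, 300.0), (3.0, 200.0), (4.0, 400.0)]
direction, share = first_principal_component(unscaled)
print(direction, share)
```

On this unscaled toy data the first component points almost entirely along the second attribute and carries nearly all of the variance, which is the domination effect that normalization is meant to prevent.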

Outlier detection

Two approaches were used to detect outlying samples.

The Mahalanobis Distance [ 4 ] is defined as:

$$MD(x_i) = \sqrt{(x_i - \bar{X})^{T} S_0^{-1} (x_i - \bar{X})} \tag{2}$$

where $x_i$ is the vector of attribute values of the i-th sample, $\bar{X}$ is the m-dimensional vector of attribute means and $S_0$ is the covariance matrix calculated for the whole dataset.
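As an illustration of the Mahalanobis distance, here is a minimal plain-Python sketch restricted to two attributes, so that the covariance matrix can be inverted in closed form. It assumes the usual square-root form of the distance and a sample covariance with divisor n - 1; the function names and the five toy points are hypothetical.

```python
import math

def mean_and_cov_2d(points):
    """Mean vector and sample covariance matrix (divisor n - 1) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def mahalanobis_2d(point, mean, cov):
    """sqrt((x - mean)^T S^-1 (x - mean)) using the closed-form 2x2 inverse."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(0.0, quad))

# Hypothetical toy sample: four corners of a square and its centre.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
mean, cov = mean_and_cov_2d(points)
print([round(mahalanobis_2d(p, mean, cov), 3) for p in points])
```

For this symmetric toy sample the covariance matrix is the identity, so the distances reduce to ordinary Euclidean distances from the mean; with correlated attributes the two would differ.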

The robust Minimum Covariance Determinant (MCD) is a modification of Mahalanobis distance as defined in [ 3 ]. It is also called the robust Mahalanobis Distance (rMD). The MCD algorithm is an iterative procedure. The steps are:

  1. Choose a subset H of size h.
  2. Calculate $\bar{X}_1$ and $S_1$ for samples in H.
  3. Calculate the distance $rMD(x_i)$ for $i = 1, \ldots, n$, with $\bar{X}_1$ and $S_1$ instead of $\bar{X}$ and $S_0$: $$rMD(x_i) = \sqrt{(x_i - \bar{X}_k)^{T} S_k^{-1} (x_i - \bar{X}_k)}, \tag{3}$$ where k is the iteration number.
  4. Sort all samples in terms of $rMD(x_i)$.
  5. Choose a new subset $H_2$ of h samples with the smallest rMD.
  6. Repeat 1–5 until $\det(S_k) = 0$ or $\det(S_k) = \det(S_{k-1})$, where k is the iteration number.

The intuitive difference between MD and rMD is that in the case of MD outliers influence $\bar{X}$ and $S_0$ ( Eq 2 ), while in rMD only a subset of h samples is used for calculating $\bar{X}_k$ and $S_k$, so the influence of outliers on the calculated distances is limited.
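The iterative procedure listed above can be sketched as follows. This is a simplified plain-Python illustration for two attributes, not the chemometrics implementation: it starts from the first h points (an assumed, arbitrary choice of the initial subset H), whereas practical MCD implementations typically try many starting subsets. All names and the toy data are hypothetical.

```python
import math

def fit(points):
    """Mean and sample covariance entries (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def distance(point, mean, cov):
    """Mahalanobis-type distance of a point with respect to (mean, cov)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(0.0, quad))

def mcd_distances(points, h, max_iter=100):
    """Iterate the subset-refinement steps; return rMD for every point.

    Returns None if the very first subset has a singular covariance matrix.
    """
    subset = list(points[:h])                # initial subset H (assumption)
    prev_det, dists = None, None
    for _ in range(max_iter):
        mean, cov = fit(subset)              # mean and covariance of H
        det = cov[0] * cov[2] - cov[1] ** 2
        if det == 0 or det == prev_det:      # stopping rule on det(S_k)
            break
        dists = [distance(p, mean, cov) for p in points]
        order = sorted(range(len(points)), key=lambda i: dists[i])
        subset = [points[i] for i in order[:h]]  # h samples with smallest rMD
        prev_det = det
    return dists

# Hypothetical data: eight core points in the unit square and two outliers.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.2),
          (0.2, 0.7), (0.8, 0.4), (0.4, 0.9), (10.0, 10.0), (12.0, 9.0)]
rmd = mcd_distances(points, h=6)
print([round(d, 2) for d in rmd])
```

Because the mean and covariance are estimated only from the h most central samples, the two distant points do not inflate the covariance estimate and therefore receive much larger distances than any core point.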

Both MD and rMD were calculated using the ‘chemometrics’ R Package [ 22 ].

Hierarchical clustering analysis

The main clustering approach used was hierarchical clustering. It was performed in two steps. First, samples were clustered based only on numerical attributes. Then, nominal attributes were incorporated for a joint cluster analysis. Nominal attributes were binarized and then rescaled, so that 0 and 1 equaled the first and the third quartile of the distribution of all numerical values. This way the center of the data remained unchanged upon addition of nominal attributes. Simultaneously, we performed clustering of attributes. We used hierarchical agglomerative clustering with Ward’s method, which minimizes the change in variance resulting from the fusion of two clusters [ 23 ]. Technically, calculations were carried out with the hclust R function with the “ward.D2” method.
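The merging rule behind Ward's method can be sketched in a few lines. The code below is a naive plain-Python illustration, not hclust: at every step it fuses the two clusters whose merger gives the smallest increase in the within-cluster sum of squares, computed from cluster sizes and centroids. The reported cost is that increase and is not meant to match the merge heights that hclust reports. The function name and the toy points are hypothetical.

```python
def ward_merge_order(points):
    """Agglomerate points with Ward's criterion; return the list of merges.

    Each merge is (sorted member indices of the new cluster, merge cost).
    """
    # Every cluster is stored as (member indices, centroid).
    clusters = {i: ([i], list(p)) for i, p in enumerate(points)}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if a and b fuse.
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * sq
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        ma, ca = clusters.pop(a)
        mb, cb = clusters.pop(b)
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[next_id] = (ma + mb, centroid)
        merges.append((sorted(ma + mb), cost))
        next_id += 1
    return merges

# Hypothetical toy data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (4.0, 8.0)]
for members, cost in ward_merge_order(points):
    print(members, round(cost, 2))
```

The sequence of merges is the information a dendrogram displays; cutting it before the last k - 1 merges yields k clusters.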

Dunn [ 12 ] and Davies–Bouldin [ 13 ] indices were used to support this cluster analysis and to indicate the proper number of clusters. The indices were calculated using the ‘clv’ R Package [ 24 ].

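The Dunn index is defined as:

$$D = \min_{1 \le i < j \le nc} \left\{ \frac{d(c_i, c_j)}{\max_{1 \le k \le nc} diam(c_k)} \right\}$$

where nc denotes the number of clusters, $c_i$ is the i-th cluster, $d(c_i, c_j)$ is the dissimilarity between clusters i and j, and diam(c) is a function used for assessing the dispersion of a cluster.

The Davies-Bouldin index is calculated as:

$$DB = \frac{1}{nc} \sum_{i=1}^{nc} \max_{j \ne i} R_{ij}$$

where $R_{ij}$ measures the relation between each pair of clusters, defined as:

$$R_{ij} = \frac{diam(c_i) + diam(c_j)}{d(c_i, c_j)}$$

where $d(c_i, c_j)$ is the dissimilarity between clusters i and j, and diam(c) is a function used for assessing the dispersion of a cluster.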

During calculation of Dunn and DB indices we chose diam(c) to be the average distance between cluster members and cluster centroids, and d(c i , c j ) to be the distance between centroids of compared clusters. The choice was implied by the fact that Ward’s clustering algorithm minimizes the within-cluster variance which is defined as the average distance between cluster members and cluster centroids, and also maximizes the inter-cluster variance which is based on centroid locations [ 23 ]. Therefore, such a choice of measures for Dunn and DB gives the best insight into the outcome of clustering.
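A minimal plain-Python sketch of both indices, using the choices described above (diam(c) as the average distance of cluster members to the centroid, d as the distance between centroids), might look like this. It assumes the standard forms of the Dunn and Davies-Bouldin indices and is not the clv implementation; the function names and toy partitions are hypothetical.

```python
import math

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    m = len(cluster[0])
    return [sum(p[j] for p in cluster) / len(cluster) for j in range(m)]

def diam(cluster):
    """Average distance between cluster members and the cluster centroid."""
    c = centroid(cluster)
    return sum(math.dist(p, c) for p in cluster) / len(cluster)

def dunn_index(clusters):
    """Smallest centroid separation divided by the largest dispersion."""
    cents = [centroid(c) for c in clusters]
    nc = len(clusters)
    min_sep = min(math.dist(cents[i], cents[j])
                  for i in range(nc) for j in range(i + 1, nc))
    return min_sep / max(diam(c) for c in clusters)

def davies_bouldin_index(clusters):
    """Mean over clusters of the worst (diam_i + diam_j) / d_ij ratio."""
    cents = [centroid(c) for c in clusters]
    diams = [diam(c) for c in clusters]
    nc = len(clusters)
    total = 0.0
    for i in range(nc):
        total += max((diams[i] + diams[j]) / math.dist(cents[i], cents[j])
                     for j in range(nc) if j != i)
    return total / nc

# Hypothetical example: the same four points partitioned well and badly.
good = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]
print(dunn_index(good), davies_bouldin_index(good))
print(dunn_index(bad), davies_bouldin_index(bad))
```

A compact, well-separated partition gives a high Dunn index and a low Davies-Bouldin index, so the two are read in opposite directions when comparing candidate numbers of clusters.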

Additional cluster analysis

Hierarchical clustering analysis of the male set was additionally supported with three other clustering techniques: 1) density-based DBSCAN clustering [ 25 ], 2) clustering based on PCs and 3) biclustering, in order to verify the main conclusions.

Density Based clustering depends on two input parameters, i.e. number of neighbors required to start a new cluster– K , and the distance defining the neighborhood of a point– epsilon . K was set to 3 based on visual inspection of the dataset, while epsilon was set to 4 based on k Nearest Neighbor Distance plot (see Results ). The choice was the y-value beyond which the distances increased rapidly. We used the DBSCAN R package implementation of the algorithm [ 26 ].
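The role of the two DBSCAN parameters can be seen in a compact plain-Python sketch of the algorithm. This is an illustration only, not the R package used in the study. Here `min_pts` plays the role of K and counts the point itself among its neighbours, which is an assumption about the counting convention; the parameter values and toy points below are hypothetical and unrelated to the K = 3 and epsilon = 4 used on the study data.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbours(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later become a border point
            continue
        cluster_id += 1                  # i is a core point: start a cluster
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id   # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:     # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels

# Hypothetical data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

Points that are not within `eps` of enough neighbours and cannot be reached from any core point stay labelled as noise, which is why density-based clustering also serves as a check on outliers.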

PCA-based clustering was performed on the top 7 PCs, which accounted for 70% of the data variance. The same routine as for the main hierarchical clustering was used, i.e. Euclidean distance and Ward’s algorithm as implemented in the R stats package.
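Selecting the number of leading PCs that reach a given share of the variance follows directly from the PC standard deviations. The Python sketch below shows the arithmetic only; the function name and the example standard deviations are hypothetical and are not values from the study.

```python
def n_components_for(std_devs, target=0.70):
    """Smallest number of leading PCs whose cumulative variance share reaches `target`."""
    variances = [s * s for s in std_devs]  # variance is the squared standard deviation
    total = sum(variances)
    running = 0.0
    for n, v in enumerate(variances, start=1):
        running += v
        if running / total >= target:
            return n
    return len(variances)


# Hypothetical PC standard deviations, sorted in decreasing order.
toy_sd = [3.0, 2.0, 1.0, 1.0, 1.0]
print(n_components_for(toy_sd, target=0.70))
```

With these toy values the first PC explains 9/16 of the variance and the first two explain 13/16, so two components are enough to pass the 70% mark.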

The biclustering approach used was Plaid Models clustering [ 27 ], which identifies subsets of rows and columns with coherent values. In the analyzed dataset, those subsets can be regarded as subgroups of patients showing a similar dependence between particular attributes. The biclust package implementation of the algorithm was used [ 28 ].

Statistical testing

The significance of differences between all clusters in terms of particular attributes was first tested with the Kruskal-Wallis test [ 29 ], with the null hypothesis (H 0 ) that the distributions are the same in all groups. Then pairwise Wilcoxon rank sum tests with Bonferroni correction were used to evaluate the significance of head-to-head differences. Both are non-parametric tests available in the base R {stats} package.
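The Bonferroni step applied to the head-to-head tests amounts to multiplying each raw p-value by the number of comparisons. The Python sketch below shows only that adjustment, assuming five groups as an example; the raw p-values are invented for illustration and the rank sum tests themselves are not reimplemented here.

```python
from itertools import combinations


def bonferroni(p_values):
    """Multiply each raw p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Five groups give 5 * 4 / 2 = 10 head-to-head comparisons.
pairs = list(combinations(range(1, 6), 2))
# Hypothetical raw p-values, one per pair (not taken from the study).
raw = [0.001, 0.004, 0.02, 0.03, 0.04, 0.05, 0.2, 0.5, 0.8, 0.9]
adjusted = dict(zip(pairs, bonferroni(raw)))
significant = [pair for pair, p in adjusted.items() if p < 0.05]
print(significant)
```

Only the two smallest raw p-values stay below 0.05 after adjustment, which shows how conservative the correction becomes as the number of pairwise comparisons grows.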

Results & discussion

Introductory analysis

First, raw data were normalized using the robust Z-score normalization, and then PCA was carried out. The plot of the first two components shows that there are significant outliers in the data set ( Fig 1A ). The first component clearly dominates the remaining ones ( Fig 1B ). The main contribution to the first component comes from the INSULINE level (data not shown) due to increased variability caused by outliers. The MD vs rMD plot shows that the majority of the data forms a core ( Fig 1C , grey points) and also confirms the presence of significantly outlying samples ( Fig 1C , red points).

Fig 1. A) PCA carried out on the full dataset. B) Standard deviations of the first 10 PCs indicate that the first PC dominates the variability of the dataset. C) The MD vs rMD plot allows identification of the most distant outliers (red points). D) PCA carried out after removal of the most distant samples shows that male and female patients form two distinct clusters.

To get an overall look at the core of the data, we removed the most distant outliers using arbitrarily set thresholds of 6.5 for MD and 15 for rMD ( Fig 1C , dashed lines). The thresholds were selected so that only the core of the data remained.

The plot of the first two components, calculated after removing outlying points, reveals that samples are grouped in two clusters, consisting of male and female patients respectively ( Fig 1D ). The biplot [ 1 ] visualizes the contributions of the original attributes to particular PCs in the form of vectors. For instance, if a patient had a higher than average level of ESTRADIOL, then in the PCA with biplot vectors he/she would be moved away from the center of the plot in the direction of the ESTRADIOL vector. It can be seen that the two clusters are separated along an axis formed by attributes such as ESTRADIOL, TESTOSTERONE, FEI, FAI and FSH, which are sex-hormone-related attributes ( Fig 1D , red vectors). Such strong separation suggests that further analysis should be carried out separately for male and female patients. The position of particular samples in Fig 1D is also strongly influenced by a group of attributes perpendicular to the sex hormone axis. These attributes are generally related to metabolism, such as GLUCOSE, INSULINE, FAT and WEIGHT. The fact that these attributes are perpendicular to the sex hormone axis suggests that they are unrelated to patient sex.

Male set analysis

In the first part of male set analysis all 277 male patients with all 23 numeric attributes from the raw dataset were analyzed. Again robust Z-score normalization was performed.
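As a reminder of what a robust Z-score does, the sketch below centres values on the median and scales them by the median absolute deviation (MAD). This is the common textbook form and is an assumption here; the exact definition used in the study is given in its Methods section and may differ, for example in the scaling constant. The function name and the toy values are hypothetical.

```python
from statistics import median


def robust_z_scores(values, scale=1.4826):
    """Centre on the median and scale by the median absolute deviation (MAD).

    The default constant 1.4826 makes the MAD comparable to a standard
    deviation for normally distributed data; it is an assumed convention.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; the robust Z-score is undefined")
    return [(v - med) / (scale * mad) for v in values]


# Hypothetical attribute with one extreme value.
toy_values = [1.0, 2.0, 3.0, 4.0, 100.0]
print(robust_z_scores(toy_values, scale=1.0))
```

Because the median and the MAD barely react to the extreme value, the regular observations keep small scores while the extreme one stands out clearly, which is the property that makes this normalization suitable before outlier analysis.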

According to MD there are 22 outliers in the dataset. These points clearly stand out from the rest of the set in terms of MD values ( Fig 2A , red points). In terms of rMD there are many more candidate outliers, i.e. 124 samples. Both measures are consistent with regard to MD outliers: all samples flagged as outliers by the classic MD were also outliers in terms of rMD and, moreover, were among the points with the highest rMD values ( Fig 2B , red points). The fact that rMD flagged almost half of the dataset as outliers may suggest that the set is heterogeneous.

Fig 2. A) According to classic MD, B) according to rMD. Outliers according to MD are colored red in both plots. The dashed line denotes the 0.99 quantile threshold of the Chi2 distribution used for flagging outliers.

The MD vs rMD plot reveals that the data can be divided into three groups: 1) 155 samples that form the core of the set ( Fig 3 , gray points), 2) 100 samples that are rMD outliers only ( Fig 3 , blue points) and 3) 22 samples that are outliers according to both MD and rMD ( Fig 3 , red points marked blue). This shows that the classic MD is more conservative in marking outliers than the rMD. Both MD and rMD measure the distance of data points from the data center. However, while MD uses all points to determine the location of the data center, rMD uses only the subset of points closest to the center (see Methods for more details). If a dataset consists of two subsets of points, then rMD may use only one of them to determine the center of the data (this depends on the sizes of the subsets). In such a situation, points from the other subset may be seen as outliers in terms of rMD. That is why this measure can be used to assess whether the set is homogeneous or heterogeneous.
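The three-way split described above can be expressed as a simple rule on the two distances. The Python sketch below assumes that MD and rMD values and their cut-offs are already available; the function name, the distances and the cut-off values are hypothetical and only illustrate the logic.

```python
from collections import Counter


def classify(md, rmd, md_cut, rmd_cut):
    """Label one sample from its classic (MD) and robust (rMD) distances."""
    if md > md_cut:
        return "MD and rMD outlier" if rmd > rmd_cut else "MD outlier only"
    return "rMD outlier only" if rmd > rmd_cut else "core"


# Hypothetical (MD, rMD) pairs and cut-offs, not values from the study.
toy_samples = [(2.0, 3.0), (3.0, 9.0), (8.0, 20.0), (1.5, 2.5)]
counts = Counter(classify(md, rmd, md_cut=6.0, rmd_cut=6.0)
                 for md, rmd in toy_samples)
print(counts)
```

In the male set the three reported groups hold 155, 100 and 22 samples, which together account for all 277 patients; the "MD outlier only" label stays empty there because every MD outlier was also an rMD outlier.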

Fig 3. Outliers are marked with blue and red points for rMD and MD respectively. All MD outliers are also rMD outliers.

Hierarchical clustering

We performed two rounds of clustering: 1) clustering of attributes–attributes were treated as instances and patients were treated as attributes, 2) clustering of patients—patients were treated as instances and their parameters were treated as attributes.

Clustering of attributes showed that there are three main groups of parameters ( Fig 4A —top panel), i.e. age-related parameters (FSH, SHGB, ICTP, AGE, OPG), cholesterol and sex-hormone related parameters (including TESTOSTERONE, ESTRADIOL, DHEA), and metabolism related parameters (such as FAT, WEIGHT, BMI, GLUCOSE and INSULINE). This division was also confirmed in the PCA biplot, which depicts three groups of attribute vectors pointing in similar directions ( Fig 5A ). These three groups correspond well to groups revealed by clustering.

Fig 4. A) Top panel: attribute clustering tree; left panel: patient clustering tree; central panel: dataset heatmap; branch length is proportional to the distance between clusters. B) Davies-Bouldin index for patient partitioning into 2–10 clusters. C) Dunn index for patient partitioning into 2–10 clusters.

Fig 5. A) In PCA biplot, B) MD vs rMD metrics.

The patient clustering tree is presented in Fig 4A (left panel). The acquired partitioning was validated using the Davies-Bouldin (DB) and Dunn indices at different tree cut levels, i.e. divisions into 2 to 10 clusters were analyzed. Neither the DB nor the Dunn index clearly indicated which cluster partitioning is the most appropriate ( Fig 4B and 4C ). For the DB index, good partitioning is indicated by small values. As depicted in Fig 4B , the DB index decreases as the number of clusters increases, with a local minimum at the division into 5 groups. For the Dunn index, good partitioning is indicated by high values. The highest values can be observed for partitioning into 2 and 3 clusters. However, a local maximum can be observed at the division into 5 groups ( Fig 4C ). Since both indices emphasized clustering into 5 groups, this partitioning is analyzed in greater detail.
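The reasoning used here, looking for a cluster count where a local minimum of DB coincides with a local maximum of Dunn, can be sketched in Python as below. The index values are invented to mimic the qualitative shape described in the text and are not read from Fig 4; the function names are hypothetical.

```python
def local_minima(scores):
    """Cluster counts whose score is lower than both neighbouring counts."""
    ks = sorted(scores)
    return [k for a, k, b in zip(ks, ks[1:], ks[2:])
            if scores[k] < scores[a] and scores[k] < scores[b]]


def local_maxima(scores):
    """Cluster counts whose score is higher than both neighbouring counts."""
    ks = sorted(scores)
    return [k for a, k, b in zip(ks, ks[1:], ks[2:])
            if scores[k] > scores[a] and scores[k] > scores[b]]


# Hypothetical index values for partitions into 2..10 clusters.
db = {2: 1.90, 3: 1.80, 4: 1.70, 5: 1.50, 6: 1.60,
      7: 1.58, 8: 1.55, 9: 1.52, 10: 1.50}
dunn = {2: 0.90, 3: 0.85, 4: 0.50, 5: 0.60, 6: 0.45,
        7: 0.40, 8: 0.38, 9: 0.35, 10: 0.30}
agreed = sorted(set(local_minima(db)) & set(local_maxima(dunn)))
print(agreed)
```

Comparing neighbouring cluster counts rather than global extremes is what lets a partitioning such as 5 groups stand out even when the global optimum of each index lies elsewhere.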

Partitioning the set into 5 groups results in two large clusters, cl #1 and cl #5, of 89 and 80 samples respectively, and three smaller clusters: cl #2 (24 samples), cl #3 (24 samples) and cl #4 (38 samples). According to the MD and rMD metrics, clusters #1, #2 and #5 form the core of the data, while clusters #3 and #4 deviate from the core and form the majority of rMD outliers ( Fig 5B ).

The significance of differences between all clusters in terms of particular attributes was tested first with the Kruskal-Wallis test [ 29 ] and then with pairwise Wilcoxon rank sum tests with Bonferroni correction. Fig 6 shows the p-values of the all-vs-all Wilcoxon tests.

Fig 6. Values in red denote p-values.

Cluster #3 is characterized by significantly elevated levels of INSULINE and GLUCOSE. This is clearly visible in the clustering heatmap as a bright area in the INS and GLUCOSE columns ( Fig 4A ). In the PCA biplot, members of the cluster are localized far away from the center of the dataset along the INS and GLUCOSE vectors ( Fig 5A ). The significance of the difference between members of cluster #3 and members of other clusters was confirmed by statistical tests ( Fig 6 ). We suspect this cluster may be a group of putative diabetes patients. Cluster #4 is characterized by exceptionally high levels of FSH and ICTP hormones, which are accompanied by low TESTOSTERONE and decreased ESTRADIOL levels. The group is also characterized by greater AGE values. The FAI and FEI attributes are also low in this group of patients; however, this was expected since TESTOSTERONE and FAI, as well as ESTRADIOL and FEI, are related attributes. In the PCA biplot ( Fig 5A ), members of cluster #4 are localized far away from the center of the dataset along the FSH and ICTP vectors. High FSH and a low serum level of TESTOSTERONE may indicate that these patients suffer from primary hypogonadism [ 30 ].

The core of the data in terms of MD and rMD is formed by clusters #1, #2 and #5. Cluster #2 is the smallest of them. As shown by the dendrogram ( Fig 4A , left panel), it is closely related to cluster #5, with the main difference between them being the elevated levels of cholesterol (CHOL.LDL, CHOL.HDL, and CHOL.TOTAL). Members of both clusters are characterized by relatively high TESTOSTERONE levels.

The largest clusters, #1 and #5, are hard to characterize since they form a reference point for describing the remaining clusters. The main difference between them comes from metabolism-related attributes: WEIGHT, WAISTLINE, BMI, HIP.GIRTH, FAT, TGC, INS, GLUCOSE. This can be observed in the clustering heatmap as a darker patch in the region of cluster #5 ( Fig 4A ). The difference became more evident after the addition of categorical data, which included metabolic phenotype classifications (see next section). The clusters also differ in terms of SHGB, FEI and FAI levels. In the PCA biplot, members of cluster #5 are shifted in the direction opposite to the one pointed to by the metabolic attributes ( Fig 5A ) and also towards the SHGB direction. The latter confirms higher SHGB values in this cluster. Interestingly, members of both largest clusters can be found not only in the core of the data but also in the rMD outlier group ( Fig 5B ), which means that further division might reveal some interpretable subgroups.

Addition of categorical data

Categorical attributes were transformed to binary attributes and scaled as described in the Methods section. Hierarchical clustering with Ward’s algorithm was repeated. The Davies-Bouldin and Dunn clustering validation indices both indicated division into three clusters as the most appropriate partitioning (data not shown). Two of the clusters could be easily identified as the outlier clusters #3 (aberrant GLUCOSE and INS levels) and #4 (aberrant FSH and ICTP) from the numerical attribute clustering analysis. The third cluster forms the core of the data, which includes clusters #1, #2 and #5 ( Fig 7 , left panel). Obesity phenotype attributes present in the set of categorical attributes confirmed that the main difference between cluster #1 and clusters #2 and #5 is related to metabolism: a dark patch in OBESITY_PHENOOZZM and OBESTITY_PHENO_FLOMWD and a light patch in OBESITY_PHENO_FLMONW ( Fig 7 , heatmap).


Other clustering approaches

Applying additional, methodologically divergent approaches may strengthen the final conclusions or suggest alternative viewpoints. We supported the main clustering analysis with three alternative approaches: density-based DBSCAN clustering, hierarchical clustering based on the top principal components, and biclustering focused on the identification of coherent values. While the PC-based clustering and biclustering approaches led to conclusions consistent with those already presented, the density-based approach was unable to uncover the underlying structure of the data. The majority of samples fell into a single cluster and only a few marginal samples were marked as noise (see S1B Fig ). Most probably this is because the subgroups overlap and are characterized by similar point densities, which makes them hard to separate with the DBSCAN algorithm. However, the method was successfully applied to support outlier detection. When we ran the algorithm on the dataset containing outliers, it marked 31 samples as noise. All of them were also marked as outliers by either the MD or the rMD distance ( S1 Table ).

In contrast to DBSCAN clustering, the clustering based on the top 7 PCs, which accounted for 70% of the data variance, resulted in a partitioning very similar to the one acquired by the main clustering approach ( S2 Fig ).

Finally, the main conclusions were also supported by the outcome of the biclustering plaid model analysis. All significant clusters and relations were found. However, the clusters were smaller and the outcomes were subject to some randomness due to the nature of the clustering algorithm ( S3A–S3D Fig ).

Female set analysis

The female set was analyzed using the same methodology that was applied in the male set analysis. The set included 238 patients with 23 numeric attributes. Data were normalized with the robust Z-score normalization, then outlier analysis was carried out with the MD and robust MD distances, and finally we performed hierarchical clustering analysis supported with the DB and Dunn clustering validation indices.

Outlier analysis in the female set indicates 70 robust MD outliers and 20 MD outliers. All MD outliers were also robust MD outliers. The robust MD vs MD plot differs significantly from the plot acquired in the male set analysis: points are more condensed and cannot easily be divided into subgroups ( Fig 8 ). Although there are many outliers according to rMD, it seems that only a few of them are actual outliers. The majority of rMD outliers remain quite close to the core of the dataset in terms of MD. This suggests that the female dataset is more homogeneous than the male dataset.

Fig 8. MD and rMD are consistent: most points lie on a straight line.

Hierarchical clustering of attributes confirmed the division revealed in male set analysis, i.e. three attribute groups were identified: age-related parameters (FSH, SHGB, ICTP, AGE, OPG), cholesterol and sex-hormone related parameters (including TESTOSTERONE, ESTRADIOL, DHEA), and metabolism related parameters ( Fig 9A –top panel). The HDL Cholesterol level was an exception–in this analysis it is part of the age related attribute group.

Fig 9. (A) The violet line and labels on the dendrogram denote the best partitioning according to cluster validation indices. The Davies-Bouldin index (B) and Dunn index (C) indicate that partitioning the set into 2 or 3 clusters is the best choice for further analysis (low DB and high Dunn values).

According to the DB and Dunn indices, the optimal division of female patients includes two or three groups ( Fig 9B and 9C ). We analyzed the three-cluster division as it is more informative. In this case cluster #1 consists of 71 patients. These patients are characterized by low values of metabolic parameters ( Fig 9A , heatmap, dark patch in GLUCOSE, INS, FAT and others) and elevated levels of SHGB, FSH and CHOL.HDL. Cluster #2 groups 131 patients. It forms the core of the dataset and probably represents the majority of the population. Finally, cluster #3 is a cluster of 14 patients with high levels of metabolic parameters (GLUCOSE, INS, FAT and others) but also elevated levels of TESTOSTERONE and ESTRADIOL.

The biplot visualization of the data is consistent with both the clustering of attributes and the clustering of patients. The contributions of particular attributes to the PCs confirm the relations between parameters: metabolic parameters and hormone-related parameters form two well distinguishable groups of similarly pointing vectors. The third group is more diverse, but the subgroups are correct, i.e. OPG, AGE and ICTP form one group and FSH, SHGB and CHOL.HDL form a second group of vectors ( Fig 9 red arrows). The distribution of patients in the biplot is also consistent with the clustering. Members of cluster #1 are localized in the region pointed to by the SHGB, FSH and CHOL.HDL vectors, opposite to the direction of the metabolic attributes. Members of cluster #2 are in the center of the plot, while members of cluster #3 are shifted away from the origin, mainly in the direction of the metabolic attributes.

Overall, the PCA plot of the female set is more homogeneous than the PCA plot of the male set ( Fig 10 ). Samples are more evenly distributed around the origin, whereas in the male set subgroups could be easily distinguished. This suggests that in the female set there are no pathological groups of patients that could be recognized based on the set of attributes at hand. However, there are still some patients that should be investigated and verified prior to including them in further studies (for instance, the three patients in cluster #3 furthest away from the origin).

Fig 10. Patients form a quite condensed cloud of points (with just a few exceptions). The clusters result from natural biological variation rather than from pathologies.

In this work we presented an exploratory data analysis of a clinical study group. Each patient was described by over 40 numerical and nominal attributes. The aim of the study was to reveal the structure of the data, i.e. to verify whether the population of patients is homogeneous or whether subpopulations are present. We also wanted to characterize the identified subgroups and to investigate basic relations between attributes. The analysis was performed with a set of methods that were specially selected to work well together. First, a robust normalization technique was used. Then MD-based outlier detection, hierarchical clustering with Ward’s algorithm and PCA visualization were performed. Since all these methods take into account the correlation and variance of data attributes, their outcomes were consistent. We have shown that the MD/rMD analysis not only identifies outliers but can also be used to assess the heterogeneity of a dataset. PCA together with the biplot made it possible to characterize data instances and explain the acquired clustering. The analysis was additionally supported by three alternative clustering approaches, which strengthened the main conclusions and contributed to a better understanding of the data.

Several important biological conclusions can be drawn. The study showed significant differences between male and female patients. In the male set we identified five distinct patient groups, two of which were recognized as clusters of putatively diseased patients. This structure should be taken into account in further analysis. One should consider testing scientific hypotheses separately in each of the identified subgroups. Depending on the aims of the subsequent investigation, some of the groups should be removed or treated in a special way.

The female set was more homogeneous than the male set, and the clusters we identified were not recognized as pathological. Still, one might consider performing further investigations separately in the identified subgroups.

Neglecting the existence of patient subgroups might make it impossible to reveal important biological phenomena or, in the worst case, lead to false conclusions.

Supporting information

S1 Fig. A) The parameters chosen for clustering were K = 3 neighbors and epsilon = 4 (based on the elbow method). B) Density clustering failed to confirm the structure of the data revealed by hierarchical clustering but managed to mark marginal points (zeros) and could be used for outlier detection.

S2 Fig. Most importantly, clusters of patients with high levels of FSH or GLUCOSE/INSULIN were found (blue and green clusters respectively).

S3 Fig. The analysis resulted in identifying the two important outlier clusters: A) the cluster with elevated INSULIN and GLUCOSE levels and B) patients with elevated FSH levels. In addition, two other patient subgroups were found: C) one showing a dependence of hormone- and cholesterol-related attributes and D) a group of patients with simultaneously elevated SHGB and CHOL.HDL levels.

Acknowledgments

Data analyzed in the case-study were gathered in the PolSenior study. We thank all people engaged in the project. In particular we would like to thank prof. Andrzej Milewicz, prof. Malgorzata Mossakowska, prof. Monika Puzianowska-Kuznicka, prof. Ewa Bar-Andziak, prof. Jerzy Chudek.

We would like to thank Dr. Jean-Christophe Nebel for his valuable comments and discussion during preparation of the manuscript.

Funding Statement

The project was partly supported by Wroclaw Centre of Biotechnology through the programme The Leading National Research Centre (KNOW) for years 2014-2018. BMK would like to acknowledge the funding from the statutory fund of the Department of Biomedical Engineering, Wroclaw University of Science and Technology. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data Availability

IMAGES

  1. The Statistician’s view of a Clinical Trial

    data analysis clinical research

  2. What are the tools for data analysis in research

    data analysis clinical research

  3. Fundamentals to Improve Data Quality in Clinical Trials

    data analysis clinical research

  4. Data Management in Clinical Trials

    data analysis clinical research

  5. data analysis in clinical trials ppt

    data analysis clinical research

  6. The Future of Clinical Trial Data Management

    data analysis clinical research

VIDEO

  1. Lecture 10 : An overview of NGS technology

  2. Statistical Aspects of Bioequivalence Studies: Insights & Analysis

  3. How to search or find the clinical trials for research

  4. Information Resources for Clinical Research, 5 of 5

  5. How I perform data analysis for medical research

  6. Introduction to Clinical SAS

COMMENTS

  1. Planning and Conducting Clinical Research: The Whole Process

    Clinical research can be completed in two major steps: study designing and study reporting. Three study designs should be planned in sequence and iterated until properly refined: theoretical design, data collection design, and statistical analysis design.

  2. A practical guide to data analysis in general literature reviews

    This article is a practical guide to conducting data analysis in general literature reviews. The general literature review is a synthesis and analysis of published research on a relevant clinical issue, and is a common format for academic theses at the bachelor's and master's levels in nursing, physiotherapy, occupational therapy, public health and other related fields.

  3. Design, data analysis and sampling techniques for clinical research

    Statistical analysis is an essential technique that enables a medical research practitioner to draw meaningful inference from their data analysis. Improper application of study design and data analysis may render insufficient and improper results and conclusion. Converting a medical problem into a statistical hypothesis with appropriate ...

  4. Fundamentals of Clinical Data Science

    This open access book comprehensively covers the fundamentals of clinical data science, focusing on data collection, modelling and clinical applications. Topics covered in the first section on data collection include: data sources, data at scale (big data), data stewardship (FAIR data) and related privacy concerns. Aspects of predictive modelling using techniques such as classification ...

  5. Data Management for Clinical Research

    There are 6 modules in this course. This course presents critical concepts and practical methods to support planning, collection, storage, and dissemination of data in clinical research. Understanding and implementing solid data management principles is critical for any scientific domain. Regardless of your current (or anticipated) role in the ...

  6. Clinical Data Science Specialization [6 courses] (CU)

    Clinical Data Science Specialization. Launch your career in Clinical Data Science. A six-course introduction to using clinical data to improve the care of tomorrow's patients. Taught in English. 21 languages available. Some content may not be translated. Instructors: Laura K. Wiley, PhD. +1 more. Enroll for Free.

  7. Rethinking clinical study data: why we should respect analysis ...

    As a repercussion, the scientific process cycle is broken, leaving researchers who want to reuse prior results with three options: 1. Re-run the analysis if the code and original source data are ...

  8. Essentials of data management: an overview

    While data management has broad applications (and meaning) across many fields and industries, in clinical research the term data management is frequently used in the context of clinical trials. 1 ...

  9. Understanding Clinical Data Analysis

    Four textbooks complementary to the current production and written by the same authors are Statistics applied to clinical studies 5th edition, 2012, Machine learning in medicine a complete overview, 2015, SPSS for starters and 2nd levelers 2nd edition, 2015, Clinical Data Analysis on a Pocket Calculator 2nd edition, 2016, all of them edited by ...

  10. Understanding Clinical Research: Behind the Statistics

    Here we'll provide an intuitive understanding of clinical research results. So this isn't a comprehensive statistics course - rather it offers a practical orientation to the field of medical research and commonly used statistical analysis. ... If a research question is evaluated through the collection of data points and statistical analysis ...

  11. Clinical Research Analytics

    Use clinical data mapping to easily store AI-powered transformation rules in a central database in alignment with actual trial data and CDISC data standards metadata. Clinical data transparency Use our industry-leading clinical data transparency to share clinical research with external researchers for secondary analysis and advancement of new ...

  12. PDF Data Management Considerations for Clinical Trials

    7. Understand the reasons for performing research that is reproducible from data collection through publication of results. 9. Distinguish between variable types (e.g. continuous, binary, categorical) and understand the implications for selection of appropriate statistical methods. Extensively covered by required coursework.

  13. An overview of commonly used statistical methods in clinical research

    In order to interpret research datasets, clinicians involved in clinical research should have an understanding of statistical methodology. This article provides a brief overview of statistical methods that are frequently used in clinical research studies. Descriptive and inferential methods, including regression modeling and propensity scores ...

  14. PDF Effective Data Management and Analysis in Clinical Trials

    approaches in clinical trial data analysis. Adaptive designs, Bayesian methods, and machine learning SJIF Impact Factor 6.222 Review Article ISSN 2394-3211 EJPMR ... flexible clinical research.[3] Data management and analysis in clinical trials are closely intertwined with regulatory requirements and compliance. Regulatory agencies, such as the ...

  15. The Generalized Data Model for clinical research

    The role of the data model is to ease the extraction and organization of analysis data sets to address specific clinical research questions. The required analysis dataset structure depends on the specific analyses (e.g., prevalence, incidence, time to event, repeated measures, etc.) and is typically performed using R (OHDSI) or SAS (Sentinel ...

  16. Data management in clinical research: An overview

    Clinical Data Management (CDM) is a critical phase in clinical research, which leads to generation of high-quality, reliable, and statistically sound data from clinical trials. This helps to produce a drastic reduction in time from drug development to marketing. Team members of CDM are actively involved in all stages of clinical trial right ...

  17. Data Management Guidance, Tools & Resources

    Data Management Guidance, Tools & Resources. As of Jan 2023 NIH now requires that all grants that generate research data include a Data Management and Sharing Plan. This page is a starting point to guide researchers through the data life cycle and highlight available tools for data organization and planning that can be included in this plan.

  18. PDF Data Quality Management In Clinical Research

    These data may and records including the medical record. Data quality management (DQM) is a formal process for managing the quality, validity and integrity of the research data captured throughout the study from the time it is collected, stored and transformed (processed) through analysis and publication.

  19. Data Analysis in Clinical Research (CLRS90010)

    Data analysis methods are an integral part of modern clinical research. They are powerful techniques that enable researchers to draw meaningful conclusions from data collected through observation, survey, or experimentation. However, data analysis is a huge discipline with different paradigms, schools of thought and alternative methodologies.

  20. Approaches to data analyses of clinical trials

    Abstract. There are two types of data analyses of randomized clinical trials (RCTs). The primary analyses are pre-specified in the protocol and the findings form the basis for recommendations and clinical decisions. They typically adhere to the intention-to-treat principle. Secondary analyses are supplemental and of various sorts.

  21. AI's role in Clinical Research and Drug Discovery by Vera Ovanin

    AI and Clinical Research: Key Takeaways AI's transformative impact on healthcare spans diagnostics, personalized treatments, and operational efficiencies. In clinical trials, machine learning plays a pivotal role by driving advancements in data analysis, predictive modeling, and optimizing patient recruitment.

  22. Statistical considerations for outcomes in clinical research: A review

    In this summary, we review standard statistical methodology used for data analysis in clinical research. We identify five common types of outcome data and provide an overview of the typical methods of analysis, effect estimates derived, and graphical presentation. We aim to provide a resource for the clinical researcher who is not a practicing ...

  23. Analysis of a Collaborative Research Network of Botulinum Toxin

    Moreover, collaboration network analysis is one of the most valuable approaches to examining international and regional clinical research. Introduction. Botulinum toxin (BT) plays an important role in various medical conditions (Kasyanju Carrero et al., 2019), because it can be employed as a therapeutic and esthetic indicator and allows ...

  24. Effects of intensive lifestyle changes on the progression of mild

    Requesters will be asked to submit a study protocol, including the research question, planned analysis, and data required. The authors will evaluate this plan (i.e., relevance of the research question, suitability of the data, quality of the proposed analysis, planned or ongoing analysis, and other matters) on a case-by-case basis.

  25. What Is Data Analysis? (With Examples)

    Data analysis process. As the data available to companies continues to grow both in amount and complexity, so too does the need for an effective and efficient process by which to harness the value of that data. The data analysis process typically moves through several iterative phases. Let's take a closer look at each.

  26. Hospital Occupancy and Emergency Department Boarding During the COVID

    Author Contributions: Dr Janke had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Concept and design: All authors. Acquisition, analysis, or interpretation of data: Janke, Melnick. Drafting of the manuscript: Janke, Melnick.

  27. Statistical Approaches to Analysis of Small Clinical Trials

    A necessary companion to a well-designed clinical trial is its appropriate statistical analysis. Assuming that a clinical trial will produce data that could reveal differences in effects between two or more interventions, statistical analyses are used to determine whether such differences are real or are due to chance. Data analysis for small clinical trials in particular must be focused.

  28. Osteosarcopenia increases the risk of mortality: a ...

    Background & aims: Osteosarcopenia is a recently recognized geriatric syndrome. The association between osteosarcopenia and mortality risk is still largely underexplored. In this systematic review with meta-analysis of prospective cohort studies, we aimed to explore whether osteosarcopenia could be associated with a higher mortality risk. Methods: Several databases were searched from the ...

  29. BUB1 regulates non-homologous end joining pathway to mediate

    The statistical analysis of in vivo tumor growth data is presented under that section. Results. BUB1 is overexpressed in TNBC and correlates with poorer survival and metastatic potential. ... Exact Sciences (paid consultant - no direct conflict), EW: Genentech research support for clinical trials. ...

  30. Exploratory data analysis of a clinical study group: Development of a

    The research sample was chosen from the PolSenior study, a project that aims to investigate the interrelations between health, genetics and social status in advanced age in the Polish population. ... In this work we presented an exploratory data analysis of a clinical study group. Each patient was described by over 40 numerical and nominal ...