Deep Learning Research Proposal

The term deep learning refers to the study and analysis of the deep features hidden in data using intelligent deep learning models. Recently, it has become the most important research paradigm for advanced automated decision-making systems. Deep learning is derived from machine learning and learns hierarchical representations of concepts, which makes it well suited to complex and lengthy mathematical computations.

This page describes the innovations in deep learning research proposals, along with the major challenges, techniques, limitations, tools, and more.

One of the most important characteristics of deep learning is its multi-layered approach, which enables the machine to construct and run algorithms across different layers for deep analysis. It also works on the principle of artificial neural networks, which function like the human brain; indeed, deep learning drew its inspiration from the human brain so that machines can automatically understand a situation and make smart decisions accordingly. Below, we have listed some important real-time applications of deep learning.

Deep Learning Project Ideas

  • Natural Language Processing
  • Pattern detection in Human Face
  • Image Recognition and Object Detection
  • Driverless UAV Control Systems
  • Prediction of Weather Condition Variation
  • Machine Translation for Autonomous Cars
  • Medical Disorder Diagnosis and Treatment
  • Traffic and Speed Control in Motorized Systems
  • Voice Assistance for Dense Areas Navigation
  • Altitude Control System for UAV and Satellites

Now let us look at the workflow of deep learning models. The steps below show the general procedure for executing a deep learning model, and we guide you precisely through every step of your proposed model; the steps may vary with the requirements of the chosen project idea. In essence, a deep learning model is intended to capture the deep features of data by processing it through neural networks, so that the machine learns to understand new scenarios and control systems accordingly.

Top 10 Interesting Deep Learning Research Proposals

Process Flow of Deep Learning

  • Step 1 – Load the dataset as input
  • Step 2 – Extraction of features
  • Step 3 – Process add-on layers for more abstract features
  • Step 4 – Perform feature mapping
  • Step 5 – Display the output
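The five steps can be sketched end to end. This toy NumPy example stands in for a real deep learning framework: the dataset is synthetic and the weights are random (untrained), purely to show the layered flow from input to output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 - load the dataset as input (here: a synthetic stand-in)
X = rng.normal(size=(100, 8))            # 100 samples, 8 raw features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy binary labels

def relu(z):
    return np.maximum(z, 0.0)

# Steps 2-3 - stacked layers extract increasingly abstract features
W1 = rng.normal(scale=0.5, size=(8, 16))
W2 = rng.normal(scale=0.5, size=(16, 4))
h1 = relu(X @ W1)                        # first-level features
h2 = relu(h1 @ W2)                       # more abstract features

# Step 4 - feature mapping: project the features onto the output classes
W3 = rng.normal(scale=0.5, size=(4, 2))
logits = h2 @ W3

# Step 5 - display the output (predicted class per sample)
pred = logits.argmax(axis=1)
print(pred.shape)  # (100,)
```

In a real project the random weight matrices would of course be learned by backpropagation; the point here is only the layer-by-layer structure of the workflow.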

Although deep learning learns features automatically and more efficiently than conventional methods, it has some technical constraints. Here we highlight only a few of them to make you aware of current research; beyond these primary constraints, we have also identified many others. To learn about other research limitations in deep learning, approach us and we will walk you through the top research areas.

Deep Learning Limitations

  • Test Data Variation – When the test data differ from the training data, the employed deep learning technique may fail; it may also not work efficiently outside a controlled environment.
  • Huge Dataset Requirement – Deep learning models work efficiently on large-scale datasets, but not on limited data.

Our research team is highly proficient in handling different deep learning technologies. To give you up-to-date information, we constantly refresh our research knowledge of all advanced developments, so we are skilled not only at identifying research challenges but also at developing novel solutions. For your information, below are some of the most common data handling issues with appropriate solutions.

What are the data handling techniques?

  • Factor analysis – treats the observed variables as linear combinations of unobserved factors plus error terms; it rests on the assumed presence of these unobserved variables and identifies the correlations among the existing observed variables
  • Low variance filter – if the data in a column has a fixed value, the column has "0" variance; such variables carry no information and are not considered for predicting the target variables
  • Missing value ratio – identify the columns with missing values and remove them by a threshold
  • Backward feature elimination (e.g., via random forest) – when outliers and missing values are an issue, effective feature selection helps you get rid of them; remove the unwanted features from the model one at a time, check the error rate after each removal, repeat until reaching the maximum tolerable error rate, and finally keep the minimum feature set
  • High correlation filter – dependent data columns may carry redundant information due to their similarities, so filter out the largely correlated columns based on their coefficients of correlation
  • Forward feature construction – add one feature at a time, keeping those that raise performance, to enhance the overall model efficiency
  • Stochastic neighbor embedding (t-SNE) – addresses the case where data points lie in a high-dimensional space, and selects a low-dimensional embedding that generates a related distribution
  • Principal component analysis (PCA) – the present variable set is converted to a new variable set, where each new variable is a linear combination of the originals
  • Multi-dimensional scaling (MDS) – determine the location of each point from the pair-wise distances among all points, represented in a matrix, then use standard MDS to determine the low-dimensional point locations
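Two of the filters above, the low variance filter and the high correlation filter, can be sketched in NumPy; the thresholds used here are illustrative.

```python
import numpy as np

def low_variance_filter(X, threshold=0.0):
    """Drop columns whose variance is at or below the threshold
    (a column holding a fixed value has exactly zero variance)."""
    keep = X.var(axis=0) > threshold
    return X[:, keep], keep

def high_correlation_filter(X, max_corr=0.95):
    """Drop one column from every pair whose absolute correlation
    coefficient exceeds max_corr (redundant information)."""
    corr = np.corrcoef(X, rowvar=False)
    n = X.shape[1]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            if keep[i] and keep[j] and abs(corr[i, j]) > max_corr:
                keep[j] = False           # column j duplicates column i
    return X[:, keep], keep

X = np.array([[1.0, 2.0, 5.0, 2.1],
              [2.0, 4.0, 5.0, 4.1],
              [3.0, 6.0, 5.0, 6.1]])
X1, _ = low_variance_filter(X)            # removes the constant column
X2, _ = high_correlation_filter(X1)       # removes the linearly dependent columns
print(X1.shape, X2.shape)  # (3, 3) (3, 1)
```

Running the correlation filter only after the variance filter avoids the undefined correlation coefficient of a zero-variance column.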

In addition, we present the deep learning models widely used in current research, classified into two major categories: discriminative models and generative models. We have also specified the deep learning process with suitable techniques, and for complex situations we design new algorithms based on the project's needs. On the whole, we find apt solutions for any sort of problem through our smart approach.

Deep Learning Models

  • CNN and NLP (Hybrid)
  • Domain-specific
  • Image conversion
  • Meta-Learning

Furthermore, our developers would like to share the globally recommended deep learning software and tools. In truth, we have thorough, hands-on practice with all of these technologies, so we are ready to provide fine-tuned guidance on deep learning libraries, modules, packages, toolboxes, etc., to ease your development process. We will also suggest the best-fitting software or tool for your project and ensure that it makes implementing your deep learning project simpler and more reliable.

Deep Learning Software and Tools

  • Caffe & Caffe2
  • Deeplearning4j
  • Microsoft Cognitive Toolkit

So far, we have discussed important research updates in deep learning. Now let us see the importance of picking a good research topic for an impressive deep learning research proposal. The research topic has to outline your research by stating the research problem and efficient solutions, and it is also necessary to check the future scope of research for that particular topic.

A topic without a future research direction is not worth pursuing!

For more clarity, here we have given you a few significant tips to select a good deep learning research topic.

How to select a good research topic in deep learning?

  • Check whether your selected research problem is inspiring to solve but not overly complex
  • Check whether the problem not only inspires you but also creates interest among readers and followers
  • Check whether your proposed research contributes to social development
  • Check whether your selected research problem is unique

From the above list, you can get an idea of what exactly makes a good research topic. Now let us see how a good research topic is identified.

  • To recognize the best research topic, first conduct in-depth research on recent deep learning studies by referring to the latest reputed journal papers.
  • Then, review the collected papers to detect the current research limitations: which aspects are not yet addressed, which problems are not solved effectively, which solutions need improvement, which techniques recent research follows, and so on.
  • This literature review process needs considerable time and effort to grasp the research demands among scholars.
  • If you are new to this field, it is advisable to take the advice of field experts who can recommend good and resourceful research papers.
  • Mostly, the drawbacks of existing research are posed as the problem for which suitable research solutions are proposed.
  • Usually, it is better to work in resource-rich research areas than in areas with limited references.
  • When you find the desired research idea, immediately check its originality: make sure that no one has already proven it.
  • It is better to find this out at the initial stage so that you can choose another idea if necessary.
  • For that, the search keywords are crucial, because someone may already have conducted the same research under a different name; so, concentrate on choosing the keywords for the literature study.

How to describe your research topic?

One common error beginners make in research topic selection is a misunderstanding: some researchers think selecting a topic means merely writing the title of the project. It does not; you have to convey detailed information about your research work in a short and crisp topic. In other words, the research topic needs to act as an outline for your research work.

For instance, "deep learning for disease detection" is not a topic with clear information. It should mention details such as the type of deep learning technique, the type of image and its processing, the body part involved, the symptoms, etc.

A modified research topic for "deep learning for disease detection" would be "COVID-19 detection using an automated deep learning algorithm".

For your awareness, here are some key points to focus on while framing a research topic. To clearly define your research topic, we recommend writing some text explaining:

  • Research title
  • Previous research constraints
  • Importance of the problem that the proposed research overcomes
  • Reasons the research problem is challenging
  • Outline of problem-solving possibility

Finally, let us see the different research perspectives on deep learning within the research community. Below we present the most in-demand research topics in deep learning, such as image denoising, moving object detection, and event recognition. Beyond this list, we also maintain a repository of recent deep learning research proposal topics and machine learning thesis topics, so communicate with us to learn the advanced research ideas of deep learning.

Research Topics in Deep Learning

  • Continuous Network Monitoring and Pipeline Representation in Temporal Segment Networks
  • Dynamic Image Networks and Semantic Image Networks
  • Advance Non-uniform denoising verification based on FFDNet and DnCNN
  • Efficient image denoising based on ResNets and CNNs
  • Accurate object recognition in deep architectures using ResNeXts, Inception Nets, and Squeeze-and-Excitation Networks
  • Improved object detection using Faster R-CNN, YOLO, Fast R-CNN, and Mask-RCNN

Novel Deep Learning Research Proposal Implementation

Overall, we are ready to support you in all significant new research areas of deep learning. We guarantee a novel deep learning research proposal in your area of interest, with writing support. We also provide code development, paper writing, paper publication, and thesis writing services. So, build a bond with us to create a strong foundation for your research career in the deep learning field.

Related Pages

Services we offer:

  • Mathematical proof
  • Pseudo code
  • Conference Paper
  • Research Proposal
  • System Design
  • Literature Survey
  • Data Collection
  • Thesis Writing
  • Data Analysis
  • Rough Draft
  • Paper Collection
  • Code and Programs
  • Paper Writing
  • Course Work

Diagnostics (Basel)

A Novel Proposal for Deep Learning-Based Diabetes Prediction: Converting Clinical Data to Image Data

Associated data.

The data utilized in this work can be found at https://data.world/data-society/pima-indians-diabetes-database (accessed on 8 June 2022).

Diabetes, one of the most common diseases worldwide, has become an increasingly global threat to humans in recent years. However, early detection of diabetes greatly inhibits the progression of the disease. This study proposes a new method based on deep learning for the early detection of diabetes. Like many other medical data, the PIMA dataset used in the study contains only numerical values. In this sense, the application of popular convolutional neural network (CNN) models to such data is limited. This study converts numerical data into images based on feature importance to use the robust representation of CNN models in early diabetes diagnosis. Three different classification strategies are then applied to the resulting diabetes image data. In the first, diabetes images are fed into the ResNet18 and ResNet50 CNN models. In the second, deep features of the ResNet models are fused and classified with support vector machines (SVM). In the last approach, the selected fusion features are classified by SVM. The results demonstrate the robustness of diabetes images in the early diagnosis of diabetes.

1. Introduction

The most prevalent chronic non-communicable disease in the world is diabetes, also known as diabetes mellitus. Diabetes is fatal or drastically lowers quality of life and affects more women than men [ 1 ]. Diabetes is particularly risky for pregnant women, and unborn children are likely to be affected by this disease. Generally, if the glucose level in the blood rises above the normal value, the person is considered diabetic. This is due to the inability of the pancreas in the human body to fully perform its task. The person’s blood sugar rises if the pancreas cannot utilize the insulin it produces or does not create enough of it. Diabetes can cause long-term damage to different organs such as the eyes, heart, kidneys and blood vessels [ 2 ]. There are three different types of diabetes: type 1, type 2 and gestational. In type 1 diabetes, the pancreas produces little or no insulin. Insulin therapy is needed. It is usually seen in young individuals (age < 30) or children. Type 2 is usually caused by insulin resistance and is more common in older (age > 65) and obese patients [ 3 , 4 , 5 ]. Gestational diabetes is hyperglycemia that occurs during pregnancy. In addition, after pregnancy, the risk of type 2 diabetes is higher in women, and in this case, babies are also at risk [ 6 , 7 ].

It is known that diabetes is a public health problem that affects 60% of the world’s population [ 8 ]. Although the main cause of diabetes is unknown, scientists think it is related to genetic factors and environmental conditions. There are currently 425 million diabetics worldwide, according to the International Diabetes Federation, and 625 million will develop the disease in the next 23 years [ 9 , 10 ]. It is essential to identify the disease at an early stage in order to stop this rise. Only early detection can stop the growth of the disease because there is no cure for diabetes, which is a lifetime condition. With the right treatment, regular nutrition and drugs, the disease can be managed after early diagnosis. [ 11 , 12 ]. However, a delayed diagnosis might result in heart conditions and serious harm to many organs. For the early diagnosis of diabetes, clinical (plasma glucose concentration, serum insulin, etc.) and physical data (for example, body mass index (BMI), age) are often used [ 13 ]. According to these data, a doctor carries out the diagnosis of the disease. However, making a medical diagnosis is a very difficult task for the doctor and can take a very long time. In addition, the decisions made by the doctor may be erroneous and biased. For this reason, the fields called data mining and machine learning are frequently used as a decision support mechanism for the rapid and accurate detection of diseases according to data [ 11 , 14 , 15 ].

Recent advances in computer technologies have led to the emergence of algorithms that allow human tasks to be performed faster and more automatically by computers. Tools such as data mining, machine learning and deep learning, which are generally referred to as artificial intelligence, have shown remarkable performance in interpreting existing data. Especially in the medical field, artificial-intelligence-based methods are used in the diagnosis or treatment of many different diseases as they provide fast and powerful results. Examples of these are diagnostic studies of cancer [ 16 ], diabetes [ 17 ], COVID-19 [ 18 ], heart diseases [ 19 ], brain tumors [ 20 ], Alzheimer’s [ 21 ], etc. For more comprehensive information on the applications of artificial intelligence in the medical field, research studies by Kaur et al. [ 22 ] and Mirbabaie et al. [ 23 ] can be reviewed. Artificial intelligence is very useful for the medical field. Thanks to the superior success of artificial intelligence in medical studies so far, it has recently become common to record medical big data in hospitals. Considering that each patient is a real data point, much numerical data such as electrocardiograms (ECG), electromyograms (EMG), clinical data, blood values or a large number of image data such as X-ray, magnetic resonance imaging (MRI) or computed tomography (CT) can be produced after medical records. In this sense, such medical records constitute an important part of big data in the medical field [ 24 ].

Machine learning algorithms are generally used to interpret (regression, classification or clustering) big data based on artificial intelligence. Thanks to these algorithms, the relationship between them is learned based on samples and observations of the data. Machine learning methods that are frequently used in this sense are artificial neural networks (ANN), support vector machines (SVM), k-nearest neighbors (k-NN), decision trees (DT) and naïve Bayes (NB). These methods directly learn the correlation between input and target data. However, with the developments in artificial intelligence and computer processors in the last decade, ANN has been further deepened, and deep learning, which applies both feature extraction and classification together, has come to the fore. Especially in big data applications, deep learning has given a great advantage over traditional machine learning methods [ 25 ]. The most frequently used model in deep-learning-based medical diagnosis/detection applications is convolutional neural network (CNN). CNN models are very popular due to both their deep architecture and high-level feature representation. Since the architecture designed for CNN is end to end, raw data are given as input and classes are obtained as output. Therefore, the designed architecture is very important for the performance of the CNN model [ 26 ]. Recently, however, researchers have adopted transfer learning applications and used popular CNN architectures such as ResNet [ 27 ], GoogleNet [ 28 ], Inception [ 29 ], Xception [ 30 ], VGGNet [ 31 ], etc. In different data-driven studies [ 32 ], the direct use of pre-trained or pre-designed CNN architectures has provided advantages in terms of both performance and convenience

1.1. Previous Artificial Intelligence Based Studies on Diabetes Prediction

This study performs deep-learning-based diabetes prediction using the PIMA dataset. In general, studies developed for diabetes prediction are based on machine learning or deep learning.

Some of the studies that applied diabetes prediction to the PIMA dataset using machine learning methods are as follows. Zolfaghari [ 33 ] performed diabetes detection based on an ensemble of SVM and feedforward neural network. For this, the results obtained from the individual classifiers were combined using the majority voting technique. The ensemble approach provided a better result than the individual classifiers with 88.04% success. Sneha and Gangil [ 34 ] performed diabetes prediction using many machine learning methods such as naïve Bayes (NB), SVM and logistic regression. The best accuracy was obtained with SVM with 77.37%. In addition, the authors applied feature selection for the PIMA dataset. The features with low correlation were removed. Edeh et al. [ 35 ] compared four machine learning algorithms, Bayes, decision tree (DT), SVM and random forest (RF), on two different datasets for diabetes prediction. In the experimental results with PIMA, the highest accuracy was obtained with SVM at 83.1%. Chen et al. [ 36 ] reorganized the PIMA data with preprocessing and removed the misclassified data with the k-means algorithm (data reduction). They then classified the reduced data with DT. As a result of the study, diabetes was predicted with an accuracy of 90.04%. Dadgar and Kaardaan [ 37 ] proposed a hybrid technique for diabetes prediction. First, feature selection was performed with the UTA algorithm. Then, the selected features were given to the two-layer neural network (NN) whose weights were updated by genetic algorithm (GA). As a result, diabetes estimation was provided with an accuracy of 87.46%. Zou et al. [ 38 ] used DT, RF, and NN models for diabetes prediction. They also used principal component analysis (PCA) and minimum redundancy maximum relevance (mRMR) to reduce dimensionality. As a result, RF performed more successful predictions than the others, with 77.21% accuracy. 
For other proposed studies based on machine learning, studies by Choudhury and Gupta [ 39 ] and Rajeswari and Prabhu [ 40 ] can be examined.

The following are some studies that use the PIMA dataset with deep learning models: For diabetes prediction, Ashiquzzaman et al. [ 41 ] created a network with an input layer, fully connected layers, dropouts and an output layer architecture. It fed the PIMA dataset features directly into this designed MLP and achieved an accuracy of 88.41% at the end of the application. Massaro et al. [ 42 ] created artificial records and classified these data with long short-term memory (LSTM) (LSTM-AR). The LSTM-AR classification result, which was stated as 89%, was superior to both LSTM and the multi-layer perceptron (MLP) with cross validation previously performed. Kannadasan et al. [ 43 ] designed a deep neural network that extracts features with stacked autoencoders and performs diabetes classification with softmax. The designed deep architecture provided 86.26% accuracy. Rahman et al. [ 44 ] presented a model based on convolutional LSTM (Conv-LSTM). They also experimented with traditional LSTM and CNN to compare the results. They applied grid search algorithm for hyperparameter optimization in deep models. For all models, the input layer was one dimensional (1D). After training and test separation, Conv-LSTM for test data outperformed other models, with 91.38% accuracy. Alex et al. [ 45 ] designed a 1D CNN architecture for diabetes prediction. However, missing values were corrected by outlier detection. Then, they preprocessed the data with synthetic minority oversampling technique (SMOTE), and the imbalance in the data were removed. They then fed the processed data into the 1D CNN architecture and achieved 86.29% accuracy. For other applications based on deep learning for diabetes prediction, the studies presented by Zhu et al. [ 46 ] and Fregoso-Aparicio et al. [ 47 ] can be examined.

Previous studies show that the PIMA dataset is often used for machine learning, 1D-CNN and LSTM structures. The numerical nature of the PIMA dataset has limited the feature extraction and classification algorithms that researchers can use. In this study, this limitation is overcome by converting numerical data to images. Thus, the PIMA numerical dataset will be applicable with popular CNN models such as ResNet, VGGNet and GoogleNet.

1.2. The Structure, Purpose, Differences and Contribution of the Study

Examining the previous studies mentioned in Section 1.1 . reveals that various machine learning and deep-learning-based applications predict diabetes quite successfully for the PIMA dataset containing clinical data records. Similar to the PIMA dataset, many clinical data in the medical field are composed of numerical values. Using numerical values directly with conventional machine learning techniques is more typical because studies involving machine learning models such as SVM, NB, RF, DT, etc. feed raw data or data with small preprocessing directly to the model and give target (0 (negative)–1 (positive)) values to the output. Studies that design deep architecture using the same data feed the PIMA features either to the 1D convolution layer or to the fully connected layers. The study by Massaro, Maritati, Giannone, Convertini and Galiano [ 42 ] processed the PIMA dataset containing 1D data with a recurrent-neural-network (RNN)-based LSTM. Nevertheless, LSTM was designed for sequential data, whereas the PIMA dataset contains independent data.

Traditional machine learning techniques have been surpassed in many respects by deep learning, which has become more popular in recent years [ 48 , 49 ]. With the high-level capabilities they offer, particularly deep CNN models, they have shown greater performance, notably in computer vision applications. However, the PIMA dataset’s inclusion of numeric values has thus far prompted researchers to create 1D CNN models. Popular CNN models are created for computer vision, and therefore the input layer only accepts 2D data. These models are employed in transfer learning applications. As a result, feature extraction using well-known CNN models and a diabetes prediction using these models have not yet been established from this PIMA dataset containing independent numerical data. Therefore, in order to provide more successful diagnoses, transformation can be applied to the raw data in accordance with popular CNN models.

This study converts each sample in the PIMA dataset to images (diabetes images) to overcome this limitation. Each diabetes image has cells representing features in the PIMA dataset. The ReliefF feature selection algorithm [ 50 , 51 , 52 ] was also used to make the feature with high correlation more dominant in the image. After each feature is placed on the image according to its importance, data augmentation is applied for these images. In fact, the easy application of data augmentation for diabetes data is one of the important contributions of this study because compared to numerical data, data augmentation for images is an easier and more common technique. The augmented image data are then fed to the ResNet18 and ResNet50 CNN models and diabetes prediction is performed. In order to improve these current results, the features of both models are then fused and classified with SVM (CNN-SVM). Finally, feature selection is made with the ReliefF algorithm, among many fusion features, and these selected features are classified by SVM. At the end of the study, all these results are compared. According to the results, the CNN-SVM structure with selected fusion features provides more successful diabetes prediction than others. In addition, the results of the proposed method are compared with those of previous studies, and the method is proven to be effective. The contributions of the proposed method can be stated as follows:

  • An end-to-end application structure is proposed for diabetes prediction.
  • The numeric values of the PIMA dataset are converted to images.
  • Numerical diabetes data are made usable with popular CNN models.
  • The importance of the features is taken into account during the conversion to images.
  • The proposed method outperforms most previous studies.
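The numeric-to-image conversion at the heart of the method can be sketched as follows. The paper's exact cell layout is not reproduced here, so the 8×8 grid size and the rule that allocates more cells to more important features are illustrative assumptions; the sketch only shows the idea of turning an importance-weighted feature vector into a 2D array that a CNN input layer could accept.

```python
import numpy as np

def features_to_image(sample, importance, size=8):
    """Tile a 1D feature vector into a size x size image.

    Features with higher importance weights fill more cells, so they
    dominate the image (this cell-allocation rule is illustrative; the
    paper's exact layout may differ).
    """
    importance = np.asarray(importance, dtype=float)
    share = importance / importance.sum()
    # at least one cell per feature, more cells for important features
    cells = np.maximum(1, np.round(share * size * size).astype(int))
    flat = np.concatenate([np.full(c, v) for v, c in zip(sample, cells)])
    flat = flat[: size * size]                       # trim any overshoot
    flat = np.pad(flat, (0, size * size - flat.size))  # pad any shortfall
    return flat.reshape(size, size)

# one normalized PIMA-style sample (8 features scaled into [0, 1])
sample = np.array([0.06, 0.45, 0.54, 0.23, 0.11, 0.42, 0.04, 0.00])
importance = np.array([0.02, 0.30, 0.05, 0.04, 0.20, 0.15, 0.14, 0.10])
img = features_to_image(sample, importance)
print(img.shape)  # (8, 8)
```

Once every sample is an image, standard image data augmentation and 2D CNN models such as ResNet18/ResNet50 become directly applicable, which is the limitation the study sets out to remove.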

2. PIMA Indians Diabetes Dataset

In this study, the PIMA Indians Diabetes dataset, which is taken from the Kaggle data repository and is frequently preferred for diabetes prediction, is used. The access link is https://data.world/data-society/pima-indians-diabetes-database (Access Date: 8 June 2022). The National Institute of Diabetes and Digestive and Kidney Diseases provided the source data for this dataset. The dataset’s goal is to diagnose whether or not a patient has diabetes based on certain diagnostic metrics provided in the collection. All patients here, in particular, are PIMA Indian women over the age of 21.

The dataset includes the following measurements and ranges of clinical and physical characteristics: pregnancies (number, [0–17]), glucose (value, [0–199]), blood pressure (mm Hg, [0–122]), skin thickness (mm, [0–99]), insulin (mu U/mL, [0–846]), BMI (kg/m^2, [0–67.1]), diabetes pedigree function (PDF) (value, [0.078–2.42]), age (years, [21–81]), and outcome (Boolean: 0, 1). The data are entirely numerical and comprise a total of 8 features and 768 samples. Table 1 shows a few samples from the dataset.

Some examples of Pima Indians Diabetes dataset.

Pregnancies [0–17] | Glucose [0–199] | Blood Pressure [0–122] | Skin Thickness [0–99] | Serum Insulin [0–846] | BMI [0–67.1] | PDF [0.078–2.42] | Age [21–81] | Outcome (0–1)
1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0
2 | 197 | 70 | 45 | 543 | 30.5 | 0.158 | 53 | 1
1 | 189 | 60 | 23 | 846 | 30.1 | 0.398 | 59 | 1
1 | 103 | 30 | 38 | 83 | 43.3 | 0.183 | 33 | 0
9 | 171 | 110 | 24 | 240 | 45.4 | 0.721 | 54 | 1
5 | 88 | 66 | 21 | 23 | 24.4 | 0.342 | 30 | 0
2 | 141 | 58 | 34 | 128 | 25.4 | 0.699 | 24 | 0
2 | 100 | 66 | 20 | 90 | 32.9 | 0.867 | 28 | 1
7 | 83 | 78 | 26 | 71 | 29.3 | 0.767 | 36 | 0
7 | 160 | 54 | 32 | 175 | 30.5 | 0.588 | 39 | 1
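As a small illustration of preparing these features, the stated per-feature ranges can drive a min–max normalization into [0, 1]. Two rows from Table 1 are used in place of the full CSV; in practice the dataset linked above would be loaded instead (e.g., with pandas).

```python
import numpy as np

# feature ranges from the dataset description (min, max per column)
ranges = np.array([[0, 17], [0, 199], [0, 122], [0, 99],
                   [0, 846], [0, 67.1], [0.078, 2.42], [21, 81]], dtype=float)

# two samples from Table 1 (outcome column dropped)
X = np.array([[1, 89, 66, 23, 94, 28.1, 0.167, 21],
              [2, 197, 70, 45, 543, 30.5, 0.158, 53]], dtype=float)

lo, hi = ranges[:, 0], ranges[:, 1]
X_norm = (X - lo) / (hi - lo)   # min-max scaling into [0, 1]
print(X_norm.shape)  # (2, 8)
```

Normalizing against the dataset-wide ranges (rather than per-batch extremes) keeps the scaling consistent between training and test samples.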

3. Methodology

The methods used to determine the diabetes status of patients will be outlined in detail in this section. The steps of the proposed method are shown in Figure 1 . The feature selection method initially selects the most useful features from the numerical data, as shown in Figure 1 . The boundaries of all features are then adjusted for the numeric-to-image conversion stage once the numerical data has been normalized. The numerical to image conversion process is applied in such a way that the most effective features determined by the feature selection algorithm are dominant. The classification success of deep ResNet models is then increased by the use of data augmentation techniques. The three ResNet-based approaches suggested in this study are used to classify data in the final stage. Below, we go over each of these processes in more detail.

Figure 1. Application steps of the proposed methods.

3.1. ReliefF Feature Selection Algorithm

To improve classification capability, a variety of feature reduction strategies have been explored in the literature [ 53 ]. ReliefF is one of the distance-based feature selectors; the original Relief algorithm, developed by Kira and Rendell [ 52 ] in 1992, is one of the most successful feature filtering methods.

Dimension reduction strategies aid in the removal of superfluous attributes from a dataset. They aid in data compression, which saves storage space, and they also reduce computational complexity and the time required to attain the same goal [ 54 ].

Kononenko [ 55 ] improved the algorithm for multi-class issues in 1994. With the help of this algorithm, feature selection can be performed successfully. The ReliefF algorithm is highly efficient and does not impose any restrictions on the data kinds’ features. The ReliefF method assists in the solution of many classes of issues by selecting the nearest neighboring samples from each sample in each category [ 56 ].

ReliefF seeks to expose the connections and consistency found in the dataset’s attributes. By constructing a model that rewards proximity to samples of the same class and distance to samples of different classes, it can discover the significant features in the dataset. The ReliefF model chooses the neighboring attributes that are closest to each other between samples of distinct classes [ 54 ]. In this model, the dataset is divided into two components: training and test data. A random sample R_i is chosen from the training set, and the difference function diff is used to identify its nearest neighbors of the same and different classes, as illustrated in Equation (1). When identifying nearest neighbors, the diff function is used to compute the distance between instances; the total distance is simply the sum of all attribute differences (i.e., the Manhattan distance) [ 51 ].

Equation (1) determines the difference between two separate samples I_1 and I_2 for attribute A and is used to find the closest distance between samples. The nearest neighbor H from the same class and the nearest neighbor M from a different class are chosen. The within-class and between-class distances of each adjacent attribute A_f are compared based on the values of R_i, M, H, and the dataset’s weighting vector. As a result of the comparison, the weight W_{A_f} is calculated by giving less weight to distant attributes [ 57 ]. These operations are performed m times, and the weight values are calculated for each attribute. The weights are updated using Equation (2) [ 55 , 58 ].
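Since Equations (1) and (2) are not reproduced here, the weight update they describe can be sketched in code. The following is a deliberately simplified two-class Relief sketch (a single nearest hit and miss per sampled instance, rather than the paper's ReliefF with its 10 nearest neighbors and multi-class averaging); all names are illustrative.

```python
import random

def diff(a, x1, x2, span):
    # Equation (1)-style normalized distance between two samples on attribute a
    return abs(x1[a] - x2[a]) / span[a]

def relief_weights(X, y, m, seed=0):
    """Simplified two-class Relief: for m random samples R_i, find the nearest
    hit H (same class) and miss M (other class), then reward attributes that
    differ across classes and penalize those that differ within a class."""
    n_attr = len(X[0])
    span = [(max(r[a] for r in X) - min(r[a] for r in X)) or 1.0
            for a in range(n_attr)]
    manhattan = lambda p, q: sum(diff(a, p, q, span) for a in range(n_attr))
    w = [0.0] * n_attr
    rng = random.Random(seed)
    for _ in range(m):
        i = rng.randrange(len(X))
        ri = X[i]
        hits = [X[j] for j in range(len(X)) if j != i and y[j] == y[i]]
        misses = [X[j] for j in range(len(X)) if y[j] != y[i]]
        h = min(hits, key=lambda s: manhattan(ri, s))    # nearest hit H
        mm = min(misses, key=lambda s: manhattan(ri, s)) # nearest miss M
        for a in range(n_attr):
            # Equation (2)-style update, averaged over the m iterations
            w[a] += (diff(a, ri, mm, span) - diff(a, ri, h, span)) / m
    return w
```

On a toy dataset where the first attribute separates the classes and the second is noise, the first attribute receives the larger weight, mirroring the importance ranking that Figure 2 reports for the PIMA features.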

The importance weight of each feature, obtained by applying the ReliefF feature selection method described above to the PIMA dataset, is shown in Figure 2 . The number of nearest neighbors was set to 10. As seen in Figure 2 , the ReliefF algorithm determined the most effective features from the PIMA numerical data.

Figure 2. Importance weight of features in the PIMA dataset.

3.2. Normalization of Data

In artificial intelligence studies, normalizing data containing many features is a standard process, because different features have different limits. Setting features to the same or a similar range, i.e., normalization, improves learning performance. The PIMA dataset also has different lower and upper bound values, as seen in Table 1 , so normalization of these values is necessary. In addition, normalization is vital for the numeric-to-image conversion stage of the proposed implementation, because the value of each feature must be placed in the image that represents its sample: the larger the amplitude of the feature, the brighter the corresponding cell in the image. Therefore, the maximum and minimum values of all features must be the same.

The preferred normalization method is feature scaling, with which feature values are rescaled to a certain range. The feature scaling method used in this study is min–max normalization. In this method, the new sample value (x̂) is determined from the maximum (x_max) and minimum (x_min) values of the feature, and as a result all features are scaled to the range [0, 1]. In the application phase, normalization is applied to the eight features in the PIMA dataset. Figure 3 shows that after this normalization, the glucose [0–199] and blood pressure [0–122] values range from 0 to 1. Equation (3) shows the formula for min–max normalization.
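As a concrete illustration of Equation (3), a minimal sketch (the glucose bounds are from the dataset description; the sample values are illustrative):

```python
def min_max_normalize(column):
    # Equation (3): x_hat = (x - x_min) / (x_max - x_min), mapping values into [0, 1]
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]

# e.g., glucose values, whose raw range in the PIMA dataset is [0, 199]
glucose = [0, 85, 199]
print(min_max_normalize(glucose))  # -> [0.0, 0.427..., 1.0]
```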

Figure 3. Min–max normalization of the PIMA dataset.

3.3. Conversion of Numeric Data to Image Data

Although the amount of image data in the medical field has increased considerably in recent years, a large amount of numerical data is still available. Numerical values are obtained easily and cheaply, but the interpretation of these data is usually performed by machine learning methods. Recently proposed deep architecture studies prefer 1D CNN structures that take these numerical values as input, because the popular CNN models that have driven significant improvements in computer vision cannot be used directly on such data: these models require 2D data at the input layer. CNN models such as ResNet, VGGNet and GoogleNet have architectures designed for image data. The inability to analyze datasets containing 1D samples with these powerful models is therefore a major disadvantage in terms of both application diversity and prediction performance. This section discusses the conversion of numeric data to images to overcome this limitation in the PIMA dataset, which is a numeric dataset.

In the process of converting PIMA data to images, the brightness of a specific region (cell) in the image is determined by the amplitude of each feature; in effect, each feature is a piece of the sample image’s puzzle. For each sample in the PIMA dataset, the 120 × 120 image structure shown in Figure 4 is used. The index on each cell corresponds to the feature index in the PIMA dataset; that is, Figure 4 shows the feature locations in a sample image. The location and size of the cells in Figure 4 are determined not randomly but based on feature importance. As the ReliefF results in Figure 2 show, the order of importance of the features is 2-8-6-1-7-3-5-4, so a larger cell is assigned to a more important feature. Each cell is colored according to the amplitude of the corresponding feature value. Because all data were previously normalized, each feature value ranges from 0 to 1; multiplying each value by 255 yields images whose cells have brightness values between 0 and 255. The resulting images are therefore grayscale. Some sample diabetes images are shown in Figure 4 .
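The conversion principle can be sketched as follows. The exact cell geometry of Figure 4 is not reproduced here, so the rectangles in `CELLS` are a hypothetical asymmetric layout; only the mechanism (one rectangular cell per feature, brightness = feature amplitude × 255) follows the paper.

```python
def features_to_image(sample, cells, size=120):
    """Map one normalized 8-feature sample to a size x size grayscale image.
    `cells` gives each feature a (row, col, height, width) region; in the
    paper, larger regions are assigned to more important features."""
    img = [[0] * size for _ in range(size)]
    for f, (r, c, h, w) in enumerate(cells):
        level = int(round(sample[f] * 255))  # brightness from feature amplitude
        for i in range(r, r + h):
            for j in range(c, c + w):
                img[i][j] = level
    return img

# hypothetical asymmetric layout for the 8 PIMA features (tiles a 120x120 grid)
CELLS = [
    (0, 0, 60, 70), (0, 70, 60, 50),
    (60, 0, 30, 40), (60, 40, 30, 40), (60, 80, 30, 40),
    (90, 0, 30, 40), (90, 40, 30, 40), (90, 80, 30, 40),
]
```

Applying this to all 768 normalized samples would yield the 768 grayscale images the paper feeds to the ResNet models.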

Figure 4. Conversion of selected features to image (numeric to image).

As a result of applying the aforementioned image conversion method on the PIMA dataset, one image for each sample (that is, 768 images in total) is formed. These images, with all features included, can now be used in CNN models that require 2D data input. Furthermore, image data augmentation methods are easily applicable to these image data. For this purpose, the image structure in Figure 4 is designed asymmetrically because in the data augmentation stage, all images must be reproduced differently from each other.

3.4. Data Augmentation

The number of samples directly influences the success of deep learning approaches. However, accessing a significant volume of data is not always possible. As a result, researchers artificially increase the size of a training dataset by producing modified versions of the images in the dataset. These techniques, which are applied to raw images for this purpose, are known as data augmentation techniques.

In this study, the diabetic data contains 768 numerical samples in total, and hence 768 images are created during the conversion from numerical to image data. Data augmentation techniques are used because this amount is insufficient for a deep learning implementation. To ensure data diversity and robust training, four different data augmentation techniques (rotation, scale, reflection and translation) are applied to all images produced, as in Figure 4 . Table 2 shows the lower and upper limit values for these data augmentation techniques. Additionally, Figure 5 shows the new diabetes images produced as a result of data augmentation techniques.

Figure 5. Data augmentation methodologies and sample augmented images.

Table 2. Lower and upper limits of the data augmentation techniques.

| Parameter Name | Lower Limit | Upper Limit |
|---|---|---|
| Reflection | - | - |
| Rotation | −30° | 30° |
| Scale | 0.9 | 1.1 |
| Translation | −10 | +10 |

After data augmentation, each original diabetes image is reproduced in four different ways, so a total of five samples (the original plus four augmented copies) is obtained from each one. The numbers of samples in each class before and after the data augmentation stage are shown in Table 3 . As a result of the data augmentation, the total number of images reached 3840.
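The augmentation bookkeeping can be sketched as follows. The parameter ranges come from Table 2, while the sampling code itself (names, RNG, one parameter set per technique) is illustrative and does not reproduce the exact augmenter used in the study.

```python
import random

def sample_augmentation(rng):
    """Draw one random parameter set per technique, within the Table 2 limits."""
    return {
        "rotation_deg": rng.uniform(-30, 30),
        "scale": rng.uniform(0.9, 1.1),
        "translation_px": (rng.randint(-10, 10), rng.randint(-10, 10)),
        "reflection": rng.choice([False, True]),
    }

rng = random.Random(42)
params = [sample_augmentation(rng) for _ in range(4)]  # 4 augmented copies per image

n_original = 768
n_total = n_original * (1 + len(params))  # original + 4 variants each
print(n_total)  # -> 3840
```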

Table 3. Number of samples in each class before and after data augmentation.

| Class | 0 (Negative) | 1 (Positive) | Total |
|---|---|---|---|
| Before data augmentation | 500 | 268 | 768 |
| After data augmentation | 2500 | 1340 | 3840 |

3.5. Diabetes Prediction via ResNet Models

After data augmentation, the images, separated into 80% training and 20% testing sets, are fed to the CNN model. In this study, diabetes estimation is performed with the ResNet18 and ResNet50 models, which are frequently used for comparison purposes. Many studies apply ResNet models because of the advantages they provide [ 59 ]. What makes ResNet preferable is that its residual blocks transmit residual values to subsequent layers, which avoids the vanishing gradient problem. ResNet models exist at different depths; the ResNet18 and ResNet50 models used in this study have depths of 18 and 50, respectively.

This study performs diabetes detection with existing models instead of designing a new CNN architecture. With only minor modifications (fine-tuning), existing ResNet models are adapted to our work. For both models, the last two layers are removed and replaced with a two-output fully connected layer and a classification (softmax) layer. In addition, while the produced diabetes images are 120 × 120, the input size for ResNet models should be 224 × 224, so all diabetes images are resized before and during training (see Figure 6 ). The results obtained after the training and testing phases are discussed in the results section.
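The 120 × 120 → 224 × 224 resizing step can be sketched with a simple nearest-neighbor resize. The paper performs the equivalent operation inside its (MATLAB) training pipeline, so this stand-alone pure-Python version is illustrative only.

```python
def resize_nearest(img, new_size=224):
    """Nearest-neighbor resize of a square image given as a list of rows."""
    old = len(img)
    return [[img[i * old // new_size][j * old // new_size]
             for j in range(new_size)]
            for i in range(new_size)]

# a 120x120 diabetes image becomes the 224x224 input a ResNet expects
small = [[0] * 120 for _ in range(120)]
big = resize_nearest(small)
print(len(big), len(big[0]))  # -> 224 224
```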

Figure 6. Classification of diabetes images as diabetic (1) and nondiabetic (0) with ResNet models.

3.6. Deep Feature Extraction, Feature Selection and Classification

While the previous section directly fine-tunes ResNet models, this section describes a CNN-SVM structure: the CNN is used for feature extraction and the SVM for classification. This approach has frequently been preferred recently to increase classification accuracy [ 60 ]. Two experimental applications are presented at this stage. The features obtained with the CNN models in the previous stage are combined and fed to the SVM: 512 deep features were extracted from the ResNet18 model and 2048 from the ResNet50 model, and these were combined into a total of 2560 deep features, which were split 80%/20% into training and test sets. In the first experimental stage, these deep features are classified by the SVM algorithm using linear, quadratic, cubic and Gaussian kernel functions. In the second stage, the 500 most effective of the 2560 features extracted from the ResNet models are selected using the ReliefF feature selection algorithm, and these 500 selected features are classified by the SVM, again with linear, quadratic, cubic and Gaussian kernel functions. All results are then compared. Figure 7 shows the proposed CNN-SVM structure.
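The fusion-and-selection step can be sketched in pure Python, with `weights` standing in for the ReliefF importance scores computed over the fused columns (in the study, the SVM then classifies the selected columns):

```python
def fuse_and_select(feats_resnet18, feats_resnet50, weights, k=500):
    """Concatenate per-sample deep features (512 + 2048 = 2560 columns)
    and keep only the k highest-weighted columns."""
    fused = [a + b for a, b in zip(feats_resnet18, feats_resnet50)]  # row-wise concat
    keep = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
    return [[row[i] for i in keep] for row in fused]
```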

Figure 7. Implementation steps of the proposed CNN-SVM approach.

The results of the experimental studies are discussed in Section 4 . The experimental application in the last step provided the most successful results; the flow graph containing the applications of this step is shown in Figure 8 .

Figure 8. Application flow chart in the last step.

4. Results and Discussion

In this section, the results of the proposed approach are discussed. All deep learning applications for diabetes prediction were performed on a laptop with an Intel Core i7-7700HG processor, an NVIDIA GeForce GTX 1050 4 GB graphics card and 16 GB of RAM. The applications were developed in the Matlab environment; Figure 8 can be taken as a reference for the software design or code implementation of the proposed approach, and the algorithm of the method was created as in Figure 8 . Using toolboxes and libraries directly during coding reduced software complexity; the toolboxes used in this context are the Machine Learning Toolbox, Deep Learning Toolbox and Image Processing Toolbox.

In order to demonstrate the superiority of the proposed method, results are produced with three different approaches. Methodological information about these three approaches has been shared in detail in the previous section. In the first approach, classification is performed with fine-tuned ResNet models to perform the diabetes prediction using diabetes images after data augmentation. For this, the ResNet18 and ResNet50 models are fine-tuned, and the output layer is changed according to the two classes. Then, 3840 diabetes images are divided into two groups as 80% and 20% training data and test data, respectively. While the models are trained using the training data, the performance of the network is obtained using the test data. In the second approach, deep features extracted from two fine-tuned ResNet models are combined and these fusion features (2560) are classified by the SVM machine learning method. The performance of SVM differs according to the kernel function used. Therefore, in the second approach, classification accuracies are obtained by using linear, quadratic, cubic and Gaussian kernel functions and compared with each other. In the last approach, namely the proposed method, the most important 500 features from a total of 2560 fusion features extracted from fine-tuned ResNet models are selected with the ReliefF feature selection algorithm. In this way, we aimed to achieve similar success with fewer features. These features are classified with SVM as in the second approach. Classification results are obtained with linear, quadratic, cubic and Gaussian kernel functions, and the results are compared with other approaches.

ResNet models are trained once for all the approaches mentioned above. In other words, as a result of the three approaches, the features obtained with the ResNet models are the same. The parameters used for training ResNet models are: Mini Batch Size: 32; Max Epochs: 5; Learn Rate Drop factor: 0.1; Learn Rate Drop period: 20; Initial Learn Rate: 0.001. In addition, the optimizer used to update the weights during the training process is Stochastic Gradient Descent with Momentum.

After training is performed in the first approach, the features obtained there are reused in the other approaches. The accuracy and loss graphs of the first approach, obtained during the training and testing phases of the ResNet18 and ResNet50 models, are shown in Figure 9 ; it is clear that overfitting does not occur during the training phase. In the second and third approaches, the CNN model is not retrained, and 512 and 2048 features are extracted from ResNet18 and ResNet50, respectively, directly through the fully connected layers. The confusion matrices obtained as a result of classifying these features with the SVM are shown in Figure 10 and Figure 11 : Figure 10 shows the results of the second approach using all the fusion features, and Figure 11 shows the final results obtained by classifying the features selected from the fusion features.

Figure 9. Training and loss graphics of ResNet models. (a) ResNet18. (b) ResNet50.

Figure 10. Confusion matrices obtained as a result of classification of all fused features with SVM.

Figure 11. Confusion matrices obtained as a result of classification of selected features with SVM.

The confusion matrix structure that enables the calculation of these metrics is shown in Figure 12 . The performance of the system is measured with the t p , t n , f p and f n values in this matrix. Using these values, accuracy, specificity, precision, sensitivity, F1-score and MCC performance metrics are calculated with the help of the formulas between Equations (4) and (9). Table 4 shows the performance metrics obtained as a result of the three approaches.
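Equations (4)–(9) are the standard confusion-matrix metrics computed from the tp, tn, fp and fn counts; for reference, a direct transcription:

```python
from math import sqrt

def metrics(tp, tn, fp, fn):
    """Accuracy, specificity, precision, sensitivity, F1-score and MCC
    from the confusion-matrix counts (Equations (4)-(9))."""
    acc  = (tp + tn) / (tp + tn + fp + fn)
    spec = tn / (tn + fp)
    prec = tp / (tp + fp)
    sens = tp / (tp + fn)  # also called recall
    f1   = 2 * prec * sens / (prec + sens)
    mcc  = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return acc, spec, prec, sens, f1, mcc
```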

Figure 12. Structure of confusion matrices.

Table 4. Performance metrics for the three approaches.

| Model | Kernel | Acc. (%) | Spec. | Prec. | Sens. | F1-Score | MCC |
|---|---|---|---|---|---|---|---|
| ResNet18 | - | 80.86 | 0.6689 | 0.8142 | 0.8947 | 0.8526 | 0.5868 |
| ResNet50 | - | 80.47 | 0.5734 | 0.7826 | 0.9474 | 0.8571 | 0.5832 |
| SVM with 2560 features | Linear | 91.02 | 0.8582 | 0.9250 | 0.9380 | 0.9315 | 0.8012 |
| | Quadratic | 91.67 | 0.8769 | 0.9343 | 0.9380 | 0.9361 | 0.8163 |
| | Cubic | 90.89 | 0.8619 | 0.9266 | 0.9340 | 0.9340 | 0.7988 |
| | Gaussian | 91.41 | 0.8433 | 0.9189 | 0.9520 | 0.9352 | 0.8090 |
| SVM with 500 selected features | Linear | 91.15 | 0.8321 | 0.9138 | 0.9540 | 0.9335 | 0.8030 |
| | Quadratic | 91.93 | 0.8470 | 0.9212 | 0.9580 | 0.9392 | 0.8206 |
| | Gaussian | 90.89 | 0.8321 | 0.9135 | 0.9500 | 0.9314 | 0.7972 |

According to Table 4 , the highest accuracy in the first approach is obtained with the fine-tuned ResNet18 model; the accuracy rates obtained with ResNet18 and ResNet50 are 80.86% and 80.47%, respectively. In the second approach, classification with the SVM using 2560 features achieves its highest accuracy of 91.67% with the quadratic kernel function. In the last approach, classification with the 500 most effective features selected by the feature selection algorithm achieves its highest accuracy, 92.19%, with the SVM cubic kernel function. The results of the first approach show that converting the diabetes data from numeric to image form is an effective technique, because these images were successfully classified with ResNet models. The second approach shows that fusing the features of different CNN models substantially improves performance, and the SVM also showed successful classification performance. The last approach shows that higher accuracy can be achieved with fewer features. The results obtained with the last approach are compared with previous studies using the PIMA dataset in Table 5 ; as can be seen, the method proposed in our study outperformed many previous studies. Considering the methodological background of previous studies, the numerical nature of the PIMA dataset has led researchers to use algorithms fed with numerical data, such as traditional machine learning, 1D-CNN and LSTM. This study, unlike previous studies, transformed the PIMA dataset into image data and thus made it suitable for popular CNN models.

Table 5. Comparative analysis with previous works.

| Previous Work | Method | Accuracy (%) |
|---|---|---|
| Zolfaghari [ ] | Ensemble of SVM and NN | 88.04 |
| Sneha and Gangil [ ] | Feature Selection and SVM | 77.37 |
| Srivastava et al. [ ] | ANN | 92.00 |
| Edeh, Khalaf, Tavera, Tayeb, Ghouali, Abdulsahib, Richard-Nnabu and Louni [ ] | SVM | 83.1 |
| Massaro, Maritati, Giannone, Convertini and Galiano [ ] | LSTM-AR | 89 |
| Dadgar and Kaardaan [ ] | UTA-NN and GA | 87.46 |
| Zou, Qu, Luo, Yin, Ju and Tang [ ] | mRMR-RF | 77.21 |
| Ashiquzzaman, Tushar, Islam, Shon, Im, Park, Lim and Kim [ ] | Deep MLP | 88.41 |
| Kannadasan, Edla and Kuppili [ ] | Stacked Autoencoders-DNN | 86.26 |
| Rahman, Islam, Mukti and Saha [ ] | Conv-LSTM | 91.38 |
| Alex, Nayahi, Shine and Gopirekha [ ] | DCNN/SMOTE/Outlier Detection | 86.29 |
| Kalagotla et al. [ ] | Stacking of MLP, SVM, LR | 78.2 |
| Jakka and Vakula Rani [ ] | LR | 77.6 |
| This study | Diabetes images: ResNet18 and ResNet50-ReliefF | 92.19 |

5. Conclusions, Discussion and Future Works

Diabetes is a chronic disease that limits people’s daily activities, reduces their quality of life and increases the risk of death. In the past, machine learning and DNN solutions have been developed using clinical data and various diabetes prediction studies have been carried out. Despite the encouraging results of these studies, the numerical nature of clinical registry data has limited the use of popular CNN models. In this study, popular CNN models were used to determine the diagnosis of diabetes. Since these CNN models require two-dimensional data input, numerical clinical patient data (PIMA dataset) were first converted to images in this study. In this way, each feature was included in the sample image. This process was not performed randomly, and the most effective feature was made to stand out more in the image. During this process, the ReliefF feature selection method was used to determine the most effective features. After the number of generated images was increased by data augmentation and their size was adjusted for the ResNet model, diabetes prediction was carried out with three different approaches.

Diabetes images were successfully classified with the first approach using the fine-tuned ResNet18 and ResNet50 models. In the second approach, SVM was used to classify a total of 2560 deep features extracted from the fully connected layers of both ResNet models. In the last approach, the most effective 500 of these deep features were selected using the ReliefF feature selection algorithm, and the selected features were classified by SVM. The most successful prediction was obtained with the third approach: the accuracy of the classification using the SVM cubic kernel with 500 selected features was 92.19%. All these classifications were performed on the image data. The conversion to image data removed the limitation on which algorithms can be used with the PIMA dataset; the PIMA dataset, or similar numerical data, can now be analyzed with different CNN models capable of extracting high-level and complex features. An application with image data can be analyzed more diversely and comprehensively than one with numerical data, because the range of artificial intelligence combinations applicable to image data is very rich; for example, the number and variety of features can be increased with different CNN models. The results obtained with the ResNet18 and ResNet50 models in this study therefore outperform previous studies. Based on all this, the experimental results have shown that converting clinical data into images is an effective technique.

The method proposed in this study can also be applied to different numerical data. Deep-learning-based studies have reduced the dependency on hand-crafted features, bringing the designed architecture to the fore; the method proposed here is valuable in that deep and comprehensive architectures can also be used for numerical data. This application may involve more processing steps than studies using raw data directly. However, the generation of image data paves the way for further improving diabetes prediction performance, because CNN models with many different architectures are now applicable to numerical data, and data augmentation can now be easily applied to diabetes images. In addition, the application results show that the fusion features used in the CNN-SVM architecture greatly increase the success, and that with selected features the CNN-SVM is less costly and provides more accurate predictions. The important trend in the experimental studies can be summarized as follows: selected fusion features, although fewer in number, increase the performance of the system; the CNN-SVM structure is quite effective; and applications with fewer, more effective and more diverse features increase the classification accuracy of the system.

In future studies, we plan to use different CNN models and feature selection methods to improve diabetes prediction performance. A greater variety of features will be obtained by using more CNN models, which is expected to increase classification accuracy. In addition, future studies will apply transformer-based networks to the produced diabetes images.

Funding Statement

This research received no external funding.

Author Contributions

Conceptualization, M.F.A. and K.S.; methodology, M.F.A.; software, M.F.A. and K.S.; validation, M.F.A.; formal analysis, M.F.A. and K.S.; investigation, M.F.A.; writing—original draft preparation, M.F.A.; writing—review and editing, M.F.A. and K.S.; visualization, M.F.A. and K.S.; supervision, K.S. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Data Availability Statement and Conflicts of Interest

The authors declare no conflict of interest.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


Review Article | Open access | Published: 22 April 2020

Deep learning in mental health outcome research: a scoping review

Chang Su 1, Zhenxing Xu 1, Jyotishman Pathak 1 & Fei Wang 1

Translational Psychiatry, volume 10, Article number: 116 (2020)


Subjects: Psychiatric disorders

Mental illnesses, such as depression, are highly prevalent and have been shown to impact an individual’s physical health. Recently, artificial intelligence (AI) methods have been introduced to assist mental health providers, including psychiatrists and psychologists, for decision-making based on patients’ historical data (e.g., medical records, behavioral data, social media usage, etc.). Deep learning (DL), as one of the most recent generation of AI technologies, has demonstrated superior performance in many real-world applications ranging from computer vision to healthcare. The goal of this study is to review existing research on applications of DL algorithms in mental health outcome research. Specifically, we first briefly overview the state-of-the-art DL techniques. Then we review the literature relevant to DL applications in mental health outcomes. According to the application scenarios, we categorize these relevant articles into four groups: diagnosis and prognosis based on clinical data, analysis of genetics and genomics data for understanding mental health conditions, vocal and visual expression data analysis for disease detection, and estimation of risk of mental illness using social media data. Finally, we discuss challenges in using DL algorithms to improve our understanding of mental health conditions and suggest several promising directions for their applications in improving mental health diagnosis and treatment.


Introduction

Mental illness is a type of health condition that changes a person’s mind, emotions, or behavior (or all three), and has been shown to impact an individual’s physical health 1 , 2 . Mental health issues including depression, schizophrenia, attention-deficit hyperactivity disorder (ADHD), and autism spectrum disorder (ASD), etc., are highly prevalent today and it is estimated that around 450 million people worldwide suffer from such problems 1 . In addition to adults, children and adolescents under the age of 18 years also face the risk of mental health disorders. Moreover, mental health illnesses have also been one of the most serious and prevalent public health problems. For example, depression is a leading cause of disability and can lead to an increased risk for suicidal ideation and suicide attempts 2 .

To better understand the mental health conditions and provide better patient care, early detection of mental health problems is an essential step. Different from the diagnosis of other chronic conditions that rely on laboratory tests and measurements, mental illnesses are typically diagnosed based on an individual’s self-report to specific questionnaires designed for the detection of specific patterns of feelings or social interactions 3 . Due to the increasing availability of data pertaining to an individual’s mental health status, artificial intelligence (AI) and machine learning (ML) technologies are being applied to improve our understanding of mental health conditions and have been engaged to assist mental health providers for improved clinical decision-making 4 , 5 , 6 . As one of the latest advances in AI and ML, deep learning (DL), which transforms the data through layers of nonlinear computational processing units, provides a new paradigm to effectively gain knowledge from complex data 7 . In recent years, DL algorithms have demonstrated superior performance in many data-rich application scenarios, including healthcare 8 , 9 , 10 .

In a previous study, Shatte et al. 11 explored the application of ML techniques in mental health. They reviewed the literature by grouping it into four main application domains: diagnosis, prognosis, and treatment; public health; and research and clinical administration. In another study, Durstewitz et al. 9 explored the emerging application of DL techniques in psychiatry. They focused on DL in studies of brain dynamics and subjects’ behaviors, and presented insights on embedding interpretable computational models into a statistical context. In contrast, this study aims to provide a scoping review of existing research applying DL methodologies to the analysis of different types of data related to mental health conditions. The reviewed articles are organized into four main groups according to the type of data analyzed: (1) clinical data, (2) genetic and genomics data, (3) vocal and visual expression data, and (4) social media data. Finally, the challenges that current studies face, as well as future research directions toward bridging the gap between the application of DL algorithms and patient care, are discussed.

Deep learning overview

ML aims at developing computational algorithms or statistical models that can automatically infer hidden patterns from data 12 , 13 . Recent years have witnessed an increasing number of ML models being developed to analyze healthcare data 4 . However, conventional ML approaches require a significant amount of feature engineering, a step that is necessary in most application scenarios to obtain good performance and is usually resource- and time-consuming.

As the newest wave of ML and AI technologies, DL approaches aim at the development of an end-to-end mechanism that maps the input raw features directly into the outputs through a multi-layer network structure that is able to capture the hidden patterns within the data. In this section, we will review several popular DL model architectures, including deep feedforward neural network (DFNN), recurrent neural network (RNN) 14 , convolutional neural network (CNN) 15 , and autoencoder 16 . Figure 1 provides an overview of these architectures.

figure 1

a Deep feedforward neural network (DFNN). It is the basic design of DL models. Commonly, a DFNN contains multiple hidden layers. b A recurrent neural network (RNN) is designed to process sequence data. To encode history information, each recurrent neuron receives the input element and the state vector of the predecessor neuron, and yields a hidden state that is fed to the successor neuron. For example, not only the individual information but also the dependence among the elements of the sequence x_1 → x_2 → x_3 → x_4 → x_5 is encoded by the RNN architecture. c Convolutional neural network (CNN). Between the input layer (e.g., an input neuroimage) and the output layer, a CNN commonly contains three types of layers: the convolutional layer, which generates feature maps by sliding convolutional kernels over the previous layer; the pooling layer, which reduces the dimensionality of the previous convolutional layer; and the fully connected layer, which makes the prediction. For illustrative purposes, this example has only one layer of each type; a real-world CNN would have multiple convolutional and pooling layers (usually interleaved) and one fully connected layer. d An autoencoder consists of two components: the encoder, which learns to compress the input data into a latent representation layer by layer, and the decoder, which, inverse to the encoder, learns to reconstruct the data at the output layer. The learned compressed representations can be fed to a downstream predictive model.

Deep feedforward neural network

The artificial neural network (ANN) was proposed with the intention of mimicking how the human brain works; its basic element is the artificial neuron depicted in Fig. 2a . Mathematically, an artificial neuron is a nonlinear transformation unit, which takes the weighted sum of all inputs and feeds the result to an activation function, such as the sigmoid, rectifier (i.e., rectified linear unit [ReLU]), or hyperbolic tangent (Fig. 2b ). An ANN is composed of multiple artificial neurons with different connection architectures. The simplest ANN architecture is the feedforward neural network (FNN), which stacks the neurons layer by layer in a feedforward manner (Fig. 1a ), with the neurons across adjacent layers fully connected to each other. The first layer of the FNN is the input layer, in which each unit receives one dimension of the data vector. The last layer is the output layer, which outputs the probabilities of a subject belonging to each class (in classification). The layers between the input and output layers are the hidden layers; a DFNN usually contains multiple hidden layers. As shown in Fig. 2a , a weight parameter is associated with each edge in the DFNN, and these weights are optimized by minimizing a training loss measured on a specific training dataset (usually through backpropagation 17 ). After the optimal set of parameters is learned, the DFNN can be used to predict the target value (e.g., class) of any testing data vector. Therefore, a DFNN can be viewed as an end-to-end process that transforms a specific raw data vector to its target layer by layer. Compared with traditional ML models, the DFNN has shown superior performance in many data mining tasks and has been introduced to the analysis of clinical data and genetic data to predict mental health conditions. We will discuss the applications of these methods further in the Results section.

figure 2

a An illustration of the basic unit of neural networks, i.e., the artificial neuron. Each input x_i is associated with a weight w_i. The weighted sum of all inputs, Σ w_i x_i, is fed to a nonlinear activation function f to generate the output y_j of the j-th neuron, i.e., y_j = f(Σ w_i x_i). b Illustrations of the widely used nonlinear activation functions.
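To make the neuron computation concrete, the following is a minimal pure-Python sketch of a single artificial neuron with a sigmoid activation. The input vector and weight values are hypothetical, chosen for illustration rather than taken from any reviewed study.

```python
import math

def sigmoid(z):
    # Nonlinear activation: squashes the weighted sum into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights):
    # y_j = f(sum_i w_i * x_i): weighted sum of all inputs, then activation f
    z = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)

x = [0.5, -1.0, 2.0]   # hypothetical 3-dimensional input vector
w = [0.4, 0.3, -0.1]   # hypothetical learned weights
y = neuron(x, w)       # a single scalar output in (0, 1)
```

Stacking many such units layer by layer, with the outputs of one layer serving as the inputs of the next, yields the DFNN described above.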

Recurrent neural network

RNNs were designed to analyze sequential data such as natural language, speech, and video. Given an input sequence, the RNN processes one element of the sequence at a time by feeding it to a recurrent neuron. To encode the historical information along the sequence, each recurrent neuron receives the input element at the corresponding time point and the output of the neuron at the previous time stamp, and its output is in turn provided to the neuron at the next time stamp (this is where the term “recurrent” comes from). An example RNN architecture is shown in Fig. 1b , where the input is a sequence of words (a sentence). The recurrence link (i.e., the edge linking different neurons) enables the RNN to capture the latent semantic dependencies among words and the syntax of the sentence. In recent years, different variants of the RNN, such as the long short-term memory (LSTM) 18 and the gated recurrent unit 19 , have been proposed; the main difference among these models is how the input is mapped to the output within the recurrent neuron. RNN models have demonstrated state-of-the-art performance in various applications, especially natural language processing (NLP; e.g., machine translation and text-based classification); hence, they hold great promise for processing clinical notes and social media posts to detect mental health conditions, as discussed below.
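The recurrence described above can be sketched in a few lines of pure Python. This is an illustrative vanilla-RNN step, h_t = tanh(W_x·x_t + W_h·h_prev + b), with hypothetical weights; the LSTM and gated recurrent unit variants used in the reviewed studies differ in how this mapping is gated.

```python
import math

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # Each recurrent unit combines the current input element x_t with
    # the previous hidden state h_prev, then applies tanh.
    h_t = []
    for i in range(len(b)):
        z = sum(W_x[i][j] * x_t[j] for j in range(len(x_t)))
        z += sum(W_h[i][j] * h_prev[j] for j in range(len(h_prev)))
        h_t.append(math.tanh(z + b[i]))
    return h_t

# Hypothetical weights for a 2-unit RNN reading 2-dimensional inputs
W_x = [[0.1, 0.2], [0.0, -0.1]]
W_h = [[0.3, 0.0], [0.1, 0.2]]
b = [0.0, 0.0]

# The hidden state is carried forward along the sequence x1 -> x2 -> x3,
# so the final state encodes the whole history.
h = [0.0, 0.0]
for x_t in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    h = rnn_step(x_t, h, W_x, W_h, b)
```

In practice the weight matrices are learned by backpropagation through time rather than fixed by hand.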

Convolutional neural network

The CNN is a specific type of deep neural network originally designed for image analysis 15 , where each pixel corresponds to a specific input dimension describing the image. Similar to a DFNN, a CNN also maps these input image pixels to the corresponding target (e.g., image class) through layers of nonlinear transformations. Different from the DFNN, where only fully connected layers are considered, there are typically three types of layers in a CNN: a convolution–activation layer, a pooling layer, and a fully connected layer (Fig. 1c ). The convolution–activation layer first convolves the entire feature map obtained from the previous layer with small two-dimensional convolution filters. The results from each convolution filter are activated through a nonlinear activation function in the same way as in a DFNN. A pooling layer reduces the size of the feature map through sub-sampling. The fully connected layer is analogous to the hidden layer in a DFNN, where each neuron is connected to all neurons of the previous layer. The convolution–activation layer extracts locally invariant patterns from the feature maps, the pooling layer effectively reduces the feature dimensionality to avoid model overfitting, and the fully connected layer explores the global feature interactions, as in DFNNs. Different combinations of these three types of layers constitute different CNN architectures. Because of various characteristics of images, such as local self-similarity, compositionality, and translational and deformation invariance, the CNN has demonstrated state-of-the-art performance in many computer vision tasks 7 . Hence, CNN models are promising for processing clinical images and expression data (e.g., facial expression images) to detect mental health conditions. We will discuss the application of these methods in the Results section.
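A minimal sketch of the two CNN-specific operations, convolution and max pooling, on a toy 4 × 4 “image” may help. The filter values here are hypothetical, and the example omits the activation and fully connected layers for brevity.

```python
def conv2d(image, kernel):
    # Slide the 2D kernel over the image (valid padding, stride 1)
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

def max_pool2(fmap):
    # 2x2 max pooling: sub-samples the feature map, keeping the
    # strongest local response in each window
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

image = [[1, 0, 2, 1], [0, 1, 3, 0], [2, 1, 0, 1], [1, 0, 1, 2]]
edge = [[1, -1], [-1, 1]]    # a hypothetical 2x2 filter
fmap = conv2d(image, edge)   # 3x3 feature map
pooled = max_pool2(fmap)     # pooled down to 1x1
```

A real CNN learns many such filters per layer and stacks several convolution–pooling stages before the fully connected classifier.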

Autoencoder

The autoencoder is a special variant of the DFNN aimed at learning new (usually more compact) data representations that can optimally reconstruct the original data vectors 16 , 20 . An autoencoder typically consists of two components (Fig. 1d ): (1) the encoder, which learns new representations (usually with reduced dimensionality) from the input data through a multi-layer FNN; and (2) the decoder, which is exactly the reverse of the encoder and reconstructs the data in their original space from the representations derived by the encoder. The parameters of the autoencoder are learned by minimizing the reconstruction loss. The autoencoder has demonstrated the capacity to extract meaningful features from raw data without any supervision information. In studies of mental health outcomes, the use of autoencoders has resulted in desirable improvements in analyzing clinical and expression image data, which will be detailed in the Results section.
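The encoder/decoder structure and the reconstruction loss can be sketched as follows. This illustrative pure-Python fragment uses random, untrained weights, so it only shows the forward pass and the quantity that training would minimize; the dimensions (4 inputs compressed to a 2-dimensional code) are arbitrary.

```python
import math
import random

random.seed(0)

def linear(v, W, b):
    # Affine map: W v + b
    return [sum(W[i][j] * v[j] for j in range(len(v))) + b[i]
            for i in range(len(b))]

def autoencode(x, enc_W, enc_b, dec_W, dec_b):
    # Encoder compresses x into a lower-dimensional code;
    # the decoder reconstructs x in the original space from that code.
    code = [math.tanh(z) for z in linear(x, enc_W, enc_b)]
    recon = linear(code, dec_W, dec_b)
    return code, recon

def reconstruction_loss(x, recon):
    # Mean squared error between the input and its reconstruction;
    # training adjusts the weights to minimize this quantity.
    return sum((a - b) ** 2 for a, b in zip(x, recon)) / len(x)

x = [0.2, 0.9, -0.3, 0.5]
enc_W = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
enc_b = [0.0, 0.0]
dec_W = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(4)]
dec_b = [0.0] * 4
code, recon = autoencode(x, enc_W, enc_b, dec_W, dec_b)
loss = reconstruction_loss(x, recon)
```

After training, the code vector serves as the compact representation that can be fed to a downstream predictive model.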

The processing and reporting of the results of this review were guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines 21 . To thoroughly review the literature, a two-step method was used to retrieve all studies on relevant topics. First, we searched computerized bibliographic databases, including PubMed and Web of Science. The search strategy is detailed in Supplementary Appendix 1 . The literature search comprised articles published until April 2019. Next, a snowball technique was applied to identify additional studies. Furthermore, we manually searched other resources, including Google Scholar and IEEE Xplore, to find additional relevant articles.

Figure 3 presents the study selection process. All articles were evaluated carefully, and studies were excluded if: (1) the main outcome is not a mental health condition; (2) the model involved is not a DL algorithm; (3) the full text of the article is not accessible; or (4) the article is not written in English.

figure 3

In total, 57 studies that met our eligibility criteria, covering clinical data analysis, genetic data analysis, vocal and visual expression data analysis, and social media data analysis, were included in this review.

A total of 57 articles met our eligibility criteria. Most of the reviewed articles were published between 2014 and 2019. To clearly summarize these articles, we grouped them into four categories according to the types of data analyzed, including (1) clinical data, (2) genetic and genomics data, (3) vocal and visual expression data, and (4) social media data. Table 1 summarizes the characteristics of these selected studies.

Clinical data

Neuroimages

Previous studies have shown that neuroimages can record evidence of neuropsychiatric disorders 22 , 23 . Two common types of neuroimage data analyzed in mental health studies are functional magnetic resonance imaging (fMRI) and structural MRI (sMRI) data. In fMRI data, brain activity is measured by identifying the changes associated with blood flow, based on the fact that cerebral blood flow and neuronal activation are coupled 24 . In sMRI data, the neurological aspect of the brain is described by its structural textures, which carry information about the spatial arrangement of voxel intensities in 3D. Recently, DL technologies have been applied to the analysis of both fMRI and sMRI data.

One application of DL to fMRI and sMRI data is the identification of attention-deficit hyperactivity disorder (ADHD) 25 , 26 , 27 , 28 , 29 , 30 , 31 . To learn meaningful information from the neuroimages, CNN and deep belief network (DBN) models were used. In particular, the CNN models were mainly used to identify local spatial patterns, while the DBN models were used to obtain a deep hierarchical representation of the neuroimages. Different patterns were discovered between ADHD patients and controls in the prefrontal cortex and cingulate cortex. Several studies also analyzed sMRIs to investigate schizophrenia 32 , 33 , 34 , 35 , 36 , utilizing DFNNs, DBNs, and autoencoders. These studies reported abnormal patterns in cortical regions and the cortical–striatal–cerebellar circuit in the brains of schizophrenia patients, especially in the frontal, temporal, parietal, and insular cortices, and in some subcortical regions, including the corpus callosum, putamen, and cerebellum. Moreover, the use of DL on neuroimages has also targeted other mental health disorders. Geng et al. 37 proposed to use a CNN and an autoencoder to acquire meaningful features from the original time series of fMRI data for predicting depression. Two studies 31 , 38 integrated the fMRI and sMRI data modalities to develop predictive models for autism spectrum disorder (ASD). Significant relationships between fMRI and sMRI data were observed with regard to ASD prediction.

Challenges and opportunities

The aforementioned studies have demonstrated that the use of DL techniques in analyzing neuroimages can provide evidence of mental health problems, which can be translated into clinical practice and facilitate the diagnosis of mental illness. However, multiple challenges need to be addressed to achieve this objective. First, DL architectures generally require large data samples to train the models, which may pose a difficulty in neuroimaging analysis because of the scarcity of such data 39 . Second, imaging data typically lie in a high-dimensional space; e.g., even a 64 × 64 2D neuroimage results in 4096 features. This raises the risk of overfitting by the DL models. To address this, most existing studies reported utilizing MRI data preprocessing tools, such as Statistical Parametric Mapping ( https://www.fil.ion.ucl.ac.uk/spm/ ), Data Processing Assistant for Resting-State fMRI 40 , and the fMRI Preprocessing Pipeline 41 , to extract useful features before feeding the data to the DL models. Even though an attractive attribute of DL is its capacity to learn meaningful features from raw data, feature engineering tools are still needed, especially in cases of small sample size and high dimensionality such as neuroimage analysis, as they mitigate the overfitting risk of DL models. As reported in some selected studies 28 , 31 , 35 , 37 , DL models can benefit from feature engineering techniques and have been shown to outperform traditional ML models in predicting multiple conditions, such as depression, schizophrenia, and ADHD. However, such tools extract features based on prior knowledge, and hence may omit information that is meaningful for mental health outcome research but not yet known. An alternative is to use a CNN to automatically extract information from the raw data. As reported in a previous study 10 , CNNs perform well in processing raw neuroimage data. Among the studies reviewed here, three 29 , 30 , 37 reported involving CNN layers and achieved desirable performance.

Electroencephalogram data

As a low-cost, compact, high-temporal-resolution signal containing up to several hundred channels, electroencephalogram (EEG) data have gained significant attention in the study of brain disorders 42 . Because the EEG signal is a kind of streaming data that is both dense and continuous, it is challenging for traditional feature-engineering-based methods to obtain sufficient information from the raw EEG data to make accurate predictions. To address this, DL models have recently been employed to analyze raw EEG signal data.

Four of the reviewed articles proposed to use DL to understand mental health conditions through the analysis of EEG signals. Acharya et al. 43 used a CNN to extract features from the input EEG signals. They found that the EEG signals from the right hemisphere of the human brain are more distinctive for the detection of depression than those from the left hemisphere, providing evidence that depression is associated with a hyperactive right hemisphere. Mohan et al. 44 modeled the raw EEG signals with a DFNN to obtain information about human brain waves. They found that the signals collected from the central (C3 and C4) regions are marginally higher compared with other brain regions, which can be used to distinguish depressed from normal subjects based on brain wave signals. Zhang et al. 45 proposed a concatenated structure of a deep recurrent network and a 3D CNN to obtain EEG features across different tasks. They reported that the DL model can capture the spectral changes of EEG hemispheric asymmetry to distinguish different mental workloads effectively. Li et al. 46 presented a computer-aided detection system that extracts multiple types of information (e.g., spectral, spatial, and temporal) to recognize mild depression based on a CNN architecture. The authors found that both the spectral and the temporal information of EEG are crucial for the prediction of depression.

EEG data are usually classified as streaming data that are continuous and of high density. Despite the initial success in applying DL algorithms to EEG data for studying multiple mental health conditions, several challenges remain. One major challenge is that raw EEG data gathered from sensors contain a certain degree of erroneous, noisy, and redundant information, caused by discharged batteries, failures in sensor readings, and intermittent communication loss in wireless sensor networks 47 . This may hinder the model in extracting meaningful information from the noise. Multiple preprocessing steps (e.g., data denoising, data interpolation, data transformation, and data segmentation) are necessary to deal with the raw EEG signal before it is fed to the DL models. Besides, due to the density of raw EEG data, analysis of the streaming data is computationally expensive, which poses a challenge for model architecture selection: a proper model should be designed with relatively few training parameters. This is one reason why the reviewed studies are mainly based on the CNN architecture.

Electronic health records

Electronic health records (EHRs) are systematic collections of longitudinal, patient-centered records. Patients’ EHRs consist of both structured and unstructured data: the structured data include information about a patient’s diagnosis, medications, and laboratory test results, and the unstructured data include information in clinical notes. Recently, DL models have been applied to analyze EHR data to study mental health disorders 48 .

The first and foremost issue in analyzing structured EHR data is how to appropriately handle the longitudinal records. Traditional ML models address this by collapsing a patient’s records within a certain time window into a vector comprising summary statistics of the features in different dimensions 49 . For instance, to estimate the probability of suicide death, Choi et al. 50 leveraged a DFNN to model baseline characteristics. One major limitation of these studies is the omission of the temporality among clinical events within the EHRs. To overcome this issue, RNNs are more commonly used for EHR data analysis, as an RNN naturally handles time-series data. DeepCare 51 , an LSTM-based DL model, encodes a patient’s long-term health state trajectory to predict future outcomes of depressive episodes. As the LSTM architecture appropriately captures disease progression by modeling the illness history and the medical interventions, DeepCare achieved over 15% improvement in prediction compared with conventional ML methods. In addition, Lin et al. 52 designed two DFNN models for the prediction of antidepressant treatment response and remission. The authors reported that the proposed DFNN can achieve an area under the receiver operating characteristic curve (AUC) of 0.823 in predicting antidepressant response.
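As a concrete contrast with the sequence-aware RNN approach, the traditional “collapsing” strategy can be sketched as follows. The visit records and the single lab-value field are hypothetical, invented purely for illustration.

```python
def collapse_visits(visits):
    # Traditional ML preprocessing: collapse a patient's longitudinal
    # lab values within a window into fixed-length summary statistics,
    # discarding the temporal ordering that an RNN would exploit.
    values = [v["lab_value"] for v in visits]
    return {"mean": sum(values) / len(values),
            "min": min(values),
            "max": max(values)}

# Hypothetical longitudinal record: one lab value per visit, in time order
visits = [{"lab_value": 4.1}, {"lab_value": 5.0}, {"lab_value": 3.7}]
features = collapse_visits(visits)
```

Note that reversing the visit order leaves these features unchanged, which is exactly the temporality the RNN-based models are designed to preserve.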

Analyzing the unstructured clinical notes in EHRs relates to the long-standing topic of NLP. To extract meaningful knowledge from text, conventional NLP approaches mostly define rules or regular expressions before the analysis. However, it is challenging to enumerate all possible rules or regular expressions. Due to the recent advances of DL in NLP tasks, DL models have been developed to mine clinical text data from EHRs to study mental health conditions. Geraci et al. 53 utilized term frequency-inverse document frequency (TF-IDF) to represent clinical documents by their words and developed a DFNN model to identify individuals with depression. One major limitation of such an approach is that the semantics and syntax of sentences are lost. In this context, CNNs 54 and RNNs 55 have shown superiority in modeling syntax for text-based prediction. In particular, CNNs have been used to mine neuropsychiatric notes for predicting psychiatric symptom severity 56 , 57 . Tran and Kavuluru 58 used an RNN to analyze the history of present illness in neuropsychiatric notes for predicting mental health conditions. The model engaged an attention mechanism 55 , which can specify the importance of the words in prediction, making the model more interpretable than their previous CNN model 56 .
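The TF-IDF representation underlying the Geraci et al. approach can be sketched in pure Python. The clinical-note snippets below are invented for illustration, and a real pipeline would also handle tokenization, stop words, and normalization.

```python
import math

def tfidf(docs):
    # Term frequency-inverse document frequency: a word is weighted
    # highly when it is frequent within a note (tf) but rare across
    # the corpus (idf), so corpus-wide common words are down-weighted.
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = {}
    for tokens in tokenized:
        for term in set(tokens):
            df[term] = df.get(term, 0) + 1
    vectors = []
    for tokens in tokenized:
        tf = {t: tokens.count(t) / len(tokens) for t in set(tokens)}
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

# Hypothetical snippets of clinical notes
notes = ["patient reports low mood and poor sleep",
         "patient denies low mood",
         "sleep normal appetite normal"]
vecs = tfidf(notes)
```

Each note becomes a sparse weight vector that a DFNN can consume, but, as noted above, the word order (and hence syntax) is lost in this representation.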

Although DL has achieved promising results in EHR analysis, several challenges remain unsolved. On one hand, different from diagnosing physical health conditions such as diabetes, the diagnosis of mental health conditions lacks direct quantitative tests, such as a blood chemistry test, a buccal swab, or urinalysis. Instead, clinicians evaluate signs and symptoms through patient interviews and questionnaires, during which they gather information based on the patient’s self-report. Collecting and deriving inferences from such data rely heavily on the experience and subjectivity of the clinician. This may leave signals buried in noise and affect the robustness of the DL model. One potential way to address this challenge is to comprehensively integrate multimodal clinical information, including structured and unstructured EHR information, as well as neuroimaging and EEG data. Another is to incorporate existing medical knowledge, which can guide model training in the right direction. For instance, biomedical knowledge bases contain massive numbers of verified interactions between biomedical entities, e.g., diseases, genes, and drugs 59 . Incorporating such information brings in meaningful medical constraints and may help to reduce the effects of noise on the model training process. On the other hand, deploying a DL model trained on one EHR system in another system is challenging, because EHR data collection and representation are rarely standardized across hospitals and clinics. To address this issue, national and international collaborative efforts such as Observational Health Data Sciences and Informatics ( https://ohdsi.org ) have developed common data models, such as OMOP, to standardize EHR data representation for conducting observational data analysis 60 .

Genetic data

Multiple studies have found that mental disorders, e.g., depression, can be associated with genetic factors 61 , 62 . Conventional statistical studies in genetics and genomics, such as genome-wide association studies, have identified many common and rare genetic variants, such as single-nucleotide polymorphisms (SNPs), associated with mental health disorders 63 , 64 . Yet, the effect of each genetic factor is small and many more have not been discovered. With recent developments in next-generation sequencing techniques, a massive volume of high-throughput genome and exome sequencing data is being generated, enabling researchers to study patients with mental health disorders by examining all types of genetic variation across an individual’s genome. In recent years, DL 65 , 66 has been applied to identify genetic risk factors associated with mental illness, borrowing the capacity of DL to identify highly complex patterns in large datasets. Khan and Wang 67 integrated genetic annotations, known brain expression quantitative trait loci, and enhancer/promoter peaks to generate feature vectors of variants, and developed a DFNN, named ncDeepBrain, to prioritize non-coding variants associated with mental disorders. To further prioritize susceptibility genes, they designed another deep model, iMEGES 68 , which integrates the ncDeepBrain score, general gene scores, and disease-specific scores to estimate gene risk. Wang et al. 69 developed a novel deep architecture that combines the deep Boltzmann machine architecture 70 with conditional and lateral connections derived from the gene regulatory network. The model provided insights about intermediate phenotypes and their connections to high-level phenotypes (disease traits). Laksshman et al. 71 used exome sequencing data to predict bipolar disorder outcomes of patients. They developed a CNN and used the convolution mechanism to capture correlations of neighboring loci within the chromosome.
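The convolution-over-neighboring-loci idea can be sketched as follows. The one-hot genotype encoding and the single 3-SNP filter are illustrative assumptions, not the architecture reported by Laksshman et al.

```python
def one_hot_genotypes(genotypes):
    # Encode each SNP genotype (0, 1, or 2 alternate alleles) as a
    # one-hot vector, yielding a 3-channel sequence along the chromosome.
    return [[1 if g == k else 0 for k in (0, 1, 2)] for g in genotypes]

def conv1d(seq, kernel):
    # 1D convolution across neighboring loci: the filter spans several
    # adjacent SNPs, so each output captures local correlation along
    # the chromosome (valid padding, stride 1).
    width = len(kernel)
    out = []
    for i in range(len(seq) - width + 1):
        out.append(sum(kernel[d][c] * seq[i + d][c]
                       for d in range(width) for c in range(3)))
    return out

# Hypothetical genotype calls for eight neighboring SNPs
genotypes = [0, 1, 2, 1, 0, 0, 2, 1]
encoded = one_hot_genotypes(genotypes)
kernel = [[0.2, 0.0, -0.1]] * 3   # one hypothetical 3-SNP filter
feature_map = conv1d(encoded, kernel)
```

In a trained model, many such filters would be learned, and deeper layers would aggregate their feature maps before the final prediction.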

Although the use of genetic data in DL for studying mental health conditions shows promise, multiple challenges need to be addressed. For DL-based risk variant/gene prioritization efforts, one major challenge is the scarcity of labeled data. On one hand, the positive samples are limited, as few risk SNPs or genes are known to be associated with mental health conditions; for example, only about 108 risk loci have reached genome-wide significance in ASD. On the other hand, the negative samples (i.e., SNPs, variants, or genes) may not be truly negative, as it is unclear whether they are associated with the mental illness. Moreover, it is also challenging to develop DL models for analyzing a patient’s sequencing data for mental illness prediction, as the sequencing data are extremely high-dimensional (over five million SNPs in the human genome). More prior domain knowledge is needed to guide the DL model in extracting patterns from the high-dimensional genomic space.

Vocal and visual expression data

The use of vocal (voice or speech) and visual (video or images of facial or body behaviors) expression data has gained the attention of many studies of mental health disorders. Modeling the evolution of people’s emotional states from these modalities has been used to identify mental health status. In essence, voice data are continuous and dense signals, whereas video data are sequences of frames, i.e., images. Conventional ML models for analyzing such types of data suffer from a sophisticated feature extraction process. Due to the recent success of DL in computer vision and sequence data modeling, such models have been introduced to analyze vocal and/or visual expression data. Most of the articles reviewed here predict mental health disorders based on two public datasets: (1) the Chi-Mei corpus, collected by using six emotional videos to elicit facial expressions and speech responses of subjects with bipolar disorder, unipolar depression, and healthy controls 72 ; and (2) the International Audio/Visual Emotion Recognition Challenges (AVEC) depression dataset 73 , 74 , 75 , collected within a human–computer interaction scenario. The proposed models include CNNs, RNNs, autoencoders, and hybrid models based on the above. In particular, CNNs were leveraged to encode the temporal and spectral features from voice signals 76 , 77 , 78 , 79 , 80 and static facial or physical expression features from video frames 79 , 81 , 82 , 83 , 84 . Autoencoders were used to learn low-dimensional representations of people’s vocal 85 , 86 and visual expressions 87 , 88 , and RNNs were engaged to characterize the temporal evolution of emotion based on the CNN-learned features and/or other handcrafted features 76 , 81 , 84 , 85 , 86 , 87 , 88 , 89 , 90 . A few studies focused on analyzing static images using a CNN architecture to predict mental health status. Prasetio et al. 91 identified stress types (e.g., neutral, low stress, and high stress) from frontal facial images. Their proposed CNN model outperforms conventional ML models by 7% in prediction accuracy. Jaiswal et al. 92 investigated the relationship between facial expressions/gestures and neurodevelopmental conditions. They reported accuracy over 0.93 in the diagnostic prediction of ADHD and ASD using the CNN architecture. In addition, thermal images that track persons’ breathing patterns were also fed to a deep model to estimate psychological stress level (mental overload) 93 .

From the above summary, we can observe that analyzing vocal and visual expression data can capture the pattern of subjects’ emotional evolution to predict mental health conditions. Despite the promising initial results, challenges remain for developing DL models in this field. One major challenge is linking vocal and visual expression data with patients’ clinical data, given the difficulties involved in collecting such expression data during clinical practice. Current studies analyzed vocal and visual expression over individual datasets; without clinical guidance, the developed prediction models have limited clinical meaning. Linking patients’ expression information with clinical variables may help to improve both the interpretability and the robustness of the model. For example, Gupta et al. 94 designed a DFNN for affective prediction from audio and video modalities. The model incorporated depression severity as a parameter, linking the effects of depression on subjects’ affective expressions. Another challenge is the limited sample size. For example, the Chi-Mei dataset contains vocal–visual data from only 45 individuals (15 with bipolar disorder, 15 with unipolar depression, and 15 healthy controls). There is also a lack of “emotion labels” for people’s vocal and visual expressions. Apart from improving the datasets, an alternative way to address this challenge is transfer learning, which transfers knowledge gained from one (usually more general) dataset to the target dataset. For example, some studies trained autoencoders on public emotion databases such as eNTERFACE 95 to generate emotion profiles (EPs). Other studies 83 , 84 pre-trained CNNs on general facial expression datasets 96 , 97 to extract face appearance features.

Social media data

With the widespread proliferation of social media platforms, such as Twitter and Reddit, individuals are increasingly and publicly sharing information about their mood, behavior, and any ailments they might be suffering from. Such social media data have been used to identify users’ mental health states (e.g., psychological stress and suicidal ideation) 6 .

In this study, the articles that used DL to analyze social media data mainly focused on stress detection 98 , 99 , 100 , 101 , depression identification 102 , 103 , 104 , 105 , 106 , and estimation of suicide risk 103 , 105 , 107 , 108 , 109 . In general, the core concept across these works is to mine the textual, and where applicable graphical, content of users’ social media posts to discover cues for mental health disorders. In this context, RNNs and CNNs were largely used by the researchers. In particular, an RNN usually introduces an attention mechanism to specify the importance of the input elements in the classification process 55 , which provides some interpretability for the predictive results. For example, Ive et al. 103 proposed a hierarchical RNN architecture with an attention mechanism to predict the classes of posts (including depression, autism, suicidewatch, anxiety, etc.). The authors observed that, benefitting from the attention mechanism, the model can predict risk text efficiently and extract the text elements crucial for making decisions. Coppersmith et al. 107 used an LSTM to discover quantifiable signals of suicide attempts in social media posts. The proposed model can capture contextual information between words and obtain nuances of language related to suicide.
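The attention mechanism’s contribution to interpretability can be sketched as follows: per-word relevance scores are softmax-normalized into weights over the hidden states, and those weights indicate which words drove the prediction. The hidden states and scores below are hypothetical; in a trained model the scores themselves are computed from the hidden states by learned parameters.

```python
import math

def softmax(scores):
    # Normalize arbitrary scores into weights that sum to 1
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(hidden_states, scores):
    # Attention pooling: the context vector is the weighted sum of the
    # per-word hidden states, and the weights themselves can be read
    # off to see which words the model deemed important.
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, hidden_states))
               for i in range(dim)]
    return weights, context

# Hypothetical RNN hidden states for a 4-word post, with relevance scores
states = [[0.1, 0.2], [0.5, -0.3], [0.9, 0.4], [0.0, 0.1]]
scores = [0.2, 1.5, 3.0, 0.1]
weights, context = attend(states, scores)
```

Here the third word receives the largest weight, so a clinician inspecting the model could see that this word dominated the pooled representation fed to the classifier.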

Apart from text, users also post images on social media. The properties of these images (e.g., color theme, saturation, and brightness) provide cues reflecting users’ mental health status. In addition, the millions of interactions and relationships among users can reflect the social environment of individuals, which is itself a risk factor for mental illness. An increasing number of studies have attempted to combine these two types of information with text content for predictive modeling. For example, Lin et al. 99 leveraged autoencoders to extract low-level and middle-level representations from texts, images, and comments based on psychological and art theories. They further extended their work with a hybrid model based on a CNN that integrates post content and social interactions 101 . The results suggest that the social structure of stressed users’ friends tends to be less connected than that of users without stress.

The aforementioned studies have demonstrated that social media data have the potential to reveal users with mental health problems. However, there are multiple challenges in the analysis of social media data. First, given that social media data are typically de-identified, there is no straightforward way to confirm the “true positives” and “true negatives” for a given mental health condition. Enabling the linkage of users’ social media data with their EHR data—with appropriate consent and privacy protection—is challenging to scale, but has been done in a few settings 110 . In addition, most previous studies mainly analyzed textual and image data from social media platforms and did not consider the social network of users. In one study, Rosenquist et al. 111 reported that symptoms of depression are highly correlated within circles of friends, indicating that social network analysis is likely to be a potential way to study the prevalence of mental health problems. However, comprehensively modeling text information and network structure together remains challenging. In this context, graph convolutional networks 112 have been developed to address networked data mining. Moreover, although it is possible to discover online users with mental illness through social media analysis, translating this capability into practical applications that offer aid to users, such as real-time interventions, is still largely needed 113 .
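As a rough illustration of how graph convolutional networks 112 propagate information over a social network, the following NumPy sketch implements one normalized propagation step on a toy friendship graph; the graph, features, and weights are invented for the example:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W),
    mixing each node's features with those of its neighbors."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(0, d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W)

# toy friendship graph: 4 users, symmetric adjacency matrix
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.eye(4)                                 # one-hot stand-in for text features
W = np.random.default_rng(1).normal(size=(4, 2))
emb = gcn_layer(A, H, W)                      # one 2-dim embedding per user
print(emb.shape)                              # (4, 2)
```

Stacking several such layers lets each user's representation absorb information from progressively larger neighborhoods, which is how text features and network structure can be modeled jointly.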

Discussion: findings, open issues, and future directions

Principal findings

The purpose of this study is to investigate the current state of applications of DL techniques in studying mental health outcomes. Out of 2261 articles identified based on our search terms, 57 studies met our inclusion criteria and were reviewed. Studies that involved DL models but did not highlight the role of the DL algorithms in the analysis were excluded. From the above results, we observed a growing number of studies using DL models to study mental health outcomes. In particular, multiple studies have developed disease risk prediction models using both clinical and non-clinical data and have achieved promising initial results.

DL models “think to learn” like a human brain, relying on multiple layers of interconnected computing neurons. Therefore, training a deep neural network requires learning a large number of parameters (i.e., the weights associated with the links between neurons within the network). This is one reason why DL has achieved great success in fields where a massive volume of data can be easily collected, such as computer vision and text mining. Yet, in the health domain, the availability of large-scale data is very limited. For most studies selected in this review, the sample sizes are below 10^4. Data availability is even more limited in the fields of neuroimaging, EEG, and gene expression, as such data reside in a very high-dimensional space. This leads to the “curse of dimensionality” 114 , which challenges the optimization of the model parameters.

One potential way to address this challenge is to reduce the dimensionality of the data by feature engineering before feeding information to the DL models. On one hand, feature extraction approaches can be used to obtain different types of features from the raw data. For example, several studies in this review attempted to use preprocessing tools to extract features from neuroimaging data. On the other hand, feature selection, commonly used in conventional ML models, is also an option for reducing data dimensionality. However, feature selection approaches are seldom used in DL application scenarios, as one of the appealing attributes of DL is its capacity to learn meaningful features from “all” available data. An alternative way to address the issue of limited data is transfer learning, where the objective is to improve learning of a new task through the transfer of knowledge from a related task that has already been learned 115 . The basic idea is that data representations learned in the earlier layers are more general, whereas those learned in the latter layers are more specific to the prediction task 116 . In particular, one can first pre-train a deep neural network on a large-scale “source” dataset, then stack fully connected layers on top of the network and fine-tune it on the small “target” dataset in a standard backpropagation manner. Usually, samples in the “source” dataset are more general (e.g., general image data), whereas those in the “target” dataset are specific to the task (e.g., medical image data). A popular example of the success of transfer learning in the health domain is the dermatologist-level classification of skin cancer 117 . The authors used Google’s Inception v3 CNN architecture pre-trained on 1.28 million general images and fine-tuned it on a clinical image dataset.
The model achieved very high performance in skin cancer classification for epidermal (AUC = 0.96), melanocytic (AUC = 0.96), and melanocytic–dermoscopic images (AUC = 0.94). In facial expression-based depression prediction, Zhu et al. 83 pre-trained a CNN on a public face recognition dataset to model static facial appearance, which overcomes the lack of facial expression label information. Chao et al. 84 also pre-trained a CNN to encode facial expression information. In both studies, the transfer scheme was demonstrated to improve prediction performance.
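The pre-train-then-fine-tune recipe described above can be illustrated at toy scale. In the NumPy sketch below, a frozen random projection stands in for layers pre-trained on a large “source” dataset, and only a newly stacked head is trained on a small “target” task; all data, dimensions, and hyperparameters are synthetic choices for the example:

```python
import numpy as np

rng = np.random.default_rng(2)

# Frozen "pre-trained" layers: a fixed projection standing in for early
# layers learned on a large general-purpose source dataset.
W_frozen = rng.normal(size=(10, 6))
def features(X):
    return np.tanh(X @ W_frozen)             # general-purpose representations

# Small "target" task: train only the new fully connected head.
X = rng.normal(size=(40, 10))
y = (X[:, 0] > 0).astype(float)              # toy binary labels
w_head, b_head = np.zeros(6), 0.0

def loss(w, b):
    p = 1 / (1 + np.exp(-(features(X) @ w + b)))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

before = loss(w_head, b_head)
F = features(X)                              # frozen features, computed once
for _ in range(200):                         # gradient descent on the head only
    p = 1 / (1 + np.exp(-(F @ w_head + b_head)))
    w_head -= 0.5 * F.T @ (p - y) / len(y)
    b_head -= 0.5 * (p - y).mean()
after = loss(w_head, b_head)
print(after < before)                        # fine-tuning the head reduces loss
```

Because the frozen layers are never updated, only the handful of head parameters must be estimated from the small target set, which is exactly why the scheme helps when target data are scarce.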

Diagnosis and prediction issues

Unlike the diagnosis of physical conditions, which can be based on lab tests, diagnoses of mental illness typically rely on mental health professionals’ judgment and patient self-report data. As a result, such a diagnostic system may not accurately capture the psychological deficits and symptom progression needed to provide appropriate therapeutic interventions 118 , 119 . This issue accordingly limits the ability of prediction models to assist clinicians in making decisions. Apart from several studies using the unsupervised autoencoder for learning low-dimensional representations, most studies reviewed here reported using supervised DL models, which need a training set containing “true” (i.e., expert-provided) labels to optimize the model parameters before the model is used to predict labels of new subjects. Inevitably, the quality of the expert-provided diagnostic labels used for training sets the upper bound for the prediction performance of the model.

One intuitive route to address this issue is to use an unsupervised learning scheme that, instead of learning to predict clinical outcomes, aims at learning compact yet informative representations of the raw data. A typical example is the autoencoder (as shown in Fig. 1d ), which encodes the raw data into a low-dimensional space from which the raw data can be reconstructed. Some of the reviewed studies proposed to leverage autoencoders to improve our understanding of mental health outcomes. A constraint of the autoencoder is that the input data must be preprocessed into vectors, which may lead to information loss for image and sequence data. To address this, convolutional autoencoders 120 and LSTM autoencoders 121 have recently been developed; they integrate convolution layers and recurrent layers, respectively, with the autoencoder architecture and enable learning of informative low-dimensional representations from raw image and sequence data. For instance, Baytas et al. 122 developed a variant of the LSTM-autoencoder on patient EHRs and grouped Parkinson’s disease patients into meaningful subtypes. Another potential way is to predict other clinical outcomes instead of the diagnostic labels. For example, several selected studies proposed to predict symptom severity scores 56 , 57 , 77 , 82 , 84 , 87 , 89 . In addition, Du et al. 108 attempted to identify suicide-related psychiatric stressors from users’ posts on Twitter, which plays an important role in the early prevention of suicidal behaviors. Furthermore, training models to predict future outcomes, such as treatment response, emotion assessments, and relapse time, is also a promising future direction.
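As a minimal sketch of the autoencoder idea, the following NumPy example trains a linear encoder–decoder pair by gradient descent to compress 20-dimensional data into a 4-dimensional code; the data, layer sizes, and learning rate are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 20))               # raw high-dimensional data
W_enc = rng.normal(scale=0.1, size=(20, 4))  # encoder: 20-dim -> 4-dim code
W_dec = rng.normal(scale=0.1, size=(4, 20))  # decoder: code -> reconstruction

losses = []
for _ in range(300):                         # minimize ||X - (X W_enc) W_dec||^2
    Z = X @ W_enc                            # compact representation (the "code")
    R = Z @ W_dec - X                        # reconstruction error
    losses.append((R ** 2).mean())
    g_dec = 2 * Z.T @ R / X.size             # gradient w.r.t. decoder weights
    g_enc = 2 * X.T @ (R @ W_dec.T) / X.size # gradient w.r.t. encoder weights
    W_dec -= 0.01 * g_dec
    W_enc -= 0.01 * g_enc
print(losses[0] > losses[-1])                # reconstruction improves over training
```

After training, `Z = X @ W_enc` is the kind of low-dimensional representation that can be fed to clustering or subtyping methods, as in the patient-subtyping example above.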

Multimodal modeling

The field of mental health is heterogeneous. On one hand, mental illness refers to a variety of disorders that affect people’s emotions and behaviors. On the other hand, though the exact causes of most mental illnesses are unknown to date, it is becoming increasingly clear that the risk factors for these diseases are multifactorial, as multiple genetic, environmental, and social factors interact to influence an individual’s mental health 123 , 124 . As a result of this heterogeneity, researchers have the opportunity to study mental health problems from different perspectives: molecular, genomic, clinical, medical imaging, physiological signals, facial and bodily expression, and online behavior. Integrative modeling of such multimodal data means comprehensively considering different aspects of the disease, and is thus likely to yield deeper insight into mental health. In this context, DL models have been developed for multimodal modeling. As shown in Fig. 4 , the hierarchical structure of DL makes it easily compatible with multimodal integration. In particular, one can model each modality with a specific network and combine them via the final fully connected layers, such that parameters can be jointly learned in a typical backpropagation manner. In this review, we found an increasing number of studies attempting multimodal modeling. For example, Zou et al. 28 developed a multimodal model composed of two CNNs for modeling the fMRI and sMRI modalities, respectively. The model achieved 69.15% accuracy in predicting ADHD, outperforming the unimodal models (66.04% for fMRI-based and 65.86% for sMRI-based). Yang et al. 79 proposed a multimodal model combining vocal and visual expression for depression recognition. The model achieved 39% lower prediction error than the unimodal models.

Figure 4

One can model each modality with a specific network and combine them using the final fully-connected layers. In this way, parameters of the entire neural network can be jointly learned in a typical backpropagation manner.
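The fusion scheme of Fig. 4 can be sketched as follows: each modality passes through its own subnetwork, the resulting features are concatenated, and a final layer produces the joint prediction. In this NumPy sketch, the branch sizes and single-layer "subnetworks" are illustrative stand-ins for the modality-specific CNNs:

```python
import numpy as np

rng = np.random.default_rng(4)

def relu(x):
    return np.maximum(0, x)

# modality-specific branches (toy single layers standing in for a CNN each)
W_fmri = rng.normal(size=(50, 8))            # fMRI branch: 50 inputs -> 8 features
W_smri = rng.normal(size=(30, 8))            # sMRI branch: 30 inputs -> 8 features
W_fuse = rng.normal(size=(16, 1))            # fusion head over concatenated features

def predict(x_fmri, x_smri):
    """Forward pass: per-modality features, concatenation, joint output."""
    h = np.concatenate([relu(x_fmri @ W_fmri), relu(x_smri @ W_smri)])
    return 1 / (1 + np.exp(-(h @ W_fuse)))   # joint risk score in (0, 1)

p = predict(rng.normal(size=50), rng.normal(size=30))
print(p.shape)                               # (1,)
```

Because the branches and the fusion head form one computational graph, backpropagating the loss through `W_fuse` into both `W_fmri` and `W_smri` trains all parameters jointly, which is the point made in the caption.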

Model interpretability

Due to their end-to-end design, DL models usually appear to be “black boxes”: they take raw data (e.g., MRI images, free text of clinical notes, and EEG signals) as input and yield an output (e.g., the risk of a mental health disorder) without clear explanations of their inner workings. Although this might not be an issue in other application domains, such as identifying animals from images, in health not only the model’s prediction performance but also the clues behind its decisions are important. For example, in neuroimage-based depression identification, beyond estimating the probability that a patient suffers from mental health deficits, clinicians would focus more on recognizing abnormal regions or patterns of the brain associated with the disease. This is crucial for convincing clinical experts about the actions recommended by the predictive model, as well as for guiding appropriate interventions. In addition, as discussed above, the introduction of multimodal modeling makes the models even harder to interpret. Attempts have been made to open the “black box” of DL 59 , 125 , 126 , 127 . Currently, there are two general directions for interpretable modeling. One involves systematically modifying the input and measuring any resulting changes in the output, as well as in the activation of the artificial neurons in the hidden layers; such a strategy is usually used with CNNs to identify the specific regions of an image captured by a convolutional layer 128 . The other is to derive tools that determine the contribution of one or more features of the input data to the output. Widely used tools of this kind include Shapley Additive Explanations 129 , LIME 127 , and DeepLIFT 130 , which assign each feature an importance score for the specific prediction task.
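The first strategy, systematically modifying the input and measuring the change in the output, can be sketched as a simple occlusion analysis. The toy model, the baseline value, and the function names below are illustrative, not from any of the cited tools:

```python
import numpy as np

def occlusion_importance(model, x, baseline=0.0):
    """Perturbation-style attribution: replace each input feature with a
    baseline value and record how much the model's output changes."""
    ref = model(x)                           # prediction on the intact input
    scores = np.zeros_like(x)
    for i in range(len(x)):
        x_pert = x.copy()
        x_pert[i] = baseline                 # "occlude" feature i
        scores[i] = abs(ref - model(x_pert))
    return scores

# toy model in which only features 0 and 2 matter
model = lambda x: 3 * x[0] - 2 * x[2]
x = np.array([1.0, 1.0, 1.0, 1.0])
scores = occlusion_importance(model, x)
print(scores)                                # [3. 0. 2. 0.]
```

Tools such as SHAP, LIME, and DeepLIFT refine this basic idea with principled weighting or gradient information, but the interpretation is the same: a larger score means the feature mattered more for this prediction.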

Connection to therapeutic interventions

According to the studies reviewed, it is now possible to detect patients with mental illness based on different types of data. Compared with traditional ML techniques, most of the reviewed DL models reported higher prediction accuracy. The findings suggest that DL models are likely to assist clinicians in improved diagnosis of mental health conditions. However, associating the diagnosis of a condition with evidence-based interventions and treatment, including identification of appropriate medication 131 , prediction of treatment response 52 , and estimation of relapse risk 132 , still remains a challenge. Among the reviewed studies, only one 52 aimed to address these issues. Thus, further efforts are needed to link DL techniques with the therapeutic intervention of mental illness.

Domain knowledge

Another important direction is to incorporate domain knowledge. Existing biomedical knowledge bases are invaluable sources for solving healthcare problems 133 , 134 . Incorporating domain knowledge could mitigate the limitations of data volume and data quality, and improve model generalizability. For example, the Unified Medical Language System 135 can help to identify medical entities in text, and gene–gene interaction databases 136 could help to identify meaningful patterns in genomic profiles.

Recent years have witnessed the increasing use of DL algorithms in healthcare and medicine. In this study, we reviewed existing studies on DL applications to study mental health outcomes. All the results available in the literature reviewed in this work illustrate the applicability and promise of DL in improving the diagnosis and treatment of patients with mental health conditions. Also, this review highlights multiple existing challenges in making DL algorithms clinically actionable for routine care, as well as promising future directions in this field.

World Health Organization. The World Health Report 2001: Mental Health: New Understanding, New Hope (World Health Organization, Switzerland, 2001).

Marcus, M., Yasamy, M. T., van Ommeren, M., Chisholm, D. & Saxena, S. Depression: A Global Public Health Concern (World Federation of Mental Health, World Health Organisation, Perth, 2012).

Hamilton, M. Development of a rating scale for primary depressive illness. Br. J. Soc. Clin. Psychol. 6 , 278–296 (1967).

Dwyer, D. B., Falkai, P. & Koutsouleris, N. Machine learning approaches for clinical psychology and psychiatry. Annu. Rev. Clin. Psychol. 14 , 91–118 (2018).

Lovejoy, C. A., Buch, V. & Maruthappu, M. Technology and mental health: the role of artificial intelligence. Eur. Psychiatry 55 , 1–3 (2019).

Wongkoblap, A., Vadillo, M. A. & Curcin, V. Researching mental health disorders in the era of social media: systematic review. J. Med. Internet Res. 19 , e228 (2017).

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521 , 436 (2015).

Miotto, R., Wang, F., Wang, S., Jiang, X. & Dudley, J. T. Deep learning for healthcare: review, opportunities and challenges. Brief. Bioinformatics 19 , 1236–1246 (2017).

Durstewitz, D., Koppe, G. & Meyer-Lindenberg, A. Deep neural networks in psychiatry. Mol. Psychiatry 24 , 1583–1598 (2019).

Vieira, S., Pinaya, W. H. & Mechelli, A. Using deep learning to investigate the neuroimaging correlates of psychiatric and neurological disorders: methods and applications. Neurosci. Biobehav. Rev. 74 , 58–75 (2017).

Shatte, A. B., Hutchinson, D. M. & Teague, S. J. Machine learning in mental health: a scoping review of methods and applications. Psychol. Med. 49 , 1426–1448 (2019).

Murphy, K. P. Machine Learning: A Probabilistic Perspective (MIT Press, Cambridge, 2012).

Bishop, C. M. Pattern Recognition and Machine Learning (Information Science and Statistics) (Springer-Verlag, Berlin, 2007).

Bengio, Y., Simard, P. & Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. Learn. Syst. 5 , 157–166 (1994).

LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86 , 2278–2324 (1998).

Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y. & Manzagol, P. A. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11 , 3371–3408 (2010).

Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Cogn. modeling. 5 , 1 (1988).

Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9 , 1735–1780 (1997).

Cho, K., Van Merriënboer, B., Bahdanau, D. & Bengio, Y. On the properties of neural machine translation: encoder-decoder approaches. In Proc . SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation 103–111 (Doha, Qatar, 2014).

Liou, C., Cheng, W., Liou, J. & Liou, D. Autoencoder for words. Neurocomputing 139 , 84–96 (2014).

Moher, D., Liberati, A., Tetzlaff, J. & Altman, D. G. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann. Intern. Med. 151 , 264–269 (2009).

Schnack, H. G. et al. Can structural MRI aid in clinical classification? A machine learning study in two independent samples of patients with schizophrenia, bipolar disorder and healthy subjects. Neuroimage 84 , 299–306 (2014).

O’Toole, A. J. et al. Theoretical, statistical, and practical perspectives on pattern-based classification approaches to the analysis of functional neuroimaging data. J. Cogn. Neurosci. 19 , 1735–1752 (2007).

Logothetis, N. K., Pauls, J., Augath, M., Trinath, T. & Oeltermann, A. Neurophysiological investigation of the basis of the fMRI signal. Nature 412 , 150 (2001).

Kuang, D. & He, L. Classification on ADHD with deep learning. In Proc . Int. Conference on Cloud Computing and Big Data 27–32 (Wuhan, China, 2014).

Kuang, D., Guo, X., An, X., Zhao, Y. & He, L. Discrimination of ADHD based on fMRI data with deep belief network. In Proc . Int. Conference on Intelligent Computing 225–232 (Taiyuan, China, 2014).

Farzi, S., Kianian, S. & Rastkhadive, I. Diagnosis of attention deficit hyperactivity disorder using deep belief network based on greedy approach. In Proc . 5th Int. Symposium on Computational and Business Intelligence 96–99 (Dubai, United Arab Emirates, 2017).

Zou, L., Zheng, J. & McKeown, M. J. Deep learning based automatic diagnoses of attention deficit hyperactive disorder. In Proc . 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP) 962–966 (Montreal, Canada, 2017).

Riaz A. et al. Deep fMRI: an end-to-end deep network for classification of fMRI data. In Proc . 2018 IEEE 15th Int. Symposium on Biomedical Imaging . 1419–1422 (Washington, DC, USA, 2018).

Zou, L., Zheng, J., Miao, C., Mckeown, M. J. & Wang, Z. J. 3D CNN based automatic diagnosis of attention deficit hyperactivity disorder using functional and structural MRI. IEEE Access. 5 , 23626–23636 (2017).

Sen, B., Borle, N. C., Greiner, R. & Brown, M. R. A general prediction model for the detection of ADHD and Autism using structural and functional MRI. PLoS ONE 13 , e0194856 (2018).

Zeng, L. et al. Multi-site diagnostic classification of schizophrenia using discriminant deep learning with functional connectivity MRI. EBioMedicine 30 , 74–85 (2018).

Pinaya, W. H. et al. Using deep belief network modelling to characterize differences in brain morphometry in schizophrenia. Sci. Rep. 6 , 38897 (2016).

Pinaya, W. H., Mechelli, A. & Sato, J. R. Using deep autoencoders to identify abnormal brain structural patterns in neuropsychiatric disorders: a large-scale multi-sample study. Hum. Brain Mapp. 40 , 944–954 (2019).

Ulloa, A., Plis, S., Erhardt, E. & Calhoun, V. Synthetic structural magnetic resonance image generator improves deep learning prediction of schizophrenia. In Proc . 25th IEEE Int. Workshop on Machine Learning for Signal Processing (MLSP) 1–6 (Boston, MA, USA, 2015).

Matsubara, T., Tashiro, T. & Uehara, K. Deep neural generative model of functional MRI images for psychiatric disorder diagnosis. IEEE Trans. Biomed. Eng . 99 (2019).

Geng, X. & Xu, J. Application of autoencoder in depression diagnosis. In 2017 3rd Int. Conference on Computer Science and Mechanical Automation (Wuhan, China, 2017).

Aghdam, M. A., Sharifi, A. & Pedram, M. M. Combination of rs-fMRI and sMRI data to discriminate autism spectrum disorders in young children using deep belief network. J. Digit. Imaging 31 , 895–903 (2018).

Shen, D., Wu, G. & Suk, H. -I. Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 19 , 221–248 (2017).

Yan, C. & Zang, Y. DPARSF: a MATLAB toolbox for “pipeline” data analysis of resting-state fMRI. Front. Syst. Neurosci. 4 , 13 (2010).

Esteban, O. et al. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nat. Methods 16 , 111–116 (2019).

Herrmann, C. & Demiralp, T. Human EEG gamma oscillations in neuropsychiatric disorders. Clin. Neurophysiol. 116 , 2719–2733 (2005).

Acharya, U. R. et al. Automated EEG-based screening of depression using deep convolutional neural network. Comput. Meth. Prog. Biol. 161 , 103–113 (2018).

Mohan, Y., Chee, S. S., Xin, D. K. P. & Foong, L. P. Artificial neural network for classification of depressive and normal. In EEG Proc . 2016 IEEE EMBS Conference on Biomedical Engineering and Sciences 286–290 (Kuala Lumpur, Malaysia, 2016).

Zhang, P., Wang, X., Zhang, W. & Chen, J. Learning spatial–spectral–temporal EEG features with recurrent 3D convolutional neural networks for cross-task mental workload assessment. IEEE Trans. Neural Syst. Rehabil. Eng. 27 , 31–42 (2018).

Li, X. et al. EEG-based mild depression recognition using convolutional neural network. Med. Biol. Eng. Comput . 47 , 1341–1352 (2019).

Patel, S., Park, H., Bonato, P., Chan, L. & Rodgers, M. A review of wearable sensors and systems with application in rehabilitation. J. Neuroeng. Rehabil. 9 , 21 (2012).

Smoller, J. W. The use of electronic health records for psychiatric phenotyping and genomics. Am. J. Med. Genet. B Neuropsychiatr. Genet. 177 , 601–612 (2018).

Wu, J., Roy, J. & Stewart, W. F. Prediction modeling using EHR data: challenges, strategies, and a comparison of machine learning approaches. Med. Care. 48 , S106–S113 (2010).

Choi, S. B., Lee, W., Yoon, J. H., Won, J. U. & Kim, D. W. Ten-year prediction of suicide death using Cox regression and machine learning in a nationwide retrospective cohort study in South Korea. J. Affect. Disord. 231 , 8–14 (2018).

Pham, T., Tran, T., Phung, D. & Venkatesh, S. Predicting healthcare trajectories from medical records: a deep learning approach. J. Biomed. Inform. 69 , 218–229 (2017).

Lin, E. et al. A deep learning approach for predicting antidepressant response in major depression using clinical and genetic biomarkers. Front. Psychiatry 9 , 290 (2018).

Geraci, J. et al. Applying deep neural networks to unstructured text notes in electronic medical records for phenotyping youth depression. Evid. Based Ment. Health 20 , 83–87 (2017).

Kim, Y. Convolutional neural networks for sentence classification. arXiv Prepr. arXiv 1408 , 5882 (2014).

Yang, Z. et al. Hierarchical attention networks for document classification. In Proc . 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 1480–1489 (San Diego, California, USA, 2016).

Rios, A. & Kavuluru, R. Ordinal convolutional neural networks for predicting RDoC positive valence psychiatric symptom severity scores. J. Biomed. Inform. 75 , S85–S93 (2017).

Dai, H. & Jonnagaddala, J. Assessing the severity of positive valence symptoms in initial psychiatric evaluation records: Should we use convolutional neural networks? PLoS ONE 13 , e0204493 (2018).

Tran, T. & Kavuluru, R. Predicting mental conditions based on “history of present illness” in psychiatric notes with deep neural networks. J. Biomed. Inform. 75 , S138–S148 (2017).

Samek, W., Binder, A., Montavon, G., Lapuschkin, S. & Müller, K.-R. Evaluating the visualization of what a deep neural network has learned. IEEE Trans. Neural Netw. Learn. Syst. 28 , 2660–2673 (2016).

Hripcsak, G. et al. Characterizing treatment pathways at scale using the OHDSI network. Proc. Natl. Acad. Sci . USA 113 , 7329–7336 (2016).

McGuffin, P., Owen, M. J. & Gottesman, I. I. Psychiatric Genetics and Genomics (Oxford Univ. Press, New York, 2004).

Levinson, D. F. The genetics of depression: a review. Biol. Psychiatry 60 , 84–92 (2006).

Wray, N. R. et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet. 50 , 668 (2018).

Mullins, N. & Lewis, C. M. Genetics of depression: progress at last. Curr. Psychiatry Rep. 19 , 43 (2017).

Zou, J. et al. A primer on deep learning in genomics. Nat. Genet. 51 , 12–18 (2019).

Yue, T. & Wang, H. Deep learning for genomics: a concise overview. Preprint at arXiv:1802.00810 (2018).

Khan, A. & Wang, K. A deep learning based scoring system for prioritizing susceptibility variants for mental disorders. In Proc . 2017 IEEE Int. Conference on Bioinformatics and Biomedicine (BIBM) 1698–1705 (Kansas City, USA, 2017).

Khan, A., Liu, Q. & Wang, K. iMEGES: integrated mental-disorder genome score by deep neural network for prioritizing the susceptibility genes for mental disorders in personal genomes. BMC Bioinformatics 19 , 501 (2018).

Wang, D. et al. Comprehensive functional genomic resource and integrative model for the human brain. Science 362 , eaat8464 (2018).

Salakhutdinov, R. & Hinton, G. Deep Boltzmann machines. In Proc . 12th Int. Conference on Artificial Intelligence and Statistics 448–455 (Clearwater, Florida, USA, 2009).

Laksshman, S., Bhat, R. R., Viswanath, V. & Li, X. DeepBipolar: Identifying genomic mutations for bipolar disorder via deep learning. Hum. Mutat. 38 , 1217–1224 (2017).

Huang, K.-Y. et al. Data collection of elicited facial expressions and speech responses for mood disorder detection. In Proc . 2015 Int. Conference on Orange Technologies (ICOT) 42–45 (Hong Kong, China, 2015).

Valstar, M. et al. AVEC 2013: the continuous audio/visual emotion and depression recognition challenge. In Proc . 3rd ACM Int. Workshop on Audio/Visual Emotion Challenge 3–10 (Barcelona, Spain, 2013).

Valstar, M. et al. Avec 2014: 3d dimensional affect and depression recognition challenge. In Proc . 4th Int. Workshop on Audio/Visual Emotion Challenge 3–10 (Orlando, Florida, USA, 2014).

Valstar, M. et al. Avec 2016: depression, mood, and emotion recognition workshop and challenge. In Proc . 6th Int. Workshop on Audio/Visual Emotion Challenge 3–10 (Amsterdam, The Netherlands, 2016).

Ma, X., Yang, H., Chen, Q., Huang, D. & Wang, Y. Depaudionet: an efficient deep model for audio based depression classification. In Proc . 6th Int. Workshop on Audio/Visual Emotion Challenge 35–42 (Amsterdam, The Netherlands, 2016).

He, L. & Cao, C. Automated depression analysis using convolutional neural networks from speech. J. Biomed. Inform. 83 , 103–111 (2018).

Li, J., Fu, X., Shao, Z. & Shang, Y. Improvement on speech depression recognition based on deep networks. In Proc . 2018 Chinese Automation Congress (CAC) 2705–2709 (Xi’an, China, 2018).

Yang, L., Jiang, D., Han, W. & Sahli, H. DCNN and DNN based multi-modal depression recognition. In Proc . 2017 7th Int. Conference on Affective Computing and Intelligent Interaction 484–489 (San Antonio, Texas, USA, 2017).

Huang, K. Y., Wu, C. H. & Su, M. H. Attention-based convolutional neural network and long short-term memory for short-term detection of mood disorders based on elicited speech responses. Pattern Recogn. 88 , 668–678 (2019).

Dawood, A., Turner, S. & Perepa, P. Affective computational model to extract natural affective states of students with Asperger syndrome (AS) in computer-based learning environment. IEEE Access. 6 , 67026–67034 (2018).

Song, S., Shen, L. & Valstar, M. Human behaviour-based automatic depression analysis using hand-crafted statistics and deep learned spectral features. In Proc . 13th IEEE Int. Conference on Automatic Face & Gesture Recognition 158–165 (Xi’an, China, 2018).

Zhu, Y., Shang, Y., Shao, Z. & Guo, G. Automated depression diagnosis based on deep networks to encode facial appearance and dynamics. IEEE Trans. Affect. Comput. 9 , 578–584 (2018).

Chao, L., Tao, J., Yang, M. & Li, Y. Multi task sequence learning for depression scale prediction from video. In Proc . 2015 Int. Conference on Affective Computing and Intelligent Interaction (ACII) 526–531 (Xi’an, China, 2015).

Yang, T. H., Wu, C. H., Huang, K. Y. & Su, M. H. Detection of mood disorder using speech emotion profiles and LSTM. In Proc . 10th Int. Symposium on Chinese Spoken Language Processing (ISCSLP) 1–5 (Tianjin, China, 2016).

Huang, K. Y., Wu, C. H., Su, M. H. & Chou, C. H. Mood disorder identification using deep bottleneck features of elicited speech. In Proc . 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 1648–1652 (Kuala Lumpur, Malaysia, 2017).

Jan, A., Meng, H., Gaus, Y. F. B. A. & Zhang, F. Artificial intelligent system for automatic depression level analysis through visual and vocal expressions. IEEE Trans. Cogn. Dev. Syst. 10 , 668–680 (2017).


Acknowledgements

The work is supported by NSF 1750326, R01 MH112148, R01 MH105384, R01 MH119177, R01 MH121922, and P50 MH113838.

Author information

Authors and affiliations

Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY, USA

Chang Su, Zhenxing Xu, Jyotishman Pathak & Fei Wang


Contributions

C.S., Z.X. and F.W. planned and structured the whole paper. C.S. and Z.X. conducted the literature review and drafted the manuscript. J.P. and F.W. reviewed and edited the manuscript.

Corresponding author

Correspondence to Fei Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Su, C., Xu, Z., Pathak, J. et al. Deep learning in mental health outcome research: a scoping review. Transl. Psychiatry 10, 116 (2020). https://doi.org/10.1038/s41398-020-0780-3


Received: 31 August 2019

Revised: 17 February 2020

Accepted: 26 February 2020

Published: 22 April 2020




  • Open access
  • Published: 27 September 2021

Deep learning in cancer diagnosis, prognosis and treatment selection

  • Khoa A. Tran,
  • Olga Kondrashova,
  • Andrew Bradley,
  • Elizabeth D. Williams,
  • John V. Pearson &
  • Nicola Waddell (ORCID: orcid.org/0000-0002-3950-2476)

Genome Medicine volume 13, Article number: 152 (2021)


Deep learning is a subdiscipline of artificial intelligence that uses a machine learning technique called artificial neural networks to extract patterns and make predictions from large data sets. The increasing adoption of deep learning across healthcare domains together with the availability of highly characterised cancer datasets has accelerated research into the utility of deep learning in the analysis of the complex biology of cancer. While early results are promising, this is a rapidly evolving field with new knowledge emerging in both cancer biology and deep learning. In this review, we provide an overview of emerging deep learning techniques and how they are being applied to oncology. We focus on the deep learning applications for omics data types, including genomic, methylation and transcriptomic data, as well as histopathology-based genomic inference, and provide perspectives on how the different data types can be integrated to develop decision support tools. We provide specific examples of how deep learning may be applied in cancer diagnosis, prognosis and treatment management. We also assess the current limitations and challenges for the application of deep learning in precision oncology, including the lack of phenotypically rich data and the need for more explainable deep learning models. Finally, we conclude with a discussion of how current obstacles can be overcome to enable future clinical utilisation of deep learning.

Artificial intelligence (AI) encompasses multiple technologies with the common aim to computationally simulate human intelligence. Machine learning (ML) is a subgroup of AI that focuses on making predictions by identifying patterns in data using mathematical algorithms. Deep learning (DL) is a subgroup of ML that focuses on making predictions using multi-layered neural network algorithms inspired by the neurological architecture of the brain. Compared to other ML methods such as logistic regression, the neural network architecture of DL enables the models to scale exponentially with the growing quantity and dimensionality of data [ 1 ]. This makes DL particularly useful for solving complex computational problems such as large-scale image classification, natural language processing and speech recognition and translation [ 1 ].

Cancer care is undergoing a shift towards precision healthcare enabled by the increasing availability and integration of multiple data types, including genomic, transcriptomic and histopathologic data (Fig. 1 ). The use and interpretation of diverse and high-dimensional data types for translational research or clinical tasks require significant time and expertise. Moreover, the integration of multiple data types is more resource-intensive than the interpretation of individual data types and needs modelling algorithms that can learn from tremendous numbers of intricate features. The use of ML algorithms to automate these tasks and aid cancer detection (identifying the presence of cancer) and diagnosis (characterising the cancer) has become increasingly prevalent [ 2 , 3 ]. Excitingly, DL models have the potential to harness this complexity to provide meaningful insights and identify relevant granular features from multiple data types [ 4 , 5 ]. In this review, we describe the latest applications of deep learning in cancer diagnosis, prognosis and treatment selection. We focus on DL applications for omics and histopathological data, as well as the integration of multiple data types. We provide a brief introduction to emerging DL methods relevant to applications covered in this review. Next, we discuss specific applications of DL in oncology, including cancer origin detection, molecular subtype identification, prognosis and survivability prediction, histological inference of genomic traits, tumour microenvironment profiling and future applications in spatial transcriptomics, metagenomics and pharmacogenomics. We conclude with an examination of current challenges and potential strategies that would enable DL to be routinely applied in clinical settings.

Figure 1

Deep learning may impact clinical oncology during diagnosis, prognosis and treatment. Specific areas of clinical oncology where deep learning is showing promise include cancer of unknown primary, molecular subtyping of cancers, prognosis and survivability and precision oncology. Examples of deep learning applications within each of these areas are listed. The data modalities utilised by deep learning models are numerous and include the genomic, transcriptomic and histopathology data categories covered in this review.

Emerging deep learning methods

Covering all DL methods in detail is outside the scope of this review; rather, we provide a high-level summary of emerging DL methods in oncology. DL utilises artificial neural networks to extract non-linear, entangled and representative features from massive and high-dimensional data [ 1 ]. A deep neural network is typically constructed of millions of densely interconnected computing neurons organised into consecutive layers. Within each layer, a neuron is connected to other neurons in the layer before it, from which it receives data, and other neurons in the layer after it, to which it sends data. When presented with data, a neural network feeds each training sample, with known ground truth, to its input layer before passing the information down to all succeeding layers (usually called hidden layers). This information is then multiplied, divided, added and subtracted millions of times before it reaches the output layer, which becomes the prediction. For supervised deep learning, each pair of training sample and label is fed through a neural network while its weights and thresholds are being adjusted to get the prediction closer to the provided label. When faced with unseen (test) data, these trained weights and thresholds are frozen and used to make predictions.
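The forward-then-adjust loop described above can be sketched in a few lines of NumPy. Everything here is illustrative rather than taken from the review: a tiny two-layer network with a tanh hidden layer, a sigmoid output, one training pair and a hand-picked learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network: 4 inputs -> 8 hidden units -> 1 output.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)                      # hidden layer
    return h, 1 / (1 + np.exp(-(h @ W2 + b2)))    # sigmoid output in (0, 1)

# One supervised step: compare the prediction with the known label,
# then adjust the weights to bring the prediction closer to that label.
x, y = rng.normal(size=(1, 4)), np.array([[1.0]])
h, pred = forward(x)
grad_out = pred - y                               # cross-entropy gradient at the output
grad_h = (grad_out @ W2.T) * (1 - h**2)           # backpropagate through tanh
W2 -= 0.01 * h.T @ grad_out                       # nudge output weights
W1 -= 0.01 * x.T @ grad_h                         # nudge hidden weights (biases kept fixed for brevity)

_, new_pred = forward(x)                          # prediction has moved toward the label
```

At test time the weights would simply be frozen and `forward` reused on unseen samples, exactly as the paragraph describes.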

Fundamental neural network methods

There are multiple neural network-based methods, all with different advantages and applications. Multilayer perceptron (MLP), recurrent neural network (RNN) and convolutional neural network (CNN) are the most fundamental and are frequently used as building blocks for more advanced techniques. MLPs are the simplest type of neural networks, where neurons are organised in consecutive layers so that signals travel through the network in one direction (from input to output) [ 1 ]. Although MLPs can perform well for generic predictions, they are also prone to overfitting [ 6 ]. RNNs process an input sequence one element at a time, while maintaining history of all past elements in hidden ‘state vector(s)’. Output predictions are made at every element using information from the current element and also previous elements [ 1 , 7 ]. RNNs are typically used for analysing sequential data such as text, speech or DNA sequences. By contrast, CNNs are designed to draw spatial relationships from image data. CNNs traverse an image and apply small feature-filter matrices, i.e. convolution filters, to extract granular features [ 1 ]. Features extracted by the last convolution layer are then used for making predictions. CNNs have also been adapted for analysis of non-image data, e.g. genomic data represented in a vector, matrix or tensor format [ 8 ]. A review by Dias and Torkamani [ 7 ] described in detail how MLPs, RNNs and CNNs operate on biomedical and genomics data. Moreover, the use of MLPs, RNNs and CNNs to assist clinicians and researchers has been proposed across multiple oncology areas, including radiotherapy [ 9 ], digital histopathology [ 10 , 11 ] and clinical and genomic diagnostics [ 7 ]. While routine clinical use is still limited, some of the models have already been FDA-approved and adopted into a clinical setting, for example CNNs for the prediction of malignancy in pulmonary nodules detected by CT [ 12 ], and prostate and breast cancer diagnosis prediction using digital histopathology [ 13 , 14 ].
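The convolution-filter idea can be made concrete with a toy example: a hypothetical 3×3 vertical-edge filter slid over a small synthetic image. The image, filter values and lack of padding or stride options are simplifying assumptions, not features of any model discussed here.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a small filter over the image and record each local response."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge filter responds strongly where intensity changes left to right.
image = np.zeros((5, 5))
image[:, 3:] = 1.0                      # bright right half, dark left half
edge_filter = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=float)
feature_map = conv2d(image, edge_filter)  # peaks where the edge sits in the window
```

In a real CNN many such filters are learned rather than hand-written, and the resulting feature maps feed further convolution layers before the final prediction.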

Advanced neural-network methods

Graph convolutional neural networks (GCNNs) generalise CNNs beyond regular structures (Euclidean domains) to non-Euclidean domains such as graphs which have arbitrary structure. GCNNs are specifically designed to analyse graph data, e.g. using prior biological knowledge of an interconnected network of proteins with nodes representing proteins and pairwise connections representing protein–protein interactions (PPI) [ 15 ], using resources such as the STRING PPI database [ 16 ] (Fig. 2 a). This enables GCNNs to incorporate known biological associations between genetic features and perceive their cooperative patterns, which have been shown to be useful in cancer diagnostics [ 17 ].
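A single graph-convolution layer in the spirit of Kipf and Welling's propagation rule can be sketched as follows. The 4-node "protein" graph, feature sizes and random weights are invented for illustration; a trained GCNN would learn `W` and stack several such layers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 4-protein interaction graph (undirected adjacency matrix).
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))            # one expression-like feature vector per node

# Add self-loops and symmetrically normalise, as in the Kipf-Welling formulation.
A_hat = A + np.eye(4)
d = A_hat.sum(axis=1)
A_norm = A_hat / np.sqrt(np.outer(d, d))

W = rng.normal(size=(3, 2))            # learnable graph-convolution filter weights
H = np.maximum(0, A_norm @ X @ W)      # each node mixes its neighbourhood, then ReLU
```

The key point is `A_norm @ X`: each gene's new representation is a weighted average over its interaction partners, which is how prior biological structure enters the model.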

Figure 2

An overview of Deep Learning techniques and concepts in oncology. a Graph convolutional neural networks (GCNN) are designed to operate on graph-structured data. In this particular example inspired by [ 17 , 18 , 19 ], gene expression values (upper left panel) are represented as graph signals structured by a protein–protein interactions graph (lower left panel) that serve as inputs to GCNN. For a single sample (highlighted with red outline), each node represents one gene with its expression value assigned to the corresponding protein node, and inter-node connections represent known protein–protein interactions. GCNN methods covered in this review require a graph to be undirected. Graph convolution filters are applied on each gene to extract meaningful gene expression patterns from the gene’s neighbourhood (nodes connected by orange edges). Pooling, i.e. combining clusters of nodes, can be applied following graph convolution to obtain a coarser representation of the graph. Output of the final graph convolution/pooling layer would then be passed through fully connected layers producing GCNN’s decision. b Semantic segmentation is applied to image data where it assigns a class label to each pixel within an image. A semantic segmentation model usually consists of an encoder, a decoder and a softmax function. The encoder consists of feature extraction layers to ‘learn’ meaningful and granular features from the input, while the decoder learns features to generate a coloured map of major object classes in the input (through the use of the softmax function). The example shows an H&E tumour section with an infiltrating lymphocyte map generated by the Saltz et al. [ 20 ] DL model. c Multimodal learning allows multiple datasets representing the same underlying phenotype to be combined to increase predictive power. Multimodal learning usually starts with encoding each input modality into a representation vector of lower dimension, followed by a feature combination step to aggregate these vectors together. d Explainability methods take a trained neural network and mathematically quantify how each input feature influences the model’s prediction. The outputs are usually feature contribution scores, capable of explaining the most salient features that dictate the model’s predictions. In this example, each input gene is assigned a contribution score by the explainability model (colour scale indicates the influence on the model prediction). An example of a gene interaction network is shown coloured by contribution scores (links between red dots represent biological connections between genes).

Semantic segmentation is an important CNN-based visual learning method specifically for image data (Fig. 2 b). The purpose of semantic segmentation is to produce a class label for every single pixel in an image and cluster parts of an image together into each class, where the class represents an object or component of the image. Semantic segmentation models are generally supervised, i.e. they are given class labels for each pixel and are trained to detect the major ‘semantics’ for each class.
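The final per-pixel labelling step can be sketched as follows, assuming a decoder has already produced class logits for every pixel; the 4×4 image size and three classes are arbitrary stand-ins for, say, tumour/stroma/background.

```python
import numpy as np

rng = np.random.default_rng(2)

# Pretend a decoder produced per-pixel logits for 3 classes over a 4x4 image.
logits = rng.normal(size=(4, 4, 3))

# Softmax turns each pixel's logits into class probabilities...
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)

# ...and the label map assigns every pixel its most probable class,
# clustering the image into regions of the same semantic class.
label_map = probs.argmax(axis=-1)
```

Training compares `probs` against the ground-truth class of each pixel, which is what makes these models supervised.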

To enhance the predictive power of DL models, different data types (modalities) can be combined using multimodal learning (Fig. 2 c). In clinical oncology, data modalities can include image, numeric and descriptive data. Cancer is a complex and multi-faceted disease with layers of microscopic, macroscopic and molecular features that can separately or together influence treatment responses and patient prognosis. Therefore, combining clinical data (e.g. diagnostic test results and pathology reports), medical images (e.g. histopathology and computed tomography) and different types of omics data, such as genomic, transcriptomic and proteomic profiles, may be useful. The two most important requirements for a multimodal network are the ability to create representations that contain dense meaningful features of the original input, and a mathematical method to combine representations from all modalities. There are several methods capable of performing the representative learning task, e.g. CNNs, RNNs, deep belief networks and autoencoders (AE) [ 21 ]; score-level fusion [ 22 ]; or multimodal data fusion [ 23 ]. The multimodal learning applications discussed in this review are based on AE models. In simple terms, the AE architecture comprises an encoder and a decoder working in tandem. The encoder is responsible for creating a representation vector of lower dimension than the input, while the decoder is responsible for reconstructing the original input using this low-dimensional vector [ 24 ]. This forces the encoder to ‘learn’ to encapsulate meaningful features from the input and has been shown to have good generalisability [ 24 ]. Moreover, it provides DL models with the unique ability to readily integrate different data modalities, e.g. medical images, genomic data and clinical information, into a single ‘end-to-end optimised’ model [ 8 ].
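The encode-then-combine pattern can be sketched with two toy encoders whose outputs are concatenated. The modality dimensions, random weights and single-layer tanh encoders are illustrative stand-ins for trained AE encoders, not any specific published model.

```python
import numpy as np

rng = np.random.default_rng(3)

def encoder(x, W):
    """Compress an input vector into a lower-dimensional representation."""
    return np.tanh(x @ W)

# Two modalities for the same patient, e.g. expression-like and image-like features.
omics = rng.normal(size=(1, 50))
image_feats = rng.normal(size=(1, 100))

# Each modality gets its own encoder weights (here random; normally trained).
W_omics = rng.normal(size=(50, 8)) * 0.1
W_image = rng.normal(size=(100, 8)) * 0.1

# Encode each modality separately, then fuse the representations by concatenation.
fused = np.concatenate([encoder(omics, W_omics),
                        encoder(image_feats, W_image)], axis=1)

# A decoder trained to reconstruct each input from its 8-dim code would force
# the encoders to keep only meaningful features (omitted here for brevity).
```

The fused vector would then feed a downstream prediction head, giving the single end-to-end model the paragraph describes.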

A major challenge with implementing DL into clinical practice is the ‘black box’ nature of the models [ 25 ]. High-stake medical decisions, such as diagnosis, prognosis and treatment selection, require trustworthy and explainable decision processes. Most DL models have limited interpretability, i.e. it is very difficult to dissect a neural network and understand how millions of parameters work simultaneously. Some even argue that more interpretable models such as Decision Trees should be ultimately preferred for making medical decisions [ 26 ]. An alternative approach is explainability—mathematical quantification of how influential, or ‘salient’, the features are towards a certain prediction (Fig. 2 d). This information can be used to ‘explain’ the decision-making process of a neural network model and identify features that contribute to a prediction. This knowledge can enable resolution of potential disagreements between DL models and clinicians and thus increase trust in DL systems [ 27 ]. Moreover, DL models do not always have perfect performance due to either imperfect training data (e.g. assay noise or errors in recording) or systematic errors caused by bias within DL models themselves, which can result from the training data not being representative of the population where DL is later applied [ 27 ]. In these circumstances, explainability can assist clinicians in evaluating predictions [ 27 ]. While some explainability methods were developed specifically for neural networks [ 28 , 29 ], others offer a more model- and data-agnostic solution [ 30 , 31 , 32 , 33 ]. Excitingly, explainability methods can be used in conjunction with multi-modal learning for data integration and discovery of cross-modality insights, e.g. how cancer traits across different omics types correlate and influence each other.
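For a model simple enough to differentiate by hand, one common saliency formulation, gradient × input, looks like the sketch below. The logistic "model", its weights and the input are invented purely to show how contribution scores arise; real explainability tools compute the gradients automatically.

```python
import numpy as np

# Stand-in for a trained model: a single logistic unit whose gradient
# we can write down analytically. Weights and input are illustrative only.
w = np.array([2.0, -1.0, 0.0, 0.5])
x = np.array([1.0, 1.0, 5.0, -2.0])

p = 1 / (1 + np.exp(-(w @ x)))     # model prediction
grad = p * (1 - p) * w             # d(prediction)/d(input feature)
saliency = grad * x                # gradient x input contribution scores

# Feature 2 has zero weight, so it cannot influence the prediction,
# and its contribution score is exactly zero.
```

Scores like `saliency` are what gets aggregated and visualised (e.g. coloured over genes or pixels) to explain which inputs drove a given decision.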

Another challenge in applying DL in oncology is the requirement for large amounts of robust, well-phenotyped training data to achieve good model generalisability. Large curated ‘ground-truth’ datasets of matched genomic, histopathological and clinical outcome data are scarce beyond the publicly available datasets, such as The Cancer Genome Atlas (TCGA) [ 34 ], International Cancer Genome Consortium (ICGC) [ 35 ], Gene Expression Omnibus (GEO) [ 36 ], European Genome-Phenome Archive (EGA) [ 37 ] and Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) [ 38 ]. Pre-training on abundant datasets from other domains may help overcome the challenges of limited data (a process known as transfer learning). The pre-trained neural network would then be reconfigured and trained again on data from the domain of interest. This approach usually results in a considerable reduction in the computational and time resources needed for model training, and a significant increase in predictive performance, compared to training on small domain-specific datasets [ 39 ].
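The reconfigure-and-retrain recipe can be sketched as: reuse a pretrained feature extractor unchanged and train only a small task-specific head on the scarce domain data. All weights and data below are synthetic; in practice the pretrained part would come from a large model trained on another domain, and it might also be fine-tuned rather than fully frozen.

```python
import numpy as np

rng = np.random.default_rng(4)

# Pretend these weights were already learned on a large dataset from another domain.
W_pretrained = rng.normal(size=(20, 10)) * 0.1

def features(x):
    return np.maximum(0, x @ W_pretrained)        # frozen feature extractor

# Small domain-specific dataset: only the new task head is trained on it.
X = rng.normal(size=(8, 20))
y = rng.integers(0, 2, size=(8, 1)).astype(float)

W_head = np.zeros((10, 1))
for _ in range(200):
    h = features(X)
    pred = 1 / (1 + np.exp(-(h @ W_head)))        # logistic head on frozen features
    W_head -= 0.1 * h.T @ (pred - y) / len(X)     # W_pretrained is never updated
```

Because only the small head is optimised, far less data and compute are needed than training the whole network from scratch, which is the advantage the paragraph describes.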

Deep learning in oncology

A variety of DL approaches that utilise a combination of genomic, transcriptomic or histopathology data have been applied in clinical and translational oncology with the aim of enhancing patient diagnosis, prognosis and treatment selection (Fig. 1 , Table 1 ). However, even with the emerging DL approaches, human intervention remains essential in oncology. Therefore, the goal of DL is not to outperform or replace humans, but to provide decision support tools that assist cancer researchers to study the disease and health professionals in the clinical management of people with cancer [ 79 ].

Deep learning for microscopy-based assessment of cancer

Cancers are traditionally diagnosed by histopathology or cytopathology to confirm the presence of tumour cells within a patient sample, assess markers relevant to cancer and to characterise features such as tumour type, stage and grade. This microscopy-based assessment is crucial; however, the process is relatively labour-intensive and somewhat subjective [ 80 , 81 ]. A histology image viewed at high magnification (typically 20x or 40x) can reveal millions of subtle cellular features, and deep CNN models are exceptionally good at extracting features from high-resolution image data [ 82 ]. Automating cancer grading with histology-based deep CNNs has proven successful, with studies showing that performance of deep CNNs can be comparable with pathologists in grading prostate [ 40 , 41 , 42 ], breast [ 43 ], colon cancer [ 44 ] and lymphoma [ 45 ]. Explainability methods can enable and improve histology-based classification models by allowing pathologists to validate DL-generated predictions. For example, Hägele et al. applied the Layer-wise Relevance Propagation (LRP) [ 29 ] method to DL models classifying healthy versus cancerous tissues using whole-slide images of lung cancer [ 46 ]. The LRP algorithm assigned a relevance score to each pixel, and pixel-wise relevance scores were aggregated into cell-level scores and compared against pathologists’ annotations. These scores were then used to evaluate DL model performance and identify how multiple data biases affected the performance at the cellular level [ 46 ]. Such analyses give clinicians and software developers insight into DL models during the development and deployment phases.

In addition to classification and explainability, semantic segmentation approaches can also be applied on histopathology images to localise specific regions. One notable approach to perform semantic segmentation is to use generative adversarial networks (GANs) [ 47 ]. GAN is a versatile generative DL method comprising a pair of two neural networks: a generator and a discriminator [ 83 ]. In the context of semantic segmentation, the generator learns to label each pixel of an image to a class object (Fig. 2 b), while the discriminator learns to distinguish the predicted class labels from the ground truth [ 84 ]. This ‘adversarial’ mechanism forces the generator to be as accurate as possible in localising objects so that the discriminator cannot recognise the difference between predicted and ground-truth class labels [ 84 ]. Using this approach, Poojitha and Lal Sharma trained a CNN-based generator to segment cancer tissue to ‘help’ a CNN-based classifier predict prostate cancer grading [ 47 ]. The GAN-annotated tissue maps helped the CNN classifier achieve comparable accuracy to the grading produced by anatomical pathologists, indicating DL models can detect relevant cell regions in pathology images for decision making.

Molecular subtyping of cancers

Transcriptomic profiling can be used to assign cancers into clinically meaningful molecular subtypes that have diagnostic, prognostic or treatment selection relevance. Molecular subtypes were first described for breast cancer [ 85 , 86 ], then later for other cancers including colorectal [ 87 ], ovarian cancer [ 88 ] and sarcomas [ 89 ]. Standard computational methods used to subtype cancers, such as support vector machines (SVMs) or k-nearest neighbours, can be prone to errors due to batch effects [ 90 ] and may rely on only a handful of signature genes, omitting important biological information [ 91 , 92 , 93 ]. Deep learning algorithms can overcome these limitations by learning patterns from the whole transcriptome. A neural network model, DeepCC, trained on TCGA RNA-seq colon and breast cancer data and then tested on independent gene expression microarray data, showed superior accuracy, sensitivity and specificity when compared to traditional ML approaches including random forest, logistic regression, SVM and gradient boosting machine [ 48 ]. Neural networks have also been successfully applied to transcriptomic data for molecular subtyping of lung [ 94 ], gastric and ovarian cancers [ 95 ]. DL methods have the potential to be highly generalisable in profiling cancer molecular subtypes due to their ability to train on the large number of features generated by transcriptomic profiling. Furthermore, due to their flexibility, DL methods can incorporate prior biological knowledge to achieve improved performance. For example, Rhee et al. trained a hybrid GCNN model on expression profiles of a cancer hallmark gene set, connected in a graph using the STRING PPI network [ 16 ], to predict breast cancer molecular (PAM50) subtypes [ 18 ]. This approach outperformed other ML methods in subtype classification. Furthermore, the granular features extracted by the GCNN model naturally clustered tumours into PAM50 subtypes without relying on a classification model, demonstrating that the method successfully learned the latent properties in the gene expression profiles [ 18 ].
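To illustrate the general idea of learning subtypes from the whole transcriptome rather than from a signature-gene panel, here is a minimal numpy sketch of a one-hidden-layer classifier trained on synthetic expression data; all dimensions and data are invented and unrelated to DeepCC or the GCNN models cited.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 60 samples x 500 "genes", 3 hypothetical molecular subtypes.
n, g, k = 60, 500, 3
y = rng.integers(0, k, size=n)
X = rng.normal(size=(n, g)) + np.eye(k)[y] @ rng.normal(size=(k, g))  # subtype-shifted means

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# One hidden layer over the whole transcriptome -- no signature-gene selection.
W1 = rng.normal(scale=0.05, size=(g, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.05, size=(32, k)); b2 = np.zeros(k)
Y = np.eye(k)[y]

lr = 0.1
for _ in range(300):
    h = np.maximum(X @ W1 + b1, 0.0)          # ReLU hidden layer
    p = softmax(h @ W2 + b2)
    # Backpropagate cross-entropy gradients.
    dz2 = (p - Y) / n
    dW2 = h.T @ dz2; db2 = dz2.sum(0)
    dh = dz2 @ W2.T * (h > 0)
    dW1 = X.T @ dh; db1 = dh.sum(0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

train_acc = (p.argmax(1) == y).mean()
```

The point of the sketch is structural: every gene contributes to the hidden representation, so the model is free to find discriminative patterns outside any predefined signature.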

The use of multimodal learning to integrate transcriptomic with other omics data may enable enhanced subtype predictions. A novel multimodal method using two CNN models trained separately on copy number alterations (CNAs) and gene expression before concatenating their representations for predictions was able to predict PAM50 breast cancer subtypes better than CNNs trained on individual data types [ 54 ]. As multi-omics analysis becomes increasingly popular, multimodal learning methods are expected to become more prevalent in cancer diagnostics. However, the challenges of generating multi-omic data from patient samples in the clinical setting, as opposed to samples bio-banked for research, may hinder the clinical implementation of these approaches.
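Late-fusion multimodal integration of the kind described, concatenating per-modality representations before a shared prediction head, can be sketched as follows (a toy numpy illustration with invented shapes, not the architecture of [ 54 ]):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical per-modality encoders: each maps raw features to a representation.
def encode(x, W):
    return np.maximum(x @ W, 0.0)  # ReLU projection as a stand-in for a CNN branch

expr = rng.normal(size=(8, 100))   # gene expression for 8 samples
cna = rng.normal(size=(8, 40))     # copy number alterations for the same samples

W_expr = rng.normal(scale=0.1, size=(100, 16))
W_cna = rng.normal(scale=0.1, size=(40, 16))

# Late fusion: concatenate the learned representations, then classify jointly.
fused = np.concatenate([encode(expr, W_expr), encode(cna, W_cna)], axis=1)
W_head = rng.normal(scale=0.1, size=(32, 5))   # 5 subtype logits per sample
logits = fused @ W_head
```

The design choice is that each modality keeps its own encoder, so missing or noisy features in one data type do not distort the other's representation before fusion.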

Digital histopathology images are an integral part of the oncology workflow [ 11 ] and can be an alternative to transcriptomic-based methods for molecular subtyping. CNN models have been applied on haematoxylin and eosin (H&E) sections to predict molecular subtypes of lung [ 49 ], colorectal [ 50 ], breast [ 51 , 52 ] and bladder cancer [ 53 ], with greater accuracy when compared to traditional ML methods.

Diagnosing cancers of unknown primary

Determining the primary cancer site can be important during the diagnostic process, as it can be a significant indicator of how the cancer will behave clinically, and the treatment strategies are sometimes decided by the tumour origin [ 96 , 97 ]. However, 3–5% of cancer cases are metastatic cancers of unknown origin, termed cancers of unknown primary (CUPs) [ 98 , 99 ]. Genomic, methylation and transcriptomic profiles of metastatic tumours have unique patterns that can reveal their tissues of origin [ 100 , 101 , 102 ].

Traditional ML methods, such as regression and SVMs, applied to these omics data can predict tumour origin; however, they usually rely on a small subset of genes, which can be limiting in predicting a broad range of cancer types and subtypes. In contrast, DL algorithms can utilise a large number of genomic and transcriptomic features. The Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium [ 103 ] used a DL model to predict the origins of 24 cancer types individually and collectively using thousands of somatic mutation features across two classes (mutational distribution, and driver gene and pathway features) [ 55 ]. Remarkably, the study found that driver genes and pathways are not among the most salient features, highlighting why previous efforts in panel and exome sequencing for CUP produced mixed results [ 104 , 105 , 106 , 107 ]. Deep learning approaches utilising transcriptome data have also shown utility in predicting tumour site of origin [ 56 , 57 ]. A neural network called SCOPE, trained on whole transcriptome TCGA data, was able to predict the origins of treatment-resistant metastatic cancers, even for rare cancers such as metastatic adenoid cystic carcinoma [ 56 ]. The CUP-AI-Dx algorithm, built upon a widely used CNN model called Inception [ 108 ], achieved similar results on 32 cancer types from TCGA and ICGC [ 57 ]. As whole genome sequencing becomes increasingly available, these models show great potential for future DL methods to incorporate multiple omics features to accurately categorise tumours into clinically meaningful subtypes by their molecular features.

In addition to genomic and transcriptomic data, a new model called TOAD, trained on whole slide images (WSIs), was able to simultaneously predict metastasis status and origin of 18 tumour types [ 58 ]. Moreover, the model employed an explainability method called attention [ 109 , 110 ] to assign diagnostic relevance scores to image regions and revealed that regions with cancer cells contributed most to both metastasis and origin decision making [ 58 ]. These results suggest that TOAD can ‘focus’ on biologically relevant image patterns and is a good candidate for clinical deployment.
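Attention-based explainability of this kind can be illustrated with a toy numpy sketch: patch-level features are combined into a slide-level representation via softmax weights, and those same weights serve as relevance scores. This is a simplified, hypothetical version of the mechanism, not TOAD's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(3)

# Feature vectors for 6 hypothetical image patches from one whole slide image.
patches = rng.normal(size=(6, 24))

# Attention scoring: a small learned projection assigns one score per patch.
w_att = rng.normal(scale=0.1, size=(24,))
scores = patches @ w_att
weights = np.exp(scores) / np.exp(scores).sum()   # softmax over patches

# Slide-level representation: attention-weighted sum of patch features.
slide_repr = weights @ patches

# The weights double as relevance scores: high-weight patches drove the decision.
most_relevant = int(weights.argmax())
```

Because the weights are produced inside the forward pass, no post hoc attribution step is needed to rank regions by their contribution.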

Cancer prognosis and survival

Prognosis prediction is an essential part of clinical oncology, as the expected disease path and likelihood of survival can inform treatment decisions [ 111 ]. DL applied to genomic, transcriptomic and other data types has the potential to predict prognosis and patient survival [ 59 , 60 , 61 , 62 , 112 ]. The most common survival prediction method is the Cox proportional hazard regression model (Cox-PH) [ 113 , 114 , 115 ], a multivariate linear regression model that finds correlations between survival time and predictor variables. A challenge of applying Cox-PH to genomic and transcriptomic data is its linear nature, which can neglect complex and possibly nonlinear relationships between features [ 116 ]. By contrast, deep neural networks are naturally nonlinear and in theory could excel at this task. Interestingly, many studies have incorporated Cox regression into DL models and trained them on transcriptomic data for enhanced prognosis predictions [ 59 , 60 , 61 , 62 , 112 ]. Among them, Cox-nnet was a pioneering approach that made Cox regression the output layer of a neural network, effectively using millions of deep features extracted by hidden layers as input for the Cox regression model [ 59 ]. Cox-nnet was trained on RNA-seq data from 10 TCGA cancer types and benchmarked against two variations of Cox-PH (Cox-PH and CoxBoost). Cox-nnet showed superior accuracy and was the only model able to identify important pathways including p53 signalling, endocytosis and adherens junctions [ 59 ], demonstrating that the combination of Cox-PH and neural networks has the potential to capture biological information relating to prognosis. The potential of DL was confirmed by Huang et al. [ 62 ], who found that three different DL versions of Cox regression (Cox-nnet, DeepSurv [ 60 ] and AECOX [ 62 ]) outperformed Cox-PH and traditional ML models. These results suggest that DL models can provide better accuracy than traditional models in predicting prognosis by learning from complex molecular interactions using their flexible architecture.
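The core idea of making Cox regression the output layer is to minimise the negative Cox partial log-likelihood over features produced by the network's hidden layers. The following is a minimal numpy sketch of that loss on invented data, ignoring tied event times for clarity; it is not the Cox-nnet code:

```python
import numpy as np

rng = np.random.default_rng(4)

# Deep features from hidden layers feed a single linear Cox output node.
n, d = 20, 8
features = rng.normal(size=(n, d))       # stand-in for learned hidden activations
beta = rng.normal(scale=0.1, size=(d,))  # weights of the Cox output layer
risk = features @ beta                   # log hazard ratio per patient

time = rng.uniform(1, 100, size=n)       # follow-up times
event = rng.integers(0, 2, size=n)       # 1 = death observed, 0 = censored

def neg_cox_partial_log_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood (no ties handling, for clarity)."""
    loss = 0.0
    for i in np.where(event == 1)[0]:
        at_risk = time >= time[i]                      # risk set at event i
        loss -= risk[i] - np.log(np.exp(risk[at_risk]).sum())
    return loss

loss = neg_cox_partial_log_likelihood(risk, time, event)
```

In a Cox-nnet-style model, gradients of this loss would flow back through `beta` into the hidden layers, so the network learns nonlinear features that are prognostic under the Cox model.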

The incorporation of biological pathways in DL has enabled the elucidation of key survival drivers among thousands of features. PASNet [ 63 ] and its Cox-regression version Cox-PASNet [ 64 ] are among the most advanced DL models in this area. Both models incorporate a pathway layer between the input and the hidden layers of the neural network, where each node of the pathway layer represents a pathway (based on pathway databases such as Reactome [ 117 ] and KEGG [ 118 ]), and the connections between the two layers represent the gene-pathway relationships. After training, the pathway nodes carry different weights. By analysing the weight differences across different survival groups and identifying genes connected to each node, PASNet and Cox-PASNet were able to identify clinically actionable genetic traits of glioblastoma multiforme (GBM) and ovarian cancer [ 63 , 64 ]. In GBM, Cox-PASNet correctly identified the PI3K cascade, a pathway highly involved in tumour proliferation, invasion and migration in GBM [ 119 ]. Cox-PASNet also correctly detected MAPK9, a gene strongly associated with GBM carcinogenesis and a potential novel therapeutic target, as one of the most influential genes [ 120 ]. The GCNN-explainability model from Chereda et al. is the latest example of incorporating molecular networks in cancer prognosis [ 19 ]. The study used gene expression profiles, structured by a PPI network from the Human Protein Reference Database (HPRD) [ 121 ], to predict metastasis of breast cancer samples. The explainability method, LRP [ 29 ], was then used to identify and analyse the biological relevance of the genes most relevant for predictions [ 19 ]. Pathway analysis showed that these genes include oncogenes, molecular-subtype-specific and therapeutically targetable genes, such as EGFR and ESR1 [ 19 ].
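A pathway layer of this kind can be understood as an ordinary dense layer whose weight matrix is masked by known gene-pathway membership, so each node aggregates only its member genes. A toy numpy sketch with hypothetical gene-pathway memberships (not a Reactome/KEGG-derived mask):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical membership of 6 genes in 2 pathways (rows: genes, cols: pathways).
mask = np.array([[1, 0],
                 [1, 0],
                 [1, 1],
                 [0, 1],
                 [0, 1],
                 [0, 1]], dtype=float)

# Dense gene-to-pathway weights, masked so only known memberships connect.
W = rng.normal(size=mask.shape) * mask

expr = rng.normal(size=(3, 6))           # expression of 6 genes for 3 samples
pathway_nodes = np.tanh(expr @ W)        # one activation per pathway per sample

# Masked weights stay zero through training (gradients are masked the same way),
# so each node remains interpretable as the aggregate activity of one pathway.
```

The interpretability comes from the sparsity pattern: because a node can only receive input from its pathway's genes, comparing node weights across survival groups points directly at candidate pathways.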

In addition to prognosis predictions from transcriptomic data, CNN models trained on histopathology images have been used to infer survival in several cancers including brain [ 122 ], colorectal [ 123 ], renal cell [ 124 ], liver cancers [ 125 ] and mesothelioma [ 65 ]. Among them, MesoNet [ 65 ] stands out for incorporating a feature contribution explainability algorithm called CHOWDER [ 126 ] on H&E tissue sections of mesothelioma, identifying that the features contributing the most to survival predictions were primarily stromal cells associated with inflammation, cellular diversity and vacuolisation [ 65 ]. The CHOWDER algorithm enabled MesoNet to utilise large H&E images as well as segment and detect important regions for survival predictions without any local annotations by pathologists [ 65 ]. These findings suggest that ‘white-box’ DL models like MesoNet could be useful companion diagnostic tools in the clinical setting, assisting clinicians in identifying known and novel histological features associated with a survival outcome.

Multi-modal DL analysis integrating histopathology images and, if available, omics data has the potential to better stratify patients into prognostic groups, as well as suggest more personalised and targeted treatments. Most multi-modal prognostic studies have focussed on three aspects: individual feature extraction from a single modality, multi-modal data integration and cross-modal analysis of prognostic features. The model PAGE-Net performed these tasks by using a CNN to create representations of WSIs and Cox-PASNet [ 64 ] to extract genetic pathway information from gene expression [ 66 ]. This architecture allowed PAGE-Net not only to integrate histopathological and transcriptomic data, but also to identify patterns across both modalities that cause different survival rates [ 66 ]. More interestingly, the combination of multi-modal and explainability methods is particularly promising. PathME [ 67 ] pioneered this approach by bringing together representation-extraction AEs and an explainability algorithm called SHAP [ 31 , 32 , 33 , 127 ]. The AEs captured important features from gene expression, miRNA expression, DNA methylation and CNAs for survival prediction, while SHAP scored each feature from each omic based on its relevance to the prediction [ 67 ]. Together, the two algorithms detected clinically relevant cross-omics features that affect survival across GBM, colorectal, breast and lung cancer [ 67 ]. The PathME methodology is cancer-agnostic, which makes it a strong candidate for clinical implementations to explore actionable biomarkers in large-scale multi-omics data. Additionally, other studies [ 128 , 129 , 130 ] have employed Principal Component Analysis (PCA) [ 131 ] to compress gene expression, mutational signatures and methylation status into eigengene vectors [ 132 ], which were then combined with CNN-extracted histopathology features for survival predictions. While these methods could integrate histopathology data with multi-omics, they are not as explainable as PAGE-Net [ 66 ] or PathME [ 67 ] and thus less clinically suitable, as the conversion of genes into eigengenes makes exploration of cross-modality interactions challenging.
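The eigengene compression step used in those studies is essentially PCA on the expression matrix. A minimal numpy sketch with invented dimensions:

```python
import numpy as np

rng = np.random.default_rng(6)

# Expression matrix: 10 samples x 200 genes.
X = rng.normal(size=(10, 200))
Xc = X - X.mean(axis=0)                  # centre each gene

# PCA via SVD: principal components compress genes into "eigengene" vectors.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 5
eigengenes = Xc @ Vt[:k].T               # k eigengene values per sample

# These compact vectors can be concatenated with image-derived features, but
# each eigengene mixes many genes, which obscures gene-level interpretation.
explained = (S[:k] ** 2).sum() / (S ** 2).sum()
```

The sketch also shows why such pipelines trade explainability for compactness: once genes are mixed into components, attributing a survival prediction back to a single gene requires unwinding the loadings.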

Precision oncology

The promise of precision medicine is to use high-resolution omics data to enable optimised management and treatment of patients to improve survival. An important part of precision oncology involves understanding cancer genomics and the tumour microenvironment (TME). DL offers the potential to infer important genomic features from readily available histopathology data, as well as disentangle the complex heterogeneity of TME to enable precision oncology.

Genomic traits such as tumour mutation burden (TMB) and microsatellite instability (MSI) have been shown to be important biomarkers of immunotherapy response across cancer types [ 133 , 134 , 135 , 136 ]. Assessment of these traits requires sequencing (comprehensive panel, exome or whole genome), which is still expensive and is not readily available in the clinic.

Routinely used histopathological images are a potential window to genomic features and may in future prove useful for predictions of specific clinically meaningful molecular features without the need for tumour sequencing. Several CNN methods have been developed to infer TMB, MSI and other clinically relevant genomic features from H&E sections [ 68 , 69 , 70 , 137 ]. A model called Image2TMB used ensemble learning to predict TMB in lung cancer using H&E images. Image2TMB was able to achieve the same average accuracy as large panel sequencing with significantly less variance. It also attempted to estimate TMB for each region of an image [ 69 ], which could enable studies of histological features associated with molecular heterogeneity.

Another DL model called HE2RNA used weakly supervised learning to infer gene expression from histopathology images, which were then used to infer MSI status in colorectal cancer [ 68 ]. When compared with another DL method to predict MSI directly from H&E slides [ 137 ], HE2RNA showed superior performance on both formalin-fixed paraffin-embedded (FFPE) and frozen sections, indicating a high level of robustness across tissue processing approaches.

Kather et al. [ 70 ] have also shown that CNN models trained and evaluated on TCGA H&E slides can accurately predict a range of actionable genetic alterations across multiple cancer types, including mutational status of key genes, molecular subtypes and gene expression of standard biomarkers such as hormone receptor status. While these molecular inference methods demonstrate an intriguing application of DL in histopathology, their current clinical utility is likely to be limited, as features such as MSI and hormone receptor status are already part of routine diagnostic workflows (immunohistochemistry staining for mismatch-repair proteins in colorectal and endometrial cancer, or for ER and PR in breast cancer). However, these studies serve as proof-of-concept, and the developed models could in future be adapted to predict clinically important molecular features that are not routinely assessed. Thus, future investigations into histopathology-based genomic inference are warranted, with the understanding that the accuracy of such DL models needs to be exceptional for them to replace current assays.

The tumour microenvironment

The TME plays a key role in cancer progression, metastasis and response to therapy [ 138 ]. However, there remain many unknowns in the complex molecular and cellular interactions within the TME. The rise of DL in cancer research, coupled with large publicly available catalogues of genomic, transcriptomic and histopathology data, has created a strong technical framework for the use of neural networks in profiling the heterogeneity of the TME.

Infiltrating immune cell populations, such as CD4+ and CD8+ T cells, are potentially important biomarkers of immunotherapy response [ 139 , 140 ]. Traditional ML methods can accurately estimate TME cell compositions using transcriptomic [ 141 , 142 ] or methylation data [ 143 ]. However, most of these methods rely on the generation of signature Gene Expression Profiles (GEPs) or the selection of a limited number of CpG sites, biased towards previously known biomarkers. This can leave models susceptible to noise and bias and unable to discover novel genetic biomarkers. DL methods can be trained on the whole dataset (i.e. the whole transcriptome) to identify the optimal features without relying on GEPs. Recently developed DL TME methods include Scaden [ 71 ], a transcriptomic-based neural network model, and MethylNet, a methylation-based model [ 72 ]. MethylNet also incorporated the SHAP explainability method [ 31 , 32 , 33 , 127 ] to quantify how relevant each CpG site is for deconvolution. While these methods currently focus on showing that DL models are more robust against noise, bias and batch effects than traditional ML models, future follow-up studies are likely to reveal additional cellular heterogeneity traits of the TME and possibly inform treatment decisions. For example, a CNN trained on H&E slides of 13 cancer types [ 20 ] showed a strong correlation between spatial tumour infiltrating lymphocyte (TIL) patterns and cellular compositions derived by CIBERSORT (a support vector regression model) [ 141 ]. These models have significant clinical implications, as rapid and automated identification of the composition, amount and spatial organisation of TILs can support clinical decision making for prognosis prediction (for example, in breast cancer) and inform treatment options, specifically immunotherapy. We expect future DL methods will further explore the integration of histopathology and omics in profiling the tumour immune landscape [ 144 ].
We also expect future DL methods to incorporate single-cell transcriptomics (scRNA-Seq) data to improve TME predictions and even infer transcriptomic profiles of individual cell types. Several DL methods have already been developed to address batch correction, normalisation, imputation, dimensionality reduction and cell annotations for scRNA-Seq cancer data [ 145 , 146 , 147 ]. However, these studies are still experimental and require further effort and validation to be clinically applicable [ 148 ].
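The mixture model underlying regression-based deconvolution tools such as CIBERSORT treats a bulk expression profile as a weighted sum of cell-type signatures, with the weights being the cell fractions. A toy numpy sketch of least-squares deconvolution (illustrative only; real tools add feature selection and robustness steps):

```python
import numpy as np

rng = np.random.default_rng(7)

# Reference expression of 4 cell types across 50 genes (hypothetical values).
signatures = rng.uniform(1, 10, size=(4, 50))
true_frac = np.array([0.5, 0.3, 0.15, 0.05])

# A bulk tumour sample is modelled as a mixture of its cell populations.
bulk = true_frac @ signatures + rng.normal(scale=0.01, size=50)

# Least-squares deconvolution, then clip/renormalise to valid fractions.
est, *_ = np.linalg.lstsq(signatures.T, bulk, rcond=None)
est = np.clip(est, 0, None)
est = est / est.sum()
```

DL deconvolution methods such as Scaden replace the fixed signature matrix with a network trained on the whole transcriptome, which is where the robustness to noise and batch effects described above comes from.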

The new frontiers

An exciting new approach for studying the TME is spatial transcriptomics which allows quantification of gene expression in individual cells or regions while maintaining their positional representation, thus capturing spatial heterogeneity of gene expression at high resolution [ 149 , 150 ]. Given the complexity of this data, DL approaches are well suited for its analysis and interpretation. For example, by integrating histopathology images and spatial transcriptomics, DL can predict localised gene expression from tissue slides, as demonstrated by ST-Net, a neural network capable of predicting expressions of clinically relevant genes in breast cancer using tissue spots from H&E slides [ 73 ]. As the cost of spatial transcriptomics decreases in the future, it is expected more translational applications of DL will arise, for example utilising spatial transcriptomics information for improved prognosis predictions, subtype classification and refining our understanding of tumour heterogeneity [ 151 ].

In addition, the gut microbiome (i.e. the metagenome) is an emerging field that has been shown to play an important role in cancer treatment efficacy and outcomes [ 152 , 153 ]. As more multi-omics datasets (genomics, transcriptomics, proteomics, microbiotics) are generated, annotated and made available, we speculate that integrative analysis across these data types will help map the omics profiles of each individual patient to the metagenome, unlocking exciting new treatment options.

Lastly, pharmacogenomics, which aims to predict drug responses and mechanisms of action from genomic characteristics, is an important and exciting area in precision oncology where DL methods have significant potential [ 154 ]. The increasing availability of public omics data has facilitated recent growth of DL applications in cancer pharmacogenomics [ 155 , 156 , 157 ]. The most common applications include therapy response and resistance (e.g. Dr.VAE [ 158 ] and CDRscan [ 74 ]), drug combination synergy (e.g. DeepSynergy [ 75 ] and Jiang et al. [ 76 ]), drug repositioning (e.g. deepDR [ 77 ]) and drug-target interactions (e.g. DeepDTI [ 78 ]). As pharmacogenomics is a highly translational field, we expect many such DL models to be applied in the clinical setting in the future.

Challenges and limitations: the road to clinical implementation

This review provides an overview of exciting potential DL applications in oncology. However, there are several challenges to the widespread implementation of DL in clinical practice. Here, we discuss challenges and limitations of DL in clinical oncology and provide our perspective for future improvements.

Data variability

Data variability is a major challenge for applying DL to oncology. For example, in immunohistochemistry each lab may have different staining intensity or staining quality. It is currently unclear how DL systems would deal with this inter- and intra-laboratory variability. For transcriptomic data, one of the principal difficulties is establishing the exact processing applied to generate a sequence library and processed dataset. Even properties as basic as ‘the list of human genes’ are not settled: multiple authorities publish and regularly update lists of genes and observed spliceforms, so any analysis should specify both the source and version of the gene model used. Additionally, there is a large range of data transformations (log, linear, etc.) and data normalisations (FPKM, TMM, TPM), with implementations in multiple programming languages, resulting in a combinatorially large number of possible processing paths. These paths should theoretically return the same results, but no formal process exists to verify that this assumption holds.
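To make the normalisation ambiguity concrete, the following numpy sketch computes TPM and FPKM for the same toy counts; the two differ by a sample-specific constant, so a pipeline must record exactly which transform was applied before data are compared across samples:

```python
import numpy as np

# Raw read counts for 4 genes in one sample, with gene lengths in kilobases.
counts = np.array([100.0, 500.0, 300.0, 100.0])
lengths_kb = np.array([2.0, 5.0, 1.0, 0.5])

# TPM: normalise by gene length first, then scale so values sum to one million.
rate = counts / lengths_kb
tpm = rate / rate.sum() * 1e6

# FPKM: scale by total reads first (per million), then by gene length.
fpkm = counts / (counts.sum() / 1e6) / lengths_kb

# The two differ by a sample-specific constant, so mixing them silently
# changes cross-sample comparisons.
```

A DL model trained on TPM inputs and fed FPKM at inference time would see every value rescaled by an unrecorded factor, which is exactly the kind of silent processing mismatch described above.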

Paucity of public phenotypically characterised datasets

One challenge of implementing DL into clinical practice is the need for large phenotypically characterised datasets that enable development and training of DL models with good generalisation performance. High-quality cancer datasets that have undergone omics profiling are difficult to acquire in the clinical setting due to cost, sample availability and quality. In addition, clinical tumour samples can be small and are typically stored as FFPE blocks, resulting in degraded RNA and crosslinked DNA not suitable for comprehensive molecular profiling. To overcome this, explainability methods, such as SHAP, could be applied to current DL models developed in the research setting to identify the most salient features and design targeted profiling workflows suitable for clinical samples. This way, DL models could still capture the complexity and possible nonlinear gene relationships, but be retrained to make clinical predictions using only the selected salient features. Multi-modal DL models coupled with explainability could also be explored, due to their potential to use features in one modality to complement missing data in another. Transfer learning can also address the need for large datasets by pre-training DL models on data from other domains. In practice, however, large datasets with thousands of samples per class are still needed for accurate predictions in the clinic, as patient outcomes are complex and there is clinical heterogeneity between patients, including responses, treatment courses, comorbidities and other lifestyle factors that may impact prognosis and survival. As more data are routinely generated and clinical information centrally collected in digital health databases, we expect to see more DL models developed for treatment response predictions as well as general prognosis predictions. More interestingly, DL’s ability to continue learning from, and become more accurate with, new training samples, i.e. active learning, can significantly reduce the time pathologists spend annotating histopathology training data. For example, a histopathology-based DL model by Saltz et al. only required pathologists to annotate a few training images at a time, stopping the manual annotation process once the model’s performance was satisfactory [ 20 ].

Lastly, clinical data about a sample usually do not capture all the complexities of the sample and its phenotype and can be prone to incompleteness, inconsistencies and errors. A potential strategy to address this issue is to design DL models that are less reliant on or independent of clinical annotations; for example, the MesoNet model was able to detect prognostically meaningful regions from H&E images without any pathologist-derived annotations [ 65 ].

AI explainability and uncertainty

Finally, for DL to be implemented and accepted in the clinic, the models need to be designed to complement and enhance clinical workflows. For human experts to effectively utilise these models, they need to be not only explainable, but also capable of estimating the uncertainty in their predictions.

Over the last 5 years, research into explainable AI has accelerated. For DL to obtain regulatory approval and be used as a diagnostic tool, comprehensive studies of the biological relevance of explainability are imperative. In medical imaging, this entails validating DL-identified clinically relevant regions against pathology review, and in some cases, cross-validation with genomic features [ 46 ]. In genomics, this entails validating DL-identified relevant genetic features against those identified by conventional bioinformatics methods, for example confirming that the most discriminatory genes in predicting tissue types, as identified by SHAP, were also identified by pairwise differential expression analysis using edgeR [ 159 ] or showing that patient-specific molecular interaction networks produced in predicting metastasis status of breast cancer were not only linked to benign/malignant phenotype, but also indicative of tumour progression and therapeutic targets [ 19 ].

Furthermore, a DL model’s ability to produce an ‘I don’t know’ output when uncertain about predictions is critical. Most DL applications covered in this review are point-estimate methods, i.e. the predictions are simply the best guess with the highest probability. In critical circumstances, overconfident predictions, e.g. predicting a cancer’s primary site with only 40% certainty, can result in inaccurate diagnosis or cancer management decisions. Furthermore, when uncertainty estimates are too high, companion diagnostic tools should be able to abstain from making predictions and ask for medical experts’ opinions [ 160 ]. Probabilistic DL methods capable of quantifying prediction uncertainty, such as Bayesian DL [ 161 ], are strong candidates to address these issues and have recently started to be applied in cancer diagnosis tasks [ 162 , 163 , 164 ]. We expect probabilistic models to become mainstream in oncology in the near future.
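One simple way to obtain an ‘I don’t know’ output is to draw Monte Carlo samples of the predictive distribution (e.g. from a Bayesian or dropout-based model) and abstain when the entropy of the mean prediction exceeds a threshold. A toy numpy sketch, in which the threshold and data are invented:

```python
import numpy as np

rng = np.random.default_rng(8)

def predictive_entropy(probs):
    """Entropy (in nats) of the mean predictive distribution over MC samples."""
    p = probs.mean(axis=0)
    return -np.sum(p * np.log(p + 1e-12)), p

# Hypothetical Monte Carlo samples of class probabilities for one patient,
# over 3 candidate primary sites.
confident = np.tile([0.9, 0.05, 0.05], (20, 1))
uncertain = rng.dirichlet([1.0, 1.0, 1.0], size=20)

h_conf, _ = predictive_entropy(confident)
h_unc, _ = predictive_entropy(uncertain)

# Abstain (defer to a clinician) when entropy exceeds a chosen threshold.
THRESHOLD = 0.8
def decide(probs):
    h, p = predictive_entropy(probs)
    return int(p.argmax()) if h < THRESHOLD else "defer to expert"
```

The threshold would in practice be calibrated against clinical risk tolerance, trading off how often the tool answers against how often its answers can be trusted.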

Conclusions

In summary, DL has the potential to dramatically transform cancer care and bring it a step closer to the promise of precision oncology. In an era where genomics is being implemented into health delivery and health data are becoming increasingly digitised, it is anticipated that artificial intelligence and DL will be used in the development, validation and implementation of decision support tools to facilitate precision oncology. In this review, we showcased a number of promising applications of DL in various areas of oncology, including digital histopathology, molecular subtyping, cancer diagnosis, prognostication, histological inference of genomic characteristics, the tumour microenvironment and emerging frontiers such as spatial transcriptomics and pharmacogenomics. As the research matures, the future of applied DL in oncology will likely focus on the integration of medical images and omics data using multimodal learning that can identify biologically meaningful biomarkers. Excitingly, the combination of multimodal learning and explainability can reveal novel insights. Important prerequisites for widespread adoption of DL in the clinical setting are phenotypically rich data for training models and clinical validation of the biological relevance of DL-generated insights. We expect that as new technologies such as single-cell sequencing, spatial transcriptomics and multiplexed imaging become more accessible, more effort will be dedicated to improving both the quantity and quality of labelling/annotation of medical data. Finally, for DL to be accepted in routine patient care, clinical validation of explainable DL methods will play a vital role.

Availability of data and materials

Not applicable

Abbreviations

AE: Autoencoder

AI: Artificial intelligence

CUP: Cancer of unknown primary

CNA: Copy number aberrations

CNN: Convolutional neural network

Cox-PH: Cox proportional hazard regression model

DL: Deep learning

EGA: European Genome Atlas

FFPE: Formalin-fixed, paraffin-embedded

GBM: Glioblastoma multiforme

GCNN: Graph convolutional neural network

GEO: Gene Expression Omnibus

GPU: Graphical Processing Units

HPRD: Human Protein Reference Database

H&E: Haematoxylin and Eosin

ICGC: International Cancer Genome Consortium

LRP: Layer-wise Relevance Propagation

MSI: Microsatellite instability

ML: Machine learning

MLP: Multilayer perceptron

PCAWG: Pan-Cancer Analysis of Whole Genomes

PPI: Protein-protein interactions

RNA-seq: RNA sequencing

RNN: Recurrent neural network

SVM: Support vector machine

TCGA: The Cancer Genome Atlas

TIL: Tumour infiltrating lymphocytes

TME: Tumour microenvironment

TMB: Tumour mutation burden

WGCNA: Weighted correlation network analysis

LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436–44.

Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet. 2015;16:321–32.

Jones W, Alasoo K, Fishman D, Parts L. Computational biology: deep learning. Skolnick J, editor. Emerg Top Life Sci. 2017;1:257–74.

Wainberg M, Merico D, Delong A, Frey BJ. Deep learning in biomedicine. Nat Biotechnol. 2018;36:829–38.

Zou J, Huss M, Abid A, Mohammadi P, Torkamani A, Telenti A. A primer on deep learning in genomics. Nat Genet. 2019;51:12–8.

Montesinos-López OA, Montesinos-López A, Pérez-Rodríguez P, Barrón-López JA, Martini JWR, Fajardo-Flores SB, et al. A review of deep learning applications for genomic selection. BMC Genomics. 2021;22:19.

Dias R, Torkamani A. Artificial intelligence in clinical and genomic diagnostics. Genome Med. 2019;11(1):70. https://doi.org/10.1186/s13073-019-0689-8 .

Eraslan G, Avsec Ž, Gagneur J, Theis FJ. Deep learning: new computational modelling techniques for genomics. Nat Rev Genet. 2019;20(7):389–403. https://doi.org/10.1038/s41576-019-0122-6 .

Huynh E, Hosny A, Guthier C, Bitterman DS, Petit SF, Haas-Kogan DA, et al. Artificial intelligence in radiation oncology. Nat Rev Clin Oncol. 2020;17:771–81.

Bera K, Schalper KA, Rimm DL, Velcheti V, Madabhushi A. Artificial intelligence in digital pathology—new tools for diagnosis and precision oncology. Nat Rev Clin Oncol. 2019;16:703–15.

Huss R, Coupland SE. Software-assisted decision support in digital histopathology. J Pathol. 2020;250:685–92.

Massion PP, Antic S, Ather S, Arteta C, Brabec J, Chen H, et al. Assessing the accuracy of a deep learning method to risk stratify indeterminate pulmonary nodules. Am J Respir Crit Care Med. 2020;202:241–9.

Kanan C, Sue J, Grady L, Fuchs TJ, Chandarlapaty S, Reis-Filho JS, et al. Independent validation of paige prostate: assessing clinical benefit of an artificial intelligence tool within a digital diagnostic pathology laboratory workflow. J Clin Oncol. 2020;38(15_suppl):e14076. https://doi.org/10.1200/JCO.2020.38.15_suppl.e14076 .

Silva LM, Pereira EM, Salles PG, Godrich R, Ceballos R, Kunz JD, et al. Independent real-world application of a clinical-grade automated prostate cancer detection system. J Pathol. 2021;path:5662.

Schulte-Sasse R, Budach S, Hnisz D, Marsico A. Graph convolutional networks improve the prediction of cancer driver genes. Artif Neural Netw Mach Learn – ICANN 2019 [Internet]. Munich: Springer; 2019. p. 658–68. Available from: https://link.springer.com/chapter/10.1007%2F978-3-030-30493-5_60

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, et al. STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015;43:D447–52.

Ramirez R, Chiu Y-C, Hererra A, Mostavi M, Ramirez J, Chen Y, et al. Classification of cancer types using graph convolutional neural networks. Front Phys. 2020;8:203. https://doi.org/10.3389/fphy.2020.00203 .

Rhee S, Seo S, Kim S. Hybrid approach of relation network and localized graph convolutional filtering for breast cancer subtype classification. Proc Twenty-Seventh Int Jt Conf Artif Intell [Internet]. Stockholm: International Joint Conferences on Artificial Intelligence Organization; 2018. p. 3527–34. [cited 2021 Apr 30]. Available from: https://www.ijcai.org/proceedings/2018/490

Chereda H, Bleckmann A, Menck K, Perera-Bel J, Stegmaier P, Auer F, et al. Explaining decisions of graph convolutional neural networks: patient-specific molecular subnetworks responsible for metastasis prediction in breast cancer. Genome Med. 2021;13:42.

Saltz J, Gupta R, Hou L, Kurc T, Singh P, Nguyen V, et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Rep. 2018;23:181–193.e7.

Gao J, Li P, Chen Z, Zhang J. A survey on deep learning for multimodal data fusion. Neural Comput. 2020;32:829–64.

Sun D, Wang M, Li A. A multimodal deep neural network for human breast cancer prognosis prediction by integrating multi-dimensional data. IEEE/ACM Trans Comput Biol Bioinform. 2019;16:841–50.

Cheerla A, Gevaert O. Deep learning with multimodal representation for pancancer prognosis prediction. Bioinformatics. 2019;35(14):i446–54. https://doi.org/10.1093/bioinformatics/btz342 .

Tschannen M, Bachem O, Lucic M. Recent advances in autoencoder-based representation learning. ArXiv181205069 Cs Stat [Internet]. 2018; [cited 2020 Apr 21]; Available from: http://arxiv.org/abs/1812.05069 .

Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019;17:195.

Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell. 2019;1:206–15.

The Precise4Q consortium, Amann J, Blasimme A, Vayena E, Frey D, Madai VI. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med Inform Decis Mak. 2020;20:310.

Article   PubMed Central   Google Scholar  

Shrikumar A, Greenside P, Kundaje A. Learning important features through propagating activation differences. ArXiv170402685 Cs [Internet]. 2019; [cited 2020 Apr 20]; Available from: http://arxiv.org/abs/1704.02685 .

Bach S, Binder A, Montavon G, Klauschen F, Müller K-R, Samek W. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. Suarez OD, editor. PLoS One. 2015;10:e0130140.

Ribeiro MT, Singh S, Guestrin C. “Why Should I Trust You?”: explaining the predictions of any classifier. Proc 22nd ACM SIGKDD Int Conf Knowl Discov Data Min [Internet]. San Francisco: ACM; 2016. p. 1135–44. [cited 2020 Dec 8]. Available from: https://dl.acm.org/doi/10.1145/2939672.2939778

Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. NIPS17 Proc 31st. Int Conf Neural Inf Process Syst Curran Associates Inc. 2017;30:4768–77.

Erion G, Janizek JD, Sturmfels P, Lundberg S, Lee S-I. Learning explainable models using attribution priors. ArXiv190610670 Cs Stat [Internet]. 2019; [cited 2020 Jun 22]; Available from: http://arxiv.org/abs/1906.10670 .

Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell. 2020;2(1):56–67. https://doi.org/10.1038/s42256-019-0138-9 .

The Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KRM, Ozenberger BA, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. 2013;45:1113–20.

Article   CAS   Google Scholar  

The International Cancer Genome Consortium. International network of cancer genome projects. Nature. 2010;464:993–8.

Article   CAS   PubMed Central   Google Scholar  

Edgar R. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30:207–10.

Lappalainen I, Almeida-King J, Kumanduri V, Senf A, Spalding JD, ur-Rehman S, et al. The European Genome-phenome Archive of human data consented for biomedical research. Nat Genet. 2015;47(7):692–5. https://doi.org/10.1038/ng.3312 .

METABRIC Group, Curtis C, Shah SP, Chin S-F, Turashvili G, Rueda OM, et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature. 2012;486:346–52.

Zhuang F, Qi Z, Duan K, Xi D, Zhu Y, Zhu H, et al. A comprehensive survey on transfer learning. ArXiv191102685 Cs Stat [Internet]. 2020; [cited 2020 Dec 6]; Available from: http://arxiv.org/abs/1911.02685 .

Ryu HS, Jin M-S, Park JH, Lee S, Cho J, Oh S, et al. Automated gleason scoring and tumor quantification in prostate core needle biopsy images using deep neural networks and its comparison with pathologist-based assessment. Cancers. 2019;11:1860.

Nir G, Karimi D, Goldenberg SL, Fazli L, Skinnider BF, Tavassoli P, et al. Comparison of artificial intelligence techniques to evaluate performance of a classifier for automatic grading of prostate cancer from digitized histopathologic images. JAMA Netw Open. 2019;2:e190442.

Ström P, Kartasalo K, Olsson H, Solorzano L, Delahunt B, Berney DM, et al. Artificial intelligence for diagnosis and grading of prostate cancer in biopsies: a population-based, diagnostic study. Lancet Oncol. 2020;21:222–32.

Ehteshami Bejnordi B, Mullooly M, Pfeiffer RM, Fan S, Vacek PM, Weaver DL, et al. Using deep convolutional neural networks to identify and classify tumor-associated stroma in diagnostic breast biopsies. Mod Pathol. 2018;31:1502–12.

Vuong TLT, Lee D, Kwak JT, Kim K. Multi-task deep learning for colon cancer grading. 2020 Int Conf Electron Inf Commun ICEIC [Internet]. Barcelona: IEEE; 2020. p. 1–2. [cited 2020 Nov 9]. Available from: https://ieeexplore.ieee.org/document/9051305/

El Achi HE, Khoury JD. Artificial intelligence and digital microscopy applications in diagnostic hematopathology. Cancers. 2020;12(4):797. https://doi.org/10.3390/cancers12040797 .

Hägele M, Seegerer P, Lapuschkin S, Bockmayr M, Samek W, Klauschen F, et al. Resolving challenges in deep learning-based analyses of histopathological images using explanation methods. Sci Rep. 2020;10:6423.

Poojitha UP, Lal SS. Hybrid unified deep learning network for highly precise gleason grading of prostate cancer. 2019 41st Annu Int Conf IEEE Eng Med Biol Soc EMBC [Internet]. Berlin: IEEE; 2019. p. 899–903. [cited 2020 Apr 3]Available from: https://ieeexplore.ieee.org/document/8856912/

Gao F, Wang W, Tan M, Zhu L, Zhang Y, Fessler E, et al. DeepCC: a novel deep learning-based framework for cancer molecular subtype classification. Oncogenesis. 2019;8:44.

Yu K-H, Wang F, Berry GJ, Ré C, Altman RB, Snyder M, et al. Classifying non-small cell lung cancer types and transcriptomic subtypes using convolutional neural networks. J Am Med Inform Assoc. 2020;27:757–69.

Sirinukunwattana K, Domingo E, Richman SD, Redmond KL, Blake A, Verrill C, et al. Image-based consensus molecular subtype (imCMS) classification of colorectal cancer using deep learning. Gut. 2020;gutjnl-2019:319866.

Stålhammar G, Fuentes Martinez N, Lippert M, Tobin NP, Mølholm I, Kis L, et al. Digital image analysis outperforms manual biomarker assessment in breast cancer. Mod Pathol. 2016;29(4):318–29. https://doi.org/10.1038/modpathol.2016.34 .

Couture HD, Williams LA, Geradts J, Nyante SJ, Butler EN, Marron JS, et al. Image analysis with deep learning to predict breast cancer grade, ER status, histologic subtype, and intrinsic subtype. NPJ Breast Cancer. 2018;4:30.

Woerl A-C, Eckstein M, Geiger J, Wagner DC, Daher T, Stenzel P, et al. Deep Learning Predicts Molecular Subtype of Muscle-invasive bladder cancer from conventional histopathological slides. Eur Urol. 2020;78:256–64.

Md MI, Huang S, Ajwad R, Chi C, Wang Y, Hu P. An integrative deep learning framework for classifying molecular subtypes of breast cancer. Comput Struct Biotechnol J. 2020;18:2185–99.

PCAWG Tumor Subtypes and Clinical Translation Working Group, PCAWG Consortium, Jiao W, Atwal G, Polak P, Karlic R, et al. A deep learning system accurately classifies primary and metastatic cancers using passenger mutation patterns. Nat Commun. 2020;11:728.

Grewal JK, Tessier-Cloutier B, Jones M, Gakkhar S, Ma Y, Moore R, et al. Application of a neural network whole transcriptome–based pan-cancer method for diagnosis of primary and metastatic cancers. JAMA Netw Open. 2019;2(4):e192597. https://doi.org/10.1001/jamanetworkopen.2019.2597 .

Zhao Y, Pan Z, Namburi S, Pattison A, Posner A, Balachander S, et al. CUP-AI-Dx: A tool for inferring cancer tissue of origin and molecular subtype using RNA gene-expression data and artificial intelligence. EBioMedicine. 2020;61:103030.

Lu MY, Chen TY, Williamson DFK, Zhao M, Shady M, Lipkova J, et al. AI-based pathology predicts origins for cancers of unknown primary. Nature. 2021;594:106–10.

Ching T, Zhu X, Garmire LX. Cox-nnet: an artificial neural network method for prognosis prediction of high-throughput omics data. Markowetz F, editor. PLoS Comput Biol. 2018;14:e1006076.

Katzman JL, Shaham U, Cloninger A, Bates J, Jiang T, Kluger Y. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol. 2018;18:24.

Jing B, Zhang T, Wang Z, Jin Y, Liu K, Qiu W, et al. A deep survival analysis method based on ranking. Artif Intell Med. 2019;98:1–9.

Huang Z, Johnson TS, Han Z, Helm B, Cao S, Zhang C, et al. Deep learning-based cancer survival prognosis from RNA-seq data: approaches and evaluations. BMC Med Genet. 2020;13:41.

Hao J, Kim Y, Kim T-K, Kang M. PASNet: pathway-associated sparse deep neural network for prognosis prediction from high-throughput data. BMC Bioinformatics. 2018;19:510.

Hao J, Kim Y, Mallavarapu T, Oh JH, Kang M. Interpretable deep neural network for cancer survival analysis by integrating genomic and clinical data. BMC Med Genet. 2019;12:189.

Courtiol P, Maussion C, Moarii M, Pronier E, Pilcer S, Sefta M, et al. Deep learning-based classification of mesothelioma improves prediction of patient outcome. Nat Med. 2019;25:1519–25.

Hao J, Kosaraju SC, Tsaku NZ, Song DH, Kang M. PAGE-Net: interpretable and integrative deep learning for survival analysis using histopathological images and genomic data. Biocomput 2020 [Internet]. Kohala Coast: WORLD SCIENTIFIC; 2019. p. 355–66. [cited 2020 Apr 6]. Available from: https://www.worldscientific.com/doi/abs/10.1142/9789811215636_0032

Lemsara A, Ouadfel S, Fröhlich H. PathME: pathway based multi-modal sparse autoencoders for clustering of patient-level multi-omics data. BMC Bioinformatics. 2020;21:146.

Schmauch B, Romagnoni A, Pronier E, Saillard C, Maillé P, Calderaro J, et al. A deep learning model to predict RNA-Seq expression of tumours from whole slide images. Nat Commun. 2020;11:3877.

Jain MS, Massoud TF. Predicting tumour mutational burden from histopathological images using multiscale deep learning. Nat Mach Intell. 2020;2:356–62.

Kather JN, Heij LR, Grabsch HI, Loeffler C, Echle A, Muti HS, et al. Pan-cancer image-based detection of clinically actionable genetic alterations. Nat Can. 2020;1(8):789–99. https://doi.org/10.1038/s43018-020-0087-6 .

Menden K, Marouf M, Oller S, Dalmia A, Magruder DS, Kloiber K, et al. Deep learning–based cell composition analysis from tissue expression profiles. Sci Adv [Internet]. 2020;6 Available from: https://advances.sciencemag.org/content/6/30/eaba2619 .

Levy JJ, Titus AJ, Petersen CL, Chen Y, Salas LA, Christensen BC. MethylNet: an automated and modular deep learning approach for DNA methylation analysis. BMC Bioinformatics. 2020;21:108.

He B, Bergenstråhle L, Stenbeck L, Abid A, Andersson A, Borg Å, et al. Integrating spatial gene expression and breast tumour morphology via deep learning. Nat Biomed Eng. 2020;4:827–34.

Chang Y, Park H, Yang H-J, Lee S, Lee K-Y, Kim TS, et al. Cancer Drug Response Profile scan (CDRscan): a deep learning model that predicts drug effectiveness from cancer genomic signature. Sci Rep. 2018;8:8857.

Preuer K, Lewis RPI, Hochreiter S, Bender A, Bulusu KC, Klambauer G. DeepSynergy: predicting anti-cancer drug synergy with Deep Learning. Wren J, editor. Bioinformatics. 2018;34(9):1538–46. https://doi.org/10.1093/bioinformatics/btx806 .

Jiang P, Huang S, Fu Z, Sun Z, Lakowski TM, Hu P. Deep graph embedding for prioritizing synergistic anticancer drug combinations. Comput Struct Biotechnol J. 2020;18:427–38.

Zeng X, Zhu S, Liu X, Zhou Y, Nussinov R, Cheng F. deepDR: a network-based deep learning approach to in silico drug repositioning. Cowen L, editor. Bioinformatics. 2019;35:5191–5198.

Wen M, Zhang Z, Niu S, Sha H, Yang R, Yun Y, et al. Deep-learning-based drug−target interaction prediction. J Proteome Res. 2017;16(4):1401–9.

Walsh S, de Jong EEC, van Timmeren JE, Ibrahim A, Compter I, Peerlings J, et al. Decision support systems in oncology. JCO Clin Cancer Inform. 2019;3(3):1–9. https://doi.org/10.1200/CCI.18.00001 .

Gurcan M, Lozanski G, Pennell M, Shana′Ah A, Zhao W, Gewirtz A, et al. Inter-reader variability in follicular lymphoma grading: conventional and digital reading. J Pathol Inform. 2013;4:30.

Rabe K, Snir OL, Bossuyt V, Harigopal M, Celli R, Reisenbichler ES. Interobserver variability in breast carcinoma grading results in prognostic stage differences. Hum Pathol. 2019;94:51–7.

Maggiori E, Tarabalka Y, Charpiat G, Alliez P. High-resolution image classification with convolutional networks. 2017 IEEE Int Geosci Remote Sens Symp IGARSS [Internet]. Fort Worth: IEEE; 2017. p. 5157–60. [cited 2020 Dec 8]. Available from: http://ieeexplore.ieee.org/document/8128163/

Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial networks. ArXiv14062661 Cs Stat [Internet]. 2014; [cited 2021 Apr 27]; Available from: http://arxiv.org/abs/1406.2661 .

Luc P, Couprie C, Chintala S, Verbeek J. Semantic segmentation using adversarial networks. ArXiv161108408 Cs [Internet]. 2016; [cited 2021 Aug 12]; Available from: http://arxiv.org/abs/1611.08408 .

Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci. 2001;98:10869–74.

Yersal O. Biological subtypes of breast cancer: prognostic and therapeutic implications. World J Clin Oncol. 2014;5(3):412–24. https://doi.org/10.5306/wjco.v5.i3.412 .

Komor MA, Bosch LJ, Bounova G, Bolijn AS, Delis-van Diemen PM, Rausch C, et al. Consensus molecular subtype classification of colorectal adenomas: CMS classification of colorectal adenomas. J Pathol. 2018;246:266–76.

Tothill RW, Tinker AV, George J, Brown R, Fox SB, Lade S, et al. Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin Cancer Res. 2008;14(16):5198–208. https://doi.org/10.1158/1078-0432.CCR-08-0196 .

Jain S, Xu R, Prieto VG, Lee P. Molecular classification of soft tissue sarcomas and its clinical applications. Int J Clin Exp. 2010;3:416–29.

CAS   Google Scholar  

Leek JT, Scharpf RB, Bravo HC, Simcha D, Langmead B, Johnson WE, et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet. 2010;11:733–9.

Haury A-C, Gestraud P, Vert J-P. The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures. Teh M-T, editor. PLoS One. 2011;6:e28210.

Kela I, Ein-Dor L, Getz G, Givol D, Domany E. Outcome signature genes in breast cancer: is there a unique set? Breast Cancer Res. 2005;7:P4.38, bcr1168.

Drier Y, Domany E. Do two machine-learning based prognostic signatures for breast cancer capture the same biological processes? El-Rifai W, editor. PLoS One. 2011;6:e17795.

Hu F, Zhou Y, Wang Q, Yang Z, Shi Y, Chi Q. Gene expression classification of lung adenocarcinoma into molecular subtypes. IEEE/ACM Trans Comput Biol Bioinform. 2020;17:1187–97.

Wang K, Duan X, Gao F, Wang W, Liu L, Wang X. Dissecting cancer heterogeneity based on dimension reduction of transcriptomic profiles using extreme learning machines. Wong K-K, editor. PLoS One. 2018;13:e0203824.

Varadhachary GR, Abbruzzese JL, Lenzi R. Diagnostic strategies for unknown primary cancer. Cancer. 2004;100:1776–85.

Greco FA. Molecular diagnosis of the tissue of origin in cancer of unknown primary site: useful in patient management. Curr Treat Options in Oncol. 2013;14:634–42.

Pavlidis N, Pentheroudakis G. Cancer of unknown primary site. Lancet. 2012;379:1428–35.

Varadhachary GR, Raber MN. Cancer of unknown primary site. N Engl J Med. 2014;371(8):757–65. https://doi.org/10.1056/NEJMra1303917 .

Kandoth C, McLellan MD, Vandin F, Ye K, Niu B, Lu C, et al. Mutational landscape and significance across 12 major cancer types. Nature. 2013;502:333–9.

Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499:214–8.

Ciriello G, Miller ML, Aksoy BA, Senbabaoglu Y, Schultz N, Sander C. Emerging landscape of oncogenic signatures across human cancers. Nat Genet. 2013;45:1127–33.

The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature. 2020;578:82–93.

Chen Y, Sun J, Huang L-C, Xu H, Zhao Z. Classification of cancer primary sites using machine learning and somatic mutations. Biomed Res Int. 2015;2015:1–9.

Tothill RW, Li J, Mileshkin L, Doig K, Siganakis T, Cowin P, et al. Massively-parallel sequencing assists the diagnosis and guided treatment of cancers of unknown primary: NGS in cancers of unknown primary. J Pathol. 2013;231:413–23.

Soh KP, Szczurek E, Sakoparnig T, Beerenwinkel N. Predicting cancer type from tumour DNA signatures. Genome Med. 2017;9:104.

Marquard AM, Birkbak NJ, Thomas CE, Favero F, Krzystanek M, Lefebvre C, et al. TumorTracer: a method to identify the tissue of origin from the somatic mutations of a tumor specimen. BMC Med Genet. 2015;8:58.

Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. 2015 IEEE Conf Comput Vis Pattern Recognit CVPR [Internet]. Boston: IEEE; 2015. p. 1–9. [cited 2020 Dec 8]. Available from: http://ieeexplore.ieee.org/document/7298594/

Ilse M, Tomczak JM, Welling M. Attention-based deep multiple instance learning. arXiv:1802.04712 Cs [Internet]. 2018. [cited 2021 Sep 17]. Available from https://arxiv.org/abs/1802.04712 .

Lu MY, Williamson DFK, Chen TY, Chen RJ, Barbieri M, Mahmood F. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat Biomed Eng [Internet]. 2021; [cited 2021 May 10]; Available from: http://www.nature.com/articles/s41551-020-00682-w .

Nair M, Sandhu S, Sharma A. Prognostic and predictive biomarkers in cancer. Curr Cancer Drug Targets. 2014;14:477–504.

Lai Y-H, Chen W-N, Hsu T-C, Lin C, Tsao Y, Wu S. Overall survival prediction of non-small cell lung cancer by integrating microarray and clinical data with deep learning. Sci Rep. 2020;10:4679.

Cox DR. Regression Models and Life-Tables, vol. 35; 2020.

Ahmed FE, Vos PW, Holbert D. Modeling survival in colon cancer: a methodological review. Mol Cancer. 2007;6(1):15. https://doi.org/10.1186/1476-4598-6-15 .

de O Ferraz R, Moreira-Filho D de C. Survival analysis of women with breast cancer: competing risk models. Ciênc Saúde Coletiva. 2017;22:3743–54.

Solvang HK, Lingjærde OC, Frigessi A, Børresen-Dale A-L, Kristensen VN. Linear and non-linear dependencies between copy number aberrations and mRNA expression reveal distinct molecular pathways in breast cancer. BMC Bioinformatics. 2011;12:197.

Fabregat A, Jupe S, Matthews L, Sidiropoulos K, Gillespie M, Garapati P, et al. The Reactome Pathway Knowledgebase. Nucleic Acids Res. 2018;46(D1):D649–55. https://doi.org/10.1093/nar/gkx1132 .

Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 2017;45:D353–61.

Weber GL, Parat M-O, Binder ZA, Gallia GL, Riggins GJ. Abrogation of PIK3CA or PIK3R1 reduces proliferation, migration, and invasion in glioblastoma multiforme cells. Oncotarget. 2011;2:833–49.

Brahm CG, Walenkamp AME, Linde MEV, Verheul HMW, Stephan R, Fehrmann N. Identification of novel therapeutic targets in glioblastoma with functional genomic mRNA profiling. J Clin Oncol [Internet]. 2017;35 Available from: https://ascopubs.org/doi/10.1200/JCO.2017.35.15_suppl.2018 .

Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, et al. Human protein reference database--2009 update. Nucleic Acids Res. 2009;37:D767–72.

Zadeh Shirazi A, Fornaciari E, Bagherian NS, Ebert LM, Koszyca B, Gomez GA. DeepSurvNet: deep survival convolutional network for brain cancer survival rate classification based on histopathological images. Med Biol Eng Comput [Internet]. 2020; [cited 2020 Apr 6]; Available from: http://link.springer.com/10.1007/s11517-020-02147-3 .

Bychkov D, Linder N, Turkki R, Nordling S, Kovanen PE, Verrill C, et al. Deep learning based tissue analysis predicts outcome in colorectal cancer. Sci Rep. 2018;8:3395.

Tabibu S, Vinod PK, Jawahar CV. Pan-renal cell carcinoma classification and survival prediction from histopathology images using deep learning. Sci Rep. 2019;9:10509.

Saillard C, Schmauch B, Laifa O, Moarii M, Toldo S, Zaslavskiy M, et al. Predicting survival after hepatocellular carcinoma resection using deep-learning on histological slides. Hepatology. 2020;72(6):2000–13.

Courtiol P, Tramel EW, Sanselme M, Wainrib G. Classification and disease localization in histopathology using only global labels: a weakly-supervised approach. ArXiv180202212 Cs Stat [Internet]. 2020; [cited 2020 Apr 9]; Available from: http://arxiv.org/abs/1802.02212 .

Lundberg SM, Nair B, Vavilala MS, Horibe M, Eisses MJ, Adams T, et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat Biomed Eng. 2018;2:749–60.

Shao W, Cheng J, Sun L, Han Z, Feng Q, Zhang D, et al. Ordinal multi-modal feature selection for survival analysis of early-stage renal cancer. In: Frangi AF, Schnabel JA, Davatzikos C, Alberola-López C, Fichtinger G, editors. Med Image Comput Comput Assist Interv – MICCAI 2018 [Internet]. Cham: Springer International Publishing; 2018. p. 648–56. [cited 2020 Apr 21]. Available from: http://link.springer.com/10.1007/978-3-030-00934-2_72 .

Ning Z, Pan W, Chen Y, Xiao Q, Zhang X, Luo J, et al. Integrative analysis of cross-modal features for the prognosis prediction of clear cell renal cell carcinoma. Schwartz R, editor. Bioinformatics. 2020;36(9):2888–95.

Shao W, Huang K, Han Z, Cheng J, Cheng L, Wang T, et al. Integrative analysis of pathological images and multi-dimensional genomic data for early-stage cancer prognosis. IEEE Trans Med Imaging. 2020;39(1):99–110. https://doi.org/10.1109/TMI.2019.2920608 .

Makiewicz A, Ratajczak W. Principal Components Analysis (PCA). Computers & Geosciences. 1993;19:303–42.

Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559.

Article   PubMed   PubMed Central   CAS   Google Scholar  

Samstein RM, Lee C-H, Shoushtari AN, Hellmann MD, Shen R, Janjigian YY, et al. Tumor mutational load predicts survival after immunotherapy across multiple cancer types. Nat Genet. 2019;51:202–6.

Riviere P, Goodman AM, Okamura R, Barkauskas DA, Whitchurch TJ, Lee S, et al. High tumor mutational burden correlates with longer survival in immunotherapy-naïve patients with diverse cancers. Mol Cancer Ther. 2020;19(10):2139–45. https://doi.org/10.1158/1535-7163.MCT-20-0161 .

Bao X, Zhang H, Wu W, Cheng S, Dai X, Zhu X, et al. Analysis of the molecular nature associated with microsatellite status in colon cancer identifies clinical implications for immunotherapy. J Immunother Cancer. 2020;8:e001437.

Cortes-Ciriano I, Lee S, Park W-Y, Kim T-M, Park PJ. A molecular portrait of microsatellite instability across multiple cancers. Nat Commun. 2017;8:15180.

Kather JN, Pearson AT, Halama N, Jäger D, Krause J, Loosen SH, et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat Med. 2019;25:1054–6.

Runa F, Hamalian S, Meade K, Shisgal P, Gray PC, Kelber JA. Tumor microenvironment heterogeneity: challenges and opportunities. Curr Mol Biol Rep. 2017;3:218–29.

Borst J, Ahrends T, Bąbała N, Melief CJM, Kastenmüller W. CD4+ T cell help in cancer immunology and immunotherapy. Nat Rev Immunol. 2018;18(10):635–47. https://doi.org/10.1038/s41577-018-0044-0 .

Tumeh PC, Harview CL, Yearley JH, Shintaku IP, Taylor EJM, Robert L, et al. PD-1 blockade induces responses by inhibiting adaptive immune resistance. Nature. 2014;515:568–71.

Newman AM, Liu CL, Green MR, Gentles AJ, Feng W, Xu Y, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods. 2015;12:453–7.

Newman AM, Steen CB, Liu CL, Gentles AJ, Chaudhuri AA, Scherer F, et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol. 2019;37:773–82.

Chakravarthy A, Furness A, Joshi K, Ghorani E, Ford K, Ward MJ, et al. Pan-cancer deconvolution of tumour composition using DNA methylation. Nat Commun. 2018;9:3220.

Klauschen F, Müller K-R, Binder A, Bockmayr M, Hägele M, Seegerer P, et al. Scoring of tumor-infiltrating lymphocytes: from visual estimation to machine learning. Semin Cancer Biol. 2018;52(Pt 2):151–7. https://doi.org/10.1016/j.semcancer.2018.07.001 .

Lopez R, Regier J, Cole MB, Jordan MI, Yosef N. Deep generative modeling for single-cell transcriptomics. Nat Methods. 2018;15:1053–8.

Amodio M, van Dijk D, Srinivasan K, Chen WS, Mohsen H, Moon KR, et al. Exploring single-cell data with deep multitasking neural networks. Nat Methods. 2019;16:1139–45.

Deng Y, Bao F, Dai Q, Wu LF, Altschuler SJ. Scalable analysis of cell-type composition from single-cell transcriptomics using deep recurrent learning. Nat Methods. 2019;16:311–4.

Fan J, Slowikowski K, Zhang F. Single-cell transcriptomics in cancer: computational challenges and opportunities. Exp Mol Med. 2020;52:1452–65.

Ståhl PL, Salmén F, Vickovic S, Lundmark A, Navarro JF, Magnusson J, et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. 2016;353:78–82.

Gerlinger M, Rowan AJ, Horswell S, Larkin J, Endesfelder D, Gronroos E, et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med. 2012;366:883–92.

Yoosuf N, Navarro JF, Salmén F, Ståhl PL, Daub CO. Identification and transfer of spatial transcriptomics signatures for cancer diagnosis. Breast Cancer Res. 2020;22:6.

Vivarelli S, Salemi R, Candido S, Falzone L, Santagati M, Stefani S, et al. Gut microbiota and cancer: from pathogenesis to therapy. Cancers. 2019;11:38.

Cammarota G, Ianiro G, Ahern A, Carbone C, Temko A, Claesson MJ, et al. Gut microbiome, big data and machine learning to promote precision medicine for cancer. Nat Rev Gastroenterol Hepatol. 2020;17:635–48.

Relling MV, Evans WE. Pharmacogenomics in the clinic. Nature. 2015;526(7573):343–50. https://doi.org/10.1038/nature15817 .

Adam G, Rampášek L, Safikhani Z, Smirnov P, Haibe-Kains B, Goldenberg A. Machine learning approaches to drug response prediction: challenges and recent progress. NPJ Precis Oncol. 2020;4(1):19. https://doi.org/10.1038/s41698-020-0122-1 .

Kalinin AA, Higgins GA, Reamaroon N, Soroushmehr S, Allyn-Feuer A, Dinov ID, et al. Deep learning in pharmacogenomics: from gene regulation to patient stratification. Pharmacogenomics. 2018;19:629–50.

Chiu Y-C, Chen H-IH, Gorthi A, Mostavi M, Zheng S, Huang Y, et al. Deep learning of pharmacogenomics resources: moving towards precision oncology. Brief Bioinform. 2020;21(6):2066–83.

Rampášek L, Hidru D, Smirnov P, Haibe-Kains B, Goldenberg A. Dr.VAE: improving drug response prediction via modeling of drug perturbation effects. Schwartz R, editor. Bioinformatics. 2019;35:3743–3751.

Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–40.

Kompa B, Snoek J, Beam AL. Second opinion needed: communicating uncertainty in medical machine learning. NPJ Digit Med. 2021;4:4.

Wang H, Yeung D-Y. Towards Bayesian deep learning: a framework and some existing methods. IEEE Trans Knowl Data Eng. 2016;28:3395–408.

Danaee P, Ghaeini R, Hendrix DA. A deep learning approach for cancer detection and relevant gene identification. Biocomput 2017 [Internet]. Kohala Coast: World Scientific; 2017. p. 219–29. [cited 2021 May 10]. Available from: http://www.worldscientific.com/doi/abs/10.1142/9789813207813_0022

Khairnar P, Thiagarajan P, Ghosh S. A modified Bayesian convolutional neural network for breast histopathology image classification and uncertainty quantification. ArXiv201012575 Cs Eess [Internet]. 2020; [cited 2021 May 10]; Available from: http://arxiv.org/abs/2010.12575 .

Abdar M, Samami M, Mahmoodabad SD, Doan T, Mazoure B, Hashemifesharaki R, et al. Uncertainty quantification in skin cancer classification using three-way decision-based Bayesian deep learning. Comput Biol Med. 2021;135:104418.


Acknowledgements

Khoa Tran was the recipient of the Maureen and Barry Stevenson PhD Scholarship; we are grateful to Maureen Stevenson for her support.

We would also like to thank Rebecca Johnston for her scientific advice and intellectual discussions.

Nicola Waddell is supported by a National Health and Medical Research Council of Australia (NHMRC) Senior Research Fellowship (APP1139071).

Author information

Authors and Affiliations

Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006, Australia

Khoa A. Tran, Olga Kondrashova, John V. Pearson & Nicola Waddell

School of Biomedical Sciences, Faculty of Health, Queensland University of Technology (QUT), Brisbane, 4059, Australia

Khoa A. Tran & Elizabeth D. Williams

Australian Prostate Cancer Research Centre - Queensland (APCRC-Q) and Queensland Bladder Cancer Initiative (QBCI), Brisbane, 4102, Australia

Elizabeth D. Williams

Faculty of Engineering, Queensland University of Technology (QUT), Brisbane, 4000, Australia

Andrew Bradley


Contributions

Khoa Tran, Olga Kondrashova and Nicola Waddell co-wrote the paper. Andrew Bradley, Elizabeth Williams and John Pearson reviewed and edited the paper. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Nicola Waddell .

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

John V Pearson and Nicola Waddell are co-founders and Board members of genomiQa. The remaining authors declare that they have no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article.

Tran, K.A., Kondrashova, O., Bradley, A. et al. Deep learning in cancer diagnosis, prognosis and treatment selection. Genome Med 13 , 152 (2021). https://doi.org/10.1186/s13073-021-00968-x

Received : 13 December 2020

Accepted : 12 September 2021

Published : 27 September 2021

DOI : https://doi.org/10.1186/s13073-021-00968-x


  • Multi-modal learning
  • Explainability
  • Cancer genomics
  • Molecular subtypes
  • Pharmacogenomics

Genome Medicine

ISSN: 1756-994X


A Review of Deep Learning Research

  • Ruihui Mu, Xiaoqin Zeng
  • Published in KSII Transactions on Internet and Information Systems, 29 April 2019
  • DOI: 10.3837/tiis.2019.04.001
  • Corpus ID: 153311019

RESEARCH PROPOSAL DEEP LEARNING

Get your deep learning proposal written by highly trained professionals. Your passion for your areas of interest will be clearly reflected in the proposal, so choose an expert to provide you with custom research proposal work. To understand the current state of the art, its historical context, and its future scope, we conduct a literature survey in Deep Learning (DL).

  • Define Objectives:
  • Clearly state what we need to accomplish and the scope of our review.
  • For example, take transformers in Natural Language Processing (NLP) and note their specific tasks and issues.
  • Primary Sources:
  • Research Databases: We can use databases such as Google Scholar, arXiv, PubMed (for biomedical papers), IEEE Xplore, and others.
  • Conferences: NeurIPS, ICML, ICLR, CVPR, ICCV, ACL, and EMNLP are the major conferences in DL.
  • Journals: The Journal of Machine Learning Research (JMLR) and Neural Computation frequently publish DL-related studies.
  • Start with Reviews and Surveys:
  • Find the latest survey and review papers in our area of interest; they give an outline of the literature and frequently cite both seminal and recent works.
  • For instance, begin with a survey paper on Convolutional Neural Network (CNN) architectures if we are researching CNNs.
  • Reading Papers:
  • Skim: Begin by reading abstracts, introductions, conclusions, and figures.
  • Deep Dive: When a study is highly similar to our work, look in depth at its methodology, experiments, and results.
  • Take Notes: Write down the basic ideas, methods, datasets, evaluation metrics, and open issues described in each paper.
  • Forward and Backward Search:
  • Forward: We can see how the area is evolving by using tools such as Google Scholar's "Cited by" feature to find the latest papers building on our research.
  • Backward: We can track the development of designs by following the references that give more background on our study.
  • Organize and Combine:
  • Classify the papers by theme, methodology, and chronology.
  • Analyze the trends, patterns, and gaps in the literature.
  • Keep Updated:
  • Because DL is a fast-moving area, set up notifications on platforms such as Google Scholar and arXiv for keywords related to our topic, so we see recent publications as they appear.
  • Tools and Platforms:
  • Utilize tools such as Mendeley, Zotero, and EndNote for managing and citing papers.
  • Find similar papers through AI-driven suggestions on the Semantic Scholar platform.
  • Engage with the Community:
  • Join mailing lists, social media groups, and online conferences related to DL. Websites such as Reddit's r/MachineLearning or the AI Alignment Forum frequently surface the latest papers.
  • Attending webinars, workshops, and meetings helps us gain skills with recent techniques and learn what the community considers essential.
  • Report and Share:
  • If we want to publish, we can make annotated bibliographies, presentations, and review papers based on our motive, and document the research.
  • Sharing our findings helps others and establishes us as skilled in the topic.

            The objective of this review process is to critically identify and integrate the current content of the area. Though it is time-consuming work, it is invaluable for anyone who aims to do research and follow the latest work in DL.
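The "Keep Updated" step above can be partially automated with the public arXiv API. Below is a minimal sketch: the function name, keyword list, and result count are our own illustrative choices; the returned Atom feed can be fetched with any HTTP client and parsed for titles and links.

```python
from urllib.parse import urlencode

def arxiv_query_url(keywords, max_results=20):
    """Build an arXiv API URL that returns the newest papers
    matching all of the given keywords, as an Atom feed."""
    query = " AND ".join(f'all:"{k}"' for k in keywords)
    params = {
        "search_query": query,
        "sortBy": "submittedDate",   # newest first
        "sortOrder": "descending",
        "start": 0,
        "max_results": max_results,
    }
    return "http://export.arxiv.org/api/query?" + urlencode(params)

url = arxiv_query_url(["deep learning", "reservoir computing"], max_results=5)
```

Polling such a URL on a schedule (arXiv asks clients to keep requests several seconds apart) gives a simple keyword alert without relying on a third-party service.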

Deep Learning project face recognition with python OpenCV

            Designing a face recognition system using Python and OpenCV is a great project that introduces us to the world of computer vision and DL. The following is a step-by-step guide to constructing a simple face recognition system:

  • Install Necessary Libraries

Make sure that we have the required libraries installed (the LBPH recognizer used below lives in the contrib build of OpenCV):

pip install opencv-contrib-python pillow numpy

  • Capture Faces

We require a dataset for training: we can utilize a pre-existing dataset or capture our own using OpenCV.

import cv2
import os

os.makedirs('faces', exist_ok=True)
cam = cv2.VideoCapture(0)
detector = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
id = input('Enter user ID: ')
sampleNum = 0
while True:
    ret, img = cam.read()
    if not ret:
        break
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in faces:
        sampleNum += 1
        cv2.imwrite(f"faces/User.{id}.{sampleNum}.jpg", gray[y:y+h, x:x+w])
        cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)
        cv2.waitKey(100)
    cv2.imshow('Capture', img)
    cv2.waitKey(1)
    if sampleNum > 20:  # stop after 20 face samples
        break
cam.release()
cv2.destroyAllWindows()

  • Training the Recognizer

OpenCV has a built-in face recognizer. For this example, we’ll use the LBPH (Local Binary Pattern Histogram) face recognizer.

import os
import cv2
import numpy as np
from PIL import Image

path = 'faces'
recognizer = cv2.face.LBPHFaceRecognizer_create()
detector = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def getImagesAndLabels(path):
    imagePaths = [os.path.join(path, f) for f in os.listdir(path)]
    faceSamples = []
    ids = []
    for imagePath in imagePaths:
        PIL_img = Image.open(imagePath).convert('L')  # grayscale
        img_numpy = np.array(PIL_img, 'uint8')
        id = int(os.path.split(imagePath)[-1].split(".")[1])
        faces = detector.detectMultiScale(img_numpy)
        for (x, y, w, h) in faces:
            faceSamples.append(img_numpy[y:y+h, x:x+w])
            ids.append(id)
    return faceSamples, np.array(ids)

faces, ids = getImagesAndLabels(path)
recognizer.train(faces, ids)
os.makedirs('trainer', exist_ok=True)
recognizer.save('trainer/trainer.yml')
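For intuition on what the LBPH recognizer computes: each pixel receives an 8-bit Local Binary Pattern code by comparing it with its 3x3 neighborhood, and the recognizer then histograms these codes over a grid of cells. The sketch below shows just the LBP coding step in NumPy; it is a simplification for illustration, not OpenCV's actual implementation.

```python
import numpy as np

def lbp_codes(gray):
    """Basic 3x3 LBP: for every interior pixel, set one bit per
    neighbor that is >= the center pixel (8 neighbors, fixed order)."""
    center = gray[1:-1, 1:-1].astype(np.int32)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]  # clockwise
    codes = np.zeros(center.shape, dtype=np.int32)
    h, w = gray.shape
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx].astype(np.int32)
        codes |= (neighbor >= center).astype(np.int32) << bit
    return codes.astype(np.uint8)

# A dark center pixel surrounded by brighter neighbors gets code 255
patch = np.array([[9, 9, 9], [9, 1, 9], [9, 9, 9]], dtype=np.uint8)
codes = lbp_codes(patch)
```

Because the codes depend only on intensity ordering, not absolute brightness, LBPH is fairly robust to lighting changes, which is why it remains a reasonable classical baseline.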

  • Recognizing Faces

import cv2

recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.read('trainer/trainer.yml')
cascadePath = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
faceCascade = cv2.CascadeClassifier(cascadePath)
font = cv2.FONT_HERSHEY_SIMPLEX
cam = cv2.VideoCapture(0)
minW = 0.1 * cam.get(3)  # minimum face width: 10% of frame width
minH = 0.1 * cam.get(4)  # minimum face height: 10% of frame height
while True:
    ret, img = cam.read()
    if not ret:
        break
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(
        gray,
        scaleFactor=1.2,
        minNeighbors=5,
        minSize=(int(minW), int(minH)),
    )
    for (x, y, w, h) in faces:
        id, confidence = recognizer.predict(gray[y:y+h, x:x+w])
        # LBPH reports a distance: lower means a better match
        if confidence < 100:
            confidence = f"  {round(100 - confidence)}%"
        else:
            id = "unknown"
        cv2.putText(img, str(id), (x+5, y-5), font, 1, (255, 255, 255), 2)
        cv2.putText(img, str(confidence), (x+5, y+h-5), font, 1, (255, 255, 0), 1)
    cv2.imshow('Face Recognition', img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cam.release()
cv2.destroyAllWindows()

Make sure the proper directories (faces and trainer) exist before running the scripts. This is a basic face recognition system; it can be strengthened with DL models for better accuracy and robustness against varying real-time conditions. To achieve better accuracy, explore recent DL-based techniques such as FaceNet or pre-trained models from DL frameworks.
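As a sketch of the embedding-based alternative mentioned above: a model such as FaceNet maps each face image to a vector (commonly 128-D), and recognition reduces to nearest-neighbor search under a distance threshold. The function below assumes such embeddings are already computed by some model; the gallery names, vectors, and threshold value are illustrative stand-ins.

```python
import numpy as np

def match_face(query_emb, gallery, threshold=0.9):
    """Return (identity, distance) for the closest gallery embedding,
    or ('unknown', distance) if nothing is within the threshold."""
    best_name, best_dist = "unknown", float("inf")
    for name, emb in gallery.items():
        dist = float(np.linalg.norm(np.asarray(query_emb) - np.asarray(emb)))
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist >= threshold:
        best_name = "unknown"
    return best_name, best_dist

# Toy 4-D "embeddings" standing in for real model outputs
gallery = {"alice": np.array([1.0, 0.0, 0.0, 0.0]),
           "bob":   np.array([0.0, 1.0, 0.0, 0.0])}
```

The advantage over LBPH is that a new person can be enrolled by storing one embedding, with no retraining of the model.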

Deep learning MS Thesis topics

Have a conversation with our faculty members to get the best topics that match your interests. Some unique topic ideas are shared below; contact us for more support.

RESEARCH PROPOSAL DEEP LEARNING BRILLIANT PROJECT IDEAS

  • Modulation Recognition based on Incremental Deep Learning
  • Fast Channel Analysis and Design Approach using Deep Learning Algorithm for 112Gbs HSI Signal Routing Optimization
  • Deep Learning of Process Data with Supervised Variational Auto-encoder for Soft Sensor
  • Methodological Principles for Deep Learning in Software Engineering
  • Recent Trends in Deep Learning for Natural Language Processing and Scope for Asian Languages
  • Adding Context to Source Code Representations for Deep Learning
  • Weekly Power Generation Forecasting using Deep Learning Techniques: Case Study of a 1.5 MWp Floating PV Power Plant
  • A Study of Deep Learning Approaches and Loss Functions for Abundance Fractions Estimation
  • A Trustless Federated Framework for Decentralized and Confidential Deep Learning
  • Research on Financial Data Analysis Based on Applied Deep Learning in Quantitative Trading
  • A Deep Learning model for day-ahead load forecasting taking advantage of expert knowledge
  • Locational marginal price forecasting using Transformer-based deep learning network
  • H-Stegonet: A Hybrid Deep Learning Framework for Robust Steganalysis
  • Comparison of Deep Learning Approaches for Sentiment Classification
  • An Unmanned Network Intrusion Detection Model Based on Deep Reinforcement Learning
  • Indoor Object Localization and Tracking Using Deep Learning over Received Signal Strength
  • Analysis of Deep Learning 3-D Imaging Methods Based on UAV SAR
  • Research and improvement of deep learning tool chain for electric power applications
  • Hybrid Intrusion Detector using Deep Learning Technique
  • Non-Trusted user Classification-Comparative Analysis of Machine and Deep Learning Approaches

Why Work With Us ?

Senior research members, research experience, journal membership, book publishing, research ethics, business ethics, valid references, explanations, and paper publication: nine big reasons to select us.

Our Editor-in-Chief owns the website, oversees every aspect of PhD Direction for scholars and students, and personally ensures all client work is fully managed.

Our world-class certified experts have 18+ years of experience in research and development programs (industrial research) and have helped as many scholars as possible develop strong PhD research projects.

We are associated with 200+ reputed SCI- and Scopus-indexed journals (SJR ranking) for getting research work published in standard journals (your first-choice journal).

PhDdirection.com is the world's largest book-publishing platform; it works predominantly by subject category to assist scholars and students in writing their books and placing them in university libraries.

Our researchers uphold the required research ethics: confidentiality and privacy, novelty (valuable research), plagiarism-free work, and timely delivery. Our customers are free to examine their current research activities at any time.

Our organization prioritizes customer satisfaction, online and offline support, and professional delivery, since these are the real inspiring business factors.

Solid work is delivered by our young, qualified, global research team. References are the key to easier evaluation, because we carefully assess our scholars' findings.

Detailed videos, README files, and screenshots are provided for all research projects. We provide TeamViewer support and other online channels for project explanation.

Publication in worthy journals such as IEEE, ACM, Springer, IET, and Elsevier is our main focus. We substantially reduce scholars' burden on the publication side and carry them from initial submission to final acceptance.



PhD Proposal: A Reconciliation of Deep Learning and the Brain: Towards Hybrid Biologically-Augmented Recurrent Neural Networks for Temporal Sensory Perception

In 1958, Frank Rosenblatt conceived of the Perceptron in an effort to fulfill the dream of connectionism: to explain and recreate brain phenomena such as learning and behavior through simple learning rules of simple neurons. After his tragic death, the A.I. winter, and the resurgence that followed, his more brain-focused network was distilled into the more standardized feed-forward deep multi-layer perceptrons, or deep artificial neural networks, that we are more familiar with today. However, even in proposing the perceptron, Rosenblatt hinted that it was really hierarchical and temporal information that was interesting: an intuitively clear point, as all the data we experience is in the temporal domain. Backpropagation continues to dominate, although it appears to be ill-fitted for recurrent network training. This is reinforced by the fact that backpropagation-trained feed-forward Transformer networks outperform RNNs on temporal tasks, causing RNNs to lose favor in the ML and AI communities for temporal data. Reservoir computing, a type of recurrent neural network that keeps random recurrent connections but trains only a readout layer, avoids the pitfalls of backpropagation with recurrence while showing strong performance, but needs further development to compete with the state of the art, especially hierarchical or deep variants.

My proposed dissertation aims to pick up where the perceptron left off, in motivation and spirit, continuing to look to the brain to construct networks via the connectionist philosophy.
Still believing that recurrent connections will be a powerful tool for temporal learning, and looking to the biology, I will propose a new class of recurrent neural networks, which I call B-RNNs, short for Biologically-Augmented RNNs. These will build off the success of the reservoir computing paradigm, but go steps further by incorporating reservoirs into new hybrid hierarchical architectures trained by new backpropagation alternatives and deep reinforcement learning, and composed of new insights and findings from neuroscience. The B-RNN nomenclature will also serve as a taxonomical umbrella for these networks. I will show in completed work that even simple hybridizations can beat deep LSTMs and GRUs at complex temporal classification tasks, and propose several more complex B-RNNs in development and beyond. I will also lay out a framework for how B-RNNs can serve as a standardization for spiking neural network architectures.

Examining Committee:

Chair: Dr. Yiannis Aloimonos
Dept. rep: Dr. James Reggia
Members: Dr. Cornelia Fermüller, Dr. Michelle Girvan, Dr. Daniel Butts
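The reservoir-computing idea central to this proposal fits in a few lines of NumPy: drive a fixed random recurrent network with the input, and train only a linear readout (here by ridge regression) for next-step prediction of a toy signal. This is an illustrative echo-state-network sketch, not the proposed B-RNN architecture; the sizes, spectral radius, and regularization constant are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 500

# Fixed random reservoir; only the readout below is ever trained.
W_in = rng.uniform(-0.5, 0.5, N)
W = rng.normal(0.0, 1.0, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius ~0.9

u = np.sin(0.1 * np.arange(T + 1))               # toy input signal
x = np.zeros(N)
states = np.zeros((T, N))
for t in range(T):
    x = np.tanh(W_in * u[t] + W @ x)             # reservoir update
    states[t] = x

# Ridge-regression readout: predict the next input value
y = u[1:T + 1]
lam = 1e-6
W_out = np.linalg.solve(states.T @ states + lam * np.eye(N), states.T @ y)
pred = states @ W_out
mse = float(np.mean((pred[100:] - y[100:]) ** 2))  # skip initial washout
```

Because the recurrent weights are never updated, training reduces to a single linear solve; the proposal's hybrids keep this cheap recurrent core while changing the hierarchy and readout training.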

Deep Reinforcement Learning of Region Proposal Networks for Object Detection

Research areas: Machine Perception, Machine Intelligence

Deep Learning Research Proposal

Developing a deep learning project proposal involves describing a research question or problem, demonstrating its importance, and suggesting an approach to overcome it. Below we describe the common flow of deep-learning project proposals:

  • Title: It should be effective, clear, and brief.
  • Introduction:
  • Background: We offer the relevant research context. What is the current state of deep learning in our area of interest?
  • Problem statement: The specific problems and issues we intend to overcome are properly described. Why are they important?
  • Objective: We discuss the major goals of our research.
  • Literature Survey:
  • For the specified problem, we describe previous research solutions and techniques.
  • We point out the research gaps: techniques or concepts that need further enhancement.
  • We examine initial experimental analyses and results that motivate our research.
  • Research Questions and Hypotheses:
  • We state the particular questions we intend to answer and the hypotheses we will examine.
  • Methodology:
  • Data: The datasets used and the processes involved, such as data gathering, preprocessing, and data augmentation.
  • Framework: The neural network architecture we use or construct.
  • Training: The training process, including loss functions, optimization, regularization methods, etc.
  • Evaluation: How we are going to assess the model's efficiency. What metrics will we use?
  • Software and Hardware: The libraries, tools, and computing resources used.
  • Preliminary Findings (if any):
  • We can discuss basic experimental analyses and results that motivate our research or demonstrate its feasibility.
  • Significance and Impact:
  • The important contributions of our research. Who will benefit from it? How might it advance the field?
  • Timeline: The time allotted to the various stages of our project.
  • Budget (if applicable):
  • The estimated costs, such as data acquisition, computing resources, and software licenses.
  • Potential Challenges and Mitigation Plans:
  • The anticipated problems are highlighted, along with our strategies to overcome them.
  • Conclusion:
  • We summarize the essential aspects of our proposal and restate its significance.
  • References:
  • We cite all references, including articles, papers, and resources.
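For the Evaluation item above, the usual classification metrics are easy to state precisely. A minimal plain-Python sketch for binary labels (the function name is ours, and libraries such as scikit-learn provide the same metrics ready-made):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    acc = (tp + tn) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}

scores = binary_metrics([1, 1, 0, 0], [1, 0, 1, 0])
```

Naming the exact metric (and why it suits the data, e.g. F1 for class imbalance) makes the evaluation section of a proposal concrete and reviewable.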

       Tips:

  • Clarity: We need to check that the proposal is understandable and free of jargon; it should be interpretable even by people who are not well versed in our research field.
  • Rigour: The exactness of our proposed techniques and concepts is very important.
  • Feedback: Seek reviews from professionals, associations, and staff to confirm the approach is sound before submitting the proposal.

We take into account the particular needs and conditions of the association, institution, or research group to which you are submitting your project proposal, and we follow their rules and instructions properly. A plagiarism-free paper will be provided; we detect plagiarism with leading tools such as Turnitin to assure your success.

Which research areas does deep learning cover?

            Deep learning has been utilized across many research areas. Listed below are the innovative deep-learning fields and sub-fields in which we frequently work and achieve success:

  • Natural Language Processing (NLP):
  • Speech Recognition and Generation
  • Question-Answering
  • Language Modeling (for instance: Transformers)
  • Named Entity Recognition
  • Sentiment Analysis
  • Text Summarization
  • Machine Translation
  • Healthcare and Biomedical:
  • Drug Findings
  • Predictive Analytics for patient Care
  • Medical Image Analysis
  • Genomic Sequence Analysis
  • Finance:
  • Algorithmic Trading
  • Credit Scoring
  • Fraud Detection
  • Computer Vision:
  • Image Categorization
  • Facial Recognition
  • Object Identification and Segmentation
  • Image Generation (For example: GANs)
  • Image to Image Translation
  • Super-Resolution
  • Art and Creativity:
  • Art Generation
  • Music Composition
  • Style Transfer
  • Anomaly Identification:
  • Network Security
  • Industrial Defect Identification
  • Multimodal Learning:
  • Combining Information from Various Sources
  • Cross-modal Transfer Learning
  • Robotics:
  • Robot Navigation
  • Manipulation Tasks
  • Ethics and Fairness:
  • Bias Identification in Frameworks
  • Interpretability & Explainability of deep Models
  • Generative Models:
  • Generative Adversarial Networks (GANs)
  • Variational Autoencoders (VAEs)
  • Agriculture:
  • Identification of Crop Disease
  • Precision Agriculture
  • Reinforcement Learning:
  • Robotics Control
  • Game Playing (For instance: AlphaGo, OpenAI Five)
  • Optimization Issues
  • Neuroscience:
  • Neural Signal processing
  • Brain-Computer Interfaces
  • Time Series Analysis:
  • Weather Prediction
  • Stock Price Forecasting
  • Audio and Speech processing:
  • Speech Recognition
  • Speech Synthesis
  • Audio Categorization
  • Music Generation
  • Autonomous Systems:
  • Drone Navigation
  • Self-Driving Cars
  • IoT and Edge Devices:
  • Activity Recognition
  • On-Device ML for Smart Devices

These concepts are subdivisions of the many deep-learning research topics. They show that deep learning methods can be employed across a wide range of areas, making full use of their efficiency and capability.

Where can I find online deep learning projects?

If you are reading this page, you are looking for deep learning research help. Thought-provoking research assistance is provided online no matter where you are; we assist scholars globally, as our framework is reliable for all. Thesis topics and thesis ideas will be shared by our professional experts, so contact us soon.

  • Recognition and classification of mathematical expressions using machine learning and deep learning methods
  • Prediction of Subscriber VoLTE using Machine Learning and Deep Learning
  • Light-Weight Design and Implementation of Deep Learning Accelerator for Mobile Systems
  • Multifarious Face Attendance System using Machine Learning and Deep Learning
  • A bert model for sms and twitter spam ham classification and comparative study of machine learning and deep learning technique
  • A Comparative Analysis for Leukocyte Classification Based on Various Deep Learning Models Using Transfer Learning
  • Machine Learning Based Real-Time Industrial Bin-Picking: Hybrid and Deep Learning Approaches
  • Insight on Human Activity Recognition Using the Deep Learning Approach
  • A Comprehensive Survey of Trending Tools and Techniques in Deep Learning
  • The Advance of the Combination Method of Machine Learning and Deep Learning
  • Research and Discussion on Image Recognition and Classification Algorithm Based on Deep Learning
  • An Intelligent Anti-jamming Decision-making Method Based on Deep Reinforcement Learning for Cognitive Radar
  • Conv2D Xception Adadelta Gradient Descent Learning Rate Deep learning Optimizer for Plant Species Classification
  • Beyond the Bias Variance Trade-Off: A Mutual Information Trade-Off in Deep Learning
  • Machine Learning and Deep Learning framework with Feature Selection for Intrusion Detection
  • Transfer Learning with Shapeshift Adapter: A Parameter-Efficient Module for Deep Learning Model
  • Deep Learning Network for Object Detection Under the Poor Lighting Condition
  • Sign Language Recognizer: A Deep Learning Approach
  • Ensemble Deep Learning Applied to Predict Building Energy Consumption
  • Will Deep Learning Change How Teams Execute Big Data Projects?

MILESTONE 1: Research Proposal

Finalize Journal (Indexing)

Before sitting down to write the research proposal, we need to decide on the exact journal indexing, e.g. SCI, SCI-E, ISI, or Scopus.

Research Subject Selection

As a doctoral student, subject selection is a big problem. Phdservices.org has a team of world-class experts experienced in assisting with all subjects. When you decide to work in networking, we assign experts in your specific area for assistance.

Research Topic Selection

We help you with the right and perfect topic selection, one that sounds interesting to the other fellows of your committee. For example, if your interest is in networking, the research topic could be VANET, MANET, or any other.

Literature Survey Writing

To ensure the novelty of the research, we find research gaps in 50+ recent benchmark papers (IEEE, Springer, Elsevier, MDPI, Hindawi, etc.).

Case Study Writing

After the literature survey, we pin down the main issue/problem that your research topic will aim to resolve and provide elegant writing support to establish the relevance of the issue.

Problem Statement

Based on the research gaps found and the importance of your research, we formulate an appropriate and specific problem statement.

Writing the Research Proposal

Writing a good research proposal needs a lot of time. We take only a short span to cover all major aspects: reference-paper collection, deficiency finding, system-architecture drawing, and novelty highlights.

MILESTONE 2: System Development

Fix the Implementation Plan

We prepare a clear project implementation plan that narrates your proposal step by step and contains the software and OS specifications. We recommend very suitable tools/software that fit your concept.

Tools/Plan Approval

We get approval for the implementation tools, software, and programming language, and finally for the implementation plan, before starting the development process.

Pseudocode Description

Our source code is original, since we write the code only after the pseudocode, algorithms, and mathematical equation derivations are complete.

Develop the Proposal Idea

We implement the novel idea in the step-by-step process given in the implementation plan, and we can help scholars with the implementation.

Comparison/Experiments

We perform comparisons between the proposed and existing schemes in both quantitative and qualitative terms, since this is the most crucial part of any journal paper.

Graphs, Results, Analysis Tables

We evaluate and analyze the project results by plotting graphs, computing numerical results, and discussing the quantitative results broadly in tables.

Project Deliverables

For every project order, we deliver the following: reference papers, source code, screenshots, project video, and installation and running procedures.

MILESTONE 3: Paper Writing

Choosing the Right Format

We intend to write the paper in a customized layout. If you are interested in any specific journal, we are ready to support you; otherwise we prepare it at IEEE Transactions level.

Collecting Reliable Resources

Before paper writing, we collect reliable resources such as 50+ journal papers, magazines, news items, encyclopedias (books), benchmark datasets, and online resources.

Writing a Rough Draft

We first create an outline of the paper and then write under each heading and sub-heading. It consists of the novel idea and the resources.

Proofreading & Formatting

We proofread and format the paper to fix typesetting errors and avoid misspelled words, misplaced punctuation marks, and so on.

Native English Writing

We check the communication of the paper by having it rewritten by native English writers who completed their English literature studies at the University of Oxford.

Scrutinizing Paper Quality

We examine the paper quality with top experts who can easily fix issues in journal-paper writing and also confirm the level of the journal paper (SCI, Scopus, or normal).

Plagiarism Checking

We at phdservices.org give a 100% guarantee of original journal-paper writing. We never reuse previously published works.

MILESTONE 4: Paper Publication

Finding an Apt Journal

We play a crucial role in this step, since it is very important for the scholar's future. Our experts will help you choose high-impact-factor (SJR) journals for publishing.

Laying Out the Paper for Submission

We organize your paper for journal submission, which covers the preparation of the authors' biographies, cover letter, highlights of novelty, and suggested reviewers.

Paper Submission

We upload the paper and submit all prerequisites required by the journal. We completely remove the frustration from paper publishing.

Paper Status Tracking

We track your paper's status, answer the questions raised before the review process, and give you frequent updates on responses received from the journal.

Revising the Paper Precisely

When we receive a decision requiring revision, we prepare a point-by-point response to address all reviewer queries and resubmit the paper to secure final acceptance.

Get Acceptance & e-Proofing

We receive the final acceptance confirmation letter, and the editors send the e-proofing and licensing to ensure originality.

Publishing the Paper

Once the paper is published online, we inform you of the paper title, author information, journal name, volume, issue number, page numbers, and DOI link.

MILESTONE 5: Thesis Writing

Identifying university format.

We pay special attention for your thesis writing and our 100+ thesis writers are proficient and clear in writing thesis for all university formats.

Gathering Adequate Resources

We collect primary and adequate resources for writing a well-structured thesis using published research articles, 150+ reputed reference papers, a writing plan, and so on.

Writing Thesis (Preliminary)

We write the thesis chapter by chapter without empirical mistakes, and we deliver a completely plagiarism-free thesis.

Skimming & Reading

Skimming involves reading the thesis and checking the abstract, conclusions, sections and sub-sections, paragraphs, sentences, and words, and writing the thesis in the chronological order of the papers.

Fixing Crosscutting Issues

This step is tricky when a thesis is written by amateurs. Proofreading and formatting are done by our world-class thesis writers, who avoid verbosity and brainstorm for significant writing.

Organize Thesis Chapters

We organize the thesis chapters by elaborating each chapter, structuring the chapters, improving the flow of writing, correcting citations, etc.

Writing Thesis (Final Version)

We pay attention to the importance of the thesis contribution, a well-illustrated literature review, sharp and broad results and discussion, and a relevant applications study.

How does PhDservices.org deal with significant issues?

1. Novel Ideas

Novelty is essential for a PhD degree. Our experts bring novel ideas to your particular research area. Novelty can only be determined after a thorough literature search (state-of-the-art works published in IEEE, Springer, Elsevier, ACM, ScienceDirect, Inderscience, and so on). Reviewers and editors of SCI and Scopus journals always demand novelty in each published work. Our experts have in-depth knowledge of all major research fields and sub-fields and can introduce new methods and ideas. MAKING NOVEL IDEAS IS THE ONLY WAY OF WINNING A PHD.

2. Plagiarism-Free

To preserve the quality and originality of our work, we strictly avoid plagiarism, since plagiarism is not acceptable to the editors and reviewers of any type of journal (SCI, SCI-E, or Scopus). We use anti-plagiarism software that measures the similarity score of documents with good accuracy, including tools such as Viper and Turnitin. Students and scholars receive their work with zero tolerance for plagiarism. DON'T WORRY ABOUT YOUR PHD; WE WILL TAKE CARE OF EVERYTHING.

3. Confidential Info

We keep your personal and technical information secret, since confidentiality is a basic worry for all scholars.

  • Technical info: We never share your technical details with any other scholar, since we know the importance of the time and resources scholars give us.
  • Personal info: Access to scholars' personal details is restricted; only our organization's leading team holds your basic and necessary information.

CONFIDENTIALITY AND PRIVACY OF THE INFORMATION WE HOLD IS OF VITAL IMPORTANCE AT PHDSERVICES.ORG. WE ARE HONEST WITH ALL CUSTOMERS.

4. Publication

Most PhD consultancy services end with paper writing, but PhDservices.org is different: we guarantee both paper writing and publication in reputed journals. With our 18+ years of experience delivering PhD services, we meet all the requirements of journals (reviewers, editors, and editors-in-chief) for rapid publication. We lay the groundwork from the beginning of paper writing. PUBLICATION IS THE ROOT OF A PHD DEGREE. WE ARE LIKE THE FRUIT THAT GIVES A SWEET FEELING TO ALL SCHOLARS.

5. No Duplication

After completion of your work, it is not kept in our library; we erase it once your PhD work is done, so we avoid giving duplicate content to scholars. This pushes our experts to bring new ideas, applications, methodologies, and algorithms. Our work is standard, high-quality, and universal; we make everything new for every scholar. INNOVATION IS THE ABILITY TO SEE ORIGINALITY. EXPLORATION IS THE ENGINE THAT DRIVES INNOVATION, SO LET'S ALL GO EXPLORING.

Client Reviews

I ordered a research proposal in the area of wireless communications, and it was as good as I could have hoped for.

I wished to complete my implementation using the latest software and tools and had no idea where to order it. My friend suggested this place, and it delivered what I expected.

It is a really good platform for all PhD services, and I have used it many times because of the reasonable prices, great customer service, and high quality.

My colleague recommended this service to me, and I'm delighted with their services. They guided me a lot and provided worthy content for my research paper.

I'm never disappointed by any of their services. So far I have worked with professional writers and received many opportunities.

- Christopher

Once I joined this organization, I felt relaxed, because many of my colleagues and family members had suggested this service, and I received excellent thesis writing.

I recommend phdservices.org. They have professional writers for every type of writing (proposal, paper, thesis, assignment) at an affordable price.

You guys did a great job and saved me money and time. I will keep working with you, and I recommend you to others as well.

These experts are fast, knowledgeable, and dedicated to working under short deadlines. I got a good conference paper in a short span.

Guys! You are great, real experts at paper writing; it exactly matched my requirements. I will come back again.

I am fully satisfied with the thesis writing. Thank you for your faultless service; I will soon come back again.

Trusted customer service. I don't have any cons to mention.

I was at the edge of my doctoral graduation, since my thesis was a set of totally unconnected chapters. You people worked magic, and I got my complete thesis!!!

- Abdul Mohammed

A good family environment with collaboration, and a hardworking team who actually share their knowledge by offering PhD services.

I greatly enjoyed working with PhD Services. I asked several questions about my system development and was amazed by their smoothness, dedication, and care.

I had not provided any specific requirements for my proposal work, but you guys are awesome: I received a proper proposal. Thank you!

- Bhanuprasad

I read my entire research proposal and liked how the concept suits my research issues. Thank you so much for your efforts.

- Ghulam Nabi

I am extremely happy with your project development support; the source code is easy to understand and execute.

Hi!!! You guys supported me a lot. Thank you, and I am 100% satisfied with the publication service.

- Abhimanyu

I found this to be a wonderful platform for scholars, so I highly recommend this service to everyone. I ordered a thesis proposal, and they covered everything. Thank you so much!!!


Open Access

Peer-reviewed

Research Article

Combination therapy synergism prediction for virus treatment using machine learning models

Roles Methodology, Software, Writing – original draft

Affiliation Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran

Roles Conceptualization, Methodology, Writing – original draft, Writing – review & editing

Affiliation Department of QA, Kimia Zist Parsian Pharmaceutical Company, Zanjan, Iran

Roles Conceptualization, Methodology, Supervision, Writing – original draft, Writing – review & editing

* E-mail: [email protected]


  • Shayan Majidifar, 
  • Arash Zabihian, 
  • Mohsen Hooshmand


  • Published: September 4, 2024
  • https://doi.org/10.1371/journal.pone.0309733


Combining different drugs synergistically is an essential aspect of developing effective treatments. Although there is a plethora of research on computational prediction for new combination therapies, there is limited to no research on combination therapies in the treatment of viral diseases. This paper proposes AI-based models for predicting novel antiviral combinations to treat virus diseases synergistically. To do this, we assembled a comprehensive dataset comprising information on viral strains, drug compounds, and their known interactions. As far as we know, this is the first dataset and learning model on combination therapy for viruses. Our proposal includes using a random forest model, an SVM model, and a deep model to train viral combination therapy. The machine learning models showed the highest performance, and the predicted values were validated by a t-test, indicating the effectiveness of the proposed methods. One of the predicted combinations of acyclovir and ribavirin has been experimentally confirmed to have a synergistic antiviral effect against herpes simplex type-1 virus, as described in the literature.

Citation: Majidifar S, Zabihian A, Hooshmand M (2024) Combination therapy synergism prediction for virus treatment using machine learning models. PLoS ONE 19(9): e0309733. https://doi.org/10.1371/journal.pone.0309733

Editor: Michael Nevels, University of St Andrews, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND

Received: June 2, 2024; Accepted: August 16, 2024; Published: September 4, 2024

Copyright: © 2024 Majidifar et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data and code of Virus Combination Therapy are freely available at https://github.com/BioinformaticsIASBS/CombinationTherapy .

Funding: This work is based upon research funded by Iran National Science Foundation (INSF) under project No.4027788. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Bioinformatics is an interdisciplinary domain spanning biology, mathematics, statistics, and computer science that theoretically and practically explores the field of human health solutions [ 1 ]. In other words, it utilizes the notions and tools of computer science and engineering to analyze, or introduce efficient solutions for working with, biological, medical, and even pharmacological data and information. One aspect of bioinformatics is to assist the drug discovery industry [ 2 ], since drug discovery is an expensive research area that always looks for methods to reduce the cost and time of proposing a new drug for a disease, especially in emergency situations [ 3 ]. Virus-based diseases such as SARS-CoV-2 [ 4 ], Mpox [ 5 ], and MERS-CoV [ 6 , 7 ] confirm the necessity of introducing new treatments as fast as possible. However, the drugs need to be effective with low side effects [ 8 ]. To meet these goals, drug repurposing, a screening method, tries to locate new targets for approved drugs [ 9 ]. First, it uses drugs that are already approved, which therefore have lower side effects and can be trusted in treatments. Additionally, this approach narrows down the search space and consequently the cost and time of introducing a new drug. AI approaches, especially machine learning models, are commonly used in drug repurposing. The proposed drug repurposing methods cover a wide range of approaches, from machine learning, e.g., logistic regression [ 10 ], random forest, support vector machine, and neural networks [ 11 – 14 ], to a spectrum of deep learning methods [ 15 , 16 ] such as DTINet [ 17 ], NeoDTI [ 18 ], HIDTI [ 19 ], MolTrans [ 20 ], and TransDTI [ 21 ]. Certain drug repurposing techniques have focused on predicting new associations between viruses and antivirals [ 22 – 27 ]. All previous methods have only considered single-drug treatments and have not explored the synergistic effects of combining multiple drugs.

However, in addition to its controlling and treating properties, each drug may have side effects, and increasing its dose can therefore cause high-risk issues in the patient [ 28 ]. Moreover, using a higher dose of a drug may cause drug resistance and nullify the treatment's effectiveness [ 29 ]. Drug repurposing has another branch, beyond single drug-target association, that uses more than one drug in the treatment of a target. This is called combination therapy [ 30 ]; it tends to reduce the side effects of drugs, overcome drug resistance, and, more importantly, increase the effect of the treatment, e.g., through synergistic drug pairs [ 31 ]. Therefore, combination therapy aims to improve treatment and drug efficacy [ 32 ].

The first method to check the efficacy of drug-pair combinations is brute-force search. Needless to say, this method is costly and consumes a tremendous amount of time and resources. High-throughput screening is another approach to investigating combination therapy; like brute force, it consumes time and resources heavily. A third approach is computational: methods that investigate the drug space and suggest drug pairs. Machine learning methods of this kind have achieved significant predictive power in this research area [ 33 ].

Computational combination therapy in oncology is an enriched and hot topic nowadays [ 9 , 34 – 38 ]. Preuer et al. used cancer cell line properties, i.e., gene expression, copy number, and gene mutation, along with drug information, including structural and molecular similarities and drug toxicity, from Merck [ 34 ] and proposed a deep network to compute the synergistic score of combined drugs [ 36 ]. Zhang et al. used those entities from the NCI-ALMANAC [ 39 ] that have signaling pathways [ 37 ]. Zhang et al. [ 38 ] and Wang et al. [ 9 ] applied other deep models to new embeddings of cancer cell line properties: the former used an autoencoder to derive new embeddings, and the latter used kernel-based methods to extract meaningful features. Kuru et al. used two deep networks to generate embeddings of drugs and cell lines from DrugCombo [ 40 ], and the new representations were fed to a third deep network for synergistic prediction of drugs for cancer treatment [ 41 ]. Julkunen et al. utilized the NCI-ALMANAC [ 39 ] dataset and noted that previous works on drug combination in oncology had not considered protein properties and biological information of drugs; they then used factorization machines to decompose the information into latent spaces [ 42 ]. Meng et al. used a graph learning method to estimate the synergistic effect of combination therapy [ 43 ].

As mentioned earlier, while combination therapy is a hot field in oncology, there are no general studies of virus treatment using synergistic therapy. Tan et al. proposed a multiplex screening method for HIV treatments [ 44 ]; this work does not use predictive learning models and targets treatment of a single virus. A few studies have proposed combination therapy solutions for SARS-CoV-2 [ 33 , 45 ]. Although both works propose combination therapy using deep models, they are limited to SARS-CoV-2 and provide no general dataset for combination therapy.

This work proposes several machine learning methods for analyzing and evaluating virus-antiviral combination therapy. To accomplish this, we create a dataset containing the characteristics of both viruses and antivirals. We then devise and apply several machine learning methods to evaluate the effect of AI-based methods on the subject. The results are promising, and several new combined drugs for virus treatments are proposed. Based on our knowledge and the literature review, all prior research on virus treatment using combination therapy has been limited to experimental or single-virus treatment; therefore, this is the first study on general virus combination therapy. The contribution of the paper is five-fold:

  • First work on virus combination therapy.
  • First complete dataset on virus combination therapy (CombTVir).
  • Applying machine learning methods and evaluating the results.
  • Applying t-test analysis for statistical analysis and prediction validation.
  • Proposing new combined drugs for virus treatment. Some of these predictions have been confirmed in the literature.

The structure of the paper is as follows. Section Dataset generation describes the properties and aspects of the generated dataset. Section Methods introduces the proposed methods for combination therapy prediction. The results are reported in section Results . Section Conclusion concludes the paper.

Dataset generation

This paper proposes a method for predicting effective antiviral combinations for treating viral diseases. The first step is to find a suitable dataset containing information on antivirals used in combination therapy. Unfortunately, no such dataset is currently available for viruses. Therefore, our paper's first contribution is the creation of a virus combination therapy dataset, which we call the "CombTVir" dataset.

Myhre et al. gathered and reported a list of 541 drug combinations [ 46 – 48 ], of which 372 belong to small molecule-small molecule (SM-SM) synergism, 103 to biotech-biotech synergism, and the remaining 66 to other types of combinations, e.g., SM-biotech. Notably, the combination list was sourced from PubMed or clinical trials, and the selected combinations are derived from in vitro or in vivo experiments or clinical trial phases. We chose those 372 SM-SM combined drugs for the dataset. Before describing the generation of the dataset, it is necessary to clarify the modifications made to the combination therapy list. The list contains HIV and HIV-1 (no HIV-2 was reported in the list); after analyzing the main references of HIV and HIV-1, we treated HIV-1 as equivalent to HIV. Herpes simplex virus (HSV) has two subtypes, HSV-1 and HSV-2, which are genetically highly similar [ 49 ]. Since the dataset did not indicate the HSV subtype, we treated HSV and HSV-1 as the same in this work. Some rows in the dataset are identical, such as the combination of acyclovir with foscarnet on HSV-1, which is repeated twice; the difference between the two rows is whether the experiment was carried out in vitro [ 50 ] or the setting was not reported [ 51 ].

We selected the 372 SM-SM combinations from the dataset and removed all biotech-biotech and biotech-SM combinations, resulting in 44 viruses and 211 drugs being included in the chosen combinations. Table 1 briefly reports the statistics of the dataset. For these 372 SM-SM combinations, we gathered information from NCBI [ 52 ] and DrugBank [ 53 ]. NCBI is the National Center for Biotechnology Information, which provides access to biomedical and genomic information; we gathered the FASTA sequences of the viruses from it. DrugBank is a freely accessible database containing information on drugs and their targets; we collected the SMILES [ 54 ] strings of the drugs from it. Thus, we have information on both drugs and viruses.
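As an illustration of the SM-SM filtering and virus-name normalization described above, a minimal pandas sketch follows. The column names and toy rows are hypothetical; the real CombTVir schema may differ.

```python
import pandas as pd

# Hypothetical schema and toy rows standing in for the combination list.
raw = pd.DataFrame({
    "drug_a_type": ["SM", "SM", "biotech"],
    "drug_b_type": ["SM", "SM", "biotech"],
    "virus":       ["HIV-1", "HSV", "HIV"],
})

# Keep only small molecule-small molecule (SM-SM) combinations.
sm_sm = raw[(raw["drug_a_type"] == "SM") & (raw["drug_b_type"] == "SM")].copy()

# Normalize virus names as described in the text (HIV-1 -> HIV, HSV -> HSV-1).
sm_sm["virus"] = sm_sm["virus"].replace({"HIV-1": "HIV", "HSV": "HSV-1"})

# Drop duplicate rows that differ only in experiment metadata.
sm_sm = sm_sm.drop_duplicates()
```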


https://doi.org/10.1371/journal.pone.0309733.t001


Then, the feature vector of each antiviral is its vector of Tanimoto scores against all antivirals; consequently, the resulting similarity matrix acts as the feature set of the antivirals.
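A minimal sketch of how such a Tanimoto similarity matrix can be built from binary drug fingerprints; the fingerprints here are toy examples, not the actual drug encodings used in the paper.

```python
import numpy as np

def tanimoto(a: np.ndarray, b: np.ndarray) -> float:
    """Tanimoto coefficient between two binary fingerprint vectors."""
    intersection = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(intersection) / union if union else 0.0

def similarity_matrix(fingerprints: np.ndarray) -> np.ndarray:
    """Pairwise Tanimoto similarities; row i is the feature vector of drug i."""
    n = fingerprints.shape[0]
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            sim[i, j] = tanimoto(fingerprints[i], fingerprints[j])
    return sim

# Three toy 4-bit fingerprints.
fps = np.array([[1, 1, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]], dtype=bool)
S = similarity_matrix(fps)
```

Each row of `S` would then serve as the feature vector of the corresponding antiviral.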

Fig 1.

The framework prepares embeddings for each drug and each target based on their similarity information. The corresponding embeddings of each drug-drug-target combination are then concatenated to form the input of the prediction step. The final step uses one of the proposed learning methods, i.e., SVM, RF, or DRaW, to predict the interaction of each pair.

https://doi.org/10.1371/journal.pone.0309733.g001

Support vector machine

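The mathematical formulation of the SVM in the source is an image and is not reproduced here. As a hedged stand-in, the sketch below fits an SVM classifier with the polynomial kernel and C = 10 that the paper's grid search later selects, on synthetic data in place of the concatenated similarity embeddings.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy stand-in for concatenated drug-drug-virus similarity embeddings.
X = rng.normal(size=(200, 16))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic synergy labels

# The paper's grid search selected a polynomial kernel with C = 10.
clf = SVC(kernel="poly", C=10)
clf.fit(X, y)
train_acc = clf.score(X, y)
```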

Random forest

Random forest (RF) is an ensemble machine learning method that utilizes several decision trees, where each tree randomly chooses a subset of features from the feature set. Following the learning phase of the trees, the class with the majority vote is chosen as the predicted label. Using several trees with a random selection of features for each tree neutralizes the overfitting tendency of decision trees; more importantly, the ensemble of trees yields a reliable prediction. This principle makes random forest a high-performance ML method for classification. In this work, the decision trees use the Gini and logloss functions for score computation at each level of the trees [ 62 ].

DRaW–a deep learning method

Fig 2.

The activation function used in the inner layers is ReLU. Finally, the last layer is the classification module, which uses a sigmoid activation function for classification.

https://doi.org/10.1371/journal.pone.0309733.g002
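Only the activation choices are specified in the text (ReLU inner layers, sigmoid classification head). The NumPy sketch below is an illustrative forward pass with arbitrary layer sizes and random weights, not the actual DRaW architecture.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
# Arbitrary layer sizes for illustration: 16 -> 32 -> 8 -> 1.
W1, b1 = rng.normal(size=(16, 32)) * 0.1, np.zeros(32)
W2, b2 = rng.normal(size=(32, 8)) * 0.1, np.zeros(8)
W3, b3 = rng.normal(size=(8, 1)) * 0.1, np.zeros(1)

def forward(x):
    h = relu(x @ W1 + b1)      # inner layers use ReLU
    h = relu(h @ W2 + b2)
    return sigmoid(h @ W3 + b3)  # sigmoid head: probability of synergy

p = forward(rng.normal(size=(4, 16)))  # batch of 4 toy inputs
```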


Algorithm 1 Proposed Deep Model ( DRaW )

Input : A , V , Y , ratio , folds , epochs


1: data ← split(Y, ratio)

2: k-Fold ← stratified-k-Fold(folds)

3: for each fold in k-Fold do

4:  divide data into train and test

5:  for each epoch in epochs do

6:   Model ← Training(A_tr, V_tr, Y_tr)

8:   Loss computation using Eq 5

9:  end for

10: end for

11: Performance evaluation
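The loop structure of Algorithm 1 (stratified folds, per-fold training and evaluation) can be sketched with scikit-learn. A logistic-regression stand-in replaces the DRaW model here, and the data are synthetic.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 8))
y = (X[:, 0] > 0).astype(int)

# Stratified 10-fold split, as in Algorithm 1; a simple classifier
# stands in for the trained model at step 6.
scores = []
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))  # step 11
mean_acc = float(np.mean(scores))
```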

Complexity analysis

Assuming a dataset with m antivirals and n viruses, the complexity analysis is divided into two parts: dataset preparation and generation of feature vectors for antivirals and viruses. As mentioned earlier, we used the Tanimoto score and the sequence-alignment score to create the similarity matrices; the Tanimoto score measures the similarity between two sets, while the sequence-alignment score measures the similarity between two sequences. The Tanimoto computation costs cn^2, where c is a small constant, so the entire procedure completes in a fraction of a second. Performing pairwise sequence alignment for all viruses, by contrast, costs Cn^2 with a huge constant C and is therefore a time-consuming computation. SVM training time complexity lies between O(m^2 n^2) and O(m^3 n^3) depending on the C hyperparameter, and its runtime is O(|G| mn), where |G| is the number of support vectors [ 63 ]. Random forest uses N trees, each with at most V sampled features [ 64 ]; its training time complexity is therefore O(NVmn(log m + log n)) and its runtime is O(Nd), where d is the depth of a tree. DRaW runs for E epochs, each of length T, so its time complexity is O(mnET) asymptotically.

Results

This section provides the results of the proposed methods for virus-antiviral combination therapy. We performed 10-fold stratified cross-validation on a system running the Ubuntu 22.04 LTS operating system, with an Intel Xeon E5 v4 family processor (4 CPU threads), 16 GB of RAM, and 20 GB of storage.


Moreover, we conducted a t-test on the predicted results [ 65 ] for statistical analysis and to evaluate domain applicability. The null hypothesis H0 is that there is no correlation between the original and predicted labels; the alternative hypothesis is that such a correlation exists. Large p-values support H0, and small values reject it.
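One way to run such a correlation test (the exact test variant used in the paper is not detailed here) is via the Pearson correlation, whose significance is computed from a t-distribution. The labels and scores below are synthetic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y_true = rng.integers(0, 2, size=500)
# Toy predicted scores correlated with the true labels.
y_score = y_true + rng.normal(scale=0.5, size=500)

# Pearson correlation significance is a t-test on r;
# H0: no correlation between original and predicted labels.
r, p_value = stats.pearsonr(y_true, y_score)
reject_h0 = p_value < 0.05
```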

We conducted the simulation for several P-to-N sampling ratios, i.e., 1:3, 1:5, 1:10, 1:100, and 1:500. For sampling ratios of 1:10 and lower, the performance of all methods is almost equal and close to perfect; therefore, we report the results for the sampling ratios of 1:100 and 1:500. We performed a grid search over various configurations of SVM and random forest to identify the optimal performance of these ML methods. For SVM, we analyzed three different kernels (linear, poly, and RBF) and evaluated three different values of C for each kernel. The results of this analysis are provided in S4-S6 Tables in S1 File for sampling ratios 1:10, 1:100, and 1:500, respectively. As the results show, the SVM with a poly kernel and C = 10 has the best performance, so we use this SVM model for comparison with the other learning models. Additionally, we evaluated the random forest model using two criteria, Gini and logloss, and tested two different values for the maximum number of features under each criterion. The results of these analyses are presented in S9-S11 Tables in S1 File for sampling ratios 1:10, 1:100, and 1:500, respectively. The results confirm the choice of the random forest with the logloss criterion and a maximum of log(n) features for comparison with the other learning models. These configurations of RF and SVM were then used for all general comparisons.
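A hedged sketch of the SVM grid search described above: the three kernels match the text, while the candidate C values are illustrative, since the paper does not list them here; the data are synthetic.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Kernels mirror the text; the C candidates are example values only.
param_grid = {"kernel": ["linear", "poly", "rbf"], "C": [0.1, 1, 10]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="roc_auc")
search.fit(X, y)
best = search.best_params_  # e.g., the kernel/C pair with the best AUC-ROC
```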

Table 2 shows the metric scores for DRaW, SVM, and random forest at the P-to-N sampling ratio of 1:100. While all methods have the same accuracy, the SVM has the highest AUC-ROC and the random forest the highest AUPR.
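The reported AUC-ROC and AUPR scores can be computed with scikit-learn; the toy labels and scores below merely illustrate the metric calls at an imbalanced sampling ratio.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(5)
# Imbalanced toy set approximating a 1:100 P-to-N sampling ratio.
y_true = np.concatenate([np.ones(5), np.zeros(500)])
y_score = np.concatenate([rng.uniform(0.5, 1.0, 5),
                          rng.uniform(0.0, 0.9, 500)])

auc_roc = roc_auc_score(y_true, y_score)          # reported as AUC-ROC
aupr = average_precision_score(y_true, y_score)   # reported as AUPR
```

AUPR is generally the more informative of the two under strong class imbalance, which is why the paper reports it alongside AUC-ROC.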


https://doi.org/10.1371/journal.pone.0309733.t002

Table 3 reports the results for the P-to-N sampling ratio of 1:500. The same pattern as in Table 2 holds for this ratio as well.


https://doi.org/10.1371/journal.pone.0309733.t003

A visual comparison of AUC-ROC and AUPR for the different methods is presented in Fig 3. Results are reported for three P-to-N sampling ratios: 1:10, 1:100, and 1:500. We compared the changes in AUC-ROC when varying the sampling ratio. Fig 3A shows that the AUC-ROC scores for 1:10 and 1:100 remain almost unchanged, regardless of whether DRaW, SVM, or RF is used. The ML methods outperform the deep model on this metric, and among the ML methods the SVM has the highest AUC-ROC. However, all methods show a decrease in performance when the sampling ratio increases to 1:500. Fig 3B shows the AUPR (area under the precision-recall curve) of the different methods for the different P-to-N sampling ratios. As the P-to-N sampling ratio increases, the AUPR scores of all methods decrease. DRaW scores lower than the ML methods across all sampling ratios, and random forest is the top performer by AUPR for all sampling ratios.


The x-axis displays P-to-N sampling ratios, while the y-axes represent AUC-ROC (left plot) and AUPR (right plot). (A) The AUC-ROC value of SVM remains almost constant even when the sampling ratio increases. (B) In contrast, the right plot shows a decrease in the AUPR value of all methods. For higher sampling ratios, the AUPR value and overall performance of the random forest remain higher than those of the other methods.

https://doi.org/10.1371/journal.pone.0309733.g003

Validation of the proposed model is crucial for generalization and for checking the suggested combinations. Therefore, we conducted a t-test for statistical validation of the prediction models. Table 4 shows the t-test results for the predicted values, reporting significance at sampling ratios of 1:10, 1:100, and 1:500 for all methods, i.e., DRaW, SVM, and random forest. We set the threshold to 0.05. All predicted values have p-values below the threshold and reject the null hypothesis.


https://doi.org/10.1371/journal.pone.0309733.t004

The results demonstrate that the proposed methods effectively predict synergistic combinations of antiviral drugs. Therefore, we present the predicted combinations of antiviral drugs that are effective against previously unknown viruses. Fig 4 illustrates a schematic graph of the proposed antiviral drug combinations.


https://doi.org/10.1371/journal.pone.0309733.g004

In order to validate the results, we conducted a literature search to identify antiviral drug combinations that have individually demonstrated effectiveness in treating specific viruses. For instance, while acyclovir and brincidofovir have shown treatment efficacy for CMV, our model suggests that combining the two could produce a synergistic effect. However, this proposed effect will need to be confirmed by future experimental studies.

Another prediction of the proposed model is the synergistic effect of acyclovir and cidofovir on HSV-1; both medications are individually effective treatments for this virus. The literature also indicates that the combination of acyclovir and zidovudine has an additive effect on HSV-1, which the model predicts to be synergistic. Acyclovir and foscarnet have an additive effect on VZV, and our machine learning models predict a synergistic treatment [ 66 ]. The additive combination of acyclovir and maribavir on CMV is likewise predicted to be synergistic [ 67 ]. Additionally, acyclovir in combination with trifluridine and with adefovir is predicted to have a synergistic effect in treating HSV-1, and in combination with brincidofovir and with brivudine a synergistic effect on VZV. The model predicts that alisporivir and ribavirin have a synergistic effect on HCV, and their additive effect has been confirmed experimentally. Clinical trials are necessary to validate these new combinations. Table 5 reports those predictions that have at least an additive treatment effect on viruses. Additionally, S13 and S14 Tables in S1 File report the complete list of previously unknown synergistic combination therapies against viruses predicted by the proposed methods. The frequency shows the number of predictions in the test sets.


Each citation reports the efficacy of the corresponding antiviral against the virus. The complete list of predicted combinations is available in the S1 File. Note that the synergistic effect of acyclovir and ribavirin against HSV-1 has since been confirmed.

https://doi.org/10.1371/journal.pone.0309733.t005

More importantly, one of the predicted combinations, the synergistic effect of acyclovir and ribavirin against herpes simplex virus type 1 (HSV-1), has been confirmed experimentally [68].

This paper proposes machine learning models to predict the synergistic effects of antiviral combinations against viruses. Although synergistic combination therapy has a rich research history, to the best of the authors' knowledge there is no prior work on computational combination therapy for viruses; accordingly, we have introduced the first dataset for viral synergistic combination therapy. We trained several learning methods, including a random forest, an SVM, and a deep model, to predict the synergistic effect of combined antivirals. The results confirm the strong performance of all proposed methods, with the random forest performing best, particularly as the sampling ratio increased. In future work, attention-based learning methods for modeling synergy could further improve results. Moreover, the feature vectors used here are similarity vectors of antivirals and viruses built from linear operators such as cosine similarity, which may limit what the learning models can capture; feeding the SMILES strings of the antivirals directly into the models could improve performance. Combining self-attention methods with different ways of preparing the input features is another avenue for further research.
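The feature construction and classifier training described above, similarity vectors of drugs and viruses fed to a random forest over (drug, drug, virus) triples, can be sketched as follows. This is a minimal illustration with random placeholder fingerprints and toy labels; the dimensions, triples, and labels are assumptions for demonstration, not the paper's actual data or pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)

# Hypothetical binary fingerprints for 6 antivirals and 4 viruses.
drug_fp = rng.integers(0, 2, size=(6, 64)).astype(float)
virus_fp = rng.integers(0, 2, size=(4, 32)).astype(float)

# Similarity vectors: each drug is represented by its cosine similarity
# to every drug, and each virus by its similarity to every virus.
drug_sim = cosine_similarity(drug_fp)    # shape (6, 6)
virus_sim = cosine_similarity(virus_fp)  # shape (4, 4)

def features(d1, d2, v):
    """Concatenate the similarity profiles of a drug pair and a virus."""
    return np.concatenate([drug_sim[d1], drug_sim[d2], virus_sim[v]])

# Toy training triples (drug_a, drug_b, virus); 1 = synergistic (made up).
triples = [(0, 1, 0), (1, 2, 1), (2, 3, 2), (3, 4, 3), (4, 5, 0), (0, 2, 1)]
labels = [1, 0, 1, 0, 1, 0]

X = np.array([features(*t) for t in triples])  # shape (6, 16)
y = np.array(labels)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Score an unseen (drug, drug, virus) triple.
proba = clf.predict_proba(features(0, 3, 2).reshape(1, -1))[0, 1]
print(round(float(proba), 3))
```

In this framing, scoring a new combination only requires the similarity profiles of its components, which is what allows the model to rank combinations that were never tested together.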

This paper supports its results by applying a t-test to the predictions and rejecting the null hypothesis. Experimental analysis is still required to validate the proposed drug combinations and to determine whether their effects are additive or synergistic. One combination absent from the dataset, acyclovir and ribavirin, was successfully predicted and has been confirmed in the literature against HSV-1. It is worth noting that acyclovir appears in most of the predictions, owing to its frequent presence in the approved synergistic combinations used for training.
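The statistical check mentioned above, a t-test that rejects the null hypothesis, can be sketched roughly as follows. The two score distributions here are synthetic placeholders standing in for model predictions on proposed versus random combinations; they are not the paper's actual results.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical model scores: predictions for proposed combinations vs. a
# background of random drug pairs (placeholder numbers, not real data).
proposed_scores = rng.normal(loc=0.8, scale=0.05, size=30)
background_scores = rng.normal(loc=0.5, scale=0.05, size=30)

# Two-sample t-test: the null hypothesis is that both groups
# have the same mean score.
t_stat, p_value = stats.ttest_ind(proposed_scores, background_scores)

# A small p-value rejects the null, i.e. the proposed combinations
# score significantly higher than chance pairings.
print(p_value < 0.05)
```

Rejecting the null in this way only shows the predictions are non-random; as the text notes, wet-lab experiments are still needed to distinguish additive from synergistic effects.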

Supporting information

S1 File. The supplementary material contains comprehensive information on machine learning hyperparameter optimization, complete results, and predicted combinations.

https://doi.org/10.1371/journal.pone.0309733.s001

Acknowledgments

The authors would like to thank Fatemeh Nasiri for helping with the dataset collection, Masih Hajsaeedi for conducting the sequence alignment, and Javad Asghari for helping with implementation.



Enhancing Pre-trained Deep Learning Model with Self-Adaptive Reflection

  • Published: 03 September 2024


  • Xinzhi Wang 1,
  • Mengyue Li 1,
  • Hang Yu 1,
  • Chenyang Wang 2,
  • Vijayan Sugumaran 3 &
  • Hui Zhang 2

In the text mining area, prevalent deep learning models primarily focus on mapping input features to predicted outputs and lack a self-dialectical thinking process. Inspired by self-reflective mechanisms in human cognition, we hypothesize that existing models can emulate human decision-making processes and automatically rectify erroneous predictions. The Self-adaptive Reflection Enhanced pre-trained deep learning Model (S-REM) is introduced to validate this hypothesis and to determine which types of knowledge warrant reproduction. On top of the pretrained model, S-REM introduces a local explanation for the pseudo-label and a global explanation for all labels as explanation knowledge; keyword knowledge from a TF-IDF model is also integrated to form the reflection knowledge. Based on the key explanation features, the pretrained model reflects on its initial decision through two reflection methods and optimizes the prediction of the deep learning model. Experiments with local and global reflection variants of S-REM were conducted on two text mining tasks across four datasets, three public and one private. The outcomes demonstrate the efficacy of our method in improving the accuracy of state-of-the-art deep learning models. Furthermore, the method can serve as a foundational step towards developing explainable models through integration with various deep learning architectures.



Data Availability

The data and code can be shared via email once the work is published.


The research reported in this paper was supported by the National Natural Science Foundation of China under the grant 72204155 and Natural Science Foundation of Shanghai under the grant 23ZR1423100.

Author information

Authors and Affiliations

School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China

Xinzhi Wang, Mengyue Li & Hang Yu

School of Safety Science, Tsinghua University, Beijing, 100084, China

Chenyang Wang & Hui Zhang

Department of Decision and Information Sciences, Oakland University, Rochester, MI, 48309, USA

Vijayan Sugumaran


Contributions

Methodology and resources were provided by Xinzhi Wang. Review, editing, and original draft preparation were done by Xinzhi Wang and Mengyue Li. Investigation and visualization were done by Mengyue Li and Chenyang Wang. Software and data curation were done by Mengyue Li. Conceptualization and formal analysis were discussed by Hang Yu and Vijayan Sugumaran. This work was supervised by Hang Yu and Xinzhi Wang.

Corresponding authors

Correspondence to Hang Yu or Chenyang Wang .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Wang, X., Li, M., Yu, H. et al. Enhancing Pre-trained Deep Learning Model with Self-Adaptive Reflection. Cogn Comput (2024). https://doi.org/10.1007/s12559-024-10348-3

Download citation

Received : 16 February 2024

Accepted : 12 August 2024

Published : 03 September 2024

DOI : https://doi.org/10.1007/s12559-024-10348-3


  • Model self-reflection
  • Self-adaptive reflection
  • Cognitive enhanced deep learning model
  • Text mining
