
Computer Vision and Image Processing: Understanding the Distinction and Interconnection

Explore the essentials of Computer Vision and Image Processing in this easy-to-follow guide. Discover their unique roles and combined impact in today's tech-driven world, tailored for beginners.

Farooq Alvi · December 13, 2023


In today’s digital world, computers are learning to ‘see’ and ‘understand’ images just like humans. But how do they do it? This fascinating journey involves two key fields: Computer Vision and Image Processing. While they may sound similar, they have distinct roles in the world of technology. Let’s dive in to understand these exciting fields better!

What is Image Processing?

The Art of Beautifying Images

Imagine you have a photograph that isn’t quite perfect – maybe it’s too dark, or the colors are dull. Image processing is like a magic wand that transforms this photo into a better version. It involves altering or improving digital images using various methods and tools. Think of it as editing a photo to make it look more appealing or to highlight certain features. It’s all about changing the image itself.
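To make this concrete, here is a minimal sketch of such an edit using OpenCV in Python. The file name is a placeholder, and the alpha/beta values are just illustrative choices:

```python
import cv2

# Load a photo that is too dark (the path is a placeholder).
img = cv2.imread("dark_photo.jpg")

# new_pixel = alpha * old_pixel + beta: alpha > 1 boosts contrast,
# beta > 0 brightens; results are clipped to the valid 0-255 range.
brighter = cv2.convertScaleAbs(img, alpha=1.3, beta=40)

cv2.imwrite("brighter_photo.jpg", brighter)
```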

What is Computer Vision?

Teaching Computers to Interpret Images

Now, imagine a robot looking at the same photograph. Unlike humans, it doesn’t naturally understand what it’s seeing. This is where computer vision comes in. It’s like teaching the robot to recognize and understand the content of the image – is it a picture of a cat, a car, or a tree? Computer vision doesn’t change the image. Instead, it tries to make sense of it, much like how our brain interprets what our eyes see.

Core Principles & Techniques

Computer Vision (CV): Seeing Beyond the Surface

In the realm of Computer Vision, the goal is to teach computers to understand and interpret visual information from the world around them. Let’s explore some of the key principles and techniques that make this possible:

Pattern Recognition

Think of this as teaching a computer to play a game of ‘spot the difference’. By recognizing patterns, computers can identify similarities and differences in images. This skill is crucial for tasks like facial recognition or identifying objects in a scene.

Deep Learning

Deep Learning is like giving a computer a very complex brain that learns from examples. By feeding it thousands, or even millions, of images, a computer learns to identify and understand various elements in these images. This is the backbone of modern computer vision, enabling machines to recognize objects, people, and even emotions.

Object Detection

This is where computers get really smart. Object detection is about identifying specific objects within an image. It’s like teaching a computer to not just see a scene, but to understand what each part of that scene is. For instance, in a street scene, it can distinguish cars, people, trees, and buildings.

Image Processing: Transforming Pixels into Perfection

In the world of Image Processing, the magic lies in altering and enhancing images to make them more useful or visually appealing. Let’s break down some of the fundamental principles and techniques:

Image Enhancement

This is like giving a makeover to an image. Image enhancement can brighten up a dark photo, bring out hidden details, or make colors pop. It’s all about improving the look and feel of an image to make it more pleasing or informative.

Image Filtering

Imagine sifting through the ‘noise’ to find the real picture. Image filtering involves removing or reducing unwanted elements from an image, such as blurring noise, smoothing rough edges, or sharpening blurry parts. It helps clean up the image to highlight the important features.
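As a rough illustration, a few common filters in OpenCV might look like this (a sketch; the input file name is a placeholder, and the kernel sizes are arbitrary choices):

```python
import cv2
import numpy as np

img = cv2.imread("noisy_photo.jpg")  # placeholder input

blurred = cv2.GaussianBlur(img, (5, 5), 0)  # smooth fine-grained noise
denoised = cv2.medianBlur(img, 5)           # remove salt-and-pepper noise

# A simple sharpening kernel: boost the center pixel, subtract neighbors.
kernel = np.array([[0, -1,  0],
                   [-1, 5, -1],
                   [0, -1,  0]], dtype=np.float32)
sharpened = cv2.filter2D(img, -1, kernel)
```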

Transformation Techniques

This is where an image can take on a new shape or form. Transformation techniques might include resizing an image, rotating it, or even warping it to change perspective. It’s like reshaping the image to fit a specific purpose or requirement.
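A hedged sketch of two such transformations with OpenCV (placeholder file name; the angle and scale are arbitrary):

```python
import cv2

img = cv2.imread("photo.jpg")  # placeholder input
h, w = img.shape[:2]

resized = cv2.resize(img, (w // 2, h // 2))  # half-size thumbnail

# Rotate 45 degrees around the image center, keeping the same canvas size.
M = cv2.getRotationMatrix2D((w / 2, h / 2), 45, 1.0)
rotated = cv2.warpAffine(img, M, (w, h))
```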

These techniques form the toolbox of image processing, enabling us to manipulate and enhance images in countless ways.

Distinctions Between Computer Vision and Image Processing

Image Processing: Visual Perfection

The primary aim of image processing is to improve image quality. Whether it’s enhancing contrast, adjusting colors, or smoothing edges, the focus is on making the image more visually appealing or suitable for further use. It’s about transforming the raw image into a refined version of itself.

Image processing focuses on enhancing and transforming images. It’s vital in fields like digital photography for color correction, medical imaging for clearer scans, and graphic design for creating stunning visuals. These transformations not only improve aesthetics but also make images more suitable for analysis, laying the groundwork for deeper interpretation, including by computer vision systems.

Computer Vision: Decoding the Visual World

Computer vision, on the other hand, seeks to extract meaning from images. The goal isn’t to change how the image looks but to understand what the image represents. This involves identifying objects, interpreting scenes, and even recognizing patterns and behaviors within the image. It’s more about comprehension rather than alteration.

Computer Vision, conversely, aims to extract meaning and understanding from images. It’s at the heart of AI and robotics, helping machines recognize faces, interpret road scenes for autonomous vehicles, and understand human behavior. The success of these tasks often relies on the quality of image processing. High-quality, well-processed images can significantly enhance the accuracy of computer vision algorithms.

Techniques and Tools

Image Processing Techniques and Tools

In image processing, the toolkit includes a range of software and algorithms specifically designed for modifying images. This includes:

  • Software like Photoshop and GIMP for manual edits such as retouching and resizing.
  • Algorithms for automated tasks such as histogram equalization for contrast adjustment and filters for noise reduction and edge enhancement.
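For instance, the automated side might look roughly like this in OpenCV (a sketch; the input path is a placeholder and the parameter values are illustrative):

```python
import cv2

gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)  # placeholder input

equalized = cv2.equalizeHist(gray)                    # spread intensities for contrast
denoised = cv2.fastNlMeansDenoising(equalized, h=10)  # non-local means noise reduction

# Edge enhancement via a simple unsharp mask.
blur = cv2.GaussianBlur(denoised, (0, 0), 3)
enhanced = cv2.addWeighted(denoised, 1.5, blur, -0.5, 0)
```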

Computer Vision Techniques and Tools

Computer Vision, on the other hand, employs a different set of methodologies:

  • Machine Learning and Deep Learning algorithms such as Convolutional Neural Networks (CNNs), pivotal for tasks like image classification and object recognition.
  • Pattern recognition tools used to identify and classify objects within an image, essential for applications like facial recognition.
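To give a flavor of what a CNN looks like in code, here is a minimal sketch using PyTorch (an assumption on our part; any deep learning framework would do). The architecture and sizes are arbitrary illustrative choices, not a production model:

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """A toy CNN: two conv blocks followed by a linear classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):           # x: (batch, 3, 32, 32)
        x = self.features(x)        # -> (batch, 32, 8, 8)
        return self.classifier(x.flatten(1))

logits = TinyCNN()(torch.randn(1, 3, 32, 32))  # one fake 32x32 RGB image
print(logits.shape)  # torch.Size([1, 10]) -- one score per class
```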

Interconnection and Overlap: Synergy in Sight

This section illustrates the essential relationship between image processing and computer vision, showcasing their collaborative role in advanced technological applications.

Building Blocks: Image Processing in Computer Vision


Pre-processing in Computer Vision: Many computer vision algorithms require pre-processed images. Techniques like noise reduction and contrast enhancement from image processing improve the accuracy of computer vision tasks.

Feature Extraction: Simplified or enhanced images from image processing are easier for computer vision algorithms to analyze and interpret.
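A minimal sketch of this hand-off, assuming OpenCV and a placeholder image: image-processing steps clean the input, then a computer-vision step extracts structure from it.

```python
import cv2

gray = cv2.imread("street.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder input

# Image processing stage: denoise, then boost local contrast.
smooth = cv2.GaussianBlur(gray, (5, 5), 0)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(smooth)

# Computer vision stage: extract edge features from the cleaned-up image.
edges = cv2.Canny(enhanced, 50, 150)
```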

Integrated Systems: Collaborative Power

Both fields often work in tandem in complex systems:

Autonomous Vehicles: Computer vision systems rely on image processing to clarify and enhance road imagery for better object detection and obstacle avoidance.

Medical Imaging Analysis: Image processing is used to enhance medical images like MRIs or X-rays, which are then analyzed by computer vision algorithms for diagnosis and research.

Applications and Real-World Examples: Transforming Industries

Diverse Industries Benefiting from These Technologies

Medical Imaging: Image processing enhances medical scans for clarity, which are then analyzed by computer vision to detect abnormalities, aiding in early diagnosis and treatment planning.

Autonomous Vehicles: Utilize image processing for clear visual input, which is essential for computer vision systems to accurately identify and react to road signs, pedestrians, and other vehicles.

Surveillance

Security Systems: Image processing improves image quality from cameras, aiding computer vision in accurately recognizing faces or suspicious activities and enhancing security measures.

Entertainment

Film and Gaming: Image processing is used for visual effects, while computer vision contributes to interactive experiences, like augmented reality games.

Case Studies: Integrating Computer Vision and Image Processing

Smart City Projects

Traffic Management Systems: Utilize image processing to enhance traffic camera feeds, which are then analyzed by computer vision for managing traffic flow and detecting incidents.

Agricultural Technology

Crop Monitoring Systems: Image processing clarifies aerial images of crops, and computer vision analyzes these images to assess crop health and growth, optimizing agricultural practices.

These examples and case studies highlight the impactful and transformative role of image processing and computer vision across various sectors, demonstrating their critical contribution to technological advancements.

Conclusion: The Convergence of Vision and Processing in the Digital Age

In summary, Computer Vision and Image Processing, though distinct in their goals and techniques, are interconnected fields that play a pivotal role in the advancement of modern technology. Image processing sets the stage by enhancing and transforming images, which are then interpreted and understood through computer vision. Together, they are revolutionizing industries such as healthcare, automotive, surveillance, and entertainment, driving innovation and opening new frontiers in technology.

Understanding these fields and their interplay is crucial for anyone looking to engage with the latest in tech development and application.


Practical Python and OpenCV: An Introductory, Example Driven Guide to Image Processing and Computer Vision

Practical Python and OpenCV

An introductory, example-driven guide to image processing and computer vision. Learn computer vision in a single weekend... with the Practical Python and OpenCV eBook.

Are you interested in computer vision and image processing, but don't know where to start? My new book is your guaranteed quick start guide to learning the fundamentals of computer vision and image processing using Python and OpenCV.

"Your teaching method rocks. Hands down." — Giridhur S.

"You can teach me computer vision in a single weekend? How is that possible?" I'll show you...

Take a sneak peek at what's inside...

Inside Practical Python and OpenCV + Case Studies you'll learn the basics of computer vision and OpenCV, working your way up to more advanced topics such as face detection, object tracking in video, and handwriting recognition, all with lots of examples, code, and detailed walkthroughs. Before you do anything else, take a look at the video to your left to see how my 275+ page book, 16 video tutorials covering 4+ hours of lessons, and downloadable Ubuntu virtual machine (that comes with OpenCV pre-installed) can turn you into an OpenCV ninja, guaranteed.

I'm sold — I'm ready to grab my copy. »

Curious about computer vision? Let me help.

I wrote this book for you — for developers, programmers, and students who are interested in computer vision and image processing, but still need to learn the basics.

This book covers the fundamentals with tons of code examples that allow you to get your hands dirty, quickly and easily. Whether you are a seasoned developer looking to learn more about computer vision, or a student at a university preparing for research in the computer vision field, this book is for you.

Learn how to detect faces in images using Python and OpenCV

Learn how to detect faces in images and video.

By far, the most requested tutorial of all time on this blog has been "How do I find faces in images?" If you're interested in face detection and finding faces in images and video, then this book is for you.

We'll start Case Studies by talking to my old dorm buddy, Jeremy, a college student interested in computer vision. Instead of spending his time studying for his Algorithms final exam, he instead becomes entranced by computer vision.

Jeremy applies face detection to both pictures and videos, and while his final grade in Algorithms is in jeopardy, at least he learns a lot about computer vision.

Explore object tracking in video.

A few months ago I taught a developer the basics of object tracking... he then went on to build surveillance systems to track people in video. Curious how he learned so fast? The secrets are in my book.

We'll then chat with Laura, who works at Initech (after it burned to the ground, allegedly over a red Swingline stapler) updating bank software. She's not very challenged at her job and she spends her night sipping Pinot Grigio, watching CSI re-runs.

Sick of her job at Initech, Laura studies up on computer vision and learns how to track objects in video. Ultimately, she's able to leave her job at Initech and join their rival, Initrode, and build software used to track eye movements in cameras.

Track objects in video using OpenCV.

Become a pro at handwriting recognition with HOG.

You've probably seen handwriting recognition software before, whether on your tablet or your iPad. But how do they do it? I'll show you, and then you'll know the secret for yourself.

Next up, we'll stop by Hank's office. Hank and his team of programmers are consulting with the Louisiana post office, where they are tasked with building a system to accurately classify the zip codes on envelopes.

Unfortunately, Hank underbid on the job and he's currently extremely stressed that the project will not be complete on time. If the job isn't done on time, profits will suffer and he might lose his job!

Luckily, Hank remembers back to a machine learning course he took during his masters program. He is able to utilize Histogram of Oriented Gradients and a Linear Support Vector Machine to classify handwriting...and save his job.

Master machine learning by classifying flower species.

Interested in classifying an image based on its content? I've got you covered. I'll show you how to use color histograms and a random forest classifier to classify the species of flowers. After reading this chapter, you'll be a pro at image classification.

Let me tell you about my friend Charles. He works for The New York Museum of Natural History in the Botany department. One of Charles' jobs is to classify the species of flowers in photographs. It's extremely time consuming and tedious, but the museum pays handsomely for it.

Charles decides to create a system to automatically classify the species of flowers using computer vision and machine learning techniques. This approach will save him a bunch of time and allow him to get back to his research, instead of mindlessly classifying flower species.

Image classification made easy with Python and OpenCV

Create the next startup by building an Amazon.com book cover search.

There's no doubt that the next big startup is going to involve computer vision. Using the techniques in this chapter, you just might be able to launch the next big thing... and make a ton of money.

Finally, we'll head to San Francisco to meet Gregory, the hotshot entrepreneur who is working with his co-founder to create a competitor to Amazon's Flow. Flow allows users to use the camera on their smartphones as a digital shopping device. By simply taking a picture of a cover of a book, DVD, or video game, Flow automatically identifies the product and adds it to the user's shopping cart.

Three weeks ago, Gregory and I went to a local bar to have a couple beers. I guess he had one too many, because guess what?

He clued me in on his secrets.

He begged me not to tell...but I couldn't resist.

Use your Raspberry Pi to build awesome computer vision projects.

Do you want to use your Raspberry Pi to detect faces in images, track objects in video, or recognize handwriting? No problem! All source code examples for both Practical Python and OpenCV + Case Studies are guaranteed to run on the Raspberry Pi 2, Pi 3, and Pi Zero W right out of the box. No code modifications required.

Do you own a Raspberry Pi? Do you want to leverage it to build awesome computer vision apps?

No problem, I've got you covered!

Both Practical Python and OpenCV + Case Studies include Python and OpenCV source code examples that are guaranteed to run on your Raspberry Pi 2, Pi 3, and Pi Zero W right out of the box.

Use your Raspberry Pi to learn computer vision and OpenCV

Come code alongside me.

Imagine having me at your side, helping you learn computer vision and OpenCV — that's exactly what it's like when you work through my 16 video tutorials covering 4+ hours of lessons. Together, we'll walk through each line of code as I detail my thought process and rationale for how I'm solving each computer vision project. Plus, these videos contain tips & tricks I don't cover in the books.

Let's face it. Reading tutorials from a book isn't always the best way to learn a new programming language or library. I've often found that it's much easier for me to learn a new skill if I can actually watch someone doing it first — and I'm sure you're the same way. That's exactly why I have put together 16 videos covering over 4+ hours of lessons from Practical Python and OpenCV . If you're the type of person that learns by watching, then you'll want to grab these videos — they are a fantastic asset to help you learn OpenCV and computer vision.

I'm sold. Upgrade me to the Quickstart Bundle »

Don't waste time installing packages...invest your time learning.

No need to configure your development environment and waste time installing packages. Just download my pre-configured Ubuntu virtual machine (guaranteed to run on OSX, Linux, & Windows) and start learning.

I recognize the fact that setting up your development environment isn't the most fun thing in the world — not to mention that it's also quite time consuming! In order to get you learning as fast as possible, I have created a downloadable Ubuntu VirtualBox virtual machine that has all the computer vision and image processing libraries you will need pre-installed.

Download the Practical Python and OpenCV Ubuntu Virtual Machine

OpenCV + Raspbian: Pre-configured and pre-installed.

Download the .img, flash it to your SD card, and enjoy OpenCV + Python pre-baked on your Raspberry Pi 2, Raspberry Pi 3, and Raspberry Pi Zero W.

I ran the numbers and determined that even if you know exactly what you are doing, it can take over 2.2 hours to compile and install OpenCV on your Raspberry Pi (I know from experience). When I sampled a group of novice readers, I found their install time jumped nearly 4x, to over 8.7 hours.

To make it as easy as possible for you to learn the basics of computer vision and image processing I have released my own personal Raspbian .img file with OpenCV pre-installed. This is the exact Raspbian image I use for my own projects and is compatible with the Raspberry Pi 2, Raspberry Pi 3, and Raspberry Pi Zero W.

If you're looking to get your Raspberry Pi up and running with OpenCV + Python, this is by far the easiest method.

Hardcopies of Practical Python and OpenCV + Case Studies — now available!

This is your exclusive hardcopy edition of Practical Python and OpenCV + Case Studies. Hot off the press, this hardcopy book is 275 pages of the most comprehensive guide to learning computer vision and OpenCV that you can get. It's yours, shipped anywhere in the world for free.

There is just something about a hardcopy of a book that can't be beat. The feel of the book in your hands. The crisp sound as pages turn. And not to mention, it looks beautiful on your bookshelf as well! I've wanted to offer Practical Python and OpenCV + Case Studies in print ever since I finished up the first version of the book over a year ago, but I struggled to find a publisher. That's all changed now: the hardcopy editions are ready to go! If you like the feeling of having a book in your hands, then the hardcopy edition is a must have.

I'm sold. Upgrade me to the Hardcopy Bundle »

Hardcopy editions of Practical Python and OpenCV + Case Studies are now available!

Claim your access to the exclusive companion website.

Let's be realistic — there's only so much content that I can fit into a book... and that's exactly why I've created the Practical Python and OpenCV companion website. This companion website is the ultimate guide to making the most out of the material in my book and the PyImageSearch blog.

Your purchase of Practical Python and OpenCV includes complimentary access to the book's exclusive companion website. This website provides:

  • Access to additional resources and guides to continue your computer vision + OpenCV education.
  • End of chapter discussions that cross-reference relevant PyImageSearch blog posts, allowing you to build a strong computer vision and OpenCV foundation quickly and effectively.
  • Quizzes to test your knowledge — the best way to learn computer vision is to learn by doing , and that's exactly what these quizzes will allow you to do.

"That perfect first step if you are interested in computer vision but don't know where to start... You'll be glued to your workstation as you try out just one more example." — Jason Brownlee, Creator of Machine Learning Mastery

The Practical Python and OpenCV + Case Studies method really works...

Don't waste time installing...

...invest your time learning and jump start your computer vision education. In order to get you learning as fast as possible, I have created a downloadable VirtualBox virtual machine that has all the computer vision and image processing libraries you will need pre-installed. Get the eBook »

Learn the fundamentals of image processing

Practical Python and OpenCV + Case Studies covers the very basics of computer vision, starting from answering the question "what's a pixel?" and working your way up to more challenging tasks such as edge detection, thresholding, and finding and counting objects in images, all with lots of examples and code. Get the eBook »

Lots of visual examples, lots of code

You probably learn by example. This book is tremendously example driven. When I first set out to write this book, I wanted it to be as hands-on as possible. I wanted lots of visual examples with a ton of example code. I wanted to write something that you could easily learn from, without all the rigor and detail of mathematics. You don't need a college education to understand the examples in this book. Get the eBook »

Satisfy your curiosity

I'm willing to bet that you're curious to learn new things. And from Facebook to Flickr, we now have more images than ever! Ask yourself, what does your imagination want to build? Let it run wild. And let the computer vision techniques introduced in this book help you build it. Get the eBook »

The right formula for learning

Practical Python and OpenCV + Case Studies is an accessible 275+ page book written for developers, programmers, and students just like you who are looking to learn the fundamentals of computer vision and image processing. Get the eBook »

Computer vision isn't magic

You can learn computer vision. Let me teach you. What makes a computer "see"? How does it understand what's in an image? And how can you learn to program your computer to interpret images? Practical Python and OpenCV covers the image processing essentials to get you started in the world of computer vision. Get the eBook »

See, I've distilled the basics down to the very core — a quick read with tons of examples

"This book is a great starting point for people looking to get started with computer vision. It walks you through the most important functions in OpenCV that you'll need for any serious computer vision project." — Ivo Flipse, GitHub

"Practical Python and OpenCV is a non-intimidating introduction to basic image processing tasks in Python. While reading the book, it feels as if Adrian is right next to you, helping you understand the many code examples without getting lost in mathematical details." — Dr. Tomasz Malisiewicz, Co-Founder of Vision.ai


Enjoy a 100% money back guarantee.

After reading my book, if you haven't learned the basics of computer vision and image processing, then I don't want your money. That's why I offer a 100% Money Back Guarantee. Simply send me an email and ask for a refund, up to 30 days after your purchase. With all the copies I've sold, I count the number of refunds on one hand. My readers are satisfied and I'm sure you will be too.

I have a bundle tailored to your needs

Hardcopy Bundle

The core image processing guide


This bundle is the complete package. When you purchase this collection you'll receive the digital eBooks, the hardcopy edition mailed to your doorstep, the downloadable Ubuntu VirtualBox virtual machine so you can start learning OpenCV instantly, my pre-configured Raspbian .img, and 16 videos covering over 4+ hours of tutorials from the books. If you're serious about learning computer vision and OpenCV, there is no doubt in my mind that this is the best bundle for you.

When you purchase the Hardcopy Bundle you'll receive:

  • An exclusive hardcopy edition of Practical Python and OpenCV + Case Studies mailed right to your doorstep
  • The complete 4th edition eBooks in PDF and Kindle format
  • 16 videos covering over 4+ hours of tutorials from Practical Python and OpenCV + Case Studies
  • The downloadable, pre-configured Ubuntu virtual machine (runs on OSX, Linux, & Windows)
  • My personal Raspbian .img file pre-baked with Python + OpenCV already installed.
  • All source code listings, example images, and datasets used in both books
  • FREE updates as the books are revised
  • The original 1st edition of Practical Python and OpenCV + Case Studies which covers OpenCV 2.4

No Risk 100% Money Back Guarantee!


Hardcopy + University

Now Only $94.71

The world's #1 online computer vision course.

Discover how you can get the best of both worlds with PyImageSearch University and the Practical Python Hardcopy Bundle.

Use the power of OpenCV, TensorFlow, and PyTorch to solve complex computer vision problems in under 30 minutes with our easy-to-follow code examples.

Stay ahead of the curve and future-proof your skills with our comprehensive library of computer vision resources.

Upgrade to PyImageSearch University and the Practical Python Hardcopy Bundle to start your journey towards mastering computer vision today.

  • Lifetime access to PyImageSearch University
  • Brand new courses released every month, ensuring you can keep up with state-of-the-art techniques
  • 115 hours of on-demand video
  • 86 courses on essential computer vision, deep learning and OpenCV topics
  • 94 Certificates of Completion
  • 540 tutorials and downloadable resources
  • Pre-configured Jupyter Notebooks in Google Colab for 338 PyImageSearch tutorials
  • Run all code examples in your web browser - works on Windows, macOS, and Linux (no dev environment configuration required!)
  • Access to centralized code repos for all 348 tutorials on PyImageSearch
  • Easy one click downloads for code, datasets, pre-trained models, etc.
  • Access on mobile, laptop, desktop

Everything you need to become an OpenCV ninja

For the first time ever, I am proud to offer hardcopy editions of Practical Python and OpenCV + Case Studies. This bundle includes not only the digital eBook editions, but also an exclusive hardcopy edition as well.

There is just something about the hardcopy edition of the book that can't be beat. The feel of the book in your hands. The crisp sound as the pages turn. And not to mention, it looks beautiful on your bookshelf.

"I can guarantee you that Case Studies and Practical Python and OpenCV are the best books to teach you OpenCV and Python right now." — One June Kang, Student at University of Southern California

"First of all, thank you for the book. I find it really valuable and helpful. It gave me a good grasp on my path to learning computer vision/image processing. Now I have a good starting point to continue learning, exploring, and applying the OpenCV library to my new ideas." — Mikko Leppänen, GitHub

"Practical Python and OpenCV is an easy read with a step-by-step approach, smarter than any reference manual I have read." — Lai-Cheung-Kit Christophe, GitHub

"I was lost for a couple of months until I ran into the books Practical Python and OpenCV and Case Studies. From that moment, I was able to face my university final project with utter confidence. Adrian's writing style is clear, straightforward, and very easy to understand, but also very close and entertaining. I'm happy I found it." — Eduardo Valenzuela, Student at University of Granada, Spain

The Team Bundle — Your all-access pass

Jumpstart your entire team with the skills they need to solve real-world computer vision problems. Get everything in the Quickstart Bundle above, plus:


45-minute video chat

You've gone through the books, but still have a few follow-up questions? Want to go straight to the source? Select this option and I'll hop on a 1-on-1 call with you and your team.

Share with your team

Practical Python and OpenCV + Case Studies are great to get your entire development or research team up to speed with Python and OpenCV. Since you care as much about copyright as I do, I have a team license that allows you to share the books + virtual machine with up to 10 members of your team. No, there isn't any DRM involved — just trust. You'll also receive 5 hardcopy editions that you can distribute to members of your team.

Get answers to your questions

You have specific questions about a computer vision problem you are trying to solve — and I’m here to help. I'll answer your questions and get you and your team instantly on track.

Here are some common questions that I get asked...

Which bundle should I buy?

This mostly depends on your budget. Obviously the Hardcopy Bundle is the best since you get the complete package, but the eBooks themselves are still 275+ pages of the best computer vision, image processing, and OpenCV guides and tutorials you can find.

What if I hate the book?

Well, hate is a strong word...but if you honestly hate this book and feel like you haven't learned the basics of computer vision, then I don't want your money. Just reply to your purchase receipt email within 30 days and I'll refund your purchase.

Why Python?

First of all, Python is awesome. Secondly, Python is the best way to learn the basics of computer vision. The simple, intuitive syntax allows you to focus on learning the basics of computer vision, rather than spending hours fixing crazy compiler errors in C/C++.

Why this book?

Practical Python and OpenCV is your best, guaranteed quick-start guide to learning the basics of computer vision and image processing. Whether you're new to the world of computer vision or already know a thing or two, this book can teach you the basics in a single weekend. I guarantee it.

Can you really teach me computer vision in a single weekend?

Yes, I absolutely guarantee it. In fact, I'm so confident that I'm offering a 100% money back guarantee on it.

What if I'm a beginner at computer vision?

This book is designed for you. It gives you the very basics of computer vision and image processing using Python and OpenCV. This book is tremendously example driven and is as hands-on as possible. With me as a coach by your side, you'll learn by getting your hands dirty. It's the best way to learn!

What versions of Python + OpenCV are used?

The 4th edition of Practical Python and OpenCV + Case Studies covers Python 2.7+ and Python 3+, along with OpenCV 3 and OpenCV 4. The 1st edition of the book (also included in the download of all bundles) covers Python 2.7 and OpenCV 2.4.X.

Are OpenCV 3 and OpenCV 4 covered?

You bet they are! All source code listings will run out of the box with OpenCV 3 and OpenCV 4. Backwards compatibility with OpenCV 2.4 is also included.

I've heard that OpenCV is a real pain to install...

OpenCV isn't like other Python packages. You can't let pip and easy_install do the heavy lifting for you. You need to download it, configure it, and compile it. It's a real time sink. In order to save you a bunch of time and hassle, I've created a downloadable Ubuntu VirtualBox virtual machine with all the necessary computer vision packages pre-installed. Check out the Quickstart Bundle above.

I'm just so busy right now...

I have boiled down computer vision and image processing to the core topics without all the fluff. If you can give me less than an hour a night, I can teach you the basics of computer vision using Python and OpenCV in no time.

I don't have the money to buy your book...

Think of it this way. Most computer vision textbooks cost well over $200. And they don't even include 4+ hours of video tutorials or a downloadable, pre-configured Ubuntu virtual machine with all your computer vision libraries pre-installed! For less than a third of the cost of a used textbook, you could be learning the basics of computer vision and image processing this weekend.

Can I purchase just a hardcopy of the book by itself?

The hardcopy edition of Practical Python and OpenCV + Case Studies is only offered in the Hardcopy Bundle. As a self-published author, it's not cheap to have copies of the books printed — I also manually fulfill all orders myself. In order to make the hardcopies feasible, I need to charge a little extra and provide a ton of added value through the virtual machine and video tutorials.

Is shipping included in the price of the Hardcopy Bundle?

Yes, shipping is already included in the price of the Hardcopy Bundle.

What countries do you ship to?

I ship to all countries. If you have a particular concern about shipping, please contact me.

Where can I learn more about you?

I have written a ton of blog posts about computer vision and image processing over at PyImageSearch. Definitely check out the posts to get a good feel for my teaching and writing style. I also suggest that you grab the sample chapter I am offering using the form above.

I have another question.

If you have any other questions, please send me a message and I’ll get back to you immediately.

Adrian Rosebrock, author of Practical Python and OpenCV

So, who's behind this?

Hey, I'm Adrian Rosebrock, an entrepreneur and Ph.D. who has spent the last eight years studying computer vision, machine learning, and image search engines. I've launched two successful image search engines, ID My Pill and Chic Engine. I've even consulted with the National Cancer Institute to develop image processing and machine learning algorithms to automatically analyze breast histology images for cancer risk factors.

It's safe to say that I have a ton of experience in the computer vision world and know my way around a Python shell and image processing libraries. I'm here to distill all my years of experience into bite size, easy to understand chunks, while sharing the tips, tricks, and hacks I've learned along the way.

If you are interested in computer vision and image processing but don't know where to start, then this book is definitely for you. It's the best, guaranteed quick start guide to learning the fundamentals of computer vision and image processing using Python and OpenCV.

Practical Python and OpenCV: The best, guaranteed quick start guide to learning the fundamentals of computer vision and image processing using Python and OpenCV


This repository contains projects related to various aspects of image processing, from basic operations to advanced techniques like active contours. Examples and case studies focus on applications in medical imaging.

zahrasalarian/Digital-Image-Processing-for-Medical-Applications


Digital Image Processing for Medical Applications

This repository contains assignments and projects related to various aspects of image processing, from basic operations to advanced techniques like active contours. Examples and case studies focus on applications in medical imaging.

Table of Contents

  • pt-1.ipynb
  • pt-2.ipynb
  • pt-3.ipynb
  • pt-4.ipynb
  • affine_transformations_and_image_interpolation.ipynb
  • contrast_and_brightness_adjustments.ipynb
  • contrast_stretching_and_power_law.ipynb
  • histogram_equalization_and_CLAHE.ipynb
  • mean_median_and_laplacian_isotropic_filters.ipynb
  • laplacian_sharpening_sobely_sobely.ipynb
  • fourier_transform_and_band_reject.ipynb
  • low_and_high_ideal_butterworth_guassian_filters.ipynb
  • restoration.ipynb
  • morphological_operations.ipynb
  • non_maximum_suppression_and_hysteresis_thresholding.ipynb
  • hough_circle_detection.py
  • active_contours_snakes_method.ipynb

HW0 - Introduction to Image Analysis with Python

In this section, we introduce the basics of Python programming and data visualization, laying the groundwork for advanced image analysis topics.

Topics Covered

  • Exploring NumPy functionalities
  • Data Types and Memory Management
  • Array Manipulations
  • Populating Matrices Based on Defined Rules
  • 2D Matrix Generation with Circle Pattern
  • Adding Random Noise to Matrix
  • Data Distribution Visualization
  • Plotting Histograms with Matplotlib
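A minimal sketch of the kind of exercise covered here (not taken from the notebooks themselves; sizes and noise levels are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

# Generate a 2D matrix with a filled circle pattern.
size, radius = 100, 30
y, x = np.ogrid[:size, :size]
circle = ((x - size / 2) ** 2 + (y - size / 2) ** 2 <= radius ** 2).astype(float)

# Add Gaussian random noise and visualize the intensity distribution.
noisy = circle + np.random.normal(0, 0.2, circle.shape)
plt.hist(noisy.ravel(), bins=50)
plt.show()
```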

HW1 - Introduction to Operations on Images

In this section, we delve into basic image operations, including transformations and adjustments. The notebooks cover a variety of techniques such as affine transformations, image interpolation, and contrast & brightness adjustments.

affine_transformations_and_image_interpolation.ipynb

  • Affine Transformations (Rotation, Scaling, Shearing)
  • Downsampling
  • Resampling & Interpolation (Cubic, Linear, Nearest)
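A rough sketch of these operations with OpenCV (not the notebook's own code; the input path is a placeholder):

```python
import cv2

img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)  # placeholder input
h, w = img.shape

# Compose rotation + scaling, then add a horizontal shear term
# to the 2x3 affine matrix.
M = cv2.getRotationMatrix2D((w / 2, h / 2), angle=30, scale=0.8)
M[0, 1] += 0.2

# The same transform resampled with three interpolation strategies.
nearest = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_NEAREST)
linear = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_LINEAR)
cubic = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC)
```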

contrast_and_brightness_adjustments.ipynb

  • Image Normalization
  • Linear and Non-linear Transformations
  • Adjusted Contrast & Brightness

HW2 - Intensity-based Operations

This part explores the basics of intensity-based operations for image enhancement, covering techniques ranging from contrast stretching and power-law transformations to histogram equalization and CLAHE. Each notebook offers a thorough analysis of histogram techniques and their outcomes, providing a complete understanding of the subject.

contrast_stretching_and_power_law.ipynb

  • Contrast Stretching
  • Power-Law (Gamma) Transformation
  • Different Gamma Value Experimentation
  • Comparison between Contrast Stretching and Power-Law Transformation
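A compact sketch of both transforms (illustrative only; assumes a non-constant grayscale image at a placeholder path):

```python
import cv2
import numpy as np

img = cv2.imread("xray.png", cv2.IMREAD_GRAYSCALE).astype(np.float64)

# Contrast stretching: map [min, max] linearly onto [0, 255].
stretched = 255 * (img - img.min()) / (img.max() - img.min())

# Power-law (gamma) transform: gamma < 1 brightens, gamma > 1 darkens.
gamma = 0.5
power_law = 255 * (img / 255) ** gamma

cv2.imwrite("stretched.png", stretched.astype(np.uint8))
cv2.imwrite("gamma.png", power_law.astype(np.uint8))
```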

histogram_equalization_and_CLAHE.ipynb

  • Histogram Equalization
  • Contrast Limited Adaptive Histogram Equalization (CLAHE)
  • Analysis of Histogram Techniques and Their Outcomes
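In OpenCV terms, the two techniques differ by one object (a sketch; the clip limit and tile size are typical defaults, not values from the notebook):

```python
import cv2

gray = cv2.imread("mri_slice.png", cv2.IMREAD_GRAYSCALE)  # placeholder input

global_eq = cv2.equalizeHist(gray)  # one histogram for the whole image

# CLAHE: equalize per tile, with a clip limit to avoid amplifying noise.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
local_eq = clahe.apply(gray)
```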

HW3 - Spatial Operations

In this part, the focus shifts to spatial filtering techniques that emphasize specific features in images. We explore various types of filters, like mean, median, and Laplacian, along with edge-detection methods such as Sobel operators.

mean_median_and_laplacian_isotropic_filters.ipynb

  • Spatial Filters (Mean, Median)
  • Image Blurring Techniques
  • Laplacian Isotropic Filter
  • Image Enhancement

laplacian_sharpening_sobely_sobely.ipynb

  • Laplacian Sharpening
  • Sobel Filters (Sobel-X, Sobel-Y)
  • Edge Detection Techniques
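A minimal sketch of Laplacian sharpening and Sobel edge detection (illustrative; not the notebook's code, and the input path is a placeholder):

```python
import cv2
import numpy as np

gray = cv2.imread("ct_slice.png", cv2.IMREAD_GRAYSCALE)  # placeholder input

# Laplacian sharpening: subtract the (signed) Laplacian from the image.
lap = cv2.Laplacian(gray, cv2.CV_64F)
sharpened = np.clip(gray.astype(np.float64) - lap, 0, 255).astype(np.uint8)

# Sobel gradients along x and y, combined into an edge-magnitude map.
gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
edges = cv2.convertScaleAbs(cv2.magnitude(gx, gy))
```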

HW4 - Frequency Domain Operations

In this section, we delve into the realm of frequency domain operations, studying the Fourier Transform and its applications in image processing. From basic Fourier Transform techniques to the implementation of various types of filters such as Ideal, Butterworth, and Gaussian, this section provides a comprehensive look into the manipulation of images in the frequency domain.

fourier_transform_and_band_reject.ipynb

  • Fourier Transform for Image Analysis
  • Band-Reject Filtering
  • Frequency Domain Techniques

low_and_high_ideal_butterworth_guassian_filters.ipynb

  • Fourier Transform & Inverse Fourier Transform
  • Low- and High-Pass Filters (Ideal, Butterworth, Gaussian)
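As a sketch of the basic frequency-domain workflow (an ideal low-pass filter in NumPy; the cutoff radius is arbitrary and the input path is a placeholder):

```python
import cv2
import numpy as np

gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE).astype(np.float64)

# Forward FFT, with the zero-frequency component shifted to the center.
F = np.fft.fftshift(np.fft.fft2(gray))

# Ideal low-pass filter: keep frequencies within a radius of the center.
h, w = gray.shape
y, x = np.ogrid[:h, :w]
mask = (x - w / 2) ** 2 + (y - h / 2) ** 2 <= 30 ** 2

# Filter, shift back, inverse FFT; the result is a smoothed image.
smoothed = np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
```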

HW5 - Image Restoration and Morphological Image Processing

In this part, we explore methods for improving image quality and enhancing features through restoration and morphological techniques. This section covers a range of topics, from eliminating unwanted artifacts to performing operations like dilation and erosion. We explore the fundamentals of these methods, their applications, and their effects on different types of images.

restoration.ipynb

  • Noise Distribution Analysis
  • Alpha-Trimmed Mean Filtering
  • Inverse Filtering for Image Restoration
  • High- and Low-Pass Butterworth Filters

morphological_operations.ipynb

  • Dilation and Erosion Functions
  • Boundary Identification through Textural Segmentation
  • Morphologic Opening and Closing
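A short sketch of the basic operations (illustrative; assumes a binary mask image at a placeholder path):

```python
import cv2
import numpy as np

binary = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)  # placeholder binary mask
kernel = np.ones((3, 3), np.uint8)

dilated = cv2.dilate(binary, kernel, iterations=1)  # grow foreground regions
eroded = cv2.erode(binary, kernel, iterations=1)    # shrink foreground regions

opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)   # remove small specks
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # fill small holes

boundary = cv2.subtract(dilated, eroded)  # simple morphological boundary
```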

HW6 - Segmentation and Active Contours

The final section focuses on the complex realm of image segmentation and contour detection. We employ a range of algorithms and techniques to identify and isolate specific structures within images, from basic circle detection using the Hough transform to sophisticated active contours known as "snakes". These techniques help us explore how to extract meaningful information from complex visual scenes.

non_maximum_suppression_and_hysteresis_thresholding.ipynb

  • Sobel and Prewitt Operators
  • Non-Maximum Suppression
  • Hysteresis Thresholding
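OpenCV's Canny detector bundles this whole chain, which makes for a one-line sketch (the thresholds are illustrative):

```python
import cv2

gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)  # placeholder input

# Canny = gradient computation + non-maximum suppression + hysteresis
# thresholding, with low/high thresholds of 50 and 150 here.
edges = cv2.Canny(gray, 50, 150)
```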

hough_circle_detection.py

  • Circle Detection using Hough Transform
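A hedged sketch of the OpenCV call (parameter values are typical starting points, not tuned values from the script):

```python
import cv2
import numpy as np

gray = cv2.imread("cells.png", cv2.IMREAD_GRAYSCALE)  # placeholder input
blurred = cv2.medianBlur(gray, 5)  # Hough works best on smoothed input

circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                           param1=100, param2=30, minRadius=5, maxRadius=60)

if circles is not None:
    for cx, cy, r in np.round(circles[0]).astype(int):
        print(f"circle at ({cx}, {cy}) with radius {r}")
```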

active_contours_snakes_method.ipynb

  • User Interface for Gathering Initial Contour Points
  • Calculating Equally Spaced 2D Contour Points
  • Calculating Snake External and Internal Energies
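For reference, scikit-image ships an active-contour implementation; here is a minimal sketch (a built-in sample image stands in for a medical scan, and the energy weights are illustrative):

```python
import numpy as np
from skimage import data
from skimage.filters import gaussian
from skimage.segmentation import active_contour

img = data.astronaut()[..., 0]  # stand-in grayscale image

# Initial contour: equally spaced points on a circle around the target.
theta = np.linspace(0, 2 * np.pi, 200)
init = np.column_stack([100 + 80 * np.sin(theta), 220 + 80 * np.cos(theta)])

# alpha/beta weight the internal (elasticity/smoothness) energy;
# the smoothed image supplies the external energy pulling toward edges.
snake = active_contour(gaussian(img, sigma=3), init,
                       alpha=0.015, beta=10, gamma=0.001)
```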

Contour Evolution

To give a visual summary of the exploration into active contours, below is an image illustrating the evolution of a contour after several iterations:

[Image: contour evolution after several iterations]


  • Review Article
  • Open access
  • Published: 12 April 2022

Machine learning for medical imaging: methodological failures and recommendations for the future

  • Gaël Varoquaux
  • Veronika Cheplygina (ORCID: orcid.org/0000-0003-0176-9324)

npj Digital Medicine, volume 5, Article number: 48 (2022)


  • Computer science
  • Medical research
  • Research data

Research in computer analysis of medical images bears many promises to improve patients’ health. However, a number of systematic challenges are slowing down the progress of the field, from limitations of the data, such as biases, to research incentives, such as optimizing for publication. In this paper we review roadblocks to developing and assessing methods. Building our analysis on evidence from the literature and data challenges, we show that at every step, potential biases can creep in. On a positive note, we also discuss ongoing efforts to counteract these problems. Finally, we provide recommendations on how to further address these problems in the future.


Introduction

Machine learning, the cornerstone of today’s artificial intelligence (AI) revolution, brings new promises to clinical practice with medical images 1,2,3. For example, to diagnose various conditions from medical images, machine learning has been shown to perform on par with medical experts 4. Software applications are starting to be certified for clinical use 5,6. Machine learning may be the key to realizing the vision of AI in medicine sketched several decades ago 7.

The stakes are high, and there is a staggering amount of research on machine learning for medical images. But this growth does not inherently lead to clinical progress. The higher volume of research could be aligned with academic incentives rather than the needs of clinicians and patients. For example, there can be an oversupply of papers showing state-of-the-art performance on benchmark data, but no practical improvement for the clinical problem. On the topic of machine learning for COVID, Roberts et al. 8 reviewed 62 published studies, but found none with potential for clinical use.

In this paper, we explore avenues to improve clinical impact of machine learning in medical imaging. After sketching the situation, documenting uneven progress in Section It’s not all about larger datasets, we study a number of failures frequent in medical imaging papers, at different steps of the “publishing lifecycle”: what data to use (Section Data, an imperfect window on the clinic), what methods to use and how to evaluate them (Section Evaluations that miss the target), and how to publish the results (Section Publishing, distorted incentives). In each section, we first discuss the problems, supported with evidence from previous research as well as our own analyses of recent papers. We then discuss a number of steps to improve the situation, sometimes borrowed from related communities. We hope that these ideas will help shape research practices that are even more effective at addressing real-world medical challenges.

It’s not all about larger datasets

The availability of large labeled datasets has enabled solving difficult machine learning problems, such as natural image recognition in computer vision, where datasets can contain millions of images. As a result, there is widespread hope that similar progress will happen in medical applications: algorithm research should eventually solve a clinical problem posed as a discrimination task. However, medical datasets are typically smaller, on the order of hundreds or thousands: ref. 9 shares a list of sixteen “large open source medical imaging datasets”, with sizes ranging from 267 to 65,000 subjects. Note that in medical imaging we refer to the number of subjects, but a subject may have multiple images, for example, taken at different points in time. For simplicity here we assume a diagnosis task with one image/scan per subject.

Few clinical questions come as well-posed discrimination tasks that can be naturally framed as machine-learning tasks. But, even for these, larger datasets have to date not led to the progress hoped for. One example is that of early diagnosis of Alzheimer’s disease (AD), which is a growing health burden due to the aging population. Early diagnosis would open the door to early-stage interventions, most likely to be effective. Substantial efforts have acquired large brain-imaging cohorts of aging individuals at risk of developing AD, on which early biomarkers can be developed using machine learning 10. As a result, there have been steady increases in the typical sample size of studies applying machine learning to develop computer-aided diagnosis of AD, or its predecessor, mild cognitive impairment. This growth is clearly visible in publications, as in Fig. 1a, a meta-analysis compiling 478 studies from 6 systematic reviews 4,11,12,13,14,15.

Figure 1: A meta-analysis across 6 review papers, covering more than 500 individual publications. The machine-learning problem is typically formulated as distinguishing various related clinical conditions: Alzheimer’s Disease (AD), Healthy Control (HC), and Mild Cognitive Impairment, which can signal prodromal Alzheimer’s. Distinguishing progressive mild cognitive impairment (pMCI) from stable mild cognitive impairment (sMCI) is the most relevant machine-learning task from the clinical standpoint. (a) Reported sample size as a function of the publication year of a study. (b) Reported prediction accuracy as a function of the number of subjects in a study. (c) The same plot, distinguishing studies published in different years.

However, the increase in data size (with the largest datasets containing over a thousand subjects) did not come with better diagnostic accuracy, in particular for the most clinically relevant question: distinguishing pathological versus stable evolution for patients with symptoms of prodromal Alzheimer’s (Fig. 1b). Rather, studies with larger sample sizes tend to report worse prediction accuracy. This is worrisome, as these larger studies are closer to real-life settings. On the other hand, research efforts across time did lead to improvements even on large, heterogeneous cohorts (Fig. 1c), as studies published later show improvements for large sample sizes (statistical analysis in Supplementary Information). Current medical-imaging datasets are much smaller than those that brought breakthroughs in computer vision. Although a one-to-one comparison of sizes cannot be made, as computer vision datasets have many classes with high variation (compared to few classes with less variation in medical imaging), reaching better generalization in medical imaging may require assembling significantly larger datasets, while avoiding biases created by opportunistic data collection, as described below.

Data, an imperfect window on the clinic

Datasets may be biased: reflect an application only partly.

Available datasets only partially reflect the clinical situation for a particular medical condition, leading to dataset bias 16. As an example, a dataset collected as part of a population study might have different characteristics than people who are referred to the hospital for treatment (higher incidence of a disease). If the researcher is unaware of the corresponding dataset bias, it can lead to important shortcomings in the study. Dataset bias occurs when the data used to build the decision model (the training data) has a different distribution than the data on which it should be applied 17 (the test data). To assess clinically-relevant predictions, the test data must match the actual target population, rather than be a random subset of the same data pool as the training data, as is common practice in machine-learning studies. With such a mismatch, algorithms which score high on benchmarks can perform poorly in real-world scenarios 18. In medical imaging, dataset bias has been demonstrated in chest X-rays 19,20,21, retinal imaging 22, brain imaging 23,24, histopathology 25, and dermatology 26. Such biases are revealed by training and testing a model across datasets from different sources, and observing a performance drop across sources.
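This evaluation protocol is easy to sketch on synthetic data (an illustration of the idea only, with made-up Gaussian "sources"; the studies cited above use actual multi-site medical data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def make_source(shift, n=1000):
    """Toy dataset 'source': same task, shifted feature distribution."""
    X = rng.normal(shift, 1.0, size=(n, 20))
    y = (X[:, 0] + rng.normal(0, 1, n) > shift).astype(int)
    return X, y

X_a, y_a = make_source(shift=0.0)  # e.g. data from hospital A
X_b, y_b = make_source(shift=1.5)  # e.g. hospital B: other scanners/population

X_tr, X_te, y_tr, y_te = train_test_split(X_a, y_a, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print("within-source accuracy:", model.score(X_te, y_te))  # looks good
print("cross-source accuracy: ", model.score(X_b, y_b))    # drops sharply
```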

There are many potential sources of dataset bias in medical imaging, introduced at different phases of the modeling process 27 . First, a cohort may not appropriately represent the range of possible patients and symptoms, a bias sometimes called spectrum bias 28 . A detrimental consequence is that model performance can be overestimated for different groups, for example between male and female individuals 21 , 26 . Yet medical imaging publications do not always report the demographics of the data.

Imaging devices or procedures may lead to specific measurement biases. A bias particularly harmful to clinically relevant automated diagnosis is when the data capture medical interventions. For instance, on chest X-ray datasets, images for the “pneumothorax” condition sometimes show a chest drain, which is a treatment for this condition, and which would not yet be present before diagnosis 29 . Similar spurious correlations can appear in skin lesion images due to markings placed by dermatologists next to the lesions 30 .

Labeling errors can also introduce biases. Expert human annotators may have systematic biases in the way they assign different labels 31, and it is seldom possible to compensate with multiple annotators. Using automatic methods to extract labels from patient reports can also lead to systematic errors 32. For example, a report on a follow-up scan that does not mention previously-known findings can lead to an incorrect “negative” label.

Dataset availability distorts research

The availability of datasets can influence which applications are studied more extensively. A striking example can be seen in two applications of oncology: detecting lung nodules, and detecting breast tumors in radiological images. Lung datasets are widely available on Kaggle or grand-challenge.org, contrasted with (to our knowledge) only one challenge focusing on mammograms. We look at the popularity of these topics, here defined by the fraction of papers focusing on lung or breast imaging, either in the literature on general medical oncology, or the literature on AI. In medical oncology this fraction is relatively constant across time for both lung and breast imaging, but in the AI literature lung imaging publications show a substantial increase in 2016 (Fig. 2; methodological details in Supplementary Information). We suspect that the Kaggle lung challenges published around that time contributed to this disproportionate increase. A similar point on dataset trends has been made throughout the history of machine learning in general 33.

Figure 2: The percentage of papers on lung cancer (in blue) vs breast cancer (in red), relative to all papers within two fields: medical oncology (solid line) and AI (dotted line). Details on how the papers are selected are given in the Supplementary Information. The percentages are relatively constant, except lung cancer in AI, which shows an increase after 2016.

Let us build awareness of data limitations

Addressing such problems arising from the data requires critical thinking about the choice of datasets, both at the project level (which datasets to select for a study or a challenge) and at a broader level (which datasets we work on as a community).

At the project level, the choice of the dataset will influence the models trained on the data, and the conclusions we can draw from the results. An important step is using datasets from multiple sources, or creating robust datasets from the start when feasible 9 . However, existing datasets can still be critically evaluated for dataset bias 34 , hidden subgroups of patients 29 , or mislabeled instances 35 . A checklist for such evaluation on computer vision datasets is presented in Zendel et al. 18 . When problems are discovered, relabeling a subset of the data can be a worthwhile investment 36 .

At the community level, we should foster understanding of the datasets’ limitations. Good documentation of datasets should describe their characteristics and data collection 37 . Distributed models should detail their limitations and the choices made to train them 38 .

Meta-analyses which look at evolution of dataset use in different areas are another way to reflect on current research efforts. For example, a survey of crowdsourcing in medical imaging 39 shows a different distribution of applications than surveys focusing on machine learning 1 , 2 . Contrasting more clinically-oriented venues to more technical venues can reveal opportunities for machine learning research.

Evaluations that miss the target

Evaluation error is often larger than algorithmic improvements.

Research on methods often focuses on outperforming other algorithms on benchmark datasets. But too strong a focus on benchmark performance can lead to diminishing returns, where increasingly large efforts achieve smaller and smaller performance gains. Is this also visible in the development of machine learning in medical imaging?

We studied performance improvements in 8 Kaggle medical-imaging challenges: 5 on detection or diagnosis of diseases and 3 on image segmentation (details in Supplementary Information ). We use the differences in algorithm performance between the public and private leaderboards (two test sets used in the challenge) to quantify the evaluation noise—the spread of performance differences between the public and private test sets (Fig. 3 ). We compare its distribution to the winner gap—the difference in performance between the best algorithm and the “top 10%” algorithm.
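
Both quantities can be computed directly from leaderboard scores. The NumPy sketch below uses synthetic placeholder scores for the public and private leaderboards; with real per-team scores, the same few lines apply unchanged.

```python
# Sketch: evaluation noise vs winner gap from challenge leaderboard scores.
# `public` and `private` are placeholder per-team scores on the two test sets.
import numpy as np

rng = np.random.default_rng(42)
public = np.sort(rng.uniform(0.80, 0.92, 100))[::-1]  # synthetic public scores
private = public + rng.normal(-0.01, 0.02, 100)       # noisy, slightly overfit

# Evaluation noise: spread of the public-vs-private score differences.
noise = (private - public).std()

# Winner gap: best private score minus the score at the top-10% rank.
ranked = np.sort(private)[::-1]
winner_gap = ranked[0] - ranked[int(0.1 * len(ranked))]

print(f"evaluation noise (std of differences): {noise:.3f}")
print(f"winner gap (winner vs top 10%):        {winner_gap:.3f}")
# If the winner gap is smaller than the noise, top-ranked gains are within error.
```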

Figure 3: The blue violin plot shows the evaluation noise , the distribution of differences between public and private leaderboards. A systematic shift between the public and private sets (positive meaning the private leaderboard is better) indicates overfitting or dataset bias, and the width of the distribution shows how noisy the evaluation is, i.e., how representative the public score is of the private score. The brown bar is the winner gap , the improvement between the top-most model (the winner) and the 10%-best model. Comparing this improvement to the shift and width of the public–private differences is instructive: if the winner gap is smaller, the 10%-best models reached diminishing returns and did not bring an actual improvement on new data.

Overall, 6 of the 8 challenges are in the diminishing returns category. For 5 challenges—lung cancer, schizophrenia, prostate cancer diagnosis and intracranial hemorrhage detection—the evaluation noise is worse than the winner gap. In other words, the gains made by the top 10% of methods are smaller than the expected noise when evaluating a method.

For another challenge, pneumothorax segmentation, the performance on the private set is worse than on the public set, revealing an overfit larger than the winner gap. Only two challenges (COVID-19 abnormality detection and nerve segmentation) display a winner gap larger than the evaluation noise, meaning that the winning method made substantial improvements compared to the top-10% competitor.

Improper evaluation procedures and leakage

Unbiased evaluation of model performance relies on training and testing the models with independent sets of data 40 . However, incorrect implementations of this procedure can easily leak information, leading to overoptimistic results. For example, some studies classifying ADHD based on brain imaging have engaged in circular analysis 41 , performing feature selection on the full dataset before cross-validation. Another example of leakage arises when repeated measures of an individual are split across the train and test sets: the algorithm then learns to recognize the individual patient rather than markers of a condition 42 .
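
The feature-selection pitfall is easy to reproduce. In the minimal scikit-learn sketch below, the data are pure noise, so any accuracy above chance is an artifact: selecting features on the full dataset before cross-validation inflates the score, whereas performing the selection inside a pipeline, refit on each training fold, does not.

```python
# Sketch: leakage from feature selection performed before cross-validation.
# The data are pure noise, so any accuracy above ~0.5 is an artifact.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5000))  # 100 patients, 5000 random "features"
y = rng.integers(0, 2, 100)       # random labels: no real signal

clf = LogisticRegression(max_iter=1000)

# WRONG: the selection step sees the whole dataset, including future test folds.
X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(clf, X_sel, y, cv=5).mean()

# RIGHT: the selection step is refit inside each training fold only.
pipe = make_pipeline(SelectKBest(f_classif, k=20), clf)
honest = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky accuracy:  {leaky:.2f}")   # typically far above chance
print(f"honest accuracy: {honest:.2f}")  # hovers around 0.5
```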

A related issue, yet more difficult to detect, is what we call “overfitting by observer”: even when using cross-validation, overfitting may still occur through the researcher adjusting the method to improve the observed cross-validation performance, which essentially includes the test folds in the validation set of the model. Skocik et al. 43 provide an illustration of this phenomenon by showing how adjusting the model in this way can lead to better-than-random cross-validation performance on randomly generated data. This can explain some of the overfitting visible in challenges (Section Evaluation error is often larger than algorithmic improvements), though with challenges a private test set reveals the overfitting, which is often not the case for published studies. Another recommendation for challenges would be to hold out several datasets (rather than a part of the same dataset), as is done, for example, in the Decathlon challenge 44 .
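
A small simulation in the spirit of that demonstration makes the effect concrete: on pure-noise data, repeatedly keeping whichever model variant scores best under cross-validation inflates the reported score, while a test set locked away from the start stays at chance. The model family and the number of attempts below are arbitrary illustrative choices.

```python
# Sketch: "overfitting by observer" on pure-noise data.
# Repeatedly picking the best of many tried configurations by CV score
# inflates the reported performance; a locked-away test set does not lie.
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 50))
y = rng.integers(0, 2, 120)
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

best_score, best_model = -np.inf, None
for k in range(1, 40):  # the "researcher" tries 39 variants against the same folds
    model = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(model, X_dev, y_dev, cv=5).mean()
    if score > best_score:
        best_score, best_model = score, model

test_score = best_model.fit(X_dev, y_dev).score(X_test, y_test)
print(f"best cross-validation score after tuning: {best_score:.2f}")  # inflated
print(f"score on the locked-away test set:        {test_score:.2f}")  # ~ chance
```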

Metrics that do not reflect what we want

Evaluating models requires choosing a suitable metric. However, our understanding of “suitable” may change over time. For example, an image-similarity metric that was widely used to evaluate image registration algorithms was later shown to be ineffective, as scrambled images could lead to high scores 45 .

In medical image segmentation, Maier-Hein et al. 46 review 150 challenges and show that the rankings produced by the typical metrics are sensitive to variants of the same metric, casting doubt on the objectivity of any individual ranking.

Important metrics may be missing from evaluation. Next to typical classification metrics (sensitivity, specificity, area under the curve), several authors argue for a calibration metric that compares the predicted and observed probabilities 28 , 47 .
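
As an illustration, calibration can be assessed alongside discrimination with a reliability curve and the Brier score. The sketch below applies scikit-learn's tools to hypothetical labels and predicted probabilities; the probability model is a synthetic placeholder.

```python
# Sketch: assessing calibration next to discrimination metrics.
# `y_true` and `y_prob` are hypothetical labels and predicted probabilities.
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
y_prob = np.clip(y_true * 0.6 + rng.normal(0.2, 0.2, 1000), 0, 1)  # placeholder

print(f"AUC (discrimination): {roc_auc_score(y_true, y_prob):.2f}")
print(f"Brier score (lower is better): {brier_score_loss(y_true, y_prob):.3f}")

# Reliability diagram: observed event frequency per bin of predicted probability.
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=10)
for p, f in zip(prob_pred, prob_true):
    print(f"predicted {p:.2f} -> observed {f:.2f}")  # equal values = well calibrated
```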

Finally, the metrics used may not be synonymous with practical improvement 48 , 49 . For example, typical metrics in computer vision do not reflect important aspects of image recognition, such as robustness to out-of-distribution examples 49 . Similarly, in medical imaging, improvements in traditional metrics may not translate to better clinical outcomes; in a segmentation application, for example, robustness may matter more than an accurate delineation.

Incorrectly chosen baselines

Developing new algorithms builds on comparisons to baselines. However, if these baselines are poorly chosen, the reported improvement may be misleading.

Baselines may not properly account for recent progress, as revealed in machine-learning applications to healthcare 50 , but also in other applications of machine learning 51 , 52 , 53 .

Conversely, one should not forget simple approaches effective for the problem at hand. For example, Wen et al. 14 show that convolutional neural networks do not outperform support vector machines for Alzheimer’s disease diagnosis from brain imaging.

Finally, minute implementation details of algorithms may matter, and practitioners are often unaware of these implementation factors 54 .

Statistical significance not tested, or misunderstood

Experimental results are by nature noisy: results may depend on which specific samples were used to train the models, on the random initializations, or on small differences in hyper-parameters 55 . However, benchmarking predictive models currently lacks well-adopted statistical practices to separate noise from generalizable findings.

A first, well-documented source of brittleness arises from machine-learning experiments with too-small sample sizes 56 . Indeed, testing predictive modeling requires many samples, more than conventional inferential studies, otherwise the measured prediction accuracy may be a poor estimate of real-life performance. Sample sizes are growing, albeit slowly 57 . On a positive note, a meta-analysis of public vs private leaderboards on Kaggle 58 suggests that overfitting is less of an issue with “large enough” test data (at least several thousand samples).
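
The impact of test-set size on evaluation uncertainty can be read off a simple binomial confidence interval around a measured accuracy. The sketch below uses the Wilson interval from statsmodels (an external dependency assumed available); the accuracy of 0.80 is a placeholder value.

```python
# Sketch: binomial confidence intervals on a measured accuracy of 0.80,
# for increasing test-set sizes (Wilson interval).
from statsmodels.stats.proportion import proportion_confint

accuracy = 0.80  # placeholder measured accuracy
for n_test in (100, 300, 1000, 10000):
    n_correct = int(accuracy * n_test)
    low, high = proportion_confint(n_correct, n_test, alpha=0.05, method="wilson")
    print(f"n={n_test:>6}: 95% CI = [{low:.3f}, {high:.3f}], width = {high - low:.3f}")
# Small test sets give error bars of several percentage points, often wider
# than the improvement a new method claims over its baseline.
```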

Another challenge is that strong validation of a method requires it to be robust to details of the data. Hence validation should go beyond a single dataset, and rather strive for statistical consensus across multiple datasets 59 . Yet, the corresponding statistical procedures require dozens of datasets to establish significance and are seldom used in practice. Rather, medical imaging research often reuses the same datasets across studies, which raises the risk of finding an algorithm that performs well by chance, in an implicit multiple comparison problem 60 .

But overall, medical imaging research seldom analyzes how likely empirical results are to be due to chance: only 6% of the segmentation challenges surveyed 61 , and 15% of 410 popular computer-science papers published by the ACM 62 , used a statistical test.

However, null-hypothesis tests are often misinterpreted 63 , with two notable pitfalls: (1) the lack of a statistically significant result does not demonstrate the absence of an effect, and (2) any trivial effect can be significant given enough data 64 , 65 . For these reasons, Bouthillier et al. 66 recommend replacing traditional null-hypothesis testing with superiority testing : testing that the improvement is above a given threshold.
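
A minimal form of superiority testing compares per-seed improvements against a pre-specified margin. In the sketch below, the scores are hypothetical and the margin delta is an assumption that should come from clinical or practical relevance, not from the data.

```python
# Sketch: superiority test: is the improvement larger than a margin delta?
# `new` and `old` are hypothetical per-seed scores of two methods.
import numpy as np
from scipy import stats

new = np.array([0.842, 0.851, 0.839, 0.848, 0.845, 0.850, 0.841, 0.847])
old = np.array([0.826, 0.834, 0.822, 0.830, 0.829, 0.833, 0.825, 0.831])
delta = 0.01  # minimal improvement considered meaningful (an assumption)

# One-sided test of H0: mean(new - old) <= delta.
t, p = stats.ttest_1samp(new - old, popmean=delta, alternative="greater")
print(f"mean improvement: {(new - old).mean():.3f}")
print(f"p-value for exceeding the margin {delta}: {p:.3f}")
```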

Let us redefine evaluation

Higher standards for benchmarking.

Good machine-learning benchmarks are difficult. We compile below several recognized best practices for medical machine learning evaluation 28 , 40 , 67 , 68 :

Safeguarding from data leakage by separating out all test data from the start, before any data transformation.

A documented way of selecting model hyper-parameters (including architectural parameters for neural networks, the use of additional unlabeled datasets, or transfer learning 2 ), without ever using data from the test set.

Enough data in the test set to bring statistical power: at least several hundred samples, ideally thousands or more 9 , with confidence intervals on the reported performance metric—see Supplementary Information . In general, more research on appropriate sample sizes for machine-learning studies would be helpful.

Rich data to represent the diversity of patients and disease heterogeneity, ideally multi-institutional data including all relevant patient demographics and disease states, with explicit inclusion criteria; evaluating on other cohorts with different recruitment criteria goes the extra mile to establish external validity 69 , 70 .

Strong baselines that reflect the state of the art of machine-learning research, but also historical solutions including clinical methodologies not necessarily relying on medical imaging.

A discussion of the variability of the results due to arbitrary choices (random seeds) and data sources, with an eye on statistical significance—see Supplementary Information .

Using different quantitative metrics to capture the different aspects of the clinical problem and relating them to relevant clinical performance metrics. In particular, the potential health benefits of detecting the outcome of interest should be used to choose the right trade-off between false detections and misses 71 ; a sketch of such a cost-based threshold choice follows this list.

Adding qualitative accounts and involving groups that will be most affected by the application in the metric design 72 .
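
Returning to the metric-choice item above, one concrete instantiation is picking the decision threshold that minimizes an expected clinical cost. In the sketch below, the scores are synthetic and the 20:1 cost of a miss relative to a false alarm is an assumption to be set by the clinical context.

```python
# Sketch: choosing a decision threshold from clinical costs, not accuracy.
# Cost values are assumptions: a missed case is taken as 20x worse than
# a false alarm; real values must come from the clinical context.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 2000)
y_score = np.clip(y_true * 0.4 + rng.normal(0.3, 0.25, 2000), 0, 1)  # placeholder

cost_miss, cost_false_alarm = 20.0, 1.0
prevalence = y_true.mean()

fpr, tpr, thresholds = roc_curve(y_true, y_score)
expected_cost = (cost_miss * (1 - tpr) * prevalence
                 + cost_false_alarm * fpr * (1 - prevalence))
best = expected_cost.argmin()
print(f"chosen threshold: {thresholds[best]:.2f} "
      f"(sensitivity {tpr[best]:.2f}, false-positive rate {fpr[best]:.2f})")
```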

More than beating the benchmark

Even with proper validation and statistical-significance testing, measuring a tiny improvement on a benchmark is seldom useful. Rather, one view is that, beyond rejecting a null hypothesis, a method should be accepted based on evidence that it brings a sizable improvement upon existing solutions. This type of criterion is related to the superiority tests sometimes used in clinical trials 73 , 74 , 75 . These tests are easy to implement in predictive-modeling benchmarks, as they amount to comparing the observed improvement to the variation of the results due to arbitrary choices such as data sampling or random seeds 55 .

Organizing blinded challenges, with a hidden test set, mitigates the winner’s curse. But to bring progress, challenges should not focus only on the winner. Instead, more can be learned by comparing the competing methods and analyzing the determinants of success, as well as failure cases.

Evidence-based medicine good practices

A machine-learning algorithm deployed in clinical practice is a health intervention. There is a well-established practice for evaluating the impact of health interventions, built mostly on randomized clinical trials 76 . These require actually modifying patients’ treatments and thus should be run only after thorough evaluation on historical data.

A solid trial evaluates a well-chosen measure of patient health outcome, as opposed to the predictive performance of an algorithm. Many indirect mechanisms may affect this outcome, including how the full care process adapts to the computer-aided decision. For instance, a positive consequence of even imperfect predictions may be reallocating human resources to complex cases, while a negative consequence may be over-confidence leading to an increase in diagnostic errors. Cluster-randomized trials can account for how modifications at the level of the care unit impact the individual patient: care units, rather than individuals, are randomly allocated to receive the intervention (the machine-learning algorithm) 77 . Often, double blinding is impossible: the care provider is aware of which arm of the study is used, the baseline condition or the system under evaluation. Providers’ expectations can contribute to the success of a treatment, for instance via indirect placebo or nocebo effects 78 , making objective evaluation of the health benefits challenging if these are small.

Publishing, distorted incentives

No incentive for clarity.

The publication process does not create incentives for clarity. Efforts to impress may give rise to unnecessary “mathiness” of papers or suggestive language 79 (such as “human-level performance”).

Important details may be omitted, from ablation experiments showing what part of the method drives improvements 79 , to reporting how algorithms were evaluated in a challenge 46 . This in turn undermines reproducibility: being able to reproduce the exact results or even draw the same conclusions 80 , 81 .

Optimizing for publication

As researchers, our goal should be to solve scientific problems. Yet the reality of the culture we exist in can distort this objective. Goodhart’s law summarizes the problem well: when a measure becomes a target, it ceases to be a good measure . As our academic incentive system is based on publications, it erodes their scientific content via Goodhart’s law.

Methods publications are selected for their novelty. Yet comparing 179 classifiers on 121 datasets shows no statistically significant differences between the top methods 82 . In order to sustain novelty, researchers may introduce unnecessary complexity into methods, complexity that does not improve their predictions but rather contributes to technical debt, making systems harder to maintain and deploy 83 .

Another emphasized metric is obtaining “state-of-the-art” results, which leads to several of the evaluation problems outlined in the section Evaluations that miss the target. The pressure to publish “good” results can aggravate methodological loopholes 84 , for instance gaming the evaluation in machine learning 85 . It is then all too appealing to find after-the-fact theoretical justifications of positive yet fragile empirical findings. This phenomenon, known as HARKing (hypothesizing after the results are known) 86 , has been documented in machine learning 87 and in computer science in general 62 .

Finally, the selection of publications creates the so-called “file drawer problem” 88 : positive results, some due to experimental flukes, are more likely to be published than corresponding negative findings. For example, among the 410 most-downloaded papers from the ACM, 97% of the papers that used significance testing had a finding with a p -value below 0.05 62 . It seems highly unlikely that only 3% of the initial working hypotheses—even for impactful work—turned out to be unconfirmed.

Let us improve our publication norms

Fortunately, there are various avenues for improving reporting and transparency. For instance, the growing set of open datasets could be leveraged for collaborative work beyond the capacities of a single team 89 . The set of metrics studied could then be broadened, shifting the publication focus away from a single-dimension benchmark. More metrics can indeed help in understanding a method’s strengths and weaknesses 41 , 90 , 91 , exploring for instance calibration metrics 28 , 47 , 92 or learning curves 93 . The medical-research literature has several reporting guidelines for prediction studies 67 , 94 , 95 . They underline many points raised in previous sections: reporting on how representative the study sample is, on the separation between train and test data, on the motivation for the choice of outcome and evaluation metrics, and so forth. Unfortunately, algorithmic research in medical imaging seldom refers to these guidelines.

Methods should be studied on more than prediction performance: reproducibility 81 , carbon footprint 96 , or a broad evaluation of costs should be put in perspective with the real-world patient outcomes of a putative clinical use of the algorithms 97 .

Preregistration and registered reports can bring more robustness and trust: the motivation and experimental setup of a paper are reviewed before empirical results are available, and the paper is thus accepted before the experiments are run 98 . Translating this idea to machine learning faces the challenge that new data are seldom acquired in a machine-learning study, yet it would bring sizeable benefits 62 , 99 .

More generally, accelerating progress in science calls for accepting that some published findings will turn out to be wrong 100 . Popularizing different types of publications may help, for example negative results 101 , replication studies 102 , commentaries 103 , reflections on the field 68 , or the recent NeurIPS Retrospectives workshops. Such initiatives should ideally be led by more established academics and be welcoming of newcomers 104 .

Conclusions

Despite great promises, the extensive research in medical applications of machine learning seldom achieves clinical impact. Studying the academic literature and data-science challenges reveals troubling trends: accuracy on diagnostic tasks progresses more slowly on research cohorts that are closer to real-life settings; methods research is often guided by dataset availability rather than clinical relevance; and many model developments bring improvements smaller than the evaluation error. We have surveyed challenges of clinical machine-learning research that can explain these difficulties. The challenges start with the choice of datasets, plague model evaluation, and are amplified by publication incentives. Understanding these mechanisms enables us to suggest specific strategies for improving the various steps of the research cycle, promoting publication best practices 105 . None of these strategies are silver-bullet solutions; rather, they require changing procedures, norms, and goals. But implementing them will help fulfill the promise of machine learning in healthcare: better health outcomes for patients with less burden on the care system.

Data availability

For reproducibility, all data used in our analyses are available on https://github.com/GaelVaroquaux/ml_med_imaging_failures .

Code availability

For reproducibility, all code for our analyses is available on https://github.com/GaelVaroquaux/ml_med_imaging_failures .

Litjens, G. et al. A survey on deep learning in medical image analysis. Med. Image Anal. 42 , 60–88 (2017).

Cheplygina, V., de Bruijne, M. & Pluim, J. P. W. Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis. Med. Image Anal. 54 , 280–296 (2019).

Zhou, S. K. et al. A review of deep learning in medical imaging: Image traits, technology trends, case studies with progress highlights, and future promises. Proceedings of the IEEE 1–19 (2020).

Liu, X. et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. The Lancet Digital Health (2019).

Topol, E. J. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25 , 44–56 (2019).

Sendak, M. P. et al. A path for translation of machine learning products into healthcare delivery. Eur. Med. J. Innov. 10 , 19–00172 (2020).

Schwartz, W. B., Patil, R. S. & Szolovits, P. Artificial intelligence in medicine (1987).

Roberts, M. et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat. Mach. Intell. 3 , 199–217 (2021).

Willemink, M. J. et al. Preparing medical imaging data for machine learning. Radiology 192224 (2020).

Mueller, S. G. et al. Ways toward an early diagnosis in Alzheimer’s disease: the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Alzheimer’s Dement. 1 , 55–66 (2005).

Dallora, A. L., Eivazzadeh, S., Mendes, E., Berglund, J. & Anderberg, P. Machine learning and microsimulation techniques on the prognosis of dementia: A systematic literature review. PLoS ONE 12 , e0179804 (2017).

Arbabshirani, M. R., Plis, S., Sui, J. & Calhoun, V. D. Single subject prediction of brain disorders in neuroimaging: Promises and pitfalls. NeuroImage 145 , 137–165 (2017).

Sakai, K. & Yamada, K. Machine learning studies on major brain diseases: 5-year trends of 2014–2018. Jpn. J. Radiol. 37 , 34–72 (2019).

Wen, J. et al. Convolutional neural networks for classification of Alzheimer’s disease: overview and reproducible evaluation. Medical Image Analysis 101694 (2020).

Ansart, M. et al. Predicting the progression of mild cognitive impairment using machine learning: a systematic, quantitative and critical review. Medical Image Analysis 101848 (2020).

Torralba, A. & Efros, A. A. Unbiased look at dataset bias. In Computer Vision and Pattern Recognition (CVPR) , 1521–1528 (2011).

Dockès, J., Varoquaux, G. & Poline, J.-B. Preventing dataset shift from breaking machine-learning biomarkers. GigaScience 10 , giab055 (2021).

Zendel, O., Murschitz, M., Humenberger, M. & Herzner, W. How good is my test data? introducing safety analysis for computer vision. Int. J. Computer Vis. 125 , 95–109 (2017).

Pooch, E. H., Ballester, P. L. & Barros, R. C. Can we trust deep learning models diagnosis? the impact of domain shift in chest radiograph classification. In MICCAI workshop on Thoracic Image Analysis (Springer, 2019).

Zech, J. R. et al. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med. 15 , e1002683 (2018).

Larrazabal, A. J., Nieto, N., Peterson, V., Milone, D. H. & Ferrante, E. Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis. Proceedings of the National Academy of Sciences (2020).

Tasdizen, T., Sajjadi, M., Javanmardi, M. & Ramesh, N. Improving the robustness of convolutional networks to appearance variability in biomedical images. In International Symposium on Biomedical Imaging (ISBI), 549–553 (IEEE, 2018).

Wachinger, C., Rieckmann, A., Pölsterl, S. & Initiative, A. D. N. et al. Detect and correct bias in multi-site neuroimaging datasets. Med. Image Anal. 67 , 101879 (2021).

Ashraf, A., Khan, S., Bhagwat, N., Chakravarty, M. & Taati, B. Learning to unlearn: building immunity to dataset bias in medical imaging studies. In NeurIPS workshop on Machine Learning for Health (ML4H) (2018).

Yu, X., Zheng, H., Liu, C., Huang, Y. & Ding, X. Classify epithelium-stroma in histopathological images based on deep transferable network. J. Microsc. 271 , 164–173 (2018).

Abbasi-Sureshjani, S., Raumanns, R., Michels, B. E., Schouten, G. & Cheplygina, V. Risk of training diagnostic algorithms on data with demographic bias. In Interpretable and Annotation-Efficient Learning for Medical Image Computing , 183–192 (Springer, 2020).

Suresh, H. & Guttag, J. V. A framework for understanding unintended consequences of machine learning. arXiv preprint arXiv:1901.10002 (2019).

Park, S. H. & Han, K. Methodologic guide for evaluating clinical performance and effect of artificial intelligence technology for medical diagnosis and prediction. Radiology 286 , 800–809 (2018).

Oakden-Rayner, L., Dunnmon, J., Carneiro, G. & Ré, C. Hidden stratification causes clinically meaningful failures in machine learning for medical imaging. In ACM Conference on Health, Inference, and Learning, 151–159 (2020).

Winkler, J. K. et al. Association between surgical skin markings in dermoscopic images and diagnostic performance of a deep learning convolutional neural network for melanoma recognition. JAMA Dermatol. 155 , 1135–1141 (2019).

Joskowicz, L., Cohen, D., Caplan, N. & Sosna, J. Inter-observer variability of manual contour delineation of structures in CT. Eur. Radiol. 29 , 1391–1399 (2019).

Oakden-Rayner, L. Exploring large-scale public medical image datasets. Academic Radiol. 27 , 106–112 (2020).

Langley, P. The changing science of machine learning. Mach. Learn. 82 , 275–279 (2011).

Rabanser, S., Günnemann, S. & Lipton, Z. C. Failing loudly: an empirical study of methods for detecting dataset shift. In Neural Information Processing Systems (NeurIPS) (2018).

Rädsch, T. et al. What your radiologist might be missing: using machine learning to identify mislabeled instances of X-ray images. In Hawaii International Conference on System Sciences (HICSS) (2020).

Beyer, L., Hénaff, O. J., Kolesnikov, A., Zhai, X. & Oord, A. v. d. Are we done with ImageNet? arXiv preprint arXiv:2006.07159 (2020).

Gebru, T. et al. Datasheets for datasets. In Workshop on Fairness, Accountability, and Transparency in Machine Learning (2018).

Mitchell, M. et al. Model cards for model reporting. In Fairness, Accountability, and Transparency (FAccT) , 220–229 (ACM, 2019).

Ørting, S. N. et al. A survey of crowdsourcing in medical image analysis. Hum. Comput. 7 , 1–26 (2020).

Poldrack, R. A., Huckins, G. & Varoquaux, G. Establishment of best practices for evidence for prediction: a review. JAMA Psychiatry 77 , 534–540 (2020).

Pulini, A. A., Kerr, W. T., Loo, S. K. & Lenartowicz, A. Classification accuracy of neuroimaging biomarkers in attention-deficit/hyperactivity disorder: Effects of sample size and circular analysis. Biol. Psychiatry.: Cogn. Neurosci. Neuroimaging 4 , 108–120 (2019).

Saeb, S., Lonini, L., Jayaraman, A., Mohr, D. C. & Kording, K. P. The need to approximate the use-case in clinical machine learning. Gigascience 6 , gix019 (2017).

Hosseini, M. et al. I tried a bunch of things: The dangers of unexpected overfitting in classification of brain data. Neuroscience & Biobehavioral Reviews (2020).

Simpson, A. L. et al. A large annotated medical image dataset for the development and evaluation of segmentation algorithms. arXiv preprint arXiv:1902.09063 (2019).

Rohlfing, T. Image similarity and tissue overlaps as surrogates for image registration accuracy: widely used but unreliable. IEEE Trans. Med. Imaging 31 , 153–163 (2011).

Maier-Hein, L. et al. Why rankings of biomedical image analysis competitions should be interpreted with care. Nat. Commun. 9 , 5217 (2018).

Van Calster, B., McLernon, D. J., Van Smeden, M., Wynants, L. & Steyerberg, E. W. Calibration: the Achilles heel of predictive analytics. BMC Med. 17 , 1–7 (2019).

Wagstaff, K. L. Machine learning that matters. In International Conference on Machine Learning (ICML), 529–536 (2012).

Shankar, V. et al. Evaluating machine accuracy on imagenet. In International Conference on Machine Learning (ICML) (2020).

Bellamy, D., Celi, L. & Beam, A. L. Evaluating progress on machine learning for longitudinal electronic healthcare data. arXiv preprint arXiv:2010.01149 (2020).

Oliver, A., Odena, A., Raffel, C., Cubuk, E. D. & Goodfellow, I. J. Realistic evaluation of semi-supervised learning algorithms. In Neural Information Processing Systems (NeurIPS) (2018).

Dacrema, M. F., Cremonesi, P. & Jannach, D. Are we really making much progress? a worrying analysis of recent neural recommendation approaches. In ACM Conference on Recommender Systems, 101–109 (2019).

Musgrave, K., Belongie, S. & Lim, S.-N. A metric learning reality check. In European Conference on Computer Vision, 681–699 (Springer, 2020).

Pham, H. V. et al. Problems and opportunities in training deep learning software systems: an analysis of variance. In IEEE/ACM International Conference on Automated Software Engineering, 771–783 (2020).

Bouthillier, X. et al. Accounting for variance in machine learning benchmarks. In Machine Learning and Systems (2021).

Varoquaux, G. Cross-validation failure: small sample sizes lead to large error bars. NeuroImage 180 , 68–77 (2018).

Szucs, D. & Ioannidis, J. P. Sample size evolution in neuroimaging research: an evaluation of highly-cited studies (1990–2012) and of latest practices (2017–2018) in high-impact journals. NeuroImage 117164 (2020).

Roelofs, R. et al. A meta-analysis of overfitting in machine learning. In Neural Information Processing Systems (NeurIPS), 9179–9189 (2019).

Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7 , 1–30 (2006).

Thompson, W. H., Wright, J., Bissett, P. G. & Poldrack, R. A. Meta-research: dataset decay and the problem of sequential analyses on open datasets. eLife 9 , e53498 (2020).

Maier-Hein, L. et al. Is the winner really the best? a critical analysis of common research practice in biomedical image analysis competitions. Nature Communications (2018).

Cockburn, A., Dragicevic, P., Besançon, L. & Gutwin, C. Threats of a replication crisis in empirical computer science. Commun. ACM 63 , 70–79 (2020).

Gigerenzer, G. Statistical rituals: the replication delusion and how we got there. Adv. Methods Pract. Psychol. Sci. 1 , 198–218 (2018).

Benavoli, A., Corani, G. & Mangili, F. Should we really use post-hoc tests based on mean-ranks? J. Mach. Learn. Res. 17 , 152–161 (2016).

Berrar, D. Confidence curves: an alternative to null hypothesis significance testing for the comparison of classifiers. Mach. Learn. 106 , 911–949 (2017).

Bouthillier, X., Laurent, C. & Vincent, P. Unreproducible research is reproducible. In International Conference on Machine Learning (ICML), 725–734 (2019).

Norgeot, B. et al. Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist. Nat. Med. 26 , 1320–1324 (2020).

Drummond, C. Machine learning as an experimental science (revisited). In AAAI workshop on evaluation methods for machine learning, 1–5 (2006).

Steyerberg, E. W. & Harrell, F. E. Prediction models need appropriate internal, internal–external, and external validation. J. Clin. Epidemiol. 69 , 245–247 (2016).

Woo, C.-W., Chang, L. J., Lindquist, M. A. & Wager, T. D. Building better biomarkers: brain models in translational neuroimaging. Nat. Neurosci. 20 , 365 (2017).

Van Calster, B. et al. Reporting and interpreting decision curve analysis: a guide for investigators. Eur. Urol. 74 , 796 (2018).

Thomas, R. & Uminsky, D. The problem with metrics is a fundamental problem for AI. arXiv preprint arXiv:2002.08512 (2020).

European Agency for the Evaluation of Medicinal Products. Points to consider on switching between superiority and non-inferiority. Br. J. Clin. Pharmacol. 52 , 223–228 (2001).

D’Agostino Sr, R. B., Massaro, J. M. & Sullivan, L. M. Non-inferiority trials: design concepts and issues–the encounters of academic consultants in statistics. Stat. Med. 22 , 169–186 (2003).

Christensen, E. Methodology of superiority vs. equivalence trials and non-inferiority trials. J. Hepatol. 46 , 947–954 (2007).

Hendriksen, J. M., Geersing, G.-J., Moons, K. G. & de Groot, J. A. Diagnostic and prognostic prediction models. J. Thrombosis Haemost. 11 , 129–141 (2013).

Campbell, M. K., Elbourne, D. R. & Altman, D. G. Consort statement: extension to cluster randomised trials. BMJ 328 , 702–708 (2004).

Blasini, M., Peiris, N., Wright, T. & Colloca, L. The role of patient–practitioner relationships in placebo and nocebo phenomena. Int. Rev. Neurobiol. 139 , 211–231 (2018).

Lipton, Z. C. & Steinhardt, J. Troubling trends in machine learning scholarship: some ML papers suffer from flaws that could mislead the public and stymie future research. Queue 17 , 45–77 (2019).

Tatman, R., VanderPlas, J. & Dane, S. A practical taxonomy of reproducibility for machine learning research. In ICML workshop on Reproducibility in Machine Learning (2018).

Gundersen, O. E. & Kjensmo, S. State of the art: Reproducibility in artificial intelligence. In AAAI Conference on Artificial Intelligence (2018).

Fernández-Delgado, M., Cernadas, E., Barro, S. & Amorim, D. Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15 , 3133–3181 (2014).

Sculley, D. et al. Hidden technical debt in machine learning systems. In Neural Information Processing Systems (NeurIPS), 2503–2511 (2015).

Ioannidis, J. P. A. Why most published research findings are false. PLoS Med. 2 , e124 (2005).

Teney, D. et al. On the value of out-of-distribution testing: an example of Goodhart’s Law. In Neural Information Processing Systems (NeurIPS) (2020).

Kerr, N. L. HARKing: hypothesizing after the results are known. Personal. Soc. Psychol. Rev. 2 , 196–217 (1998).

Gencoglu, O. et al. HARK side of deep learning–from grad student descent to automated machine learning. arXiv preprint arXiv:1904.07633 (2019).

Rosenthal, R. The file drawer problem and tolerance for null results. Psychological Bull. 86 , 638 (1979).

Kellmeyer, P. Ethical and legal implications of the methodological crisis in neuroimaging. Camb. Q. Healthc. Ethics 26 , 530–554 (2017).

Japkowicz, N. & Shah, M. Performance evaluation in machine learning. In Machine Learning in Radiation Oncology , 41–56 (Springer, 2015).

Santafe, G., Inza, I. & Lozano, J. A. Dealing with the evaluation of supervised classification algorithms. Artif. Intell. Rev. 44 , 467–508 (2015).

Han, K., Song, K. & Choi, B. W. How to develop, validate, and compare clinical prediction models involving radiological parameters: study design and statistical methods. Korean J. Radiol. 17 , 339–350 (2016).

Richter, A. N. & Khoshgoftaar, T. M. Sample size determination for biomedical big data with limited labels. Netw. Modeling Anal. Health Inform. Bioinforma. 9 , 12 (2020).

Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (tripod): the tripod statement. J. Br. Surg. 102 , 148–158 (2015).

Wolff, R. F. et al. Probast: a tool to assess the risk of bias and applicability of prediction model studies. Ann. Intern. Med. 170 , 51–58 (2019).

Henderson, P. et al. Towards the systematic reporting of the energy and carbon footprints of machine learning. J. Mach. Learn. Res. 21 , 1–43 (2020).

Bowen, A. & Casadevall, A. Increasing disparities between resource inputs and outcomes, as measured by certain health deliverables, in biomedical research. Proc. Natl Acad. Sci. 112 , 11335–11340 (2015).

Chambers, C. D., Dienes, Z., McIntosh, R. D., Rotshtein, P. & Willmes, K. Registered reports: realigning incentives in scientific publishing. Cortex 66 , A1–A2 (2015).

Forde, J. Z. & Paganini, M. The scientific method in the science of machine learning. In ICLR workshop on Debugging Machine Learning Models (2019).

Firestein, S. Failure: Why Science Is So Successful (Oxford University Press, 2015).

Borji, A. Negative results in computer vision: a perspective. Image Vis. Comput. 69 , 1–8 (2018).

Voets, M., Møllersen, K. & Bongo, L. A. Replication study: Development and validation of deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. arXiv preprint arXiv:1803.04337 (2018).

Wilkinson, J. et al. Time to reality check the promises of machine learning-powered precision medicine. The Lancet Digital Health (2020).

Whitaker, K. & Guest, O. #bropenscience is broken science. Psychologist 33 , 34–37 (2020).

Kakarmath, S. et al. Best practices for authors of healthcare-related artificial intelligence manuscripts. NPJ Digital Med. 3 , 134–134 (2020).

Acknowledgements

We would like to thank Alexandra Elbakyan for help with the literature review. We thank Pierre Dragicevic for providing feedback on early versions of this manuscript, and Pierre Bartet for comments on the preprint. We also thank the reviewers, Jack Wilkinson and Odd Erik Gundersen, for excellent comments which improved our manuscript. GV acknowledges funding from grant ANR-17-CE23-0018, DirtyData.

Author information

Authors and affiliations.

INRIA, Versailles, France

Gaël Varoquaux

McGill University, Montreal, Canada

Mila, Montreal, Canada

IT University of Copenhagen, Copenhagen, Denmark

Veronika Cheplygina

Contributions

Both V.C. and G.V. collected the data; conceived, designed, and performed the analysis; reviewed the literature; and wrote the paper.

Corresponding authors

Correspondence to Gaël Varoquaux or Veronika Cheplygina .

Ethics declarations

Competing interests.

The authors declare that there are no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Latex source files

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Varoquaux, G., Cheplygina, V. Machine learning for medical imaging: methodological failures and recommendations for the future. npj Digit. Med. 5 , 48 (2022). https://doi.org/10.1038/s41746-022-00592-y

Received : 21 June 2021

Accepted : 09 March 2022

Published : 12 April 2022

DOI : https://doi.org/10.1038/s41746-022-00592-y


Image Processing Using Artificial Intelligence: Case Study on Classification of High-Dimensional Remotely Sensed Images

Recent advancements in artificial intelligence (AI) have created new vistas in various applications of image processing during the last two decades. Image-processing techniques have been adopted in many fields, such as electronics and telecommunications, medical science, remote sensing (RS), biotechnology, and robotics. RS is one area where the interpretation of images and the associated processing techniques are very important at different stages of earth-observation studies, and it has long drawn research interest in the advancement of image-processing techniques. RS research focuses on the classification of satellite images, as the deliverables of the classification process are the basic foundation for many application areas in natural resources and the environment. During the last few decades, considerable effort has been made to examine the efficiency of conventional image-processing techniques for enhancing image quality as well as the classification accuracy of RS images. The current literature shows the significant potential of machine learning (ML) approaches in object detection and pattern recognition, with a high success rate. The classification process raises considerable issues when the characteristics of the landscape become too complex, and ML-based classification approaches have been found effective on many benchmark RS datasets. This chapter explores the issues and challenges of image-processing techniques in the classification of high-dimensional RS datasets. It discusses the potential of AI/ML-based approaches by showcasing a case study on the classification of airborne ROSIS-3 hyperspectral sensor data. Finally, it provides concluding remarks and possible research directions for object detection and classification from very-high-resolution (VHR) RS datasets, such as unmanned aerial vehicle (UAV) imagery, using deep learning (DL) techniques.

Medical image analysis using deep learning algorithms

Mengfang Li

1 The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China

Yuanyuan Jiang

2 Department of Cardiovascular Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China

Yanzhou Zhang

Haisheng Zhu

3 Department of Cardiovascular Medicine, Wencheng People’s Hospital, Wencheng, China

Associated Data

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

In the field of medical image analysis within deep learning (DL), the importance of employing advanced DL techniques cannot be overstated. DL has achieved impressive results in various areas, making it particularly noteworthy for medical image analysis in healthcare. The integration of DL with medical image analysis enables real-time analysis of vast and intricate datasets, yielding insights that significantly enhance healthcare outcomes and operational efficiency in the industry. This extensive review of the existing literature conducts a thorough examination of the most recent DL approaches designed to address the difficulties faced in medical healthcare, particularly focusing on the use of DL algorithms in medical image analysis. Grouping the investigated papers into five categories according to their techniques, we assessed them against several critical parameters. Through a systematic categorization of state-of-the-art DL techniques, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), Long Short-Term Memory (LSTM) models, and hybrid models, this study explores their underlying principles, advantages, limitations, methodologies, simulation environments, and datasets. Based on our results, Python was the most frequent programming language used for implementing the proposed methods in the investigated papers. Notably, the majority of the scrutinized papers were published in 2021, underscoring the contemporaneous nature of the research. Moreover, this review accentuates the forefront advancements in DL techniques and their practical applications within the realm of medical image analysis, while simultaneously addressing the challenges that hinder the widespread implementation of DL in image analysis within the medical healthcare domain. These discerned insights serve as compelling impetuses for future studies aimed at the progressive advancement of image analysis in medical healthcare research. The evaluation metrics employed across the reviewed articles encompass a broad spectrum of features, including accuracy, sensitivity, specificity, F-score, robustness, computational complexity, and generalizability.

1. Introduction

Deep learning is a branch of machine learning that employs artificial neural networks comprising multiple layers to acquire and discern intricate patterns from extensive datasets ( 1 , 2 ). It has brought about a revolution in various domains, including computer vision, natural language processing, and speech recognition, among other areas ( 3 ). One of the primary advantages of deep learning is its capacity to automatically learn features from raw data, thereby eliminating the necessity for manual feature engineering ( 4 ). This makes it especially powerful in domains with large, complex datasets, where traditional machine learning methods may struggle to capture the underlying patterns ( 5 ). Deep learning has also facilitated significant advancements in various tasks, including but not limited to image and speech recognition, comprehension of natural language, and the development of autonomous driving capabilities ( 6 ). For instance, deep learning has enabled the creation of exceptionally precise computer vision systems capable of identifying objects in images and videos with unparalleled precision. Likewise, deep learning has brought about substantial enhancements in natural language processing, leading to the development of models capable of comprehending and generating language that resembles human-like expression ( 7 ). Overall, deep learning has opened up new opportunities for solving complex problems and has the potential to transform many industries, including healthcare, finance, transportation, and more.

Medical image analysis is a field of study that involves the processing, interpretation, and analysis of medical images ( 8 ). The emergence of deep learning algorithms has prompted a notable transformation in the field of medical image analysis, as they have increasingly been employed to enhance the diagnosis, treatment, and monitoring of diverse medical conditions in recent years ( 9 ). Deep learning, as a branch of machine learning, encompasses the training of algorithms to acquire knowledge from vast quantities of data. When applied to medical image analysis, deep learning algorithms possess the capability to automatically identify and categorize anomalies in various medical images, including X-rays, MRI scans, CT scans, and ultrasound images ( 10 ). These algorithms can undergo training using extensive datasets consisting of annotated medical images, where each image is accompanied by labels indicating the corresponding medical condition or abnormality ( 11 ). Once trained, the algorithm can analyze new medical images and provide diagnostic insights to healthcare professionals. The application of deep learning algorithms in medical image analysis has exhibited promising outcomes, as evidenced by studies showcasing high levels of accuracy in detecting and diagnosing a wide range of medical conditions ( 12 ). This has led to the development of various commercial and open-source software tools that leverage deep learning algorithms for medical image analysis ( 13 ). Overall, the utilization of deep learning algorithms in medical image analysis has the capability to bring about substantial enhancements in healthcare results and transform the utilization of medical imaging in diagnosis and treatment.

Medical image processing is an area of research that encompasses the creation and application of algorithms and methods to analyze and decipher medical images ( 14 ). Its primary objective is to extract meaningful information from medical images to aid in diagnosis, treatment planning, and therapeutic interventions ( 15 ). It involves tasks such as image segmentation, image registration, feature extraction, classification, and visualization. Each imaging modality has its unique strengths and limitations, and the images produced by different modalities may require specific processing techniques to extract useful information ( 16 ). Medical image processing has revolutionized the field of medicine by providing a non-invasive means to visualize and analyze the internal structures and functions of the body. It has enabled early detection and diagnosis of diseases, accurate treatment planning, and monitoring of treatment response, significantly improving patient outcomes, reducing treatment costs, and enhancing the quality of care provided to patients.

In the context of medical image analysis with DL algorithms, the main architectures can be pictured as follows. CNNs have a layered structure in which the initial layers capture rudimentary features such as edges and textures, while subsequent layers progressively discern more intricate and abstract characteristics, allowing the network to autonomously extract the information needed for detection, segmentation, and classification. RNNs are built to grasp temporal relationships and sequential patterns, making them well suited to video analysis or time-series medical image data. GANs pair two networks: a generator that fabricates synthetic medical images and a discriminator that assesses their authenticity, enabling the generation of lifelike images that closely resemble real medical data. LSTM networks are a specialized form of recurrent network that processes sequential image data while preserving the long-term dependencies and temporal patterns crucial for such tasks. Finally, hybrid methods combine diverse architectures, often integrating CNNs with RNNs or other specialized modules, so that the model can harness both spatial and temporal information for a comprehensive analysis of medical images.
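
As a minimal illustration of this layered CNN structure, the PyTorch sketch below defines a tiny network for two-class classification of single-channel images. The input size, layer widths, and class count are illustrative assumptions, not a clinically validated architecture.

```python
# Sketch: a minimal CNN for two-class medical image classification (PyTorch).
# Input size (1x128x128), layer widths, and the two-class head are assumptions.
import torch
import torch.nn as nn

class TinyMedCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # early layers: edges, textures
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # deeper layers: abstract patterns
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

model = TinyMedCNN()
dummy = torch.randn(4, 1, 128, 128)  # a batch of 4 grayscale images
print(model(dummy).shape)            # -> torch.Size([4, 2])
```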

Case studies and real-world examples provide tangible evidence of the effectiveness and applicability of DL algorithms in various medical image analysis tasks. They underscore the potential of this technology to revolutionize healthcare by improving diagnostic accuracy, reducing manual labor, and enabling earlier interventions for patients. Here are several examples of case studies and real-world applications:

Case Study: In Vijayalakshmi ( 17 ), a DL algorithm was trained to identify skin cancer from images of skin lesions. The algorithm demonstrated accuracy comparable to that of dermatologists, highlighting its potential as a tool for early skin cancer detection.

Case Study: De Fauw et al. ( 18 ), at Moorfields Eye Hospital, developed a DL system capable of identifying diabetic retinopathy from retinal images. The system was trained on a dataset of over 128,000 images and achieved a level of accuracy comparable to expert ophthalmologists.

Case Study: A study conducted by Guo et al. ( 8 ) at Massachusetts General Hospital utilized DL techniques to automate the segmentation of brain tumors from MRI scans. The algorithm significantly reduced the time required for tumor delineation, enabling quicker treatment planning for patients.

Case Study: The National Institutes of Health (NIH) released a dataset of chest X-ray images for the detection of tuberculosis. Researchers have successfully applied deep learning algorithms to this dataset, achieving high accuracy in identifying TB-related abnormalities.

Case Study: Meena and Roy ( 19 ) at Stanford University developed a deep learning model capable of detecting bone fractures in X-ray images. The model demonstrated high accuracy and outperformed traditional rule-based systems in fracture detection.

Within medical image analysis, DL algorithms are extensively used for precise and efficient segmentation tasks. DL approaches, especially convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have demonstrated exceptional proficiency in capturing and leveraging the spatial dependencies and symmetrical properties inherent in medical images. These algorithms enable the analysis of medical images of symmetric structures, such as organs or limbs, by leveraging their inherent symmetrical patterns. Practical DL mechanisms for medical image analysis include generative adversarial networks (GANs), hybrid models, and combinations of CNNs and RNNs. The objective of this research is to offer a thorough examination of the uses of DL techniques in deep symmetry-based image analysis within medical healthcare, providing a comprehensive overview. By conducting an in-depth systematic literature review (SLR), analyzing multiple studies, and exploring the properties, advantages, limitations, datasets, and simulation environments associated with different DL mechanisms, this study enhances comprehension of the present state of, and future pathways for, deep symmetry-based image analysis methodologies in medical healthcare. The article is structured as follows: Section 2 covers the key principles and terminology of ML/DL in medical image analysis, and Section 3 investigates related papers. Section 4 discusses the mechanisms and tools used for paper selection, while Section 5 illustrates the selected classification. Section 6 presents the results and comparisons, and the final section explores the remaining concerns and concludes the paper.

2. Fundamental concepts and terminology

This section covers concepts and terms related to medical image analysis using DL algorithms that are essential for understanding the underlying principles and techniques used in the field.

2.1. The role of image analysis in medical healthcare

The utilization of deep learning algorithms for image analysis has brought about a revolution in medical healthcare by facilitating advanced and automated analysis of medical images ( 20 ). Deep learning methods, including Convolutional Neural Networks (CNNs), have demonstrated outstanding proficiency in tasks like image segmentation, feature extraction, and classification ( 21 ). By leveraging large amounts of annotated data, deep learning models can learn intricate patterns and relationships within medical images, facilitating accurate detection, localization, and diagnosis of diseases and abnormalities. Deep learning-based image analysis allows for faster and more precise interpretation of medical images, leading to improved patient outcomes, personalized treatment planning, and efficient healthcare workflows ( 22 ). Furthermore, these algorithms have the potential to assist in early disease detection, support radiologists in decision-making, and enhance medical research through the analysis of large-scale image datasets. Overall, deep learning-based image analysis is transforming medical healthcare by providing powerful tools for image interpretation, augmenting the capabilities of healthcare professionals, and enhancing patient care ( 23 ).

2.2. Medical image analysis application

The utilization of deep learning algorithms in medical image analysis has found numerous applications within the healthcare sector. Deep learning techniques, notably Convolutional Neural Networks (CNNs), have been widely employed for tasks encompassing image segmentation, object detection, disease classification, and image reconstruction ( 24 ). In medical image analysis, these algorithms can assist in the detection and diagnosis of various conditions, such as tumors, lesions, anatomical abnormalities, and pathological changes. They can also aid in the evaluation of disease progression, treatment response, and prognosis. Deep learning models can automatically extract meaningful features from medical images, enabling efficient and accurate interpretation ( 25 ). The application of this technology holds promise for enhancing clinical decision-making, improving patient outcomes, and optimizing resource allocation in healthcare settings. Moreover, deep learning algorithms can be employed for data augmentation, image registration, and multimodal fusion, facilitating a comprehensive and integrated analysis of medical images obtained from various modalities. With continuous advancements in deep learning algorithms, medical image analysis is witnessing significant progress, opening up new possibilities for precision medicine, personalized treatment planning, and advanced healthcare solutions ( 26 ).

2.3. Various aspects of medical image analysis for the healthcare section

Medical image analysis encompasses several crucial aspects in the healthcare sector, enabling in-depth examination and diagnosis based on medical imaging data ( 27 ). Image preprocessing is one key element, encompassing techniques like noise reduction, image enhancement, and normalization, aimed at improving the quality and uniformity of the images. Another essential aspect is image registration, which aligns multiple images of the same patient, or images acquired through different imaging modalities, enabling precise comparison and fusion of information ( 28 ). Feature extraction is a further important step, where relevant characteristics and patterns are extracted from the images, aiding in the detection and classification of abnormalities or specific anatomical structures. Segmentation plays a vital role in delineating regions of interest, enabling precise localization and measurement of anatomical structures, tumors, or lesions ( 29 ). Finally, classification and recognition techniques are applied to differentiate normal and abnormal regions, aiding in disease diagnosis and treatment planning. Deep learning algorithms, notably Convolutional Neural Networks (CNNs), have exhibited extraordinary achievements in diverse facets of medical image analysis by acquiring complex patterns and representations from extensive medical imaging datasets ( 30 ). However, challenges such as data variability, interpretability, and generalization across different patient populations and imaging modalities need to be addressed to ensure reliable and effective medical image analysis in healthcare applications.
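To make the preprocessing step above concrete, the following is a minimal sketch of noise reduction and intensity normalization for a single image, assuming a 2-D grayscale image stored as a NumPy array; the function name, the median filter, and the min-max rescaling are illustrative choices, not drawn from any cited study.

```python
import numpy as np
from scipy import ndimage


def preprocess(image: np.ndarray) -> np.ndarray:
    """Denoise and normalize a single 2-D medical image (illustrative sketch)."""
    # Noise reduction: a median filter suppresses speckle and
    # salt-and-pepper noise while preserving edges.
    denoised = ndimage.median_filter(image, size=3)
    # Intensity normalization: rescale to [0, 1] so that images from
    # different scanners or acquisitions share a comparable scale.
    lo, hi = float(denoised.min()), float(denoised.max())
    return (denoised - lo) / (hi - lo + 1e-8)
```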

3. Relevant reviews

In this part, we review some recent research on medical image analysis using DL algorithms. The initial purpose is to properly distinguish the current study's significant contributions from what is discussed in the related literature. Due to advancements in AI technology, there is a growing adoption of AI mechanisms in medical image analysis, and academia has shown a heightened interest in addressing challenges related to it. In this regard, Gupta and Katarya ( 31 ) provided a comprehensive review of the literature on social media-based surveillance systems for healthcare using machine learning. The authors analyzed 50 studies published between 2011 and 2021, covering a wide range of topics related to social media monitoring for healthcare, including disease outbreaks, adverse drug reactions, mental health, and vaccine hesitancy. The review highlighted the potential of machine learning algorithms for analyzing vast amounts of social media data and identifying relevant health-related information. The authors also identified several challenges associated with the use of social media data, such as data quality and privacy concerns, and discussed potential solutions to address these challenges. They noted that social media-based surveillance systems can complement traditional surveillance methods by providing real-time data on health-related events and trends. They also suggested that machine learning algorithms can improve the accuracy and efficiency of social media monitoring by automatically filtering out irrelevant information and identifying patterns and trends in the data. The review highlighted the importance of data pre-processing and feature selection in developing effective machine learning models for social media analysis.

As well, Kourou et al. ( 32 ) reviewed machine learning (ML) applications for cancer prognosis and prediction. The authors started by describing the challenges of cancer treatment, highlighting the importance of personalized medicine and the role of ML algorithms in enabling it. The paper then provided an overview of different types of ML algorithms, including supervised and unsupervised learning, and discussed their potential applications in cancer prognosis and prediction. The authors presented examples of studies that have used ML algorithms for diagnosis, treatment response prediction, and survival prediction across different types of cancer. They also discussed the use of multiple data sources for ML algorithms, such as genetic data, imaging data, and clinical data. The paper concluded by addressing the challenges and limitations encountered in using ML algorithms for cancer prognosis and prediction, which include concerns regarding data quality, overfitting, and interpretability. The authors proposed that ML algorithms hold significant potential for enhancing cancer treatment outcomes. However, they emphasized the necessity for further research to optimize their application and tackle the associated challenges in this domain.

Moreover, Razzak et al. ( 33 ) provided a comprehensive overview of the use of deep learning in medical image processing. The authors deliberated on the potential of deep learning algorithms in diverse medical imaging tasks, encompassing image classification, segmentation, registration, and synthesis. They emphasized the challenges encountered when employing deep learning, such as the requirement for extensive annotated datasets, interpretability of deep models, and computational demands. Additionally, the paper delved into prospective avenues in the field, including the integration of multi-modal data, transfer learning, and the utilization of generative models. In summary, the paper offered valuable perspectives on the present status, challenges, and potential advancements of deep learning in the domain of medical image processing.

In addition, Litjens et al. ( 34 ) provided a comprehensive survey of the applications of deep learning in medical image analysis. The authors examined a variety of tasks in medical imaging, including image classification, segmentation, detection, registration, and image generation, providing a thorough introduction to the deep learning approaches used in each of these areas. They also examined the difficulties and restrictions of using deep learning algorithms for medical image analysis, such as the need for sizable annotated datasets and the interpretability of deep models. The paper's conclusion highlighted the growth of explainable and interpretable deep learning models, along with other potential future directions in the area, such as the integration of multimodal data. In summary, this survey serves as a valuable resource for researchers and practitioners, offering insights into the current state and future prospects of deep learning in the context of medical image analysis.

Additionally, Bzdok and Ioannidis ( 35 ) discussed the importance of exploration, inference, and prediction in the fields of neuroscience and biomedicine. The authors highlighted the importance of integrating diverse data types, such as neuroimaging, genetics, and behavioral data, in order to achieve a comprehensive understanding of intricate systems. They also delved into the role of machine learning in facilitating the identification of patterns and making predictions based on extensive datasets, and described several specific applications of machine learning in neuroscience and biomedicine, including forecasting disease progression and treatment response, analyzing brain connectivity networks, and identifying biomarkers for disease diagnosis. The paper concluded by discussing the challenges and limitations encountered when employing machine learning in these domains, while emphasizing the essentiality of carefully considering the ethical and social implications of these technologies. Moreover, the paper underscored the potential of machine learning to transform our understanding of complex biological systems and enhance medical outcomes. Table 1 summarizes the related works.

Table 1. Summary of related works.

Author | Main idea | Advantage | Disadvantage
Gupta and Katarya ( 31 ) | Providing a comprehensive review of the literature on social media-based surveillance systems for healthcare using machine learning | |
Kourou et al. ( 32 ) | Reviewing ML applications for cancer prognosis and prediction | |
Razzak et al. ( 33 ) | Providing a comprehensive overview of the use of deep learning in medical image processing | |
Litjens et al. ( 34 ) | Providing a comprehensive survey of the applications of deep learning in medical image analysis | |
Bzdok and Ioannidis ( 35 ) | Discussing the importance of exploration, inference, and prediction in the fields of neuroscience and biomedicine | |
Our work | Introducing a new taxonomy of DL methods in medical image analysis | |

4. Methodology of research

We thoroughly examined pertinent documents that explored the utilization of DL methods in medical image analysis. Using the Systematic Literature Review (SLR) methodology, this section comprehensively covers the field of medical image analysis. The SLR technique encompasses a thorough evaluation of all research conducted on a significant topic. This section concludes with an extensive investigation of ML techniques in the realm of medical image analysis. Furthermore, the reliability of the research selection methods is scrutinized. In the subsequent subsections, we provide supplementary information concerning the research techniques, encompassing the selection metrics and research questions.

4.1. Formalization of question

The primary aims of the research are to identify, assess, and differentiate all key papers within the realm of using DL methods in medical image analysis. A systematic literature review (SLR) can be utilized to scrutinize the constituents and characteristics of methods for accomplishing the aforementioned objectives. Furthermore, an SLR facilitates profound comprehension of the pivotal challenges and difficulties in this domain. The following research questions guided this review:

Research Question 1: In what manners can DL techniques in the field of medical image analysis be categorized? The answer to this question can be found in Part 5.
Research Question 2: What types of techniques do scholars employ to execute their investigation? Parts 5.1 to 5.7 elucidate this query.
Research Question 3: Which parameters attracted the most attention in the papers? What are the most popular DL applications utilized in medical image analysis? The answer to this question is included in Part 6.
Research Question 4: What unexplored prospects exist in this area? Part 7 proffers the answer to this question.

4.2. The procedure of paper exploration

The present investigation's search and selection methodologies are classified into four distinct phases, as depicted in Figure 1. In the initial phase, a comprehensive list of keywords and phrases was utilized to search various sources, as demonstrated in Table 2. Electronic databases were employed to retrieve relevant documents, including chapters, journals, technical studies, conference papers, notes, and special issues, resulting in a total of 616 papers, as shown in Figure 2. These papers were then subjected to an exhaustive analysis based on a set of predetermined standards, and only those meeting the stipulated criteria, illustrated in Figure 3, were selected for further evaluation. The distribution of publishers in this initial phase is shown in Figure 4; the number of articles left after the first phase was 481.

Figure 1. The phases of the article searching and selection process.

Table 2. Keywords and search criteria.

S# | Keywords and search criteria | S# | Keywords and search criteria
S1 | “DL” and “Medical” | S6 | “AI” and “Healthcare”
S2 | “ML” and “Healthcare” | S7 | “Healthcare” and “DL algorithms”
S3 | “DL” and “Image Analysis” | S8 | “DL methods” and “Medical Images”
S4 | “ML” and “Medical Healthcare” | S9 | “Image Analysis” and “Medical Healthcare”
S5 | “AI” and “Medical Healthcare” | S10 | “AI methods” and “Medical Images”

Figure 2. Frequency of publications of the studied papers in the first stage of paper selection.

Figure 3. Criteria for inclusion in the paper selection process.

Figure 4. Frequency of publications of the studied papers in the second stage of paper selection.

In the subsequent phase, a thorough review of the selected papers' titles and abstracts was conducted, focusing on each paper's discussion, methodology, analysis, and conclusion to ensure its relevance to the study. As demonstrated in Figure 5, only 227 papers were retained after this step.

Figure 5. Frequency of publications of the studied papers in the third stage of paper selection.

From these, 105 papers were chosen for a more comprehensive review, as illustrated in Figure 6, with the ultimate aim of selecting papers that adhered to the study's predetermined metrics. Finally, after careful consideration, 25 articles were hand-picked for detailed investigation.

Figure 6. Frequency of publications of the studied papers in the fourth stage of paper selection.

5. ML/DL techniques for medical image analysis

In this section, we delve into the implementation of DL methods in the medical healthcare image analysis field. A total of 25 articles satisfying our selection criteria are presented herein. We categorize the techniques into five primary groups: CNNs, RNNs, GANs, LSTMs, and hybrid methodologies that combine diverse methods. The proposed taxonomy of DL-based medical image analysis in medical healthcare is depicted in Figure 7.

Figure 7. The proposed taxonomy of DL-based medical image analysis in medical healthcare.

5.1. Convolutional neural network techniques for medical image analysis

Convolutional neural networks (CNNs) play a significant role in deep learning approaches for medical image processing. They perform well in tasks like object localization, segmentation, and classification due to their capacity to automatically extract pertinent characteristics from intricate medical images. By capturing complex patterns and structures, CNNs are able to accurately identify anomalies, diagnose tumors, and segment organs in medical images. The hierarchical structure of CNNs allows important characteristics to be learned at various levels, which improves analysis and diagnosis. Employing CNNs in medical image analysis has notably improved the precision, effectiveness, and automation of diagnostic procedures, ultimately benefiting patient care and treatment results.
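As a simple illustration of the kind of CNN classifier these studies build on, the following is a minimal Keras sketch, assuming 224x224 grayscale inputs and a two-class normal/abnormal label; the layer sizes are illustrative rather than taken from any cited work.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_cnn(input_shape=(224, 224, 1), num_classes=2):
    """A small illustrative CNN for medical image classification."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Stacked conv/pool blocks learn features hierarchically,
        # from edges up to organ- or lesion-level structure.
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The stacked convolution and pooling blocks realize the hierarchical feature learning described above, progressing from low-level edges toward higher-level anatomical structure.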

In this regard, Singh et al. ( 36 ) highlighted the role of artificial intelligence (AI) and machine learning (ML) techniques in advancing biomedical material design and predicting their toxicity. The authors emphasized the need for efficient and safe materials for medical applications and how computational methods can aid in this process. The paper explored diverse categories of AI and ML algorithms, including random forests, decision trees, and support vector machines, which can be employed for predicting toxicity. The authors provided a case study wherein they utilized a random forest algorithm to predict the toxicity of carbon nanotubes. They also highlighted the importance of data quality and quantity for accurate predictions, as well as the need for interpretability and transparency of AI/ML models. The paper concluded by discussing future research directions in this area, including the integration of multi-omics data, network analysis, and deep learning techniques. This paper demonstrated the potential of AI/ML in advancing biomedical material design and reducing the need for animal testing.

Also, Jena et al. ( 37 ) investigated the impact of parameters on the performance of deep learning models for the classification of diabetic retinopathy (DR) in a smart healthcare system. Using retinal fundus images, the authors developed a two-branch convolutional neural network (CNN) architecture to classify DR: a pre-trained model in the feature extraction branch extracts pertinent characteristics from the input image, and the classification branch uses these features to predict the severity of DR. The authors experimented with variables including the learning rate, number of epochs, batch size, and optimizer in order to evaluate the model's performance. The outcomes showed that, with the ideal parameter configuration, the suggested model achieved an accuracy of 98.12%. The authors also proposed a secure, blockchain-based IoT smart healthcare system for processing and storing medical data. The proposed system could be used for the early diagnosis and treatment of DR, thereby improving patient outcomes.

As well, Thilagam et al. ( 38 ) presented a secure Internet of Things (IoT) healthcare architecture with a deep learning-based access control system. The proposed system is designed to ensure that only authorized personnel can access the sensitive medical information stored in IoT devices. The authors used deep learning algorithms to develop a robust access control system that can identify and authenticate users with high accuracy. The system also included an encryption layer to ensure that all data transmitted between devices is secure. The authors assessed the proposed architecture through a prototype implementation, which revealed that the system can securely access medical data in real-time. Additionally, the authors conducted a comparison with existing solutions and demonstrated that their approach outperforms others in terms of accuracy, security, and scalability. The paper underscored the potential of employing deep learning algorithms in healthcare systems to enhance security and privacy, while facilitating real-time access to medical data.

Besides, Ismail et al. ( 39 ) proposed a CNN-based model for analyzing regular health factors in an IoMT (Internet-of-Medical-Things) environment. The model extracts features from multiple health data sources, such as blood pressure, pulse rate, and body temperature, using CNN-based algorithms; these features are then used to predict the risk of health issues. The proposed model is capable of classifying health data into five categories: normal, pre-hypertension, hypertension, pre-diabetes, and diabetes. The authors utilized a real-world dataset comprising health data from 50 individuals to train and evaluate the model. The findings indicated that the proposed model exhibited a remarkable level of accuracy and surpassed existing machine learning models in terms of both predictive accuracy and computational complexity. The authors expressed their confidence that the proposed model could contribute to the advancement of health monitoring systems, offering real-time monitoring and personalized interventions, thereby preventing health issues and enhancing patient outcomes.

Finally, More et al. ( 40 ) proposed a security-assured CNN-based model for the reconstruction of medical images on the Internet of Healthcare Things (IoHT), with the goal of ensuring the privacy and security of medical data. The proposed framework comprises two main components: a deep learning-based image reconstruction model and a security-enhanced encryption model. The image reconstruction model relies on a convolutional neural network (CNN) to accurately reconstruct original medical images from compressed versions. To safeguard the transmitted images, the encryption model employs a hybrid encryption scheme that combines symmetric and asymmetric techniques. Through evaluation using a widely recognized medical imaging dataset, the results demonstrated the model's remarkable reconstruction accuracy and effective security performance. This study underscores the potential of leveraging deep learning models in healthcare, particularly within medical image processing, while emphasizing the crucial need for ensuring the security and privacy of medical data. Table 3 discusses the CNN methods used in medical image analysis and their properties.

Table 3. The methods, properties, and features of CNN-based medical image analysis mechanisms.

Author | Main idea | Advantage | Disadvantage | Simulation environment | Datasets
Singh et al. ( 36 ) | Presenting a case study employing a random forest algorithm for toxicity prediction of carbon nanotubes | | | Python | 27 observations
Jena et al. ( 37 ) | Proposing a two-branch convolutional neural network (CNN) architecture to classify DR in retinal fundus images | | | Python | Fundus images of 102 diabetic patients
Thilagam et al. ( 38 ) | Presenting a secure Internet of Things (IoT) healthcare architecture with a deep learning-based access control system | | | Python | 100 participants performing 10 different gestures and activities over a duration of 60 s each
Ismail et al. ( 39 ) | Proposing a CNN-based model for analyzing regular health factors in an IoMT (Internet-of-Medical-Things) environment | | | Python | Real-time health examinations of 10,806 citizens
More et al. ( 40 ) | Proposing a security-assured CNN-based model for the reconstruction of medical images on the Internet of Healthcare Things (IoHT) | | | Python | 2,260 images of Ultrasound, CT scan, and MRI

5.2. Generative adversarial network techniques for medical image analysis

The importance of GAN methods in medical image analysis using deep learning algorithms lies in their ability to generate realistic synthetic images, augment datasets, and improve the accuracy and effectiveness of diagnosis and analysis for various medical conditions. By the same token, Vaccari et al. ( 41 ) proposed a generative adversarial network (GAN) technique to address the issue of generating synthetic medical data for Internet of Medical Things (IoMT) applications. The authors detailed the application of their proposed method for generating a wide range of medical data samples, encompassing both time series and non-time series data. They emphasized the advantages of employing a GAN-based approach, such as the capacity to generate realistic data capable of enhancing the performance of IoMT systems. Through experiments utilizing authentic medical datasets, such as electrocardiogram (ECG) data and healthcare imaging data, the authors validated the efficacy of their proposed technique. The results demonstrated that their GAN-based method successfully produced synthetic medical data that closely resembled real medical data, both visually and statistically, as indicated by various metrics. The authors concluded that their proposed technique has the potential to be a valuable tool for generating synthetic medical data for use in IoMT applications.
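To make the adversarial setup underlying these methods concrete, here is a minimal Keras sketch of a generator and discriminator, assuming 64x64 grayscale images and a 100-dimensional latent vector; the architectures are deliberately small, illustrative stand-ins for those in the cited works.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

latent_dim = 100  # assumed latent size, illustrative only

# Generator: maps random noise to a synthetic 64x64 image.
generator = models.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(8 * 8 * 128, activation="relu"),
    layers.Reshape((8, 8, 128)),
    layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(32, 4, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="sigmoid"),
])

# Discriminator: distinguishes real medical images from generated ones.
discriminator = models.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.Conv2D(32, 4, strides=2, padding="same", activation="relu"),
    layers.Conv2D(64, 4, strides=2, padding="same", activation="relu"),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),
])
```

In training, the generator is updated to fool the discriminator while the discriminator learns to separate real from synthetic images; the generator's outputs can then serve, for example, as augmentation data.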


As well, Kadri et al. ( 42 ) presented a framework that utilizes a deep learning model to predict the length of stay of patients at emergency departments. The proposed model employed a GAN to generate synthetic training data and address the problem of insufficient training data. The model used multiple input modalities, including demographic information, chief complaint, triage information, vital signs, and lab results, to predict the length of stay of patients. The authors demonstrated that their proposed framework surpassed multiple baseline models, showcasing its exceptional performance in accurately predicting the length of stay for patients in emergency departments. They recommended the deployment of the proposed framework in real-world settings, anticipating its potential to enhance the efficiency of emergency departments and ultimately improve patient outcomes.

Yang et al. ( 43 ) proposed a novel semi-supervised learning approach using GAN for clinical decision support in Health-IoT platform. The proposed model generated new samples from existing labeled data, creating additional labeled data for training. The GAN-based model undergoes training on a vast unlabeled dataset to generate medical images that exhibit enhanced realism for subsequent training purposes. These generated samples are then employed to fine-tune the pre-trained CNN, resulting in an improved classification accuracy. To assess the effectiveness of the proposed model, three medical datasets are utilized, and the findings demonstrate that the GAN-based semi-supervised learning approach surpasses the supervised learning approach, yielding superior accuracy and reduced loss values. The paper concludes that the proposed model presents the potential to enhance the accuracy of clinical decision support systems by generating supplementary training data. Furthermore, the proposed approach can be extended to diverse healthcare applications, including disease diagnosis and drug discovery.

Huang et al. ( 44 ) proposed a deep learning-based model, DU-GAN, for low-dose computed tomography (CT) denoising in the medical imaging field. The architecture of DU-GAN incorporates dual-domain U-Net-based discriminators within a GAN, aiming to enhance denoising performance and generate high-quality CT images. The proposed approach adopts a dual-domain architecture, effectively utilizing both the image domain and the transform domain to differentiate real images from generated ones. DU-GAN is trained on a substantial dataset of CT images to learn the noise distribution of low-dose CT images. The results indicate that the DU-GAN model surpasses existing methods in terms of both quantitative and qualitative evaluation metrics. Furthermore, the proposed model exhibits robustness across various noise levels and different types of image data. The study demonstrated the potential of the proposed approach for practical application in the clinical diagnosis and treatment of various medical conditions.

Purandhar et al. ( 45 ) proposed the use of Generative Adversarial Networks (GANs) for classifying clustered health care data. The GAN classifier in this study contains both a discriminator network and a generator network: the generator learns the underlying data distribution, while the discriminator distinguishes genuine from synthetic samples. The authors used the MIMIC-III dataset, drawn from Electronic Health Records (EHRs), in their research. The outcomes show that the GAN classifier accurately and successfully categorizes the medical problems of patients. The authors also demonstrated the superiority of their GAN classifier by contrasting it with conventional machine learning techniques. The suggested GAN-based strategy shows promise for early disease detection and diagnosis, with the potential to improve healthcare outcomes and lower costs. Table 4 discusses the GAN methods used in medical image analysis.

Table 4. The methods, properties, and features of GAN-based medical image analysis mechanisms.

Author | Main idea | Advantage | Disadvantage | Simulation environment | Datasets
Vaccari et al. ( 41 ) | Proposing a generative adversarial network (GAN) technique to address the issue of generating synthetic medical data for Internet of Medical Things (IoMT) applications | | | | 43 samples
Kadri et al. ( 42 ) | Presenting a framework that utilizes a deep learning model to predict the length of stay of patients at emergency departments | | | Python | 44,676 patients
Yang et al. ( 43 ) | Proposing a novel semi-supervised learning approach using GAN for clinical decision support in a Health-IoT platform | | | Python | 11,039 stroke patients
Huang et al. ( 44 ) | Proposing a deep learning-based model, DU-GAN, for low-dose CT denoising | | | | 850 CT scans
Purandhar et al. ( 45 ) | Proposing the use of GAN for classifying clustered health care data | | | | 452 instances

5.3. Recurrent neural network techniques for medical image analysis

Recurrent Neural Networks (RNNs) are essential in medical image analysis using deep learning algorithms due to their ability to capture temporal dependencies and contextual information. RNNs excel in tasks involving sequential or time-series data, such as analyzing medical image sequences or dynamic imaging modalities. Their capability to model long-term dependencies and utilize information from previous time steps enables the detection of patterns, disease progression prediction, and tracking tumor growth. RNN variants like LSTM and GRU further enhance their ability to capture complex temporal dynamics, making them vital in extracting meaningful insights from medical image sequences.
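As a minimal sketch of this idea, the following Keras model applies a GRU, one common RNN variant, to a sequence of per-frame feature vectors; the sequence length, feature dimension, and two-class output are assumptions made purely for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

num_frames, feat_dim = 20, 256  # assumed sequence length and feature size

model = models.Sequential([
    layers.Input(shape=(num_frames, feat_dim)),
    # The GRU carries a hidden state across time steps, letting the model
    # relate findings in earlier frames to later ones.
    layers.GRU(64),
    layers.Dense(2, activation="softmax"),  # e.g., progression vs. stable
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```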

Sridhar et al. ( 46 ) proposed a novel approach for reducing the size of medical images while preserving their diagnostic quality. The authors introduced a two-stage framework that combines a Recurrent Neural Network (RNN) and a Genetic Particle Swarm Optimization with Weighted Vector Quantization (GenPSOWVQ). In the first stage, the RNN is employed to learn the spatial and contextual dependencies within the images, capturing important features for preserving diagnostic information. In the second stage, the GenPSOWVQ algorithm optimized the image compression process by selecting the best encoding parameters. The experimental results demonstrated the effectiveness of the proposed model in achieving significant image size reduction while maintaining high diagnostic accuracy. The combination of RNN and GenPSOWVQ enabled an efficient and reliable approach for medical image compression, which can have practical implications in storage, transmission, and analysis of large-scale medical image datasets.

Pham et al. ( 47 ) discussed the use of DL to predict healthcare trajectories from medical records. The authors argued that deep learning can be used to model the complex relationships between different medical conditions and predict how a patient's healthcare journey might evolve over time. The study used data from electronic medical records of patients with various conditions, including diabetes, hypertension, and heart disease. The proposed DL model used CNNs and RNNs to capture both the temporal and spatial relationships in the data. The research found that the deep learning model was able to forecast patients' future healthcare paths with a notable level of precision. The authors' conclusion highlighted the potential of deep learning to transform healthcare delivery through enhanced accuracy in predictions and personalized care. Nevertheless, the authors acknowledged that the integration of deep learning in healthcare is still at an early phase, necessitating further investigation to fully unleash its potential.

Wang et al. ( 48 ) proposed a new approach for dynamic treatment recommendation using supervised reinforcement learning with RNNs. The authors aimed to address the challenge of making treatment decisions for patients with complex and dynamic health conditions by developing an algorithm that can adapt to changes in patient health over time. The proposed approach involved using an RNN to model patient health trajectories and predict the optimal treatment at each step. The training of the model involves a blend of supervised and reinforcement learning techniques, aimed at optimizing treatment decisions for long-term health benefits. The authors assessed the effectiveness of this approach using a dataset comprising actual patients with hypertension and demonstrated its superiority over conventional machine learning methods in terms of predictive accuracy. The suggested method holds promise in enhancing patient outcomes by offering personalized treatment recommendations that can adapt to variations in the patient’s health status.

Jagannatha and Yu ( 49 ) discussed the use of bidirectional recurrent neural networks (RNNs) for medical event detection in electronic health records (EHRs). EHRs offer valuable insights for medical research, yet analyzing them can be arduous due to the intricate nature of, and fluctuations in, the data. To address this, the authors introduced a bidirectional RNN model capable of capturing the interdependencies in the sequential data of EHRs, encompassing both forward and backward relations. Through training on an EHR dataset and subsequent evaluation, the model's proficiency in detecting medical events was assessed. The findings revealed that the bidirectional RNN surpasses conventional machine learning methods in terms of medical event detection. The authors also compared different variations of the model, such as using different types of RNNs and adding additional features to the input. Overall, the study demonstrated the potential of using bidirectional RNNs for medical event detection in EHRs, which could have important implications for improving healthcare outcomes and reducing costs.

Cocos et al. ( 50 ) focused on developing a deep learning model for pharmacovigilance to identify adverse drug reactions (ADRs) mentioned on social media platforms such as Twitter. In the study, two RNN architectures, Bidirectional Long Short-Term Memory (Bi-LSTM) and Gated Recurrent Unit (GRU), were trained to detect and classify ADRs. Various feature extraction methods were also examined, and their individual performances were discussed. The outcomes showed that the Bi-LSTM model performed better than the GRU model, obtaining an F1-score of 0.86. A comparison of the deep learning models with conventional machine learning models was also conducted, confirming the higher performance of the deep learning models. The study highlighted the potential of utilizing social media platforms for pharmacovigilance and underlined the efficiency of deep learning models in precisely detecting ADRs. Table 5 discusses the RNN methods used in medical image analysis.

Table 5. The methods, properties, and features of RNN-based medical image analysis mechanisms.

Author | Main idea | Advantage | Disadvantage | Simulation environment | Datasets
Sridhar et al. ( 46 ) | Proposing a novel approach for reducing the size of medical images | | | | 50 instances
Pham et al. ( 47 ) | Proposing a DL model using CNNs and RNNs to capture both the temporal and spatial relationships in the data | | | Python | 7,191 patients
Wang et al. ( 48 ) | Proposing a new approach for dynamic treatment recommendation | | | | 43 K patients
Jagannatha and Yu ( 49 ) | Discussing the use of bidirectional recurrent neural networks (RNNs) for medical event detection | | | Lasagne | 780 English EHR notes
Cocos et al. ( 50 ) | Developing a DL model for pharmacovigilance to identify adverse drug reactions (ADRs) | | | Keras | 844 tweets

5.4. Long short-term memory techniques for medical image analysis

The importance of the Long Short-Term Memory (LSTM) method in medical image analysis using deep learning algorithms lies in its ability to capture and model sequential dependencies within the image data. Medical images often contain complex spatial and temporal patterns that require understanding of contextual information. LSTM, as a type of recurrent neural network (RNN), excels in modeling long-range dependencies and capturing temporal dynamics, making it suitable for tasks such as time series analysis, disease progression modeling, and image sequence analysis. By leveraging the memory and gating mechanisms of LSTM, it can effectively learn and retain relevant information over time, enabling more accurate and robust analysis of medical image data and contributing to improved diagnostic accuracy and personalized treatment in healthcare applications.
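For illustration, here is a minimal Keras sketch of an LSTM classifier over fixed-length windows of a multichannel signal, loosely the kind of sequential input discussed in this section; the window length of 500 samples, 3 channels, and 4 output classes are assumed values, not taken from any cited study.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(500, 3)),  # assumed: 500-sample windows, 3 channels
    # The LSTM's input, forget, and output gates decide what to retain
    # or discard across the window, capturing long-range temporal structure.
    layers.LSTM(64),
    layers.Dense(4, activation="softmax"),  # e.g., four assumed classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```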

Butt et al. ( 51 ) presented an ML-based approach for diabetes classification and prediction. They used a dataset of 768 patients and 8 clinical features, including age, BMI, blood pressure, and glucose levels. Three different machine learning techniques, logistic regression, decision tree, and k-nearest neighbors, were applied to the preprocessed data, with the goal of sorting patients into the diabetic or non-diabetic category. Metrics including accuracy, precision, recall, and F1-score were used to evaluate the effectiveness of each method. In addition, a deep learning system, namely a feedforward neural network, was used to forecast the patients' blood glucose levels. A comparison between the performance of the deep learning algorithm and that of the traditional machine learning algorithms revealed that the deep learning algorithm surpassed the others in terms of prediction accuracy. The authors concluded that their approach can be used for early diagnosis and management of diabetes in healthcare applications.

Awais et al. ( 52 ) proposed an Internet of Things (IoT) framework that utilizes Long Short-Term Memory (LSTM) based emotion detection for healthcare and distance learning during COVID-19. The proposed framework offers the ability to discern individuals’ emotions by leveraging physiological signals such as electrocardiogram (ECG), electrodermal activity (EDA), and photoplethysmogram (PPG). Collected data undergoes preprocessing and feature extraction prior to training an LSTM model. To assess its effectiveness, the framework is tested using the PhysioNet emotion database, where the results demonstrate its accurate emotion detection capabilities, reaching an accuracy level of up to 94.5%. With its potential applications in healthcare and distance learning amid the COVID-19 pandemic, the framework proves invaluable for remotely monitoring individuals’ emotional states and providing necessary support and interventions. The paper highlighted the importance of using IoT and machine learning in healthcare, and how it can help to address some of the challenges posed by the pandemic.

Nancy et al. ( 53 ) proposed an IoT-Cloud-based smart healthcare monitoring system for heart disease prediction using deep learning. The technology uses wearable sensors to gather physiological signs from patients, then delivers those signals to a cloud server for analysis. By training on a sizable dataset of ECG signals, a Convolutional Neural Network (CNN)-based deep learning model is used to predict cardiac illness. Transfer learning techniques, especially fine-tuning, are used to optimize the model. The suggested system’s exceptional accuracy in forecasting cardiac illness has been rigorously tested on a real-world dataset. Additionally, the model exhibits the capability to detect the early onset of heart disease, facilitating timely intervention and treatment. The paper concluded that the proposed system can be an effective tool for real-time heart disease monitoring and prediction, which can help improve patient outcomes and reduce healthcare costs.

Queralta et al. ( 54 ) presented an Edge-AI solution for fall detection in health monitoring using LoRa communication technology, fog computing, and LSTM recurrent neural networks. The proposed system consists of a wearable device, a LoRa gateway, and an edge server that processes and analyzes sensor data locally, reducing the dependence on cloud services and improving real-time fall detection. The system employs a MobileNetV2 convolutional neural network to extract features from accelerometer and gyroscope data, followed by an LSTM network that predicts falls. The authors evaluated the performance of the proposed system using a dataset collected from volunteers and achieved a sensitivity of 93.14% and a specificity of 98.9%. They also compared the proposed system with a cloud-based solution, showing that the proposed system had lower latency and reduced data transmission requirements. Overall, the proposed Edge-AI system can provide a low-cost and efficient solution for fall detection in health monitoring applications.

Gao et al. ( 55 ) introduced a novel approach called Fully Convolutional Structured LSTM Networks (FCSLNs) for joint 4D medical image segmentation. The proposed approach utilized the strengths of fully convolutional networks and structured LSTM networks to overcome the complexities arising from spatial and temporal dependencies in 4D medical image data. By integrating LSTM units into the convolutional layers, the FCSLNs successfully capture temporal information and propagate it throughout the spatial dimensions. Empirical findings strongly indicate the outstanding performance of the FCSLNs when compared to existing methods, achieving precise and resilient segmentation of 4D medical images. The proposed framework demonstrates significant promise in advancing medical image analysis tasks and enhancing clinical decision-making processes. Table 6 discusses the LSTM methods used in medical image analysis.

Table 6. The methods, properties, and features of LSTM-based medical image analysis mechanisms.

Author | Main idea | Advantage | Disadvantage | Simulation environment | Datasets
Butt et al. ( 51 ) | Presenting an ML-based approach for diabetes classification and prediction | | | | 768 records
Awais et al. ( 52 ) | Proposing an Internet of Things (IoT) framework for healthcare and distance learning during COVID-19 | | | Tensorflow | 1,000 samples of data
Nancy et al. ( 53 ) | Proposing an IoT-Cloud-based smart healthcare monitoring system for heart disease prediction using deep learning | | | Tensorflow | 100,000 records
Queralta et al. ( 54 ) | Presenting an Edge-AI solution for fall detection in health monitoring using LSTM recurrent neural networks | | | Keras/Tensorflow | 20 data points
Gao et al. ( 55 ) | Introducing Fully Convolutional Structured LSTM Networks (FCSLNs) for joint 4D medical image segmentation | | | | 10 samples

5.5. Hybrid techniques for medical image analysis

Hybrid methods in medical image analysis, which combine deep learning algorithms with other techniques or data modalities, are of significant importance. Deep learning has demonstrated remarkable success in tasks like image segmentation and classification. However, it may face challenges such as limited training data or interpretability issues. By incorporating hybrid methods, researchers can overcome these limitations and achieve enhanced performance. Hybrid approaches can integrate traditional machine learning techniques, statistical models, or domain-specific knowledge to address data scarcity or improve interpretability. Additionally, combining multiple data modalities, such as medical images with textual reports or physiological signals, enables a more comprehensive understanding of the medical condition and facilitates better decision-making. Ultimately, hybrid methods in medical image analysis empower healthcare professionals with more accurate and reliable tools for diagnosis, treatment planning, and patient care. In this regard, Shahzadi et al. ( 56 ) proposed a novel cascaded framework for accurately classifying brain tumors using a combination of convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. The proposed approach utilized the CNN's capability to extract significant features from brain tumor images and the LSTM's capacity to capture temporal dependencies present in the data. The cascaded framework comprised two stages: first, a CNN was utilized to extract features from the tumor images, and subsequently, an LSTM network was employed to model the temporal information within these extracted features. The experimental findings clearly illustrated the exceptional performance of the CNN-LSTM framework when compared to other cutting-edge methods, exhibiting remarkable accuracy in the classification of brain tumors. The proposed method holds promise for improving the diagnosis and treatment planning of brain tumors, ultimately benefiting patients and healthcare professionals in the field of neuro-oncology.
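As a rough sketch of such a cascaded CNN-LSTM design (our illustration, not the cited model), the following Keras code runs a small CNN over each slice of a sequence via TimeDistributed and feeds the resulting feature sequence to an LSTM; all dimensions and the three-class output are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

num_steps, h, w = 10, 64, 64  # assumed: 10 slices of 64x64 grayscale images

# Stage 1: a small CNN that maps one slice to a feature vector.
cnn = models.Sequential([
    layers.Input(shape=(h, w, 1)),
    layers.Conv2D(16, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.GlobalAveragePooling2D(),
])

# Stage 2: the CNN is applied per slice, then an LSTM models the sequence.
model = models.Sequential([
    layers.Input(shape=(num_steps, h, w, 1)),
    layers.TimeDistributed(cnn),            # per-step spatial features
    layers.LSTM(64),                        # temporal modeling across slices
    layers.Dense(3, activation="softmax"),  # e.g., three assumed tumor classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```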

Also, Srikantamurthy et al. ( 57 ) proposed a hybrid approach for accurately classifying benign and malignant subtypes of breast cancer using histopathology imaging. Transfer learning was used to combine the strengths of long short-term memory (LSTM) networks and convolutional neural networks (CNNs) in a synergistic manner. The histopathological images were initially processed by the CNN to extract relevant characteristics, which were then fed into the LSTM network for sequential analysis and classification. By harnessing transfer learning, the model capitalized on pre-trained CNNs trained on extensive datasets, thereby facilitating efficient representation learning. The proposed hybrid approach showed promising results in accurately distinguishing between benign and malignant breast cancer subtypes, contributing to improved diagnosis and treatment decisions for breast cancer patients.

Besides, Banerjee et al. ( 58 ) presented a hybrid approach combining Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks for the classification of histopathological breast cancer images. Data augmentation approaches were used to increase the classifier's robustness. Deep convolutional features were extracted using ResNet50, InceptionV3, and a CNN pretrained on ImageNet, and these features were then fed into an LSTM Recurrent Neural Network (RNN) for classification. Comparing the performance of three alternative optimizers, the authors found that Adam outperformed the others without leading to model overfitting. The experimental findings showed that, for both binary and multi-class classification problems, the suggested strategy outperforms cutting-edge approaches. Furthermore, the method showed promise for application in the classification of other types of cancer and diseases, making it a versatile and potentially impactful approach.

Moreover, Nandhini Abirami et al. ( 59 ) explored the application of deep Convolutional Neural Networks (CNNs) and deep Generative Adversarial Networks (GANs) in computational visual perception-driven image analysis. To increase the precision and resilience of image analysis tasks, the authors suggested a unique framework that combines the advantages of both CNNs and GANs. The deep GAN is used to create realistic, high-quality synthetic images, while the deep CNN is used for feature extraction and capturing high-level visual representations. The combination of these two deep learning models made it possible to analyze images more efficiently, especially in tasks like object detection, image recognition, and image synthesis. Experimental results demonstrated the superiority of the proposed framework over traditional approaches, highlighting the potential of combining deep CNNs and GANs for advanced computational visual perception in image analysis.

Additionally, Yao et al. ( 60 ) proposed a parallel-structure deep neural network for breast cancer histology image classification, combining Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) with an attention mechanism. The parallel construction enabled the extraction of both local and global characteristics from the histology images, improving the model's capacity to gather pertinent information. The CNN component concentrated on obtaining spatial characteristics from image patches, whereas the RNN component sequentially captured temporal relationships between patches. By focusing attention on key visual areas, the attention mechanism improved the model's capacity for discrimination. Experimental findings showed that the suggested method performs better than baseline approaches, demonstrating its potential for accurate breast cancer histology image categorization. Table 7 discusses the hybrid methods used in medical image analysis.

Table 7. The methods, properties, and features of hybrid medical image analysis mechanisms.

Author | Main idea | Advantage | Disadvantage | Simulation environment | Datasets
Shahzadi et al. ( 56 ) | Proposing a novel cascaded framework for accurately classifying brain tumors | | | MATLAB | 100 samples
Srikantamurthy et al. ( 57 ) | Proposing a hybrid approach for accurately classifying benign and malignant subtypes of breast cancer | | | Python | 5,000 breast images
Banerjee et al. ( 58 ) | Presenting a hybrid approach combining CNN and LSTM for the classification of histopathological breast cancer images | | | Tensorflow | 828 samples
Nandhini Abirami et al. ( 59 ) | Exploring the application of deep CNNs and deep GANs in computational visual perception-driven image analysis | | | | 70,000 images
Yao et al. ( 60 ) | Proposing a parallel structure deep neural network for breast cancer histology image classification | | | | 100 images

6. Results and comparisons

The utilization of DL algorithms for medical image analysis represents a pioneering stride toward the progress of the medical and healthcare industries. This paper presents various innovative applications that demonstrate this paradigm, showcasing advanced knowledge in medical image analysis and motivating readers to explore innovative categories of DL algorithms in the field. The primary focus of this work is on the different classifications of DL techniques utilized in medical image analysis. Through a comprehensive analysis, it has been found that most DL methods in medical image analysis concentrate on advanced datasets, combined learning tasks, and annotation protocols. However, a significant limitation toward achieving the same level of functionality across medical image-DL algorithms is the inadequacy of large training datasets and the lack of standardized data collection; diverse types of data require larger and more diverse datasets to provide reliable outcomes. Detection tasks in this field predominantly employ CNN or CNN-based techniques.

In most of the investigated papers, the authors evaluated the topic based on several attributes, including accuracy, F-score, AUC, sensitivity, specificity, robustness, recall, adaptability, and flexibility. Sections 5.1 to 5.5 illustrate the medical image analysis-DL algorithms, where the majority of the proposed methods use both benchmark and real-time data. The DL methods used in these sections are shown in Figure 8. The systems employed datasets varying in number and category, with accuracy, computational complexity, sensitivity, specificity, robustness, generalizability, adaptability, scalability, and F-score being the primary parameters evaluated. Accuracy was the main parameter for image analysis-based systems, whereas transparency was the least applied parameter, as depicted in Figure 9. The importance of accuracy lies in its direct impact on patient outcomes and healthcare decision-making: medical image analysis plays a critical role in diagnosing and monitoring various diseases and conditions, and any inaccuracies or errors in the analysis can have serious consequences. High accuracy ensures that the deep learning algorithms can effectively and reliably detect abnormalities, classify different tissue types, and provide accurate predictions. This enables healthcare professionals to make well-informed decisions regarding treatment plans, surgical interventions, and disease management. Furthermore, accurate analysis helps reduce misdiagnosis rates, minimizes unnecessary procedures or tests, and improves overall patient care by enabling timely and appropriate interventions. Accuracy therefore acts as a crucial criterion for guaranteeing the efficiency and dependability of deep learning algorithms in medical image processing.

The majority of the solutions used a data normalization approach to combine images from various sources that were of comparable size and quality. Some of the systems, however, did not report compute time, since different datasets were utilized across the studies. The datasets used varied in terms of sample size, accessibility requirements, image size, and classes. The RNN method was one of the most often employed algorithms, although cross-validation was seldom applied in most studies.
Given that it is uncertain how the test results fluctuate, this might reduce the resilience of the outcomes even while delivering a high-functioning model; it is worth noting that cross-validation is crucial for evaluating the entire dataset. Multiple studies employ DL-based methodologies, and it is challenging to establish clear, robust, and resilient models. Future tasks include minimizing false-positive and false-negative rates, for example to reliably distinguish viral from bacterial pneumonia. Applying DL methods to medical image analysis represents a groundbreaking step forward in technological development.

It is worth mentioning that, as demonstrated in Figure 10, Python is the most common programming language used in this context, due to several key factors. Firstly, Python offers a rich ecosystem of libraries and frameworks specifically tailored for machine learning and deep learning tasks, such as TensorFlow, PyTorch, and Keras. These libraries provide efficient and user-friendly tools for developing and deploying deep learning models. Additionally, Python's simplicity and readability make it an accessible language for researchers, clinicians, and developers with varying levels of programming expertise. Its extensive community support and vast online resources further contribute to its popularity. Moreover, Python's versatility allows seamless integration with other scientific computing libraries, enabling researchers to preprocess, visualize, and analyze medical image data efficiently. Its wide adoption in academia, industry, and research communities fosters collaboration and knowledge sharing among experts in the field. Overall, Python's powerful capabilities, ease of use, and collaborative ecosystem make it the preferred choice for implementing deep learning algorithms in medical image analysis.

In the domain of medical image analysis using deep learning algorithms, diverse methodologies are employed to extract meaningful insights from complex medical imagery. CNNs are extensively utilized for their ability to automatically identify intricate patterns and features within images. RNNs, on the other hand, are crucial when dealing with sequential medical image data, such as video sequences or time-series images, as they capture temporal dependencies. Additionally, GANs play a pivotal role, especially in tasks requiring image generation or translation. Hybrid models, which integrate different architectures like CNNs and RNNs, offer a versatile approach for handling diverse types of medical image data that may require both spatial and temporal analysis. These methodologies are implemented and simulated within specialized environments, commonly leveraging Python libraries like TensorFlow, PyTorch, and Keras, which provide comprehensive support for deep learning. GPU acceleration is often utilized to expedite model training due to the computational intensity of deep learning tasks. Furthermore, custom simulation environments may be created to mimic specific aspects of medical imaging processes. The choice of datasets is paramount; researchers may draw from open-access repositories like ImageNet for pre-training, but specialized medical imaging repositories such as TCIA or RSNA are crucial for tasks in healthcare. Additionally, custom-collected datasets tailored to specific medical image analysis tasks are often employed to ensure data relevance and quality.
Data augmentation techniques, such as rotation and scaling, are applied to expand datasets and mitigate the limitations associated with data scarcity. These synergistic efforts in methodologies, simulation environments, and datasets are essential for the successful development and evaluation of deep learning algorithms in medical image analysis, facilitating accurate and reliable results for a wide array of healthcare applications.

Figure 8. DL methods used in medical image analysis.

Figure 9. The most important parameters considered in the investigated papers.

Figure 10. Programming languages used in deep learning algorithms for medical image analysis.

6.1. Convolutional neural network

CNNs have been used successfully in medical image processing applications; however, they also face significant drawbacks and difficulties. Due to the high cost and complexity of image acquisition and annotation, it may be challenging to obtain the vast quantity of labeled data needed to train a network in the context of medical imaging. Additionally, the labeling procedure may introduce subjectivity or inter-observer variability, which can affect the accuracy and dependability of CNN models ( 61 ). A further issue is the possible bias of CNN models toward the distribution of the training data, which can result in subpar generalization on new or unseen data. This is particularly relevant in medical imaging, where the patient population may be diverse and heterogeneous and the image acquisition conditions may vary across imaging modalities and clinical settings. Furthermore, the interpretability of CNN models in medical imaging remains a major concern, as they typically rely on complex and opaque learned features that are difficult to interpret or explain; this limits clinicians' ability to understand and trust the models' decisions and may hinder adoption in clinical practice. Finally, CNN models are computationally intensive and require significant computational resources, which may limit their scalability and practical use in resource-constrained environments or low-resource settings ( 62 ).

The CNN method offers several benefits for healthcare applications. Firstly, CNNs can automatically learn relevant features from raw input data such as medical images or physiological signals, without requiring manual feature extraction; this makes them highly effective for tasks such as image classification, object detection, and segmentation, and can lead to more accurate and efficient analyses. Secondly, CNNs can handle large amounts of complex data and improve classification accuracy, making them well-suited for medical diagnosis and prediction ( 63 ). Additionally, CNNs can be trained on large datasets, which helps in detecting rare or complex patterns that may be difficult for humans to identify. Finally, the use of deep learning algorithms such as CNNs in healthcare has the potential to improve patient outcomes, enable early disease detection, and reduce medical costs.
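
To make the layered feature-learning idea concrete, the following is a minimal sketch, in PyTorch, of a small CNN classifier for single-channel scans such as chest X-rays. It is not any specific method from the reviewed papers; the layer sizes, input resolution, and two-class output are illustrative assumptions.

    import torch
    import torch.nn as nn

    class SmallMedicalCNN(nn.Module):
        """Toy CNN for 1-channel 128x128 scans; all sizes are illustrative."""
        def __init__(self, num_classes: int = 2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1),   # low-level edges/textures
                nn.ReLU(),
                nn.MaxPool2d(2),                              # 128 -> 64
                nn.Conv2d(16, 32, kernel_size=3, padding=1),  # higher-level patterns
                nn.ReLU(),
                nn.MaxPool2d(2),                              # 64 -> 32
            )
            self.classifier = nn.Linear(32 * 32 * 32, num_classes)

        def forward(self, x):
            x = self.features(x)
            return self.classifier(x.flatten(1))

    model = SmallMedicalCNN()
    scans = torch.randn(4, 1, 128, 128)   # stand-in for a batch of X-ray images
    logits = model(scans)                 # shape: (4, 2)

Note that the convolution and pooling stack learns features directly from pixels, which is exactly the "no manual feature extraction" property described above.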

6.2. Recurrent neural network

Recurrent Neural Networks (RNNs) have shown great success in modeling sequential data such as time series and natural language. In medical image analysis, however, they face several challenges and limitations. RNNs are designed to model temporal sequences and have no natural way of handling spatial information in images, which can limit their ability to capture local patterns and spatial relationships between pixels. RNNs also require substantial computational power to train, especially on large medical image datasets ( 64 ), which can make it difficult to train highly accurate models. When training deep RNN models, the gradients can either vanish or explode, making it difficult to optimize the model parameters effectively; this can lead to longer training times and lower accuracy. RNNs are prone to overfitting when the training dataset is small, which results in poor generalization to new, unseen data. Finally, in medical image analysis the dataset may be highly unbalanced, with few positive cases compared to negative cases, which makes it difficult to train an RNN that classifies the data accurately. Researchers have created a variety of RNN-based designs, including Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs), which have demonstrated promising performance in medical image interpretation. Additionally, combining RNNs with other deep learning techniques such as CNNs can improve performance by capturing both spatial and temporal features ( 65 ).

The reviewed papers likely faced several challenges when using the RNN method. RNNs can suffer from vanishing gradients, where the gradients used for optimization become very small and make learning slow or even impossible; this is especially problematic for long sequences of data. Overfitting is another problem, arising when the model becomes too complicated and begins to memorize the training set rather than generalizing to new data; this is particularly difficult when data is scarce, as in many healthcare applications. RNNs can also be computationally demanding to train, especially on large volumes of data ( 66 ), which is a challenge for IoT devices with limited computational resources. Finally, there are many types of RNNs and architectures to choose from, each with its own strengths and weaknesses, and selecting the right architecture for a given task can be challenging. Overall, while RNNs can be powerful tools for analyzing time-series data in IoT applications, these potential challenges must be carefully considered.

6.3. Generative adversarial network

Generative Adversarial Networks (GANs) have shown promising results in various fields, including medical image analysis, but they also face challenges and limitations that can affect their performance. Medical image datasets are often limited due to the cost and difficulty of acquiring large amounts of high-quality data, and GANs need a lot of data to learn the underlying distribution correctly; their performance may therefore be constrained on small medical image datasets ( 67 ). Medical image datasets may also be unevenly distributed, with some classes or diseases underrepresented, and GANs can find it difficult to generate realistic examples for underrepresented classes or conditions. Mode collapse, in which the generator learns to produce only a small number of samples regardless of the input, can lead in medical image processing to unrealistic images or the loss of crucial information. Overfitting is another problem, occurring when the model memorizes the training data rather than generalizing to new data. In addition, there is currently no standard for evaluating GANs in medical image analysis, which makes it challenging to compare different GAN models and assess their performance accurately. Addressing these challenges and limitations requires careful consideration of the specific medical image analysis task, the available data, and the design of the GAN model. Moreover, a multi-disciplinary approach involving clinicians, radiologists, and computer scientists is necessary to ensure that the GAN model's outputs are meaningful and clinically relevant ( 68 ).
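
For orientation only, here is a bare-bones sketch of the two adversarial components in PyTorch. Real medical-image GANs use convolutional architectures and careful training schedules; the fully connected layers, latent dimension, and image size below are placeholder assumptions.

    import torch
    import torch.nn as nn

    latent_dim, img_pixels = 100, 64 * 64

    # Generator: maps random noise to a flattened synthetic image.
    G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                      nn.Linear(256, img_pixels), nn.Tanh())

    # Discriminator: scores whether a flattened image looks real.
    D = nn.Sequential(nn.Linear(img_pixels, 256), nn.LeakyReLU(0.2),
                      nn.Linear(256, 1))

    z = torch.randn(8, latent_dim)   # random latent codes
    fake = G(z)                      # (8, 4096) synthetic samples
    score = D(fake)                  # raw realism scores for the fakes

Training alternates between improving D at telling real from fake and improving G at fooling D; mode collapse, mentioned above, is the failure case where G wins by producing only a few sample types.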

6.4. Long short-term memory

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network that has shown promising results in various applications, including medical image analysis, but they also face challenges and limitations. LSTMs rely on a fixed-length input sequence, and the context this provides may be limited: in a sequence of medical images, for example, it may be challenging to capture the full context of the images within a fixed-length input. LSTMs can be prone to overfitting, especially on small datasets; when the model memorizes the training data instead of generalizing, performance on new medical images suffers ( 69 ). LSTMs are sometimes referred to as "black box" models because it can be difficult to understand how they generate their predictions, which is a limitation in medical image analysis, where clinicians need to understand how a model arrived at its decision. LSTMs can also be computationally expensive, especially with long input sequences or large medical image datasets, making it challenging to train a model on a standard computer or within a reasonable time frame. Medical image datasets can be imbalanced, with certain classes or conditions underrepresented, and LSTMs may struggle to learn the patterns of those classes. Finally, LSTMs may have limited generalizability to new medical image datasets or different medical conditions, especially when trained on a single dataset or condition. Addressing these challenges requires careful consideration of the specific medical image analysis task, the available data, and the design of the LSTM model, together with a multi-disciplinary approach involving clinicians, radiologists, and computer scientists to ensure that the model's outputs are meaningful and clinically relevant. Additionally, techniques such as data augmentation, transfer learning, and model compression can improve the performance of LSTMs in medical image analysis ( 70 ).
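
As a minimal illustration of the sequence-modeling role described above (not a method from any cited study), the sketch below runs an LSTM over per-slice feature vectors, such as embeddings extracted from a scan series by a separate encoder. The feature size, sequence length, and binary output are arbitrary assumptions.

    import torch
    import torch.nn as nn

    # Hypothetical setup: 8 time steps (e.g., image slices), 64-dim features each.
    seq = torch.randn(4, 8, 64)                      # (batch, time, features)
    lstm = nn.LSTM(input_size=64, hidden_size=32, batch_first=True)
    head = nn.Linear(32, 2)                          # binary prediction head

    out, (h_n, c_n) = lstm(seq)                      # out: (4, 8, 32)
    logits = head(out[:, -1, :])                     # classify from the last step

The fixed sequence length here reflects the fixed-length-context limitation noted above: anything outside the 8-step window is invisible to the model.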

6.5. Hybrid

The reason for using hybrid methods, such as combining CNN and LSTM, is that they have complementary strengths and weaknesses. CNN is particularly good at extracting spatial features from high-dimensional data such as images, while LSTM is good at modeling temporal dependencies in sequences of data. By combining them, one can leverage the strengths of both to improve the accuracy of the prediction. Additionally, hybrid methods can be used to address challenges such as overfitting, where the model may become too specialized on the training data, and underfitting, where the model may not capture the underlying patterns in the data ( 71 ). Hybrid models can also provide a more robust approach to dealing with noisy or missing data by allowing for more complex interactions between features and time.

The use of hybrid approaches, such as CNN-LSTM, in medical image analysis with deep learning algorithms presents several challenges and limitations. Firstly, the complexity of the network architecture poses a significant hurdle in training these models: integrating different models with diverse parameters, loss functions, and optimization algorithms can lead to suboptimal performance, potentially causing overfitting or underfitting, which adversely impacts accuracy and generalizability ( 72 ). Secondly, a major challenge lies in obtaining enough data to train hybrid models effectively; medical image data is often scarce and costly to acquire, restricting the capacity to train deep learning models comprehensively ( 73 ). Furthermore, the high variability and subjectivity of medical image data can compromise training data quality and model performance. Moreover, interpreting the results generated by hybrid models can be problematic: their complexity may obscure how they arrive at predictions or classifications, limiting their practicality in clinical practice and possibly raising doubts or skepticism among medical professionals. Lastly, the computational cost associated with training and deploying hybrid models can be prohibitive ( 74 ); these models demand powerful hardware and are computationally intensive, limiting their applicability in real-world medical settings. Despite these challenges, hybrid models can provide better performance and accuracy than either model alone ( 75 ).

The utilization of hybrid methods, such as the CNN-LSTM model, offers various advantages, chiefly the combination of both models' strengths to enhance the overall system's accuracy and performance. For instance, in COVID-19 prediction with a CNN-LSTM model, the CNN layer extracts spatial characteristics from the data while the LSTM layer captures temporal relationships and makes predictions on the time series. Similarly, in EEG detection using a low-invasive and affordable BCI headband, the CNN layer extracts spatial information from the EEG data while the LSTM layer captures temporal relationships and categorizes the signals ( 76 ). In the context of reconstructing an ECG signal from a Doppler sensor, the hybrid model uses the CNN layer to extract high-level features from the Doppler signal and the LSTM layer to reconstruct the ECG signal from those features. In summary, hybrid models can yield superior performance and accuracy compared to either model individually, combining spatial and temporal information and harnessing the strengths of both CNNs and LSTMs in applications such as COVID-19 prediction, EEG detection, and ECG signal reconstruction.
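
The division of labor just described can be sketched as follows: a small CNN encodes each frame, and an LSTM models the resulting sequence. This is a generic illustration under assumed shapes, written in PyTorch, not the architecture of any cited system.

    import torch
    import torch.nn as nn

    class CNNLSTM(nn.Module):
        """Per-frame CNN encoder followed by an LSTM over time (toy sizes)."""
        def __init__(self, num_classes: int = 2):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),  # 64 -> 16
                nn.Flatten(),                              # 8 * 16 * 16 = 2048 features
            )
            self.lstm = nn.LSTM(input_size=2048, hidden_size=64, batch_first=True)
            self.head = nn.Linear(64, num_classes)

        def forward(self, x):                  # x: (batch, time, 1, 64, 64)
            b, t = x.shape[:2]
            feats = self.cnn(x.flatten(0, 1))  # fold time into batch: (b*t, 2048)
            feats = feats.view(b, t, -1)       # restore the time axis
            out, _ = self.lstm(feats)          # temporal modeling over frame features
            return self.head(out[:, -1])       # predict from the last time step

    clip = torch.randn(2, 6, 1, 64, 64)        # e.g., 6 frames of a 64x64 sequence
    logits = CNNLSTM()(clip)                   # shape: (2, 2)

Folding the time axis into the batch lets one CNN process every frame with shared weights before the LSTM sees the sequence, which is the standard way to combine the two components.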

6.6. Prevalent evaluation criteria

Owing to their capacity to increase the precision and efficacy of medical diagnosis and therapy, deep learning algorithms for medical image analysis have grown in popularity in recent years. Several evaluation criteria are prevalent for assessing their performance in medical image analysis, as described below ( 13 ).

6.6.1. Accuracy

Accuracy is the most commonly used metric for evaluating the performance of deep learning algorithms in medical image analysis. It measures the percentage of correctly classified images or regions of interest (ROIs) in medical images.

6.6.2. Sensitivity and specificity

Sensitivity measures the proportion of true positive results, which are the number of positive cases that are correctly identified by the algorithm. Specificity measures the proportion of true negative results, which are the number of negative cases that are correctly identified by the algorithm. Both metrics are used to evaluate the diagnostic performance of deep learning algorithms in medical image analysis.

6.6.3. Precision and recall

Precision measures the proportion of true positive results among all the positive cases identified by the algorithm. Recall measures the proportion of true positive results among all the positive cases in the ground truth data. Both metrics are used to evaluate the performance of deep learning algorithms in medical image analysis, particularly in binary classification tasks.

6.6.4. F1-score

The F1-score is a metric that combines precision and recall into a single score. It is often used to evaluate the performance of deep learning algorithms in medical image analysis, particularly in binary classification tasks.

6.6.5. Hausdorff distance

The Hausdorff distance is a metric that measures the maximum distance between the boundaries of two sets of ROIs in medical images. It is often used to evaluate the segmentation accuracy of deep learning algorithms in medical image analysis.
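
To ground these definitions, the short sketch below computes the classification metrics with scikit-learn and the Hausdorff distance with SciPy on made-up arrays; the toy labels and boundary points are illustrative only.

    import numpy as np
    from sklearn.metrics import (accuracy_score, precision_score,
                                 recall_score, f1_score, confusion_matrix)
    from scipy.spatial.distance import directed_hausdorff

    y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # toy ground-truth labels
    y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])   # toy model predictions

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)                  # recall on the positive class
    specificity = tn / (tn + fp)                  # recall on the negative class
    print(accuracy_score(y_true, y_pred),
          precision_score(y_true, y_pred),
          recall_score(y_true, y_pred),
          f1_score(y_true, y_pred),
          sensitivity, specificity)

    # Symmetric Hausdorff distance between two toy segmentation boundaries.
    a = np.array([[0, 0], [0, 1], [1, 1]], dtype=float)
    b = np.array([[0, 0.1], [0.2, 1.1], [1, 0.9]], dtype=float)
    hd = max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

Note that `directed_hausdorff` is one-sided, so the symmetric distance used in segmentation evaluation takes the maximum over both directions.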

In general, the choice of evaluation criteria depends on the specific task and clinical context of the medical image analysis. To evaluate the outcomes of deep learning algorithms in the context of clinical practice, it is crucial to select assessment criteria that are pertinent to the clinical needs.

6.7. Challenges of the DL applications in medical image analysis

The lack of high-quality annotated data is one of the greatest problems for deep learning (DL) algorithms in medical image analysis. For DL models to perform well and generalize, they need a large amount of labeled data, but obtaining high-quality annotations for medical images is challenging for several reasons. First, accessibility is restricted: because acquiring and annotating medical images is expensive and time-consuming, the amount of annotated image data is constrained ( 76 ), and annotation requires medical professionals with specialized training and understanding who are not always available. Second, medical images are complex and highly varied due to differences in patient anatomy, imaging modality, and disease pathology; annotating them requires a high degree of accuracy and consistency, which can be challenging for complex and heterogeneous medical conditions. Third, privacy and ethical issues arise: the annotation process can expose medical images containing sensitive patient data to abuse or unauthorized access, and protecting patient privacy and confidentiality while preserving the caliber of annotated data is a significant difficulty. Finally, annotating medical images involves subjective assessments, which can introduce bias and variability in the annotations; these factors can affect the effectiveness and generalizability of DL models, especially when annotations are inconsistent across datasets or annotators ( 77 ). To address the challenge of limited availability of high-quality annotated data, several approaches have been proposed, including:

  • Transfer learning: To enhance the performance of DL models on smaller datasets, transfer learning reuses models pre-trained on large datasets. This approach can decrease the volume of annotated data needed to train DL models and increase their generalizability.
  • Data augmentation: Data augmentation creates synthetic data by applying modifications to existing annotated data. It can increase the diversity and quantity of annotated data available for training and make models more robust to variations in medical images (a minimal example appears below).
  • Active learning: Active learning selects the most informative and uncertain samples for annotation, rather than annotating all the data. This approach can reduce the annotation workload and improve the efficiency of DL model training.
  • Collaborative annotation: Collaborative annotation engages medical experts, patients, and other stakeholders in the annotation process to ensure the accuracy, consistency, and relevance of annotations to clinical needs and values.

Overall, addressing the challenge of limited availability of high-quality annotated data in medical image analysis requires a combination of technical, ethical, and social solutions that can improve the quality, quantity, and diversity of annotated data while ensuring patient privacy and ethical standards.
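
As a concrete and deliberately generic illustration of the augmentation idea listed above, the sketch below builds a torchvision pipeline of mild geometric transformations (assuming a recent torchvision with tensor-input transforms). The specific transforms and their ranges are assumptions; clinically safe augmentations must be chosen per modality, since, for example, horizontal flips change laterality.

    import torch
    from torchvision import transforms

    # Mild, label-preserving perturbations; ranges are illustrative.
    augment = transforms.Compose([
        transforms.RandomRotation(degrees=10),               # small rotations
        transforms.RandomAffine(degrees=0,
                                translate=(0.05, 0.05)),     # slight shifts
        transforms.RandomHorizontalFlip(p=0.5),              # only if clinically valid
    ])

    image = torch.rand(1, 128, 128)    # stand-in for a grayscale scan tensor
    augmented = augment(image)         # a new synthetic training variant

Each pass through the pipeline yields a different random variant, so the effective training set grows without collecting or annotating new images.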

Data quality is a significant problem for deep learning algorithms in medical image analysis. The performance of a model may be considerably affected by the caliber of the data used to train it ( 78 ). Medical images may be difficult to obtain, and their quality can vary with a number of variables, such as the image acquisition equipment, the image resolution, noise, artifacts, and the imaging technique. Furthermore, the annotations or labels used for training also affect data quality: annotations may not always be accurate, and they may suffer from inter- and intra-observer variability, which can lead to biased models or models with poor generalization performance. To overcome the challenge of data quality, researchers need to establish robust quality control procedures for both image acquisition and annotation, develop algorithms that can handle noisy or low-quality data and improve the accuracy of annotations, and develop methods to evaluate the quality of the data used to train deep learning models ( 79 ).

Interpretability poses a significant challenge in medical image analysis when employing deep learning algorithms, primarily due to the conventional black-box nature of these models, which makes it arduous to comprehend the reasoning behind their predictions. This lack of interpretability hinders clinical acceptance, as healthcare professionals necessitate understanding and trust in a model's decision-making process to utilize it effectively. Moreover, interpretability plays a vital role in identifying and mitigating biases within the data and model, ensuring that decisions are not influenced by irrelevant or discriminatory features. Various approaches have been developed to enhance the interpretability of deep learning models for medical image analysis ( 80 ). These approaches include visualization techniques, saliency maps, and model explanations. Nonetheless, achieving complete interpretability remains a challenge in this field as it necessitates a trade-off between model performance and interpretability. Striking the right balance between these factors remains an ongoing endeavor.

Transferability refers to the ability of a deep learning model trained on a particular dataset to generalize and perform well on new datasets that have different characteristics. In the context of medical image analysis, transferability is a significant challenge due to the diversity of medical imaging data, such as variations in image quality, imaging protocols, and imaging modalities. Deep learning models that are trained on a specific dataset may not perform well on different datasets that have variations in data quality and imaging characteristics. This can be problematic when developing deep learning models for medical image analysis because it is often not feasible to train a new model for every new dataset. To address this challenge, researchers have explored techniques such as transfer learning and domain adaptation. Transfer learning involves using a pre-trained model on a different but related dataset to initialize the model weights for the new dataset, which can improve performance and reduce the amount of training required. Domain adaptation involves modifying the model to account for the differences between the source and target domains, such as differences in imaging protocols or modalities ( 81 ). However, the challenge of transferability remains a significant issue in medical image analysis, and there is ongoing research to develop more robust and transferable deep learning models for this application.

In deep learning-based medical image analysis, overfitting is a frequent problem in which a model becomes overly complicated and fits the training data too closely, leading to poor generalization to new, unseen data. Numerous factors can lead to overfitting, including noise in the training data, an unbalanced class distribution, or a lack of training data ( 64 ); the latter is prevalent in medical imaging, where dataset size is constrained by the absence of annotated data. Overfitting can produce erroneous positive or negative findings because it yields high accuracy on training data but poor performance on validation or testing data. Several strategies may be used to avoid overfitting in deep learning, including regularization, early stopping, and data augmentation. In medical image analysis, ensuring data quality and increasing the size of the dataset are essential to prevent overfitting.
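
A minimal sketch of two of these strategies, early stopping combined with L2 regularization via weight decay, is shown below: training halts once validation loss stops improving for a set number of epochs. The patience value, toy model, and random data are placeholders.

    import torch
    import torch.nn as nn

    model = nn.Linear(16, 2)                        # placeholder model
    opt = torch.optim.Adam(model.parameters(),
                           weight_decay=1e-4)       # L2 regularization
    loss_fn = nn.CrossEntropyLoss()

    x_tr, y_tr = torch.randn(64, 16), torch.randint(0, 2, (64,))   # toy train split
    x_va, y_va = torch.randn(32, 16), torch.randint(0, 2, (32,))   # toy validation split

    best, patience, bad_epochs = float("inf"), 5, 0
    for epoch in range(100):
        opt.zero_grad()
        loss_fn(model(x_tr), y_tr).backward()
        opt.step()
        with torch.no_grad():
            val = loss_fn(model(x_va), y_va).item()
        if val < best:
            best, bad_epochs = val, 0               # improvement: reset the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:              # no improvement for `patience` epochs
                break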

Clinical adoption refers to the process of integrating new technologies or methodologies into clinical practice. In the context of medical image analysis using deep learning algorithms, clinical adoption is a challenge because it requires a significant change in how physicians and healthcare providers diagnose and treat patients ( 82 ). It involves not only technical considerations, such as integrating the algorithms into existing systems and workflows, but also ethical, legal, and regulatory considerations, as well as training healthcare providers to use the new technology effectively and safely. One key challenge is ensuring that the deep learning algorithms are accurate and reliable enough to be used in clinical decision-making, which requires rigorous validation and testing as well as addressing concerns around the interpretability and generalizability of the results. Additionally, healthcare providers and patients may have concerns about the use of these algorithms in making medical decisions, particularly if the algorithms are seen as replacing or minimizing the role of the human clinician. Another challenge is the need for regulatory approval, particularly where the algorithms support diagnosis or treatment decisions; regulatory bodies such as the FDA may require clinical trials to demonstrate the safety and effectiveness of the algorithms before they can be used in clinical practice, and because this procedure can be time-consuming and expensive, it may slow the adoption of these technologies. Overall, clinical adoption is an important challenge to consider in the development and deployment of medical image analysis using deep learning algorithms, as it affects the ultimate impact of these technologies on patient care ( 83 ).

6.8. Dataset in medical image analysis using ML algorithms

In medical image analysis, a dataset is a collection of medical images used to train machine learning algorithms to detect and classify abnormalities or diseases. The dataset may be obtained from sources such as clinical trials, imaging studies, or public repositories ( 84 ). The quality and size of the dataset have a significant impact on how well a machine learning algorithm performs; a dataset should therefore be diverse and representative of the population under study to ensure the accuracy and generalizability of the algorithm. Datasets may also require pre-processing, such as normalization or augmentation, to address issues such as data imbalance, low contrast, or artifacts. Finding and using large, carefully curated medical image databases remains a fundamental issue in the field, but efforts are underway to improve the quality and availability of medical image datasets so that researchers can advance the development of ML algorithms for medical diagnosis and treatment. Producing a dataset typically involves obtaining and annotating medical images from a variety of sources, including hospitals, clinics, and research organizations ( 85 ). The images must be labeled to specify the areas of interest or characteristics the ML model needs to learn; these labels may describe the disease shown in the image, the anatomy of the imaged region, or other pertinent facts. Once established, the dataset is split into a training set, used to train the ML model, and a test set, used to test it. Ongoing research in medical image analysis aims to improve dataset quality and size and to develop better methods for acquiring and labeling medical images ( 74 , 86 ).

6.9. Security issues, challenges, risks, IoT and blockchain usage

Medical image analysis using deep learning algorithms raises several security issues, particularly with regard to patient privacy and data protection. The medical images used for training the deep learning models may contain sensitive information, such as personally identifiable information (PII), health records, and demographic information, which must be kept confidential and secure. One of the main security issues is the risk of data breaches, which can occur during the data collection, storage, and transmission stages. Hackers or unauthorized personnel can intercept the data during transmission, gain access to the storage systems, or exploit vulnerabilities in the software or hardware infrastructure used to process the data ( 13 ). To mitigate this risk, various security measures must be put in place, such as encryption, access controls, and monitoring tools ( 87 ). Another security issue is the possibility of malicious attacks on the deep learning models themselves. Attackers can attempt to manipulate the models’ outputs by feeding them with malicious inputs, exploiting vulnerabilities in the models’ architecture or implementation, or using adversarial attacks to deceive the models into making wrong predictions. This can have serious consequences for patient diagnosis and treatment, and thus, it is critical to design and implement secure deep learning models. In summary, security is a critical concern in medical image analysis using deep learning algorithms, and it is essential to adopt appropriate security measures to protect the confidentiality, integrity, and availability of medical data and deep learning models.

There are several risks associated with medical image analysis using deep learning algorithms. Some of the key risks include:

  • Inaccuracy: Deep learning algorithms may sometimes provide inaccurate results, which can lead to incorrect diagnoses or treatment decisions.
  • Bias: Deep learning algorithms may exhibit bias in their decision-making processes, leading to unfair or inaccurate results for certain groups of patients.
  • Privacy concerns: Medical images often contain sensitive information about patients, and there is a risk that this data could be exposed or compromised during the analysis process.
  • Cybersecurity risks: As with any technology that is connected to the internet or other networks, there is a risk of cyberattacks that could compromise the security of medical images and patient data.
  • Lack of transparency: Deep learning algorithms can be difficult to interpret, and it may be challenging to understand how they arrive at their conclusions. This lack of transparency can make it difficult to trust the results of the analysis.

Overall, it is important to carefully consider these risks and take steps to mitigate them when using deep learning algorithms for medical image analysis. This includes implementing strong cybersecurity measures, ensuring data privacy and confidentiality, and thoroughly validating the accuracy and fairness of the algorithms.

The term "Internet of Things" (IoT) describes how physical "things" are linked to the internet so they can exchange and gather data. In the field of medical image analysis, IoT may be used to link medical imaging devices and enable real-time data collection and analysis. For instance, medical imaging equipment such as CT scanners, MRI machines, and ultrasound devices can be connected over a network and transfer data to a cloud-based system for analysis ( 88 ). This can facilitate remote consultations and diagnostics and speed up the examination of medical images. IoT can also make it possible to combine different medical tools and data sources, leading to more thorough and individualized patient treatment. However, the use of IoT in medical image analysis also raises security and privacy concerns, as sensitive patient data is transmitted and stored on a network that can be vulnerable to cyber-attacks.

7. Open issues

There are several open issues related to medical image analysis using deep learning algorithms. These include:

7.1. Data privacy

Data privacy is a major concern in medical image analysis using deep learning algorithms. Medical images contain sensitive patient information that must be kept confidential and secure, so any algorithm or system used for medical image analysis must protect patient data from unauthorized access, use, or disclosure. This can be particularly difficult because medical image analysis often involves enormous volumes of data, which raises the risk of data breaches or unwanted access. A primary issue is the need to balance the demands of data access against patient privacy protection: many medical image analysis algorithms rely on large datasets to achieve high levels of accuracy and performance, which may require sharing data between multiple parties ( 89 ), and this is especially challenging with sensitive patient information given the risk of data leakage or misuse. Several methods can be used to protect data privacy in medical image analysis, including rules and processes guaranteeing that data is accessed and used only for legitimate purposes, data anonymization, encryption, and access restrictions. Additionally, healthcare organizations must ensure that they adhere to pertinent data privacy laws, such as HIPAA in the United States or the GDPR in the European Union, so that patient data is safeguarded and handled properly.

7.2. Data bias

When employing deep learning algorithms to analyze medical images, data bias is a serious open problem. It refers to systematic flaws in the data used to train the deep learning models ( 90 ). These errors may result from the choice of training data, how the data is labeled, and how representative the data are of the population of interest. Data bias can produce models that underperform on particular segments of the population, such as members of underrepresented groups or patients with rare medical conditions. This has serious implications for the accuracy and fairness of medical image analysis systems, as well as for the potential harm to patients if the models are used in clinical decision-making. Addressing data bias requires careful consideration of the data sources, data labeling, and model training strategies to ensure that the models are representative and unbiased ( 91 ).

7.3. Limited availability of annotated data

Deep learning algorithms in medical image analysis need a large amount of annotated data to be trained properly. Annotated data refers to medical images that experts have labeled to indicate the location and type of abnormalities, such as tumors, lesions, or other pathologies. However, obtaining annotated medical image datasets is particularly challenging for several reasons. First, annotating medical images is time-consuming and requires deep domain expertise: only experienced radiologists or clinicians can accurately identify and label abnormalities in medical images, which limits the availability of annotated data. Second, there are privacy concerns associated with medical image data: patient privacy is a critical concern in healthcare, and medical image data is considered particularly sensitive ( 92 ), so obtaining large-scale annotated datasets for deep learning is difficult given the need to comply with regulations such as HIPAA. Third, the diversity of medical image data poses a challenge: medical images vary widely in modality, acquisition protocols, and image quality, making it difficult to create large, diverse datasets. These difficulties may limit the development and validation of deep learning algorithms for medical image analysis. To decrease the volume of labeled data needed for training, researchers have adopted methods including transfer learning, data augmentation, and semi-supervised learning ( 93 ). However, these techniques may not suffice in all cases, and more annotated medical image datasets need to be made available to researchers to advance the field of medical image analysis using deep learning.

7.4. Interpretability and transparency

When employing deep learning algorithms for medical image analysis, interpretability and transparency are crucial concerns. Deep learning models are sometimes referred to as "black boxes" because they can be difficult to interpret, making it hard to comprehend how they reach their judgments. In medical image analysis, interpretability is essential for clinicians to understand and trust the algorithms, as well as to identify potential errors or biases. Interpretability refers to the ability to understand the reasoning behind a model's decision-making process; convolutional neural networks (CNNs), for example, can include millions of parameters that interact in intricate ways, and this complexity can make it difficult to understand how the model arrived at a particular decision, especially for clinicians without deep learning experience. Transparency refers to the ability to see inside the model and understand how it works ( 94 ): the model's decision-making process is clear and understandable, and can be validated and audited, which is essential for ensuring that the model is working correctly and not introducing errors or biases. Both properties are critical in medical image analysis because clinicians need to understand how the algorithm arrived at its decisions; this understanding can help them identify errors or biases and ensure that the algorithm's decisions are consistent with clinical practice. To increase the interpretability and transparency of deep learning models in medical image analysis, several techniques have been developed: visualization approaches can produce heatmaps that display which areas of an image the model is using to make judgments; attention mechanisms can highlight important features in an image and explain the model's decision-making process; and other techniques include explainable AI (XAI) methods and incorporating domain knowledge into the models. While these techniques have shown promise, there is still a need for more transparent and interpretable deep learning models in medical image analysis to improve their utility in clinical practice.
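
One of the simplest visualization approaches mentioned above, a plain input-gradient saliency map, can be sketched in a few lines of PyTorch. The tiny model and random image are stand-ins; published studies typically use refined variants such as Grad-CAM.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 2))  # stand-in classifier
    image = torch.rand(1, 1, 64, 64, requires_grad=True)        # stand-in scan

    score = model(image)[0, 1]             # score of the class we want to explain
    score.backward()                       # gradients flow back to the input pixels
    saliency = image.grad.abs().squeeze()  # high values = pixels that most affect the score

Overlaying `saliency` on the original image produces the kind of heatmap clinicians can inspect to check whether the model is attending to clinically plausible regions.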

7.5. Generalizability

Generalizability is a significant unresolved problem in deep learning-based medical image analysis. It refers to the capacity of a model to function effectively on data that differs from the data it was trained on: a trained model should generalize to other datasets and still perform well. In medical image analysis, generalizability is critical because it ensures that deep learning algorithms can be used on new patient populations or in different clinical settings. However, deep learning models can be prone to overfitting, performing well on the data they were trained on but poorly on new data; in medical image analysis this can lead to inaccurate or inconsistent diagnoses. Several variables affect the generalizability of deep learning models for medical image processing. For instance, the variety of the training dataset significantly impacts a model's capacity to generalize ( 95 ): if the training dataset is not sufficiently varied, the model may fail to identify anomalies it has never seen. Performance across different types of medical images also matters; a model trained on CT scans may not perform well on MRI scans because the image modality and acquisition protocols differ. Researchers are examining methods including transfer learning, data augmentation, and domain adaptation to increase generalizability. Transfer learning fine-tunes a pre-trained model on a fresh dataset as a starting point. Data augmentation artificially expands the size and variety of the training dataset using transformations such as rotations and translations. Domain adaptation modifies a model trained on one dataset to function on another dataset with different properties. While these approaches have shown promise, the generalizability of deep learning models in medical image processing must be improved further to assure their safe and efficient application in clinical practice ( 96 ).

7.6. Validation and regulatory approval

Validation and regulatory approval are important open issues in medical image analysis using deep learning algorithms. Validation refers to the process of verifying that a model is accurate and reliable. Regulatory approval refers to the process of obtaining approval from regulatory bodies, such as the FDA in the US, before a model can be used in clinical practice. Validation is critical in medical image analysis because inaccurate or unreliable models can lead to incorrect diagnoses and treatment decisions. Validation involves testing the model on a separate dataset that was not used for training and evaluating its performance on a range of metrics. Validation can also involve comparing the performance of the model to that of human experts. Regulatory approval is important in medical image analysis to ensure that the models are safe and effective for use in clinical practice. Regulatory bodies require evidence of the model’s safety, efficacy, and performance before approving it for use. This evidence can include clinical trials, real-world data studies, and other forms of validation. There are several challenges associated with validation and regulatory approval of deep learning models in medical image analysis. One challenge is the lack of standardized validation protocols, which can make it difficult to compare the performance of different models ( 97 ). Another challenge is the lack of interpretability and transparency of deep learning models, which can make it difficult to validate their performance and ensure their safety and efficacy. Researchers and regulatory organizations are collaborating to provide standardized validation processes and criteria for regulatory approval of deep learning models in medical image analysis in order to overcome these issues. For instance, the FDA has published guidelines for the creation and approval of medical devices based on machine learning and artificial intelligence (AI/ML). These guidelines provide recommendations for the design and validation of AI/ML-based medical devices, including those used for medical image analysis. While these efforts are promising, there is still a need for further research and collaboration between researchers and regulatory bodies to ensure the safe and effective use of deep learning models in medical image analysis ( 98 ).

7.7. Ethical and legal considerations

Deep learning algorithms for medical image processing raise a number of significant open ethical and legal questions. These concern the use of patient data in research, the possibility of algorithmic bias, and the duty of researchers and healthcare professionals to guarantee the ethical and safe application of these technologies. One ethical issue is the use of patient data in research: medical image analysis requires large volumes of patient data, and its use raises questions about patient privacy and consent. Researchers and healthcare professionals must maintain patients' privacy and ensure that patient data is used responsibly ( 99 ). Another ethical issue is the potential for bias in algorithms: deep learning models may be trained on skewed datasets, which can bias the models' outputs. In medical image analysis, biases can result in incorrect diagnosis and treatment choices with catastrophic repercussions, so researchers must take action to address potential biases in their datasets and algorithms. Deep learning algorithms for medical image interpretation also raise legal questions around intellectual property, liability, and regulatory compliance, including concerns about unauthorized access to patient data and the requirement to uphold data protection regulations to preserve patient privacy. To address these ethical and legal considerations, researchers and healthcare providers must follow best practices for data privacy and security, obtain informed consent from patients, and work to mitigate potential biases in their algorithms. It is also important to engage with stakeholders, including patients, regulatory bodies, and legal experts, to ensure that the development and use of these technologies is safe, ethical, and compliant with relevant laws and regulations ( 100 ).

7.8. Future works

Future research in the fast-developing field of medical image analysis utilizing deep learning algorithms has a lot of potential to increase the precision and effectiveness of medical diagnosis and therapy. Some of these areas include:

7.8.1. Multi-modal image analysis

Future research in medical image analysis utilizing deep learning algorithms will focus on multi-modal image analysis. Utilizing a variety of imaging modalities, including MRI, CT, PET, ultrasound, and optical imaging, allows a more thorough understanding of a patient's anatomy and disease ( 101 ). This strategy can enhance diagnostic precision and lower the possibility of missed or incorrect diagnoses. Multi-modal image data may be used to train deep learning algorithms for a range of tasks, including segmentation, registration, classification, and prediction. An algorithm built on MRI and PET data, for instance, might identify areas of the brain affected by Alzheimer's disease; similarly, a deep learning algorithm could be trained on ultrasound and CT data to identify tumors in the liver. Multi-modal image analysis poses several challenges for deep learning algorithms: different imaging modalities have different resolution, noise, and contrast characteristics, which can affect the performance of the algorithm, and multi-modal data can be more complex and difficult to interpret than single-modality data, requiring more advanced algorithms and computational resources ( 102 ). To address these challenges, researchers are developing new deep learning models and algorithms that can integrate and analyze data from multiple modalities; for example, multi-modal fusion networks can combine information from different imaging modalities, while attention mechanisms can focus the algorithm's attention on relevant features in each modality. Overall, multi-modal image analysis holds promise for improving the accuracy and efficiency of medical diagnosis and treatment using deep learning algorithms. As these technologies continue to evolve, it will be important to ensure that they are used safely, ethically, and in accordance with relevant laws and regulations.
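
A minimal two-branch late-fusion sketch under assumed input sizes (e.g., an MRI-like and a PET-like slice): each modality gets its own encoder, and the embeddings are concatenated before the prediction head. This is a generic illustration in PyTorch, not a published architecture.

    import torch
    import torch.nn as nn

    class TwoModalityNet(nn.Module):
        """Separate encoders per modality, fused by concatenation (toy sizes)."""
        def __init__(self):
            super().__init__()
            def enc():
                return nn.Sequential(
                    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten())   # -> 8-dim embedding
            self.enc_a, self.enc_b = enc(), enc()
            self.head = nn.Linear(16, 2)

        def forward(self, mri, pet):
            fused = torch.cat([self.enc_a(mri), self.enc_b(pet)], dim=1)
            return self.head(fused)

    mri = torch.randn(2, 1, 96, 96)    # stand-in MRI slices
    pet = torch.randn(2, 1, 96, 96)    # stand-in PET slices
    logits = TwoModalityNet()(mri, pet)

Per-modality encoders let each branch adapt to its modality's resolution and noise characteristics before fusion, which is one way to handle the heterogeneity noted above.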

7.8.2. Explainable AI

Future research in deep learning algorithms for medical image analysis will focus on explainable AI (XAI). XAI is the capacity of an AI system to explain its decision-making process in a way that is intelligible to a human ( 103 ). XAI can assist to increase confidence in deep learning algorithms when employed in the context of medical image analysis, guarantee that they are utilized safely and morally, and allow clinicians to base their judgments more intelligently on the results of these algorithms. XAI in medical image analysis involves developing algorithms that can not only make accurate predictions or segmentations but also provide clear and interpretable reasons for their decisions. This can be particularly important in cases where the AI system’s output contradicts or differs from the clinician’s assessment or prior knowledge. One approach to XAI in medical image analysis is to develop visual explanations or heatmaps that highlight the regions of an image that were most important in the algorithm’s decision-making process. These explanations can help to identify regions of interest, detect subtle abnormalities, and provide insight into the algorithm’s thought process ( 104 ). Another approach to XAI in medical image analysis is to incorporate external knowledge or prior information into the algorithm’s decision-making process. For example, an algorithm that analyzes brain MRIs could be designed to incorporate known patterns of disease progression or anatomical landmarks. Overall, XAI holds promise for improving the transparency, interpretability, and trustworthiness of deep learning algorithms in medical image analysis. As these technologies continue to evolve, it will be important to ensure that they are being used safely, ethically, and in accordance with relevant laws and regulations ( 105 ).

7.8.3. Transfer learning

Future research in deep learning-based medical image analysis will also focus on transfer learning, the process of using previously trained deep learning models to enhance a model's performance on a new task or dataset. Transfer learning can be particularly helpful in medical image interpretation because it can reduce the requirement for large volumes of labeled data, which are challenging and time-consuming to gather. By taking advantage of the information and representations acquired by models pre-trained on huge datasets, researchers can increase the precision and effectiveness of their own models ( 106 ). A pre-trained model can be a useful starting point for a medical image analysis problem, since it enables the model to learn from less data and may lessen the possibility of overfitting. Additionally, transfer learning may increase the generalizability of deep learning models for medical image interpretation: by building on pre-trained models that have learned representations of natural images, medical image analysis models may develop more reliable and generalizable representations that are relevant to a wider range of tasks and datasets. Transfer learning thus has the potential to enhance the effectiveness, precision, and generalizability of deep learning models for medical image interpretation. As these technologies continue to evolve, it will be important to ensure that they are used safely, ethically, and in accordance with relevant laws and regulations.
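
The fine-tuning recipe just described can be sketched with torchvision (assuming torchvision >= 0.13 for the `weights` argument, and using an ImageNet-pretrained backbone as a stand-in for a domain-specific one): load the pretrained network, freeze its features, and retrain only a new task-specific head.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Load an ImageNet-pretrained backbone (downloads weights on first use).
    net = models.resnet18(weights="IMAGENET1K_V1")

    for p in net.parameters():        # freeze the pretrained feature extractor
        p.requires_grad = False

    net.fc = nn.Linear(net.fc.in_features, 2)   # new head for a 2-class task

    # Only the new head's parameters are optimized.
    opt = torch.optim.Adam(net.fc.parameters(), lr=1e-3)
    x = torch.randn(4, 3, 224, 224)             # stand-in batch of images
    logits = net(x)

In practice, some or all backbone layers are often unfrozen later for further fine-tuning once the new head has converged.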

7.8.4. Federated learning

Future research in deep learning algorithms for medical image analysis will also focus on federated learning. Federated learning refers to training machine learning models on data that is dispersed among several devices or institutions, without moving the data to a central server. It can be especially helpful in medical image analysis because it permits the exchange of information and expertise between institutions while safeguarding the confidentiality and security of sensitive patient data ( 107 ); this is particularly crucial where patient data is subject to strong privacy laws, such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States. Federated learning works by initializing a central machine learning model with a set of weights, which are sent to each participating device or institution. Each participant then trains the model on its own local data, using those weights as a starting point, and sends its updated weights back to the central server, where they are aggregated to update the central model. This process is repeated iteratively until the model converges. By training models with federated learning, medical institutions can leverage the collective knowledge and expertise of multiple institutions, improving the accuracy and generalizability of the models, while the confidentiality and privacy of patient data are preserved because the data stays on local devices or within organizations. Overall, federated learning shows potential for enhancing the generalizability, speed, and privacy of deep learning models in medical image analysis ( 108 ). As these technologies continue to evolve, it will be important to ensure that they are used safely, ethically, and in accordance with relevant laws and regulations.
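
The iterative procedure just described (often called federated averaging, or FedAvg) reduces, in its simplest form, to a weighted average of client updates. The sketch below shows several rounds with plain NumPy vectors standing in for model weights; the local-update function is a fake placeholder for real on-site training.

    import numpy as np

    def local_update(weights, rng):
        """Stand-in for local training: nudge the weights on private data."""
        return weights - 0.1 * rng.normal(size=weights.shape)  # fake gradient step

    rng = np.random.default_rng(0)
    global_w = np.zeros(10)                      # initial central model weights
    client_sizes = np.array([100, 300, 600])     # samples held by each institution

    for round_ in range(5):                      # several communication rounds
        client_ws = [local_update(global_w, rng) for _ in client_sizes]
        # FedAvg: aggregate updates weighted by local dataset size.
        global_w = np.average(client_ws, axis=0, weights=client_sizes)

Only weight vectors cross institutional boundaries in this scheme; the raw images never leave the sites that hold them, which is the privacy property emphasized above.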

7.8.5. Integration with electronic health records (EHRs)

Future development in deep learning algorithms for medical image analysis will focus on integration with electronic health records (EHRs). EHRs contain a wealth of clinical information, including patient demographics, medical history, laboratory results, and imaging studies. By merging deep learning algorithms with EHRs, researchers and clinicians may be able to increase the precision and effectiveness of medical image analysis. One potential application of this integration is to improve the interpretation of medical images by incorporating patient-specific information from EHRs: for example, deep learning algorithms could be trained to predict the likelihood of certain diseases or conditions based on a patient's clinical history, laboratory results, and imaging studies, decreasing the need for invasive or costly diagnostic procedures and increasing the accuracy of medical image interpretation. Another possible use is deep learning algorithms that automatically extract data from medical images and incorporate it into EHRs ( 109 ): for example, algorithms could be trained to automatically segment and measure lesions or tumors in medical images and record this information in the patient's EHR. This may lessen the strain on physicians and increase the effectiveness and precision of clinical decision-making. Overall, integrating deep learning algorithms with EHRs shows potential for enhancing the precision, efficacy, and efficiency of medical image processing. As these technologies continue to advance, it will be crucial to ensure that they are used safely, ethically, and in line with all applicable laws and regulations regarding patient privacy and data security ( 110 ).

7.8.6. Few-shots learning

Future research in Medical Image Analysis using DL algorithms should delve into the realm of Few-shot Learning. This approach holds great potential for scenarios where labeled data is limited or difficult to obtain, which is often the case in medical imaging ( 111 ). Investigating techniques that enable models to learn from a small set of annotated examples, and potentially even adapt to new, unseen classes, will be instrumental. Meta-learning algorithms, which aim to train models to quickly adapt to new tasks with minimal data, could be explored for their applicability in medical image analysis. Additionally, methods for data augmentation and synthesis specifically tailored for few-shot scenarios could be developed. By advancing Few-shot Learning in the context of medical imaging, we can significantly broaden the scope of applications, improve the accessibility of AI-driven healthcare solutions, and ultimately enhance the quality of patient care ( 112 ).
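
To illustrate one popular few-shot mechanism, a prototypical-network-style nearest-prototype rule (used here purely as an example, not a method from the cited works), the sketch below averages a handful of labeled embeddings per class and assigns a query to the closest class prototype. The synthetic embeddings stand in for features produced by a trained extractor.

    import numpy as np

    rng = np.random.default_rng(1)
    # 2 classes x 5 "shots", each a 16-dim embedding from some feature extractor.
    support = {c: rng.normal(loc=c, size=(5, 16)) for c in (0, 1)}

    # Prototype = mean embedding of each class's few labeled examples.
    prototypes = {c: embs.mean(axis=0) for c, embs in support.items()}

    query = rng.normal(loc=1, size=16)           # unlabeled query embedding
    pred = min(prototypes,
               key=lambda c: np.linalg.norm(query - prototypes[c]))
    print("predicted class:", pred)              # nearest-prototype decision

Because classification reduces to distances in embedding space, new classes can be added at test time from just a few labeled examples, which is the appeal of few-shot methods for rare conditions.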

8. Conclusion and limitation

In recent years, there has been significant progress in medical image analysis using deep learning algorithms, with numerous studies highlighting the effectiveness of DL in areas such as cell, bone, tissue, tumor, vessel, and lesion segmentation. However, as the field continues to evolve, further research is essential to explore new techniques and methodologies that can improve the performance and robustness of DL algorithms in image analysis. Comprehensive evaluations of DL algorithms in real-world scenarios are needed, along with the development of scalable and robust systems for healthcare settings. Continuing research in this area is imperative to fully utilize the potential of DL in medical image segmentation and enhance healthcare outcomes. This article presents a systematic review of DL-based methods for image analysis, discussing their advantages, disadvantages, and the strategies employed. The evaluation of DL image analysis platforms and tools is also covered. Most papers are assessed on qualitative features, but some important aspects, such as security and convergence time, are overlooked. Various programming languages are used to evaluate the proposed methods. The investigation aims to provide valuable guidance for future research on DL applications in medical and healthcare image analysis. However, the study encountered constraints, including limited access to non-English papers and a scarcity of high-quality research focusing on this topic. The heterogeneity in methodologies, datasets, and evaluation metrics used in the studies presents challenges in drawing conclusive insights and performing quantitative meta-analysis. Additionally, the rapidly evolving nature of DL techniques and the emergence of new algorithms may necessitate frequent updates to remain current. Despite these limitations, DL has proven to be a game-changing approach for addressing complex problems, and the study’s results are expected to advance DL approaches in real-world applications.

Data availability statement

Author contributions

ML: Investigation, Writing – original draft. YJ: Investigation, Writing – review & editing. YZ: Investigation, Supervision, Writing – original draft. HZ: Investigation, Writing – original draft.

Funding Statement

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

 
The site explains what is appropriate in image processing in science and what is not. It also shows how best practices in handling images intersect with other best practices.

It provides guidelines for best practices in image processing, with examples illustrating each guideline; an overview of questionable practices and how to avoid them through understanding the reasons for the guidelines; an interactive case study showing how, when best practices in image processing, mentoring, and authorship are used, the entire research group benefits; examples of the consequences of not conforming to best, or even marginally good, practices; and a discussion with a journal editor.

The site is designed for students and faculty members, to help them use and encourage best practices for promoting research integrity in their research groups. It is also intended for researchers and administrators at all levels, to help teach best practices for research integrity to students and colleagues alike.
Guidelines

See the guidelines demonstrated in practice.

Questionable Practices

Learn how to recognize questionable practices and test yourself on your knowledge.

Case Study

An interactive case study, with a handout for facilitating live group discussions. Viewing the case will help you learn how the image guidelines intersect with best practices in mentoring and authorship. Answering the questions will enable you to practice your ethical deliberation skills in a safe environment.

Misconduct Cases

The Continuum

How best practices can be encouraged university-wide is explained in a discussion with a journal editor.
 


Image Processing Case Study

Let’s look at a transportation-industry case involving extensive image processing.

Two video cameras were looking at boxes moving fast on a conveyor belt. To provide high enough image resolution, the cameras had to be placed close to the belt, but then a single camera could not cover the full cross-section of the belt. The cameras were therefore placed on either side of the belt, each seeing part of the boxes. The customer wanted good images of the texture on the top of the boxes, so the images from the two cameras needed to be stitched.

Two cameras see the same object at different angles and distances. Before merging the images from the different cameras, the images must be transformed from the coordinate systems of the cameras into one common coordinate system and placed in one common plane in XYZ space. Our software performed this transformation automatically, based on the known geometry of the camera positions relative to the conveyor belt.
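In OpenCV terms, mapping one camera's view of the (planar) top surface of the boxes into a common plane can be done with a homography. The sketch below only illustrates that idea and is not Avantier's actual code: the calibration points, output size, and file names are hypothetical, and in practice they would come from the measured camera geometry.

```python
import cv2
import numpy as np

# Hypothetical calibration: four points on the belt plane as seen by one
# camera, and their coordinates in the common belt coordinate system.
src_pts = np.float32([[102, 220], [980, 198], [1012, 640], [88, 655]])
dst_pts = np.float32([[0, 0], [900, 0], [900, 450], [0, 450]])

H = cv2.getPerspectiveTransform(src_pts, dst_pts)  # 3x3 homography

left = cv2.imread("left_camera.png", cv2.IMREAD_GRAYSCALE)
left_common = cv2.warpPerspective(left, H, (900, 450))  # view in common plane
cv2.imwrite("left_in_common_plane.png", left_common)
```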

Still, after such a transformation, the multi-megapixel grayscale images from the left and right cameras are shifted relative to each other in the common plane:

[Figure: false-color rendering of the shifted left and right camera images]

Here the grayscale images from the two cameras are shown in false color. The scale on the right shows the relation between the 8-bit pixel signal strength and the false color. Note that the two images also have different brightness.

Our algorithms adjust the brightness and shift the images from the left and right cameras so that the two images can be merged into one. The resulting combined image is shown using a different choice of false colors:

[Figure: combined image, with right-camera pixels in magenta and left-camera pixels in green]

Pixels from the right image are shown in magenta, and pixels from the left image are shown in green.
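A green/magenta overlay like this is easy to reproduce: where the two aligned images agree, the channels are equal and the overlap renders as neutral gray, so color fringes expose stitching errors. The sketch below assumes the two rectified images are already in the common plane; the file names and the simple gain-based brightness match are illustrative only.

```python
import cv2
import numpy as np

left = cv2.imread("left_rectified.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_rectified.png", cv2.IMREAD_GRAYSCALE)

# Match the right image's mean brightness to the left one (simple gain).
gain = left.mean() / max(right.mean(), 1e-6)
right_adj = np.clip(right.astype(np.float32) * gain, 0, 255).astype(np.uint8)

# Left -> green channel, right -> blue and red (magenta). Where the two
# images agree, B == G == R and the pixel appears gray.
composite = cv2.merge([right_adj, left, right_adj])  # OpenCV channel order is BGR
cv2.imwrite("stitch_check.png", composite)
```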

Here is a zoomed view of the overlap region of the stitched image:

[Figure: zoomed view of the overlap region of the stitched image]

If the stitching were perfect, all the pixels in the overlap region would be gray. Our engineers found that, apart from small fringes of color on the edges of the black digits and stripes, the overall stitching accuracy is good. This is not trivial: stitching images obtained by different cameras looking at a nearby object from different angles is not easy.

For comparison, here is an example of a not very successful stitching:

[Figure: example of unsuccessful stitching]

Avantier Inc.’s engineering team, with over 30 years of experience, developed software for the customer that performs all the necessary transformations automatically, without any operator intervention.



Image analysis and LSTM methods for forecasting surficial displacements of a landslide triggered by snowfall and rainfall

  • Original Paper
  • Open access
  • Published: 16 August 2024


  • Yuting Liu 1 ,
  • Lorenzo Brezzi   ORCID: orcid.org/0000-0002-1077-8874 1 ,
  • Zhipeng Liang 2 ,
  • Fabio Gabrieli 1 ,
  • Zihan Zhou 3 &
  • Simonetta Cola 1  


Landslide-prone areas, predominantly located in mountainous regions with abundant rainfall, present unique challenges when subject to significant snowfall at high altitudes. Understanding the role of snow accumulation and melting, alongside rainfall and other environmental variables like temperature and humidity, is crucial for assessing landslide stability. To pursue this aim, the present study focuses first on the quantification of snow accumulated on a slope through a simple parameter obtained with image processing. Then, this parameter is included in a slope displacement prediction analysis carried out with long short-term memory (LSTM) neural network. By employing image processing algorithms and filtering out noise from white-shown rocks, the methodology evaluates the percentage of snow cover in RGB images. Subsequent LSTM forecasts of landslide displacement utilize 28-day historical data on rainfall, snow, and slope movements. The presented procedure is applied to the case of a deep-seated landslide in Italy, a site that in winter 2020–2021 experienced heavy snowfall, leading to significant snow accumulation on the slope. These episodes motivated a study aimed at forecasting the superficial displacements of this landslide, considering the presence of snow both at that time and in the following days, along with humidity and temperature. This approach indirectly incorporates snow accumulation and potential melting phenomena into the model. Although the subsequent winters were characterized by reduced snowfall, including this information in the LSTM model for the period characterized by snow on the slope demonstrated a dependency of the predictions on this parameter, thus suggesting that snow is indeed a significant factor in accelerating landslide movements. In this context, detecting snow and incorporating it into the predictive model emerges as a significant aspect for considering the effects of winter snowfall. The method aims to propose an innovative strategy that can be applied in the future to the study of the landslide analyzed in this paper during upcoming winters characterized by significant snowfall, as well as to other case studies of landslides at high altitudes that lack precise snow precipitation recording instruments.


Introduction

Landslide disasters are widely distributed throughout the world (Kirschbaum et al. 2009). Some landslides occur within reservoirs and are influenced by long-term fluctuations in water level (Zou et al. 2023; Ye et al. 2024a). Some are reactivated and triggered by earthquakes and deformed by rainfall (Sassa et al. 2005; Yin et al. 2009). In the Alpine region of Europe, snow, temperature, and humidity pose challenges to the stability of landslides during transitional seasons characterized by a warming climate (Guzzetti 2000). Although rainfall is generally the main natural agent triggering landslides, in high-altitude countries and regions snowfall and snowmelt are essential factors to consider in the study of geological hazards (Has et al. 2012; Bajni et al. 2021). Therefore, identifying the individual contributions of the various hydrometeorological factors to accelerated movements is key to landslide research (Jakob et al. 2006; Ye et al. 2024b). In particular, landslides in frozen areas are closely related to various meteorological and hydrological phenomena; winter blizzards, soil freeze-thaw cycles, and snowmelt water infiltration play pivotal roles (Subramanian et al. 2017; Hinds et al. 2021). Snowfall and accumulation on a slope can generate an additional load on the ground and induce instability. However, since temperatures are generally below zero, the additional resistance due to soil freezing can partially compensate for the increased load (Harris et al. 2009). In addition, the thaw that occurs when temperatures rise in spring or on warmer days can generate water infiltration that cannot be easily accounted for in the hydrological balance of a slope.

Previous studies have reported evidence that anomalous spring conditions and landslide events induced by snowmelt are clearly linked to warming (Durand et al. 2009; Saez et al. 2013; Xian et al. 2022). The important role of the dynamics controlling atmosphere-surface interactions, which affect snowmelt processes and soil freeze-thaw cycles, is evidenced by Subramanian et al. (2020) and Xian et al. (2022). Matsuura et al. (2003) also considered meteorological factors, particularly snowmelt, in their landslide research and confirmed that snowmelt influences landslide displacement. The hydrological behavior of landslides, considering both snowpack and meteorological factors, has also been studied by Osawa et al. (2017) and Okamoto et al. (2018). They found that the pore water pressure response to snow loading is strongly affected by the permeability of the sliding mass, which in turn is influenced by seasonal changes. Furthermore, Osawa et al. (2024) proposed a simplified semi-empirical hydrological model for the landslide pressure response, aiming to predict the short-term response of pore pressure to rainfall and/or snowmelt inputs.

A comprehensive consideration of the interrelated factors affecting the thermal-hydraulic-mechanical behavior of a landslide is indispensable for understanding and predicting landslide occurrences in frozen terrains. However, quantifying snowfall or snowmelt is challenging due to the limits of monitoring methods and the stochastic nature of the environment. Martelloni et al. (2012) proposed integrating a simple snow accumulation-melting model (SAMM) with landslide warning systems that utilize empirical rainfall thresholds. This model bridges the gap between physically based models and empirical temperature-index models based on traditional pluviometer monitoring combined with temperature data. Accordingly, snow accumulation is considered and integrated, via the SAMM model, into a regional-scale early warning system that relies on statistical rainfall thresholds to predict landslide occurrences in the Emilia Romagna region (Italy). Another study was carried out on a mountain basin near Champoluc in the Val D’Aosta region of Italy (Panzeri et al. 2022). The influence of snowmelt and precipitation on the development of shallow landslides was evaluated by statistically comparing in situ meteorological observations with laboratory analyses. Cutting-edge weather and snowpack stations were utilized to conduct comprehensive analyses of snowmelt and meteorological data, and snowmelt and atmospheric conditions such as temperature and humidity were integrated to observe the interaction between snow and soil. Chiarelli et al. (2023), instead, examined the case of the Tartano basin in Lombardia (Italy): there, the landslide susceptibility prediction accuracy improved by 5% when a snowmelt factor, obtained with a traditional mechanical model that mimics the triggering mechanism of shallow landslides, was included. Although these studies demonstrated the significant influence of snow on landslide kinematics, and some of them suggested and implemented monitoring methods and quantitative measures to account for snow effects, they did not establish a standard methodology for considering the influence of snow on the dynamics of a landslide or in a forecasting tool.

Moreover, even if many scholars have indicated that snowmelt affects landslide stability more than snow accumulation, snowmelt remains difficult to account for in landslide assessment because it cannot be directly measured with traditional equipment, such as the heated rain gauge. Starting from the basic idea that snowmelt is related to the variation of the snow accumulated on the slope surface, we suggest a procedure based on image processing for evaluating the actual amount of snow present on a landslide, and we apply this method to the case study of the Sant’Andrea landslide, a deep-seated landslide in the North-East of Italy. This slope was chosen because, in addition to traditional monitoring tools, a terrestrial digital photographic system has been collecting RGB images of the slope for 3 years with the aim of using them as a monitoring dataset.

The use of photogrammetry within a landslide monitoring procedure is a strategy that has already proven effective both on the Sant’Andrea landslide and on another Italian landslide (Gabrieli et al. 2016; Brezzi et al. 2020, 2021b). In these previous applications, the images were processed with image processing and digital correlation algorithms designed to reconstruct the 2D displacement field on the image plane and subsequently to project it onto the three-dimensional scene. Broadly speaking, the applications of photogrammetric monitoring methods for landslides are increasing (Pan 2018). There have been significant breakthroughs in multi-view photogrammetry, leading to a novel category of algorithms and software tools that enable greater automation in surface reconstruction, feature detection, and displacement monitoring. These techniques can provide topographic information for geoscience applications while drastically reducing costs compared to traditional methods such as topographic and laser scanning surveys (Stumpf et al. 2015). Moreover, integrating digital photogrammetry with geological data has proven effective in enhancing the characterization and comprehension of landslide mechanisms; this knowledge can in turn contribute to the identification and formulation of effective mitigation strategies (Laribi et al. 2015).

When analyzing large numbers of images, the algorithm must be efficient and swift, based on determining simple yet descriptive quantities. For this reason, to measure the volume change due to snow accumulation and melting, a dimensionless indicator called the Pixel Volume Index (PVI) is introduced here to represent variations in snow cover. The concept of Pixel Volume (PV) was initially employed for video deblurring, achieving state-of-the-art performance both quantitatively and qualitatively (Son et al. 2021). Here, the PVI is adapted and refined to quantify the snow volume, aiming to clearly depict the actual condition of snow presence on the slope. The model, which integrates photogrammetric monitoring with deep learning using the PVI as a link, holds significant potential for predicting landslide displacement.

Next, once the quantification of the snow on the slope has been obtained, it is used, together with other quantities such as temperature, humidity, and rainfall, as input to a model that forecasts the landslide displacements. Deep learning (DL) is an effective method for forecasting in many domains, including geohazards (LeCun et al. 2015; Mondini et al. 2023; Nava et al. 2023; Liu et al. 2024). More recently, it has also been introduced for the detection and evaluation of landslide susceptibility (Feizizadeh et al. 2021; Sameen et al. 2020). Within DL, convolutional neural networks (CNNs) are the most frequently used and perform best at image feature extraction (Bishop 1995). They are therefore frequently used for processing remote-sensing image datasets (Prakash et al. 2020; Ngo et al. 2021). However, unlike image data, time series data are difficult to shape into a form a CNN can use. To solve this, Teza et al. (2022) transformed the time series data into figures, in the form of scalograms with dimensions of 224 × 224 × 3. In this way, they were able to use a CNN to extract features and obtain successful predictions of the surficial displacements of the Sant’Andrea landslide.

However, CNNs cannot readily support a multi-factor landslide prediction model, due to the resolution limitations of image-based inputs. Thus, another DL method, the long short-term memory (LSTM) neural network, was introduced and applied to landslides located in the Three Gorges reservoir (Xu and Niu 2018; Yang et al. 2019). As a type of recurrent neural network (RNN) (Medsker and Jain 1999), an LSTM can capture long-term dependencies in various time series data (Van Houdt et al. 2020), leveraging contextual information while establishing mappings between input and output sequences (Hochreiter and Schmidhuber 1997). Moreover, the architecture of an LSTM can be modified and extended according to the requirements of the task; in practical applications, multiple LSTM layers can be stacked to enhance the model’s representational capacity (Graves 2012).

In the study presented here, given the many factors involved, such as rainfall, snow (PVI), temperature, and humidity, different input combinations and training durations are tested to improve the accuracy of the LSTM predictions of the Sant’Andrea landslide’s movement.

Methodology

This methodology integrates several approaches from different fields, spanning photogrammetric computation and machine learning. In the landslide monitoring part, the primary focus is on the collection and pre-processing of image monitoring and geodetic measurement data. In the early warning part, image processing techniques, machine learning, and deep learning are combined into a comprehensive system. As depicted in Fig. 1, the initial data collection involves several monitored parameters of the landslide, including rainfall, temperature, humidity, and displacement (yellow circle). The directly detected quantities also include the images acquired by the photographic system aimed at monitoring the slope. Since the snow quantity on the slope cannot be directly monitored, the image data from monitoring are compiled and processed to compute a Pixel Volume Index (PVI) for snow quantification (green boxes). All these quantities, image datasets, and elaboration results, directly collected and then processed, are included in the yellow dashed circle. Lastly, all the collected data are used as input to a customized LSTM neural network to identify the most suitable approach for predicting future displacements (red boxes).

figure 1

Early warning system considering snow. The design is composed of three subsystems: monitored quantities are represented in the yellow circle; image recognition algorithms for snow estimation are shown in green; and the deep learning forecasting system is depicted in red shapes

Pixel volume index

Due to the absence of dedicated snow monitoring devices, in the present work the quantification of snow cover is determined using the Pixel Volume Index (PVI), a parameter that quantifies the current presence of snow on the slope (see the brief explanation below). Figure 2a represents a schematic scene to be processed, whereas Fig. 2b and c represent two important features necessary for snow quantification: an estimation of the snow thickness proportion and a percentage evaluation of the base area occupied by snow, detected on a representative patch of the image. These two quantities, named S_T and S_C respectively, allow the definition of a variable, denoted snow PVI, obtained by directly multiplying S_T and S_C.

figure 2

a Example of scene to process; b, c schematic diagrams for the quantification of the PVI components (parameters: A_C, masked base area of snow; A_T, masked thickness area of snow; S_C, snow coverage; S_T, snow thickness)

The processes for obtaining the snow coverage S_C and the snow thickness S_T follow two distinct paths, each made up of several steps. Using image analysis tools as a foundation, two specific areas must first be masked, named A_C and A_T, as depicted in Fig. 2, used respectively for the evaluation of S_C and S_T.

A first approximation of the snow coverage could be obtained by counting the number of white pixels present in the masked area A_C. Nonetheless, when analyzing the image dataset, different levels of exposure or lighting, cloudy or sunny conditions, and different colors in different seasons make the identification of the level of “white” contained in the pixels ambiguous. The direct consequence is an unreliable quantification of snow coverage. Furthermore, other white objects in the scene, such as rocks and walls, could mislead the code, resulting in false recognition of snow and a consequent erroneous quantification of the S_C parameter. To overcome these problems, before counting the number of white pixels in the A_C area, a specific algorithm is implemented to categorize the type of image in terms of illumination and overall coloration. For each image category, specific thresholds are then sought to identify the RGB contents indicating “white” corresponding to snow and “white” corresponding to other objects. The algorithm used both for identifying the type of image and for recognizing the “white” corresponding to snow is based on the analysis of the red and blue contents of the pixels in the cropped area A_C. In particular, the information content of all the pixels included in A_C can be plotted in a graph with red/blue on the X-axis and red on the Y-axis. Once these quantities are plotted, the algorithm can easily locate the centroid of that group of points. The position of the centroid makes it possible to identify the type of image in terms of lighting and color. The threshold values and the categorization of the image types to be distinguished must be appropriately calibrated on the basis of the monitored scene and the variability of the predominant environmental conditions.
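As a minimal sketch of this step, the code below computes the centroid of the masked pixels in the (R/B, R) plane and looks it up in hand-calibrated tone regions; the region bounds and category names here are illustrative placeholders, since the paper calibrates them per monitored scene.

```python
import numpy as np

def tone_centroid(pixels_rgb):
    """Centroid of the masked A_C pixels in the (red/blue, red) plane,
    used to categorize the overall tone of an image."""
    r = pixels_rgb[:, 0].astype(np.float64)
    b = np.maximum(pixels_rgb[:, 2].astype(np.float64), 1.0)  # avoid /0
    return (r / b).mean(), r.mean()

def classify_tone(centroid, regions):
    """Assign a tone category from calibrated rectangular regions of the
    (R/B, R) plane; the bounds must be tuned for each monitored scene."""
    x, y = centroid
    for name, (xmin, xmax, ymin, ymax) in regions.items():
        if xmin <= x < xmax and ymin <= y < ymax:
            return name
    return "unclassified"

# Illustrative bounds only -- not the paper's calibrated values.
regions = {
    "bluish": (0.0, 0.9, 0.0, 256.0),
    "yellowish": (0.9, 1.1, 0.0, 128.0),
    "special brightness": (0.9, 1.1, 128.0, 256.0),
    "sunlight": (1.1, 3.0, 0.0, 256.0),
}
```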

After distinguishing the different types of images, a methodology must be calibrated for each of them to recognize the pixels representing snow and those representing walls or rocks (Fig. 3). This calculation is then performed automatically across all the collected RGB images. Following this method, it is possible to distinguish image tones, as well as snow from white-shown rocks on the slope. Once the number of pixels covered by snow is recognized, the percentage of snow coverage can be calculated by dividing the obtained number by the total number of pixels included in A_C.

figure 3

Separation of snow images from images of white-shown rocks on the basis of characteristic RGB patterns
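A sketch of the coverage computation follows. Note that a simple box threshold on red and red/blue stands in here for the fitted threshold curves the paper calibrates per tone category, so the threshold arguments are placeholders.

```python
import numpy as np

def snow_coverage(pixels_rgb, r_min, rb_min):
    """S_C: fraction of the masked A_C pixels classified as snow, i.e.
    pixels whose red value and red/blue ratio exceed the tone-specific
    thresholds separating snow from white-shown rock."""
    r = pixels_rgb[:, 0].astype(np.float64)
    b = np.maximum(pixels_rgb[:, 2].astype(np.float64), 1.0)
    is_snow = (r >= r_min) & ((r / b) >= rb_min)
    return float(is_snow.mean())  # dimensionless, in [0, 1]
```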

The second parameter aims at quantifying the thickness of the snow present in the scene. Once S_C is high, in fact, it is no longer able to express the further accumulation of snow on the ground. For this reason, a second parameter, S_T, expressing the snow thickness, is added. To obtain it, a second area A_T must be masked, identifying a part of the image in which a clear accumulation process is observable, such as house roofs, fence walls, or other objects. In this area, the snow thickness is defined as the ratio of snow pixels to the total masked area and, like the snow coverage, is a dimensionless value. Ultimately, the product of the snow coverage (S_C) and the snow thickness (S_T) is computed, yielding the dimensionless snow PVI.

Long short-term memory neural network

In this work, a long short-term memory (LSTM) neural network is selected to conduct the forecasting task, due to its wide range of uses and its ability to handle various types of data. LSTMs are a type of recurrent neural network architecture designed to better capture and remember long-term dependencies in sequential data, especially nonlinear relations among datasets (Fan et al. 2021). In contrast to conventional neural networks, which process input data linearly from start to finish, LSTMs have a recurrent connection that enables them to maintain a hidden state, a memory of past inputs, as layer 1 and layer 2 indicate in Fig. 4. This capacity to remember or forget information makes them particularly adept at tasks involving sequential data or time series (Hochreiter and Schmidhuber 1997). The number of LSTM layers is chosen based on the size of the dataset and the length of the sequences; increasing the number of layers enhances the data processing capacity. Given that this study involves five parameters, temperature, snow, rainfall, humidity, and displacement rate, over a time span of 3 years, two LSTM layers are used in testing the various configurations. Displacement is then predicted as the output value.

figure 4

Structure of the adopted LSTM network: temperature (T), snow (S), rainfall (R), humidity (H), and displacement rate (D) are the data used as input in layer 0; layer 1 and layer 2 are LSTM layers; layer 3 is the output layer
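A minimal sketch of such a network is given below in Keras, matching the description of five input features feeding two stacked LSTM layers and a single displacement-rate output; the 28-day window follows the abstract, while the layer widths, optimizer, and loss are assumptions, not values reported in the paper.

```python
from tensorflow import keras
from tensorflow.keras import layers

WINDOW, N_FEATURES = 28, 5  # 28-day history of T, S (PVI), R, H, D

model = keras.Sequential([
    layers.Input(shape=(WINDOW, N_FEATURES)),  # layer 0: input sequence
    layers.LSTM(64, return_sequences=True),    # layer 1: first LSTM layer
    layers.LSTM(32),                           # layer 2: second LSTM layer
    layers.Dense(1),                           # layer 3: next displacement rate
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```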

To gain a precise understanding of the factors influencing the landslide, starting from the traditional model coupling only rainfall and displacement rate (RD model), other combinations of factors are considered here: the rainfall-snow-displacement rate (RSD), rainfall-temperature-displacement rate (RTD), rainfall-humidity-displacement rate (RHD), rainfall-snow-temperature-displacement rate (RSTD), and rainfall-snow-temperature-humidity-displacement rate (RSTHD) models. All of this is required in order to explore the displacement rate trends under different conditions and, finally, to select the optimal model.

In order to assess the influence of snowmelt on landslide deformation and the reliability of the early warning model, this paper employs three statistical indicators: the mean squared error (MSE), the root mean squared error (RMSE), and the mean absolute error (MAE). Among them, MSE squares the errors, making it more sensitive to large errors; this means that in the presence of outliers or anomalies, MSE can be dominated by these values, potentially making the model appear overly responsive to outliers. MAE treats all errors equally and does not magnify the impact of outliers. RMSE falls between the two: it squares the errors but is expressed on the original scale, while still being influenced by larger errors (Brassington 2017; Karunasingha 2022). Therefore, RMSE and MAE together offer a more comprehensive perspective, helping to evaluate the model’s performance across various scenarios.
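For reference, the three indicators are straightforward to compute; this small sketch uses plain NumPy, with `y_true` standing for the measured displacement rates and `y_pred` for the model forecasts.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: penalizes large errors quadratically."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def rmse(y_true, y_pred):
    """Root mean squared error: MSE expressed on the original data scale."""
    return float(np.sqrt(mse(y_true, y_pred)))

def mae(y_true, y_pred):
    """Mean absolute error: treats all errors equally."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))
```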

Landslide description

The Sant’Andrea landslide is situated in the Dolomites, in the north-eastern Italian Alps, and poses a significant risk to nearby residents (Fig. 5). Its estimated volume is around 60,000 m³, located in an area of about 72,000 m². If the unstable mass collapsed, it could potentially block the Boite river, leading to flooding of the nearby Perarolo di Cadore hamlet and the area downstream (Brezzi et al. 2021a). Extensive geological and geotechnical studies have been conducted to understand the characteristics of the landslide area. The landslide consists of a 30-m-thick layer of clay-calcareous debris, composed of heterogeneous materials with varying grain sizes and geotechnical properties. It slides over weathered bedrock composed of a dolomitic lithology and folded layers rich in anhydrides and gypsum.

figure 5

a Location of Sant’Andrea landslide in North-East of Italy (46° 23′ 57″N, 12° 21′ 20″E). b Aerial view of Sant’Andrea landslide. c Front view of the main scarp

The geological and hydrogeological investigations revealed two groundwater flow systems within the landslide area: a shallow system in the upper layers of the debris deposits and a deep system involving the upper part of the bedrock, which contains altered and fractured gypsum. Water plays a crucial role in slope instability, accelerating displacements during intense and prolonged rainfall events and causing slow displacements during dry periods through active water circulation.

The interaction between water and gypsum components of the bedrock and debris layers influences the mechanical properties of the rock mass. Hydration processes result in the plastic rheology of the weak gypsum lithology, as well as the development of karst cavities and microcracks.

Landslide monitoring

Monitoring of the Sant’Andrea landslide has been carried out since 2013 using a topographic system with reflective targets installed inside and outside the unstable area. It has provided valuable data on cumulative displacement over time, helping to understand the behavior of the landslide and to identify areas of significant displacement. Since 2013, the target configuration has been adapted several times after episodes in which slope volumes close to the main scarp collapsed, the last of which occurred in June 2021. Figure 6a illustrates the arrangement of monitoring stations before and after this event. A natural collapse in early June produced the detachment of about 8000 m³ from the front face (Fig. 6b), while on June 25, 2021, a professional team conducted a controlled blasting of about 5000 m³ of another unstable volume to remove it and mitigate the risk for the downstream area. Following these interventions, based on the monitoring data, the landslide appears to have gradually reduced its displacement rate, even if it has not completely stabilized. This is clearly evidenced in Fig. 6, where the displacements developed in a 2-month interval before and after the controlled blasting are superimposed on the target maps.

figure 6

Positions of topographic targets and colored distribution of the displacement accumulated during the periods a April–May 2021 and b August–September 2021. The arrows represent the horizontal displacements accumulated at each specific target

In addition to the topographic monitoring, in May 2021 a photogrammetric system was put into official use for monitoring the superficial deformation. The photographic system consists of three Canon EOS 1300D cameras installed on the other side of the Boite valley, at a distance of around 350 m from the Sant’Andrea landslide. Every 15 min, the system prompts the cameras to take pictures and then uploads them to an FTP (File Transfer Protocol) server, where all images can be viewed and downloaded as needed. However, some images captured during the night, in heavy fog, or during periods of intense rainfall and snowfall cannot be effectively utilized due to their inherent visual disturbances; they can only be regarded as references.

Snow quantification algorithms

Snow is the only quantity that cannot be directly measured through a monitoring probe or device: currently, no devices installed on the Sant’Andrea landslide can quantitatively monitor snowfall. Therefore, obtaining snow measurements indirectly from the collected images became the strategy of this research, and snow recognition algorithms were developed and applied for this purpose.

To this aim, we purposefully chose a target area according to two criteria. Firstly, we excluded areas densely covered by vegetation or persistently shrouded in shade, as these factors could introduce bias and obscure our findings. Secondly, the area must be correctly oriented with respect to the camera to allow an accurate quantification of its cover. Following these criteria, we selected an optimal study area on the right side of the slope, as Fig. 7a indicates, which contains 124,206 pixels. Subsequently, the centroid, which represents the center of a finite set of selected points (Berele and Catoiu 2018; Abdi 2009), is calculated for each image (Fig. 7b). This was necessary because the image tone strongly interferes with the recognition and quantification of the snow presence. In fact, as explained later, the recognition process is based on the color composition of the selected area, which can vary significantly during the same day (morning, noon, and evening) as well as under different weather conditions (sunny, snowy, and cloudy weather). Depending on the moment of the day and the weather conditions, the images can exhibit a mainly bluish, yellowish, or reddish tone, or adequate sunlight, as Fig. 8 shows. Consequently, this variance impacts the RGB values of the object. In general, the brighter the color of an image, the higher the RGB values (Riehle et al. 2020). This leads to RGB values in images captured on sunny days being higher than those taken in cloudy or rainy/snowy weather. Similarly, RGB values for images with diffuse snow coverage are higher than those without snow coverage. This necessitates the establishment of distinct thresholds for quantifying snow, including the snow coverage rate and the snow coverage thickness.

figure 7

a Selected area for quantifying snow (pink colored area on the left side of the image). b Inside the selected area, some points are indicated for the centroid individuation (right coordinate system); “×” represents the centroid of these 124,206 pixels

figure 8

Image data in four different tones without and with snow on the slope. a Bluish image with rock. b Yellowish image with rock. c Special brightness image with rock. d Sunlight image with rock. e Bluish image with snow. f Yellowish image with snow. g Special brightness image with snow. h Sunlight image with snow

Generally, blue-toned images are mostly taken during the early morning and evening, at sunrise or sunset, while yellow-toned images are shot during daytime in cloudy or rainy weather. In addition, weak sunlight can also cause a yellowish tint in the image. Furthermore, it is speculated that during periods of seasonal transition (spring or fall), some cyclones originating from the south to southwest may carry sand and dust from the Sahara Desert, leading to images with shades of yellow and special brightness tones. On the contrary, when cyclones originate from the north or west, devoid of sand and dust, the color tones change entirely based on the varying intensity of light. Finally, sunlight exposure refers to the condition of direct sunlight on extremely sunny days. The image tone differentiation was therefore achieved using a process based on centroid clustering.

The masked areas in images with different tones exhibit different centroid distributions. The centroid of the target area in each image is calculated by means of the K-means clustering method (Likas et al. 2003), and the images are classified into the different tone collections based on the position of the centroid. Figure 9 shows the data for a test set of 50 images, manually selected and categorized to be representative of the four different tones: each point represents the red value vs. the red/blue ratio determined by the centroid of the target area in each image. In the same plot, the subdivisions among the four categories are drawn; these were implemented in an algorithm to divide the entire dataset of 4369 images, 1298 of which were collected in winter.

figure 9

Range of images’ centroid distribution in different tones

The next step in the process of snowfall identification involves identifying the presence of exposed white rocks on the slope, as shown in Fig. 10b and c. These rocks introduce significant interference in the recognition work; in fact, the algorithm first adopted for snow recognition identified white-shown rocks as snow. To solve this error, the RGB values of all the photos were extracted and analyzed. These data are reported in a plot (Fig. 11) where the X-axis represents the ratio of the red value to the blue value, while the Y-axis represents the red value; one such R/B vs. R plot needed to be constructed for each group of photos with a different color tone. The points appear grouped in two clusters: the green group represents images showing mostly snow with a few visible rocks, while the gray group comprises conditions with no snow at all. The pixel data of the two groups are well separated, making it possible to establish a criterion for effectively distinguishing areas covered by snow from white-shown rocks. The threshold curves were generated accordingly, by manually selecting three calibration points and calculating the fitting curve through them (Burton-Johnson and Wyniawskyj 2020). In Fig. 11, the thresholds were set separately for the bluish, yellowish, sunlight, and special brightness categories. Note that the images with relatively complete snow coverage before and after a snowfall are predominantly in blue tones; in this case, the distinction between snow and white rocks is quite evident. Additionally, most images acquired during the winter include rocks that are not entirely covered by snow.

figure 10

Separation of snow from exposed white rocks: a panorama of landslide with partial snow cover. b Exposed white rocks on the slope seen after zooming in. c The actual snow seen after zooming in

figure 11

Separation of snow images from images of white-shown rocks on the basis of characteristic RGB patterns. Thresholds for a bluish images, b yellowish images, c sunlight images, and d special brightness images

Therefore, within the algorithm, different thresholds are set for snowfall detection in different tonal images to ensure the separation between snowfall and white-shown rocks, ultimately identifying the snow coverage in the selected area. We consider the selected area as representative of the whole image, and the recognition results are represented by a coverage percentage.

To quantify the snow presence with accuracy, in addition to the coverage percentage, another essential parameter is the snow thickness. Considering the available views and the places where clear boundaries between snow-covered and non-covered areas can be discerned, the most suitable place to quantify the snow thickness is the top of a retaining wall located at the landslide rear (Fig. 12a): the snow covering the top of this wall is clearly visible and aligns with the shooting angle, providing a more realistic representation of the snow depth. As Fig. 12 indicates, on the top of the retaining wall (Fig. 12b), a rectangular region of 10 × 40 pixels (Fig. 12c) was selected for the snow thickness quantification. Thanks to the absence of interference from white-shown rocks, the extracted rectangular area can be converted into a binary image (Liu et al. 2012). Then, to identify the boundary between the snow and non-snow zones in the binary image, another criterion is used: a boundary is located where the gray values differ by 45 or more between the snowy and non-snowy areas. Using this method, as shown in the binary image of Fig. 12c, two distinct boundaries emerge: the upper one is the boundary between the snow cover and the background behind it, while the lower one is the boundary between the snow cover and the retaining wall. Finally, after calculating the proportion of snow pixels in each column of the rectangular area, the average proportion over the ten columns is used to represent the thickness of the snow; like the snow coverage, it is a dimensionless value. As a result, the thickness of the snow can be identified.

figure 12

Extraction of the snow thickness in pixel units from the image: a overall image. b Detail of the overall image showing the retaining wall at the landslide rear. c Selected 10 × 40 pixel area for determination of the snow thickness
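A compact sketch of this thickness measurement is given below. Since the text does not fully specify the binarization, the snippet simply marks a pixel as snow when its gray value sits 45 or more levels above the darkest (wall) value of its column, which is a stand-in for the boundary criterion described above.

```python
import numpy as np

def snow_thickness(strip_gray, delta=45):
    """S_T from the 10 x 40 pixel strip on top of the retaining wall:
    count a pixel as snow when it is `delta` gray levels above the
    darkest value in its column, then average the per-column snow
    fraction over the ten columns (dimensionless, in [0, 1])."""
    strip = np.asarray(strip_gray, dtype=np.float64)  # shape (rows, 10)
    wall_level = strip.min(axis=0)                    # darkest value per column
    snow = strip >= wall_level + delta                # boolean snow mask
    return float(snow.mean(axis=0).mean())
```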

Finally, a new parameter named the Pixel Volume Index (PVI) is introduced to visualize the amount of snow. Its definition is:

PVI = S_C × S_T

where S_C is the snow coverage, representing the percentage of the selected area occupied by snow, and S_T is the thickness proportion of the snow, as explained in Fig. 13. Among all the collected images, the one with the thickest snow is taken as 100%, and when there is no snow the thickness is 0%; all other results are expressed as a percentage of this maximum. S_C and S_T are dimensionless indicators, as is the PVI, whose variation over time quantifies the trends of snow accumulation or melting.

figure 13

Illustration of snow thickness under different conditions (Image 1 shows a little snow, image 2 exhibits a large amount of snow, while image 3 indicates no snow)

Snow data visualization

The images captured with the photogrammetric monitoring over 3 years including three winters are collected and used as input dataset. In total, the dataset contains 4369 images.

To quantify the landslide displacement rate, we selected monitoring stations located within the undamaged yellow deformation area in Fig. 6b, i.e., the portion of the landslide exhibiting medium displacement rates. Among these, the GPS1 station is considered the most representative, since it presents the most regular time series, being the least susceptible to effects induced by wind and the passage of animals.

The quantitative results of the snow PVI, combined with the daily rainfall, temperature, humidity, and displacement rate data, are displayed together in Fig. 14, showing their relationship. The snow PVI appropriately captures the changes in snow cover on the Sant’Andrea landslide, as highlighted by the dashed boxes. In the northern Italian Alps, winter 2020–2021 produced heavy snowfalls: two substantial snowfalls occurred on December 28, 2020, and January 2, 2021, during the period of the most intense landslide deformation. In Fig. 14, the snow PVI accurately depicts the accumulation and melting of these two intense snowfalls, while also evidencing the temporal relationship between snowfall, snowmelt, and landslide movement. On the one hand, the snow increased the gravitational load on the landslide; on the other hand, the melting snow infiltrated into the landslide, disrupting the stress balance and causing accelerated deformation. Unfortunately, this effect is not completely determinative, because the displacement rate is surely affected also by the large amount of rain that fell in the same period. During the winter of 2021–2022, there was only one snowfall, concentrated on December 9, 2021; it was smaller than those of the previous year and of shorter duration. Additionally, there were minor rains and a slight acceleration of landslide activity. Finally, the winter of 2022–2023 was particularly warm, with temperatures well above the mean values of the area for that period, and as a result there was virtually no snow.

figure 14

Snow PVI related to temperature, humidity, and rainfall in the period from March 1, 2020, to June 30, 2023

It is important to note that the landslide exhibits high acceleration not only due to snow. As already demonstrated by Teza et al. (2022), rain is one of the most important factors controlling the mobility of this landslide. But Fig. 14 clearly shows that snow also has an impact, one that cannot be ignored when analyzing the deformation of the Sant’Andrea landslide.

Displacement forecasting and causal analysis

In order to accurately investigate the influence of snowfall on the landslide displacement, various combinations of meteorological parameters are used in relation to the displacement rate. Here, R, S, T, H, and D respectively represent the daily rainfall, snow PVI, air temperature, air humidity, and surficial displacement rate of the GPS1 monitoring station. The different configurations share the same data segmentation, which consists of 373 days (from November 1, 2020, to November 7, 2021) used for training and 266 days (from November 8, 2021, to July 30, 2022) used for forecasting. The model is tested over a range of combinations: some configurations, such as RTD and RHD, consider only temperature or humidity in association with rain, while others consider rain in association with snow and temperature (RSTD) or all the factors together (RSTHD). In total, six combinations are considered.
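Before training, the daily series must be sliced into fixed-length input windows. The sketch below assumes a (days, features) array with the displacement rate in the last column and uses the 28-day history window mentioned in the abstract; the function name and column layout are illustrative.

```python
import numpy as np

def make_windows(series, window=28):
    """Slice a (days, features) array into (samples, window, features)
    LSTM inputs, pairing each window with the next day's displacement
    rate (assumed to be the last feature column) as the target."""
    X, y = [], []
    for t in range(len(series) - window):
        X.append(series[t:t + window])
        y.append(series[t + window, -1])
    return np.stack(X), np.array(y)

# Example: an RSD configuration with the 373 training days and 3 features
train = np.random.rand(373, 3)       # placeholder for [R, S, D] daily data
X_train, y_train = make_windows(train)
print(X_train.shape, y_train.shape)  # (345, 28, 3) (345,)
```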

Figure 15 compares the displacement rates predicted over the forecasting period by the six different configurations, and Table 1 lists the values of the corresponding reliability indicators. Considering both Fig. 15 and Table 1, it is observed that RSTHD predicted the highest and most unrealistic values for the displacement rate; consequently, it receives the worst evaluation among all the configurations. RTD and RHD also yield unrealistic results, even if lower than those of RSTHD. In contrast, RD and RSD closely align with the measured displacement rate and exhibit favorable evaluation parameters.

figure 15

Comparison of forecasting tests’ results of different configurations (original represents the real displacement rate, while R , S , T , H , and D are rainfall, snow PVI, temperature, humidity, and displacement rate, respectively)

Notably, the configurations including snow, such as RSD and RSTD, show an increase in the displacement rate following the variation of snow on the slope, a behavior that seemed to occur during winter 2020–2021 and can be theoretically justified. In fact, snowmelt is commonly thought to increase water infiltration and groundwater pressure at the sliding surface, consequently changing the momentum balance of the landslide. The result obtained here successfully demonstrates that snow is one of the factors influencing the landslide displacement, although the weight assigned to snow might be too high.

The same consideration can be tested on temperature and humidity: their presence raises the displacement rate too high in the second period. We attribute this to the learning pattern of deep learning neural networks. The essence of a deep learning neural network lies in regression between data, discerning relationships by learning from the input data and calculating weights and biases for the mutual influences among data points. In this case study, unlike rainfall and snow, whose values can be zero at times, the values of temperature and humidity are always non-zero and span a wide range. Because they are constantly present, they are assigned a certain weight in the model throughout the learning process, even when the temperature is very low, thereby elevating the outcome. This also indicates that temperature and humidity, as indirect factors influencing the landslide displacement, are not suitable for incorporation into the Sant’Andrea displacement forecasting models.

The outcomes suggest that rainfall (RD) is still the most important factor and should be considered first; moreover, the RSD and RSTD curves show a delayed displacement change after the snow event. The preliminary analysis therefore suggests that snow has an impact on the deformation of the landslide.

Furthermore, when focusing on the two configurations with the best results, RD and RSD, the comparison of the trends of the two curves is puzzling, since RSD consistently remains higher than RD even after the snow events. However, upon analyzing Figs. 14 and 15, it becomes apparent that after the collapse of June 2021 and the subsequent induced blasting, the landslide motion gradually stabilized toward a fairly constant rate. It is natural to think that these events (the natural and induced collapses) produced a significant change in the kinematic mechanism of the Sant’Andrea landslide, thus resulting in a lack of alignment between the RD and RSD predictions.

Considering the reasons above, the dataset was expanded and the landslide was studied in two phases. Based on the kinematic movement pattern, July 1, 2021, is regarded as a time point that divides the timeline, thereby segmenting the evolution of the landslide into two stages. The first stage spans from March 1, 2020, to June 30, 2021, while the second stage encompasses the period from July 1, 2021, to June 30, 2023. Both stages are utilized to investigate the influencing factors on the landslide and for forecasting purposes, applying only the LSTM model for RD and RSD configurations.

The first stage lasted 487 days, with 275 days used for the training set and 212 days used as the forecasting set. The obtained results are shown in Fig. 16. It can be observed that in the first stage the snow variation during winter accelerated the landslide deformation, even if the predicted outcomes do not reach the peak values observed in reality: this indicates that although the model considering both rainfall and snowmelt could predict the landslide acceleration, it could not accurately forecast the displacement magnitude. The predictions based on rainfall alone are significantly lower than the actual values. Therefore, snow is also an important factor accelerating the landslide deformation in winter.

figure 16

Forecasting results of 1st period with original displacement rate

The reliability indicators of the two LSTM models for this stage (Table 2) indicate relatively poor reliability for both, even if the correspondence between the predicted and measured values in the trends of Fig. 16 seems effective. This is because, even if the trend is well described, especially by the RSD model, there are large differences between the predicted and original data at the peak, which push the reliability indicators toward higher values. In any case, compared to the RD configuration, the RSD configuration demonstrated prediction performance closer to reality, in particular with regard to the identification of the displacement peak in February 2021. Although the forecast was slightly delayed and the indicators were not very good, this directly demonstrates that in the first phase snowmelt played a crucial role in accelerating the displacement rate of the landslide.

The second stage spanned 730 days, with 488 days used for the training set and 242 days allocated to the forecasting set. The predicted displacement rates are illustrated in Fig. 17, while Table 3 summarizes the reliability indicators. On the basis of the outcomes, it can be inferred that in the second stage the landslide exhibits noticeably smaller deformations and strongly reduced sensitivity to rainfall. Moreover, due to the extremely limited snowfall during the winter of 2023, with only two minor snow events, the snow does not exert a significant impact on the landslide. Nevertheless, the model still proves applicable for predictions in the second stage, with the RD and RSD results being quite similar and both satisfying the landslide forecasting requirements. Further exploration will depend on future research involving substantial snowfall events.

figure 17

Forecasting results of 2nd period with original displacement rate

Table 3 shows that both RD and RSD exhibited a fundamentally consistent trend and essentially identical evaluation parameters, with the differences in each evaluation parameter being less than 0.01. Both can be used for predicting and assessing future trends. However, due to the notably warmer winter of 2022–2023, which resulted in minimal snowfall, the significance of snowmelt in the deformation of the Sant’Andrea landslide during the second stage could not be properly explored. Nonetheless, in future years with abundant snowfall, model validation and optimization can continue. Furthermore, similar experiments can be conducted on landslides situated in higher-latitude areas with abundant snowfall.

Finally, to better understand and quantify the impact of snow on landslide movements, a Spearman correlation analysis was conducted between snow PVI and displacement rate. Figure 18a presents the correlation coefficient for the first period, when snow was relatively abundant, while Fig. 18b shows the coefficient for the second period. In the first period the correlation coefficient is 0.6, indicating a moderate correlation between snow PVI and displacement rate (moderate correlation typically ranges from 0.3 to 0.7). In the second period, owing to the reduced snow of its warmer winter, the coefficient decreases to 0.36 but still falls within the moderate range. The significance of the correlation is assessed with a p-value: a p-value below 0.05 indicates that the observed correlation is statistically significant and unlikely to be due to random chance. Overall, these findings suggest a statistically significant, albeit moderate, correlation between snow PVI and displacement rate, underlining the importance of considering snow PVI in forecasting landslide movements.
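
Such an analysis can be reproduced with standard tools; a minimal sketch using SciPy, with placeholder data standing in for the aligned daily series of snow PVI and displacement rate:

```python
import numpy as np
from scipy.stats import spearmanr

# Placeholder series; in the study these are the aligned daily snow PVI
# and displacement-rate records from the photogrammetric monitoring.
snow_pvi  = np.array([0.0, 0.0, 1.2, 3.5, 4.1, 2.8, 0.9, 0.0])
disp_rate = np.array([0.2, 0.3, 0.4, 1.1, 1.6, 1.0, 0.5, 0.3])

rho, p_value = spearmanr(snow_pvi, disp_rate)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3g}")
# p < 0.05 would mark the correlation as statistically significant.
```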

Fig. 18 Spearman correlation analysis between snow PVI and landslide displacement rate (0.6 and 0.36 are the correlation coefficients; ** denotes a small p-value, i.e., a statistically significant correlation)

This study combines photogrammetric monitoring and LSTM methods to analyze and predict the displacement rate of landslides. Photogrammetry is used specifically to evaluate the presence of snow on the slope, so that the influence of snow accumulation and melting on the landslide displacement can also be considered in the LSTM analysis. In previous research, snowfall has seldom been considered as comprehensively as rainfall, and in time-series analyses of displacement, rainfall is generally treated as the primary factor (Brezzi et al. 2021b; Teza et al. 2022; Nava et al. 2023). However, in high-latitude and high-altitude regions, snow persists for a significant part of the winter and cannot be ignored in landslide studies. Additionally, in this study, meteorological factors such as temperature and humidity are considered to complement the analysis of snowfall and snowmelt, providing a valuable exploration of the conditions under which snow occurs. The paper reveals that snow variations can accelerate the displacement rate of landslides to some extent, and a moderate positive correlation between the amount of snow and the displacement rate is shown.

From the observations carried out in the first snowy winter, snow is an important factor affecting the landslide and modifying its stability. Unfortunately, the following winter provided just a few snowfalls, so the relationship between snow and movement did not appear as evident. Moreover, the kinematic mode of the Sant'Andrea landslide changed around July 2021, after the natural and man-induced collapses: the former mode shows large displacements sensitive to rainfall and snow, while the latter suggests gradual stabilization. According to this kinematic pattern, July 1, 2021, is treated as the point that divides the timeline, splitting the landslide evolution into two stages: the first from March 1, 2020, to June 30, 2021, and the second from July 1, 2021, to June 30, 2023. Both are used to explore the factors affecting the landslide and for forecasting.

In the first stage there were significant displacements, and the evaluation parameters for RD deviated notably: MAE, MSE, and RMSE reached 5.08, 132.52, and 11.51, respectively. The RSD combination performed better, with values of 4.67, 89.42, and 9.46. The predictions differed considerably from the actual values at the most intense deformation points, yet they demonstrated the indispensable role of snow in accelerating landslide displacement. The snowmelt process increases the availability of water, which infiltrates the soil and rock layers, thereby raising the pore water pressure within the landslide mass. This increase in pore water pressure reduces the effective stress, decreasing the shear strength of the slope materials and making them more prone to movement. The cyclic nature of freezing and thawing also contributes to soil and rock weakening over time, exacerbating deformation. It is therefore observed that, after the snow thaws, the meltwater gradually infiltrates downwards and influences the landslide deformation. Since the infiltration process has a certain lag, the maximum snowpack and the maximum landslide deformation are clearly not synchronized.

In the second stage, by contrast, displacement variation was minimal and stable: the predictive evaluation parameters MAE, MSE, and RMSE for both RD and RSD were all below 1, showing consistent predictive accuracy. In both stages, humidity and temperature decreased the quality of the forecast and do not seem to be significant parameters for this case. As for the correlation between snow PVI and displacement rate, the fact that snow PVI remains at zero outside winter tends to depress the correlation coefficient; the moderate values obtained therefore still support a significant association between the two variables. Overall, these processes combined make snowmelt a critical factor in the temporal and spatial dynamics of landslide activity, particularly in regions with significant seasonal snow cover.
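
The pore-pressure mechanism invoked above can be stated compactly with Terzaghi's effective-stress principle and the Mohr–Coulomb failure criterion; these are the standard relations, quoted here for clarity rather than taken from this paper:

```latex
\sigma' = \sigma - u \qquad \text{(effective stress)}
\tau_f = c' + \sigma' \tan\varphi' \qquad \text{(shear strength)}
```

Meltwater infiltration raises the pore water pressure u; for an unchanged total stress σ, the effective stress σ′ and hence the available shear strength τ_f drop, which is consistent with the observed acceleration after thaw, delayed by the infiltration time.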

The basic component of the method is photogrammetric monitoring, which is convenient, economical, and direct. With the newly described procedure, the presence of snow is reconstructed and quantified as a numerical parameter, which is fed into the LSTM network as data from the past 28 days to predict the value for the next day. The resulting LSTM forecast model is a relatively straightforward data-driven approach that enables effective landslide prediction with promising results. However, the suggested method may encounter some difficulties in practical application. For landslides located in remote mountains, problems could arise with monitoring, even when carried out with traditional tools. In our case, for instance, three cameras recording images of the slope were installed at three different positions; due to inadequate maintenance during the winter, in some periods we received monitoring images from only one camera, which reduced the accuracy of the monitoring. Another aspect that complicated the application of the LSTM models was the occurrence of the natural and man-induced collapses, which significantly altered the kinematic mechanism of the analyzed landslide. This required dividing the monitoring period into two or more stages for separate analysis: the overall dataset had to be subdivided into the periods before and after the collapses of June 2021, so the data series were shortened and may vary significantly. In the examined case, the rainfall-snow configuration yielded the most accurate results in the first period, while both the rainfall and rainfall-snow configurations performed well in the second period, when snowfall decreased in intensity and frequency. From an operational perspective, one possible drawback is that the pixel volume index can reflect the process of snowfall and melting, but when snowfall is very intense, even with images from all three cameras, the reference points used to quantify snow thickness are themselves covered by snow. This poses a significant challenge to the reconstruction of the real snow accumulation process.
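
The 28-day windowing mentioned above is a standard way of preparing LSTM inputs; a minimal sketch, assuming feature arrays (e.g., rainfall, snow PVI, and displacement rate) already aligned by day:

```python
import numpy as np

def make_windows(features, target, lookback=28):
    """Turn aligned daily series into (past 28 days -> next day) samples.

    features: array of shape (n_days, n_features), e.g. rainfall,
              snow PVI, and displacement rate as columns (illustrative).
    target:   array of shape (n_days,), the displacement rate to predict.
    """
    X, y = [], []
    for t in range(lookback, len(features)):
        X.append(features[t - lookback:t])  # window of the previous 28 days
        y.append(target[t])                 # next-day value to forecast
    return np.array(X), np.array(y)  # X: (n_samples, 28, n_features)
```

Each sample then feeds the network with a (28, n_features) sequence, matching the past-28-days-to-next-day scheme described in the text.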

For the reasons above, the proposed method is useful and convenient under specific circumstances. First, the snowfall amount should not be too low: since the melting and infiltration of snow take longer than rainfall of equivalent intensity, small amounts of snow are generally not a triggering factor for landslides; in winter 2022, for example, the warm weather rendered the effect of snowfall negligible. Second, the method is best applied to natural landslides that have not been disturbed by human activities, as human interventions may affect the accuracy of the prediction models. Finally, for the collection of image data, the landslide location should be relatively accessible, to facilitate maintenance. Apart from these limitations, the method offers a cost-effective and convenient approach for landslide monitoring and early warning.

Finally, it is important to emphasize that future research should incorporate groundwater table data into this prediction model, in order to investigate the thermo-hydraulic coupling processes of landslide deformation. Unfortunately, in this case no hydraulic data were available to delve into the physical mechanism of the snowmelt phenomenon.

The method proposed in this paper offers a new approach for addressing landslides in snow-covered regions. To summarize the role of snow in the Sant'Andrea landslide: in the first stage, both rainfall and snowfall impact the landslide displacement rate, albeit with a delay, as it takes time for rainwater and snowmelt to infiltrate into the soil. Rainfall consistently emerges as the most significant and frequent factor affecting the Sant'Andrea landslide year by year; during winter, however, snowmelt also plays an indispensable role in influencing deformation. Furthermore, a moderate positive correlation exists between the snow amount and the landslide displacement rate. This highlights the variable yet non-negligible significance of snowmelt in the Alpine region of northern Italy.

Moreover, the integration of the two techniques enables the incorporation of snow into the prediction model. Monitoring the Sant'Andrea landslide with photogrammetric techniques is advantageous because it is economical, convenient, and allows real-time data transmission. Additionally, in the absence of dedicated snow monitoring devices on the slope, the image-based snow quantification procedure can reconstruct the snowfall and melting processes, thereby providing a more comprehensive explanation of landslide deformation.

Nevertheless, three years is a relatively short observation period, insufficient for drawing definitive conclusions. Although the scarcity of snow limited its intensive use in the LSTM modeling, the preliminary results presented here are promising for the proposed method of incorporating the influence of snow. The method is innovative and straightforward, with potential applications in future winters at Perarolo di Cadore and at other suitable sites.

Data Availability

The meteorological data presented in this study can be freely downloaded from https://www.ambienteveneto.it . Other data will be made available on reasonable request.

Abdi H (2009) Centroids. Wiley Interdisciplinary Reviews: Computational Statistics 1(2):259–260. https://doi.org/10.1002/wics.31


Bajni G, Camera CAS, Apuani T (2021) Deciphering meteorological influencing factors for Alpine rockfalls: a case study in Aosta Valley. Landslides 18:3279–3298. https://doi.org/10.1007/s10346-021-01697-3

Berele A, Catoiu S (2018) Bisecting the perimeter of a triangle. Math Mag 91(2):121–133. https://doi.org/10.1080/0025570X.2017.1418589

Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press, New York


Brassington G (2017) Mean absolute error and root mean square error: which is the better metric for assessing model performance? In EGU General Assembly Conference Abstracts, p 3574

Brezzi L, Gabrieli F, Cola S, Lorenzetti G, Spiezia N, Bisson A, Allegrini M (2020) Digital terrestrial stereo-photogrammetry for monitoring landslide displacements: a case study in Recoaro Terme (VI). Geotechnical Research for Land Protection and Development. CNRIG 2019. Lecture Notes in Civil Engineering 40:155–163. https://doi.org/10.1007/978-3-030-21359-6_17

Brezzi L, Carraro E, Pasa D, Teza G, Cola S, Galgaro A (2021a) Post-collapse evolution of a rapid landslide from sequential analysis with FE and SPH-based models. Geosciences 11(9):364. https://doi.org/10.3390/geosciences11090364

Brezzi L, Vallisari D, Carraro E, Teza G, Pol A, Liang Z, Gabrieli F, Cola S, Galgaro A (2021b) Digital terrestrial photogrammetry for a dense monitoring of the surficial displacements of a landslide. Eurock 2021: Mechanics and Rock Engineering, from Theory to Practice, Turin, Italy. IOP Conference Series: Earth and Environmental Science 833:012145. https://doi.org/10.1088/1755-1315/833/1/012145

Burton-Johnson A, Wyniawskyj NS (2020) Rock and snow differentiation from colour (RGB) images. The Cryosphere Discuss [preprint].  https://doi.org/10.5194/tc-2020-115

Chiarelli DD, Galizzi M, Bocchiola D, Rosso R, Rulli MC (2023) Modeling snowmelt influence on shallow landslides in Tartano valley, Italian Alps. Sci Total Environ 856:158772. https://doi.org/10.1016/j.scitotenv.2022.158772


Durand Y, Laternser M, Giraud G, Etchevers P, Lesaffre B, Mérindol L (2009) Reanalysis of climate in the French Alps (1958–2002). J Appl Meteorol Clim 48:429–449. https://doi.org/10.1175/2008JAMC1808.1

Fan D, Sun H, Yao J, Zhang K, Yan X, Sun Z (2021) Well production forecasting based on ARIMA-LSTM model considering manual operations. Energy 220:119708. https://doi.org/10.1016/j.energy.2020.119708

Feizizadeh B, Garajeh MK, Lakes T, Blaschke T (2021) A deep learning convolutional neural network algorithm for detecting saline flow sources and mapping the environmental impacts of the Urmia Lake drought in Iran. CATENA 207:105585. https://doi.org/10.1016/j.catena.2021.105585

Gabrieli F, Corain L, Vettore L (2016) A low-cost landslide displacement activity assessment from time-lapse photogrammetry and rainfall data: application to the Tessina landslide site. Geomorphology 269:56–74. https://doi.org/10.1016/j.geomorph.2016.06.030

Graves A (2012) Long short-term memory. In: Supervised sequence labelling with recurrent neural networks. Studies in Computational Intelligence, vol 385. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24797-2_4

Guzzetti F (2000) Landslide fatalities and the evaluation of landslide risk in Italy. Eng Geol 58:89–107. https://doi.org/10.1016/S0013-7952(00)00047-8

Harris C, Arenson LU, Christiansen HH, Etzelmuller B, Frauenfelder R, Gruber S, Haeberli W, Vonder Muhll D (2009) Permafrost and climate in Europe: monitoring and modelling thermal, geomorphological and geotechnical responses. Earth Sci Rev 92(3–4):117–171. https://doi.org/10.1016/j.earscirev.2008.12.002

Has B, Noro T, Maruyama K, Nakamura A, Ogawa K, Onoda S (2012) Characteristics of earthquake-induced landslides in a heavy snowfall region—landslides triggered by the northern Nagano prefecture earthquake, March 12, 2011, Japan. Landslides 9:539–546. https://doi.org/10.1007/s10346-012-0344-6

Hinds ES, Lu N, Mirus BB, Godt JW, Wayllace A (2021) Evaluation of techniques for mitigating snowmelt infiltration-induced landsliding in a highway embankment. Eng Geol 291:106240. https://doi.org/10.1016/j.enggeo.2021.106240

Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735

Jakob M, Holm K, Lange O, Schwab JW (2006) Hydrometeorological thresholds for landslide initiation and forest operation shutdowns on the north coast of British Columbia. Landslides 3:228–238. https://doi.org/10.1007/s10346-006-0044-1

Karunasingha DSK (2022) Root mean square error or mean absolute error? Use their ratio as well. Inf Sci 585:609–629. https://doi.org/10.1016/j.ins.2021.11.036

Kirschbaum DB, Adler R, Hong Y, Hill S, Lerner-Lam A (2009) A global landslide catalog for hazard applications: method, results, and limitations. Nat Hazards 52:561–575. https://doi.org/10.1007/s11069-009-9401-4

Laribi A, Walstra J, Ougrine M, Seridi A, Dechemi N (2015) Use of digital photogrammetry for the study of unstable slopes in urban areas: case study of the El Biar landslide, Algiers. Eng Geol 187:73–83. https://doi.org/10.1016/j.enggeo.2014.12.018

LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444. https://doi.org/10.1038/nature14539

Likas A, Vlassis N, Verbeek J (2003) The global k-means clustering algorithm. Pattern Recogn 36:451–461. https://doi.org/10.1016/S0031-3203(02)00060-2

Liu L, Zhang Q, Wei X (2012) A RGB image encryption algorithm based on DNA encoding and chaos map. Comput Electr Eng 38:1240–1248. https://doi.org/10.1016/j.compeleceng.2012.02.007

Liu YT, Teza G, Nava L, Chang ZL, Shang M, Xiong DB, Cola S (2024) Deformation evaluation and displacement forecasting of baishuihe landslide after stabilization based on continuous wavelet transform and deep learning. Nat Hazards. https://doi.org/10.1007/s11069-024-06580-7

Martelloni G, Segoni S, Lagomarsino D, Fanti R, Catani F (2012) Snow accumulation-melting model (SAMM) for integrated use in regional scale landslide early warning systems. Hydrol Earth Syst Sci Discuss 9:9391–9423. https://doi.org/10.5194/hess-17-1229-2013

Matsuura S, Asano S, Okamoto T, Takeuchi Y (2003) Characteristics of the displacement of a landslide with shallow sliding surface in a heavy snow district of Japan. Eng Geol 69(1–2):15–35. https://doi.org/10.1016/S0013-7952(02)00245-4

Medsker L, Jain LC (1999) Recurrent neural networks: design and applications. CRC Press


Mondini AC, Guzzetti F, Melillo M (2023) Deep learning forecast of rainfall-induced shallow landslides. Nat Commun 14:2466. https://doi.org/10.1038/s41467-023-38135-y

Ngo PTT, Panahi M, Khosravi K, Ghorbanzadeh O, Kariminejad N, Cerda A, Lee S (2021) Evaluation of deep learning algorithms for national scale landslide susceptibility mapping of Iran. Geosci Front 12(2):505–519. https://doi.org/10.1016/j.gsf.2020.06.013

Nava L, Carraro E, Reyes-Carmona C, Puliero S, Bhuyan K, Rosi A, Monserrat O, Floris M, Meena SR, Galve JP, Catani F (2023) Landslide displacement forecasting using deep learning and monitoring data across selected sites. Landslides. https://doi.org/10.1007/s10346-023-02104-9

Okamoto T, Matsuura S, Larsen JO, Asano S, Abe K (2018) The response of pore water pressure to snow accumulation on a low-permeability clay landslide. Eng Geol 242:130–141. https://doi.org/10.1016/j.enggeo.2018.06.002

Osawa H, Matsuura S, Matsushi Y, Okamoto T (2017) Seasonal change in permeability of surface soils on a slow-moving landslide in a heavy snow region. Eng Geol 221:1–9. https://doi.org/10.1016/j.enggeo.2017.02.019

Osawa H, Matsushi Y, Matsuura S, Okamoto T (2024) Semiempirical modeling of the transient response of pore pressure to rainfall and snowmelt in a dormant landslide. Landslides 21:245–256. https://doi.org/10.1007/s10346-023-02158-9

Pan B (2018) Digital image correlation for surface deformation measurement: historical developments, recent advances and future goals. Meas Sci Technol 29:082001. https://doi.org/10.1088/1361-6501/aac55b

Panzeri L, Mondani M, Taddia G, Papini M, Longoni L (2022) Analysis of snowmelt as a triggering factor for shallow landslide. International Multidisciplinary Scientific GeoConference: SGEM 22(1.1):77–83. https://doi.org/10.5593/sgem2022/1.1/s02.009

Prakash N, Manconi A, Loew S (2020) Mapping landslides on EO data: performance of deep learning models vs. traditional machine learning models. Remote Sens 12(3):346. https://doi.org/10.3390/rs12030346

Riehle D, Reiser D, Griepentrog HW (2020) Robust index-based semantic plant/background segmentation for RGB-images. Comput Electron Agric 169:105201. https://doi.org/10.1016/j.compag.2019.105201

Saez JL, Corona C, Stoffel M, Berger F (2013) Climate change increases frequency of shallow spring landslides in the French Alps. Geology 41(5):619–622. https://doi.org/10.1130/G34098.1

Sameen MI, Pradhan B, Lee S (2020) Application of convolutional neural networks featuring Bayesian optimization for landslide susceptibility assessment. CATENA 186:104249. https://doi.org/10.1016/j.catena.2019.104249

Sassa K, Fukuoka H, Wang FW, Wang GH (2005) Dynamic properties of earthquake-induced large-scale rapid landslides within past landslide masses. Landslides 2:125–134. https://doi.org/10.1007/s10346-005-0055-3

Son H, Lee J, Lee J, Cho S, Lee S (2021) Recurrent video deblurring with blur-invariant motion estimation and pixel volumes. ACM Transactions on Graphics (TOG) 40(5):1–18. https://doi.org/10.1145/3453720

Stumpf A, Malet JP, Allemand P, Pierrot-Deseilligny M, Skupinski G (2015) Ground-based multi-view photogrammetry for the monitoring of landslide deformation and erosion. Geomorphology 231:130–145. https://doi.org/10.1016/j.geomorph.2014.10.039

Subramanian SS, Fan X, Yunus AP, Van Asch T, Scaringi G, Xu Q, Dai L, Ishikawa T, Huang R (2020) A sequentially coupled catchment-scale numerical model for snowmelt-induced soil slope instabilities. J Geophys Res Earth Surf 125(5):e2019JF005468. https://doi.org/10.1029/2019JF005468

Subramanian SS, Ishikawa T, Tokoro T (2017) Stability assessment approach for soil slopes in seasonal cold regions. Eng Geol 221:154–169. https://doi.org/10.1016/j.enggeo.2017.03.008

Teza G, Cola S, Brezzi L, Galgaro A (2022) Wadenow: a Matlab toolbox for early forecasting of the velocity trend of a rainfall-triggered landslide by means of continuous wavelet transform and deep learning. Geosciences 12(5):205. https://doi.org/10.3390/geosciences12050205

Van Houdt G, Mosquera C, Nápoles G (2020) A review on the long short-term memory model. Artif Intell Rev 53:5929–5955. https://doi.org/10.1007/s10462-020-09838-1

Xian Y, Wei XL, Zhou HB, Chen N, Liu Y, Liu F, Sun H (2022) Snowmelt-triggered reactivation of a loess landslide in Yili, Xinjiang, China: mode and mechanism. Landslides 19(8):1843–1860. https://doi.org/10.1007/s10346-022-01879-7

Xu S, Niu R (2018) Displacement prediction of Baijiabao landslide based on empirical mode decomposition and long short-term memory neural network in Three Gorges area, China. Comput Geosci 111:87–96. https://doi.org/10.1016/j.cageo.2017.10.013

Yang B, Yin K, Lacasse S, Liu Z (2019) Time series analysis and long short-term memory neural network to predict landslide displacement. Landslides 16:677–694. https://doi.org/10.1007/s10346-018-01127-x

Ye X, Zhu HH, Chang FN, Xie TC, Tian F, Zhang W, Catani F (2024a) Revisiting spatiotemporal evolution process and mechanism of a giant reservoir landslide during weather extremes. Eng Geol 332:107480. https://doi.org/10.1016/j.enggeo.2024.107480

Ye X, Zhu HH, Wang J, Zheng WJ, Zhang W, Schenato L, Pasuto A, Catani F (2024b) Towards hydrometeorological thresholds of reservoir-induced landslide from subsurface strain observations. Sci China Technol Sci. https://doi.org/10.1007/s11431-023-2657-3

Yin YP, Wang FW, Sun P (2009) Landslide hazards triggered by the 2008 Wenchuan earthquake, Sichuan, China. Landslides 6:139–152. https://doi.org/10.1007/s10346-009-0148-5

Zou ZX, Luo T, Zhang S, Duan HJ, Li SW, Deng YD, Wang J (2023) A novel method to evaluate the time-dependent stability of reservoir landslides: exemplified by Outang landslide in the Three Gorges Reservoir. Landslides 20:1731–1746. https://doi.org/10.1007/s10346-023-02056-0

Funding

Open access funding provided by Università degli Studi di Padova within the CRUI-CARE Agreement. The research was financially supported by the Fondazione Cariverona through the research grant titled “Monitoring of Natural Hazards and Protective Structures Using Computer Vision Techniques for Environmental Safety and Preservation” and by the Veneto Region through the grant “Scientific Support for the Characterization of Hydrogeological Risk and the Evaluation of the Effectiveness of Interventions Related to the Landslide Phenomenon of Busa del Cristo in Perarolo di Cadore (BL) through the Development of Predictive Geo-Hydrological Models”.

Author information

Authors and Affiliations

Department of Civil, Environmental and Architectural Engineering, University of Padua, Via Ognissanti, 39, 35131, Padua, Italy

Yuting Liu, Lorenzo Brezzi, Fabio Gabrieli & Simonetta Cola

School of Resources and Safety Engineering, Central South University, Changsha, 410083, China

Zhipeng Liang

Department of Civil Engineering, Tsinghua University, Beijing, 100084, China


Corresponding author

Correspondence to Lorenzo Brezzi.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Liu, Y., Brezzi, L., Liang, Z. et al. Image analysis and LSTM methods for forecasting surficial displacements of a landslide triggered by snowfall and rainfall. Landslides (2024). https://doi.org/10.1007/s10346-024-02328-3


Received: 05 February 2024

Accepted: 18 July 2024

Published: 16 August 2024

DOI: https://doi.org/10.1007/s10346-024-02328-3


Keywords

  • Displacement forecasting
  • Landslide triggering
  • Image recognition
  • Smart monitoring



Title: DFT-Based Adversarial Attack Detection in MRI Brain Imaging: Enhancing Diagnostic Accuracy in Alzheimer's Case Studies

Abstract: Recent advancements in deep learning, particularly in medical imaging, have significantly propelled the progress of healthcare systems. However, examining the robustness of medical images against adversarial attacks is crucial due to their real-world applications and profound impact on individuals' health. These attacks can result in misclassifications in disease diagnosis, potentially leading to severe consequences. Numerous studies have explored both the implementation of adversarial attacks on medical images and the development of defense mechanisms against these threats, highlighting the vulnerabilities of deep neural networks to such adversarial activities. In this study, we investigate adversarial attacks on images associated with Alzheimer's disease and propose a defensive method to counteract these attacks. Specifically, we examine adversarial attacks that employ frequency domain transformations on Alzheimer's disease images, along with other well-known adversarial attacks. Our approach utilizes a convolutional neural network (CNN)-based autoencoder architecture in conjunction with the two-dimensional Fourier transform of images for detection purposes. The simulation results demonstrate that our detection and defense mechanism effectively mitigates several adversarial attacks, thereby enhancing the robustness of deep neural networks against such vulnerabilities.
Comments: 10 pages, 4 figures, conference
Subjects: Image and Video Processing (eess.IV); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2408.08489 [eess.IV]
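
As a rough illustration of the detection idea sketched in this abstract (a CNN-based autoencoder operating on the two-dimensional Fourier transform of images), here is a hypothetical PyTorch outline; the architecture, feature choice, and threshold are assumptions made for illustration, not the authors' published model:

```python
import numpy as np
import torch
import torch.nn as nn

def fft_features(img: np.ndarray) -> torch.Tensor:
    """Log-magnitude 2D DFT of a grayscale image, shaped (1, H, W)."""
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    mag = np.log1p(np.abs(spectrum)).astype(np.float32)
    return torch.from_numpy(mag).unsqueeze(0)

class ConvAutoencoder(nn.Module):
    """Small illustrative autoencoder trained to reconstruct clean spectra."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )
    def forward(self, x):
        return self.decoder(self.encoder(x))

def is_adversarial(model, img, threshold=0.05):
    """Flag an image whose spectrum the autoencoder reconstructs poorly."""
    x = fft_features(img).unsqueeze(0)  # (1, 1, H, W)
    with torch.no_grad():
        err = torch.mean((model(x) - x) ** 2).item()
    return err > threshold  # threshold is an illustrative assumption
```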


