

Image processing articles within Scientific Reports

Article 10 June 2024 | Open Access

Fast and robust feature-based stitching algorithm for microscopic images

  • Fatemeh Sadat Mohammadi, Hasti Shabani & Mojtaba Zarei

Article 09 June 2024 | Open Access

A deep image classification model based on prior feature knowledge embedding and application in medical diagnosis

  • Jiangxing Wu & Yihua Cheng

Article 07 June 2024 | Open Access

Estimation of the amount of pear pollen based on flowering stage detection using deep learning

  • Takefumi Hiraguri & Yoshihiro Takemura

Article 29 May 2024 | Open Access

Remote sensing image dehazing using generative adversarial network with texture and color space enhancement

  • Tie Zhong & Chunming Wu

A novel approach to craniofacial analysis using automated 3D landmarking of the skull

  • Franziska Wilke, Harold Matthews & Susan Walsh

Article 28 May 2024 | Open Access

Impact of functional electrical stimulation on nerve-damaged muscles by quantifying fat infiltration using deep learning

  • Kassandra Walluks, Jan-Philipp Praetorius & Marc Thilo Figge

Article 25 May 2024 | Open Access

An effective ensemble learning approach for classification of glioma grades based on novel MRI features

  • Mohammed Falih Hassan, Ahmed Naser Al-Zurfi & Khandakar Ahmed

Deep-learning segmentation to select liver parenchyma for categorizing hepatic steatosis on multinational chest CT

  • Zhongyi Zhang, Guixia Li & Xiangchun Liu

Article 24 May 2024 | Open Access

Automated thorax disease diagnosis using multi-branch residual attention network

  • Dongfang Li & Shuya Chen

Article 23 May 2024 | Open Access

Automatic segmentation and classification of frontal sinuses for sex determination from CBCT scans using a two-stage anatomy-guided attention network

  • Renan Lucio Berbel da Silva & Won-Jin Yi

Classification and identification of tea diseases based on improved YOLOv7 model of MobileNeXt

  • Wenxia Yuan & Baijuan Wang

Article 22 May 2024 | Open Access

Polyp segmentation based on implicit edge-guided cross-layer fusion networks

  • Junqing Liu, Weiwei Zhang & Qinghe Zhang

A bi-directional segmentation method for prostate ultrasound images under semantic constraints

  • Chao Gao

A novel plant type, leaf disease and severity identification framework using CNN and transformer with multi-label method

  • Mingwei Li & Jianwu Wang

Article 21 May 2024 | Open Access

Bayesian decision based fusion algorithm for remote sensing images

  • Xunyan Jiang & Kai Liu

Applying oversampling before cross-validation will lead to high bias in radiomics

  • Aydin Demircioğlu

Svetlana: a supervised segmentation classifier for Napari

  • Clément Cazorla, Renaud Morin & Pierre Weiss

Article 11 May 2024 | Open Access

Deep learning segmentation of non-perfusion area from color fundus images and AI-generated fluorescein angiography

  • Kanato Masayoshi, Yusaku Katada & Toshihide Kurihara

Shifting to machine supervision: annotation-efficient semi and self-supervised learning for automatic medical image segmentation and classification

  • Pranav Singh, Raviteja Chukkapalli & Jacopo Cirrone

Article 09 May 2024 | Open Access

A dual-branch selective attention capsule network for classifying kiwifruit soft rot with hyperspectral images

  • Zhiqiang Guo, Yingfang Ni & Yunliu Zeng

Article 07 May 2024 | Open Access

Long-term stimulation by implanted pacemaker enables non-atrophic treatment of bilateral vocal fold paresis in a human-like animal model

  • Bianca Hoffmann & Dirk Arnold

Article 30 April 2024 | Open Access

Segmentation of liver CT images based on weighted medical transformer model

  • Hai Zhang & Rui Wang

Article 24 April 2024 | Open Access

MRI radiomics in head and neck cancer from reproducibility to combined approaches

  • Stefano Cavalieri & Luca Mainardi

Article 22 April 2024 | Open Access

A comparative analysis of pairwise image stitching techniques for microscopy images

  • Seyyed Erfan Mohammadi & Hasti Shabani

Article 19 April 2024 | Open Access

Identification of CT radiomic features robust to acquisition and segmentation variations for improved prediction of radiotherapy-treated lung cancer patient recurrence

  • Thomas Louis, François Lucia & Roland Hustinx

Article 18 April 2024 | Open Access

Joint transformer architecture in brain 3D MRI classification: its application in Alzheimer’s disease classification

  • Taymaz Akan & Mohammad A. N. Bhuiyan

Article 16 April 2024 | Open Access

Automated quantification of avian influenza virus antigen in different organs

  • Maria Landmann, David Scheibner & Reiner Ulrich

Article 08 April 2024 | Open Access

A novel vector field analysis for quantitative structure changes after macular epiretinal membrane surgery

  • Seok Hyun Bae, Sojung Go & Sang Jun Park

Article 05 April 2024 | Open Access

Advanced disk herniation computer aided diagnosis system

  • Maad Ebrahim, Mohammad Alsmirat & Mahmoud Al-Ayyoub

Article 28 March 2024 | Open Access

Brain temperature and free water increases after mild COVID-19 infection

  • Ayushe A. Sharma, Rodolphe Nenert & Jerzy P. Szaflarski

Article 26 March 2024 | Open Access

High-capacity data hiding for medical images based on the mask-RCNN model

  • Hadjer Saidi, Okba Tibermacine & Ahmed Elhadad

Article 25 March 2024 | Open Access

Integrated image and location analysis for wound classification: a deep learning approach

  • Tirth Shah & Zeyun Yu

Article 21 March 2024 | Open Access

A number sense as an emergent property of the manipulating brain

  • Neehar Kondapaneni & Pietro Perona

Article 16 March 2024 | Open Access

Lesion-conditioning of synthetic MRI-derived subtraction-MIPs of the breast using a latent diffusion model

  • Lorenz A. Kapsner, Lukas Folle & Sebastian Bickelhaupt

Article 14 March 2024 | Open Access

Dual ensemble system for polyp segmentation with submodels adaptive selection ensemble

  • Kefeng Fan & Kaijie Jiao

Article 11 March 2024 | Open Access

Generalizable disease detection using model ensemble on chest X-ray images

  • Maider Abad, Jordi Casas-Roma & Ferran Prados

Article 08 March 2024 | Open Access

Segmentation-based cardiomegaly detection based on semi-supervised estimation of cardiothoracic ratio

  • Patrick Thiam, Christopher Kloth & Hans A. Kestler

Article 05 March 2024 | Open Access

Brain volume measured by synthetic magnetic resonance imaging in adult moyamoya disease correlates with cerebral blood flow and brain function

  • Kazufumi Kikuchi, Osamu Togao & Kousei Ishigami

Article 04 March 2024 | Open Access

Critical evaluation of artificial intelligence as a digital twin of pathologists for prostate cancer pathology

  • Okyaz Eminaga, Mahmoud Abbas & Olaf Bettendorf

Computational pathology model to assess acute and chronic transformations of the tubulointerstitial compartment in renal allograft biopsies

  • Renaldas Augulis, Allan Rasmusson & Arvydas Laurinavicius

Opportunistic screening with multiphase contrast-enhanced dual-layer spectral CT for osteoblastic lesions in prostate cancer compared with bone scintigraphy

  • Ming-Cheng Liu, Chi-Chang Ho & Yi-Jui Liu

Article 02 March 2024 | Open Access

Reduction of NIFTI files storage and compression to facilitate telemedicine services based on quantization hiding of downsampling approach

  • Ahmed Elhadad, Mona Jamjoom & Hussein Abulkasim

Article 29 February 2024 | Open Access

Attention-guided jaw bone lesion diagnosis in panoramic radiography using minimal labeling effort

  • Minseon Gwak, Jong Pil Yun & Chena Lee

End-to-end multimodal 3D imaging and machine learning workflow for non-destructive phenotyping of grapevine trunk internal structure

  • Romain Fernandez, Loïc Le Cunff & Cédric Moisy

Article 27 February 2024 | Open Access

An improved V-Net lung nodule segmentation model based on pixel threshold separation and attention mechanism

  • Handing Song & Zhan Wang

Article 26 February 2024 | Open Access

Quantifying mangrove carbon assimilation rates using UAV imagery

  • Javier Blanco-Sacristán, Kasper Johansen & Matthew F. McCabe

Article 24 February 2024 | Open Access

Iterative pseudo balancing for stem cell microscopy image classification

  • Adam Witmer & Bir Bhanu

Article 22 February 2024 | Open Access

Deep learning-based, fully automated, pediatric brain segmentation

  • Min-Jee Kim, EunPyeong Hong & Tae-Sung Ko

Article 21 February 2024 | Open Access

Correction of high-rate motion for photoacoustic microscopy by orthogonal cross-correlation

  • Qiuqin Mao & Xiaojun Liu

Article 20 February 2024 | Open Access

ERCP-Net: a channel extension residual structure and adaptive channel attention mechanism for plant leaf disease classification network

  • Yannan Xu




Topic Information

Recent Trends in Image Processing and Pattern Recognition

Dear Colleagues,

The 5th International Conference on Recent Trends in Image Processing and Pattern Recognition (RTIP2R) aims to attract current and advanced research on image processing, pattern recognition, computer vision, and machine learning. RTIP2R will take place at Texas A&M University—Kingsville, Texas (USA), on November 22–23, 2022, in collaboration with the 2AI Research Lab—Computer Science, University of South Dakota (USA).

Authors of selected papers from the conference will be invited to submit extended versions of their original papers and contributions under the conference topics (new papers that are closely related to the conference themes are also welcome).

Submissions, however, are not limited to RTIP2R 2022 participants; contributions from the wider community are also welcome.

Topics of interest include, but are not limited to, the following:

  • Signal and image processing.
  • Computer vision and pattern recognition: object detection and/or recognition (shape, color, and texture analysis) as well as pattern recognition (statistical, structural, and syntactic methods).
  • Machine learning: algorithms, clustering and classification, model selection, feature engineering, and deep learning.
  • Data analytics: data mining tools and high-performance computing in big data.
  • Federated learning: applications and challenges.
  • Pattern recognition and machine learning for the Internet of Things (IoT).
  • Information retrieval: content-based image retrieval and indexing, as well as text analytics.
  • Document image analysis and understanding.
  • Biometrics: face matching, iris recognition/verification, footprint verification, and audio/speech analysis and understanding.
  • Healthcare informatics and (bio)medical imaging and engineering.
  • Big data (from document understanding and healthcare to risk management).
  • Cryptanalysis (cryptology and cryptography).

Topic Editors: Prof. Dr. KC Santosh, Dr. Ayush Goyal, Dr. Djamila Aouada, Dr. Aaisha Makkar, Dr. Yao-Yi Chiang, Dr. Satish Kumar Singh, and Prof. Dr. Alejandro Rodríguez-González

Participating Journals:

Journal      Launched  First Decision (median)  APC
entropy      1999      20.8 days                CHF 2600
applsci      2011      16.9 days                CHF 2400
healthcare   2013      19.5 days                CHF 2700
jimaging     2015      21.7 days                CHF 1800
computers    2012      17.7 days                CHF 1800
BDCC         2017      18.2 days                CHF 1800
ai           2020      20.8 days                CHF 1600



Editorial: Current Trends in Image Processing and Pattern Recognition

  • PAMI Research Lab, Computer Science, University of South Dakota, Vermillion, SD, United States

Editorial on the Research Topic Current Trends in Image Processing and Pattern Recognition

Technological advancements in computing have created opportunities in a wide variety of fields, ranging from document analysis (Santosh, 2018), biomedical and healthcare informatics (Santosh et al., 2019; Santosh et al., 2021; Santosh and Gaur, 2021; Santosh and Joshi, 2021), and biometrics to intelligent language processing. These applications primarily leverage AI tools and/or techniques, drawing on topics such as image processing, signal and pattern recognition, machine learning, and computer vision.

With this theme, we opened a call for papers on Current Trends in Image Processing & Pattern Recognition, immediately following the third International Conference on Recent Trends in Image Processing & Pattern Recognition (RTIP2R), 2020 (URL: http://rtip2r-conference.org ). Our call was not limited to RTIP2R 2020; it was open to all. Altogether, 12 papers were submitted, and seven of them were accepted for publication.

In Deshpande et al., the authors addressed the use of global fingerprint features (e.g., ridge flow, frequency, and other interest/key points) for matching. With a convolutional neural network (CNN)-based matching model, which they called "Combination of Nearest-Neighbor Arrangement Indexing (CNNAI)," they achieved a highest rank-1 identification rate of 84.5% on the FVC2004 and NIST SD27 datasets. The authors claimed that their results are comparable with state-of-the-art algorithms and that their approach is robust to rotation and scale. Similarly, in Deshpande et al., using the same datasets, the same authors addressed the importance of minutiae extraction and matching, taking low-quality latent fingerprint images into account. Their minutiae extraction technique showed remarkable improvement in their results which, as claimed by the authors, were comparable to state-of-the-art systems.

In Gornale et al., the authors used Hu's invariant moments to extract distinguishing features that are robust to geometric distortion or transformation. With this, they focused on the early detection and grading of knee osteoarthritis, and they claimed that their results were validated by orthopedic surgeons and rheumatologists.

In Tamilmathi and Chithra, the authors introduced a new deep-learned, quantization-based coding scheme for 3D airborne LiDAR point-cloud images. In their experimental results, the authors showed that their model compressed an image into a constant 16 bits of data and decompressed it with a PSNR of approximately 160 dB, a 174.46 s execution time, and a 0.6 s execution speed per instruction. The authors claimed that their method compares favorably with previous algorithms/techniques in terms of space and time.

In Tamilmathi and Chithra, the authors carefully inspected possible signs of plant leaf diseases. They employed feature learning and observed the correlations and/or similarities between disease-related symptoms, making disease identification possible.

In Das Chagas Silva Araujo et al., the authors proposed a benchmark environment for comparing multiple algorithms for depth reconstruction from two event-based sensors. In their evaluation, a stereo matching algorithm was implemented, and multiple experiments were run with multiple camera settings and parameters. The authors claimed that this work can serve as a benchmark for the robust evaluation of the multitude of new techniques within the scope of event-based stereo vision.

In Steffen et al. and Gornale et al., the authors employed handwritten signatures to better understand behavioral biometric traits for the authentication/verification of documents such as letters, contracts, and wills. They used handcrafted features such as LBP and HOG to extract features from 4,790 signatures so that shallow learning could be applied efficiently. Using k-NN, decision tree, and support vector machine classifiers, they reported promising performance.

Author Contributions

The author confirms being the sole contributor of this work and has approved it for publication.

Conflict of Interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Santosh, KC, Antani, S., Guru, D. S., and Dey, N. (2019). Medical Imaging: Artificial Intelligence, Image Recognition, and Machine Learning Techniques. United States: CRC Press. ISBN: 9780429029417. doi:10.1201/9780429029417


Santosh, KC, Das, N., and Ghosh, S. (2021). Deep Learning Models for Medical Imaging, Primers in Biomedical Imaging Devices and Systems. United States: Elsevier. eBook ISBN: 9780128236505.

Santosh, KC (2018). Document Image Analysis - Current Trends and Challenges in Graphics Recognition. United States: Springer. ISBN: 978-981-13-2338-6. doi:10.1007/978-981-13-2339-3

Santosh, KC, and Gaur, L. (2021). Artificial Intelligence and Machine Learning in Public Healthcare: Opportunities and Societal Impact. SpringerBriefs in Computational Intelligence Series. ISBN: 978-981-16-6768-8. doi:10.1007/978-981-16-6768-8

Santosh, KC, and Joshi, A. (2021). COVID-19: Prediction, Decision-Making, and its Impacts, Book Series in Lecture Notes on Data Engineering and Communications Technologies. United States: Springer Nature. ISBN: 978-981-15-9682-7. doi:10.1007/978-981-15-9682-7

Keywords: artificial intelligence, computer vision, machine learning, image processing, signal processing, pattern recognition

Citation: Santosh KC (2021) Editorial: Current Trends in Image Processing and Pattern Recognition. Front. Robot. AI 8:785075. doi: 10.3389/frobt.2021.785075

Received: 28 September 2021; Accepted: 06 October 2021; Published: 09 December 2021.


Copyright © 2021 Santosh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: KC Santosh, [email protected]

This article is part of the Research Topic

Current Trends in Image Processing and Pattern Recognition


The Constantly Evolving Role of Medical Image Processing in Oncology: From Traditional Medical Image Processing to Imaging Biomarkers and Radiomics

Kostas Marias

1 Department of Electrical and Computer Engineering, Hellenic Mediterranean University, 71410 Heraklion, Greece; kmarias@hmu.gr

2 Computational Biomedicine Laboratory (CBML), Foundation for Research and Technology—Hellas (FORTH), 70013 Heraklion, Greece

The role of medical image computing in oncology is growing stronger, not least due to the unprecedented advancement of computational AI techniques, providing a technological bridge between radiology and oncology which could significantly accelerate the advancement of precision medicine throughout the cancer care continuum. Medical image processing has been an active field of research for more than three decades, focusing initially on traditional image analysis tasks such as registration, segmentation, fusion, and contrast optimization. However, with the advancement of model-based medical image processing, the field of imaging biomarker discovery has focused on transforming functional imaging data into meaningful biomarkers that are able to provide insight into a tumor's pathophysiology. More recently, the advancement of high-performance computing, in conjunction with the availability of large medical imaging datasets, has enabled the deployment of sophisticated machine learning techniques in the context of radiomics and deep learning modeling. This paper reviews and discusses the evolving role of image analysis and processing through the lens of the abovementioned developments, which hold promise for accelerating precision oncology, in the sense of improved diagnosis, prognosis, and treatment planning of cancer.

1. Introduction

To better understand the evolution of medical image processing in oncology, it is necessary to explain the importance of measuring tumor appearance from medical images. Medical images contain useful diagnostic and prognostic information that processing approaches can extract to add precision to cancer care. In addition, because biology is a system of systems, it is reasonable to assume that image-based information may convey multi-level pathophysiology information. This has led to the establishment of many sophisticated predictive and diagnostic image-based biomarker extraction approaches in cancer. In more detail, medical image processing efforts are focused on extracting imaging biomarkers able to decipher the variation within individuals in terms of imaging phenotype, enabling the identification of patient subgroups for precision medicine strategies [ 1 ]. From the very beginning, the main prerequisite for clinical use was that quantitative biomarkers must be precise and reproducible. If these conditions are met, imaging biomarkers have the potential to aid clinicians in assessing the pathophysiologic changes in patients and better planning personalized therapy. This is important because, in clinical practice, subjective characterizations may be used (e.g., average heterogeneity, spiculated mass, necrotic core), which can decrease the precision of diagnostic processes.

Based on the above considerations, the extraction of quantitative parameters characterizing size, shape, texture, and activity can enhance the role of medical imaging in assisting diagnosis or therapy response assessment. However, in clinical practice, only simpler image metrics (e.g., linear) are often used in oncology, especially in the evaluation of solid tumor response to therapy (e.g., the longest lesion diameter in RECIST). Both the RECIST and WHO evaluation criteria rely on anatomical image measurements, mainly in CT or MRI data, and were originally developed mainly for cytotoxic therapies. Such linear measures suffer from high intra-/inter-observer variability, which in some cases can compromise the accurate assessment of tumor response, since some studies report inter-observer RECIST variability of up to 30% [ 2 ]. Several studies have shown that 3D quantitative response assessments are better correlated with disease progression than those based on 1D linear measurements [ 3 ]. Nevertheless, traditional tumor quantification approaches based on linear or 3D tumor measures have experienced substantial difficulties in assessing response to newer oncology therapies, such as targeted, anti-angiogenic treatments and immunotherapies [ 2 ]. Size-based tumor assessments do not always represent tumor response to therapy since, for example, tumors may display internal necrosis formation with or without a reduction in lesion size (as in traditional cytotoxic treatments). Even if the RECIST criteria are constantly updated to address these issues, as in the case of Immune RECIST [ 4 ], such approaches still do not take a tumor's image structure and texture over time into consideration. In addition, the size and location of metastases have been reported to play a significant role in assessing early tumor shrinkage and depth of response [ 5 ]. To address these limitations, medical image processing has provided, over the last few decades, the means to extract tumor texture and size descriptors for obtaining more detailed (e.g., pixel-based) descriptors of tissue structure and for discovering feature patterns connected to disease or response. In this paper, it is argued that the evolution of medical image processing has been a gradual process, and the diverse factors that contributed to unprecedented progress in the field with the use of AI are explained. Initially, simplistic approaches to classify benign and malignant masses, e.g., in mammograms, were based on traditional feature extraction and pattern recognition methods. Functional tomographic imaging such as PET gave rise to more sophisticated, model-based approaches from which quantitative markers of tissue properties could be extracted in an effort to optimize diagnosis, stratify treatment, and personalize response criteria. Lastly, the advancement of artificial intelligence enabled a more exhaustive search of imaging phenotype descriptors and led to the increased performance of modern diagnostic and predictive models.
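To make the linear-measurement logic above concrete, the following is a minimal sketch of how RECIST 1.1-style response categories follow from sums of the longest lesion diameters. It covers only the size rules (real assessments also consider new lesions and non-target disease), and the function name and example values are illustrative.

```python
def recist_category(baseline_mm: float, nadir_mm: float, current_mm: float) -> str:
    """Classify target-lesion response from sums of longest diameters (mm).

    Size rules of RECIST 1.1 only: CR (all target lesions gone),
    PR (>=30% decrease from baseline), PD (>=20% and >=5 mm increase
    over the nadir), SD otherwise.
    """
    if current_mm == 0:
        return "CR"  # complete response
    growth = current_mm - nadir_mm
    if growth >= 0.2 * nadir_mm and growth >= 5.0:
        return "PD"  # progressive disease
    if current_mm <= 0.7 * baseline_mm:
        return "PR"  # partial response
    return "SD"      # stable disease

# Example: a 50 mm baseline sum shrinking to 30 mm is a partial response.
print(recist_category(baseline_mm=50.0, nadir_mm=50.0, current_mm=30.0))  # PR
```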

2. Traditional Image Analysis: The First Efforts towards CAD Systems

In the 1990s, one of the first challenges in medical image analysis was to facilitate the interpretation of mammograms in the context of national screening programs for breast cancer. In the United Kingdom, the design of the first screening program was undertaken by a working group under Sir Patrick Forrest, whose report was accepted by the government in 1986. As a consequence, the UK screening program was established for women between 50 and 64 in 1990 [ 6 ]. The implementation of such screening programs throughout Europe led to the establishment of specialist breast screening centers and the formal training of both radiographers and radiologists. X-ray mammography proved to be a cost-effective imaging modality for national screening, and population screening led to smaller and usually non-palpable masses being increasingly detected.

As a result, the radiologist's task became more complex: the interpretation of a mammogram is challenging due to the projective nature of mammography, while at the same time the need for early and accurate detection of cancer became pressing. To address these needs, medical image analysis became an active field of research in the early nineties, giving rise to numerous research efforts into cancer and microcalcification detection, as well as mammogram registration for improving the comparison of temporal mammograms. Figure 1 depicts the temporal mammogram registration concept for CAD systems that would facilitate comparison and aid clinicians in the early diagnosis of cancer in screening mammography [ 7 ]. When the ImageChecker system was certified by the FDA for screening mammography in 1998, R2 Technology became the first company to employ computer-assisted diagnosis (CAD) for mammography, and later for digital mammography as well.

Figure 1. Traditional medical image processing on temporal mammograms. From left to right: the most recent mammogram ( a ) is registered to the previous mammogram ( b ); the result is shown in ( c ). After registration there is one predominant region of significant difference in the subtraction image ( d ), which corresponds to a mass developed in the breast.

However, early diagnostic decision support systems suffered from low precision, which in turn could increase the number of unnecessary biopsies. In a relevant study [ 8 ], the positive predictive values of interpretations by three independent observers changed from 100%, 92.7%, and 95.5% without CAD to 86.4%, 97.3%, and 91.1% with it. This limitation was representative of the low generalizability of such cancer detection tools in these early days. At the same time, the lack of more sophisticated imaging modalities hampered research efforts towards predicting therapy response and optimizing therapy based on imaging data.

3. Quantitative Imaging Based on Models

With the advent of more sophisticated imaging modalities enabling functional imaging, medical image analysis efforts shifted towards the quantification of tissue properties. This opened new horizons in CAD systems for translating image signals into cancer tissue properties such as perfusion and cellularity and for developing more intuitive imaging biomarkers for several cancer imaging applications. For example, in the case of MRI, the complex phenomena that occur after excitation are amenable to mathematical modeling that takes into consideration tissue interactions within the tumor microenvironment. When evaluating a model-based approach, the model can be regarded as reliable when the predicted data converge on the observed signal intensities while also providing useful insights to radiologists and oncologists. MRI perfusion and diffusion imaging has been the main focus of such modeling efforts, not least because MRI is free of ionizing radiation.

Diffusion-weighted MRI (DWI-MRI) is based on sequences sensitized to microscopic water mobility by means of strong gradient pulses and can provide quantitative information on the tumor environment and architecture. Diffusivity can be assessed in the intracellular, extracellular, and intravascular spaces. Per-pixel apparent diffusion coefficient (ADC) values derived from DWI-MRI theoretically have an inverse relationship to tumor cell density. In addition, with the introduction of the intravoxel incoherent motion (IVIM) model, both cellularity and microvascular perfusion information can be assessed after parametric modeling [ 9 ]. Figure 2 presents a parametric map of the stretching parameter α from the DWI-MRI stretched-exponential model (SEM), revealing highly heterogeneous parts of a Grade 3 dedifferentiated liposarcoma (DDLS) [ 9 ].
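These two signal models are simple enough to sketch directly. Below is a minimal per-voxel fit, assuming the monoexponential model S(b) = S0·exp(−b·ADC) and the stretched-exponential model S(b) = S0·exp(−(b·DDC)^α); the b-values and noisy signal are synthetic stand-ins for real DWI data.

```python
import numpy as np
from scipy.optimize import curve_fit

# Per-voxel DWI signal models; b in s/mm^2, ADC and DDC in mm^2/s.
def monoexp(b, s0, adc):
    return s0 * np.exp(-b * adc)

def stretched_exp(b, s0, ddc, alpha):
    # alpha in (0, 1]; lower alpha indicates microstructural heterogeneity
    return s0 * np.exp(-(b * ddc) ** alpha)

b = np.array([0.0, 50.0, 100.0, 200.0, 400.0, 600.0, 800.0])
rng = np.random.default_rng(0)
# Synthetic heterogeneous voxel (alpha = 0.7) with Gaussian noise
signal = stretched_exp(b, 1000.0, 1.2e-3, 0.7) + rng.normal(0.0, 5.0, b.size)

adc_fit, _ = curve_fit(monoexp, b, signal, p0=[signal[0], 1e-3])
sem_fit, _ = curve_fit(stretched_exp, b, signal, p0=[signal[0], 1e-3, 0.9],
                       bounds=([0.0, 0.0, 0.0], [np.inf, np.inf, 1.0]))
print(f"ADC = {adc_fit[1]:.2e} mm^2/s, alpha = {sem_fit[2]:.2f}")
```

Repeating such a fit for every voxel of the DWI volume yields parametric maps like the α map shown in Figure 2.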

Figure 2. DWI-MRI stretched-exponential model (SEM) parametric map, revealing highly heterogeneous parts of a dedifferentiated liposarcoma (with permission from the Department of Medical Imaging, Heraklion University Hospital). The heterogeneity index α ranges from 0 to 1, with lower values of α indicating microstructural heterogeneity.

DWI-MRI has been tested in most solid tumors for discriminating malignant from benign lesions, automating tumor grading, predicting treatment response, and post-treatment monitoring [ 10 ].

However, there is still a lack of standardization and generalization of these results, as well as of validation against histopathology. While in-depth DWI-MRI biomarker validation is difficult in clinical routine, recent pre-clinical studies have found that the derived parametric maps can serve as a non-invasive marker of cell death and apoptosis in response to treatment [ 11 ]. They also confirmed significant correlations of ADC with immunohistochemistry measurements of cell density, cell death, and apoptosis.

In a similar fashion, in dynamic contrast-enhanced MRI (DCE-MRI), T1-weighted sequences are acquired before, during, and after the administration of a paramagnetic contrast agent (CA). Tissue-specific information about pathophysiology can be inferred from the dynamics of signal intensity in every pixel of the studied area. Usually this is performed by visual or semi-quantitative analysis of the signal-time curves in selected regions of interest. However, with the use of pharmacokinetic modeling, e.g., of exchange between the intravascular and the extravascular extracellular space, it became possible to map signal intensities per pixel to CA concentration and then fit model parameters describing, e.g., the interstitial space and the transfer constant (ktrans). This enabled the generation of parametric maps, e.g., for ktrans, providing a more quantitative representation of tumor perfusion and heterogeneity within the tumor image region of interest. Although promising, e.g., for assessing treatment efficacy, such approaches have found limited use in clinical practice, not least due to the low reported reproducibility of model parameter estimation. One aspect of this problem is presented in the example shown in Figure 3 , where the use of image-driven methods based on multiple flip angles produces a parametric map of a tumor with different contrast compared to the one produced with the Fritz–Hansen population-based AIF [ 12 ]. This issue has several implications, including for the accuracy of assessing breast cancer response to neoadjuvant chemotherapy [ 13 ].
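As a sketch of the pharmacokinetic modeling described above, the standard Tofts model expresses the tissue CA concentration as a convolution of the arterial input function (AIF) with an exponential impulse response, Ct(t) = Ktrans ∫₀ᵗ Cp(τ)·exp(−(Ktrans/ve)(t − τ)) dτ; fitting Ktrans and ve in every pixel then yields the parametric maps. The discretization below assumes uniform temporal sampling, and the toy AIF stands in for a measured or population-based one (e.g., Fritz–Hansen).

```python
import numpy as np
from scipy.optimize import curve_fit

def tofts_ct(t, ktrans, ve, cp):
    """Standard Tofts model via discrete convolution on a uniform grid t (min)."""
    kep = ktrans / ve                       # efflux rate constant (1/min)
    dt = t[1] - t[0]
    irf = np.exp(-kep * t)                  # exponential impulse response
    return ktrans * np.convolve(cp, irf)[: t.size] * dt

t = np.arange(0.0, 5.0, 0.1)                # 5 min acquisition, 6 s sampling
cp = 5.0 * t * np.exp(-t / 0.5)             # toy AIF (placeholder shape)
ct_observed = tofts_ct(t, 0.25, 0.30, cp)   # synthetic tissue curve

# Per-pixel fit: recover Ktrans and ve from the observed concentration curve.
fit, _ = curve_fit(lambda tt, kt, ve: tofts_ct(tt, kt, ve, cp),
                   t, ct_observed, p0=[0.1, 0.2], bounds=(1e-6, 5.0))
print(f"Ktrans = {fit[0]:.3f} /min, ve = {fit[1]:.3f}")
```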

Figure 3. ( a ) ktrans map of a tumor from PK analysis using an AIF measured directly from the MR image, with the multiple flip angles (mFA) method used for the conversion from signal to CA concentration; ( b ) ktrans map of the same tumor using a population-based AIF from Fritz and Hansen.

In conclusion, the clinical translation of DWI and DCE-MRI is hampered by low repeatability and reproducibility across several studies in oncology. To address this problem, initiatives such as the Quantitative Imaging Biomarkers Alliance (QIBA) propose clinical and technological requirements for quantitative DWI- and DCE-derived imaging biomarkers, as well as image acquisition, processing, and quality control recommendations aimed at reducing reproducibility error and improving precision and accuracy [ 14 ]. It is argued that this active area of medical image processing has not yet reached its full potential and still represents a complementary approach to AI-driven methods towards CAD systems for promoting precision oncology. In addition, the exploitation of multimodality imaging strategies (e.g., PET/MRI) can provide added value through the combination of anatomical and functional information.

4. Radiomics and Deep Learning Approaches in Oncology through the Cancer Continuum

Traditional cancer medical image analysis was for decades based on human-defined features, usually inspired by low-level image properties such as intensity, contrast, and a limited number of texture measures. Such methods were successfully used, e.g., in cancer subclassification, but it was hard to capture the high-level, complex patterns that an expert radiologist uses to determine the presence or absence of cancer [ 1 ].

However, with the advancement of machine learning and the availability of more powerful, high-performance computational infrastructures, it became possible to exhaustively analyze the texture and shape content of medical images in an effort to decipher high-level pathophysiology patterns. At the same time, the evolution of texture representation and feature extraction through a growing number of techniques over the last decades played a catalytic role in better capturing tumor appearance through medical image analysis [ 15 ]. Last but not least, the need to decipher the imaging phenotype in cancer became even more pressing, given that the vast majority of visible phenotypic variation is now considered attributable to non-genetic determinants in chronic and age-associated disorders [ 1 ].

All these factors played a central role in the advancement of radiomics, where, in analogy to genomics, high-throughput feature extraction followed by machine learning enabled the development of significant discriminatory and predictive signatures based on the imaging phenotype. Radiomics has been further enhanced with deep learning techniques, which offer an alternative approach to medical image feature extraction by learning complex, high-level features in an automated fashion from a large number of medical images containing variable instances of a particular tumor. Figure 4 illustrates the main AI/radiomics applications that can assist clinicians in adding precision to the management of cancer patients.
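For a sense of what high-throughput feature extraction looks like in practice, the sketch below uses the open-source pyradiomics package; the file paths are placeholders, and configuration details (resampling, intensity discretization) are omitted.

```python
from radiomics import featureextractor  # pip install pyradiomics

extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.disableAllFeatures()
extractor.enableFeatureClassByName("firstorder")  # intensity statistics
extractor.enableFeatureClassByName("shape")       # 3D size/shape descriptors
extractor.enableFeatureClassByName("glcm")        # co-occurrence texture

# Image volume and tumor segmentation mask (e.g., NIfTI/NRRD files).
features = extractor.execute("patient01_ct.nii.gz", "patient01_mask.nii.gz")

for name, value in features.items():
    if not name.startswith("diagnostics_"):       # skip provenance metadata
        print(name, value)
```

The resulting feature vectors, one per tumor, are then fed to the machine learning models discussed above to learn discriminatory or predictive signatures.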

Figure 4. The main medical image processing applications enhanced with AI/radiomics towards precision oncology.

4.1. Cancer Screening

Recent advancements in AI-driven medical image processing can have a positive impact on national cancer screening programs, alleviating the heavy workload of radiologists and helping clinicians reduce the number of missed cancers and detect them at an earlier stage. Compared with the initial efforts mentioned in previous sections, recent AI-driven image processing can exceed the limits of human vision, potentially reduce the number of cancers missed in screening, and cope with inter-observer variability.

Regarding lung cancer screening, early nodule detection and classification is of paramount importance for improving patient outcomes and quality of life. Despite the existence of such screening programs, the majority of lung cancers are detected in the later stages, leading to increased mortality and a low 5-year survival rate [ 16 ]. To this end, radiomics- and deep-learning-based methods have shown encouraging results towards precise pulmonary nodule evaluation [ 17 ]. A very interesting recent example is reported by Ardila et al., who developed a deep learning algorithm that uses a patient's current and prior computed tomography volumes to predict the risk of lung cancer. Their model achieved state-of-the-art performance (94.4% area under the curve) on 6716 cases and performed similarly on an independent clinical validation set of 1139 cases. When prior computed tomography imaging was not available, their model outperformed all six radiologists, with absolute reductions of 11% in false positives and 5% in false negatives [ 18 ].

Regarding breast cancer screening technologies, it is argued that AI may provide the means to limit the inherent drawbacks of mammography and enhance diagnostic performance and robustness. In a prospective clinical study, a commercially available AI algorithm was evaluated as an independent reader of screening mammograms, and adequate diagnostic performance was reported [ 19 ].

4.2. Precision Cancer Diagnosis

For decades, CAD-driven precision diagnosis has been the holy grail of medical image processing research. However, clinical interest in such applications has grown significantly only recently, with the advancement of AI-driven efforts to generalize performance across diverse datasets. AI systems have reported unprecedented performance in the segmentation and classification of cancer. A recent study reported increased performance in segmenting and classifying brain tumors into meningioma, glioma, and pituitary tumors [ 20 ].

In addition, a growing number of studies are concerned with automated tumor grading, which is a prerequisite for optimal therapy planning. Yang et al. presented a retrospective glioma grading study (grade II vs. grade III, i.e., low-grade vs. high-grade glioma) on 113 glioma patients and used transfer learning with the AlexNet and GoogLeNet architectures, achieving an AUC of up to 0.939 [ 21 ].

At the same time, the quest to decode imaging phenotype has given rise to efforts to correlate imaging features with molecular and genetic markers in the context of radio-genomics [ 22 ]. This promising field of research can provide surrogate molecular information directly from medical images and is not prone to biopsy sampling errors, as the whole tumor can be analyzed. In a recent study, MRI radiomics were able to predict IDH1 mutation with an AUC of up to 90% [ 23 ].

4.3. Treatment Optimization

There are many challenging problems in optimizing treatment for cancer patients, such as the accurate segmentation of organs at risk (OAR) in radiotherapy and the prediction of neoadjuvant chemotherapy response. Intelligent processing of medical images has opened new horizons for addressing these clinical needs. In the case of nasopharyngeal carcinoma radiotherapy planning, a deep learning OAR detection and segmentation network provides useful insights for clinicians for the accurate delineation of OARs [ 24 ]. Regarding the prediction of neoadjuvant chemotherapy response, image-based outcome prediction has the potential to add precision, not least because the outcome can differ significantly depending on the tumor subtype. To this end, recent studies report promising preliminary results in applying AI to predict breast cancer neoadjuvant therapy response. Vulchi et al. reported improved prediction of response to HER2-targeted neoadjuvant therapy based on deep learning of DCE-MRI data [ 25 ]. Notably, the AUC dropped from 0.93 to 0.85 in the external validation cohort.

5. Radiomics Limitations Regarding Clinical Translation

While promising, radiomics methodologies are still in a translational phase, and thorough clinical validation is needed for clinical translation. When these technologies are tested and reviewed, a number of important limitations become apparent. In a recent review of MRI-based radiomics in nasopharyngeal cancer [ 26 ], the authors surveyed the state of the art using a radiomics quality score (RQS) assessment. Several limitations were highlighted in the reviewed studies, including the absence of a validation cohort in 21% of them and the lack of external validation in 92% of them. In another RQS-based evaluation of radiomics and radio-genomics papers, the RQS was low regarding clinical utility, test-retest analysis, prospective study design, and open science [ 27 ]. Notably, no study used phantoms to assess the robustness of radiomics features or performed a cost-effectiveness analysis. In a similar fashion, a lack of feature robustness assessment and external validation was reported in studies on prostate cancer [ 28 ], while the main shortcomings reported in the quality of MRI lymphoma radiomics studies concerned inconsistencies in the segmentation process and the lack of temporal data to increase model robustness [ 29 ]. All these recent studies clearly indicate that, although medical image processing in oncology has evolved significantly, the clinical translation of radiomics is still hampered by the lack of extensive, high-quality validation studies. In addition, the lack of standardization in radiomics extraction remains a problem, which is currently being investigated by several studies with respect to different software packages [ 30 ] and the reproducibility of standardized radiomics features using multi-modality patient data [ 31 ].

6. Discussion

Contrary to common belief, medical image processing has been evolving for the last few decades, and its main application is cancer image analysis. Traditional medical image processing was founded on classical image processing and computer vision principles, focusing on low-level feature extraction and simple classification tasks (e.g., benign vs. malignant), the geometrical alignment of temporal images, and the segmentation of tumors for volumetric analyses. This early stage in the 1990s was an important milestone for further development, since several radiologists and oncologists understood the future potential and helped create a multidisciplinary community around medical image analysis and processing. More importantly, it laid the foundations of radiomics by proposing the shape and textural analysis of tumors as useful patterns for detection, segmentation, and classification. However, the main limitations were the high degree of fragmentation in such efforts, the limited computational resources, and the very low availability of cancer image data, usually mammograms or MRIs.

Functional imaging was another important milestone for medical image computing, since the idea of transforming dynamic image signals into tissue properties paved the way for the discovery of reliable and reproducible image biomarkers for oncology. To achieve this goal, non-conventional medical image processing was deployed, based on compartmental models that link the imaging phenotype with microscopic tumor environment properties such as diffusion and perfusion. Such model-based approaches include compartmental pharmacokinetic models for DCE-MRI and the IVIM model for DWI-MRI, often requiring laborious pre-processing to transform the original signal into quantitative parametric maps able to convey perfusion and cellularity information to the clinician. It is argued that this is still an evolving research field and that the potential for clinical translation is significant, especially since techniques such as DWI-MRI involve neither ionizing radiation nor the administration of a contrast agent. That said, significant standardization efforts are still required in order to converge on stable imaging protocols and model implementations that will guarantee reproducible parametric maps and robust cancer biomarkers. Another limitation, compared with modern radiomics/deep learning efforts, is that processing such functional data with compartmental models is a very demanding task, requiring a deeper understanding of imaging protocols as well as of the numerical methods used for model fitting.

The gradual advancement of high-performance computing, machine learning, and neural networks has revolutionized research in the field, especially during the last decade. The field of radiomics has extended cancer medical image processing concepts regarding texture and shape descriptors to massive feature extraction and modeling. Such radiomics approaches have also been enhanced by convolutional neural networks, which outperformed traditional image analysis methods in tasks such as lesion segmentation, while introducing more sophisticated predictive, diagnostic, and correlative pipelines towards precision diagnostics, therapy optimization, and synergistic radio-genomic biomarker discovery. The availability of open-access computational tools for machine and deep learning, in combination with public cancer image resources such as The Cancer Imaging Archive (TCIA), has led to an unprecedented number of publications and AI start-ups, and has accelerated discussions on establishing AI regulatory processes and the clinical translation of such technologies. At the same time, the main limitation of these impressive technologies has been their low explainability, which came as a tradeoff for the impressive performance in oncological applications throughout the cancer continuum. Low explainability also contributed to reduced trust in these models, while the vast number of features explored made generalization difficult, especially given the large variability of image quality and imaging protocols across vendors and clinical sites.

Medical image processing is still evolving and will continue to provide useful tools and methodological concepts for improving cancer image analysis and interpretation. Data science approaches focusing on radiomics have paved the way for accelerating precision oncology [ 32 ]. However, most efforts to date use only imaging data, which limits the performance of diagnostic and prognostic tools. To this end, novel data integration paradigms exploiting both imaging and multi-omics data are a very promising field for future research [ 33 ]. Recent studies have started exploring the synergy of deep learning with quantitative parametric maps. In [ 34 ], the authors present a deep learning method to predict good responders among locally advanced rectal cancer patients, trained on apparent diffusion coefficient (ADC) parametric scans from different vendors. The fusion of standard imaging representations with parametric maps, as well as integrative diagnostic approaches [ 35 ] involving medical images and other cancer-related data, holds promise for increasing accuracy and trustworthiness.

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Conflicts of Interest

The author declares no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Image Enhancement

320 papers with code • 6 benchmarks • 16 datasets

Image enhancement improves the interpretability or perception of information in images for human viewers and provides "better" input for other automated image processing techniques. The principal objective of image enhancement is to modify the attributes of an image to make it more suitable for a given task and a specific observer.

Source: A Comprehensive Review of Image Enhancement Techniques
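As one classical instance of this definition, global histogram equalization improves perceived contrast by remapping intensities through the normalized cumulative histogram. A minimal NumPy sketch for 8-bit grayscale images (the function name and test image are illustrative):

```python
import numpy as np

def equalize_histogram(img: np.ndarray) -> np.ndarray:
    """Global histogram equalization for an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)   # intensity histogram
    cdf = hist.cumsum()                              # cumulative distribution
    cdf_min = cdf[cdf > 0][0]                        # first occupied bin
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255)
    return lut.clip(0, 255).astype(np.uint8)[img]    # remap via lookup table

# A dark synthetic image gains contrast after equalization.
dark = (np.random.default_rng(0).random((64, 64)) * 60).astype(np.uint8)
print(dark.max(), equalize_histogram(dark).max())    # e.g., 59 -> 255
```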

Benchmark leaderboards for this task currently list Retinexformer, ESDNet-L, LCDPNet, TreEnhance, and CIDNet as the best-performing models on their respective datasets.


Most implemented papers

Learning Enriched Features for Real Image Restoration and Enhancement

With the goal of recovering high-quality image content from its degraded version, image restoration enjoys numerous applications, such as in surveillance, computational photography, medical imaging, and remote sensing.

Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement

The paper presents a novel method, Zero-Reference Deep Curve Estimation (Zero-DCE), which formulates light enhancement as a task of image-specific curve estimation with a deep network.
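At inference time, the curve estimation described here reduces to iteratively applying a quadratic light-enhancement curve, LE(x) = x + a·x·(1 − x), with per-pixel coefficient maps a in [−1, 1] predicted by a small network. A sketch of that application step, with random maps standing in for the network output:

```python
import numpy as np

def apply_le_curves(img, curve_maps):
    """Iteratively apply LE(x) = x + a * x * (1 - x); img values in [0, 1]."""
    x = img.astype(np.float32)
    for a in curve_maps:                 # one per-pixel map per iteration
        x = x + a * x * (1.0 - x)
    return np.clip(x, 0.0, 1.0)

rng = np.random.default_rng(0)
low_light = rng.random((32, 32, 3), dtype=np.float32) * 0.2   # dark image
maps = [rng.uniform(-1.0, 1.0, low_light.shape).astype(np.float32)
        for _ in range(8)]               # Zero-DCE applies 8 iterations
enhanced = apply_le_curves(low_light, maps)
```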

EnlightenGAN: Deep Light Enhancement without Paired Supervision

yueruchen/EnlightenGAN • 17 Jun 2019

Deep learning-based methods have achieved remarkable success in image restoration and enhancement, but are they still competitive when there is a lack of paired training data?

LLNet: A Deep Autoencoder Approach to Natural Low-light Image Enhancement

kglore/llnet_color • 12 Nov 2015

In surveillance, monitoring and tactical reconnaissance, gathering the right visual information from a dynamic environment and accurately processing such data are essential ingredients to making informed decisions which determines the success of an operation.

Underwater Image Enhancement via Medium Transmission-Guided Multi-Color Space Embedding

As a result, our network can effectively improve the visual quality of underwater images by exploiting multiple color spaces embedding and the advantages of both physical model-based and learning-based methods.

Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement

When enhancing low-light images, many deep learning algorithms are based on the Retinex theory.

Simultaneous Enhancement and Super-Resolution of Underwater Imagery for Improved Visual Perception

xahidbuffon/Deep-SESR • 4 Feb 2020

In this paper, we introduce and tackle the simultaneous enhancement and super-resolution (SESR) problem for underwater robot vision and provide an efficient solution for near real-time applications.

Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network

Inspired by the success of edge enhanced GAN (EEGAN) and ESRGAN, we apply a new edge-enhanced super-resolution GAN (EESRGAN) to improve the image quality of remote sensing images and use different detector networks in an end-to-end manner where detector loss is backpropagated into the EESRGAN to improve the detection performance.

Learning to Enhance Low-Light Image via Zero-Reference Deep Curve Estimation

This paper presents a novel method, Zero-Reference Deep Curve Estimation (Zero-DCE), which formulates light enhancement as a task of image-specific curve estimation with a deep network.

Uformer: A General U-Shaped Transformer for Image Restoration

Powered by these two designs, Uformer enjoys a high capability for capturing both local and global dependencies for image restoration.


Introducing Apple’s On-Device and Server Foundation Models

At the 2024 Worldwide Developers Conference, we introduced Apple Intelligence, a personal intelligence system integrated deeply into iOS 18, iPadOS 18, and macOS Sequoia.

Apple Intelligence comprises multiple highly capable generative models that are specialized for our users’ everyday tasks and can adapt on the fly to their current activity. The foundation models built into Apple Intelligence have been fine-tuned for user experiences such as writing and refining text, prioritizing and summarizing notifications, creating playful images for conversations with family and friends, and taking in-app actions to simplify interactions across apps.

In the following overview, we will detail how two of these models — a ~3 billion parameter on-device language model, and a larger server-based language model available with Private Cloud Compute and running on Apple silicon servers — have been built and adapted to perform specialized tasks efficiently, accurately, and responsibly. These two foundation models are part of a larger family of generative models created by Apple to support users and developers; this includes a coding model to build intelligence into Xcode, as well as a diffusion model to help users express themselves visually, for example, in the Messages app. We look forward to sharing more information soon on this broader set of models.

Our Focus on Responsible AI Development

Apple Intelligence is designed with our core values at every step and built on a foundation of groundbreaking privacy innovations.

Additionally, we have created a set of Responsible AI principles to guide how we develop AI tools, as well as the models that underpin them:

  • Empower users with intelligent tools : We identify areas where AI can be used responsibly to create tools for addressing specific user needs. We respect how our users choose to use these tools to accomplish their goals.
  • Represent our users : We build deeply personal products with the goal of representing users around the globe authentically. We work continuously to avoid perpetuating stereotypes and systemic biases across our AI tools and models.
  • Design with care : We take precautions at every stage of our process, including design, model training, feature development, and quality evaluation to identify how our AI tools may be misused or lead to potential harm. We will continuously and proactively improve our AI tools with the help of user feedback.
  • Protect privacy : We protect our users' privacy with powerful on-device processing and groundbreaking infrastructure like Private Cloud Compute. We do not use our users' private personal data or user interactions when training our foundation models.

These principles are reflected throughout the architecture that enables Apple Intelligence, connects features and tools with specialized models, and scans inputs and outputs to provide each feature with the information needed to function responsibly.

In the remainder of this overview, we provide details on decisions such as: how we develop models that are highly capable, fast, and power-efficient; how we approach training these models; how our adapters are fine-tuned for specific user needs; and how we evaluate model performance for both helpfulness and unintended harm.

Modeling overview

Pre-Training

Our foundation models are trained on Apple's AXLearn framework, an open-source project we released in 2023. It builds on top of JAX and XLA, and allows us to train the models with high efficiency and scalability on various training hardware and cloud platforms, including TPUs and both cloud and on-premise GPUs. We used a combination of data parallelism, tensor parallelism, sequence parallelism, and Fully Sharded Data Parallel (FSDP) to scale training along multiple dimensions such as data, model, and sequence length.
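
As a rough illustration of the FSDP piece of that mix, the toy below simulates, in a single process, workers that each own only a shard of the parameters, all-gather the full vector for compute, and reduce-scatter gradients back to shards; the names, shapes, and worker count are illustrative, not AXLearn's API:

```python
import numpy as np

W = 4                                    # number of simulated workers
full_params = np.random.randn(8)
shards = np.split(full_params.copy(), W)           # each worker's owned slice

def all_gather(shards):
    return np.concatenate(shards)                  # rebuild full params for compute

def reduce_scatter(per_worker_grads, rank):
    summed = np.sum(per_worker_grads, axis=0)      # sum grads across workers
    return np.split(summed, W)[rank]               # keep only this worker's slice

# One training step: every worker computes a (dummy) gradient on its batch...
params = all_gather(shards)
grads = [2 * params + np.random.randn(8) * 0.01 for _ in range(W)]
# ...then each worker updates just the shard it owns.
lr = 0.1
shards = [shards[r] - lr * reduce_scatter(grads, r) for r in range(W)]
```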

We train our foundation models on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler, AppleBot. Web publishers have the option to opt out of the use of their web content for Apple Intelligence training with a data usage control.

We never use our users’ private personal data or user interactions when training our foundation models, and we apply filters to remove personally identifiable information like social security and credit card numbers that are publicly available on the Internet. We also filter profanity and other low-quality content to prevent its inclusion in the training corpus. In addition to filtering, we perform data extraction, deduplication, and the application of a model-based classifier to identify high quality documents.
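
As a rough illustration of these filtering steps, the sketch below scrubs two common PII patterns and removes exact duplicates; the regular expressions and hashing scheme are simplified stand-ins, not Apple's production pipeline:

```python
import hashlib
import re

# Illustrative patterns only, not the production filter set.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def scrub_pii(text: str) -> str:
    text = SSN_RE.sub("[REDACTED-SSN]", text)
    return CARD_RE.sub("[REDACTED-CARD]", text)

def dedup(docs):
    seen, out = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc.strip().lower().encode()).hexdigest()
        if digest not in seen:          # exact-duplicate removal only;
            seen.add(digest)            # near-dedup would need MinHash etc.
            out.append(doc)
    return out

docs = ["My SSN is 123-45-6789.", "my ssn is 123-45-6789.", "Clean text."]
print([scrub_pii(d) for d in dedup(docs)])
```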

Post-Training

We find that data quality is essential to model success, so we utilize a hybrid data strategy in our training pipeline, incorporating both human-annotated and synthetic data, and conduct thorough data curation and filtering procedures. We have developed two novel algorithms in post-training: (1) a rejection sampling fine-tuning algorithm with teacher committee, and (2) a reinforcement learning from human feedback (RLHF) algorithm with mirror descent policy optimization and a leave-one-out advantage estimator. We find that these two algorithms lead to significant improvement in the model’s instruction-following quality.
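
Apple has not published these algorithms in full, but the leave-one-out advantage estimator is simple to state: each sampled response is baselined against the mean reward of the other samples for the same prompt. A minimal sketch:

```python
import numpy as np

def leave_one_out_advantages(rewards):
    """For k sampled responses to one prompt, baseline each reward with
    the mean of the other k-1 rewards (a leave-one-out baseline)."""
    r = np.asarray(rewards, dtype=float)
    k = r.size
    return r - (r.sum() - r) / (k - 1)

print(leave_one_out_advantages([1.0, 0.0, 0.5, 0.5]))
# -> [ 0.667, -0.667,  0.0,  0.0 ]
```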

Optimization

In addition to ensuring our generative models are highly capable, we have used a range of innovative techniques to optimize them on-device and on our private cloud for speed and efficiency. We have applied an extensive set of optimizations for both first token and extended token inference performance.

Both the on-device and server models use grouped-query-attention. We use shared input and output vocab embedding tables to reduce memory requirements and inference cost. These shared embedding tensors are mapped without duplications. The on-device model uses a vocab size of 49K, while the server model uses a vocab size of 100K, which includes additional language and technical tokens.
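
As a toy illustration of grouped-query attention, the sketch below shares a small number of key/value heads across groups of query heads, which is what shrinks the KV cache; the head counts and dimensions are arbitrary:

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_heads):
    """Minimal grouped-query attention: n_q query heads share n_kv
    key/value heads (n_q must be a multiple of n_kv).
    q: (n_q, T, d); k and v: (n_kv, T, d)."""
    group = q.shape[0] // n_kv_heads
    k = np.repeat(k, group, axis=0)          # broadcast each KV head to its group
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

q = np.random.randn(8, 5, 16)   # 8 query heads
k = np.random.randn(2, 5, 16)   # only 2 KV heads -> 4x smaller KV cache
v = np.random.randn(2, 5, 16)
print(grouped_query_attention(q, k, v, n_kv_heads=2).shape)  # (8, 5, 16)
```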

For on-device inference, we use low-bit palletization, a critical optimization technique that meets the necessary memory, power, and performance requirements. To maintain model quality, we developed a new framework using LoRA adapters that incorporates a mixed 2-bit and 4-bit configuration strategy — averaging 3.5 bits-per-weight — to achieve the same accuracy as the uncompressed models.
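
Palletization replaces each weight with an index into a small shared lookup table (a "palette"). The toy below clusters weights into a 2**bits palette with a few Lloyd (k-means) iterations; it shows the idea only, not Apple's scheme:

```python
import numpy as np

def palletize(weights, bits):
    """Toy weight palletization: cluster values into a 2**bits palette
    and store per-weight indices into that palette."""
    flat = weights.ravel()
    palette = np.quantile(flat, np.linspace(0, 1, 2 ** bits))  # init centers
    for _ in range(10):                                        # Lloyd iterations
        idx = np.abs(flat[:, None] - palette[None, :]).argmin(axis=1)
        for c in range(palette.size):
            if np.any(idx == c):
                palette[c] = flat[idx == c].mean()
    return palette, idx.reshape(weights.shape)

w = np.random.randn(64, 64)
palette, idx = palletize(w, bits=4)     # 16-entry palette
w_hat = palette[idx]                    # dequantized weights
print(np.abs(w - w_hat).mean())         # small reconstruction error
```

With a 4-bit palette, each weight costs 4 bits plus a share of the 16-entry table; mixing 2-bit and 4-bit tables across different layers is how an average like 3.5 bits-per-weight arises.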

Additionally, we use an interactive model latency and power analysis tool, Talaria, to better guide the bit rate selection for each operation. We also utilize activation quantization and embedding quantization, and have developed an approach to enable efficient Key-Value (KV) cache update on our neural engines.

With this set of optimizations, on iPhone 15 Pro we are able to reach time-to-first-token latency of about 0.6 millisecond per prompt token, and a generation rate of 30 tokens per second. Notably, this performance is attained before employing token speculation techniques, from which we see further enhancement on the token generation rate.
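
To make those figures concrete, a quick back-of-the-envelope calculation (the prompt and response lengths here are hypothetical):

```python
prompt_tokens = 1024            # hypothetical prompt length
ttft = prompt_tokens * 0.6e-3   # ~0.6 ms per prompt token -> ~0.61 s to first token
gen_tokens = 150                # hypothetical response length
gen_time = gen_tokens / 30.0    # 30 tokens/s -> 5 s of generation
print(f"TTFT ~ {ttft:.2f} s, total ~ {ttft + gen_time:.2f} s")
```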

Model Adaptation

Our foundation models are fine-tuned for users’ everyday activities, and can dynamically specialize themselves on-the-fly for the task at hand. We utilize adapters, small neural network modules that can be plugged into various layers of the pre-trained model, to fine-tune our models for specific tasks. For our models we adapt the attention matrices, the attention projection matrix, and the fully connected layers in the point-wise feedforward networks for a suitable set of the decoding layers of the transformer architecture.

By fine-tuning only the adapter layers, the original parameters of the base pre-trained model remain unchanged, preserving the general knowledge of the model while tailoring the adapter layers to support specific tasks.

We represent the values of the adapter parameters using 16 bits, and for the ~3 billion parameter on-device model, the parameters for a rank 16 adapter typically require tens of megabytes. The adapter models can be dynamically loaded, temporarily cached in memory, and swapped — giving our foundation model the ability to specialize itself on the fly for the task at hand while efficiently managing memory and guaranteeing the operating system's responsiveness.
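
A minimal sketch of the adapter mechanism: a frozen base matrix plus a trainable low-rank update. The model dimension below is an assumed illustrative value, and the arithmetic shows why a rank-16 adapter stays small:

```python
import numpy as np

d_model, rank = 2048, 16                 # illustrative dimensions, not Apple's
W = np.random.randn(d_model, d_model)    # frozen base weight (palletized in practice)
A = np.random.randn(rank, d_model) * 0.01
B = np.zeros((d_model, rank))            # zero-init so the adapter starts as a no-op

def adapted_forward(x):
    # Base path plus low-rank update; only A and B are trained.
    return x @ W.T + (x @ A.T) @ B.T

# Rough size of one rank-16 adapter for a single 2048x2048 matrix at 16 bits:
params = A.size + B.size                 # 2 * d_model * rank = 65,536
print(params * 2 / 1024, "KiB")          # ~128 KiB per adapted matrix
```

Summed over the many adapted matrices across the decoding layers, this lands in the tens-of-megabytes range quoted above.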

To facilitate the training of the adapters, we created an efficient infrastructure that allows us to rapidly retrain, test, and deploy adapters when either the base model or the training data gets updated. The adapter parameters are initialized using the accuracy-recovery adapter introduced in the Optimization section.

Performance and Evaluation

Our focus is on delivering generative models that can enable users to communicate, work, express themselves, and get things done across their Apple products. When benchmarking our models, we focus on human evaluation as we find that these results are highly correlated to user experience in our products. We conducted performance evaluations on both feature-specific adapters and the foundation models.

To illustrate our approach, we look at how we evaluated our adapter for summarization. As product requirements for summaries of emails and notifications differ in subtle but important ways, we fine-tune accuracy-recovery low-rank (LoRA) adapters on top of the palletized model to meet these specific requirements. Our training data is based on synthetic summaries generated from bigger server models, filtered by a rejection sampling strategy that keeps only the high quality summaries.
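
The sketch below illustrates that rejection-sampling filter: sample several candidate summaries per document and keep only those clearing a quality bar. `teacher_summarize` and `quality_score` are hypothetical placeholders for the server model and the quality judge:

```python
import random

def teacher_summarize(doc: str) -> str:
    return doc[:60] + "..."              # placeholder generation

def quality_score(doc: str, summary: str) -> float:
    return random.random()               # placeholder judge model

def curate(docs, samples=4, threshold=0.7):
    kept = []
    for doc in docs:
        candidates = [teacher_summarize(doc) for _ in range(samples)]
        best_score, best = max((quality_score(doc, s), s) for s in candidates)
        if best_score >= threshold:      # reject low-quality samples
            kept.append((doc, best))
    return kept

print(len(curate(["A long email about scheduling a meeting next week."] * 10)))
```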

To evaluate the product-specific summarization, we use a set of 750 responses carefully sampled for each use case. These evaluation datasets emphasize a diverse set of inputs that our product features are likely to face in production, and include a stratified mixture of single and stacked documents of varying content types and lengths. For these product features, it was important to evaluate performance against datasets that are representative of real use cases. We find that our models with adapters generate better summaries than a comparable model.

As part of responsible development, we identified and evaluated specific risks inherent to summarization. For example, summaries occasionally remove important nuance or other details in ways that are undesirable. However, we found that the summarization adapter did not amplify sensitive content in over 99% of targeted adversarial examples. We continue to adversarially probe to identify unknown harms and expand our evaluations to help guide further improvements.

In addition to evaluating feature specific performance powered by foundation models and adapters, we evaluate both the on-device and server-based models’ general capabilities. We utilize a comprehensive evaluation set of real-world prompts to test the general model capabilities. These prompts are diverse across different difficulty levels and cover major categories such as brainstorming, classification, closed question answering, coding, extraction, mathematical reasoning, open question answering, rewriting, safety, summarization, and writing.

We compare our models with both open-source models (Phi-3, Gemma, Mistral, DBRX) and commercial models of comparable size (GPT-3.5-Turbo, GPT-4-Turbo)[1]. We find that our models are preferred by human graders over most comparable competitor models. On this benchmark, our on-device model, with ~3B parameters, outperforms larger models including Phi-3-mini, Mistral-7B, and Gemma-7B. Our server model compares favorably to DBRX-Instruct, Mixtral-8x22B, and GPT-3.5-Turbo while being highly efficient.

We use a set of diverse adversarial prompts to test the model performance on harmful content, sensitive topics, and factuality. We measure the violation rates of each model as evaluated by human graders on this evaluation set, with a lower number being desirable. Both the on-device and server models are robust when faced with adversarial prompts, achieving violation rates lower than open-source and commercial models.

Our models are preferred by human graders as safe and helpful over competitor models for these prompts. However, considering the broad capabilities of large language models, we understand the limitation of our safety benchmark. We are actively conducting both manual and automatic red-teaming with internal and external teams to continue evaluating our models' safety.

To further evaluate our models, we use the Instruction-Following Eval (IFEval) benchmark to compare their instruction-following capabilities with models of comparable size. The results suggest that both our on-device and server models follow detailed instructions better than open-source and commercial models of comparable size.

We evaluate our models’ writing ability on our internal summarization and composition benchmarks, consisting of a variety of writing instructions. These results do not refer to our feature-specific adapter for summarization (seen in Figure 3), nor do we have an adapter focused on composition.

The Apple foundation models and adapters introduced at WWDC24 underlie Apple Intelligence, the new personal intelligence system that is integrated deeply into iPhone, iPad, and Mac, and enables powerful capabilities across language, images, actions, and personal context. Our models have been created with the purpose of helping users do everyday activities across their Apple products, and developed responsibly at every stage and guided by Apple’s core values. We look forward to sharing more information soon on our broader family of generative models, including language, diffusion, and coding models.

[1] We compared against the following model versions: gpt-3.5-turbo-0125, gpt-4-0125-preview, Phi-3-mini-4k-instruct, Mistral-7B-Instruct-v0.2, Mixtral-8x22B-Instruct-v0.1, Gemma-1.1-2B, and Gemma-1.1-7B. The open-source and Apple models are evaluated in bfloat16 precision.

Related readings and updates.

Advancing speech accessibility with personal voice.

A voice replicator is a powerful tool for people at risk of losing their ability to speak, including those with a recent diagnosis of amyotrophic lateral sclerosis (ALS) or other conditions that can progressively impact speaking ability. First introduced in May 2023 and made available on iOS 17 in September 2023, Personal Voice is a tool that creates a synthesized voice for such users to speak in FaceTime, phone calls, assistive communication apps, and in-person conversations.

Apple Natural Language Understanding Workshop 2023

Earlier this year, Apple hosted the Natural Language Understanding workshop. This two-day hybrid event brought together Apple and members of the academic research community for talks and discussions on the state of the art in natural language understanding.

In this post, we share highlights from workshop discussions and recordings of select workshop talks.



Electronic Research Archive


Faster free pseudoinverse greedy block Kaczmarz method for image recovery

  • Wenya Shi, Xinpeng Yan, Zhan Huan
  • Aliyun Big Data College, Changzhou University, Changzhou 213159, China
  • Received: 09 April 2024 · Revised: 22 May 2024 · Accepted: 31 May 2024 · Published: 17 June 2024

The greedy block Kaczmarz (GBK) method has been successfully applied in areas such as data mining, image reconstruction, and large-scale image restoration. However, computing a pseudoinverse at each iteration of the GBK method not only complicates and slows down the computation but is also ill-suited to distributed implementation. The leverage-score-sampling, pseudoinverse-free GBK algorithm proposed in this paper shows significant potential for image reconstruction. By reformulating the problem, the algorithm both handles systems of linear equations with multiple solution vectors more efficiently and is optimized specifically for image-reconstruction applications. A combination of theoretical analysis and experiments validates the robustness and practicality of the algorithm, providing useful insights for technical advances in related disciplines.
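
For orientation, the sketch below implements the classical block Kaczmarz update that the paper starts from, including the per-iteration pseudoinverse the proposed method eliminates; it is the baseline, not the paper's algorithm, and all sizes are toy values:

```python
import numpy as np

def block_kaczmarz(A, b, block_size=10, iters=500, seed=0):
    """Baseline block Kaczmarz for Ax = b: project the iterate onto the
    solution set of a randomly chosen row block via the block
    pseudoinverse. The paper avoids this pinv (and selects blocks
    greedily, with leverage-score sampling)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        rows = rng.choice(m, size=block_size, replace=False)
        A_blk, b_blk = A[rows], b[rows]
        x += np.linalg.pinv(A_blk) @ (b_blk - A_blk @ x)   # the costly step
    return x

A = np.random.randn(200, 50)
x_true = np.random.randn(50)
x_hat = block_kaczmarz(A, A @ x_true)
print(np.linalg.norm(x_hat - x_true))   # near 0 for a consistent system
```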

  • leverage score sampling
  • greedy block Kaczmarz method
  • free pseudo-inverse
  • multiple right-hand sides
  • image recovery

Citation: Wenya Shi, Xinpeng Yan, Zhan Huan. Faster free pseudoinverse greedy block Kaczmarz method for image recovery[J]. Electronic Research Archive, 2024, 32(6): 3973-3988. doi: 10.3934/era.2024178



  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0 )


Figures and Tables

  • Figure 1. Reconstructed images when A is generated by radon()
  • Figure 2. Reconstructed images when A is generated by randn()

Research on Optimization of Natural Language Processing Model Based on Multimodal Deep Learning

  • Liang, Yaxin
  • Yang, Yining
  • Zhan, Qishi

This project studies image representation based on attention mechanisms and multimodal data. By adding multiple pattern layers to the attribute model, the semantic and hidden layers of image content are integrated. Word vectors are quantified by the Word2Vec method and then evaluated by a word-embedding convolutional neural network. The method was tested against two groups of published experimental results. The experimental results show that this method can convert discrete features into continuous characters, thus reducing the complexity of feature preprocessing. Word2Vec and natural language processing technology are integrated to achieve the goal of directly evaluating missing image features. The robustness of the image feature evaluation model is improved by using the strong feature analysis characteristics of a convolutional neural network. This project also aims to improve existing image feature identification methods and eliminate subjective influence in the evaluation process. The findings from the simulations indicate that the novel approach is viable, effectively augmenting the features within the produced representations.

  • Computer Science - Computation and Language;
  • Computer Science - Artificial Intelligence;
  • Computer Science - Machine Learning

Private Cloud Compute: A new frontier for AI privacy in the cloud

Apple Intelligence is the personal intelligence system that brings powerful generative models to iPhone, iPad, and Mac. For advanced features that need to reason over complex data with larger foundation models, we created Private Cloud Compute (PCC), a groundbreaking cloud intelligence system designed specifically for private AI processing. For the first time ever, Private Cloud Compute extends the industry-leading security and privacy of Apple devices into the cloud, making sure that personal user data sent to PCC isn’t accessible to anyone other than the user — not even to Apple. Built with custom Apple silicon and a hardened operating system designed for privacy, we believe PCC is the most advanced security architecture ever deployed for cloud AI compute at scale.

Apple has long championed on-device processing as the cornerstone for the security and privacy of user data. Data that exists only on user devices is by definition disaggregated and not subject to any centralized point of attack. When Apple is responsible for user data in the cloud, we protect it with state-of-the-art security in our services — and for the most sensitive data, we believe end-to-end encryption is our most powerful defense. For cloud services where end-to-end encryption is not appropriate, we strive to process user data ephemerally or under uncorrelated randomized identifiers that obscure the user’s identity.

Secure and private AI processing in the cloud poses a formidable new challenge. Powerful AI hardware in the data center can fulfill a user’s request with large, complex machine learning models — but it requires unencrypted access to the user's request and accompanying personal data. That precludes the use of end-to-end encryption, so cloud AI applications have to date employed traditional approaches to cloud security. Such approaches present a few key challenges:

  • Cloud AI security and privacy guarantees are difficult to verify and enforce. If a cloud AI service states that it does not log certain user data, there is generally no way for security researchers to verify this promise — and often no way for the service provider to durably enforce it. For example, a new version of the AI service may introduce additional routine logging that inadvertently logs sensitive user data without any way for a researcher to detect this. Similarly, a perimeter load balancer that terminates TLS may end up logging thousands of user requests wholesale during a troubleshooting session.
  • It’s difficult to provide runtime transparency for AI in the cloud. Cloud AI services are opaque: providers do not typically specify details of the software stack they are using to run their services, and those details are often considered proprietary. Even if a cloud AI service relied only on open source software, which is inspectable by security researchers, there is no widely deployed way for a user device (or browser) to confirm that the service it’s connecting to is running an unmodified version of the software that it purports to run, or to detect that the software running on the service has changed.
  • It’s challenging for cloud AI environments to enforce strong limits to privileged access. Cloud AI services are complex and expensive to run at scale, and their runtime performance and other operational metrics are constantly monitored and investigated by site reliability engineers and other administrative staff at the cloud service provider. During outages and other severe incidents, these administrators can generally make use of highly privileged access to the service, such as via SSH and equivalent remote shell interfaces. Though access controls for these privileged, break-glass interfaces may be well-designed, it’s exceptionally difficult to place enforceable limits on them while they’re in active use. For example, a service administrator who is trying to back up data from a live server during an outage could inadvertently copy sensitive user data in the process. More perniciously, criminals such as ransomware operators routinely strive to compromise service administrator credentials precisely to take advantage of privileged access interfaces and make away with user data.

When on-device computation with Apple devices such as iPhone and Mac is possible, the security and privacy advantages are clear: users control their own devices, researchers can inspect both hardware and software, runtime transparency is cryptographically assured through Secure Boot, and Apple retains no privileged access (as a concrete example, the Data Protection file encryption system cryptographically prevents Apple from disabling or guessing the passcode of a given iPhone).

However, to process more sophisticated requests, Apple Intelligence needs to be able to enlist help from larger, more complex models in the cloud. For these cloud requests to live up to the security and privacy guarantees that our users expect from our devices, the traditional cloud service security model isn't a viable starting point. Instead, we need to bring our industry-leading device security model, for the first time ever, to the cloud.

The rest of this post is an initial technical overview of Private Cloud Compute, to be followed by a deep dive after PCC becomes available in beta. We know researchers will have many detailed questions, and we look forward to answering more of them in our follow-up post.

Designing Private Cloud Compute

We set out to build Private Cloud Compute with a set of core requirements:

  • Stateless computation on personal user data. Private Cloud Compute must use the personal user data that it receives exclusively for the purpose of fulfilling the user’s request. This data must never be available to anyone other than the user, not even to Apple staff, not even during active processing. And this data must not be retained, including via logging or for debugging, after the response is returned to the user. In other words, we want a strong form of stateless data processing where personal data leaves no trace in the PCC system.
  • Enforceable guarantees. Security and privacy guarantees are strongest when they are entirely technically enforceable, which means it must be possible to constrain and analyze all the components that critically contribute to the guarantees of the overall Private Cloud Compute system. To use our example from earlier, it’s very difficult to reason about what a TLS-terminating load balancer may do with user data during a debugging session. Therefore, PCC must not depend on such external components for its core security and privacy guarantees. Similarly, operational requirements such as collecting server metrics and error logs must be supported with mechanisms that do not undermine privacy protections.
  • No privileged runtime access. Private Cloud Compute must not contain privileged interfaces that would enable Apple’s site reliability staff to bypass PCC privacy guarantees, even when working to resolve an outage or other severe incident. This also means that PCC must not support a mechanism by which the privileged access envelope could be enlarged at runtime, such as by loading additional software.
  • Non-targetability. An attacker should not be able to attempt to compromise personal data that belongs to specific, targeted Private Cloud Compute users without attempting a broad compromise of the entire PCC system. This must hold true even for exceptionally sophisticated attackers who can attempt physical attacks on PCC nodes in the supply chain or attempt to obtain malicious access to PCC data centers. In other words, a limited PCC compromise must not allow the attacker to steer requests from specific users to compromised nodes; targeting users should require a wide attack that’s likely to be detected. To understand this more intuitively, contrast it with a traditional cloud service design where every application server is provisioned with database credentials for the entire application database, so a compromise of a single application server is sufficient to access any user’s data, even if that user doesn’t have any active sessions with the compromised application server.
  • Verifiable transparency. Security researchers need to be able to verify, with a high degree of confidence, that our privacy and security guarantees for Private Cloud Compute match our public promises. We already have an earlier requirement for our guarantees to be enforceable. Hypothetically, then, if security researchers had sufficient access to the system, they would be able to verify the guarantees. But this last requirement, verifiable transparency, goes one step further and does away with the hypothetical: security researchers must be able to verify the security and privacy guarantees of Private Cloud Compute, and they must be able to verify that the software that’s running in the PCC production environment is the same as the software they inspected when verifying the guarantees.

This is an extraordinary set of requirements, and one that we believe represents a generational leap over any traditional cloud service security model.

Introducing Private Cloud Compute nodes

The root of trust for Private Cloud Compute is our compute node: custom-built server hardware that brings the power and security of Apple silicon to the data center, with the same hardware security technologies used in iPhone, including the Secure Enclave and Secure Boot. We paired this hardware with a new operating system: a hardened subset of the foundations of iOS and macOS tailored to support Large Language Model (LLM) inference workloads while presenting an extremely narrow attack surface. This allows us to take advantage of iOS security technologies such as Code Signing and sandboxing.

On top of this foundation, we built a custom set of cloud extensions with privacy in mind. We excluded components that are traditionally critical to data center administration, such as remote shells and system introspection and observability tools. We replaced those general-purpose software components with components that are purpose-built to deterministically provide only a small, restricted set of operational metrics to SRE staff. And finally, we used Swift on Server to build a new Machine Learning stack specifically for hosting our cloud-based foundation model.

Let’s take another look at our core Private Cloud Compute requirements and the features we built to achieve them.

Stateless computation and enforceable guarantees

With services that are end-to-end encrypted, such as iMessage, the service operator cannot access the data that transits through the system. One of the key reasons such designs can assure privacy is specifically because they prevent the service from performing computations on user data. Since Private Cloud Compute needs to be able to access the data in the user’s request to allow a large foundation model to fulfill it, complete end-to-end encryption is not an option. Instead, the PCC compute node must have technical enforcement for the privacy of user data during processing, and must be incapable of retaining user data after its duty cycle is complete.

We designed Private Cloud Compute to make several guarantees about the way it handles user data:

  • A user’s device sends data to PCC for the sole, exclusive purpose of fulfilling the user’s inference request. PCC uses that data only to perform the operations requested by the user.
  • User data stays on the PCC nodes that are processing the request only until the response is returned. PCC deletes the user’s data after fulfilling the request, and no user data is retained in any form after the response is returned.
  • User data is never available to Apple — even to staff with administrative access to the production service or hardware.

When Apple Intelligence needs to draw on Private Cloud Compute, it constructs a request — consisting of the prompt, plus the desired model and inferencing parameters — that will serve as input to the cloud model. The PCC client on the user’s device then encrypts this request directly to the public keys of the PCC nodes that it has first confirmed are valid and cryptographically certified. This provides end-to-end encryption from the user’s device to the validated PCC nodes, ensuring the request cannot be accessed in transit by anything outside those highly protected PCC nodes. Supporting data center services, such as load balancers and privacy gateways, run outside of this trust boundary and do not have the keys required to decrypt the user’s request, thus contributing to our enforceable guarantees.
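
The sketch below illustrates the encrypt-to-node-key pattern generically, using X25519 key agreement plus AES-GCM from the Python `cryptography` package. This is not Apple's actual protocol (which additionally validates node certificates and attestations); it only shows the core idea that intermediaries see ciphertext and only the holder of the node's private key can read the request:

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# The node publishes an attested public key; the client encrypts directly
# to it, so load balancers and relays in between see only ciphertext.
node_priv = X25519PrivateKey.generate()          # lives inside the node
node_pub = node_priv.public_key()

def encrypt_request(plaintext: bytes, node_public_key):
    eph = X25519PrivateKey.generate()            # fresh key per request
    shared = eph.exchange(node_public_key)
    key = HKDF(hashes.SHA256(), 32, None, b"toy-request-v1").derive(shared)
    nonce = os.urandom(12)
    ct = AESGCM(key).encrypt(nonce, plaintext, None)
    return eph.public_key(), nonce, ct

eph_pub, nonce, ct = encrypt_request(b'{"prompt": "...", "model": "..."}', node_pub)
# Only the node, holding node_priv, can re-derive the key and decrypt.
```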

Next, we must protect the integrity of the PCC node and prevent any tampering with the keys used by PCC to decrypt user requests. The system uses Secure Boot and Code Signing for an enforceable guarantee that only authorized and cryptographically measured code is executable on the node. All code that can run on the node must be part of a trust cache that has been signed by Apple, approved for that specific PCC node, and loaded by the Secure Enclave such that it cannot be changed or amended at runtime. This also ensures that JIT mappings cannot be created, preventing compilation or injection of new code at runtime. Additionally, all code and model assets use the same integrity protection that powers the Signed System Volume. Finally, the Secure Enclave provides an enforceable guarantee that the keys that are used to decrypt requests cannot be duplicated or extracted.

The Private Cloud Compute software stack is designed to ensure that user data is not leaked outside the trust boundary or retained once a request is complete, even in the presence of implementation errors. The Secure Enclave randomizes the data volume’s encryption keys on every reboot and does not persist these random keys, ensuring that data written to the data volume cannot be retained across reboot. In other words, there is an enforceable guarantee that the data volume is cryptographically erased every time the PCC node’s Secure Enclave Processor reboots. The inference process on the PCC node deletes data associated with a request upon completion, and the address spaces that are used to handle user data are periodically recycled to limit the impact of any data that may have been unexpectedly retained in memory.

Finally, for our enforceable guarantees to be meaningful, we also need to protect against exploitation that could bypass these guarantees. Technologies such as Pointer Authentication Codes and sandboxing act to resist such exploitation and limit an attacker’s horizontal movement within the PCC node. The inference control and dispatch layers are written in Swift, ensuring memory safety, and use separate address spaces to isolate initial processing of requests. This combination of memory safety and the principle of least privilege removes entire classes of attacks on the inference stack itself and limits the level of control and capability that a successful attack can obtain.

No privileged runtime access

We designed Private Cloud Compute to ensure that privileged access doesn’t allow anyone to bypass our stateless computation guarantees.

First, we intentionally did not include remote shell or interactive debugging mechanisms on the PCC node. Our Code Signing machinery prevents such mechanisms from loading additional code, but this sort of open-ended access would provide a broad attack surface to subvert the system’s security or privacy. Beyond simply not including a shell, remote or otherwise, PCC nodes cannot enable Developer Mode and do not include the tools needed by debugging workflows.

Next, we built the system’s observability and management tooling with privacy safeguards that are designed to prevent user data from being exposed. For example, the system doesn’t even include a general-purpose logging mechanism. Instead, only pre-specified, structured, and audited logs and metrics can leave the node, and multiple independent layers of review help prevent user data from accidentally being exposed through these mechanisms. With traditional cloud AI services, such mechanisms might allow someone with privileged access to observe or collect user data.

Together, these techniques provide enforceable guarantees that only specifically designated code has access to user data and that user data cannot leak outside the PCC node during system administration.

Non-targetability

Our threat model for Private Cloud Compute includes an attacker with physical access to a compute node and a high level of sophistication — that is, an attacker who has the resources and expertise to subvert some of the hardware security properties of the system and potentially extract data that is being actively processed by a compute node.

We defend against this type of attack in two ways:

  • We supplement the built-in protections of Apple silicon with a hardened supply chain for PCC hardware, so that performing a hardware attack at scale would be both prohibitively expensive and likely to be discovered.
  • We limit the impact of small-scale attacks by ensuring that they cannot be used to target the data of a specific user.

Private Cloud Compute hardware security starts at manufacturing, where we inventory and perform high-resolution imaging of the components of the PCC node before each server is sealed and its tamper switch is activated. When they arrive in the data center, we perform extensive revalidation before the servers are allowed to be provisioned for PCC. The process involves multiple Apple teams that cross-check data from independent sources, and the process is further monitored by a third-party observer not affiliated with Apple. At the end, a certificate is issued for keys rooted in the Secure Enclave UID for each PCC node. The user’s device will not send data to any PCC nodes if it cannot validate their certificates.

These processes broadly protect hardware from compromise. To guard against smaller, more sophisticated attacks that might otherwise avoid detection, Private Cloud Compute uses an approach we call target diffusion to ensure requests cannot be routed to specific nodes based on the user or their content.

Target diffusion starts with the request metadata, which leaves out any personally identifiable information about the source device or user, and includes only limited contextual data about the request that’s required to enable routing to the appropriate model. This metadata is the only part of the user’s request that is available to load balancers and other data center components running outside of the PCC trust boundary. The metadata also includes a single-use credential, based on RSA Blind Signatures, to authorize valid requests without tying them to a specific user. Additionally, PCC requests go through an OHTTP relay — operated by a third party — which hides the device’s source IP address before the request ever reaches the PCC infrastructure. This prevents an attacker from using an IP address to identify requests or associate them with an individual. It also means that an attacker would have to compromise both the third-party relay and our load balancer to steer traffic based on the source IP address.
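
The unlinkability of that single-use credential rests on blind signatures. The toy below shows the blinding idea with textbook RSA and deliberately insecure parameters; the standardized RSA Blind Signatures scheme adds padding and many other protections:

```python
# Toy, textbook-RSA illustration of blinding (wildly insecure parameters):
# the issuer signs a blinded token without ever seeing the token itself,
# so the later-presented credential cannot be linked back to issuance.
import math, secrets

p, q, e = 1009, 1013, 65537
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))        # issuer's private exponent

token = 123456                           # client's credential, kept hidden
while True:                              # blinding factor coprime to n
    r = secrets.randbelow(n - 2) + 2
    if math.gcd(r, n) == 1:
        break

blinded = (token * pow(r, e, n)) % n     # client blinds the token
blind_sig = pow(blinded, d, n)           # issuer signs blindly
sig = (blind_sig * pow(r, -1, n)) % n    # client unblinds

assert pow(sig, e, n) == token           # anyone can verify; issuer can't link
```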

User devices encrypt requests only for a subset of PCC nodes, rather than the PCC service as a whole. When asked by a user device, the load balancer returns a subset of PCC nodes that are most likely to be ready to process the user’s inference request — however, as the load balancer has no identifying information about the user or device for which it’s choosing nodes, it cannot bias the set for targeted users. By limiting the PCC nodes that can decrypt each request in this way, we ensure that if a single node were ever to be compromised, it would not be able to decrypt more than a small portion of incoming requests. Finally, the selection of PCC nodes by the load balancer is statistically auditable to protect against a highly sophisticated attack where the attacker compromises a PCC node as well as obtains complete control of the PCC load balancer.

Verifiable transparency

We consider allowing security researchers to verify the end-to-end security and privacy guarantees of Private Cloud Compute to be a critical requirement for ongoing public trust in the system. Traditional cloud services do not make their full production software images available to researchers — and even if they did, there’s no general mechanism to allow researchers to verify that those software images match what’s actually running in the production environment. (Some specialized mechanisms exist, such as Intel SGX and AWS Nitro attestation.)

When we launch Private Cloud Compute, we’ll take the extraordinary step of making software images of every production build of PCC publicly available for security research. This promise, too, is an enforceable guarantee: user devices will be willing to send data only to PCC nodes that can cryptographically attest to running publicly listed software. We want to ensure that security and privacy researchers can inspect Private Cloud Compute software, verify its functionality, and help identify issues — just like they can with Apple devices.

Our commitment to verifiable transparency includes:

  • Publishing the measurements of all code running on PCC in an append-only and cryptographically tamper-proof transparency log.
  • Making the log and associated binary software images publicly available for inspection and validation by privacy and security experts.
  • Publishing and maintaining an official set of tools for researchers analyzing PCC node software.
  • Rewarding important research findings through the Apple Security Bounty program.

Every production Private Cloud Compute software image will be published for independent binary inspection — including the OS, applications, and all relevant executables, which researchers can verify against the measurements in the transparency log. Software will be published within 90 days of inclusion in the log, or after relevant software updates are available, whichever is sooner. Once a release has been signed into the log, it cannot be removed without detection, much like the log-backed map data structure used by the Key Transparency mechanism for iMessage Contact Key Verification.
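
A hash-chained toy log conveys why removal without detection is impossible; the real mechanism is a log-backed map with Merkle inclusion proofs, so treat this as the minimal version of the idea:

```python
import hashlib

class TransparencyLog:
    """Append-only log: each head commits to every prior entry, so any
    rewrite of history changes every subsequent head."""
    def __init__(self):
        self.entries = []
        self.heads = [hashlib.sha256(b"genesis").hexdigest()]

    def append(self, measurement: bytes) -> str:
        head = hashlib.sha256(self.heads[-1].encode() + measurement).hexdigest()
        self.entries.append(measurement)
        self.heads.append(head)
        return head                      # publish this; clients pin it

    def verify(self) -> bool:            # recompute the chain from scratch
        h = hashlib.sha256(b"genesis").hexdigest()
        for m in self.entries:
            h = hashlib.sha256(h.encode() + m).hexdigest()
        return h == self.heads[-1]

log = TransparencyLog()
log.append(b"pcc-build-1: sha256:abc...")
log.append(b"pcc-build-2: sha256:def...")
assert log.verify()
log.entries[0] = b"tampered"             # any rewrite breaks the final head
assert not log.verify()
```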

As we mentioned, user devices will ensure that they’re communicating only with PCC nodes running authorized and verifiable software images. Specifically, the user’s device will wrap its request payload key only to the public keys of those PCC nodes whose attested measurements match a software release in the public transparency log. And the same strict Code Signing technologies that prevent loading unauthorized software also ensure that all code on the PCC node is included in the attestation.

Making Private Cloud Compute software logged and inspectable in this way is a strong demonstration of our commitment to enable independent research on the platform. But we want to ensure researchers can rapidly get up to speed, verify our PCC privacy claims, and look for issues, so we’re going further with three specific steps:

  • We’ll release a PCC Virtual Research Environment: a set of tools and images that simulate a PCC node on a Mac with Apple silicon, and that can boot a version of PCC software minimally modified for successful virtualization.
  • While we’re publishing the binary images of every production PCC build, to further aid research we will periodically also publish a subset of the security-critical PCC source code.
  • In a first for any Apple platform, PCC images will include the sepOS firmware and the iBoot bootloader in plaintext, making it easier than ever for researchers to study these critical components.

The Apple Security Bounty will reward research findings in the entire Private Cloud Compute software stack — with especially significant payouts for any issues that undermine our privacy claims.

More to come

Private Cloud Compute continues Apple’s profound commitment to user privacy. With sophisticated technologies to satisfy our requirements of stateless computation, enforceable guarantees, no privileged access, non-targetability, and verifiable transparency, we believe Private Cloud Compute is nothing short of the world-leading security architecture for cloud AI compute at scale.

We look forward to sharing many more technical details about PCC, including the implementation and behavior behind each of our core requirements. And we’re especially excited to soon invite security researchers for a first look at the Private Cloud Compute software and our PCC Virtual Research Environment.

A lightweight approach for image quality assessment

  • Original Paper
  • Published: 17 June 2024

Cite this article


  • Quang Minh Dang, Minh Tuyen Truong & Tuan Linh Dang

Image quality assessment is a vital computer vision task for image validation and visual experience development. Lately, most research in this field has focused on enhancing model performance, resulting in significant growth in model size. Such models may require considerable storage resources and computational costs, and little attention has been paid to small-parameter models. To contribute a lightweight model for this task, this paper proposes a new module, GhostDPD, and uses it to construct a MobileDPD model. The GhostDPD structure has lightweight attention layers, two depth-wise convolutional layers, and a module with fewer parameters that replaces the point-wise layer. Experiments on different datasets showed that the proposed model achieved results similar to state-of-the-art approaches despite being much smaller.
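
The paper's exact GhostDPD layout is only summarized above, so the PyTorch sketch below is an assumption-laden reconstruction: two depth-wise convolutions around a grouped 1x1 convolution standing in for the lower-parameter replacement of the point-wise layer. Layer sizes and the grouping factor are guesses for illustration:

```python
import torch
import torch.nn as nn

class DepthwiseBlock(nn.Module):
    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        self.dw1 = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # Grouped 1x1 conv as a cheap stand-in for a full point-wise layer
        # (channels**2 / groups weights instead of channels**2).
        self.mix = nn.Conv2d(channels, channels, 1, groups=groups)
        self.dw2 = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.act(self.dw2(self.mix(self.act(self.dw1(x)))))

x = torch.randn(1, 32, 56, 56)
print(DepthwiseBlock(32)(x).shape)   # torch.Size([1, 32, 56, 56])
```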



Data availability

The KonIQ-10K data can be obtained from the authors of the dataset. Our internal dataset can be obtained from the corresponding author upon reasonable request.


Acknowledgements

This research was funded by Hanoi University of Science and Technology (HUST) under grant number T2022-PC-052, and partially supported by NAVER Corporation within the framework of collaboration with the International Research Center for Artificial Intelligence (BKAI), School of Information and Communications Technology, HUST, under project NAVER.2022.DA02.

Author information

Authors and Affiliations

School of Information and Communications Technology, Hanoi University of Science and Technology, 01 Dai Co Viet Road, Hanoi, 100000, Vietnam

Quang Minh Dang, Minh Tuyen Truong & Tuan Linh Dang


Contributions

Quang Minh Dang was involved in the investigation, implementation, writing, and preparation of figures and tables. Minh Tuyen Truong was responsible for implementation and writing. Tuan Linh Dang participated in conceptualization, investigation, writing, supervision, project administration, and funding acquisition. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Tuan Linh Dang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Dang, Q.M., Truong, M.T. & Dang, T.L. A lightweight approach for image quality assessment. SIViP (2024). https://doi.org/10.1007/s11760-024-03349-0


Received: 29 March 2024

Revised: 11 May 2024

Accepted: 01 June 2024

Published: 17 June 2024

DOI: https://doi.org/10.1007/s11760-024-03349-0


Keywords

  • Image assessment
  • Lightweight
  • No-reference quality assessment
  • Deep learning
