
Best AI Tools for Research Paper Writing in 2024

Nikhil Seethi


Before discussing the unique benefits and characteristics of using AI tools in research, let’s look at some of the AI writing tools that can help you write a research paper. There’s a plethora of AI research tools available today, tailored to different research purposes and offering different capabilities. Here are the five best AI tools for research that you can incorporate into your research and writing process.

Top 5 AI Tools for Research Paper Writing

  • SciSpace Literature Review
  • ChatGPT | Custom GPTs for Research Writing
  • WordVice
  • Wordtune
  • Otter.ai

Let's explore these tools in detail!

1. SciSpace Literature Review

SciSpace’s Literature Review tool is designed for researchers to conduct systematic literature reviews efficiently. This AI research tool helps you compare and contrast scientific papers, and extract key information, themes, patterns, and concepts of numerous research papers in minutes, all in one single workspace.

By simply entering your questions, keywords, or niche-specific topics, you can find relevant research articles from SciSpace’s repository of 282M+ scientific papers. With the matrix feature, you can filter relevant research papers and derive useful insights while saving time. You can extract summaries of all the prominent sections of a research paper, including the abstract, introduction, methodology, discussion, and conclusion, and simultaneously ask follow-up questions to Copilot, the AI research assistant, to identify gaps in the literature or even extract empirical data from papers. This lets you dive deeper into each paper and get more clarity on the concepts and ideas within it. With an AI research tool like SciSpace Literature Review, researchers can save hours of valuable time and boost the overall quality and productivity of their research while improving their academic writing.

Here’s a sample:

[Screenshot: Using SciSpace Literature Review for research paper outlines]

2. ChatGPT | Custom GPTs for Research Writing

ChatGPT serves as a valuable ally for brainstorming research paper outlines and structures. Researchers can leverage ChatGPT to generate ideas, refine research questions, and outline the paper's framework. Custom GPTs tailored to research paper writing can also be very helpful.

In particular, custom GPTs for research writing, like SciSpace’s GPT (formerly ResearchGPT), are trained on scientific papers and can help you locate relevant literature with ease and brainstorm ideas for your research paper.

An example of a simple outline:

[Screenshot: ResearchGPT for research paper writing]

3. WordVice

WordVice enhances language precision, ensuring the clarity and coherence of research papers. It offers advanced proofreading, language enhancement, and coherence improvement.

By automatically detecting and correcting errors in academic papers, WordVice lets researchers keep their words flowing without the typical interruptions to make amends. It can also give coherent explanations of each error and suggest stylistic changes to your writing.

This AI writing tool is incredibly helpful when writing your research paper, especially if you’re not a native speaker of the language or don’t have a firm grasp of grammar and the conventional usage of academic writing.

Take a look at this example:

[Screenshot: Writing a research paper using Wordvice]

4. Wordtune

Wordtune is a personal AI writing assistant that excels in paraphrasing, aiding researchers in presenting ideas with clarity and originality. It helps rephrase and refine sentences, avoiding plagiarism and enhancing the overall quality of writing.

Do you want to sound more authentic in your writing and find the appropriate words for different contexts? Wordtune can help you fine-tune any text and improve the quality of your writing by suggesting a plethora of options through its Spices feature, which generates text and gives you different options to choose from.

This AI writing tool also includes an option to ‘Incorporate data from verified sources’ to make your paper more empirically grounded and relevant. This way, Wordtune helps you deal with writer’s block, gives you relevant information, and suggests stylistic adjustments to boost the quality of your research paper.

Here's an example:

[Screenshot: How to use Wordtune for writing a research paper]

5. Otter.ai

Otter.ai, an AI-powered transcription tool, is a valuable asset for researchers, particularly those involved in the social sciences. It streamlines tasks such as transcribing interviews, capturing research ideas, and converting spoken words into text through mobile dictation.

Despite the inherent challenges of AI-generated transcriptions, Otter excels in providing quick and automated solutions, saving significant time for researchers. The tool's support for real-time transcriptions enhances its utility as an efficient starting point for various research endeavors.

Here’s what it might look like to get started:

[Screenshot: Using Otter for writing a scientific paper]

Benefits and Challenges of Using AI Tools

While these tools offer immense benefits, challenges exist. AI research tools can streamline tasks but must be used responsibly. The responsible integration of AI into research processes ensures a harmonious blend of technological guidance without compromising the researcher's intellectual rigor and creative input.

Let’s look at some of the main benefits and challenges of using AI for research paper writing.

Benefits:

  • Time Efficiency: AI tools accelerate tedious tasks, allowing researchers to focus on critical thinking and analysis. They also enable researchers to manage their time better by focusing on areas that require manual, intellectual effort.
  • Language Precision: Tools like WordVice and Wordtune refine language, elevating the quality of communication in research papers.
  • Heuristic support: AI tools can help you get acquainted with new theories, concepts, and arguments while offering personalized support to your learning process. Being better informed will positively reflect in your writing as well.
  • Improving overall quality: The tools we’ve seen so far are capable of improving your overall work quality. By helping you with idea organization, extracting information, creating rough drafts, and fine-tuning your writing skills, AI tools can improve your work’s overall quality.

Challenges:

  • Overreliance: Researchers must avoid overreliance on AI, preserving their unique insights and analytical skills. This balance ought to be cultivated in budding researchers, allowing them to explore diverse sources of information while remaining anchored in good research practices.
  • Potential for Bias: While we may like to think that AI tools have overcome biases, the truth is far from it. Especially in research, AI tools can generate responses that conform to the restricted data available at the source, betraying their limited ability to recognize their own biases.
  • Ethical Considerations: Responsible use involves addressing ethical concerns, such as data privacy and algorithmic biases. With the use of AI tools, the necessity for ethical considerations and regulations has become even more important in research.
  • Inherent limitations: AI has inherent limitations that stem from its design. For example, despite several attempts to overcome AI hallucinations, we still encounter AI tools generating fake or untrue information, precisely because these models run on iterative text prediction in a given context.

Despite the many challenges associated with it, AI tools can indeed propel radical changes in scientific research. Embracing the best AI tools for research paper writing pushes academia into a new era of efficiency and precision.

The synergy between human intellect and AI tools can cultivate a research landscape where technological advancements complement, rather than overshadow, our intellectual endeavors.

As you embark on your next research journey, consider integrating these tools and navigating the intricate path toward academic success with confidence.

Frequently Asked Questions

1. How do I use AI for academic writing?

2. Can AI tools help me brainstorm and outline my research paper?

Yes, AI tools like SciSpace can assist in generating ideas and outlining a research paper. AI tools can also help refine your writing, including the style and overall structure of your ideas.

3. Which is the top AI tool for writing research papers in 2024?

SciSpace is the best AI tool for writing research papers in 2024. It has a plethora of AI tools geared towards locating relevant papers, conducting literature reviews, and finessing your research writing.

4. What are the pros and cons of using AI tools for academic writing?

Pros include time efficiency, language precision, heuristic support, and improved productivity. Cons include overreliance on AI tools, potential for bias, ethical challenges, and limitations of the models themselves.


Cheat Sheet | Large Language Models+ For Scientific Research


Updated August 2023

LLMs and Tools for R&D

To help scientists and researchers navigate the increasing number of advanced artificial intelligence (AI) options, Enthought’s experts put together this summary of Large Language Models (LLMs) and related tools that are most relevant for R&D, updated as of early August 2023. This is a fast-moving field, so expect that the landscape will continue to change quickly.

We also suggest getting started with our What Every R&D Leader Needs to Know About ChatGPT and LLMs on-demand webinar as well as these additional resources:

  • The Practical Guides for Large Language Models (GitHub)
  • Applications of transformer-based language models in bioinformatics: a survey (ISCB)
  • Recent advances and applications of deep learning methods in materials science (Nature)


The Major Players

Of the major players in AI, only OpenAI is currently offering their LLMs as a commercial service, and then only by invitation (as of this writing). However, many companies have experimental or non-commercial models to experiment with. Keep IP issues in mind with these.

OpenAI - openai.com OpenAI offers a variety of different LLMs and APIs addressing different use-cases, including fine-tuning models on your own data. Serious commercial use should be via the APIs, which are currently available by invitation.
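For programmatic use, the APIs are reached through OpenAI's client libraries. Below is a minimal sketch of a chat completion call, assuming the `openai` Python package as of mid-2023 and an API key in the OPENAI_API_KEY environment variable; model names and request formats change over time, so check OpenAI's current documentation rather than treating this as definitive.

```python
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# Ask a chat model a single question. The model name is an assumption;
# substitute whichever chat model your account can access.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a concise research assistant."},
        {"role": "user", "content": "Summarize the main uses of LLMs in R&D."},
    ],
    temperature=0.2,  # lower temperature gives more deterministic output
)
print(response["choices"][0]["message"]["content"])
```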

Meta AI LLaMA 2 - github.com/facebookresearch/llama/blob/main/MODEL_CARD.md A collection of related LLMs released by Meta AI (Facebook). Unlike version 1, version 2 is available for commercial and research purposes.

Google Bard - bard.google.com Google’s experimental LLM. No public APIs available yet, and chatbot conversations are used for further training, so not yet ready for commercial use.

Amazon AlexaTM - github.com/amazon-science/alexa-teacher-models Amazon Science’s LLM, which can be accessed for non-commercial use via AWS SageMaker.

Anthropic Claude - claude.ai Unique model because of its large context window (100k+ tokens), allowing it to answer questions about longer documents. API access is only available via inquiries. A chat interface is generally available, but conversations may be used for further training, so not a commercial option.

Hugging Face - huggingface.co Hugging Face provides infrastructure support for LLM and other Machine Learning operations, including hosting, training and deployment of models. They also host some internally developed and open-source models such as BLOOM.

Open-Source LLMs

If you want to train, fine-tune, or run an LLM on your own, there are a number of models available, ranging from older models from major AI companies, to non-commercial research models, to more recent, permissively licensed models.

Google BERT - github.com/google-research/bert One of the first openly available transformer-based LLMs, available under the permissive Apache 2.0 license. BERT is the foundation for many of the tools for scientific applications of LLMs.

OpenAI GPT-2 - github.com/openai/gpt-2 OpenAI’s 2nd generation LLM, released under a permissive MIT license. GPT-2 is now four years old and well behind the state of the art, but it was ground-breaking at the time.

BLOOM - bigscience.huggingface.co/blog/bloom A multi-lingual LLM by a large consortium of researchers and organizations, including Hugging Face. It is open-sourced under the Responsible AI License (usable commercially with some restrictions, particularly around disclosure and medical use-cases). There is also BLOOMZ, which is fine-tuned for following instructions rather than conversation.

Falcon LLM - huggingface.co/tiiuae An LLM released by the Technology Innovation Institute under a permissive Apache 2.0 license. This is used as a basis for a number of other open tools, such as LAION’s Open Assistant (https://open-assistant.io/).

MPT-30 - mosaicml.com/blog/mpt-30b A collection of LLMs with different optimizations trained inexpensively on very large input sets. Released by MosaicML under the Apache 2.0 license with the intent that it is commercially usable.

Dolly/Pythia - huggingface.co/databricks/dolly-v2-12b An LLM tuned by Databricks based on the Pythia LLM. It is not cutting edge but is large and released under an MIT license.

Stanford University Alpaca - crfm.stanford.edu/2023/03/13/alpaca.html A model based on Meta’s LLaMA v1 produced by the Center for Research on Foundation Models (CRFM) group at Stanford. The model is open-sourced under a non-commercial license and designed to be trained inexpensively on smaller data sets. There are a number of other models derived from this, such as Vicuna (lmsys.org/blog/2023-03-30-vicuna).

LeRF - lerf.io LeRF combines the ability to reconstruct a 3D scene from a handful of still images using Neural Radiance Fields (NeRF) with LLMs, allowing easy searching of a 3D scene using natural language. The models and code are open source, but currently without a license, and so not yet commercially usable.

Toolkits and APIs

To go beyond simple chat applications of LLMs, you will need some tools to connect the models with other services or even libraries to build and train your own models.

Transformers - huggingface.co/docs/transformers/index A toolkit built on top of PyTorch and TensorFlow that provides building blocks for LLMs as well as other state-of-the-art machine learning models. It also integrates with the Hugging Face public API to facilitate building, training and running models in the cloud, as well as accessing many 3rd party models.
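As an illustration, the sketch below uses the `pipeline` interface from `transformers` to run a small text-generation model locally; GPT-2 is chosen here only because it is small and permissively licensed, not because it is state-of-the-art.

```python
from transformers import pipeline

# Downloads the model weights from the Hugging Face hub on first use.
generator = pipeline("text-generation", model="gpt2")

result = generator(
    "Large language models can help scientists",
    max_new_tokens=30,      # cap the length of the generated continuation
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```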

LangChain - python.langchain.com/en/latest/index.html LangChain is a toolkit for building LLM-centered applications, particularly agents and assistants. It provides automation for building special-purpose prompts which work well with LLMs to produce particular types of outputs, as well as integration with other services such as data sources and code execution.
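A minimal sketch of a LangChain prompt-plus-model chain is shown below, assuming the pre-1.0 `langchain` Python API (circa 2023) and an OpenAI API key in the environment; newer LangChain releases have restructured these imports, so treat this as illustrative rather than definitive.

```python
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# A reusable prompt template with one input variable.
prompt = PromptTemplate(
    input_variables=["topic"],
    template="List three open research questions about {topic}.",
)

# Chain the template to an OpenAI completion model and run it.
chain = LLMChain(llm=OpenAI(temperature=0.3), prompt=prompt)
print(chain.run(topic="polymer degradation"))
```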

Science-Specific Tools

In the last few years there have been a number of high-profile papers and toolkits in Materials Science and Bioinformatics that use these new ML models. Most of these have source code and model weights freely available, but there are not yet any services built on top of them. They are research-grade software, not production-grade, with many based on LLM techniques that are a generation or two behind the current state-of-the-art. There are likely to be better models in the future.

ChemBERT - github.com/HyunSeobKim/CHEM-BERT Chemical property prediction from SMILES molecular structure representation. There are other models derived from this original work.

ChemCrow - github.com/ur-whitelab/chemcrow-public LangChain-based package for solving reasoning-intensive chemical tasks posed using natural language. This currently needs API access for OpenAI and possibly other APIs depending on the tasks.

ProteinBERT - github.com/nadavbra/protein_bert A framework for building protein property predictors from protein sequence information. The base model is designed to be fine-tuned for arbitrary properties.

TransUNet - github.com/Beckschen/TransUNet Next-generation medical image segmentation using transformer-based models. This has the potential to be cheaper to train and more capable of detecting large-scale structures in an image.

Enformer - huggingface.co/EleutherAI/enformer-preview Transformer-based gene expression and chromatin prediction from DNA sequences. Similar to LLMs, Enformer has the capability of tracking a wider context within a DNA sequence than previous models.


SciTechEdit International | Science Editing Services for Scientists and Academics

Researchers often find themselves juggling multiple responsibilities, leaving them with limited time to dedicate to the meticulous task of writing. Artificial intelligence (AI) in the form of large language models (LLM) has emerged as a powerful tool for assisting with academic and medical writing. In this article, we discuss why AI may be helpful in this context, how to apply AI effectively, when it is most beneficial, and when to use caution.

Why Use AI Tools for Science Writing?

The use of AI tools for science writing offers several compelling benefits.

First, it significantly enhances efficiency by expediting the writing process. AI-driven tools can swiftly generate drafts and summaries, allowing researchers to focus more on research and analysis.

Second, AI enhances productivity by automating routine tasks like formatting and referencing, freeing up valuable time for higher-level work.

Third, AI contributes to quality improvement by identifying and rectifying grammar and style issues, resulting in more polished and professional documents.

Lastly, AI ensures consistency in writing style, terminology, and formatting, a critical aspect of academic, medical, and science writing, especially in collaborative efforts. These advantages collectively empower writers to produce higher-quality, more efficient, and consistent academic and medical content.

How to Use AI Tools for Scientific Writing

Select the correct large language model.

When selecting the appropriate large language model (LLM) for writing your research manuscript using AI tools, consider several factors to ensure the best fit for your specific needs. Available LLMs include GPT-3.5, GPT-4, Bard, LLaMA, Falcon, Cohere, PaLM, and Claude v1, among others. To make an informed choice, pay close attention to the following characteristics:

  • Task Suitability: Determine whether the LLM is well-suited for your research manuscript, considering factors like natural language understanding, text generation, summarization, or translation.
  • Training Data: Look at the quality and quantity of the training data. Models trained on diverse and extensive datasets tend to perform better across various domains. Pay attention to how long ago the LLM was trained or if there is ongoing training.
  • Fine-tuning Capability: If your task requires domain-specific language, check if the LLM can be fine-tuned to improve its performance in your field.
  • Data Handling/Retention Practices: Before using any LLM, review the terms of service, privacy policies, and data handling practices of the specific platform or service where the LLM is deployed. These policies can differ, so understand how your data are treated when using any AI model or service.
  • Community and Support: Models with active user communities often have better documentation, support, and open-source resources.
  • Ethical Considerations: Be mindful of responsible AI usage and mitigation of biases, and choose models that align with your ethical principles.
  • Cost and Licensing: Consider the cost associated with using the model, as well as any licensing or usage restrictions.

By carefully evaluating these characteristics, you can select the LLM that best fits your research manuscript needs and leverage AI tools effectively in your writing process.

Next: Create effective prompts

Applying AI for writing can be a bit like learning to navigate a complex maze to yield the results you desire. Crafting effective prompts is a skill that evolves with practice and experimentation.

A prompt, in the context of AI, is a specific instruction or input used to generate the desired output or response. Prompts can take various forms, such as text, questions, or commands, and they are essential in shaping the AI’s output to align with the user’s intentions or objectives. The quality and specificity of a prompt play a significant role in determining the relevance and accuracy of the AI-generated results. The key to writing an effective prompt is to strike a balance between providing clear and specific instructions and avoiding unnecessary verbosity. How to get started (a worked sketch follows this list):

  • Be Specific: Clearly define the context and topic of your prompt. The more specific your prompt, the more focused and relevant the AI-generated response will be.
  • State Your Goal: Clearly express what you want from the AI. Whether it’s generating a summary, answering a question, or offering an explanation, specifying your goal helps the AI understand your intent.
  • Use Open-Ended Questions: Instead of asking yes/no questions, use open-ended questions that encourage in-depth responses. These questions often lead to more informative and insightful answers.
  • Provide Context: Briefly introduce the background or context related to the prompt. This can help the AI understand the subject matter and provide more relevant information.
  • Request Multiple Perspectives: If you’re seeking a balanced view or different angles on a topic, explicitly ask the AI to provide multiple perspectives or arguments.
  • Specify Format: If you have a preferred format for the response (e.g., bullet points, a summary, a list), mention it in your prompt. This guides the AI in structuring the output accordingly.
  • Request Citations: If you’re using the AI for research or academic purposes, ask the AI to provide citations or sources to back up its responses. This ensures the information’s credibility. Be sure to check the citations for accuracy. The accuracy of citations generated by AI depends on the quality and accuracy of the data it is trained on, as well as the specific algorithms and models used. See below for more on using AI for citations.
  • Define Tone and Style: If you have a specific tone or writing style in mind, describe it in your prompt. For example, you can request a formal or casual tone, or specify whether you want the response to be technical or easy to understand.
  • Ask for Step-by-Step Instructions: If you need a set of instructions, ask the AI to provide step-by-step guidance for a specific task or process.
  • Limit Word Count: If you want a concise response, specify the desired word or character count. This prevents the AI from generating overly lengthy responses. Note, however, that the word counts provided by AI are often incorrect, and it may be necessary to shorten or increase the length manually.
  • Experiment and Iterate: Don’t hesitate to experiment with different prompts and iterate to refine your queries. AI models can respond differently to slight changes in phrasing and wording.
  • Test for Consistency: If you’re seeking consistency in responses, ask the AI to maintain a consistent stance or viewpoint throughout the text.
  • Use Conditional Statements: Employ conditional statements like “If…then” to guide the AI in generating conditional or hypothetical responses.
  • Include Sample Input: If necessary, provide a sample input or data that the AI can refer to in generating the response.
  • Avoid Ambiguity: Ensure your prompts are unambiguous. Avoid vague language or unclear instructions, as this can lead to confusing or irrelevant responses.
  • Consider Ethical Implications: Be mindful of the ethical implications of your prompts. Use clear, unbiased language and avoid requests that promote harmful, biased, or discriminatory content.
  • Review and Revise: Always review the AI-generated output for accuracy and relevance. If necessary, revise the prompts and responses to achieve the desired outcome.
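Putting several of these guidelines together, here is a minimal sketch of a prompt builder in Python. The function and example values are hypothetical, meant only to show how context, goal, format, length, and tone can be stated explicitly; pass the resulting string to whichever LLM interface you use.

```python
def build_prompt(topic: str, audience: str, max_words: int) -> str:
    """Assemble a prompt with explicit context, goal, format, length, and tone."""
    return (
        f"Context: I am drafting a research manuscript on {topic} "
        f"for {audience}.\n"
        "Goal: Suggest an outline for the introduction section.\n"
        "Format: A bulleted list with one sentence per bullet.\n"
        f"Length: No more than {max_words} words in total.\n"
        "Tone: Formal academic English."
    )

prompt = build_prompt(
    topic="soil microbiome responses to drought",  # hypothetical topic
    audience="readers of an ecology journal",
    max_words=150,
)
print(prompt)  # send this string to the chat model of your choice
```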

Citation and Formatting Tools

Using AI tools for citations and formatting in your research manuscript can be efficient and time-saving, but it’s essential to follow a systematic approach while being mindful of potential pitfalls. Here’s a guide on how to use AI for citations and formatting efficiently, what input to provide, and where to use caution (a small prompt sketch follows this list).

  • Reference Correction: LLMs can help identify and correct errors in your reference list, such as inconsistencies in formatting, missing details, or incorrect citation styles. You can provide the LLM with your reference information, and it can generate correctly formatted references according to the chosen citation style (e.g., APA, MLA, Chicago, etc.).
  • Style Guide Compliance: LLMs can ensure that your references adhere to the specific style guidelines required by your academic institution or publication. They can provide guidance on the proper arrangement of elements like authors’ names, publication dates, titles, and more.
  • Citation Style Conversion: If you need to switch between different citation styles, LLMs can help you convert your existing references into the desired format. This is particularly helpful if you’re submitting your paper to multiple journals with varying citation requirements.
  • Checking for Missing Citations: LLMs can assist in checking your manuscript for any missing citations or references, helping you avoid unintentional plagiarism and ensuring that all sources are properly cited.
  • Consistency and Organization: LLMs can assist in maintaining consistency throughout your reference list, ensuring that all references are formatted in the same way and that the citation style is consistent across the entire document.
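To make the reference-correction and style-conversion tasks concrete, here is a hypothetical prompt sketch in Python; the reference shown is invented for illustration only, and any converted output should be verified against the original source.

```python
# An invented reference used only to illustrate the prompt structure.
reference = (
    "Smith J, Doe A. Deep learning for protein structure prediction. "
    "J Comp Biol. 2021;12(3):45-67."
)

prompt = (
    "Convert the following reference to APA 7th edition format. "
    "Return only the reformatted reference, with no commentary.\n\n"
    + reference
)
print(prompt)  # send to the LLM of your choice, then double-check the result
```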

When utilizing AI for referencing and formatting in your research manuscript, it’s crucial to exercise caution. Despite the general reliability of AI tools, the potential for errors remains, making it essential to meticulously double-check AI-generated citations and the formatting of your reference list to ensure accuracy. It’s equally important to ensure that the input data for your sources are comprehensive and precise, reducing the likelihood of inaccuracies.

Moreover, some AI tools may encounter challenges when dealing with less common or unconventional source types, necessitating vigilance on your part to both verify the correctness of the formatting and make manual adjustments if necessary. In collaborative research efforts, maintaining version control is pivotal. Ensuring that all team members employ the same AI tool and version guarantees consistency in formatting and referencing throughout the manuscript.

Grammar and Language Style Tools

AI can be highly useful for checking your grammar, punctuation, and language style as you write. Understanding what AI can and cannot do is essential.

  • Select a Reliable AI Tool: Choose a reputable AI-powered grammar and style-checking tool. Popular options include Grammarly, ProWritingAid, and Microsoft Word’s built-in grammar checker.
  • Integrate with Your Writing Software: Ensure that the AI tool integrates seamlessly with your preferred word processing software, such as Microsoft Word, Google Docs, or LaTeX. Most AI tools offer browser extensions or desktop applications for this purpose.
  • Document Preparation: Write or import your science research manuscript into your chosen word processing software.
  • Activate the AI Tool: Enable the AI tool’s grammar and style-checking features. Depending on the tool, this may involve clicking on a toolbar or selecting a specific option in the software.
  • Real-Time Suggestions: As you write, the AI tool will provide real-time suggestions for grammar, punctuation, and style improvements. Pay attention to the tool’s suggestions and implement them if they enhance the clarity and coherence of your writing.
  • Contextual Feedback: AI tools often offer contextual feedback, such as explanations for suggested changes. Use this feedback to understand the reasons behind the suggestions, check that the AI understands the context of your writing, and improve your writing skills.
  • Style Consistency: Ensure that your writing adheres to the specific style guide required for your science research manuscript (e.g., APA, IEEE). The AI tool can help you maintain consistency in style and formatting.
  • Customization: Some AI tools allow customization to align with your specific writing preferences. Explore the customization options to tailor the tool to your needs.
  • Document Review: Once you’ve completed your manuscript, run a comprehensive grammar and style check using the AI tool. This will help catch any overlooked issues.
  • Collaborative Editing: If you’re working with co-authors, share the document and invite them to review the AI tool’s suggestions. Collaborative editing can enhance the overall quality of the manuscript.
  • Final Review: After addressing all the AI tool’s suggestions, conduct a final manual review of your manuscript to ensure that the changes align with the context and content of your research. AI is a valuable assistant but should not replace your critical assessment.

While AI-powered grammar and writing style tools offer substantial benefits, they are not without their shortcomings. Some common pitfalls include over-correction, where AI suggests technically correct changes that may alter the intended meaning, such as replacing informal language with formal equivalents. Contextual misinterpretation can lead to suggestions that don’t align with the author’s intended meaning, particularly in specialized fields where technical terms are involved. False positives may flag constructions as errors that aren’t incorrect, such as sentence fragments or discipline-specific terminology.

AI tools often favor active voice for clarity, potentially causing overcorrection in scientific writing, where passive voice may be preferred (see our article Knowing How, When, and Where to Use Active and Passive Voice in Science Writing). Inconsistencies, homonym errors, and the introduction of incorrect suggestive language are also possible. Misplacement of modifiers can result in awkward or unclear sentences.

To address these pitfalls, researchers should use AI tools as aids and perform a critical evaluation of their suggestions, ensuring that the research’s intended meaning and style are preserved. Combining AI tools with human judgment remains the most effective approach.

Plagiarism and Other Ethical Issues

The ethical implications of using AI should not be overlooked. Consider issues related to plagiarism, copyright, and data privacy, and ensure that your utilization of AI aligns with responsible and ethical practices. To ensure the originality of your AI-assisted work, it is advisable to run your manuscript through dedicated plagiarism detection software.

  • Plagiarism Detection: Use dedicated plagiarism detection software to ensure that your manuscript is free from any unintentional or intentional plagiarism. AI tools can assist with writing but may not identify potential issues with originality (see our article How to Avoid Plagiarism).
  • Citation and Attribution: Properly attribute all sources, quotes, and references in your manuscript. AI tools can help with formatting citations, but it’s your responsibility to ensure that all sources are accurate and appropriately credited.
  • Review and Accountability: Always review and take responsibility for the content generated by AI. Do not solely rely on AI for the accuracy and ethical standards of your research.
  • Transparency: Most journals require transparency regarding the use of AI tools in the writing process. Check the journal instructions for guidance on how to disclose your use of AI tools in your manuscript preparation.

When Is AI Most Beneficial for Academic and Science Writing?

AI can be highly advantageous in specific academic and medical writing scenarios. First, it proves invaluable for time-sensitive projects when tight deadlines necessitate quick content generation or efficient proofreading to meet submission timelines. For example, AI can be leveraged to create presentation content. Second, AI can be particularly beneficial in the initial drafting of academic and medical manuscripts by providing a structured outline and content that authors can subsequently refine.

For example, researchers embarking on a new research paper can utilize AI to generate a preliminary outline and introduction, expediting the writing process. Additionally, AI serves as a valuable language enhancement tool, especially for non-native English speakers looking to improve the language and grammar of their academic or medical documents.


When to Approach AI with Caution in Academic and Medical Writing

The use of AI tools for science writing, such as medical and research manuscripts, offers undeniable time-saving benefits, but it also comes with potential drawbacks. AI’s lack of contextual understanding can lead to content misalignment with the research question and result in misinterpretations of intricate scientific concepts, potentially introducing inaccuracies. Language enhancement tools may inadvertently introduce errors or reduce clarity, emphasizing the need for human oversight. There is also a risk of AI generating false citations or referencing incorrect sources, which could compromise research integrity.

Ethical decision-making, especially in sensitive research areas, might challenge AI, as nuanced judgment is often necessary. Additionally, AI’s inability to infuse creativity and insightful perspectives into writing, coupled with difficulties handling subjective elements in research, highlights the importance of human intervention to maintain the quality and ethical standards of research manuscripts.

AI has emerged as a valuable tool in science, academic, and medical writing, offering efficiency, quality enhancement, and consistency. Responsible and ethical usage of AI in your writing, however, remains critical. Authors must be transparent about their use of AI tools, diligently review AI-generated content, and uphold ethical standards throughout the writing process.

As AI technology continues to advance, researchers and medical professionals can anticipate even more sophisticated AI tools tailored to the specific needs of academic and medical writing. By embracing AI while recognizing its limitations, authors can leverage its capabilities to enhance the impact and quality of their research publications.

AI Disclaimer: This article was generated with the assistance of ChatGPT, an AI large language model, to enhance its content. The AI-generated content was thoroughly reviewed, rewritten, and edited to ensure accuracy and coherence. The authors take full responsibility for this article.



Open Access

Ten simple rules for using large language models in science, version 1.0


Published: January 31, 2024


Citation: Smith GR, Bello C, Bialic-Murphy L, Clark E, Delavaux CS, Fournier de Lauriere C, et al. (2024) Ten simple rules for using large language models in science, version 1.0. PLoS Comput Biol 20(1): e1011767. https://doi.org/10.1371/journal.pcbi.1011767

Editor: Russell Schwartz, Carnegie Mellon University, UNITED STATES

Copyright: © 2024 Smith et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: GRS, CB, LB-M, EC, CSD, CFdL, JvdH, TL, HM, LM, DR, LKW, and TWC were funded by DOB Ecology and the Bernina Initiative. CSD was also funded by the Swiss National Science Foundation (Postdoctoral Fellowship #TMPFP3_209925). DSM, CMZ, and GRS were funded by Ambizione grants PZ00P3_193612, PZ00P3_193646, and PZ00P3_216194 respectively, from the Swiss National Science Foundation. GY was funded by the French Foreign Ministry (Grant number 102390W). ZW and QZ were funded by the China Scholarship Council (Nos. 202206040112 and 202206850027, respectively). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: MM is the founder and CEO of an LLM startup.

Introduction

Generative artificial intelligence (AI) tools, including large language models (LLMs), are expected to radically alter the way we live and work, with as many as 300 million jobs at risk [1]. Arguably the most well-known LLM currently is GPT (generative pre-trained transformer), developed by American company OpenAI [2]. Since its release in late 2022, GPT’s chatbot interface, ChatGPT, has exploded in popularity, setting a new record for the fastest growing user base in history [3]. The appeal of GPT and other LLMs stems from their ability to effectively carry out multistep tasks and provide clear, human-like responses to complicated queries and prompts (Box 1). Unsurprisingly, this capacity is catching the eye of scientists [4].

Box 1—An LLM suggests 10 simple rules governing its own usage in science

Here, we ask ChatGPT to generate a list of 10 simple rules for using LLMs in science in order to illustrate its ability to provide detailed, human-like responses to short prompts.

Q1: Can you provide me with a list of ten simple rules for using large language models in science?

R1 (ChatGPT 3.5): Certainly! Here are ten simple rules for using large language models in science:

  • 1. Understand the limitations: Recognize that large language models like GPT-3 are powerful but not infallible. They may produce accurate-sounding information, but they can also generate false or biased content. Exercise caution and critical thinking.
  • 2. Validate and verify: Always cross-reference information generated by language models with reliable sources. Verify the accuracy of claims, check for consistency, and corroborate the findings through empirical evidence or expert opinions.
  • 3. Train with caution: When fine-tuning a language model for a specific scientific task, carefully design and curate the training data. Ensure the data is representative, diverse, and free from biases that may influence the model’s outputs.
  • 4. Provide context and instructions: Clearly specify the intended context and constraints when using a large language model. This helps narrow down the results and ensures the model generates more relevant and accurate responses.
  • 5. Understand the biases: Large language models learn from the data they are trained on, which can contain societal biases. Be aware of these biases and take steps to mitigate them by using diverse training data and evaluating the outputs for fairness.
  • 6. Collaborate and share knowledge: Engage in open dialogue with other researchers and scientists working with language models. Share experiences, discuss challenges, and collectively work towards best practices for using these models in scientific research.
  • 7. Maintain transparency: When presenting work that incorporates the use of large language models, disclose the methodology, tools, and limitations. Be transparent about the role of the language model in the research process and acknowledge its potential impact on the results.
  • 8. Ethical considerations: Consider the ethical implications of using language models in science. Be mindful of privacy concerns, consent, and potential misuse of generated content. Respect ethical guidelines and seek approval when working with sensitive or personal data.
  • 9. Avoid overreliance: Large language models can be a valuable tool, but they should not replace critical thinking, domain expertise, or rigorous scientific methods. Use them as a supportive tool alongside other research methodologies.
  • 10. Contribute to responsible AI development: Actively participate in efforts to improve large language models and promote responsible AI practices. Provide feedback to developers, collaborate on research, and advocate for ethical and inclusive use of these models in science and society.

Remember, these rules provide general guidance, and it’s important to adapt them to specific scientific disciplines and contexts.

Indeed, there is increasing interest in using GPT and other LLMs to accelerate scientific progress for the benefit of humankind [5]. However, specific challenges concerning possible misuse of LLMs in science are arising [6] in tandem with broader concerns about potential societal disruption and ethical risks [7,8]. As such, there is an urgent need for the scientific community to establish general guiding principles for the appropriate use of LLMs and other generative AI tools to maximise benefit and minimise harm [9,10].

Here, we propose a set of 10 simple rules for using LLMs in science, drawn from our own experimentation as cautiously optimistic environmental scientists curious about novel tools to streamline research. We note that the list is grounded in our expertise as scientists and experience as end-users of LLMs (GPT specifically), not as AI developers. We also note that we do not here address other sorts of generative AI, which could also be increasingly used for scientific research in the future.

We suggest safeguards against 5 areas of concern to be wary of (Rules 1 to 5), complemented by suggestions for areas where LLMs have potential to support scientific research if sufficient care is taken to avoid issues (Rules 6 to 10). Since LLMs are predictive language models, our use suggestions focus on language-centric aspects of scientific research, such as computer coding, writing, and publishing.

As developments in this field are rapid and outcomes often unpredictable [11], we envision that these guidelines can provide a starting point, not an end point; they will likely need to be revisited and adapted as circumstances change. We envision, additionally, that our list may also provide a basis for better standardised reporting and documentation (S1 Appendix) usable across journals, allowing researchers who are submitting manuscripts to document their use(s) of LLMs and affirm that they have appropriately considered potential problem areas.

1. Follow the rules of the target journal.

It’s essential to consult and follow an up-to-date version of the rules for the target journal prior to using an LLM for research. As these tools gain in popularity, journals are likely to provide explicit guidelines on what they consider to be acceptable or unacceptable uses in submitted manuscripts. Several journals have already issued statements on this topic [12], but these guidelines may be updated and changed as scientists increasingly experiment with LLMs and discover new uses. In fact, we anticipate that a key element of scientific ethics may soon concern proper disclosure of generative AI usage for research.

Importantly, different journals might adopt substantially different policies. If this occurs, work developed with one journal in mind could be fundamentally inadmissible in another even if the topic and novelty would otherwise be well-matched. This problem could potentially be mitigated by alignment along a standardised framework for reporting of generative AI use in science. We include an example document of this sort in the appendix of this paper (S1 Appendix), specifically for LLMs and based upon the rules given here. This document should be adapted and modified as new or use-specific challenges arise. If there is uncertainty concerning a given use, we encourage authors to discuss with editors and make use of the acknowledgements section of their papers to disclose their use of generative AI with sufficient detail.

2. Outline relevant risks before LLM use.

Because LLM use can lead not only to benefit but also to harm [13], researchers who would like to use one for their project should also first outline relevant risks [14]. Based on an assessment of the risks posed by a particular use case, an approach can be decided upon that maximises benefit and minimises potential harm.

For some risks, mitigation may be straightforward. For example, erroneous LLM-generated content can be caught and corrected by thoroughly and critically proofreading (Rule 5). However, other well-documented LLM risks are too complex to be amenable to complete mitigation by a single researcher or team. In these cases, a researcher can instead qualitatively evaluate the extent to which their project might exacerbate the problem and make modifications to limit the specific negative impact of their work.

For example, if LLMs prove to be a boon to scientific research, variation in LLM accessibility and user-skill risks contributing to an unequal playing field for scientists competing for funding and employment. For this problem, a partial solution may take the form of researchers making LLM prompts they have used for their research freely available in the spirit of open science, so that others may learn from them. A second concern revolves around biases in a model’s underlying training data, which could cause its output to not only reinforce harmful societal prejudices [13,14] but also hamper scientific creativity by hewing closely to existing scientific narratives without emphasising knowledge gaps [15,16]. This problem may be addressed in part by checking LLM-generated content specifically for evidence of bias and using LLMs only in later stages of scientific work, after creative inquiry and hypothesis testing have already occurred.

3. Avoid plagiarism.

Because generative AI presents new challenges for legal frameworks surrounding copyright and intellectual property, researchers must proactively ensure that their LLM use is not considered plagiarism within its relevant context. Indeed, beyond transparently unethical uses of LLMs, for example, to defeat plagiarism-detection software [17], subtle issues surrounding plagiarism can also arise in everyday use of LLMs for science.

As long as an LLM is not considered a legal person, unattributed use of the text they produce would not seem to be plagiarism under most current legal definitions. However, it may be inadmissible for other reasons—for example, directly incorporating text from GPT into a manuscript currently violates some journal rules [12], rendering this an unacceptable usage (Rule 1). Nevertheless, debate continues about how precisely guidelines and concepts of academic integrity ought to evolve in response to the increasing availability of generative AI [18,19], with no clear settled consensus yet. We anticipate that norms surrounding incorporation and declaration of LLM-generated text (after fact-checking) in scientific literature may change considerably in the near future. Currently, researchers might best conceptualise LLM-produced text as a third-party source, which can inform but not replace their own writing.

4. Respect confidentiality.

Unlike use of standard analytical tools, sharing confidential data or information with an LLM represents a potential breach of contract and must be avoided unless explicit permission is obtained. This is important to note because researchers wishing to debug computer code (Rule 6), summarise content (Rule 7), or improve manuscript writing (Rule 10) might wish to share code, data, or writing with an LLM.

Though tempting, doing so presents a major potential problem because it remains unclear whether GPT in particular might “remember” this information and then share it with third parties who ask similar questions. In fact, OpenAI states that conversations that GPT (in the context of OpenAI consumer services) has are indeed used for further training of the model [20]. Scientists who work with sensitive, confidential data, for example medical records, must keep this in mind. Similar caution may be called for regarding preliminary scientific findings that are not yet ready for publication. We suggest that until there is sufficient clarity on this topic, data of this sort should not be transmitted to GPT, even if data confidentiality agreements (which may have been drafted before the advent of widespread LLM availability) do not explicitly rule out such an activity.

5. Verify the truthfulness of content generated by an LLM.

Content generated by an LLM must always be double-checked by a subject matter expert for factual accuracy. LLMs are well known to “hallucinate”—in other words, to give seemingly correct but false answers—particularly in response to complex questions (Box 2) [21]. Problematically, the model’s eloquence can cause users to overlook false statements or references. This issue is an example of the “halo effect,” a well-known cognitive bias where an individual’s specific positive traits (e.g., eloquence) are mistakenly interpreted as evidence of a general “goodness” that extends beyond these specific traits [22]. In the case of an LLM, the quality of the presentation may be unconsciously assumed to match the veracity of the content.

Safeguarding against hallucinations may be easy or difficult, depending on the nature of the task that the LLM has been asked to perform. For example, a citation can be quickly cross-checked for accuracy and code can simply be executed to test whether it does what it is supposed to. But if one asks an LLM to summarise an article or transcribe a meeting (as in Rule 7), checking for accuracy might be much more challenging and time-consuming. The difficulty of checking in these cases might tempt users to trust responses uncritically—a tendency against which we must be on guard.
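As one concrete illustration of quick cross-checking, a DOI supplied by an LLM can be looked up against a bibliographic registry. The sketch below queries Crossref’s public REST API (api.crossref.org, no API key required) using the DOI discussed in Box 2 below; it is an optional convenience for catching fabricated citations, not a substitute for expert review.

```python
import requests

doi = "10.1038/nclimate3004"  # DOI discussed in Box 2
resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)

if resp.ok:
    meta = resp.json()["message"]
    # Compare the registered title and journal against the LLM's claim.
    print(meta["title"][0], "|", meta["container-title"][0])
else:
    print("DOI not found in Crossref; the citation may be hallucinated.")
```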

Box 2—Examples of LLM hallucination

Here, we request factual, referenced information from ChatGPT. In response, we receive apparently convincing answers which actually contain subtle, potentially misleading inaccuracies.

Q1: Provide a referenced estimate of the amount of vegetation biomass on planet Earth.

R1 (ChatGPT3.5): As of my knowledge cutoff in September 2021, estimating the exact amount of vegetation biomass on planet Earth is challenging due to the vastness and complexity of ecosystems. However, scientists have made various estimations using different methods and data sources. It’s important to note that these estimates can vary depending on the specific criteria and assumptions used in the calculations.

One study published in the journal Nature in 2020 provides an estimate of global vegetation biomass based on satellite observations and ground measurements. The research suggests that the Earth’s total vegetation biomass is approximately 450 billion metric tons (gigatons). This includes all the vegetation on land, including forests, grasslands, shrublands, and other types of vegetation.

Here is the reference for the study:

Zhu, Z., et al. (2020). Greening of the Earth and its drivers. Nature, 597(7871), 405-409. doi: 10.1038/s41586-020-2474-6 .

Please keep in mind that new studies and advancements in remote sensing technologies may have provided more accurate estimates since my last update. It is advisable to consult recent scientific literature for the most up-to-date information on the subject.

Inaccuracies: This is a good example of a seemingly correct answer with some critical mistakes. The paper “Greening of the Earth and its drivers” was indeed published by Zhu and colleagues [ 23 ], but in Nature Climate Change , 2016 (doi: 10.1038/nclimate3004 ), not in Nature . Moreover, in Zhu and colleagues (2016), no estimate was provided of Earth’s total vegetation biomass. The estimate of 450 Gt C can instead be traced to Bar-On and colleagues (2018) [ 24 ].

Q2: What are the geospatial patterns of soil nematodes at a global scale? Provide references to the statements provided.

R2 (ChatGPT4): As of my last update in September 2021, the most comprehensive study on the global distribution of soil nematodes was published in 2019 in the journal "Nature". The research was conducted by Van den Hoogen et al., who assembled a global soil nematode dataset covering all continents and biomes.

Here are some of the key findings from the study:

Soil nematodes are ubiquitous and are found in every type of ecosystem, from the tropics to the Arctic and Antarctic, and from lowland areas to high mountain ranges.

Nematode density, or the number of nematodes per unit of soil, tends to decrease with increasing latitude. This means that nematode densities are generally higher in the tropics than in temperate or polar regions.

The highest nematode densities were found in grasslands and forests, and the lowest densities were found in deserts and tundra.

Nematode communities vary greatly in their composition depending on the ecosystem. For example, bacterial-feeding nematodes are more common in ecosystems with high organic matter content, while fungal-feeding nematodes are more common in ecosystems with low organic matter content.

The total global soil nematode population was estimated to be around 0.3 x 10^20 (or 30 quintillion) individuals, making them the most abundant group of animals on Earth.

Van den Hoogen, J., Geisen, S., Routh, D., Ferris, H., Traunspurger, W., Wardle, D. A., & Crowther, T. W. (2019). Soil nematode abundance and functional group composition at a global scale. Nature, 572(7768), 194-198. doi: 10.1038/s41586-019-1418-6

Inaccuracies: Again, the answer provided is seemingly correct but contains errors. In the referenced paper [ 25 ], the estimated number of soil nematodes is actually 4.4 × 10^20, with a total biomass of 0.3 Gt. Additionally, the latitudinal patterns described in the paper are the opposite from ChatGPT’s answer. Finally, the paper contains no statements supporting a link between bacterial-feeding or fungal-feeding nematodes and organic matter content.

Use suggestions

6. Use an LLM to perform a more inclusive data search.

An LLM can assist researchers with data gathering for scientific meta-analyses or syntheses by collating “grey literature” resources—reports issued by governmental and nongovernmental organisations rather than scientific journals. This is helpful because while there are established methods for systematically searching the scientific literature using databases such as Web of Science [ 26 ], these approaches may neglect grey literature [ 27 ]. This oversight is particularly problematic in applied fields such as conservation and ecological restoration, where the insights of on-the-ground practitioners are of central importance.

Researchers can use GPT and other LLMs to collate and synthesise findings from grey literature (Box 3), which allows for easier incorporation into systematic reviews and facilitates knowledge sharing across organisations [ 28 ]. However, unless the LLM has internet access (as with some GPT plugins or Microsoft’s NewBing), the ability of the model to excel at this task may be limited by how comprehensive and recent its training corpus is. In the example below ( Box 3 ), ChatGPT provides a helpful overview of organisations with relevant information but some finer-scale details (e.g., ChatGPT’s estimation of website contents) are erroneous, either due to model hallucination ( Rule 5 ) or, potentially, changes occurring after the training data snapshot was taken.

Box 3—Performing an LLM-assisted grey literature search

Here, we use ChatGPT to generate a preliminary list of organisations that work in some aspect of forest restoration and which may have published reports online detailing the impact of their work.

Q1: Please generate a list of forest restoration organizations that publish data on the biodiversity and carbon outcomes of their work.

R1: Certainly! Here is a list of forest restoration organizations that publish data on the biodiversity and carbon outcomes of their work:

1. The Nature Conservancy: The Nature Conservancy is a global environmental organization that works to protect and restore critical habitats, including forests. They publish regular reports on their conservation efforts, including data on biodiversity and carbon sequestration.

2. Trees for the Future: Trees for the Future is a non-profit organization that works with local communities to plant trees and restore degraded lands. They publish annual reports that detail the impact of their work, including data on biodiversity and carbon sequestration.

3. Rainforest Alliance: The Rainforest Alliance is an international organization that works to conserve biodiversity and promote sustainable livelihoods through responsible land use practices. They publish regular reports on their forest conservation efforts, including data on carbon sequestration and biodiversity.

4. World Wildlife Fund: The World Wildlife Fund is a global conservation organization that works to protect and restore critical habitats, including forests. They publish regular reports on their conservation efforts, including data on biodiversity and carbon sequestration.

5. Forest Stewardship Council: The Forest Stewardship Council is an international organization that promotes responsible forest management practices. They publish regular reports on their certification process.

Q2: Have these organizations published reports on their work online?

R2: Yes, these organizations typically have published reports on their work available online. While I don’t have real-time access to the internet, I can provide you with general information about where to find these reports. You can visit the websites of these organizations and look for sections such as “Publications,” “Reports,” or “Impact” to find detailed information about their forest restoration projects, including data on biodiversity and carbon outcomes.

For example:

1. The Nature Conservancy: Visit their website at nature.org and explore their “Our Work” or “Publications” sections.

2. Trees for the Future: Visit their website at trees.org and look for sections like “Impact” or “Resources” for access to their reports.

3. Rainforest Alliance: Visit their website at rainforest-alliance.org and search for sections such as “Our Work” or “Publications” to find their reports.

4. World Wildlife Fund: Visit their website at wwf.org and explore their “Conservation” or “Publications” sections for access to their reports.

5. Forest Stewardship Council: Visit their website at fsc.org and look for sections like “Resources” or “Publications” for their reports on responsible forest management.

Please note that the availability and specific locations of these reports may vary over time, so it’s best to explore the respective websites or use their search functions to locate the most up-to-date information.

Q3: Please turn this into a table that facilitates finding and extracting data from the reports for these organizations, and 5 other relevant organizations.

R3: Certainly! Here is a table that includes the organizations mentioned earlier, as well as five additional forest restoration organizations, along with relevant information that can facilitate finding and extracting data from their reports:

[Table not shown for brevity, see Q4 for follow up that adds column on geographical scope]

Note that this table is not exhaustive and there may be additional relevant sections and types of reports available on each organization’s website. However, this should provide a good starting point for finding and extracting data from their reports.

Q4: Please add a column to this table that describes the geographic scope of each organization.

R4: Certainly! Here is an updated table that includes the organizations mentioned earlier, along with five additional forest restoration organizations, relevant sections on their website, types of reports they publish, and the geographic scope of their work:

[Table not shown for brevity; available at https://doi.org/10.1371/journal.pcbi.1011767.t001 ]

Note that the geographic scope of each organization’s work may vary over time, and this information is accurate as of the knowledge cutoff date for my training.

All organisations identified were relevant and many publish data or summaries of their impacts in annual reports somewhere on their websites. An extended list of additional organisations was provided with one additional prompt, and it is easy to export the list in a standard format (e.g., .csv). However, the list does contain several inaccuracies (see Rule 5 ). For example, the “Relevant Sections” column often listed website sections that did not actually exist.

7. Use an LLM to summarise content.

To help researchers make more efficient use of their time, an LLM can quickly generate short summaries of lengthy input content. This is helpful because, as a scientist, it’s essential to regularly read new literature, but balancing this task with the duties of one’s own research, teaching, and supervision can be challenging. Researchers who fail to strike a good balance and don’t frequently investigate new literature can miss out on findings that might helpfully inform and shape their own research programs, or which would be relevant to classes that they are teaching.

Currently, several third-party plugins for GPT (e.g., ChatPDF.com , UseChatGPT.AI , GPT Academic , SciSpace copilot ) can be used to generate summaries based on open-access scientific literature, complete with main points, findings, and contributions. Microsoft’s NewBing search engine, which incorporates GPT-4, can also be used for this purpose. With tailored structural prompts ( Box 4 ), a researcher can specify the length, style, and format they prefer for a summary. ChatGPT can also respond to follow-up questions requesting more details, if needed, producing a kind of interactive abstract. Yet, just as with standard abstracts, if a work appears to be of interest, appropriately contextualising and fully understanding the main findings requires that a researcher read the full paper.

Box 4—Using GPT to summarise scientific literature

Here, we use GPT-4 (via NewBing) to summarise a recently published open-access scientific paper.

Q1: I want you to act as a scientific paper summarizer. I will give you the URL of a scientific article. I want you to reply with a bold title for each section including the general information of this paper, the background of this research, the addressed question and hypothesis, the key findings, and major contributions. The summary of each section should be logical, concise, clear, and informative. Make sure the summary includes most of the key information and can grip the audience. The URL is: https://www.nature.com/articles/s41467-022-30888-2 .

R1 (GPT-4 via NewBing): General information: The article is titled “Global relationships in tree functional traits” and it was published in Nature Communications on 08 June 2022. The authors are Daniel S. Maynard, Lalasia Bialic-Murphy, Constantin M. Zohner, Colin Averill, Johan van den Hoogen, Haozhi Ma, Lidong Mo, Gabriel Reuben Smith, Alicia T. R. Acosta, Isabelle Aubin, Erika Berenguer, Coline C. F. Boonman, Jane A. Catford, Bruno E. L. Cerabolini, Arildo S. Dias, Andrés González-Melo, Peter Hietz, Christopher H. Lusk, Akira S. Mori, Ülo Niinemets, Valério D. Pillar, Bruno X. Pinho, Julieta A. Rosell, Frank M. Schurr and Thomas W. Crowther.

Background: Trees are subject to unique physiological, mechanical and ecological pressures due to their massive energetic investments in woody support structures. Functional traits are the physiological and morphological features that determine how trees interact with, influence and respond to their environment. Understanding the patterns and trade-offs in tree functional traits is challenging because of the diversity and complexity of tree form and function.

Question and hypothesis: The authors aim to quantify the multidimensional relationships in tree trait expression across 18 functional traits encompassing leaf, seed, bark, wood, crown and root characteristics. They hypothesize that these traits reflect two main axes: one related to leaf economics (i.e., the trade-off between resource acquisition and conservation) and the other related to tree size and competition for light (i.e., the trade-off between growth and survival). They also expect that these axes show environmental convergence across temperature, moisture and elevation gradients.

Key findings: The authors find that nearly half of trait variation is captured by the two hypothesized axes. They also find that these axes exhibit correlated responses to environmental factors. Moreover, they identify eight distinct clusters of traits that capture the full dimensionality of trait space. These clusters reflect different aspects of tree form and function such as drought tolerance, shade tolerance, dispersal ability and nutrient cycling.

Major contributions: The authors identify a core set of traits needed to quantify global patterns in functional biodiversity. They also contribute to the fundamental understanding of the functioning of forests worldwide by revealing how tree traits shape their water, nutrient and light economies. They suggest that their trait framework can be used to improve forest management and conservation strategies.

An LLM can even be used to produce summaries of internal documents, for example meeting minutes, if meeting participants give their consent and no confidential information was shared ( Rule 4 ). To facilitate this process, speech-to-text transcription software can first be used to produce a full written record of a meeting. Then, this transcribed meeting can be shared with the LLM, which can be tasked with producing a summary. For this use, the adequacy of the summary should of course be verified by the meeting participants to ensure that important points have not been left out nor misrepresented.
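For researchers who prefer to script this workflow rather than paste text into a chat interface, the minimal sketch below uses the OpenAI Python SDK to request a summary of a transcript file. The file name, model choice, and prompt wording are our own illustrative assumptions rather than recommendations, and Rule 4 still applies: no confidential content should be sent.

# Minimal sketch: summarising a meeting transcript via the OpenAI Python SDK.
# Assumes OPENAI_API_KEY is set in the environment; the file name, model name,
# and prompt wording are illustrative assumptions only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("meeting_transcript.txt", encoding="utf-8") as f:
    transcript = f.read()

response = client.chat.completions.create(
    model="gpt-4o",  # substitute whichever chat model you have access to
    messages=[
        {"role": "system",
         "content": "You summarise meeting transcripts into concise minutes."},
        {"role": "user",
         "content": "Summarise the key decisions and action items:\n\n" + transcript},
    ],
)

print(response.choices[0].message.content)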

8. Use an LLM to refine written English in formal and informal communication.

An LLM can facilitate clearer and more idiomatic usage of English, the lingua franca of the international scientific community. This can be especially helpful for researchers who speak English as a second language, who may face unique obstacles in the English-dominated environment of global science.

Unfortunately, peer review tends to disadvantage authors from the developing world and non-English-speaking countries [ 29 ], and researchers who speak English as a second language sometimes receive rude, discouraging remarks from referees [ 30 ]. Coupled with journal adoption of double-blind reviewing practices [ 31 ], an LLM can help to address this problem by copyediting manuscripts and other communication to ensure that English is used idiomatically [ 32 ], reducing opportunities for bias from referees. Importantly, before using an LLM in this way, journal guidelines must be consulted ( Rule 1 ). If such a use is not explicitly permitted under journal policy, we suggest the model’s role as manuscript copyeditor should be directly communicated to journal editors.

An LLM’s ability to proofread and copyedit can also be helpfully extended to informal communication. For example, an LLM can be used to verify that e-mails and presentations written in English capture the appropriate tone, which can be a challenge even for researchers whose first language is English. Skilfully handling informal communication is important for researchers who seek to establish collaborative networks with peers they haven’t yet met in person (e.g., [ 33 ]). Since collaboration supports both basic scientific progress and scientific career development [ 34 ], minimising the barriers posed by language in this sphere promises significant benefit.

9. Use an LLM to improve scientific coding.

An LLM can help scientists to write computer code more quickly and efficiently [ 35 ], as LLMs excel not only in natural language processing but also in programming tasks across diverse programming languages. Models can be used to generate initial code snippets or to improve the structure, efficiency, and readability of existing code. This can be done, for example, by (i) arranging code into logical sections like functions, classes and modules; (ii) refactoring code by suggesting better naming conventions, eliminating code duplication, and improving overall code readability; (iii) suggesting alternative algorithms or data structures that may improve efficiency and performance of the code; or (iv) writing clearer code annotations.
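To make point (ii) concrete, the short, hypothetical before-and-after below shows the kind of renaming and de-duplication one might ask an LLM to perform; the functions are invented for illustration, and both versions behave identically.

# Hypothetical before/after of an LLM-suggested refactor (point ii above):
# descriptive names and a shared helper instead of duplicated logic.

# Before: cryptic names and copy-pasted unit conversion.
def f(a, b):
    x = a * 0.001
    y = b * 0.001
    return x + y

# After: self-documenting names and a single conversion helper.
def grams_to_kilograms(grams):
    return grams * 0.001

def total_mass_kg(sample_a_g, sample_b_g):
    return grams_to_kilograms(sample_a_g) + grams_to_kilograms(sample_b_g)

# Behaviour is unchanged:
assert f(500, 1500) == total_mass_kg(500, 1500)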

An LLM can also aid in debugging and error detection ( Box 5 ). Yet, suggestions from LLMs may not necessarily provide the most efficient code or follow established best practices in the field; depending on how recent the training data are, deprecated functions and packages may also be used. In all cases, it’s crucial to verify the functionality of any provided code to ensure that it aligns with the user’s specific needs.
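One lightweight way to perform this verification is to wrap any LLM-provided function in a few assertion tests against cases whose correct answers are already known. The function below is a hypothetical example of LLM output, not code from this article:

# Hypothetical LLM-suggested helper, verified with quick checks against
# cases whose correct answers are known in advance.
import math

def shannon_diversity(counts):
    """Shannon diversity index H' for a list of species counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)

assert shannon_diversity([10]) == 0.0  # a single species: no diversity
assert abs(shannon_diversity([5, 5]) - math.log(2)) < 1e-12  # two equal species: ln(2)
print("All checks passed.")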

Box 5—Using GPT to debug computer code

Here, the user provides ChatGPT with Google Earth Engine code (rendered as […] in the example) that is meant to perform a set of statistical aggregation and spatial calculations within ecoregions [ 36 ] but is instead returning an error. ChatGPT identifies the cause of the error and suggests solutions. The initial suggested code returns a new error, but with further prompting ChatGPT successfully corrects the error to produce functional code.

Q1 : […] Here is my code in Google Earth Engine. I get this message error: Too many concurrent aggregations. Can you help me to find a solution?

R1 : The error “Too many concurrent aggregations” occurs when you are trying to perform too many reduceRegion or other aggregation operations simultaneously. In your code, you are using both reduceRegion and map functions over a large list of ecoIds. The reduceRegion is applied inside the map function, which may cause too many concurrent aggregation operations to run.

To fix this issue, you can try the following approaches:

1. Split your ecoIds list into smaller chunks and process each chunk separately, then combine the results. You can use the ee.List.slice() function to achieve this.

2. Limit the number of ecoIds you process at once. You can experiment with different numbers of ecoIds to find the optimal number that doesn’t cause the error.

3. You can try to use other Earth Engine functions to perform the operations you need, such as reducing the number of aggregations by using simpler reducers or using other functions that don’t involve aggregation.

Here is an example of how to split the ecoIds list into smaller chunks and process them separately:

// Define a function to process a chunk of ecoIds
function processEcoIdsChunk(chunk) {
  var chunkImageCollection = ee.ImageCollection(chunk.map(function(id) {
    // Your existing code for processing each ecoId
  }));
  return chunkImageCollection.mosaic();
}

// Split the ecoIds list into smaller chunks
var chunkSize = 10; // Adjust this value to find the optimal chunk size
var numChunks = ecoIds.length().divide(chunkSize).ceil();
var ecoIdsChunks = ee.List.sequence(0, numChunks.subtract(1)).map(function(chunkIndex) {
  var startIndex = ee.Number(chunkIndex).multiply(chunkSize);
  var endIndex = startIndex.add(chunkSize).min(ecoIds.length());
  return ecoIds.slice(startIndex, endIndex);
});

// Process each chunk and combine the results
var processedChunks = ecoIdsChunks.map(processEcoIdsChunk);
var imageWithPotValues = ee.ImageCollection(processedChunks).mosaic();

// Add the layer to the map and print results
print(imageWithPotValues);
Map.addLayer(imageWithPotValues);

This code will divide the ecoIds list into smaller chunks and process them one by one, potentially reducing the number of concurrent aggregations and avoiding the error. You may need to experiment with the chunkSize value to find the optimal size that doesn’t cause the error.

Q2 : “Line 20: chunk.map is not a function” I get this error when running the code. Is this because I didn’t implement the function the right way?

R2 : My apologies for the confusion. The error is occurring because the ‘chunk’ variable in the ‘processEcoIdsChunk’ function is of type ‘ee.List’, but the ‘map()’ function is being called on it as if it were a JavaScript array. To fix this, you need to cast ‘chunk’ to an ‘ee.List’ before calling the ‘map()’ function on it. Here’s how to do that:

Replace this line:

var chunkImageCollection = ee.ImageCollection(chunk.map(function(id) {

With this line:

var chunkImageCollection = ee.ImageCollection(ee.List(chunk).map(function(id) {

Now, the ‘chunk’ variable is cast to an ‘ee.List’, allowing you to call the ‘map()’ function without issues. This should resolve the error, and your code should work as expected.

An LLM can also facilitate code translation from one programming language to another, which can prove invaluable when packages or libraries needed for a specific analysis are only available in an unfamiliar language. In addition to translating between different programming languages, LLMs can also translate computer code into natural language. In other words, they can explain to a user in intuitive terms what each section of code does and generate documentation directly from the code itself. This can be invaluable when working with sporadically documented code.
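As an illustration of such a translation, the hypothetical snippet below pairs a line of R with the Python equivalent an LLM might produce; both compute a grouped mean, and the data are invented for the example. As with debugging, translated code should be run on a small case with a known answer before being trusted.

# Hypothetical illustration of LLM-assisted code translation: the commented
# line is R, followed by a Python equivalent an LLM might generate.
import pandas as pd

df = pd.DataFrame({"site": ["A", "A", "B"], "biomass": [1.2, 0.8, 2.5]})

# R:  aggregate(biomass ~ site, data = df, FUN = mean)
result = df.groupby("site", as_index=False)["biomass"].mean()

print(result)  # mean biomass per site: A -> 1.0, B -> 2.5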

Importantly, the effective use of LLMs heavily depends on the user’s ability to craft precise, well-defined prompts that clearly outline the task at hand. In the context of scientific coding, this means the user must already have a strong understanding of the coding language in question as well as a good conceptual comprehension of the task. As such, the current generation of LLMs do not represent an alternative to developing this fundamental capacity, but rather a way to quickly address challenges that arise in coding.

10. Use an LLM to jump-start your scientific writing process.

Researchers can overcome writer’s block and “blank page anxiety” by asking an LLM to produce an outline for their scientific paper ( Box 1 ; Box 6 ). This can be helpful because writing is a major hurdle for many researchers [ 37 ], with some of the biggest obstacles posing a challenge at the initial outset of a project’s writing phase [ 38 ]. In addition to providing a starting point, an LLM can help to organise subsections, offer relevant examples, and fine-tune the overall flow of the manuscript. In this way, an LLM can act for researchers as a fast and inexpensive proofreader and copyeditor.

Box 6—Using GPT to produce a tailored paper outline

Q1: Give me a 4-heading structure for a research article. Topic: Climate change effects on biodiversity in the tropics Context: Ecology Language: Academic Tone: Formal.

R1: I. Introduction

A. Background on climate change and its global impact

B. Importance of studying climate change effects on biodiversity in the tropics

C. Research question and objectives

II. Literature Review

A. Overview of climate change impacts on biodiversity worldwide

B. Key studies and findings on climate change effects on biodiversity in the tropics

C. Knowledge gaps and research limitations

III. Methodology

A. Study area and data collection

B. Variables and measurements

C. Statistical or analytical methods used

IV. Results and Discussion

A. Presentation of findings related to climate change effects on biodiversity in the tropics

B. Discussion of the results in the context of existing literature

C. Implications and potential future directions

V. Conclusion

A. Summary of the main findings

B. Significance of the research and its contribution to the field

C. Recommendations for future research and policy implications

Importantly, however, an LLM cannot be assumed to provide reliable factual information (see Rule 5 ). Thus, just as with suggestions from a proofreader who may not have subject expertise, erroneous statements can inadvertently appear (admittedly perhaps for different reasons). Additionally, while an LLM can propose a structure for a researcher’s manuscript, there is no guarantee that this structure is the best choice for the author’s purposes. Authors must therefore still carefully evaluate and revise LLM suggestions using their own expertise, while ensuring that they do not directly incorporate LLM-generated text into their manuscript if doing so would contravene journal rules or constitute plagiarism ( Rules 1 and 3 ).

Conclusions

In addition to transforming the world economy [ 1 ], generative AI tools like LLMs will likely transform the way we do science [ 5 ]. Alongside great potential benefits stand great potential dangers, and within both of these categories are certain to be uses of generative AI that we (and others) have not anticipated. Indeed, as we explore these new tools, we must not lose sight of the unresolved dilemmas that still surround generative AI technology [ 7 , 16 ]. Some of the challenges relevant for research include the use of unspecified, potentially biased training datasets for commercial models [ 16 ], an implicit attachment to existing scientific paradigms [ 15 ], and a sizeable carbon footprint [ 39 ]. Issues such as these are complex enough to preclude the kind of simple, actionable solutions we offer here.

Nevertheless, we suggest that reduction of potential harm in LLM use for science may already be supported by following Rules 1 to 5 . With these caveats firmly in mind, a charitable researcher may conceive of GPT and other LLMs as a research assistant, copyeditor, or consultant ( Rules 6 to 10 ) who is imperfect and does not possess subject knowledge, thus sometimes making erroneous suggestions or statements.

Before beginning, researchers must ensure that their planned LLM use complies with journal guidelines ( Rule 1 ) and does not appear to pose substantial, unmitigable risk ( Rule 2 ). To avoid plagiarism ( Rule 3 ), LLM-generated content should not be used without appropriate attribution, and at no point should confidential information be shared in the course of model usage ( Rule 4 ). Finally, due to the possibility of hallucination, all LLM-generated content must be fact checked ( Rule 5 ). If these guidelines are respected and regularly revised as circumstances develop, we believe that generative AI tools like LLMs stand to significantly accelerate scientific progress for the benefit of humankind.

Supporting information

S1 Appendix. An example reporting document for large language model use in science.

https://doi.org/10.1371/journal.pcbi.1011767.s001

Acknowledgments

We thank F Ardila Camelo for insightful comments and helpful suggestions.

  • 2. OpenAI. Introducing ChatGPT. [cited 2023 Jun 19]. Available from: https://openai.com/blog/chatgpt.
  • 26. Gough D, Oliver S, Thomas J, editors. An Introduction to Systematic Reviews. 2nd ed. London, United Kingdom: SAGE Publications Ltd; 2017.

Academia Insider

The best AI tools for research papers and academic research (Literature review, grants, PDFs and more)

As our collective understanding and application of artificial intelligence (AI) continues to evolve, so too does the realm of academic research. Some people are scared by it while others are openly embracing the change. 

Make no mistake, AI is here to stay!

Instead of tirelessly scrolling through hundreds of PDFs, a powerful AI tool comes to your rescue, summarizing key information in your research papers. Instead of manually combing through citations and conducting literature reviews, an AI research assistant proficiently handles these tasks.

These aren’t futuristic dreams, but today’s reality. Welcome to the transformative world of AI-powered research tools!

This blog post will dive deeper into these tools, providing a detailed review of how AI is revolutionizing academic research. We’ll look at the tools that can make your literature review process less tedious, your search for relevant papers more precise, and your overall research process more efficient and fruitful.

I know that I wish these were around during my time in academia. It can be quite confronting trying to work out which ones you should and shouldn’t use. A new one seems to be coming out every day!

Here is everything you need to know about AI for academic research and the ones I have personally trialed on my YouTube channel.

My Top AI Tools for Researchers and Academics – Tested and Reviewed!

There are many different tools now available on the market but there are only a handful that are specifically designed with researchers and academics as their primary user.

These are my recommendations that’ll cover almost everything that you’ll want to do:

Find literature using semantic search. I use this almost every day to answer a question that pops into my head.
An increasingly powerful and useful application, especially effective for conducting literature reviews through its advanced semantic search capabilities.
An AI-powered search engine specifically designed for academic research, providing a range of innovative features that make it extremely valuable for academia, PhD candidates, and anyone interested in in-depth research on various topics.
A tool designed to streamline the process of academic writing and journal submission, offering features that integrate directly with Microsoft Word as well as an online web document option.
A tool that allows users to easily understand complex language in peer-reviewed papers. The free tier is enough for nearly everyone.
A versatile and powerful tool that acts like a personal data scientist, ideal for any research field. It simplifies data analysis and visualization, making complex tasks approachable and quick through its user-friendly interface.

Want to find out all of the tools that you could use?

Here they are, below:

AI literature search and mapping – best AI tools for a literature review – elicit and more

Harnessing AI tools for literature reviews and mapping brings a new level of efficiency and precision to academic research. No longer do you have to spend hours looking in obscure research databases to find what you need!

AI-powered tools like Semantic Scholar and elicit.org use sophisticated search engines to quickly identify relevant papers.

They can mine key information from countless PDFs, drastically reducing research time. You can even search with semantic questions, rather than having to wrestle with keywords.

With AI as your research assistant, you can navigate the vast sea of scientific research with ease, uncovering citations and focusing on academic writing. It’s a revolutionary way to take on literature reviews.

  • Elicit –  https://elicit.org
  • Litmaps –  https://www.litmaps.com
  • Research rabbit – https://www.researchrabbit.ai/
  • Connected Papers –  https://www.connectedpapers.com/
  • Supersymmetry.ai: https://www.supersymmetry.ai
  • Semantic Scholar: https://www.semanticscholar.org
  • Laser AI –  https://laser.ai/
  • Inciteful –  https://inciteful.xyz/
  • Scite –  https://scite.ai/
  • System –  https://www.system.com

If you like AI tools you may want to check out this article:

  • How to get ChatGPT to write an essay [The prompts you need]

AI-powered research tools and AI for academic research

AI research tools, like Consensus, offer immense benefits in scientific research. Here are the general AI-powered tools for academic research.

These AI-powered tools can efficiently summarize PDFs, extract key information, perform AI-powered searches, and much more. Some are even working towards letting you add your own database of files to ask questions of.

Tools like scite even analyze citations in depth, while AI models like ChatGPT elicit new perspectives.

The result? The research process, previously a grueling endeavor, becomes significantly streamlined, offering you time for deeper exploration and understanding. Say goodbye to traditional struggles, and hello to your new AI research assistant!

  • Consensus –  https://consensus.app/
  • Iris AI –  https://iris.ai/
  • Research Buddy –  https://researchbuddy.app/
  • Mirror Think – https://mirrorthink.ai

AI for reading peer-reviewed papers easily

Using AI tools like Explain paper and Humata can significantly enhance your engagement with peer-reviewed papers. I always used to skip over the details of the papers because I had reached saturation point with the information coming in. 

These AI-powered research tools provide succinct summaries, saving you from sifting through extensive PDFs – no more boring nights trying to figure out which papers are the most important ones for you to read!

They not only facilitate efficient literature reviews by presenting key information, but also find overlooked insights.

With AI, deciphering complex citations and accelerating research has never been easier.

  • Aetherbrain – https://aetherbrain.ai
  • Explain Paper – https://www.explainpaper.com
  • Chat PDF – https://www.chatpdf.com
  • Humata – https://www.humata.ai/
  • Lateral AI –  https://www.lateral.io/
  • Paper Brain –  https://www.paperbrain.study/
  • Scholarcy – https://www.scholarcy.com/
  • SciSpace Copilot –  https://typeset.io/
  • Unriddle – https://www.unriddle.ai/
  • Sharly.ai – https://www.sharly.ai/
  • Open Read –  https://www.openread.academy

AI for scientific writing and research papers

In the ever-evolving realm of academic research, AI tools are increasingly taking center stage.

Enter Paper Wizard, Jenny.AI, and Wisio – these groundbreaking platforms are set to revolutionize the way we approach scientific writing.

Together, these AI tools are pioneering a new era of efficient, streamlined scientific writing.

  • Jenny.AI – https://jenni.ai/ (20% off with code ANDY20)
  • Yomu – https://www.yomu.ai
  • Wisio – https://www.wisio.app

AI academic editing tools

In the realm of scientific writing and editing, artificial intelligence (AI) tools are making a world of difference, offering precision and efficiency like never before. Consider tools such as Paper Pal, Writefull, and Trinka.

Together, these tools usher in a new era of scientific writing, where AI is your dedicated partner in the quest for impeccable composition.

  • PaperPal –  https://paperpal.com/
  • Writefull –  https://www.writefull.com/
  • Trinka –  https://www.trinka.ai/

AI tools for grant writing

In the challenging realm of science grant writing, two innovative AI tools are making waves: Granted AI and Grantable.

These platforms are game-changers, leveraging the power of artificial intelligence to streamline and enhance the grant application process.

Granted AI, an intelligent tool, uses AI algorithms to simplify the process of finding, applying, and managing grants. Meanwhile, Grantable offers a platform that automates and organizes grant application processes, making it easier than ever to secure funding.

Together, these tools are transforming the way we approach grant writing, using the power of AI to turn a complex, often arduous task into a more manageable, efficient, and successful endeavor.

  • Granted AI – https://grantedai.com/
  • Grantable – https://grantable.co/

Best free AI research tools

There are many different tools emerging online to help researchers streamline their research processes. There’s no need for convenience to come at a massive cost and break the bank.

The best free ones at the time of writing are:

  • Elicit – https://elicit.org
  • Connected Papers – https://www.connectedpapers.com/
  • Litmaps – https://www.litmaps.com ( 10% off Pro subscription using the code “STAPLETON” )
  • Consensus – https://consensus.app/

Wrapping up

The integration of artificial intelligence in the world of academic research is nothing short of revolutionary.

With the array of AI tools we’ve explored today – from research and mapping, literature review, peer-reviewed papers reading, scientific writing, to academic editing and grant writing – the landscape of research is significantly transformed.

The advantages that AI-powered research tools bring to the table – efficiency, precision, time saving, and a more streamlined process – cannot be overstated.

These AI research tools aren’t just about convenience; they are transforming the way we conduct and comprehend research.

They liberate researchers from the clutches of tedium and overwhelm, allowing for more space for deep exploration, innovative thinking, and in-depth comprehension.

Whether you’re an experienced academic researcher or a student just starting out, these tools provide indispensable aid in your research journey.

And with a suite of free AI tools also available, there is no reason to not explore and embrace this AI revolution in academic research.

We are on the precipice of a new era of academic research, one where AI and human ingenuity work in tandem for richer, more profound scientific exploration. The future of research is here, and it is smart, efficient, and AI-powered.

Before we get too excited however, let us remember that AI tools are meant to be our assistants, not our masters. As we engage with these advanced technologies, let’s not lose sight of the human intellect, intuition, and imagination that form the heart of all meaningful research. Happy researching!

Thank you to Ivan Aguilar – Ph.D. Student at SFU (Simon Fraser University), for starting this list for me!


Dr Andrew Stapleton has a Masters and PhD in Chemistry from the UK and Australia. He has many years of research experience and has worked as a Postdoctoral Fellow and Associate at a number of Universities. Although having secured funding for his own research, he left academia to help others with his YouTube channel all about the inner workings of academia and how to make it work for you.



Good Practices for Scientific Article Writing with ChatGPT and Other Artificial Intelligence Language Models


  • Consider breaking up the paragraph into smaller chunks. This can help with readability and make it easier for readers to follow your ideas.
  • Try to vary the sentence structure to avoid repetition. For example, you could rephrase the sentence “While ChatGPT and similar tools can be used to enhance scientific article writing and facilitate communication within the scientific community” to something like “ChatGPT and similar tools have the potential to enhance scientific article writing and improve communication among researchers”.
  • When introducing a new term, such as “paper mills” or “paper forges,” it’s helpful to define the term or provide some context to ensure that readers understand what you mean.
  • Consider providing more specific examples or evidence to support your argument. For instance, you could cite a study or news article that highlights the potential misuse of AI language models in scientific publishing.

Short Biography of the Author

Andres Castellanos-Gomez is a Tenured Scientist in the Spanish National Research Council (Consejo Superior de Investigaciones Científicas, CSIC). He explores novel two-dimensional materials and studies their mechanical, electrical, and optical properties with special interest on the application of these materials in nanomechanical and optoelectronic devices. He is the principal investigator of a prestigious ERC Starting Grant. Among other recognitions, he has been awarded with the highest National and Regional scientific recognitions to young scientists: the Premio Nacional de Investigación “Felisa Martín-Bravo” 2022 and the “Miguel Catalan” Award 2022. He was appointed Fellow of the International Association of Advanced Materials (IAAM) in 2020, and he has been included in the Highly Cited Researchers 2018–2022 lists of Clarivate/WOS and has been also recognized with the Young Researcher Award (experimental physics) of the Royal Physical Society of Spain (2016).



Share and Cite

Castellanos-Gomez, A. Good Practices for Scientific Article Writing with ChatGPT and Other Artificial Intelligence Language Models. Nanomanufacturing 2023, 3, 135-138. https://doi.org/10.3390/nanomanufacturing3020009


Using Generative AI for Scientific Research: A Quick User’s Guide

(Last updated: 5/03/2024)

If you’re just getting started with using generative AI (GenAI) in your research, start here.

This guide includes frequently asked questions and shows how GenAI can be used throughout the entire research process, based on published guidelines from journals, funding agencies, professional societies, and our own assessment of GenAI’s benefits and risks.

GenAI is a rapidly evolving technology, and we will update this guide as new information becomes available. Suggestions for improvements or additions? Email [email protected]. We look forward to developing this guide collaboratively with our research community.

Technical Guides for Using Generative AI


GenAI in Coding

This quick-start guide helps researchers with little programming experience learn to code with an AI assistant’s help; it assumes you have already chosen a programming language such as Python, JavaScript, or C++.


Using ChatGPT’s ‘Data Analysis’

Explore how to use ChatGPT 4’s ‘Data Analysis’ feature effectively. This guide covers code organization, error checking, data visualization, and language translation, maximizing the likelihood of accurate and efficient results.


Using Custom GPTs in ChatGPT 4

Explore how to use ChatGPT 4’s ‘Custom GPT’ feature effectively, which allows users to create custom versions of ChatGPT for specific tasks.

Using Generative AI for Writing

Can I use generative AI to write research papers?

The default stance on using generative AI for writing research papers should generally be NO, particularly for creative contributions, due to issues around authorship, copyright, and plagiarism. However, generative AI can be beneficial for editorial assistance, provided you are aware of what is acceptable at your target publication venue.

Generating text and images for publications in scientific journals raises issues of authorship, copyright and plagiarism, many of which are still unresolved. Therefore, this is a very controversial area and many journals and research conferences are updating their policies. If you want to do this, please read very carefully the guidelines for authors of your target journal.

Here are a few examples of new authorship guidelines. 

  • Springer Nature journals prohibit the use of generative AI to generate images for manuscripts; text generated by LLMs should be well documented, and AI is not granted authorship.
  • Science journals require full disclosure of the use of generative AI to generate text; generative AI-generated images and multimedia can be used only with explicit permission of their editors. AI is not granted authorship.
  • JAMA and the JAMA Network journals do not allow generative AI to be listed as an author. However, generative AI-generated content or assistance in writing/editing is allowed but should be reported in the manuscript.
  • Elsevier permits the use of AI tools to enhance text readability but not to create or alter scientific content. Authors should provide full disclosure of the use of AI. It prohibits the use of AI to generate or alter images, unless this is part of the research method. AI authorship is not allowed.
  • IEEE mandates disclosure of all AI-generated content in submissions, except for editing and grammar enhancement.
  • The International Conference on Machine Learning prohibits content generated by generative AI, unless it is part of the research study being described.

While direct generation of content by generative AI is problematic, its role in the earlier stages of writing can be advantageous. For instance, non-native English speakers may use generative AI to refine the language of their writing. Generative AI can also serve as a tool for providing feedback on writing, similar to a copy editor’s role, by improving voice, argument, and structure. This utility is distinct from using AI for direct writing. As long as the human author assumes full responsibility for the final content, such editing help from generative AI is increasingly being recognized as acceptable in most disciplines where language is not the primary scholarly contribution. However, conservative editorial policies at some venues may limit the use of such techniques in the short term.

Can I use generative AI to write grants?

This should be undertaken only with an understanding of the risks involved. The bottom line is that the investigator is signing off on the proposal and is promising to do the work if funded, and so has to take responsibility for every part of the proposal content, even if generative AI assisted in some parts.

The reasoning is similar to that for writing papers, as discussed above, except that there usually will not be copyright and plagiarism issues. Also, not many funding agencies have well-developed policies as yet in this regard. 

For example, although the National Institutes of Health (NIH) does not specifically prohibit the use of generative AI to write grants (they do prohibit use of generative AI technology in the peer review process), they state that an author assumes the risk of using an AI tool to help write an application, noting “[…] when we receive a grant application, it is our understanding that it is the original idea proposed by the institution and their affiliated research team.” If AI generated text includes plagiarism, fabricated citations or falsified information, the NIH “will take appropriate actions to address the non-compliance.” ( Source.)

Similarly, the National Science Foundation (NSF), in its notice dated December 14, 2023, addresses the use of generative AI in grant proposal preparation and the merit review process. While NSF acknowledges the potential benefits of AI in enhancing productivity and creativity, it imposes strict guidelines to safeguard the integrity and confidentiality of proposals. Reviewers are prohibited from uploading proposal content to non-approved AI tools, and proposers are encouraged to disclose the extent and manner of AI usage in their proposals. The NSF stresses that any breach in confidentiality or authenticity, especially through unauthorized disclosure via AI, could lead to legal liabilities and erosion of trust in the agency. (Source.)

The DOE requires authors to verify any citations suggested by generative AI, due to potential inaccuracies, and does not allow AI-based chatbots like ChatGPT to be credited as authors or co-authors.

Can I use generative AI to help me when I write a literature review section for my paper?

Generative AI can offer multiple advantages here. It can summarize a particular paper, saving you time and enabling you to cover a much larger number of publications in the limited time you have. It can also summarize the literature around certain research questions by searching through many papers.

However, you should consider a number of factors that may impact how much you can trust such reviews.

  • When generative AI encounters a request it lacks the information or knowledge to answer, it sometimes “makes up” a response. This “AI hallucination” is well documented, and many of us have probably experienced it. You are responsible for verifying the summaries that generative AI gives you.
  • Unlike human researchers, generative AI does not have the ability to evaluate the quality of the published work. Therefore, it will indiscriminately include publications of varying quality, perhaps also many studies that cannot be reproduced. 
  • A generative AI model has a knowledge cutoff date, so newer publications after the cutoff date will not be included in the responses that it gives you.
  • Other types of inaccuracies can also occur. Generative AI’s effectiveness depends on its training datasets. Even though enormous amounts of training data are now used for generative AI models, there is still no guarantee that the training is unbiased.

Also, please do keep in mind all the limitations discussed above regarding the use of generative AI to assist in writing research papers. Subject to those limitations, this seems to be a reasonable thing to do.

Can I use generative AI to write non-technical summaries, create presentations, and translate my work?

Generative AI can be beneficial for summarizing or translating your work, especially with its ability to adjust the tone of a text, making it easier to create brief but complete summaries that suit different types of readers. Several advanced generative AI models are designed specifically to transform scientific manuscripts into presentations. 

However, while using generative AI to summarize, present, or translate your work, be sure that you don’t input confidential information. You should also always verify that summaries, presentations, and translations created by generative AI accurately represent your work. Using generative AI for translation can be challenging if you are not proficient in both languages involved; in that case, consult a fluent speaker for verification. Also note that not all generative AI models are explicitly designed for translation tasks, so you should explore and identify the model that best aligns with your specific translation needs.

Using Generative AI to Improve Productivity

Can I use generative AI to review grant proposals or review papers?

No, you should not do this. The National Institutes of Health recently announced that it prohibits the use of generative AI to analyze and formulate critiques of grant proposals. This applies not only to generative AI systems that are publicly available, but also to systems hosted locally (such as a university’s own generative AI), as long as data may be shared with multiple individuals. The main rationale is that this would constitute a breach of confidentiality, which is essential in the grant review process. To use generative AI tools to evaluate and summarize grant proposals, or even to let them edit critiques, one would need to feed the AI system “substantial, privileged, and detailed information.” When we don’t know how an AI system will save, share, or use the information it is fed, we should not feed it such information.

Furthermore, expert review relies upon subject matter expertise, which a generative AI system could not be relied upon to have. So, it is unlikely that generative AI will produce a reliable and high-quality review.

For these reasons, we don’t recommend that you use generative AI for reviewing grant proposals or papers, even if the relevant publication venue or funding agency, unlike NIH, has not issued explicit guidance.

Can I use generative AI to write letters of support?

Generative AI can, in some situations, be useful to help you draft a letter, edit your draft, or adopt a certain tone. We are not aware of any explicit rules against this. However, please keep in mind the following:

  • You are still fully responsible for everything in the letter because you are still the author.
  • You should consider the issue of confidentiality. Is there confidential information in the letter? If so, generative AI should not “know” it, because, again, we do not know for sure what it does with the information that users feed it.
  • Texts written by generative AI tend to sound very generic. This is not good for letters of support, whose value often depends on providing very specific information and recommendations about the subject of the letter. You still need to ensure that the letter is one you feel comfortable sending and that it conveys the same level of support as a letter you had written yourself.

How can I use generative AI as a brainstorming partner in my research?

Generative AI can serve as an effective brainstorming partner in research. These systems can, when used appropriately, help generate a variety of ideas, perspectives, and potential solutions, and are particularly useful during the initial stages of research planning. For instance, a researcher can input a basic research concept into the AI system and receive suggestions on experimental approaches, potential methodologies, or alternative research questions. An example prompt might be:

“Analyze recent research on memory consolidation and the influence of emotions on learning and recall. Based on this analysis, generate new hypotheses for potential studies investigating neurobiological mechanisms.”

However, AI-generated ideas must be critically evaluated. While AI can offer diverse insights, these are based on existing data and may not always be novel or contextually appropriate. Researchers should use these suggestions as a starting point for further development rather than as definitive solutions.

Using Generative AI for Data Generation and Analysis

Can I use generative AI to write code?

Yes, provided you can read code! Generative AI can indeed output computer programs. But, just as with text, you may get code that looks plausible yet is erroneous. Since it is often easier to read code than to write it, you may still be better off having generative AI write code for you. We provide a guide on generating, editing and reviewing code using ChatGPT 4.0 here and a coding tutorial using local software such as GitHub Copilot here.

This applies not just to computer programs, but also to databases. You can have generative AI write SQL code to manage and query databases. In fact, in many cases you can do some minimal debugging just by running the code or query on known instances and checking that you get the right answers. While basic tests like these can catch many errors, remember that there is no guarantee your program will work on complex examples just because it worked on simple ones.
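Here is a minimal sketch of that kind of known-instance test, assuming a hypothetical AI-generated SQL query and a tiny in-memory table whose correct answer we can work out by hand; it illustrates the idea, not a complete testing strategy:

```python
import sqlite3

# Hypothetical query produced by a generative AI assistant; treat it as
# untrusted until it passes checks on instances with known answers.
ai_generated_query = """
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
ORDER BY department
"""

# Build a tiny in-memory database whose correct answer we know by hand.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "Biology", 50000.0), ("Bo", "Biology", 70000.0), ("Cy", "Chemistry", 60000.0)],
)

# Expected result computed manually: Biology averages 60000, Chemistry 60000.
expected = [("Biology", 60000.0), ("Chemistry", 60000.0)]

actual = conn.execute(ai_generated_query).fetchall()
assert actual == expected, f"AI-generated query failed the known-instance test: {actual}"
print("Query passed the basic test; this does not guarantee correctness on complex data.")
```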

Can I use generative AI for data analysis and visualization?

Yes. Generative AI models have been steadily improving at data analysis and visualization. We provide some examples of data analysis and visualization using ChatGPT 4.0 here.
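For illustration, the snippet below shows the shape of analysis-and-plot code you might ask a model to draft and then verify yourself; the file name and column names are hypothetical placeholders, and the point is that each step is simple enough to check by hand on a small slice of the data:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical dataset: replace "measurements.csv" and the column names
# ("group", "response") with your own.
df = pd.read_csv("measurements.csv")

# Per-group summary statistics, easy to spot-check manually before
# trusting the full run.
summary = df.groupby("group")["response"].agg(["mean", "std", "count"])
print(summary)

# A simple visualization of the same quantities.
summary["mean"].plot(kind="bar", yerr=summary["std"], capsize=4)
plt.ylabel("Mean response")
plt.title("Response by experimental group")
plt.tight_layout()
plt.savefig("response_by_group.png")
```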

Can I use generative AI as a substitute for human participants in surveys?

Using generative AI as a substitute for human participants in surveys is not advisable due to significant concerns regarding construct validity. Generative AI, while adept at processing and generating data, cannot authentically replicate the nuances of human behavior and opinions that are the purpose of surveying humans in research. 

However, generative AI can be valuable in the preliminary stages of survey design. It can assist in testing the clarity and structure of survey questions, helping address ambiguity and effectively capture the intended information. This application leverages AI’s capability to process language and simulate varied responses, providing insights into how questions may be interpreted by a diverse audience. In short, while generative AI’s use as a direct replacement for human survey participants is not recommended due to validity concerns, its role in enhancing survey design and testing is a viable and beneficial application.

Can generative AI be used for labeling data?

Generative AI can be employed for labeling, such as categorizing text and images. This application can streamline processes that are traditionally time-consuming and labor-intensive for human judges. However, the reliability of AI in these tasks requires careful consideration and validation on a case-by-case basis.

The key concern with AI-based judgment in labeling is its dependence on the quality and bias of training data. AI systems might replicate any inherent biases present in their training datasets, leading to skewed or inaccurate labeling. Researchers must validate the AI’s performance by comparing its output with human-labeled benchmarks to ensure accuracy and impartiality.
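A minimal sketch of such a validation, assuming a hypothetical human-annotated benchmark subset and the AI’s labels for the same items; scikit-learn supplies the agreement metrics:

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Hypothetical labels: a human-annotated benchmark and the labels a
# generative AI assigned to the same items.
human_labels = ["positive", "negative", "neutral", "positive", "negative"]
ai_labels    = ["positive", "negative", "positive", "positive", "negative"]

# Raw accuracy plus Cohen's kappa, which corrects for agreement
# expected by chance.
print("Accuracy:", accuracy_score(human_labels, ai_labels))
print("Cohen's kappa:", cohen_kappa_score(human_labels, ai_labels))

# If agreement is low, inspect the disagreements for systematic bias
# before trusting AI labels on the full dataset.
```

In practice the benchmark should be large enough, and sampled carefully enough, that low agreement on it is meaningful for the full dataset.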

Can I use generative AI to review data for errors and biases?

Yes! Generative AI can serve as a supplementary tool in the process of data quality assurance, assisting in the identification of errors, inconsistencies, or biases in datasets. Its capability to process extensive data rapidly enables it to spot potential issues that might be missed in manual reviews. Researchers should use generative AI as one component of a broader data review strategy; it is essential to corroborate AI-detected anomalies with manual checks and expert assessments.
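As one illustration, the generic screening checks below are the kind of code an AI assistant might draft for such a review; the file name and thresholds are hypothetical, and anything flagged would still need expert confirmation:

```python
import pandas as pd

# Hypothetical dataset; adjust the file name and checks to your data.
df = pd.read_csv("survey_data.csv")

# Flag missing values and exact duplicate rows.
print("Missing values per column:\n", df.isna().sum())
print("Duplicate rows:", df.duplicated().sum())

# Flag numeric values more than 3 standard deviations from the mean;
# the 3-sigma threshold is a common but arbitrary convention.
numeric = df.select_dtypes("number")
zscores = (numeric - numeric.mean()) / numeric.std()
outliers = zscores.abs().gt(3).any(axis=1)
print("Potential outlier rows:\n", df[outliers])
```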

Reporting the Use of Generative AI

How do I cite content created or assisted by generative AI?

You used generative AI in the course of writing a research paper. How do you give it credit? And how do you inform the reader of your paper about its use?

Generative AI should not be listed as a co-author, but its use must be noted in the paper, including appropriate detail, e.g., about specific prompts and responses. The Committee on Publication Ethics has a succinct and incisive analysis.

The use of generative AI should be disclosed in the paper, along with a description of the places and manners of use. Typically, such disclosures will be in a “Methods” section of the paper, if it has one. If you rely on generative AI output, you should cite it, just as you would cite a web page lookup or a personal communication. Keep in mind that some conversation identifiers may be local to your account, and hence not useful to your reader. Good citation style recommendations have been suggested by the American Psychological Association (APA) and the Chicago Manual of Style.

How do I report the use of generative AI models in a paper?

We provide recommendations on reporting the use of generative AI in research here .

Considerations for Choosing Generative AI Models

How do I decide which generative AI to use in research?

The most important factor is which generative AI system (what data, what model, what computing requirements) fits well with your research questions. In addition, there are some general considerations. 

Open source. “Open source” describes software that is published alongside its source code for use and exploration by anyone. This matters because most generative AI models are not developed locally by the researchers themselves (unlike typical machine learning models). Open-source generative AIs, as well as generative AI systems trained with publicly accessible data, can be advantageous for researchers who would like to fine-tune generative AI models, scrutinize the security and functionality of the system, and improve the explainability and interpretability of the models.

Accuracy and precision . When outputs of a generative AI can be verified (for example, if it is used in data analytics), you can gauge the efficacy of a generative AI by its precision and accuracy. 

Cost. Some models require subscriptions to APIs (application programming interfaces) for research use. Other models can be deployed locally, but come with integration costs and potentially ongoing costs for maintenance and updates. Even when selecting otherwise free models, you might need to cover the cost of an expert to set up and maintain the model.

Can I customize generative AI models?

Yes. Some commercial generative AI developers now provide ways for users to easily customize the models, provide their own data and documents to fine tune the models, and specify the styles of model outputs. See our Custom GPT guide for more details.

What uniquely generative AI issues should I consider when I adopt generative AI in my research?

The nature of generative AI gives rise to a number of considerations that the entire research community is trying to grapple with. Transparency and accountability about the generative AI’s operations and decision making processes can be difficult when you operate a closed-source system.

We invite you to think about the following carefully, and be aware that many other issues might arise.

Data privacy concerns. Data privacy is more complicated with generative AI when using cloud-based services, as users don’t know for certain what happens to their input data and whether it could be retained for training future AI models. One way to circumvent these privacy concerns is to use locally deployed generative AI models that run entirely on your own hardware and do not send data back to the AI provider. An example is Nvidia ChatRTX.
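For illustration, here is a minimal sketch of a locally run model using the Hugging Face transformers library; this is a different route than ChatRTX but the same privacy idea, and the model name is just a small example (the weights are downloaded once, after which generation happens entirely on your machine):

```python
from transformers import pipeline

# A locally run text-generation pipeline; "gpt2" is only a small example
# model, chosen because it fits on most machines.
generator = pipeline("text-generation", model="gpt2")

prompt = "Summarize the main limitations of survey-based research:"
result = generator(prompt, max_new_tokens=80, do_sample=True)
print(result[0]["generated_text"])

# No prompt text is sent to an external API, so confidential material
# never leaves your machine once the model weights are cached locally.
```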

Bias in data. Bias in data, and consequently bias in the AI system’s output, can be a major issue because generative AI is trained on large datasets that you usually cannot access or assess, and it may inadvertently learn and reproduce biases, stereotypes, and majority views present in these data. Moreover, many generative AI models are trained on overwhelmingly English texts, Western images, and other similarly skewed data. Non-Western and non-English-speaking cultures, as well as work by minorities and non-English speakers, are seriously underrepresented in the training data. The results created by generative AI are therefore culturally biased. This should be a major consideration when assessing whether generative AI is suitable for your research.

AI hallucination. Generative AI can produce outputs that are factually inaccurate or entirely incorrect, uncorroborated, nonsensical, or fabricated. These phenomena are dubbed “hallucinations.” It is therefore essential to verify generative AI output against reliable and credible sources.

Plagiarism. Generative AI can only generate new content based on, or drawn from, the data that it is trained on. There is therefore a likelihood that it will produce outputs similar to the training data, even to the point of being regarded as plagiarism if the similarity is too high. As such, you should confirm (e.g., by using plagiarism detection tools) that generative AI outputs are not plagiarized but instead “learned” from various sources in the manner humans learn without plagiarizing.

Prompt Engineering. The advent of generative AI has created a new human activity, prompt engineering, because the quality of generative AI responses is heavily influenced by the user input or “prompt”. There are courses dedicated to this concept. However, you will need to experiment with crafting prompts that are clear, specific, and appropriately structured so that generative AI will generate output with the desired style, quality, and purpose.
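As a small, hypothetical illustration, a structured prompt that states a role, a task, constraints, and an output format often outperforms an ad-hoc question; the template below is an example to adapt, not a canonical recipe:

```python
# Hypothetical prompt template: role, task, constraints, output format.
def build_prompt(topic: str, audience: str, max_words: int) -> str:
    return (
        "You are an experienced academic editor.\n"
        f"Task: draft an outline for a literature review on '{topic}'.\n"
        f"Audience: {audience}.\n"
        f"Constraints: at most {max_words} words; use numbered sections; "
        "flag any claim that would need a citation with [CITE].\n"
        "Output format: a numbered outline only, no preamble."
    )

print(build_prompt("memory consolidation", "graduate students in neuroscience", 300))
```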

Knowledge Cutoff Date. Many generative AI models are trained on data up to a specific date, and are therefore unaware of any events or information produced after it. For example, a generative AI trained on data up to March 2019 would be unaware of COVID-19 and the impact it had on humanity, or of who the current monarch of Britain is. You need to know the cutoff date of the generative AI model that you use in order to assess which research questions are appropriate for it.

Model Continuity. When you use generative AI models developed by external entities or vendors, you need to consider the possibility that the vendor might one day discontinue the model. This could have a big impact on the reproducibility of your research.

Security. As with any computer or online system, a generative AI system is susceptible to security breaches and attacks. We have already mentioned the issue of confidentiality and privacy as you input information or give prompts to the system. But malicious attacks could be a bigger threat. For example, a new type of attack, prompt injection, deliberately feeds harmful or malicious content into the system to manipulate the results that it generates for users. Generative AI developers are designing processes and technical solutions against such risks (for example, see OpenAI’s GPT-4 System Card and disallowed usage policy). But as a user, you also need to be aware of what is at risk, follow the guidelines of your local IT providers, and do due diligence on the results that a generative AI creates for you.

Lack of Standardized Evaluations. The AI Index Report 2024 found that leading developers test their models against different responsible AI benchmarks, making it challenging to systematically compare the risks and limitations of AI models. Be wary when models tout confidence in certain evaluation measures, as the measures may not have been fully tested.

Related Resources

Additional Reading

Many recommendations, guidelines and comments are out there regarding the use of Generative AI in research and in other lines of work. Here are a few examples.

  • Best Practices for Using AI When Writing Scientific Manuscripts: Caution, Care, and Consideration: Creative Science Depends on It. Jillian M. Buriak et al. ACS Nano (2023)
  • Science journals set new authorship guidelines for AI-generated text. Jennifer Harker. National Institute of Environmental Health Sciences (2023)
  • NIH prohibits the use of generative AI in peer review (2023)
  • Fighting reviewer fatigue or amplifying bias? Considerations and recommendations for use of ChatGPT and other large language models in scholarly peer review. Mohammad Hosseini and Serge P. J. M. Horbach. Research Integrity and Peer Review (2023)
  • Nonhuman “Authors” and Implications for the Integrity of Scientific Publication and Medical Knowledge. Annette Flanagin et al. JAMA (2023)
  • 2024 AI Index Report. Stanford University (2024)

For more content including manuscripts, use of generative AI in research and more – see our generative AI resource page . 


How to Use Google Bard AI to Write a Scientific Research Paper

Google Bard AI is a powerful tool that can be used to accelerate scientific research. This article will discuss how to use Google Bard AI to write a scientific research paper, including tips on brainstorming ideas, conducting literature reviews, writing your paper, and editing your paper.

Dr. Somasundaram R

As technology advances, the role of Artificial Intelligence (AI) in the research field is becoming more and more prominent. One of the latest developments in this regard is Google’s latest AI-powered language model, Google Bard. Google Bard is an AI tool that is designed to help researchers write scientific research papers more efficiently and effectively.

It uses advanced algorithms to analyze research data and generate a coherent research paper that follows the standard scientific writing style. In this article, iLovePhD will discuss how to use Google Bard AI to write a scientific research paper.

Understanding the Basics of Google Bard AI:

Google Bard is an AI tool developed by Google that uses advanced Natural Language Processing (NLP) algorithms to analyze research data and generate a research paper.

It is a deep learning-based AI model that has been trained on a large corpus of scientific research papers to understand the structure, language, and tone of scientific writing.

Google Bard AI is designed to automate the tedious and time-consuming task of writing a scientific research paper by analyzing research data, identifying key findings, and presenting them in a coherent and understandable manner.

Steps to Use Google Bard AI to Write a Scientific Research Paper:

1. Gather Data and Create an Outline:

The first step in using Google Bard AI to write a scientific research paper is to gather all the necessary data and create an outline. The data should include all the relevant research materials, including research articles, scientific papers, and other relevant sources.

Once you have gathered all the necessary data, create an outline that includes the main sections of your paper, such as the introduction, methodology, results, discussion, and conclusion. The outline should also include the key points you want to make in each section.

2. Input Data into Google Bard AI:

The next step is to input the data into Google Bard AI. To do this, go to the Google Bard website and sign in with your Google account.

Once you have logged in, click on the “Create New Document” button and select the option to create a scientific research paper. This will open a new document where you can input your research data.

3. Customize the Settings:

Before you start writing, you can customize the settings of Google Bard AI according to your preferences. For example, you can choose the language, tone, and style of the writing. You can also set the level of complexity and the length of the paper.

4. Start Writing:

Once you have customized the settings, you can start writing. Google Bard AI will analyze your research data and start generating a research paper. You can write your paper in sections, or you can write it all at once. As you write, Google Bard AI will suggest changes and improvements to your writing. You can accept or reject these suggestions according to your preferences.

5. Edit and Proofread:

Once you have finished writing, you should edit and proofread your paper carefully. Google Bard AI is not perfect, and there may be errors or inconsistencies in the writing. You should check the grammar, spelling, and punctuation carefully and make any necessary corrections.

6. Submit your Paper:

Once you are satisfied with your paper, you can submit it for review. You can download your paper as a Word document or a PDF file and submit it to a scientific journal or conference.

The Benefits and Limitations of Using Google Bard AI for Scientific Research

Google Bard is a large language model, also known as a conversational AI or chatbot trained to be informative and comprehensive.

It is trained on a massive amount of text data and is able to communicate and generate human-like text in response to a wide range of prompts and questions.

For example, Google Bard can provide summaries of factual topics or create stories.

Benefits of Using Google Bard AI to Write a Scientific Research Paper:

  • Saves Time:

One of the biggest benefits of using Google Bard AI to write a scientific research paper is that it saves time. Writing a research paper can be a tedious and time-consuming task, especially if you are not familiar with the scientific writing style. With Google Bard AI, you can automate the writing process and generate a research paper quickly and efficiently.

  • Improves Writing Quality:

Another benefit of using Google Bard AI is that it improves the quality of your writing. Google Bard AI is trained on a large corpus of scientific research papers, so it is familiar with the standard scientific writing style. This means that the AI can help you write more coherently and effectively by suggesting changes to your writing.

  • Helps with Data Analysis:

Google Bard AI also helps with data analysis. The AI can analyze research data and identify key findings that can be included in your research paper. This saves time and effort in manually analyzing data and ensures that your research is comprehensive and accurate.

  • Provides Suggestions and Improvements:

Google Bard AI provides suggestions and improvements to your writing. This can be helpful in improving the quality of your research paper and ensuring that it meets the standards of the scientific community. You can accept or reject these suggestions based on your preferences.

  • Customizable Settings:

Google Bard AI also has customizable settings that allow you to adjust the language, tone, and style of your writing. This allows you to tailor your writing to your audience and ensure that your research paper is effective in communicating your findings.

  • Accelerating the Pace of Research:

Bard can be used to automate tasks that are typically time-consuming and labor-intensive, such as data collection, analysis, and writing. This can free up researchers to focus on more creative and strategic aspects of their work.

  • Improving the Quality of Research:

Bard can be used to help researchers identify and correct errors in their work. It can also be used to provide researchers with access to a wider range of information, which can help them to develop more comprehensive and informed research proposals and findings.

  • Making Research More Accessible:

Bard can be used to make scientific research more accessible to a wider range of people. This can be done by providing a platform for researchers to share their work with others and by making it easier for people to learn about scientific research.

Challenges of Using Google Bard AI to Write a Scientific Research Paper:

  • Limited Vocabulary:

One of the challenges of using Google Bard AI is that it has a limited vocabulary. This means that the AI may not be able to understand or accurately represent complex scientific concepts or terminology. It is important to review the writing generated by the AI carefully and make any necessary corrections or additions.

  • May Not Capture Your Writing Style:

Google Bard AI is designed to mimic the standard scientific writing style. However, it may not capture your individual writing style or tone. It is important to review the writing generated by the AI and make any necessary adjustments to ensure that it accurately represents your research and writing style.

  • Limited Control over the Writing Process:

When using Google Bard AI, you have limited control over the writing process. While the AI can generate a research paper quickly and efficiently, it may not always capture the nuances or complexities of your research. It is important to review the writing generated by the AI carefully and make any necessary corrections or additions.

  • Bard Is Still Under Development:

Bard is still under development, so it is not always perfect. It can sometimes make mistakes, such as providing inaccurate or incomplete information. It is important for researchers to be aware of Bard’s limitations and to use it responsibly.

  • Bard Is Not a Replacement for Human Researchers:

Bard is a powerful tool, but it is not a replacement for human researchers. It is important for researchers to use Bard in conjunction with their own knowledge and expertise.

  • Bard Can Be Biased:

Bard is trained on a massive amount of text data, which may contain biases. It is important for researchers to be aware of these biases and to take steps to mitigate them.

Overall, Google Bard is a powerful tool that can be used to accelerate and improve the quality of scientific research. However, it is important to be aware of its limitations and to use it responsibly.

By analyzing research data and generating a coherent research paper, Google Bard AI saves time and improves the quality of writing. While there are some challenges to using Google Bard AI, such as limited vocabulary and control over the writing process, the benefits of using the AI far outweigh the challenges.

Researchers should consider using Google Bard AI to streamline the writing process and improve the quality of their research papers.



Organizing Your Social Sciences Research Paper

Generative AI and Writing


Research Writing and Generative AI Large Language Models

A rapidly evolving phenomenon impacting higher education is the availability of generative artificial intelligence systems [such as Chat Generative Pre-trained Transformer or ChatGPT]. These systems have been developed from scanning text from millions of books, web sites, and other sources to enable algorithms within the system to learn patterns in how words and sentences are constructed. This allows the platforms to respond to a broad range of questions and prompts, generate stories, compose essays, create lists, and more. Generative AI systems are not actually thinking or understanding like a human, but they are good at mimicking written text based on what they have learned from the sources of input data used to build and enhance their artificial intelligence algorithms, protocols, and standards.

As such, generative AI systems [a.k.a., “Large Language Models”] have emerged, depending on one’s perspective, as either a threat or an opportunity in how faculty create or modify class assignments and how students approach the task of writing a college-level research paper. We are in the early stages of understanding how LLMs may impact learning outcomes associated with information literacy, i.e., fluency in applying the skills needed to effectively identify, gather, organize, critically evaluate, interpret, and report information. However, before this is fully understood, Large Language Models will continue to improve and become more sophisticated, as will the academic integrity detection programs used to identify AI-generated text in student papers.

When you are assigned to write a research paper, it is up to your professor whether using ChatGPT is permitted or not. Some professors embrace using these systems as part of an in-class writing exercise to help understand their limitations, while others warn against their use because of their current defects and biases. That said, the future of information seeking using LLMs means that the intellectual spaces associated with research and writing will likely collapse into a single online environment in which students will be able to perform in-depth searches for information connected to the Libraries' many electronic resources.

As LLMs quickly become more sophisticated, here are some potential ways generative artificial intelligence programs could facilitate organizing and writing your social sciences research paper:

  • Explore a Topic – develop a research problem related to the questions you have about a general subject of inquiry.
  • Formulate Ideas – obtain background information and explore ways to place the research problem within specific contexts.
  • Zero in on Specific Research Questions and Related Sub-questions – create a query-based framework for how to investigate the research problem.
  • Locate Sources to Answer those Questions – begin the initial search for sources concerning your research questions.
  • Obtain Summaries of Sources – build a synopsis of the sources to help determine their relevance to the research questions underpinning the problem.
  • Outline and Structure an Argument – present information that assists in formulating an argument or an explanation for a stated position.
  • Draft and Iterate on a Final Essay – create a final essay based on a process of repeating the action of text generation on the results of each prior action [i.e., ask follow up questions to build on or clarify initial results].

Despite their power to create text, generative AI systems are far from perfect, and their ability to “answer” questions can be misleading, deceiving, or outright false. Described below are some current problems adapted from an essay written by Bernard Marr at Forbes and reiterated by researchers studying LLMs and writing. These issues focus on problems with using ChatGPT, but they are applicable to any current Large Language Model program.

  • Not Connected to the Internet . Although generative AI systems may appear to possess a significant amount of information, most LLMs are currently not mining the Internet for that information [note that this is changing quickly; for example, an AI chatbot feature is now embedded in Microsoft’s Bing search engine, but you'll probably need to pay for this feature in the future]. Without a connection to the Internet, LLMs cannot provide real-time information about a topic. As a result, the scope of research is limited, and any new developments in a particular field of study will not be included in the responses. In addition, the LLMs can only accept input in text format; other forms of knowledge, such as videos, web sites, audio recordings, or images, are excluded from the inquiry prompts.
  • The Time-consuming Consequences of AI Generated Hallucinations . If proofreading AI-generated text results in discovering nonsensical information or an invalid list of scholarly sources [e.g., the title of a book is not in the library catalog or found anywhere online], you obviously must correct these errors before handing in your paper. The challenge is that you have to replace nonsensical or false statements with accurate information, and you must support any AI-generated declarative statements [e.g., "Integrated reading strategies are widely beneficial for children in middle school"] with citations to valid academic research that supports the argument. This requires reviewing the literature to locate real sources and real information, which is time-consuming and challenging if you didn't actually compose the text. And, of course, if your professor asks you to show what page in a book or journal article you got the information from to support a generated statement of fact, that's a problem. Given this, ChatGPT and other systems should be viewed as a help tool and never a shortcut to actually doing the work of investigating a research problem. A small sketch of how to check suspect references appears after this list.
  • Trouble Generating Long-form, Structured Content . ChatGPT and other systems are inadequate at producing long-form content that follows a particular structure, format, or narrative flow. The models are capable of creating coherent and grammatically correct text and, as a result, they are currently best suited for generating shorter pieces of content like summaries of topics, bullet point lists, or brief explanations. However, they are poor at creating a comprehensive, coherent, and well-structured college-level research paper.
  • Limitations in Handling Multiple Tasks . Generative AI systems perform best when given a single task or objective to focus on. If you ask LLMs to perform multiple tasks at the same time [e.g., a question that includes multiple sub-questions], the models struggle to prioritize them, which will lead to a decrease in the accuracy and reliability of the results.
  • Biased Responses . This is important to understand. While ChatGPT and other systems are trained on a large set of text data, that data has not been widely shared so that it can be reviewed and critically analyzed. You can ask the systems what sources they are using, but any responses cannot be independently verified. Therefore, it is not possible to identify any hidden biases or prejudices that exist within the data [i.e., it doesn't cite its sources]. This means the LLM may generate responses that are biased, discriminatory, or inappropriate in certain contexts.
  • Accuracy Problems or Grammatical Issues . The sensitivity to typographical errors, grammatical errors, and misspellings is currently very limited in LLMs. The models may produce responses that are technically correct, but they may not be entirely accurate in terms of context or relevance. This limitation can be particularly challenging when processing complex or specialized information where accuracy and precision are essential. Given this, never take the information that is generated at face value; always proofread and verify the results!
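One practical sketch of checking suspect AI-generated references, assuming the titles are pasted into a script: query the public Crossref API and treat a missing match as a red flag to investigate by hand. (The example title is hypothetical, and a match alone does not prove the source actually supports the claim.)

```python
import requests

# Hypothetical AI-suggested reference titles to verify against Crossref.
suspect_titles = [
    "Integrated reading strategies in middle school: a meta-analysis",
]

for title in suspect_titles:
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 1},
        timeout=30,
    )
    items = resp.json()["message"]["items"]
    if items:
        top = items[0]
        print(title, "->", top.get("title", ["?"])[0], "| DOI:", top.get("DOI"))
    else:
        print(title, "-> no match found; verify manually")
```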

As they currently exist, ChatGPT and other Large Language Models truly are artificial in their intelligence. They cannot express thoughts, feelings, or other affective constructs that help a reader intimately engage with the author's written words; the output contains text, but the systems are incapable of producing creative expressions or thoughts, such as conveying the idea of willful deception and other narrative devices that you might find in a poem or song lyric. Although creative devices such as metaphors, idioms, imagery, or subtleties in narrative rhythm, style, or voice are rarely used in academic writing, this does illustrate that personalizing the way you present your research [e.g., sharing a personal story relating to the significance of the topic or being asked to write a reflective paper] cannot be generated artificially.

Ethical Considerations

In the end, the ethical choice of whether to use ChatGPT or similar platforms to help write your research paper is up to you; it’s an introspective negotiation between you and your conscience. As noted by Bjork (2023) and others, though, it is important to keep in mind the overarching ethical problems related to the use of LLMs. These include:

  • LLMs Do Not Understand the Meaning of Words . Without meaning as a guide, these systems use algorithms that rely on formulating context clues, stylistic structures, writing forms, linguistic patterns, and word frequency in determining how to respond to queries. This functionality means that, by default, LLMs perpetuate dominant modes of writing and language use while minimizing or hiding less common ones. As a result,...
  • LLMs Prioritize Standard American English . White English-speaking men have dominated most writing-intensive sectors of the knowledge economy, such as journalism, law, politics, medicine, academia, and, perhaps most importantly, computer programming. As a result, writers and speakers of African American English, Indigenous English, and other sociolinguistic dialects, each with their own grammar, lexicon, slang, and history of resistance within the dominant culture, are penalized and shamed for writing as they speak. The default functionality and outputs of LLMs, therefore, can privilege forms of English writing developed primarily by the dominant culture.
  • LLMs Do Not Protect User Privacy . ChatGPT and other platforms record and retain the entire content of your conversations with the systems. This means any information you enter, including personal information or, for example, any documents you ask the systems to revise, is retained and cannot be removed. Although the American Data Privacy and Protection Act was considered by the 117th Congress, there is no federal privacy law that regulates how these for-profit companies can store, use, or possibly sell information entered into their platforms. Given this, it is highly recommended that personal information never be included in any queries.

NOTE:   If your professor allows you to use generative AI programs or you decide on your own to use an LLM for a writing assignment, then this fact should be cited in your research paper, just as any other source of information used to write your paper should be acknowledged. Why? Because unlike grammar or citation tools, such as Grammarly or Citation Machine that correct text you've already written, generative AI programs are creating new content that is not in your own words. Currently, the American Psychological Association (APA), Modern Language Association (MLA) and the Chicago Manual of Style provide recommendations on how to cite generated text.

ANOTHER NOTE: LLMs have significant deficiencies, so using them still requires thorough proofreading and source verification, an ability to discern quality information from misleading, false, irrelevant, or even made-up information, a capacity to interpret and critically analyze what you have found, and the skills required to extrapolate meaning from the research you have conducted. For help with any or all of these elements of college-level research and writing, you should still contact a librarian.

YET ANOTHER NOTE: Researchers are finding early evidence suggesting that over-reliance on ChatGPT and other LLM platforms for even the simplest writing task may, over time, undermine confidence in a student's own writing ability. Just like getting better at giving a class presentation or working on a group project, good writing is an acquired skill that can only be improved through the act of doing; the more you write, the more comfortable and confident you become expressing your own ideas, opinions, and judgments applied to the problem you have researched. Substituting LLMs for your own voice can inhibit your growth as a writer, so give yourself room to think and write creatively and with confidence by accepting LLMs as a tool rather than a definitive source of text.

For more information about Generative AI platforms and guidance on their ethical use in an academic setting, review the USC Libraries' Using Generative AI in Research guide for students and faculty. For an introduction to the limitations and potential pitfalls of generative AI text generators applied to law, GO HERE.

  • Jones, Mike and Curtis Fletcher. “Introduction to ChatGPT for Library Professionals.” USC Libraries, Library Forum, May 18, 2023
  • Aikins, Ross and Albert Kuo. “What Students Said About the Spring of ChatGPT.” Inside Higher Education, September 3, 2023
  • Baugh, John. “Linguistic Profiling across International Geopolitical Landscapes.” Dædalus 152 (Summer 2023): 167-177
  • “ChatGPT.” Library, Wesleyan University
  • Bjork, Collin. “ChatGPT Threatens Language Diversity.” The Conversation, February 9, 2023
  • “Understanding AI Writing Tools and their Uses for Teaching and Learning at UC Berkeley.” Center for Teaching & Learning, University of California, Berkeley
  • Ellis, Amanda R. and Emily Slade. “A New Era of Learning: Considerations for ChatGPT as a Tool to Enhance Statistics and Data Science Education.” Journal of Statistics and Data Science Education 31 (2023): 1-10
  • Ray, Partha Pratim. “ChatGPT: A Comprehensive Review on Background, Applications, Key Challenges, Bias, Ethics, Limitations and Future Scope.” Internet of Things and Cyber-Physical Systems (2023)
  • Uzun, Levent. “ChatGPT and Academic Integrity Concerns: Detecting Artificial Intelligence Generated Content.” Language Education and Technology 3, no. 1 (2023)
  • Lund, Brady D. et al. “ChatGPT and a New Academic Reality: Artificial Intelligence Written Research Papers and the Ethics of the Large Language Models in Scholarly Publishing.” Journal of the Association for Information Science and Technology 74 (February 2023): 570-581
  • Rasul, Tareq et al. “The Role of ChatGPT in Higher Education: Benefits, Challenges, and Future Research Directions.” Journal of Applied Learning and Teaching 6 (2023)
  • Rudolph, Jürgen, Samson Tan, and Shannon Tan. “ChatGPT: Bullshit Spewer or the End of Traditional Assessments in Higher Education?” Journal of Applied Learning and Teaching 6, no. 1 (2023): 342-362
  • Marr, Bernard. “The Top 10 Limitations Of ChatGPT.” Forbes, March 3, 2023. https://www.forbes.com/sites/bernardmarr/2023/03/03/the-top-10-limitations-of-chatgpt/?sh=41ae78e8f355
  • “Thinking about ChatGPT?” Academic Integrity at UBC, Office of the Provost and Vice-President Academic, University of British Columbia


Mapping the Increasing Use of LLMs in Scientific Papers

Scientific publishing lays the foundation of science by disseminating research findings, fostering collaboration, encouraging reproducibility, and ensuring that scientific knowledge is accessible, verifiable, and built upon over time. Recently, there has been immense speculation about how many people are using large language models (LLMs) like ChatGPT in their academic writing, and to what extent this tool might have an effect on global scientific practices. However, we lack a precise measure of the proportion of academic writing substantially modified or produced by LLMs. To address this gap, we conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on arXiv, bioRxiv, and the Nature portfolio journals, using a population-level statistical framework to measure the prevalence of LLM-modified content over time. Our statistical estimation operates at the corpus level and is more robust than inference on individual instances. Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers (up to 17.5%). In comparison, Mathematics papers and the Nature portfolio showed the least LLM modification (up to 6.3%). Moreover, at an aggregate level, our analysis reveals that higher levels of LLM modification are associated with papers whose first authors post preprints more frequently, papers in more crowded research areas, and papers of shorter lengths. Our findings suggest that LLMs are being broadly used in scientific writing.

1 Introduction

Since the release of ChatGPT in late 2022, anecdotal examples of both published papers (Okunytė, 2023; Deguerin, 2024) and peer reviews (Oransky & Marcus, 2024) which appear to be ChatGPT-generated have inspired humor and concern. Increased attention to ChatGPT use by multilingual scholars has also brought to the fore important conversations about entrenched linguistic discrimination in academic publishing (Khanna et al., 2022). While certain tells, such as “regenerate response” (Conroy, 2023b;a) and “as an AI language model” (Vincent, 2023), found in published papers indicate modified content, less obvious cases are nearly impossible to detect at the individual level (Else, 2023; Gao et al., 2022). Liang et al. (2024) present a method for detecting the percentage of LLM-modified text in a corpus beyond such obvious cases. Applied to scientific publishing, the importance of this at-scale approach is two-fold: first, rather than looking at LLM use as a type of rule-breaking on an individual level, we can begin to uncover structural circumstances which might motivate its use. Second, by examining LLM use in academic publishing at scale, we can capture epistemic and linguistic shifts, minuscule at the individual level, which become apparent with a bird's-eye view.

Measuring the extent of LLM use in scientific publishing has urgent applications. Concerns about accuracy, plagiarism, anonymity, and ownership have prompted some prominent scientific institutions to take a stance on the use of LLM-modified content in academic publications. The International Conference on Machine Learning (ICML) 2023, a major machine learning conference, prohibited the inclusion of text generated by LLMs like ChatGPT in submitted papers, unless the generated text is used as part of the paper’s experimental analysis (ICML, 2023). Similarly, the journal Science announced an update to its editorial policies, specifying that text, figures, images, or graphics generated by ChatGPT or any other LLM tools cannot be used in published works (Thorp, 2023). Taking steps to measure the extent of LLM use can offer a first step in identifying risks to the scientific publishing ecosystem. Furthermore, exploring the circumstances in which LLM use is high can offer publishers and academic institutions useful insight into author behavior. Sites of high LLM use can act as indicators of structural challenges faced by scholars. These range from “publish or perish” pressures that encourage rapid production of papers to concerns about linguistic discrimination that might lead authors to use LLMs as prose editors.

We conduct the first systematic, large-scale analysis to quantify the prevalence of LLM-modified content across multiple academic platforms, extending a recently proposed, state-of-the-art distributional GPT quantification framework (Liang et al., 2024) for estimating the fraction of AI-modified content in a corpus. Throughout this paper, we use the term “LLM-modified” to refer to text content substantially updated by ChatGPT beyond basic spelling and grammatical edits. Modifications we capture in our analysis could include, for example, summaries of existing writing or the generation of prose based on outlines.

A key characteristic of this framework is that it operates on the population level, without the need to perform inference on any individual instance. As validated in the prior paper, the framework is orders of magnitude more computationally efficient and thus scalable, produces more accurate estimates, and generalizes better than its counterparts under significant temporal distribution shifts and other realistic distribution shifts.

We apply this framework to the abstracts and introductions (Figures 1 and 7) of academic papers across multiple academic disciplines, including arXiv, bioRxiv, and 15 journals within the Nature portfolio, such as Nature, Nature Biomedical Engineering, Nature Human Behaviour, and Nature Communications. Our study analyzes a total of 950,965 papers published between January 2020 and February 2024, comprising 773,147 papers from arXiv, 161,280 from bioRxiv, and 16,538 from the Nature portfolio journals. The papers from arXiv cover multiple academic fields, including Computer Science, Electrical Engineering and Systems Science, Mathematics, Physics, and Statistics. These datasets allow us to quantify the prevalence of LLM-modified academic writing over time and across a broad range of academic fields.

Our results indicate that the largest and fastest growth was observed in Computer Science papers, with α reaching 17.5% for abstracts and 15.3% for introductions by February 2024. In contrast, Mathematics papers and the Nature portfolio showed the least increase, with α reaching 4.9% and 6.3% for abstracts and 3.5% and 6.4% for introductions, respectively. Moreover, our analysis reveals at an aggregate level that higher levels of LLM modification are associated with papers whose first authors post preprints more frequently and with papers of shorter lengths. Results also show that LLM-modified papers lie closer to their nearest neighboring papers in the embedding space, which could indicate higher use in more crowded fields of study, or that generated text is flattening writing diversity.

2 Related Work

GPT Detectors

Various methods have been proposed for detecting LLM-modified text, including zero-shot approaches that rely on statistical signatures characteristic of machine-generated content (Lavergne et al., 2008; Badaskar et al., 2008; Beresneva, 2016; Solaiman et al., 2019; Mitchell et al., 2023a; Yang et al., 2023a; Bao et al., 2023; Tulchinskii et al., 2023) and training-based methods that finetune language models for binary classification of human vs. LLM-modified text (Bhagat & Hovy, 2013; Zellers et al., 2019; Bakhtin et al., 2019; Uchendu et al., 2020; Chen et al., 2023; Yu et al., 2023; Li et al., 2023; Liu et al., 2022; Bhattacharjee et al., 2023; Hu et al., 2023a). However, these approaches face challenges such as the need for access to LLM internals, overfitting to training data and language models, vulnerability to adversarial attacks (Wolff, 2020), and bias against non-dominant language varieties (Liang et al., 2023a). The effectiveness and reliability of publicly available LLM-modified text detectors have also been questioned (OpenAI, 2019; Jawahar et al., 2020; Fagni et al., 2021; Ippolito et al., 2019; Mitchell et al., 2023b; Gehrmann et al., 2019; Heikkilä, 2022; Crothers et al., 2022; Solaiman et al., 2019; Kirchner et al., 2023; Kelly, 2023), with the theoretical possibility of accurate instance-level detection being debated (Weber-Wulff et al., 2023; Sadasivan et al., 2023; Chakraborty et al., 2023). In this study, we apply the recently proposed distributional GPT quantification framework (Liang et al., 2024), which estimates the fraction of LLM-modified content in a text corpus at the population level, circumventing the need to classify individual documents or sentences and improving upon the stability, accuracy, and computational efficiency of existing approaches. A more comprehensive discussion of related work can be found in Appendix G.

3 Background: the distributional LLM quantification framework

We adapt the distributional LLM quantification framework from Liang et al. (2024) to quantify the use of AI-modified academic writing. The framework consists of the following steps:

1. Mixture model. Observed documents are modeled as draws from a mixture of human-written and LLM-modified text, $\mathcal{D}_{\alpha}(X) = (1-\alpha)\,\mathcal{P}(X) + \alpha\,\mathcal{Q}(X)$, where $\mathcal{P}$ and $\mathcal{Q}$ are the distributions of human-written and LLM-modified documents, respectively, and $\alpha$ is the fraction of AI-modified documents. The goal is to estimate $\alpha$ based on observed documents $\{X_i\}_{i=1}^{N} \sim \mathcal{D}_{\alpha}$.

2. Parameterization. To make $\alpha$ identifiable, the framework models the distributions of token occurrences in human-written and LLM-modified documents, denoted as $\mathcal{P}_T$ and $\mathcal{Q}_T$, respectively, for a chosen list of tokens $T = \{t_i\}_{i=1}^{M}$. The occurrence probabilities of each token in human-written and LLM-modified documents, $p_t$ and $q_t$, are used to parameterize $\mathcal{P}_T$ and $\mathcal{Q}_T$:

$$\mathcal{P}_T(X) = \prod_{t \in T} p_t^{\,\mathbb{1}[t \in X]}\,(1-p_t)^{\,1-\mathbb{1}[t \in X]}, \qquad \mathcal{Q}_T(X) = \prod_{t \in T} q_t^{\,\mathbb{1}[t \in X]}\,(1-q_t)^{\,1-\mathbb{1}[t \in X]}.$$

3. Estimation. The occurrence probabilities $p_t$ and $q_t$ are estimated using collections of known human-written and LLM-modified documents, $\{X_j^P\}_{j=1}^{n_P}$ and $\{X_j^Q\}_{j=1}^{n_Q}$, respectively.

4. Inference. The fraction $\alpha$ is estimated by maximizing the log-likelihood of the observed documents under the fitted mixture $\hat{\mathcal{D}}_{\alpha,T}(X) = (1-\alpha)\,\hat{\mathcal{P}}_T(X) + \alpha\,\hat{\mathcal{Q}}_T(X)$:

$$\hat{\alpha} = \operatorname*{arg\,max}_{\alpha \in [0,1]} \; \sum_{i=1}^{N} \log\!\left((1-\alpha)\,\hat{\mathcal{P}}_T(X_i) + \alpha\,\hat{\mathcal{Q}}_T(X_i)\right).$$

Liang et al. (2024) demonstrate that the data points $\{X_i\}_{i=1}^{N} \sim \mathcal{D}_{\alpha}$ can be constructed either as documents or as sentences, and both work well. Following their method, we use sentences as the unit of data points for the main results. In addition, we extend this framework for our application to academic papers with two key differences:
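To make the estimation step concrete, here is a minimal, self-contained sketch of the mixture maximum-likelihood estimator on simulated data. This is an illustrative reconstruction, not the authors' code: the token occurrence probabilities and the binary occurrence matrix are synthetic, and a one-dimensional grid search over α stands in for whatever optimizer the original implementation uses.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, n_sentences, true_alpha = 50, 2000, 0.15

# Synthetic per-token occurrence probabilities standing in for estimates
# from known human-written (p) and LLM-modified (q) corpora.
p = rng.uniform(0.05, 0.30, n_tokens)
q = np.clip(p + rng.uniform(-0.10, 0.20, n_tokens), 0.01, 0.95)

# Simulate a mixed corpus: X[i, t] = 1 if token t occurs in sentence i.
is_llm = rng.random(n_sentences) < true_alpha
probs = np.where(is_llm[:, None], q, p)
X = (rng.random((n_sentences, n_tokens)) < probs).astype(float)

def log_lik(alpha):
    # log P_T(X_i) and log Q_T(X_i) under the occurrence model, mixed
    # in log space for numerical stability.
    lp = X @ np.log(p) + (1 - X) @ np.log(1 - p)
    lq = X @ np.log(q) + (1 - X) @ np.log(1 - q)
    return np.logaddexp(np.log1p(-alpha) + lp, np.log(alpha) + lq).sum()

grid = np.linspace(0.001, 0.5, 500)
alpha_hat = grid[np.argmax([log_lik(a) for a in grid])]
print(f"true alpha = {true_alpha:.3f}, estimated alpha = {alpha_hat:.3f}")
```

A grid search suffices here because the likelihood is one-dimensional in α; on this synthetic corpus the estimate should land close to the planted fraction.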

Generating Realistic LLM-Produced Training Data

We use a two-stage approach to generate LLM-produced text, as simply prompting an LLM with paper titles or keywords would result in unrealistic scientific writing samples containing fabricated results, evidence, and ungrounded or hallucinated claims.

Specifically, given a paragraph from a paper known to contain no LLM modification, we first perform abstractive summarization using an LLM to extract the key contents in the form of an outline. We then prompt the LLM to generate a full paragraph based on the outline (see Appendix for full prompts).

Our two-stage approach can be considered a counterfactual framework for generating LLM text: given a paragraph written entirely by a human, how would the text read if it conveyed almost the same content but was generated by an LLM? This additional abstractive summarization step can be seen as the control for the content. This approach also simulates how scientists may be using LLMs in the writing process, where the scientists first write the outline themselves and then use LLMs to generate the full paragraph based on the outline.
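A minimal sketch of this two-stage pipeline using the OpenAI Python client follows. The model name matches the one reported for training-data generation in the appendix (gpt-3.5-turbo-0125), but the prompt wordings here are placeholders of our own, not the paper's actual prompts, which are given in the Appendix.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-3.5-turbo-0125"  # model reported for training-data generation

def chat(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,
    )
    return resp.choices[0].message.content

def llm_counterfactual(human_paragraph: str) -> str:
    # Stage 1: abstractive summarization into an outline (controls for content).
    outline = chat(
        "Summarize the key points of the following paragraph from a "
        f"scientific paper as a concise outline:\n\n{human_paragraph}"
    )
    # Stage 2: regenerate a full paragraph from the outline alone.
    return chat(
        "Write a paragraph for a scientific paper based only on this "
        f"outline:\n\n{outline}"
    )
```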

Using the Full Vocabulary for Estimation

We use the full vocabulary instead of only adjectives, as our validation shows that adjectives, adverbs, and verbs all perform well in our application (Figure 3). Using the full vocabulary minimizes design biases stemming from vocabulary selection. We also find that using the full vocabulary is more sample-efficient in producing stable estimates, as indicated by smaller bootstrap confidence intervals.
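As an illustration of how such bootstrap confidence intervals can be obtained, here is a sketch that resamples sentences with replacement and re-runs the estimator from the earlier snippet; the function name and percentile-interval construction are assumptions, not the authors' code.

```python
import numpy as np

def bootstrap_alpha_ci(sentences, human_docs, llm_docs, tokens,
                       n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for alpha, resampling sentences with replacement.
    Reuses estimate_alpha() from the sketch above."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(sentences), len(sentences))
        sample = [sentences[i] for i in idx]
        estimates.append(estimate_alpha(sample, human_docs, llm_docs, tokens))
    lo, hi = np.percentile(estimates, [100 * (1 - level) / 2,
                                       100 * (1 + level) / 2])
    return lo, hi
```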

4 Implementation and Validations

4.1 Data Collection and Sampling

We collect data from three sources: arXiv, bioRxiv, and 15 journals from the Nature portfolio. For each source, we randomly sample up to 2,000 papers per month from January 2020 to February 2024. The procedure for generating the LLM-generated corpus data is described in Section 3. We focus on the introduction sections for the main texts, as the introduction is the most consistently and commonly occurring section across diverse categories of papers. See Appendix C for comprehensive implementation details.

4.2 Data Split, Model Fitting, and Evaluation

For model training, we count word frequencies for scientific papers written before the release of ChatGPT and the LLM-modified corpora described in Section 3 . We fit the model with data from 2020, and use data from January 2021 onwards for validation and inference. We fit separate models for abstracts and introductions for each major category.

To evaluate model accuracy and calibration under temporal distribution shift, we use 3,000 papers from January 1, 2022, to November 29, 2022, a period prior to the release of ChatGPT, as validation data. We construct validation sets with LLM-modified content proportions (α) ranging from 0% to 25% in 5% increments, and compare the model's estimated α with the ground-truth α (Figure 3). The full vocabulary, adjectives, adverbs, and verbs all performed well in our application, with a prediction error consistently below 3.5% at the population level across the range of ground-truth α values (Figure 3).
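The validation procedure can be sketched as follows: mix held-out human-written sentences with LLM-modified ones at each ground-truth α and measure the population-level prediction error. This reuses estimate_alpha() from the earlier snippet; the variable names and mixing logic are illustrative assumptions.

```python
import numpy as np

def validate(human_sents, llm_sents, human_train, llm_train, tokens, seed=0):
    """Mix known human/LLM sentences at each ground-truth alpha and
    report the population-level prediction error |alpha_hat - alpha|."""
    rng = np.random.default_rng(seed)
    errors = {}
    n = len(human_sents)
    for alpha in np.linspace(0.0, 0.25, 6):  # 0% to 25% in 5% increments
        k = int(round(alpha * n))
        mixed = (list(rng.choice(llm_sents, size=k, replace=False)) +
                 list(rng.choice(human_sents, size=n - k, replace=False)))
        alpha_hat = estimate_alpha(mixed, human_train, llm_train, tokens)
        errors[round(alpha, 2)] = abs(alpha_hat - alpha)
    return errors
```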

5 Main Results and Findings

5.1 Temporal Trends in AI-Modified Academic Writing

We apply the model to estimate the fraction of LLM-modified content (α) for each paper category each month, for both abstracts and introductions. Each point in time is estimated independently, with no temporal smoothing or continuity assumptions applied.

Our findings reveal a steady increase in the fraction of AI-modified content (α) in both the abstracts (Figure 1) and the introductions (Figure 7), with the largest and fastest growth observed in Computer Science papers. By February 2024, the estimated α for Computer Science had increased to 17.5% for abstracts and 15.5% for introductions. The second-fastest growth was observed in Electrical Engineering and Systems Science, with the estimated α reaching 14.4% for abstracts and 12.4% for introductions during the same period. In contrast, Mathematics papers and the Nature portfolio showed the smallest increase. By the end of the studied period, the estimated α for Mathematics had increased to 4.9% for abstracts and 3.9% for introductions, while the estimated α for the Nature portfolio had reached 6.3% for abstracts and 4.3% for introductions.

The November 2022 estimates serve as a pre-ChatGPT reference point for comparison, as ChatGPT was launched on November 30, 2022. The estimated α for Computer Science in November 2022 was 2.3%, while for Electrical Engineering and Systems Science, Mathematics, and the Nature portfolio, the estimates were 2.9%, 2.4%, and 3.1%, respectively. These values are consistent with the false positive rate reported in the earlier section (§4.2).

5.2 Relationship Between First-Author Preprint Posting Frequency and GPT Usage

We found a notable correlation between the number of preprints posted by the first author on arXiv and the estimated number of LLM-modified sentences in their academic writing. Papers were stratified into two groups based on the number of arXiv Computer Science preprints first-authored in that year: those with two or fewer (≤2) preprints and those with three or more (≥3) preprints (Figure 4). We used the 2023 author grouping for the January-February 2024 data, as complete 2024 author data were not yet available.
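A purely illustrative pandas sketch of this stratification (column names and toy values are assumptions, not our data):

```python
import numpy as np
import pandas as pd

# Toy per-paper records; `alpha_hat` stands in for the estimated fraction
# of LLM-modified sentences for each paper (names are assumptions).
papers = pd.DataFrame({
    "first_author": ["a", "a", "a", "b", "c", "c"],
    "year":         [2023] * 6,
    "alpha_hat":    [0.21, 0.18, 0.19, 0.12, 0.15, 0.14],
})

# Number of first-authored preprints per author and year.
counts = papers.groupby(["first_author", "year"]).size().rename("n_preprints")
papers = papers.join(counts, on=["first_author", "year"])

# Stratify into the two groups used in the analysis and compare means.
papers["group"] = np.where(papers["n_preprints"] >= 3,
                           ">= 3 preprints", "<= 2 preprints")
print(papers.groupby("group")["alpha_hat"].mean())
```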

By February 2024, abstracts of papers whose first authors had ≥3 preprints in 2023 showed an estimated 19.3% of sentences modified by AI, compared to 15.6% for papers whose first authors had ≤2 preprints (Figure 4a). We observe a similar trend in the introduction sections, with first authors posting more preprints having an estimated 16.9% LLM-modified sentences, compared to 13.7% for first authors posting fewer preprints (Figure 4b). Since first-author preprint posting frequency may be confounded by research field, we conduct an additional robustness check for our findings. We find that the observed trend holds for each of the three arXiv Computer Science sub-categories: cs.CV (Computer Vision and Pattern Recognition), cs.LG (Machine Learning), and cs.CL (Computation and Language) (Supp Figure 12).

Our results suggest that researchers posting more preprints tend to utilize LLMs more extensively in their writing. One interpretation of this effect could be that the increasingly competitive and fast-paced nature of CS research communities incentivizes taking steps to accelerate the writing process. We do not evaluate whether these preprints were accepted for publication.

5.3 Relationship Between Paper Similarity and LLM Usage

We investigate the relationship between a paper's similarity to its closest peer and the estimated LLM usage in the abstract. To measure similarity, we first embed each abstract from the arXiv Computer Science papers using OpenAI's text-embedding-ada-002 model, creating a vector representation for each abstract. We then calculate the distance between each paper's vector and its nearest neighbor within the arXiv Computer Science abstracts. Based on this similarity measure, we divide papers into two groups: those more similar to their closest peer (below-median distance) and those less similar (above-median distance).
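A minimal sketch of this split, assuming the OpenAI embeddings endpoint and a corpus small enough for a dense pairwise similarity matrix (the helper names are ours, not the paper's):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

def split_by_nn_distance(abstracts):
    """Return (more_similar, less_similar) halves by nearest-neighbor distance."""
    E = embed(abstracts)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalize rows
    sims = E @ E.T                                    # pairwise cosine similarity
    np.fill_diagonal(sims, -np.inf)                   # exclude self-matches
    nn_dist = 1.0 - sims.max(axis=1)                  # distance to closest peer
    median = np.median(nn_dist)
    more = [a for a, d in zip(abstracts, nn_dist) if d < median]
    less = [a for a, d in zip(abstracts, nn_dist) if d >= median]
    return more, less
```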

The temporal trends of LLM usage for these two groups are shown in Figure 5 . After the release of ChatGPT, papers most similar to their closest peer consistently showed higher LLM usage compared to those least similar. By February 2024, the abstracts of papers more similar to their closest peer had an estimated 22.2% of sentences modified by LLMs, compared to 14.7% for papers less similar to their closest peer. To account for potential confounding effects of research fields, we conducted an additional robustness check by measuring the nearest neighbor distance within each of the three arXiv Computer Science sub-categories: cs.CV (Computer Vision and Pattern Recognition), cs.LG (Machine Learning), and cs.CL (Computation and Language), and found that the observed trend holds for each sub-category (Supp Figure  13 ).

There are several ways to interpret these findings. First, LLM use in writing could cause the similarity in writing or content. Community pressures may even motivate scholars to try to sound more similar, assimilating to the “style” of text generated by an LLM. Alternatively, LLMs may be more commonly used in research areas where papers tend to be more similar to each other. This could be due to the competitive nature of these crowded subfields, which may pressure researchers to write faster and produce similar findings. Future interdisciplinary research should explore these hypotheses.

5.4 Relationship Between Paper Length and AI Usage

We also explored the association between paper length and LLM usage in arXiv Computer Science papers. Papers were stratified by their full-text word count, including appendices, into two bins: below or above 5,000 words (the rounded median).

Figure 6 shows the temporal trends of LLM usage for these two groups. After the release of ChatGPT, shorter papers consistently showed higher AI usage compared to longer papers. By February 2024, the abstracts of shorter papers had an estimated 17.7% of sentences modified by LLMs, compared to 13.6% for longer papers (Figure 6 a). A similar trend was observed in the introduction sections (Figure 6 b). To account for potential confounding effects of research fields, we conducted an additional robustness check. The finding holds for both cs.CV (Computer Vision and Pattern Recognition) and cs.LG (Machine Learning) (Supp Figure  14 ). However, for cs.CL (Computation and Language), we found no significant difference in LLM usage between shorter and longer papers, possibly due to the limited sample size, as we only parsed a subset of the PDFs and calculated their full length.

As Computer Science conference papers typically have a fixed page limit, longer papers likely have more substantial content in the appendix. The lower LLM usage in these papers may suggest that researchers with more comprehensive work rely less on LLM-assistance in their writing. However, further investigation is needed to determine the relationship between paper length, content comprehensiveness, and the quality of the research.

6 Discussion

Our findings show a sharp increase in the estimated fraction of LLM-modified content in academic writing beginning about five months after the release of ChatGPT, with the fastest growth observed in Computer Science papers. This trend may be partially explained by Computer Science researchers’ familiarity with and access to large language models. Additionally, the fast-paced nature of LLM research and the associated pressure to publish quickly may incentivize the use of LLM writing assistance (Foster et al., 2015 ) .

We identify several other factors associated with higher LLM usage in academic writing. First, authors who post preprints more frequently show a higher fraction of LLM-modified content in their writing. Second, papers in more crowded research areas, where papers tend to be more similar to one another, showed higher LLM modification than those in less crowded areas. Third, shorter papers consistently showed higher LLM modification than longer papers, which may indicate that researchers working under time constraints are more likely to rely on AI for writing assistance. These results may be an indicator of the competitive nature of certain research areas and the pressure to publish quickly.

If the majority of modification comes from an LLM owned by a private company, there could be risks to the security and independence of scientific practice. We hope our results inspire further studies of widespread LLM-modified text and conversations about how to promote transparent, epistemically diverse, accurate, and independent scientific publishing.

Limitations

While our study focused on ChatGPT, which accounts for more than three-quarters of worldwide internet traffic in the category (Van Rossum, 2024), we acknowledge that other large language models are also used to assist academic writing. Furthermore, while Liang et al. (2023a) demonstrate that GPT-detection methods can falsely identify the writing of language learners as LLM-generated, our results showed consistently low false-positive estimates of α in 2022, a period containing a significant fraction of texts written by multilingual scholars. We recognize that significant author population changes (MacroPolo, 2024) or other language-use shifts could still impact the accuracy of our estimates. Finally, the associations we observe between LLM usage and paper characteristics are correlations that could be affected by other factors, such as research topics. More causal studies are an important direction for future work.

Acknowledgments

We thank Daniel A. McFarland, Dan Jurafsky, Zachary Izzo, Xi Victoria Lin, Lingjiao Chen, and Haotian Ye for their helpful comments and discussions. J.Z. is supported by the National Science Foundation (CCF 1763191 and CAREER 1942926), the US National Institutes of Health (P30AG059307 and U01MH098953), and grants from the Silicon Valley Foundation and the Chan-Zuckerberg Initiative. H.L. is supported by the National Science Foundation (2244804 and 2022435) and the Stanford Institute for Human-Centered Artificial Intelligence (HAI).

  • Aaronson (2023) Scott Aaronson. Simons Institute Talk on Watermarking of Large Language Models, 2023. URL https://simons.berkeley.edu/talks/scott-aaronson-ut-austin-openai-2023-08-17 .
  • Atallah et al. (2001) Mikhail J. Atallah, Victor Raskin, Michael Crogan, Christian F. Hempelmann, Florian Kerschbaum, Dina Mohamed, and Sanket Naik. Natural Language Watermarking: Design, Analysis, and a Proof-of-Concept Implementation. In Information Hiding , 2001.
  • Badaskar et al. (2008) Sameer Badaskar, Sachin Agarwal, and Shilpa Arora. Identifying Real or Fake Articles: Towards better Language Modeling. In International Joint Conference on Natural Language Processing , 2008.
  • Bakhtin et al. (2019) Anton Bakhtin, Sam Gross, Myle Ott, Yuntian Deng, Marc’Aurelio Ranzato, and Arthur Szlam. Real or Fake? Learning to Discriminate Machine from Human Generated Text. ArXiv , abs/1906.03351, 2019.
  • Bao et al. (2023) Guangsheng Bao, Yanbin Zhao, Zhiyang Teng, Linyi Yang, and Yue Zhang. Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature. ArXiv , abs/2310.05130, 2023.
  • Beresneva (2016) Daria Beresneva. Computer-Generated Text Detection Using Machine Learning: A Systematic Review. In International Conference on Applications of Natural Language to Data Bases , 2016.
  • Bhagat & Hovy (2013) Rahul Bhagat and Eduard H. Hovy. Squibs: What Is a Paraphrase? Computational Linguistics , 39:463–472, 2013.
  • Bhattacharjee et al. (2023) Amrita Bhattacharjee, Tharindu Kumarage, Raha Moraffah, and Huan Liu. ConDA: Contrastive Domain Adaptation for AI-generated Text Detection. ArXiv , abs/2309.03992, 2023. URL https://api.semanticscholar.org/CorpusID:261660497 .
  • Chakraborty et al. (2023) Souradip Chakraborty, Amrit Singh Bedi, Sicheng Zhu, Bang An, Dinesh Manocha, and Furong Huang. On the possibilities of ai-generated text detection. arXiv preprint arXiv:2304.04736 , 2023.
  • Chen et al. (2023) Yutian Chen, Hao Kang, Vivian Zhai, Liang Li, Rita Singh, and Bhiksha Ramakrishnan. GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content. ArXiv , abs/2305.07969, 2023. URL https://api.semanticscholar.org/CorpusID:258686680 .
  • Chiang et al. (2003) Yuei-Lin Chiang, Lu-Ping Chang, Wen-Tai Hsieh, and Wen-Chih Chen. Natural Language Watermarking Using Semantic Substitution for Chinese Text. In International Workshop on Digital Watermarking , 2003.
  • Conroy (2023a) Gemma Conroy. How ChatGPT and other AI tools could disrupt scientific publishing. Nature , October 2023a. URL https://www.nature.com/articles/d41586-023-03144-w .
  • Conroy (2023b) Gemma Conroy. Scientific sleuths spot dishonest ChatGPT use in papers. Nature , September 2023b. URL https://www.nature.com/articles/d41586-023-02477-w .
  • Crothers et al. (2022) Evan Crothers, Nathalie Japkowicz, and Herna Viktor. Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods. arXiv preprint arXiv:2210.07321 , 2022.
  • Deguerin (2024) Mack Deguerin. AI-generated nonsense is leaking into scientific journals . Popular Science , March 2024. URL https://www.popsci.com/technology/ai-generated-text-scientific-journals/ .
  • Dell’Acqua et al. (2023) Fabrizio Dell’Acqua, Edward McFowland, Ethan R Mollick, Hila Lifshitz-Assaf, Katherine Kellogg, Saran Rajendran, Lisa Krayer, François Candelon, and Karim R Lakhani. Navigating the jagged technological frontier: Field experimental evidence of the effects of AI on knowledge worker productivity and quality, 2023. Harvard Business School Technology & Operations Mgt. Unit Working Paper 24-013.
  • Elazar et al. (2023) Yanai Elazar, Akshita Bhagia, Ian Helgi Magnusson, Abhilasha Ravichander, Dustin Schwenk, Alane Suhr, Evan Pete Walsh, Dirk Groeneveld, Luca Soldaini, Sameer Singh, et al. What’s In My Big Data? In The Twelfth International Conference on Learning Representations , 2023.
  • Else (2023) Holly Else. Abstracts written by ChatGPT fool scientists. Nature , Jan 2023. URL https://www.nature.com/articles/d41586-023-00056-7 .
  • Fagni et al. (2021) Tiziano Fagni, Fabrizio Falchi, Margherita Gambini, Antonio Martella, and Maurizio Tesconi. TweepFake: About detecting deepfake tweets. Plos one , 16(5):e0251415, 2021.
  • Fernandez et al. (2023) Pierre Fernandez, Antoine Chaffin, Karim Tit, Vivien Chappelier, and Teddy Furon. Three Bricks to Consolidate Watermarks for Large Language Models. 2023 IEEE International Workshop on Information Forensics and Security (WIFS) , pp.  1–6, 2023.
  • Foster et al. (2015) Jacob G Foster, Andrey Rzhetsky, and James A Evans. Tradition and innovation in scientists’ research strategies. American sociological review , 80(5):875–908, 2015.
  • Gao et al. (2022) Catherine A Gao, Frederick M Howard, Nikolay S Markov, Emma C Dyer, Siddhi Ramesh, Yuan Luo, and Alexander T Pearson. Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers. bioRxiv , pp.  2022–12, 2022.
  • Gao et al. (2020) Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, et al. The pile: An 800gb dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027 , 2020.
  • Gehrmann et al. (2019) Sebastian Gehrmann, Hendrik Strobelt, and Alexander M Rush. GLTR: Statistical Detection and Visualization of Generated Text. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations , pp.  111–116, 2019.
  • Ghosal et al. (2023) Soumya Suvra Ghosal, Souradip Chakraborty, Jonas Geiping, Furong Huang, Dinesh Manocha, and A. S. Bedi. Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey. ArXiv , abs/2310.15264, 2023.
  • Ghosh & Caliskan (2023) Sourojit Ghosh and Aylin Caliskan. ’Person’== Light-skinned, Western Man, and Sexualization of Women of Color: Stereotypes in Stable Diffusion. arXiv preprint arXiv:2310.19981 , 2023.
  • Heikkilä (2022) Melissa Heikkilä. How to spot AI-generated text. MIT Technology Review , Dec 2022. URL https://www.technologyreview.com/2022/12/19/1065596/how-to-spot-ai-generated-text/ .
  • Hu et al. (2023a) Xiaobing Hu, Pin-Yu Chen, and Tsung-Yi Ho. RADAR: Robust AI-Text Detection via Adversarial Learning. ArXiv , abs/2307.03838, 2023a. URL https://api.semanticscholar.org/CorpusID:259501842 .
  • Hu et al. (2023b) Zhengmian Hu, Lichang Chen, Xidong Wu, Yihan Wu, Hongyang Zhang, and Heng Huang. Unbiased Watermark for Large Language Models. ArXiv , abs/2310.10669, 2023b.
  • ICML (2023) ICML. Clarification on large language model policy LLM. https://icml.cc/Conferences/2023/llm-policy , 2023.
  • Ippolito et al. (2019) Daphne Ippolito, Daniel Duckworth, Chris Callison-Burch, and Douglas Eck. Automatic detection of generated text is easiest when humans are fooled. arXiv preprint arXiv:1911.00650 , 2019.
  • Jawahar et al. (2020) Ganesh Jawahar, Muhammad Abdul-Mageed, and Laks VS Lakshmanan. Automatic detection of machine generated text: A critical survey. arXiv preprint arXiv:2011.01314 , 2020.
  • Kelly (2023) Samantha Murphy Kelly. ChatGPT creator pulls AI detection tool due to ‘low rate of accuracy’. CNN Business , Jul 2023. URL https://www.cnn.com/2023/07/25/tech/openai-ai-detection-tool/index.html .
  • Khanna et al. (2022) Saurabh Khanna, Jon Ball, Juan Pablo Alperin, and John Willinsky. Recalibrating the scope of scholarly publishing: A modest step in a vast decolonization process. Quantitative Science Studies, 3(4):912–930, 12 2022. ISSN 2641-3337. doi: 10.1162/qss_a_00228. URL https://doi.org/10.1162/qss_a_00228.
  • Kirchenbauer et al. (2023) John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. International Conference on Machine Learning , 2023.
  • Kirchner et al. (2023) Jan Hendrik Kirchner, Lama Ahmad, Scott Aaronson, and Jan Leike. New AI classifier for indicating AI-written text, 2023. URL https://openai.com/blog/new-ai-classifier-for-indicating-ai-written-text .
  • Kuditipudi et al. (2023) Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. Robust Distortion-free Watermarks for Language Models. ArXiv , abs/2307.15593, 2023.
  • Lavergne et al. (2008) Thomas Lavergne, Tanguy Urvoy, and François Yvon. Detecting Fake Content with Relative Entropy Scoring. Pan , 2008.
  • Li et al. (2023) Yafu Li, Qintong Li, Leyang Cui, Wei Bi, Longyue Wang, Linyi Yang, Shuming Shi, and Yue Zhang. Deepfake Text Detection in the Wild. ArXiv , abs/2305.13242, 2023. URL https://api.semanticscholar.org/CorpusID:258832454 .
  • Liang et al. (2023a) Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, and James Y. Zou. GPT detectors are biased against non-native English writers. ArXiv , abs/2304.02819, 2023a.
  • Liang et al. (2023b) Weixin Liang, Yuhui Zhang, Hancheng Cao, Binglu Wang, Daisy Ding, Xinyu Yang, Kailas Vodrahalli, Siyu He, Daniel Smith, Yian Yin, et al. Can large language models provide useful feedback on research papers? A large-scale empirical analysis. arXiv preprint arXiv:2310.01783 , 2023b.
  • Liang et al. (2024) Weixin Liang, Zachary Izzo, Yaohui Zhang, Haley Lepp, Hancheng Cao, Xuandong Zhao, Lingjiao Chen, Haotian Ye, Sheng Liu, Zhi Huang, Daniel A. McFarland, and James Y. Zou. Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews. arXiv preprint arXiv:2403.07183 , 2024.
  • Liu & Shah (2023) Ryan Liu and Nihar B Shah. Reviewergpt? an exploratory study on using large language models for paper reviewing. arXiv preprint arXiv:2306.00622 , 2023.
  • Liu et al. (2022) Xiaoming Liu, Zhaohan Zhang, Yichen Wang, Yu Lan, and Chao Shen. CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Data Limitation With Contrastive Learning. ArXiv , abs/2212.10341, 2022. URL https://api.semanticscholar.org/CorpusID:254877728 .
  • Liu et al. (2019) Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A Robustly Optimized BERT Pretraining Approach. ArXiv , abs/1907.11692, 2019.
  • MacroPolo (2024) MacroPolo. The Global AI Talent Tracker, 2024. URL https://macropolo.org/digital-projects/the-global-ai-talent-tracker/ .
  • Mitchell et al. (2023a) Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, and Chelsea Finn. DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature. ArXiv , abs/2301.11305, 2023a.
  • Mitchell et al. (2023b) Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D Manning, and Chelsea Finn. DetectGPT: Zero-shot machine-generated text detection using probability curvature. arXiv preprint arXiv:2301.11305 , 2023b.
  • Okunytė (2023) Paulina Okunytė. Google search exposes academics using ChatGPT in research papers. Cybernews , November 2023. URL https://cybernews.com/news/academic-cheating-chatgpt-openai/ .
  • OpenAI (2019) OpenAI. GPT-2: 1.5B release. https://openai.com/research/gpt-2-1-5b-release , 2019. Accessed: 2019-11-05.
  • Oransky & Marcus (2024) Ivan Oransky and Adam Marcus. Papers and peer reviews with evidence of ChatGPT writing . Retraction Watch , 2024. URL https://retractionwatch.com/papers-and-peer-reviews-with-evidence-of-chatgpt-writing/ .
  • Raffel et al. (2020) Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research , 21(1):5485–5551, 2020.
  • Sadasivan et al. (2023) Vinu Sankar Sadasivan, Aounon Kumar, S. Balasubramanian, Wenxiao Wang, and Soheil Feizi. Can AI-Generated Text be Reliably Detected? ArXiv , abs/2303.11156, 2023.
  • Santurkar et al. (2023) Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. Whose opinions do language models reflect? In International Conference on Machine Learning , pp.  29971–30004. PMLR, 2023.
  • Shi et al. (2023) Zhouxing Shi, Yihan Wang, Fan Yin, Xiangning Chen, Kai-Wei Chang, and Cho-Jui Hsieh. Red Teaming Language Model Detectors with Language Models. ArXiv , abs/2305.19713, 2023.
  • Shumailov et al. (2023) Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot, and Ross Anderson. The curse of recursion: Training on generated data makes models forget. arXiv preprint arXiv:2305.17493 , 2023.
  • Solaiman et al. (2019) Irene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-Voss, Jeff Wu, Alec Radford, Gretchen Krueger, Jong Wook Kim, Sarah Kreps, et al. Release strategies and the social impacts of language models. arXiv preprint arXiv:1908.09203 , 2019.
  • Thorp (2023) H. Holden Thorp. Chatgpt is fun, but not an author. Science , 379(6630):313–313, 2023. doi: 10.1126/science.adg7879 . URL https://www.science.org/doi/abs/10.1126/science.adg7879 .
  • Topkara et al. (2006a) Mercan Topkara, Giuseppe Riccardi, Dilek Z. Hakkani-Tür, and Mikhail J. Atallah. Natural language watermarking: challenges in building a practical system. In Electronic imaging , 2006a.
  • Topkara et al. (2006b) Umut Topkara, Mercan Topkara, and Mikhail J. Atallah. The hiding virtues of ambiguity: quantifiably resilient watermarking of natural language text through synonym substitutions. In Workshop on Multimedia & Security , 2006b.
  • Touvron et al. (2023) Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 , 2023.
  • Tulchinskii et al. (2023) Eduard Tulchinskii, Kristian Kuznetsov, Laida Kushnareva, Daniil Cherniavskii, S. Barannikov, Irina Piontkovskaya, Sergey I. Nikolenko, and Evgeny Burnaev. Intrinsic Dimension Estimation for Robust Detection of AI-Generated Texts. ArXiv , abs/2306.04723, 2023.
  • Uchendu et al. (2020) Adaku Uchendu, Thai Le, Kai Shu, and Dongwon Lee. Authorship Attribution for Neural Text Generation. In Conference on Empirical Methods in Natural Language Processing , 2020.
  • Van Rossum (2024) Daan van Rossum. Generative AI Top 150: The World's Most Used AI Tools. https://www.flexos.work/learn/generative-ai-top-150, February 2024.
  • Veselovsky et al. (2023) Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. Artificial artificial artificial intelligence: Crowd workers widely use large language models for text production tasks. arXiv preprint arXiv:2306.07899 , 2023.
  • Vincent (2023) James Vincent. ‘As an AI language model’: the phrase that shows how AI is polluting the web. The Verge, Apr 2023. URL https://www.theverge.com/2023/4/25/23697218/ai-generated-spam-fake-user-reviews-as-an-ai-language-model.
  • Weber-Wulff et al. (2023) Debora Weber-Wulff, Alla Anohina-Naumeca, Sonja Bjelobaba, Tomáš Foltýnek, Jean Guerrero-Dib, Olumide Popoola, Petr Šigut, and Lorna Waddington. Testing of detection tools for AI-generated text. International Journal for Educational Integrity , 19(1):26, 2023. ISSN 1833-2595. doi: 10.1007/s40979-023-00146-z . URL https://doi.org/10.1007/s40979-023-00146-z .
  • Wolff (2020) Max Wolff. Attacking Neural Text Detectors. ArXiv , abs/2002.11768, 2020.
  • Wu et al. (2023) Yihan Wu, Zhengmian Hu, Hongyang Zhang, and Heng Huang. DiPmark: A Stealthy, Efficient and Resilient Watermark for Large Language Models. ArXiv , abs/2310.07710, 2023.
  • Yang et al. (2023a) Xianjun Yang, Wei Cheng, Linda Petzold, William Yang Wang, and Haifeng Chen. DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text. ArXiv , abs/2305.17359, 2023a.
  • Yang et al. (2023b) Xianjun Yang, Liangming Pan, Xuandong Zhao, Haifeng Chen, Linda Ruth Petzold, William Yang Wang, and Wei Cheng. A Survey on Detection of LLMs-Generated Content. ArXiv , abs/2310.15654, 2023b.
  • Yoo et al. (2023) Kiyoon Yoo, Wonhyuk Ahn, Jiho Jang, and No Jun Kwak. Robust Multi-bit Natural Language Watermarking through Invariant Features. In Annual Meeting of the Association for Computational Linguistics , 2023.
  • Yu et al. (2023) Xiao Yu, Yuang Qi, Kejiang Chen, Guoqiang Chen, Xi Yang, Pengyuan Zhu, Weiming Zhang, and Neng H. Yu. GPT Paternity Test: GPT Generated Text Detection with GPT Genetic Inheritance. ArXiv , abs/2305.12519, 2023. URL https://api.semanticscholar.org/CorpusID:258833423 .
  • Zellers et al. (2019) Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, and Yejin Choi. Defending Against Neural Fake News. ArXiv , abs/1905.12616, 2019.
  • Zhang et al. (2023) Yi-Fan Zhang, Zhang Zhang, Liang Wang, Tien-Ping Tan, and Rong Jin. Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors. ArXiv , abs/2312.12918, 2023.
  • Zhao et al. (2023) Xuandong Zhao, Yu-Xiang Wang, and Lei Li. Protecting Language Generation Models via Invisible Watermarking. In Proceedings of the 40th International Conference on Machine Learning , pp.  42187–42199, 2023.
  • Zhao et al. (2024a) Xuandong Zhao, Prabhanjan Vijendra Ananth, Lei Li, and Yu-Xiang Wang. Provable Robust Watermarking for AI-Generated Text. In International Conference on Learning Representations (ICLR) , 2024a.
  • Zhao et al. (2024b) Xuandong Zhao, Lei Li, and Yu-Xiang Wang. Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs. arXiv preprint arXiv:2402.05864 , 2024b.

Appendix A Estimated Fraction of LLM-Modified Sentences in Introductions

Appendix B LLM Prompts Used in the Study

Appendix C Additional Information on Implementation and Validations

Supplementary Information about Data

We collected data for this study from three publicly accessible sources: official APIs provided by arXiv and bioRxiv, and web pages from the Nature portfolio. For each of the five major arXiv categories (Computer Science, Electrical Engineering and Systems Science, Mathematics, Physics, and Statistics), we randomly sampled 2,000 papers per month from January 2020 to February 2024. Similarly, from bioRxiv, we randomly sampled 2,000 papers for each month within the same timeframe. For the Nature portfolio, encompassing 15 Nature journals including Nature, Nature Biomedical Engineering, Nature Human Behaviour, and Nature Communications, we followed the same sampling strategy, selecting 2,000 papers randomly from each month from January 2020 to February 2024. The procedure for generating the AI corpus data for a given time period is described in Section 3 above.

When there were not enough papers to reach our target of 2,000 per month, we included all available papers. The Nature portfolio encompasses the following 15 Nature journals: Nature, Nature Communications, Nature Ecology & Evolution, Nature Structural & Molecular Biology, Nature Cell Biology, Nature Human Behaviour, Nature Immunology, Nature Microbiology, Nature Biomedical Engineering, Communications Earth & Environment, Communications Biology, Communications Physics, Communications Chemistry, Communications Materials, and Communications Medicine .

Additional Information on Large Language Models

In this study, we utilized the gpt-3.5-turbo-0125 model, which was trained on data up to September 2021, to generate the training data for our analysis. The LLM was employed solely for the purpose of creating the training dataset and was not used in any other aspect of the study.

We chose to focus on ChatGPT due to its dominant position in the generative AI market. According to a comprehensive analysis conducted by FlexOS in early 2024, ChatGPT accounts for an overwhelming 76% of global internet traffic in the category, followed by Bing AI at 16%, Bard at 7%, and Claude at 1% (Van Rossum, 2024 ) . This market share underscores ChatGPT’s widespread adoption and makes it a highly relevant subject for our investigation. Furthermore, recent studies have also shown that ChatGPT demonstrates substantially better understanding of scientific papers than other LLMs  (Liang et al., 2023b ; Liu & Shah, 2023 ) .

We chose to use GPT-3.5 for generating the training data due to its free availability, which lowers the barrier to entry for users and thereby captures a wider range of potential LLM usage patterns. This accessibility makes our study more representative of the broad phenomenon of LLM-assisted writing. Furthermore, previous work by Liang et al. (2024) has demonstrated the framework's robustness and generalizability to other LLMs. Their findings suggest that the framework can effectively handle significant content shifts and temporal distribution shifts.

Regarding the parameter settings for the LLM, we set the decoding temperature to 1.0 and the maximum decoding length to 2048 tokens during our experiments. The Top P hyperparameter, which controls the cumulative probability threshold for token selection, was set to 1.0. Both the frequency penalty and presence penalty, which can be used to discourage the repetition of previously generated tokens, were set to 0.0. Additionally, we did not configure any specific stop sequences during the decoding process.
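Expressed as an OpenAI chat-completion call, these settings look roughly as follows (a sketch; the prompt is a placeholder for the actual prompts in Appendix B):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
prompt = "..."     # placeholder for the generation prompt (see Appendix B)

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    messages=[{"role": "user", "content": prompt}],
    temperature=1.0,        # decoding temperature
    max_tokens=2048,        # maximum decoding length
    top_p=1.0,              # cumulative probability threshold for token selection
    frequency_penalty=0.0,  # no penalty on token frequency
    presence_penalty=0.0,   # no penalty on token presence
    stop=None,              # no stop sequences configured
)
print(response.choices[0].message.content)
```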

Appendix D Word Frequency Shift in arXiv Computer Science Introductions

Appendix E Fine-Grained Main Findings

Appendix F Proofreading Results on arXiv Data

Appendix G Extended Related Work

Zero-shot LLM detection.

A major category of LLM text detection relies on statistical signatures characteristic of machine-generated text, with the goal of detecting such text within individual documents. Early techniques to distinguish AI-modified text from human-written text employed various metrics derived from language models, such as entropy (Lavergne et al., 2008), the frequency of rare n-grams (Badaskar et al., 2008), perplexity (Beresneva, 2016), and log-probability scores (Solaiman et al., 2019). More recently, DetectGPT (Mitchell et al., 2023a) found that AI-modified text tends to lie in regions of negative log-probability curvature. DNA-GPT (Yang et al., 2023a) improves performance by examining the divergence in n-gram patterns. Fast-DetectGPT (Bao et al., 2023) enhances efficiency by utilizing conditional probability curvature over raw probability. Tulchinskii et al. (2023) studied the intrinsic dimensionality of generated text to perform detection. We refer to recent surveys by Yang et al. (2023b) and Ghosal et al. (2023) for additional details and related work. However, zero-shot detection requires direct access to LLM internals to be effective. Closed-source commercial LLMs, like GPT-4, necessitate using proxy LLMs, which compromises the robustness of zero-shot detection methods across various scenarios (Sadasivan et al., 2023; Shi et al., 2023; Yang et al., 2023b; Zhang et al., 2023).

Training-based LLM detection.

Another category is training-based detection, which involves training classification models on datasets consisting of both human-written and AI-modified texts for the binary classification task of detection. Early efforts applied classification algorithms to identify AI text across various domains, such as peer review submissions (Bhagat & Hovy, 2013), media publications (Zellers et al., 2019), and other contexts (Bakhtin et al., 2019; Uchendu et al., 2020). Recently, researchers have finetuned pretrained language model backbones for this binary classification. GPT-Sentinel (Chen et al., 2023) uses the constructed dataset OpenGPTText to train RoBERTa (Liu et al., 2019) and T5 (Raffel et al., 2020) classifiers. GPT-Pat (Yu et al., 2023) trains a Siamese neural network to compute the semantic similarity of AI text and human text. Li et al. (2023) build a wild testbed by gathering texts from various human writings and texts generated by different LLMs. Techniques such as contrastive and adversarial learning can enhance classifier robustness (Liu et al., 2022; Bhattacharjee et al., 2023; Hu et al., 2023a). We refer to recent surveys by Yang et al. (2023b) and Ghosal et al. (2023) for additional methods and details. However, these publicly available tools for detecting AI-modified content have sparked a debate about their effectiveness and reliability (OpenAI, 2019; Jawahar et al., 2020; Fagni et al., 2021; Ippolito et al., 2019; Mitchell et al., 2023b; Gehrmann et al., 2019; Heikkilä, 2022; Crothers et al., 2022; Solaiman et al., 2019). OpenAI's decision to discontinue its AI-text classifier in 2023 due to a “low rate of accuracy” further highlighted this debate (Kirchner et al., 2023; Kelly, 2023).

Training-based detection methods face challenges such as overfitting to training data and language models, making them vulnerable to adversarial attacks (Wolff, 2020 ) and biased against non-dominant language varieties (Liang et al., 2023a ) . The theoretical possibility of achieving accurate instance -level detection has also been questioned  (Weber-Wulff et al., 2023 ; Sadasivan et al., 2023 ; Chakraborty et al., 2023 ) .

LLM watermarking.

Text watermarking introduces a method to detect AI-modified text by embedding an imperceptible signal, known as a watermark, directly into the text. This watermark can be retrieved by a detector that shares the model owner’s secret key. Early watermarking techniques included synonym substitution (Chiang et al., 2003 ; Topkara et al., 2006b ) and syntactic restructuring (Atallah et al., 2001 ; Topkara et al., 2006a ) . Modern watermarking strategies involve integrating watermarks into the decoding process of language models (Aaronson, 2023 ; Kirchenbauer et al., 2023 ; Zhao et al., 2023 ) . Researchers have developed various techniques, such as the Gumbel watermark (Aaronson, 2023 ) , which uses traceable pseudo-random softmax sampling, and the red-green list approach (Kirchenbauer et al., 2023 ; Zhao et al., 2024a ) , which splits the vocabulary based on hash values of previous n-grams. Some methods focus on preserving the original token probability distributions (Hu et al., 2023b ; Kuditipudi et al., 2023 ; Wu et al., 2023 ) , while others aim to improve detectability and perplexity (Zhao et al., 2024b ) or incorporate multi-bit watermarks (Yoo et al., 2023 ; Fernandez et al., 2023 ) . However, one major concern with watermarking is that it requires the involvement of the model or service owner, such as OpenAI, to implant the watermark during the text generation process. In contrast, the framework by Liang et al. ( 2024 ) operates independently of the model or service owner’s intervention, allowing for the monitoring of AI-modified content without requiring their active participation or adoption.

Implications for LLM Pretraining Data Quality

The increasing prevalence of AI-modified content in academic papers, particularly on platforms like arXiv , has important implications for the quality of LLM pretraining data. arXiv has become a significant source of training data for LLMs, contributing approximately 2.5% of the data for models like Llama (Touvron et al., 2023 ) , 12% for RedPajama (Elazar et al., 2023 ) , and 8.96% for the Pile (Gao et al., 2020 ) . Our findings suggest that a growing proportion of this pretraining data may contain LLM-modified content. Preliminary research indicates that the inclusion of LLM-modified content (Veselovsky et al., 2023 ) in LLM training can lead to several pitfalls, such as the reinforcement of stereotypes and biases against anyone who is not a middle-aged “European/North American man” (Ghosh & Caliskan, 2023 ; Santurkar et al., 2023 ) , the flattening of variation in language and content (Dell’Acqua et al., 2023 ) , and the potential failure of models to accurately capture the true distribution of the original content, which may result in model collapse (Shumailov et al., 2023 ) . Santurkar et al. ( 2023 ) demonstrate that this phenomenon amplifies the effect of LLMs providing content that is unrepresentative of most of the world. As such, our results underscore the importance of robust data curation and filtering strategies even in seemingly unpolluted datasets.

Biology of Sport, 40(2), April 2023 (PMC10108763)

From human writing to artificial intelligence generated text: examining the prospects and potential threats of ChatGPT in academic writing

Ismail Dergaa

1 Primary Health Care Corporation (PHCC), Doha, Qatar

2 Research Unit Physical Activity, Sport, and Health, UR18JS01, National Observatory of Sport, Tunis 1003, Tunisia

3 High Institute of Sport and Physical Education, University of Sfax, Sfax, Tunisia

Karim Chamari

4 Aspetar, Orthopaedic and Sports Medicine Hospital, FIFA Medical Centre of Excellence, Doha, Qatar

Piotr Zmijewski

5 Jozef Pilsudski University of Physical Education in Warsaw, Warsaw, Poland

Helmi Ben Saad

6 University of Sousse, Farhat HACHED hospital, Service of Physiology and Functional Explorations, Sousse, Tunisia

7 University of Sousse, Farhat HACHED hospital, Research Laboratory LR12SP09 «Heart Failure», Sousse, Tunisia

8 University of Sousse, Faculty of Medicine of Sousse, Laboratory of Physiology, Sousse, Tunisia

Natural language processing (NLP) has been studied in computing for decades. Recent technological advancements have led to the development of sophisticated artificial intelligence (AI) models, such as Chat Generative Pre-trained Transformer (ChatGPT). These models can perform a range of language tasks and generate human-like responses, which offers exciting prospects for academic efficiency. This manuscript aims to (i) explore the potential benefits and threats of ChatGPT and other NLP technologies in academic writing and research publications; (ii) highlight the ethical considerations involved in using these tools; and (iii) consider the impact they may have on the authenticity and credibility of academic work. This study involved a literature review of relevant scholarly articles published in peer-reviewed journals indexed in Scopus as quartile 1. The search used keywords such as “ChatGPT,” “AI-generated text,” “academic writing,” and “natural language processing.” The analysis was carried out using a quasi-qualitative approach, which involved reading and critically evaluating the sources and identifying relevant data to support the research questions. The study found that ChatGPT and other NLP technologies have the potential to enhance academic writing and research efficiency. However, their use also raises concerns about the impact on the authenticity and credibility of academic work. The study highlights the need for comprehensive discussions on the potential use, threats, and limitations of these tools, emphasizing the importance of ethical and academic principles, with human intelligence and critical thinking at the forefront of the research process. It also recommends that academics exercise caution when using these tools and ensure transparency in their use, emphasizing the importance of human intelligence and critical thinking in academic work.

INTRODUCTION

In the last decade or so, the field of artificial intelligence (AI) has seen remarkable advances, and Chat Generative Pre-trained Transformer (ChatGPT, OpenAI's GPT-3 model) is a striking example of this progress [1]. Academic research has traditionally relied on laborious manual methods to sort and analyse large volumes of text. However, recent advances in natural language processing (NLP) technology have made it possible to automate many of these tasks. ChatGPT is one such technology that has shown promising prospects in academic research. ChatGPT is a large language model (LLM) that has been trained on an extensive corpus of text, enabling it to generate human-like text responses. For a few years now, it has been evident that AI can produce coherent language, and it is becoming increasingly challenging to distinguish AI sentences from those created by humans. In 2022, the journal Nature reported that scientists were already using chatbots as research assistants to help them organize their thoughts, receive feedback on their work, write code, and even summarize research literature [2].

ChatGPT has the ability to create well-written student essays, summarize research papers, answer questions well enough to pass medical exams, and generate helpful computer code, for instance [2]. It has even created research abstracts that scientists found difficult to distinguish from those written by a human [2]. However, this technology also has the potential to produce spam, ransomware, and other harmful outputs, which is substantially worrisome for our societies [2]. Given the potential for LLMs like ChatGPT to disrupt various fields, there is an urgent need for the research community to engage in a comprehensive debate on the potential uses, threats, and limitations of these tools. Therefore, the aims of this correspondence were to (i) provide an overview of ChatGPT and other NLP technologies, their emergence, limitations, and related practical implications; and (ii) examine the prospects and consequences of using these tools in academic writing.

MATERIALS AND METHODS

This manuscript was based on a literature review of relevant scholarly articles published only in peer-reviewed journals indexed in Scopus as quartile 1. The search was conducted using keywords such as “ChatGPT,” “AI-generated text,” “academic writing,” and “natural language processing”. To ensure the reliability and quality of the sources used in this paper, we excluded preprints from our references: their unvetted status and the potential bias they may contain, particularly with the recent emergence of NLP technologies, did not guarantee a trustworthy and unbiased assessment of the state of the field. The analysis was carried out using a quasi-qualitative approach, which involved reading and critically evaluating the sources and identifying relevant data to support the research questions.

RESULTS AND DISCUSSION

Is NLP technology, including ChatGPT, really a new concept?

NLP has been an area of study since the dawn of computing. The creation of electronic computers in the mid-20th century marked the first attempts to develop computer programs capable of comprehending and generating human language. The roots of products like ChatGPT lie in early AI research from the 1950s and 1960s [3]. Scientists have long aimed to create computer programs that could understand and respond to natural language in a manner that mimics human communication. This task was challenging due to the complexity, ambiguity, and variability of human language, with complex cultural influences that rendered the task even more difficult. The scientific field dedicated to this problem has been called “NLP”. In the 1980s and 1990s, NLP gained renewed interest due to technological advances that led to the development of more advanced AI systems [4]. Researchers designed a new generation of statistical AI models capable of learning from vast amounts of text data. These models were a significant improvement over previous rule-based systems and generated responses that sounded more natural [4]. The field of NLP underwent a significant transformation in the 2010s [5, 6]. Indeed, the widespread availability of internet text data and the development of deep learning methods changed the way scientists approached the problem of natural language understanding. Large-scale deep learning models such as recurrent neural networks and transformer models were trained on vast text datasets, enabling them to produce highly realistic human-like responses [6]. These models learned the nuances and structure of language from the input data, making them more effective than previous statistical models [6].

One of the most widely known and commonly used deep learning models is GPT, developed by OpenAI in 2018 [7]. The GPT model was trained on massive amounts of internet text data and was capable of generating responses that closely resembled human writing [7]. It quickly became popular for use in chatbots and other conversational AI applications. OpenAI subsequently developed even more sophisticated models, such as GPT-2 in 2019 and GPT-3 in 2020, which were capable of generating text that was almost indistinguishable from human writing and excelled in various language/style tasks [8]. Despite the effectiveness of ChatGPT and other NLP technologies, they were relatively unknown to the scientific community for a while. However, in November 2022, OpenAI made multiple updates and improvements to ChatGPT, enhancing its ability to handle a wider range of queries and provide more accurate, relevant, and helpful responses to users [2]. The improvements included more training data, improved language understanding, an expanded knowledge base, and higher accuracy and precision [2]. The success of GPT and similar models has led to the development of conversational AI models by other companies and research organizations. For instance, Google's Bidirectional Encoder Representations from Transformers (BERT) and Facebook's RoBERTa models (a reimplementation of BERT with some modifications to the key hyperparameters and minor embedding tweaks) were trained on even larger text datasets and achieved state-of-the-art results in a range of NLP tasks [9, 10].

Is NLP, including ChatGPT, really a source of concern?

There are several concerns about how ChatGPT may impact education, especially in academic writing. While ChatGPT is capable of generating essays on various topics, its scholarly writing still needs improvement, as evidenced by its performance when given an exam and a final project from a science denial class at George Washington University [11]. Nonetheless, the advent of AI in education may encourage instructors to rethink their teaching methodologies by providing assignments that demand critical thinking and problem-solving beyond the actual capabilities of AI, which will evolve, probably in an endless race with humans. Another, more significant worry is the possibility of AI-generated content infiltrating scientific papers [2, 11, 12]. In a 2023 study, reviewers identified only 63% of the fake abstracts created by ChatGPT [12]. This raises pertinent issues concerning the authenticity and credibility of research publications, which as of now rest on an exclusively human review process. The Science family of journals has always required authors to sign a license certifying that their work is original [11]. Interestingly, to ensure that this standard is met, the license and editorial policies have recently been updated to explicitly prohibit the use of AI-generated text, figures, images, or graphics in research publications. Moreover, AI programs cannot be considered authors of scientific papers. While AI tools can undoubtedly aid scientific research in many ways, it is crucial to recognize that they should function as a supplementary aid rather than a complete substitute for human creativity and ingenuity. Furthermore, there are certain responsibilities that all authors of scientific research papers bear, which cannot be assumed by any computer or program.

How can academic efficiency benefit from NLP, including ChatGPT?

ChatGPT offers numerous advantages that make it a valuable tool for academic research. One of its greatest strengths is its ability to process vast amounts of textual data in a short period, which can save researchers significant time and effort. In fact, ChatGPT and other NLP technologies have the potential to automate many tasks that were previously carried out manually. For instance, ChatGPT can be utilized to analyse academic papers by scanning them and extracting important details such as the author(s), publication date, and significant findings [ 11 , 13 ]. This feature not only saves time but also enables researchers to avoid the tedious manual searching of papers. Another benefit of using ChatGPT in academic research is in the creation of summaries. Summarizing lengthy academic papers can be a time-consuming process, but ChatGPT can be trained to automatically generate them [ 12 , 14 , 15 ]. This feature guarantees an objective and unbiased summary, generated by a machine instead of a human (drawbacks and caution are discussed below). Furthermore, researchers can also use ChatGPT to generate research questions. By inputting a topic or research area into ChatGPT, researchers can obtain a list of potential research questions [ 2 ]. This feature can be particularly useful for new researchers or those seeking inspiration for a research project. However, it is important to use ChatGPT with caution and in conjunction with other research methods [ 12 ]. Researchers should carefully consider the scope of their research questions, be aware of the limitations of ChatGPT (see below), take care to attribute sources appropriately, exercise caution with sensitive or controversial topics, and remain informed about new developments in NLP.

Will ChatGPT replace academic researchers?

ChatGPT is a robust LLM with various capabilities, yet it also possesses limitations that make it unsuitable for certain types of academic research [2, 12]. The utilization of ChatGPT in research could result in the integration of false or biased information into papers, potentially leading to unintentional plagiarism and/or the misattribution of concepts. Moreover, researchers employing LLMs like ChatGPT run the risk of not adequately citing original sources or authors, leading to inadvertent misattribution of information [2]. Furthermore, the lack of transparency in the training sets and LLMs underlying ChatGPT and other conversational AIs can cause biases and inaccuracies [2]. Importantly, the dearth of transparency and the inability to access the internal workings of these models contradict the current trend of transparency and open science. Additionally, concerns persist that conversational AIs can be influenced by the biases of their developers and training data, resulting in inaccuracies and incomplete knowledge. Van Dis et al. [2] demonstrated an instance of ChatGPT's potential inaccuracies in their endeavour to summarize a systematic review on cognitive behavioural therapy for anxiety-related disorders. ChatGPT generated a response containing factual errors and misrepresentations, which could be attributed to the absence of relevant articles in its training set and/or its incapacity to differentiate between credible and less credible sources. To address these concerns, it is vital for researchers to maintain vigilance and integrate expert-driven fact-checking and verification processes into their work. Additionally, high-quality journals may opt to include human verification steps or technologies capable of identifying LLM interference. The development and implementation of open-source AI technology present another means of addressing transparency and accuracy concerns [2]. Non-commercial organizations and academic collaborations have already started to create open-source LLMs that can promote innovation and reliability [2, 16]. Tech companies may also benefit from releasing relevant portions of their models and corpora as open source, encouraging greater community involvement and enabling the creation of more accurate and comprehensive results [2]. Just as modern technology has not replaced engineers, computer scientists, or transportation companies, but rather improved their efficiency, academic researchers will not lose their importance with the advent of NLP technology. Instead, they will hopefully continue to grow, improve, and adapt. Hence, we firmly hold the view that NLP cannot substitute for academic researchers, because doing so could jeopardize research activities as a whole. ChatGPT has recently shed more light on one of the most pressing challenges faced by academic researchers, namely the threat of ‘fake science’. Scientists shall adapt to contemporary changes while continuing to excel as they have always done throughout history.

What should be done in regard to NLP, including ChatGPT? A call for action

Primarily, we recommend that researchers, reviewers, editors, and publishers try out ChatGPT for themselves. This will allow them to explore the capabilities of such a program and, hopefully, to follow its development, keeping a close eye on any potential pitfalls or issues. Ultimately, we hope that they will act as guardians of honest science. In this regard, educators should discuss the use and ethics of this technology with undergraduate students. In the absence of any external guidelines so far, responsible group leaders and teachers should determine how to use ChatGPT with honesty, integrity, and transparency, and establish some basic rules of engagement [12]. All contributors to research should be reminded that they will be held accountable for their work, whether or not it was generated with the support of ChatGPT or an equivalent tool. Therefore, every author should take responsibility for thoroughly fact-checking their text, results, data, code(s), and references.

One additional critical issue to address is the implications for diversity and inequality in the research field. LLMs could be a double-edged sword [12]. Indeed, and importantly, LLMs could help level the playing field by removing language barriers and hopefully enabling more people to write high-quality text. However, it is likely that high-income countries and privileged researchers will quickly find ways to exploit LLMs to accelerate their research and widen inequalities even further. Here, we urge managers and developers to consider these global and language inequities and to build into future tools features that give people equal chances regardless of their language and access to technology.

Therefore, we propose that the U.S. NLM ( https://www.nlm.nih.gov/ ) create software similar to a plagiarism checker (“NLP pattern checker” could be suggested as a name) to help editors and publishers detect text generated by LLMs rather than by researchers and/or reviewers. We foresee an epic battle between ChatGPT’s developers and whoever will be in charge of detecting non-human-created text. In that regard, and as of now, ChatGPT’s raw output can be detected on careful inspection, particularly when the text runs to more than a few paragraphs and the subject relates to scientific work. This is because LLMs produce patterns of words based on statistical associations in their training data and prompts, meaning that their output can appear generic or contain simple errors.
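As a toy illustration of what such an “NLP pattern checker” might look for, the sketch below flags text whose sentence lengths are unusually uniform, one of the surface patterns associated with raw LLM output. It uses only the Python standard library; the threshold is an uncalibrated assumption, and a production detector would instead rely on model-based signals (such as perplexity) validated on labelled human/LLM corpora.

```python
# Toy "NLP pattern checker": low variation in sentence length ("low
# burstiness") is one crude surface cue of machine-generated text.
import re
import statistics

def burstiness_score(text: str) -> float:
    """Coefficient of variation of sentence lengths, measured in words.
    Human writing tends to vary sentence length more than raw LLM output."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 3:
        raise ValueError("Need at least three sentences to score.")
    return statistics.stdev(lengths) / statistics.mean(lengths)

def looks_machine_generated(text: str, threshold: float = 0.35) -> bool:
    # The threshold is illustrative only; it would need calibration on real data.
    return burstiness_score(text) < threshold
```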

Should ChatGPT be mentioned in the list of authors?

There are growing concerns regarding the ethical use of LLMs such as ChatGPT in academic research. There is a risk that researchers and students may rely too heavily on these models, resulting in unfair and unreliable work and/or even plagiarism [2]. To address these concerns, some scientific publishers, such as Springer Nature, have already established guidelines for using LLMs ethically. These guidelines advise against crediting LLMs as authors on research papers since, as mentioned above, AI tools/computers cannot take responsibility for the work. Researchers are also encouraged to document their use of LLMs in the methods or acknowledgements sections of their papers [17]. As an illustration of the latter [18], we asked ChatGPT to generate the abstract of the present manuscript (Box 1).

Despite these guidelines, some papers have already credited LLMs as co-authors, such as an editorial in Nurse Education in Practice [19]. Although the authors honestly credited the help of ChatGPT, the journal’s editor-in-chief has since acknowledged this as an oversight by the journal’s editorial board and has promised to make the necessary corrections [20]. Another paper, published in Oncoscience, credited ChatGPT as a co-author of a perspective article [21]. The author of the paper, the chief executive officer of Insilico Medicine, strikingly stated that his company has published over 80 papers produced by generative AI tools [21]. While some argue that LLMs can produce higher-quality work than earlier generations of AI tools, there are still concerns about their ethical use in research. Publishers such as Springer Nature have established guidelines to promote the ethical and transparent use of LLMs while holding researchers accountable for their work [22].

To maintain transparency, it is essential to acknowledge the use of any NLP software, including simpler tools such as Grammarly and ProWritingAid, in the methodology or acknowledgements section of all manuscripts. This recommendation applies not only to ChatGPT and other NLP technologies; publishers, editors, and reviewers should also encourage transparency in their own reviews and feedback to authors. Basic fairness would naturally result in acknowledging the contribution of any tool that aided in producing better research outcomes.

Verbatim version of the abstract created by ChatGPT from the final version of the article manuscript (date: March 5, 2023). Readers will identify ChatGPT’s actual limitations in constructing a well-structured abstract that covers all aspects of the article.

ChatGPT is a powerful tool for academic research with the ability to process vast amounts of textual data and automate tasks that were previously done manually. It can be used to scan academic papers and extract important details, generate objective and unbiased summaries, and create research questions. However, researchers must exercise caution when using ChatGPT as it can also integrate false or biased information into papers, potentially leading to unintentional plagiarism and misattribution of concepts. The lack of transparency in the training sets and underlying models can also cause biases and inaccuracies. To address these concerns, researchers should maintain vigilance, integrate expert-driven fact-checking and verification processes, and encourage the development and implementation of open-source AI technology. Educators should also discuss the use and ethics of this technology with undergraduate students, and all contributors to research should take responsibility for thoroughly fact-checking their work. Additionally, the implications for diversity and inequality in the research field should be considered, and the development of future tools should aim to provide equal chances to people regardless of their language and access to technologies. Finally, the proposal for the creation of a plagiarism checker similar to NLP pattern checker to help detect text generated by LLMs is suggested to help editors and publishers detect potential issues. Overall, ChatGPT has the potential to improve the efficiency of academic research, but researchers must use it responsibly and with caution to avoid any unintended consequences.

Note from the authors: While ChatGPT may prove to be beneficial to researchers, it is still a long way from being able to replace academics.

Furthermore, we believe that grey literature, including preprints, should not be used as references for the time being, because it may contain significant bias due to being totally or partially generated by AI tools. We would like to emphasize that we are not opposed to the use of AI technology in academic writing, but it is important to note that manuscripts that have not undergone a robust peer-review process carry a higher risk of AI-generated or AI-supported articles going unidentified. This means that academics who serve as reviewers should make the necessary effort to follow the evolution of LLMs in order to reduce the likelihood that machines will imperceptibly replace humans in research.

Is the use of NLP, including ChatGPT, by academic researchers any different in the sports science field?

We searched MEDLINE on March 4, 2023, using the combination of the following two terms: ChatGPT AND Sport. Only two editorials were found [23, 24]. The use of ChatGPT in academic research, including sports science, has advantages and disadvantages that are not fundamentally different from those in other academic fields. One potential limitation is that ChatGPT was trained on a small academic database [25], which may not capture the full breadth and depth of knowledge in sports science. Furthermore, ChatGPT was trained on a dataset that ends in September 2021 [25], while the academic sports science field is rapidly evolving, with new research emerging at an unprecedented pace. Within a year, evidence can quickly become outdated as new discoveries and advances are made; this outdated training data increases the potential for bias in ChatGPT’s responses to sports science questions and emphasizes the importance of keeping up to speed with the latest research and constantly reassessing and updating scientific knowledge in the field.

Likewise, much of the online sports science information may be biased or based on anecdotal evidence rather than empirical research, because much of the content available on the internet is created by non-experts or individuals with a vested interest in promoting specific ideas or products. As a result, the quality and reliability of the available information may limit ChatGPT’s responses to sports science questions. Moreover, the popularization of sports science has resulted in an increase in the availability of biased or inaccurate information on the internet [26]. Athletes and fitness enthusiasts frequently share their experiences and advice, which may or may not be supported by empirical research and may be influenced by commercial interests. As a result, ChatGPT should prioritize peer-reviewed academic sources and be sceptical of non-academic sources in order to ensure the accuracy and reliability of its responses. This is especially important in sports science research, where inaccurate or biased data can have serious consequences for athletes’ health and performance.

The limitations of ChatGPT, however, are not unique to sports science research. In many fields, the internet is rife with biased or inaccurate information, and researchers must always be cautious when evaluating online sources. Furthermore, overreliance on ChatGPT-generated responses is a risk in any academic field. While ChatGPT can offer useful insights into research questions, it should not be used as the sole source of analysis; researchers should always critically evaluate and interpret ChatGPT-generated responses, in conjunction with other research methods. To illustrate our discussion with a practical example, the full answer of ChatGPT to the question “How can we use ChatGPT in sports?” is reported in Box 2. We deliberately decided to keep Box 2 as is, even though at the time of our manuscript write-up (February 2023), ChatGPT’s database extended only to September 2021. Therefore, statements such as “Providing real-time updates” should obviously be considered with caution.

Verbatim answer of ChatGPT to the following request: “How can we use ChatGPT in sports?” (access date: February 28, 2023).

ChatGPT can be used in sports in several ways. For example, ChatGPT can be used to:
  • Provide real-time updates on live sports events, such as scores, highlights, and player statistics. This can help fans stay up-to-date on their favourite teams and players.
  • Answer fan questions about sports, such as rules, regulations, and historical facts. This can help fans better understand the sport and appreciate it more.
  • Provide personalized training programs for athletes based on their individual needs and goals. This can help athletes improve their performance and achieve their full potential.
  • Conduct market research on sports fans, such as their preferences, opinions, and behaviours. This can help sports organizations better understand their audience and tailor their offerings to meet their needs.
  • Provide sports betting tips and predictions based on historical data and statistical analysis. This can help bettors make more informed decisions and improve their chances of winning.

How easy is it to fool reviewers, editors, journals, and publishers with fake papers?

It is worth noting that in 2013, The Guardian published an article detailing a sting operation conducted by the science journalist John Bohannon [27]. Over a period of 10 months, Bohannon submitted a fake scientific paper to 304 open-access journals worldwide. The hoax paper had “fatal flaws” and used fabricated content, authors, and universities. Surprisingly, and worryingly, the fabricated text was accepted by 157 journals, including those published by industry giants Sage, Elsevier, and Wolters Kluwer. The sting operation raised concerns about the quality control of open-access journals and the “gold” open-access model. The article suggested that the hoax reflected a weakness in peer review rather than a flaw in the gold open-access model. Nowadays, we interpret that article differently. Indeed, given the excellent academic writing skills of NLP technology, less-skilled reviewers are even less likely to detect or distinguish fake from genuine scientific articles. This will inevitably increase the threat of fake science.

It is now clear how easy it is to fool reviewers, editors, journals, and publishers with fake papers. Therefore, the three main messages of our article are as follows:

  • NLP technologies, including ChatGPT, should no longer be considered a threat to academics: such tools have always existed in some form, and their output continues to improve, just as scientific researchers’ output will. Their existence will contribute to the improvement of science, but only if high-quality ethical standards are applied.
  • We call on highly skilled academics and pioneering leaders in large academic institutions to impose higher standards for academic writing, such as double verification of author identity or submission criteria for each author in relevant journals. The same goes for peer reviewers: when selecting reviewers, editors may rely on their H-index or the number of reviews they have completed to raise standards, require double institutional review board verification, and/or ban preprints or impose more criteria for deposition before accepting them (we believe the latter is less likely to happen; we therefore call on the research community to avoid relying on preprint manuscripts to back their research).
  • It is essential to include a module for medical students, as well as for master’s and PhD students, on NLP technologies, ethics, and the best way to use them.

When asked to “consider coming up with tools that will protect science and academics from the misuse of LLMs”, ChatGPT’s full answer is reported in Box 3. The authors call on readers to judge whether ChatGPT did a satisfactory job with the assignment. Our personal opinion is that the text is really good.

Verbatim answer of ChatGPT to the following request: “Please consider coming up with tools that will protect science and academics from the misuse of large language models (LLMs)” (access date: February 20, 2023). Importantly, this outcome is strictly independent of the content of the present manuscript: we asked ChatGPT to answer this question only after having completed our paper draft. We were surprised by some text sections that were extremely close to our manuscript, and we assure readers that we did not take the responses below into account when making any modification to our article.

There are several steps that can be taken to protect science and academics from the potential misuse of LLMs like ChatGPT. Here are a few suggestions:
  • It is crucial to educate researchers and academics about the capabilities and limitations of LLMs, as well as the potential risks associated with their use. This could involve providing training on best practices for using LLMs in research and emphasizing the importance of maintaining academic integrity.
  • The research community could develop guidelines and standards for using LLMs in academic research. This could include standards for verifying the authenticity of research outputs and guidelines for detecting and addressing potential misuse of LLMs.
  • It is important to develop tools that can detect AI-generated content in scientific publications and research outputs. This could involve developing algorithms that can identify language patterns and other characteristics of AI-generated text.
  • It is crucial to promote responsible use of LLMs in academic research. This could involve developing ethical guidelines for the use of LLMs in research and encouraging researchers to use LLMs in a way that supports human intelligence and critical thinking, rather than replacing it.
  • It is important to engage with the broader community, including policymakers and the public, to raise awareness about the potential risks and benefits of LLMs in academic research. This could involve developing educational materials and hosting public discussions to foster informed and responsible use of LLMs in research.

Overall, protecting science and academics from the potential misuse of LLMs will require a collaborative effort from the research community, policymakers, and the broader public. By working together, we can develop the tools, guidelines, and standards needed to ensure that LLMs are used in a responsible and ethical manner, in support of scientific research and academic excellence.

CONCLUSIONS

With the growth of AI technology, science, like many other fields, is not only benefiting greatly but also facing challenges to its core principles, methods, norms, and ethics. The priority should be to embrace the opportunities presented by this development and to manage any associated risks optimally. We believe science can make the most of conversational AI while retaining the essential elements that make it such a rewarding and significant pursuit: curiosity, creativity, and exploration. The progress of AI technology may, however, limit and obscure people’s and researchers’ contributions in the future. Indeed, AI chatbots may become able to generate research questions and hypotheses, develop methodology, create experimental protocols, analyse and interpret data, and write manuscripts. Although we are still some way off from this scenario, there is no doubt that conversational AI technology will increasingly impact all stages of the scientific publishing process. Indeed, we believe that research institutions and academics will not be able to exclude the use of this technology.

Therefore, this is a call for academics to act in the field of academic research. They should focus on educating research students on the basic principles and ethical considerations involved in academic research. It is also important that they emphasize transparency in their work by acknowledging the use of any AI/LLMs at any stage of their research. Additionally, researchers should always cross-check any information provided by these technologies against relevant sources. In doing so, researchers can ensure the ethical and transparent use of these tools in academic research. The race is now on between humans and LLMs to find the best ways of limiting the threats of such new AI-based tools while optimizing the benefits they will bring to humanity. Box 4 summarises the main messages of this review.

Main messages of this paper.


The evolution of ChatGPT and similar products is closely linked to the development of AI and NLP. From the early days of rule-based systems to today’s cutting-edge deep learning models, the field has made tremendous strides in generating highly realistic human-like text and performing a wide range of language tasks.

While ChatGPT’s capabilities offer exciting prospects for education and research, there are potential hazards associated with its usage that deserve full consideration. Maintaining the integrity of scientific research necessitates strict adherence to ethical and academic principles, with human intelligence and critical thinking at the forefront of the research process.

While ChatGPT has several benefits that make it a promising and effective tool for academic research, researchers must exercise caution and use it appropriately, in conjunction with other research methods and with careful checks along the way, to ensure trustworthy and reliable results.

While ChatGPT and other LLMs have numerous potential uses in academic research, they are not suitable for all types of research and pose accuracy and transparency risks. Researchers employing these tools must remain vigilant and integrate expert-driven fact-checking and verification processes into their work. Importantly, even if some areas of data science could be taken over by computers, most research experiments still need humans to run them. Thankfully, we do not expect humans to be completely replaced by computers yet.

We propose that the U.S. NLM create software similar to a plagiarism checker to help editors and publishers detect text generated by LLMs rather than by researchers and/or reviewers.

Current guidelines advise against crediting LLMs as authors on research papers since, as mentioned above, AI tools/computers cannot take responsibility for the work. However, if authors use such tools, it is only fair to mention this in the acknowledgements section of the manuscript.

The advantages and disadvantages of using ChatGPT in academic research for sports science are not fundamentally different from the advantages and disadvantages of using ChatGPT in other academic fields. ChatGPT has the potential to shed light on important issues in sports science research, but its limitations must be considered and addressed. In conjunction with other research methods, researchers should prioritize the use of peer-reviewed academic sources and critically evaluate and interpret ChatGPT-generated responses.

It is essential to include a module for medical students, as well as for master’s and PhD students, on NLP technologies, ethics, and the best way to use them.

AI: Artificial Intelligence. LLM: Large Language Model. NLM: National Library of Medicine. NLP: Natural Language Processing.

Acknowledgements

Anecdotally, and intentionally, the authors would like to acknowledge the assistance of ChatGPT in writing parts of this article (Boxes 1 to 3). Using ChatGPT within ethical guidelines showed its potential to support the writing process and improve the clarity and coherence of scientific communication. We are grateful that high standards of transparency were maintained throughout the writing process, and we believe that this acknowledgement will encourage further exploration of the use of natural language processing tools such as ChatGPT in scientific writing. We also extend our gratitude to the editors and reviewers of this academic journal for recognizing the value of our work and providing valuable feedback that has helped to improve the quality of this article.

This research received no specific grant from any funding agency in the public, commercial, or non-profit sectors.

Conflicts of interest

We hereby declare that we have no financial or personal relationships that could potentially influence or bias the content of this paper. Specifically, none of the authors holds any financial interests or conflicts of interest associated with the ChatGPT or NLP technologies discussed in this paper. Furthermore, none of the authors has affiliations with any organizations that might have a financial interest in the research or its outcomes. Moreover, we confirm that we have no personal or professional relationships that could potentially affect the research or its findings. None of the authors has collaborated or consulted with any individuals or organizations that have a financial or other interest in the ChatGPT or NLP technologies. Additionally, we have not received any funding or other types of support from any sources that could influence the research or its findings. We affirm that the research presented in this paper is entirely based on our own analysis and interpretation of the facts and data, and we assure that there are no conflicts of interest that could impact the objectivity or integrity of the research. We make this declaration of no conflict of interest to ensure transparency and maintain the credibility of the research manuscript presented.




  • Perspective
  • Open access
  • Published: 25 February 2023

Can artificial intelligence help for scientific writing?

  • Michele Salvagno 1,
  • Fabio Silvio Taccone 1 &
  • Alberto Giovanni Gerli 2

Critical Care, volume 27, Article number: 75 (2023)



This paper discusses the use of artificial intelligence chatbots in scientific writing. ChatGPT is a chatbot, developed by OpenAI, that uses the Generative Pre-trained Transformer (GPT) language model to understand and respond to natural language inputs. AI chatbots, and ChatGPT in particular, appear to be useful tools in scientific writing, assisting researchers and scientists in organizing material, generating an initial draft, and/or proofreading. No publication in the field of critical care medicine has yet been prepared using this approach; however, this will become a possibility in the near future. ChatGPT’s work should not be used as a replacement for human judgment, and the output should always be reviewed by experts before being used in any critical decision-making or application. Moreover, several ethical issues arise from using these tools, such as the risk of plagiarism and inaccuracies, as well as a potential imbalance in accessibility between high- and low-income countries if the software becomes fee-based. For this reason, a consensus on how to regulate the use of chatbots in scientific writing will soon be required.

Introduction

A chatbot is an electronic system (generally software) that simulates conversations by responding to keywords or phrases it recognizes, and that can be integrated into various platforms, such as websites, mobile apps, and messaging platforms.

The Chatbot Generative Pre-trained Transformer (ChatGPT), developed by OpenAI, is a type of Artificial Intelligence (AI) software designed to simulate conversations with human users. This chatbot works through algorithms programmed to understand natural language inputs and answer with appropriate responses, either pre-written or newly generated by the AI. ChatGPT is constantly improved with reinforcement techniques, natural language processing, and machine learning, to improve its ability to understand and thoroughly respond to users’ needs. Concretely, you can conversationally ask anything and receive a fast and adequate human-like written reply to your questions or requests such as: (a) write a small text on a given topic; (b) get information on a topic of interest; (c) compose an email or message with a certain tone, specific content, and intended for a particular person; (d) correct the shape of a text or change its wording; (e) solve problems.

As such, this chatbot could also be used in scientific writing [1]. Indeed, ChatGPT could become a promising and powerful tool for tasks such as automated draft generation, article summarizing, and language translation, which may be useful in academic activities to make writing work faster and easier. However, the use of this tool in scientific writing raises some ethical concerns and should therefore be regulated.

ChatGPT in scientific writing

ChatGPT is already able to help medical researchers and scientists write articles and abstracts, search the literature, summarize data or information, provide suggestions for structure, references, and titles, perform language reviews to make the text more readable, or even generate a full draft of a paper [2]. At this stage, although feasible, no article dealing with critical care medicine has been published using this application. Moreover, whether writing a manuscript on a critical care topic would be easier or more difficult than in other medical disciplines remains unknown. The limits of the possible applications of these tools have yet to be clarified.

ChatGPT can generate text on a wide variety of topics. Recently, an entire letter to the editor was written by ChatGPT, answering the author's questions [1]. Artificial intelligence cannot generate new ideas, but it can organize and develop those of the researcher, creating an initial draft. However, this should be seen as a starting point for human-based development of the text, as the automatically generated text is far from being a substitute for the knowledge, creativity, and critical thinking of human experts.

For the literature search, ChatGPT and AI research assistants (such as “elicit.org”) can aid researchers in the review process by finding academic papers, summarizing their conclusions, and highlighting areas of uncertainty (e.g., summarizing the recent randomized clinical trial on the use of extracorporeal membrane oxygenation for refractory cardiac arrest [3], Fig. 1). This could help physicians, for example, to quickly gain an understanding of the current state of knowledge on a particular topic and to identify potential gaps that need to be addressed, although the provided summary might be quite generic and may not critically analyse differences among studies.

Figure 1. Example of a summary of the existing literature on a specific topic.

During the writing process, ChatGPT can help generate an initial draft of a scientific paper and even suggest titles. Given raw information, ChatGPT can also assist in composing the methods section of a study, justifying the sample size, and describing data analysis techniques. From the authors’ experience, once the manuscript has been finalized, ChatGPT is extremely effective in the editing process: formatting and language editing, rewriting a particularly complex sentence more clearly, and even summarizing the entire text to compose a suitable abstract are all feasible with this approach. The results are not always satisfactory, but they certainly save time. ChatGPT has recently been asked to prepare regular discharge summaries [4], but the quality of the content has not been specifically evaluated; to our knowledge, a structured evaluation of output quality has not yet been performed. More complicated writing processes, such as systematic reviews and meta-analyses, require human intervention, and ChatGPT might be helpful mainly for editing.
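As a sketch of the abstract-drafting step just described (an illustration under stated assumptions, not the authors’ actual workflow), the snippet below asks a chat model to compress a finished manuscript into a structured abstract. It again assumes the OpenAI Python client; the file name and model are hypothetical placeholders.

```python
# Sketch: drafting a structured abstract from a finished manuscript.
from openai import OpenAI

client = OpenAI()
manuscript = open("manuscript.txt").read()  # hypothetical local file

draft = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative choice
    messages=[
        {
            "role": "system",
            "content": (
                "Summarize the following manuscript as a 200-word structured "
                "abstract (Background, Methods, Results, Conclusions)."
            ),
        },
        {"role": "user", "content": manuscript},
    ],
)
print(draft.choices[0].message.content)  # review carefully before any submission
```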

A future potential application of AI is in the automatic generation of figures, tables, and other visual elements of the manuscript, which can aid in summarizing data. These elements are important for the clarity and understanding of the manuscript, but they are often time-consuming to create.

Importantly, the writing process of a scientific paper requires, for the moment, the guidance and supervision of human researchers who are experts in the field, to ensure the accuracy, coherence, and credibility of the content before it is used or submitted for publication. Chatbots can help, but they need the researcher's input, and inadequate inputs will lead to inadequate results. For this reason, chatbots and AI in general should not replace human researchers' expertise, judgment, personality, and, in the end, responsibility.

Chatbots vs. human beings

As an AI, ChatGPT has a clear advantage over humans in rapidly and deeply comprehending information and connecting evidence to reach conclusions, whereas humans are limited in their ability to read a comprehensive range of literature and to discern the connections between seemingly separate pieces of information.

Moreover, it may be difficult to recognize whether a paper was written by a chatbot or a human being [5], because chatbots use advanced techniques, such as natural language processing (NLP) and machine learning, to generate text that is similar to human writing. Detecting the author is a complex task and requires thorough, critical reading to reach a conclusion. However, a few characteristics might reveal that a paper was written by a chatbot, such as a lack of nuance, style, or originality, which could allow identification by AI output detectors and sceptical human reviewers [6]. Interestingly, the same writing traits can be found in texts written in a language that is not the writer's native tongue. By focusing on these traits, AI plagiarism detectors might flag papers by non-native English speakers as AI-generated text. It would be interesting to test these tools' sensitivity in detecting the authors of texts written by these two groups.

Nevertheless, chatbot-generated text might lack the subtle phrasing and word choice that a human author might use to convey a certain meaning or tone. It might also be vaguer and contain inconsistencies that would not be present in a human-written paper. Conversely, if a paper contains a high degree of structural and grammatical errors, this may suggest that it was written by a human being (though the converse does not hold). Lastly, if a paper discusses a very specific and highly technical topic, it is less likely that a chatbot could generate such a text, as this would require a deep understanding of the topic and the ability to generate scientific analysis and conclusions.

Ethical considerations

Ethical concerns could also limit the use of these chatbots for scientific writing [7].

The process of acquiring knowledge from others and writing new or review articles involves humans incorporating both what they have learned from others and their own ideas. It is natural for humans to repeat the findings, statements, and written works of others, and thus to come close to committing plagiarism by presenting an idea without proper referencing of the original authors. AI or ChatGPT systems can commit plagiarism under this definition, but they can also be programmed to avoid copying others by rephrasing their work in a manner similar to what human authors do. However, using programs to reformulate sentences and writing in order to reduce the percentage of plagiarism (i.e., asking the software to rewrite a section written by other authors in different words) cannot be considered acceptable in scientific research. If we define “plagiarism” as the mere act of copying someone else's work, then simply rephrasing what was written, regardless of the method used and without adding anything personal, is a violation of academic integrity. For this reason, journal editors should use programs capable of detecting AI-written content in order to detect plagiarism better.

Second, the lack of an expert and critical human mind behind scientific work (which is the basis of the scientific method) could lead to a risk of perpetuating or amplifying existing biases and inaccuracies in the data, producing unfair results and hampering scientific growth. Whatever the use of AI, we believe that the presence of an expert in the field remains a necessary cornerstone of conducting scientific activity and writing, to guarantee the quality of the work.

Third, the incredible development of AI tools can lead to a significant increase in some researchers' publication numbers without a matching increase in their experience in the field. Ethical issues can therefore arise when academic institutions hire professionals based on the number of their publications rather than on their quality.

Fourth, whether ChatGPT should be mentioned among the authors of a manuscript written using this approach remains inadequately defined. Finally, while ChatGPT and other chatbot services are currently free of charge, there is no guarantee that they will not become paid services in the future. The introduction of charges to access these chatbots could lead to a further disparity in scientific production between high- and low-income countries (as well as between junior and more senior professionals), resulting in unfair facilitation for the former, with unpredictable consequences.

Chatbot as a tool in the ICU

In addition to scientific writing, ChatGPT has the potential to assist physicians in their hospital work by saving time and allowing them to focus on providing patient care. In an Intensive Care Unit (ICU), where constant monitoring of multiple patient information is required, such as treatment progression, laboratory values, microbiological results, and fluid balance calculation, ChatGPT can assist in several ways.

First, ChatGPT can provide general information about recognized ICU protocols: given a specific request (e.g., the initial management of sepsis), ChatGPT can generate a response by analysing the input and identifying patterns in the text data it has been trained on. Furthermore, ChatGPT has a distinct advantage over humans in quickly gathering, comprehending, and utilizing information. In the future, it may be taught to apply this knowledge in clinical practice by presenting the latest evidence to healthcare professionals, if not by directly creating patient-tailored protocols that reflect new findings.

Currently, it could already be used to generate clinical notes from raw information, such as daily summaries and discharge summaries, which could save time and increase accuracy. Moreover, it could help in communicating with patients and families, translating into other languages, and generating personalized messages with accurate and timely information about the patient's condition; the purpose would be not to replace, but to add to, the doctor-patient (or doctor-relatives) relationship. Unfortunately, depending on the applicable legislation, privacy issues may arise when providing patients' information.

Anecdotally, when provided with clinical information, laboratory values, past medical history, and other relevant data, ChatGPT may already assist physicians in, for example, determining appropriate treatment options; however, it is important to note that the answers provided by the chatbot can be erroneous or outdated, as the technology is still in development and may not have the desired level of accuracy, currency, or precision. Nevertheless, AI will plausibly continue to improve and may provide more accurate and reliable results in the future.

As a future perspective, AI could potentially be trained to automatically extract and understand all relevant information from electronic health records, analysing patient data such as vital signs, laboratory results, and medical history, helping doctors to quickly access patient information, providing recommendations for interventions, and allowing faster and more comprehensive decisions. Combining these AI systems, which already exist in several areas of the world, with chatbots could create an interactive system that immediately provides processed information to the treating physician.

Conclusions

ChatGPT is AI software potentially able to assist in the writing process of a scientific paper: it can help with the literature review, identify research questions, provide an overview of the current state of the field, and assist with tasks such as formatting and language review. Moreover, it can have time-saving applications in clinical practice. As chatbot tools will be widely adopted in the near future, it is crucial to have international academic regulations in place to govern their use in scientific writing and to establish mechanisms for identifying and penalizing unethical usage. Chatbots are simply tools: they can assist human researchers, but they should not be used as a replacement for human researchers' expertise, judgment, and personality.

We asked ChatGPT: “Could you review this paper and tell us if you (as an author!) agree to submit it to Critical Care Journal?”. The (non-modified) response from ChatGPT was: “As an AI model, I am not able to review or submit papers to journals as I am not a researcher or author. However, I can give you some feedback on the paper you've provided. The paper presents a clear and comprehensive overview of the potential uses of ChatGPT in scientific writing, specifically in the field of critical care medicine. The introduction provides a good background on chatbots and ChatGPT and how it can be used in various tasks. The paper goes on to discuss the potential uses of ChatGPT in scientific writing, such as draft generation, literature research and language review. The paper also touches on the ethical concerns and limitations of using ChatGPT in scientific writing. I would recommend that you review the paper again, check for grammar and spelling errors, and make sure that the citations and references are correctly formatted. It would be also important to have a proper conclusion that summarizes the main ideas of the paper and the future perspectives. It's important to keep in mind that ChatGPT should always be used in combination with the expertise and judgement of human experts and its output should be validated before it is used in clinical practice.”

Availability of data and materials

Not applicable.

Change history

8 March 2023

A Correction to this paper has been published: https://doi.org/10.1186/s13054-023-04390-0

References

1. King MR. The future of AI in medicine: a perspective from a Chatbot. Ann Biomed Eng. 2022;51:291–5. https://doi.org/10.1007/s10439-022-03121-w

2. Hutson M. Could AI help you to write your next paper? Nature. 2022;611:192–3.

3. Suverein MM, Delnoij TSR, Lorusso R, Bruinsma GJBB, Otterspoor L, Kraemer CVE, et al. Early extracorporeal CPR for refractory out-of-hospital cardiac arrest. N Engl J Med. 2023;388:299–309. https://doi.org/10.1056/NEJMoa2204511

4. Patel SB, Lam K. ChatGPT: the future of discharge summaries? Lancet Digit Health. 2023. https://doi.org/10.1016/S2589-7500(23)00021-3

5. Else H. Abstracts written by ChatGPT fool scientists. Nature. 2023;613:423.

6. Gao CA, Howard FM, Markov NS, Dyer EC, Ramesh S, Luo Y, et al. Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers. bioRxiv. 2022. https://doi.org/10.1101/2022.12.23.521610

7. Hammad M. The impact of artificial intelligence (AI) programs on writing scientific research. Ann Biomed Eng. 2023;51:459–60.


Acknowledgements

For the writing of this article, we have not received funds or support from OpenAI, which was not associated with the process that led to the preparation of this article. The text, written with the support of ChatGPT by OpenAI, has however been modified by the human authors, who assume full responsibility for its form and content.

This research received no external funding.

Author information

Authors and affiliations

Department of Intensive Care, Erasme Hospital, Université Libre de Bruxelles, 1070, Brussels, Belgium

Michele Salvagno & Fabio Silvio Taccone

Department of Clinical Sciences and Community Health, Università Degli Studi di Milano, 20122, Milan, Italy

Alberto Giovanni Gerli


Contributions

MS, FST, AGG conceived of the study, and participated in its design and coordination and helped to draft the manuscript; MS, FST, AGG participated in the design of the study; MS, FST, AGG wrote, reviewed and made extensive editing of the final content; FST conceived and took the screenshot. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Michele Salvagno .


Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: shortly after initial publication of this article, the authorship and Acknowledgements were updated in line with Springer Nature authorship policies. Large Language Models (LLM), such as ChatGPT, do not currently satisfy our authorship criteria. An attribution of authorship carries with it accountability for the work, which cannot be effectively applied to LLMs.


About this article

Cite this article

Salvagno, M., Taccone, F.S. & Gerli, A.G. Can artificial intelligence help for scientific writing? Crit Care 27, 75 (2023). https://doi.org/10.1186/s13054-023-04380-2


Received: 31 January 2023. Accepted: 21 February 2023. Published: 25 February 2023. DOI: https://doi.org/10.1186/s13054-023-04380-2


Keywords:
  • Artificial intelligence
  • Scientific writing
  • Machine learning



Updated: 6 June 2024 | Contributor: Jim Holdsworth

Natural language processing (NLP) is a subfield of computer science and artificial intelligence (AI) that uses machine learning to enable computers to understand and communicate with human language. 

NLP enables computers and digital devices to recognize, understand and generate text and speech by combining computational linguistics—the rule-based modeling of human language—together with statistical modeling,  machine learning (ML)  and deep learning. 

NLP research has enabled the era of generative AI, from the communication skills of large language models (LLMs) to the ability of image generation models to understand requests. NLP is already part of everyday life for many, powering search engines, prompting chatbots for customer service with spoken commands, voice-operated GPS systems and digital assistants on smartphones. NLP also plays a growing role in enterprise solutions that help streamline and automate business operations, increase employee productivity and simplify mission-critical business processes.


A natural language processing system can work rapidly and efficiently: once NLP models are properly trained, they can take on administrative tasks, freeing staff for more productive work. Benefits can include:

Faster insight discovery: Organizations can find hidden patterns, trends and relationships between different pieces of content. Text data retrieval supports deeper insights and analysis, enabling better-informed decision-making and surfacing new business ideas.

Greater budget savings: With the massive volume of unstructured text data available, NLP can be used to automate the gathering, processing and organization of information with less manual effort.

Quick access to corporate data: An enterprise can build a knowledge base of organizational information to be efficiently accessed with AI search. For sales representatives, NLP can help quickly return relevant information, to improve customer service and help close sales.

NLP models are not perfect and probably never will be, just as human speech is prone to error. Risks might include:

Biased training: As with any AI function, biased data used in training will skew the answers. The more diverse the users of an NLP function, the more significant this risk becomes, such as in government services, healthcare and HR interactions. Training datasets scraped from the web, for example, are prone to bias.

Misinterpretation: As in programming, there is a risk of garbage in, garbage out (GIGO). NLP solutions might become confused if spoken input is in an obscure dialect, mumbled, or full of slang, homonyms, incorrect grammar, idioms, fragments, mispronunciations or contractions, or is recorded with too much background noise.

New vocabulary: New words are continually being invented or imported. The conventions of grammar can evolve or be intentionally broken. In these cases, NLP can either make a best guess or admit it’s unsure—and either way, this creates a complication.

Tone of voice: When people speak, their verbal delivery or even body language can give an entirely different meaning than the words alone. Exaggeration for effect, stressing words for importance or sarcasm can confuse NLP, making semantic analysis more difficult and less reliable.

Human language is filled with many ambiguities that make it difficult for programmers to write software that accurately determines the intended meaning of text or voice data. Human language might take years for humans to learn—and many never stop learning. But then programmers must teach natural language-driven applications to recognize and understand irregularities so their applications can be accurate and useful.

NLP combines the power of computational linguistics together with machine learning algorithms and deep learning. Computational linguistics is a discipline of linguistics that uses data science to analyze language and speech. It includes two main types of analysis: syntactical analysis and semantical analysis. Syntactical analysis determines the meaning of a word, phrase or sentence by parsing the syntax of the words and applying preprogrammed rules of grammar. Semantical analysis uses the syntactic output to draw meaning from the words and interpret their meaning within the sentence structure. 

The parsing of words can take one of two forms. Dependency parsing looks at the relationships between words, such as identifying nouns and verbs, while constituency parsing builds a parse tree (or syntax tree): a rooted, ordered representation of the syntactic structure of the sentence or string of words. The resulting parse trees underlie the functions of language translators and speech recognition. Ideally, this analysis makes the output—either text or speech—understandable to both NLP models and people.
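For instance, a dependency parse can be produced in a few lines with spaCy (one possible library choice; installing the package and its small English model, `en_core_web_sm`, is assumed):

```python
# Dependency parsing with spaCy: print each token, its grammatical role,
# and the head word it attaches to, which together form the parse tree.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The researcher reviewed the discharge summary carefully.")

for token in doc:
    # token.dep_ is the dependency label (e.g. nsubj, dobj); token.head is
    # the word this token modifies.
    print(f"{token.text:12} {token.dep_:10} -> {token.head.text}")
```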

Self-supervised learning (SSL) in particular is useful for supporting NLP because state-of-the-art artificial intelligence (AI) models require large amounts of labeled data to train. Because these labeled datasets require time-consuming annotation—a process involving manual labeling by humans—gathering sufficient data can be prohibitively difficult. Self-supervised approaches can be more time-effective and cost-effective, as they replace some or all of the manually labeled training data. Three different approaches to NLP include:

Rules-based NLP : The earliest NLP applications were simple if-then decision trees, requiring preprogrammed rules. They are only able to provide answers in response to specific prompts, such as the original version of Moviefone. Because there is no machine learning or AI capability in rules-based NLP, this function is highly limited and not scalable.

Statistical NLP : Developed later, statistical NLP automatically extracts, classifies and labels elements of text and voice data, and then assigns a statistical likelihood to each possible meaning of those elements. This relies on machine learning, enabling a sophisticated breakdown of linguistics such as part-of-speech tagging. Statistical NLP introduced the essential technique of mapping language elements—such as words and grammatical rules—to a vector representation so that language can be modeled by using mathematical (statistical) methods, including regression or Markov models. This informed early NLP developments such as spellcheckers and T9 texting (Text on 9 keys, to be used on Touch-Tone telephones).

Deep learning NLP : Recently, deep learning models have become the dominant mode of NLP, by using huge volumes of raw, unstructured data—both text and voice—to become ever more accurate. Deep learning can be viewed as a further evolution of statistical NLP, with the difference that it uses neural network models. There are several subcategories of models:

  • Sequence-to-Sequence (seq2seq) models : Based on recurrent neural networks (RNN) , they have mostly been used for machine translation by converting a phrase from one domain (such as the German language) into the phrase of another domain (such as English).
  • Transformer models : They use tokenization (splitting language into tokens—words or subwords—and encoding each token’s position) and self-attention (capturing dependencies and relationships between tokens) to calculate how different parts of language relate to one another. Transformer models can be efficiently trained by using self-supervised learning on massive text databases. A landmark in transformer models was Google’s bidirectional encoder representations from transformers (BERT), which became and remains the basis of how Google’s search engine works.
  • Autoregressive models : This type of transformer model is trained specifically to predict the next word in a sequence, which represents a huge leap forward in the ability to generate text. Examples of autoregressive LLMs include GPT, Llama, Claude and the open-source Mistral. (A minimal generation sketch follows this list.)
  • Foundation models : Prebuilt and curated foundation models can speed the launching of an NLP effort and boost trust in its operation. For example, the IBM Granite™ foundation models are widely applicable across industries. They support NLP tasks including content generation and insight extraction. Additionally, they facilitate retrieval-augmented generation, a framework for improving the quality of response by linking the model to external sources of knowledge. The models also perform named entity recognition which involves identifying and extracting key information in a text.
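To make the autoregressive idea concrete, here is a minimal sketch using the open-source Hugging Face transformers library (an assumption for illustration; none of the vendors above is implied) to sample a continuation from GPT-2, a small, openly available autoregressive transformer:

```python
# pip install transformers torch
from transformers import pipeline

# GPT-2 was trained to predict the next token in a sequence;
# sampling from it repeatedly generates text.
generator = pipeline("text-generation", model="gpt2")
result = generator("Natural language processing combines", max_new_tokens=20)
print(result[0]["generated_text"])
```

Each generated token is drawn from a probability distribution over the vocabulary, conditioned on everything generated so far: exactly the next-word prediction described above.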

For a deeper dive into the nuances between multiple technologies and their learning approaches, see “AI vs. machine learning vs. deep learning vs. neural networks: What’s the difference?”

Several NLP tasks typically help process human text and voice data in ways that help the computer make sense of what it’s ingesting. Some of these tasks include:

Linguistic tasks

  • Coreference resolution is the task of identifying if and when two words refer to the same entity. The most common example is determining the person or object to which a certain pronoun refers (such as, “she” = “Mary”). But it can also identify a metaphor or an idiom in the text (such as an instance in which “bear” isn’t an animal, but a large and hairy person).
  • Named entity recognition  ( NER ) identifies words or phrases as useful entities. NER identifies “London” as a location or “Maria” as a person's name.
  • Part-of-speech tagging , also called grammatical tagging, is the process of determining which part of speech a word or piece of text is, based on its use and context. For example, part-of-speech tagging identifies “make” as a verb in “I can make a paper plane,” and as a noun in “What make of car do you own?” (A tagging sketch in code follows this list.)
  • Word sense disambiguation is the selection of a word meaning for a word with multiple possible meanings. This uses a process of semantic analysis to examine the word in context. For example, word sense disambiguation helps distinguish the meaning of the verb “make” in “make the grade” (to achieve) versus “make a bet” (to place). Sorting out “I will be merry when I marry Mary” requires a sophisticated NLP system.
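Here is a minimal sketch of two of these tasks, part-of-speech tagging and named entity recognition, using the open-source NLTK toolkit (discussed later in this article); the download calls fetch the required models on first run:

```python
import nltk
# Resource names can vary slightly across NLTK versions.
for pkg in ("punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"):
    nltk.download(pkg, quiet=True)

sentence = "Maria can make a paper plane in London."
tokens = nltk.word_tokenize(sentence)

# Part-of-speech tagging: "make" is tagged as a verb (VB) in this context.
tagged = nltk.pos_tag(tokens)
print(tagged)

# Named entity recognition: "Maria" -> PERSON, "London" -> GPE (a location).
print(nltk.ne_chunk(tagged))
```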

User-supporting tasks

  • Speech recognition , also known as speech-to-text , is the task of reliably converting voice data into text data. Speech recognition is part of any application that follows voice commands or answers spoken questions. What makes speech recognition especially challenging is the way people speak—quickly, running words together, with varying emphasis and intonation.
  • Natural language generation (NLG) might be described as the opposite of speech recognition or speech-to-text: NLG is the task of putting structured information into conversational human language. Without NLG, computers would have little chance of passing the Turing test , where a computer tries to mimic a human conversation. Conversational agents such as Amazon’s Alexa and Apple’s Siri are already doing this well and assisting customers in real time.
  • Natural language understanding (NLU) is a subset of NLP that focuses on analyzing the meaning behind sentences. NLU enables software to find similar meanings in different sentences or to process words that have different meanings.
  • Sentiment analysis  attempts to extract subjective qualities —attitudes, emotions, sarcasm, confusion or suspicion—from text. This is often used for routing communications to the system or the person most likely to make the next response.
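For instance, here is a minimal sentiment-analysis sketch using NLTK’s bundled VADER analyzer (one possible tool among many; the text above does not prescribe one):

```python
import nltk
nltk.download("vader_lexicon", quiet=True)
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
# The 'compound' score ranges from -1 (most negative) to +1 (most positive).
print(sia.polarity_scores("The support team was fantastic and fixed my issue!"))
print(sia.polarity_scores("I have been on hold for an hour. Unacceptable."))
```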

See the blog post “ NLP vs. NLU vs. NLG: the differences between three natural language processing concepts ” for a deeper look into how these concepts relate.


Organizations can use NLP to process communications that include email, SMS, audio, video, newsfeeds and social media. NLP is the driving force behind AI in many modern real-world applications. Here are a few examples:

  • Customer assistance : Enterprises can deploy chatbots or virtual assistants to quickly respond to customer questions and requests. When questions become too difficult for the chatbot or virtual assistant, the NLP system moves the customer over to a human customer service agent. Virtual agents such as IBM watsonx™ Assistant, Apple’s Siri and Amazon’s Alexa use speech recognition to recognize patterns in voice commands and natural language generation to respond with appropriate actions or helpful comments. Chatbots respond to typed text entries. The best chatbots also learn to recognize contextual clues about human requests and use them to provide even better responses or options over time. The next enhancement for these applications is question answering, the ability to respond to questions—anticipated or not—with relevant and helpful answers in their own words. These automations help reduce costs, save agents from spending time on redundant queries and improve customer satisfaction. Not all chatbots are powered by AI, but state-of-the-art chatbots increasingly use conversational AI techniques, including NLP, to understand user questions and automate responses to them.
  • FAQ : Not everyone wants to read to discover an answer. Fortunately, NLP can enhance FAQs: When the user asks a question, the NLP function looks for the best match among the available answers and brings that to the user’s screen. Many customer questions are of the who/what/when/where variety, so this function can save staff from having to repeatedly answer the same routine questions.
  • Grammar correction : The rules of grammar can be applied within word processing or other programs, where the NLP function is trained to spot incorrect grammar and suggest corrected wordings.
  • Machine translation : Google Translate is an example of widely available NLP technology at work. Truly useful machine translation involves more than replacing words from one language with words of another. Effective translation accurately captures the meaning and tone of the input language and translates it to text with the same meaning and desired impact in the output language. Machine translation tools are becoming more accurate. One way to test a machine translation tool is to translate text from one language and then back to the original. An oft-cited, classic example: translating “The spirit is willing, but the flesh is weak” from English to Russian and back again once yielded “The vodka is good, but the meat is rotten.” More recently, the result was the closer “The spirit desires, but the flesh is weak,” and Google Translate can now take English to Russian to English and return the original “The spirit is willing, but the flesh is weak.”
  • Redaction of personally identifiable information (PII) : NLP models can be trained to quickly locate personal information in documents that might identify individuals. Industries that handle large volumes of sensitive information—financial, healthcare, insurance and legal firms—can quickly create versions with the PII removed.
  • Sentiment analysis : After being trained on industry-specific or business-specific language, an NLP model can quickly scan incoming text for keywords and phrases to gauge a customer’s mood in real-time as positive, neutral or negative. The mood of the incoming communication can help determine how it will be handled. And the incoming communication doesn’t have to be live: NLP can also be used to analyze customer feedback or call center recordings. Another option is an NLP API that can enable after-the-fact text analytics. NLP can uncover actionable data insights from social media posts, responses or reviews to extract attitudes and emotions in response to products, promotions and events. Information companies can use sentiment analysis in product designs, advertising campaigns and more.
  • Spam detection : Many people might not think of spam detection as an NLP solution, but the best spam detection technologies use NLP’s text classification capabilities to scan emails for language indicating spam or phishing. These indicators can include overuse of financial terms, characteristic bad grammar, threatening language, inappropriate urgency or misspelled company names. (A toy classifier sketch follows this list.)
  • Text generation : NLP helps put the “generative” into generative AI. NLP enables computers to generate text or speech that is natural-sounding and realistic enough to be mistaken for human communication. The generated language might be used to create initial drafts of blogs, computer code, letters, memos or tweets. With an enterprise-grade system, the quality of generated language might be sufficient to be used in real time for autocomplete functions, chatbots or virtual assistants. Advancements in NLP are powering the reasoning engine behind generative AI systems, driving further opportunities. Microsoft® Copilot is an AI assistant designed to boost employee productivity and creativity across day-to-day tasks and is already at work in tools used every day.  
  • Text summarization : Text summarization uses NLP techniques to digest huge volumes of digital text and create summaries and synopses for indexes, research databases, or busy readers who don't have time to read the full text. The best text summarization applications use semantic reasoning and natural language generation (NLG) to add useful context and conclusions to summaries.
  • Finance : In financial dealings, nanoseconds might make the difference between success and failure when accessing data, or making trades or deals. NLP can speed the mining of information from financial statements, annual and regulatory reports, news releases or even social media.
  • Healthcare : New medical insights and breakthroughs can arrive faster than many healthcare professionals can keep up. NLP and AI-based tools can help speed the analysis of health records and medical research papers, making better-informed medical decisions possible, or assisting in the detection or even prevention of medical conditions.
  • Insurance : NLP can analyze claims to look for patterns that can identify areas of concern and find inefficiencies in claims processing—leading to greater optimization of processing and employee efforts.
  • Legal : Almost any legal case might require reviewing mounds of paperwork, background information and legal precedent. NLP can help automate legal discovery, assisting in the organization of information, speeding review and helping ensure that all relevant details are captured for consideration.
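To illustrate the text-classification capability behind several of these applications, such as spam detection, here is a minimal sketch using scikit-learn (an assumption for illustration) on a tiny, made-up training set; production systems train on far larger labeled corpora:

```python
# pip install scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data: 1 = spam, 0 = legitimate.
texts = [
    "URGENT wire transfer needed claim your prize now",
    "You have won a free reward click immediately",
    "Meeting moved to 3pm, agenda attached",
    "Can you review the draft report before Friday?",
]
labels = [1, 1, 0, 0]

# Vectorize the text (TF-IDF), then fit a Naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["Claim your free prize, urgent!"]))  # -> [1] (spam)
```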

Python and the Natural Language Toolkit (NLTK)

The Python programming language provides a wide range of tools and libraries for performing specific NLP tasks. Many of these NLP tools are in the Natural Language Toolkit , or NLTK, an open-source collection of libraries, programs and education resources for building NLP programs.

The NLTK includes libraries for many NLP tasks and subtasks, such as sentence parsing , word segmentation , stemming and lemmatization (methods of trimming words down to their roots), and tokenization (for breaking phrases, sentences, paragraphs and passages into tokens that help the computer better understand the text). It also includes libraries for implementing capabilities such as semantic reasoning: the ability to reach logical conclusions based on facts extracted from text. Using NLTK, organizations can see the product of part-of-speech tagging. Tagging words might not seem complicated, but because words can take on different meanings depending on where they are used, the process is harder than it looks.
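Here is a minimal sketch of the tokenization, stemming and lemmatization steps just described, using NLTK (the download calls fetch the tokenizer models and WordNet data on first run):

```python
import nltk
for pkg in ("punkt", "wordnet", "omw-1.4"):
    nltk.download(pkg, quiet=True)
from nltk.stem import PorterStemmer, WordNetLemmatizer

tokens = nltk.word_tokenize("The studies were better than running batches.")
print(tokens)

stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
# Stemming trims suffixes crudely; lemmatization maps to dictionary forms.
print([stemmer.stem(t) for t in tokens])        # 'studies' -> 'studi'
print(lemmatizer.lemmatize("studies"))          # -> 'study'
print(lemmatizer.lemmatize("better", pos="a"))  # -> 'good'
```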

Generative AI platforms

Organizations can infuse the power of NLP into their digital solutions by leveraging user-friendly generative AI platforms such as IBM Watson NLP Library for Embed , a containerized library designed to empower IBM partners with greater AI capabilities. Developers can access it and integrate it into their apps in the environment of their choice to create enterprise-ready solutions with robust AI models, extensive language coverage and scalable container orchestration.

More options include IBM® watsonx.ai™ AI studio , which offers multiple options for crafting model configurations that support a range of NLP tasks, including question answering, content generation and summarization, text classification and extraction. Integrations can also enable more NLP capabilities. For example, with watsonx and Hugging Face, AI builders can use pretrained models to support a range of NLP tasks.


Apple Intelligence Preview


AI for the rest of us.



Built into your iPhone, iPad, and Mac to help you write, express yourself, and get things done effortlessly.

Draws on your personal context while setting a brand-new standard for privacy in AI.


Write with intelligent new tools. Everywhere words matter.

Apple Intelligence powers new Writing Tools, which help you find just the right words virtually everywhere you write. With enhanced language capabilities, you can summarize an entire lecture in seconds, get the short version of a long group thread, and minimize unnecessary distractions with prioritized notifications.


Explore new features for writing, focus, and communication.


Transform how you communicate using intelligent Writing Tools that can proofread your text, rewrite different versions until the tone and wording are just right, and summarize selected text with a tap. Writing Tools are available nearly everywhere you write, including third-party apps.


Priority notifications appear at the top of the stack, letting you know what to pay attention to at a glance. And notifications are summarized, so you can scan them faster.


Priority messages in Mail elevate time-sensitive messages to the top of your inbox — like an invitation that has a deadline today or a check-in reminder for your flight this afternoon.


Tap to reveal a summary of a long email in the Mail app and cut to the chase. You can also view summaries of email right from your inbox.


Just hit record in the Notes or Phone apps to capture audio recordings and transcripts. Apple Intelligence generates summaries of your transcripts, so you can get to the most important information at a glance.


Reduce Interruptions is an all-new Focus that understands the content of your notifications and shows you the ones that might need immediate attention, like a text about picking up your child from daycare later today.


Use a Smart Reply in Mail to quickly draft an email response with all the right details. Apple Intelligence can identify questions you were asked in an email and offer relevant selections to include in your response. With a few taps you’re ready to send a reply with key questions answered.

Delightful images created just for you.

Apple Intelligence enables delightful new ways to express yourself visually. Create fun, original images and brand-new Genmoji that are truly personal to you. Turn a rough sketch into a related image that complements your notes with Image Wand. And make a custom memory movie based on the description you provide.


Create expressive images, unique Genmoji, and custom memory movies.


Produce fun, original images in seconds with the Image Playground experience right in your apps. Create an entirely new image based on a description, suggested concepts, and even a person from your Photos library. You can easily adjust the style and make changes to match a Messages thread, your Freeform board, or a slide in Keynote.


Experiment with different concepts and try out image styles like animation, illustration, and sketch in the dedicated Image Playground app . Create custom images to share with friends in other apps or on social media.


Make a brand-new Genmoji right in the keyboard to match any conversation. Provide a description to see a preview, and adjust your description until it’s perfect. You can even pick someone from your Photos library and create a Genmoji that looks like them.


Image Wand can transform your rough sketch into a related image in the Notes app. Use your finger or Apple Pencil to draw a circle around your sketch, and Image Wand will analyze the content around it to produce a complementary visual. You can even circle an empty space, and Image Wand will use the surrounding context to create a picture.


Create a custom memory movie of the story you want to see, right in Photos. Enter a description, and Apple Intelligence finds the best photos and videos that match. It then crafts a storyline with unique chapters based on themes it identifies and arranges your photos into a movie with its own narrative arc.


Search for photos and videos in the Photos app simply by describing what you’re looking for. Apple Intelligence can even find a particular moment in a video clip that fits your search description and take you right to it.


Remove distractions in your photos with the Clean Up tool in the Photos app. Apple Intelligence identifies background objects so you can remove them with a tap and perfect your shot — while staying true to the original image.

The start of a new era for Siri.

Siri draws on Apple Intelligence for all-new superpowers. With an all-new design, richer language understanding, and the ability to type to Siri whenever it’s convenient for you, communicating with Siri is more natural than ever. Equipped with awareness of your personal context, the ability to take action in and across apps, and product knowledge about your devices’ features and settings, Siri will be able to assist you like never before.


Discover an even more capable, integrated, personal Siri.


Siri has an all-new design that’s even more deeply integrated into the system experience, with an elegant, glowing light that wraps around the edge of your screen.


With a double tap on the bottom of your iPhone or iPad screen, you can type to Siri from anywhere in the system when you don’t want to speak out loud.


Tap into the expansive product knowledge Siri has about your devices’ features and settings. You can ask questions when you’re learning how to do something new on your iPhone, iPad, and Mac, and Siri can give you step-by-step directions in a flash.

Siri, set an alarm for — oh wait no, set a timer for 10 minutes. Actually, make that 5.

Richer language understanding and an enhanced voice make communicating with Siri even more natural. And when you refer to something you mentioned in a previous request, like the location of a calendar event you just created, and ask ”What will the weather be like there?” Siri knows what you’re talking about.


Apple Intelligence empowers Siri with onscreen awareness , so it can understand and take action with things on your screen. If a friend texts you their new address, you can say “Add this address to their contact card,” and Siri will take care of it.


Awareness of your personal context enables Siri to help you in ways that are unique to you. Can’t remember if a friend shared that recipe with you in a note, a text, or an email? Need your passport number while booking a flight? Siri can use its knowledge of the information on your device to help find what you’re looking for, without compromising your privacy.


Seamlessly take action in and across apps with Siri. You can make a request like “Send the email I drafted to April and Lilly” and Siri knows which email you’re referencing and which app it’s in. And Siri can take actions across apps, so after you ask Siri to enhance a photo for you by saying “Make this photo pop,” you can ask Siri to drop it in a specific note in the Notes app — without lifting a finger.

Great powers come with great privacy.

Apple Intelligence is designed to protect your privacy at every step. It’s integrated into the core of your iPhone, iPad, and Mac through on-device processing. So it’s aware of your personal information without collecting your personal information. And with groundbreaking Private Cloud Compute, Apple Intelligence can draw on larger server-based models, running on Apple silicon, to handle more complex requests for you while protecting your privacy.

Private Cloud Compute

  • Your data is never stored
  • Used only for your requests
  • Verifiable privacy promise


ChatGPT, seamlessly integrated.

With ChatGPT from OpenAI integrated into Siri and Writing Tools, you get even more expertise when it might be helpful for you — no need to jump between tools. Siri can tap into ChatGPT for certain requests, including questions about photos or documents. And with Compose in Writing Tools, you can create and illustrate original content from scratch.

You control when ChatGPT is used and will be asked before any of your information is shared. Anyone can access ChatGPT for free, without creating an account. ChatGPT subscribers can connect accounts to access paid features within these experiences.


New possibilities for your favorite apps.

New App Intents, APIs, and frameworks make it incredibly easy for developers to integrate system-level features like Siri, Writing Tools, and Image Playground into your favorite apps.

Learn more about developing for Apple Intelligence

Apple Intelligence is compatible with these devices.

Apple Intelligence is free to use and will initially be available in U.S. English. Coming in beta this fall. *

  • iPhone 15 Pro Max (A17 Pro)
  • iPhone 15 Pro (A17 Pro)
  • iPad Pro (M1 and later)
  • iPad Air (M1 and later)
  • MacBook Air (M1 and later)
  • MacBook Pro (M1 and later)
  • iMac (M1 and later)
  • Mac mini (M1 and later)
  • Mac Studio (M1 Max and later)
  • Mac Pro (M2 Ultra)


Trustworthy AI

Our trust in technology relies on understanding how it works. It’s important to understand why AI makes the decisions it does. We’re developing tools to make AI more explainable, fair, robust, private, and transparent.

IBM Research groups this work into several areas:
  • AI Testing: We’re designing tools to help ensure that AI systems are trustworthy, reliable and can optimize business processes.
  • Adversarial Robustness and Privacy: We’re making tools to protect AI and certify its robustness, and helping AI systems adhere to privacy requirements.
  • Explainable AI: We’re creating tools to help AI systems explain why they made the decisions they did.
  • Fairness, Accountability, Transparency: We’re developing technologies to increase the end-to-end transparency and fairness of AI systems.
  • Trustworthy Generation: We’re developing theoretical and algorithmic frameworks for generative AI to accelerate future scientific discoveries.
  • Uncertainty Quantification: We’re developing ways for AI to communicate when it’s unsure of a decision across the AI application development lifecycle.


Science for Social Good

IBM Science for Social Good partners IBM Research scientists and engineers with academic fellows, subject matter experts from NGOs, public sector agencies, and social enterprises to tackle emerging societal challenges using science and technology.



Building trustworthy AI with Watson

Our research is regularly integrated into Watson solutions to make IBM’s AI for business more transparent, explainable, robust, private, and fair.


What is generative AI? Artificial intelligence that creates

Generative AI models can carry on conversations, answer questions, write stories, produce source code, and create images and videos of almost any description. Here’s how generative AI works, how it’s being used, and why it’s more limited than you might think.

Josh Fruhlinger, Contributing writer, InfoWorld


In this article:

  • The emergence of generative AI
  • How does generative AI work?
  • What is an AI model?
  • Is generative AI sentient? Testing the limits of computer intelligence
  • Why does AI art have too many fingers?
  • Potential negative impacts of generative AI
  • Use cases for generative AI

Generative AI is a kind of artificial intelligence that creates new content, including text, images, audio, and video, based on patterns it has learned from existing content. Today’s generative AI models have been trained on enormous volumes of data using deep learning , or deep neural networks, and they can carry on conversations, answer questions, write stories, produce source code, and create images and videos of any description, all based on brief text inputs or “prompts.”

Generative AI is called generative because the AI creates something that didn’t previously exist. That’s what makes it different from discriminative AI , which draws distinctions between different kinds of input. To say it differently, discriminative AI tries to answer a question like “Is this image a drawing of a rabbit or a lion?” whereas generative AI responds to prompts like “Draw me a picture of a lion and a rabbit sitting next to each other.”

This article introduces you to generative AI and its uses with popular models like ChatGPT and DALL-E . We’ll also consider the limitations of the technology, including why “too many fingers” has become a dead giveaway for artificially generated art.

Generative AI has been around for years, arguably since  ELIZA , a chatbot that simulates talking to a therapist, was developed at MIT in 1966. But years of work on AI and machine learning have recently come to fruition with the release of new generative AI systems. You’ve almost certainly heard about ChatGPT , a text-based AI chatbot that produces remarkably human-like prose.  DALL-E  and  Stable Diffusion  have also drawn attention for their ability to create vibrant and realistic images based on text prompts.

Output from these systems is so uncanny that it has many people asking philosophical questions about the nature of consciousness—and worrying about the economic impact of generative AI on human jobs. But while all of these artificial intelligence creations are undeniably big news, there is arguably less going on beneath the surface than some may assume. We’ll get to some of those big-picture questions in a moment. First, let’s look at what’s going on under the hood.

Generative AI uses machine learning to process a huge amount of visual or textual data, much of which is scraped from the internet, and then determines what things are most likely to appear near other things. Much of the programming work of generative AI goes into creating algorithms that can distinguish the “things” of interest to the AI’s creators—words and sentences in the case of chatbots like ChatGPT , or visual elements for DALL-E . But fundamentally, generative AI creates its output by assessing an enormous corpus of data, then responding to prompts with something that falls within the realm of probability as determined by that corpus.

Autocomplete—when your cell phone or Gmail suggests what the remainder of the word or sentence you’re typing might be—is a low-level form of generative AI. ChatGPT and DALL-E just take the idea to significantly more advanced heights.
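To see how little machinery a low-level version needs, here is a toy bigram “autocomplete” in plain Python (a deliberately simplistic sketch; real systems model much longer contexts and vastly larger corpora):

```python
import random
from collections import defaultdict

corpus = ("the spirit is willing but the flesh is weak and "
          "the spirit is strong").split()

# Record which words follow which: a bigram model of the corpus.
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def autocomplete(word, length=6):
    out = [word]
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:
            break
        out.append(random.choice(candidates))  # sample a plausible next word
    return " ".join(out)

print(autocomplete("the"))
```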

ChatGPT and DALL-E are interfaces to underlying AI functionality that is known in AI terms as a model. An AI model is a mathematical representation, implemented as an algorithm, that generates new data that will (hopefully) resemble a set of data you already have on hand. You’ll sometimes see ChatGPT and DALL-E themselves referred to as models; strictly speaking this is incorrect, as ChatGPT is a chatbot that gives users access to several different versions of the underlying GPT model. But in practice, these interfaces are how most people will interact with the models, so don’t be surprised to see the terms used interchangeably.

AI developers assemble a corpus of data of the type that they want their models to generate. This corpus is known as the model’s training set, and the process of developing the model is called training . The GPT models, for instance, were trained on a huge corpus of text scraped from the internet, and the result is that you can feed them natural language queries and they will respond in idiomatic English (or any number of other languages, depending on the input).

AI models treat different characteristics of the data in their training sets as vectors —mathematical structures made up of multiple numbers. Much of the secret sauce underlying these models is their ability to translate real-world information into vectors in a meaningful way, and to determine which vectors are similar to one another in a way that will allow the model to generate output that is similar to, but not identical to, its training set.
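A sketch of the idea with made-up three-dimensional vectors (real models use hundreds or thousands of dimensions; the values below are purely illustrative):

```python
import numpy as np

def cosine_similarity(u, v):
    # 1.0 = same direction, 0.0 = unrelated, -1.0 = opposite.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings: related words end up with similar vectors.
vec = {
    "king":   np.array([0.90, 0.80, 0.10]),
    "queen":  np.array([0.88, 0.82, 0.15]),
    "banana": np.array([0.10, 0.05, 0.90]),
}
print(cosine_similarity(vec["king"], vec["queen"]))   # high (close to 1.0)
print(cosine_similarity(vec["king"], vec["banana"]))  # low
```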

There are a number of different types of AI models out there, but keep in mind that the various categories are not necessarily mutually exclusive. Some models can fit into more than one category.

Probably the AI model type receiving the most public attention today is the large language model, or LLM. LLMs are based on the concept of a transformer, first introduced in “ Attention Is All You Need ,” a 2017 paper from Google researchers. A transformer derives meaning from long sequences of text to understand how different words or semantic components might be related to one another, then determines how likely they are to occur in proximity to one another. The GPT models are LLMs, and the T stands for transformer. These transformers are run unsupervised on a vast corpus of natural language text in a process called pretraining (that’s the P in GPT), before being fine-tuned by human beings interacting with the model.
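The core of that self-attention step fits in a few lines. The following NumPy sketch implements the scaled dot-product attention described in the paper (the random weights and shapes are illustrative; real transformers stack many such layers with learned weights):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv         # project tokens to queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how strongly each token attends to the others
    return softmax(scores) @ V               # weighted mix of the value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                  # 4 tokens, 8-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (4, 8): one updated vector per token
```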

Diffusion is commonly used in generative AI models that produce images or video. In the diffusion process, the model adds noise —randomness, basically—to an image, then slowly removes it iteratively, all the while checking against its training set to attempt to match semantically similar images. Diffusion is at the core of AI models that perform text-to-image magic like Stable Diffusion and DALL-E.
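Here is a sketch of just the forward (noising) half of that process, using NumPy and a linear noise schedule (the values are illustrative; a real system pairs this with a trained network that learns to undo each step):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=(16, 16))         # stand-in for an image
betas = np.linspace(1e-4, 0.02, 1000)  # noise schedule over 1,000 steps

for beta in betas:
    # Each step slightly shrinks the signal and mixes in Gaussian noise.
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.normal(size=x.shape)

# After all steps, x is close to pure noise; generation runs the learned
# reverse process, turning noise back into an image step by step.
print(x.std())
```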

A  generative adversarial network , or GAN, is based on a type of reinforcement learning , in which two algorithms compete against one another. One generates text or images based on probabilities derived from a big data set. The other—a discriminative AI—assesses whether that output is real or AI-generated. The generative AI repeatedly tries to “trick” the discriminative AI, automatically adapting to favor outcomes that are successful. Once the generative AI consistently “wins” this competition, the discriminative AI gets fine-tuned by humans and the process begins anew.
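A compact PyTorch sketch of that adversarial loop on toy one-dimensional data (an illustration of the training dynamic only, not a production GAN; the architecture and hyperparameters are arbitrary):

```python
# pip install torch
import torch
import torch.nn as nn

# Generator maps random noise to fake samples; discriminator scores realness.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 1.25 + 4.0  # "real" data drawn from N(4, 1.25)
    fake = G(torch.randn(64, 8))            # generator's current forgeries

    # Discriminator learns to label real as 1 and fake as 0.
    opt_d.zero_grad()
    loss_d = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    loss_d.backward()
    opt_d.step()

    # Generator learns to make the discriminator output 1 on its fakes.
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(64, 1))
    loss_g.backward()
    opt_g.step()
```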

One of the most important things to keep in mind here is that, while there is human intervention in the training process, most of the learning and adapting happens automatically. Many, many iterations are required to get the models to the point where they produce interesting results, so automation is essential. The process is quite computationally intensive, and much of the recent explosion in AI capabilities has been driven by advances in GPU computing power and techniques for implementing parallel processing on these chips .

The mathematics and coding that go into creating and training generative AI models are quite complex, and well beyond the scope of this article. But if you interact with the models that are the end result of this process, the experience can be decidedly uncanny. You can get DALL-E to produce things that look like real works of art. You can have conversations with ChatGPT that feel like a conversation with another human. Have researchers truly created a thinking machine?

Chris Phipps, a former IBM natural language processing lead who worked on Watson AI products, says no. He describes ChatGPT as a “very good prediction machine.”

It’s very good at predicting what humans will find coherent. It’s not always coherent (it mostly is) but that’s not because ChatGPT “understands.” It’s the opposite: humans who consume the output are really good at making any implicit assumption we need in order to make the output make sense.

Phipps, who’s also a comedy performer, draws a comparison to a common improv game called Mind Meld.

Two people each think of a word, then say it aloud simultaneously—you might say “boot” and I say “tree.” We came up with those words completely independently and at first, they had nothing to do with each other. The next two participants take those two words and try to come up with something they have in common and say that aloud at the same time. The game continues until two participants say the same word.
Maybe two people both say “lumberjack.” It seems like magic, but really it’s that we use our human brains to reason about the input (“boot” and “tree”) and find a connection. We do the work of understanding, not the machine. There’s a lot more of that going on with ChatGPT and DALL-E than people are admitting. ChatGPT can write a story, but we humans do a lot of work to make it make sense.

Certain prompts that we can give to these AI models will make Phipps’ point fairly evident. For instance, consider the riddle “What weighs more, a pound of lead or a pound of feathers?” The answer, of course, is that they weigh the same (one pound), even though our instinct or common sense might tell us that the feathers are lighter.

ChatGPT will answer this riddle correctly, and you might assume it does so because it is a coldly logical computer that doesn’t have any “common sense” to trip it up. But that’s not what’s going on under the hood. ChatGPT isn’t logically reasoning out the answer; it’s just generating output based on its predictions of what should follow a question about a pound of feathers and a pound of lead. Since its training set includes a bunch of text explaining the riddle, it assembles a version of that correct answer.

However, if you ask ChatGPT whether two pounds of feathers are heavier than a pound of lead, it will confidently tell you they weigh the same amount, because that’s still the most likely output to a prompt about feathers and lead, based on its training set. It can be fun to tell the AI that it’s wrong and watch it flounder in response; I got it to apologize to me for its mistake and then suggest that two pounds of feathers weigh four times as much as a pound of lead.


  • Published: 26 April 2023

Science in the age of large language models

  • Abeba Birhane
  • Atoosa Kasirzadeh
  • David Leslie
  • Sandra Wachter

Nature Reviews Physics 5, 277–280 (2023)

Rapid advances in the capabilities of large language models and the broad accessibility of tools powered by this technology have led to both excitement and concern regarding their use in science. Four experts in artificial intelligence ethics and policy discuss potential risks and call for careful consideration and responsible usage to ensure that good scientific practices and trust in science are not compromised. 

Large language models (LLMs) are deep learning models with a huge number of parameters trained in an unsupervised way on large volumes of text. LLMs started to emerge around 2018, and since then there has been a sharp increase in the number of parameters and capabilities (for example, GPT-4, whose parameter count has not been disclosed but is widely believed to far exceed its predecessors’, can process both text and images). Discussions about the use and misuse of this technology in science erupted in late 2022, prompted by the sudden widespread access to LLM tools that can generate and edit scientific text or can answer scientific questions. Some of the open questions fuelling these conversations are summarized in Box 1.

Box 1 Open questions

Accuracy, reliability and accountability

Hallucination: How can scientists methodically determine when large language models (LLMs) are ‘hallucinating’ or generating inaccurate and fantastical content? How can scientists best assess and work around these tendencies to generate unreliable or non-factual outputs?

Responsiveness to change: If LLMs fail to extrapolate effectively when world knowledge changes or data distributions drift over time, how can scientists safeguard their accuracy, reliability and responsiveness to change?

Sparse phenomena: If LLMs struggle to reliably generate accurate content for infrequent or sparsely studied phenomena, how do scientists draw on LLMs to inform insights about anomalies, new discoveries or unprecedented observations?

Research integrity: What is plagiarism and authorial misrepresentation in the age of LLMs? How should scientists be held accountable for plagiarism and authorial misrepresentation? What checks should be put in place to establish the authenticity of scientific publications?

Quantifying the degree of LLM assistance in writing: What is acceptable and what is not?

Accountability: Who is responsible for the integrity of scientific research and the content of scientific papers aided by LLMs? Who is accountable?

Explainability, missingness and bias

Opacity: How can opaque LLMs justifiably be integrated into the scientific method?

Explainability: How can the original sources be traced back? How can scientists, who draw on opaque LLMs, clarify the intended meaning or nuances of the texts based on which such models render their outputs? Does a lack of interpretability undermine the justifiability of relying on inferences drawn from LLMs?

Missingness: If scientific papers represent the final product of a research process rather than a full picture of the complex choices, practices and contexts that underlie the research (that is, not all research is documented, in particular failures and negative results), how can the inferences generated by LLMs (which only process the information in scientific articles, textbooks, websites and so on) account for the missingness that derives from the limitations of such a ‘tip-of-the-iceberg’ view?

Selection: How can LLMs account for outdated or incorrect knowledge in the published literature?

Bias: How can potential biases in the training data sets of LLMs — and other social, statistical and cognitive biases that may arise in their design, development and deployment — be most effectively assessed? How will LLMs enhance existing and introduce new biases or help remove existing ones?

Scientific ingenuity and discovery

Paradigm shifts: How can LLMs accommodate future ‘paradigm shifts’ in scientific understanding? Could LLMs (which generate insights by identifying patterns emergent from past research — potentially engendering paradigm lock-in and stifling novelty) function to tamp down possibilities for new scientific directions?

Outliers: Will outliers (radical new ideas, unconventional views and unusual writing styles) be lost, overlooked or averaged out?

Scientific creativity: What is the role of the scientist in the age of LLMs? What is the role of scientific creativity?

Deskilling: Will overreliance on LLMs to produce arguments and text risk diminishing or weakening the writing and critical thinking skills and insight of researchers?

Science assessment and peer review

Assessing quality: How do we assess high-quality science in the age of LLMs? What role should the values of reproducibility/replicability and transparency play?

Ethos of science: How do we trust science in the age of LLMs? How, if at all, do the values of objectivity, rigour and accountability change with the scaled integration of LLMs into scientific practices?

What are the wider concerns?

Abeba Birhane: In a matter of months, LLMs have come to captivate the scientific community, general public, journalists and legislators. These systems are often presented as game-changers that will radically affect our lives, from the way we search for information to the way we create art and do science. As hype around the capabilities of these systems continues to grow, many claims are made without evidence; the burden of disproving these claims is put on critics. Despite the concrete negative consequences of these systems on actual people 1 — often on those at the margins of society — questions of responsibility, accountability, exploited labour and otherwise critical inquiries rarely accompany discussion of LLMs. Instead, discussions are dominated by abstract and hypothetical speculations around their intelligence, consciousness, moral status and capability for understanding, all at the cost of questions of responsibility, underlying exploited labour and the uneven distribution of harm and benefit from these systems.

Sandra Wachter: Generative AI (GenAI, deep learning models that can output data beyond text, such as images or audio), more broadly, is a potentially very disruptive technology that could impact many areas such as education, media, art and scientific research. The disruption of both the production and consumption of science and research is particularly concerning because domain expertise is necessary to detect when GenAI has ‘hallucinated’ or invented falsehoods and confidently passed them off as the truth.

Disruptive technologies have always inspired great hopes and fears. The printing press was feared to lead to the moral erosion of society, fast-moving automobiles were assumed to harm the internal organs of people and the telephone was said to destroy family values. Many of these fears were ultimately unfounded. But other dangers did materialize that were not even on the radar of developers, scholars and policymakers at the time, such as the significant impact of personal automobiles on the environment. Reliably predicting the social and economic impacts, risks and development pathways of disruptive technologies is difficult. This is not to say that we should stop horizon scanning, but rather that we need to periodically re-evaluate the risks and benefits of technologies.

Among these risks, the environmental impact of these technologies urgently needs to be addressed. Regardless of their utility, we need to keep in mind that they have a significant carbon footprint 2 . As opposed to when the automobile first appeared, we now know the environmental costs society is forced to bear. As scientists, and as a society, we must not look away from how the use of artificial intelligence (AI) technologies can exacerbate the climate crisis.

What are the specific concerns for science?

David Leslie: LLMs, and more broadly foundation models and GenAI, will undoubtedly play a significant role in the future of scientific discovery. Researchers, however, must proceed with caution, engaging the affordances provided by these technologies with the same kinds of epistemic humility, deflationary scepticism and disciplined adherence to the scientific method that have functioned as preconditions of modern scientific advancement since the dawn of the seventeenth-century Baconian and Newtonian revolutions. Amidst the hype surrounding LLMs, scientists must acknowledge the social and interpretative character of scientific discovery and manage expectations regarding the contributions of LLMs to the advancement of scientific understanding.

LLMs generate predictions of the ‘statistically likely continuations of word sequences’ 3 based on brute-force iterative training on massive corpuses of digital text data. As sequence predictors, these models draw on the underlying statistical distribution of previously generated text to stitch together vectorized symbol strings based on the probabilities of their co-occurrence 4 . They therefore lack the communicatively embodied and relational functionings that are a prerequisite of scientific meaning-making, in the barest sense. These systems do not ‘inhabit’ the lived reality in which speaking and interacting members of the human community together build and reproduce a common world of shared experience, using the agency of language to convey intention, to assess and establish truth through the exchange of reasons and to cope with the myriad problems of existence. In this way, LLMs, foundation models and GenAI technologies lack the basic capacities for intersubjectivity, semantics and ontology that are preconditions for the kind of collaborative world-making that allows scientists to theorize, understand, innovate and discover. Despite their impressive feats of rhetorical prowess, systems such as ChatGPT can neither navigate an evolving space of scientific reasons nor partake in the trials and triumphs of scientific meaning-making. Their subsidiary role in scientific discovery should hence be understood taking this limitation into account.

Atoosa Kasirzadeh: I point to three significant concerns regarding the use of LLMs in scientific contexts. First, LLMs may not capture nuanced value judgements implicit in scientific writings 5 . Although LLMs seem to provide useful general summaries of some scientific texts, for example, it is less clear whether they can capture the uncertainties, limitations and nuances of research that are obvious to the human scientist. Relying solely on LLMs for writing scientific summaries can result in oversimplified texts that overlook crucial value judgements and lead to misinterpretations of study results. We should, therefore, proceed with caution when using LLMs for scientific summarization. Additional work is needed to ensure that LLMs accurately communicate the value judgements underlying scientific practice. This work should include designing appropriate evaluation benchmarks to assess the accuracy of LLMs in communicating these value judgements.

Second, LLMs have been known to generate non-existent and false content — a phenomenon that has been dubbed ‘hallucination’. For instance, Meta’s Galactica, an LLM that was initially designed to reason about scientific knowledge, was reported to exhibit significant flaws such as reproducing biases and presenting falsehoods as facts 6 and was shut down after only 3 days of public API access. Therefore, overreliance on LLMs for tasks such as writing literature reviews should be avoided. Or at least the output should be very carefully fact-checked.

Third, the use of LLMs in the peer-review process can endanger trust in it. LLMs used for writing peer-review reports run the risk of misinterpreting the submitted scientific article, be it by a loss of crucial information or by a hallucination in the aforementioned sense. And whereas one can hold human reviewers responsible, it is a nontrivial question how to hold LLMs responsible — in part owing to their opaque nature. It seems like a responsibility gap is lurking here.

Who bears the responsibility?

AB: As we rush to deploy LLMs into scientific practices, it is important to remember that science is a human enterprise and LLMs are tools — albeit impressive at predicting the next word in a sequence based on previously ‘seen’ words — with limitations such as brittleness (susceptibility to catastrophic failure), unreliability and the fabrication of seemingly ‘scientific’ nonsense. Even if these limitations can, by some miracle, be solved, it would be a grave error to treat LLMs as scientists that can produce science. Knowledge implies responsibility and is never detached from the scientist that produces it. Science never emerges in a historical, social or cultural vacuum and builds on a vast edifice of well-established knowledge. We embark on a scientific journey to build on this edifice, to react and to debunk it, in anticipation of responses and reactions. We take responsibility for our work and defend it when criticized or retract it when proven wrong. What is conceived as science can be dependent on ideologies of the time. For example, at its peak during the early twentieth century, eugenics was mainstream science. Most importantly, as science is never done from a ‘view from nowhere’, our questions, methodologies, analysis and interpretations of our findings are influenced by our interests, motivations, objectives and perspectives. LLMs, as tools, have none of these. As tools, LLMs, with close and constant vetting by the scientist, can aid scientific creativity and writing 7 . However, to conceive of LLMs as scientists or authors themselves is to misunderstand both science and LLMs and to evade responsibility and accountability.

What should scientists do?

SW: We are currently at a crucial point with GenAI. Its possibilities seem limitless, and yet we are still early enough in its lifecycle to transform its future pathway. Science is fast paced and highly competitive. The pressure to publish can be overwhelming. A technology that can save time in conducting research and increasing output can be very tempting. But if GenAI is used automatically and without critical oversight, it may fundamentally undermine the foundations of ‘good’ science.

At this stage, we need to think about how to responsibly integrate GenAI into science. Scientists have an ethical responsibility to society to produce knowledge that follows the highest possible standards. Climate change and COVID-19 are just two examples of the overwhelming importance of reliable science for driving policy and societal action. Researchers need to collaborate with journals, publishers, conference organizers, the press and the wider scientific community to develop best practices, standards and detection methods to ensure that the benefits of GenAI can be realized without fundamentally undermining science and its role in society.

DL: Scientists must view LLMs and GenAI technologies as exploratory tools that bolster responsible, mission-driven and society-led research practices and that support the advancement of scientific discovery and understanding. To paraphrase the words of the economist Zvi Griliches [8], the expanding use of these AI technologies in scientific research is the ‘discovery of a method of discovery’ — the invention of a new set of research tools that support and enable new pathways of insight, innovation and ingenuity in the physical and life sciences.

Starting from such a tool-based understanding, researchers must view the role of these technologies in scientific discovery through a chastening but non-reductive lens, deploying them as computational vehicles of observation and analysis to probe properties of complex physical and biological systems, and patterns in high-dimensional biophysical data, that would otherwise be inaccessible to human-scale examination, experiment and inference. But the path to discovery should not be treated in a strictly instrumentalist way; scientists should not see these complex models as mere oracles. Rather, their results and inner workings should be seen as springboards for scientific reflection and creativity that can play a constituent role in guiding the broader, socially embodied pursuit of the expansion and refinement of scientific understanding [9].

In addition, the AI-generated outputs and insights of these models must be regarded as both interpreter-dependent and theory-laden. The construction and deployment of LLMs and GenAI tools, and their application in scientific exploration, must be seen as interpretive accomplishments that are embedded in what philosophers of science have called ‘contexts of discovery’ [10,11]. These are contexts of scientific sense-making that involve real-life processes of communication carried out cooperatively by members of an unbounded human community of inquiry, interpretation and reason-giving.

It is important for the scientific community to closely monitor these developments and to urge AI research laboratories, such as OpenAI, to prioritize research on more reliable detectors of LLM-generated text. Furthermore, it is crucial that the scientific community continues to closely follow the development and use of LLMs and adapts its policies and practices, in consultation with AI ethics and safety experts, to ensure that the use of LLMs enhances, rather than undermines, the rigor and reproducibility of scientific research. Finally, the scientific community must encourage more interdisciplinary discussions with experts from academia and industry to navigate the implications of LLMs for scientific knowledge.

AK: Until more robust and reliable safeguards are in place, the scientific community should take a timely and firm stance to avoid any overreliance on LLMs and to foster practices of responsible science in the age of LLMs. Otherwise, we risk jeopardizing the credibility of scientific knowledge. An initial step is to design LLM policies in a realistic way; for example, to identify and ban papers that primarily rely on LLMs, a policy already adopted at the International Conference on Machine Learning (ICML) 2023 and likely to be enforced more widely. However, identifying LLM-generated text is challenging, and the development of accurate detection tools is an ongoing area of research. Recent studies have raised concerns about the reliability of these methods in accurately distinguishing between LLM-generated and human-written text [12].
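
To illustrate why reliable detection is hard, here is a minimal sketch of the kind of signal many detectors lean on: the perplexity that a reference language model assigns to a passage. This is an illustration under stated assumptions, not a method endorsed by the authors; it assumes the `torch` and `transformers` packages, uses the small public GPT-2 model as the scorer, and the threshold is arbitrary. Paraphrasing can shift such scores, which is one reason the cited work doubts these detectors [12].

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Small public model used purely as a scoring reference.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text):
    """Perplexity of `text` under GPT-2: exp of the mean token loss."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

# Naive rule: very low perplexity is treated as a hint of machine text.
# The threshold is arbitrary, and paraphrasing attacks defeat it.
THRESHOLD = 25.0

for sample in [
    "The mitochondrion is the powerhouse of the cell.",
    "Results, weirdly, hinged on a budgie-adjacent calibration quirk.",
]:
    ppl = perplexity(sample)
    flag = "possibly machine-generated" if ppl < THRESHOLD else "no signal"
    print(f"{ppl:7.1f}  {flag}")
```

Note that a human writing in a plain, predictable style can trip the same rule, while an LLM output run through a paraphraser can evade it, which is why such scores should inform policies rather than decide them.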

In addition, scientists must also be more vocal about the potential negative impacts of this technology on the scientific community. By raising awareness and demanding further research and development of safeguards, the scientific community can actively contribute to the responsible and ethical use of LLMs. This includes promoting interdisciplinary collaboration and sharing knowledge about the potential risks and benefits of LLMs in various fields.

References

1. Weidinger, L. et al. Taxonomy of risks posed by language models. In FAccT ’22: 2022 ACM Conference on Fairness, Accountability, and Transparency 214–229 (ACM, 2022).

2. Bender, E. M. et al. On the dangers of stochastic parrots: can language models be too big? In FAccT ’21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency 610–623 (ACM, 2021).

3. Shanahan, M. Talking about large language models. Preprint at https://doi.org/10.48550/arXiv.2212.03551 (2022).

4. Bender, E. M. & Koller, A. Climbing towards NLU: on meaning, form, and understanding in the age of data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 5185–5198 (ACL, 2020).

5. Kasirzadeh, A. & Gabriel, I. In conversation with artificial intelligence: aligning language models with human values. Philos. Technol. 36, 27 (2023).

6. Heaven, W. D. Why Meta’s latest large language model survived only three days online. MIT Technology Review https://www.technologyreview.com/2022/11/18/1063487/meta-large-language-model-ai-only-survived-three-days-gpt-3-science/ (2022).

7. Owens, B. How Nature readers are using ChatGPT. Nature https://www.nature.com/articles/d41586-023-00500-8 (2023).

8. Griliches, Z. Hybrid corn: an exploration in the economics of technological change. Econometrica 25, 501–522 (1957).

9. Krenn, M. et al. On scientific understanding with artificial intelligence. Nat. Rev. Phys. 4, 761–769 (2022).

10. Reichenbach, H. Experience and prediction: an analysis of the foundations and the structure of knowledge. J. Philos. 35, 270 (1938).

11. Kuhn, T. The Structure of Scientific Revolutions (University of Chicago Press, 2012).

12. Sadasivan, V. S. et al. Can AI-generated text be reliably detected? Preprint at https://doi.org/10.48550/arXiv.2303.11156 (2023).


Acknowledgements

The work of S.W. is supported by research funding from the Wellcome Trust (grant no. 223765/Z/21/Z), the Sloan Foundation (grant no. G-2021-16779), the Department of Health and Social Care (via the AI Lab at NHSx) and the Luminate Group, in support of the Trustworthiness Auditing for AI project and the Governance of Emerging Technologies research programme at the Oxford Internet Institute, University of Oxford.

Author information

Authors and affiliations

• Abeba Birhane: Mozilla Foundation and Trinity College Dublin, Dublin, Ireland

• Atoosa Kasirzadeh: The University of Edinburgh, Edinburgh, UK; The Alan Turing Institute, London, UK

• David Leslie: The Alan Turing Institute, London, UK; Queen Mary University of London, London, UK

• Sandra Wachter: University of Oxford, Oxford, UK

Contributions

A.B. is a cognitive scientist researching human behaviour, social systems and responsible and ethical AI. She is a Senior Fellow in Trustworthy AI at the Mozilla Foundation and an Adjunct Assistant Professor at Trinity College Dublin, Ireland.

A.K. is a philosopher and ethicist of science and emerging technologies, an applied mathematician and an engineer. Currently, she is a tenure-track assistant professor and a Chancellor’s Fellow in the Philosophy department and the Director of Research at the Centre for Technomoral Futures in the Futures Institute at the University of Edinburgh. Her recent work is focused on the implications of machine learning, in particular large language models and other models for science, society and humanity.

S.W. is Professor of Technology and Regulation at the Oxford Internet Institute, University of Oxford, where she researches the legal and ethical implications of AI, Big Data and robotics, as well as Internet and platform regulation. At the OII, she leads and coordinates the Governance of Emerging Technologies (GET) Research Programme, which investigates legal, ethical and technical aspects of AI, machine learning and other emerging technologies.

D.L. is Professor of Ethics, Technology and Society at Queen Mary University of London and the Director of Ethics and Responsible Innovation Research at The Alan Turing Institute. He is a philosopher and social theorist, whose research focuses on the ethics of emerging technologies, AI governance, data justice and the social and ethical impacts of AI, machine learning and data-driven innovations.

Corresponding authors

Correspondence to Abeba Birhane, Atoosa Kasirzadeh, David Leslie or Sandra Wachter.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Birhane, A., Kasirzadeh, A., Leslie, D. & Wachter, S. Science in the age of large language models. Nat. Rev. Phys. 5, 277–280 (2023). https://doi.org/10.1038/s42254-023-00581-4

Accepted: 24 March 2023 | Published: 26 April 2023 | Issue date: May 2023



