Strategies for overcoming language barriers in research

Allison Squires

1 Rory Meyers College of Nursing, New York University, New York City, New York

2 School of Medicine, New York University, New York City, New York

Tina Sadarangani

Simon Jones

3 Population Health, School of Medicine, New York University, New York City, New York

AS, TS and SJ: made substantial contributions to conception and design, or acquisition of data, or analysis and interpretation of data; were involved in drafting the manuscript or revising it critically for important intellectual content; gave final approval of the version to be published; and agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Aim

This paper seeks to describe best practices for conducting cross-language research with individuals who have a language barrier.

Design

Discussion paper.

Data Sources

Research methods papers addressing cross-language research issues published between 2000 and 2017.

Implications for Nursing

Rigorous cross-language research involves the appropriate use of interpreters during the research process, systematic planning for how to address the language barrier between participant and researcher and the use of reliably and validly translated survey instruments (when applicable). Biases rooted in those who enter data into “big data” systems may influence data quality and analytic approaches in large observational studies focused on linking patient language preference to health outcomes.

Cross-language research methods can help ensure that those individuals with language barriers have their voices contributing to the evidence informing healthcare practice and policies that shape health services implementation and financing. Understanding the inherent conscious and unconscious biases of those conducting research with this population and how this may emerge in research studies is also an important part of producing rigorous, reliable, and valid cross-language research.

  • This study synthesized methodological recommendations for cross-language research studies with the goal to improve the quality of future research and expand the evidence-base for clinical practice.
  • Clear methodological recommendations were generated that can improve research rigor and quality of cross-language qualitative and quantitative studies.
  • The recommendations generated here have the potential to have an impact on the health and well-being of migrants around the world.

1. INTRODUCTION

Global migration has reached unprecedented levels in the twenty-first century, with 3.3% of the world's population having migrated internationally and 740 million people having migrated within their own countries (International Organization for Migration, 2017). Medical tourism is also on the rise, and in those cases language concordance between nurses and their patients is not guaranteed (Kanchanachitra et al., 2011; Reitig & Squires, 2015). For nurses and other healthcare researchers, migration creates a common challenge for healthcare research: language barriers.

Individuals with language barriers present new opportunities and challenges for researchers seeking to strengthen the evidence-base for clinical nursing practice and education around the world. Research with this population is also critical for understanding health outcomes, how individuals who have moved countries access (or the barriers to) health services and developing and testing effective strategies for health literacy promotion to name a few. Research will also help ensure that individuals with language barriers do not face discrimination in their health systems and subsequently develop costly health disparities.

It is surprising, however, how many researchers do not minimize the threats to research rigor posed by language barriers with their subjects. Patient- and health services-focused research on language barriers has historically been limited to a small group of researchers globally, even as incentives for such research have appeared in many countries (Schwei et al., 2016). In the case of nursing, the literature has focused more on researching the language skills (or lack thereof) of internationally educated nurses than those of their patients (Allan & Westwood, 2016; Müller, 2016). A cursory search of PubMed and CINAHL reveals that publications linked to nurses about language barriers since the year 2000 number, after removal of duplicates, only 280 of 303 results. Approximately one third of those are practice-based papers not involving research, editorials, or opinions, which sit at the lowest level of the evidence-based practice pyramid. Whilst there may be more publications, inappropriate use of keywords may keep many hidden from systematic searches.

In this discussion paper, we draw from an international, interdisciplinary body of research that has explored and successfully addressed methodological challenges of qualitative and quantitative research involving language barriers in health care, known as cross-language research methods. We seek to highlight the key methodological implications of doing research involving language barriers by drawing from methodological developments in the literature from 2000 through 2017, a period representing unprecedented growth in scientific studies and cross-language methodological developments. Both qualitative and quantitative methodological implications are reviewed from selected studies.

2. BACKGROUND

This section provides an overview of two key areas associated with cross-language research in nursing: language barriers and interpreter types. It aims to provide the reader with a basic conceptual understanding of core linguistic principles involved with addressing language barriers in research.

2.1. Language barriers in nursing and health care

Language barriers have been part of nursing practice since the formal inception of the profession in the nineteenth century when Nightingale was caring for soldiers from across Europe during the Crimean War. In the twenty-first century with global migration rates at record levels, language barriers present multiple challenges for health systems delivery ( Bloemraad & Sheares, 2017 ; Czaika & de Haas, 2013 ).

Research is needed to devise the best, context-specific strategies for meeting the needs of patients with language barriers. Conducting research in a patient's preferred language offers the best opportunity to truly capture reliable and valid results representative of their experiences. A preferred language is the person's "language of the heart", the one they want to speak when they feel at their most vulnerable. The conduct of research when a language barrier is involved has two aspects. The first is understanding the linguistic competency and literacy of participants and planning the study around those factors. The second is addressing the language of health care and research itself, known as language for specific purposes (LSP).

First, to be able to communicate effectively with another person individuals need to have what linguists call discourse competence in a language. That means they can have a conversation with someone relatively easily and do not have to stop and look up words or phrases ( Danesi, 1996 ; Savignon, 1997 ; Squires, 2008 ). For example, immigrant children may have this level of language competence, but they usually lack a more sophisticated understanding of the language since most of the time, they do not receive formal education in their parents’ language. They will not have the vocabulary to speak “health care” either. They are known as “heritage” speakers of a language ( Montrul, 2010 ).

Health care, as we know, is its own language and fits the criteria of a language for a specific purpose ( Hull, 2016 ). LSP is the vernacular of the discipline. The professions all have their own language as do the different realms of the social sciences. When students study to become members of the discipline or profession, part of their socialization is learning the language. Therefore, effective communication with LSP depends on the person’s ability to translate not only the disciplinary vernacular, but also the standard language ( Hull, 2016 ).

We see “failures” of translating LSP with patients who speak our own language when we cannot improve their health literacy. In the case of language translation issues between patients and providers, for example, miscommunication related to translation increases the patient’s risk for hospital readmission, adverse events, and delays in care, to name a few ( De Gagne, Oh, So, & Kim, 2014 ; Dowsey, Broadhead, Stoney, & Choong, 2009 ; Durstenfeld, Ogedegbe, Katz, Park, & Blecker, 2016 ; Karliner, Kim, Meltzer, & Auerbach, 2010 ; Whittal & Lippke, 2016 ). Therefore, the same threats mistranslation poses to patients’ health and well-being will threaten the rigor of research studies involving translation.

2.2. An overview of types of healthcare interpreters and their potential research roles

Interpreters are an important part of mitigating threats to rigor in research. There is a difference, however, between an interpreter and a translator. An interpreter is a person who conducts "live" interpretation between two people. A translator is someone who translates text-based documents from the source language into the target language (Squires, 2008; Temple, 2002). Qualified interpreters and translators will have had their language skills formally evaluated by an independent source (Hull, 2016). The next sections focus on the roles of interpreters in research, with the roles of translators discussed specifically in the section on research implementation.

To begin, in the case of interpreters, they will play an important role in research data collection and potentially, analysis. Any interpreter contributing to a study ideally will have some experience facilitating research implementation, but that is not always possible ( Squires, 2008 ). There are five types of interpreters or services that can be used in research, each with pros and cons and budgetary implications.

The first type of interpreter we will discuss is the Dual Role Interpreter. This is usually a healthcare provider who has had their language proficiency formally evaluated by an independent source. They may have grown up speaking the language and continued studying it as they progressed in their education or alternatively, learned the language through intensive study or living and working abroad. In many countries, it is common for healthcare providers to speak multiple languages, especially when there are several official languages in a country. Again, it is important to remember that their level of language proficiency for healthcare language may vary.

The advantage of the dual role interpreter (especially if they also have research training) is that their contributions to the study will be informed by their experiences working with patients with language barriers. There is the potential for a more nuanced understanding of experiences or the challenges of measurement with specific populations with language barriers. A dual role interpreter, however, may bring their own set of biases into how results are interpreted from the study because of their experiences. This aspect of interpreter identity, even in quantitative studies, should be factored into study design and discussed in the limitations.

An in-person interpreter is an individual who has received specialized training. In health care, they have learned healthcare vocabulary as part of their training so they can effectively translate health care’s language—a definite advantage for study implementation. For research purposes, these interpreters are the best option if the study will involve communicating complex healthcare information to a study participant, such as might occur in randomized controlled trials.

In-person interpreters in research can also have no healthcare experience, which can provide an advantage for researchers who seek to minimize bias in participant responses in studies involving the patient experience. Like any healthcare worker, healthcare interpreters may have their own biases from interactions with healthcare providers, navigating systems issues and memorable patient interactions. Even though they may try to stay objective, the risk for them inserting bias into the findings increases if this threat to rigor is not managed well.

Technology-based interpreting is the kind that most clinicians are familiar with, and it comes in the form of telephone or video-based interpreting services. When research funds are limited, using this service, especially if it is already part of the institution's resources, can offer the most cost-effective option for conducting research. A threat to rigor from this type of interpreting is that the technology-based interpreting industry is largely unregulated globally and relies on companies to conduct their own internal quality checks of interpreter performance.

Finally, two other ways of bridging language barriers could threaten the rigor of research findings: online translation services and family members. Online translation services have not yet developed the ability to effectively translate health care's language. While they can seem like a good way to translate, for example, transcripts in qualitative research, many translation errors will occur, especially with patient interviews, because participants often use obscure slang words from their particular dialect that the computer will mistranslate. A qualified translator will still be needed to check the translation, which can take as much time as translating the transcript from scratch.

Family members may or may not make effective interpreters for a research study. Just as in a healthcare encounter where sensitive or culturally taboo topics may emerge, the family member may influence the translation and the information provided. Unless the study design deliberately includes the family member, using independent interpreters is the best way to mitigate threats to rigor from family-member translation. Nonetheless, with the participant's consent, a family member could help enhance data quality by either improving its precision (in the case of a survey) or helping the participant remember important experiential details (in the case of qualitative research).

3. DATA SOURCES

Cross-language research refers to research studies where a language barrier is present and data collection must involve the use of interpreters at some stage during the research process (Croot et al., 2011; Squires, 2009; Squires et al., 2013). A critical factor of cross-language research, regardless of methodological approach, is that it must be completed in teams (Chapple & Ziebland, 2018; Esposito, 2001; Im et al., 2017; Paulus, Jackson, & Davidson, 2017; Shordike et al., 2010; Stanley & Slattery, 2003). Cross-language research cannot be rigorous unless a team is involved, because the interpretation of the data would otherwise be subject to the individual biases of a single researcher and would likely be less representative of the population of interest. The team will include the researcher, co-investigators, project managers and, very importantly, interpreters. The following recommendations were drawn from 73 methods articles addressing some dimension of cross-language qualitative or quantitative research published between 2000 and 2017.

4. IMPLICATIONS FOR RESEARCH DESIGN

When the target population for a research study has a language barrier, careful planning is required. In this section, we offer considerations for the design of qualitative and quantitative studies where language barriers are an issue that could threaten the rigor of a study.

4.1. Qualitative research considerations

Cross-language qualitative research has grown extensively since the year 2000. Methods have evolved and several common methodological considerations emerged. Importantly, cross-language researchers uniformly agree that translation poses a threat to the trustworthiness of qualitative data ( Court & Abbas, 2013 ; Esposito, 2001 ; Im et al., 2017 ; Jones & Boyle, 2011 ; Larkin, Dierckx de Casterlé, & Schotsmans, 2007 ; MacKenzie, 2015 ; Temple, 2002 , 2005 ; Temple & Young, 2004 ; Wong & Poon, 2010 ; Xian, 2008 ). Squires (2009) developed criteria from a systematic review of cross-language studies for evaluating how researchers managed translation and then Croot et al. (2011) tested the criteria. The latter concluded that the criteria offered researchers useful direction both with study design and critical appraisal of existing studies, albeit with several caveats related to resources dictating interpreter usage.

Another point of consensus in cross-language research is that interpreter identity matters, with pros and cons for each choice made. Interpreters with translation work experience are uniformly recommended for cross-language studies to minimize the threats to trustworthiness of results posed by translation. The use of students, undergraduate or graduate, for interpreting may create good research socialization opportunities, but could also affect data quality due to their inexperience with both research and translation ( Lincoln, González y González, & Aroztegui Massera, 2016 ).

Interpreter timing during data collection also matters and well-planned studies account for this factor during study design ( Im et al., 2016 ; Santos, Black, & Sandelowski, 2014 ). Timing is rooted in the role design of the interpreter in the study. Researchers may find Role Theory useful in interpreter role design when planning a study ( Lynch, 2007 ; Morgeson, Delaney-Klinger, & Hemingway, 2005 ). For example, a functionalist role for an interpreter means the expectation of the interpreter is to adhere to their essential function: interpretation and translation. This would be defined as the “correct behaviours” that functionalist role theory emphasizes. Timing would be limited to the interview data collection point, transcription (if that is part of their role) and translation of the transcript.

An interpreter in an interactionist role, however, has flexible, less prescribed boundaries (Lynch, 2007). Interactionist contributions of the interpreter would include not only interpretation and translation of the data, but also contributions to data analysis (Squires, 2008). An interactionist role also allows the interpreter to act as a cultural broker between the parties, contributing potential explanations for themes and categories that have emerged in the analysis or providing culturally appropriate names for them.

Finally, transcription quality is always a critical part of any qualitative research study and this is where translators will help mitigate threats to rigor ( Poland, 1995 ; Tilley, 2003 ). Transcriptions are the final point of interpreter-mediated vulnerability in a research study because the quality of translation will affect the entire data analysis process. Clark, Birkhead, Fernandez, and Egger (2017) offer useful recommendations for quality checking the transcription and translation process. These include two independent checks on translation, hiring professional transcription services, achieving consensus around the translation of culturally unique words and slang phrases with the minimal goal of achieving semantic equivalence and the aspirational goal of conceptual equivalence.

4.2. Quantitative research considerations

Language barriers or patient language preferences can affect any kind of quantitative study. Most observational studies (e.g., cross-sectional, cohort, case control) and randomized controlled trials may involve some form of survey design, and methodological consensus has emerged around what constitutes rigorous survey instrument translation. "Big data", being newer, means that we do not yet fully understand where language preference and language barriers manifest in patient outcomes in large datasets. Nonetheless, there are salient points for discussion even in the early stages of the science.

4.2.1. Survey instruments and translation

There are many instruments with existing reliable and valid translations, or so it may appear. When choosing to use an existing translation, researchers should first research the history of the instrument to determine: (a) when it was developed; (b) how it was psychometrically evaluated in its original language; and (c) when and how the translation was completed. Flaws in the original instrument translation process will not produce reliable and valid results in the translation. There is also the possibility that factor structures and other psychometric measures may change across cultures and contexts (Brzyski, Kózka, Squires, & Brzostek, 2016; Choi et al., 2009; Mallinckrodt & Wang, 2004; Yu, Lee, & Woo, 2004). Sometimes these changes are not significant, but the researcher must use appropriate methods to differentiate when they are and are not. Just because a translation has been published does not mean it is of good quality. A translator is critical for evaluating the quality of the survey's translation during this phase of study planning. Some surveys, like the Maslach Burnout Inventory, have professional translations that are available, protected by copyright and may require a fee for their use in research studies (Squires et al., 2014). Failure to appropriately use this kind of survey translation may place the researcher at risk of copyright violation.

Survey instrument translation appropriateness with specific populations is also affected by nativity and dialects—both of which are associated with social risk factors influencing health outcomes ( National Academies of Sciences Engineering & Medicine, 2017 ). Many countries have multiple official languages where citizens will speak all of them or at least one or two with a high level of proficiency. They may not read or write in the other languages.

Examples of this phenomenon come from every part of the world. In Sub-Saharan Africa, it is common for people to speak their tribal language (which may or may not be written), the language of their former colonizers and other languages common to the economic engine of the country (Levin, 2006). Latin America has 448 indigenous languages spoken alongside Spanish, Portuguese, or French. China has the unifying scholarly language of Mandarin, with Cantonese the second most spoken language in the country; yet each village and region in China can have a sub-dialect that only people from that region understand (Aroian, Wu, & Tran, 2005; Chidarikire, Cross, Skinner, & Cleary, 2018). Former Soviet states may still speak Russian, but most have reasserted their own national language as the primary language (Shpilko, 2006). The languages of India and Pakistan remain numerous as well (Abdelrahim et al., 2017). In the Middle East, Arabic is much like Spanish in that the version of the language is specific to the speaker's country of origin. Academic Arabic can be read by any educated person, but dialect specificity is important for accurate translation (Al-Amer, Ramjan, Glew, Darwish, & Salamonson, 2016). For survey research, this means that a translation of a survey may not work well if it does not match the participant's nativity, especially when measuring symptoms, coping strategies and other health-related phenomena where slang and linguistic variation by country become measurement factors.

Nonetheless, consensus has emerged in several areas around appropriate strategies for ensuring the most reliable, valid and culturally appropriate translation of a survey instrument. Flaherty et al. (1988) were one of the first groups to set early criteria for evaluating instrument translations. Their criteria include researchers taking steps during the translation process to ensure conceptual, semantic, technical, content, and construct equivalence. In time, it has also become clear that forward and backward translation alone are insufficient to ensure reliable and valid translations because that process alone cannot meet the five measures of equivalence ( Maneesriwongul & Dixon, 2004 ; Perneger, Leplège, & Etter, 1999 ; Squires et al., 2013 ). Systematic approaches to survey instrument translation, therefore, offer the best option to ensure reliable, valid, and culturally representative translations. Most of these approaches offer some combination of content validity indexing ( Brzostek et al., 2015 ; Liu, Squires, & You, 2011 ; Squires et al., 2012 , 2013 , 2014 ), cognitive interviewing ( Benitez & Padilla, 2014 ; Benitez, Padilla, van de Vijver, & Cuevas, 2018 ; Park, Sha, & Willis, 2016 ; Reeve et al., 2011 ), and interpreter timing ( Cha, Kim, & Erlen, 2007 ; Erkut, 2010 ; Johnson, 2006 ; Sidani, Guruge, Miranda, Ford-Gilboe, & Varcoe, 2010 ; Weeks, Swerissen, & Belfrage, 2007 ; Xian, 2008 ; Yu et al., 2004 ). Finally, conducting pre-data collection evaluations of the survey instrument with the migrant population will not only help determine if they understand the questions being asked in the instrument, but if the wording and literacy level is appropriate for the local population being studied.
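The content validity indexing step mentioned above is one of the few parts of this process that reduces to a simple calculation. The sketch below, a minimal illustration with hypothetical rater counts and scores, computes the item-level content validity index (I-CVI: the proportion of expert raters scoring a translated item 3 or 4 on a 4-point relevance scale) and the scale-level average (S-CVI/Ave: the mean of the item-level indices):

```python
# Item-level content validity index (I-CVI): the proportion of expert
# raters scoring a translated item 3 or 4 on a 4-point relevance scale.
# Scale-level index (S-CVI/Ave): the mean of the item-level indices.

def item_cvi(ratings):
    """ratings: 1-4 relevance scores from expert raters for one item."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)

def scale_cvi_ave(matrix):
    """matrix: one list of ratings per translated item."""
    cvis = [item_cvi(item) for item in matrix]
    return sum(cvis) / len(cvis)

# Hypothetical ratings from five bilingual expert reviewers for three items
ratings = [
    [4, 4, 3, 4, 3],  # item 1: all raters judge it relevant
    [4, 3, 2, 4, 3],  # item 2: one rater scores it 2
    [2, 3, 2, 4, 3],  # item 3: low agreement flags it for revision
]

for i, item in enumerate(ratings, 1):
    print(f"item {i}: I-CVI = {item_cvi(item):.2f}")
print(f"S-CVI/Ave = {scale_cvi_ave(ratings):.2f}")
```

In practice, items falling below a pre-agreed I-CVI threshold would be returned to the translation team for revision before cognitive interviewing begins.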

4.2.2. Big data, patient language preference, and large dataset observational studies

Whilst there is a huge attraction to using big data to analyse health-related issues tied to patient language preference, bigger does not always mean better (Cohen et al., 2015; O'Halloran, Tan, Pham, Bateman, & Vande Moere, 2018). In addition to the many known problems that come with working with large healthcare datasets, isolating relationships and effects related to language barriers poses its own challenges in observational studies. First, data generated from administrative files are not collected with the rigor and consistency that align with research practices; getting it right requires multiple people and multiple incentives. Consequently, organizations that do not place a value on capturing patient language preference will likely have poor data to work with, as the quality of patient language preference data reflects organizational values around caring for migrants.

A logical response to this issue would be the use of missing-data management techniques. Patient language preference data are often missing, presenting unique challenges for analysis because the missingness can reflect the biases, and potentially the prejudices, of whoever entered the data. Thus, missing language preference data are unlikely to be missing at random. For example, research from the UK shows that it is common for records classified as "white" to have missing data (Tippu et al., 2017). That missing data, with no language preference recorded, would leave out whites who do not speak English yet still comprise a substantial portion of the country's migrant population. Other datasets might reflect similar patterns.
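A first diagnostic for whether language-preference data are plausibly missing at random is to tabulate missingness rates against another recorded demographic field. The sketch below is a minimal illustration; the record structure, field names, and values are all hypothetical:

```python
# Tabulate the rate of missing language-preference entries by recorded
# ethnicity, as a rough check on whether missingness is random.
from collections import defaultdict

def missing_rate_by_group(records, group_key="ethnicity",
                          target="language_preference"):
    """Return {group: fraction of records with a missing target field}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [missing, total]
    for row in records:
        group = row.get(group_key) or "unrecorded"
        counts[group][1] += 1
        if not row.get(target):
            counts[group][0] += 1
    return {g: missing / total for g, (missing, total) in counts.items()}

# Hypothetical administrative records
records = [
    {"ethnicity": "white", "language_preference": None},
    {"ethnicity": "white", "language_preference": "English"},
    {"ethnicity": "white", "language_preference": None},
    {"ethnicity": "asian", "language_preference": "Cantonese"},
    {"ethnicity": "asian", "language_preference": None},
    {"ethnicity": None, "language_preference": None},
]

rates = missing_rate_by_group(records)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} missing")
```

Sharply different missingness rates across groups, as in this toy dataset, would suggest the data are not missing at random and that naive imputation could reproduce the biases of whoever entered the data.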

Another issue affecting large datasets where researchers want to consider language preference is that health care has no standards for how languages are named in EHRs. For example, organizations may offer a language preference option of "Chinese", when in fact there is no such language: Mandarin is the official language of China and Cantonese is a widely spoken dialect that shares the same written characters. As previously mentioned, China has multiple dialects, some specific to a single village, and those are rarely offered as language preference options. For older patients who may have migrated in the twentieth century, their village dialect may represent the language of their heart.

Lastly, when trying to use large datasets to compare the impact of language preference on patient outcomes, how demographics are measured and have changed over time become important methodological considerations. In societies that were largely linguistically homogenous before the population changed, these differences present unique challenges for healthcare research. Longitudinal studies, therefore, become critical for studying the impact of changes in language preferences of populations over time and how these impact health outcomes and any resulting disparities.

4.3. Budgeting for interpretation and translation

Interpreting and translation can add significant costs to a research study. Costs are determined based on (a) the language being translated; (b) the source and target languages for translation; and (c) the extent of interpreter involvement in the research process.

Straightforward translation of survey instruments or transcription and translation of qualitative interviews have costs dictated by the time involved and the country where the researchers are seeking services. Many professional translation companies contract with translators in other countries to ensure the best quality translation and in some cases, this can save on costs. Country specific translations also ensure that the slang words or other country specific vernacular receives the correct interpretation. Professional translation services, however, are the most expensive.

The alternative is to hire research team members who speak the same language as the target population of the study. This can involve hiring research assistants on a part- or full-time basis and the total costs of their participation will depend on the country’s labour laws or organization’s employment requirements. Funding for the study may limit how long interpreters can be involved with it unless they can contribute to other parts of the study besides data collection. If researchers do want their language concordant staff to remain part of the entire study, they should budget funds to support their involvement for the duration. Depending on the funding source, this may or may not contribute significantly to the study’s total costs.

5. DISCUSSION AND CONCLUSION

Research will play a critical role in helping health systems, workers and patients determine the most effective ways to bridge language barriers between patients, providers and systems in ways that optimize health and system outcomes. Improved research rigor in studies involving language barriers in health care are also needed to create evidence-based policies at the organizational, local, national, and international levels.

All recommendations are made mindful of the possibility that any researcher or member of their team could be subject to ethnocentric assumptions in their work and thus, should operate from a place of being conscious of their own biases when implementing and interpreting studies involving language barriers. Using cross-language research methods to generate better, more rigorous evidence specific to the experiences of people with language barriers is critical for strengthening health care’s evidence base that informs clinical practice and policy.

Acknowledgments

Funding information This paper was informed by research funded by the United States’ Agency for Health Care Research and Quality, R01HS23593.

The authors declare that they have no conflicts of interest.


How to use appropriate academic language in research papers?

Many people enter academic writing out of a passion for knowledge or a desire to conduct independent research. Impactful writing in the academic world requires communicating ideas with appropriate academic language, clarity and grace. Most everyday writing, such as emails, texts and letters, uses informal language; academic and scientific writing does not. A good flow of writing leaves a lasting impression on readers, so it is essential to convey information and views concisely and clearly. This article offers insights on how to improve the academic language in a paper.


Avoid information overload

Bloated language is not academic language.

A common misconception among writers is that extra words and sentences signal intelligence and in-depth knowledge of the subject. In reality, this unnecessary padding frustrates the reader and makes the content needlessly complex. A good academic paper is clear and concise; wasted words and repetition should be avoided, not least because many journals enforce word limits. The sections that follow illustrate the difference between bloated and concise phrasing.

Get rid of redundant or wordy phrases

English contains many phrases that use several words where one would do. A writer needs to be cognizant of wordiness, though a fine line exists between rambling and expressing; wherever a phrase can be expressed in fewer words, prefer the shorter form. For example:

Redundant words → Effective words
In the event that → If
Pandemic times → Pandemic
A majority of → Most
Due to the fact that → Because/since
Subsequent to → After
Have the capability to → Can
A lot of job cuts → Considerable downsizing
With regard to → Concerning/about
Keeping the workplace retained in the organization → Retaining employees
Most specimens were blue in colour → Most specimens were blue
Roots penetrate into the soil to a depth of 5 meters → Roots penetrate the soil to a depth of 5 meters

Note that the word "etc." should not be paired with "including", "such as" or "for example", because these words already indicate that the list is indicative rather than exhaustive.
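Substitutions like those in the table above lend themselves to a simple automated find-and-replace pass. The sketch below is illustrative only: the `tighten` helper and its abbreviated phrase list are this article's own toy example, not a real editing tool.

```python
import re

# A few redundant-phrase replacements drawn from the table above.
REPLACEMENTS = {
    "in the event that": "if",
    "due to the fact that": "because",
    "subsequent to": "after",
    "have the capability to": "can",
    "a majority of": "most",
    "with regard to": "concerning",
}

def tighten(text: str) -> str:
    """Replace wordy phrases with concise equivalents, case-insensitively."""
    for wordy, concise in REPLACEMENTS.items():
        text = re.sub(re.escape(wordy), concise, text, flags=re.IGNORECASE)
    return text

print(tighten("In the event that the sample size is small, report it."))
# → if the sample size is small, report it.
```

A dictionary of literal phrases keeps the pass transparent; anything cleverer (handling sentence-initial capitalization, for instance) quickly becomes a grammar-checking problem rather than a find-and-replace one.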

Everyday spoken language is not academic language

Use appropriate academic language that is organized and logical, yet not detached in tone. Writers often default to everyday spoken language, but casual phrasing should be avoided in academic work.

Casual language → Formal words
Mellow and good → Mild-mannered and kind
Get to her → Affect her spirit
Current and most suitable → Efficient/effective
A lot of → Enhanced/numerous/higher
Get in touch with → Contact
Give the go-ahead → Authorize
Workaholic → Addicted to work
Disgusting → Unpleasant

Use varying sentence structure

Sentence structure is the physical structure of a sentence. An impactful research paper uses varied vocabulary without overusing any one word. A string of long sentences can overshadow the argument and inundate the reader, while a run of short sentences makes arguments feel stunted or rushed. As a rule, keep sentences to about 20 words and vary their length.

Inappropriate: In actual fact, every single nurse in the hospital worked for treating their patients from 3 am in the morning to twelve at the night.
Improved: Every nurse in the hospital worked from 3 am to midnight.

Inappropriate: The mean age value for group A is 30 years while for group B mean age is 26 years, thus, there is a presence of a statistically significant difference between both groups.
Improved: The mean age of group A (30 years) differed significantly from that of group B (26 years).
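The 20-word guideline can also be checked mechanically. The sketch below is a rough illustration only: the naive sentence splitter is an assumption and will mishandle abbreviations such as "Dr." or decimal numbers.

```python
import re

def long_sentences(text: str, max_words: int = 20):
    """Return (word_count, sentence) pairs for sentences over max_words."""
    # Naive split on sentence-final punctuation; real prose needs a tokenizer.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [(len(s.split()), s) for s in sentences if len(s.split()) > max_words]

sample = ("Every nurse worked from 3 am to midnight. "
          "In actual fact, every single nurse in the hospital worked for treating "
          "their patients from 3 am in the morning to twelve at the night.")
for count, sentence in long_sentences(sample):
    print(f"{count} words: {sentence}")
```

Run on the examples above, only the bloated 25-word sentence is flagged; the tightened version passes.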

Use sufficient referencing and citation

Academic papers rely to a great extent on reviewing existing information in the form of secondary research. When reviewing secondary studies, references must be provided; they establish the validity and credibility of the information. There is no standard rule for the number of references to use in a study, but as a rule of thumb, allow an average of one reference per 100-150 words.
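The one-reference-per-100-150-words rule of thumb translates into simple arithmetic. As an illustration (the `suggested_references` helper is hypothetical, not a publishing standard):

```python
def suggested_references(word_count: int) -> range:
    """Target reference count at one reference per 100-150 words."""
    low = word_count // 150   # sparser end: one reference per 150 words
    high = word_count // 100  # denser end: one reference per 100 words
    return range(low, high + 1)

counts = suggested_references(3000)
print(f"{counts[0]}-{counts[-1]} references")  # → 20-30 references
```

So a 3,000-word paper would carry roughly 20-30 references under this heuristic; journal norms in a given field always take precedence.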

Today, a vast number of elements contribute towards creating work-life balance in employees. Yadav and Dabhade (2014) identified rewards, mentally challenging work, favourable working conditions, supportive coworkers and employee-friendly policies as some of the factors leading to work-life balance and job satisfaction. Chaitra, Kumar and Renuka (2016) also identified factors that affect work-life balance; their study collected data from 60 respondents and applied descriptive statistics and analysis of variance (ANOVA), finding that overtime, meetings, travelling to work and training after working hours influence employees' work-life balance. Sharma and Shekhawat (2017) assessed the association between employee performance and work-life balance in the hotel industry of Rajasthan, India, with the main aim of determining the factors affecting employees' work-life balance.

Refer to the right sources of secondary information

The type of secondary sources referred to while writing greatly affects the quality of the paper. If you rely mostly on blogs and web pages, your content is likely to be ambiguous, because web pages are written for a mass audience. For example:

[Figure: casual language in a blog article makes academic content less impactful]

The figure above shows an informal flow of sentences, with first-person pronouns and no references to validate the information. Such an article is less suitable for academic writing, whereas the study below is a more suitable source of information.

Referring instead to academic sources such as journal articles and books, which are more structured and precise, is likely to make your content more technically sound and less redundant.

[Figure: the refined language of a journal article helps produce better-quality content]

Importance of grammar check in academic language

Grammatical mistakes are common in writing but avoidable, and eliminating them improves the quality of the paper. Fortunately, many applications can track and correct such errors instantly. Grammarly is one of the most popular and convenient; installing its plug-in helps catch mistakes as you write.

The figure below shows Grammarly running a grammar and spelling check during writing: it flags the phrase "in the event that" as redundant and suggests reducing it to "if".

[Figure: Grammarly flagging a redundant phrase]

Academic writing is organized, information-oriented work that requires a clear, concise, powerful and graceful presentation of ideas. Writers often make mistakes through lack of practice or knowledge; for impactful work, empower yourself with knowledge, vary your sentence structure, and write to express, not to impress.

Author: Priya Chetty, co-founder and Managing Partner of Project Guru, a research and analytics firm based in Gurgaon.


Stanford University


Speaking, writing and reading are integral to everyday life, where language is the primary tool for expression and communication. Studying how people use language – what words and phrases they unconsciously choose and combine – can help us better understand ourselves and why we behave the way we do.

Linguistics scholars seek to determine what is unique and universal about the language we use, how it is acquired and the ways it changes over time. They consider language as a cultural, social and psychological phenomenon.

“Understanding why and how languages differ tells about the range of what is human,” said Dan Jurafsky , the Jackson Eli Reynolds Professor in Humanities and chair of the Department of Linguistics in the School of Humanities and Sciences at Stanford . “Discovering what’s universal about languages can help us understand the core of our humanity.”

The stories below represent some of the ways linguists have investigated many aspects of language, including its semantics and syntax, phonetics and phonology, and its social, psychological and computational aspects.

Understanding stereotypes

Stanford linguists and psychologists study how language is interpreted by people. Even the slightest differences in language use can correspond with biased beliefs of the speakers, according to research.

One study showed that a relatively harmless sentence, such as “girls are as good as boys at math,” can subtly perpetuate sexist stereotypes. Because of the statement’s grammatical structure, it implies that being good at math is more common or natural for boys than girls, the researchers said.

Language can play a big role in how we and others perceive the world, and linguists work to discover which words and phrases influence us without our realizing it.

How well-meaning statements can spread stereotypes unintentionally

New Stanford research shows that sentences that frame one gender as the standard for the other can unintentionally perpetuate biases.

Algorithms reveal changes in stereotypes

New Stanford research shows that, over the past century, linguistic changes in gender and ethnic stereotypes correlated with major social movements and demographic changes in the U.S. Census data.

Exploring what an interruption is in conversation

Stanford doctoral candidate Katherine Hilton found that people perceive interruptions in conversation differently, and those perceptions differ depending on the listener’s own conversational style as well as gender.

Cops speak less respectfully to black community members

Professors Jennifer Eberhardt and Dan Jurafsky, along with other Stanford researchers, detected racial disparities in police officers’ speech after analyzing more than 100 hours of body camera footage from Oakland Police.

How other languages inform our own

People speak roughly 7,000 languages worldwide. Although there is a lot in common among languages, each one is unique, both in its structure and in the way it reflects the culture of the people who speak it.

Jurafsky said it’s important to study languages other than our own and how they develop over time because it can help scholars understand what lies at the foundation of humans’ unique way of communicating with one another.

“All this research can help us discover what it means to be human,” Jurafsky said.

Stanford PhD student documents indigenous language of Papua New Guinea

Fifth-year PhD student Kate Lindsey recently returned to the United States after a year of documenting an obscure language indigenous to the South Pacific nation.

Students explore Esperanto across Europe

In a research project spanning eight countries, two Stanford students search for Esperanto, a constructed language, against the backdrop of European populism.

Chris Manning: How computers are learning to understand language​

A computer scientist discusses the evolution of computational linguistics and where it’s headed next.

Stanford research explores novel perspectives on the evolution of Spanish

Using digital tools and literature to explore the evolution of the Spanish language, Stanford researcher Cuauhtémoc García-García reveals a new historical perspective on linguistic changes in Latin America and Spain.

Language as a lens into behavior

Linguists analyze how certain speech patterns correspond to particular behaviors, including how language can impact people’s buying decisions or influence their social media use.

For example, in one research paper, a group of Stanford researchers examined the differences in how Republicans and Democrats express themselves online to better understand how a polarization of beliefs can occur on social media.

“We live in a very polarized time,” Jurafsky said. “Understanding what different groups of people say and why is the first step in determining how we can help bring people together.”

Analyzing the tweets of Republicans and Democrats

New research by Dora Demszky and colleagues examined how Republicans and Democrats express themselves online in an attempt to understand how polarization of beliefs occurs on social media.

Examining bilingual behavior of children at Texas preschool

A Stanford senior studied a group of bilingual children at a Spanish immersion preschool in Texas to understand how they distinguished between their two languages.

Predicting sales of online products from advertising language

Stanford linguist Dan Jurafsky and colleagues have found that products in Japan sell better if their advertising includes polite language and words that invoke cultural traditions or authority.

Language can help the elderly cope with the challenges of aging, says Stanford professor

By examining conversations of elderly Japanese women, linguist Yoshiko Matsumoto uncovers language techniques that help people move past traumatic events and regain a sense of normalcy.

Natural language processing: state of the art, current trends and challenges

  • Published: 14 July 2022
  • Volume 82, pages 3713–3744 (2023)


  • Diksha Khurana 1,
  • Aditya Koli 1,
  • Kiran Khatter (ORCID: orcid.org/0000-0002-1000-6102) 2 &
  • Sukhdev Singh 3


Natural language processing (NLP) has recently gained much attention for representing and analyzing human language computationally. Its applications have spread to fields such as machine translation, email spam detection, information extraction, summarization, medicine, and question answering. In this paper, we first distinguish four phases by discussing the different levels of NLP and the components of Natural Language Generation, followed by the history and evolution of NLP. We then discuss in detail the state of the art, presenting the various applications of NLP as well as current trends and challenges. Finally, we present a discussion of some available datasets, models, and evaluation metrics in NLP.


1 Introduction

A language can be defined as a set of rules and symbols, where symbols are combined and used for conveying or broadcasting information. Since not all users are well-versed in machine-specific languages, Natural Language Processing (NLP) caters to those who do not have the time to learn a new language or achieve proficiency in one. NLP is a branch of Artificial Intelligence and Linguistics devoted to making computers understand statements or words written in human languages. It came into existence to ease the user's work and to satisfy the wish to communicate with the computer in natural language, and it can be divided into two parts: Natural Language Understanding and Natural Language Generation, which cover the tasks of understanding and generating text, respectively. Linguistics is the science of language; it includes Phonology (sound), Morphology (word formation), Syntax (sentence structure), Semantics (meaning), and Pragmatics (understanding in context). Noam Chomsky, one of the most influential theoretical linguists of the twentieth century, marked a unique position in the field by revolutionizing the study of syntax (Chomsky, 1965) [ 23 ]. Further, Natural Language Generation (NLG) is the process of producing meaningful phrases, sentences, and paragraphs from an internal representation. The first objective of this paper is to give insights into the various important terminologies of NLP and NLG.

In the existing literature, most work in NLP has been conducted by computer scientists, while professionals from other fields, such as linguistics, psychology, and philosophy, have also shown interest. One of the most interesting aspects of NLP is that it adds to our knowledge of human language. The field of NLP draws on different theories and techniques that deal with the problem of communicating with computers in natural language. Some of the most researched NLP tasks include:

  • Automatic Summarization: producing an understandable summary of a set of texts, providing summaries or detailed information for text of a known type.
  • Co-Reference Resolution: determining all words in a sentence or larger text that refer to the same object.
  • Discourse Analysis: identifying the discourse structure of connected text, i.e., the study of text in relation to its social context.
  • Machine Translation: automatic translation of text from one language to another.
  • Morphological Segmentation: breaking words into their individual meaning-bearing morphemes.
  • Named Entity Recognition (NER): recognizing named entities in text and classifying them into predefined classes, used for information extraction.
  • Optical Character Recognition (OCR): automatic text recognition, translating printed and handwritten text into a machine-readable format.
  • Part-of-Speech Tagging: determining the part of speech of each word in a sentence.

Some of these tasks have direct real-world applications, such as machine translation, named entity recognition, and optical character recognition. Though NLP tasks are closely interwoven, they are frequently treated separately for convenience.
Some tasks, such as automatic summarization and co-reference analysis, act as subtasks used in solving larger tasks. Nowadays NLP is much discussed because of its many applications and recent developments, although in the late 1940s the term was not even in existence. So it is interesting to review the history of NLP, the progress made so far, and some ongoing projects that make use of NLP; the second objective of this paper focuses on these aspects. The third objective concerns datasets, approaches, evaluation metrics, and the challenges involved in NLP. The rest of this paper is organized as follows. Section 2 deals with the first objective, covering the various important terminologies of NLP and NLG. Section 3 deals with the history of NLP, its applications, and a walkthrough of recent developments. Datasets used in NLP and various approaches are presented in Section 4, and Section 5 covers evaluation metrics and the challenges involved in NLP. Finally, a conclusion is presented in Section 6.

2 Components of NLP

NLP can be classified into two parts, Natural Language Understanding (NLU) and Natural Language Generation (NLG), which cover the tasks of understanding and generating text, respectively. Figure 1 presents the broad classification of NLP. The objective of this section is to discuss both NLU (the linguistic side) and NLG.

Figure 1: Broad classification of NLP

NLU enables machines to understand natural language and analyze it by extracting concepts, entities, emotion, keywords etc. It is used in customer care applications to understand the problems reported by customers either verbally or in writing. Linguistics is the science which involves the meaning of language, language context and various forms of the language. So, it is important to understand various important terminologies of NLP and different levels of NLP. We next discuss some of the commonly used terminologies in different levels of NLP.

Phonology is the branch of linguistics that deals with the systematic arrangement of sound. The term comes from Ancient Greek: phono means voice or sound, and the suffix -logy refers to word or speech. Nikolai Trubetzkoy (1939) defined phonology as "the study of sound pertaining to the system of language," whereas Lass (1998) [ 66 ] wrote that phonology is concerned broadly with the sounds of language, as the sub-discipline of linguistics dealing with the behavior and organization of sounds. Phonology includes the systematic use of sound to encode meaning in any human language.

Morphology is the study of word formation; its units are morphemes, the smallest meaning-bearing parts of a word. For example, the word precancellation can be morphologically analyzed into three separate morphemes: the prefix pre-, the root cancel, and the suffix -ation. The interpretation of a morpheme stays the same across words, so to understand an unknown word humans can break it into its morphemes. For example, adding the suffix -ed to a verb conveys that the action of the verb took place in the past. Words that cannot be divided and have meaning by themselves are called lexical morphemes (e.g., table, chair), while elements such as -ed, -ing, -est, -ly, and -ful that combine with lexical morphemes are grammatical morphemes (e.g., worked, consulting, smallest, likely, useful). Grammatical morphemes that occur only in combination are called bound morphemes (e.g., -ed, -ing). Bound morphemes can be divided into inflectional and derivational morphemes. Adding an inflectional morpheme to a word changes a grammatical category such as tense, gender, person, mood, aspect, definiteness, or animacy; for example, adding the inflectional morpheme -ed changes the root park to parked. A derivational morpheme changes the semantic meaning or category of the word it combines with; for example, in the word normalize, adding the bound morpheme -ize to the root normal changes the word from an adjective (normal) to a verb (normalize).
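As a minimal sketch of this idea, affix-stripping analysis can be imitated in a few lines of Python; the prefix and suffix tables here are tiny illustrative assumptions, not a real inventory of English morphemes:

```python
# Toy morphological analysis by affix stripping. The affix tables are
# illustrative assumptions, not a complete inventory of English.
PREFIXES = {"pre": "before", "un": "negation", "re": "again"}
SUFFIXES = {"ation": "noun-forming", "ize": "verb-forming",
            "ing": "progressive", "ed": "past tense", "est": "superlative"}

def segment(word):
    """Split a word into (morpheme, gloss) pairs: prefix, root, suffix."""
    parts = []
    for prefix, gloss in PREFIXES.items():
        if word.startswith(prefix) and len(word) > len(prefix) + 2:
            parts.append((prefix, gloss))
            word = word[len(prefix):]
            break
    # Try the longest matching suffix first.
    for suffix, gloss in sorted(SUFFIXES.items(), key=lambda kv: -len(kv[0])):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            parts.append((word[:-len(suffix)], "root"))
            parts.append((suffix, gloss))
            return parts
    parts.append((word, "root"))
    return parts

print(segment("parked"))           # → [('park', 'root'), ('ed', 'past tense')]
print(segment("precancellation"))  # pre- + root + -ation
```

Real morphological analyzers handle spelling changes at morpheme boundaries (e.g., cancel/cancella-) that this length-based heuristic ignores.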

At the lexical level, humans, as well as NLP systems, interpret the meaning of individual words. Several types of processing contribute to word-level understanding, the first being the assignment of a part-of-speech tag to each word: words that can act as more than one part of speech are assigned the most probable tag based on the context in which they occur. Words with a single sense can also be given a semantic representation at this level; the nature of that representation varies according to the semantic theory the NLP system deploys, and assigning the correct PoS tag improves the understanding of the intended meaning of a sentence. At the lexical level, the structure of words is therefore analyzed with respect to their lexical meaning and PoS, and text is divided into paragraphs, sentences, and words. This level is also used for cleaning and feature extraction, with techniques such as stop-word removal, stemming, and lemmatization. Stop words such as 'in', 'the', and 'and' are removed because they do not contribute to meaningful interpretation, and their high frequency can inflate computation time. Stemming removes the suffix of a word to obtain its root form: for example, consulting and consultant are both reduced to consult, while using is reduced to us and driver to driv. Lemmatization does not simply strip a suffix; it maps a word to its dictionary base form using a vocabulary. For the token drived, stemming yields 'driv', whereas lemmatization attempts to return the correct base form, drive, depending on the context in which it is used.
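A hedged sketch of this lexical-level cleaning in Python; the stop-word list and suffix rules are deliberately tiny stand-ins for real resources such as the Porter stemmer:

```python
# Minimal lexical-level preprocessing: tokenize, drop stop words, and
# apply a naive suffix-stripping stemmer.
STOP_WORDS = {"in", "the", "and", "a", "is", "to", "of"}
SUFFIX_RULES = ["ing", "ed", "ly", "s"]

def stem(token):
    """Strip the first matching suffix, keeping a stem of >= 3 letters."""
    for suffix in SUFFIX_RULES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token

def preprocess(text):
    """Lowercase, tokenize on whitespace, remove stop words, stem."""
    tokens = text.lower().split()
    return [stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("the consultant is consulting in the morning"))
# → ['consultant', 'consult', 'morn']
```

Note how 'morning' is over-stemmed to 'morn'; this is exactly the kind of error a dictionary-based lemmatizer avoids.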

After PoS tagging at the lexical level, words are grouped into phrases, phrases into clauses, and clauses combined into sentences at the syntactic level. This level emphasizes the correct formation of a sentence by analyzing its grammatical structure; its output is a representation that reveals the structural dependencies between words. This analysis is also known as parsing, which uncovers phrases that convey more meaning than their individual words in isolation. The syntactic level examines word order, stop words, morphology, and the PoS of words, which the lexical level does not consider. Changing word order changes the dependencies among words and can change the meaning of a sentence: for example, 'ram beats shyam in a competition' and 'shyam beats ram in a competition' differ only in word order but convey different meanings [ 139 ]. Syntactic analysis retains stop words, since removing them changes the meaning of the sentence, and it does not use lemmatization or stemming, because converting words to their base form changes the grammar of the sentence. It also focuses on identifying the correct PoS within the sentence: for example, in the phrase 'frowns on his face', 'frowns' is a noun, whereas in the sentence 'he frowns' it is a verb.
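The word-order point can be made concrete with a toy subject-verb-object extractor; the tiny lexicon and the fixed SVO assumption are illustrative, not a general parser:

```python
# Toy syntactic analysis for simple subject-verb-object sentences,
# showing why word order matters: swapping the nouns changes who does
# what to whom. The lexicon is an assumption for the example.
LEXICON = {"ram": "NOUN", "shyam": "NOUN", "beats": "VERB",
           "in": "ADP", "a": "DET", "competition": "NOUN"}

def svo(sentence):
    """Extract subject, verb, object from a known SVO sentence."""
    tagged = [(w, LEXICON[w]) for w in sentence.lower().split()]
    nouns = [w for w, t in tagged if t == "NOUN"]
    verb = next(w for w, t in tagged if t == "VERB")
    # In an SVO language the first noun is the subject, the next the object.
    return {"subject": nouns[0], "verb": verb, "object": nouns[1]}

print(svo("ram beats shyam in a competition"))
# → {'subject': 'ram', 'verb': 'beats', 'object': 'shyam'}
print(svo("shyam beats ram in a competition"))
# → {'subject': 'shyam', 'verb': 'beats', 'object': 'ram'}
```

The two outputs share the same bag of words but encode opposite dependencies, which is what the lexical level alone cannot capture.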

At the semantic level, the most important task is to determine the proper meaning of a sentence. To understand a sentence, human beings rely on knowledge about the language and the concepts present in it, but machines cannot count on such knowledge directly. Semantic processing determines the possible meanings of a sentence by processing its logical structure to recognize the most relevant words and the interactions among the words or concepts in the sentence. For example, it can recognize that a sentence is about 'movies' even if it does not contain that word, as long as it contains related concepts such as 'actor', 'actress', 'dialogue', or 'script'. This level of processing also incorporates the semantic disambiguation of words with multiple senses (Liddy, 2001) [ 68 ]. For example, the noun 'bark' can mean either the sound a dog makes or the outer covering of a tree. The semantic level examines words both for their dictionary interpretation and for the interpretation derived from the context of the sentence. For example, the sentence 'Krishna is good and noble' may be about Lord Krishna or about a person named Krishna; to get the proper meaning, the appropriate interpretation is chosen by looking at the rest of the text [ 44 ].
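A minimal, Lesk-style sketch of such disambiguation: pick the sense whose gloss shares the most words with the sentence context. The sense glosses here are hand-written paraphrases of the 'bark' example, not a real sense inventory:

```python
# Simplified Lesk-style word sense disambiguation: choose the sense
# whose gloss has the largest overlap with the sentence context.
SENSES = {
    "bark": {
        "dog-sound": {"sound", "dog", "loud", "makes"},
        "tree-covering": {"outer", "covering", "tree", "trunk"},
    }
}

def disambiguate(word, sentence):
    """Return the sense label with the most gloss/context word overlap."""
    context = set(sentence.lower().split())
    glosses = SENSES[word]
    return max(glosses, key=lambda sense: len(glosses[sense] & context))

print(disambiguate("bark", "the dog let out a loud bark"))
# → 'dog-sound'
print(disambiguate("bark", "the bark of the tree was rough"))
# → 'tree-covering'
```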

While the syntactic and semantic levels deal with sentence-length units, the discourse level of NLP deals with units larger than a single sentence. It analyzes logical structure by making connections among words and sentences that ensure coherence, focusing on the properties of text that convey meaning by interpreting relations between sentences and uncovering linguistic structures at several levels (Liddy, 2001) [ 68 ]. Two of the most common tasks are anaphora resolution and coreference resolution. Anaphora resolution identifies the entity referenced by an anaphor so that references within the text are resolved to the same sense. For example: (i) Ram topped in the class. (ii) He was intelligent. Here (i) and (ii) together form a discourse. Human beings quickly understand that the pronoun 'he' in (ii) refers to 'Ram' in (i): the interpretation of 'he' depends on the word 'Ram' presented earlier in the text. Without determining the relationship between these two structures, it would not be possible to decide why Ram topped the class or who was intelligent. Coreference resolution finds all expressions in a text that refer to the same entity. It is an important step in NLP applications that involve high-level tasks such as document summarization and information extraction. In fact, anaphora is encoded through one of the processes of co-reference.
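The 'Ram ... He' example can be handled by a naive recency heuristic, sketched below; real coreference systems add gender, number, and syntactic constraints, and the name and pronoun lists here are assumptions for the example:

```python
# Naive recency-based anaphora resolver: link each pronoun to the most
# recently mentioned name. The name/pronoun inventories are toy lists.
PRONOUNS = {"he", "she", "it", "him", "her"}
NAMES = {"ram", "sita"}

def resolve(sentences):
    """Return (pronoun, antecedent) links across a list of sentences."""
    last_name, links = None, []
    for sentence in sentences:
        for token in sentence.lower().strip(".").split():
            if token in NAMES:
                last_name = token
            elif token in PRONOUNS and last_name:
                links.append((token, last_name))
    return links

print(resolve(["Ram topped in the class.", "He was intelligent."]))
# → [('he', 'ram')]
```

Recency alone fails as soon as two candidate antecedents intervene, which is why agreement features and syntax matter in practice.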

The pragmatic level focuses on knowledge or content that comes from outside the document itself: it deals with what the speaker implies and what the listener infers, analyzing what is not directly stated. Real-world knowledge is used to understand what a text is about, and a meaningful representation is derived by analyzing the context. Pragmatic ambiguity arises when a sentence is not specific and the context does not provide the information needed to pin it down (Walton, 1996) [ 143 ]; different readers may then derive different interpretations of the text, depending on its context. The context of a text may include references to other sentences of the same document, which influence its understanding, and the background knowledge of the reader or speaker, which gives meaning to the concepts it expresses. Whereas semantic analysis focuses on the literal meaning of words, pragmatic analysis focuses on the inferred meaning that readers perceive based on their background knowledge. For example, in semantic analysis the sentence "Do you know what time it is?" is interpreted as asking for the current time, whereas in pragmatic analysis the same sentence may express resentment toward someone who missed a due time. Thus, semantic analysis is the study of the relationship between linguistic utterances and their meanings, while pragmatic analysis is the study of the context that shapes our understanding of those utterances; it helps users uncover the intended meaning of a text by applying contextual background knowledge.

The goal of NLP is to accommodate one or more specialties of an algorithm or system, and evaluating an NLP system allows for the integration of language understanding and language generation. NLP is even used in multilingual event detection. Rospocher et al. [ 112 ] proposed a novel modular system for cross-lingual event extraction from English, Dutch, and Italian texts, using different pipelines for different languages. The system incorporates a modular set of foremost multilingual NLP tools: each pipeline integrates modules for basic NLP processing as well as more advanced tasks such as cross-lingual named entity linking, semantic role labeling, and time normalization. The cross-lingual framework thus allows the interpretation of events, participants, locations, and times, as well as the relations between them; the output of the individual pipelines is intended as input for a system that builds event-centric knowledge graphs. All modules take standard input, add some annotation, and produce standard output, which in turn becomes the input for the next module in the pipeline. The pipelines are built on a data-centric architecture so that modules can be adapted and replaced, and the modular architecture allows for different configurations and dynamic distribution.
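The data-centric, module-pipelining idea can be sketched as follows. The module internals are stand-ins; the point is that each module reads and extends a shared annotation object, so modules can be swapped or reordered per language:

```python
# Sketch of a data-centric NLP pipeline: each module takes the shared
# annotation object, adds its own annotations, and passes it on.
def tokenize(doc):
    doc["tokens"] = doc["text"].split()
    return doc

def tag_entities(doc):
    # Stand-in entity tagger: treat capitalized tokens as entities.
    doc["entities"] = [t for t in doc["tokens"] if t.istitle()]
    return doc

def run_pipeline(text, modules):
    doc = {"text": text}
    for module in modules:   # output of one module feeds the next
        doc = module(doc)
    return doc

doc = run_pipeline("Rome hosted the summit", [tokenize, tag_entities])
print(doc["entities"])  # → ['Rome']
```

Because every module shares the same input/output convention, a language-specific tagger can replace `tag_entities` without touching the rest of the pipeline, which is the adaptability the authors describe.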

Ambiguity is one of the major problems of natural language, occurring when one sentence can lead to different interpretations. It is usually faced at the syntactic, semantic, and lexical levels. In syntactic ambiguity, one sentence can be parsed into multiple syntactic forms; semantic ambiguity occurs when the meaning of words can be misinterpreted; lexical ambiguity refers to a single word having multiple possible senses. Each of these levels can produce ambiguities that can be resolved using knowledge of the complete sentence. Ambiguity can be handled by various strategies such as minimizing ambiguity, preserving ambiguity, interactive disambiguation, and weighting ambiguity [ 125 ]. Among the methods proposed by researchers, several preserve ambiguity rather than eliminate it, e.g., (Shemtov 1997; Emele & Dorna 1998; Knight & Langkilde 2000; Tong Gao et al. 2015; Umber & Bajwa 2011) [ 39 , 46 , 65 , 125 , 139 ]. Their objectives are closely in line with removing or minimizing ambiguity; they cover a wide range of ambiguities, and there is a statistical element implicit in their approaches.

Natural Language Generation (NLG) is the process of producing meaningful phrases, sentences, and paragraphs from an internal representation. It is a part of natural language processing and happens in phases: identifying the goals; planning how the goals may be achieved by evaluating the situation and the available communicative resources; and realizing the plans as text (Fig. 2 ). It is the inverse of natural language understanding.

Speaker and Generator

Figure 2: Components of NLG

To generate a text, we need to have a speaker or an application and a generator or a program that renders the application’s intentions into a fluent phrase relevant to the situation.

Components and Levels of Representation

The process of language generation involves the following interwoven tasks. Content selection: the information to be included must be selected; depending on how it is parsed into representational units, parts of the units may have to be removed while others may be added by default. Textual organization: the information must be textually organized according to the grammar, ordered both sequentially and in terms of linguistic relations such as modification. Linguistic resources: to support the realization of the information, linguistic resources must be chosen; in the end these come down to choices of particular words, idioms, syntactic constructs, and so on. Realization: the selected and organized resources must be realized as actual text or voice output.

Application or Speaker

The application is responsible only for maintaining the model of the situation: the speaker initiates the process but does not take part in language generation itself. It stores the history, structures the potentially relevant content, and deploys a representation of what it knows. All of this forms the situation, from which a subset of the propositions the speaker holds is selected. The only requirement is that the speaker must make sense of the situation [ 91 ].

3 NLP: Then and now

In the late 1940s the term NLP did not yet exist, but work on machine translation (MT) had started; research in this period was not completely localized, and Russian and English were the dominant languages for MT (Andreev, 1967) [ 4 ]. MT/NLP research nearly died in 1966 following the ALPAC report, which concluded that MT was going nowhere, but some MT production systems were later providing output to their customers (Hutchins, 1986) [ 60 ]. By this time, work on the use of computers for literary and linguistic studies had also started. As early as 1960, signature work influenced by AI began with the BASEBALL question-answering system (Green et al., 1961) [ 51 ]. LUNAR (Woods, 1978) [ 152 ] and Winograd's SHRDLU were natural successors of these systems, seen as steps up in sophistication in both their linguistic and their task-processing capabilities. There was a widespread belief that progress could only be made on two fronts: the ARPA Speech Understanding Research (SUR) project (Lea, 1980), and major system-development projects building database front ends. The front-end projects (Hendrix et al., 1978) [ 55 ] were intended to go beyond LUNAR in interfacing with large databases. In the early 1980s computational grammar theory became a very active area of research, linked with logics for meaning and knowledge's ability to deal with the user's beliefs and intentions and with functions like emphasis and themes.

By the end of the decade, powerful general-purpose sentence processors like SRI's Core Language Engine (Alshawi, 1992) [ 2 ] and Discourse Representation Theory (Kamp and Reyle, 1993) [ 62 ] offered means of tackling more extended discourse within the grammatico-logical framework. This was a period of growing community activity: practical resources, grammars, tools, and parsers became available (for example, the Alvey Natural Language Tools) (Briscoe et al., 1987) [ 18 ]. The (D)ARPA speech recognition and message understanding (information extraction) conferences were notable not only for the tasks they addressed but for their emphasis on heavy evaluation, starting a trend that became a major feature of the 1990s (Young and Chase, 1998; Sundheim and Chinchor, 1993) [ 131 , 157 ]. Work on user modeling (Wahlster and Kobsa, 1989) [ 142 ] was one strand of this research. Cohen et al. (2002) [ 28 ] put forward a first approximation of a compositional theory of tune interpretation, together with the phonological assumptions on which it is based and the evidence from which the proposals were drawn. At the same time, McKeown (1985) [ 85 ] demonstrated that rhetorical schemas could be used for producing text that is both linguistically coherent and communicatively effective. Some research in NLP marked important topics for the future, such as word sense disambiguation (Small et al., 1988) [ 126 ] and probabilistic networks; statistically colored NLP and work on the lexicon also pointed in this direction. Statistical language processing was a major theme of the 1990s (Manning and Schuetze, 1999) [ 75 ], and information extraction and automatic summarization (Mani and Maybury, 1999) [ 74 ] were also points of focus. Next, we present a walkthrough of developments from the early 2000s.

3.1 A walkthrough of recent developments in NLP

The main objectives of NLP include the interpretation, analysis, and manipulation of natural-language data for an intended purpose, using various algorithms, tools, and methods. However, many challenges depend on the particular natural-language data under consideration, which makes it difficult to achieve all objectives with a single approach. The development of tools and methods in NLP and related areas of study has therefore received much attention from researchers in the recent past. The developments can be seen in Fig. 3 :

Figure 3: A walkthrough of recent developments in NLP

In early 2000, neural language modeling in which the probability of occurring of next word (token) is determined given n previous words. Bendigo et al. [ 12 ] proposed the concept of feed forward neural network and lookup table which represents the n previous words in sequence. Collobert et al. [ 29 ] proposed the application of multitask learning in the field of NLP, where two convolutional models with max pooling were used to perform parts-of-speech and named entity recognition tagging. Mikolov et.al. [ 87 ] proposed a word embedding process where the dense vector representation of text was addressed. They also report the challenges faced by traditional sparse bag-of-words representation. After the advancement of word embedding, neural networks were introduced in the field of NLP where variable length input is taken for further processing. Sutskever et al. [ 132 ] proposed a general framework for sequence-to-sequence mapping where encoder and decoder networks are used to map from sequence to vector and vector to sequence respectively. In fact, the use of neural networks have played a very important role in NLP. One can observe from the existing literature that enough use of neural networks was not there in the early 2000s but till the year 2013enough discussion had happened about the use of neural networks in the field of NLP which transformed many things and further paved the way to implement various neural networks in NLP. Earlier the use of Convolutional neural networks ( CNN ) contributed to the field of image classification and analyzing visual imagery for further analysis. Later the use of CNNs can be observed in tackling problems associated with NLP tasks like Sentence Classification [ 127 ], Sentiment Analysis [ 135 ], Text Classification [ 118 ], Text Summarization [ 158 ], Machine Translation [ 70 ] and Answer Relations [ 150 ] . 
An article by Newatia (2019) [ 93 ] illustrates the general architecture behind any CNN model, and how it can be used in the context of NLP. One can also refer to the work of Wang and Gang [ 145 ] for the applications of CNN in NLP. Further Neural Networks those are recurrent in nature due to performing the same function for every data, also known as Recurrent Neural Networks (RNNs), have also been used in NLP, and found ideal for sequential data such as text, time series, financial data, speech, audio, video among others, see article by Thomas (2019) [ 137 ]. One of the modified versions of RNNs is Long Short-Term Memory (LSTM) which is also very useful in the cases where only the desired important information needs to be retained for a much longer time discarding the irrelevant information, see [ 52 , 58 ]. Further development in the LSTM has also led to a slightly simpler variant, called the gated recurrent unit (GRU), which has shown better results than standard LSTMs in many tasks [ 22 , 26 ]. Attention mechanisms [ 7 ] which suggest a network to learn what to pay attention to in accordance with the current hidden state and annotation together with the use of transformers have also made a significant development in NLP, see [ 141 ]. It is to be noticed that Transformers have a potential of learning longer-term dependency but are limited by a fixed-length context in the setting of language modeling. In this direction recently Dai et al. [ 30 ] proposed a novel neural architecture Transformer-XL (XL as extra-long) which enables learning dependencies beyond a fixed length of words. Further the work of Rae et al. [ 104 ] on the Compressive Transformer, an attentive sequence model which compresses memories for long-range sequence learning, may be helpful for the readers. One may also refer to the recent work by Otter et al. [ 98 ] on uses of Deep Learning for NLP, and relevant references cited therein. 
The BERT (Bidirectional Encoder Representations from Transformers) model [ 33 ] and its successors have also played an important role in NLP.

Many researchers have worked on NLP, building the tools and systems that make NLP what it is today. Tools such as sentiment analysers, parts-of-speech (POS) taggers, chunkers, named entity recognition (NER) systems, emotion detectors and semantic role labelers have contributed hugely to NLP and remain good topics for research. Sentiment analysis (Nasukawa et al., 2003) [ 156 ] works by extracting sentiments about a given topic; it consists of topic-specific feature term extraction, sentiment extraction, and association by relationship analysis, and it utilizes two linguistic resources: a sentiment lexicon and a sentiment pattern database. It analyzes documents for positive and negative words and assigns ratings on a scale from −5 to +5. The mainstream of currently used tagsets is derived from English: the most widely used standard tagsets were designed for Indo-European languages, while Asian and Middle Eastern languages are less researched. Various authors have built parts-of-speech taggers for languages such as Arabic (Zeroual et al., 2017) [ 160 ], Sanskrit (Tapaswi & Jain, 2012) [ 136 ] and Hindi (Ranjan & Basu, 2003) [ 105 ] to efficiently tag and classify words as nouns, adjectives, verbs, etc. The authors in [ 136 ] used a treebank technique to create a rule-based POS tagger for Sanskrit: sentences are parsed and each word is assigned the appropriate tag using a suffix-stripping algorithm, in which the longest suffix is searched in a suffix table and tags are assigned accordingly. Diab et al. (2004) [ 34 ] used a supervised machine learning approach, adopting Support Vector Machines (SVMs) trained on the Arabic Treebank to automatically tokenize, POS-tag and annotate base phrases in Arabic text.
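A lexicon-based scorer of the kind described, summing word polarities and clipping the result to the −5 to +5 scale, might look like the following sketch (the miniature lexicon is hypothetical; real systems use large curated resources such as the sentiment lexicon mentioned above):

```python
# Hypothetical miniature sentiment lexicon; real systems use curated resources.
LEXICON = {"good": 2, "great": 3, "excellent": 4,
           "bad": -2, "awful": -4, "terrible": -4}

def sentiment_score(text):
    """Sum lexicon scores of the words in `text`, clipped to the -5..+5 scale."""
    score = sum(LEXICON.get(word, 0) for word in text.lower().split())
    return max(-5, min(5, score))

print(sentiment_score("a great and excellent film"))  # clipped to +5
```

Real sentiment analysers additionally handle negation, intensifiers and topic association, which this word-counting sketch deliberately omits.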

Chunking is the process of extracting phrases from unstructured text. Since individual tokens may not represent the actual meaning of the text, it is advisable to treat phrases such as “North Africa” as a single unit instead of the separate words ‘North’ and ‘Africa’. Chunking, also known as shallow parsing, labels parts of sentences with syntactically correlated keywords such as Noun Phrase (NP) and Verb Phrase (VP). Chunking is often evaluated using the CoNLL 2000 shared task; various researchers (Sha and Pereira, 2003; McDonald et al., 2005; Sun et al., 2008) [ 83 , 122 , 130 ] used the CoNLL test data for chunking, with features composed of words, POS tags and chunk tags.
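A minimal rule-based NP chunker over (word, POS) pairs can be sketched by greedily matching the common DT? JJ* NN+ pattern (the pattern and example sentence are illustrative simplifications of what real chunkers learn):

```python
def np_chunk(tagged):
    """Greedy rule-based NP chunker matching DT? JJ* NN+ over (word, tag) pairs."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":          # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":  # any adjectives
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] == "NN":  # one or more nouns
            k += 1
        if k > j:  # at least one noun: emit the phrase
            chunks.append(" ".join(w for w, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return chunks

sent = [("the", "DT"), ("little", "JJ"), ("dog", "NN"),
        ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(np_chunk(sent))  # ['the little dog', 'the cat']
```

Statistical chunkers trained on CoNLL 2000 replace this fixed pattern with learned sequence labels (B-NP, I-NP, O) but solve the same segmentation problem.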

Particular words in a document refer to specific entities or real-world objects such as locations, people and organizations. To find words that have a unique context and are more informative, noun phrases in the text documents are considered. Named entity recognition (NER) is a technique for recognizing and separating named entities and grouping them under predefined classes. In the Internet era, however, people often use slang rather than traditional or standard English, which standard natural language processing tools cannot process well. Ritter (2011) [ 111 ] therefore proposed the classification of named entities in tweets, since standard NLP tools did not perform well on them; they rebuilt the NLP pipeline, starting from PoS tagging, then chunking, then NER, and improved performance compared with standard NLP tools.
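The simplest form of NER is gazetteer lookup: matching known entity spans against a list. The sketch below uses a tiny hypothetical gazetteer and longest-match-first scanning; production systems, including the tweet pipeline in [ 111 ], use statistical sequence models instead:

```python
# A tiny hand-made gazetteer; production NER uses statistical models instead.
GAZETTEER = {
    "north africa": "LOCATION",
    "new york": "LOCATION",
    "capital one": "ORGANIZATION",
}

def gazetteer_ner(text):
    """Match the longest known multi-word entities, scanning left to right."""
    tokens = text.lower().split()
    entities, i = [], 0
    while i < len(tokens):
        match_end = None
        for j in range(len(tokens), i, -1):   # try longest span first
            span = " ".join(tokens[i:j])
            if span in GAZETTEER:
                entities.append((span, GAZETTEER[span]))
                match_end = j
                break
        i = match_end if match_end else i + 1
    return entities

print(gazetteer_ner("She moved from North Africa to New York"))
```

The obvious limitation, and the motivation for statistical NER, is that a gazetteer cannot recognize entities it has never seen.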

Emotion detection investigates and identifies types of emotion from speech, facial expressions, gestures and text. Sharma (2016) [ 124 ] analyzed conversations in Hinglish, a mix of English and Hindi, and identified usage patterns of PoS. Their work was based on language identification and POS tagging of the mixed script; they attempted to detect emotions in the mixed script by combining machine learning with human knowledge, categorized sentences into six groups based on emotion, and used the TLBO technique to help users prioritize their messages based on the emotions attached to them. Seal et al. (2020) [ 120 ] proposed an efficient emotion detection method that searches for emotional words in a pre-defined emotional keyword database and analyzes emotion words, phrasal verbs and negation words; their approach outperformed recent alternatives.
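The keyword-database-with-negation idea can be sketched as follows; the mini keyword database and negation list below are hypothetical stand-ins for the pre-defined resources such methods actually use:

```python
# Hypothetical mini emotion-keyword database and negation list.
EMOTION_KEYWORDS = {"happy": "joy", "glad": "joy", "sad": "sadness",
                    "angry": "anger", "afraid": "fear"}
NEGATIONS = {"not", "never", "no"}

def detect_emotions(text):
    """Tag emotion keywords, skipping ones directly preceded by a negation word."""
    tokens = text.lower().split()
    found = []
    for i, tok in enumerate(tokens):
        if tok in EMOTION_KEYWORDS and (i == 0 or tokens[i - 1] not in NEGATIONS):
            found.append(EMOTION_KEYWORDS[tok])
    return found

print(detect_emotions("I am happy but not sad"))  # ['joy']
```

Handling phrasal verbs and longer-range negation, as in [ 120 ], requires parsing rather than this single-token look-behind.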

Semantic Role Labeling (SRL) assigns semantic roles within a sentence. For example, in the PropBank formalism (Palmer et al., 2005) [ 100 ], one assigns roles to words that are arguments of a verb in the sentence. The precise arguments depend on the verb frame, and if multiple verbs exist in a sentence, the sentence might have multiple sets of tags. State-of-the-art SRL systems comprise several stages: creating a parse tree, identifying which parse tree nodes represent the arguments of a given verb, and finally classifying these nodes to compute the corresponding SRL tags.

Event discovery in social media feeds (Benson et al., 2011) [ 13 ] uses a graphical model to analyze social media feeds and determine whether they contain the name of a person, the name of a venue, a place, a time, etc. The model operates on noisy feeds and extracts records of events by aggregating information across multiple messages; despite irrelevant messages and very irregular message language, it was able to extract records with a broad array of features.

Having given insights on some of these tools and the relevant work done with them, we now move to the broad applications of NLP.

3.2 Applications of NLP

Natural Language Processing can be applied in various areas such as machine translation, email spam detection, information extraction, summarization and question answering. Next, we discuss some of these areas and the relevant work done in those directions.

Machine Translation

As most of the world is online, the task of making data accessible and available to all is a challenge, and a major obstacle is the language barrier: there is a multitude of languages with different sentence structures and grammars. Machine translation translates phrases from one language to another, for example with a statistical engine such as Google Translate. The challenge for machine translation technologies is not translating words directly but keeping the meaning of sentences intact, along with grammar and tenses. Statistical machine translation systems gather as much data as they can find that appears parallel between two languages, and crunch that data to estimate the likelihood that something in language A corresponds to something in language B. Google, in September 2016, announced a new machine translation system based on artificial neural networks and deep learning. In recent years, various methods have been proposed to automatically evaluate machine translation quality by comparing hypothesis translations with reference translations. Examples of such methods are word error rate, position-independent word error rate (Tillmann et al., 1997) [ 138 ], generation string accuracy (Bangalore et al., 2000) [ 8 ], multi-reference word error rate (Nießen et al., 2000) [ 95 ], the BLEU score (Papineni et al., 2002) [ 101 ] and the NIST score (Doddington, 2002) [ 35 ]. All these criteria try to approximate human assessment and often achieve an astonishing degree of correlation with human subjective evaluations of fluency and adequacy (Papineni et al., 2001; Doddington, 2002) [ 35 , 101 ].
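Word error rate, the first of the evaluation measures listed, is simply word-level Levenshtein edit distance normalized by the reference length; a minimal sketch:

```python
def word_error_rate(reference, hypothesis):
    """Levenshtein distance over words, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j
    # hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

BLEU, by contrast, scores n-gram precision against one or more references with a brevity penalty, rewarding overlap rather than penalizing edits.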

Text Categorization

Categorization systems take in a large flow of data, such as official documents, military casualty reports, market data and newswires, and assign the items to predefined categories or indices. For example, the Carnegie Group’s Construe system (Hayes, 1991) [ 54 ] takes in Reuters articles and saves much of the time otherwise spent by staff or human indexers. Some companies use categorization systems to classify trouble tickets or complaint requests and route them to the appropriate desks. Another application of text categorization is email spam filtering. Spam filters are becoming important as the first line of defence against unwanted email; their false negative and false positive issues sit at the heart of NLP technology, coming down to the challenge of extracting meaning from strings of text. A filtering solution applied to an email system uses a set of protocols to determine which incoming messages are spam and which are not. There are several types of spam filters available:

Content filters: review the content within the message to determine whether it is spam.

Header filters: review the email header looking for fake information.

General blacklist filters: stop all emails from blacklisted senders.

Rules-based filters: use user-defined criteria, such as stopping mail from a specific person or mail including a specific word.

Permission filters: require anyone sending a message to be pre-approved by the recipient.

Challenge-response filters: require anyone sending a message to enter a code to gain permission to send email.

Spam Filtering

Spam filtering works using text categorization, and in recent times various machine learning techniques have been applied to text categorization or anti-spam filtering, such as rule learning (Cohen, 1996) [ 27 ], Naïve Bayes (Sahami et al., 1998; Androutsopoulos et al., 2000; Rennie, 2000) [ 5 , 109 , 115 ], memory-based learning (Sakkis et al., 2000b) [ 117 ], support vector machines (Drucker et al., 1999) [ 36 ], decision trees (Carreras and Marquez, 2001) [ 19 ], maximum entropy models (Berger et al., 1996) [ 14 ], and Hash Forest with a rule encoding method (T. Xia, 2020) [ 153 ], sometimes combining different learners (Sakkis et al., 2001) [ 116 ]. These approaches are preferable because the classifier is learned from training data rather than built by hand. Naïve Bayes is popular because of its performance despite its simplicity (Lewis, 1998) [ 67 ]. In text categorization, two types of generative models have been used (McCallum and Nigam, 1998) [ 77 ]; both assume a fixed vocabulary. In the first, a document is generated by first choosing a subset of the vocabulary and then using each selected word any number of times, at least once, irrespective of order. This is the multi-variate Bernoulli model: it records which words are used in a document, irrespective of their counts and order. In the second, a document is generated by choosing a set of word occurrences and arranging them in any order. This is the multinomial model; in addition to what the multi-variate Bernoulli model captures, it also records how many times each word is used. Most text categorization approaches to anti-spam email filtering have used the multi-variate Bernoulli model (Androutsopoulos et al., 2000) [ 5 ] [ 15 ].
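The multinomial model just described underlies the classic Naïve Bayes spam filter; below is a self-contained sketch with add-one smoothing (the training examples and labels are invented for illustration):

```python
import math
from collections import Counter

def train_nb(docs):
    """Multinomial Naive Bayes with add-one smoothing; docs = [(text, label)]."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)

    def classify(text):
        best_label, best_logp = None, -math.inf
        for label in word_counts:
            # log prior + sum of smoothed log likelihoods for each token
            logp = math.log(class_counts[label] / sum(class_counts.values()))
            total = sum(word_counts[label].values())
            for tok in text.lower().split():
                logp += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

    return classify

train = [("win money now", "spam"), ("free prize win", "spam"),
         ("meeting agenda attached", "ham"), ("lunch at noon", "ham")]
classify = train_nb(train)
print(classify("win free money"))  # 'spam'
```

A multi-variate Bernoulli version would instead score the presence or absence of each vocabulary word, ignoring repeat counts.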

Information Extraction

Information extraction is concerned with identifying phrases of interest in textual data. For many applications, extracting entities such as names, places, events, dates, times and prices is a powerful way of summarizing the information relevant to a user’s needs. In the case of a domain-specific search engine, the automatic identification of important information can increase the accuracy and efficiency of a directed search. Hidden Markov models (HMMs) have been used to extract the relevant fields of research papers; these extracted text segments are used to allow searches over specific fields, to present search results effectively, and to match references to papers. An everyday example is the pop-up ads on websites showing items you recently viewed in an online store, now offered with discounts. The two generative models of McCallum and Nigam (1998) [ 77 ] described above, the multi-variate Bernoulli model and the multinomial model, have likewise been applied in information retrieval.
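HMM-based field extraction relies on the Viterbi algorithm to recover the most likely hidden field sequence for the observed words. Below is a minimal sketch with an invented two-field model (AUTHOR vs TITLE) for tagging reference strings; all probabilities and words are illustrative, not from any trained system:

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for an observation sequence."""
    # Each cell holds (probability of best path ending here, that path).
    V = [{s: (start_p[s] * emit_p[s].get(observations[0], 1e-9), [s])
          for s in states}]
    for obs in observations[1:]:
        row = {}
        for s in states:
            prob, path = max(
                (V[-1][prev][0] * trans_p[prev][s] * emit_p[s].get(obs, 1e-9),
                 V[-1][prev][1] + [s])
                for prev in states)
            row[s] = (prob, path)
        V.append(row)
    return max(V[-1].values())[1]

# Hypothetical two-field model for tagging reference strings.
states = ["AUTHOR", "TITLE"]
start_p = {"AUTHOR": 0.8, "TITLE": 0.2}
trans_p = {"AUTHOR": {"AUTHOR": 0.6, "TITLE": 0.4},
           "TITLE": {"AUTHOR": 0.1, "TITLE": 0.9}}
emit_p = {"AUTHOR": {"smith": 0.5, "j.": 0.4},
          "TITLE": {"neural": 0.3, "parsing": 0.3}}
print(viterbi(["smith", "j.", "neural", "parsing"],
              states, start_p, trans_p, emit_p))
```

Real extraction systems estimate these probability tables from labeled reference strings rather than writing them by hand.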

Knowledge discovery has become an important area of research in recent years. Knowledge discovery research uses a variety of techniques to extract useful information from source documents, such as: parts-of-speech (POS) tagging; chunking or shallow parsing; stop-word removal (keywords that must be removed before processing documents); stemming (mapping words to a base form, via either dictionary-based stemming or Porter-style stemming (Porter, 1980) [ 103 ]; the former has higher accuracy but a higher implementation cost, while the latter has a lower implementation cost but is usually insufficient for IR); compound or statistical phrases (indexing multi-token units instead of single tokens); and word sense disambiguation (the task of identifying the correct sense of a word in context; when used for information retrieval, terms are replaced by their senses in the document vector).
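A crude suffix-stripping stemmer in the Porter style can be sketched as follows; the real Porter algorithm uses staged, measure-based rules, and the suffix list here is a small illustrative subset:

```python
# A small illustrative suffix list; the real Porter algorithm has staged,
# measure-based rules and handles far more cases.
SUFFIXES = ["ational", "ization", "fulness", "ness", "ing", "ed", "es", "s"]

def stem(word):
    """Strip the longest matching suffix, keeping a stem of at least 3 letters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([stem(w) for w in ["connected", "connecting", "kindness", "cats"]])
```

Dictionary-based stemming would instead look each word up in a lemma table, which is more accurate but costlier to build, matching the trade-off noted above.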

The extracted information can be applied for a variety of purposes, for example to prepare a summary, build databases, identify keywords, or classify text items according to pre-defined categories. For example, CONSTRUE, developed for Reuters, is used to classify news stories (Hayes, 1992) [ 54 ]. It has been suggested that while many IE systems can successfully extract terms from documents, acquiring the relations between those terms remains difficult. PROMETHEE is a system that extracts lexico-syntactic patterns relative to a specific conceptual relation (Morin, 1999) [ 89 ]. IE systems should work at many levels, from word recognition to discourse analysis at the level of the complete document. Bondale et al. (1999) [ 16 ] applied the Blank Slate Language Processor (BSLP) approach to the analysis of a real-life natural language corpus consisting of responses to open-ended questionnaires in the field of advertising.

There is a system called MITA (MetLife’s Intelligent Text Analyzer) (Glasgow et al., 1998) [ 48 ] that extracts information from life insurance applications. Ahonen et al. (1998) [ 1 ] suggested a mainstream framework for text mining that uses pragmatic and discourse-level analyses of text.

Summarization

Information overload is a genuine problem in the digital age: our reach and access to knowledge and information already exceed our capacity to understand them, and this trend is not slowing down. The ability to summarize data while keeping its meaning intact is therefore highly valuable. Summarization is important not just for allowing us to recognize and understand the important information in a large body of data; it is also used to capture deeper emotional meanings, for example when a company determines the general sentiment on social media about its latest product offering, a valuable marketing asset.

The types of text summarization depend on the number of documents, and the two important categories are single-document and multi-document summarization (Zajic et al., 2008 [ 159 ]; Fattah and Ren, 2009 [ 43 ]). Summaries can also be of two types: generic or query-focused (Gong and Liu, 2001 [ 50 ]; Dunlavy et al., 2007 [ 37 ]; Wan, 2008 [ 144 ]; Ouyang et al., 2011 [ 99 ]). The summarization task can be either supervised or unsupervised (Mani and Maybury, 1999 [ 74 ]; Fattah and Ren, 2009 [ 43 ]; Riedhammer et al., 2010 [ 110 ]). A supervised system requires training data for selecting relevant material from the documents, and a large amount of annotated data is needed for the learning techniques. A few techniques are as follows:

The Bayesian Sentence-based Topic Model (BSTM) uses both term-sentence and term-document associations for summarizing multiple documents (Wang et al., 2009 [ 146 ]).

Factorization with Given Bases (FGB) is a language model in which sentence bases are the given bases; it utilizes document-term and sentence-term matrices, and groups and summarizes the documents simultaneously (Wang et al., 2011 [ 147 ]).

Topic Aspect-Oriented Summarization (TAOS) is based on topic factors: various features that describe topics, such as capitalized words being used to represent entities. Different topics can have different aspects, and different preferences among features are used to represent different aspects (Fang et al., 2015 [ 42 ]).
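At the unsupervised end of the spectrum, a minimal extractive summarizer can score sentences by summed content-word frequency and keep the top ones; the stop-word list and example document below are illustrative only:

```python
from collections import Counter

# Illustrative stop-word list; real systems use much larger ones.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "it"}

def summarize(sentences, k=1):
    """Score sentences by summed content-word frequency; keep top-k in order."""
    freq = Counter(w for s in sentences for w in s.lower().split()
                   if w not in STOP_WORDS)
    scored = sorted(range(len(sentences)),
                    key=lambda i: sum(freq[w] for w in sentences[i].lower().split()
                                      if w not in STOP_WORDS),
                    reverse=True)
    keep = sorted(scored[:k])  # restore original document order
    return [sentences[i] for i in keep]

doc = ["neural models improve machine translation",
       "the weather is nice today",
       "machine translation needs large data"]
print(summarize(doc, k=2))
```

The topic-model techniques above (BSTM, FGB, TAOS) can be seen as principled replacements for this raw frequency score.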

Dialogue System

Dialogue systems are very prominent in real-world applications, ranging from providing support to performing particular actions. Support dialogue systems require context awareness, whereas performing a single action requires much less of it. Earlier dialogue systems focused on small applications such as home theater systems and utilized only the phonemic and lexical levels of language. Habitable dialogue systems that exploit all levels of a language offer the potential for fully automated dialogue (Liddy, 2001) [ 68 ]. This has led to systems that enable robots to interact with humans in natural language, such as Google’s Assistant, Microsoft’s Cortana, Apple’s Siri and Amazon’s Alexa.

NLP is applied in the medical field as well. The Linguistic String Project’s Medical Language Processor is one of the large-scale NLP projects in medicine [ 21 , 53 , 57 , 71 , 114 ]. The LSP-MLP helps physicians extract and summarize information on signs, symptoms, drug dosage and response data, with the aim of identifying possible side effects of any medicine while highlighting or flagging data items [ 114 ]. The National Library of Medicine is developing the Specialist System [ 78 , 79 , 80 , 82 , 84 ], which is expected to function as an information extraction tool for biomedical knowledge bases, particularly Medline abstracts; its lexicon was created using MeSH (Medical Subject Headings), Dorland’s Illustrated Medical Dictionary and general English dictionaries. The Centre d’Informatique Hospitaliere of the Hopital Cantonal de Geneve is working on an electronic archiving environment with NLP features [ 81 , 119 ]: in the first phase, patient records were archived; at a later stage the LSP-MLP was adapted for French [ 10 , 72 , 94 , 113 ]; and finally a full NLP system called RECIT [ 9 , 11 , 17 , 106 ] was developed using a method called Proximity Processing [ 88 ]. Its task was to implement a robust and multilingual system able to analyze and comprehend medical sentences and to convert free text into a language-independent knowledge representation [ 107 , 108 ]. Columbia University in New York has developed an NLP system called MedLEE (Medical Language Extraction and Encoding System) that identifies clinical information in narrative reports and transforms the textual information into a structured representation [ 45 ].

3.3 NLP in talk

We next discuss some of the recent NLP projects implemented by various companies:

ACE Powered GDPR Robot Launched by RAVN Systems [ 134 ]

RAVN Systems, a leading expert in Artificial Intelligence (AI), search and knowledge management solutions, announced the launch of a software robot powered by RAVN ACE (“Applied Cognitive Engine”) to help facilitate compliance with the GDPR (“General Data Protection Regulation”). The robot uses AI techniques to automatically analyze documents and other types of data in any business system subject to GDPR rules. It allows users to search, retrieve, flag, classify and report on data deemed to be sensitive under the GDPR, quickly and easily. Users can also identify personal data in documents, view feeds on the latest personal data requiring attention, and generate reports on data suggested for deletion or securing. RAVN’s GDPR robot can also speed up requests for information (Data Subject Access Requests, “DSARs”) in a simple and efficient way, removing the need for the very labor-intensive manual approach to these requests. Peter Wallqvist, CSO at RAVN Systems, commented: “GDPR compliance is of universal paramountcy as it will be exploited by any organization that controls and processes data concerning EU citizens.”

Link: http://markets.financialcontent.com/stocks/news/read/33888795/RAVN_Systems_Launch_the_ACE_Powered_GDPR_Robot

Eno A Natural Language Chatbot Launched by Capital One [ 56 ]

Capital One has announced a chatbot for customers called Eno. Eno is a natural language chatbot that people interact with through texting; Capital One claims it is the first natural language SMS chatbot from a U.S. bank that allows customers to ask questions using natural language. Customers can ask Eno questions about their savings and other topics via a text interface, and Eno responds in a way that feels like interacting with a human. This provides a different platform from brands that launch chatbots on Facebook Messenger or Skype: Capital One believed that Facebook has too much access to a person’s private information, which could create trouble under the privacy laws U.S. financial institutions work under. For example, a Facebook Page admin can access full transcripts of a bot’s conversations; were that the case here, admins could easily view customers’ personal banking information, which would not be acceptable.

Link: https://www.macobserver.com/analysis/capital-one-natural-language-chatbot-eno/

Future of BI in Natural Language Processing [ 140 ]

Several companies in the BI space are trying to get ahead of the trend and working hard to make data more friendly and easily accessible, but there is still a long way to go. BI will also become easier to access, as a GUI will no longer be needed: queries are increasingly made by text or voice command on smartphones. One of the most common examples today is Google telling you what tomorrow’s weather will be. But soon enough, we will be able to ask our personal data chatbot about customer sentiment today and how customers will feel about the brand next week, all while walking down the street. Today, NLP tends to be based on turning natural language into machine language. But as the technology matures, especially the AI component, the computer will get better at “understanding” the query and start to deliver answers rather than search results. Initially, a data chatbot asked “how have revenues changed over the last three quarters?” would probably return pages of data for you to analyze. But once it learns the semantic relations and inferences behind the question, it will be able to automatically perform the filtering and formulation necessary to provide an intelligible answer, rather than simply showing you data.

Link: http://www.smartdatacollective.com/eran-levy/489410/here-s-why-natural-language-processing-future-bi

Using Natural Language Processing and Network Analysis to Develop a Conceptual Framework for Medication Therapy Management Research [ 97 ]

This work describes a theory derivation process used to develop a conceptual framework for medication therapy management (MTM) research, with the MTM service model and the chronic care model selected as parent theories. Abstracts of review articles targeting medication therapy management in chronic disease care were retrieved from Ovid Medline (2000–2016). Unique concepts in each abstract were extracted using MetaMap and their pair-wise co-occurrences determined; this information was then used to construct a network graph of concept co-occurrence, which was further analyzed to identify content for the new conceptual model. 142 abstracts were analyzed. Medication adherence was the most studied drug therapy problem and co-occurred with concepts related to patient-centered interventions targeting self-management. The enhanced model consists of 65 concepts clustered into 14 constructs. The framework requires additional refinement and evaluation to determine its relevance and applicability across a broad audience, including underserved settings.

Link: https://www.ncbi.nlm.nih.gov/pubmed/28269895?dopt=Abstract

Meet the Pilot, world’s first language translating earbuds [ 96 ]

The world’s first smart earpiece, Pilot, will soon translate over 15 languages. According to Springwise, Waverly Labs’ Pilot can already translate five spoken languages (English, French, Italian, Portuguese and Spanish) and seven additional written languages (German, Hindi, Russian, Japanese, Arabic, Korean and Mandarin Chinese). The Pilot earpiece is connected via Bluetooth to the Pilot speech translation app, which uses speech recognition, machine translation, machine learning and speech synthesis technology; the user hears the translated version of the speech through the second earpiece. Moreover, a conversation need not take place between only two people: multiple users can join in and converse as a group. As of now, the user may experience a lag of a few seconds between speech and translation, which Waverly Labs is working to reduce. The Pilot earpiece will be available from September but can be pre-ordered now for $249. The earpieces can also be used for streaming music, answering voice calls and getting audio notifications.

Link: https://www.indiegogo.com/projects/meet-the-pilot-smart-earpiece-language-translator-headphones-travel#/

4 Datasets in NLP and state-of-the-art models

The objective of this section is to present the various datasets used in NLP and some state-of-the-art models in NLP.

4.1 Datasets in NLP

A corpus is a collection of linguistic data, either compiled from written texts or transcribed from recorded speech. Corpora are intended primarily for testing linguistic hypotheses, e.g., to determine how a certain sound, word, or syntactic construction is used across a culture or language. There are various types of corpus. In an annotated corpus, the implicit information in the plain text has been made explicit by specific annotations, while an un-annotated corpus contains plain text in its raw state. Different languages can be compared using a comparable corpus. Monitor corpora are non-finite collections of texts, mostly used in lexicography. A multilingual corpus contains small collections of monolingual corpora based on the same sampling procedure and categories for different languages. A parallel corpus contains texts in one language and their translations into other languages, aligned sentence by sentence. A reference corpus contains texts of spoken (formal and informal) and written (formal and informal) language representing various social and situational contexts. A speech corpus contains recorded speech together with transcriptions of the recordings and the time at which each word occurs. There are various datasets available for natural language processing; some of these are listed below for different use cases:

Sentiment Analysis: Sentiment analysis is a rapidly expanding field of natural language processing (NLP) used in a variety of fields such as politics, business etc. Majorly used datasets for sentiment analysis are:

Stanford Sentiment Treebank (SST): Socher et al. introduced SST, containing sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences from movie reviews, posing novel challenges for sentiment compositionality [ 127 ].

Sentiment140: It contains 1.6 million tweets annotated with negative, neutral and positive labels.

Paper Reviews: It provides reviews of computing and informatics conferences written in English and Spanish languages. It has 405 reviews which are evaluated on a 5-point scale ranging from very negative to very positive.

IMDB: for natural language processing, text analytics and sentiment analysis, this dataset offers thousands of movie reviews split into training and test sets. It was introduced by Maas et al. in 2011 [ 73 ].

The corpus “Sentiraama” was generated by G. Rama Rohit Reddy of the Language Technologies Research Centre, KCIS, IIIT Hyderabad. The corpus is divided into four datasets, each annotated with a two-value scale distinguishing positive and negative sentiment at the document level. It contains data from a variety of fields, including book reviews, product reviews, movie reviews and song lyrics, and the annotators meticulously followed the annotation technique for each. The folder “Song Lyrics” in the corpus contains 339 Telugu song lyrics written in Telugu script [ 121 ].

Language Modelling: Language models analyse text data to calculate word probabilities. They use an algorithm to interpret the data that establishes rules for context in natural language, and then use those rules to accurately predict or construct new sentences: the model learns the basic characteristics and features of the language and applies them to new phrases. Majorly used datasets for language modeling are as follows:

Salesforce’s WikiText-103 dataset has 103 million tokens collected from 28,475 featured articles from Wikipedia.

WikiText-2 is a scaled-down version of WikiText-103. It contains 2 million tokens with a vocabulary size of 33,278.

The Penn Treebank portion of the Wall Street Journal corpus includes 929,000 tokens for training, 73,000 for validation and 82,000 for testing. Its context is limited, since it comprises sentences rather than paragraphs [ 76 ].

The Ministry of Electronics and Information Technology’s Technology Development Programme for Indian Languages (TDIL) launched its own data distribution portal ( www.tdil-dc.in ) which has cataloged datasets [ 24 ].

Machine Translation: The task of converting the text of one natural language into another language while keeping the sense of the input text is known as machine translation. Majorly used datasets are as follows:

Tatoeba is a collection of multilingual sentence pairs. Each line of the dataset is a tab-delimited pair of an English text sequence and its translated French text sequence; each text sequence might be as simple as a single sentence or as complex as a paragraph of several sentences.

The Europarl parallel corpus is derived from the European Parliament’s proceedings. It is available in 21 European languages [ 40 ].

WMT14 provides machine translation pairs for English-German and English-French; these datasets comprise 4.5 million and 35 million sentence pairs, respectively. Byte-pair encoding with 32K merge operations is used to encode the phrases.

There are around 160,000 sentence pairs in IWSLT 14. The dataset includes pairs in the English-German (En-De) and German-English (De-En) directions. There are around 200K training sentence pairs in the IWSLT 13 dataset.

The IIT Bombay English-Hindi corpus comprises parallel corpora for English-Hindi as well as monolingual Hindi corpora gathered from several existing sources and corpora generated over time at IIT Bombay’s Centre for Indian Language Technology.

Question Answering System: Question answering systems provide real-time responses which are widely used in customer care services. The datasets used for dialogue system/question answering system are as follows:

Stanford Question Answering Dataset (SQuAD): it is a reading comprehension dataset made up of questions posed by crowd workers on a collection of Wikipedia articles.

Natural Questions: It is a large-scale corpus presented by Google used for training and assessing open-domain question answering systems. It includes 300,000 naturally occurring queries as well as human-annotated responses from Wikipedia pages for use in QA system training.

Question Answering in Context (QuAC): This dataset is used for modeling, understanding, and participating in information-seeking conversation. Each instance consists of an interactive discussion between two crowd workers: a student who asks a series of open-ended questions about a hidden Wikipedia text, and a teacher who responds with brief extracts from that text.

Neural models are overtaking traditional models in NLP [ 64 , 127 ]. In [ 64 ], the authors used a CNN (convolutional neural network) for sentiment analysis of movie reviews and achieved 81.5% accuracy, showing that CNNs are an effective replacement for earlier state-of-the-art methods. The authors of [ 127 ] applied a Recursive Neural Tensor Network to the Stanford Sentiment Treebank (SST) for single-sentence sentiment analysis; this model improves sentence-classification accuracy by 5.4% over traditional NLP models. The authors of [ 135 ] proposed a hybrid recurrent neural network and Transformer model for sentiment analysis. Tested on three datasets (Twitter US Airline Sentiment, IMDB, and Sentiment 140), it achieved F1 scores of 91%, 93%, and 90%, respectively, outperforming state-of-the-art methods.

Santoro et al. [ 118 ] introduced a relational recurrent neural network with the capacity to classify information and perform complex reasoning based on the interactions between compartmentalized pieces of information, handled by a relational memory core (RMC). The model was tested for language modeling on three datasets (GigaWord, Project Gutenberg, and WikiText-103) and compared against traditional approaches to relational reasoning over compartmentalized information; the RMC achieved improved performance.

Merity et al. [ 86 ] extended conventional word-level language models based on the Quasi-Recurrent Neural Network and LSTM to handle both character- and word-level granularity. They tuned parameters for character-level modeling on the Penn Treebank dataset and for word-level modeling on WikiText-103. In both cases, their model outperformed the state-of-the-art methods.

Luong et al. [ 70 ] applied neural machine translation to English-to-French translation on the WMT14 dataset. The model improved the bilingual evaluation understudy (BLEU) score by up to 2.8 points over various neural machine translation systems and outperformed the commonly used MT systems on WMT14.

Fan et al. [ 41 ] introduced a gradient-based neural architecture search algorithm that automatically finds architectures outperforming the Transformer and conventional NMT models. Tested on WMT14 (English-German translation), IWSLT14 (German-English translation), and WMT18 (Finnish-to-English translation), their model achieved 30.1, 36.1, and 26.4 BLEU points, respectively, surpassing Transformer baselines.

Wiese et al. [ 150 ] introduced a deep learning approach based on domain-adaptation techniques for biomedical question answering. Their model achieved state-of-the-art performance on biomedical question answering tasks.

Seunghak et al. [ 158 ] designed a Memory-Augmented Machine Comprehension Network (MAMCN) to handle long-range dependencies in reading comprehension. The model achieved state-of-the-art performance at the document level on the TriviaQA and QUASAR-T datasets and at the paragraph level on SQuAD.

Xie et al. [ 154 ] proposed a neural architecture in which candidate answers and their representation learning are constituent-centric, guided by a parse tree. This reduces the search space of candidate answers while preserving the hierarchical, syntactic, and compositional structure among constituents. On SQuAD, the model delivers state-of-the-art performance.

4.2 State-of-the-art models in NLP

The rationalist, or symbolic, approach assumes that a crucial part of the knowledge in the human mind is not derived from the senses but is fixed in advance, presumably by genetic inheritance; Noam Chomsky was the strongest advocate of this view. It was believed that machines could be made to function like the human brain by providing fundamental knowledge and reasoning mechanisms: linguistic knowledge is directly encoded in rules or other forms of representation, which supports the automatic processing of natural language [ 92 ]. Statistical and machine-learning approaches instead develop algorithms that allow a program to infer patterns from data: an iterative learning phase optimizes the model’s numerical parameters against a numerical performance measure.

Machine-learning models can be predominantly categorized as either generative or discriminative. Generative methods model rich probability distributions and can therefore generate synthetic data; discriminative methods are more direct, estimating posterior probabilities from observations. Srihari [ 129 ] illustrates a generative model as one that identifies an unknown speaker’s language by matching against deep knowledge of numerous languages, whereas discriminative methods rely on a less knowledge-intensive approach that exploits distinctions between languages. Generative models can become troublesome when many features are used, while discriminative models allow the use of more features [ 38 ]. Examples of discriminative methods are logistic regression and conditional random fields (CRFs); examples of generative methods are naive Bayes classifiers and hidden Markov models (HMMs).

Naive Bayes Classifiers

Naive Bayes is a probabilistic algorithm based on probability theory and Bayes’ theorem, used to predict the tag of a text such as a news article or a customer review. It calculates the probability of each tag for the given text and returns the tag with the highest probability. Bayes’ theorem predicts the probability of a feature based on prior knowledge of conditions that might be related to that feature. In NLP, naive Bayes classifiers are applied to common tasks such as segmentation and translation, but they have also been explored in unusual areas such as segmentation for infant learning and distinguishing opinion documents from factual ones. Anggraeni et al. (2019) [ 61 ] used ML and AI to create a question-and-answer system for retrieving information about hearing loss: their I-Chat Bot understands user input and provides appropriate responses about the hearing impairments in question. A weakness of naive Bayes is that it can yield zero probabilities when words in the test data for a certain class are absent from the training data; smoothing techniques such as add-one (Laplace) smoothing address this.
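A minimal multinomial naive Bayes classifier, sketched below on a made-up four-review corpus, shows how add-one (Laplace) smoothing sidesteps the zero-probability problem for words unseen in training:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Train a multinomial naive Bayes model.

    docs: list of (word_list, label) pairs. Returns per-class document
    counts, per-class word counts, and the overall vocabulary.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict_nb(model, words):
    """Return the label with the highest (log) posterior probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for label in class_counts:
        lp = math.log(class_counts[label] / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            # Add-one smoothing: unseen words get count 0 + 1, never zero probability.
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Made-up training reviews (not from any real dataset).
docs = [
    ("great movie loved it".split(), "pos"),
    ("wonderful acting great plot".split(), "pos"),
    ("terrible boring movie".split(), "neg"),
    ("awful plot hated it".split(), "neg"),
]
model = train_nb(docs)
print(predict_nb(model, "great plot".split()))  # -> pos
```

Working in log space avoids numerical underflow when sentences contain many words.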

Hidden Markov Model (HMM)

An HMM is a system that shifts between several states, generating a feasible output symbol with each switch. The sets of possible states and output symbols may be large, but they are finite and known; the outputs are observable, while the system’s internal states are hidden. Three classic problems can be solved with HMMs: inference (given a sequence of output symbols, compute the probabilities of one or more candidate state sequences), decoding (find the state-switch sequence most likely to have generated a particular output-symbol sequence), and training (given output-symbol data, estimate the state-switch and output probabilities that best fit the data).

Hidden Markov models are extensively used for speech recognition, where the output sequence is matched to a sequence of individual phonemes. HMMs are not restricted to this application; they are also used for bioinformatics problems such as multiple sequence alignment [ 128 ]. Sonnhammer mentioned that Pfam holds multiple alignments and hidden Markov model-based profiles (HMM-profiles) of entire protein domains. Domain boundaries, family membership, and alignments are determined semi-automatically based on expert knowledge, sequence similarity, other protein family databases, and the ability of HMM-profiles to correctly identify and align the members. HMMs may also be used for a variety of NLP applications, including word prediction, sentence production, quality assurance, and intrusion detection systems [ 133 ].
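The decoding problem, finding the hidden-state sequence most likely to have produced an observed output sequence, is classically solved with the Viterbi algorithm. The sketch below uses a toy part-of-speech HMM with made-up transition and emission probabilities:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # V[t][s] = (best probability of reaching state s at step t, best path there)
    V = [{s: (start_p[s] * emit_p[s].get(obs[0], 0.0), [s]) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, path = max(
                (V[t - 1][prev][0] * trans_p[prev][s] * emit_p[s].get(obs[t], 0.0),
                 V[t - 1][prev][1] + [s])
                for prev in states
            )
            V[t][s] = (prob, path)
    return max(V[-1].values())

# Toy two-state POS tagger with hypothetical probabilities.
states = ("NOUN", "VERB")
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
           "VERB": {"NOUN": 0.6, "VERB": 0.4}}
emit_p = {"NOUN": {"dogs": 0.5, "bark": 0.1},
          "VERB": {"dogs": 0.1, "bark": 0.6}}
prob, path = viterbi(["dogs", "bark"], states, start_p, trans_p, emit_p)
print(path)  # -> ['NOUN', 'VERB']
```

Dynamic programming keeps only the best path into each state at each step, so the cost is linear in the sequence length rather than exponential.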

Neural Network

Earlier machine-learning techniques such as naive Bayes and HMMs dominated NLP, but after 2010 neural networks transformed and enhanced NLP tasks by learning multilevel features. The major use of neural networks in NLP is word embedding, where words are represented as vectors; similar words can be recognized by their closeness in this vector space. Neural networks are also used in information retrieval, text summarization, text classification, machine translation, sentiment analysis, and speech recognition. The initial focus was on feedforward [ 49 ] and CNN (convolutional neural network) architectures [ 69 ], but researchers later adopted recurrent neural networks to capture the context of a word with respect to the surrounding words in a sentence. LSTM (long short-term memory), a variant of the RNN, is used in tasks such as word prediction and sentence-topic prediction [ 47 ]. To observe word arrangements in both forward and backward directions, researchers have explored bidirectional LSTMs [ 59 ]. For machine translation, an encoder-decoder architecture is used, since the lengths of the input and output sequences are not known in advance. Neural networks can also anticipate states that have not yet been seen, such as future states for which predictors exist, whereas an HMM predicts hidden states.
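The idea that similar words lie close together in the embedding space can be illustrated with cosine similarity. The sketch below uses made-up three-dimensional vectors; real models such as word2vec or GloVe learn hundreds of dimensions from large corpora:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (1 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy 3-dimensional embeddings with invented values, for illustration only.
emb = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "apple": [0.1, 0.2, 0.9],
}
print(cosine(emb["king"], emb["queen"]) > cosine(emb["king"], emb["apple"]))  # True
```

Because cosine similarity measures the angle between vectors rather than their magnitude, it is the standard choice for comparing word embeddings.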

Bidirectional Encoder Representations from Transformers (BERT) is a model pre-trained on the unlabeled text of BookCorpus and English Wikipedia. It can be fine-tuned to capture context for various NLP tasks such as question answering, sentiment analysis, text classification, sentence embedding, and interpreting ambiguity in text [ 25 , 33 , 90 , 148 ]. Earlier language models examined text in only one direction, predicting the next word for sentence generation, whereas BERT examines text in both directions simultaneously for better language understanding. BERT provides a contextual embedding for each word in the text, unlike context-free models (word2vec and GloVe). For example, in the sentences “he is going to the river bank for a walk” and “he is going to the bank to withdraw some money”, word2vec has a single vector representation for “bank” in both sentences, whereas BERT produces different vector representations. Muller et al. [ 90 ] used the BERT model to analyze tweets about COVID-19 content, and Chalkidis et al. [ 20 ] explored its use in the legal domain.

Because BERT processes at most 512 tokens, a long text sequence must be divided into multiple shorter sequences of up to 512 tokens each. This is a limitation of BERT: it cannot handle long text sequences directly.
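A common workaround splits long inputs into overlapping windows of at most 512 tokens, so each window fits the model while context spanning a window boundary is partially preserved. The sketch below works on plain token lists; a real pipeline would count BERT’s WordPiece tokens instead:

```python
def chunk_tokens(tokens, max_len=512, stride=64):
    """Split a token list into overlapping windows of at most max_len tokens.

    Consecutive windows overlap by `stride` tokens so that context crossing
    a window boundary is not lost entirely.
    """
    if len(tokens) <= max_len:
        return [tokens]
    chunks, start = [], 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
        start += max_len - stride
    return chunks

# A hypothetical 1200-token document.
tokens = [f"tok{i}" for i in range(1200)]
chunks = chunk_tokens(tokens, max_len=512, stride=64)
print(len(chunks), [len(c) for c in chunks])
```

Per-chunk predictions are then aggregated, for example by averaging classification scores or taking the best-scoring answer span across chunks.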

5 Evaluation metrics and challenges

This section discusses the evaluation metrics used to assess model performance and the challenges involved.

5.1 Evaluation metrics

Since the number of labels in most classification problems is fixed, it is easy to determine the score for each class and, as a result, the loss against the ground truth. In image-generation problems, the output resolution and ground truth are likewise fixed, so the loss can be calculated at the pixel level. In NLP, however, even when the output format is predetermined, the output length is not, because a single statement can be expressed in multiple ways without changing its intent and meaning. This makes well-designed evaluation metrics essential for measuring a model’s performance.

BLEU (BiLingual Evaluation Understudy) score: Each word in the output sentence scores 1 if it appears in any of the reference sentences and 0 if it does not. The number of matched words is then divided by the total number of words in the output sentence, normalizing the count to a value between 0 and 1. For example, if the ground truth is “He is playing chess in the backyard” and the output sentences are S1: “He is playing tennis in the backyard”, S2: “He is playing badminton in the backyard”, S3: “He is playing movie in the backyard”, and S4: “backyard backyard backyard backyard backyard backyard backyard”, then S1, S2, and S3 all score 6/7, even though S1 and S3 do not convey the same information. This is because BLEU treats every word in a sentence as contributing equally to its meaning, which is not the case in real-world scenarios. Using a combination of unigrams, bigrams, and higher-order n-grams captures word order, and limiting how many times each word is counted, based on how many times it appears in each reference sentence, prevents scores inflated by excessive repetition.
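The clipped unigram counting described above can be sketched in a few lines (full BLEU additionally combines higher-order n-grams and a brevity penalty):

```python
from collections import Counter

def clipped_unigram_precision(candidate, reference):
    """Unigram precision with each word's count clipped to its reference count."""
    cand = candidate.split()
    ref_counts = Counter(reference.split())
    matched = sum(min(c, ref_counts[w]) for w, c in Counter(cand).items())
    return matched / len(cand)

ref = "He is playing chess in the backyard"
s1 = "He is playing tennis in the backyard"
s4 = "backyard backyard backyard backyard backyard backyard backyard"
print(clipped_unigram_precision(s1, ref))  # 6/7, as in the worked example
print(clipped_unigram_precision(s4, ref))  # clipping caps the repeated word at 1/7
```

Without clipping, S4 would score a perfect 7/7, since every output word appears somewhere in the reference; clipping reduces it to 1/7.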

GLUE (General Language Understanding Evaluation) score: Previously, NLP models were almost always built to perform well on a single job. Models such as LSTM and Bi-LSTM were trained solely for one task and rarely generalized to others; a model trained for named entity recognition, for instance, could not be reused for textual entailment. GLUE is a collection of datasets for training, assessing, and comparing NLP models. It includes nine diverse task datasets designed to test a model’s language understanding: single-sentence tasks, similarity and paraphrase tasks, and inference tasks. Rather than scoring a single task, GLUE tests the model across this variety to obtain a comprehensive assessment. For example, in sentiment analysis of customer reviews, we might want to analyze ambiguous reviews and determine which product the client is referring to. A model that has acquired a good general “knowledge” of language through pre-training has an advantage when meeting a given task. With GLUE, researchers can evaluate their model on all nine tasks; the final performance score is the average of the nine task scores. It makes little difference how the model looks or works internally, as long as it can analyze inputs and predict outcomes for all the tasks.

Keeping these metrics in mind helps to evaluate the performance of an NLP model on a particular task or across a variety of tasks.

5.2 Challenges

The applications of NLP are growing day by day, and despite the large amount of recent work, new challenges keep arising. Some of the common ones are the following. Contextual words and phrases: the same words and phrases can have different meanings in different sentences, which is easy for humans to understand but challenging to process automatically. Synonyms pose a similar problem: humans use many different words to express the same idea, and words at different levels of intensity, such as large, huge, and big, may be used by different people, making it challenging to design algorithms that handle all these variations. Homonyms, words that are pronounced the same but have different definitions, are problematic for question answering and speech-to-text applications because the input is not in written form. Sarcasm and irony can be understood by humans as meaning the opposite of what is said, so designing models to deal with such sentences is a genuinely challenging NLP task. Ambiguity, where a sentence can be interpreted in more than one way, is another area where accuracy can still be improved. Informal phrases, expressions, idioms, and culture-specific lingo make it difficult to design models intended for broad use; having a lot of data for training and regular updating can improve models, but handling words whose meaning varies across geographic areas remains a hard problem. Similar issues arise across domains: a word or sentence may mean one thing in the education industry and something different in health, law, or defense.
Consequently, NLP models may work well for an individual domain or geographic area, but these challenges must be tackled before broad use. Misspelled or misused words create further problems; although autocorrect and grammar-correction applications have improved greatly through continuous development, predicting the intention of a writer from a specific domain and geographic area, while accounting for sarcasm, expressions, and informal phrases, remains a major challenge. For the most widely used languages, NLP models are doing very well and improving day by day, but there is still a need for models that serve all users rather than requiring specific knowledge of a particular language and technology. One may further refer to Sharifirad and Matwin (2019) [ 123 ] for the classification of different online harassment categories and their challenges, Baclic et al. (2020) [ 6 ] and Wong et al. (2018) [ 151 ] for challenges and opportunities in public health, Kang et al. (2020) [ 63 ] for a detailed literature survey and the technological challenges relevant to management research, and the recent review by Alshemali and Kalita (2020) [ 3 ] and the references cited therein.

Recently, models dealing with visual commonsense reasoning [ 31 ] and NLP have attracted the attention of several researchers, and this appears to be a promising and challenging area. These models extract information from an image or video using a visual reasoning paradigm, inferring what humans can infer beyond what is visually obvious, such as objects’ functions, people’s intents, and mental states. In this direction, Wen and Peng (2020) [ 149 ] proposed a model that captures knowledge from different perspectives and perceives common sense in advance; their experiments on the visual commonsense reasoning dataset VCR show satisfactory and effective results. The work of Peng and Chi (2019) [ 102 ], which proposes a Domain Adaptation with Scene Graph approach to transfer knowledge from a source domain in order to improve cross-media retrieval in a target domain, and that of Yen et al. (2019) [ 155 ], are also useful for further exploring NLP and its relevant domains.

6 Conclusion

This paper was written with three objectives. The first is to give insights into the important terminology of NLP and NLG, which can be useful for readers interested in starting their careers in NLP and work relevant to its applications. The second focuses on the history, applications, and recent developments in the field of NLP. The third is to discuss the datasets, approaches, and evaluation metrics used in NLP. Relevant work in the existing literature, together with its findings and some of the important applications and projects in NLP, is also discussed. The last two objectives may serve as a literature survey for readers already working in NLP and related fields and may motivate further exploration of the areas mentioned in this paper. Notably, although a great deal of work on natural language processing is available in literature surveys (see [ 15 , 32 , 63 , 98 , 133 , 151 ], each focusing on one domain, such as deep-learning techniques in NLP, email spam filtering, medication safety, management research, intrusion detection, or the Gujarati language), there is still little work on regional languages, which can be the focus of future research.

Change history

25 July 2022

Affiliation 3 has been added into the online PDF.

Ahonen H, Heinonen O, Klemettinen M, Verkamo AI (1998) Applying data mining techniques for descriptive phrase extraction in digital document collections. In research and technology advances in digital libraries, 1998. ADL 98. Proceedings. IEEE international forum on (pp. 2-11). IEEE

Alshawi H (1992) The core language engine. MIT press

Alshemali B, Kalita J (2020) Improving the reliability of deep neural networks in NLP: A review. Knowl-Based Syst 191:105210

Andreev ND (1967) The intermediary language as the focal point of machine translation. In: Booth AD (ed) Machine translation. North Holland Publishing Company, Amsterdam, pp 3–27

Androutsopoulos I, Paliouras G, Karkaletsis V, Sakkis G, Spyropoulos CD, Stamatopoulos P (2000) Learning to filter spam e-mail: A comparison of a naive bayesian and a memory-based approach. arXiv preprint cs/0009009

Baclic O, Tunis M, Young K, Doan C, Swerdfeger H, Schonfeld J (2020) Artificial intelligence in public health: challenges and opportunities for public health made possible by advances in natural language processing. Can Commun Dis Rep 46(6):161

Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In ICLR 2015

Bangalore S, Rambow O, Whittaker S (2000) Evaluation metrics for generation. In proceedings of the first international conference on natural language generation-volume 14 (pp. 1-8). Assoc Comput Linguist

Baud RH, Rassinoux AM, Scherrer JR (1991) Knowledge representation of discharge summaries. In AIME 91 (pp. 173–182). Springer, Berlin Heidelberg

Baud RH, Rassinoux AM, Scherrer JR (1992) Natural language processing and semantical representation of medical texts. Methods Inf Med 31(2):117–125

Baud RH, Alpay L, Lovis C (1994) Let’s meet the users with natural language understanding. Knowledge and Decisions in Health Telematics: The Next Decade 12:103

Bengio Y, Ducharme R, Vincent P (2001) A neural probabilistic language model. Proceedings of NIPS

Benson E, Haghighi A, Barzilay R (2011) Event discovery in social media feeds. In proceedings of the 49th annual meeting of the Association for Computational Linguistics: human language technologies-volume 1 (pp. 389-398). Assoc Comput Linguist

Berger AL, Della Pietra SA, Della Pietra VJ (1996) A maximum entropy approach to natural language processing. Computational Linguistics 22(1):39–71

Blanzieri E, Bryl A (2008) A survey of learning-based techniques of email spam filtering. Artif Intell Rev 29(1):63–92

Bondale N, Maloor P, Vaidyanathan A, Sengupta S, Rao PV (1999) Extraction of information from open-ended questionnaires using natural language processing techniques. Computer Science and Informatics 29(2):15–22

Borst F, Sager N, Nhàn NT, Su Y, Lyman M, Tick LJ, ..., Scherrer JR (1989) Analyse automatique de comptes rendus d'hospitalisation. In Degoulet P, Stephan JC, Venot A, Yvon PJ, rédacteurs. Informatique et Santé, Informatique et Gestion des Unités de Soins, Comptes Rendus du Colloque AIM-IF, Paris (pp. 246–56). [5]

Briscoe EJ, Grover C, Boguraev B, Carroll J (1987) A formalism and environment for the development of a large grammar of English. IJCAI 87:703–708

Carreras X, Marquez L (2001) Boosting trees for anti-spam email filtering. arXiv preprint cs/0109015

Chalkidis I, Fergadiotis M, Malakasiotis P, Aletras N, Androutsopoulos I (2020) LEGAL-BERT: the muppets straight out of law school. arXiv preprint arXiv:2010.02559

Chi EC, Lyman MS, Sager N, Friedman C, Macleod C (1985) A database of computer-structured narrative: methods of computing complex relations. In proceedings of the annual symposium on computer application in medical care (p. 221). Am Med Inform Assoc

Cho K, Van Merriënboer B, Bahdanau D, Bengio Y, (2014) On the properties of neural machine translation: encoder-decoder approaches. arXiv preprint arXiv:1409.1259

Chomsky N (1965) Aspects of the theory of syntax. MIT Press, Cambridge, Massachusetts

Choudhary N (2021) LDC-IL: the Indian repository of resources for language technology. Lang Resources & Evaluation 55:855–867. https://doi.org/10.1007/s10579-020-09523-3

Chouikhi H, Chniter H, Jarray F (2021) Arabic sentiment analysis using BERT model. In international conference on computational collective intelligence (pp. 621-632). Springer, Cham

Chung J, Gulcehre C, Cho K, Bengio Y, (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555

Cohen WW (1996) Learning rules that classify e-mail. In AAAI spring symposium on machine learning in information access (Vol. 18, p. 25)

Cohen PR, Morgan J, Ramsay AM (2002) Intention in communication, Am J Psychol 104(4)

Collobert R, Weston J (2008) A unified architecture for natural language processing. In proceedings of the 25th international conference on machine learning (pp. 160–167)

Dai Z, Yang Z, Yang Y, Carbonell J, Le QV, Salakhutdinov R, (2019) Transformer-xl: attentive language models beyond a fixed-length context. arXiv preprint arXiv:1901.02860

Davis E, Marcus G (2015) Commonsense reasoning and commonsense knowledge in artificial intelligence. Commun ACM 58(9):92–103

Desai NP, Dabhi VK (2022) Resources and components for Gujarati NLP systems: a survey. Artif Intell Rev:1–19

Devlin J, Chang MW, Lee K, Toutanova K, (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805

Diab M, Hacioglu K, Jurafsky D (2004) Automatic tagging of Arabic text: From raw text to base phrase chunks. In Proceedings of HLT-NAACL 2004: Short papers (pp. 149–152). Assoc Computat Linguist

Doddington G (2002) Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In proceedings of the second international conference on human language technology research (pp. 138-145). Morgan Kaufmann publishers Inc

Drucker H, Wu D, Vapnik VN (1999) Support vector machines for spam categorization. IEEE Trans Neural Netw 10(5):1048–1054

Dunlavy DM, O’Leary DP, Conroy JM, Schlesinger JD (2007) QCS: A system for querying, clustering and summarizing documents. Inf Process Manag 43(6):1588–1605

Elkan C (2008) Log-Linear Models and Conditional Random Fields. http://cseweb.ucsd.edu/welkan/250B/cikmtutorial.pdf accessed 28 Jun 2017.

Emele MC, Dorna M (1998) Ambiguity preserving machine translation using packed representations. In proceedings of the 36th annual meeting of the Association for Computational Linguistics and 17th international conference on computational linguistics-volume 1 (pp. 365-371). Association for Computational Linguistics

Koehn P (2005) Europarl: a parallel corpus for statistical machine translation. MT Summit 2005

Fan Y, Tian F, Xia Y, Qin T, Li XY, Liu TY (2020) Searching better architectures for neural machine translation. IEEE/ACM Transactions on Audio, Speech, and Language Processing 28:1574–1585

Fang H, Lu W, Wu F, Zhang Y, Shang X, Shao J, Zhuang Y (2015) Topic aspect-oriented summarization via group selection. Neurocomputing 149:1613–1619

Fattah MA, Ren F (2009) GA, MR, FFNN, PNN and GMM based models for automatic text summarization. Comput Speech Lang 23(1):126–144

Feldman S (1999) NLP meets the jabberwocky: natural language processing in information retrieval. Online-Weston Then Wilton 23:62–73

Friedman C, Cimino JJ, Johnson SB (1993) A conceptual model for clinical radiology reports. In proceedings of the annual symposium on computer application in medical care (p. 829). Am Med Inform Assoc

Gao T, Dontcheva M, Adar E, Liu Z, Karahalios K DataTone: managing ambiguity in natural language interfaces for data visualization, UIST ‘15: proceedings of the 28th annual ACM symposium on User Interface Software & Technology, November 2015, 489–500, https://doi.org/10.1145/2807442.2807478

Ghosh S, Vinyals O, Strope B, Roy S, Dean T, Heck L (2016) Contextual lstm (clstm) models for large scale nlp tasks. arXiv preprint arXiv:1602.06291

Glasgow B, Mandell A, Binney D, Ghemri L, Fisher D (1998) MITA: an information-extraction approach to the analysis of free-form text in life insurance applications. AI Mag 19(1):59

Goldberg Y (2017) Neural network methods for natural language processing. Synthesis lectures on human language technologies 10(1):1–309

Gong Y, Liu X (2001) Generic text summarization using relevance measure and latent semantic analysis. In proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval (pp. 19-25). ACM

Green Jr, BF, Wolf AK, Chomsky C, Laughery K (1961) Baseball: an automatic question-answerer. In papers presented at the may 9-11, 1961, western joint IRE-AIEE-ACM computer conference (pp. 219-224). ACM

Greff K, Srivastava RK, Koutník J, Steunebrink BR, Schmidhuber J (2016) LSTM: A search space odyssey. IEEE transactions on neural networks and learning systems 28(10):2222–2232

Grishman R, Sager N, Raze C, Bookchin B (1973) The linguistic string parser. In proceedings of the June 4-8, 1973, national computer conference and exposition (pp. 427-434). ACM

Hayes PJ (1992) Intelligent high-volume text processing using shallow, domain-specific techniques. Text-based intelligent systems: current research and practice in information extraction and retrieval, 227-242.

Hendrix GG, Sacerdoti ED, Sagalowicz D, Slocum J (1978) Developing a natural language interface to complex data. ACM Transactions on Database Systems (TODS) 3(2):105–147

“Here’s why natural language processing is the future of BI” (2017) SmartData Collective. N.p., n.d. Web. 19

Hirschman L, Grishman R, Sager N (1976) From text to structured information: automatic processing of medical reports. In proceedings of the June 7-10, 1976, national computer conference and exposition (pp. 267-275). ACM

Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

Huang Z, Xu W, Yu K (2015) Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991

Hutchins WJ (1986) Machine translation: past, present, future (p. 66). Ellis Horwood, Chichester

Jurafsky D, Martin J (2008) H. Speech and language processing. 2nd edn. Prentice-Hall, Englewood Cliffs, NJ

Kamp H, Reyle U (1993) Tense and aspect. In from discourse to logic (pp. 483-689). Springer Netherlands

Kang Y, Cai Z, Tan CW, Huang Q, Liu H (2020) Natural language processing (NLP) in management research: A literature review. Journal of Management Analytics 7(2):139–172

Kim Y. (2014) Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882

Knight K, Langkilde I (2000) Preserving ambiguities in generation via automata intersection. In AAAI/IAAI (pp. 697-702)

Lass R (1998) Phonology: An Introduction to Basic Concepts. Cambridge, UK; New York; Melbourne, Australia: Cambridge University Press. p. 1. ISBN 978–0–521-23728-4. Retrieved 8 January 2011Paperback ISBN 0–521–28183-0

Lewis DD (1998) Naive (Bayes) at forty: The independence assumption in information retrieval. In European conference on machine learning (pp. 4–15). Springer, Berlin Heidelberg

Liddy ED (2001). Natural language processing

Lopez MM, Kalita J (2017) Deep learning applied to NLP. arXiv preprint arXiv:1703.03091

Luong MT, Sutskever I, Le Q V, Vinyals O, Zaremba W (2014) Addressing the rare word problem in neural machine translation. arXiv preprint arXiv:1410.8206

Lyman M, Sager N, Friedman C, Chi E (1985) Computer-structured narrative in ambulatory care: its use in longitudinal review of clinical data. In proceedings of the annual symposium on computer application in medical care (p. 82). Am Med Inform Assoc

Lyman M, Sager N, Chi EC, Tick LJ, Nhan NT, Su Y, ..., Scherrer, J. (1989) Medical Language Processing for Knowledge Representation and Retrievals. In Proceedings. Symposium on Computer Applications in Medical Care (pp. 548–553). Am Med Inform Assoc

Maas A, Daly RE, Pham PT, Huang D, Ng AY, Potts C (2011) Learning word vectors for sentiment analysis. In proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies (pp. 142-150)

Mani I, Maybury MT (eds) (1999) Advances in automatic text summarization, vol 293. MIT press, Cambridge, MA

Manning CD, Schütze H (1999) Foundations of statistical natural language processing, vol 999. MIT press, Cambridge

MATH   Google Scholar  

Marcus MP, Marcinkiewicz MA, Santorini B (1993) Building a large annotated corpus of english: the penn treebank. Comput Linguist 19(2):313–330

McCallum A, Nigam K (1998) A comparison of event models for naive bayes text classification. In AAAI-98 workshop on learning for text categorization (Vol. 752, pp. 41-48)

McCray AT (1991) Natural language processing for intelligent information retrieval. In Engineering in Medicine and Biology Society, 1991. Vol. 13: 1991., Proceedings of the Annual International Conference of the IEEE (pp. 1160–1161). IEEE

McCray AT (1991) Extending a natural language parser with UMLS knowledge. In proceedings of the annual symposium on computer application in medical care (p. 194). Am Med Inform Assoc

McCray AT, Nelson SJ (1995) The representation of meaning in the UMLS. Methods Inf Med 34(1–2):193–201

McCray AT, Razi A (1994) The UMLS knowledge source server. Medinfo MedInfo 8:144–147

McCray AT, Srinivasan S, Browne AC (1994) Lexical methods for managing variation in biomedical terminologies. In proceedings of the annual symposium on computer application in medical care (p. 235). Am Med Inform Assoc

McDonald R, Crammer K, Pereira F (2005) Flexible text segmentation with structured multilabel classification. In proceedings of the conference on human language technology and empirical methods in natural language processing (pp. 987-994). Assoc Comput Linguist

McGray AT, Sponsler JL, Brylawski B, Browne AC (1987) The role of lexical knowledge in biomedical text understanding. In proceedings of the annual symposium on computer application in medical care (p. 103). Am Med Inform Assoc

McKeown KR (1985) Text generation. Cambridge University Press, Cambridge

Book   Google Scholar  

Merity S, Keskar NS, Socher R (2018) An analysis of neural language modeling at multiple scales. arXiv preprint arXiv:1803.08240

Mikolov T, Chen K, Corrado G., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems

Morel-Guillemaz AM, Baud RH, Scherrer JR (1990) Proximity processing of medical text. In medical informatics Europe’90 (pp. 625–630). Springer, Berlin Heidelberg

Morin E (1999) Automatic acquisition of semantic relations between terms from technical corpora. In proc. of the fifth international congress on terminology and knowledge engineering-TKE’99

Müller M, Salathé M, Kummervold PE (2020) Covid-twitter-bert: A natural language processing model to analyse covid-19 content on twitter. arXiv preprint arXiv:2005.07503

"Natural Language Processing (2017) " Natural Language Processing RSS. N.p., n.d. Web. 25

"Natural Language Processing" (2017) Natural Language Processing RSS. N.p., n.d. Web. 23

Newatia R (2019) https://medium.com/saarthi-ai/sentence-classification-using-convolutional-neural-networks-ddad72c7048c . Accessed 15 Dec 2021

Nhàn NT, Sager N, Lyman M, Tick LJ, Borst F, Su Y (1989) A medical language processor for two indo-European languages. In proceedings. Symposium on computer applications in medical care (pp. 554-558). Am Med Inform Assoc

Nießen S, Och FJ, Leusch G, Ney H (2000) An evaluation tool for machine translation: fast evaluation for MT research. In LREC

Ochoa, A. (2016). Meet the Pilot: Smart Earpiece Language Translator. https://www.indiegogo.com/projects/meet-the-pilot-smart-earpiece-language-translator-headphones-travel . Accessed April 10, 2017

Ogallo, W., & Kanter, A. S. (2017). Using natural language processing and network analysis to develop a conceptual framework for medication therapy management research. https://www.ncbi.nlm.nih.gov/pubmed/28269895?dopt=Abstract . Accessed April 10, 2017

Otter DW, Medina JR, Kalita JK (2020) A survey of the usages of deep learning for natural language processing. IEEE Transactions on Neural Networks and Learning Systems 32(2):604–624

Ouyang Y, Li W, Li S, Lu Q (2011) Applying regression models to query-focused multi-document summarization. Inf Process Manag 47(2):227–237

Palmer M, Gildea D, Kingsbury P (2005) The proposition bank: an annotated corpus of semantic roles. Computational linguistics 31(1):71–106

Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: a method for automatic evaluation of machine translation. In proceedings of the 40th annual meeting on association for computational linguistics (pp. 311-318). Assoc Comput Linguist

Peng Y, Chi J (2019) Unsupervised cross-media retrieval using domain adaptation with scene graph. IEEE Transactions on Circuits and Systems for Video Technology 30(11):4368–4379

Porter MF (1980) An algorithm for suffix stripping. Program 14(3):130–137

Rae JW, Potapenko A, Jayakumar SM, Lillicrap TP, (2019) Compressive transformers for long-range sequence modelling. arXiv preprint arXiv:1911.05507

Ranjan P, Basu HVSSA (2003) Part of speech tagging and local word grouping techniques for natural language parsing in Hindi. In Proceedings of the 1st International Conference on Natural Language Processing (ICON 2003)

Rassinoux AM, Baud RH, Scherrer JR (1992) Conceptual graphs model extension for knowledge representation of medical texts. MEDINFO 92:1368–1374

Rassinoux AM, Michel PA, Juge C, Baud R, Scherrer JR (1994) Natural language processing of medical texts within the HELIOS environment. Comput Methods Prog Biomed 45:S79–S96

Rassinoux AM, Juge C, Michel PA, Baud RH, Lemaitre D, Jean FC, Scherrer JR (1995) Analysis of medical jargon: The RECIT system. In Conference on Artificial Intelligence in Medicine in Europe (pp. 42–52). Springer, Berlin Heidelberg

Rennie J (2000) ifile: An application of machine learning to e-mail filtering. In Proc. KDD 2000 Workshop on text mining, Boston, MA

Riedhammer K, Favre B, Hakkani-Tür D (2010) Long story short–global unsupervised models for keyphrase based meeting summarization. Speech Comm 52(10):801–815

Ritter A, Clark S, Etzioni O (2011) Named entity recognition in tweets: an experimental study. In proceedings of the conference on empirical methods in natural language processing (pp. 1524-1534). Assoc Comput Linguist

Rospocher M, van Erp M, Vossen P, Fokkens A, Aldabe I, Rigau G, Soroa A, Ploeger T, Bogaard T(2016) Building event-centric knowledge graphs from news. Web Semantics: Science, Services and Agents on the World Wide Web, In Press

Sager N, Lyman M, Tick LJ, Borst F, Nhan NT, Revillard C, … Scherrer JR (1989) Adapting a medical language processor from English to French. Medinfo 89:795–799

Sager N, Lyman M, Nhan NT, Tick LJ (1995) Medical language processing: applications to patient data representation and automatic encoding. Methods Inf Med 34(1–2):140–146

Sahami M, Dumais S, Heckerman D, Horvitz E (1998) A Bayesian approach to filtering junk e-mail. In learning for text categorization: papers from the 1998 workshop (Vol. 62, pp. 98-105)

Sakkis G, Androutsopoulos I, Paliouras G, Karkaletsis V, Spyropoulos CD, Stamatopoulos P (2001) Stacking classifiers for anti-spam filtering of e-mail. arXiv preprint cs/0106040

Sakkis G, Androutsopoulos I, Paliouras G et al (2003) A memory-based approach to anti-spam filtering for mailing lists. Inf Retr 6:49–73. https://doi.org/10.1023/A:1022948414856

Santoro A, Faulkner R, Raposo D, Rae J, Chrzanowski M, Weber T, ..., Lillicrap T (2018) Relational recurrent neural networks. Adv Neural Inf Proces Syst, 31

Scherrer JR, Revillard C, Borst F, Berthoud M, Lovis C (1994) Medical office automation integrated into the distributed architecture of a hospital information system. Methods Inf Med 33(2):174–179

Seal D, Roy UK, Basak R (2020) Sentence-level emotion detection from text based on semantic rules. In: Tuba M, Akashe S, Joshi A (eds) Information and communication Technology for Sustainable Development. Advances in intelligent Systems and computing, vol 933. Springer, Singapore. https://doi.org/10.1007/978-981-13-7166-0_42

Chapter   Google Scholar  

Sentiraama Corpus by Gangula Rama Rohit Reddy, Radhika Mamidi. Language Technologies Research Centre, KCIS, IIIT Hyderabad (n.d.) ltrc.iiit.ac.in/showfile.php?filename=downloads/sentiraama/

Sha F, Pereira F (2003) Shallow parsing with conditional random fields. In proceedings of the 2003 conference of the north American chapter of the Association for Computational Linguistics on human language technology-volume 1 (pp. 134-141). Assoc Comput Linguist

Sharifirad S, Matwin S, (2019) When a tweet is actually sexist. A more comprehensive classification of different online harassment categories and the challenges in NLP. arXiv preprint arXiv:1902.10584

Sharma S, Srinivas PYKL, Balabantaray RC (2016) Emotion Detection using Online Machine Learning Method and TLBO on Mixed Script. In Proceedings of Language Resources and Evaluation Conference 2016 (pp. 47–51)

Shemtov H (1997) Ambiguity management in natural language generation. Stanford University

Small SL, Cortell GW, Tanenhaus MK (1988) Lexical Ambiguity Resolutions. Morgan Kauffman, San Mateo, CA

Socher R, Perelygin A, Wu J, Chuang J, Manning CD, Ng AY, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In proceedings of the 2013 conference on empirical methods in natural language processing (pp. 1631-1642)

Sonnhammer EL, Eddy SR, Birney E, Bateman A, Durbin R (1998) Pfam: multiple sequence alignments and HMM-profiles of protein domains. Nucleic Acids Res 26(1):320–322

Srihari S (2010) Machine Learning: Generative and Discriminative Models. http://www.cedar.buffalo.edu/wsrihari/CSE574/Discriminative-Generative.pdf . accessed 31 May 2017.]

Sun X, Morency LP, Okanohara D, Tsujii JI (2008) Modeling latent-dynamic in shallow parsing: a latent conditional model with improved inference. In proceedings of the 22nd international conference on computational linguistics-volume 1 (pp. 841-848). Assoc Comput Linguist

Sundheim BM, Chinchor NA (1993) Survey of the message understanding conferences. In proceedings of the workshop on human language technology (pp. 56-60). Assoc Comput Linguist

Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems

Sworna ZT, Mousavi Z, Babar MA (2022) NLP methods in host-based intrusion detection Systems: A systematic review and future directions. arXiv preprint arXiv:2201.08066

Systems RAVN (2017) "RAVN Systems Launch the ACE Powered GDPR Robot - Artificial Intelligence to Expedite GDPR Compliance." Stock Market. PR Newswire, n.d. Web. 19

Tan KL, Lee CP, Anbananthen KSM, Lim KM (2022) RoBERTa-LSTM: A hybrid model for sentiment analysis with transformers and recurrent neural network. IEEE Access, RoBERTa-LSTM: A Hybrid Model for Sentiment Analysis With Transformer and Recurrent Neural Network

Tapaswi N, Jain S (2012) Treebank based deep grammar acquisition and part-of-speech tagging for Sanskrit sentences. In software engineering (CONSEG), 2012 CSI sixth international conference on (pp. 1-4). IEEE

Thomas C (2019)  https://towardsdatascience.com/recurrent-neural-networks-and-natural-language-processing-73af640c2aa1 . Accessed 15 Dec 2021

Tillmann C, Vogel S, Ney H, Zubiaga A, Sawaf H (1997) Accelerated DP based search for statistical translation. In Eurospeech

Umber A, Bajwa I (2011) “Minimizing ambiguity in natural language software requirements specification,” in Sixth Int Conf Digit Inf Manag, pp. 102–107

"Using Natural Language Processing and Network Analysis to Develop a Conceptual Framework for Medication Therapy Management Research (2017) " AMIA ... Annual Symposium proceedings. AMIA Symposium. U.S. National Library of Medicine, n.d. Web. 19

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I, (2017) Attention is all you need. In advances in neural information processing systems (pp. 5998-6008)

Wahlster W, Kobsa A (1989) User models in dialog systems. In user models in dialog systems (pp. 4–34). Springer Berlin Heidelberg, User Models in Dialog Systems

Walton D (1996) A pragmatic synthesis. In: fallacies arising from ambiguity. Applied logic series, vol 1. Springer, Dordrecht)

Wan X (2008) Using only cross-document relationships for both generic and topic-focused multi-document summarizations. Inf Retr 11(1):25–49

Wang W, Gang J, 2018 Application of convolutional neural network in natural language processing. In 2018 international conference on information Systems and computer aided education (ICISCAE) (pp. 64-70). IEEE

Wang D, Zhu S, Li T, Gong Y (2009) Multi-document summarization using sentence-based topic models. In proceedings of the ACL-IJCNLP 2009 conference short papers (pp. 297-300). Assoc Comput Linguist

Wang D, Zhu S, Li T, Chi Y, Gong Y (2011) Integrating document clustering and multidocument summarization. ACM Transactions on Knowledge Discovery from Data (TKDD) 5(3):14–26

Wang Z, Ng P, Ma X, Nallapati R, Xiang B (2019) Multi-passage bert: A globally normalized bert model for open-domain question answering. arXiv preprint arXiv:1908.08167

Wen Z, Peng Y (2020) Multi-level knowledge injecting for visual commonsense reasoning. IEEE Transactions on Circuits and Systems for Video Technology 31(3):1042–1054

Wiese G, Weissenborn D, Neves M (2017) Neural domain adaptation for biomedical question answering. arXiv preprint arXiv:1706.03610

Wong A, Plasek JM, Montecalvo SP, Zhou L (2018) Natural language processing and its implications for the future of medication safety: a narrative review of recent advances and challenges. Pharmacotherapy: The Journal of Human Pharmacology and Drug Therapy 38(8):822–841

Woods WA (1978) Semantics and quantification in natural language question answering. Adv Comput 17:1–87

Xia T (2020) A constant time complexity spam detection algorithm for boosting throughput on rule-based filtering Systems. IEEE Access 8:82653–82661. https://doi.org/10.1109/ACCESS.2020.2991328

Xie P, Xing E (2017) A constituent-centric neural architecture for reading comprehension. In proceedings of the 55th annual meeting of the Association for Computational Linguistics (volume 1: long papers) (pp. 1405-1414)

Yan X, Ye Y, Mao Y, Yu H (2019) Shared-private information bottleneck method for cross-modal clustering. IEEE Access 7:36045–36056

Yi J, Nasukawa T, Bunescu R, Niblack W (2003) Sentiment analyzer: extracting sentiments about a given topic using natural language processing techniques. In data mining, 2003. ICDM 2003. Third IEEE international conference on (pp. 427-434). IEEE

Young SJ, Chase LL (1998) Speech recognition evaluation: a review of the US CSR and LVCSR programmes. Comput Speech Lang 12(4):263–279

Yu S, et al. (2018) "A multi-stage memory augmented neural network for machine reading comprehension." Proceedings of the workshop on machine reading for question answering

Zajic DM, Dorr BJ, Lin J (2008) Single-document and multi-document summarization techniques for email threads using sentence compression. Inf Process Manag 44(4):1600–1610

Zeroual I, Lakhouaja A, Belahbib R (2017) Towards a standard part of speech tagset for the Arabic language. J King Saud Univ Comput Inf Sci 29(2):171–178

Download references

Acknowledgements

The authors would like to express their gratitude to the research mentors from CL-Educate: Accendere Knowledge Management Services Pvt. Ltd. for their comments on earlier versions of the manuscript; any remaining errors are our own and should not tarnish the reputations of these esteemed persons. We would also like to thank the Editor, Associate Editor, and anonymous referees for their constructive suggestions, which led to many improvements on an earlier version of this manuscript.

Author information

Authors and affiliations.

Department of Computer Science, Manav Rachna International Institute of Research and Studies, Faridabad, India

Diksha Khurana & Aditya Koli

Department of Computer Science, BML Munjal University, Gurgaon, India

Kiran Khatter

Department of Statistics, Amity University Punjab, Mohali, India

Sukhdev Singh


Corresponding author

Correspondence to Kiran Khatter .

Ethics declarations

Conflict of interest.

The first draft of this paper was written under the supervision of Dr. Kiran Khatter and Dr. Sukhdev Singh, associated with CL-Educate: Accendere Knowledge Management Services Pvt. Ltd. and deputed at Manav Rachna International University. The draft is also available on arxiv.org at https://arxiv.org/abs/1708.05148

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Khurana, D., Koli, A., Khatter, K. et al. Natural language processing: state of the art, current trends and challenges. Multimed Tools Appl 82, 3713–3744 (2023). https://doi.org/10.1007/s11042-022-13428-4


Received : 03 February 2021

Revised : 23 March 2022

Accepted : 02 July 2022

Published : 14 July 2022

Issue Date : January 2023

DOI : https://doi.org/10.1007/s11042-022-13428-4


Keywords

  • Natural language processing
  • Natural language understanding
  • Natural language generation
  • NLP applications
  • NLP evaluation metrics

211 Research Topics in Linguistics To Get Top Grades


Many people find it hard to decide on a linguistics research topic because of the complexity they assume is involved. They also struggle to choose easy English-language research paper topics because they worry such topics may be too simple for a university or college level certificate.

Everything you need to learn about linguistics and English is spread across syntax, phonetics, morphology, phonology, semantics, grammar, vocabulary, and a few other areas. To create a top-notch essay or conduct a research study, consider the list of English-language research topics below for your university or college use. Note that you can fine-tune these to suit your interests.

Linguistics Research Paper Topics

If you want to study how language is applied and its importance in the world, you can consider these Linguistics topics for your research paper. They are:

  • An analysis of romantic ideas and their expression amongst French people
  • An overview of hate language in campaigns against religion
  • Identify the determinants of hate language and the means of its propagation
  • Evaluate a piece of literature and examine how linguistics is applied to the understanding of minority languages
  • Consider the impact of social media on the development of slang
  • An overview of political slang and its use amongst New York teenagers
  • Examine the relevance of linguistics in a digitalized world
  • Analyze foul language and how it's used to oppress minors
  • Identify the role of language in the national identity of a socially dynamic society
  • Attempt an explanation of how a language barrier could affect the social life of an individual in a new society
  • Discuss the means through which language can enrich cultural identities
  • Examine the concept of bilingualism and how it applies in the real world
  • Analyze possible strategies for teaching a foreign language
  • Discuss the priorities of teachers in teaching grammar to non-native speakers
  • Choose a school and observe the slang used by its students; analyze how it affects their social lives
  • Attempt a critical overview of racist language
  • What does 'endangered language' mean, and how does the concept apply in the real world?
  • A critical overview of your second language and why it is a second language
  • What are the motivators of speech and why are they relevant?
  • Analyze the differences between the various types of communication and their significance for specially-abled persons
  • Give a critical overview of five works of literature on sign language
  • Evaluate how language comprehension differs between an adult and a teenager
  • Consider a native American group and evaluate how cultural diversity has influenced their language
  • Analyze the complexities involved in code-switching and code-mixing
  • Give a critical overview of the importance of language to a teenager
  • Attempt a forensic overview of language accessibility and what it means
  • What do you believe are the means of communication, and what makes each unique?
  • Attempt a study of Islamic poetry and its role in language development
  • Attempt a study on the role of Literature in language development
  • Evaluate the influence of metaphors and other literary devices on the depth of each sentence
  • Identify the role of literary devices in the development of proverbs in any African country
  • Cognitive linguistics: analyze two pieces of literature that offer a critical view of perception
  • Identify and analyze the complexities in unspoken words
  • Expression is another kind of language: discuss
  • Identify the significance of symbols in the evolution of language
  • Discuss how learning more than a single language promotes cross-cultural development
  • Analyze how the loss of a mother tongue affects the linguistic efficiency of a community
  • Critically examine how sign language works
  • Using literature from the medieval era, attempt a study of the evolution of language
  • Identify how wars have reduced the popularity of a language of your choice in any country of the world
  • Critically examine five works of literature on why accents change with environment
  • What forces drive the comprehension of language in a child?
  • Identify and explain the difference between listening and speaking skills and their significance in understanding language
  • Give a critical overview of how natural language is processed
  • Examine the influence of language on culture and vice versa
  • It is possible to understand a language even without living in that society: discuss
  • Identify the arguments regarding speech defects
  • Discuss how the familiarity of language informs the creation of slangs
  • Explain the significance of religious phrases and sacred languages
  • Explore the roots and evolution of incantations in Africa

Sociolinguistic Research Topics

You may also need interesting linguistics topics with a sociolinguistic focus for your research. Sociolinguistics is the study and recording of natural speech, primarily the casual register of informal conversation. You can consider the following sociolinguistic research topics:

  • What makes language exceptional to a particular person?
  • How does language form a unique means of expression to writers?
  • Examine the kind of speech used in health and emergencies
  • Analyze the language theory explored by family members during dinner
  • Evaluate the possible variation of language based on class
  • Evaluate the language of racism, social tension, and sexism
  • Discuss how Language promotes social and cultural familiarities
  • Give an overview of identity and language
  • Examine why some language speakers enjoy listening to foreigners who speak their native language
  • Give a forensic analysis of how the language of entertainment differs from the language of professional settings
  • Explain how language changes
  • Examine the Sociolinguistics of the Caribbeans
  • Consider an overview of metaphor in France
  • Explain why direct, word-for-word translation is often incomprehensible in linguistics
  • Discuss the use of language in marginalizing a community
  • Analyze the history of Arabic and the culture that enhanced it
  • Discuss the growth of French and the influences of other languages
  • Examine how the English language developed and its interdependence on other languages
  • Give an overview of cultural diversity and Linguistics in teaching
  • Challenge the assumption that a speech defect implies impaired listening and speaking abilities
  • Explore the uniqueness of language between siblings
  • Explore the means of making requests between a teenager and his parents
  • Observe and comment on how students relate with their teachers through language
  • Observe and comment on the communication strategies of parents and teachers
  • Examine the connection between understanding a first language and academic excellence

Language Research Topics

Numerous languages exist in different societies, so you may seek to understand the motivations behind language through these linguistics project ideas. You can consider the following interesting linguistics topics and their application to language:

  • What does language shift mean?
  • What are the stages of English language development?
  • Examine the position of ambiguity in a romantic Language of your choice
  • Why are some languages called romantic languages?
  • Observe the strategies of persuasion through Language
  • Discuss the connection between symbols and words
  • Identify the language of political speeches
  • Discuss the effectiveness of language in an indigenous cultural revolution
  • Trace the motivators for spoken language
  • What does language acquisition mean to you?
  • Examine three pieces of literature on language translation and its role in multilingual accessibility
  • Identify the science involved in language reception
  • Interrogate the context of language disorders
  • Examine how psychotherapy applies to victims of language disorders
  • Study the growth of Hindi despite colonialism
  • Critically appraise the term, language erasure
  • Examine how colonialism and war are responsible for the loss of languages
  • Give an overview of the difference between sounds and letters and how they apply to the German language
  • Explain why the placement of verb and preposition is different in German and English languages
  • Choose two languages of your choice and examine their historical relationship
  • Discuss the strategies employed by people while learning new languages
  • Discuss the role of all the figures of speech in the advancement of language
  • Analyze the linguistic complexities experienced by people with autism
  • Offer a linguistic approach to the differences in language use between a child with Down syndrome and an autistic child
  • Express dance as a language
  • Express music as a language
  • Evaluate the role of cultural diversity in the decline of languages in South Africa
  • Discuss the development of the Greek language
  • Critically review two literary texts, one from the medieval era and another published a decade ago, and examine the language shifts

Linguistics Essay Topics

You may also need linguistics research topics for your linguistics essays. As a linguist in the making, these can help you consider controversies in linguistics as a discipline and address them through your study. You can consider:

  • The role of sociolinguistics in comprehending interest in multilingualism
  • Write on how you believe language encourages sexism
  • What do you understand about the differences between British and American English?
  • Discuss how slangs grew and how they started
  • Consider how age leads to loss of language
  • Review how language is used in formal and informal conversation
  • Discuss what you understand by polite language
  • Discuss what you know by hate language
  • Evaluate how language has remained flexible throughout history
  • Mimicking a teacher is a form of exercising hate language: discuss
  • Body language and verbal speech are different things: discuss
  • Language can be exploitative: discuss
  • Do you think language is responsible for inciting aggression against the state?
  • Can you justify the structural representation of any symbol of your choice?
  • Religious symbols are not ordinary language: what is your perspective on day-to-day languages versus sacred ones?
  • Consider the usage of language by an English man and someone of another culture
  • Discuss the essence of code-mixing and code-switching
  • Attempt a psychological assessment on the role of language in academic development
  • How does language pose a challenge to studying?
  • Choose a multicultural society and explain the language problems it faces
  • What forms does language take in expression?
  • Identify the reasons behind unspoken words and actions
  • Why do universal languages exist as a means of easy communication?
  • Examine the role of the English language in the world
  • Examine the role of Arabic in the world
  • Examine the role of romantic languages in the world
  • Evaluate the significance of teaching resources in a language classroom
  • Consider an assessment of language analysis
  • Why do people comprehend beyond what is written or expressed?
  • What is the impact of hate speech on a woman?
  • Do you believe grammatical errors determine how everyone's comprehension of language is judged?
  • Observe the influence of technology on language learning and development
  • Which parts of the body are responsible for understanding new languages?
  • How has language informed development?
  • Would you say language has improved human relations or worsened it considering it as a tool for violence?
  • Would you say language in a black populous state is different from its social culture in white populous states?
  • Give an overview of the English language in Nigeria
  • Give an overview of the English language in Uganda
  • Give an overview of the English language in India
  • Give an overview of Russian in Europe
  • Give a conceptual analysis on stress and how it works
  • Consider the means of vocabulary development and its role in cultural relationships
  • Examine the effects of Linguistics in language
  • Present your understanding of sign language
  • What do you understand about descriptive language and prescriptive Language?

List of Research Topics in English Language

You may need English research topics for your next research. These are topics that are socially crafted for you as a student of language in any institution. You can consider the following for in-depth analysis:

  • Examine the travails of women in any feminist text of your choice
  • Examine the movement of feminist literature in the industrial period
  • Give an overview of five Gothic literary works and what you understand from them
  • Examine rock music and how it emerged as a genre
  • Evaluate the cultural associations of Nina Simone's music
  • What is the relevance of Shakespeare in English literature?
  • How has literature promoted the English language?
  • Identify the effect of spelling errors on the academic performance of students in an institution of your choice
  • Critically survey a university and rationalize the literary texts it offers as significant
  • Examine the use of feminist literature in advancing the cause against patriarchy
  • Give an overview of the themes in William Shakespeare's "Julius Caesar"
  • Express the significance of Ernest Hemingway's diction in contemporary literature
  • Examine the predominant devices in the works of William Shakespeare
  • Explain the predominant devices in the works of Christopher Marlowe
  • Charles Dickens and his works: express the dominant themes in his literature
  • Why is literature described as the mirror of society?
  • Examine the issues of feminism in Sefi Atta's "Everything Good Will Come" and Bernardine Evaristo's "Girl, Woman, Other"
  • Give an overview of the stylistics employed in the writing of "Girl, Woman, Other" by Bernardine Evaristo
  • Describe the language of advertisement in social media and newspapers
  • Describe what poetic language means
  • Examine the use of code-switching and code-mixing among Mexican Americans
  • Examine the use of code-switching and code-mixing among Indian Americans
  • Discuss the influence of George Orwell's "Animal Farm" on satirical literature
  • Examine the linguistic features of "Native Son" by Richard Wright
  • What is the role of indigenous literature in promoting cultural identities?
  • How has literature informed cultural consciousness?
  • Analyze five works on semantics and their influence on the field
  • Assess the role of grammar in day-to-day communication
  • Observe the role of multidisciplinary approaches in understanding the English language
  • What does stylistics mean when analyzing medieval literary texts?
  • Analyze the views of philosophers on language, society, and culture

English Research Paper Topics for College Students

For your college work, you may need to undergo a study of any phenomenon in the world. Note that they could be Linguistics essay topics or mainly a research study of an idea of your choice. Thus, you can choose your research ideas from any of the following:

  • The concept of fairness in a democratic government
  • The capacity of a leader isn't in his or her academic degrees
  • The concept of discrimination in education
  • The theory of discrimination in Islamic states
  • The idea of school policing
  • A study on grade inflation and its consequences
  • A study of taxation and its importance to the economy from a citizen's perspective
  • A study of how eloquence leads to discrimination among high school students
  • A study of the influence of the music industry on teens
  • An evaluation of pornography and its impacts on college students
  • A descriptive study of how the FBI works according to Hollywood
  • A critical consideration of the pros and cons of vaccination
  • The health effects of sleep disorders
  • An overview of three literary texts across three genres of literature and how they connect to you
  • A critical overview of "King Oedipus": the role of the supernatural in day-to-day life
  • Examine the novel "12 Years a Slave" as a reflection of the servitude and brutality exerted by white slave owners
  • Rationalize the emergence of racist literature with concrete examples
  • A study of the limits of literature in reaching rural readers
  • Analyze the perspectives of modern authors on the influence of medieval literature on their craft
  • What do you understand by the mortality of a literary text?
  • A study of controversial literature and its role in shaping public discussion
  • A critical overview of three literary texts that dealt with domestic abuse and their role in changing the narratives about domestic violence
  • Choose three contemporary poets and analyze the themes of their works
  • Do you believe that contemporary American literature is a repetition of themes already treated in the past?
  • A study of the evolution of literature and its styles
  • The use of sexual innuendo in literature
  • The use of sexist language in literature and its effect on the public
  • The disaster associated with media reports of fake news
  • Conduct a study on how language is used as a tool for manipulation
  • Attempt a criticism of a controversial literary text and argue why it shouldn't be studied or sold in the first place

Finding Linguistics Hard To Write About?

With these topics, you can commence your research with ease. However, if you need professional writing help for any part of the research, you can look online for the best research paper writing service.

Several expert ENL writers are hosted on our website whom you can engage for a fast response to your research study at an affordable price.

As a student, you may be unable to cover every part of your research on your own. That is why you should consider expert writers for custom research topics in linguistics that your professor will approve for high grades.




Purdue Online Writing Lab Purdue OWL® College of Liberal Arts

MLA General Format 


This page is brought to you by the OWL at Purdue University. When printing this page, you must include the entire legal notice.

Copyright ©1995-2018 by The Writing Lab & The OWL at Purdue and Purdue University. All rights reserved. This material may not be published, reproduced, broadcast, rewritten, or redistributed without permission. Use of this site constitutes acceptance of our terms and conditions of fair use.

MLA Style specifies guidelines for formatting manuscripts and citing research in writing. MLA Style also provides writers with a system for referencing their sources through parenthetical citation in their essays and Works Cited pages. 

Writers who properly use MLA also build their credibility by demonstrating accountability to their source material. Most importantly, the use of MLA style can protect writers from accusations of plagiarism, which is the purposeful or accidental uncredited use of source material produced by other writers. 

If you are asked to use MLA format, be sure to consult the  MLA Handbook  (9th edition). Publishing scholars and graduate students should also consult the  MLA Style Manual and Guide to Scholarly Publishing  (3rd edition). The  MLA Handbook  is available in most writing centers and reference libraries. It is also widely available in bookstores, libraries, and at the MLA web site. See the Additional Resources section of this page for a list of helpful books and sites about using MLA Style.

Paper Format

The preparation of papers and manuscripts in MLA Style is covered in part four of the  MLA Style Manual . Below are some basic guidelines for formatting a paper in  MLA Style :

General Guidelines

  • Type your paper on a computer and print it out on standard, white 8.5 x 11-inch paper.
  • Double-space the text of your paper and use a legible font (e.g. Times New Roman). Whatever font you choose, MLA recommends that the regular and italics type styles contrast enough that they are each distinct from one another. The font size should be 12 pt.
  • Leave only one space after periods or other punctuation marks (unless otherwise prompted by your instructor).
  • Set the margins of your document to 1 inch on all sides.
  • Indent the first line of each paragraph one half-inch from the left margin. MLA recommends that you use the “Tab” key as opposed to pushing the space bar five times.
  • Create a header that numbers all pages consecutively in the upper right-hand corner, one-half inch from the top and flush with the right margin. (Note: Your instructor may ask that you omit the number on your first page. Always follow your instructor's guidelines.)
  • Use italics throughout your essay to indicate the titles of longer works and, only when absolutely necessary, provide emphasis.
  • If you have any endnotes, include them on a separate page before your Works Cited page. Entitle the section Notes (centered, unformatted).

Formatting the First Page of Your Paper

  • Do not make a title page for your paper unless specifically requested or the paper is assigned as a group project. In the case of a group project, list all names of the contributors, giving each name its own line in the header, followed by the remaining MLA header requirements as described below. Format the remainder of the page as requested by the instructor.
  • In the upper left-hand corner of the first page, list your name, your instructor's name, the course, and the date. Again, be sure to use double-spaced text.
  • Double space again and center the title. Do not underline, italicize, or place your title in quotation marks. Write the title in Title Case (standard capitalization), not in all capital letters.
  • Use quotation marks and/or italics when referring to other works in your title, just as you would in your text. For example:  Fear and Loathing in Las Vegas  as Morality Play; Human Weariness in "After Apple Picking"
  • Double space between the title and the first line of the text.
  • Create a header in the upper right-hand corner that includes your last name, followed by a space with a page number. Number all pages consecutively with Arabic numerals (1, 2, 3, 4, etc.), one-half inch from the top and flush with the right margin. (Note: Your instructor or other readers may ask that you omit the last name/page number header on your first page. Always follow instructor guidelines.)

Here is a sample of the first page of a paper in MLA style:

[Image: The First Page of an MLA Paper]

Section Headings

Writers sometimes use section headings to improve a document’s readability. These sections may include individual chapters or other named parts of a book or essay.

MLA recommends that when dividing an essay into sections you number those sections with an Arabic number and a period followed by a space and the section name.

MLA does not have a prescribed system of headings for books (for more information on headings, please see page 146 in the MLA Style Manual and Guide to Scholarly Publishing , 3rd edition). If you are only using one level of headings, meaning that all of the sections are distinct and parallel and have no additional sections that fit within them, MLA recommends that these sections resemble one another grammatically. For instance, if your headings are typically short phrases, make all of the headings short phrases (and not, for example, full sentences). Otherwise, the formatting is up to you. It should, however, be consistent throughout the document.

If you employ multiple levels of headings (some of your sections have sections within sections), you may want to provide a key of your chosen level headings and their formatting to your instructor or editor.

Sample Section Headings

The following sample headings are meant to be used only as a reference. You may employ whatever system of formatting that works best for you so long as it remains consistent throughout the document.

Formatted, unnumbered:

Level 1 Heading: bold, flush left

Level 2 Heading: italics, flush left

Level 3 Heading: centered, bold

Level 4 Heading: centered, italics

Level 5 Heading: underlined, flush left

ORIGINAL RESEARCH article

The Effect of Language Learning Strategies on Proficiency, Attitudes and School Achievement

Anita Habók*

  • Institute of Education, University of Szeged, Szeged, Hungary

This study examines language learning strategy (LLS) use in connexion with foreign language attitude, proficiency and general school achievement among lower secondary students in Years 5 and 8 ( n = 868) in Hungary. An adapted version of the Strategy Inventory for Language Learning questionnaire was used for data collection. The results showed that Hungarian students mainly engage in metacognitive strategies in both years. Differences between more and less proficient language learners’ strategy use were also found. With regard to the effect of LLS on foreign language attitude, the foreign language mark and school achievement, path analysis indicated a good fit in both years. The metacognitive, social and memory strategies primarily influenced foreign language attitudes and marks in Year 5. The metacognitive strategies had a slight impact on school achievement as well as on foreign language marks. We demonstrated the dominant effect of metacognitive strategies and the low effect of memory strategies in Year 8. In addition, metacognitive strategies also influenced foreign language marks. The effect of foreign language marks on school achievement was also remarkable. There was a strong impact on the children’s attitudes through these variables.

Introduction

In recent decades, a number of studies have focused on foreign language learning, with the emphasis often having been placed on language learning strategies (LLS; Wong and Nunan, 2011 ; Oxford, 2016 ). Several studies have confirmed that these strategies aid students in becoming more effective learners inside the classroom and foster more efficient development of students’ mastery of the target language after leaving school ( Wong and Nunan, 2011 ). However, less is known about the structure and relationship between LLS, foreign language attitude, the foreign language mark and general school achievement (GA). Recent studies have mainly dealt with LLS among university students and upper secondary students, with only a few investigations having been conducted among lower secondary students. In the present study, we aim to examine young Hungarian students’ LLS use and its connexion to foreign language attitude, the foreign language mark and school achievement at the beginning and end of lower secondary school. We believe that it adds value to the article that we have investigated a young age group, as the beginning period of language learning can establish the success of the entire process. Another advantage of our research is that we analysed the whole language learning process in connexion with several other factors to represent the complexity of the language learning process.

Theoretical Background

Studies on LLS in recent decades have identified a large number of strategies which are employed by English as a foreign/second language (EFL/ESL) learners and several strategy categorisation patterns have also been established. The most frequently used taxonomy was developed by Oxford (1990) . She identified three direct and three indirect strategy types. Direct strategies are specific means of language use: memory, cognitive and compensatory (or compensation) strategies. Indirect strategies, such as metacognitive, affective and social strategies, support LLS indirectly. Recently, Oxford revisited her strategy categories and developed a model with four different strategy categories: cognitive, affective and sociocultural-interactive as well as a master category of “metastrategies.” Metastrategies comprise metacognitive, meta-affective and meta-sociocultural-interactive strategies ( Griffith and Oxford, 2014 ; Oxford, 2016 ). However, she did not elaborate on this strategy classification, and thus our study relied on her original taxonomy.

Various studies have focused on LLS use and aimed to identify the strategies most frequently employed by language learners ( Chamot, 2004 ; Magogwe and Oliver, 2007 ; Wu, 2008 ; Chen, 2009 ; Al-Qahtani, 2013 ; Charoento, 2016 ; Alhaysony, 2017 ; Dawadi, 2017 ). Overall, it can be concluded that the most commonly used LLS in these studies were metacognitive, compensation and cognitive strategies. However, Chamot (2004) pointed out that different strategy preferences were reported by students in different cultural contexts. Chinese and Singaporean students reported a higher level preference for social strategies and lower use of affective strategies than European students.

Some studies have dealt with the implementation of the SILL with a focus on school-aged students ( Magogwe and Oliver, 2007 ; Chen, 2009 , 2014 ; Gunning and Oxford, 2014 ; Platsidou and Kantaridou, 2014 ; Pfenninger and Singleton, 2017 ). The overall conclusion of these studies has been that young learners mostly used social, affective and compensation strategies. The use of memory strategies was relatively low ( Doró and Habók, 2013 ). The attitudes of learners at this age toward language learning are particularly important since they can greatly determine motivation, learning outcomes and later success in language learning ( Platsidou and Kantaridou, 2014 ; Platsidou and Sipitanou, 2014 ).

As the purpose of investigating LLS is to foster learning processes and improve language level, research projects often deal with LLS use in relation to language learning proficiency ( Khaldieh, 2000 ; Magogwe and Oliver, 2007 ; Wu, 2008 ; Chen, 2009 ; Liu, 2010 ; Al-Qahtani, 2013 ; Platsidou and Kantaridou, 2014 ; Charoento, 2016 ; Rao, 2016 ). The notion of proficiency has been defined and involved in analysis in a multitude of ways by various researchers. Charoento (2016) involved self-ratings, Wu (2008) used the results from language proficiency and achievement tests, Magogwe and Oliver (2007) incorporated language course grades into their analysis of their results. Most studies have shown a positive relationship between LLS and proficiency, but the direction of their connexion was often different. Some researchers have stressed that strategy use was mainly specified by proficiency. More proficient students engaged in LLS more frequently and also employed a broader range of strategies overall compared to less proficient students ( Khaldieh, 2000 ; Wu, 2008 ; Rao, 2016 ). Al-Qahtani (2013) and Charoento (2016) demonstrated that successful students mainly used cognitive strategies, while Wu (2008) emphasised significant utilisation of cognitive, metacognitive and social strategies among more proficient university students. Chen (2009) pointed to the use of fewer communication strategies among proficient learners, but noted that they employed them more efficiently than less proficient learners. In addition, Magogwe and Oliver (2007) also established that the basic difference in LLS use between proficient and less proficient learners was that more successful students not only used certain LLS significantly more often, but were also able to select the most adequate strategies depending on the goal of their task.

Some studies have dealt with the effect of LLS use on language proficiency. Both Liu (2010) and Platsidou and Kantaridou (2014) pointed out that learning strategy influences language use and that it plays a significant role in anticipating perceived language performance. Wu (2008) noted that cognitive strategies have the most dominant influence on proficiency. Rao (2016) found that students’ English proficiency significantly affected their learning strategy use and also observed that high-level students avail themselves of more strategies more frequently than low-level students.

Another essential area of LLS research is the study of strategy use in relation to affective variables, such as attitude and motivation ( Shang, 2010 ; Jabbari and Golkar, 2014 ; Platsidou and Kantaridou, 2014 ). Most of these studies have found that learners with a positive attitude employed LLS more frequently compared to learners with a negative attitude. Platsidou and Kantaridou (2014) reported that attitudes toward second language learning influence both direct and indirect strategy uses and that changing learners’ attitudes toward language learning can thus foster their strategy practises. Jabbari and Golkar (2014) established that learners with a positive attitude employ cognitive, compensation, metacognitive and social strategies more frequently.

It can be concluded that LLS use has been studied extensively in recent decades. Most research has found that LLS cannot be analysed separately; it must be examined in relation to certain other factors, among which foreign language attitudes and proficiency play a central role ( Griffiths and Incecay, 2016 ). However, most previous studies have focused on university students or adults rather than primary or secondary school-aged students. Furthermore, a limited amount of research has investigated the relationship of LLS with attitude toward foreign language learning and the foreign language mark. There has also been a dearth of scholarship on how language proficiency and school achievement are determined by LLS use and attitude. Our study aims to fill this gap and attempts to present a comprehensive view of the relationships among LLS use, language attitude, proficiency and general school achievement by focusing on school children at the beginning and end of lower secondary school. The specific research question we focus on in this paper is the following:

What are lower secondary school children’s strategy use preferences, and how are these connected with their foreign language attitude, proficiency and general school achievement? Based on the relevant literature, we assume that students of this age mainly employ indirect strategies, such as affective, metacognitive and social strategies, and that these have a significant impact on their foreign language learning attitude, proficiency and general school achievement.

Materials and Methods

Participants

The participants in the present study were lower secondary students (11- and 14-year-olds) in Hungary ( n Year5 = 450, n Year8 = 418). Participation in the study was voluntary both for schools and students. This study was carried out in accordance with the recommendations of the University of Szeged, the Hungarian law and the municipalities that maintain the schools. The IRB of the Doctoral School (University of Szeged) specifically approved this research project. The agreements are documented and stored in written form in the schools.

Our target group generally started learning a foreign language in Year 4. As one portion of our sample has been learning a foreign language for at least four years, they have experience of how they learn a language. In Hungary, the primary level of education is composed of the elementary and lower secondary school levels; hence, the transition occurs with relatively few major changes, and children have the same language teacher across both school levels. While the foreign language teacher does not change, the other school subjects are taught by specialist teachers as of Year 5. Learning difficulties and differences among children grow considerably from the beginning of lower secondary school; hence, diagnosing language learning attitude is particularly essential.

Instruments

The Strategy Inventory for Language Learning (SILL, Oxford, 1990 ) was administered to investigate the children’s LLS use. The SILL is a standardised measurement tool, and it is applicable to various foreign languages. The complex questionnaire is clustered into six strategy fields: (1) memory (9 items); (2) cognitive (14 items); (3) compensation (6 items); (4) metacognitive (9 items); (5) affective (6 items); and (6) social strategies (6 items). The participants were asked to respond to each statement on a five-point Likert scale. The answers ranged from ‘1 = never or almost never true of me’ to ‘5 = always or almost always true of me.’ The reported internal consistency reliabilities of the questionnaire ranged between 0.91 and 0.94 (Cronbach’s alpha) ( Oxford and Burry-Stock, 1995 ; Ardasheva and Tretter, 2013 ). The questionnaire was administered in Hungarian to eliminate differences in English knowledge and make it suitable for the language levels in these age groups. The reliability of the Hungarian version was confirmed in previous research ( Doró and Habók, 2013 ). In addition, the children were asked to self-report their foreign language attitude, foreign language mark (indicating students’ foreign language knowledge) and general school achievement (grade point average, which includes students’ achievement in all subjects) on a five-point scale. In Hungarian schools, the different proficiency levels are rated on a five-point scale: 1 is the weakest mark, and 5 is the best.
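The reliabilities above are Cronbach's alpha coefficients, which can be computed directly from a respondents-by-items matrix of Likert scores. A minimal sketch, using invented responses for illustration (not the study's data):

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of respondents' item-score lists."""
    k = len(items[0])  # number of items in the scale

    def var(xs):  # sample variance (ddof = 1)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[j] for row in items]) for j in range(k)]
    total_var = var([sum(row) for row in items])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses from 5 students to a 4-item subscale (1-5 Likert)
scores = [
    [5, 4, 5, 4],
    [3, 3, 2, 3],
    [4, 4, 5, 5],
    [2, 2, 1, 2],
    [4, 5, 4, 4],
]
alpha = cronbach_alpha(scores)  # high internal consistency for these data
```

Values above roughly 0.7 are conventionally taken as acceptable, so the 0.91–0.94 range reported for the SILL indicates very high internal consistency.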

Design and Procedure

Quantitative research design was employed through online survey methodology. The SILL questionnaire was administered via the eDia online testing platform, which was developed by the Centre for Research on Learning and Instruction for assessing Year 1–6 children’s foreign language knowledge and attitudes. One school lesson was provided for data collection, although the children needed only approximately 20 min to complete their ratings. Both the children and teachers are familiar with this system, as the online platform has been in use since 2009.

Data were handled confidentially during the testing procedure; the children used an identification code provided by research administrators. The researchers were only able to see the codes, and only the teachers were able to identify their students with the codes. All the instructions were in the online questionnaire, so the children were able to answer the questions individually. The teachers were also requested to report the children’s questions, remarks and difficulties during testing. Finally, the teachers reported no misunderstandings or problematic items during data collection.

The data analyses were twofold. First, SPSS for Microsoft Windows 20.0 was employed for classical test analysis, which included an estimation of frequencies, means and standard deviations. The significance of differences among the variables was determined by ANOVA analysis. Second, path analysis was managed by the SPSS AMOS v20 software package to analyse the effect of strategy use on the variables under observation ( Arbuckle, 2008 ). The model fit was indicated by the Tucker–Lewis index (TLI), the normed fit index (NFI), the comparative fit index (CFI) and the root mean square error of approximation (RMSEA) ( Byrne, 2010 ; Kline, 2015 ).
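The ANOVA step described above can be reproduced without SPSS. A minimal sketch of the one-way F statistic, applied to invented strategy-score groups (not the study's data):

```python
def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA across independent groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(x for g in groups for x in g) / n
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical mean metacognitive-strategy scores (1-5 scale) for three
# proficiency groups formed by foreign language mark
mark_3 = [2.8, 3.1, 2.9, 3.3, 3.0, 2.7]
mark_4 = [3.4, 3.6, 3.2, 3.8, 3.5, 3.3]
mark_5 = [4.1, 4.4, 3.9, 4.3, 4.2, 4.0]
f_stat = one_way_anova_f(mark_3, mark_4, mark_5)  # large F: group means differ
```

The F statistic is then compared against the F(k − 1, n − k) distribution to obtain a p-value, a step SPSS (or scipy.stats.f_oneway) performs automatically.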

Descriptive Analysis

General Strategy Use among Lower Secondary School Children

The mean scores and standard deviations showed moderate LLS use, with the use of metacognitive, affective and social strategies being the highest in Year 5 (Table 1 ). Compensation strategies were employed significantly least often. In Year 8, besides metacognitive and social strategies, cognitive strategies were relied on the most. Metacognitive strategy use was similarly high in both age groups. Significant differences were found between the age groups in memory, compensation and affective strategies ( p ≤ 0.01). While the use of affective strategies was relatively high in Year 5, it was the least frequently employed in Year 8.


TABLE 1. The strategy use results for the sample.

Differences in Strategy Use among Students with Different Proficiency Levels

One of our goals was to identify students’ LLS use preferences according to their proficiency levels. To implement this goal, we grouped the children into categories according to their proficiency, which was derived from their foreign language marks.

We combined the foreign language marks for those children who were evaluated with a 1 or a 2. These children showed a very low knowledge level and demonstrated a large number of difficulties and misunderstandings in foreign language learning. The next group was formed of children who were assessed at mark 3. This mark indicated an average knowledge level with gaps. Children who were evaluated with a mark 4 had fewer significant deficits. Children who received a mark 5 were the highest performers in school. Tables 2 , 3 summarise our results on strategy use according to foreign language marks. The number of children is also indicated according to each category.


TABLE 2. Means of strategy users according to their foreign language mark in Year 5.


TABLE 3. Means of strategy users according to their foreign language mark in Year 8.

Multivariate Analyses

The Relationships between LLS and Foreign Language Attitude, LLS and Foreign Language Marks, and LLS and General School Achievement

Our results demonstrated that the sample was evaluated at an approximate level of mark 4 ( M Year5 = 3.84, SD Year5 = 1.17; M Year8 = 3.62, SD Year8 = 1.17); however, Year 5 children achieved significantly higher marks ( p < 0.01). As regards children’s attitudes, we found no significant differences between the years ( M Year5 = 3.53, SD Year5 = 1.35; M Year8 = 3.43, SD Year8 = 1.23; p > 0.05). On the whole, it can be stated that children’s foreign language marks are higher than their attitude toward foreign language. The average school achievement showed significantly higher means than foreign language marks in both years ( M Year5 = 3.82, SD Year5 = 0.87, p < 0.001; M Year8 = 3.62, SD Year8 = 1.17, p < 0.001).

We also examined the correlations between LLS and attitude toward foreign languages, LLS and the foreign language mark, and LLS and general school achievement. We observed the most significant estimates between language learning strategy use and attitude in Year 5 ( r = 0.53–0.20; p < 0.001–0.05). The correlation coefficient between attitude and the foreign language mark was also significant ( r = 0.37; p < 0.001). We noted that children who achieved higher in foreign languages showed a more positive attitude toward them. We also noticed a significantly strong correlation between the foreign language mark and strategy use ( r = 0.49–0.13; p < 0.001–0.05).

In Year 8, we found significant ( r Year5 = 0.70–0.12; p < 0.001–0.01; r Year8 = 0.82–0.66; p < 0.001–0.01) relationships between overall strategy use and foreign language marks, attitudes and general school achievement. However, the relationship between affective strategies and school achievement was not significant. We observed that children who use LLS have positive attitudes toward language learning, except for compensation and affective strategies.
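The coefficients reported in this section are Pearson correlations, which are simple enough to compute directly. A sketch with invented per-student values (not the study's data):

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-student values: mean strategy use (1-5) against the
# self-reported foreign language mark (1-5)
strategy_use = [2.1, 2.8, 3.0, 3.4, 3.9, 4.2, 4.5]
language_mark = [2, 3, 3, 4, 4, 5, 5]
r = pearson_r(strategy_use, language_mark)  # strongly positive for these data
```

The significance of r is then assessed with a t-test on t = r·sqrt((n − 2)/(1 − r²)), which statistical packages report automatically alongside the coefficient.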

The Effect of Language Learning Strategies on Attitude, School Marks and General School Achievement

We analysed the effect of LLS on foreign language attitude, school marks and general achievement using AMOS. We were looking for causalities between questionnaire fields and further variables by constructing a theoretical model on the basis of Oxford’s strategy taxonomy and children’s background data. We hypothesised that strategy factors largely influence children’s attitude toward language learning and through this the other variables. The model we created showed appropriate fit indices for the final model and indicated a good fit to our data in both years (Figures 1 , 2 ).

FIGURE 1. The path model for LLS influence on foreign language mark through foreign language attitude and general school achievement (GA) in Year 5.

FIGURE 2. The path model for LLS influence on foreign language mark through foreign language attitude and general school achievement (GA) in Year 8.

Year 5: χ2(13) = 18.309, p = 0.146; Year 8: χ2(13) = 23.893, p = 0.18. An analysis of the hypothesised path model indicated a comparative fit index (CFI) of 0.998 in Year 5 and 0.994 in Year 8. The RMSEA (root mean square error of approximation) was also good in both years: 0.030 in Year 5 and 0.049 in Year 8. Both the Tucker–Lewis index (TLI Year5 = 0.992; TLI Year8 = 0.981) and the normed fit index (NFI Year5 = 0.992; NFI Year8 = 0.989) confirmed that the model we constructed was a good fit to our data.
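The RMSEA reported above follows directly from the model χ², its degrees of freedom and the sample size. The sample size is not stated in this excerpt, so the value of n below is a purely illustrative assumption; a minimal sketch:

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation for a model chi-square test."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Year 5 model: chi2(13) = 18.309; n = 450 is an assumed, illustrative sample size.
value = rmsea(chi2=18.309, df=13, n=450)  # close to the reported 0.030
```

By convention, RMSEA values below 0.05 (as in both years here) indicate close model fit.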

Discussion

The main aim of the present study was to deepen our understanding of LLS in a foreign language learning context. Therefore, first, we identified the strategy use preferences in the sample and specified the most and least often used strategies among children with different proficiency levels. Second, we examined the children’s LLS use in connexion with their foreign language attitude, proficiency and general school achievement. Our results confirmed some findings from previous studies and also established new relationships among the variables.

Regarding the general strategy use preferences of the sample, the students reported moderate use of the six strategy categories. The use of indirect strategies, more precisely, metacognitive, affective and social strategies, was the highest in Year 5, while metacognitive, cognitive and social strategies were the most frequently employed in Year 8. These findings shed light on the different preferences among the different ages and proficiency levels. While affective strategies play a significant role in Year 5, cognitive strategies become more dominant later. Metacognitive and social strategies remained the most frequently used in both Years. Our result is consistent with those reported by Dawadi (2017) who discovered similar strategy preferences. We can also reinforce Alhaysony’s (2017) results that high school sample did not engage in affective strategies, and Charoento’s (2016) findings about the low use of memory strategies.

We also examined the differences in strategy use among students with different proficiency levels in both years. In Year 5, the analysis demonstrated significant differences in strategy use in four areas: the memory, cognitive, metacognitive and social fields. We noted no significant differences among children in compensation and affective strategies. As regards memory strategies, we observed that low-achieving children rarely employed them. Low achievers used cognitive strategies significantly less often than good and high performers. As our results showed, the most excellent learners are also metacognitive strategy users, and they engage in social strategies significantly more often. In Year 8, we observed significant differences in every field among children with different proficiencies. As in Year 5, the use of metacognitive and social strategies was the most frequent among the high-achieving students; however, cognitive strategy use was also relatively high. Charoento (2016) and Rao (2016) reported the same results, so we can confirm their previous research outcomes that high achievers avail themselves of strategies significantly more frequently than low-performing learners.
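Group comparisons of this kind are typically tested with a one-way analysis of variance; the excerpt does not name the exact test, so the sketch below is generic and the strategy-use scores are invented:

```python
def one_way_f(groups):
    """F statistic for a one-way ANOVA over a list of score groups."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical strategy-use scores (1-5 scale) for low, average and high achievers
low = [2.1, 2.4, 2.0, 2.6]
average = [2.9, 3.2, 3.0, 3.4]
high = [3.8, 4.1, 3.9, 4.3]
f_stat = one_way_f([low, average, high])  # a large F signals group differences
```

A large F relative to the F(k−1, n−k) distribution corresponds to the "significant differences" reported between proficiency groups.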

We also investigated the relationship between LLS and foreign language attitude, LLS and the foreign language mark, and LLS and general school achievement. According to our results, we found that children who prefer foreign language learning reported significantly higher strategy use. As regards foreign language marks, the relationships between different kinds of strategy users and their foreign language marks were low. Children with high proficiency did not necessarily employ each of the strategies at a higher rate. The same result was reached by Chen (2009) . The relationship between affective strategies and school achievement was not significant. We observed that children who use LLS have positive attitudes toward language learning. So our findings partly confirmed previous results reported by Jabbari and Golkar (2014) and Platsidou and Kantaridou (2014) .

We also considered the impact of strategy use on foreign language learning attitudes, proficiency and general school achievement. In Year 5, the effect of the questionnaire fields on foreign language attitude was considerably high; attitudes were strongly influenced by metacognitive strategies, and the effect of social strategies was also high. While memory and cognitive strategies showed positive paths to attitudes, compensation and affective strategies indicated negative effects on attitudes. Foreign language attitudes had the same size of effect on foreign language marks as these marks did on general achievement. A lower but significant effect of metacognitive strategies on general school achievement was also found in Year 5.

In Year 8, we found similar tendencies. The effect of metacognitive strategies on foreign language attitudes was very high, while that of memory strategies was low. The effect of social strategies was lost in Year 8. The impact of foreign language attitude on the foreign language mark was almost the same as in Year 5, but that of the foreign language mark on general school achievement was twice as high. Shawer (2016) likewise highlighted what our results have also shown: strategy use has a significant effect on general school achievement. Metacognitive strategies also had a direct effect on foreign language marks. On the whole, not only did we observe a strong use of metacognitive strategies, but the effect of metacognitive strategies on attitudes was also dominant in both years. Moreover, metacognitive strategies influenced school achievement in Year 5 and foreign language marks in Year 8.

To sum up, our results demonstrated that, as in other studies, our Hungarian sample showed a significant preference for metacognitive strategy use. Compensation strategies were the least frequently preferred in Year 5 and memory strategies were the least common in Year 8, a finding which also reinforced previous research outcomes ( Doró and Habók, 2013 ). We observed significant differences between more and less proficient students in strategy use. In line with other research ( Platsidou and Kantaridou, 2014 ), we conclude that more proficient learners avail themselves of a broader range of strategies than less proficient students and that strategy use has a significant effect on foreign language marks.

The research focused on the whole language process in connexion with several other factors among young students. The added value of our research is not only that we discovered relationships between factors required for foreign language learning, but direct and indirect underlying effects have also been brought to light through path analysis. These analyses provide a comprehensive view both of the dominant role of metacognitive strategies and of the foreign language learning process generally.

In spite of its value, the study has certain limitations. First, we employed a self-report instrument for data collection which does not address students’ deeper views on learning. Qualitative methods would make it possible to gain a more detailed understanding of foreign language learning through interviews, including think-aloud procedures and classroom observations. Second, the current research into LLS and proficiency among Hungarian students was conducted with participants from two different years at the lower secondary school level, so generalisation of the results is limited. In addition, our sample was not representative. Further research would be necessary to fully examine the relationship between language learning strategies, language learning attitudes, foreign language proficiency and general achievement among Hungarian students in a variety of years and in a larger sample.

Third, the current research used only two measures of proficiency, the foreign language mark and general achievement, which are awarded by different teachers. In future work, we will collect a wider range of language proficiency data, including language proficiency tests and interviews. Fourth, a comparison of LLS and general learning strategies would produce a more nuanced overview of students’ strategy use.

Conclusion and Pedagogical Implications

The main purpose of the present study was to ascertain the effect of LLS on other variables, such as foreign language attitude, foreign language proficiency and general school achievement among secondary school children in Hungary at the beginning and end of lower secondary school. In the beginner phase of learning foreign languages, it is important to better understand the relationship between language learning and related factors. Hence, our main objective was to provide a complex overview of these measurement points and to examine how LLS can support children in the first phase of the language learning process.

We used the Hungarian translation of Oxford’s Strategy Inventory for Language Learning questionnaire and supplemented it with the children’s self-reports of their foreign language attitudes and proficiency indicated by their foreign language mark and school achievement. This provided the basis for our research.

Past research has demonstrated that students who use LLS more frequently have a better chance of becoming more proficient language learners. It has been pointed out that students who are more proficient engage in a wider range of strategies and select learning strategies depending on the learning task. Thus, teachers are encouraged to introduce a range of strategies so that children can select those that are most appropriate to their personality and relevant to the learning task. At this age, introducing LLS is particularly important for children with low and average foreign language marks. It is essential to motivate children to discover a variety of ways to practise their foreign language and to find opportunities to read and engage in conversations with others. Children who recognise the significance of language learning and use a broad range of strategies can find new ways and opportunities to practise the language and improve their proficiency. Hence, we highly recommend integrating LLS consciously into foreign language lessons.

Ethics Statement

This study was carried out in accordance with the recommendations of the University of Szeged. According to these recommendations, participation in the study was voluntary for both schools and students. The participating schools obtained the parents’ consent to their students’ engagement in the research. Under Hungarian law, it is the schools’ responsibility to conclude a written agreement with parents documenting their consent to allow their children to take part in research. The whole process is permitted and coordinated by the municipalities that maintain the schools. The agreements are documented and stored in written form at the schools. The authors declare that data collection and handling strictly adhered to the usual standards of research ethics as approved by the University of Szeged.

Author Contributions

AH and AM substantially contributed to the conception and design of the study, data collection, analysis and interpretation of data for the research. Both have written the manuscript and reviewed all parts of the manuscript. AH and AM have given final approval of the final version to be published. AH and AM agree to be accountable for all aspects of the work.

Funding

The research was funded by the University of Szeged.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Alhaysony, M. (2017). Language learning strategies use by Saudi EFL students: the effect of duration of English language study and gender. Theory Pract. Lang. Stud. 7, 18–28. doi: 10.17507/tpls.0701.03

Al-Qahtani, M. F. (2013). Relationship between English language, learning strategies, attitudes, motivation, and students’ academic achievement. Educ. Med. J. 5, 19–29. doi: 10.5959/eimj.v5i3.124

Arbuckle, J. L. (2008). AMOS (Version 17.0) [Computer Software] . Chicago, IL: SPSS.

Ardasheva, Y., and Tretter, T. R. (2013). Strategy inventory for language learning–ELL student form: testing for factorial validity. Mod. Lang. J. 97, 474–489. doi: 10.1111/j.1540-4781.2013.12011.x

Byrne, B. M. (2010). Structural Equation Modelling Using AMOS. Basic Concepts, Applications, and Programming , 2nd Edn. New York: Routledge.

Chamot, A. U. (2004). Issues in language learning strategy research and teaching. Electron. J. Foreign Lang. Teachnol. 1, 14–26.

Charoento, M. (2016). Individual learner differences and language learning strategies. Contemp. Educ. Res. J. 7, 57–72.

Chen, M. (2014). Age differences in the use of language learning strategies. Engl. Lang. Teach. 7, 144–151. doi: 10.5539/elt.v7n2p144

Chen, M. L. (2009). Influence of grade level on perceptual learning style preferences and language learning strategies of Taiwanese English as a foreign language learners. Learn. Individ. Dif. 19, 304–308. doi: 10.1016/j.lindif.2009.02.004

Dawadi, S. (2017). Language learning strategies profiles of EFL learners in Nepal. Eur. J. Educ. Soc. Sci. 2, 42–55.

Doró, K., and Habók, A. (2013). Language learning strategies in elementary school: the effect of age and gender in an EFL context. J. Linguist. Lang. Teach. 4, 25–37.

Griffiths, C., and Oxford, R. (2014). The twenty-first century landscape of language learning strategies: introduction to this special issue. System 43, 1–10. doi: 10.1016/j.system.2013.12.009

Griffiths, C., and Incecay, G. (2016). “New directions in language learning strategy research: engaging with the complexity of strategy use,” in New Directions in Language Learning Psychology , eds C. Gkonou, D. Tatzl, and S. Mercer (Berlin: Springer), 25–38. doi: 10.1007/978-3-319-23491-5_3

Gunning, P., and Oxford, R. L. (2014). Children’s learning strategy use and the effects of strategy instruction on success in learning ESL in Canada. System 43, 82–100. doi: 10.1016/j.system.2013.12.012

Jabbari, M. J., and Golkar, N. (2014). The relationship between EFL learners’ language learning attitudes and language learning strategies. Int. J. Linguist. 6, 161–167. doi: 10.5296/ijl.v6i3.5837

Khaldieh, S. A. (2000). Learning strategies and writing processes of proficient vs. less-proficient learners of Arabic. Foreign Lang. Ann. 33, 522–533. doi: 10.1111/j.1944-9720.2000.tb01996.x

Kline, R. B. (2015). Principles and Practice of Structural Equation Modeling , 4th Edn. New York, NY: Guilford Press.

Liu, J. (2010). Language learning strategies and its training model. Int. Educ. Stud. 3, 100–104. doi: 10.5539/ies.v3n3p100

Magogwe, J. M., and Oliver, R. (2007). The relationship between language learning strategies, proficiency, age, and self-efficacy beliefs: a study of language learners in Botswana. System 35, 338–352. doi: 10.1016/j.system.2007.01.003

Oxford, R. L. (1990). Language Learning Strategies: What Every Teacher Should Know . Boston, MA: Heinle and Heinle.

Oxford, R. L. (2016). Teaching and Researching Language Learning Strategies: Self-Regulation in Context . New York, NY: Routledge.

Oxford, R. L., and Burry-Stock, J. A. (1995). Assessing the use of language learning strategies worldwide with the ESL/EFL version of the strategy inventory for language learning (SILL). System 23, 1–23. doi: 10.1016/0346-251X(94)00047-A

Pfenninger, S. E., and Singleton, D. (2017). Beyond Age Effects in Instructional L2 Learning: Revisiting the Age Factor . Clevedon: Multilingual Matters. doi: 10.21832/PFENNI7623

Platsidou, M., and Kantaridou, Z. (2014). The role of attitudes and learning strategy use in predicting perceived competence in school-aged foreign language learners. J. Lang. Lit. 5, 253–260. doi: 10.7813/jll.2014/5-3/43

Platsidou, M., and Sipitanou, A. (2014). Exploring relationships with grade level, gender and language proficiency in the foreign language learning strategy use of children and early adolescents. Int. J. Res. Stud. Lang. Learn. 4, 83–96. doi: 10.5861/ijrsll.2014.778

Rao, Z. (2016). Language learning strategies and English proficiency: interpretations from information-processing theory. Lang. Learn. J. 44, 90–106. doi: 10.1080/09571736.2012.733886

Shang, H. F. (2010). Reading strategy use, self-efficacy and EFL reading comprehension. Asian EFL J. 12, 18–42.

Shawer, S. F. (2016). Four language skills performance, academic achievement, and learning strategy use in preservice teacher training programs. TESOL J. 7, 262–303. doi: 10.1002/tesj.202

Wong, L. L. C., and Nunan, D. (2011). The learning styles and strategies of effective language learners. System 39, 144–163. doi: 10.1016/j.system.2011.05.004

Wu, Y. L. (2008). Language learning strategies used by students at different proficiency levels. Asian EFL J. 10, 75–95.

Keywords : language learning strategy, foreign language attitude, foreign language mark, general school achievement, lower secondary students

Citation: Habók A and Magyar A (2018) The Effect of Language Learning Strategies on Proficiency, Attitudes and School Achievement. Front. Psychol. 8:2358. doi: 10.3389/fpsyg.2017.02358

Received: 06 July 2017; Accepted: 26 December 2017; Published: 11 January 2018.

Copyright © 2018 Habók and Magyar. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Anita Habók, [email protected]

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.


Title: LoRA: Low-Rank Adaptation of Large Language Models

Abstract: An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at this https URL .
Comments: Draft V2 includes better baselines, experiments on GLUE, and more on adapter latency
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
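The abstract's core idea, freezing the pre-trained weight W and learning only a rank-r update BA, can be sketched in plain Python. The dimensions, initialisation and scaling details below are illustrative, not the paper's exact configuration:

```python
import random

random.seed(0)

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n), both as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

d, rank = 8, 2  # model width and LoRA rank (rank << d)

# Frozen pre-trained weight W (d x d); trainable factors A (rank x d) and B (d x rank)
W = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(d)]
A = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d)]  # B starts at zero, so the update BA is zero

def lora_forward(x):
    """h = x W^T + x (BA)^T: the frozen base output plus the low-rank update."""
    base = matmul(x, transpose(W))
    delta = matmul(matmul(x, transpose(A)), transpose(B))
    return [[b_ + d_ for b_, d_ in zip(br, dr)] for br, dr in zip(base, delta)]

full_params, lora_params = d * d, 2 * d * rank  # 64 trainable weights shrink to 32
```

Only A and B receive gradients; at these toy dimensions the trainable parameter count already halves, and at GPT-3 scale the same construction yields the 10,000-fold reduction the abstract reports.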




  • View all journals
  • Explore content
  • About the journal
  • Publish with us
  • Sign up for alerts
Open access | Published: 05 June 2024

Scaling neural machine translation to 200 languages

Nature (2024)


Subjects: Communication, Computer science

The development of neural techniques has opened up new avenues for research in machine translation. Today, neural machine translation (NMT) systems can leverage highly multilingual capacities and even perform zero-shot translation, delivering promising results in terms of language coverage and quality. However, scaling quality NMT requires large volumes of parallel bilingual data, which are not equally available for the 7,000+ languages in the world 1 . Focusing on improving the translation qualities of a relatively small group of high-resource languages comes at the expense of directing research attention to low-resource languages, exacerbating digital inequities in the long run. To break this pattern, here we introduce No Language Left Behind—a single massively multilingual model that leverages transfer learning across languages. We developed a conditional computational model based on the Sparsely Gated Mixture of Experts architecture 2 , 3 , 4 , 5 , 6 , 7 , which we trained on data obtained with new mining techniques tailored for low-resource languages. Furthermore, we devised multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. We evaluated the performance of our model over 40,000 translation directions using tools created specifically for this purpose—an automatic benchmark (FLORES-200), a human evaluation metric (XSTS) and a toxicity detector that covers every language in our model. Compared with the previous state-of-the-art models, our model achieves an average of 44% improvement in translation quality as measured by BLEU. By demonstrating how to scale NMT to 200 languages and making all contributions in this effort freely available for non-commercial use, our work lays important groundwork for the development of a universal translation system.


The recent advent of neural machine translation (NMT) has pushed translation technologies to new frontiers, but its benefits are unevenly distributed 1 . The vast majority of improvements made have mainly benefited high-resource languages, leaving many low-resource languages behind. (For the purpose of our research, we define a high-resource language as a language for which we have at least 1 million sentences of aligned textual data (or bitext) with another language). This disparity could largely be attributed to a data gap: NMT models typically require large volumes of data to produce quality translations and, by definition, these volumes are not available for lower-resource languages. The No Language Left Behind (NLLB-200) project seeks to overcome this limitation by leveraging previously unknown approaches for building massively multilingual models with cross-lingual transfer abilities 8 , 9 , thereby enabling related languages to learn from each other 1 , 10 , 11 .

It has now been widely acknowledged that multilingual models have demonstrated promising performance improvement over bilingual models 12 . However, the question remains whether massively multilingual models can enable the representation of hundreds of languages without compromising quality. Our results demonstrate that doubling the number of supported languages in machine translation and maintaining output quality are not mutually exclusive endeavours. Our final model—which includes 200 languages and three times as many low-resource languages as high-resource ones—performs, as a mean, 44% better than the previous state-of-the-art systems. This paper presents some of the most important data-gathering, modelling and evaluation techniques used to achieve this goal.

First, compared with their high-resource counterparts, training data for low-resource languages are expensive and logistically challenging to procure 13 , 14 , 15 . Publicly available digital resources are either limited in volume or difficult for automated systems to detect (particularly in large public web datasets such as CommonCrawl). Beyond the need to collect a critical mass of human-translated seed data, sufficient data acquisition relies on large-scale data mining and monolingual data pipelines 16 , 17 , 18 , 19 . The latter techniques are often affected by noise and biases, making it tedious to validate the quality of the datasets they generate 20 . In NLLB-200, we show that a distillation-based sentence encoding technique, LASER3 (ref. 21), facilitates the effective mining of parallel data for low-resource languages.

Second, on the modelling side, we use an assemblage of seed, mined, open-source and back-translated datasets to train multilingual conditional computational models (more specifically, Sparsely Gated Mixtures-of-Experts models 2 , 3 , 4 , 5 , 6 , 7 that enable cross-lingual transfer between related languages without increasing interference between unrelated languages). We show how we can achieve state-of-the-art performance with a more optimal trade-off between cross-lingual transfer and interference, and improve performance for low-resource languages.

Finally, for the purpose of quality evaluation, we created FLORES-200—a massive multilingual benchmark that enables the measurement of translation quality across any of the approximately 40,000 translation directions covered by the NLLB-200 models. Apart from automatic metrics, we also created Cross-lingual Semantic Text Similarity (XSTS) and Evaluation of Toxicity (ETOX). XSTS is a human evaluation protocol that provides consistency across languages; ETOX is a tool to detect added toxicity in translations using toxicity word lists.

Beyond creating these models, we also reflect on the potential societal impact of NLLB. To amplify the practical applicability of our work in service of low-resource-speaking communities, we provide all the benchmarks, data, code and models described in this effort as resources freely available for non-commercial use ( https://github.com/facebookresearch/fairseq/tree/nllb ) (see Data and Code availability statements for details).

Automatically creating translation training data

The current techniques used for training translation models are difficult to extend to low-resource settings, in which aligned bilingual textual data (or bitext data) are relatively scarce 22 . Many low-resource languages are supported only by small targeted bitext data consisting primarily of translations of the Christian Bible 23 , which provide limited domain diversity.

To build a large-scale parallel training dataset that covers hundreds of languages, our approach centres around extending existing datasets by first collecting non-aligned monolingual data. Then, we used a semantic sentence similarity metric to guide a large-scale data mining effort aiming to identify sentences that have a high probability of being semantically equivalent in different languages 18 .

Language identification for monolingual data collection

Collecting monolingual data at scale requires a language identification (LID) system that accurately classifies textual resources for all NLLB-200 languages. Although LID could be seen as a solved problem in some domains 24 , it remains an open challenge for web data 25 , 26 . Specifically, issues coalesce around domain mismatch 26 , similar language disambiguation 27 and successful massively multilingual scaling 28 .

Devoted attention to advancing LID techniques led to a noticeable increase in both language coverage and accuracy over time. CLD3 ( https://github.com/google/cld3 ) and fasttext 29 are two readily available models offering high detection performance for 107 and 187 languages, respectively. By using numerous public datasets, previous studies 30 , 31 report even higher coverage—464 and 1,366 languages, respectively. Another study 32 scales LID performance up to 1,629 languages using word lists and self-supervision to bootstrap training data found on the web. However, these approaches using found data suffer from domain imbalance. That is, because the available text domains vary by language, classifiers conflate different domains with different languages.

In our work, we curated FLORES-200 to use as a development set so that our LID system performance 33 is tuned over a uniform domain mix. Our approach combines a data-driven fasttext model trained on FLORES-200 with a small set of handwritten rules to address human feedback on classification errors. These rules are specifically mentioned in section 5.1.3 of ref.  34 and include linguistic filters to mitigate the learning of spurious correlations due to noisy training samples while modelling hundreds of languages.

We compare our LID model with three publicly available models: CLD3, LangId ( https://github.com/saffsd/langid.py ) and LangDetect ( https://pypi.org/project/langdetect/ ). Table 1 reports the performance on three cascading sets of languages intersecting with NLLB-200: (1) 51 languages also supported by LangId, LangDetect and CLD3; (2) 78 languages also supported by LangId and CLD3; (3) 95 languages also supported by CLD3. We also report false-positive rates (FPR) to reflect the impact of false positives on unseen languages. Our results show that our model is equipped to handle all 200 languages found in FLORES-200 while achieving notably higher performance than LangId, LangDetect and CLD3. Furthermore, the gain in F1 score is accompanied by a notable improvement in FPR, suggesting a much stronger fit for extracting low-resource languages from web corpora 32 .

Mining for bitext

Previous work 35 notes that translation quality generally increases with the amount of high-quality training data, which is difficult to procure when working with low-resource languages. Existing parallel corpora for low-resource languages are often conveniently drawn from known multilingual collections, such as the Christian Bible or the publications of multinational organizations, which are limited in quantity and domain. To overcome this problem, we created training datasets through global bitext mining in publicly available web content (drawn from repositories such as CommonCrawl). The underlying idea of our bitext mining approach is first to learn a multilingual sentence embedding space and use a similarity measure in that space to decide whether two sentences are parallel. This comparison can be done for all possible pairs in two collections of monolingual texts.
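The core mining decision can be illustrated with a minimal sketch: embed sentences from two monolingual collections into a shared space, then keep nearest-neighbour pairs whose similarity clears a threshold. The function names, the toy greedy search and the threshold value are illustrative assumptions; the production pipeline uses LASER-family encoders and large-scale approximate nearest-neighbour search, not an exhaustive loop.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mine_bitext(src_embs, tgt_embs, threshold=0.8):
    """Greedy mining sketch: pair each source sentence with its nearest
    target sentence and keep the pair if similarity clears the threshold."""
    pairs = []
    for i, u in enumerate(src_embs):
        j, score = max(
            ((j, cosine(u, v)) for j, v in enumerate(tgt_embs)),
            key=lambda t: t[1],
        )
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs
```

In practice the same comparison is made for all candidate pairs across two monolingual corpora, which is why a shared, stable embedding space matters so much.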

As our mining approach requires a multilingual embedding space, there are several challenges when scaling this representation to all NLLB-200 languages. First, we had to ensure that all languages were well learnt and that we accounted for large imbalances in available training data. Second, training a massively multilingual sentence encoder from scratch each time a new set of languages is introduced is computationally expensive. Furthermore, the main drawback of this approach is that the learnt embedding spaces from each new model are not necessarily mutually compatible. This can make mining intractable as for each new encoder, the entirety of available monolingual data needs to be re-embedded (for example, for English alone, this means thousands of millions of sentences and considerable computational resources). We solved this problem using a teacher–student approach 21 that extends the LASER embedding space 36 to all NLLB-200 languages. Languages are trained either as individual students or together with languages from the same family. The training of students follows the approach described in ref.  21 .

Our approach enables us to focus on the specifics of each language while taking advantage of related languages, which is crucial for dealing with very low-resource languages. (A language is defined as very low-resource if it has fewer than 100,000 samples across all pairings with any other language in our dataset). Using this method, we generated more than 1,100 million new sentence pairs of training data for 148 languages. This additional training data, paired with back translation (a conventional technique for data augmentation in NMT; ref. 37), ushered in notable improvements in translation quality—specifically, +12.5 chrF++ (ref. 38) for translating very low-resource languages into English. For more details, see Supplementary Information D .

Modelling

Even with marked data volume increases, the main challenge of low-resource translation is for training models to adequately represent 200 languages while adjusting to variable data capacity per language pair. Apart from techniques such as data augmentation (for example, with back translation) and self-supervision strategies on monolingual data, we used conditional computational models—more specifically, Sparsely Gated Mixture of Experts (henceforth MoE)—to minimize interference between unrelated language directions.

MoE transformer models differ from dense transformer models in that some of the feed-forward network layers are replaced with MoE layers in both the encoder and the decoder. An MoE layer consists of E experts (each is a feed-forward network) and a gating network to decide how to route input tokens to experts. The transformer encoder–decoder model, supplemented with MoE layers and their respective gating networks, learns to route input tokens to the corresponding top two experts by optimizing a linearly weighted combination of label-smoothed cross entropy 39 and an auxiliary load balancing loss 6 .
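The top-2 routing described above can be sketched in a few lines: a gating network scores every expert for a token, only the two highest-scoring experts are evaluated, and their outputs are combined under renormalized gate probabilities. This is a schematic pure-Python illustration (the expert functions and gate weights are toy assumptions, not the trained transformer layers).

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_layer(token, experts, gate_weights):
    """Route one token to its top-2 experts and combine their outputs,
    weighted by the renormalized gate probabilities."""
    # Gate logits: one linear score per expert.
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    probs = softmax(logits)
    top2 = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:2]
    norm = sum(probs[e] for e in top2)
    out = [0.0] * len(token)
    for e in top2:
        y = experts[e](token)  # only the chosen experts run
        out = [o + (probs[e] / norm) * yi for o, yi in zip(out, y)]
    return out, top2
```

The conditional part is the point: unchosen experts are never evaluated, so capacity can grow with the number of experts while per-token compute stays roughly constant.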

We find that vanilla MoE models with overall dropout are suboptimal for low-resource languages and significantly overfit on low-resource pairs. To remedy this issue, we designed Expert Output Masking (EOM), a regularization strategy specific to MoE architectures, and compared it with existing regularization strategies, such as Gating Dropout 40 . We find that Gating Dropout performs better than vanilla MoE with overall dropout but is outperformed by EOM.

To further reduce overfitting on low-resource language pairs, we devised a curriculum learning approach that introduces language pairs in phases during model training. Pairs that empirically overfit within K updates are introduced K updates before the end of training. This reduces overfitting while allowing pairs that benefit from additional training to continue their learning. Table 2 shows that combining curriculum learning and EOM improves performance, especially on low and very low-resource language pairs (see section ‘Modelling’ for more details).
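A minimal sketch of this phased schedule, assuming we already know, for each pair, the number of updates K within which it empirically overfits (the pair names and step counts below are illustrative):

```python
def curriculum_start_steps(total_updates, overfit_updates):
    """For each language pair, compute the training step at which it is
    introduced: a pair that empirically overfits within K updates enters
    K updates before the end of training."""
    return {
        pair: max(0, total_updates - k)
        for pair, k in overfit_updates.items()
    }

def active_pairs(step, start_steps):
    # Pairs included in the training mixture at a given step.
    return sorted(p for p, s in start_steps.items() if step >= s)
```

A robust high-resource pair (K equal to the full budget) trains from step 0, while a fragile low-resource pair only joins near the end, just long enough to learn without overfitting.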

To understand how MoE models are helpful for multilingual machine translation, we visualize similarities of experts in the MoE layers using heat maps (Fig. 1a–d ). These heat maps demonstrate that in late decoder layers (Fig. 1d ), languages are being separated (that is, dispatched to different sets of experts). Moreover, we observe that languages within the same family are highly similar in their choice of experts (that is, the late decoder MoE layers are language-specific). This is particularly the case for the Arabic dialects (the six rows and columns in the top-left corner), languages in the Benue–Congo subgrouping, as well as languages in the Devanagari script. By contrast, the early decoder MoE layers (Fig. 1c ) seem to be less language-specific. The late encoder MoE layers are particularly language-agnostic in how they route tokens as can be attested by the uniform heat map in Fig. 1b .

figure 1

a–d, Similarity of expert choice in the first (a) and last (b) encoder layers, and in the first (c) and last (d) decoder layers. The similarity is measured with respect to the gating decisions (expert choice) per language (source side in the encoder and target side in the decoder). Lighter colours represent higher expert similarity and, hence, more language-agnostic processing.

Combining data (see section ‘ Automatically creating translation training data ’) and modelling contributions, Table 3 shows that NLLB-200 outperforms the nearest state-of-the-art system by almost +7.3 spBLEU (ref.  41 ) on average, constituting a 44% improvement. We then compared NLLB-200 with a few other state-of-the-art models, such as Deepnet 42 and M2M-100 (ref.  1 ), to report scores for 87 languages against FLORES-101. On this smaller subset, NLLB-200 again outperforms by +7.0 spBLEU on average. Overall, the results show that NLLB-200 improves on state-of-the-art systems by a notable margin despite supporting 200 languages, or twice as many languages (and more than 30,000 additional directions) compared with any previous work. We also show in additional experiments that NLLB-200 is a general-purpose NMT model, transferable to other domains by fine-tuning on small quantities of high-quality bitexts (see Supplementary Information E.3 ).

Evaluations

Among the many aspects of model performance that can be evaluated 43 , this section emphasizes three aspects that have a marked impact on the overall quality assessment: benchmarks for automatic evaluation, human evaluation protocols and toxicity evaluation.

A benchmark for automatic evaluation using FLORES-200

The quality of NMT outputs is typically evaluated by automatic metrics such as BLEU 44 or spBLEU 41 . The computation of automatic quality scores using these metrics requires benchmark datasets that provide gold-standard human translations as references. In turn, the apples-to-apples evaluation of different approaches made possible by these benchmark datasets gives us a better understanding of what requires further research and development. For example, creating benchmark data sets at the Workshop on Machine Translation (WMT) 45 led to rapid progress in translation directions such as English to German and English to French.

For massively multilingual NMT, the largest benchmark dataset available was FLORES-101, which supports roughly half the number of languages in NLLB-200. The necessary expansion of FLORES-101 to FLORES-200 constitutes a further challenge in terms of quality assurance, in part because of differences in standardization practices and limited access to professional translators for all languages involved. To overcome this challenge, we adapted our workflow to pay particular attention to quality assurance mechanisms. The FLORES-200 workflow consists of four phases: (1) alignment; (2) translation, initial quality assurance and iteration(s); (3) final quality assurance; and (4) completion. A language FLORES-200 set is considered ready after passing a final human quality test with a 90 out of 100 quality score (that is, independent raters agreed with 90% of the FLORES-200 reference translations in that direction).

As a result of this redesigned workflow, we produced a three-split (dev, devtest, test) data set of parallel human reference translations for all NLLB-200 languages meeting the 90% quality threshold in a maximum turnaround time of 287 days (119 days on average, 70 days minimum). (Note that to avoid leakage with our models, we filtered data from FLORES and other evaluation benchmarks used (such as WMT and IWSLT) from our training data. This was done by comparing the hashes of training sentences against those of evaluation sentences, using the xxHash algorithm). Please refer to Supplementary Information C for more details on the evaluation process. Figure 2 shows the quality scores for all languages, some of which are labelled as examples.
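The leakage filter can be sketched as a set-membership test over sentence hashes. In this sketch hashlib's SHA-1 stands in for the xxHash algorithm the paper actually uses, and the whitespace/case normalization is an assumption:

```python
import hashlib

def sentence_hash(sentence):
    # Stand-in for xxHash: any fast, stable hash of the normalized sentence.
    return hashlib.sha1(sentence.strip().lower().encode("utf-8")).hexdigest()

def filter_leakage(train_sentences, eval_sentences):
    """Drop any training sentence whose hash matches an evaluation
    sentence, preventing benchmark leakage into the training data."""
    eval_hashes = {sentence_hash(s) for s in eval_sentences}
    return [s for s in train_sentences if sentence_hash(s) not in eval_hashes]
```

Hashing makes the comparison cheap at corpus scale: the evaluation set is held in memory once as a set of digests, and each training sentence costs one hash and one lookup.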

figure 2

Quality assurance scores for the languages in FLORES-200. The minimum acceptable standard is 90%.

Reliable human evaluation

State-of-the-art automatic metrics often fail to capture aspects of language that, while subtle, can have a notable bearing on translation quality. Human evaluations are, therefore, essential to ensuring meaningful quality assessments 46 . That said, relying on them comes with two challenges: (1) any large-scale human evaluation of NMT quality, regardless of the number of translation directions involved, contends with potentially low inter-evaluator agreement (in the vicinity of 0.5 kappa); and (2) massively multilingual NMT introduces another complexity—that of quality evaluation consistency across language directions. We address these two issues by developing XSTS 47 , a new scoring metric focused on meaning, and by using a protocol that allows for the calibration of scores across evaluators and language pairs.

XSTS is a human evaluation protocol inspired by STS 48 , emphasizing meaning preservation over fluency. XSTS uses a five-point scale, in which 1 is the lowest score, and 3 represents the acceptability threshold. To ensure consistency not only across languages but also among different evaluators of any given language, we included the same subset of sentence pairs in the full set of sentence pairs given to each evaluator, making it possible to calibrate results.
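One simple way to use the shared subset for calibration, shown here as an illustrative sketch (the paper does not prescribe this exact formula), is to shift each evaluator's scores so that their mean on the shared items matches the pooled mean across evaluators:

```python
from statistics import mean

def calibrate_scores(evaluator_scores, shared_ids):
    """Offset-calibrate evaluators using a shared subset of sentence pairs:
    shift each evaluator's scores so that their mean on the shared items
    equals the pooled mean over all evaluators."""
    shared_means = {
        ev: mean(scores[i] for i in shared_ids)
        for ev, scores in evaluator_scores.items()
    }
    pooled = mean(shared_means.values())
    return {
        ev: {i: s + (pooled - shared_means[ev]) for i, s in scores.items()}
        for ev, scores in evaluator_scores.items()
    }
```

A harsh evaluator's scores are shifted up and a lenient one's down, so differences that remain reflect the translations rather than the raters.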

We find that automated metrics such as spBLEU and chrF++ correlate reasonably well with calibrated human evaluations of translation quality, as shown in Fig. 3 . Spearman’s R correlation coefficients between aggregated XSTS and spBLEU, chrF++ (corpus) and chrF++ (average sentence-level) are 0.710, 0.687 and 0.694, respectively. Other correlation coefficients (Kendall’s τ and Pearson’s R ) have the same ordering. Corpus spBLEU provides the best nominal correlation, followed by average sentence-level chrF++.

figure 3

a , The relationship between spBLEU and XSTS. b , The relationship between chrF++ and XSTS. c , The relationship between average sentence-level chrF++ and XSTS. All automated scores were computed only on the sentences evaluated for a given model and translation direction (either the full FLORES-200 dataset or a subset). NLLB-200 refers to a 55B parameter MoE model, and NLLB-200 Baseline refers to a dense 3.3B parameter model.

We also find that calibrated human evaluation scores correlate more strongly with automated scores than uncalibrated human evaluation scores across all automated metrics and choices of correlation coefficient. In particular, uncalibrated human evaluation scores have a Spearman’s R correlation coefficient of 0.625, 0.607 and 0.611 for spBLEU, chrF++ (corpus) and chrF++ (average sentence-level), respectively.

Overall, a sample of 55 language directions were evaluated, including 8 into English, 27 out of English and 20 other direct language directions. The overall mean of calibrated XSTS scores was 4.26, with 38/55 directions scoring over 4.0 (that is, high quality) and 52/55 directions scoring over 3.0.

Toxicity evaluation

We hypothesize that added toxicity may be due to the presence of toxicity in the training data, and we used our detectors to estimate, more specifically, unbalanced toxicity in the bitext data. We find that estimated levels of unbalanced toxicity vary from one corpus of bitext to the next and that unbalanced toxicity can be largely attributed to misaligned bitext. In other words, training with this misaligned bitext could encourage mistranslations with added toxicity.

To mitigate this issue, we designed a bitext filtering procedure based on the detection of multiple instances of added toxicity (that is, cases in which one sentence in the bitext pair contains at least two more toxic items than the other sentence in the pair). (A previous detector quality analysis showed that a higher precision was reached in this situation). We added this toxicity filtering procedure as an option to the filtering process and experimented with or without it for comparison.
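A sketch of this filtering rule, with toy word lists standing in for the per-language ETOX toxicity lists:

```python
def count_toxic(sentence, toxicity_list):
    # Count word-list hits; real ETOX lists are curated per language.
    words = sentence.lower().split()
    return sum(1 for w in words if w in toxicity_list)

def filter_unbalanced_toxicity(bitext, src_list, tgt_list, margin=2):
    """Keep a sentence pair only if the difference in toxic-item counts
    between its two sides stays below the margin (the paper filters pairs
    in which one side has at least two more toxic items than the other)."""
    kept = []
    for src, tgt in bitext:
        if abs(count_toxic(src, src_list) - count_toxic(tgt, tgt_list)) < margin:
            kept.append((src, tgt))
    return kept
```

Requiring a difference of at least two items targets likely misalignments while sparing pairs where both sides are legitimately (and equally) toxic.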

The experimental results on the FLORES-200 dev set for 10 translation directions (from and into English for Somali, Southern Sotho, Twi, Umbundu and Venetian) show that after filtering around 30% of parallel sentences on average, translation quality (chrF++) improves by 5% and added toxicity (ETOX) falls by the same amount. The filtering pipeline that includes toxicity filtering therefore not only reduces the number of toxic items in the translation output but also improves overall translation performance.

Discussion

In 2016, the United Nations declared internet access a basic human right. Although the intent of this declaration was to limit censorship and allow for information and ideas to flow without interference, much of the internet today remains inaccessible to many due to language barriers. Our effort was designed to contribute one solution to help alter this status quo.

For many low-resource language communities, NLLB-200 is one of the first models designed to support translation into or out of their languages. Although applications of these new translation capabilities could be found in several domains of everyday life, we believe their impact would be most significant in a domain such as education. In formal educational settings, for instance, students and educators belonging to low-resource language groups could, with the help of NLLB-200, tap into more books, research articles and archives than before. Within the realms of informal learning, low-resource language speakers could experience greater access to information from global news outlets and social media platforms, as well as online encyclopaedias such as Wikipedia. Access to machine translation motivates more low-resource language writers or content creators to share localized knowledge or various aspects of their culture. Giving individuals access to new translation tools could thus open up opportunities for bidirectional learning, thereby also challenging Western-centric modes of knowledge production and dissemination, ultimately aiding in revitalizing certain minority cultures and languages.

Since launching NLLB-200, we can already see the impact of the model across many directions. Four months after the launch of NLLB-200, Wikimedia reported that our model was the third most used machine translation engine used by Wikipedia editors (accounting for 3.8% of all published translations) ( https://web.archive.org/web/20221107181300/https://nbviewer.org/github/wikimedia-research/machine-translation-service-analysis-2022/blob/main/mt_service_comparison_Sept2022_update.ipynb ). Compared with other machine translation services and across all languages, articles translated with NLLB-200 have the lowest deletion rate (0.13%) and the highest share of translations whose modification rate is kept under 10%.

In many ways, the composition of the NLLB-200 effort speaks to the centrality of interdisciplinarity in shaping our vision. Machine translation and AI advancements lie at the intersection of technological, cultural and societal development, and thus require scholars with diverse training and standpoints to fully comprehend every angle 49 , 50 . It is our hope that in future iterations, NLLB-200 continues to include scholars from fields underrepresented in the world of machine translation and AI, particularly those from humanities and social sciences backgrounds. More importantly, we hope that teams developing these initiatives would come from a wide range of race, gender and cultural identities, much like the communities whose lives we seek to improve.

Finally, we want to emphasize that overcoming the challenges that prevent the web from being accessible to speakers of all languages requires a multifaceted approach. At the technical level, NLLB-200 overcomes many data, modelling and evaluation challenges in NMT research, but it still has its limitations, some of which are documented in Supplementary Information G . As a single technological intervention, NLLB-200 is but one piece of a massive puzzle; policy interventions aimed at more fundamental issues surrounding education, internet access and digital literacy are imperative to eradicate the structural problem of language disparities.

Methods

This section describes the steps taken to design our language identification system and bitext mining protocol.

Language identification

To train language identification models, we used fasttext 33 , 51 , which has been widely used for text classification tasks because of its simplicity and speed. We embedded character-level n -grams from the input text and leveraged a multiclass linear classifier on top. The lightweight nature of fasttext enables our LID models to handle web-scale data. Furthermore, a linear model has the benefit of being easily explainable, allowing us to trace any classification error back to its root cause. This is instrumental in addressing common pitfalls that arise when detecting language on web corpora 32 .

Classifier design

We experimented with two different designs. First, we used a combination of multiple binary classifiers in which the final decision was obtained by selecting the language with the highest score after applying a threshold. We applied threshold optimization so that when the confidence of a classifier is low, the corresponding language is not considered for the final decision. A sentence was filtered out if none of the classifiers surpassed its threshold. Second, we built a multiclass classifier using softmax over all possible languages. In this case, the threshold optimization is done after the softmax.

Our results directed us to focus on the second approach, which offers several advantages. First, changing the threshold for one language did not affect the performance of the others (which is not true in the first setting). Second, this approach generalizes better to out-of-domain data, which is our primary use case (Wikipedia → web data). Finally, a single classifier has the added benefit of being computationally simpler, thus streamlining the language identification process.
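The second design can be sketched directly: one softmax over all languages, followed by a per-language threshold applied after the softmax, with the sentence filtered out when the winning language fails its threshold. The logits, language codes and threshold values below are illustrative:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def classify(logits, languages, thresholds, default_threshold=0.5):
    """Multiclass LID decision: softmax over all languages, then a
    per-language threshold. Returns None when the best language fails
    its threshold (that is, the sentence is filtered out)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    lang = languages[best]
    if probs[best] < thresholds.get(lang, default_threshold):
        return None
    return lang
```

Because the thresholds sit on top of a single shared softmax, tightening one language's threshold only changes when that language is emitted; it never perturbs the scores of the others.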

Training data and handling massive class imbalance

We used publicly available datasets to train our LID system, partially covering our languages of interest. The public datasets deployed were mostly built from web pages such as CommonCrawl. We then supplemented these with NLLB-Seed data (Supplementary Information B ) for any missing languages. However, this supplementation is insufficient to ensure balance in the raw training data 32 , 30 . For example, English alone represents 10.1% of our training data, whereas Minangkabau (Latin script) represents only 0.06%. Following ref. 10, we experimented with multiple settings of temperature upsampling for underrepresented languages, in which sentences from a language l representing p_l per cent of the dataset are sampled proportionally to p_l^(1/T). Optimal performance was obtained at 1/T = 0.3 (for more details, see section 5.1 of ref. 34).
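The upsampling rule can be made concrete with a short sketch: compute each language's data share p_l, raise it to the exponent 1/T = 0.3 and renormalize. The toy counts are illustrative:

```python
def temperature_sampling_probs(counts, exponent=0.3):
    """Temperature upsampling: a language whose data share is p_l is
    sampled proportionally to p_l ** exponent (the paper's 1/T, with
    best performance at 0.3), flattening the distribution toward
    underrepresented languages."""
    total = sum(counts.values())
    weights = {l: (c / total) ** exponent for l, c in counts.items()}
    z = sum(weights.values())
    return {l: w / z for l, w in weights.items()}
```

With an exponent of 1 the sampling matches the raw data shares; as it approaches 0 the distribution approaches uniform, so 0.3 trades off fidelity to the data against exposure for rare languages.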

Training parameters

Our best-performing model was trained with softmax loss over two epochs with a learning rate of 0.8 and embeddings with 256 dimensions. We discarded words with fewer than a thousand occurrences after upsampling and selected a minimum and maximum character n-gram length of two and five, respectively (n-grams were assigned a slot in buckets of size 1,000,000). (In fasttext, a ‘word’ is whatever is separated by spaces; for a non-segmenting language, the whole sentence is a single ‘word’, and we take character n-grams.) All hyperparameters were tuned on FLORES-200 dev (see section 5.1.2 of ref. 34).
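A sketch of this character n-gram featurization, with zlib's CRC32 standing in for fasttext's internal hashing into a fixed number of buckets:

```python
import zlib

def char_ngrams(text, n_min=2, n_max=5):
    # Character n-grams of the input, as used by the fasttext classifier.
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams

def ngram_bucket(gram, num_buckets=1_000_000):
    """Hash an n-gram into one of a fixed number of embedding buckets
    (a stable Python stand-in for fasttext's internal hash)."""
    return zlib.crc32(gram.encode("utf-8")) % num_buckets
```

Hashing into a fixed bucket table bounds the embedding matrix regardless of how many distinct n-grams appear across 200 scripts, at the cost of occasional collisions.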

Improving LID with linguistic analysis

Language identification is a challenging task in which numerous failure modes exist, often exacerbated by the gaps between the clean data on which LID models are trained and noisy data on which LID models are applied. In other words, LID models trained in a supervised manner on fluently written sentences may have difficulty identifying grammatically incorrect and incomplete strings extracted from the web. Furthermore, models can easily learn spurious correlations that are not meaningful for the task itself. Given these challenges, we collaborated closely with a team of linguists throughout different stages of LID development to identify proper focus areas, mitigate issues and explore solutions (see section 5.1.3 of ref.  34 ).

Bitext mining

The overall approach for bitext mining focused on starting with a massively multilingual sentence encoder teacher model and adapting it to several different low-resource student models. This approach enabled us to add low-resource languages without competing with high-resource languages for capacity. Doing so circumvents the need to retrain the entire model from scratch while maintaining compatibility with the multilingual embedding spaces for subsequent mining. Extended data Fig. 1 summarizes the overall architecture of the teacher–student approach. The teacher, LASER2, is an improved version of the open-source LASER encoder ( https://github.com/facebookresearch/LASER ). The original training procedure 36 was adapted to include SentencePiece tokenization (including a vocabulary of 7,000 tokens) and the upsampling of low-resource languages.

The architecture of the five-layer BiLSTM encoder and the max pooling method to obtain sentence embeddings were left unchanged. The training was then performed on the same 93 languages with public resources obtained from OPUS 52 . See ref.  36 for details on the original LASER training procedure. Training of the students followed the approach described in greater detail in ref.  21 , summarized below:

- students specialized in one language or several similar languages;

- students were randomly initialized, because we wanted to handle low-resource languages for which no pre-trained language model was available;

- students may have a dedicated SentencePiece vocabulary, different from the teacher's, to better accommodate the scripts and tokens of the student languages;

- as we used cosine distance for bitext mining (Fig. 1), students learnt to minimize the cosine loss with the teacher;

- students can have an MLM loss to leverage monolingual data in the student languages (Fig. 1).

Our student encoders used a 12-layer transformer with a hidden size of 1,024, four attention heads and around 250 million parameters. All students were trained on the available bitexts for their respective languages, complemented by 2 million sentences of English/English and English/Spanish data. The motivation is to anchor the students to the English embedding space; including English/Spanish bitexts from CCMatrix increases robustness and allows new languages to be learnt jointly. This technique is particularly useful when only limited amounts of bitext are available to train the students. Teacher–student training was performed on 16 GPUs with the Adam optimizer, a learning rate of 0.0005 and a batch size of 10,000. We trained student encoders for 148 languages and named these models LASER3.

Proxy metric for new encoders

Mined bitexts were subsequently used to improve translation quality for the languages of NLLB-200. However, mining and NMT training are computationally expensive, and it is intractable to perform this evaluation systematically for many different sentence encoder variants. As an evaluation proxy, we used a mining-based multilingual similarity search error rate, referred to here as xsim. In contrast to cosine accuracy, which aligns embeddings based on the highest cosine score, xsim aligns source and target embeddings based on the highest margin score, which has been shown to be beneficial in mining 53 . The margin-based score is defined as

$$\text{score}(x,y)=\text{margin}\left(\cos (x,y),\sum _{z\in N{N}_{k}(x)}\frac{\cos (x,z)}{2k}+\sum _{z\in N{N}_{k}(y)}\frac{\cos (y,z)}{2k}\right)$$

where x and y are the source and target sentences, and NN k ( x ) denotes the k nearest neighbours of x in the other language. We set k to 4. All xsim results are calculated on FLORES-200 devtest, using the ratio margin, where margin( a ,  b ) =  a / b . Moreover, all scores are calculated for translations into English (that is, xxx → eng). English is encoded by the teacher, and the other language is encoded by the LASER3 student. To facilitate further research using xsim, we also provide this evaluation method as an open-source resource ( https://github.com/facebookresearch/LASER/ ).
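A minimal sketch of the ratio-margin computation (names and the toy embeddings are illustrative; k  = 4 as above):

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def ratio_margin(x, y, x_neighbours, y_neighbours, k=4):
    """margin(a, b) = a / b, where b averages the cosine similarities of
    x and y to their k nearest neighbours in the other language."""
    a = cosine(x, y)
    b = (sum(cosine(x, z) for z in x_neighbours[:k]) / (2 * k)
         + sum(cosine(y, z) for z in y_neighbours[:k]) / (2 * k))
    return a / b

# Toy example: identical source/target embeddings whose neighbours sit at 45 degrees.
x = np.array([1.0, 0.0])
nns = [np.array([1.0, 1.0]) / np.sqrt(2)] * 4
score = ratio_margin(x, x, nns, nns)
```

For this toy pair the score is √2 ≈ 1.41, comfortably above the margin threshold of about 1.06 used later when validating mined bitexts end to end.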

End-to-end encoder evaluation

Once we had identified the best sentence encoder for each language using the xsim scores, we performed mining, added the mined data to the existing bitexts and trained a bilingual NMT system. Initial experiments indicated that a threshold on the margin of 1.06 seems to be the best compromise between precision and recall for most languages. For these NMT baselines, we do not apply extra filtering on the bitexts and leave this to the training procedure of our massively multilingual NMT system.

We did not attempt to optimize the architecture and parameters of the bilingual NMT systems to the characteristics of each language pair but used the same architecture for all. Therefore, the reported results should not be interpreted as the best possible ones given the available resources: they are mainly provided to validate the mined bitexts. We used a 12-layer encoder and decoder and trained for 100 epochs, selected the best-performing checkpoint on the FLORES-200 development set and report detokenized BLEU on the FLORES-200 devtest.

Modelling

In this section, we first describe the multilingual machine translation task setup, which includes tokenization and base model architecture. Then, we outline how we leveraged conditional computation for massively multilingual machine translation with EOM regularization and our curriculum learning (CL) strategy for low-resource languages.

We modelled multilingual NMT as a sequence-to-sequence task, in which we conditioned on an input sequence in the source language with an encoder and generated the output sequence in the expected target language with a decoder 54 . With the source sentence S , source language ℓ s , and target language ℓ t in hand, we trained to maximize the probability of the translation in the target language T —that is, P ( T ∣ S ,  ℓ s ,  ℓ t ). Below, we discuss details of the (1) tokenization of the text sequences in the source and target languages; and (2) model architecture with the input and output designed specifically for multilingual machine translation. For further details on the task setup, such as the amount of training data per language pair, please refer to Supplementary Information  F or section 8 of ref.  34 .

Segmentation with SentencePiece

To tokenize our text sequences, we trained a single SentencePiece model (SPM) 55 for all languages. We sampled a total of 100 million sentences from primary bitext data. To ensure that low-resource languages are well represented in the vocabulary, we downsampled high-resource and upsampled low-resource languages with a sampling temperature of five (ref. 10). Notably, vocabulary size is an important hyperparameter in multilingual translation models involving low-resource languages 56 , 57 , 58 . The vocabulary size of our trained SPM model is 256,000. Such a large vocabulary ensures adequate representation across the wide spectrum of languages we support.
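A hedged sketch of the corresponding SentencePiece training call (paths and prefixes are placeholders, and we assume the standard sentencepiece API; only the vocabulary size is taken from the text):

```python
# Settings from the text: a single model trained over sentences sampled from
# primary bitext data, with a 256,000-token vocabulary.
spm_params = dict(
    input="sampled_sentences.txt",  # placeholder for the 100M sampled sentences
    model_prefix="spm_200",         # placeholder output prefix
    vocab_size=256_000,
)

# With the sentencepiece package installed:
# import sentencepiece as spm
# spm.SentencePieceTrainer.train(**spm_params)
```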

Model architecture

Our sequence-to-sequence multilingual machine translation model is based on the transformer encoder–decoder architecture 59 . The encoder transforms the source token sequence into a sequence of token embeddings. Then, the decoder attends to the encoder output and autoregressively generates the target sentence token by token. More precisely, the encoder takes the sequence of tokens W  = ( w 1 , …,  w S ) and the source language ℓ s , and produces a sequence of embeddings H  = ( h 1 , …,  h S ), which are then provided to the decoder with the target language ℓ t to produce the target tokens V  = ( v 1 , …,  v T ) sequentially. In sum,

$$H=\text{encoder}(W,{\ell }_{s}),\qquad \forall i\in [1,T],\,{v}_{i}=\text{decoder}(H,{\ell }_{t},{v}_{1},\ldots ,{v}_{i-1})$$

Note that we prefixed the source sequence with the source language, as opposed to the target language, as done in previous work 10 , 60 . We did so because we prioritized optimizing the zero-shot performance of our model on any pair of 200 languages at a minor cost to supervised performance. Empirically, we find zero-shot performance to be negatively affected when conditioning the encoder on the target language. When the source is conditioned on only the source language, the encoder generalizes better to pairs of source and target languages not encountered during training 1 .
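The input/output layout described above can be sketched as follows (the special-token naming is illustrative; the exact token format may differ):

```python
# Prefix the encoder input with the *source* language token; condition the
# decoder on the *target* language token, as described in the text.

def build_mmt_io(src_tokens, src_lang, tgt_lang):
    encoder_input = [f"__{src_lang}__"] + src_tokens
    decoder_start = [f"__{tgt_lang}__"]  # decoder generates target tokens after this
    return encoder_input, decoder_start

enc, dec = build_mmt_io(["Hello", ",", "world"], "eng_Latn", "fra_Latn")
```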

Conditional computation for multilingual machine translation

A massively multilingual translation (MMT) model uses the same shared model capacity to train on several translation directions simultaneously. While doing so can lead to beneficial cross-lingual transfer between related languages, it can also add to the risk of interference between unrelated languages 1 , 61 . MoE models are a form of conditional computation 62 , 63 : they activate a subset of model parameters per input, whereas dense models activate all model parameters for every input. MoE models unlock significant representational capacity while maintaining the same inference and training efficiency in terms of FLOPs as the core dense architecture.

However, as we increase the model capacity and the computational cost per update, the propensity for low or very low-resource languages to overfit increases, thus causing performance to deteriorate. In this section, we examine how we can use Sparsely Gated Mixture of Experts models 2 , 3 , 4 , 5 , 6 , 7 to achieve a more optimal trade-off between cross-lingual transfer and interference and improve performance for low-resource languages.

Sparsely gated mixture of experts

To build our MoE models, we substitute a quarter of the encoder and decoder feed-forward network layers with MoE layers, each with E distinct experts. We followed the Top- k -Gating algorithm in ref.  4 and dispatched each token to at most k  = 2 experts. For more details on the training of MoE models, see Supplementary Information  E .
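Top-2 gating can be sketched as follows (a simplified illustration that omits the load balancing and expert-capacity constraints of the full algorithm):

```python
import numpy as np

def top2_gate(logits):
    """Route each token to its two highest-scoring experts.

    logits: array of shape (tokens, experts). Returns the chosen expert
    indices (best first) and the gate weights renormalized over the pair.
    """
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)          # softmax over experts
    top2 = np.argsort(probs, axis=-1)[:, -2:][:, ::-1]  # best two experts per token
    weights = np.take_along_axis(probs, top2, axis=-1)
    weights /= weights.sum(axis=-1, keepdims=True)      # renormalize over the pair
    return top2, weights

# One token, four experts: experts 0 and 3 have the highest logits.
idx, w = top2_gate(np.array([[3.0, 1.0, 0.0, 2.0]]))
```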

Expert output masking

In this proposed regularization strategy, we masked the expert output for a random fraction ( p eom ) of the input tokens. For input tokens with dropped expert outputs, the first and/or second expert is effectively skipped. As shown in the second panel of Extended data Fig. 2 , we masked both experts for the first token ( x 1 in red), chose not to mask any of the expert outputs for the second token ( x 2 in blue) and in the final scenario, masked only one expert for the last token ( x 3 in green).
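A minimal sketch of EOM on top of top-2 routing (shapes are illustrative; the mask is drawn independently per token and expert, so zero, one or both experts may be skipped for a given token, matching the three scenarios above):

```python
import numpy as np

def apply_eom(expert_outputs, gate_weights, p_eom, rng):
    """Zero out each routed expert's contribution with probability p_eom.

    expert_outputs: (tokens, 2, dim); gate_weights: (tokens, 2).
    """
    keep = rng.random(gate_weights.shape) >= p_eom
    masked = gate_weights * keep            # dropped experts contribute nothing
    return (expert_outputs * masked[..., None]).sum(axis=1)

rng = np.random.default_rng(0)
tok = np.ones((4, 2, 8))                    # 4 tokens, 2 routed experts, dim 8
gates = np.full((4, 2), 0.5)
```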

Curriculum learning for MMT

Orthogonal to model-side regularization methods such as dropout, we explored regularizing MMT models by means of CL. We proposed starting training with high-resource pairs first, then introducing low-resource pairs, which are prone to overfitting, in later phases. To derive the phases of the curriculum, we first trained a vanilla MoE model (without CL) and then partitioned the translation directions into n bins { b 1 , …,  b n }. If T is the total number of training updates, we introduced each bin b i after T  −  k i updates. We chose the update offsets \({({k}_{i})}_{i}\) and the directions \({({b}_{i})}_{i}\) added at each phase based on when we observed each language pair starting to overfit. See the step-based CL algorithm in ref. 64 for more on how the directions are partitioned, and Supplementary Information E.2 for the list of directions added at each stage.
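The phase schedule can be sketched as follows (the bin contents and k values here are invented for illustration):

```python
# With T total updates, bin b_i enters training at update T - k_i.

def directions_active_at(update, total_updates, bins):
    """bins: list of (k_i, directions); a bin is active once update >= T - k_i."""
    active = []
    for k_i, directions in bins:
        if update >= total_updates - k_i:
            active.extend(directions)
    return active

T = 100_000
bins = [
    (100_000, ["eng-fra"]),  # high-resource: k_1 = T, so trained from the start
    (30_000, ["eng-min"]),   # overfitting-prone: only the last 30,000 updates
]
```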

Automatic evaluation

Many automatic translation quality assessment metrics exist, including model-based ones such as COMET 65 and BLEURT 66 . Although model-based metrics have shown better correlation with human judgement in recent metrics shared tasks of the WMT 43 , they require training and are not easily extendable to a large set of low-resource languages. In this work, we rely on BLEU (and a variant of it) and chrF++. Both measures draw on the idea that translation quality can be quantified by how similar a machine translation is to one produced by a human translator.

BLEU and spBLEU

The BLEU score 44 has been the standard metric for machine translation evaluation since its inception two decades ago. It measures the overlap between machine and human translations by combining the precision of 1-grams to 4-grams with a brevity penalty. The main disadvantage of BLEU is that it is tokenization-dependent. Efforts such as sacrebleu 67 have taken strides towards standardization, supporting the use of community-standard tokenizers under the hood. However, these tokenizers do not extend to many languages. Reference 41 proposes spBLEU, a BLEU metric based on a standardized SentencePiece model (SPM) covering 101 languages, released alongside FLORES-101. In this work, we provide SPM-200 along with FLORES-200 to enable the measurement of spBLEU. (Our analyses demonstrate that there are minor differences between SPM-200 from FLORES-200 and SPM-100 from FLORES-101 when measuring on the FLORES-101 languages. The major advantage of SPM-200 is that it covers 200 languages. More details on SPM-200 are reported in section 8.1.1 of ref.  34 ).

The chrF++ score 38 overcomes the limitation of the BLEU score, which requires that a sentence can be broken up into word tokens. However, some languages, such as Chinese or Thai, do not use spaces to separate words, and word segmentation tools may not be readily available. There is also a concern about highly agglutinative languages, in which BLEU fails to assign any credit to morphological variants. chrF++ overcomes these weaknesses by basing the overlap calculation on a character-level n -gram F -score ( n ranging from 1 to 6), complemented with word unigrams and bigrams. In this work, we primarily evaluated with chrF++, using the settings from sacrebleu. However, when comparing with other published work, we used BLEU and spBLEU where appropriate.
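The character-level core of chrF can be sketched as follows. This is a simplified illustration only: it omits chrF++'s word unigrams and bigrams and sacrebleu's exact averaging details, so real evaluation should use sacrebleu's CHRF implementation.

```python
from collections import Counter

def char_ngram_f(hyp, ref, max_n=6, beta=2.0):
    """Character n-gram F-score (n = 1..6), with recall weighted by beta = 2
    as in chrF. Precision/recall are averaged over the n-gram orders."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h = Counter(hyp[i:i + n] for i in range(len(hyp) - n + 1))
        r = Counter(ref[i:i + n] for i in range(len(ref) - n + 1))
        if sum(h.values()) == 0 or sum(r.values()) == 0:
            continue  # strings shorter than n contribute nothing at this order
        overlap = sum((h & r).values())
        precisions.append(overlap / sum(h.values()))
        recalls.append(overlap / sum(r.values()))
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```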

Human evaluation methodology

When building machine translation systems for thousands of different language pairs, a core question is which pairs reach certain levels of quality. Therefore, we needed meaningful scores that are comparable across language pairs.

XSTS evaluation protocol

We adapted the recently proposed XSTS methodology 48 . In short, XSTS is a human evaluation protocol that emphasizes meaning preservation over fluency. See details on this protocol in Supplementary Information  F . For low-resource languages, translations are usually of poorer quality, and so we focused more on usable (that is, meaning-preserving) translations, even if they are not fully fluent. XSTS has been found to yield higher inter-annotator agreement 47 than Direct Assessment 68 on a 5-point scale (the original Direct Assessment protocol uses a 100-point scale). XSTS rates each source sentence and its machine translation on a 5-point scale, in which 1 is the lowest and 5 is the highest.

Calibration set

To enable meaningful scores comparable across language pairs, we asked each evaluator to provide assessments using the XSTS scale on precisely the same set of sentence pairs. This aims to identify annotators who have a systematic tendency to be more harsh or generous in their scoring and correct for this effect. The calibration set consists of the machine translation output paired with the reference translation only in English. Based on how evaluators used the XSTS scale on this calibration set, we adjusted their raw scores on the actual evaluation task to ensure consistency across evaluators. Although this monolingual calibration task does not precisely mimic the bilingual XSTS task, it is a reasonable first approximation and has been shown to increase the correlation between human and automatic metrics primarily by reducing one source of ‘noise’ in the human evaluations—the lack of score calibration between annotators.
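As a concrete illustration, one simple offset-style correction consistent with this description is shown below. The paper explored several calibration methodologies; this particular scheme is our assumption, not necessarily the one used.

```python
def calibrate(raw_scores, evaluator_calibration_mean, pooled_calibration_mean):
    """Shift an evaluator's raw XSTS scores by how far their calibration-set
    mean deviates from the mean over all evaluators on the same set."""
    offset = evaluator_calibration_mean - pooled_calibration_mean
    return [score - offset for score in raw_scores]

# A harsh evaluator (calibration mean 3.5 versus a pooled mean of 4.0)
# has their task scores shifted upward by 0.5.
adjusted = calibrate([3, 4], evaluator_calibration_mean=3.5, pooled_calibration_mean=4.0)
```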

Obtaining aggregated human quality metrics from multiple studies

To obtain an aggregate human quality metric for each language direction in an evaluation study, we take the majority XSTS score (that is, the median score) for each sentence pair and average these majority scores over all evaluated sentences. In a given study, the aggregate human evaluation score for any translation direction l s  →  l t is

$${H}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}}}=\frac{1}{{N}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}}}}\sum _{(S,T)\in {{\mathcal{T}}}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}}}}\mathop{\text{median}}\limits_{1\le i\le {M}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}}}}{X}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}},i}(S,T)$$

where l s and l t denote the source language and the target language, respectively; \({X}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}},i}(S,T)\) denotes the XSTS score of the i th evaluator who evaluates sentences in a given translation direction l s  →  l t for a source sentence S and a target sentence T ; \({M}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}}}\) denotes the total number of evaluators who evaluate the (source, translation) sentence pair ( S ,  T ) for translation direction l s  →  l t ; and \({{\mathcal{T}}}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}}}=\{({S}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}},k},{T}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}},k})| 1\le k\le {N}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}}}\}\) is the set of \({N}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}}}\) (source, translation) sentence pairs being evaluated for translation direction l s  →  l t .
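Numerically, the aggregation amounts to the following (illustrative sketch):

```python
import statistics

def aggregate_xsts(scores_per_pair):
    """Median XSTS score per sentence pair (across its evaluators), then the
    mean of those medians over all evaluated pairs in the direction."""
    medians = [statistics.median(scores) for scores in scores_per_pair]
    return sum(medians) / len(medians)

# Two sentence pairs, three evaluators each: medians 4 and 3, mean 3.5.
score = aggregate_xsts([[5, 4, 4], [3, 3, 5]])
```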

Every evaluator in a given study s is also asked to provide ratings for all or part of a calibration set \({{\mathcal{C}}}_{s}=\{({S}_{s,k},{T}_{s,k})| 1\le k\le {K}_{s}\}\) , where S s , k denotes the k th source sentence in the calibration set for evaluation study s ; T s , k denotes the translated sentence corresponding to S s , k ; and \({K}_{s}=| {{\mathcal{C}}}_{s}| \) is the number of sentence pairs in the calibration set for the study.

For each language direction evaluated in a study, we obtained the majority score on the calibration set as follows:

$${C}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}}}^{(s)}=\frac{1}{{K}_{s}}\sum _{(S,T)\in {{\mathcal{C}}}_{s}}\mathop{\text{median}}\limits_{i}{X}_{l,i}^{(s)}(S,T)$$

where \({X}_{l,i}^{(s)}(S,T)\) denotes the XSTS score provided by the i th evaluator, for the language direction l s  →  l t , in study s , for a given source sentence S and a translated sentence T in the calibration set \({{\mathcal{C}}}_{s}\) of the study.

To obtain aggregated calibrated XSTS scores on the language direction level, we explored several different calibration methodologies. None of the calibration methods we investigated showed a marked difference in correlation with automated scores, and all calibration methodologies we explored provided superior correlation compared with uncalibrated XSTS scores. For more details on these calibration methodologies, see section 7.2 of ref.  34 .

Added toxicity detection for 200 languages

To enable toxicity detection at scale, we used a detector based on word lists. In this section, we provide more details about our toxicity definition and describe the detector (ETOX) and associated word lists.

Toxic content

Owing to the subjective nature of toxicity, definitions of toxic language can vary. We included items that are commonly referred to as vulgar or profane language. (Note that vulgar or profane language is not always necessarily toxic. Some common slang, for instance, may be considered vulgar but is not necessarily toxic). Moreover, we also included items associated with depictions of pornographic content or sexual acts, some frequently used hate speech expressions and some expressions tied to bullying. We also included items, vulgar or not, referring to body parts that are commonly associated with sexual practices.

The ETOX detector

We started with the assumption that general-purpose machine translation systems should remain faithful to the source content and not add any toxic elements during the translation process. We define toxic elements as word tokens or short phrases present in our lists. ETOX identifies added toxicity using the following two criteria: number of toxic items and matched or non-matched toxicity. A toxic item is considered detected if it is present in a line and surrounded by spaces or the start or end of a line. ETOX tracks the number of unique toxic items found in a line but does not count a phrase again if it has multiple occurrences. Matched toxicity indicates that the number of toxic items is the same in both the source and the translated content (that is, no added toxicity). Added toxicity is an instance of non-matched toxicity in which more toxic items are found in the translation output than in the source. For non-segmenting languages or some languages that use complex diacritics, space tokenization is insufficient to distinguish words from one another. In those cases, we used SentencePiece tokenization of both the sentence and toxicity word list.
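The two ETOX criteria can be sketched as follows (a space-tokenized illustration with a placeholder word list; as noted above, the real detector falls back to SentencePiece tokenization for non-segmenting languages):

```python
def count_toxic_items(line, toxicity_list):
    """Number of unique list items present as space-delimited tokens."""
    tokens = set(line.split())              # repeated occurrences count once
    return sum(1 for item in toxicity_list if item in tokens)

def has_added_toxicity(source, translation, toxicity_list):
    """Added toxicity: more toxic items in the translation than in the source."""
    return count_toxic_items(translation, toxicity_list) > count_toxic_items(source, toxicity_list)

toxicity_list = {"badword"}                 # harmless placeholder entry
```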

Toxicity-200 lists

Lists are based on professional translations from English, which were then heuristically adapted by linguists to better serve the target language. As toxicity is culturally sensitive, attempting to find equivalents in a largely multilingual setting constitutes a challenge when starting from one source language. To address this issue, translators were allowed to forgo translating some of the source items and add more culturally relevant items.

In the initial release of the Toxicity-200 lists, the average number of items in a toxicity detection list was 271 entries, whereas the median number of entries was 143. The median may be a better measure of central tendency than the mean, given that languages with a rich inflectional morphology constitute extreme outliers (for example, the Czech list had 2,534 entries and the Polish list 2,004). The shortest list had 36 entries, and the longest 6,078.

Data availability

All data generated and described in the Article and its Supplementary Information are available at GitHub ( https://github.com/facebookresearch/fairseq/tree/nllb ) 69 as follows. The FLORES-200 dataset contains human-translated evaluation data in 204 languages. The NLLB-Seed database contains human-translation seed training data in 39 languages (Supplementary Information I ). The NLLB-MD database contains human-translated seed data in different domains in six languages to assess generalization (Supplementary Information J ). The Toxicity-200 database contains wordlists to detect toxicity in 200 languages. Mined bitext database contains publicly available web data for 148 English-centric and 1,465 non-English-centric language pairs. Publicly available data used to train NLLB models with references to download the data are listed in Supplementary Table 2 .

Code availability

To make our work available to the community, we provide the following models and supporting code as resources freely available for non-commercial use, available at GitHub ( https://github.com/facebookresearch/fairseq/tree/nllb ) 69 as follows. The translation models cover 200 languages; the NLLB models come in multiple sizes (54.5B MoE, 3.3B and 1.3B Dense, and 1.3B and 600M distilled). The language identification models contain more than 200 languages. LASER3 comprises sentence encoders for identifying aligned bitext for 148 languages. Stopes consists of a data-mining library that can be used to process and clean monolingual data, followed by the creation of aligned bitext. Scripts to recreate our training data and training and generation scripts to reproduce our models are also included.

Fan, A. et al. Beyond English-centric multilingual machine translation. J. Mach. Learn. Res. 22, 1–48 (2021).

Du, N. et al. GlaM: efficient scaling of language models with mixture-of-experts. In Proceedings of the 39th International Conference on Machine Learning Vol. 162, 5547–5569 (PMLR, 2022).

Hwang, C. et al. Tutel: adaptive mixture-of-experts at scale. In 6th Conference on Machine Learning and Systems (MLSys, 2023).

Lepikhin, D. et al. GShard: scaling giant models with conditional computation and automatic sharding. In International Conference on Learning Representations (ICLR, 2021).

Lewis, M., Bhosale, S., Dettmers, T., Goyal, N. & Zettlemoyer, L. BASE layers: simplifying training of large, sparse models. In Proc. 38th International Conference on Machine Learning Vol. 139, 6265–6274 (PMLR, 2021).

Shazeer, N. et al. Outrageously large neural networks: the sparsely-gated mixture-of-experts layer. In Proc. 2017 International Conference on Learning Representations (ICLR) 1–19 (ICLR, 2017).

Zoph, B. et al. ST-MoE: designing stable and transferable sparse expert models. Preprint at https://arxiv.org/abs/2202.08906 (2022).

Zoph, B., Yuret, D., May, J. & Knight, K. Transfer learning for low-resource neural machine translation. In Proc. 2016 Conference on Empirical Methods in Natural Language Processing (eds Su, J. et al.) 1568–1575 (Association for Computational Linguistics, 2016).

Nguyen, T. Q. & Chiang, D. Transfer learning across low-resource, related languages for neural machine translation. In Proc. Eighth International Joint Conference on Natural Language Processing Vol. 2 (eds Kondrak, G. & Watanabe, T.) 296–301 (Asian Federation of Natural Language Processing, 2017).

Arivazhagan, N. et al. Massively multilingual neural machine translation in the wild: findings and challenges. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies Vol. 1, 3874–3884 (Association for Computational Linguistics, 2019).

Zhang, B., Williams, P., Titov, I. & Sennrich, R. Improving massively multilingual neural machine translation and zero-shot translation. In Proc. 58th Annual Meeting of the Association for Computational Linguistics (eds Jurafsky, D. et al.) 1628–1639 (ACL, 2020).

Tran, C. et al. Facebook AI’s WMT21 news translation task submission. In Proc. Sixth Conference on Machine Translation (eds Barrault, L.) 205–215 (ACL, 2021); https://aclanthology.org/2021.wmt-1.19 .

Orife, I. et al. Masakhane – machine translation for Africa. Preprint at https://arxiv.org/abs/2003.11529 (2020).

Kuwanto, G. et al. Low-resource machine translation training curriculum fit for low-resource languages. Preprint at https://arxiv.org/abs/2103.13272 (2021).

Nekoto, W. et al. Participatory research for low-resourced machine translation: a case study in African languages. In Findings of the Association for Computational Linguistics: EMNLP 2020 (eds Cohn, T. et al.) 2144–2160 (ACL, 2020).

Karakanta, A., Dehdari, J. & van Genabith, J. Neural machine translation for low-resource languages without parallel corpora. Mach. Transl. 32 , 167–189 (2018).

Bañón, M. et al. ParaCrawl: web-scale acquisition of parallel corpora. In Proc. 58th Annual Meeting of the Association for Computational Linguistics (eds Jurafsky, D. et al.) 4555–4567 (ACL, 2020).

Schwenk, H. et al. CCMatrix: mining billions of high-quality parallel sentences on the web. In Proc. 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing Vol. 1 (eds Zong, C. et al.) 6490–6500 (ACL, 2021).

Ramesh, G. et al. Samanantar : the largest publicly available parallel corpora collection for 11 Indic languages. Trans. Assoc. Comput. Linguist. 10 , 145–162 (2022).

Kreutzer, J. et al. Quality at a glance: an audit of web-crawled multilingual datasets. Trans. Assoc. Comput. Linguist. 10 , 50–72 (2022).

Heffernan, K., Çelebi, O. & Schwenk, H. Bitext mining using distilled sentence representations for low-resource languages. Preprint at https://arxiv.org/abs/2205.12654 (2022).

Gowda, T., Zhang, Z., Mattmann, C. & May, J. Many-to-English machine translation tools, data, and pretrained models. In Proc. 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations (eds Ji, H. et al.) 306–316 (ACL, 2021).

McCarthy, A. D. et al. The Johns Hopkins University Bible corpus: 1600+ tongues for typological exploration. In Proc. 12th Language Resources and Evaluation Conference (eds Calzolari, N. et al.) 2884–2892 (European Language Resources Association, 2020); https://aclanthology.org/2020.lrec-1.352 .

McNamee, P. Language identification: a solved problem suitable for undergraduate instruction. J. Comput. Sci. Coll. 20 , 94–101 (2005).

Abadji, J., Suárez, P. J. O., Romary, L. & Sagot, B. Towards a cleaner document-oriented multilingual crawled corpus. Preprint at https://arxiv.org/abs/2201.06642 (2022).

Widdows, D. & Brew, C. Language identification with a reciprocal rank classifier. Preprint at https://arxiv.org/abs/2109.09862 (2021).

Goutte, C., Léger, S., Malmasi, S. & Zampieri, M. Discriminating similar languages: evaluations and explorations. Preprint at http://arxiv.org/abs/1610.00031 (2016).

Jauhiainen, T., Lindén, K. & Jauhiainen, H. Evaluation of language identification methods using 285 languages. In Proc. 21st Nordic Conference on Computational Linguistics (eds. Tiedemann, J. & Tahmasebi, N.) 183–191 (2017).

Grave, É., Bojanowski, P., Gupta, P., Joulin, A. & Mikolov, T. Learning word vectors for 157 languages. In Proc. 11th International Conference on Language Resources and Evaluation (LREC 2018) (eds Calzolari, N. et al.) (ELRA, 2018).

Dunn, J. Mapping languages: the corpus of global language use. Lang. Resour. Eval. 54 , 999–1018 (2020).

Brown, R. D. Non-linear mapping for improved identification of 1300+ languages. In Proc. 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (eds Moschitti, A. et al.) 627–632 (ACL, 2014).

Caswell, I., Breiner, T., van Esch, D. & Bapna, A. Language ID in the wild: unexpected challenges on the path to a thousand-language web text corpus. In Proc. 28th International Conference on Computational Linguistics (eds Scott, D. et al.) 6588–6608 (International Committee on Computational Linguistics, 2020); https://aclanthology.org/2020.coling-main.579 .

Joulin, A., Grave, E., Bojanowski, P. & Mikolov, T. Bag of tricks for efficient text classification. In Proc. 15th Conference of the European Chapter of the Association for Computational Linguistics Vol. 2 (eds Lapata, M. et al.) 427–431 (ACL, 2017).

NLLB Team et al. No language left behind: scaling human-centered machine translation. Preprint at https://arxiv.org/abs/2207.04672 (2022).

Koehn, P. & Knowles, R. Six challenges for neural machine translation. In Proc. First Workshop on Neural Machine Translation (eds Luong, T. et al.) 28–39 (ACL, 2017).

Artetxe, M. & Schwenk, H. Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. Trans. Assoc. Comput. Linguist. 7 , 597–610 (2019).

Sennrich, R., Haddow, B. & Birch, A. Improving neural machine translation models with monolingual data. In Proc. 54th Annual Meeting of the Association for Computational Linguistics (ACL) Vol. 1 (eds Erk, K. & Smith, N. A.) 86–96 (ACL, 2016).

Popović, M. chrF++: words helping character n-grams. In Proc. Second Conference on Machine Translation Vol. 2 (eds Bojar, O. et al.) 612–618 (ACL, 2017).

Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. & Wojna, Z. Rethinking the inception architecture for computer vision. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2818–2826 (IEEE, 2016).

Liu, R., Kim, Y. J., Muzio, A., Mozafari, B. & Awadalla, H. H. Gating dropout: communication-efficient regularization for sparsely activated transformers. In Proceedings of the 39th International Conference on Machine Learning (PMLR, 2022).

Goyal, N. et al. The Flores-101 evaluation benchmark for low-resource and multilingual machine translation. Trans. Assoc. Comput. Linguist. 10 , 522–538 (2022).

Wang, H. et al. DeepNet: scaling transformers to 1,000 layers. In IEEE Transactions on Pattern Analysis and Machine Intelligence https://doi.org/10.1109/TPAMI.2024.3386927 (IEEE, 2024)

Freitag, M. et al. Results of the WMT21 metrics shared task: evaluating metrics with expert-based human evaluations on TED and news domain. In Proc. Sixth Conference on Machine Translation (eds Barrault, L. et al.) 733–774 (ACL, 2021); https://aclanthology.org/2021.wmt-1.73 .

Papineni, K., Roukos, S., Ward, T. & Zhu, W.-J. Bleu: a method for automatic evaluation of machine translation. In Proc. 40th annual meeting of the Association for Computational Linguistics (eds Isabelle, P. et al.) 311–318 (ACL, 2002).

Akhbardeh, F. et al. Findings of the 2021 conference on machine translation (WMT21). In Proc. Sixth Conference on Machine Translation (eds Barrault, L. et al.) 1–88 (ACL, 2021); https://aclanthology.org/2021.wmt-1.1 .

Kocmi, T. et al. To ship or not to ship: an extensive evaluation of automatic metrics for machine translation. In Proc. Sixth Conference on Machine Translation (eds Barrault, L. et al.) 478–494 (ACL, 2021).

Licht, D. et al. Consistent human evaluation of machine translation across language pairs. In Proc. 15th Biennial Conference of the Association for Machine Translation in the Americas Vol. 1, 309–321 (Association for Machine Translation in the Americas, 2022).

Agirre, E. et al. SemEval-2012 task 6: a pilot on semantic textual similarity. In Proc. *SEM 2012: The First Joint Conference on Lexical and Computational Semantics Vols 1–2 (eds Agirre, E. et al.) 385–393 (ACL, 2012).

Kusters, R. et al. Interdisciplinary research in artificial intelligence: challenges and opportunities. Front. Big Data 3, 577974 (2020).

Wang, S., Cooper, N., Eby, M. & Jo, E. S. From human-centered to social-centered artificial intelligence: assessing ChatGPT’s impact through disruptive events. Preprint at https://arxiv.org/abs/2306.00227 (2023).

Bojanowski, P., Grave, E., Joulin, A. & Mikolov, T. Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017).

Tiedemann, J. Parallel data, tools and interfaces in OPUS. In Proc. Eighth International Conference on Language Resources and Evaluation (eds Calzolari, N. et al.) 2214–2218 (ACL, 2012).

Artetxe, M. & Schwenk, H. Margin-based parallel corpus mining with multilingual sentence embeddings. In Proc. 57th Annual Meeting of the Association for Computational Linguistics (eds Korhonen, A.) 3197–3203 (ACL, 2019).

Bahdanau, D., Cho, K. H. & Bengio, Y. Neural machine translation by jointly learning to align and translate. In Proc. 3rd International Conference on Learning Representations (ICLR, 2015).

Kudo, T. & Richardson, J. SentencePiece: a simple and language independent subword tokenizer and detokenizer for neural text processing. In Proc. 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 (eds Blanco, E. & Lu, W.) 66–71 (ACL, 2018); https://doi.org/10.18653/v1/d18-2012.

Gu, J., Hassan, H., Devlin, J. & Li, V. O. Universal neural machine translation for extremely low resource languages. In Proc. 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies Vol. 1 (eds Walker, M. et al.) 344–354 (ACL, 2018); https://aclanthology.org/N18-1032.

Wang, X., Pham, H., Arthur, P. & Neubig, G. Multilingual neural machine translation with soft decoupled encoding. Preprint at https://arxiv.org/abs/1902.03499 (2019).

Rajab, J. Effect of tokenisation strategies for low-resourced Southern African languages. In 3rd Workshop on African Natural Language Processing (ICLR, 2022).

Vaswani, A. et al. Attention is all you need. In Proc. 31st Conference on Neural Information Processing Systems 5998–6008 (NIPS, 2017).

Johnson, M. et al. Google’s multilingual neural machine translation system: enabling zero-shot translation. Trans. Assoc. Comput. Linguist. 5, 339–351 (2017).

Conneau, A. et al. Unsupervised cross-lingual representation learning at scale. In Proc. 58th Annual Meeting of the Association for Computational Linguistics (eds Jurafsky, D. et al.) 8440–8451 (ACL, 2020).

Bengio, Y., Léonard, N. & Courville, A. C. Estimating or propagating gradients through stochastic neurons for conditional computation. Preprint at http://arxiv.org/abs/1308.3432 (2013).

Almahairi, A. et al. Dynamic capacity networks. In Proc. 33rd International Conference on International Conference on Machine Learning Vol. 48, 2091–2100 (PMLR, 2016).

Elbayad, M., Sun, A. & Bhosale, S. Fixing MoE over-fitting on low-resource languages in multilingual machine translation. In Findings of the Association for Computational Linguistics: ACL 2023 (eds Rogers, A. et al.) 14237–14253 (ACL, 2023); https://aclanthology.org/2023.findings-acl.897.

Rei, R., Stewart, C., Farinha, A. C. & Lavie, A. COMET: a neural framework for MT evaluation. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (eds Webber, B. et al.) 2685–2702 (ACL, 2020).

Sellam, T., Das, D. & Parikh, A. BLEURT: learning robust metrics for text generation. In Proc. 58th Annual Meeting of the Association for Computational Linguistics (eds Jurafsky, D. et al.) 7881–7892 (ACL, 2020).

Post, M. A call for clarity in reporting BLEU scores. In Proc. Third Conference on Machine Translation: Research Papers (eds Bojar, O. et al.) 186–191 (ACL, 2018); https://aclanthology.org/W18-6319.

Graham, Y., Baldwin, T., Moffat, A. & Zobel, J. Continuous measurement scales in human evaluation of machine translation. In Proc. 7th Linguistic Annotation Workshop and Interoperability with Discourse (eds Graham, Y. et al.) 33–41 (ACL, 2013).

NLLB Team et al. No Language Left Behind: scaling human-centered machine translation. GitHub https://github.com/facebookresearch/fairseq/tree/nllb (2022).

Acknowledgements

We thank the following interns for their contributions to the project: C. Baziotis, D. Dua, A. Guo, O. Ignat, A. Kamran, T. Mohiuddin, A. N. Rubungo, S. Sun, S. Tan, H. Xu, S. Wu and Y. Zhang. We are grateful to all the Wikimedia Foundation staff and volunteers who worked with us and provided helpful feedback on our project. We thank V. Chaudhary for help with the data pipeline; E. Grave for his help in scaling fasttext to all FLORES-200 languages; M. Diab for her work on XSTS; L. Specia for her feedback on toxicity and XSTS; J. Ferrando and C. Escolano for their help in using the ALTI+ method; G. Chang, C.-J. Wu and R. Raghavendra for helping us to compute the CO2 cost of training our models; A. Sridhar for helping with FSDP; S. Jeschonek, G. Anantharaman, D. Sarina, J. Colombo, S. Krishnan, D. Kannappan, K. Saladi, V. Pai, A. Yajurvedi and S. Sengupta for their assistance with training infrastructure; K. Johnson for his help with UXR studies and model evaluation; B. O’Horo and J. Kao for their generative insights and guidance; P. Fung, N. Usunier, S. Riedel, S. Sengupta and E. Dinan for their helpful feedback on the paper. We would also like to thank A. Bordes, M. Zannoli and C. Moghbel for their overall support of this project. Finally, we are indebted to the translators, reviewers, human evaluators, linguists, as well as the translation and quality assurance agencies we partnered with, for helping to create FLORES-200, NLLB-Seed, NLLB-MD and Toxicity-200; performing human evaluations; and teaching us about their native languages.

Author information

Authors and Affiliations

Foundational AI Research (FAIR), Meta, Paris, France

Marta R. Costa-jussà, Onur Çelebi, Guillaume Wenzek, Loic Barrault, Shannon Spruit, Pierre Andrews, Alexandre Mourachko & Holger Schwenk

Foundational AI Research (FAIR), Meta, New York, NY, USA

James Cross, Angela Fan, Philipp Koehn & Safiyyah Saleem

Foundational AI Research (FAIR), Meta, Menlo Park, CA, USA

Maha Elbayad, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Al Youngblood, Bapi Akula, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Chau Tran, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Christophe Ropers & Jeff Wang

Foundational AI Research (FAIR), Meta, London, UK

Kenneth Heafield & Kevin Heffernan

University of California, Berkeley, CA, USA

Skyler Wang

Johns Hopkins University, Baltimore, MD, USA

Philipp Koehn

Contributions

B.A., P.A., O.Ç., K. Heafield, K. Heffernan, S.J., H.S. and G.W. contributed to the data workstream of the project, which includes developing tools to facilitate data mining, cleaning and consolidation. L.B., S.B., J.C., M.E., V.G., J.M., K.R.S., A.S. and C.T. conducted research and experiments that gave rise to the models in this work. M.R.C., C.G., J.H., E.K., P.K., D.L., D.R., S.Spruit., S.W. and A.Y. implemented automatic and human evaluations of NLLB, including but not limited to quality, bias and toxicity. G.M.G., P.H., J.L. and C.R. performed all linguistics work in this project. N.F.A., S.E., A.F., F.G., A.M., S.S. and J.W. provided crucial technical and organizational leadership to help materialize this overall project. M.R.C., C.R., M.E. and S.W. prepared the paper for publication.

Corresponding author

Correspondence to Marta R. Costa-jussà.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks David Adelani, Sunipa Dev and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Architecture of the LASER3 teacher-student approach.

See ref. 21 for more details.

Extended Data Fig. 2 Illustration of EOM (panel c) in contrast to overall dropout (panel b) for MoE layers.

A color represents a token, and each token is dispatched to two experts (Top-2-Gating) depending on the gating decision (panel a). Faded colors correspond to dropped units or masked outputs.
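The Top-2-Gating of panel a and the difference between EOM (panel c) and overall dropout (panel b) can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the paper's implementation: the gating logits are random stand-ins for a learned gate, and the function names `top2_gating` and `expert_output_masking` are ours.

```python
import math
import random

random.seed(0)
NUM_EXPERTS = 4

def top2_gating(logits):
    """Top-2-Gating (panel a): softmax over the two highest-scoring
    experts; every other expert gets zero routing weight."""
    top2 = sorted(range(len(logits)), key=lambda e: logits[e])[-2:]
    z = sum(math.exp(logits[e]) for e in top2)
    return [math.exp(l) / z if e in top2 else 0.0
            for e, l in enumerate(logits)]

def expert_output_masking(weights, p=0.2):
    """EOM (panel c): with probability p, mask an expert's entire
    contribution for this token, instead of dropping individual
    activations the way overall dropout does (panel b)."""
    return [w if random.random() > p else 0.0 for w in weights]

# Route five tokens, each represented by random gating logits.
tokens = [[random.gauss(0, 1) for _ in range(NUM_EXPERTS)] for _ in range(5)]
routing = [top2_gating(t) for t in tokens]
masked = [expert_output_masking(r) for r in routing]
```

Each row of `routing` has exactly two nonzero weights summing to one; after masking, a token may temporarily lose one or both of its experts, which is the regularization effect the figure depicts.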

Supplementary information

Supplementary Information

This file contains Supplementary Information Sections A–K and Supplementary References – see Supplementary Contents page for details.

Peer Review File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

NLLB Team. Scaling neural machine translation to 200 languages. Nature (2024). https://doi.org/10.1038/s41586-024-07335-x

Received: 08 May 2023

Accepted: 19 March 2024

Published: 05 June 2024

DOI: https://doi.org/10.1038/s41586-024-07335-x

Will Knight

OpenAI Offers a Peek Inside the Guts of ChatGPT

ChatGPT developer OpenAI’s approach to building artificial intelligence came under fire this week from former employees who accuse the company of taking unnecessary risks with technology that could become harmful.

Today, OpenAI released a new research paper apparently aimed at showing it is serious about tackling AI risk by making its models more explainable. In the paper, researchers from the company lay out a way to peer inside the AI model that powers ChatGPT. They devise a method of identifying how the model stores certain concepts—including those that might cause an AI system to misbehave.

Although the research makes OpenAI’s work on keeping AI in check more visible, it also highlights recent turmoil at the company. The new research was performed by the recently disbanded “superalignment” team at OpenAI that was dedicated to studying the technology’s long-term risks.

The former group’s coleads, Ilya Sutskever and Jan Leike—both of whom have left OpenAI—are named as coauthors. Sutskever, a cofounder of OpenAI and formerly chief scientist, was among the board members who voted to fire CEO Sam Altman last November, triggering a chaotic few days that culminated in Altman’s return as leader.

ChatGPT is powered by a family of so-called large language models called GPT, based on an approach to machine learning known as artificial neural networks. These mathematical networks have shown great power to learn useful tasks by analyzing example data, but their inner workings cannot be scrutinized as easily as those of conventional computer programs. The complex interplay between the layers of “neurons” within an artificial neural network makes reverse engineering why a system like ChatGPT came up with a particular response hugely challenging.

“Unlike with most human creations, we don’t really understand the inner workings of neural networks,” the researchers behind the work wrote in an accompanying blog post. Some prominent AI researchers believe that the most powerful AI models, including ChatGPT, could perhaps be used to design chemical or biological weapons and coordinate cyberattacks. A longer-term concern is that AI models may choose to hide information or act in harmful ways in order to achieve their goals.

OpenAI’s new paper outlines a technique that lessens the mystery a little, by identifying patterns that represent specific concepts inside a machine learning system with help from an additional machine learning model. The key innovation lies in making that additional network, which is trained to pick out concepts inside the system of interest, more efficient to run.

OpenAI proved out the approach by identifying patterns that represent concepts inside GPT-4, one of its largest AI models. The company released code related to the interpretability work, as well as a visualization tool that can be used to see how words in different sentences activate concepts, including profanity and erotic content, in GPT-4 and another model. Knowing how a model represents certain concepts could be a step toward being able to dial down those associated with unwanted behavior, to keep an AI system on the rails. It could also make it possible to tune an AI system to favor certain topics or ideas.
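The idea of using a second, smaller model to surface concepts can be sketched as a sparse autoencoder over recorded activations. What follows is a toy illustration under our own assumptions: the weights are random stand-ins, the sizes are made up, and names such as `encode_sparse` are hypothetical. The actual method trains such weights on activations recorded from models like GPT-4.

```python
import random

random.seed(1)
D_MODEL, D_FEATURES, K = 8, 32, 4   # made-up sizes for illustration

# Random stand-ins for trained weights; the real approach learns them by
# minimising reconstruction error on activations recorded from the model.
W_enc = [[random.gauss(0, 1) for _ in range(D_FEATURES)] for _ in range(D_MODEL)]
W_dec = [[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(D_FEATURES)]

def encode_sparse(activation):
    """Expand one activation vector into D_FEATURES candidate 'concept'
    activations (ReLU), keeping only the K strongest. The sparsity is
    what makes individual features inspectable as concepts."""
    h = [max(0.0, sum(a * W_enc[i][j] for i, a in enumerate(activation)))
         for j in range(D_FEATURES)]
    keep = set(sorted(range(D_FEATURES), key=lambda j: h[j])[-K:])
    return [h[j] if j in keep and h[j] > 0 else 0.0 for j in range(D_FEATURES)]

def decode(features):
    """Map the sparse feature vector back to an approximate activation."""
    return [sum(f * W_dec[j][i] for j, f in enumerate(features))
            for i in range(D_MODEL)]

activation = [random.gauss(0, 1) for _ in range(D_MODEL)]
features = encode_sparse(activation)      # at most K active 'concepts'
reconstruction = decode(features)
```

In a trained version, finding which inputs most strongly activate a given feature is what lets researchers attach a human-readable label to it, and then dial that concept up or down.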

Even though LLMs defy easy interrogation, a growing body of research suggests they can be poked and prodded in ways that reveal useful information. Anthropic, an OpenAI competitor backed by Amazon and Google, published similar work on AI interpretability last month. To demonstrate how the behavior of AI systems might be tuned, the company’s researchers created a chatbot obsessed with San Francisco’s Golden Gate Bridge. And simply asking an LLM to explain its reasoning can sometimes yield insights.

“It’s exciting progress,” says David Bau, a professor at Northeastern University who works on AI explainability, of the new OpenAI research. “As a field, we need to be learning how to understand and scrutinize these large models much better.”

Bau says the OpenAI team’s main innovation is in showing a more efficient way to configure a small neural network that can be used to understand the components of a larger one. But he also notes that the technique needs to be refined to make it more reliable. “There’s still a lot of work ahead in using these methods to create fully understandable explanations,” Bau says.

Bau is part of a US government-funded effort called the National Deep Inference Fabric, which will make cloud computing resources available to academic researchers so that they too can probe especially powerful AI models. “We need to figure out how we can enable scientists to do this work even if they are not working at these large companies,” he says.

OpenAI’s researchers acknowledge in their paper that further work needs to be done to improve their method, but also say they hope it will lead to practical ways to control AI models. “We hope that one day, interpretability can provide us with new ways to reason about model safety and robustness, and significantly increase our trust in powerful AI models by giving strong assurances about their behavior,” they write.
