
Secondary research: definition, methods, & examples.

This ultimate guide to secondary research helps you understand changes in market trends, customers’ buying patterns, and your competition using existing data sources.

In situations where you’re not involved in the data-gathering process (primary research), you have to rely on existing information and data to arrive at specific research conclusions or outcomes. This approach is known as secondary research.

In this article, we’re going to explain what secondary research is, how it works, and share some examples of it in practice.


What is secondary research?

Secondary research, also known as desk research, is a research method that involves compiling existing data sourced from a variety of channels. This includes internal sources (e.g., in-house research) or, more commonly, external sources (such as government statistics, organizational bodies, and the internet).

Secondary research comes in several formats, such as published datasets, reports, and survey responses, and can also be sourced from websites, libraries, and museums.

The information is usually free — or available at a limited access cost — and gathered using surveys, telephone interviews, observation, face-to-face interviews, and more.

When using secondary research, researchers collect, verify, and analyze the existing data, then incorporate it to support their research goals.

As well as the above, it can be used to review previous research into an area of interest. Researchers can look for patterns across data spanning several years and identify trends — or use it to verify early hypothesis statements and establish whether it’s worth continuing research into a prospective area.

How to conduct secondary research

There are five key steps to conducting secondary research effectively and efficiently:

1.    Identify and define the research topic

First, understand what you will be researching and define the topic by thinking about the research questions you want answered.

Ask yourself: What is the point of conducting this research? Then, ask: What do we want to achieve?

The answers may indicate an exploratory aim (understanding why something happened) or a need to confirm a hypothesis. They may also point to ideas that require primary or secondary research (or a combination) to investigate.

2.    Find research and existing data sources

If secondary research is needed, think about where you might find the information. This helps you narrow down your secondary sources to those that help you answer your questions. What keywords do you need to use?

Which organizations are closely working on this topic already? Are there any competitors that you need to be aware of?

Create a list of the data sources, information, and people that could help you with your work.

3.    Begin searching and collecting the existing data

Now that you have the list of data sources, start accessing the data and collecting the information into an organized system. This may mean setting up accounts with research journals, or making telephone calls to book meetings with third-party research teams to verify the details around their data and results.

As you search and access information, remember to check the data’s date, the credibility of the source, the relevance of the material to your research topic, and the methodology used by the third-party researchers. Start small and as you gain results, investigate further in the areas that help your research’s aims.

4.    Combine the data and compare the results

When you have your data in one place, you need to understand, filter, order, and combine it intelligently. Data may come in different formats; some of it could be unusable, while other information may need to be discarded.

After this, you can start to look at different data sets to see what they tell you. You may find that you need to compare the same datasets over different periods for changes over time or compare different datasets to notice overlaps or trends. Ask yourself: What does this data mean to my research? Does it help or hinder my research?
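
If your combined sources are tabular, this kind of period-over-period comparison can be scripted in a few lines. Below is a minimal sketch in Python with pandas; the file names and the columns ("region", "order_value") are hypothetical placeholders, not a prescribed workflow.

```python
# Minimal sketch: comparing the same metric in the same dataset across two
# periods. File names and column names ("region", "order_value") are
# hypothetical placeholders for whatever secondary data you collected.
import pandas as pd

sales_2022 = pd.read_csv("market_data_2022.csv")
sales_2023 = pd.read_csv("market_data_2023.csv")

# Average order value per region, one column per period.
summary = pd.DataFrame({
    "2022": sales_2022.groupby("region")["order_value"].mean(),
    "2023": sales_2023.groupby("region")["order_value"].mean(),
})
summary["change_pct"] = (summary["2023"] / summary["2022"] - 1) * 100
print(summary.sort_values("change_pct", ascending=False))
```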

5.    Analyze your data and explore further

In this last stage of the process, look at the information you have and ask yourself if this answers your original questions for your research. Are there any gaps? Do you understand the information you’ve found? If you feel there is more to cover, repeat the steps and delve deeper into the topic so that you can get all the information you need.

If secondary research can’t provide these answers, consider supplementing your results with data gained from primary research. As you explore further, add to your knowledge and update your findings. This will help you present clear, credible information.

Primary vs secondary research

Unlike secondary research, primary research involves creating data first-hand by directly working with interviewees, target users, or a target market. Primary research focuses on the method for carrying out research, asking questions, and collecting data using approaches such as:

  • Interviews (panel, face-to-face or over the phone)
  • Questionnaires or surveys
  • Focus groups

Using these methods, researchers can get in-depth, targeted responses to questions, making results more accurate and specific to their research goals. However, primary research takes time to carry out and administer.

Unlike primary research, secondary research uses existing data, which also includes published results from primary research. Researchers summarize the existing research and use the results to support their research goals.

Both primary and secondary research have their places. Primary research can support the findings found through secondary research (and fill knowledge gaps), while secondary research can be a starting point for further primary research. Because of this, these research methods are often combined for optimal research results that are accurate at both the micro and macro level.

  • Primary research is first-hand research to collect data and may require a lot of time; secondary research collects existing, published data and may require little time.
  • Primary research creates raw data that the researcher owns; in secondary research, the researcher has no control over the data method or ownership.
  • Primary research is directly relevant to the goals of the research; secondary data may or may not be relevant to the research goals.
  • In primary research, the researcher conducts the research and may be subject to researcher bias; in secondary research, the researcher collects results with no information on what researcher bias exists.
  • Primary research can be expensive to carry out; secondary research is more affordable due to access to free data.

Sources of secondary research

There are two types of secondary research sources: internal and external. Internal data refers to in-house data that can be gathered from the researcher’s organization. External data refers to data published outside of and not owned by the researcher’s organization.

Internal data

Internal data is a good first port of call for insights and knowledge, as you may already have relevant information stored in your systems. Because you own this information — and it won’t be available to other researchers — it can give you a competitive edge. Examples of internal data include:

  • Database information on sales history and business goal conversions
  • Information from website applications and mobile site data
  • Customer-generated data on product and service efficiency and use
  • Previous research results or supplemental research areas
  • Previous campaign results

External data

External data is useful when you: 1) need information on a new topic, 2) want to fill in gaps in your knowledge, or 3) want data that breaks down a population or market for trend and pattern analysis. Examples of external data include:

  • Statistics from government agencies, non-governmental organizations, and trade bodies
  • Company reports and research
  • Competitor research
  • Public library collections
  • Textbooks and research journals
  • Media stories in newspapers
  • Online journals and research sites

Three examples of secondary research methods in action

How and why might you conduct secondary research? Let’s look at a few examples:

1.    Collecting factual information from the internet on a specific topic or market

There are plenty of sites that hold data for people to view and use in their research. For example, Google Scholar, ResearchGate, or Wiley Online Library all provide previous research on a particular topic. Researchers can create free accounts and use the search facilities to look into a topic by keyword, before following the instructions to download or export results for further analysis.

This can be useful for exploring a new market that your organization is considering entering. For instance, by viewing U.S. Census Bureau demographic data for that area, you can see what the demographics of your target audience are, and create compelling marketing campaigns accordingly.

2.    Finding out the views of your target audience on a particular topic

If you’re interested in seeing the historical views on a particular topic, for example, attitudes to women’s rights in the US, you can turn to secondary sources.

Textbooks, news articles, reviews, and journal entries can all provide qualitative reports and interviews covering how people discussed women’s rights. There may also be multimedia elements, such as videos or archived propaganda posters, that show how biased language was used.

By gathering this information, synthesizing it, and evaluating the language, who created it, and when it was shared, you can create a timeline of how a topic was discussed over time.

3.    When you want to know the latest thinking on a topic

Educational institutions, such as schools and colleges, produce a lot of research-based reports on younger audiences or on their academic specialisms. Dissertations from students can also be submitted to research journals, making these journals useful places to see the latest insights from a new generation of academics.

Information can be requested — and sometimes academic institutions may want to collaborate and conduct research on your behalf. This can provide key primary data in areas that you want to research, as well as secondary data sources for your research.

Advantages of secondary research

There are several benefits of using secondary research, which we’ve outlined below:

  • Easily and readily available data – There is an abundance of readily accessible data sources that have been pre-collected for use, in person at local libraries and online using the internet. This data is usually sorted by filters or can be exported into spreadsheet format, meaning that little technical expertise is needed to access and use the data.
  • Faster research speeds – Since the data is already published and in the public arena, you don’t need to collect this information through primary research. This can make the research easier to do and faster, as you can get started with the data quickly.
  • Low financial and time costs – Most secondary data sources can be accessed for free or at a small cost to the researcher, so the overall research costs are kept low. In addition, by saving on preliminary research, the time costs for the researcher are kept down as well.
  • Secondary data can drive additional research actions – The insights gained can support future research activities (like conducting a follow-up survey or specifying future detailed research topics) or help add value to these activities.
  • Secondary data can provide useful pre-research insights – Secondary source data can offer pre-research insights that help determine whether research should be conducted. It can also help highlight knowledge gaps, so subsequent research can take these into account.
  • Ability to scale up results – Secondary sources can include large datasets (like Census data results across several states) so research results can be scaled up quickly using large secondary data sources.

Disadvantages of secondary research

The disadvantages of secondary research are worth considering in advance of conducting research:

  • Secondary research data can be out of date – Secondary sources can be updated regularly, but if you’re exploring the data between two updates, the data can be out of date. Researchers will need to consider whether the available data covers the right research dates, so that insights are accurate and timely, or whether the data needs to be updated. Also, in fast-moving markets, secondary data may expire very quickly.
  • Secondary research needs to be verified and interpreted – Where there’s a lot of data from one source, a researcher needs to review and analyze it. The data may need to be verified against other data sets or your hypotheses for accuracy and to ensure you’re using the right data for your research.
  • The researcher has had no control over the secondary research – As the researcher has not been involved in the secondary research, invalid data can affect the results. It’s therefore vital that the methodology and controls are closely reviewed so that the data is collected in a systematic and error-free way.
  • Secondary research data is not exclusive – As data sets are commonly available, there is no exclusivity, and many researchers can use the same data. This can be problematic where researchers want exclusive rights over the research results, and it risks duplication of research in the future.

When do we conduct secondary research?

Now that you know the basics of secondary research, when do researchers normally conduct secondary research?

It’s often used at the beginning of research, when the researcher is trying to understand the current landscape. In addition, if the research area is new to the researcher, it can form crucial background context, helping them understand what information already exists. This can plug knowledge gaps, supplement the researcher’s own learning, or add to the research.

Secondary research can also be used in conjunction with primary research. Secondary research can become the formative research that helps pinpoint where further primary research is needed to find out specific information. It can also support or verify the findings from primary research.

You can use secondary research where high levels of control aren’t needed by the researcher, but a lot of knowledge on a topic is required from different angles.

Secondary research should not be used in place of primary research; the two are very different and suit different circumstances.

Questions to ask before conducting secondary research

Before you start your secondary research, ask yourself these questions:

  • Do we have internal data from similar research in the past?

If your organization has past research, it’s best to review this work before starting a new project. The older work may provide you with answers, as well as a starting dataset and context for how your organization approached the research before. However, be mindful that the work is probably out of date, and view it accordingly. Read through it and look for where it helps your research goals or where more work is needed.

  • What am I trying to achieve with this research?

When you have clear goals and understand what you need to achieve, you can look for the right type of secondary or primary research to support the aims. Different secondary research data will provide you with different information – for example, looking at news stories for a breakdown of your market’s buying patterns won’t be as useful as internal or external e-commerce and sales data sources.

  • How credible will my research be?

If you are looking for credibility, consider how accurate the research results need to be, and whether you can sacrifice credibility for speed by using secondary sources to get you started. Bear in mind which sources you choose — low-credibility data sites, like political party websites that are highly biased in favor of their own party, would skew your results.

  • What is the date of the secondary research?

When you’re conducting research, you want the results to be as useful as possible, so data that is 10 years old won’t be as accurate as data created a year ago. Since a lot can change in a few years, note the date of your research and look for the most recent datasets, which can give you a more current picture of results. One caveat is data collected over a long-term period for comparison with earlier periods, which can tell you about the rate and direction of change.

  • Can the data sources be verified? Does the information you have check out?

If you can’t verify the data by looking at the research methodology, speaking to the original team or cross-checking the facts with other research, it could be hard to be sure that the data is accurate. Think about whether you can use another source, or if it’s worth doing some supplementary primary research to replicate and verify results to help with this issue.

We created a front-to-back guide on conducting market research, The ultimate guide to conducting market research, so you can understand the research journey with confidence.

In it, you’ll learn more about:

  • What effective market research looks like
  • The use cases for market research
  • The most important steps to conducting market research
  • And how to take action on your research findings

Download the free guide for a clearer view on secondary research and other key research types for your business.



What is Secondary Research? | Definition, Types, & Examples

Published on January 20, 2023 by Tegan George. Revised on January 12, 2024.

Secondary research is a research method that uses data that was collected by someone else. In other words, whenever you conduct research using data that already exists, you are conducting secondary research. On the other hand, any type of research that you undertake yourself is called primary research .

Secondary research can be qualitative or quantitative in nature. It often uses data gathered from published peer-reviewed papers, meta-analyses, or government or private sector databases and datasets.

Table of contents

  • When to use secondary research
  • Types of secondary research
  • Examples of secondary research
  • Advantages and disadvantages of secondary research
  • Frequently asked questions

When to use secondary research

Secondary research is a very common research method, used in lieu of collecting your own primary data. It is often used in research designs or as a way to start your research process if you plan to conduct primary research later on.

Since it is often inexpensive or free to access, secondary research is a low-stakes way to determine if further primary research is needed, as gaps in secondary research are a strong indication that primary research is necessary. For this reason, while secondary research can theoretically be exploratory or explanatory in nature, it is usually explanatory: aiming to explain the causes and consequences of a well-defined problem.


Types of secondary research

Secondary research can take many forms, but the most common types are:

  • Statistical analysis
  • Literature reviews
  • Case studies
  • Content analysis

Statistical analysis

There is ample data available online from a variety of sources, often in the form of datasets. These datasets are often open-source or downloadable at a low cost, and are ideal for conducting statistical analyses such as hypothesis testing or regression analysis (a minimal sketch follows the source list below).

Credible sources for existing data include:

  • The government
  • Government agencies
  • Non-governmental organizations
  • Educational institutions
  • Businesses or consultancies
  • Libraries or archives
  • Newspapers, academic journals, or magazines
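
As promised above, here is a minimal sketch of both techniques in Python, using pandas, SciPy, and statsmodels. The file name, the columns "ad_spend" and "sales", and the benchmark mean of 100 are all hypothetical; treat this as an illustration of the workflow, not a recipe.

```python
# Minimal sketch: hypothesis testing and regression on an existing dataset.
# The CSV, its columns ("ad_spend", "sales"), and the benchmark mean are
# hypothetical placeholders for a real downloaded dataset.
import pandas as pd
import statsmodels.api as sm
from scipy import stats

df = pd.read_csv("downloaded_dataset.csv")

# One-sample t-test: does mean sales differ from a published benchmark?
res = stats.ttest_1samp(df["sales"], popmean=100.0)
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.3f}")

# Ordinary least squares regression: sales explained by ad spend.
X = sm.add_constant(df["ad_spend"])  # adds the intercept term
model = sm.OLS(df["sales"], X).fit()
print(model.summary())
```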

Literature reviews

A literature review is a survey of preexisting scholarly sources on your topic. It provides an overview of current knowledge, allowing you to identify relevant themes, debates, and gaps in the research you analyze. You can later apply these to your own work, or use them as a jumping-off point to conduct primary research of your own.

Structured much like a regular academic paper (with a clear introduction, body, and conclusion), a literature review is a great way to evaluate the current state of research and demonstrate your knowledge of the scholarly debates around your topic.

Case studies

A case study is a detailed study of a specific subject. It is usually qualitative in nature and can focus on a person, group, place, event, organization, or phenomenon. A case study is a great way to utilize existing research to gain concrete, contextual, and in-depth knowledge about your real-world subject.

You can choose to focus on just one complex case, exploring a single subject in great detail, or examine multiple cases if you’d prefer to compare different aspects of your topic. Preexisting interviews, observational studies, or other sources of primary data make for great case studies.

Content analysis

Content analysis is a research method that studies patterns in recorded communication by utilizing existing texts. It can be either quantitative or qualitative in nature, depending on whether you choose to analyze countable or measurable patterns, or more interpretive ones. Content analysis is popular in communication studies, but it is also widely used in historical analysis, anthropology, and psychology to make more semantic qualitative inferences.
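
To illustrate the quantitative (countable) side, here is a minimal sketch in Python that tallies chosen terms across a set of existing texts. The documents and term list are illustrative placeholders; a real study would define a proper coding scheme.

```python
# Minimal sketch of quantitative content analysis: counting how often
# pre-defined terms appear in existing texts. Documents and terms below
# are illustrative placeholders.
import re
from collections import Counter

documents = [
    "Consumers praised the new battery life and camera quality.",
    "Reviews criticized the battery but liked the camera and the price.",
]
terms = {"battery", "camera", "price"}

counts = Counter()
for doc in documents:
    tokens = re.findall(r"[a-z']+", doc.lower())  # crude tokenizer
    counts.update(token for token in tokens if token in terms)

for term in sorted(terms):
    print(f"{term}: {counts[term]}")
```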

Examples of secondary research

Secondary research is a broad research approach that can be pursued any way you’d like. Here are a few examples of different ways you can use secondary research to explore your research topic.

Advantages and disadvantages of secondary research

Secondary research is a very common research approach, but it has distinct advantages and disadvantages.

Advantages of secondary research

Advantages include:

  • Secondary data is very easy to source and readily available.
  • It is also often free or accessible through your educational institution’s library or network, making it much cheaper to conduct than primary research.
  • As you are relying on research that already exists, conducting secondary research is much less time-consuming than primary research. Since your timeline is so much shorter, your research can be ready to publish sooner.
  • Using data from others allows you to show reproducibility and replicability, bolstering prior research and situating your own work within your field.

Disadvantages of secondary research

Disadvantages include:

  • Ease of access does not signify credibility. It’s important to be aware that secondary research is not always reliable, and can often be out of date. It’s critical to analyze any data you’re thinking of using prior to getting started, using a method like the CRAAP test.
  • Secondary research often relies on primary research already conducted. If this original research is biased in any way, those research biases could creep into the secondary results.

Many researchers using the same secondary research to form similar conclusions can also take away from the uniqueness and reliability of your research. Many datasets become “kitchen-sink” models, where too many variables are added in an attempt to draw increasingly niche conclusions from overused data. Data cleansing may be necessary to test the quality of the research.


Frequently asked questions

A systematic review is secondary research because it uses existing research. You don’t collect new data yourself.

The research methods you use depend on the type of data you need to answer your research question.

  • If you want to measure something or test a hypothesis, use quantitative methods. If you want to explore ideas, thoughts, and meanings, use qualitative methods.
  • If you want to analyze a large amount of readily available data, use secondary data. If you want data specific to your purposes with control over how it is generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables, use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to systematically measure variables and test hypotheses. Qualitative methods allow you to explore concepts and experiences in more detail.



Secondary Research: Definition, Methods and Examples.


In the world of research, there are two main types of data sources: primary and secondary. While primary research involves collecting new data directly from individuals or sources, secondary research involves analyzing existing data already collected by someone else. Today we’ll discuss secondary research.

One common source of this research is published research reports and other documents. These materials can often be found in public libraries, on websites, or even as data extracted from previously conducted surveys. In addition, many government and non-government agencies maintain extensive data repositories that can be accessed for research purposes.


While secondary research may not offer the same level of control as primary research, it can be a highly valuable tool for gaining insights and identifying trends. Researchers can save time and resources by leveraging existing data sources while still uncovering important information.

What is Secondary Research: Definition

Secondary research is a research method that involves using already existing data. Existing data is summarized and collated to increase the overall effectiveness of the research.

One of the key advantages of secondary research is that it allows us to gain insights and draw conclusions without having to collect new data ourselves. This can save time and resources and also allow us to build upon existing knowledge and expertise.

When conducting secondary research, it’s important to be thorough and thoughtful in our approach. This means carefully selecting the sources and ensuring that the data we’re analyzing is reliable and relevant to the research question. It also means being critical and analytical in the analysis, and recognizing any potential biases or limitations in the data.


Secondary research is much more cost-effective than primary research, as it uses already existing data. In primary research, by contrast, data is collected firsthand by organizations or businesses, or they employ a third party to collect it on their behalf.


Secondary Research Methods with Examples

Secondary research is cost-effective, which is one of the reasons it is a popular choice among many businesses and organizations. Not every organization is able to pay a huge sum of money to conduct research and gather data. Hence, secondary research is also aptly termed “desk research”, as the data can be retrieved while sitting behind a desk.


The following are popularly used secondary research methods and examples:

1. Data Available on the Internet

One of the most popular ways to collect secondary data is the internet. Data is readily available on the internet and can be downloaded at the click of a button.

This data is practically free of cost, or one may have to pay a negligible amount to download already existing data. Websites hold a lot of information that businesses or organizations can use to suit their research needs. However, organizations should rely only on authentic and trusted websites to collect information.

2. Government and Non-Government Agencies

Data for secondary research can also be collected from government and non-government agencies. For example, the US Government Printing Office, the US Census Bureau, and Small Business Development Centers hold valuable and relevant data that businesses or organizations can use.

A certain cost may apply to download or use the data available from these agencies, but the data obtained is authentic and trustworthy.

3. Public Libraries

Public libraries are another good source of data for this research. They hold copies of important research that was conducted earlier, and are a storehouse of important information and documents from which information can be extracted.

The services provided by public libraries vary from one library to another. More often than not, libraries hold a large collection of government publications with market statistics, business directories, and newsletters.

4. Educational Institutions

The importance of collecting data from educational institutions for secondary research is often overlooked. However, more research is conducted in colleges and universities than in any other sector.

The data collected by universities is mainly for primary research. However, businesses or organizations can approach educational institutions and request data from them.

5. Commercial Information Sources

Local newspapers, journals, magazines, and radio and TV stations are a great source of data for secondary research. These commercial information sources have first-hand information on economic developments, political agendas, market research, demographic segmentation, and similar subjects.

Businesses or organizations can request the data that is most relevant to their study. Through these sources, businesses can not only identify prospective clients but also learn about avenues for promoting their products or services, as these outlets have a wide reach.

Key Differences between Primary Research and Secondary Research

Understanding the distinction between primary research and secondary research is essential in determining which research method is best for your project. These are the two main types of research methods, each with advantages and disadvantages. In this section, we will explore the critical differences between the two and when it is appropriate to use them.

  • In primary research, research is conducted first-hand to obtain data, and the researcher “owns” the data collected. Secondary research is based on data collected from previous research.
  • Primary research is based on raw data, while secondary research is based on tried-and-tested data that has previously been analyzed and filtered.
  • In primary research, the data collected is customized to fit the researcher’s needs. In secondary research, data was collected to fit the needs of the original organizations or businesses, and may or may not match the researcher’s requirements.
  • The researcher is deeply involved in data collection in primary research. Secondary research, by contrast, is fast and easy, and aims at gaining a broader understanding of the subject matter.
  • Primary research is an expensive process that consumes a lot of time to collect and analyze data. Secondary research is quick, as the data is already available, but the researcher should know where to look for the most appropriate data.

How to Conduct Secondary Research?

We have already learned about the differences between primary and secondary research. Now, let’s take a closer look at how to conduct it.

Secondary research is an important tool for gathering information already collected and analyzed by others. It can help us save time and money and allow us to gain insights into the subject we are researching. So, in this section, we will discuss some common methods and tips for conducting it effectively.

Here are the steps involved in conducting secondary research:

1. Identify the topic of research: Before beginning secondary research, identify the topic that needs researching. Once that’s done, list the research attributes and the purpose of the research.

2. Identify research sources: Next, narrow down the information sources that will provide the most relevant data and information for your research.

3. Collect existing data: Once the data collection sources are narrowed down, check for any available previous data that is closely related to the topic. Such data can be obtained from various sources like newspapers, public libraries, and government and non-government agencies.

4. Combine and compare: Once the data is collected, combine and compare it for any duplication, and assemble it into a usable format. Make sure to collect data from authentic sources, as incorrect data can hamper research severely.

5. Analyze data: Analyze the collected data and identify whether all your questions have been answered. If not, repeat the process to delve further into actionable insights.

Advantages of Secondary Research

Secondary research offers a number of advantages to researchers, including efficiency, the ability to build upon existing knowledge, and the ability to conduct research in situations where primary research may not be possible or ethical. By carefully selecting their sources and being thoughtful in their approach, researchers can leverage secondary research to drive impact and advance the field. Some key advantages are the following:

1. Most information in this research is readily available. There are many sources from which relevant data can be collected and used, unlike primary research, where data needs to be collected from scratch.

2. This is a less expensive and less time-consuming process, as the required data is easily available and doesn’t cost much if extracted from authentic sources. Minimal expenditure is needed to obtain data.

3. The data collected through secondary research gives organizations or businesses an idea of the effectiveness of primary research. Hence, organizations or businesses can form a hypothesis and evaluate the cost of conducting primary research.

4. Secondary research is quicker to conduct because of the availability of data. It can be completed within a few weeks, depending on the objectives of the business or the scale of data needed.

As we can see, this research is the process of analyzing data already collected by someone else, and it can offer a number of benefits to researchers.

Disadvantages of Secondary Research

On the other hand, there are some disadvantages that come with doing secondary research. Some of the most notable are the following:

1. Although data is readily available, credibility evaluation must be performed to understand the authenticity of the information available.

2. Not all secondary data resources offer the latest reports and statistics. Even when the data is accurate, it may not be updated enough to accommodate recent timelines.

3. Secondary research derives its conclusions from collected primary research data. The success of your research will depend, to a great extent, on the quality of the primary research already conducted.


In conclusion, secondary research is an important tool for researchers exploring various topics. By leveraging existing data sources, researchers can save time and resources, build upon existing knowledge, and conduct research in situations where primary research may not be feasible.

There are a variety of methods and examples of secondary research, from analyzing public data sets to reviewing previously published research papers. As students and aspiring researchers, it’s important to understand the benefits and limitations of this research and to approach it thoughtfully and critically. By doing so, we can continue to advance our understanding of the world around us and contribute to meaningful research that positively impacts society.

QuestionPro can be a useful tool for conducting secondary research in a variety of ways. You can create online surveys that target a specific population, collecting data that can be analyzed to gain insights into consumer behavior, attitudes, and preferences; analyze existing data sets that you have obtained through other means or benchmark your organization against others in your industry or against industry standards. The software provides a range of benchmarking tools that can help you compare your performance on key metrics, such as customer satisfaction, with that of your peers.

Using QuestionPro thoughtfully and strategically allows you to gain valuable insights to inform decision-making and drive business success. Start today for free! No credit card is required.



What is Secondary Research? Types, Methods, Examples

Appinio Research · 20.09.2023 · 13 min read


Have you ever wondered how researchers gather valuable insights without conducting new experiments or surveys? That's where secondary research steps in—a powerful approach that allows us to explore existing data and information others collect.

Whether you're a student, a professional, or someone seeking to make informed decisions, understanding the art of secondary research opens doors to a wealth of knowledge.

What is Secondary Research?

Secondary Research refers to the process of gathering and analyzing existing data, information, and knowledge that has been previously collected and compiled by others. This approach allows researchers to leverage available sources, such as articles, reports, and databases, to gain insights, validate hypotheses, and make informed decisions without collecting new data.

Benefits of Secondary Research

Secondary research offers a range of advantages that can significantly enhance your research process and the quality of your findings.

  • Time and Cost Efficiency: Secondary research saves time and resources by utilizing existing data sources, eliminating the need for data collection from scratch.
  • Wide Range of Data: Secondary research provides access to vast information from various sources, allowing for comprehensive analysis.
  • Historical Perspective: Examining past research helps identify trends, changes, and long-term patterns that might not be immediately apparent.
  • Reduced Bias: As data is collected by others, there's often less inherent bias than in conducting primary research, where biases might affect data collection.
  • Support for Primary Research: Secondary research can lay the foundation for primary research by providing context and insights into gaps in existing knowledge.
  • Comparative Analysis: By integrating data from multiple sources, you can conduct robust comparative analyses for more accurate conclusions.
  • Benchmarking and Validation: Secondary research aids in benchmarking performance against industry standards and validating hypotheses.

Primary Research vs. Secondary Research

When it comes to research methodologies, primary and secondary research each have their distinct characteristics and advantages. Here's a brief comparison to help you understand the differences.


Primary Research

  • Data Source: Involves collecting new data directly from original sources.
  • Data Collection: Researchers design and conduct surveys, interviews, experiments, or observations.
  • Time and Resources: Typically requires more time, effort, and resources due to data collection.
  • Fresh Insights: Provides firsthand, up-to-date information tailored to specific research questions.
  • Control: Researchers control the data collection process and can shape methodologies.

Secondary Research

  • Data Source: Involves utilizing existing data and information collected by others.
  • Data Collection: Researchers search, select, and analyze data from published sources, reports, and databases.
  • Time and Resources: Generally more time-efficient and cost-effective as data is already available.
  • Existing Knowledge: Utilizes data that has been previously compiled, often providing broader context.
  • Less Control: Researchers have limited control over how data was collected originally, if any.

Choosing between primary and secondary research depends on your research objectives, available resources, and the depth of insights you require.

Types of Secondary Research

Secondary research encompasses various types of existing data sources that can provide valuable insights for your research endeavors. Understanding these types can help you choose the most relevant sources for your objectives.

Here are the primary types of secondary research:

Internal Sources

Internal sources consist of data generated within your organization or entity. These sources provide valuable insights into your own operations and performance.

  • Company Records and Data: Internal reports, documents, and databases that house information about sales, operations, and customer interactions.
  • Sales Reports and Customer Data: Analysis of past sales trends, customer demographics, and purchasing behavior.
  • Financial Statements and Annual Reports: Financial data, such as balance sheets and income statements, offer insights into the organization's financial health.

External Sources

External sources encompass data collected and published by entities outside your organization.

These sources offer a broader perspective on various subjects.

  • Published Literature and Journals: Scholarly articles, research papers, and academic studies available in journals or online databases.
  • Market Research Reports: Reports from market research firms that provide insights into industry trends, consumer behavior, and market forecasts.
  • Government and NGO Databases: Data collected and maintained by government agencies and non-governmental organizations, offering demographic, economic, and social information.
  • Online Media and News Articles: News outlets and online publications that cover current events, trends, and societal developments.

Each type of secondary research source holds its value and relevance, depending on the nature of your research objectives. Combining these sources lets you understand the subject matter and make informed decisions.

How to Conduct Secondary Research?

Effective secondary research involves a thoughtful and systematic approach that enables you to extract valuable insights from existing data sources. Here's a step-by-step guide on how to navigate the process:

1. Define Your Research Objectives

Before delving into secondary research, clearly define what you aim to achieve. Identify the specific questions you want to answer, the insights you're seeking, and the scope of your research.

2. Identify Relevant Sources

Begin by identifying the most appropriate sources for your research. Consider the nature of your research objectives and the data type you require. Seek out sources such as academic journals, market research reports, official government databases, and reputable news outlets.

3. Evaluate Source Credibility

Ensuring the credibility of your sources is crucial. Evaluate the reliability of each source by assessing factors such as the author's expertise, the publication's reputation, and the objectivity of the information provided. Choose sources that align with your research goals and are free from bias.

4. Extract and Analyze Information

Once you've gathered your sources, carefully extract the relevant information. Take thorough notes, capturing key data points, insights, and any supporting evidence. As you accumulate information, start identifying patterns, trends, and connections across different sources.

5. Synthesize Findings

As you analyze the data, synthesize your findings to draw meaningful conclusions. Compare and contrast information from various sources to identify common themes and discrepancies. This synthesis process allows you to construct a coherent narrative that addresses your research objectives.

6. Address Limitations and Gaps

Acknowledge the limitations and potential gaps in your secondary research. Recognize that secondary data might have inherent biases or be outdated. Where necessary, address these limitations by cross-referencing information or finding additional sources to fill in gaps.

7. Contextualize Your Findings

Contextualization is crucial in deriving actionable insights from your secondary research. Consider the broader context within which the data was collected. How does the information relate to current trends, societal changes, or industry shifts? This contextual understanding enhances the relevance and applicability of your findings.

8. Cite Your Sources

Maintain academic integrity by properly citing the sources you've used for your secondary research. Accurate citations not only give credit to the original authors but also provide a clear trail for readers to access the information themselves.

9. Integrate Secondary and Primary Research (If Applicable)

In some cases, combining secondary and primary research can yield more robust insights. If you've also conducted primary research, consider integrating your secondary findings with your primary data to provide a well-rounded perspective on your research topic.

You can use a market research platform like Appinio to conduct primary research with real-time insights in minutes!

10. Communicate Your Findings

Finally, communicate your findings effectively. Whether it's in an academic paper, a business report, or any other format, present your insights clearly and concisely. Provide context for your conclusions and use visual aids like charts and graphs to enhance understanding.

Remember that conducting secondary research is not just about gathering information—it's about critically analyzing, interpreting, and deriving valuable insights from existing data. By following these steps, you'll navigate the process successfully and contribute to the body of knowledge in your field.

Secondary Research Examples

To better understand how secondary research is applied in various contexts, let's explore a few real-world examples that showcase its versatility and value.

Market Analysis and Trend Forecasting

Imagine you're a marketing strategist tasked with launching a new product in the smartphone industry. By conducting secondary research, you can:

  • Access Market Reports: Utilize market research reports to understand consumer preferences, competitive landscape, and growth projections.
  • Analyze Trends: Examine past sales data and industry reports to identify trends in smartphone features, design, and user preferences.
  • Benchmark Competitors: Compare market share, customer satisfaction, and pricing strategies of key competitors to develop a strategic advantage.
  • Forecast Demand: Use historical sales data and market growth predictions to estimate demand for your new product (a simple trend-extrapolation sketch follows this list).
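
As referenced in the last bullet, one simple way to turn historical sales into a rough demand estimate is linear trend extrapolation. The sketch below uses NumPy; the yearly unit figures are invented for illustration, and real forecasts would use richer models.

```python
# Minimal sketch: rough demand forecast by fitting a linear trend to
# historical sales. The yearly unit figures are hypothetical.
import numpy as np

years = np.array([2019, 2020, 2021, 2022, 2023])
units_sold_m = np.array([1.2, 1.5, 1.9, 2.4, 2.8])  # millions of units

# Fit a degree-1 polynomial (straight line) to the history.
slope, intercept = np.polyfit(years, units_sold_m, 1)

# Extrapolate one year ahead as a first-order demand estimate.
forecast_2024 = slope * 2024 + intercept
print(f"Estimated 2024 demand: {forecast_2024:.2f}M units")
```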

Academic Research and Literature Reviews

Suppose you're a student researching climate change's effects on marine ecosystems. Secondary research aids your academic endeavors by:

  • Reviewing Existing Studies: Analyze peer-reviewed articles and scientific papers to understand the current state of knowledge on the topic.
  • Identifying Knowledge Gaps: Identify areas where further research is needed, based on what existing studies have not yet covered.
  • Comparing Methodologies: Compare research methodologies used by different studies to assess the strengths and limitations of their approaches.
  • Synthesizing Insights: Synthesize findings from various studies to form a comprehensive overview of the topic's implications on marine life.

Competitive Landscape Assessment for Business Strategy

Consider you're a business owner looking to expand your restaurant chain to a new location. Secondary research aids your strategic decision-making by:

  • Analyzing Demographics: Utilize demographic data from government databases to understand the local population's age, income, and preferences.
  • Studying Local Trends: Examine restaurant industry reports to identify the types of cuisines and dining experiences currently popular in the area.
  • Understanding Consumer Behavior: Analyze online reviews and social media discussions to gauge customer sentiment towards existing restaurants in the vicinity.
  • Assessing Economic Conditions: Access economic reports to evaluate the local economy's stability and potential purchasing power.

These examples illustrate the practical applications of secondary research across various fields to provide a foundation for informed decision-making, deeper understanding, and innovation.

Secondary Research Limitations

While secondary research offers many benefits, it's essential to be aware of its limitations to ensure the validity and reliability of your findings.

  • Data Quality and Validity: The accuracy and reliability of secondary data can vary, affecting the credibility of your research.
  • Limited Contextual Information: Secondary sources might lack detailed contextual information, making it important to interpret findings within the appropriate context.
  • Data Suitability: Existing data might not align perfectly with your research objectives, leading to compromises or incomplete insights.
  • Outdated Information: Some sources might provide obsolete information that doesn't accurately reflect current trends or situations.
  • Potential Bias: While secondary data is often less biased, biases might still exist in the original data sources, influencing your findings.
  • Incompatibility of Data: Combining data from different sources might pose challenges due to variations in definitions, methodologies, or units of measurement.
  • Lack of Control: Unlike primary research, you have no control over how the data was collected or its quality, potentially affecting your analysis.

Understanding these limitations will help you navigate secondary research effectively and make informed decisions based on a well-rounded understanding of its strengths and weaknesses.

Secondary research is a valuable tool that businesses can use to their advantage. By tapping into existing data and insights, companies can save time, resources, and effort that would otherwise be spent on primary research. This approach equips decision-makers with a broader understanding of market trends, consumer behaviors, and competitive landscapes. Additionally, benchmarking against industry standards and validating hypotheses empowers businesses to make informed choices that lead to growth and success.

As you navigate the world of secondary research, remember that it's not just about data retrieval—it's about strategic utilization. With a clear grasp of how to access, analyze, and interpret existing information, businesses can stay ahead of the curve, adapt to changing landscapes, and make decisions that are grounded in reliable knowledge.

How to Conduct Secondary Research in Minutes?

In the world of decision-making, having access to real-time consumer insights is no longer a luxury—it's a necessity. That's where Appinio comes in, revolutionizing how businesses gather valuable data for better decision-making. As a real-time market research platform, Appinio empowers companies to tap into the pulse of consumer opinions swiftly and seamlessly.

  • Fast Insights: Say goodbye to lengthy research processes. With Appinio, you can transform questions into actionable insights in minutes.
  • Data-Driven Decisions: Harness the power of real-time consumer insights to drive your business strategies, allowing you to make informed choices on the fly.
  • Seamless Integration: Appinio handles the research and technical complexities, freeing you to focus on what truly matters: making rapid data-driven decisions that propel your business forward.


Secondary Research Advantages, Limitations, and Sources

Summary: Secondary research should be a prerequisite to the collection of primary data, but it rarely provides all the answers you need. A thorough evaluation of the secondary data is needed to assess its relevance and accuracy.

5 minutes to read. By Michaela Mora on January 25, 2022. Topics: Relevant Methods & Tips, Business Strategy, Market Research

Secondary Research

Secondary research is based on data already collected for purposes other than the specific problem you have. Secondary research is usually part of exploratory market research designs.

The connection to the specific purpose that originates the research is what differentiates secondary research from primary research. Primary research is designed to address specific problems. However, analysis of available secondary data should be a prerequisite to the collection of primary data.

Advantages of Secondary Research

Secondary data can be faster and cheaper to obtain, depending on the sources you use.

Secondary research can help to:

  • Answer certain research questions and test some hypotheses.
  • Formulate an appropriate research design (e.g., identify key variables).
  • Interpret data from primary research as it can provide some insights into general trends in an industry or product category.
  • Understand the competitive landscape.

Limitations of Secondary Research

The usefulness of secondary research is often limited, for two main reasons:

Lack of relevance

Secondary research rarely provides all the answers you need. The objectives and methodology used to collect the secondary data may not be appropriate for the problem at hand.

Given that it was designed to find answers to a different problem than yours, you will likely find gaps in answers to your problem. Furthermore, the data collection methods used may not provide the data type needed to support the business decisions you have to make (e.g., qualitative research methods are not appropriate for go/no-go decisions).

Lack of Accuracy

Secondary data may be incomplete and lack accuracy depending on:

  • The research design (exploratory, descriptive, causal, primary vs. repackaged secondary data, the analytical plan, etc.)
  • Sampling design and sources (target audiences, recruitment methods)
  • Data collection method (qualitative and quantitative techniques)
  • Analysis point of view (focus and omissions)
  • Reporting stages (preliminary, final, peer-reviewed)
  • Rate of change in the studied topic (slowly vs. rapidly evolving phenomenon, e.g., adoption of specific technologies).
  • Lack of agreement between data sources.

Criteria for Evaluating Secondary Research Data

Before taking the information at face value, you should conduct a thorough evaluation of the secondary data you find using the following criteria (a minimal scoring sketch follows the list):

  • Purpose : Understanding why the data was collected and what questions it was trying to answer will tell us how relevant and useful it is since it may or may not be appropriate for your objectives.
  • Methodology used to collect the data : Important to understand sources of bias.
  • Accuracy of data: Sources of errors may include research design, sampling, data collection, analysis, and reporting.
  • When the data was collected : Secondary data may not be current or updated frequently enough for the purpose that you need.
  • Content of the data : Understanding the key variables, units of measurement, categories used and analyzed relationships may reveal how useful and relevant it is for your purposes.
  • Source reputation : In the era of purposeful misinformation on the Internet, it is important to check the expertise, credibility, reputation, and trustworthiness of the data source.
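One way to apply these criteria consistently is to score every candidate source against the same simple rubric. The criteria names, weights, and threshold below are illustrative assumptions, a minimal sketch rather than a standard method:

```python
# Illustrative rubric: score each evaluation criterion from 1 (poor) to 5
# (excellent) and accept the source only if the average clears a threshold.
CRITERIA = ["purpose", "methodology", "accuracy", "recency", "content", "source_reputation"]

def evaluate_source(name: str, scores: dict, threshold: float = 3.5) -> bool:
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"Unscored criteria for {name}: {missing}")
    mean = sum(scores[c] for c in CRITERIA) / len(CRITERIA)
    print(f"{name}: mean score {mean:.1f} -> {'use' if mean >= threshold else 'reject'}")
    return mean >= threshold

evaluate_source("Industry report (2019)", {
    "purpose": 4, "methodology": 3, "accuracy": 4,
    "recency": 2,  # collected pre-pandemic, so possibly outdated
    "content": 4, "source_reputation": 5,
})
```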

Secondary Research Data Sources

Compared to primary research, the collection of secondary data can be faster and cheaper to obtain, depending on the sources you use.

Secondary data can come from internal or external sources.

Internal sources of secondary data include ready-to-use data or data that requires further processing available in internal management support systems your company may be using (e.g., invoices, sales transactions, Google Analytics for your website, etc.).

Prior primary qualitative and quantitative research conducted by the company are also common sources of secondary data. They often generate more questions and help formulate new primary research needed.

However, if there are no internal data collection systems yet or prior research, you probably won’t have much usable secondary data at your disposal.

External sources of secondary data include:

  • Published materials
  • External databases
  • Syndicated services.

Published Materials

Published materials can be classified as:

  • General business sources: Guides, directories, indexes, and statistical data.
  • Government sources: Census data and other government publications.

External Databases

In many industries and across a variety of topics, there are private and public databases that can be accessed online or by downloading data for free, for a fixed fee, or by subscription.

These databases can include bibliographic, numeric, full-text, directory, and special-purpose databases. Some public institutions make data collected through various methods, including surveys, available for others to analyze.

Syndicated Services

These services are offered by companies that collect and sell pools of data that have commercial value and meet needs shared by a number of clients, even if the data is not collected for the specific purposes those clients may have.

Syndicated services can be classified based on their specific units of measurement (e.g., consumers, households, organizations, etc.).

The data collection methods for these data may include:

  • Surveys (Psychographic and Lifestyle, advertising evaluations, general topics)
  • Household panels (Purchase and media use)
  • Electronic scanner services (volume tracking data, scanner panels, scanner panels with Cable TV)
  • Audits (retailers, wholesalers)
  • Direct inquiries to institutions
  • Clipping services tracking PR for institutions
  • Corporate reports

You can spend hours doing research on Google in search of external sources, but this is likely to yield limited insights. Books, journal articles, reports, blog posts, and videos you may find online are usually analyses and summaries of data from a particular perspective. They may be useful and give you an indication of the type of data used, but they are not the actual data. Whenever possible, you should look at the actual raw data to draw your own conclusions about its value for your research objectives, and check professionally gathered secondary research; a short sketch of this kind of first-pass data inspection follows the source list below.

Here are some external secondary data sources often used in market research that you may find useful as starting points in your research. Some are free, while others require payment.

  • Pew Research Center : Reports about the issues, attitudes, and trends shaping the world. It conducts public opinion polling, demographic research, media content analysis, and other empirical social science research.
  • Data.Census.gov : Data dissemination platform to access demographic and economic data from the U.S. Census Bureau.
  • Data.gov : The U.S. government’s open data source with almost 200,000 datasets ranging in topic across health, agriculture, climate, ecosystems, public safety, finance, energy, manufacturing, education, and business.
  • Google Scholar : A web search engine that indexes the full text or metadata of scholarly literature across an array of publishing formats and disciplines.
  • Google Public Data Explorer : Makes large, public-interest datasets easy to explore, visualize and communicate.
  • Google News Archive : Allows users to search historical newspapers and retrieve scanned images of their pages.
  • McKinsey & Company : Articles based on analyses of various industries.
  • Statista : Business data platform with data across 170+ industries and 150+ countries.
  • Claritas : Syndicated reports on various market segments.
  • Mintel : Consumer reports combining exclusive consumer research with other market data and expert analysis.
  • MarketResearch.com : Data aggregator with over 350 publishers covering every sector of the economy as well as emerging industries.
  • Packaged Facts : Reports based on market research on consumer goods and services industries.
  • Dun & Bradstreet : Company directory with business information.
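Following the advice above about going to the raw data, a first-pass inspection can be as simple as the sketch below; the URL is a hypothetical placeholder, not a real dataset:

```python
# Hypothetical sketch: download a public dataset and inspect what was
# actually measured before trusting anyone's summary of it.
import pandas as pd

csv_url = "https://example.gov/open-data/county_retail_sales.csv"  # placeholder URL
df = pd.read_csv(csv_url)

print(df.shape)                 # how much data is there?
print(df.columns.tolist())      # which variables were actually collected?
print(df.dtypes)                # types/units often reveal methodology quirks
print(df.isna().mean().sort_values(ascending=False).head())  # where are the gaps?
```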



Article Contents

I. Introduction
II. Health Data and Biospecimens as a Common Good
III. Current Governance of Health Data and Biospecimens
IV. Limitations in Current Governance of the Secondary Research Market
V. AMCs Should Create New Policies to Control the Revolving Door Between Academia and Industry
VI. Conclusion


Governing secondary research use of health data and specimens: the inequitable distribution of regulatory burden between federally funded and industry research

Assistant professor, Obstetrics and Gynecology, University of Michigan Medical School; associate director, Center for Bioethics & Social Sciences in Medicine. This work was supported by the National Human Genome Research Institute (K01HG010496), the National Cancer Institute (R01CA21482904), and the National Center for Advancing Translational Sciences (UL1TR002240). Thank you to the participants in the Saint Louis University School of Law Center for Health Law Studies and the American Society of Law, Medicine and Ethics’ 2019 Health Law Scholars Workshop, as well as Professors Sharona Hoffman, Paul Lombardo, Kirsten Ostherr, Elizabeth Pendo, W. Nicholson Price II, Tara Sklar, and Ruqaiijah Yearby for their insightful comments on a previous draft. All errors are my own.


Kayte Spector-Bagdady, Governing secondary research use of health data and specimens: the inequitable distribution of regulatory burden between federally funded and industry research, Journal of Law and the Biosciences , Volume 8, Issue 1, January-June 2021, lsab008, https://doi.org/10.1093/jlb/lsab008


Some of the most promising recent advances in health research offer opportunities to improve diagnosis and therapy for millions of patients. They also require access to massive collections of health data and specimens. This need has generated an aggressive and lucrative push toward amassing troves of human data and biospecimens within academia and private industry. But the differences between the strict regulations that govern federally funded researchers in academic medical centers (AMCs) versus those that apply to the collection of health data and specimens by industry can entrench disparities. This article will discuss the value of secondary research with data and specimens and analyze why AMCs have been put at a disadvantage as compared to industry in amassing the large datasets that enable this work. It will explore the limitations of this current governance structure and propose that, moving forward, AMCs should set their own standards for commercialization of the data and specimens they generate in-house, the ability of their researchers to use industry data for their own work, and baseline informed consent standards for their own patients in order to ensure future data accessibility.

I. Introduction

Some of the most promising recent advances in health research—including ‘precision medicine’ and other genetic, machine learning, and big-data protocols—offer opportunities to improve diagnosis and therapy for millions of patients. 1 , 2 They also require access to massive collections of health data and specimens to analyze the correlations between genetic variants, behaviors, environment, and health outcomes. 3 This need has generated an aggressive and lucrative push toward amassing troves of human biospecimens, health, and health-proxy data. 4 But there are profound differences between the strict federal human subjects research regulations that govern federally funded researchers versus the rules that apply to the collection of health data and specimens by industry. This difference can entrench disparities in the value and breadth of data available as between the two. But why would the US government make it harder for federally funded researchers to accomplish their work than industry-funded ones?

In 2019 the US Department of Health and Human Services substantively revised the ‘Common Rule’ portion of the Human Subjects Research Regulations (which generally regulate federally-funded researchers) for the first time since its original conceptualization in the 1980s. 5 , 6 The revisions attempted to significantly update governance over the emerging area of ‘secondary research’ with data or biospecimens, ie research in addition to the clinical care or primary research protocol for which the biospecimen or data were originally procured.

In the wake of the immensely popular book The Immortal Life of Henrietta Lacks , 7 and related empirical studies, 8 the research community is on notice that people generally want information regarding whether their specimens collected by hospitals may be ‘commercialized’ or sold to private industry—and that there are important differences in preferences by race and ethnicity. 9 An updated federal regulatory requirement, consistent with this interest, stipulates that if biospecimens collected in research might later be commercialized (even if they are completely deidentified), federally-funded researchers must disclose this possibility to participants. But we also know that when people understand that their specimens may be commercialized, the majority are uncomfortable with such use. 10

Also in 2019, Ascension Health—which holds the medical records of patients across 20 states and the District of Columbia—shared fully identified medical records of 50 million patients with Google for research. 11 Google argued that the transaction was consistent with the Business Associate Agreement requirements under the Privacy Rule of the Health Insurance Portability and Accountability Act (HIPAA), 12 but its legality is currently under investigation by the US Department of Health & Human Services’ (DHHS) Office for Civil Rights 13 and several state senators. 14

Although health research has been widely recognized as a public good, an increasing amount of health data and specimens are actually under industry control, and current federal regulations have not been able to protect public accessibility. Instead, we are currently living in a myopic binary system where federally-funded researchers are highly regulated and commercial health data collectors are barely regulated at all. But Academic Medical Centers (AMCs), which receive over $12.5B in National Institutes of Health (NIH) funding per year alone, 15 have more negotiation power than they are currently leveraging. Comprehensive databanks require a diversity of health-related data, which is easier to collect when people have a compelling basic interest in sharing it (ie at a hospital or clinic), and when insurance reimbursement pays for health information to be annotated in detail. In addition, industry needs academic collaborators to conduct research, publish articles, and treat the patients and write the prescriptions that make the private data valuable in the first place. AMCs could use this captive audience not just to negotiate for licenses for future machine-learning products (as some are currently doing 16 ) but also to protect and improve the treatment of their patients and participants.

Part I of this article will discuss the value of secondary research with data and specimens. Part II will analyze why AMCs have been put at a disadvantage as compared to industry in amassing the large datasets that enable this work. It will specifically focus on the origin of the human subjects research regulations, their scope, regulatory alternatives for industry dataset governance, and the increasing mixture of health data and specimens across entities. Part III will explore the limitations of this current governance structure, arguing that the federal regulations are not accomplishing their stated goal of informed consent, and they are also not appropriately calibrating to the risks and benefits of secondary research. In addition, the regulations as written are serving to actually encourage a private data and specimen market with associated limitations. And Part IV will propose that, moving forward, AMCs should set their own standards for commercialization of the data and specimens they generate in-house, the ability of their researchers to use industry data for their own work, and baseline informed consent standards for their own patients and participants.

The human subjects research governance structure was created to govern research with humans. It was not designed to govern research with all the stuff derived from them. The consequence of this imbalance between emerging research protocols and static regulations is the government-enabled privatization of one of our most valuable health resources. But there are steps that AMCs can and should take now to better control the revolving door of data and specimens between AMCs and industry to protect access to the data and specimens needed for life-saving research moving forward.

II. Health Data and Biospecimens as a Common Good

In order to achieve the promises made by the Precision Medicine Initiative and other cutting-edge health campaigns, researchers need data—regarding lifestyle choices, environment, health outcomes, and genetics—derived from biospecimens, medical records, wearable technologies, and other collection methods. Through ‘big data’, researchers can slowly piece together which health outcomes individuals can control, or clinicians can treat, and subsequently improve. 17 This has led some scholars to categorize health data as a critical ‘infrastructural resource’, its value derived from downstream uses rather than as an end product in and of itself. 18

Ruth Faden and colleagues have argued that medical centers should be reconceptualized as ‘learning health systems’ which would ‘have continuous access to information about as many patients as possible to be efficient, affordable, fair, and of highest quality.’ 19 But data and biospecimens currently used for secondary research often come from broader sources than a single medical center. 20 Personal health data are collected across the Internet, apps, and other data-capture mechanisms via algorithmic systems in people’s devices, wearables, homes, work, personal lives, and leisure activities. 21 This makes big data a big business. 22 In 2017, the personal data market was valued at $35 billion—and it is projected to reach $103 billion by 2027. 23

Resources that are beneficial to a large part of the population, such as natural resources like fresh water or man-made resources such as data or biobanks, are often described as ‘common resources’. A unique set of theories regarding the best ways to govern these resources has direct applicability to the health data and biospecimen commons. 24 An oft-invoked example of a commons is a shared pasture upon which sheep may graze. If sheep overgraze, the grass will diminish, and no sheep will be able to graze. Thus, although any individual farmer might benefit from adding to her flock and allowing an additional sheep to graze, if all the farmers acted in such a way indefinitely, there would be no grass left for any sheep and everyone would be worse off. As Garrett Hardin 25 originally argued:

Therein is the tragedy. Each man is locked into a system that compels him to increase his herd without limit – in a world that is limited. Ruin is the destination toward which all men rush, each pursuing his own best interest in a society that believes in the freedom of the commons. Freedom in a commons brings ruin to all. 26

But how to avoid such ruin? Hardin offers two solutions: privatization or government regulation (‘These, I think, are all the reasonable possibilities. They are all objectionable. But we must choose – or acquiesce in the destruction of the commons….’). 27 In other words, if the government is not governing a commons adequately, private interests will. Thus, under social contract theory, communities are incentivized to give up some amount of their liberty in exchange for living within a system that offers centralized protection for community well-being. But, although private industry ‘may well contribute to the common good’, they ‘are not guardians of this common good’. 28 The common good is not a private matter; it is the purview of government. 29 So then what should we do as a society if the government is not adequately protecting it?

III. Current Governance of Health Data and Biospecimens

In order to fully discuss potential solutions to balancing the breadth and value of industry versus AMC secondary research banks, we must first understand how they are currently governed. This section will review the current human subjects research regulatory structure, discuss limitations inherent therein, explore alternative regulations potentially applicable to industry databanks, and end by arguing that—while the entire system is founded on controlling the entity that acquired the health data or specimen in the first place—data and specimens are becoming increasingly mixed across entities, thereby undermining a central premise of the entire governance system.

III.A. Human Subjects Research Regulations

The current US human subjects research regulations were developed in the 1970s and ‘80s in the wake of research catastrophes, such as the infamous syphilis experiments in Tuskegee, Alabama conducted by the US Public Health Service. 30 These regulations are steadfastly founded in the need for the informed consent of the individual, 31 as a demonstration of her autonomy interests, before she may be enrolled in research as a participant. Case law focused on secondary research use of data and specimens, however, narrowly assesses them on their value to research—as opposed to the individual from whom they were derived. Yet new cases regarding modern data-sharing relationships between AMCs and industry may begin to elucidate how the law will attempt to control these kinds of relationships moving forward.

III.A.1. The Need to Regulate Human Subjects Research

Starting with the Hippocratic Oath, the original foundation of medical ethics as a field rested in the concept of beneficence: the idea that clinicians ought not to inflict harm upon patients and should instead promote good. 32 Even after the atrocities committed during the Holocaust by Nazi doctors and ‘researchers’, as well as reckonings by Henry Beecher and other critics of the American research enterprise closer to home, 33 there was still an overarching sentiment by the federal government that US clinical researchers were upstanding citizens, just like their clinical counterparts, and their motivations should generally not be questioned. 34

This assumption was challenged in the 1970s with the public revelation of the now-infamous syphilis experiment in Tuskegee, Alabama. 35 During that experiment, researchers from the US Public Health Service lied to impoverished African American men with syphilis, whom they went on to study without notice or consent. In 1932, when the syphilis experiments in Tuskegee began, the only known treatments for syphilis (eg arsenic or mercury) required a long course, were largely ineffective, and had many adverse effects. 36 However, in 1943, the same team of Public Health Service researchers discovered that penicillin could effectively cure syphilis. 37

The syphilis experiment in Tuskegee then took an even darker turn when researchers began attempting to prevent subjects from accessing penicillin that might treat their syphilis…for the next 30 years. 38 It was not until an internal US Centers for Disease Control and Prevention whistle-blower and a journalist brought the syphilis experiment in Tuskegee to the attention of the US media in 1972 that the study was finally stopped. 39 In addition, once a cure for syphilis had been discovered, many of the same researchers turned to improving prophylactic measures to prevent it in the first place, conducting experiments in Guatemala between 1946 and 1948 in which they purposefully infected vulnerable subjects with STDs. 40 The experiments in Guatemala were kept secret until rediscovered in 2010. 41 The US clinical research system clearly suffered from critical flaws in oversight, with terrible consequences. 42

III.A.2. The Common Rule

In the face of such interventional research failures—with risks and burdens to subjects that were individual, physical, and profound—US bioethicists and regulators reconsidered treating clinical researchers with the same deference given to clinicians. Instead of presuming that the motivation of a researcher was beneficence , regulators instead began to emphasize the potential conflict of interest a researcher has with promoting her own work. Under this view, proposed research protocols had to be assessed by neutral third parties in order to protect the participant from the conflicted researcher. The role of the third party would be to ensure that any potential risks of research were counterbalanced by potential benefits (either to the individual or society) and that individual subjects provided fully informed consent before enrollment. The Belmont Report , authored by the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research in 1979, thus cemented autonomy’s preeminence as the cornerstone of clinical research ethics. 43

Ironically, although the syphilis experiment in Tuskegee was the inciting incident for the Belmont Report , one of the framework’s main takeaways—that participants should give informed consent to participation in research—would not actually have resolved the profound problems with that work. 44 Although researcher deceit in Tuskegee was surely a great moral defect, even if the participants had consented, the experiment should never have taken place. There were also irreconcilable beneficence and justice deficits (the other two Belmont principles), as the potential benefits of the study were not proportional to the risks undertaken and the study targeted impoverished and uneducated Black participants for research that was not intended to specifically help them or their community.

But there is a policy advantage in the regulations’ focus on informed consent. 45 Generating a list of consent form disclosure requirements to be presented to prospective enrollees is an easier goal to achieve than writing rules that somehow assure justice and fairness. The government thus codified the predominance of disclosure, framed as supporting the informed consent necessary to enable autonomy, into the foundation of the human subjects research governance structure. The first subpart of these regulations was later adopted across the majority of US federal departments and agencies and thus named the ‘Common Rule’. 46

The Common Rule generally requires Institutional Review Board (IRB) review of proposed protocols to ensure there is an appropriate balance between risks and burdens to participants and potential benefits to the participant herself, or to knowledge generally. There are additional protections for vulnerable populations, such as children, in subsequent subparts. 47 If an IRB has authorized recruitment of participants into a study, individual consent must be given by the individual participant or her representative. 48 And detailed consent form disclosures—such as information about risks, benefits, and alternatives 49 —now act as the gatekeepers through which all generalizable knowledge must pass. Considerations of justice and beneficence are left to the mercies of individual IRBs to assess on a case-by-case basis. 50 The Common Rule had remained substantially similar to its first iteration until 2019, when it was revised after a 6-year regulatory notice and comment process. 51

III.A.3. Case Law

Although regulatory protections for human subjects research have largely focused on the scope of which participants should be protected, much of the famous human subjects research case law has focused on use of and access to biospecimens via the classic trio of Moore v. Regents , 52 Greenberg v. Miami , 53 and Washington University v. Catalona . 54 Together, these cases are less deferential to the autonomy interests of the people from whom the specimens are derived, and are instead founded in supporting researcher access to valuable biobanks. The recent dismissal in Dinerstein v. Google 55 demonstrates how some of these issues surrounding data sharing specifically might be litigated henceforth.

In Moore v. Regents (1990), patient John Moore went to doctors at the University of California, Los Angeles for the treatment of his hairy-cell leukemia. His clinicians collected many types of biospecimens from him, including samples taken after such collection was no longer necessary for his clinical care, without disclosing that the purpose of the ongoing collection was research. The investigators were ultimately able to develop a lucrative cell line from his specimens. Once Moore realized what his doctors were doing, he sued for conversion (among other things). 56

The California Supreme Court found that a claim of conversion could not stand because Moore did not retain an ownership interest in his cells once they left his body. It did find that the doctors should have disclosed their research interest to Moore as a potential conflict of interest during his clinical informed consent process. 57 The court ultimately held that ‘the theory of liability that Moore urges us to endorse threatens to destroy the economic incentive to conduct important medical research.… If the use of cells in research is a conversion, then with every cell sample a researcher purchases a ticket in a litigation lottery.’ 58

In Greenberg v. Miami (2003), 59 parents of children affected by Canavan disease, a devastating neurodegenerative disorder, donated money, medical information, and biospecimens to researchers at Miami Children’s hospital. The families stated that their goal was to support the research necessary to isolate the variants associated with (and thereby develop a carrier test for) the disease, as well as work toward a cure. The Miami researchers did in fact identify the variant associated with Canavan, but patented the discovery. 60 The patent restricted others’ ability to offer Canavan carrier testing or do their own therapeutic research.

The families sued for claims including a lack of informed consent and conversion based on their understanding that their contributions were in exchange for ‘affordable and accessible’ testing for other families. 61 The district court in this case declined to recognize a failure in the duty of informed consent between the families and the researchers. The Greenberg court distinguished Moore due to the ‘therapeutic relationship’ between Moore and his physician researchers; in Greenberg the court described defendants as ‘solely medical researchers’. 62 The court bemoaned that finding otherwise would ‘have pernicious effects over medical research….’ 63 The conversion claim was also dismissed since ‘the property right in blood and tissue samples also evaporates once the sample is voluntarily given to a third party’. 64

In the last famous biospecimen case, Washington University v. Catalona (2006), 65 Dr William Catalona planned to move his faculty practice and research on prostate cancer from Washington University to Northwestern. He had personally recruited ~3000 of his patients to contribute samples to the WashU biobank. 66 In anticipation of his move, Dr Catalona sent a letter to the much broader cohort of all (~30,000) contributors whose specimens had been used in his research asking them to donate and release their samples to him at Northwestern for use ‘only at his discretion and with his express consent….’ Six thousand agreed. 67 WashU sued.

The Missouri district court found for WashU in part with reference to Moore and Greenberg , 68 and in so doing made an impassioned policy argument for the importance of enabling accessible biobanks:

Medical research can only advance if access to these materials to the scientific community is not thwarted by private agendas. If left unregulated and to the whims of a [research participant], these highly prized biological materials would become nothing more than chattel going to the highest bidder. It would no longer be a question of the importance of the research protocol to public health, but rather who can pay the most. […] The integrity and utility of all biorepositories would be seriously threatened if [research participants] could move their samples from institution to institution any time they wanted. No longer could research protocols rely on aggregate collections since individual samples would come and go. 69

A more recent example related to data-sharing agreements between an AMC and industry comes in the form of Dinerstein v. Google 70 from the Northern District of Illinois. At issue in Dinerstein is a 2017 data use agreement between the University of Chicago and Google. The goal of this agreement was to generate ‘machine-learning techniques to create predictive health models aimed at reducing hospital readmissions and anticipating future medical events’. 71 To do so, UChicago shared the ‘de-identified’ electronic medical records (EMR) of all adult patients over a 5-year period (the data, in fact, included dates of service). 72

When Mr Dinerstein, a patient, found out his health data had been shared with Google, he sued for breach of contract (among other things) 73 for alleged violations including HIPAA compliance (as the court allowed a private right of action for HIPAA under a state tort law claim 74 ). 75 Under HIPAA, covered entities may not generally sell protected health information (PHI) without written permission. But this prohibition does not include sharing PHI for research purposes in exchange for a ‘reasonable cost based fee to cover the cost to prepare and transmit’ said PHI. 76 In Dinerstein , the data use agreement between Google and UChicago granted the University, for internal ‘non-commercial research’ purposes, ‘a nonexclusive, perpetual license to use the [ ] Trained Models and Predictions’ created by Google. 77 Defendants argued that the perpetual license was, in fact, only a reasonable cost-based fee. However, the Court found that: ‘Whatever a perpetual license for “Trained Models and Predictions” actually means, it appears to qualify as direct or indirect remuneration.’ 78 But ultimately the Court ruled that none of Mr Dinerstein’s arguments could support a claim for relief 79 because Illinois neither recognizes noneconomic breach of contract damages, eg ‘anxiety and distress’ (except under exceptional circumstances), 80 nor could the plaintiff establish economic damages to his EMR data (which were not even recognized as his property to begin with). 81 In summary, it will be exceptionally hard for plaintiffs to move past dismissal in the future unless they can establish that the invasion of their privacy resulted in specific financial injury.

III.B. Limitations in the Scope of the Human Subjects Research Regulations

Despite the rich regulatory and case law background of the governance of human subjects research, a major limitation is that this governance structure does not actually cover all human subjects research. Even where the regulations do apply, they only govern research that is interventional or involves identified biospecimens or data—which excludes a large number of secondary protocols. And, even when the regulations apply, and even if the biospecimens or data are identifiable, there are still options for conducting the research with a waiver of informed consent. As Neil Richards and Woodrow Hartzog have argued: consent ‘transforms the moral landscape between people and makes the otherwise impossible possible’. 82 But the human subjects research regulations allow an exceptional number of exceptions to actually procuring such consent.

III.B.1. Not All Human Subjects Research

First, the human subjects research regulations do not actually apply to all human subjects research. There are, in fact, only four circumstances in which they do apply: (i) If the investigator is conducting research using federal funding from a US Department or Agency that has adopted the Common Rule. 83 This is understood as a reasonable derivation of Congressional spending power, ie if you are going to take the government’s money to do research, it has the right to put limits on usage. 84 When the Common Rule was revised in 2018, regulators considered extending its reach to include all clinical trials at US institutions that receive some federal support for human subjects research (regardless of the funding of the specific study). 85 But the final iteration of the updated rule did not extend its scope. 86 (ii) If an institution voluntarily decides to extend the regulatory requirements to all of its employees conducting human subjects research (as many do). 87 (iii) If researchers are using an investigational-only product which requires US Food and Drug Administration (FDA) authorization to ship and use in interstate commerce. 88 Or (iv), if investigators wish to submit data derived from their research to FDA in support of an application for research or marketing, they have to follow substantially similar FDA regulations regarding the protection of human subjects. 89

These limitations mean that privately funded research is outside the scope of federal regulations if it does not involve a product requiring FDA authorization to distribute or if researchers do not ultimately submit their data in support of an FDA application. 90 Some private entities may choose to follow components of the research regulatory structure due to other market motivations, 91 discussed further below, but many do not. More comprehensive data privacy legislation has recently been adopted in Europe 92 and California, 93 although both still only regulate ‘identified’ data, have ambiguously broad exceptions for research in public health, and do not cover health-related data sharing in the USA. 94 For example, under the GDPR, although participants in clinical trials must give full informed consent for future data sharing (‘organization employed clear, intelligible, and easily accessible language, with the purpose for data processing attached to that consent’ 95 ), and they must be able to withdraw from future trials, they cannot retrospectively erase their clinical data without an audit trail. 96

In addition, and somewhat counter-intuitively, research with health data and specimens collected from patients via clinical care is also not governed by the human subjects research regulations. This work instead falls under the HIPAA research rules. But HIPAA only covers individuals’ PHI collected by ‘covered entities’ (ie health care providers, health plans, or healthcare clearing houses). Covered entities are allowed to share identified data with ‘business associates’ with whom they have a contract to perform ‘functions or activities on behalf of’ or provide services. 97 HIPAA allows entities to transfer data for research as long as the data are deidentified, 98 but also allows for the research sharing of identified data and specimens with an IRB waiver—a much more efficient process than acquiring consent in the first place. 99 And the mandatory disclosures regarding default consent to data and specimen collection for research are generally written right into an institution’s standard clinical consent form. 100 As W. Nicholson Price II has pointed out:

‘…if health privacy is worth defending, then why limit those defenses to the narrow set of actors and data covered by HIPAA, as the United States largely does? HIPAA’s outdated focus on covered entities and its safe harbor for “deidentified” data leave too much for manipulation, if health privacy protection is the goal.’ 101

But, as tested in Dinerstein v. Google , 102 built into HIPAA is the assumption that private health information is only collected by covered entities, or, that once collected, those data stay put. 103 , 104 In addition, HIPAA neither envisioned nor protects the huge swath of health-proxy data from which entities can derive health-related information (eg Google’s non-UChicago derived health information). 105 This might include, for example, geolocation data about visits to a psychiatrist, or demographic information often correlated with health outcomes such as maternal mortality. 106

III.B.2. Readily Identifiable Health Data and Biospecimens Only

A second important limitation of the research regulations is that—even if research is federally funded, at an institution that requires all researchers to follow them, or generated with or as part of an FDA application—the rules only apply to research involving ‘human subjects’. 107 A ‘human subject’, in turn, is defined as a living person with whom the investigator ‘obtains information through intervention or interaction’ or which involves ‘identifiable private information or identifiable biospecimens’. 108 Thus, whether a biospecimen or data are considered ‘identified’ remains a critical threshold requirement for protections.

Past concerns regarding individual identifiability of data have focused on large-scale or germline genomic data. As such genetic sequences are unique to only one person, they were assumed to be more easily reidentified than other types of deidentified health data. 109 However, recent research has demonstrated that almost all data from Americans can now be reidentified by matching them with as few as 15 demographic attributes that are easily discoverable from the information we enter into our phones, computers, and wearables every day 110 (unless you are Latanya Sweeney, in which case you only need three 111 ). The time when genes represented a singularly unique identifier worthy of potential additional protection has already come and gone. Thus, additional laws protecting people from untoward uses of those data, ie the Genetic Information Nondiscrimination Act of 2008, 112 are now far too limited in scope to protect people from the myriad types of health data which may be linked back to them in discriminatory ways. 113
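The mechanics behind this reidentification claim are easy to demonstrate. The sketch below, using fabricated records, measures how many records are unique on just a few demographic attributes; any such record can in principle be linked to an identified dataset that shares those fields:

```python
# Fabricated illustration: even a handful of "nonidentifying" demographic
# attributes can make most records unique, and unique records are linkable.
import pandas as pd

records = pd.DataFrame({
    "zip3":       ["481", "481", "481", "104", "104", "900"],
    "sex":        ["F", "F", "M", "M", "F", "F"],
    "birth_year": [1961, 1961, 1985, 1990, 1972, 1961],
    "diagnosis":  ["dx1", "dx2", "dx3", "dx4", "dx5", "dx6"],
})

quasi_identifiers = ["zip3", "sex", "birth_year"]
group_sizes = records.groupby(quasi_identifiers)["diagnosis"].transform("size")

unique_share = (group_sizes == 1).mean()
print(f"{unique_share:.0%} of records are unique on only {len(quasi_identifiers)} attributes")
```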

III.B.3. Waiver of Informed Consent

Last, it is worth noting that even if research falls within the scope of federal regulation and even if participants fall within the definition of ‘human subjects’, researchers may still apply for a waiver of informed consent. Waiver can be granted by an IRB if the research involves no more than minimal risk, could not otherwise be practicably carried out, would not adversely affect the rights or welfare of participants, and if additional relevant information will be provided to subjects after their ‘participation’. 114

Thus, despite the enormous research governance system set up by the federal government; parallel research protection programs at individual research institutions; and complex review, waiver, and exception policies with which researchers must grapple, many instances remain where informed consent of individual participants is not legally required to begin with. According to Price, ‘…privacy hurdles are just that—hurdles, not walls. They can be surmounted’. 115 The world of ‘aconsensual’ 116 secondary research is still vast.

III.C. FDA and the Collection of Industry Health Data

Despite not being generally covered under the human subjects research regime, the private human health data market has been influenced, however circuitously, by FDA via its involvement in the direct-to-consumer personal genetic testing (DTC-PGT) industry. Although FDA has authority over a broad swath of drug, device, and biologic products including genetic tests, it exercised ‘regulatory discretion’ over the DTC-PGT industry from its inception in 2007 until 2010. 117 But, in 2010, Pathway Genomics, the largest DTC-PGT player at the time, announced a partnership with Walgreens under which it would sell its product in stores across the USA (in contrast to its earlier, more discreet online sales model). 118 FDA assessed that this increase in access to potential consumers increased the absolute risk of the product, 119 presumably because more individuals were likely to buy it and therefore more individuals would suffer related burdens. FDA therefore sent ‘Untitled Letters’ warning of potential violations of the Food, Drug and Cosmetic Act to 23 DTC-PGT companies. 120 Although the Untitled Letters did not require the companies to actually discontinue their products, most did. 23andMe, which had deep pockets due to heavy investment from Google, 121 was the only DTC-PGT company left standing to pursue regulatory authorization. 122

But 23andMe, somewhat disingenuously, continued to market its product…while seeking FDA authorization to market its product. By 2013, FDA sent 23andMe a ‘Warning Letter’ requiring the company to discontinue its health-related testing sales. FDA justified its response due to concern about the ‘public health consequences of inaccurate results from the device.’ 123 23andMe complied, withdrew its health-related testing from the market, and began to offer components again only after they became FDA authorized.

Although genetic data are but one area of health-related data making up large databanks, it is an important one. The FDA/23andMe saga is an example of how the government can effectively regulate products within its purview. However, FDA’s reach over DTC-PGT data was by proxy. What FDA actually has authority over is the genetic testing device and the informational results returned to consumers, not the data or research itself. Regulating the secondary use of 23andMe’s more than 12-million-person genetic and phenotypic database, 124 or Ancestry.com ’s 18-million-person one, 125 remains outside the purview of both FDA and DHHS.

III.D. Data and Specimens between AMCs and Industry are Becoming Increasingly Mixed

Despite the fact that the federal governance structure over health data and biospecimens focuses on controlling the actions of the entity that collects the data or specimens in the first place, data and specimens are increasingly becoming mixed across clinical, research, and private entities after collection—without additional informed consent. Specifically, the sharing of data and biospecimens between academia and industry has been growing at a rapid pace.

In addition to the deal already discussed in the introduction between Ascension and Google, where identified data are flowing from a covered entity to a private entity (otherwise in the business of collecting widespread data about people) under a HIPAA business associate agreement; and the one at issue in Dinerstein v. Google , where EMR data were shared from UChicago to Google in exchange for licensing rights of future algorithmic decision-making tools; 126 Google also has major deals with Mayo Clinic, 127 the University of California, San Francisco (UCSF), 128 and Stanford Medicine. 129 Note that all of these deals actually involve patient (ie not research participant) data. Therefore, they are not primarily analyzed under the human subjects research regulations, but rather the more flexible HIPAA research regime discussed above.

But, even if medical record information that is shared is fully deidentified, and therefore does not require a HIPAA business associate agreement (which would also include other proscriptions on use), with the amount of personal data Google has already collected regarding the average person, it would be fairly straightforward to reidentify someone’s data and link their identity back to their medical information. 130 Mr Dinerstein attempted to use that argument in his case against Google (ie that there was a higher than average risk of reidentification of his EMR data by Google because of ‘the information Google already possessed about individuals through the other services it provides’ 131 ), but the Northern District of Illinois Court agreed with Google that the theoretical ability to reidentify the data, in the absence of evidence that it had actually done so, was not enough to demonstrate a violation. 132 Of note, the Google/UCSF agreement does specifically allow Google to connect the otherwise deidentified health data with other ‘data and materials it obtains elsewhere...’ 133 Thus, ironically, Google may end up with more information about patients whose medical information was shared with them in a deidentified fashion than with patients whose medical information was identified in the first place (i.e. with an accompanying business associate agreement limiting reidentification and linkage with other data).
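The linkage mechanism itself is equally simple. The following sketch uses fabricated data and implies nothing about what any named company has actually done; it only shows how a ‘deidentified’ extract rejoins an identified file on shared quasi-identifiers:

```python
# Fabricated illustration of linkage: a "deidentified" medical extract
# merged with an identified consumer file on shared quasi-identifiers.
import pandas as pd

deidentified_emr = pd.DataFrame({
    "zip": ["60637", "60615"],
    "birth_year": [1948, 1982],
    "diagnosis": ["dx_a", "dx_b"],
})
consumer_profiles = pd.DataFrame({
    "name": ["A. Smith", "B. Jones"],
    "zip": ["60637", "60615"],
    "birth_year": [1948, 1982],
})

relinked = deidentified_emr.merge(consumer_profiles, on=["zip", "birth_year"])
print(relinked[["name", "diagnosis"]])  # each "deidentified" row regains a name
```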

In addition, academic researchers are also increasingly using data and specimens gathered by private industry (thus the ‘revolving door’ analogy). Indeed, the number of peer-reviewed publications using genetic data from a private databank increased from four in 2011 to 57 in 2017 (for a total of 181 over the time period). The vast majority of those publications (86%) listed at least one academic author. 134 In fact, the models trained by Google from the UChicago EMR data resulted in a publication in npj Digital Medicine with a first and last author from Google and three academic researchers from UChicago (where the data were from) as well as UCSF and Stanford (which have other agreements with Google 135 , 136 ).

Having data published in the peer-reviewed literature is itself a business asset for industry. Unlike some tangible common resources like fish in a lake, knowledge and ideas are not generally depleted by use. 137 In fact, the use of knowledge increases its value. 138 Publishing articles in the peer-reviewed literature also demonstrates the scientific acceptability of a dataset. For example, in the beginning there were serious questions regarding the validity of self-reported phenotypic data, but 23andMe has established such collaborations as a viable research model. 139 23andMe publicizes these relationships and publications in order to recruit other research partners. 140

We can only speculate regarding why any one researcher might choose to partner with private industry rather than analyze data held by academia or a government entity, or recruit new participants. 141 But although each researcher may be acting rationally as an individual, many researchers acting in this way can enhance private data resources at the cost of developing and supporting more accessible ones. If researchers continue to or increasingly rely on private datasets over refining, validating, and contributing to other more accessible datasets with their own work—much like Hardin’s overgrazing sheep 142 —they may unintentionally, as a community, contribute to the current environment in which accessible banks struggle to compete with the value and size of private sets. 143 This system enables private databanks, with AMCs either intentionally or unintentionally investing in, and validating, a structure they do not ultimately control.

IV. Limitations in Current Governance of the Secondary Research Market

In addition to scoping issues, there are several important limitations on the current governance structure of the secondary research market. First, despite extensive and complex informed consent requirements for federally-funded researchers, even when participants provide explicit consent, the result is not consent that is actually informed. Second, despite the recent revision, the regulations written for interventional human research are not tailored adequately for the risk/benefit profiles of secondary research. And third, the combination of the limitations and failures of the human subjects research market has enabled the privatization of this valuable resource—which has related limitations of its own.

IV.A. Breakdown of the Informed Consent Process

A first limitation of the current governance structure is that, even when research consent is obtained, it is often not actually informed. 144 Although many have studied how to improve the informed consent process, the only intervention that has been found to consistently improve participant comprehension is a conversation between the prospective participant and a person knowledgeable about the study. 145 But the human subjects research regulations, while mandating that many specific things be included in the informed consent form (eg descriptions of risks, benefits, alternatives, confidentiality, compensation, contact information, voluntariness, and information regarding secondary research 146 ) say very little when it comes to the actual informed consent conversation. 147

The long lists of mandatory disclosures in research informed consent forms have diminishing returns. 148 Participants neither read nor understand them. 149 As Meg Leta Jones and Margot Kaminski recently argued:

The U.S. version of individual control and consent is largely understood to be a paper regime, based on long, elaborate privacy policies that nobody reads, and surveillance that is impossible to opt out of in practice. Thus ‘consent’ and ‘notice and choice’ have become somewhat dirty words in data privacy conversations, standing for the exploitation of individuals under the fictional banner of respecting their autonomy. 150

In addition, in a recent study of participants enrolled in a precision medicine trial, the majority had no idea that their data might be commercialized or used for secondary research protocols—mere weeks after they had signed a comprehensive informed consent form disclosing just that. 151 It appears that the more information the regulations mandate be included in the informed consent form in the name of autonomy and transparency, the less likely participants are to actually read and comprehend it. 152

In secondary research—which often involves complex technologies, ethereal risks, and vague protocols—these concerns are compounded. 153 As Laura Beskow has argued: ‘…informed consent cannot bear the weight it is being asked to shoulder. There is a chasm between the theoretical ideals of informed consent and what it accomplishes in actual practice’. 154 And as Patrick Taylor has put it: ‘We cannot assume that all social goals will be met through a lemming-like coincidence of universal consent’. 155

People have a hard time grasping the concept of risk in general. 156 And, although a primary risk of secondary research is reidentification, the concept of identifiability is itself a false binary. 157 The risk of reidentification lies on a spectrum, one that constantly evolves as additional databases become publicly accessible and as aggregation and algorithmic technologies emerge (a point illustrated by the short sketch following the quotation below). In addition, people feel differently about the risk of reidentification as it relates to different kinds of data. As Raymond De Vries and Tom Tomlinson argue:

Donors need to know not only whether they can be reidentified; they also need to be able to decide whether the harms caused by reidentification are too high. The answer to that question depends on the type of research findings being protected and the implications for the person’s welfare should those findings be disclosed, not just the statistical likelihood of reidentification. 158
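To make concrete the claim that identifiability is a spectrum rather than a binary, consider a minimal illustrative sketch (ours, with invented data, not drawn from the sources cited here) using ‘k-anonymity’, a standard measure of how unique individuals are within a deidentified table. The same table becomes riskier as an adversary can match on more quasi-identifiers:

```python
# k-anonymity: the smallest group size when a deidentified table is
# grouped by the quasi-identifiers an adversary can match on.
# k = 1 means at least one row is unique, ie easiest to reidentify.
# All values below are fabricated for illustration.
from collections import Counter

# Each row: (zip code, birth year, sex).
deidentified_rows = [
    ("48104", 1982, "F"),
    ("48104", 1975, "M"),
    ("48197", 1982, "F"),
    ("48197", 1982, "M"),
]

def k_anonymity(rows, cols):
    """Return the smallest group size after grouping rows by `cols`."""
    groups = Counter(tuple(row[i] for i in cols) for row in rows)
    return min(groups.values())

print(k_anonymity(deidentified_rows, cols=(0,)))       # zip alone -> k = 2
print(k_anonymity(deidentified_rows, cols=(0, 1, 2)))  # zip + year + sex -> k = 1
```

With zip code alone, every row blends into a group of at least two; add birth year and sex and every row becomes unique, ie trivially matchable against any outside dataset carrying those same fields. The ‘risk’ is thus not a fixed property of the table itself but of what else an adversary can link it to.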

Even for regulated research with myriad protections regarding informed consent, the laws practically require only the documentation of legal consent, as opposed to ensuring consent that is actually informed. And that documentation requirement can also limit research with important populations who could otherwise benefit from it, such as cognitively impaired adults. 159 As Taylor has bemoaned: ‘Ethics is reduced to autonomy; autonomy is reduced to naked choice; and a self-commodifying model of choice is substituted for richer visions of human nature and interdependence.’ 160

IV.B. Regulations are not Responsive to Current Secondary Research Market

A second problem with the current regulatory structure surrounding research data and biospecimens is that it was a system originally built to protect people from potentially harmful interventions, not secondary research. The regulations are therefore not calibrated to the unique risk–benefit profiles of secondary research, leading to both over- and under-regulation. Also, the regulations’ threshold for governance is still the method of collection—which aligns with neither contributor concerns regarding usage nor current data-sharing practices.

IV.B.1. Regulations Lack Responsiveness to the Risk/Benefit Profiles of Secondary Research

A major issue with the current governance of the federally funded secondary research enterprise is a lack of responsiveness to the risk/benefit profiles of most secondary research protocols. Not only does secondary research need large amounts of biospecimens and data in the aggregate to be helpful—diluting the value of any one participant—but the lowered risks to individuals, the level of burden if those risks materialize, and the greater benefits to the community also warrant a reconsideration of the current informed consent requirements. 161

With the transition from interventional clinical research to secondary research protocols, the corresponding risks to participants have also shifted. 162 Whereas the research violations that founded the current regulatory and case law structure were tangible and largely physical 163 or financial 164 in nature, secondary research risks generally involve dignitary harms, which are harder to quantify 165 and establish as damages in court. 166 Participants in secondary research protocols can suffer from violations of what some have dubbed ‘non-welfare interests’, or the ‘moral, religious, or cultural concerns’ about uses of their data or specimens, even when they are never reidentified 167 (also related to the concept of a ‘dignitary tort’ in law, such as intentional infliction of emotional distress). For example, in one study by De Vries and colleagues, whereas more than 70% of participants surveyed said they would be happy to sign a ‘blanket consent’ for any future use of their donated biospecimen, when pressed specifically about controversial examples of research—such as those involving patents, abortions, or weapons of mass destruction—almost as many changed their minds and asked to withdraw. 168

In addition, the value of the denominator in research has shifted considerably. In interventional clinical research, due to the intensive control of variables and attempts to ensure that the study is fully powered to capture statistically significant differences, each participant can be of value to the study at the individual level. But in secondary research protocols participants are generally of value in aggregate. 169 This is conceptually analogous to the recognized ‘prevention paradox’ in public health, where changes in health behavior can positively affect outcomes at the population level but result in only negligible improvements for any given individual. 170

On the bright side, due to this shift in value—and unlike, for example, the single cancer patient who may be enrolled in only one experimental chemotherapeutic or standard of care control arm—uses of aggregate data are generally ‘non-rivalrous’ or ‘non-subtractive’ of others’ uses. 171 As Charlotte Hess and Elinor Ostrom, the founders of modern common resource policy, have argued: ‘…the more people who share useful knowledge, the greater the common good’. 172

Last, upon reviewing the new types of risks and benefits of secondary research, an important overarching observation becomes evident. Although the risks of secondary research (eg privacy breaches, dignitary harms) remain remote, small, and individual, the benefits (valuable datasets and knowledge outputs) redound to either the entity holding the data or the common good. 173 This situation is similar to the development of ‘herd immunity’ in the public health context, where for some diseases 90% of individuals must be vaccinated to achieve herd immunity for the community, which in turn protects those who cannot be vaccinated. The pursuit of herd immunity through vaccination relies on social contract theory and requires individuals to take on some small risk to themselves (of adverse events associated with the vaccine) to benefit others greatly (by protecting them from the more harmful disease).

Similarly, the asymmetry between risk-bearers and benefit-bearers can have detrimental effects on an efficient secondary research biospecimen and data market. Market actors often make decisions based on costs and anticipated benefits for themselves, and disregard or discount costs to others. This ‘negative externality’, 174 of researchers discounting privacy risks to the participants of secondary research protocols, can cause the secondary research market to become inefficient, ie to fail to fully tally the actual costs and benefits of an action unless externally forced to do otherwise.

IV.B.2. Governance by Methods of Collection No Longer Makes Sense

Another problem with the current governance of the secondary research enterprise is that siloed regulation of data and biospecimens by collection entity no longer makes sense due to increased sharing. This is why the new data regulations in both California and Europe generally regulate by the kind of data being used, rather than by who is using it. 175 Also, in the USA, who collected the data is an irrelevant distinction to contributors, who generally care about its use.

There has been a plethora of empirical studies demonstrating that people care about how their data and specimens are used for research. 176 A minority even worry about research with deidentified data and specimens, a worry that is particularly pronounced in Black and Latino populations, 177 even though such research is currently out of scope for all regulatory regimes. In addition, although many people hypothetically support the use of specimens in research, 178 biobanks are increasingly turning to a ‘commercialization’ model in order to support the expenses of long-term cryopreservation. 179 But a recent US-based study found that 67% of participants wanted to be clearly notified regarding potential biospecimen commercialization and only 23% were comfortable with such use. 180 The revisions to the Common Rule now require that regulated researchers disclose potential commercialization to participants, putting those researchers at a potential additional recruitment disadvantage compared with their unregulated peers. 181 As aptly summarized by the recent National Academy of Medicine (NAM) report ‘Health Data Sharing to Support Better Outcomes: Building a Foundation of Stakeholder Trust’:

The patient and family community lacks trust that health care systems and researchers will make data and the conclusions based on those data available to them and will not misuse data they provide by rationing care and sharing it with unauthorized third parties. 182

And last, even if the regulations are applicable and consent is obtained, that consent does not provide the kind of control that contributors want. The Common Rule and HIPAA provide only a binary ‘exit right’—the right either to contribute to research or not. 183 Participants are given neither a voice in the kind of research that gets done nor veto power over which secondary research protocols they are or are not willing to contribute to. 184 As Richards and Hartzog argue: ‘…consent does not scale. It is almost entirely incompatible with the modern realities of data and technology in all but the most limited of circumstances’. 185

IV.C. Encouraging Privatization of a Shared Resource

A third major problem with the current governance of the secondary research market is that it actually enables the privatization of secondary research data and biobanks. And privately-held databanks, as opposed to those run by a government agency or other type of accessible collaboratory, are associated with several pressing societal and scientific concerns.

IV.C.1. Privatization of Data and Biobanks

As discussed above, the proposed alternative to governance of shared resources in the ‘tragedy of the commons’, other than government regulation, is privatization. As Michael Heller and Rebecca Eisenberg pointed out (as far back as 1998), this is the direction biomedical research is headed. 186 But it is not fair to assume that the privatization of a shared resource is necessarily bad. According to Hardin, it is the only other option in averting sure ruin. 187

The US federal government and governments around the world have poured vast resources into encouraging data accessibility and sharing. The European Open Science Cloud and various country-specific genetic and other health data and specimen banks have aggressively pursued the ideal of open science abroad. 188 The NIH’s 2018 Strategic Plan for Data Science proposes a ‘data ecosystem’ which would allow for ‘a distributed, adaptive, open system with properties of self-organization, scalability and sustainability’ via projects including the NIH Data Commons. 189 However, these initiatives have yet to achieve widely engaged data-sharing practices. 190

A decade ago, the National Research Council report Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease argued:

Data-sharing standards should be created that respect individual privacy concerns while enhancing the deposition of data into the Information Commons. Importantly, these standards should provide incentives that motivate data sharing over the establishment of proprietary databases for commercial intent. Resolving these impediments may require legislation and perhaps evolution in the public’s expectations with regard to access and privacy of health-care data. 191

This report, in turn, inspired the US government’s dedication to building a health biospecimen and data commons, the All of Us Research Program, which was announced by President Barack Obama in his 2015 State of the Union address. The goal of All of Us is to ‘enroll at least 1 million persons who agree to share their EMR data, donate biospecimens for genomic and other laboratory assessments, respond to surveys, and have standardized physical measurements taken’. 192 It supports ~270,000 participants and recently started returning a first round of genetic results to participants recruited by the University of Wisconsin. 193 All of Us targets recruitment to those historically ‘underrepresented in biomedical research’, and it currently boasts that 75% of its participants meet that categorization. 194 Since 2015, Congress has allocated $1.02 billion toward supporting the All of Us program, and the 21st Century Cures Act authorized another $1.14 billion through 2026. 195

By contrast, the DTC-PGT company 23andMe has a genetic database of over 12 million participants and counting, as well as over a billion phenotypic data points. 196 This makes it ‘the largest re-contactable research database of genotypic and phenotypic information in the world.’ 197 And 23andMe recently announced plans to go public with a merger valuation of $3.5 billion. 198 Therefore, despite All of Us’ laudable goals and progress, the US government has essentially spent $2.16 billion to build a database to compete with the private ones its own regulations enabled in the first place.

IV.C.2. Challenges with Privatization

There are costs associated with allowing a valuable common resource to become privatized. First, informed consent is lacking at an even greater scale. Second, when data and biospecimens are privately held, industry can put limitations on access that might stifle future research advances. Third, without data access, peer researchers can neither validate nor build on work otherwise available in the peer-reviewed literature. Last, even if certain industry players are currently allowing limited researcher access to their datasets, a new business focus, leadership, or regulation could change that arrangement quickly—and the time and effort other researchers spent contributing to and validating that private resource could be lost.

First, issues with the Common Rule’s informed consent process can be compounded in unregulated industry research, which generally relies on digital consent platforms (if it acquires consent at all). This type of consent to private consumer interactions is not a typical clinical or research informed consent with associated fiduciary obligations—it is contractual. 199 There is generally no IRB ensuring that the risks and benefits are adequately balanced before attempting to enroll participants, but rather an attorney (best-case scenario) who has wordsmithed language to protect her company from liability and grant it latitude. Richards and Hartzog highlight three conditions that might make electronic consent particularly ‘pathological’: (i) when people are asked for their consent so constantly that there is not enough time to seriously consider each choice, (ii) when the risks of consent are complex and ethereal, and (iii) when people are incentivized not to take choices seriously 200 —all of which are relevant to the e-consent of unregulated research.

People also generally do not think of inputting their health data into private platforms, such as ‘wellness apps’ or industry websites, as commercializing their own information. 201 Almost a quarter of US adults report that they are asked to agree to a private company’s privacy policy on a near-daily basis. 202 Only one in 1,000 consumers clicks on a website’s terms of service; only one in 10,000 if it requires two clicks. For the very few who make it to the terms of service, the median time spent reading them is 29 seconds. 203 Contemplating sharing sensitive health data does not change this. 204 And, even if consumers do glance at the terms and conditions, the risks of secondary usages of data are so complex that there is disagreement regarding whether terms and conditions governing them should even be considered valid. 205 This also explains contributor concern in the summer of 2018, when 23andMe announced its exclusivity agreement with GlaxoSmithKline to use its database for drug research and development (despite the fact that such a potential collaboration was laid forth clearly in 23andMe’s Terms and Conditions). 206

In addition to the fact that most people do not read informed consent forms—and even if they do, comprehension is likely fleeting—people are also generally unaware of the possibility of ‘data mosaicking’ (combining different datasets to gain a more complete picture of a single point of interest):

Our consent to data practices is astonishingly dispersed. Thousands of apps and services ask us for small, incremental disclosures, few of which involve the kind of collection of information that might give people pause. While dating apps and platforms that collect sensitive and large amounts of personal data might cause some pause, it’s not as though people share all their information at once. Instead, it trickles out over time, such that people’s incentives to deliberate at the point of agreement are small because we don’t know how much information we will ultimately end up sharing. 207

Indeed, companies that create ‘shadow health records’ have started to multiply and gather health-related and proxy data from nonprotected sources, assemble the data back together under the identity of the individual from whom it came, and then sell access to the Frankenstein-ed health records back to researchers. 208
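The mechanics of such reassembly can be shown with a minimal sketch. The datasets, field names, and values below are invented for illustration; the point is only that two individually innocuous tables, joined on shared quasi-identifiers, can rebuild an identified health record:

```python
# A fabricated example of 'data mosaicking': neither table below pairs a
# health attribute with a name, but joining them on shared
# quasi-identifiers produces an identified 'shadow health record'.

wellness_app_rows = [  # hypothetical data collected by a wellness app
    {"email": "jdoe@example.com", "zip": "48104", "birth_year": 1982,
     "resting_heart_rate": 88},
]

marketing_list_rows = [  # hypothetical data bought from a broker
    {"name": "Jane Doe", "email": "jdoe@example.com", "zip": "48104",
     "birth_year": 1982},
]

def mosaic(health_rows, identity_rows, keys=("email", "zip", "birth_year")):
    """Merge rows from the two tables that agree on every key in `keys`."""
    index = {tuple(r[k] for k in keys): r for r in identity_rows}
    merged = []
    for row in health_rows:
        match = index.get(tuple(row[k] for k in keys))
        if match:  # the health attribute now carries a real name
            merged.append({**match, **row})
    return merged

print(mosaic(wellness_app_rows, marketing_list_rows))
```

Nothing in either table alone ties a heart rate to a name; the join does, and each additional source a broker acquires widens the set of keys available for such joins.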

And, although the benefits of digital consent are generally immediate and obvious (eg a funny picture of you as the opposite gender, or 30 years older, to share on social media 209 ), deidentification and mosaicking risks are often unknown, complex to grasp, and may or may not become actual burdens. 210 Few understand that researchers will not just have the ability to access their ‘selfie’, but to connect it with troves of other data they share—indefinitely. A recent example of this is predators collecting name, birthday, and location information from people posting photos of themselves with their COVID-19 vaccination cards, whether for potential scams or for the sale of fake cards. 211

A second issue is potential limits on access. Researcher use of private datasets gives industry a gatekeeping function over what research is enabled. 23andMe has an internal committee that reviews proposed protocols and only supports a select few per year. 212 But gatekeeping can compromise the reliability of peer-reviewed literature and bias it in favor of industry. In 2011, for instance, 96.5% of published, industry-sponsored, head-to-head comparative effectiveness trials found favorable results—a highly unlikely outcome, presumably associated with the type of reports submitted for publication. 213 These kinds of access issues are already well documented in the medical drug market. 214 But, as biospecimens and health data emerge as increasingly critical to drug and device development, 215 it is reasonable to anticipate similar problems in the upstream data market.

Third, if datasets are private, other researchers’ ability to validate the work or build derivative discoveries will be limited. For example, there has been a recent rash of reanalyses of studies in the nutritional 216 and psychological literature 217 that have debunked many major studies—a critical re-examination opportunity that supports the integrity of science in the published literature. As argued in the recent NIH Policy for Data Management and Sharing:

Sharing scientific data accelerates biomedical research discovery, in part, by enabling validation of research results, providing accessibility to high-value datasets, and promoting data reuse for future research studies. 218

In fact, two COVID-19 drug studies from the same authorship team were recently retracted from the New England Journal of Medicine 219 and The Lancet, 220 respectively, due to a lack of author access to privately held data—a fact the authors misrepresented when submitting the articles and confirming data validity.

Without access to underlying data sources, this type of post-hoc quality assurance is much harder, if not impossible. 221 And, as Heller and Eisenberg argued, too many intellectual property rights in premarket ‘upstream’ research can result in limited ‘downstream’ product-oriented research in the ‘tragedy of the anti-commons’. This type of tragedy can, in turn, lead to underutilization of a valuable resource. 222

Although norms are beginning to shift gradually, extensive disclosure of raw data in supplements to publications remains uncommon. If an external researcher were later to request underlying data from the corresponding author for verification or further research, the response to such a query would also generally not be public (and there are many reports of data actually being notably unavailable upon request). 223 This tension is exacerbated when the underlying dataset has commercial value. 224 But, even if databanks are not protected by patents or other types of business interests, and even if corresponding authors would be willing to share the underlying datasets supporting their work, the burden of requesting and accessing the dataset is still on other researchers:

…when it is easy for owners to exclude users from access to resources, as in the case of ‘practically excludable’ materials and data, the burden of inertia is on users to persuade owners to permit access, whether or not the resource is covered by formal property rights such as patents. In this context, high transaction costs make use less likely, aggravating the risk of an anticommons. 225

Last, companies might decrease or limit researcher access because of a new business focus, leadership, or regulation. For example, when law enforcement used the GEDmatch database surreptitiously to solve the ‘Golden State Killer’ case, there was substantial backlash. 226 GEDmatch only listlessly pursued claims against the law enforcement officials who violated its terms and conditions by making up an identity. 227 The company was then bought by Verogen, a forensic genomics firm, whose specific intent was to aid in law enforcement searches. The database, including the genomic data of over 1.3 million users, was sold en masse without so much as an affirmative notice to contributors. GEDmatch then updated its Terms of Service to disclose the acquisition, and users were locked out of accessing the platform until they agreed to the new terms. 228 Relatedly, in February 2021, 23andMe announced its decision to go public—a move for which the implications are not yet clear. 229

We know that what contributors care about is the use of their health data and specimens. 230 We know that they are particularly suspicious when health data and specimens change hands across types of entities. 231 And we know the government is currently regulating the secondary research enterprise in a way that remains nonresponsive to either of those concerns. But questions remain: Should someone do something about it? And, if so, who?

There have been several sets of important recommendations regarding better governance of biospecimen and health data research, which generally fall into four main categories: strengthen existing or pass new law, require all users of the system to contribute, develop accessible resources, or enable data commons. 232 Many scholars have theoretically supported an opt-out or no-consent system for public health purposes, 233 although—when asked—few potential contributors agree. 234 But, because many previous proposals either required private industry to self-restrict when competitors might not, or were founded on the government issuing fluid and comprehensive regulatory revisions when it apparently cannot, none have yet fully stemmed the tide of privatization.

This section first explores industry self-governance as one alternative to the current regulatory structure. Given its limitations, it then discusses alternative approaches AMCs can take to control the commercialization of the health data and biospecimens they hold, as well as academic researcher use of industry data, in addition to improving standards of informed consent for both patients and participants.

V.A. Self-Governance as an Alternative?

Stephen Maurer, in his book Self-Governance in Science, argues that industry self-governance can be a viable alternative when regulatory mechanisms have proven ineffective. 235 There are many types of industries in which self-governance can be attractive, whether to ward off potentially more restrictive government regulation or because the consumer base insists on a standard of behavior without assurance of which it will no longer purchase the product. 236 In addition, private industry players are in a position to possess the most relevant information regarding effective strategies for controlling industry behavior. 237 They may also be able to reach a larger swath of players than government agencies, which are constrained by congressional scope and funding. 238

Industries where players are few and where deviations from market expectations may be attributed across all entities might be particularly motivated to self-regulate. 239 For example, when 23andMe sales started to decline in 2018, CEO Anne Wojcicki attributed the drop to privacy concerns surrounding law enforcement use of genetic databanks after the GEDmatch saga. Sales of the 23andMe product have continued to decline (by $136 million from 2019 to 2020 alone), and last year 23andMe laid off 14% of its employees. 240 Wojcicki attempted to head off the generalization of GEDmatch’s privacy laxness to her own industry by coauthoring a new Privacy Best Practices for Consumer Genetic Testing Services 241 within months of the GEDmatch news and in the same week she announced the 23andMe deal with GlaxoSmithKline. 242 23andMe was joined on these Best Practices by several other major DTC-PGT players, including Ancestry and Helix. 243

Self-regulation also has the potential to be more efficient and more finely tuned to market variance than external regulatory bodies. 244 Evolving technology moves faster than notice-and-comment rulemaking allows. Evoking the moment when FDA finally intervened in the DTC-PGT market and every company but 23andMe dropped out or began requiring a prescription, Maurer points out:

…official regulation can be oversupplied so that politicians and bureaucrats invest more than what the entire industry is worth to society. By comparison, society’s investment in private regulation can never exceed what consumers are willing to pay for the regulated product. 245

That is, if industry spends more money on self-regulation than it can regain in sales, an efficient market would right itself by expelling such a product (23andMe was able to avoid this market consequence via its investment by Google). In addition, as argued by the recent NAM report on health data sharing:

Standards of conduct can build trust, because people know what to expect. Collaborative efforts built on trust can convert zero-sum relationships into positive-sum relationships, where data sharing serves everyone’s interests. 246

There are some successful examples of health industry self-governance. The Pharmaceutical Research and Manufacturers of America’s (PhRMA) Code on Interactions with Health Care Professionals brings additional clarity and specificity to existing FDA regulations and the antikickback law regarding incentives from pharmaceutical sales representatives to health care professionals (eg specifying that acceptable ‘gifts’ must be educational in nature). 247 In return, the government has acted creatively to incorporate PhRMA’s perspective and flexibility into governance by stating that compliance may protect companies from liability 248 or even requiring PhRMA Code compliance in settlement or corporate integrity agreements. 249

However, the private data and biospecimen industry has yet to attempt comprehensive self-regulation. The closest example is the 23andMe-driven DTC-PGT industry’s recent Privacy Best Practices for Consumer Genetic Testing Services discussed above. 250 But most DTC-PGT companies did not actually sign it, and the statement’s scope had notable limitations, including a lack of privacy protections for the nongenetic health data such companies also collect. 251

Thus, although self-governance is a potentially effective possibility for the private data and biospecimen industry, it has not yet come to fruition despite decades of concern. Also, as Maurer concludes, ‘this bargain is only available provided government can trust the private process’, 252 and there is no indication that this is true.

V.B. Proposed AMC Policies Regarding Commercialization and Commercial Use of Data

The broad limitations in scope and application of the federal human subjects research regulations put federally funded researchers at a competitive disadvantage vis-à-vis private industry, so much so that AMCs are increasingly buying their data from industry to begin with. 253 In lieu of national standards, it is time to look to regulatory alternatives for controlling the market, and—given their large negotiating power—the most promising seems to be AMCs setting higher policy standards.

V.B.1. AMC Commercialization and Use of Health Data and Biospecimens

A major value of private data and biobanks is in their ability to support good upstream research that can be translated into potentially lucrative downstream products with marketing authorization that clinicians will both use and prescribe. 254 These kinds of comprehensive databanks require a diversity of both genomic and other health-related data, often found in EMRs. EMRs are generally in the possession of the hospitals and clinics providing health services. 255 This gives AMCs negotiation power—not just for licenses for eventual machine-learning products, such as in Dinerstein v. Google, 256 but also to protect and improve the treatment of their patients and participants. Industry also needs AMCs to conduct and engage in research, publish articles, and treat the patients and write the prescriptions that make the private data valuable in the first place. Although academia can also benefit from using data and specimens held privately (as it might be more cost-effective for federally funded researchers to purchase such data resources than to generate them de novo 257), instead of waiting for industry to self-regulate its production of valuable health data and biospecimens, academia should self-regulate its own consumption.

In addition to the opportunity to potentially set better standards for the future, academia might also be motivated to set policies for engagement with private industry given the recent proliferation of related negative press coverage and lawsuits. 258 Nontransparent academic/industry partnerships can bring attention to that which is legal, but not widely known (eg the Mayo/Google deal), or that which is questionably legal to begin with (eg the Google/Ascension deal). Both types of engagement raise questions of potential corruption, which can in turn ‘undermine[] the institution’s effectiveness by diverting it from its purpose or weaken[ ] its ability to achieve its purpose, including…the public’s trust in that institution or the institution’s inherent trustworthiness’. 259

But partnerships between industry and entities hoping to remain publicly oriented might raise a specter of corruption. As Jonathan Marks recently argued in his book, The Perils of Partnership , the focus of industry is profit and branding. This focus influences most, if not all, of industry’s actions—which may ‘lead to a bias toward the development of technological solutions to public health problems that may be readily commercialized’. 260 For example, when soft drink companies partnered with public health agencies to jointly combat the ‘obesity epidemic’, the clinical focus was shifted from the effects of sugar to those of exercise; ie instead of focusing on limiting sugar intake, the proffered industry/government partnership solution was increasing physical activity to burn it off. 261 Such partnerships not only allow industry to align government efforts on goals congruent with their bottom line, but also allow them to don a ‘health halo’ of respectability due to the assumption that government entities are acting in the public’s best interest—and those that they partner with do the same. 262

We could also see this phenomenon painfully playing out in the initial glacial distribution of the COVID-19 vaccine across the country. States had been distributing their allocation of the vaccine from the Strategic National Stockpile to public health entities as well as to hospitals, AMCs, and other types of private entities. 263 Although this may have made sense given the existing lack of infrastructure and funding of public health agencies, 264 backlash regarding the choices of some of those entities—and in particular AMCs that adopted rather luxurious definitions of who counted as an ‘essential employee’—was swift. 265 But this is a classic example of the problem with entrusting private entities with public goods—they are neither enabled (nor, potentially, particularly inclined) to act equitably at a public community level. As Wendy Parmet recently argued in her Atlantic essay on the privatization of public health:

Unquestionably, the private sector has a role to play in public health—just look at the private companies that produced the vaccines and the private hospitals that have cared for the ill. But to rely on it to protect the public’s health is pure folly…. To depend now on the private sector to increase vaccination rates would further underscore America’s tepid commitment to the basic principles of public health. 266

Thus, AMC/industry partnerships might not be the right framing to control data and biobank repositories. However, this should not prevent academia from setting standards for industry interactions. In fact, academia has done so successfully in the past. For example, in the wake of research establishing that even small gifts from pharmaceutical sales representatives influenced physician prescribing practices, many AMCs set policies limiting or prohibiting sales representatives from campus. 267 These prohibitions, in turn, significantly altered prescribing practices (from predominantly drugs that had been heavily marketed to cheaper ones that were not) at the majority of implementing institutions. 268 Setting standards at the AMC level is one distinct possibility for controlling these relationships. 269

As an example, at the University of Michigan Medical School, we have set up a multidisciplinary Human Data and Biospecimen Release Committee which reviews all industry and researcher requests to commercialize participant health data and biospecimens collected at Michigan Medicine. 270 This Committee sets higher standards than those included in the human subjects research regulations, which we also voluntarily extend to all human subjects research at our institution. We require participant consent (with exceptions for rare diseases and some pediatric research) for all commercialization, and do not grandfather in data and biospecimens collected before the regulations required us to disclose this information. In addition, given the increasing risk of reidentification discussed above, we require commercialization consent of even data and biospecimens that will be provided to industry in deidentified form. 271

Another example of a potential AMC approach is controlling the data use agreements necessary, at most AMCs, to transfer industry data to academic researchers. 272 AMCs could require additional clarity regarding industry data provenance, specifically to ensure that the type of informed consent provided by participants meets the AMC’s own expectations, thereby avoiding the opportunity for AMC researchers to whitewash data that they could not have legally acquired themselves. Indeed, there has been minimal oversight of, or even acknowledgement of, the consent pedigree of privately procured specimens and data in the published literature. And, although there has been some response to the moral turpitude of analyzing specimens and data taken from participants forcefully—such as via the Guatemala STD experiments, 273 Nazi medical experiments, 274 or Chinese prisoners 275 —there has been virtually no emphasis in nonegregious circumstances. For example, in a recent review of academic publications with data procured from private databanks, the type of original contributor consent was either not disclosed or was unclear almost half the time. 276 AMCs could begin to improve this system at the data use agreement level.

Another possibility is that AMCs could require IRB authorization of the primary research that generated the data in the first place. While, as discussed above, IRB authorization might not be an actual legal requirement for all industry research, by refusing to publish secondary research with data acquired otherwise, AMCs could begin to shift behavior. Such an approach has been employed by journal editors in the past. For example, when 23andMe submitted one of its first genome-wide association studies to PLoS Genetics in 2010, 277 editors noticed three major deficits: the protocol had not been prospectively reviewed by an IRB, there was concern over the type of consent obtained, and access to the underlying data was limited. PLoS Genetics editors published a concurrent comment explaining why they decided to publish the piece, despite these perceived shortcomings, given the work’s (i) novel contribution to the literature and (ii) their independent assessment that the participants involved were neither coerced nor deceived. 278 Ultimately, the editors argued that publication ‘accompanied by an editorial providing transparent documentation of the process of consideration’ was appropriate, in addition to their ‘call for community input to spur efforts to standardize the IRB consent process’ for that type of research. 279 The day the editorial was published, 23andMe sent out a press release emphasizing that IRB review was not legally required for its research—but that it would obtain review going forward ‘to ensure that our work is in line with scientific research best practices’. 280 In this case, the ability to publish knowledge gained from analysis of its databank made the databank a more valuable business asset itself—one for which 23andMe was willing to give up something of value (ie not having IRB oversight) in return. 281

V.B.2. ‘Lifting All Boats’ Via Informed Consent

In addition to better controlling the flow of data and specimens between AMCs and industry, AMCs can begin to improve informed consent practices at their own institutions. As argued above, one of the reasons that the information contained in a consent form is so highly regulated is that it is easier to regulate than the empirically validated method of gaining informed consent (ie a conversation). The assumption is that if information about risks or burdens is disclosed in the form, it has been understood and accepted. We know this is not true. 282 But, given the new types of risks to individuals and benefits to communities posed by secondary research, should we keep trying to improve the individual informed consent process, considering the enormous time and burden that would entail? If the individual risks of secondary research are lower, and we know that informed consent is generally ineffective anyway, is there a better solution?

Although a signature on the informed consent form will certainly remain legally required, AMCs can also attempt to better circumscribe what they are asking of patients and participants contributing to secondary research banks in the first place. By ensuring a baseline standard that takes into account the risks and benefits of secondary research, with buy-in from representative community members, this ‘rising tide can lift all boats’ and potentially better protect many more participants—even if some might not read or understand the forms.

One way to do this is by establishing some type of data review committee that would agree to prospective standards for what can be asked of participants in the first place, rather than performing a retrospective check of informed consent (like the University of Michigan model). Not only would such a committee potentially resolve the imbalance of putting too much emphasis on individual consent and autonomy for low-risk research, but it would also avoid the secondary issue that, even when researchers attempt to get individual-level consent, participants rarely comprehend what they are agreeing to.

At the committee level, patient representatives could engage in a more helpful conversation about the risks, benefits, and alternatives to research. In particular, given that we know that Black and Latino participants are more likely to be hesitant to share data and specimens, 283 we must ensure that these standards are not set at the ‘average patient’ level—because that ‘average’ for many AMCs is likely to reflect a predominantly white cohort and therefore represent a white viewpoint. Informed consent standards should be responsive to a diversity of racial and ethnic viewpoints so as not to further discourage the diversity of racial and ethnic representation within research datasets.

Of course, dissenting individual participants could still potentially be overridden via representation at the cohort level. But, given the justice and public beneficence opportunities for communities from enabling such research, and the low risk to individuals, this seems like a better compromise than the one we are making now (ie that many participants do not understand the forms and yet we move forward). We should instead be moving toward setting and enforcing higher standards across the board, such that the individual informed consent conversation can focus on other preference-sensitive choices (eg the amount of time participants are willing to commit to research, and in return for what value).

Therefore, although a more classic academic/industry partnership model might not be appropriate, there are several concrete steps AMCs can take to better control and set standards for their own data and specimen commercialization and academic researcher access to industry data and specimens, as well as resolve some of the lapses in protection of patients and participants at the federal level.

The future of scientific advances via secondary research with biospecimens and health data is bright. However, the current strict governance of federally funded researchers (and, by association, most AMC researchers) and a decided lack of governance of privately collected data and specimens has created a stark imbalance. This inequity is increasingly leading to AMCs commercializing their own health data and specimens, in addition to securing additional data access from private holdings. But, though AMCs may have little control over new federal legislation or the revision of extant regulations, they do have control over supply and demand as well as the behavior of their researcher employees. Simply succumbing to inevitable privatization of data and biospecimen banking is not an acceptable solution for such an important public good. Instead, AMCs should move toward better controlling the proverbial revolving door between themselves and industry to continue to advance life-saving research while also protecting the interests of those whose data are supporting such advances.

Arti K. Rai, Risk Regulation and Innovation: The Case of Rights-Encumbered Biomedical Data Silos, 92 Notre Dame L. Rev. 1641, 1643 (2017) (‘In diagnostic testing, as in other areas of biomedicine, large data sets promote cumulative innovation.’).

NIH, Final NIH Policy for Data Management and Sharing (Oct. 29, 2020) https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html (accessed Jan. 25, 2021) (‘Sharing scientific data accelerates biomedical research discovery, in part, by enabling validation of research results, providing accessibility to high-value datasets, and promoting data reuse for future research studies.’) (hereinafter, NIH, Policy for data management ).

National Research Council, Toward precision medicine: building a knowledge network for biomedical research and a new taxonomy of disease (2011), https://www.nap.edu/catalog/13284/toward-precision-medicine-building-a-knowledge-network-for-biomedical-research (accessed July 17, 2020).

Barbara J. Evans, Barbarians at the Gate: Consumer-Driven Health Data Commons and the Transformation of Citizen Science, 42 Am. J. Law Med. 651, 652 (2016) (‘Data resources are a central currency of twenty-first-century science, and the question is, “Who will control them?”’).

45 C.F.R. § 46 (2019).

National Science Board, Science & Engineering Indicators (2018) https://www.nsf.gov/statistics/2018/nsb20181/report/sections/academic-research-and-development/expenditures-and-funding-for-academic-r-d#sources-of-support-for-academic-r-d (accessed Dec. 27, 2019).

Rebecca Skloot, The Immortal Life of Henrietta Lacks (2011).

Kayte Spector-Bagdady et al., Encouraging Participation and Transparency in Biobank Research, 37 Health Aff. (Millwood) 1313 (2018) (hereinafter Spector-Bagdady, Encouraging Participation).

Reshma Jagsi et al., Perspectives of Patients with Cancer on the Ethics of Rapid-Learning Health Systems, 35 J. Clin. Oncol. 2315 (2017).

Spector-Bagdady et al., Encouraging Participation, supra note 8.

Ed Pilkington, Google’s secret cache of medical data includes names and full details of millions—whistleblower, The Guardian, Nov. 12, 2019 (‘A whistleblower who works in Project Nightingale, the secret transfer of the personal medical data of up to 50 million Americans from one of the largest health care providers in the US to Google, has expressed anger to the Guardian that patients are being kept in the dark about the massive deal.’); Tariq Shaukat, Our partnership with Ascension, https://cloud.google.com/blog/topics/inside-google-cloud/our-partnership-with-ascension (accessed Dec. 24, 2019).

45 C.F.R. § 164.514(b) (2016).

Rebecca Robbins, HHS to probe whether Google’s ‘Project Nightingale’ followed federal privacy law, STAT, Nov. 13, 2019, https://www.statnews.com/2019/11/13/hhs-probe-google-ascension-project-nightingale/ (accessed May 31, 2020) (‘A federal regulator is investigating whether the federal privacy law known as HIPAA was followed when Google collected millions of patient records through a partnership with nonprofit hospital chain Ascension…. “OCR would like to learn more information about this mass collection of individuals’ medical records with respect to the implications for patient privacy under HIPAA,” Roger Severino, the office’s director, said in a statement to STAT.’).

Heather Landi, Google defends use of patient data on Capitol Hill among scrutiny of Ascension deal, FIERCE Healthcare (Mar. 4, 2020) https://www.fiercehealthcare.com/tech/senators-pressing-ascension-google-data-deal-as-tech-giant-defends-its-use-patient-records (accessed Feb. 13, 2021) (‘The lawmakers—presidential candidate Elizabeth Warren (D-Mass.), Richard Blumenthal (D-Conn.), and Bill Cassidy (R-La.)—sent a letter to Ascension…demand[ing] more information regarding the type and amount of information the health system has provided to Google, whether the health system provided advance notice to patients about the deal and whether patients can opt-out of data sharing.’).

Blue Ridge Institute for Medical Research, Ranking Tables of NIH Funding to US Medical Schools in 2020 as compiled by Robert Roskoski Jr. and Tristram G. Parslow, http://www.brimr.org/NIH_Awards/2020/default.htm (accessed Feb. 14, 2021).

Dinerstein v. Google, LLC, No. 19 C 4311, 2020 WL 5296920, 1079 (N.D. Ill. Sept. 4, 2020) (‘Whatever a perpetual license for “Trained Models and Predictions” actually means, it appears to qualify as direct or indirect remuneration.’).

Anne Cambon-Thomsen, The Social and Ethical Issues of Post-Genomic Human Biobanks, 5 Nat. Rev. Genet. 866, 867 (2004) (‘large population biobanks are indeed useful tools not only as a repository of genomic knowledge but also as a means of measuring non-genetic environmental factors. As such, they give epidemiologists and geneticists a new tool to explore complex gene–gene and gene–environment interactions at the population level.’).

Brett M. Frischmann, Infrastructure: The Social Value of Shared Resources 61 (2012) (‘[s]ocial demand for the resource is driven primarily by downstream productive activities that require the resource as an input.’); also see eg W. Nicholson Price II, Risk and Resilience in Health Data Infrastructure, 16 Colo. Tech. L.J. 65, 77 (2017) (‘Roads are not valuable principally because you can drive on them; roads are valuable because you can use them to get places and transport goods.’).

Ruth R. Faden et al., An Ethics Framework for a Learning Health Care System: A Departure from Traditional Research Ethics and Clinical Ethics, Hastings Cent. Rep. S16–27, S23 (2013).

Jodyn Platt et al., Ethical, Legal, and Social Implications of Learning Health Systems, 2 Learn. Health Syst. e10051 (2018).

Evans, supra note 4, at 651.

Elizabeth R. Pike, Defending Data: Toward Ethical Protections and Comprehensive Data Governance, 69 Emory L.J. 687, 691 (‘Personal data are big business.’).

Kari Paul, What Is Exactis—And How Could It Have Leaked The Data Of Nearly Every American?, Market Watch, June 29, 2018, https://www.marketwatch.com/story/what-is-exactisand-how-could-it-have-the-data-of-nearly-every-american-2018-06-28 (accessed June 29, 2018).

Amy L. McGuire et al., Importance of Participant-Centricity and Trust for a Sustainable Medical Information Commons, 47 J. Law Med. Ethics 12, 15 (2019).

It is critical to note that, although Hardin’s conceptual framework of a commons founded a theory upon which many scholars have built, Hardin was a racist and eugenicist. His words should only be read and understood within that important context. Eg Craig Straub, Living in a world of limits (interview with Garrett Hardin), 8 The Social Contract 1 (1997) (‘…I think there are other reasons for restricting immigration that are more powerful. My position is that this idea of a multiethnic society is a disaster…[it] is insanity. I think we should restrict immigration for that reason.’).

Garrett Hardin, The tragedy of the commons, 162 Science 1243, 1244 (1968).

Jonathan H. Marks, The Perils of Partnership, at 34 (2019) (emphasis added).

James H. Jones, Bad Blood: The Tuskegee Syphilis Experiment (1993).

Or surrogate, if the individual lacks capacity.

Hippocrates, The Oath (Francis Adams trans. 400 B.C.), http://classics.mit.edu/Hippocrates/hippooath.html [http://perma.cc/B2JH-86RE] (accessed July 17, 2020).

Henry K. Beecher, Ethics and Clinical Research, 274 N. Engl. J. Med. 1354 (1966) (‘Human experimentation since World War II has created some difficult problems with the increasing employment of patients as experimental subjects when it must be apparent that they would not have been available if they had been truly aware of the uses that would be made of them. Evidence is at hand that many of the patients in the examples to follow never had the risk satisfactorily explained to them, and it seems obvious that further hundreds have not known that they were the subjects of an experiment although grave consequences have been suffered as a direct result of experiments described here.’).

Kayte Spector-Bagdady & Paul A. Lombardo, ‘Something of an adventure’: postwar NIH Research Ethos and the Guatemala STD Experiments, 41 J. Law Med. Ethics 697 (2013).

Ruth R. Faden & Tom L. Beauchamp, A History and Theory of Informed Consent, at 23 (1986); Tom L. Beauchamp, Informed Consent: Its History, Meaning, and Present Challenges, 20 Cambridge Q. Healthcare Ethics 515, 518 (2011).

H. A. Callis, Comparative Therapy in Syphilis, 21 J. Natl. Med. Assoc. 61 (1929).

John F. Mahoney et al., Penicillin Treatment of Early Syphilis: A Preliminary Report, 33 Am. J. Public Health & Nation’s Health 1390 (1943).

In many cases this was not actually successful, making the Tuskegee experiments more of a study of under- rather than untreated syphilis. Susan M. Reverby, Compensation and Reparations for Victims and Bystanders of the U.S. Public Health Service Research Studies in Tuskegee and Guatemala: Who Do We Owe What?, Bioethics, DOI: 10.1111/bioe.12784 (2020) (‘…some of the men still alive in the post World War II antibiotic era were able to get to treatment, sometimes because they had moved outside of the area, or because their doctors did not know they were in the study.’).

Jones, supra note 30.

Presidential Commission for the Study of Bioethical Issues, Ethically Impossible: STD Research in Guatemala from 1946 to 1948 (2011) https://bioethicsarchive.georgetown.edu/pcsbi/sites/default/files/Ethically%20Impossible%20(with%20linked%20historical%20documents)%202.7.13.pdf (accessed July 18, 2020).

Susan M. Reverby, ‘Normal Exposure’ and Inoculation Syphilis: A PHS ‘Tuskegee’ Doctor in Guatemala, 1946–1948, 23 J. Policy Hist. 6 (2011).

Kayte Spector-Bagdady & Paul A. Lombardo, U.S. Public Health Service STD Experiments in Guatemala (1946–1948) and Their Aftermath, 41 Ethics Hum. Res. 29 (2019).

National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, The Belmont Report (1979) https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/read-the-belmont-report/index.html#xrespect (accessed July 18, 2020) (‘In most cases of research involving human subjects, respect for persons demands that subjects enter into the research voluntarily and with adequate information.’); Jonathan Beever & Nicolae Morar, The Porosity of Autonomy: Social and Biological Constitution of the Patient in Biomedicine, 16 Am. J. Bioeth. 34 (2016) (‘Respect for the individual holds the place of utmost prominence among the principles of contemporary bioethics.’).

Ruqaiijah Yearby, Exploitation in Medical Research: The Enduring Legacy of the Tuskegee Syphilis Study, 67 Case W. Res. L. Rev. 1171, 1175 (2017).

For an argument regarding why principlism is oversimplified altogether, see John H. Evans, A Sociological Account of the Growth of Principlism, 30 Hastings Cent. Rep. 5, 31–38 (2000) (‘Principlism is…a method that takes the complexity of actually lived moral life and translates this information into four scales by discarding information that resists translation.’).

45 C.F.R. § 46 (2019).

45 C.F.R. § 46 (2019), Subparts B–D. Of note, only Subpart A of 45 C.F.R. § 46 is called the ‘Common Rule.’

Faden & Beauchamp, supra note 35, at 23.

Holly Fernandez Lynch et al., Of Parachutes and Participant Protection: Moving Beyond Quality to Advance Effective Research Ethics Oversight, 14 J. Empir. Res. Hum. Res. Ethics 190 (2019).

Council on Governmental Relations, Association of Public & Land-Grant Universities, COGR-APLU analysis of the Common Rule NPRM comments: COGR June 2016 meeting, http://www.cogr.edu/COGR/files/ccLibraryFiles/Filename/000000000371/COGR-APLU%20Analysis%20of%20the%20Common%20Rule%20NPRM%20Comments.pdf (accessed Aug. 6, 2019).

Moore v. Regents of University of California, 51 Cal. 3d 120 (1990).

Greenberg v. Miami Children’s Hosp. Research Inst., Inc., 264 F. Supp. 2d 1064 (S.D. Fla. 2003).

Wash. Univ. v. Catalona, 437 F. Supp. 2d 985 (E.D. Mo. 2006), aff’d, 490 F.3d 667 (8th Cir. 2007).

Dinerstein , 2020 WL 5296920 at 1079.

Jessica L. Roberts, Genetic Conversion , Available at SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3357566 (accessed July 16, 2020).

Moore , 51 3d at 131 (‘Accordingly, we hold that a physician who is seeking a patient’s consent for a medical procedure must, in order to satisfy his fiduciary duty and to obtain the patient’s informed consent, disclose personal interests unrelated to the patient’s health, whether research or economic, that may affect his medical judgment.’).

Moore , 51 3d at 146.

Greenberg , 264 F. Supp. 2d.

The Supreme Court subsequently found that a patent on a genetic variant is invalid ``merely because it has been isolated'', see Association for Molecular Pathology v. Myriad Genetics, Inc., 569 U.S. 576 (2013).

Greenberg , 264 F. Supp. 2d at 1067.

Id. , at 1070.

Id. , at 1075.

Wash. Univ., 490 F.3d.

Id. at 988–89.

Id. at 993.

Id. at 995–97 (‘The two cases which provide the most guidance conclude that research participants retain no ownership of biological materials they contribute for medical research.’).

Id. at 1000

Dinerstein v. Google, LLC, No. 19 C 4311, 2020 WL 5296920, 1079–1124 (N.D. Ill. Sept. 4, 2020); see also I. Glenn Cohen & Michelle M. Mello, Big Data, Big Tech, and Protecting Patient Privacy , 322 JAMA 1141 (2019) (hereinafter Cohen & Mello, Big Data ).

Id. at 1079.

Dinerstein , 2020 WL 5296920 at 1099.

Id. at 1104.

Id. at 1112.

45 C.F.R. §§ 164.502(a)(5)(ii)(B)(2)(ii).

Dinerstein , 2020 WL 5296920 at 1082.

Id. at 1109.

Id. at 1119.

Parks v. Wells Fargo Home Mortg., Inc., 398 F.3d 937, 940–41 (7th Cir. 2005) (quoting Maere v. Churchill, 116 Ill. App. 3d 939, 944, 452 N.E.2d 694, 697 (3d Dist. 1983)).

Dinerstein , 2020 WL 5296920 at 1118.

Neil Richards & Woodrow Hartzog, The Pathologies of Digital Consent , 96 Wash. U. L. Rev. 1461, 1462 (2019).

45 C.F.R. § 46.101 (2019) (‘…this policy applies to all research involving human subjects conducted, supported, or otherwise subject to regulation by any Federal department or agency that takes appropriate administrative action to make the policy applicable to such research.’).

Helvering v. Davis, 301 U.S. 619, 645 (1937) (‘…when money is spent to promote the general welfare, the concept of welfare or the opposite is shaped by Congress…’); Institute of Medicine, Committee on Ethical Considerations for Revisions to DHHS Regulations for Protection of Prisoners Involved in Research (Gostin LO, Vanchieri C, Pope A, eds.) (2007).

Department of Homeland Security et al., Notice of Proposed Rulemaking: Federal Policy for the Protection of Human Subject . 80 Fed. Reg. 173, 53933–54061, 53989 (Sept. 8, 2015).

This is due to the fact that regulators agreed with the ‘slim majority’ of commenters opposing the change and ultimately agreed that such an extension would ‘benefit from further deliberation’. Department of Homeland Security et al., Federal Policy for the Protection of Human Subjects 82 Fed. Reg. 12, 7149–7274, 7155–56 (Jan. 19, 2017).

Id. at 7156 (‘We recognize that institutions may choose to establish an institutional policy that would require IRB review of research that is not funded by a Common Rule department or agency (and indeed, as commenters noted, almost all institutions already do this), and nothing in this final rule precludes institutions from providing protections to human subjects in this way.’), although the revisions also did away with the previous ‘Federal Wide Assurance’ mechanism under which institutions could contractually commit to the government that they would do so (‘We therefore plan to implement the proposed nonregulatory change to the assurance mechanism to eliminate the voluntary extension of the FWA to nonfederally funded research.’).

Eg 21 C.F.R. §312 (1987).

21 CFR § 50.1 (2019).

Presidential Commission for the Study of Bioethical Issues, Moral Science: Protecting Participants in Human Subjects Research (2011) https://bioethicsarchive.georgetown.edu/pcsbi/node/558.html (accessed Dec. 29, 2019).

Kayte Spector-Bagdady et al., Nonregulated Interventions, Clinical Trial Registration, and Editorial Responsibility , 12 Circ. Cardiovasc. Qual. Outcomes E005721 (2019) (‘Because of inherent limitations of the scope of enforcement by FDA, funders, and research institutions, a critical fourth wall of protection can also be journal standards. Expectations set by journal editors can influence major components of the international research industry.’).

General Data Protection Regulation 2016/679 (2018).

California Consumer Privacy Act, AB No. 375 (2018).

W. Nicholson Price II et al., Shadow Health Records Meet New Data Privacy Laws , 363 Science 448, 450 (2019) (‘the GDPR refers to exceptions for “scientific research”, the “public interest”, and “public health” without clearly defining these overlapping terms or addressing dual-use endeavors.’) (hereinafter Price et al., Shadow health records ).

Mabel Crescioni & Tara Sklar, The Research Exemption Carve Out: Understanding Research Participants Rights Under GDPR and U.S. Data Privacy Laws, 60 Jurimetrics 125, 128 (2020).

Id. at 128 (‘Exemptions included in the law allow sponsors to refuse the request for the data to be removed, but this exemption has yet to be interpreted or applied by the courts.’).

Dept. Health & Human Services, Business Associate Contracts , Jan. 25, 2013, https://www.hhs.gov/hipaa/for-professionals/covered-entities/sample-business-associate-agreement-provisions/index.html (accessed Dec. 25, 2019).

45 C.F.R. § 164.514 (b) (2016).

W. Nicholson Price II, Problematic Interactions between AI and Health Privacy , Utah L. Rev. (forthcoming) (hereinafter Price, Problematic interactions ).

Kayte Spector-Bagdady, Hospitals Should Act Now to Notify Patients About Research Use of Their Data and Biospecimens 26 Nat. Med. 306 (2020).

W. Nicholson Price II, Medical AI and Contextual Bias, 33 Harv. J.L. & Tech. 65, *** (2020) (hereinafter Price, Contextual bias ).

W. Nicholson Price II & I. Glenn Cohen, Privacy in the Age of Medical Big Data , 25 Nat. Med. 37, 39 (2019) (‘When Congress enacted HIPAA in 1996, it envisioned a regime in which most health data would be held in health records, and so it accordingly focused on health care providers and other covered entities. In the big-data world, the type of data sources covered by HIPAA are but a small part of a larger health data ecosystem.’).

Stacey A. Tovino, Assumed Compliance, 72 Ala. L. Rev. 279, 282 (2020) (‘The HIPAA Rules do not protect the privacy and security of health data collected, used, disclosed, or sold by many technology companies, online service providers, mobile health applications, and other entities and technologies that do not meet the definition of a covered entity or business associate.’).

Price, Problematic interactions , supra note 99.

I. Glenn Cohen & Michelle M. Mello, HIPAA and Protecting Health Information in the 21st Century , 320 JAMA 231, 232 (2018) (‘HIPAA does not cover health or health care data generated by noncovered entities or patient-generated information about health (eg social-media posts). It does not touch the huge volume of data that is not directly about health but permits inferences about health…. The amount of such data collected and trade online is increasing exponentially and eventually may support more accurate predictions about health than a person’s medical records.’).

45 CFR § 46.101(a) (2019).

45 CFR § 46.102(e) (2019).

Presidential Commission for the Study of Bioethical Issues, Privacy and Progress in Whole Genome Sequencing, at 83 (Oct. 2012) https://bioethicsarchive.georgetown.edu/pcsbi/sites/default/files/PrivacyProgress508_1.pdf (accessed Aug. 2, 2019)(‘Obtaining a whole genome sequence data file by itself yields information about, but does not definitively identify, a specific individual. The individual still has “practical obscurity”, as his or her identity is not readily ascertainable from the data. Practical obscurity means that simply because information is accessible, does not mean it is easily available or interpretable, and that those who want to find specific information must expend a lot of effort to do so…. In addition, even if we know that a whole genome sequence is from one individual, we cannot know which of the over 7 billion people on Earth that person is without a key linking the whole genome sequence information with a single person or their close relative. Therefore, while whole genome sequence data are uniquely identifiable, they are not currently readily identifiable.’) (hereinafter, PCSBI, Privacy & Progress ).

Luc Rocher et al., Estimating the Success of Re-Identifications in Incomplete Datasets Using Generative Models , 10 Nat. Commun. 3069 (2019).

Latanya Sweeny, Research accomplishments , http://latanyasweeney.org/work/identifiability.html (accessed Nov. 13, 2021).

Public Law 110–233; Genetic Information Nondiscrimination Act (2008).

Ellen Wright Clayton et al., The Law of Genetic Privacy: Applications, Implications, and Limitations , 6 J. Law Biosci. 11 (2019) (‘The Privacy Rule was never intended to be a comprehensive health privacy regulation, but it has assumed such a role by default because of Congress’s failure to enact more sweeping and rigorous health and genetic privacy laws and regulations.’).

45 CFR § 46.116 (2019).

Holly Fernandez Lynch, et al., Implementing Regulatory Broad Consent Under the Revised Common Rule: Clarifying Key Points and the Need for Evidence , 47 J. Law Med. Ethics. 213 (2019).

Kayte Spector-Bagdady & Elizabeth Pike, Consuming Genomics: Regulating Direct-to-Consumer Genetic and Genomic Information , 92 Neb. L. Rev. 677, 698 (2014).

Andrew Pollack, Walgreens Delays Selling Personal Genetic Test Kit , N.Y. Times, May 12, 2010, at B5.

Spector-Bagdady & Pike, supra note 117, at 705 (‘An FDA representative stated that the DTC distribution of genetic tests can increase the risk of a device because “a patient may make a decision that adversely affects [his or her] health, such as stopping or changing the dose of a medication or continuing an unhealthy lifestyle, without the intervention of a learned intermediary.”’)

Id. at 705–10.

Lisa Baertlein, Google-backed 23andMe Offers $999 DNA Test , USA Today, Nov. 20, 2007, http://usatoday30.usatoday.com/tech/webguide/internetlife/2007-11-20-23andme-launch_N.htm (accessed July 18, 2020).

23andMeBlog, 23andMe Takes First Step Toward FDA Clearance , July 30, 2012, http://blog.23andme.com/news/23andme-takes-first-step-toward-fda-clearance/ (accessed July 18, 2020).

Spector-Bagdady & Pike, supra note 117, at 705–10.

23andMe, About Us   https://mediacenter.23andme.com/company/about-us/ (accessed July 17, 2020).

Erin Brodwin & Katie Palmer, 5 burning questions on the business of big genetics based on 23andMe’s filing to go public , STAT+ (Feb. 5, 2021) https://www.statnews.com/2021/02/05/23andme-public-profit-genetics-data/ (accessed Feb. 13, 2021).

Casey Ross, Google, Mayo Clinic strike sweeping partnership on patient data, STAT News, Sept. 12, 2019 https://www.statnews.com/2019/09/10/google-mayo-clinic-partnership-patient-data/ (accessed July 18, 2020) (‘Mayo Clinic, one of medicine’s most prestigious brands, announced Tuesday that it has struck a sweeping partnership with Google to store patient data in the cloud and build products using artificial intelligence and other technologies to improve care.’).

Rebecca Robins, Contract offers unprecedented look at Google deal to obtain patient data from the University of California, Feb. 26, 2020 https://www.statnews.com/2020/02/26/patient-data-contract-google-university-of-california/ (accessed July 16, 2020).

Stanford Medicine, Stanford Medicine, Google team up to harness power of data science for health care (Aug. 8, 2016) https://med.stanford.edu/news/all-news/2016/08/stanford-medicine-google-team-up-to-harness-power-of-data-science.html (accessed Feb. 13, 2021) (‘Together, Stanford Medicine and Google will build cloud-based applications for exploring massive health-care data sets, a move that could transform patient care and medical research… “We are excited to support the creation of the Clinical Genomics Service by connecting our clinical care technologies with Google’s extraordinary capabilities for cloud data storage, analysis and interpretation, enabling Stanford to lead in the field of precision health”, said Pravene Nath, chief information officer…’).

Cohen & Mello, Big Data , supra note 70, at 1141 (‘Once an individual’s identity is ascertained, the company could then link EHR data with other types of information about that person (eg what they purchase). HIPAA bars none of this except the release of date stamps, and would not be implicated, for example, if Google identified individuals by linking EHR data without HIPAA identifiers to internet data of consumers who visited the University of Chicago hospital and searched online for information about particular medical conditions or if a social-media company linked such EHR data to a user’s posts about a hospital stay.’).

Id. at 1084.

Id. at 1095.

Google and UCSF, Data Evaluation License Agreement, Mar. 1, 2016 https://www.statnews.com/2020/02/26/patient-data-contract-google-university-of-california/ (accessed July 16, 2020).

Kayte Spector-Bagdady et al., Genetic Data Partnerships: Academic Publications with Privately Owned or Generated Genetic Data , 21 Genet Med. 2827, 2828 (2019).

Stanford Medicine, supra note 129.

Robins, supra note 128.

Brett M. Frischmann et al., Common Knowledge , 362 Science 1240 (2018).

Hess & Ostrom, supra note 170.

Catherine Offord, The Rising Research Profile of 23andMe , Nov. 30, 2017, https://www.the-scientist.com/news-analysis/the-rising-research-profile-of-23andme-30564 (accessed July 18, 2020) (‘But the value of such self-reported data sets is perceived more highly than it used to be, in part thanks to the success of 23andMe’s research contributions, notes Weinberg. “Historically, people were very skeptical you’d be able to collect data in this relatively simplistic way and still yield the results”, he says. “But I think they have proven again and again that you can do that. There is strength in numbers.”’).

23andMe, 23andMe for Scientists , https://research.23andme.com/ (accessed Dec. 29, 2019) (‘The 23andMe cohort is the largest re-contactable research database of genotypic and phenotypic information in the world. By inviting customers to participate in research, we have created a new research model that accelerates genetic discovery and offers the potential to more quickly garner new insights into treatments for disease.’).

But see NIH RePORT, Genetic data sharing partnerships: Enabling equitable access within academic/private data sharing agreements , PI: Kayte Spector-Bagdady (‘This research proposes to characterize and evaluate the factors influencing these genetic data partnerships (beginning with academics), compare market drivers to current existing governance structures, and offer a model for best practices.’).

Hardin, supra note 26.

John M. Conley et al., Myriad After Myriad: The Proprietary Data Dilemma , 15 N. C. J. Law Technol. 597 (2014).

Ezekiel J. Emanuel & Christine Grady, Case Study. Is Longer Always Better? 38 Hastings Cent. Rep. 10 (2008) (arguing that consent forms are ‘growing in length and complexity, becoming ever more intimidating, and perhaps inhibiting rather than enhancing participants’ understanding. Participants may not even read them, much less understand them.’).

Laura M. Beskow, Lessons from HeLa Cells: The Ethics and Policy of Biospecimens , 17 Annu. Rev. Genom. Hum. Genet. 409 (2016).

45 CFR § 46.166(b) (2019).

45 CFR § 46.166(a)(2) (2019) (‘An investigator shall seek informed consent only under circumstances that provide the prospective subject or the legally authorized representative sufficient opportunity to discuss and consider whether or not to participate and that minimize the possibility of coercion or undue influence.’).

Patrick Taylor, Personal Genomes: When Consent Gets in the Way , 456 Nature 32 (2008).

Beskow, supra note 145, at 408; Ellen W. Clayton, The Unbearable Requirement of Informed Consent , 19 Am. J. Bioeth. 19 (2019).

Meg Leta Jones & Margot E. Kaminski, An American’s Guide to the GDPR 98 Denver L. Rev. 1 (forthcoming) https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3620198 (accessed July 16, 2020).

Kayte Spector-Bagdady et al., ‘My research is their business, but I’m not their business’: Patient and Clinician Perspectives on Commercialization of Precision Oncology Data 25 The Oncologist 620 (2020) (‘Several patient- and clinician-participants did not understand that the consent form already permitted commercialization of patient genetic data and expressed concerns regarding who would profit from the data, how profits would be used, and privacy and access.’).

Emanuel & Grady, supra note 144.

Richards & Hartzog, supra note 82, at 1464 (‘Consent is undeniably powerful, and often very attractive. But we have relied upon it too much, and deployed it in ways and in contexts to do more harm than good, and in ways that have masked the effects of largely unchecked (and sometimes unconscionable) power.’).

Beskow, supra note 145, at 408.

Taylor, supra note 148, at 32.

Brian J. Zikmund-Fisher, Helping People Know Whether Measurements Have Good or Bad Implications: Increasing the Evaluability of Health and Science Data Communications , 6 Pol’y Insights from Behavioral & Brain Sciences 29, 31 (2019) (‘people tend to ignore single risk statistics in decision making…’).

Kayte Spector-Bagdady et al., Distinguishing Cell Line Research , 5 JAMA Oncol. 406, 409 (2019) (‘Given technological indeterminacy and individual contributor preference, however, identifiability should not be considered a binary concept. Guidance regarding deidentification under the Privacy Rule of [HIPAA] acknowledges that, even for technically deidentified data, some risk of reidentification will always remain. How much risk is acceptable is ultimately a policy question that requires ethical analysis.’).

Tom Tomlinson & Raymond G. De Vries, Human Biospecimens Come from People , 41 Ethics Hum. Res. 22, 23 (2019).

Beth Prusaczyk et al. Informed Consent to Research with Cognitively Impaired Adults: Transdisciplinary Challenges and Opportunities , 40 Clin. Gerontol. 1, 63–73 (2017).

K. Spector-Bagdady & J. Beever, Rethinking the Importance of the Individual within a Community of Data Hastings Cent Rep doi: 10.1002/hast.1112 (2020) (‘Given the low-individual-risk and high-community-benefit profile of many secondary research protocols, in which biospecimens generally play a large role, to subjugate (high) community benefit to (low) individual risk at this scale is inappropriate. We should instead be prioritizing assessment of risk versus benefits at the community level.’)

Rai, supra note 1, at 1651 (‘…rules governing interventional research, which involves a risk of physical harm, do not necessarily function well when exported to the realm of purely informational research.’).

Price & Cohen, supra note 103, at 40 (‘Especially for deontological concerns with health privacy, the loss of control over who accesses an individual’s data and for what purpose matters, even if there are no material consequences for the individual or if the individual does not even know.’).

Parks v. Wells Fargo Home Mortg., Inc., 398 F.3d 937, 940–41 (7th Cir. 2005) (quoting Maere v. Churchill, 116 Ill. App. 3d 939, 944, 452 N.E.2d 694, 697 (3d Dist. 1983)). Dinerstein , 2020WL 5296920.

Raymond G. De Vries, et al, The Moral Concerns of Biobank Donors: The Effect of Non-Welfare Interests on Willingness to Donate , 12 Life Sci. Soc. Policy 3 (2016).

Spector-Bagdady & Beever, supra note 160 (‘The idea of respect for the individual participant is historically contingent; it no longer exhausts the ways we design and conduct research. Individuals are still needed, but their greatest value is often in aggregate.’)

Geoffrey Rose, Strategy of Prevention: Lessons From Cardiovascular Disease , 282 Br. Med. J. (Clin. Res. Ed.) 1847, 1850 (1981) (‘We arrive at what we might call the prevention paradox—“a measure that brings large benefits to the community offers little to each participating individual.”’).

Evans, supra note 4, at 661 (‘Generally speaking, though, once data are converted into a common data model or other interoperable format, further uses of the converted data are non-rivalrous. Health data resources thus can support the simultaneous existence of multiple health data commons.’); Charlotte Hess & Elinor Ostrom, Introduction: An overview of the knowledge commons . In Understanding Knowledge as a Commons: From Theory to Practice, at 5 (Hess C, Ostrom E, eds. 2011) (‘Most types of knowledge have, on the other hand, typically been relatively nonsubtractive.’).

Hess & Ostrom, id .

PCSBI, Privacy & Progress, supra note 109, at 3 (‘Currently, the majority of the benefits anticipated from whole genome sequencing research will accrue to society, while associated risks fall to the individuals sharing their data.’)

Fundamental Finance, Negative Externality , http://economics.fundamentalfinance.com/negative-externality.php (accessed April 6, 2020).

Price et al., Shadow Health Records, supra note 94, at 450.

Spector-Bagdady, Encouraging Participation, supra note 8.

Jagsi et al., supra note 9.

Jeffrey Peppercorn et al., Patient Preferences for Use of Archived Biospecimens from

Oncology Trials When Adequacy of Informed Consent Is Unclear , 25 The Oncologist 78 (2019).

Timothy Caulfield et al., A Review of the Key Issues Associated With the Commercialization of Biobanks , 1 J. Law Biosci. 94 (2014).

Spector-Bagdady, Encouraging Participation , supra note 8.

Jessica L. Roberts, Negotiating Commercial Interests in Biospecimens , 45 J Law Med Ethics 138 (2017); Joshua D. Smith et al., Immortal Life of the Common Rule: Ethics, Consent, and the Future of Cancer Research , 35 J. Clin. Oncol. 1879 (2017).

NAM, Health Data Sharing to Support Better Outcomes: Building a Foundation of Stakeholder Trust (2020) https://nam.edu/health-data-sharing-special-publication/ (accessed Jan. 28, 2021).

Evans, supra note 4, at 684 (‘These federal regulations conceive individual protection as an exit right (informed consent/authorization) while granting people no real voice in setting the goals of informational research or the privacy and ethical protections they expect. Lacking a voice, people CRexit from research—that is, they exercise their Common Rule, or “CR”, informed consent rights to exit from informational research altogether. This CRexit strategy scarcely advances people’s interests, when surveys show that most Americans would like to see their data used to advance science.’).

Id. (‘Low participation in informational research should be viewed as what it is: a widespread popular rejection of the unsatisfactory, top-down protections afforded by existing regulations.’).

Richards & Hartzog, supra note 82, at 1467.

Michael A. Heller & Rebecca S. Eisenberg, Can patents deter innovation? The anticommons in biomedical research , 280 Science 698, 698 (1998) (‘Since Hardin’s article appeared, biomedical research has been moving from a commons model toward a privatization model... Today, upstream research in the biomedical sciences is increasingly likely to be “private” in one or more senses of the term—supported by private funds, carried out in a private institution, or privately appropriated through patents, trade secrecy, or agreements that restrict the use of materials and data.’); see also Craig Konnoth, Preemption through Privatization, Harvard L Rev, Vol 134 (2021).

Hardin, supra note 26, at 1244.

Mark Phillips & Bartha M. Knoppers, Whose Commons? Data Protection as a Legal Limit of Open Science , 47 J. Law Med. Ethics 106 (2019).

NIH, Strategic Plan for Data Science, 2018, https://datascience.nih.gov/sites/default/files/NIH_Strategic_Plan_for_Data_Science_Final_508.pdf (accessed July 18, 2020).

Ida Sim et al., Time for NIH to Lead on Data Sharing , 367 Science 1308 (2020).

National Research Council, supra note 3.

All of Us Research Program Investigators, The ‘All of Us’ Research Program , 381 N. Engl. J. Med. 668, 668 (2019).

All of Us Research Program, National Research Program Returns First Results to Local Participants (Jan. 21, 2021) https://allofus.nih.gov/news-events-and-media/news/national-research-program-returns-first-results-local-participants (accessed Feb. 13, 2021).

Id. at 669 (‘Among persons from whom biospecimens are obtained, the target percentage of persons in racial and ethnic minorities is more than 45% and that of persons in underrepresented populations is more than 75%.’)

Id. at 675.

23andMe, About Us , https://mediacenter.23andme.com/company/about-us/ (accessed Aug. 15, 2019).

23andMe, 23andMe therapeutics , https://therapeutics.23andme.com/ (last visited July 17, 2020).

Kari Paul, Fears over DNA privacy as 23andMe plans to go public in deal with Richard Branson, The Guardian (Feb. 9, 2021) https://www.theguardian.com/technology/2021/feb/09/23andme-dna-privacy-richard-branson-genetics (accessed March 16, 2021).

Kayte Spector-Bagdady, Reconceptualizing Consent for Direct-to-Consumer Health Services , 41 Am. J. Law Med. 568 (2015).

Richards & Hartzog, supra note 82, at 1461 (‘We argue that consent is most valid when we are asked to choose infrequently , when the potential harms that result from the consent are easy to imagine , and when we have the correct incentives to consent consciously and seriously . The further we fall from this gold standard, the more a particular consent is pathological and thus suspect.’).

Kirsten Ostherr et al., Trust and privacy in the context of user-generated health data , Big Data & Society 1 (2017) (‘Members of the general public expressed little concern about sharing health data with the companies that sold the devices or apps they used...’).

Brooke Auxier et al., Americans and Privacy: Concerned, Confused and Feeling Lack of Control Over Their Personal Information , Nov. 15, 2019, https://www.pewresearch.org/internet/2019/11/15/americans-and-privacy-concerned-confused-and-feeling-lack-of-control-over-their-personal-information/ (accessed Nov. 15, 2019).

Yannis Bakoset al., Does Anyone Read the Fine Print? Consumer Attention to Standard Form Contracts , 43 J. Legal Stud. 1 (2014).

Ostherr et al., supra note 200, at 6 (‘While users were generally aware that consenting to a company’s terms of use constitutes a legal contract, very few reported actually reading those agreements before consenting to them. One participant commented: “Do I ever read ‘terms of use’? Did I actually read the consent form I just signed? No. I just agree to everything like I do for all of my Apple updates. Agree. Agree. Done.”’).

Richards & Hartzog, supra note 82, at 1461.

GlaxoSmithKline, GSK and 23andMe sign agreement to leverage genetic insights for the development of novel medicines , July 25, 2018, https://www.gsk.com/en-gb/media/press-releases/gsk-and-23andme-sign-agreement-to-leverage-genetic-insights-for-the-development-of-novel-medicines/ (accessed July 25, 2018). (‘GSK and 23andMe today unveiled an exclusive four-year collaboration that will focus on research and development of innovative new medicines and potential cures, using human genetics as the basis for discovery…Additionally, GSK has made a $300 M equity investment in 23andMe.’); Megan Molteni, 23andMe’s pharma deals have been the plan all along , Wired, Aug. 3, 2018, https://www.wired.com/story/23andme-glaxosmithkline-pharma-deal/ (accessed August 3, 2018) (‘But some customers were still surprised and angry, unaware of what they had already signed (and spat) away.’).

Richards & Hartzog, supra note 82, at 1497.

Price et al., Shadow health records , supra note 94, at 448.

Thomas Brewster, FaceApp: Is the Russian Face-Aging App a Danger to your Privacy?, Forbes, Jul. 17, 2019, https://www.forbes.com/sites/thomasbrewster/2019/07/17/faceapp-is-the-russian-face-aging-app-a-danger-to-your-privacy/#2b6e80982755 (accessed Dec. 25, 2019).

Pike, supra note 22, at 710 (‘Finally, the reality is that people are imperfect decision makers, particularly with choices that involve immediate gratification and delayed, but uncertain, consequences.’); Richards & Hartzog, supra note 82, at 1497 (‘people would have little incentive to deliberate because, frankly, they have little notion of the stakes, and the benefits of consent are right at their fingertips.’).

Better Business Bureau, BBB Tip: Do not share your COVID-19 vaccine card on social media (Jan. 29, 2021) https://www.bbb.org/article/news-releases/23675-bbb-tip-dont-share-your-vaccine-card-on-social-media (accessed Nov. 13, 2021).

23andMe, 23andMe Research Innovations Collaborations Program   https://research.23andme.com/research-innovation-collaborations/ (accessed Aug. 22, 2019) (‘We accept applications from academic researchers on a rolling basis. In June and December, we hold a scientific review to evaluate proposals for the limited number of collaborative projects we can initiate. Applicants are informed of our committee decision approximately three months after the deadline.’).

Maria Elena Flacco et al., Head-to-Head Randomized Trials are Mostly Industry Sponsored and Almost Always Favor the Industry Sponsor , 68 J. Clin. Epidemiol. 811 (2015).

Eg C. Lee Ventola, The drug shortage crisis in the United States: Causes, Impact, and Management Strategies , 36 PT. 740 (2011); and Daniel Kozarich, Mylan’s EpiPen Pricing Crossed Ethical Boundaries , Fortune, Sept. 27, 2016, http://fortune.com/2016/09/27/mylan-epipen-heather-bresch/ (accessed Apr. 2, 2018).

Brodwin & Palmer, supra note 125.

John P. A. Ioannidis, The Challenge of Reforming Nutritional Epidemiologic Research , 320 JAMA 969 (2018).

Adam Marcus, Psychological Science in the news again: CNN retracts story on hormone-voting, Oct. 25, 2012, https://retractionwatch.com/2012/10/25/psychological-science-in-the-news-again-cnn-retracts-story-on-hormone-voting-link/ (accessed July 18, 2020).

NIH, Policy for Data Management, supra note 2.

Mandeep R. Mehra et al., Retraction: Cardiovascular Disease, Drug Therapy, and Mortality in Covid-19, N. Engl. J. Med. DOI:   10.1056/NEJMoa2007621 , 382 N. Engl. J. Med. 26, at 2582 (2020) (‘Because all the authors were not granted access to the raw data and the raw data could not be made available to a third-party auditor, we are unable to validate the primary data sources underlying our article…. We therefore request that the article be retracted.’)

Mandeep R. Mehra et al., RETRACTED: Hydroxychloroquine or Chloroquine With or Without a Macrolide for Treatment of COVID-19: A Multinational Registry Analysis , Lancet doi: 10.1016/S0140-6736(20)31180-6 (2020).

Rebecca S. Eisenberg, Proprietary Rights and the Norms of Science in Biotechnology Research , 97 Yale L.J. 177, 197 (1987) (‘But for research involving the use of unique biological materials, such as bacterial strains and other types of self-replicating cells, publication in writing alone may not be sufficient to satisfy this replicability norm. To replicate the authors’ results, subsequent investigators may need access to identical materials. By sharing access to unique materials, however, the publishing scientist not only enables other scientists to replicate her claims; she also allows them to compete with her more effectively in making new discoveries.’) (hereinafter, Eisenberg, Proprietary Rights ).

Heller & Eisenberg, supra note 185, at 698 (‘A resource is prone to overuse in a tragedy of the commons when too many owners each have a privilege to use a given resource and no one has a right to exclude another. By contrast, a resource is prone to underuse in a “tragedy of the anticommons” when multiple owners each have a right to exclude others from a scarce resource and no one has an effective privilege of use.’).

Eisenberg, Proprietary Rights, supra note 220, at 198 (‘Withholding materials is a relatively inconspicuous departure from scientific norms. It occurs after publication and is not apparent from the written text.’).

Id. (‘…publishing scientists with exclusive access to such materials have an opportunity to gain recognition while retaining a future advantage over their research competitors. This conflict between norms and incentives is aggravated when the materials (or the discoveries they facilitate) have potential commercial value.’).

Rebecca S. Eisenberg, Noncompliance, Nonenforcement, Nonproblem? Rethinking the Anticommons in Biomedical Research , 45 Hous. L. Rev. 1059, 1098–99 (2008) (hereinafter, Eisenberg, Noncompliance ).

Avi Selk, The ingenious and ‘dystopian’ DNA technique police used to hunt the ‘Golden State Killer’ suspect , NY Times, Apr. 28, 2018, https://www.washingtonpost.com/news/true-crime/wp/2018/04/27/golden-state-killer-dna-website-gedmatch-was-used-to-identify-joseph-deangelo-as-suspect-police-say/ (accessed Dec. 25, 2019).

Cassie Martin, Why a warrant to search GEDmatch’s genetic data has sparked privacy concerns , ScienceNews, Nov. 12, 2019, https://www.sciencenews.org/article/why-warrant-search-gedmatch-genetic-data-has-sparked-privacy-concerns (accessed Dec. 25, 2019).

Megan Molteni, A DNA firm that caters to police just bought a genealogy site , Wired, Dec. 9, 2019, https://www.wired.com/story/a-dna-firm-that-caters-to-police-just-bought-a-genealogy-site/ (accessed Dec. 25, 2019).

De Vries, supra note 166.

Caulfield, supra note 178.

Cambon-Thomsen, supra note 17; Ruth Chadwick & Kare Berg, Solidarity and Equity: New Ethical Frameworks for Genetic Databases , 2 Nat. Rev. Genet. 318 (2001); Robert Cook-Deegan et al., Introduction: Sharing Data in a Medical Information Commons , 47 J. Law Med. Ethics 7 (2019); Patricia A. Deverka et al., Hopeful and Concerned: Public Input on Building a Trustworthy Medical Information Commons , 47 J. Law Med. Ethics 70 (2019); Evans, supra note 4; Frischmann et al., supra note 137; Hess & Ostrom, supra note 170; Sharona Hoffman, Electronic Health Records and Medical Big Data: Law and Policy (2016); Pike, supra note 22; Rai, supra note 1; Richards & Hartzog, supra note 82; Mark D. Wilkinson et al., The FAIR Guiding Principles for scientific data management and stewardship , 15 Sci. Data. 160018 (2016).

Amy L. McGuire, supra note 24, at 15 (2019) (‘We acknowledge that weighing all relevant considerations, opt-out or no consent may be appropriate in contexts such as public health surveillance, use of de-identified information from a single source, like an electronic health record or newborn blood spot collection, or for research where an inclusive dataset is genuinely necessary in order to produce unbiased results, subject to careful oversight and accountability mechanisms.’).

Juli M. Bollinger et al., What is a Medical Information Commons? 47 J. L. Med. & Ethics. 41, 46 (2019). (‘There was little enthusiasm for MICs to operate under a public health model where individual participation is mandatory or there is an opt-out consent option rather than opt-in. Interviewees commented that an opt-out model would require a high level of trust that is unlikely to be found in the U.S. population given past incidents of research misconduct (and concerns about discrimination in healthcare and other domains), especially among minority/underrepresented populations.’).

Stephen M. Maurer, Self-Governance in Science, at 180 (2017) (‘conventional government has tried and failed, sometimes repeatedly, to address particular policy problems.’).

Id. , at 239 (‘Nearly all scholars agree that firms are more likely to organize self-governance when downstream markets are “highly branded” so that demand depends less on objective attributes than “marketing” so that demand depends less on objective attributes than “marketing” and a “constructed brand identity” which makes them vulnerable to “shifts in consumer preferences.”’).

Id. , at 179 (‘US officials have long recognized that private organizations often possess more information and can produce better standards.’).

Id. , at 179, 181.

Id. , at 4 (‘But [the virtuous circle of self-regulation] can also work in reverse, so that each new defector who leaves a standard devalues compliance for those who remain and tempts them to leave as well. The case is very different where the standard’s benefits are externalities, ie flow indiscriminately to every member of the community. The most important example is where backlash damages firms that had nothing to do with the scandal.’).

Future of Privacy Forum, Privacy Best Practices for Consumer Genetic Testing Services (July 31, 2018), https://fpf.org/wp-content/uploads/2018/07/Privacy-Best-Practices-for-Consumer-Genetic-Testing-Services-FINAL.pdf (‘The Best Practices provide a policy framework for the collection, retention, sharing, and use of Genetic Data generated by consumer genetic and personal genomic testing services.’).

GlaxoSmithKline, supra note 205.

Future of Privacy Forum, supra note 240 (‘The Best Practices provide a policy framework for the collection, retention, sharing, and use of Genetic Data generated by consumer genetic and personal genomic testing services.’).

Wendy E. Parmet, Employers’ Vaccine Mandates Are Representative of America’s Failed Approach to Public Health , Atlantic (Feb. 4, 2021) https://www.theatlantic.com/ideas/archive/2021/02/privatization-public-health/617918/ (accessed Feb. 12, 2021) (‘Indeed, the private sector is often seen as nimbler than the government precisely because it can eschew the necessity of public input and the threat of public accountability.’).

Maurer, supra note 234, at 156.

NAM, supra note 181.

PhRMA, Code for Interactions with Healthcare Professionals (2017).

Office of Inspector General. Compliance Program Guidance for Pharmaceutical Manufacturers (2003), https://oig.hhs.gov/fraud/docs/complianceguidance/042803pharmacymfgnonfr.pdf (accessed Apr.2, 2018).

Pfizer Inc., Corporate Integrity Agreement Annual Report (2008).

Kayte Spector-Bagdady, Improving Commercial Genetic Data Sharing Policy , in Consumer Genetic Technologies (I. Glenn Cohen, Nita A. Farahany, Henry T. Greely, Carmel Shachar eds., Cambridge Univ. Press, forthcoming).

Maurer, supra note 234, at 188.

Spector-Bagdady, Genetic data partnerships, supra note 134.

Dinerstein , 2020 WL 5296920 at 1109 (‘Whatever a perpetual license for “Trained Models and Predictions” actually means, it appears to qualify as direct or indirect remuneration.’).

Price, Contextual Bias , supra note 101.

Dinerstein , 2020 WL 5296920 at 1109.

Eisenberg, Noncompliance , supra note 224, at 1086 (‘As long as it is cheaper for the owner to share the resource than it is for the user to recreate it, there are potential gains from exchange that stand to be dissipated through transaction costs or lost through bargaining breakdowns.’).

Eg Ross, supra note 127; Daisuke Wakabayashi, Google and the University of Chicago Are Sued Over Data Sharing , NY Times, June 26, 2019 https://www.nytimes.com/2019/06/26/technology/google-university-chicago-data-sharing-lawsuit.html (accessed July 18, 2020).

Lawrence Lessig, ‘Institutional corruption’ defined , 41 J. Law Med. Ethics 553 (2013).

Marks, supra note 28, at 108.

Id. at 114.

Id. at 153.

Deirdre Fernandes, Northeastern University starts vaccinating its front-line staff , The Boston Globe (Jan. 6, 2021) https://www.bostonglobe.com/2021/01/06/metro/northeastern-university-starts-vaccinating-its-front-line-staff/ (accessed Feb. 14, 2021).

Noah Higgins-Dunn, States will need billions to distribute the Covid vaccine as federal funding falls short , CNBC (Oct. 27, 2020) https://www.cnbc.com/2020/10/27/states-will-need-billions-to-distribute-the-covid-vaccine-as-federal-funding-falls-short.html (accessed Feb. 14, 2021).

Apoorva Mandavilli, At Elite Medical Centers, Even Workers Who Do not Qualify Are Vaccinated , NYTimes (Jan. 10, 2021) https://www.nytimes.com/2021/01/10/health/coronavirus-hospitals-vaccinations.html (accessed Feb. 14, 2021).

Wendy E. Parmet, Employers’ Vaccine Mandates Are Representative of America’s Failed Approach to Public Health, Atlantic (Feb. 4, 2021) https://www.theatlantic.com/ideas/archive/2021/02/privatization-public-health/617918/ (accessed Feb. 12, 2021).

Marks, supra note 28, at 30.

Ian Larkin et al., Association Between Academic Medical Center Pharmaceutical Detailing Policies and Physician Prescribing , 317 JAMA 1785 (2017).

Kayte Spector-Bagdady et al., Sharing Health Data and Biospecimens with Industry: A Principle-Driven, Practical Approach, 382 N. Engl. J. Med. 2072 (2020).

Michelle M. Mello et al., Waiting for Data: Barriers to Executing Data Use Agreements , 367 Science 150 (2020).

Kayte Spector-Bagdady & Paul A. Lombardo, From in vivo to in vitro: How the Guatemala STD Experiments Transformed Bodies into Biospecimens , 96 Milbank Q. 244 (2018).

Stephen G. Post, The Echo of Nuremberg: Nazi Data and Ethics , 17 J. Med. Ethics 42 (1991).

Retraction Watch, Journals retract more than a dozen studies from China that may have used executed prisoners’ organs , Aug. 14, 2019, https://retractionwatch.com/2019/08/14/journals-retract-more-than-a-dozen-studies-from-china-that-may-have-used-executed-prisoners-organs/ (accessed July 18, 2020).

Nicholas Eriksson et al., WebBased, Participant Driven Studies Yield Novel Genetic Associations for Common Traits , 6 PLoS Genet. E1000993 (2010).

Greg Gibson & Gregory P. Copenhaver, Consent and Internet-Enabled Human Genomics , 6 PLoS Genet. E1000965 (2010).

23andMe, 23andMe improves research consent , Jun. 24, 2010, https://blog.23andme.com/23andme-research/23andme-improves-research-consent-process/ (accessed Mar. 31, 2019).

Kayte Spector-Bagdady, ‘The Google of Healthcare’: Enabling the Privatization of Genetic Bio/Databanking , 26 Ann. Epidemiol. 515, 517 (2016).

As Beecher argued as far back as 1966: ‘All so-called codes are based on the bland assumption that meaningful or informed consent is readily available for the asking…this is very often not the case. Consent in any fully informed sense may not be obtainable. Nevertheless, except, possibly, in the most trivial situations, it remains a goal toward which one must strive for sociologic, ethical and clear-cut legal reasons. There is no choice in the matter.’ Beecher, supra note 33, at 1355.

Author notes


A comprehensive guide to secondary market research

Secondary market research is a valuable tool to drive your company's success. Let's explore everything you need to know about it.

To gain a competitive edge, businesses need to understand their target market, consumer behavior, and industry trends. This is where market research comes into play. While primary market research involves collecting data directly from the target audience, secondary market research offers a wealth of existing data and information that can provide valuable insights. 

Read more: A Guide to Primary Market Research: What Is It, and Why Is It Important?

Secondary market research serves as an existing foundation of knowledge, supporting decision-making processes, enhancing understanding, and identifying opportunities. In this article, we explore the importance of secondary market research and its benefits in informing business strategies and driving success.

What is secondary market research?

Secondary market research refers to the process of collecting and analyzing existing data and information that has been previously gathered by someone else for a different purpose. It involves utilizing sources such as published reports, government publications, industry studies, academic journals, online databases, and other publicly available data.

Unlike primary market research, which involves collecting data directly from the target audience through surveys, interviews, or observations, secondary research relies on existing data sources such as market reports, industry statistics, customer surveys, and competitor analysis, among others.

Secondary market research is conducted to gain insights into market trends, industry dynamics, consumer behavior, competitive landscapes, and other factors that influence business decisions. It helps businesses and organizations understand the market and industry they operate in, identify opportunities and gaps, assess market potential, evaluate competition, and inform decision-making.

Why is secondary market research important?

Secondary market research plays a crucial role in informing your decision-making processes. It provides valuable insights into market trends, industry dynamics, consumer behavior, and competitive landscapes. Here are some key reasons why you should consider conducting secondary market research:

Informing your decision-making

Secondary market research provides you with valuable information that supports your decision-making processes. It helps you gather insights into market trends, industry dynamics, consumer behavior, and competitive landscapes. This information is essential for making informed strategic decisions, identifying opportunities, mitigating risks, and staying competitive in the market. For example, market research reports on the emerging trends in sustainable fashion can help you make eco-friendly sourcing decisions.

Read more: Using Data-Driven Decision Making to Decode Consumer Behavior: Here’s How to Do It

Understanding the market and industry

Secondary research helps you gain a comprehensive understanding of the market and industry landscape. It provides data and analysis on market size, growth rates, market segmentation, customer demographics, buying patterns, and emerging trends. This knowledge is valuable for identifying target markets, evaluating market potential, and developing effective marketing and business strategies. For example, a report on the market segmentation and customer preferences within the beauty and cosmetics industry can help you refine your product offerings.

Identifying opportunities and gaps

By examining existing data, secondary research enables you to identify market gaps, unmet needs, and untapped opportunities. It helps uncover areas where demand is not adequately addressed, revealing potential niches or underserved segments that you can target for growth and innovation. For example, a report on the growing demand for plant-based protein alternatives may prompt you to develop and launch a line of plant-based meat products.

Conducting preliminary research

Secondary research is often the first step in the research process. It allows you to gather preliminary information, explore different angles of your research topic, and develop hypotheses or research questions. It provides a foundation for further investigation and can guide the design of your primary research studies. For example, online forums and discussion boards related to parenting can provide valuable insights for a baby product manufacturer like yourself looking to develop new products that address parents' needs and concerns.

Supporting your primary research

Secondary research complements your primary research efforts by providing context, background information, and supporting evidence. It helps you gain insights into previous studies, theories, and findings related to your research topic. This knowledge can inform the design of your primary research studies, guide the development of your research instruments, and validate or challenge existing theories. For example, publicly available financial reports of competitors provide valuable insights into their market positioning, pricing strategies, and financial performance, aiding you in developing competitive strategies.

Pros of secondary market research

Cost-effectiveness

Conducting primary research can be expensive, while secondary research utilizes existing data sources that are more accessible and cost-effective. Businesses can leverage pre-existing data without the need for extensive data collection efforts, significantly reducing research costs.

Time efficiency

Primary research can be time-consuming, whereas secondary research offers a time-efficient alternative by utilizing readily available data. Researchers can access a wide range of information quickly, saving time and allowing for faster decision-making.

Wide range of data sources

Secondary research draws on diverse data sources, including government reports, industry publications, market research studies, academic journals, and online databases. These sources provide a wealth of information on market trends, consumer behavior, competitor analysis, and industry insights, enhancing the comprehensiveness and reliability of the research findings.

Historical perspective

Secondary research often includes historical data and trends, enabling businesses to analyze past patterns and make informed decisions based on historical insights. This longitudinal view provides a valuable perspective on market dynamics, industry shifts, and consumer behavior over time.
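
To make this concrete, here is a minimal Python sketch of the kind of trend analysis historical secondary data supports. The yearly market-size figures are invented purely for illustration.

```python
# Hypothetical market-size figures (in $B) compiled from historical reports.
market_size = {2019: 1.8, 2020: 2.1, 2021: 2.6, 2022: 3.0}

# Compute year-over-year growth to surface the longitudinal trend.
years = sorted(market_size)
for prev, curr in zip(years, years[1:]):
    growth = market_size[curr] / market_size[prev] - 1
    print(f"{prev} -> {curr}: {growth:+.1%} year-over-year")
```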

Broader scope and breadth of data

Secondary research allows businesses to access a wide range of data that may not be feasible to collect through primary research alone. It provides a broader scope of information, covering various industries, markets, and geographic regions, enabling comprehensive insights into market trends, consumer preferences, and competitive landscapes.

Cons of secondary market research

Data relevance and quality

The quality and relevance of secondary data can vary depending on the sources and specific research needs. It's important to critically evaluate the credibility and accuracy of the data to ensure its reliability.

Lack of customization

Secondary data is collected for general purposes and may not address specific research objectives or unique requirements. It may lack specific variables or insights crucial for a particular study, limiting the depth of analysis.

Limited control over data collection

Researchers have no control over the data collection process in secondary research. The data may not align perfectly with the research objectives or may not cover all aspects needed for a comprehensive analysis.

Outdated or incomplete information

Secondary data may become outdated or incomplete, especially when relying on older reports or sources that are no longer regularly updated. This can impact the accuracy and relevance of the findings.

Potential bias

Secondary data can carry inherent biases or limitations based on the methods and objectives of the original researchers. It's important to consider the context and potential biases associated with the data sources to avoid misleading or skewed interpretations.

Ways to do secondary market research

Published sources

This includes books, newspapers, magazines, trade publications, academic journals, and reports from reputable sources. These sources provide a wealth of information on various industries, markets, trends, and consumer behavior.

Government sources

Government agencies often collect and publish data on demographics, economic indicators, market trends, regulations, and industry statistics. Examples include the U.S. Census Bureau, Bureau of Labor Statistics, and Department of Commerce.
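
As an illustration, the short Python sketch below pulls state-level population figures from the U.S. Census Bureau's public API. The dataset path and variable code are examples rather than a definitive recipe (B01003_001E is the ACS total-population estimate); check the current Census API documentation, and note that heavy use may require a free API key.

```python
# A minimal sketch of retrieving secondary data from a public government API.
# Dataset path and variable code are illustrative; see api.census.gov for
# current offerings. B01003_001E = ACS total-population estimate.
import requests

URL = "https://api.census.gov/data/2021/acs/acs5"
params = {
    "get": "NAME,B01003_001E",  # state name + total population
    "for": "state:*",           # one row per U.S. state
}

response = requests.get(URL, params=params, timeout=30)
response.raise_for_status()

header, *rows = response.json()  # the first row is the column header
for name, population, state_code in rows[:5]:
    print(f"{name}: {int(population):,} residents")
```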

Market research reports

Market research firms and organizations generate reports that provide in-depth analysis and insights on specific industries, markets, and consumer trends. These reports often include market size, growth rates, competitive analysis, and future projections.

Industry associations and trade organizations

Industry associations and trade organizations publish reports, surveys, and studies related to specific sectors. They often provide valuable industry-specific data, market trends, and best practices.

Academic research

Academic research articles and papers can offer valuable insights into specific topics or industries. They are typically published in academic journals and provide rigorous analysis and findings based on research conducted by scholars and experts.

Online databases

Online databases such as market research databases, industry-specific portals, and data repositories provide access to a wide range of information. Examples include Statista, Euromonitor International, and Factiva.

Company websites and annual reports

Company websites and annual reports provide information about the company's performance, financials, products, and market positioning. They can offer insights into the company's strategies, market share, and competitive landscape.

Social media and online communities

Monitoring social media platforms, online forums, and communities can provide valuable insights into consumer opinions, preferences, and trends. It allows businesses to understand customer sentiment, identify emerging issues, and gather feedback.
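
As a toy illustration, the Python sketch below counts recurring terms across community posts to surface candidate themes. The posts and stopword list are fabricated for the example; real monitoring would work from exported forum, review, or social data.

```python
from collections import Counter
import re

# Fabricated sample posts; in practice, load exported forum or review text.
posts = [
    "Love the new formula, but shipping was slow",
    "Shipping took two weeks. Product itself is great",
    "Great value, would buy again despite slow shipping",
]

# A tiny ad hoc stopword list for the example.
STOPWORDS = {"the", "but", "was", "is", "a", "new", "took",
             "two", "itself", "would", "again", "despite"}

term_counts = Counter(
    word
    for post in posts
    for word in re.findall(r"[a-z']+", post.lower())
    if word not in STOPWORDS
)

# Frequent terms (here, "shipping") flag themes worth deeper primary research.
print(term_counts.most_common(3))
```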

Patent databases

Patent databases provide information on inventions, innovations, and technological developments. They can help businesses understand the competitive landscape, identify new technologies, and track industry trends.

Data aggregators

Data aggregators collect and compile data from various sources, such as government databases, surveys, and market research reports. They provide consolidated datasets that can be used for analysis and insights.
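
To illustrate the consolidation step, here is a minimal pandas sketch that joins two hypothetical secondary sources reporting by region; all figures are made up for the example.

```python
import pandas as pd

# Two hypothetical secondary sources keyed by the same region field.
gov_stats = pd.DataFrame(
    {"region": ["North", "South"], "households": [120_000, 95_000]}
)
industry_report = pd.DataFrame(
    {"region": ["North", "South"], "category_spend_usd": [54.0, 61.5]}
)

# Consolidate into one dataset and derive an estimated market size.
combined = gov_stats.merge(industry_report, on="region")
combined["market_size_usd"] = combined["households"] * combined["category_spend_usd"]
print(combined)
```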

Why should you conduct secondary market research along with primary market research?

Comprehensive understanding

Conducting both secondary and primary market research allows you to gain a comprehensive understanding of your research topic. Secondary research provides existing knowledge, theories, and findings related to your subject matter, while primary research allows you to gather specific insights directly from your target audience. By combining both approaches, you can develop a well-rounded understanding of the market, industry trends, and consumer behavior.

Identification of research gaps

Secondary research helps identify gaps in existing knowledge or areas that require further exploration. By reviewing previous studies, reports, and industry publications, you can identify areas where primary research can contribute new insights or validate existing findings. This integration of secondary and primary research ensures that your research addresses important gaps in the current understanding of the topic.

Research design and instrument development

Secondary research provides valuable insights into research design and instrument development. By examining previous studies and methodologies, you can refine your research objectives, select appropriate data collection methods, and design effective research instruments. Secondary research can guide the creation of questionnaires, surveys, interview protocols, or experimental designs for your primary research studies.

Validation and triangulation

Conducting secondary market research alongside primary research allows for data validation and triangulation. You can compare your primary research findings with existing secondary data to assess the consistency and reliability of your results. This validation process adds credibility to your research findings and strengthens the overall research outcomes. By triangulating data from different sources, you can ensure the reliability and accuracy of your research findings.
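
The comparison itself can be very simple. Below is a hedged sketch of checking a primary-survey estimate against a published secondary benchmark; both numbers and the 10% tolerance are arbitrary examples.

```python
# Hypothetical figures: share of respondents preferring product A.
primary_estimate = 0.42     # from your own survey
secondary_benchmark = 0.39  # from a published industry report

# Flag divergence beyond an (arbitrary) 10% relative tolerance.
relative_gap = abs(primary_estimate - secondary_benchmark) / secondary_benchmark
if relative_gap <= 0.10:
    print(f"Estimates agree within 10% (gap: {relative_gap:.1%})")
else:
    print(f"Estimates diverge (gap: {relative_gap:.1%}); revisit methodology")
```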

In-depth analysis and interpretation

Integrating secondary and primary research enables you to conduct in-depth analysis and interpretation of your findings. Secondary research provides a broader context and background information, while primary research offers specific insights from your target audience. By combining both types of data, you can gain a deeper understanding of the market dynamics, industry trends, and consumer preferences, and interpret your findings more comprehensively.

Enhanced credibility and robustness

By conducting both secondary and primary research, you enhance the credibility and robustness of your research. The integration of multiple data sources demonstrates a comprehensive approach to data collection and analysis, which increases the confidence in your research findings. This is particularly important when presenting your research to stakeholders or making strategic business decisions based on the research outcomes.

Richer insights and actionable recommendations

Combining secondary and primary market research allows you to generate richer insights and develop actionable recommendations. Secondary research provides a broader perspective on the market and industry, while primary research captures specific insights directly from your target audience. By integrating these two sources, you can develop a more nuanced understanding of consumer behavior, market trends, and competitive landscapes. This enables you to make informed decisions and develop actionable recommendations that are grounded in both existing knowledge and real-world insights.

In conclusion

By leveraging secondary research, businesses can make informed decisions, identify opportunities, and develop effective strategies. Furthermore, secondary research complements primary research efforts, offering context, supporting evidence, and guiding research design. Its cost-effectiveness, time efficiency, and historical perspective make it an invaluable asset for businesses seeking a comprehensive understanding of their market.



Chapter 5: Secondary Research

Learning Objectives

By the end of this chapter, students should be able to:

  • Explain the concept of secondary research
  • Highlight the key benefits and limitations of secondary research
  • Evaluate different sources of secondary data

What is Secondary Research?

In situations where the researcher has not been involved in the data gathering process (primary research), one may have to rely on existing information and data to arrive at specific research conclusions or outcomes. Secondary research, also known as desk research, is a research method that involves the use of information previously collected for another research purpose.

In this chapter, we are going to explain what secondary research is, how it works, and share some examples of it in practice.


Sources of secondary data

The two main sources of secondary data are:

  • Internal sources
  • External sources

Internal sources of secondary data exist within the organization. There could be reports, previous research findings, or old documents which may still be used to understand a particular phenomenon. This information may only be available to the organization’s members and could be a valuable asset.

External sources of secondary data lie outside the organization and refer to information held at the public library, government departments, council offices, various associations as well as in newspapers or journal articles.

Benefits of using Secondary Data

It is only logical for researchers to search thoroughly for secondary information before investing their time and resources in collecting primary data. In academic research, scholars are not permitted to move to the next stage until they demonstrate they have undertaken a review of all previous studies. Suppose a researcher would like to examine the characteristics of a migrant population in the Western Sydney region. The following pieces of information are already available in various reports generated from the Australian Bureau of Statistics' census data:

  • Birthplace of residents
  • Language spoken at home by residents
  • Family size
  • Income levels
  • Level of education

By accessing such readily available secondary data, the researcher is able to save time, money, and effort. When the data comes from a reputable source, it also adds to the researcher's credibility, as it demonstrates the ability to identify a trustworthy source of information.

Evaluation of Secondary Data

Assessing secondary data is important, as it may not always be available free of cost. The following factors must be considered, as they relate to the reliability and validity of research results [1], such as whether:

  • the source is trusted
  • the sample characteristics, time of collection, and response rate (if relevant) of the data are appropriate
  • the methods of data collection are appropriate and acceptable in your discipline
  • the data were collected in a consistent way
  • any data coding or modification is appropriate and sufficient
  • the documentation of the original study in which the data were collected is detailed enough for you to assess its quality
  • there is enough information in the metadata or data to properly cite the original source.

In addition to the above-mentioned points, some practical issues also need to be evaluated, including the cost of accessing the data and the time frame involved in getting access to it.

A secondary source takes the accounts of multiple eyewitnesses or primary sources and creates a record that considers an event from different points of view. Secondary sources provide objectivity (multiple points of view mitigate bias and provide a broader perspective) and context (historical distance helps explain an event's significance). Common examples include books, scholarly articles, and documentaries, among many other formats.

The infographic Secondary Sources, created by Shonn M. Haren (2015), is licensed under a Creative Commons Attribution 4.0 International Licence. [2]

Table 2: Differences between primary and secondary research.

  • Primary: first-hand research to collect data, which may require a lot of time. Secondary: the researcher collects existing, published data, which requires less time.
  • Primary: creates raw data that the researcher owns. Secondary: the researcher has no control over the data method or ownership.
  • Primary: relevant to the goals of the research. Secondary: may not be relevant to the goals of the research.
  • Primary: the researcher conducts the research, which may be subject to researcher bias. Secondary: the researcher only uses the findings of the research.
  • Primary: can be expensive to carry out. Secondary: more affordable due to access to free data (sometimes!).
  • Griffith University n.d., Research data: get started, viewed 28 February 2022, <https://libraryguides.griffith.edu.au/finddata>.
  • Shonnmaren n.d., Secondary sources, Wikimedia Commons, viewed 28 February 2020, <https://commons.wikimedia.org/wiki/File:Secondary_Sources.png>.
  • Qualtrics XM n.d., Secondary research: definition, methods and examples, viewed 28 February 2022, <https://www.qualtrics.com/au/experience-management/research/secondary-research/#:~:text=Unlike%20primary%20research%2C%20secondary%20research,secondary%20research%20have%20their%20places>.

About the author


Name: Aila Khan

Institution: Western Sydney University

Chapter 5 Secondary Research Copyright © by Aila Khan is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Share This Book


Understanding the value of secondary research data (June 28, 2023)


“Reduce, reuse, recycle” isn’t just a good motto for preserving the environment, it’s also a smart scientific principle, thanks to the value of secondary research.

Secondary research uses existing data or specimens initially collected for purposes other than the planned (or primary) research. For example, the same specimens originally collected for a clinical trial could also be used in secondary genomic research. Secondary research maximizes the usefulness of data and unique specimens while minimizing risk to study volunteers since no new procedures are needed.

Through previous blogs, NIA provided updates and tips on the NIH Data Management and Sharing (DMS) Policy. That same policy also emphasizes the importance of sharing data gleaned from secondary research. It requires investigators, including those conducting secondary research, to describe the type of scientific data they plan to generate, and encourages good data sharing practices when performing secondary research. NIA is actively supporting secondary research through our recent Notice of Special Interest on the topic.

Advantages and challenges

Secondary research has several benefits:

  • Enables use of large-scale data sets or large samples of human or model organism specimens
  • Can be less expensive and time-consuming than primary data collection
  • May be simpler (and expedited) if an Institutional Review Board waives the need for informed consent for a secondary research project

Potential downsides to consider might include:

  • Original data may not be a perfect fit for your current research question or study design
  • Details on previous data collection procedures may be scarce
  • Data may potentially lack depth
  • Often requires special techniques for statistical data analysis

Know the rules of the road

As you consider secondary research, be sure to get familiar with related regulations and rules. There may be requirements to access and use secondary data or specimens as stipulated by NIH-supported scientific data repositories or other sources of information. Generally, data repositories with controlled access, such as the NIA Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) or the Database of Genotypes and Phenotypes, require investigators to sign a Data Use Certification Agreement (PDF, 775K) to ensure protection of sensitive data.

Additional potential requirements can include:

  • IRB approval to meet human subject protections (per regulation 45 CFR 46 )
  • NIH Institutional Certification (for large-scale genomic data generation)
  • Data Distribution Agreement (for NIAGADS) (PDF, 673K)
  • Attestation of Alzheimer’s Disease Genomics Sharing Plan (for Alzheimer’s and related dementias genomic research)
  • Cloud Use Statement and Cloud Server Provider Information (as applicable)
  • Possible participant consent

Reach out with questions!

With these guidelines in mind, secondary research can be quite valuable to your studies. If you have questions, please refer to the FAQs About Secondary Research or leave a comment below. For specific questions related to preparing a DMS plan for the generation of secondary data for your research, contact your NIA Program Officer .



Secondary Sources & Legal Research


What are they?

Government publications encompass a variety of materials released by government agencies and bodies.

Government publications include reports, guidance documents, administrative decisions, and other materials published by government agencies and legislative bodies. For instance, Congressional Research Service (CRS) reports provide in-depth, non-partisan analysis on a broad range of policy and legal topics.

Importance in Legal Research

Government publications can be helpful in legal research as they provide authoritative and detailed information on legal and policy matters. They often contain original research, empirical data, and thorough analysis that can illuminate the legislative intent behind laws, clarify regulatory requirements, and offer insights into current policy debates.

How to find and use them?

Government publications are typically available through government websites and databases. For instance, CRS reports can be accessed directly through the CRS website or resources like the University of North Texas Libraries' CRS Report Archive. Utilizing these sources involves identifying the relevant agency or body, then searching their publications for information related to your research topic. As with all secondary sources, verify the information with primary sources where possible.



How To Do Secondary Research or a Literature Review


What is Secondary Research?

Secondary research, also known as a literature review, preliminary research, historical research, background research, desk research, or library research, is research that analyzes or describes prior research. Rather than generating and analyzing new data, secondary research analyzes existing research results to establish the boundaries of knowledge on a topic, to identify trends or new practices, to test mathematical models or train machine learning systems, or to verify facts and figures. Secondary research is also used to justify the need for primary research as well as to justify and support other activities. For example, secondary research may be used to support a proposal to modernize a manufacturing plant, to justify the use of a newly developed treatment for cancer, to strengthen a business proposal, or to validate points made in a speech.

Why Is Secondary Research Important?

Because secondary research is used for so many purposes in so many settings, all professionals will be required to perform it at some point in their careers. For managers and entrepreneurs, regardless of industry or profession, secondary research is a regular part of working life, although parts of the research, such as finding the supporting documents, are often delegated to junior staff in the organization. For all these reasons, it is essential to learn how to conduct secondary research, even if you are unlikely to ever conduct primary research.

Secondary research is also essential if your main goal is primary research. Research funding is obtained only by using secondary research to show the need for the primary research you want to conduct. In fact, primary research depends on secondary research to prove that it is indeed new and original research and not just a rehash or replication of somebody else’s work.



Secondary Data – Types, Methods and Examples

Definition:

Secondary data refers to information that has been collected, processed, and published by someone else, rather than the researcher gathering the data firsthand. This can include data from sources such as government publications, academic journals, market research reports, and other existing datasets.

Secondary Data Types

Types of secondary data are as follows:

  • Published data: Published data refers to data that has been published in books, magazines, newspapers, and other print media. Examples include statistical reports, market research reports, and scholarly articles.
  • Government data: Government data refers to data collected by government agencies and departments. This can include data on demographics, economic trends, crime rates, and health statistics.
  • Commercial data: Commercial data is data collected by businesses for their own purposes. This can include sales data, customer feedback, and market research data.
  • Academic data: Academic data refers to data collected by researchers for academic purposes. This can include data from experiments, surveys, and observational studies.
  • Online data: Online data refers to data that is available on the internet. This can include social media posts, website analytics, and online customer reviews.
  • Organizational data: Organizational data is data collected by businesses or organizations for their own purposes. This can include data on employee performance, financial records, and customer satisfaction.
  • Historical data: Historical data refers to data that was collected in the past and is still available for research purposes. This can include census data, historical documents, and archival records.
  • International data: International data refers to data collected from other countries for research purposes. This can include data on international trade, health statistics, and demographic trends.
  • Public data: Public data refers to data that is available to the general public. This can include data from government agencies, non-profit organizations, and other sources.
  • Private data: Private data refers to data that is not available to the general public. This can include confidential business data, personal medical records, and financial data.
  • Big data: Big data refers to large, complex datasets that are difficult to manage and analyze using traditional data processing methods. This can include social media data, sensor data, and other types of data generated by digital devices.

Secondary Data Collection Methods

Secondary Data Collection Methods are as follows:

  • Published sources: Researchers can gather secondary data from published sources such as books, journals, reports, and newspapers. These sources often provide comprehensive information on a variety of topics.
  • Online sources: With the growth of the internet, researchers can now access a vast amount of secondary data online. This includes websites, databases, and online archives.
  • Government sources: Government agencies often collect and publish a wide range of secondary data on topics such as demographics, crime rates, and health statistics. Researchers can obtain this data through government websites, publications, or data portals.
  • Commercial sources: Businesses often collect and analyze data for marketing research or customer profiling. Researchers can obtain this data through commercial data providers or by purchasing market research reports.
  • Academic sources: Researchers can also obtain secondary data from academic sources such as published research studies, academic journals, and dissertations.
  • Personal contacts: Researchers can also obtain secondary data from personal contacts, such as experts in a particular field or individuals with specialized knowledge.

Secondary Data Formats

Secondary data can come in various formats depending on the source from which it is obtained. Here are some common formats of secondary data:

  • Numeric Data: Numeric data is often in the form of statistics and numerical figures that have been compiled and reported by organizations such as government agencies, research institutions, and commercial enterprises. This can include data such as population figures, GDP, sales figures, and market share.
  • Textual Data: Textual data is often in the form of written documents, such as reports, articles, and books. This can include qualitative data such as descriptions, opinions, and narratives.
  • Audiovisual Data: Audiovisual data is often in the form of recordings, videos, and photographs. This can include data such as interviews, focus group discussions, and other types of qualitative data.
  • Geospatial Data: Geospatial data is often in the form of maps, satellite images, and geographic information systems (GIS) data. This can include data such as demographic information, land use patterns, and transportation networks.
  • Transactional Data: Transactional data is often in the form of digital records of financial and business transactions. This can include data such as purchase histories, customer behavior, and financial transactions.
  • Social Media Data: Social media data is often in the form of user-generated content from social media platforms such as Facebook, Twitter, and Instagram. This can include data such as user demographics, content trends, and sentiment analysis.

Secondary Data Analysis Methods

Secondary data analysis involves the use of pre-existing data for research purposes. Here are some common methods of secondary data analysis:

  • Descriptive Analysis: This method involves describing the characteristics of a dataset, such as the mean, standard deviation, and range of the data. Descriptive analysis can be used to summarize data and provide an overview of trends (a minimal code sketch follows this list).
  • Inferential Analysis: This method involves making inferences and drawing conclusions about a population based on a sample of data. Inferential analysis can be used to test hypotheses and determine the statistical significance of relationships between variables.
  • Content Analysis: This method involves analyzing textual or visual data to identify patterns and themes. Content analysis can be used to study the content of documents, media coverage, and social media posts.
  • Time-Series Analysis: This method involves analyzing data over time to identify trends and patterns. Time-series analysis can be used to study economic trends, climate change, and other phenomena that change over time.
  • Spatial Analysis: This method involves analyzing data in relation to geographic location. Spatial analysis can be used to study patterns of disease spread, land use patterns, and the effects of environmental factors on health outcomes.
  • Meta-Analysis: This method involves combining data from multiple studies to draw conclusions about a particular phenomenon. Meta-analysis can be used to synthesize the results of previous research and provide a more comprehensive understanding of a particular topic.
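To make the first of these methods concrete, here is a minimal sketch of a descriptive analysis in Python with pandas. The file name "household_income.csv" and the "income" column are hypothetical placeholders for whatever published secondary dataset you have obtained.

```python
# Minimal descriptive analysis of a secondary dataset with pandas.
# "household_income.csv" and the "income" column are hypothetical
# placeholders for an actual published dataset.
import pandas as pd

df = pd.read_csv("household_income.csv")

# Summary statistics: count, mean, standard deviation, min/max, quartiles.
print(df["income"].describe())

# Range of the data, as mentioned above.
print("Range:", df["income"].max() - df["income"].min())
```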

Secondary Data Gathering Guide

Here are some steps to follow when gathering secondary data:

  • Define your research question: Start by defining your research question and identifying the specific information you need to answer it. This will help you identify the type of secondary data you need and where to find it.
  • Identify relevant sources: Identify potential sources of secondary data, including published sources, online databases, government sources, and commercial data providers. Consider the reliability and validity of each source.
  • Evaluate the quality of the data: Evaluate the quality and reliability of the data you plan to use. Consider the data collection methods, sample size, and potential biases. Make sure the data is relevant to your research question and is suitable for the type of analysis you plan to conduct.
  • Collect the data: Collect the relevant data from the identified sources. Use a consistent method to record and organize the data to make analysis easier.
  • Validate the data: Validate the data to ensure that it is accurate and reliable. Check for inconsistencies, missing data, and errors. Address any issues before analyzing the data (see the sketch after this list).
  • Analyze the data: Analyze the data using appropriate statistical and analytical methods. Use descriptive and inferential statistics to summarize and draw conclusions from the data.
  • Interpret the results: Interpret the results of your analysis and draw conclusions based on the data. Make sure your conclusions are supported by the data and are relevant to your research question.
  • Communicate the findings: Communicate your findings clearly and concisely. Use appropriate visual aids such as graphs and charts to help explain your results.
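As a minimal illustration of the validation step above, the sketch below runs some basic quality checks with pandas. The file name "survey_data.csv", the "age" column, and the plausible-range bounds are assumptions for illustration only.

```python
# Basic validation checks on a secondary dataset before analysis.
# "survey_data.csv" and the "age" column are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("survey_data.csv")

# Missing data: count of empty cells per column.
print(df.isna().sum())

# Duplicate rows that could inflate counts.
print("Duplicate rows:", df.duplicated().sum())

# Simple consistency check: ages should fall in a plausible range.
bad_ages = df[(df["age"] < 0) | (df["age"] > 120)]
print("Implausible ages:", len(bad_ages))
```

Checks like these are especially important with secondary data, since you had no control over how the data were originally collected or coded.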

Examples of Secondary Data

Here are some examples of secondary data from different fields:

  • Healthcare: Hospital records, medical journals, clinical trial data, and disease registries are examples of secondary data sources in healthcare. These sources can provide researchers with information on patient demographics, disease prevalence, and treatment outcomes.
  • Marketing: Market research reports, customer surveys, and sales data are examples of secondary data sources in marketing. These sources can provide marketers with information on consumer preferences, market trends, and competitor activity.
  • Education: Student test scores, graduation rates, and enrollment statistics are examples of secondary data sources in education. These sources can provide researchers with information on student achievement, teacher effectiveness, and educational disparities.
  • Finance: Stock market data, financial statements, and credit reports are examples of secondary data sources in finance. These sources can provide investors with information on market trends, company performance, and creditworthiness.
  • Social Science: Government statistics, census data, and survey data are examples of secondary data sources in social science. These sources can provide researchers with information on population demographics, social trends, and political attitudes.
  • Environmental Science: Climate data, remote sensing data, and ecological monitoring data are examples of secondary data sources in environmental science. These sources can provide researchers with information on weather patterns, land use, and biodiversity.

Purpose of Secondary Data

The purpose of secondary data is to provide researchers with information that has already been collected by others for other purposes. Secondary data can be used to support research questions, test hypotheses, and answer research objectives. Some of the key purposes of secondary data are:

  • To gain a better understanding of the research topic: Secondary data can be used to provide context and background information on a research topic. This can help researchers understand the historical and social context of their research and gain insights into relevant variables and relationships.
  • To save time and resources: Collecting new primary data can be time-consuming and expensive. Using existing secondary data sources can save researchers time and resources by providing access to pre-existing data that has already been collected and organized.
  • To provide comparative data: Secondary data can be used to compare and contrast findings across different studies or datasets. This can help researchers identify trends, patterns, and relationships that may not have been apparent from individual studies.
  • To support triangulation: Triangulation is the process of using multiple sources of data to confirm or refute research findings. Secondary data can be used to support triangulation by providing additional sources of data to support or refute primary research findings.
  • To supplement primary data: Secondary data can be used to supplement primary data by providing additional information or insights that were not captured by the primary research. This can help researchers gain a more complete understanding of the research topic and draw more robust conclusions.

When to use Secondary Data

Secondary data can be useful in a variety of research contexts, and there are several situations in which it may be appropriate to use secondary data. Some common situations in which secondary data may be used include:

  • When primary data collection is not feasible: Collecting primary data can be time-consuming and expensive, and in some cases, it may not be feasible to collect primary data. In these situations, secondary data can provide valuable insights and information.
  • When exploring a new research area: Secondary data can be a useful starting point for researchers who are exploring a new research area. Secondary data can provide context and background information on a research topic, and can help researchers identify key variables and relationships to explore further.
  • When comparing and contrasting research findings: Secondary data can be used to compare and contrast findings across different studies or datasets. This can help researchers identify trends, patterns, and relationships that may not have been apparent from individual studies.
  • When triangulating research findings: Triangulation is the process of using multiple sources of data to confirm or refute research findings. Secondary data can be used to support triangulation by providing additional sources of data to support or refute primary research findings.
  • When validating research findings: Secondary data can be used to validate primary research findings by providing additional sources of data that support or refute the primary findings.

Characteristics of Secondary Data

Secondary data have several characteristics that distinguish them from primary data. Here are some of the key characteristics of secondary data:

  • Non-reactive: Secondary data are non-reactive, meaning that they are not collected for the specific purpose of the research study. This means that the researcher has no control over the data collection process, and cannot influence how the data were collected.
  • Time-saving: Secondary data are pre-existing, meaning that they have already been collected and organized by someone else. This can save the researcher time and resources, as they do not need to collect the data themselves.
  • Wide-ranging: Secondary data sources can provide a wide range of information on a variety of topics. This can be useful for researchers who are exploring a new research area or seeking to compare and contrast research findings.
  • Less expensive: Secondary data are generally less expensive than primary data, as they do not require the researcher to incur the costs associated with data collection.
  • Potential for bias: Secondary data may be subject to biases that were present in the original data collection process. For example, data may have been collected using a biased sampling method or the data may be incomplete or inaccurate.
  • Lack of control: The researcher has no control over the data collection process and cannot ensure that the data were collected using appropriate methods or measures.
  • Requires careful evaluation: Secondary data sources must be evaluated carefully to ensure that they are appropriate for the research question and analysis. This includes assessing the quality, reliability, and validity of the data sources.

Advantages of Secondary Data

There are several advantages to using secondary data in research, including:

  • Time-saving: Collecting primary data can be time-consuming and expensive. Secondary data can be accessed quickly and easily, which can save researchers time and resources.
  • Cost-effective: Secondary data are generally less expensive than primary data, as they do not require the researcher to incur the costs associated with data collection.
  • Large sample size: Secondary data sources often have larger sample sizes than primary data sources, which can increase the statistical power of the research.
  • Access to historical data: Secondary data sources can provide access to historical data, which can be useful for researchers who are studying trends over time.
  • No ethical concerns: Secondary data are already in existence, so there are no ethical concerns related to collecting data from human subjects.
  • May be more objective: Secondary data may be more objective than primary data, as the data were not collected for the specific purpose of the research study.

Limitations of Secondary Data

While there are many advantages to using secondary data in research, there are also some limitations that should be considered. Some of the main limitations of secondary data include:

  • Lack of control over data quality: Researchers do not have control over the data collection process, which means they cannot ensure the accuracy or completeness of the data.
  • Limited availability: Secondary data may not be available for the specific research question or study design.
  • Lack of information on sampling and data collection methods: Researchers may not have access to information on the sampling and data collection methods used to gather the secondary data. This can make it difficult to evaluate the quality of the data.
  • Data may not be up-to-date: Secondary data may not be up-to-date or relevant to the current research question.
  • Data may be incomplete or inaccurate: Secondary data may be incomplete or inaccurate due to missing or incorrect data points, data entry errors, or other factors.
  • Biases in data collection: The data may have been collected using biased sampling or data collection methods, which can limit the validity of the data.
  • Lack of control over variables: Researchers have limited control over the variables that were measured in the original data collection process, which can limit the ability to draw conclusions about causality.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer


Research Methods: Secondary Research

What is Secondary Research?

Secondary research uses research and data that has already been carried out. It is sometimes referred to as desk research. It is a good starting point for any type of research as it enables you to analyse what research has already been undertaken and identify any gaps. 

You may only need to carry out secondary research for your assessment or you may need to use secondary research as a starting point, before undertaking your own primary research .

Searching for both primary and secondary sources can help to ensure that you are up to date with what research has already been carried out in your area of interest and to identify the key researchers in the field.

"Secondary sources are the books, articles, papers and similar materials written or produced by others that help you to form your background understanding of the subject. You would use these to find out about experts’ findings, analyses or perspectives on the issue and decide whether to draw upon these explicitly in your research." (Cottrell, 2014, p. 123).

Examples of secondary research sources include:

  • journal articles
  • official statistics, such as government reports or organisations which have collected and published data

Primary research  involves gathering data which has not been collected before. Methods to collect it can include interviews, focus groups, controlled trials and case studies. Secondary research often comments on and analyses this primary research.

Gopalakrishnan and Ganeshkumar (2013, p. 10) explain the difference between primary and secondary research:

"Primary research is collecting data directly from patients or population, while secondary research is the analysis of data already collected through primary research. A review is an article that summarizes a number of primary studies and may draw conclusions on the topic of interest which can be traditional (unsystematic) or systematic".

Secondary Data

As secondary data has already been collected by someone else for their research purposes, it may not cover all of the areas of interest for your research topic. This research will need to be analysed alongside other research sources and data in the same subject area in order to confirm, dispute or discuss the findings in a wider context.

"Secondary source data, as the name infers, provides second-hand information. The data come ‘pre-packaged’, their form and content reflecting the fact that they have been produced by someone other than the researcher and will not have been produced specifically for the purpose of the research project. The data, none the less, will have some relevance for the research in terms of the information they contain, and the task for the researcher is to extract that information and re-use it in the context of his/her own research project." (Denscombe, 2021, p. 268)

Dr. Benedict Wheeler (Senior Research Fellow at the European Center for Environment and Human Health at the University of Exeter Medical School) has discussed secondary data analysis: secondary data was used for his research on how the environment affects health and well-being, and utilising this secondary data gave access to a larger data set.

As with all research, an important part of the process is to critically evaluate any sources you use. There are tools to help with this in the  Being Critical  section of the guide.

Louise Corti, from the UK Data Archive, also discusses using secondary data. She stresses the importance of evaluating secondary research: this is to ensure the data is appropriate for your research and to investigate how the data was collected.

There are advantages and disadvantages to secondary research:

Advantages:

  • Usually low cost
  • Easily accessible
  • Provides background information to clarify / refine research areas
  • Increases breadth of knowledge
  • Shows different examples of research methods
  • Can highlight gaps in the research and potentially outline areas of difficulty
  • Can incorporate a wide range of data
  • Allows you to identify opposing views and supporting arguments for your research topic
  • Highlights the key researchers and work which is being undertaken within the subject area
  • Helps to put your research topic into perspective

Disadvantages:

  • Can be out of date
  • Might be unreliable if it is not clear where or how the research has been collected - remember to think critically
  • May not be applicable to your specific research question as the aims will have had a different focus

Literature reviews 

Secondary research for your major project may take the form of a literature review. This is where you will outline the main research which has already been written on your topic. This might include theories and concepts connected with your topic, and it should also look to see if there are any gaps in the research. As the criteria and guidance will differ for each School, it is important that you check the guidance which you have been given for your assessment. This may be in Blackboard, and you can also check with your supervisor.

Academics have offered some insights regarding the importance of literature reviews. Malcolm Williams, Professor and Director of the Cardiff School of Social Sciences, discusses how to build upon previous research by conducting a thorough literature review, while Professor Geoff Payne discusses research design and how the literature review can help determine what research methods to use, as well as help to further plan your project.

Secondary research which goes beyond literature reviews

For some dissertations/major projects there might only be a literature review (discussed above). For others there could be a literature review followed by primary research, and for others the literature review might be followed by further secondary research.

You may be asked to write a literature review which will form a background chapter to give context to your project and provide the necessary history for the research topic. However, you may then also be expected to produce the rest of your project using additional secondary research methods, which will need to produce results and findings which are distinct from the background chapter to avoid repetition.

Remember, as the criteria and guidance will differ for each School, it is important that you check the guidance which you have been given for your assessment. This may be in Blackboard and you can also check with your supervisor.

Although this type of secondary research will go beyond a literature review, it will still rely on research which has already been undertaken. And,  "just as in primary research, secondary research designs can be either quantitative, qualitative, or a mixture of both strategies of inquiry" (Manu and Akotia, 2021, p. 4).

Your secondary research may use the literature review to focus on a specific theme, which is then discussed further in the main project. Or it may use an alternative approach. Some examples are included below.  Remember to speak with your supervisor if you are struggling to define these areas.

Some approaches to conducting secondary research include:

Systematic review

  • A systematic review is a structured literature review that involves identifying all of the relevant primary research using a rigorous search strategy to answer a focused research question.
  • This involves comprehensive searching which is used to identify themes or concepts across a number of relevant studies.
  • The review will assess the quality of the research and provide a summary and synthesis of all relevant available research on the topic.
  • The systematic review LibGuide goes into more detail about this process (the guide is aimed at PhD/researcher students; however, students at other levels of study may find parts of the guide helpful too).

Scoping review

  • Scoping reviews aim to identify and assess available research on a specific topic (which can include ongoing research).
  • They are "particularly useful when a body of literature has not yet been comprehensively reviewed, or exhibits a complex or heterogeneous nature not amenable to a more precise systematic review of the evidence. While scoping reviews may be conducted to determine the value and probable scope of a full systematic review, they may also be undertaken as exercises in and of themselves to summarize and disseminate research findings, to identify research gaps, and to make recommendations for the future research." (Peters et al., 2015)

State-of-the-art review

  • This is designed to summarise the current knowledge and provide priorities for future research.
  • "A state-of-the-art review will often highlight new ideas or gaps in research with no official quality assessment." (MacAdden, 2020)

Bibliometric analysis

  • "Bibliometric analysis is a popular and rigorous method for exploring and analyzing large volumes of scientific data." (Donthu et al., 2021)
  • Quantitative methods and statistics are used to analyse the bibliographic data of published literature. This can be used to measure the impact of authors, publications, or topics within a subject area.
  • The bibliometric analysis often uses the data from a citation source such as Scopus or Web of Science.

Meta-analysis

  • This is a technique used to combine the statistical results of prior quantitative studies in order to increase precision and validity (a minimal sketch follows this list).
  • "It goes beyond the parameters of a literature review, which assesses existing literature, to actually perform calculations based on the results collated, thereby coming up with new results." (Curtis and Curtis, 2011, p. 220)

Grounded theory

  • Grounded Theory is used to create explanatory theory from data which has been collected.
  • "Grounded theory data analysis strategies can be used with different types of data, including secondary data." (Whiteside, Mills and McCalman, 2012)

Theoretical approach

  • This allows you to use a specific theory or theories which can then be applied to your chosen topic/research area.

Case study

  • You could focus on one case study which is analysed in depth, or you could examine more than one in order to compare and contrast the important aspects of your research question.
  • "Good case studies often begin with a predicament that is poorly comprehended and is inadequately explained or traditionally rationalised by numerous conflicting accounts. Therefore, the aim is to comprehend an existent problem and to use the acquired understandings to develop new theoretical outlooks or explanations." (Papachroni and Lochrie, 2015, p. 81)

(Adapted from Grant and Booth, 2009, cited in Sarhan and Manu, 2021, p. 72)

Main stages of secondary research for a dissertation/major project

In general, the main stages for conducting secondary research for your dissertation or major project will include:

  • Carrying out an initial search before you dedicate too much time to your research, to make sure there is adequate published research available in that area.
  • Choosing your approach (such as one of the types of review outlined above); you will need to justify which choice you make.
  • Searching the key databases for your subject area; your subject guide can help you identify these.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License .



Integrated Primary & Secondary Research

Types of Secondary Research Data


Secondary sources allow you to broaden your research by providing background information, analyses, and unique perspectives on various elements for a specific campaign. Bibliographies of these sources can lead to the discovery of further resources to enhance research for organizations.

There are two common types of secondary data: Internal data and External data. Internal data is the information that has been stored or organized by the organization itself. External data is the data organized or collected by someone else.

Internal Secondary Sources

Internal secondary sources include databases containing reports from individuals or prior research. This is often an overlooked resource—it’s amazing how much useful information collects dust on an organization’s shelves! Other individuals may have conducted research of their own or bought secondary research that could be useful to the task at hand. This prior research would still be considered secondary even if it were performed internally because it was conducted for a different purpose.

External Secondary Sources

A wide range of information can be obtained from secondary research. Reliable databases for secondary sources include Government Sources, Business Source Complete, ABI, IBISWorld, Statista, and CBCA Complete. This data is generated by others, but it can be useful when conducting research into a new scope of study. It also means less work for a not-for-profit organization, as it would not have to create its own data and can instead piggyback off the data of others.

Examples of Secondary Sources

Government Sources

A lot of secondary data is available from the government, often for free, because it has already been paid for by tax dollars. Government sources of data include the Census Bureau, the Bureau of Labor Statistics, and the National Center for Health Statistics.

For example, through the Census Bureau, the Bureau of Labor Statistics regularly surveys individuals to gain information about them (Bls.gov, n.d). These surveys are conducted quarterly, through an interview survey and a diary survey, and they provide data on expenditures, income, and household information (families or single). Detailed tables of the Expenditures Reports include the age of the reference person, how long they have lived in their place of residence and which geographic region they live in.

Syndicated Sources

A syndicated survey is a large-scale instrument that collects information about a wide variety of people’s attitudes and capital expenditures. The Simmons Market Research Bureau conducts a National Consumer Survey by randomly selecting families throughout the country that agree to report in great detail what they eat, read, watch, drive, and so on. They also provide data about their media preferences.

Other Types of Sources

Gallup, which has a rich tradition as the world’s leading public opinion pollster, also provides in-depth reports based on its proprietary probability-based techniques (called the Gallup Panel), in which respondents are recruited through a random digit dial method so that results are more reliably generalizable. The Gallup organization operates one of the largest telephone research data-collection systems in the world, conducting more than twenty million interviews over the last five years and averaging ten thousand completed interviews per day across two hundred individual survey research questionnaires (GallupPanel, n.d).

Attribution

This page contains materials taken from:

Bls.gov. (n.d.). U.S. Bureau of Labor Statistics. Retrieved from https://www.bls.gov/

Define Quantitative and Qualitative Evidence. (2020). Retrieved July 23, 2020, from http://sgba-resource.ca/en/process/module-8-evidence/define-quantitative-and-qualitative-evidence/

GallupPanel. (n.d). Gallup Panel Research. Retrieved from http://www.galluppanel.com

Secondary Data. (2020). Retrieved July 23, 2020, from https://2012books.lardbucket.org/books/advertising-campaigns-start-to-finish/s08-03-secondary-data.html

An Open Guide to Integrated Marketing Communications (IMC) Copyright © by Andrea Niosi and KPU Marketing 4201 Class of Summer 2020 is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Share This Book


Market research can be overwhelming, but the good news is you don't have to start from scratch. Using secondary market research methods can save a lot of time and effort.  Explore various methods and sources of secondary market research, and discover how to conduct secondary market research like a pro. Weigh the advantages and disadvantages, and learn from real-life secondary market research examples. 

Secondary Market Research


Secondary Market Research Definition

Secondary market research is when a company uses existing information from other sources, like reports, articles, or surveys, to learn about its market and customers. It's like using someone else's research to help you make decisions for your business, without having to collect new data yourself.

Secondary market research, also known as desk research, is the process of gathering, analyzing, and interpreting data already collected by others, including industry reports, government publications, academic research, and competitor analysis, to inform business decisions.

Secondary market research is one of the two types of market research that businesses can employ to get necessary information/data about a target market. Market research refers to the process of studying a target market and determining the market value or viability of a new product or service in said market.

Imagine a world where everyone loves a specific type of fruit, but no one knows which is the most popular. A juice company wants to create a new drink that will be a hit with customers. Instead of conducting their own survey, they find a recent study on fruit preferences conducted by a well-known research firm. The study reveals that strawberry is the most popular fruit, so the juice company decides to launch a new strawberry-flavored drink based on the findings from the secondary market research.

While most secondary data is inexpensive to obtain, companies may have to pay a secondary data provider to access specialized data.

Nielsen and Gartner are two major global secondary data suppliers. Nielsen is a US-based company offering marketing and media information analytics for clients in over 100 countries. Here, researchers can retrieve plenty of data on customer behavior - where they buy their products, what they watch or listen to daily, etc. Gartner is a subscription-based service that provides access to on-demand published research content produced by over 2,300 research experts worldwide. 1

Secondary Market Research Methods

Secondary market research involves gathering data from existing sources to gain insights into a market, industry, or consumer behavior. The methods for secondary market research include:

Analyzing industry reports

Industry reports provide comprehensive information on a specific industry, including market size, growth trends, competitive landscape, and future prospects. For example, a company planning to enter the electric vehicle market can review industry reports to understand the current market situation, demand patterns, and major players.

Analyzing government publications

Government agencies publish a wide range of data and statistics on various industries, demographics, and economic indicators. For example, a startup looking to open a retail store in a new location can access census data to evaluate the area's population, income levels, and age distribution.

Reviewing academic research

Universities and research institutions often conduct studies and publish research papers on various topics, including consumer behavior, market trends, and emerging technologies. For instance, a pharmaceutical company can review academic research on the effectiveness of different drug delivery methods to inform their product development.

Reviewing trade publications

Trade publications are industry-specific magazines, journals, or newsletters that provide insights, news, and analysis relevant to a particular sector. A company in the solar energy sector can follow trade publications to stay informed about the latest technological advancements, regulations, and market trends.

Performing competitor analysis

Analyzing competitors' websites, press releases, marketing materials, and financial reports can provide valuable information about their strategies, product offerings, and performance. For example, a company looking to launch a new software-as-a-service platform can review competitors' websites to understand their pricing structures, features, and target audience.

Monitoring media coverage

News articles, blog posts, and social media can provide insights into public sentiment, emerging trends, and relevant events. A food industry company can monitor media coverage to identify popular diets or consumer preferences, helping them adjust their product offerings accordingly.

By using these secondary market research methods, businesses can gather valuable data to inform their decision-making and strategy development without the need for time-consuming and costly primary research.

Secondary Market Research Sources

The method of gathering secondary data is relatively straightforward. You simply go to reliable sources and search for secondary data relevant to your needs.

  • Public sources: These are freely available resources such as government publications, websites, public libraries, and statistics bureaus. Examples include the U.S. Census Bureau, Eurostat, or the World Bank.
  • Commercial sources: These are paid sources of information, such as industry reports, market research databases, and subscription-based publications. Examples include Statista, IBISWorld, or Gartner.
  • Educational institutions: Academic research papers, dissertations, and theses can be valuable sources of secondary market research. University libraries, online academic databases, and research repositories like JSTOR, Google Scholar, or SSRN provide access to such resources.
  • Trade associations and organizations: Industry-specific associations and organizations often publish relevant data, reports, and newsletters that can be useful for secondary market research. Examples include the National Retail Federation, the Consumer Technology Association, or the American Marketing Association.
  • News and media outlets: Newspapers, magazines, news websites, and television networks provide information about current events, trends, and public sentiment. Examples include The New York Times, Forbes, or CNBC.
  • Social media platforms: Social media platforms like Facebook, Twitter, LinkedIn, or Instagram can be valuable sources for gathering insights on public opinion, consumer preferences, and emerging trends.

These sources of secondary market research can be accessed and utilized through the methods mentioned earlier, such as analyzing industry reports, reviewing academic research, or monitoring media coverage.

Like primary data, secondary data can be qualitative or quantitative. Qualitative data is descriptive and answers the question "why" or "how", whereas quantitative data is numerical and tells us "how many", "how much", and "how often". Qualitative data is acquired via interviews or observations, whereas quantitative data mainly comes from statistics or sales reports. Both types of data are essential to secondary research and help researchers analyze different angles of the research problem. 3

How to Conduct Secondary Market Research

The process of conducting secondary market research consists of six steps:

1. Define the research needs

Before doing any research, the researchers need to clarify the research objectives. Why are we conducting research? How can it contribute to the marketing effort?

2. Choose research sources

Once the research's purpose is identified, researchers need to choose the data sources. The sources can be internal, external, or both. One thing to note is that sources can vary widely in quality and credibility. For example, business magazines and published journals have a high-credibility score, whereas a random article or blog post on the Internet might be biased and unreliable.

How to evaluate research sources

To determine the quality of secondary data sources, the researcher needs to consider the following:

  • Research purpose
  • The audience
  • Authority and credibility (author's qualifications, publisher)
  • Accuracy and reliability (citations and data collection methods)
  • Currency (date of publication)
  • Bias (opinions or facts) 4

3. Collect secondary data

The next step is to select data based on the company's needs. The quickest way to source high-quality information is to look at online commercial databases or to buy from secondary data agencies. However, search engines and libraries are also a big help when the company is on a tight budget.

4. Combine acquired data

The information collected should be grouped in the same categories or format. This grouping approach simplifies the secondary data analysis and cuts out content that does not contribute to research.
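To make this step concrete, here is a minimal sketch in Python, with made-up data and column names, of how two secondary sources with mismatched formats might be mapped onto one shared schema before analysis:

```python
# Minimal sketch of step 4, with made-up data: two hypothetical secondary
# sources describe the same market using different column names, so each
# is mapped onto one shared schema before the rows are stacked together.
import pandas as pd

# Hypothetical extract from a government statistics portal
census = pd.DataFrame({"region": ["North", "South"],
                       "median_income_usd": [52000, 48000]})

# Hypothetical extract from a paid industry report
industry = pd.DataFrame({"area": ["East", "West"],
                         "avg_income": [51000, 55000]})

# Rename columns to a shared schema, then combine into one table
shared = pd.concat(
    [census.rename(columns={"median_income_usd": "income_usd"}),
     industry.rename(columns={"area": "region", "avg_income": "income_usd"})],
    ignore_index=True,
)
shared["source"] = ["census", "census", "industry", "industry"]
print(shared)
```

Keeping a "source" column also makes it easy to cut out rows from any source that turns out not to contribute to the research question.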

5. Analyze data

Finally, the data needs to be analyzed to answer the research question. If the research needs are unmet, researchers need to look for more secondary data or create their own. Researchers can also use secondary data to form a new perspective and help expand the understanding of a topic.

6. Provide conclusions and feedback

Once the data has been collected and analyzed, researchers must formulate and organize it to allow managers to draw feedback from it. Oftentimes a research report is created that outlines results, conclusions, implications, and recommendations.

Advantages of Secondary Market Research

Secondary market research can be an efficient and cost-effective way for businesses to gather information about their market, competitors, and customers using existing data sources.

Cost-effective

Secondary research is generally less expensive than primary research, as businesses can access existing data sources without spending resources on data collection. For example, a startup can use free government publications to analyze the demographics of their target market without incurring additional costs.

Time-saving

Secondary research provides businesses with immediate access to data, eliminating the need for planning and conducting time-consuming primary research. For example, a company can quickly gather industry insights from published reports instead of waiting for the results of a custom survey.

Broad scope

Secondary research can provide a comprehensive overview of market trends, historical data, and industry benchmarks. For example, a business can examine multiple years of financial reports from competitors to understand long-term growth patterns and financial stability.

Disadvantages of Secondary Market Research

While secondary market research offers several benefits, it's essential to be aware of the potential disadvantages as well. Here, we'll explore the disadvantages of secondary market research.

Lack of specificity

Secondary research may not address specific research questions or cater to unique business needs, as it wasn't collected for that purpose. For example, a company seeking information on customer preferences for a niche product may struggle to find relevant secondary data.

Potential outdated information

Since secondary research relies on existing data, it may be outdated or no longer relevant, limiting its usefulness for decision-making. For example, a business looking to enter a rapidly changing market may find that historical data is insufficient for understanding current trends.

Quality and reliability concerns

The accuracy and reliability of secondary research data may vary, as businesses have limited control over how the data is collected and analyzed. For example, a company using data from an unknown source might not be able to trust its accuracy or validity.

To address the disadvantages of secondary research, researchers should accompany it with primary research - collecting original data through interviews, observations, etc. Using high-quality sources to minimize biases and incorrect information is also crucial.

Secondary Market Research Examples

Examples of secondary market research include analyzing industry reports, reviewing academic research and trade publications, as well as performing competitor analysis or monitoring media coverage. Let's see how companies can use them in action.

Market trend analysis for a fashion retailer

A fashion retailer wants to identify upcoming trends and preferences to inform their product line and marketing strategy. They conduct secondary market research by reviewing fashion magazines, blog posts, and social media influencers' content. This helps them identify popular colors, patterns, and styles, which they incorporate into their clothing designs and promotional campaigns.


Competitor pricing analysis for a new restaurant

A new restaurant wants to understand the pricing strategies of its local competitors to determine its own menu pricing. It gathers secondary data by visiting competitors' websites and analyzing their menus. Based on this research, the restaurant can set competitive prices that appeal to customers and position itself effectively in the market.

Industry growth projections for a renewable energy startup

A renewable energy startup is seeking investment and wants to demonstrate the potential for growth in its industry. They conduct secondary market research by examining government publications, industry reports, and academic research on renewable energy trends and projections. This data allows them to present a compelling case to potential investors, showcasing the promising growth trajectory of the renewable energy sector.

Secondary Market Research - Key Takeaways

  • Secondary market research is the process of gathering, analyzing, and interpreting data already collected by others, including industry reports, government publications, academic research, and competitor analysis, to inform business decisions.
  • The main benefit of secondary market research is saving research time and costs.
  • Examples of secondary market research include analyzing industry reports, reviewing academic research and trade publications, as well as performing competitor analysis or monitoring media coverage.
  • There are six steps of secondary market research: defining the research needs, choosing sources, collecting secondary data, combining the acquired data, analyzing it, and providing conclusions and feedback.
  • The main disadvantage of secondary research is that the data may be outdated, non-specific, or of low quality.
References

1. Software Testing Help, Top 10 Market Research Companies [2022 Review & Comparison], 2022.
2. The-Definition, Commercial online database, 2022.
3. Career Foundry, Quantitative vs Qualitative Data: What's the Difference?, 2022.
4. Brock University Library, Evaluating Information Sources, n.d.


Frequently Asked Questions about Secondary Market Research

What is an example of gathering secondary data?

Online resources and resource banks such as Google Scholar, Research Gate, Euromonitor, and Statista provide relevant information and data on a target market for businesses. For example, data can be gathered from Statista to identify the services or products customers spent the most money on in a certain industry during a set period of time. 

What are the 5 methods of collecting secondary data?

Some of the methods of collecting secondary data include buying them from a supplier, searching commercial online databases, using search engines, or using internal sources like sales reports, customer feedback, etc.

What is secondary research in marketing?

Secondary market research, also known as desk research, involves businesses using existing information or data. This type of data has already been collected by someone other than the researcher.

What should be in secondary marketing research?

To conduct secondary research, marketers have to define the research needs, choose research sources, collect the data, combine the data, analyze the data, and come up with conclusions.

What is an example of secondary market research?

An example of secondary market research includes searching online commercial databases. These are archives of data from commercial sources on the Internet. Researchers can dig around these databases to find their secondary data sources.

What useful data can marketers gather from suppliers and distributors?

Marketers can gather useful data from suppliers and distributors on product availability, pricing, customer preferences, and emerging trends in the industry.

How does secondary research help a business?

Secondary research helps a business by providing cost-effective and time-saving access to existing data on market trends, competitors, and customer behavior to inform decision-making and strategy development.

Why is secondary market research important?

Secondary market research is important because it allows businesses to gain valuable insights and understanding of their market, competition, and customers without the need for resource-intensive primary research.


How to Perform Insightful Secondary Market Research.


Making decisions without data is like navigating without a compass. That’s where secondary market research steps in. It’s not just a backup plan; it’s a smart strategy for any brand looking to get ahead. Think of it as the detective work behind the scenes, using existing data to piece together the market puzzle.

While primary research gets a lot of attention for its direct approach to gathering data, it can be expensive and time-consuming. That’s where secondary research shines. It uses data already out there—industry reports, academic studies, and public records. This saves time and money and adds depth to your understanding of the market.

Secondary research complements primary research perfectly. It gives context and background, helping to interpret new data more effectively. In essence, it’s about working smarter, not harder. Leveraging existing data can uncover trends, competitor insights, and customer behavior that might not be evident from new research alone.

So, as we dive into the how-tos of insightful secondary market research, keep in mind it’s not just about cutting costs. It’s about making informed decisions with a fuller picture of the market. After all, in business, knowledge is power, and secondary research is a crucial tool in harnessing that power.

Understanding the Basics of Secondary Research

Secondary market research is about making use of data that’s already out there. Unlike primary research, where you’re collecting data firsthand through surveys, interviews, or experiments, secondary research taps into existing resources. It’s about being resourceful and finding and using data already gathered by others.

So, what can you dig up with secondary research? A lot. You’ve got your public records – think census data, government reports, and regulatory filings. These are goldmines for demographic and economic insights. Then there are academic papers, where you find cutting-edge research and theories that can spark new ideas or validate your hypotheses. Industry reports and market analyses offer a bird’s-eye view of market trends, competitor performance, and industry benchmarks. And don’t forget about competitive analysis – using information published by your competitors themselves, like annual reports and press releases, to get a read on their strategies and performance.

In short, secondary research is your shortcut to a wealth of information. It’s not about reinventing the wheel; it’s about leveraging what’s already out there to build a more robust, more informed strategy for your brand. Whether you’re validating your primary research findings or getting a quick overview of the market landscape, secondary research is a critical step in the process.

The Strategic Value of Secondary Research

Now, let’s talk strategy. Secondary research isn’t just about gathering data; it’s about giving you the strategic edge. Understanding market trends, the competitive landscape, and customer behavior is crucial, and secondary research serves this up on a silver platter.

For instance, let’s take market trends. By analyzing industry reports and academic research, you can spot trends before they go mainstream. This is about seeing where the market is heading, not just where it’s been. For a brand leader looking to steer their company in the right direction, this is invaluable. It’s like having a roadmap for what’s next, helping you to navigate market shifts and position your company as a leader, not a follower.

Then there’s the competitive landscape. Competitive analysis through secondary research lets you peek into your competitors’ worlds. What strategies are they using? What’s working for them (or not)? This isn’t about copying them—it’s about understanding the playing field and finding opportunities to outmaneuver them. This insight can guide mergers, acquisitions, or new product launches.

And we can’t forget about customer behavior. Secondary research gives you a broader understanding of customer needs and pain points. Social media analytics, customer reviews, and market analyses offer a treasure trove of information on what customers say and do. For any brand executive, this is gold. It means you can tailor your products, marketing, and customer service to meet your customers where they are, often before they even know they need you.

In practice, imagine a V.P. of Marketing using secondary research to identify a rising trend in sustainable products within their industry. By aligning their product development and marketing strategies with this trend, they capitalize on market demand and position their brand as forward-thinking and responsible.

Or consider a Head of Strategy using competitive analysis to discover a competitor’s shift towards a new market segment. This insight allows for strategic planning to counteract this move or identify underserved segments that could offer new opportunities.

Secondary research is more than data collection; it’s a strategic tool that helps executives make informed, forward-looking decisions. It’s about staying ahead of the curve and using the wealth of existing information to guide your company’s strategic direction.


Step-by-Step Guide to Conducting Effective Secondary Research

Let’s dive into the nuts and bolts of doing secondary research correctly. Follow these steps to ensure your research is thorough and directly aligned with your strategic goals.

1. Identifying Your Research Objectives

Start with clarity. What exactly do you need to know? Define your objectives in a way that they directly support your business goals. Whether it’s understanding a market trend, evaluating competitive positions, or getting to know your customers better, your objectives should be specific, measurable, achievable, relevant, and time-bound (SMART).

2. Sourcing Relevant Data

Not all data is created equal. Focus on finding high-quality, reliable sources. Look into academic databases like JSTOR or Google Scholar for peer-reviewed papers, industry reports from firms like Gartner or McKinsey, and public databases for economic and demographic data. Assess the credibility of these sources by checking the author’s credentials, publication date, and the methodology used in the research.

3. Analyzing and Interpreting Data

This is where the magic happens. Use qualitative methods to understand themes and narratives or quantitative methods for statistical analysis. Tools like SWOT analysis can help in understanding strengths, weaknesses, opportunities, and threats based on the data. Software like SPSS or Excel can be invaluable for crunching numbers. The key is to look for patterns, correlations, and insights that align with your research objectives (a minimal worked example follows this list).

4. Applying Insights to Strategic Decisions

Now, turn those insights into action. If the data shows a growing market trend, consider how your product development can align with that trend. If competitive analysis reveals a gap in the market, think about how you can position your company to fill that gap. Use these insights to inform decisions on product development, market entry, and competitive positioning.
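As promised above, here is a minimal worked example of step 3. It is only a sketch: the file "market_data.csv" and its column names are hypothetical placeholders for whatever secondary dataset you have assembled.

```python
# Minimal sketch of step 3: quantitative analysis of assembled secondary
# data. "market_data.csv" and its columns are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("market_data.csv")  # e.g., rows of yearly industry figures

# Look for a pattern: does ad spend move with market size?
corr = df["ad_spend"].corr(df["market_size"])
print(f"Correlation between ad spend and market size: {corr:.2f}")

# Summary statistics help spot outliers and trends at a glance
print(df.describe())
```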

Challenges and Solutions in Secondary Research

Even with a solid plan, you’ll likely hit a few bumps. Let’s tackle some common challenges in secondary research and how to overcome them.

Overcoming Data Overload

  • The Problem: It’s easy to drown in a sea of data.
  • The Solution: Stay focused on your research objectives. Use filters and search operators to narrow down results.

Dealing with Outdated Information

  • The Problem: Not all data is fresh. Some might be stale by the time you find it.
  • The Solution: Always check the publication date. Prioritize the most recent data, but don’t ignore historical trends, as they can provide valuable context.

Assessing Credibility and Bias

  • The Problem: Not every source is reliable or unbiased.
  • The Solution: Check the author’s credentials and the publication’s reputation, and look for corroborating evidence from multiple sources to mitigate bias.

Making Sense of Diverse Data

  • The Problem: Data comes in all shapes and sizes, making analysis complex.
  • The Solution: Use a mixed-methods approach, combining qualitative and quantitative analysis, and visualize your findings with charts and graphs to better identify patterns.

Leveraging Technology in Secondary Research

Technology can be a game-changer in managing and analyzing data.

Data Management Tools

  • Evernote or OneNote: Use these to organize and annotate your findings.
  • Zotero or Mendeley: Great for managing academic references.

Analysis Software

  • Excel or Google Sheets: Handy for quantitative analysis.
  • NVivo: Useful for qualitative data analysis, helping to identify themes and patterns.

Wrapping Up with Actionable Insights

Once you’ve navigated the challenges and leveraged the right tools, it’s time to translate your findings into actionable insights.

Turn Insights into Strategies

  • Product Development: Align your offerings with emerging trends identified in your research.
  • Market Entry: Choose your markets based on competitive analysis and customer needs.
  • Competitive Positioning: Differentiate your brand by filling gaps your competitors have overlooked.

Keep the Conversation Going

  • Share Your Findings: Present your insights to your team or stakeholders in a clear, concise manner.
  • Encourage Feedback: Open the floor for discussions. Different perspectives can further refine your strategy.

Let’s break down how technological powerhouses are changing the game.

A.I. and Machine Learning: The Smart Scouts

  • Pattern Recognition: These tools are like having a detective with a photographic memory and a knack for spotting patterns. They can sift through mountains of data to find trends and correlations that would take humans ages to uncover.
  • Predictive Analysis: A.I. doesn’t just tell you what’s happened; it predicts what might happen next. This is crucial for anticipating market shifts, consumer behavior changes, and potential new niches.
  • Natural Language Processing (NLP): Ever wanted to know what people say about your brand on social media or in reviews? NLP technologies analyze text to gauge sentiment, pull out key themes, and even track brand mentions over time.
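For a feel of what sentiment scoring does, here is a deliberately tiny sketch. Real NLP tools use trained language models rather than word lists, but the basic shape is the same: brand mentions go in, sentiment scores come out.

```python
# Toy sentiment sketch: real NLP tools use trained models, but the basic
# shape is the same - brand mentions go in, sentiment scores come out.
POSITIVE = {"love", "great", "excellent", "recommend"}
NEGATIVE = {"hate", "terrible", "broken", "refund"}

def sentiment_score(text: str) -> int:
    """Return (# positive words - # negative words) in a mention."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

mentions = ["Love this brand, would recommend",
            "Terrible service, I want a refund"]
for m in mentions:
    print(f"{sentiment_score(m):+d}  {m}")
```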

Data Analytics Tools: The Analytical Brains

  • Data Visualization: Tools like Tableau or Power BI transform complex datasets into clear, understandable visuals. This makes it easier to share insights with your team or stakeholders and make data-driven decisions quickly.
  • Big Data Analytics: With tools designed to handle vast datasets, you can analyze information from multiple sources simultaneously. This means a more comprehensive view of the market without getting bogged down in details.

Automation: The Efficiency Expert

  • Automated Data Collection: Say goodbye to manual data scraping. Automated tools can continuously monitor and collect data from specified sources, ensuring you have the latest information at your fingertips.
  • Streamlined Analysis: Automation isn’t just for collecting data; it also applies to analyzing it. Automated analysis tools can identify key metrics, perform statistical tests, and even generate reports, saving you time and reducing the risk of human error.


How This Changes the Game

Leveraging technology in secondary research isn’t just about keeping up with the times; it’s about setting the pace. By embracing A.I., machine learning, and data analytics, you’re not just collecting data but unlocking its full potential to drive your brand forward. Integrating these technologies into your secondary research processes means you can:

  • Do More With Less: Less time spent on manual tasks means more time for strategic thinking and decision-making.
  • Stay Ahead of the Curve: With predictive analytics and continuous data monitoring, you can anticipate market trends and adjust your strategies proactively.
  • Make Informed Decisions: Enhanced data visualization and analysis offer clearer insights, making it easier to understand complex information and make informed decisions.

Essential Resources for Secondary Research

Whether you’re digging into local markets or casting a net across global industries, finding reliable and free resources is key to effective secondary research. Here’s a list of go-to sources for insightful, credible information at various levels—local, state, country, and global.

  • CIA World Factbook (Global): Comprehensive information on the history, people, government, economy, geography, communications, transportation, military, and transnational issues for 267 world entities.
  • Google Scholar (Global): Access to a wide range of scholarly articles, theses, books, abstracts, and court opinions from academic publishers, professional societies, online repositories, universities, and websites.
  • PubMed (Global): A free resource supporting the search and retrieval of biomedical and life sciences literature with the aim of improving health, both globally and personally.
  • World Bank Open Data (Global): Free and open access to global development data, including data on economic development, health, and population statistics.
  • Eurostat (Europe): Statistical data and analyses on European countries covering various sectors including economy, population, and social conditions.
  • United Nations Data (Global): A portal to international statistics gathered by the United Nations on economics, social conditions, environment, and more.
  • U.S. Census Bureau (United States): Detailed data on demographic, economic, and geographic studies of the U.S. population.
  • Bureau of Labor Statistics (United States): U.S. economic data, including employment, productivity, inflation, and the state of various industries.
  • Pew Research Center (Global): Nonpartisan fact tank that informs the public about the issues, attitudes, and trends shaping the world through public opinion polling and social science research.
  • Statista (Global): Statistics portal integrating data on over 80,000 topics from over 22,500 sources onto a single platform.
  • Google Public Data Explorer (Global): Large datasets from world development indicators, OECD, and human development indicators, visualized in an easy-to-understand way.
  • National Bureau of Economic Research (NBER) (United States): Offers a wide range of economic data, research, and analysis.
  • Office for National Statistics (ONS) (United Kingdom): The UK’s largest independent producer of official statistics and the recognized national statistical institute of the UK.
  • Australian Bureau of Statistics (ABS) (Australia): Provides statistical services and data on economic, population, environmental, and social issues.
  • Statistics Canada (Canada): National statistical office offering a wide array of economic, social, and environmental statistics.
  • Data.gov (United States): Home to the U.S. government’s open data, including data on agriculture, education, energy, finance, and more.
  • European Union Open Data Portal (Europe): Provides access to data published by EU institutions and bodies.
  • IndiaStat (India): Comprehensive statistical analysis on India covering demographics, economy, health, education, and more.
  • Chinese National Bureau of Statistics (China): Offers economic, demographic, and social data on China.
  • African Development Bank – Open Data Platform (Africa): Data on African countries covering economic, social, and environmental indicators.

This list is a treasure trove for researchers looking to gather secondary data from credible, free sources. Whether you’re exploring local economic trends or global health statistics, these resources offer a wealth of information to support your research objectives.

Conclusion: The Strategic Edge of Secondary Research

Let’s wrap this up with some straight talk: secondary market research is not just a nice-to-have; it’s a must-have in your strategic arsenal. It’s the compass that helps you navigate, offering insights and perspectives that can fundamentally shape your strategic direction.

Remember, secondary research gives you a head start. It’s cost-effective, efficient, and taps into a wealth of data already out there waiting to be leveraged. From understanding market trends and competitive landscapes to getting inside your customers’ heads, secondary research lays the groundwork for informed decision-making.

But it’s not just about collecting data; it’s about turning that data into actionable intelligence. With the help of technology—A.I., machine learning, and data analytics tools—secondary research has become more powerful than ever. It allows you to sift through mountains of information, spot patterns, and predict trends, ensuring that your strategic decisions are backed by solid evidence.

And let’s not forget the resources at your disposal. From the CIA World Factbook to Google Scholar, the tools and databases we’ve discussed are your allies in the quest for knowledge. They’re the sources that can fill in the blanks, confirm your hunches, or even challenge your assumptions, ensuring that your strategies are not just guesses but informed choices.

So, to the marketing and research executives reading this: consider secondary market research as the foundation of your strategic planning. It’s the key to unlocking insights that can propel your business forward, helping you to not just keep up with the pace of change but to set it. 



Protecting against researcher bias in secondary data analysis: challenges and potential solutions

Jessie R. Baldwin

1 Department of Clinical, Educational and Health Psychology, Division of Psychology and Language Sciences, University College London, London, WC1H 0AP UK

2 Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, UK

Jean-Baptiste Pingault

Tabea Schoeler

Hannah M. Sallis

3 MRC Integrative Epidemiology Unit at the University of Bristol, Bristol Medical School, University of Bristol, Bristol, UK

4 School of Psychological Science, University of Bristol, Bristol, UK

5 Centre for Academic Mental Health, Population Health Sciences, University of Bristol, Bristol, UK

Marcus R. Munafò

6 NIHR Biomedical Research Centre, University Hospitals Bristol NHS Foundation Trust and University of Bristol, Bristol, UK

Analysis of secondary data sources (such as cohort studies, survey data, and administrative records) has the potential to provide answers to science and society’s most pressing questions. However, researcher biases can lead to questionable research practices in secondary data analysis, which can distort the evidence base. While pre-registration can help to protect against researcher biases, it presents challenges for secondary data analysis. In this article, we describe these challenges and propose novel solutions and alternative approaches. Proposed solutions include approaches to (1) address bias linked to prior knowledge of the data, (2) enable pre-registration of non-hypothesis-driven research, (3) help ensure that pre-registered analyses will be appropriate for the data, and (4) address difficulties arising from reduced analytic flexibility in pre-registration. For each solution, we provide guidance on implementation for researchers and data guardians. The adoption of these practices can help to protect against researcher bias in secondary data analysis, to improve the robustness of research based on existing data.

Introduction

Secondary data analysis has the potential to provide answers to science and society’s most pressing questions. An abundance of secondary data exists—cohort studies, surveys, administrative data (e.g., health records, crime records, census data), financial data, and environmental data—that can be analysed by researchers in academia, industry, third-sector organisations, and the government. However, secondary data analysis is vulnerable to questionable research practices (QRPs) which can distort the evidence base. These QRPs include p-hacking (i.e., exploiting analytic flexibility to obtain statistically significant results), selective reporting of statistically significant, novel, or “clean” results, and hypothesising after the results are known (HARK-ing; i.e., presenting unexpected results as if they were predicted) [ 1 ]. Indeed, findings obtained from secondary data analysis are not always replicable [ 2 , 3 ], reproducible [ 4 ], or robust to analytic choices [ 5 , 6 ]. Preventing QRPs in research based on secondary data is therefore critical for scientific and societal progress.

A primary cause of QRPs is common cognitive biases that affect the analysis, reporting, and interpretation of data [ 7 – 10 ]. For example, apophenia (the tendency to see patterns in random data) and confirmation bias (the tendency to focus on evidence that is consistent with one’s beliefs) can lead to particular analytical choices and selective reporting of “publishable” results [ 11 – 13 ]. In addition, hindsight bias (the tendency to view past events as predictable) can lead to HARK-ing, so that observed results appear more compelling.

The scope for these biases to distort research outputs from secondary data analysis is perhaps particularly acute, for two reasons. First, researchers now have increasing access to high-dimensional datasets that offer a multitude of ways to analyse the same data [ 6 ]. Such analytic flexibility can lead to different conclusions depending on the analytical choices made [ 5 , 14 , 15 ]. Second, current incentive structures in science reward researchers for publishing statistically significant, novel, and/or surprising findings [ 16 ]. This combination of opportunity and incentive may lead researchers—consciously or unconsciously—to run multiple analyses and only report the most “publishable” findings.

One way to help protect against the effects of researcher bias is to pre-register research plans [ 17 , 18 ]. This can be achieved by pre-specifying the rationale, hypotheses, methods, and analysis plans, and submitting these to either a third-party registry (e.g., the Open Science Framework [OSF]; https://osf.io/ ), or a journal in the form of a Registered Report [ 19 ]. Because research plans and hypotheses are specified before the results are known, pre-registration reduces the potential for cognitive biases to lead to p-hacking, selective reporting, and HARK-ing [ 20 ]. While pre-registration is not necessarily a panacea for preventing QRPs (Table 1), meta-scientific evidence has found that pre-registered studies and Registered Reports are more likely to report null results [ 21 – 23 ], smaller effect sizes [ 24 ], and be replicated [ 25 ]. Pre-registration is increasingly being adopted in epidemiological research [ 26 , 27 ], and is even required for access to data from certain cohorts (e.g., the Twins Early Development Study [ 28 ]). However, pre-registration (and other open science practices; Table 2) can pose particular challenges to researchers conducting secondary data analysis [ 29 ], motivating the need for alternative approaches and solutions. Here we describe such challenges, before proposing potential solutions to protect against researcher bias in secondary data analysis (summarised in Fig. 1).

Table 1. Limitations in the use of pre-registration to address QRPs

  • Limitation: Pre-registration may not prevent selective reporting/outcome switching. Example: The COMPare Trials Project [ ] assessed outcome switching in clinical trials published in the top 5 medical journals between October 2015 and January 2016. Among 67 clinical trials, on average, each trial reported 58.2% of its specified outcomes and silently added 5.3 new outcomes.
  • Limitation: Pre-registration may be performed retrospectively after the results are known. Example: Mathieu et al. [ ] assessed 323 clinical trials published in 2008 in the top 10 medical journals; 45 trials (13.9%) were registered after the completion of the study.
  • Limitation: Deviations from pre-registered protocols are common. Example: Claesen et al. [ ] assessed all pre-registered articles published in Psychological Science between February 2015 and November 2017. All 23 articles deviated from the pre-registration, and only one study disclosed the deviation.
  • Limitation: Pre-registration may not improve the credibility of hypotheses. Example: Rubin [ ] and Szollosi, Kellen [ ] argue that formulating hypotheses post hoc (HARK-ing) is not problematic if they are deduced from pre-existing theory or evidence, rather than induced from the current results.

Table 2. Challenges and potential solutions regarding sharing pre-existing data

  • Challenge: Many datasets cannot be publicly shared because of ethical and legal requirements. Potential solutions: Share a synthetic dataset (a simulated dataset which mimics an original dataset by preserving its statistical properties and associations between variables); for a tutorial, see Quintana [ ]. Alternatively, provide specific instructions on how data can be accessed, with links to codebooks/data dictionaries with variable information [ ].
  • Challenge: If different researchers conduct similar statistical tests on a dataset and do not correct for multiple testing, this increases the risk of false positives [ ]. Potential solutions: Test whether findings replicate in independent samples, as the chance of two identical false positives occurring in independent samples is small. Ensure that the research question is distinct from prior studies on the given dataset, to help ensure that proposed analyses are part of a different statistical family; multiple analyses on a single dataset will not lead to false positives if the analyses are part of different statistical families.
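As a rough illustration of the synthetic-dataset idea in Table 2 (the cited Quintana tutorial covers a proper approach), the sketch below fits a multivariate normal distribution to a dataset's means and covariance and samples new rows from it, approximately preserving the associations between variables without releasing any real records. The "real" data here are simulated for the example.

```python
# Rough synthetic-dataset sketch: sample new rows from a multivariate
# normal fitted to the original data's means and covariance, so the
# associations between variables are approximately preserved without
# releasing any real records. The "real" data here are simulated.
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a real dataset: two correlated variables, 500 rows
real = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=500)

# Fit mean and covariance, then draw synthetic rows
mean, cov = real.mean(axis=0), np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mean, cov, size=500)

print("Real corr:     ", np.corrcoef(real, rowvar=False)[0, 1].round(2))
print("Synthetic corr:", np.corrcoef(synthetic, rowvar=False)[0, 1].round(2))
```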

Fig. 1. Challenges in pre-registering secondary data analysis and potential solutions (according to researcher motivations). Note: In the “Potential solution” column, blue boxes indicate solutions that are researcher-led; green boxes indicate solutions that should be facilitated by data guardians

Challenges of pre-registration for secondary data analysis

Prior knowledge of the data

Researchers conducting secondary data analysis commonly analyse data from the same dataset multiple times throughout their careers. However, prior knowledge of the data increases risk of bias, as prior expectations about findings could motivate researchers to pursue certain analyses or questions. In the worst-case scenario, a researcher might perform multiple preliminary analyses, and only pursue those which lead to notable results (perhaps posting a pre-registration for these analyses, even though it is effectively post hoc). However, even if the researcher has not conducted specific analyses previously, they may be biased (either consciously or subconsciously) to pursue certain analyses after testing related questions with the same variables, or even by reading past studies on the dataset. As such, pre-registration cannot fully protect against researcher bias when researchers have previously accessed the data.

Research may not be hypothesis-driven

Pre-registration and Registered Reports are tailored towards hypothesis-driven, confirmatory research. For example, the OSF pre-registration template requires researchers to state “specific, concise, and testable hypotheses”, while Registered Reports do not permit purely exploratory research [ 30 ], although a new Exploratory Reports format now exists [ 31 ]. However, much research involving secondary data is not focused on hypothesis testing, but is exploratory, descriptive, or focused on estimation—in other words, examining the magnitude and robustness of an association as precisely as possible, rather than simply testing a point null. Furthermore, without a strong theoretical background, hypotheses will be arbitrary and could lead to unhelpful inferences [ 32 , 33 ], and so should be avoided in novel areas of research.

Pre-registered analyses are not appropriate for the data

With pre-registration, there is always a risk that the data will violate the assumptions of the pre-registered analyses [ 17 ]. For example, a researcher might pre-register a parametric test, only for the data to be non-normally distributed. However, in secondary data analysis, the extent to which the data shape the appropriate analysis can be considerable. First, longitudinal cohort studies are often subject to missing data and attrition. Approaches to deal with missing data (e.g., listwise deletion; multiple imputation) depend on the characteristics of missing data (e.g., the extent and patterns of missingness [ 34 ]), and so pre-specifying approaches to dealing with missingness may be difficult, or extremely complex. Second, certain analytical decisions depend on the nature of the observed data (e.g., the choice of covariates to include in a multiple regression might depend on the collinearity between the measures, or the degree of missingness of different measures that capture the same construct). Third, much secondary data (e.g., electronic health records and other administrative data) were never collected for research purposes, so can present several challenges that are impossible to predict in advance [ 35 ]. These issues can limit a researcher’s ability to pre-register a precise analytic plan prior to accessing secondary data.

Lack of flexibility in data analysis

Concerns have been raised that pre-registration limits flexibility in data analysis, including justifiable exploration [ 36 – 38 ]. For example, by requiring researchers to commit to a pre-registered analysis plan, pre-registration could prevent researchers from exploring novel questions (with a hypothesis-free approach), conducting follow-up analyses to investigate notable findings [ 39 ], or employing newly published methods with advantages over those pre-registered. While this concern is also likely to apply to primary data analysis, it is particularly relevant to certain fields involving secondary data analysis, such as genetic epidemiology, where new methods are rapidly being developed [ 40 ], and follow-up analyses are often required (e.g., in a genome-wide association study to further investigate the role of a genetic variant associated with a phenotype). However, this concern is perhaps over-stated – pre-registration does not preclude unplanned analyses; it simply makes it more transparent that these analyses are post hoc. Nevertheless, another understandable concern is that reduced analytic flexibility could lead to difficulties in publishing papers and accruing citations. For example, pre-registered studies are more likely to report null results [ 22 , 23 ], likely due to reduced analytic flexibility and selective reporting. While this is a positive outcome for research integrity, null results are less likely to be published [ 13 , 41 , 42 ] and cited [ 11 ], which could disadvantage researchers’ careers.

In this section, we describe potential solutions to address the challenges involved in pre-registering secondary data analysis, including approaches to (1) address bias linked to prior knowledge of the data, (2) enable pre-registration of non-hypothesis-driven research, (3) ensure that pre-planned analyses will be appropriate for the data, and (4) address potential difficulties arising from reduced analytic flexibility.

Challenge: Prior knowledge of the data

Declare prior access to data

To increase transparency about potential biases arising from knowledge of the data, researchers could routinely report all prior data access in a pre-registration [ 29 ]. This would ideally include evidence from an independent gatekeeper (e.g., a data guardian of the study) stating whether data and relevant variables were accessed by each co-author. To facilitate this process, data guardians could set up a central “electronic checkout” system that records which researchers have accessed data, what data were accessed, and when [ 43 ]. The researcher or data guardian could then provide links to the checkout histories for all co-authors in the pre-registration, to verify their prior data access. If it is not feasible to provide such objective evidence, authors could self-certify their prior access to the dataset and where possible, relevant variables—preferably listing any publications and in-preparation studies based on the dataset [ 29 ]. Of course, self-certification relies on trust that researchers will accurately report prior data access, which could be challenging if the study involves a large number of authors, or authors who have been involved on many studies on the dataset. However, it is likely to be the most feasible option at present as many datasets do not have available electronic records of data access. For further guidance on self-certifying prior data access when pre-registering secondary data analysis studies on a third-party registry (e.g., the OSF), we recommend referring to the template by Van den Akker, Weston [ 29 ].
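Such a checkout system need not be elaborate. As a toy sketch (not a system the authors specify), an append-only log of who accessed which variables, and when, could look like the following; the file and variable names are hypothetical.

```python
# Toy sketch of an "electronic checkout" log (not a system the authors
# specify): one append-only record per data access, so prior access to
# variables can later be verified in a pre-registration.
import csv
from datetime import datetime, timezone

LOG_FILE = "data_access_log.csv"  # hypothetical file name

def record_access(researcher: str, variables: list[str]) -> None:
    """Append one checkout record: who accessed which variables, and when."""
    with open(LOG_FILE, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),
            researcher,
            ";".join(variables),
        ])

record_access("researcher_01", ["exposure_age5", "outcome_age18"])
```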

The extent to which prior access to data renders pre-registration invalid is debatable. On the one hand, even if data have been accessed previously, pre-registration is likely to reduce QRPs by encouraging researchers to commit to a pre-specified analytic strategy. On the other hand, pre-registration does not fully protect against researcher bias where data have already been accessed, and can lend added credibility to study claims, which may be unfounded. Reporting prior data access in a pre-registration is therefore important to make these potential biases transparent, so that readers and reviewers can judge the credibility of the findings accordingly. However, for a more rigorous solution which protects against researcher bias in the context of prior data access, researchers should consider adopting a multiverse approach.

Conduct a multiverse analysis

A multiverse analysis involves identifying all potential analytic choices that could justifiably be made to address a given research question (e.g., different ways to code a variable, combinations of covariates, and types of analytic model), implementing them all, and reporting the results [ 44 ]. Notably, this method differs from the traditional approach in which findings from only one analytic method are reported. It is conceptually similar to a sensitivity analysis, but it is far more comprehensive, as often hundreds or thousands of analytic choices are reported, rather than a handful. By showing the results from all defensible analytic approaches, multiverse analysis reduces scope for selective reporting and provides insight into the robustness of findings against analytical choices (for example, if there is a clear convergence of estimates, irrespective of most analytical choices). For causal questions in observational research, Directed Acyclic Graphs (DAGs) could be used to inform selection of covariates in multiverse approaches [ 45 ] (i.e., to ensure that confounders, rather than mediators or colliders, are controlled for).

Specification curve analysis [ 46 ] is a form of multiverse analysis that has been applied to examine the robustness of epidemiological findings to analytic choices [ 6 , 47 ]. Specification curve analysis involves three steps: (1) identifying all analytic choices – termed “specifications”, (2) displaying the results graphically with magnitude of effect size plotted against analytic choice, and (3) conducting joint inference across all results. When applied to the association between digital technology use and adolescent well-being [ 6 ], specification curve analysis showed that the (small, negative) association diminished after accounting for adequate control variables and recall bias – demonstrating the sensitivity of results to analytic choices.
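To illustrate the mechanics (this is a simplified sketch on simulated data, not the packages cited below), a multiverse loop enumerates every combination of analytic choices, fits each model, and collects the effect estimates; sorting the estimates by size gives the ordering plotted in a specification curve.

```python
# Minimal multiverse/specification-curve sketch on simulated data:
# fit every justifiable combination of analytic choices and collect
# all effect estimates, rather than reporting a single model.
import itertools

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)                            # exposure
c1, c2 = rng.normal(size=n), rng.normal(size=n)   # candidate covariates
y = 0.3 * x + 0.5 * c1 + rng.normal(size=n)       # outcome

# Analytic choices: which covariates to adjust for, how to code the outcome
covariate_sets = [[], ["c1"], ["c2"], ["c1", "c2"]]
outcome_codings = {"raw": y, "standardised": (y - y.mean()) / y.std()}

data = {"x": x, "c1": c1, "c2": c2}
results = []
for covs, (label, outcome) in itertools.product(covariate_sets,
                                                outcome_codings.items()):
    X = sm.add_constant(np.column_stack([data[v] for v in ["x", *covs]]))
    fit = sm.OLS(outcome, X).fit()
    results.append((label, covs, fit.params[1]))  # coefficient on x

# Sorting by effect size gives the "specification curve" ordering
for label, covs, beta in sorted(results, key=lambda r: r[2]):
    print(f"outcome={label:12s} covariates={str(covs):14s} beta_x={beta:.3f}")
```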

Despite the benefits of the multiverse approach in addressing analytic flexibility, it is not without limitations. First, because each analytic choice is treated as equally valid, including less justifiable models could bias the results away from the truth. Second, the choice of specifications can be biased by prior knowledge (e.g., a researcher may choose to omit a covariate to obtain a particular result). Third, multiverse analysis may not entirely prevent selective reporting (e.g., if the full range of results are not reported), although pre-registering multiverse approaches (and specifying analytic choices) could mitigate this. Last, and perhaps most importantly, multiverse analysis is technically challenging (e.g., when there are hundreds or thousands of analytic choices) and can be impractical for complex analyses, very large datasets, or when computational resources are limited. However, this burden can be somewhat reduced by tutorials and packages which are being developed to standardise the procedure and reduce computational time [see 48 , 49 ].

Challenge: Research may not be hypothesis-driven

Pre-register research questions and conditions for interpreting findings

Observational research arguably does not need to have a hypothesis to benefit from pre-registration. For studies that are descriptive or focused on estimation, we recommend pre-registering research questions, analysis plans, and criteria for interpretation. Analytic flexibility will be limited by pre-registering specific research questions and detailed analysis plans, while post hoc interpretation will be limited by pre-specifying criteria for interpretation [ 50 ]. The potential for HARK-ing will also be minimised because readers can compare the published study to the original pre-registration, where a-priori hypotheses were not specified.

Detailed guidance on how to pre-register research questions and analysis plans for secondary data is provided in Van den Akker’s [ 29 ] tutorial. To pre-specify conditions for interpretation, it is important to anticipate – as much as possible – all potential findings, and state how each would be interpreted. For example, suppose that a researcher aims to test a causal relationship between X and Y using a multivariate regression model with longitudinal data. Assuming that all potential confounders have been fully measured and controlled for (albeit a strong assumption) and statistical power is high, three broad sets of results and interpretations could be pre-specified. First, an association between X and Y that is similar in magnitude to the unadjusted association would be consistent with a causal relationship. Second, an association between X and Y that is attenuated after controlling for confounders would suggest that the relationship is partly causal and partly confounded. Third, a minimal, non-statistically significant adjusted association would suggest a lack of evidence for a causal effect of X on Y. Depending on the context of the study, criteria could also be provided on the threshold (or range of thresholds) at which the effect size would justify different interpretations [ 51 ], be considered practically meaningful, or the smallest effect size of interest for equivalence tests [ 52 ]. While researcher biases might still affect the pre-registered criteria for interpreting findings (e.g., toward over-interpreting a small effect size as meaningful), this bias will at least be transparent in the pre-registration.

Use a holdout sample to delineate exploratory and confirmatory research

Where researchers wish to integrate exploratory research into a pre-registered, confirmatory study, a holdout sample approach can be used [ 18 ]. Creating a holdout sample refers to randomly splitting the dataset into two parts, often termed the ‘training’ and ‘holdout’ datasets. To delineate exploratory and confirmatory research, researchers can first conduct exploratory data analysis on the training dataset (which should comprise a moderate fraction of the data, e.g., 35% [ 53 ]). Based on the results of this discovery process, researchers can pre-register hypotheses and analysis plans to formally test on the holdout dataset. This process parallels cross-validation in machine learning, in which the dataset is split, the model is developed on the training dataset, and then tested on the test dataset. The approach enables a flexible discovery process, before formally testing discoveries in an unbiased way.

When considering whether to use the holdout sample approach, three points should be noted. First, because the training dataset is not reusable, there will be a reduced sample size and loss of power relative to analysing the whole dataset. As such, the holdout sample approach will only be appropriate when the original dataset is large enough to provide sufficient power in the holdout dataset. Second, when the training dataset is used for exploration, subsequent confirmatory analyses on the holdout dataset may be overfitted (due to both datasets being drawn from the same sample), so replication in independent samples is recommended. Third, the holdout dataset should be created by an independent data manager or guardian, to ensure that the researcher does not have knowledge of the full dataset. However, it is straightforward to randomly split a dataset into a holdout and training sample and we provide example R code at: https://github.com/jr-baldwin/Researcher_Bias_Methods/blob/main/Holdout_script.md .
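The split itself takes only a few lines; the script linked above is the worked example, and the following is merely a minimal sketch, assuming a data frame df and the 35% training fraction mentioned above.

```r
# Minimal sketch of a training/holdout split (assumes a data frame df).
# In practice an independent data guardian should run this and record the
# seed, so that the researcher never sees the full dataset.
df <- data.frame(id = 1:1000, x = rnorm(1000))  # stand-in dataset

set.seed(2024)                                  # recorded for reproducibility
n_train  <- floor(0.35 * nrow(df))              # 35% for exploration [53]
train_id <- sample(seq_len(nrow(df)), n_train)

training <- df[train_id, ]    # exploratory analyses only
holdout  <- df[-train_id, ]   # reserved for the pre-registered tests
```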

Challenge: Pre-registered analyses are not appropriate for the data

Use blinding to test proposed analyses

One method to help ensure that pre-registered analyses will be appropriate for the data is to trial the analyses on a blinded dataset [ 54 ], before pre-registering. Data blinding involves obscuring the data values or labels prior to data analysis, so that the proposed analyses can be trialled on the data without observing the actual findings. Various types of blinding strategies exist [ 54 ], but one method that is appropriate for epidemiological data is “data scrambling” [ 55 ]. This involves randomly shuffling the data points so that any associations between variables are obscured, whilst the variable distributions (and amounts of missing data) remain the same. We provide a tutorial for how to implement this in R (see https://github.com/jr-baldwin/Researcher_Bias_Methods/blob/main/Data_scrambling_tutorial.md ). Ideally the data scrambling would be done by a data guardian who is independent of the research, to ensure that the main researcher does not access the data prior to pre-registering the analyses. Once the researcher is confident with the analyses, the study can be pre-registered, and the analyses conducted on the unscrambled dataset.
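The core of the scrambling procedure is to permute each variable independently. The linked tutorial is the fuller version; the sketch below shows only the basic idea on a small simulated data frame.

```r
# Minimal data-scrambling sketch: permute each column independently, which
# destroys associations between variables while preserving each variable's
# distribution and its missingness.
scramble <- function(data) {
  as.data.frame(lapply(data, sample))
}

set.seed(7)
df <- data.frame(x = rnorm(100), y = c(rnorm(90), rep(NA, 10)))
df$y[1:90] <- df$y[1:90] + 0.5 * df$x[1:90]   # build in an association

scrambled <- scramble(df)
cor(df$x, df$y, use = "complete.obs")                 # original association
cor(scrambled$x, scrambled$y, use = "complete.obs")   # obscured after scrambling
```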

Blinded analysis offers several advantages for ensuring that pre-registered analyses are appropriate, with some limitations. First, blinded analysis allows researchers to directly check the distribution of variables and amounts of missingness, without having to make assumptions about the data that may not be met, or spend time planning contingencies for every possible scenario. Second, blinded analysis prevents researchers from gaining insight into the potential findings prior to pre-registration, because associations between variables are masked. However, because of this, blinded analysis does not enable researchers to check for collinearity, predictors of missing data, or other covariances that may be necessary for model specification. As such, blinded analysis will be most appropriate for researchers who wish to check the data distribution and amounts of missingness before pre-registering.

Trial analyses on a dataset excluding the outcome

Another method to help ensure that pre-registered analyses will be appropriate for the data is to trial analyses on a dataset excluding outcome data. For example, data managers could provide researchers with part of the dataset containing the exposure variable(s) plus any covariates and/or auxiliary variables. The researcher can then trial and refine the analyses ahead of pre-registering, without gaining insight into the main findings (which require the outcome data). This approach is used to mitigate bias in propensity score matching studies [ 26 , 56 ], as researchers use data on the exposure and covariates to create matched groups, prior to accessing any outcome data. Once the exposed and non-exposed groups have been matched effectively, researchers pre-register the protocol ahead of viewing the outcome data. Beyond matching, this approach could also help researchers to identify and address other analytical challenges involving secondary data. For example, it could be used to check multivariable distributional characteristics, test for collinearity between multiple predictor variables, or identify predictors of missing data for multiple imputation.
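For illustration, one possible version of this outcome-withheld workflow, here using the MatchIt package for propensity score matching, is sketched below. The dataset and variable names are hypothetical; the key point is that the researcher refines the matching before the outcome is ever supplied.

```r
# Sketch of an outcome-withheld workflow using propensity score matching
# (MatchIt package; dataset and variable names are hypothetical).
library(MatchIt)

set.seed(3)
exposure_df <- data.frame(             # supplied WITHOUT the outcome
  id      = 1:500,
  exposed = rbinom(500, 1, 0.3),
  age     = rnorm(500, 40, 10),
  ses     = rnorm(500)
)

# Tune the matching on exposure and covariates only, before pre-registering.
m_out <- matchit(exposed ~ age + ses, data = exposure_df, method = "nearest")
summary(m_out)                # inspect covariate balance; refine if poor
matched <- match.data(m_out)

# Only after the protocol is pre-registered does the data guardian release
# the outcome variable, to be merged by 'id' for the confirmatory analysis.
```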

This approach offers certain benefits for researchers keen to ensure that pre-registered analyses are appropriate for the observed data, with some limitations. Regarding benefits, researchers will be able to examine associations between variables (excluding the outcome), unlike the data scrambling approach described above. This would be helpful for checking certain assumptions (e.g., collinearity or characteristics of missing data such as whether it is missing at random). In addition, the approach is easy to implement, as the dataset can be initially created without the outcome variable, which can then be added after pre-registration, minimising burden on data guardians. Regarding limitations, it is possible that accessing variables in advance could provide some insight into the findings. For example, if a covariate is known to be highly correlated with the outcome, testing the association between the covariate and the exposure could give some indication of the relationship between the exposure and the outcome. To make this potential bias transparent, researchers should report the variables that they already accessed in the pre-registration. Another limitation is that researchers will not be able to identify analytical issues relating to the outcome data in advance of pre-registration. Therefore, this approach will be most appropriate where researchers wish to check various characteristics of the exposure variable(s) and covariates, rather than the outcome. However, a “mixed” approach could be applied in which outcome data is provided in scrambled format, to enable researchers to also assess distributional characteristics of the outcome. This would substantially reduce the number of potential challenges to be considered in pre-registered analytical pipelines.

Pre-register a decision tree

If it is not possible to access any of the data prior to pre-registering (e.g., to enable analyses to be trialled on a dataset that is blinded or missing outcome data), researchers could pre-register a decision tree. This defines the sequence of analyses and rules based on characteristics of the observed data [ 17 ]. For example, the decision tree could specify testing a normality assumption, and based on the results, whether to use a parametric or non-parametric test. Ideally, the decision tree should provide a contingency plan for each of the planned analyses, if assumptions are not fulfilled. Of course, it can be challenging and time consuming to anticipate every potential issue with the data and plan contingencies. However, investing time into pre-specifying a decision tree (or a set of contingency plans) could save time should issues arise during data analysis, and can reduce the likelihood of deviating from the pre-registration.
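A single branch of such a decision tree, pre-registered as executable code, might look like the following sketch (base R, simulated data; the normality check and choice of tests follow the example above, and the grouping variable is hypothetical).

```r
# Sketch of one branch of a pre-registered decision tree: check normality
# in each group, then choose a parametric or rank-based test accordingly.
set.seed(4)
group <- rep(c("a", "b"), each = 50)
y     <- c(rnorm(50), rnorm(50, mean = 0.3))

normal_ok <- all(tapply(y, group, function(g) shapiro.test(g)$p.value) > 0.05)

if (normal_ok) {
  result <- t.test(y ~ group)        # assumption met: parametric test
} else {
  result <- wilcox.test(y ~ group)   # assumption violated: non-parametric test
}
result
```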

Challenge: Lack of flexibility in data analysis

Transparently report unplanned analyses

Unplanned analyses (such as applying new methods or conducting follow-up tests to investigate an interesting or unexpected finding) are a natural and often important part of the scientific process. Despite common misconceptions, pre-registration does not prevent such unplanned analyses from being included, as long as they are transparently reported as post hoc. If there are methodological deviations, we recommend that researchers (1) clearly state the reasons for using the new method, and (2) if possible, report results from both methods, to ideally show that the change in methods was not due to the results [ 57 ]. This information can be provided either in the manuscript or in an update to the original pre-registration (e.g., on a third-party registry such as the OSF), which can be useful when journal word limits are tight. Similarly, if researchers wish to include additional follow-up analyses to investigate an interesting or unexpected finding, these should be reported but labelled as “exploratory” or “post hoc” in the manuscript.

Ensure a paper’s value does not depend on statistically significant results

Researchers may be concerned that reduced analytic flexibility from pre-registration could increase the likelihood of reporting null results [ 22 , 23 ], which are harder to publish [ 13 , 42 ]. To address this, we recommend taking steps to ensure that the value and success of a study does not depend on a significant p-value. First, methodologically strong research (e.g., with high statistical power, valid and reliable measures, robustness checks, and replication samples) will advance the field, whatever the findings. Second, methods can be applied to allow for the interpretation of statistically non-significant findings (e.g., Bayesian methods [ 58 ] or equivalence tests, which determine whether an observed effect is surprisingly small [ 52 , 59 , 60 ]). This means that the results will be informative whatever they show, in contrast to approaches relying solely on null hypothesis significance testing, where statistically non-significant findings cannot be interpreted as meaningful. Third, researchers can submit the proposed study as a Registered Report, where it will be evaluated before the results are available. This is arguably the strongest protection against publication bias, as in-principle acceptance of the study is granted without any knowledge of the results. In addition, Registered Reports can improve the methodology, as suggestions from expert reviewers can be incorporated into the pre-registered protocol.
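For example, an equivalence test can be run as two one-sided tests (TOST) in a few lines of base R, as sketched below. Dedicated packages exist, but the logic is simple; the equivalence bound stands in for a pre-registered smallest effect size of interest, and the data and bound here are purely illustrative.

```r
# Minimal equivalence test via two one-sided tests (TOST) in base R.
# 'delta' stands in for a pre-registered smallest effect size of interest
# (raw units; the value here is purely illustrative).
set.seed(5)
x <- rnorm(200)
y <- rnorm(200)
delta <- 0.2

# H0a: difference <= -delta;  H0b: difference >= +delta.
p_lower <- t.test(x, y, mu = -delta, alternative = "greater")$p.value
p_upper <- t.test(x, y, mu =  delta, alternative = "less")$p.value

# Rejecting both one-sided nulls bounds the effect within (-delta, delta),
# making a "null" result interpretable as an effect too small to matter.
if (max(p_lower, p_upper) < 0.05) {
  message("Effect is smaller than the smallest effect size of interest")
} else {
  message("Cannot rule out an effect of meaningful size")
}
```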

Under a system that rewards novel and statistically significant findings, it is easy for subconscious human biases to lead to QRPs. However, researchers, along with data guardians, journals, funders, and institutions, have a responsibility to ensure that findings are reproducible and robust. While pre-registration can help to limit analytic flexibility and selective reporting, it involves several challenges for epidemiologists conducting secondary data analysis. The approaches described here aim to address these challenges (Fig. 1), either by improving the efficacy of pre-registration or by providing an alternative means of addressing analytic flexibility (e.g., multiverse analysis). The responsibility for adopting these approaches should not fall on researchers’ shoulders alone; data guardians also have an important role to play in recording and reporting access to data, providing blinded datasets and holdout samples, and encouraging researchers to pre-register and adopt these solutions as part of their data request. Furthermore, wider stakeholders could incentivise these practices; for example, journals could provide a designated space for researchers to report deviations from the pre-registration, and funders could provide grants to establish best practice at the cohort level (e.g., data checkout systems, blinded datasets). Ease of adoption is key to wide uptake, and we therefore encourage efforts to evaluate, simplify, and improve these practices. Steps that could be taken to evaluate these practices are presented in Box 1.

More broadly, it is important to emphasise that researcher biases do not operate in isolation, but rather in the context of wider publication bias and a “publish or perish” culture. These incentive structures not only promote QRPs [ 61 ], but also discourage researchers from pre-registering and adopting other time-consuming reproducible methods. Therefore, in addition to targeting bias at the individual researcher level, wider initiatives from journals, funders, and institutions are required to address these institutional biases [ 7 ]. Systemic changes that reward rigorous and reproducible research will help researchers to provide unbiased answers to science and society’s most important questions.

Box 1. Evaluation of approaches

To evaluate, simplify and improve approaches to protect against researcher bias in secondary data analysis, the following steps could be taken.

Co-creation workshops to refine approaches

To obtain feedback on the approaches (including on any practical concerns or feasibility issues) co-creation workshops could be held with researchers, data managers, and wider stakeholders (e.g., journals, funders, and institutions).

Empirical research to evaluate efficacy of approaches

To evaluate the effectiveness of the approaches in preventing researcher bias and/or improving pre-registration, empirical research is needed. For example, to test the extent to which the multiverse analysis can reduce selective reporting, comparisons could be made between effect sizes from multiverse analyses versus effect sizes from meta-analyses (of non-pre-registered studies) addressing the same research question. If smaller effect sizes were found in multiverse analyses, it would suggest that the multiverse approach can reduce selective reporting. In addition, to test whether providing a blinded dataset or dataset missing outcome variables could help researchers develop an appropriate analytical protocol, researchers could be randomly assigned to receive such a dataset (or no dataset), prior to pre-registration. If researchers who received such a dataset had fewer eventual deviations from the pre-registered protocol (in the final study), it would suggest that this approach can help ensure that proposed analyses are appropriate for the data.

Pilot implementation of the measures

To assess the practical feasibility of the approaches, data managers could pilot measures for users of the dataset (e.g., required pre-registration for access to data, provision of datasets that are blinded or missing outcome variables). Feedback could then be collected from researchers and data managers about the experience and ease of use.

Acknowledgements

The authors are grateful to Professor George Davey Smith for his helpful comments on this article.

Author contributions

JRB and MRM developed the idea for the article. The first draft of the manuscript was written by JRB, with support from MRM and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

J.R.B is funded by a Wellcome Trust Sir Henry Wellcome fellowship (grant 215917/Z/19/Z). J.B.P is supported by the Medical Research Foundation 2018 Emerging Leaders 1st Prize in Adolescent Mental Health (MRF-160-0002-ELP-PINGA). M.R.M and H.M.S work in a unit that receives funding from the University of Bristol and the UK Medical Research Council (MC_UU_00011/5, MC_UU_00011/7), and M.R.M is also supported by the National Institute for Health Research (NIHR) Biomedical Research Centre at the University Hospitals Bristol National Health Service Foundation Trust and the University of Bristol.

Declarations

The authors declare that they have no conflict of interest.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
