Metascience · Explainer · Jun 25, 2026, 9:04 AM · 7 min read · #1 of 3 in culture

The Replication Crisis: Why Half of Social Science Findings Are Failing—and Why That's a Good Thing

A massive seven-year audit involving hundreds of researchers found that only half of published social science findings can be replicated. However, scientists say this public self-correction is exactly how the scientific method is supposed to work.

By Factlen Editorial Team

Metascientists and Reformers 45% · Optimistic Researchers 35% · Contextualists 20%

Metascientists and Reformers: Argue that the low replication rate is a systemic issue solvable through open data, preregistration, and transparency.
Optimistic Researchers: View the crisis as a healthy, necessary self-correction that proves the scientific method works.
Contextualists: Argue that human behavior is highly sensitive to context, suggesting some failed replications reflect changing times.

What's not represented

· Policymakers who rely on these studies
· Patients receiving clinical psychology treatments

Why this matters

Social science research shapes how we treat mental illness, design school curricula, and manage workplaces. Verifying that these findings hold up keeps society from wasting millions of dollars and years of effort on interventions that don't work.

Key points

The SCORE project analyzed nearly 3,900 social science papers and attempted to actively replicate 164 of them.
Only 49 percent of the replicated studies produced results similar to the original findings.
When original researchers shared their data and code openly, exact reproducibility jumped to 77 percent.
Scientists view the audit as a massive success for transparency, proving the field is actively self-correcting.

49%: Studies successfully replicated
865: Researchers involved in the SCORE project
77%: Reproducibility rate when data is openly shared
0.25 to 0.10: Average shrinkage in effect size

The bedrock of the scientific method is repeatability. If a physicist drops an apple, it falls; if a chemist mixes two compounds, they react. In theory, if a social scientist runs an experiment twice, they should get the same result. But a massive new audit of the social and behavioral sciences has revealed a coin-flip reality, challenging decades of foundational research. The findings confirm what many researchers have suspected for years: a significant portion of published scientific literature cannot be replicated when tested by independent teams. Yet, rather than viewing this as the collapse of the field, leading scientists are celebrating the audit as a necessary, transparent self-correction that will ultimately make research much more reliable.[1][7]

Coordinated by the Center for Open Science and funded by the U.S. Defense Advanced Research Projects Agency (DARPA), the Systematizing Confidence in Open Research and Evidence (SCORE) project is the largest replication effort in scientific history. Over the course of seven years, an international coalition of 865 researchers analyzed nearly 3,900 scientific articles. These papers were published between 2009 and 2018 across 62 different academic journals, representing leading peer-reviewed outlets in social science. The sheer scale of the endeavor required unprecedented collaboration across borders and disciplines, aiming to definitively answer how much of modern social science is built on solid empirical ground.[3][5]

The scope of the SCORE project covered eleven distinct disciplines, including criminology, economics, educational science, organizational behavior, psychology, and sociology. From the massive pool of literature, the research team selected 164 specific, highly cited studies to actively replicate. This meant recruiting entirely new pools of participants and running the exact same experimental protocols to see if the original findings held up in a fresh context. The goal was not to target specific researchers or debunk niche theories, but to take a random, representative sample of the science that routinely makes its way into university textbooks and public policy discussions.[1][5]

The results of the active replication phase were sobering: only 49 percent of the studies could be successfully replicated with a similar result to the original paper. Furthermore, even when the findings did hold up under scrutiny, the magnitude of the discoveries shrank significantly. The average "effect size"—a statistical measure of how strong a relationship or intervention actually is—dropped from 0.25 in the original studies to just 0.10 in the replications. This severe deflation suggests that while a psychological or economic effect might genuinely exist, its real-world impact is often vastly overstated in the initial, headline-grabbing publication.[1][5]
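To put that deflation in proportion, the two averages reported above can be turned into a relative shrinkage. The snippet below is a minimal sketch of that arithmetic only; the function name `relative_shrinkage` is a hypothetical helper, and the calculation assumes both averages are expressed on the same effect-size scale, which the summary figures do not spell out.

```python
def relative_shrinkage(original, replication):
    """Fraction of the original effect size that disappeared on replication."""
    if original == 0:
        raise ValueError("original effect size must be non-zero")
    return (original - replication) / original


# Averages quoted in the article: 0.25 (original) and 0.10 (replication).
original_effect = 0.25
replication_effect = 0.10

shrinkage = relative_shrinkage(original_effect, replication_effect)
retained = replication_effect / original_effect

print(f"Shrinkage: {shrinkage:.0%}")  # prints "Shrinkage: 60%"
print(f"Retained: {retained:.0%}")    # prints "Retained: 40%"
```

In other words, under that assumption the average replicated effect was about two fifths the size of the originally published one.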

To understand how this looks in practice, consider a widely cited 2012 study published in the Journal of Organizational Behavior. The original paper claimed to find a strong link between extraversion and emotional attachment to an organization, suggesting that extroverted employees form deeper, warmer bonds with their workplaces. When researchers at the University of Virginia, including artificial intelligence expert Ryan Wright, attempted to replicate this exact finding across six different universities, the link completely vanished. Measuring students' personalities and their cognitive and emotional connections to their schools revealed that being an extrovert simply did not predict a warmer bond with the institution.[4][6]

Beyond active replication with new participants, the SCORE team also tested "reproducibility"—whether independent scientists could reach the same conclusions using the original researchers' own data. They found that only 54 percent of the studies could be reproduced exactly. The primary culprit was not academic fraud or manipulated numbers, but missing information. Reproducibility was severely hampered by the fact that data was often unavailable; only a quarter of the original papers had shared their datasets openly. In many cases, the original authors had lost the files, changed institutions, or simply failed to document their analytical steps clearly enough for someone else to follow.[5]

However, the reproducibility audit also contained a silver lining that points the way forward for all of science. When researchers did share their original data and code openly, the exact reproducibility rate rose to 77 percent, and approximate reproducibility hit 91 percent. This stark contrast highlights a systemic cultural issue in how science has historically been published: the final written paper is often treated as the sole product, while the underlying data and code are discarded, hidden, or locked behind proprietary software. The SCORE findings suggest that transparency is one of the most effective tools for ensuring scientific credibility. When scientists show their work, errors can be caught, analyses can be verified, and the entire foundation of the discipline grows stronger.[5]

While a 50 percent failure rate might sound like a catastrophic crisis to the general public, many scientists view it as a triumph of the scientific method. Michael Inzlicht, a psychology professor at the University of Toronto and one of the study's authors, argues that a field willing to audit itself in public is fundamentally more trustworthy than one that claims it never errs. Failed replications reveal weaknesses in specific published findings, but they simultaneously demonstrate science's unique ability to identify and correct its own mistakes. The process of tearing down weak theories to build stronger ones is exactly how empirical research is supposed to function.[2]

The so-called "replication crisis" has already forced a massive, field-wide self-correction over the past decade. Researchers are increasingly adopting a practice known as "preregistration," where they publicly declare their hypotheses, sample sizes, and analytical methods before collecting a single data point. This prevents scientists from massaging their numbers after the fact—a practice known as p-hacking—to find a publishable, statistically significant result. Because of these sweeping reforms, reformers argue that the social science being published today looks nothing like the fragile literature of the early 2010s. Journals now demand larger sample sizes, mandate data sharing, and publish negative results that would have previously been buried in a file drawer.[2][4]
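Why p-hacking produces so many fragile findings can be illustrated with a small simulation. The sketch below is an illustration under simplifying assumptions, not a model of any study in the SCORE audit: every simulated study measures pure noise, outcomes are independent, and significance is judged with a simple z-test with known variance. The function names and parameters (`false_positive_rate`, `outcomes_tested`, and so on) are hypothetical. A "preregistered" study tests one declared outcome; a "p-hacked" study tests ten and reports whichever looks best.

```python
import random
from statistics import NormalDist, mean


def two_sided_p_value(sample, sigma=1.0):
    """p-value of a two-sided z-test of H0: true mean == 0, with known sigma."""
    z = mean(sample) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(outcomes_tested, n_studies=2000, n_per_study=30,
                        alpha=0.05, seed=0):
    """Share of no-effect studies that report at least one 'significant' outcome."""
    rng = random.Random(seed)
    significant_studies = 0
    for _ in range(n_studies):
        # Every outcome is pure noise, so any significant result is a false positive.
        p_values = [
            two_sided_p_value([rng.gauss(0.0, 1.0) for _ in range(n_per_study)])
            for _ in range(outcomes_tested)
        ]
        # The study "finds something" if its best-looking outcome clears alpha.
        if min(p_values) < alpha:
            significant_studies += 1
    return significant_studies / n_studies


if __name__ == "__main__":
    preregistered = false_positive_rate(outcomes_tested=1)
    p_hacked = false_positive_rate(outcomes_tested=10)
    print(f"One preregistered outcome: {preregistered:.1%} false positives")
    print(f"Best of ten outcomes:      {p_hacked:.1%} false positives")
```

With one declared outcome, the false-positive rate should land near the nominal 5 percent. With ten independent outcomes to choose from, theory puts it at 1 - 0.95^10, or roughly 40 percent, even though no real effect exists anywhere in the simulated data.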

The stakes of getting this right extend far beyond academic debates and university seminars. Social science research routinely shapes public policy, corporate management strategies, educational standards, and clinical treatments for mental illness. As Inzlicht points out, psychology sits at the absolute heart of how society treats mental health conditions. If the underlying foundational work isn't credible, the real-world interventions and treatment recommendations built upon it will inevitably fail, costing time, money, and human well-being. Ensuring the reliability of this research is a profound public health imperative. When a study claims that a certain behavioral intervention reduces depression or that a specific teaching method accelerates childhood literacy, school districts and hospitals spend millions of dollars implementing those findings. The SCORE project underscores that adopting sweeping policy changes based on a single, unverified study is a recipe for systemic failure.[2]

The SCORE project's findings suggest that single studies should rarely, if ever, be treated as canonical truth. Tim Errington, the head of research at the Center for Open Science, notes that fresh methods or analyses can legitimately lead to distinct results depending on the context. Instead of taking papers at face value, researchers, journalists, and the public must view individual papers as single pieces of a much larger puzzle. True scientific consensus requires accumulation, verification, and rigorous stress-testing over time before a finding can be safely translated into widespread policy. The era of the 'lone genius' producing a single, world-changing paper is giving way to a more industrial, collaborative model of science where massive teams verify claims before they are accepted as fact.[7]

Ultimately, the replication crisis is not the death of social science, but its painful, necessary maturation. By embracing transparency, open data, and rigorous self-auditing, the field is rebuilding its foundations on solid ground. The researchers who spent seven years double-checking the work of their peers have provided a vital public service, proving that science is not a collection of infallible facts, but a continuous, self-correcting process. As the reforms of the past decade take root, the textbooks of tomorrow will be based on evidence that actually holds up to the light. The willingness to look inward and admit systemic flaws is the ultimate hallmark of scientific integrity, ensuring that future discoveries in human behavior and psychology will be robust enough to truly improve society.[2][4]

How we got here

Early 2010s
Early replication efforts in psychology reveal widespread issues, giving rise to the term 'replication crisis.'
2019
The Center for Open Science launches the SCORE project to audit a massive swath of social science literature.
April 2026
The SCORE project publishes its findings in Nature, revealing a 49 percent replication rate across 164 studies.

Viewpoints in depth

Metascientists and Reformers

This camp believes the replication crisis exposes deep systemic flaws in how science is published and incentivized.

Researchers in this camp, including those at the Center for Open Science, argue that the academic pressure to 'publish or perish' has historically rewarded flashy, surprising findings over rigorous, boring truths. They advocate for a complete overhaul of the scientific publishing model, pushing for mandatory open data, preregistration of hypotheses, and the routine publication of 'negative' results where an experiment finds nothing. To them, the 49 percent replication rate is strong evidence that the old way of doing science is fundamentally broken.

Optimistic Researchers

This camp views the replication crisis not as a failure, but as a triumphant demonstration of science's ability to self-correct.

Scientists like Michael Inzlicht emphasize that the willingness to publicly audit and retract flawed work is the exact mechanism that separates science from dogma. They point out that the field of psychology looks radically different—and much more rigorous—than it did a decade ago. From this perspective, the replication crisis is a painful but necessary adolescence. A discipline that actively hunts down its own errors and corrects them is ultimately more trustworthy than one that projects an illusion of infallibility.

Contextualists

This camp argues that human behavior is fluid, meaning some failed replications might simply reflect changing times or different demographics.

Some traditional researchers caution against throwing out classic studies entirely. They argue that unlike physics, where an electron behaves the same way everywhere, social science studies human beings who are heavily influenced by culture, history, and context. A study on workplace extraversion conducted in 2012 might fail to replicate in 2026 not because the original math was wrong, but because remote work and generational shifts have fundamentally changed how people bond with their employers. However, the SCORE project noted that many studies failed consistently across multiple new contexts, challenging this defense.

What we don't know

Whether the integration of artificial intelligence into the peer-review process will be able to accurately predict which studies will fail to replicate before they are published.
How many currently accepted clinical treatments or educational policies are based on foundational studies that would fail a modern replication attempt.

Key terms

Replication: Running an entirely new experiment with new participants to see if an original finding holds up.
Reproducibility: Re-analyzing the original data from a study to see if the same mathematical conclusions can be reached.
Effect Size: A statistical measure of how strong a relationship is, or how large of an impact an intervention has.
Preregistration: The practice of publicly declaring a study's hypothesis and methods before collecting any data, preventing retroactive manipulation.
P-hacking: The misuse of data analysis, such as trying many analyses and reporting only the one that works, to find patterns that can be presented as statistically significant even when no real effect exists.

Frequently asked

Does a failed replication mean the original researchers committed fraud?

Rarely. Most replication failures are due to small sample sizes, missing data, or the original study capturing a statistical fluke rather than intentional misconduct.

Is the replication crisis limited to psychology?

No. The SCORE project found similar replication issues across 11 disciplines, including economics, educational science, and organizational behavior. Medicine and biology have also faced similar audits.

How is the scientific community fixing this problem?

Researchers are increasingly adopting 'open science' practices, such as sharing their raw data, publishing their code, and preregistering their hypotheses before conducting experiments.

Should we stop trusting social science research?

No. Experts advise treating single, isolated studies as pieces of a larger puzzle rather than absolute truth, and waiting for multiple replications before changing policies or behaviors.

Sources

[1] Forbes (Metascientists and Reformers): Results From A Massive Research Project Investigating Whether Previously Reported Scientific Results Can Be Replicated Raises Questions About Their Reliability
[2] The College Fix (Optimistic Researchers): Only 50% of social science research can be replicated, study finds
[3] Science Friday (Metascientists and Reformers): Why so many studies can't be replicated
[4] SpaceDaily (Contextualists): This isn't the first time we've been here
[5] Karolinska Institutet (Optimistic Researchers): Half of social science results cannot be replicated
[6] University of Virginia (Contextualists): New, massive study finds only half of social science findings replicate
[7] Nature (Metascientists and Reformers): Half of social-science studies fail replication test in years-long project
