In bed with the enemy: how to fix science

Calling ceasefire in the scientific turf war

The way we do science is broken. It is frequently inefficient, unreliable and even totally invalid, and the road to consensus is fraught with hostility and academic pettiness. Cory Clark and Philip Tetlock argue for an alternative to the critique-reply-rejoinder format, one with the radical potential to transform the pursuit of knowledge.


If we were naïve observers, we might think of scientists as earnest detectives—carefully sifting through the evidence, pursuing all reasonable leads, and updating their beliefs as needed. We might imagine that scientists get together, exchange notes, form brilliant and empirically accurate beliefs, and then share these state-of-the-science ideas with the public. These ideas usually would be true and thus form a reliable basis for designing effective interventions and policies.

To be sure, science has accomplished remarkable feats, from vaccines to spacecraft, and it is the single most effective mode of knowledge formation to date. But science is far from the idealistic version portrayed above. It can also be inefficient, hostile, petty, unreliable, and invalid. An average disagreement in science looks a bit like this: Scholar A forwards an idea. Scholar B contradicts it. Scholar A mocks Scholar B in a commentary, collects more data, and finds herself correct. Scholar B mocks Scholar A in a commentary, collects more data, and finds himself correct. Over time, they come to hate one another, and through the collection of more and more data, become only more convinced of their own initial beliefs. In good cases, third-party observers eventually make sense of the mess of seemingly contradictory evidence, and a “scientific consensus” is born.




Scientific Indignities

Over the past decade or so, many scholars have accepted that much of science suffers a “replication crisis”. When a group of scholars tries to conduct the exact same methodological procedures as an earlier set of scholars, they often find different (and usually much less impressive) results. This means that a great deal of science is unreliable—very similar studies do not consistently produce very similar results.

But things are a bit worse than that. A great deal of science is also invalid, meaning it is simply not true. Even highly replicable findings can be wildly misleading, such as when a highly replicable association between two variables (say, ice cream sales and shark attacks) is accompanied by a highly inaccurate causal story (purchasing ice cream causes sharks to attack). Scholars had to work very hard to detect and demonstrate the replication crisis. The validity crisis is much simpler to detect: There are countless contradictory claims in the published literature.

STEM fields are biased against women. STEM fields are biased against men.

Stereotypes are generally false. Stereotypes are generally true.

Psychological sex differences are caused by evolution. Psychological sex differences are caused by culture. Psychological sex differences mostly do not exist. Biological sex isn’t really a thing.

Such claims, at least taken at face value, cannot all be true. Either someone is horrifically wrong, or at least someone is exaggerating. Although science purports to pursue truth, science actually incentivizes such contradictions.


Be New

Prestigious academic journals and institutions incentivize scholars to forward new ideas and findings—ideas that distinguish them from earlier scholars and contribute something unique to human knowledge. In other words, scholars are incentivized to create the impression of disagreement, even if disagreements are nominal, minuscule, or even non-existent.

One way scholars accomplish this is by using vague language that cannot be directly measured. A scholar might claim that “the antiquated professors before me severely underestimated human rationality,” yet never quantify how rational these professors claimed people were, nor quantify their newer, apparently better estimate. Scholars on different sides of a debate might have nearly identical estimates of the size of some effect, but one will emphasize that their view is important enough to warrant an entire career of exploration whilst the other emphasizes that the same view is trivial enough to warrant an entire career of debunking.


Be Big

Scientific institutions also incentivize broad claims, and so scholars prefer to report that some broad statement is true or false rather than the narrow contexts in which such statements are true or false. The claim “political liberals have a pro-woman bias” makes a better headline than “self-identifying political liberals in an online sample of U.S. adults in the summer of 2022 found it less plausible that men would have evolved to be better leaders than women than that women would have evolved to be better leaders than men”. Scientists do try to increase the generalizability of their findings by testing at least a few iterations of their studies and materials, but it usually stops there.


Even time-consuming meta-analyses (which synthesize all known investigations of a particular phenomenon) are constrained by the imaginations and data access of the scholars who conducted the original studies. If a handful of measures of authoritarianism, for example, became popular in a discipline, and those measures tended to be associated with conservatism (due to the biases of the scholars who designed them, or just randomness), then a thorough meta-analysis of hundreds of studies that used those measures would show a robust association between conservatism and authoritarianism. Meanwhile, other potential measures of authoritarianism, ones that show no relation with conservatism or even an association with political liberalism, might never be created or discovered. We know of at least one case in which two meta-analyses that were conducted exactly in parallel came to opposing conclusions because the scholars applied different inclusion/exclusion criteria.


Be Right

Of course, being big and new is not sufficient. Scholars must convince the scientific community that their superficially novel and excessively broad ideas are true. This is where the real challenge comes in (even setting aside that popular and narrower claims are more likely to be true than unique and broader ones). Because scholars must disagree with other scholars, they have to prove that their (typically smart and competent) adversaries are phonies. Here, straw-manning opponents is useful.

In our ideal scientific world, scholars might look at intellectual adversaries with curiosity and interest. “Hmm… you are very smart and as well-versed in the literature as I am… how is it that you have come to a different conclusion? Come, ally! Let’s put our heads together and figure this out!” Instead, intellectual adversaries treat one another as enemies in a turf war, and so they strive to eliminate one another.


A Better Way

Adversarial collaboration (henceforth, adcollab) is a methodological procedure in which disagreeing scholars work with each other rather than against each other to resolve their empirical dispute.

First, adversaries must articulate their disagreement in terms that both sides find accurate. This eliminates the use of wishy-washy disagreement language that scholars use to make big claims with little accountability. This also prevents scholars from only confronting the strawman version of their opponent’s perspective. In our experience, these initial conversations often cause adversaries to retreat from the bailey to their motte, to realize that their opponent’s views are much more nuanced than they previously thought, and consequently, to discover that the disagreement is much smaller than previously thought.

Second, adversaries must mutually design methods that both sides consider a fair and unbiased test of their competing hypotheses. This eliminates cherry-picking of methodological procedures designed to confirm preferred hypotheses. It also removes the ability of scholars to design methods that can only confirm preferred hypotheses while writing off failed tests as studies that simply did not “work”. Scholars must commit a priori to the diagnosticity of the study and agree that contradictory findings would at least cast some doubt on their preferred hypothesis. In our experience, this step leads scholars to develop far more rigorous methods as each side vetoes the blatantly rigged procedures that their opponents prefer. And this leads to more efficient tests because the results are informative no matter how they turn out.

Third, adversaries must mutually write and publish the results. This eliminates the possibility of excessively broad claims. Each adversary serves as a check on their opponent to make sure the claims are duly circumspect. Such reports will be less likely to forward unwarranted promises that lead other scholars, policymakers, and interventionists down expensive dead ends.

Adcollabs should help reduce publication biases in the scientific literature. File-drawering studies is not an option—even if one side wishes to suppress the outcome, the other side likely would not. And because the results are diagnostic regardless of the outcome, they will tend to be publication worthy.

Adcollabs help solve another problem too. Academics, and especially those in the human behavioural sciences, are overwhelmingly left-leaning politically. This creates huge potential for the scientific literature to be systematically biased toward the shared preconceptions and preferences of authors, reviewers, and editors. There is virtually no reason to hope that the ideological skew will change anytime in the foreseeable future. Indeed, given historical trends and scholars’ self-proclaimed willingness to discriminate against the few conservatives who try to enter academia, things are likely to become only more lop-sided.



Adcollabs will not create ideological balance. But recall that scholars are incentivized to disagree. So although it may be near impossible to find scholars who disagree with the entire progressive worldview, it is much easier to find scholars who will disagree with one aspect of that worldview. There are some empirical conclusions that are so taboo that social costs far outweigh the potential benefits of challenging popular perspectives, and so few scholars would dare forward the heterodox perspective. But for many relevant debates, there are one-issue renegades who will challenge mainstream views. And when there are, adcollabs allow them to tackle mainstream perspectives mano a mano on a level playing field.

Calling All One-Issue Renegades

The practice of collaborating with adversaries has been around since at least 1988, and the term “adversarial collaboration” has been around since at least 2001, when Nobel Laureate Daniel Kahneman collaborated with adversary Ralph Hertwig on a disagreement regarding the conjunction fallacy in a project led by an arbiter, Barb Mellers. This straightforward and simple idea, that scholars who publish contradictory conclusions should work together rather than separately, has been around for over 20 years. Yet to our knowledge, only a few dozen scholars have tried it out.




We can think of a few reasons why.

Adcollabs are more time-consuming than traditional collaborations because all procedures must be carefully negotiated. This meticulous process results in higher quality, but lower quantity (per unit of effort) compared to traditional collaborations. And scholars prefer to publish as many papers as quickly as possible.

Adcollabs seem awkward. Humans generally avoid interpersonal conflict (despite appearances on Twitter), and adcollabs seem likely to result in some uncomfortable conversations. This is true—we have observed some contentiousness in our own adcollabs. But adversaries often start out wary of one another, and adcollabs can make friends of enemies. In our experience, adcollabs are more likely to improve relationships than worsen them.

Perhaps the biggest barrier of all is fear of undermining one’s own research program. This is a possible outcome, but in our experience, adcollabs tend to reveal in which contexts each side is more or less correct rather than that only one side is correct. Even in the worst-case scenario of catastrophic failure, in our view, it is better for scholars to correct their own mistakes sooner than to have them corrected later by other scholars.




Although adcollabs may seem risky and unfamiliar, they almost certainly produce truer and more nuanced information more efficiently than traditional approaches. Faster and more accurate science means quicker and more reliable solutions to pressing societal challenges. Scientists still might not want to participate because the good of science and society does not outweigh the good of their own careers. But we think this view is mistaken. Adcollabs could help them be better scientists with more lasting impact. And adcollabs could transform the culture of science to one where adversaries are viewed with openness and curiosity, and where scholars who update their prior hypotheses and theories based on new evidence are respected and admired rather than humiliated. But don’t take our word for it. Try one out and see for yourself.
