Published at MetaROR
November 19, 2025
The Epistemic Function of Replication: Mapping the Domains of the Debate
Bence Orkeny
Originally published on May 23, 2025 at:
Editors
Kathryn Zeiler
Jennifer Anne Byrne
Editorial Assessment
by Jennifer Anne Byrne
This study reviews and categorises 15 publications that debate the epistemic function of replications. The 14 articles and one book were categorised according to whether and how they addressed 3 domains, namely (i) the relationship between a replication and its original study, (ii) the role of replications within specific disciplines, and (iii) the significance of replications for science overall. While recognising the interest in these questions, the reviewers described a number of ways in which the manuscript could be improved.
All reviewers highlighted the need to describe how the 15 publications were selected. The analysis of a relatively small number of publications was viewed as limiting the generalisability of any conclusions, such as how the specific terms used to describe replications and/or published descriptions of replications may have changed over time, and what evidence gaps remain. Reviewers also queried elements of the methodology used to analyse the 15 publications, highlighting the need to assess entire publications. The requirement for one page of discussion to be devoted to a given domain could represent varying proportions of different publications, according to overall length and formatting, and may therefore have biased some assessments of whether individual publications addressed the domains of interest.
The manuscript described relatively few articles that have addressed all 3 domains. Reviewers questioned both this conclusion (based on the analysis of a select group of publications), and whether individual publications would be reasonably expected to address all 3 domains, while recognising the value of at least some articles taking this approach.
Recommendations from the editor
Symbols used in Table 2 should be defined as footnotes, as all symbols could be interpreted differently by individual readers.
Recommendations for enhanced transparency
- Add author ORCID ID.
- Add a competing interest statement. Authors should report all competing interests, including not only financial interests, but any role, relationship, or commitment of an author that presents an actual or perceived threat to the integrity or independence of the research presented in the article. If no competing interests exist, authors should explicitly state this.
- Add a funding source statement. Authors should report all funding in support of the research presented in the article. Grant reference numbers should be included. If no funding sources exist, explicitly state this in the article.
Peer Review 1
This work presents a brief review and categorization of articles debating the epistemic function of replication between 1985 and 2023. While the topic is interesting, the lack of description of the review methodology precludes drawing general conclusions from the sample.
Major points:
- How were the 15 papers selected? Is there any kind of systematic process underlying the selection? If there is not, how can we assess whether these articles are indeed representative of the field (or “a snapshot of the most representative arguments”, as mentioned in the introduction)? This is a very important omission in the methods and compromises the value of the work, as it calls its general conclusions into question (see points below). Even if a nonsystematic approach has been used, a minimal description of the review methodology (including search method and inclusion criteria) is needed for one to make sense of the findings.
- The categorization scheme used for the articles is useful as a general descriptor of the research base that was reviewed, but I personally don’t think that “evidence gaps” can be inferred from it. In particular, I don’t see a problem with having few papers addressing all three domains (i.e. the relation between a replication and the replicated study, the connection of replication to particular disciplines, and the connection of replication with science as a whole), as the author seems to imply. These purposes can be filled by different articles, so the absence of all three approaches within a single paper is not necessarily a gap.
If the author wants to identify research gaps, it would be more informative to qualitatively analyze the articles rather than rely on this simple categorization. That said, I would argue that one can only speak about “gaps” when an attempt to systematically review the literature has been made, which as far as I could understand does not seem to be the case here.
- Similarly, I think the absence of a formal review methodology makes the generalizations about how the discussion of replications in the literature has evolved over time unwarranted. For making a statement on temporal evolution, one would need a sample that is representative of the literature. Otherwise, these patterns might arise as a byproduct of the particular articles that were selected, and there is really no way to tell whether they reflect the literature as a whole in the absence of a systematic review.
Other conceptual points:
- Contrary to what the author argues on page 3, I don’t think the language around replication and reproducibility has stabilized (see for example https://doi.org/10.31222/osf.io/entu4, https://osf.io/ewybt). This may have happened to some extent within specific communities (notably the one centered around reproducibility in psychology), but language is still very inconsistent within and between many research fields.
- I don’t really think that Ioannidis’ “Why most published research findings are false” pertains to the sample, as it does not seem to discuss the epistemic function of replication. The author seems to acknowledge this, but still includes the article arguing that it raises relevant concerns (which to me reflects the absence of explicit inclusion criteria) and that “between the 1990s and 2010s it was among the most influential—and almost the only—article to engage critically with this topic”. The “almost the only” part of this statement seems exaggerated to me, and probably reflects the non-systematic nature of the search. For examples of articles dealing with similar questions as Ioannidis’ in the medical field during that period, for example, see
Minor points and wording:
- Page 1: “psychology”, not “phycology”
- Page 1: “Many scholars have argued…” – this whole sentence needs references for both sides of the debate.
- Page 2: The description of topic (2) in the introduction (i.e. “the connection between replication as a practice and a broader scientific discipline”) seems confusing as it stands and could be better worded.
- Page 5: “broader role of replication”, not “broader role replication”.
- Page 5: “For each of these three domains of focus I have identified three positions in the debate: 1) articles highlighting that replication has an important epistemic function in relation to the specific scope (marked with “+”) 2) articles critical regarding the epistemic value of replication in the given context (marked with and “–”) 3) articles not expressing an explicit argument, either supportive or critical, regarding the epistemic value of replication in the specific domain (marked with “N/A”).”
The distinction between “+” and “-“ (which seems to be a matter of supporting vs. questioning the epistemic value of reproducibility) is hard to follow when you use different words in points (1) and (2). Using “specific scope” for (1) and “given context” for (2) when referring to the same concepts may avoid repetition, but unnecessarily confuses the reader about the distinction between the two categories.
- Page 6: “I only marked those articles in the table with “+” or “–” that offered a substantial argument”. Stating this as “I only marked with “+” or “-“ those articles in the table that offered a substantial argument” would be clearer.
Peer Review 2
The author attempts to provide a review of how epistemic functions of replication and reproducibility are presented in 15 sources of the ongoing replication debate. To that end, the author categorized sources according to how they engage with three “domains of focus” (p. 2): 1) how the replicated/reproduced study relates to the original, 2) what the role of replication is in a specific discipline, and 3) what the role of replication is in science more generally. The described method of categorization is based on looking at abstracts and conclusions and checking whether at least one page of the discussion is dedicated to a specific domain (irrespective of the length of the sources). Based on this review the author concludes that there are “notable gaps in the literature” (p. 8), with only a few sources engaging with and integrating all three domains. According to the author, only Simons (2014) and Sikorski and Andreoletti (2024) integrate all three domains, and the literature needs more such contributions. The author also makes multiple general claims about the replication discourse, such as that the replication debate had an initial period of enthusiasm, followed by a period characterized by a more critical attitude, and has recently returned to a period of nuanced enthusiasm.
In the following I will first go through some specific problems I see with the content of the manuscript, then I will list some minor formalities. Lastly, I will provide a brief recommendation for how to proceed with this manuscript.
Specific problems with the content
On page 1 the author states that: “Many scholars have argued that the crisis reveals a crucial epistemic problem in contemporary research practices, while others have criticized either the severity or the relevance of the ‘crisis’ narrative.” However, there are references missing for such a statement. The author could for instance use Peterson and Panofsky (2023), Fanelli (2018), or Feest (2019, 2024). More than one reference should be provided, because the author states “many scholars argued …”.
While the author claims to review the literature on the epistemic functions of replication, he does not address the relevant literature on that topic. Two examples are Schmidt (2009), which is something of an obligatory passage point when conceptual and direct replication are defined in psychology, and Matarese (2022), who provided a review of kinds of replication and also categorized them based on their epistemic functions in her functional approach (also see Albertoni et al., 2023, who call epistemic functions ‘reasons’ in their review of reproducibility terminology in machine learning). The author might also want to check Haig (2022).
It remains unclear why those 15 sources were selected for analysis; if the selection was random, that is okay, but it would still be important to understand why those 15 sources are analysed. Moreover, starting on page 2 and throughout the manuscript the author makes claims about the wider replication discourse based on the selected 15 publications. There is no problem with a focused in-depth review of those publications. However, the analysis of the sources remains quite superficial, and the inferences and claims about the general replication discourse are unwarranted based on what was analysed. Relatedly, on page 2 the author claims that the arguments about the functions of replication captured in the 15 publications are “most representative” of the last two decades. It remains totally unclear how that is established; as it stands, this is an unwarranted claim.
Also on page 2, the author states that the terms replicability and reproducibility are used interchangeably, because there is enough agreement in the literature to justify such a usage. Using these terms interchangeably is okay, as long as the author informs the reader about that circumstance. However, the claim that such a use is justified because there is enough agreement in the literature or among scholars seems to be in contradiction with existing reviews on the topic (see e.g., Barba, 2018).
The author states that “A common view is that the language around replication and reproducibility stabilized around 2018–2019” (pp. 2-3) and on page 3 continues with “Even after the replication crisis, when replication became the dominant term in official discourse around 2018”. Such a claim requires evidence, for instance in the form of references. If the claim is that this is a common view, such agreement needs to be shown; it cannot just be stated without any support. If it is a common view, where are the references in support of that view? If the sentences following this claim are meant as support for it, then this is further problematic because the literature seems misrepresented. For example, it is correct that Peels and Bouter (2023) only use the term reproducibility in footnote 1, but they use the term “reproduction” in their distinction between three types of replication: “Replication with existing data and the same research protocol: re-analysis of the data from the primary study with the primary research question. This we refer to as a “reproduction”. It can be argued that a publication should first be checked for numerical inconsistencies in the data and in the statistical parameters. Then a re-analysis of the existing data with the same data-analysis plan comes in view, potentially followed by analyses of the existing data with one or more alternative data-analyses plans. When these tests are passed satisfactorily it can be concluded that the findings of the primary study are robust (Nuijten 2021).” (Peels & Bouter, 2023, p. 79). Furthermore, my personal impression of the literature and my own work on the issue, in combination with existing reviews, would suggest the opposite: terminological proliferation and confusion have increased since the claims of a replication crisis started to circulate, and there is no emerging agreement on terminology in sight (see e.g., Gundersen, 2021; Ulpts & Schneider, 2024).
On page 3 the author claims that “There is a significant gap in the literature between the publications by Collins and Radder and the next major contribution: Ioannidis’ influential article, Why Most Published Research Findings Are False (Ioannidis, 2005).” This does not seem to be true considering the literature on replication and reproducibility between 1992 and 2005. Here are some examples to consider: Basili et al. (1999), Bogen (2001), Chen (1994), Easley et al. (2000), Franklin (1999), Hunter (2001), King (1995), Pesaran (2003), and Tsang & Kwan (1999).
The method described on page 4 for categorizing the publications seems problematic. The author states to have categorized the selected literature within the three domains of focus “by looking at the abstracts and conclusions and additionally checking whether they have dedicated at least a page of discussion to a given domain”. It is just 15 sources; for a proper analysis, categorization, and representation of content, why not just read and analyse them from beginning to end? The approach described is flawed and too superficial for the intended purpose. Consequently, the title of the manuscript is also misleading and does not capture what is actually attempted in the paper: there is no mapping of the debate, only an attempt at capturing the discourse on the epistemic function of replication in 15 selected sources.
As an example of the superficial representation of the content of the selected articles: in the categorization of the 15 sources, the author claims on page 5 about Leonelli (2018) that “Leonelli’s project is not primarily concerned with the epistemic relation between the reproduced and original study; instead, her focus is on analyzing how reproduction is used in relation to different methodologies across fields”. This is not untrue, but in her six types of reproducibility she actually does elaborate on how the original study relates to the reproducing study, as well as what function might be addressed. This apparent mischaracterization of Leonelli (2018) and other sources is probably a consequence of the improper analysis/review method used, which does not consider the full text. The chosen analysis and categorization approach is further problematic considering that the length of the 15 sources varies considerably, with Collins (1985) being a book. Furthermore, on page 4 the author also lists Norton (2015) as a source that does not acknowledge disciplinary contexts (“the role of disciplines”), which seems inaccurate considering that his perspective is based on an inductive approach to experimentation applied to four cases from at least three different disciplines.
On page 6 the author, referring to Guttinger (2020), states: “Guttinger is clearly taking a critical stance regarding the epistemic function of replication in the domain of science.” I do not see how such a statement is justified. Guttinger (2020) is critical in reaction to other scholars claiming the general importance and relevance of replication as an epistemic practice and criterion, but he is not generally critical of the role or function of replication in the domain of science. He states that replication is a local problem and practice; that does not mean he is critical of its function for the domain of science. The analysis and representation of this literature seem out of context and unsubstantiated, especially considering that much of these discussions (“new localism”) started to emerge after Peels and Bouter (2018) promoted the idea of replication in the humanities.
The identification of “research gaps” starting on page 7 based on the categorization is wanting not only due to the problematic review methodology, but also because the categorization seems superficial. Consider just the domain focused on the role of replication in science: it might be a lack of nuance, neglect, or carelessness on the part of the authors of the selected sources (Simons, 2014; Sikorski & Andreoletti, 2024), rather than an actual consideration of the role of replication in science, that led to the (+) in that domain in Table 2. Simons (2014), especially, is clearly psychology-focused, and its talk of science more generally does not seem to reflect an elaborate consideration of the role of replication in science.
Minor Formalities
There are some inconsistencies regarding the years of publication for the sources used. In Table 1, for instance, Sikorski and Andreoletti is listed as (2023), but in the reference list it is listed as (2024). Additionally, Lynch et al. is listed in Table 1 as (2014), but in the reference list as (2015). While Machery is in Table 1 listed as (2019) and in the references as (2020). The Schneider (2000) reference from footnote 2 is not in the reference list. In Table 1 for Ioannidis (2005) the listing of key points is off and for Guttinger (2020) there are more points listed than made.
Recommendation
All in all, the intention behind the manuscript is worthwhile, and elaborating on the epistemic functions behind replication and reproducibility in disciplinary contexts, as well as their role in the whole of science, is an important endeavour, but the analysis presented in this manuscript is wanting and problematic. This “snapshot” should be situated within the existing literature, and especially within reviews on the epistemic functions of reproducibility and replication (Albertoni et al., 2023; Matarese, 2022; Ulpts & Schneider, 2024). The inclusion criteria for the 15 sources should be made transparent and argued for in the context of the intended purpose of the work. Having only 16 references in a manuscript that reviews 15 sources is an indication that something is lacking. Furthermore, to ensure that the selected sources are appropriately represented and categorized, the full text needs to be considered, not just the abstract, the conclusion, and whether at least one page of the discussion is dedicated to a specific domain. Therefore, major revisions are required, not just to ensure that the claims made are justified, but also to avoid creating the impression of research gaps where there are none.
References
Albertoni, R., Colantonio, S., Skrzypczyński, P., & Stefanowski, J. (2023). Reproducibility of Machine Learning: Terminology, Recommendations and Open Issues (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2302.12691
Barba, L. A. (2018). Terminologies for Reproducible Research (Version 1). arXiv. https://doi.org/10.48550/ARXIV.1802.03311
Basili, V. R., Shull, F., & Lanubile, F. (1999). Building knowledge through families of experiments. IEEE Transactions on Software Engineering, 25(4), 456–473. https://doi.org/10.1109/32.799939
Bogen, J. (2001). ‘Two as good as a hundred’: Poorly replicated evidence in some nineteenth-century neuroscientific research. Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 32(3), 491–533. https://doi.org/10.1016/S1369-8486(01)00013-9
Chen, X. (1994). The rule of reproducibility and its applications in experiment appraisal. Synthese, 99(1), 87–109. https://doi.org/10.1007/BF01064532
Collins, H. M. (1985). Changing order: Replication and induction in scientific practice. Sage Publications.
Easley, R. W., Madden, C. S., & Dunn, M. G. (2000). Conducting Marketing Science. Journal of Business Research, 48(1), 83–92. https://doi.org/10.1016/S0148-2963(98)00079-4
Fanelli, D. (2018). Is science really facing a reproducibility crisis, and do we need it to? Proceedings of the National Academy of Sciences, 115(11), 2628–2631. https://doi.org/10.1073/pnas.1708272114
Feest, U. (2019). Why Replication Is Overrated. Philosophy of Science, 86(5), 895–905. https://doi.org/10.1086/705451
Feest, U. (2024). What is the Replication Crisis a Crisis Of? Philosophy of Science, 91(5), 1361–1371. https://doi.org/10.1017/psa.2024.2
Franklin, A. (1999). How to Avoid the Experimenters’ Regress. In R. S. Cohen & M. W. Wartofsky (Eds.), Can that be Right? (Vol. 199, pp. 13–38). Springer Netherlands. https://doi.org/10.1007/978-94-011-5334-8_2
Gundersen, O. E. (2021). The fundamental principles of reproducibility. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 379(2197), 20200210. https://doi.org/10.1098/rsta.2020.0210
Guttinger, S. (2020). The limits of replicability. European Journal for Philosophy of Science, 10(2), 10. https://doi.org/10.1007/s13194-019-0269-1
Haig, B. D. (2022). Understanding Replication in a Way That Is True to Science. Review of General Psychology, 26(2), 224–240. https://doi.org/10.1177/10892680211046514
Hunter, J. E. (2001). The Desperate Need for Replications. Journal of Consumer Research, 28(1), 149–158. https://doi.org/10.1086/321953
Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124
King, G. (1995). Replication, Replication. PS: Political Science and Politics, 28(3), 444. https://doi.org/10.2307/420301
Leonelli, S. (2018). Rethinking Reproducibility as a Criterion for Research Quality. In L. Fiorito, S. Scheall, & C. E. Suprinyak (Eds.), Research in the History of Economic Thought and Methodology (Vol. 36, pp. 129–146). Emerald Publishing Limited. https://doi.org/10.1108/S0743-41542018000036B009
Lynch, J. G., Bradlow, E. T., Huber, J. C., & Lehmann, D. R. (2015). Reflections on the replication corner: In praise of conceptual replications. International Journal of Research in Marketing, 32(4), 333–342. https://doi.org/10.1016/j.ijresmar.2015.09.006
Machery, E. (2020). What Is a Replication? Philosophy of Science, 87(4), 545–567. https://doi.org/10.1086/709701
Matarese, V. (2022). Kinds of replicability: Different terms and different functions. Axiomathes, 32(S2), 647–670. https://doi.org/10.1007/s10516-021-09610-2
Norton, J. D. (2015). Replicability of Experiment. THEORIA. An International Journal for Theory, History and Foundations of Science, 30(2), 229. https://doi.org/10.1387/theoria.12691
Nuijten, M. B. (2021). Assessing and improving robustness of psychological research findings in four steps. https://doi.org/10.31234/osf.io/a4bu2
Peels, R., & Bouter, L. (2018). The possibility and desirability of replication in the humanities. Palgrave Communications, 4(1), 95. https://doi.org/10.1057/s41599-018-0149-x
Peels, R., & Bouter, L. (2023). Replication and trustworthiness. Accountability in Research, 30(2), 77–87. https://doi.org/10.1080/08989621.2021.1963708
Pesaran, H. (2003). Introducing a replication section. Journal of Applied Econometrics, 18(1), 111. https://doi.org/10.1002/jae.709
Peterson, D., & Panofsky, A. (2023). Metascience as a Scientific Social Movement. Minerva, 61(2), 147–174. https://doi.org/10.1007/s11024-023-09490-3
Schmidt, S. (2009). Shall we Really do it Again? The Powerful Concept of Replication is Neglected in the Social Sciences. Review of General Psychology, 13(2), 90–100. https://doi.org/10.1037/a0015108
Sikorski, M., & Andreoletti, M. (2024). Epistemic Functions of Replicability in Experimental Sciences: Defending the Orthodox View. Foundations of Science, 29(4), 1071–1088. https://doi.org/10.1007/s10699-023-09901-4
Simons, D. J. (2014). The Value of Direct Replication. Perspectives on Psychological Science, 9(1), 76–80. https://doi.org/10.1177/1745691613514755
Tsang, E. W. K., & Kwan, K.-M. (1999). Replication and Theory Development in Organizational Science: A Critical Realist Perspective. The Academy of Management Review, 24(4), 759. https://doi.org/10.2307/259353
Ulpts, S., & Schneider, J. W. (2024, June 17). A conceptual review of uses and meanings of reproducibility and replication. https://doi.org/10.31222/osf.io/entu4
Peer Review 3
Anonymous User
The author submitted a preprint providing their perspective on relatively recent papers discussing the epistemic value of replication. The author argues that these papers collectively address three key focal areas: the relationship between a replication and the original study; replication within a specific scientific discipline; and the significance of replication for science at large. The author states that rarely are all three discussed in an individual paper. Finally, the author discusses their interpretation of the tone of the papers which, following the timeline of publication, started as enthusiastic towards replication, shifted to critical reflection, and moved to a more balanced engagement.
My comments are intended for the author to consider as they revise their manuscript with a focus on presentation and clarifying some aspects of the methodology.
- It is unclear how the author identified these articles. It was acknowledged that this was not an exhaustive list, but where did they come from? There have been positions about replication since the beginning of the modern scientific approach. I think this essay could benefit from including this perspective, particularly since the abstract says, ‘influential contributions to this debate’, which implies some method of defining what is influential.
- The author defines a paper as focusing on one of the 3 domains they identified by checking the abstracts and conclusions, as well as whether the article includes at least a page of discussion of a given domain. I don’t understand the last piece. Why 1 page? It would seem a percentage of the total article length would be more reasonable. That is, papers are different lengths, so a 1-page determination would create a bias towards shorter overall article lengths.
- The author states there was a surprising lack of articles engaging with all three domains. Why is this surprising? Is it necessary that articles would do this? That is, the author identified these three domains in their essay, so why would authors of previous papers be expected to map onto them? Each paper is a contribution, so I would not find this surprising but expected, as the conversation/debate unfolds. It is akin to how no single research paper is ‘definitive’ on any given topic. Each paper (or more accurately, study) is a piece of a puzzle. It’s why we do systematic reviews and meta-analyses: they provide a different lens on the status of what is and isn’t happening. I would think the essay by the author is just that: attempting to understand what has happened. All this to say, I was struck by this claim, as I do not find it surprising, so it might be useful to hear why the author does. I do appreciate the author’s position, though, that it is beneficial to have perspectives across these 3 domains, potentially even in a single perspective paper.
- The abstract could be revised to emphasize that this is a preliminary approach (i.e., what the author acknowledges at the end of the ‘Research Gaps’ section), and that the assessment is based on the author’s interpretation of the papers (e.g., the statement at the end of the ‘Categorization of Articles’ section). I think that is critical and will help frame this as an early contribution to understanding and advancing the epistemic function of replication in science.
- Minor: there is a misspelled word on page 6, ‘revsied’ instead of ‘revised’.
Again, my review is intended to provide, what I hope, is useful feedback to the author about their perspective. Thank you for giving me the chance to add my own perspective to this piece.


