Published at MetaROR
November 19, 2025
The Epistemic Function of Replication: Mapping the Domains of the Debate
Bence Orkeny
Originally published on May 23, 2025 at:
Editors
Kathryn Zeiler
Jennifer Anne Byrne
Editorial Assessment
by Jennifer Anne Byrne
This study reviews and categorises 15 publications that debate the epistemic function of replications. The 14 articles and one book were categorised according to whether and how they addressed 3 domains, namely (i) the relationship between a replication and its original study, (ii) the role of replications within specific disciplines, and (iii) the significance of replications for science overall. While recognising the interest in these questions, the reviewers described a number of ways in which the manuscript could be improved.
All reviewers highlighted the need to describe how the 15 publications were selected. The analysis of a relatively small number of publications was viewed as limiting the generalisability of any conclusions, such as how the specific terms used to describe replications and/or published descriptions of replications may have changed over time, and what evidence gaps remain. Reviewers also queried elements of the methodology used to analyse the 15 publications, highlighting the need to assess entire publications. The requirement for one page of discussion to be devoted to a given domain could represent varying proportions of different publications, according to overall length and formatting, and may therefore have biased some assessments of whether individual publications addressed the domains of interest.
The manuscript described relatively few articles that have addressed all 3 domains. Reviewers questioned both this conclusion (based on the analysis of a select group of publications), and whether individual publications would be reasonably expected to address all 3 domains, while recognising the value of at least some articles taking this approach.
Recommendations from the editor
Symbols used in Table 2 should be defined as footnotes, as all symbols could be interpreted differently by individual readers.
Recommendations for enhanced transparency
- Add author ORCID ID.
- Add a competing interest statement. Authors should report all competing interests, including not only financial interests, but any role, relationship, or commitment of an author that presents an actual or perceived threat to the integrity or independence of the research presented in the article. If no competing interests exist, authors should explicitly state this.
- Add a funding source statement. Authors should report all funding in support of the research presented in the article. Grant reference numbers should be included. If no funding sources exist, explicitly state this in the article.
Peer Review 1
This work presents a brief review and categorization of articles debating the epistemic function of replication between 1985 and 2023. While the topic is interesting, the lack of description of the review methodology precludes drawing general conclusions from the sample.
Major points:
- How were the 15 papers selected? Is there any kind of systematic process underlying the selection? If there is not, how can we assess whether these articles are indeed representative of the field (or “a snapshot of the most representative arguments”, as mentioned in the introduction)? This is a very important omission in the methods and compromises the value of the work, as it calls its general conclusions into question (see points below). Even if a nonsystematic approach has been used, a minimal description of the review methodology (including search method and inclusion criteria) is needed for one to make sense of the findings.
- The categorization scheme used for the articles is useful as a general descriptor of the research base that was reviewed, but I personally don’t think that “evidence gaps” can be inferred from it. In particular, I don’t see a problem with having few papers addressing all three domains (i.e. the relation between a replication and the replicated study, the connection of replication to particular disciplines, and the connection of replication with science as a whole), as the author seems to imply. These purposes can be filled by different articles, so the absence of all three approaches within a single paper is not necessarily a gap.
If the author wants to identify research gaps, it would be more informative to qualitatively analyze the articles rather than rely on this simple categorization. That said, I would argue that one can only speak about “gaps” when an attempt to systematically review the literature has been made, which as far as I could understand does not seem to be the case here.
- Similarly, I think the absence of a formal review methodology makes the generalizations about how the discussion of replications in the literature has evolved over time unwarranted. For making a statement on temporal evolution, one would need a sample that is representative of the literature. Otherwise, these patterns might arise as a byproduct of the particular articles that were selected, and there is really no way to tell whether they reflect the literature as a whole in the absence of a systematic review.
Other conceptual points:
- Contrary to what the author argues on page 3, I don’t think the language around replication and reproducibility has stabilized (see for example https://doi.org/10.31222/osf.io/entu4, https://osf.io/ewybt). This may have happened to some extent within specific communities (notably the one centered around reproducibility in psychology), but language is still very inconsistent within and between many research fields.
- I don’t really think that Ioannidis’ “Why most published research findings are false” pertains to the sample, as it does not seem to discuss the epistemic function of replication. The author seems to acknowledge this, but still includes the article arguing that it raises relevant concerns (which to me reflects the absence of explicit inclusion criteria) and that “between the 1990s and 2010s it was among the most influential—and almost the only—article to engage critically with this topic”. The “almost the only” part of this statement seems exaggerated to me, and probably reflects the non-systematic nature of the search. For examples of articles dealing with similar questions as Ioannidis’ in the medical field during that period, for example, see
Minor points and wording:
- Page 1: “psychology”, not “phycology”
- Page 1: “Many scholars have argued…” – this whole sentence needs references for both sides of the debate.
- Page 2: The description of topic (2) in the introduction (i.e. “the connection between replication as a practice and a broader scientific discipline”) seems confusing as it stands and could be better worded.
- Page 5: “broader role of replication”, not “broader role replication”.
- Page 5: “For each of these three domains of focus I have identified three positions in the debate: 1) articles highlighting that replication has an important epistemic function in relation to the specific scope (marked with “+”) 2) articles critical regarding the epistemic value of replication in the given context (marked with and “–”) 3) articles not expressing an explicit argument, either supportive or critical, regarding the epistemic value of replication in the specific domain (marked with “N/A”).”
The distinction between “+” and “-“ (which seems to be a matter of supporting vs. questioning the epistemic value of reproducibility) is hard to follow when you use different words in points (1) and (2). Using “specific scope” for (1) and “given context” for (2) when referring to the same concepts may avoid repetition, but unnecessarily confuses the reader about the distinction between the two categories.
- Page 6: “I only marked those articles in the table with “+” or “–” that offered a substantial argument”. Stating this as “I only marked with “+” or “-“ those articles in the table that offered a substantial argument” would be clearer.
Peer Review 2
The author attempts to provide a review of how epistemic functions of replication and reproducibility are presented in 15 sources of the ongoing replication debate. To that end, the author categorized sources according to how they engage with three “domains of focus” (p. 2): 1) how the replicated/reproduced study relates to the original, 2) what the role of replication is in a specific discipline, and 3) what the role of replication is in science more generally. The described method of categorization is based on looking at abstracts and conclusions and checking whether at least one page of the discussion is dedicated to a specific domain (irrespective of the length of the sources). Based on this review the author concludes that there are “notable gaps in the literature” (p. 8), with only a few sources engaging with and integrating all three domains. According to the author, only Simons (2014) and Sikorski and Andreoletti (2024) integrate all three domains, and the literature needs more such contributions. The author also makes multiple general claims about the replication discourse, such as that the replication debate had an initial period of enthusiasm, followed by a period characterized by a more critical attitude, and has recently returned to a period of nuanced enthusiasm.
In the following I will first go through some specific problems I see with the content of the manuscript, then I will list some minor formalities. Lastly, I will provide a brief recommendation for how to proceed with this manuscript.
Specific problems with the content
On page 1 the author states that: “Many scholars have argued that the crisis reveals a crucial epistemic problem in contemporary research practices, while others have criticized either the severity or the relevance of the ‘crisis’ narrative.” However, there are references missing for such a statement. The author could for instance use Peterson and Panofsky (2023), Fanelli (2018), or Feest (2019, 2024). More than one reference should be provided, because the author states “many scholars argued …”.
While the author claims to review the literature on the epistemic functions of replication, he does not address the relevant literature on that topic. Two examples are Schmidt (2009), which is something of an obligatory passage point when conceptual and direct replication are defined in psychology, and Matarese (2022), who provided a review of kinds of replication and also categorized them based on their epistemic functions in her functional approach (also see Albertoni et al., 2023, who call epistemic functions ‘reasons’ in their review of reproducibility terminology in machine learning). The author might also want to check Haig (2022).
It remains unclear why those 15 sources were selected for analysis; if the selection was random, that is okay, but it would still be important to understand why those 15 sources are analysed. Moreover, starting on page 2 and throughout the manuscript the author makes claims about the wider replication discourse based on the selected 15 publications. There is no problem with a focused in-depth review of those publications. However, the analysis of the sources remains quite superficial, and the inferences and claims about the general replication discourse are unwarranted based on what was analysed. Relatedly, on page 2 the author claims that the arguments about the functions of replication captured in the 15 publications are “most representative” of the last two decades. It remains totally unclear how that is established; as it stands, this is an unwarranted claim.
Also on page 2, the author states that the terms replicability and reproducibility are used interchangeably, because there is enough agreement in the literature to justify such a usage. Using these terms interchangeably is okay, as long as the author informs the reader about that circumstance. However, the claim that such a use is justified because there is enough agreement in the literature or among scholars seems to be in contradiction with existing reviews on the topic (see e.g., Barba, 2018).
The author states that “A common view is that the language around replication and reproducibility stabilized around 2018–2019” (pp. 2-3) and on page 3 continues with “Even after the replication crisis, when replication became the dominant term in official discourse around 2018”. Such a claim requires evidence, for instance in the form of references. If the claim is that this is a common view, such agreement needs to be shown; it cannot just be stated without any support. If it is a common view, where are the references in support of that view? If the sentences following this claim are meant as support for it, then this is further problematic because the literature seems misrepresented. For example, it is correct that Peels and Bouter (2023) only use the term reproducibility in footnote 1, but they use the term “reproduction” in their distinction between three types of replication: “Replication with existing data and the same research protocol: re-analysis of the data from the primary study with the primary research question. This we refer to as a “reproduction”. It can be argued that a publication should first be checked for numerical inconsistencies in the data and in the statistical parameters. Then a re-analysis of the existing data with the same data-analysis plan comes in view, potentially followed by analyses of the existing data with one or more alternative data-analyses plans. When these tests are passed satisfactorily it can be concluded that the findings of the primary study are robust (Nuijten 2021).” (Peels & Bouter, 2023, p. 79). Furthermore, my personal impression of the literature and my own work on the issue, in combination with existing reviews, would suggest the opposite: terminological proliferation and confusion have increased since the claims of a replication crisis started to circulate, and there is no emerging agreement on terminology in sight (see e.g., Gundersen, 2021; Ulpts & Schneider, 2024).
On page 3 the author claims that “There is a significant gap in the literature between the publications by Collins and Radder and the next major contribution: Ioannidis’ influential article, Why Most Published Research Findings Are False (Ioannidis, 2005).” This does not seem to be true considering the literature on replication and reproducibility between 1992 and 2005. Here are some examples to consider: Basili et al. (1999), Bogen (2001), Chen (1994), Easley et al. (2000), Franklin (1999), Hunter (2001), King (1995), Pesaran (2003), and Tsang & Kwan (1999).
The method described on page 4 for categorizing the publications seems problematic. The author states to have categorized the selected literature within the three domains of focus “by looking at the abstracts and conclusions and additionally checking whether they have dedicated at least a page of discussion to a given domain”. It is just 15 sources; for a proper analysis, categorization, and representation of content, why not just read and analyse them from beginning to end? The approach described is flawed and too superficial for the intended purpose. Consequently, the title of the manuscript is also misleading and does not capture what is actually attempted in the paper: there is no mapping of the debate, only an attempt at capturing the discourse on the epistemic function of replication in 15 selected sources.
As an example of the superficial representation of the content of the selected articles: in the categorization of the 15 sources, the author claims on page 5 about Leonelli (2018) that “Leonelli’s project is not primarily concerned with the epistemic relation between the reproduced and original study; instead, her focus is on analyzing how reproduction is used in relation to different methodologies across fields”. This is not untrue, but in her six types of reproducibility she actually does elaborate on how the original study relates to the reproducing study, as well as what function might be addressed. This apparent mischaracterization of Leonelli (2018) and other sources is probably a consequence of the improper analysis/review method used, which does not consider the full text. The chosen analysis and categorization approach is further problematic considering that the length of the 15 sources varies considerably, with Collins (1985) being a book. Furthermore, on page 4 the author also lists Norton (2015) as a source that does not acknowledge disciplinary contexts (“the role of disciplines”), which seems inaccurate considering that his perspective is based on an inductive approach to experimentation applied to four cases from at least three different disciplines.
On page 6 the author, referring to Guttinger (2020), states: “Guttinger is clearly taking a critical stance regarding the epistemic function of replication in the domain of science.” I do not see how such a statement is justified. Guttinger (2020) is critical in reaction to other scholars claiming the general importance and relevance of replication as an epistemic practice and criterion, but he is not generally critical of the role or function of replication in the domain of science. He states that replication is a local problem and practice; that does not mean he is critical of its function for the domain of science. The analysis and representation of this literature seem out of context and unsubstantiated, especially considering that much of these discussions (“new localism”) started to emerge after Peels and Bouter (2018) promoted the idea of replication in the humanities.
The identification of “research gaps” starting on page 7 based on the categorization is wanting not only due to the problematic review methodology, but also because the categorization seems superficial. Consider just the domain focused on the role of replication in science: it might be a lack of nuance, neglect, or carelessness on the part of the authors of the selected sources (Simons, 2014; Sikorski & Andreoletti, 2024), rather than an actual consideration of the role of replication in science, that led to the (+) in that domain in Table 2. Simons (2014), especially, is clearly psychology-focused, and its talk of science more generally does not seem to reflect an elaborate consideration of the role of replication in science.
Minor Formalities
There are some inconsistencies regarding the years of publication for the sources used. In Table 1, for instance, Sikorski and Andreoletti is listed as (2023), but in the reference list it is listed as (2024). Additionally, Lynch et al. is listed in Table 1 as (2014), but in the reference list as (2015). While Machery is in Table 1 listed as (2019) and in the references as (2020). The Schneider (2000) reference from footnote 2 is not in the reference list. In Table 1 for Ioannidis (2005) the listing of key points is off and for Guttinger (2020) there are more points listed than made.
Recommendation
All in all, the intention behind the manuscript is worthwhile, and elaborating on the epistemic functions behind replication and reproducibility in disciplinary contexts, as well as their role in the whole of science, is an important endeavour, but the analysis presented in this manuscript is wanting and problematic. This “snapshot” should be situated within the existing literature, and especially within reviews on the epistemic functions of reproducibility and replication (Albertoni et al., 2023; Matarese, 2022; Ulpts & Schneider, 2024). The inclusion criteria for the 15 sources should be made transparent and argued for in the context of the intended purpose of the work. Having only 16 references in a manuscript that reviews 15 sources is an indication that something is lacking. Furthermore, to ensure that the selected sources are appropriately represented and categorized, the full text needs to be considered, not just the abstract, the conclusion, and whether at least one page of the discussion is dedicated to a specific domain. Therefore, major revisions are required, not just to ensure that the claims made are justified, but also to avoid creating the impression of research gaps where there are none.
References
Albertoni, R., Colantonio, S., Skrzypczyński, P., & Stefanowski, J. (2023). Reproducibility of Machine Learning: Terminology, Recommendations and Open Issues (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2302.12691
Barba, L. A. (2018). Terminologies for Reproducible Research (Version 1). arXiv. https://doi.org/10.48550/ARXIV.1802.03311
Basili, V. R., Shull, F., & Lanubile, F. (1999). Building knowledge through families of experiments. IEEE Transactions on Software Engineering, 25(4), 456–473. https://doi.org/10.1109/32.799939
Bogen, J. (2001). ‘Two as good as a hundred’: Poorly replicated evidence in some nineteenth-century neuroscientific research. Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 32(3), 491–533. https://doi.org/10.1016/S1369-8486(01)00013-9
Chen, X. (1994). The rule of reproducibility and its applications in experiment appraisal. Synthese, 99(1), 87–109. https://doi.org/10.1007/BF01064532
Collins, H. M. (1985). Changing order: Replication and induction in scientific practice. Sage Publications.
Easley, R. W., Madden, C. S., & Dunn, M. G. (2000). Conducting Marketing Science. Journal of Business Research, 48(1), 83–92. https://doi.org/10.1016/S0148-2963(98)00079-4
Fanelli, D. (2018). Is science really facing a reproducibility crisis, and do we need it to? Proceedings of the National Academy of Sciences, 115(11), 2628–2631. https://doi.org/10.1073/pnas.1708272114
Feest, U. (2019). Why Replication Is Overrated. Philosophy of Science, 86(5), 895–905. https://doi.org/10.1086/705451
Feest, U. (2024). What is the Replication Crisis a Crisis Of? Philosophy of Science, 91(5), 1361–1371. https://doi.org/10.1017/psa.2024.2
Franklin, A. (1999). How to Avoid the Experimenters’ Regress. In R. S. Cohen & M. W. Wartofsky (Eds.), Can that be Right? (Vol. 199, pp. 13–38). Springer Netherlands. https://doi.org/10.1007/978-94-011-5334-8_2
Gundersen, O. E. (2021). The fundamental principles of reproducibility. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 379(2197), 20200210. https://doi.org/10.1098/rsta.2020.0210
Guttinger, S. (2020). The limits of replicability. European Journal for Philosophy of Science, 10(2), 10. https://doi.org/10.1007/s13194-019-0269-1
Haig, B. D. (2022). Understanding Replication in a Way That Is True to Science. Review of General Psychology, 26(2), 224–240. https://doi.org/10.1177/10892680211046514
Hunter, J. E. (2001). The Desperate Need for Replications. Journal of Consumer Research, 28(1), 149–158. https://doi.org/10.1086/321953
Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124
King, G. (1995). Replication, Replication. PS: Political Science and Politics, 28(3), 444. https://doi.org/10.2307/420301
Leonelli, S. (2018). Rethinking Reproducibility as a Criterion for Research Quality. In L. Fiorito, S. Scheall, & C. E. Suprinyak (Eds.), Research in the History of Economic Thought and Methodology (Vol. 36, pp. 129–146). Emerald Publishing Limited. https://doi.org/10.1108/S0743-41542018000036B009
Lynch, J. G., Bradlow, E. T., Huber, J. C., & Lehmann, D. R. (2015). Reflections on the replication corner: In praise of conceptual replications. International Journal of Research in Marketing, 32(4), 333–342. https://doi.org/10.1016/j.ijresmar.2015.09.006
Machery, E. (2020). What Is a Replication? Philosophy of Science, 87(4), 545–567. https://doi.org/10.1086/709701
Matarese, V. (2022). Kinds of replicability: Different terms and different functions. Axiomathes, 32(S2), 647–670. https://doi.org/10.1007/s10516-021-09610-2
Norton, J. D. (2015). Replicability of Experiment. THEORIA. An International Journal for Theory, History and Foundations of Science, 30(2), 229. https://doi.org/10.1387/theoria.12691
Nuijten, M. B. (2021). Assessing and improving robustness of psychological research findings in four steps. https://doi.org/10.31234/osf.io/a4bu2
Peels, R., & Bouter, L. (2018). The possibility and desirability of replication in the humanities. Palgrave Communications, 4(1), 95. https://doi.org/10.1057/s41599-018-0149-x
Peels, R., & Bouter, L. (2023). Replication and trustworthiness. Accountability in Research, 30(2), 77–87. https://doi.org/10.1080/08989621.2021.1963708
Pesaran, H. (2003). Introducing a replication section. Journal of Applied Econometrics, 18(1), 111. https://doi.org/10.1002/jae.709
Peterson, D., & Panofsky, A. (2023). Metascience as a Scientific Social Movement. Minerva, 61(2), 147–174. https://doi.org/10.1007/s11024-023-09490-3
Schmidt, S. (2009). Shall we Really do it Again? The Powerful Concept of Replication is Neglected in the Social Sciences. Review of General Psychology, 13(2), 90–100. https://doi.org/10.1037/a0015108
Sikorski, M., & Andreoletti, M. (2024). Epistemic Functions of Replicability in Experimental Sciences: Defending the Orthodox View. Foundations of Science, 29(4), 1071–1088. https://doi.org/10.1007/s10699-023-09901-4
Simons, D. J. (2014). The Value of Direct Replication. Perspectives on Psychological Science, 9(1), 76–80. https://doi.org/10.1177/1745691613514755
Tsang, E. W. K., & Kwan, K.-M. (1999). Replication and Theory Development in Organizational Science: A Critical Realist Perspective. The Academy of Management Review, 24(4), 759. https://doi.org/10.2307/259353
Ulpts, S., & Schneider, J. W. (2024, June 17). A conceptual review of uses and meanings of reproducibility and replication. https://doi.org/10.31222/osf.io/entu4
Peer Review 3
Anonymous User
The author submitted a preprint providing their perspective on relatively recent papers discussing the epistemic value of replication. The author argues that these papers collectively address three key focal areas: the relationship between a replication and the original study; replication within a specific scientific discipline; and the significance of replication for science at large. The author states that rarely are all three discussed in an individual paper. Finally, the author discusses their interpretation of the tone of the papers which, following the timeline of publication, started as enthusiastic towards replication, shifted to critical reflection, and moved to a more balanced engagement.
My comments are intended for the author to consider as they revise their manuscript with a focus on presentation and clarifying some aspects of the methodology.
- It is unclear how the author identified these articles. It was acknowledged that this was not an exhaustive list, but where did they come from? There have been positions about replication since the beginning of the modern scientific approach. I think this essay could benefit from including this perspective, particularly since the abstract says, ‘influential contributions to this debate’, which implies some method of defining what is influential.
- The author defines a paper as focusing on one of the 3 domains they identified by checking the abstracts and conclusions, as well as whether the article includes at least a page of discussion of a given domain. I don’t understand the last piece. Why 1 page? It would seem a percentage of the total article length would be more reasonable. That is, papers are different lengths, so a 1-page determination would create a bias towards shorter overall article lengths.
- The author states there was a surprising lack of articles engaging with all three domains. Why is this surprising? Is it necessary that articles would do this? That is, the author identified these three domains in their essay, so why would authors of previous papers be expected to map onto them? Each paper is a contribution, so I would not find this surprising but expected, as the conversation/debate unfolds. It is akin to how no single research paper is ‘definitive’ on any given topic. Each paper (or more accurately, study) is a piece of a puzzle. It’s why we do systematic reviews and meta-analyses: they provide a different lens on the status of what is and isn’t happening. I would think the essay by the author is just that: attempting to understand what has happened. All this to say, I was struck by this claim, as I do not find it surprising, so it might be useful to hear why the author does. I do appreciate the author’s position, though, that it is beneficial to have perspectives across these 3 domains, potentially even in a single perspective paper.
- The abstract could be revised to emphasize that this is a preliminary approach (i.e., what the author acknowledges at the end of the ‘Research Gaps’ section), and that the assessment is based on the author’s interpretation of the papers (e.g., the statement at the end of the ‘Categorization of Articles’ section). I think that is critical and will help frame this as an early contribution to understanding and advancing the epistemic function of replication in science.
- Minor: there is a misspelled word on page 6, ‘revsied’ instead of ‘revised’.
Again, my review is intended to provide, what I hope, is useful feedback to the author about their perspective. Thank you for giving me the chance to add my own perspective to this piece.


