Published at MetaROR

December 10, 2025

Cite this article as:

Bornmann, L., & Leibel, C. (2025). Citation accuracy, citation noise, and citation bias: A foundation of citation analysis. arXiv preprint arXiv:2508.12735.

Citation accuracy, citation noise, and citation bias: A foundation of citation analysis

Lutz Bornmann1, Christian Leibel2

1. Science Policy and Strategy Department, Administrative Headquarters of the Max Planck Society
2. Department of Sociology, LMU Munich

Originally published on August 18, 2025 at: 

Abstract

Citation analysis is widely used in research evaluation to assess the impact of scientific papers. These analyses rest on the assumption that citation decisions by authors are accurate, representing the flow of knowledge from cited to citing papers. However, in practice, researchers often cite for reasons other than attributing intellectual credit to previous research. Citations made for rhetorical reasons or without reading the cited work compromise the value of citations as an instrument for research evaluation. Past research on threats to the accuracy of citations has mainly focused on citation bias as the primary concern. In this paper, we argue that citation noise - the undesirable variance in citation decisions - represents an equally critical but underexplored challenge in citation analysis. We define and differentiate two types of citation noise: citation level noise and citation pattern noise. Each type of noise is described in terms of how it arises and the specific ways it can undermine the validity of citation-based research assessments. By conceptually differentiating citation noise from citation accuracy and citation bias, we propose a framework for the foundation of citation analysis. We discuss strategies and interventions to minimize citation noise, aiming to improve the reliability and validity of citation analysis in research evaluation. We recommend that current professional reform movements in research evaluation, such as the Coalition for Advancing Research Assessment (CoARA), pick up these strategies and interventions as an additional building block for the careful, responsible use of bibliometric indicators in research evaluation.

1 Introduction

Scientific knowledge is a collective endeavor, shaped by researchers who identify research problems and develop solutions within their communities (Aman & Gläser, 2025). Continuous knowledge flow is essential. Scientists usually publish their results in papers that are made available to other scientists via journal contributions. In addition to the author’s research results, each paper contains a list of the literature cited in the author’s text. Every paper is therefore linked to previous research via the author’s citation links. The collective of citing and cited papers forms a social citation system of formal scientific communication (Tahamtan & Bornmann, 2022). Citation links weave a web of knowledge connecting papers (research). The social citation system treats citations as value-free acts (Milojević, in press); the system reflects what researchers have drawn on at a particular point in time (during the research process): “the reference is the information that is necessary to the reader [of a paper] in identifying and finding used sources” (Masic, 2013). Value-laden (inappropriate) elements flow into the social citation system via the mental systems involved in communication (especially by the citing authors as mental systems); these elements ‘irritate’ (for example, bias) the social citation system.

In the formal scientific communication process, citations in a paper are intended to clarify which ideas, methods, results etc. do not originate from the citing authors themselves, but have already been dealt with in previous publications. The process of citation in the social citation system is of particular importance in the evaluation of research (in the attribution of reputation in science): “Citations are a form of scientific currency, actively conferring or denying value” (Penders, 2018). Nowadays, there is hardly any institutional evaluation procedure in science that does not rely on citation analysis. Citations are usually interpreted as a flow of knowledge from the cited paper to the citing author, with the citing author attributing intellectual credit to the cited author for valuable research contributions. If these citations (credits) are aggregated for individual papers, papers can be ranked according to their usefulness for subsequent research. Such rankings of usefulness in the interaction of citing and cited papers are at the core of bibliometrics (evaluative citation analysis).

To be able to use the social citation system for research evaluation purposes, it is imperative that the authors’ citation decisions are accurate. In the research evaluation context, accurate means that a citation is inserted into a manuscript by an author only if a concrete knowledge flow from the cited paper to the citing paper has taken place (Aksnes, Langfeldt, & Wouters, 2019). Citations can only be used meaningfully in research evaluation processes if they represent knowledge flow and the usefulness of research for authors. If other factors influenced the citing author’s decision (such as the reputation of the journal in which the cited paper was published or the reputation of its author) or no decisive reason existed (for example, if the referenced literature is a database error), the citation cannot be used for research evaluation. The citation may represent a certain fact (such as the reputation of an author), but the citation does not represent what is intended to be measured in research evaluation: the usefulness of papers and knowledge flow.

Since the 1970s, in connection with the increasing application of citation analysis in research evaluation, one research strand (in bibliometrics) dealing with citation decisions has addressed the question as to why authors cite – in addition to documenting the knowledge flow from the cited paper to the citing paper. The overviews of the relevant literature by Bornmann and Daniel (2008) and Tahamtan and Bornmann (2018, 2019) show that many reasons for citations could be identified. One of the earliest identified reasons to cite is that researchers cite scientific literature primarily to reinforce the credibility and legitimacy of their own claims, aiming to persuade readers of their robustness and validity (Gilbert, 1977). Researchers cite “to defend their claims against attack, advance their interests, convince others, and gain a dominant position in their scientific community” (Bornmann & Daniel, 2008, p. 49). The comprehensive literature on reasons to cite suggests that some disciplines are characterized by the prevalence of a different and broader spectrum of reasons than others, possibly because of disciplinary differences in citation behavior. For example, the study by Sula and Miller (2014) reveals that linguistics tends to feature reinforcing citations of prior literature, whereas philosophy typically involves more critical engagement with cited works.

Another research strand (in bibliometrics) dealing with citation decisions has focused on inaccurate citations. Inaccurate citations may be quotation errors (the citation “does not support the statement to which it is applied”, Wakeling, Paramita, & Pinfield, in press) or reference errors (“the in-text citation … or the formal reference in the reference list, is incorrect or incomplete”, Wakeling et al., in press). Other inaccurate citations result from the ‘lazy author syndrome’ (Gavras, 2002), i.e. from citing publications without engaging with the content of the cited publication. Chen, Murray, Liu, and Barabási (2024) denote this syndrome as ‘heuristic citation approach’, where citations “are made without fully reading or understanding the paper. Heuristic approaches may involve lifting citations directly from the references of another paper, incorporating citations used in one’s previous work without knowledge of ongoing developments, or referencing work based only on cursory readings”. Both the lazy author syndrome and the heuristic citation approach are thus about cursory or careless citation.

Wakeling et al. (in press) published a recent overview of previous studies dealing with inaccurate citations, from which we summarize results in the following. In some of these studies, researchers were tasked with assessing the accuracy of a sample of citations from a specific discipline such as educational research, history, or psychiatry. In other studies, researchers also assessed the accuracy of citations, but these studies focused on a sample of citations referring to a single publication. Several studies dealing with inaccurate citations have investigated the types of errors leading to inaccurate citations. Some studies distinguished between major and minor errors; other studies sorted citations into fully accurate, partially accurate, or unsubstantiated categories. Studies suggested that the error type can be linked to its severity. For example, one study treated citations that contradicted, failed to support, or were irrelevant to the cited work as major errors. In contrast, misquoting figures, using indirect citations, oversimplifying, or drawing conclusions absent from the original work were classified as minor errors.

Both strands of previous research on citation decisions (reasons to cite and inaccurate citations) point to a phenomenon that we would like to address fundamentally in this paper: the distortion of citation decisions by noise as opposed to citation accuracy. The conceptualization of citation noise and citation accuracy in this paper is closely based on Kahneman, Sibony, and Sunstein (2021). The authors deal with noise and accuracy in human decision-making processes in general. When dealing with error in human decision-making processes (as opposed to accuracy), Kahneman et al. (2021) focus on noise and bias: undesirable variability and deviation. In the context of citation decisions, past (bibliometric) research has dealt almost exclusively with biased decisions (in the context of the research strand dealing with authors’ reasons to cite). Citation noise hardly plays a role in the literature.

Traag and Waltman (2022) define bias “as a direct causal effect that is unjustified”. Biased citation decisions occur when factors unrelated to knowledge flow, such as authors’ gender, have causal effects on citation outcomes. A current overview of the many factors that can lead to biased citation decisions can be found, for example, in Kousha and Thelwall (2024). We would like to highlight just one example of citation bias here: Jannot, Agoritsas, Gayet-Ageron, and Perneger (2013) have found that there is “a citation bias favoring [statistically] significant results … in medical research. As a consequence, treatments may seem more effective to the readers of medical literature than they really are” (p. 296). In this quote, the authors not only name the potential citation bias, but also list the consequences that can arise from this bias: the distorted citation analysis can give a false impression of the state of research.

To understand citation error in citation decisions, it is necessary to deal not only with bias, but also with noise. In this paper, we will show that citation noise is a problem in citation decisions and thus in research evaluation. However, we were only able to identify a few papers (from bibliometrics) in the Web of Science (WoS, Clarivate) in which the term ‘citation noise’ appears in the title or abstract (Cawkell, 1969; Meho & Yang, 2007; Tang, 2023; Wei, Zhang, Zhang, Liang, & Wu, 2019) (date of search: April 2025). The authors of these papers usually use the term ‘citation noise’ but do not explicitly define it. Even though the term and its associated concept are hardly known in bibliometrics, the term has been in use in the field of patent analysis since the 1990s. In an overview of the literature on this phenomenon, Smith (2014) defines citations as noise in a patent if they do not represent a knowledge flow from the prior art to the patent. For the author, knowledge flow occurs “when a patent is building upon or improving older technology and may be identified when there is direct link between the claims of the patent and relevant citations” (Smith, 2014, p. 40). Results reveal that citations in a patent not representing knowledge flow may account for half of its total citations (Jaffe, Trajtenberg, & Fogarty, 2000).

We assume similar numbers for citations in publications: The literature overview by Bornmann and Daniel (2008) of studies on reasons to cite reveals a comparatively frequent occurrence of citations of the perfunctory (up to 50 percent) and persuasive (up to 40 percent) type. The results of Donner, Stahlschmidt, Haunschild, and Bornmann (in press) similarly show that around half of the citations in the WoS are of the background type. Background citations (superficially) mention prior research that provides the scholarly context for a study. We outlined above that the studies dealing with inaccurate citations differ in terms of the sampled discipline, applied methods, and terminology used (in defining inaccurate citations). These variances in the studies may be one reason why the figures for the extent of inaccurate citations range from 5% to 40% (Wakeling et al., in press), with most studies reporting between 10% and 20% (including the study by Wakeling et al., in press). Wakeling et al. (in press) illustrate these results as follows: “If a typical social science article has, on average, 34 references … the implication is that 3-6 of the citations in the paper will be erroneous in some way”. Two meta-analyses of studies dealing with inaccurate citations come to percentages of 25.4% (Jergas & Baethge, 2015) and 14.5% (Mogull, 2017).

The results of the studies on perfunctory citations and inaccurate citations (see above) reveal that many citations in publications do not represent knowledge flow (but noise or bias). There seems to be a similar problem in publication citations as in patent citations where citations may not represent knowledge flow for half of total citations (see above). Although patent citation noise is seen as “a major challenge to the effectiveness of patent evaluation methodologies, which may therefore be poor indicators of the economic value of patents” (Smith, 2014, p. 40), it is surprising that citation noise has not been assessed more frequently as a challenge in bibliometrics (and science policy) so far. However, as we will show in the following, dealing with citation noise is essential for the use of citation analysis in research evaluation. Bibliometric research should not only deal with citation bias.

What do we mean by citation noise in bibliometrics and how do we distinguish citation noise from citation bias and citation accuracy? Noise and bias are two sides of the citation error coin. Citation decisions are incorrect and therefore inaccurate if they do not represent knowledge flow from the cited to the citing paper. Noise refers to the undesirable variability in judgments about which papers should be cited in a particular paper; it represents the (random) scatter of citation decisions among citing authors. In contrast to noise, citation bias is the systematic deviation of citation decisions from accurate citation decisions. For example, if authors preferentially cite papers with statistically significant results, this phenomenon is referred to as citation bias. Studies that do not have statistically significant results tend to be ignored by the citing authors. Distortions in citation decisions due to bias and noise imply that aggregated citation measures (such as citation rates or times cited) cannot reliably or validly reflect the usefulness of research for future studies.

Since citation analyses are used to decide on reputation and resources in science, citation data should be able to give the right signal for research evaluation. If it is primarily noise and bias that determine citation decisions, then there is a risk that incorrect reputation and resources decisions will be made based on citation data. As we will show in the following, noise can be a significant source of error in citation decisions. We will argue that it is important to take measures in science (policy) that can increase the proportion of accurate citation decisions among all citation decisions. We claim in this study that if a paper receives many citations although it was considered hardly useful for future research, and another paper receives very few citations although it turned out to be very useful, citations cannot be used meaningfully in research evaluation. Since citation analyses can play a significant role in research evaluation, citation data should only be used if they are (mainly) based on accurate citation decisions.

In the next section (section 2), we introduce the concepts of citation accuracy, citation noise, and citation bias in detail based on an exemplary social citation system composed of a small fictitious world of cited and citing papers. With the introduction of these basic concepts, we sketch the foundation of a citation analysis that considers citations as knowledge flow proxies. In section 3, given the prevalence of noise and bias in citation decisions, we discuss strategies to reduce citation noise and bias.

2 Definition and measurement of citation accuracy, citation bias, and citation noise

We would like to explain the definition and measurement of citation accuracy, citation bias, and citation noise using the example of a fictitious social citation system. This small world example which we use for illustration purposes in this study consists of citing and cited papers (see Table 1). We assume that the fictitious system includes all citing and cited papers of the small world: ten citing (1 to 10) and five cited papers (A to E). No other citing and cited papers exist outside the system: The system is an isolated system with no knowledge spillover or knowledge absorption from the external world. The citation decisions in the social citation system are from three citing authors (I, II, and III) for at least two different papers each. In Table 1, the citing authors are indexed with 𝑖, the citing papers are indexed with 𝑗, and the cited papers are indexed with 𝑘.

The binary values 0 and 1 are used in the table to indicate whether a (cited) paper is cited (column R), should be cited (column A) or is erroneously cited (column E) by a citing paper. For example, while citing paper 2 has cited (cited) paper D, citing paper 3 has not cited this paper. The author of citing paper 2 therefore attributes a different value to cited paper D than the author of citing paper 3. Regarding the citation decisions made by the authors, we generally assume that citation decisions are reliable and interchangeable: Different authors would cite the same work at the same (or similar) point in a certain paper. That means citation decisions should not be based on personal taste or opinion but on objective requirements. Citation decisions may be trade-offs between the pros and cons of different citation options, but ultimately these trade-offs should be resolved by evaluative citation judgments made by the citing author about the perceived knowledge flow. It is possible, for example, that citing authors have found a certain fact to which they would like to refer in their own papers in several papers by another author. In this case, however, it is the task of the citing authors to make an evaluative citation judgment against the background of the available alternatives and the perceived knowledge flow. Large disagreements in performed citation decisions of different authors based on the same pool of available papers that are cited for one and the same text passage would violate expectations of reliability and validity in citation decisions (and citation analysis).

We assume in general and specifically for the social citation system in Table 1 that accurate citation decisions are possible for citing papers. It should be possible for authors to determine whether knowledge flow has happened or not. The assumed accurate citation decisions are given for each cited paper in the table (columns A). In a perfect citation world, all realized citation decisions (columns R) would be flawless measuring instruments of knowledge flow, and aggregated citations would be an error-free metric of knowledge flow. If Table 1 were free of errors, all realized citation decisions (columns R) would correspond to accurate citation decisions (columns A). For example, if cited paper A had always been cited accurately in the small world, all citing papers would have cited the paper. Since accurate citation decisions can mean that a certain publication should be cited (value 1) or should not be cited (value 0) in a paper, the columns in Table 1 with the accurate citations generally consist of patterns including 0 and 1 values.

In the social citation system, we see realized citation decisions as a measuring instrument for knowledge flow, where the decision instrument is the citing author. The citing author is confronted with a cloud of citation possibilities and makes an evaluative judgment (i.e. an assessment based on scientific criteria with respect to knowledge flow) about whether a particular work should be cited in the text or not. This human measuring instrument sometimes works better and sometimes worse, and sometimes makes accurate and sometimes erroneous citation decisions. For the accurate measurement of knowledge flow in research evaluation processes, however, it is essential to avoid citation errors as far as possible and to increase the reliability and accuracy of citation decisions. The aim should be to make citation decisions as accurate as possible. We assume that a paper must be cited precisely when a text passage in the manuscript has a certain property: It is based on knowledge that is described in another paper. The citations in the list of references at the end of the manuscript should thus reveal the extent to which the present contribution (substantially) builds on previous contributions.

In the context of a theory of citation, Small (2004) brings a norm of citation into play with recourse to the sociology of science by Merton (1973): “such a norm might be the expectation that authors acknowledge prior work in an accurate manner” (Small, 2004, p. 75). Similar normative statements about citation decisions can also be found in Penders (2018): “Writing manuscripts requires, among so much more, decisions on which previous studies to include and exclude, as well as decisions on how exactly that inclusion takes place. A well-referenced manuscript places the authors’ argument in the proper knowledge context and thereby can support its novelty, its value, and its visibility”. The text passages in a manuscript where a certain other work should be cited as the proper knowledge context and the pool of works that could potentially be cited result in an expected pattern of citations for a given manuscript, which can be compared with the realized citations in the manuscript (see columns A and R in Table 1). The expected pattern can be used to determine for each manuscript the extent to which citation errors (columns E) are present as deviations from accurate citation decisions. These deviations can be determined and examined for cited and citing papers in a social citation system.

Although citation accuracy is the goal of citation decisions, this goal is usually not (fully) achieved by the citing authors (as the studies on reasons to cite and inaccurate citations demonstrated, see above). We can assume that there will always be a certain number of errors (bias and noise) among the citation decisions for a manuscript. The individual errors among citation decisions are denoted as 𝑥𝑖𝑗 in the following. 𝑥𝑖𝑗 denotes an erroneous citation made by citing author 𝑖 in citing paper 𝑗. Since we assume for the small world in Table 1 that all citing papers should cite a certain set of papers (A to E), we can use the binary data in the table to determine the degree of citation accuracy for each cited and citing paper. We start with the citing authors and focus on the lines in Table 1. The (fictitious) authors of the citing papers evaluated whether they should cite the five papers (A to E) and made accurate or erroneous citation decisions. For example, the author of citing paper 1 (author I) made a citation decision on five papers and made one accurate decision.

Binary data (typically 0 and 1) often behave like a Bernoulli distribution, where each observation has two possible outcomes. Given n observations, where there are a ones and n − a zeros, the mean μ is calculated as

μ = a / n
This mean 𝜇 corresponds to the proportion p that a value is 1. Since one of the five citation decisions for citing paper 1 in Table 1 is correct, the citing paper’s proportion of accurate citations across the cited papers is 𝑃𝐴𝑖𝑗 = 0.2 (1/5) and the corresponding citing paper’s proportion of errors is 𝑃𝐸𝑖𝑗 = 0.8 (4/5).
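The proportions for citing paper 1 can be reproduced with a minimal sketch; the R and A values are taken from Table 1, the variable names are ours:

```python
# R and A columns for citing paper 1 (cited papers A to E), from Table 1
realized = [0, 1, 1, 0, 0]  # R: papers actually cited
accurate = [1, 1, 0, 1, 1]  # A: papers that should have been cited

# E column: a decision is erroneous when R and A disagree
errors = [int(r != a) for r, a in zip(realized, accurate)]

pa = errors.count(0) / len(errors)  # proportion of accurate decisions
pe = errors.count(1) / len(errors)  # proportion of erroneous decisions
print(pa, pe)  # 0.2 0.8
```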

Each citation decision of the citing authors in Table 1 can be either:

  • Correct positive: The citing author correctly identifies a paper that should be cited
  • Correct negative: The citing author correctly identifies a paper that should not be cited
  • Incorrect positive/negative: The citing author either wrongly includes a citation or fails to identify a necessary citation.
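These outcomes can be made explicit in a small helper; this is an illustrative sketch, and the function name is ours:

```python
def classify_decision(realized: int, accurate: int) -> str:
    """Classify a single citation decision against the accurate decision."""
    if realized == accurate:
        return "correct positive" if realized == 1 else "correct negative"
    return "incorrect positive" if realized == 1 else "incorrect negative"

# Citing paper 1's decision on cited paper B (R = 1, A = 1) in Table 1:
print(classify_decision(1, 1))  # correct positive
```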

Citation accuracy for the citing papers (𝑃𝐴𝑖𝑗) provides a snapshot of the data quality in Table 1. As the results for the citing papers 1 to 10 show, 𝑃𝐴𝑖𝑗 is between 0.2 and 0.8. The average across all citing papers is P̄A = 0.54, which is only slightly better than a random selection of the papers to be cited by the authors. The results indicate that the small world in Table 1 is affected by citation errors (𝑥𝑖𝑗) to a considerable extent. It is therefore questionable whether the citation accuracy is sufficient to justify using the data in research evaluation processes. The application of citation analysis in research evaluation processes presupposes that the accuracy of the citation decisions is high.
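The overall accuracy can be checked by averaging the ten per-paper values from Table 1 (a sketch; the variable names are ours):

```python
# PA_ij values for citing papers 1 to 10, taken from Table 1
pa_ij = [0.20, 0.80, 0.40, 0.40, 0.60, 0.40, 0.60, 0.80, 0.60, 0.60]

mean_pa = sum(pa_ij) / len(pa_ij)
print(round(mean_pa, 2))  # 0.54
```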

In a social citation system, we can also deal with accuracy at the level of cited papers. This involves the columns in Table 1. The paper citation accuracy is the proportion of correct citation decisions, which is calculated using the citation decisions of the ten citing papers (authors). The table indicates for each cited paper whether it should be cited by a citing author or not (column A). In the case of cited paper A, for example, we assume that it should have been cited by all 10 citing papers. As the values in Table 1 show, 6 author decisions that led to the realized citations are accurate; the cited paper’s proportion of accurate citations is therefore 𝑃𝐴𝑘 = 0.6. The proportion (𝑃𝑅𝑘) and times cited (𝑇𝐶𝑘) of the realized citations are aggregated values of citations for the individual cited papers A to E in Table 1. As Table 1 reveals for cited paper A, 𝑃𝑅𝑘 = 0.6 and 𝑇𝐶𝑘 equals 6. These values are the aggregated citation impact values that paper A has achieved in the small world under the influence of errors in the social citation system.
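For cited paper A, the column-wise quantities can be reproduced as follows (R and A values from Table 1; variable names are ours):

```python
# Column values for cited paper A across citing papers 1 to 10 (Table 1)
realized = [0, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # R: who actually cited paper A
accurate = [1] * 10                        # A: all ten papers should cite A

tc_k = sum(realized)        # times cited
pr_k = tc_k / len(realized) # proportion of realized citations
pa_k = sum(int(r == a) for r, a in zip(realized, accurate)) / len(realized)
print(tc_k, pr_k, pa_k)  # 6 0.6 0.6
```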

The various deviations of realized citations from accurate citations create noise in the social citation system in Table 1. Since deviations between realized and expected (accurate) citations differ for citing authors and citing papers within the system, variability emerges. We refer to this variability as noise. Noise is measured in the table using the standard deviation, the most common measure of variability in statistics. For a Bernoulli distribution, the standard deviation σ is defined by

σ = √(p(1 − p))

where p is the proportion of ones (here, the proportion of erroneous citation decisions).
Table 1. Fictitious social citation system including citing and cited papers

| Citing paper | Cited paper A (R A E) | Cited paper B (R A E) | Cited paper C (R A E) | Cited paper D (R A E) | Cited paper E (R A E) | PR_ij | PA_ij | PE_ij | P̄E_i | σPN,i |
|---|---|---|---|---|---|---|---|---|---|---|
| Author I: Citing paper 1 | 0 1 1 | 1 1 0 | 1 0 1 | 0 1 1 | 0 1 1 | 0.40 | 0.20 | 0.80 | 0.53 | 0.25 |
| Author I: Citing paper 2 | 1 1 0 | 0 1 1 | 0 0 0 | 1 1 0 | 1 1 0 | 0.60 | 0.80 | 0.20 | | |
| Author I: Citing paper 3 | 0 1 1 | 1 0 1 | 0 0 0 | 0 0 0 | 0 1 1 | 0.20 | 0.40 | 0.60 | | |
| Author II: Citing paper 4 | 1 1 0 | 1 0 1 | 0 0 0 | 1 0 1 | 0 1 1 | 0.60 | 0.40 | 0.60 | 0.50 | 0.10 |
| Author II: Citing paper 5 | 1 1 0 | 1 0 1 | 1 1 0 | 0 0 0 | 0 1 1 | 0.60 | 0.60 | 0.40 | | |
| Author III: Citing paper 6 | 0 1 1 | 1 0 1 | 0 0 0 | 1 0 1 | 0 0 0 | 0.40 | 0.40 | 0.60 | 0.40 | 0.13 |
| Author III: Citing paper 7 | 1 1 0 | 1 0 1 | 1 0 1 | 1 1 0 | 0 0 0 | 0.80 | 0.60 | 0.40 | | |
| Author III: Citing paper 8 | 1 1 0 | 1 1 0 | 1 1 0 | 0 1 1 | 0 0 0 | 0.60 | 0.80 | 0.20 | | |
| Author III: Citing paper 9 | 0 1 1 | 0 1 1 | 0 0 0 | 0 0 0 | 0 0 0 | 0.00 | 0.60 | 0.40 | | |
| Author III: Citing paper 10 | 1 1 0 | 0 1 1 | 1 0 1 | 0 0 0 | 0 0 0 | 0.40 | 0.60 | 0.40 | | |
| PR_k | 0.60 | 0.70 | 0.50 | 0.40 | 0.10 | | | | | |
| TC_k | 6 | 7 | 5 | 4 | 1 | | | | | |
| EC_k | 10 | 5 | 2 | 4 | 5 | | | | | |
| PA_k | 0.60 | 0.20 | 0.70 | 0.60 | 0.60 | | | | | |
| PE_k | 0.40 | 0.80 | 0.30 | 0.40 | 0.40 | | | | | |

System-level values: P̄A = 0.54, P̄E = 0.46, σLN = 0.06, σPN = 0.17.

Notes. R = realized citation, A = accurate citation, E = erroneous citation, PR_ij = proportion of realized citations of citing author i in citing paper j, PA_ij = proportion of accurate citations of citing author i in citing paper j, PE_ij = proportion of erroneous citations of citing author i in citing paper j, P̄E_i = person-specific average error rate of citing author i, σPN,i = author-specific pattern noise (within-author variance of erroneous citations for author i), PR_k = cited paper k’s proportion of realized citations, TC_k = times cited, EC_k = expected citations, PA_k = cited paper k’s proportion of accurate citations, PE_k = cited paper k’s proportion of erroneous citations, P̄A = proportion of accurate citations in the entire citation system, P̄E = proportion of erroneous citations in the entire citation system, σLN = overall level noise (between-author variance of erroneous citations), σPN = overall pattern noise.

The standard deviation is the square root of the variance of the data with n observations; the variance is calculated as the average of the squared differences of the individual values from the mean across all values:

σ² = (1/n) Σᵢ (xᵢ − μ)²

Through these squared deviations, the deviations from the mean (μ), i.e. the individual errors (𝑥𝑖𝑗), are combined into an overall error. Citation noise is due to the variation among citing authors in their disposition to set citations in manuscripts.
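For binary data, the Bernoulli shortcut √(p(1 − p)) agrees with the general squared-deviation formula; a quick sketch (the function names are ours):

```python
import math

def bernoulli_sd(values):
    """SD of 0/1 data via the Bernoulli shortcut sqrt(p * (1 - p))."""
    p = sum(values) / len(values)
    return math.sqrt(p * (1 - p))

def population_sd(values):
    """SD computed directly from squared deviations around the mean."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

# E column of citing paper 1 from Table 1: four errors in five decisions
errors = [1, 0, 1, 1, 1]
print(round(bernoulli_sd(errors), 6), round(population_sd(errors), 6))  # 0.4 0.4
```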

In the social citation system, there are three different types of citation noise: citation level noise, citation pattern noise, and citation occasion noise. These three types of noise form the overall citation noise in the social citation system. In this section, we define and explain all three types of citation noise.

What do we mean by citation level noise? Some citing authors are more prone to make erroneous citations than others. Some authors tend to cite too little; others tend to cite too much. P̄E_i gives the proportion of erroneous citations across all of an author’s citing papers and can thus be interpreted as the author-specific citation error rate. In Table 1, for example, citing author I has an error proportion of about 0.53, which is higher than the proportion of 0.40 for citing author III. These author-specific level differences in the social citation system primarily reflect personal citation styles, which can also go back to corresponding subject-area or research-group styles. Even if all scientists – regardless of subject area, research group, etc. – should follow the same citation norm (another paper is cited exactly when a knowledge flow has taken place), citation practice often differs. We refer to this kind of citing-author variability in erroneous citation decisions as citation level noise.

In statistics, this type of variance, which we call level noise in this study, is referred to as between-author variance. It captures the variance of erroneous citation decisions across different citing authors. Citation level noise (σLN) is measured as the standard deviation of the author-specific error rates (P̄E_i) from the proportion of erroneous citations in the entire citation system (P̄E), weighted by the number of citing papers co-authored by citing author i (n_i).
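The formula itself is not reproduced in the extracted text. Our reading of the verbal description, which reproduces the value σLN = 0.06 reported for Table 1, is the following (with n the total number of citing papers and n_i the number of citing papers by author i):

```latex
\sigma_{LN} = \sqrt{\frac{1}{n}\sum_{i} n_i \left(\bar{P}_{E,i} - \bar{P}_{E}\right)^{2}}
```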

Using the values in Table 1, the citation level noise in the fictitious social citation system is 𝜎𝐿𝑁 = 0.06. The small amount of citation level noise indicates that the citing authors have a similar level of citation accuracy.

After citation level noise, we turn to citation pattern noise: there is more noise in the system than can be explained by the different citation styles of the citing authors. Whereas citation level noise refers to (erroneous) inconsistencies in the citation decisions of different authors, citation pattern noise refers to inconsistencies in the citation decisions of one and the same author. More specifically, citation pattern noise concerns the interaction of citing-author characteristics and cited-paper characteristics. In Table 1, the citation pattern noise values for each citing author can be found in the column $\sigma_{PN,i}$: it is the within-author variance of citation errors for each citing author.

The within-author variance of citation errors is calculated as the standard deviation of the citation errors in an author’s citing papers ($PE_{ij}$) from the author-specific citation error rate ($\overline{PE}_i$); $n_i$ denotes the number of citing papers authored by a specific citing author $i$. As the results in Table 1 show, the (citing author) pattern noise is similar for all authors: it lies between $\sigma_{PN,i}$ = 0.10 and $\sigma_{PN,i}$ = 0.25.
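The verbal definition above corresponds to the following formula (our reconstruction):

$$\sigma_{PN,i} = \sqrt{\frac{1}{n_i}\sum_{j=1}^{n_i}\left(PE_{ij} - \overline{PE}_i\right)^2}$$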

The overall citation pattern noise in the social citation system (𝜎𝑃𝑁) is defined as the weighted average of the author-specific citation pattern noise values, whereby 𝑛 denotes the number of citing authors. The results in Table 1 reveal that the overall citation pattern noise amounts to 𝜎𝑃𝑁 = 0.17.
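A plausible formalization of this average over the $n$ citing authors is the root mean square of the author-specific values (our assumption; taking the root mean square rather than a plain average keeps the variance decomposition with level noise exact, and weights $n_i$ could be added for authors with different numbers of citing papers):

$$\sigma_{PN} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\sigma_{PN,i}^2}$$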

Citation pattern noise consists of two components: (1) stable citation pattern noise and (2) occasion noise, the random part of pattern noise. Stable citation pattern errors are assumed to be unique to each citing author: even though a citing author generally tends to make more or fewer erroneous citations, this tendency is further shaped by certain properties of the papers that are available for citation decisions. One citing author might, for example, tend to cite only a few sources but be especially inclined to reference papers that propose specific statistical methods. Another author tends to cite many papers, but mostly papers written by authors from European countries. Such interactions of citing-author and cited-paper properties vary from citing author to citing author. Stable citation pattern noise is not a random event in the citation process, but rather an interaction between citing author and cited papers that occurs systematically.

Pattern noise does not only occur systematically; it can also be attributed to random events. We refer to these (erroneous) random citation decisions as citation occasion noise (which is also pattern noise, but not stable pattern noise). We described such random citation decisions in the introduction by quoting Chen et al. (2024). Citation occasion noise arises, for example, when author Z of paper Z accidentally becomes aware of a text passage in another paper Y. The passage is taken up by author Z and inserted as a citation in paper Z. At another point in time, author Z would probably not have cited paper Y in paper Z (but another paper, or no paper at all). Or consider author X, who generally reads many papers by colleagues and therefore has a strong tendency to cite in paper X. Because the author has temporarily not been able to follow the research of colleagues due to a heavy teaching workload, the author cites less in a later paper Q (following paper X). This within-citing-author variability, which can be determined via test-retest reliability, is conceptually different from the stable between-citing-author differences: we call this variability based on transient effects citation occasion noise.

We have now described all types of noise that can occur in social citation systems according to Kahneman et al. (2021). Citation system noise is the undesirable variability in citation decisions on the available papers in a social citation system that can potentially be cited by multiple citing authors. Citation system noise includes two noise components, which can be separated when the same citing authors evaluate multiple papers that can potentially be cited: (1) Citation level noise is the variability in the author-specific tendency to make citation errors. The amount of citation level noise in a social citation system is small if all citing authors have a similar tendency to make erroneous citations. By contrast, there is a large amount of citation level noise if some authors make substantially more accurate citation decisions than others. (2) Citation pattern noise is the variability in citing authors’ responses to the characteristics of papers that can potentially be cited. Citation pattern noise is large if citing authors are very inconsistent, and small if they are very consistent, in their tendency to make erroneous citations. (2a) Stable citation pattern noise arises from the interaction between the characteristics of the citing author and the characteristics of the papers that they could cite. This pattern noise is rooted in the stable personal characteristics of the citing authors regarding their citation decisions. (2b) In addition to stable citation pattern noise, there is also citation occasion noise, which can be described as random error. Citation level noise (1) and citation pattern noise (2a, stable pattern noise, and 2b, occasion noise) both contribute to the overall system noise.

The overall citation system noise is thus the combination of citation level noise and citation pattern noise. Citation system noise is the total amount of citation noise in the social citation system.
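Formally, the two noise components combine as variances (consistent with the values reported for Table 1, since $\sqrt{0.06^2 + 0.17^2} \approx 0.18$):

$$\sigma_{SYS} = \sqrt{\sigma_{LN}^2 + \sigma_{PN}^2}$$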

We used the values 𝜎𝐿𝑁 = 0.06 and 𝜎𝑃𝑁 = 0.17 in Table 1 to compute the total amount of citation noise in the (fictitious) social citation system. The resulting 𝜎𝑆𝑌𝑆 = 0.18 indicates a certain level of noise in the system.
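A minimal sketch of this noise decomposition, using a small hypothetical error matrix (the numbers are illustrative only; they are not the values of Table 1, and equal paper counts per author are assumed for simplicity):

```python
import numpy as np

# Hypothetical error matrix for a toy social citation system:
# rows = citing authors, columns = their citing papers,
# entries = proportion of erroneous citations (PE_ij) in each paper.
PE = np.array([
    [0.5, 0.6, 0.4],   # citing author I
    [0.3, 0.2, 0.4],   # citing author II
    [0.4, 0.5, 0.3],   # citing author III
])

PE_author = PE.mean(axis=1)    # author-specific error rates
PE_system = PE_author.mean()   # system-wide error rate (equal paper counts)

# Citation level noise: between-author standard deviation of error rates.
sigma_LN = np.sqrt(np.mean((PE_author - PE_system) ** 2))

# Citation pattern noise: root mean square of the within-author SDs.
sigma_PN_i = PE.std(axis=1)
sigma_PN = np.sqrt(np.mean(sigma_PN_i ** 2))

# Citation system noise: the two components combine as variances.
sigma_SYS = np.sqrt(sigma_LN ** 2 + sigma_PN ** 2)
```

With equal numbers of citing papers per author, the resulting system noise equals the standard deviation of all entries of the matrix, i.e. the total variability of citation errors in the system.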

In addition to the various types of noise that can be identified in the social citation system, we must also deal with various citation biases that contribute to erroneous citations. Citation bias refers to a systematic, directional error in citation decisions. If several citing authors in Table 1 systematically exhibit citation errors due to certain factors (which are independent of knowledge flow and have a causal effect), we speak of citation bias. Citation decisions are biased if the cited papers are cited (significantly) more often or less often than would be expected if all citation decisions were accurate. In other words, citation bias is measured by the difference between the average number of realized citations ($TC_k$) and the average number of expected (accurate) citations ($EC_k$) in a set of cited papers:

$$\text{Citation bias} = \frac{1}{n_k}\sum_{k=1}^{n_k}\left(TC_k - EC_k\right)$$

In the formula, $n_k$ denotes the number of cited papers in the social citation system. In Table 1, for example, the cited papers should have received an average of 5.2 citations if all citation decisions were accurate [(10 + 5 + 2 + 4 + 5) / 5]. However, the papers actually received only 4.6 citations on average [(6 + 7 + 5 + 4 + 1) / 5]. Therefore, there is a downward citation bias in the social citation system in Table 1: on average, the cited papers are under-cited.

Citation biases arise in a social citation system when many errors in the system have a specific tendency that relates to characteristics of the people (authors), institutions, countries, journals, etc. involved in the system (i.e. causally affecting the system). As we mentioned above, Jannot et al. (2013), for example, found that there is a citation bias favoring statistically significant results in medical research. On average, therefore, more citations are (erroneously) attributed to papers with statistically significant results than to papers without these results that could have been cited equally. Mutz, Wolbring, and Daniel (2017) were able to show that papers labeled as ‘very important paper’ by a reputable journal received more citations than comparable papers (i.e. papers of similar quality) without this label. Accordingly, papers labeled as very important receive on average more (erroneous) citations than papers without this label, i.e. citations that are not covered by a corresponding knowledge flow from the cited to the citing paper.

Unlike citation noise, biases in citation decisions have already been dealt with in many ways in bibliometric research (Kousha & Thelwall, 2024). Although we can assume that citation bias plays at least as great a role as citation noise in citation decisions, the two phenomena are treated very differently in bibliometric research. The difference in treatment is probably because – on the one hand – scientists tend to explain phenomena, and we can explain biases by linking them to a specific triggering factor. Scientists therefore try to assign a reason or factor to certain citation decisions, such as papers labeled as ‘very important paper’. Biases in citation decisions can relate to characteristics of the citing author (such as their school of thought) or characteristics of the cited paper (such as the school of thought of the cited authors). Citation noise – on the other hand – is a statistical phenomenon that can hardly be linked to specific characteristics: noise can only be identified if the distribution of an ensemble of citation decisions is statistically analyzed.

3 Strategies to reduce citation noise and citation bias

Over the past decade, several influential reports and initiatives – a “professional reform movement” (Rushforth & Hammarfelt, 2023, p. 879) – have stressed the need for careful, responsible use of bibliometric indicators in research evaluation. Key examples include the Leiden Manifesto (Hicks, Wouters, Waltman, de Rijcke, & Rafols, 2015), the United Kingdom (UK) Metric Tide report (Wilsdon et al., 2015), and a European Union (EU) agreement on research assessment reform (Coalition for Advancing Research Assessment, CoARA, see https://coara.eu). In parallel, agreements such as the Declaration on Research Assessment (DORA, see https://sfdora.org/) have pushed to limit the reliance on journal-based impact metrics, while the More Than Our Rank initiative (see https://inorms.net/more-than-our-rank/) highlights that university rankings – which often incorporate citation-based indicators – fail to capture essential aspects of an institution’s performance (Thelwall, 2024). Collectively, this reform movement underscores the limitations of depending solely on bibliometric measures in research evaluation. According to Thelwall et al. (2023), the reform movement delivers “strong arguments against using citations for aspects of research assessment” (p. 941).

Even though the reform movement strives for wide-reaching changes in research assessment (Rushforth & Hammarfelt, 2023), this movement is concerned neither with the accuracy of citation decisions and the distinction between accurate and erroneous citations, nor with improving the validity of citation metrics by reducing erroneous citations. When it comes to discussing the validity of citation metrics, the movement criticizes not the behavior of citing authors, who often do not make accurate citation decisions, but rather the evaluation system that uses citation analyses. This, however, overlooks the fact that citation practices play a crucial role in the validity of citation metrics. Even though citing publications in everyday writing seems routine, each citation decision in a study should be guided by value-based criteria related to knowledge flow. Careful citation should be a matter of course for scientists; scientists know that citations are used for performance metrics to account for (their) scientific work. Even one of the most commonplace scholarly actions – the citation of publications – is embedded in a broader evaluative context, subtly shaping and reflecting both disciplinary norms and administrative expectations (Penders, 2018).

With this study, we argue that the professional reform movement in research evaluation should (also) devote itself to improving citation decisions – especially if it is a movement that deals with citation-based bibliometric indicators. Only if research assessment processes are largely based on accurate citation decisions can citation analysis be used to assign credit (in terms of citations) accurately and to measure research performance (knowledge flow) reliably and validly: “citation-based metrics have proliferated as proxies for quality and impact over the years, only to be currently subjected to significant and highly relevant critique. To cite well, or to reference responsibly, is thus a matter of concern to all scientists” (Penders, 2018). For the reliable and valid use of citation analysis in research evaluation processes, it is also important to consider what can and cannot be measured by citations: accurate citations can be used to measure the usefulness of published research, as they represent knowledge flow. Thus, citations do not primarily measure the quality or accuracy of research but formally communicated knowledge flow. However, if we assume that citations represent formally communicated knowledge flow, we can assume with a high degree of probability that knowledge flow primarily concerns research of high quality and accuracy.

Before we move on to strategies for reducing citation noise (and bias) in the social citation system, we would first like to raise awareness of the topic of citation noise. In doing so, we will also return to citation noise in the context of patent citations, where it has already been dealt with (including strategies for its reduction). Unlike citation bias, which is a systematic error in one direction (for example, favoring citations of papers reporting statistically significant results), noise is random and unpredictable, referring to the variability or inconsistency in citation decisions made by citing authors. The sources of citation noise are diverse and range from coercive citation practices (an editor or a peer reviewer of a journal compels an author to include irrelevant or unwarranted citations as a condition for publication) to limited access to the scientific literature that could (should) be cited. In contrast to citation bias, citation noise is more difficult to detect or explain. We suspect that this is an important reason why citation noise has hardly played a role in bibliometrics so far – although noise is certainly largely responsible for the dissatisfaction with the informative value of citation analysis.

In contrast to bibliometrics, dealing with citation noise plays a major role in patent analysis. Patent citations should represent (intensity of) knowledge flow. Cited and citing patents are seen as proxies of knowledge flow; cited references represent how previous knowledge has been combined in citing patents to produce new knowledge. Noise identification in the knowledge flow between patents is primarily about the distinction between relevant and irrelevant prior art citations. Smith (2014) describes the emergence of citation noise in patents as follows: “Citation noise occurs due to timing in the patenting process, the point in time when the prior art references are identified and the amended state of the patent claims at that point in time. Citation noise also occurs due to different perspectives and comprehension concerning the technology and invention. Prior art references may be provided by many actors such as the inventor with the patent application, a searcher during a patent office search phase, an examiner during a patent examination phase, or an interested third party after publication of the application. Inventor-supplied citations are typically addressed before filing a patent application and are typically not relevant at the time of filing the patent application. Searcher-supplied citations and third party supplied citations may arrive later, after amendments. They may not be relevant due to the preceding amendments that redefine the invention” (pp. 36-37).

According to Smith (2014), current citation-based patent evaluation methods are constrained by citation noise, which may lead to deficient indicators of the economic value of patents and may thus obscure valuable economic insights. We can assume that we have a similar problem with evaluative citation impact analyses in bibliometrics – if half of the citations are of the perfunctory and persuasive type (Bornmann & Daniel, 2008) and 10% to 20% of citations are inaccurate (Wakeling et al., in press). By focusing on relevant (accurate) patent citations, noise may be eliminated, uncovering hidden connections (for example, strategic relationships) and enhancing technology and invention assessments. A patent may include backward-pointing citations that refer to earlier prior art, and it might also serve as a forward citation for later patents. Combined, these two types of links establish a network of meaningful relationships between patents and their citations – provided they can be discerned amid the background noise of patent citation data.

Just as Smith (2014) points out the importance of accurate citations in patent analysis, ensuring citation accuracy is also crucial in bibliometric citation analysis. The goal should be the reduction of error in citation data, which covers both bias and noise. More than ten years ago, Murphy (2011) already pointed out that “the emergence of PubMed has resulted in a citation explosion and a resulting quagmire as many authors are just not selective about referencing papers and tend to be a bit over zealous in citing more and more papers many of which are not on the topic” (p. 307). The author was troubled by the growing trend of authors citing papers they have not thoroughly read. His concern is that many of these authors may have engaged with nothing beyond a paper’s title. Such a decision to cite a work is not based on a comprehensive understanding, but rather on a superficial impression gleaned solely from the title or abstract. These citation decisions cannot represent knowledge flow and are erroneous. However, careless and superficial citation is only one side of the citation-error coin; the other side is plagiarism by authors. In plagiarism, authors take content from another publication, but this knowledge flow is not indicated by a citation (Masic, 2013). The authors have made use of a knowledge flow but fail to insert a citation in their text, although an accurate citation decision would have required one.

In the following, we would like to present some methods with which citation errors (noise) can be reduced and the proportion of accurate citations can be increased. On the one hand, this involves the method of aggregating citation decisions, which is common in bibliometrics; on the other hand, it involves measures for improving citation decision hygiene, such as guidelines and training for accurate citation decisions. These measures aim to reduce citation noise overall, i.e., without focusing on specific citation errors.

3.1 Aggregation of citation decisions

Aggregating citations in citation analysis builds on what Galton (1907) termed the ‘wisdom of crowds’ at the dawn of the twentieth century. Galton reported on a contest held at an English livestock show, where 787 participants estimated the weight of a publicly displayed ox. Remarkably, the median estimate of 1,207 pounds was within less than 1% of the ox’s actual weight of 1,198 pounds. Estimating the ox’s weight by averaging over a large number of independent estimates was probably one of the first demonstrations of the advantage of the crowd over a single opinion: when a sizable group offers judgments or estimates, the combined result tends to be highly accurate. The insight of Galton (1907) can also be applied to citation analysis. Citation analysis relies on the judgments of many researchers: a paper is widely cited if it proves useful to many researchers, and less so if it does not.

The insights of Galton (1907) from the estimation of the weight of an ox form the foundation for the influential book ‘The wisdom of crowds’ by Surowiecki (2004), who emphasizes that not all crowds make sound decisions – the collective judgment achieves high accuracy only when individual opinions are formed independently. If we apply this condition to bibliometrics, citation analysis can be expected to be highly reliable. First, citations reflect the evaluations of many scientists. Since there is hardly any scientist who does not summarize their research results in a publication, and scientists cite in these publications, we are talking about an enormously large community of citing scientists. Even when publications are not cited at all, this ultimately rests on the citation decisions of citing scientists. Second, most of these citations arise from independent assessments by individual researchers, with only limited instances where external factors (such as publisher or reviewer citation suggestions) might influence the decision to cite. The influence of external factors on citation decisions could certainly be reduced if citing authors were made more aware of the issues of citation accuracy, bias, and noise.

When each author’s citation decision is modeled as a binary outcome (with a probability $p$ of citing a paper), the variance of that decision is given by $p(1-p)$. We have already dealt with the calculation of the variance in the previous section. If the citation decisions are independent, the sum of many such binary variables (with the number $n$) follows a binomial distribution, and – by the central limit theorem – the standard error of the average decreases proportionally to $1/\sqrt{n}$. Although citation counts are usually presented as sums, the underlying reliability of the aggregated measure improves with the number of independent decisions to cite or not. Two (extreme) examples (Table 2 and Table 3 in the Appendix) illustrate the relationship between citation bias and citation noise, on the one hand, and aggregated citation counts, on the other hand. The social citation systems in Table 2 and Table 3 deviate slightly from the scenario in Table 1: Table 1 contains five cited papers and 10 citing papers, Table 2 contains only four citing papers, and Table 3 contains 11 citing papers. These changes were made to illustrate the two ways in which incorrectly given and incorrectly missing citations can average out on the aggregate level.
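The $1/\sqrt{n}$ behavior can be checked with a short simulation; the citing probability $p$ and the sample sizes are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)
p = 0.3  # assumed probability that an author cites a given paper (illustrative)

results = {}
for n in (10, 100, 1000):
    # Theoretical standard error of the mean of n binary citation decisions.
    se_theory = np.sqrt(p * (1 - p) / n)
    # Empirical standard deviation of the mean across 20,000 simulated systems.
    means = rng.binomial(1, p, size=(20000, n)).mean(axis=1)
    results[n] = (se_theory, means.std())
```

Multiplying the number of independent citation decisions by 100 (from 10 to 1,000) shrinks the standard error by exactly a factor of 10.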

The first example in Table 2 is a fictitious social citation system in which each cited paper is either over-cited ($TC_k > EC_k$) or under-cited ($TC_k < EC_k$), despite the absence of citation bias and citation noise. The second example in Table 3 is another fictitious social citation system in which every cited paper has been accurately cited ($TC_k = EC_k$) even though there is an extreme amount of citation noise in the system. In both examples, the citations are unbiased because incorrectly given and incorrectly missing citations cancel out when the citations are aggregated across cited papers. The examples demonstrate that the larger the number of cited papers, the larger the improvement in citation accuracy gained by aggregating citation counts. However, aggregating citation counts across numerous cited papers does not ensure accuracy at the level of individual cited papers. The accuracy of an individual cited paper’s citation counts strongly depends on the amount of citation noise in the system and on the number of independent citation decisions made by potentially citing authors. The greater the number of independent citation decisions per cited paper, the higher the likelihood that incorrectly given and incorrectly missing citations cancel out at the individual paper level.
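The cancellation at the aggregate level can be sketched with hypothetical counts (the numbers are illustrative, not those of Table 2 or Table 3):

```python
import numpy as np

# Hypothetical expected (accurate) and realized citation counts for five
# cited papers in a toy social citation system.
EC = np.array([10, 5, 2, 4, 5])   # citations under fully accurate decisions
TC = np.array([8, 7, 3, 3, 5])    # realized citations under noisy decisions

errors = TC - EC                  # per-paper citation errors
bias = errors.mean()              # errors cancel out at the aggregate level
mean_abs_error = np.abs(errors).mean()  # individual papers are still off
```

The aggregate bias is zero even though most individual papers are over- or under-cited: aggregation hides noise without removing it at the paper level.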

In summary, citation analysis profits from two distinct types of aggregation depending on the level of the analysis: Aggregating citation counts across cited papers helps to combat the undesirable consequences of citation noise in sufficiently large samples. On the level of individual cited papers, the same is achieved by aggregating independent citation decisions. This means that citation counts can be reliable even for papers with very few (or even no) citations as long as a large enough number of citing authors made the independent decision not to cite these papers. The amount of aggregation required to ensure the reliability of citation counts depends on the amount of citation system noise: The larger the amount of citation noise in the system, the more aggregation is required to perform valid and reliable citation analysis. Although not dealing with citation noise in citation decisions, van Raan (2005) argues for the method of aggregation in targeting citation errors: “So undoubtedly the process of citation is a complex one, and it certainly not provides an ‘ideal’ monitor on scientific performance. This is particularly the case at a statistically low aggregation level, for example the individual researcher. There is, however, sufficient evidence that these reference motives are not so different or ‘randomly given’ to such an extent that the phenomenon of citation would lose its role as a reliable measure of impact. Therefore, application of citation analysis to the entire work, the ‘oeuvre’ of a group of researchers as a whole over a longer period of time (author’s emphasis), does yield in many situations a strong indicator of scientific performance” (pp. 134-135).

Aggregating multiple independent judgments harnesses the ‘wisdom of crowds’ concept by averaging individual assessments to diminish noise. As independent estimates are combined, the overall variability is reduced, leading to a more reliable consensus. While simple averaging is a common method among aggregation techniques in many areas where decisions are made, in citation analysis, sums of citations across papers are calculated to obtain an aggregated judgment beyond the individual judgments, which can contribute to noise reduction. Kahneman et al. (2021) point out regarding the aggregation of decisions in general that aggregated decisions may be less noisy, but not less biased, than the individual judgments. Citation biases are based on the average citation error of many citing authors. We suspect that the lack of reduction of biases in aggregated citation decisions is one reason why citation biases have been investigated very frequently in bibliometrics to date – in contrast to citation noise.
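The point of Kahneman et al. (2021) that aggregation reduces noise but not bias can be illustrated with a small simulation (all numbers are arbitrary and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 100.0   # the 'correct' judgment (arbitrary illustrative scale)
bias = 5.0           # shared directional error across all judges
noise_sd = 20.0      # idiosyncratic variability of individual judgments

# 10,000 panels of 50 independent judgments, all sharing the same bias.
judgments = true_value + bias + rng.normal(0.0, noise_sd, size=(10000, 50))
aggregated = judgments.mean(axis=1)

single_sd = judgments.std()                # noise of individual judgments
agg_sd = aggregated.std()                  # noise after averaging (~1/sqrt(50))
agg_bias = aggregated.mean() - true_value  # the bias survives aggregation
```

Averaging 50 judgments shrinks the noise by roughly a factor of seven, while the shared directional error passes through aggregation unchanged.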

3.2 Citation decision hygiene

In addition to the aggregation of citations, another strategy for improving the informative value of citation analyses is to adopt various measures that improve citation decision hygiene; these measures are presented in this section.

3.2.1 Guidelines for ensuring accurate citations

We would like to begin with the development of guidelines for ensuring accurate citations that could serve as a guide for scientists in the academic writing process. These guidelines should help citing authors to ensure that citation decisions are not influenced by superficial reading, individual preferences or temporary moods, but are based on the experience of knowledge flow in the writing process. If knowledge elements in the manuscript originate from another work that has already been published, then this is knowledge flow that should be indicated. The useful knowledge for the citing author may, for example, be an empirical result, a certain theory or a specific method that was introduced and described in a previous work.

Superficial reading, individual preferences, and temporary moods are sources of citation noise. Although scholarly publishing relies on peer review and editorial oversight, many citations in a paper do not correspond to the content of a previous work or do not support the statement they accompany (Wakeling et al., in press). In this paper, we argue not only that citation guidelines should be developed to improve citation decision hygiene among citing authors, but also that authors should be encouraged to follow these guidelines through appropriate journal policies that guide not only authors but also reviewers and editors. Reviewers and editors can be instrumental in detecting citation errors; systematic research into peer review practices for verifying citation accuracy is therefore important for determining which guidelines are necessary (Wakeling et al., in press). According to our research, citation guidelines for scientists already exist, but they generally deal with the format and style of citations (for example, Harvard style or Chicago style). Following citation styles may reduce citation noise caused by cited-reference errors in literature databases, but these guidelines do not provide any guidance on when it is necessary to cite a particular publication and when it is not.

In the past, when (evaluative) citation analysis in bibliometrics focused on relevant citations, this focus was not ex-ante – at the level of citing authors’ decisions – but ex-post – based on the empirical classification of citations in papers. These classifications have been carried out in so-called citation context analyses since the beginning of bibliometrics in the 1960s. An overview of the studies can be found in Bornmann and Daniel (2008) and Tahamtan and Bornmann (2019). In these studies, citations of publications are assigned to the affirmative, assumptive, conceptual, contrastive, methodological, negational, perfunctory, or persuasive type via the surrounding text in the analyzed manuscript. While early citation context analyses were mainly based on small datasets (due to the high – manual – effort required to code the citation contexts), analyses are now possible using more extensive datasets. In 2021, Clarivate started to assign citations in papers to specific categories (Clarivate, 2022), which makes it possible to separate substantial from superficial citations in the WoS. This separation makes it possible in principle to limit citation analyses to substantial citations and thus reduce noise in citation data.

However, the use of citation context data has several disadvantages. On the one hand, considerable effort is required to classify all citations in papers ex-post. So far, only Clarivate (and no other database provider) has started to make citation context data available on a larger scale, and this data is not available for all papers in the WoS database. On the other hand, there are various procedures for classifying citation context, which relate to the classification scheme used, the algorithm for assigning context to categories, and the definition of the amount of text around a cited paper that is used for classification. Against the background of the disadvantages of an ex-post-oriented procedure, we consider it a better strategy for increasing the informative value of citation analyses to improve the citation decisions of authors ex-ante by means of citation guidelines. Just as it is standard when writing a paper to follow a certain citation style (such as the style of the American Psychological Association), it should be standard to insert a citation into one’s own text in a well-considered manner – in the case of a knowledge flow. We assume that citation guidelines can be an efficient mechanism for the reduction of noise, because they specify exactly when a citation should be made, thereby reducing the variance between the authors’ citation decisions.

In section 2 and thereafter, we defined the following citation practice as the citation norm: another paper is cited exactly when a knowledge flow has taken place. We wondered how such citation norms are perceived by scientists. Do these norms play a (major) role in scientific work? Bruton, Macchione, Brown, and Hosseini (2025) conducted a survey on these and related questions about ethically questionable citation behaviors among 257 United States (US) researchers receiving federal funding from the US National Institutes of Health, the US National Science Foundation, and the US National Endowment for the Humanities. Based on the respondents’ answers, the authors grouped ethically questionable citation behaviors into three categories: strategic citations, neglectful citations, and blind citations. Furthermore, the authors found that ethically questionable citation behaviors do not depend on the length of the scientific career: all scientists are affected. For the authors, the empirical results suggest that citation norms should be more clearly articulated in academia and that there should be improved guidance about citations. Their study shows that “rarely are the ethical norms of citations articulated clearly and meticulously” (Bruton et al., 2025).

To the best of our knowledge, one of the few guidelines on citing accurately was published by Murphy (2011). Although Murphy’s (2011) rules refer to different aspects of citation decisions, they essentially concern the indication of knowledge flow by the citing author:

  1. “Read and comprehend all of the literature that you are citing in your manuscript
  2. Cite the primary literature and the actual papers to which a particular discovery is attributed; if multiple citations need to be made, do so
  3. Cite the literature that agrees as well as that which disagrees with your point-of-view; be fair to multiple points-of-view
  4. Be as inclusive as possible, but know where to draw the line, citing the most pertinent literature and the original papers
  5. Use reviews judiciously in your manuscript and use them properly
  6. Do NOT cite reviews in lieu of citing the original literature
  7. Remember that PubMed [a literature database in biomedicine] goes back to 1966, but many worthwhile discoveries were made prior to that time and it is important to be historically accurate in your citations
  8. When citing your own work, do so to support your point-of-view or to put the current work into the proper context for the reader. Do not attempt to grow your own citation base solely on the dreaded ‘self-citation’” (p. 309).

In addition to this list of eight rules in the citation guidelines, the literature is otherwise rather fragmented when it comes to rules and necessities in connection with citation decisions. Penders (2018) points out, for example, that citations should be used to differentiate your own research from that of others, and that citations in papers can provide readers with references to other relevant literature: Omitting citations to relevant previous publications “can wrongfully suggest that your own publication is the origin of an idea, a question, a method, or a critique, thereby illegitimately appropriating them. Citations identify where ideas have come from, and consulting the cited works allows readers of your text to study them more closely, as well as to evaluate whether your use of them is appropriate” (Penders, 2018). In addition, the author opposes the use of superficial citations: “Ask yourself why you are citing prior work and which value you are attributing to it, and whether the answers to these questions are accessible to your readers” (Penders, 2018).

3.2.2 Citation justification table

To improve citation decision hygiene among authors, we propose as one component that every future paper contain a so-called citation justification table in the Appendix. In this table, the author of a paper indicates why a certain publication was cited at a certain point in the manuscript. In other words, the author enters in the table the knowledge flow for which a particular citation was inserted in the publication. As an example, we have generated such a table in the Appendix of this paper, where we explain the reasons for every citation included in this paper (see Table 4). References to such citation justification tables can already be found in Penders (2018): “Sources deserve credit for the exact contribution they offer, not their contribution in general. This may mean that you need to cite a single source multiple times throughout your own argument, including explanations or indications why”. By creating a citation justification table, authors are urged to cite relevant publications in particular (otherwise an author could not enter anything in the table). In addition, the information in these tables could be used to classify citations in the papers ex post (by providers of literature databases such as Clarivate). In this classification, however, it would no longer be a question of whether a citation is superficial or substantial, but rather which knowledge elements were taken from prior literature, such as a statistical method used, an inspiring idea, a certain empirical result, or an underlying theory. Thus, the meaningfulness of ex-post citation classifications would increase significantly.
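To make the proposal concrete, one row of such a citation justification table could hold three pieces of information: the cited work, the location of the citation, and the knowledge flow it marks. The schema below is purely illustrative; the paper prescribes no fixed format, and the field names and example locations are our own:

```python
# Hypothetical schema for a citation justification table; field names
# and locations are illustrative, not prescribed by the paper.
justification_table = [
    {
        'cited_work': 'Kahneman et al. (2021)',
        'location': 'Section 1, paragraph 2',
        'knowledge_flow': 'Distinction between level noise and pattern '
                          'noise, transferred to citation decisions',
    },
    {
        'cited_work': 'Merton (1973)',
        'location': 'Section 2, paragraph 1',
        'knowledge_flow': 'Normative framework of science underlying the '
                          'citation norm discussed in this paper',
    },
]

def check_table(rows):
    """A row passes only if it states a concrete knowledge flow."""
    return all(row['knowledge_flow'].strip() for row in rows)
```

A journal could run such a completeness check automatically at submission; the `knowledge_flow` field could later be mined for the ex-post classification of citations described above.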

3.2.3 Training of (young) researchers

As citation decision hygiene has hardly played a role in the training of early career researchers to date (to the best of our knowledge), we would consider it necessary for early career researchers to be trained in this area in the future. Young researchers should be trained in how to make accurate citations to the literature used in their own papers. During training, young academics should be taught the difference between substantial and superficial citations. Positive and negative examples from the scientific literature can be used for this purpose. The young researchers should learn about best citation practices in the literature and what is actually meant by knowledge flow. As early career researchers often base their scientific work on what has been published in the publication manuals of professional associations (such as the publication manual of the American Psychological Association, 2020), these manuals should also include rules to promote citation decision hygiene. Instructional videos on citation decision hygiene might be additionally helpful.

3.2.4 Correction of citation errors

Another element in citation decision hygiene could be the identification and correction of citation errors in publications that have already been published. Since scientific literature is usually published digitally these days, it should be possible to correct errors in citations retrospectively. However, this process should not only be in the hands of the citing author but should also be supported by staff at the relevant publisher. A correction should only be possible if the citing author can clearly demonstrate that the correction to a citation is necessary. We were only able to find one study that has dealt with the possible correction of citations by citing authors in the past. Wakeling et al. (in press) asked authors whether they had ever come across a citation of their work that they considered inappropriate and, if so, what measures they took. The results are as follows: “While 43.0% of respondents said they had never encountered an inappropriate citation of their work, 46.4% said they had, but had taken no action. 9.3% of respondents said they had contacted the authors of the citing article that inappropriately cited their work, while 3.1% said they had contacted the journal editor, and 0.7% the publisher” (Wakeling et al., in press). The results reveal that only a few authors have so far considered the possible correction of their works’ citations in other publications.

3.2.5 The use of AI for citation decision correction

As a final building block for the promotion of citation decision hygiene, we would like to mention the potential use of artificial intelligence (AI). In recent years, several application areas have been presented in the literature in which AI could be integrated into the work process of scientists. An overview of the various areas in which AI can perform, with astonishing speed and accuracy, tasks formerly regarded as quintessentially human can be found in Binz et al. (2025) and Wang et al. (2023). One of these areas involves suggesting scholarly references for a scientific text. For example, Algaba et al. (2024) dealt with this in an experiment based on 166 papers. The authors used the large language model (LLM) GPT-4 from OpenAI to suggest scholarly references for anonymized scientific texts. Even though the authors observed “a remarkable similarity between human and LLM citation patterns” and stated that LLMs “can aid in citation generation”, they also found that the generated citations “may also amplify existing biases and introduce new ones, potentially skewing scientific discourse” (Algaba et al., 2024). In an earlier study in the field of stem cell research, Khan et al. (2023) similarly point out that “ChatGPT has the ability to produce generally accurate references, although it was observed to occasionally generate artificial hallucinations” (p. 5275).

In this study, we do not advocate using AI in the academic writing process to insert citations into a text wherever knowledge flow from a previous publication can be assumed. Since only the author of a text can judge when a knowledge flow has taken place and when it has not, no AI can do this autonomously. However, no author is infallible in the writing process. We can therefore imagine AI being used in this process to identify text passages in a manuscript where a citation may have been forgotten by the author or where a citation may have been erroneously inserted. AI would therefore be used ex ante – before a scientific text is published – to check the citation decisions of authors as a debiasing and denoising instrument. AI would mark places in the text where it might be necessary to cite another work. This would be the case, for example, if the citing author uses knowledge that has already been published, refers to a theory without citing the underlying work, or forgets to cite the corresponding publication when using a method.

Following studies such as those presented by Khan et al. (2023) and Algaba et al. (2024), future experimental studies should examine whether AI can be used to inspect the citation decisions of authors. If this turns out to be possible, it should be determined how citing authors can use AI to check their citations. We could imagine AI being used by scientists even before the writing process begins, for example to identify previous research that is relevant to their own work: What has already been empirically researched? Which methods should be used? Which theories could be relevant? Since AI can draw on a broad base of scientific literature, it may be very helpful in these preparatory processes.

4 Discussion

Citations are a fundamental component of scientific communication, representing the flow of knowledge between research papers. Research papers include ‘codified knowledge’ “that is addressed to a large and partly anonymous audience” (Aman & Gläser, 2025, p. 158). Since the anonymous audience uses the codified knowledge as a basis for future research, citations of previous papers are integral components of the evaluation of research output and impact, forming the core of various ‘bibliometrics-based heuristics’ (Bornmann & Marewski, 2019). Bibliometrics-based heuristics are simplified decision-making strategies derived from the statistical analysis of paper and citation counts. These heuristics utilize bibliometric metadata to create mental shortcuts or rules of thumb for research evaluation processes. Since the accuracy of the underlying citation data is paramount for the validity of such evaluations and decision-making, this paper delves into the critical issues of citation noise and bias, which can distort the ‘true’ representation of knowledge flow and, consequently, the reliable and valid assessment of research output and impact. Arbitrary citations undermine the meaningfulness of citations for research evaluation. With its conceptualization of citation accuracy, citation noise, and citation bias in a knowledge flow framework of citations, this study is intended to provide a foundation of citation analysis in bibliometrics.

Citation accuracy refers to the precise attribution of knowledge flow from cited to citing papers, ensuring that each citation reflects a genuine intellectual contribution. Citation noise, on the other hand, introduces random variability into citation decisions, leading to inconsistencies and potential misrepresentation of research impact. For example, Teplitskiy, Duede, Menietti, and Lakhani (2022) tested whether citations reflect rhetorical usefulness or influence on research. The authors used “data on 17,154 randomly sampled citations collected via surveys from 9,380 corresponding authors in 15 fields” and found that “most citations (54%) had little-to-no influence on the citing authors”. In contrast to the random variability of citation noise, citation bias represents systematic deviations, where certain factors influence citation decisions beyond actual knowledge flow, such as the reputation of authors or journals. A social citation system in which the professional judgments of researchers can be seen as inconsistent, biased, and arbitrary loses credibility for its use in research evaluation.

To illustrate citation accuracy, citation bias, and citation noise, this study employs a fictitious social citation system, a simplified model of citing papers and cited papers. The system reveals the prevalence of citation errors and their possible influence on citation-based metrics. The statistical analysis of the system demonstrates that citation noise may lead to substantial distortions in citation impact measurements, undermining the reliability of these metrics in research evaluation. This paper also explores strategies to mitigate citation noise and bias. One strategy is the aggregation of citation decisions, leveraging the ‘wisdom of crowds’ to reduce the influence of individual errors. Improving citation decision hygiene through guidelines, training, citation justification tables, post-publication citation corrections, and the use of AI is proposed as another strategy. These measures aim to enhance the accuracy of citation decisions, ensuring that citations ‘truly’ reflect knowledge flow and contribute to a more equitable research evaluation system.
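The noise-reducing effect of aggregating citation decisions can be illustrated with a small simulation (our own sketch, not an analysis from this paper): the average of n independent, noisy judgments of a paper’s ‘true’ impact deviates less and less from that true value as n grows, roughly in proportion to 1/√n.

```python
import random
import statistics

random.seed(42)

TRUE_IMPACT = 10.0  # hypothetical 'true' impact of a paper
JUDGE_SD = 4.0      # hypothetical spread of individual judgments (noise)

def noisy_judgment():
    """One author's independent, noisy assessment of the paper."""
    return random.gauss(TRUE_IMPACT, JUDGE_SD)

def mean_abs_error(n_judges, trials=1000):
    """Average distance of the aggregated judgment from the true value."""
    errors = []
    for _ in range(trials):
        aggregate = statistics.mean(noisy_judgment() for _ in range(n_judges))
        errors.append(abs(aggregate - TRUE_IMPACT))
    return statistics.mean(errors)

# Error shrinks roughly with 1/sqrt(n): the 'wisdom of crowds' effect.
errors_by_crowd_size = {n: round(mean_abs_error(n), 2) for n in (1, 10, 100)}
```

The simulation only captures random (noise) errors; a shared bias in all judges would survive aggregation untouched, which is why aggregation is proposed against noise rather than bias.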

This study underscores the importance of addressing citation noise and bias to ensure the validity and reliability of citation-based metrics. By understanding the complexities of citation dynamics and implementing targeted strategies, the scientific community can foster a more accurate and fair system of research evaluation. While citation bias has been extensively studied in bibliometrics, citation noise has received little attention. We argue that citation noise is at least as problematic as citation bias, as it introduces random variability into citation decisions, undermining the reliability of citation-based metrics. Citation noise is variability in citation decisions that, ideally, would be identical across different authors. Variability is part of the scientific enterprise and part of citation decisions. However, when citation decisions are used to evaluate science, these decisions should be accurate. If they contain a lot of noise, they are not able to measure knowledge flow.

The problem of noise can be illustrated with two examples from bibliometrics: (1) Li, Lin, and Wu (2025) investigated the innovative roots of the seminal paper “On computable numbers, with an application to the Entscheidungsproblem” published by Turing (1937) based on its cited references. The authors’ analysis of “Turing’s seven references reveals a remarkably conventional citation pattern, rooted in mainstream mathematical and logical texts” (Li et al., 2025). Noise in citation data would make such an evaluation and conclusion by Li et al. (2025) impossible or ultimately incorrect. (2) Funk and Owen-Smith (2017) suggested the CD index for the patent system; Wu, Wang, and Evans (2019) proposed adapting this index to bibliometrics. The index measures whether a focal work (patent or paper) is able to disrupt its precursor works, based on the citing and cited works of the focal work. The indicator can only validly measure disruption if citing authors have accurately cited the focal work and its cited references: The index thus assumes “that a patent’s knowledge foundation is fully captured by its cited references, yet foundational knowledge may be unacknowledged or inaccurately cited in practice” (Yang, 2025). According to Liu, Zhang, and Li (2023), “diverse citation behaviors may result in inconsistencies between actual D [CD index] values and expected values”. Noise in citation data thus calls into question the meaningfulness of such indices in science of science studies. This noise may be tackled by advanced statistical modelling approaches that account for uncertainty in the calculation of CD index values (Mutz & Bornmann, 2023).
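For readers unfamiliar with the CD index, its computation can be sketched as follows. The three-way classification of later papers and the formula CD = (n_i − n_j) / (n_i + n_j + n_k) follow the published definition by Funk and Owen-Smith (2017) and Wu et al. (2019); the data representation (sets of cited identifiers, with the string `'focal'` marking a citation to the focal paper) is our own simplification:

```python
def cd_index(focal_refs, later_papers):
    """CD (disruption) index of a focal paper, following Funk & Owen-Smith
    (2017) and Wu et al. (2019): CD = (n_i - n_j) / (n_i + n_j + n_k).
    focal_refs:   set of identifiers the focal paper cites
    later_papers: iterable of sets; each set holds what one subsequent
                  paper cites, with 'focal' marking the focal paper itself
    """
    n_i = n_j = n_k = 0
    for cites in later_papers:
        cites_focal = 'focal' in cites
        cites_refs = bool(cites & focal_refs)
        if cites_focal and not cites_refs:
            n_i += 1  # cites the focal paper but ignores its roots (disruptive)
        elif cites_focal and cites_refs:
            n_j += 1  # cites the focal paper and its roots (consolidating)
        elif cites_refs:
            n_k += 1  # bypasses the focal paper, citing only its roots
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else 0.0

# A maximally disruptive focal paper: all later papers cite it alone.
score = cd_index({'ref_A', 'ref_B'}, [{'focal'}, {'focal'}, {'focal'}])
```

Because every one of the three counts depends on which references later authors actually list, any noise in those citation decisions propagates directly into the index value, which is the concern raised above.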

Over the past decade, a ‘professional reform movement’ has emerged, emphasizing the careful and responsible use of bibliometric indicators in research evaluation. Key initiatives include the Leiden Manifesto, the UK Metric Tide report, and the EU’s CoARA agreement, all of which advocate a more nuanced approach to research assessment. These efforts aim to reduce reliance on (journal-based) impact metrics and highlight the limitations of citation-based indicators in capturing the full performance of institutions. The DORA agreement and the More Than Our Rank initiative further stress the need to move beyond simplistic metrics, arguing that university rankings often fail to reflect essential aspects of institutional performance. Although the professional reform movement focuses on limiting the reliance on citation-based metrics, it does not address the root problem of citation analysis: citation errors. We argue that improving citation practices is essential for the valid use of citation analysis in research evaluation. This perspective implies that the responsibility for improving research evaluation lies not only with the evaluation system but also with citing authors. We certainly will not be able to prevent citation noise altogether (some citation noise may be inevitable in practice), but we should try to reduce it as much as possible. It will be a question for future research whether noise-reduced citation data reflect knowledge flow in the science system more reliably, and whether analyses based on noise-reduced data are more helpful in research evaluation processes than analyses based on data without noise reduction.

This study has several limitations. First, the statistical analysis is based on a fictitious social citation system, which simplifies the complexities of real-world social citation systems. While this approach is useful for illustrating key concepts, it may not fully capture the dynamics of citation decisions in actual scientific communities. Second, the paper only partly provides empirical evidence for the prevalence of citation noise in real-world citation data. This evidence can be derived, for example, from citation context studies. Future research should address this gap by conducting (large-scale) empirical studies to quantify the extent of citation noise and its impact on citation-based metrics. The intention of this conceptual paper was to introduce the general accuracy and noise framework for citation decisions. Third, the paper’s focus on citation noise may overlook the interplay between noise and bias in citation decisions. For example, certain biases may contribute to noise, and vice versa, and understanding these interactions is crucial for developing effective strategies to improve citation accuracy.

Fourth, we argue in this study that citations should only be included in a manuscript if knowledge flow has happened. Although this rule sounds trivial, it may be difficult to reach this goal in every case. On the one hand, not all citing authors have the same standards: “variation in [citation] error rates … may very well reflect the different standards by which those researchers are judging what constitutes an error, rather than error rates in an objective sense” (Wakeling et al., in press). However, a lack of standards may be the result of a lack of training. On the other hand, citing authors usually follow the principle of ‘obliteration by incorporation’ (Merton, 1965). This principle applies when an idea is so successful and fundamental that it becomes standard knowledge that no longer needs a citation to the original work: the work becomes invisible in the formal citation network. For example, we used the standard deviation to measure citation noise in this study. This statistic was popularized by Fisher (1925) in his book “Statistical methods for research workers”. Although the statistic is heavily used in the empirical social sciences and beyond, the book is scarcely cited alongside its use; the standard deviation has become a basic statistic, much like the arithmetic mean. Obliteration by incorporation may be seen as part of an overarching phenomenon: Authors draw from “deeper or broader knowledge reservoirs than what is reflected in the paper’s references” (Schilling & Green, 2011, p. 1325).

To address the limitations of this paper, several avenues for future research can be proposed:

First, empirical studies are needed to investigate the prevalence and impact of citation noise in real-world citation data. Such studies could analyze the variability in citation decisions (noise) across different fields. The extent of citation noise can be measured in citation noise audits: experiments in which several experts in a field make independent citation decisions for the same scholarly text so that the variability in the decisions can be measured. Empirical studies are also needed to assess the extent to which noise affects citation-based metrics. Here, it would also be interesting to quantify how much of the citation pattern noise is stable and how much is occasion noise, as these two components of pattern noise require different interventions to reduce the overall noise in citation decisions. Future research could also address the heuristics (rules of thumb) authors use to cite certain papers. If these heuristics are known, best-practice heuristics can be developed to ensure accurate citation decisions.
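A citation noise audit of this kind could be evaluated with a very simple dispersion measure. The data below are hypothetical; the per-reference standard deviation of the experts’ binary cite/do-not-cite votes is one possible operationalization of noise, not a measure prescribed in the literature:

```python
import statistics

# Hypothetical audit: four experts independently decide (1 = cite,
# 0 = do not cite) for five candidate references in the same text.
votes = {
    'ref_1': [1, 1, 1, 1],  # full agreement: no noise
    'ref_2': [1, 0, 1, 0],  # maximal disagreement
    'ref_3': [1, 1, 1, 0],
    'ref_4': [0, 0, 0, 0],  # full agreement not to cite
    'ref_5': [1, 0, 0, 0],
}

def decision_noise(decisions):
    """Population standard deviation of the votes; 0 = perfect agreement."""
    return statistics.pstdev(decisions)

noise_per_ref = {ref: round(decision_noise(v), 2) for ref, v in votes.items()}
overall_noise = round(statistics.mean(noise_per_ref.values()), 2)
```

References with full agreement contribute nothing to the overall score; the audit's field-level noise is driven entirely by the references on which the experts split.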

Second, future research could investigate the number of citation decisions not only in favor of, but also against cited papers to determine the reliability of citations. Quantifying the reliability of citations requires data on citation decisions that did not lead to citations. One potential source of data for such studies is Mendeley (see https://www.mendeley.com). Mendeley is a reference management software that provides data on the number of users (readers) that added a certain work to their personal library. Research has shown that Mendeley reader counts can be used as indicators for the future use and citation of publications (Mohammadi, Thelwall, & Kousha, 2016; Thelwall, 2018). We assume that the difference between a publication’s Mendeley reader counts and its citation counts can be used to estimate the number of citation decisions that did not manifest in citations.
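The proposed estimate is a simple difference. Below is a minimal sketch with hypothetical reader and citation counts; clipping negative differences to zero is our own assumption, since a paper can accumulate more citations than Mendeley readers:

```python
# Hypothetical records pairing Mendeley reader counts with citation counts.
papers = [
    {'id': 'P1', 'readers': 120, 'citations': 45},
    {'id': 'P2', 'readers': 30,  'citations': 28},
    {'id': 'P3', 'readers': 15,  'citations': 40},  # cited more than read
]

def non_citing_readers(paper):
    """Rough proxy for citation decisions that did not manifest in a
    citation: readers who saved the paper but did not cite it. Negative
    differences are clipped to zero (our own assumption)."""
    return max(paper['readers'] - paper['citations'], 0)

estimates = {p['id']: non_citing_readers(p) for p in papers}
```

The proxy is of course coarse: readers and citers are overlapping but not identical populations, which is one reason such estimates would need empirical validation.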

An alternative strategy to investigate citation noise and accuracy may be to perform empirical studies in rather small fields with a manageable amount of literature. Koffi (2025) proposes the ‘omission indicator’ to identify papers that should have been cited in citing papers. The ‘omission indicator’ is “a binary variable that detects when an article fails to cite a relevant previously published paper. Indeed, having determined the similarity score for every article pair [in a certain field], one can isolate the most similar articles to a given paper based on textual distance. Within this ‘most similar set,’ if a subsequent paper overlooks a preceding one despite several similarities, the omission indicator flags it with a value equal to one (or zero otherwise)” (p. 2208). This indicator may be used to estimate the accuracy of citation decisions and the reliability of citations in the respective field.
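The logic of Koffi’s (2025) omission indicator can be approximated in a few lines. The sketch below is our own illustration: it uses a plain bag-of-words cosine similarity and an arbitrary threshold, whereas the original work determines similarity scores for every article pair in a field with a more sophisticated textual-distance measure:

```python
from collections import Counter
from math import sqrt

def cosine(text_a, text_b):
    """Bag-of-words cosine similarity between two texts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def omission_indicator(paper_text, cited_ids, prior_papers, threshold=0.5):
    """1 if the paper fails to cite a highly similar prior paper, else 0.
    prior_papers maps paper ids to their texts; the threshold is an
    illustrative cutoff, not the one used by Koffi (2025)."""
    return int(any(
        pid not in cited_ids and cosine(paper_text, text) >= threshold
        for pid, text in prior_papers.items()
    ))

prior_corpus = {
    'paper_A': 'citation noise in research evaluation',
    'paper_B': 'deep learning for protein structures',
}
# The new paper is very similar to paper_A but cites only paper_B.
flag = omission_indicator('noise in citation based research evaluation',
                          {'paper_B'}, prior_corpus)
```

Averaged over all papers in a small, well-delineated field, such a flag could serve as the field-level estimate of citation accuracy suggested above.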

Third, future research could evaluate the effectiveness of the strategies proposed in this paper for reducing citation noise and bias. For example, studies could assess the impact of citation guidelines, training programs, and AI-based tools on citation accuracy. These evaluations should consider not only the quantitative effects on citation metrics, but also the qualitative changes in citation practices and the perceptions of scientists regarding the role of citations in research evaluation. In this study, we focused on several strategies for improving citation accuracy. Other strategies are possible as well and could be elaborated in future studies, such as focusing evaluative citation analyses on journals that implement measures to improve citation accuracy, for example by demanding citation justification tables from authors.

Fourth, future research could address possibly contradictory effects of the strategies proposed above against citation noise and bias. On the one hand, AI-based tools may help improve the citation decision hygiene of authors. On the other hand, the use of such tools may lead to citation inflation in future papers, whereby papers cite additional works that should not be cited at all.

Fifth, the application of citation justification tables to promote citation decision hygiene could be tested in real-world settings. These tables could be implemented in academic writing workflows, and their effectiveness in reducing citation noise and improving the validity of citation-based metrics could be assessed through empirical studies. We could imagine that citation justification tables are used in future studies to measure – based on the explanations of the citing authors – the content of knowledge flow and the impact strength exercised by the cited paper on the citing publication.

5 Conclusions

Kahneman et al. (2021) provide a valuable framework for understanding the variability in human decisions. The framework can reasonably be applied in contexts involving recurrent decisions (as opposed to single decisions). This is the case with citation decisions, where interchangeable authors in a certain field make citation decisions in papers that are standardized (in style, content, structure, etc.), drawing on a pool of previous papers available for possible citing. The existence of noise in citation decisions has significant implications for the integrity and reliability of research evaluation. By understanding and addressing the sources of noise, researchers and institutions can work towards creating a more consistent, fair, and meaningful bibliometric evaluation system. This not only enhances the quality of research evaluation processes but also strengthens trust in the academic enterprise. When citation decisions appear arbitrary or biased, trust in the academic evaluation system can diminish. By applying the framework proposed by Kahneman et al. (2021) to the context of citation decisions, we can identify strategies to mitigate noise and improve the overall rigor of academic citation practices.

This paper highlights the importance of addressing citation noise in citation analysis and argues that improving citation accuracy is essential for the reliable and valid use of citation-based metrics in research evaluation. The practical relevance of citation noise for bibliometric research evaluation depends on the level of aggregation. Performing bibliometrics on a high level of aggregation especially requires the assumption that citation decisions are unbiased. For example, citation data can be used to compare the scientific output of research institutions if the average number of citations for each institution is unbiased. Whereas the citation counts of large aggregates of publications are mostly subject to bias, the citation counts of individual papers may be subject to both bias and citation noise. The lower the level of aggregation, the more destructive are the potential consequences of citation noise. The bibliometric evaluation of individual papers or authors may only yield valid results if citation noise is below a critical threshold. While this paper provides a theoretical foundation for understanding citation analyses and proposes strategies to reduce citation noise, further research is needed to translate these ideas into practice. By addressing the limitations of this study and exploring the controversial aspects of citation noise, future research can contribute to a more accurate and equitable system of research evaluation.

References

Aksnes, D. W., Langfeldt, L., & Wouters, P. (2019). Citations, citation indicators, and research quality: An overview of basic concepts and theories. Sage Open, 9(1). doi: 10.1177/2158244019829575.

Algaba, A., Mazijn, C., Holst, V., Tori, F., Wenmackers, S., & Ginis, V. (2024). Large language models reflect human citation patterns with a heightened citation bias. Retrieved May 31, 2024, from https://arxiv.org/abs/2405.15739

Aman, V., & Gläser, J. (2025). Investigating knowledge flows in scientific communities: The potential of bibliometric methods. Minerva, 63(1), 155-182. doi: 10.1007/s11024-024-09542-2.

American Psychological Association. (2020). Publication manual of the American Psychological Association (7. ed.). Washington, DC, USA: American Psychological Association (APA).

Binz, M., Alaniz, S., Roskies, A., Aczel, B., Bergstrom, C. T., Allen, C., . . . Schulz, E. (2025). How should the advancement of large language models affect the practice of science? Proceedings of the National Academy of Sciences, 122(5), e2401227121. doi: 10.1073/pnas.2401227121.

Bornmann, L., & Daniel, H.-D. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64(1), 45-80. doi: 10.1108/00220410810844150.

Bornmann, L., & Marewski, J. N. (2019). Heuristics as conceptual lens for understanding and studying the usage of bibliometrics in research evaluation. Scientometrics, 120(2), 419-459. doi: 10.1007/s11192-019-03018-x.

Bruton, S. V., Macchione, A. L., Brown, M., & Hosseini, M. (2025). Citation ethics: An exploratory survey of norms and behaviors. Journal of Academic Ethics, 23, 329-346. doi: 10.1007/s10805-024-09539-2.

Cawkell, A. E. (1969). Citation noise. Aslib Proceedings, 21(11), 467.

Chen, B., Murray, D., Liu, Y., & Barabási, A.-L. (2024). The origin, consequence, and visibility of criticism in science. Retrieved December 19, 2024, from https://arxiv.org/abs/2412.02809

Clarivate. (2022). Citation context in Web of Science. Retrieved January 28, 2025, from https://clarivate.com/webofsciencegroup/wp-content/uploads/sites/2/dlm_uploads/2022/05/202205-WoS-citation-context-final.pdf

Donner, P., Stahlschmidt, S., Haunschild, R., & Bornmann, L. (in press). Does citation context information enhance the validity of citation analysis for measuring research quality? An empirical comparison of peer assessments and enriched citations. Quantitative Science Studies. doi: 10.1162/qss.a.15.

Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh, UK: Oliver & Boyd.

Funk, R. J., & Owen-Smith, J. (2017). A dynamic network measure of technological change. Management Science, 63(3), 791-817. doi: 10.1287/mnsc.2015.2366.

Galton, F. (1907). Vox populi. Nature, 75, 450-451. doi: 10.1038/075450a0.

Gavras, H. (2002). Inappropriate attribution: The “lazy author syndrome”. American Journal of Hypertension, 15(9), 831. doi: 10.1016/s0895-7061(02)02989-8.

Gilbert, G. N. (1977). Referencing as persuasion. Social Studies of Science, 7(1), 113-122.

Hicks, D., Wouters, P., Waltman, L., de Rijcke, S., & Rafols, I. (2015). Bibliometrics: The Leiden Manifesto for research metrics. Nature, 520(7548), 429-431. doi: 10.1038/520429a.

Jaffe, A. B., Trajtenberg, M., & Fogarty, M. S. (2000). Knowledge spillovers and patent citations: Evidence from a survey of inventors. American Economic Review, 90(2), 215-218. doi: 10.1257/aer.90.2.215.

Jannot, A. S., Agoritsas, T., Gayet-Ageron, A., & Perneger, T. V. (2013). Citation bias favoring statistically significant studies was present in medical research. Journal of Clinical Epidemiology, 66(3), 296-301. doi: 10.1016/j.jclinepi.2012.09.015.

Jergas, H., & Baethge, C. (2015). Quotation accuracy in medical journal articles: A systematic review and meta-analysis. PeerJ, 3, e1364. doi: 10.7717/peerj.1364.

Kahneman, D., Sibony, O., & Sunstein, C. R. (2021). Noise: A flaw in human judgment. New York, NY, USA: Little, Brown.

Khan, S., Banu S, A., Pawde, A., Kumar, R., Akash, S., Dhama, K., & Amarpal, A. (2023). ChatGPT and artificial hallucinations in stem cell research: Assessing the accuracy of generated references – a preliminary study. Annals of Medicine and Surgery, 85, 5275-5278. doi: 10.1097/MS9.0000000000001228.

Koffi, M. (2025). Innovative ideas and gender (in)equality. American Economic Review, 115(7), 2207–2236. doi: 10.1257/aer.20211811.

Kousha, K., & Thelwall, M. (2024). Factors associating with or predicting more cited or higher quality journal articles: An Annual Review of Information Science and Technology (ARIST) paper. Journal of the Association for Information Science and Technology, 75(3), 215-244. doi: 10.1002/asi.24810.

Li, L., Lin, Y., & Wu, L. (2025). Can recombination displace dominant scientific ideas. Retrieved June 27, 2025, from https://arxiv.org/abs/2506.15959

Liu, X., Zhang, C., & Li, J. (2023). Conceptual and technical work: Who will disrupt science? Journal of Informetrics, 17(3), 101432. doi: 10.1016/j.joi.2023.101432.

Masic, I. (2013). The importance of proper citation of references in biomedical articles. Acta Informatica Medica, 21(3), 148-155. doi: 10.5455/aim.2013.21.148-155.

Meho, L., & Yang, K. (2007, Jun 25-27). Fusion approach to citation-based quality assessment. Paper presented at the 11th International Conference of the International Society for Scientometrics and Informetrics, Madrid, Spain.

Merton, R. K. (1965). On the shoulders of giants: The post-italianate edition. Chicago, IL, USA: University of Chicago Press.

Merton, R. K. (1973). The sociology of science: Theoretical and empirical investigations. Chicago, IL, USA: University of Chicago Press.

Milojević, S. (in press). Science of science. Scientometrics. doi: 10.1007/s11192-025-05322-1.

Mogull, S. A. (2017). Accuracy of cited “facts” in medical research articles: A review of study methodology and recalculation of quotation error rate. PLOS ONE, 12(9), e0184727. doi: 10.1371/journal.pone.0184727.

Mohammadi, E., Thelwall, M., & Kousha, K. (2016). Can Mendeley bookmarks reflect readership? A survey of user motivations. Journal of the Association for Information Science and Technology, 67(5), 1198-1209. doi: 10.1002/asi.23477.

Murphy, E. J. (2011). Citations: The rules they didn’t teach you. Lipids, 46(4), 307-309. doi: 10.1007/s11745-011-3543-3.

Mutz, R., & Bornmann, L. (2023). Measuring disruptiveness and continuity of research by using the Disruption Index (DI) – A Bayesian statistical approach [version 1; peer review: 2 accepted, 1 major revision] [preprint]. Paper presented at the 27th International Conference on Science, Technology and Innovation Indicators (STI 2023).

Mutz, R., Wolbring, T., & Daniel, H.-D. (2017). The effect of the “very important paper” (VIP) designation in Angewandte Chemie International Edition on citation impact: A propensity score matching analysis. Journal of the Association for Information Science and Technology, 68(9), 2139-2153. doi: 10.1002/asi.23701.

Penders, B. (2018). Ten simple rules for responsible referencing. PLOS Computational Biology, 14(4), e1006036. doi: 10.1371/journal.pcbi.1006036.

Rushforth, A., & Hammarfelt, B. (2023). The rise of responsible metrics as a professional reform movement: A collective action frames account. Quantitative Science Studies, 4(4), 879-897. doi: 10.1162/qss_a_00280.

Schilling, M. A., & Green, E. (2011). Recombinant search and breakthrough idea generation: An analysis of high impact papers in the social sciences. Research Policy, 40(10), 1321-1331. doi: 10.1016/j.respol.2011.06.009.

Small, H. (2004). On the shoulders of Robert Merton: Towards a normative theory of citation. Scientometrics, 60(1), 71-79. doi: 10.1023/B:SCIE.0000027310.68393.bc.

Smith, D. (2014). Finding the signal in the noise of patent citations: How to focus on relevance for strategic advantage. Technology Innovation Management Review, 4(9), 36-44. doi: 10.22215/timreview/830.

Sula, C. A., & Miller, M. (2014). Citations, contexts, and humanistic discourse: Toward automatic extraction and classification. Literary and Linguistic Computing, 29(3), 452-464. doi: 10.1093/llc/fqu019.

Surowiecki, J. (2004). The wisdom of crowds. New York, NY, USA: Random House.

Tahamtan, I., & Bornmann, L. (2018). Core elements in the process of citing publications: Conceptual overview of the literature. Journal of Informetrics, 12(1), 203-216. doi: 10.1016/j.joi.2018.01.002.

Tahamtan, I., & Bornmann, L. (2019). What do citation counts measure? An updated review of studies on citations in scientific documents published between 2006 and 2018. Scientometrics, 121(3), 1635–1684. doi: 10.1007/s11192-019-03243-4.

Tahamtan, I., & Bornmann, L. (2022). The Social Systems Citation Theory (SSCT): A proposal to use the social systems theory for conceptualizing publications and their citation links. Profesional de la información, 31(4), e310411. doi: 10.3145/epi.2022.jul.11.

Tang, B. L. (2023). Some insights into the factors influencing continuous citation of retracted scientific papers. Publications, 11(4). doi: 10.3390/publications11040047.

Teplitskiy, M., Duede, E., Menietti, M., & Lakhani, K. R. (2022). How status of research papers affects the way they are read and cited. Research Policy, 51(4), 104484. doi: 10.1016/j.respol.2022.104484.

Thelwall, M. (2018). Early Mendeley readers correlate with later citation counts. Scientometrics, 115(3), 1231-1240. doi: 10.1007/s11192-018-2715-9.

Thelwall, M. (2024). Quantitative methods in research evaluation: Citation indicators, altmetrics, and artificial intelligence. Retrieved July 5, 2024, from https://arxiv.org/abs/2407.00135

Thelwall, M., Kousha, K., Stuart, E., Makita, M., Abdoli, M., Wilson, P., & Levitt, J. (2023). In which fields are citations indicators of research quality? Journal of the Association for Information Science and Technology, 74(8), 941-953. doi: 10.1002/asi.24767.

Traag, V. A., & Waltman, L. (2022). Causal foundations of bias, disparity and fairness. Retrieved August 4, 2022, from https://arxiv.org/abs/2207.13665

Turing, A. M. (1937). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 42, 230-265. doi: 10.1112/plms/s2-42.1.230.

van Raan, A. F. J. (2005). Fatal attraction: Conceptual and methodological problems in the ranking of universities by bibliometric methods. Scientometrics, 62(1), 133-143. doi: 10.1007/s11192-005-0008-6.

Wakeling, S., Paramita, M. L., & Pinfield, S. (in press). How do authors perceive the way their work is cited? Findings from a large-scale survey on quotation accuracy. Journal of the Association for Information Science and Technology. doi: 10.1002/asi.70000.

Wang, H., Fu, T., Du, Y., Gao, W., Huang, K., Liu, Z., . . . Zitnik, M. (2023). Scientific discovery in the age of artificial intelligence. Nature, 620(7972), 47-60. doi: 10.1038/s41586-023-06221-2.

Wei, F. F., Zhang, G. J., Zhang, L., Liang, Y. K., & Wu, J. B. (2019, Sep 02-05). Decreasing the noise of scientific citations in patents to measure knowledge flow. Paper presented at the 17th International Conference of the International Society for Scientometrics and Informetrics (ISSI) on Scientometrics and Informetrics, Sapienza Univ Rome, Rome, Italy.

Wilsdon, J., Allen, L., Belfiore, E., Campbell, P., Curry, S., Hill, S., . . . Johnson, B. (2015). The metric tide: Report of the independent review of the role of metrics in research assessment and management. Bristol, UK: Higher Education Funding Council for England (HEFCE).

Wu, L., Wang, D., & Evans, J. A. (2019). Large teams develop and small teams disrupt science and technology. Nature, 566, 378–382. doi: 10.1038/s41586-019-0941-9.

Yang, A. J. (2025). Text vs. citations: A comparative analysis of breakthrough and disruption metrics in patent innovation. Research Policy, 54(8), 105295. doi: 10.1016/j.respol.2025.105295.

Appendix

Table 2. Fictitious social citation system with 1) no citation bias, 2) minimized citation system noise, 3) inaccurate citation counts for all cited papers (TC_k ≠ EC_k).

| | Paper A (R A E) | Paper B (R A E) | Paper C (R A E) | Paper D (R A E) | PR_ij | PA_ij | PE_ij | P̄E_i | σ_PN,i |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Author I: Citing paper 1 | 1 1 0 | 1 1 0 | 0 1 1 | 1 0 1 | 0.75 | 0.50 | 0.50 | 0.50 | 0.00 |
| Author I: Citing paper 2 | 1 1 0 | 1 1 0 | 0 1 1 | 1 0 1 | 0.75 | 0.50 | 0.50 | | |
| Author I: Citing paper 3 | 1 1 0 | 1 1 0 | 0 1 1 | 1 0 1 | 0.75 | 0.50 | 0.50 | | |
| Author II: Citing paper 4 | 1 1 0 | 1 1 0 | 0 1 1 | 1 0 1 | 0.75 | 0.50 | 0.50 | 0.50 | 0.00 |
| Author II: Citing paper 5 | 1 1 0 | 1 1 0 | 0 1 1 | 1 0 1 | 0.75 | 0.50 | 0.50 | | |
| Author III: Citing paper 6 | 1 0 1 | 0 1 1 | 1 1 0 | 1 1 0 | 0.75 | 0.50 | 0.50 | 0.50 | 0.00 |
| Author III: Citing paper 7 | 1 0 1 | 0 1 1 | 1 1 0 | 1 1 0 | 0.75 | 0.50 | 0.50 | | |
| Author III: Citing paper 8 | 1 0 1 | 0 1 1 | 1 1 0 | 1 1 0 | 0.75 | 0.50 | 0.50 | | |
| Author III: Citing paper 9 | 1 0 1 | 0 1 1 | 1 1 0 | 1 1 0 | 0.75 | 0.50 | 0.50 | | |
| Author III: Citing paper 10 | 1 0 1 | 0 1 1 | 1 1 0 | 1 1 0 | 0.75 | 0.50 | 0.50 | | |
| PR_k | 1.00 | 0.50 | 0.50 | 1.00 | | | | | |
| TC_k | 10 | 5 | 5 | 10 | | | | | |
| EC_k | 5 | 10 | 10 | 5 | | | | | |
| PA_k | 0.50 | 0.50 | 0.50 | 0.50 | | | | | |
| PE_k | 0.50 | 0.50 | 0.50 | 0.50 | | | | | |
| P̄A | 0.60 | | | | | | | | |
| P̄E | 0.40 | | | | | | | | |
| σ_LN | 0.50 | | | | | | | | |
| σ_PN | 0.00 | | | | | | | | |

Notes. R = realized citation, A = accurate citation, E = erroneous citation, PR_ij = proportion of realized citations of citing author i in citing paper j, PA_ij = proportion of accurate citations of citing author i in citing paper j, PE_ij = proportion of erroneous citations of citing author i in citing paper j, P̄E_i = person-specific average error rate of citing author i, σ_PN,i = author-specific pattern noise (within-author variance of erroneous citations for author i), PR_k = cited paper k's proportion of realized citations, TC_k = times cited, EC_k = expected citations, PA_k = cited paper k's proportion of accurate citations, PE_k = cited paper k's proportion of erroneous citations, P̄A = proportion of accurate citations in the entire citation system, P̄E = proportion of erroneous citations in the entire citation system, σ_LN = overall level noise (between-author variance of erroneous citations), σ_PN = overall pattern noise.
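The row-level quantities in Table 2 can be recomputed directly from the R (realized) and A (accurate/expected) indicators, treating a citation decision as erroneous whenever the two disagree. The following is a minimal Python sketch; variable names are illustrative rather than the authors' notation, and the paper's aggregate noise statistics (σ_LN, P̄A, P̄E) may follow a different operationalization than the simple averages shown here.

```python
# R[i][k] = 1 if citing paper i cites paper k; A[i][k] = 1 if it should
# (Table 2 data: Authors I-III with identical rows within each author).
R_author = {
    "I":   [[1, 1, 0, 1]] * 3,   # citing papers 1-3
    "II":  [[1, 1, 0, 1]] * 2,   # citing papers 4-5
    "III": [[1, 0, 1, 1]] * 5,   # citing papers 6-10
}
A_author = {
    "I":   [[1, 1, 1, 0]] * 3,
    "II":  [[1, 1, 1, 0]] * 2,
    "III": [[0, 1, 1, 1]] * 5,
}

def error_row(r, a):
    """E = 1 wherever realized and expected citation status disagree."""
    return [abs(ri - ai) for ri, ai in zip(r, a)]

def mean(xs):
    return sum(xs) / len(xs)

# Per-row erroneous-citation proportion (PE_ij), per-author mean error
# rate, and within-author variance (author-specific pattern noise).
for author in R_author:
    pe_rows = [mean(error_row(r, a))
               for r, a in zip(R_author[author], A_author[author])]
    pe_bar = mean(pe_rows)
    var_within = mean([(p - pe_bar) ** 2 for p in pe_rows])
    print(author, pe_rows[0], pe_bar, var_within)

# Times cited (column sums of R) vs. expected citations (column sums of A):
# in this scenario they diverge for every cited paper.
all_R = [row for rows in R_author.values() for row in rows]
all_A = [row for rows in A_author.values() for row in rows]
TC = [sum(col) for col in zip(*all_R)]
EC = [sum(col) for col in zip(*all_A)]
print(TC, EC)
```

Every row has an error proportion of 0.50 and each author's rows are identical, so the within-author variance is zero, yet TC_k and EC_k differ for all four cited papers, as the table's caption states.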

 

Table 3. Fictitious social citation system with 1) no citation bias, 2) maximized citation system noise, 3) accurate citation counts for all cited papers (TC_k = EC_k).

| | Paper A (R A E) | Paper B (R A E) | Paper C (R A E) | Paper D (R A E) | Paper E (R A E) | PR_ij | PA_ij | PE_ij | P̄E_i | σ_PN,i |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Author I: Citing paper 1 | 1 0 1 | 1 0 1 | 1 0 1 | 1 0 1 | 1 0 1 | 1.00 | 0.00 | 1.00 | 1.00 | 0.00 |
| Author I: Citing paper 2 | 0 1 1 | 0 1 1 | 0 1 1 | 0 1 1 | 0 1 1 | 0.00 | 0.00 | 1.00 | | |
| Author I: Citing paper 3 | 1 0 1 | 1 0 1 | 1 0 1 | 1 0 1 | 1 0 1 | 1.00 | 0.00 | 1.00 | | |
| Author II: Citing paper 4 | 0 1 1 | 0 1 1 | 0 1 1 | 0 1 1 | 0 1 1 | 0.00 | 0.00 | 1.00 | 1.00 | 0.00 |
| Author II: Citing paper 5 | 1 0 1 | 1 0 1 | 1 0 1 | 1 0 1 | 1 0 1 | 1.00 | 0.00 | 1.00 | | |
| Author II: Citing paper 6 | 0 1 1 | 0 1 1 | 0 1 1 | 0 1 1 | 0 1 1 | 0.00 | 0.00 | 1.00 | | |
| Author III: Citing paper 7 | 0 0 0 | 1 1 0 | 0 0 0 | 1 1 0 | 1 1 0 | 0.60 | 1.00 | 0.00 | 0.00 | 0.00 |
| Author III: Citing paper 8 | 0 0 0 | 1 1 0 | 0 0 0 | 1 1 0 | 1 1 0 | 0.60 | 1.00 | 0.00 | | |
| Author III: Citing paper 9 | 0 0 0 | 1 1 0 | 0 0 0 | 1 1 0 | 1 1 0 | 0.60 | 1.00 | 0.00 | | |
| Author III: Citing paper 10 | 0 0 0 | 1 1 0 | 0 0 0 | 1 1 0 | 1 1 0 | 0.60 | 1.00 | 0.00 | | |
| Author III: Citing paper 11 | 0 0 0 | 1 1 0 | 0 0 0 | 1 1 0 | 1 1 0 | 0.60 | 1.00 | 0.00 | | |
| PR_k | 0.27 | 0.73 | 0.27 | 0.73 | 0.73 | | | | | |
| TC_k | 3 | 8 | 3 | 8 | 8 | | | | | |
| EC_k | 3 | 8 | 3 | 8 | 8 | | | | | |
| PA_k | 0.45 | 0.45 | 0.45 | 0.45 | 0.45 | | | | | |
| PE_k | 0.55 | 0.55 | 0.55 | 0.55 | 0.55 | | | | | |
| P̄A | 0.45 | | | | | | | | | |
| P̄E | 0.55 | | | | | | | | | |
| σ_LN | 0.50 | | | | | | | | | |
| σ_PN | 0.00 | | | | | | | | | |

Notes. R = realized citation, A = accurate citation, E = erroneous citation, PR_ij = proportion of realized citations of citing author i in citing paper j, PA_ij = proportion of accurate citations of citing author i in citing paper j, PE_ij = proportion of erroneous citations of citing author i in citing paper j, P̄E_i = person-specific average error rate of citing author i, σ_PN,i = author-specific pattern noise (within-author variance of erroneous citations for author i), PR_k = cited paper k's proportion of realized citations, TC_k = times cited, EC_k = expected citations, PA_k = cited paper k's proportion of accurate citations, PE_k = cited paper k's proportion of erroneous citations, P̄A = proportion of accurate citations in the entire citation system, P̄E = proportion of erroneous citations in the entire citation system, σ_LN = overall level noise (between-author variance of erroneous citations), σ_PN = overall pattern noise.
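Table 3 makes a distinct point: aggregate citation counts can be exactly right (TC_k = EC_k for every cited paper) even though most individual citation decisions are erroneous. A hedged Python sketch of that scenario, with illustrative variable names rather than the authors' notation:

```python
# Each row is a (realized, accurate) pair of indicator vectors over the
# five cited papers A-E, following Table 3's fictitious data.
rows = (
    [([1] * 5, [0] * 5), ([0] * 5, [1] * 5), ([1] * 5, [0] * 5)]    # Author I, papers 1-3
    + [([0] * 5, [1] * 5), ([1] * 5, [0] * 5), ([0] * 5, [1] * 5)]  # Author II, papers 4-6
    + [([0, 1, 0, 1, 1], [0, 1, 0, 1, 1])] * 5                      # Author III, papers 7-11
)
R = [r for r, _ in rows]   # realized citations
A = [a for _, a in rows]   # accurate (expected) citations

TC = [sum(col) for col in zip(*R)]   # times cited per cited paper
EC = [sum(col) for col in zip(*A)]   # expected citations per cited paper

# A decision is erroneous wherever realized and expected status disagree.
errors = sum(abs(ri - ai) for r, a in zip(R, A) for ri, ai in zip(r, a))
error_rate = errors / (len(rows) * 5)

print(TC, EC)                # identical column sums despite the errors
print(round(error_rate, 2))  # 0.55 -- over half of all decisions are erroneous
```

Authors I and II get every single decision wrong, but their omissions and spurious citations cancel out in the column sums, so the counts that a citation analysis would observe are indistinguishable from an error-free system.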

 

Table 4. Citation justification table (in chronological order from the beginning until the end of the manuscript)

Cited work Knowledge flowed
Section: Introduction
Aman and Gläser (2025) Scientific knowledge is a collective endeavor.
Tahamtan and Bornmann (2022) Science can be conceptualized as a social citation system.
Milojević (in press) Citations can be seen as value-free acts or value-laden acts.
Masic (2013) The reference is the information a reader of a paper needs to identify and find the sources used in the paper.
Penders (2018) Citations are a form of scientific currency, actively conferring or denying value.
Aksnes et al. (2019) Overview of basic concepts and theories on citations. Citations can be interpreted as knowledge flow from cited to citing paper.
Bornmann and Daniel (2008) Overview of the literature on reasons to cite and citation functions.
Tahamtan and Bornmann (2018) Overview of the literature on reasons to cite and citation functions.
Tahamtan and Bornmann (2019) Overview of the literature on reasons to cite and citation functions.
Gilbert (1977) Example of reasons to cite. One of the earliest reasons to cite identified in bibliometrics: Authors cite to persuade their readers.
Bornmann and Daniel (2008) Further explanation of the reason why authors cite to persuade their readers.
Sula and Miller (2014) Linguistics tends to feature reinforcing citations of prior literature, whereas philosophy typically involves more critical engagement with cited works.
Wakeling et al. (in press) Definition of quotation errors.
Wakeling et al. (in press) Definition of reference errors.
Gavras (2002) Definition of the ‘lazy author syndrome’.
Chen et al. (2024) Definition of the ‘heuristic citation approach’.
Wakeling et al. (in press) Overview of studies dealing with inaccurate citations from which summarizing results are reported.
Kahneman et al. (2021) Conceptualization of citation noise, citation bias, and citation accuracy which have been transferred to citation decisions.
Kahneman et al. (2021) Conceptualization of citation noise, citation bias, and citation accuracy which have been transferred to citation decisions.
Traag and Waltman (2022) Definition of bias.
Kousha and Thelwall (2024) Overview of factors that can lead to biased citation decisions.
Jannot et al. (2013) Example of citation bias: favoring statistically significant results in medicine.
Cawkell (1969) The term ‘citation noise’ appears in the title.
Meho and Yang (2007) The term ‘citation noise’ appears in the abstract.
Tang (2023) The term ‘citation noise’ appears in the abstract.
Wei et al. (2019) The term ‘citation noise’ appears in the abstract.
Smith (2014) Overview of the literature on citation noise in patent data.
Smith (2014) Definition of knowledge flow.
Jaffe et al. (2000) Citations in a patent not representing knowledge flow may account for half of its total citations.
Bornmann and Daniel (2008) Studies on reasons to cite reveal a comparatively frequent occurrence of citations of the perfunctory (up to 50 percent) and persuasive (up to 40 percent) type.
Donner et al. (in press) Around half of the citations in the WoS are of the background type.
Wakeling et al. (in press) Range of inaccurate citations reported in studies is from 5% to 40%.
Wakeling et al. (in press) Most studies report inaccurate citations between 10% to 20%.
Wakeling et al. (in press) Illustration of the results on inaccurate citations.
Jergas and Baethge (2015) Meta-analysis of studies dealing with inaccurate citations reports that 25.4% of citations are inaccurate.
Mogull (2017) Meta-analysis of studies dealing with inaccurate citations reports that 14.5% of citations are inaccurate.
Smith (2014) Patent citation noise is seen as a major challenge to the effectiveness of patent evaluation methodologies, which may therefore be poor indicators of the economic value of patents.
Section: Definition and measurement of citation accuracy, citation bias, and citation noise
Small (2004) Norm of citation: Authors should acknowledge prior work in an accurate manner.
Merton (1973) Norm of citation by Small (2004) is based on the normative theory of Merton (1973).
Small (2004) Norm of citation: Authors should acknowledge prior work in an accurate manner.
Penders (2018) Normative statements about how citations should be included in manuscripts.
Chen et al. (2024) Examples of random citation decisions.
Kahneman et al. (2021) Three types of noise are explained.
Jannot et al. (2013) Citation bias favoring statistically significant results in medical research.
Mutz et al. (2017) Papers labeled as very important papers by a reputable journal received more citations than comparable papers (i.e., papers of similar quality) without this label.
Kousha and Thelwall (2024) Provide an overview of studies that investigated biases in citation decisions.
Section: Strategies to reduce citation noise and citation bias
Rushforth and Hammarfelt (2023) Introduction of the term “professional reform movement” for the reports and initiatives pushing for the careful and responsible use of bibliometric indicators.
Hicks et al. (2015) Leiden Manifesto as an example of a professional reform movement.
Wilsdon et al. (2015) Metric Tide report as an example of a professional reform movement.
Thelwall (2024) Overview of professional reform movements.
Thelwall et al. (2023) Reform movement delivers strong arguments against using citations for aspects of research assessment.
Rushforth and Hammarfelt (2023) Responsible metrics movement strives for wide-reaching changes in research assessment.
Penders (2018) One of the most commonplace scholarly actions – the citation of publications – is embedded in a broader evaluative context.
Penders (2018) To cite well, or to reference responsibly, is a matter of concern to all scientists.
Smith (2014) Long direct quote describing the emergence of citation noise in patents.
Smith (2014) Citation noise in patent data may lead to insufficient indicators of the economic value of patents and may thus obscure valuable economic insights.
Bornmann and Daniel (2008) Overview of empirical studies shows that half of the citations are of the perfunctory and persuasive type.
Wakeling et al. (in press) Overview of empirical studies shows that 10% to 20% of citations are inaccurate.
Smith (2014) Importance of accurate citations in patent analysis.
Murphy (2011) Many authors are not selective about referencing papers and tend to be overzealous in citing more and more papers, many of which are not on the topic.
Masic (2013) Definition of plagiarism.
Section: Aggregation of citation decisions
Galton (1907) Introduction of the wisdom of crowds principle.
Galton (1907) Introduction of the wisdom of crowds principle.
Galton (1907) Introduction of the wisdom of crowds principle.
Surowiecki (2004) Broad application of the ‘wisdom of crowds’ principle.
Surowiecki (2004) Not all crowds make sound decisions.
van Raan (2005) Method of aggregation targets citation errors.
Kahneman et al. (2021) Aggregated decisions may be less noisy, but not less biased than the individual judgments.
Section: Citation decision hygiene
Section: Guidelines for ensuring accurate citations
Wakeling et al. (in press) Many citations do not correspond to the content of a previous work or do not support the statement it accompanies.
Wakeling et al. (in press) Reviewers and editors should be instrumental in detecting citation errors.
Bornmann and Daniel (2008) Overview of citation context studies.
Tahamtan and Bornmann (2019) Overview of citation context studies.
Clarivate (2022) Clarivate has started to assign citations in papers to specific citation context categories in the WoS.
Bruton et al. (2025) Results of a survey on ethically questionable citation behaviors.
Bruton et al. (2025) Study shows that ethical norms of citations are rarely articulated clearly and meticulously.
Murphy (2011) One of the few guidelines about citing accurately.
Murphy (2011) The list of eight rules in the citation guidelines addresses different aspects of citation decisions, but essentially concerns the indication of knowledge flow.
Penders (2018) Citations should be used to differentiate own research from that of others, and citations point readers to relevant literature.
Penders (2018) Citations should be used to differentiate own research from that of others, and citations point readers to relevant literature.
Penders (2018) Use of superficial citations is opposed.
Section: Citation justification table
Penders (2018) Citations in publications should be explained.
Section: Training of (young) researchers
American Psychological Association (2020)  Example of a publication manual.
Section: Correction of citation errors
Wakeling et al. (in press) What measures authors took when they had ever come across a citation of their work that they considered inappropriate.
Wakeling et al. (in press) What measures authors took when they had ever come across a citation of their work that they considered inappropriate.
Section: The use of AI for citation decision correction
Binz et al. (2025) Overview on how AI could be integrated into the work process of scientists.
Wang et al. (2023) Overview on how AI could be integrated into the work process of scientists.
Algaba et al. (2024) Study investigating the use of AI to suggest scholarly references.
Algaba et al. (2024) LLMs can aid in citation generation, but may also generate biases.
Khan et al. (2023) ChatGPT has the ability to produce generally accurate references, but occasionally generates artificial hallucinations.
Khan et al. (2023) Study investigating the use of AI to suggest scholarly references.
Algaba et al. (2024) Study investigating the use of AI to suggest scholarly references.
Section: The use of AI for citation decision correction
Aman and Gläser (2025) Research papers include codified knowledge that is addressed to a large and partly anonymous audience.
Bornmann and Marewski (2019) Definition of ‘bibliometrics-based heuristics’.
Teplitskiy et al. (2022) Most citations have little-to-no influence on the citing authors.
Li et al. (2025) Investigation of the roots of a seminal paper published by Turing (1937) based on cited references.
Turing (1937) Paper investigated by Li et al. (2025).
Li et al. (2025) Results of the investigation of the seminal paper published by Turing (1937): It is rooted in mainstream mathematical and logical texts.
Li et al. (2025) Noise in citation data would make the evaluation and conclusion by Li et al. (2025) impossible or incorrect.
Funk and Owen-Smith (2017) Introduction of the CD index for analyzing disruption and consolidation in patent citations.
Wu et al. (2019) First application of the CD index in bibliometrics.
Yang (2025) The CD index assumes that a patent’s knowledge foundation is fully captured by its cited references.
Liu et al. (2023) Different citation behaviors may result in inconsistencies between actual CD index values and expected (‘true’) values.
Mutz and Bornmann (2023) Noise in CD index values may be tackled by advanced statistical modelling approaches that consider uncertainty in the calculation of these values.
Wakeling et al. (in press) Different standards by which researchers are judging what constitutes citation errors.
Merton (1965) Definition of ‘obliteration by incorporation’.
Fisher (1925) The author introduced the statistical method ‘variance analysis’.
Schilling and Green (2011) Authors draw from deeper or broader knowledge reservoirs than what is reflected in a paper’s cited references.
Mohammadi et al. (2016) Mendeley reader counts can be used as indicators for the future use of publications.
Thelwall (2018) Mendeley reader counts can be used as indicators for the future citations of publications.
Koffi (2025) Proposes the ‘omission indicator’ to identify papers that should have been cited in citing papers.
Section: Conclusions
Kahneman et al. (2021) Providing a framework for understanding the variability in human decisions.
Kahneman et al. (2021) The framework proposed by Kahneman et al. (2021) has been applied to the context of citation decisions.

 

Editors

Ludo Waltman
Editor-in-Chief

Ludo Waltman
Handling Editor

Editorial assessment

by Ludo Waltman

DOI: 10.70744/MetaROR.187.1.ea

This article has the ambitious goal to provide a foundation for citation analysis and to use this foundation to contribute to improving the use of citation analysis in research evaluation. The article has been reviewed by two reviewers. Reviewer 1 welcomes the fresh approach taken by the article. According to reviewer 2, the article offers an innovative approach to the systematic measurement of citation errors and an admirable attempt to make referencing practices more honest. However, both reviewers criticize the conceptual underpinning of the article. Reviewer 1 challenges the idea of an objective truth in terms of whether one article should cite another article. Likewise, reviewer 2 objects against reducing the many dimensions of the social practice of citing to the abstract notion of a knowledge flow. The reviewer argues that the citations the authors make themselves in their own article demonstrate that interpreting citations in terms of knowledge flows is problematic. The reviewer encourages the authors to relate the concept of knowledge flow in a more precise way to the actual citation practices of researchers. Reviewer 2 also critiques the independence assumption made by the authors to justify the use of citation analysis in research evaluation.

Recommendations for enhanced transparency

  • Add author ORCID IDs.
  • Add an author contribution statement. The use of the CRediT taxonomy for reporting author contributions is encouraged.
  • Add a competing interest statement. Authors should report all competing interests, including not only financial interests, but any role, relationship, or commitment of an author that presents an actual or perceived threat to the integrity or independence of the research presented in the article. If no competing interests exist, authors should explicitly state this.
  • Add a funding source statement. Authors should report all funding in support of the research presented in the article. Grant reference numbers should be included. If no funding sources exist, explicitly state this in the article.

Competing interests: None.

Peer review 1

Simon Wakeling

DOI: 10.70744/MetaROR.187.1.rv1

I enjoyed reading this paper, and am very pleased to see new research on this topic. Studies of quotation/citation accuracy might be argued to have been somewhat stagnant in recent years, with research (including my own) focusing on the calculation of error rates within or across disciplines. A fresh approach to conceptualising the citation system is welcome, and I think this article makes a useful contribution in informing future research on this topic.

As a general point, I think it worth acknowledging early in the paper that there is some inconsistency in terminology in this research area. What you refer to as ‘citation accuracy’ is quite often referred to as ‘quotation accuracy’, with some researchers using the term ‘citation error’ to refer to instances where the reference is incorrect (i.e., the bibliographic details in the reference list or in-text have errors). I think it’s fine to use citation accuracy in your paper, as frankly it seems to me the better descriptor, but important to acknowledge the difference.

I note that right from the start of the paper there is a focus on the ‘scientific’ context. Plenty of studies have shown that quotation errors happen in all disciplines, so I wonder if the strictly scientific framing is unnecessarily limiting? I note that later you provide examples from non-scientific disciplines (e.g. linguistics and philosophy, p.5). If there are reasons why the paper needs to maintain a strictly scientific context then these would be worth clearly articulating.

The main question I have about this work is whether it sufficiently acknowledges the complexity and subjectivity of citation decisions. I understand why the fictitious scenarios used to model social citation systems adopt a binary measure of citation accuracy, but in reality I don’t believe that there is always an objective truth in terms of whether a given paper should be cited at a certain point of a citing article. The ever growing volume of scholarship being produced, and the incremental nature of research, means that there are numerous occasions when different authors might make different decisions about what to cite, without one being objectively ‘wrong’. This is actually exemplified by the Murphy (2011) citation guidelines that are quoted later in the paper, which include this: “Be as inclusive as possible, but know where to draw the line”. As long as there is disagreement among scholars about exactly where this ‘line’ is there will always be different (but still arguably valid) citation practices. I feel like this messiness could be discussed in more detail in the paper, because I believe it is fundamental to a holistic understanding of the citation system.

I also have a few more specific comments:

Top of pg 4 – You make a valuable point about papers being ranked based on citations. It might also be worth highlighting the fact that journals, too, are inevitably judged by citation metrics (e.g. Journal Impact Factor). The journal context is important, I think, because any efforts to address quotation errors are likely to require the engagement of journals (through editors, boards etc), as you later note.

“Citations can only be used meaningfully in research evaluation processes if they represent knowledge flow and the usefulness of research for authors.” (p. 4)

I’m not entirely convinced by the term ‘usefulness’ here. The implication is that the more a paper is cited, the more ‘useful’ it is. But of course not all citations are equal – it’s perfectly reasonable for an article to cite two papers, both for valid reasons, but for one to be much more influential (or useful) for the citing research than the other. I think the overall point you make in this paragraph – that citation metrics are only meaningful if the citations themselves are meaningful – is valid, but I might reconsider the language used.

“Other inaccurate citations result from the ‘lazy author syndrome’ (Gavras, 2002), i.e. from citing publications without engaging with the content of the cited publication.” (p. 5) And also “The authors’ decision to cite a work is not based on a comprehensive understanding, but rather on a superficial impression gleaned solely from the title or abstract. These citation decisions cannot represent knowledge flow and are erroneous.” p .25

This is in the context of lazy citations. The second quote here implies that any citation for which the citing author has not fully engaged with the cited article is erroneous. I think that’s contentious. If an author skims an abstract then cites that article (being a very lazy author), but it is actually a valid citation (e.g. it is being used correctly to justify a basic but integral piece of knowledge), is it actually ‘erroneous’? Lazy author syndrome seems to me a behavioural explanation for why citation errors occur, rather than necessarily being a form of citation error.

“There seems to be a similar problem in publication citations as in patent citations where citations may not represent knowledge flow for half of total citations (see above).” (p. 8)

Perhaps I’m missing something, but it’s not clear to me where the justification is for assuming that the patent citation figures can be directly applied to the article citation context. As far as I can tell the argument is that 10-20% of citations are erroneous, and up to 50% of citations are ‘perfunctory’ and up to 40% ‘persuasive’. I’m not sure how that translates to the statement that citations may not represent knowledge flow for ‘half of total citations’. I also think the ‘perfunctory’ and ‘persuasive’ categorisation needs to be explained more. The paragraph implies that such citations contribute to citation noise, but that is contentious I think, and requires careful explanation.

“As we will show in the following, noise can be a significant source of error in citation decisions” (p. 9)

Is noise a “source” of error in citation “decisions”? Doesn’t the term describe the effect of incorrect citations within the system?

“… there is an upward citation bias in the social citation system in Table 1: on average, the cited papers are overcited.” p.21

Perhaps I miss something, but the Table shows the total number of accurate citations would have been 26 (average of 5.2 per paper), but the actual number of citations was 23 (average of 4.6 citations per paper), so doesn’t this mean that on average the cited papers are under cited?

Page 29 – missing closing quotation marks at the end of the van Raan quotation.

Citation justification table – I very much like this idea, but I think it worth acknowledging the major barrier to its implementation i.e. the additional burden it would place on already time-poor researchers. Were a journal to mandate such a table, one imagines it might reduce submissions (as was the case when some journals mandated that authors make the underlying research data accessible to readers), and therefore journals may be disincentivised to introduce the approach.

Training of (young) researchers – another good idea in theory, but perhaps worth noting that such training would probably be dependent on there first being generally accepted citation guidelines (as you argue in 3.2.1). I think a bit more detail might help here, too – what kind of ‘training’ are you envisioning? I would say that most ECRs already receive a form of training in this through the guidance of mentors/supervisors in the early experiences of publication. Are you imagining more formalised training (perhaps through doctoral programs)?

Correction of citation errors – this section feels a little underdone, and I don’t quite understand the focus on the citing author as the main actor here. Occasions when an author has cited an article, and then later decides that that citation was erroneous, and seeks to correct their paper, would seem to me to be quite unusual. More common, surely, would be authors of cited papers encountering inaccurate citations of their work. But as you note, we found in our study that it appears that relatively few authors will proactively seek to correct inaccurate citations they encounter. One point to consider here is that researchers are to some degree disincentivised from correcting erroneous citations of their work – while that citation exists in the scholarly record it counts as a citation in all the various citation-related metrics that researchers are generally seeking to maximise. So overall I’m not entirely sure what this section adds beyond the quite general message that inaccurate citations should ideally be corrected. I suspect almost everyone would agree – the real question is how that can be encouraged and facilitated!

Competing interests: None.

Peer review 2

Paul Wouters

DOI: 10.70744/MetaROR.187.1.rv2

Contribution to the existing literature

The central aim of this paper is to lay the foundation for the use of citation analysis in research evaluation. This has been an issue since the very beginning of scientometrics as a field and citation analysis as a practice. This vast and well-known literature on the correct citation theory is not explicitly referenced. Instead, the authors try to develop a new approach by starting with Kahneman et al.’s concept of noise as a flaw in human judgement (Kahneman et al., 2021). They share with this book the idea that noise is “undesirable” and that it should therefore be reduced in social interactions and analysis.

On this basis, they propose the concept of “citation noise” as distinct from the better known concept of “citation bias”. The paper defines this concept with the help of a fictitious small model world in which a set of papers are cited both correctly (according to the definition by the authors) and incorrectly. They then distinguish two different forms of citation noise: citation pattern noise, and citation level noise. The paper proposes different indicators to measure the level of these types of citation noise.

The paper builds on this analysis to propose recommendations in two related but distinct areas:

  • guidelines and normative prescriptions for the ways in which authors decide to cite or not to cite a particular paper, hence for referencing practices;
  • and the application of citation analysis in research evaluation. This includes the choice of levels of aggregation, an area which has also been addressed in scientometrics in relation to the reduction of citation bias.

In addition, the paper proposes a novel research agenda of no less than five research themes to address the gaps in our knowledge regarding citation noise as presented in the current paper. This would entail a considerable research effort, even if using AI would decrease the work to a certain extent. This program also includes the monitoring and analysis of the recommendations of the paper to reform both citing practices and citation analysis.

The paper acknowledges the existence of the movement to change the dominant research evaluation practices as proposed in the DORA declaration, the Leiden Manifesto, the European CoARA coalition, the Metrics Tide report, and the More than Our Rank initiative. However, according to the paper, these initiatives have failed to address the root problem underlying the problematic use of citation analysis in research evaluation: citation errors. Therefore, this paper proposes to tackle this problem head-on, rather than to rely less on citation analysis in research evaluation. The paper is therefore also a straightforward critique of these initiatives, although in friendly and complimentary, rather than contradictory, terms.

Problematic basic assumptions: the knowledge flow

The paper strikes me as an intriguing combination of a provocative, innovative approach in its attempt to measure citation errors more systematically, an admirable attempt to make the referencing of past literature more honest in our daily academic lives, and a profoundly conservative attitude with respect to citation analysis. The paper is very well written and I enjoyed reading it and thinking about it (although it is a bit too long for MetaROR’s production cycle, I must admit). In essence, the paper ignores the literature of the last quarter of the twentieth century that critiqued Merton’s normative theory of citation. Instead it wants to revive Merton’s ideal by redefining what a correct citation is in informational terms: a correct citation represents a flow of information from the cited paper to the citing author. The paper speaks about knowledge flow from cited to citing papers, but of course this is a sloppy formulation. Knowledge does not flow by itself from paper to paper, but is the result (as the paper itself makes central) of authors’ decisions. Hence, knowledge flow is not a relation between papers, but between a paper and an author. But this is only a minor flaw in the paper, although it may have unexpected information-theoretical implications (think of the role of AI).

More importantly, the paper essentially tries to purify the social processes of citing scientific literature in order to save citation analysis for research evaluation. In other words, the paper tries to rescue Robert Merton’s empirically refuted theory of citation by disqualifying all motivations for references except the existence of an original knowledge flow. Only then can the claim that a citation represents a knowledge flow be maintained. Now, the problem is, as the paper itself references, that there is a huge variety of reasons why authors cite a particular paper. This does not mean that they are all wrong; rather, there are many more dimensions to the social practice of referencing and citing than the rather abstract notion of a knowledge flow.

So, how does an author know when she was struck by a knowledge flow? “Citation accuracy refers to the precise attribution of knowledge flow from cited to citing papers, ensuring that each citation reflects a genuine intellectual contribution.” (p. 37). As it happens, the paper admirably tries to live up to its own standards, and to this end the authors developed a really original method: a citation justification table in which they make clear why they cited a particular paper. Now, let me first state that I support every attempt to make our academic practices more honest. Surely, the current geopolitical and academic climate can use a measure of honesty! So, how do the authors themselves make their citation motivations clear? The first entry in their table refers to a 2025 paper by Aman and Gläser, and the motivation is: “scientific knowledge is a collective endeavour.” Well, I find it hard to believe that the authors discovered only in 2025 that science is a collective system by reading the paper by Aman and Gläser, especially given the prolific research portfolio of one of the authors on precisely the collective nature of science. So, how precise is this citation? Should it not be identified as an empty citation, as described by Wakeling et al. (2025), and hence as what they call a “quotation error”, which seems to me identical to the “citation error” in this paper? The second entry in the citation table has the same problem. Other entries are basically not a description of the knowledge flow that happened after the authors read the paper, but a description of the nature of that paper, such as the entries that characterize the cited paper as “an overview of”.

I do not want to claim that the authors are wrong, nor that such a table would be useless. On the contrary, it can be seen as a machine-readable form of the traditional footnotes with which historians and philosophers are accustomed to citing their sources and their primary and secondary literature (although humanists use more words for this and guide the reader in more detail to their libraries). So, yes, let us experiment with these kinds of explanations! But I would like to know from the authors how an author can distinguish a primal knowledge flow from all sorts of other valid reasons to cite a paper if even they themselves deviate from this ideal. For example, in many fields it makes a lot of sense to cite not the very first paper that created a knowledge flow but a summary of the debate since then. This would actually be a better service to the reader; it helps her to be more efficient and effective in catching up with the literature. And I find this purified concept of knowledge flow particularly shaky as the basis of a resurrection of Robert Merton’s theory of citation.

This may relate to a more general point in social science: to what extent can the complex interaction between action and structure be reduced to a comprehensible set of dimensions? Sociology and history research in general seems to suggest that we must be careful with what we strive for. And the reduction of social action to only one dimension, which is what the authors seem to propose, is surely an extreme case of a purification attempt. I am curious how far the authors wish to go in developing guidelines for proper citation behaviour. And to what extent do they think that their own referencing practice would live up to this ideal?

My suggestion would therefore be that the paper would gain strength if the authors could more precisely relate their concept of knowledge flow (which they seem to apply in practice more liberally than I deduced from their theoretical description) to the literature they cite on how researchers tend to cite literature in their fields. For this, the observation by Wakeling et al. seems relevant:

“Our findings also offer a potential explanation for the variation found in the error rates of previous studies. It is notable that substantial numbers of respondents indicated that the citation of their work was an oversimplification, or generalization, yet also agreed that the citation was appropriate. The suggestion here is that there is no clear consensus on what constitutes a quotation error; accuracy is in the eye of the beholder, with some authors clearly more forgiving of what others might consider inappropriately superficial or insufficiently nuanced uses of their work.” (Wakeling et al., 2025, p. 1406)

Problematic basic assumptions: the independence

The second area that the paper addresses with recommendations is the use of citation analysis in research evaluation. In this section, the paper places the level of aggregation at the center, which is also familiar in discussions about citation bias. The reason to trust a higher level of aggregation more than a lower one with respect to the expected amount of citation errors hinges on the concept that each author is independent in her citing practices. This is the key idea of the wisdom of the crowds. If authors do not act independently, the argument breaks down, as the paper states correctly. However, I did not see any evidence cited in the paper that authors are really independent of each other. In this respect, it is relevant that the social organisation of the correction of citations in patents is very different from that in the scientific literature.

On the contrary: there are many reasons to assume that most researchers are guided by community-based norms, either explicitly in prescriptions of the journals, or implicitly by the guidance of their supervisors. The paper does not address this, so I would like to suggest that the paper be extended with a serious discussion of the assumed independence. Given that the paper develops recommendations to reduce the independence in order to enhance the integrity of the citation system, this seems a striking omission and therefore a nice opportunity to further bolster the case for the paper. My suspicion is, however, that this will prove to be not so easy, and it may lead to a paradox. If all authors start to follow precise guidelines, and therefore become less independent, would this disqualify citation analysis for research evaluation? Following the argument in the paper, it would reduce citation noise, hence increase the validity of citation analysis for research evaluation. At the same time, it would also decrease the independence of the authors, thereby undermining the very idea of the wisdom of the crowds, and thereby decreasing the validity of citation analysis for research evaluation.

Conclusion

I would very much welcome the authors to address these contradictions in the key assumptions of the paper. If they were willing to also address the purification problem in social science on this basis, it would add even more value to this conversation. Because, in the end, we are addressing a very basic philosophical matter: are citations really supposed to be value free, as the paper argues? The paper assumes, inter alia, that value-elements are inappropriate: “Value-laden (inappropriate) elements flow into the social citation system” (p. 3). My claim would be, on the contrary, that values are nothing less than the very core of all social systems, including the social citation system. Purification is in my view tantamount to taking the heart out of the social fabric, as the histories of all purification revolutions in the past attest. Is this really the way forward for citation analysis?

Cited literature

Kahneman, D., Sibony, O., & Sunstein, C. R. (2021). Noise: A Flaw in Human Judgment. Hachette Book Group USA.

Wakeling, S., Paramita, M. L., & Pinfield, S. (2025). How do authors perceive the way their work is cited? Findings from a large-scale survey on quotation accuracy. Journal of the Association for Information Science and Technology, 76(10), 1396–1410. https://doi.org/10.1002/asi.70000

Competing interests: None.

Author response

DOI: 10.70744/MetaROR.187.1.ar

Editorial assessment

Reply: We received exceptionally important feedback from both reviewers, which enabled us to significantly improve our manuscript. We hope we have succeeded in doing so. If we have not succeeded with all of the comments, we are happy to address them in another round of peer review. In revising the manuscript, we focused primarily on the conceptual (knowledge flow) framework of our research. Both reviewers had requested a revision of this framework.

Reviewer 1 (Simon Wakeling)

I enjoyed reading this paper, and am very pleased to see new research on this topic. Studies of quotation/citation accuracy might be argued to have been somewhat stagnant in recent years, with research (including my own) focusing on the calculation of error rates within or across disciplines. A fresh approach to conceptualising the citation system is welcome, and I think this article makes a useful contribution in informing future research on this topic.

Reply: Thank you!

As a general point, I think it worth acknowledging early in the paper that there is some inconsistency in terminology in this research area. What you refer to as ‘citation accuracy’ is quite often referred to as ‘quotation accuracy’, with some researchers using the term ‘citation error’ to refer to instances where the reference is incorrect (i.e., the bibliographic details in the reference list or in-text have errors). I think it’s fine to use citation accuracy in your paper, as frankly it seems to me the better descriptor, but important to acknowledge the difference.

Reply: At two points at the beginning of the paper, we now define the term ‘citation’ and indicate that some previous studies used the terms ‘inaccurate quotation’ or ‘inaccurate reference’ for what we call ‘inaccurate citation’.

I note that right from the start of the paper there is a focus on the ‘scientific’ context. Plenty of studies have shown that quotation errors happen in all disciplines, so I wonder if the strictly scientific framing is unnecessarily limiting? I note that later you provide examples from non-scientific disciplines (e.g. linguistics and philosophy, p.5). If there are reasons why the paper needs to maintain a strictly scientific context then these would be worth clearly articulating.

Reply: It was not our intention to focus on the ‘scientific’ context. To avoid this false impression, we have included in the first sentence of the paper the term ‘in all disciplines’.

The main question I have about this work is whether it sufficiently acknowledges the complexity and subjectivity of citation decisions. I understand why the fictitious scenarios used to model social citation systems adopt a binary measure of citation accuracy, but in reality I don’t believe that there is always an objective truth in terms of whether a given paper should be cited at a certain point of a citing article. The ever-growing volume of scholarship being produced, and the incremental nature of research, mean that there are numerous occasions when different authors might make different decisions about what to cite, without one being objectively ‘wrong’. This is actually exemplified by the Murphy (2011) citation guidelines that are quoted later in the paper, which include this: “Be as inclusive as possible, but know where to draw the line”. As long as there is disagreement among scholars about exactly where this ‘line’ is there will always be different (but still arguably valid) citation practices. I feel like this messiness could be discussed in more detail in the paper, because I believe it is fundamental to an holistic understanding of the citation system.

Reply: We fully agree with the reviewer on this point. One of us (LB) has authored several literature reviews on the many motives and reasons that studies of scientists’ citation behavior have identified (Bornmann & Daniel, 2008; Tahamtan & Bornmann, 2018, 2019). We are therefore well aware that these diverse motives and reasons exist, and within our framework we primarily classify them as forms of bias and noise. In our study, we start from the empirical fact that the results of citation analyses are used in research evaluation. We then argue that – given this fact – a specific condition must be met with respect to scientists’ citation decisions. If this condition is not fulfilled to a sufficient degree, the results of citation analyses should (or can) not be used in research evaluation. The use of citations in research evaluation presupposes that a knowledge flow has occurred between the cited paper and the citing paper: content of the citing paper is assumed to build on knowledge contained in the cited paper. In the revised manuscript, we elaborate this line of argument, which is fundamental to our framework, more explicitly and provide a more precise definition of what we mean by the knowledge flow between cited and citing papers.

Top of pg 4 – You make a valuable point about papers being ranked based on citations. It might also be worth highlighting the fact that journals, too, are inevitably judged by citation metrics (e.g. Journal Impact Factor). The journal context is important, I think, because any efforts to address quotation errors are likely to require the engagement of journals (through editors, boards etc), as you later note.

Reply: We have included in the revised manuscript that papers can also be aggregated (in journals or institutions), and the citation impact of these units can then be analyzed.

“Citations can only be used meaningfully in research evaluation processes if they represent knowledge flow and the usefulness of research for authors.” (p. 4) I’m not entirely convinced by the term ‘usefulness’ here. The implication is that the more a paper is cited, the more ‘useful’ it is. But of course not all citations are equal – it’s perfectly reasonable for an article to cite two papers, both for valid reasons, but for one to be much more influential (or useful) for the citing research than the other. I think the overall point you make in this paragraph – that citation metrics are only meaningful if the citations themselves are meaningful – is valid, but I might reconsider the language used.

Reply: In the revised manuscript, we have deleted this part of the sentence: “and the usefulness of research for authors”.

“Other inaccurate citations result from the ‘lazy author syndrome’ (Gavras, 2002), i.e. from citing publications without engaging with the content of the cited publication.” (p. 5) And also “The authors’ decision to cite a work is not based on a comprehensive understanding, but rather on a superficial impression gleaned solely from the title or abstract. These citation decisions cannot represent knowledge flow and are erroneous.” (p. 25) This is in the context of lazy citations. The second quote here implies that any citation for which the citing author has not fully engaged with the cited article is erroneous. I think that’s contentious. If an author skims an abstract then cites that article (being a very lazy author), but it is actually a valid citation (e.g. it is being used correctly to justify a basic but integral piece of knowledge), is it actually ‘erroneous’? Lazy author syndrome seems to me a behavioral explanation for why citation errors occur, rather than necessarily being a form of citation error.

Reply: We have reformulated both sentences.

“There seems to be a similar problem in publication citations as in patent citations where citations may not represent knowledge flow for half of total citations (see above).” (p. 8) Perhaps I’m missing something, but it’s not clear to me where the justification is for assuming that the patent citation figures can be directly applied to the article citation context. As far as I can tell the argument is that 10-20% of citations are erroneous, and up to 50% of citations are ‘perfunctory’ and up to 40% ‘persuasive’. I’m not sure how that translates to the statement that citations may not represent knowledge flow for ‘half of total citations’. I also think the ‘perfunctory’ and ‘persuasive’ categorization needs to be explained more. The paragraph implies that such citations contribute to citation noise, but that is contentious I think, and requires careful explanation.

Reply: We have included definitions of perfunctory and persuasive citations at the beginning of the manuscript. We have also reformulated the sentence identified by the reviewer and the sentence before in our manuscript.

“As we will show in the following, noise can be a significant source of error in citation decisions” (p. 9) Is noise a “source” of error in citation “decisions”? Doesn’t the term describe the effect of incorrect citations within the system?

Reply: The sentence has been deleted during the manuscript revision.

“… there is an upward citation bias in the social citation system in Table 1: on average, the cited papers are overcited.” p.21 Perhaps I miss something, but the Table shows the total number of accurate citations would have been 26 (average of 5.2 per paper), but the actual number of citations was 23 (average of 4.6 citations per paper), so doesn’t this mean that on average the cited papers are under cited?

Reply: Thank you for identifying this error! We have corrected this.

Page 29 – missing closing quotation marks at the end of the van Raan quotation.

Reply: We have corrected this.

Citation justification table – I very much like this idea, but I think it worth acknowledging the major barrier to its implementation i.e. the additional burden it would place on already time-poor researchers. Were a journal to mandate such a table, one imagines it might reduce submissions (as was the case when some journals mandated that authors make the underlying research data accessible to readers), and therefore journals may be disincentivized to introduce the approach.

Reply: We have considered this point in the text at the end of the discussion section.

Training of (young) researchers – another good idea in theory, but perhaps worth noting that such training would probably be dependent on there first being generally accepted citation guidelines (as you argue in 3.2.1). I think a bit more detail might help here, too – what kind of ‘training’ are you envisioning? I would say that most ECRs already receive a form of training in this through the guidance of mentors/supervisors in the early experiences of publication. Are you imagining more formalised training (perhaps through doctoral programs)?

Reply: In section 3.2.3, we now mention that not only should graduate programs do more to train students in how to approach citing decisions, but that guidance by mentors and supervisors in the early experiences of publication is also important. We also mention in this section guidance through publication manuals of professional associations and instructional videos for the training of (young) researchers.

Correction of citation errors – this section feels a little underdone, and I don’t quite understand the focus on the citing author as the main actor here. Occasions when an author has cited an article, and then later decides that that citation was erroneous, and seeks to correct their paper, would seem to me to be quite unusual. More common, surely, would be authors of cited papers encountering inaccurate citations of their work. But as you note, we found in our study that it appears that relatively few authors will proactively seek to correct inaccurate citations they encounter. One point to consider here is that researchers are to some degree disincentivized from correcting erroneous citations of their work – while that citation exists in the scholarly record it counts as a citation in all the various citation-related metrics that researchers are generally seeking to maximize. So overall I’m not entirely sure what this section adds beyond the quite general message that inaccurate citations should ideally be corrected. I suspect almost everyone would agree – the real question is how that can be encouraged and facilitated!

Reply: The reviewer is completely right in pointing out that the citing author should not be the main actor here: it should be the cited author. We have corrected this. In the process of revising the manuscript, we abstained from deleting the section (although we considered this), since the correction of errors may be one way of dealing with citation noise and bias (even if alternative ways might be more suitable). We did not want to leave any alternative unmentioned. At the end of the section, we have included the reviewer’s point that cited authors might have limited incentives to correct erroneous citations.

Reviewer 2 (Paul Wouters)

Contribution to the existing literature

The central aim of this paper is to lay the foundation for the use of citation analysis in research evaluation. This has been an issue since the very beginning of scientometrics as a field and citation analysis as a practice. This vast and well-known literature on the correct citation theory is not explicitly referenced. Instead, the authors try to develop a new approach by starting with Kahneman et al.’s concept of noise as a flaw in human judgement (Kahneman et al., 2021). They share with this book the idea that noise is “undesirable” and that it should therefore be reduced in social interactions and analysis.

Reply: Since our study’s approach has not yet been discussed in the context of the various approaches to developing a theory of citations, we have now rectified this in the introduction. Thank you for pointing that out!

[…]

Problematic basic assumptions: the knowledge flow

The paper strikes me as an intriguing combination of a provocative, innovative approach in its attempt to measure citation errors more systematically, an admirable attempt to make the referencing of past literature more honest in our daily academic lives, and a profoundly conservative attitude with respect to citation analysis. The paper is very well written and I enjoyed reading it and thinking about it (although it is a bit too long for MetaROR’s production cycle, I must admit).

Reply: Thank you!

In essence, the paper ignores the literature of the last quarter of the twentieth century that critiqued Merton’s normative theory of citation. Instead it wants to revive Merton’s ideal by redefining what a correct citation is in informational terms: a correct citation represents a flow of information from the cited paper to the citing author. The paper speaks about knowledge flow from cited to citing papers, but of course this is a sloppy formulation. Knowledge does not flow by itself from paper to paper, but is the result (as the paper itself makes central) of authors’ decisions. Hence, knowledge flow is not a relation between papers, but between a paper and an author. But this is only a minor flaw in the paper, although it may have unexpected information-theoretical implications (think of the role of AI).

Reply: One of us (LB) has authored several literature reviews on the many motives and reasons that studies of scientists’ citation behavior have identified (Bornmann & Daniel, 2008; Tahamtan & Bornmann, 2018, 2019). LB is also one of the authors of the paper that introduced the most recent attempt at a theory of citations (Tahamtan & Bornmann, 2022). This paper also includes an extensive overview of the previously proposed theories of citations. In the submitted version of the manuscript, we did not discuss the various theories of citations in detail because the study did not aim to develop a new theory or revitalize an existing one. Instead, we aim to make explicit the assumptions that are often implicit in the practice of citation analysis and to discuss the conditions that authors’ citation practices must fulfill in order for citation data to be useful for citation analysis. In the revised version of the manuscript, we have now rectified this, briefly introducing the different theories and situating our study within the research on theories of citations. In the revised version, we have also clarified our understanding of knowledge flow.

More importantly, the paper essentially tries to purify the social processes of citing scientific literature in order to save citation analysis for research evaluation. In other words, the paper tries to rescue Robert Merton’s empirically refuted theory of citation by disqualifying all motivations for references except the existence of an original knowledge flow, because only then can the claim that a citation represents a knowledge flow be maintained. Now, the problem is, as the paper itself references, that there is a huge variety of reasons why authors cite a particular paper. This does not mean that they are all wrong; rather, there are many more dimensions to the social practice of referencing and citing than the rather abstract notion of a knowledge flow.

Reply: Our study does not aim to perpetuate citation analysis or any particular theory. Our line of reasoning takes a different approach. We have therefore endeavored to present this line of reasoning even more clearly for the reader in the revised version. In our study, we begin from the empirically established fact that the results of citation analyses are employed in research evaluation. Building on this premise, we argue that a specific condition concerning scientists’ citation decisions must be satisfied. If this condition is not met to a sufficient degree, the outcomes of citation analyses cannot – or should not – be used for evaluative purposes. The evaluative use of citations presupposes that the content of the citing work builds upon knowledge contained in the cited work. We do not deny that there are countless other reasons and motives for citing. In our study, these additional reasons and motives are categorized under the headings of bias and noise. We suggest that if one wishes to use citation analysis in research evaluation (and not forego it entirely), then various measures (see section 3 in our paper) should be taken to reduce citation decisions to the dimension that one wishes to measure in citation analyses: which work has provided knowledge that has been used in other work. If, on the other hand, one decides not to use citation analysis in research evaluation, it is not necessary to strive for a reduction to this single dimension.

So, how does an author know when she was struck by a knowledge flow? “Citation accuracy refers to the precise attribution of knowledge flow from cited to citing papers, ensuring that each citation reflects a genuine intellectual contribution.” (p. 37). As it happens, the paper admirably tries to live up to its own standards, and for this the authors developed a really original method: a citation justification table in which they make clear why they cited a particular paper. Now, let me first state that I support every attempt to make our academic practices more honest. Surely, the current geopolitical and academic climate can use a measure of honesty! So, how do the authors themselves make their citation motivations clear? The first entry in their table is to a paper written in 2025 by Aman and Gläser, and the motivation is: “scientific knowledge is a collective endeavour.” Well, I find it hard to believe that the authors discovered only in 2025, by reading the paper by Aman and Gläser, that science is a collective system. Especially given the prolific research portfolio of one of the authors on precisely the collective nature of science. So, how precise is this citation? Should it not be identified as an empty citation, as described by Wakeling et al. (2025), and hence as what they call a “quotation error”, which seems to me identical to the “citation error” in this paper? The second entry in the citation table has the same problem. Other entries are basically not a description of the knowledge flow that happened after the authors read the paper, but a description of the nature of that paper, such as the entries that characterize the cited paper as “an overview of”.

Reply: As the reviewer himself noted in his comment above, knowledge flow is a process that takes place between a cited paper and the author of the citing paper. The author has extracted knowledge from a specific paper and used this knowledge in their own paper. The author decides which paper this is. It is a subjective decision – for the use of citations in research evaluation, the only crucial factor is that the process involves the transfer of knowledge. It is not about whether the knowledge is long established and should actually be cited from an earlier source, or about citing the source in which this knowledge first appeared. The author should cite the source from which they extracted knowledge for their own paper. We have tried to make this point clearer in the revised version of our manuscript (in the introduction).

I do not want to claim that the authors are wrong, nor that such a table would be useless. On the contrary, it can be seen as a machine-readable form of the traditional footnotes with which historians and philosophers are accustomed to citing their sources and their primary and secondary literature (although humanists use more words for this and guide the reader in more detail to their libraries). So, yes, let us experiment with these kinds of explanations! But I would like to know from the authors how an author can distinguish a primal knowledge flow from all sorts of other valid reasons to cite a paper if even they themselves deviate from this ideal. For example, in many fields it makes a lot of sense to cite not the very first paper that created a knowledge flow but a summary of the debate since then. This would actually be a better service to the reader; it helps her to be more efficient and effective in catching up with the literature. And I find this purified concept of knowledge flow particularly shaky as the basis of a resurrection of Robert Merton’s theory of citation.

Reply: In our paper, we found that creating a citation justification table prompted us to examine our citation decisions carefully. Our table is certainly not perfect, but it has definitely led to less noise and bias in our citations. For every citation we included, we retrospectively considered whether there had actually been a flow of knowledge from the cited paper to the citing paper. During this process, we deleted many citations because we had likely included them for other reasons (such as mentioning a specific name in connection with a particular concept without having drawn any knowledge from a specific work). We agree with the reviewer that it is not strictly necessary to cite the first paper in a debate; a paper that summarizes the debate may be cited instead. If the citing author has used knowledge from the summarizing paper and not from the first paper, they should cite only the summarizing paper.

This may relate to a more general point in social science: to what extent can the complex interaction between action and structure be reduced to a comprehensible set of dimensions? Sociological and historical research in general seems to suggest that we must be careful with what we strive for. And the reduction of social action to only one dimension, which is what the authors seem to propose, is surely an extreme case of a purification attempt. I am curious how far the authors wish to go in developing guidelines for proper citation behavior. And to what extent do they think that their own referencing practice would live up to this ideal?

Reply: We are aware of the complexity of the citation process. In our framework, we restructure this process with its many dimensions by designating one dimension as “citation accuracy” and assigning other dimensions to the areas of “citation bias” and “citation noise”. We undertake this reorganization in light of the fact that citations are used for citation analyses in research evaluation. For this application, it is necessary to focus on the dimension that is to be measured in research evaluation (knowledge flow). If one decides against using citation analysis in research evaluation, we also see no need to discipline citing authors in their citing decisions.

My suggestion would therefore be that the paper would gain strength if the authors could more precisely relate their concept of knowledge flow (which they seem to apply in practice more liberally than I deduced from their theoretical description) to the literature they cite on how researchers tend to cite literature in their fields. For this, the observation by Wakeling et al. seems relevant: “Our findings also offer a potential explanation for the variation found in the error rates of previous studies. It is notable that substantial numbers of respondents indicated that the citation of their work was an oversimplification, or generalization, yet also agreed that the citation was appropriate. The suggestion here is that there is no clear consensus on what constitutes a quotation error; accuracy is in the eye of the beholder, with some authors clearly more forgiving of what others might consider inappropriately superficial or insufficiently nuanced uses of their work.” (Wakeling et al., 2025, p. 1406)

Reply: We are aware that there will always be variability and bias in authors’ citation decisions. We do not expect to reach a state where all citing authors act in accordance with a single, objective truth. This is impossible in any area where people make decisions, such as judicial or medical decisions. Many factors play a role in judicial and medical decisions. However, if these decisions are used within a framework that attaches specific conditions to them, then the decisions should adhere to those conditions. If this does not happen, the decisions should not be used in court proceedings or medical offices. Using the framework by Kahneman et al. (2021) is about measuring (disclosing) variability and bias in human decisions and, if necessary, taking measures to reduce variability and bias and increase accuracy.

Problematic basic assumptions: independence

The second area that the paper addresses with recommendations is the use of citation analysis in research evaluation. In this section, the paper places the level of aggregation at the center, which is also familiar from discussions about citation bias. The reason to trust a higher level of aggregation more than a lower one with respect to the expected amount of citation errors hinges on the idea that each author is independent in her citing practices. This is the key idea of the wisdom of the crowds. If authors do not act independently, the argument breaks down, as the paper states correctly. However, I did not see any evidence cited in the paper that authors are really independent of each other. In this respect, it is relevant that the social organization of the correction of citations in patents is very different from that of the scientific literature. On the contrary: there are many reasons to assume that most researchers are guided by community-based norms, either explicitly in the prescriptions of journals, or implicitly by the guidance of their supervisors. The paper does not address this, so I would like to suggest that the paper be extended with a serious discussion of the assumed independence. Given that the paper develops recommendations that would reduce this independence in order to enhance the integrity of the citation system, this seems a striking omission and therefore a nice opportunity to further bolster the case for the paper. My suspicion is, however, that this will prove to be not so easy, and it may lead to a paradox. If all authors start to follow precise guidelines, and therefore become less independent, would this disqualify citation analysis for research evaluation? Following the argument in the paper, it would reduce citation noise and hence increase the validity of citation analysis for research evaluation. At the same time, it would also decrease the independence of the authors, thereby undermining the very idea of the wisdom of the crowds, and thereby decreasing the validity of citation analysis for research evaluation.

Reply: Following the reviewer’s recommendation, we included an extended discussion of the independence of citation decisions. In doing so, we note that the strong concentration of citations at the top of the citation distribution points to mechanisms of cumulative advantage, suggesting that authors are likely aware of – and influenced by – the citation decisions of others. In the literature on the wisdom of crowds (Surowiecki, 2004), independence refers to the requirement that individual judgments are formed without being systematically influenced by the judgments of others. Independence is crucial because it ensures that individual errors remain uncorrelated; when errors are uncorrelated, they cancel out in aggregation, allowing the collective estimate to approach the true value (Hogarth, 1978). Conversely, when individuals are influenced by social cues, reputational pressures, or shared information sources, their judgments become correlated, and the aggregate no longer benefits from error‑cancellation, thereby undermining the crowd’s epistemic advantage. We applied this principle to citation decisions made by authors. In research evaluation, citation counts are used as indicators of the importance or influence of published work. For such aggregated citation data to function as a “wise crowd”, authors’ citation decisions would need to be made independently – based on their own engagement with the cited work rather than on social influence, strategic considerations, or imitation of others’ citation practices. When citation decisions are not independent, but instead shaped by bandwagon effects, prestige biases, or cumulative advantage dynamics, the resulting citation counts no longer reflect the distributed judgments of many independent experts but rather the correlated behavior of a socially influenced community, thereby reducing their validity as indicators of research impact. 
In our paper, we discuss the risk of an “unwise crowd” in citations due to a possible lack of independence in citation decisions. However, there is a decisive advantage of using aggregations of citation decisions in research evaluation: citation counts are based on the decisions of a large number of scientists. This large number significantly increases the reliability of these counts.
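The error-cancellation logic described in this reply can be illustrated with a small simulation (a minimal sketch; the error magnitudes and the size of the shared social push are assumed for illustration only, not empirical values):

```python
import random
import statistics

random.seed(42)

TRUE_VALUE = 100.0  # the quantity each "citer" independently tries to judge
N_JUDGES = 1000     # number of aggregated citation decisions

# Independent errors: each judgment deviates on its own, so errors cancel
# out in the aggregate (the wisdom-of-crowds case).
independent = [TRUE_VALUE + random.gauss(0, 20) for _ in range(N_JUDGES)]

# Correlated errors: a shared social push (bandwagon effect, prestige bias)
# shifts every judgment in the same direction; aggregation cannot remove it.
SHARED_PUSH = 15.0  # hypothetical magnitude of the common social influence
correlated = [TRUE_VALUE + SHARED_PUSH + random.gauss(0, 20)
              for _ in range(N_JUDGES)]

print(abs(statistics.mean(independent) - TRUE_VALUE))  # small: errors cancel
print(abs(statistics.mean(correlated) - TRUE_VALUE))   # stays near the shared push
```

With independent errors, the aggregate lands very close to the true value; with the assumed shared push, the aggregate stays roughly that far off no matter how many judges are added, which is exactly why correlated citation decisions undermine the wisdom-of-crowds argument.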

Conclusion

I would very much welcome the authors to address these contradictions in the key assumptions of the paper. If they were also willing to address the purification problem in social science on this basis, it would add even more value to this conversation. Because, in the end, we are addressing a very basic philosophical matter: are citations really supposed to be value-free, as the paper argues? The paper assumes, inter alia, that value elements are inappropriate: “Value-laden (inappropriate) elements flow into the social citation system” (p. 3). My claim would be, on the contrary, that values are nothing less than the very core of all social systems, including the social citation system. Purification is in my view tantamount to taking the heart out of the social fabric, as the histories of all purification revolutions in the past attest. Is this really the way forward for citation analysis?

Reply: The starting point of our argument is the evaluative use of citations in research evaluation. Using citations in evaluative processes is subject to certain conditions. When citations are used for evaluative purposes, these processes assume that knowledge flow is being measured. Therefore, if one wants to use citations in research evaluation practice, one should first investigate whether citations actually measure knowledge flow (or something else). In our study, we have proposed a framework for this measurement. Secondly, measures should be established in academia to encourage citing authors to only include citations in a paper if knowledge flow has occurred. Our framework does not negate the many dimensions of citation decisions; they exist. However, since they pose a problem in the context of using citation data in citation analysis, this problem should be addressed if one intends to use citation data in research evaluation.

Comments

  1. Adrian Barnett, January 18, 2026 at 10:28 am

    This is an interesting and engaging article on citation noise. It took a while to sink in, but it has made me think differently about citation errors and in particular the potentially huge hidden errors from papers that were incorrectly not cited.
    One minor thing that may need checking is the statement that the 0.54 probability is only slightly better than a random selection. I think an actual random selection may not work out to be 50:50 as it depends on the prevalence in the example table. The easiest way to work it out would be to repeatedly simulate random choices throughout the table and then get the average probability.
    Another potential source of citation noise is what papers/authors appear high-up in Google searches. Related to this, there are now software tools that appear to suggest citations as you write, e.g., https://www.sourcely.net/ and https://paperpal.com/tools/ai-for-research; how these choose citations will also add to the noise and potentially bias.
    I agree that some citation errors should be squashed by large numbers, but citations can sometimes snowball, with a good early start leading to more citations. The power of a large sample size to reduce errors may be weakened by this non-independence.
    One way to reduce citation noise is for researchers to publish less and spend more time on their projects, including more time and thought on their citations.
    LaTeX could be useful for tracking missed citations and informing citation choices. I often make hidden notes in my tex or bibtex files about why a paper is cited or not.
    Pubpeer could be a place for flagging missing citations, or perhaps missing citations needs a separate online database.
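Barnett’s suggested simulation of a random-selection baseline can be sketched as follows (a minimal illustration; the category prevalences below are hypothetical placeholders, not the values from the paper’s example table):

```python
import random

random.seed(1)

# Hypothetical prevalences for the citation categories in an example table.
# These are assumed placeholder values, NOT the actual numbers from the paper.
prevalence = {"accurate": 0.60, "noisy": 0.30, "biased": 0.10}
categories = list(prevalence)
weights = list(prevalence.values())

N = 100_000
hits = 0
for _ in range(N):
    truth = random.choices(categories, weights=weights)[0]
    # A random "citer" who guesses in proportion to the prevalences:
    guess = random.choices(categories, weights=weights)[0]
    hits += truth == guess

# Analytically, this chance-agreement baseline is the sum of squared
# prevalences: 0.60**2 + 0.30**2 + 0.10**2 = 0.46, not 0.50.
print(round(hits / N, 2))
```

Under these assumed prevalences, a prevalence-matched random guesser agrees with the truth about 46% of the time, illustrating Barnett’s point that the chance baseline depends on the prevalences in the table rather than being 50:50.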