Cite this article as: Bohorquez, N. G., Weerasuriya, S., Brain, D., Senanayake, S., Kularatna, S., & Barnett, A. (2024, July 31). Researchers are willing to trade their results for journal prestige: results from a discrete choice experiment. https://doi.org/10.31219/osf.io/uwt3b
Researchers are willing to trade their results for journal prestige: Results from a discrete choice experiment
Natalia Gonzalez Bohorquez1, Sucharitha Weerasuriya1, David Brain1, Sameera Senanayake2, Sanjeewa Kularatna2, Adrian Barnett1
1. School of Public Health & Social Work, Queensland University of Technology, Australia
2. Health Services and Systems Research, Duke-NUS Medical School, Singapore
The research community’s fixation on journal prestige is harming research quality, as some researchers focus on where to publish instead of what they publish. We examined researchers’ publication preferences using a discrete choice experiment in a cross-sectional survey of international health and medical researchers. We asked researchers to consider two hypothetical journals and decide which they would prefer. The hypothetical journals varied in their impact factor, formatting requirements, speed of peer review, helpfulness of peer review, editor’s request to cut results, and whether the paper would be useful for their next promotion. These attributes were designed using focus groups and interviews with researchers, with the aim of creating a tension between personal and societal benefit. Our survey found that researchers’ strongest preference was for the highest impact factor, and the second strongest for a moderate impact factor. The least important attribute was a preference for making changes in format and wording compared with cutting results. Some respondents were willing to cut results in exchange for a higher impact factor. Despite international efforts to reduce the importance of impact factor, it remains a driver of researchers’ behaviour. The most prestigious journals may have the most partial evidence, as researchers are willing to trade their results for prestige.
Peer reviewed publications are academic currency [1]. Having sufficient publications in the bank is important for hiring, promotion and funding [2, 3]. Publications are also a vital record of evidence which can improve policy and practice, and direct future research [4]. Ideally, publications would be both useful academic currency and sources of evidence for scientific progress. However, the value of publications as a currency may be trumping their main purpose to provide reliable evidence [5]. The intrinsic motivation of a “Taste for Science” (described by Merton [6]) may have been superseded by the extrinsic motivation of a “Taste for Publications” [7]. In a “publish or perish” world, researchers may “prefer popularity to intrinsic value” and hence focus on where to publish instead of what to publish [1].
Most researchers regularly make considered decisions on what journal to submit to and how to navigate peer review. Factors include the journal’s prestige (often defined using the impact factor), the target audience, the article processing charges, the required formatting, and the journal’s rejection rate and turnaround times. The perfect home for a paper is rare [8], and researchers often need to make compromises to be successful [9]. We aimed to study some of the important compromises that researchers make, and thus examine how researchers decide where and how to publish their research.
We were especially interested in the trade-offs that researchers make concerning personal benefits and the wider benefits for society. We aimed to test trade-offs between earning academic currency and creating an accurate record of the evidence.
Results
Sample description
The surveys were collected between 26 March 2024 and 30 May 2024 (66 days) (see Supplement S.1). The median time to complete the survey was 7 minutes. We received 616 responses from 7,376 invites, giving a response rate of 8.5%; this excludes 170 emails that were no longer active. A classification tree found that the response rate varied by email domain, with a higher response rate of 21% for – amongst others – Australia, Switzerland and the UK, and a lower response rate of 3% for – amongst others – China, Germany and Japan (see Supplement S.2). The questions were generally well completed but there was some survey fatigue with under 1% missing the first choice task and 15% missing the tenth and last choice task (Supplement S.3).
Thirteen percent of respondents found answering the hypothetical choices to be difficult or very difficult. In the dominant choice task, over 99% of respondents selected the dominant option, indicating an excellent understanding of the attributes and levels. In the repeat choice task, 79% of respondents gave the same answer as in the original task, indicating good internal consistency.
Summary statistics on the sample are in Table 1. Respondents had been working in research for a median of 10 years and had a median of 43 peer reviewed papers. Forty-seven percent were female. The most popular broad research area was Clinical Sciences (57%). Forty percent of respondents had a personal target for their annual number of publications.
Table 1: Summary statistics on the respondents’ characteristics. Whether researchers had a target number of publications was only asked in the final sample; respondents could tick multiple answers for this question. Q1 = first quartile, Q3 = third quartile.
The sample included responses from 63 countries, with the three most common of USA (15%), UK (11%) and Australia (10%) (table of all countries in Supplement S.4).
Researchers’ preferences
The utilities for each attribute are in Figure 1 and Table 2. The figure also shows the utilities stratified by the respondents’ characteristics and the scenario wording concerning prior rejections.
Figure 1: Utility estimates and 95% confidence intervals for the six attributes. The dotted vertical line at zero is for no difference in utility. Forty-three publications was the sample median. JIF = journal impact factor.
Table 2: Utilities for the journal preferences and attribute importance. See Table 3 for the full wording of the attributes and levels. JIF = journal impact factor.
The strongest preference was for the highest impact factor and the second strongest for the moderate impact factor. The least important attribute was a preference for making changes in format and wording compared with cutting a table and analysis.
After the impact factor, the next strongest preference was for a helpful review. The utilities for a fast review and minor formatting were similar. Researchers had a clear preference for papers that were useful for their promotion.
More experienced researchers had a stronger preference for the highest impact factor and minor formatting. Researchers who had more peer reviewed papers had a much stronger preference for the highest and moderate impact factors.
Female researchers had slightly stronger preferences for helpful reviews and papers that were useful for their promotion.
There was little difference in researchers’ preferences by whether the paper had been previously rejected or not.
The latent class results are in Figure 2. The optimal number of groups according to the AIC was four. The largest group had the strongest preferences for impact factor, a relatively small preference for a fast review, and a slight preference for cutting results over minor formatting. The second largest group had the strongest preference for a helpful review, with a much reduced – although still positive – preference for journal impact factor. The third group were not concerned about a helpful review, but strongly preferred minor over major formatting and a paper that was useful for their promotion. Ten percent of respondents provided non-informative responses.
Figure 2: Utility estimates and 95% confidence intervals for the six attributes using a latent class model. The percents in the panel headers are the group sizes. The dotted vertical line at zero is for no difference in utility. JIF = journal impact factor.
Interactions
The five planned interactions are plotted in Figure 3 with the estimates in Supplement S.5. When the journal had no impact factor, there was a stronger preference for a faster review. The journal rank had a similar interaction with both the editor’s requests and the style requirements, as there was no difference in utility when the journal had no impact factor. This could indicate that researchers were indifferent to these attributes when the journal had no impact factor.
Figure 3: Utility estimates for the five planned interactions. The dots are the means and the vertical lines are 95% confidence intervals. The reference group is the left-most level on the x-axis with the green line.
There was an interaction between the editor’s requests and a helpful review: when the review was not helpful, there was a stronger preference for formatting and wording changes over cutting results, whereas for helpful reviews researchers showed little difference between the editor’s requests, possibly because they interpreted all requests as helpful.
There was a small interaction between a helpful review and speed, as researchers were more willing to wait for a helpful review.
Discussion
Researchers had the strongest preference for impact factor above any other tested attribute. This was both a desire for high impact factors and an aversion to journals with no impact factor. The importance of impact factor to researchers has been called an “obsession” [10], a “mania” [11], and a “game” that encourages “questionable practices” [12]. Major international initiatives have sought to combat the influence of impact factors, such as DORA in 2012
(https://sfdora.org/) and CoARA in 2022 (https://coara.eu/). Despite these initiatives and the extensive debate on the negative consequences of using impact factors for evaluating researchers, the highest possible impact factor is a target for many researchers. A focus group participant framed impact factors as useful for “quantifying my academic abilities”, whilst a survey participant commented, “I’ve been told if it isn’t in an impact factor over 10 it doesn’t matter/count”. Journals must be indexed for three years to get an impact factor, but some respondents interpreted a journal without an impact factor as predatory rather than new, as stated by a survey participant, “I would never select a journal without an impact factor as I always publish in journals that I know and can trust that are not predatory.”
Researchers with more publications and more years of experience had a stronger preference for impact factor (Figure 1). This could be because some early career researchers are yet to understand the importance of impact factors. Another explanation is a survivorship effect, as researchers with high impact factor publications have an advantage in employment and promotion [13], whilst researchers with less prestigious papers are out-competed [5].
Some survey respondents commented that they could not understand how a paper in a high impact factor journal could not be useful for their promotion or fellowship, which was a combination in the discrete choice tasks. This illustrates the power of the impact factor, as it trumps the content of the paper [11]. A recent survey showed how the content of papers is commonly neglected by grant and hiring committees, as over half use journal impact factor to assess credibility [14]. When fellowship and hiring committees make career-changing decisions based on impact factors, this sends a clear signal to researchers to prioritise impact factors over content. A researcher in our interviews appeared comfortable with being assessed based on impact factors: “People have to quantify me by something. So impact factor is a very important way to do that.” However, a focus group participant recognised that impact factors are usually meaningless when considering real-world impacts: “I’ve been working together with senior executives in the government and federal government. They don’t care about that [journal impact factor], they only want you to give them a half-page summary.”
A focus group participant gave a perspective on impact factors that was pragmatic and confessional, “Considering and admitting for everybody, for various reasons, usually go for a top ranked journal in its field, and everything, and some of that will be purely mercenary, because that’s what’s required.” Personal values are ceded to the reward systems that use impact factors and/or journal ranking. We aimed to distinguish researchers with a stronger focus on system requirements by asking if they had a target number of publications per year, and 53% had a personal and/or institutional target. However, having a target did not greatly alter researchers’ preferences (Figure 1). Potentially most researchers are “playing the game” and the preference for journal ranking remains high regardless of the desired publication numbers [15].
A surprising result was the lack of difference in researchers’ preferences for papers that were useful for promotion by experience and publication numbers (Figure 1). This could be because the competition for funding and promotion never ends and researchers are always looking to earn academic currency. Tenured or retired professors may be under less pressure [16] and a professor from the focus groups commented, “I am the least strategic person when it comes to publishing but I think that also comes with seniority as I have no need to ever write a promotion application again!”
Survey participants were randomised to a scenario where their hypothetical paper had not yet been submitted to a journal or had already been desk-rejected twice (Box 1). This was raised in the focus groups, with comments including: “But then, after many rejections, right? You just want to get it out”. However, in the survey the previous rejections had no effect as researchers’ preferences were remarkably similar (Figure 1). Researchers’ preferences may be impervious to rejection, as the logical approach is to continue to pursue the highest impact factor possible. Preferences may change with more than two rejections or if the rejections were after peer review rather than desk-rejections.
The lowest utility was for an editor’s request of formatting changes compared with cutting a table and analysis. On average, researchers preferred not to cut their analysis, but this was less of a priority than the impact factor, formatting at the submission stage, or the speed of the peer review. In the latent class analysis, the group with the strongest preference for impact factor had a surprising preference for cutting results (Figure 2), showing a willingness to compromise on their evidence to get published in prestigious journals [11]. This compromise was also discussed in our focus groups as a likely trade-off during the peer review process: “I certainly have examples where I have cut things out of papers to try and get something published.” Cutting results has also been discussed in the literature, for example: “Academics who play the ‘publish or perish’ game have a strong incentive to […] accept all ‘suggestions’ by the referees even if one knows that they are misleading or even incorrect” [17], and how during peer review “authors […] remove ideas and insights that they believe in from their work” [18]. To the best of our knowledge, our survey is the first to empirically show this compromise. An important implication is that the journals with the highest impact factors potentially have the most partial evidence, as researchers are more willing to “hold their nose” to satisfy the editors at influential journals [8]. One could argue that the journals were correct, and that the cuts improved the paper. However, in the scenario we told researchers “you believe it [your paper] is good quality” and the cut was 1,000 out of 4,000 words and included a table. Some researchers potentially rationalised this compromise by thinking that the removed results could be included in a supplement, but this relegates their findings to the “whim” of an editor [19].
An interesting finding from the focus groups and the survey is that researchers showed a relatively strong preference for helpful reviews and were willing to wait longer for helpful reviews. For example, an interview respondent said, “If there’s something that can improve them [my papers], I want them to be improved.” The preference for helpful reviews did not change by the researchers’ experience or number of publications (Figure 1), so it was not restricted to early career researchers. The latent class analysis showed that the second largest group most preferred a helpful review (Figure 2). The relatively strong preference for helpful reviews shows clear support for peer review, as many researchers want the expertise of their peers. Similarly, an international survey on the perception of peer review found that 93% disagree with the claim that peer review is unnecessary and 85% believe that peer review benefits scientific communication [20].
Related studies
Previous studies have examined researchers’ publication preferences using hypothetical journal choices. Similar to our results, the journal’s impact factor dominated preferences compared with the journal’s editorial board, journal’s standing among peers, quality of reviews, waiting time for reviews, and probability of being accepted [21]. Journal prestige, described using “journal level”, was also the most important attribute to junior authors in a conjoint analysis that compared journal prestige, author numbers, author order, and researchers’ time investment [22]. A choice-set survey found that researchers were willing to trade citations for a more prestigious journal [23].
A discrete choice experiment examined what metrics academics use when choosing papers to read [24]. There were clear preferences for citation counts, followed by the journal impact factor and download counts.
Limitations
Our discrete choice experiment was hypothetical and examined stated preferences not revealed preferences.
The low response rate (8.5%) reduces our ability to generalise and likely creates a non-response bias. Our approach email included words such as “journal” and “publishing” and so may have appeared similar to the many nuisance journal requests that researchers regularly receive and may have been automatically or manually deleted.
Respondents to our survey could be more engaged about the publication process than the wider population. We found a difference in response rate by country, hence our results over-represent some countries.
Methods
Designing the discrete choice experiment
We used a discrete choice experiment to examine researchers’ publication preferences as this is well-suited to testing the multiple trade-offs that researchers make when publishing papers.
We used multiple stages to design and deploy the discrete choice experiment (see Figure 4 and Supplement S.6 for details). With the aim of considering a wide array of attributes, we started with a literature review of papers that examined one or more potential attributes. The review collected 77 potential attributes about publications, with most concerning the journal (e.g., impact factor), the impact (e.g., social media discussion), and the paper’s characteristics (e.g., novel findings).
Figure 4: The stages of designing and deploying the discrete choice experiment to elicit researchers’ publication preferences
We used focus groups and in-depth interviews with health researchers from Australian academic institutions to explore the most important attributes, collect new attributes, and test potential trade-offs. We recruited participants from our networks and maximised variation in career stage, gender, and research field. We ran focus groups in clinical sciences (8 participants), public health and health services research (8 participants), and used interviews for the two participants in fundamental science as we did not have enough people for a focus group. We piloted the focus group with 9 participants from health services research. The focus group and interview sample sizes were arbitrary, being mostly determined by the number of interested participants.
We used a semi-structured interview guide with an adapted nominal group technique without consensus [25]. Participants were asked to imagine they had written a paper and were now thinking of submitting it to a journal. They were asked about the attributes they consider most important when submitting to a journal. Each participant talked through up to ten attributes with the group and explained their choices. The attributes mentioned were then added to an online survey and participants voted on their most important attributes, explaining the rationale for their choices. We analysed and selected the attributes using the five steps of attribute development with a distilling approach [26].
An initial design of eight attributes was tested using a thinking-aloud exercise with ten researchers [27]. Researchers were shown a choice task and were asked to discuss their thoughts aloud on whether: they had any comments on the content or wording; there were any levels that they struggled to understand or that seemed unrealistic; the gaps between any levels were too jarring or obvious; and there was anything missing. This exercise identified that an attribute on journal prestige was sometimes contradictory to an attribute on journal ranking, and hence the prestige attribute was removed.
Attribute and level selection
The final attributes and levels are in Table 3. In this section we explain the choices behind the attributes and levels, and explain the perceived importance of some attributes and why some attributes were excluded.
Table 3: The six attributes and their levels for the discrete choice experiment. The first column is a short label used to refer to the attributes.
Journal impact factor was the most common attribute in the literature review and was also frequently mentioned in the focus groups and interviews. Participants suggested that its importance rested on self-serving purposes like job promotions, grants, and funding, but it was also perceived as a reflection of the excellence of the researcher and a way to quantify the worth of their work. Related to the impact factor was the idea of predatory journals, which raised strong feelings of aversion due to reputational damage (e.g., “I avoid them like the plague”). For the levels, we decided against numeric impact factors because these numbers vary by field [28]; hence we used a relative field ranking with levels for the highest-ranked journal, a middle-ranked journal, and a journal without an impact factor, which could represent a new journal or a potentially predatory journal.
Formatting was often considered a “painful” process. Concerns were raised about the time needed to fit a journal’s style requirements, and respondents wanted to avoid onerous systems. We used two simple levels of minor and major formatting.
Peer review was widely discussed, with researchers interested in the speed and quality of reviews. We framed both these attributes by what their colleagues had told them, as colleagues were an important source of information about prospective journals. We used the relative labels of “slow” and “fast” rather than numeric review times (e.g., 30 days) because average times vary by field [29].
The focus group discussions uncovered a new issue as some researchers raised experiences of being asked by a journal editor to cut results from their paper at the peer review stage. There were multiple potential reasons including to reduce the word count, to keep a “clean story”, to make the story “digestible”, to remove results that contradicted previous findings, or to remove findings that were not of interest to journals or colleagues. We included this as an attribute as it suited the tension we were aiming to test, being a trade-off between the loss of evidence from presenting an abridged version of the work against the potential benefit of earning a publication. A difference between this attribute and the others is that it occurs post-submission.
The final attribute was a direct appeal to personal benefit, as it concerned whether the paper was useful or not for their next promotion or fellowship application. An example of a good quality paper that researchers might not use in a fellowship application is a “negative” study, where, for example, a new intervention or treatment did not work (often judged by the arbitrary statistical significance threshold of p < 0.05). “Negative” studies can be less cited and receive less publicity than “positive” studies [30, 31], highlighting their reduced value as academic currency.
Article processing charges (APCs) were often discussed, but we excluded them as an attribute because they often could not be traded – for example, by researchers with no budget to pay APCs. Using charges could have introduced a hypothetical bias, as researchers mostly do not personally pay the APCs and therefore the choices would not be as meaningful [32].
Citations were a common attribute in the literature review, but focus group discussions revealed that these were seen as being beyond the control of the researchers and somewhat due to chance. Hence it would not be plausible to use varying citation numbers as attribute levels. Supporting this decision, a prospective study of journal editors found that citation counts were difficult to predict [33].
Scenario
The scenario in Box 1 was shown at the start of the survey and was repeated under every choice task.
The scenario framed the choice tasks and included some attributes of journal choice relevant for decision-making that either: 1) could not be measured independently because they overlapped with other attributes, or 2) had an importance that was relative across participants or deterministic. For example, the scope and readership of the journal were often mentioned in focus groups as among the most important attributes. However, as researchers were strongly unwilling to submit to journals outside their scope, we added it to the scenario.
Box 1: Scenario for the discrete choice experiment
Imagine you have written a paper and are now trying to get it published in a journal.
Your paper contains original research and is around 4,000 words long with tables and figures. Your paper is relevant in your field and you believe it is good quality.
You will only consider journals that fit the scope of your paper and are read by your target audience. You have no previous experience with the journals (good or bad). You do not have any personal or professional relationships with the journal editors or publishers.
You are the first author and will make all decisions on behalf of your co-authors.
Scenario 1 ending: Your paper has not yet been submitted to any journal.
Scenario 2 ending: Your paper has been submitted and desk-rejected (rejected without peer reviews) by two journals.
The two scenario endings were created because in the focus groups some researchers mentioned how they might change behaviour after experiencing some rejections. To test this potential difference in the survey, researchers were randomised to view one or the other scenario ending in a 1:1 ratio.
Focus group participants mentioned that previous experiences with a journal, good or bad, would strongly influence their choices. To avoid this concern, the scenario stated that the researchers did not have any experience with the journal. Similarly, we stated that they did not know the editorial staff, as this also influences researchers’ journal choices.
Dominant task
An example discrete choice task is shown in Figure 5. The choice tasks were unlabelled as the hypothetical journals were “A” and “B”. This example is the dominant choice task where “Journal A” is clearly the most desirable. It was used to examine whether respondents understood the task. It was shown as the first task to warm up respondents and was not used in the data analysis.
Survey of discrete choice tasks
The online survey started with a link to the participant information sheet (Supplement S.7) and asked researchers to indicate their consent. Fourteen respondents did not consent. Those who consented were shown the scenario (Box 1) and dominant task (Figure 5). Respondents next answered eight choice tasks. The final task was a repeat of one of the eight. This was used to assess the stability of the participants’ responses based on the percentage of respondents who gave the same answer as the original task [34]. Differing answers could be due to learning effects or fatigue [34]. The repeat task was not used in the analysis.
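Both validity checks reduce to simple proportions. As a minimal sketch in R (the language of our analysis code [39]), assuming a hypothetical data frame with one row per respondent – the column names here are illustrative, not those of the published code:

# Hypothetical columns:
#   dominant_pick: choice in the dominant task ("A" is the dominant journal)
#   first_pick:    choice in the task that was later repeated
#   repeat_pick:   choice in the repeated task
mean(responses$dominant_pick == "A", na.rm = TRUE) * 100  # understanding check
mean(responses$repeat_pick == responses$first_pick,
     na.rm = TRUE) * 100                                  # internal consistency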
The final section of the survey asked respondents if they found the choice tasks easy or difficult. We also gathered the following information from the respondents: their broad research area, gender, years of experience in research, number of published papers, country, and their perceived publication pressure. Lastly, the respondents could add optional comments. Respondents could skip any question. The complete survey is available from Supplement S.7.
Figure 5: Example discrete choice task showing the attributes and levels. This is the dominant choice task where Journal A has the levels we assumed most respondents would prefer.
The Ngene software (version 1.3) was used to select 24 pairwise choice tasks based on the D-error, giving an efficient fractional design. This D-efficient design was developed using the Modified Fedorov algorithm to estimate a multinomial logit model, with the prior parameter values for the final design taken from the pilot test. The 24 choice tasks were divided into three blocks of eight. Using a fractional design maximised the design’s statistical efficiency, whilst giving each respondent a manageable number of choice tasks: ten, being the eight tasks plus the dominant task and the repeat task.
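The D-error criterion can be illustrated outside the proprietary software. For a multinomial logit model, the information matrix is accumulated over the choice sets and the D-error is det(I)^(-1/K), where K is the number of parameters; lower values indicate a more efficient design. A minimal sketch in R, using a hypothetical effects-coded design and priors rather than our actual design:

# D-error of a pairwise multinomial logit design. `design` is a list of
# choice sets; each set is a 2-row matrix whose rows are the attribute
# codings of journals A and B. `beta` holds the priors (in practice the
# pilot estimates). All values below are illustrative.
d_error <- function(design, beta) {
  K <- length(beta)
  info <- matrix(0, K, K)
  for (X in design) {
    p <- as.vector(exp(X %*% beta))
    p <- p / sum(p)                          # MNL choice probabilities
    info <- info + t(X) %*% (diag(p) - p %*% t(p)) %*% X
  }
  det(solve(info))^(1 / K)                   # lower = more efficient
}

design <- list(rbind(c( 1,  1), c(-1, -1)),  # three toy choice sets with
               rbind(c( 1, -1), c(-1,  1)),  # two effects-coded attributes
               rbind(c(-1,  1), c( 1, -1)))
d_error(design, beta = c(0.5, -0.2))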
Statistical methods
We used the panel mixed multinomial logit (pMMNL) model and the panel Latent Class Model (pLCM) for the main analysis. We also used the pMMNL model to examine whether preferences systematically differed based on respondents’ characteristics. Results are presented as mean utilities with 95% confidence intervals, and the estimated attribute importance [35]. Subgroup analyses were conducted using the following characteristics: years of experience, gender, number of publications, having a publication target, and the hypothetical paper’s prior rejection. The pLCM was used to capture non-systematic heterogeneity in preferences among respondents, assuming that differences in preferences manifest as discrete groups or latent classes [36]. The ideal number of classes was determined using the Akaike Information Criterion (AIC). A pLCM was used to assess task non-attendance, incorporating a “garbage class” to identify respondents who provided non-informative responses [37]. This approach enabled an evaluation of preference heterogeneity that distinguished between attentive and non-attentive participants.
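As a minimal sketch of this modelling pipeline in R (our actual code is on GitHub [39]; the long-format data frame and the simplified attribute names below are hypothetical), the mlogit package can fit the panel mixed logit and the gmnl package the latent class model, with the AIC compared across class numbers:

library(mlogit)  # mixed multinomial logit
library(gmnl)    # latent class multinomial logit

# `long` has one row per respondent x task x journal, with columns
# id, task, alt ("A"/"B"), choice (TRUE/FALSE) and coded attributes.
d <- mlogit.data(long, choice = "choice", shape = "long",
                 alt.var = "alt", id.var = "id")

# Panel mixed MNL with normally distributed coefficients and Halton draws
mmnl <- mlogit(choice ~ jif + format + speed + helpful + cut + promo | 0,
               data = d, panel = TRUE, R = 500, halton = NA,
               rpar = c(jif = "n", format = "n", speed = "n",
                        helpful = "n", cut = "n", promo = "n"))
summary(mmnl)

# Latent class MNL: fit 2 to 5 classes and choose the smallest AIC
fits <- lapply(2:5, function(q)
  gmnl(choice ~ jif + format + speed + helpful + cut + promo | 0 | 0 | 0 | 1,
       data = d, model = "lc", Q = q, panel = TRUE))
sapply(fits, AIC)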
Our data collection and analyses were preregistered in a study protocol [38]. The only change from our planned design was that we did not use the pre-notification email for most invites, as it did not appear to increase the response rate.
Data and code availability
Our R code and data are available on GitHub [39].
Sample size
Sample size formulae are not available for discrete choice experiments and estimates are often made using rules of thumb or simulations [40, 41]. We faced uncertainty in selecting plausible model parameters, with 1 to 2 parameters per discrete choice attribute and no similar prior studies. Hence our final sample size was based on a pilot. Pilot testing has been recommended to inform sample size calculations for complex interventions [42].
We analysed the pilot data of 51 respondents to inform the final design. The required sample size based on minimising the D-error was 309. Both the pilot and final design had 24 choice tasks in three blocks of eight. The attributes and levels were the same in the pilot and final design, hence we combined respondents from the pilot and final surveys in our analyses.
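The logic of projecting a sample size from a pilot is that standard errors shrink with the square root of the number of respondents, so the number needed for each parameter to reach |beta|/se ≥ 1.96 can be extrapolated from the pilot fit, taking the maximum over parameters. A sketch in R with made-up pilot numbers, not our estimates:

# Standard errors scale as 1/sqrt(n): project the n needed per parameter
n_pilot <- 51
beta <- c(jif_high = 1.20, helpful = 0.60, cut = 0.15)  # illustrative
se   <- c(jif_high = 0.30, helpful = 0.25, cut = 0.12)  # illustrative

n_needed <- n_pilot * (1.96 * se / beta)^2
ceiling(max(n_needed))  # the requirement is driven by the weakest parameter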
Sampling frame
Our target population was current health and medical researchers. We approached this population by creating a sampling frame of researchers extracted from papers on the PubMed database, which is a widely used search engine that contains the MEDLINE database of published papers in life sciences and biomedical topics [43]. To capture current researchers we restricted the search from the year 2022 onwards. We used the “publication type” search field to exclude non-research papers like obituaries. We only extracted researchers who had an email available. The search was conducted on 11 April 2024.
The search returned over 140,000 papers, which we randomly re-ordered and iteratively extracted no more than one unique email per paper until we had a sample of 9,000 researchers. Randomly selected researchers from the sampling frame were sent an initial email with reminders one and two weeks later.
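A hedged sketch of this extraction in R using the rentrez package (the search term is simplified and illustrative; the actual extraction code is in our repository [39]):

library(rentrez)  # R client for the NCBI E-utilities
library(xml2)

# Illustrative search: research papers from 2022 onwards
srch <- entrez_search(db = "pubmed",
                      term = "2022:2024[PDAT] AND journal article[PT]",
                      use_history = TRUE)

# Fetch a batch of records and pull email addresses from the affiliations
recs <- entrez_fetch(db = "pubmed", web_history = srch$web_history,
                     rettype = "xml", retmax = 200)
affs <- xml_text(xml_find_all(read_xml(recs), "//Affiliation"))
emails <- regmatches(affs, regexpr("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+", affs))

# Randomly re-order and keep unique addresses for the sampling frame
frame <- unique(sample(emails))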
Additional information
Contributions
Conceptualization: NGB, SW, DB, SS, SK and AB.
Methodology: NGB, SS and AB.
Software: SS and AB.
Formal analysis: NGB, SS and AB.
Investigation: NGB, SW, DB, SS, SK and AB.
Data curation: AB.
Visualization: AB.
Writing—original draft: AB.
Writing—review and editing: NGB, SS, DB, SW, SK.
Funding acquisition: DB, SS, SK and AB.
Ethics declarations
The focus groups and interviews were approved by the Queensland University of Technology Human Research Ethics committee (Date: 18 April 2023, number: LR 2023-6685-13695). The survey was approved by the Queensland University of Technology Human Research Ethics committee (Date: 5 March 2024, number: LR 2024-8188-18148).
Funding
This work received funding from an internal grant from the Centre for Healthcare Transformation at Queensland University of Technology.
References
[1] Génova, Gonzalo, Astudillo, Hernán, and Fraga, Anabel. “The Scientometric Bubble Considered Harmful”. In: Science and Engineering Ethics 22.1 (Feb. 2015), pp. 227–235. doi: 10.1007/s11948-015-9632-6.
[2] Schimanski, Lesley A. and Alperin, Juan Pablo. “The evaluation of scholarship in academic promotion and tenure processes: Past, present, and future”. In: F1000Research 7 (Oct. 2018), p. 1605. doi: 10.12688/f1000research.16493.1.
[3] Rice, Danielle B et al. “Academic criteria for promotion and tenure in biomedical sciences faculties: cross sectional analysis of international sample of universities”. In: BMJ 369 (2020). doi: 10.1136/bmj.m2081.
[4] Dawes, Martin et al. “Sicily statement on evidence-based practice”. In: BMC Medical Education 5.1 (Jan. 2005). doi: 10.1186/1472-6920-5-1.
[5] Smaldino, Paul E. and McElreath, Richard. “The natural selection of bad science”. In: Royal Society Open Science 3.9 (Sept. 2016), p. 160384. doi: 10.1098/rsos.160384.
[6] Merton, R.K. and Storer, N.W. The Sociology of Science: Theoretical and Empirical Investigations. Phoenix books. University of Chicago Press, 1973. isbn: 9780226520926.
[7] Binswanger, Mathias. “Excellence by Nonsense: The Competition for Publications in Modern Science”. In: Opening Science. Springer International Publishing, Dec. 2013, pp. 49–72. isbn: 9783319000268. doi: 10.1007/978-3-319-00026-8_3.
[8] Maggio, Lauren A. et al. ““The best home for this paper”: A qualitative study of how authors select where to submit manuscripts”. In: (May 2024). doi: 10.1101/2024.05.14.594165.
[9] Anderson, Melissa S. et al. “The Perverse Effects of Competition on Scientists’ Work and Relationships”. In: Science and Engineering Ethics 13.4 (Nov. 2007), pp. 437–461. doi: 10.1007/s11948-007-9042-5.
[10] Onstad, David W and Sime, Karen R. “The ethical and social effects of the obsession over Journal Impact Factor”. In: Annals of the Entomological Society of America 117.3 (Mar. 2024). Ed. by Matt Hudson, pp. 160–162. doi: 10.1093/aesa/saae013.
[11] Casadevall, Arturo and Fang, Ferric C. “Causes for the Persistence of Impact Factor Mania”. In: mBio 5.2 (2014), 10.1128/mbio.00064–14. doi: 10.1128/mbio.00064-14.
[12] Falagas, Matthew E. and Alexiou, Vangelis G. “The top-ten in journal impact factor manipulation”. In: Archivum Immunologiae et Therapiae Experimentalis 56.4 (July 2008), pp. 223–226. doi: 10.1007/s00005-008-0024-5.
[13] Pitt, Rachael and Mewburn, Inger. “Academic superheroes? A critical analysis of academic job descriptions”. In: Journal of Higher Education Policy and Management 38.1 (2016), pp. 88–101. doi: 10.1080/1360080X.2015.1126896.
[14] Hrynaszkiewicz, Iain et al. “A survey of how biology researchers assess credibility when serving on grant and hiring committees”. In: (Mar. 2024). doi: 10.31222/osf.io/ht836.
[15] Chapman, Colin A. et al. “Games academics play and their consequences: how authorship, h-index and journal impact factors are shaping the future of academia”. In: Proceedings of the Royal Society B: Biological Sciences 286.1916 (2019), p. 20192047. doi: 10.1098/rspb.2019.2047.
[16] Niles, Meredith T. et al. “Why we publish where we do: Faculty publishing values and their relationship to review, promotion and tenure expectations”. In: PLOS ONE 15.3 (Mar. 2020), pp. 1–15. doi: 10.1371/journal.pone.0228914.
[17] Frey, Bruno S., Eichenberger, Reiner, and Frey, René L. “Editorial Ruminations: Publishing Kyklos”. In: Kyklos 62.2 (Apr. 2009), pp. 151–160. doi: 10.1111/j.1467-6435.2009.00428.x.
[18] Eisen, Michael B et al. “Peer review without gatekeeping”. In: eLife 11 (Oct. 2022). doi: 10.7554/elife.83889.
[19] Schmid, Sandra L. “Five years post-DORA: promoting best practices for research assessment”. In: Molecular Biology of the Cell 28.22 (Nov. 2017). Ed. by Doug Kellogg, pp. 2941–2944. doi: 10.1091/mbc.e17-08-0534.
[20] Ware, Mark. “Peer review in scholarly journals: Perspective of the scholarly community–Results from an international study”. In: Information Services & Use 28.2 (2008), pp. 109–112.
[21] Rousseau, Sandra and Rousseau, Ronald. “Interactions between journal attributes and authors’ willingness to wait for editorial decisions”. In: Journal of the American Society for Information Science and Technology 63.6 (Mar. 2012), pp. 1213–1225. doi: 10.1002/asi.22637.
[22] Krasnova, Hanna et al. “Publication Trade-Offs for Junior Scholars in IS: Conjoint Analysis of Preferences for Quality, First Authorship, Collaboration, and Time”. In: Proceedings of the International Conference on Information Systems (ICIS). 2014.
[23] Salandra, Rossella, Salter, Ammon, and Walker, James T. “Are Academics Willing to Forgo Citations to Publish in High-Status Journals? Examining Preferences for 4* and 4-Rated Journal Publication Among UK Business and Management Academics”. In: British Journal of Management 33.3 (May 2021), pp. 1254–1270. doi: 10.1111/1467-8551.12510.
[24] Lemke, Steffen, Mazarakis, Athanasios, and Peters, Isabella. “Conjoint analysis of researchers’ hidden preferences for bibliometrics, altmetrics, and usage metrics”. In: Journal of the Association for Information Science and Technology 72.6 (2021), pp. 777–792. doi: https://doi.org/10.1002/asi.24445.
[25] Bohorquez, Natalia Gonzalez et al. “Attribute Development in Health-Related Discrete Choice Experiments: A Systematic Review of Qualitative Methods and Techniques to Inform Quantitative Instruments”. In: Value in Health (June 2024). doi: 10.1016/j.jval.2024.05.014.
[26] Bohorquez, Natalia Gonzalez et al. “Enhancing Health Preferences Research: Guidelines for Qualitative Attribute Development in Stated Preference Studies”. In: OSF (July 2024). url: https://osf.io/g9jbt.
[27] Leighton, J.P. Using Think-Aloud Interviews and Cognitive Labs in Educational Research. Understanding Qualitative Research. Oxford University Press, 2017. isbn: 9780199372911.
[28] Althouse, Benjamin M. et al. “Differences in impact factor across fields and over time”. In: Journal of the American Society for Information Science and Technology 60.1 (Dec. 2008), pp. 27–34. issn: 1532-2890. doi: 10.1002/asi.20936. url: http://dx.doi.org/10.1002/asi.20936.
[29] Publons. 2018 Global state of peer review series. 2018. doi: 10.14322/publons.GSPR2018. url: https://publons.com/static/Publons-Global-State-Of-Peer-Review-2018.pdf.
[30] Greenberg, S. A. “How citation distortions create unfounded authority: analysis of a citation network”. In: BMJ 339.jul20 3 (July 2009), b2680. doi: 10.1136/bmj.b2680.
[31] Koren, Gideon. “Bias Against Negative Studies in Newspaper Reports of Medical Research”. In: JAMA: The Journal of the American Medical Association 266.13 (Oct. 1991), p. 1824. doi: 10.1001/jama.1991.03470130104037.
[32] Hensher, David A. “Hypothetical bias, choice experiments and willingness to pay”. In: Transportation Research Part B: Methodological 44.6 (July 2010), pp. 735–752. doi: 10.1016/j.trb.2009.12.012.
[33] Schroter, Sara et al. “Evaluation of editors’ abilities to predict the citation potential of research manuscripts submitted to The BMJ: a cohort study”. In: BMJ 379 (2022). doi: 10.1136/bmj-2022-073880.
[34] Özdemir, Semra et al. “Who pays attention in stated-choice surveys?” In: Health Economics 19.1 (Mar. 2009), pp. 111–118. doi: 10.1002/hec.1452.
[35] Gonzalez, Juan Marcos. “A Guide to Measuring and Interpreting Attribute Importance”. In: The Patient – Patient-Centered Outcomes Research 12.3 (Mar. 2019), pp. 287–295. doi: 10.1007/s40271-019-00360-3.
[36] Greene, William H. and Hensher, David A. “A latent class model for discrete choice analysis: contrasts with mixed logit”. In: Transportation Research Part B: Methodological 37.8 (2003), pp. 681–698. doi: https://doi.org/10.1016/S0191-2615(02)00046-2.
[37] Gonzalez, Juan Marcos, Johnson, F. Reed, and Finkelstein, Eric. “To pool or not to pool: Accounting for task non-attendance in subgroup analysis”. In: Journal of Choice Modelling 51 (2024), p. 100487. doi: https://doi.org/10.1016/j.jocm.2024.100487.
[38] Barnett, Adrian G et al. Study protocol: A discrete choice experiment to examine researchers’ publication preferences: an international cross-sectional survey. Mar. 2024. doi: 10.17605/OSF.IO/P9GUJ. url: https://doi.org/10.17605/OSF.IO/P9GUJ.
[39] Barnett, Adrian G. Code and data for a discrete choice experiment of authors’ preferences. July 2024. doi: 10.5281/zenodo.12814359. url: https://github.com/agbarnett/publication_preferences.
[40] Lancsar, Emily and Louviere, Jordan. “Conducting Discrete Choice Experiments to Inform Healthcare Decision Making: A User’s Guide”. In: PharmacoEconomics 26.8 (2008), pp. 661–677. doi: 10.2165/00019053-200826080-00004.
[41] Reed Johnson, F. et al. “Constructing Experimental Designs for Discrete-Choice Experiments: Report of the ISPOR Conjoint Analysis Experimental Design Good Research Practices Task Force”. In: Value in Health 16.1 (Jan. 2013), pp. 3–13. doi: 10.1016/j.jval.2012.08.2223.
[42] Lancaster, GA et al. “Trials in primary care: statistical issues in the design, conduct and evaluation of complex interventions”. In: Statistical Methods in Medical Research 19.4 (May 2010), pp. 349–377. doi: 10.1177/0962280209359883.
[43] Sayers, Eric W et al. “Database resources of the national center for biotechnology information”. In: Nucleic Acids Research 50.D1 (Dec. 2021), pp. D20–D26. doi: 10.1093/nar/gkab1112. url: http://dx.doi.org/10.1093/nar/gkab1112.
Supplementary material
S.1 Survey responses over time
Figure S.1: Cumulative number of survey responses over time for the pilot and final design.
S.2 Classification tree predicting survey response
Table S.1: Results of the classification tree using email domain.
We used a classification tree to predict survey response (yes/no) based on the researchers’ email domain (a proxy for country, e.g., au = Australia), and whether the researcher’s affiliation mentioned the words “Hospital”, “Dentist*” or “University”. The classification tree had three leaves with a cross-validated error of 0.990 and a standard error of 0.034. The tree only used the email domain, but found a relatively large difference in response proportions. We present the results as a table instead of a plotted tree as the number of email domains makes the plot cluttered.
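A minimal sketch of this tree in R using the rpart package (the variable names are hypothetical; the fitted tree is in our repository [39]):

library(rpart)

# One row per invited researcher; hypothetical columns:
#   responded: factor "yes"/"no"
#   domain:    email top-level domain (e.g., "au", "uk", "cn")
#   hospital, dentist, university: logicals from the affiliation text
tree <- rpart(responded ~ domain + hospital + dentist + university,
              data = invitees, method = "class",
              control = rpart.control(xval = 10))  # 10-fold cross-validation

printcp(tree)  # cross-validated error by tree size
tree$frame     # the leaves as a table (a plot would be cluttered)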
S.3 Item-missing data
The plot below shows item missing data by question number. The missing data patterns are clustered by similarity. The question numbers are presented in order. There is evidence of survey fatigue as the percent missing increases from left-to-right.
Figure S.2: Item missing data for the 616 survey responses. The column headings show the question number and percent missing. The panel on the right shows the questions for each question number.
S.4 Respondents’ countries
Table S.2: Number and percent of responses by country. There were 63 countries in total.
S.5 Attribute interactions
Table S.3: Utility estimates and 95% confidence intervals for the planned interactions between attributes. The interactions are plotted in Figure 3. This table shows only the interaction terms and not the main effects. These results help judge the null hypothesis of whether there was no interaction.
S.6 Details on the discrete choice experiment design
This additional file includes details on the literature review, focus groups and interviews, and thinking aloud exercise. It is available here: https://osf.io/gjch7.
S.7 Participant information sheet and survey questions
A PDF version of the survey is available here https://osf.io/j7mce. The survey was delivered online using Qualtrics.
The survey questions differed by two questions between the pilot and final survey as we altered the question that aimed to examine researchers’ publishing expectations. This is because for the original question – “My department’s or research group’s expectations with respect to publishing are reasonable” – 81% responded “Agree” or “Strongly agree”, creating limited variance between respondents. Hence in the main survey we asked researchers if they had an annual publication target and what it was.
Editors
Ludo Waltman Editor-in-Chief
Ludo Waltman Handling Editor
Editorial assessment
by Ludo Waltman
DOI: 10.70744/MetaROR.12.1.ea
In this article the authors use a discrete choice experiment to study how health and medical researchers decide where to publish their research, showing the importance of impact factors in these decisions. The article has been reviewed by two reviewers. The reviewers consider the work to be robust, interesting, and clearly written. The reviewers have some suggestions for improvements. One suggestion is to emphasize more strongly that the study focuses on the health and medical sciences and to reflect on the extent to which the results may generalize to other fields. Another suggestion is to strengthen the embedding of the article in the literature. Reviewer 2 also suggests extending the discussion of the sample selection and addressing in more detail the question of why impact factors still persist.
Competing interests: Ludo Waltman is Editor-in-Chief of MetaROR working with Adrian Barnett, a co-author of the article and a member of the editorial team of MetaROR.
Peer review 1
Stephen Curry
DOI: 10.70744/MetaROR.12.1.rv1
This manuscript reports the results of an interesting discrete choice experiment designed to probe the values and interests that inform researchers’ decisions on where to publish their work.
Although I am not an expert in the design of discrete choice experiments, the methodology is well explained and the design of the study comes across as well considered, having been developed in a staged way to identify the most appropriate pairings of journal attributes to include.
The principal findings to my mind, well described in the abstract, include the observations that (1) researchers’ strongest preference was for journal impact factor and (2) that they were prepared to remove results from their papers if that would allow publication in a higher impact factor journal. The first of these is hardly surprising – and is consistent with a wide array of literature (and ongoing activism, e.g. through DORA, CoARA). The second is much more striking – and concerning for the research community (and its funders). This is the first time I have seen evidence for such a trade-off.
Overall, the manuscript is very clearly written. I have no major issues with the methods or results. However, I think some minor revisions would enhance the clarity and utility of the paper.
First, although it is made clear in Table 1 that the researchers included in the study are all from the medical and clinical sciences, this is not apparent from the title or the abstract. I think both should be modified to reflect the nature of the sample. In my experience researchers in these fields are among those who feel most intensely the pressure to publish in high IF journals. The authors may want also to reflect in a revised manuscript how well their findings may transfer to other disciplines.
Second, in several places I felt the discussion of the results could be enriched by reference to papers in the recent literature that are missing from the bibliography. These include (1) Muller and De Rijcke’s 2017 paper on Thinking with Indicators, which discusses how the pressure of metrics impacts the conduct of research (https://doi.org/10.1093/reseval/rvx023); (2) Bjorn Brembs’ analysis of the reliability of research published in prestige science journals (https://www.frontiersin.org/journals/human-neuroscience/articles/10.3389/fnhum.2018.00376/full); and (3) McKiernan et al.’s examination of the use of the Journal Impact Factor in academic review, promotion, and tenure evaluations (https://pubmed.ncbi.nlm.nih.gov/31364991/).
Third, although the text and figures are nicely laid out, I would recommend using a smaller or different font for the figure legends to more easily distinguish them from body text.
Competing interests: None.
Peer review 2
Tony Ross-Hellauer
DOI: 10.70744/MetaROR.12.1.rv2
In “Researchers Are Willing to Trade Their Results for Journal Prestige: Results from a Discrete Choice Experiment“, the authors investigate researchers’ publication preferences using a discrete choice experiment in a cross-sectional survey of international health and medical researchers. The study investigates publishing decisions in relation to negotiation of trade-offs amongst various factors like journal impact factor, review helpfulness, formatting requirements, and usefulness for promotion in their decisions on where to publish. The research is timely; as the authors point out, reform of research assessment is currently a very active topic. The design and methods of the study are suitable and robust. The use of focus groups and interviews in developing the attributes for study shows care in the design. The survey instrument itself is generally very well-designed, with important tests of survey fatigue, understanding (dominant choice task) and respondent choice consistency (repeat choice task) included. Respondent performance was good or excellent across all these checks. Analysis methods (pMMNL and latent class analysis) are well-suited to the task. Pre-registration and sharing of data and code show commitment to transparency. Limitations are generally well-described.
In the below, I give suggestions for clarification/improvement. Except for some clarifications on limitations and one narrower point (reporting of qualitative data analysis methods), my suggestions are only that – the preprint could otherwise stand, as is, as a very robust and interesting piece of scientific work.
Respondents come from a broad range of countries (63), with 47 of those countries represented by fewer than 10 respondents. Institutional cultures of evaluation can differ greatly across nations. And we can expect variability in exposure to the messages of DORA (seen, for example, in the level of permeation of DORA as measured by signatories in each country, https://sfdora.org/signers/). In addition, some contexts may mandate or incentivise publication in some venues using measures including IF, but also by requiring journals to be in certain databases like WoS or Scopus, or by having preferred journal lists. I would suggest the authors should include in the Sampling section a rationale for taking this international approach, including any potentially confounding factors it may introduce, and then adding the latter also in the limitations.
Reporting of qualitative results: In the introduction and methods, the role of the focus groups and interviews seems to have been just to inform the design of the experiment. But results from that qualitative work then appear as direct quotes within the discussion to contextualise or explain results. In this sense though, the qualitative results are being used as new data. Given this, I feel that the methods section should include a description of the methods and tools used for qualitative data analysis (currently it does not). But in addition, to my understanding (and this may be a question of disciplinary norms – I’m not a health/medicine researcher), generally new data should not be introduced in the discussion section of a research paper. Rather the discussion is meant to interpret, analyse, and provide context for the results that have already been presented. I personally hence feel that the paper would benefit from the qualitative results being reported separately within the results section.
Impact factors – Discussion section: While there is interesting new information on the relative trade-offs amongst other factors, the most emphasised finding, that impact factors still play a prominent role in publication venue decisions, is hardly surprising. More could perhaps be done to compare how the levels of importance reported here differ with previous results from other disciplines or over time (I know a like-for-like comparison is difficult but other studies have investigated these themes, e.g., https://doi.org/10.1177/01655515209585). In addition, beyond the question of whether impact factors are important, a more interesting question in my view is why they still persist. What are they used for and why are they still such important “driver[s] of researchers’ behaviour”? This was not the authors’ question, and they do provide some contextualisation by quoting their participants, but still I think they could do more to contextualise what is known from the literature on that to draw out the implications here. The attribute label in the methods for IF is “ranking”, but ranking according to what and for what? Not just average per-article citations in a journal over a given time frame. Rather, impact factors are used as proxy indicators of less-tangible desirable qualities – certainly prestige (as the title of this article suggests), but also quality, trust (as reported by one quoted focus group member “I would never select a journal without an impact factor as I always publish in journals that I know and can trust that are not predatory”, p.6), journal visibility, importance to the field, or improved chances of downstream citations or uptake in news media/policy/industry etc. Picking apart the interactions of these various factors in researchers’ choices to make use of IFs (which is not in all cases bogus or unjustified) could add valuable context. I’d especially recommend engaging at least briefly with more work from Science and Technology Studies – especially Müller and de Rijcke’s excellent Thinking with Indicators study (doi: 10.1093/reseval/rvx023), but also those authors’ other work, as well as work from Ulrike Felt, Alex Rushforth (esp. https://doi.org/10.1007/s11024-015-9274-5), Björn Hammarfelt and others.
Disciplinary coverage: (1) A lot of the STS work I talk about above emphasises epistemic diversity and the ways cultures of indicator use differ across disciplinary traditions. For this reason, I think it should be pointed out in the limitations that this is research in Health/Med only, with questions on generalisability to other fields. (2) Also, although the abstract and body of the article do make clear the disciplinary focus, the title does not. Hence, I believe the title should be slightly amended (e.g., “Health and Medical Researchers Are Willing to Trade …”)