Published at MetaROR

November 5, 2025

Cite this article as:

Rushforth, A., Sivertsen, G., Wilsdon, J., Bin, A., Firth, C., Fraser, C., Gogadze, N., Gras, N., Harris, L., Holm, J., Kolarz, P., Koley, M., Maldonado Soto, J., Nienaltowski, M-H., Rovelli, L., Salles-Filho, S., Sarlo, S., Sarthou, N., Sjostedt, A., Vasen, F., Ward-Boot, N., Wróblewska, M., Xu, F., Zhang, L. (2025) A new typology of national research assessment systems: continuity and change in 13 countries. Figshare. DOI: 10.6084/m9.figshare.29366204


A new typology of national research assessment systems: continuity and change in 13 countries

Alexander Rushforth1,2, Gunnar Sivertsen2,3, James Wilsdon2,4, Ana Arango5, Adriana Bin6, Catriona Firth7, Claire Fraser7, Nino Gogadze1,2, Natalia Gras8, Lee Harris9, Jon Holm10, Peter Kolarz2,4, Moumita Koley2,11, Jorge Maldonado Soto12, Marie-Helene Nienaltowski7, Laura Rovelli2,13, Sergio Salles-Filho6, Scipione Sarlo14, Nerina Sarthou13, Arne Sjostedt9, Federico Vasen13,15, Nicole Ward-Boot16, Marta Natalia Wróblewska17, Fang Xu18, Lin Zhang19

1. Centre for Science and Technology Studies (CWTS), Leiden University, The Netherlands
2. Research on Research Institute (RoRI)
3. Nordic Institute for Studies in Innovation, Research and Education (NIFU), Oslo, Norway
4. Department of Science, Technology, Engineering and Public Policy (STEaPP), University College London (UCL), UK
5. CoLaV, Universidad de Antioquia UdeA, Medellín, Colombia
6. School of Applied Sciences, State University of Campinas, Brazil
7. Research England, UKRI
8. University Research Council, University of the Republic, Uruguay
9. Australian Research Council (ARC)
10. Research Council of Norway
11. DST-CPR, Indian Institute of Science, Bangalore, India
12. Universidad Alberto Hurtado, Chile
13. National University of La Plata / UNICEN / CONICET, Argentina
14. ANVUR, Rome, Italy
15. University of Buenos Aires / CONICET & AMU, Poland
16. Dutch Research Council (NWO), The Netherlands
17. SWPS University, Warsaw, Poland
18. Institutes of Science and Development, Chinese Academy of Sciences (CAS), China
19. School of Information Management, Wuhan University, China

Originally published on June 23, 2025 at: 

Abstract

How are national systems for assessing publicly funded research evolving? What purposes do they serve and how are they designed to fulfil these? This working paper surveys the landscape of national research assessment and funding systems across thirteen countries from 2010 to 2024, and makes three contributions to our understanding of these systems. First, we advance a new typology to categorize and compare important characteristics of these systems, providing insights into their similarities and differences, and a basis for mutual learning. Second, we identify and compare important shifts over time across the thirteen systems through the framework of three dynamic and interacting research performance paradigms. These point to a gradual shift away from narrow conceptions of research ‘excellence’ towards more holistic criteria of value, qualities and impacts across several systems – though not all. Finally, we consider potential trajectories over the next decade: including how a variety of assessment systems might respond to and incorporate responsible research assessment (RRA) movements for reform. By mapping the landscape of research assessment systems across countries and identifying dynamics of change, this paper offers insights for policymakers, research funders and institutional leaders looking to navigate this terrain at a time of shifting expectations.

Introduction

This paper develops a novel typology for comparing national research assessment and funding systems worldwide, then analyses significant shifts over the past fifteen years. The typology reflects patterns observed across national systems from thirteen countries. It forms part of AGORRA: A Global Observatory of Responsible Research Assessment, a project initiated by the Research on Research Institute (RoRI) in 2023[1]. AGORRA’s outputs include the RoRI Atlas of Assessment, an online observatory which monitors national assessment and funding systems (informed by the typology introduced in this paper); records changes over time; and provides a platform for assessment system design, experimentation and mutual learning among researchers, policymakers, research funders and institutional leaders.

Our focus is on national research assessment and funding systems, defined as “organized sets of procedures for assessing the merits of research undertaken in publicly funded organisations that are implemented on a regular basis, usually by state or state-delegated agencies” (Whitley, 2007, 6). Our typology and study do not include organisational procedures for the recruitment or promotion of researchers, or the assessment of research grant proposals by funding agencies. This work continues an important tradition of comparative analysis of national research assessment and funding systems (Geuna and Martin, 2003, Hicks, 2012, Debackere et al., 2018, Zacharewicz et al., 2019, Ochsner et al., 2021, Sivertsen, 2023). A recent systematic review of this literature (Thomas et al., 2020), which also included more than 300 opinion pieces, reveals that most contributions (including the review itself) implicitly assume that assessment and funding are always combined in national systems. An often-used term is therefore performance-based research funding systems (PRFS), which can be defined as “national systems of research output evaluation used to distribute research funding to universities” (Hicks, 2012, 260). Whitley’s broader definition, quoted above, is important, given that information provided by international partners in AGORRA reinforced that:

  • Funding can be formally detached from research assessment and vice versa, and there appears to be a trend in this direction.

  • Some institutional funding systems are not built on research evaluation but on indicators representing already-performed assessments in other contexts – such as external funding and peer-reviewed publications.

  • Some national assessment systems have purposes other than funding allocation, including accountability, organisational learning and strategic development.

  • There is an increasing interest in assessing organisations and their procedures, not only their outputs.

  • Some systems include the assessment of individual researchers, rather than only their organisation’s research.

  • National assessment systems may operate at multiple levels: institutions, departments, research groups, and individual researchers – with different implications at each of these levels.

In contrast with other comparative accounts of performance-based funding systems, our typology and analysis also capture national ex post assessment systems where, for instance, periodic evaluations of research performance are used to provide strategic advice. These differences aside, our criteria for inclusion of national assessment systems overlap with those employed in other comparisons of performance-based funding systems:

  • Research must be the object of assessment.

  • Evaluations focusing only on the quality of degree programmes or teaching are excluded.

  • Research assessment must be ex post. Evaluations of research grant proposals for project or programme funding are ex ante evaluations, and are excluded.

  • Research outputs must be evaluated in some way. Systems that allocate funding based only on external research funding or PhD enrolment numbers are excluded.

  • It must be a national system. University evaluations of their own research standing, even if used to inform internal funding distribution, are excluded.

(adapted from Hicks, 2012)

This study also extends the literature comparing national systems by being the first to incorporate a dynamic, longitudinal perspective.

In developing and applying this new typology, we were motivated by three research questions:

  1. What characteristics differentiate national research assessment and funding systems, and how can these be categorized and compared?

  2. What patterns of change can be observed across national research assessment and funding systems over the period 2010-2024?

  3. How might agendas of assessment reform play out over the next 5-10 years?

As in earlier comparative studies (e.g. Hicks 2012; Kolarz et al. 2019; Zacharewicz et al. 2019), we employ a desk-based, analytical review method, whereby the typology was co-developed with country-specific experts, comprising a mix of academic researchers and senior staff within the funding and evaluation agencies involved in AGORRA. The collaborative approach involved the core AGORRA research team holding multiple exchanges with experts across the life-cycle of the project: from initial study design, to providing provisional insights into their respective national systems, to applying and refining early drafts of the typology, and offering iterative feedback for improvement. Experts also shared up-to-date information on recent changes to their systems, contributed to the paper’s final draft, and to the accompanying Atlas of Assessment observatory. In recognition of these extensive and multi-layered contributions, country-specific experts are listed as co-authors of this paper.

Our study examines thirteen countries’ national research assessment and funding systems: Argentina, Australia, Brazil, Chile, China, Colombia, India, Italy, Mexico, Netherlands, Norway, Poland, and the United Kingdom (UK). This sample includes seven countries from the Global South (often under-represented in other studies). Existing comparative literature on national research assessment systems is focused more heavily on peer-review oriented, organisation-level evaluations, which are more common in OECD and EU countries – somewhat overlooking the individual-level systems that are more prevalent in regions like Latin America (Vasen et al., 2023). By including less-studied systems such as those of India, China, and five Latin American countries, our study captures characteristics of several under-mapped assessment systems. A short summary of the thirteen countries’ respective systems can be found in the Appendix.

The next section addresses our first research question on characteristics, categorizations, and comparisons of national systems by introducing our typology as applied to thirteen countries. We then present a theoretical framework for observing system changes over time by distinguishing between three major performance paradigms in the development of research assessment and funding over the last forty years. We draw on insights from the typology and conceptual framework to analyse thirteen national systems and to inform our concluding exploration of future trajectories in research assessment over the next five-to-ten years. Taken together, the typology and conceptual framework offer a shared language for understanding the diverse landscape of research assessment systems and the various transformations they are undergoing, enabling sophisticated cross-country comparisons and mutual learning.

1.1 Typology of national research assessment and funding systems

Our typology is separated into four core aspects to categorize major differences and four additional aspects to support closer examination and assessment of the systems. Examples from thirteen national systems are given to illustrate each aspect of the typology. National systems may be complex, combining different assessment schemes for different purposes at different levels of aggregation. Each scheme may be described differently according to the typology. A country may therefore appear more than once to exemplify one aspect.

The four core aspects are:

1. Assigned Purpose

Systems may differ according to their primary purpose:

a. Funding allocation and reputation

b. Accountability

c. Organisational learning and strategic development

d. Statistics and overview of research activity

e. Promotion of individual researchers

f. Accreditation

g. Other

Where these purposes are combined, the relative importance of each can be weighted.

Examples: Systems may differ in terms of their primary assigned purpose. Accountability – the idea of holding research performers accountable for their use of public funds and the results of research – is a frequently recurring purpose across our sample of systems. Many countries’ systems combine accountability with other objectives like funding allocation and reputation building (funding allocation and reputation). Some, like the UK’s Research Excellence Framework (REF) and Italy’s Evaluation of Research Quality (VQR), directly link evaluation results to funding outcomes, while others, such as Chile’s National Accreditation Commission, place more emphasis on the reputational consequences of research evaluation results. This combination of purposes extends to national systems for individual assessment and promotion in Argentina, China, Colombia, and Mexico. In Latin America such systems routinely assess “the individual performance of academics based on their academic activities and outputs and assigns them a ‘researcher category’, which carries prestige and, in many cases, additional monthly rewards” (Vasen et al., 2023, 244). Indicator-based systems, like those of Norway and India, also incorporate funding and reputational elements to varying degrees (with India’s NIRF characterized by low reliance on funding and high emphasis on reputation). By contrast, a smaller number of systems, exemplified by the Dutch Strategy Evaluation Protocol, Argentinian PEI and Norwegian Disciplinary Evaluation, prioritize organisational learning and strategic development (alongside accountability in the Dutch case).

2. Unit of assessment and scope

Systems may collect information about and assess:

a. Disciplines across organisations

b. The organisation as a whole

c. Units within the organisation/Research groups

d. Individual researchers

The scope may include all eligible candidates representing the unit of assessment or a certain selection among them. A unit of assessment is often a university department and may represent a discipline at the same time. The level of assessment may differ from the level of funding. Both need to be categorized. As an example, the United Kingdom assesses the quality of research of units of assessment within the organisations while the outcome influences the funding of the organisation as a whole.

More than one unit may be addressed in the same system. Italy combines a) and b). Norway combines a), b), and c) in the system for disciplinary evaluation. In China, the word ‘Double’ in the system called ‘Double First-Class Evaluation’ means that it combines university-level and disciplinary-level assessments. A country may have more than one system, each of them addressing different units of assessment and purposes, e.g. Argentina, China, Colombia, and Norway.

3. Focus of the assessment

Systems may focus on and collect evidence on different aspects of research performance:

a. Scholarly outputs

b. Scientific impact (citations)

c. Societal interaction (collaboration and co-creation, public engagement, impact cases, technology transfer)

d. Competitive grants

e. Organisational performance

f. Research culture

g. Performance of individuals

h. Other

Examples: The UK initially only focused on scholarly outputs and environment. Societal interaction (impact cases) and organisational performance (more weight on environment) have been included in the two most recent exercises in 2014 and 2021, and research culture (as an expansion of environment assessment) will be included in 2029. Revenues from competitive grants are often an indicator of organisational performance in indicator-based systems. Norway’s and Poland’s funding systems are examples. Argentina, China, Colombia, and Mexico all have specific systems for assessing the performance of individuals.

4. Effects on funding and reputation

Systems are often discussed as having intended and unintended effects. Dahler-Larsen (2014) points out that systems always have ‘constitutive effects’ just by being implemented.

By their official aims and practical implications, systems may determine:

a. Funding and reputation

b. Only reputation

c. Other significant effects

The relative importance – and the strength of the influences on funding and reputation – of each can be weighted (Strong-Medium-Weak) by users of the typology.

Examples: Most national systems influence both funding and reputation. Examples where the assessments only influence reputation are Chile, Colombia (2 of 3 systems), India, the Netherlands, and Norway (1 of 2 systems). However, reputation may in turn indirectly influence funding. The ERA system in Australia initially influenced both funding and reputation. After the effect on funding was removed, reputation was still experienced as important for the universities.

Significant effects other than funding and reputation may include the influence on (and purpose of) strategic development and learning at the local level. This is the case in Chile, Colombia (1 of 3 systems), the Netherlands and Norway (1 of 2 systems). Systems with strong influence on funding and reputation will generally also affect strategic development. Individual-level assessment systems may influence careers, salaries, and resources for performing research. Argentina, Brazil, China, Colombia, and Mexico have national systems for career assessment. Accreditation can be another effect: the outcome of the assessment may influence the right of an institution to grant certain degrees, provide certain courses, or establish new professorships. Poland is an example.

Four additional dimensions of the typology, which support further examination of national systems, now follow.

5. Methods

The national systems often rely on expert panels informed by peer review and by statistics and indicators, but they may differ in how peer-review and/or expert advice are organized and to what degree the assessment is informed by data and indicators. The types of indicators and their data sources may also differ. The relative importance of each specific method can be weighted.

Examples: The balance between quantitative performance indicators and qualitative peer review remains a persistent tension in national assessment and funding systems. Many evaluation-based systems aspire to operate according to principles of ‘informed peer review’ whereby quantitative indicators inform – but do not replace – expert assessments (Butler, 2007). Some systems differentiate methods by field: in Australia’s ERA and Italy’s VQR, social sciences and humanities have been assessed using peer review, with STEM and other disciplinary panels more likely to use bibliometrics. The ERA has since ceased, and the role of bibliometrics in the current VQR has been reduced. The UK’s REF allows expert panels some discretion in deciding to use bibliometric measures, but expert review remains the primary means of assessing outputs.

Indicator-based systems offer standardized approaches, exemplified by the Norwegian Indicator, which serves as a nationwide system for partly allocating resources based on a comprehensive publication database, a publication indicator, and a performance-based funding model (Sivertsen, 2018). Similarly, Poland’s Parametric Exercise (EJDN) aggregates points on a range of quantitative performance criteria, including bibliometrics, PhD graduations, and educational indicators, which inform core funding distribution and may affect accreditations.

The use of algorithms and formulas is not limited to indicator-based systems. In Italy’s VQR, panel scores given to units of assessment feed into competitive performance rankings and formula-based distribution of research funding.
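In the simplest case, formula-based distribution of this kind translates aggregated points or panel scores into proportional shares of a fixed budget. The short sketch below (in Python, with entirely hypothetical figures and institution names, and not the formula of any specific national system) illustrates this generic logic only:

def allocate(budget: float, points: dict) -> dict:
    # Distribute a fixed budget in proportion to each institution's points or scores
    total = sum(points.values())
    return {name: budget * p / total for name, p in points.items()}

# Hypothetical example: three institutions sharing a budget of 100 million
shares = allocate(100_000_000, {"University A": 1200.0, "University B": 800.0, "University C": 500.0})

In practice, national formulas combine several weighted indicators and may include field adjustments or stability mechanisms, so this is a deliberately simplified illustration.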

Given their primary assigned purposes of generating strategic advice to research performing units, indicator-informed, peer review-based evaluations like the Dutch Strategy Evaluation Protocol and the Norwegian disciplinary evaluations work differently. There, expert panels use peer review to deliver narrative reports on performance and ongoing organisational improvements.

Pressures to ensure appropriate uses of quantitative indicators have been heightened in recent years by the rise of ‘responsible metrics’ as a trans-national professional reform movement across research policy and universities (discussed below in Section 1.2).

6. Type of performance-based institutional funding

Systems that affect institutional funding directly may appear in three main types: evaluation-based funding (the use of peer review and expert panels), indicator-based funding (direct use of performance indicators), and funding contingent on performance agreements between the funder and the individual research organisations.

Examples: Among the thirteen countries included in this overview only Norway and Poland have indicator-based organisational funding systems, and none have organisational funding based on performance agreements. These types are however more widespread than our sample indicates (Sivertsen, 2023). Indicators may also have an important role in informing evaluation-based funding systems. This is the case in Argentina, Chile, China, Italy, and Mexico. The expert panels in Australia’s ERA used to be strongly informed by supporting contextual indicators in select disciplines (e.g. STEM).

7. Formative versus summative

Different purposes may result in different main directions for the assessment. A formative evaluation learns from the past and looks forward, serving organisational learning and strategic development. A summative evaluation looks at past performance, checks whether goals or expectations have been reached, and serves decisions and/or resource allocation. Where these directions are combined, the relative importance of each can be weighted according to expert judgement.

Examples: Systems tend to be summative if they rely mainly on empirical evidence for the assessment of past performance. Also, if they determine funding, the outcome of the assessment has to be translated into a quantitative formula. With the exception of Chile, the Netherlands, one of Australia’s (now ceased) systems, and one of the Norwegian systems, most systems covered by our overview will tend to have a summative direction. However, one should not neglect the possible combination of summative and formative purposes, particularly if the effect on funding and reputation is weak (Sivertsen, 2023). Both purposes may appear in official statements. An example is the three purposes stated for the UK’s REF 2029:

  1. Inform the allocation of block-grant research funding to HEIs based on research quality;

  2. provide accountability for public investment in research and produce evidence of the benefits of this investment;

  3. provide insights into the health of research in HEIs in the UK (REF, 2029)

The first purpose can be characterized as summative and the third as formative while the second may express both purposes. The example shows that this classification can in some cases be more a matter of emphasis than a strict binary distinction.

8. Governance

Governance is about how the systems are designed, implemented, and organized on a continuing basis with distributed responsibilities. Systems may differ according to: the role of the agency responsible for operating the system; the transparency and predictability of the results; the degree to which a system is made mandatory, voluntary, or incentivized.

Governing agencies

Systems develop within specific national and historical contexts, with varying degrees of political and administrative centralisation. They tend to be under formal control of a central government agency or arms-length body like a research council, but the role of these agencies varies across countries, particularly in their relationship to government and the academic community. The involvement of and collaboration with the national academic community in the design, implementation, management and evaluation of systems also varies.

Examples: Some systems are directly controlled by government agencies, such as in Poland, India, and Colombia, where ministries of science, education, or equivalent administer assessments.

However, a notable trend over the past three decades has been the growth of intermediary organisations like funding agencies and research councils (Braun, 1998, Braun and Guston, 2003) overseeing ex-post assessment and funding, as seen in Brazil, Norway, the UK, and the Netherlands.

The extent of academic community involvement in system design also varies significantly between nations. Countries like the UK and the Netherlands regularly consult with academics, research managers, and sector groups when making periodic adjustments to their systems, while others, such as Colombia and India, offer limited opportunities for input. Even within countries, the level of academic consultation can differ across systems. The presence or absence of meta-evaluations of assessment and funding systems serves as another indicator of accountability to the research community and other stakeholders. At present, few countries routinely commission independent meta-evaluations and make them publicly available. The UK REF (Stern, 2016, Digital Science, 2015) and the Italian VQR (Expert Review Panel, n.d., Galderisi et al., 2019), however, have had meta-evaluation reviews commissioned periodically.

The transparency and predictability of the methods and results (High-Medium-Low)

The transparency and predictability of methods and results in national assessment systems are variously influenced by the choice of method, purpose of assessment, and the consequences of results for funding and reputation.

Examples: Indicator-based systems, due to their standardized nature, theoretically offer higher reproducibility and transparency, as exemplified by the Norwegian Indicator. However, this principle is not universally upheld. Peer review-based evaluations, being less standardized, generally offer lower reproducibility and transparency. The purpose of the evaluation plays a role, as evaluations linked to funding may intentionally limit transparency due to potential litigation risks, as seen in the UK REF’s closed-door panel discussions, where panel scores for individual items are destroyed and not made public in accordance with confidentiality and data protection principles. Peer review evaluations oriented towards delivering narrative reports and strategic advice, like the Dutch Strategy Evaluation Protocol (SEP), exhibit low transparency in how results are produced and used. A distinction can be made here between the public availability of the overall results, versus how and to what degree feedback is utilized by the evaluated units (Gras, 2022).

Mandatory vs. voluntary vs. incentivized participation

Finally, the extent to which assessment exercises are mandatory, incentivized or voluntary varies across national assessment systems. The Brazilian CAPES, the Dutch SEP and the Italian VQR are examples of systems for which participation has effectively been mandatory through regulation or laws. Systems in which research performing organisations or researchers are incentivized to participate but not legally mandated (through the ‘carrot’ of financial or reputational rewards or potential negative costs of not participating) include Argentina’s PRINUAR, the UK REF, and China’s national assessment and selection systems for elite individual researchers. Other systems where participation is incentivized include India’s NIRF, Chile’s CNA, Colombia’s High-Quality Accreditation Model, and Mexico’s SNI. In some nations with multiple ex-post assessment systems, certain systems may be mandatory, while others are incentivized or voluntary. For example, the Norwegian indicator-based funding system for research in the higher education sector is mandatory for universities until 2025 (and still mandatory in hospitals and public research institutes), but subject-specific evaluations are only normatively expected (and thus classed as voluntary in our typology); and in China, the National Disciplinary Evaluation is voluntary, but the Double-First Class Evaluation is incentivized.
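To illustrate how a single scheme might be recorded in structured form against the typology’s core aspects, the sketch below (in Python) gives one possible encoding. This is purely illustrative: the class names, fields, and weightings are our own assumptions for the purpose of exposition, not the data model of the Atlas of Assessment, and the example record is a rough rendering of the UK REF based only on the descriptions above.

from dataclasses import dataclass
from enum import Enum

class Purpose(Enum):
    # Core aspect 1: assigned purpose
    FUNDING_AND_REPUTATION = "funding allocation and reputation"
    ACCOUNTABILITY = "accountability"
    LEARNING_AND_STRATEGY = "organisational learning and strategic development"
    STATISTICS = "statistics and overview of research activity"
    INDIVIDUAL_PROMOTION = "promotion of individual researchers"
    ACCREDITATION = "accreditation"
    OTHER = "other"

class Weight(Enum):
    # Strong-Medium-Weak weighting used where purposes or effects are combined
    STRONG = "strong"
    MEDIUM = "medium"
    WEAK = "weak"

@dataclass
class AssessmentScheme:
    # One assessment scheme within a national system, described by the four core aspects
    country: str
    name: str
    purposes: dict             # aspect 1: assigned purpose(s), weighted
    units_of_assessment: list  # aspect 2: unit of assessment and scope
    focus: list                # aspect 3: focus of the assessment
    effects: dict              # aspect 4: effects on funding and reputation, weighted

# Illustrative record only; the weights are assumptions, not official classifications
ref = AssessmentScheme(
    country="United Kingdom",
    name="Research Excellence Framework (REF)",
    purposes={Purpose.FUNDING_AND_REPUTATION: Weight.STRONG, Purpose.ACCOUNTABILITY: Weight.MEDIUM},
    units_of_assessment=["disciplinary units of assessment within organisations"],
    focus=["scholarly outputs", "societal interaction (impact cases)", "organisational performance"],
    effects={"funding": Weight.STRONG, "reputation": Weight.STRONG},
)

A country with several schemes would simply be described by several such records, consistent with the observation above that a country may appear more than once in the typology.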

1.2 A framework for understanding ongoing developments in national research assessment and funding systems

Our multi-case comparison of the evolution of national assessment and funding systems between 2010 and 2024, combined with insights from earlier comparative studies, reveals the following broad pattern: the purposes, main focuses and methods of research assessment have changed, albeit to varying degrees globally, over the last four decades. Changes can be mapped according to three broad paradigms of research performance popularized within research policy at various points in time:

  • The performance paradigm of professional-disciplinary evaluation.

  • The performance paradigm of excellence and competition.

  • The performance paradigm of responsible research assessment.

Originating from Thomas Kuhn, and developed for policy analysis by Peter Hall, the term paradigm can be defined as “a shared model of reality that guides policy makers’ problem-solving activities” (Carson et al., 2009, 18). Crucially, one performance paradigm does not straightforwardly replace another: in practice, there is ‘layering’ (Aagaard, 2017, Capano, 2023) of different aspects of these paradigmatic ideas, which emerge and evolve in context specific ways and shape assessment systems in different ways over time.

In giving our paradigms labels such as ‘excellence’ and ‘responsible research assessment’, we acknowledge that we are adopting normative ‘member categories’ used by actors within the field. We nevertheless consider such labels to be useful analytical tools for naming and interpreting broad developments in assessment systems. Such labels typically emerge once developments are already underway, serving to capture a sense of transformation within research assessment. At the same time, these labels are performative: they assist actors in agenda building and help to justify, validate, or strengthen certain positions and decisions (Rip, 2000). We now elaborate on each paradigm.

The first paradigm: professional-disciplinary evaluation

The first paradigm of research performance, particularly dominant before the advent of formalized, external assessment, was largely driven by the professional-disciplinary tradition (Elzinga, 2012). Here academic departments operated with relatively high levels of autonomy, relying on internal disciplinary standards to guide decision-making around hiring, promotion and internal allocation of resources (Whitley, 2000).

Up until the 2010s, much had been written about changes in the policy rationales, design, and methods of national public science systems. In 1994, John Ziman described the end of an era of ‘big bang’ investment in public sciences that had defined the post-war period for publicly funded research in many Western countries until the 1970s (Ziman, 1994). For Ziman, such states were entering a ‘dynamic steady state’ era of science policymaking, defined by moderate increases, stabilization, or cuts to research funding, with economic crises, ideological policy shifts, and changing cultural conceptions of science and technology making politicians generally less willing to continue meeting demands for levels of funding enjoyed in the rapid expansionist post-war period. In Ziman’s book, the clearest manifestation of this new incrementalist regime was the then-recent emergence of the world’s first periodic ex-post national assessment and funding system, the Research Assessment Exercise (RAE) in the UK. Together with decreases in block grant funding in favour of project funding and increased setting of priorities around public policy goals, the RAE promised to help rationally administer allocation of dwindling funds, provide research quality assurance mechanisms, ensure accountability for how research performers were spending scarce public funds, and promote the UK’s global competitiveness in research. Even before the first version of the UK’s exercise was conducted in 1986, national research policymakers in the UK and elsewhere had become increasingly concerned with the competitiveness and economic performance of their national ‘knowledge economies’, influenced in part by evaluation reports published by international organisations like the OECD, World Bank, and the European Union (Wagner et al., 2001, Henriques and Larédo, 2013, Lepori et al., 2007). The faith that increasing investment would improve the quality of research and the economic and social benefits that flow from it was giving way to a belief that funds should best be invested strategically in promising areas of science and technology.

This period was particularly advantageous and attractive for certain research domains like biomedical and health sciences, with others (e.g. social sciences and humanities) experiencing stressful estrangement and cutbacks. In the UK and many other national systems, a new performance paradigm in research policy was thus beginning. Despite such developments, the 1970s and 1980s were for many academics still largely an era where research assessment was dominated by disciplinary standards within individual fields, rather than explicit consideration of interdisciplinary or societal needs (Elzinga, 2012, Whitley, 2000).

The second paradigm: excellence and competition

From the early 1990s onwards came a growing performance paradigm with a strong focus on ‘excellence’ in research assessment and funding. The UK’s RAE system would later even change its name to the Research Excellence Framework (REF). In other countries, excellence also became a common denominator that could include all areas of research (regardless of contribution to economic growth) and bridge with different societal expectations. However, excellence also turned out to be a strong instrument of prioritization: we will only fund the best, and the funded will get more (Scholten et al., 2021). Competition was implemented not only within areas of research, but across them as well. The focus on measuring research performance coincided with the increasing availability of bibliometric information and its usage in performance-based funding.

The concept of excellence was initially seen as a unifying standard that could encompass all research fields and align with societal expectations. However, in practice, such approaches were not well aligned with research and evaluation traditions of certain fields, including much of social sciences and humanities. Within academic literatures, the rise of the excellence regime has been attributed to wider currents of globalization (pressures for competition between national knowledge economies) and New Public Management (less centralized command-and-control government combined with periodic accounting and auditing, performance and outcome measurement, and promises of increased efficiency through competition) (Elzinga 2012, Hicks 2012). Periodic assessments, performance-based funding, quantitative performance indicators, and increasing implementation of public policy goals into evaluation criteria of universities and funding agencies are characteristic of such developments (Rip, 2004, Whitley, 2019). A notable example of the ascendance of the excellence performance paradigm can be found in the official research policy aim at the European level to establish ‘more effective national research systems – including increased competition within national borders and sustained or greater investment in research’, as expressed by the European Commission (2012) in a communication with guidelines for A Reinforced European Research Area Partnership for Excellence and Growth.

As implied in the concept of ‘layering’, the excellence paradigm does not replace the disciplinary paradigm per se, but builds on and reconfigures it. Thus, systems that promote accountability and competition, including those with the word ‘Excellence’ in their title (like Excellence in Research Australia and the Research Excellence Framework), have continued to uphold certain practices synonymous with the disciplinary paradigm (e.g. appointing expert peer review panels to administer the evaluations along disciplinary lines). Likewise, systems that utilize bibliometrics and rankings often transform peer reviewed scholarly outputs into quantifiable data points for aggregation (Osterloh and Frei, 2014).

The third paradigm: responsible research assessment (RRA)

The evolution of the performance paradigm as manifest in national research assessment and funding systems is arguably entering a third stage (in some countries), with the emergence of the ‘responsible research assessment’ (RRA) agenda (Curry et al., 2020, Curry et al., 2022, IAP-GYA-ISC, 2023, Benamara et al., 2024). These calls for policy changes can be read as a response to perceived ‘public value failures’ in existing paradigms of research assessment (Bozeman and Sarewitz, 2011), featuring discontent both towards the traditional professional disciplinary evaluation paradigm (for excessive self-referentiality, specialization and esotericism of the ‘ivory tower’, manifest in perceived overemphasis on scientific publications and citation impact indicators) and towards the excellence paradigm (for excessive emphasis on competition and selectivity). Critics have held excellence-oriented national assessment and funding systems responsible for creating unintended consequences in research, such as hyper-competition and poor academic work cultures, task reduction, goal displacement, and deterring inter- and trans-disciplinarity (de Rijcke et al., 2016).

RRA attempts to shift focus towards promoting a range of priorities, including equity, diversity, and inclusion (EDI), improving research culture, open science, promoting collaboration, and addressing societal challenges through impact-oriented research and alignment with public policy goals. The convergence of various science reform movements from the mid-2010s onwards has generated more visible, organised momentum for research assessment reform. Trans-national and national initiatives have provided visibility and engagement platforms for this broad agenda. Perhaps the best-known reform initiative, the Coalition for Advancing Research Assessment (CoARA), advocates for research assessments of various kinds to begin recognizing alternative research quality criteria around open science and research integrity, equity and inclusion, and research culture. CoARA – a coalition of research funding organisations, research performing organisations, and assessment authorities – builds on calls for responsible uses of quantitative indicators, advanced for instance by the San Francisco Declaration on Research Assessment (DORA, 2013), Leiden Manifesto (Hicks et al., 2015), and Metric Tide (Wilsdon, 2016), which challenge the power of journal-based indicators and reposition bibliometric tools as means of supporting rather than replacing expert qualitative peer judgment (Rushforth and Hammarfelt, 2023). Similarly, the Global Research Council working group on research assessment recently set out a framework of eleven dimensions of responsible research assessment (Benamara et al., 2024). CoARA and GRC Dimensions incorporate but go beyond calls for responsible metrics, by outlining a number of hitherto marginalized qualities they would like to see rewarded and recognized in various levels of research assessments (CoARA recommends its commitments and principles be enacted at individual, proposal, and organisational assessment levels). Marginalized quality criteria they promote include professional service work, open science contributions, research integrity, equity and inclusion, and societal engagement. The RRA umbrella thus brings together existing and emerging ideas, practices, and criteria, seen as promising alternatives to shortcomings of the disciplinary and excellence paradigms. This may prompt greater awareness of alternative possibilities – even if not all the practices or ideas are new per se.

Twelve years after the European Commission (2012) had published its guidelines for A Reinforced European Research Area Partnership for Excellence and Growth, arguably a high watermark of the excellence paradigm, the same organisation in 2024 published an Action Plan to implement the Agreement on Reforming Research Assessment, in collaboration with CoARA (European Commission, 2024). The plan emphasizes qualitative assessment with the avoidance of inappropriate metrics and clearly shows that the general trends in research assessment have entered the early stages of a third paradigm in which broader criteria for research assessment are promoted.

Can the national systems be expected to change at the same time and in the same direction?

CoARA is presently an example of how research performing and research funding organisations may collaborate across nations to change the criteria and procedures for research assessment in the same direction. However, CoARA mainly focuses on individual-level assessments for recruitment, promotion and external project funding. Sivertsen and Rushforth have argued that CoARA’s Agreement on Reform of Research Assessment does not approach organisational assessment and funding with the same clear understanding and guidelines (Sivertsen and Rushforth, 2024). The reason may be that national systems for assessment and funding of research organisations are more deeply rooted in ‘larger’ and more divergent national traditions of policies for spending and organisation across the public R&D sector. The question, then, is to what degree we can expect countries to change their national assessment and funding systems in the same direction in accommodating the recent influence of the RRA performance paradigm.

Based on experiences with the designs, implementations, developments, and discussions of performance-based organisational funding systems in 26 countries, Sivertsen explains why the systems may differ and change at different rates with different influences:

Although some systems may seem similar across countries, they are never the same and they are modified all the time. PBFS [performance-based funding systems] differ because they are anchored in the local traditions and mechanisms of state budgeting and embedded in the local negotiations about priorities and developments in the higher education sector. They are dynamic because they are continuously contested and thereby often adjusted.

Countries also mutually learn from each other and inspire changes in their PBFS. The systems are conservative as well. Once implemented, they become games with rules and specific terminologies and infrastructures that are difficult to change. Also, they need to be predictable because they influence budgets and the spending of tax revenues on the funding side. There is a need to ensure some stability of budgets at the institutions. (Sivertsen, 2023, 90)

A further complication is that even when countries are influenced by the same performance paradigm, ‘the final result is not convergence but different interpretations of the same general recipe’ (Capano, 2023). As we will now show, layering of the nascent third paradigm – RRA – plays out very differently and to varying degrees in national systems across our sample of countries.

Preliminary observations of an emerging third paradigm of responsible research assessment

In highlighting junctures and patterns of change across our sample of countries, 2010-2024, we argue: a) different aspects of the three performance paradigms are ‘layered’ onto one another to greater or lesser degrees, meaning wholescale replacement of one by another does not occur (cf. Aagaard, 2017, Capano, 2023), and b) the extent to which a third RRA paradigm has begun to affect national assessment and funding systems (or is even visible at all) varies considerably across our featured countries.

Early manifestations of an emerging third phase came with the introduction of societal contributions as a new criterion, in Australia (Williams and Grant, 2018) and most famously in the UK’s REF 2014 (Martin, 2011). This showed dissatisfaction with the emphasis on research excellence defined largely in disciplinary terms and can be read as a move to increase the importance of social and economic goals. The REF’s adoption of societal impact also inspired other countries, such as the Netherlands, Brazil, and Poland, to utilize similar criteria, albeit in distinctive ways. The Netherlands and Norway, like the UK, incorporated societal impact or relevance into existing systems, and in Poland, a societal impact criterion has been added to the EJDN evaluation system, one of many changes made by the 2018 Law on Higher Education (Wróblewska, 2025). Australia set up the EI as a new system, separate from the already existing ERA (which continued to serve the rationale of promoting disciplinary-based excellence), with the EI dedicated to research engagement and societal impact. The EI ran only once, in 2018. Societal contributions are arguably the most prominently developed and formally integrated dimensions of the RRA umbrella to feature across national assessment and funding systems to date, with a growing (though not universal) number of systems accommodating some variation of this criterion.

Other RRA components, like open science, multilingualism, and responsible metrics, appear to have had varying levels of visible impact on designs and methods of systems within our sample. Though Poland’s EJDN has incorporated societal impact, it has so far not accommodated additional items that have emerged under the RRA umbrella. Ongoing sectoral discussions in Australia suggest widespread awareness of the RRA agenda internationally, although the discontinuation of the ERA in 2023 and uncertainties over what will succeed it mean it is presently unclear how exactly the RRA agenda might play out in future national assessment and funding systems. While quantitative, indicator-based systems and excellence criteria feature prominently in many Latin American research assessment systems, there have also been efforts to diversify assessment approaches.

Efforts to counter English-language bias and support publishing in major regional languages like Spanish and Portuguese have grown, mainly by integrating regional indexes like SciELO, RedALyC, and Latindex into national assessment systems (Beigel, 2025). In 2014, Argentina’s CONICET approved a resolution for the social sciences and humanities equating journals indexed in regional databases with those in international indexes like Web of Science and Scopus. CONICET’s researcher career system also has qualitative, narrative components integrated into its evaluations, alongside bibliometrics and interviews. As mentioned, Brazil’s 2021–2024 CAPES evaluation cycle also integrated ‘impact of society’ as an explicit criterion. The above changes have been promoted and amplified by regional advocacy networks such as CLACSO, whose 2022 principles sought to raise awareness, adapt, and extend global frameworks like DORA and the Helsinki Initiative on Multilingualism in a Latin American and Caribbean context (CLACSO, 2022).

In 2018, five ministries and institutions in China issued a special action, calling on Chinese science to address concerns over the dominance of ‘the four onlys’ (‘only papers, only titles, only educational background, and only awards’) as the primary criteria for research evaluation and talent recognition across the system (Xiaoxuan and Fang, 2020). This coincided with growing critiques within national research policy of ‘Science Citation Index worship’, echoing statements from North America and Europe, such as the DORA Declaration and Leiden Manifesto (Zhang and Sivertsen, 2020). Even before the special action by the five ministries and institutions began, some well-known research institutes had already started moving away from largely quantitative, output-based definitions of academic performance, with the Chinese Academy of Sciences (CAS) among those taking a leading role in the reform. From 2011 onward, the CAS Research Institute Evaluation was reformed into an evaluation system oriented towards major achievements and outputs, primarily using qualitative peer review, with quantitative indicators serving only as supporting information.

This shift was the culmination of longer-term efforts by CAS to increase the role of expert judgment and counter-balance systemic effects of the metrics-dominated approach its Research Institute Evaluation system had initially adopted in the early 1990s (Xiaoxuan and Fang, 2020).

One feature of an emerging RRA paradigm is a shift in assessment methods from an exclusive focus on outputs (heavily emphasized in Hicks’s 2012 definition of PRFS) to include more process-oriented indicators and narrative-based methods of assessing research performance. Of the performance-based funding systems we studied, the UK’s REF has experimented the most so far with moves away from traditional output and results-based forms of evaluation. This is particularly evident in the introduction of ‘institutional-level environment statements’ in REF 2021, which continued a shift in emphasis towards process-oriented indicators, rather than only research outputs and results, that started in RAE 2008. The environment statement required institutions to submit strategic statements that focused on, for instance, support for inclusive research cultures for staff and research students, and steps to ensure sustainability of the unit of assessment through investing in people and infrastructure (Inglis et al., 2024). The elevation of process and input-oriented factors is a qualitatively new development for performance-based funding systems. Where process indicators were mentioned at all in earlier comparative studies, they were treated as unrelated to research performance (e.g. Hicks’s 2012 account explicitly positioned process indicators as separate from this type of system). The growing importance of more process-oriented indicators over the past decade and a half in the UK REF has gone largely under the radar in existing comparative studies. This trend is being intensified still further via the expanded weighting of ‘People, Culture, and Environment’ in REF 2029 and the ongoing development of new indicators. While earlier accounts stating that process indicators of research management and infrastructure fall outside the scope of assessing research performance reflected prevailing views at the time they were written, an evolving cultural framing of assessment is seeking to reposition research culture and process indicators as important elements in assessing research performance.

Open science – itself a diverse umbrella term (UNESCO, 2021) – has begun in various ways to feature as a requirement in some national assessment systems. In Latin America, SciELO, RedALyC, and Latindex’s infrastructures have long supported integration of Diamond Open Access journals (free to read and publish in), facilitating their inclusion in formal research assessment processes. These indexing systems have also implemented quality control mechanisms to screen so-called predatory or spurious journals (Beigel, 2025). Meanwhile, open access publishing was prioritized in Colombia’s 2024 Research Groups Assessment Model. The UK REF 2021 introduced an open access mandate as a condition for eligibility of some output types, as did the most recent version of the Italian VQR 2020-2024, but not all countries have such requirements yet. Furthermore, within the period covered, none of the studied countries committed to basing evaluations of performance in their national assessment and funding systems on open research information, as advocated by the Barcelona Declaration on Open Research Information (Barcelona Declaration 2024). Elsewhere under the open science umbrella, the Italian VQR 2024’s assessment is based on three criteria: originality, methodology, and the impact of the submitted research products. Unlike in past VQR exercises, the methodological criterion has been re-defined to encourage evaluators to pay attention not only to the rigor of research steps but also to aspects such as reproducibility, transparency, accessibility and the reuse of data (when applicable) in the publications. This again reflects national assessment systems’ gradual responsiveness to pressures from trans-national science reform movements on issues of open science, research integrity, and metascience.

Several national assessment system authorities have furthermore signed declarations such as DORA, to commit to responsible uses of (particularly quantitative) research indicators. These include NWO and KNAW (overseeing the Dutch SEP) and UKRI (of which Research England – one of the UK’s four higher education funding bodies that govern the REF – is a council), while ANVUR (overseeing the Italian VQR) is a signatory of CoARA. In 2023, Chile’s National Research and Development Agency (ANID) became the first Latin American institution to sign up to the CoARA agreement. Typically, there is no single agency that oversees all research assessments (including ex-ante and ex-post) in an entire national context, and voluntary action initiatives like DORA and CoARA can be signed by different actors in a given national research context. In Argentina, for instance, CONICET signed DORA in 2022, and while it is one of the most important agencies in the country, it is the only national agency responsible for overseeing a national assessment that has done so. Some universities also sign up to these agreements, without the national assessment system agency doing so, as is the case in Poland. It is at this point unclear how signing these voluntary action initiatives translates into concrete reforms of research assessment practices in a given national system, especially as there are not typically strict enforcement or compliance checks by the initiatives themselves. Of course, there are other types of assessment beyond national ex-post assessment systems (e.g. career or project selection assessments) where such initiatives can be taken up, though these are beyond the scope of this study.

Notable across the 2010-2024 period is that no new strategic advice-oriented assessment systems have been established, save for Australia’s EI (which was discontinued after running once) and the Norwegian Disciplinary Evaluation. For the latter, since 2011, one of the main tasks of the evaluation committees for biology and for medicine and health has been to deliver advice to the research council and the government. This ambition was followed up in the 2017 and 2018 evaluations of social sciences and humanities, and in the third round of STEM evaluations starting in 2021. Across the thirteen countries, though, the overall picture is one of consolidation of summative-oriented systems that do not explicitly deliver strategic advice.

Combinations of disciplinary and excellence-oriented ideas are visible in all cases, although how these are combined and play out in different systems varies considerably. Principles of excellence (such as competition through selective performance-based funding derived from evaluations of traditional research outputs and results) continue to be valorized. So far, efforts to mitigate or counter perceived dysfunctions in the disciplinary and excellence regimes have led to notable, visible changes in only some countries’ national assessment systems.

Possible developments over the next 5-10 years

How might elements of the Responsible Research Assessment (RRA) agenda translate differently, according to the methods, rationales, and focuses of the respective national assessment and funding systems in our sample? Generally, the principles and criteria of RRA may be more readily absorbed in formative, advisory assessment systems. In summative, competitive systems, by contrast, changes in assessment criteria can directly alter evaluation outcomes, which may affect institutional reputation or access to funding, making such changes more consequential and therefore more likely to be contested. Before recent adaptations of performance-based funding systems like the UK REF to include more process indicators, evaluation of research inputs and processes was typically the preserve of advisory-oriented assessments. As far back as 2009-2015, the Dutch SEP included ‘vitality and feasibility’ (e.g. staffing, prioritization and project management) as research performance criteria. By 2021, additional criteria for evaluating research performance in the SEP included: open science, PhD policy and training, academic culture, and human resources policy. Although incorporating RRA criteria into formative frameworks like the Dutch SEP and Norwegian Disciplinary Evaluation may be less contested, such frameworks are potentially less likely to serve as strong levers for inducing behaviour and culture change. ‘Strong’ systems, by contrast, seem more likely to shift priorities of the assessed, given the significant resource and reputational importance of scoring well in their outcomes (Whitley, 2007). Arguably the introduction of societal impact in REF 2014 has contributed to shifts (albeit unevenly) in the sector, in ways that more discretionary, advisory procedures could not. Though not without controversy, the inclusion of ‘People, Culture and Environment’ in REF 2029 will likely provide strong stimulus for organisations in the sector to take more seriously this particular interpretation of the RRA agenda.

One uncertainty with respect to RRA-oriented reforms in the context of national assessment and funding systems is the extent to which indicator-based systems might play a role in furthering this reform agenda. Ostensibly, statements such as CoARA’s Agreement may appear sceptical towards what our typology terms indicator-based funding systems (where standardized indicators are applied directly to measure performance). However, the potential for indicator-based systems to incorporate and promote certain elements of the RRA agenda, such as open science, team science, or peer review data, should not be overlooked (Sivertsen and Rushforth, 2024). Advances in bibliometric meta-data might, for instance, help indicator-based allocation schemes evaluate and monitor certain kinds of open access publishing activity, or track cross-sector or cross-disciplinary collaborations or research impacts. Caution is of course needed in employing any quantitative assessment system: data quality and reliability must be ensured, and unintended consequences monitored. However, the potential for advanced bibliometric methods and data to support the RRA agenda (rather than being its nemesis) is an important consideration meriting wider discussion and awareness-raising. This is particularly important for countries with established indicator-based national systems and little appetite to establish large-scale national peer review-based evaluations.

From where we sit now, it appears unlikely there will be a single direction of travel for national research assessment systems. Some systems may consolidate around large-scale competitive peer review-based systems, while others may explore alternatives (as hinted by Adams et al., 2022). Some policymakers may look to indicator-based systems as an option, while others could look towards advisory-oriented assessments or systems where accreditation is an important purpose (a recent report by Guillet (2025) calls for research culture and environment to become a major focus in the external quality assurance of research that accreditation bodies conduct in higher education institutions). Interest will no doubt grow in the potential for AI and data-driven assessment methods, especially in light of concerns regarding the administrative burden of large assessment exercises, though care should be taken not to succumb to hype or reproduce existing biases (Adams et al., 2022). An alternative to national research assessment and funding systems is performance agreements: a ‘dialogue-based funding’ model whereby governments or intermediaries negotiate targets with higher education and research organisations, with funding tied partially or fully to organisations then meeting the contractual targets (Salmi and Hauptman, 2006, 40, 58). Supporters of performance agreements argue they can support diversification of organisational profiles, enhance the sensitivity of performance criteria towards institutional missions and public values, generate more context-specific indicators, and strengthen trust between government and institutions (Jongbloed et al., 2020). This approach is not without potential drawbacks, though, with critics pointing to concerns over erosion of organisational autonomy, increased bureaucracy, and financial penalties for targets that are missed for reasons beyond an organisation’s control (Sivertsen, 2023).

Whichever models get selected, the pathways towards RRA reforms in national assessment systems are likely to be manifold and complex. Some systems may largely ignore or buffer RRA ideas altogether, some may shift gear and accelerate towards RRA, while others may begin to accept and enact these ideas and practices at a slower pace. Certain countries may experience ‘pressure from below’: in 2024, for instance, 36% (1,147 institutions) of the total organisational signatories of DORA’s Declaration were from Latin America. Equally, however, academic elites and policymakers in such regional contexts could close ranks and valorise the excellence paradigm. There may also be perceptions that the RRA paradigm is a largely European movement.

This scenario points towards a fundamental fork in the road in the coming 5-10 years: some national systems begin or continue to accommodate an RRA-oriented agenda, while others further consolidate along disciplinary-excellence lines. Such developments in national systems are of course not immune to larger system shocks: financial crises, austerity, pandemics, backlashes against universities, and geopolitical developments may yet shift the landscape for publicly funded research in very unpredictable directions. An emerging performance paradigm like RRA may benefit from such disruptions or be held back by these shifting fault-lines.

Conclusion

This paper has two objectives: first, to develop and apply a comprehensive typology of national research assessment and funding systems across thirteen countries (including some from the Global South); and second, to analyse changes in these systems from 2010 to 2024.

Our typology confirms the considerable diversity in national assessment and funding systems reported in existing comparative literature (Geuna and Martin, 2003; Whitley and Gläser, 2007; Hicks, 2012; Zacharewicz et al., 2019; Ochsner et al., 2021), reinforcing that there is no universal recipe. Our typology makes several important new contributions. First, it provides an up-to-date tool for comparing diverse systems that have evolved very differently along the three performance paradigms forming our conceptual framework, making it easier to compare systems that would otherwise be difficult to set side by side. Second, our study included countries with at least one of the following types of ex-post national assessment systems currently or recently in place: indicator-based funding; peer review linked to funding; peer review linked to organisational improvement; and/or individual-level national assessment systems. By expanding our inclusion criteria beyond performance-based funding systems to include, for instance, ex-post evaluation systems whose primary assigned purpose is generating strategic advice, we present a more comprehensive overview of the options available for administering research funding and quality assurance across national research spaces (cf. Sivertsen, 2023). Third, our sample allows us to shed further light on ex-post systems where the unit of assessment is individual-level research performance. These predominate in Latin America and have tended to be relatively under-studied, both in comparative analyses and in the research evaluation literature more generally (Vasen et al., 2023).

Fourth, our collaborative model has allowed our typology to draw on and synthesize an extensive range of information that otherwise can only be found across a large number of documents and in different languages.

Our longitudinal perspective has led us to theorize changes in the rationales and expectations surrounding national research assessment and funding systems from 2010 to 2024. This is a departure from the cross-sectional designs that typically inform comparisons of national assessment and funding systems. This approach led us to argue that, over time, three paradigms of performance assessment have shaped these systems to varying degrees: first the disciplinary paradigm (peer autonomy, internal disciplinary standards), then the excellence paradigm (competition, selectivity of funding, performance indicators), and most recently the nascent ‘responsible research assessment’ (RRA) paradigm (broadening assessment beyond traditional outputs, diversifying notions of research quality, widening public values). Importantly, we contend that national systems have not moved seamlessly from one phase to another: rather, paradigmatic ideas accumulate over time, creating ‘layering’ effects where elements of earlier paradigms persist alongside newer ones and combine to generate systems with complex, hybrid characteristics.

Our analysis also shows that the RRA agenda is a paradigm still in its early stages, featuring in some but not all national systems, meaning it is not yet a global trend. Non-linear and geographically diverse responses to RRA so far suggest a potential fork in the evolution of national assessment and funding systems could emerge in the coming five-to-ten years, where only some national systems accommodate and adapt to this paradigm in a pro-active fashion. Furthermore, where the RRA agenda is gathering momentum, it remains heterogeneous, still in its infancy, and context-dependent.

Our analysis leads us to caution against oversimplification when describing trends in national assessment systems. While systems have been discontinued in some countries, others have been introduced or are under consideration for the first time. In our view, the overall picture is largely one of consolidation of such systems under dynamic steady-state conditions, and of gradual, uneven changes in the rationales, designs, and focuses of assessments, rather than wholesale transformation or withdrawal. Looking to the near future, the key challenge for reforming and newly established systems alike will be selecting appropriate criteria, designs, and methods for assessing research performance, and successfully translating aspirational RRA principles into workable assessment practices. How this emerging paradigm gets translated into large-scale periodic peer review-based assessments, versus advisory assessments, indicator-based systems, or perhaps even national individual assessment systems or performance agreements, and how it interacts with existing ideas and practices of disciplinarity and excellence, are urgent policy questions demanding additional empirical and theoretical research.

Future research should extend our analysis to a broader range of countries not included in this study. Additionally, investigating the practical implementation and outcomes of RRA principles in various national contexts would provide valuable insights for policymakers, institutional leaders and researchers worldwide. It is essential that the evolution of these systems reflects each country’s unique strategic needs rather than uncritically pursuing a single ‘best practice’ model. In this respect, our typology and analysis of system changes offer valuable opportunities for cross-national mutual learning and further underline the importance of a comparative research agenda. Indeed, with the exception of the important studies cited throughout this paper, comparative perspectives are rare, and longitudinal comparative studies rarer still.

In conclusion, this study contributes a comparative lens and understanding of the complex landscape of national research assessment and funding systems. By providing a comprehensive typology and conceptual framework to compare systems’ evolutions longitudinally, we hope to inform more effective and context-appropriate policy decisions in the domain of research performance evaluation and funding allocation.

Endnote

[1]  https://researchonresearch.org/project/agorra/

Acknowledgements

This paper forms part of AGORRA: A Global Observatory of Responsible Research Assessment, a collaboration led by the Research on Research Institute (RoRI). RoRI’s second phase (2023–2027) is funded by an international consortium of partners, including: Australian Research Council (ARC); Canadian Institutes of Health Research (CIHR); Digital Science; Dutch Research Council (NWO); FWF-Austrian Science Fund; Gordon and Betty Moore Foundation [Grant number GBMF12312; DOI 10.37807/GBMF12312]; King Baudouin Foundation; La Caixa Foundation; Leiden University; Luxembourg National Research Fund (FNR); Michael Smith Health Research BC; National Research Foundation of South Africa; Novo Nordisk Foundation [Grant number NNF23SA0083996]; Research England (part of UK Research and Innovation); Social Sciences and Humanities Research Council of Canada (SSHRC); Swiss National Science Foundation (SNSF); University College London (UCL); Volkswagen Foundation; and Wellcome Trust [Grant number 228086/Z/23/Z]. The Research Council of Norway has funded the research of Gunnar Sivertsen in this project through grant number 256223 R-Quest.

Thank you to Emanuel Kulczycki for helpful feedback on the Polish national research assessment and funding systems. Sincere thanks to all AGORRA partners for their engagement and support:

  • Research England/UKRI

  • Australian Research Council

  • ANVUR (Agenzia Nazionale Di Valutazione Del Sistema Universitario E Della Ricerca)

  • Evaluation Research Centre, Chinese Academy of Sciences (CAS), Beijing

  • Dutch Research Council (NWO)

  • The National Research Foundation of South Africa (NRF-SA)

  • Canadian Institutes of Health Research (CIHR)

  • Social Sciences and Humanities Research Council (SSHRC)

  • Swedish Research Council (SRC)

  • Research Council of Norway (RCN)

  • Volkswagen Foundation

We would also like to record our gratitude to members of the AGORRA working group for advice and guidance at every stage. Responsibility for the content of RoRI outputs lies with the authors and RoRI CIC. Any views expressed do not necessarily reflect those of our partners. RoRI is committed to open research as an important enabler of our mission, as set out in our Open Research Policy. Any errors or omissions remain our own.

CRediT Authorship Statement

Conceptualization: Alexander Rushforth, Gunnar Sivertsen, James Wilsdon

Writing – Original Draft: Alexander Rushforth, Gunnar Sivertsen, James Wilsdon.

Writing – Review and Editing: Alexander Rushforth, Gunnar Sivertsen, James Wilsdon, Ana Arango, Adriana Bin, Catriona Firth, Claire Fraser, Nino Gogadze, Natalia Gras, Lee Harris, Jon Holm, Peter Kolarz, Moumita Koley, Jorge Maldonado, Marie-Helene Nienaltowski, Laura Rovelli, Sergio Salles-Filho, Scipione Sarlo, Nerina Sarthou, Arne Sjostedt, Federico Vasen, Nicole Ward-Boot, Marta Wroblewska, Fang Xu, and Lin Zhang.

Country-specific desk-work performed by: Alexander Rushforth, Gunnar Sivertsen, James Wilsdon, Ana Arango, Adriana Bin, Catriona Firth, Claire Fraser, Nino Gogadze, Natalia Gras, Lee Harris, Jon Holm, Peter Kolarz, Moumita Koley, Jorge Maldonado, Marie-Helene Nienaltowski, Laura Rovelli, Sergio Salles-Filho, Scipione Sarlo, Nerina Sarthou, Arne Sjostedt, Federico Vasen, Nicole Ward-Boot, Marta Wroblewska, Fang Xu, and Lin Zhang.

Supervision: Alexander Rushforth and James Wilsdon.

Openness statement

This is a desk-based review and does not involve empirical data collection. However, in the resource-sharing spirit of open science, further details provided by co-authors – particularly in classifying national assessment systems with the typology and reporting developments over time – are available as a supplementary file on Figshare.

In addition, the Atlas of Assessment observatory aims to serve as a ‘living document’ of national assessment systems. Curated up-to-date information contained in the observatory is publicly available for download from: https://researchonresearch.org/atlas/about-agorra/

References

Aagaard, Kaare. 2017. ‘The Evolution of a National Research Funding System: Transformative Change through Layering and Displacement’. Minerva 55 (3): 279–97.

Abramo, Giovanni. 2024. ‘The Forced Battle between Peer-Review and Scientometric Research Assessment: Why the CoARA Initiative Is Unsound’. Research Evaluation, rvae021. https://doi.org/10.1093/reseval/rvae021

ACOLA & Office of the Chief Scientist. 2023. Research Assessment in Australia: Evidence for Modernisation. https://www.chiefscientist.gov.au/ResearchAssessment

Adams, Jonathan, Ryan Beardsley, Lutz Bornmann, Jonathan Grant, Martin Szomszor, and Kate Williams. 2022. ‘Research Assessment: Origins, Evolution, Outcomes’. Clarivate. https://clarivate.com/academia-government/wp-content/uploads/sites/3/dlm_uploads/XBU968048850-ISI-Research-Assessment-Report-v5b-Spreads.pdf.

Barcelona Declaration. 2024. ‘Barcelona Declaration on Open Research Information’.

Beigel, Fernanda. 2025. ‘The Transformative Relation between Publishers and Editors: Research Quality and Academic Autonomy at Stake’. Quantitative Science Studies, 1–17.

Benamara, Abdelmajid, Ahmed Fahal, Alicia Kowaltowski, Anh-Khoi Trinh, Anne Cody, Catriona Firth, et al. 2024. ‘Dimensions of Responsible Research Assessment (Full Report and Summary)’. Figshare: Online Resource. https://doi.org/10.6084/m9.figshare.26064223.v3.

Bozeman, Barry, and Daniel Sarewitz. 2011. ‘Public Value Mapping and Science Policy Evaluation’. Minerva 49:1–23.

Braun, Dietmar. 1998. ‘The Role of Funding Agencies in the Cognitive Development of Science’. Research Policy 27 (8): 807–21.

Braun, Dietmar, and David H. Guston. 2003. ‘Principal-Agent Theory and Research Policy: An Introduction’. Science and Public Policy 30 (5): 302–8

Bush, Vannevar. 1945. Science The Endless Frontier: A Report to the President by Vannevar Bush, Director of the Office of Scientific Research and Development, July 1945. United States Government Printing Office, Washington DC

Butler, Linda. 2007. ‘Assessing University Research: A Plea for a Balanced Approach’. Science and Public Policy 34 (8): 565–74.

Capano, Giliberto. 2023. ‘Ideas and Instruments in Public Research Funding’. In Handbook of Public Funding of Research, 73–89. Cheltenham: Edward Elgar Publishing.

Carson, Marcus, Tom R. Burns, and Dolores Calvo. 2009. ‘Introduction’. In Paradigms in Public Policy: Theory and Practice of Paradigm Shifts in the EU, edited by Marcus Carson, Tom R. Burns, and Dolores Calvo, 11–28. Berlin: Peter Lang.

CLACSO. 2022. ‘Declaration of Principles’. Presented at the CLACSO´s XXVII General Assembly, Mexico. https://biblioteca-repositorio.clacso.edu.ar/bitstream/CLACSO/169747/1/Declaration-of-Principes.pdf.

Curry, S., S. De Rijcke, A. Hatch, Dorsamy (Gansen) Pillay, I. van der Weijden, and James Wilsdon. 2020. ‘The Changing Role of Funders in Responsible Research Assessment’.

Curry, S., E. Gadd, and James Wilsdon. 2022. ‘Harnessing the Metric Tide: Indicators, Infrastructures & Priorities for UK Responsible Research Assessment. Report of The Metric Tide Revisited Panel’.

Dahler-Larsen, P. 2014. Constitutive effects of performance indicators: Getting beyond unintended consequences. Public Management Review, 16(7), 969–986.

Debackere, Koenraad, E. Arnold, G. Sivertsen, J. Spaapen, and D. Sturn. 2018. ‘Mutual Learning Exercise: Performance-Based Funding of University Research’. Brussels: Directorate-General for Research and Innovation, European Commission.

Digital Science. n.d. ‘The Nature, Scale and Beneficiaries of Research Impact: An Initial Analysis of Research Excellence Framework (REF) 2014 Impact Case Studies.’ http://dera.ioe.ac.uk/22540/1/Analysis_of_REF_impact.pdf.

DORA. 2013. ‘The Declaration’.

Elzinga, Aant. 2012. ‘Features of the Current Science Policy Regime: Viewed in Historical Perspective’. Science and Public Policy 39 (4): 416–28.

European Commission. 2012. ‘A Reinforced European Research Area Partnership for Excellence and Growth’.

European Commission. 2024. ‘Action Plan by the Commission to implement the ten commitments of the Agreement on Reforming Research Assessment (ARRA)’.

Expert Review Panel. n.d. ‘VQR 2015-2019’. https://www.anvur.it/sites/default/files/2025-02/Expert-Review-Panel_Report-on-VQR-2015-2019.pdf.

Galderisi, Claudio, Mauro Perretti, Nura Galles, and Thed N van Leeuwen. 2019. ‘Report of the Group of Experts Charged by ANVUR to Advice on the Process “Valutazione Della Qualità Della Ricerca (VQR)”’. https://www.anvur.it/sites/default/files/2025-02/High-Experts-Report-on-VQR.pdf.

Geuna, Aldo, and Ben R. Martin. 2003. ‘University Research Evaluation and Funding: An International Comparison’. Minerva 41 (4): 277–304.

Gras, Natalia. n.d. ‘Forms of Research Assessment Oriented at Development Problems. Practices and Perspectives from National Science and Technology Organisations and Higher Education Institutions in Latin America and the Caribbean.’ https://doi.org/10.5281/zenodo.6607850.

Guillet, Sophie. 2025. ‘External Quality Assurance of Research in Higher Education Institutions. Taking Stock of the Practices of European Quality Assurance Agencies’. European Association for Quality Assurance in Higher Education. https://www.enqa.eu/wp-content/uploads/ENQA-QA-of-R-report.pdf.

Hall, Peter. 1993. ‘Policy Paradigms, Social Learning, and the State: The Case of Economic Policymaking in Britain’. Comparative Politics 25 (3): 275–96. http://www.jstor.org/stable/422246

Henriques, Luisa, and Philippe Larédo. 2013. ‘Policy-Making in Science Policy: The “OECD Model” Unveiled’. Research Policy 42 (3): 801–16.

Hicks, Diana. 2012. ‘Performance-Based University Research Funding Systems’. Research Policy 41 (2): 251–61.

Hicks, Diana, Paul Wouters, Ludo Waltman, Sarah De Rijcke, and Ismael Rafols. 2015. ‘Bibliometrics: The Leiden Manifesto for Research Metrics’. Nature 520 (7548): 429–31.

IAP-GYA-ISC. 2023. ‘The Future of Research Evaluation’. Paris: Center for Science Futures.

Inglis, Matthew, Elizabeth Gadd, and Elizabeth Stokoe. 2024. ‘What Is a High-Quality Research Environment? Evidence from the UK’s Research Excellence Framework’. Research Evaluation. https://doi.org/10.1093/reseval/rvae010.

Jones, Richard and Wilsdon, James. (2018). The biomedical bubble: Why UK research and innovation needs a greater diversity of priorities, politics, places and people. London, Nesta.

Jong, Lisette, Thomas Franssen, and Stephen Pinfield. 2021. ‘“Excellence” in the Research Ecosystem: A Literature Review’. RoRI Working Paper No. 5. Research on Research Institute. https://doi.org/10.6084/m9.figshare.16669834.v2

Jongbloed, Ben, and Harry de Boer. 2020. ‘Performance Agreements in Denmark, Ontario and the Netherlands’.

Kuhn, Thomas. 1962. The Structure of Scientific Revolutions (p. 54). Chicago: University of Chicago Press.

Lepori, Benedetto, Peter van den Besselaar, Michael Dinges, Bianca Potì, Emanuela Reale, Stig Slipersæter, Jean Thèves, and Barend van der Meulen. 2007. ‘Comparing the Evolution of National Research Policies: What Patterns of Change?’ Science and Public Policy 34 (6): 372–88. https://doi.org/10.3152/030234207X234578.

Leslie, Ian. 2025. ‘Notes on the Great Vibe Shift’. Substack, 18 January 2025. https://www.ian-leslie.com/p/notes-on-the-great-vibe-shift

Martin, Ben R. 2011. ‘The Research Excellence Framework and the “Impact Agenda”: Are We Creating a Frankenstein Monster?’ Research Evaluation 20 (3): 247–54.

Ochsner, Michael, Emanuel Kulczycki, Aldis Gedutis, and Ginevra Peruginelli. 2020. ‘National Research Evaluation Systems’. In Handbook Bibliometrics, edited by Rafael Ball, 99–106. De Gruyter Saur. https://doi.org/10.1515/9783110646610-011.

REF. (2029). Research Excellence Framework. https://2029.ref.ac.uk/?detectflash=false

Rijcke, Sarah de, Paul Wouters, Alex Rushforth, Thomas Franssen, and Björn Hammarfelt. 2016. ‘Evaluation Practices and Effects of Indicator Use—a Literature Review’. Research Evaluation 25 (2): 161–69.

Rip, Arie. 2000. ‘Fashions, Lock-Ins and the Heterogeneity of Knowledge Production’. In The Future of Knowledge Production in the Academy, 28–39. Society for Research into Higher Education and Open University Press.

———. 2004. ‘Strategic Research, Post-Modern Universities and Research Training’. Higher Education Policy 17:153–66.

Rushforth, Alexander, and Björn Hammarfelt. 2023. ‘The Rise of Responsible Metrics as a Professional Reform Movement: A Collective Action Frames Account’. Quantitative Science Studies 4 (4): 879–97. https://doi.org/10.1162/qss_a_00280.

Salmi, Jamil, and Arthur Hauptman. 2006. ‘Innovations in Tertiary Education Financing: A Comparative Evaluation of Allocation Mechanisms’. World Bank, Education Working Paper Series Number 4.

Scholten, Wout, Thomas P. Franssen, Leonie van Drooge, Sarah de Rijcke, and Laurens K. Hessels. 2021. ‘Funding for Few, Anticipation among All: Effects of Excellence Funding on Academic Research Groups’. Science and Public Policy 48 (2): 265–75.

Sivertsen, Gunnar. 2018. ‘The Norwegian Model in Norway’. Journal of Data and Information Science 3 (4): 3–19.

———. 2023. ‘Performance-Based Research Funding and Its Impacts on Research Organisations’. In Handbook of Public Funding of Research, 90–106. Edward Elgar Publishing.

Sivertsen, Gunnar, and Alexander Rushforth. 2024. ‘The Ongoing Reform of Research Assessment’. In Challenges in Research Policy, edited by Gunnar Sivertsen and Liv Langfeldt. Dordrecht: Springer.

Stern, Lord Nicholas. 2016. ‘Building on Success and Learning from Experience. An Independent Review of the Research Excellence Framework’. London: Department for Business, Energy and Industrial Strategy. https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/541338/ind-16-9-ref-stern-review.pdf.

Thomas, Duncan A., Maria Nedeva, Mayra M. Tirado, and Merle Jacob. 2020. ‘Changing Research on Research Evaluation: A Critical Literature Review to Revisit the Agenda’. Research Evaluation 29 (3): 275–88.

UNESCO. 2021. ‘Recommendation on Open Science’.

Vasen, Federico, Nerina F. Sarthou, Silvina A. Romano, Brenda D. Gutiérrez, and Manuel Pintos. 2023. ‘Turning Academics into Researchers: The Development of National Researcher Categorization Systems in Latin America’. Research Evaluation 32 (2): 244–55.

Wagner, Caroline S, Irene Brahmakulam, Brian Jackson, Anny Wong, and Tatsuro Yoda. 2001. ‘Science and Technology Collaboration: Building Capacity in Developing Countries: Prepared for the World Bank’. RAND Corporation.

Whitley, Richard. 2000. The Intellectual and Social Organisation of the Sciences. Oxford: OUP.

———. 2007. ‘Changing Governance of the Public Sciences’. In The Changing Governance of the Sciences, 3–27. Dordrecht: Springer.

———. 2019. ‘Changing Science Policies, Authority Relationships and Innovations in Public Science Systems’. In Handbook on Science and Public Policy, 204–26. Edward Elgar Publishing.

Whitley, Richard, and Jochen Gläser. 2007. The Changing Governance of the Sciences. Dordrecht: Springer.

Williams, Kate, and Jonathan Grant. 2018. ‘A Comparative Review of How the Policy and Procedures to Assess Research Impact Evolved in Australia and the UK’. Research Evaluation 27 (2): 93–105. https://doi.org/10.1093/reseval/rvx042.

Wilsdon, James. 2016. The Metric Tide: Independent Review of the Role of Metrics in Research Assessment and Management. HEFCE.

Wróblewska, Marta Natalia. 2025. ‘One size fits all? A comparative review of policy-making in the area of research impact evaluation in the UK, Poland and Norway’. Research Evaluation, Volume 34, rvaf010, https://doi.org/10.1093/reseval/rvaf010.

Xiaoxuan, L., & Fang, X. U. (2020). How to Break the ‘Siwei’?—Practice and Enlightenment Based on Research Institute Evaluation of Chinese Academy of Sciences. Bulletin of Chinese Academy of Sciences (Chinese Version), 35(12), 1431–1438. https://doi.org/10.16418/j.issn.1000-3045.20201116002

Zacharewicz, Thomas, Benedetto Lepori, Emanuela Reale, and Koen Jonkers. 2019. ‘Performance-Based Research Funding in EU Member States—a Comparative Assessment’. Science and Public Policy 46 (1): 105–15.

Zhang, L., & Sivertsen, G. (2023). The New Research Assessment Reform in China and Its Implementation. In Towards a New Research Era (pp. 239–252). Brill. https://doi.org/10.1163/9789004546035_017

Ziman, John M. 1994. Prometheus Bound. Cambridge: Cambridge University Press.

Appendix

National Research Assessment and Funding Systems

| Country | Name of system(s) | Year of introduction / major change(s) | Census period |
| --- | --- | --- | --- |
| Argentina | Program for Researcher Teachers (PRINUAR, previously PROINCE) | 1994 | 2 years (admission and promotion) / 4 years (permanence) |
| Argentina | Institutional Evaluation Program / Programa de Evaluación Institucional (PEI) | 2005 | Ad hoc |
| Argentina | CONICET Career of Scientific and Technological Researcher (CICYT) | 1961 | 1 or 2 years according to seniority |
| Australia | Excellence in Research for Australia (ERA) | 2010; ceased since 2023 | 3 years |
| Australia | Engagement and Impact (EI) Assessment | 2018 | Not repeated |
| Brazil | CAPES Evaluation System for Graduate Programs | 1976 | 4 years |
| Chile | National Accreditation Commission / Comisión Nacional de Acreditación (CNA) | 2006 | 3-7 years |
| China | National Disciplinary Evaluation | 2002 | 4 years |
| China | Double First-Class Evaluation | 2017 | 5 years |
| China | National assessment and selection systems for elite individual researchers | 1994 | 1 year |
| China | Chinese Academy of Sciences research institute evaluation | 1998 | 1 or 5 years |
| Colombia | Research Groups Assessment Model / Modelo de Medición de Grupos | 1991 | ~2 years |
| Colombia | Decreto 1279 de 2002 | 2002 | 1 year |
| Colombia | High Quality Accreditation Model | 1992 | 4-10 years |
| India | National Institutional Ranking Framework (NIRF) | 2016 | 1 year |
| Italy | Evaluation of Research Quality (VQR) | 2011 | 5 years |
| Mexico | National System of Researchers (SNI) | 1984 | Varies per seniority level |
| Netherlands | Strategy Evaluation Protocol (SEP), previously Standard Evaluation Protocol | 1994 | 6 years |
| Norway | Evaluations of specific subjects and thematic areas | 1990 | 10 years |
| Norway | Indicator-Based Funding | 2004; due to cease for universities in 2025 | 1 year |
| Poland | Evaluation of Quality of Scientific Activity / Ewaluacja Jakości Działalności Naukowej (EJDN) | Early 1990s; current format run in 2020/1 | 4 years |
| Poland | Algorithm Performance Based Funding System | Early 1990s; current format 2020 | 1 year |
| Poland | Research University Program (IDUB) | 2019 | 6 years |
| UK | Research Excellence Framework (REF), previously Research Selectivity Exercise (RSE) and Research Assessment Exercise (RAE) | 1986 (RSE), 1992 (RAE), 2014 (REF) | ~7-8 years |

Editors

Kathryn Zeiler
Editor-in-Chief

Alex Holcombe
Handling Editor

Editorial Assessment

by Alex Holcombe

DOI: 10.70744/MetaROR.156.1.ea

All three reviewers agree that the manuscript presents a valuable and timely contribution to the literature on national research assessment and funding systems. The proposed typology, historical analysis, and forward-looking perspective are valuable, and the reviewers particularly appreciate the inclusion of countries underrepresented in most treatments of this topic. The article is highly relevant to ongoing reform initiatives such as CoARA (the Coalition for Advancing Research Assessment). This article may be most valuable to people involved in research assessment across the globe, in part because of its clear description of the range of research assessment systems. The three paradigms section is also recognized by one reviewer as providing valuable insights. The suggested avenues for improvement include the addition of methodological details and graphics, clarification of terminology and concepts, better integration of the typology with the historical analysis, and more accurate representation and interpretation of CoARA. This article is described by its authors as a “working paper” and might also be described as a whitepaper in terms of its formatting, its value to a policymaker audience, and its focus on description. This working paper is not the type of scholarship that focuses on evidence with a view toward convincing skeptics or making new discoveries; it is more a descriptive and narrative overview of the types and history of assessment systems.

More than one of the reviewers is directly involved with CoARA, which is one of the topics of the article. Regarding competing interests, it should be noted that multiple reviewers are involved with CoARA and one acknowledges previously working with the authors. The reviews should perhaps be understood in part as a useful dialogue between these authors and people heavily invested in reforming research assessment, including representatives of CoARA.

This editorial assessment was prepared with some assistance from the Microsoft Copilot LLM.

Competing interests: The Editor-in-Chief and Editor have no competing interests with the following exception: One co-author, James Wilsdon, is a co-founder and editor of MetaROR.

Peer Review 1

Elizabeth Gadd

DOI: 10.70744/MetaROR.156.1.rv1

This working paper combines three contributions to the literature based on a study of thirteen countries’ assessment systems between 2010 and 2024: i) a proposed typology for categorising and comparing national research assessment systems; ii) a characterisation of some paradigmatic shifts affecting these systems over time; and iii) a view as to how assessment systems might respond to the responsible research assessment movement in future.

The paper offers greater global coverage of national assessment mechanisms than previous studies which is to be commended. The research questions posed were also legitimate and helpful in steering the design of the typology.

The definition of national research assessment systems is taken from Whitley (2007) and defined as “organized sets of procedures for assessing the merits of research undertaken in publicly funded organisations that are implemented on a regular basis, usually by state or state-delegated agencies”. The word ‘usually’ casts some uncertainty over what might be classified as a national research assessment system, given this role is often performed by non-state actors such as university rankings, or professional bodies for programme accreditation.

The criteria for inclusion in the analysis were adapted from Hicks (2012) and specified that “research outputs must be included in some way”. To my mind this is an unnecessary inclusion given the work of the responsible research assessment movement (on which they later report) to broaden the diversity of research contributions on which assessment is based. The inclusion of outputs as a necessary prerequisite for consideration as a national research assessment system exacerbates an unhelpful stereotype.

Section 1.1 Typology of national research assessment and funding systems

This is a useful set of considerations informing the development of a typology, however, I felt there may be some issues in the design of the typology as follows:

  1. Under Section 1 (Assigned Purpose), combining terms such as ‘funding allocation and reputation’ and ‘statistics and overview activity’ can introduce confusion into the scheme if some systems address one of the stated purposes and not the other. This combination has necessitated the introduction of a weighting system (more funding, less reputation) that would not otherwise be needed, or at least could be applied across all categories in this section, not just one element.

  2. The term ‘promotion of individuals’ precludes other forms of individual level assessment (appraisal, reward) and should be broadened out given one of the examples relates to monthly rewards.

  3. The lettering in this section (a,b,c,d,d,e,e) needs correcting.

  4. Section 3 (Focus of the assessment) could use a broader term such as ‘external research income’ rather than competitive grants, to capture as wide a range of forms of third-party income as possible.

  5. Section 4 (Effects on funding and reputation) felt like an odd category which might be addressed by separating out funding and reputation in Section 1. Again the categories here (Funding & Reputation/Only Reputation/Other Significant Effects) seemed to be missing an ‘Only Funding’ category. The examples provided under ‘Other’ included strategic development and learning and individual level assessments, both of which are also listed under purposes further leading to the conclusion that this section might be unnecessary.

  6. Section 5 (Methods) loosely refers to peer review and ‘statistics’ but it’s not clear how a user of the typology would apply this section or what the categories actually are.

  7. Section 6 (Types of performance-based institutional funding) feels like it should be a subcategory of either s1 or s4 (if retained) as it wouldn’t apply to all systems. In this section, the use of the term “evaluation-based funding” to mean the use of peer review and expert panels feels imprecise and is exacerbated by the statement that “Indicators may also have an important role in informing evaluation-based funding systems.”

  8. Arguably, space should be made in Section 6 for the concept of dialogue-based funding systems, as introduced in the final section. There are many strong arguments for such approaches, and they are no less national research funding systems than any other except that they would be harder to capture in a typology of this nature.

  9. Under 8 (Governance) the authors make an important point that “the presence or absence of meta-evaluations of assessment and funding systems serves as another indicator of accountability to the research community and other stakeholders.” However, it is not clear whether this is to be another category in the typology, or just a general comment.

Overall, I should like to have seen this section as a separate article, with a much clearer articulation of the typology so that it would be ready to implement by a third party. I should also have liked to have seen visualisations of the thirteen countries’ systems across the categories proposed by the typology, to more easily identify regional characteristics and possible trends over time.

Section 1.2 A framework for understanding ongoing developments in national research assessment and funding systems

This is an interesting and insightful section which partially, but not wholly, draws on the typology work, to identify three paradigms which national research assessment systems have adopted over time. It would have unified the paper to a greater degree if the three paradigms had been exemplified by different characteristics captured by the typology.

There is one error that should be corrected, namely, the characterisation of CoARA as focusing mainly “on individual level assessments for recruitment, promotion and external project funding”. The second sentence of the Agreement on Reforming Research Assessment states “Our vision is that the assessment of research, researchers and research organisations recognises the diverse outputs, practices and activities that maximise the quality and impact of research” (emphasis mine). CoARA membership categories include funding organisations which assess research organisations, and many of those mentioned in this paper (UKRI, ANVUR, etc.,) are signatories. If there is a perceived focus on individual-level assessments in CoARA, it is likely due to the fact that the vast majority of members are not in a position to assess RPOs and therefore a lot of the reported activity does not address this.

In summary: the paper conceptualises itself as a working paper, and I think that’s a fair assessment. The typology is not yet ready to apply by a third-party and there are clarifications that could helpfully be made. The observations regarding the three paradigms of national research assessment systems and their possible future trajectory are informed and interesting and make a novel and valuable contribution to the research assessment literature. The authors may want to consider separating the two elements out into separate papers.

Competing interests: I have worked with a number of the authors of this paper on various formal and informal projects.

Peer Review 2

Karen Stroobants

DOI: 10.70744/MetaROR.156.1.rv2

This manuscript presents a valuable and timely contribution to the comparative study of national research assessment and funding systems. By examining thirteen countries, the authors develop a typology for comparing national research assessment and funding systems worldwide; identify three dynamic and interacting research performance paradigms; and explore potential trajectories for the next decade. The inclusion of countries from the Global South, which are often underrepresented in such comparative analyses, adds significant depth and relevance to the study.

The paper enhances our understanding of the complex and evolving landscape of research assessment globally. The conceptual framework and typology proposed are promising tools for both analysis and practical application, and will be useful to policy makers and funders, as well as to champions of reform initiatives in understanding the contexts in which they work towards change.

General comments

1. Methodology – country selection

While the inclusion criteria are mentioned, the rationale behind the final selection of the thirteen countries is not entirely clear. Further clarification on the selection process, including why no systems with funding based on performance agreements were included, would increase the transparency of the study.

2. Terminology – conceptual consistency

Some terms within the typology are elaborated (e.g., “societal interaction” under “Focus of assessment”), while others (e.g., “research culture”) are not. Given the international audience, the authors should consider reviewing where additional explanation of terms would aid comprehension, especially for concepts that may be familiar in some systems but not others.

3. Effects on, and incentivisation of, behaviours

The typology sections on “Effects on funding and reputation” and “Assigned purpose” give limited attention to the behavioural effects of research assessment systems. While the manuscript references “levers for inducing behaviour and culture change” (p. 22), a more explicit reflection on how assessment systems incentivise, affect or shape behaviour would be a valuable addition to the typology.

Specific comments

CoARA representation – page 17

The manuscript currently lists only research funding organisations, research performing organisations, and assessment authorities as CoARA participants. However, CoARA also includes learned societies, researcher organisations, and associations of these groups as well as those already mentioned. This broader representation should be acknowledged for accuracy.

Interpretation of CoARA’s focus – page 18

“The reason may be that national systems for assessment and funding of research organisations are more rooted in the ‘larger’ and more different national traditions…”

This interpretation of CoARA’s focus on individual-level assessment may be too narrow, especially given its intention. Several factors likely contributed to this emphasis, including:

  • The complexity of addressing multiple assessment levels in the ARRA (drafts of which initially included tables addressing levels of assessment, that were removed as perceived too complex).

  • The perception that dissatisfaction with assessment practices is most acute at the individual level (while acknowledging the link with other levels), which might have resulted in a stronger focus on this in the ‘simplified’ ARRA.

  • The nature of initial Working Group submissions, many of which address researcher-level assessment, linked to the perceived urgency of addressing this level.

A more comprehensive explanation would better reflect the more nuanced range of reasons, including but not limited to the one currently stated in the manuscript.

Reform opportunities and challenges – page 22

“Some universities also sign up to these agreements, without the national assessment system agency doing so, as is the case in Poland…”

This section could be reframed to better reflect the nuanced reform pathways that exist even in the absence of national-level engagement. For example, in the case of CoARA, institutional commitments have served as signals of discontent and catalysts for national-level dialogue, sometimes leading to the formation of National Chapters. The authors might consider focusing more on the challenge of misalignment between institutional and national levels, while acknowledging the reform opportunities that can still emerge.

Nuancing the view on indicator-based systems – page 23

“Ostensibly, statements such as CoARA’s Agreement may appear sceptical towards what our typology terms indicator-based funding systems…”

This statement could be further nuanced in light of reflections from the CoARA Steering Board (e.g., LSE Impact Blog, 2024). The blog highlights that while indicator-based systems are problematic at the individual level, they may be more appropriate at higher levels of aggregation. Incorporating this perspective would provide a more balanced interpretation of CoARA’s stance.

Recommendation

Minor Revisions

The manuscript is well-conceived and makes a significant contribution. I recommend minor revisions, primarily to enhance clarity, accuracy, and nuance in the areas outlined above.

Competing interests: None.

Peer Review 3

Neil Jacobs

DOI: 10.70744/MetaROR.156.1.rv3

In general this is a useful paper that sets out a novel typology of national assessment systems and a historical review, and forward look, of their evolution.

The paper would benefit from a methodology section, to explain exactly why the typology was developed, how it was derived, how it was validated, who influenced it, how that influence was managed and governed, whether it will be maintained, etc. At present there is a lack of transparency.

More justification could be provided for the particular features of the assessment systems that are included in the typology, perhaps by reference to who the typology is for.

The two aims of the paper could be considerably better integrated. While both the typology and the historical review are useful in themselves, it is not always very clear how the former has made a difference to the latter. For example, have there been differences between different types of assessment system and their adoption of, or resistance to, the three paradigms presented in the historical review? At present the paper emphasises description at the expense of explanation.

The paper does a good job of abstracting assessment systems from their national contexts, for example the wider national research and innovation ecosystems with their histories, cultures, socio-economic and political factors and institutional architectures. These are only occasionally invoked when they help to explain some aspect of the assessment system. However, again, because the paper is mainly descriptive rather than explanatory, this can leave the reader with little understanding of why those sponsoring or running assessment systems choose to do so in the particular ways captured in the typology.

Some potentially contestable statements are simply given with little justification, e.g. on the purpose of assessment systems (p. 6): “where these purposes are combined, the relative importance of each can be weighted.” Is that always the case? And weighted by whom, how and why?

Competing interests: I work on a project that is funded by Research England, which also employs one of the authors.
