Published at MetaROR

June 5, 2026

Cite this article as:

Shanahan, H., & Bezuidenhout, L. (2026). The localisation and accessibility of preprint services: implications for Open Science. In MetaRoR (1.0). Zenodo. https://doi.org/10.5281/zenodo.18621116

The localisation and accessibility of preprint services: implications for Open Science

Hugh Shanahan1, Louise Bezuidenhout2

1 Department of Computer Science, Royal Holloway, University of London
2 CWTS (Center for Science and Technology Studies), Leiden University, Leiden, Netherlands

Originally published on February 12, 2026 at: 

Abstract

Preprint services now play a key role in disseminating research across a wide range of domains. In this paper we examine where a set of 64 preprint services, collated by ASAPbio, are physically hosted and the type of Internet Service Provider (ISP) they use. In addition, access to these services from 106 territories was simulated using Virtual Private Networks (VPNs). We find that the majority of services (47/64) are physically based in the USA, despite instances where the Top Level Domain (TLD) indicates a different country. Furthermore, 56/64 of the services are hosted by commercial ISPs, with 43/64 hosted by Cloudflare, Google LLC or Amazon. Poorer countries are more likely than wealthier countries to encounter an error when requesting the landing page URL of one of these preprint services. Sites that are physically located in Low or Middle Income Countries are at least as accessible as those based in the USA, and may be more so. We draw some overall conclusions about the potential frailty of these services with respect to the technical decisions made here.

Introduction

Rising adoption and normalization of preprints

In academic publishing, a preprint is a version of a scholarly or scientific paper that precedes formal peer review and publication in a peer-reviewed scholarly or scientific journal. While the practice of sharing pre-publication copies of articles goes back to at least the 1960s, when the National Institutes of Health circulated biological preprints, the first digital preprint database is widely accepted to be arXiv. Since arXiv was established in 1991, preprints have increasingly been distributed electronically on the internet, giving rise to a growing number of pre-publication articles in preprint, generalist and institutional repositories.

A recent large-scale bibliometric study (1991–2023) demonstrated that the number of preprints has almost tripled between 2017 and 2022 (Rzayeva et al., 2025). Nonetheless, the adoption of preprinting varies across both disciplines and geographic regions. The adoption of preprinting is highest in the physical and mathematical sciences, particularly among researchers in the Americas and Europe. In recent years, preprinting has also increased notably in the information and computing sciences and the life and medical sciences, driven primarily by researchers in North America and Western and Northern Europe (Rzayeva et al., 2025). Furthermore, the global health crisis around COVID-19 significantly accelerated preprint adoption outside traditional fields: preprints became key for rapid dissemination of urgent medical research (Fraser et al., 2021). This has led a number of global funders, such as the Gates Foundation, to recognise the importance of preprints as an integral element of scholarly communication.[1]

Integration of preprinting into scholarly workflows

Over the past decade many journals that previously forbade preprinted submissions changed their policies to accommodate, and often encourage, manuscripts previously posted as preprints (Alves, 2023). The increasing legitimisation of preprinting has led to the evolution of new publishing models, such as overlay journals and Publish-Review-Curate (PRC) scholarly communication models.[2] This has been supported by significant investment in the technical infrastructure underpinning preprinting: workflows, transfer protocols (such as the Manuscript Exchange Common Approach, MECA) and better metadata practices have helped integrate preprints more seamlessly into publishing pipelines (Alves, 2023). Some preprint servers/journals now allow or encourage posting post-review versions, review reports, and supplementary materials (data, code), enhancing transparency and reproducibility.

Infrastructure improvements support this integration of preprinting into scholarly workflows. Preprints increasingly receive DOIs, author-ID systems (such as ORCID) now allow researchers to list preprints as part of their scholarly outputs, and indexing services like Google Scholar include preprints. In many disciplines, preprints are no longer just early drafts to “reserve priority.” They increasingly function as legitimate, citable contributions that influence the discourse, sometimes regardless of whether they become peer-reviewed journal articles (Teixeira da Silva, 2017).

Preprinting as a response to equity and accessibility

Preprints are regularly used as an example of how Open Science can improve free access to research. Key documents, such as the UNESCO Recommendation on Open Science, encourage Open Science from the very start of research and cite preprints explicitly as an example of “innovative models” for early dissemination, while insisting that they be clearly distinguished from final peer-reviewed publications (UNESCO, 2022).

Preprinting is positioned as key for improving the transparency and speed of research by offering a fast, free, and accessible way to disseminate research and claim priority. This is thought to lower barriers to scholarly communication, especially for those with limited resources. Preprinting is also thought to democratize access, foster collaboration, and accelerate knowledge flow. Preprints also enable more transparent peer review and versioning, while lowering costs for authors compared to high-priced open-access journals.

A recent dataset connecting preprints (from, e.g., bioRxiv) to published articles shows there are thousands of preprints that never map to a journal publication. This has raised questions about curation, quality control, and stability of the scientific record (Badalova et al., 2025). Moreover, preprints present new challenges in quality control, citation inequality, version tracking, and potential commercial consolidation of infrastructure. This raises an important question for Open Science, namely whether the claims about the transformative nature of preprinting are being realised by the ongoing changes in how scholarly outputs are communicated around the world.

Equity in Open Science infrastructures

Recent scholarship on the Open Science movement has highlighted the need to critically investigate the design, governance, location and deployment of the digital infrastructures underpinning open scholarship. In particular, this scholarship highlights core issues that influence the accessibility and usability of these infrastructures, including long-term sustainability challenges, the influence of geopolitics on user communities and the challenges experienced by scholars in low-resourced settings when engaging with these resources.

In the last decade there has been growing interest in mapping digital resources to understand the evolution of the Open Science ecosystem. A study by Kramer and Bosman (Kramer and Bosman, 2016)[3] outlined the evolution of Open Science workflows and detailed the interconnection between commercial, community-led and institutionally-supported tools within Open Science. Similar activities such as the Joint Roadmap for Open Science Tools,[4] the Research Data Alliance group Mapping the Landscape of Digital Research Tools[5] and Invest in Open Infrastructure InfraFinder tool[6] all provide extensive lists of digital tools supporting Open Science.

In contrast, a recent study by Bezuidenhout and Havemann (Bezuidenhout and Havemann, 2021) built on the existing tool maps to examine the geographical location of digital Open Science tools. This study illustrated the significant dominance of tools located in and funded by a small number of high-income countries (HICs) such as the US, UK and EU. This study raised questions of design and access biases due to these digital tools being designed in, for and by researchers in HICs in ways that could marginalise users from low/middle-income countries (LMICs).

These concerns were further supported by a study from Shanahan and Bezuidenhout (Shanahan and Bezuidenhout, 2022) that examined the success of access requests to digital Open Science tools and repositories from 14 LMICs and HICs. The study clearly showed not only lower access request success from the LMICs, but also evidence of geoblocking against access requests from specific countries. This study challenged the implicit assumption that access to Open Science tools and repositories is independent of where the user is geographically located.

Further studies (Bezuidenhout and Havemann, 2021; Gregory et al., 2025) have examined the sustainability of Open Science tools and knowledge infrastructures. These studies draw attention to the high level of heterogeneity in funding, significant (over)reliance on volunteer labour and often poorly articulated long-term sustainability planning. Together, these issues introduce significant vulnerabilities in the Open Science landscape as key infrastructures and connections between infrastructures may be subject to rapid change.

The studies discussed above focus broadly on Open Science tools and infrastructures and not explicitly on preprint repositories. Nonetheless, the lessons learned from these studies are as pertinent for Open Access as for other domains of Open Science. Given the rapid rise of preprinting practices within research, it is perhaps surprising that the structure of preprint repositories has not been critically scrutinised. A brief comparison of 8 prominent preprint repositories (table 1) illustrates not only how varied their design and governance are, but also the presence of both commercial and non-commercial actors in the field. This heterogeneity of preprint repository structures illustrates the importance of critically interrogating preprinting. It should raise questions as to whether the infrastructures and practices of preprinting are capable of realising the promise that preprinting will support globally inclusive and equitable research.

Table 1. Comparison of general features of prominent preprint repositories
Repository | Platform/Software | Governance/Ownership | PID & Versioning | Screening/Moderation | APIs & Metadata | Preservation | Publisher Integration | Funding Model
arXiv | Custom in-house platform | Cornell University (non-profit; community-governed) | arXiv IDs; versioned (v1,v2,…); limited DOI use | Human moderators; endorsement system | arXiv API; OAI-PMH; bulk data dumps | Long-term preservation via Cornell; widely mirrored | Extensive informal integration; overlay journals | Institutional support + sponsorships/donations
bioRxiv / medRxiv (openRxiv) | Custom platform (CSHL) transitioning to openRxiv | Community-governed nonprofit (CSHL + partners) | DOIs (10.1101); versioning supported | Editorial screening; ethics & public-health checks | Metadata exposure; links to published articles | Archiving arrangements; openRxiv sustainability focus | Strong links to life-science publishers | Free to authors; sponsor & grant funded
OSF Preprints | Open Science Framework (open source) | Center for Open Science (non-profit) | DOIs minted on publication; versioning | Basic moderation; community comments | OSF API; metadata harvesting | Preserved within OSF infrastructure | Links to OSF projects and workflows | Grant- and donation-funded
Zenodo | InvenioRDM (CERN) | CERN & OpenAIRE (non-profit) | DataCite DOIs; versioning supported | Minimal checks (scope & legality) | REST APIs; OAI-PMH; DataCite metadata | Strong long-term preservation via CERN | Used by publishers and EU projects | EU/CERN funding; free to authors
Research Square | Proprietary platform | Springer Nature (commercial) | DOIs; versioning supported | Editorial screening; optional paid services | APIs; publisher-centric metadata access | Corporate archiving policies | Deep integration with Springer Nature journals | Commercial; optional author fees
SSRN | Proprietary platform | Elsevier (commercial) | Platform IDs; mixed DOI practices | Subject screening & moderation | Metadata APIs/search export | Elsevier-managed preservation | Strong publisher & journal links | Commercial (Elsevier-supported)
ChemRxiv | Cambridge Open Engage | Chemical societies & Cambridge University Press | DOIs; versioning supported | Subject-specific safety screening | Metadata feeds; API via host | Host & society agreements | Integration with chemistry journals | Society & publisher supported
Community servers | OSF or community-hosted platforms | Community-led or institutional | Typically DOIs; versioning varies | Light moderation | OSF/OAI-PMH/REST APIs | Varies by host institution | Some overlay journal use | Grants, institutions, volunteers

Methods

The research presented in this paper interrogated the preprint servers listed by ASAPbio, addressing the following questions:

  • Where are they physically based?
  • What internet service providers (ISPs) are used to provide their services?
  • How accessible are they to access requests from different countries?
  • How do different country hosts compare with each other?

Testing accessibility

The method described in (Shanahan and Bezuidenhout, 2022) was used to test the accessibility of the preprint services for user requests from different countries. In summary, VPNs provided by the Bright Foundation were used to represent access from 106 countries. For each country a Python script attempted to download the landing page of each of the 64 preprint services collated by ASAPbio (Polka, 2023). If the download was successful, the HTML was stored; if it failed, the error code was stored. A timeout of 10 s was applied. The run was repeated 10 times for each country to capture any variability in errors such as timeouts. The analysis was carried out in April 2024 to determine how access to these servers has changed.
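The procedure above can be illustrated with a minimal Python sketch. This is not the script used in the study: the function names, the convention of recording status 0 for timeouts and connection failures, and the assumption that VPN routing is configured outside the script are all illustrative choices made here. Only the 10 s timeout and the 10 repeats come from the text.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Tuple

TIMEOUT_S = 10  # timeout stated in the text
N_RUNS = 10     # number of repeats per country stated in the text


def fetch(url: str, timeout: float = TIMEOUT_S) -> Tuple[int, str]:
    """Try to download a landing page and return (status, html).

    Status 0 marks a timeout or connection failure; this convention is
    an assumption of the sketch, not something taken from the study.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an HTTP error code (e.g. 403, 404).
        return exc.code, ""
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # DNS failure, refused connection, timeout, truncated response, ...
        return 0, ""


def run_country(
    urls: List[str],
    fetcher: Callable[[str], Tuple[int, str]] = fetch,
    n_runs: int = N_RUNS,
) -> Dict[str, List[int]]:
    """Repeat the download n_runs times and keep one status code per run.

    The sketch assumes traffic is already routed through the VPN for the
    country under test. The HTML is discarded here for brevity.
    """
    results: Dict[str, List[int]] = {url: [] for url in urls}
    for _ in range(n_runs):
        for url in urls:
            status, _html = fetcher(url)
            results[url].append(status)
    return results
```

Passing the downloader in as the `fetcher` argument keeps the repeat logic separate from the network call, so it can be exercised without network access.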

Analysis of preprint service provision

In order to understand the providers and location of the preprint services, the URL of each of the services was run through a geolocation tool (https://www.iplocation.net/ip-lookup). This service collates the output of a number of other similar services to determine a consensus on the country location of the IP address for a given URL. In each case, the country of the provider for the preprint service was recorded, as well as the ISP.
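The idea of a consensus across several geolocation services can be sketched as a simple majority vote. This is an assumption made for illustration: the text does not describe how the lookup tool combines its sources, and the function name and tie-handling rule below are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def consensus_country(reports: Iterable[Optional[str]]) -> Optional[str]:
    """Return the country code reported most often, or None if unclear.

    `reports` holds one country code per geolocation source; missing
    answers may be None or empty strings. A tie for first place returns
    None so that the case can be inspected by hand.
    """
    votes = Counter(r.strip().upper() for r in reports if r and r.strip())
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no single most-reported country
    return ranked[0][0]
```

For example, `consensus_country(["US", "us", "ID"])` returns `"US"`, whereas an even split such as `["US", "ID"]` returns `None`.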

Results

The results of the study critique the evolution of the preprint landscape by asking the question: does the design of preprint services truly ensure that open preprints are globally accessible in perpetuity? The results are presented in two sections. The first section looks at the hosting and location of the preprint services with the aim of understanding where and by whom preprints are being stored. Understanding the geographic distribution of preprint services and the commercial actors underpinning the landscape enables critical questioning of the vulnerability of preprinting to geopolitics and market capitalism.

The second section of the results interrogates the accessibility of open preprints to users around the world. This section questions whether preprinting in its current format, which has been designed with an assumption of access to high-bandwidth connectivity, is truly the fully open alternative to the current commercial publishing landscape. The uncritical adoption of the “easiest hosting options”, rather than those best suited to low-bandwidth settings, continues to dominate the establishment of preprint services and raises questions about their accessibility.

Hosting and location of preprint services

The Top Level Domain (TLD) of a URL gives an initial indication of the physical location of the repository or else describes the status of the organisation (e.g. .com indicates a commercial organisation). As illustrated below, however, the TLD is not a final indication of the physical location. We have chosen this analysis to demonstrate the ambiguity between the geographic scope of the repository and the TLD. It also demonstrates the prevalence of commercially run services in the preprint landscape.

The TLD of each URL was extracted from the list of landing pages collated by ASAPbio. The incidence of the TLDs is plotted in figure 1. These are heavily dominated by the .org TLD, followed by .com, .io and .net, none of which is used here to indicate a national location. There is a tail of mostly national TLDs. As the preprint services are dominated by TLDs with no identifiable geographic location, and those labelled with a national TLD may not be based in that country, we need a separate analysis to determine the actual geographic location of these URLs.
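The TLD extraction step can be illustrated with a short Python sketch. The function names are hypothetical, and the rule used (take the final dot-separated label of the hostname) is an assumption about how the extraction was done. The example URLs are ones quoted elsewhere in this paper.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlparse


def tld_of(url: str) -> str:
    """Return the last label of the hostname, e.g. '.org' or '.id'."""
    host = urlparse(url).hostname or ""
    return "." + host.rsplit(".", 1)[-1] if "." in host else ""


def tld_counts(urls: Iterable[str]) -> Counter:
    """Count how often each TLD occurs in a list of landing-page URLs."""
    return Counter(tld_of(u) for u in urls)


# A handful of landing pages mentioned in the text, for illustration only.
EXAMPLE_URLS = [
    "https://rinarxiv.lipi.go.id",
    "https://www.bioblast.at/index.php/MitoFit_Preprint_Archives",
    "https://gatesopenresearch.org/",
    "https://hrbopenresearch.org/",
    "https://f1000research.com/",
    "http://philsci-archive.pitt.edu/",
]

if __name__ == "__main__":
    for tld, n in tld_counts(EXAMPLE_URLS).most_common():
        print(tld, n)
```

Note that this rule treats a second-level registration such as .go.id simply as .id, which is sufficient for counting national TLDs.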

Figure 1. Incidence of TLDs from the ASAPbio preprint server list

As noted in the methods, an IP geolocation tool allows one to determine the ISP for each of the URLs and their geographic location. This was used to disambiguate the results from figure 1 and determine the actual location of each preprint service. Figure 2 shows the incidence of the country location of the IP addresses for the preprint services. This is heavily dominated by sites based in the USA, with only 17 out of 64 sites located elsewhere.

The results in Figure 2 illustrate not only the unequal geographic spread of preprint service hosting, but also that neither the stated geographic scope of a preprint service nor its TLD can be taken as indicative of where it is hosted. For example, two URLs, https://rinarxiv.lipi.go.id and https://www.bioblast.at/index.php/MitoFit_Preprint_Archives, ostensibly Indonesian and Austrian respectively, both resolve to IP addresses physically located in the USA.

Figure 2. Incidence of location of IP addresses for preprint servers

As noted in figure 1 the TLDs also introduce considerable ambiguity with respect to the commercial status of the hosting ISP. The distribution of the ISPs supporting the ASAPbio preprint services is shown in figure 3 and table 2 (showing the full names of the ISPs). Three commercial providers, Cloudflare, Google LLC and Amazon dominate the provision of services with 43 of the 64 services being provided by them.

Figure 3. Incidence of ISPs hosting preprint services.

 

Table 2. Names of ISPs and the number of preprint services that are hosted by them.
ISP | Number of preprint services hosted by this ISP
Cloudflare | 26
Google LLC | 11
Amazon | 6
CERN – European Organization for Nuclear Research | 2
The Indian Institute of Horticultural Research (IIHR) Bangalore | 1
Fasthosts Internet Limited | 1
Namecheap Inc. | 1
Fastly Inc | 1
GitHub Inc. | 1
Computer Network Information Center of Chinese Academy of Sciences | 1
Microsoft | 1
Digital Network JSC | 1
One.com A/S | 1
University of Pittsburgh | 1
Gossamer Threads Inc. | 1
Akamai | 1
BT | 1
China Education and Research Network | 1
Fundação de Amparo à Pesquisa do Estado São Paulo | 1
Automattic Inc | 1
HostGator.com LLC | 1
Next Dimension Inc | 1
Japan Science and Technology Agency | 1

Figure 3 illustrates that the hosting of the ASAPbio preprint services is dominated by three commercial companies, namely Cloudflare, Google LLC and Amazon (referred to collectively below as CGA). Although these companies have data centres in different geographic locations, Figure 2 indicates that the servers used through these services are predominantly located in the USA. It is therefore important to break down the commercial status and geographic location of the ISPs in more detail to obtain a clear picture of the distribution of preprint resources within the ASAPbio list.

The commercial status of the ISPs was determined through inspection of their web sites. The following ISPs were determined to be commercial:

Cloudflare, One.com A/S, Fasthosts Internet Limited, Namecheap Inc., Fastly Inc, GitHub Inc., Amazon, Microsoft, Google LLC, Digital Network JSC, Gossamer Threads Inc., Next Dimension Inc, Akamai, BT, Automattic Inc, HostGator.com LLC

and the remainder as non-commercial:

Computer Network Information Center of Chinese Academy of Sciences, The Indian Institute of Horticultural Research (IIHR) Bangalore, CERN – European Organization for Nuclear Research, University of Pittsburgh, China Education and Research Network, Fundação de Amparo à Pesquisa do Estado São Paulo, Japan Science and Technology Agency.

Figure 4 breaks down the number of preprint services across these categories (ISP physically based in the USA or not, commercial ISP or not, ISP being CGA or not). Services in the USA are heavily oriented towards commercial ISPs: only 1 of those services uses a non-commercial ISP (http://philsci-archive.pitt.edu/, with the ISP being the University of Pittsburgh). 10 of the 17 preprint services based outside of the USA use commercial ISPs. 39 of the 47 preprint services that have ISPs based in the USA use CGA. On the other hand, only 4 out of 17 of the preprint services with ISPs based outside of the USA use CGA.
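As a consistency check, the counts quoted in this paragraph can be combined to recover the headline figures reported in the abstract (56/64 commercial, 43/64 CGA). The short Python sketch below does only this arithmetic; the variable and function names are illustrative.

```python
# Counts quoted in the text.
TOTAL = 64
USA, NON_USA = 47, 17
USA_NON_COMMERCIAL = 1      # philsci-archive.pitt.edu
NON_USA_COMMERCIAL = 10
USA_CGA, NON_USA_CGA = 39, 4


def breakdown() -> dict:
    """Derive the overall commercial and CGA totals from the per-region counts."""
    usa_commercial = USA - USA_NON_COMMERCIAL
    commercial = usa_commercial + NON_USA_COMMERCIAL
    cga = USA_CGA + NON_USA_CGA
    return {
        "total": USA + NON_USA,
        "commercial": commercial,
        "non_commercial": TOTAL - commercial,
        "cga": cga,
        "commercial_non_cga": commercial - cga,
    }


if __name__ == "__main__":
    print(breakdown())
    # {'total': 64, 'commercial': 56, 'non_commercial': 8, 'cga': 43, 'commercial_non_cga': 13}
```

The CGA total of 43 also matches the sum of the Cloudflare, Google LLC and Amazon rows in table 2 (26 + 11 + 6).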

The 3 preprint services based in Ireland use Amazon. This is unsurprising as Ireland is one of the major data centre locations for Europe (Walsh, 2025). Interestingly, however, the preprint services that use the Irish-based ISP (https://gatesopenresearch.org/, https://hrbopenresearch.org/ and https://f1000research.com/) are not explicitly related to Irish preprint services.

Figure 4. Stacked bar diagram of the physical location of the ISP versus the commercial status of the ISP, counting the number of preprint services hosted.

Accessibility of services via VPNs

Previous studies have noted that the geographic location of the ISP influences the accessibility of resources for users on a global basis. In part, this is understood to reflect variable internet connectivity in low/middle-income countries (LMICs). Nonetheless, decreased accessibility of open resources also reflects the design of the servers and services, and the data burden of access requests. In order to investigate the accessibility of open preprint servers to users around the world, we used the VPN methodology discussed in the methods section to test access requests from different geographic locations.

In figure 5 the median number of preprint sites that return an error code for each of the 106 countries is plotted as a function of GDP per capita, taken from the World Bank. A regression line has been computed to indicate the trend: the number of errors tends to increase as GDP per capita decreases. There are outliers to this trend. The VPNs for the territories Iraq and the Syrian Arab Republic returned no response to any query, indicating that the issue is with the VPN for those countries rather than with the sites themselves. We note that the territories Cuba, China and Turkmenistan lie significantly above the trend.
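The per-country quantity plotted in figure 5 can be sketched as follows. The sketch assumes that the median is taken over the repeated runs for a country and that any status other than 200 counts as an error; both are assumptions made here, as are the function name and the data layout (one list of per-run status codes for each URL).

```python
from statistics import median
from typing import Dict, List


def median_error_count(statuses_by_url: Dict[str, List[int]]) -> float:
    """Median, over runs, of the number of sites not returning status 200.

    `statuses_by_url` maps each landing-page URL to its status codes,
    one per run, for a single country.
    """
    if not statuses_by_url:
        raise ValueError("no results supplied")
    n_runs = min(len(codes) for codes in statuses_by_url.values())
    errors_per_run = [
        sum(1 for codes in statuses_by_url.values() if codes[i] != 200)
        for i in range(n_runs)
    ]
    return median(errors_per_run)
```

For instance, with two sites and three runs, `{"a": [200, 200, 404], "b": [0, 200, 0]}` gives error counts of 1, 0 and 2 per run, and hence a median of 1.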

In order to determine if there is a difference between preprint services based in High Income Countries (HICs) and Low and Middle Income Countries (LMICs), five preprint services were selected that were based in Russia, China, India and Brazil. In figure 6 two histograms are drawn comparing the fraction of times a site returned an error code for this set of sites (labelled ‘LMIC’) and the set of sites based in the USA. The data suggest that the sites based in LMICs are somewhat less likely to generate an error when downloaded. Nonetheless, given the disparity in the numbers of preprint services in these two categories, this can only be taken as a preliminary result. Furthermore, to truly understand this finding, a more detailed analysis of the design and deployment differences of the preprint services in these two categories is required.

In addition to this previous finding, it was noted that the URL pointing to AfricArxiv (https://info.africarxiv.org/) as tabulated in the ASAPbio list is based with an ISP in Denmark. This is a landing page but the data is based with UbuntuNet (https://africarxiv.ubuntunet.net/home) located in Malawi. With this in mind, both sites were accessed in the same fashion, using Bright Data VPNs in October 2025 following the same procedure (i.e. 10 runs from each VPN accessing these sites). The number of times there was a successful or failed download is collated in table 3. There is a slight difference with the UbuntuNet site having more successful downloads but it is unlikely to be significant.

If we assume that poor connectivity in LMICs is the sole explanation for accessibility challenges then we would expect that accessing those servers in those countries would have a similar trend to figure 5. For a country of Malawi’s GDP per capita it is more likely to experience errors than a country of Denmark’s GDP per capita. Hence we would see poorer response rates from a service based in Malawi over Denmark. The fact that its performance is comparable illustrates that further investigation into how these services are deployed in LMICs is required.

Figure 5. Median number of preprint server sites accessed from the VPN sites that return an error as a function of the GDP per capita. A regression line is drawn using a loess regression approach. The GDP per capita of Malawi and Denmark are indicated. VPN data for these two countries was not gathered.

 

Figure 6. Histograms of fraction of LMIC and USA preprint services that return an error code when accessed from all VPNs. As there are only five sites for the LMIC set the number of fractions is discrete (i.e. 0.0, 0.2, 0.4, 0.6, 0.8 and 1.0).

 

Table 3. Summary of number of types of download from different AfricArxiv pages.
AfricArxiv page | Number of successful downloads (200 code returned) | Number of failed downloads (200 code not returned)
UbuntuNet landing page | 960 | 100
One.com A/S (Denmark) landing page | 946 | 114
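The statement that the difference in table 3 is unlikely to be significant can be illustrated with a two-proportion z-test. This is a sketch rather than the analysis performed in the study: it treats every download as an independent trial, which is a strong assumption because repeated runs from the same VPN are likely to be correlated. The function name is hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z(
    success_a: int, n_a: int, success_b: int, n_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for a pooled two-proportion z-test."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Table 3: 1060 attempts per page (106 VPN locations x 10 runs).
    z, p = two_proportion_z(960, 1060, 946, 1060)
    print(f"z = {z:.2f}, p = {p:.2f}")
```

With the counts from table 3 this gives z of about 1.0 and a two-sided p-value of about 0.31, which is consistent with the difference between the two pages not being significant at conventional thresholds.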

Discussion

This paper presents a) a breakdown of the ISP provision for a set of preprint services on the basis of their physical location, their commercial status and whether the ISP is one of the three largest commercial ISPs (Cloudflare, Google LLC or Amazon), and b) an analysis of the accessibility of those preprint services across a large number of territories, including LMICs and HICs. Together, the results present an analysis of the ASAPbio preprint services and ask questions relating to the design and deployment of these services. The results question whether the current design of preprint services is optimally placed to offer the unrestricted global access to preprint articles that is assumed from the practice of preprinting. The critique focuses on two interlinking strands in the design of the preprint services, namely the overwhelming dominance of commercial, US-located hosting within the preprint landscape, and the likelihood that this leads to accessibility challenges for users outside of high-bandwidth contexts of use. The findings are discussed under the headings below.

Preprint services are overwhelmingly based in the USA. Slightly over 73% of the services (47/64) have a landing page URL that is hosted by a server based in the USA. This includes two services whose TLD indicates a different nationality but whose web page is hosted by a US-based service.

Taken within historical context, this concentration of repositories in the USA is not surprising. The USA has played a central role within Open Science and in promoting the use of preprints, and hence it is inevitable that many preprint services would be based there. Nonetheless, the recent changes made by the Trump administration to research funding, data availability, and data categorisation (Bezuidenhout and Verriet, 2025; Shanahan and Bezuidenhout, 2025) raise significant concerns relating to the long-term robustness of a scholarly communication ecosystem heavily reliant on the geopolitics of a single country. The concentration of preprint services in the USA raises questions that the current preprint ecosystem is ill-equipped to address. Indeed, the decision by arXiv to shut down mirroring services physically located outside of the USA (“Attention arXiv users: arXiv mirrors to shut down September 15th, 2024 – arXiv blog,” n.d.) illustrates that critical discussions about the impact of geopolitics on the future accessibility of services are urgently needed.

Preprint services in the USA are overwhelmingly reliant on a small number of commercial ISPs. These ISPs are Cloudflare, Google LLC and Amazon. Moreover, 10 of the services (15%) hosted through the Center for Open Science (“Center for Open Science,” 2017) also use Google LLC. Likewise, three preprint services nominally based in Ireland are also using Amazon. A range of other commercial providers are also used. Other HICs also make use of commercial rather than institutional services.

The overwhelming dependency on a few commercial companies is not surprising given the resources necessary to host cost-efficient cloud storage. In addition to cost effectiveness, the involvement of a small number of well-resourced companies with stated ambitions of global inclusion could indeed benefit longer term accessibility of preprint services (ref). Nonetheless, uncritical reliance on these commercial actors must also be accompanied by reservations. Commercial companies are primarily beholden to their shareholders and profits, rather than to Open Science values of inclusivity and collective benefit. This can mean that design and restructuring decisions – taken without the involvement of the research community – could significantly disrupt the Open Science movement.

Commercial services also regularly change their own terms of use. For example Cloudflare have taken steps to limit the use of crawlers which has become a significant problem as collating data for LLMs has expanded enormously (Atkinson, 2025). The overreliance on commercial actors may introduce vulnerabilities into the preprinting landscape, as previously relied upon services or access conditions may be subject to non-consultative change.

Another important topic of concern relates to the national legislation of the server host country. Both commercial and non-commercial hosts are subject to national legislation that may restrict their ability to transact with users from other countries – resulting in practices such as the geo-blocking of users. Such restrictions have been documented in other infrastructures relied on by the Open Science community, such as GitHub (Shanahan and Bezuidenhout, 2022).

Countries that are poorer tend to have poorer access to preprint services. Figure 5 demonstrates that the proportion of services that return an error gradually increases from roughly 10% to roughly 20%, particularly as GDP per capita falls below US$1,000.

As such, these results are not surprising. Global variability in internet bandwidth and download speeds is significant, largely driven by economic disparity, infrastructure investment, and geography.[7] There are stark differences between high-income and low-income regions, as well as between countries and even localities within a single country. Nonetheless, the variability in successful access from LMIC regions undermines the promise of preprinting to equalise access across global regions.

Preprint services based in LMICs perform better than expected. Figure 6 demonstrates that the five preprint services based in LMICs are at least as likely to respond to a download request as those based in the USA. A similar picture is drawn from the comparison between the Danish landing site for AfricArXiv and the UbuntuNet site itself (hosted in Malawi). As connectivity is poorer in territories such as Malawi, we would expect access to that site to be lower than access to the Danish site. There is a need to examine the differences in design, implementation and deployment of preprint services in low-bandwidth settings to understand why their performance matches that of services in high-bandwidth settings. More broadly, this points to an urgent need within the preprinting community to initiate discussions on how the future design of preprinting infrastructures can be better aligned with the access requirements of low-bandwidth settings.

The organisation of URLs and hosting is complex. As noted, a country-specific TLD does not necessarily indicate where the corresponding preprint service is physically located. This raises interesting questions for the preprinting community relating to data sovereignty and ownership. The evidence presented in this paper strongly illustrates that we cannot make assumptions about the global nature of the preprint landscape when services are consolidated in a small number of countries, despite their remits to serve disparate national or regional communities.
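
The kind of mismatch described above can be checked mechanically once a hosting country is known for each service. The sketch below compares the country-code TLD of a URL with a hosting country supplied separately. The hostnames, the `HOST_COUNTRY` mapping and the function names are hypothetical; in practice the hosting country would come from an IP geolocation lookup, which is not shown, and this is not the authors' own code.

```python
from urllib.parse import urlparse

# Hypothetical mapping from hostname to the ISO 3166-1 alpha-2 code of the
# country where the server was geolocated. Entries are invented for illustration.
HOST_COUNTRY = {
    "preprints.example.fr": "US",
    "archive.example.za": "ZA",
    "papers.example.org": "US",
}

# The .uk TLD is used in place of the ISO code GB, so it is mapped explicitly.
TLD_ALIASES = {"uk": "gb"}


def country_tld(url):
    """Return the upper-cased country code implied by a URL's TLD, or None."""
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1]
    # Two-letter ASCII TLDs are country-code TLDs; longer ones are treated as generic.
    if len(tld) == 2 and tld.isalpha():
        return TLD_ALIASES.get(tld, tld).upper()
    return None


def tld_mismatches(urls, host_country):
    """List (url, tld_country, hosted_country) where the two disagree."""
    mismatched = []
    for url in urls:
        tld = country_tld(url)
        hosted = host_country.get(urlparse(url).hostname)
        if tld and hosted and tld != hosted:
            mismatched.append((url, tld, hosted))
    return mismatched


URLS = [
    "https://preprints.example.fr/",
    "https://archive.example.za/",
    "https://papers.example.org/",
]
mismatches = tld_mismatches(URLS, HOST_COUNTRY)
```

Services on generic TLDs are skipped rather than flagged, since their domain names make no claim about nationality.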

Conclusions

Preprint services have been seen as one of the success stories of Open Science. They have been presented as a relatively light-weight technical solution to ensuring that publications are made rapidly available and open for all. Indeed, the transformative role that preprinting has played in advancing access to academic resources deserves celebration.

However, preprinting services are subject to constraints. As with most Open Science infrastructures and services, overreliance on volunteer labour, insecure (and often inadequate) funding streams and poorly articulated sustainability planning have influenced the design and roll-out of the preprint landscape. As we can see from the trends presented in this paper, expediency and cost efficiency have influenced the choices of ISPs and hosting locations.

This paper has also demonstrated that preprint services are making technical choices that may further affect their accessibility. The dependence on three commercial ISPs, together with the fact that most of those services are physically based in the USA, leaves such preprint services exposed to risks across those ISPs. These could include disagreements with one or more ISPs at a national level (“Comunicato stampa | Agcom,” 2026; Sommese et al., 2025) or a sustained outage of a data centre.

The lack of critical reflection regarding the underlying vulnerabilities and dependencies of the preprint landscape gives cause for concern. In particular, given the promises made on behalf of preprinting by prominent supporters of Open Science, including the UNESCO Recommendation on Open Science, we must ask whether the current preprint landscape is in a position to fulfil these expectations. If commercial actors continue to dominate the landscape, and users from low-bandwidth settings continue to experience marginalisation, the question has to be raised whether preprinting really can claim to be equalising – or equitizing – scholarly communication.

More importantly, the febrile state of strategic relationships between countries indicates that there is a need to diversify the geographic representation of preprint services to protect them against such geopolitical shifts. Lessons learnt from the data community in light of the changes made by the Trump Administration strongly support diversification of physical resources (Azevedo et al., 2026). The observed performance of preprint services in LMICs offers further encouragement to explore the North-South distribution of preprint services. In this, the expertise in Latin America will be of significant importance when expanding to other regions in the Global South.[8]

It is clear that far more research is needed to map these vulnerabilities to specific design and deployment decisions. Without such research, the preprint services community will continue to struggle to gain clarity on where this reform in scholarly communication falls short of the Open Science commitments to equity, inclusivity and diversity. Preprint services themselves need to consider these issues and ultimately develop principles and best practices equivalent to, for example, the TRUST principles for Data Repositories (Lin et al., 2020) to address this.

Materials

Source code for accessing the preprint services and the analysis of the data can be found at https://doi.org/10.5281/zenodo.18612617. The data downloaded from the VPNs can be found at https://doi.org/10.5281/zenodo.18435714.

References

Alves, T., 2023. The Preprint Workflow Revolution [WWW Document]. PublishersWeekly.com. URL https://www.publishersweekly.com/pw/by-topic/digital/content-and-e-books/article/92730-the-preprint-workflow-revolution.html (accessed 12.9.25).

Atkinson, D., 2025. In the Mood to Exclude: Revitalizing Trespass to Chattels in the Era of GenAI Scraping. https://doi.org/10.48550/arXiv.2510.16049

Attention arXiv users: arXiv mirrors to shut down September 15th, 2024 – arXiv blog [WWW Document], n.d. URL https://blog.arxiv.org/2024/09/13/attention-arxiv-users-arxiv-mirrors-to-shut-down-september-15th-2024/ (accessed 10.30.25).

Azevedo, F., Bezuidenhout, L., Bosman, J., Ceran, O.M., Costas, R., D’Agostino, A., Gawehns, D., Gregory, K., Gum, J., Hanahoe, H., Havemann, J., Kellam, L., Lee, T., Sesink, L., Shanahan, H., Sheehan, N., Stall, S., 2026. Resilience in Times of Crisis: Strengthening Open Science Against Geopolitical Pressures. Recommendations to the Netherlands National Commission for UNESCO. https://doi.org/10.5281/zenodo.18299450

Badalova, F., Sienkiewicz, J., Mayr, P., 2025. PreprintToPaper dataset: connecting bioRxiv preprints with journal publications. https://doi.org/10.48550/arXiv.2510.01783

Bezuidenhout, L., Havemann, J., 2021. The Varying Openness of Digital Open Science Tools. F1000 9, 1292. https://doi.org/10.5281/ZENODO.4013812

Bezuidenhout, L., Verriet, J., 2025. The withdrawal of the US from UNESCO: What does this mean for Open Science? Leiden Madtrics. URL https://www.leidenmadtrics.nl/articles/the-withdrawal-of-the-us-from-unesco-what-does-this-mean-for-open-science (accessed 12.10.25).

Center for Open Science: Strategic Plan, 2017.

Comunicato stampa | Agcom [WWW Document], 2026. URL https://www.agcom.it/comunicazione/comunicati-stampa/comunicato-stampa-71 (accessed 1.29.26).

Fraser, N., Brierley, L., Dey, G., Polka, J.K., Pálfy, M., Nanni, F., Coates, J.A., 2021. The evolving role of preprints in the dissemination of COVID-19 research and their impact on the science communication landscape. PLoS Biol. 19, e3000959. https://doi.org/10.1371/journal.pbio.3000959

Gregory, K., Zurbach, J., Shankar, K., Mayernik, M., Treloar, A., 2025. Sustaining Knowledge Infrastructures: Asking Questions and Listening for Answers. https://doi.org/10.48550/arXiv.2502.19360

Kramer, B., Bosman, J., 2016. Innovations in scholarly communication – global survey on research tool usage. F1000Research 5, 692. https://doi.org/10.12688/f1000research.8414.1

Lin, D., Crabtree, J., Dillo, I., Downs, R.R., Edmunds, R., Giaretta, D., De Giusti, M., L’Hours, H., Hugo, W., Jenkyns, R., Khodiyar, V., Martone, M.E., Mokrane, M., Navale, V., Petters, J., Sierman, B., Sokolova, D.V., Stockhause, M., Westbrook, J., 2020. The TRUST Principles for digital repositories. Sci. Data 7, 144. https://doi.org/10.1038/s41597-020-0486-7

Polka, J., 2023. Archive of ASAPbio’s list of preprint servers: policies and practices across platforms. https://doi.org/10.5281/zenodo.8230987

Rzayeva, N., Pinfield, S., Waltman, L., 2025. Adoption of Preprinting Across Scientific Disciplines and Geographical Regions (1991-2023). https://doi.org/10.31235/osf.io/xdwc4_v2

Shanahan, H., Bezuidenhout, L., 2025. Putting tragedy in context. https://doi.org/10.31235/osf.io/hqzej_v1

Shanahan, H., Bezuidenhout, L., 2022. Rethinking the A in FAIR Data: Issues of Data Access and Accessibility in Research. Front. Res. Metr. Anal. 7. https://doi.org/10.3389/frma.2022.912456

Sommese, R., Sperotto, A., Prado, A., Ham, J. van der, Affinito, A., 2025. 90th Minute: A First Look to Collateral Damages and Efficacy of the Italian Piracy Shield, in: 2025 21st International Conference on Network and Service Management (CNSM). Presented at the 2025 21st International Conference on Network and Service Management (CNSM), pp. 1–9. https://doi.org/10.23919/CNSM67658.2025.11297497

Teixeira da Silva, J.A., 2017. The preprint wars. AME Med. J. 2. https://doi.org/10.21037/amj.2017.05.23

UNESCO, 2022. UNESCO Recommendation on Open Science. UNESCO, Paris.

Walsh, K., 2025. The future of data centres in Ireland.

Notes

[1] As of 2025 the Bill and Melinda Gates Foundation no longer supports article-processing charges for grantees, instead encouraging preprint publication to ensure open access https://openaccess.gatesfoundation.org/open-access-policy/2025-open-access-policy/

[2] Platforms that peer-review and curate preprints instead of requiring submission of new manuscripts

[3] https://101innovations.wordpress.com/

[4] https://jrost.org/

[5] https://www.rd-alliance.org/groups/mapping-the-landscape-of-digital-research-tools-ii-maldreth-ii/activity/

[6] https://infrafinder.investinopen.org/solutions

[7] https://radar.cloudflare.com/quality

[8] Organisations in Latin America, including https://www.scielo.org/en/ and https://www.lareferencia.info/en/ have extensive experience in Open Access publishing in LMICs.

Editors

Kathryn Zeiler
Editor-in-Chief

Adrian Barnett
Handling Editor

Editorial assessment

by Adrian Barnett

DOI: 10.70744/MetaROR.360.1.ea

This observational study looks at some interesting questions for preprint servers and their locations. Both reviewers are generally positive and feel the paper makes a useful contribution to the understanding of preprints.

Both reviewers had issues with Figure 5, and one reviewer makes a valid point about the trend line being relatively unconvincing. A good robustness analysis here would be a leave-one-country-out sensitivity analysis. Also, the methods say this is a regression line, but the black line is not linear and the figure legend mentions LOESS; the details of the method used should be given in the text.

Both reviewers want to see more statistical tests, although I am less in favour and would prefer a focus on the practical significance of any differences.

One reviewer points out the need for more literature on open science. For completeness, it may also be useful to acknowledge some of the negative and/or sceptical voices on preprints.

Recommendations from the editor

The accessibility tests depend on the time of day because of network congestion. This means the authors should re-run the queries at the same time in each country and on a day that is a typical workday in most countries. The current tests may be at 2pm in some countries and 2am in others.

It would be useful to give a brief explanation of who ASAPbio are (page 6).

Figure 3 could be rotated by 90 degrees to make the ISPs easier to read.

Something in the discussion about the growing pressure on preprint servers due to LLM-generated submissions might be worthwhile.

There are a lot of acronyms (e.g., ISP, VPN, TLD, CGA), so a table of acronyms would be useful.

Recommendations for enhanced transparency

  • Properly identify the statistical software used in the research, including version numbers. Identifying information should be reported in the text of the article, in an appendix, in supplementary materials, or in the source code and scripts documentation.
  • Software packages (e.g., Stata, SPSS, SAS, R) used in the research should be cited in detail in the reference section.
  • Add author ORCID IDs.
  • Add an email address for the corresponding author.
  • Add an author contribution statement.
  • Add a competing interest statement. Authors should report all competing interests, including not only financial interests, but any role, relationship, or commitment of an author that presents an actual or perceived threat to the integrity or independence of the research presented in the article. If no competing interests exist, authors should explicitly state this.
  • Add a funding source statement. Authors should report all funding in support of the research presented in the article. Grant reference numbers should be included. If no funding sources exist, explicitly state this in the article.

For more information on these recommendations, please refer to our author guidelines.

Competing interests: None.

Peer review 1

Jonathan Wheeler

DOI: 10.70744/MetaROR.360.1.rv1

The article describes a study of ASAPbio preprint servers to accurately geolocate them and determine their internet service provider, and to test access to each server from a large set of VPNs representative of global regions and national income levels (per classification by high, middle, and low income countries). The article makes a compelling contribution to the literature on Open Science, specifically with regard to infrastructure concerns and related disparities in national and regional participation in Open Science. First, the authors present a concise but coherent overview of the factors that have led to increasing adoption of preprints across research disciplines, including a nuanced discussion of the connection between preprinting and the promises and pitfalls of Open Science (for example the potential of preprinting services to reinforce US/Euro-centric hierarchies and research methodologies). Through this discussion, both the value of preprinting services and the significance of their vulnerability to disruption are highlighted. Second, the authors’ detailed analysis of where ASAPbio preprint servers are actually located, in occasional contrast with the location implied by a site’s top level domain or landing page, is a significant contribution that can inform and impact future research into Open Science infrastructure and webometrics in general. Specifically, if the results can be generalized to other disciplines and preprint services, the observed consolidation of servers among commercial service providers in the US sheds new light on previous findings related to high error rates and other disparities of access from locations in low-resourced countries.

Strengths and weaknesses, with suggestions for improvement:

  • The literature review is thorough and key works are cited. Some additional citations may strengthen the argument, in particular in the section on “Preprinting as a response to equity and accessibility.” This section includes some generalizations about the promises of Open Science – for example lowering barriers to scholarly communication – that are well described in the literature and can be supported with relevant citations. As the discussion shifts here and introduces factors that complicate or mitigate the impact of Open Science practices, additional grounding in the literature will reinforce the contrast between aspiration and reality.
  • The data collection and analysis methodologies are well described and appropriate to the findings as presented. The data are available on Zenodo, though the authors may consider citing the dataset at points where it is referenced in the article, in addition to the data availability statement at the end.
  • The inclusion of multiple tables and figures is appreciated, and helps not only to illustrate the findings but also to differentiate between the results of the various methods the authors used to analyze and explore the data. The presentation of the data in the tables and figures is clear and the corresponding discussions are thorough, though some additional discussion of Table 1 within the text would be helpful. It’s a large table and demonstrates a lot of variation in the management and administration of preprint services, so an overview of key differences (for example, inconsistent DOI practices) within the text would be helpful.
  • The inclusion of Denmark and Malawi in Figure 5 is unclear, as no VPN data were collected from these countries. The relevance to the number of returned errors based on ISP and VPN location relative to the GDP of the VPN host country is understood, but noting these two countries in figure 5 based on their GDP suggests that connection requests to a server in Malawi will return more errors than requests to a server in Denmark. Yet the actual findings are in contrast to this assumption. Perhaps it would be more clear to note each country’s GDP in Table 3?
  • Inclusion of p-values or other indicators of the significance and size of the effect of GDP on error rates may reinforce the findings presented in Figure 5. This is an important result, and the implications are thoughtfully addressed in the discussion and conclusion. Additional detail on effect size and statistical significance may further highlight the problems of disparity of access and the vulnerability of Open Science infrastructure to commercial and political interests. It is understood that the smaller sample sizes for the other analyses make significance and ability to generalize findings more difficult to determine.

Competing interests: I have no competing interests, based on MetaROR’s information for reviewers. In full transparency, I have formerly collaborated with the authors on similar research, but we have not worked together since spring of 2024. I did not contribute to this project or paper, and I have no competing scholarly or financial interest in its publication.

Peer review 2

Jessica Polka

DOI: 10.70744/MetaROR.360.1.rv2

In “The localisation and accessibility of preprint services: implications for Open Science,” the authors access preprint servers from VPNs to determine accessibility from different geographic regions. This is an important topic, researched with transparent methods, revealing a need to ensure access to preprinting for users worldwide. However, some of the conclusions drawn need to be nuanced and clarified.

MAJOR COMMENTS

  • In the discussion, in the section called “Countries that are poorer tend to have poorer access to preprint services.” – I assume there’s a typo here and this text is intended to refer to figure 5. The trend line (and raw data) do not convince me of the conclusion. Perhaps a more qualified version of the claim could be made about the bottom quartile of GDP; I would like to see a statistical test employed.
  • In “Preprint services based in LMICs perform better than expected.” – I believe this refers to figure 6. The concept of expectation here is subjective, and it would perhaps be clearer to more directly compare the two groups. It would be helpful to see some statistical tests, but I understand this will be limited by the small sample size of LMIC servers.
  • I had trouble interpreting figure 6. For example, does this mean that NO LMIC preprint servers returned an error for 800+ VPNs, while 20% of the LMIC services returned an error for ~200 VPNs? It would be helpful to clarify the legend, axis labels, and chart label.
  • Given the small sample of preprint services hosted in LMICs, it’s not clear to me that LMIC hosting definitely solves the access problem.
  • The paper would benefit from a discussion of not only the threats, but also the potential benefits of hosting with both large commercial companies, and smaller hosting providers. A more comprehensive analysis would help to guide policy decisions.
  • Do the authors have any concrete policy recommendations? While the paper states that more analysis is needed, perhaps the authors could recommend that similar methods used in the paper could be employed when evaluating hosting providers for future preprint services.

MINOR COMMENTS

  • Missing references – in the introduction, additional citations would be helpful to back up claims. Furthermore, there is a missing reference (labeled “(ref)”) on page 18.
  • Table 1 – I believe bioRxiv uses Highwire
  • All the Irish-hosted preprint services listed on page 13 are associated with F1000, which could perhaps be included in the text as a means of explanation.
  • Page 18 – missing word between “return” and “gradually?”

Thanks to the authors for this important direction, which I believe could result in improved access to open science resources in the future!

Competing interests: None.
