Who owns the knowledge? Copyright, genAI, and the future of academic publishing

Published at MetaROR

March 12, 2026

Cite this article as:

Kochetkov, D. (2025). Who Owns the Knowledge? Copyright, GenAI, and the Future of Academic Publishing. arXiv preprint arXiv:2511.21755.

Dmitry Kochetkov^1,2

1. Centre for Science and Technology Studies (CWTS), Leiden University, Leiden, the Netherlands
2. Institute for the Study of Science, Russian Academy of Sciences, Moscow, Russia

Originally published on January 18, 2026 at:

https://arxiv.org/abs/2511.21755v2

Abstract

The integration of generative artificial intelligence (GenAI) and large language models (LLMs) into scientific research and higher education presents a paradigm shift, offering transformative opportunities while simultaneously raising profound ethical, legal, and regulatory questions. This study examines the complex intersection of AI and science, with a specific focus on the challenges posed to copyright law and the principles of open science. The author argues that current regulatory frameworks in key jurisdictions like the United States, China, the European Union, and the United Kingdom, while aiming to foster innovation, contain significant gaps, particularly concerning the use of copyrighted works and open science outputs for AI training. Widely adopted licensing mechanisms, such as Creative Commons, fail to adequately address the nuances of AI training, and the pervasive lack of attribution within AI systems fundamentally challenges established notions of originality. While current doctrine treats AI training as potentially fair use, this paper argues such mechanisms are inadequate and that copyright holders should retain explicit opt-out rights regardless of fair use doctrine. The author therefore advocates for upholding authors' rights to refuse the use of their works for AI training and proposes that universities assume a leading role in shaping responsible AI governance. The conclusion is that a harmonized international legislative effort is urgently needed to ensure transparency, protect intellectual property, and prevent the emergence of an oligopolistic market structure that could prioritize commercial profit over scientific integrity and equitable knowledge production. This is a substantially expanded and revised version of a work originally presented at the 20th International Conference on Scientometrics & Informetrics (Kochetkov, 2025).

Full text

Introduction

While artificial intelligence (AI) research dates back to the 1956 Dartmouth Conference (Strickland, 2021), recent advances in deep learning and natural language processing (NLP) (Vaswani et al., 2017) have enabled large language models (LLMs) capable of processing and generating human-like content at unprecedented scale. These capabilities have particular significance for scientific publishing, where LLMs are increasingly deployed for tasks ranging from peer review assistance (Zhang & Abernethy, 2025; Zhuang et al., 2025) to automated literature reviews (De La Torre-López et al., 2023).

The development of AI technology presents both challenges and opportunities across various fields (Rama Padmaja & Lakshminarayana, 2024; Wolff et al., 2018). While AI offers immense potential, its advancement raises ethical concerns, including biases, privacy issues, and broader social implications (Rama Padmaja & Lakshminarayana, 2024). AI’s influence spans all five dimensions of sustainability, with both positive and negative consequences (Khakurel et al., 2018). For instance, an analysis of a Google Scholar sample of questionable scientific papers suspected to be generated by GPT revealed that many address applied, often controversial issues prone to misinformation, such as environment, health, and computing (Haider et al., 2024).

AI is transforming research jobs, and in turn, science provides LLMs with a vast amount of data for training. However, LLMs may pose a threat to copyright, as they can generate content that potentially violates intellectual property rights (German, 2024). Currently, neither copyright nor “open” licenses can protect scholarly content from reuse in AI training without author consent (Decker, 2025). This fact raises fundamental questions that existing legal frameworks struggle to address.

The intersection of AI training and copyright law has generated substantial legal scholarship examining how different jurisdictions approach these challenges. Sag & Yu (2024) identify an emerging international equilibrium where countries recognize text and data mining as potentially valuable while maintaining some copyright protections, driven by the centrality of the idea-expression distinction, global AI competition, and reform convergence. Chopra et al. (2025) offer a comparative analysis of how courts and policymakers in multiple jurisdictions address generative AI’s impact on copyright and personality rights, highlighting divergent regulatory approaches between U.S. litigation-based doctrines, EU statutory frameworks, and emerging jurisprudence in other legal systems. The European Parliament’s 2025 study on generative AI and copyright examines structural risks to the European creative economy, calling for harmonized opt-out mechanisms and equitable licensing models (Lucchi, 2025). Senftleben (2025) proposes policy pathways to balance innovation with creator protection.

This study examines how generative AI and LLMs challenge existing copyright frameworks in scholarly publishing, an issue that remains underdeveloped in current policy discussions. Within the broader discussion about AI’s impact on science, I focus specifically on three interconnected issues: (1) whether current licensing mechanisms adequately protect scholarly works from unauthorized AI training, (2) how regulatory frameworks across major jurisdictions address these copyright concerns, and (3) what actions stakeholders can take to establish fairer governance of AI training on academic content. The focus on these three issues responds to an urgent gap in current policy discussions, where the use of publicly available research outputs for training large language models remains largely unregulated.

AI-Related Regulations

In this section, I provide an analysis of the regulations related to artificial intelligence (AI) in the United States, China, the United Kingdom, the European Union, and major international initiatives influencing national AI legislation.

United Kingdom

Interestingly, there is currently no comprehensive regulation governing AI in either the UK or the United States. The Sunak government issued a framework document in 2023 titled A Pro-Innovation Approach to AI Regulation (2023), which establishes basic principles for AI. The document promotes flexible regulation and aims to foster innovation through the development and use of AI technologies. The British government has also expressed its ambition to make the UK the best place to invest in AI.

The Artificial Intelligence (Regulation) Bill, reintroduced as a Private Members’ Bill by Lord Holmes of Richmond on March 4, 2025, represents a significant departure from the government’s earlier voluntary approach by proposing the creation of a centralized AI Authority to coordinate regulatory oversight across sectors, establish regulatory sandboxes for testing AI innovations with real consumers, and mandate that all organizations developing, deploying, or using AI designate AI officers and undergo independent audits by accredited third parties. The bill specifically addresses the copyright protection regulatory gap by requiring organizations involved in AI training to report all third-party data and intellectual property used to the AI Authority with assurances of informed consent and compliance with copyright laws, alongside mandatory health warnings, labeling, and consent options for AI products and services, thus providing the transparency and accountability mechanisms around training data provenance that have been largely absent from other AI regulatory frameworks.

The Data (Use and Access) Act 2025, which received Royal Assent on June 19, 2025, represents a foundational intervention: rather than establishing immediate copyright protection, it creates a statutory framework for future copyright and AI training governance. The Act does not directly prohibit unlicensed AI training or establish opt-out rights; instead, it imposes government obligations to evaluate whether such protections are needed and to report recommendations to Parliament. The UK government has established three competing policy objectives that any copyright-AI framework must balance: (1) control (rightsholders’ ability to license and monetize), (2) access (developers’ lawful access to training data), and (3) transparency (a clear framework with disclosure of training sources). The Act also places two obligations on the Secretary of State, both due by March 18, 2026: (1) an economic impact assessment of all policy options and (2) a comprehensive report on the use of copyright works in AI development. The December 2025 progress statement indicates the four policy options under consideration:

Option 0: do nothing
Option 1: strengthen copyright – requiring licensing in all cases
Option 2: a broad data mining exception¹
Option 3: a data mining exception with rights reservation underpinned by supporting measures on transparency (Government’s preferred option).

Option 3 assumes default permission for AI training unless the copyright holder expressly reserves rights (an opt-out mechanism). It appears balanced but depends on technical opt-out standards. Notably, the consultation received over 11,500 responses, with 88% supporting mandatory licensing (Option 1) versus only 3% supporting the government’s preferred opt-out framework (Option 3). This result suggests a public mandate for author-protective policy that may fundamentally reshape the UK’s regulatory trajectory.

United States

In the United States, a framework document was published in October 2023, titled Executive Order 14110 on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence (2023). Notably, this document included actions related to copyright law, stating: “…consult with the Director of the United States Copyright Office and issue recommendations to the President on potential executive actions relating to copyright and AI. The recommendations shall address any copyright and related issues discussed in the United States Copyright Office’s study, including the scope of protection for works produced using AI and the treatment of copyrighted works in AI training.”

A significant step forward was taken with the development of the Generative AI Copyright Disclosure Act of 2024 (Schiff, 2024). This act aims to ensure transparency in the use of copyrighted works for AI training and is currently under consideration in the House of Representatives. If passed, the act would require companies to notify the U.S. Copyright Office about any copyrighted works used in their AI systems. These notifications must be submitted 30 days before or after the public release of the AI system, ensuring transparency and accountability. The act is intended to help copyright holders make informed decisions about licensing and compensation. However, the wording of the document remains vague, raising questions for both AI developers and copyright owners. Additionally, I have concerns about the inability of copyright holders to prohibit the use of their works for AI training, which creates a bias in favor of big-tech AI development.

The future of this Act is unclear because Trump rescinded Executive Order 14110 during the first days of his presidency. His Executive Order 14179, signed on January 23, 2025, articulated the need to develop a new approach to AI development (Executive Order 14179 Removing Barriers to American Leadership in Artificial Intelligence, 2025). This approach was formulated in the AI Action Plan (2025), which prioritizes “American values” like free speech in AI development, supports open models, streamlines permitting for infrastructure, supports American workers, and builds global alliances on AI standards and security. However, I did not find any mention of copyright in the document.

The AI Action Plan was accompanied by several executive orders, namely:

Promoting the export of the American AI technology stack
Accelerating federal permitting of data center infrastructure
Preventing “Woke AI” in the Federal Government

The documents signed on July 23, 2025, clearly indicate that AI, like many other spheres of American social life under Trump’s administration, has become highly politicized.

One of the most significant recent developments in state-level AI regulation is the California Assembly Bill 2013 (AB 2013), Generative Artificial Intelligence: Training Data Transparency Act (2024), which came into effect on January 1, 2026. AB 2013 mandates that developers of generative AI systems released or substantially modified after January 1, 2022, and offered for public use in California, must publicly disclose a “high-level summary” of their training data on their official websites. Exempt are systems used for cybersecurity or security testing, aircraft operation in national airspace, and national security, military, or defense purposes. AB 2013 does not establish opt-out rights for copyright holders, does not prohibit unlicensed training, and does not provide compensation mechanisms.

However, the viability of state-level AI regulation like AB 2013 is now in serious doubt. On December 11, 2025, President Trump signed an Executive Order, Ensuring a National Policy Framework for Artificial Intelligence, which represents a dramatic reversal of regulatory momentum toward state autonomy. The Executive Order explicitly characterizes state AI laws as obstacles to U.S. global AI competitiveness, stating that they create ‘a patchwork of 50 different regulatory regimes’ that hinder innovation and U.S. dominance over China.

China

In China, the Interim Measures for the Management of Generative Artificial Intelligence Services were implemented on August 15, 2023. These regulations, comprising 24 articles, aim to strike a balance between fostering innovation and ensuring the security and governance of AI. Article 3 emphasizes the importance of maintaining a harmonious relationship between development and innovation while prioritizing security and governance in the field of AI. Articles 5 and 6 highlight the need for collaboration in developing basic technologies, such as chips and software platforms, as well as the creation of shared data resources. Article 16 states that all regulatory measures must be compatible with innovation, and Article 2 clarifies that the regulations apply only to publicly available generative AI services. Service providers are held responsible for the content generated using their services. Chinese regulations are among the most stringent in the world. For example, Article 12 mandates that users must be informed when content is generated using AI as a blanket rule.

On July 26, 2025, just three days after Trump issued his AI Action Plan, China unveiled its Global AI Governance Action Plan. The document details a comprehensive international framework for AI development and deployment. Unlike the U.S. plan, which centers on national priorities and addressing ideological biases in AI systems, China’s approach focuses on global coordination, emphasizing multilateralism, openness, and technological support for developing countries. It does not generally address copyright protection in detail, but it does mention data and privacy protection, the lawful use of training data, and the need to actively explore regulated data transactions in AI. The document frames the creation of high-quality, legitimate data sets as essential for AI development, with safeguards in place, implying that intellectual property (including copyright) should be respected in the pursuit of innovative data governance.

European Union

On August 1, 2024, the European Artificial Intelligence Act (AI Act) entered into force (Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024, 2024). Article 53(1) of the EU AI Act, which became effective on August 2, 2025, establishes specific copyright-related obligations for providers of general-purpose AI (GPAI) models. Specifically, Articles 53(1)(c) and (d) require GPAI providers to: (i) implement a policy to identify and comply with reservation of rights expressed under the Copyright Directive (EU) 2019/790, and (ii) publish a sufficiently detailed summary of the content used for model training.

The Copyright Directive (EU) 2019/790 introduced two landmark exceptions for text and data mining that directly apply to AI training activities (Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on Copyright and Related Rights in the Digital Single Market and Amending Directives 96/9/EC and 2001/29/EC (Text with EEA Relevance.), 2019). These provisions establish a dual system: a broad exception with an opt-out mechanism for commercial use (art. 4), balanced against a narrower exception for scientific research (art. 3).

To support these obligations, the European Commission published a template for GPAI model providers to summarise the data used to train their models, alongside the General-Purpose AI Code of Practice (Commission Presents Template for General-Purpose AI Model Providers to Summarise the Data Used to Train Their Model, 2025; The General-Purpose AI Code of Practice, 2025). The template balances transparency requirements with protection of intellectual property rights such as trade secrets, allowing providers to describe private datasets in a general manner without disclosing specific works. Importantly, the template supports the opt-out model, enabling rightsholders to reserve their rights against text and data mining through machine-readable protocols.

Providers of GPAI models placed on the EU market before 2 August 2025 have until 2 August 2027 to publish their summaries in compliance with the new template. Importantly, the AI Act requires qualifying providers to comply with Article 53 regardless of where the GPAI model was trained.

To support implementation of the opt-out mechanism, the European Commission launched an official stakeholder consultation on copyright compliance under the EU AI Act and the GPAI Code of Practice, running from 1 December 2025 to 23 January 2026 (extended from the original 9 January 2026 deadline). The consultation, supported by the EU Intellectual Property Office (EUIPO), seeks to identify state-of-the-art, technically implementable machine-readable protocols for expressing reservations of rights against text and data mining (Uuk, 2025).
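As an illustration of what honoring a machine-readable reservation can involve on the crawler side, the sketch below uses robots.txt, one of the signals mentioned in connection with Article 4(3) of the Copyright Directive. It relies only on Python's standard-library urllib.robotparser. The crawler name ExampleAIBot, the robots.txt content, and the article URL are hypothetical, and the sketch makes no claim about which protocol the consultation will endorse or what suffices as a legally valid reservation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve. "ExampleAIBot" is an
# invented crawler name used only for illustration.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "https://journal.example.org/articles/1234"  # hypothetical URL
    # The AI crawler is blocked site-wide; other agents are allowed.
    print(may_fetch(ROBOTS_TXT, "ExampleAIBot", article))
    print(may_fetch(ROBOTS_TXT, "GenericBrowser", article))
```

A crawler that respects such signals would run a check of this kind before collecting a page for training. Note that robots.txt expresses crawl permissions per site and user agent rather than rights per work, so it is at best a coarse proxy for a rights reservation.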

However, the implementation timeline of the EU AI Act has been subject to revision through the Digital Omnibus Regulation Proposal (2025), unveiled by the European Commission on 19 November 2025. Most significantly for copyright compliance, the Digital Omnibus does not postpone the Article 53 copyright transparency requirements, which entered into force on 2 August 2025 and remain applicable. The Digital Omnibus proposal must still be approved by the European Parliament and Council, with adoption not expected before mid-2026.

The EU has been preparing a comprehensive strategy to accelerate the responsible and impactful use of artificial intelligence in science, with major policy developments expected by the end of 2025. Developments within this strategy include the Living guidelines on the responsible use of generative AI in research (Directorate-General for Research and Innovation, 2025), as well as key initiatives such as a distributed AI infrastructure and a European AI Research Council (Artificial Intelligence (AI) in Science, 2025).

On October 8, 2025, the European Commission released its European Strategy for Artificial Intelligence in Science (A European Strategy for Artificial Intelligence in Science Paving the Way for the Resource for AI Science in Europe (RAISE), 2025), a document which promises to significantly influence copyright governance concerning AI training data within the scientific publishing ecosystem. The strategy explicitly acknowledges the necessity of gathering evidence to improve both access to and the reuse of publicly funded research outputs. This focus suggests a potential shift in policy, one that could broaden the permissible uses of openly accessible research for the purposes of training AI systems.

A central element of the strategy is its emphasis on data governance, particularly through established initiatives like the European Open Science Cloud (EOSC) and the planned Data Labs within AI Factories. These structures imply a movement toward more formalized and structured permissions frameworks for data access. Yet, it is notable that the document remains largely silent on the specific copyright protections afforded to authors whose published works constitute the raw material for these data spaces. Moreover, the strategy does not adequately differentiate between the long-accepted practice of text and data mining for non-commercial, human-led research and the increasingly prevalent commercial application of AI training, a distinction with profound legal implications.

Whether the strategy’s promise to update the “Living Guidelines on the responsible use of generative AI in research” and to establish new ethics review processes will provide a viable pathway for addressing these unresolved copyright concerns remains an open question. These proposed mechanisms may indeed offer a forum for such discussions, but the initial communication leaves considerable room for interpretation regarding the balance between fostering AI innovation and safeguarding intellectual property rights in academia.

The European Union’s approach to AI regulation represents the most comprehensive and binding framework globally, with specific mechanisms to address copyright concerns that have been largely absent from other jurisdictions. While the EU AI Act initially focused primarily on risk management and transparency, the implementation through Article 53, the GPAI Code of Practice, and the ongoing stakeholder consultation process demonstrates a substantive effort to operationalize copyright protection in the AI training context. The opt-out model, requiring GPAI providers to respect machine-readable rights reservations, represents a significant advance over frameworks that merely defer to existing copyright laws. However, the Digital Omnibus proposal reveals ongoing political tensions between innovation promotion and regulatory enforcement, with industry pressure leading to delays in some areas while copyright compliance timelines remain intact. The ultimate effectiveness of these measures will depend on international adoption, technical standardization of opt-out protocols, and enforcement mechanisms yet to be fully developed.

Global Initiatives

The OECD AI Principles (2019) set global, value-based guidance for trustworthy AI: promote inclusive growth and well-being; respect human rights, the rule of law and democratic values; ensure transparency and explainability; build robustness, security and safety; and maintain accountability across the AI lifecycle. Governments are urged to invest in R&D, enable high-quality and representative data, build interoperable ecosystems, adopt agile and outcome-based regulation (including experimentation), and prepare society and workers for AI-driven transformation. The Principles influence many national and international frameworks and emphasize privacy, non-discrimination, and responsible data use. At the same time, they do not create copyright rules or specifically resolve training-data copyright questions; rather, they are rooted in existing IP and legal regimes and encourage accountability and transparency practices that can complement copyright compliance.

The UNESCO Recommendation on the Ethics of Artificial Intelligence (2022), adopted by all 193 member states in November 2021, establishes the first global standard for AI ethics grounded in human rights, dignity, and environmental protection. The framework promotes four foundational values (human rights and dignity; living in peaceful societies; ensuring diversity and inclusiveness; environment and ecosystem flourishing) and ten key principles including proportionality, human oversight, transparency, explainability, accountability, and fairness. It provides extensive policy action areas covering data governance, education, gender equality, culture, health, and environment, while requiring member states to implement ethical impact assessments and oversight mechanisms throughout AI lifecycles. While it calls for transparency about data sources and requires compliance with legal frameworks, the Recommendation does not establish specific copyright rules but emphasizes the importance of respecting existing intellectual property rights and international law.

The G7 Hiroshima Process, launched in May 2023 and finalized in December 2023, created an international framework for advanced AI systems through the Hiroshima AI Process Comprehensive Policy Framework, which includes International Guiding Principles for all AI actors and a voluntary International Code of Conduct for organizations developing advanced AI systems (Hiroshima Process International Guiding Principles for Advanced AI System, 2023). The principles mention protecting intellectual property and implementing “appropriate data input measures and protection for personal data and IP” (Principle 11), but, like other international frameworks, they defer to existing copyright laws rather than creating new IP protections.

*Table 1: AI-Related Regulations*
Country	Document	Year	Main Points	Copyright Protection
United Kingdom	A Pro-Innovation Approach to AI Regulation	2023	Framework document establishing basic principles for AI; promotes flexible regulation; aims to foster innovation through AI development and use; no comprehensive AI regulation	No specific mention of copyright protection in the framework
United Kingdom	Artificial Intelligence (Regulation) Bill	2025	Private Members’ Bill reintroduced by Lord Holmes; proposes creation of AI Authority for regulatory coordination; establishes regulatory sandboxes; requires designated AI officers; mandates independent AI audits; requires health warnings and labeling for AI products	Specifically requires organizations training AI to report all third-party data and IP used to AI Authority with assurances of informed consent and compliance with copyright laws; addresses transparency around training data provenance and IP usage
United Kingdom	Data (Use and Access) Act 2025	2025	Does not immediately change copyright law. Establishes mandatory government assessment and reporting requirements (Sections 135-136).	Does not currently establish copyright protections but mandates comprehensive government assessment by March 18, 2026.
United States	Executive Order 14110 (RESCINDED)	2023	Safe, secure, and trustworthy development and use of AI; included actions related to copyright law; recommendations on copyright and AI issues	Specifically addressed copyright issues, including scope of protection for AI-produced works and treatment of copyrighted works in AI training
United States	Generative AI Copyright Disclosure Act (H.R.7913)	2024	Requires companies to notify U.S. Copyright Office about copyrighted works used in AI systems 30 days before/after public release; ensures transparency and accountability	Directly addresses copyright by requiring disclosure of copyrighted training data, but criticized for being vague and biased toward AI developers rather than copyright holders
United States	Executive Order 14179 & AI Action Plan 2025	2025	Removes barriers to “American AI leadership”; prioritizes “American values” like free speech; supports open models; streamlines infrastructure permitting; highly politicized approach	No mentions of copyright found in the document, representing a departure from previous copyright considerations
United States (California)	Assembly Bill 2013 (AB 2013): Generative AI Training Data Transparency Act	2025	Applies to developers of generative AI systems released or substantially modified after Jan. 1, 2022, made available to Californians (free or paid). Developers must post on their website a “high-level summary” of training data.	First U.S. legislation specifically targeting copyright in AI training data disclosure. Requires public disclosure of intellectual property status of training datasets, including copyright, trademark, and patent status. The law does not provide an opt-out mechanism. There is a serious concern that this law will be blocked by the federal government.
China	Interim Measures for the Management of Generative AI Services	2023	24 articles balancing innovation with security/governance; applies only to publicly available services; service providers responsible for generated content; mandatory AI labeling	Does not specifically address copyright protection; focuses on content control and governance rather than IP rights
China	Global AI Governance Action Plan 2025	2025	Comprehensive international framework emphasizing multilateralism, openness, and technological support for developing countries; focuses on global coordination	Mentions data and privacy protection, lawful use of training data, and need for legitimate datasets, implying IP rights should be respected but no specific copyright provisions
European Union	Copyright Directive (EU) 2019/790 (Digital Single Market Directive)	2019	Introduces two text and data mining (TDM) exceptions: Article 3 (mandatory exception for scientific research with no opt-out mechanism) and Article 4 (broad exception for all TDM including commercial use, with opt-out mechanism); rightsholders can reserve rights through machine-readable means for publicly available online content	Establishes opt-out framework for AI training: Article 4(3) allows rightsholders to expressly reserve rights against TDM in “appropriate manner” (robots.txt, metadata, terms of service for online content); default permits TDM without authorization unless opt-out expressed; directly linked to EU AI Act Article 53(1)(c) which requires GPAI providers to identify and respect these reservations; Article 3 provides mandatory exception for scientific research without opt-out option
European Union	EU Artificial Intelligence Act (AI Act)	2024	Entered into force August 1, 2024; reduces AI-associated risks; focuses on high-risk AI systems; requires transparency for low-risk systems; mandatory labeling of AI-generated content	Articles 53(1)(c) and (d) require GPAI providers to: (i) implement a policy to identify and comply with reservation of rights; (ii) publish a “sufficiently detailed summary” of the content used for model training
European Union	GPAI Code of Practice	2025	Voluntary code providing detailed implementation guidance for EU AI Act obligations; three chapters on transparency, copyright, and safety; commits signatories to respect machine-readable opt-out protocols; requires copyright policies and safeguards against infringing outputs	Directly addresses copyright through five concrete measures: copyright policy requirement; lawful data access; respect for rights reservations (“do not train” signals); mitigation of infringing outputs; complaint mechanisms for rightsholders
European Union	Article 53 Training Content Summary Template	2025	Template published by AI Office requiring GPAI providers to disclose websites from which they sourced training data; balances transparency with trade secret protection; applies to models placed on EU market before 2 August 2025 with compliance deadline of 2 August 2027	Implements opt-out model, i.e. enables rightsholders to reserve rights against text and data mining; requires disclosure of training data sources but protects proprietary information; extraterritorial effect regardless of where model was trained
European Union	EU Commission Consultation on TDM Opt-Out Protocols	2025-2026	Stakeholder consultation (1 December 2025 – 23 January 2026) to identify machine-readable protocols for expressing rights reservations; supported by EUIPO; seeks agreement on common list of opt-out solutions for GPAI Code signatories	Operationalizes copyright opt-out mechanism; identifies technical standards for rightsholders to reserve rights under Copyright Directive Article 4(3); establishes regularly reviewed list of agreed machine-readable solutions
European Union	Digital Omnibus Proposal	2025	Proposes to delay enforcement of some provisions from August 2026 to December 2027; response to industry lobbying and concerns about implementation readiness; does not delay Article 53 copyright requirements	Does not affect copyright compliance timeline: Article 53 transparency and opt-out requirements remain applicable on original schedule
European Union	Living Guidelines on Responsible Use of Generative AI in Research	2025	Part of comprehensive strategy to accelerate responsible AI use in science; aims to enhance innovation, competitiveness, ethical deployment, and international leadership	Does not explicitly address copyright protection; mentions only “unpublished and sensitive work” but no specific IP protections
International (OECD)	OECD AI Principles	2019	Global, value-based guidance for trustworthy AI; promotes inclusive growth, human rights respect, transparency, robustness, and accountability; influences many national frameworks	Does not create specific copyright rules; defers to existing IP and legal regimes; encourages accountability and transparency that can complement copyright compliance
International (UNESCO)	AI Ethics Recommendation	2021	Adopted by 193 member states; first global standard for AI ethics; four foundational values and ten key principles; extensive policy action areas	Does not establish specific copyright rules but emphasizes respecting existing intellectual property rights and international law; requires compliance with legal frameworks
International (G7)	Hiroshima Process Comprehensive Policy Framework	2023	International framework for advanced AI systems; includes International Guiding Principles and voluntary Code of Conduct; emphasizes safe, secure, trustworthy AI development	Mentions protecting intellectual property and implementing “appropriate data input measures and protection for personal data and IP” but defers to existing copyright laws rather than creating new IP protections

Copyright, Licensing, and Legal Analysis

Copyright Framework and Creative Commons Licensing

Most scientific works are protected by copyright law. Copying and retaining these works in AI systems, as well as reproducing them in outputs, implicates copyright, making appropriate licensing essential for compliance (Johnson, 2024). The generated output can be considered a derivative work, although this is not explicitly stated in any legal document.

Creative Commons (CC) licenses are the most widely used for open-access outputs. Approximately 28% of global research output is licensed under the Creative Commons Attribution license (CC BY), while another 22% uses more restrictive Creative Commons licenses (Pollock & Michael, 2024). However, Creative Commons acknowledges that existing CC licenses do not fully address the specific challenges related to using creative works for AI training (Walsh, 2023).

On the other hand, if the use of content is subject to copyright exclusions, the licensee’s abilities are limited. In fact, such an exclusion is currently being considered for legislation in the United States. Moreover, the U.S. fair use doctrine allows for the unlicensed use of copyrighted works under certain circumstances. AI training is often considered a case of fair use (Johnson, 2024; Walsh, 2023). For instance, OpenAI argues that this position is “supported by long-standing and widely accepted precedents” (OpenAI and Journalism, 2024).

Publishers are also responding to market changes by developing licensing agreements for the use of content in LLM training (Schonfeld, 2024). Currently, the number of such deals is relatively low², and they primarily cover content distributed through subscription services. If a publishing contract includes the full transfer of rights to the publisher, the publisher can license the content for AI training without seeking the authors’ consent (Hansen, 2024). This underscores the importance of the rights retention strategy. Major publishers, along with Clarivate, are rapidly developing new AI-based businesses, which are evolving into data cartels (Pooley, 2024). This could lead to a situation where the academic AI market adopts the same oligopolistic structure as the current academic publishing market.

In June 2025, Creative Commons unveiled CC Signals (“Introducing CC Signals,” 2025), an initiative widely regarded as the most substantial development in open licensing since the suite’s initial introduction. This move responds to a recognized shortfall in the capacity of existing Creative Commons licenses to govern rights concerning AI training data. Rather than operating as a simple binary of permission or denial, CC Signals proposes a system of preference signals. Its stated aim is to increase reciprocity and “sustain the commons in the age of AI.” This will allow dataset curators to express conditional preferences for machine-based content reuse. A critical aspect of the design is its dual nature as both a technical and a legal instrument. The signals are engineered to be machine and human readable, a feature intended to provide flexible application across what might be termed legal, technical, and normative contexts. One might ask, however, whether such flexibility can be standardized effectively across diverse jurisdictions.

An alpha version was initially planned for release in November 2025; however, as of January 2026, CC Signals remains in the development phase. The development process is being conducted, in the words of Creative Commons, “alongside our partners and community,” a commitment evidenced by the solicitation of input through a GitHub repository (Creativecommons/Cc-Signals, 2025/2025). This approach suggests an iterative methodology in which the technical specifications are refined in direct response to stakeholder feedback, a process that may prove crucial for the framework’s eventual adoption and legitimacy. The ultimate success of this endeavor remains to be seen and will depend on both technical implementation and community buy-in.

The Really Simple Licensing (RSL) standard, launched in September 2025 and formalized as RSL 1.0 in December 2025 (Eayrs, 2025), represents the next step and a potentially significant moment in machine-readable licensing frameworks. RSL operates as an open standard enabling publishers to embed machine-readable licensing terms directly into web content metadata. Unlike static legal documents, RSL terms execute automatically when AI crawlers access content, eliminating the need for bilateral negotiation or centralized licensing intermediaries. RSL embeds licensing preferences directly into robots.txt files, HTTP headers, RSS feeds, and HTML metadata. Possible compensation models include (Eayrs, 2025):

Attribution-Only License: free machine access provided visible credit and functional links to original sources are included in AI outputs.
Pay-Per-Crawl: automated micropayments triggered each time an AI system accesses copyrighted content.
Pay-Per-Inference: developers pay creators a micro-royalty for each inference (output generation) their AI model produces.
Attribution + Reciprocity: AI developers commit to supporting content ecosystems through donations to non-profits, shared datasets, or open-source model releases.

Right holders also have an option to prohibit AI use and search summaries (Really Simple Licensing (RSL) 1.0 Specification, 2025).

CC Signals and the RSL standard shift the narrative from (non)transformative and (non)commercial use to the question “What did the publisher declare acceptable?”. However, RSL compliance remains entirely voluntary, with no binding legal requirement for AI companies to honor machine-readable licensing terms. Without partnerships with content delivery networks like Fastly or Cloudflare to implement technical barriers blocking non-compliant crawlers, publishers can request payment but lack mechanisms to enforce it.
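The voluntary nature of such signals can be illustrated with a minimal sketch. It uses Python's standard `urllib.robotparser` to test whether a crawler is permitted to fetch a page. The robots.txt content, the crawler name `ExampleAIBot`, and the URL are hypothetical, and the sketch covers only the classic allow/disallow signal, not RSL-specific licensing terms. Nothing compels a crawler to run a check of this kind, which is precisely the enforcement gap described above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a publisher might serve: one named AI
# crawler is excluded, all other agents are allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical article URL on a hypothetical journal site.
ARTICLE_URL = "https://journal.example.org/articles/1"

ai_allowed = may_fetch(ROBOTS_TXT, "ExampleAIBot", ARTICLE_URL)
other_allowed = may_fetch(ROBOTS_TXT, "OtherBot", ARTICLE_URL)
print(ai_allowed, other_allowed)
```

A compliant crawler would skip the page when `may_fetch` returns `False`; a non-compliant one can simply ignore the file, so the signal expresses a preference rather than a technical barrier.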

A further complication involves the question of retroactive application. The scholarly ecosystem already contains millions of research outputs shared under existing CC BY licenses, which cannot be easily relicensed. It therefore remains unclear whether CC Signals will apply solely to new publications. Such a limitation would potentially create a two-tiered system in which a vast corpus of older research remains freely trainable by AI systems without any reciprocity requirements, thereby diluting the framework’s overall impact.

Legal Analysis of AI-Generated Content as a Fair Use Case

The fair use analysis applied to AI training practices has yielded deeply divided judicial opinions. This legal uncertainty is extensively documented in comparative legal scholarship (Chopra et al., 2025; Sag & Yu, 2024), which highlights how different legal traditions produce varied outcomes in analogous cases. Recent litigation highlights the critical importance of two factors: the manner in which training data is sourced and the commercial nature of the eventual AI application.

A persistent and legally uncertain question is whether content produced by AI systems should be classified as a derivative work of the copyrighted materials utilized during the training process. Courts have thus far provided inconsistent guidance on this matter. Under the framework of the U.S. copyright law, a derivative work is legally defined as one “based upon one or more preexisting works,” which constitutes a “recasting, transformation, or adaptation” of an original source (17 U.S. Code § 101 – Definitions, n.d.).

In May 2025, the U.S. Copyright Office released Part 3 of its comprehensive Report on Copyright and Artificial Intelligence, directly addressing whether the training of generative AI systems on copyrighted works constitutes fair use (Copyright and Artificial Intelligence, Part 3: Generative AI Training, 2025). The main conclusion is that AI training does not automatically qualify as fair use; each case requires thorough analysis under the four statutory fair use factors: purpose and character, nature of the copyrighted work, amount and substantiality, and market effect.

Proponents of classifying AI output as derivative often argue that these outputs can, in certain circumstances, incorporate protectable expression from the training dataset. This is particularly plausible when the generated content bears a close resemblance to specific, identifiable source materials. At the same time, the degree of similarity substantial enough to consider the work a derivative is not entirely clear (Griem, Jr. & Wallace, 2023).

Conversely, a strong counterargument suggests that AI models do not store or replicate copies but instead learn to generate content based on abstract statistical patterns. From this perspective, the outputs represent novel combinations that do not constitute a direct adaptation of any specific pre-existing work. This view found judicial support in Kadrey v. Meta Platforms, Inc., No. 3:24-cv-02029 (2025), where the Northern District of California dismissed the claim that an LLM is itself an infringing derivative work. The court deemed this notion “nonsensical,” reasoning that there is no way to understand an LLM as a recasting or adaptation of the plaintiffs’ books. The court also rejected the broader proposition that every output from an LLM is automatically a derivative work, insisting instead on a case-specific demonstration of substantial similarity.

The settlement in Bartz v. Anthropic PBC, No. 3:24-cv-05417 (2025) points toward a more nuanced legal pathway. In that case, Judge Alsup determined that training the Claude model on legally acquired books was a “transformative” fair use. Nevertheless, the subsequent $1.5 billion settlement represents a watershed moment in this area of litigation. The lead plaintiff, Andrea Bartz, characterized it as a clear message: “You are not above the law, our intellectual property isn’t yours for the taking” (Ortutay, 2025). The settlement draws a sharp distinction based on data provenance. While training on legally obtained books was deemed potentially fair, Judge Alsup ruled that Anthropic’s use of “pirated” copies from shadow libraries was inherently infringing.

Building on the momentum of the Bartz v. Anthropic class action settlement and explicitly rejecting its purportedly “bargain-basement” terms, Pulitzer Prize-winning journalist John Carreyrou filed individual copyright infringement actions on December 22, 2025, against six major AI companies on behalf of himself and five fellow authors (Carreyrou v. Anthropic PBC et al., 25-cv-10897, 2025). This lawsuit signals that the copyright community views the Anthropic settlement not as a resolution but as an inadequate interim measure, and that subsequent litigation will likely demand accountability for the full scope and value of training data misappropriation, potentially triggering damages awards that would fundamentally reshape AI companies’ business models and their obligations to compensate creative rights holders.

Arguments against fair use, however, concentrate on commercial harm and the potential for market substitution. A pivotal moment came from the Delaware District Court in Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc., No. 1:20-cv-613-SB (2025), which issued the first unambiguous rejection of a fair use defense for AI training. The court concluded the use was “commercial” and, importantly, “not transformative.” A key element of the ruling was the recognition of an “obvious” potential market for licensing copyrighted works specifically for AI training. The court took judicial notice that more and more copyright owners are striking deals with AI companies to license works for training purposes, thereby substantiating the market harm factor.

A critical evidentiary ruling issued on January 5, 2026, in the consolidated copyright litigation against OpenAI (In Re: OpenAI, Inc. Copyright Infringement Litigation, 2026) directly undermines OpenAI’s fair use defense by requiring production of 20 million de-identified ChatGPT user interaction logs. The District Court affirmed Magistrate Judge Ona T. Wang’s November 7, 2025 discovery order, rejecting OpenAI’s privacy and burden objections and holding that the full unfiltered log sample is essential because even logs that do not reproduce plaintiffs’ works may help OpenAI assert defenses such as fair use and are thus relevant to the case under the applicable discovery standard.

Europe demonstrates completely different legal patterns. The Munich Regional Court issued a comprehensive decision on November 11, 2025, in GEMA v. OpenAI (Case No. 42 O 14139/24) finding that both the training of ChatGPT using protected song lyrics and the subsequent output of those lyrics constitute copyright infringement under German law (Entscheidung GEMA v. OpenAI, 42 O 14139/24, 2025). The court rejected OpenAI’s reliance on the EU text and data mining (TDM) exception under Directive 2019/790, holding that the reproduction of song lyrics within model parameters exceeds the scope of permissible TDM, which is limited to extraction of information such as abstract syntactic regulations, common terms and semantic relationships rather than “memorization” of entire creative works. Critically, the court placed full responsibility on OpenAI as the developer and operator of the system, rejecting arguments that user prompts create independent liability and concluding that OpenAI directly commits the act of communicating to the public by designing and operating a system capable of reproducing protected works.

Beyond the training-data copyright disputes exemplified by Bartz v. Anthropic and Kadrey v. Meta, a separate class of AI copyright infringement has emerged involving retrieval-augmented generation (RAG) systems that reproduce copyrighted content in real-time response to user queries, rather than through initial model training. On December 5, 2025, The New York Times Company filed a comprehensive copyright and trademark infringement complaint against Perplexity AI, Inc. (The New York Times Company v. Perplexity AI, Inc., Complaint for Copyright Infringement, Trademark Infringement, and Related Relief, Case No. 1:25-cv-10106, 2025), alleging that Perplexity’s AI-powered search engine systematically copies and reproduces Times’ content without authorization, despite 18 months of cease-and-desist letters. The significance of New York Times v. Perplexity lies in its articulation of a two-stage infringement model that challenges the assumption that fair use protections extend to all forms of AI content processing. In Stage 1 (Input), the complaint alleges that Perplexity deployed specialized web crawlers (PerplexityBot and Perplexity-User-Agent) that made over 175,000 attempts to access nytimes.com in August 2025 alone, deliberately circumventing The Times’ robots.txt protocol and hard-block technical barriers designed specifically to prevent scraping. The complaint characterizes this conduct as willful: because The Times had previously notified Perplexity by cease-and-desist letters to stop accessing its content, the continued scraping after express objection establishes the requisite intent for enhanced damages.

In Stage 2 (Output), the complaint alleges that Perplexity’s GenAI products generate responses that are “identical or substantially similar” to Times content, reproducing entire articles or substantial portions thereof in “verbatim or near-verbatim” forms that provide users with comprehensive answers eliminating any need to visit the source material. The Times provides specific examples.

The complaint asserts that this two-stage process constitutes infringement that cannot be shielded by fair use. The key distinction from prior AI training cases is that fair use analysis has typically focused on transformative creation of new works from training data. By contrast, Perplexity’s RAG system creates no new work; it retrieves existing copyrighted content and reproduces it, adding only minor synthesizing commentary.

Table 2 summarizes the analyzed legal cases in terms of fair use statutory factors.

*Table 2: Four-Factor Fair Use Test Application in the AI Copyright Cases (2025)*
Case and Court	Factor 1: Purpose and Character of Use	Factor 2: Nature of the Copyrighted Work	Factor 3: Amount and Substantiality Used	Factor 4: Market Effect	Fair Use Outcome
Kadrey v. Meta Platforms, Inc.	Transformative: commercial AI training creates fundamentally new tool (general-purpose LLM) distinct from original novels	Favors fair use: novels are creative works, but transformative use of training data mitigates	Disfavors fair use: entire books used as training data; substantial copying	Favors fair use: no market harm demonstrated; plaintiffs failed to prove AI outputs competed with book sales; no licensing market shown	Fair use upheld: training on lawfully acquired books constitutes fair use despite commercial nature
Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc.	Not transformative: commercial AI tool directly competes with Westlaw	Disfavors fair use: copyrighted summaries and original documents	Disfavors fair use: entire copyrighted headnotes directly copied into training data; substantial amount of identifiable works reproduced	Strong market harm: direct competition with Westlaw legal research platform; emerging licensing market for legal data to train AI	Fair use rejected: partial summary judgment for Thomson Reuters; court found non-transformative commercial infringement with clear market substitution
Bartz v. Anthropic PBC	Mixed: training on lawfully acquired books = transformative; training on pirated books = not transformative; commercial model	Disfavors fair use: books (creative works); full copyright protection	Disfavors fair use: millions of entire books used; critical distinction drawn between lawful and pirated sources	Significant market harm: market harm from piracy; emerging licensing deals with publishers; settlement value ($1.5B) acknowledges infringement liability	Settled for $1.5 billion: partial fair use for lawful training; pirated book training admitted infringing; data provenance (lawful vs. pirated) is a decisive factor
The New York Times Company v. Perplexity AI, Inc. (complaint filed; early-stage litigation)	Not transformative: real-time content retrieval and reproduction to end users; direct market substitute for visiting nytimes.com; RAG system retrieves and reproduces, not creates	Disfavors fair use: original journalism; creative editorial composition and analysis; favors copyright holder	Disfavors fair use: verbatim or near-verbatim reproduction of entire articles; specific examples include complete article copying	Perfect market substitution: direct competition with nytimes.com; users obtain comprehensive content summaries without visiting the source; lost pageviews, ad revenue, subscriptions	Alleged infringement: complaint alleges two-stage infringement (scraping + output reproduction); willful infringement alleged
GEMA v. OpenAI (German law)	Not transformative: paid commercial subscription model; memorization of protected song lyrics for commercial AI product	Disfavors fair use: song lyrics (creative musical pieces); full copyright protection under German law	Disfavors fair use: entire song lyrics embedded in model parameters; memorization creates retrievable reproductions matching source lyrics	Direct market harm: eliminates licensing opportunity; copyright holders unable to control reproductive uses	Fair use rejected: German court found TDM exception inapplicable; memorization exceeds permissible use; ordered cease-and-desist + damages; rejects EU TDM exception for AI training

A Call for Action

Science and artificial intelligence (AI) are closely linked. Research provides data, which is crucial for training large language models (LLMs) and advancing data science more broadly. At the same time, generative AI (GenAI) is revolutionizing research. Open-source LLMs are an essential part of open science. While AI presents significant opportunities for scientific advancement, it also poses substantial risks. Legislation in this field is still evolving, and regulatory and policy documents often focus on attracting investment in AI or promoting its responsible development and use. The use of publicly available research outputs for training LLMs falls into a “grey area.” At the moment, the community lacks any meaningful discussion on the reuse of academic content for LLM training. Attempts to raise this issue have been made, but their impact has been rather limited (Decker, 2025). Below, I offer some thoughts on actions that can be taken in the near future.

First and foremost, AI training should not be considered an exception to copyright law (i.e., under the fair use doctrine). Recognizing LLM training as a case of fair use undermines efforts to reform copyright regulation. In my opinion, LLM training should not qualify as fair use for at least three main reasons:

Many AI systems already operate on paid subscription models. Even if no fees are currently charged, there are no legal restrictions preventing these models from becoming commercialized in the future.
AI-generated content often closely resembles the original, making it subject to copyright and attribution requirements. It’s a derivative work (not transformative)!
AI-generated output, not referencing the source, violates academic traditions of credit allocation. It can be considered a special case of “market harm.”

This issue is particularly relevant in the U.S. context, but given that most AI developers are based in the United States, it is critical for the global development of the industry. Some researchers argue that it will take years for U.S. courts to address the issue of licensing content for LLM training (Bergstrom & Ruediger, 2024). This is a major concern for the academic community: the market will continue to evolve, researchers will increasingly rely on AI for interacting with scholarly output, and changes will become more difficult to implement (see below for further discussion of limitations and challenges).

Authors should have the option to refuse the use of their work for training GenAI models or specific groups of such models. On the one hand, in terms of Creative Commons licensing, there are two possible strategies to achieve this:

Examine existing licenses: The Creative Commons BY-ND (Attribution-NoDerivatives) license could be considered restrictive for AI training, but only if regulatory frameworks recognize AI-generated content as derivative works. However, determining whether AI-generated content qualifies as a derivative work is complicated by the fact that LLMs can produce different responses for each query, making it difficult to assess similarity to the original. The BY-NC (Attribution-NonCommercial) license may also be restrictive for training models intended for commercial use.

Introduce a new “NT” (no train) extension: This would explicitly prohibit the use of licensed works for AI training. However, since the original datasets used for LLM training are not publicly accessible, the prospects for enforcing such licensing terms remain uncertain. Additionally, publishing contracts should specify that publishers cannot use articles to train their LLMs or other AI models without author consent.

On the other hand, the evident limitations of existing Creative Commons licenses in governing AI training practices, as previously discussed, appear to have motivated the development of CC Signals. This new framework for expressing preferences is conceived specifically for contemporary AI applications. Instead of depending on interpretations of legal tools designed for human-centric content sharing, CC Signals proposes purpose-built mechanisms to articulate permissions and constraints for machine learning processes.

From a personal perspective, there is a compelling case for the academic community to rally behind the CC Signals initiative, rather than pursuing an isolated NT license extension. The CC Signals framework, despite its ongoing development, seems to offer a more comprehensive approach. It attempts to grapple with the intertwined challenges of enforceability, machine readability, and the crucial need for international coordination. The advantage of CC Signals lies in its institutional backing from a trusted organization, its participatory development process, and its explicit design for systems-level coordination across multiple domains.

Consequently, universities and major research funders would be well-advised to commit to adopting CC Signals, though such commitment should be contingent on the framework meeting specific requirements. These would include the provision of legally binding signals where jurisdictionally feasible, a default position that opts-in to commercial AI training unless otherwise specified, clearly articulated definitions of what constitutes adequate reciprocity, and the development of mechanisms for retroactive application to content already published under traditional CC licenses, though this presents obvious practical difficulties.

Another emerging mechanism is the Really Simple Licensing (RSL) standard, formalized as RSL 1.0 in December 2025. Unlike static copyright licenses, RSL terms execute automatically when AI crawlers access content, because the licensing preference is embedded directly into robots.txt files, HTTP headers, RSS feeds, and HTML metadata.

Universities as Key Players in AI Regulation

Universities should take a leading role in regulating AI. On the one hand, universities often act as publishers or maintain their own repositories, making it feasible to implement content licensing approaches in practice. On the other hand, universities conduct research and develop GenAI models, placing them at the forefront of addressing the ethical aspects of these processes. Furthermore, universities can provide evidence to support legislative regulation. Having said that, I must acknowledge that universities lack the regulatory power that governments possess. Among specific actions that universities can take I would mention:

Supporting development and implementation of CC Signals and RSL,
Adopting institutional policies that prohibit the use of faculty work for AI training without consent,
Developing open-source, responsible LLMs,
Creating mandatory AI usage and AI ethics curricula.

Most community documents in open science remain silent on the matter of AI training, e.g. the recent Barcelona Declaration on Open Research Information (Kramer et al., 2024). This apparent oversight represents a significant missed opportunity for the open science community to establish a coherent normative position on this issue. Such a position is arguably needed before commercial practices and legal expectations become entrenched and more resistant to change. The current moment therefore presents a critical juncture. It seems imperative that the community acts now to shape governance frameworks, ensuring that the principles of open science ultimately serve broader human flourishing rather than facilitating primarily corporate extraction.

Legislative Measures and International Cooperation

Governments and international organizations must develop and implement legislative measures to protect authors’ rights and prevent unauthorized use of their works for training GenAI models. One of the first steps should be the mandatory disclosure of training datasets by developers.

The challenge lies not only in adopting national AI laws but also in harmonizing these laws globally. Without international coordination, commercial developers could exploit “safe harbors” to serve their own interests. Therefore, it is essential for large intergovernmental organizations, such as UNESCO, to take on this task. International copyright harmonization for AI training would require establishing minimum standards while allowing national flexibility, similar to the TRIPS Agreement model (Agreement on Trade-Related Aspects of Intellectual Property Rights, 1994). Such harmonization requires at least three components: law alignment, procedural standardization, and enforcement coordination (“Harmonization of International Copyright Standards,” 2025).

Another challenge is that AI models cannot be “untrained.” If restrictions are imposed only on new models, existing models would gain a non-market advantage. Conversely, applying restrictions retroactively to existing models could destabilize the industry. A responsible dialogue is needed to find a balanced solution. One possible approach is retrieval-augmented generation, which allows models to reference relevant papers in their outputs (“AI Firms Must Play Fair When They Use Academic Data in Training,” 2024).
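The attribution idea behind retrieval-augmented generation can be shown with a deliberately simplified sketch. It assumes a toy corpus and a naive word-overlap ranking; the `Paper` class, the placeholder citations, and the function names are hypothetical illustrations, not a description of any deployed system. The point is only that sources retrieved at query time can be carried through to the output as explicit references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    citation: str  # placeholder citation string, not a real reference
    text: str      # indexed content, e.g. an abstract


# Hypothetical two-paper corpus used only for illustration.
CORPUS = [
    Paper("Hypothetical Paper A", "creative commons licenses and ai training data"),
    Paper("Hypothetical Paper B", "peer review practices in open science journals"),
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().split())


def retrieve(query: str, corpus: list[Paper], top_k: int = 1) -> list[Paper]:
    """Rank papers by word overlap with the query; drop papers with no overlap."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(p.text)), p) for p in corpus]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [paper for _, paper in matches[:top_k]]


def answer_with_sources(query: str, corpus: list[Paper]) -> str:
    """Return a stub answer that names the retrieved sources."""
    hits = retrieve(query, corpus)
    if not hits:
        return "No supporting source found."
    sources = "; ".join(p.citation for p in hits)
    # A real system would pass the retrieved text to a language model here.
    return f"[generated answer] Sources: {sources}"


print(answer_with_sources("ai training licenses", CORPUS))
```

Because the references come from the retrieval step rather than from model parameters, credit can in principle be allocated per query, although this does not resolve the separate question of what the underlying model was trained on.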

Conclusion

The author of this article does not oppose AI. In fact, while writing this manuscript, I used Yandex.Translate (YaGPT-5) and DeepSeek R3 to assist with reading Chinese and German source texts and proofreading the final version of the paper, as well as Perplexity to search for relevant sources. This analysis argues that existing copyright frameworks are fundamentally ill-equipped to regulate the use of scholarly works in AI training, thereby posing urgent challenges to academic integrity and the equitable production of knowledge. Three principal findings arise from this investigation.

First, widely implemented licensing mechanisms such as Creative Commons licenses prove inadequate for addressing the distinct challenges presented by AI training. Although CC BY licenses function effectively for human-to-human content sharing, they were never intended to govern machine learning processes at scale, nor do they adequately differentiate between non-commercial research applications and commercial AI development. The 2025 launch of CC Signals constitutes a significant institutional reaction to this problem. In December 2025, the RSL Technical Steering Committee published a machine-readable licensing standard, RSL 1.0, enabling different models of compensation and direct prohibition. Nevertheless, the potential efficacy of these advances depends on resolving fundamental questions concerning enforceability, retroactive application, and the necessity for international coordination.

Fragmented Regulation and Judicial Practice

Secondly, findings reveal fundamentally divergent regulatory approaches among major jurisdictions in addressing copyright protection for AI training data. While the United States initially attempted to address copyright issues under the Biden administration through Executive Order 14110 and the proposed Generative AI Copyright Disclosure Act, the Trump administration’s 2025 AI Action Plan marks a significant shift in priorities, moving away from the copyright-focused approach of the previous administration. This creates a regulatory environment that critics argue disproportionately favors the interests of AI developers over those of copyright holders. However, California’s Assembly Bill 2013 (AB 2013), which took effect on January 1, 2026, represents a notable state-level countermovement, establishing mandatory disclosure requirements for the intellectual property status of AI training datasets. Yet this federal-state regulatory fragmentation creates additional uncertainty: while AB 2013 requires transparency on copyright status, the Trump Administration’s December 2025 Executive Order simultaneously attempts to block state AI laws through federal litigation.

The European Union has emerged as the global leader in establishing enforceable copyright protections for AI training. Building upon the Copyright Directive (EU) 2019/790, which established a text and data mining opt-out framework under Article 4(3), the EU AI Act’s Article 53 creates binding obligations for general-purpose AI providers to identify and respect rightsholders’ reservations of rights, publish detailed training content summaries, and implement policies to prevent copyright-infringing outputs. The GPAI Code of Practice, adopted in July 2025, translates these requirements into five concrete measures including respecting machine-readable opt-out protocols and establishing complaint mechanisms for rightsholders. Critically, the European Commission’s ongoing stakeholder consultation (concluding January 23, 2026) aims to standardize machine-readable opt-out protocols, addressing a key implementation challenge. Unlike other jurisdictions that merely recommend or defer to existing copyright laws, the EU has created an operationalized opt-out model with extraterritorial reach. Significantly, while the Digital Omnibus proposal delays certain high-risk AI obligations until 2027 in response to industry lobbying, Article 53 copyright transparency requirements that entered into force on August 2, 2025, remain unaffected by these delays, demonstrating the EU’s commitment to prioritizing intellectual property protection even amid pressure to ease regulatory burdens.

The UK’s 2025 Artificial Intelligence (Regulation) Bill represents another significant development, directly addressing the copyright protection gap by requiring organizations training AI systems to report all third-party data and intellectual property used, with assurances of informed consent and compliance with copyright laws. The Data (Use and Access) Act 2025, which received Royal Assent on June 11, 2025, establishes a statutory framework mandating the government to publish by March 18, 2026 both an economic impact assessment and a comprehensive report on the use of copyright works in AI development, addressing three competing policy objectives: control, access, and transparency. International frameworks from OECD, UNESCO, and G7 continue to defer to existing copyright laws rather than establishing AI-specific IP protections, while China’s regulatory measures remain focused on risk management and content control rather than addressing fundamental copyright issues in AI training.

The trajectory of 2025 fair use cases reveals systemic tensions. U.S. courts oscillate between protecting AI innovation (Kadrey v. Meta) and recognizing legitimate copyright claims (Thomson Reuters v. Ross Intelligence; New York Times v. Perplexity AI (filed)). International jurisdictions have moved decisively toward copyright enforcement: the Munich Regional Court rejected fair use defenses in GEMA v. OpenAI, finding that memorization of song lyrics in commercial LLMs exceeded permissible text-and-data-mining uses under German law and the EU Copyright Directive. This conceptual instability, reflected in the U.S. split between transformativeness-focused defenses (Kadrey) and market-harm-based liability (Thomson Reuters, Perplexity), has catalyzed two alternative frameworks that may reshape AI content governance without awaiting further judicial consensus or international harmonization.

Towards Implementation

These findings collectively support the idea of harmonizing international legislative efforts. The objective of such efforts would be to ensure transparency, protect intellectual property, and prevent the emergence of an oligopolistic market structure that could potentially prioritize commercial profit over research integrity and equitable knowledge production.

Achieving this harmonization demands three interconnected components. First, substantive law alignment is needed to establish minimum standards for author consent and disclosure, while still permitting national flexibility in implementation mechanisms, perhaps following models like the TRIPS Agreement. Second, procedural standardization should focus on creating machine-readable permission frameworks (CC Signals and the RSL standard are examples), which would enable automated compliance verification at a large scale. Third, enforcement coordination is essential to prevent regulatory arbitrage, a situation where AI companies might exploit jurisdictional differences by training models in permissive legal environments for subsequent global deployment.

Fundamental Choice

The academic community stands at a critical juncture. Current market dynamics disproportionately benefit large AI developers, who can train models on copyrighted scholarly works without providing compensation or attribution. This pattern risks replicating the very oligopolistic concentration that already plagues the academic publishing industry. Therefore, universities, research funders, and scholarly societies must act decisively to influence governance frameworks before these commercial practices become entrenched. Specific actions could include the institutional adoption of CC Signals and the RSL standard, contingent upon their binding enforceability; the development of new rights retention contracts that explicitly prohibit unauthorized AI training; and coordinated advocacy for legislative reforms that establish firm transparency and consent requirements.

The central question, then, is not whether AI should be utilized in research, but rather whether its development will proceed in a manner that respects the intellectual property rights of knowledge creators or whether it will perpetuate extractive models that ultimately undermine the foundations of scholarly communication. This analysis suggests that without coordinated action across stakeholder groups and jurisdictions, the latter outcome appears increasingly likely. The time for incremental measures has passed; what is needed now is the immediate establishment of comprehensive governance frameworks.

It may appear that I oppose reuse, which is a fundamental part of the open science agenda. However, my critique targets specifically the commercial exploitation of research outputs absent appropriate attribution and remuneration. I argue that establishing transparent governance frameworks for industry actors, coupled with explicit rights reservation mechanisms for authors, will ultimately strengthen and advance open science sustainability.

Limitations of the Study

This analysis is subject to several important limitations that affect its scope and temporal applicability. In terms of geographic coverage, the study focuses on major economies and global initiatives but does not sufficiently engage with significant regional actors (South Korea, Taiwan, Israel). These regions may have developed distinctive regulatory or judicial approaches that could enrich the global understanding of AI copyright issues.

A further constraint is temporal. The field of AI and its corresponding regulatory environment are in a state of constant and rapid change. New legal precedents, settlement agreements, and regulatory frameworks are continuously emerging. Consequently, the information presented here is most accurate as of its publication date and may require revision as the legal landscape evolves.

The jurisdictional focus of the legal analysis is predominantly on U.S. copyright law and recent American court decisions. There is limited coverage of how analogous issues are being regulated in other major legal systems, such as the European Union, China, or the United Kingdom, or in emerging economies with growing AI sectors where different legal traditions may yield different answers.

Finally, the call to action mainly addresses open science outputs and the relevant licensing. This area represents a major gap in regulation, whereas copyright infringement is covered much better.

Future Research Directions

This analysis reveals several promising avenues for further research:

Systematic examination of how different legal systems approach AI copyright issues, particularly in civil law jurisdictions and countries with different fair dealing/fair use
Empirical research on the actual economic effects of AI training on copyright holders, including quantification of market harm and analysis of emerging licensing markets.
Interdisciplinary research examining how specific AI architectures and training methodologies affect legal analysis, particularly regarding substantial similarity and transformative use determinations.
Longitudinal studies evaluating the effectiveness of emerging AI copyright regulations, including the UK’s proposed disclosure requirements and the EU’s transparency
Forward-looking research on how developments in AI technology (such as few-shot learning, federated training, or synthetic data generation) might affect the legal landscape and require new regulatory approaches.
Development of machine-readable licensing standards, establishment of enforceability mechanisms, and evaluation of multiple deployment scenarios.

Acknowledgments

The author gratefully acknowledges the peers’ contributions:

Ludo Waltman: Supervision, Writing – review & editing
Erna Sattler: Writing – review & editing

Author’s Contribution

Dmitry Kochetkov: Conceptualization, Investigation, Writing – original draft

AI Assistance Disclosure

This manuscript was prepared with the assistance of artificial intelligence tools for the following purposes:

Translation assistance for reading and interpreting source texts (Yandex.Translate (YaGPT-5) and DeepSeek R3)
Proofreading and linguistic refinement of the final manuscript version (Yandex.Translate (YaGPT-5) and DeepSeek R3)
Retrieval of legislative acts and legal precedents (Perplexity AI)

Notes

¹ Statutory exception permitting all AI training without copyright holder consent; no opt-out mechanism; strongest AI developer protection; weakest author protection.

² Generative AI Licensing Agreement Tracker. URL: https://sr.ithaka.org/our-work/generative-ai-licensing-agreement-tracker/.

³ However, can we be certain that today’s open models will not be commercialized in the future?

References

17 U.S. Code § 101—Definitions. (n.d.). LII / Legal Information Institute. Retrieved October 22, 2025, from https://www.law.cornell.edu/uscode/text/17/101

A European Strategy for Artificial Intelligence in Science Paving the way for the Resource for AI Science in Europe (RAISE). (2025). [COM]. European Commission. https://primarysources.brillonline.com/browse/human-rights-documents-online/communication-from-the-commission-to-the-european-parliament-and-the-council;hrdhrd46790058

A pro-innovation approach to AI regulation (No. 815). (2023). Department for Science, Innovation & Technology. https://www.gov.uk/government/publications/ai-regulation-a-pro-innovation-approach/white-paper

Agreement on Trade-Related Aspects of Intellectual Property Rights. (1994). I.L.M.

AI Action Plan. (2025, July 23). https://www.ai.gov/action-plan

AI firms must play fair when they use academic data in training. (2024). Nature, 632(8027), 953–953. https://doi.org/10.1038/d41586-024-02757-z

AI principles. (2019). OECD. https://www.oecd.org/en/topics/ai-principles.html

Artificial Intelligence (AI) in Science. (2025, April 7). Research and Innovation. https://research-and-innovation.ec.europa.eu/research-area/industrial-research-and-innovation/artificial-intelligence-ai-science_en

Artificial Intelligence (Regulation) Bill, No. HL Bill 82, UK Parliament (2025). https://bills.parliament.uk/bills/3519

Bartz v. Anthropic PBC, No. 3:24-Cv-05417 (N.D. Cal. August 26, 2025). https://www.courtlistener.com/docket/69058235/bartz-v-anthropic-pbc/

Bergstrom, T., & Ruediger, D. (2024). A Third Transformation? Generative AI and Scholarly Publishing. Ithaka S+R. https://doi.org/10.18665/sr.321519

California Assembly Bill 2013, Generative Artificial Intelligence: Training Data Transparency Act, Pub. L. No. AB 2013 (2024). https://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=202320240AB2013

Carreyrou v. Anthropic PBC et al., 25-Cv-10897 (U.S. District Court, Northern District of California December 22, 2025). https://www.pacermonitor.com/public/case/62003240/Carreyrou_et_al_v_Anthropic_PBC_et_al

Chopra, P., Sony, R., & Chopra, S. (2025). Generative AI, Copyright and Personality Rights: A Comparative Legal Perspective. Legal Issues in the Digital Age, 6(3), 23–51. https://doi.org/10.17323/2713-2749.2025.3.23.51

Commission presents template for General-Purpose AI model providers to summarise the data used to train their model. (2025, July 24). https://digital-strategy.ec.europa.eu/en/news/commission-presents-template-general-purpose-ai-model-providers-summarise-data-used-train-their

Copyright and Artificial Intelligence, Part 3: Generative AI Training. (2025). https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf

Creativecommons/cc-signals. (2025). [Computer software]. Creative Commons. https://github.com/creativecommons/cc-signals (Original work published 2025)

Data (Use and Access) Act 2025, c.18 UK Public General Acts (2025). https://www.legislation.gov.uk/ukpga/2025/18/contents

De La Torre-López, J., Ramírez, A., & Romero, J. R. (2023). Artificial intelligence to automate the systematic review of scientific literature. Computing, 105(10), 2171–2194. https://doi.org/10.1007/s00607-023-01181-x

Decker, S. (2025, April 15). Guest Post – The Open Access – AI Conundrum: Does Free to Read Mean Free to Train? The Scholarly Kitchen. https://scholarlykitchen.sspnet.org/2025/04/15/guest-post-the-open-access-ai-conundrum-does-free-to-read-mean-free-to-train/

Digital Omnibus Regulation Proposal. (2025, November 19). https://digital-strategy.ec.europa.eu/en/library/digital-omnibus-regulation-proposal

Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on Copyright and Related Rights in the Digital Single Market and Amending Directives 96/9/EC and 2001/29/EC (Text with EEA Relevance.), EP, CONSIL, 130 OJ L (2019). http://data.europa.eu/eli/dir/2019/790/oj

Directorate-General for Research and Innovation. (2025). Living guidelines on the responsible use of generative AI in research. European Commission.

Eayrs, A. (2025, December 10). Integrating Choices in Open Standards: CC Signals and the RSL Standard. Creative Commons. https://ec2-3-130-221-114.us-east-2.compute.amazonaws.com/2025/12/10/integrating-choices-in-open-standards/

Entscheidung GEMA v. OpenAI, 42 O 14139/24 (Landgericht München I November 11, 2025). https://www.gesetze-bayern.de/Content/Document/Y-300-Z-GRURRS-B-2025-N-30204

Executive Order 14110, On the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence (2023). https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/

Executive Order 14179, Removing Barriers to American Leadership in Artificial Intelligence (2025). https://www.federalregister.gov/documents/2025/01/31/2025-02172/removing-barriers-to-american-leadership-in-artificial-intelligence

Executive Order, Ensuring a National Policy Framework for Artificial Intelligence (2025). https://www.whitehouse.gov/presidential-actions/2025/12/eliminating-state-law-obstruction-of-national-artificial-intelligence-policy/

German, D. M. (2024). Copyright related risks in the creation and use of ML/AI systems (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2405.01560

Global AI Governance Action Plan. (2025, July 26). Ministry of Foreign Affairs of the People’s Republic of China. https://www.fmprc.gov.cn/eng./xw/zyxw/202507/t20250729_11679232.html

Griem, Jr., J. M., & Wallace, J. (2023, August 28). Managing the Risk of Using AI-Generated Content in a World of Copyright Uncertainty: Are AI Content Generators and AI Generated Expressions “Derivatives” of Copyrighted Works? Carter Ledyard & Milburn LLP. https://www.clm.com/managing-the-risk-of-using-ai-generated-content-in-a-world-of-copyright-uncertainly-are-ai-content-generators-and-ai-generated-expressions-derivatives-of-copyrighted-works/

Haider, J., Söderström, K. R., Ekström, B., & Rödl, M. (2024). GPT-fabricated scientific papers on Google Scholar: Key features, spread, and implications for preempting evidence manipulation. Harvard Kennedy School Misinformation Review. https://doi.org/10.37016/mr-2020-156

Hansen, D. (2024, July 30). What happens when your publisher licenses your work for AI training? Authors Alliance. https://www.authorsalliance.org/2024/07/30/what-happens-when-your-publisher-licenses-your-work-for-ai-training/

Harmonization of International Copyright Standards: The Interaction Between the TRIPS Agreement and WIPO Treaties. (2025, April 29). Şengün Law. https://sengunlaw.com/harmonization-of-international-copyright-standards-the-interaction-between-the-trips-agreement-and-wipo-treaties/

Hiroshima Process International Guiding Principles for Advanced AI System. (2023). G7, Ministry of Foreign Affairs of Japan. https://www.mofa.go.jp/files/100573471.pdf

In Re: OpenAI, Inc. Copyright Infringement Litigation (S.D.N.Y. January 5, 2026). https://www.courtlistener.com/docket/69879510/in-re-openai-inc-copyright-infringement-litigation/

Interim Measures for the Management of Generative Artificial Intelligence Services (2023). http://www.cac.gov.cn/2023-07/13/c_1690898327029107.htm

Introducing CC Signals: A New Social Contract for the Age of AI. (2025, June 25). Creative Commons. https://creativecommons.org/2025/06/25/introducing-cc-signals-a-new-social-contract-for-the-age-of-ai/

Johnson, B. (2024, July 31). True or False? Addressing Common Assumptions About Copyright and AI. Copyright Clearance Center. https://www.copyright.com/blog/addressing-common-assumptions-copyright-ai/

Kadrey v. Meta Platforms, Inc., No. 3:24-Cv-02029 (N.D. Cal. June 25, 2025). https://www.courtlistener.com/docket/67569326/kadrey-v-meta-platforms-inc/

Khakurel, J., Penzenstadler, B., Porras, J., Knutas, A., & Zhang, W. (2018). The Rise of Artificial Intelligence under the Lens of Sustainability. Technologies, 6(4), 100. https://doi.org/10.3390/technologies6040100

Kochetkov, D. (2025, July 10). Science and Artificial Intelligence: A Copyright Perspective. 20th International Conference on Scientometrics & Informetrics. https://doi.org/10.51408/issi2025_047

Kramer, B., Neylon, C., & Waltman, L. (2024). Barcelona Declaration on Open Research Information. https://doi.org/10.5281/ZENODO.10958522

Lucchi, N. (2025). Generative AI and copyright: Training, creation, regulation. European Parliament. https://doi.org/10.2861/0365517

OpenAI and journalism. (2024, January 8). https://openai.com/index/openai-and-journalism/

Ortutay, B. (2025, September 25). Judge approves $1.5 billion copyright settlement between AI company Anthropic and authors. AP News. https://apnews.com/article/anthropic-authors-copyright-judge-artificial-intelligence-9643064e847a5e88ef6ee8b620b3a44c

Pollock, D., & Michael, A. (2024, December 10). News and Views: How much content can AI legally exploit? https://www.deltathink.com/news-and-views-how-much-content-can-ai-legally-exploit

Pooley, J. (2024). Large Language Publishing: The Scholarly Publishing Oligopoly’s Bet on AI. KULA: Knowledge Creation, Dissemination, and Preservation Studies, 7(1), 1–11. https://doi.org/10.18357/kula.291

Rama Padmaja, C. V., & Lakshminarayana, S. (2024). The rise of AI: A comprehensive research review. IAES International Journal of Artificial Intelligence (IJ-AI), 13(2), 2226. https://doi.org/10.11591/ijai.v13.i2.pp2226-2235

Really Simple Licensing (RSL) 1.0 Specification. (2025). RSL: Really Simple Licensing. https://rslstandard.org/rsl

Recommendation on the Ethics of Artificial Intelligence. (2022). UNESCO. https://www.unesco.org/en/articles/recommendation-ethics-artificial-intelligence

Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 (2024). http://data.europa.eu/eli/reg/2024/1689/oj/eng

Sag, M., & Yu, P. K. (2024). The Globalization of Copyright Exceptions for AI Training. SSRN. https://doi.org/10.2139/ssrn.4976393

Schiff, A. B. [D-CA-30]. (2024, September 4). Actions – H.R.7913 – 118th Congress (2023-2024): Generative AI Copyright Disclosure Act of 2024 (2024-04-09) [Legislation]. https://www.congress.gov/bill/118th-congress/house-bill/7913/all-actions

Schonfeld, R. C. (2024, October 15). Tracking the Licensing of Scholarly Content to LLMs. The Scholarly Kitchen. https://scholarlykitchen.sspnet.org/2024/10/15/licensing-scholarly-content-llms/

Senftleben, M. (2025). Win-Win: How to Remove Copyright Obstacles to AI Training While Ensuring Author Remuneration (and Why the AI Act Fails to do the Magic). Chicago-Kent Law Review, 1. https://scholarship.kentlaw.iit.edu/cklawreview/vol100/iss1/5

Strickland, E. (2021). The Turbulent Past and Uncertain Future of AI: Is there a way out of AI’s boom-and-bust cycle? IEEE Spectrum, 58(10), 26–31. https://doi.org/10.1109/MSPEC.2021.9563956

The General-Purpose AI Code of Practice. (2025, July 10). https://digital-strategy.ec.europa.eu/en/policies/contents-code-gpai

The New York Times Company v. Perplexity AI, Inc., Complaint for Copyright Infringement, Trademark Infringement, and Related Relief, Case No. 1:25-Cv-10106 (S.D.N.Y. December 5, 2025). https://www.courtlistener.com/docket/71997720/the-new-york-times-company-v-perplexity-ai-inc/

Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc., No. 1:20-Cv-613-SB (D. Del. February 11, 2025). https://www.courtlistener.com/docket/17131648/thomson-reuters-enterprise-centre-gmbh-v-ross-intelligence-inc/

Uuk, R. (2025, December 15). The EU AI Act Newsletter #92: AI Sandboxes Consultation Open [Substack newsletter]. The EU AI Act Newsletter. https://artificialintelligenceact.substack.com/p/the-eu-ai-act-newsletter-92-ai-sandboxes

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is All you Need. Neural Information Processing Systems, 30, 5998–6008.

Walsh, K. (2023, August 18). Understanding CC Licenses and Generative AI. Creative Commons. https://creativecommons.org/2023/08/18/understanding-cc-licenses-and-generative-ai/

Wolff, J., Gordon, S., & Guo, D. (2018). The Rise of Artificial Intelligence. Advances in Social Sciences Research Journal. https://doi.org/10.14738/assrj.56.4722

Zhang, T. M., & Abernethy, N. F. (2025). Reviewing Scientific Papers for Critical Problems With Reasoning LLMs: Baseline Approaches and Automatic Evaluation (No. arXiv:2505.23824). arXiv. https://doi.org/10.48550/arXiv.2505.23824

Zhuang, Z., Chen, J., Xu, H., Jiang, Y., & Lin, J. (2025). Large language models for automated scholarly paper review: A survey. Information Fusion, 124, 103332. https://doi.org/10.1016/j.inffus.2025.103332

Editors

Kathryn Zeiler
Editor-in-Chief

Jason Chin
Handling Editor

Editorial assessment

by Jason Chin

DOI: 10.70744/MetaROR.297.1.ea

This preprint provides a call for researchers to act in response to the widespread use of their published research to train artificial intelligence models. In making this case, the author reviews several international legal frameworks and recent disputes. The two reviewers agree on this preprint’s strengths: the issue is important and the call to action is timely; the mapping of legal frameworks is useful to researchers in the field; and the paper as a whole is an accessible primer for researchers to begin engaging in this area. They also note numerous areas for improvement. Both emphasise the need to engage seriously and critically with the political and commercial context that has given rise to lax regulation in this area. The reviewers also call for greater precision in the legal analysis and provide several recommendations to make the writing more persuasive and consistent.

Recommendations for enhanced transparency

Add a Data Availability Statement to report that no data are used in the article.
Add a competing interest statement. Authors should report all competing interests, including not only financial interests, but any role, relationship, or commitment of an author that presents an actual or perceived threat to the integrity or independence of the research presented in the article. If no competing interests exist, authors should explicitly state this.
Add a funding source statement. Authors should report all funding in support of the research presented in the article. Grant reference numbers should be included. If no funding sources exist, explicitly state this in the article.

For more information on these recommendations, please refer to our author guidelines.

Competing interests: None.

Peer review 1

Kathy Bowrey

DOI: 10.70744/MetaROR.297.1.rv1

The article is a call for action by researchers and research communities to more actively co-ordinate their response to AI training of research, acknowledging that different disciplines and individuals have contrary views on the utility of using the various AI tools and functions that are being developed. This is a worthy goal. However, in attempting to be even-handed and not come across as anti-AI the question of who is responsible for regulation of AI slop and its impact on the future of scholarly communication is overlooked.

The strengths of the article lie in providing a compilation of recent developments in AI and scholarly publishing and emerging regulatory options, including brief summaries of various regulatory initiatives and shifting policy positions on AI regulation in the UK, EU, US and China. The article also provides summaries of recent US law on the scope of copyright infringement in AI training and the limitations of US fair use doctrine, and briefly acknowledges AI infringement in creating derivative works. There is also some analysis of Creative Commons licences. The article works well as an introductory primer for scholars interested in finding their way into current policy discussions around AI, copyright and scholarly publishing. However, the use of AI to produce the analysis of Chinese law and policy is a cause of concern. I do not have expertise to verify whether or not the account is accurate and should be relied upon by others.

The political context in which copyright law operates in academia is not well contextualised. With the space given over to surveying new policy developments across the globe there ends up being a disproportionate focus on National Governments in setting AI standards and explaining national copyright frameworks at the expense of considering the role of private actors as architects of our global information systems. The complexity of AI regulation and the haphazard attention paid to copyright by national Governments can be readily explained. Copyright has not been left out of National and Global AI policy development by accident but through design. A higher priority has been attached to initiatives to support global AI development and there is reluctance to regulate US Big Tech which would harm AI investment and impact anticipated productivity gains in an AI driven economy.

The significance of licensing deals for AI policy

In terms of copyright reform, governments everywhere are relying on commercial actors, largely multinational publishers and media owners working with AI companies, to work out how to manage copyright amongst themselves through licensing deals to access past and future collections of scholarly works.

Both institutional and commercial frameworks mediate rights in scholarly publishing, and commercial choices frame the design of licensing conditions, including various permissions regarding AI. Both the role of institutional actors and the demands of commercial publishers in setting licensing conditions could be fleshed out more in the article.

There are co-ordination points that already exist to develop a community of practice and house the capacity to develop responsible research policy around scholarly communications and AI. There are numerous international and national bodies with standard setting capacity, e.g. National Open Science Statements, UNESCO, DORA, Research Funders. Combined, these organisations can impact publisher practice and influence licensing terms and conditions. But to date there has been limited attention to AI by these institutional actors, who tend to move very slowly and also have to navigate Government AI positions. And in the meantime the use of AI in academia has taken off, including through our Universities entering into their own licensing deals with AI companies where adoption is eased by sweeteners like Microsoft’s highly questionable indemnity for copyright infringement when AI is used by staff and students.

Overall, the commercial context of scholarly publishing needs to be considered more in the article. The article significantly underreports the extent of AI licensing deals with key scholarly publishers (the link to Ft 6 is not current but all the big scholarly publishers have AI licensing deals). Egregious examples of AI terms in contracts are already rife:

automatically being opted-in to AI use of every kind;
no royalties for AI use;
vague and meaningless royalty provisions, e.g. suggesting a 20% royalty with no means to determine how royalties will be proportionally calculated for individuals, how copyrights will be valued in global datasets, and which uses warrant payment;
assignment of rights to use an author’s image and likeness on future copyright works created using AI;
waivers of the moral rights of attribution, against false attribution and of integrity, which purport to preclude author objections to AI-generated content attributed to the author.

There are also attempts to address concern about AI hallucination, misinformation and disinformation by linking AI-produced citations to the publisher-controlled copy of research, thereby trapping research communities into APCs for their research to remain visible in AI research ecosystems. And if you opt out, will your research be cited at all?

The light on the hill

Where publication is via Diamond OA, the relevant research community may have the control and capacity to adopt more considered approaches to regulating responsible use of AI, including adoption of new kinds of Creative Commons licensing. The strength of Diamond OA is that decisions about the use of publications and about ethics can be made by the scholarly community.

Today, most research licensing conditions are determined by funder mandates, research assessment policies, open science mandates, and institutional IP policies as well as by commercial actors, so there is also scope for optimism there.

Revision recommendations

It needs a section that discusses the power of commercial actors impacting the possibilities for scholarly communications through setting licensing terms and conditions. In this regard, the focus on whether or not AI involves infringement in training, or transferring data, or generating content is not as significant an issue as is suggested because authors are not necessarily able to retain sufficient ownership of their scholarship to litigate to protect their interests.
The overview of developments should address the licensing collaboration between large publishing companies and AI platforms, and universities and AI companies, which has implications for the success of any Call to Action.
It needs to acknowledge that authors are subjected to competing pressures through funding mandates, research assessment, and journal ownership, which affect their publication choices and licensing preferences. The promotion of various Creative Commons licences to advance Open Access, including the normalisation of APCs and Read and Publish deals, came from the activities of institutional actors more than from the free choice of academics or publishers. The Call to Action might consider how to engage these parties on AI, as well as the difficulties they face in entering into debates about regulation of AI.
The costs of managing the consequences of paper mills and AI slop on academic communities, the increased academic workload in identifying errors and misinformation, and the risk of serious harm from reliance on AI-generated content and corruption of the scholarly record need to be acknowledged. The volume of publications and quality concerns are continually increasing. Commercial models take no significant responsibility for the integrity of research. Publisher track record on responding to Expressions of Concern and retractions is poor. Authors already cannot get AI-hallucinated publications and citations corrected or removed through established channels. This is arguably the biggest concern with AI adoption in research, with significant resource and public policy implications.
The legal analysis needs to consider the moral rights of authors under Article 6bis of the Berne Convention for the Protection of Literary and Artistic Works, which has been enacted in most nations excluding the US. These rights create attribution of authorship and integrity obligations that are likely to be routinely infringed by AI operations that lack verification of AI content. Institutions whose reputations are also affected by AI slop attributed to associated researchers could assist them in exercising moral rights and demand that publishers act.
The AI-assisted section on China needs verifying by an appropriate expert.
Informal language and first-person references, especially in the second half of the paper, are likely related to the origins of this research paper as an oral presentation, but they come across as lazy writing. Stylistically, they make the article less authoritative and more like an Op-Ed or blog post than research. The dot points, Limitations and Future Directions are also not well developed. The overall structure of the paper needs improving, which might allow for more sophisticated treatment of these issues.

Competing interests: None.

Peer review 2

William Nicholas Cesta

DOI: 10.70744/MetaROR.297.1.rv2

Thank you for the opportunity to read this piece, which explores interesting questions about how copyright regimes draw the boundaries between fair and unfair use of academic content by AI model developers.

Strengths

Mapping key regulatory instruments

One of the major strengths of the paper is your diligent mapping of major developments in AI regulation.

Explaining existing copyright schemes

You also did a great job of walking readers through existing copyright regimes and showing where they fall short. The legal debates surrounding fair use were helpful context, and your table — like the table outlining existing regulatory instruments — is an excellent resource for scholars working in this area.

Important area of work

It is great to see more work tackling the hugely important debate about how the law should be adapted to balance competing aims and values within the realm of GenAI.

Potential improvements

What is the problem?

Would you consider crystalizing the problem you are grappling with more clearly? You claim that if we fail to act, we may witness the “emergence of an oligopolistic market structure that could prioritize commercial profit over scientific integrity and equitable knowledge production.” This sounds scary—but it’s a very abstract concern. Can you define the problem more concretely? And can you explain the underlying mechanism that will cause the outcomes you are worried about?

I would also pay attention to whether the problem — whatever it is — is attributable to:

a general failure of copyright law to keep up with technological progress; or
specific issues with the rules relating specifically to scholarly content.

The answer has significant implications for your solution: Do we need specific rules concerning scholarly content production, or do we need more expansive reform? Is it actually possible to solve the “scholarly content” problem without overhauling copyright regimes in their entirety?

Also, when defining the problem, can you anticipate rebuttals? For example, isn’t it better if these models are made more reliable by being trained on scientific data (c.f. junk scraped off Wikipedia)? Maybe the answer is “no, because it creates a false sense of security”, but it’s still worth engaging with these kinds of intuitions.

Also consider: Is the problem model misinformation and harm that results, which could be addressed through tort reform rather than property reform? Or is it about imbalances of power? Or are you just bothered by the theft of people’s ideas as a matter of principle?

What is your original contribution?

I would have liked to see a crisp articulation of the original contribution you are purporting to make. A lot has already been written on the questions you pose in your introduction (albeit as part of broader articles addressing copyright and model training).

After reading the article, I went back to your abstract and thought about what was new here. My concern is that the claims are either: (a) not original; or (b) original but not yet convincing.

“The author argues that current regulatory frameworks in key jurisdictions like the United States, China, the European Union, and the United Kingdom, while aiming to foster innovation, contain significant gaps, particularly concerning the use of copyrighted works and open science outputs for AI training.”

Sure, but there is no serious debate about whether there are gaps in existing regulation in these jurisdictions.

“Widely adopted licensing mechanisms, such as Creative Commons, fail to adequately address the nuances of AI training, and the pervasive lack of attribution within AI systems fundamentally challenges established notions of originality.”

Indeed, this is the premise of all ongoing “AI and copyright” debates that I am aware of.

“While current doctrine treats AI training as potentially fair use, this paper argues such mechanisms are inadequate and that copyright holders should retain explicit opt-out rights regardless of fair use doctrine. Instead, the author advocates for upholding authors’ rights to refuse the use of their works for AI training and proposes that universities assume a leading role in shaping responsible AI governance.”

The status of this as an “important original claim” hinges on it being compelling. I am not yet persuaded that it is: see “Your solution” below.

“The conclusion is that a harmonized international legislative effort is urgently needed to ensure transparency, protect intellectual property, and prevent the emergence of an oligopolistic market structure that could prioritize commercial profit over scientific integrity and equitable knowledge production.”

Do any scholars disagree with this? Who is advocating for an oligopolistic market structure that chooses profit over science? Or: Who doubts that this is happening?

Ultimately, I think the burden is on researchers to show that they are adding something to the conversation; and if you agree, it would be worth articulating what your distinctive offering is. What are the limitations of the literature that you are trying to address? Or, alternatively, what are the things you were curious about — wanted to know about — but could not find information about in the literature?

Two further thoughts:

A premise of your article seems to be that because scholars working in the “AI training and fair use” space do not focus exclusively on scholarly content they are not addressing the issue. However, that is not necessarily the case: the issue is often dealt with in tandem with other issues, and maybe for good reason: piecemeal law reform can be problematic.
Broad claims do not discharge this function (e.g., “At the moment, the community lacks any meaningful discussion on the reuse of academic content for LLMs’ training”). A more nuanced analysis of the literature is needed.

Your solution

I have a few concerns about the solution you offer.

First, a premise of your argument is that you can discretely address the issue of fair use in the context of scholarly publishing. But the lines between different kinds of content are blurry, for which reason most scholars have chosen to debate copyright reform at a higher level of abstraction (i.e., writing about fair use and AI training generally). It is difficult to imagine law reform that deals with a discrete issue.

Second, you do not offer a compelling framework for evaluating tradeoffs, framing your solution as some kind of “silver bullet.” Silver bullet policy proposals are usually too good to be true—which I think is the case here.

Third, the nature of the role universities are supposed to play is unclear. For example, your claim that universities are at the forefront of developing GenAI models seems implausible. Isn’t it the big commercial players (OpenAI, Google, Microsoft, etc.) with the leading models? Moreover, how much bargaining power do universities really have here? It seems to me that the changes you want can only be procured through law reform, and universities are not lawmakers. I acknowledge that you recognize that universities “lack the regulatory power that governments possess.” Yet that does not negate the weakness of hollow solutions about what universities should/could do.

Fourth, can you pay more attention to the administrability of the scheme?

Fifth, how would your solution be implemented across diverse jurisdictions? One of the fundamental principles of policymaking is that you cannot always “transplant” something that worked in one jurisdiction to another. Can your solution be tailored? If so, how?

Avoid empty sentences

There are a few sentences throughout the paper that use lots of words to say very little. Consider, for example, the opening sentence of the abstract: “The integration of generative artificial intelligence (GenAI) and large language models (LLMs) into scientific research and higher education presents a paradigm shift, offering revolutionizing opportunities while simultaneously raising profound ethical, legal, and regulatory questions.” First, it is not clear whose paradigm stands to be shifted here. Second, “revolutionizing opportunities” is at once a contentious claim (how much of the promise of a “revolution” is hype?) and one that lacks clarity (what is the nature of this revolution?). Third, it is trite to say that AI raises profound questions. In place of these kinds of comments, I would try to get straight to the point. Here, that would entail saying what the paper is about without all the fluff.

Legal authority

It would be helpful if you pointed directly to legal authorities rather than relying on second-hand accounts. Consider the following examples:

“AI training is often considered a case of fair use (Johnson, 2024; Walsh, 2023). For instance, OpenAI argues that this position is “supported by long-standing and widely accepted precedents” (OpenAI and Journalism, 2024).” Is its argument compelling?

Or: “Moreover, the U.S. fair use doctrine allows for the unlicensed use of copyrighted works under certain circumstances.” What is that doctrine? Where does it come from? What is its status?

Or: “AI training is often considered a case of fair use.” In what circumstances? How do courts draw the lines?

Many of the answers eventually arrive at page 11. But I would hate to think that some law scholars would give up on the article prematurely because they assume – based on the first ten pages – that it is devoid of nuanced legal analysis.

Competing interests: None.

