[Paper] Creative Ownership in the Age of AI
Source: arXiv:2602.12270v1
Overview
The paper “Creative Ownership in the Age of AI” tackles a pressing legal and technical question: When does a generative‑AI output cross the line into copyright infringement?
The authors argue that the traditional “substantially similar” test is inadequate for AI systems that can mimic an artist’s style without copying any exact text or image. They propose a new, mathematically grounded criterion and explore its consequences for AI‑driven content creation.
Key Contributions
- New infringement definition – An AI‑generated work infringes if it could not have been produced without a particular copyrighted piece being present in the training data.
- Formal model using closure operators – Treats a generative system as a mathematical operator that maps a corpus of existing works to a set of possible outputs.
- Characterization of “permissible” outputs – Provides conditions under which AI‑generated content is legally safe under the new definition.
- Asymptotic dichotomy theorem – Shows that the tail behavior of the distribution of original works (light‑ vs. heavy‑tailed) determines whether regulation will eventually become irrelevant or remain restrictive for AI generation.
- Bridge between economics, AI theory, and copyright law – Integrates concepts from game theory, algorithmic learning, and legal doctrine to inform policy design.
Methodology
Modeling the creative ecosystem
- The authors treat the set of all existing creative works as a corpus \(C\).
- A generative‑AI system is modeled as a closure operator \(G\) that, given \(C\), produces a superset \(G(C)\) containing every work the system could output.
Defining infringement
- For any work \(w \in G(C)\), infringement occurs if removing a particular source work \(s \in C\) from the corpus makes \(w\) impossible to generate:
\[ w \notin G\bigl(C \setminus \{s\}\bigr). \]
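The necessity test can be made concrete with a toy model. The sketch below (an illustration, not the paper's construction) represents works as frozensets of "features" and uses a genuine closure operator — extensive, monotone, and idempotent — that closes a corpus under pairwise union. The `infringes` predicate then directly encodes the leave-one-out definition above.

```python
from itertools import combinations

def closure(corpus):
    """Toy closure operator G: works are frozensets of 'features';
    G(C) closes the corpus under pairwise union of feature sets
    until a fixed point is reached (extensive, monotone, idempotent)."""
    out = set(corpus)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(out), 2):
            u = a | b
            if u not in out:
                out.add(u)
                changed = True
    return out

def infringes(w, s, corpus):
    """The paper's necessity test: w infringes on source s iff w is
    producible from the full corpus but not once s is removed."""
    return w in closure(corpus) and w not in closure(corpus - {s})

corpus = {frozenset({"a"}), frozenset({"b"}), frozenset({"c"})}
w = frozenset({"a", "b"})
print(infringes(w, frozenset({"a"}), corpus))  # True: w cannot be built without "a"
print(infringes(w, frozenset({"c"}), corpus))  # False: w survives the removal of "c"
```

Note that the same output `w` can be "necessary" with respect to one source and not another, which is exactly the per-source granularity the definition requires.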
Statistical analysis of the corpus
- The distribution of “originality” among works is modeled with either a light‑tailed distribution (e.g., exponential) or a heavy‑tailed distribution (e.g., power‑law).
- Using tools from probability theory, the authors examine how the influence of any single work on \(G(C)\) behaves as the corpus grows.
Deriving the dichotomy
- Light‑tailed case: They prove that the marginal impact of any individual work vanishes asymptotically, implying that regulation eventually ceases to limit AI output.
- Heavy‑tailed case: Certain “high‑impact” works retain disproportionate influence, so the regulatory constraint remains active indefinitely.
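A quick Monte Carlo sketch illustrates the intuition behind this dichotomy (this is an illustrative simulation, not the paper's proof): draw each work's "influence" from a light-tailed exponential or a heavy-tailed Pareto distribution and track the largest single work's share of total influence as the corpus grows.

```python
import random

def max_share(draws):
    """Largest single work's share of the total influence."""
    return max(draws) / sum(draws)

def simulate(n, heavy, seed=0):
    """Sample n influence values and return the max share.
    heavy=True uses a Pareto tail with alpha < 1 (infinite mean);
    heavy=False uses an exponential (light) tail."""
    rng = random.Random(seed)
    if heavy:
        draws = [rng.paretovariate(0.8) for _ in range(n)]
    else:
        draws = [rng.expovariate(1.0) for _ in range(n)]
    return max_share(draws)

for n in (100, 10_000, 1_000_000):
    light = simulate(n, heavy=False)
    hvy = simulate(n, heavy=True)
    print(f"n={n:>9}  light-tail max share={light:.5f}  heavy-tail max share={hvy:.5f}")
```

In the light-tailed runs the top work's share shrinks toward zero as `n` grows (the max grows only logarithmically while the total grows linearly), whereas in the heavy-tailed runs a single work keeps commanding a non-vanishing share no matter how large the corpus gets.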
Results & Findings
| Scenario | Distribution of Original Works | Effect on AI‑Generated Output | Regulatory Implication |
|---|---|---|---|
| Light‑tailed (e.g., exponential) | Most works have similar, modest influence | Dependence on any single source fades as the corpus expands | After a certain scale, the new infringement rule imposes no practical limits on AI generation |
| Heavy‑tailed (e.g., power‑law) | A few works dominate the creative landscape (e.g., “canonical” novels, iconic songs) | Those dominant works continue to shape the output space, regardless of corpus size | Regulation remains persistently constraining; AI systems must still avoid generating content that hinges on those high‑impact works |
The dichotomy is sharp: a clear threshold in the tail‑heaviness of the underlying distribution flips the outcome from “unconstrained” to “constrained.”
Practical Implications
- For AI product teams – Knowing whether your training data follows a heavy‑tailed pattern can inform risk assessments. If your corpus is dominated by a few blockbuster works, you may need to add filters or adopt specific licensing strategies.
- Dataset curation – Building a more balanced dataset (e.g., by down‑weighting or excluding ultra‑popular works) can shift the distribution toward the light‑tailed regime, thereby reducing legal exposure under the proposed rule.
- Policy design – Regulators could adopt the "necessity" test (the paper's definition) as an objective standard, focusing enforcement on outputs that truly depend on protected works rather than on superficial stylistic similarity.
- Tooling – The closure‑operator framework suggests an audit pipeline: simulate the removal of a candidate source work from the training set and check whether a given output remains producible. This process can be automated for large‑scale compliance monitoring.
- Business models – Companies might consider pay‑per‑use licensing for works likely to remain in the heavy tail, turning a legal constraint into a revenue stream.
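The audit pipeline suggested under "Tooling" can be sketched as a leave-one-out loop. The `can_generate` callback below is a hypothetical stand-in for whatever (likely expensive) retrain-or-approximate producibility check a real deployment would use; the paper does not specify one.

```python
def necessity_audit(output_id, corpus_ids, can_generate):
    """Leave-one-out necessity audit (sketch).

    For each candidate source, ask whether `output_id` remains
    producible after that source is removed from the training set.
    Returns the list of sources the output depends on, or None if
    the output is not producible from the full corpus at all.
    `can_generate(output_id, training_ids)` is a hypothetical
    interface, not an API from the paper."""
    full = set(corpus_ids)
    if not can_generate(output_id, full):
        return None
    return [s for s in corpus_ids
            if not can_generate(output_id, full - {s})]

# Toy check: pretend the output is producible iff "s1" is in training.
def toy_can_generate(out, ids):
    return "s1" in ids

print(necessity_audit("w", ["s1", "s2"], toy_can_generate))  # ['s1']
```

In practice the producibility check is the hard part (see "Scope of 'necessity'" below); the loop itself is embarrassingly parallel, which is what makes large-scale compliance monitoring plausible.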
Limitations & Future Work
- Abstraction vs. reality – Modeling generative systems as pure closure operators abstracts away many practical details (e.g., stochastic sampling, temperature settings). Real‑world models may not fit the idealized mathematical properties assumed.
- Empirical validation – The paper provides a theoretical dichotomy but lacks large‑scale experiments on actual AI models and corpora to confirm the tail‑behavior predictions.
- Scope of “necessity” – Determining whether a work could not have been generated without a source may be computationally intensive; approximations or heuristics are needed for practical enforcement.
- Cross‑jurisdictional nuances – Copyright doctrines differ globally; the proposed definition may need adaptation to fit fair‑use doctrines, moral rights, or database rights in various legal systems.
- Future directions – The authors suggest extending the model to multimodal generation (text + image), exploring dynamic corpora where new works continuously enter the training set, and developing concrete algorithmic tools for “necessity” testing.
Bottom line: By reframing infringement as a question of necessity rather than similarity, this work offers a fresh lens for developers, product managers, and policymakers to navigate the evolving landscape of AI‑generated creativity. Understanding the statistical shape of your training data could be the key to building compliant, innovative generative systems.
Authors
- Annie Liang
- Jay Lu
Paper Information
| Item | Details |
|---|---|
| arXiv ID | 2602.12270v1 |
| Categories | econ.TH, cs.AI, cs.GT |
| Published | February 12, 2026 |