[Paper] Creative Ownership in the Age of AI
Source: arXiv - 2602.12270v1
Overview
The paper “Creative Ownership in the Age of AI” tackles a pressing legal and technical question: when does a generative‑AI output cross the line into copyright infringement? The authors argue that the traditional “substantial similarity” test is ill‑suited to AI systems, which can mimic an artist’s style without copying any exact text or image. They propose a new, mathematically grounded criterion and explore its consequences for AI‑driven content creation.
Key Contributions
- New infringement definition – An AI‑generated work infringes if it could not have been produced without a particular copyrighted piece being present in the training data.
- Formal model using closure operators – Treats a generative system as a mathematical operator that maps a corpus of existing works to a set of possible outputs.
- Characterization of “permissible” outputs – Provides conditions under which AI‑generated content is legally safe under the new definition.
- Asymptotic dichotomy theorem – Shows that the tail behavior of the distribution of original works (light‑ vs. heavy‑tailed) determines whether regulation will eventually become irrelevant or remain restrictive for AI generation.
- Bridge between economics, AI theory, and copyright law – Integrates concepts from game theory, algorithmic learning, and legal doctrine to inform policy design.
Methodology
Modeling the creative ecosystem
- The authors view the set of all existing creative works as a corpus C.
- A generative AI system is represented as a closure operator G that, given C, produces a superset G(C) of all works the system could output.
Defining infringement
- For any work w ∈ G(C), infringement occurs if removing a particular source work s ∈ C from the corpus makes w impossible to generate: w ∉ G(C \ {s}).
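As a toy illustration (not the paper's implementation), this counterfactual test can be phrased as a leave‑one‑out check against a stand‑in generator. The `generate` function below is a hypothetical, simplified proxy for the closure operator G; it merely "produces" every single work and every pairwise concatenation, and idempotence is not enforced:

```python
# Toy leave-one-out "necessity" test for the infringement criterion:
# w infringes on s if w is in G(C) but not in G(C \ {s}).
# `generate` is a hypothetical stand-in for the closure operator G.

from itertools import permutations

def generate(corpus: frozenset[str]) -> frozenset[str]:
    """Stand-in generator G: all single works plus pairwise concatenations."""
    outputs = set(corpus)
    for a, b in permutations(corpus, 2):
        outputs.add(a + b)
    return frozenset(outputs)

def infringes_on(w: str, s: str, corpus: frozenset[str]) -> bool:
    """True iff w is producible from the corpus but not without s."""
    return w in generate(corpus) and w not in generate(corpus - {s})

corpus = frozenset({"song", "poem", "novel"})
print(infringes_on("songpoem", "song", corpus))   # output depends on "song"
print(infringes_on("poemnovel", "song", corpus))  # output survives removing "song"
```

The same leave‑one‑out structure carries over to real systems, where the expensive part is deciding membership in G(C) rather than enumerating it.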
Statistical analysis of the corpus
- The distribution of “originality” among works is modeled with either a light‑tailed (e.g., exponential) or heavy‑tailed (e.g., power‑law) probability distribution.
- Using tools from probability theory, the authors study how the influence of any single work on G(C) behaves as the corpus grows.
Deriving the dichotomy
- They prove that with a light‑tailed distribution, the marginal impact of any individual work vanishes asymptotically, meaning regulation eventually stops limiting AI output.
- Conversely, with a heavy‑tailed distribution, certain “high‑impact” works retain disproportionate influence, keeping the regulatory constraint active indefinitely.
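The contrast can be illustrated numerically (a sketch of the intuition, not the paper's proof): if each work's influence is modeled as an i.i.d. weight, the share of total influence held by the single largest weight vanishes under an exponential (light‑tailed) law but remains macroscopic under a power law with infinite mean:

```python
# Numerical illustration of the light- vs heavy-tailed dichotomy (a sketch,
# not the paper's model): draw n i.i.d. "influence" weights and measure the
# share of total influence held by the single most influential work.

import numpy as np

def max_share(samples: np.ndarray) -> float:
    """Fraction of total weight contributed by the largest sample."""
    return float(samples.max() / samples.sum())

rng = np.random.default_rng(0)
n = 100_000

light = rng.exponential(scale=1.0, size=n)   # light-tailed weights
heavy = rng.pareto(a=0.5, size=n) + 1.0      # power-law weights, infinite mean

print(f"exponential max share: {max_share(light):.5f}")  # tiny: no single work dominates
print(f"power-law   max share: {max_share(heavy):.5f}")  # large: one work dominates
```

Growing n shrinks the exponential share toward zero while the power‑law share stays bounded away from it, mirroring the regulation‑fades vs. regulation‑persists dichotomy.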
Results & Findings
| Scenario | Distribution of Original Works | Effect on AI‑Generated Output | Regulatory Implication |
|---|---|---|---|
| Light‑tailed (e.g., exponential) | Most works have similar, modest influence | Dependence on any single source fades as the corpus expands | After a certain scale, the new infringement rule imposes no practical limits on AI generation |
| Heavy‑tailed (e.g., power‑law) | A few works dominate the creative landscape (think “canonical” novels, iconic songs) | Those dominant works continue to shape the output space, regardless of corpus size | Regulation remains persistently constraining; AI systems must still avoid generating content that hinges on those high‑impact works |
The dichotomy is sharp: there is a clear threshold in the tail‑heaviness of the underlying distribution that flips the outcome from “unconstrained” to “constrained”.
Practical Implications
- For AI product teams – Understanding whether your training data follows a heavy‑tailed pattern can inform risk assessments. If you’re training on a corpus dominated by a few blockbuster works, you may need to implement additional filters or licensing strategies.
- Dataset curation – Curating a more balanced dataset (e.g., by down‑weighting or excluding ultra‑popular works) could shift the distribution toward the light‑tailed regime, reducing legal exposure under the proposed rule.
- Policy design – Regulators could adopt the “necessity” test (the paper’s definition) as a more objective standard, focusing enforcement on outputs that truly depend on protected works rather than on superficial stylistic similarity.
- Tooling – The closure‑operator framework suggests a possible audit pipeline: simulate removal of a candidate source work from the training set and check whether a given output remains producible. This could be automated for large‑scale compliance monitoring.
- Business models – Companies may consider pay‑per‑use licensing for high‑impact works that are likely to stay in the heavy‑tailed tail, turning a legal constraint into a revenue stream.
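One way to act on the dataset‑curation point is a tail‑index diagnostic over per‑work popularity or influence weights. The Hill estimator below is a standard statistical tool, not something the paper prescribes; smaller estimated indices signal a heavier tail, i.e., a corpus where a few works dominate:

```python
# Hill estimator for the tail index of per-work "popularity" weights --
# a standard diagnostic (not from the paper) for whether a corpus sits in
# the heavy-tailed regime. Smaller alpha means a heavier tail.

import numpy as np

def hill_tail_index(samples: np.ndarray, k: int = 500) -> float:
    """Hill estimate of the tail index alpha from the k largest samples."""
    x = np.sort(samples)[::-1]                      # descending order statistics
    gamma = np.mean(np.log(x[:k]) - np.log(x[k]))   # mean log-excess over threshold
    return float(1.0 / gamma)

rng = np.random.default_rng(1)
heavy = rng.pareto(a=1.5, size=20_000) + 1.0   # synthetic corpus, true tail index 1.5

print(f"estimated tail index: {hill_tail_index(heavy):.2f}")
```

Applied to real download counts, citation counts, or sampling frequencies, such an estimate gives a rough, quantitative handle on which regime a training corpus falls into.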
Limitations & Future Work
- Abstraction vs. reality – Modeling generative systems as pure closure operators abstracts away many practical details (e.g., stochastic sampling, temperature settings). Real‑world models may not fit the idealized mathematical properties assumed.
- Empirical validation – The paper provides a theoretical dichotomy but lacks large‑scale experiments on actual AI models and corpora to confirm the tail‑behavior predictions.
- Scope of “necessity” – Determining whether a work could not have been generated without a source may be computationally intensive; approximations or heuristics are needed for practical enforcement.
- Cross‑jurisdictional nuances – Copyright doctrines differ globally; the proposed definition may need adaptation to fit fair‑use doctrines, moral rights, or database rights in various legal systems.
- Future directions – The authors suggest extending the model to multi‑modal generation (text + image), exploring dynamic corpora where new works continuously enter the training set, and developing concrete algorithmic tools for “necessity” testing.
Bottom line: By reframing infringement as a question of necessity rather than similarity, this work offers a fresh lens for developers, product managers, and policymakers to navigate the evolving landscape of AI‑generated creativity. Understanding the statistical shape of your training data could be the key to building compliant, innovative generative systems.
Authors
- Annie Liang
- Jay Lu
Paper Information
- arXiv ID: 2602.12270v1
- Categories: econ.TH, cs.AI, cs.GT
- Published: February 12, 2026