[Paper] Creative Ownership in the Age of AI
Source: arXiv:2602.12270v1
Overview
The paper “Creative Ownership in the Age of AI” tackles a pressing legal and technical question: When does a generative‑AI output cross the line into copyright infringement?
The authors argue that the traditional “substantially similar” test is inadequate for AI systems that can mimic an artist’s style without copying any exact text or image. They propose a new, mathematically grounded criterion and explore its consequences for AI‑driven content creation.
Key Contributions
- New infringement definition – An AI‑generated work infringes if it could not have been produced without a particular copyrighted piece being present in the training data.
- Formal model using closure operators – Treats a generative system as a mathematical operator that maps a corpus of existing works to a set of possible outputs.
- Characterization of “permissible” outputs – Provides conditions under which AI‑generated content is legally safe under the new definition.
- Asymptotic dichotomy theorem – Shows that the tail behavior of the distribution of original works (light‑ vs. heavy‑tailed) determines whether regulation will eventually become irrelevant or remain restrictive for AI generation.
- Bridge between economics, AI theory, and copyright law – Integrates concepts from game theory, algorithmic learning, and legal doctrine to inform policy design.
Methodology
Modeling the creative ecosystem
- The authors treat the set of all existing creative works as a corpus \(C\).
- A generative‑AI system is modeled as a closure operator \(G\) that, given \(C\), produces a superset \(G(C)\) containing every work the system could output.
Defining infringement
- For any work \(w \in G(C)\), infringement occurs if removing a particular source work \(s \in C\) from the corpus makes \(w\) impossible to generate:
\[ w \notin G\bigl(C \setminus \{s\}\bigr). \]
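The necessity test can be made concrete with a toy model. The sketch below (an illustration, not the paper's construction) represents works as frozensets of "features" and uses a genuine closure operator — extensive, monotone, and idempotent — that closes a corpus under pairwise union. The `infringes` predicate then directly encodes the leave-one-out definition above.

```python
from itertools import combinations

def closure(corpus):
    """Toy closure operator G: works are frozensets of 'features';
    G(C) closes the corpus under pairwise union of feature sets
    until a fixed point is reached (extensive, monotone, idempotent)."""
    out = set(corpus)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(out), 2):
            u = a | b
            if u not in out:
                out.add(u)
                changed = True
    return out

def infringes(w, s, corpus):
    """The paper's necessity test: w infringes on source s iff w is
    producible from the full corpus but not once s is removed."""
    return w in closure(corpus) and w not in closure(corpus - {s})

corpus = {frozenset({"a"}), frozenset({"b"}), frozenset({"c"})}
w = frozenset({"a", "b"})
print(infringes(w, frozenset({"a"}), corpus))  # True: w cannot be built without "a"
print(infringes(w, frozenset({"c"}), corpus))  # False: w survives the removal of "c"
```

Note that the same output `w` can be "necessary" with respect to one source and not another, which is exactly the per-source granularity the definition requires.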
Statistical analysis of the corpus
- The distribution of “originality” among works is modeled with either a light‑tailed distribution (e.g., exponential) or a heavy‑tailed distribution (e.g., power‑law).
- Using tools from probability theory, the authors examine how the influence of any single work on \(G(C)\) behaves as the corpus grows.
Deriving the dichotomy
- Light‑tailed case: They prove that the marginal impact of any individual work vanishes asymptotically, implying that regulation eventually ceases to limit AI output.
- Heavy‑tailed case: Certain “high‑impact” works retain disproportionate influence, so the regulatory constraint remains active indefinitely.
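A quick Monte Carlo sketch illustrates the intuition behind this dichotomy (this is an illustrative simulation, not the paper's proof): draw each work's "influence" from a light-tailed exponential or a heavy-tailed Pareto distribution and track the largest single work's share of total influence as the corpus grows.

```python
import random

def max_share(draws):
    """Largest single work's share of the total influence."""
    return max(draws) / sum(draws)

def simulate(n, heavy, seed=0):
    """Sample n influence values and return the max share.
    heavy=True uses a Pareto tail with alpha < 1 (infinite mean);
    heavy=False uses an exponential (light) tail."""
    rng = random.Random(seed)
    if heavy:
        draws = [rng.paretovariate(0.8) for _ in range(n)]
    else:
        draws = [rng.expovariate(1.0) for _ in range(n)]
    return max_share(draws)

for n in (100, 10_000, 1_000_000):
    light = simulate(n, heavy=False)
    hvy = simulate(n, heavy=True)
    print(f"n={n:>9}  light-tail max share={light:.5f}  heavy-tail max share={hvy:.5f}")
```

In the light-tailed runs the top work's share shrinks toward zero as `n` grows (the max grows only logarithmically while the total grows linearly), whereas in the heavy-tailed runs a single work keeps commanding a non-vanishing share no matter how large the corpus gets.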
Results & Findings
| Scenario | Distribution of Original Works | Effect on AI‑Generated Output | Regulatory Implication |
|---|---|---|---|
| Light‑tailed (e.g., exponential) | Most works have similar, modest influence | Dependence on any single source fades as the corpus expands | After a certain scale, the new infringement rule imposes no practical limits on AI generation |
| Heavy‑tailed (e.g., power‑law) | A few works dominate the creative landscape (e.g., “canonical” novels, iconic songs) | Those dominant works continue to shape the output space, regardless of corpus size | Regulation remains persistently constraining; AI systems must still avoid generating content that hinges on those high‑impact works |
The dichotomy is sharp: a clear threshold in the tail‑heaviness of the underlying distribution flips the outcome from “unconstrained” to “constrained.”
Practical Implications
- For AI product teams – Knowing whether your training data follows a heavy‑tailed pattern can inform risk assessments. If your corpus is dominated by a few blockbuster works, you may need to add filters or adopt specific licensing strategies.
- Dataset curation – Building a more balanced dataset (e.g., by down‑weighting or excluding ultra‑popular works) can shift the distribution toward the light‑tailed regime, thereby reducing legal exposure under the proposed rule.
- Policy design – Regulators could adopt the "necessity" test (the paper's definition) as an objective standard, focusing enforcement on outputs that truly depend on protected works rather than on superficial stylistic similarity.
- Tooling – The closure‑operator framework suggests an audit pipeline: simulate the removal of a candidate source work from the training set and check whether a given output remains producible. This process can be automated for large‑scale compliance monitoring.
- Business models – Companies might consider pay‑per‑use licensing for works likely to remain in the heavy tail, turning a legal constraint into a revenue stream.
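The audit pipeline suggested under "Tooling" can be sketched as a leave-one-out loop. The `can_generate` callback below is a hypothetical stand-in for whatever (likely expensive) retrain-or-approximate producibility check a real deployment would use; the paper does not specify one.

```python
def necessity_audit(output_id, corpus_ids, can_generate):
    """Leave-one-out necessity audit (sketch).

    For each candidate source, ask whether `output_id` remains
    producible after that source is removed from the training set.
    Returns the list of sources the output depends on, or None if
    the output is not producible from the full corpus at all.
    `can_generate(output_id, training_ids)` is a hypothetical
    interface, not an API from the paper."""
    full = set(corpus_ids)
    if not can_generate(output_id, full):
        return None
    return [s for s in corpus_ids
            if not can_generate(output_id, full - {s})]

# Toy check: pretend the output is producible iff "s1" is in training.
def toy_can_generate(out, ids):
    return "s1" in ids

print(necessity_audit("w", ["s1", "s2"], toy_can_generate))  # ['s1']
```

In practice the producibility check is the hard part (see "Scope of 'necessity'" below); the loop itself is embarrassingly parallel, which is what makes large-scale compliance monitoring plausible.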
Limitations & Future Work
- Abstraction vs. reality – Modeling generative systems as pure closure operators abstracts away many practical details (e.g., stochastic sampling, temperature settings). Real‑world models may not fit the idealized mathematical properties assumed.
- Empirical validation – The paper provides a theoretical dichotomy but lacks large‑scale experiments on actual AI models and corpora to confirm the tail‑behavior predictions.
- Scope of “necessity” – Determining whether a work could not have been generated without a source may be computationally intensive; approximations or heuristics are needed for practical enforcement.
- Cross‑jurisdictional nuances – Copyright doctrines differ globally; the proposed definition may need adaptation to fit fair‑use doctrines, moral rights, or database rights in various legal systems.
- Future directions – The authors suggest extending the model to multimodal generation (text + image), exploring dynamic corpora where new works continuously enter the training set, and developing concrete algorithmic tools for “necessity” testing.
Bottom line: By reframing infringement as a question of necessity rather than similarity, this work offers a fresh lens for developers, product managers, and policymakers to navigate the evolving landscape of AI‑generated creativity. Understanding the statistical shape of your training data could be the key to building compliant, innovative generative systems.
Authors
- Annie Liang
- Jay Lu
Paper Information
| Item | Details |
|---|---|
| arXiv ID | 2602.12270v1 |
| Categories | econ.TH, cs.AI, cs.GT |
| Published | February 12, 2026 |