Voice Cloning and IP Law: The Legal Gap AI Is Exploiting Right Now

Kwame Asante

May 22, 2026

Educational Content – Not Legal Advice

This article provides general information. Consult a qualified attorney before taking action.

Last Updated: May 22, 2026

Taylor Swift and Matthew McConaughey Filed Trademark Applications for Their Voices. That Is Not a PR Move. It Is a Legal Emergency Response.

Taylor Swift's management entity, TAS Rights Management, LLC, filed applications with the USPTO in April 2026 to register sound marks of her voice saying phrases like "Hey, it's Taylor Swift." Matthew McConaughey secured the registration of "ALRIGHT ALRIGHT ALRIGHT"—including his specific intonation and cadence—as a sensory mark in 2025. Neither of these is a branding exercise. Both are adaptive responses to a structural failure in intellectual property law that generative AI has made impossible to ignore.

The core problem has a name in the doctrine: the vocal identity gap. Copyright protects the fixation of a sound performance—the phonogram, the audio file, the "container." What it does not protect are the biological and stylistic properties of the voice that produced it: the fundamental frequency, the timbre, the cadence patterns, the way a person shapes a syllable. Those qualities sit in a legal vacuum. And generative AI has learned to exploit that vacuum at industrial scale.

What Copyright Actually Protects—and What It Leaves Open

Under 17 U.S.C. § 102(a)(7), copyright protection in the United States attaches to sound recordings fixed in a tangible medium. The EU framework, governed by the InfoSoc Directive (2001/29/EC), follows the same logic. Protection activates when a performance is captured in a concrete form. The style, the timbre and the identity signal of the voice that created that recording are not covered.

Section 114(b) of U.S. copyright law makes this explicit: the exclusive rights of a sound recording owner do not prevent the making of an independent recording that imitates or simulates the sounds of the original. In other words, Congress anticipated imitation and permitted it—back when it was still impossible to imagine that a machine could do it perfectly, in seconds, at zero marginal cost.

This is not a theoretical gap. When the song Heart on My Sleeve circulated in 2023, using AI-generated replicas of Drake's and The Weeknd's voices, it demonstrated that current models can absorb phrasing patterns, timbre and interpretive cadence to generate entirely new content (ex novo) that audiences perceive as authentic. The legal conclusion was uncomfortable: because the synthetic vocals reproduced no existing phonogram, the imitation itself infringed no sound recording copyright. The system learned the identity signal without copying the protected container.

The Technology That Broke the Fixation Paradigm

Understanding the legal problem requires understanding what makes it technically possible.

Many current voice cloning systems rely on Generative Adversarial Networks (GANs), often alongside other neural architectures. A GAN pits two neural networks against each other: the "generator" produces synthetic voice samples, and the "discriminator" attempts to distinguish real samples from artificial ones. Through iterative competition, the generator improves its ability to emulate timbres, intonations and cadences until the replicas become very hard for the human ear to tell apart from the real voice. Related techniques, from earlier Hidden Markov Model (HMM) synthesis to neural WaveNet architectures, allow speech patterns to be modeled from increasingly small amounts of training data, dramatically lowering both the computational cost and the entry barrier for abuse.
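The generator-versus-discriminator competition described above is usually written as a minimax objective. The formula below is the standard textbook form from the original GAN formulation, not something stated in the study this article draws on. The symbols are introduced here for illustration: x is a real voice sample, z is the random input fed to the generator G, and D(·) is the discriminator's estimated probability that its input is real.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to maximize V by scoring real samples high and synthetic ones low, while the generator tries to minimize it by producing samples the discriminator accepts. Practical voice systems typically add conditioning inputs and further loss terms, so this should be read as the core idea and not as a description of any particular product.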

The legally critical point is that these models do not operate by duplicating protected audio fragments. The system learns the parameters of the identity signal (frequency, intensity, pitch signature) and uses them to generate something entirely new. This severs the conceptual link that the entire fixation-based copyright framework was built upon. You can appropriate the essence of a person's voice without ever copying a single file they own.
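To make "parameters of the identity signal" concrete, here is a minimal sketch of how two of them, fundamental frequency and intensity, can be measured from raw samples. It is a classroom-style autocorrelation estimator, not how any specific cloning system works. The function names, the 80 to 400 Hz search range and the synthetic 200 Hz tone standing in for a voiced speech frame are all assumptions made for illustration.

```python
import math


def sine_wave(freq_hz, sample_rate, n_samples, amplitude=0.5):
    """Synthesize a pure tone as a stand-in for a voiced speech frame."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(n_samples)
    ]


def estimate_f0(samples, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate fundamental frequency from the autocorrelation peak.

    Only lags corresponding to the assumed fmin..fmax range are searched.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(
            samples[n] * samples[n + lag] for n in range(len(samples) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def rms_intensity(samples):
    """Root-mean-square level, a simple proxy for intensity."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


SAMPLE_RATE = 8000  # Hz, assumed for the example
frame = sine_wave(200.0, SAMPLE_RATE, 800)
f0 = estimate_f0(frame, SAMPLE_RATE)
level = rms_intensity(frame)
print(f"estimated f0: {f0:.1f} Hz, RMS level: {level:.3f}")
```

Nothing in the output is a copy of the input audio: the result is two numbers describing the voice. That is the sense in which a model can learn from a recording without reproducing it.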

What Came Before: The Precedents That Protect—and Their Limits

Voice protection was not nonexistent before generative AI. The Ninth Circuit had been building a coherent doctrine since the 1980s that recognized voice as a legally protectable identity attribute in the context of commercial appropriation.

Midler v. Ford Motor Co. (849 F.2d 460, 9th Cir. 1988) remains the foundational case. When Bette Midler refused to participate in a Ford advertising campaign, the agency hired a backing vocalist to deliberately imitate her timbre and style. The court held that while a voice is not a copyrightable fixed work under § 102, the deliberate imitation of a distinctive professional voice to sell a product constitutes a tort under California common law. The reasoning was explicit: a person's voice is as distinctive and personal an attribute as a face.

Waits v. Frito-Lay, Inc. (978 F.2d 1093, 9th Cir. 1992) extended the doctrine to style imitation without a pre-existing song. The court not only affirmed voice protection as an identity attribute, but validated a false endorsement claim under § 43(a) of the Lanham Act: audiences may be deceived about a celebrity's commercial association with a product. This added a federal trademark dimension to what had been a state-law right of publicity claim.

These precedents are solid. Their limitation is structural: they presuppose a human actor who deliberately imitates another human actor. Generative AI achieves the same result without a discernible intentional actor, without using any protected recording, and at a scale that makes case-by-case enforcement under state right of publicity statutes effectively meaningless.

Two Artists, One Strategic Intuition—and Its Doctrinal Tension

The trademark filing strategy pursued by McConaughey and Swift is clever precisely because it redirects protection from identity law to commercial origin law. Trademark law protects distinctiveness—the capacity of a sign to function as a source identifier in commerce. A registered sound mark gives its holder federal standing to challenge AI-generated content that creates a likelihood of confusion or dilution in the marketplace, regardless of whether any protected phonogram was reproduced.

McConaughey's registration of "ALRIGHT ALRIGHT ALRIGHT"—combined with his specific intonation—as a sensory mark creates a formal nexus between his vocal delivery and a unique commercial origin. Swift's pending applications for her voice saying "Hey, it's Taylor Swift" pursue the same logic at a broader scale, attempting to build overlapping layers of protection (marks, image rights, copyright) capable of intercepting synthetic content that uses her vocal identity to imply unauthorized affiliation or endorsement.

Legal scholars have rightly identified the doctrinal tension in this approach. Trademark law was designed to protect source identifiers, not personality attributes. Stretching it to cover identity signals may generate complex precedents that neither fully protect the person nor remain coherent within existing trademark doctrine. As one University of Reading commentator noted in 2026, the fundamental question—whether trademark law can be the right vehicle for voice identity protection—remains unanswered.

The strategy is a pragmatic patch over a structural hole. It is not a substitute for comprehensive identity protection.

The Johansson–OpenAI Incident: When AI Evokes Without Copying

The incident between Scarlett Johansson and OpenAI in May 2024 illustrates a dimension of the problem that the trademark approach cannot address: protection against an AI system that evokes without copying.

OpenAI launched its GPT-4o update to ChatGPT featuring a voice called "Sky," whose similarity to Johansson's voice in the film Her was widely noted. The company maintained that the voice belonged to a different actress and was not modeled on Johansson's. But the suggestion of intentional evocation was hard to dismiss: Sam Altman had previously approached Johansson about providing her voice for the product (she declined), and he posted the word "Her" on social media during the demonstration.

The case never reached a court. But the legal question it leaves open is critical: can identity be protected against an AI that evokes but does not copy actual biometric data? Data protection law—both under GDPR and analogous U.S. frameworks—generally requires the processing of real personal data belonging to the affected individual. A system that constructs a perceptually similar voice from scratch does not obviously trigger those protections. The right of publicity or personality rights frameworks would have offered a more robust pathway, because their foundation is not data processing but identity appropriation.

The "Sky" case demonstrates that the value of a voice does not reside solely in its acoustic frequency. It resides in the stylistic attributes and cultural associations that audiences project onto the physical person. What is appropriated in these cases is not an audio file—it is a mental association. Legal frameworks need to be capable of protecting that.

A Fragmented Global Response

The comparative picture shows a fundamental divergence in how different legal cultures conceptualize voice: as an economic asset or as a personal attribute.

United States. Protection remains fragmented at the state level. The right of publicity is a state law creation with no federal equivalent—until, possibly, the NO FAKES Act (Nurture Originals, Foster Art, and Keep Entertainment Safe Act), reintroduced in May 2026 with broad industry support. The proposal would establish a uniform, transferable federal intellectual property right over digital voice and image replicas, extending beyond the rights holder's death. Tennessee's ELVIS Act (2024) marked a first by explicitly naming "voice" as a protected attribute against AI model use and imposing secondary liability on platforms that knowingly distribute unauthorized digital replicas. New York's Digital Replica Law (2025) took a contractual approach, voiding agreements that replace an artist's live performance without clearly defining the scope of replica use.

European Union. The AI Act (Regulation (EU) 2024/1689) addresses the problem through transparency: Article 50 requires deployers of AI systems that generate deepfakes to disclose that the content has been artificially generated or manipulated. This protects the public interest against disinformation but delegates personal integrity protection to existing personality rights and data protection frameworks. The more structurally significant development is Denmark's legislative initiative to create an autonomous intellectual property right over voice and physical appearance, conceived as a neighbouring right that makes vocal identity a commercially exploitable and licensable asset while integrating inalienable moral rights preventing degrading use or unauthorized alteration.

Spain. Organic Law 1/1982 (LODH) explicitly protects voice only against advertising or commercial uses. Integration of voice as biometric data under the LOPDGDD has strengthened the defensive architecture, but the data protection pathway has a structural limitation: its strictly personal nature makes it difficult to exercise iure hereditatis after the rights holder's death. Recent lege ferenda proposals have called for reforming the LODH to codify "the right to one's voice" as an autonomous right, severed from the image right, to better address sonic impersonation cases that do not necessarily affect honor or reputation.

China. The Civil Code (2020), Article 1023, explicitly extends the image right framework to voice, requiring consent for any reproduction that allows unique identification. The Beijing Internet Court has established relevant precedents by prioritizing the "identifiability" of the personhood signal over the fixation requirement of Western copyright doctrine—allowing artists to bring claims even when an AI generates a new performance that merely emulates their style.

What Needs to Change: A Case for Biometric Voice Registration

The existing patchwork—sound marks, state right of publicity, data protection, ELVIS-type statutes—is a collection of adaptive responses to a problem that requires a structural solution. The insufficiency of traditional IP frameworks demands an ontological reconsideration of what voice is in intellectual property law.

The most robust lege ferenda proposal is the recognition of voice as an autonomous intellectual property object through an official register of "biometric vocal fingerprints" or "personhood signals." This model, inspired by Denmark's neighbouring right initiative, would allow vocal identity to be treated as an intangible, autonomous and commercially exploitable asset. The registration would not protect a specific recording but the bioacoustic parameters and cadence patterns that allow unique identification of the individual—a biometric "deed."
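As a rough illustration of what checking a sample against such a register could involve, the sketch below compares a query "voiceprint" with registered ones using cosine similarity. Everything here is hypothetical: the proposal does not specify a feature format, so the three-number vectors, the holder identifiers, the 0.95 threshold and the function names are assumptions chosen only to show the matching logic.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_voiceprint(query, registry, threshold=0.95):
    """Return the registered holder most similar to `query`, or None.

    A match is reported only if the similarity reaches `threshold`
    (a hypothetical value; a real register would need a calibrated one).
    """
    best_id, best_score = None, threshold
    for holder_id, voiceprint in registry.items():
        score = cosine_similarity(query, voiceprint)
        if score >= best_score:
            best_id, best_score = holder_id, score
    return best_id


# Hypothetical register: holder id -> toy bioacoustic feature vector.
registry = {
    "holder_a": [1.0, 0.0, 0.0],
    "holder_b": [0.0, 1.0, 0.0],
}

close_match = match_voiceprint([0.9, 0.1, 0.0], registry)
no_match = match_voiceprint([0.5, 0.5, 0.7], registry)
print(close_match, no_match)
```

The design point is that the register stores parameters, not recordings, and the legally relevant question becomes where the similarity threshold sits. That is a policy choice as much as a technical one.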

For the system to function, three complementary elements are required.

A stratified licensing system. For voice professionals, this means "single-use vocal contracts" that explicitly prohibit the use of recordings for AI model training unless a specific, high-value synthesis license is granted. For general-purpose AI (GPAI) models, it means a transition from the current opt-out framework to a "double opt-in" protocol as a mandatory condition for training on materials containing identifiable voices. For posthumous exploitation, it means a controlled "digital resurrection right", analogous to the NO FAKES Act provisions, granting heirs control over digital replicas for a defined period (fifty to seventy years post mortem).

Mandatory technical traceability. Compulsory implementation of cryptographic watermarking and content provenance technologies in voice synthesis engines. These watermarks, imperceptible to the human ear but algorithmically detectable, would enable verification of whether an audio sample was artificially generated and by which AI model. This is essential for making secondary liability regimes, such as the one established in Tennessee's ELVIS Act, enforceable in practice.

International coordination through WIPO. Registration should be centralized in national IP offices—the Spanish OEPM, the USPTO—under international standards coordinated by WIPO. The continental European model would additionally require the integration of inalienable moral rights empowering the holder to oppose degrading uses or unauthorized alteration of their digital identity, even after licensing the economic exploitation rights.
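The traceability element described above, a mark that is inaudible but algorithmically detectable, can be illustrated with a toy spread-spectrum scheme. This is a sketch under stated assumptions, not a production or standardized method. The key values, signal parameters and function names are invented for the example. Python's `random` module is not cryptographically secure, and a real system would also need perceptual shaping and robustness to compression.

```python
import math
import random


def keyed_sequence(key, n):
    """Pseudo-random +/-1 sequence derived from a secret key (toy scheme)."""
    rng = random.Random(key)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(n)]


def embed_watermark(samples, key, strength=0.02):
    """Add a low-amplitude keyed sequence on top of the audio samples."""
    seq = keyed_sequence(key, len(samples))
    return [s + strength * c for s, c in zip(samples, seq)]


def detection_score(samples, key):
    """Average correlation between the audio and the keyed sequence."""
    seq = keyed_sequence(key, len(samples))
    return sum(s * c for s, c in zip(samples, seq)) / len(samples)


def is_watermarked(samples, key, strength=0.02):
    """Flag audio whose correlation exceeds half the embedding strength."""
    return detection_score(samples, key) > strength / 2


SAMPLE_RATE = 8000  # Hz, assumed for the example
# Five seconds of a 200 Hz tone standing in for synthesized speech.
host = [
    0.3 * math.sin(2 * math.pi * 200.0 * n / SAMPLE_RATE) for n in range(40000)
]
marked = embed_watermark(host, key=1234)  # key 1234: hypothetical engine key

marked_detected = is_watermarked(marked, key=1234)
clean_detected = is_watermarked(host, key=1234)
wrong_key_detected = is_watermarked(marked, key=9999)
print(marked_detected, clean_detected, wrong_key_detected)
```

If each synthesis engine were assigned its own key, a positive detection would indicate both that the audio is synthetic and which engine produced it, which is the attribution function the proposal relies on.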

The Ontological Question the Law Cannot Defer

The comparative analysis ultimately converges on a question that legal systems have not yet answered explicitly.

Is the human voice an inalienable attribute of the physical person? Or is it a reproducible asset subject to the market logic of digital goods?

Continental European law has tended toward the first answer: voice is a manifestation of identity, protected as such through human dignity. The Anglo-Saxon right of publicity framework has tended, implicitly, toward the second: what it protects is the market value that the holder has built around their identity.

Generative AI has collapsed the distance between these two positions into an immediate practical problem. When a voice can be separated from the person who produced it, replicated at zero marginal cost, and exploited indefinitely without requiring that person to be alive, the question of its legal nature carries enormous practical consequences: who can authorize it, who can inherit it, who can prohibit it and, above all, who protects the individual whose acoustic identity has become raw material for a generative model without their knowledge or consent.

The four doctrinal conclusions this analysis supports are these:

1. The fixation paradigm is obsolete. Generative models such as GANs synthesize ex novo performances without copying any phonogram, leaving the "personhood signal" in a protection vacuum that traditional copyright cannot cover.
2. Sound marks are an intelligent patch, not a solution. McConaughey and Swift have found pragmatic defenses, but extending trademark law to cover personality attributes creates fragmented precedents that do not substitute for comprehensive identity protection.
3. Global divergence is a real problem. Tennessee has the ELVIS Act, Denmark has its neighbouring right proposal, and Spain has an LODH whose proposed reform is still pending. The absence of international harmonization allows AI models to operate in the gaps between jurisdictions.
4. A biometric voice registration system, complemented by technical traceability and stratified licensing, is the most robust proposal available today. It treats voice as what it actually is: an autonomous intangible asset deserving protection independent of any specific recording that may contain it.

The decision that legal systems must make is whether the human voice will retain its status as an inalienable attribute of the physical person—or whether it will be surrendered permanently to the logic of the digital market as a reproducible asset without limit. That is, at its core, a question about the nature of identity in the age of machines that learn.

This article draws on the academic study "La voz como objeto de propiedad intelectual en la era de la IA generativa: registro, protección y desafíos ante la clonación sintética. Análisis de los casos McConaughey, Swift y Johansson" (Firma Scarpa — Derecho & Inteligencia Artificial, May 2026), available in full from the firm's research channels.
