Last updated: April 10, 2026
An AI-assisted rewrite of one of Python's most-downloaded libraries has ignited the most consequential open-source licensing crisis in years — and no one agrees on what the law actually says. On March 2, 2026, maintainer Dan Blanchard released chardet 7.0.0 under an MIT license, replacing the LGPL that had governed the library since its creation in 2006. He used Anthropic's Claude Code to perform what he calls an independent rewrite. Two days later, original author Mark Pilgrim — who famously disappeared from the internet in 2011 — resurfaced to file GitHub Issue #327, calling the relicensing "an explicit violation of the LGPL." The dispute has since drawn responses from Bruce Perens, the Free Software Foundation, Salvatore Sanfilippo (antirez), and dozens of prominent developers, exposing a legal vacuum at the intersection of AI code generation, copyright law, and copyleft enforcement that no court has yet addressed.
The chardet library receives ~130 million downloads per month on PyPI and has roughly 955,000 dependent repositories. As of April 10, 2026, version 7.4.1 is the latest release on PyPI under the 0BSD license — a public-domain-equivalent license that functions identically whether the code is copyrightable or not. Issues #327, #331, and #334 are all closed; Issue #355 — where Fontana clarified his earlier LGPL statements and Kuhn announced SFC's investigation — has since been converted into GitHub Discussion #362. No legal action has been filed. Blanchard has consulted with an attorney and publicly noted that if the legal consensus settles on AI-generated output being non-copyrightable, he would be comfortable declaring chardet 7.x public domain — a conclusion that, if correct, would defeat Pilgrim's LGPL claim (no copyright means no LGPL enforcement) and is now gracefully accommodated by the 0BSD license.
March 2: Blanchard releases chardet 7.0.0 under MIT. The same day, the U.S. Supreme Court denies certiorari in Thaler v. Perlmutter, solidifying the human authorship requirement for copyright.
March 4: Mark Pilgrim resurfaces and opens Issue #327. A separate issue, #325 ("Nullified License") — arguing that the MIT license is itself invalid because AI output is public domain — is opened and quickly closed as "not planned." Blanchard releases v7.0.1 with bugfixes under the same MIT license. The Software Freedom Conservancy publishes a blog post about the Thaler cert denial but does not address chardet directly.
March 5: Blanchard responds in detail on #327 with JPlag analysis data and full process disclosure. Simon Willison publishes a detailed analysis. Armin Ronacher publishes "AI And The Ship of Theseus." LWN.net publishes a subscriber article on the relicensing. OSnews calls it "the great license-washing." An NVIDIA employee opens Issue #331 calling v7.0.0 "absolutely toxic" for enterprise users.
March 6: Thomas Claburn publishes in The Register, with Bruce Perens warning that AI rewrites will upend software economics (while noting he did the same thing himself with a proprietary SRE platform). Slashdot picks up the story. Richard Fontana (Red Hat attorney, co-author of GPLv3/LGPLv3/AGPL) opens Issue #334: "What does the MIT license cover in chardet?" — questioning whether Blanchard can claim copyright at all. (Discussion on #334 continues actively for the next two weeks.)
March 7–8: Phoronix covers the dispute. Antirez publishes "GNU and the AI reimplementations." Bruce Perens comments directly on Issue #331, recommending against rejecting chardet v7 on legal risk grounds. Issues #327 and PR #322 are locked to collaborators.
March 9: Hong Minhee publishes "Is legal the same as legitimate." A discuss.python.org thread appears asking how to globally exclude chardet v7 from pip installations.
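(For reference: pip's standard mechanism for such a global exclusion is a constraints file. The `chardet<7` pin below is an illustrative sketch of the technique, not advice taken from the thread.)

```
# constraints.txt — keep any transitively resolved chardet below the 7.x line
chardet<7
```

Passing `-c constraints.txt` to `pip install`, or pointing the `PIP_CONSTRAINT` environment variable at the file, applies the pin to every dependency resolution without making chardet a direct dependency.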
March 10: Blanchard reopens #334 and discloses that he has consulted a lawyer and is considering releasing chardet v7 into the public domain. Ars Technica publishes a detailed overview by Kyle Orland (165+ comments). The European Parliament adopts a non-binding resolution on "Copyright and generative artificial intelligence." Luka Kladaric publishes "License Laundering and the Death of Clean Room" on ShiftMag. Shuji Sado publishes a detailed legal analysis on Open Source Guy.
March 11: Blanchard releases chardet v7.1.0, fixing a 0.5-second first-call startup cost, restoring backward-compatible encoding names from chardet 5.x, achieving 100% line coverage, and adding a compat_names parameter to smooth the upgrade path.
March 12: LWN.net makes chardet the lead front-page story of its weekly edition. The comment thread features sharp debate on whether clean-room design is a legal requirement or merely one way to establish independence.
March 13: Le Monde Informatique publishes a detailed overview framing the dispute as a threat to both copyleft and proprietary software — the first major French-language coverage.
March 15: charset-normalizer v3.4.6 is released with a README footnote directly addressing the chardet controversy, alleging that chardet 7 incorporated key architectural ideas from charset-normalizer without acknowledgment.
March 16: Python Bytes Episode #473 ("A clean room rewrite?") covers chardet as its lead topic. Former OSI Executive Director Stefano Maffulli publishes "The second liberation: AI is the final frontier of Copyleft" on his personal blog (Maffulli stepped down from OSI in October 2025), arguing AI collapses the cost of the "freedom to fork." The FSF urges Anthropic to "liberate" its LLMs but issues no chardet-specific statement.
March 17–18: Blanchard merges three conda-forge feedstock PRs (v7.0.1, v7.1.0, v7.2.0), overriding an initial hold from another conda-forge maintainer who cited the licensing controversy. He updates the license metadata from LGPL-2.1-only to MIT and migrates the feedstock to recipe.yaml v1. Blanchard releases chardet v7.2.0 with PEP 263 detection, a CLI --language flag, backward-compatibility stubs for chardet.universaldetector, and mypyc-compiled wheels for CPython 3.11–3.14. A draft U.S. AI copyright bill is unveiled that would make unauthorized use of copyrighted works in AI training explicitly not fair use.
March 22: Blanchard changes the chardet 7.x license from MIT to 0BSD, a public-domain-equivalent license that sidesteps the copyrightability question entirely. The PR description states: "All chardet 7.x code is additionally licensed under 0BSD. From 7.3.0 onward, 0BSD will be the only license distributed with the code." The move closes Issue #334, and Richard Fontana — the Red Hat attorney who opened #334 — gives the PR a thumbs-up, writing on #334 that he doesn't "currently see any basis for concluding that chardet 7.0.0 is required to be released under the LGPL," that no one including Pilgrim has identified persistence of copyrightable expression from earlier versions, and that no one has articulated a viable alternate theory of license violation. He clarified that his original concern was never about LGPL enforcement but about the appropriateness of using MIT (or any license) for AI-generated code — a concern the 0BSD switch addresses. Blanchard also closes Issue #327 (Pilgrim's original objection) in light of Fontana's assessment. 0BSD is already used within CPython itself (documentation code has been dual-licensed under PSF License v2 and 0BSD since Python 3.8.6), is OSI-approved, and is accepted by Google's and Microsoft's legal departments. Unlike MIT, 0BSD makes no claim of copyright ownership: it grants unconditional permission with a warranty disclaimer, meaning it functions identically whether the code is copyrightable or not.
March 24: Blanchard releases chardet v7.3.0, the first version published to PyPI with the 0BSD license expression. The release adds PEP 263 encoding declaration detection, backward-compatibility stubs for chardet.universaldetector, fixes false UTF-7 detection, and resolves the 0.5-second startup cost regression. Blanchard gives a 48-minute podcast interview on elvex's "Building for Others" — his first long-form public appearance on the controversy. He reveals that Guido van Rossum personally asked about adding chardet to the Python standard library but said "we can't do it if it's [L]GPL," and discusses the Pilgrim reappearance, the JPlag analysis, and his willingness to comply with any court ruling. He also adds project history documentation with Pilgrim-era graft instructions and a historical performance table across all major versions.
March 25: Walled Culture publishes "Why genAI means the end of copyright for software and the re-invention of open source", a dedicated article using chardet as the central case study.
March 26: Blanchard releases chardet v7.4.0.post1, adding include_encodings/exclude_encodings parameters, a --language CLI flag, and configurable defaults. Updated benchmarks claim 99.3% accuracy on 2,517 test files and 47x faster performance than chardet 6.0.0. The same day, LWN.net publishes "Vibe-coded ext4 for OpenBSD" by Jonathan Corbet, covering Thomas de Grivel's LLM-generated ext4 implementation submitted to OpenBSD and explicitly citing chardet in the opening paragraph as the reference example of "efforts to use LLM-driven reimplementation as a way to remove copyleft restrictions from a body of existing code."
March 27: Fontana opens Issue #355 ("Clarification regarding prior comment in #334"), walking back his earlier statement. He writes: "When I said I did not currently see a basis for concluding that chardet 7.0.0 would need to be released under the LGPL, I meant that as an observation about the state of the discussion at the time, not as a definitive legal conclusion." He clarifies he "did not intend to take a firm position on the LGPL issue" and that his focus was on the separate copyrightability question. A commenter cites the rewrite plan's instruction for Claude to fetch metadata/charsets.py from chardet 6.0.0 as evidence of non-clean-room practice; Blanchard responds that he authored that file himself and gave Claude permission to reference it. Bradley Kuhn (Policy Fellow & Hacker-in-Residence at the Software Freedom Conservancy) posts a major statement on #355, announcing that SFC has begun a "substantial investigation into — and analysis of — this situation", which he will personally lead in collaboration with SFC's technical experts and legal counsel. Kuhn recommends that "folks do not rely on the notion that the new release of chardet has been legitimately relicensed," notes that LGPLv2.1-or-later is compatible with both 0BSD and MIT (meaning anyone may relicense Blanchard's 0BSD-licensed releases as LGPLv2.1-or-later without violating his rights), and urges commercial users to seek legal counsel immediately. He promises SFC will publish the results of their analysis. Fontana gives Kuhn's post a heart reaction. Separately, Simon Willison publishes a short quote post featuring Fontana's #334 comment (and also appends the same quote as an update to his March 5 chardet piece). Kuhn posts a TL;DR of the SFC investigation announcement on LWN.net as a comment on Corbet's chardet article, cross-linking his #355 comment.
March 29: Blanchard publishes "Everything Claude Saw: A Transparent Account of the Chardet v7 Rewrite" — a ~30-minute read providing the most detailed account yet of what happened during the rewrite. The post documents every time Claude accessed old chardet source code during the process, with transcript quotes for each instance. It reveals that Claude has chardet in its training data and can reproduce the architecture from memory (though with errors), that Blanchard blocked Claude's day-1 attempt to browse the old repo, that Claude's subagents independently read universaldetector.py (~567 lines) from the 6.0.0 tag without Blanchard's instruction (the largest exposure event, but occurring after the detection engine was already written), and that the files Claude was explicitly directed to reference (charsets.py, create_language_model.py, languages.py) are all solely or primarily authored by Blanchard himself. The post presents three independent code similarity analyses: git blame (zero Pilgrim lines survive in 7.0.0, even with -C -C -C copy detection), JPlag (0.04% average / 1.29% max similarity between 6.0.0.post1 and 7.0.0, with only 47 matched tokens across 3 matches, all generic boilerplate), and Copydetect/MOSS (0.00% match between chardet 1.0 and 7.0.0; against 6.0.0 the highest per-file Copydetect match is __main__.py at 59% — a three-line CLI entry point). The post also compares chardet 7's architecture against independently developed encoding detectors (chardetng, Google CED, ICU) to demonstrate convergent design choices, and addresses the "recipe for legal theft" objection directly by acknowledging that AI has made reimplementation dramatically cheaper while arguing the legal mechanism isn't new. He links to the post from Issue #355. 
As of March 29, despite the 0BSD change, Kuhn's SFC investigation, and the blog post, no new discussion has appeared on Reddit (r/programming removed Blanchard's submission, calling it "generic AI content"), Hacker News (9 points, 3 comments after 16 hours), Slashdot, or Lobsters. Discussion has continued on the Fediverse and Bluesky, however, and chardet remains an active topic on X/Twitter across Japanese, Russian, French, and Turkish accounts: Kuhn cross-promoted his SFC investigation on Mastodon, and Gentoo developer Michał Górny posted on treehouse.systems calling the relicensing "#copywashing" and claiming most distributions have "rightfully boycotted" chardet 7 — though his own Repology links show most distros are on 5.2.0 and also skipped 6.0.0, suggesting normal update lag rather than a principled stance. On March 30, a commenter on #355 ("nanoscopic") asked for a pinned issue pointing to a neutral discussion venue, noting that "people will just continue making / opening new issues about this eternally."
March 30: Michael Weinberg publishes "AI-Assisted Library Rewriting and Relicensing Brings Open Source Software into the IP World of Open Source Hardware" on his personal blog. Weinberg — Executive Director of NYU Law's Engelberg Center for Innovation Law and Policy and a board member of the Open Source Hardware Association (OSHWA), where he also runs the Open Hardware Certification Program — argues that the chardet dispute "may be an early rumble of an earthquake" and that OSS may be entering a world that resembles OSHW's, where many functional elements are "categorically ineligible for copyright protection" and enforcement relies more on social norms than legal mechanisms. Notably, Weinberg's legal framing is largely sympathetic to Blanchard's position: he writes that copyright on code "pretty narrowly covers the actual code as actually written, and excludes the functionality that the code represents (see, e.g., Google v Oracle)," that this is "on balance a good thing," and that "it would be bad if we decided to expand the scope of software copyright in an attempt to make open source software licenses more robust." The post becomes the primary English-language vehicle for the "OSS is becoming OSHW" framing.
March 31: ZDNET publishes "How AI has suddenly become much more useful to open-source developers" by Steven Vaughan-Nichols, using chardet as a central example of the legal complications arising from AI-assisted open-source development. The article frames chardet within a broader trend of AI tools becoming genuinely useful for maintaining open-source software — citing Greg Kroah-Hartman (Linux stable kernel maintainer) saying AI-generated security reports suddenly improved "a month ago," Dirk Hohndel (Verizon) predicting AI will be able to maintain code "with acceptable results at some point this year," and Ruby maintainer Stan Lo reporting AI has already helped with documentation, refactors, and debugging. The piece quotes both Pilgrim's objection and Blanchard's response, and notes Daniel Stenberg's (cURL) ongoing struggle with AI slop and the death of Jazzband due to AI-generated spam. Linus Torvalds is quoted warning that AI-generated code can be "horrible to maintain." ZDNET.fr publishes the French translation.
April 1: Blanchard converts Issue #355 into GitHub Discussion #362 and enables Discussions for the chardet repository, in response to nanoscopic's request for a more appropriate venue for ongoing conversation.
April 3: Techdirt republishes Glyn Moody's Walled Culture essay ("Can Agentic AI Coding Tools Finally End Copyright For Software While Re-Inventing Open Source?"), using chardet as the central case study and arguing that AI-generated code may not be copyrightable at all — extending the chardet story to a significantly larger US tech-policy audience than the original Walled Culture post.
April 6: A short link post on blog.bilak.info titled "What Happens If Someone Reimplements Your Open Source Software with LLMs And Relicenses It?" highlights the chardet controversy, quoting extended passages from Armin Ronacher's post and Simon Willison's coverage (including Fontana's #334 comment), but offering only a few sentences of original commentary. The same day, yomoyomo publishes a Japanese translation of Michael Weinberg's March 30 "OSS → OSHW" essay on yamdas.org, and Shuji Sado publishes a three-tweet X thread critiquing Weinberg's argument directly. Sado calls the essay "good as a problem-raiser or wake-up call, but legally a bit sloppy," and makes three specific objections: (1) Weinberg's reliance on Oracle v. Google is inappropriate, because that case assumed copyrightability and turned on fair use — it doesn't support the claim that core software functionality is unprotected; (2) the argument that AI rewrites "undermine the legal efficacy of open source" is an overreach that ignores the fact that "modern open source legal practice is generally understood not as resting on a single pillar of copyright but as a two-tier structure involving both copyright and contracts"; and (3) the real consequence is not that OSS licenses' legal foundation disappears, but that AI drives down the cost of reimplementation to the point where access, substantial similarity, and reliance become increasingly difficult to prove — meaning legitimacy will depend more on the development process than on license text. This is the most technically substantive critique of the Weinberg framing to date. (Sado's own March 9 memo was already nuanced — concluding that the rewrite "is difficult to say is clearly an independent implementation, yet there is not enough to conclude immediately that it is copyright infringement" — so his April 6 thread is better read as a continuation of that line than as a walk-back.)
April 7: Blanchard releases chardet 7.4.1 on PyPI, with wheels for CPython 3.12–3.14 across Windows, macOS, and Linux.
April 9: Heather Meeker publishes "The Chardet Controversy: Open Source and the AI Clean Room" on her personal blog — the first detailed analysis of the dispute from a tier-1 US commercial open source licensing attorney (author of Open (Source) for Business and From Project to Profit, and one of the most widely cited practitioners in the field). Meeker's assessment is substantially favorable to Blanchard. On the naming question — the sharpest point made by Kladaric and others — she declines to treat it as disqualifying, writing that while reusing the name and API may have been a strategic misstep, "perhaps we should cut Blanchard some slack for the many years he put into maintaining the project," and noting that because naming in open source is a trademark concern (about source, nature, and quality), "what constituted Chardet in 2026 may be as much Blanchard's doing as anyone else's, so perhaps he effectively controls the name." She calls the rewrite "one of the more conscientious [clean room efforts] I have seen." On the clean-room methodology question, Meeker affirmatively endorses the AI-mediated approach: she writes that although prior exposure to a codebase can never be fully cleansed, "using AI to perform clean room development can help manage this issue by interposing a neutral actor — the AI — between the reimplementation developer and the specification developer." On the core legal question, she endorses Fontana's #334 analysis as "probably the right legal analysis with respect to copyright," noting that no one has identified copyrightable material from earlier versions persisting in 7.0.0. She frames the ongoing controversy as "only partly about copyright and also about open source community expectations" — situating Pilgrim's objection as a social/normative complaint rather than a legally viable one. 
On the 0BSD switch and the copyrightability-of-AI-output debate, she writes that "all of this is a red herring for the infringement issue surfaced by Pilgrim." Her broader concerns are not directed at Blanchard: she argues copyleft faces structural pressures predating AI, flags that copyright duration is far too long relative to software development cycles, and raises the "great wealth transfer" problem — that the eventual passing of copyright interests in foundational OSS projects to potentially indifferent heirs will force a reckoning regardless of what AI does. As the first in-depth treatment from a practitioner with Meeker's standing in commercial OSS licensing, the post is likely to significantly shape how enterprise legal departments assess chardet 7.x.
Pilgrim's return was itself a shock to the Python community. Posting on March 4 under his GitHub handle a2mark, he opened with characteristic dry humor: "Hi, I'm Mark Pilgrim. You may remember me from such classics as 'Dive Into Python' and 'Universal Character Encoding Detector.'" His legal argument was direct: the LGPL requires modified code to remain under the same license, this is not a clean-room implementation given the maintainers' exposure, and an AI tool does not grant additional rights.
Pilgrim's issue has accumulated roughly 1,460 thumbs-up reactions (and about 1,880 reactions overall), indicating overwhelming community sympathy. He has not posted any follow-up comments since his initial statement.
Blanchard responded substantively on Issue #327, engaging directly with Pilgrim's claims. He opened by acknowledging the personal significance of chardet and Pilgrim's work, then addressed the core claim head-on.
On the clean-room question, Blanchard conceded what many expected him to deny: "You're right that I have had extensive exposure to the original codebase: I've been maintaining it for over a decade. A traditional clean-room approach involves a strict separation between people with knowledge of the original and people writing the new implementation, and that separation did not exist here." However, he argued that clean-room methodology is "a means to an end, not the end itself" — the end being proof that the new code is not a derivative work — and that he could demonstrate the same result through direct measurement.
His JPlag v6.3.0 analysis compared every major release:
| Version Pair | Avg Similarity | Max Similarity |
|---|---|---|
| 5.2.0 vs 5.0.0 | 90.93% | 93.83% |
| 5.0.0 vs 4.0.0 | 87.41% | 91.99% |
| 4.0.0 vs 3.0.0 | 82.99% | 94.09% |
| 6.0.0 vs 5.2.0 | 3.30% | 80.05% |
| 7.0.0 vs 6.0.0 | 0.04% | 1.29% |
| 7.0.0 vs 1.1 | 0.50% | 0.64% |
Blanchard drew particular attention to the v6.0.0 comparison: that version had only 3.3% average similarity to v5.2.0, but its 80% max similarity revealed that "entire files were carried forward from the prior release. It was still clearly part of the same lineage, still a derivative work, and still rightfully LGPL." Version 7.0.0, by contrast, showed max similarity under 1.3% against every prior version — "No file in the 7.0.0 codebase structurally resembles any file from any prior release."
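JPlag's percentages come from comparing normalized token streams rather than raw text, which is why renaming identifiers or reformatting cannot hide structural copying. The sketch below is a toy illustration of that general idea, not JPlag's actual algorithm; the function names `token_types` and `similarity` are invented for this example.

```python
import io
import tokenize

def token_types(source: str) -> list[str]:
    """Reduce source code to a stream of token *types*, discarding
    identifier spellings, comments, and layout, so that comparison is
    robust to renaming and reformatting."""
    ignore = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
              tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(io.StringIO(source).readline)
            if tok.type not in ignore]

def similarity(a: str, b: str, n: int = 4) -> float:
    """Jaccard overlap (0.0-1.0) of n-grams of token types."""
    def grams(src: str) -> set[tuple[str, ...]]:
        t = token_types(src)
        return {tuple(t[i:i + n]) for i in range(len(t) - n + 1)}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb) if ga and gb else 0.0

# Renaming alone does not lower the score:
original = "def f(x):\n    return x + 1\n"
renamed  = "def g(y):\n    return y + 1\n"
print(similarity(original, renamed))  # 1.0
```

Real tools layer greedy string tiling, language-aware token sets, and minimum-match thresholds on top of this normalization step, but the step itself is why 90%+ scores between adjacent 5.x releases can coexist with near-zero scores against 7.0.0.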
On the LGPL's scope, Blanchard argued that its copyleft provisions "apply to derivative works and do not extend to independent implementations of the same idea. Character encoding detection via BOMs, statistical modeling, and candidate elimination are well-established techniques described in publicly available research predating both uchardet and chardet." He noted the logical consequence of Pilgrim's position: "if prior exposure alone were enough to disqualify a rewrite, it would be very difficult for any maintainer of an LGPL project to ever write a new implementation of the same functionality under a different license."
He provided full process disclosure: a 13-point requirements list written on his phone, a start in an empty repository with no access to the old source tree, and explicit instructions to Claude not to base anything on LGPL/GPL-licensed code. "I did not write the code by hand," he stated, "but I was deeply involved in designing, reviewing, and iterating on every aspect of it."
On his motivations, Blanchard explained that roughly a decade ago there was discussion of including chardet in the Python standard library — blocked because the stdlib requires permissive licensing. His goal was to encourage broader contribution "either through trying to submit it to the standard library, or just by having more people work on it who might avoid LGPL projects for whatever reason." He noted that the other two members of the chardet team haven't made a commit since before 2013.
Bruce Perens' "fire alarm" quote to The Register was widely interpreted as a condemnation of the chardet relicensing, but that misreads his position. Perens was describing the macro implications for software economics — that AI makes reimplementation so cheap that licensing friction is disappearing — not judging Blanchard's specific action as illegitimate. In the same Register article, he described doing essentially the same thing himself, replicating an existing proprietary SRE platform in a different language under a different license using AI. His comment on Issue #331 is consistent with this framing:
I do not recommend rejecting an AI-mediated Open Source program with verified low-similarity to other works on the basis of legal risk at this time. The largest risk is that the license might be difficult to enforce, because courts have ruled that the production of an AI can not be copyrighted and thus might be in the public domain.
Perens argued that "the courts have not sided with plaintiffs in finding AI work to be infringing so far, because the law as it stands today is built primarily around the concept of literal copying." He described AI as "a blender that mixes something close to the sum of human knowledge in the way most probable to answer its prompt, so that the result is unrecognizable as derived from any one source."
For the law to make chardet v7 illegal, Perens argued, "there would have to be a fundamental change in copyright law. No such legislation is pending or probable at this time." He concluded: "I am not evangelizing this. As I wrote to The Register's reporter, this might not be the world I would have liked to have, but it's the one we got."
On March 6, Issue #334 ("What does the MIT license cover in chardet?") was opened by Richard Fontana — Senior Managing Attorney at Red Hat, one of the three principal authors of the GPLv3/LGPLv3/AGPL, and a former OSI director. His question was precise: if Claude wrote the code and Blanchard's involvement was "designing, reviewing, and iterating on" but not writing the code by hand, then under what theory does "Copyright (c) 2024 Dan Blanchard" in the LICENSE file hold?
On March 10, Blanchard responded by disclosing that he had "already spoken to one lawyer about this" and planned to "get a couple opinions before making a decision." If it really did turn out that AI-generated output was not copyrightable, he added, he was "actually quite happy to release this into public domain." He also raised a find | sed analogy: "it is pretty crazy to me that even for things where I said 'Replace all instances of X with Y'... I wouldn't be considered an author because I delegated the editing of the files to the AI."
Blanchard ultimately resolved Fontana's concern by switching the license to 0BSD on March 22 — a public-domain-equivalent license that sidesteps the copyrightability question entirely. 0BSD functions identically whether the code is copyrightable or not: it grants unconditional permission with a warranty disclaimer, making no claim of copyright ownership. Fontana gave the 0BSD PR a thumbs-up, though he later clarified on Issue #355 that his earlier statements were observations, not definitive legal conclusions.
Thomas Claburn's Register article (March 6) became the central press narrative, with Bruce Perens declaring software economics "dead, gone, over, kaput" and Zoë Kooyman of the FSF calling the approach fundamentally compromised. Ars Technica published a detailed overview on March 10 by Kyle Orland (165+ comments), highlighting three complicating factors: Claude's reliance on metadata files from prior versions, the likelihood that Claude was trained on chardet's source code, and Blanchard's heavy involvement in reviewing the output.
Slashdot framed the story as "laundered-via-LLM." Phoronix called it the "newest open-source concern around AI." Hong Minhee argued that legality and legitimacy are distinct — even if the relicensing is legal, it breaks the social compact underlying copyleft. Kitty Giraudel highlighted that Claude's implementation plan directly referenced chardet v6.0.0 code, undermining the "clean room" claim. Coverage also appeared in Daring Fireball, Korben (French), and ShiftMag. Python Bytes Episode #473 ("A clean room rewrite?", March 16) covered chardet as its lead topic.
ZDNET's Steven Vaughan-Nichols placed chardet in the broader context of AI's growing utility for open-source maintenance (March 31), reporting that top maintainers like Greg Kroah-Hartman and Dirk Hohndel see AI tools as newly capable of sustaining neglected projects — while immediately flagging chardet as the example of where legal risk looms. The article also highlighted the flip side: Daniel Stenberg's ongoing battle with AI slop in cURL, and Jazzband's shutdown under a "flood of AI-generated spam PRs and issues."
After March 10, the initial hot takes gave way to longer-form analysis. Luka Kladaric's ShiftMag piece argued that Blanchard's core mistake was shipping under the same package name while claiming independence, and criticized all sides: Blanchard for the name reuse, the community mob for not "showing up years ago" to help maintain chardet, and AI optimists for "celebrating license laundering." The piece also confirmed that Jason Scott vouched for Mark Pilgrim's identity in the GitHub thread. Kladaric made the sharpest version of the package-name argument of any source: "the value isn't in the code — the value is in the name. In the twelve years of trust built by that name. In the fact that thousands of requirements.txt files already have chardet in them." He concluded that the name "isn't his to relicense" and that a separate package would have avoided nearly all of the controversy.

Shuji Sado dissected whether chardet's encoding name mappings constitute protectable expression, concluding the answer is genuinely ambiguous. LWN.net made chardet its lead front-page story for the March 12 weekly edition; the comment thread featured sharp debate between subscribers arguing clean-room design is a legal requirement versus those arguing it is merely one way to establish independence.

Mission Cloud CTO and PSF Fellow Jonathan LaCour published "Before You Use AI to Rewrite a GPL Library, Read This" (March 13), taking a nuanced position: he explicitly agrees with Pilgrim that "given how this was done, the new implementation is a derivative work" (while noting he is "not convinced that the 'clean room' separation is a reasonable expectation in the first place"), yet argues the controversy would be "a big fat nothing-burger" if Blanchard had chosen a new package name like chardetect or pychardet-mit. LaCour also stresses that he "greatly prefer[s] the much more liberal MIT license."
Former OSI Executive Director Stefano Maffulli took the contrarian position, arguing AI collapses the cost of exercising the "freedom to fork" and that "Claude is more liberating than the GNU GPL."
The controversy achieved genuine global reach across at least 10 languages and 40+ sources. In Japan, GIGAZINE ran comprehensive coverage and Shuji Sado published the most detailed legal analysis in any language (also available in English), introducing the concept of "two-stage dependency" from Japanese copyright theory. Habr covered it twice for the Russian developer community. CSDN published a full Chinese translation of the elvex podcast transcript with editorial commentary, making it one of the most substantive non-English sources. French tech magazine Programmez! published a follow-up on March 31. Additional coverage appeared in Korean (GeekNews), Spanish (El Ecosistema Startup), Portuguese (Oficina dos Bits), Italian (Mia Mamma Usa Linux), and Turkish (AI Haberleri).

German-language coverage came not from written media but from the conference stage: at FOSS Backstage Berlin (March 16–17), German attorney Chan-jo Jun and Andreas Kutulla (CEO of Bitsy) presented "AI Generated Code or Rewrites Violate FOSS Licences," using chardet as their central case study. Jun argued there is "no such thing as a clean room implementation when you use a large language model because the large language model has been trained with all software that's out there," and demonstrated the point by having Claude Code produce a new chardet implementation in 50 minutes from a single prompt. His legal framework centered on SSO (structure, sequence, organization) analysis: regardless of textual similarity, what matters is whether the copyrighted "functionality architecture" is the same. He also conceded the key limiting principle: if someone "managed to create a new architecture with AI or without AI, you could be able to rewrite legally and get rid of the former license." In the Q&A, every audience question was about chardet.
Blanchard's March 29 blog post addresses the criteria Jun laid out almost point by point: the three code-similarity analyses (JPlag, git blame, Copydetect/MOSS) target textual derivation, while the architectural comparison against chardetng, Google CED, and ICU targets exactly the SSO question, arguing that chardet 7's design reflects convergent engineering rather than derivation from chardet 6. Jun's concession that a genuinely new architecture could support relicensing is precisely the position Blanchard's empirical evidence is designed to establish. Notably, no coverage has appeared in any legal or IP trade publication (Law360, The IPKat, Patently-O).
On Bluesky, the controversy generated substantial engagement. Armin Ronacher's post endorsing the relicensing received 102 likes and 58 reposts — the single highest-engagement post on any platform. Hong Minhee's "legal vs. legitimate" analysis received 84 likes. On the critical side, Baldur Bjarnason called it "derivative rewrites" providing "cover for bullshit relicensing attempts" (39 likes, 27 reposts), Christine Lemmer-Webber (co-editor of the ActivityPub spec) offered a reductio ad absurdum — if this works, "we can run Microsoft's Windows source code releases through an LLM and get a FOSS version, right?!?" — and Yarn Spinner called it "Shit AI developers Think They Can Do With Open Source Code" (24 likes, 35 reposts). At least one user reported a possible regression in endianness detection, though no GitHub issue was filed.

By late March, chardet had become the canonical reference case for AI-and-licensing disputes: Shuji Sado cited it when discussing the LLM-generated ext4 patch submitted to OpenBSD, and Gergely Orosz (The Pragmatic Engineer) referenced the chardet precedent on March 31 when discussing the accidental leak of Claude Code's own source code and a Python rewrite that circumvents Anthropic's DMCA takedowns (234 likes on Bluesky). On April 1, Sado explicitly distinguished chardet from the Claude Code leak, arguing that unlike chardet — where "the original author's rights were minimal" — the Claude Code Python rewrite "is squarely stepping into the arena of copyright infringement." Also on Bluesky, Elf M. Sternberg called AI "Licensewashing" a "huge deal right now" (April 1), and Nighthaven (@moja.blue) published a philosophical essay, "Does Code Have Authorship? — Author, Ideator, Agent" (March 18), weaving chardet together with Foucault's author function and Thaler v. Perlmutter to argue that authorship is shifting "from the act of writing to the act of conceiving."
On Reddit, r/programming removed Blanchard's blog post submission, calling it "generic AI content." However, the same post stayed up on r/Python (April 2, 4 comments). French-language discussion appeared on r/developpeurs (19 points, 15 comments) and r/actutech. On r/openSUSE, chardet 7 generated 20 comments in a packaging context. Multiple r/comfyui threads (late March–early April) showed users hitting the RequestsDependencyWarning caused by installing chardet 7.x alongside requests — confirming the compatibility issue documented in Issue #7284.
On X/Twitter, Japanese-language discussion remained the most active non-English thread through early April.
On March 8, Salvatore Sanfilippo (antirez, creator of Redis) published "GNU and the AI reimplementations," making the most historically grounded case for the legitimacy of Blanchard's approach. His core argument: the open-source movement was itself built on reimplementation, from the GNU userspace to Linux to Minix.
He noted that Stallman directed GNU contributors to reimplement UNIX tools with specific qualities to provide a "protective layer against litigations," and that many contributors "likely were exposed or had access to the UNIX source code." The Linus/Minix/UNIX chain of exposure is even more pointed — Linus was "massively exposed to the Minix source code," and Tanenbaum himself had been deeply familiar with UNIX internals.
On the law, antirez argued clean-room methodology is "just an optimization in case of litigation" — it makes it easier to win, but being exposed to source code is fine as long as only ideas and behavior are used. Where AI changes things is speed and cost, not the fundamental nature of the process. He dismissed the "uncompressed copy" theory, calling it "as consolatory as it is false."
He closed by invoking "the Stallman way" — urging those who reimplement to also improve, adding novelty rather than producing lazy copies. By this measure, chardet v7's 43x speed improvement and architectural overhaul would qualify.
Issue #331, where an NVIDIA employee called v7.0.0 "absolutely toxic" for corporate use, crystallized the downstream risk. Blanchard engaged substantively, and when asked if he'd consider reverting to LGPL, responded: "I have definitely considered it, but I think it's worth sticking with so far."
The downstream ecosystem's response has been muted. The requests library already uses charset-normalizer as its default backend, and requests Issue #7284 documents that installing chardet 7.x alongside requests triggers a confusing RequestsDependencyWarning, since requests only supports chardet <6 — further limiting chardet 7.x adoption in the wild. charset-normalizer v3.4.6 (March 15) updated its README with a prominent footnote directly addressing the chardet controversy, calling the relicensing "disputed on two independent grounds" and alleging that chardet 7 incorporated key architectural ideas pioneered by charset-normalizer — notably decode-first validity filtering and encoding pairwise similarity — without acknowledgment. charset-normalizer ranks in the top 10 PyPI projects with 22 billion lifetime downloads and is positioned as the established, uncontroversial alternative.
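The warning works because requests gates its optional chardet dependency on a supported version range at import time. The snippet below is a hypothetical re-creation of that kind of gate, not requests' actual code: the function name, the stand-in warning class, and the simplified bounds (major version 3–5, ignoring the 3.0.2 floor) are all illustrative.

```python
# Hypothetical sketch of a requests-style version gate for chardet.
# The real check lives inside requests' __init__ and emits
# requests.RequestsDependencyWarning; names here are stand-ins.
import warnings


class RequestsDependencyWarning(Warning):
    """Stand-in for requests.RequestsDependencyWarning."""


def check_chardet(version: str) -> bool:
    """Return True if this chardet version is in the supported range.

    requests declares support for chardet >= 3.0.2, < 6 (simplified
    here to major versions 3-5); anything newer, including the
    relicensed 7.x line, trips the warning.
    """
    major = int(version.split(".")[0])
    if not 3 <= major < 6:
        warnings.warn(
            f"chardet ({version}) doesn't match a supported version!",
            RequestsDependencyWarning,
        )
        return False
    return True


check_chardet("5.2.0")  # silent: within the supported range
check_chardet("7.0.0")  # warns: outside the supported range
```

This mirrors the user reports from r/comfyui: nothing breaks outright, but every import of requests next to chardet 7.x produces a warning that looks like an error.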
A chardet-rust fork by Andreas Jung emerged as a Rust-powered, Python-API-compatible alternative that deliberately preserves the LGPL license. Built using the Kimi 2.5 AI model, it claims 43x faster performance than chardet 6.0. Separately, a discuss.python.org thread explores how to globally exclude chardet 7.x from pip installs.
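One approach discussed in that thread is a pip constraints file, sketched below. This assumes a POSIX shell; the filename constraints.txt is arbitrary, and PIP_CONSTRAINT is pip's documented environment-variable equivalent of passing --constraint on every invocation.

```shell
# Pin chardet below 7 for every pip run in this environment.
# A constraints file limits versions without itself requesting
# installation, so it also covers transitive dependencies.
cat > constraints.txt <<'EOF'
chardet<7
EOF

export PIP_CONSTRAINT="$PWD/constraints.txt"
# From here on, `pip install chardet` (directly or as a dependency)
# resolves to the 6.x line or older; 7.x is never selected.
```

Putting the export in a shell profile or CI environment makes the exclusion effectively global for that machine or pipeline.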
No major Linux distribution has adopted chardet 7.x. Debian remains on 5.1.0, Fedora/EPEL ships 5.2.0 (with a February changelog entry that explicitly corrected the license to "LGPL-2.1-or-later"), and Arch Linux packages 6.0.0.post1. Gentoo developer Michał Górny characterized this as a "boycott" of what he called "#copywashing," but the Repology data he linked tells a less dramatic story: most distributions are on 5.2.0 and also skipped 6.0.0, suggesting normal distro update lag rather than a principled refusal of 7.x specifically. The downstreams shipping chardet 7.x are Chromebrew, Conda-Forge, Homebrew, KaOS, OpenIndiana, openmamba, Ravenports, Spack, and T2 SDE.
Conda-forge, however, now ships all chardet 7.x releases (v7.0.1, v7.1.0, v7.2.0) with MIT license metadata, after Blanchard merged the pending feedstock PRs on March 17–18 over the initial objections of another conda-forge maintainer.
No CPython Steering Council member, PSF board member, or PSF executive has made any public statement about chardet v7. The closest engagement has come through a discuss.python.org thread on LLM code, which grew to 141 posts between March 26 and April 2. Tim Peters (CPython core developer, author of the Zen of Python) called out chardet by name on March 28 and deemed the relicensing morally wrong: "it obviously violated the intent of the original author, who used the LGPL for their own reasons. 'Honor their wishes' is the decent thing to do." Peters subsequently proposed concrete wording for an addition to CPython's AI policy and on March 31 opened devguide issue #1777 ("Suggested elaboration of AI policies") to formalize it, warning that AI tools substantially increase the chance of submissions unintentionally including derivative work and encouraging (not mandating) disclosure of AI involvement in contributions. On April 8, core developer Mariatta opened PR #1778 ("Update guidelines on using GenAI") to implement the proposal.

On April 1, Łukasz Langa (CPython Developer in Residence) dismissed the disclosure proposal as "wishful thinking," writing: "you can't trust PR authors to disclose use of AI tools… At this point you have to assume every PR is made with AI tools" and that reviewers must treat any PR as "possibly adversarial," reaching back to the xz backdoor precedent. Langa's response — while explicitly caveated as his personal view, not the Steering Council's — is the closest any PSF-affiliated figure has come to addressing the broader AI-and-licensing question chardet has raised. The thread's other notable contribution came from Oscar Benjamin (SymPy maintainer), who argued forcefully that LLM-generated text in open-source communication is "massively demotivating" and leads to "a complete breakdown in human to human communication," calling for clear rules even if LLMs are allowed for code.
The original poster (ell1e) catalogued 14 open-source projects that have adopted outright AI bans, including Gentoo, NetBSD, QEMU, Zig, and Servo. Notably, the PSF accepted $1.5 million from Anthropic in January 2026 for PyPI security and operations — Anthropic being the maker of Claude, the tool used for the chardet rewrite.
The debate is really about two distinct legal questions, complicated by a third factor: Claude was almost certainly trained on chardet's source code, and Blanchard's blog post confirms Claude can reproduce the architecture from memory. Whether that training data exposure produces a derivative work is the crux of the dispute. The standard US copyright test for non-literal software infringement, from Computer Associates v. Altai (1992), requires (1) access to the original and (2) substantial similarity after filtering out unprotectable elements. Access is undeniable. Blanchard's similarity data targets the second prong directly.
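To make "substantial similarity" concrete: tools like JPlag and MOSS compare token streams, which survives renaming and reformatting, and that is what Blanchard's sub-1.3% figure measures. The sketch below instead uses the standard library's character-level difflib, on two invented detector stubs, purely to illustrate what a similarity ratio is; it is emphatically not the methodology behind the numbers cited here.

```python
# Crude character-level similarity between two INVENTED detector stubs.
# Real analyses (JPlag, Copydetect/MOSS) match token sequences after
# filtering, which is far more robust; difflib is shown only because it
# ships with Python and makes the idea of a similarity score tangible.
from difflib import SequenceMatcher

OLD = """\
def detect(byte_str):
    probers = build_probers()
    for p in probers:
        p.feed(byte_str)
    return best_guess(probers)
"""

NEW = """\
def detect(data: bytes) -> dict:
    candidates = decode_first_filter(data)
    return rank_by_confidence(candidates)
"""

ratio = SequenceMatcher(None, OLD, NEW).ratio()
print(f"similarity: {ratio:.2%}")
```

A ratio near 0% suggests little shared text; near 100% suggests copying. The legal filtering step in Altai has no analogue in such tools, which is why the similarity data informs, but cannot settle, the derivative-work question.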
Question 1: Is v7 a derivative work of the LGPL-licensed chardet, and therefore required to remain LGPL?
Those who say no include antirez (citing decades of reimplementation precedent from GNU to Linux), Ronacher (who sees the code as a new ship), Willison (leaning this direction), and Perens (who recommends against rejecting v7 on legal risk grounds). Blanchard's JPlag data — showing under 1.3% structural similarity — is the empirical foundation for this position. Richard Fontana (Red Hat attorney, co-author of GPLv3/LGPLv3/AGPL, former OSI director) initially stated on Issue #334 that he didn't "currently see any basis for concluding that chardet 7.0.0 is required to be released under the LGPL" — but subsequently clarified on Issue #355 that this was "an observation about the state of the discussion at the time, not as a definitive legal conclusion," and that he "did not intend to take a firm position on the LGPL issue." His focus was on the separate copyrightability question, not on endorsing the relicensing. (Note: chardet uses LGPLv2.1, which is materially different from the v3 Fontana co-authored — v3 adds anti-tivoization, patent retaliation, and a 30-day cure period.)
Those who say yes include Pilgrim (who argues exposure alone taints the rewrite), the FSF's Kooyman (who argues LLMs trained on the code cannot produce clean output), and much of the Slashdot and OSnews commentary. Hong Minhee's "legal vs. legitimate" essay makes a nuanced version of this argument — that even if it's technically legal, it violates the social compact of copyleft. Hong goes further than other critics by proposing a concrete remedy: a "specification copyleft" that would cover test suites and API definitions, arguing that "if source code can now be generated from a specification, the specification is where the essential intellectual content of a GPL project resides." This would close the exact loophole Blanchard's process exploited (working from the API and test suite rather than the source code). Hong also made a pointed structural observation about the debate's leading voices: both antirez and Ronacher "arrive at conclusions that align precisely with their own interests," and when such positional asymmetry is ignored and the argument is presented as universal analysis, "what you get is not analysis but rationalization."
Question 2: Is AI-generated code copyrightable at all?
Those who say no include Perens (in his #331 comment), Blanchard's attorney (in an initial assessment), and the author of the closed Issue #325. Under Thaler v. Perlmutter, purely AI-generated works lack the human authorship required for copyright. If the answer to this question is "no," it also implies the answer to Question 1 is "no" — because the LGPL is a copyright license, and something that isn't copyrightable can't be bound by copyright terms. The code would simply be in the public domain.
This question remains unresolved for AI-assisted works (as opposed to purely AI-generated ones). The pending Allen v. Perlmutter case may draw that line. Blanchard's find | sed analogy highlights the line-drawing problem: at what point does directing an AI tool cross from "assistance" (copyrightable) to "generation" (not)?
Most people who believe AI-generated code is not copyrightable also believe v7 is not a derivative work — these positions reinforce each other, since the LGPL can't bind uncopyrightable material. But the reverse isn't necessarily true: someone can believe v7 is an independent non-derivative work while still believing Blanchard is its human author and can legitimately claim MIT copyright. That's the position antirez and Ronacher seem to hold — they see the rewrite as legitimate on its own merits, without needing to reach the copyrightability question at all.
The controversy has entered its most consequential phase. The Software Freedom Conservancy has announced a formal investigation into the chardet relicensing, led by Bradley Kuhn personally, with results to be published. This is the first institutional commitment to a rigorous hybrid technical/legal analysis of the situation. Kuhn's explicit recommendation that people not rely on the legitimacy of the relicensing — coupled with his observation that anyone can relicense chardet 7.x back to LGPLv2.1-or-later — signals that SFC views the LGPL question as genuinely unresolved and potentially favorable to Pilgrim's position.
Blanchard has responded by publishing his most comprehensive defense yet: a blog post with complete Claude session logs, three code similarity analyses, and a full accounting of Claude's source access during the rewrite. This moves the dispute from GitHub issue comments into a more structured evidentiary presentation.
The controversy now has three active institutional tracks: SFC's investigation (timeline unknown but described as "high priority"), the pending Allen v. Perlmutter ruling in Colorado (which could define AI authorship thresholds), and potential discussion at the OSI Legal and Licensing Workshop (April 15–17 in Berlin). No legal action has been filed against Blanchard directly, but Kuhn's statement positions SFC as the most likely organization to articulate a formal copyleft enforcement theory if their analysis supports one.
The Black Duck OSSRA 2026 report found that 68% of audited codebases now contain license conflicts — the highest rate ever recorded and a 12-percentage-point jump from the prior year — and explicitly used the term "license laundering" to describe AI assistants generating code from copyleft sources without retaining license information. chardet appears less an anomaly than a canary: the first high-profile instance of a pattern that AI-accelerated development is making endemic.
The deeper question remains: can AI be used to effectively circumvent copyleft licenses? The SFC investigation may be the first authoritative attempt to answer that question. As Kuhn wrote: "Many want to treat these technologies like a speedboat — it's wiser to use them like a rowboat."
- GitHub Issue #327: No right to relicense this project — Mark Pilgrim's objection and Blanchard's detailed response
- GitHub Issue #331: v7.0.0 presents unacceptable legal risk — NVIDIA employee's risk assessment, Blanchard's engagement, and Bruce Perens' legal analysis
- GitHub Issue #334: What does the MIT license cover in chardet? — Public domain question and Blanchard's disclosure of legal consultation
- GitHub Issue #325: Nullified License — Public domain argument (closed)
- Chardet 7.0.0 Release Notes
- The Register: Chardet dispute shows how AI will kill software licensing — Thomas Claburn, March 6
- Ars Technica: AI can rewrite open source code—but can it rewrite the license, too? — Kyle Orland, March 10
- Simon Willison: Can coding agents relicense open source? — March 5
- Simon Willison: A quote from Richard Fontana — March 27 update
- Armin Ronacher: AI And The Ship of Theseus — March 5
- Antirez: GNU and the AI reimplementations — March 8
- Hong Minhee: Is legal the same as legitimate — March 9
- Tuan-Anh Tran: Relicensing with AI-assisted rewrite — March 5
- OSnews: The great license-washing has begun
- Software Freedom Conservancy: SCOTUS Declines to Hear AI Case — March 4
- LWN.net: The relicensing of chardet — March 5 (subscriber)
- LWN.net comment: NYKevin on the AFC test
- Phoronix: LLM-Driven Large Code Rewrites — March 8
- Slashdot: Python 'Chardet' Package Replaced — March 6
- Daring Fireball — March 8
- discuss.python.org: Globally excluding a package or version — March 9
- Chardet 7.0.1 on PyPI
- Chardet 7.2.0 on PyPI
- Chardet Changelog
- conda-forge chardet-feedstock PR #43 — v7.0.1 merge with licensing discussion
- charset-normalizer v3.4.6 on PyPI — README footnote addressing chardet controversy
- chardet-rust on GitHub — LGPL-preserving Rust fork
- ShiftMag: License Laundering and the Death of Clean Room — Luka Kladaric, March 10
- Open Source Guy: Can You Relicense Open Source by Rewriting It with AI? — Shuji Sado, March 10
- LWN.net comment thread on chardet — Subscriber discussion
- Mission Cloud: Before You Use AI to Rewrite a GPL Library, Read This
- Black Duck OSSRA 2026 Report — 68% license conflict rate
- European Parliament: Copyright and Generative AI Resolution — March 10
- UK House of Lords: AI and Copyright Report — March 2026
- Deadline: GOP Senator Unveils Draft AI Legislation — March 18
- FSF urges AI vendors to liberate LLMs — March 16
- It's FOSS: PSF Accepts $1.5M from Anthropic — January 2026
- RedMonk: The Generative AI Policy Landscape in Open Source — Tracks 77 organizations' AI policies
- Fedora EPEL chardet 5.2.0 update
- Kitty Giraudel: On chardet, AI and OSS licensing — March 6
- Stefano Maffulli: The second liberation: AI is the final frontier of Copyleft — March 16
- Python Bytes Episode #473: A clean room rewrite? — March 16
- LWN.net: Vibe-coded ext4 for OpenBSD — Jonathan Corbet, March 26 (opens by citing chardet as the reference LLM/licensing example)
- LWN.net comment: SFC is analyzing chardet LGPL situation — Bradley Kuhn, March 27
- Le Monde Informatique: Pourquoi l'IA menace les licences logicielles — Reynald Fléchaux, March 13 (French)
- GIGAZINE: AI has destroyed the rule that "if you copy code, you inherit the license" — March 10 (Japanese, English translation)
- Shuji Sado: AI-generated reimplementation license change memo — March 9 (Japanese)
- Habr: chardet released under MIT instead of LGPL — March 2026 (Russian)
- CSDN: Library with 130M monthly downloads rewritten by AI — March 2026 (Chinese)
- El Ecosistema Startup: Relicensing open source con IA — March 2026 (Spanish)
- Oficina dos Bits: O Fim de uma Era? — March 6, 2026 (Portuguese; note: contains factual error conflating chardet 7 with charset-normalizer)
- Mia Mamma Usa Linux: AI, Ahi — March 2026 (Italian)
- Programmez!: Affaire chardet 7.0 — March 31 (French)
- Walled Culture: Why genAI means the end of copyright for software — March 25
- Can Artuc: The Maintainer Used AI to Kill His Open Source License. It Took Five Days. — March 10 (Medium)
- requests Issue #7284: Clarify problematic chardet dependency warning
- D.C. Circuit opinion: Thaler v. Perlmutter — March 2025
- Chardet PR #349: Change 7.x license from MIT to 0BSD — March 22
- GitHub Issue #355 / Discussion #362: Clarification regarding prior comment in #334 — Fontana's March 27 walkback, Bradley Kuhn's SFC investigation announcement; converted from issue to discussion on April 1
- Dan Blanchard: "Everything Claude Saw: A Transparent Account of the Chardet v7 Rewrite" — March 29, complete session logs and code similarity analyses
- Elvex podcast: "He rewrote chardet with Claude. The internet blew up. Here's his take." — March 24, 48 min
- Commit: Project History documentation with Pilgrim-era graft instructions
- Commit: Historical performance table across all versions
- Allen v. Perlmutter docket (CourtListener) — Pending, D. Colorado
- discuss.python.org: I am concerned about LLM code in Python — March 26 – April 2, 141 posts; Tim Peters chardet remarks, proposed CPython AI policy addition, and Łukasz Langa's response
- python/devguide#1777: Suggested elaboration of AI policies — Tim Peters, March 31
- python/devguide#1778: Update guidelines on using GenAI — Mariatta, April 8
- FOSS Backstage Berlin: "AI Generated Code or Rewrites Violate FOSS-Licences" — Chan-jo Jun and Andreas Kutulla, March 16–17 (video, ~30 min)
- ZDNET: How AI has suddenly become much more useful to open-source developers — Steven Vaughan-Nichols, March 31
- ZDNET.fr: Open source : comment l'IA est soudainement devenue bien plus utile pour les développeurs — French translation, March 31
- Techdirt: Can Agentic AI Coding Tools Finally End Copyright For Software While Re-Inventing Open Source? — Glyn Moody, April 3
- blog.bilak.info: What Happens If Someone Reimplements Your Open Source Software with LLMs And Relicenses It? — Riyad, April 6
- Michael Weinberg: AI-Assisted Library Rewriting and Relicensing Brings Open Source Software into the IP World of Open Source Hardware — March 30
- yamdas.org: Japanese translation of Weinberg's OSS→OSHW essay — yomoyomo, April 6
- X thread: Shuji Sado critiquing Weinberg's OSS→OSHW argument — April 6
- Heather Meeker: The Chardet Controversy: Open Source and the AI Clean Room — April 9