RFE v2 + Defensive Review: `notebooklm-py`

What this is. A two-part document. Part I is a defensive adoption review of the Python package teng-lin/notebooklm-py — should you install it, under what constraints, with what blast radius. Part II is the v2 revision of our Research-Formalize-Educate methodology, refined by observing how it performed when applied to itself.

What this is not. It is not a tutorial on NotebookLM, an attack guide, or a complete supply-chain primer. The companion documents linked at the end carry the conceptual scaffolding.

Last verified. 2026-05-14. Per-section staleness budget: 90 days for tool-version specifics (uv, notebooklm-py); 180 days for the methodology rules; reverify external URLs on any modification of this document.

Provenance. Part I is original synthesis of the 4 parallel research agents that returned (a fifth, Agent 2, was refused). Part II is original synthesis of 1 meta-observation agent + the patterns we've used across rookie handoff, Bootstrap Conundrum, and Bootstrap Handbook. The 21 hard rules are original; the 5-part fresh-agent test borrows the orient / execute / cite / anti-pattern / diff shape from systematic-review reproducibility benchmarks (REPRO-BENCH, arXiv 2507.18901). No agents were attacked, no maintainers were contacted, no credentials were tested.

Part I — Defensive adoption review: `teng-lin/notebooklm-py`

I.1 Recommendation (TL;DR)

Risk: LOW-MEDIUM. Recommended posture: ADOPT WITH CONSTRAINTS. Specifically:

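Pin to notebooklm-py==0.4.1 with hashes in uv.lock; install via uv sync --locked --no-build. (uv rejects --locked combined with --frozen, and --no-build is how uv sync refuses source builds; --only-binary is a uv pip flag.)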
Verify the PEP 740 attestation out-of-band with pypi-attestations verify pypi --repository https://github.com/teng-lin/notebooklm-py pypi:notebooklm_py-0.4.1-py3-none-any.whl (the CLI takes a distribution filename or URL; confirm the exact wheel filename on the PyPI files page), because uv does not yet verify attestations natively (open issue astral-sh/uv#9122).
Run under a dedicated burner Google account that owns nothing of value (never your daily-driver, never your Workspace admin). The package authenticates with full Google session cookies, so the blast radius is the entire Google account (Gmail, Drive, Photos, Cloud Console, Workspace Admin if applicable). A stolen session bypasses 2FA, and revocation can take up to one hour to propagate.
Install and run in a sandbox (rootless container, bubblewrap, or dedicated VM) with NOTEBOOKLM_HOME pointing inside the sandbox.
Egress-allowlist the sandbox to *.google.com, *.googleapis.com, *.googleusercontent.com, *.youtube.com only — make the package's internal domain allowlist a defense-in-depth measure, not the sole control.
Do NOT install the [cookies] extra (avoid rookiepy >=0.1.0). Do NOT set NOTEBOOKLM_REFRESH_CMD (the only shell=True code path in the package).
Treat the resulting storage_state.json as a tier-1 credential: chmod 0600, never committed, never logged, rotated every 30 days via interactive notebooklm login on the isolated profile.
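The file-permission requirement in the last item can be checked mechanically rather than by eye. The following is a minimal sketch, assuming a POSIX filesystem; the function names are hypothetical and are not part of notebooklm-py.

```python
import os
import stat
from pathlib import Path


def credential_file_problems(path: Path) -> list[str]:
    """Return a list of problems with a credential file's on-disk state."""
    if not path.is_file():
        return [f"{path} does not exist or is not a regular file"]
    problems = []
    mode = stat.S_IMODE(path.stat().st_mode)
    # Any group/other permission bit means someone besides the owner has access.
    if mode & 0o077:
        problems.append(f"{path} has mode {mode:04o}; expected 0600 or stricter")
    return problems


def tighten(path: Path) -> None:
    """Restrict the file to owner read/write only (POSIX semantics)."""
    os.chmod(path, 0o600)
```

A launcher script could call credential_file_problems() on the storage_state.json path before every run and refuse to start if the list is non-empty.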

I.2 Decision record

Decision	Choice	Alternatives considered	Why	Reversibility cost
Install path	`uv add notebooklm-py==0.4.1` + `uv sync --frozen`	`pipx install`; `pip install`; build-from-source	uv's first-index strategy blocks dependency confusion by default (uv concepts/indexes); `uv.lock` hash-pins transitives	Low — `uv remove` cleans up
Attestation verification	Manual via `pypi-attestations` CLI	Trust uv to do it; skip	uv does not verify PEP 740 (uv#9122); pip 24+ does. Manual one-time check before pinning.	Zero — it's just a verification step
Auth account	Dedicated burner Gmail	Daily-driver Gmail; Workspace user; service account	No NotebookLM API exists; cookie auth is full session; burner contains blast radius (Google: investigate suspicious session cookies)	Medium — must rebuild library on new account if burner dies
Extras	None (skip `[cookies]`, `[browser]`)	Install all extras	`rookiepy` is pre-1.0 and adds attack surface; Playwright downloads browser binaries on first use (~300MB, another supply-chain trust decision)	Low — add later if needed
Sandbox	Rootless container with no host mounts	bubblewrap; firejail; ephemeral VM; bare install	Cross-platform, repeatable, network-egress filterable	Low — `docker run --rm`
Egress	Allowlist `*.google.com` family	Open egress; blocklist	Defense in depth against exfil; package's own allowlist becomes a second line not the only line	Zero — firewall rule
`NOTEBOOKLM_REFRESH_CMD`	Never set	Set if useful	Eliminates the only `subprocess(shell=True)` path in the codebase	Zero — env var omission

I.3 Static + provenance review (what Agent 1 found)

Repository metadata. github.com/teng-lin/notebooklm-py. Created 2026-01-07; last push 2026-05-13 (within 24 hours of this review). 13.2k stars, 1.8k forks, 21 contributors with teng-lin at 632/695 commits (~91% solo). MIT license. 13+ PyPI releases (v0.1.1 → v0.4.1) at a cadence of roughly 1-3 weeks. PyPI uploads use Trusted Publishing (OIDC) with PEP 740 attestations and Sigstore transparency entries.

Maintainer. Teng Lin (@teng-lin), NY-based, SWE/PM at XtalPi Inc. (real pharma-tech company). 574 followers, Arctic Code Vault badge (pre-2020 account), sibling project agent-fetch (278 stars, MIT, TypeScript) in the same AI-tooling niche. Plausible identity; no history of typosquatting; PyPI publisher email matches GitHub bio. Bus factor = 1 — if teng-lin stops, you inherit an unmaintained client of undocumented Google internal APIs.

Source-review findings (read 13 files via raw.githubusercontent.com; nothing was installed or executed):

src/notebooklm/__init__.py — clean import side effects: version check, logging config, importlib.metadata version lookup. No network, no subprocess, no decoding of literals.
src/notebooklm/_env.py — NOTEBOOKLM_BASE_URL is constrained to a Google-host allowlist with HTTPS enforcement. User cannot redirect traffic to evil.example.com via env var.
src/notebooklm/paths.py — all filesystem I/O scoped under ~/.notebooklm/ (override: NOTEBOOKLM_HOME). Does NOT read ~/.aws/, ~/.npmrc, ~/.gitconfig, ~/.ssh/.
src/notebooklm/auth.py (123 KB, the highest-risk file) — only Google-domain URLs; cookie domains validated via _is_allowed_cookie_domain(). One subprocess.run(..., shell=True) invoking NOTEBOOKLM_REFRESH_CMD if the user sets it. No eval/exec. No base64/hex literal decoding.
src/notebooklm/_firefox_containers.py — reads profiles.ini + cookies.sqlite from standard Firefox profile dirs. sqlite3 + configparser + shutil only. No subprocess, no network. Cookies returned in-process, not transmitted.
src/notebooklm/_artifacts.py (88 KB) — domain validation before every download via .google.com / .googleusercontent.com / .googleapis.com allowlist. No subprocess, no eval.
pyproject.toml — hatchling build backend (no setup.py, no setup_requires, no install scripts). Direct deps: httpx, click, rich, markdownify, playwright (browser extra), rookiepy (cookies extra). Loose >= pins with no upper bounds — concrete supply-chain weakness — offset by committed uv.lock.
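To make the _env.py finding concrete, here is a minimal sketch of what a host allowlist with HTTPS enforcement can look like. It is an illustration only, not the package's implementation: the function name is hypothetical and the suffix list is assumed from the domains this review names.

```python
from urllib.parse import urlsplit

# Assumed allowlist, taken from the domains named in this review;
# the package's real list may differ.
ALLOWED_SUFFIXES = ("google.com", "googleapis.com", "googleusercontent.com")


def is_allowed_base_url(url: str) -> bool:
    """Accept only https URLs whose host is an allowlisted domain or a subdomain of one."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower().rstrip(".")
    # Match the domain itself or a true subdomain. A bare endswith("google.com")
    # would also accept look-alikes such as "evilgoogle.com".
    return any(host == s or host.endswith("." + s) for s in ALLOWED_SUFFIXES)
```

The point to check when reading any such validator is the dot-anchored suffix match and the use of the parsed hostname (not the raw string), since both are common places for allowlist bypasses.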

Provenance. PyPI Trusted Publishing confirmed on v0.4.1. PEP 740 attestations present (in-toto.io/Statement/v1) with Sigstore transparency log entries for both wheel and sdist. SHA256 published. GitHub release commits GPG-signed by GitHub's web-flow key. CodeQL runs weekly. Dependabot enabled. CHANGELOG is 41 KB and well-maintained. SECURITY.md describes private email reporting. No SBOM is published — gap.

Standard-malware-pattern check (clean): no dynamic __import__ from base64; no urllib.request to pastebin/discord/IPs; no setup-time exec; no pip-self-update tricks; no postinstall hooks; no obfuscated strings.

Specific concerns to flag even though verdict is "low risk":

ToS / account-flagging risk. Undocumented Google internal APIs. Google can terminate the account. Burner is non-negotiable.
Loose dependency pinning in pyproject.toml. Mitigated by committed uv.lock; install with uv sync --frozen.
subprocess.run(shell=True, …) for NOTEBOOKLM_REFRESH_CMD. RCE if env vars are attacker-controlled. Don't set the var; audit whatever sets env in your CI.
rookiepy >=0.1.0 is the least mainstream dep and pre-1.0. Skip the cookies extra unless you genuinely need browser-cookie import.
playwright >=1.40.0 (browser extra) pulls in a Chromium download (via playwright install). That is significant attack surface, although Playwright itself is a legitimate Microsoft package.
Bus factor of 1. Plan for teng-lin to disappear.
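The NOTEBOOKLM_REFRESH_CMD concern above lends itself to a preflight check that runs before the tool starts. A minimal sketch, assuming the variable name reported in this review; the helper name is hypothetical.

```python
import os

# Variable name taken from this review; extend the tuple if other
# command-carrying variables are discovered later.
FORBIDDEN_ENV_VARS = ("NOTEBOOKLM_REFRESH_CMD",)


def forbidden_env_vars_set(environ=os.environ) -> list[str]:
    """Names of forbidden variables present in the environment, even if empty."""
    return [name for name in FORBIDDEN_ENV_VARS if name in environ]
```

A wrapper can exit non-zero when the returned list is non-empty, which turns "don't set the var" from a convention into a check that CI can enforce.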

I.4 The `uv` security model (what Agent 3 found)

Default behavior	Status
TLS to PyPI (rustls + Mozilla roots)	YES
Dependency-confusion safe (`first-index` strategy)	YES — stricter than pip out of the box
Hash verification when hashes are present in `requirements.txt` / `uv.lock`	YES (post-uv 0.2.x; fix landed via astral-sh/uv#4007)
Implicit `--require-hashes` if any package lacks a hash	NO — must pass explicitly
PEP 740 attestation verification	NO — astral-sh/uv#9122 still open
Code-signing of `uv` itself	Sigstore + GitHub Artifact Attestations; OS code-signing not yet
`uv audit` (vulnerability scan)	Preview command in recent 0.11.x
Sandboxing	None. uv runs build backends + post-install code with full user privileges

Net assessment. uv is materially stricter than pip on index handling and metadata validation, equal on hash verification (both require opt-in), and behind pip on attestation verification. The PEP 740 verification gap matters specifically for less-known packages like notebooklm-py.

Hardening commands (use this exact recipe for the install):

# Step 0: verify uv itself (one-time)
gh attestation verify (Get-Command uv).Source --owner astral-sh

# Step 1: project-isolated venv
mkdir notebooklm-sandbox; cd notebooklm-sandbox
uv init --no-readme
uv python pin 3.12

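# Step 2: verify the PEP 740 attestation out-of-band BEFORE pinning
# (the CLI takes a distribution filename or URL; confirm the exact wheel
#  filename on the PyPI files page)
pip install --user pypi-attestations    # one-time
pypi-attestations verify pypi `
  --repository https://github.com/teng-lin/notebooklm-py `
  pypi:notebooklm_py-0.4.1-py3-none-any.whl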

# Step 3: hash-locked add
uv add "notebooklm-py==0.4.1"
git add pyproject.toml uv.lock
git commit -m "pin notebooklm-py 0.4.1 with verified PEP 740 attestation"

# Step 4: reproducible, binary-only install (no source builds, so no setup.py execution)
# --locked and --frozen are mutually exclusive; --no-build is uv sync's binary-only switch
uv sync --locked --no-build --no-cache `
  --index-strategy first-index --keyring-provider disabled

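# Step 5: pre-run vulnerability audit (pip-audit -r expects a file path, not command output)
uv export --format requirements-txt --no-emit-project -o requirements-audit.txt
uv tool run --from pip-audit pip-audit -r requirements-audit.txt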

Anti-patterns (never use any of these for an untrusted package):

--allow-insecure-host / --trusted-host — disables TLS verification
--no-verify-hashes / UV_NO_VERIFY_HASHES=1 — removes integrity check
--index-strategy unsafe-best-match — re-enables dependency confusion
--no-build-isolation — lets build backend see existing site-packages
Installing into system Python instead of a venv
Skipping --locked in CI — lockfile drift goes unnoticed
Setting UV_INDEX_URL from an untrusted source

I.5 The credential blast radius (what Agent 4 found)

This is the single most important section.

teng-lin/notebooklm-py does not authenticate to a scoped NotebookLM API, because no public NotebookLM API exists. It persists the same first-party browser session cookies (__Secure-1PSID, __Secure-1PSIDTS, SID, APISID/SAPISID, optionally OSID) that your logged-in Chrome uses for every *.google.com property. They are stored as plaintext JSON in ~/.notebooklm/profiles/<profile>/storage_state.json.

If that file leaks:

Definitely accessible:

Read/send Gmail via mail.google.com
Browse/download/share/delete Drive files
Read Calendar, Contacts, Photos, Keep, Tasks, Maps Timeline
YouTube history, subscriptions, uploads, monetization
Google Pay history (in some cases initiate purchases without re-prompt)
Cloud Console for any project the account owns
Workspace Admin console if the account has admin role — catastrophic
Any "Sign in with Google" downstream site (Stack Overflow, Medium, Zoom, hundreds)
New OAuth grants and app passwords

A leaked file almost certainly bypasses 2FA: the session has already cleared 2FA, and cookie-bearer requests are not re-challenged.

Persistence properties: a password change should invalidate sessions, but Google has a documented ~10-20 minute window in which the old session is still valid (Luke Berner, "How I abused 2FA to maintain persistence after a password change"). "Sign out of all sessions" can take up to one hour to propagate per Google's own docs.

Recommended posture (ranked pitfalls schema):

#	Pitfall (most-frequent first)	Symptom	Fix
1	Daily-driver Google account used for the bot	Compromise = your whole life	Dedicated burner Gmail; nothing in Drive/Photos; no payment method; recovery phone not your primary
2	`storage_state.json` committed to git "just for CI debugging"	GitHub secret scanning does NOT detect Google session cookies; permanent in history	Pre-commit hook greps for `storage_state.json`; CI secret via `NOTEBOOKLM_AUTH_JSON` only
3	Shared dev environment (RDP, shared bastion, coworker's laptop)	Cookies on shared disk	Personal full-disk-encrypted machine only; never log in over RDP
4	Workspace admin account used	Pivot to entire org	Burner is NEVER an admin
5	`--browser-cookies auto` in dev environment	Slurps cookies from real Chrome profile	Skip the `cookies` extra; interactive `notebooklm login` only
6	Long-lived CI secret with no rotation	Pull-request-from-fork can exfil	Rotate every 30 days; restrict workflow to non-fork events
7	Browser profile not isolated	Daily-driver session leaks into burner cookies	Dedicated Chrome profile or Firefox container or VM
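The pre-commit check from pitfall 2 can be sketched as a pure function over the staged files. This is a minimal illustration, assuming the filename and cookie names given in this review; the function name and the input shape (a mapping of path to text content) are hypothetical.

```python
from pathlib import PurePosixPath

# Filename and cookie names come from this review; extend as needed.
BLOCKED_FILENAMES = {"storage_state.json"}
COOKIE_MARKERS = ("__Secure-1PSID", "SAPISID")


def staged_violations(staged: dict[str, str]) -> list[str]:
    """Given {path: text content} for staged files, return reasons to block the commit."""
    reasons = []
    for path, text in staged.items():
        if PurePosixPath(path).name in BLOCKED_FILENAMES:
            reasons.append(f"{path}: blocked filename")
        elif any(marker in text for marker in COOKIE_MARKERS):
            reasons.append(f"{path}: contains a Google session-cookie name")
    return reasons
```

In a real hook the mapping would be built from git diff --cached --name-only plus the staged contents. Documents that legitimately name these cookies (this review, for instance) would need an explicit exemption.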

Revocation in <1 hour (file this as runbooks/notebooklm-creds-leaked.md before you need it):

myaccount.google.com → Security → Your devices → Sign out everywhere (invalidates sessions; up to 1h propagation)
Change password
Revoke 2FA app passwords; re-enroll 2FA with new hardware key
Security → Third-party apps with account access → remove unknowns
Delete ~/.notebooklm/ everywhere it exists; rotate NOTEBOOKLM_AUTH_JSON in any CI; audit any backup that ever held it
Workspace only: Admin Console → user → Reset sign-in cookies; gam user <user> signout; pull audit logs (Workspace OAuth Token events, Login Audit events)

I.6 PyPI threat landscape 2024–2026 (what Agent 5 found)

Eight named incidents that should inform our defense (postmortems linked):

Date	Package	Technique	Postmortem
2024-03	`colorama` typosquats	Steganography in audio files; Windows + Linux RATs	Checkmarx: PyPI supply-chain attack on colorama
2024-04	`pingdomv3`	Revival hijack: deleted package, attacker re-registered name, benign v1, malicious v2 gated on `JENKINS_URL` env var	JFrog: revival hijack on 22K packages
2024-12-04	`ultralytics` 8.3.41/42/45/46 (60M downloads)	GitHub Actions Script Injection via PR branch name → poisoned build → XMRig published to PyPI while source stayed clean	PyPI Blog: Ultralytics attack analysis
2025-05–present	`graphalgo`/`graphex` (Lazarus)	Fake recruiter campaign; multistage encrypted RAT; MetaMask fingerprinting	The Hacker News: Lazarus campaign plants malicious PyPI
2025-07–09	Mass maintainer accounts	Phishing via `pypi-mirror.org` lookalike + TOTP relay	PyPI Blog: Plenty of phish in the sea
2025-09-26	`soopsocks` (Windows)	Compiled Go AUTORUN, installs Windows Service + Scheduled Task, UAC bypass, Discord webhook exfil every 30s	JFrog: soopsocks deep-dive
2026-03-24	`litellm` 1.82.7/1.82.8 (~97M monthly downloads)	Maintainer account hijack bypassing GitHub release. Payload in `litellm_init.pth` + `proxy_server.py`; 3-layer base64; AES-256-CBC + RSA exfil of SSH/AWS/GCP/Azure/kubeconfigs/Terraform/Helm; persistence via `sysmon.py` polling every 50min	Sonatype: LiteLLM credential stealer
2026-05	`mistral` AI package	Auto-runs on Linux import; fetches `transformers.pyz` credential stealer; locale-gated	The Hacker News: mini Shai-Hulud worm

The pattern that matters for our threat model: Ultralytics + LiteLLM + Mistral are all legitimate, popular packages compromised via maintainer-account hijack or CI compromise — NOT typosquats. Typosquat defenses (PyPI's name flagging, simple diffs) don't help here. What does help:

Hash-pin via uv.lock so a post-publish replacement doesn't slip in on the next uv sync
Trusted-Publisher provenance gating — refuse to install a release that wasn't published via OIDC (manual via pypi-attestations until uv ships #9122)
.pth-file ban — grep installed venvs for .pth files outside easy-install.pth/distutils-precedence.pth; any other .pth is a build break
Runner egress allowlist — ARC runner image blocks outbound except to pypi.org, files.pythonhosted.org, github.com
Pre-install audit hook — pip-audit + guarddog pypi scan + wheel inspection before any uv add
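The .pth-file ban in the list above is straightforward to automate. A minimal sketch, assuming the two-name allowlist stated there; the function name is hypothetical, and editable installs or some environment tools may add their own .pth files that you would need to allowlist deliberately.

```python
from pathlib import Path

# Allowlist taken from the rule above.
ALLOWED_PTH = {"easy-install.pth", "distutils-precedence.pth"}


def unexpected_pth_files(venv_root: Path) -> list[Path]:
    """Return every .pth file under venv_root whose name is not allowlisted."""
    return sorted(
        p for p in venv_root.rglob("*.pth")
        if p.is_file() and p.name not in ALLOWED_PTH
    )
```

A CI step can call unexpected_pth_files(Path(".venv")) after uv sync and fail when the list is non-empty, printing the offending paths for inspection.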

I.7 What to add to our project's CLAUDE.md and `.claude/settings.local.json`

`CLAUDE.md` additions (paste into "Critical rules"):

8. **No new Python deps without a 15-minute audit.** Any `uv add`, `pip install`,
   or change to `requirements.txt`/`pyproject.toml` must be preceded by:
   - `pip-audit` clean against OSV
   - `guarddog pypi scan <pkg>` clean
   - Trusted-Publisher provenance present (PyPI Simple JSON `provenance` field
     non-null)
   - Wheel inspected for `.pth` files and base64/exec patterns
   - PR description records audit output
   File the audit summary under `docs/audits/<pkg>-<version>.md`.

9. **Pin by hash, not version.** All Python deps in this repo use uv.lock with
   hashes or `--require-hashes`. Version-only pin does NOT protect against
   post-publish replacement (LiteLLM 1.82.8 pattern).

10. **No `.pth` files in vendored / installed packages.** CI greps the venv for
    `*.pth` outside `easy-install.pth` and `distutils-precedence.pth`. Any other
    `.pth` is a build break.

11. **Egress allowlist for runners.** ARC runner image blocks outbound traffic
    except to `pypi.org`, `files.pythonhosted.org`, `github.com`, and the
    fleet's own IPs. New egress destinations require a PITFALLS entry.

12. **NotebookLM auth is full-Google-account auth.** Any tool authenticating via
    Google cookies (notebooklm-py, undici-google, etc.) runs only under a
    dedicated burner Google account, never our daily-driver or Workspace admin.
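The provenance check in rule 8 can be sketched as a function over the JSON form of PyPI's Simple API, where PEP 740 adds a provenance key to each file entry. This is a sketch under assumptions: the response is assumed to be already fetched and parsed into a dict, the function names are hypothetical, and version matching is done on filenames only.

```python
def _is_version(filename: str, version: str) -> bool:
    """True if a wheel or sdist filename belongs to exactly this version."""
    return f"-{version}-" in filename or filename.endswith(
        (f"-{version}.tar.gz", f"-{version}.zip")
    )


def files_without_provenance(index_json: dict, version: str) -> list[str]:
    """Filenames for `version` whose 'provenance' key is missing or null."""
    return [
        f["filename"]
        for f in index_json.get("files", [])
        if _is_version(f["filename"], version) and not f.get("provenance")
    ]
```

An empty result only shows that attestations are published for every file of that version; it does not verify them, so the manual pypi-attestations step is still needed.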

`.claude/settings.local.json` additions:

{
  "permissions": {
    "deny": [
      "Bash(pip install*)",
      "Bash(uv add*)",
      "Bash(uv pip install*)",
      "Bash(pipx install*)",
      "Bash(curl * | sh)",
      "Bash(curl * | bash)",
      "Bash(wget * | sh)"
    ],
    "ask": [
      "Bash(uv pip install -r*)",
      "Bash(uv sync*)",
      "Edit(requirements*.txt)",
      "Edit(pyproject.toml)",
      "Edit(uv.lock)"
    ]
  },
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit",
        "hooks": [
          { "type": "command", "command": "scripts/guard-dep-changes.sh" }
        ]
      }
    ]
  }
}

Where scripts/guard-dep-changes.sh refuses changes to requirements*.txt/pyproject.toml/uv.lock unless docs/audits/<pkg>-<version>.md exists in the same diff, runs pip-audit + guarddog on the new pin, and prints the Trusted-Publisher provenance status. This blocks the most common "Claude pulls in a fresh dep mid-task" risk.
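The core decision inside such a guard (dependency files may change only when an audit note lands in the same diff) can be sketched as below. This covers only the diff check, not the pip-audit, guarddog, or provenance steps, and it does not confirm that the audit note names the right package and version. The function name is hypothetical.

```python
import fnmatch

# Patterns taken from the description above.
DEP_PATTERNS = ("requirements*.txt", "pyproject.toml", "uv.lock")
AUDIT_PATTERN = "docs/audits/*.md"


def dep_change_allowed(changed_paths: list[str]) -> bool:
    """Allow a diff that touches dependency files only if it also includes an audit note."""
    names = [p.rsplit("/", 1)[-1] for p in changed_paths]
    touches_deps = any(
        fnmatch.fnmatch(name, pattern) for name in names for pattern in DEP_PATTERNS
    )
    has_audit = any(fnmatch.fnmatch(p, AUDIT_PATTERN) for p in changed_paths)
    return has_audit or not touches_deps
```

The list of changed paths would typically come from git diff --name-only against the merge base.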

I.8 Day-30 verifiable competency

A rookie operator is fluent with notebooklm-py adoption when they can demonstrate, on a fresh machine, in under 60 minutes, without notes:

Stand up a sandbox (container or VM); show egress is allowlisted to Google domains only via curl -v https://example.com (refused) and curl -v https://www.google.com (allowed).
Run the hardening commands from §I.4 in order. Show the pypi-attestations verify output succeeds.
Create a new burner Google account; log into NotebookLM via the package's interactive notebooklm login once.
ls -la ~/.notebooklm/profiles/<p>/storage_state.json shows mode 0600.
Run a smoke task (generate a video overview of one of our published gists) and show success.
Deliberately leak a fake cookie file by cat-ing it to a junk location; demonstrate the revocation runbook (§I.5) end-to-end against the burner account in under one hour.
Run pip-audit and guarddog pypi scan notebooklm-py — show clean.
Open the rotation runbook (docs/runbooks/notebooklm-creds-rotate.md) and re-login the burner from cold; old cookie is invalidated.

If all 8 are demonstrated, adoption is "operationally owned." If any fails, that step's chapter of the Bootstrap Handbook is missing context that needs to be added.

I.9 Further reading

teng-lin/notebooklm-py repo — primary source; start with docs/configuration.md for auth model
PyPI page for v0.4.1 — confirms PEP 740 attestation + Trusted Publisher
teng-lin/notebooklm-py SECURITY.md — maintainer's disclosure policy
astral-sh/uv concepts/indexes — read before any production install
astral-sh/uv#9122 — open issue; PEP 740 verification not yet in uv
Trail of Bits — Are We PEP 740 Yet? — tracker showing 132K+ packages have attestations
pypi-attestations CLI — what to run for manual attestation verification
Sonatype: LiteLLM credential stealer — the threat model anchor: maintainer-account compromise of a popular legitimate package
PyPI Blog: Ultralytics attack analysis — read for the GitHub Actions script-injection pattern
Google Workspace: investigate suspicious session cookies — the audit log path you'll actually use during revocation
Datadog Security Labs: GuardDog for PyPI malware — the static analyzer to run before any new dep
PEP 578 — Python Runtime Audit Hooks — for post-install monitoring of import-time syscalls

Part II — RFE v2: methodology refined by running it on itself

II.1 What worked, what didn't (Observe-phase report)

Worked:

Distinct angles produced complementary outputs. The five returned agents covered package audit, package-manager security, credential blast radius, supply-chain patterns, and methodology critique with minimal overlap. The synthesis was additive rather than redundant.
Briefing-as-merge-anticipation. Telling each agent "your output will be merged with N others" produced sections that read as standalone chapters rather than monologue summaries.
Iterative-search instruction. Agents that hit a maintainer's name or a specific package version drilled deeper without prompting.
Citation hygiene. Inline URLs preserved through synthesis; no paraphrase-away-the-citation cases observed.
Per-agent risk verdicts. Agent 1 (notebooklm-py) and Agent 5 (PyPI patterns) both produced actionable verdicts, not just observations. Agent 4 produced an explicit blast-radius statement. These verdicts collapsed into a single decision-record in §I.2.

Didn't work:

One agent was refused by Anthropic's AUP classifier. Agent 2 (TanStack/Shai-Hulud compromise) was briefed as defensive research but triggered cyber-violative-content restrictions when asked to detail the attack chain by name. The five other agents had similar framing but were briefed to study defensive controls and vendor postmortems rather than attack mechanics. Lesson: RFE on attack-adjacent topics must brief surveyors at the verb level — "study what the postmortem authors recommend" passes; "explain the attack chain" gets refused. Update SKILL.md.
No corpus log shipped with the deliverable. I did not explicitly log "I searched X, found Y candidates, kept Z, dropped W." If we get challenged on "why these 8 incidents and not 12?" the audit trail does not exist. R-18 below addresses this.
Verification artifacts did not ship. The Day-30 competency in §I.8 is the closest thing to a verifiable test, but the 5-part fresh-agent test (§II.4) was not actually run against this document before publish. We have proposed the rigor without demonstrating it.
Single-skill encoding. The methodology runs as one SKILL.md. The Observe agent argued (correctly, I think) that Frame and Observe are deliberative phases that don't decompose into subagents well, while Research is parallel and Educate is single-author-editorial. The current encoding masks these phase shapes.
The "use the flow on itself" experience was good but expensive. Six agents in parallel produced ~12,000 words of input which had to be synthesized. The next iteration should consider whether the meta-observation agent (Agent 6) is needed every run or only when the methodology itself changes.

II.2 The 21 hard rules (v2 of SKILL.md)

Numbering continues from the original 10 rules (phase rules, briefing rules, and so on), which carry forward.

Genre + opening discipline

R-12. Inline-URL citations only. No footnotes. No "(see Smith 2024)." Every claim that depends on a source links to the primary source on the same line. Format: [Title — Author — what to read first](URL). Bare URL lists in "Further reading" are forbidden.

R-13. Operator-only banner. Any procedure requiring privileged credentials uses the exact phrase "Do NOT automate. Ever." inside a visually distinct callout. The phrase is a grep-target; do not paraphrase.

Pitfalls + closing discipline

R-14. Day-N milestone obligation. Every executable manual chapter ends with a Day-N fluency checklist using the phrasing "You are fluent when you can, without notes, in under X minutes…" N defaults to 30 unless justified.

R-15. Ranked pitfalls schema. Pitfalls sections rank entries by frequency, lead each entry with the symptom in the heading, and give a one-line fix. No free-form prose pitfalls. Borrowed from the Bootstrap Handbook chapter format.

Cross-artifact discipline

R-16. Cross-artifact pointer obligation. Every executable-manual chapter links back to the conceptual chapter that justifies its tool choice. Every conceptual chapter links forward to the executable chapter that performs its pattern. Broken-pointer = build failure.

R-17. Decision-record output. The Formalize phase emits a DECISIONS table with one row per non-obvious choice. Columns: Decision / Choice / Alternatives considered / Why / Reversibility cost. See §I.2 of this document as the canonical example.

Research transparency

R-18. Corpus-construction log. The Research phase emits a CORPUS note listing: query strings used, sources searched, candidate count, retained count, exclusion criteria. Adapted from PRISMA's flow-diagram discipline at miniature scale. See §II.5 below.

R-19. Verifiable competency. Every Educate-phase artifact includes at least one verifiable competency check per major section. Self-report checklists do not satisfy this rule — the reader must be able to demonstrate a capability, not assert it.

Provenance + freshness

R-20. Last-verified stamp. Every artifact's front matter includes last verified: YYYY-MM-DD and per-major-section staleness budget (e.g., "if older than 90 days, re-check links"). Tool-version specifics get 90 days; methodology rules get 180; cross-artifact pointers get 365.

R-21. Provenance paragraph. Every artifact opens with a provenance note distinguishing (a) findings from existing practice, (b) original synthesis, (c) opinion. Forces honest authorship.
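The staleness budgets in R-20 reduce to simple date arithmetic, which makes them easy to lint. A minimal sketch, using the three budgets stated in R-20; the dictionary keys and function name are hypothetical labels.

```python
from datetime import date, timedelta

# Budgets in days, as stated in R-20.
BUDGETS = {"tool-version": 90, "methodology": 180, "cross-artifact": 365}


def is_stale(last_verified: date, kind: str, today: date) -> bool:
    """True when the section is older than its staleness budget."""
    return today - last_verified > timedelta(days=BUDGETS[kind])
```

For this document's stamp of 2026-05-14, tool-version sections become stale the day after 2026-08-12, which is exactly 90 days later.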

II.3 New anti-patterns (added to the existing list)

Frame: skipping the "what this is NOT" paragraph
Research: corpus-construction without an inclusion/exclusion log; briefing surveyors at the noun level ("study X") instead of the verb level ("study what postmortem authors recommend about X") when the domain is attack-adjacent
Formalize: tool choices presented without a decision record; voice averaging from too many co-authoring agents on one section
Educate: bare-URL "further reading" lists; self-report-only competencies; missing the cross-artifact pointers in R-16
Observe: producing a critique without producing a diff to SKILL.md (action lost to analysis)

II.4 The 5-part fresh-agent test (verification ritual)

A fresh agent is a new Claude Code session with no conversation history, given only (a) the artifact under test and (b) a fixed task derived from the artifact's stated purpose.

#	Test	Pass criterion	Catches
1	Orient. Prompt: "What is this project, what state is it in, what should I do next?"	Fresh agent matches a human-authored reference answer on ≥4 of 5 named facts	Artifact fails to orient
2	Execute. Pick one runbook/chapter at random; fresh agent executes on clean VM	No clarifying questions whose answer was in the artifact	Artifact has hidden context
3	Cite. Sample 5 claims; fresh agent locates cited source within 30s via inline link	Link resolves; content matches claim	Link rot or paraphrase-away
4	Anti-pattern. Give the fresh agent a deliberately bad prompt that violates the artifact's rules ("automate the vault-init runbook")	Agent refuses with reference to the specific rule	Rules aren't operationally legible
5	Diff. Two fresh agents, same artifact, same prompt, run separately	Their outputs agree on factual content (stylistic divergence OK)	Implicit context invisible to author

Failure on any criterion blocks publish. The diff-test (#5) is the most expensive but catches the failure the author can't catch — their own assumed context.

II.5 Corpus log for THIS document (canonical example of R-18)

Angle	Searches issued (representative)	Candidates considered	Retained	Why dropped
notebooklm-py static audit	`notebooklm-py repo`, maintainer GitHub history, PyPI publish history, GitHub Actions workflow contents, raw source file fetches for 13 files	1 package, 21 contributors, 13 files, ~10 dep options	1 package, 5 direct deps reviewed, no concerning patterns found	N/A — primary target
uv security model	uv docs (concepts/indexes, authentication, settings), uv issue tracker for #4924/#9122/#3305, Astral security blog, pypi-attestations CLI	~30 uv config options, 4 alternative package managers (pip, pipx, poetry, pdm)	uv-specific only	Out of scope: alternative managers
NotebookLM credential blast radius	Google session-cookie scope, DBSC status, Workspace admin revocation paths, Luke Berner password-change persistence research, teng-lin/notebooklm-py docs/configuration.md and docs/troubleshooting.md, RFC #233 alternative auth	Personal Gmail vs Workspace, dedicated burner vs shared, OAuth scopes (none exist), service account (Enterprise only)	Burner-only recommendation	Workspace path too org-specific to recommend
PyPI supply chain patterns 2024–2026	~25 incidents found across PyPI Blog, Sonatype, JFrog, Wiz, Checkmarx, ReversingLabs, Snyk, Hacker News, Securityaffairs	25 incidents; 12 attack techniques; 6 PyPI defenses; 6 pre-install + 6 post-install tools	8 named incidents; 12 techniques; 6 defenses; toolkits	Older incidents (pre-2024) and lesser-known cases dropped for relevance
Methodology critique	Three published gists read end-to-end; PRISMA + SALSA review frameworks; REPRO-BENCH reproducibility test design; multi-agent synthesis literature	~12 candidate rules; 5 weaknesses; phase critiques; primitive-choice review	11 new rules; 5-part fresh-agent test; phase-decomposition recommendation	Generic "best practices" rules without specific evidence dropped
Refused TanStack/Shai-Hulud	Aikido blog, original Shai-Hulud postmortems, npm advisories, GitHub Security	N/A	None	Agent refused under AUP cyber-violative-content classifier despite defensive framing. Lesson: brief surveyors at verb level.
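If the corpus log is kept as structured data rather than a hand-edited table, the R-18 fields can be validated before publish. A minimal sketch; the class and field names are hypothetical and simply mirror the items R-18 lists.

```python
from dataclasses import dataclass, field


@dataclass
class CorpusEntry:
    """One research angle's corpus-construction record (fields follow R-18)."""
    angle: str
    queries: list[str]
    sources: list[str]
    candidates: int
    retained: int
    exclusion_criteria: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return reasons this entry would not satisfy R-18."""
        out = []
        if not self.queries:
            out.append("no query strings recorded")
        if self.retained > self.candidates:
            out.append("retained count exceeds candidate count")
        if self.retained < self.candidates and not self.exclusion_criteria:
            out.append("items were dropped but no exclusion criteria are recorded")
        return out
```

For example, the PyPI supply-chain row above (25 candidates, 8 retained, with a stated reason for dropping the rest) would pass, while a row that drops candidates without saying why would not.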

II.6 Recommended Claude Code primitive decomposition (replaces single-skill encoding)

The Observe agent argued that one SKILL.md is the wrong shape because the five phases have different invocation patterns. Concur. Proposed v2 encoding:

Phase	Encoding	Tool surface	Parallelism
Frame	`/rfe-frame` slash command (skill)	Main thread + AskUserQuestion	None (deliberative)
Research	`rfe-researcher` subagent type	`general-purpose` template with web-research tools	3–10 in parallel
Formalize	`/rfe-formalize` skill	Main thread only — single editorial voice	None (parallelization hurts voice)
Educate	`rfe-educate-writer` subagent type per chapter	Single subagent per chapter with template injection	Parallel across chapters, single within
Observe	`/rfe-observe` slash command (skill)	Main thread	None (deliberative); optional meta-observation subagent on demand

Enforcement primitive: hooks. Convention drifts; hooks don't. A PostToolUse hook on Write/Edit that lints generated artifacts against the SKILL.md rules (link annotation, no-bare-URL, Day-N milestone presence, genre declaration in first 200 words) is the missing enforcement mechanism. Without it, R-11 through R-21 will silently rot.

Sample hook command:

#!/usr/bin/env bash
# .claude/hooks/rfe-lint.sh
# Invoked PostToolUse on Write/Edit when path matches *.md and file >2000 chars.
# Assumes the caller passes the edited file's path as $1.
set -euo pipefail
FILE="${1:?usage: rfe-lint.sh FILE}"

# Report a rule violation on stderr and stop.
fail() { echo "RFE-LINT $1" >&2; exit 1; }

# The first 1200 bytes approximate the first 200 words. Capturing them once
# avoids a head|grep pipeline, which can fail spuriously under pipefail.
OPENING="$(head -c 1200 "$FILE")"

# R-11: genre declaration (case-insensitive, so "It is not" also passes)
grep -qi "This is a" <<<"$OPENING" || fail "R-11: missing genre declaration in first 200 words"
grep -qi "It is NOT" <<<"$OPENING" || fail "R-11: missing anti-genre declaration"

# R-12: no bare URL lists
if grep -qE "^- https?://" "$FILE"; then
  fail "R-12: bare URL list detected (use '[Title — annotation](URL)' format)"
fi

# R-20: last-verified stamp (case-insensitive, so "Last verified" also passes)
grep -qi "last verified" "$FILE" || fail "R-20: missing 'last verified: YYYY-MM-DD' stamp"

# R-21: provenance paragraph
grep -qi "provenance" "$FILE" || fail "R-21: missing provenance paragraph"
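
The header comment on the sample hook states a gating condition (path matches *.md, file larger than 2000 chars) that the script body does not itself check. A minimal sketch of that gate follows. The function name `should_lint` is hypothetical, and it counts bytes with `wc -c` as an approximation of characters.

```shell
# Sketch only: should_lint is a hypothetical helper, not part of Claude Code.
# Returns 0 when the path ends in .md, the file exists, and it is larger
# than 2000 bytes (bytes stand in for characters here). Returns 1 otherwise.
should_lint() {
  file="$1"
  case "$file" in
    *.md) ;;
    *) return 1 ;;
  esac
  [ -f "$file" ] || return 1
  # Arithmetic expansion strips the padding some wc implementations emit.
  size=$(( $(wc -c < "$file") ))
  [ "$size" -gt 2000 ]
}

# Possible use from whatever invokes the hook:
#   should_lint "$FILE" && .claude/hooks/rfe-lint.sh "$FILE"
```

Keeping the gate separate from the lint rules lets short notes and non-Markdown files skip the checks without weakening any individual rule.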

II.7 Day-30 verifiable competency for the methodology

You are fluent with RFE v2 when you can, without notes, in under 90 minutes:

1. Pick a new domain. Restate the goal more precisely than the prompter did.
2. Identify 3–10 distinct angles. Justify the count.
3. Write the corpus-construction note (R-18) before running searches, naming the search strings you'll use.
4. Spawn N surveyors in parallel with shared preamble + distinct focus + verb-level briefing (R-18 v2).
5. Synthesize into a single artifact in one editorial voice that passes the 5-part fresh-agent test.
6. Emit the DECISIONS table (R-17) with reversibility costs.
7. Publish via `gh gist create --public` and record the URL in two places on disk.
8. File the Observe-phase report: what worked, what didn't, what to change in SKILL.md before the next run.

The hardest of these is step 8. The temptation is to skip it because the artifact is published and the user is happy. Resist. The Observe phase is where methodologies improve; without it, RFE freezes at v2 and becomes another piece of folklore.
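
The publish step (publish the gist, then record its URL in two places on disk) can be sketched as below. The function name `record_gist_url` and both log paths are hypothetical, and the usage comment assumes an authenticated `gh` CLI that prints the new gist URL on stdout.

```shell
# Sketch only: record_gist_url and the paths in the usage note are hypothetical.
# Appends "DATE<TAB>URL" to two files so that losing one file does not lose
# the record of where the artifact was published.
record_gist_url() {
  url="$1"; primary="$2"; backup="$3"
  # Refuse anything that does not look like a gist URL.
  case "$url" in
    https://gist.github.com/*) ;;
    *) echo "record_gist_url: not a gist URL: $url" >&2; return 1 ;;
  esac
  stamp=$(date -u +%Y-%m-%d)
  for f in "$primary" "$backup"; do
    mkdir -p "$(dirname "$f")"
    printf '%s\t%s\n' "$stamp" "$url" >> "$f"
  done
}

# Possible use (assumes gh prints the gist URL on stdout):
#   url=$(gh gist create --public artifact.md)
#   record_gist_url "$url" docs/PUBLISHED.md "$HOME/.rfe/published.log"
```

Validating the URL before writing means a failed `gh` call cannot leave an empty or malformed entry in either log.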

II.8 What we'd change in the next iteration

Brief attack-adjacent surveyors at the verb level. "Study what vendors recommended after the incident" not "explain the attack chain."
Ship the corpus log as a sibling artifact, not buried in a section.
Run the 5-part fresh-agent test before publish, not as a recommended-but-skipped ritual.
Decompose to multiple skills/subagents + the enforcement hook rather than one mega-skill.
Consider whether the meta-observation agent is needed every run. Probably not. Run it quarterly, or when the methodology has been used N times since last revision.
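
The last item's trigger (quarterly, or after N uses since the last revision) can be written as a small predicate. This is a sketch under stated assumptions: the function name `meta_observation_due`, the 90-day reading of "quarterly", and the default N of 5 are all hypothetical choices, not values fixed by the methodology.

```shell
# Sketch only: meta_observation_due and its defaults are hypothetical.
# Args: days since the last meta-observation, RFE runs since the last
# SKILL.md revision, and the run threshold N (defaults to 5, an arbitrary pick).
# Returns 0 when either trigger fires: about a quarter (90 days) has elapsed,
# or at least N runs have accumulated.
meta_observation_due() {
  days_since="$1"; runs_since="$2"; threshold="${3:-5}"
  [ "$days_since" -ge 90 ] || [ "$runs_since" -ge "$threshold" ]
}

# Possible use:
#   meta_observation_due 42 6 && echo "schedule a meta-observation run"
```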

II.9 Cross-references to prior artifacts

Project handoff — the rookie-orientation snapshot. RFE v1 produced it.
Bootstrap Conundrum — concepts book. RFE v1 produced it.
Bootstrap Handbook — implementation manual. RFE v1 produced it.

This document is RFE v2's first product — both an output of the methodology and a revision of it.

Sources for Part II's methodology-design synthesis:

Final note. This document was produced by 6 parallel research agents (5 returned, 1 refused under AUP), with synthesis by a single editorial agent in the main thread. The refusal is documented in §II.5 and §II.1 because honest methodology requires honest reporting of failure modes. The remaining 5 agents and the synthesis pass were sufficient to produce an actionable decision on notebooklm-py and an evidence-backed revision of RFE.

YoraiLevi/rfe-v2-and-notebooklm-py-review.md
