LLM Wiki Pattern

One-line summary: A pattern articulated by Andrej Karpathy for building persistent, LLM-maintained personal knowledge bases — three layers (raw sources, LLM-owned wiki, schema) with three core operations (ingest, query, lint); this brain vault is a direct instantiation.

The insight

Most LLM-plus-documents setups are RAG: retrieve chunks at query time, generate an answer, repeat. Each query re-derives knowledge from scratch — nothing accumulates. The LLM Wiki pattern inverts this: the LLM incrementally builds and maintains a persistent markdown wiki sitting between the user and the raw sources. Adding a source doesn't just index it — the LLM reads it, extracts claims, integrates them into existing pages, flags contradictions, and updates cross-references.

The wiki is a compounding artifact. The cross-references are already there. The synthesis already reflects every source ingested. The human's job is sourcing, exploration, and asking good questions. The LLM's job is the maintenance — the bookkeeping that makes knowledge bases actually useful and that humans historically abandon.

Why it works, per 2026-04-20-llm-wiki: "The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping… Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass. The wiki stays maintained because the cost of maintenance is near zero."

Why this matters to career

Three reasons, each tied to a specific track:

  1. Track 3 (portfolio): this vault is a shipped, working implementation of the pattern. A portfolio that includes it, or links to public writing about it, demonstrates AI-native workflow skill in a form that can be cloned, inspected, and used, which is exactly what what-makes-compelling-frontend-portfolio-for-ai-era argues for over static showcase sites.
  2. Track 4 (thought leadership): an anchor for multiple article topics. Possible angles: implementing the pattern for research specifically, a frontend engineer's take on LLMs as maintainers rather than generators, the shift from "ask an LLM" to "build with an LLM," and a comparison of LLM-wiki vs. RAG-first tools (NotebookLM, ChatGPT file uploads). Karpathy's gist (2026-04-20-llm-wiki) is intentionally abstract, which leaves room for opinionated implementation writeups.
  3. Macro (solo-operator enablement): the maintenance-near-zero claim is structurally important. solo-human-company-thesis rests on "AI lowers the cost of creating things"; the Karpathy argument extends this to keeping things current, which is where solo operators typically fail. If the pattern generalizes, solo operators get a compounding knowledge advantage that didn't previously exist.

Evidence

The architecture (three layers)

Per 2026-04-20-llm-wiki:

  • Raw sources — curated source documents. Immutable. LLM reads, never writes.
  • The wiki — LLM-generated and LLM-maintained markdown pages: summaries, entity pages, concept pages, an index. Human reads; LLM writes.
  • The schema — a config file (CLAUDE.md / AGENTS.md) defining the wiki's structure, conventions, and workflows. This is "what makes the LLM a disciplined wiki maintainer rather than a generic chatbot." Human and LLM co-evolve it over time.
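
The read/write split between the layers can be sketched as a small path check. This is only an illustration: the `raw/` directory name and both function names are assumptions made for the example, not something the gist specifies. Only the schema file names (CLAUDE.md / AGENTS.md) come from the description above.

```python
from pathlib import PurePosixPath

# Schema file names mentioned in the gist summary above.
SCHEMA_FILES = {"CLAUDE.md", "AGENTS.md"}

# Assumption: raw sources live under a top-level "raw/" directory.
RAW_DIR = "raw"


def layer_of(path: str) -> str:
    """Classify a vault-relative path as 'raw', 'schema', or 'wiki'."""
    p = PurePosixPath(path)
    if p.name in SCHEMA_FILES:
        return "schema"
    if p.parts and p.parts[0] == RAW_DIR:
        return "raw"
    return "wiki"


def llm_may_write(path: str) -> bool:
    """Raw sources are immutable to the LLM; wiki and schema are writable."""
    return layer_of(path) != "raw"
```

Under these assumptions, `llm_may_write("raw/2026-04-20-llm-wiki.md")` is `False`, while a wiki page or the schema file is writable.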

The three operations

  • Ingest — drop a new source, LLM reads, integrates into existing pages, updates index, appends to log. A single source can touch 10–15 wiki pages in one pass.
  • Query — ask a question, LLM searches wiki, synthesizes with citations. Good answers get filed back as new pages so explorations compound rather than evaporate into chat history.
  • Lint — periodic health check: contradictions, stale claims, orphans, missing cross-references, data gaps. Diagnostic, not destructive.
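
Two of the lint checks listed above (orphans and missing cross-references) are mechanical enough to sketch without an LLM. The following is a minimal sketch, assuming pages are held as a name-to-text mapping and cross-references use `[[page-name]]` wikilinks; both the link syntax and the `lint` function are assumptions for illustration. Contradictions and stale claims need the LLM to read the content and are not covered here.

```python
import re

# Assumption: cross-references are written as [[page-name]] or [[page-name|alias]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def lint(pages: dict[str, str], index_page: str = "index") -> dict[str, list[str]]:
    """Report orphan pages and links to missing pages. Changes nothing."""
    # Outgoing links per page, ignoring links a page makes to itself.
    links = {
        name: {target.strip() for target in WIKILINK.findall(text)} - {name}
        for name, text in pages.items()
    }
    linked_to = set().union(*links.values())

    # Orphan: an existing page that no other page links to (the index is exempt).
    orphans = sorted(n for n in pages if n not in linked_to and n != index_page)

    # Broken link: a link whose target page does not exist.
    broken = sorted(
        f"{src} -> {dst}"
        for src, targets in links.items()
        for dst in targets
        if dst not in pages
    )
    return {"orphans": orphans, "broken_links": broken}


pages = {
    "index": "[[llm-wiki-pattern]] [[memex]]",
    "llm-wiki-pattern": "See [[memex]] and [[rag]].",
    "memex": "Bush, 1945.",
    "scratch": "Unlinked note.",
}
report = lint(pages)
# report == {"orphans": ["scratch"], "broken_links": ["llm-wiki-pattern -> rag"]}
```

The function only returns a report, which matches the "diagnostic, not destructive" framing: deciding what to do with an orphan stays with the human or a later LLM pass.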

Indexing without RAG infrastructure

The gist claims the index-file approach "works surprisingly well at moderate scale (~100 sources, ~hundreds of pages) and avoids the need for embedding-based RAG infrastructure." This is notable because it challenges the RAG-first mental model: for personal or project-scale knowledge bases, a plain index file kept under a disciplined schema can be sufficient without vector search.
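
To make the index-file idea concrete, here is a rough sketch of picking candidate pages from a plain markdown index with no embeddings. In the pattern itself the LLM reads the index and chooses pages; the keyword overlap below is only a stand-in for that step. The index line format (`- [[page]]: summary`) and both function names are assumptions for the example.

```python
import re

# Assumption: each index entry is one line of the form "- [[page]]: summary".
ENTRY = re.compile(r"^- \[\[(?P<page>[^\]]+)\]\]\s*:\s*(?P<summary>.+)$")
WORD = re.compile(r"[a-z0-9]+")


def parse_index(index_md: str) -> dict[str, str]:
    """Map page name to its one-line summary; non-entry lines are skipped."""
    entries = {}
    for line in index_md.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            entries[m["page"]] = m["summary"]
    return entries


def candidate_pages(index_md: str, question: str, limit: int = 3) -> list[str]:
    """Rank pages by how many question words appear in their name or summary."""
    terms = set(WORD.findall(question.lower()))
    scored = []
    for page, summary in parse_index(index_md).items():
        words = set(WORD.findall(f"{page} {summary}".lower()))
        score = len(terms & words)
        if score:
            scored.append((score, page))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [page for _, page in scored[:limit]]


index_md = """# Index
- [[llm-wiki-pattern]]: persistent wiki maintained by an LLM
- [[memex]]: Vannevar Bush's 1945 associative trails idea
- [[solo-human-company-thesis]]: AI lowers the cost of creating things
"""
hits = candidate_pages(index_md, "memex trails")
# hits == ["memex"]
```

The sketch does no stop-word filtering, so common words such as "the" can produce weak matches; an LLM reading the same index would not have that problem, which is part of why the approach needs so little infrastructure.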

At larger scale, Karpathy suggests qmd: hybrid BM25/vector search with on-device LLM re-ranking, usable via CLI or MCP.

Lineage: Vannevar Bush's Memex (1945)

Per 2026-04-20-llm-wiki: "The idea is related in spirit to Vannevar Bush's Memex — a personal, curated knowledge store with associative trails between documents… The part he couldn't solve was who does the maintenance. The LLM handles that." Useful framing for articles: this pattern is a 1945 idea that was blocked on a labor-cost problem, now unblocked.

Design implications (for the career project specifically)

  • Stop describing work; ship working instances. Track 4 articles should reference concrete wikis (this one, plus maybe a second public demo) rather than abstract descriptions of the pattern. The gist is intentionally abstract — an opinionated implementation is the differentiated contribution.
  • Make the vault linkable. If Paul wants to cite the brain vault in articles or portfolio work, the public-facing surface (README, a subset of pages, or a sanitized mirror) needs to be legible to an outside reader who hasn't seen the conversation history.
  • Track the pattern's limits. The gist doesn't discuss where the pattern fails (adversarial sources, contradictory domains, scale beyond "moderate," multi-user). Useful article contribution: document where this vault has already hit edges — contradiction handling, source-volume growth, cross-project concept migration.
  • Treat the schema (not the wiki) as the primary artifact. Per the gist, the schema is "the key configuration file." Paul's _meta/RESEARCH.md and _meta/WORKFLOW.md are the transferable, reusable assets — more so than the content inside any single project.

Contradictions / tensions

  • "You never (or rarely) write the wiki yourself" (Karpathy, in 2026-04-20-llm-wiki) vs. real-world practice. In practice this vault has had human-authored seed content (e.g., scope files, initial question pages). The "or rarely" leaves room for this, but the pattern does not say which human-written content is expected; worth flagging as one of the documentable limits.
  • Moderate-scale claim. "~100 sources, ~hundreds of pages" is the stated sweet spot. Career is at ~24 sources, patia at ~19. No direct evidence yet for where the pattern starts to strain — this vault will eventually produce that evidence.

Open questions

  • What's the right public-facing surface for the brain vault if Paul wants it to be a portfolio artifact? Read-only mirror? Redacted subset? Full vault with private marker files?
  • At what scale does the index-based approach actually break down, and what's the specific failure mode?
  • How does this pattern compose with career-ops-style working agent loops? The wiki is passive reference; career-ops is active task execution. Is there a combined pattern — the wiki informs the loop, the loop files back into the wiki?
  • Does the "LLM as maintainer" framing translate to team/business settings (the gist mentions it; no evidence of it working in practice)?
