LLM Wiki Pattern
One-line summary: A pattern articulated by Andrej Karpathy for building persistent, LLM-maintained personal knowledge bases — three layers (raw sources, LLM-owned wiki, schema) with three core operations (ingest, query, lint); this brain vault is a direct instantiation.
The insight
Most LLM-plus-documents setups are RAG: retrieve chunks at query time, generate an answer, repeat. Each query re-derives knowledge from scratch — nothing accumulates. The LLM Wiki pattern inverts this: the LLM incrementally builds and maintains a persistent markdown wiki sitting between the user and the raw sources. Adding a source doesn't just index it — the LLM reads it, extracts claims, integrates them into existing pages, flags contradictions, and updates cross-references.
The wiki is a compounding artifact. The cross-references are already there. The synthesis already reflects every source ingested. The human's job is sourcing, exploration, and asking good questions. The LLM's job is the maintenance — the bookkeeping that makes knowledge bases actually useful and that humans historically abandon.
Why it works, per 2026-04-20-llm-wiki: "The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping… Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass. The wiki stays maintained because the cost of maintenance is near zero."
Why this matters to career
Three reasons, each tied to a specific track:
- Track 3 (portfolio): this brain vault is a shipped, working implementation of the pattern. A portfolio that includes it, or public writing about it, demonstrates AI-native workflow skill in a form that can be cloned, inspected, and used, which is exactly what what-makes-compelling-frontend-portfolio-for-ai-era argues for over static showcase sites.
- Track 4 (thought leadership): anchor for multiple article topics. Possible angles: implementing the pattern for research specifically, a frontend engineer's take on LLMs as maintainers rather than generators, the shift from "ask an LLM" to "build with an LLM," a comparison of LLM-wiki vs. RAG-first tools (NotebookLM, ChatGPT file uploads). The gist is intentionally abstract, which leaves room for opinionated implementation writeups.
- Macro (solo-operator enablement): the maintenance-near-zero claim is structurally important. solo-human-company-thesis rests on "AI lowers the cost of creating things"; the Karpathy argument extends this to keeping things current, which is where solo operators typically fail. If the pattern generalizes, solo operators get a compounding knowledge advantage that didn't previously exist.
Evidence
The architecture (three layers)
Per 2026-04-20-llm-wiki:
- Raw sources — curated source documents. Immutable. LLM reads, never writes.
- The wiki — LLM-generated and LLM-maintained markdown pages: summaries, entity pages, concept pages, an index. Human reads; LLM writes.
- The schema — a config file (CLAUDE.md / AGENTS.md) defining the wiki's structure, conventions, and workflows. This is "what makes the LLM a disciplined wiki maintainer rather than a generic chatbot." Human and LLM co-evolve it over time.
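The read/write split across the three layers can be written down as a small permission table. This is a minimal sketch, not something the gist specifies: the `raw/` and `wiki/` paths, the `Actor` and `Layer` names, and the `can_write` helper are all hypothetical, and the human "writing" to raw sources is assumed to mean adding new files rather than editing existing ones.

```python
from dataclasses import dataclass
from enum import Enum


class Actor(Enum):
    HUMAN = "human"
    LLM = "llm"


@dataclass(frozen=True)
class Layer:
    name: str
    path: str           # hypothetical location; the gist does not prescribe paths
    writers: frozenset  # actors allowed to modify this layer


# Assumed layout for illustration only.
LAYERS = {
    # Human curates (adds) sources; existing files are treated as immutable.
    "raw": Layer("raw sources", "raw/", frozenset({Actor.HUMAN})),
    # Human reads; LLM writes.
    "wiki": Layer("wiki", "wiki/", frozenset({Actor.LLM})),
    # Human and LLM co-evolve the schema.
    "schema": Layer("schema", "CLAUDE.md", frozenset({Actor.HUMAN, Actor.LLM})),
}


def can_write(actor: Actor, layer_key: str) -> bool:
    """Return True if the actor is allowed to modify the given layer."""
    return actor in LAYERS[layer_key].writers
```

A check like `can_write(Actor.LLM, "raw")` returning `False` is the kind of rule a schema file would state in prose; encoding it makes the "LLM reads, never writes" constraint testable.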
The three operations
- Ingest — drop a new source, LLM reads, integrates into existing pages, updates index, appends to log. A single source can touch 10–15 wiki pages in one pass.
- Query — ask a question, LLM searches wiki, synthesizes with citations. Good answers get filed back as new pages so explorations compound rather than evaporate into chat history.
- Lint — periodic health check: contradictions, stale claims, orphans, missing cross-references, data gaps. Diagnostic, not destructive.
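The bookkeeping side of ingest and lint can be sketched over an in-memory wiki. Everything here is an assumption for illustration: pages live in a plain `dict`, cross-references use `[[page-name]]` links, and the `updates` argument stands in for whatever page edits the LLM proposes after reading a source. Only two of the lint checks listed above (broken links and orphans) are covered; contradictions and stale claims need the LLM's judgment.

```python
import re
from datetime import date

WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # assumes [[page-name]] style links


def ingest(wiki: dict, log: list, source_id: str, updates: dict, today: date) -> None:
    """Apply LLM-proposed page updates for one source, then refresh index and log.

    `updates` maps page name -> new full text; producing it is the LLM's job
    and is out of scope for this sketch.
    """
    wiki.update(updates)
    pages = sorted(name for name in wiki if name != "index")
    wiki["index"] = "\n".join(f"- [[{name}]]" for name in pages)
    log.append(f"{today.isoformat()} ingest {source_id}: touched {len(updates)} page(s)")


def lint(wiki: dict) -> dict:
    """Report broken links and orphan pages. Diagnostic only: never edits the wiki."""
    inbound = {name: set() for name in wiki}
    broken = []
    for name, text in wiki.items():
        for target in WIKILINK.findall(text):
            target = target.strip()
            if target not in wiki:
                broken.append((name, target))
            elif name != "index" and name != target:
                # Index entries and self-links do not count as cross-references.
                inbound[target].add(name)
    orphans = sorted(n for n, refs in inbound.items() if not refs and n != "index")
    return {"broken_links": sorted(broken), "orphans": orphans}


# Usage with made-up page contents.
wiki, log = {}, []
ingest(wiki, log, "2026-04-20-llm-wiki", {
    "llm-wiki-pattern": "See [[rag]] and [[memex]].",
    "rag": "Contrast with [[llm-wiki-pattern]].",
}, date(2026, 4, 20))
report = lint(wiki)
```

Here `report` flags the link to the not-yet-written `memex` page and changes nothing, which matches the "diagnostic, not destructive" framing: the fix is left to a later ingest or a human decision.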
Indexing without RAG infrastructure
The gist claims the index-file approach "works surprisingly well at moderate scale (~100 sources, ~hundreds of pages) and avoids the need for embedding-based RAG infrastructure." Notable because it reframes the RAG-first mental model: for personal/project-scale knowledge bases, a curated index with a disciplined schema can stand in for vector search. The gist claims sufficiency at this scale, not a measured advantage over retrieval.
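To make the index-file idea concrete, here is a rough sketch of picking candidate pages from an index without embeddings. The index line format (`- [[page]]: summary`), the function names, and the keyword-overlap scoring are all assumptions; in the pattern itself the LLM reads the index and chooses pages, so the scoring below is only a stand-in for that judgment.

```python
import re


def parse_index(index_text: str) -> dict:
    """Parse lines like '- [[page]]: one-line summary' into {page: summary}."""
    entries = {}
    for line in index_text.splitlines():
        match = re.match(r"-\s*\[\[([^\]]+)\]\]\s*:\s*(.*)", line.strip())
        if match:
            entries[match.group(1)] = match.group(2)
    return entries


def candidate_pages(index_text: str, question: str, limit: int = 5) -> list:
    """Rank pages by how many question words appear in their name or summary."""
    # Ignore very short words as a crude stopword filter.
    words = {w for w in re.findall(r"[a-z0-9]+", question.lower()) if len(w) > 2}
    scored = []
    for page, summary in parse_index(index_text).items():
        haystack = set(re.findall(r"[a-z0-9]+", f"{page} {summary}".lower()))
        score = len(words & haystack)
        if score:
            scored.append((score, page))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [page for _, page in scored[:limit]]
```

The point of the sketch is the cost profile: the whole index fits in one read, so there is no embedding store to build or keep in sync, and the failure mode to watch for is the index growing past what fits comfortably in a single pass.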
At larger scale, Karpathy suggests qmd — hybrid BM25/vector with on-device LLM re-ranking, usable via CLI or MCP.
Lineage: Vannevar Bush's Memex (1945)
Per 2026-04-20-llm-wiki: "The idea is related in spirit to Vannevar Bush's Memex — a personal, curated knowledge store with associative trails between documents… The part he couldn't solve was who does the maintenance. The LLM handles that." Useful framing for articles: this pattern is a 1945 idea that was blocked on a labor-cost problem, now unblocked.
Design implications (for the career project specifically)
- Stop describing work; ship working instances. Track 4 articles should reference concrete wikis (this one, plus maybe a second public demo) rather than abstract descriptions of the pattern. The gist is intentionally abstract — an opinionated implementation is the differentiated contribution.
- Make the vault linkable. If Paul wants to cite the brain vault in articles or portfolio work, the public-facing surface (README, a subset of pages, or a sanitized mirror) needs to be legible to an outside reader who hasn't seen the conversation history.
- Track the pattern's limits. The gist doesn't discuss where the pattern fails (adversarial sources, contradictory domains, scale beyond "moderate," multi-user). Useful article contribution: document where this vault has already hit edges — contradiction handling, source-volume growth, cross-project concept migration.
- Treat the schema (not the wiki) as the primary artifact. Per the gist, the schema is "the key configuration file." Paul's _meta/RESEARCH.md and _meta/WORKFLOW.md are the transferable, reusable assets, more so than the content inside any single project.
Contradictions / tensions
- "Humans never write the wiki" (Karpathy) vs. real-world practice. In 2026-04-20-llm-wiki: "You never (or rarely) write the wiki yourself." In practice this vault has had human-authored seed content (e.g., scope files, initial question pages). The pattern tolerates this but doesn't make it explicit; worth flagging as one of the documentable limits.
- Moderate-scale claim. "~100 sources, ~hundreds of pages" is the stated sweet spot. Career is at ~24 sources, patia at ~19. No direct evidence yet for where the pattern starts to strain — this vault will eventually produce that evidence.
Open questions
- What's the right public-facing surface for the brain vault if Paul wants it to be a portfolio artifact? Read-only mirror? Redacted subset? Full vault with private marker files?
- At what scale does the index-based approach actually break down, and what's the specific failure mode?
- How does this pattern compose with career-ops-style working agent loops? The wiki is passive reference; career-ops is active task execution. Is there a combined pattern — the wiki informs the loop, the loop files back into the wiki?
- Does the "LLM as maintainer" framing translate to team/business settings (the gist mentions it; no evidence of it working in practice)?
Related
- ai-native-multi-agent-workflow — the LLM Wiki pattern is one concrete reference instance of this macro question
- solo-human-company-thesis — maintenance-at-near-zero-cost is a structural enabler for solo operations
- what-makes-compelling-frontend-portfolio-for-ai-era — shipped working instances of AI-native patterns are the portfolio signal
- career-ops — a different form of AI-native workflow (active task loop vs. passive knowledge base)
- ai-assistants-for-older-adults (patia) — the patia wiki is the largest implementation of this pattern Paul has
Sources
- 2026-04-20-llm-wiki — Karpathy's original gist articulating the pattern