LLM Wiki Pattern

One-line summary: A pattern articulated by Andrej Karpathy for building persistent, LLM-maintained personal knowledge bases — three layers (raw sources, LLM-owned wiki, schema) with three core operations (ingest, query, lint); this brain vault is a direct instantiation.

The insight

Most LLM-plus-documents setups are RAG: retrieve chunks at query time, generate an answer, repeat. Each query re-derives knowledge from scratch — nothing accumulates. The LLM Wiki pattern inverts this: the LLM incrementally builds and maintains a persistent markdown wiki sitting between the user and the raw sources. Adding a source doesn't just index it — the LLM reads it, extracts claims, integrates them into existing pages, flags contradictions, and updates cross-references.

The wiki is a compounding artifact. The cross-references are already there. The synthesis already reflects every source ingested. The human's job is sourcing, exploration, and asking good questions. The LLM's job is the maintenance — the bookkeeping that makes knowledge bases actually useful and that humans historically abandon.

Why it works, per 2026-04-20-llm-wiki: "The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping… Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass. The wiki stays maintained because the cost of maintenance is near zero."

Why this matters to career

Three reasons, each tied to a specific track:

Track 3 (portfolio) — this vault is a shipped artifact of the pattern. The brain vault is a working implementation. A portfolio that includes it (or writes publicly about it) demonstrates AI-native workflow skill in a form that can be cloned, inspected, and used — which is exactly what what-makes-compelling-frontend-portfolio-for-ai-era argues for over static showcase sites.
Track 4 (thought leadership) — anchor for multiple article topics. Possible angles: implementing the pattern for research specifically, a frontend engineer's take on LLMs as maintainers rather than generators, the shift from "ask an LLM" to "build with an LLM," a comparison of LLM-wiki vs. RAG-first tools (NotebookLM, ChatGPT file uploads). The gist is intentionally abstract, which leaves room for opinionated implementation writeups.
Macro (solo-operator enablement) — the maintenance-near-zero claim is structurally important. solo-human-company-thesis rests on "AI lowers the cost of creating things"; the Karpathy argument extends this to keeping things current, which is where solo operators typically fail. If the pattern generalizes, solo operators get a compounding knowledge advantage that didn't previously exist.

Evidence

The architecture (three layers)

Per 2026-04-20-llm-wiki:

Raw sources — curated source documents. Immutable. LLM reads, never writes.
The wiki — LLM-generated and LLM-maintained markdown pages: summaries, entity pages, concept pages, an index. Human reads; LLM writes.
The schema — a config file (CLAUDE.md / AGENTS.md) defining the wiki's structure, conventions, and workflows. This is "what makes the LLM a disciplined wiki maintainer rather than a generic chatbot." Human and LLM co-evolve it over time.
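
The read/write split across the three layers can be sketched as a small permission table. This is only an illustration: the directory names `raw` and `wiki`, the `Actor` and `Layer` types, and the helper functions are hypothetical and do not come from the gist. The sketch also assumes the human is the one who adds files to the raw layer.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath


class Actor(Enum):
    HUMAN = "human"
    LLM = "llm"


@dataclass(frozen=True)
class Layer:
    name: str
    root: str              # hypothetical top-level path for this layer
    writers: frozenset     # actors allowed to write in this layer


# Assumed layout; the gist does not prescribe these names.
LAYERS = (
    # Human curates and adds sources; the LLM only reads them.
    Layer("raw sources", "raw", frozenset({Actor.HUMAN})),
    # The LLM writes and maintains the pages; the human reads.
    Layer("wiki", "wiki", frozenset({Actor.LLM})),
    # Human and LLM co-evolve the schema file.
    Layer("schema", "CLAUDE.md", frozenset({Actor.HUMAN, Actor.LLM})),
)


def layer_for(path: str) -> Layer:
    """Map a vault-relative path to the layer it belongs to."""
    parts = PurePosixPath(path).parts
    for layer in LAYERS:
        if parts and parts[0] == layer.root:
            return layer
    raise ValueError(f"{path!r} is outside the three layers")


def may_write(actor: Actor, path: str) -> bool:
    """True if the actor is allowed to write to the given path."""
    return actor in layer_for(path).writers


print(may_write(Actor.LLM, "wiki/llm-wiki-pattern.md"))    # True
print(may_write(Actor.LLM, "raw/2026-04-20-llm-wiki.md"))  # False
```

A check like this could sit in front of whatever tool the LLM uses to write files, so the "LLM reads, never writes" rule for raw sources is enforced rather than only stated in the schema.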

The three operations

Ingest — drop a new source, LLM reads, integrates into existing pages, updates index, appends to log. A single source can touch 10–15 wiki pages in one pass.
Query — ask a question, LLM searches wiki, synthesizes with citations. Good answers get filed back as new pages so explorations compound rather than evaporate into chat history.
Lint — periodic health check: contradictions, stale claims, orphans, missing cross-references, data gaps. Diagnostic, not destructive.
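
The mechanical side of ingest and lint can be sketched in a few lines. This is a minimal in-memory sketch under stated assumptions, not the gist's implementation: it assumes pages link to each other with `[[wikilink]]` syntax, that the index is a page named `index`, and that the LLM has already produced the page updates for a source (the `updates` argument). Only two lint checks are shown, orphans and missing cross-references; contradictions and stale claims need the LLM's judgment and are left out.

```python
from __future__ import annotations

import re

# Assumption: pages cross-reference each other with [[wikilink]] syntax.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def outbound_links(text: str) -> set[str]:
    """Return the page names a page links to (alias and anchor parts dropped)."""
    return {target.strip() for target in WIKILINK.findall(text)}


def ingest(pages: dict[str, str], log: list[str], source_id: str,
           updates: dict[str, str], index: str = "index") -> None:
    """Apply LLM-proposed page updates for one source, then update index and log."""
    for name, text in updates.items():
        if name not in pages:
            # New page: register it in the index so it stays discoverable.
            pages[index] = pages.get(index, "") + f"- [[{name}]]\n"
        pages[name] = text
    log.append(f"ingest {source_id}: touched {len(updates)} page(s)")


def lint(pages: dict[str, str], index: str = "index") -> dict[str, list]:
    """Report orphan pages and links to pages that do not exist. Changes nothing."""
    inbound: dict[str, set[str]] = {name: set() for name in pages}
    broken: list[tuple[str, str]] = []
    for name, text in pages.items():
        for target in outbound_links(text):
            if target not in pages:
                broken.append((name, target))
            elif target != name:
                inbound[target].add(name)
    orphans = sorted(n for n, sources in inbound.items() if not sources and n != index)
    return {"orphans": orphans, "broken_links": sorted(broken)}


pages = {"index": "- [[a]]\n", "a": "See [[b]] and [[missing]].", "b": "Beta."}
log: list[str] = []
ingest(pages, log, "source-1", {"c": "Builds on [[a]]."})
pages["d"] = "Written outside ingest, never linked."
print(lint(pages))
# {'orphans': ['d'], 'broken_links': [('a', 'missing')]}
```

Keeping `lint` read-only matches the "diagnostic, not destructive" framing: it returns a report, and any fix goes through a later ingest-style pass.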

Indexing without RAG infrastructure

The gist claims the index-file approach "works surprisingly well at moderate scale (~100 sources, ~hundreds of pages) and avoids the need for embedding-based RAG infrastructure." This is notable because it challenges the RAG-first mental model: for personal or project-scale knowledge bases, an index file maintained under a disciplined schema can be enough on its own, with no vector search required.
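
A lookup against an index file, with no embeddings involved, might look like the sketch below. Several things here are assumptions for illustration: the index line format (`- [[page]]: summary`), the example summaries (which are made up and are not this vault's actual index), and the use of plain keyword overlap as a stand-in for the step where the LLM reads the index and decides which pages to open.

```python
from __future__ import annotations

import re

# Assumed index line format: "- [[page-name]]: one-line summary"
ENTRY = re.compile(r"^- \[\[([^\]]+)\]\]\s*:\s*(.+)$")


def parse_index(index_text: str) -> dict[str, str]:
    """Parse index lines into {page name: summary}; other lines are ignored."""
    entries = {}
    for line in index_text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            entries[match.group(1)] = match.group(2)
    return entries


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def candidate_pages(index_text: str, question: str, limit: int = 3) -> list[str]:
    """Rank pages by keyword overlap between the question and each index entry."""
    wanted = tokens(question)
    scored = []
    for name, summary in parse_index(index_text).items():
        overlap = len(wanted & (tokens(name) | tokens(summary)))
        if overlap:
            scored.append((overlap, name))
    scored.sort(key=lambda item: (-item[0], item[1]))  # best first, ties by name
    return [name for _, name in scored[:limit]]


# Illustrative index text only.
INDEX = """\
- [[llm-wiki-pattern]]: persistent wiki maintained by an LLM, with ingest, query, lint
- [[solo-human-company-thesis]]: AI lowers the cost of creating things for solo operators
- [[career-ops]]: active task loop for career work
"""

print(candidate_pages(INDEX, "How does lint work in the wiki?", limit=1))
# ['llm-wiki-pattern']
```

The whole index fits in one file that is read top to bottom, which is what lets the approach skip chunking, embedding, and a vector store at this scale.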

At larger scale, Karpathy suggests qmd, which combines hybrid BM25/vector search with on-device LLM re-ranking and is usable via CLI or MCP.

Lineage: Vannevar Bush's Memex (1945)

Per 2026-04-20-llm-wiki: "The idea is related in spirit to Vannevar Bush's Memex — a personal, curated knowledge store with associative trails between documents… The part he couldn't solve was who does the maintenance. The LLM handles that." Useful framing for articles: this pattern is a 1945 idea that was blocked on a labor-cost problem, now unblocked.

Design implications (for the career project specifically)

Stop describing work; ship working instances. Track 4 articles should reference concrete wikis (this one, plus maybe a second public demo) rather than abstract descriptions of the pattern. The gist is intentionally abstract — an opinionated implementation is the differentiated contribution.
Make the vault linkable. If Paul wants to cite the brain vault in articles or portfolio work, the public-facing surface (README, a subset of pages, or a sanitized mirror) needs to be legible to an outside reader who hasn't seen the conversation history.
Track the pattern's limits. The gist doesn't discuss where the pattern fails (adversarial sources, contradictory domains, scale beyond "moderate," multi-user). Useful article contribution: document where this vault has already hit edges — contradiction handling, source-volume growth, cross-project concept migration.
Treat the schema (not the wiki) as the primary artifact. Per the gist, the schema is "the key configuration file." Paul's _meta/RESEARCH.md and _meta/WORKFLOW.md are the transferable, reusable assets — more so than the content inside any single project.

Contradictions / tensions

"Humans never write the wiki" (Karpathy) vs. real-world practice. In 2026-04-20-llm-wiki: "You never (or rarely) write the wiki yourself." In practice this vault has had human-authored seed content (e.g., scope files, initial question pages). The pattern tolerates this but doesn't make it explicit; worth flagging as one of the documentable limits.
Moderate-scale claim. "~100 sources, ~hundreds of pages" is the stated sweet spot. Career is at ~24 sources, patia at ~19. No direct evidence yet for where the pattern starts to strain — this vault will eventually produce that evidence.

Open questions

What's the right public-facing surface for the brain vault if Paul wants it to be a portfolio artifact? Read-only mirror? Redacted subset? Full vault with private marker files?
At what scale does the index-based approach actually break down, and what's the specific failure mode?
How does this pattern compose with career-ops-style working agent loops? The wiki is passive reference; career-ops is active task execution. Is there a combined pattern — the wiki informs the loop, the loop files back into the wiki?
Does the "LLM as maintainer" framing translate to team/business settings (the gist mentions it; no evidence of it working in practice)?

Related

ai-native-multi-agent-workflow — the LLM Wiki pattern is one concrete reference instance of this macro question
solo-human-company-thesis — maintenance-at-near-zero-cost is a structural enabler for solo operations
what-makes-compelling-frontend-portfolio-for-ai-era — shipped working instances of AI-native patterns are the portfolio signal
career-ops — a different form of AI-native workflow (active task loop vs. passive knowledge base)
ai-assistants-for-older-adults (patia) — the patia wiki is the largest implementation of this pattern Paul has

Sources

2026-04-20-llm-wiki — Karpathy's original gist articulating the pattern
