One command kicks off a seven-wave analysis: DeepInit reads your whole codebase —
deep into each component, then across the system, in parallel — grounds every claim to a
file:line and checks it against your code before
writing the lean two-tier brief your agent loads. So your agent stops guessing — and DeepInit
flags the real problems it finds along the way. Local, MIT.
Point an agent at a large codebase it's never seen — half a million lines, years of decisions — and it'll rewrite a function with total confidence, then break a business rule that was never written down.
And the obvious fix backfires: piling everything into a "document everything" file (think a giant
CLAUDE.md or AGENTS.md) makes agents do
worse, not better — a 2026 ETH Zurich study measured lower task success and 20%+
more cost per task, because the few things that matter get buried.
DeepInit writes a lean, always-loaded context file your agent auto-loads — CLAUDE.md for Claude Code, the open AGENTS.md standard plus per-tool rules for Cursor, Copilot and Windsurf (.cursorrules, .github/copilot-instructions.md, .windsurf/rules/) — and a deep .ai/docs/ layer it pulls on demand, flagging the real problems in the same pass. Both come from one engine, and both obey the same rule: every claim cites a file:line, gets verified against your code before it's written, and is trimmed to only what your agent couldn't work out on its own. Nothing fabricated, nothing restated, nothing it can't point at.
It writes a short, always-loaded front-door file — CLAUDE.md for Claude Code (which loads it natively), or the cross-tool AGENTS.md standard for other agents — plus deeper files your agent
opens on demand, kept small on purpose so the few things that matter aren't buried.
Every line meets one test — it’s something your agent can’t already figure out for itself.
It comes down to three moves — parse, understand, verify. Most tools hand your whole repo to an LLM and hope; DeepInit parses your code first (real AST parsing via Graphify, 25 languages), reasons about meaning on top, then verifies every finding against the code before it's written. Under the hood that's seven waves: parallel subagents analyze each component and then the patterns across them, before findings are cross-checked and adversarially reviewed.
Auto-detect tools, estimate the token cost up front, check the database, get permissions. Nothing runs until you approve it.
Scan the tree, detect the stack, build the structural graph, order components by dependency, read git history — all deterministic, before any model runs.
Deep into each component, leaves first. Parallel subagents pull business rules, workflows and integration points — grounded to file:line.
Across all components — the patterns no single-component pass can see: shared tables, end-to-end workflows, bounded contexts.
Unify it — entity ↔ component ↔ table maps, rule-to-workflow links, coverage-gap detection across the whole set.
Drop anything inferable; every surviving claim must resolve to real code; a critic agent challenges the findings (0–3 cycles).
Emit the two tiers — lean root + deep .ai/docs/ — plus a Claude Code skill package. Backup first; re-check every claim against the code.
DeepInit reads your live database schema read-only. This release covers SQL (Postgres, MySQL, SQLite): it compares the live schema against your code and flags the drift. NoSQL stores (Mongo, Redis and others) are stub-level for now, and labeled as such rather than implied as done. A secret/PII redaction gate scrubs anything sensitive before a single file is written to disk. (A cross-model verification pass, in which a second model double-checks findings, is on the roadmap and not shipped yet.)
It tells you the cost before it spends a token. A preflight shows the estimate; nothing runs until you approve it, under a ceiling you set. After the first run, /deep-init-update re-reads only what changed and /deep-init-check costs zero tokens — so you're never surprised by the bill.
DeepInit burns tokens generously on the first pass — parsing, inspecting your database, cross-checking and verifying every finding — because one thorough analysis is worth far more than a cheap guess your agent will trust for months. Quality is the default, not an upsell.
Two ways. At runtime: your agent reads a lean, ready-made brief instead of burning tokens re-exploring the codebase from scratch every session, and a tight file is cheaper per task than a bloated one (the same ETH study measured 20%+ more cost from oversized context).
Over time: you pay for the deep analysis once, then /deep-init-update refreshes only the diffs and /deep-init-check is free, instead of re-deriving the whole picture every session, on every machine, for every teammate. Spend on depth once; coast for months.
After the first run you never re-pay for the whole repo. An edit re-documents only its blast
radius — the components you touched, plus anything whose public interface actually moved.
Here is exactly how /deep-init-update stays proportional to the change:
A content_hash per component, compared against the stored manifest by an authoritative symmetric set-diff (stored + current keys). git diff and the commit breadcrumb are only accelerators — so a deleted module, or a repo with no git history, is still caught.
Just the components whose content actually changed — nothing else is touched.
Recompute each dirty component’s public-surface hash. A body-only refactor re-analyzes that one component but skips its dependents (their view is provably unchanged); only a changed export marks the transitive dependents dirty.
The five cross-cutting docs re-run regardless of what changed — because a cross-component effect (a new circular dependency, a shifted end-to-end workflow) is invisible from any single component’s diff. This is what makes the step-3 skip a reversible optimization, not a correctness risk.
Filter, redact, re-verify every citation, then write only the changed files — inside the owned-region markers, with a dated reversible backup. Issues are diffed against a symbol-keyed baseline (new / persisting / resolved / regressed), so a line-shift never re-churns a finding.
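The change-detection step above can be sketched in a few lines. This is a minimal illustration, not DeepInit's actual code: the function names `content_hash` and `diff_manifest`, the use of SHA-256, and the manifest shape (component name mapped to hash) are all assumptions made for the example.

```python
import hashlib

def content_hash(files: dict[str, bytes]) -> str:
    """Hash a component's files in sorted path order so the result is stable."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(files[path])
        digest.update(b"\0")
    return digest.hexdigest()

def diff_manifest(stored: dict[str, str], current: dict[str, str]) -> dict[str, set[str]]:
    """Symmetric set-diff over stored and current keys (hypothetical manifest shape)."""
    stored_keys, current_keys = set(stored), set(current)
    return {
        "added": current_keys - stored_keys,
        # A deleted component shows up here even with no git history at all.
        "deleted": stored_keys - current_keys,
        "changed": {k for k in stored_keys & current_keys if stored[k] != current[k]},
    }

stored = {"billing": "aaa", "orders": "bbb", "legacy": "ccc"}
current = {"billing": "aaa", "orders": "bbb2", "fulfilment": "ddd"}
result = diff_manifest(stored, current)
print(result)
```

Because both key sets are compared, nothing depends on git: a removed module appears under `deleted` simply because its key is in the stored manifest and not in the current one.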
Two things it therefore guarantees — the two ways docs usually rot, closed:
Even on the grep path (no precise parser present), DeepInit reconciles the public surface it captured against export-indicator tokens — an unusual form (an export *, a CommonJS module.exports, a dynamic __all__) marks the surface “incomplete” and conservatively re-checks dependents. A breaking change never quietly skips the code that needed it.
Because detection is a symmetric set-diff (stored vs. current), a deleted or moved component is caught and its docs archived — even with no git history, or a shallow / rewritten ref. No stale page outlives the code it described.
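The dependent-propagation rule (a body-only change stays local, a moved or incomplete public surface marks transitive dependents dirty) might look roughly like the sketch below. The function `dirty_set` and its three inputs are hypothetical names chosen for illustration, assuming the reverse dependency map has already been built.

```python
from collections import deque

def dirty_set(changed: set[str], surface_moved: set[str],
              dependents: dict[str, set[str]]) -> set[str]:
    """Return every component that needs re-analysis.

    changed:       components whose content hash changed
    surface_moved: subset whose public-surface hash changed, or whose
                   surface was marked "incomplete" (treated conservatively)
    dependents:    component -> components that import it directly
    """
    dirty = set(changed)
    queue = deque(c for c in changed if c in surface_moved)
    seen = set(queue)
    while queue:
        comp = queue.popleft()
        for dep in dependents.get(comp, ()):
            dirty.add(dep)
            if dep not in seen:  # walk dependents transitively, once each
                seen.add(dep)
                queue.append(dep)
    return dirty

dependents = {"billing": {"orders"}, "orders": {"reporting"}}
body_only = dirty_set({"billing"}, set(), dependents)
export_changed = dirty_set({"billing"}, {"billing"}, dependents)
print(body_only, export_changed)
```

In this toy graph a body-only edit to `billing` re-analyzes one component, while a changed export pulls in `orders` and, through it, `reporting`.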
Two plugin-shipped hooks do the surfacing, both calling the same 0-token, no-LLM status script (the only kind of check a git hook can actually run): a post-commit nudge that prints how far the docs have drifted, and a SessionStart offer — open a session while the docs are behind and DeepInit offers a one-click refresh. Beyond those, a real headless auto-refresh exists but is off by default, because it’s the only level that spends tokens. None of them ever auto-commits — you always review the diff.
The honest part: a git hook can’t summon an AI session, so DeepInit doesn’t pretend your docs silently regenerate on every commit — the free nudge and the session-start offer are the real, visible surface.
Because DeepInit already extracts your rules, your schema, and the why, the problems surface as a byproduct — now across about ten issue families plus a class-conformance census, every finding grounded to the line, framed as “likely” rather than asserted, and report-only — it never touches your source. Anything a linter already catches is suppressed, because false positives are what kill tools like this. Five of the kinds it looks for:
Code that still reads a column the live database no longer has. Schema-diff tools check the DB against a declared schema; this checks it against your actual code — on legacy code with no clean schema.
Code that breaks a decision you recorded, or a "temporary" hack that's quietly load-bearing — only visible when the documented why is on hand.
A business rule applied on one path but missing on another that writes the same data — access-control gaps included (surfaced as rule violations, not a security claim).
Parts quietly sharing a table with no interface between them — change one, break the other.
Everything ranked by what to fix first — how often it changes, how few people understand it, how critical it is, how thin the tests are.
The goal — a short, trustworthy list worth a human's
attention, not a wall of warnings. Findings are framed as likely and grounded to a
file:line rather than proven, and this isn't a security product.
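To make the first kind of finding concrete, here is a small sketch of a schema-versus-code drift check against SQLite. It assumes the set of columns the code reads has already been extracted (the `reads` mapping and its file:line values are made up for the example), and `live_columns` and `find_drift` are illustrative names, not DeepInit's API.

```python
import sqlite3

def live_columns(conn: sqlite3.Connection, table: str) -> set[str]:
    """Read column names from the live schema; PRAGMA table_info only inspects."""
    # The table name is interpolated directly, so pass trusted names only.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1] for row in rows}  # row[1] is the column name

def find_drift(conn: sqlite3.Connection, table: str,
               columns_read_by_code: dict[str, list[str]]) -> dict[str, list[str]]:
    """Columns the code still reads that the live table no longer has."""
    live = live_columns(conn, table)
    return {col: refs for col, refs in columns_read_by_code.items() if col not in live}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")

# Hypothetical result of scanning the code for column reads.
reads = {
    "status": ["src/orders/order.py:24"],
    "legacy_status": ["src/reports/job.py:10"],
}
drift = find_drift(conn, "orders", reads)
print(drift)
```

Each drifted column keeps the file:line references that read it, so the finding can point at the exact code that would break.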
A single self-contained, offline report.html: the browsable docs (search, a component tree, an architecture overview, a decisions timeline, jump-to-file:line) and the issue/metrics dashboard, merged into one file with a ⌘K palette. Vanilla JS, no network. (It supersedes the old docs-viewer.html + dashboard.html, now redirect stubs.)
One command emits report.<lang>.html in 8 languages (Spanish, Chinese, Portuguese, Russian, Japanese, German, French, Hebrew — full RTL), with an in-app switcher. English stays the canonical analysis; grounded tokens (file:line, code, record IDs) are masked and verified, so a translation can never corrupt a grounded claim — and any miss falls back to English, never a fabricated translation.
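The mask-and-verify idea can be illustrated as follows. This is a sketch under stated assumptions: the regular expression, the `<<n>>` placeholder format, and the `translate` callable are all invented for the example, and a real translator would replace the toy lambda.

```python
import re
from typing import Callable

# Assumed token shapes: inline code, record IDs like [BR-billing:003], and file:line.
GROUNDED = re.compile(r"`[^`]+`|\[[A-Z]+-[\w:-]+\]|[\w./-]+\.\w+:\d+")
PLACEHOLDER = re.compile(r"<<(\d+)>>")

def mask(text: str) -> tuple[str, list[str]]:
    """Swap each grounded token for a numbered placeholder."""
    tokens: list[str] = []
    def repl(match: re.Match) -> str:
        tokens.append(match.group(0))
        return f"<<{len(tokens) - 1}>>"
    return GROUNDED.sub(repl, text), tokens

def translate_safely(text: str, translate: Callable[[str], str]) -> str:
    masked, tokens = mask(text)
    out = translate(masked)
    found = sorted(int(n) for n in PLACEHOLDER.findall(out))
    if found != list(range(len(tokens))):
        return text  # a token was lost or duplicated: fall back to English
    return PLACEHOLDER.sub(lambda m: tokens[int(m.group(1))], out)

english = "Rule [BR-billing:003] from src/billing/invoice.ts:142"
ok = translate_safely(english, lambda s: s.replace("from", "de"))
broken = translate_safely(english, lambda s: s.replace("<<1>>", ""))
print(ok)
print(broken)
```

The translator only ever sees placeholders, so it cannot alter a citation; if any placeholder goes missing, the original English string is returned unchanged.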
The Insights view shows real composite risk (severity × criticality × churn × bus-factor × coverage) when the repo has the signals — and honestly says “unavailable” when it doesn’t, never a fake zero. Beside it, a static component-dependency graph read from the structural graph DeepInit already computes. (An interactive explorer — DeepMap — is coming.)
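One way to read the composite-risk formula is as a plain product that refuses to produce a number when any signal is missing. The sketch below assumes each signal is already normalized to a 0 to 1 scale where higher means riskier; that scaling, the signal key names, and the function name are assumptions, not documented behavior.

```python
from math import prod
from typing import Optional

SIGNALS = ("severity", "criticality", "churn", "bus_factor", "coverage")

def composite_risk(signals: dict[str, Optional[float]]) -> Optional[float]:
    """Product of the five signals, or None ("unavailable") if any is missing."""
    values = [signals.get(name) for name in SIGNALS]
    if any(v is None for v in values):
        return None  # never substitute a fake zero for a missing signal
    return prod(values)

full = composite_risk({name: 0.5 for name in SIGNALS})
no_git = composite_risk({"severity": 0.9, "criticality": 0.8, "coverage": 0.4})
print(full, no_git)
```

Returning `None` instead of `0.0` keeps "no data" distinguishable from "no risk", which is the distinction the Insights view draws.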
Findings appear in GitHub code scanning and your IDE alongside everything else — via SARIF v2.1.0, the standard format those tools already read. No custom integration.
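For readers unfamiliar with SARIF, the sketch below builds a minimal SARIF v2.1.0 log from a list of findings. The field layout follows the SARIF format; the input dictionary keys (`rule_id`, `message`, `path`, `line`), the `warning` level, and the sample file path are assumptions for illustration, not DeepInit's real output.

```python
import json

def to_sarif(findings: list[dict]) -> dict:
    """Wrap findings in a minimal SARIF v2.1.0 log with one run."""
    results = [
        {
            "ruleId": f["rule_id"],
            "level": "warning",
            "message": {"text": f["message"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": f["path"]},
                    "region": {"startLine": f["line"]},
                }
            }],
        }
        for f in findings
    ]
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": "DeepInit"}}, "results": results}],
    }

sarif = to_sarif([{
    "rule_id": "IF-8",
    "message": "Likely circular dependency between components",
    "path": "src/orders/order.ts",  # hypothetical location
    "line": 24,
}])
print(json.dumps(sarif, indent=2))
```

Each finding's file:line maps directly onto `artifactLocation.uri` and `region.startLine`, which is what lets code-scanning tools annotate the exact line.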
## Business rules — billing (vertical — per component)
[BR-billing:003] CORE — An invoice can't be voided once its payment has settled; void attempts must go through the refund flow instead.
from src/billing/invoice.ts:142 ✓ checked · HIGH

## Database vs. code — orders (vertical)
⚠ orders.legacy_status (text) is still read by the reporting job, but your Prisma schema dropped it — an agent trusting the schema will miss it.
from prisma/schema.prisma:88 ↔ src/orders/order.ts:24 ✓ checked

## Use case — cross-component (horizontal — across components)
[UC-014] Checkout → charge → fulfil: orders.create() calls billing.charge(); on success it emits `paid` → fulfilment.start(). A failure *after* the charge must call billing.refund() — orders can't roll the payment back itself.
spans orders/ · billing/ · fulfilment/ ✓ checked
Grounded + verified isn't a slogan — it's a pipeline stage. Before a single claim reaches your context file, it walks this chain. If a citation doesn't resolve, the claim is dropped from the lean tier, never silently kept.
src/billing/invoice.ts:142 → CLAUDE.md
Verified means the citation exists and is plausible, not that the claim is provably correct. A confidently-wrong claim is worse than a gap, so DeepInit prefers omission over a guess.
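The "citation must resolve" gate can be pictured as a check that the cited file exists and the cited line is within range. The sketch below is illustrative only: `resolve_citation`, `keep_grounded`, and the claim dictionary shape are hypothetical, and it checks existence, not whether the claim is correct.

```python
import tempfile
from pathlib import Path

def resolve_citation(repo_root: Path, citation: str) -> bool:
    """True if 'path:line' names a real file and a line that exists in it."""
    path_part, _, line_part = citation.rpartition(":")
    if not path_part or not line_part.isdigit():
        return False
    target = repo_root / path_part
    if not target.is_file():
        return False
    text = target.read_text(encoding="utf-8", errors="replace")
    return 1 <= int(line_part) <= len(text.splitlines())

def keep_grounded(repo_root: Path, claims: list[dict]) -> list[dict]:
    """Drop any claim whose citation does not resolve."""
    return [c for c in claims if resolve_citation(repo_root, c["cite"])]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "invoice.py").write_text("a = 1\nb = 2\n", encoding="utf-8")
    claims = [
        {"text": "rule A", "cite": "src/invoice.py:2"},
        {"text": "rule B", "cite": "src/invoice.py:40"},  # line out of range
        {"text": "rule C", "cite": "src/missing.py:1"},   # file does not exist
    ]
    kept = keep_grounded(root, claims)

print([c["text"] for c in kept])
```

Only the first claim survives; the other two are dropped rather than kept on trust.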
Every finding is typed, tagged by importance and confidence, points to the exact
file:line, and is checked against your code before it's
written. Findings are traceable by ID:
This example is TypeScript — DeepInit's AST parser reads 25 languages (Rails/Ruby, Python, Go, Rust, Java, C#, PHP, Kotlin and more), with a grep fallback for the few it doesn't (e.g. Crystal, OCaml).
Wikis, code graphs, and index tools give you a separate place to go ask questions — a system that's only as current as its last crawl. DeepInit writes verified markdown straight into the context files your agent already loads — CLAUDE.md for Claude Code, AGENTS.md for the rest — so the context is just there, in front of the model, on every run. And it carries a second axis nothing else does: it's measured. The same file:line grounding that makes the context trustworthy is what lets it flag problems without crying wolf — a precision discipline we test on every change, not a promise.
Two rows are the whole story — start here, then the full table substantiates the rest.
DeepInit compares your live database schema against what the code actually reads and writes, and flags where they've diverged — a column the schema dropped that a job still reads. Schema-diff tools check the DB against a declared schema; this checks it against your code.
No other tool here does this.
Each statement cites a real file:line, is verified to exist before it's written, and is re-checked as the code changes. The others snapshot prose and go stale at the next crawl; DeepInit writes verified truth and maintains it.
| What matters | DeepInit | /init (Claude Code) | Starter-file generators | Understand-Anything | GitNexus | Google Code Wiki | DeepWiki |
|---|---|---|---|---|---|---|---|
| Approach | Analyze once → write | Quick scan | Scaffold a stub | Graph to explore | Graph to query | Wiki to ask | Wiki to ask |
| License & cost | Free · MIT | Built in | Free / open source | Free · MIT | Paid for commercial use | Free public · cloud | Paid for private · cloud |
| Runs | On your machine | In the agent | Local | Local | Local | In the cloud | In the cloud |
| What it produces | Context files your agent reads | One context file | Context files | A graph + dashboard | A code graph | A hosted wiki | A hosted wiki |
| How it reads your code | Real parsing + AI, checked | Quick AI read | File scan + AI | Parsing + AI | Code graph | AI over the repo | AI over the repo |
| Your business rules | ✓ written & ranked | — | — | ~ a domain view | — | ~ in prose | ~ in prose |
| The "why" behind the code | ✓ | — | — | — | — | ~ inferred | ~ inferred |
| Database vs. your code | ✓ spots the drift | — | — | — | — | — | — |
| How features cross the code | ✓ traced | — | — | ~ dependencies | ~ dependencies | ~ in prose | ~ in prose |
| Keeps only what helps | ✓ | — | — no filter | n/a | n/a | — | — |
| Traceable to the exact file & line | ✓ every finding | — | — | ~ | ~ | ~ links | ~ links |
| Checked against your code | ✓ | — | — | — | — | — | — |
| Measured precision (false-positive rate) | ✓ 0/22 on real bugfixes | — | — | — | — | — | — |
| Flags risky / single-owner code | ✓ | — | — | — | — | — | — |
| Small file + depth on demand | ✓ | — one file | ~ tiered | n/a | n/a | n/a | n/a |
| Works with your agent | ✓ standard files · Claude Code first-class | — its own agent | ✓ standard files | ✓ many | ~ some editors | ~ web / MCP | ~ web / MCP |
| Cheap to keep updated | ✓ only what changed | — | — | ~ | ~ | ~ auto (cloud) | ~ auto (cloud) |
| Stays on your machine | ✓ | via the agent | ✓ | ✓ | ✓ | — cloud | — cloud |
✓ = does it · ~ = partial / adjacent · — = doesn't · "n/a" = not that kind of tool.
Comparisons reflect publicly available information about third-party products as of June 2026. These tools evolve fast; details may be out of date. "Starter-file generators" are lightweight tools (Apify's generator, hcc, Intent and similar) that scaffold a short AGENTS.md: useful, but shallow by design; several deliberately cap themselves at ~20–30 lines. Product names and trademarks belong to their respective owners and are used for identification only; mention does not imply endorsement or affiliation. Something inaccurate? Open an issue and we'll fix it.
That drift check and the verified-to-the-line grounding are the difference. Point it at your repo:
Get started
DeepInit's design is a response to a measured, counterintuitive result: handing a coding agent a big, auto-generated context file makes it perform worse, and costs more to run. So DeepInit does the opposite of "document everything."
20%+ more cost per task: the penalty for bloat. Pile everything into one auto-generated context file and the model burns extra tokens on context it didn't need, for output that comes out worse, not better. The fix isn't more context, it's less: trim to only what the agent can't infer, and the result flips to a measured gain.
ETH Zurich / LogicStar · arXiv:2602.11988 · CC BY 4.0
0/22 false positives on real, human-merged bug fixes: it never re-flagged a line a maintainer had already fixed. Across four more real repos, a naive rule-checker would have fired ~90 false alarms; DeepInit fired none.
DeepInit’s own measurement, on real code · How we tested it
Run blind on 8 real repos with their own architecture docs removed, DeepInit never once confidently stated a fact the code disproves. That's the Mirror Test — does it actually understand your architecture? We strip a project's own docs (.NET, Rust, Python, Go), run on the code alone, and grade what it re-derived. Zero confidently-wrong facts — the same hard bar as the precision result above, applied to understanding.
What it re-derived (indicative): ~66% of what the human docs state, at 98% faithfulness — strongest on structure (components, dependencies, data stores), honestly weaker on deep invariants. Is it just memorizing famous repos? No: on two obscure repos a model is very unlikely to have memorized (one Go, one Rust) faithfulness held at 100% — so what it states about your unfamiliar code is just as trustworthy. The coverage number moves with how deep one pass goes; the trustworthiness doesn't.
DeepInit’s own measurement, docs removed · INDICATIVE (8 held-out repos; 2 contamination-resistant shown) · the precision result above stays the headline
DeepInit analyzes your source and inspects your live database schema (read-only). It doesn't execute your app, so purely runtime behavior — load, race conditions, anything that only surfaces live — is out of scope.
Findings are framed as likely, grounded to a file:line, with linter-territory suppressed. It is not a security product and makes no safety guarantee — access-control gaps are surfaced as risk to review, not proof your code is safe.
A full analysis does real work: parsing, database checks, and multiple verification passes. Expect minutes and real tokens the first time, not seconds — after that, updates are incremental and cheap.
Measured precision, an architecture it re-derived from code alone, the limits stated plainly. That's the proof — run it on your own repo:
Get started
A finding earns trust two ways: it is grounded in your actual code, and it is not noise. DeepInit is built for the first and measured for the second: on its own fixtures, on 22 real human-merged bug fixes, and on real open-source repos.
Every finding cites a file:line and is checked to exist against your code first: the difference between analysis and a confident guess.
… restic bug blind.
On a full-pipeline analysis of the kemal web framework it raised zero issues: all nine structural near-candidates were suppressed by a named guard rather than raised as a guess. Comprehension, not keyword-matching.
Then the harder half, the false-positive rate, because a false alarm is what gets a tool turned off:
These are DeepInit’s own measurements on real code, kept exact. Recall (the false-negative side) on those real bug fixes was 14/22 — labeled indicative, below our own ship-gate, not a headline — and the speedup benefit is still being benchmarked, so this page leaves it blank rather than invent one.
Trust comes from stating what each test proves and what it can’t. In rough order of independence:
This is the harness — not the model. Together, those five layers are the harness: the engineered apparatus around the model that makes its understanding of your code trustworthy. It is the part a weekend prompt doesn’t have. A prompt hands you one ungrounded guess; the harness grounds every claim to a file:line, measures its own false-alarm rate, and is regression-tested on every change — so it doesn’t quietly drift as the model moves underneath it.
A big test count is easy to fake. What makes the 331 checks across 72 oracle sections trustworthy is how they’re built — four disciplines borrowed from how you’d test production-grade software, not a prompt:
Metamorphic bug-fix replay. We replay 22 real, human-merged bug fixes (3 of them CVEs): the detector must flag the broken commit and must not re-flag the fixed one. The ground truth predates our spec, so nothing can be over-fitted to it — and re-flagging an already-fixed line came up 0 times (0/22).
Blind, separated duties. The engine emits its findings before anyone sees the key; an independent party pins the held-out answers; a third scores. In the Mirror Test the reference doc is provably removed from the inputs and the commit pinned by hash — so it reconstructs your architecture from code alone, never from a doc it quietly read.
Mutation meta-testing. A meta-harness makes one known-bad edit at a time to a committed fixture and demands the suite go red. 92 of 92 mutations killed, 0 survived — proof the checks are load-bearing, not decorative. A check that can’t catch a planted bug proves nothing.
Frozen baselines + a drift guard. The hard zeros (never confidently wrong, zero false defects, zero re-flagged fixes) are re-asserted against the current results on every change. And every figure on this page is derived from committed records by one aggregator — no number here is hand-typed, and a stale one fails the build.
The 331 checks run with no model in the loop (deterministic) and grow only by addition — the original engine checks must never regress. Recall is reported, never gated, and kept below the headline because our own fixtures over-state it; precision (the false-alarm side) is what we gate on.
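The bug-fix replay discipline reduces to a simple scoring rule: count how many broken commits the detector flags (reported) and how many fixed commits it re-flags (gated at zero). The sketch below uses a deliberately naive toy detector and made-up source pairs; `score_replay` and `swallows_errors` are illustrative names, not the real harness.

```python
from typing import Callable

def score_replay(pairs: list[tuple[str, str]],
                 detector: Callable[[str], bool]) -> dict[str, tuple[int, int]]:
    """pairs: (broken_source, fixed_source) for each replayed bug fix."""
    caught = sum(1 for broken, _ in pairs if detector(broken))
    reflagged = sum(1 for _, fixed in pairs if detector(fixed))
    return {"recall": (caught, len(pairs)), "reflagged": (reflagged, len(pairs))}

def swallows_errors(source: str) -> bool:
    """Toy detector: flags a bare 'except: pass'."""
    return "except: pass" in source

pairs = [
    ("try: run()\nexcept: pass", "try: run()\nexcept Exception as e: log(e)"),
    ("x = 1", "x = 2"),  # a bug this toy detector cannot see
]
score = score_replay(pairs, swallows_errors)

# Precision side is the gate; recall is reported but never gated.
assert score["reflagged"][0] == 0
print(score)
```

The second pair lowers recall without failing the gate, which mirrors the page's stance that a miss is reported while a re-flagged fix is a hard failure.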
The same detectors, run over ~1.12M lines of real open-source code in 15 languages — Go, Rust, C, C++, Java, C#, Kotlin, PHP, Ruby/Rails, Elixir, OCaml, Swift, Python, TypeScript, Crystal — comprehend each language’s own structure rather than matching a surface pattern. The clearest example is circular dependencies: the identical check stays silent where the language forbids cycles (Go packages, Rust crates, C# assemblies, and the OCaml and Swift build manifests — verified by building the real dependency graph), fires on a genuine one where the language permits it (a 33-package cycle in a Java backend, a 31-component cycle across a PHP framework’s separately-published packages, a 14-package cycle in a Kotlin HTTP client, two namespace cycles inside a C# media server, and a real cycle in nginx’s foundational C core), and reads the subtler cases right: on Elixir it separates the compile-time graph the compiler keeps acyclic from the runtime cycle it allows, and on the C/C++ pair — the same #include model — it fires on nginx yet stays silent on a strictly-layered C++ library, because that regime permits cycles without requiring them. Sharper still: the same swallowed-error check is inapplicable on pure C (which has no exception construct to match) yet correctly re-activates on C++ — same check, different language. Each fire was independently re-computed a second way before it was trusted.
| Cycle regime | What the language does | Field witnesses | The same IF-8 check… |
|---|---|---|---|
| Hard ban | A cross-component cycle is a compile/build error — structurally impossible. | Go, Rust, C#, OCaml, Swift | — stays silent (0 cycles) |
| Partial ban | Compile-time cycles banned; runtime call-cycles permitted. | Elixir (Phoenix) | — silent compile-time · ✓ fires at runtime |
| Permitted, explicit | No ban; dependencies are explicit import statements — fully groundable. | Java, PHP, Kotlin, C# namespaces, TypeScript | ✓ fires — a real SCC |
| Permitted, textual | No module system; #include + guards let a cyclic include graph compile. | C (nginx) · C++ (Poco) | ✓ nginx · — Poco |
| Permitted, hidden | Dependencies are implicit (autoloaded constant refs) — below an import-grep substrate. | Ruby (Rails / Zeitwerk) | declines to fabricate — honest gap, never a false alarm |
Same check, five compiler regimes. It builds the real dependency graph for each language's actual unit of modularity, then runs a genuine cycle search — so it fires where cycles are real and permitted, and stays silent where the language forbids them. The C/C++ row is the proof: identical #include model, nginx firing a real triad while a strictly-layered C++ library stays clean.
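A cycle search over a dependency graph is commonly done with Tarjan's strongly-connected-components algorithm, which this page names as one of its detectors. The sketch below is a textbook recursive version over a toy component graph; the graph contents and the `cycles` helper are assumptions for illustration, and a very large graph would need an iterative variant to avoid Python's recursion limit.

```python
def strongly_connected_components(graph: dict[str, list[str]]) -> list[list[str]]:
    """Tarjan's algorithm: graph maps each node to the nodes it imports."""
    index: dict[str, int] = {}
    lowlink: dict[str, int] = {}
    stack: list[str] = []
    on_stack: set[str] = set()
    result: list[list[str]] = []
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for nxt in graph.get(node, ()):
            if nxt not in index:
                visit(nxt)
                lowlink[node] = min(lowlink[node], lowlink[nxt])
            elif nxt in on_stack:
                lowlink[node] = min(lowlink[node], index[nxt])
        if lowlink[node] == index[node]:  # node is the root of a component
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            result.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return result

def cycles(graph: dict[str, list[str]]) -> list[list[str]]:
    """Keep only components that form a real cycle (size > 1 or a self-import)."""
    return [sorted(c) for c in strongly_connected_components(graph)
            if len(c) > 1 or c[0] in graph.get(c[0], ())]

imports = {
    "orders": ["billing"],
    "billing": ["fulfilment"],
    "fulfilment": ["orders"],
    "utils": [],
}
print(cycles(imports))
```

On an acyclic graph `cycles` returns an empty list, which corresponds to the "stays silent" rows of the table.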
Every one of these is recorded as a structural observation, never filed as a bug or published — and these are direct detector sweeps, not the full graded pipeline. The point is comprehension across the ecosystem, not a bug count on famous repos.
Where Graphify ends and the understanding begins. Graphify is the AST extractor that sharpens the structural graph on 25 languages — it resolves an import to the file that defines the symbol, which a grep can’t. But it’s an accelerant, not the engine. A stack with no grammar (Crystal, OCaml) automatically falls through to a ctags/grep import graph that still captures roughly 80% of cross-component imports — the run just carries lower certainty, and the rule is always degrade, don’t false-flag. The proof it isn’t a crutch: most of these 15 field sweeps were run on the grep fallback, before Graphify was wired in as the default — so the comprehension above stands on its own. The language reasoning — knowing a cycle is even possible in this language before flagging one — is DeepInit’s, layered on top of whatever parser is available.
Beyond the sweeps, the full pipeline is run on a deliberate matrix of 16 leading repositories — 13 languages × three size tiers, each pinned to a commit and measured, not cherry-picked. The kinds of project span how real codebases actually differ: web frameworks (gin/Go, express/JS, sinatra/Ruby, laravel/PHP at 330k lines, phoenix/Elixir), libraries & CLIs (click, gorilla/mux, itsdangerous, fmt/C++, a Kotlin schema lib, uniffi-rs/Rust), and larger apps, data stores & SDKs (redis/C at 346k lines, excalidraw/TypeScript at 157k, pyccel transpiler, a commercetools Java SDK). 15 of the 16 parse on the designed AST path; the one that doesn’t (Crystal, no grammar) proves the grep-fallback degradation path end-to-end.
DeepInit estimates the cost up front, before a token is spent. We measured the real output across the matrix to keep that estimate honest: a small library runs around 150–160k tokens, a medium framework 80–230k, and a large 100k-line transpiler about 200k — one thorough pass, then incremental updates that re-read only what changed. We’re holding the dollar figure blank on purpose: the token counts are measured, but a published price waits on one clean end-to-end accounting run rather than an estimate. (INDICATIVE; Claude Opus pricing as of June 2026; re-derivable from committed records.)
We measured it. On three of those repos we ran the analysis three ways — the full designed path (AST + grounding + verification), the grep fallback, and a naive LLM-only baseline (the controlled stand-in for “dump the code into a model and ask for docs”, no structural parse / no grounding / no verification) — and scored each, blind, against the parsed graph and the real code. The full path grounded ~99% of its claims to a verified file:line; the naive baseline grounded ~44% (and 0% on one repo — it cites filenames, not lines), inflated the dependency graph with edges that aren’t imports, and surfaced none of the grounded security-relevant findings the verified paths did. The honest part: on these famous repos every mode was ~99–100% faithful (the model knows them) — so the real difference isn’t hallucination, it’s whether you can trust which line a claim refers to, and whether the gaps get caught. That is the whole thesis: a prompt gives you a plausible description; the harness gives you a grounded, verified one.
All figures here are DeepInit’s own, INDICATIVE (small-n, repo@SHA-grounded, re-derivable from committed records); the precision result above stays the headline. We also run DeepInit on our own tools — an independent review of a dogfood run on our plugin returned “would use”, 10 of 11 spot-checked claims correct, every hard count exact.
DeepInit is defined in markdown, so the honest measure of its depth isn’t a line count — it’s the validation surface and the decision log that make a model’s understanding trustworthy. And under that markdown it runs genuine graph, AST and set-algebra algorithms over a whole-system model, not text a single clever prompt could match. Put plainly, it’s closer to a research result and the lab that proves it than to an app — a small, dense instruction-set, with the overwhelming majority of the committed repo given to controlled fixtures, real-world field runs, and a decision log built to show it produces grounded, correct output. The engine that runs it is Claude itself.
Run three ways over the same repos and scored against the AST as ground truth, DeepInit’s full path grounds 98.9% of its claims to a verified file:line (100% on the grep-fallback path); a naive LLM-only baseline grounds just 43.5% — 0% on one repo — and missed every grounded security-relevant finding. That gap is the difference between analysis and a confident guess.
Tarjan strongly-connected-components. It builds your real import graph — resolving each import to the file that defines the symbol — and runs an SCC search to find every set of components that secretly import each other. A cycle is a global property of the whole graph; no per-file linter can see it.
Cross-module constant-fold. It carries an exported constant’s literal value across an import edge, follows re-export chains to the single origin, and only then flags a branch as unreachable — grounded to that origin’s exact line. Standard tools don’t fold constants across modules; this lives in the gap they leave.
Name-keyed set difference. It groups value-sets defined under the same name in two places and fires only when their membership diverges — the polarity-opposite of a copy-paste detector, which goes silent the moment two copies drift apart. Five guards keep it from firing on coincidental name-clashes.
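The name-keyed set difference can be sketched as grouping value-sets by name and reporting only the pairs whose membership differs. The input tuple shape, the sample locations, and the function name are hypothetical, and the five guards mentioned above are not modeled here.

```python
from collections import defaultdict
from typing import Iterable

def divergent_value_sets(
    definitions: Iterable[tuple[str, str, Iterable[str]]]
) -> list[tuple[str, str, str, list[str]]]:
    """definitions: (name, location, values). Fire only when same-named sets differ."""
    by_name: dict[str, list[tuple[str, frozenset[str]]]] = defaultdict(list)
    for name, location, values in definitions:
        by_name[name].append((location, frozenset(values)))

    findings = []
    for name, sites in by_name.items():
        for i, (loc_a, set_a) in enumerate(sites):
            for loc_b, set_b in sites[i + 1:]:
                if set_a != set_b:
                    # Symmetric difference: members present on one side only.
                    findings.append((name, loc_a, loc_b, sorted(set_a ^ set_b)))
    return findings

found = divergent_value_sets([
    ("ORDER_STATUSES", "src/orders/status.py:3", ["new", "paid", "refunded"]),
    ("ORDER_STATUSES", "src/reports/status.py:7", ["new", "paid"]),
    ("CURRENCIES", "src/billing/money.py:1", ["EUR", "USD"]),
    ("CURRENCIES", "src/orders/money.py:1", ["USD", "EUR"]),
])
print(found)
```

Identical copies (`CURRENCIES`) stay silent, while the drifted pair reports exactly which member is missing, the opposite polarity of a copy-paste detector.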
The depth no prompt has is the record of detectors we deliberately rejected after measuring them. Each was put through a design workshop and then an independent adversarial panel — reviewers blind to the verdict, told to try their hardest to ship it — left a frozen specification, and named exactly what would unblock it. We added no green test for any deferred check, because a test you co-designed to pass proves the wrong thing. Several detector families were measured and shelved this way; the engine ships only the checks whose decidable rule can’t be confused with a guess.
Each detector grounds to a criticality-ranked, lifecycle-tracked, verified finding — report-only, never touching your source.
~1.12M lines of real open-source code across 15 stacks, plus an independent oracle of 22 real bug fixes in 4 languages.
A weekend prompt has zero regression checks. This has a deterministic gate that must stay green — and a meta-harness proving each check is load-bearing (92/92 killed).
Every fire shown across this page is a structural observation — recorded, never filed as a bug or published. DeepInit is report-only and 100% local; it writes context and flags problems, it does not build a graph for you to explore or run your code.
Point DeepInit at it and get a grounded map in minutes: the architecture, the components, and the non-obvious rules a new engineer (or agent) would trip over — every claim cited to a file:line you can open.
Even code your agent helped write drifts — rules accrete, modules start sharing state, the live schema moves, and a single hand-written CLAUDE.md goes stale. DeepInit keeps a grounded, current CLAUDE.md and refreshes only what changed (/deep-init-update / /deep-init-check) — so the agent works from what the code is now, not what it was when you last wrote it down.
The lean CLAUDE.md (and the AGENTS.md / Copilot / Cursor / Windsurf projections) give the agent the load-bearing context up front — so it stops guessing the conventions and architecture on every task.
Before you move things, see the key invariants, the boundary rules, and the hidden couplings (the same drift / contradiction / circular-dependency families it reports) — so the refactor doesn't quietly break an unwritten rule.
/init? No. /init writes a quick starter file from a one-pass read. DeepInit parses your code first (real AST via Graphify, with a graceful fallback), grounds every claim to a verified file:line, separates a lean always-loaded tier from a deep on-demand one, and reports the problems it finds. It is also regression-tested, so it doesn't quietly drift as the model changes. The comparison table above lays out the difference column by column.
No. It is report-only. It writes documentation (a CLAUDE.md owned region and an .ai/ folder) and never edits your source. It writes a .bak before touching any existing file and preserves human-authored content byte-for-byte.
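As an illustration of the back-up-then-write behavior described here, the following Python sketch copies an existing file to a `.bak` sibling and replaces only a marked region, leaving every other byte alone. The marker strings and both function names are hypothetical; the actual markers DeepInit uses are not given in this text.

```python
import shutil
from pathlib import Path

# Hypothetical marker strings, chosen for illustration only.
BEGIN = "<!-- deepinit:begin -->"
END = "<!-- deepinit:end -->"


def replace_owned_region(existing: str, generated: str) -> str:
    """Return `existing` with only the text between the markers replaced."""
    block = f"{BEGIN}\n{generated.rstrip()}\n{END}"
    start = existing.find(BEGIN)
    end = existing.find(END, start + len(BEGIN)) if start != -1 else -1
    if start == -1 or end == -1:
        # No owned region yet: append one and leave the existing text as-is.
        sep = "" if existing == "" or existing.endswith("\n") else "\n"
        return existing + sep + block + "\n"
    return existing[:start] + block + existing[end + len(END):]


def write_with_backup(path: Path, generated: str) -> None:
    """Copy an existing file to <name>.bak, then rewrite the owned region."""
    existing = ""
    if path.exists():
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        # Read and write raw bytes so line endings outside the region survive.
        existing = path.read_bytes().decode("utf-8")
    path.write_bytes(replace_owned_region(existing, generated).encode("utf-8"))
```

Working on bytes rather than text mode avoids newline translation, which is what would otherwise break a byte-for-byte guarantee on files with Windows line endings.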
No — it's 100% local. The skill declares no network tool; there is no egress path (we gate that as a test). Parsing and analysis run on your machine in your existing agent session. Secrets and PII are redacted before anything is written, and the report it generates opens offline with zero network calls.
It runs in your own agent session, so the cost is the model tokens of one analysis pass — no subscription, no API key for the parser. A small repo is an inexpensive single pass; a large one costs more. We're finishing a clean per-tier benchmark before publishing a dollar figure rather than inventing one.
DeepInit ships as a Claude Code plugin. To be clear, “plugin” and “skill” aren’t alternatives — the plugin is just the delivery package, and the /deep-init skill that does the work lives inside it. Installing the plugin is how you get the skill; after that, you run /deep-init in any project. No subscriptions, no API keys, no servers.
/plugin marketplace add deepfusionlabs/deep-init
/plugin install deep-init@deepfusionlabs-deep-init
These are slash commands you type into the Claude Code chat (not your terminal). They install from DeepFusion Labs’ plugin marketplace on GitHub.
Claude Code reads plugin commands only at startup, so run /reload-plugins (or start a new session). A window reload won’t pick it up. Running Claude Code inside VS Code or JetBrains? Restart the IDE itself — reloading the editor window isn’t enough.
/deep-init # zero config — the full, thorough analysis (2 review cycles). The whole getting-started.
/deep-init-upgrade # pulls the newest version on one confirm, then guides the reload
Requirements: Claude Code. DeepInit is free and MIT-licensed; its public repo is coming soon — once it’s live, the two commands above work for anyone, with no access step or sign-up.
If you already have a CLAUDE.md (or AGENTS.md), DeepInit backs it up first and preserves anything inside your keep-markers, so your hand-written notes survive.
A bare /deep-init is the zero-friction default. Beyond it, a curated menu of slash commands, with no flags to memorize and nothing to mistype:
| You want to… | What it does | Command |
|---|---|---|
| The full run (default) | Thorough — 2 adversarial review cycles | /deep-init |
| Maximum scrutiny | 3 adversarial review cycles — for code you can’t get wrong | /deep-init-aggressive |
| A quick first pass | Skips the review cycles — faster and cheaper | /deep-init-fast |
| Refresh only what changed | Incremental re-analysis of the touched components + issue lifecycle | /deep-init-update |
| Check it’s still true | 0-token staleness + broken-citation audit (no model call) | /deep-init-check |
| Tune the run with buttons | Depth · issues · outputs · scope · cost — a native picker, no typing | /deep-init-customize |
| Translate the report | Emit report.<lang>.html in 8 languages (English stays canonical) | /deep-init-translate |
| Preflight (0 tokens) | Tools, scope, resolved config, families, cost estimate | /deep-init-doctor |
| Which version is running | Loaded vs on-disk — tells you if you need to reload | /deep-init-version |
| Update the plugin | Pull the newest from the marketplace, on one confirm | /deep-init-upgrade |
| Every command + option | Grouped and ordered by how often you’ll use it | /deep-init-help |
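The /deep-init-check row describes an audit that needs no model call. A minimal sketch of the broken-citation half could look like the Python below: it scans a document for `path:line` citations and confirms that each file exists and is long enough. The regular expression and the `broken_citations` function are assumptions made for illustration, and the pattern is loose enough to misread things like `host.name:8080` as a citation.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Assumed citation shape: a relative path with an extension, a colon, a line number.
CITATION = re.compile(r"(?P<path>[\w./-]+\.\w+):(?P<line>\d+)")


def broken_citations(doc_text: str, repo_root: Path) -> List[Tuple[str, int, str]]:
    """Return (path, line, reason) for every citation that no longer resolves."""
    broken: List[Tuple[str, int, str]] = []
    for match in CITATION.finditer(doc_text):
        rel, line = match.group("path"), int(match.group("line"))
        target = repo_root / rel
        if not target.is_file():
            broken.append((rel, line, "file not found"))
            continue
        with target.open("rb") as fh:
            n_lines = sum(1 for _ in fh)
        if line < 1 or line > n_lines:
            broken.append((rel, line, f"line out of range (file has {n_lines} lines)"))
    return broken
```

Because this is plain file I/O and pattern matching, it can run on every commit at no token cost. It only shows that a cited line still exists, not that the claim about it is still true.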
Type-safe by design — no flags to memorize. Each option lives where it costs you least: a command for the common dials, a button picker (/deep-init-customize) for the rest, and a JSON-Schema-validated .ai/deepinit.config your editor autocompletes and checks before you run. The literal flags and natural language still work for power users and CI.
It even knows its own version. Claude Code loads a plugin’s commands once per session, so after an update you can’t normally tell what’s actually running — /deep-init-version compares the loaded version against what’s on disk and tells you if you need to reload, and /deep-init-upgrade pulls the newest in one confirm. Most plugins can’t tell you what’s live.
Zero setup: DeepInit checks for its one dependency (scc) and installs it for you if it’s missing. Graphify gives richer parsing and installs the same way — optional; if you skip it during setup, DeepInit falls back to ctags/grep.
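The fallback order described here can be sketched as a simple probe of what is on the PATH. This is an illustration only: the executable names `graphify` and `ctags`, and both function names, are assumptions rather than DeepInit's actual detection logic.

```python
import shutil
from typing import Callable, List, Optional

# A lookup that returns the tool's path, or None when it is not installed.
Which = Callable[[str], Optional[str]]


def missing_required(which: Which = shutil.which) -> List[str]:
    """Return the required tools that are not installed (here, only scc)."""
    return [tool for tool in ("scc",) if which(tool) is None]


def choose_parser(which: Which = shutil.which) -> str:
    """Prefer the richer parser, then fall back to ctags, then to grep."""
    # Executable names are assumed for illustration.
    if which("graphify"):
        return "graphify"
    if which("ctags"):
        return "ctags"
    return "grep"
```

Passing the lookup in as a parameter keeps the selection logic testable without touching the real PATH.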
DeepInit by Deep Fusion Labs is free and MIT-licensed, built with a research-first process — every choice traced to evidence and checked before a line was written.
No lock-in: no proprietary format, nothing to escape from — DeepInit writes plain
markdown into the files your agent already reads. For Claude Code that's CLAUDE.md
(loaded natively — the grounded replacement for /init); for the other tools it
emits the open AGENTS.md standard (Agentic AI Foundation, under the Linux
Foundation; already used by 60,000+ repositories) plus per-tool rule files (.cursorrules, .github/copilot-instructions.md, .windsurf/rules/). Generate once and any agent —
Claude Code, Codex, Cursor, GitHub Copilot, Google Jules — can use it; remove DeepInit and
your context files still work. Claude Code gets first-class support as a native skill. And it all runs
100% locally — no server to keep running — reading only your code, with a secret/PII
redaction gate that scrubs detected secrets before anything is written.
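A redaction gate of this kind can be sketched as a pass that runs over text before it is written out. The three patterns below are illustrative examples and not DeepInit's rule set; the `redact` function and the `[REDACTED:...]` placeholder format are likewise hypothetical.

```python
import re
from typing import Dict, Tuple

# Illustrative patterns only; a real gate would need a much broader set.
PATTERNS = [
    ("aws-access-key-id", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    (
        "private-key",
        re.compile(
            r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
            re.S,
        ),
    ),
    ("email", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
]


def redact(text: str) -> Tuple[str, Dict[str, int]]:
    """Replace each match with a labeled placeholder and count hits per label."""
    counts: Dict[str, int] = {}
    for label, pattern in PATTERNS:
        text, n = pattern.subn(f"[REDACTED:{label}]", text)
        if n:
            counts[label] = n
    return text, counts
```

Returning the counts alongside the scrubbed text lets the caller report that something was removed without ever echoing the secret itself.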