Deep Fusion Labs · Open source

Your coding agent can read every line of your code, and still breaks rules that were never written down anywhere.

DeepInit — grounded, verified code-truth for your coding agent

DeepInit writes down the truth about your codebase — the real rules, the live database schema, the why, and the problems it finds — every claim checked against your code before it's written.

One command kicks off a seven-wave analysis: DeepInit reads your whole codebase — deep into each component, then across the system, in parallel — grounds every claim to a file:line and checks it against your code before writing the lean two-tier brief your agent loads. So your agent stops guessing — and DeepInit flags the real problems it finds along the way. Local, MIT.

Writes the verified truth: your rules, the live database schema, the why, all checked.
Flags the problems: measured, not noise; grounded & report-only.
Tested like production software: 331-check test harness · 0/22 false positives · not a weekend skill.

Free & open source (MIT) · 100% local · Works with Claude Code, Cursor & more

The problem

Your agent can see every file — and still misses what matters.

Point an agent at a large codebase it's never seen — half a million lines, years of decisions — and it'll rewrite a function with total confidence, then break a business rule that was never written down.

It treats a critical business rule like a style preference, and quietly breaks it.
It trusts your schema file when the live database has drifted — and writes against a column that's already gone.
It changes one service and breaks another through a shared table it never knew they shared.
It rebuilds something that already exists three modules over — nothing told it the code was there.
It "cleans up" a workaround that was load-bearing — the ugly code was holding something together.
It contradicts a decision your team made months ago — one it had no way to know about.
It renames a function and leaves broken references scattered across files it never thought to check.
It pulls in a new dependency for something your codebase already does its own way.
It calls the deprecated function when a newer one exists — the old one was still in the tree.

And the obvious fix backfires: piling everything into a "document everything" file (think a giant CLAUDE.md or AGENTS.md) makes agents do worse, not better. A 2026 ETH Zurich study measured lower task success and 20%+ more cost per task, because the few things that matter get buried.

What you get

Lean, verified context — and the problems it finds, in the same pass.

DeepInit writes two tiers from one engine. The first is a lean, always-loaded front-door file: CLAUDE.md for Claude Code (which loads it natively), or the open AGENTS.md standard plus per-tool rules for Cursor, Copilot and Windsurf (.cursorrules, .github/copilot-instructions.md, .windsurf/rules/). It is kept small on purpose so the few things that matter aren't buried. The second is a deep .ai/docs/ layer your agent pulls on demand. The real problems are flagged in the same pass.

Both tiers obey the same rule: every claim cites a file:line, gets verified against your code before it's written, and is trimmed to only what your agent couldn't work out on its own. Nothing fabricated, nothing restated, nothing it can't point at.

Every line meets one test — it’s something your agent can’t already figure out for itself.

How it works

Seven waves. Parallel. Adversarial. Thorough.

It comes down to three moves — parse, understand, verify. Most tools hand your whole repo to an LLM and hope; DeepInit parses your code first (real AST parsing via Graphify, 25 languages), reasons about meaning on top, then verifies every finding against the code before it's written. Under the hood that's seven waves: parallel subagents analyze each component and then the patterns across them, before findings are cross-checked and adversarially reviewed.

1. Preflight (no tokens)

Auto-detect tools, estimate the token cost up front, check the database, get permissions. Nothing runs until you approve it.

2. Discovery (no tokens)

Scan the tree, detect the stack, build the structural graph, order components by dependency, read git history. All of it is deterministic and happens before any model runs.

3. Vertical analysis (parallel · heaviest)

Deep into each component, leaves first. Parallel subagents pull business rules, workflows and integration points, each grounded to file:line.

4. Horizontal analysis (parallel)

Across all components, for the patterns no single-component pass can see: shared tables, end-to-end workflows, bounded contexts.

5. Cross-references

Unify it: entity ↔ component ↔ table maps, rule-to-workflow links, coverage-gap detection across the whole set.

6. Filter, verify & review (the gates)

Drop anything inferable; every surviving claim must resolve to real code; a critic agent challenges the findings (0–3 cycles).

7. Generation

Emit the two tiers (lean root + deep .ai/docs/) plus a Claude Code skill package. Backup first; re-check every claim against the code.
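To make "order components by dependency" and "leaves first" concrete, here is a minimal sketch of a deterministic dependency ordering. The component names and the `DEPENDS_ON` mapping are hypothetical, and this is an illustration of the idea under those assumptions, not DeepInit's actual implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical component graph: each key depends on the components in its set.
# The names are illustrative only, not DeepInit's real data model.
DEPENDS_ON = {
    "orders": {"billing", "auth"},
    "billing": {"auth"},
    "auth": set(),
}


def leaves_first(depends_on: dict[str, set[str]]) -> list[str]:
    """Return components so that every dependency precedes its dependents."""
    # TopologicalSorter takes {node: predecessors}. static_order() yields
    # predecessors first and raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(depends_on).static_order())


order = leaves_first(DEPENDS_ON)
print(order)  # ['auth', 'billing', 'orders']
```

Ordering this way means a component is analyzed only after everything it relies on, and no model is needed to compute it.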

Inside waves 3 & 4: go deep, then go wide

Wave 3 (vertical): deep into each component. auth/, billing/ and orders/ are each analyzed for business rules, DB schema, workflows, and the why.

Wave 4 (horizontal): patterns across auth/, billing/ and orders/. Shared tables · end-to-end workflows · bounded contexts · domain rules.

Your database, and what stays private

DeepInit reads your live database schema read-only. SQL is supported in this release (Postgres, MySQL, SQLite): it compares the live schema against your code and flags the drift. NoSQL stores (Mongo, Redis and others) are stub-level for now, and are labeled that way rather than implied as done. A secret/PII redaction gate scrubs anything sensitive before a single file is written to disk. (A cross-model verification pass, in which a second model double-checks findings, is on the roadmap, not shipped yet.)
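As a rough illustration of what "compare the live schema against your code" can mean, the sketch below reads column names from a live SQLite database and diffs them against the columns the code is known to reference. It assumes SQLite only because it ships with Python; the `code_reads` set stands in for whatever a real extractor would find, and every name here is hypothetical.

```python
import sqlite3


def live_columns(conn: sqlite3.Connection, table: str) -> set[str]:
    """Read the column names of `table` from the live database."""
    # PRAGMA table_info returns one row per column; index 1 holds the name.
    # The table name is interpolated, so it must come from a trusted source.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1] for row in rows}


def drift(live: set[str], referenced: set[str]) -> dict[str, set[str]]:
    """Split the mismatch between live columns and code references."""
    return {
        "read_by_code_but_gone": referenced - live,
        "live_but_unreferenced": live - referenced,
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)")

# Assumption: columns that some code scan found being read from `orders`.
code_reads = {"id", "status", "legacy_status"}

report = drift(live_columns(conn, "orders"), code_reads)
print(report["read_by_code_but_gone"])  # {'legacy_status'}
```

On Postgres or MySQL the same comparison would read column names from `information_schema.columns` instead of a PRAGMA.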

It tells you the cost before it spends a token. A preflight shows the estimate; nothing runs until you approve it, under a ceiling you set. After the first run, /deep-init-update re-reads only what changed and /deep-init-check costs zero tokens — so you're never surprised by the bill.

The philosophy: tokens are fuel, not a constraint.

DeepInit burns tokens generously on the first pass — parsing, inspecting your database, cross-checking and verifying every finding — because one thorough analysis is worth far more than a cheap guess your agent will trust for months. Quality is the default, not an upsell.

And counterintuitively, it costs you less.

Two ways. At runtime: your agent reads a lean, ready-made brief instead of burning tokens re-exploring the codebase from scratch every session — and a tight file is cheaper per task than a bloated one (the same ETH study measured 20%+ more cost from oversized context).

Over time: you pay for the deep analysis once, then /deep-init-update refreshes only the diffs and /deep-init-check is free — instead of re-deriving the whole picture every session, on every machine, for every teammate. Spend on depth once; coast for months.

Run it once — then it stays current, proportional to the diff.

After the first run you never re-pay for the whole repo. An edit re-documents only its blast radius — the components you touched, plus anything whose public interface actually moved. Here is exactly how /deep-init-update stays proportional to the change:

1. Detect (0 tokens · deterministic)

A content_hash per component, compared against the stored manifest by an authoritative symmetric set-diff (stored + current keys). git diff and the commit breadcrumb are only accelerators, so a deleted module, or a repo with no git history, is still caught.

2. Mark the dirty set

Just the components whose content actually changed. Nothing else is touched.

3. Skip safely with the interface-hash test (the cost saver)

Recompute each dirty component’s public-surface hash. A body-only refactor re-analyzes that one component but skips its dependents (their view is provably unchanged); only a changed export marks the transitive dependents dirty.

4. Always re-run the whole-system docs (the safety net)

The five cross-cutting docs re-run regardless of what changed, because a cross-component effect (a new circular dependency, a shifted end-to-end workflow) is invisible from any single component’s diff. This is what makes the step-3 skip a reversible optimization, not a correctness risk.

5. Re-emit only the affected files

Filter, redact, re-verify every citation, then write only the changed files, inside the owned-region markers, with a dated reversible backup. Issues are diffed against a symbol-keyed baseline (new / persisting / resolved / regressed), so a line-shift never re-churns a finding.
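The interface-hash test in step 3 can be sketched as follows. This is a simplified model under stated assumptions: each component carries a hypothetical `(content_hash, interface_hash)` pair, and `dependents` maps a component to the components that import it directly. It is not the shipped algorithm, only the shape of the rule.

```python
def dirty_set(stored, current, dependents):
    """Return the components to re-analyze.

    stored / current: {component: (content_hash, interface_hash)}
    dependents: {component: components that import it directly}
    All three structures are illustrative assumptions.
    """
    # Step 2: components whose content actually changed.
    changed = {
        name for name in current
        if name in stored and current[name][0] != stored[name][0]
    }
    # Step 3: only a moved public surface propagates to dependents.
    moved = {name for name in changed if current[name][1] != stored[name][1]}

    # Walk transitive dependents of every component whose interface moved.
    seen = set(moved)
    stack = list(moved)
    while stack:
        name = stack.pop()
        for dependent in dependents.get(name, ()):
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    return changed | seen


stored = {"auth": ("c1", "i1"), "billing": ("c2", "i2"), "orders": ("c3", "i3")}
dependents = {"auth": {"billing"}, "billing": {"orders"}}

body_only = dict(stored, auth=("c1-new", "i1"))          # refactor inside auth
export_change = dict(stored, auth=("c1-new", "i1-new"))  # auth's exports moved

print(sorted(dirty_set(stored, body_only, dependents)))      # ['auth']
print(sorted(dirty_set(stored, export_change, dependents)))  # ['auth', 'billing', 'orders']
```

Added and removed components are outside this sketch; they fall out of the symmetric set-diff in step 1.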

Two things it therefore guarantees — the two ways docs usually rot, closed:

The guarantee

A real interface change can’t silently skip a dependent

Even on the grep path (no precise parser present), DeepInit reconciles the public surface it captured against export-indicator tokens — an unusual form (an export *, a CommonJS module.exports, a dynamic __all__) marks the surface “incomplete” and conservatively re-checks dependents. A breaking change never quietly skips the code that needed it.
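One way to picture the "incomplete surface" rule is a scan for export forms that a plain text search cannot enumerate. The patterns below are assumptions chosen to match the three examples named above (an `export *`, a CommonJS `module.exports`, a computed `__all__`); a real implementation would likely cover more forms.

```python
import re

# Hypothetical indicator patterns for exports a simple scan cannot enumerate.
OPAQUE_EXPORTS = [
    re.compile(r"^\s*export\s+\*", re.MULTILINE),  # ES re-export of everything
    re.compile(r"\bmodule\.exports\b"),            # CommonJS export object
    # __all__ that is extended or computed rather than a literal list/tuple
    re.compile(r"^\s*__all__\s*(\+=|=\s*[^\[\(\s])", re.MULTILINE),
]


def surface_is_complete(source: str) -> bool:
    """True when no opaque export form appears in the source text."""
    return not any(pattern.search(source) for pattern in OPAQUE_EXPORTS)


def must_recheck_dependents(source: str, old_hash: str, new_hash: str) -> bool:
    """Re-check on a changed surface hash, or whenever the surface is uncertain."""
    return old_hash != new_hash or not surface_is_complete(source)


print(surface_is_complete("export function charge() {}"))              # True
print(surface_is_complete("export * from './internal';"))              # False
print(must_recheck_dependents("module.exports = api;", "h1", "h1"))    # True
```

The design choice is to err toward extra work: an unchanged hash is only trusted when the captured surface is known to be complete.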

The guarantee

A removed file never leaves an orphaned doc

Because detection is a symmetric set-diff (stored vs. current), a deleted or moved component is caught and its docs archived — even with no git history, or a shallow / rewritten ref. No stale page outlives the code it described.
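A symmetric set-diff is simple enough to show in a few lines. The manifest shape here (`{component_path: content_hash}`) is an assumption for illustration; the point is that the diff walks the union of stored and current keys, so a deletion shows up without any help from git.

```python
def diff_manifest(stored: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {component: content_hash} manifests over the union of keys."""
    keys = stored.keys() | current.keys()
    return {
        "added": sorted(k for k in keys if k not in stored),
        "removed": sorted(k for k in keys if k not in current),
        "changed": sorted(
            k for k in keys
            if k in stored and k in current and stored[k] != current[k]
        ),
    }


# Hypothetical manifests: billing was edited, legacy was deleted, search is new.
stored = {"auth": "a1", "billing": "b1", "legacy": "l1"}
current = {"auth": "a1", "billing": "b2", "search": "s1"}

print(diff_manifest(stored, current))
# {'added': ['search'], 'removed': ['legacy'], 'changed': ['billing']}
```

Anything in `removed` is a component whose docs would need archiving.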

And it tells you — for free — the moment the docs fall behind.

Two plugin-shipped hooks do the surfacing, both calling the same 0-token, no-LLM status script (the only kind of check a git hook can actually run): a post-commit nudge that prints how far the docs have drifted, and a SessionStart offer — open a session while the docs are behind and DeepInit offers a one-click refresh. Beyond those, a real headless auto-refresh exists but is off by default, because it’s the only level that spends tokens. None of them ever auto-commits — you always review the diff.

The honest part: a git hook can’t summon an AI session, so DeepInit doesn’t pretend your docs silently regenerate on every commit — the free nudge and the session-start offer are the real, visible surface.

The problems layer

The problems fall out as a byproduct — grounded, ranked, report-only.

Because DeepInit already extracts your rules, your schema, and the why, the problems surface as a byproduct — now across about ten issue families plus a class-conformance census, every finding grounded to the line, framed as “likely” rather than asserted, and report-only — it never touches your source. Anything a linter already catches is suppressed, because false positives are what kill tools like this. Five of the kinds it looks for:

1. Your database has drifted from your code

Code that still reads a column the live database no longer has. Schema-diff tools check the DB against a declared schema; this checks it against your actual code, even on legacy code with no clean schema.

2. The code contradicts a decision you made

Code that breaks a decision you recorded, or a "temporary" hack that's quietly load-bearing. Both are only visible when the documented why is on hand.

3. A rule is enforced in some places, not others

A business rule applied on one path but missing on another that writes the same data. Access-control gaps are included (surfaced as rule violations, not a security claim).

4. Two components are secretly coupled

Parts quietly sharing a table with no interface between them: change one, break the other.

5. Where to look first

Everything ranked by what to fix first: how often it changes, how few people understand it, how critical it is, how thin the tests are.

The goal — a short, trustworthy list worth a human's attention, not a wall of warnings. Findings are framed as likely and grounded to a file:line rather than proven, and this isn't a security product.

One report you can read — and act on.

One report — Docs + Insights

A single self-contained, offline report.html: the browsable docs (search, a component tree, an architecture overview, a decisions timeline, jump-to-file:line) and the issue/metrics dashboard, merged into one file with a ⌘K palette. Vanilla JS, no network. (It supersedes the old docs-viewer.html + dashboard.html, now redirect stubs.)

Read it in your language

One command emits report.<lang>.html in 8 languages (Spanish, Chinese, Portuguese, Russian, Japanese, German, French, Hebrew — full RTL), with an in-app switcher. English stays the canonical analysis; grounded tokens (file:line, code, record IDs) are masked and verified, so a translation can never corrupt a grounded claim — and any miss falls back to English, never a fabricated translation.
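The mask-and-verify idea can be sketched like this: grounded tokens are swapped for placeholders before translation, and the result is accepted only if every placeholder survives exactly once. The regular expression, the `[[n]]` placeholder format and the `translate` callable are all assumptions made for this example.

```python
import re

# Assumed token shapes: inline code, record IDs like [BR-billing:003],
# and file:line citations like src/billing/invoice.ts:142.
GROUNDED = re.compile(r"`[^`]+`|\[[A-Z]{2,3}-[\w:-]+\]|[\w./-]+\.\w+:\d+")
PLACEHOLDER = re.compile(r"\[\[(\d+)\]\]")


def mask(text: str) -> tuple[str, list[str]]:
    """Replace each grounded token with a numbered placeholder."""
    tokens: list[str] = []

    def stash(match: re.Match) -> str:
        tokens.append(match.group(0))
        return f"[[{len(tokens) - 1}]]"

    return GROUNDED.sub(stash, text), tokens


def translate_safely(english: str, translate) -> str:
    """Translate around the grounded tokens; fall back to English on any miss."""
    masked, tokens = mask(english)
    candidate = translate(masked)
    found = sorted(int(i) for i in PLACEHOLDER.findall(candidate))
    if found != list(range(len(tokens))):
        return english  # a placeholder was lost, duplicated, or invented
    return PLACEHOLDER.sub(lambda m: tokens[int(m.group(1))], candidate)


source = "Rule [BR-billing:003] comes from src/billing/invoice.ts:142"
good = lambda s: s.replace("Rule", "Regla").replace("comes from", "viene de")
bad = lambda s: "Regla viene de"  # drops both placeholders

print(translate_safely(source, good))  # Regla [BR-billing:003] viene de src/billing/invoice.ts:142
print(translate_safely(source, bad))   # falls back to the English source
```

Because the translator never sees the tokens themselves, it has no chance to alter a path, a line number, or an ID.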

Risk, ranked — with the graph

The Insights view shows real composite risk (severity × criticality × churn × bus-factor × coverage) when the repo has the signals — and honestly says “unavailable” when it doesn’t, never a fake zero. Beside it, a static component-dependency graph read from the structural graph DeepInit already computes. (An interactive explorer — DeepMap — is coming.)
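The "unavailable, never a fake zero" rule is easy to express in code. The sketch below assumes each of the five signals is already normalized to a 0..1 factor where higher means riskier; that normalization, and the signal key names, are assumptions rather than anything the report format guarantees.

```python
from math import prod
from typing import Optional

# Assumed signal names, following the formula in the text.
SIGNALS = ("severity", "criticality", "churn", "bus_factor", "coverage")


def composite_risk(signals: dict[str, Optional[float]]) -> Optional[float]:
    """Multiply the five factors, or return None when any signal is missing."""
    values = [signals.get(name) for name in SIGNALS]
    if any(value is None for value in values):
        return None  # rendered as "unavailable", not as 0
    return prod(values)


full = {name: 0.5 for name in SIGNALS}
no_git = dict(full, churn=None)  # e.g. a repo with no history to measure churn

print(composite_risk(full))    # 0.03125
print(composite_risk(no_git))  # None
```

Returning `None` keeps a missing signal distinguishable from a genuinely low score, which a zero would not.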

Shows up in your tools

Findings appear in GitHub code scanning and your IDE alongside everything else — via SARIF v2.1.0, the standard format those tools already read. No custom integration.
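For readers who have not seen SARIF, here is a minimal sketch of a v2.1.0 log holding one finding. The finding dictionary and its rule ID are hypothetical, and this shows only the standard's basic shape (a versioned log, runs, a tool driver, results with physical locations), not DeepInit's exact output.

```python
import json


def to_sarif(findings: list[dict]) -> dict:
    """Build a minimal SARIF v2.1.0 log from simple finding dicts."""
    rule_ids = sorted({f["rule_id"] for f in findings})
    return {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {
                "name": "DeepInit",
                "rules": [{"id": rule_id} for rule_id in rule_ids],
            }},
            "results": [{
                "ruleId": f["rule_id"],
                "level": "warning",
                "message": {"text": f["message"]},
                "locations": [{"physicalLocation": {
                    "artifactLocation": {"uri": f["file"]},
                    "region": {"startLine": f["line"]},
                }}],
            } for f in findings],
        }],
    }


# Hypothetical finding; the rule ID is invented for illustration.
finding = {
    "rule_id": "schema-drift",
    "message": "Likely: code reads orders.legacy_status, which the schema dropped.",
    "file": "src/orders/order.ts",
    "line": 24,
}

log = to_sarif([finding])
print(json.dumps(log, indent=2))
```

A file in this shape is what code-scanning tools ingest, which is why no custom integration is needed.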

See an example

What it actually writes down.

## Business rules — billing  (vertical — per component)
[BR-billing:003] CORE — An invoice can't be voided once its payment has settled;
            void attempts must go through the refund flow instead.
            from  src/billing/invoice.ts:142          ✓ checked · HIGH

## Database vs. code — orders  (vertical)
  orders.legacy_status (text) is still read by the reporting job, but your
            Prisma schema dropped it — an agent trusting the schema will miss it.
            from  prisma/schema.prisma:88  ↔  src/orders/order.ts:24     ✓ checked

## Use case — cross-component  (horizontal — across components)
[UC-014] Checkout → charge → fulfil: orders.create() calls billing.charge();
            on success it emits `paid` → fulfilment.start(). A failure *after* the charge
            must call billing.refund() — orders can't roll the payment back itself.
            spans  orders/ · billing/ · fulfilment/     ✓ checked

How every “✓ checked” is earned

Grounded + verified isn't a slogan — it's a pipeline stage. Before a single claim reaches your context file, it walks this chain. If a citation doesn't resolve, the claim is dropped from the lean tier, never silently kept.

1 · The claim
“An invoice can't be voided once payment has settled.”
2 · Grounded
tied to src/billing/invoice.ts:142
3 · Exists · 0 tokens
file + line in range, symbol still there — a deterministic check
4 · Plausible
the cited line actually supports the claim
5 · Written
only now does it reach CLAUDE.md

Verified means the citation exists and is plausible — not that the claim is provably correct. A confidently-wrong claim is worse than a gap, so DeepInit prefers omission over a guess.
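Step 3 of that chain, the deterministic existence check, might look like the sketch below. It assumes a citation of the form `path:line` and treats "symbol still there" as the symbol name appearing on the cited line; both are simplifying assumptions, and the function name is hypothetical.

```python
import tempfile
from pathlib import Path


def citation_exists(repo_root: Path, citation: str, symbol: str) -> bool:
    """Check that `path:line` resolves and the symbol appears on that line."""
    path_text, _, line_text = citation.rpartition(":")
    if not path_text or not line_text.isdigit():
        return False
    target = repo_root / path_text
    if not target.is_file():
        return False
    lines = target.read_text(encoding="utf-8", errors="replace").splitlines()
    line_no = int(line_text)
    if not 1 <= line_no <= len(lines):
        return False
    return symbol in lines[line_no - 1]


# Small demonstration against a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "invoice.ts").write_text("export function voidInvoice() {}\n", encoding="utf-8")
    resolved = citation_exists(root, "invoice.ts:1", "voidInvoice")
    out_of_range = citation_exists(root, "invoice.ts:9", "voidInvoice")
    missing_file = citation_exists(root, "gone.ts:1", "voidInvoice")

print(resolved, out_of_range, missing_file)  # True False False
```

A claim whose citation fails this check would be dropped before the plausibility step ever runs, which is why the check can cost zero tokens.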

Every finding is typed, tagged by importance and confidence, points to the exact file:line, and is checked against your code before it's written. Findings are traceable by ID:

BR business rules · DR domain rules · WF workflows · IP integration points · UC use cases · US user stories · WA workarounds · ADR decisions · KL knowledge log
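If you want to pull those IDs out of the generated files yourself, a small parser is enough. The pattern below is inferred from the two IDs shown in the example (`[BR-billing:003]` and `[UC-014]`), so treat the exact grammar as an assumption.

```python
import re
from typing import Optional

# Assumed grammar: [KIND-scope:NNN] or [KIND-NNN], with KIND from the legend.
FINDING_ID = re.compile(
    r"\[(?P<kind>BR|DR|WF|IP|UC|US|WA|ADR|KL)-(?:(?P<scope>[\w-]+):)?(?P<number>\d+)\]"
)


def parse_finding_id(text: str) -> Optional[dict]:
    """Return kind, scope and number for the first finding ID in the text."""
    match = FINDING_ID.search(text)
    if match is None:
        return None
    return {
        "kind": match.group("kind"),
        "scope": match.group("scope"),  # None for system-wide IDs
        "number": int(match.group("number")),
    }


print(parse_finding_id("[BR-billing:003] CORE"))
# {'kind': 'BR', 'scope': 'billing', 'number': 3}
print(parse_finding_id("[UC-014] Checkout"))
# {'kind': 'UC', 'scope': None, 'number': 14}
```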

This example is TypeScript — DeepInit's AST parser reads 25 languages (Rails/Ruby, Python, Go, Rust, Java, C#, PHP, Kotlin and more), with a grep fallback for the few it doesn't (e.g. Crystal, OCaml).

How it compares

Wikis and graphs give you a place to ask. DeepInit puts the answer where your agent already looks.

Wikis, code graphs, and index tools give you a separate place to go ask questions — a system that's only as current as its last crawl. DeepInit writes verified markdown straight into the context files your agent already loads — CLAUDE.md for Claude Code, AGENTS.md for the rest — so the context is just there, in front of the model, on every run. And it carries a second axis nothing else does: it's measured. The same file:line grounding that makes the context trustworthy is what lets it flag problems without crying wolf — a precision discipline we test on every change, not a promise.

Two rows are the whole story — start here, then the full table substantiates the rest.

Live database vs. your code

It spots the drift nobody else looks for.

DeepInit compares your live database schema against what the code actually reads and writes, and flags where they've diverged — a column the schema dropped that a job still reads. Schema-diff tools check the DB against a declared schema; this checks it against your code.

No other tool here does this

Grounded + verified, kept current

Every claim is checked against your code — and stays true.

Each statement cites a real file:line, is verified to exist before it's written, and is re-checked as the code changes. The others snapshot prose and go stale at the next crawl; DeepInit writes verified truth and maintains it.

Others snapshot — DeepInit verifies & maintains
| What matters | DeepInit | /init (Claude Code) | Starter-file generators | Understand-Anything | GitNexus | Google Code Wiki | DeepWiki |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Approach | Analyze once → write | Quick scan | Scaffold a stub | Graph to explore | Graph to query | Wiki to ask | Wiki to ask |
| License & cost | Free · MIT | Built in | Free / open source | Free · MIT | Paid for commercial use | Free public · cloud | Paid for private · cloud |
| Runs | On your machine | In the agent | Local | Local | Local | In the cloud | In the cloud |
| What it produces | Context files your agent reads | One context file | Context files | A graph + dashboard | A code graph | A hosted wiki | A hosted wiki |
| How it reads your code | Real parsing + AI, checked | Quick AI read | File scan + AI | Parsing + AI | Code graph | AI over the repo | AI over the repo |
Your business rules: ✓ written & ranked; ~ a domain view; ~ in prose; ~ in prose
The "why" behind the code: ~ inferred; ~ inferred
Database vs. your code: ✓ spots the drift
How features cross the code: ✓ traced; ~ dependencies; ~ dependencies; ~ in prose; ~ in prose
Keeps only what helps: — no filter; n/a; n/a
Traceable to the exact file & line: ✓ every finding; ~; ~; ~ links; ~ links
Checked against your code
Measured precision (false-positive rate): ✓ 0/22 on real bugfixes
Flags risky / single-owner code
Small file + depth on demand: — one file; ~ tiered; n/a; n/a; n/a; n/a
Works with your agent: ✓ standard files · Claude Code first-class; — its own agent; ✓ standard files; ✓ many; ~ some editors; ~ web / MCP; ~ web / MCP
Cheap to keep updated: ✓ only what changed; ~; ~; ~ auto (cloud); ~ auto (cloud)
Stays on your machine: via the agent; — cloud; — cloud

✓ = does it · ~ = partial / adjacent · — = doesn't · "n/a" = not that kind of tool.

A closer look at the main alternatives

/init (Claude Code)
Claude Code · similar in Codex
Built into the agent
The zero-effort starting point: a quick structural overview — stack, layout, a few conventions. Shallow by design. No business rules, no database, no rationale, nothing verified.
DeepInit: the depth upgrade for the same files — rules with criticality, live DB drift, the why — checked against the code.
Understand-Anything
multi-agent · knowledge-graph.json + dashboard
MIT · multi-platform
A good tool: a multi-agent pipeline turns a repo into a portable knowledge graph and a dashboard you explore. Built for exploring structure, not briefing an agent up front. A graph, not a context file.
DeepInit: writes compiled, verified meaning into the files the agent already loads — instead of a graph to query. Complementary, not competing.
Google Code Wiki
Gemini · scans after every commit
Google's self-updating wiki: structured docs, diagrams, and a chat that points to line numbers. Excellent for human onboarding. A hosted, human-facing wiki — agent context isn't the goal.
DeepInit: 100% local and agent-facing — files the agent loads, not a hosted wiki — with the deep semantic layer, verification, and MIT.
Karpathy's "LLM Wiki"
a pattern · markdown · community code forks
Open pattern · built for documents
Not a product — Andrej Karpathy's popular pattern for turning documents into a self-maintaining markdown wiki an agent builds and you then query; community projects adapt it to code. Like the tools above, the output is a wiki you ask — not context the agent auto-loads.
DeepInit: the same instinct — precompute, don't re-derive — but aimed at a codebase, verified against the code, and written into the files your agent already reads.

Comparisons reflect publicly available information about third-party products as of June 2026. These tools evolve fast — details may be out of date. "Starter-file generators" are lightweight tools (Apify's generator, hcc, Intent and similar) that scaffold a short AGENTS.md — useful, but shallow by design; several deliberately cap themselves at ~20–30 lines. Product names and trademarks belong to their respective owners and are used for identification only; mention does not imply endorsement or affiliation. Something inaccurate? Open an issue and we'll fix it.

That drift check and the verified-to-the-line grounding are the difference. Point it at your repo:

Get started

Evidence & limits

Grounded in research — and honest about the limits.

DeepInit's design is a response to a measured, counterintuitive result: handing a coding agent a big, auto-generated context file makes it perform worse — and costs more to run. So DeepInit does the opposite of "document everything."

20%+

more cost per task — the penalty for bloat. Pile everything into one auto-generated context file and the model burns extra tokens on context it didn't need, for output that comes out worse, not better. The fix isn't more context, it's less: trim to only what the agent can't infer, and the result flips to a measured gain.

ETH Zurich / LogicStar · arXiv:2602.11988 · CC BY 4.0

0 / 22

false positives on real, human-merged bug fixes — it never re-flagged a line a maintainer had already fixed. Across four more real repos, a naive rule-checker would have fired ~90 false alarms; DeepInit fired none.

DeepInit’s own measurement, on real code · How we tested it

0 wrong

Run blind on 8 real repos with their own architecture docs removed, DeepInit never once confidently stated a fact the code disproves. That's the Mirror Test — does it actually understand your architecture? We strip a project's own docs (.NET, Rust, Python, Go), run on the code alone, and grade what it re-derived. Zero confidently-wrong facts — the same hard bar as the precision result above, applied to understanding.

What it re-derived (indicative): ~66% of what the human docs state, at 98% faithfulness — strongest on structure (components, dependencies, data stores), honestly weaker on deep invariants. Is it just memorizing famous repos? No: on two obscure repos a model is very unlikely to have memorized (one Go, one Rust) faithfulness held at 100% — so what it states about your unfamiliar code is just as trustworthy. The coverage number moves with how deep one pass goes; the trustworthiness doesn't.

DeepInit’s own measurement, docs removed · INDICATIVE (8 held-out repos; 2 contamination-resistant shown) · the precision result above stays the headline


The limits

It reads your code — it doesn't run it.

DeepInit analyzes your source and inspects your live database schema (read-only). It doesn't execute your app, so purely runtime behavior — load, race conditions, anything that only surfaces live — is out of scope.

It flags likely problems — it won't prove them.

Findings are framed as likely, grounded to a file:line, with linter-territory suppressed. It is not a security product and makes no safety guarantee — access-control gaps are surfaced as risk to review, not proof your code is safe.

The first run is thorough — not instant.

A full analysis does real work: parsing, database checks, and multiple verification passes. Expect minutes and real tokens the first time, not seconds — after that, updates are incremental and cheap.

Measured precision, an architecture it re-derived from code alone, the limits stated plainly. That's the proof — run it on your own repo:

Get started

How we tested it

Engineered to be right — measured not to cry wolf.

A finding earns trust two ways: it is grounded in your actual code, and it is not noise. DeepInit is built for the first and measured for the second — on its own fixtures, on 22 real human-merged bug fixes, and on real open-source repos.

Right — grounded, verified, demonstrated

Then the harder half — the false-positive rate, because a false alarm is what gets a tool turned off:

These are DeepInit’s own measurements on real code, kept exact. Recall (the false-negative side) on those real bug fixes was 14/22 — labeled indicative, below our own ship-gate, not a headline — and the speedup benefit is still being benchmarked, so this page leaves it blank rather than invent one.

The five layers — and the limit of each

Trust comes from stating what each test proves and what it can’t. In rough order of independence:

  1. A 331-check regression harness across 72 oracle sections (deterministic, no model) — proves the engine’s mechanical logic stays correct on every change: dependency ordering, the cost math, the issue oracles, the census arithmetic, SARIF shape, and the scoring that grades the blind runs below. Doesn’t prove the model-driven findings are right — it grounds and scores fixtures, it doesn’t run the live analysis.
  2. Blind runs on our own fixtures — multiple independent lenses, each blind to the answer key, agree: recall 9/9, false positives 0. Doesn’t prove real-world recall — these fixtures were designed alongside the spec, so 9/9 over-states what you’d see on unseen code. We say so.
  3. An independent oracle on real fixed bugs — 22 human-merged bug fixes across 22 repos and 4 languages (3 of them CVEs). The hard, gated result: it never re-flagged a line a maintainer had already fixed (0/22). Doesn’t prove a headline recall number — recall here was 14/22, small-sample and below our own ship bar, so it stays an indicative sub-line, never a claim.
  4. Naive-vs-guarded precision — on four real repos carrying a documented rule, a naive “mismatch = violation” checker would have fired ~90 false alarms; the guarded detector raised none, every census signal arithmetically correct. Doesn’t prove those repos had bugs — they’re clean; this measures precision and scope-honesty, not recall.
  5. A record of the checks we refused to ship — where a check’s decidable rule couldn’t be told apart from its real-defect rule (it would cry wolf), we measured it, left it off, and wrote down why. Doesn’t prove those problem classes are undetectable — each deferral names exactly what would unblock it.

This is the harness — not the model. Together, those five layers are the harness: the engineered apparatus around the model that makes its understanding of your code trustworthy. It is the part a weekend prompt doesn’t have. A prompt hands you one ungrounded guess; the harness grounds every claim to a file:line, measures its own false-alarm rate, and is regression-tested on every change — so it doesn’t quietly drift as the model moves underneath it.

How we keep it honest — the techniques, not just the count

A big test count is easy to fake. What makes the 331 checks across 72 oracle sections trustworthy is how they’re built — four disciplines borrowed from how you’d test production-grade software, not a prompt:

The answer key is the maintainer’s patch

Metamorphic bug-fix replay. We replay 22 real, human-merged bug fixes (3 of them CVEs): the detector must flag the broken commit and must not re-flag the fixed one. The ground truth predates our spec, so nothing can be over-fitted to it — and re-flagging an already-fixed line came up 0 times (0/22).
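The replay loop itself is small. In the sketch below the detector is a toy string check and the two fix pairs are invented; what it illustrates is the scoring rule, where flagging the broken side counts toward recall and flagging the fixed side counts as a false positive.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FixPair:
    name: str
    broken: str  # source before the maintainer's patch
    fixed: str   # source after the maintainer's patch


def replay(detector: Callable[[str], bool], pairs: list[FixPair]) -> dict[str, int]:
    """Score a detector against pairs whose answer key is the merged fix."""
    return {
        "caught": sum(1 for pair in pairs if detector(pair.broken)),
        "reflagged": sum(1 for pair in pairs if detector(pair.fixed)),
        "total": len(pairs),
    }


# Toy detector (an assumption for the demo): flags a swallowed exception.
def swallowed_error(source: str) -> bool:
    return "except Exception: pass" in source


pairs = [
    FixPair(
        "swallowed-error",
        broken="try: charge()\nexcept Exception: pass",
        fixed="try: charge()\nexcept Exception: log_and_raise()",
    ),
    FixPair(
        "missed-refunds",
        broken="total = price * qty",
        fixed="total = price * qty - refunds",
    ),
]

print(replay(swallowed_error, pairs))  # {'caught': 1, 'reflagged': 0, 'total': 2}
```

The toy run shows the same split the page reports: recall can be partial while the re-flag count stays at zero.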

The grader never tells the engine the answer

Blind, separated duties. The engine emits its findings before anyone sees the key; an independent party pins the held-out answers; a third scores. In the Mirror Test the reference doc is provably removed from the inputs and the commit pinned by hash — so it reconstructs your architecture from code alone, never from a doc it quietly read.

We test that our tests would catch a regression

Mutation meta-testing. A meta-harness makes one known-bad edit at a time to a committed fixture and demands the suite go red. 92 of 92 mutations killed, 0 survived — proof the checks are load-bearing, not decorative. A check that can’t catch a planted bug proves nothing.
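A meta-harness of this kind can be sketched in a few lines. The fixture, the two mutations and the tiny "suite" are all hypothetical stand-ins; the pattern is what matters: apply one known-bad edit at a time to a copy of the fixture and count how many make the suite fail.

```python
import copy

# Hypothetical committed fixture and a toy deterministic suite over it.
FIXTURE = {"order": ["auth", "billing", "orders"], "token_estimate": 1200}


def suite_passes(fixture: dict) -> bool:
    """Toy checks: dependency order is respected and the estimate is positive."""
    order = fixture["order"]
    return order.index("auth") < order.index("billing") and fixture["token_estimate"] > 0


def reverse_order(fixture: dict) -> dict:
    fixture["order"].reverse()  # known-bad edit: breaks dependency ordering
    return fixture


def zero_estimate(fixture: dict) -> dict:
    fixture["token_estimate"] = 0  # known-bad edit: breaks the cost math
    return fixture


def mutation_score(fixture, mutations, suite) -> tuple[int, int]:
    """Return (killed, total): how many single mutations turned the suite red."""
    if not suite(fixture):
        raise ValueError("suite must be green on the unmutated fixture")
    killed = sum(
        1 for mutate in mutations if not suite(mutate(copy.deepcopy(fixture)))
    )
    return killed, len(mutations)


score = mutation_score(FIXTURE, [reverse_order, zero_estimate], suite_passes)
print(score)  # (2, 2)
```

A surviving mutation would point at a check that looks present but cannot actually fail.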

A model change can’t quietly rot a number

Frozen baselines + a drift guard. The hard zeros (never confidently wrong, zero false defects, zero re-flagged fixes) are re-asserted against the current results on every change. And every figure on this page is derived from committed records by one aggregator — no number here is hand-typed, and a stale one fails the build.

The 331 checks run with no model in the loop (deterministic) and grow only by addition — the original engine checks must never regress. Recall is reported, never gated, and kept below the headline because our own fixtures over-state it; precision (the false-alarm side) is what we gate on.

Field-validated across 15 language stacks (of the 25 its parser supports)

The same detectors, run over ~1.12M lines of real open-source code in 15 languages (Go, Rust, C, C++, Java, C#, Kotlin, PHP, Ruby/Rails, Elixir, OCaml, Swift, Python, TypeScript, Crystal), comprehend each language’s own structure rather than matching a surface pattern.

The clearest example is circular dependencies. The identical check:

  1. Stays silent where the language forbids cycles: Go packages, Rust crates, C# assemblies, and the OCaml and Swift build manifests (verified by building the real dependency graph).
  2. Fires on a genuine cycle where the language permits it: a 33-package cycle in a Java backend, a 31-component cycle across a PHP framework’s separately-published packages, a 14-package cycle in a Kotlin HTTP client, two namespace cycles inside a C# media server, and a real cycle in nginx’s foundational C core.
  3. Reads the subtler cases right: on Elixir it separates the compile-time graph the compiler keeps acyclic from the runtime cycle it allows, and on the C/C++ pair (the same #include model) it fires on nginx yet stays silent on a strictly-layered C++ library, because that regime permits cycles without requiring them.

Sharper still: the same swallowed-error check is inapplicable on pure C (which has no exception construct to match) yet correctly re-activates on C++. Same check, different language. Each fire was independently re-computed a second way before it was trusted.

| Cycle regime | What the language does | Field witnesses | The same IF-8 check… |
| --- | --- | --- | --- |
| Hard ban | A cross-component cycle is a compile/build error, so it is structurally impossible. | Go, Rust, C#, OCaml, Swift | — stays silent (0 cycles) |
| Partial ban | Compile-time cycles banned; runtime call-cycles permitted. | Elixir (Phoenix) | — silent compile-time · ✓ fires at runtime |
| Permitted, explicit | No ban; dependencies are explicit import statements, so they are fully groundable. | Java, PHP, Kotlin, C# namespaces, TypeScript | ✓ fires (a real SCC) |
| Permitted, textual | No module system; #include + guards let a cyclic include graph compile. | C (nginx) · C++ (Poco) | ✓ nginx · — Poco |
| Permitted, hidden | Dependencies are implicit (autoloaded constant refs), below an import-grep substrate. | Ruby (Rails / Zeitwerk) | declines to fabricate: honest gap, never a false alarm |

Same check, five compiler regimes. It builds the real dependency graph for each language's actual unit of modularity, then runs a genuine cycle search — so it fires where cycles are real and permitted, and stays silent where the language forbids them. The C/C++ row is the proof: identical #include model, nginx firing a real triad while a strictly-layered C++ library stays clean.
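A "genuine cycle search" over a dependency graph usually means finding strongly connected components (SCCs). The sketch below uses Tarjan's algorithm, a standard choice that the page does not name, over hypothetical graphs: one with a three-component cycle and one strictly layered.

```python
def strongly_connected(graph: dict[str, set[str]]) -> list[list[str]]:
    """Tarjan's algorithm: return every SCC of a {node: dependencies} graph."""
    index: dict[str, int] = {}
    low: dict[str, int] = {}
    stack: list[str] = []
    on_stack: set[str] = set()
    components: list[list[str]] = []

    def visit(node: str) -> None:
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for nxt in sorted(graph.get(node, ())):
            if nxt not in index:
                visit(nxt)
                low[node] = min(low[node], low[nxt])
            elif nxt in on_stack:
                low[node] = min(low[node], index[nxt])
        if low[node] == index[node]:  # node is the root of a component
            component = []
            while True:
                top = stack.pop()
                on_stack.discard(top)
                component.append(top)
                if top == node:
                    break
            components.append(sorted(component))

    for node in sorted(graph):
        if node not in index:
            visit(node)
    return components


def cycles(graph: dict[str, set[str]]) -> list[list[str]]:
    """Keep only real cycles: multi-node SCCs, or a node that depends on itself."""
    return [
        c for c in strongly_connected(graph)
        if len(c) > 1 or c[0] in graph.get(c[0], ())
    ]


# Hypothetical graphs, not taken from any of the repos named above.
cyclic = {"core": {"event"}, "event": {"os"}, "os": {"core"}, "http": {"core"}}
layered = {"app": {"lib"}, "lib": {"base"}, "base": set()}

print(cycles(cyclic))   # [['core', 'event', 'os']]
print(cycles(layered))  # []
```

The search is the same for every language; what changes per regime is how the graph is built from that language's unit of modularity. The recursive form shown here suits small graphs; a very deep graph would need an iterative variant.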

Every one of these is recorded as a structural observation, never filed as a bug or published — and these are direct detector sweeps, not the full graded pipeline. The point is comprehension across the ecosystem, not a bug count on famous repos.

Where Graphify ends and the understanding begins. Graphify is the AST extractor that sharpens the structural graph on 25 languages — it resolves an import to the file that defines the symbol, which a grep can’t. But it’s an accelerant, not the engine. A stack with no grammar (Crystal, OCaml) automatically falls through to a ctags/grep import graph that still captures roughly 80% of cross-component imports — the run just carries lower certainty, and the rule is always degrade, don’t false-flag. The proof it isn’t a crutch: most of these 15 field sweeps were run on the grep fallback, before Graphify was wired in as the default — so the comprehension above stands on its own. The language reasoning — knowing a cycle is even possible in this language before flagging one — is DeepInit’s, layered on top of whatever parser is available.
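To make "degrade, don't false-flag" concrete, here is a minimal Python sketch of a textual import fallback that tags every edge it finds as low certainty. The names `IMPORT_PATTERNS`, `ImportEdge`, `fallback_edges` and `finding_certainty` are hypothetical, and the two patterns are illustrative assumptions, not DeepInit's actual fallback rules.

```python
import re
from dataclasses import dataclass

# Hypothetical fallback patterns (assumption): one textual rule per language
# that has no AST grammar available.
IMPORT_PATTERNS = {
    "crystal": re.compile(r'^\s*require\s+"([^"]+)"', re.MULTILINE),
    "ocaml": re.compile(r"^\s*open\s+([A-Z][\w.]*)", re.MULTILINE),
}

@dataclass(frozen=True)
class ImportEdge:
    source: str
    target: str
    certainty: str  # "high" for AST-resolved edges, "low" for textual matches

def fallback_edges(path: str, text: str, language: str) -> list[ImportEdge]:
    """Extract import edges by pattern matching, tagged as low certainty."""
    pattern = IMPORT_PATTERNS.get(language)
    if pattern is None:
        return []  # no rule for this language: return nothing rather than guess
    return [ImportEdge(path, m.group(1), "low") for m in pattern.finditer(text)]

def finding_certainty(edges: list[ImportEdge]) -> str:
    """A finding is only as certain as its weakest supporting edge."""
    return "high" if edges and all(e.certainty == "high" for e in edges) else "low"
```

The design choice this illustrates: the fallback never invents an edge for a language it has no rule for, and anything it does find carries its lower certainty forward into the finding.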

Tested across a language × size matrix

Beyond the sweeps, the full pipeline is run on a deliberate matrix of 16 leading repositories spanning 13 languages and three size tiers, with every repository pinned to a commit and measured, not cherry-picked. The project types cover the ways real codebases actually differ: web frameworks (gin/Go, express/JS, sinatra/Ruby, laravel/PHP at 330k lines, phoenix/Elixir), libraries & CLIs (click, gorilla/mux, itsdangerous, fmt/C++, a Kotlin schema lib, uniffi-rs/Rust), and larger apps, data stores & SDKs (redis/C at 346k lines, excalidraw/TypeScript at 157k, pyccel transpiler, a commercetools Java SDK). 15 of the 16 parse on the designed AST path; the one that doesn’t (Crystal, no grammar) proves the grep-fallback degradation path end-to-end.

Measured per size tier — and it tells you before it spends

DeepInit estimates the cost up front, before a token is spent. We measured the real output across the matrix to keep that estimate honest: a small library runs around 150–160k tokens, a medium framework 80–230k, and a large 100k-line transpiler about 200k — one thorough pass, then incremental updates that re-read only what changed. We’re holding the dollar figure blank on purpose: the token counts are measured, but a published price waits on one clean end-to-end accounting run rather than an estimate. (INDICATIVE; Claude Opus pricing as of June 2026; re-derivable from committed records.)

Why real understanding beats “just send it to an LLM”

We measured it. On three of those repos we ran the analysis three ways — the full designed path (AST + grounding + verification), the grep fallback, and a naive LLM-only baseline (the controlled stand-in for “dump the code into a model and ask for docs”, no structural parse / no grounding / no verification) — and scored each, blind, against the parsed graph and the real code. The full path grounded ~99% of its claims to a verified file:line; the naive baseline grounded ~44% (and 0% on one repo — it cites filenames, not lines), inflated the dependency graph with edges that aren’t imports, and surfaced none of the grounded security-relevant findings the verified paths did. The honest part: on these famous repos every mode was ~99–100% faithful (the model knows them) — so the real difference isn’t hallucination, it’s whether you can trust which line a claim refers to, and whether the gaps get caught. That is the whole thesis: a prompt gives you a plausible description; the harness gives you a grounded, verified one.
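For readers who want to see what "grounded to a verified file:line" can mean mechanically, here is a minimal Python sketch. It assumes a claim counts as grounded only when its cited line exists and contains the quoted evidence; the `Claim` type and both function names are hypothetical, and the real scoring rule may differ.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Claim:
    text: str
    file: str
    line: Optional[int]  # None when the claim cites only a filename
    evidence: str        # snippet the claim says appears on that line

def is_grounded(claim: Claim, sources: dict[str, str]) -> bool:
    """Grounded means the cited line exists and holds the evidence (assumed rule)."""
    if claim.line is None or claim.file not in sources:
        return False
    lines = sources[claim.file].splitlines()
    if not 1 <= claim.line <= len(lines):
        return False
    return claim.evidence in lines[claim.line - 1]

def grounding_rate(claims: list[Claim], sources: dict[str, str]) -> float:
    """Share of claims that resolve to a verified file:line (0.0 for no claims)."""
    if not claims:
        return 0.0
    return sum(is_grounded(c, sources) for c in claims) / len(claims)
```

Under this rule a claim that names the right file but no line scores as ungrounded, which is the failure mode described for the naive baseline.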

All figures here are DeepInit’s own, INDICATIVE (small-n, repo@SHA-grounded, re-derivable from committed records); the precision result above stays the headline. We also run DeepInit on our own tools — an independent review of a dogfood run on our plugin returned “would use”, 10 of 11 spot-checked claims correct, every hard count exact.

Depth, not a weekend prompt

An instruction-defined engine — with real algorithms under the markdown.

DeepInit is defined in markdown, so the honest measure of its depth isn’t a line count. It’s the validation surface and the decision log that make a model’s understanding trustworthy. Under that markdown it runs genuine graph, AST and set-algebra algorithms over a whole-system model, not text a single clever prompt could match. Put plainly, it’s closer to a research result and the lab that proves it than to an app: a small, dense instruction set, with the overwhelming majority of the committed repo given to controlled fixtures, real-world field runs, and a decision log built to show it produces grounded, correct output. The engine that runs it is Claude itself.

Run three ways over the same repos and scored against the AST as ground truth, DeepInit’s full path grounds 98.9% of its claims to a verified file:line (100% on the grep-fallback path); a naive LLM-only baseline grounds just 43.5% — 0% on one repo — and missed every grounded security-relevant finding. That gap is the difference between analysis and a confident guess.

The algorithms a one-file linter can’t reproduce

Circular dependencies, the whole graph

Tarjan strongly-connected-components. It builds your real import graph — resolving each import to the file that defines the symbol — and runs an SCC search to find every set of components that secretly import each other. A cycle is a global property of the whole graph; no per-file linter can see it.
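As an illustration of the technique named here, a compact Python version of Tarjan's algorithm over an import graph. This is a textbook sketch, not DeepInit's code: the graph is assumed to be a plain dict from component name to the components it imports, and `import_cycles` is a hypothetical helper name.

```python
def strongly_connected_components(graph: dict[str, list[str]]) -> list[list[str]]:
    """Tarjan's algorithm: returns every SCC of a directed graph."""
    index: dict[str, int] = {}
    lowlink: dict[str, int] = {}
    stack: list[str] = []
    on_stack: set[str] = set()
    result: list[list[str]] = []
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, []):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:  # node is the root of an SCC
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            result.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return result

def import_cycles(graph: dict[str, list[str]]) -> list[list[str]]:
    """Keep only SCCs that are real cycles: 2+ members, or a self-import."""
    return [sorted(c) for c in strongly_connected_components(graph)
            if len(c) > 1 or c[0] in graph.get(c[0], [])]
```

The recursive form is fine for a sketch; a graph with very deep import chains would need an iterative version to stay under Python's recursion limit.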

Dead branches across file boundaries

Cross-module constant-fold. It carries an exported constant’s literal value across an import edge, follows re-export chains to the single origin, and only then flags a branch as unreachable — grounded to that origin’s exact line. Standard tools don’t fold constants across modules; this lives in the gap they leave.
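The idea can be sketched in a few lines of Python. This assumes a simplified model in which each module's exports are either a literal with a line number or a re-export of another module's name; `Literal`, `ReExport`, `resolve_origin` and `dead_branch` are hypothetical names for illustration, and only a literal `False` is treated as provably dead.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class Literal:
    value: object
    line: int    # line where the literal is defined

@dataclass(frozen=True)
class ReExport:
    module: str  # module the name is re-exported from
    name: str

Exports = dict[str, dict[str, Union[Literal, ReExport]]]

def resolve_origin(exports: Exports, module: str, name: str) -> Optional[tuple[str, Literal]]:
    """Follow re-export chains to the single defining literal, or None."""
    seen: set[tuple[str, str]] = set()
    while (module, name) not in seen:
        seen.add((module, name))
        export = exports.get(module, {}).get(name)
        if isinstance(export, Literal):
            return module, export
        if not isinstance(export, ReExport):
            return None  # unknown or non-literal export: do not guess
        module, name = export.module, export.name
    return None          # re-export loop: undecidable, stay silent

def dead_branch(exports: Exports, module: str, name: str) -> Optional[str]:
    """Explain why `if <name>:` cannot run, citing the origin line, else None."""
    origin = resolve_origin(exports, module, name)
    if origin is None:
        return None
    origin_module, literal = origin
    if literal.value is False:
        return f"branch on {name} is unreachable: {origin_module}:{literal.line}"
    return None
```

Note the order of operations: the branch is flagged only after the chain resolves to one literal, and every undecidable case returns `None` instead of a finding.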

The same list that quietly disagrees

Name-keyed set difference. It groups value-sets defined under the same name in two places and fires only when their membership diverges — the polarity-opposite of a copy-paste detector, which goes silent the moment two copies drift apart. Five guards keep it from firing on coincidental name-clashes.
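A minimal Python sketch of the set-difference idea, under stated assumptions: definitions arrive as `(name, location, values)` tuples, and a single illustrative guard (the two sets must share at least one member) stands in for the five guards mentioned above, which are not specified here. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Iterable

def diverging_value_sets(definitions: Iterable[tuple[str, str, set[str]]]) -> list[dict]:
    """Report same-named value-sets whose membership differs."""
    by_name: dict[str, list[tuple[str, frozenset[str]]]] = defaultdict(list)
    for name, location, values in definitions:
        by_name[name].append((location, frozenset(values)))

    findings = []
    for name, sites in by_name.items():
        for i, (loc_a, set_a) in enumerate(sites):
            for loc_b, set_b in sites[i + 1:]:
                # Illustrative guard (assumption): require some overlap, so two
                # unrelated lists that merely share a name do not fire.
                if set_a != set_b and set_a & set_b:
                    findings.append({
                        "name": name,
                        "sites": (loc_a, loc_b),
                        "only_in_first": sorted(set_a - set_b),
                        "only_in_second": sorted(set_b - set_a),
                    })
    return findings
```

Identical copies produce no finding, which is the opposite polarity to a copy-paste detector: this fires exactly when the copies stop matching.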

The decision log — we measured what not to ship

The depth no prompt has is the record of detectors we deliberately rejected after measuring them. Each was put through a design workshop and then an independent adversarial panel, with reviewers blind to the verdict and told to try their hardest to ship it. Each rejection left a frozen specification and named exactly what would unblock it. We added no green test for any deferred check, because a test you co-designed to pass proves the wrong thing. Several detector families were measured and shelved this way; the engine ships only the checks whose decidable rule can’t be confused with a guess.

~10 issue families

On a whole-system graph

Each detector produces a grounded, verified finding that is criticality-ranked and lifecycle-tracked. Findings are report-only and never touch your source.

15 of 25 languages · 22-target oracle

Field-validated, not bench-only

~1.12M lines of real open-source code across 15 stacks, plus an independent oracle of 22 real bug fixes in 4 languages.

331 checks · 72 sections · 92 mutations

Regression-tested on every change

A weekend prompt has zero regression checks. This has a deterministic gate that must stay green — and a meta-harness proving each check is load-bearing (92/92 killed).
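What "load-bearing" means can be shown with a small mutation harness in Python. This is a generic sketch of the technique, assuming checks are predicates over an artifact and each mutation deliberately breaks one property; `mutation_score` and the example data are hypothetical, not the real harness.

```python
from typing import Callable

Check = Callable[[dict], bool]    # returns True when the artifact passes
Mutation = Callable[[dict], dict]  # returns a deliberately broken copy

def mutation_score(artifact: dict, checks: dict[str, Check],
                   mutants: dict[str, Mutation]) -> tuple[int, list[str]]:
    """Count killed mutants; a mutant is killed when at least one check fails."""
    if not all(check(artifact) for check in checks.values()):
        raise ValueError("baseline must be green before mutating")
    survivors = []
    for name, mutate in mutants.items():
        mutated = mutate(dict(artifact))  # mutate a copy, never the baseline
        if all(check(mutated) for check in checks.values()):
            survivors.append(name)        # no check noticed the breakage
    return len(mutants) - len(survivors), survivors
```

A surviving mutant means some property could break without any check failing, so a full kill count is evidence that every check is guarding something real.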

Every fire shown across this page is a structural observation: recorded, never filed as a bug or published. DeepInit is report-only and 100% local. It writes context and flags problems; it does not build a graph for you to explore, and it does not run your code.

Who & when

Run it the moment a codebase outgrows what an agent can infer on its own.

You inherited a legacy / under-documented repo

Point DeepInit at it and get a grounded map in minutes: the architecture, the components, and the non-obvious rules a new engineer (or agent) would trip over — every claim cited to a file:line you can open.

You’re in a large, fast-moving codebase

Even code your agent helped write drifts: rules accrete, modules start sharing state, the live schema moves, and a single hand-written CLAUDE.md goes stale. DeepInit keeps a grounded, current CLAUDE.md, refreshing only what changed with /deep-init-update and auditing that it is still true with /deep-init-check, so the agent works from what the code is now, not what it was when you last wrote it down.

You’re onboarding a coding agent

The lean CLAUDE.md and its AGENTS.md / Copilot / Cursor / Windsurf projections give the agent the load-bearing context up front, so it stops guessing the conventions and architecture on every task.

You're about to refactor

Before you move things, see the key invariants, the boundary rules, and the hidden couplings (the same drift / contradiction / circular-dependency families it reports) — so the refactor doesn't quietly break an unwritten rule.

Straight answers

The four questions everyone asks.

Isn't this just /init?

No. /init writes a quick starter file from a one-pass read. DeepInit parses your code first (real AST via Graphify, with a graceful fallback), grounds every claim to a verified file:line, separates a lean always-loaded tier from a deep on-demand one, and reports the problems it finds — and it's regression-tested so it doesn't quietly drift as the model changes. The comparison table above lays out the difference column by column.

Does it touch my source code?

No. It is report-only. It writes documentation (a CLAUDE.md owned region and an .ai/ folder) and never edits your source. It writes a .bak before touching any existing file and preserves human-authored content byte-for-byte.
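One way to picture an "owned region" with a backup, as a minimal Python sketch. The marker strings and the `write_owned_region` helper are hypothetical (the real markers are not documented here). The sketch works on raw bytes so that everything outside the markers, including line endings, is left exactly as it was.

```python
import shutil
from pathlib import Path

# Hypothetical marker strings (assumption), used only for this illustration.
BEGIN = b"<!-- deepinit:begin -->"
END = b"<!-- deepinit:end -->"

def write_owned_region(path: Path, generated: str) -> None:
    """Replace only the marked region, backing up any existing file first."""
    block = BEGIN + b"\n" + generated.strip().encode("utf-8") + b"\n" + END
    if not path.exists():
        path.write_bytes(block + b"\n")
        return
    shutil.copy2(path, path.with_name(path.name + ".bak"))  # backup before any change
    original = path.read_bytes()
    start = original.find(BEGIN)
    end = original.find(END, start + len(BEGIN)) if start != -1 else -1
    if end != -1:
        # Keep every byte outside the markers exactly as the human wrote it.
        updated = original[:start] + block + original[end + len(END):]
    else:
        # No owned region yet: append one, leaving existing content untouched.
        sep = b"" if original.endswith(b"\n") or not original else b"\n"
        updated = original + sep + block + b"\n"
    path.write_bytes(updated)
```

Reading and writing bytes rather than text is the deliberate choice here: text mode would normalize line endings and silently change content the tool does not own.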

Will it leak my code?

No — it's 100% local. The skill declares no network tool; there is no egress path (we gate that as a test). Parsing and analysis run on your machine in your existing agent session. Secrets and PII are redacted before anything is written, and the report it generates opens offline with zero network calls.
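A redaction gate of this kind can be sketched in Python as a pass that runs over any text before it is written. The three rules below are illustrative assumptions only (an AWS-style access key ID, a `password = value` style assignment, and an email address); `RULES` and `redact` are hypothetical names, and a real gate would need far broader coverage.

```python
import re

# Illustrative rules only (assumption): (pattern, replacement) pairs.
RULES = [
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED:aws-key]"),
    (re.compile(r"(?i)\b(password|secret|token)(\s*[:=]\s*)\S+"), r"\1\2[REDACTED:secret]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[REDACTED:email]"),
]

def redact(text: str) -> tuple[str, int]:
    """Return the scrubbed text and the number of redactions applied."""
    total = 0
    for pattern, replacement in RULES:
        text, count = pattern.subn(replacement, text)
        total += count
    return text, total
```

Returning the count alongside the text lets a caller log that something was scrubbed without ever logging the secret itself.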

What does it cost to run?

It runs in your own agent session, so the cost is the model tokens of one analysis pass — no subscription, no API key for the parser. A small repo is an inexpensive single pass; a large one costs more. We're finishing a clean per-tier benchmark before publishing a dollar figure rather than inventing one.

Install & activate

Install the plugin once. Then run /deep-init.

DeepInit ships as a Claude Code plugin. To be clear, “plugin” and “skill” aren’t alternatives — the plugin is just the delivery package, and the /deep-init skill that does the work lives inside it. Installing the plugin is how you get the skill; after that, you run /deep-init in any project. No subscriptions, no API keys, no servers.

1 · Add the marketplace, install the plugin

/plugin marketplace add deepfusionlabs/deep-init
/plugin install deep-init@deepfusionlabs-deep-init

These are slash commands you type into the Claude Code chat (not your terminal). They install from Deep Fusion Labs’ plugin marketplace on GitHub.

2 · Reload so the plugin loads

Claude Code reads plugin commands only at startup, so run /reload-plugins (or start a new session). A window reload won’t pick it up. Running Claude Code inside VS Code or JetBrains? Restart the IDE itself — reloading the editor window isn’t enough.

3 · Run it, in any project

/deep-init      # zero config: the full, thorough analysis (2 review cycles). That is the whole getting-started.

Updating later

/deep-init-upgrade   # pulls the newest version on one confirm, then guides the reload

Requirements: Claude Code. DeepInit is free and MIT-licensed. Its public repo is coming soon; once it’s live, the two install commands in step 1 work for anyone, with no access step or sign-up.

More control, when you want it

A bare /deep-init is the zero-friction default. Beyond it, a curated menu of slash commands — no flags to memorize, nothing to mistype:

| You want to… | What it does | Command |
|---|---|---|
| The full run (default) | Thorough: 2 adversarial review cycles | /deep-init |
| Maximum scrutiny | 3 adversarial review cycles, for code you can’t get wrong | /deep-init-aggressive |
| A quick first pass | Skips the review cycles; faster and cheaper | /deep-init-fast |
| Refresh only what changed | Incremental re-analysis of the touched components + issue lifecycle | /deep-init-update |
| Check it’s still true | 0-token staleness + broken-citation audit (no model call) | /deep-init-check |
| Tune the run with buttons | Depth · issues · outputs · scope · cost, in a native picker with no typing | /deep-init-customize |
| Translate the report | Emit report.<lang>.html in 8 languages (English stays canonical) | /deep-init-translate |
| Preflight (0 tokens) | Tools, scope, resolved config, families, cost estimate | /deep-init-doctor |
| Which version is running | Loaded vs on-disk; tells you if you need to reload | /deep-init-version |
| Update the plugin | Pull the newest from the marketplace, on one confirm | /deep-init-upgrade |
| Every command + option | Grouped and ordered by how often you’ll use it | /deep-init-help |

Type-safe by design — no flags to memorize. Each option lives where it costs you least: a command for the common dials, a button picker (/deep-init-customize) for the rest, and a JSON-Schema-validated .ai/deepinit.config your editor autocompletes and checks before you run. The literal flags and natural language still work for power users and CI.

It even knows its own version. Claude Code loads a plugin’s commands once per session, so after an update you can’t normally tell what’s actually running — /deep-init-version compares the loaded version against what’s on disk and tells you if you need to reload, and /deep-init-upgrade pulls the newest in one confirm. Most plugins can’t tell you what’s live.

Zero setup: DeepInit checks for its one dependency (scc) and installs it for you if it’s missing. Graphify gives richer parsing and installs the same way — optional; if you skip it during setup, DeepInit falls back to ctags/grep.

Open source

Built in the open — no lock-in.

DeepInit by Deep Fusion Labs is free and MIT-licensed, built with a research-first process — every choice traced to evidence and checked before a line was written.

No lock-in: no proprietary format, nothing to escape from — DeepInit writes plain markdown into the files your agent already reads. For Claude Code that's CLAUDE.md (loaded natively — the grounded replacement for /init); for the other tools it emits the open AGENTS.md standard (Agentic AI Foundation, under the Linux Foundation; already used by 60,000+ repositories) plus per-tool rule files (.cursorrules, .github/copilot-instructions.md, .windsurf/rules/). Generate once and any agent — Claude Code, Codex, Cursor, GitHub Copilot, Google Jules — can use it; remove DeepInit and your context files still work. Claude Code gets first-class support as a native skill. And it all runs 100% locally — no server to keep running — reading only your code, with a secret/PII redaction gate that scrubs detected secrets before anything is written.