Open source · Runs locally · MIT License

Ask your codebase precise questions.
Get grounded answers.

Noumenon builds a Datomic knowledge graph from your repository so agents query structured facts instead of dumping raw files into context windows. In benchmarks across 9 repos and 8 languages, graph-augmented answers scored 2× higher on average.

Terminal
$ clj -M:run ask -q "Which files are the biggest risk hotspots?" ./my-repo

# iteration 1 — querying files-by-complexity (47 results)
# iteration 2 — querying commit frequency and authorship
# iteration 3 — cross-referencing co-change patterns
# done — 3 iterations, 4 Datalog queries

Risk hotspots (high churn + high complexity + few contributors):

  file                      complexity     bus-factor  changes
  src/core/parser.ts        very-complex   1           47
  src/api/middleware.ts     complex        2           38
  src/db/migrations.ts      complex        1           31

Why structured knowledge

Context windows don't scale with repo size. A queryable graph returns exactly what you need.

Context windows don't scale

A Datalog query returns exactly the entities a question needs. In benchmarks, graph context improved LLM accuracy by +19.7 percentage points on average.

Answers you can't verify

When an LLM answers from raw context, you can't tell what it looked at. Grounding answers in queryable facts makes them auditable.

Code structure without history

Who changed this file, how often, and what else changed with it? Churn, bus factor, and co-change patterns live in git history, not the current tree.


Three complementary layers

Each layer adds different information. The first is fully deterministic and requires no LLM. Use them independently or together.

No LLM required

Deterministic Facts

Git history, file structure, authorship, change frequency, and cross-file import graphs. Immutable Datomic facts with stable entity identities. Fully reproducible.

LLM-powered

Semantic Annotations

Complexity ratings, safety concerns, architectural roles, function signatures, deprecation markers. Up to 20 parallel workers with automatic retry and checkpointing.

Iterative reasoning

Recursive LLM Querying

ask takes a natural-language question, formulates Datalog queries, reads results, and iterates until it has a grounded answer from the graph.
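The loop behind ask can be sketched roughly like this. All names here are illustrative, not Noumenon's actual internals: `llm-formulate-datalog`, `llm-grounded-answer?`, and `llm-compose-answer` are hypothetical helpers standing in for the LLM calls.

```clojure
;; Hypothetical sketch of the ask loop. Function names are
;; illustrative placeholders, not Noumenon's real internals.
(loop [findings []]
  (let [query   (llm-formulate-datalog question findings) ; LLM writes a Datalog query
        results (d/q query db)                            ; run it against the graph
        findings (conj findings results)]
    (if (llm-grounded-answer? question findings)          ; enough evidence to answer?
      (llm-compose-answer question findings)              ; answer, citing graph facts
      (recur findings))))                                 ; otherwise, iterate
```

Each iteration adds query results to the evidence the LLM sees, which is why the transcript above reports a handful of iterations and Datalog queries per answer.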

Immutable

Datomic stores facts, not state. Every transaction is an accretion — nothing is overwritten.

Composable

Datalog queries compose naturally. Join across git history, code structure, and semantics in a single query.
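As a sketch of what such a join might look like — the attribute names below are assumptions for illustration, not Noumenon's actual schema (show-schema lists the real attributes):

```clojure
;; Hypothetical Datalog query joining git churn with LLM-assigned
;; complexity. Attribute names are illustrative only.
[:find ?path ?changes ?complexity
 :where
 [?f :file/path ?path]              ; code structure
 [?f :file/change-count ?changes]   ; git history
 [?f :semantic/complexity ?complexity] ; LLM annotation
 [(> ?changes 30)]]                 ; filter to high-churn files
```

Because each clause is just another constraint over the same graph, mixing the three layers is no different from querying one of them.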

Measurable

Built-in A/B benchmarks. 9 repos, 8 languages: 18.2% (raw) to 37.9% (graph-augmented).

Local

Runs on your machine. Only the LLM provider you choose sees file contents — sensitive files are never sent.


How it works

Independent, idempotent steps. Run individually or together. The graph stays in sync as your codebase evolves.

  Import               Git history & files   deterministic
  Enrich               Import graph          deterministic
  Analyze              LLM semantics         concurrent
  Query / Ask / Serve  Use the graph         iterative

update detects HEAD changes and processes only new commits. watch polls continuously. The MCP server auto-syncs before each query.


Measured on real codebases

40 questions per repo, two conditions: Raw (source file listings only) vs. Full (knowledge graph context). Sonnet via GLM, March 2026.

Repository  Language    Raw     Full    Improvement
flask       Python      12.5%   41.2%   +28.8pp
fzf         Go          13.8%   42.5%   +28.8pp
express     JavaScript  18.8%   45.0%   +26.2pp
fresh       TypeScript  12.5%   35.0%   +22.5pp
guava       Java         2.5%   23.8%   +21.3pp
ripgrep     Rust        12.5%   30.0%   +17.5pp
redis       C           11.3%   26.3%   +15.0pp
ring        Clojure     51.2%   60.0%   +8.8pp
noumenon    Clojure     28.8%   37.5%   +8.8pp
Average                 18.2%   37.9%   +19.7pp

Biggest gains on unfamiliar repos

Flask, fzf, and Express saw +26–29pp — the graph fills in what the LLM lacks from training data.

Factual lookups improved most

Single-hop accuracy (e.g. “which files import X?”) jumped from 29.5% to 65.9% on Ring — +36pp.

8 languages, zero failures

Clojure, Python, JavaScript, TypeScript, C, Go, Rust, and Java all completed the full pipeline successfully.

Read the full benchmark report →


Built for developers and agents

CLI and MCP tools for people who work in terminals and agents that query programmatically.

Concurrent & built for scale

1–20 parallel workers with automatic retry. Incremental sync processes only new commits. Append-only storage keeps memory predictable as the graph grows.

Git history as a first-class citizen

Commits, authorship, change frequency, co-changed files, and fix-commit ratios are all queryable — not just the current tree.

MCP server

Expose the graph as MCP tools for Claude Desktop, Claude Code, or any MCP-compatible agent.

Multi-language import extraction

Deterministic import graphs for Clojure, Python, JS/TS, Rust, Java, C#, C/C++, Elixir, and Erlang. LLM analysis works with any language.

Cost transparency

Token estimates before you start, per-file telemetry as it runs, aggregate totals when it finishes. Query LLM spend with llm-cost-by-file and llm-cost-by-model.

Benchmark framework

40-question A/B test comparing graph-augmented vs. raw-context answers on your repo. Deterministic scoring by default, LLM-judged opt-in.

Sensitive file protection

Files matching known secret patterns (.env, *.pem, credentials) are tracked as entities but never read or sent to any provider.

Open data model

Stable entity identities. show-schema lists every attribute. Named queries are plain EDN files you can read, edit, and version control.
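A named-query file might look like the sketch below — the map keys and attribute names are assumptions for illustration, not Noumenon's actual file format (inspect the bundled EDN queries for the real shape):

```clojure
;; Hypothetical named-query EDN file. Keys and attributes are
;; illustrative; the bundled queries define the real format.
{:name        :hotspots
 :description "Complexity and change frequency"
 :query       [:find ?path ?changes ?complexity
               :where
               [?f :file/path ?path]
               [?f :file/change-count ?changes]
               [?f :semantic/complexity ?complexity]]}
```

Since queries are plain data, customizing one is an edit and a commit, not a code change.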

Perforce support

Works with Helix Core repositories via git-p4. Import a P4 depot as a Git mirror, then query it like any other repo. Helix4Git mirrors work directly.


Named queries

50 pre-built Datalog queries in EDN. Deterministic, reproducible, composable. Run from the CLI or MCP server.

  • Complexity and change hotspots
  • Bug-prone files by fix commit frequency
  • Co-changed file pairs
  • Top contributors per subsystem
  • Component dependency graph
  • LLM cost tracking by model and file

Or use ask to query the graph with natural language — the LLM formulates Datalog queries and iterates until it has a grounded answer.

Terminal
$ clj -M:run query list

  hotspots                Complexity and change frequency
  bug-hotspots            Files with high fix-commit ratio
  co-changed-files        Files that change together
  top-contributors        Most active authors per file
  files-by-complexity     Ranked by semantic complexity
  files-by-layer          Grouped by architectural layer
  component-dependencies  Component coupling graph
  dependency-hotspots     High fan-in import targets
  pure-segments           Pure functions and their files
  llm-cost-total          Aggregate analysis cost

$ clj -M:run query hotspots ./my-repo

file                      changes  complexity
src/core/parser.ts        47       very-complex
src/api/middleware.ts     38       complex
src/db/migrations.ts      31       complex
src/auth/session.ts       28       complex

Get started

Requires JDK 21+ and Clojure CLI. See the README for JAR and dependency options.

1

Clone and run

Or download the standalone JAR from GitHub Releases.

git clone https://github.com/leifericf/noumenon.git && cd noumenon
2

Import a repository

Local path or remote URL. Import is deterministic and takes seconds for most repos.

clj -M:run import /path/to/your/repo
3

Query or ask

Named queries for instant results, or natural-language questions with iterative LLM reasoning.

clj -M:run ask -q "Which files have the most risk?" /path/to/your/repo