Experimental, early beta. Data model and interfaces are unstable; expect breaking changes between releases.

Introspect

Noumenon improves itself by running an autonomous loop: propose a change, benchmark it against a fixed question set, keep the change if quality goes up, and revert it if not.

The Autonomous Loop

Each iteration picks one of five targets, drafts a change, applies it, runs the benchmark suite, and compares the new score against the baseline. The loop is bounded by iteration count, wall-clock time, or LLM cost; it stops at whichever configured limit is reached first.
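The keep-or-revert cycle can be sketched in a few lines of Python. This is a minimal illustration, not Noumenon's implementation: the names Budget, run_loop, propose, and benchmark, and the flat cost_of_iteration parameter, are all hypothetical. Applying a change is modelled as building a candidate state, and reverting as discarding that candidate.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Budget:
    """Hypothetical stop conditions; None means no limit of that kind."""
    max_iterations: Optional[int] = None
    max_seconds: Optional[float] = None
    max_cost: Optional[float] = None


def run_loop(
    state: dict,
    propose: Callable[[dict], Tuple[str, dict]],
    benchmark: Callable[[dict], float],
    budget: Budget,
    cost_of_iteration: float = 0.0,
) -> Tuple[dict, float, List[str]]:
    """Keep a proposed change only if the benchmark score goes up."""
    if (budget.max_iterations, budget.max_seconds, budget.max_cost) == (None, None, None):
        raise ValueError("set at least one limit, or the loop never stops")

    started = time.monotonic()
    spent = 0.0
    baseline = benchmark(state)
    verdicts: List[str] = []
    iteration = 0

    while True:
        # Stop as soon as any configured limit is reached.
        if budget.max_iterations is not None and iteration >= budget.max_iterations:
            break
        if budget.max_seconds is not None and time.monotonic() - started >= budget.max_seconds:
            break
        if budget.max_cost is not None and spent >= budget.max_cost:
            break

        iteration += 1
        target, candidate = propose(state)  # draft a change for one target
        score = benchmark(candidate)        # score it on the fixed question set
        spent += cost_of_iteration

        if score > baseline:
            state, baseline = candidate, score  # keep: this is the new baseline
            verdicts.append(f"{target}: IMPROVED")
        else:
            verdicts.append(f"{target}: reverted")  # discard the candidate

    return state, baseline, verdicts


# Toy run: the "benchmark" just reads a number stored in the state.
deltas = iter([5.0, -3.0, 2.0])
final_state, final_score, verdicts = run_loop(
    {"quality": 50.0},
    lambda s: ("system-prompt", {"quality": s["quality"] + next(deltas)}),
    lambda s: s["quality"],
    Budget(max_iterations=3),
)
print(final_score, verdicts)
```

In the toy run the second proposal scores below the current baseline, so it is discarded and the third proposal is compared against the score kept after the first.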

Five Targets

Target         What changes
system-prompt  Edit the system prompt that drives the Ask agent.
examples       Add, remove, or replace few-shot examples shown to the model.
rules          Adjust derived rules used during query planning.
code           Patch a small region of the Ask agent's source code.
train          Retrain the routing model that picks which files and queries to seed before the agent runs.

Anatomy of an Iteration

A run prints its baseline, then for each iteration: the chosen target, the proposed change, the resulting score, and an IMPROVED or reverted verdict.

Terminal
$ noum introspect ./my-repo --max-iterations 5

# baseline mean=52.3% (22 deterministic questions)
# === Iteration 1/5 ===
# target=system-prompt: "Fix empty result handling"
# IMPROVED +6.8% (52.3% -> 59.1%)
# === Iteration 2/5 ===
# target=examples: "Add dependency query patterns"
# reverted (delta=-4.5%)
# === Iteration 3/5 ===
# target=examples: "Replace low-impact examples"
# IMPROVED +11.4% (56.8% -> 68.2%)

Introspect complete: 2 improvements in 3 iterations (final score: 68.2%)
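The deltas in the sample output are absolute differences in percentage points (59.1 - 52.3 = 6.8), not relative gains. The sketch below reproduces the verdict wording from two scores; the function name verdict is hypothetical, and the scores in the reverted example are made up for illustration.

```python
def verdict(baseline: float, score: float) -> str:
    """Format a keep/revert verdict; delta is in percentage points."""
    delta = score - baseline
    if delta > 0:
        return f"IMPROVED {delta:+.1f}% ({baseline:.1f}% -> {score:.1f}%)"
    return f"reverted (delta={delta:+.1f}%)"


# Scores taken from the sample output above.
print(verdict(52.3, 59.1))
print(verdict(56.8, 68.2))
# Illustrative scores only: a candidate that scored lower than its baseline.
print(verdict(59.1, 54.6))
```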

Signals from Ask Sessions

Benchmarks are not the only input. The optimizer's meta-prompt also includes a {{ask-insights}} block aggregated from real Ask sessions in the meta database. Three streams feed it:

  • Agent self-reflection. When the Ask agent emits :missing-attributes, :quality-issues, and :suggested-queries on a session, those are aggregated by frequency. "Function dependencies (reported 10×)" becomes a signal to add a schema attribute. "Empty commit messages (reported 7×)" signals a data-quality issue.
  • Telemetry queries. ask-empty-results surfaces queries that returned nothing — gaps in the data model. ask-popular-queries surfaces patterns the LLM writes most often — candidates for named queries with optimized descriptions in the system prompt. ask-error-steps surfaces parse failures — signals for prompt improvements.
  • Explicit feedback. Thumbs-up/thumbs-down on a past session is recorded on :ask.session/feedback. Negative feedback is surfaced first in the meta-prompt.
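As a rough illustration of the first and third streams, the sketch below counts self-reflection items by frequency and puts negatively rated sessions first. It assumes sessions are plain dictionaries with string keys and that feedback is stored as "positive" or "negative"; the real storage format in the meta database is not shown here, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

REFLECTION_KEYS = ("missing-attributes", "quality-issues", "suggested-queries")


def aggregate_reflections(sessions: List[dict]) -> Dict[str, List[str]]:
    """Count each reported item across sessions, most frequent first."""
    summary: Dict[str, List[str]] = {}
    for key in REFLECTION_KEYS:
        counts = Counter(item for s in sessions for item in s.get(key, []))
        summary[key] = [f"{item} (reported {n}×)" for item, n in counts.most_common()]
    return summary


def order_feedback(sessions: List[dict]) -> List[dict]:
    """Return rated sessions with negative feedback first (stable sort)."""
    rated = [s for s in sessions if s.get("feedback") in ("positive", "negative")]
    return sorted(rated, key=lambda s: s["feedback"] != "negative")


# Hypothetical session records for illustration.
sessions = [
    {"id": 1, "missing-attributes": ["Function dependencies"], "feedback": "positive"},
    {"id": 2, "missing-attributes": ["Function dependencies", "Commit authors"]},
    {"id": 3, "quality-issues": ["Empty commit messages"], "feedback": "negative"},
]

print(aggregate_reflections(sessions)["missing-attributes"])
print([s["id"] for s in order_feedback(sessions)])
```

Sorting on a boolean key keeps the original order within each group, so negative sessions move to the front without otherwise reshuffling the list.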

The result is a closed loop: agents try to answer questions, tell the system what's missing or broken, and introspect proposes the fix on the next run. Real questions are a higher-quality signal than synthetic benchmarks because they reflect what people actually need.

Run noum introspect <repo> --help for the full flag set, or invoke the MCP equivalents: noumenon_introspect, noumenon_introspect_start, and noumenon_introspect_status. The Ask agent's side of the loop is described on the Ask page.