Introspect
Noumenon improves itself by running an autonomous loop: propose a change, benchmark it against a fixed question set, keep the change if quality goes up, and revert it if not.
The Autonomous Loop
Each iteration picks one of five targets, drafts a change, applies it, runs the benchmark suite, and compares the new score against the baseline. The loop is bounded by iterations, wall-clock time, or LLM cost, and stops at whichever limit is reached first.
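The control flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Noumenon's implementation: `propose`, `apply_change`, `revert_change`, and `run_benchmark` are hypothetical callables, and the `Budget` field names are invented for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Budget:
    """Hypothetical stop conditions; None means that limit is not set."""
    max_iterations: Optional[int] = None
    max_seconds: Optional[float] = None
    max_cost: Optional[float] = None


def introspect_loop(
    propose: Callable[[float], tuple],       # baseline -> (target, change, cost)
    apply_change: Callable[[object], None],
    revert_change: Callable[[object], None],
    run_benchmark: Callable[[], float],
    budget: Budget,
) -> float:
    """Run propose/benchmark/keep-or-revert until any budget limit is hit."""
    baseline = run_benchmark()
    started = time.monotonic()
    spent = 0.0
    iteration = 0
    while True:
        # Stop at whichever configured limit is reached first.
        if budget.max_iterations is not None and iteration >= budget.max_iterations:
            break
        if budget.max_seconds is not None and time.monotonic() - started >= budget.max_seconds:
            break
        if budget.max_cost is not None and spent >= budget.max_cost:
            break
        iteration += 1
        _target, change, cost = propose(baseline)
        spent += cost
        apply_change(change)
        score = run_benchmark()
        if score > baseline:
            baseline = score          # keep the change; it becomes the new baseline
        else:
            revert_change(change)     # no improvement: undo it
    return baseline
```

Keeping the revert inside the loop means the benchmark always runs against the last accepted state, so each score is compared with a baseline that reflects only the changes that were kept.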
Five Targets
| Target | What changes |
|---|---|
| system-prompt | Edit the system prompt that drives the Ask agent. |
| examples | Add, remove, or replace few-shot examples shown to the model. |
| rules | Adjust derived rules used during query planning. |
| code | Patch a small region of the Ask agent's source code. |
| train | Retrain the routing model that picks which files and queries to seed before the agent runs. |
Anatomy of an Iteration
A run prints its baseline, then for each iteration: the chosen target, the proposed change, the resulting score, and an IMPROVED or reverted verdict.
    $ noum introspect ./my-repo --max-iterations 5
    # baseline mean=52.3% (22 deterministic questions)
    # === Iteration 1/5 ===
    # target=system-prompt: "Fix empty result handling"
    # IMPROVED +6.8% (52.3% -> 59.1%)
    # === Iteration 2/5 ===
    # target=examples: "Add dependency query patterns"
    # reverted (delta=-4.5%)
    # === Iteration 3/5 ===
    # target=examples: "Replace low-impact examples"
    # IMPROVED +11.4% (56.8% -> 68.2%)
    Introspect complete: 2 improvements in 3 iterations (final score: 68.2%)
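If you capture this output, the verdict lines are regular enough to summarize with a short script. The sketch below assumes the exact line shapes shown in the sample run above; the real format may differ between versions, and `summarize` is a hypothetical helper, not part of `noum`.

```python
import re

# Patterns inferred from the sample output only; treat them as assumptions.
TARGET_RE = re.compile(r'target=([\w-]+): "(.*)"')
IMPROVED_RE = re.compile(r"IMPROVED \+([\d.]+)% \(([\d.]+)% -> ([\d.]+)%\)")
REVERTED_RE = re.compile(r"reverted \(delta=(-?[\d.]+)%\)")


def summarize(log: str) -> list[dict]:
    """Return one record per iteration: target, description, verdict, delta."""
    records = []
    current = None
    for line in log.splitlines():
        if (m := TARGET_RE.search(line)):
            # A target line opens a new iteration record.
            current = {"target": m.group(1), "description": m.group(2)}
        elif current and (m := IMPROVED_RE.search(line)):
            records.append({**current, "verdict": "improved", "delta": float(m.group(1))})
            current = None
        elif current and (m := REVERTED_RE.search(line)):
            records.append({**current, "verdict": "reverted", "delta": float(m.group(1))})
            current = None
    return records


SAMPLE = """\
# target=system-prompt: "Fix empty result handling"
# IMPROVED +6.8% (52.3% -> 59.1%)
# target=examples: "Add dependency query patterns"
# reverted (delta=-4.5%)
"""

for record in summarize(SAMPLE):
    print(record["target"], record["verdict"], record["delta"])
```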
Signals from Ask Sessions
Benchmarks are not the only input. The optimizer's meta-prompt also includes a {{ask-insights}} block aggregated from real Ask sessions in the meta database. Three streams feed it:
- Agent self-reflection. When the Ask agent emits `:missing-attributes`, `:quality-issues`, and `:suggested-queries` on a session, those are aggregated by frequency. "Function dependencies (reported 10×)" becomes a signal to add a schema attribute. "Empty commit messages (reported 7×)" signals a data-quality issue.
- Telemetry queries. `ask-empty-results` surfaces queries that returned nothing, which point to gaps in the data model. `ask-popular-queries` surfaces patterns the LLM writes most often, which are candidates for named queries with optimized descriptions in the system prompt. `ask-error-steps` surfaces parse failures, which are signals for prompt improvements.
- Explicit feedback. Thumbs-up/thumbs-down on a past session is recorded on `:ask.session/feedback`. Negative feedback is surfaced first in the meta-prompt.
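To make the aggregation concrete, here is a rough Python sketch of how session signals might be folded into an insights block: reported gaps are counted by frequency and negatively rated sessions are listed first. The session dictionary shape, the key names, and `build_ask_insights` are all hypothetical; the actual meta database schema and rendering are not shown in this document.

```python
from collections import Counter


def build_ask_insights(sessions: list[dict], top_n: int = 3) -> str:
    """Render a plain-text insights block from session records.

    Each session is assumed to be a dict with optional keys
    'missing-attributes' (list of str), 'feedback' ('up' or 'down'),
    and 'question' (str). This shape is an assumption for the example.
    """
    missing = Counter()
    for session in sessions:
        missing.update(session.get("missing-attributes", []))

    lines = []
    # Negative feedback goes first, mirroring the ordering described above.
    for session in sessions:
        if session.get("feedback") == "down":
            lines.append(f"Negative feedback: {session.get('question', '(unknown)')}")
    # Then the most frequently reported gaps, with their counts.
    for attribute, count in missing.most_common(top_n):
        lines.append(f"{attribute} (reported {count}x)")
    return "\n".join(lines)


sessions = [
    {"question": "Who calls parse?", "missing-attributes": ["function dependencies"]},
    {
        "question": "Why did the build break?",
        "feedback": "down",
        "missing-attributes": ["function dependencies", "test coverage"],
    },
]
print(build_ask_insights(sessions))
```

Counting by frequency keeps one-off complaints from dominating the meta-prompt, while putting negative feedback first gives the optimizer the strongest user signal before the aggregated ones.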
The result is a closed loop: agents try to answer questions, tell the system what's missing or broken, and introspect proposes the fix on the next run. Real questions are a higher-quality signal than synthetic benchmarks because they reflect what people actually need.
Run `noum introspect <repo> --help` for the full flag set, or invoke the MCP equivalents: `noumenon_introspect`, `noumenon_introspect_start`, and `noumenon_introspect_status`. The Ask agent's side of the loop is described in Ask.