Ask | Noumenon

The Ask agent answers questions about a repository by iterating over the knowledge graph. It starts warm: a TF-IDF retrieval and a routing-model hint pre-load scope and method before the first LLM call. Every step is recorded so the system can improve itself from real usage.

The Loop

noum ask <repo> "<question>" runs an iterative agent against the knowledge graph. On each turn the LLM proposes the next move and the daemon executes it. Five tools are available:

:query run a named or raw Datalog query.
:schema inspect attributes for a namespace.
:rules expand a derived rule used in queries.
:reflect annotate this session with what's missing or broken.
:answer emit the final answer and stop.

Each turn picks one tool. The loop is bounded by an iteration budget (default 10). Every step is recorded on the session entity in the meta database, so the full reasoning trace is queryable after the fact.
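The loop above can be sketched as a bounded propose/execute cycle. This is a minimal Python illustration, not Noumenon's implementation. `run_ask`, `Session`, the `{"tool": ..., "args": ...}` move shape, and the `"text"` argument of `:answer` are all hypothetical, and a plain list stands in for the session entity in the meta database.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# The five tools named above; any other proposal counts as an unparseable step.
TOOLS = {":query", ":schema", ":rules", ":reflect", ":answer"}

@dataclass
class Session:
    """Hypothetical stand-in for the session entity in the meta database."""
    question: str
    steps: list = field(default_factory=list)
    answer: Optional[str] = None

def run_ask(question: str,
            propose: Callable[[Session], dict],
            execute: Callable[[str, dict], object],
            budget: int = 10) -> Session:
    """Bounded propose/execute loop: one tool per turn, every step recorded."""
    session = Session(question)
    for turn in range(budget):
        move = propose(session)  # the LLM proposes {"tool": ..., "args": ...}
        tool, args = move.get("tool"), move.get("args", {})
        if tool not in TOOLS:
            session.steps.append({"turn": turn, "tool": tool, "error": "unparseable"})
            continue
        if tool == ":answer":
            session.answer = args.get("text")
            session.steps.append({"turn": turn, "tool": tool, "args": args})
            break
        result = execute(tool, args)  # the daemon runs the tool
        session.steps.append({"turn": turn, "tool": tool, "args": args, "result": result})
    return session  # answer is still None if the budget ran out
```

Passing a scripted `propose` function (for example, one that returns `:schema` and then `:answer`) exercises the loop without any LLM. In this sketch, a session whose `answer` is still `None` corresponds roughly to what `ask-unanswered` reports, and the `"unparseable"` steps to `ask-error-steps`.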

Two-Tier Warm Start

Before the first iteration, the agent receives two hints. They answer different questions:

vector-seed (where to look). A TF-IDF cosine-similarity search of the question against per-file and per-component summaries. The top fifteen results land in the system message as relevant entities. The index is built at noum embed time and cached on disk as a Nippy file at <db-dir>/<db-name>/tfidf-index.nippy.
model-hint (which queries to try). A small feed-forward neural net trained on past benchmark and session data emits the top three Datalog patterns to try first. Pure-Clojure inference, runs in microseconds, zero token cost. Configured by resources/model/config.edn and retrained by the introspect :train target. The agent is free to ignore the suggestion; when no model is available the hint is omitted and behavior is unchanged.
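The model-hint tier amounts to a tiny forward pass. This Python sketch assumes one hidden ReLU layer, a softmax output, and bag-of-words features over a fixed vocabulary. `featurize`, `model_hint`, the weight layout, and the pattern names are hypothetical. The real network is pure Clojure, is configured in resources/model/config.edn, and may use different features and layers.

```python
import math
import re

def featurize(question, vocab):
    """Hypothetical bag-of-words features: 1.0 if a vocab word appears in the question."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    return [1.0 if v in words else 0.0 for v in vocab]

def dense(x, weights, bias):
    # One row of weights per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def model_hint(question, model, k=3):
    """Top-k Datalog patterns with probabilities, or None when no model is loaded."""
    if model is None:
        return None  # the hint is simply omitted
    x = featurize(question, model["vocab"])
    hidden = [max(0.0, h) for h in dense(x, model["w1"], model["b1"])]  # ReLU
    probs = softmax(dense(hidden, model["w2"], model["b2"]))
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(model["patterns"][i], probs[i]) for i in ranked[:k]]
```

Inference here is two small matrix-vector products with no LLM call. That makes microsecond latency and zero token cost plausible, and returning `None` mirrors how the hint is dropped when no model is present.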

Together the two hints give the agent both scope (which entities matter) and method (which queries to start with) before any LLM call runs. In benchmarks, the TF-IDF tier alone reaches roughly three quarters of the full-KG mean accuracy. Because the Ask agent starts every session from that retrieval before running its loop, it gets that head start built in. See the embedded layer for the comparison.
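One way to picture how both hints reach the agent is a single system message, with scope first and method second. The layout, headings, and `warm_start_message` name below are hypothetical; only the limits (fifteen seeds, three patterns) and the rule that the model hint is omitted when no model exists come from the text above.

```python
from typing import List, Optional, Tuple

def warm_start_message(question: str,
                       seeds: List[Tuple[str, float]],
                       hint: Optional[List[str]]) -> str:
    """Fold both hints into one system message: scope first, then method."""
    lines = [f"Question: {question}", "Relevant entities:"]
    lines += [f"- {entity} ({score:.2f})" for entity, score in seeds[:15]]  # vector-seed
    if hint:  # model-hint; skipped entirely when no model is available
        lines.append("Queries to try first:")
        lines += [f"- {pattern}" for pattern in hint[:3]]
    return "\n".join(lines)
```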

When You Don't Want the Full Loop

The TF-IDF index is also exposed directly as noumenon_search (MCP) and noum search (CLI). It returns ranked file and component matches in milliseconds with zero LLM calls. Use it when you want "which files are about X" without paying for an Ask session.
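A TF-IDF cosine search of this kind fits in a few dozen lines. The Python sketch below is a generic version, not Noumenon's code: the tokenizer, the smoothed IDF formula, and the function names are assumptions, and the on-disk Nippy cache is left out. Summaries are keyed by hypothetical entity ids such as file paths.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def vectorize(tokens, idf):
    """L2-normalized TF-IDF vector, ignoring terms the index has never seen."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def build_index(docs):
    """docs maps entity id -> summary text. Returns (idf, normalized vectors)."""
    tokenized = {k: tokenize(v) for k, v in docs.items()}
    n = len(docs)
    df = Counter(t for toks in tokenized.values() for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}  # smoothed IDF
    return idf, {k: vectorize(toks, idf) for k, toks in tokenized.items()}

def search(query, index, k=15):
    """Rank entities by cosine similarity (dot product of unit vectors)."""
    idf, vectors = index
    q = vectorize(tokenize(query), idf)
    scores = [(doc, sum(w * vec.get(t, 0.0) for t, w in q.items()))
              for doc, vec in vectors.items()]
    return sorted((s for s in scores if s[1] > 0), key=lambda s: s[1], reverse=True)[:k]
```

Since the vectors are normalized once at build time, answering a query is a sparse dot product per entity. That is why this kind of lookup stays in the millisecond range without any LLM call.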

Agent Self-Reflection

Alongside its answer, the agent can emit a :reflect step: structured feedback the system uses to improve itself.

```clojure
{:tool :reflect
 :args {:missing-attributes ["function-level dependency graph"
                             "test-to-source file mapping"]
        :quality-issues     ["some commit messages are empty"
                             "author emails inconsistent across repos"]
        :suggested-queries  ["files by cyclomatic complexity"
                             "test coverage per source file"]
        :notes              "The schema has file-level analysis but no function-level granularity."}}
```

Each field is persisted on the session entity in the meta database (:ask.session/missing-attributes, :ask.session/quality-issues, :ask.session/suggested-queries, :ask.session/agent-notes). Aggregated by frequency across all sessions, these fields feed the introspect optimizer's {{ask-insights}} block; see Signals from Ask Sessions for the full loop.
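The persist-then-aggregate path can be sketched in Python. The attribute names come from the text above; the dict-based session, `persist_reflect`, and `aggregate` are hypothetical simplifications of what the meta database and the introspect optimizer actually do.

```python
from collections import Counter

# :reflect argument keys mapped to the session attributes named above.
REFLECT_ATTRS = {
    "missing-attributes": "ask.session/missing-attributes",
    "quality-issues": "ask.session/quality-issues",
    "suggested-queries": "ask.session/suggested-queries",
    "notes": "ask.session/agent-notes",
}

def persist_reflect(session, args):
    """Copy each :reflect field onto a (hypothetical) dict-shaped session."""
    for key, attr in REFLECT_ATTRS.items():
        if key in args:
            session[attr] = args[key]
    return session

def aggregate(sessions, attr, top=10):
    """Most frequent values of one attribute across all sessions."""
    def values(s):
        v = s.get(attr, [])
        return [v] if isinstance(v, str) else v  # agent-notes is a single string
    return Counter(item for s in sessions for item in values(s)).most_common(top)
```

Counting exact strings is the simplest possible grouping. Near-duplicate phrasings from different sessions would land in separate buckets unless something normalizes them first.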

Telemetry Queries

The session store is queryable. Every named query below ships in the catalog and runs against the meta database via noum query, noumenon_query (MCP), or the HTTP API.

| Query | What it surfaces |
|---|---|
| `ask-empty-results` | Datalog queries the agent ran that returned nothing. |
| `ask-unanswered` | Sessions where the agent exhausted its iteration budget without an answer. |
| `ask-error-steps` | Steps where the LLM emitted output the agent couldn't parse. |
| `ask-popular-queries` | Datalog patterns the agent writes most often. Candidates for new named queries. |
| `ask-token-cost` | Spend per session: input/output tokens and dollars. |
| `ask-by-caller` | Spend split by channel and caller (CLI, MCP, HTTP, agent vs human). |
| `ask-missing-attributes` | What the agent reports as missing from the data model. |
| `ask-quality-issues` | Data-quality problems the agent observed. |
| `ask-suggested-queries` | Named queries the agent thinks should exist. |
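The named queries themselves are Datalog over the meta database, but the arithmetic behind `ask-token-cost` and `ask-by-caller` is simple enough to sketch. In this Python illustration, the per-million-token prices and the flat session records are hypothetical. They are not Noumenon's actual rates or schema.

```python
from collections import defaultdict

# Hypothetical prices in dollars per million tokens; real rates depend on the model.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}

def session_cost(input_tokens, output_tokens, prices=PRICE_PER_MTOK):
    """Dollar cost of one session from its token counts."""
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000

def spend_by_caller(sessions):
    """Total spend grouped by (channel, caller), in the spirit of ask-by-caller."""
    totals = defaultdict(float)
    for s in sessions:
        totals[(s["channel"], s["caller"])] += session_cost(s["input_tokens"], s["output_tokens"])
    return dict(totals)
```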

Browse the full catalog at /queries/.

The agent improves the system by being watched. Every session is data; the introspect loop reads that data; the next session answers a little better. The price is that real questions are stored verbatim in the meta database. See Data Safety for what that does and doesn't include.