Experimental, early beta. Data model and interfaces are unstable; expect breaking changes between releases.

Data Safety

Real codebases have real secrets in them. Noumenon blocks well-known sensitive files from analysis, skips binary assets automatically, and records every LLM call so you can see exactly what was sent and what it cost.

Sensitive Files

Secrets live in git history as well as in the working tree. Noumenon tracks the existence of sensitive files (so commit history stays accurate) but never reads their contents. The filter runs before any file content is loaded for analysis or import-graph extraction.

| Pattern | Examples | Notes |
|---|---|---|
| Filename starts with .env | | Anything that looks like a dotenv file. Allowlisted: .env.example, .env.sample, .env.template. |
| Extensions | *.pem, *.key, *.p12, *.pfx, *.keystore, *.jks, *.cert | Private keys and certificates. |
| Exact filenames | .npmrc, .pypirc, .netrc, .htpasswd, .pgpass, credentials.json, token.json | Well-known credential files. |
| Path segments | .ssh/ | Anything inside a .ssh directory at any depth. |
| SSH key prefixes | id_rsa*, id_ed25519*, id_ecdsa* | OpenSSH private-key files. |

If your repo has secrets in a non-standard path, the existing list is easy to extend. Patterns live in src/noumenon/files.clj under sensitive-extensions, sensitive-basenames, and sensitive-path-segments.
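
The rules in the table can be read as a single predicate over a file path. The sketch below is an illustrative Python translation, not the Clojure code in src/noumenon/files.clj. The constant and function names are hypothetical, and exact, case-sensitive matching is an assumption.

```python
from pathlib import PurePosixPath

# Values copied from the table above; the names are hypothetical.
SENSITIVE_EXTENSIONS = {".pem", ".key", ".p12", ".pfx", ".keystore", ".jks", ".cert"}
SENSITIVE_BASENAMES = {
    ".npmrc", ".pypirc", ".netrc", ".htpasswd", ".pgpass",
    "credentials.json", "token.json",
}
SENSITIVE_PATH_SEGMENTS = {".ssh"}
SSH_KEY_PREFIXES = ("id_rsa", "id_ed25519", "id_ecdsa")
DOTENV_ALLOWLIST = {".env.example", ".env.sample", ".env.template"}


def is_sensitive(path: str) -> bool:
    """Return True if the path matches any rule in the table."""
    p = PurePosixPath(path)
    name = p.name
    # Dotenv files, except the allowlisted templates.
    if name.startswith(".env") and name not in DOTENV_ALLOWLIST:
        return True
    # Private keys and certificates, by extension.
    if p.suffix in SENSITIVE_EXTENSIONS:
        return True
    # Well-known credential files, by exact filename.
    if name in SENSITIVE_BASENAMES:
        return True
    # Anything inside a .ssh directory at any depth.
    if any(part in SENSITIVE_PATH_SEGMENTS for part in p.parts[:-1]):
        return True
    # OpenSSH key files such as id_rsa or id_ed25519.pub.
    return name.startswith(SSH_KEY_PREFIXES)
```

Under these assumptions, extending the list for a non-standard path means adding an entry to one of the sets rather than changing the predicate.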

What Gets Analyzed

Only files with a recognized programming-language extension are candidates for the analyze stage. Everything else (images, archives, fonts, binaries, compiled artifacts, lockfiles, config formats Noumenon doesn't understand) stays in the file index but never goes to the LLM.

The recognized set covers the usual suspects: Clojure, Python, JS/TS, Rust, Java, C#, C/C++, Go, Elixir, Erlang, Ruby, Swift, Kotlin, Scala, Haskell, OCaml, Lua, R, Perl, PHP, Terraform, Protobuf, GraphQL, plus shell, SQL, HTML, CSS, JSON, YAML, XML, TOML, EDN, and the MSBuild project formats. If your language isn't on the list, the file is skipped silently.
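
As a rough illustration of the extension gate, the sketch below splits a file list into analyze candidates and index-only files. The extension set is a small, illustrative subset chosen for the example, not Noumenon's actual table, and the function name is hypothetical.

```python
from pathlib import PurePosixPath

# Illustrative subset only; the real recognized set is larger.
RECOGNIZED_EXTENSIONS = {".clj", ".py", ".ts", ".rs", ".go", ".sql", ".yaml", ".toml"}


def partition(paths):
    """Split paths into (analyze candidates, index-only files)."""
    candidates, index_only = [], []
    for path in paths:
        ext = PurePosixPath(path).suffix.lower()
        if ext in RECOGNIZED_EXTENSIONS:
            candidates.append(path)
        else:
            # Still tracked in the file index, but never sent to the LLM.
            index_only.append(path)
    return candidates, index_only
```

Files with an unrecognized extension land in the second list without raising or logging, which mirrors the silent skip described above.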

Perforce clones get a separate exclusion pass for game-engine binaries (Unreal .uasset, Unity .prefab, .fbx/.png/.wav/.mp4 families) so the working tree stays small. See Source control for the full list and how to override it.

What's Actually Sent to the LLM

  • Analyze: one file's source at a time, only if it passes the sensitive-file filter and has a recognized language. Output is structured metadata (complexity, code smells, segments), not free-form text.
  • Synthesize: summaries from earlier analyze passes plus the directory structure. No raw source. Hierarchical map-reduce keeps prompts bounded.
  • Ask: a TF-IDF-seeded short list of relevant files, plus the question. The agent can request specific files by path, never the whole tree.
  • Introspect: prompts, examples, and benchmark scores. No user code beyond the fixed benchmark question set.
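
To make the Ask seeding step concrete, here is a minimal, generic TF-IDF ranking in Python. It assumes each file is represented by some text (a summary, for instance); the source does not say what Noumenon actually indexes or how it weights terms, so the function names, the smoothing, and the cutoff k are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into simple word tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def shortlist(question, docs, k=3):
    """Rank paths in docs (path -> text) by TF-IDF overlap with the question."""
    counts_by_path = {path: Counter(tokenize(text)) for path, text in docs.items()}
    n = len(counts_by_path)
    # Document frequency: in how many files does each term appear?
    df = Counter()
    for counts in counts_by_path.values():
        df.update(counts.keys())
    query_terms = set(tokenize(question))
    scores = {}
    for path, counts in counts_by_path.items():
        total = sum(counts.values()) or 1
        score = sum(
            (counts[t] / total) * math.log((1 + n) / (1 + df[t]))
            for t in query_terms
            if t in counts
        )
        if score > 0:
            scores[path] = score
    # Highest score first; ties broken by path for stable output.
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]
```

Only the short list and the question would go into the prompt in this sketch; files that score zero are left out entirely.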

Branch Deltas Stay Local

Experimental: interfaces may change between releases.

When a developer is on a feature branch, Noumenon materializes a sparse delta database under ~/.noumenon/deltas/ containing only the files that differ from the hosted trunk basis. The delta lives on the developer's own machine, never on the shared instance. Federated queries combine trunk + delta in a single HTTP roundtrip on the local-mode daemon, so the working-branch view never leaves the laptop. See Source control for the full picture.

Deployment Shapes

There is no runtime-mode toggle. The daemon's bind address is the deployment signal:

  • Local (bind 127.0.0.1, the default). Fine for laptop use. LLM credentials resolve from env vars first, then fall back to ~/.noumenon/credentials — so noum setup is enough; you don't need to source anything in your shell.
  • Shared service (bind non-loopback, e.g. 0.0.0.0). The daemon disables the file-credentials fallback at startup, so LLM credentials must come from env vars on the host, and a user's ~/.noumenon/credentials cannot leak into a multi-tenant deployment. The daemon also warns if NOUMENON_LLM_BASE_URL uses http://; in production, terminate TLS at a reverse proxy and make the upstream call over https://. Client authentication is unchanged (NOUMENON_TOKEN and the Datomic-stored token table).
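
The two shapes can be sketched as a small decision function keyed on the bind address. This is a Python illustration of the rules above, not the daemon's code. The function names are hypothetical, the credential values are passed in as plain arguments because the source does not name the credential env var, and treating an unparseable hostname as non-loopback is an assumption.

```python
import ipaddress


def is_loopback(bind: str) -> bool:
    """True for loopback addresses such as 127.0.0.1 or ::1."""
    if bind == "localhost":
        return True
    try:
        return ipaddress.ip_address(bind).is_loopback
    except ValueError:
        # Assumption: unknown hostnames get the stricter, shared-service rules.
        return False


def resolve_llm_credential(bind, env_value, file_value):
    """Return (credential, source): env vars first, file fallback only on loopback."""
    if env_value:
        return env_value, "env"
    if is_loopback(bind) and file_value:
        return file_value, "file"
    return None, None


def startup_warnings(bind, base_url):
    """Warn when a shared service points at a plain-http LLM endpoint."""
    if not is_loopback(bind) and base_url and base_url.lower().startswith("http://"):
        return ["NOUMENON_LLM_BASE_URL uses http://; prefer https:// in production"]
    return []
```

Keying everything on the bind address keeps the behavior consistent with the absence of a runtime-mode toggle: there is one input, and both the fallback and the warning derive from it.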

If you want multi-model flexibility, point NOUMENON_LLM_BASE_URL at a router (OpenRouter, self-hosted LiteLLM, etc.) and let it handle provider selection. Noumenon itself never routes.

See Run as a shared service for how to wire these into a Docker deployment.

Cost Transparency

Every LLM call is recorded in Datomic with the model, the input token count, the output token count, and an estimated dollar cost. Provider and model-source provenance go on the transaction too. Per-file telemetry streams while analyze runs; totals land in the graph when it finishes.

Three named queries cover the spend axis:

  • llm-cost-total sums input tokens, output tokens, and dollars across every recorded call.
  • llm-cost-by-model groups the same totals by model.
  • llm-cost-by-file gives per-file analyze cost, most expensive first.

Each runs through noum query <name> <repo>, noumenon_query over MCP, or the HTTP API. The dollar figure is an estimate from a built-in price table — directional, not invoice-grade.
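
The arithmetic behind the three queries is simple aggregation over recorded calls. The Python sketch below shows the idea with made-up records; the price table, model names, record fields, and function names are all hypothetical and do not reflect Noumenon's schema or its built-in prices.

```python
from collections import defaultdict

# Hypothetical prices in USD per million (input, output) tokens.
PRICE_PER_MTOK = {"model-a": (3.00, 15.00), "model-b": (0.25, 1.25)}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the dollar cost of one call from the price table."""
    in_price, out_price = PRICE_PER_MTOK[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def summarize(calls, key=None):
    """Sum tokens and estimated dollars, optionally grouped by a record field."""
    totals = defaultdict(lambda: {"input": 0, "output": 0, "usd": 0.0})
    for call in calls:
        bucket = totals[call[key] if key else "total"]
        bucket["input"] += call["input"]
        bucket["output"] += call["output"]
        bucket["usd"] += estimate_cost(call["model"], call["input"], call["output"])
    return dict(totals)


def most_expensive_first(grouped):
    """Order group names by estimated cost, highest first."""
    return sorted(grouped, key=lambda name: -grouped[name]["usd"])
```

Calling summarize(calls) corresponds to the total view, summarize(calls, "model") to the per-model view, and most_expensive_first(summarize(calls, "file")) to the per-file ordering. Because the dollars come from a static table, the result is only as accurate as that table.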

If you find a sensitive-file pattern Noumenon should recognize but doesn't, please open an issue. The blocklist is meant to be conservative, but it's also community-driven.