Data Safety
Real codebases have real secrets in them. Noumenon blocks well-known sensitive files from analysis, skips binary assets automatically, and records every LLM call so you can see exactly what was sent and what it cost.
Sensitive Files
Secrets turn up in working trees and, especially, in git history. Noumenon tracks the existence of sensitive files (so commit history stays accurate) but never reads their contents. The filter runs before any file content is loaded for analysis or import-graph extraction.
| Pattern | Examples | Notes |
|---|---|---|
| Filename starts with | .env | Anything that looks like a dotenv file. Allowlisted: .env.example, .env.sample, .env.template. |
| Extensions | *.pem, *.key, *.p12, *.pfx, *.keystore, *.jks, *.cert | Private keys and certificates. |
| Exact filenames | .npmrc, .pypirc, .netrc, .htpasswd, .pgpass, credentials.json, token.json | Well-known credential files. |
| Path segments | .ssh/ | Anything inside a .ssh directory at any depth. |
| SSH key prefixes | id_rsa*, id_ed25519*, id_ecdsa* | OpenSSH private-key files. |
If your repo has secrets in a non-standard path, the existing list is easy to extend. Patterns live in src/noumenon/files.clj under sensitive-extensions, sensitive-basenames, and sensitive-path-segments.
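To make the matching rules concrete, here is a minimal Python sketch of the table above. It is an illustration only, not Noumenon's implementation (which lives in the Clojure file mentioned above). The function name is_sensitive and the constant names are hypothetical, and the order in which the rules are checked is an assumption.

```python
from pathlib import PurePosixPath

# Values copied from the table above; the Python names are illustrative only.
ENV_ALLOWLIST = {".env.example", ".env.sample", ".env.template"}
SENSITIVE_EXTENSIONS = {".pem", ".key", ".p12", ".pfx", ".keystore", ".jks", ".cert"}
SENSITIVE_BASENAMES = {
    ".npmrc", ".pypirc", ".netrc", ".htpasswd", ".pgpass",
    "credentials.json", "token.json",
}
SENSITIVE_PATH_SEGMENTS = {".ssh"}
SSH_KEY_PREFIXES = ("id_rsa", "id_ed25519", "id_ecdsa")


def is_sensitive(path: str) -> bool:
    """Return True if a repo-relative path matches any rule in the table."""
    p = PurePosixPath(path)
    name = p.name
    # Dotenv-style files are blocked unless explicitly allowlisted.
    if name.startswith(".env"):
        return name not in ENV_ALLOWLIST
    # Private-key and certificate extensions.
    if p.suffix in SENSITIVE_EXTENSIONS:
        return True
    # Well-known credential files, matched on the exact basename.
    if name in SENSITIVE_BASENAMES:
        return True
    # Anything inside a .ssh directory, at any depth.
    if any(segment in SENSITIVE_PATH_SEGMENTS for segment in p.parts[:-1]):
        return True
    # OpenSSH key files such as id_rsa or id_ed25519.pub.
    return name.startswith(SSH_KEY_PREFIXES)
```

In a pipeline shaped like the one described here, a check of this kind would run on the path alone, before any file content is opened.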
What Gets Analyzed
Only files with a recognized programming-language extension are candidates for the analyze stage. Everything else (images, archives, fonts, binaries, compiled artifacts, lockfiles, config formats Noumenon doesn't understand) stays in the file index but never goes to the LLM.
The recognized set covers the usual suspects: Clojure, Python, JS/TS, Rust, Java, C#, C/C++, Go, Elixir, Erlang, Ruby, Swift, Kotlin, Scala, Haskell, OCaml, Lua, R, Perl, PHP, Terraform, Protobuf, GraphQL, plus shell, SQL, HTML, CSS, JSON, YAML, XML, TOML, EDN, and the MSBuild project formats. If your language isn't on the list, the file is skipped silently.
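The candidate check can be pictured as a plain extension lookup. The sketch below is a hedged illustration: RECOGNIZED_EXTENSIONS is a small hypothetical subset chosen for the example, not Noumenon's actual list, and the case-insensitive comparison is an assumption.

```python
from pathlib import PurePosixPath

# Hypothetical subset for illustration; the real recognized set is larger.
RECOGNIZED_EXTENSIONS = {".clj", ".py", ".js", ".ts", ".rs", ".java", ".go", ".rb"}


def is_analyze_candidate(path: str) -> bool:
    """True if the file extension is in the recognized set."""
    return PurePosixPath(path).suffix.lower() in RECOGNIZED_EXTENSIONS


def partition(paths):
    """Split paths into (analyze candidates, index-only files)."""
    candidates = [p for p in paths if is_analyze_candidate(p)]
    index_only = [p for p in paths if not is_analyze_candidate(p)]
    return candidates, index_only
```

Both groups stay in the file index; only the first would ever be considered for an LLM call.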
Perforce clones get a separate exclusion pass for game-engine binaries (Unreal .uasset, Unity .prefab, .fbx/.png/.wav/.mp4 families) so the working tree stays small. See Source control for the full list and how to override it.
What's Actually Sent to the LLM
- Analyze: one file's source at a time, only if it passes the sensitive-file filter and has a recognized language. Output is structured metadata (complexity, code smells, segments), not free-form text.
- Synthesize: summaries from earlier analyze passes plus the directory structure. No raw source. Hierarchical map-reduce keeps prompts bounded.
- Ask: a TF-IDF-seeded short list of relevant files, plus the question. The agent can request specific files by path, never the whole tree.
- Introspect: prompts, examples, and benchmark scores. No user code beyond the fixed benchmark question set.
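For readers unfamiliar with TF-IDF seeding as mentioned under Ask, the sketch below shows the general technique: score each file's indexed text against the question and keep a short list. It is a generic textbook version, not Noumenon's ranking code. The function names, the smoothed IDF formula, the limit parameter, and the idea of scoring per-file summaries are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into simple word tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def rank_files(question, texts_by_path, limit=3):
    """Return up to `limit` paths whose text best matches the question."""
    docs = {path: Counter(tokenize(text)) for path, text in texts_by_path.items()}
    n = len(docs)
    # Document frequency: how many files contain each term at least once.
    df = Counter(term for counts in docs.values() for term in counts)
    query_terms = set(tokenize(question))
    scores = {}
    for path, counts in docs.items():
        total = sum(counts.values()) or 1
        score = 0.0
        for term in query_terms:
            if counts[term]:
                tf = counts[term] / total
                idf = math.log((1 + n) / (1 + df[term])) + 1  # smoothed IDF
                score += tf * idf
        if score > 0:
            scores[path] = score
    # Highest score first; ties broken by path for a stable order.
    ranked = sorted(scores, key=lambda path: (-scores[path], path))
    return ranked[:limit]
```

Only the resulting short list, plus the question, would seed the prompt; files with no overlapping terms never make the list.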
Branch Deltas Stay Local
Experimental: interfaces may change between releases.
When a developer is on a feature branch, Noumenon materializes a sparse delta database under ~/.noumenon/deltas/ containing only the files that differ from the hosted trunk basis. The delta lives on the developer's own machine, never on the shared instance. Federated queries combine trunk + delta in a single HTTP roundtrip on the local-mode daemon, so the working-branch view never leaves the laptop. See Source control for the full picture.
Deployment Shapes
There is no runtime-mode toggle. The daemon's bind address is the deployment signal:
- Local (bind 127.0.0.1, the default). Fine for laptop use. LLM credentials resolve from env vars first, then fall back to ~/.noumenon/credentials, so noum setup is enough; you don't need to source anything in your shell.
- Shared service (bind non-loopback, e.g. 0.0.0.0). The daemon disables the file-credentials fallback at startup, so LLM credentials must come from env vars on the host. A user's ~/.noumenon/credentials cannot leak into a multi-tenant deployment. The daemon also warns if NOUMENON_LLM_BASE_URL is an http:// URL; terminate TLS at a reverse proxy and front the upstream call with https:// in production. Authentication of the client is unchanged (NOUMENON_TOKEN and the Datomic-stored token table).
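The rule that the bind address decides credential resolution can be sketched as follows. This is a hedged illustration of the behavior described above, not the daemon's code. The function names and the NOUMENON_LLM_API_KEY variable are hypothetical; only NOUMENON_LLM_BASE_URL comes from this page. Treating an unparseable bind address as non-loopback is also an assumption, chosen because it is the conservative default.

```python
import ipaddress


def is_loopback_bind(bind_address: str) -> bool:
    """True for loopback IPs such as 127.0.0.1 or ::1."""
    try:
        return ipaddress.ip_address(bind_address).is_loopback
    except ValueError:
        # Not an IP literal: assume a shared deployment (conservative).
        return False


def resolve_llm_credentials(bind_address, env, file_credentials):
    """Env vars win; the credentials file is a fallback only on loopback."""
    key = env.get("NOUMENON_LLM_API_KEY")  # hypothetical variable name
    if key:
        return key
    if is_loopback_bind(bind_address):
        return file_credentials
    return None  # shared service: file fallback is disabled


def base_url_warning(env):
    """Return a warning string if the LLM base URL is plain http://."""
    url = env.get("NOUMENON_LLM_BASE_URL", "")
    if url.startswith("http://"):
        return "NOUMENON_LLM_BASE_URL uses http://; prefer https:// in production"
    return None
```

The point of keying on the bind address is that there is nothing to misconfigure: a daemon reachable from other machines cannot pick up a personal credentials file, whatever else is set.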
If you want multi-model flexibility, point NOUMENON_LLM_BASE_URL at a router (OpenRouter, self-hosted LiteLLM, etc.) and let it handle provider selection. Noumenon itself never routes.
See Run as a shared service for how to wire these into a Docker deployment.
Cost Transparency
Every LLM call is recorded in Datomic with the model, the input token count, the output token count, and an estimated dollar cost. Provider and model-source provenance go on the transaction too. Per-file telemetry streams while analyze runs; totals land in the graph when it finishes.
Three named queries cover the spend axis:
- llm-cost-total sums input tokens, output tokens, and dollars across every recorded call.
- llm-cost-by-model groups the same totals by model.
- llm-cost-by-file gives per-file analyze cost, most expensive first.
Each runs through noum query <name> <repo>, noumenon_query over MCP, or the HTTP API. The dollar figure is an estimate from a built-in price table — directional, not invoice-grade.
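To show what the three queries compute, here is a small in-memory Python sketch over hypothetical call records. It mirrors the aggregations described above but is not how Noumenon stores or queries the data (that happens in Datomic). The LlmCall record, its field names, and the function names are assumptions, and every record is assumed to come from an analyze run.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LlmCall:
    """Hypothetical shape of one recorded LLM call."""
    model: str
    file: str
    input_tokens: int
    output_tokens: int
    cost_usd: float  # estimate, not an invoice figure


def cost_total(calls):
    """Sum tokens and estimated dollars across every call."""
    return {
        "input_tokens": sum(c.input_tokens for c in calls),
        "output_tokens": sum(c.output_tokens for c in calls),
        "cost_usd": sum(c.cost_usd for c in calls),
    }


def cost_by_model(calls):
    """Same totals, grouped by model name."""
    groups = defaultdict(list)
    for call in calls:
        groups[call.model].append(call)
    return {model: cost_total(group) for model, group in groups.items()}


def cost_by_file(calls):
    """Per-file estimated cost, most expensive first."""
    totals = defaultdict(float)
    for call in calls:
        totals[call.file] += call.cost_usd
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

A per-file view like the last one is usually the quickest way to spot a single oversized file dominating an analyze run.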
If you find a sensitive-file pattern Noumenon should recognize but doesn't, please open an issue. The blocklist is meant to be conservative, but it's also community-driven.