// IN-HOUSE BUILD · AGENT OBSERVABILITY
AI governance fails when nobody can answer the question: what did the agent do, and why? CloudNSite captures, retains, and reasons over every tool call across its agent stack, from the first input to the final action. This case study documents the audit substrate, the seven-layer memory, and the prompt-injection guard that runs on every memory write.
// Why most AI deployments fail audit
AI governance fails when evidence lives in screenshots, partial logs, and memory nobody can trace. AI compliance teams need durable records, not recollections from the build team. AI explainability starts with the ability to replay an action from input to effect.
The agent did something, the side effect happened, but the call itself was not retained. There is nothing to review when AI governance or AI compliance teams ask for evidence.
Even when calls are logged, the model output that led to each call is not. You cannot replay why the agent acted, so AI explainability turns into guesswork.
What the agent thought it knew came from somewhere. Most stacks cannot tell you from where, or whether the source is still trusted.
// Every tool call, on the record
Every tool call across the agent stack is captured to a tamper-evident log. The schema is small and stable on purpose. Querying it should not require a data engineer, which is why AI observability, LLM observability, agent tracing, and tool call logging share one substrate.
The exact prompt and context provided to the model that produced the call are retained. The record includes which memory rows were retrieved and which were not.
The model output that proposed the call is captured before any post-processing or schema validation. This is the thinking, not the cleaned-up version.
The log records which tool was chosen, from how many candidates, and with what alternative scores if the model emitted them. Agent tracing starts before the tool runs.
The exact arguments passed to the tool are schema-validated, normalized, and stored alongside the raw model output so the two can be diffed. Tool call logging keeps both the structured and raw forms.
The record stores what the tool returned, including errors, retries, and any retry reasoning. Failed calls are first-class citizens in AI observability and LLM observability.
The log records what changed in the world: which row was written, which message was sent, which file was modified. Every effect is linked back to the call that caused it.
Any decision can be replayed. Any audit question can be answered with a query, not a meeting.
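As a rough illustration of the record described above, here is a minimal Python sketch of a tamper-evident log built as a SHA-256 hash chain. The class and field names (`ToolCallRecord`, `AuditLog`, `candidate_count`, and so on) are hypothetical and chosen to mirror the items listed above; the source does not say which tamper-evidence mechanism CloudNSite uses, so the hash chain is an assumption.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import Any


@dataclass(frozen=True)
class ToolCallRecord:
    """One audit row. Field names are hypothetical."""
    model_input: str        # exact prompt and context, including retrieved memory
    raw_model_output: str   # output before post-processing or schema validation
    tool_selected: str      # which tool was chosen
    candidate_count: int    # how many tools were candidates
    arguments: dict         # validated, normalized arguments
    result: Any             # what the tool returned, including errors
    effects: list           # what changed: rows written, messages sent, files modified


class AuditLog:
    """Append-only log; each entry's hash covers the previous entry's hash."""
    GENESIS = "0" * 64

    def __init__(self) -> None:
        self._entries = []

    @staticmethod
    def _digest(record: dict, prev_hash: str) -> str:
        payload = json.dumps(record, sort_keys=True, default=str) + prev_hash
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, record: ToolCallRecord) -> str:
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        body = asdict(record)
        entry_hash = self._digest(body, prev_hash)
        self._entries.append({"record": body, "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = self.GENESIS
        for entry in self._entries:
            if entry["prev"] != prev:
                return False
            if self._digest(entry["record"], prev) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

Because each hash depends on the one before it, changing a stored argument after the fact makes `verify()` return `False` for the whole chain. A production log would also need durable storage and an externally anchored head hash, which this sketch leaves out.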
// Memory that remembers what matters
Agent memory is not one database with a vague recall prompt. LLM memory, durable summaries, structured facts, and AI agent memory each need a boundary so the agent knows what to trust, what to cite, and what to forget.
Entities and relationships live here. This is the layer that knows this customer belongs to this account, which belongs to this region. It is used when reasoning needs structure, not similarity.
Semantic embeddings of unstructured text live here for questions that are fuzzy rather than exact-match. Every retrieval is cited back to source so LLM memory can be reviewed.
Durable summaries of past interactions are written at conversation close, not at every turn. This is the agent memory layer that lets a new session feel like a continuation.
Facts with a schema live here: pricing, status flags, configuration, identity. This layer does not invent because it cannot write outside the structure.
The active session's working memory is fast to read, deliberately small, and cleared on session end. It keeps AI agent memory useful without turning temporary context into permanent belief.
Knowledge that one agent learned and the next agent should inherit lives here, scoped to a project or account. This keeps agents from rediscovering the same fact.
What was decided, by whom, under what hypothesis, and with what data lives here. This is the layer that lets a person reconstruct an outcome six months later.
Entity-explicit phrasing moved recall accuracy from roughly 50 percent to roughly 100 percent. The data shape is the discipline.
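The boundaries between layers can be sketched in Python as follows. The layer names in `Layer` are hypothetical labels for the seven layers described above, and `StructuredFacts` and `SessionMemory` are illustrative stand-ins, not CloudNSite's implementation. The sketch shows two of the stated properties: the structured layer cannot write outside its schema, and session memory is small and cleared on session end.

```python
from collections import deque
from enum import Enum


class Layer(Enum):
    GRAPH = "graph"            # entities and relationships
    VECTOR = "vector"          # semantic embeddings of unstructured text
    EPISODIC = "episodic"      # durable summaries written at conversation close
    STRUCTURED = "structured"  # facts with a schema
    SESSION = "session"        # working memory, cleared on session end
    SHARED = "shared"          # cross-agent knowledge, scoped to a project or account
    DECISION = "decision"      # what was decided, by whom, and with what data


# Whether a layer survives session end. Only SESSION is described as cleared.
DURABLE = {layer: layer is not Layer.SESSION for layer in Layer}


class StructuredFacts:
    """Schema-bound store: writes outside the schema are rejected."""

    def __init__(self, schema: dict) -> None:
        self._schema = schema  # field name -> required Python type
        self._rows = {}

    def write(self, entity_id: str, field_name: str, value) -> None:
        expected = self._schema.get(field_name)
        if expected is None:
            raise KeyError(f"field {field_name!r} is not in the schema")
        if not isinstance(value, expected):
            raise TypeError(f"{field_name!r} expects {expected.__name__}")
        self._rows[(entity_id, field_name)] = value

    def read(self, entity_id: str, field_name: str):
        return self._rows[(entity_id, field_name)]


class SessionMemory:
    """Small working memory that is dropped when the session ends."""

    def __init__(self, max_items: int = 32) -> None:
        self._items = deque(maxlen=max_items)  # oldest items fall off first

    def remember(self, item: str) -> None:
        self._items.append(item)

    def recall(self) -> list:
        return list(self._items)

    def end_session(self) -> None:
        self._items.clear()
```

Keying structured rows by an explicit `entity_id` is in the spirit of the entity-explicit phrasing noted above: each fact names the entity it is about instead of relying on context to resolve it.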
// What never makes it to memory
Memory is a write target. Anything an attacker can convince an agent to write becomes future-trusted context. The guard runs before the write, not after.
The guard scans every candidate memory write for prompt-injection patterns, untrusted-source provenance, and entity coherence with existing knowledge. Writes that fail are quarantined to a shadow log with the full reasoning. The guard is itself audited, the same way every tool call is. We watch the guard's false-positive rate weekly.
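A minimal Python sketch of a guard that runs before the write is shown below. Everything in it is an assumption made for illustration: the regex patterns, the `CandidateWrite` and `MemoryGuard` names, and the reduction of "entity coherence" to a simple known-entity check. A real guard would need far richer detection than three regexes.

```python
import re
from dataclasses import dataclass, field

# Illustrative patterns only; not an exhaustive or authoritative list.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"you are now\b", re.I),
    re.compile(r"system prompt", re.I),
]


@dataclass
class CandidateWrite:
    text: str      # content the agent wants to remember
    source: str    # where the content came from
    entity: str    # entity the content is about


@dataclass
class MemoryGuard:
    trusted_sources: set
    known_entities: set
    memory: list = field(default_factory=list)
    shadow_log: list = field(default_factory=list)  # quarantined writes with reasons

    def check(self, write: CandidateWrite) -> list:
        """Return every reason the write should be rejected (empty if clean)."""
        reasons = []
        for pattern in INJECTION_PATTERNS:
            if pattern.search(write.text):
                reasons.append(f"injection pattern: {pattern.pattern}")
        if write.source not in self.trusted_sources:
            reasons.append(f"untrusted source: {write.source}")
        if write.entity not in self.known_entities:
            reasons.append(f"unknown entity: {write.entity}")
        return reasons

    def write(self, candidate: CandidateWrite) -> bool:
        """Commit the write only if it passes; otherwise quarantine it."""
        reasons = self.check(candidate)
        if reasons:
            self.shadow_log.append({"write": candidate, "reasons": reasons})
            return False
        self.memory.append(candidate)
        return True
```

Keeping the rejection reasons next to the quarantined write is what makes the guard itself reviewable: a false-positive rate can be computed from `shadow_log` entries that a reviewer later marks as safe.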
// What this unlocks
The audit log is the evidence. Access controls, change history, and decision provenance are all queryable per period.
Every PHI touch is on the record with the agent that touched it, the source it was retrieved from, and the action that followed.
When leadership asks "Why did the agent do that?", the answer is a query, not a reconstruction.
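To make "the answer is a query" concrete, here is a small Python sketch using the standard-library `sqlite3` module. The `tool_calls` table, its columns, the `explain_effect` helper, and the sample rows are all made up for illustration; the source does not describe the actual storage engine or schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tool_calls (
        id INTEGER PRIMARY KEY,
        ts TEXT,                 -- ISO 8601 timestamp
        agent TEXT,              -- which agent made the call
        tool TEXT,               -- which tool was chosen
        raw_model_output TEXT,   -- model output that proposed the call
        arguments TEXT,          -- validated arguments, stored as JSON text
        effect TEXT              -- what changed in the world
    )
    """
)

# Made-up sample rows.
rows = [
    ("2024-01-01T10:00:00Z", "billing-agent", "update_invoice",
     "Invoice 88 is overdue, so mark it.", '{"invoice_id": 88}',
     "row written: invoices/88"),
    ("2024-01-01T10:05:00Z", "support-agent", "send_message",
     "Customer asked for status.", '{"ticket": 5}',
     "message sent: ticket/5"),
]
conn.executemany(
    "INSERT INTO tool_calls (ts, agent, tool, raw_model_output, arguments, effect) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)


def explain_effect(connection: sqlite3.Connection, effect_fragment: str) -> list:
    """Find the calls behind an effect, with the model output that proposed them."""
    cursor = connection.execute(
        "SELECT ts, agent, tool, raw_model_output FROM tool_calls "
        "WHERE effect LIKE ? ORDER BY ts",
        (f"%{effect_fragment}%",),
    )
    return cursor.fetchall()
```

Calling `explain_effect(conn, "invoices/88")` returns the timestamp, agent, tool, and raw model output for the call that wrote that row, which is the information a reviewer needs to answer the question without interviewing the build team.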
// What we ship for clients
The capture layer wraps tool calls without rewriting the agents themselves, so AI agent observability lands where the work already runs.
Log retention, redaction, and export flows follow your governance rules. The audit substrate adapts to your AI agent governance requirements.
The audit UI is yours, not a vendor portal. Reviewers can inspect calls, memory writes, and effects inside your operating environment.
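One way to wrap tool calls without touching agent code is a decorator, sketched here in Python. The `audited` decorator, the `AUDIT_SINK` list, the `REDACT_KEYS` set, and the `send_email` tool are hypothetical names for illustration; the redaction rule stands in for whatever a client's governance policy requires.

```python
import functools
import time

AUDIT_SINK = []                     # stand-in for the real audit log
REDACT_KEYS = {"ssn", "password"}   # hypothetical policy: argument names to mask


def _redact(arguments: dict) -> dict:
    """Mask values whose argument name is listed in REDACT_KEYS."""
    return {
        key: ("[REDACTED]" if key in REDACT_KEYS else value)
        for key, value in arguments.items()
    }


def audited(tool_fn):
    """Record every call to tool_fn, including calls that raise."""

    @functools.wraps(tool_fn)
    def wrapper(**kwargs):
        entry = {
            "tool": tool_fn.__name__,
            "arguments": _redact(kwargs),
            "started_at": time.time(),
        }
        try:
            entry["result"] = tool_fn(**kwargs)
            return entry["result"]
        except Exception as exc:
            entry["error"] = repr(exc)  # failed calls are recorded too
            raise
        finally:
            AUDIT_SINK.append(entry)

    return wrapper


@audited
def send_email(to: str, body: str) -> str:
    # Placeholder tool; a real one would call a mail service.
    return f"queued:{to}"
```

The agent keeps calling `send_email(to=..., body=...)` exactly as before; the decorator adds the record. The sketch accepts keyword arguments only so that every argument has a name to redact against.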
tool calls captured
fields retained per call
memory layers, each scoped
unaudited writes
We wire the audit substrate into your agent stack and hand the reviewer interface to your governance team.