Auditing every agent API call with tamper-proof logs

3 min read Fullmakt Team

  • audit
  • security
  • observability

The first question anyone asks after an incident is simple: what happened? For AI agents acting against real APIs, the honest answer is often “we’re not sure.” Logs are scattered across the model host, the tool layer, and each upstream service. They disagree about who the caller was, and any of them could have been edited. That uncertainty is expensive in a security review, a customer escalation, or a compliance audit.

An agent platform should be able to answer, for any call: which agent did what, when, on whose behalf, and was it allowed? And it should be able to prove that answer hasn’t been altered after the fact.

What belongs in an agent audit record

A useful entry captures the whole decision, not just the outcome:

  • Who — the agent principal, and the human or system it acted for.
  • What — the tool or method, the upstream host, and a hash of the arguments.
  • When — a trusted timestamp, plus latency.
  • Why it was allowed — the policy decision and which credential reference was used (a reference, never the secret itself).
  • Result — status code, success or failure, and any error.

Hashing the arguments instead of storing them raw is deliberate: you get integrity and correlation (“was this the same request as that one?”) without turning your audit log into a second copy of sensitive payloads.
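As a rough illustration, the fields above could be modeled like this. It is a minimal sketch, not a prescribed schema: the AuditRecord class, its field names, the hash_arguments helper, and the sample values are all hypothetical, and SHA-256 over sorted-key JSON is an assumed choice for making the argument hash repeatable.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


def hash_arguments(arguments: dict) -> str:
    """SHA-256 of the arguments in a canonical form (sorted keys, no extra whitespace)."""
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class AuditRecord:
    # Who
    agent_id: str
    on_behalf_of: str
    # What
    tool: str
    upstream_host: str
    args_hash: str
    # When
    timestamp: str          # ISO 8601, taken from a trusted clock
    latency_ms: int
    # Why it was allowed
    policy_decision: str
    credential_ref: str     # a reference to the credential, never the secret
    # Result
    status_code: int
    success: bool
    error: Optional[str] = None


# Illustrative values only.
record = AuditRecord(
    agent_id="agent:billing-bot",
    on_behalf_of="user:alice",
    tool="get_invoice",
    upstream_host="api.example.com",
    args_hash=hash_arguments({"invoice_id": "inv_001", "format": "pdf"}),
    timestamp="2024-01-01T00:00:00Z",
    latency_ms=42,
    policy_decision="allow",
    credential_ref="cred-ref-123",
    status_code=200,
    success=True,
)
```

Sorting the keys means the same arguments produce the same hash regardless of the order they were supplied in, which is what makes correlation work. One caveat: when arguments are short or guessable, a plain hash can be reversed by trying candidates, so a keyed hash such as HMAC may be the safer choice.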

Why “append-only” isn’t enough

Most systems call their logs append-only and stop there. But if an attacker — or a careless process — can reach the store, append-only is a convention, not a guarantee. Entries can be deleted or quietly rewritten, and nothing about the remaining records reveals the gap.

The stronger property is tamper-evidence: any modification, including a deletion, becomes detectable.

Cryptographic chaining, briefly

The idea is the one behind a blockchain’s ledger, minus the cryptocurrency. Each audit entry includes a hash of the previous entry:

entry[n].chainHash = hash(entry[n].content + entry[n-1].chainHash)

That single field links every record into a sequence. To verify the log, you recompute the chain from the start and confirm each stored hash matches. If someone edits entry 42, its recomputed hash no longer matches the stored one. If they also update that stored hash, entry 43 no longer links to it, and so on through everything after. The first mismatch shows where the tampering began. Delete an entry from the middle and the chain simply doesn’t connect. The one thing the chain alone cannot catch is a rewrite of everything from some point forward, including dropping entries off the end, and that fails the moment you’ve anchored a hash anywhere outside the store.
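The chaining rule can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: SHA-256, the all-zero GENESIS value, the newline separator, and the function names (chain_hash, append, first_broken_index, matches_anchor) are all choices made for this example.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # assumed placeholder for the hash "before" the first entry


def chain_hash(content: dict, prev_hash: str) -> str:
    """hash(entry content + previous chain hash), joined by an unambiguous separator."""
    payload = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{payload}\n{prev_hash}".encode("utf-8")).hexdigest()


def append(log: list, content: dict) -> dict:
    """Add an entry whose chainHash covers its content and the previous chainHash."""
    prev_hash = log[-1]["chainHash"] if log else GENESIS
    entry = {"content": content, "chainHash": chain_hash(content, prev_hash)}
    log.append(entry)
    return entry


def first_broken_index(log: list) -> Optional[int]:
    """Recompute the chain from the start; return the first index that fails, or None."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        if entry["chainHash"] != chain_hash(entry["content"], prev_hash):
            return i
        prev_hash = entry["chainHash"]
    return None


def matches_anchor(log: list, anchored_hash: str) -> bool:
    """Compare the latest chain hash with a copy saved outside the store."""
    return bool(log) and log[-1]["chainHash"] == anchored_hash


log = []
for n in range(5):
    append(log, {"agent": "agent:demo", "tool": "get_invoice", "seq": n})
anchor = log[-1]["chainHash"]  # imagine this copy lives outside the store

print(first_broken_index(log))  # None: the chain is intact
log[2]["content"]["tool"] = "delete_invoice"
print(first_broken_index(log))  # 2: the edited entry no longer matches its hash
```

Removing the last entry leaves a chain that still verifies internally, so first_broken_index returns None. Only the comparison against the external anchor reveals the truncation.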

Turning the log into answers

A trustworthy record is most valuable when you can ask it questions in real time, not just during forensics:

  • Live tail — watch agent activity as it happens.
  • Per-principal history — everything one agent has done, end to end.
  • Behavioral baselines — what “normal” looks like for an agent, so drift stands out.
  • Alerts — fire when an agent exceeds a rate, touches a sensitive endpoint, or starts failing policy checks.
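To make the alerting idea concrete, here is a minimal sketch of a per-agent rate alert using a sliding time window. The RateAlert class, its parameters, and the thresholds are hypothetical. A real system would also cover the sensitive-endpoint and policy-failure cases and would take timestamps from the audit entries themselves.

```python
from collections import defaultdict, deque


class RateAlert:
    """Flags an agent that makes more than max_calls within window_seconds."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls = defaultdict(deque)  # agent_id -> timestamps still in the window

    def observe(self, agent_id: str, timestamp: float) -> bool:
        """Record one call; return True if the agent is now over its limit."""
        calls = self._calls[agent_id]
        calls.append(timestamp)
        # Drop calls that have fallen out of the window.
        while calls and calls[0] <= timestamp - self.window_seconds:
            calls.popleft()
        return len(calls) > self.max_calls


alert = RateAlert(max_calls=3, window_seconds=60)
for second in (0, 1, 2, 3):
    if alert.observe("agent:demo", float(second)):
        print(f"agent:demo exceeded 3 calls per 60s at t={second}")
```

Tracking each agent separately matters here: a limit that is normal for a batch agent may be a clear anomaly for one that usually makes a handful of calls a day.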

Because every call already flows through the broker that issues credentials, the audit log is a natural byproduct of how access works — not a separate pipeline you have to remember to wire up. The same chokepoint that enforces least privilege is the one that records what happened.
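A toy version of that chokepoint shows why the record comes for free. This is a sketch under assumptions, not how any particular broker works: the Broker class, its allow-list policy, and the entry fields are hypothetical, and credential issuance is left out entirely.

```python
import time
from typing import Any, Callable


class Broker:
    """Hypothetical chokepoint: checks policy, makes the call, records the outcome."""

    def __init__(self, allowed: set, audit_log: list):
        self.allowed = allowed        # (agent_id, tool) pairs permitted by policy
        self.audit_log = audit_log

    def call(self, agent_id: str, tool: str, fn: Callable[[], Any]) -> Any:
        decision = "allow" if (agent_id, tool) in self.allowed else "deny"
        entry = {"agent": agent_id, "tool": tool, "decision": decision,
                 "success": False, "error": None}
        started = time.monotonic()
        try:
            if decision == "deny":
                raise PermissionError(f"{agent_id} may not call {tool}")
            result = fn()
            entry["success"] = True
            return result
        except Exception as exc:
            entry["error"] = str(exc)
            raise
        finally:
            # Runs on every path, so allowed, denied, and failed calls are all recorded.
            entry["latency_ms"] = round((time.monotonic() - started) * 1000)
            self.audit_log.append(entry)


audit_log = []
broker = Broker(allowed={("agent:demo", "get_invoice")}, audit_log=audit_log)
broker.call("agent:demo", "get_invoice", lambda: {"id": "inv_001"})
```

Because the entry is written in the finally block, there is no code path through the broker that skips the audit record, including denied calls and upstream failures.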

The payoff

When the audit trail is complete and tamper-evident, “what happened?” stops being a research project. You can show an auditor exactly which agent touched which system and prove the record is intact. You can answer a customer the same day. And you can spot misuse while it’s happening instead of reconstructing it later.

For autonomous software acting on your behalf, that kind of accountability isn’t a nice-to-have — it’s the price of letting agents touch anything that matters.