Long-running agents with durable state and human-in-the-loop.
Most agentic demos work for the demo and break in production — no checkpointing, no idempotency, no plan for the third tool call that returns a 500. We build agents that run for hours, recover from failures, and surface their reasoning so your operations team can intervene.
Agentic systems take a goal, decompose it, call tools, observe results, and loop. They’re useful for workflows where rules-based automation breaks: claims triage, research assistants, document orchestration, customer ops. The production constraints differ from those of retrieval-augmented generation (RAG): long-running execution, durable state, idempotency, replay, and observability of the agent’s plan.
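To make durable state and idempotency concrete, here is a minimal sketch of a checkpointed step loop. It is an illustration under stated assumptions, not a production orchestrator: the plan is fixed up front rather than re-planned after each observation, `store` is a plain dict standing in for a durable table, and the names `idempotency_key` and `run_agent` are hypothetical.

```python
import hashlib
import json


def idempotency_key(run_id: str, step: int, tool: str, args: dict) -> str:
    """Build a stable key so a retried or resumed step is recognised."""
    payload = json.dumps([run_id, step, tool, args], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def run_agent(run_id: str, plan: list, tools: dict, store: dict) -> list:
    """Execute plan steps in order, checkpointing each observation.

    plan:  list of (tool_name, args) pairs (assumed fixed, for simplicity)
    tools: dict mapping a tool name to a callable
    store: dict standing in for a durable table keyed by idempotency key
    """
    observations = []
    for step, (tool_name, args) in enumerate(plan):
        key = idempotency_key(run_id, step, tool_name, args)
        if key in store:
            # Completed in an earlier attempt: reuse the result, skip the side effect.
            observations.append(store[key])
            continue
        result = tools[tool_name](**args)
        store[key] = result  # checkpoint before moving to the next step
        observations.append(result)
    return observations
```

If a tool call fails midway, calling `run_agent` again with the same `run_id` and `store` resumes from the first step that has no checkpoint, so earlier side effects are not repeated.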
Agents that ship past the demo.
An agentic system in production
Orchestrator, tool layer, memory store, human-in-the-loop (HITL) approval gates, and retry/replay logic, all running in your environment under your audit framework.
Trace observability
Every agent run is replayable. Operations teams can debug a bad outcome by walking the plan, the tool calls, and the observations.
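One way to picture a replayable run is an append-only event log that can be walked in order. The sketch below assumes a simple JSON-lines log held in a Python list; `TraceEvent`, `append_event`, and `walk` are hypothetical names, and a real pipeline would write to a tracing backend rather than memory.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TraceEvent:
    run_id: str
    seq: int
    kind: str      # "plan", "tool_call", or "observation"
    payload: dict


def append_event(log: list, event: TraceEvent) -> None:
    """Append one event as a JSON line; the log is treated as append-only."""
    log.append(json.dumps(asdict(event), sort_keys=True))


def walk(log: list, run_id: str):
    """Yield one run's events in sequence order for step-by-step debugging."""
    events = [TraceEvent(**json.loads(line)) for line in log]
    for event in sorted((e for e in events if e.run_id == run_id), key=lambda e: e.seq):
        yield event
```

An operator debugging a bad outcome would iterate `walk(log, run_id)` and inspect each plan, tool call, and observation in turn.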
Failure-mode coverage
Non-deterministic retries, tool timeouts, circular plans, malformed outputs — handled, instrumented, and bounded by configurable budgets.
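A configurable budget can be sketched as a small guard that is checked before every tool call. This is an assumption-laden illustration: the limits, the class name `RunBudget`, and the rule that "the same tool with identical arguments" signals a circular plan are all hypothetical choices, not a description of a specific product.

```python
import json


class BudgetExceeded(Exception):
    """Raised when a run hits one of its configured limits."""


class RunBudget:
    def __init__(self, max_steps: int = 20, max_repeats: int = 2):
        self.max_steps = max_steps      # total tool calls allowed per run
        self.max_repeats = max_repeats  # identical calls allowed before we call it a loop
        self.steps = 0
        self.seen = {}

    def check(self, tool: str, args: dict) -> None:
        """Call before each tool invocation; raises instead of looping forever."""
        self.steps += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step budget of {self.max_steps} exhausted")
        signature = (tool, json.dumps(args, sort_keys=True))
        self.seen[signature] = self.seen.get(signature, 0) + 1
        if self.seen[signature] > self.max_repeats:
            raise BudgetExceeded(f"circular plan: {tool} repeated with identical arguments")
```

The orchestrator would catch `BudgetExceeded`, record it in the trace, and stop or escalate the run rather than continue spending.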
Eval gates in CI
Pre-deploy harness that runs the agent against curated tasks and blocks regressions on plan quality, tool selection, and outcome accuracy.
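As a rough sketch of such a gate, the snippet below scores an agent on curated tasks and compares the score against a stored baseline. It assumes outcome accuracy can be checked by exact match, which is a simplification; `evaluate`, `gate`, and the task format are hypothetical.

```python
def evaluate(agent, tasks: list) -> float:
    """Return the fraction of tasks where the agent's outcome matches the expected one."""
    passed = sum(1 for task in tasks if agent(task["input"]) == task["expected"])
    return passed / len(tasks)


def gate(score: float, baseline: float, tolerance: float = 0.0) -> bool:
    """Return True if the candidate may deploy, i.e. the score has not regressed."""
    return score >= baseline - tolerance
```

In a CI job, the script would exit with a non-zero status when `gate` returns False, which is what blocks the deploy.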
A handover
Your team owns the prompt library, the tool catalog, the trace pipeline, and the runbook. We don't keep the keys.
Engineering, not stagecraft.
- Orchestration: Anthropic tool use, OpenAI Assistants, LangGraph, or custom, picked per use case based on durability and HITL requirements.
- State: Postgres for durable runs, Redis for short-term memory, S3-compatible storage for artefacts. State is replayable.
- Tooling: Typed tool interfaces (JSON schema), per-tool timeouts, retry budgets, dead-letter queues. Tool catalog versioned in git.
- Tracing: Langfuse or a custom OpenTelemetry (OTel) pipeline. Every trace includes plan, tool calls, observations, costs, and outcome.
- HITL: Approval gates at high-impact actions such as payouts, external messages, and regulated decisions. Configurable per workflow.
- Eval: Scenario-based agent evals plus LLM-as-judge for plan quality. Block deploy on regression.
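The tooling and HITL items above can be illustrated with a small wrapper that enforces an approval gate, a per-tool retry budget, and a dead-letter list. This is a sketch under assumptions: `ToolSpec`, `ApprovalRequired`, and `call_tool` are hypothetical names, the dead-letter queue is a plain list, and per-tool timeouts and schema validation are left out to keep it short.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToolSpec:
    name: str
    func: Callable[..., Any]
    max_attempts: int = 3          # retry budget for this tool (must be >= 1)
    needs_approval: bool = False   # True for high-impact actions such as payouts


class ApprovalRequired(Exception):
    """Raised to pause the run until a human approves the action."""


def call_tool(spec: ToolSpec, args: dict,
              approved_by: Optional[str] = None,
              dead_letters: Optional[list] = None) -> Any:
    """Run one tool call behind an approval gate and a retry budget."""
    if spec.needs_approval and approved_by is None:
        raise ApprovalRequired(f"{spec.name} needs human sign-off")
    last_error = None
    for _ in range(spec.max_attempts):
        try:
            return spec.func(**args)
        except Exception as exc:  # a real system would retry only retryable errors
            last_error = exc
    # Budget exhausted: park the call for later inspection, then surface the error.
    if dead_letters is not None:
        dead_letters.append({"tool": spec.name, "args": args, "error": repr(last_error)})
    raise last_error
```

Raising `ApprovalRequired` instead of blocking lets the orchestrator persist the run and resume it once an approver is recorded.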
How it runs
A top-10 insurer's claims team handled simple claims in days and complex claims in weeks. We shipped an agentic triage system that classifies, requests missing documents, and routes — with HITL on every payout decision. Time-to-disposition for simple claims fell under one hour. SOC 2 review passed on first audit.
Top-10 insurer · Agentic claims triage · SOC 2 audited
What buyers actually ask
Are agentic systems production-ready?
How do you handle non-determinism?
What's the cost of a failed run?
Can the agent call our existing internal tools?
Talk to an engineer, not a salesperson.
30 minutes. No slides. Bring an architecture, a stalled roadmap, or a vendor proposal you want a second opinion on. We'll tell you what we'd do.