Observability
Monitor agent performance, inspect execution traces, analyze token usage, and build custom dashboards to debug and optimize your agents.
Overview
Every agent execution in Connic is recorded as a run. Each run captures the full lifecycle of a request: the input that triggered it, every intermediate step the agent took, the final output, token consumption, cost estimates, and timing data. Connic provides three layers of observability to help you understand and debug your agents:
Run Logs
Search and filter all runs across your project. Find specific executions by status, agent, deployment, or content.
Execution Traces
Inspect the full step-by-step execution of any run. See every LLM call, tool invocation, and reasoning step.
Dashboards
Build custom dashboards with charts, metrics, and activity feeds to monitor trends across your agents.
Run Logs
The Logs tab in your project shows a chronological list of all agent runs. Each row displays the run status, agent name, deployment, duration, token count with estimated cost, and when it was queued. The table updates automatically every few seconds.
Filtering Runs
Use the filter bar at the top to narrow down runs:
| Filter | Description |
|---|---|
| Status | Filter by one or more statuses: queued, running, completed, failed, cancelled |
| Date Range | Select a time window using presets (24h, 7d, 30d) or a custom date range |
| Deployment | Show runs from a specific deployment only |
| Search | Free-text search across run content. Use key=value syntax to search by context values (e.g. customer_id=abc123) |
Context Search
The key=value search syntax queries against the run context. This is particularly useful if your middleware stores metadata like user IDs, session IDs, or request identifiers. You can then search for all runs associated with a specific user or session.
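For example, if your middleware tags runs with keys like customer_id, request_type, or workflow_version (the values below are illustrative), you can isolate those runs with searches such as:

```text
customer_id=abc123
request_type=refund
workflow_version=2026-03-debug-pass
```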
Inspecting a Run
Click any run in the logs table to open its detail view. The run detail provides the full picture of what happened during that execution.
Run Header
The header shows the run ID, status badge, and key metadata at a glance:
- Connector: Which connector triggered this run (with a link to the connector)
- Triggered by: If another agent triggered this run, a link to that parent run
- Duration: Total execution time (live-updating for in-progress runs)
- Token usage: Total tokens consumed, with a tooltip showing the full breakdown and estimated cost
Run Sections
Error
If the run failed, a prominent error banner shows the full error message. This is the first thing to check when investigating a failure.
Input
The payload that triggered the agent. Displayed in a structured format with support for file attachments, JSON payloads, and plain text. Toggle to raw JSON view to see the exact payload.
Output
The agent's final response. For agents with an output schema, this will be structured JSON matching the defined schema.
Context
The run context dictionary, including any values set by middleware. Expandable with both formatted and raw JSON views. Useful for verifying what data your middleware attached to the request.
Traces
The full execution trace showing every step the agent took. This is the most powerful debugging tool and is covered in detail below.
You can also re-run any execution from the run detail view. This triggers the same agent with the same input, letting you verify that a fix resolved the issue. For queued or running runs, a cancel button is available.
Execution Traces
Execution traces provide a step-by-step breakdown of everything that happened during a run. Connic captures traces using OpenTelemetry spans, organized in a hierarchical tree structure. Each span represents a discrete operation: an LLM call, a tool invocation, a middleware hook, or a sub-agent execution.
Span Types
Each span in the trace tree has a type icon and color to help you quickly identify what it represents:
| Span Type | What It Represents |
|---|---|
| LLM | A call to the language model. Contains the prompt (input), the model's response (output), and reasoning (thoughts) if enabled. |
| Tool | A tool function invocation. Shows the arguments passed to the tool and its return value. |
| MCP Tool | A tool call to an external MCP server. Includes the server name, tool name, arguments, and response. |
| Middleware | A middleware hook execution (before or after). Shows the data flowing through the middleware. |
| Sequential | A sequential agent orchestration step. Contains child spans for each agent in the chain. |
| Run / Step | The top-level run or an individual iteration in the agent loop. |
Reading a Trace
Traces are displayed as an indented tree. The top-level span represents the entire run, and child spans are nested below showing the execution order. For each span you can see:
- Status: Whether the span completed successfully (ok) or encountered an error (error)
- Duration: How long this step took in milliseconds
- Inputs: The data passed into this step (expandable)
- Outputs: The data returned by this step (expandable)
- Thoughts: The model's internal reasoning, displayed with a distinct dashed border. Only present on LLM spans when reasoning: true is set in the agent configuration
- Metadata: Additional context such as the model name, retry count, or tool error details (expandable)
Trace Example
Here is what a typical LLM agent trace looks like when the model uses a tool:
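Traces render as an interactive tree in the Connic dashboard, so the sketch below is only an illustrative text rendering; the durations and tool arguments are made up to match the description that follows.

```text
Run                              ok   2.3 s
├── Middleware: before           ok    12 ms
├── Step 1
│   ├── LLM                      ok   850 ms   decides to call calculator.add
│   └── Tool: calculator.add     ok    30 ms   args: {"a": 2, "b": 3} → 5
├── Step 2
│   └── LLM                      ok   720 ms   formulates the final response
└── Middleware: after            ok     9 ms
```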
In this trace, the agent received a request, passed through the before middleware, made an LLM call that decided to use the calculator.add tool, then made a second LLM call to formulate the final response using the tool result, and finished with the after middleware.
Token Usage & Cost
Every run records detailed token consumption broken down into four categories. Understanding these categories helps you optimize cost and identify unexpected behavior.
| Category | Description |
|---|---|
| Input Tokens | Tokens in the prompt sent to the model, including the system prompt, conversation history, and tool definitions. This is typically the largest category. |
| Output Tokens | Tokens in the model's text response. Does not include reasoning tokens. |
| Thinking Tokens | Tokens used by the model's internal reasoning process. Only present when reasoning: true is configured. Controlled by reasoning_budget. |
| Cached Input | Portion of input tokens served from the provider's cache (a subset of input tokens, not additional). Cached tokens are typically billed at a reduced rate. |
Token counts and estimated costs are visible in multiple places:
- The runs table shows total tokens and estimated cost per run
- The run detail header shows total tokens with a tooltip breaking down all four categories plus cost
- The agent detail page shows aggregate token usage for that agent
- Observability dashboards provide token charts and stat cards with configurable breakdowns
Cost Estimation
Estimated costs are calculated based on model pricing. Connic supports per-model pricing with volume tiers and reduced rates for cached tokens. Costs shown are estimates and may differ slightly from your provider's invoice.
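As a rough illustration of how such an estimate comes together (the per-token rates below are hypothetical, not Connic's or any provider's actual pricing), consider a run with 12,000 input tokens of which 4,000 were cached, 800 output tokens, and 1,500 thinking tokens:

```text
uncached input :  8,000 tokens × $2.50  per 1M = $0.0200
cached input   :  4,000 tokens × $0.625 per 1M = $0.0025   (reduced cache rate)
output         :    800 tokens × $10.00 per 1M = $0.0080
thinking       :  1,500 tokens × $10.00 per 1M = $0.0150   (often billed at the output rate)
estimated total                                ≈ $0.0455
```

Volume tiers, where a model supports them, would further adjust the per-token rates as usage crosses pricing thresholds.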
Custom Dashboards
The Observability tab in your project provides customizable dashboards for monitoring agent performance over time. You can create multiple dashboards, each with a mix of widget types arranged in a drag-and-drop grid.
Widget Types
Stat Cards
Single-value metrics with optional breakdowns. Display metrics like total runs, success rate, failed runs, tool calls, total tokens, average tokens per run, total cost, or average cost per run.
Area Charts
Time-series visualizations with toggleable series. Chart agent runs (completed vs. failed), connector runs, token usage by category, token cost by category, or usage by model.
Bar Charts
Ranked comparisons grouped by agent, connector, or model. Compare run counts, token consumption, or cost across your agents to identify outliers.
Logs Lists
Recent activity feeds showing agent or connector runs. Optionally filter to errors only for a quick view of recent failures with links to run details.
Dashboard Features
- Multiple dashboards: Create separate dashboards for different concerns (e.g. one for cost monitoring, one for error tracking)
- Date range picker: Adjust the time window globally for all widgets with presets or custom ranges
- Auto-refresh: Dashboards poll for new data every 10 seconds
- Per-widget filters: Scope any widget to specific agents or connectors
- Drag-and-drop layout: Arrange and resize widgets freely in edit mode
- Default dashboard: A pre-configured dashboard is created automatically for new projects
Agent-Level Observability
Each agent has its own detail page accessible from the Agents tab. This provides a focused view of that agent's performance:
- Statistics: Total runs, success rate with trend indicator, average duration, and total tokens used
- Status breakdown: Counts for completed, failed, running, and queued runs
- Configuration: Agent type, model, max concurrent runs, and linked tools or agents
- Run history: An agent-scoped runs table with the same filtering capabilities as the main logs page
- Manual trigger: Send a test payload to the agent directly from the dashboard
Debugging an Agent
When an agent occasionally produces incorrect results, the goal is to narrow the issue to a specific slice of runs, compare a bad run against a similar good run, and identify the first step where the execution diverges. In Connic, the fastest workflow is: isolate the pattern in Logs, compare traces side by side, inspect the decision path in LLM and tool spans, then use token patterns to confirm the likely cause.
Make Runs Easy to Search
Intermittent issues are much easier to debug when your middleware tags runs with stable context keys such as customer_id, request_type, plan, locale, or workflow_version. You can then use key=value search in Logs to isolate only the runs that matter.
```python
from typing import Any, Dict

async def before(content: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    context["request_type"] = "invoice_validation"
    context["customer_tier"] = context.get("customer_tier", "unknown")
    context["workflow_version"] = "2026-03-debug-pass"
    context["debug_bucket"] = "high-risk"
    return content
```
Isolate the Failing Pattern
Start in Logs and narrow to the agent in question. If the issue produces outright failures, filter by failed status. If the agent completes but returns wrong results, use search plus context filters to isolate the bad slice of runs: for example customer_id=abc123, request_type=refund, or workflow_version=2026-03-debug-pass. This helps you determine whether the problem is tied to a specific customer segment, input shape, deployment, or recent prompt/tool change.
Compare Like-for-Like Runs
Open one incorrect run and one healthy run side by side. Pick runs from the same agent, the same deployment, and as similar a context slice as possible. First compare the inputs and run context to confirm you are looking at like-for-like executions. Then compare the traces and look for the first span where the runs diverge. That first divergence usually points to the real source of the issue faster than inspecting the final output.
Inspect the Decision Path
If the first divergence happens in an LLM span and your agent has reasoning: true enabled, expand the Thoughts section to see how the model interpreted the request. This often reveals missing context, incorrect assumptions, or a prompt that is too vague. If the first divergence appears in a middleware or tool span instead, inspect that step first: incorrect enrichment, stale upstream data, or malformed tool arguments often create downstream model errors that only show up later in the run.
What to look for in the trace
- LLM span chose the wrong tool or passed incorrect arguments: the issue is likely in the system prompt or tool descriptions. The model did not have enough guidance to make the right call.
- Tool span returned empty or unexpected data: the issue is usually upstream of the model. Check the tool itself for stale data or an API error, and confirm whether the model passed arguments the tool could not handle.
- More LLM iterations than similar successful runs: the agent may be looping because a tool consistently returns unsatisfying results, or the system prompt does not define a clear stopping condition.
Use Token Patterns as Clues
Compare token usage between passing and failing runs. The token breakdown often explains why a run went off course even when the final output looks superficially similar.
| Signal | What It Suggests |
|---|---|
| High input tokens | The prompt or middleware-added context may be bloated, making the model pay less attention to the most relevant details. |
| Thinking tokens hit the budget | If thinking tokens repeatedly land at reasoning_budget, the model may be cutting its reasoning short on harder cases. |
| Very low or zero thinking tokens | The model may not be spending enough reasoning effort on requests that require multi-step analysis. |
| Trace reaches max iterations | The agent is stopping because of max_iterations, not because it actually finished the task. |
| Failing runs use far more tokens | The agent may be wandering, retrying, or compensating for unclear instructions, weak tool outputs, or noisy context. |
Verify Tool Inputs and Outputs
Expand tool spans and inspect both the arguments the model sent and the data the tool returned. Intermittent wrong answers often come from tools that return empty results, stale data, or technically valid data in an unexpected shape. If the passing run and failing run call the same tool with different arguments, the issue is usually in the model's decision-making. If the arguments match but the outputs differ, the issue is usually upstream of the model.
Re-run to Verify Fixes
After adjusting your prompt, middleware, tool behavior, or agent configuration, use Run Again on the original failing run. This lets you verify the fix against the exact scenario that failed, then compare the new trace against the original to confirm that the execution path is now behaving as expected.
Configuration for Better Observability
These agent configuration options directly impact what data is captured in traces:
| Setting | Impact on Observability |
|---|---|
| reasoning: true | Captures the model's thinking process in trace spans. Essential for understanding why the model made a decision, not just what it output. |
| reasoning_budget | Controls how many tokens the model can use for reasoning. If traces show truncated thoughts, increase this value. Set to -1 to let the model decide automatically. |
| max_iterations | Limits agent loop iterations. If a failing run's trace shows exactly this many iterations, the agent was forced to stop. Consider increasing the limit or refining the prompt to reduce unnecessary iterations. |
| middleware context | Values set in middleware context are saved with the run and are searchable via key=value search. Tag runs with user IDs, session IDs, or request types to make debugging easier. |
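As a sketch, the observability-related settings might sit together in an agent configuration like the fragment below. This mirrors the key: value notation used above; the surrounding structure of a real Connic agent definition may differ.

```yaml
# Observability-related agent settings (illustrative fragment)
reasoning: true          # capture the model's thoughts in LLM trace spans
reasoning_budget: 2048   # max reasoning tokens; -1 lets the model decide
max_iterations: 10       # hard stop for the agent loop
```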
Accessing Run Data in Code
The middleware after() hook gives you programmatic access to run data such as token usage and duration. The context dict passed to after() includes system fields that are populated automatically when the agent finishes:
| Context Key | Description |
|---|---|
| run_id | Unique identifier for this run |
| agent_name | Name of the agent that executed |
| duration_ms | Total execution time in milliseconds |
| token_usage | Dict with prompt_tokens, candidates_tokens, and total_tokens |
Use these fields to send run data to your own monitoring or analytics system:
```python
import httpx
from typing import Any, Dict

async def after(response: str, context: Dict[str, Any]) -> str:
    """Send run metadata to an external monitoring system."""
    try:
        async with httpx.AsyncClient() as client:
            await client.post(
                "https://monitoring.internal/events",
                json={
                    "run_id": context.get("run_id"),
                    "agent": context.get("agent_name"),
                    "duration_ms": context.get("duration_ms"),
                    "tokens": context.get("token_usage", {}),
                    "request_type": context.get("request_type"),
                },
            )
    except Exception:
        pass
    return response
```
Any custom values you set in before() are also available in after(), so you can include context like request_type or customer_id in your monitoring payloads. See the Context documentation for the full reference.
Full visibility into every execution
With run logs, execution traces, token breakdowns, and custom dashboards, you have everything you need to monitor, debug, and optimize your agents directly from the Connic dashboard.