Observability
Monitor agent performance, inspect execution traces, analyze token usage, and build custom dashboards to debug and optimize your agents.
Overview
Every agent execution in Connic is recorded as a run. Each run captures the full lifecycle of a request: the input that triggered it, every intermediate step the agent took, the final output, token consumption, cost estimates, and timing data. Connic provides three layers of observability to help you understand and debug your agents:
Run Logs
Search and filter all runs across your project. Find specific executions by status, agent, deployment, or content.
Execution Traces
Inspect the full step-by-step execution of any run. See every LLM call, tool invocation, and reasoning step.
Dashboards
Build custom dashboards with charts, metrics, and activity feeds to monitor trends across your agents.
Run Logs
The Logs tab in your project shows a chronological list of all agent runs. Each row displays the run status, agent name, deployment, duration, token count with estimated cost, and when it was queued. The table updates automatically every few seconds.
Filtering Runs
Use the filter bar at the top to narrow down runs:
| Filter | Description |
|---|---|
| Status | Filter by one or more statuses: queued, running, completed, failed, or cancelled |
| Date Range | Select a time window using presets (24h, 7d, 30d) or a custom date range |
| Deployment | Show runs from a specific deployment only |
| Search | Free-text search across run content. Use key=value syntax to search by context values (e.g. customer_id=abc123) |
Context Search
The key=value search syntax queries against the run context. This is particularly useful if your middleware stores metadata like user IDs, session IDs, or request identifiers. You can then search for all runs associated with a specific user or session.
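For example, a before middleware hook can copy request identifiers into the run context so those runs can later be found with key=value search. The snippet below is only a sketch; the exact hook signature and request object depend on how your Connic middleware is defined.

```python
# Hypothetical middleware sketch: the hook signature and request object shown
# here are illustrative, not the definitive Connic middleware API.
def before(request, context):
    # Values written to the run context are stored with the run and become
    # searchable in the Logs tab via key=value queries, e.g. customer_id=abc123.
    context["customer_id"] = request.headers.get("X-Customer-Id", "unknown")
    context["session_id"] = request.headers.get("X-Session-Id", "unknown")
    return request
```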
Inspecting a Run
Click any run in the logs table to open its detail view. The run detail provides the full picture of what happened during that execution.
Run Header
The header shows the run ID, status badge, and key metadata at a glance:
- Connector: Which connector triggered this run (with a link to the connector)
- Triggered by: If another agent triggered this run, a link to that parent run
- Duration: Total execution time (live-updating for in-progress runs)
- Token usage: Total tokens consumed, with a tooltip showing the full breakdown and estimated cost
Run Sections
Error
If the run failed, a prominent error banner shows the full error message. This is the first thing to check when investigating a failure.
Input
The payload that triggered the agent. Displayed in a structured format with support for file attachments, JSON payloads, and plain text. Toggle to raw JSON view to see the exact payload.
Output
The agent's final response. For agents with an output schema, this will be structured JSON matching the defined schema.
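For example, an agent whose output schema defines sentiment and summary fields (hypothetical field names) would return output such as:

```json
{
  "sentiment": "negative",
  "summary": "Customer reports a duplicate charge on the March invoice."
}
```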
Context
The run context dictionary, including any values set by middleware. Expandable with both formatted and raw JSON views. Useful for verifying what data your middleware attached to the request.
Traces
The full execution trace showing every step the agent took. This is the most powerful debugging tool and is covered in detail below.
You can also re-run any execution from the run detail view. This triggers the same agent with the same input, letting you verify that a fix resolved the issue. For queued or running runs, a cancel button is available.
Execution Traces
Execution traces provide a step-by-step breakdown of everything that happened during a run. Connic captures traces using OpenTelemetry spans, organized in a hierarchical tree structure. Each span represents a discrete operation: an LLM call, a tool invocation, a middleware hook, or a sub-agent execution.
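Conceptually this is the same parent/child nesting used in standard OpenTelemetry instrumentation. The short Python sketch below uses the public opentelemetry-api only to illustrate how nested spans form a tree; it is not Connic's internal tracing code.

```python
from opentelemetry import trace

tracer = trace.get_tracer("example")

# Each span opened while another span is active is recorded as its child,
# which is what produces the hierarchical tree shown in the trace view.
with tracer.start_as_current_span("run"):                   # top-level run
    with tracer.start_as_current_span("llm_call"):          # child: LLM call
        pass
    with tracer.start_as_current_span("tool.calculator"):   # child: tool invocation
        pass
```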
Span Types
Each span in the trace tree has a type icon and color to help you quickly identify what it represents:
| Span Type | What It Represents |
|---|---|
| LLM | A call to the language model. Contains the prompt (input), the model's response (output), and reasoning (thoughts) if enabled. |
| Tool | A tool function invocation. Shows the arguments passed to the tool and its return value. |
| MCP Tool | A tool call to an external MCP server. Includes the server name, tool name, arguments, and response. |
| Middleware | A middleware hook execution (before or after). Shows the data flowing through the middleware. |
| Sequential | A sequential agent orchestration step. Contains child spans for each agent in the chain. |
| Run / Step | The top-level run or an individual iteration in the agent loop. |
Reading a Trace
Traces are displayed as an indented tree. The top-level span represents the entire run, and child spans are nested below showing the execution order. For each span you can see:
- Status: Whether the span completed successfully (ok) or encountered an error (error)
- Duration: How long this step took in milliseconds
- Inputs: The data passed into this step (expandable)
- Outputs: The data returned by this step (expandable)
- Thoughts: The model's internal reasoning, displayed with a distinct dashed border. Only present on LLM spans when reasoning: true is set in the agent configuration
- Metadata: Additional context such as the model name, retry count, or tool error details (expandable)
Trace Example
Here is what a typical LLM agent trace looks like when the model uses a tool:
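(The rendering below is a schematic sketch rather than actual dashboard output; span names and durations are illustrative, and the dashboard shows the same structure as an expandable tree.)

```
Run                          ok   1,842 ms
├─ Middleware (before)       ok       3 ms
├─ Step 1
│  ├─ LLM                    ok     612 ms   → requests tool call: calculator.add
│  └─ Tool: calculator.add   ok       8 ms   → returns 42
├─ Step 2
│  └─ LLM                    ok     954 ms   → formulates final response
└─ Middleware (after)        ok       2 ms
```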
In this trace, the agent received a request, passed through the before middleware, made an LLM call that decided to use the calculator.add tool, then made a second LLM call to formulate the final response using the tool result, and finished with the after middleware.
Token Usage & Cost
Every run records detailed token consumption broken down into four categories. Understanding these categories helps you optimize cost and identify unexpected behavior.
| Category | Description |
|---|---|
| Input Tokens | Tokens in the prompt sent to the model, including the system prompt, conversation history, and tool definitions. This is typically the largest category. |
| Output Tokens | Tokens in the model's text response. Does not include reasoning tokens. |
| Thinking Tokens | Tokens used by the model's internal reasoning process. Only present when reasoning: true is configured. Controlled by reasoning_budget. |
| Cached Input | Portion of input tokens served from the provider's cache (a subset of input tokens, not additional). Cached tokens are typically billed at a reduced rate. |
Token counts and estimated costs are visible in multiple places:
- The runs table shows total tokens and estimated cost per run
- The run detail header shows total tokens with a tooltip breaking down all four categories plus cost
- The agent detail page shows aggregate token usage for that agent
- Observability dashboards provide token charts and stat cards with configurable breakdowns
Cost Estimation
Estimated costs are calculated based on model pricing. Connic supports per-model pricing with volume tiers and reduced rates for cached tokens. Costs shown are estimates and may differ slightly from your provider's invoice.
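As a rough illustration of how such an estimate is assembled (the per-token rates below are placeholders, not real Connic or provider pricing, and volume tiers are ignored):

```python
# Placeholder rates in dollars per million tokens -- real pricing is per model
# and may include volume tiers; cached input is billed at a reduced rate.
INPUT_RATE    = 3.00
CACHED_RATE   = 0.30
OUTPUT_RATE   = 15.00
THINKING_RATE = 15.00   # assumption: thinking tokens billed like output tokens

def estimate_cost(input_tokens, cached_input, output_tokens, thinking_tokens):
    # Cached input is a subset of input tokens, so bill the cached portion at
    # the reduced rate and only the remainder at the full input rate.
    uncached = input_tokens - cached_input
    total = (uncached * INPUT_RATE
             + cached_input * CACHED_RATE
             + output_tokens * OUTPUT_RATE
             + thinking_tokens * THINKING_RATE)
    return total / 1_000_000

print(estimate_cost(12_000, 8_000, 900, 1_500))  # ≈ $0.05
```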
Custom Dashboards
The Observability tab in your project provides customizable dashboards for monitoring agent performance over time. You can create multiple dashboards, each with a mix of widget types arranged in a drag-and-drop grid.
Widget Types
Stat Cards
Single-value metrics with optional breakdowns. Display metrics like total runs, success rate, failed runs, tool calls, total tokens, average tokens per run, total cost, or average cost per run.
Area Charts
Time-series visualizations with toggleable series. Chart agent runs (completed vs. failed), connector runs, token usage by category, token cost by category, or usage by model.
Bar Charts
Ranked comparisons grouped by agent, connector, or model. Compare run counts, token consumption, or cost across your agents to identify outliers.
Logs Lists
Recent activity feeds showing agent or connector runs. Optionally filter to errors only for a quick view of recent failures with links to run details.
Dashboard Features
- Multiple dashboards: Create separate dashboards for different concerns (e.g. one for cost monitoring, one for error tracking)
- Date range picker: Adjust the time window globally for all widgets with presets or custom ranges
- Auto-refresh: Dashboards poll for new data every 10 seconds
- Per-widget filters: Scope any widget to specific agents or connectors
- Drag-and-drop layout: Arrange and resize widgets freely in edit mode
- Default dashboard: A pre-configured dashboard is created automatically for new projects
Agent-Level Observability
Each agent has its own detail page accessible from the Agents tab. This provides a focused view of that agent's performance:
- Statistics: Total runs, success rate with trend indicator, average duration, and total tokens used
- Status breakdown: Counts for completed, failed, running, and queued runs
- Configuration: Agent type, model, max concurrent runs, and linked tools or agents
- Run history: An agent-scoped runs table with the same filtering capabilities as the main logs page
- Manual trigger: Send a test payload to the agent directly from the dashboard
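For the manual trigger, the payload is simply whatever input your agent expects. A hypothetical test payload for a support-triage agent might look like:

```json
{
  "subject": "Double charge on March invoice",
  "body": "I was billed twice for the same subscription period.",
  "customer_id": "abc123"
}
```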
Debugging an Agent
When an agent occasionally produces incorrect results, here is a systematic workflow to identify the root cause using Connic's observability features:
Find the Failing Runs
Go to Logs and filter by the agent in question. If the issue produces outright failures, filter by failed status. If the agent completes but returns wrong results, use the search bar to find runs containing specific incorrect output, or search by context key-value pairs if your middleware tags runs with relevant metadata.
Compare Passing and Failing Runs
Open a failing run and a similar passing run side by side. Compare their inputs to rule out differences in the trigger payload. Then compare the traces: look at where the execution paths diverge. Did the failing run use different tools? Did the LLM make a different reasoning decision?
Inspect the Reasoning
If your agent has reasoning: true enabled, expand the Thoughts section on each LLM span. This shows the model's internal thinking process and often reveals why it made a wrong decision: a misinterpretation of the input, a flawed assumption, or missing context.
Check Token Usage Patterns
Compare token usage between passing and failing runs. Watch for these signals:
- Unusually high input tokens: The context may be bloated, pushing important information out of the model's effective attention window
- Truncated thinking tokens: If thinking tokens are exactly at the reasoning_budget limit, the model may have been forced to cut its reasoning short
- Iteration count at max: If the trace shows as many iterations as max_iterations, the agent was forced to stop early and may have returned an incomplete answer
- Zero thinking tokens on a failing run: The model may not have engaged reasoning for a complex request that needed it
Inspect Tool Results
Expand tool spans in the trace to check what data was returned. A common cause of incorrect results is the model receiving unexpected data from a tool: an API error, empty results, or data in an unexpected format. Check both the arguments the model sent to the tool and the tool's return value.
Re-run to Verify Fixes
After adjusting your agent configuration, system prompt, or tools, use the Run Again button on the original failing run. This re-triggers the agent with the same input payload, letting you verify the fix against the exact scenario that failed.
Configuration for Better Observability
These agent configuration options directly impact what data is captured in traces:
| Setting | Impact on Observability |
|---|---|
| reasoning: true | Captures the model's thinking process in trace spans. Essential for understanding why the model made a decision, not just what it output. |
| reasoning_budget | Controls how many tokens the model can use for reasoning. If traces show truncated thoughts, increase this value. Set to -1 to let the model decide automatically. |
| max_iterations | Limits agent loop iterations. If a failing run's trace shows exactly this many iterations, the agent was forced to stop. Consider increasing the limit or refining the prompt to reduce unnecessary iterations. |
| middleware context | Values set in middleware context are saved with the run and are searchable via key=value search. Tag runs with user IDs, session IDs, or request types to make debugging easier. |
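Put together, an observability-friendly agent configuration might look like the sketch below. The dictionary shape is illustrative only; the keys mirror the settings in the table above, but the actual Connic agent definition format may differ.

```python
# Illustrative only -- keys mirror the settings described above; the real
# Connic agent definition format may differ.
agent_config = {
    "reasoning": True,        # capture the model's thinking in trace spans
    "reasoning_budget": -1,   # let the model decide; set a fixed budget higher if thoughts look truncated
    "max_iterations": 10,     # raise this if failing traces stop at exactly the current limit
}
```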
Full visibility into every execution
With run logs, execution traces, token breakdowns, and custom dashboards, you have everything you need to monitor, debug, and optimize your agents directly from the Connic dashboard.