You deployed your first AI agent. It processed 500 requests yesterday. Great news, right? Except you have no idea how many tokens it consumed, what it cost you, or why 12% of those requests failed silently. Welcome to the observability problem.
Traditional APM tools were built for request-response patterns: latency percentiles, error rates, throughput. AI agents are different. They make multiple LLM calls per request. Token usage varies wildly based on context. Costs can spike 10x when users send longer inputs. You need observability built specifically for agentic workloads.
What Makes Agent Observability Different
When a user sends a message to your agent, a lot happens behind the scenes. The agent might:
1. Query a knowledge base for context (RAG retrieval)
2. Make an initial LLM call to reason about the request
3. Execute 2-3 tool calls (API requests, database queries)
4. Make another LLM call to synthesize results
5. Optionally call another agent for specialized tasks
Each step consumes tokens. Each step can fail. Traditional metrics like "average response time" hide all this complexity. You need granular visibility into each phase.
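To make that concrete, here is a minimal sketch of per-step instrumentation, assuming you record each phase of a run yourself. The `StepRecord` and `RunRecord` names are illustrative only, not part of Connic's API:

```python
import time
from dataclasses import dataclass, field

@dataclass
class StepRecord:
    """One step inside an agent run: an LLM call, tool call, or retrieval."""
    name: str
    status: str = "pending"   # "completed" | "failed"
    duration_ms: float = 0.0
    input_tokens: int = 0
    output_tokens: int = 0

@dataclass
class RunRecord:
    """Everything that happened for a single user request."""
    steps: list[StepRecord] = field(default_factory=list)

    def record_step(self, name, fn, count_tokens=None):
        step = StepRecord(name=name)
        start = time.monotonic()
        try:
            result = fn()
            step.status = "completed"
            if count_tokens:  # optional callback returning (input, output) token counts
                step.input_tokens, step.output_tokens = count_tokens(result)
            return result
        except Exception:
            step.status = "failed"
            raise
        finally:
            step.duration_ms = (time.monotonic() - start) * 1000
            self.steps.append(step)
```

Instrumented this way, the request described above shows up as five step records, each with its own status, duration, and token counts, instead of one opaque latency number.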
The Four Pillars of Agent Observability
Run Tracking
Total runs, success rates, failures. The baseline health metrics for your agent fleet.
Token Usage
Input vs output tokens, cached tokens, reasoning tokens. Understand where context is going.
Cost Attribution
Per-model pricing, volume tiers, input vs output costs. Know exactly where money goes.
Real-time Logs
Live run history, duration, status. Debug issues as they happen, not hours later.
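If you were to compute these numbers yourself before pointing a dashboard at them, the four pillars reduce to a handful of aggregations over run records. The tuple layout below is invented for illustration:

```python
# Hypothetical run records: (status, input_tokens, output_tokens, cost_usd, duration_ms)
runs = [
    ("completed", 1200, 300, 0.0045, 1800),
    ("failed",     900, 120, 0.0021,  950),
    ("completed", 2500, 600, 0.0098, 3200),
]

total = len(runs)
completed = sum(1 for r in runs if r[0] == "completed")

# Pillar 1: run tracking
print(f"runs={total}, success_rate={completed / total:.0%}")

# Pillar 2: token usage
print(f"input_tokens={sum(r[1] for r in runs)}, output_tokens={sum(r[2] for r in runs)}")

# Pillar 3: cost attribution
print(f"total_cost=${sum(r[3] for r in runs):.4f}")

# Pillar 4: real-time logs (most recent first)
for status, _, _, _, duration in reversed(runs):
    print(f"{status:>9}  {duration} ms")
```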
Building Your First Dashboard
Connic creates a default dashboard when you start your first project. But the real power comes from customization. Here is how to build a dashboard tailored to your needs:
Step 1: Navigate to Observability
In your project sidebar, click Observability. You will see the default dashboard with pre-configured widgets showing total runs, success rate, token usage, and costs.
Step 2: Enter Edit Mode
Click the Edit button in the top right. This unlocks drag-and-drop arrangement and the ability to add, remove, or configure widgets.
Step 3: Add Widgets
Click Add Widget to choose from three widget types:
Stat Cards
Single metric displays. Choose from: Total Runs, Success Rate, Failed Runs, Tool Calls, Total Tokens, Input/Output Tokens, Total Cost, Input/Output Cost, Avg Cost per Run, Avg Tokens per Run.
Area Charts
Time-series visualizations. Track agent runs (completed vs failed), token usage (input vs output over time), or token cost trends.
Logs Lists
Recent activity feeds. Show agent runs or connector runs with status, duration, and direct links to detailed traces.
Step 4: Filter by Agent
Most widgets support filtering by specific agents. Running multiple agents for different purposes? Create separate widgets to track each one, or compare them side-by-side in the same chart.
Understanding Token Economics
Token usage drives your LLM costs. But not all tokens are equal:
Output tokens typically cost 3-4x more than input tokens. If your costs seem high, check your output token usage first. Long, verbose responses are often the culprit.
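A quick back-of-the-envelope calculation shows why the split matters. The prices below are placeholders, not any provider's actual rates:

```python
# Illustrative per-1M-token prices; substitute your provider's real rates.
INPUT_PRICE_PER_M = 2.50    # USD per 1M input tokens (assumed)
OUTPUT_PRICE_PER_M = 10.00  # USD per 1M output tokens (assumed, 4x input here)

def run_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Same total tokens, very different cost depending on the input/output split:
print(f"{run_cost(9_000, 1_000):.4f}")  # mostly input  -> 0.0325
print(f"{run_cost(1_000, 9_000):.4f}")  # mostly output -> 0.0925
```

Two runs with identical total token counts can differ in cost by almost 3x purely because of where the tokens sit.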
Setting Up Model Pricing
Token counts are useful. Dollar amounts are actionable. To convert tokens to costs, you need to configure pricing for the models your agents use.
Global Defaults
Connic includes global pricing for popular models out of the box. These appear with a "global" badge in your pricing settings. You do not need to configure anything to start tracking costs for common models like GPT-4o, Claude Sonnet, or Gemini 2.5.
Custom Model Pricing
Using a fine-tuned model? Self-hosting? Or just need different pricing than the defaults? Navigate to Settings > Observability and click Add Pricing.
```
# Exact model match
openai/gpt-4o
anthropic/claude-sonnet-4-5-20250514
gemini/gemini-2.5-pro

# Regex pattern for model families
openai/gpt-4o.*       # Matches all GPT-4o variants
anthropic/claude-.*   # All Claude models
gemini/gemini-2.*     # Gemini 2.x family
```
All pricing is per 1 million tokens. Project-level pricing overrides global defaults, so you can customize costs for specific use cases without affecting other projects.
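As a mental model for how exact matches, regex patterns, and project-level overrides combine, here is a rough sketch of a pricing lookup. The resolution order shown (project before global, exact key before regex) is an assumption for illustration, not a statement of Connic's internal logic:

```python
import re

# Hypothetical pricing tables: per-1M-token (input, output) prices in USD.
PROJECT_PRICING = {"openai/gpt-4o.*": (2.00, 8.00)}       # project-level override (regex)
GLOBAL_PRICING = {
    "openai/gpt-4o": (2.50, 10.00),                       # exact model match
    "anthropic/claude-.*": (3.00, 15.00),                 # regex for a model family
}

def resolve_pricing(model: str):
    """Project-level entries win over globals; exact keys win over regex patterns."""
    for table in (PROJECT_PRICING, GLOBAL_PRICING):
        if model in table:
            return table[model]
        for pattern, price in table.items():
            if re.fullmatch(pattern, model):
                return price
    return None

print(resolve_pricing("openai/gpt-4o-mini"))                    # (2.0, 8.0), project override
print(resolve_pricing("anthropic/claude-sonnet-4-5-20250514"))  # (3.0, 15.0), global family match
```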
Volume-Based Pricing Tiers
Some providers offer tiered pricing at high volume. Configure volume tiers so cost tracking stays accurate once your token counts cross those thresholds.
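For example, here is one way tiered costs can be computed, assuming marginal tiers that work like tax brackets. The thresholds, rates, and marginal-versus-flat semantics all depend on your provider:

```python
# Illustrative input-token tiers (per 1M tokens); thresholds and rates are assumptions.
TIERS = [
    (10_000_000, 2.50),    # first 10M tokens at $2.50 / 1M
    (50_000_000, 2.00),    # tokens from 10M up to 50M at $2.00 / 1M
    (float("inf"), 1.50),  # everything beyond 50M at $1.50 / 1M
]

def tiered_cost(tokens: int) -> float:
    """Marginal tiers, like tax brackets: each band is billed at its own rate."""
    cost, previous_cap = 0.0, 0
    for cap, price_per_m in TIERS:
        band = min(tokens, cap) - previous_cap
        if band <= 0:
            break
        cost += (band / 1_000_000) * price_per_m
        previous_cap = cap
    return cost

print(f"${tiered_cost(65_000_000):,.2f}")  # 10M@2.50 + 40M@2.00 + 15M@1.50 = $127.50
```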
Multi-Dashboard Workflows
One dashboard rarely fits all needs. Create multiple dashboards for different perspectives:
- Executive Overview: High-level cost and success metrics for weekly reviews
- Debugging Dashboard: Recent runs, failure rates, and logs for on-call engineers
- Cost Optimization: Token breakdowns and cost trends for budget planning
- Agent Comparison: Side-by-side metrics for A/B testing different agent configurations
Set a default dashboard that loads automatically. Configure default time ranges per dashboard: your executive overview might default to 30 days while your debugging dashboard shows the last hour.
Real-Time Monitoring
Dashboards auto-refresh every 10 seconds. The "Last updated" indicator shows you exactly how fresh the data is. For incident response, this means you can watch failures happen live without manual refreshes.
Pro Tip: Environment Isolation
Each environment (development, staging, production) has isolated observability data. Use the environment selector to switch contexts. Production dashboards stay clean even when you are running thousands of test runs in development.
Common Patterns and Anti-Patterns
DO: Track cost per agent
Different agents have different cost profiles. Your research agent might use GPT-4o while your simple FAQ bot uses Flash. Track them separately.
DON'T: Ignore success rate drops
A 95% to 85% success rate drop might seem minor. But it means 3x more failures. Set up alerting thresholds based on percentages, not raw counts.
DO: Compare input vs output ratios
Healthy agents typically have 2-5x more input than output tokens (context + RAG retrieval). An inverted ratio often indicates runaway generation or inefficient prompts.
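A simple ratio check makes this anti-pattern easy to spot in your own tooling; the thresholds below just mirror the 2-5x rule of thumb above:

```python
def check_token_ratio(input_tokens: int, output_tokens: int) -> str:
    """Flag runs whose input-to-output ratio falls outside the typical 2-5x range."""
    ratio = input_tokens / max(output_tokens, 1)
    if ratio < 1:
        return f"inverted ({ratio:.1f}x): possible runaway generation"
    if ratio < 2:
        return f"low ({ratio:.1f}x): responses are unusually verbose"
    if ratio > 5:
        return f"high ({ratio:.1f}x): context or retrieval may be bloated"
    return f"healthy ({ratio:.1f}x)"

print(check_token_ratio(4_000, 1_200))  # healthy (3.3x)
print(check_token_ratio(800, 6_500))    # inverted (0.1x): possible runaway generation
```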
DON'T: Rely on averages alone
Average cost per run hides outliers. One 50K token conversation can skew your daily average. Use time-series charts to spot anomalies.
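A quick comparison of mean versus median makes the point; the numbers are invented:

```python
from statistics import mean, median

# Nine typical runs plus one 50K-token outlier (per-run costs in USD, illustrative)
daily_costs = [0.004, 0.005, 0.003, 0.006, 0.004, 0.005, 0.004, 0.003, 0.005, 0.42]

print(f"mean   = ${mean(daily_costs):.4f}")    # $0.0459, dominated by the single outlier
print(f"median = ${median(daily_costs):.4f}")  # $0.0045, the typical run
```

The average suggests every run costs ten times what a typical run actually does; a time-series chart would show one spike instead.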
Getting Started
Observability is available in all Connic projects. Here is how to start:
1. Deploy an agent and run a few requests to generate data
2. Navigate to Observability in your project
3. Review the default dashboard, then customize for your needs
4. Configure model pricing in Settings > Observability for accurate cost tracking
Running agents without observability is like driving without a dashboard. You might get where you are going, but you will not know if you are running out of gas until it is too late.
Check out the quickstart guide to deploy your first agent, or explore the agent documentation to learn about advanced configurations.