March was all about confidence and control. A/B testing lets you compare agent variants side by side, guardrails enforce safety rules on every run, and API spec tools turn any OpenAPI definition into callable agent tools. We also shipped dashboard templates with percentile metrics, a migration CLI, and a long list of improvements.
A/B Testing
You can now test agent variants against each other in a live environment. Deploy a variant alongside a base agent, split traffic between them, and compare results side by side in the dashboard.
Variants follow a simple naming convention. Create an agent file named {base}-test-{name} and the SDK links it to the base agent automatically:
order-processor.yaml # base agent
order-processor-test-faster-model.yaml # variant: "faster-model"
order-processor-test-new-prompt.yaml # variant: "new-prompt"

From the dashboard, open the base agent and click Manage A/B Tests to configure a test:
- Traffic split: Route a percentage of requests to the variant; the rest stays on control
- Minimum sample size: Set a threshold before results are considered meaningful
- Auto-rollback: Pause the test automatically if the variant failure rate exceeds a threshold within a rolling window
- Sticky sessions: The same user or chat thread always sees the same variant
The comparison view shows runs, average cost, P50 and P95 duration, average judge score, and success rate for control and variant side by side. For a deeper walkthrough, read A/B Testing for AI Agents.
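Sticky sessions boil down to deterministic bucketing: hash a stable key (user or thread ID) and compare it to the split percentage, so the same key always lands on the same side. A minimal sketch of the idea (the function name `assign_variant` and the 20% split are illustrative, not the SDK's API):

```python
import hashlib

def assign_variant(thread_id: str, variant_percent: int = 20) -> str:
    """Deterministically map a chat thread to 'variant' or 'control'.

    The same thread_id always hashes to the same bucket, so a user
    never flips between variants mid-conversation.
    """
    digest = hashlib.sha256(thread_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    return "variant" if bucket < variant_percent else "control"

# Repeated calls with the same thread key are stable:
assert assign_variant("thread-42") == assign_variant("thread-42")
```

Because the assignment is pure hashing, no per-user state needs to be stored to keep sessions sticky.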
Agent Guardrails
Guardrails add a safety layer around your agents. Define input rules that run before the agent executes and output rules that check the response before it reaches the user.
guardrails:
  input:
    - type: prompt_injection
      mode: block
    - type: pii
      mode: redact
      config:
        entities: [email, phone, ssn, credit_card]
    - type: topic_restriction
      mode: block
      config:
        allowed_topics: [product support, billing, account help]
        off_topic_message: "I can only help with support and billing."
  output:
    - type: system_prompt_leakage
      mode: block
    - type: pii_leakage
      mode: block
      config:
        entities: [ssn, api_key]

Each rule has a mode that controls what happens on violation:
- block: Stop processing and return a rejection message
- warn: Log the violation in the trace but continue normally
- redact: Replace detected content with placeholders like [EMAIL_REDACTED]
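The redact mode amounts to pattern replacement: detected spans are swapped for placeholders instead of the whole response being blocked. A rough sketch of email redaction (the regex and placeholder format here are illustrative, not Connic's internals):

```python
import re

# Simplified email pattern for illustration only
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact_emails(text: str) -> str:
    """Replace email addresses with a placeholder instead of blocking."""
    return EMAIL_RE.sub("[EMAIL_REDACTED]", text)

redact_emails("Contact jane@example.com for help")
# → 'Contact [EMAIL_REDACTED] for help'
```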
Built-in types include prompt_injection, pii, moderation, topic_restriction, regex, and custom. Custom guardrails point to your own Python functions for full flexibility. Every evaluation is recorded in the run trace so you can audit exactly what was checked and why. Read more in Agent Guardrails: Real-Time Safety and Secure AI Agents in Production.
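A custom guardrail is just a Python function you own. The exact signature Connic expects is not shown here, so the shape below is an assumption: a callable that receives the text and reports whether it passes and what it matched.

```python
import re

# Hypothetical custom guardrail: the function name, signature, and
# return shape are assumptions, not Connic's documented interface.
def no_internal_ticket_ids(text: str) -> dict:
    """Flag responses that leak internal ticket identifiers like INT-1234."""
    leaked = re.findall(r"\bINT-\d{4,}\b", text)
    return {
        "passed": not leaked,
        "violations": leaked,
    }

no_internal_ticket_ids("Escalated as INT-20481")
# → {'passed': False, 'violations': ['INT-20481']}
```

Returning the matched spans alongside the pass/fail verdict mirrors how trace-level auditing works: the run record can show exactly what tripped the rule.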
API Spec Tools
Agents can now call any API defined in an OpenAPI v3.x specification. Upload a spec to your project and reference its operations as API spec tools using the api: prefix. Wildcard matching lets you expose entire specs or subsets in a single line.
tools:
  - api:stripe.*              # all operations from the Stripe spec
  - api:hubspot.get_contact   # single operation
  - api:internal_api.list*    # wildcard: list_users, list_orders, etc.
  - billing.lookup_invoice    # file-based tool (unchanged)

Tool names, descriptions, and parameters are derived directly from the OpenAPI schema. File-based tools also support wildcards now: use support_tools.search_* to match every function in a module whose name starts with search_.
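The wildcard behavior resembles shell-style glob matching, as provided by Python's standard fnmatch module. A sketch of how a pattern like internal_api.list* could select operations (the operation names are made up for the example):

```python
from fnmatch import fnmatch

# Hypothetical operation names derived from uploaded specs
operations = [
    "internal_api.list_users",
    "internal_api.list_orders",
    "internal_api.delete_user",
    "stripe.create_charge",
]

def expand(pattern: str, ops: list[str]) -> list[str]:
    """Expand a wildcard tool reference against known operation names."""
    return [op for op in ops if fnmatch(op, pattern)]

expand("internal_api.list*", operations)
# → ['internal_api.list_users', 'internal_api.list_orders']
```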
Dashboard Templates & Metrics
Setting up observability dashboards no longer means configuring widgets one by one. Choose a pre-built template when creating a new dashboard and get a complete layout instantly.
- Overview: Total runs, success rate, costs, token usage, top agents, and model distribution
- Agent: Scoped to a single agent with runs, errors, judge scores, and model breakdown
- Cost: Total, input, output, thinking, and cached input costs with model-level breakdown
- Token / LLM: Token consumption patterns by type and model
Alongside templates, all run metrics now include P50 and P95 percentiles for duration, cost, and tokens per run. Averages hide outliers — percentiles show you the real picture. Cost tracking is also more granular: every run now shows a computed cost based on actual token usage and model pricing, broken down by input, output, thinking, and cached tokens. For a full guide, read Agent Observability: Track Costs, Tokens & Runs.
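P50 and P95 are simply the 50th and 95th percentiles of per-run values. Computing them from raw durations with the standard library shows why they beat averages when a single run is an outlier (the numbers below are made up):

```python
from statistics import mean, quantiles

# Nine typical runs plus one pathological outlier (seconds)
durations = [1.1, 1.2, 1.3, 1.2, 1.1, 1.4, 1.2, 1.3, 1.1, 30.0]

cuts = quantiles(durations, n=100)  # 99 percentile cut points
p50, p95 = cuts[49], cuts[94]

print(mean(durations))  # average is dragged far above typical runs
print(p50)              # P50 stays near what most users experience
print(p95)              # P95 surfaces the tail without hiding it
```

The average lands around 4 seconds even though nine of ten runs finish in well under 1.5 seconds; P50 stays at the typical value, which is the gap the percentile metrics are meant to expose.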
Migrate Command
Moving to Connic from another framework is now a single command. The new connic migrate CLI command scans your project, detects agents, tools, and models, and scaffolds a complete Connic project structure.
$ connic migrate ./my-langchain-project
Detected framework: LangChain
Found 3 agents, 7 tools, 2 models
Scaffolding project...
Created:
  agents/order-processor.yaml
  agents/support-agent.yaml
  agents/classifier.yaml
  tools/lookup_order.py
  tools/search_docs.py
  MIGRATION_REPORT.md

Supported frameworks:
- LangChain: Extracts agents from create_agent() and create_react_agent() calls
- Google ADK: Reads root_agent.yaml and Python agent classes, including sequential, parallel, and loop agents
A generated MIGRATION_REPORT.md lists everything that was converted, any unresolved references, and items that need manual review. For a step-by-step walkthrough, read Migrate from LangChain to Production.
More Improvements
- Bulk run actions: Rerun or cancel multiple agent runs at once from the dashboard
- Knowledge Base namespaces: The new kb_list_namespaces tool lets agents explore the hierarchical structure of a knowledge base
- Scheduled triggers: Use trigger_agent_at to schedule another agent to run at a specific future time
- Web search filters: The web_search tool now supports country and include_news parameters for more targeted results
- Nested project structures: Agent YAML files can now be organized in subdirectories under agents/
- Telegram allowlist: Restrict which Telegram user IDs can interact with your agent, plus configurable session TTL
- Judge improvements: Filter evaluations by low scores and see cost estimates before running bulk evaluations
- Fullscreen dashboard: Expand any dashboard widget to fullscreen for a closer look at your data