AI agents are powerful. They can draft emails, summarize documents, call APIs, and make decisions. But they also inherit every risk that comes with running a language model in production: prompt injection, PII leakage, off-topic responses, system prompt exposure, and outputs that violate your content policies.
You can't ship an agent to production and hope it behaves. You need runtime safety checks that sit between your users and your agent, inspecting every input and every output before they cause harm. That's what Connic Guardrails does.
## Why Agents Need Guardrails
Traditional software is deterministic. If you write a function that adds two numbers, it adds two numbers. Language models are different. The same prompt can produce wildly different outputs depending on context, temperature, and how creatively a user phrases their request.
This unpredictability creates real risks in production:

- **Prompt injection**: a user crafts a message that overrides your instructions and hijacks the agent.
- **PII leakage**: emails, phone numbers, or account details flow into the model or back out in responses.
- **Off-topic responses**: the agent wanders outside the domain it was built for.
- **System prompt exposure**: a cleverly phrased request coaxes the agent into revealing its own instructions.
- **Policy violations**: outputs that breach your content policies reach users.
Connic Guardrails address all of these. They run as a configurable layer around your agent, inspecting content in real time and taking action before damage is done.
## How Guardrails Work
Guardrails sit in the execution pipeline of every agent run. They check content at two points: before the agent processes the input, and after the agent produces a response.
Input guardrails evaluate the raw user message before the agent sees it. If a guardrail detects a prompt injection attempt, the message is blocked and the agent never executes. If PII is detected, it can be redacted in place so the agent receives a sanitized version.
Output guardrails evaluate the agent's response before it reaches the user. If the response contains system prompt fragments, toxic content, or data exfiltration patterns, the guardrail intercepts it. The user gets a safe rejection message instead.
## Three Modes of Action
Every guardrail rule operates in one of three modes, giving you fine-grained control over how aggressively each check responds:

- **block**: reject the content outright. On input, the agent never executes; on output, the user receives a safe rejection message instead.
- **redact**: strip the offending content and let the run continue with a sanitized version.
- **warn**: let the content through unchanged, but record the violation in traces for monitoring.
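To make the flow concrete, here is a minimal sketch of how the two checkpoints and the three modes compose. It is illustrative pseudologic, not Connic's internals: the `Rule` and `RuleResult` shapes are hypothetical stand-ins, and the real engine records warnings to traces rather than printing them.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class RuleResult:
    passed: bool
    redacted: Optional[str] = None   # sanitized content, used in redact mode

@dataclass
class Rule:
    name: str
    mode: str                        # "block" | "redact" | "warn"
    check: Callable[[str], RuleResult]
    rejection: str = "Request blocked by guardrails."

def apply_rules(content: str, rules: list[Rule]):
    """Run rules in order; return (content, rejection-message-or-None)."""
    for rule in rules:
        result = rule.check(content)
        if result.passed:
            continue
        if rule.mode == "block":
            return content, rule.rejection            # stop the run here
        if rule.mode == "redact" and result.redacted is not None:
            content = result.redacted                 # continue, sanitized
        else:
            print(f"guardrail warning: {rule.name}")  # "warn": record only
    return content, None

def run_with_guardrails(agent, message, input_rules, output_rules):
    message, rejection = apply_rules(message, input_rules)
    if rejection:
        return rejection              # input blocked: agent never executes
    response = agent(message)
    response, rejection = apply_rules(response, output_rules)
    return rejection or response      # output blocked: safe message instead
```

In Connic you never write this loop yourself; it is the layer the runtime wraps around every agent run based on your configuration.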
## 10 Built-In Guardrail Types
Connic ships ten built-in guardrail types covering the most common safety needs for production agents. Each can run on input, output, or both. They include prompt injection detection, PII detection, topic restriction, moderation, system prompt leakage, PII leakage, and relevance checks, plus a custom type that loads a module from your agent's `guardrails/` directory with a `check()` function and supports both sync and async execution.

## Configuration in YAML
Guardrails are defined in your agent's YAML configuration. Each rule specifies its type, mode, and optional parameters. Input and output guardrails are configured separately, so you can apply different checks at each stage.
```yaml
guardrails:
  input:
    - type: prompt_injection
      mode: block
    - type: pii
      mode: redact
      config:
        entities: [email, phone, ssn]
    - type: topic_restriction
      mode: block
      config:
        allowed_topics: [support, billing]
        off_topic_message: "I can only help with support and billing questions."
  output:
    - type: moderation
      mode: block
    - type: system_prompt_leakage
      mode: block
    - type: pii_leakage
      mode: redact
    - type: relevance
      mode: warn
```

This configuration blocks prompt injection on input, redacts PII from user messages, restricts the agent to support and billing topics, and then checks the output for moderation violations, system prompt leakage, PII in the response, and relevance drift.
## Writing Custom Guardrails
When the built-in types aren't enough, you can write custom guardrails in Python. Create a module in your agent's guardrails/ directory that exports a check() function. It receives the content being checked and a context dictionary with metadata about the current run.
```python
from connic import GuardrailResult

# Phrases that should never appear in an agent response.
COMPETITORS = ["acme corp", "rival inc", "other platform"]

def check(content: str, context: dict) -> GuardrailResult:
    content_lower = content.lower()
    for name in COMPETITORS:
        if name in content_lower:
            # Fail the check with a user-facing message.
            return GuardrailResult(
                passed=False,
                message="I'm not able to discuss other platforms.",
                details={"matched": name},
            )
    return GuardrailResult(passed=True)
```

Then reference it in your YAML configuration:
```yaml
guardrails:
  output:
    - type: custom
      name: competitor_mentions
      mode: block
```
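Custom checks support both sync and async execution, so a guardrail that calls an external service can be a coroutine. A minimal async sketch, assuming a hypothetical moderation endpoint; the URL and the response payload shape here are placeholders, not a real API:

```python
import httpx

from connic import GuardrailResult

MODERATION_URL = "https://moderation.example.com/v1/check"  # placeholder endpoint

async def check(content: str, context: dict) -> GuardrailResult:
    # Hand the content to an external classifier; payload shape is assumed.
    async with httpx.AsyncClient(timeout=5.0) as client:
        resp = await client.post(MODERATION_URL, json={"text": content})
        resp.raise_for_status()
        verdict = resp.json()
    if verdict.get("flagged"):
        return GuardrailResult(
            passed=False,
            message="I can't help with that request.",
            details={"categories": verdict.get("categories", [])},
        )
    return GuardrailResult(passed=True)
```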
## Full Observability with Traces

Every guardrail evaluation is captured as an OpenTelemetry trace span. You get complete visibility into what was checked, what passed, and what was blocked or redacted.
Each evaluation appears as a span named `guardrails:input` or `guardrails:output`, with attributes for the rule type, mode, direction, and pass/fail status. That means you can answer questions like: How often is prompt injection being attempted? Which agents trigger the most PII redactions? Are topic restrictions too aggressive? The data is there for every run.
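Because these are standard OpenTelemetry spans, any OTel backend can aggregate them. As a sketch, here is an in-process span processor that tallies guardrail outcomes; the attribute keys (`guardrail.type`, `guardrail.passed`) are assumptions, so substitute whatever key names appear in your traces:

```python
from collections import Counter

from opentelemetry.sdk.trace import ReadableSpan, SpanProcessor

class GuardrailTally(SpanProcessor):
    """Counts completed guardrail spans by rule and outcome."""

    def __init__(self) -> None:
        self.outcomes: Counter = Counter()

    def on_end(self, span: ReadableSpan) -> None:
        if not span.name.startswith("guardrails:"):
            return  # ignore non-guardrail spans
        attrs = span.attributes or {}
        rule = attrs.get("guardrail.type", "unknown")    # assumed key name
        passed = attrs.get("guardrail.passed", True)     # assumed key name
        self.outcomes[(span.name, rule, passed)] += 1
```

Register it with your tracer provider's `add_span_processor()` and the counter fills in as runs complete.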
## External Providers
Several built-in guardrail types support external providers for more accurate detection. You can swap the default detection engine for a specialized service without changing your guardrail configuration:
| Provider | Guardrail Types | Strength |
|---|---|---|
| Lakera | Prompt Injection | Purpose-built for injection detection with continuously updated models |
| OpenAI Moderation | Moderation, PII Leakage | High-quality toxicity and category-level content classification |
| Perspective API | Moderation, PII Leakage | Google-backed toxicity scoring with fine-grained attribute breakdown |
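The exact provider syntax lives in the Guardrails documentation; as a purely hypothetical sketch of the idea (the top-level `providers` mapping and its values are assumed, not confirmed syntax), the rules themselves stay untouched while the engine behind them changes:

```yaml
# Hypothetical shape only: the `providers` mapping is an assumed syntax.
providers:
  prompt_injection: lakera
  moderation: openai_moderation

guardrails:
  input:
    - type: prompt_injection   # rule unchanged; detection engine swapped above
      mode: block
  output:
    - type: moderation
      mode: block
```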
## Real-World Examples
How teams combine these rules in production depends on the audience and risk profile of the agent.
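As an illustration, assembled only from the rule types shown above rather than a verbatim production config, a public-facing support agent might enforce hard on input and sanitize everything on the way out:

```yaml
guardrails:
  input:
    - type: prompt_injection
      mode: block
    - type: pii
      mode: redact
    - type: topic_restriction
      mode: block
      config:
        allowed_topics: [support, billing]
  output:
    - type: system_prompt_leakage
      mode: block
    - type: pii_leakage
      mode: redact
    - type: moderation
      mode: block
```

An internal tool with trusted users might run the same rules in warn mode at first, using the trace data to tune the rules before switching to enforcement.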
## Getting Started
Adding guardrails to an existing agent takes minutes:
1. Open your agent's YAML configuration and add a `guardrails` section with the rules you need.
2. Deploy your agent. Guardrails activate automatically on the next run.
3. Check the Traces tab in the Connic dashboard to see guardrail spans for each run.
4. Open individual runs to inspect which guardrails fired and drill into blocked requests.
Start with prompt injection and PII detection. Those two cover the most common attack vectors. Then layer on topic restriction, moderation, and custom checks as you understand your traffic patterns.
For the full configuration reference and all available options, check the Guardrails documentation. New to Connic? Start with the quickstart guide to deploy your first agent, then come back here to add safety layers around it.