AI agents are powerful. They can draft emails, summarize documents, call APIs, and make decisions. But they also inherit every risk that comes with running a language model in production: prompt injection, PII leakage, off-topic responses, system prompt exposure, and outputs that violate your content policies.
You cannot ship an agent to production and hope it behaves. You need runtime safety checks that sit between your users and your agent, inspecting every input and every output before they cause harm. That is what Connic Guardrails does.
Why Agents Need Guardrails
Traditional software is deterministic. If you write a function that adds two numbers, it adds two numbers. Language models are different. The same prompt can produce wildly different outputs depending on context, temperature, and how creatively a user phrases their request.
This unpredictability creates real risks in production:
Prompt Injection
A user crafts input that overrides your system prompt. Your helpful customer support agent suddenly starts ignoring its instructions and doing whatever the user asks. OWASP lists prompt injection as the #1 risk for LLM applications.
PII Exposure
Users paste sensitive data into prompts: email addresses, phone numbers, social security numbers, credit card details. Without guardrails, that data flows straight into your model provider's API and potentially into your logs.
System Prompt Leakage
A cleverly worded request tricks your agent into revealing its system prompt, including internal instructions, tool configurations, and business logic you intended to keep private.
Off-Topic or Harmful Output
Your billing support agent starts giving medical advice. Your internal assistant generates toxic content. Without output checks, there is no safety net between the model's response and the end user.
Connic Guardrails addresses all of these risks. It runs as a configurable layer around your agent, inspecting content in real time and acting before damage is done.
How Guardrails Work
Guardrails sit in the execution pipeline of every agent run. They check content at two points: before the agent processes the input, and after the agent produces a response. The flow looks like this:
Input guardrails evaluate the raw user message before the agent sees it. If a guardrail detects a prompt injection attempt, for example, the message is blocked immediately and the agent never executes. If PII is detected, it can be redacted in place so the agent receives a sanitized version.
Output guardrails evaluate the agent's response before it reaches the user. If the response contains system prompt fragments, toxic content, or data exfiltration patterns, the guardrail intercepts it. The user receives a safe rejection message instead.
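The two-stage flow can be sketched in a few lines of Python. This is a simplified illustration of the pipeline shape, not the Connic API: the function names and the dict-based result format are assumptions made for the example.

```python
# Illustrative sketch of an input/output guardrail pipeline.
# Check functions here return a plain dict: {"passed": bool,
# "message": str, "content": str} -- not the real Connic types.

def run_with_guardrails(user_message, agent, input_checks, output_checks):
    # Stage 1: input guardrails run before the agent sees the message.
    for check in input_checks:
        result = check(user_message)
        if not result["passed"]:
            # Blocked: the agent never executes.
            return result.get("message", "Request blocked by a guardrail.")
        # A redacting check may hand back a sanitized version of the input.
        user_message = result.get("content", user_message)

    response = agent(user_message)

    # Stage 2: output guardrails run before the response reaches the user.
    for check in output_checks:
        result = check(response)
        if not result["passed"]:
            # Blocked: the user sees a safe rejection message instead.
            return result.get("message", "Response blocked by a guardrail.")
        response = result.get("content", response)

    return response
```

Note how a blocking input check short-circuits the whole run: the agent call never happens, which is exactly the behavior described above.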
Three Modes of Action
Every guardrail rule operates in one of three modes. This gives you fine-grained control over how aggressively each check should respond:
Block
Stop processing entirely. The user receives a configurable rejection message. The agent never runs (input) or the response is replaced (output). Use this for hard safety boundaries.
Warn
Log the violation as a trace span and continue. Processing is not interrupted. Use this when you want visibility into potential issues without blocking legitimate requests.
Redact
Replace sensitive content with placeholders and continue processing. Available for PII guardrails. The agent receives sanitized input, or the user receives a sanitized response, without the run being interrupted.
10 Built-In Guardrail Types
Connic ships with a comprehensive set of guardrail types that cover the most common safety requirements for production agents. Each can run on input, output, or both.
Prompt Injection Detection
OWASP-style detection that catches instruction override attempts, typoglycemia attacks, encoding tricks, and structural manipulation. Supports Lakera as an external provider.
PII Detection (Input)
Detects personally identifiable information in user input: emails, phone numbers, SSNs, credit cards, and more. Configurable entity types. Supports block, warn, and redact modes.
PII Leakage (Output)
Catches PII that appears in agent responses, even if it was not in the original input. Prevents your agent from surfacing sensitive data from its context or tools.
Content Moderation
Toxicity and harmful content detection. Uses OpenAI Moderation or Perspective API as external providers to catch hate speech, harassment, violence, and other policy violations.
Topic Restriction
Restrict your agent to specific topics. Define an allowed topics list and a custom off-topic message. Requests outside the allowed scope are blocked before the agent runs.
Regex Pattern Matching
Define custom regex patterns to catch specific strings, formats, or keywords. Useful for catching internal identifiers, proprietary terms, or domain-specific patterns.
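At its core, a pattern rule reduces to a regex scan over the content. A sketch of the idea (the `TICK-` internal ticket format is a made-up example, not a real identifier scheme):

```python
import re

# Hypothetical internal ticket IDs like "TICK-12345" that should never
# surface in an agent's response.
INTERNAL_ID = re.compile(r"\bTICK-\d{4,}\b")

def contains_internal_id(text: str) -> bool:
    """True if the text contains something shaped like an internal ticket ID."""
    return INTERNAL_ID.search(text) is not None
```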
System Prompt Leakage
Detects when an agent's response contains fragments of its system prompt. Prevents attackers from extracting your internal instructions, tool schemas, or business logic.
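One simple detection strategy is to look for verbatim fragments of the system prompt inside the response. A naive sliding-window sketch, assuming exact substring matching (production detectors are fuzzier than this):

```python
def leaks_system_prompt(response: str, system_prompt: str,
                        min_len: int = 20) -> bool:
    """True if the response contains any verbatim fragment of the
    system prompt at least min_len characters long."""
    prompt = system_prompt.lower()
    resp = response.lower()
    # Slide a min_len-wide window over the prompt and look for it
    # verbatim in the response.
    for i in range(len(prompt) - min_len + 1):
        if prompt[i:i + min_len] in resp:
            return True
    return False
```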
Output Relevance
Checks whether the agent's response is actually relevant to the original question. Catches hallucinated tangents, off-track reasoning, and responses that drift from the task.
Data Exfiltration Detection
Detects patterns that indicate an attempt to extract data through the agent, such as encoding payloads, URL smuggling, or structured extraction of private context.
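Exfiltration attempts often hide data in encoded blobs or in URLs the agent is asked to render. A toy heuristic sketch for those two shapes, with entirely illustrative thresholds (real detection is considerably more involved):

```python
import re

# Two common exfiltration shapes: long base64-like blobs, and markdown
# images pointing at a URL with data packed into the query string
# (rendered images can silently ship that data to an attacker's server).
BASE64_BLOB = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")
MD_IMAGE_WITH_QUERY = re.compile(r"!\[[^\]]*\]\(https?://[^)\s]+\?[^)\s]+\)")

def looks_like_exfiltration(text: str) -> bool:
    return bool(BASE64_BLOB.search(text) or MD_IMAGE_WITH_QUERY.search(text))
```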
Custom Guardrails
Write your own guardrail logic in Python. Drop a module into your guardrails/ directory with a check() function. Supports both sync and async execution.
Configuration in YAML
Guardrails are defined in your agent's YAML configuration. Each rule specifies its type, mode, and optional parameters. Input and output guardrails are configured separately, so you can apply different checks at each stage.
```yaml
guardrails:
  input:
    - type: prompt_injection
      mode: block
    - type: pii
      mode: redact
      config:
        entities: [email, phone, ssn]
    - type: topic_restriction
      mode: block
      config:
        allowed_topics: [support, billing]
        off_topic_message: "I can only help with support and billing questions."
  output:
    - type: moderation
      mode: block
    - type: system_prompt_leakage
      mode: block
    - type: pii_leakage
      mode: redact
    - type: relevance
      mode: warn
```

This configuration blocks prompt injection on input, redacts PII from user messages, restricts the agent to support and billing topics, and then checks the output for moderation violations, system prompt leakage, PII in the response, and relevance drift.
Tip: Order Matters
Guardrails run in the order you define them. Place cheaper, faster checks first (like regex and prompt injection) and more expensive checks (like moderation with external providers) later. If an early check blocks, the later ones never run.
Writing Custom Guardrails
When the built-in types are not enough, you can write custom guardrails in Python. Create a module in your agent's guardrails/ directory that exports a check() function. It receives the content being checked and a context dictionary with metadata about the current run.
```python
from connic import GuardrailResult

COMPETITORS = ["acme corp", "rival inc", "other platform"]

def check(content: str, context: dict) -> GuardrailResult:
    content_lower = content.lower()
    for name in COMPETITORS:
        if name in content_lower:
            return GuardrailResult(
                passed=False,
                message="I'm not able to discuss other platforms.",
                details={"matched": name},
            )
    return GuardrailResult(passed=True)
```

Then reference it in your YAML configuration:
```yaml
guardrails:
  output:
    - type: custom
      name: competitor_mentions
      mode: block
```

Full Observability with Traces
Every guardrail evaluation is captured as an OpenTelemetry trace span. You get complete visibility into what was checked, what passed, and what was blocked or redacted.
Trace Spans
Each guardrail check creates a child span under guardrails:input or guardrails:output. Attributes include the rule type, mode, direction, and pass/fail status.
Run-Level Detail
Open any run in the dashboard to see exactly which guardrails fired, whether they passed or blocked, and what content triggered them. Blocked runs show the rejection reason directly in the run detail view.
This means you can answer questions like: How often is prompt injection being attempted? Which agents trigger the most PII redactions? Are topic restrictions too aggressive? The data is there for every run.
External Providers
Several built-in guardrail types support external providers for more accurate detection. You can swap the default detection engine for a specialized service without changing your guardrail configuration:
| Provider | Guardrail Types | Strength |
|---|---|---|
| Lakera | Prompt Injection | Purpose-built for injection detection with continuously updated models |
| OpenAI Moderation | Moderation, PII Leakage | High-quality toxicity and category-level content classification |
| Perspective API | Moderation, PII Leakage | Google-backed toxicity scoring with fine-grained attribute breakdown |
Real-World Examples
Here are some guardrail configurations we see teams deploying in production:
Customer Support Agent
A SaaS company runs a customer-facing support agent. They use prompt injection detection on input (block mode) to prevent manipulation, topic restriction to keep conversations about their product, PII redaction on input so customer emails and phone numbers are never sent to the model, and content moderation on output to ensure responses stay professional.
Internal Knowledge Assistant
An enterprise deploys an internal agent that queries their knowledge base. System prompt leakage detection on output prevents the agent from revealing its retrieval configuration. Data exfiltration detection catches attempts to extract internal documents through crafted prompts. Relevance checking (warn mode) flags when the agent starts generating tangential content.
Regulated Industry Agent
A healthcare company uses PII detection on both input and output with redact mode to ensure patient data never persists in logs. Topic restriction limits the agent to approved medical information topics. A custom guardrail validates that every response includes a required disclaimer. Guardrail trace spans provide a complete record of every check for compliance reviews.
Getting Started
Adding guardrails to an existing agent takes minutes. Here is how:
1. Open your agent's YAML configuration and add a `guardrails` section with the rules you need
2. Deploy your agent. Guardrails activate automatically on the next run
3. Check the Traces tab in the Connic dashboard to see guardrail spans for each run
4. Open individual runs to inspect which guardrails fired and drill into blocked requests
Start with prompt injection and PII detection. Those two cover the most common attack vectors. Then layer on topic restriction, moderation, and custom checks as you understand your traffic patterns.
For the full configuration reference and all available options, check the Guardrails documentation. If you are new to Connic, start with the quickstart guide to deploy your first agent, then come back here to add safety layers around it.