Guardrails

Add configurable safety guardrails to your agents for input validation, output filtering, PII protection, prompt injection detection, and more. Guardrails wrap agent execution and are configured directly in your agent YAML.

Overview

Guardrails are safety checks that run before and after agent execution. They protect against prompt injection attacks, PII leakage, toxic content, off-topic requests, and more. Each guardrail rule has a mode that determines what happens when a violation is detected.

Execution Flow
Input → Input Guardrails → Middleware (before) → Agent Execution → Middleware (after) → Output Guardrails → Response

Modes

  • block: Stop processing and return a rejection message (configurable via rejection_message). The agent never executes (input) or its response is replaced (output). Run status: completed (set fail_run: true in config to mark the run as failed instead).
  • warn: Log the violation as a trace span and continue processing normally. Run status: completed.
  • redact: Replace detected content with placeholders (e.g., [EMAIL_REDACTED]). Only available for pii and pii_leakage. Run status: completed.
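
For example, a blocked input can surface a custom message while the run is marked as failed. A sketch; placing rejection_message at the rule level is an assumption, so check it against your schema:

agents/agent.yaml
guardrails:
  input:
    - type: prompt_injection
      mode: block
      rejection_message: "That request was flagged as unsafe."
      config:
        fail_run: true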

Quick Start

Add a security baseline to any agent with just a few lines:

agents/agent.yaml
guardrails:
  input:
    - type: prompt_injection
      mode: block
    - type: pii
      mode: redact
  output:
    - type: moderation
      mode: block
    - type: system_prompt_leakage
      mode: block

Full Example

A comprehensive configuration demonstrating all guardrail types:

agents/support-agent.yaml
version: "1.0"
name: support-agent
model: gemini/gemini-2.5-pro
system_prompt: "You are a customer support agent."
guardrails:
  input:
    - type: prompt_injection
      mode: block
    - type: pii
      mode: redact
      config:
        entities: [email, phone, ssn, credit_card, iban]
    - type: moderation
      mode: block
      config:
        categories: [hate, self_harm, violence, sexual]
    - type: topic_restriction
      mode: block
      config:
        allowed_topics: [product support, billing, account help]
        off_topic_message: "I can only help with product support, billing, and account questions."
        model: openai/gpt-4o-mini
    - type: regex
      mode: block
      config:
        patterns:
          - pattern: "(?i)\\b(drop|delete|truncate)\\s+table\\b"
            message: "SQL commands are not allowed"
    - type: custom
      name: validate-ticket-id
      mode: block
  output:
    - type: moderation
      mode: block
    - type: pii_leakage
      mode: block
      config:
        entities: [ssn, credit_card, api_key]
    - type: system_prompt_leakage
      mode: block
    - type: relevance
      mode: warn
      config:
        model: openai/gpt-4o-mini
    - type: regex
      mode: warn
      config:
        patterns:
          - pattern: "(?i)internal use only|confidential"
            message: "Output contains internal-only content"
    - type: custom
      name: brand-voice-check
      mode: warn

Input Guardrails

prompt_injection

Multi-layered detection for prompt injection, the #1 risk for agentic AI. Built-in detection includes:

  1. Heuristic pattern matching - detects known injection phrases
  2. Typoglycemia defence - catches scrambled-letter variants (e.g., "ignroe prevoius insturctoins")
  3. Encoding detection - decodes and scans Base64/hex/unicode content
  4. Structural analysis - detects delimiter abuse and instruction boundary violations

Config:

  • sensitivity: low | medium (default) | high
  • provider: Optional. Set to lakera for the Lakera Guard API (100+ languages)

Modes: block, warn
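
A stricter setup raises sensitivity, at the cost of more false positives (a sketch):

agents/agent.yaml
guardrails:
  input:
    - type: prompt_injection
      mode: block
      config:
        sensitivity: high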

pii

PII detection and redaction. Detects emails, phone numbers, SSNs, credit cards, IBANs, IP addresses, API keys, and more.

agents/agent.yaml
# Input: "My email is john@example.com and SSN is 123-45-6789"
# After redaction: "My email is [EMAIL_REDACTED] and SSN is [SSN_REDACTED]"

guardrails:
  input:
    - type: pii
      mode: redact
      config:
        entities: [email, phone, ssn, credit_card]

Config:

  • entities: List of entity types to detect (default: all). Available: email, phone, ssn, credit_card, iban, ip_address, api_key, and more

Modes: block, warn, redact

moderation

Content moderation and safety classification. Works on both input and output.

Config:

  • categories: hate, harassment, self_harm, violence, sexual, illegal_activity, dangerous_instructions
  • threshold: 0.0-1.0 confidence threshold (default: 0.7; external providers only)
  • provider: Optional. openai (OpenAI Moderation API) or perspective (Google Perspective API)

Modes: block, warn

topic_restriction

Keep your agent on-topic using an LLM classifier call.

Adds one LLM call per request (latency + token cost). Use a small/cheap model via the model field.

Config:

  • allowed_topics: List of allowed topic descriptions
  • blocked_topics: List of explicitly blocked topics (an alternative to allowed_topics)
  • off_topic_message: Custom rejection message
  • model: Model used for classification (e.g., openai/gpt-4o-mini). Defaults to the agent model.

Modes: block, warn
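
The full example above uses allowed_topics; a deny-list variant using blocked_topics might look like this (topics are illustrative):

agents/agent.yaml
guardrails:
  input:
    - type: topic_restriction
      mode: block
      config:
        blocked_topics: [legal advice, medical advice]
        model: openai/gpt-4o-mini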

regex

Custom regex pattern matching for business rules. Works on both input and output.

Config:

  • patterns: List of {pattern, message} objects

Modes: block, warn

Output Guardrails

In addition to moderation and regex (which work on both input and output), these guardrails are specific to output:

pii_leakage

Detect PII leaking in agent responses. Critical for GDPR/CCPA compliance. Uses the same detection engine as input pii.

Config:

  • entities: Entity types to detect (default: ssn, credit_card, api_key)

Modes: block, warn, redact
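
To scrub rather than reject, switch to redact mode (a sketch; since pii_leakage uses the same engine as input pii, placeholders are assumed to follow the same convention):

agents/agent.yaml
guardrails:
  output:
    - type: pii_leakage
      mode: redact
      config:
        entities: [credit_card, api_key]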

system_prompt_leakage

Detect when the agent leaks its system prompt in the output. Uses pattern matching and similarity comparison against the actual system prompt.

Config:

  • similarity_threshold: 0.0-1.0 (default: 0.6). How similar the output must be to the system prompt to trigger.

Modes: block, warn
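
To trigger only on near-verbatim reproduction of the system prompt, raise the threshold (a sketch):

agents/agent.yaml
guardrails:
  output:
    - type: system_prompt_leakage
      mode: block
      config:
        similarity_threshold: 0.8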

relevance

Detect off-topic or irrelevant responses using an LLM classifier. Catches goal hijacking attacks.

Adds one LLM call per request (latency + token cost). Use a small/cheap model via the model field.

Config:

  • context: Additional description of what "relevant" means
  • model: Model used for classification. Defaults to the agent model.

Modes: block, warn
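
The context field can spell out what "relevant" means for your agent (a sketch; the wording is illustrative):

agents/agent.yaml
guardrails:
  output:
    - type: relevance
      mode: warn
      config:
        context: "Responses should address customer support, billing, or account questions."
        model: openai/gpt-4o-mini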

data_exfiltration

Detect data exfiltration attempts in output: suspicious URLs with encoded data, markdown images to external domains, and more.

Config:

  • allowed_domains: List of allowed external domains (default: none, so all external domains are flagged)

Modes: block, warn
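
Since all external domains are flagged by default, allow-list the domains your agent legitimately links to (hypothetical domains shown):

agents/agent.yaml
guardrails:
  output:
    - type: data_exfiltration
      mode: block
      config:
        allowed_domains: [example.com, docs.example.com]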

Custom Guardrails

Write Python files in a guardrails/ directory (following the middleware pattern):

guardrails/validate-ticket-id.py
from connic import GuardrailResult
import re

def check(content: str, context: dict) -> GuardrailResult:
    """Verify the input contains a valid ticket ID format."""
    if not re.search(r'TICKET-\d{4,8}', content):
        return GuardrailResult(
            passed=False,
            message="Please include a valid ticket ID (e.g., TICKET-12345)"
        )
    return GuardrailResult(passed=True)

Contract

  • File name matches the name in config (e.g., guardrails/validate-ticket-id.py for name: validate-ticket-id)
  • Must export a check(content: str, context: dict) -> GuardrailResult function (sync or async)
  • content is the text being checked (input or output)
  • context is the same context dict available to middleware (run_id, agent_name, connector_id, timestamp, and any user-set values)
  • Returns GuardrailResult(passed=True) or GuardrailResult(passed=False, message="...")
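
Because check is a plain function, it can be exercised locally before deploying. A minimal sketch that stubs GuardrailResult as a dataclass for testing only (deployed guardrails import the real class from connic):

```python
import re
from dataclasses import dataclass

@dataclass
class GuardrailResult:
    """Local stand-in for connic.GuardrailResult (testing only)."""
    passed: bool
    message: str = ""

def check(content: str, context: dict) -> GuardrailResult:
    """Verify the input contains a valid ticket ID format."""
    if not re.search(r'TICKET-\d{4,8}', content):
        return GuardrailResult(
            passed=False,
            message="Please include a valid ticket ID (e.g., TICKET-12345)"
        )
    return GuardrailResult(passed=True)

# Exercise both branches of the guardrail
ok = check("Re: TICKET-12345, my login fails", {})
bad = check("my login fails", {})
```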

Async Example

guardrails/check-user-permissions.py
from connic import GuardrailResult
import httpx

async def check(content: str, context: dict) -> GuardrailResult:
    """Check if the user has permission to use this agent."""
    user_id = context.get("user_id")
    if not user_id:
        return GuardrailResult(passed=False, message="User not authenticated")
    
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"https://api.example.com/users/{user_id}/permissions")
        if resp.status_code != 200:
            return GuardrailResult(passed=False, message="Could not verify permissions")
        
        data = resp.json()
        if not data.get("allowed"):
            return GuardrailResult(passed=False, message="Insufficient permissions")
    
    return GuardrailResult(passed=True)

External Providers

Each built-in guardrail ships with a lightweight default. For production accuracy, swap in an external provider via config.provider:

  • openai (for moderation): Uses your configured OpenAI provider credentials (no extra setup needed). The moderation endpoint is free to use.
  • lakera (for prompt_injection): Add LAKERA_API_KEY as an environment variable. Get your key from platform.lakera.ai.
  • perspective (for moderation): Add GOOGLE_PERSPECTIVE_API_KEY as an environment variable. Enable the API in your Google Cloud Console and create an API key.

OpenAI Moderation Example

agents/agent.yaml
guardrails:
  input:
    - type: moderation
      mode: block
      config:
        provider: openai
        categories: [hate, self_harm, violence, sexual]
        threshold: 0.7

Lakera Guard Example

agents/agent.yaml
guardrails:
  input:
    - type: prompt_injection
      mode: block
      config:
        provider: lakera

Perspective API Example

agents/agent.yaml
guardrails:
  input:
    - type: moderation
      mode: block
      config:
        provider: perspective
        threshold: 0.7
        languages: [en]     # Configurable language list

Observability

Every guardrail evaluation is automatically recorded as a trace span. Violations appear in the traces timeline with full details:

  • Span name: guardrail:prompt_injection, guardrail:pii, etc.
  • Status: passed, blocked, warned, or redacted
  • Direction: input or output
  • Provider used and detection details

Best Practices

  • Start with a security baseline: prompt_injection + pii (input), moderation + system_prompt_leakage (output)
  • Use warn mode first to monitor violations before switching to block
  • For topic_restriction and relevance, use a cheap model like openai/gpt-4o-mini to minimize latency and cost
  • Use external providers (openai, lakera) for production workloads requiring higher accuracy
  • Keep guardrail rules ordered by cost: cheap checks first (regex, heuristics), expensive checks last (LLM-based)