# Guardrails
Add configurable safety guardrails to your agents for input validation, output filtering, PII protection, prompt injection detection, and more. Guardrails wrap agent execution and are configured directly in your agent YAML.
## Overview
Guardrails are safety checks that run before and after agent execution. They protect against prompt injection attacks, PII leakage, toxic content, off-topic requests, and more. Each guardrail rule has a mode that determines what happens when a violation is detected.
## Modes
| Mode | Behavior | Run Status |
|---|---|---|
| `block` | Stop processing. Return a rejection message (configurable via `rejection_message`). The agent never executes (input) or the response is replaced (output). | `completed` (set `fail_run: true` in config to mark the run as `failed`) |
| `warn` | Log the violation as a trace span. Continue processing normally. | `completed` |
| `redact` | Replace detected content with placeholders (e.g., `[EMAIL_REDACTED]`). Only available for `pii` and `pii_leakage`. | `completed` |
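As a sketch, a `block` rule that customizes the rejection text and marks blocked runs as failed might look like this (both fields are assumed to live under `config`, per the table above; the message text is illustrative):

```yaml
guardrails:
  input:
    - type: prompt_injection
      mode: block
      config:
        rejection_message: "This request was blocked by a safety check."
        fail_run: true   # report the run as failed instead of completed
```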
## Quick Start
Add a security baseline to any agent with just a few lines:
```yaml
guardrails:
  input:
    - type: prompt_injection
      mode: block
    - type: pii
      mode: redact
  output:
    - type: moderation
      mode: block
    - type: system_prompt_leakage
      mode: block
```

## Full Example
A comprehensive configuration demonstrating all guardrail types:
```yaml
version: "1.0"
name: support-agent
model: gemini/gemini-2.5-pro
system_prompt: "You are a customer support agent."

guardrails:
  input:
    - type: prompt_injection
      mode: block
    - type: pii
      mode: redact
      config:
        entities: [email, phone, ssn, credit_card, iban]
    - type: moderation
      mode: block
      config:
        categories: [hate, self_harm, violence, sexual]
    - type: topic_restriction
      mode: block
      config:
        allowed_topics: [product support, billing, account help]
        off_topic_message: "I can only help with product support, billing, and account questions."
        model: openai/gpt-4o-mini
    - type: regex
      mode: block
      config:
        patterns:
          - pattern: "(?i)\\b(drop|delete|truncate)\\s+table\\b"
            message: "SQL commands are not allowed"
    - type: custom
      name: validate-ticket-id
      mode: block
  output:
    - type: moderation
      mode: block
    - type: pii_leakage
      mode: block
      config:
        entities: [ssn, credit_card, api_key]
    - type: system_prompt_leakage
      mode: block
    - type: relevance
      mode: warn
      config:
        model: openai/gpt-4o-mini
    - type: regex
      mode: warn
      config:
        patterns:
          - pattern: "(?i)internal use only|confidential"
            message: "Output contains internal-only content"
    - type: custom
      name: brand-voice-check
      mode: warn
```

## Input Guardrails
### prompt_injection
Multi-layered detection for prompt injection, the #1 risk for agentic AI. Built-in detection includes:
- **Heuristic pattern matching**: detects known injection phrases
- **Typoglycemia defence**: catches scrambled-letter variants (e.g., "ignroe prevoius insturctoins")
- **Encoding detection**: decodes and scans Base64/hex/unicode content
- **Structural analysis**: detects delimiter abuse and instruction boundary violations
| Config | Description |
|---|---|
| `sensitivity` | `low`, `medium` (default), or `high` |
| `provider` | Optional. Set to `lakera` to use the Lakera Guard API (100+ languages) |

Modes: `block`, `warn`
### pii
PII detection and redaction. Detects emails, phone numbers, SSNs, credit cards, IBANs, IP addresses, API keys, and more.
```yaml
# Input:           "My email is john@example.com and SSN is 123-45-6789"
# After redaction: "My email is [EMAIL_REDACTED] and SSN is [SSN_REDACTED]"
guardrails:
  input:
    - type: pii
      mode: redact
      config:
        entities: [email, phone, ssn, credit_card]
```

| Config | Description |
|---|---|
| `entities` | List of entity types to detect (default: all). Available: `email`, `phone`, `ssn`, `credit_card`, `iban`, `ip_address`, `api_key`, and more |

Modes: `block`, `warn`, `redact`
### moderation
Content moderation and safety classification. Works on both input and output.
| Config | Description |
|---|---|
| `categories` | `hate`, `harassment`, `self_harm`, `violence`, `sexual`, `illegal_activity`, `dangerous_instructions` |
| `threshold` | 0.0–1.0 confidence threshold (default: 0.7, external providers only) |
| `provider` | Optional. `openai` (OpenAI Moderation API) or `perspective` (Google Perspective API) |

Modes: `block`, `warn`
### topic_restriction
Keep your agent on-topic using an LLM classifier call.
Note: this adds one LLM call per request (latency and token cost). Use a small, cheap model via the `model` field.
| Config | Description |
|---|---|
| `allowed_topics` | List of allowed topic descriptions |
| `blocked_topics` | List of explicitly blocked topics (alternative to `allowed_topics`) |
| `off_topic_message` | Custom rejection message |
| `model` | Model for classification (e.g., `openai/gpt-4o-mini`). Defaults to the agent model. |

Modes: `block`, `warn`
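The full example earlier uses `allowed_topics`; the `blocked_topics` alternative can be sketched like this (topic values and message are illustrative):

```yaml
guardrails:
  input:
    - type: topic_restriction
      mode: block
      config:
        blocked_topics: [legal advice, medical advice]
        off_topic_message: "I can't help with that topic."
        model: openai/gpt-4o-mini   # cheap classifier keeps latency and cost down
```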
### regex
Custom regex pattern matching for business rules. Works on both input and output.
| Config | Description |
|---|---|
| `patterns` | List of `{pattern, message}` objects |

Modes: `block`, `warn`
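Since the patterns are ordinary regular expressions, they can be sanity-checked locally before shipping. A quick sketch in plain Python (outside the framework), using the SQL pattern from the full example above:

```python
import re

# Same regex as the YAML pattern "(?i)\\b(drop|delete|truncate)\\s+table\\b";
# YAML escapes the backslashes, a Python raw string does not need to.
pattern = re.compile(r"(?i)\b(drop|delete|truncate)\s+table\b")

assert pattern.search("please DROP TABLE users;")       # matched: would be blocked
assert pattern.search("truncate   table logs")          # any run of whitespace matches
assert not pattern.search("drop the tablecloth")        # word boundaries avoid false positives
```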
## Output Guardrails
In addition to moderation and regex (which work on both input and output), these guardrails are specific to output:
### pii_leakage
Detect PII leaking in agent responses. Critical for GDPR/CCPA compliance. Uses the same detection engine as input pii.
| Config | Description |
|---|---|
| `entities` | Entity types to detect (default: `ssn`, `credit_card`, `api_key`) |

Modes: `block`, `warn`, `redact`
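A sketch of `redact` mode, which rewrites the response instead of blocking it (the entity list is illustrative):

```yaml
guardrails:
  output:
    - type: pii_leakage
      mode: redact   # replace leaked values with placeholders
      config:
        entities: [ssn, credit_card, api_key]
```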
### system_prompt_leakage
Detect when the agent leaks its system prompt in the output. Uses pattern matching and similarity comparison against the actual system prompt.
| Config | Description |
|---|---|
| `similarity_threshold` | 0.0–1.0 (default: 0.6). Minimum similarity between the output and the system prompt required to trigger. |

Modes: `block`, `warn`
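A sketch that tightens the threshold so only near-verbatim leaks trigger (the 0.8 value is illustrative; the default is 0.6):

```yaml
guardrails:
  output:
    - type: system_prompt_leakage
      mode: block
      config:
        similarity_threshold: 0.8
```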
### relevance
Detect off-topic or irrelevant responses using an LLM classifier. Catches goal hijacking attacks.
Note: this adds one LLM call per request (latency and token cost). Use a small, cheap model via the `model` field.
| Config | Description |
|---|---|
| `context` | Additional description of what "relevant" means |
| `model` | Model for classification. Defaults to the agent model. |

Modes: `block`, `warn`
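A sketch with an explicit relevance `context` (the wording is illustrative):

```yaml
guardrails:
  output:
    - type: relevance
      mode: warn
      config:
        context: "Responses should relate to customer support for our product."
        model: openai/gpt-4o-mini
```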
### data_exfiltration
Detect data exfiltration attempts in output: suspicious URLs with encoded data, markdown images to external domains, and more.
| Config | Description |
|---|---|
| `allowed_domains` | List of allowed external domains (default: none, so all external domains are flagged) |

Modes: `block`, `warn`
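A sketch that allow-lists your own domains (the domain values are illustrative):

```yaml
guardrails:
  output:
    - type: data_exfiltration
      mode: block
      config:
        allowed_domains: [example.com, docs.example.com]   # all other external domains are flagged
```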
## Custom Guardrails
Write Python files in a `guardrails/` directory (following the middleware pattern):
```python
from connic import GuardrailResult
import re

def check(content: str, context: dict) -> GuardrailResult:
    """Verify the input contains a valid ticket ID format."""
    if not re.search(r'TICKET-\d{4,8}', content):
        return GuardrailResult(
            passed=False,
            message="Please include a valid ticket ID (e.g., TICKET-12345)"
        )
    return GuardrailResult(passed=True)
```

### Contract

- The file name matches the `name` in config (e.g., `guardrails/validate-ticket-id.py` for `name: validate-ticket-id`)
- Must export a `check(content: str, context: dict) -> GuardrailResult` function (sync or async)
- `content` is the text being checked (input or output)
- `context` is the same context dict available to middleware (`run_id`, `agent_name`, `connector_id`, `timestamp`, and any user-set values)
- Returns `GuardrailResult(passed=True)` or `GuardrailResult(passed=False, message="...")`
### Async Example
```python
from connic import GuardrailResult
import httpx

async def check(content: str, context: dict) -> GuardrailResult:
    """Check if the user has permission to use this agent."""
    user_id = context.get("user_id")
    if not user_id:
        return GuardrailResult(passed=False, message="User not authenticated")
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"https://api.example.com/users/{user_id}/permissions")
        if resp.status_code != 200:
            return GuardrailResult(passed=False, message="Could not verify permissions")
        data = resp.json()
        if not data.get("allowed"):
            return GuardrailResult(passed=False, message="Insufficient permissions")
    return GuardrailResult(passed=True)
```

## External Providers
Each built-in guardrail ships with a lightweight default. For production accuracy, swap in an external provider via `config.provider`:
| Provider | For | Setup |
|---|---|---|
| `openai` | `moderation` | Uses your configured OpenAI provider credentials (no extra setup needed). The moderation endpoint is free to use. |
| `lakera` | `prompt_injection` | Add `LAKERA_API_KEY` as an environment variable. Get your key from platform.lakera.ai. |
| `perspective` | `moderation` | Add `GOOGLE_PERSPECTIVE_API_KEY` as an environment variable. Enable the API in your Google Cloud Console and create an API key. |
### OpenAI Moderation Example
```yaml
guardrails:
  input:
    - type: moderation
      mode: block
      config:
        provider: openai
        categories: [hate, self_harm, violence, sexual]
        threshold: 0.7
```

### Lakera Guard Example
```yaml
guardrails:
  input:
    - type: prompt_injection
      mode: block
      config:
        provider: lakera
```

### Perspective API Example
```yaml
guardrails:
  input:
    - type: moderation
      mode: block
      config:
        provider: perspective
        threshold: 0.7
        languages: [en]   # configurable language list
```

## Observability
Every guardrail evaluation is automatically recorded as a trace span. Violations appear in the traces timeline with full details:
- Span name: `guardrail:prompt_injection`, `guardrail:pii`, etc.
- Status: `passed`, `blocked`, `warned`, or `redacted`
- Direction: `input` or `output`
- Provider used and detection details
## Best Practices

- Start with a security baseline: `prompt_injection` + `pii` (input), `moderation` + `system_prompt_leakage` (output)
- Use `warn` mode first to monitor violations before switching to `block`
- For `topic_restriction` and `relevance`, use a cheap model like `openai/gpt-4o-mini` to minimize latency and cost
- Use external providers (`openai`, `lakera`) for production workloads requiring higher accuracy
- Order guardrail rules by cost: cheap checks first (regex, heuristics), expensive LLM-based checks last