Documentation

Agent Evaluation

client.agent

Evaluate agentic tool calls before execution, scan results after, detect prompt injection, and accumulate risk across a session. Available in rail-score-sdk v2.4.0+.

Agent Evaluation Lifecycle

Agent requests tool call

Pre-execution (1.5–3.0 credits)

evaluate_tool_call(tool_name, params, domain)

Decision

ALLOW

Execute tool

FLAG

Execute + log

BLOCK

AgentBlockedError

ALLOW / FLAG path continues

Tool executes

Post-execution (0.5–1.0 credits)

evaluate_tool_result(tool_name, result, params)
PII scan
Injection
RAIL score
Safe result returned to agent

Scope required: All agent endpoints require agent:evaluate scope on your API key. Keys created before v2.4.0 are automatically grandfathered in.

OperationBasicDeep
evaluate_tool_call()1.5 credits3.0 credits
evaluate_tool_result()0.5 credits1.0 credits
check_injection()0.5 credits (flat)
evaluate_plan()1.5 per step3.0 per step
registry operations0 credits

Pre-execution: evaluate_tool_call()

Call this before running a tool. Returns an AgentDecision with a decision of ALLOW, FLAG, or BLOCK. A BLOCK decision raises AgentBlockedError if your policy is set to block.

from rail_score_sdk import RailScoreClient
from rail_score_sdk.agent.exceptions import AgentBlockedError

client = RailScoreClient(api_key="YOUR_RAIL_API_KEY")

try:
    decision = client.agent.evaluate_tool_call(
        tool_name="credit_scoring",
        tool_params={
            "applicant_id": "usr_123",
            "zip_code": "10001",
            "annual_income": 65000,
        },
        domain="finance",
        mode="basic",           # "basic" | "deep"
        compliance_frameworks=["eu_ai_act", "gdpr"],
    )

    print(decision.decision)          # "ALLOW" | "FLAG" | "BLOCK"
    print(decision.decision_reason)   # human-readable reason
    print(decision.rail_score.score)  # overall RAIL score (0-10)
    print(decision.compliance_violations)   # list of detected violations
    print(decision.suggested_params)  # safer parameter alternatives (if any)
    print(decision.credits_consumed)  # float

    if decision.decision == "FLAG":
        # Proceed with caution — log and monitor
        result = run_tool(decision.suggested_params or tool_params)

except AgentBlockedError as e:
    print(f"Tool call blocked: {e.tool_name}")
    print(f"Reason: {e.decision_reason}")

What Determines the Decision

Inputs

tool_name
tool_params
domain
compliance_frameworks

Processing

Tool Risk Registry

org override or system default

Context Signals
Proxy variablesProhibited attributesPHI / PII in paramsCompliance rules
RAIL Scoring (8 dims)
Policy Thresholds

Decision

ALLOW

score above thresholds

FLAG

below flag threshold

BLOCK

critical violation or below block threshold

AgentDecision response fields

FieldTypeDescription
decisionstrALLOW, FLAG, or BLOCK
decision_reasonstrHuman-readable explanation for the decision
event_idstrUnique event ID for audit trails
rail_scoreRailScoreOverall score, confidence, summary
dimension_scoresdictPer-dimension scores (all 8 RAIL dimensions)
compliance_violationslistDetected framework violations (article, severity, description)
suggested_paramsdict | NoneSafer alternative parameters, if available
credits_consumedfloatCredits charged for this call

Post-execution: evaluate_tool_result()

Scan a tool's output after it runs. Checks for PII in the result, prompt injection attempts embedded in returned data, and RAIL dimension violations.

result_data = run_tool(tool_params)

risk = client.agent.evaluate_tool_result(
    tool_name="web_search",
    tool_result_data=result_data,      # dict, str, or list
    tool_params=tool_params,           # original params for context
    mode="basic",
)

print(risk.pii_detected)              # bool
print(risk.pii_entities)              # list of detected PII types
print(risk.injection_detected)        # bool
print(risk.rail_score.score)          # overall RAIL score
print(risk.redacted_result)           # result with PII replaced by [TYPE]

Prompt injection: check_injection()

Standalone fast classifier for prompt injection. Covers direct instruction override, role hijack, jailbreak, data exfiltration attempts, and indirect injection from external sources.

Injection Detection Flow

Input content (user message, tool result, or external data)
Fast regex classifier

millisecond latency, no LLM call

5 attack patterns checked

direct_instruction_overridecritical
role_hijackhigh
jailbreakcritical
data_exfil_attempthigh
indirect_injectionhigh

borderline / inconclusive only

LLM fallback classifier
detected

bool

attack_type

str

severity

critical–low

recommended_action

block/flag/allow

check = client.agent.check_injection(
    content=user_input,
    content_source="user_message",   # "user_message" | "tool_result" | "external_data"
)

print(check.detected)           # bool
print(check.confidence)         # float 0-1
print(check.attack_type)        # "direct_instruction_override" | "role_hijack" | "jailbreak" | ...
print(check.severity)           # "critical" | "high" | "medium" | "low"
print(check.recommended_action) # "block" | "flag" | "allow"

if check.detected and check.severity in ("critical", "high"):
    raise ValueError(f"Injection attempt blocked: {check.attack_type}")

Plan evaluation: evaluate_plan()

Evaluate a multi-step agent plan (up to 20 steps) before any execution begins. Returns a per-step risk assessment and an overall plan decision.

Plan Evaluation: Per-Step Assessment

evaluate_plan(plan_steps, domain, mode)
1
web_search

query: "user address"

FLAG
2
send_email

to: {search_result}

FLAG
3
update_record

table: users, data: ...

BLOCK
Overall Decision
BLOCK

Blocked at step 3 — compliance violation detected in update_record

Steps 1–2 flagged but plan cannot proceed

from rail_score_sdk.agent.exceptions import PlanBlockedError

plan_steps = [
    {"step": 1, "tool": "web_search",     "params": {"query": "user address lookup"}},
    {"step": 2, "tool": "send_email",     "params": {"to": "{search_result}", "body": "..."}},
    {"step": 3, "tool": "update_record",  "params": {"table": "users", "data": "..."}},
]

try:
    evaluation = client.agent.evaluate_plan(
        plan_steps=plan_steps,
        domain="general",
        mode="basic",
    )

    print(evaluation.overall_decision)    # "ALLOW" | "FLAG" | "BLOCK"
    print(evaluation.overall_risk_score)  # float 0-10
    for step in evaluation.step_evaluations:
        print(f"Step {step.step_number}: {step.decision} — {step.decision_reason}")

except PlanBlockedError as e:
    print(f"Plan blocked at step {e.blocked_step}: {e.reason}")

Cross-call tracking: AgentSession

AgentSession wraps the same agent methods but accumulates risk across calls. It detects patterns like repeated PII exposure, escalating risk, or blocked retries within a single workflow.

AgentSession: Cross-Call Risk Accumulation

Call sequence

evaluate_tool_call()

credit_scoring

ALLOW

evaluate_tool_result()

PII: email detected

medium

evaluate_tool_call()

loan_approval

FLAG

session.risk_summary()

total_calls3
average_riskmedium
privacy trend4.2 / 10
repeated_pii_exposure

cross-call pattern detected

from rail_score_sdk.agent.session import AgentSession

session = AgentSession(
    client=client,
    agent_id="my-agent-001",
    max_session_age=3600,    # session expires after 1 hour
)

# Same API as client.agent — but risk is tracked cumulatively
decision = session.evaluate_tool_call(
    tool_name="credit_scoring",
    tool_params={"applicant_id": "usr_123", "income": 45000},
    domain="finance",
    mode="basic",
)

risk = session.evaluate_tool_result(
    tool_name="credit_scoring",
    tool_result_data=result,
    tool_params=tool_params,
)

# Review session-wide risk summary
summary = session.risk_summary()
print(summary.total_calls)        # int
print(summary.average_risk)       # "low" | "medium" | "high" | "critical"
print(summary.dimension_trends)   # dict of per-dimension avg scores
print(summary.patterns)           # list of detected cross-call patterns

session.close()

Per-tool policy: AgentPolicy

Define per-tool thresholds and attach them to an AgentPolicyEngine. The engine applies the matching policy after each evaluation.

from rail_score_sdk.agent.policy import AgentPolicy, AgentPolicyEngine

engine = AgentPolicyEngine(
    policies={
        "credit_scoring": AgentPolicy(
            block_below=6.0,
            flag_below=8.0,
            dimension_thresholds={"fairness": 7.0, "privacy": 7.0},
        ),
        "send_email": AgentPolicy(
            block_below=5.0,
            flag_below=7.0,
        ),
    },
    default_policy=AgentPolicy(block_below=4.0, flag_below=6.0),
)

decision = client.agent.evaluate_tool_call(
    tool_name="credit_scoring",
    tool_params=params,
    domain="finance",
    mode="basic",
)

# Engine applies the credit_scoring policy to the decision
enforced = engine.evaluate("credit_scoring", decision)
print(enforced.final_action)   # "block" | "flag" | "allow"

AgentMiddleware: @guard decorator

The @guard decorator from AgentMiddleware wraps any async function to evaluate its inputs before execution and optionally scan its output after.

from rail_score_sdk.agent.middleware import AgentMiddleware

middleware = AgentMiddleware(
    rail_api_key="YOUR_RAIL_API_KEY",
    policy="block",        # "block" | "log_only" | "suggest_fix" | "auto_fix"
    threshold=7.0,
    domain="finance",
    scan_results=True,     # Also scan tool output after execution
)

@middleware.guard(tool_name="credit_scoring")
async def run_credit_check(applicant_id, annual_income, zip_code):
    # This function is only called if evaluation returns ALLOW or FLAG
    return await credit_api.check(applicant_id, annual_income, zip_code)

# Call as normal — RAIL evaluation happens automatically
result = await run_credit_check(
    applicant_id="usr_123",
    annual_income=65000,
    zip_code="10001",
)

Tool Risk Registry

Register custom risk profiles for your organization's tools. Org profiles override system defaults and allow you to define proxy variable watchlists, prohibited attributes, and per-tool thresholds.

registry = client.agent.registry

# List all tools visible to your org (system defaults + your overrides)
tools = registry.list_tools()
for tool in tools:
    print(f"{tool['tool_name']}: risk={tool['risk_level']}, depth={tool['evaluation_depth']}")

# Register a custom tool profile
registry.register_tool({
    "tool_name": "loan_approval",
    "risk_level": "high",               # "low" | "medium" | "high" | "critical"
    "evaluation_depth": "deep",         # "basic" | "deep"
    "compliance_frameworks": ["eu_ai_act", "gdpr"],
    "proxy_variable_watchlist": ["zip_code", "neighborhood", "surname"],
    "prohibited_attributes": ["race", "religion", "national_origin"],
    "description": "Automated loan approval — requires deep evaluation",
})

# Remove a custom profile (reverts to system default)
registry.delete_tool("loan_approval")

Exceptions

ExceptionHTTPWhen
AgentBlockedError403Tool call blocked by policy — exposes .tool_name, .decision_reason
PlanBlockedErrorPlan evaluation blocked at a specific step — exposes .blocked_step, .reason
SessionClosedErrorMethod called on a closed AgentSession
from rail_score_sdk.agent.exceptions import (
    AgentBlockedError,
    PlanBlockedError,
    SessionClosedError,
)

try:
    decision = client.agent.evaluate_tool_call(
        tool_name="loan_approval",
        tool_params=params,
        domain="finance",
        mode="deep",
    )
except AgentBlockedError as e:
    print(f"Blocked: tool={e.tool_name}, reason={e.decision_reason}")
except PlanBlockedError as e:
    print(f"Plan blocked at step {e.blocked_step}: {e.reason}")
except SessionClosedError:
    print("Session was already closed")