Agent Evaluation
client.agent
Evaluate agentic tool calls before execution, scan results after, detect prompt injection, and accumulate risk across a session. Available in rail-score-sdk v2.4.0+.
Agent Evaluation Lifecycle
[Diagram] Pre-execution evaluation (1.5–3.0 credits) produces a decision: on ALLOW or FLAG the path continues to tool execution (FLAG executes with logging), while BLOCK raises AgentBlockedError. Post-execution scanning costs 0.5–1.0 credits.
Scope required: All agent endpoints require agent:evaluate scope on your API key. Keys created before v2.4.0 are automatically grandfathered in.
| Operation | Basic | Deep |
|---|---|---|
| evaluate_tool_call() | 1.5 credits | 3.0 credits |
| evaluate_tool_result() | 0.5 credits | 1.0 credits |
| check_injection() | 0.5 credits (flat) | 0.5 credits (flat) |
| evaluate_plan() | 1.5 per step | 3.0 per step |
| registry operations | 0 credits | 0 credits |
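Based on the pricing table above, the credit cost of a workflow can be estimated before running it. The helper below is an illustrative sketch, not part of the SDK; the per-call rates are copied from the table.

```python
# Illustrative cost estimator based on the credit table above (not an SDK API).
PER_CALL = {
    ("evaluate_tool_call", "basic"): 1.5,
    ("evaluate_tool_call", "deep"): 3.0,
    ("evaluate_tool_result", "basic"): 0.5,
    ("evaluate_tool_result", "deep"): 1.0,
    ("check_injection", "basic"): 0.5,   # flat rate, same in both modes
    ("check_injection", "deep"): 0.5,
}

def estimate_plan_credits(num_steps: int, mode: str = "basic") -> float:
    """evaluate_plan() charges per step: 1.5 (basic) or 3.0 (deep)."""
    per_step = 1.5 if mode == "basic" else 3.0
    return num_steps * per_step

print(estimate_plan_credits(3, "basic"))  # 4.5
print(estimate_plan_credits(3, "deep"))   # 9.0
```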
Pre-execution: evaluate_tool_call()
Call this before running a tool. Returns an AgentDecision with a decision of ALLOW, FLAG, or BLOCK. A BLOCK decision raises AgentBlockedError if your policy is set to block.
```python
from rail_score_sdk import RailScoreClient
from rail_score_sdk.agent.exceptions import AgentBlockedError

client = RailScoreClient(api_key="YOUR_RAIL_API_KEY")

try:
    decision = client.agent.evaluate_tool_call(
        tool_name="credit_scoring",
        tool_params={
            "applicant_id": "usr_123",
            "zip_code": "10001",
            "annual_income": 65000,
        },
        domain="finance",
        mode="basic",  # "basic" | "deep"
        compliance_frameworks=["eu_ai_act", "gdpr"],
    )

    print(decision.decision)               # "ALLOW" | "FLAG" | "BLOCK"
    print(decision.decision_reason)        # human-readable reason
    print(decision.rail_score.score)       # overall RAIL score (0-10)
    print(decision.compliance_violations)  # list of detected violations
    print(decision.suggested_params)       # safer parameter alternatives (if any)
    print(decision.credits_consumed)       # float

    if decision.decision == "FLAG":
        # Proceed with caution — log and monitor
        result = run_tool(decision.suggested_params or tool_params)
except AgentBlockedError as e:
    print(f"Tool call blocked: {e.tool_name}")
    print(f"Reason: {e.decision_reason}")
```

What Determines the Decision
[Diagram] Inputs flow through thresholds taken from an org override or the system default: a score above both thresholds yields ALLOW, a score below the flag threshold yields FLAG, and a critical violation or a score below the block threshold yields BLOCK.
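The threshold logic above can be sketched roughly as follows. The helper function and the threshold values are illustrative only (the real logic runs server-side and the defaults are not documented here):

```python
# Illustrative sketch of the ALLOW / FLAG / BLOCK thresholds described above.
# The function and default values are hypothetical, not part of the SDK.
def derive_decision(score: float, has_critical_violation: bool,
                    block_below: float = 4.0, flag_below: float = 6.0) -> str:
    """Map a RAIL score (0-10) to a decision string."""
    if has_critical_violation or score < block_below:
        return "BLOCK"   # critical violation or below block threshold
    if score < flag_below:
        return "FLAG"    # below flag threshold
    return "ALLOW"       # above both thresholds

print(derive_decision(8.2, False))  # ALLOW
print(derive_decision(5.1, False))  # FLAG
print(derive_decision(7.5, True))   # BLOCK
```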
AgentDecision response fields
| Field | Type | Description |
|---|---|---|
| decision | str | ALLOW, FLAG, or BLOCK |
| decision_reason | str | Human-readable explanation for the decision |
| event_id | str | Unique event ID for audit trails |
| rail_score | RailScore | Overall score, confidence, summary |
| dimension_scores | dict | Per-dimension scores (all 8 RAIL dimensions) |
| compliance_violations | list | Detected framework violations (article, severity, description) |
| suggested_params | dict or None | Safer alternative parameters, if available |
| credits_consumed | float | Credits charged for this call |
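Assuming each entry in compliance_violations carries the article, severity, and description fields listed in the table above (the dict shape here is an assumption, as is this helper), violations can be turned into audit-log lines alongside the event_id:

```python
# Hypothetical helper: format compliance violations for an audit trail.
# Assumes each violation is a dict with "article", "severity", "description",
# as suggested by the field table above.
def format_violations(event_id: str, violations: list) -> list:
    return [
        f"[{event_id}] {v['severity'].upper()} {v['article']}: {v['description']}"
        for v in violations
    ]

lines = format_violations("evt_001", [
    {"article": "EU AI Act Art. 10", "severity": "high",
     "description": "Proxy variable for protected attribute"},
])
print(lines[0])
```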
Post-execution: evaluate_tool_result()
Scan a tool's output after it runs. Checks for PII in the result, prompt injection attempts embedded in returned data, and RAIL dimension violations.
```python
result_data = run_tool(tool_params)

risk = client.agent.evaluate_tool_result(
    tool_name="web_search",
    tool_result_data=result_data,  # dict, str, or list
    tool_params=tool_params,       # original params for context
    mode="basic",
)

print(risk.pii_detected)        # bool
print(risk.pii_entities)        # list of detected PII types
print(risk.injection_detected)  # bool
print(risk.rail_score.score)    # overall RAIL score
print(risk.redacted_result)     # result with PII replaced by [TYPE]
```

Prompt injection: check_injection()
Standalone fast classifier for prompt injection. Covers direct instruction override, role hijack, jailbreak, data exfiltration attempts, and indirect injection from external sources.
Injection Detection Flow
[Diagram] A fast pattern classifier (millisecond latency, no LLM call) checks content against the 5 attack patterns; only borderline or inconclusive cases are escalated further. The result reports detected (bool), attack_type (str), severity (critical–low), and recommended_action (block/flag/allow).
```python
check = client.agent.check_injection(
    content=user_input,
    content_source="user_message",  # "user_message" | "tool_result" | "external_data"
)

print(check.detected)            # bool
print(check.confidence)          # float 0-1
print(check.attack_type)         # "direct_instruction_override" | "role_hijack" | "jailbreak" | ...
print(check.severity)            # "critical" | "high" | "medium" | "low"
print(check.recommended_action)  # "block" | "flag" | "allow"

if check.detected and check.severity in ("critical", "high"):
    raise ValueError(f"Injection attempt blocked: {check.attack_type}")
```

Plan evaluation: evaluate_plan()
Evaluate a multi-step agent plan (up to 20 steps) before any execution begins. Returns a per-step risk assessment and an overall plan decision.
Plan Evaluation: Per-Step Assessment
[Diagram] A three-step plan (web_search with query "user address", send_email to {search_result}, update_record on table users) is blocked at step 3: a compliance violation is detected in update_record. Steps 1–2 are merely flagged, but the plan cannot proceed.
```python
from rail_score_sdk.agent.exceptions import PlanBlockedError

plan_steps = [
    {"step": 1, "tool": "web_search", "params": {"query": "user address lookup"}},
    {"step": 2, "tool": "send_email", "params": {"to": "{search_result}", "body": "..."}},
    {"step": 3, "tool": "update_record", "params": {"table": "users", "data": "..."}},
]

try:
    evaluation = client.agent.evaluate_plan(
        plan_steps=plan_steps,
        domain="general",
        mode="basic",
    )
    print(evaluation.overall_decision)    # "ALLOW" | "FLAG" | "BLOCK"
    print(evaluation.overall_risk_score)  # float 0-10
    for step in evaluation.step_evaluations:
        print(f"Step {step.step_number}: {step.decision} — {step.decision_reason}")
except PlanBlockedError as e:
    print(f"Plan blocked at step {e.blocked_step}: {e.reason}")
```

Cross-call tracking: AgentSession
AgentSession wraps the same agent methods but accumulates risk across calls. It detects patterns like repeated PII exposure, escalating risk, or blocked retries within a single workflow.
AgentSession: Cross-Call Risk Accumulation
[Diagram] A call sequence of evaluate_tool_call() on credit_scoring, evaluate_tool_result() (PII: email detected), and evaluate_tool_call() on loan_approval feeds session.risk_summary(), which reports the cross-call pattern detected.
```python
from rail_score_sdk.agent.session import AgentSession

session = AgentSession(
    client=client,
    agent_id="my-agent-001",
    max_session_age=3600,  # session expires after 1 hour
)

# Same API as client.agent — but risk is tracked cumulatively
tool_params = {"applicant_id": "usr_123", "income": 45000}
decision = session.evaluate_tool_call(
    tool_name="credit_scoring",
    tool_params=tool_params,
    domain="finance",
    mode="basic",
)

result = run_tool(tool_params)  # execute the tool as usual
risk = session.evaluate_tool_result(
    tool_name="credit_scoring",
    tool_result_data=result,
    tool_params=tool_params,
)

# Review session-wide risk summary
summary = session.risk_summary()
print(summary.total_calls)       # int
print(summary.average_risk)      # "low" | "medium" | "high" | "critical"
print(summary.dimension_trends)  # dict of per-dimension avg scores
print(summary.patterns)          # list of detected cross-call patterns

session.close()
```

Per-tool policy: AgentPolicy
Define per-tool thresholds and attach them to an AgentPolicyEngine. The engine applies the matching policy after each evaluation.
```python
from rail_score_sdk.agent.policy import AgentPolicy, AgentPolicyEngine

engine = AgentPolicyEngine(
    policies={
        "credit_scoring": AgentPolicy(
            block_below=6.0,
            flag_below=8.0,
            dimension_thresholds={"fairness": 7.0, "privacy": 7.0},
        ),
        "send_email": AgentPolicy(
            block_below=5.0,
            flag_below=7.0,
        ),
    },
    default_policy=AgentPolicy(block_below=4.0, flag_below=6.0),
)

decision = client.agent.evaluate_tool_call(
    tool_name="credit_scoring",
    tool_params=params,
    domain="finance",
    mode="basic",
)

# Engine applies the credit_scoring policy to the decision
enforced = engine.evaluate("credit_scoring", decision)
print(enforced.final_action)  # "block" | "flag" | "allow"
```

AgentMiddleware: @guard decorator
The @guard decorator from AgentMiddleware wraps any async function to evaluate its inputs before execution and optionally scan its output after.
```python
from rail_score_sdk.agent.middleware import AgentMiddleware

middleware = AgentMiddleware(
    rail_api_key="YOUR_RAIL_API_KEY",
    policy="block",     # "block" | "log_only" | "suggest_fix" | "auto_fix"
    threshold=7.0,
    domain="finance",
    scan_results=True,  # Also scan tool output after execution
)

@middleware.guard(tool_name="credit_scoring")
async def run_credit_check(applicant_id, annual_income, zip_code):
    # This function is only called if evaluation returns ALLOW or FLAG
    return await credit_api.check(applicant_id, annual_income, zip_code)

# Call as normal — RAIL evaluation happens automatically
result = await run_credit_check(
    applicant_id="usr_123",
    annual_income=65000,
    zip_code="10001",
)
```

Tool Risk Registry
Register custom risk profiles for your organization's tools. Org profiles override system defaults and allow you to define proxy variable watchlists, prohibited attributes, and per-tool thresholds.
```python
registry = client.agent.registry

# List all tools visible to your org (system defaults + your overrides)
tools = registry.list_tools()
for tool in tools:
    print(f"{tool['tool_name']}: risk={tool['risk_level']}, depth={tool['evaluation_depth']}")

# Register a custom tool profile
registry.register_tool({
    "tool_name": "loan_approval",
    "risk_level": "high",        # "low" | "medium" | "high" | "critical"
    "evaluation_depth": "deep",  # "basic" | "deep"
    "compliance_frameworks": ["eu_ai_act", "gdpr"],
    "proxy_variable_watchlist": ["zip_code", "neighborhood", "surname"],
    "prohibited_attributes": ["race", "religion", "national_origin"],
    "description": "Automated loan approval — requires deep evaluation",
})

# Remove a custom profile (reverts to system default)
registry.delete_tool("loan_approval")
```

Exceptions
| Exception | HTTP | When |
|---|---|---|
| AgentBlockedError | 403 | Tool call blocked by policy — exposes .tool_name, .decision_reason |
| PlanBlockedError | — | Plan evaluation blocked at a specific step — exposes .blocked_step, .reason |
| SessionClosedError | — | Method called on a closed AgentSession |
```python
from rail_score_sdk.agent.exceptions import (
    AgentBlockedError,
    PlanBlockedError,
    SessionClosedError,
)

try:
    decision = client.agent.evaluate_tool_call(
        tool_name="loan_approval",
        tool_params=params,
        domain="finance",
        mode="deep",
    )
except AgentBlockedError as e:
    print(f"Blocked: tool={e.tool_name}, reason={e.decision_reason}")
except PlanBlockedError as e:
    print(f"Plan blocked at step {e.blocked_step}: {e.reason}")
except SessionClosedError:
    print("Session was already closed")
```