Agent Evaluation
client.agent
Evaluate agentic tool calls before execution, scan results after, detect prompt injection, and accumulate risk across a session. Available in rail-score-sdk v2.4.0+.
Agent Evaluation Lifecycle
[Diagram] Pre-execution evaluation (1.5–3.0 credits) produces a decision: on ALLOW or FLAG the path continues to tool execution (FLAG executes with logging), while BLOCK raises AgentBlockedError. Post-execution scanning costs 0.5–1.0 credits.
Scope required: All agent endpoints require agent:evaluate scope on your API key. Keys created before v2.4.0 are automatically grandfathered in.
| Operation | Basic | Deep |
|---|---|---|
| evaluate_tool_call() | 1.5 credits | 3.0 credits |
| evaluate_tool_result() | 0.5 credits | 1.0 credits |
| check_injection() | 0.5 credits (flat) | 0.5 credits (flat) |
| evaluate_plan() | 1.5 per step | 3.0 per step |
| registry operations | 0 credits | 0 credits |
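Based on the pricing table above, the credit cost of a workflow can be estimated before running it. The helper below is an illustrative sketch, not part of the SDK; the per-call rates are copied from the table.

```python
# Illustrative cost estimator based on the credit table above (not an SDK API).
PER_CALL = {
    ("evaluate_tool_call", "basic"): 1.5,
    ("evaluate_tool_call", "deep"): 3.0,
    ("evaluate_tool_result", "basic"): 0.5,
    ("evaluate_tool_result", "deep"): 1.0,
    ("check_injection", "basic"): 0.5,   # flat rate, same in both modes
    ("check_injection", "deep"): 0.5,
}

def estimate_plan_credits(num_steps: int, mode: str = "basic") -> float:
    """evaluate_plan() charges per step: 1.5 (basic) or 3.0 (deep)."""
    per_step = 1.5 if mode == "basic" else 3.0
    return num_steps * per_step

print(estimate_plan_credits(3, "basic"))  # 4.5
print(estimate_plan_credits(3, "deep"))   # 9.0
```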
Pre-execution: evaluate_tool_call()
Call this before running a tool. Returns an AgentDecision with a decision of ALLOW, FLAG, or BLOCK. A BLOCK decision raises AgentBlockedError if your policy is set to block.
```python
from rail_score_sdk import RailScoreClient
from rail_score_sdk.agent.exceptions import AgentBlockedError

client = RailScoreClient(api_key="YOUR_RAIL_API_KEY")

try:
    decision = client.agent.evaluate_tool_call(
        tool_name="credit_scoring",
        tool_params={
            "applicant_id": "usr_123",
            "zip_code": "10001",
            "annual_income": 65000,
        },
        domain="finance",
        mode="basic",  # "basic" | "deep"
        compliance_frameworks=["eu_ai_act", "gdpr"],
    )

    print(decision.decision)               # "ALLOW" | "FLAG" | "BLOCK"
    print(decision.decision_reason)        # human-readable reason
    print(decision.rail_score.score)       # overall RAIL score (0-10)
    print(decision.compliance_violations)  # list of detected violations
    print(decision.suggested_params)       # safer parameter alternatives (if any)
    print(decision.credits_consumed)       # float

    if decision.decision == "FLAG":
        # Proceed with caution — log and monitor
        result = run_tool(decision.suggested_params or tool_params)
except AgentBlockedError as e:
    print(f"Tool call blocked: {e.tool_name}")
    print(f"Reason: {e.decision_reason}")
```

What Determines the Decision
[Diagram] Inputs flow through thresholds taken from an org override or the system default: a score above both thresholds yields ALLOW, a score below the flag threshold yields FLAG, and a critical violation or a score below the block threshold yields BLOCK.
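The threshold logic above can be sketched roughly as follows. The helper function and the threshold values are illustrative only (the real logic runs server-side and the defaults are not documented here):

```python
# Illustrative sketch of the ALLOW / FLAG / BLOCK thresholds described above.
# The function and default values are hypothetical, not part of the SDK.
def derive_decision(score: float, has_critical_violation: bool,
                    block_below: float = 4.0, flag_below: float = 6.0) -> str:
    """Map a RAIL score (0-10) to a decision string."""
    if has_critical_violation or score < block_below:
        return "BLOCK"   # critical violation or below block threshold
    if score < flag_below:
        return "FLAG"    # below flag threshold
    return "ALLOW"       # above both thresholds

print(derive_decision(8.2, False))  # ALLOW
print(derive_decision(5.1, False))  # FLAG
print(derive_decision(7.5, True))   # BLOCK
```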
AgentDecision response fields
| Field | Type | Description |
|---|---|---|
| decision | str | ALLOW, FLAG, or BLOCK |
| decision_reason | str | Human-readable explanation for the decision |
| event_id | str | Unique event ID for audit trails |
| rail_score | RailScore | Overall score, confidence, summary |
| dimension_scores | dict | Per-dimension scores (all 8 RAIL dimensions) |
| compliance_violations | list | Detected framework violations (article, severity, description) |
| suggested_params | dict or None | Safer alternative parameters, if available |
| credits_consumed | float | Credits charged for this call |
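Assuming each entry in compliance_violations carries the article, severity, and description fields listed in the table above (the dict shape here is an assumption, as is this helper), violations can be turned into audit-log lines alongside the event_id:

```python
# Hypothetical helper: format compliance violations for an audit trail.
# Assumes each violation is a dict with "article", "severity", "description",
# as suggested by the field table above.
def format_violations(event_id: str, violations: list) -> list:
    return [
        f"[{event_id}] {v['severity'].upper()} {v['article']}: {v['description']}"
        for v in violations
    ]

lines = format_violations("evt_001", [
    {"article": "EU AI Act Art. 10", "severity": "high",
     "description": "Proxy variable for protected attribute"},
])
print(lines[0])
```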
Post-execution: evaluate_tool_result()
Scan a tool's output after it runs. Checks for PII in the result, prompt injection attempts embedded in returned data, and RAIL dimension violations.
```python
result_data = run_tool(tool_params)

risk = client.agent.evaluate_tool_result(
    tool_name="web_search",
    tool_result_data=result_data,  # dict, str, or list
    tool_params=tool_params,       # original params for context
    mode="basic",
)

print(risk.pii_detected)        # bool
print(risk.pii_entities)        # list of detected PII types
print(risk.injection_detected)  # bool
print(risk.rail_score.score)    # overall RAIL score
print(risk.redacted_result)     # result with PII replaced by [TYPE]
```

Prompt injection: check_injection()
Standalone fast classifier for prompt injection. Covers direct instruction override, role hijack, jailbreak, data exfiltration attempts, and indirect injection from external sources.
Injection Detection Flow
[Diagram] A fast pattern classifier (millisecond latency, no LLM call) checks content against the 5 attack patterns; only borderline or inconclusive cases are escalated further. The result reports detected (bool), attack_type (str), severity (critical–low), and recommended_action (block/flag/allow).
```python
check = client.agent.check_injection(
    content=user_input,
    content_source="user_message",  # "user_message" | "tool_result" | "external_data"
)

print(check.detected)            # bool
print(check.confidence)          # float 0-1
print(check.attack_type)         # "direct_instruction_override" | "role_hijack" | "jailbreak" | ...
print(check.severity)            # "critical" | "high" | "medium" | "low"
print(check.recommended_action)  # "block" | "flag" | "allow"

if check.detected and check.severity in ("critical", "high"):
    raise ValueError(f"Injection attempt blocked: {check.attack_type}")
```

Plan evaluation: evaluate_plan()
Evaluate a multi-step agent plan (up to 20 steps) before any execution begins. Returns a per-step risk assessment and an overall plan decision.
Plan Evaluation: Per-Step Assessment
[Diagram] A three-step plan (web_search with query "user address", send_email to {search_result}, update_record on table users) is blocked at step 3: a compliance violation is detected in update_record. Steps 1–2 are merely flagged, but the plan cannot proceed.
```python
from rail_score_sdk.agent.exceptions import PlanBlockedError

plan_steps = [
    {"step": 1, "tool": "web_search", "params": {"query": "user address lookup"}},
    {"step": 2, "tool": "send_email", "params": {"to": "{search_result}", "body": "..."}},
    {"step": 3, "tool": "update_record", "params": {"table": "users", "data": "..."}},
]

try:
    evaluation = client.agent.evaluate_plan(
        plan_steps=plan_steps,
        domain="general",
        mode="basic",
    )
    print(evaluation.overall_decision)    # "ALLOW" | "FLAG" | "BLOCK"
    print(evaluation.overall_risk_score)  # float 0-10
    for step in evaluation.step_evaluations:
        print(f"Step {step.step_number}: {step.decision} — {step.decision_reason}")
except PlanBlockedError as e:
    print(f"Plan blocked at step {e.blocked_step}: {e.reason}")
```

Cross-call tracking: AgentSession
AgentSession wraps the same agent methods but accumulates risk across calls. It detects patterns like repeated PII exposure, escalating risk, or blocked retries within a single workflow.
AgentSession: Cross-Call Risk Accumulation
[Diagram] A call sequence of evaluate_tool_call() on credit_scoring, evaluate_tool_result() (PII: email detected), and evaluate_tool_call() on loan_approval feeds session.risk_summary(), which reports the cross-call pattern detected.
```python
from rail_score_sdk.agent.session import AgentSession

session = AgentSession(
    client=client,
    agent_id="my-agent-001",
    max_session_age=3600,  # session expires after 1 hour
)

# Same API as client.agent — but risk is tracked cumulatively
tool_params = {"applicant_id": "usr_123", "income": 45000}
decision = session.evaluate_tool_call(
    tool_name="credit_scoring",
    tool_params=tool_params,
    domain="finance",
    mode="basic",
)

result = run_tool(tool_params)  # execute the tool as usual
risk = session.evaluate_tool_result(
    tool_name="credit_scoring",
    tool_result_data=result,
    tool_params=tool_params,
)

# Review session-wide risk summary
summary = session.risk_summary()
print(summary.total_calls)       # int
print(summary.average_risk)      # "low" | "medium" | "high" | "critical"
print(summary.dimension_trends)  # dict of per-dimension avg scores
print(summary.patterns)          # list of detected cross-call patterns

session.close()
```

Per-tool policy: AgentPolicy
Define per-tool thresholds and attach them to an AgentPolicyEngine. The engine applies the matching policy after each evaluation.
```python
from rail_score_sdk.agent.policy import AgentPolicy, AgentPolicyEngine

engine = AgentPolicyEngine(
    policies={
        "credit_scoring": AgentPolicy(
            block_below=6.0,
            flag_below=8.0,
            dimension_thresholds={"fairness": 7.0, "privacy": 7.0},
        ),
        "send_email": AgentPolicy(
            block_below=5.0,
            flag_below=7.0,
        ),
    },
    default_policy=AgentPolicy(block_below=4.0, flag_below=6.0),
)

decision = client.agent.evaluate_tool_call(
    tool_name="credit_scoring",
    tool_params=params,
    domain="finance",
    mode="basic",
)

# Engine applies the credit_scoring policy to the decision
enforced = engine.evaluate("credit_scoring", decision)
print(enforced.final_action)  # "block" | "flag" | "allow"
```

AgentMiddleware: @guard decorator
The @guard decorator from AgentMiddleware wraps any async function to evaluate its inputs before execution and optionally scan its output after.
```python
from rail_score_sdk.agent.middleware import AgentMiddleware

middleware = AgentMiddleware(
    rail_api_key="YOUR_RAIL_API_KEY",
    policy="block",     # "block" | "log_only" | "suggest_fix" | "auto_fix"
    threshold=7.0,
    domain="finance",
    scan_results=True,  # Also scan tool output after execution
)

@middleware.guard(tool_name="credit_scoring")
async def run_credit_check(applicant_id, annual_income, zip_code):
    # This function is only called if evaluation returns ALLOW or FLAG
    return await credit_api.check(applicant_id, annual_income, zip_code)

# Call as normal — RAIL evaluation happens automatically
result = await run_credit_check(
    applicant_id="usr_123",
    annual_income=65000,
    zip_code="10001",
)
```

Tool Risk Registry
Register custom risk profiles for your organization's tools. Org profiles override system defaults and allow you to define proxy variable watchlists, prohibited attributes, and per-tool thresholds.
```python
registry = client.agent.registry

# List all tools visible to your org (system defaults + your overrides)
tools = registry.list_tools()
for tool in tools:
    print(f"{tool['tool_name']}: risk={tool['risk_level']}, depth={tool['evaluation_depth']}")

# Register a custom tool profile
registry.register_tool({
    "tool_name": "loan_approval",
    "risk_level": "high",        # "low" | "medium" | "high" | "critical"
    "evaluation_depth": "deep",  # "basic" | "deep"
    "compliance_frameworks": ["eu_ai_act", "gdpr"],
    "proxy_variable_watchlist": ["zip_code", "neighborhood", "surname"],
    "prohibited_attributes": ["race", "religion", "national_origin"],
    "description": "Automated loan approval — requires deep evaluation",
})

# Remove a custom profile (reverts to system default)
registry.delete_tool("loan_approval")
```

Exceptions
| Exception | HTTP | When |
|---|---|---|
| AgentBlockedError | 403 | Tool call blocked by policy — exposes .tool_name, .decision_reason |
| PlanBlockedError | — | Plan evaluation blocked at a specific step — exposes .blocked_step, .reason |
| SessionClosedError | — | Method called on a closed AgentSession |
```python
from rail_score_sdk.agent.exceptions import (
    AgentBlockedError,
    PlanBlockedError,
    SessionClosedError,
)

try:
    decision = client.agent.evaluate_tool_call(
        tool_name="loan_approval",
        tool_params=params,
        domain="finance",
        mode="deep",
    )
except AgentBlockedError as e:
    print(f"Blocked: tool={e.tool_name}, reason={e.decision_reason}")
except PlanBlockedError as e:
    print(f"Plan blocked at step {e.blocked_step}: {e.reason}")
except SessionClosedError:
    print("Session was already closed")
```