Evaluation
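Score AI-generated content across the 8 RAIL dimensions (or a chosen subset) with client.eval(). It returns an overall score with a confidence value, per-dimension scores, and, in deep mode, optional per-dimension explanations, issues, and suggestions.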
client.eval()
Basic mode
```python
# Assumes client is an already-initialized RAIL Score client.
result = client.eval(
    content="There are several natural approaches that may help with insomnia. Establishing a consistent sleep schedule, limiting screen time before bed, and creating a cool, dark sleeping environment are well-supported strategies. If sleep problems persist, consulting a healthcare provider is recommended.",
    mode="basic",
)

print(result.rail_score.score)       # 8.4 (float 0–10)
print(result.rail_score.confidence)  # 0.85 (float 0–1)
print(result.rail_score.summary)     # "RAIL Score: 8.4/10 — Good"

for dim, detail in result.dimension_scores.items():
    print(f"  {dim}: {detail.score} (confidence: {detail.confidence})")

print(result.from_cache)  # False
```

Deep mode: explanations, issues, and suggestions
```python
result = client.eval(
    content="When reviewing resumes, prioritize candidates from top-tier universities like Stanford and MIT. Candidates from lesser-known institutions typically lack the rigorous training needed for this role.",
    mode="deep",
    include_explanations=True,
    include_issues=True,
    include_suggestions=True,
)

for dim, detail in result.dimension_scores.items():
    print(f"\n{dim}: {detail.score}/10")
    if detail.explanation:
        print(f"  Explanation: {detail.explanation}")
    if detail.issues:
        print(f"  Issues: {detail.issues}")  # e.g. ["biased_framing", "demographic_assumption"]
    if detail.suggestions:
        print(f"  Suggestions: {detail.suggestions}")
```

Selective dimensions
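Pass dimensions to score only a subset of the eight RAIL dimensions. Because the names are plain strings, a misspelling is easy to miss. The sketch below checks a list locally before it is sent. It assumes the valid names are exactly the eight dimension_scores keys shown in the EvalResult example further down; RAIL_DIMENSIONS, unknown_dimensions, and validate_dimensions are hypothetical helpers, not part of the SDK.

```python
# Hypothetical helpers, not part of the RAIL Score SDK.
# Assumes the valid names are the eight dimension_scores keys in the EvalResult example.
RAIL_DIMENSIONS = frozenset({
    "fairness", "safety", "reliability", "transparency",
    "privacy", "accountability", "inclusivity", "user_impact",
})


def unknown_dimensions(dimensions: list[str]) -> list[str]:
    """Return, sorted, the names in dimensions that are not RAIL dimensions."""
    return sorted(set(dimensions) - RAIL_DIMENSIONS)


def validate_dimensions(dimensions: list[str]) -> list[str]:
    """Return dimensions as a list, or raise ValueError naming any unknown entries."""
    unknown = unknown_dimensions(dimensions)
    if unknown:
        raise ValueError(f"unknown dimension(s): {', '.join(unknown)}")
    return list(dimensions)


print(validate_dimensions(["privacy", "safety"]))  # ['privacy', 'safety']
print(unknown_dimensions(["privacy", "saftey"]))   # ['saftey']
```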
```python
result = client.eval(
    content="To reset your password, click the link sent to john.doe@company.com. Your employee ID is EMP-29481.",
    mode="basic",
    dimensions=["privacy", "safety"],
)
```

Custom weights
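The weights dict must sum to 100 (see the parameter table). If relative importance is easier to express as ratios, a helper can rescale them first. to_percent_weights is a hypothetical helper, not part of the SDK. It assumes integer weights are acceptable, as in the example that follows, and it uses the largest-remainder method so that the rounded values always total exactly 100.

```python
import math


# Hypothetical helper, not part of the RAIL Score SDK.
def to_percent_weights(relative: dict[str, float]) -> dict[str, int]:
    """Rescale non-negative relative weights to integers that sum to exactly 100.

    Largest-remainder method: floor every share, then give the leftover points
    to the entries with the largest fractional parts.
    """
    if not relative:
        raise ValueError("weights must not be empty")
    if any(w < 0 for w in relative.values()):
        raise ValueError("weights must be non-negative")
    total = sum(relative.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    exact = {dim: 100 * w / total for dim, w in relative.items()}
    floored = {dim: math.floor(share) for dim, share in exact.items()}
    leftover = 100 - sum(floored.values())
    # Stable sort: ties keep their original order.
    by_remainder = sorted(exact, key=lambda d: exact[d] - floored[d], reverse=True)
    for dim in by_remainder[:leftover]:
        floored[dim] += 1
    return floored


weights = to_percent_weights({"safety": 3, "reliability": 2, "privacy": 2})
print(weights, sum(weights.values()))
# {'safety': 43, 'reliability': 29, 'privacy': 28} 100
```

Integer weights that already sum to 100, such as the healthcare set in the example that follows, come back unchanged.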
```python
result = client.eval(
    content="Based on your symptoms of chest tightness, you should take 325mg aspirin immediately. This is likely a mild cardiac event that will resolve on its own.",
    mode="deep",
    domain="healthcare",
    weights={"safety": 25, "reliability": 20, "privacy": 20, "accountability": 15,
             "transparency": 10, "fairness": 5, "inclusivity": 3, "user_impact": 2},  # sums to 100
)
# Safety (25) carries the most weight in the overall score, then reliability and privacy (20 each).
```

Modes:

| Mode | Returns | Cache duration | Cost per call |
|---|---|---|---|
| basic | Scores only | 5 min | 1.0 credit |
| deep | Scores plus explanations and issues | 3 min | 3.0 credits |
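The per-call costs listed above make it straightforward to budget a batch. The sketch below is hypothetical (estimate_credits and CREDITS_PER_CALL are not part of the SDK). It assumes every call is billed at its mode's listed rate. This page does not say how responses served from cache are billed, so treat the result as an estimate.

```python
# Hypothetical budgeting helper, not part of the RAIL Score SDK.
# Per-call credit costs as listed for each mode.
CREDITS_PER_CALL = {"basic": 1.0, "deep": 3.0}


def estimate_credits(calls: dict[str, int]) -> float:
    """Total credits for a planned batch, e.g. {"basic": 200, "deep": 15}.

    Assumes every call is billed at the listed rate, including cache hits.
    """
    total = 0.0
    for mode, count in calls.items():
        if mode not in CREDITS_PER_CALL:
            raise ValueError(f"unknown mode: {mode!r}")
        if count < 0:
            raise ValueError("call counts must be non-negative")
        total += CREDITS_PER_CALL[mode] * count
    return total


print(estimate_credits({"basic": 200, "deep": 15}))  # 245.0
```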
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| content | str | Required | AI-generated text to evaluate (10–10,000 chars) |
| mode | str | "basic" | "basic" or "deep" |
| dimensions | list[str] | all 8 | Subset of dimensions to score |
| weights | dict | equal | Per-dimension weights (must sum to 100) |
| domain | str | "general" | Content domain: "general", "healthcare", "finance", "legal" |
| include_explanations | bool | False | Per-dimension explanations (deep mode) |
| include_issues | bool | False | Issue tags per dimension (deep mode) |
| include_suggestions | bool | False | Improvement suggestions (deep mode) |
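Most of the limits in this table can be checked before a request is sent. check_eval_params is a hypothetical helper, not part of the SDK. It encodes only the constraints stated in the table, assumes content length is counted in Python str characters, and returns a list of problems instead of raising so that every problem is reported at once.

```python
import math
from typing import Optional

# Hypothetical pre-flight check, not part of the RAIL Score SDK.
# It encodes only the limits stated in the parameter table.
VALID_MODES = ("basic", "deep")
VALID_DOMAINS = ("general", "healthcare", "finance", "legal")


def check_eval_params(
    content: str,
    mode: str = "basic",
    domain: str = "general",
    weights: Optional[dict[str, float]] = None,
    include_explanations: bool = False,
    include_issues: bool = False,
    include_suggestions: bool = False,
) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    if not 10 <= len(content) <= 10_000:
        problems.append(f"content is {len(content)} characters; expected 10 to 10,000")
    if mode not in VALID_MODES:
        problems.append(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if domain not in VALID_DOMAINS:
        problems.append(f"domain must be one of {VALID_DOMAINS}, got {domain!r}")
    if weights is not None and not math.isclose(sum(weights.values()), 100):
        problems.append(f"weights sum to {sum(weights.values())}; expected 100")
    deep_flags = {
        "include_explanations": include_explanations,
        "include_issues": include_issues,
        "include_suggestions": include_suggestions,
    }
    for name, enabled in deep_flags.items():
        if enabled and mode != "deep":
            problems.append(f"{name} is listed as a deep-mode option")
    return problems


for problem in check_eval_params("Too short", include_issues=True):
    print(problem)
# content is 9 characters; expected 10 to 10,000
# include_issues is listed as a deep-mode option
```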
Response: EvalResult
```
{
  "rail_score": {
    "score": 8.4,              # float 0–10
    "confidence": 0.85,        # float 0–1
    "summary": "RAIL Score: 8.4/10 — Good"
  },
  "dimension_scores": {
    "fairness":       {"score": 9.0, "confidence": 0.90},
    "safety":         {"score": 9.2, "confidence": 0.92},
    "reliability":    {"score": 8.1, "confidence": 0.88},
    "transparency":   {"score": 7.8, "confidence": 0.82},
    "privacy":        {"score": 5.0, "confidence": 1.00},  # 5.0 = N/A (dimension not applicable)
    "accountability": {"score": 8.5, "confidence": 0.86},
    "inclusivity":    {"score": 8.9, "confidence": 0.91},
    "user_impact":    {"score": 8.7, "confidence": 0.89}
  },
  # Deep mode can add per-dimension "explanation", "issues", and "suggestions" fields
  "from_cache": false
}
```

Score Labels
| Range | Label | Meaning |
|---|---|---|
| 9.0 – 10.0 | Excellent | Meets or exceeds all expectations |
| 7.0 – 8.9 | Good | Acceptable with minor gaps |
| 5.0 – 6.9 | Needs Improvement | Significant issues present |
| 3.0 – 4.9 | Poor | Multiple major problems |
| 0.0 – 2.9 | Critical | Severe violations, likely blocked |
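The same bands can be applied locally, for example to label individual dimension scores. rail_label is a hypothetical helper, not part of the SDK. It assumes each band's lower bound is inclusive, so a score that falls between two listed ranges (such as 8.95) takes the lower band's label.

```python
# Hypothetical helper, not part of the RAIL Score SDK.
# Lower bounds from the Score Labels table, highest first; each bound is inclusive.
LABEL_BANDS = [
    (9.0, "Excellent"),
    (7.0, "Good"),
    (5.0, "Needs Improvement"),
    (3.0, "Poor"),
    (0.0, "Critical"),
]


def rail_label(score: float) -> str:
    """Map a 0-10 RAIL score to its label."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    for lower_bound, label in LABEL_BANDS:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: the 0.0 band matches every valid score")


print(rail_label(8.4))  # Good
```

Because a privacy score of 5.0 stands for N/A in the response example, check for that case before labeling; otherwise rail_label reports it as Needs Improvement.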