The RAIL Framework
A single quality score can tell you content is bad, but not why or what to fix. RAIL breaks AI evaluation into 8 independent dimensions so you can pinpoint exactly which aspect failed, set per-dimension thresholds, and take targeted action.
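The 8 Dimensions
RAIL evaluates content on safety, reliability, privacy, accountability, transparency, fairness, inclusivity, and user impact.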
Why 8 Dimensions
Content can be factually accurate but biased, safe but unhelpful, or transparent but unreliable. A single score hides these tradeoffs. RAIL surfaces them so you can make informed decisions:
Targeted thresholds
Set safety: 8.0 for a healthcare bot while accepting inclusivity: 6.0 for internal tooling. One size doesn't fit all.
Actionable diagnostics
When content fails, the dimension breakdown tells you whether to fix factual claims (reliability), add disclaimers (transparency), or rephrase for equity (fairness).
Trend monitoring
Track per-dimension averages over time. A drop in accountability scores after a model update is a different problem than a drop in safety.
Compliance mapping
Dimensions map to regulatory requirements. Privacy and transparency scores feed directly into GDPR assessments; safety and accountability into EU AI Act checks.
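As a rough illustration of targeted thresholds, the sketch below checks a set of per-dimension scores against per-dimension minimums and reports which dimensions fall short. The function name, the threshold values, and the shape of the `scores` dictionary are assumptions made for this example, not part of the RAIL API.

```python
# Hypothetical per-dimension minimums. Dimension names follow this document;
# the numbers are illustrative only.
THRESHOLDS = {"safety": 8.0, "reliability": 7.0, "inclusivity": 6.0}


def failing_dimensions(scores: dict[str, float],
                       thresholds: dict[str, float]) -> list[str]:
    """Return the dimensions whose score is below its minimum, sorted by name.

    A dimension missing from `scores` is treated as 0.0 and therefore fails.
    """
    return sorted(
        dim for dim, minimum in thresholds.items()
        if scores.get(dim, 0.0) < minimum
    )


scores = {"safety": 8.4, "reliability": 6.1, "inclusivity": 7.0}
print(failing_dimensions(scores, THRESHOLDS))  # ['reliability']
```

Returning the failing dimension names, rather than a single pass/fail flag, keeps the diagnostic information needed to decide what to fix.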
How Scoring Works
Each of the 8 dimensions is scored from 0 to 10. The overall RAIL Score is a weighted average (equal weights by default, customizable per request).
Every score includes a confidence value (0.0–1.0) that reflects how certain the evaluation is. Confidence drops when content is ambiguous or domain-specific, or when the evaluator encounters conflicting signals. Treat scores with confidence below 0.5 as approximate.
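One way to act on the 0.5 guideline is to flag low-confidence dimensions before making a gating decision. The sketch below assumes a hypothetical `DimensionScore` record holding a score and its confidence; the actual response shape may differ.

```python
from dataclasses import dataclass

APPROXIMATE_BELOW = 0.5  # confidence cutoff stated in this document


@dataclass
class DimensionScore:
    """Hypothetical container for one dimension's result."""
    score: float       # 0 to 10
    confidence: float  # 0.0 to 1.0


def approximate_dimensions(result: dict[str, DimensionScore]) -> list[str]:
    """Return the dimensions whose score should be read as approximate."""
    return sorted(
        dim for dim, item in result.items()
        if item.confidence < APPROXIMATE_BELOW
    )


result = {
    "safety": DimensionScore(score=9.0, confidence=0.92),
    "reliability": DimensionScore(score=6.5, confidence=0.41),
}
print(approximate_dimensions(result))  # ['reliability']
```

A caller might route flagged dimensions to human review instead of trusting the number outright.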
Basic mode
- Per-dimension scores + confidence
- Overall RAIL score + label
- Cached for 5 minutes
Best for: content gating, CI pipelines, high-volume monitoring
Deep mode
- Everything in basic, plus:
- Per-dimension explanations
- Issue tags with severity (e.g., hallucination, pii_exposure)
- Improvement suggestions
- Cached for 3 minutes
Best for: content review, debugging low scores, audit trails
The overall score carries one of five labels. From highest to lowest, they indicate:
- Meets the highest standards across all evaluated dimensions.
- Strong performance with minor areas for improvement.
- Noticeable gaps that should be addressed before deployment.
- Significant issues: content may cause harm or mislead users.
- Severe failures: content should not be served to users.
Acting on Scores
How you use RAIL scores depends on your risk tolerance and use case. Here are common patterns:
| Score Range | Recommended Action | Notes |
|---|---|---|
| 9–10 | Serve directly | Content passes all quality gates. Deliver to the user. |
| 7–8.9 | Serve, optionally log for review | Good enough for most use cases. Log low-scoring dimensions for trend monitoring. |
| 5–6.9 | Regenerate or flag for human review | Use the Safe-Regenerate API to iteratively improve before serving. |
| 3–4.9 | Block and queue for manual review | Content has significant issues. Regeneration may help but human review is recommended. |
| 0–2.9 | Block immediately | Critical failures: harmful content, fabricated claims, or privacy violations. Do not serve or regenerate. |
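The table translates naturally into a small routing function. This is a sketch only: the action names are hypothetical labels chosen for the example, and the function compares against each band's lower bound so that values falling between the printed ranges (such as 8.95) still map to an action.

```python
def recommended_action(overall: float) -> str:
    """Map an overall RAIL score (0 to 10) to an action from the table above."""
    if not 0.0 <= overall <= 10.0:
        raise ValueError(f"score out of range: {overall}")
    if overall >= 9.0:
        return "serve"
    if overall >= 7.0:
        return "serve_and_log"
    if overall >= 5.0:
        return "regenerate_or_review"
    if overall >= 3.0:
        return "block_and_review"
    return "block"


for value in (9.4, 7.0, 6.2, 3.5, 1.0):
    print(value, recommended_action(value))
```

Teams with a lower risk tolerance can raise the cutoffs; the structure stays the same.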
Privacy dimension note: When privacy is not relevant to the content (e.g., "How do I center a div?"), the privacy score is fixed at 5.0 with confidence 1.0. This is expected behavior, not a failure. Do not gate content on a neutral privacy score.
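A gating check can skip the neutral privacy result explicitly. The sketch below assumes each dimension result is available as a `(score, confidence)` pair and that a privacy result of exactly 5.0 with confidence 1.0 always means "not relevant"; a genuinely scored 5.0 with full confidence would be indistinguishable under this check, so treat that as a simplification.

```python
NEUTRAL_PRIVACY = (5.0, 1.0)  # (score, confidence) described in the note above


def gated_failures(results: dict[str, tuple[float, float]],
                   thresholds: dict[str, float]) -> list[str]:
    """Return dimensions below their minimum, ignoring a neutral privacy score."""
    failures = []
    for dim, minimum in thresholds.items():
        score, confidence = results[dim]
        if dim == "privacy" and (score, confidence) == NEUTRAL_PRIVACY:
            continue  # privacy was not relevant to this content
        if score < minimum:
            failures.append(dim)
    return failures


thresholds = {"safety": 8.0, "privacy": 7.0}
results = {"safety": (9.1, 0.9), "privacy": (5.0, 1.0)}
print(gated_failures(results, thresholds))  # []
```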
Custom Weights
By default, all 8 dimensions contribute equally to the overall RAIL Score. Pass a weights object to shift emphasis toward the dimensions that matter most for your use case. Weights must sum to 100.
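Because weights must sum to 100, it can help to validate a weights object on the client before sending a request. The helper below is a hypothetical sketch; whether the API requires all 8 dimensions to be present is not stated here, so the check only rejects unknown names and a wrong total.

```python
DIMENSIONS = (
    "safety", "reliability", "privacy", "accountability",
    "transparency", "fairness", "inclusivity", "user_impact",
)


def validate_weights(weights: dict[str, float]) -> None:
    """Raise ValueError if weights use unknown names or do not sum to 100."""
    unknown = sorted(set(weights) - set(DIMENSIONS))
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")
    total = sum(weights.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"weights sum to {total}, expected 100")


validate_weights({
    "safety": 30, "reliability": 25, "privacy": 20, "accountability": 10,
    "transparency": 5, "fairness": 5, "inclusivity": 3, "user_impact": 2,
})  # passes silently
```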
Healthcare
Patient safety and factual accuracy are non-negotiable
Customer Support
Helpful, accessible responses that serve all users equitably
Legal / Finance
Verifiable claims with traceable reasoning and clear assumptions
{
  "content": "Based on your symptoms...",
  "mode": "deep",
  "domain": "healthcare",
  "weights": {
    "safety": 30, "reliability": 25, "privacy": 20,
    "accountability": 10, "transparency": 5,
    "fairness": 5, "inclusivity": 3, "user_impact": 2
  }
}
Dimensions in Compliance Checks
When you use the Compliance API, each regulatory framework maps to a relevant subset of RAIL dimensions. The compliance score draws from these dimension evaluations.
| Framework | Primary Dimensions |
|---|---|
| GDPR | Privacy, Transparency, Accountability |
| HIPAA | Privacy, Safety, Reliability, Accountability |
| EU AI Act | Transparency, Fairness, Safety, Accountability |
| CCPA | Privacy, Transparency |
| India DPDP | Privacy, Transparency, Accountability |
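The mapping in the table can be mirrored as a lookup to see which dimension scores matter for a given framework. This sketch only selects the relevant scores and identifies the weakest one; how the Compliance API actually combines them into a compliance score is not described here, so no aggregation formula is assumed. The function names are hypothetical.

```python
FRAMEWORK_DIMENSIONS = {
    "GDPR": ("privacy", "transparency", "accountability"),
    "HIPAA": ("privacy", "safety", "reliability", "accountability"),
    "EU AI Act": ("transparency", "fairness", "safety", "accountability"),
    "CCPA": ("privacy", "transparency"),
    "India DPDP": ("privacy", "transparency", "accountability"),
}


def relevant_scores(framework: str, scores: dict[str, float]) -> dict[str, float]:
    """Select the dimension scores that the given framework draws on."""
    return {dim: scores[dim] for dim in FRAMEWORK_DIMENSIONS[framework]}


def weakest_dimension(framework: str, scores: dict[str, float]) -> str:
    """Return the lowest-scoring dimension among those relevant to the framework."""
    subset = relevant_scores(framework, scores)
    return min(subset, key=subset.get)


scores = {
    "safety": 9.0, "reliability": 8.0, "privacy": 6.5, "accountability": 7.5,
    "transparency": 8.5, "fairness": 9.0, "inclusivity": 8.0, "user_impact": 8.0,
}
print(relevant_scores("CCPA", scores))    # {'privacy': 6.5, 'transparency': 8.5}
print(weakest_dimension("HIPAA", scores))  # privacy
```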