The RAIL Framework

A single quality score can tell you content is bad, but not why or what to fix. RAIL breaks AI evaluation into 8 independent dimensions so you can pinpoint exactly which aspect failed, set per-dimension thresholds, and take targeted action.

The 8 Dimensions

Fairness, Safety, Reliability, Transparency, Privacy, Accountability, Inclusivity, User Impact

Why 8 Dimensions

Content can be factually accurate but biased, safe but unhelpful, or transparent but unreliable. A single score hides these tradeoffs. RAIL surfaces them so you can make informed decisions:

Targeted thresholds

Require a safety score of at least 8.0 for a healthcare bot while accepting an inclusivity score of 6.0 for internal tooling. One size doesn't fit all.

Actionable diagnostics

When content fails, the dimension breakdown tells you whether to fix factual claims (reliability), add disclaimers (transparency), or rephrase for equity (fairness).

Trend monitoring

Track per-dimension averages over time. A drop in accountability scores after a model update is a different problem than a drop in safety.

Compliance mapping

Dimensions map to regulatory requirements. Privacy and transparency scores feed directly into GDPR assessments; safety and accountability into EU AI Act checks.
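
As a rough illustration of targeted thresholds and actionable diagnostics, the sketch below compares per-dimension scores against per-dimension minimums and reports which ones fell short. The `THRESHOLDS` values, the `failing_dimensions` helper, and the shape of the `scores` dictionary are assumptions made for this example, not part of the RAIL API.

```python
# Hypothetical per-dimension minimums; the numbers are illustrative only.
THRESHOLDS = {"safety": 8.0, "reliability": 7.0, "inclusivity": 6.0}


def failing_dimensions(scores, thresholds):
    """Return {dimension: (score, minimum)} for each dimension below its minimum.

    Dimensions without a configured minimum are not checked.
    """
    return {
        dim: (scores[dim], minimum)
        for dim, minimum in thresholds.items()
        if dim in scores and scores[dim] < minimum
    }


# Assumed shape: a flat mapping of dimension name to a 0-10 score.
scores = {"safety": 8.4, "reliability": 6.1, "inclusivity": 7.2, "fairness": 9.0}
failed = failing_dimensions(scores, THRESHOLDS)
print(failed)  # {'reliability': (6.1, 7.0)}
```

Here only reliability misses its minimum, which points toward fixing factual claims rather than, say, adding disclaimers.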

How Scoring Works

Each of the 8 dimensions is scored from 0 to 10. The overall RAIL Score is a weighted average (equal weights by default, customizable per request).
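
The weighted average can be sketched as follows. This is a minimal illustration that assumes the overall score is the sum of score times weight divided by the sum of weights; the `overall_score` function and the sample scores are hypothetical, and the service may round or normalize differently.

```python
def overall_score(scores, weights=None):
    """Weighted average of per-dimension scores (equal weights when none are given)."""
    if weights is None:
        weights = {dim: 1.0 for dim in scores}
    total_weight = sum(weights[dim] for dim in scores)
    if total_weight <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(scores[dim] * weights[dim] for dim in scores) / total_weight


# Made-up scores for the 8 dimensions.
scores = {
    "fairness": 8.0, "safety": 9.0, "reliability": 7.0, "transparency": 6.0,
    "privacy": 5.0, "accountability": 7.0, "inclusivity": 8.0, "user_impact": 6.0,
}
print(overall_score(scores))  # 7.0

# Shifting emphasis toward safety, reliability, and privacy changes the result.
healthcare = {
    "safety": 30, "reliability": 25, "privacy": 20, "accountability": 10,
    "transparency": 5, "fairness": 5, "inclusivity": 3, "user_impact": 2,
}
print(overall_score(scores, healthcare))  # 7.21
```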

Every score includes a confidence value (0.0–1.0) that reflects how certain the evaluation is. Confidence drops when content is ambiguous or domain-specific, or when the evaluator encounters conflicting signals. Treat scores with confidence below 0.5 as approximate.
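
One way to apply the 0.5 confidence guideline is to separate firm scores from approximate ones before acting on them. The `DimensionResult` type and `split_by_confidence` helper below are hypothetical names for this sketch; the actual response format may differ.

```python
from dataclasses import dataclass


@dataclass
class DimensionResult:
    score: float        # 0 to 10
    confidence: float   # 0.0 to 1.0


def split_by_confidence(results, min_confidence=0.5):
    """Partition results into (firm, approximate) using the confidence cutoff."""
    firm, approximate = {}, {}
    for dim, result in results.items():
        target = firm if result.confidence >= min_confidence else approximate
        target[dim] = result
    return firm, approximate


results = {
    "safety": DimensionResult(score=8.5, confidence=0.92),
    "reliability": DimensionResult(score=6.0, confidence=0.41),
}
firm, approximate = split_by_confidence(results)
print(sorted(firm))         # ['safety']
print(sorted(approximate))  # ['reliability']
```

Approximate scores might be routed to human review or re-evaluated rather than used for hard gating.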

Basic Mode: 1 credit (all dimensions)

Per-dimension scores + confidence
Overall RAIL score + label
Cached for 5 minutes

Best for: content gating, CI pipelines, high-volume monitoring

Deep Mode: 3 credits (all dimensions)

Everything in basic, plus:
Per-dimension explanations
Issue tags with severity (e.g., hallucination, pii_exposure)
Improvement suggestions
Cached for 3 minutes

Best for: content review, debugging low scores, audit trails
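
For budgeting, the per-mode credit costs translate into simple arithmetic. The `estimate_credits` helper is a hypothetical sketch; it ignores caching because the text above does not say whether cached responses consume credits.

```python
# Credit cost per evaluation, taken from the mode descriptions above.
CREDITS_PER_EVALUATION = {"basic": 1, "deep": 3}


def estimate_credits(basic_calls, deep_calls):
    """Upper-bound credit estimate for a mix of basic and deep evaluations."""
    if basic_calls < 0 or deep_calls < 0:
        raise ValueError("call counts must be non-negative")
    return (basic_calls * CREDITS_PER_EVALUATION["basic"]
            + deep_calls * CREDITS_PER_EVALUATION["deep"])


# Example: gate 10,000 responses in basic mode, then debug 200 of them in deep mode.
print(estimate_credits(10_000, 200))  # 10600
```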

Label	Score Range	Description
Excellent	9.0 – 10	Meets the highest standards across all evaluated dimensions.
Good	7.0 – 8.9	Strong performance with minor areas for improvement.
Needs Improvement	5.0 – 6.9	Noticeable gaps that should be addressed before deployment.
Poor	3.0 – 4.9	Significant issues. Content may cause harm or mislead users.
Critical	0 – 2.9	Severe failures. Content should not be served to users.
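
The label bands can be expressed as a small lookup. This sketch assumes each band's lower bound is an inclusive cutoff, so a value such as 8.95 falls under Good; the `label_for` function is a hypothetical helper, and the API already returns a label alongside the overall score.

```python
# (lower bound, label), checked from highest to lowest.
LABEL_FLOORS = [
    (9.0, "Excellent"),
    (7.0, "Good"),
    (5.0, "Needs Improvement"),
    (3.0, "Poor"),
    (0.0, "Critical"),
]


def label_for(score):
    """Map a 0-10 score to its label."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    for floor, label in LABEL_FLOORS:
        if score >= floor:
            return label


print(label_for(9.3))  # Excellent
print(label_for(6.9))  # Needs Improvement
print(label_for(2.9))  # Critical
```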

Acting on Scores

How you use RAIL scores depends on your risk tolerance and use case. Here are common patterns:

Score Range	Recommended Action	Notes
9.0–10	Serve directly	Content passes all quality gates. Deliver to the user.
7.0–8.9	Serve, optionally log for review	Good enough for most use cases. Log any low per-dimension scores for trend monitoring.
5.0–6.9	Regenerate or flag for human review	Use the Safe-Regenerate API to iteratively improve the content before serving.
3.0–4.9	Block and queue for manual review	Content has significant issues. Regeneration may help, but human review is recommended.
0–2.9	Block immediately	Critical failures: harmful content, fabricated claims, or privacy violations. Do not serve or regenerate.
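
The table above could be wired into a serving pipeline roughly like this. The `Action` enum and `action_for` function are hypothetical names for this sketch, and the cutoffs should be tuned to your own risk tolerance.

```python
from enum import Enum


class Action(Enum):
    SERVE = "serve"
    SERVE_AND_LOG = "serve_and_log"
    REGENERATE_OR_FLAG = "regenerate_or_flag"
    BLOCK_AND_REVIEW = "block_and_review"
    BLOCK = "block"


def action_for(overall):
    """Pick the recommended action for an overall RAIL score (0-10)."""
    if not 0.0 <= overall <= 10.0:
        raise ValueError(f"score out of range: {overall}")
    if overall >= 9.0:
        return Action.SERVE
    if overall >= 7.0:
        return Action.SERVE_AND_LOG
    if overall >= 5.0:
        return Action.REGENERATE_OR_FLAG
    if overall >= 3.0:
        return Action.BLOCK_AND_REVIEW
    return Action.BLOCK  # critical: do not serve or regenerate


print(action_for(7.8).value)  # serve_and_log
print(action_for(2.1).value)  # block
```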

Privacy dimension note: When privacy is not relevant to the content (e.g., "How do I center a div?"), the privacy score is fixed at 5.0 with confidence 1.0. This is expected behavior, not a failure. Do not gate content on a neutral privacy score.
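
A gating check can skip the neutral privacy result explicitly. The sketch below assumes the neutral case is identifiable as exactly score 5.0 with confidence 1.0, as described above; `gate_dimensions` and the `(score, confidence)` tuple shape are hypothetical.

```python
def is_neutral_privacy(score, confidence):
    """True for the fixed 'privacy not relevant' result (5.0 with confidence 1.0)."""
    return score == 5.0 and confidence == 1.0


def gate_dimensions(results, minimums):
    """Return the dimensions that fall below their minimum.

    results:  {dimension: (score, confidence)}
    minimums: {dimension: minimum acceptable score}
    """
    failures = []
    for dim, minimum in minimums.items():
        score, confidence = results[dim]
        if dim == "privacy" and is_neutral_privacy(score, confidence):
            continue  # privacy was not relevant; do not count it as a failure
        if score < minimum:
            failures.append(dim)
    return failures


minimums = {"privacy": 7.0, "safety": 8.0}
print(gate_dimensions({"privacy": (5.0, 1.0), "safety": (6.5, 0.9)}, minimums))
# ['safety']
print(gate_dimensions({"privacy": (5.0, 0.8), "safety": (6.5, 0.9)}, minimums))
# ['privacy', 'safety']
```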

Custom Weights

By default, all 8 dimensions contribute equally to the overall RAIL Score. Pass a weights object to shift emphasis toward the dimensions that matter most for your use case. Weights must sum to 100.

Use Case	Priority	Highest-Weighted Dimensions
Healthcare	Patient safety and factual accuracy are non-negotiable	safety 30%, reliability 25%, privacy 20%
Customer Support	Helpful, accessible responses that serve all users equitably	user_impact 30%, inclusivity 20%, fairness 15%
Legal / Finance	Verifiable claims with traceable reasoning and clear assumptions	reliability 30%, accountability 25%, transparency 20%

Example: Healthcare weights

{
  "content": "Based on your symptoms...",
  "mode": "deep",
  "domain": "healthcare",
  "weights": {
    "safety": 30, "reliability": 25, "privacy": 20,
    "accountability": 10, "transparency": 5,
    "fairness": 5, "inclusivity": 3, "user_impact": 2
  }
}
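
Before sending a request, it can help to validate the weights locally. The sketch below checks the sum-to-100 rule and rejects unknown dimension names; `validate_weights` and `build_request` are hypothetical helpers, and whether the API requires all 8 dimensions to be present is not stated here, so that is not enforced.

```python
import json

DIMENSIONS = (
    "fairness", "safety", "reliability", "transparency",
    "privacy", "accountability", "inclusivity", "user_impact",
)


def validate_weights(weights):
    """Raise ValueError unless weights use known dimensions and sum to 100."""
    unknown = sorted(set(weights) - set(DIMENSIONS))
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"weights must sum to 100, got {total}")
    return weights


def build_request(content, weights, mode="deep", domain=None):
    """Assemble a request body shaped like the example above."""
    body = {"content": content, "mode": mode, "weights": validate_weights(weights)}
    if domain is not None:
        body["domain"] = domain
    return body


healthcare = {
    "safety": 30, "reliability": 25, "privacy": 20, "accountability": 10,
    "transparency": 5, "fairness": 5, "inclusivity": 3, "user_impact": 2,
}
payload = build_request("Based on your symptoms...", healthcare, domain="healthcare")
print(json.dumps(payload, indent=2))
```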

Dimensions in Compliance Checks

When you use the Compliance API, each regulatory framework maps to a relevant subset of RAIL dimensions. The compliance score draws from these dimension evaluations.

Framework	Primary Dimensions
GDPR	Privacy, Transparency, Accountability
HIPAA	Privacy, Safety, Reliability, Accountability
EU AI Act	Transparency, Fairness, Safety, Accountability
CCPA	Privacy, Transparency
India DPDP	Privacy, Transparency, Accountability
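
The framework table can also be kept as a lookup, for example to see which compliance checks are likely affected when one dimension regresses. The `FRAMEWORK_DIMENSIONS` mapping mirrors the table above; `frameworks_affected_by` is a hypothetical helper, and how the Compliance API combines the dimension scores is not specified here.

```python
# Primary dimensions per framework, copied from the table above.
FRAMEWORK_DIMENSIONS = {
    "GDPR": ("privacy", "transparency", "accountability"),
    "HIPAA": ("privacy", "safety", "reliability", "accountability"),
    "EU AI Act": ("transparency", "fairness", "safety", "accountability"),
    "CCPA": ("privacy", "transparency"),
    "India DPDP": ("privacy", "transparency", "accountability"),
}


def frameworks_affected_by(dimension):
    """List frameworks (sorted by name) whose primary dimensions include this one."""
    return sorted(
        name for name, dims in FRAMEWORK_DIMENSIONS.items() if dimension in dims
    )


print(frameworks_affected_by("safety"))       # ['EU AI Act', 'HIPAA']
print(frameworks_affected_by("inclusivity"))  # []
```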
