
The RAIL Framework

A single quality score can tell you content is bad, but not why or what to fix. RAIL breaks AI evaluation into 8 independent dimensions so you can pinpoint exactly which aspect failed, set per-dimension thresholds, and take targeted action.

The 8 Dimensions

Fairness, Safety, Reliability, Transparency, Privacy, Accountability, Inclusivity, User Impact

Why 8 Dimensions

Content can be factually accurate but biased, safe but unhelpful, or transparent but unreliable. A single score hides these tradeoffs. RAIL surfaces them so you can make informed decisions:

Targeted thresholds

Require a minimum safety score of 8.0 for a healthcare bot while accepting an inclusivity score of 6.0 for internal tooling. One threshold does not fit every deployment.
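A minimal sketch of per-dimension gating, assuming the per-dimension scores have already been collected into a plain dict. The helper name `failed_dimensions` and the reliability minimum are illustrative and not part of RAIL; only the safety 8.0 and inclusivity 6.0 values come from the text above.

```python
# Hypothetical per-dimension minimums. Keys mirror the dimension names used
# in the weights example; the reliability value is illustrative only.
HEALTHCARE_THRESHOLDS = {"safety": 8.0, "reliability": 7.0}
INTERNAL_TOOL_THRESHOLDS = {"inclusivity": 6.0}


def failed_dimensions(scores: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return the dimensions whose score falls below its configured minimum.

    A dimension missing from `scores` is treated as 0.0 and therefore fails.
    """
    return sorted(
        dim for dim, minimum in thresholds.items()
        if scores.get(dim, 0.0) < minimum
    )


scores = {"safety": 7.4, "reliability": 8.1, "inclusivity": 6.5}
print(failed_dimensions(scores, HEALTHCARE_THRESHOLDS))     # ['safety']
print(failed_dimensions(scores, INTERNAL_TOOL_THRESHOLDS))  # []
```

The same content passes for internal tooling and fails for the healthcare bot, which is the point of keeping thresholds per deployment.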

Actionable diagnostics

When content fails, the dimension breakdown tells you whether to fix factual claims (reliability), add disclaimers (transparency), or rephrase for equity (fairness).

Trend monitoring

Track per-dimension averages over time. A drop in accountability scores after a model update is a different problem than a drop in safety.
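As a rough sketch of this kind of monitoring, the snippet below compares per-dimension averages from two batches of evaluations (for example, before and after a model update). The function names and the 0.5-point drop threshold are assumptions made for illustration.

```python
from statistics import mean


def dimension_averages(evaluations: list[dict[str, float]]) -> dict[str, float]:
    """Average each dimension across a batch of per-dimension score dicts."""
    dims = sorted({dim for evaluation in evaluations for dim in evaluation})
    return {
        dim: mean(e[dim] for e in evaluations if dim in e)
        for dim in dims
    }


def regressions(before: list[dict[str, float]],
                after: list[dict[str, float]],
                min_drop: float = 0.5) -> dict[str, float]:
    """Return dimensions whose average fell by at least `min_drop` points."""
    old, new = dimension_averages(before), dimension_averages(after)
    return {
        dim: round(old[dim] - new[dim], 2)
        for dim in old
        if dim in new and old[dim] - new[dim] >= min_drop
    }


before = [{"safety": 9.0, "accountability": 8.0}, {"safety": 9.0, "accountability": 8.0}]
after = [{"safety": 9.0, "accountability": 6.0}, {"safety": 9.0, "accountability": 7.0}]
print(regressions(before, after))  # {'accountability': 1.5}
```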

Compliance mapping

Dimensions map to regulatory requirements. Privacy and transparency scores feed directly into GDPR assessments; safety and accountability into EU AI Act checks.

How Scoring Works

Each of the 8 dimensions is scored from 0 to 10. The overall RAIL Score is a weighted average (equal weights by default, customizable per request).

Every score includes a confidence value (0.0–1.0). Confidence reflects evaluation certainty. It drops when content is ambiguous, domain-specific, or when the evaluator encounters conflicting signals. Scores with confidence below 0.5 should be treated as approximate.
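A small sketch of how a caller might flag low-confidence scores. The response shape used here (a dict of `{"score": ..., "confidence": ...}` entries keyed by dimension) is an assumption for illustration, not the documented response format.

```python
def approximate_dimensions(result: dict[str, dict[str, float]],
                           min_confidence: float = 0.5) -> list[str]:
    """Return dimensions whose confidence is below the cutoff (0.5 by default)."""
    return sorted(
        dim for dim, entry in result.items()
        if entry["confidence"] < min_confidence
    )


# Hypothetical evaluation result; the field names are assumed.
result = {
    "safety": {"score": 8.2, "confidence": 0.91},
    "reliability": {"score": 6.1, "confidence": 0.42},
    "fairness": {"score": 7.5, "confidence": 0.50},
}
print(approximate_dimensions(result))  # ['reliability']
```

A confidence of exactly 0.5 is not flagged, since the text above only calls out scores below 0.5.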

Basic Mode: 1 credit (all dimensions)
  • Per-dimension scores + confidence
  • Overall RAIL score + label
  • Cached for 5 minutes

Best for: content gating, CI pipelines, high-volume monitoring

Deep Mode: 3 credits (all dimensions)
  • Everything in basic, plus:
  • Per-dimension explanations
  • Issue tags with severity (e.g., hallucination, pii_exposure)
  • Improvement suggestions
  • Cached for 3 minutes

Best for: content review, debugging low scores, audit trails
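One way to combine the two modes, which this page does not prescribe, is to score everything in basic mode and re-run only the flagged items in deep mode. The sketch below estimates credit usage for that pattern from the per-call costs listed above. It ignores caching, and the function names are hypothetical.

```python
BASIC_CREDITS = 1  # per evaluation, all dimensions
DEEP_CREDITS = 3   # per evaluation, all dimensions


def credits_deep_only(n_items: int) -> int:
    """Credits used when every item is evaluated in deep mode."""
    return n_items * DEEP_CREDITS


def credits_tiered(n_items: int, n_flagged: int) -> int:
    """Credits used when all items get basic mode and flagged items also get deep mode."""
    if not 0 <= n_flagged <= n_items:
        raise ValueError("n_flagged must be between 0 and n_items")
    return n_items * BASIC_CREDITS + n_flagged * DEEP_CREDITS


print(credits_deep_only(1000))    # 3000
print(credits_tiered(1000, 100))  # 1300
```

The tiered approach stops paying off once more than two thirds of items need a deep evaluation, because each flagged item then costs 4 credits in total.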

Score labels (0–10 scale)

  • Excellent (9.0 – 10): Meets the highest standards across all evaluated dimensions.
  • Good (7.0 – 8.9): Strong performance with minor areas for improvement.
  • Needs Improvement (5.0 – 6.9): Noticeable gaps that should be addressed before deployment.
  • Poor (3.0 – 4.9): Significant issues. Content may cause harm or mislead users.
  • Critical (0 – 2.9): Severe failures. Content should not be served to users.
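The label bands can be expressed as a simple lookup. This sketch assumes each band is defined by its lower bound, so a value such as 8.95 falls into "Good"; how the service itself treats values between the listed ranges is not stated here. The function name `rail_label` is hypothetical.

```python
def rail_label(score: float) -> str:
    """Map a 0-10 score to its label, using each band's lower bound."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score >= 9.0:
        return "Excellent"
    if score >= 7.0:
        return "Good"
    if score >= 5.0:
        return "Needs Improvement"
    if score >= 3.0:
        return "Poor"
    return "Critical"


print(rail_label(9.2))   # Excellent
print(rail_label(8.95))  # Good
print(rail_label(2.9))   # Critical
```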

Acting on Scores

How you use RAIL scores depends on your risk tolerance and use case. Here are common patterns:

Score Range | Recommended Action | Example
9–10 | Serve directly | Content passes all quality gates. Deliver to the user.
7–8.9 | Serve, optionally log for review | Good enough for most use cases. Log low per-dimension scores for trend monitoring.
5–6.9 | Regenerate or flag for human review | Use the Safe-Regenerate API to iteratively improve before serving.
3–4.9 | Block and queue for manual review | Content has significant issues. Regeneration may help but human review is recommended.
0–2.9 | Block immediately | Critical failures: harmful content, fabricated claims, or privacy violations. Do not serve or regenerate.
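The routing table above could be wired into application code along these lines. The action names are hypothetical labels chosen for this sketch, and the band edges are taken as lower bounds (3, 5, 7, 9).

```python
from bisect import bisect_right

# Lower bounds of the bands above the lowest one, in ascending order.
_LOWER_BOUNDS = [3.0, 5.0, 7.0, 9.0]

# Hypothetical action names, ordered from the lowest band to the highest.
_ACTIONS = [
    "block",                 # 0-2.9
    "block_and_review",      # 3-4.9
    "regenerate_or_review",  # 5-6.9
    "serve_and_log",         # 7-8.9
    "serve",                 # 9-10
]


def recommended_action(score: float) -> str:
    """Return the action for an overall score, following the table above."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    return _ACTIONS[bisect_right(_LOWER_BOUNDS, score)]


print(recommended_action(9.4))  # serve
print(recommended_action(6.2))  # regenerate_or_review
print(recommended_action(1.0))  # block
```

Teams with a lower risk tolerance can shift the bounds upward without touching the rest of the logic.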

Privacy dimension note: When privacy is not relevant to the content (e.g., "How do I center a div?"), the privacy score is fixed at 5.0 with confidence 1.0. This is expected behavior, not a failure. Do not gate content on a neutral privacy score.
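A sketch of a gate that skips the neutral privacy score. It assumes a hypothetical result shape of `{"score": ..., "confidence": ...}` per dimension and treats exactly 5.0 with confidence 1.0 as the "not relevant" marker described in the note; whether the response exposes a more explicit signal is not covered here.

```python
def is_neutral_privacy(entry: dict[str, float]) -> bool:
    """True when a privacy entry matches the 'not relevant' marker (5.0 at confidence 1.0)."""
    return entry["score"] == 5.0 and entry["confidence"] == 1.0


def gate_failures(result: dict[str, dict[str, float]],
                  thresholds: dict[str, float]) -> list[str]:
    """Return dimensions below their minimum, ignoring a neutral privacy score."""
    failures = []
    for dim, minimum in thresholds.items():
        entry = result[dim]
        if dim == "privacy" and is_neutral_privacy(entry):
            continue  # privacy was not relevant to this content
        if entry["score"] < minimum:
            failures.append(dim)
    return sorted(failures)


# Hypothetical result for a question like "How do I center a div?"
result = {
    "privacy": {"score": 5.0, "confidence": 1.0},
    "safety": {"score": 9.1, "confidence": 0.95},
}
print(gate_failures(result, {"privacy": 7.0, "safety": 8.0}))  # []
```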

Custom Weights

By default, all 8 dimensions contribute equally to the overall RAIL Score. Pass a weights object to shift emphasis toward the dimensions that matter most for your use case. Weights must sum to 100.

Healthcare

Patient safety and factual accuracy are non-negotiable

safety 30%
reliability 25%
privacy 20%

Customer Support

Helpful, accessible responses that serve all users equitably

user_impact 30%
inclusivity 20%
fairness 15%

Legal / Finance

Verifiable claims with traceable reasoning and clear assumptions

reliability 30%
accountability 25%
transparency 20%
Example: Healthcare weights
{
  "content": "Based on your symptoms...",
  "mode": "deep",
  "domain": "healthcare",
  "weights": {
    "safety": 30, "reliability": 25, "privacy": 20,
    "accountability": 10, "transparency": 5,
    "fairness": 5, "inclusivity": 3, "user_impact": 2
  }
}
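Because weights must sum to 100, it can help to validate them client-side before sending a request. The following is a minimal sketch; the function name is hypothetical, and whether the API requires all eight keys or rejects negative values is an assumption not confirmed by this page.

```python
import math

DIMENSIONS = {
    "fairness", "safety", "reliability", "transparency",
    "privacy", "accountability", "inclusivity", "user_impact",
}


def validate_weights(weights: dict[str, float]) -> None:
    """Raise ValueError if the weights use unknown keys, are negative, or do not sum to 100."""
    unknown = set(weights) - DIMENSIONS
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    if any(value < 0 for value in weights.values()):
        raise ValueError("weights must not be negative")  # assumed constraint
    total = sum(weights.values())
    if not math.isclose(total, 100.0):
        raise ValueError(f"weights must sum to 100, got {total}")


healthcare = {
    "safety": 30, "reliability": 25, "privacy": 20,
    "accountability": 10, "transparency": 5,
    "fairness": 5, "inclusivity": 3, "user_impact": 2,
}
validate_weights(healthcare)  # passes silently: 30+25+20+10+5+5+3+2 = 100
```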

Dimensions in Compliance Checks

When you use the Compliance API, each regulatory framework maps to a relevant subset of RAIL dimensions. The compliance score draws from these dimension evaluations.

Framework | Primary Dimensions
GDPR | Privacy, Transparency, Accountability
HIPAA | Privacy, Safety, Reliability, Accountability
EU AI Act | Transparency, Fairness, Safety, Accountability
CCPA | Privacy, Transparency
India DPDP | Privacy, Transparency, Accountability
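The mapping above can be mirrored locally to see which dimension is most likely to hold back a given framework. This sketch does not reproduce how the Compliance API computes its score; it only selects the listed dimensions from scores you already have, and the function name is hypothetical.

```python
FRAMEWORK_DIMENSIONS = {
    "GDPR": ("privacy", "transparency", "accountability"),
    "HIPAA": ("privacy", "safety", "reliability", "accountability"),
    "EU AI Act": ("transparency", "fairness", "safety", "accountability"),
    "CCPA": ("privacy", "transparency"),
    "India DPDP": ("privacy", "transparency", "accountability"),
}


def weakest_dimension(framework: str, scores: dict[str, float]) -> tuple[str, float]:
    """Return the lowest-scoring primary dimension for a framework and its score."""
    dims = FRAMEWORK_DIMENSIONS[framework]
    weakest = min(dims, key=lambda dim: scores[dim])
    return weakest, scores[weakest]


scores = {
    "privacy": 8.4, "transparency": 6.2, "accountability": 7.9,
    "safety": 9.0, "reliability": 5.8, "fairness": 7.1,
}
print(weakest_dimension("GDPR", scores))   # ('transparency', 6.2)
print(weakest_dimension("HIPAA", scores))  # ('reliability', 5.8)
```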