Evaluation

Evaluation is the core primitive of RAIL. You give it a piece of AI-generated text and it returns a score from 0 to 10 for each of 8 responsible-AI dimensions, plus an overall RAIL score.

How RAIL Scoring Works

[Diagram: AI-generated text (your content) flows into the RAIL Score API, which evaluates it against 8 responsible-AI criteria (Fairness, Safety, Reliability, Transparency, Privacy, Accountability, Inclusivity, User Impact) and returns 8 dimension scores (0–10) plus an overall RAIL score (0–10).]

Evaluation observes — it doesn't change. It tells you how responsible a piece of content is. It doesn't modify the content or block it. For automatic improvement, see Safe Regeneration. For enforcement, see Policy Engine.

The 8 RAIL Dimensions

Each dimension is scored independently on a 0–10 scale. The overall RAIL score is a weighted average across all 8.

| Dimension | What it measures | What it catches |
| --- | --- | --- |
| Fairness | Equitable treatment across demographic groups | Bias, stereotyping, double standards, differential treatment based on race, gender, religion, or other characteristics |
| Safety | Prevention of harmful, toxic, or dangerous content | Harmful instructions, insufficient warnings, toxic or violent content, promotion of self-harm |
| Reliability | Factual accuracy and appropriate epistemic calibration | Hallucinations, fabricated citations, factual errors stated as fact, inappropriate certainty on uncertain claims |
| Transparency | Honest communication of reasoning, limitations, and AI nature | Concealed limitations, fabricated reasoning, misleading certainty, failure to disclose when relevant |
| Privacy | Protection of personal information and sensitive data | PII exposure, unnecessary data disclosure, surveillance facilitation, insecure data handling suggestions |
| Accountability | Traceable reasoning that can be audited and corrected | Opaque conclusions without basis, circular reasoning, discouraging scrutiny or correction |
| Inclusivity | Accessible, inclusive language for diverse users | Exclusionary language, unexplained jargon, cultural assumptions, unnecessarily gendered defaults |
| User Impact | Positive value delivered relative to the user's actual need | Failing to address the real question, wrong level of detail, tone mismatch, unjustified refusals |
Privacy special case: When privacy is not applicable to a prompt/response (e.g., a question about JavaScript syntax), the score is forced to exactly 5.0 (neutral). This prevents privacy from unfairly dragging down the overall score in non-privacy contexts.
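As an illustration, the weighted average over the 8 dimensions can be sketched as follows. Equal weights are an assumption here; the service's actual per-dimension weights are not documented on this page. The forced-neutral privacy score of 5.0 is applied as described above.

```python
# Sketch of the overall RAIL score as a weighted average of the 8 dimensions.
# Equal weights are an ASSUMPTION; the service's actual weights are not
# documented here.
DIMENSIONS = ["fairness", "safety", "reliability", "transparency",
              "privacy", "accountability", "inclusivity", "user_impact"]

def overall_score(scores, weights=None):
    """Weighted average of the 8 dimension scores (0-10 each)."""
    weights = weights or {d: 1.0 for d in DIMENSIONS}
    total = sum(weights[d] for d in DIMENSIONS)
    return sum(scores[d] * weights[d] for d in DIMENSIONS) / total

scores = {d: 8.0 for d in DIMENSIONS}
scores["privacy"] = 5.0  # forced neutral: privacy not applicable to this prompt
print(overall_score(scores))  # 7.625
```

With equal weights, the single neutral 5.0 pulls the overall down only modestly, which is exactly why the special case exists: a forced 0 would dominate the average in non-privacy contexts.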

How Scoring Works Internally

[Diagram: Evaluation Pipeline. Content + Parameters enter the Scoring Layer, which runs the ML Classifier (8 dimensions), the Safety Check (toxicity signals), the Privacy NLP (PII detection), and, in deep mode only, the LLM Judge (explanations + issues). Outputs: Overall Score (weighted avg), 8 dimension scores (0–10 each), Confidence (0–1 per dimension), and Explanations (deep only).]

The scoring pipeline runs in layers. Every evaluation — basic or deep — starts with the same core ML + NLP layer:

  1. DeBERTa-v3 classifier produces raw dimension scores from the content alone using an ONNX-optimised model — fast and deterministic.
  2. Perspective API augments the safety dimension with trained toxicity detection across several content categories.
  3. spaCy NLP analyses the privacy dimension by detecting PII patterns and data-handling language.
  4. Deep mode only: GPT-4o-mini generates natural-language explanations for each dimension score, identifies specific issues, and produces actionable improvement suggestions.
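The four layers above can be condensed into a single illustrative flow. Every helper below is a fixed-value stand-in for the real component (the DeBERTa-v3 classifier, Perspective API, spaCy, GPT-4o-mini), and how the signals combine (e.g., scaling safety by the toxicity signal) is an assumption, not the service's actual logic:

```python
# Illustrative sketch of the layered pipeline. All helpers are stand-ins;
# the combination logic is an ASSUMPTION for illustration only.
DIMS = ["fairness", "safety", "reliability", "transparency",
        "privacy", "accountability", "inclusivity", "user_impact"]

def classifier_scores(content):   # stand-in for the ONNX DeBERTa-v3 model
    return {d: 7.0 for d in DIMS}

def toxicity_signal(content):     # stand-in for Perspective API (0 = benign)
    return 0.1

def privacy_score(content):       # stand-in for spaCy PII analysis
    return 5.0

def score_content(content, mode="basic"):
    scores = classifier_scores(content)                 # layer 1: all 8 dims
    scores["safety"] *= 1.0 - toxicity_signal(content)  # layer 2: safety augmented
    scores["privacy"] = privacy_score(content)          # layer 3: privacy NLP
    result = {"dimension_scores": scores}
    if mode == "deep":
        # layer 4 (deep only): an LLM judge would add explanations here
        result["explanations"] = {d: "..." for d in DIMS}
    return result

print(round(score_content("example")["dimension_scores"]["safety"], 2))  # 6.3
```

The point of the layering is that basic and deep mode share layers 1–3, so deep mode only adds LLM latency and cost on top of the same deterministic core.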

Basic vs Deep Mode

| Feature | Basic | Deep |
| --- | --- | --- |
| Score per dimension | ✓ | ✓ |
| Confidence per dimension | ✓ | ✓ |
| Overall RAIL score | ✓ | ✓ |
| Per-dimension explanations | — | ✓ |
| Issue detection | — | ✓ |
| Improvement suggestions | — | ✓ |
| Typical latency | ~200ms | ~2–4s |
| Credit cost (all 8 dimensions) | 1.0 | 3.0 |

Use basic mode when:

  • Scoring high volumes of content (cost-sensitive)
  • Gating responses in real-time with a threshold check
  • Monitoring aggregate quality trends
  • Running inside a safe-regenerate loop

Use deep mode when:

  • Debugging why specific content scores poorly
  • Surfacing issues for human review
  • Building an audit trail with explanations
  • Evaluating samples for quality monitoring
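A minimal sketch of the real-time gating use case with basic mode. Only `client.eval(content=..., mode=...)` mirrors the call documented on this page; the stub client, the `rail_score.overall` field name, and the 7.0 threshold are assumptions for illustration:

```python
# Sketch of a real-time threshold gate using basic mode. The stub client and
# the `rail_score.overall` attribute are ASSUMPTIONS for illustration.
class _StubRailScore:
    overall = 8.2          # pretend overall RAIL score

class _StubResult:
    rail_score = _StubRailScore()

class _StubClient:
    def eval(self, content, mode="basic", dimensions=None):
        return _StubResult()   # a real client would call the RAIL Score API

client = _StubClient()
THRESHOLD = 7.0  # gate at the "Good" tier (>= 7.0)

result = client.eval(content="candidate response", mode="basic")
allowed = result.rail_score.overall >= THRESHOLD
print(allowed)  # True
```

A common pattern is to gate with cheap basic-mode checks and re-run only the failures in deep mode to collect explanations for human review.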

Score Tiers

Every dimension score and the overall RAIL score map to a human-readable tier. The SDK exposes these via result.rail_score.summary (both SDKs) and getScoreLabel(score) (JavaScript SDK).

| Range | Tier | Meaning |
| --- | --- | --- |
| ≥ 9.0 | Excellent | Meets the highest responsible-AI standards |
| ≥ 7.0 | Good | Safe and responsible — minor improvements possible |
| ≥ 5.0 | Needs Improvement | Issues present — review before production use |
| ≥ 3.0 | Poor | Significant concerns — substantial revision needed |
| < 3.0 | Critical | Serious violations — block from production immediately |
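The tier mapping above takes only a few lines. This mirrors what `getScoreLabel(score)` returns in the JavaScript SDK; the Python function name here is illustrative, not part of the SDK:

```python
# Tier mapping from the table above. Function name is illustrative;
# the JavaScript SDK exposes this as getScoreLabel(score).
def score_tier(score):
    if score >= 9.0:
        return "Excellent"
    if score >= 7.0:
        return "Good"
    if score >= 5.0:
        return "Needs Improvement"
    if score >= 3.0:
        return "Poor"
    return "Critical"

print(score_tier(7.4))  # Good
print(score_tier(2.1))  # Critical
```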

Scoring Specific Dimensions

You can evaluate a subset of dimensions to reduce cost and latency. Pass a dimensions array with any combination of the 8 dimension keys.

# Only score safety and fairness — cost: min(0.3 × 2, 1.0) = 0.6 credits
result = client.eval(
    content="...",
    mode="basic",
    dimensions=["safety", "fairness"]
)

# The response only includes the requested dimensions
print(result.dimension_scores["safety"].score)
print(result.dimension_scores["fairness"].score)
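The inline cost comment above implies a simple basic-mode pricing rule (0.3 credits per dimension, capped at the 1.0 full price). This sketch is inferred from that comment; deep-mode subset pricing is not documented here:

```python
# Basic-mode credit cost inferred from the inline comment above:
# 0.3 credits per requested dimension, capped at the all-8 price of 1.0.
# Deep-mode subset pricing is not documented on this page.
def basic_credit_cost(n_dimensions):
    return min(0.3 * n_dimensions, 1.0)

print(basic_credit_cost(2))  # 0.6
print(basic_credit_cost(8))  # 1.0
```

Because of the cap, requesting 4 or more dimensions in basic mode costs the same as requesting all 8.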

Caching

Identical evaluation requests within the cache window return the cached result immediately at zero credit cost. The cache key is a hash of content + mode + dimensions.

| Mode | Cache window | Indicator |
| --- | --- | --- |
| Basic | 5 minutes | from_cache: true |
| Deep | 3 minutes | from_cache: true |
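The cache key is documented only as a hash of content + mode + dimensions. A sketch of such a key follows; the hash algorithm (SHA-256), the JSON serialisation, and sorting the dimension list are all assumptions:

```python
import hashlib
import json

# Sketch of the documented cache key (hash of content + mode + dimensions).
# SHA-256, JSON serialisation, and dimension sorting are ASSUMPTIONS.
def cache_key(content, mode, dimensions=None):
    payload = json.dumps(
        {"content": content, "mode": mode,
         "dimensions": sorted(dimensions) if dimensions else None},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

# Identical requests map to the same key, so a repeat call within the
# cache window is served at zero credit cost.
k1 = cache_key("hello", "basic", ["safety", "fairness"])
k2 = cache_key("hello", "basic", ["fairness", "safety"])
print(k1 == k2)  # True
```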
