Evaluation

Use client.eval() to score AI-generated content across all 8 RAIL dimensions. It returns an overall score, confidence values, and optional per-dimension explanations.

client.eval()

Basic mode

# Assumes `client` is an already-initialized SDK client.
result = client.eval(
    content="There are several natural approaches that may help with insomnia. Establishing a consistent sleep schedule, limiting screen time before bed, and creating a cool, dark sleeping environment are well-supported strategies. If sleep problems persist, consulting a healthcare provider is recommended.",
    mode="basic",
)

print(result.rail_score.score)         # 8.4  (float 0–10)
print(result.rail_score.confidence)    # 0.85 (float 0–1)
print(result.rail_score.summary)       # "RAIL Score: 8.4/10 — Good"

for dim, detail in result.dimension_scores.items():
    print(f"  {dim}: {detail.score} (confidence: {detail.confidence})")

print(result.from_cache)               # False

Deep mode — explanations and issues

result = client.eval(
    content="When reviewing resumes, prioritize candidates from top-tier universities like Stanford and MIT. Candidates from lesser-known institutions typically lack the rigorous training needed for this role.",
    mode="deep",
    include_explanations=True,
    include_issues=True,
    include_suggestions=True,
)

for dim, detail in result.dimension_scores.items():
    print(f"\n{dim} — {detail.score}/10")
    if detail.explanation:
        print(f"  Explanation: {detail.explanation}")
    if detail.issues:
        print(f"  Issues: {detail.issues}")     # ["biased_framing", "demographic_assumption"]
    if detail.suggestions:
        print(f"  Suggestions: {detail.suggestions}")
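When many dimensions report issues, it can help to tally the tags across the whole result. The sketch below is one way to do that, assuming each per-dimension detail exposes an `issues` attribute that is either a list of strings or empty, as in the loop above. `count_issue_tags` is a hypothetical helper, and the stand-in objects use made-up data rather than real API output.

```python
from collections import Counter
from types import SimpleNamespace


def count_issue_tags(dimension_scores) -> Counter:
    """Tally issue tags across all dimensions of a deep-mode result."""
    tags = Counter()
    for detail in dimension_scores.values():
        # `issues` may be missing or None when a dimension reports nothing.
        tags.update(getattr(detail, "issues", None) or [])
    return tags


# Stand-in for result.dimension_scores (illustrative data, not real API output).
fake_dimension_scores = {
    "fairness": SimpleNamespace(issues=["biased_framing", "demographic_assumption"]),
    "inclusivity": SimpleNamespace(issues=["biased_framing"]),
    "privacy": SimpleNamespace(issues=None),
}

print(count_issue_tags(fake_dimension_scores).most_common())
# [('biased_framing', 2), ('demographic_assumption', 1)]
```

With a real result, the call would be `count_issue_tags(result.dimension_scores)`.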

Selective dimensions

result = client.eval(
    content="To reset your password, click the link sent to john.doe@company.com. Your employee ID is EMP-29481.",
    mode="basic",
    dimensions=["privacy", "safety"],
)

Custom weights

result = client.eval(
    content="Based on your symptoms of chest tightness, you should take 325mg aspirin immediately. This is likely a mild cardiac event that will resolve on its own.",
    mode="deep",
    domain="healthcare",
    weights={"safety": 25, "reliability": 20, "privacy": 20, "accountability": 15, "transparency": 10, "fairness": 5, "inclusivity": 3, "user_impact": 2},
)
# The overall score is weighted: safety, reliability, and privacy carry the most weight
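To build intuition for how weights shift the overall score, the sketch below computes a plain weighted mean of per-dimension scores. This is an assumption for illustration only: this section does not specify how the service aggregates scores, so the result may not match `result.rail_score.score`. `weighted_overall` is a hypothetical helper and the scores are illustrative values.

```python
def weighted_overall(scores: dict, weights: dict) -> float:
    """Weighted mean of dimension scores (illustrative, not the service's formula)."""
    missing = set(scores) - set(weights)
    if missing:
        raise ValueError(f"no weight given for: {sorted(missing)}")
    total = sum(weights[dim] for dim in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[dim] * weights[dim] for dim in scores) / total


# Illustrative per-dimension scores on the 0-10 scale.
scores = {
    "fairness": 9.0, "safety": 9.2, "reliability": 8.1, "transparency": 7.8,
    "privacy": 5.0, "accountability": 8.5, "inclusivity": 8.9, "user_impact": 8.7,
}
# The weights from the healthcare example above (they sum to 100).
weights = {
    "safety": 25, "reliability": 20, "privacy": 20, "accountability": 15,
    "transparency": 10, "fairness": 5, "inclusivity": 3, "user_impact": 2,
}

print(round(weighted_overall(scores, weights), 2))  # 7.87
```

Under this assumption, the low privacy score pulls the total down more than it would with equal weights, because privacy carries 20 of the 100 weight points.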

Modes: basic returns scores only, is cached for 5 minutes, and costs 1.0 credit. deep returns scores plus explanations and issues, is cached for 3 minutes, and costs 3.0 credits.
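The per-call costs above make it easy to budget a batch before running it. The sketch below is a minimal estimator. `CREDITS_PER_CALL` and `estimate_credits` are hypothetical names, not part of the SDK, and the estimate assumes every call is billed at the listed rate (whether cached responses are billed is not stated here).

```python
# Hypothetical helper, not part of the SDK: budget credits for a batch of calls.
CREDITS_PER_CALL = {"basic": 1.0, "deep": 3.0}  # rates as listed in this section


def estimate_credits(n_basic: int = 0, n_deep: int = 0) -> float:
    """Upper-bound estimate, assuming every call is billed at the listed rate."""
    if n_basic < 0 or n_deep < 0:
        raise ValueError("call counts must be non-negative")
    return n_basic * CREDITS_PER_CALL["basic"] + n_deep * CREDITS_PER_CALL["deep"]


print(estimate_credits(n_basic=100, n_deep=20))  # 100 * 1.0 + 20 * 3.0 = 160.0
```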

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| content | str | Required | AI-generated text to evaluate (10–10,000 chars) |
| mode | str | "basic" | "basic" or "deep" |
| dimensions | list[str] | all 8 | Subset of dimensions to score |
| weights | dict | equal | Per-dimension weights (must sum to 100) |
| domain | str | "general" | Content domain: "general", "healthcare", "finance", "legal" |
| include_explanations | bool | False | Per-dimension explanations (deep mode) |
| include_issues | bool | False | Issue tags per dimension (deep mode) |
| include_suggestions | bool | False | Improvement suggestions (deep mode) |
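The limits in the parameter table above can be checked locally before spending credits on a request that would be rejected. The sketch below is a hypothetical pre-flight check, not part of the SDK. The dimension names are taken from the sample response in this section, and how the service itself reports invalid input is not covered here.

```python
import math

# Values copied from this section; the helper itself is hypothetical.
ALL_DIMENSIONS = {
    "fairness", "safety", "reliability", "transparency",
    "privacy", "accountability", "inclusivity", "user_impact",
}
VALID_MODES = {"basic", "deep"}
VALID_DOMAINS = {"general", "healthcare", "finance", "legal"}


def check_eval_args(content, mode="basic", dimensions=None, weights=None, domain="general"):
    """Return a list of human-readable problems; an empty list means the args look valid."""
    problems = []
    if not 10 <= len(content) <= 10_000:
        problems.append(f"content length {len(content)} is outside 10-10,000 chars")
    if mode not in VALID_MODES:
        problems.append(f"unknown mode: {mode!r}")
    if domain not in VALID_DOMAINS:
        problems.append(f"unknown domain: {domain!r}")
    if dimensions is not None:
        unknown = sorted(set(dimensions) - ALL_DIMENSIONS)
        if unknown:
            problems.append(f"unknown dimensions: {unknown}")
    if weights is not None:
        unknown = sorted(set(weights) - ALL_DIMENSIONS)
        if unknown:
            problems.append(f"weights given for unknown dimensions: {unknown}")
        if not math.isclose(sum(weights.values()), 100):
            problems.append(f"weights sum to {sum(weights.values())}, expected 100")
    return problems


print(check_eval_args("short", mode="fast"))
# ['content length 5 is outside 10-10,000 chars', "unknown mode: 'fast'"]
```

Returning a list rather than raising on the first problem lets a caller report every issue at once.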

Response: EvalResult

{
    "rail_score": {
        "score": 8.4,              # float 0–10
        "confidence": 0.85,        # float 0–1
        "summary": "RAIL Score: 8.4/10 — Good"
    },
    "dimension_scores": {
        "fairness":       {"score": 9.0, "confidence": 0.90},
        "safety":         {"score": 9.2, "confidence": 0.92},
        "reliability":    {"score": 8.1, "confidence": 0.88},
        "transparency":   {"score": 7.8, "confidence": 0.82},
        "privacy":        {"score": 5.0, "confidence": 1.00},  # 5.0 = N/A
        "accountability": {"score": 8.5, "confidence": 0.86},
        "inclusivity":    {"score": 8.9, "confidence": 0.91},
        "user_impact":    {"score": 8.7, "confidence": 0.89}
    },
    # Deep mode adds per-dimension explanation, issues, suggestions
    "from_cache": false
}
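A common next step is to pick out the weakest dimensions from a response. The sketch below works on a plain dict with the same shape as the sample above. The SDK examples in this section use attribute access instead, so adapt accordingly. `dimensions_below` is a hypothetical helper. The sample marks a privacy score of 5.0 as N/A; how to detect N/A programmatically is not specified here, so the sketch treats it as an ordinary score.

```python
# A plain dict mirroring the sample response shape (values copied from the sample).
response = {
    "rail_score": {"score": 8.4, "confidence": 0.85},
    "dimension_scores": {
        "fairness": {"score": 9.0, "confidence": 0.90},
        "safety": {"score": 9.2, "confidence": 0.92},
        "reliability": {"score": 8.1, "confidence": 0.88},
        "transparency": {"score": 7.8, "confidence": 0.82},
        "privacy": {"score": 5.0, "confidence": 1.00},
        "accountability": {"score": 8.5, "confidence": 0.86},
        "inclusivity": {"score": 8.9, "confidence": 0.91},
        "user_impact": {"score": 8.7, "confidence": 0.89},
    },
    "from_cache": False,
}


def dimensions_below(response: dict, threshold: float) -> list:
    """Return (dimension, score) pairs scoring below `threshold`, lowest first."""
    low = [
        (name, detail["score"])
        for name, detail in response["dimension_scores"].items()
        if detail["score"] < threshold
    ]
    return sorted(low, key=lambda pair: pair[1])


print(dimensions_below(response, 8.0))
# [('privacy', 5.0), ('transparency', 7.8)]
```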

Score Labels

| Range | Label | Meaning |
| --- | --- | --- |
| 9.0 – 10.0 | Excellent | Meets or exceeds all expectations |
| 7.0 – 8.9 | Good | Acceptable with minor gaps |
| 5.0 – 6.9 | Needs Improvement | Significant issues present |
| 3.0 – 4.9 | Poor | Multiple major problems |
| 0.0 – 2.9 | Critical | Severe violations, likely blocked |
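The label ranges above translate directly into a lookup by lower bound. The sketch below is a hypothetical helper, not part of the SDK. It assumes scores between the listed ranges (for example 8.95) fall into the lower band, which the table does not state explicitly.

```python
# Lower bound of each band, highest first (bounds copied from the table above).
LABEL_BANDS = [
    (9.0, "Excellent"),
    (7.0, "Good"),
    (5.0, "Needs Improvement"),
    (3.0, "Poor"),
    (0.0, "Critical"),
]


def score_label(score: float) -> str:
    """Map a 0-10 score to its label using the lower bound of each band."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score {score} is outside the 0-10 range")
    for lower_bound, label in LABEL_BANDS:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: the 0.0 band catches every valid score")


print(score_label(8.4))  # Good
```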