Middleware
Middleware is the pattern of intercepting every LLM response and attaching a RAIL score to it before it reaches the rest of your application. You replace your LLM client with a RAIL wrapper — the wrapper calls the LLM, evaluates the response, and returns both the content and the scores in a single object.
The Problem It Solves
Without middleware, adding responsible-AI checks to every LLM call means writing evaluation code in every place you call the LLM — duplicating logic, risking coverage gaps, and cluttering your application code:
```python
# ❌ Without middleware: eval code scattered everywhere
async def get_response(user_message):
    response = await openai_client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": user_message}]
    )
    content = response.choices[0].message.content
    # Must remember to eval every time, in every function
    score = rail_client.eval(content=content, mode="basic")
    if score.rail_score.score < 7.0:
        raise ValueError("Response below quality threshold")
    return content
```

With middleware, you set this up once at client construction and forget about it:
```python
# ✓ With middleware: scoring is automatic
from rail_score_sdk.integrations import RAILOpenAI

client = RAILOpenAI(
    openai_api_key="...",
    rail_api_key="...",
    eval_mode="basic",
    threshold=7.0,
)

async def get_response(user_message):
    # Eval happens automatically — no extra code needed
    response = await client.chat(messages=[{"role": "user", "content": user_message}])
    return response.content  # .rail_score and .threshold_met are also on this object
```

How It Works
Provider Wrapper Pipeline
When you call a method on the RAIL wrapper, it performs three steps transparently:
1. Forwards your messages to the underlying LLM API (OpenAI, Gemini, Anthropic, etc.) as a normal API call.
2. Takes the LLM response content and submits it to the RAIL evaluation endpoint in the same mode you configured at setup.
3. Returns a wrapped response object that contains the original content, the RAIL score, per-dimension scores, and a `threshold_met` boolean — all in one return value.
Supported Providers
| Wrapper | Wraps | Python | JavaScript |
|---|---|---|---|
| RAILOpenAI | OpenAI chat completions | ✓ | ✓ |
| RAILGemini | Google Gemini | ✓ | ✓ |
| RAILAnthropic | Anthropic Claude | ✓ | ✓ |
| RAILLangChain | Any LangChain LLM | ✓ | — |
| Custom wrapper | Any HTTP-based LLM | ✓ | ✓ |
Observe Mode vs Enforce Mode
You can use middleware purely to observe scores without interrupting the response flow, or configure it to enforce a threshold and raise an error (or trigger regeneration) when content doesn't meet the bar:
```python
# Observe mode: score every response, never block
client = RAILOpenAI(
    openai_api_key="...",
    rail_api_key="...",
    eval_mode="basic",
    # No threshold set — always returns response
)

response = await client.chat(messages=[...])
print(response.content)        # The LLM's response
print(response.rail_score)     # RAIL score (always present)
print(response.threshold_met)  # None — no threshold configured
```

Writing Custom Middleware
If you use an LLM provider without a built-in wrapper, you can build your own middleware using the core eval() call with any HTTP-based completion flow:
```python
from rail_score_sdk import RailScoreClient

rail = RailScoreClient(api_key="...")

async def rail_middleware(llm_call, messages, threshold=7.0):
    """
    Generic RAIL middleware for any async LLM call.

    llm_call: async function that returns a string response
    """
    content = await llm_call(messages)
    result = rail.eval(content=content, mode="basic")
    if result.rail_score.score < threshold:
        raise ValueError(
            f"Response scored {result.rail_score.score:.1f} — below threshold {threshold}. "
            f"Failed: {[d for d, s in result.dimension_scores.items() if s.score < threshold]}"
        )
    return content, result

# Use with any LLM:
content, score = await rail_middleware(my_llm_call, messages, threshold=7.5)
```
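Instead of raising on a low score, custom middleware can also trigger regeneration: retry the LLM call until an attempt clears the threshold. The sketch below is an assumption about how you might do that, not SDK behavior; `llm_call` and `rail_eval` are stand-in async callables (the latter returning a plain numeric score for brevity).

```python
async def rail_with_retry(llm_call, rail_eval, messages, threshold=7.0, max_attempts=3):
    """Retry the LLM call until the RAIL score clears the threshold.

    llm_call: async fn(messages) -> str
    rail_eval: async fn(content) -> float score
    The retry policy here is illustrative, not part of the SDK.
    """
    best_content, best_score = None, float("-inf")
    for _ in range(max_attempts):
        content = await llm_call(messages)
        score = await rail_eval(content)
        if score >= threshold:
            return content, score  # first attempt that passes wins
        if score > best_score:     # remember the best failing attempt
            best_content, best_score = content, score
    raise ValueError(
        f"Best of {max_attempts} attempts scored {best_score:.1f}, "
        f"below threshold {threshold}"
    )
```

Keeping the best failing attempt around lets you fall back to returning it (with a warning) instead of raising, if a degraded answer is better than none for your application.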