Middleware
Middleware is the pattern of intercepting every LLM response and attaching a RAIL score to it before it reaches the rest of your application. You replace your LLM client with a RAIL wrapper — the wrapper calls the LLM, evaluates the response, and returns both the content and the scores in a single object.
The Problem It Solves
Without middleware, adding responsible-AI checks to every LLM call means writing evaluation code in every place you call the LLM — duplicating logic, risking coverage gaps, and cluttering your application code:
```python
# ❌ Without middleware: eval code scattered everywhere
async def get_response(user_message):
    response = await openai_client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": user_message}]
    )
    content = response.choices[0].message.content
    # Must remember to eval every time, in every function
    score = rail_client.eval(content=content, mode="basic")
    if score.rail_score.score < 7.0:
        raise ValueError("Response below quality threshold")
    return content
```

With middleware, you set this up once at client construction and forget about it:
```python
# ✓ With middleware: scoring is automatic
from rail_score_sdk.integrations import RAILOpenAI

client = RAILOpenAI(
    openai_api_key="...",
    rail_api_key="...",
    eval_mode="basic",
    threshold=7.0,
)

async def get_response(user_message):
    # Eval happens automatically — no extra code needed
    response = await client.chat(messages=[{"role": "user", "content": user_message}])
    return response.content  # .rail_score and .threshold_met are also on this object
```

How It Works
Provider Wrapper Pipeline
When you call a method on the RAIL wrapper, it performs three steps transparently:
1. Forwards your messages to the underlying LLM API (OpenAI, Gemini, Anthropic, etc.) as a normal API call.
2. Takes the LLM response content and submits it to the RAIL evaluation endpoint in the same mode you configured at setup.
3. Returns a wrapped response object that contains the original content, the RAIL score, per-dimension scores, and a `threshold_met` boolean — all in one return value.
Supported Providers
| Wrapper | Wraps | Python | JavaScript |
|---|---|---|---|
| RAILOpenAI | OpenAI chat completions | ✓ | ✓ |
| RAILGemini | Google Gemini | ✓ | ✓ |
| RAILAnthropic | Anthropic Claude | ✓ | ✓ |
| RAILLangChain | Any LangChain LLM | ✓ | — |
| Custom wrapper | Any HTTP-based LLM | ✓ | ✓ |
Observe Mode vs Enforce Mode
You can use middleware purely to observe scores without interrupting the response flow, or configure it to enforce a threshold and raise an error (or trigger regeneration) when content doesn't meet the bar:
```python
# Observe mode: score every response, never block
client = RAILOpenAI(
    openai_api_key="...",
    rail_api_key="...",
    eval_mode="basic",
    # No threshold set — always returns response
)

response = await client.chat(messages=[...])
print(response.content)        # The LLM's response
print(response.rail_score)     # RAIL score (always present)
print(response.threshold_met)  # None — no threshold configured
```

Writing Custom Middleware
If you use an LLM provider without a built-in wrapper, you can build your own middleware using the core eval() call with any HTTP-based completion flow:
```python
from rail_score_sdk import RailScoreClient

rail = RailScoreClient(api_key="...")

async def rail_middleware(llm_call, messages, threshold=7.0):
    """
    Generic RAIL middleware for any async LLM call.

    llm_call: async function that returns a string response
    """
    content = await llm_call(messages)
    result = rail.eval(content=content, mode="basic")
    if result.rail_score.score < threshold:
        raise ValueError(
            f"Response scored {result.rail_score.score:.1f} — below threshold {threshold}. "
            f"Failed: {[d for d, s in result.dimension_scores.items() if s.score < threshold]}"
        )
    return content, result

# Use with any LLM:
content, score = await rail_middleware(my_llm_call, messages, threshold=7.5)
```
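Instead of raising on a low score, custom middleware can also trigger regeneration: retry the LLM call until an attempt clears the threshold. The sketch below is an assumption about how you might do that, not SDK behavior; `llm_call` and `rail_eval` are stand-in async callables (the latter returning a plain numeric score for brevity).

```python
async def rail_with_retry(llm_call, rail_eval, messages, threshold=7.0, max_attempts=3):
    """Retry the LLM call until the RAIL score clears the threshold.

    llm_call: async fn(messages) -> str
    rail_eval: async fn(content) -> float score
    The retry policy here is illustrative, not part of the SDK.
    """
    best_content, best_score = None, float("-inf")
    for _ in range(max_attempts):
        content = await llm_call(messages)
        score = await rail_eval(content)
        if score >= threshold:
            return content, score  # first attempt that passes wins
        if score > best_score:     # remember the best failing attempt
            best_content, best_score = content, score
    raise ValueError(
        f"Best of {max_attempts} attempts scored {best_score:.1f}, "
        f"below threshold {threshold}"
    )
```

Keeping the best failing attempt around lets you fall back to returning it (with a warning) instead of raising, if a degraded answer is better than none for your application.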