RAIL publishes open datasets so the responsible-AI research community can reproduce, extend, and critique our evaluation methodology. Both datasets below are MIT-licensed and available through the Responsible AI Labs HuggingFace organization.
RAIL-HH-10K
789 downloads · Published Oct 31, 2025
The first large-scale, multi-dimensional safety dataset for responsible AI evaluation. Every example is annotated across all eight RAIL dimensions with scores, key spans, and reviewer reasoning. Supports text generation, RLHF, QA, and feature extraction tasks. 10K size category, Parquet format.
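Since the dataset ships as Parquet on the Hub, it loads directly with the HuggingFace datasets library. A minimal sketch, assuming the repo lives at responsible-ai-labs/RAIL-HH-10K with a train split (check the dataset card for the actual path and schema):

```python
from datasets import load_dataset

# Sketch only: the repo id and split name are assumptions; see the
# Responsible AI Labs HuggingFace org for the real dataset path.
ds = load_dataset("responsible-ai-labs/RAIL-HH-10K", split="train")

print(ds.features)  # per-dimension score, key-span, and reviewer-reasoning fields
print(ds[0])        # one example annotated across the eight RAIL dimensions
```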
View on HuggingFace →

Indian Responsible AI Benchmark
158 downloads · Published Mar 12, 2026
Responsible-AI evaluation tasks authored in an Indian context: English and Hindi prompts spanning text classification and generation, designed to test whether models handle Indian cultural, linguistic, and regulatory nuance as carefully as they handle Western defaults. Parquet + optimized-parquet formats.
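Loading works the same way. A sketch under stated assumptions: the repo id is hypothetical, and we assume a "language" column with ISO codes distinguishing the English and Hindi prompts; consult the dataset card for the actual schema:

```python
from datasets import load_dataset

# Repo id, split name, and the "language" column are assumptions.
bench = load_dataset("responsible-ai-labs/indian-responsible-ai-benchmark", split="test")

# Select the Hindi prompts (assuming ISO language codes).
hindi = bench.filter(lambda ex: ex["language"] == "hi")
print(len(hindi), "Hindi examples")
```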
View on HuggingFace →

Both datasets are designed to be used together: RAIL-HH-10K for general responsible-AI scoring and fine-tuning, and the Indian Responsible AI Benchmark for evaluating regional competence. If you use either dataset in published research, we'd appreciate a citation of our arXiv paper.
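A hedged sketch of that combined workflow; every repo id, split, and column name below is an assumption, and model_fn stands in for your own text-in, text-out callable:

```python
from datasets import load_dataset

# Hypothetical paths: adjust both repo ids and splits to the real ones.
general = load_dataset("responsible-ai-labs/RAIL-HH-10K", split="train")
regional = load_dataset("responsible-ai-labs/indian-responsible-ai-benchmark", split="test")

def collect_responses(model_fn, dataset):
    """Run a model over every prompt (assumed "prompt" column) so the
    responses can then be scored against the RAIL dimensions."""
    return [model_fn(example["prompt"]) for example in dataset]
```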