Dataset · Available
JudicialMind Legal Training Dataset
A large-scale, multilingual query-passage corpus for training legal IR and QA systems.
3.69 million annotated query-passage pairs across 35 languages, covering case law, statutes, regulations, and contracts. Rich row-level metadata (query type, legal domain, difficulty, jurisdiction) supports clean train / validation / test partitioning via an A / B / C bucket split.
legal, multilingual, retrieval, rag, question-answering, semantic-search, legal-reasoning
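The A / B / C bucket split described above could be used roughly as follows. This is a minimal sketch, not the dataset's documented loader: the field name `bucket`, the mapping of A, B, and C to train, validation, and test, and the toy rows are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumption: each row carries a "bucket" field holding "A", "B", or "C".
# Both the field name and this bucket-to-split mapping are hypothetical.
BUCKET_TO_SPLIT = {"A": "train", "B": "validation", "C": "test"}


def partition_by_bucket(rows, bucket_field="bucket"):
    """Group rows into named splits according to their bucket label."""
    splits = defaultdict(list)
    for row in rows:
        label = row.get(bucket_field)
        if label not in BUCKET_TO_SPLIT:
            # Fail loudly rather than silently dropping unlabelled rows.
            raise ValueError(f"unexpected bucket label: {label!r}")
        splits[BUCKET_TO_SPLIT[label]].append(row)
    return dict(splits)


# Toy rows with made-up values, for illustration only.
rows = [
    {"query": "q1", "passage": "p1", "bucket": "A"},
    {"query": "q2", "passage": "p2", "bucket": "B"},
    {"query": "q3", "passage": "p3", "bucket": "C"},
    {"query": "q4", "passage": "p4", "bucket": "A"},
]

splits = partition_by_bucket(rows)
for name in ("train", "validation", "test"):
    print(name, len(splits[name]))
```

Because the split is read from a precomputed label rather than sampled at load time, every run yields the same partition, which is what keeps the held-out sets stable across experiments.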
Dataset · Available
India Acts - Central & State Statutes
A comprehensive PDF corpus of Indian legislation, Central and State, in English and Hindi.
12,102 PDF files spanning all 28 States and 8 Union Territories plus Parliament Acts from 1836 to 2025. Consolidated from the India Code portal and individual state-legislature sources. Intended for statutory retrieval, legal QA, summarization, OCR / parsing benchmarks, multilingual legal NLP and citation analysis. Structured by Central vs. State, language, year of enactment and act title for clean navigation.
legal, india, acts, statutes, bilingual, government, summarization, retrieval
Dataset · Coming soon
Legal Reranking Corpus
Cross-jurisdictional pairwise relevance judgements for reranker training.
A curated set of hard-negative triples for training legal cross-encoders and rerankers, with calibrated relevance labels across case law and statutory passages.
legal, reranking, cross-encoder, hard-negatives