The Probabilistic Courthouse

For most of the modern era, predicting how a case would end was an art practiced in the dark. A seasoned litigator read the room, the judge and the precedent, then offered a client a gut estimate dressed in the language of confidence. That instinct was often wrong in expensive ways, and it was almost never tested against data. Today a different machinery is moving into the courthouse: outcome-simulation engines that run thousands of probabilistic scenarios, judicial analytics that mine decades of rulings, and forecasting models that estimate when a docket will clear. The shift is quietly rewriting how litigants behave, how settlements are valued, and how courts think about their own fairness.

70M: U.S. state-court filings, 2024
79%: Accuracy predicting ECHR rulings from text
5 yrs: Max prison term under France's judge-analytics ban
$7.5B: Projected legal-analytics market by 2031

The numbers hint at the scale of the opportunity and the depth of the unease. The volume of disputes flowing through courts is enormous, the predictive models are surprisingly accurate, and at least one major democracy has decided that some of this analysis is dangerous enough to criminalize. What follows is a map of how outcome simulation arrived, what it is doing inside the justice system right now, and where it is likely to push the courts over the next several years.

The Old Way: Intuition, Asymmetry and the Vanishing Trial

The traditional engine of dispute resolution was human judgment operating on incomplete information. Lawyers estimated the odds of winning; clients decided whether to fight or fold; insurers reserved against a number someone had guessed. The trouble was that the two sides of any case rarely agreed on the math. In one classic teaching experiment, after reading identical facts, plaintiffs predicted a judge's award averaging $38,953 while defendants predicted $24,426, a gap of more than $14,000 on the same file, driven largely by self-serving optimism (Judicature, Duke Law). When each side believes it will win, cases that should settle instead grind on.

That asymmetry collided with a structural reality: the trial itself was disappearing. In the U.S. federal system and most state courts, the share of civil disputes resolved by trial has fallen for decades to a sliver of all dispositions, a phenomenon scholars call the "vanishing trial" (Judicature, Duke Law). If almost every case ends in settlement, dismissal or a plea, then the decisive question is no longer "Who would win at trial?" but "What is this case worth in the shadow of a trial that will probably never happen?" Answering that demanded a better estimate of probabilities than any individual lawyer could hold in their head.

The Shift: When Cases Became Predictable

The intellectual breakthrough was the realization that judicial decisions, in aggregate, follow patterns that machines can learn. A landmark study of the Supreme Court of the United States built a model spanning nearly two centuries of cases and correctly predicted 70.2% of case outcomes and 71.9% of individual justice votes from 1816 to 2015, using only information available before each decision (PLOS ONE). For comparison, knowledgeable legal experts had been measured at roughly 66% accuracy on the same kind of task (Science). The machine was, modestly but consistently, beating the humans.

European research pushed further. A widely cited study predicting decisions of the European Court of Human Rights from the text of cases alone reached 79% average accuracy. It found that the "circumstances" (the formal facts of the case) were the single most predictive ingredient, lending empirical weight to the legal-realist intuition that facts, not doctrine, drive outcomes (PeerJ Computer Science). Subsequent work has been more sober: a Trinity College Dublin study testing the same court on a realistic forward-looking set landed at 68.8% accuracy across twelve articles, below a naïve baseline of always predicting the most common outcome. It is a reminder that headline accuracy figures can flatter the technology (Trinity College Dublin).
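The naïve-baseline comparison is easy to make concrete. The following is a minimal sketch with invented toy labels, not the data from either study; the function names `accuracy` and `majority_baseline` are hypothetical helpers introduced only for illustration. It shows how a model can post a respectable-looking accuracy and still lose to the strategy of always guessing the most common outcome.

```python
from collections import Counter

def accuracy(predicted, actual):
    """Share of cases where the prediction matches the actual outcome."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def majority_baseline(train_labels, n_cases):
    """Always predict the most common outcome seen in the training labels."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n_cases

# Hypothetical toy data (assumption): 1 = violation found, 0 = no violation.
train = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]    # past cases, mostly violations
actual = [1, 1, 1, 1, 0, 1, 1, 1, 0, 1]   # future cases: 8 of 10 are violations
model = [1, 0, 1, 1, 0, 0, 1, 1, 1, 1]    # an imagined model's predictions

model_acc = accuracy(model, actual)
baseline_acc = accuracy(majority_baseline(train, len(actual)), actual)

print(f"model accuracy:    {model_acc:.0%}")
print(f"baseline accuracy: {baseline_acc:.0%}")
```

In this toy set the imagined model gets 7 of 10 cases right while the baseline gets 8 of 10, which is why an accuracy figure means little until it is set beside the class balance of the test data.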

[Chart: How well can models predict court outcomes? Reported predictive accuracy across published studies and a human-expert benchmark. Sources: PLOS ONE, PeerJ, Trinity College Dublin, Science, Artificial Intelligence and Law. Figures reflect distinct datasets and methods and are not directly comparable.]

The more unsettling discoveries came from what the models keyed on. An analysis of asylum adjudications found that models using only data available at the decision date reached around 80% accuracy, with predictions driven heavily by trend features and individual judge characteristics rather than the merits of the case. Roughly 78% accuracy was reachable using only information present when the case opened, raising hard questions about snap judgments and pre-formed conclusions (Artificial Intelligence and Law). If who your judge is matters as much as what your case says, then analytics does not merely predict the system; it exposes it.

If who your judge is predicts the outcome as well as the facts do, then judicial analytics does not just forecast the system; it x-rays it.

What It Looks Like Now: Simulation as Negotiation Infrastructure

In present-day practice, outcome simulation has moved from research papers into the everyday economics of litigation. The dominant pattern is probabilistic rather than binary. Instead of declaring a single winner, modern engines estimate a distribution: the likelihood of each outcome, the range of plausible damages, and the expected value of a case, meaning the chance of winning multiplied by the recovery if you do (Judicature, Duke Law). Monte Carlo techniques, which run thousands of simulated case paths under varying assumptions, let parties see not just an answer but the shape of their risk.
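A Monte Carlo valuation of this kind can be sketched in a few lines. The example below is illustrative only and does not reproduce any vendor's engine: the win probability, the triangular shape of the damages distribution, the dollar figures and the function name `simulate_case` are all assumptions chosen for the sketch.

```python
import random
import statistics

def simulate_case(n_paths, p_win, damages_low, damages_mode, damages_high,
                  cost, seed=0):
    """Simulate a plaintiff's net recovery over many hypothetical case paths."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    outcomes = []
    for _ in range(n_paths):
        if rng.random() < p_win:
            # Damages drawn from a triangular distribution (an assumed shape).
            recovery = rng.triangular(damages_low, damages_high, damages_mode)
        else:
            recovery = 0.0
        outcomes.append(recovery - cost)  # litigation cost is paid either way
    return outcomes

# Hypothetical inputs: 60% chance of winning, damages between $50k and $400k
# (most likely $120k), and $40k in litigation costs.
outcomes = simulate_case(20_000, 0.6, 50_000, 120_000, 400_000, 40_000)

expected_value = statistics.mean(outcomes)
cuts = statistics.quantiles(outcomes, n=20)   # 19 cut points at 5% steps
p5, p95 = cuts[0], cuts[-1]                   # 5th and 95th percentiles
share_losing = sum(o < 0 for o in outcomes) / len(outcomes)

print(f"expected value: {expected_value:,.0f}")
print(f"5th to 95th percentile: {p5:,.0f} to {p95:,.0f}")
print(f"share of paths that lose money: {share_losing:.0%}")
```

The single expected-value figure hides the fact that, under these assumptions, a large share of simulated paths end with the plaintiff out of pocket; the percentile band and the loss share are what convey the shape of the risk.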

Recent civil-litigation modelling shows why this matters for triage. One large study found that prediction is reliable mainly in the tails: when a model assigned a plaintiff-win probability above 90%, it was correct in 97% of those high-confidence cases, even though overall three-way accuracy was a more modest 61% and class-level discrimination (AUC) ranged from 0.74 to 0.81 (arXiv preprint). The practical lesson: simulation is least useful in the murky middle and most useful at the extremes, where it can flag a case as a near-certain settlement candidate or a defensible fight.

[Chart: Confidence concentrates in the tails. Civil-litigation model: reliability of high-confidence predictions vs. overall accuracy. Source: "Predicting civil litigation outcomes," arXiv. Overall accuracy is three-class; tail accuracy is for plaintiff-win predictions above 90% confidence.]

This capability does not sit on a shelf. Industry analysts value the broader legal-analytics market at roughly $3.15 billion in 2025, projecting growth to about $7.52 billion by 2031 at a compound annual rate near 15.6% (Mordor Intelligence). The demand is fed by sheer caseload pressure: the National Center for State Courts counted roughly 70 million incoming cases in U.S. state courts in 2024, up 4% on the year even as filings remained 27% below their 2012 level (National Center for State Courts). With traffic matters alone accounting for about 46% of incoming cases and contract filings rising 11%, courts have every incentive to forecast where the pressure will land (State Justice Institute).
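The market projection is internally consistent, which can be checked with the standard compound-growth formula. This is a small arithmetic sketch using only the three figures quoted above; the function names are hypothetical.

```python
def compound_growth(start_value, annual_rate, years):
    """Value after compounding at a constant annual rate."""
    return start_value * (1 + annual_rate) ** years

def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that links a start and end value."""
    return (end_value / start_value) ** (1 / years) - 1

years = 2031 - 2025  # six compounding periods

# $3.15 billion growing at 15.6% a year, in billions of dollars.
projected_2031 = compound_growth(3.15, 0.156, years)

# Growth rate implied by moving from $3.15 billion to $7.52 billion.
rate = implied_cagr(3.15, 7.52, years)

print(f"projected 2031 market: ${projected_2031:.2f}B")
print(f"implied annual growth: {rate:.1%}")
```

Compounding the 2025 figure at the stated rate lands within rounding distance of the $7.52 billion projection, and the rate implied by the two endpoints rounds to 15.6%.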

Where the data comes from: representative outcome-prediction studies
Forum studied	What was predicted	Reported accuracy	Key driver or caveat
U.S. Supreme Court (1816 to 2015)	Case outcomes & justice votes	70.2% / 71.9%	Historical trends, justice behaviour
European Court of Human Rights	Article violation (text-only)	79% average	Case "circumstances" / facts
ECHR (forward-looking test)	Violation across 12 articles	68.8%	Below naïve baseline
Asylum adjudications	Grant vs. deny	~80%	Judge identity & trend features
Civil litigation (high-confidence)	Plaintiff win, >90% confidence	97%	Tail-of-distribution cases

The behavioural effect is the real story. When both sides can run the same simulation, the negotiating space narrows. A defendant staring at a credible 85% loss probability has little reason to litigate to verdict; a plaintiff facing a coin-flip and a long delay has reason to discount. Data-driven valuation tends to compress the optimism gap that historically kept cases from settling, turning simulation into a kind of shared infrastructure for negotiation rather than a weapon for one side.
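The way a shared forecast opens room to settle can be illustrated with a textbook expected-value bargaining simplification. It is an assumption of this sketch, not a description of any tool mentioned here: both parties are treated as risk-neutral, each pays its own costs, and every number is hypothetical. The plaintiff will not accept less than its expected recovery net of costs, and the defendant will not pay more than its expected loss plus costs.

```python
def settlement_zone(p_plaintiff, p_defendant, damages,
                    cost_plaintiff, cost_defendant):
    """Return (floor, ceiling) of acceptable settlements, or None if no overlap.

    p_plaintiff and p_defendant are each side's own estimate of the
    probability that the plaintiff wins at trial.
    """
    # Least the plaintiff should accept: expected award minus its trial costs.
    plaintiff_floor = p_plaintiff * damages - cost_plaintiff
    # Most the defendant should pay: expected award plus its trial costs.
    defendant_ceiling = p_defendant * damages + cost_defendant
    if defendant_ceiling < plaintiff_floor:
        return None  # each side expects to do better at trial
    return (plaintiff_floor, defendant_ceiling)

# Hypothetical case: $100k at stake, $10k in trial costs per side.
# Divergent optimism: plaintiff sees an 80% chance, defendant sees 40%.
divergent = settlement_zone(0.8, 0.4, 100_000, 10_000, 10_000)

# Shared forecast: both sides accept the same 60% estimate.
shared = settlement_zone(0.6, 0.6, 100_000, 10_000, 10_000)

print("divergent estimates:", divergent)
print("shared estimate:", shared)
```

With divergent estimates the plaintiff's floor sits above the defendant's ceiling and no deal is possible; once both sides work from the same probability, the combined trial costs guarantee an overlap.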

The Backlash: Bans, Bias and Due Process

Not every jurisdiction welcomes the x-ray. In 2019 France became the first country in the world to criminalize the statistical analysis of named judges. Article 33 of its Justice Reform Act forbids reusing the identity data of magistrates and clerks "with the purpose or effect of evaluating, analysing, comparing or predicting" their professional practices, with a maximum penalty of five years' imprisonment (ABA Journal). The measure was reportedly spurred in part by analytics revealing wide disparities among individual judges in asylum decisions, and critics argue it sweeps far beyond commercial prediction to chill legitimate empirical legal research (Verfassungsblog).

[Chart: A patchwork of approaches to judicial analytics. How jurisdictions are positioning on prediction and judge-level analysis. Illustrative grouping based on reported policy postures in ABA Journal, Verfassungsblog and Legal Futures.]

The deeper anxiety is fairness. The most-litigated example is the use of proprietary risk-assessment scores in sentencing. In State v. Loomis, the Wisconsin Supreme Court allowed a closed-source recidivism tool to inform a sentence but circumscribed its use, requiring written warnings that the algorithm's methodology was a trade secret, that its scores were built on group rather than individual data, and that independent studies had questioned whether it disproportionately classified minority defendants as higher risk (Wisconsin Supreme Court). Investigative analysis of more than 7,000 risk scores found the tool flagged Black defendants as higher-risk at markedly elevated rates, crystallizing the concern that predictive systems can launder historical bias into seemingly neutral numbers (UNC Journal of Law & Technology).

There is also the problem of self-fulfilling prophecy. If litigants, insurers and even courts begin steering behaviour by the model's forecast (settling the cases it flags as losers, prioritizing the dockets it predicts will clog), the predictions can start to shape the very reality they claim to measure. Public-perception research finds that procedural-justice ratings drop when people learn a decision was informed by AI rather than human expertise alone, suggesting legitimacy costs that accuracy gains may not offset (Behavioral Sciences).

The Next Few Years: Probabilistic Justice, With Guardrails

The trajectory points toward simulation becoming ambient: embedded in case-management systems, settlement platforms and the back offices of courts themselves rather than confined to specialist analytics tools. Three developments look likely.

First, forecasting moves from cases to dockets. The same techniques that estimate a single outcome can estimate aggregate flow: clearance rates, time-to-disposition, and where backlogs will form. With state-court clearance and disposition data now published in interactive dashboards, court administrators have the raw material to forecast bottlenecks and allocate judges before backlogs metastasize (National Center for State Courts). Federal caseload data tells the same story of volatility worth predicting: U.S. district-court civil and criminal filings have swung sharply year to year, with combined filings moving from over 580,000 in 2020 to roughly 363,000 by late 2024 (United States Courts).
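The bookkeeping behind a docket forecast is simple, even if predicting the inputs is not. The sketch below assumes the usual definition of clearance rate (cases disposed divided by cases filed in the same period) and uses invented quarterly figures; in a real system the future filings and dispositions would come from a forecasting model rather than being typed in.

```python
def clearance_rate(dispositions, filings):
    """Cases disposed per case filed; below 1.0 means the backlog is growing."""
    return dispositions / filings

def project_pending(pending_start, filings_by_period, dispositions_by_period):
    """Roll the pending caseload forward period by period."""
    pending = pending_start
    path = []
    for filed, disposed in zip(filings_by_period, dispositions_by_period):
        # Pending cases cannot fall below zero.
        pending = max(0, pending + filed - disposed)
        path.append(pending)
    return path

# Hypothetical court (assumption): 1,000 pending cases, filings rising
# faster than dispositions over three quarters.
filings = [500, 550, 600]
dispositions = [480, 500, 520]

rates = [clearance_rate(d, f) for d, f in zip(dispositions, filings)]
pending_path = project_pending(1_000, filings, dispositions)

for quarter, (rate, pending) in enumerate(zip(rates, pending_path), start=1):
    print(f"Q{quarter}: clearance {rate:.0%}, pending {pending}")
```

A clearance rate that slips further below 100% each quarter shows up as an accelerating pending count, which is the early-warning signal an administrator would want before reassigning judges.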

[Chart: Why courts want a forecast: volatile federal caseloads. U.S. district courts, total filings, 12-month periods ending December 31. Source: United States Courts, National Judicial Caseload Profile.]

Second, uncertainty becomes a first-class output. The maturing systems are learning to say "I don't know." Selective-prediction research on court cases aims to have models abstain on the murky middle and offer confident answers only in the reliable tails, precisely the regime where one study saw 97% accuracy on high-confidence calls (arXiv). Expect future tools to report calibrated probability ranges and confidence flags rather than a single deceptive number, which is also the form courts and regulators are most likely to accept.
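Abstention of this kind can be sketched as a threshold rule on a model's predicted probability. This is a minimal illustration rather than the method of the cited research: the 0.9 threshold, the toy probabilities and outcomes, and the function names are all assumptions.

```python
def selective_predict(probabilities, threshold=0.9):
    """Return 1 (plaintiff wins), 0 (plaintiff loses) or None (abstain)."""
    decisions = []
    for p in probabilities:
        if p >= threshold:
            decisions.append(1)
        elif p <= 1 - threshold:
            decisions.append(0)
        else:
            decisions.append(None)  # the murky middle: decline to answer
    return decisions

def coverage_and_accuracy(decisions, actual):
    """Share of cases answered, and accuracy on the answered cases only."""
    answered = [(d, a) for d, a in zip(decisions, actual) if d is not None]
    coverage = len(answered) / len(actual)
    accuracy = (sum(d == a for d, a in answered) / len(answered)
                if answered else None)
    return coverage, accuracy

# Hypothetical model outputs: predicted probability of a plaintiff win.
probs = [0.97, 0.95, 0.62, 0.48, 0.05, 0.91, 0.33, 0.03]
actual = [1, 1, 0, 1, 0, 0, 1, 0]

# Forced to answer every case with a 50% cut-off.
forced = [1 if p >= 0.5 else 0 for p in probs]
forced_accuracy = sum(f == a for f, a in zip(forced, actual)) / len(actual)

# Allowed to abstain outside the confident tails.
coverage, tail_accuracy = coverage_and_accuracy(selective_predict(probs), actual)

print(f"forced accuracy: {forced_accuracy:.0%}")
print(f"coverage: {coverage:.1%}, accuracy when answering: {tail_accuracy:.0%}")
```

The trade-off is explicit: the selective model answers fewer cases but is right more often on the ones it does answer, and the coverage figure tells users how much of the docket it has left to human judgment.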

Third, governance hardens around the human-in-the-loop. The emerging consensus among court bodies is that predictive analytics should augment, not replace, judicial reasoning, with documented disclosure when AI is used, the right to contest algorithmic inputs, and independent audits for bias before deployment (Montreal AI Ethics Institute). National court organizations have begun publishing guidance and training for judges on interpreting and limiting these tools (National Center for State Courts).

The next-few-years agenda for outcome simulation in courts
Direction	What changes	Open risk
Docket forecasting	Predicting clearance, delay & backlog formation	Resource decisions driven by imperfect models
Calibrated uncertainty	Models abstain or report confidence bands	Over-trust of "high-confidence" labels
Bias auditing	Pre-deployment fairness review & disclosure	No agreed standard; proprietary opacity
Judge-level limits	Bans or anonymization of individual analytics	Chilling legitimate empirical research
Self-fulfilling effects	Forecasts steer settlement behaviour	Predictions reshaping the reality measured

The likeliest near-term equilibrium is neither the techno-utopian "robot judge" nor the French prohibition, but a managed middle: probability widely used to value cases and plan dockets, hard limits on profiling individual judges, mandatory disclosure when models touch a decision, and a growing expectation that any score arriving in a courtroom carries its uncertainty and its audit trail with it.

Conclusion: The Distribution, Not the Verdict

The quiet revolution of outcome simulation is not that machines now decide cases; they do not, and the strongest research argues they should not. It is that the question itself has changed. Litigants increasingly ask not "Will I win?" but "What is the distribution of ways this could go, and what is it worth?" That reframing makes settlements faster, valuations sharper and dockets more predictable. But the same models that compress the optimism gap can entrench old biases, chill scrutiny of judges, and seduce a stretched system into trusting numbers it cannot fully explain. The courts that thrive in the probabilistic era will be the ones that treat a forecast as a beginning of judgment, never the end of it.