The Probability Engine: How Outcome Simulation Is Rewiring Insurance Legal Strategy

Insurers have spent two centuries turning chaos into a curve. Mortality, weather, fire, theft: each was once an unknowable hazard, and each was eventually tamed by data, distributions and the law of large numbers. Yet one exposure stubbornly escaped that discipline: the verdict. When a disputed claim entered litigation, the carrier's most consequential number (what a jury might award) was set not by a model but by an adjuster's gut, a panel counsel's letter, and a reserve that was, at bottom, an educated guess. That asymmetry is now closing. Outcome-simulation engines that run thousands of synthetic trials, and predictive analytics that score a case the moment it lands, are extending the actuary's logic into the courtroom, just as litigation costs hit records that make the old guesswork untenable.

$31.3B: 2024 nuclear-verdict total, +116% YoY

$51M: median nuclear verdict, up from $21M in 2020

7%: US social inflation in 2023, a 20-year high

$143B: US commercial casualty losses, 2023

The pressure is not abstract. Nuclear verdicts (jury awards of $10 million or more) against corporate defendants surged to 135 in 2024, a 52% jump over the prior year and the most since tracking began in 2009, with a combined value of $31.3 billion, according to Marathon Strategies research reported by Insurance Journal. The median such verdict climbed to $51 million, up from roughly $21 million in 2020, and "thermonuclear" verdicts above $100 million rose to 49, an 81.5% increase, as detailed in Risk & Insurance's coverage of the report. Behind these headline numbers sits a structural trend the industry calls social inflation, which the Swiss Re Institute estimates lifted US liability claims 57% over the past decade and peaked at 7% in 2023. When the loss side of the casualty ledger compounds at that pace, a more rigorous way to forecast individual case outcomes stops being a luxury.

The Old Way: A Reserve Set by Instinct

For most of the modern claims era, a litigated file was handled as a craft, not a calculation. An adjuster reviewed the demand, defense counsel offered a range, and a case reserve was booked, typically a single point estimate meant to capture the carrier's eventual cost. The number leaned heavily on the adjuster's experience with "cases like this one," a heuristic that worked tolerably in stable jurisdictions and failed badly when juries shifted. Settlement decisions inherited the same fog: a claims manager weighing whether to pay $400,000 now or risk trial rarely had a defensible probability attached to either branch.

The actuarial side was more sophisticated but operated at a remove. Reserving actuaries projected ultimate losses for entire portfolios using development triangles and chain-ladder methods, smoothing thousands of files into aggregate estimates. Those techniques are powerful at the book level yet say little about the trajectory of a specific suit. And the aggregate itself proved fragile: the litigious environment has driven persistent adverse development in long-tail casualty lines, with the "other liability" line alone carrying an estimated $12.5 billion reserve deficiency even as the broader US property-casualty industry sat on net redundancy, per an Assured Research analysis citing S&P Global Market Intelligence. A reserving system that misses by billions on the lines most exposed to verdicts is a system pricing the very risk it cannot yet see.
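To make the book-level technique concrete, here is a minimal sketch of the chain-ladder idea on a toy cumulative development triangle. The triangle values and the function name `chain_ladder_ultimates` are hypothetical, chosen only for illustration, and a production reserving model would add tail factors, diagnostics and judgmental selections that this sketch omits.

```python
def chain_ladder_ultimates(triangle):
    """Project ultimate losses from a cumulative development triangle.

    triangle[i][j] is the cumulative loss for accident year i at
    development age j; more recent years have fewer observed ages.
    """
    n_ages = len(triangle[0])
    factors = []
    for j in range(n_ages - 1):
        # Volume-weighted age-to-age factor, using only years observed at both ages.
        rows = [row for row in triangle if len(row) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))

    ultimates = []
    for row in triangle:
        value = row[-1]  # latest observed cumulative loss
        for j in range(len(row) - 1, n_ages - 1):
            value *= factors[j]  # develop to the final age
        ultimates.append(value)
    return factors, ultimates


# Hypothetical cumulative paid losses ($ millions) for three accident years.
triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 176.0],
    [120.0],
]
factors, ultimates = chain_ladder_ultimates(triangle)
for row, ultimate in zip(triangle, ultimates):
    print(f"latest={row[-1]:.1f}  ultimate={ultimate:.1f}  reserve={ultimate - row[-1]:.1f}")
```

The output is one number per accident year. Nothing in the calculation knows which suit, venue or plaintiff firm sits behind the losses, which is why the method says little about any single file.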

Two forces made the instinct-driven model increasingly indefensible. First, the magnitude of tail outcomes exploded: as the U.S. Chamber Institute for Legal Reform documented across more than 1,200 verdicts from 2013 to 2022, the median nuclear verdict was $21 million while the mean reached $89 million, evidence of a fat tail that point estimates simply cannot represent. Second, the cost of being wrong rose: US commercial casualty losses grew at an 11% average annual rate over five years to reach $143 billion in 2023, and bodily-injury-exposed lines posted cumulative underwriting losses of $43 billion between 2019 and 2023, the Swiss Re sigma report found.
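The gap between a $21 million median and an $89 million mean can be turned into a rough measure of tail weight. The sketch below assumes, purely for illustration, that verdict sizes follow a lognormal distribution. That is a crude assumption, not something the cited research states, and it ignores the $10 million floor that defines a nuclear verdict. Under it, the median equals exp(mu) and the mean equals exp(mu + sigma^2 / 2), so sigma can be backed out from the two published figures.

```python
import math
from statistics import NormalDist

median_verdict = 21.0  # $ millions, from the cited study
mean_verdict = 89.0    # $ millions, from the cited study

# Lognormal assumption (illustrative only): mean / median = exp(sigma^2 / 2).
mu = math.log(median_verdict)
sigma = math.sqrt(2 * math.log(mean_verdict / median_verdict))

# Share of verdicts larger than the mean under this assumed shape.
z_mean = (math.log(mean_verdict) - mu) / sigma
share_above_mean = 1 - NormalDist().cdf(z_mean)

# Implied 95th percentile verdict under the same assumption.
p95 = math.exp(mu + NormalDist().inv_cdf(0.95) * sigma)

print(f"implied sigma: {sigma:.2f}")
print(f"share of verdicts above the mean: {share_above_mean:.1%}")
print(f"implied 95th percentile: ${p95:,.0f}M")
```

On these assumptions only about one verdict in five exceeds the mean, which is the practical meaning of a fat tail: a single point estimate near the "typical" case leaves most of the expected cost unaccounted for.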

[Figure: The verdict curve steepened. Median US nuclear verdict against corporate defendants, $ millions. Source: Marathon Strategies, Corporate Verdicts Go Thermonuclear, via Risk & Insurance and Insurance Journal.]

The Shift: When Litigation Joined the Loss Model

The same actuarial DNA that priced hurricanes is now being turned on the courtroom. The conceptual bridge is Monte Carlo simulation, a technique actuaries have long used to build full distributions of portfolio losses rather than single estimates. Established actuarial literature describes building a loss distribution "either parametrically, non-parametrically, analytically or by Monte Carlo simulation," then reading reserves off chosen percentiles, as set out in Society of Actuaries study material. Casualty researchers have shown how simulation generates a defensible reserve range rather than a brittle point, in work such as a Casualty Actuarial Society forum paper on statistical modeling techniques for reserve ranges.

Applied to a single litigated claim, the logic is identical. An outcome-simulation engine treats each driver of a case (liability probability, damages distribution, jurisdiction effect, judge and venue tendencies, the identity and track record of opposing counsel) as a random variable. It then runs thousands of synthetic trials and returns not "this case is worth $400,000" but a full curve: a most-likely value, a settlement band, and the probability of a tail verdict that would breach the reserve. That tail is exactly what nuclear-verdict risk has made the most dangerous number on the file.
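A stripped-down version of that idea fits in a few lines. The sketch below is not any vendor's engine. It assumes a single liability coin flip, lognormal damages and a flat venue multiplier, and every parameter value (`p_liability`, `median_damages`, `sigma`, `venue_multiplier`, `reserve`) is hypothetical. A real engine would model many more drivers and estimate them from data.

```python
import math
import random
import statistics


def simulate_claim(n_trials=10_000, p_liability=0.55, median_damages=400_000.0,
                   sigma=1.0, venue_multiplier=1.2, reserve=1_500_000.0, seed=42):
    """Run synthetic trials for one claim; all inputs are illustrative assumptions."""
    rng = random.Random(seed)
    mu = math.log(median_damages)  # exp(mu) is the median award if liability is found
    outcomes = []
    for _ in range(n_trials):
        if rng.random() < p_liability:
            # Plaintiff wins: draw damages and scale by the assumed venue effect.
            outcomes.append(venue_multiplier * rng.lognormvariate(mu, sigma))
        else:
            outcomes.append(0.0)  # defense verdict
    cuts = statistics.quantiles(outcomes, n=100)  # 99 percentile cut points
    return {
        "outcomes": outcomes,
        "mean": statistics.fmean(outcomes),
        "p50": cuts[49],
        "p75": cuts[74],
        "p95": cuts[94],
        "p_breach": sum(o > reserve for o in outcomes) / n_trials,
    }


result = simulate_claim()
print(f"mean ${result['mean']:,.0f} | p50 ${result['p50']:,.0f} | "
      f"p75 ${result['p75']:,.0f} | p95 ${result['p95']:,.0f}")
print(f"probability the verdict breaches the reserve: {result['p_breach']:.1%}")
```

The output is a set of percentiles plus a breach probability rather than one figure. The fixed `seed` keeps the run reproducible, which matters when the curve has to be documented for a reserve file.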

The predictive layer feeding those simulations has matured fast. In a study of civil litigation drawing on tens of thousands of cases, a machine-learning classifier reached class-specific AUC values between 0.74 and 0.81 and, critically, achieved up to 97% accuracy on its highest-confidence plaintiff-win predictions, according to a 2026 arXiv study on predicting civil litigation outcomes. The same research surfaced a vital caveat for insurers: predictions are reliable in the tails but uncertain in the middle, and "predictive uncertainty is not merely model error" but a genuine signal of how indeterminate a dispute actually is. Procedural questions are even more tractable; reporting on early litigation-analytics platforms describes motion-to-dismiss rulings being forecast with roughly 85% accuracy from the judge's profile alone, per an analytics review by Accumulated. And in a controlled comparison, deep-learning models trained on more than 600,000 appeals outperformed 22 experienced human experts at predicting outcomes, a result published in PLOS ONE.
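The two headline metrics measure different things, and a toy example shows why a model can post a moderate AUC while being nearly always right on its most confident calls. The scores and labels below are invented for illustration and are not data from the cited study, which also reports class-specific AUCs for a multi-class problem rather than the binary case sketched here.

```python
def auc(scores, labels):
    """Chance that a random plaintiff win is scored above a random plaintiff loss.

    Ties count as half a win. Labels are 1 (plaintiff won) or 0 (plaintiff lost).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_above(scores, labels, threshold):
    """Accuracy of 'plaintiff wins' predictions among cases scored >= threshold."""
    picked = [y for s, y in zip(scores, labels) if s >= threshold]
    return sum(picked) / len(picked) if picked else float("nan")


# Hypothetical model scores (predicted probability of a plaintiff win) and outcomes.
scores = [0.95, 0.90, 0.85, 0.60, 0.55, 0.50, 0.45, 0.30, 0.10]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0]

print(f"AUC over all cases: {auc(scores, labels):.2f}")
print(f"accuracy when score >= 0.8: {accuracy_above(scores, labels, 0.8):.0%}")
print(f"accuracy when score >= 0.4: {accuracy_above(scores, labels, 0.4):.0%}")
```

In this toy set the confident calls are all correct while the mid-range scores are close to a coin flip. That is the same pattern the research describes: reliable in the tails, uncertain in the middle.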

[Figure: Predictive accuracy depends on the question. Reported model accuracy / discrimination by task type. Sources: arXiv civil-litigation study; Accumulated. Values are reported accuracy/AUC for distinct tasks and are not directly comparable.]

For insurers the appeal is operational, not academic. Predictive models can estimate the probability a claim escalates into litigation, the likelihood a claimant retains counsel, the expected settlement band, and the chance a case breaches a reserve threshold. Together these form the litigated-claim equivalent of a frequency-severity model, as an industry primer from reinsurer Gen Re on predictive analytics in claims management describes. Carriers have reported using these models to assign the right defense counsel, intervene earlier on high-risk files, and compress days in litigation, with vendors in the space citing reductions in allocated loss-adjustment expense; the underlying mechanics are surveyed in that same Gen Re analysis.

What It Looks Like Now

In a present-day claims operation that has embraced these tools, the litigated file moves through an intelligence cycle rather than a queue. At first notice of loss, a model scores the claim for litigation propensity and severity, flagging the small fraction of files that will drive most of the cost. As a matter develops, an outcome-simulation engine ingests case attributes and produces a live distribution of likely results that updates as discovery changes the inputs. The claims professional and coverage counsel then negotiate against a probability ("settle at $350,000 and we clear 78% of simulated outcomes") instead of against a single adversarial demand.

The reserve becomes a percentile, not a point. Rather than booking the adjuster's best guess, the carrier can reserve at, say, the 75th percentile of the simulated loss distribution, with the full curve documented for the actuarial opinion. Scenario comparison lets legal leaders A/B test strategy: what does the distribution look like if we file for summary judgment versus mediate, or if we try the case in this venue versus settle before a plaintiff-friendly jury pool? Exposure can then be aggregated across the entire litigation portfolio, giving the chief actuary and general counsel a single, defensible view of tail risk, the same loss-exceedance thinking actuaries use for catastrophe reserves, as described in a practitioner guide to Monte Carlo risk quantification.
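As a rough sketch of the percentile reserve and the bottom-up aggregation described above, the code below simulates a three-claim portfolio and reads a reserve off the 75th percentile of total loss. The claim parameters are hypothetical, and the sketch assumes the claims are independent. That is a strong simplification, since shared venues and social inflation would in practice push outcomes to move together and fatten the portfolio tail.

```python
import math
import random
import statistics


def simulate_portfolio(claims, n_trials=5_000, seed=7):
    """Return simulated total loss per trial; claims are assumed independent."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_trials):
        total = 0.0
        for p_liability, median_damages, sigma in claims:
            if rng.random() < p_liability:
                total += rng.lognormvariate(math.log(median_damages), sigma)
        totals.append(total)
    return totals


def percentile(values, pct):
    """pct-th percentile (1..99) using the statistics module's cut points."""
    return statistics.quantiles(values, n=100)[pct - 1]


# Hypothetical files: (probability of liability, median damages in $, lognormal sigma).
claims = [
    (0.5, 300_000.0, 1.0),
    (0.3, 1_000_000.0, 1.2),
    (0.7, 150_000.0, 0.8),
]
totals = simulate_portfolio(claims)
reserve_75 = percentile(totals, 75)
tail_95 = percentile(totals, 95)
share_above_reserve = sum(t > reserve_75 for t in totals) / len(totals)

print(f"reserve at 75th percentile: ${reserve_75:,.0f}")
print(f"95th percentile of portfolio loss: ${tail_95:,.0f}")
print(f"share of trials exceeding the reserve: {share_above_reserve:.1%}")
```

Choosing the 75th percentile means accepting that about one simulated outcome in four lands above the booked reserve. Which percentile is appropriate is a policy decision the model cannot make.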

From craft to computation: the claims-legal workflow, reframed
Decision	The legacy approach	The simulation-driven approach
Case reserve	Adjuster point estimate	Chosen percentile of a simulated loss distribution
Settle vs. try	Counsel's narrative range	Probability-weighted expected value of each branch
Counsel selection	Relationship and availability	Outcome track record by case type and venue
Nuclear-verdict risk	Largely unmodeled tail	Explicit tail probability and exposure quantification
Portfolio exposure	Aggregate triangle development	Bottom-up Monte Carlo aggregation across files
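The "settle vs. try" row comes down to simple arithmetic once probabilities are attached to each branch. The sketch below is illustrative only: the win probability, expected award and defense costs are hypothetical inputs, and the function names are not taken from any product.

```python
def expected_trial_cost(p_plaintiff_win, expected_award, defense_costs):
    """Probability-weighted cost of trying the case, including defense spend."""
    return p_plaintiff_win * expected_award + defense_costs


def compare_branches(settlement_offer, p_plaintiff_win, expected_award, defense_costs):
    """Compare paying the settlement now against the expected cost of trial."""
    trial = expected_trial_cost(p_plaintiff_win, expected_award, defense_costs)
    cheaper = "settle" if settlement_offer <= trial else "try"
    return {"settle": settlement_offer, "try": trial, "cheaper": cheaper}


# Hypothetical file: $400,000 demand, 45% plaintiff-win probability,
# $900,000 expected award if the plaintiff wins, $120,000 to defend through trial.
decision = compare_branches(400_000.0, 0.45, 900_000.0, 120_000.0)
print(decision)
```

Expected value is only the first cut. It treats a small chance of a very large verdict the same as a certain moderate one, so a carrier worried about tail outcomes would also weigh the breach probability from the full simulated distribution.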

None of this displaces judgment. The strongest deployments treat the model as a second opinion that disciplines the human one, surfacing when an adjuster's reserve sits far below the simulated tail, or when counsel's optimism is unsupported by venue data. The combination of model output and seasoned legal judgment, rather than either alone, is what the research consistently identifies as the reliable configuration.

[Figure: A simulated claim is a curve, not a number. Illustrative distribution of 10,000 synthetic trial outcomes for one liability claim ($000s). Illustrative only. Methodology follows Monte Carlo loss-distribution practice described by the Society of Actuaries and Casualty Actuarial Society.]

The Forces Driving Adoption

The technology is arriving into a legal profession that has decided AI is inevitable. In the Thomson Reuters 2025 Future of Professionals research, 95% of surveyed legal professionals expect generative AI to become central to their organization's workflow within five years, and the firm estimates the tools could free roughly 240 hours per professional per year. Organizations with a deliberate AI strategy were found to be twice as likely to see revenue growth as ad-hoc adopters, per Thomson Reuters' 2025 adoption analysis. For insurance legal teams, the economic case is sharpened by the cost line they are trying to bend.

That cost line keeps rising. Third-party litigation funding (capital injected into lawsuits in exchange for a share of recoveries, and a frequently cited accelerant of social inflation) was valued at roughly $15.2 billion in the US for 2024 by research summarized by the MPL Association, while the American Tort Reform Association pegs direct annual economic losses from the practice near $35.8 billion. When funded plaintiffs can finance protracted, high-stakes litigation, the insurer's ability to quantify and price tail exposure becomes a competitive necessity.

[Figure: The cost pressures pulling simulation into claims. Selected US liability and litigation cost indicators. Sources: Swiss Re Institute; Marathon Strategies via Insurance Journal; MPL Association.]

Nuclear-verdict momentum since 2020
Metric	Change since 2020	2024 level
Number of nuclear verdicts	+309%	135 verdicts
Aggregate verdict value	+273%	$31.3 billion
Median nuclear verdict	+143%	$51 million
Thermonuclear verdicts (>$100M)	+81.5% (2023 to 2024)	49 verdicts

Regulators have moved in parallel. The National Association of Insurance Commissioners adopted its Model Bulletin on the Use of Artificial Intelligence Systems by Insurers on December 4, 2023, requiring each insurer to maintain a written AI governance program covering risk management, testing for bias, vendor oversight and documentation, as described by the NAIC. By 2025 more than two dozen jurisdictions had adopted it, and a multistate AI Systems Evaluation Tool, with examiners focusing heavily on AI used in claims handling, entered a pilot running through 2026 across twelve states, per a summary of the bulletin's evolution from WaterStreet. Any predictive model touching a litigated claim now lands inside that governance perimeter.

The Next Few Years: Promise, and the Limits of the Model

Over the next three to seven years, expect outcome simulation to migrate from a specialist tool used on the largest files to a default layer beneath every disputed claim of consequence. Reserves expressed as distributions rather than points will increasingly be the documented standard. Scenario comparison will be embedded directly into settlement authority workflows, so that the decision to escalate a file or accept a demand carries an attached probability the way a catastrophe bond carries a modeled loss. And the same engines will feed pricing and reinsurance, closing the loop between what the courtroom is likely to do and what the carrier charges to take the risk.

But the actuarial heritage that makes insurers natural adopters also makes them uniquely exposed to the technology's central failure mode: false precision. A simulation that returns a crisp 78% confidence number can lull a claims committee into treating an estimate as a fact. The research is blunt that predictability collapses in the indeterminate middle of the case distribution, exactly where the hardest settlement calls live, and that added information often fails to reduce that uncertainty, as the arXiv civil-litigation study demonstrates. Models trained on historical verdicts can also encode the biases and jurisdictional quirks of the past, and in a period when juries are visibly resetting expectations upward, yesterday's distribution may systematically understate tomorrow's tail.

Three risks deserve standing attention as adoption scales. First, over-reliance: a probability is a tool for judgment, not a substitute for it, and treating model output as gospel is the error retired jurists warned about even as they welcomed the analytics. Second, drift and data quality: a model calibrated on pre-2020 verdicts will misprice a post-2020 world, which is why regulators now expect insurers to monitor model drift and document validation, as the NAIC model bulletin sets out. Third, fairness and explainability: predictions that influence whether a claimant is offered an early settlement must withstand the same unfair-discrimination scrutiny as any other consumer-affecting decision under that bulletin.
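One common way to watch for the second risk is the population stability index (PSI), which compares how a quantity was distributed in the training data with how it is distributed now. The sketch below is an illustration, not a method prescribed by the NAIC bulletin. The verdict samples and bin edges are invented, and the 0.25 alert level is a commonly quoted rule of thumb rather than a regulatory threshold.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population stability index between two samples over shared bin edges."""
    def bin_shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v > e for e in edges)] += 1  # index of the bin v falls in
        # Floor each share at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    exp_shares, act_shares = bin_shares(expected), bin_shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(exp_shares, act_shares))


# Hypothetical verdict sizes ($ millions) before and after 2020.
pre_2020 = [5] * 50 + [15] * 30 + [40] * 15 + [120] * 5
post_2020 = [5] * 25 + [15] * 30 + [40] * 25 + [120] * 20
edges = [10, 25, 100]  # bin boundaries in $ millions

drift = psi(pre_2020, post_2020, edges)
print(f"PSI = {drift:.3f}")
if drift > 0.25:  # rule-of-thumb alert level, assumed for this sketch
    print("distribution has shifted materially; revalidate the model")
```

A score like this only says that the inputs have moved. Deciding whether the model's tail estimates are still usable requires back-testing against recent verdicts.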

Conclusion

Insurance has always been the business of converting uncertainty into a price, and for generations the litigated claim was the exception that resisted the curve. Outcome simulation and predictive case analytics are ending that exception at the precise moment social inflation and nuclear verdicts have made the old guesswork ruinously expensive. The carriers that win will not be the ones that trust the model most, but the ones that treat the simulation as what it has always been for an actuary: not a prophecy, but a disciplined account of what is likely, what is possible, and how much is still unknown.