The Brief That Reads Itself

Somewhere in a state attorney general's office tonight, a lawyer is reading a rule: not its first page, but the four-hundredth page of a cross-referenced scheme that touches three agencies, two prior administrations, and a line of appellate decisions nobody has fully reconciled. This is the unglamorous heart of government legal work: not courtroom theatrics, but the patient reconstruction of what the law actually requires, line by line, citation by citation. It is work that scales badly. The body of law keeps growing; the staff does not. Into that gap has stepped a category of technology that promises to read faster than any human and show its work while doing it.

The promise is real, and so is the peril. AI legal research and drafting tools can now compress days of statutory tracing into minutes and produce a first draft of an opinion or public notice in seconds. But the same systems have been caught inventing cases that never existed and misstating holdings with total confidence. The story of how government legal offices got here, and where they are going, is the story of a profession learning to trust a machine that is brilliant, fast, and occasionally, dangerously wrong.

9×: rise in federal generative-AI use cases, 2023 to 2024

1.1M: binding restrictions in the U.S. regulatory code

69 to 88%: hallucination rate of raw LLMs on specific legal queries

43%: public-sector staff using AI at least occasionally by late 2025

The Old Way: Drowning in Paper

To understand what is changing, start with the volume problem. The Code of Federal Regulations contained 22,877 pages in 1960. By the end of 2023 it had swelled to 190,260 pages across 245 volumes, a body that the federal government's own Economic Report of the President measures at more than 100 million words. The Report also counts binding obligations (the words "shall," "must," "may not," "required," and "prohibited"). By that measure, total restrictions climbed from roughly 400,000 in 1970 to about 1.1 million in 2024 (Economic Report of the President). By one frequently cited estimate from regulatory researchers, reading the entire code at a steady pace would take close to three years (Pacific Legal Foundation). Statutes compound the load: permanent federal law in the U.S. Code reached more than 24.4 million words in its 2025 edition, its largest since at least 1991 (Mantzaris & Fošner, arXiv).
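The three-year figure is easy to sanity-check. Here is a back-of-envelope version. It rests on illustrative assumptions, not on the cited estimate's stated method: 100 million words, a reading speed of 300 words per minute, and a 2,080-hour full-time work year.

$$
\frac{10^{8}\ \text{words}}{300\ \tfrac{\text{words}}{\text{min}} \times 60\ \tfrac{\text{min}}{\text{h}}} \approx 5{,}556\ \text{h},
\qquad
\frac{5{,}556\ \text{h}}{2{,}080\ \tfrac{\text{h}}{\text{yr}}} \approx 2.7\ \text{years}
$$

At a slower 250 words per minute, the same arithmetic gives roughly 3.2 years. "Close to three years" therefore holds across plausible reading speeds, and that is before any time spent understanding what was read.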

Against that tide stood a thin line of public-sector lawyers using tools that had barely changed in a generation: keyword search, printed reporters, and institutional memory. Research was the silent tax on every matter. Industry surveys have long pegged the share of an attorney's workday spent on legal research at roughly 17 percent. That is close to a full day each week lost to the hunt for authority before a single argument is written (Alta Pro Lawyers RPG). In government, where headcount is capped by appropriations rather than billings, that tax fell hardest. A legislative drafting office could not simply hire its way out of a backlog of bills; an agency counsel could not bill a client to fund overtime on a rulemaking record.

The early digital tools helped, but only at the margins. Keyword retrieval still demanded that a lawyer already know the right words to search; it surfaced documents, not answers. Studies of natural-language legal research found measurable but modest gains. One assessment reported users rating result relevance about 21 percent higher than with a traditional keyword tool, an improvement that left the synthesis work firmly on human shoulders (Blue Hill Research, via LawNext). The bottleneck was never finding the haystack. It was reading it.

The Shift: From Retrieval to Reasoning

The arrival of large language models changed the unit of work. Where keyword search returned documents, generative systems returned drafted analysis: a summarized rule, a comparison of provisions, a first-pass memo with the relevant authorities already woven in. For an institution measured by output rather than profit, that shift from retrieval to reasoning landed differently than it did in private practice. It promised not more revenue but more capacity: the same staff answering more questions, faster.

Government did not move first, but it moved fast once it moved. The U.S. Government Accountability Office, reviewing the AI inventories of eleven federal agencies, found total reported AI use cases nearly doubled from 571 in 2023 to 1,110 in 2024, and generative-AI use cases grew roughly ninefold, from 32 to 282 (Government Accountability Office, via FedScoop). The consolidated federal inventory kept by the Office of Management and Budget logged 2,133 AI use cases across 41 agency submissions, of which 351 were flagged as rights- or safety-impacting (OMB Federal AI Use Case Inventory).

Chart: Generative AI in federal agencies went vertical. Reported use cases across 11 agencies reviewed by the GAO, 2023 vs. 2024. Source: U.S. Government Accountability Office analysis of agency AI inventories, reported via FedScoop (2025).

Adoption by individual public servants outran the formal inventories. By the fourth quarter of 2025, Gallup found that 43 percent of public-sector employees reported using AI at least a few times a year, up from 17 percent in mid-2023, with 21 percent using it weekly or daily (Gallup). A pulse survey of government workers found roughly half using AI applications nearly every day, rising to 64 percent among federal employees, and 71 percent of federal respondents using the tools to draft documents (EY). Drafting (memos, letters, notices, summaries) turned out to be the front door through which generative AI walked into the legal office.

Chart: Everyday adoption climbed faster than formal programs. Share of public-sector employees using AI at least occasionally. Source: Gallup public-sector workforce surveys (Q2 2023, Q2 2024, Q4 2025).

Yet the gap between experimentation and institutionalized practice remained wide. A global survey of government organizations found that while 64 percent saw AI's potential for cost savings, only 26 percent had integrated AI organization-wide and just 12 percent had adopted generative AI in a governed way (EY). In the United Kingdom, the National Audit Office found that as of late 2023 just over a third of surveyed bodies had deployed AI, most running only one or two use cases, while more than two-thirds were piloting or planning (National Audit Office). The profession is sprinting in the individual lane while the institutional lane lags behind.

The Accuracy Problem Nobody Can Ignore

Here the story darkens, and government has every reason to take the warning seriously. Researchers at Stanford asked state-of-the-art language models specific legal questions without grounding. The models hallucinated between 69 and 88 percent of the time, and when asked about a court's core holding, they erred at least 75 percent of the time (Stanford HAI / Stanford Law). A peer-reviewed study in the Journal of Legal Analysis asked direct, verifiable questions about randomly selected federal cases. It put the rate of legal hallucination anywhere from 58 percent for the strongest model tested to 88 percent for the weakest (Journal of Legal Analysis, Oxford).

The crucial distinction, the one that should reshape how government deploys these tools, is between ungrounded generation and grounded, retrieval-based systems. When researchers examined purpose-built tools that anchor every answer to retrieved authority, error rates dropped sharply but did not vanish, with leading platforms still producing incorrect or misgrounded answers in a meaningful share of queries (Stanford RAG study, via LawSites). Synthesizing the literature, law librarians estimate that even the best legal AI tools still hallucinate somewhere between 15 and 25 percent of the time when both fabrications and mischaracterizations are counted (AI Law Librarians).

Chart: Grounding cuts error, but does not erase it. Approximate hallucination rates by system type and task, from published legal-AI studies. Sources: Stanford Law; Journal of Legal Analysis; Stanford RAG study via LawSites; AI Law Librarians. Figures are approximate ranges.

For a government lawyer, a fabricated citation is not merely an embarrassment; it is a liability. A misstated regulation in a public notice can mislead millions. An invented precedent in an agency's brief can draw sanctions against the office and erode legitimacy that public institutions cannot afford to lose. This is why architecture matters more than fluency. The systems gaining real traction retrieve before they reason. They pull the actual statute, case, and agency record, confine their answer to what those sources support, and attach a citation a human can click and verify.
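The retrieve-before-reason pattern can be sketched in a few lines. This is a minimal illustration that assumes a toy in-memory corpus and naive word-overlap ranking in place of a real legal search index. The `Source` and `Claim` types, the function names, and every citation in the example are hypothetical, not any product's API. The shape is the point:

- the answer is assembled only from retrieved text;
- each claim carries a citation;
- a mechanical check flags any claim whose citation is unknown or whose text does not appear in the cited source.

```python
from dataclasses import dataclass

# Hypothetical types for illustration only; not any vendor's API.
@dataclass(frozen=True)
class Source:
    cite: str  # citation a human reviewer can look up
    text: str  # the retrieved authoritative text

@dataclass(frozen=True)
class Claim:
    quote: str  # statement the answer relies on
    cite: str   # authority the answer says supports it

def _words(s: str) -> set[str]:
    """Lowercased words with simple punctuation stripped."""
    return {w.strip('.,;:()"?').lower() for w in s.split()} - {""}

def retrieve(query: str, corpus: list[Source], k: int = 2) -> list[Source]:
    """Rank sources by word overlap with the query (a stand-in for real search)."""
    q = _words(query)
    ranked = sorted(corpus, key=lambda s: len(q & _words(s.text)), reverse=True)
    return [s for s in ranked[:k] if q & _words(s.text)]

def grounded_answer(query: str, corpus: list[Source], k: int = 2) -> list[Claim]:
    """Build the answer only from retrieved text; return nothing rather than guess."""
    return [Claim(quote=s.text, cite=s.cite) for s in retrieve(query, corpus, k)]

def unsupported(claims: list[Claim], corpus: list[Source]) -> list[Claim]:
    """Flag claims whose citation is unknown or whose text is absent from that source."""
    by_cite = {s.cite: s.text for s in corpus}
    return [c for c in claims if c.quote not in by_cite.get(c.cite, "")]

# Usage with a made-up two-provision corpus.
corpus = [
    Source("Hypothetical Reg. § 1.10", "An applicant must file the notice within 30 days."),
    Source("Hypothetical Reg. § 1.20", "The agency may extend the deadline for good cause."),
]
answer = grounded_answer("When must the applicant file the notice?", corpus, k=1)
suspect = [
    Claim("Filing is optional.", "Smith v. Jones (invented case)"),  # fabricated authority
    Claim("An applicant may file at any time.", "Hypothetical Reg. § 1.10"),  # misgrounded
]
print(unsupported(answer + suspect, corpus))  # both suspect claims are flagged
```

A verbatim check like `unsupported` catches invented authorities and quotes that do not match their source. It cannot catch a faithful-sounding paraphrase that subtly misstates a holding; judging that remains the reviewing lawyer's job.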

Where AI lands on the reliability spectrum, by task
Task type	Grounding available	Relative risk	Verification burden
Summarizing an uploaded record or document	High (source in hand)	Lower	Spot-check key claims
Comparing provisions across known documents	High	Lower	Confirm citations
Retrieval-grounded statutory research	Moderate to high	Moderate	Verify each authority
Open-ended case-law questions	Low (model memory)	High	Independent re-research
Multi-jurisdiction or recent-doctrine queries	Low	Highest	Full human re-verification

What It Looks Like Now

Strip away the hype and the present-day government legal workflow is concrete and, increasingly, ordinary. A regulatory analyst facing a new statutory mandate uses a grounded research system to build a first-pass map of relevant authority (cross-references, prior amendments, related guidance documents). It takes about as long as it once took to pull the binders. The lawyer does not accept that map; they audit it. But they start from a draft rather than a blank page.

Drafting follows the same pattern. Government legal writing is unusually templated: public notices, FOIA responses, agency correspondence, compliance memos, and opinion letters follow recurring structures, which is precisely what makes them tractable for AI first drafts. Market analysis of administrative-law practice estimates that 35 to 45 percent of billable time in government and administrative-law workflows is automatable or AI-accelerable over the next five years, with the strongest exposure in research, first-draft memos and letters, agency-record review, and regulatory monitoring (law.co market research). Internal civil-servant assistants are already live: the OECD highlights France's "Albert" and the United Kingdom's "Caddy," tools that put cross-government information at public servants' fingertips to inform decisions and respond to inquiries (OECD).

Public defenders, perennially under-resourced, offer some of the clearest present-day evidence. One large county office that integrated AI across case management reported a 40 percent reduction in time spent on administrative case processing, freeing attorneys for client work; another office that piloted AI legal-research and drafting tools across roughly a hundred attorneys reported legal-research time cut by more than half (UC Berkeley Law Criminal Justice Center). Plain-language public communication is another quiet win: the same engines that summarize a rule for a lawyer can translate it for the public it governs, turning impenetrable regulatory text into notices citizens can actually understand.

Chart: Reported time savings in public-sector legal pilots. Selected outcomes from documented public defender deployments. Source: UC Berkeley Law, Criminal Law & Justice Center case studies of AI implementation (2025).

The barriers are not mainly technical. The GAO found that agency officials struggle most with three things: complying with existing federal policy, securing technical resources and budget, and keeping appropriate-use rules current (Government Accountability Office, via FedScoop). A Deloitte survey of government leaders found 78 percent reporting that their organizations were adopting generative AI somewhat or very fast. Yet an identical 78 percent said they struggled to measure its impact, a tension that leaves projects stalled in pilot purgatory (Deloitte).

Adoption signals across the public sector
Indicator	Figure	Source
Federal generative-AI use cases (11 agencies), 2023→2024	32 → 282	GAO
Public-sector staff using AI at least occasionally (Q4 2025)	43%	Gallup
Federal workers using AI to draft documents	71%	EY pulse survey
Government orgs that have adopted governed generative AI	12%	EY global survey
Government leaders adopting generative AI fast	78%	Deloitte
Public-sector orgs exploring or working on generative AI	64%	Capgemini

The Next Few Years

The trajectory points toward agentic systems, meaning tools that do not merely answer a question but carry out a multi-step legal task. Such a tool might monitor a regulatory docket, flag a change, draft the conforming amendment, and queue it for human sign-off. Public-sector appetite is already there. A survey of government organizations found 90 percent plan to explore, pilot, or implement agentic AI within two to three years, even though only a fraction have moved current generative tools into full deployment (Capgemini Research Institute). The near-term consensus among practitioners is that purpose-built, grounded legal tools will reach the reliability of a competent junior associate on well-defined tasks within a few years (AI Law Librarians).
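The loop described above (monitor, flag, draft, queue) can be sketched with the human sign-off enforced in code rather than by convention. This is a minimal illustration under stated assumptions:

- docket "snapshots" are plain dictionaries;
- the drafting step is a placeholder for whatever grounded model an office might use;
- `Draft`, `run_pipeline`, `approve`, and `release` are hypothetical names, not any agency's system.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Draft:
    docket_id: str
    change: str
    text: str
    approved_by: Optional[str] = None  # the named human who owns the output

def detect_changes(previous: dict[str, str], current: dict[str, str]) -> list[tuple[str, str]]:
    """Monitor and flag: return entries that are new or amended since the last snapshot.
    (Removed entries are ignored in this sketch.)"""
    return [(key, text) for key, text in current.items() if previous.get(key) != text]

def draft_amendment(docket_id: str, change: str) -> Draft:
    """Draft: placeholder for a grounded drafting model (hypothetical)."""
    return Draft(docket_id, change, f"Conforming amendment for {docket_id}: reflect '{change}'.")

def run_pipeline(previous: dict[str, str], current: dict[str, str]) -> list[Draft]:
    """Queue: every draft lands in a review queue; nothing is released here."""
    return [draft_amendment(key, text) for key, text in detect_changes(previous, current)]

def approve(draft: Draft, reviewer: str) -> Draft:
    """Record sign-off by a named reviewer; a blank name is rejected."""
    if not reviewer.strip():
        raise ValueError("sign-off requires a named reviewer")
    draft.approved_by = reviewer
    return draft

def release(draft: Draft) -> str:
    """Refuse to publish anything a human has not signed."""
    if draft.approved_by is None:
        raise PermissionError(f"{draft.docket_id}: no human sign-off")
    return draft.text

# Usage with made-up docket snapshots.
before = {"DOC-1": "Rule text v1", "DOC-2": "Rule text v1"}
after = {"DOC-1": "Rule text v1", "DOC-2": "Rule text v2", "DOC-3": "New proposed rule"}
queue = run_pipeline(before, after)  # drafts for DOC-2 and DOC-3 only
print(release(approve(queue[0], "A. Reviewer")))
```

The design choice worth noting is that `release` refuses unsigned drafts outright. An agent that runs faster than its reviewers therefore cannot skip the accountability step.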

Three forces will shape whether that future is healthy. The first is grounding as a non-negotiable standard: retrieval-anchored answers with clickable citations, not free-floating generation, as the baseline for any system touching legal text. The second is accountability architecture. As the OMB inventory's 351 rights- or safety-impacting use cases make plain, government AI is not a back-office convenience; it can affect benefits, enforcement, and liberty (OMB Federal AI Use Case Inventory). Someone, a named human, must own each output. The doctrine of reasonable diligence is tightening around AI-assisted work, which means the verification step is becoming a legal obligation, not a courtesy.

The third force is institutional capacity. Government consistently reports the least AI expertise and talent-readiness of any sector. That gap helps explain why "choosing the right technology" is its single biggest reported barrier (Deloitte). The offices that pull ahead will train lawyers to interrogate AI output rather than treat the software as an oracle. Those lawyers will know which tasks their tools handle reliably and which demand full re-verification.

Conclusion

The arc from binders to grounded AI is not about machines replacing lawyers. It is about a fixed supply of legal labor finally getting leverage against an exponential supply of legal text. The government legal office of the late 2020s will still turn on human judgment, on the lawyer who reads the four-hundredth page and decides what it means. What changes is that the reading, the first draft, and the citation map can now happen in minutes, freeing the lawyer's scarce attention for where it belongs: verification, accountability, and the questions no system can answer alone. The brief that reads itself is here. The brief that answers for itself is not, and in a government of laws, it never should be.