Decision&Law · AI Legal Intelligence
regulatory-analysis · judicial-interpretation

Artificial Intelligence and the Future of Human Legal Reasoning: An Empirical Analysis

Sofia Chen
April 10, 2026
18 min read
3,847 words
artificial intelligence · legal reasoning · empirical research · legal education · professional ethics


Disclaimer

This analysis is for educational purposes only and does not constitute legal advice. The information provided is general in nature and may not apply to your specific situation. Laws and regulations change frequently; verify current requirements with qualified legal counsel in your jurisdiction.

Last Updated: April 10, 2026

The Paradox of Synthesis: How AI Empowers—and Threatens—Legal Thought

Generative artificial intelligence can dramatically improve the speed and quality of legal work. Yet institutional reluctance persists, grounded in fear that the technology will erode independent human reasoning and professional judgment.¹ This analysis examines the first empirical study designed to test whether using AI at early stages of a legal project impairs comprehension and independent reasoning in later stages, when the tool is no longer available.

Through a randomized controlled trial involving 91 upper-level law students at the University of Minnesota (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6525800), researchers analyzed four sequential tasks: synthesis, comprehension, application, and revision. The findings reveal a complex reality: AI increased productivity on the synthesis task by 50% to 70%, yet no degradation in subsequent comprehension was observed. Unexpectedly, participants exposed to AI outperformed the control group on the independent application task.

However, the revision phase revealed a "leveling effect": AI improved weak drafts but caused regression among high-quality initial work. The conclusion is stark: AI's impact on legal reasoning is not inevitably erosive but contingent on the timing and manner of its use.


1. The Technological Transformation of Legal Practice: From Skepticism to Incremental Adoption

Since the public release of ChatGPT in 2022, generative AI has begun to transform the practice of law, particularly in tasks traditionally delegated to junior lawyers. Recent studies demonstrate that AI systems can earn top marks on law exams and that their use enables higher-quality work in less time.² Despite this mounting evidence, much of the legal ecosystem remains cautious, integrating AI as a limited tool rather than as a transformative technology.³


2. The Research Question: Does AI Erode Independent Reasoning Ability?

Resistance to full AI adoption rests primarily on three perceived risks: hallucination (fabrication of sources), breach of confidentiality, and, most critically, the erosion of human legal reasoning and professional judgment.⁴ Scholars and ethics committees have expressed genuine concern that delegating cognitive tasks to AI may prevent lawyers from internalizing doctrine, developing strategic judgment, and cultivating the ability to respond to complex questions from judges or clients.⁵ Literature from other fields (medicine, software development, analytical writing) already suggests that over-reliance on AI can weaken deep understanding of underlying material.⁶


3. Empirical Framework: Experimental Design and Methodology

A. Study Objectives and Preregistered Hypotheses

The primary goal was to assess the causal impact of generative AI use on human legal reasoning, isolating the effect of early-stage technological assistance on later independent performance. The authors employed a randomized controlled trial (RCT), the "gold standard" for empirical research because it neutralizes selection bias and balances observable and unobservable variables across groups.⁷

The research was grounded in a theory of "cognitive erosion": the hypothesis that delegating synthesis of legal sources to AI reduces mental engagement with the material, resulting in superficial understanding and diminished ability to apply doctrine autonomously. These hypotheses were preregistered with the Center for Open Science before data analysis.⁸

B. Sample Selection and Randomization Procedure

The sample consisted of 91 second- and third-year law students (2Ls and 3Ls) at the University of Minnesota Law School. First-year students were deliberately excluded to ensure a baseline uniformity in analytical skills. Participants were randomly assigned using an R script to two conditions: a control group (N=41), which had no access to AI until the final phase, and an AI-exposed group (N=50), which used Gemini 2.5 Pro from the initial task. To incentivize maximum effort, a payment system was implemented with a fixed fee of $75 and substantial bonuses of up to an additional $100 for top performers.⁹
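The assignment step can be sketched in a few lines. The study used an R script; this Python version is an illustrative stand-in, reproducing only the reported group sizes (41 control, 50 AI-exposed), with a fixed seed so the split is reproducible:

```python
import random

def assign_conditions(participant_ids, n_control, seed=2026):
    """Randomly split participants into a control group and a treatment group."""
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible
    shuffled = participant_ids[:]
    rng.shuffle(shuffled)
    control = sorted(shuffled[:n_control])
    treatment = sorted(shuffled[n_control:])
    return control, treatment

# 91 participants: 41 control, 50 AI-exposed, matching the reported Ns
ids = list(range(1, 92))
control, treatment = assign_conditions(ids, n_control=41)
print(len(control), len(treatment))  # 41 50
```

Because the seed, not the researcher, determines group membership, observable and unobservable characteristics are balanced in expectation across the two arms.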

C. Architecture of the Experiment: Four Sequential Tasks

The experimental design fragmented a typical legal assignment into four interconnected tasks, set in the hypothetical jurisdiction of "Gopher" to prevent participants or AI from using prior knowledge outside the closed universe of the experiment:¹⁰

1. Synthesis Task: Participants were required to draft a roughly 750-word memo synthesizing the doctrine of servitudes on chattels based on a 12-page packet of sources, including the Restatement (Third) of Property and hypothetical local case law. The treatment group used Gemini 2.5 Pro following specific prompting instructions.¹¹

2. Comprehension Task: Immediately afterward, without access to the sources or AI, both groups answered a six-question, multiple-choice quiz of high technical difficulty to measure retention and understanding of the legal principles.¹²

3. Application Task: Participants received a new factual scenario (the case of a historically valuable antique car) and were required to draft a memo applying the previously synthesized doctrine. Neither group had access to AI at this stage, allowing measurement of independent reasoning.¹³

4. Revision Task: Finally, all participants used AI for 20 minutes to improve the clarity and writing of their application memo, under instructions not to alter the substance of the analysis.¹⁴

D. Measurement Instruments: Standardized Rubrics and Control Variables

To minimize subjectivity, all 273 memos were anonymized and blindly graded by a single researcher with more than twenty years of experience in legal writing. Preregistered rubrics evaluated three main dimensions: substantive quality, organization, and polish. In addition, data were collected on grade-point average (GPA), year in law school, and prior AI experience to perform multivariate regression analyses.¹⁵
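The multivariate regression analysis mentioned above can be illustrated with a minimal sketch. The data below are synthetic and the variable names are hypothetical, not the study's; the point is only how a treatment coefficient is estimated while controlling for a covariate such as GPA:

```python
import numpy as np

def treatment_effect(scores, treated, gpa):
    """OLS estimate of the treatment coefficient, controlling for GPA.

    Design matrix columns: intercept, treatment indicator, GPA covariate.
    """
    X = np.column_stack([np.ones(len(scores)), treated, gpa])
    beta, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return beta[1]  # coefficient on the treatment indicator

# Synthetic illustration: a true +5-point treatment effect plus a GPA gradient
rng = np.random.default_rng(0)
n = 91
treated = (np.arange(n) < 50).astype(float)
gpa = rng.uniform(2.5, 4.0, n)
scores = 60 + 5 * treated + 8 * gpa + rng.normal(0, 3, n)
print(round(treatment_effect(scores, treated, gpa), 1))  # close to 5
```

Adding the covariate absorbs score variation driven by prior ability, tightening the estimate of the AI effect itself.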


4. Methodological Analysis: Scope and Limitations

A. Strengths of Internal Validity Through Randomized Controlled Trial

The experimental design is notable for its robust internal validity, achieved through an RCT that isolates the causal effect of AI access on human performance.¹⁶ Random assignment neutralizes selection bias and balances both observable (GPA, prior experience) and unobservable (intrinsic motivation) characteristics.¹⁷

B. Sample Limitations: From Law Students to Professional Practice

A critical limitation is that the sample consisted exclusively of second- and third-year law students at a single top-tier institution. The findings may not generalize to practicing lawyers with substantial practical experience, or to jurisdictions with different curricular structures. The impact of AI on the reasoning of a lawyer with fifteen years of specialized practice likely differs significantly from that of a student just developing foundational doctrinal competencies. Future studies must replicate this design with more diverse and representative samples of the profession.

C. Control of Bias and Construct Validity

Rigorous bias controls were implemented through memo anonymization and blind grading, which increases confidence in impartial quality measurement. However, grading by a single evaluator, though experienced, introduces the possibility of personal idiosyncrasies regarding what constitutes "good legal writing." An ideal approach would have employed multiple independent evaluators with inter-rater reliability analysis to verify that observed differences do not reflect evaluator bias.
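The inter-rater reliability check proposed here is typically run with a chance-corrected agreement statistic such as Cohen's kappa. A minimal sketch, using hypothetical grades from two imagined raters:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # chance agreement from each rater's marginal label frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[l] * freq_b[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical memo grades (1 = low, 2 = mid, 3 = high) from two raters
a = [3, 2, 3, 1, 2, 3, 2, 1, 3, 2]
b = [3, 2, 3, 1, 2, 2, 2, 1, 3, 3]
print(round(cohens_kappa(a, b), 2))  # 0.69
```

A kappa well below 1.0 would signal that some of the measured quality differences reflect the evaluator rather than the memos, which is exactly the risk a single-grader design cannot rule out.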


5. Experimental Results: A Complex Picture

A. Synthesis Productivity: The Acceleration Effect

The most striking result was that AI increased productivity on the synthesis task by 50% to 70%, without accompanying degradation in substantive analytical quality.¹⁸ Participants who used Gemini 2.5 Pro produced approximately 60% longer memos in the same time period, and evaluators detected no statistically significant differences in analytical coherence. This acceleration was especially pronounced for participants with less prior AI experience, suggesting that any initial learning curve for the tool was quickly overcome.

B. Subsequent Comprehension: Absence of Cognitive Degradation

Contrary to the "cognitive erosion" hypothesis, results showed that exposure to AI in the synthesis phase did not negatively affect subsequent comprehension as measured by the multiple-choice quiz.¹⁹ Indeed, the AI-exposed group achieved an average score comparable to, if not slightly higher than, that of the control group. This finding challenges the assumption that delegating synthesis to AI reduces individual mental engagement with the material.

A plausible explanation lies in what cognitive psychologists call the "cognitive feedback effect": by generating faster and more complete syntheses, AI allowed participants to review and refine multiple versions of the material, effectively increasing their exposure to legal principles, even though composition had been assisted. In other words, AI did not replace participant thinking but rather accelerated the cycle of review and refinement.

C. Independent Application: The Positive Scaffolding Effect

Perhaps the most surprising result was that participants exposed to AI in the synthesis phase outperformed the control group on the independent application task, executed without technological assistance.²⁰ This suggests that AI-assisted synthesis functioned as a cognitive "scaffold": by providing a sturdier, better-structured, and more complete doctrinal foundation, it enabled participants subsequently to apply doctrine with greater precision and analytical sophistication.

The proposed causal interpretation is that AI liberated cognitive load dedicated to source gathering and organization, allowing participants to allocate mental resources to higher-order tasks: comparative analysis, nuanced application to novel cases, anticipation of counterarguments. The scholar who has "seen" a complete synthesis of doctrine is better positioned to manipulate those concepts than one who must discover them independently under time pressure.

D. Revision: The Leveling Effect and Expert Degradation

The fourth phase revealed the hidden face of AI: use of revision-assistance tools tended to improve weak drafts but to degrade work of higher initial quality.²¹ This "leveling effect" manifests as a compression of the quality distribution, reducing both the low and high extremes.

More specifically, regression analysis showed that for participants whose application memo was rated in the lower third (low initial quality), AI-assisted revision significantly improved clarity, structure, and polish, raising the group average approximately 15 percentage points. Conversely, for participants whose initial work was classified in the upper third (substantially higher quality), AI-assisted revision tended to homogenize language, standardize argumentative structure, and in some cases dilute carefully constructed doctrinal nuances.²²
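The leveling effect described above can be illustrated with a toy model in which AI revision pulls every draft toward a common anchor. The `pull` and `anchor` parameters are illustrative assumptions, not estimates from the study:

```python
def apply_leveling(score, pull=0.4, anchor=70.0):
    """Toy model of AI revision: each draft moves a fraction `pull` of the
    way toward a common `anchor`. Weak drafts improve; strong drafts regress.
    Parameters are illustrative, not estimated from the study."""
    return score + pull * (anchor - score)

drafts = [45, 55, 65, 75, 85, 95]  # hypothetical pre-revision scores
revised = [round(apply_leveling(s), 1) for s in drafts]
spread_before = max(drafts) - min(drafts)
spread_after = max(revised) - min(revised)
print(revised)                      # [55.0, 61.0, 67.0, 73.0, 79.0, 85.0]
print(spread_before, spread_after)  # 50 30.0
```

The quality distribution compresses from both ends: the weakest draft gains ten points while the strongest loses ten, mirroring the reported pattern of improvement in the lower third and regression in the upper third.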

Two underlying mechanisms probably explain this effect:

First, language standardization: Generative language models train on massive corpora reflecting "typical" or "standard" forms of legal expression. When AI revises text, it tends to optimize toward that statistical normalcy, eliminating particular turns of phrase or syntactic emphases that an expert had deliberately selected to suggest specific doctrinal nuances.

Second, automation bias: Robust literature shows that humans tend to over-trust automated systems, especially when they come from sources perceived as "expert" (like cutting-edge AI models). When AI suggests that phrasing is "clearer" or "more polished," participants tend to accept those suggestions uncritically, even if doing so sacrifices technical precision for accessibility.


6. Discussion: Mechanisms of Assisted Legal Reasoning

A. The Scaffolding Versus Substitution Model

The findings suggest it is useful to conceive AI's role in two distinct scenarios depending on timing of intervention:

In the early synthesis phase, AI operates as a "scaffold": it facilitates organization and structuring of the legal problem, freeing cognitive resources for later application. This occurs because the participant must still comprehend, evaluate, and refine the AI product, which mandates mental engagement with the material.

In the late revision phase, AI tends to operate as "partial substitution": it replaces human judgment about what constitutes stylistic improvement, frequently subordinating technical precision to accessible clarity. The risk is amplified when the user is expert, because they mistakenly assume the AI "understands" the nuances they had constructed.

B. Implications for Legal Education

The study suggests that law schools should not resist AI integration but rather structure it strategically. The findings support a pedagogical model where AI is introduced in initial synthesis tasks (under supervision), with the explicit objective of accelerating consolidation of doctrinal knowledge. However, this must be accompanied by clear prohibitions on AI use in application and revision tasks during certain educational phases.

A reformed curriculum might be structured as follows:

  • Year 1: Prohibition on AI access. Students develop doctrinal understanding without assistance.
  • Year 2: Introduction of AI in source synthesis under supervision. Students use tools to organize material but must independently validate and apply.
  • Year 3: AI permitted in revision but under explicit instructions to preserve substantive nuances. Students are taught to critically evaluate AI suggestions.

C. The Legal Profession and the Homogenization Risk

For practicing lawyers, the study warns against what might be called "cascading uncritical reliance": the belief that because AI accelerates synthesis, it can also improve revision. The findings suggest the opposite: the greater the lawyer's expertise, the lower should be AI's intervention in late-stage refinement.


7. Professional Ethics and the Duty of Competence in the Era of Large Language Models

Ethics opinions issued by bodies such as the ABA and state bar associations underscore that uncritical reliance on AI is inconsistent with the diligent practice of law.²³ The "duty of competence" requires that the lawyer maintain independent, personal judgment over every function delegated to technology. As the experiment demonstrates in its revision phase, excessive deference to AI can even degrade the quality of expert work. Consequently, professional ethics in the twenty-first century not only prohibit technological "hallucinations" but also impose an affirmative obligation not to allow AI to displace human reasoning in contexts where doctrinal subtlety is indispensable.


8. Recommendations for Best Practices and Future Research

A. Guidelines for Responsible Integration: The "Human-in-the-Loop" Model

The evidence from this study suggests that AI should not be viewed as a substitute for reasoning but as a scaffold that requires constant, expert human supervision. The fundamental principle is to limit AI use to those domains where the lawyer possesses the necessary expertise to independently evaluate, adapt, and defend the generated product.²⁴ A lawyer who uses AI to construct arguments in an area he or she does not understand risks producing work that appears professionally adequate but cannot be sustained in oral argument or under rigorous judicial scrutiny. Law firms should adopt an approach similar to supervision of junior associates, where the senior professional exercises judgment to decide which technological suggestions to accept and which to reject.

B. Task Segmentation Strategies and Prevention of Fatigue

To minimize the risk of displacing independent reasoning, AI should be employed on narrow, well-defined tasks rather than delegating entire complex projects.²⁵ For example, it is preferable to use AI to refine individual paragraphs or review specific contract clauses once the legal theory and argument structure have already been established by the human. Task segmentation forces the lawyer to perform the initial cognitive effort of mapping the legal problem, identifying nuances, and structuring the analysis, allowing AI to assist in technical execution without usurping the critical thinking process. Likewise, it is imperative to avoid using AI under conditions of cognitive fatigue or artificially tight time constraints, because these circumstances increase the human propensity to defer uncritically to system suggestions.²⁶

C. Future Research Agenda: Longitudinal Effects and Doctrinal Diversity

Although this controlled trial provides a solid causal foundation, the field requires additional research to address the identified limitations: (1) long-term effects – study whether prolonged AI dependence cumulatively erodes cognitive skills over months or years;²⁷ (2) sample and domain diversity – replicate the design with practicing lawyers and across various doctrinal areas to verify the robustness of the scaffolding effect and the revision-phase leveling risk;²⁸ (3) mechanisms of expert degradation – explore whether the regression of high-performing individuals is due to language standardization, automation bias, or loss of doctrinal nuance;²⁹ (4) pro se litigation – investigate how non-lawyer litigants use AI to navigate the judicial system, a priority for access to justice.³⁰


9. Conclusion

This analysis invites the legal community to avoid both complacency toward automation and panic over cognitive obsolescence. The findings from this controlled trial provide causal evidence that AI use in legal work does not inevitably erode independent reasoning; to the contrary, when employed to assist in initial doctrinal synthesis tasks, AI can improve the quality of intermediate products and, consequently, enhance subsequent human reasoning performance, even after the tool is withdrawn.

Yet the results also demonstrate that AI is neither an absolute good nor risk-free. Its introduction at the revision stage revealed a critical duality: while it can elevate weak drafts, it can displace and degrade the judgment of the most expert professionals, suggesting that technology can supplant human acuity if not managed rigorously.

The lesson for contemporary law practice lies not in categorical acceptance or rejection of generative AI, but in the ability of lawyers, judges, and academics to distinguish between uses that serve as a "scaffold" for the human mind and those that uncritically substitute for it. The future of legal practice will not depend simply on whether lawyers use AI, but on whether they can structure its use in ways that preserve and strengthen the human capacities upon which sound and just legal judgment ultimately depends.


Footnotes

  1. See original study, pp. 2–3 (citing results on law exams and student tasks).
  2. Id. at 3 ("practical impact muted").
  3. Id. at 4–5 (three risk categories).
  4. Id. at 5 (citing ethics opinions from ABA, Florida, New York, etc.).
  5. Id. at 15–16 (studies on SAT essays, medical diagnosis, recruiting).
  6. Id. at 19 ("gold standard").
  7. Id. at 19 (preregistration noted at n.54 and accompanying text).
  8. Id. at 27–28 (recruitment and compensation).
  9. Id. at 26 (hypothetical "Gopher" jurisdiction).
  10. Id. at 22–23 and Appendix instructions, pp. 68–69.
  11. Id. at 23 (six questions, 10 minutes).
  12. Id. at 24 (application without AI).
  13. Id. at 24–25 (revision with AI, 20 minutes).
  14. Id. at 29 (blind grading and rubrics).
  15. Id. at 19 (RCT as standard).
  16. Id. at 60 (balance tests, Table A1).
  17. Id. at 50.
  18. Id. at 51–52.
  19. Id. at 50.
  20. Id. at 51.
  21. Id. at 32, Table 2.
  22. Id. at 36, Tables 3 & 4.
  23. Id. at 5, nn.11–13.
  24. Id. at 54–55.
  25. Id. at 56–57.
  26. Id. at 58.
  27. Id. at 51.
  28. Id. at 53.
  29. Id. at 49.
  30. Id. at 53.