Study: GPT Boosted Lawyer Productivity
A new study found that ChatGPT boosted the quality of output in a law school class, with some wrinkles and interesting implications for practitioners
Can large language models (“LLMs”), the newest evolution of artificial intelligence and machine learning, actually benefit attorneys? This is the question on everyone’s mind. Law firms are adopting AI at a rapid pace—a LexisNexis study found that a whopping 92% of lawyers are looking to expand their use of the technology. The obvious question, then, is whether it actually improves the productivity of lawyers. Some technology shifts, like word processing, cloud search tools, and docketing software, have clearly transformed the profession, while other tools, like fintech, have seen relatively little adoption.
This week, a new study was published by three Minnesota Law School researchers attempting to address that very question. It is the clearest evidence so far of how these tools may affect legal practitioners. The study took 60 law students and asked them to complete four tasks: drafting a complaint, drafting a contract, writing a section of an employee handbook, and writing a client memo. Students were randomly assigned to complete each task with or without GPT-4, split into two groups: one group used ChatGPT to assist with the first two tasks and did the second two by hand, and the other did the reverse. The results are fascinating and open a window into how practitioners can benefit from these tools, while also offering some best practices and areas for follow-up. In this week’s Nonobvious, we’re going to dive into this paper.
Helpful, If Done Right
The core result of the paper is that GPT-4 improves performance, but in a particular way. Typically, performance is measured as quality of output per hour, which implies there are two ways to boost it: increase quality or decrease time spent. One of the great strengths of this paper is that it attempts to measure both. And because it analyzed the use of LLMs as a tool, as opposed to an attempted replacement, it is a more accurate representation of how practitioners use LLMs today, where they must edit the work their software outputs. The authors found that quality (as measured by the GPA of the work) saw small gains across most tasks, but with an interesting pattern. Speed, meanwhile, improved substantially, with tasks completed over 20% faster on average.
The authors found a very interesting pattern. Although the average GPA improvement was low, the improvement was actually high for a specific group: the low-to-mid performers. The students who were already doing well benefited the least in terms of quality. As a result, the authors call the quality improvements “inconsistent,” but what they mean is that they benefit some groups more than others. Below is a probability density plot. The effect is quite clear: the number of top performers does not change much, but the low end of the curve effectively disappears and the distribution becomes much more concentrated around its peak, meaning the variance in quality is lower.
The speed improvements, in contrast, were consistent regardless of skill level. In other words, the average student improved in quality and worked significantly faster, while the excellent student did only slightly better in quality but sped up just as much. Complaint drafting, for example, saw a roughly 24% decrease in drafting time across the board, whether the student had a 2.0 GPA or a 3.9 GPA.
These two facts are likely related. One study from New Zealand conducted with senior attorneys found that LLMs performed certain tasks at senior-quality levels with a 99.97% reduction in task completion time. Therein lies the performance gain: the AI handles the tasks it can do competently, freeing up more time for the practitioner to correct issues and do extra, high-value work. While the authors depict the speed increase and quality increase as separate effects, they likely reinforce each other.
The methodology of the paper suggests several subtleties that may be relevant to practitioners deploying these tools. First, the students did not go into ChatGPT blindly. The authors are experts in prompt engineering to reduce hallucinations and have even produced a guide on prompt engineering for lawyers. The students completed several training modules, including two hours of lecture videos and exercises. The authors also conducted surveys afterward and found that student satisfaction increased over time as they became more comfortable with the tools; however, students in the first group reported higher satisfaction, primarily because they were assigned the tasks where ChatGPT provided more effective assistance. This general approach has been shown to work in other fields, like management consulting.1
There are two core lessons for firms. The first is the importance of training to maximize the value of the tools and minimize errors. There is a hidden corollary: tools that are easier to use and require less prompt engineering need less training, introduce less opportunity for user error, and therefore improve productivity more evenly. The second is to be selective about where the tools are used. They are better at some tasks than others, and the field is rapidly improving. Evaluating the tools carefully, rather than deploying them everywhere all at once, is more likely to lead to user satisfaction and successful productivity gains.
Economics writer Noah Smith has observed that the rise of LLMs may signal the end of “Average Is Over,” an idea and book by George Mason University’s Tyler Cowen that predicted the “superstar effect,” combined with modern technologies, would hollow out the middle. Indeed, the reversal of that prediction is what current studies have found: the primary effect of LLMs is to boost the productivity of low-to-middling performers and junior employees. This effect has been found, so far, in jobs as diverse as customer support agents, marketing copywriters, software developers, and novelists.2 It mirrors the results in this paper, where the best students saw the smallest performance boosts. In some fields, the top performers see almost no boost at all; law appears to differ, as this paper showed that top performers did sometimes see a modest improvement in output. The paper also suggests that AI can be used as a training and productivity tool, especially for junior associates. Bankers are already using AI to accelerate the learning of associates and allow, say, a first-year associate to do third-year associate tasks. The emerging research suggests the same can be done here.
Weekly Novelties
USPTO came out with a guidance memo on the use of AI. In short: while practitioners are responsible for the work, there are no restrictions. (USPTO)
Nokia signed a major licensing deal for its 5G patents, which it claims is its last (Reuters)
The EU Council passed rules that will allow for the patenting of CRISPR-edited “supercrops” (Politico)
Although IBM is no longer the #1 company for filing patents, an analysis showed it was the #1 patent filer for AI applications (Axios)
A new analysis alleges that the manufacturers of several GLP-1 agonists improperly listed patents to prevent competition (Stat News)
The first NPE suit was filed just 7 months in. Is this the new frontier for licensing campaigns? (Juve)
In fact, the authors of this study drew their methodology primarily from a Harvard Business School study on best practices for implementing GPT-powered workflows with professionals. The results mirrored what we saw here: training is required for effective use, and it is important to apply the tools to the tasks for which they are best suited.