How can artificial intelligence help us find new case law?
As more and more case law is published and available online, the key question for users is how to find the cases they need. How do they know they haven’t missed something? We look at how AI could help them find even the “unknown unknowns” that might be out there.
What are we looking for when we look for case law?
We know why we report cases: because in a common law system they constitute a primary source of the law. But why do people look for cases? Why do they cite them? The answer isn’t quite so singular.
For a student, learning about a new topic of law, the cases they are asked to read will be the ones that contain foundational principles. We can all think of our favourite examples. Donoghue v Stevenson [1932] AC 562 – the one about the snail in the ginger beer bottle. R v Dudley and Stephens (1884) 14 QBD 273 – the one where the shipwrecked sailors ate the cabin boy. Carlill v Carbolic Smoke Ball Co [1893] 1 QB 256 – the one about offer and acceptance in the formation of a contract. These are what we might call cornerstone cases, because the whole edifice of a particular topic of the common law depends on them.
That’s not to say the law hasn’t changed since these old cases. Of course it has. And for both students and practitioners, there will be a chain of subsequent cases – what one might call milestone cases – marking the path of the law to where it stands today. Not every intervening case will be of interest. But there will be a significant few, whose contributions to the development of the law will continue to justify citation.
If you look at a case like In re A (Children) (Conjoined Twins: Surgical Separation) [2001] Fam 147, you will see that in making an agonising decision whether to authorise surgery that would have the effect of saving one twin’s life at the cost of the other’s, the court considered the defence of “necessity” which had been rejected in R v Dudley and Stephens. But it also considered the case of Airedale NHS Trust v Bland [1993] AC 789, a cornerstone case on the authorisation of life-ending medical decisions in the best interests of the patient.
While it’s true that Bland referred back to earlier cases, the same is true of Donoghue v Stevenson, Dudley and Stephens and the Carbolic Smoke Ball case. It’s rare for the common law to spring from nothing. And Bland continues to be cited in later cases involving the ending of life-sustaining treatment, such as the recent cases involving the terminally ill infants, Alfie Evans and Charlie Gard. Some of these will be milestones too.
While students need to know where the law came from and to understand how it has reached its current state of development, for practitioners the focus will be on where the law is now and, perhaps just as importantly, where it might be tomorrow. In terms of the law of contract, the most important cases will not be about whether or when the contract was formed, but what a particular term in it means. The law on the formation of contracts has not changed much in the last hundred years. The law on how you interpret the wording is, apparently, subject to constant change. You might think the milestone case of Chartbrook Ltd v Persimmon Homes Ltd [2009] UKHL 38; [2009] AC 1101 had the last word, with the House of Lords citing a chain of earlier authorities, but while that decision itself has been applied in many subsequent cases, it has also been questioned and distinguished in some. So Chartbrook is a milestone, and there may be more to come.
For the practitioner, while the milestone cases are relevant, what also needs to be found are the cutting-edge cases – the stones newly mined from the quarry, if one wants to retain the stony analogy. Those cases might not even be reported, or might only have been noted up in a summary, pending more detailed treatment. But those cutting-edge cases are the ones that might make the difference between persuading the court of your client’s case, or not. What if your opponent has found such a case, and you have not?
To recap, case law on this analysis falls into three categories: cornerstone cases, which found a chain of authority on a topic; milestone cases, which show the major steps in its development; and cutting-edge cases, which are where the law may be about to develop still further. A successful legal information platform must, we know, cater for all these needs.
That is why ICLR is in a constant state of development. We are currently involved in a major upgrade project, the fruits of which will be launched later this year.
Known or unknown unknowns
In looking at how users conduct their case law research, we have adopted the three-part philosophy of knowledge popularised by that somewhat dubious military strategist, Donald Rumsfeld: there are “known knowns”; there are “known unknowns”; and there are “unknown unknowns”.
Applying that approach, we would categorise many cornerstone cases as known knowns. We know about them, and we know where to find them. We can use a citation to fetch the book off the shelf and find the page, or to find and fetch the case report from an online platform.
Many milestone cases are also known knowns. Others, perhaps, are known unknowns. We know they exist. We may not know their names or citations. But we can find them. They will be linked via the citator on ICLR online, or we can look them up by name, subject matter etc.
But what don’t we know? We don’t know that there was a case last week in the Court of Appeal that overturned our cherished milestone case. That’s an unknown unknown. How do we deal with that? Partly by current awareness, partly by checking new cases as they appear, and keeping up to date with all the sources we can rely on. No system is foolproof. But what if we could detect new cases that were similar or related to what you already had, and flag them up for your attention?
Finding similar or related cases can be done the old way, using a look-up index or searching by legal subject matter. But this depends on cases being reported or indexed by a human being. Your typical database will allow you to search by entering words and finding words that match them. What it can’t do is find cases according to the concepts and ideas in them.
AI and NLP
There’s been a lot of hype about how artificial intelligence (AI) might affect developments in legal technology, with doomsday predictions about robot judges clutching digital gavels and robot lawyers putting all the real ones out of business. My feeling is that we can all rest easy on this score. Human experience and intuition are not that easy to replace.
But there are aspects of AI that can be put to good use in legal research – not as a replacement for human analysis, but as a tool to support it. One of these is what’s called Natural Language Processing (NLP), which involves a computer analysing text documents, breaking the content into its constituent elements and extracting their meaning for the purposes of detecting patterns or connections with other material.
Some routine legal tasks, such as contract drafting and e-discovery, already make widespread use of NLP. But what about case law research? This is an area where the use of NLP is perhaps less well known. There has been some publicity about the use of big data techniques, combined with NLP, to analyse past judgments with a view to predicting how courts will decide particular cases, based on the similarity of their facts, or even how individual judges will rule, based on a detection of trends or biases in their earlier decisions.
What about trial lawyers using NLP to find those authorities that they might not otherwise have found? Those recent cases not yet reported or cited elsewhere, for example, or that “sleeper” case that went unnoticed at the time, or was flagged up in a completely different context, that happens to contain the judicial ruling or dicta that might just swing your case? What if there were a case out there that overruled or flatly contradicted your main authority?
You may suspect such potentially critical cases exist – known unknowns. But conventional online search methods depend on accurate classification of the content or the matching of identical wording to identical ideas. What NLP can bring to the process is that it looks behind the labels, behind the words themselves, at the underlying concepts and ideas. It extracts the DNA of the content, and then compares it with the DNA of other content. In this sense, it can help find not only known unknowns but perhaps also unknown unknowns.
ICLR.4
So how do we make this happen? In the case of ICLR, we began by setting up a research and development lab, in August 2019, which we called ICLR&D. This was the brainchild of our then Head of Research, Daniel Hoadley. One of the first things he did was to set up an NLP project called Blackstone. That project’s deliverable was an open-source piece of software, the Blackstone library, that would allow researchers and engineers to automatically extract information from long, unstructured legal texts (such as judgments, skeleton arguments, scholarly articles, pleadings etc). The model was trained on sample documents drawn from ICLR’s existing dataset, a vast hoard of leading cases dating back to our foundation in 1865. The fact that these cases had been digitised as highly structured content made it much easier to process the data.
Daniel wrote about the project on the ICLR&D blog and on GitHub, where he published the open source software as he worked and improved the model. The idea was to develop a process whereby the user could input or upload a text document, such as a skeleton argument, legal problem or journal article, and Blackstone would analyse it, draw out the concepts and ideas (and existing cases) found in the document, and then find similar or related cases from the vast hoard of ICLR published judgments.
Having proved that such a process could work, on a selected subset of the data, and how it might be achieved at scale, we then took the idea to our Oxford-based developers, 67 Bricks, who built our current platform, known as ICLR.3. This AI “similarity search”, or Case Genie as we are calling it, will be at the centre of a suite of new features in a major upgrade, due later this year, which we are calling ICLR.4.
How will it work? The user will be invited to upload a document or paste in a piece of text in which a particular legal issue is described and discussed. Typically, this could be a skeleton argument. The material will be put through a “pipeline” of different analytical processes, which in turn break it down into sentences and individual words, parse them for parts of speech and dependency, identify and recognise entities such as case names, citations, courts, and legal concepts and ideas; and then group them by way of vectors.
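For readers who like to see the moving parts, here is a minimal sketch of such a pipeline in Python, using only the standard library. It is an illustration under loud assumptions, not the Blackstone library or the Case Genie implementation: the function names, the citation pattern and the stopword list are all hypothetical, the part-of-speech and dependency parsing steps are left out entirely, and a simple word count stands in for the learned vectors a real system would produce.

```python
import re
from collections import Counter

# Hypothetical pattern for neutral citations such as "[2009] UKHL 38".
# Real citation formats are far more varied than this.
NEUTRAL_CITATION = re.compile(r"\[\d{4}\]\s+[A-Z]+[A-Za-z]*\s+\d+")

# A tiny, illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "of", "a", "an", "in", "and", "to", "is", "was", "that", "by"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lower-case alphabetic tokens; punctuation and digits are dropped."""
    return re.findall(r"[a-z]+", sentence.lower())


def extract_citations(text):
    """Entity recognition reduced to a single regular expression."""
    return NEUTRAL_CITATION.findall(text)


def to_vector(text):
    """Word counts (minus stopwords) standing in for a learned vector."""
    tokens = [t for s in split_sentences(text) for t in tokenize(s)]
    return Counter(t for t in tokens if t not in STOPWORDS)


def run_pipeline(text):
    """Run each stage in turn and collect the results."""
    return {
        "sentences": split_sentences(text),
        "citations": extract_citations(text),
        "vector": to_vector(text),
    }


sample = (
    "The term was construed in Chartbrook Ltd v Persimmon Homes Ltd "
    "[2009] UKHL 38. The court applied that construction."
)
result = run_pipeline(sample)
```

Each stage feeds the next, which is the point of the "pipeline" metaphor: a weakness in an early stage (a missed sentence boundary, say) carries through to the vector at the end.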
Based on those vectors, the system can then find other content, ie existing cases, whose vectors most closely match it. Think of it as a match by the DNA sequence of the documents rather than their actual words. This can throw up some interesting results. It may simply confirm what you already knew, but it may throw up something startling and unexpected.
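One common way of measuring how closely two vectors match is cosine similarity, which compares the direction of the vectors rather than their length. The sketch below assumes that measure purely for illustration; the post does not say which measure the system uses. The case names and the three-number vectors are invented, and real document vectors would have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_a * norm_b)


def most_similar(query, corpus, top_n=3):
    """Rank the cases in `corpus` by similarity to `query`, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical vectors: the numbers are made up for the example.
corpus = {
    "Case A": [0.9, 0.1, 0.0],
    "Case B": [0.0, 0.2, 0.9],
    "Case C": [0.6, 0.6, 0.1],
}
query = [1.0, 0.2, 0.0]  # the vector for the uploaded document

ranking = most_similar(query, corpus)
```

Here Case A ranks first because its vector points in almost the same direction as the query, whatever words the two documents happen to use.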
As with many computer processes, the better and more focused the input material, the better the results. The ideal input text should be confined to a single theme or problem: that way, the similar cases suggested in response will more accurately resolve the unknown unknown of what the document might be missing. For a barrister anxious to ensure their skeleton argument hasn’t overlooked a key case, this could be a boon. Or it might help them disarm their opponent’s apparently impregnable argument.
The similar cases thus suggested can be ranked by relevance, in the sense of how closely their DNA matches that of the input document. But they can also be combined with a second process, finding related cases – those cases which have cited or been cited in the first category of case. This second network of relationships can be combined with the similarity model to produce an optimum list of what might be most relevant.
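As a rough illustration of how the two signals might be combined, the sketch below takes the most similar cases as "seeds", looks up the cases that cite or are cited by them, and gives those related cases a fixed boost. Everything here is an assumption made for the example: the case names, the scores, the citation links, and above all the `citation_weight` of 0.5. The post does not describe the actual weighting.

```python
def related_cases(seeds, cites):
    """Cases that cite, or are cited by, any of the seed cases."""
    related = set()
    for citing, cited_list in cites.items():
        for cited in cited_list:
            if citing in seeds:
                related.add(cited)
            if cited in seeds:
                related.add(citing)
    return related - set(seeds)


def combined_ranking(similarity, cites, top_n=2, citation_weight=0.5):
    """Similarity score plus a flat boost for cases linked to the top matches."""
    seeds = sorted(similarity, key=similarity.get, reverse=True)[:top_n]
    related = related_cases(seeds, cites)
    scores = {}
    for case in set(similarity) | related:
        boost = citation_weight if case in related else 0.0
        scores[case] = similarity.get(case, 0.0) + boost
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical similarity scores and citation links.
similarity = {"Case A": 0.9, "Case B": 0.7, "Case C": 0.1}
cites = {
    "Case D": ["Case A"],  # Case D cites Case A
    "Case B": ["Case C"],  # Case B cites Case C
}

ranked = combined_ranking(similarity, cites)
```

Case D has no similarity score of its own, yet it appears in the list because it cites a top match; Case C climbs the ranking for the same kind of reason.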
Taking this process one step further, one can then use the case analysis results to “prime” a more conventional search, using keywords or other parameters to search within the context of the recommended cases. This enables the user to really drill down to a set of results using all the tools at their disposal, both AI and conventional.
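In its simplest form, "priming" a conventional search just means restricting the keyword search to the cases already recommended. The sketch below assumes plain, case-insensitive substring matching and invented case texts; a real search engine would do much more (stemming, phrase matching, field filters).

```python
def search_within(recommended, texts, keywords):
    """Recommended cases whose text contains every keyword.

    Matching is a case-insensitive substring test, so "term" would
    also match "determine". Good enough for a sketch only.
    """
    wanted = [k.lower() for k in keywords]
    hits = []
    for case in recommended:
        body = texts.get(case, "").lower()
        if all(k in body for k in wanted):
            hits.append(case)
    return hits


# Hypothetical recommended cases and snippets of their text.
recommended = ["Case A", "Case B", "Case C"]
texts = {
    "Case A": "Interpretation of a contractual term and rectification.",
    "Case B": "Formation of contract: offer and acceptance.",
    "Case C": "Rectification for common mistake in a written contract.",
    "Case Z": "Rectification of the register.",  # not recommended, so never searched
}

hits = search_within(recommended, texts, ["rectification", "contract"])
```

Because the search runs only over the recommended list, the order of the results still reflects the earlier ranking, and cases outside the list are ignored however well they match the keywords.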
If this sounds exciting, then do please get involved in our Beta testing programme, which will be running over the summer. (Contact us via research@iclr.co.uk)
Meanwhile, watch this (vector) space.
Featured image via iStock.
A version of this post previously appeared in two articles in the newsletter of the British and Irish Association of Law Librarians.