Whose Judicial Data Is It, Anyway?
 
          
Editor’s Note: This story is the first in an occasional series on research projects currently in the works at the Law School.
Every court case and judicial proceeding generates an enormous amount of data, some of which is either non-public or difficult to access.
What to do with that data is a question that Aziz Z. Huq, the Frank and Bernice J. Greenberg Professor at the Law School, has been pondering lately. Huq is coauthoring a paper with Northwestern Law School Professor (and former Chicago Law School Public Fellow) Zachary D. Clopton that they hope will begin a thoughtful discussion of who should control this judicial data and who should have access to it.
If currently hidden data were made accessible and affordable, Huq explains, attorneys and researchers could use it to help find answers to a wide range of constitutional and public policy questions. For example:
- When is the provision of legal counsel effective, unnecessary, or sorely needed?
- When and where is litigation arising and what are the barriers to court access?
- Are judges consistent when they determine in forma pauperis status?
- Do judges’ sentencing decisions reflect defendants’ observed race, ethnicity, or gender?
- Are any state and local governments infringing on civil rights through their policing or municipal court systems?
According to Huq and Clopton, judicial data could be used to help clarify the law in ways that advance legality and judicial access, reveal shortfalls in judicial practice, and enable the provision of cheaper and better access to justice.
That potential has increased dramatically with the advent of AI and large language models (LLMs), such as ChatGPT.
“I had been writing about public law and technology, especially AI, for about five years. I became curious recently about why, of all the branches of government, only courts have been left largely to their own devices when it comes to collecting, archiving, and releasing information about its work,” said Huq.
While an extensive body of constitutional, statutory, and regulatory provisions governs the flow of congressional and executive branch information, and countless public debates address transparency and opacity in and around both elected branches, the federal judiciary still relies on ad hoc procedures to determine what data to collect, preserve, and make available.
As a result, Huq and Clopton believe that “a lot of valuable data is either lost or stored in a way that makes it hard to use for the public good.”
Meanwhile, the authors note that large commercial firms such as Westlaw (owned by the Thomson Reuters Corporation), Lexis (owned by the RELX Group), and Bloomberg are moving to become the de facto data managers and gatekeepers who decide on the public flow of this information and who capture much of its value.
“At minimum, these developments should be the subject of more public discussion and scholarly debate,” said Huq. “Until now, however, one of the biggest obstacles to having that discussion is a lack of information about what data is at stake. It became apparent that we didn’t know why we knew what we knew, and we didn’t know what we didn’t know.”
The Scope of the Data
There were no studies about the full scope and depth of judicial data currently being preserved by the various courts’ disparate procedures, and no certainty about what other data could be preserved if there were a concerted effort to do so.
To fill that gap, Huq and Clopton drew on primary sources and previous scholarship, and then supplemented that research with anonymized interviews with selected judicial staff and judges.
They quickly discovered that, with no regulatory framework to guide them, institutional practices varied widely among federal courts. Different courts save different types of data, organize it differently, and make different types available to the public.
Even significant judicial data that has been collected is often kept just out of reach. For example, the cover sheets that are filed in every civil case contain a treasure trove of useful information, such as the court’s basis of jurisdiction, the type of relief sought, and the nature of the suit.
“A comprehensive database of civil cover sheets,” the authors write, “would be an extremely valuable source of insight into the timing, cyclicality, substance, and distribution of civil litigation in federal courts.”
Defective Delivery of Data
While federal courts make some data available via the Public Access to Court Electronic Records (PACER) database, that archive is neither comprehensive nor easy to use, and its public access fee of 10 cents per page makes it expensive, especially for large research projects. Moreover, its search capabilities are limited: PACER does not allow the user to search by judge and does not permit full-text or natural-language searches.
The Federal Judicial Center’s Integrated Database suffers from similar defects, as do the courts’ various statistical reports.
Huq and Clopton’s paper demonstrates how these database design choices (clunky interfaces, limited search options, and page-by-page downloads that each carry a fee) have the effect of partly privatizing this information by driving the public to commercial firms, which then decide what data to make available and at what price.
Data Should Be Open, Not Opaque
In the authors’ view, openness and transparency are critical ingredients for making an institution that all Americans would recognize as a true “court.”
“To be clear,” Huq said, “we are not saying the courts must disclose everything. We recognize that there are privacy and other interests at stake and there needs to be some balance and debate around them. But we do believe there are some things we could all agree that the courts could be required to do now. So, our article focuses on that low-hanging fruit and seeks to provoke a conversation rather than partisanship.”
Huq and Clopton’s article will be published this summer by the Stanford Law Review.
Charles Williams is a freelance writer based in South Bend, Indiana.
 
 
 
   
  