Boston Public Library aims to increase access to a vast historic archive using AI

Boston Public Library, one of the oldest and largest public library systems in the country, is launching a project this summer with OpenAI and Harvard Law School to make its trove of historically significant government documents more accessible to the public.

The documents date back to the early 1800s and include oral histories, congressional reports and surveys of different industries and communities.

“It really is an incredible repository of primary source materials covering the whole history of the United States as it has been expressed through government publications,” said Jessica Chapel, the Boston Public Library’s chief of digital and online services.

Currently, members of the public who want to access these documents must show up in person. The project will enhance the metadata of each document and will enable users to search and cross-reference entire texts from anywhere in the world.

Chapel said Boston Public Library plans to digitize 5,000 documents by the end of the year, and if all goes well, grow the project from there.

Making a bargain with AI

Because of this historic collection’s massive size and fragility, getting to this goal is a daunting process. Every item has to be run through a scanner by hand. It takes about an hour to do 300-400 pages.

A book undergoing the digitization process in a scanner at Boston Public Library. (Boston Public Library)

Harvard University said it could help. Researchers at the Harvard Law School Library’s Institutional Data Initiative are working with libraries, museums and archives on a number of fronts, including training new AI models to help libraries enhance the searchability of their collections.

AI companies help fund these efforts, and in return get to train their large language models on high-quality materials that are out of copyright and therefore less likely to lead to lawsuits. (Microsoft and OpenAI are among the many AI players targeted by recent copyright infringement lawsuits, in which plaintiffs such as authors claim the companies stole their works without permission.)

“Having information institutions like libraries involved in building a sustainable data ecosystem for AI is critical, because it not just improves the amount of data we have available, it improves the quality of the data and our understanding of what’s in it,” said Burton Davis, vice president of Microsoft’s intellectual property group.

Access for all

Greg Leppert, executive director of the Harvard Law School Library’s Institutional Data Initiative, said the initiative does not aim to grant AI companies privileged access to the rich troves of out-of-copyright information held at libraries and archives. Anyone can access the data once it has been digitized.

“It’s a two-way street, where we are improving data in a way that will help AI, but those improvements work their way back into the library,” said Leppert. “So it improves the patron experience as well.”

OpenAI is helping Boston Public Library cover such costs as scanning and project management. The tech company does not have exclusive rights to the digitized data.

“We benefit, like others, from their efforts to digitize the public domain, expanding the high-quality data and public knowledge that AI systems, including ours, can build on,” the company said in a statement to NPR.

Challenges of public-private partnerships

Library professionals say working with AI companies will provide broader access to information.

“I think this is a really worthwhile partnership out of which we are going to get more accessible collections,” Boston Public Library’s Chapel said.

And, because librarians are involved in curating and categorizing that information, the integrity of the materials used by AI companies can be more easily protected.

“Having trained professionals with deep subject knowledge is crucial in this moment as we start to develop what the future will bring,” said American Library Association President Sam Helmick.

But library experts also expressed caution about these partnerships because of the cultural differences between public institutions and corporations.

“The kind of ‘move fast and break things’ ethos of Silicon Valley is counter to the values of librarianship, which are about access and transparency,” said Michael Hanegan, co-author of the new book Generative AI and Libraries.

“This is all moving so fast: The technology is moving fast. The companies are moving fast,” Chapel said. “And libraries work on a very different timescale. So there’s a little bit of a culture clash.”

Jennifer Vanasco edited this story for broadcast and digital.

 
