Can You Keep a Secret?

Matthew Connelly, who runs Columbia’s History Lab, researches the answer to that and other questions about U.S. government records.

By Eve Glasberg
October 24, 2023

Columbia’s History Lab is staffed by a team of scholars, designers, scientists, and engineers, who are building tools to preserve the fabric of the past and provide lessons for the future. The lab has gathered more than 3 million documents to create the Freedom of Information Archive, the world’s largest database of declassified government records.

Matthew Connelly, a professor of international and global history, directs the lab, which he discusses with Columbia News.

How long has the History Lab existed, and how did it start?

We began work back in 2013, when I started to realize how the whole system for reviewing and releasing secret documents was breaking down. This system still depends on human beings looking at one page at a time—about 2,000 people in total—at the same time that millions of other people with security clearances are producing a tsunami of classified data every single day.

I learned about new advances in artificial intelligence, and how machine learning algorithms could be used to analyze text as data. I realized there was potential not just to accelerate the declassification process (and analyze what the government did not want us to know), but also to learn all kinds of new things about the history of covert operations, war plans, and international diplomacy. I was lucky enough to find smart colleagues in the computer science and statistics departments who were interested in exploring the possibilities.

How does the lab use data science to analyze state secrecy, particularly intelligence, surveillance, and weapons of mass destruction?

Very carefully! Early on, we realized that there were political and even legal aspects to this research that require careful consideration. Here again, I was fortunate to find legal scholars who were also concerned about excessive state secrecy, and they helped me address concerns that we might somehow risk national security just by doing this research. But most of the actual work we have done over the last decade has involved gathering and cleaning the data. Any data scientist will tell you this is a big part of the job whenever you start working in a new domain, and the data is particularly messy when it includes millions of declassified documents.

But now we have the world’s largest such database, and we have been able to do a whole series of experiments. This includes identifying patterns and anomalies in what the government prefers to keep hidden, but also basic questions about history itself. Like, when you look at the totality of the archival record, what do government officials spend most of their time working on? Do people typically know they are living in a historic moment? And can we automatically identify heretofore unknown events by analyzing communications patterns?

What are some current projects you're working on in the lab?

One project involves gathering all the records we can concerning the U.S. government response to COVID-19. We started this in collaboration with the Brown Institute for Media Innovation, and with the support of a National Archives grant. We are just now getting to the point where we can start exploring the data—hundreds of thousands of pages from state and local governments, and more and more from federal departments and agencies as well.

Possible research questions include determining where state and local officials obtained their information, and how they responded to events elsewhere. We also hope to explore how the language of internal discussions compares to public messaging. We have had a lot of interest from researchers in this field, but it’s a tough environment in which to obtain follow-on funding.

It’s odd when you think that an estimated 20 million people died because of COVID worldwide, and yet there is scant interest in new research that might help us understand why. But everything we do depends on funding, and while we have been very fortunate in receiving support from several foundations and some private individuals, we need to make the case every time for pursuing a new avenue of inquiry.

Another area of research, by contrast, is receiving enormous and growing attention. We just received a grant from the National Science Foundation to explore risks arising from the malign use of large language models like ChatGPT. We are well-positioned to do this work, because we have so much data related to weapons of mass destruction, and have also gained a lot of experience in managing other kinds of sensitive information, like when it contains the personal details of private individuals. So I think this work can, and will, grow in time.

Do both undergraduates and graduate students help with lab work?

Absolutely! I’ve had over 30 students participate in History Lab research, writing code, testing new methodologies, doing exploratory research, and evaluating results. They have included first-year undergrads, master’s students, and doctoral candidates. They also come from many disciplines, since the work is intrinsically multidisciplinary.

We have an open-door policy—anyone is welcome to sit in on our lab meetings. The most dedicated have stuck with their History Lab research for years, and in some cases have gone on to start their own firms.

Is there an overlap between what you teach and what goes on in the History Lab?

More and more. I have often taught a course called Hacking the Archive, which is all about developing innovative research projects in a classroom setting. I also teach a large lecture course, World History Since World War II, and over the years, I’ve found ways to weave in insights and data from my lab research. This past year, and for the first time, I invited students from the lecture course to come to weekly History Lab meetings as a kind of lab section. I gave them some new research tasks that the regular staff did not have time to focus on, and it was exciting to see them execute these ideas.

I know it’s a bit of a cliché when people say that the best teaching and the most innovative research go together. But I’ve learned from personal experience that they really do: I’m at my best, as a teacher, when sharing new discoveries. The students pick up on the excitement, and it helps them understand the different dimensions of what universities do, and how they are connected.

Are you ever surprised by research findings from the lab, or learn something new that you didn't know before?

Over and over again. It started right from the beginning. I had already been a historian for almost 20 years before I started to work with data scientists. Just to understand the data, I also needed to work more closely with archivists. It was only then that I came to understand that the archival record—i.e., what I had been exploring all those years doing my research—represents only about 1% to 3% of the documents produced by the people I was trying to study.

This is a basic fact, but one historians don’t typically dwell on, even if they are aware of it. It has profound implications for how we know what we know. And even those with some appreciation for this may not know that archival practices are now evolving rapidly with the transition from paper to digital records. I don’t think anyone really knows how much archivists will be able to preserve from the exponentially growing mass of discarded hard drives, social media, and defunct websites.

But what I do know is that we historians must rethink our methods, and start teaching our students how to deal with it—both the mass of information, and what is missing. Just think: How would you research the social and cultural history of our time, without methods that would allow you to preserve, catalog, and mine the vast trove of images, sound, and video that billions of people are producing every single day?

The Declassification Engine by Columbia University Professor Matthew Connelly

Does your most recent book, The Declassification Engine, result in any way from History Lab projects?

It is the direct result! The book describes what we discovered after a decade of research. It is a history of secrecy, and how we came to this strange and dangerous place, when we have a sitting president and a former president both under investigation for mishandling state secrets. At the same time, we are finding that the government has lost its ability to keep secrets—or even track them—because of the chaotic and decaying state of our declassification system.

The American people have never been more mistrustful. So we really have the worst of all possible worlds. But in my book, I try to show that there is a better way. The things we discovered, whether about experiments with unwitting subjects, mass surveillance, or secret war plans, show that new technology can, indeed, help us move to a more rational, risk-management approach to protecting truly dangerous information. At the same time, it can help restore citizens’ faith that they can ultimately hold those in office accountable, if only in the court of history.

What are you teaching this semester?

I’m on sabbatical! Counting summer courses, I’ve been teaching almost nonstop for years. I’ve now taken up a position at Cambridge University directing their Centre for the Study of Existential Risk. It turns out that secrecy can also undermine our ability to predict and prepare for new threats. Think of the secrecy that surrounded the development of nuclear weapons, and that currently shrouds research on new forms of artificial intelligence. There’s still much work to be done!