Faculty Q&A: Professor Mark Hansen Works to Integrate Data Science into Journalism

Statistician. Technologist. Artistic collaborator. And now, journalism professor.

Having spent virtually all his career working with numbers, Mark Hansen finds himself joining forces with the wordsmiths at Columbia Journalism School. “I feel like a spy in the house of Pulitzer,” he jokes.

Bridget O'Brian
Photo by Eileen Barroso
August 28, 2013

As East Coast Director of the David and Helen Gurley Brown Institute for Media Innovation at the Journalism School, he works with Emily Bell of the Tow Center for Digital Journalism and J-school colleague Susan McGregor to integrate data science into the curriculum and, ultimately, the profession.

He also chairs the New Media Center at the Institute for Data Sciences and Engineering. The data sciences institute was created in 2012 in a partnership with New York City.

Hansen has a history of visualizing data in unexpected ways. Last fall, he joined his longtime collaborator, Ben Rubin, in unveiling a chandelier-like piece at New York’s Public Theater called the Shakespeare Machine. Instead of lights, it has thin panels that stream lines of text from the Bard’s plays, generated by thematic algorithms. Similarly, the lobby of The New York Times features a prominent installation by the pair consisting of 560 small screens that include excerpts of text that has appeared in the paper since its founding in 1851, including letters to the editor and recipes. “We think of these exercises as a kind of storytelling,” he says.

Q. How does a statistician end up running an institute at Columbia Journalism School?

A few years ago, I spent my sabbatical from UCLA’s Department of Statistics at the R&D Lab of The New York Times. And there I had the privilege of meeting and learning from some of the best creative technologists around. It was an amazing experience. So amazing that I began exploring how I could have more regular collaborations with journalists. That brought me to the office of the J-school dean, Nicholas Lemann, who said, “You know, we have this new institute here and we’re looking for a head....” One thing led to another and now I’m here.

Q. Did you have any journalism training or interest before getting here?

I've always tried to engage with ideas beyond my traditional statistics training. I have active collaborations with artists and designers and architects, not just to present ideas to the public in new ways but also to highlight the creative potential of data and data technology. It’s this experience with the expressiveness of data that I bring to the journalism school, finding and telling new stories, informing the public in new ways.

Q. Your work at Columbia goes beyond the Journalism School. What else are you doing here?

I’ve also been asked to head the New Media Center for the Institute for Data Sciences and Engineering. The center consists of all the departments you don’t think about when you think about data. There are professors from the Departments of English and Comparative Literature, history, architecture. Like journalism, a lot of these humanities-based disciplines are finding that their core artifacts are being digitized and turned into data. They have to navigate a world where a novel or a painting becomes representable and manipulable as data. When a humanities professor starts to work with data, or when a journalist starts to work with data, they create practices that incorporate their own disciplinary values and ethics. These new ways of thinking about and working with data should find their way back into a larger science of data. I believe that ultimately the science of data isn’t just a mathematical one. Counting is political and data collection is often a kind of social exchange. The New Media Center can help surface these different ways of thinking about data.

Q. What are you teaching at the Journalism School?

I’m teaching classes that help students tell stories with data. It’s a very different exercise than what I did when I taught straight-up statistics at UCLA. After a year at the J-School, I realize that when I explained data analysis to a group of statistics students, I was really teaching them to tell the story of a data set, as a data set. Instead, journalists ask, “What does this mean in a larger social context, what does this mean to a given neighborhood or community, what does this mean in respect to other data sets collected by other organizations?” There’s a need to fill the gap between the story of the data set and the story of the world, and filling that gap to me feels like journalism.

Q. The Brown Institute was established as a collaboration between Columbia and Stanford Engineering. Can you describe the different roles of each institution?

The gift was pitched around marrying technology and content. The best way to look at it is that Stanford Engineering is world renowned for its tech innovations, and Columbia Journalism represents world-class content. I’m the head here, [Stanford Engineering Professor] Bernd Girod is the head at Stanford, and we work very closely. The endowment provides support for two or three of what are called Magic Grants a year at each institution and two fellowships each year at each institution.

Q. What kind of projects is the Institute interested in funding?

The kinds of projects we're looking to fund cross technology and storytelling. For example, this year Columbia gave a Magic Grant to The Declassification Engine, which is looking to apply machine learning techniques to expose the process of official secrecy, to assign authorship to anonymous declassified documents and to algorithmically remove some of the ink from redacted documents.

Q. How will what you do affect the Journalism School's new curriculum?

We have had the opportunity already. In the second half of the first semester, when, for instance, students are picking between video and photo modules, they can now choose to take Data I and Data II. Data I is learning about mean, median and mode, about histograms and scatterplots: this is how we grab data, how it’s represented and how we work with it. Data II starts here and ends, well, I’m not sure where. I’m planning on introducing a set of advanced tools that not every journalist needs to know, but a few should. In Data II, we’ll see how topics like machine learning and advanced visualization might be used as part of a righteous journalistic practice. I’m thrilled to say that I have 10 students signed up for the class. In addition, I teach a class on journalistic computing in the spring that complements a data visualization class taught by Susan McGregor. And we’re starting up something called Year Zero, which will be a two-semester post-baccalaureate program providing basic computing and data skills to students interested in our dual degree master’s program in journalism and computer science.

Q. Why would someone want to take Year Zero?

Most of our applicants to the dual master's degree program come from engineering or mathematics or one of the sciences, which doesn’t exactly reflect the makeup of students who come to the school as a whole. Year Zero will prepare a history, English or humanities major to take a master’s degree in computer science. It’s an exciting addition to what we’re offering and hopefully will expand the horizons of those who come out of the school, helping them think about data, code and algorithms in a new way.

Q. How will this change the Journalism School?

I hope that in five years' time every graduate of the school will know how to code. By that I mean they will see the computer as a human-created, human-scriptable object, that they can have a hand in generating new technology, and that they can ask some hard questions about new technologies. At the end of the day, data and data technologies are reshaping systems of power in our society, and journalists, as the explainers of last resort, need to be able to work with them, understand them, at a deep level. To say that in five years’ time, every graduate of the school can code really means that every graduate of the school will be able to engage with what it means to be a member of a data society.

Q. What does Columbia bring to the data revolution?

Journalistic institutions are here in New York en masse. There's so much amazing data-related activity, especially when it comes to journalism and other fields across the University. That just goes hand-in-glove with what the Institute is all about. You call up someone, “Hey, we’re doing this. Would you like coffee? Let’s talk.” And immediately there’s this outpouring of ideas and let’s see what we can pull together. But I don’t want to lose sight of the fact that the Browns were storytellers, they were showmen, whether it was Helen as editor of Cosmopolitan magazine or David producing movies like Driving Miss Daisy and Jaws. The Brown Institute ultimately has to be about the quality of the story, funding not just tool creation but telling a good story. We have to find a sweet spot between technology leading the story and the story leading tech. But we have to make sure that the story that’s being told is a good one.