New Data Science Institute Holds Inaugural Symposium

April 17, 2013

Over the last five years, the amount of digital information worldwide has increased almost 2,000 percent, exceeding 2.8 trillion gigabytes—perhaps as many bits of information as there are stars in the universe.

Few modern trends have greater capacity to transform our economy and society than “big data.” On April 5, hundreds of Columbia faculty members and students, industry leaders and technological visionaries gathered in Low Library to discuss the topic at a daylong symposium sponsored by the University’s new Institute for Data Sciences and Engineering. The symposium, “From Big Data to Big Ideas,” made clear that Columbia will be a center of scholarship and solutions in several important fields driven by the data revolution.

More than 350 guests attended the symposium and, appropriately enough, hundreds more tuned in via webcast. Its starting point was the often-observed fact that the world is awash in data from sensors, digital photos, global positioning systems, social media, e-mail, inventory statistics, patient metrics, consumer spending patterns, and weather stations, to name some of the leading sources.

In his opening remarks, G. Michael Purdy, the University’s executive vice president for research, said big data “will dramatically change the way we view the world around us,” including the way new tools and technologies will be developed and how people will engage with one another.

Throughout the day, speakers from Columbia’s faculty and from the technology industry discussed how all this data could be harnessed. In a keynote speech, Lawrence Burns, director of the Earth Institute’s Program on Sustainable Mobility, described innovations that are transforming the automotive sector, such as self-driving vehicles, car sharing and crash avoidance, all technologies and innovations “driven by data sciences and engineering.”

The panel on data visualization, moderated by New York Times technology reporter Steve Lohr, featured presentations from Journalism Professor Mark Hansen, director of the David and Helen Gurley Brown Institute for Media Innovation, and Engineering School Senior Vice Dean Shih-Fu Chang, the Richard Dicker Professor of Telecommunications and director of the Digital Video and Multimedia Lab. Chang highlighted his collaboration with psychologists to develop algorithms capable of detecting emotions associated with different facial expressions in images.

“There’s an active art practice to using data in creative ways,” Hansen told the crowd, describing installations he has helped create. One, the Shakespeare Machine in the lobby of lower Manhattan’s Public Theater, uses algorithms to choose phrases from 37 Shakespeare plays and displays them continuously in LED bulbs along the length of slender blades suspended from the ceiling. Hansen has a similar installation in the lobby of The New York Times building that pulls phrases from each day’s news. “If a picture is worth 1,000 words, a formula is worth 1,000 pictures,” Hansen quipped.

Posters arrayed around Low Library detailed projects by Columbia researchers already underway, many of which use advanced pattern-recognition programs to find connections in data far too complex to spot using traditional tools.

One program developed by biomedical engineers at Columbia University Medical Center uses real-time computer processing to spot markers associated with seizures in patients with a particular brain condition. Another involves pattern-recognition programs capable of predicting golf performance by individual players under a variety of course conditions. Yet another is aimed at detecting bubbles in financial markets.

Technology companies including Bloomberg L.P., Google, Mediaocean and Microsoft have already agreed to partner with the new institute, giving Columbia faculty and students more opportunities to engage in translational research.

Representatives from some of those companies participated in a panel moderated by Kyle Kimble, executive director of the New York City Economic Development Corporation, which has led Mayor Michael Bloomberg’s efforts to energize New York’s economy by investing in applied science at multiple institutions. Columbia’s interdisciplinary data science institute, to be based at the engineering school, was announced last summer by Mayor Bloomberg and Columbia President Lee C. Bollinger.

With seed funding from the Economic Development Corporation, the institute will add 30 new engineering faculty members and 44,000 square feet of lab space, and it will encompass seven other schools on the Morningside and Medical Center campuses. Six centers within the institute will focus on key sectors such as new media, health and financial analytics, smart cities, and cybersecurity. The institute also will offer a certification program comprising four core courses in data sciences, the first step in Columbia’s plan to create a master’s degree program and, ultimately, a Ph.D. in data sciences.

“Most of the data you find in the real world is dirty data; it’s not easy to find the truth in it,” said Shawn Edwards (BS’90, MS’95), chief technology officer of Bloomberg L.P. “It requires we have domain experts to make sense of the data and figure out how to preprocess it.” Columbia, he noted, has those experts and will produce more in the areas of business, medical sciences and journalism.

Managing big data is not just a science issue, said Jennifer Tour Chayes, managing director of Microsoft Research New York City. “It crosses everything, whether it’s health sciences, financial services or particle accelerators.

“Columbia is in a unique position to provide the cross-pollination our industry needs, in finance, engineering, business studies and computer sciences. We are going to be looking to tap into that,” Chayes said.