Photo by Michael DiVito
The quest to unravel the mysteries of the brain has made neuroscience one of today’s most exciting research fields. Researchers use electrophysiology to measure brain activity, light-sheet microscopy to make high-resolution observations of living tissue and optogenetics to control living tissue with light. These tools and more are generating reams of data as researchers explore the brain in ways that were impossible only a few years ago.
Now the question becomes: How to interpret all that data? Enter computational neuroscience, a relatively new discipline that employs statistical machine learning and computer science tools to transform huge data sets into comprehensible and usable information.
John Cunningham sits at this juncture of neuroscience and statistics. He joined Columbia in 2013 as an assistant professor in the statistics department, where he teaches courses with names like “Probability and Statistics for Data Science,” “Statistical Machine Learning” and “Gaussian Processes and Kernel Methods.” His research integrates machine learning, a fairly new and rapidly growing field of data science and artificial intelligence, with neuroscience. “Some people think of statistics as an esoteric mathematical field, but especially in the last 10 years, that has not been the case,” he says. “It is at the heart of the data problems that the modern world is facing.”
Cunningham arrived at Columbia as it was embarking on two pivotal ventures that are directly relevant to his work. The Mortimer B. Zuckerman Mind Brain Behavior Institute is a venture launched by President Lee C. Bollinger to solidify Columbia’s long-term leadership in interdisciplinary research on critical aspects of the human brain. The Data Science Institute, a collaboration of 10 schools across the University, brings together almost 200 faculty to deepen our understanding of data, both theoretically and in numerous applied areas, such as financial analytics, smart cities and new media.
“Statistics represents a pivotal discipline for Columbia University and plays a key role in our rapidly growing data science efforts,” said David Madigan, executive vice president for arts and sciences and a professor of statistics. “From law, to journalism, to the digital humanities, and, of course, to neuroscience, we are continuing to see the power of this University-wide thrust towards data-centric science.”
A decade ago, while he was a Ph.D. student at Stanford, Cunningham asked himself what the most important problem for data science and machine learning in the 21st century would be. “The answer I came up with at the time is the one I still believe: how does the brain work?”
Q. Why is the discipline of statistics so relevant now?
A. There is an exponential increase in the amount of data we have. There is an assumption that all this data equals insight, or information, or even profit. But there is a gap between taking all the data sitting on hard drives and turning it into usable information. And that’s what data science and machine learning are all about: how to analyze and make sense of it all. Since the dawn of the field, we haven’t seen anything like the current growth of data sets in their complexity and their size. These data speak to our most pressing challenges: our own biomedicine, our astoundingly interdependent economy, and the fate of our environment. Making sense of this data is precisely what the field of statistics seeks to do. Computer science and engineering are also essential pieces of the machine learning puzzle.
Q. How does statistics contribute to the study of neuroscience?
A. There is a whole history of neuroscience built on investigating the activity of single neurons. If you look at our Columbia colleagues Eric Kandel and Tom Jessell’s [two of the co-directors of the Zuckerman Institute, along with Richard Axel] Principles of Neural Science, which is considered the neuroscience textbook, you will see many examples of that. Then, about 15 years ago, hardware engineers and optical physicists started coming up with new technologies that can record up to thousands of neurons at a time. Imagine that for years and years you looked at one stock ticker showing only IBM. And then all of a sudden somebody gave you a thousand other stock tickers. How do you cope with that information? That is essentially the problem, or the demand, that brings data science to the fore.
Q. Are these problems unique to neuroscience?
A. Not at all. It used to be that you scrutinized weather information at the one weather station on top of your news building. Now you have access to 10,000 weather stations and you’re trying to bring all of that together to try to make sense of things globally. But you still want to know what’s going to happen in your local area. This general problem is something that the field of statistics has been spending a lot of time and effort on. There are many example applications of dealing with multiple dimensions of data.
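One standard statistical answer to this many-channels problem is dimensionality reduction: thousands of recorded signals often move together, driven by a handful of shared trends. Here is a minimal sketch of that idea using principal component analysis on entirely synthetic “station” data (the numbers and setup below are illustrative assumptions, not any real dataset Cunningham describes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: 1,000 "stations" whose readings are all driven by
# just 3 shared latent trends, plus independent measurement noise.
n_stations, n_times, n_latent = 1000, 500, 3
latents = rng.standard_normal((n_latent, n_times))      # hidden shared trends
mixing = rng.standard_normal((n_stations, n_latent))    # how each station sees them
data = mixing @ latents + 0.1 * rng.standard_normal((n_stations, n_times))

# PCA via the singular value decomposition of the centered data matrix.
centered = data - data.mean(axis=1, keepdims=True)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained = S**2 / (S**2).sum()  # fraction of variance per component

# A few components summarize the whole array of stations.
print("variance explained by first 5 components:", np.round(explained[:5], 3))
```

The point of the sketch: although there are 1,000 channels, three components capture nearly all of the structure, which is exactly the kind of compression that makes massively parallel recordings interpretable.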
Q. What would be an example that you’re working on?
A. I’m working on a dataset now with a zebrafish, where we can record nearly every one of its 100,000 neurons simultaneously. (These fish, often seen in aquariums, have genomes similar to the human genome and are frequently used to study gene functions.)
One half-hour experiment gets you a terabyte of data. So now you need computer scientists and a distributed, parallelized computing infrastructure. And once you’ve got this data, how do you analyze it, how do you make sense of it? That’s what the science of data is all about. That’s why you need statisticians.
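A rough back-of-the-envelope calculation shows how quickly imaging data reaches that scale. The parameters below are illustrative assumptions about a light-sheet rig, not the actual experiment’s specifications:

```python
# Illustrative whole-brain imaging parameters (assumed, not the real rig):
frame_pixels = 2048 * 2048   # pixels per camera frame
z_planes = 40                # image planes per whole-brain volume
bytes_per_pixel = 2          # 16-bit camera depth
volumes_per_second = 1.0     # whole-brain volumes captured per second
duration_s = 30 * 60         # a half-hour experiment

total_bytes = (frame_pixels * z_planes * bytes_per_pixel
               * volumes_per_second * duration_s)
print(f"approx. {total_bytes / 1e12:.2f} TB per half-hour session")
```

Even with these modest assumed settings, a single session lands within the same order of magnitude as the terabyte Cunningham cites, which is why distributed storage and parallel computing enter the picture before any statistics can be done.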
Q. Are there questions about the brain that can only be answered through statistical neuroscience?
A. If I put a particular image in front of your eyes, I can predict very well how the neurons in your retina are going to behave and slightly less well how the neurons in your visual cortex will behave. That is based on the one-neuron-at-a-time analysis. Now, here’s a more complicated scenario: a fly in your visual field—do you swat it? It takes many different brain areas to make that decision and to execute that movement. It’s not reflexive; it’s just really, really fast. You’re computing where is it likely to go, can I catch it, will it go in my eye and should I swat it, will I make a fool of myself and smack myself in the face? These are things that we do effortlessly, yet we need an interdependent network of perhaps hundreds of millions of neurons to make that happen. There is no way to determine how that remarkable process works without statistical analysis.
Q. How do you record data from neurons?
A. Neurons basically make these little electric sparks or “spikes” over time, and they go tick, tick, tick. Say that I’m trying to pick up a bottle of water. The neurons are talking to each other by sending these spikes back and forth. And somehow this is brain language for “pick up that bottle of water.” The neurons encode this instruction that sends out the command to my arm, my hand, and so on, to bring the bottle to my mouth. How can you deconstruct that when you’re looking at this wall of little electrical ticks? You just can’t approach that question without statistics.
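To make the “wall of ticks” concrete: a common first step is to count each neuron’s spikes in short time bins and then fit a statistical model relating those counts to the behavior. The toy example below is entirely synthetic and uses plain least squares, a deliberately simple stand-in for the far more sophisticated methods used in actual motor-cortex research:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup (entirely synthetic): 50 neurons whose firing rates depend
# linearly on a hidden 1-D "reach velocity" signal, observed only as
# Poisson spike counts in short time bins -- the wall of ticks.
n_neurons, n_bins = 50, 2000
velocity = np.sin(np.linspace(0, 40, n_bins))                  # the hidden command
gains = rng.standard_normal(n_neurons)                         # each neuron's tuning
rates = np.clip(5 + 3 * np.outer(gains, velocity), 0.1, None)  # expected spikes/bin
spikes = rng.poisson(rates)                                    # what electrodes see

# Decode: ordinary least squares from binned spike counts back to velocity.
# (Fit and evaluated on the same data -- fine for a toy, not for science.)
X = np.column_stack([spikes.T.astype(float), np.ones(n_bins)])  # add intercept
coef, *_ = np.linalg.lstsq(X, velocity, rcond=None)
decoded = X @ coef

corr = np.corrcoef(decoded, velocity)[0, 1]
print(f"decoded-vs-true correlation: {corr:.2f}")
```

No single noisy neuron reveals the command, but pooled across the population the hidden signal is recovered almost perfectly, which is the basic statistical leverage behind decoding movement from spikes.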
Q. You have been working with Mark Churchland, a principal investigator at the Zuckerman Institute and assistant professor in the Department of Neuroscience at Columbia University Medical Center. Can you describe the research?
A. Mark and I have known each other since we were at Stanford together, when he was a postdoc in neuroscience and I was a graduate student in electrical engineering. We are researching the motor cortex, one of the main parts of the brain that controls voluntary movements. He and I have been working for about a decade to try to understand how the neurons in that brain area work together to drive the remarkable sophistication of our movements. Here is a case where two very different fields come together and do something that neither of them could have done alone, because the questions we have can’t be answered in just the statistics department or just the neuroscience department.
Q. Can you explain why voluntary movement is so interesting?
A. All sorts of things without brains or nervous systems—trees, plants—live perfectly successful lives without ever moving in any intentional way. Some say then that the only reason to have so many of our remarkable capabilities—memory, emotion, vision, etc.—is to inform the way we interact with the world around us. And how do we interact with the world? We contract and relax muscles, whether to move ourselves physically, or our vocal cords to communicate, or in some other way; movement is our only action to influence the world. For example, what is the evolutionary advantage of remembering that a particular place is dangerous unless I decide not to move there again? My point is that movement is an important component in our basic nervous system architecture. Of course this is more than just basic scientific inquiry: millions of people can’t move or have serious motor impairments due to disease and injury. And painfully, by and large we have very little idea how to improve those conditions, partly because we don’t fully understand how the brain produces movement in the first place. That is a mystery that a statistician can help unravel.
Q. What do you see as the future of data science?
A. I think the future is in large collaborations between data scientists and basic and applied scientists. By that I mean getting a lot of investigators, a lot of students and a lot of industry interests together, rather than the silo model where you’ve got one lab working on one particular thing at a time. This is the integrative, interdisciplinary, collaborative dream, which is what we have with the Zuckerman Institute. It’s got statisticians, imaging people, hardware people, optics people, to say nothing of neurobiologists, neurophysiologists, molecular biologists, all of these amazing experts. Then there’s the Data Science Institute, where the same story of teamwork applies. I think that the future of neuroscience, and of data science, is really in more of these teams coming together.
—Interviewed by Marley Bauce