How Next-Generation Models Will Leverage Big Data and AI for More Accurate Estimates of Future Climate

In this Q&A, climate scientist Galen McKinley and computer scientist Carl Vondrick explain how Columbia’s new climate modeling center will improve on the latest projections.

By Kim Martineau, September 9, 2021

With funding from the National Science Foundation, Columbia is set to launch a new research center aimed at building the next generation of data-driven, physics-based climate models. Current models agree that Earth will get warmer in the next few decades, but they disagree on how severe the effects will be and what parts of the world will be hardest hit. With collaborators at other universities and national labs, Columbia researchers will update the models with new information gleaned from massive datasets and new machine-learning methods. The broader goal of the center, called Learning the Earth with AI and Physics (LEAP), is to provide actionable information for societies to adapt to climate change and protect the most vulnerable. Columbia News caught up with the center’s deputy director, Galen McKinley, and its data science director, Carl Vondrick, to dig into the details.

Q: How do global climate models currently work? What are their limitations? 

GALEN MCKINLEY: Climate models, also known as earth system models, work by representing the physics, chemistry, and biology of the climate system as mathematical equations. These equations are solved on a three-dimensional grid with cells representing the atmosphere, land, and ocean. We have no data for the future, and so the models draw from the scientific knowledge embedded in the equations to make their projections. Historical data are used to validate model simulations for the recent past, but are not directly used to guide the model. 

Most earth system models run on supercomputers, yet they demand even more computing power than we have available. This limits how small we can make the cells in our three-dimensional grid. In current models, a cell typically measures 60 miles on each side, and each variable, such as temperature, cloud cover, or rainfall, is represented by a single value per cell. But in the real world, these properties vary significantly across an area that large, roughly 3,600 square miles!
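
For intuition, here is a minimal sketch in Python (with NumPy) of the gridded approach McKinley describes: one value per variable in each cell, advanced in time by a discretized equation. The grid dimensions, the single diffusion-style update, and all numbers are illustrative assumptions, not code from any real earth system model.

```python
import numpy as np

# Hypothetical coarse global grid: one value per variable in each cell.
# Real models use many more variables, vertical levels, and carefully
# derived discretizations; everything here is a stand-in for illustration.
n_levels, n_lat, n_lon = 10, 180, 360                    # ~1-degree cells
temperature = np.full((n_levels, n_lat, n_lon), 288.0)   # kelvin, one value per cell

def step(field, kappa=0.1):
    """One explicit time step of a diffusion-style update, standing in for
    the discretized physics equations a model solves in every cell."""
    neighbors = (np.roll(field, 1, axis=1) + np.roll(field, -1, axis=1) +
                 np.roll(field, 1, axis=2) + np.roll(field, -1, axis=2))
    return field + kappa * (neighbors / 4.0 - field)

for _ in range(100):        # march the simulated state forward in time
    temperature = step(temperature)
```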

[Illustration: an earth system model grid, with examples of low-resolution and high-resolution cells describing cloud cover]

A second limitation is that we don’t know how to write accurate equations for all the physical, chemical, and biological processes that influence climate. The equations we have are often based on field experiments from a few locations or points in time, and thus do not represent all conditions equally well. We tend to know physical equations better than biological equations, and research is ongoing to improve them. But we still have much to learn. 

Q: What existing machine-learning algorithms will you apply to the models to improve projections?

CARL VONDRICK: The form of machine learning we are using, deep learning, achieves excellent performance when we have large amounts of data. The center will capitalize on this technology to better represent climate processes, such as ocean turbulence and ice sheet flow, in the models. One technique we’re particularly excited about is generative adversarial networks, or GANs, which will allow us to fill in data gaps with a high level of detail. We’re also excited to leverage techniques for equation discovery, which let us interpret the neural networks that underlie deep-learning algorithms and extract equations that may advance our understanding of Earth’s climate system.
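
As a rough sketch of the GAN idea, two networks are trained against each other: a generator proposes plausible fine-scale fields to fill gaps, and a discriminator learns to tell generated fields from real ones. The PyTorch example below is hypothetical; the coarse-to-fine setup, layer sizes, and function names are invented for illustration and are not LEAP’s actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical gap-filling setup: predict a fine-resolution field (64x64
# cells) from a coarse one (8x8 cells). Sizes and layers are invented.
coarse_dim, fine_dim = 8 * 8, 64 * 64

generator = nn.Sequential(        # coarse field -> candidate fine field
    nn.Linear(coarse_dim, 512), nn.ReLU(),
    nn.Linear(512, fine_dim),
)
discriminator = nn.Sequential(    # fine field -> probability it is real
    nn.Linear(fine_dim, 512), nn.ReLU(),
    nn.Linear(512, 1), nn.Sigmoid(),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCELoss()

def train_step(coarse, real_fine):
    """One adversarial update: the discriminator learns to separate real
    fields from generated ones; the generator learns to fool it."""
    fake_fine = generator(coarse)

    opt_d.zero_grad()
    d_loss = (bce(discriminator(real_fine), torch.ones(len(real_fine), 1)) +
              bce(discriminator(fake_fine.detach()), torch.zeros(len(coarse), 1)))
    d_loss.backward()
    opt_d.step()

    opt_g.zero_grad()
    g_loss = bce(discriminator(fake_fine), torch.ones(len(coarse), 1))
    g_loss.backward()
    opt_g.step()

# Hypothetical usage with random stand-in data:
train_step(torch.randn(32, coarse_dim), torch.randn(32, fine_dim))
```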

Q: You'll also be developing new algorithms. What questions will these AI tools address?

CV: A key aim of the center is to deliver next-generation machine-learning technology for answering some of the unresolved questions in climate science. We’ve seen many exciting advances in AI lately, but there are still key obstacles to overcome. For example, current algorithms excel at identifying patterns in data, but they often make projections that violate basic laws of physics. Further, large datasets provide the best results, but for many physical and biological processes, collecting this much data is either impractical or too expensive.

To address these challenges, LEAP researchers will investigate how to deeply embed physical knowledge into state-of-the-art machine-learning models, allowing systems to identify causal relationships, make do with less data, and generalize reliably to new situations.
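
One common way to embed physical knowledge in a network, sketched below, is to add a penalty for violating a known law to the training loss, so the model is rewarded for predictions that honor the constraint. The toy conservation constraint, network shape, and weighting here are assumptions for illustration, not necessarily the approach LEAP will take.

```python
import torch
import torch.nn as nn

# Hypothetical network mapping one small gridded field (16 cells) to another.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))

def physics_informed_loss(inputs, targets, lam=1.0):
    """Data-fit term plus a penalty for violating a known physical law.
    The 'law' here is a toy conservation constraint: the predicted field
    should have the same total as the input field."""
    preds = model(inputs)
    data_loss = nn.functional.mse_loss(preds, targets)
    conservation_gap = (preds.sum(dim=1) - inputs.sum(dim=1)).pow(2).mean()
    return data_loss + lam * conservation_gap

# Hypothetical usage: one optimization step on random stand-in data.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 16), torch.randn(32, 16)
loss = physics_informed_loss(x, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```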

Q: Deep-learning algorithms can find patterns and correlations in big datasets but can't tease out cause-and-effect relationships. How would algorithms equipped with causal reasoning help improve climate models?

CV: Cause-and-effect reasoning is crucial for the models to capture why different physical and biological events happen. For a simple example, standard machine-learning methods tell us whether clouds and rain are related, but not much more. Machine learning with causal reasoning will tell us whether the clouds actually created the rain, and whether more clouds will create more rain. Embedding causality into the models will be pivotal for planning adaptation strategies to different climate processes.
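
A toy simulation can make the distinction concrete. In the hypothetical structural model below (pure NumPy, with all coefficients invented), humidity drives both clouds and rain, so the observed clouds-rain association overstates the true causal effect; intervening on clouds directly, the analogue of a do-operation in causal reasoning, recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Toy structural model (all coefficients invented): humidity drives both
# clouds and rain, and clouds also cause some rain directly (effect 0.3).
humidity = rng.normal(size=n)
clouds = 0.8 * humidity + rng.normal(size=n)
rain = 0.3 * clouds + 0.6 * humidity + rng.normal(size=n)

# Observational association: the regression slope of rain on clouds is
# inflated by the shared humidity driver (confounding).
observed_slope = np.cov(clouds, rain)[0, 1] / np.var(clouds)

# Intervention: set cloud cover by fiat, do(clouds), which severs the
# humidity -> clouds link; the slope now recovers the true effect, 0.3.
clouds_do = rng.normal(size=n)
rain_do = 0.3 * clouds_do + 0.6 * humidity + rng.normal(size=n)
causal_slope = np.cov(clouds_do, rain_do)[0, 1] / np.var(clouds_do)

print(observed_slope)  # noticeably larger than 0.3
print(causal_slope)    # close to 0.3
```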

Q: Humanity is also unpredictable. It's unclear how much national economies will grow, and how much more greenhouse gases we'll pump into the air. The transition to clean energy could offset some of our growth-related emissions increases. How much does this uncertainty affect climate projections?

GM: There is no doubt that our emissions of carbon dioxide and other greenhouse gases are driving climate change. The most important action we can take is to cut emissions as rapidly as possible. But carbon dioxide stays in the air for a long time, and thus any changes we make today to reduce emissions or remake our energy infrastructure won’t have much of an impact until at least 2050. In the meantime, we must adapt to the warming set in motion by current and future emissions. Our center is focused on making climate projections that can help guide the difficult adaptation decisions that will need to be made in the next few decades. Right now, we’re concentrating on the uncertainties in the models.