Exploring the Hidden ‘Grammar’ of How Planets Are Ordered in Space

A new study harnesses deep learning and linguistics to understand how thousands of planets beyond our solar system are configured around their host stars. In this Q&A, Emily Sandford, GSAS’20, explains how planets can be analyzed like so many words in a sentence.

By Kim Martineau
August 09, 2021

Astronomers discovered the first planets circling a star other than our Sun in the 1990s. Since then, more than 4,200 other exoplanets have been found. “They’re everywhere,” said Emily Sandford, formerly a PhD student in David Kipping’s lab at Columbia, now a postdoc at Cambridge University. “On average, there’s at least one planet per star in the Milky Way. As our telescopes grow more powerful, we’re finding more and more examples of planetary systems with more than one planet orbiting the same star.”

To learn more, Sandford teamed up with Kipping and computational linguist Michael Collins to see if they could detect a pattern in how planets are configured around their host star as a function of their size and distance from the star. They borrowed techniques that linguists use to sort words into parts of speech to see if planets or planetary systems might be organized in a similar way. The study, "On planetary systems as ordered sequences," was published this month in the Monthly Notices of the Royal Astronomical Society.

Q. How did you get this idea of using deep neural networks and parts of speech like nouns and verbs as a way of understanding the structure of planetary systems?

A. The movie "Arrival" got David thinking about planet data as an untranslated language. We started talking to Michael Collins, then a computational linguist at Columbia, now at Google, and expanded on the analogy from there. A grant from Columbia's Data Science Institute allowed us to pursue the idea.

Q. You cover the details in your Cool Worlds Lab video. Can you give us the short version?

A. This project is a new way to explore the patterns in planetary systems. We know that planets aren't arranged randomly; astronomers have noticed, for example, that planets in a system together tend to have similar sizes. That means there’s information not just in the star and planets individually, but in their relationships to one another. We take techniques that computational linguists developed to study the relationships between words in a sentence and use them to understand the configuration of planets in a planetary system.
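
As a rough illustration of the analogy (a toy sketch, not the study's actual pipeline), each planet can be turned into a "word" by binning its size and orbital period, and each system into a "sentence" by listing its planets in order of distance from the star. The bin edges, labels, and function names below are hypothetical choices made for this example.

```python
from bisect import bisect_right

# Hypothetical bin edges, chosen only for illustration.
RADIUS_EDGES = [1.5, 4.0, 10.0]   # planet radius, in Earth radii
PERIOD_EDGES = [10.0, 100.0]      # orbital period, in days

RADIUS_LABELS = ["S", "M", "L", "XL"]
PERIOD_LABELS = ["near", "mid", "far"]


def planet_token(radius_earth, period_days):
    """Map one planet to a discrete 'word' such as 'M-mid'."""
    size = RADIUS_LABELS[bisect_right(RADIUS_EDGES, radius_earth)]
    orbit = PERIOD_LABELS[bisect_right(PERIOD_EDGES, period_days)]
    return f"{size}-{orbit}"


def system_sentence(planets):
    """Turn a system into an ordered 'sentence' of planet tokens.

    `planets` is a list of (radius_earth, period_days) pairs.
    Planets are ordered by orbital period, i.e. from the innermost outward.
    """
    ordered = sorted(planets, key=lambda planet: planet[1])
    return [planet_token(radius, period) for radius, period in ordered]


# A made-up three-planet system, listed out of order on purpose.
example_system = [(2.0, 30.0), (1.0, 5.0), (11.0, 400.0)]
print(system_sentence(example_system))
```

Once systems are written as sequences like this, tools built for sentences, such as models that predict a missing word from its neighbors, can in principle be pointed at planets instead.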

Natural language processing tools are found in your smartphone, car, and speakers, making sense of our speech and text. Could these same algorithms make sense of exoplanets? Emily Sandford, GSAS'20, explains.

Q. What’s significant about your results?

A. The methods are totally new to astronomy! This work is also part of a broader shift towards studying planetary systems in their own right, rather than as individual planets or an entire population of planets without the context of their surrounding systems. I think it’s an exciting direction.

Q. What was the biggest hurdle you faced in doing this research?

A. Because these techniques are so new, I had to juggle a lot of pieces—keeping my code and thoughts organized. I ran back and forth between linguistics and astronomy jargon as well as the mathematics underpinning the project, the computer programs I rewrote to process our data, and interpreting the numerical jumble of results into words and pictures. Fittingly, it felt a lot like translation.

Q. What’s the overarching goal of this work?

A. To find out whether natural language processing techniques are even useful for astronomy. Can they help us group planetary systems into categories? We’re working with a fraction of the data that linguists have, so we weren't sure we would find anything. Even now, we're only seeing the broadest strokes; we'll need to discover many more planetary systems before this method can deliver specific, conclusive results. I told David early on that I’d be happy if our methods picked out the compact multi-planet systems as their own "category" (exoplanet scientists have long thought that compact multis are distinct from other types of systems). In the end they did, so that was an exciting confirmation that the method works.

Q. What made you decide to become an astrophysicist?

A. I’ve always liked math and science, so I went to college planning to study pre-med. I loved my introductory physics class, though, so I applied to some summer programs to see what physics research was like. I joined an astronomy project at the American Museum of Natural History and from there I was hooked.

Q. How did you get interested in exoplanets and do you have a favorite one?

A. I was in the second year of my PhD when David arrived at Columbia. He was the first to specialize in exoplanets, so it was mostly lucky timing. I joined his group and liked working with exoplanet data right away: it's fundamentally so simple. We measure the amount of light shining from a star over time and look for the shadows of planets crossing between the star and our telescope. My favorite planets, HAT-P-11b and Kepler-17b, orbit stars that are covered in starspots, or patches at the surface that are cooler and darker than the rest of the star. Starspots are more variable than the clockwork orbit of planets; sometimes a planet will eclipse one, and sometimes it won't. We use that information to tell where the spots are and how often they appear.
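
The transit measurement described above comes down to simple geometry: the fraction of starlight blocked is roughly the ratio of the planet's disk area to the star's. Below is a minimal sketch, assuming a uniformly bright star and a planet that passes fully in front of it; the function name is illustrative.

```python
def transit_depth(planet_radius, star_radius):
    """Approximate fractional dip in starlight during a transit.

    Both radii must be in the same units. Assumes a uniformly bright
    stellar disk and a planet fully in front of the star.
    """
    if planet_radius < 0 or star_radius <= 0:
        raise ValueError("radii must be non-negative, and the star's radius positive")
    return (planet_radius / star_radius) ** 2


# A planet one-tenth the radius of its star (roughly Jupiter and the Sun).
print(f"{transit_depth(1.0, 10.0):.2%}")
```

A planet about one-tenth the radius of its star dims the star by about 1 percent, which is why the measurement needs such precise and steady monitoring of brightness.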

Q. You’ve produced several other Cool Worlds Lab videos, from The Lunar Space Elevator to the Exoplanet Multiplicity Distribution. What do you like best about communicating science this way?

A. YouTube is great for science communication because viewers can go back and watch the video again if they didn't quite make sense of it the first time. You can be more ambitious than you can in an in-person talk, where if you lose the audience, that's it. David has also gotten extremely good at editing and animating, so the visuals are clearer and more engaging than the typical talk slides. Of course, you can't see how the audience reacts, so you have to be more careful in the writing, anticipating possible points of confusion and heading them off at the pass. I spend much longer on a video script than I do preparing a talk.

Q. David Kipping’s lab must have been a fun place to be. What do you miss most?

A. David is one of the most creative and enthusiastic scientists I know. It wasn’t unusual to come in to our weekly lab meeting and find that he’d had a new idea, and even written a paper about it, since we last met. I miss my friends in the PhD program most. One of the pandemic’s silver linings has been watching their dissertation defense talks over Zoom.