Getting a Clearer View of the Stars—With Statistics
Luhuan Wu came to Columbia in 2018 to study data science. Four years later, she’s nearly halfway through Columbia’s PhD program in statistics. What prompted the change? A seminar taught by a statistics professor on deep generative models, the technology behind deepfakes that can transform a selfie into a Renaissance-style portrait. The only child of civil servants, Wu was drawn to numbers at an early age, majoring in mathematics at Nanjing University before coming to Columbia for her master’s in data science. Columbia News spoke to her about her research, her favorite concept in stats, and more.
What was it about generative models that caught your interest?
I realized that with statistics and modern AI, we could go beyond predicting stock prices or running a hypothesis test. We could build a model to understand how the world works. Cool applications of deep generative models, such as generative adversarial networks, or GANs, and variational autoencoders, or VAEs, include generating vivid, almost surreal, images and discovering new molecules. It’s a field with so many unknowns to explore.
What drew you to mathematics?
I like how counterintuitive mathematics can be. I watched my first math professor in college construct a Cantor set, which is a mathematical object with infinitely many points, zero length, and a fractional dimension. You construct it by manipulating the infinitely many numbers between 0 and 1. At that moment, my math professor seemed like a magician, revealing a concept that’s common in mathematics but doesn’t exist in the real world. It made me want to learn more.
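For readers curious about the construction Wu mentions, here is a minimal sketch. It assumes the standard "middle thirds" version of the Cantor set, which the interview does not spell out: start with the interval from 0 to 1 and repeatedly remove the open middle third of every remaining piece. The function names are illustrative only.

```python
from fractions import Fraction
import math


def cantor_step(intervals):
    """Remove the open middle third of every interval in the list."""
    result = []
    for left, right in intervals:
        third = (right - left) / 3
        result.append((left, left + third))    # keep the left third
        result.append((right - third, right))  # keep the right third
    return result


def cantor_intervals(depth):
    """Return the intervals that remain after `depth` removal steps."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(depth):
        intervals = cantor_step(intervals)
    return intervals


def total_length(intervals):
    """Sum the lengths of the remaining intervals."""
    return sum(right - left for left, right in intervals)


# Similarity dimension: each step leaves 2 copies, each scaled by 1/3.
CANTOR_DIMENSION = math.log(2) / math.log(3)

if __name__ == "__main__":
    for depth in range(6):
        pieces = cantor_intervals(depth)
        print(depth, len(pieces), total_length(pieces))
    print("dimension:", CANTOR_DIMENSION)
```

After n steps there are 2^n pieces with total length (2/3)^n. The number of pieces grows without bound while the total length shrinks toward zero, and the dimension, log 2 / log 3, falls between 0 and 1.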
Your first project was in neuroscience. Tell us about it.
I built a generative model to explain maternal behavior in mice. I was able to generate short-term behavioral sequences that looked realistic, but I couldn’t generate longer term sequences with solid scientific interpretations. I worked on this project for half a year without any publications.
What made you keep going? Advice for other young researchers struggling with failing experiments?
My advice is to remember that every research project gets stuck. That’s why it’s called research. I switched to an astronomy project that produced results and a second project with applications for modeling the gut microbiome. That built up my confidence. Sometimes it just takes longer to find the right project. Give yourself time and patience to try new things.
You spent the first year of your PhD program at home, in China. What was that like?
Physically, the hardest part was dealing with the 12-hour time difference. I either stayed up very late or got up early for classes and meetings. Emotionally, the biggest challenge was staying positive and motivated amid so much uncertainty. I wasn’t sure when the pandemic would be over and I could return to campus. At one point I thought I might finish my entire PhD at home.
You’re currently creating a dust density map of the Milky Way. Why?
Our galaxy is full of dust particles that get in the way of astronomical observations by making the stars appear dimmer and redder, and so farther away, than they actually are. In a collaboration with the Flatiron Institute, our goal is to build a dust density map of the Milky Way that will allow astronomers to correct their measurements with higher precision. The biggest challenge is scaling up the algorithm so it can work on millions of star observations at high resolution. We’re leveraging algebraic structures and novel numerical techniques to make the computations more efficient.
What's the coolest thing you've learned about astronomy so far?
How ancient starlight actually is. I was once looking at some star observations 5 kiloparsecs from Earth, which is about 16,000 light-years away. I couldn’t believe how far away in time and space that data was. When I mentioned this to my collaborator, she calmly told me that traveling across 16,000 light-years is only a second in the span of the universe. I came to realize how short human history is by comparison.
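The distance Wu quotes can be checked with a one-line conversion. This is a small sketch, assuming the commonly used approximation of 3.26156 light-years per parsec; the constant and function name are illustrative and do not come from the interview.

```python
# Approximate conversion factor (assumption): 1 parsec is about 3.26156 light-years.
LIGHT_YEARS_PER_PARSEC = 3.26156


def kiloparsecs_to_light_years(kiloparsecs):
    """Convert a distance in kiloparsecs to light-years."""
    return kiloparsecs * 1000 * LIGHT_YEARS_PER_PARSEC


if __name__ == "__main__":
    # 5 kiloparsecs comes out to roughly 16,308 light-years.
    print(kiloparsecs_to_light_years(5))
```

Rounded to the nearest thousand, that gives the 16,000 light-years mentioned above, which also means the starlight in those observations left its source roughly 16,000 years ago.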
What's your favorite concept in stats?
The manifold hypothesis, which says that high-dimensional data exist in low-dimensional space. I like this idea because it provides a way of simplifying a chaotic world. No matter how complex a dataset, a simple pattern can be found with the help of statistics and machine learning. Take the popular personality test, the Myers-Briggs. There are billions of people in the world, but according to the Myers-Briggs they all fit into 16 basic personality types.
Of all the places you’ve traveled, what was your favorite?
I traveled to Iceland over spring break with my boyfriend and made this video to share with family and friends. The landscape is like no other place in the world. I was awed by the waterfalls, glaciers, basalt columns, and endless fields of lava. Despite the harsh weather and isolated geography, the Icelanders I met seemed very hard-working and positive.
Impressive footage! What’s your secret?
Drones … and patience. It would be sunny one minute, then rainy, snowy, and windy the next. We tried to fly a pair of drones around Kirkjufell, which gets its name from its church-steeple-like shape. When we arrived, it was snowing heavily, so we headed to the hotel. But 10 minutes later, the weather turned sunny again so we returned to Kirkjufell. But then it started to snow! On our third try, we finally got to fly our drones.