The Data Science Institute announced recipients of the 2024 Data Science Institute Seed Funds today.

These awards support new research collaborations that bring together faculty and researchers across Columbia’s schools to apply data science to a range of disciplines.

This year’s funded projects span fields ranging from astronomy to epidemiology, and climate change to law, bringing solutions-oriented approaches to challenges outside the bounds of any single discipline. 

The transdisciplinary teams, which represent six Columbia schools, will accelerate their projects over the course of the next year, drawing on DSI’s extensive network of scientists, scholars, and technical experts. 

Read about this year’s projects funded by the Data Science Institute Seed Funds.

Bringing Data Science into Astronomy for Early Discoveries with the Vera Rubin Observatory

Jennifer L. Sokoloski, Research Scientist, Columbia Astrophysics Laboratory
Savannah Thais, Associate Research Scientist, Data Science Institute

When the Rubin Observatory’s Legacy Survey of Space and Time (LSST) begins full operations in Chile in 2025-2026, its unprecedented images of the southern sky will offer data about the cosmos at a scale and complexity never before seen in astrophysics.

To lay the groundwork for a decade of astrophysics discoveries that leverage this new resource, Sokoloski and Thais plan to use precursor data and simulated LSST data to design a machine-learning-based plan for research with LSST data. The plan will uncover an important, previously hidden population of binary stars and, in the process, conduct a census of accreting white dwarfs in our galaxy, with implications for the origin of supernovae used to study dark energy.

Detecting Emerging Opioid-Related Polysubstance Use Patterns, Motivation, and HIV and Overdose Narratives through Machine Learning Based Natural Language Processing: A Reddit Study 

Kechna Cadet, Postdoctoral Fellow, Epidemiology, Mailman School of Public Health
Silvia Martins, Professor of Epidemiology, Mailman School of Public Health
Smaranda Muresan, Research Scientist, Data Science Institute and Visiting Associate Professor, Barnard College

The social media platform Reddit offers surprisingly candid conversations about a range of public health issues, including substance use and other high-risk behaviors. To leverage these posts as a window into the experiences of populations at high risk for HIV, Cadet, Martins, and Muresan will analyze this text using human-assisted machine learning and natural language processing (NLP).

They will convert narratives into usable and meaningful information for analysis, exploring patterns and themes to uncover the reasoning behind behaviors. They will also work to determine whether their findings can be used to improve understanding of – and ultimately interventions for – real-life behaviors.

One Size Doesn’t Fit All: Developing Bespoke Building Decarbonization Plans

Bianca Howard, Assistant Professor of Mechanical Engineering
Anthony Vanky, Assistant Professor, Graduate School of Architecture, Planning and Preservation

While the impact of climate change is global, people’s experiences are diverse – from habits and attitudes to sensitivity to heat and air pollution – so effective decarbonization plans need a tailored approach.

Through household interviews, participatory sensor measurements, and machine learning techniques, Howard and Vanky will analyze qualitative and quantitative data to uncover patterns and trends that drive energy use and their relationship to individuals’ lived experiences. This work will open up new opportunities for greenhouse gas emission reductions through building- and community-specific decarbonization plans that are both holistic and justice-centered.

Policy Evaluation with Transfer Learning: How to Assess Safety Performance of Self-Driving Cars in NYC?

Kaizheng Wang, Assistant Professor of Industrial Engineering and Operations Research, Fu Foundation School of Engineering and Applied Science
Sharon Di, Associate Professor of Civil Engineering and Engineering Mechanics, Fu Foundation School of Engineering and Applied Science

Does successful deployment of self-driving cars on San Francisco streets mean the vehicles can be safely introduced into New York City traffic?  

To make that assessment, Wang and Di will develop innovative transfer learning methods to evaluate existing driving algorithms and construct a traffic simulator to analyze future ones. 

While their work aims specifically to help cities more safely adopt autonomous vehicles, the tools they propose to develop could be used in a broader range of scenarios where policy-makers use data from one domain to assess safety in another.  

Searching for Less Discriminatory Algorithms: Creating Technical Tools and Regulatory Frameworks

Emily Black, Assistant Professor, Computer Science, Barnard College
Talia Gillis, Associate Professor of Law and Milton Handler Fellow, Columbia Law School

Recent executive orders call for increased oversight to address algorithmic bias in applications ranging from consumer credit to housing to health care. Black and Gillis will partner to bridge the gap between legal anti-discrimination requirements and the disparate impact doctrine on one side, and the current ad hoc nature of many algorithmic fairness solutions on the other, to create new technical and legal frameworks.

In the first stage of their work, they will focus on fair lending law, exploring how current compliance regimes attempt to prevent algorithmic discrimination. They will then develop new frameworks that are sensitive to the full decision-making pipeline to search for less discriminatory algorithms.  

Do You Speak EMG? Generative Pre-training on Electromyographic Signals for Controlling a Rehabilitation Robot after Stroke

Matei Ciocarlie, Associate Professor of Mechanical Engineering, Fu Foundation School of Engineering and Applied Science
Carl Vondrick, YM Associate Professor of Computer Science, Fu Foundation School of Engineering and Applied Science
Joel Stein, Simon Baruch Professor of Physical Medicine and Rehabilitation; Chair, Department of Rehabilitation and Regenerative Medicine, Vagelos College of Physicians and Surgeons 

While generative learning may be best known for its disruptive role in large language models, Ciocarlie, Vondrick, and Stein want to bring it to bear on a new world of languages: those of the human body.

By advancing and applying generative learning to the understanding of electromyographic signals – or electrical activity in muscles – the team aims to develop a wearable robotic device that can sense what activity a user is trying to perform, offering real-time physical assistance to stroke survivors and other people with motor impairments.