Daniel Hsu | Breathing Life into Machine Learning

January 12, 2014

In recent years, human interactions with intelligent machines and software have become increasingly commonplace.

This can be attributed to developments in machine learning, a branch of artificial intelligence concerned with building systems that can learn from data. Such systems include those that automatically sort your emails into spam folders and autocorrect your text messages.

Despite the obvious influx of artificial intelligence into our society, humans are still needed to teach machines how to learn. One of the humans at the forefront of innovation in machine learning is Daniel Hsu, assistant professor of computer science. Hsu has developed novel algebraic machine learning algorithms, and his research has been instrumental in advancing active learning and differential privacy. While Hsu explores multiple sub-areas of machine learning, he is widely recognized for his work with hidden Markov models (HMMs), statistical models with applications in automated speech recognition, natural language processing, computational biology, and robotics.

In 2009, together with Toyota Technological Institute at Chicago’s Sham Kakade (now with Microsoft) and Rutgers University’s Tong Zhang, Hsu developed a widely used spectral learning algorithm for the problem of learning HMMs. “With better functionality than previously known algorithms for learning HMMs, which were heuristic in nature and very slow to run on large data sets, this work received a lot of attention from both theoreticians and practitioners,” says Hsu. “The initial work provided a rigorous proof of computational learnability (that is, in principle, HMMs could be efficiently solved by computer programs), but much work was still needed to make the work usable in practice. Some of the most exciting follow-up work for the algorithm’s practical deployment was accomplished by my colleagues at Columbia Engineering.”

To address active learning problems, in which unlabeled data is abundant but labeling it by hand is expensive and time-consuming, Hsu also developed a new protocol that can reduce human labor costs and streamline machine learning processes in many domains.

“My work in active learning merges data collection and the learning process. The new algorithms only request annotations for the most informative data while ensuring that the machine learning process proceeds effectively, as if the machine had annotations for all of the data,” Hsu explains. “These algorithms have countless potential applications; for instance, a group at MIT applied one to the problem of classifying electrocardiographic recordings, a case where annotations are generally supplied by expert cardiologists.”
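For readers who want to see the idea in miniature, here is a minimal Python sketch of active learning on a toy problem. It is not one of Hsu's algorithms: it assumes noise-free labels and a single hidden threshold separating the two classes, and the names `active_learn_threshold`, `pool`, and `oracle` are hypothetical, chosen only for illustration. The learner uses binary search to decide which points to send to the annotator, so it asks for only a handful of labels yet ends up labeling the whole pool as if every point had been annotated.

```python
def active_learn_threshold(pool, oracle):
    """Label a pool of numbers while querying the oracle as rarely as possible.

    Assumption (illustrative only): every point below some hidden threshold
    has label 0, every point at or above it has label 1, and the oracle
    (the human annotator) never makes a mistake.
    """
    points = sorted(pool)
    queries = 0
    # Invariant: points[:lo] are known to be 0, points[hi:] are known to be 1.
    lo, hi = 0, len(points)
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1  # one request for a human annotation
        if oracle(points[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    # Every point can now be labeled without asking the oracle again.
    predicted = {x: int(i >= lo) for i, x in enumerate(points)}
    return predicted, queries


# Toy usage: 1,000 unlabeled points and a hypothetical hidden threshold of 0.37.
pool = [i / 1000 for i in range(1000)]
oracle = lambda x: int(x >= 0.37)

predicted, queries = active_learn_threshold(pool, oracle)
agreement = sum(predicted[x] == oracle(x) for x in pool)
print(f"labels requested: {queries} of {len(pool)}")
print(f"points labeled correctly: {agreement} of {len(pool)}")
```

On a pool of 1,000 points this sketch never requests more than 10 labels. Real annotations are rarely this clean, and handling noisy or ambiguous labels takes considerably more care than this toy example shows.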

Today, Hsu also focuses his research on differential privacy, which concerns the risk of machines leaking sensitive data. This research examines the trade-off between privacy protection and statistical estimation accuracy, and the gaps it reveals between what can be accomplished with and without privacy protection.
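The trade-off between privacy and accuracy can be illustrated with a small Python sketch. This is the textbook Laplace mechanism applied to an average, not Hsu's own method. It assumes every value lies between known bounds `lower` and `upper` and that the number of records is public; the function names and the privacy parameter values are hypothetical choices for the demonstration. A smaller `epsilon` means stronger privacy protection and, as the sketch shows, a noisier estimate.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws, each with mean
    # `scale`, follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, epsilon, lower, upper, rng):
    """Release the mean of `values` with Laplace noise added.

    Assumption: changing one record moves the clipped mean by at most
    (upper - lower) / n, so noise with scale sensitivity / epsilon is used.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    sensitivity = (upper - lower) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


def average_error(values, epsilon, trials=200, seed=0):
    # Average absolute gap between the private and the exact mean.
    rng = random.Random(seed)
    true_mean = sum(values) / len(values)
    errors = [
        abs(private_mean(values, epsilon, 0.0, 1.0, rng) - true_mean)
        for _ in range(trials)
    ]
    return sum(errors) / trials


# Toy usage: 100 made-up values in [0, 1], two hypothetical privacy levels.
data_rng = random.Random(1)
data = [data_rng.random() for _ in range(100)]
for epsilon in (0.1, 10.0):
    print(f"epsilon={epsilon}: average error {average_error(data, epsilon):.5f}")
```

The stronger privacy setting (the smaller `epsilon`) produces the larger average error, which is the kind of gap between private and non-private estimation that this line of research studies.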

Before joining Columbia Engineering in 2013, Hsu conducted postdoctoral research for Microsoft, worked as a postdoctoral associate at Rutgers University, and served as a visiting postdoctoral scholar at the University of Pennsylvania. He is a faculty affiliate of Columbia’s Institute for Data Sciences and Engineering and the Institute’s Foundations of Data Science Center.

BS, University of California Berkeley, 2004; PhD, University of California San Diego, 2010