CS Talk - Frank Wood
Host: Daniel Spielman
Coffee/tea - 10:15, BCT MC035
Title: “Revolutionizing Decision Making, Democratizing Data Science, and Automating Machine Learning via Probabilistic Programming (and One Example Language: Anglican)”
Abstract: Probabilistic programming aims to enable the next generation of data scientists to easily and efficiently create the kinds of probabilistic models needed to inform decisions and accelerate scientific discovery in the realm of big data and big models.
Model creation and the learning of probabilistic models from data are key problems in data science. Probabilistic models are used for forecasting, filling in missing data, outlier detection, cleanup, classification, and scientific understanding of data in every academic field and every industrial sector. While much work in probabilistic modeling has been based on hand-built models and laboriously derived inference methods, future advances in model-based data science will require the development of much more powerful automated tools than currently exist.
In the absence of such automated tools, probabilistic models have traditionally co-evolved with methods for performing inference. In both academic and industrial practice, specific modeling assumptions are made not because they are appropriate to the application domain, but because they are required to leverage existing software packages or inference methods. This intertwined nature of modeling and computation leaves much of the promise of probabilistic modeling out of reach for even expert data scientists. The emerging field of probabilistic programming will reduce the technical and cognitive overhead associated with writing and designing novel probabilistic models by both introducing a programming (modeling) language abstraction barrier and automating inference.
The automation of inference, in particular, will lead to massive productivity gains for data scientists, much as high-level programming languages and advances in compiler technology have transformed software developer productivity. What is more, not only will traditional data science be accelerated, but the number and kind of people who can do data science will also increase dramatically.
My talk will touch on all of this, explain how to develop such probabilistic programming languages, highlight some exciting ways such languages are starting to be used, and introduce what I think are some of the most important challenges facing the field as we go forward.
Dr. Wood is an associate professor in the Department of Engineering Science at the University of Oxford. Before that, Dr. Wood was an assistant professor of Statistics at Columbia University and a research scientist at the Columbia Center for Computational Learning Systems. He was formerly a postdoctoral fellow at the Gatsby Computational Neuroscience Unit at University College London under Dr. Yee Whye Teh. He received his PhD in computer science from Brown University under the supervision of Dr. Michael Black and Dr. Tom Griffiths.
Dr. Wood is a product of the Illinois Mathematics and Science Academy, from which he graduated in 1992. He began college at the University of Illinois at Chicago (UIC) but transferred and received a B.S. in computer science from Cornell University in 1996. Prior to his academic career, he was a successful entrepreneur, having run and sold the content-based image retrieval company ToFish! to Time Warner and served as CEO of Interfolio. He started his career working at the Cornell Theory Center and subsequently at the Lawrence Berkeley National Laboratory.
Dr. Wood holds 6 patents, has authored over 40 papers, received the AISTATS best paper award in 2009, and has been awarded faculty research awards from Xerox, Google and Amazon.