Speaker: Nick Duffield, Texas A&M University
Title: The cost and benefit of reducing Big Data size and complexity
Host: Joan Feigenbaum
Sampling is a powerful approach to reducing Big Data to Small Data: it relieves storage demands and enables faster query response when an approximate answer suffices. The first part of this talk describes a cost-based formulation for optimal data reduction that is used by a major ISP, along with some new applications to subgraph counting in graph streaming. The second part focuses on the use of machine learning methods to model the complex dependence between Internet user experience and the systems that provide services, and on how this knowledge can be used to improve those services. The talk also touches on the dependence between the foundations and applications of data science, and on the costs and benefits of interdisciplinary data science research.