Machine Learning

Machine Learning at Yale - External Web Site

Recent improvements in computers and communication technology have made staggering amounts of information available to us. Fortunately there has been a concomitant increase in processing power and a surge of novel optimization methods. One of the major challenges for computing lies in finding ways to organize the available processing power in order to extract maximum benefit from the available information. Different aspects of that problem are addressed by machine learning, natural language processing, knowledge discovery, and data mining.

Every field of human endeavor is affected by increasing amounts of available information, raising questions about how to extract useful insights from this data. For example, in bioinformatics, there are opportunities to use information about the human genome to design more effective medications, information from medical records to track drug side-effects or emerging diseases, and information from medical imaging to understand the structure and function of the brain and other organs. In robotics, major challenges lie in making use of visual and other sensed information to allow a robot to adapt flexibly to its environment. In the domain of e-commerce, all but the simplest artificial agents designed to gather information and act on behalf of human users will have to be equipped with the ability to adapt their behavior successfully to unexpected conditions. The very large number of documents and databases available on the web has fostered the development of natural language processing methods and tools for abstracting, extracting, and combining sources in a variety of ways to make the information more usable to humans. Such information may be presented to the human users as part of natural language dialogue, or it can be stored in a structured format, e.g. a relational database, for further analysis. See Natural Language Processing for more information.

Machine learning is driven by the goal of making programs or agents that exhibit useful learning behavior, autonomously or in cooperation with teams of other agents, either human or artificial. One goal is software that is easier to use, e.g., a word-processing program that can guess from an example or two what text transformation a user wishes to make. Machine learning can be used in the design of a team of soccer-playing robots to allow the team to adapt its tactics to the observed behavior of an opposing team. At the level of hardware, sophisticated processor designs can adapt to characteristics of the program currently being run to achieve increases in speed.

As learning components are increasingly included in software and embedded systems, it is important to have a clear understanding of their theoretical and practical strengths and limitations. Computational learning theory – a theoretical branch of machine learning–develops and studies algorithmic models of learning, using tools from analysis of algorithms, theory of computation, probability and statistics, game theory, and cryptography. Fundamental models and results in computational learning theory have established the learnability or non-learnability of a number of classes of concepts from various kinds of information. Many exciting questions and areas remain open, including learnability of probabilistic models, learnability in games and other cooperative and competitive situations, and aspects of learnability in natural language. As one example, in the context of e-commerce it is important to study how a population of agents can learn and use information about the trustworthiness of the other agents in the population.

Knowledge discovery and data mining address the question of extracting usable information from very large and heterogeneous sources of information. In this context, even a linear-time algorithm can be too slow. One promising approach to a collection of text documents is to treat it as a matrix of the occurrences of distinct words in the documents, and to approximate the information in the matrix using a low-rank singular value decomposition. This has motivated the search for fast provable approximations of important matrix-vector computations based on weighted sampling.

Optimization and sampling methods are the workhorse of modern machine learning and, more broadly, for a variety of computing and engineering tasks.  New vistas for foundational research in optimization and sampling have opened with the confluence of areas, emerging computational resources, and consequences of automation. The major challenge is to develop new optimization and sampling methods that can tackle the scale (in both collected data and parameter spaces of machine learning models), can overcome the non-convexity or non-Euclidean geometry inherent in many real-world problems, can deal with stochastic or adversarial perturbations in the data, and that can work in dynamic or time-varying environments. 

Moreover, with the reach of machine learning tools in a variety of societal and humanistic contexts, it has become important to design optimization and sampling methods that can incorporate also constraints such as fairness and privacy; see the Societal and Humanistic Aspects of Computation page for more on this.

Faculty members with interests in this area include Dana Angluin, Amin Karbasi, Smita Krishnaswamy, John Lafferty, Sahand Negahban, Dragomir Radev, Brian Scassellati, Daniel Spielman, Marynel Vázquez, Nisheeth Vishnoi, and Andre Wibisono.