Distributed Computing

Distributed computing is the field in computer science that studies the design and behavior of systems that involve many loosely coupled components. The components of such distributed systems may be multiple threads in a single program, multiple processes on a single machine, or multiple processors connected through a shared memory or a network. Distributed systems are unusually vulnerable to nondeterminism, where the behavior of the system as a whole or of individual components is hard to predict. Such unpredictability requires a wide range of new techniques beyond those used in traditional computing.

Like other areas in computer science, distributed computing spans a wide range of subjects from the applied to the very theoretical. On the theory side, distributed computing is a rich source of mathematically interesting problems in which an algorithm is pitted against an adversary representing the unpredictable elements of the system. Analysis of distributed algorithms often has a strong game-theoretic flavor, because executions involve a complex interaction between the algorithm’s behavior and the system’s responses.

Michael Fischer is one of the pioneering researchers in the theory of distributed computing. His work on using adversary arguments to prove lower bounds and impossibility results has shaped much of the research in the area. He is currently involved in the study of security issues in distributed systems, including cryptographic tools and trust management.

James Aspnes’s research emphasizes the use of randomization for solving fundamental problems in distributed computing. Many problems that turn out to be difficult or impossible to solve using a deterministic algorithm can be solved if processes can flip coins. Analyzing the resulting algorithms often requires using non-trivial techniques from probability theory.
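As a small illustration of why coin flips help, consider symmetry breaking among identical processes: if every process runs the same deterministic code from the same state in lockstep, they all make the same decision and no single leader can emerge, whereas random choices separate them quickly. The Python sketch below is a simplified simulation, not an algorithm taken from the research described here. It assumes synchronous rounds and assumes that every candidate learns whether any candidate flipped heads; the function name `elect_leader` and its parameters are hypothetical.

```python
import random


def elect_leader(n, rng):
    """Simulate coin-flip symmetry breaking among n identical processes.

    Assumptions (for illustration only): rounds are synchronous, and every
    candidate learns whether at least one candidate flipped heads.
    Returns (leader_index, rounds_used).
    """
    candidates = list(range(n))
    rounds = 0
    while len(candidates) > 1:
        rounds += 1
        # Each remaining candidate flips a fair coin.
        heads = [p for p in candidates if rng.random() < 0.5]
        # If anyone flipped heads, only those processes stay in the race.
        # If nobody did, everyone stays and the round is simply repeated.
        if heads:
            candidates = heads
    return candidates[0], rounds


if __name__ == "__main__":
    leader, rounds = elect_leader(8, random.Random())
    print(f"process {leader} elected after {rounds} round(s)")
```

Each round is expected to eliminate a constant fraction of the remaining candidates, so the expected number of rounds grows roughly logarithmically in n. Making that statement precise is a small instance of the probabilistic analysis mentioned above.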

Distributed systems research at Yale includes work both on programming language support for distributed computing and on the use of distributed systems techniques to support parallel programming. Such work is designed to lift some of the burden of understanding complex distributed systems from the shoulders of distributed system designers by letting the compiler or run-time libraries handle issues of scheduling and communication.

Zhong Shao’s FLINT project focuses on developing extensible, secure, and resilient distributed systems. Modern distributed systems involve very sophisticated consensus protocols and may manipulate critical data such as crypto keys, digital currency, and smart contracts. A bug in either the protocol design or the system implementation could lead to major security or safety breaches. Zhong’s group is particularly interested in applying formal verification technologies to build fully trustworthy ecosystems for modern distributed applications (DApps). His group has developed the world’s first fully verified concurrent operating system and hypervisor kernel (CertiKOS) using deep specification technologies. Zhong is interested in evolving CertiKOS into a full-blown certified distributed operating system capable of supporting modern certified DApps and microservices.

David Gelernter’s work on developing the Linda coordination language and related tools is an example of using distributed system techniques to support parallel programming. Linda provides a virtual “tuple space” through which processes can communicate without regard to location. His current Lifestreams project similarly simplifies information-management tasks, by freeing the user from many of the clerical duties imposed by traditional filesystems.
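The tuple-space idea can be sketched in a few lines of Python. This is a minimal, single-machine illustration built on threads, not Linda's actual implementation or API: the class name `TupleSpace`, the use of `None` as a wildcard, and the `timeout` parameter are assumptions made for the example. Linda's blocking removal operation is conventionally called `in`, which is a reserved word in Python, so the sketch names it `take`.

```python
import threading


class TupleSpace:
    """A toy in-memory tuple space shared by threads (illustrative only)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *tup):
        """Deposit a tuple and wake any threads waiting for a match."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _find(self, pattern):
        # A pattern matches a tuple of the same length when every field is
        # either equal to the tuple's field or is the wildcard None.
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                p is None or p == v for p, v in zip(pattern, tup)
            ):
                return tup
        return None

    def rd(self, *pattern, timeout=None):
        """Block until a matching tuple exists, then return it without removing it."""
        with self._cond:
            if not self._cond.wait_for(
                lambda: self._find(pattern) is not None, timeout
            ):
                raise TimeoutError("no matching tuple")
            return self._find(pattern)

    def take(self, *pattern, timeout=None):
        """Block until a matching tuple exists, then remove and return it."""
        with self._cond:
            if not self._cond.wait_for(
                lambda: self._find(pattern) is not None, timeout
            ):
                raise TimeoutError("no matching tuple")
            tup = self._find(pattern)
            self._tuples.remove(tup)
            return tup


def demo():
    """A producer and a worker cooperate without referring to each other."""
    space = TupleSpace()

    def worker():
        _, n = space.take("task", None, timeout=5)
        space.out("result", n * n)

    thread = threading.Thread(target=worker)
    thread.start()
    space.out("task", 7)
    result = space.take("result", None, timeout=5)
    thread.join()
    return result
```

In `demo`, the producer and the worker never name each other; they agree only on the shape of the tuples they exchange, which is the sense in which communication happens without regard to location.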

Avi Silberschatz specializes in transaction management techniques as they relate to both distributed database systems and multidatabase systems.

Anurag Khandelwal’s research focuses on building distributed cloud systems, tackling systems challenges around processing, storing, and serving so-called “big data”. His team has been exploring the design of distributed shared memory abstractions, as well as distributed resource management and scheduling techniques for emerging serverless and disaggregated data center architectures.