Programming Languages and Compilers

Programming Languages and Verification research at Yale emphasizes expressive, efficient, flexible, secure, and reliable programming environments for future information, computation, and communication systems. We approach this problem from several directions including language design, formal methods, compiler implementation, programming environments, and run-time systems. A major focus of the research at Yale is to build secure, error-free programs, as well as develop frameworks that help others achieve that same goal.

Faculty working in this area:

faculty email website
Timos Antonopoulos  
David Gelernter  
Jay Lim  
Ruzica Piskac ROSE
Abhishek Bhattacharjee Bhattacharjee Group
Zhong Shao FLINT
Alan Weide  


Highlights in this area:

Zhong Shao’s research interest is, very broadly, on building reliable and secure computer software. The computer industry has shown explosive growth in the past thirty years. Software dependability, however, falls far behind everything else because we do not have good languages and tools that can take precise control of the ever-growing complexity in modern computer systems. During the last few years, Professor Shao’s FLINT group has been developing and applying new language-based technologies to build certified system software (e.g., OS kernels and hypervisors). Certified software consists of a binary machine executable plus a rigorous machine-checkable proof that the software is free of bugs with respect to specific requirements. A certified OS kernel is a library-based kernel but with formal specifications and proofs about all of its abstraction layers and system libraries. The FLINT group is currently working on the following three important questions: (1) Under a clean-slate approach, what are the right OS kernel structures that can offer the best support for resilience, extensibility, and security? (2) What are the best programming languages and developing environments for implementing such certified low-level programs? (3) What new formal methods we need to develop in order to support these new languages and make certified programming both practical and scalable?

Ruzica Piskac’s research interests focus broadly on the topic of software synthesis and verification. Software errors in general cost the global economy a trillion dollars annually, based on 2017 estimates. Detecting and preventing software errors plays a major role in the development process, with programmers using techniques like testing, debugging, and verification, and as a result, automating a lot of these techniques is crucial. The different projects Professor Piskac’s group is working on include (1) automatically learning specifications for complex, unstructured languages such as database configuration files and continuous integration configurations, and Infrastructure as Code, (2) work on synthesis of Functional Reactive Programming programs from logical specifications that describe safety and security properties, (3) work on modular verification engine for Haskell programs, which automatically solves symbolic constraints on function inputs to fully explore code fragments, (4) privacy preserving formal methods, where the the goal is to provide cryptographic protocols for evaluating classic problems in formal methods - such as model checking and satisfiability solving – while maintaining privacy over user inputs. More information about research can be found at ROSE group’s website.

Abhishek Bhattacharjee and his students are interested in building next-generation operating systems for emerging heterogeneous computer systems. The waning of Moore’s Law and Dennard scaling as well as the widespread success of AI have prompted systems designers to embrace heterogeneity in hardware. Today’s systems integrate tightly- and loosely-coupled accelerators, ranging from GPUs, TPUs, neural network hardware, DSPs, etc., a swath of heterogeneous memory devices, ranging from high-bandwidth and non-volatile memories, with a complex amalgam of operating systems, drivers, firmware, runtimes, and languages. Abhishek’s group is currently studying how best to combine existing abstractions in production-scale compilers (e.g., LLVM) with domain-specific knowledge of emerging fields that will influence next-generation AI (e.g., cognitive neuroscience) to accelerate large-scale scientific models. 

On a larger scale, David Gelernter’s software ensemble research is the study of programs that are built out of many separate, coordinated activities, with an emphasis on recognizing and understanding the properties that all such systems share, versus the more widespread practice of dividing the field into intellectually disjoint sub-areas. Ensemble computing includes the study of: (1) parallel programs, which focus many computers on a single problem in order to solve it faster; (2) distributed programs, which attempt to integrate many separate, autonomous computers; (3) ensembles, which include time- as opposed to space-separated activities and which pose new research challenges in the areas of file systems and databases, and; (4) the relationship between software ensembles and naturally occurring ones, as for example in economic, biological, or physical systems. All of these areas pose both theoretical and pragmatic questions. The search for a formal definition of “coordination language,” and the development of Internet applications are new areas of focus in the software ensemble research effort.

Broadly, Jay Lim’s research interest lies in developing abstractions and techniques to create correct, safe, and secure low level systems. Jay focuses on systems where there are large gaps between the perception of the general public and the actual truth. Particularly, there are many systems in which the general public considers to be a safe and trusted system. However, in reality, these systems are far from trustable, riddled with bugs, security vulnerabilities, and incorrect outputs. In the past, Jay worked on techniques to ensure that applying both compiler optimizations and sanitizer transformations do not introduce unexpected behavior. He also worked on techniques to help developers hand-write correct, secure, and optimized programs in assembly directly. Currently, his focus has shifted to numerical programs, number representations, and floating point representation. He is interested in answering three questions: (1) Can we design math libraries that produce correct answers for elementary functions? (2) How can we improve the accuracy of computation heavy numerical programs? And (3) Can we formalize the behavior of floating point representation? (4) Can we design an efficient, hardware supported representation that truly models real numbers, rather than approximate real numbers?