Tushar Krishna, Georgia Tech
Host: Abhishek Bhattacharjee
Title: Communication-centric System Architectures for AI Acceleration
AI has become pervasive in our lives today. Given the heavy computational demands of AI, there has been sustained investment from industry, academia, and government programs in creating specialized compute acceleration systems for running AI efficiently across edge, HPC, and datacenter deployments. Several open challenges and emerging opportunities remain in designing efficient AI systems. On the workload end, emerging AI models are increasingly complex, with diverse shapes and sparsity levels driven in part by techniques like neural architecture search, and are increasingly memory-bound. The size of AI models has also been growing exponentially, especially in domains like language translation and recommendations. These trends necessitate careful partitioning of AI models and/or datasets across multiple AI accelerators, and careful staging of data accesses through the memory hierarchy onto the compute units. Unfortunately, the resultant data movement is a key bottleneck in AI systems today, showing up as energy and performance overheads. On the technology end, recent trends such as chiplets, wafer-scale architectures, and CXL-based memory pools open up several novel avenues for system optimization. The resultant design spaces, especially for HW-SW co-design, are extremely large, necessitating sample-efficient search techniques to find optimized design points.
In my talk, I will present my vision for addressing these challenges via a systematic “communication-centric” approach to AI system design, in which we propose configurable communication fabric(s) within the system and co-optimize them along with the rest of the HW-SW stack. We have demonstrated several instances of this approach across diverse AI workloads and scales (on-chip, on-package, on-wafer, on-rack) for both inference and training. I will also briefly present some open-source simulation infrastructures we developed and released to enable this research, which are now actively used by academia, industry, and national labs.
Tushar Krishna is an Associate Professor in the School of Electrical and Computer Engineering at Georgia Tech. He also serves as an Associate Director of the Center for Research into Novel Computing Hierarchies (CRNCH). He held the ON Semiconductor (Endowed) Junior Professorship from 2019 to 2021. He received a Ph.D. in Electrical Engineering and Computer Science from MIT (2014), an M.S.E. in Electrical Engineering from Princeton University (2009), and a B.Tech. in Electrical Engineering from the Indian Institute of Technology (IIT) Delhi (2007). Before joining Georgia Tech in 2015, Dr. Krishna spent a year as a researcher in the VSSAD group at Intel in Massachusetts.
Dr. Krishna’s research spans computer architecture, interconnection networks, networks-on-chip (NoC), and deep learning accelerators, with a focus on optimizing data movement in modern computing systems. His research is funded by multiple awards from NSF, DARPA, IARPA, the Department of Energy, Intel, Google, Facebook, Qualcomm, TSMC, and SRC. His papers have been cited over 11,000 times. Three of his papers have been selected for IEEE Micro’s Top Picks from Computer Architecture, one more received an honorable mention, and four have won best paper awards. He was inducted into the HPCA Hall of Fame in 2022. He received the “Class of 1940 Course Survey Teaching Effectiveness” Award from Georgia Tech in 2018 and the “Roger P. Webb Outstanding Junior Faculty Award” from the School of ECE at Georgia Tech in 2021.