Ashutosh Dhar

PhD | Computer Architect
Electrical and Computer Engineering
University of Illinois, Urbana-Champaign

About Me


I received my PhD in Electrical and Computer Engineering from the University of Illinois, Urbana-Champaign, where I was advised by Prof. Deming Chen. My primary research area is computer architecture, with a focus on reconfigurable and heterogeneous architectures. My work explores the application of reconfiguration to conventional architectures, particularly GPU and multi-core architectures, with an emphasis on micro-architectural support for reconfiguration. It is informed by a strong background in computer architecture, digital circuit design, and accelerator development using GPUs, FPGAs, and CGRAs. My recent work examines deep learning accelerators and their interaction with the memory subsystem.


News

  1. [2021] Our journal paper, "DML: Dynamic Partial Reconfiguration with Scalable Task Scheduling for Multi-Applications on FPGAs," has been accepted to the IEEE Transactions on Computers. It is available now via early access!
  2. [2021] Our work on a novel reconfigurable memory-compute fabric, Graviton, was accepted at ARC 2021.
  3. [2021] I was featured in the UIUC C3SR center newsletter's Team Member Spotlight.
  4. [2021] I defended my PhD dissertation!
  5. [2020] I gave a talk about my graduate research at Facebook Research (candidate talk).
  6. [2020] I gave a talk about my graduate research at AMD Research (candidate talk).
  7. [2020] I was invited to give a talk at the IBM 5th Workshop on the Future of Computing Architectures (FOCA 2020).
  8. [2020] FReaC Cache was accepted at MICRO 2020!
  9. [2020] Our work on ILP-based partial reconfiguration scheduling won the Best Paper Award at VLSID 2020!
  10. [2020] Our work on ILP-based partial reconfiguration scheduling was accepted at VLSID 2020!
  11. [2019] Invited to present our paper on near-memory and in-storage FPGA acceleration for emerging cognitive computing workloads at ISVLSI 2019.
  12. [2018] Our work on Memory Channel Networks was nominated for the Best Paper Award at MICRO 2018.
  13. [2018] Our work on Memory Channel Networks was accepted at MICRO 2018.
  14. [2018] I will be an intern in NVIDIA Research's architecture group this summer.
  15. [2017] I will be an intern at IBM's T. J. Watson Research Center this summer.
  16. [2017] Our work on efficient GPGPU reconfiguration was accepted at FCCM 2017.
  17. [2016] I will be an intern on NVIDIA's GPU architecture team this summer.
  18. [2015] Presented our poster on configurable on-chip learning via in-memory computing and RRAM at ICCAD's HALO Workshop.
  19. [2015] Our work on multi-level PCM-based FPGAs was accepted at DATE 2015.
  20. [2014] Presented our paper on GPU thread structure optimization and workload allocation at SRC TECHCON 2014.


Publications

Here is a list of my recent research publications.

  1. [2020] Xinheng Liu, Cong Hao, Yao Chen, Ashutosh Dhar, and Deming Chen, "Wino-SA: Efficient Systolic Architecture for Winograd Convolution," Proceedings of SRC TECHCON, September 2020.

  2. [2020] Ashutosh Dhar, Xiaohao Wang, Hubertus Franke, Jinjun Xiong, Jian Huang, Wen-mei Hwu, Nam Sung Kim, and Deming Chen, "FReaC Cache: Folded Logic Reconfigurable Computing in the Last Level Cache," Proceedings of the IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020.

  3. [2020] Ashutosh Dhar, Mang Yu, Wei Zuo, Xiaohao Wang, Nam Sung Kim, and Deming Chen, "Leveraging Dynamic Partial Reconfiguration with Scalable ILP Based Task Scheduling," Proceedings of the IEEE 33rd International Conference on VLSI Design and 19th International Conference on Embedded Systems (VLSID), 2020. (Best Paper Award)

  4. [2019] Cong Hao, Yao Chen, Xinheng Liu, Atif Sarwari, Daryl Sew, Ashutosh Dhar, Bryan Wu, Dongdong Fu, Jinjun Xiong, Wen-mei Hwu, Junli Gu, and Deming Chen, "NAIS: Neural Architecture and Implementation Search and its Applications in Autonomous Driving," Proceedings of IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 2019. (Invited)

  5. [2019] Ashutosh Dhar, Sitao Huang, Jinjun Xiong, Damir Jamsek, Bruno Mesnet, Jian Huang, Nam Sung Kim, Wen-mei Hwu, and Deming Chen, "Near-Memory and In-Storage FPGA Acceleration for Emerging Cognitive Computing Workloads," Proceedings of IEEE Computer Society Annual Symposium on VLSI (ISVLSI), July 2019. (Invited)

  6. [2018] M. Alian, S. Min, H. Asgharimoghaddam, A. Dhar, et al., "Application-Transparent Near-Memory Processing Architecture with Memory Channel Network," 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018.

  7. [2017] A. Dhar and D. Chen, "Efficient GPGPU Computing with Cross-Core Resource Sharing and Core Reconfiguration," Proceedings of the IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2017.

  8. [2015] A. Dhar and D. Chen, "Neuromorphic Architecture Inspired Fast, Efficient and Configurable On-Chip Learning Via In-Memory Computing and RRAM," Poster, Workshop on Hardware and Algorithms for Learning On-a-chip (HALO) at ICCAD, 2015.

  9. [2015] C. Wei, A. Dhar, and D. Chen, "A Scalable and High-Density FPGA Architecture with Multi-Level Phase Change Memory," Proceedings of Design, Automation and Test in Europe (DATE), 2015.

  10. [2014] J. Wang, A. Dhar, D. Chen, Y. Liang, Y. Wang, and B. Guo, "Workload Allocation and Thread Structure Optimization for MapReduce on GPUs," Proceedings of SRC Technical Conference (TECHCON), September 2014.

Work Experience

I've had the opportunity to work with and learn from some great people through several very cool internships over the last few years.

NVIDIA Research (Architecture Research Group)

(May 2018 - August 2018): Austin, TX

Worked on optimizing on-chip memory organizations for deep learning accelerators. My work focused on finding an organization that would be suitable for a range of deep learning models. We explored the trade-offs between capacity, performance, and organization for a variety of memory structures, as well as hierarchies in the memory system, on a case-by-case basis to understand the needs of individual DL models. In addition, we explored the impact of different compute organizations on memory-system design.

IBM Research

(June 2017 - August 2017): IBM T J Watson Research Center, Yorktown Heights, NY

Worked on acceleration and optimization of massively parallel and distributed training of deep networks. My work focused on compression algorithms for accelerating distributed training, with an emphasis on reducing the communication overhead involved in large scale distributed training. I studied compression techniques on a variety of deep networks that could be deployed in a scalable and GPU-friendly fashion. The work was integrated into a proprietary deep learning infrastructure toolchain.
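To illustrate the general idea of communication-reducing gradient compression (this is a minimal sketch of one well-known technique, top-k sparsification, not the proprietary method integrated into IBM's toolchain), each worker sends only the k largest-magnitude gradient entries as (index, value) pairs, and the receiver reconstructs a sparse gradient:

```python
def topk_compress(grad, k):
    """Keep only the k largest-magnitude entries; transmit (index, value) pairs."""
    idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return [(i, grad[i]) for i in idx]

def topk_decompress(pairs, n):
    """Rebuild a dense length-n gradient, zero-filling the dropped entries."""
    out = [0.0] * n
    for i, v in pairs:
        out[i] = v
    return out

# Example: send 2 of 4 entries, cutting communication volume in half.
pairs = topk_compress([0.1, -2.0, 0.05, 3.0], k=2)
grad = topk_decompress(pairs, n=4)  # [0.0, -2.0, 0.0, 3.0]
```

In practice such schemes also accumulate the dropped residual locally so that small gradients are not lost permanently, which is key to preserving convergence at scale.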

NVIDIA, GPU Architecture Intern

(May 2016 - August 2016): Santa Clara, CA

Worked on performance modeling of new features in next-generation GPUs and systems. My work focused on building a new performance model and simulation infrastructure for newly added features. The simulator was developed from scratch and will serve as the base infrastructure for future architectures. The simulation infrastructure was designed to be highly scalable, fast, and cycle-accurate.
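The core idea behind a cycle-accurate model is that every hardware structure advances in lockstep, one cycle at a time. As a toy illustration (purely hypothetical, not NVIDIA's internal infrastructure), here is a minimal two-stage in-order pipeline model in Python:

```python
class Pipeline:
    """Toy two-stage in-order pipeline: fetch -> execute, one instruction per cycle."""

    def __init__(self, program):
        self.program = list(program)   # instructions waiting to be fetched
        self.fetched = None            # instruction currently in the fetch stage
        self.retired = []              # instructions that have completed execution
        self.cycle = 0

    def step(self):
        """Advance the entire model by exactly one cycle."""
        if self.fetched is not None:   # execute stage retires last cycle's fetch
            self.retired.append(self.fetched)
        self.fetched = self.program.pop(0) if self.program else None
        self.cycle += 1

    def run(self):
        """Run until the pipeline drains; return total cycle count."""
        while self.program or self.fetched is not None:
            self.step()
        return self.cycle

sim = Pipeline(["add", "mul", "ld"])
cycles = sim.run()  # 3 instructions plus 1 drain cycle -> 4 cycles
```

A real simulator models many more structures (caches, queues, schedulers) with the same per-cycle `step` discipline, which is what makes the model cycle-accurate rather than purely analytical.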

Cisco Systems Inc.

(May 2013 - August 2013): Silicon Engineering, Enterprise Networking Group, San Jose, CA

Worked on the power analysis of Cisco's next-generation switching ASIC. I focused on developing a power analysis flow using Synopsys's PrimeTime PX. My work involved selecting cases and running tests/simulations to stress blocks, synthesizing blocks-under-test, and analyzing power and clock-gating effectiveness under stress as well as under nominal and idle states. To enable this, I integrated hooks into the verification environment, integrated vendor power libraries and constraints, and developed scripts to automate tasks and generate reports. Analysis was done for actual power on gate-level netlists, with a comparative analysis between netlists before and after place-and-route, DFT insertion, and clock-tree creation.


Teaching

ECE498ICC - Internet of Things

(Spring 2019): Prof Deming Chen, Prof Wen-mei Hwu, Prof Jinjun Xiong

As the graduate teaching assistant for the class, I was responsible for holding regular office hours as well as lab sections. In my role as a TA, I assisted in the creation and grading of homeworks, labs, and exams, and helped develop lab material covering concepts in machine learning, deep learning, Python, IoT devices, and accelerators.

ECE527 - System on Chip Design

(Fall 2015, Fall 2017): Prof Deming Chen

As the graduate teaching assistant for the class, I was responsible for assisting the professor with course logistics and helping students grasp key concepts. I helped develop a whole new set of machine problems and teaching material for the course, centered around the Xilinx Zynq SoC platform, teaching students concepts relating to high-level synthesis, SoC design, DMA-based transfers, accelerator development, and hardware-software co-design.

ECE110 - Introduction to Electronics

(Spring 2013, Fall 2013, Spring 2014, Fall 2014, Spring 2015, Spring 2016, Spring 2018, Fall 2019): Dr. Patricia Franke, Dr. Christopher Schmitz

I was a graduate teaching assistant for the course, focusing on the laboratory portion of the course. My responsibilities included providing brief lectures on key concepts relating to the lab and supervising and assisting students with lab work. The course focused on introductory concepts of Electrical Engineering from basic circuit analysis to logic design.


Contact Me


adhar2 {at} illinois {dot} edu