Ashutosh Dhar

PhD Candidate
Electrical and Computer Engineering
University of Illinois, Urbana-Champaign

About Me


I'm a PhD candidate in the Electrical and Computer Engineering department at the University of Illinois, Urbana-Champaign, advised by Prof. Deming Chen. My primary area of research is Computer Architecture, with a focus on Reconfigurable and Heterogeneous architectures. I explore the application of reconfiguration in conventional architectures, particularly GPU and multi-core architectures, with an emphasis on providing micro-architecture support for reconfiguration. This work is informed by a strong background in computer architecture, digital circuit design, and accelerator development using GPUs, FPGAs, and CGRAs. My recent work is motivated by deep learning accelerators and their interaction with the memory subsystem.

Quick Links

Here's a copy of my CV.
Here's a copy of my two-page resume.


Here's a list of my recent research publications:

  1. [2020] Xinheng Liu, Cong Hao, Yao Chen, Ashutosh Dhar, and Deming Chen, "Wino-SA: Efficient Systolic Architecture for Winograd Convolution," Proceedings of SRC TECHCON, September 2020.

  2. [2020] Ashutosh Dhar, Xiaohao Wang, Hubertus Franke, Jinjun Xiong, Jian Huang, Wen-mei Hwu, Nam Sung Kim, Deming Chen, "FReaC Cache: Folded Logic Reconfigurable Computing in the Last Level Cache", To appear in International Symposium on Microarchitecture (MICRO), 2020

  3. [2020] Ashutosh Dhar, Mang Yu, Wei Zuo, Xiaohao Wang, Nam Sung Kim and Deming Chen, "Leveraging Dynamic Partial Reconfiguration with Scalable ILP Based Task Scheduling", Proceedings of IEEE 2020 33rd International Conference on VLSI Design and 2020 19th International Conference on Embedded Systems (VLSID). (Best Paper Award)

  4. [2019] Cong Hao, Yao Chen, Xinheng Liu, Atif Sarwari, Daryl Sew, Ashutosh Dhar, Bryan Wu, Dongdong Fu, Jinjun Xiong, Wen-mei Hwu, Junli Gu, and Deming Chen, "NAIS: Neural Architecture and Implementation Search and its Applications in Autonomous Driving," Proceedings of IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 2019. (Invited)

  5. [2019] Ashutosh Dhar, Sitao Huang, Jinjun Xiong, Damir Jamsek, Bruno Mesnet, Jian Huang, Nam Sung Kim, Wen-mei Hwu, and Deming Chen, "Near-Memory and In-Storage FPGA Acceleration for Emerging Cognitive Computing Workloads," Proceedings of IEEE Computer Society Annual Symposium on VLSI (ISVLSI), July 2019. (Invited)

  6. [2018] M. Alian, S. Min, H. Asgharimoghaddam, A. Dhar, et al., "Application-Transparent Near-Memory Processing Architecture with Memory Channel Network", 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018

  7. [2017] A. Dhar and D. Chen, "Efficient GPGPU Computing with Cross-Core Resource Sharing and Core Reconfiguration", Proceedings of IEEE International Symposium on Field-Programmable Custom Computing Machines, (FCCM) 2017

  8. [2015] A. Dhar and D. Chen, "Neuromorphic Architecture Inspired Fast, Efficient and Configurable On-Chip Learning Via In-Memory Computing and RRAM", Poster, 2015 Workshop on Hardware and Algorithms for Learning On-a-chip (HALO), (ICCAD) 2015

  9. [2015] C. Wei, A. Dhar and D. Chen, "A Scalable and High-Density FPGA Architecture with Multi-Level Phase Change Memory", Proceedings of Design, Automation and Test in Europe, (DATE) 2015

  10. [2014] J. Wang, A. Dhar, D. Chen, Y. Liang, Y. Wang, and B. Guo, "Workload Allocation and Thread Structure Optimization for MapReduce on GPUs," Proceedings of SRC Technical Conference (TECHCON), September 2014.

Work Experience

I've had the opportunity to work and learn with some great people via some very cool internships over the last few years.

NVIDIA Research (Architecture Research Group)

(May 2018 - August 2018): Austin, TX

Worked on optimizing on-chip memory organizations for deep learning accelerators. My work focused on finding an organization that would be suitable for a range of deep learning models. We explored the trade-offs between capacity, performance, and organization for a variety of memory structures, as well as hierarchies in the memory system, on a case-by-case basis to understand the needs of individual DL models. In addition, we explored the impact of different compute organizations on memory-system design.

IBM Research

(June 2017 - August 2017): IBM T J Watson Research Center, Yorktown Heights, NY

Worked on acceleration and optimization of massively parallel and distributed training of deep networks. My work focused on compression algorithms for accelerating distributed training, with an emphasis on reducing the communication overhead involved in large-scale distributed training. I studied compression techniques on a variety of deep networks that could be deployed in a scalable and GPU-friendly fashion. The work was integrated into a proprietary deep learning infrastructure toolchain.

NVIDIA, GPU Architecture Intern

(May 2016 - August 2016): Santa Clara, CA

Worked on performance modeling of new features in next-generation GPUs and systems. My work focused on building a new performance model and simulation infrastructure for newly added features. The simulator was developed from scratch and will serve as the base infrastructure for future architectures. The simulation infrastructure was developed to be highly scalable, fast, and cycle-accurate.

Cisco Systems Inc.

(May 2013 - August 2013): Silicon Engineering, Enterprise Networking Group, San Jose, CA

Worked on the power analysis of Cisco's next-generation switching ASIC. I focused on developing a power analysis flow using Synopsys's PrimeTime PX. My work involved selecting cases and running tests/simulations to stress blocks, synthesizing blocks-under-test, and analyzing the power and clock gating effectiveness under stress as well as under nominal and idle states. To enable this, I integrated hooks into the verification environment, integrated vendor power libraries and constraints, and developed scripts to automate tasks and create reports. Analysis was done for actual power on gate-level netlists, with a comparative analysis between netlists before and after place-and-route, DFT insertion, and clock tree creation.

Contact Me


adhar2 {at} illinois {dot} edu