
Research


My interests are in the design of programming languages/models, compilers, and runtimes for multicores, distributed-memory clusters, and accelerators, with an emphasis on automatic parallelization and high performance. Computational domains of particular interest to me include stencil computations, image processing pipelines, dense linear algebra, and deep neural networks.

Research Tools / Software

Multicore Computing Lab - my group's page

Publications

Google Scholar profile, DBLP, BibTeX

  1. Optimizing Geometric Multigrid Computation using a DSL Approach
    Vinay Vasista, Kumudha KN, Siddharth Bhat, Uday Bondhugula
    Supercomputing (SC), Nov 2017 (to appear).

  2. Diamond Tiling: Tiling Techniques to Maximize Parallelism for Stencil Computations
    Uday Bondhugula, Vinayaka Bandishti, Irshad Pananilath
    IEEE Transactions on Parallel and Distributed Systems (TPDS), pg 1285-1298, Vol 28, Issue 5, May 2017.
    (extended version of SC'12 paper)


  3. A DSL Compiler for Accelerating Image Processing Pipelines on FPGAs
    Nitin Chugh, Vinay Vasista, Suresh Purini, Uday Bondhugula
    IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT 2016), Sep 2016.

  4. Compiling Affine Loop Nests for a Dynamic Scheduling Runtime on Shared and Distributed Memory [PDF]
    Roshan Dathathri, Ravi Teja Mullapudi, Uday Bondhugula
    ACM Transactions on Parallel Computing (TOPC), vol 3, issue 2, Jul 2016.

  5. SMO: An Integrated Approach to Intra-Array and Inter-Array Storage Optimization [PDF]
    Somashekaracharya Bhaskaracharya, Uday Bondhugula, Albert Cohen
    ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), Jan 2016.

  6. Automatic Storage Optimization for Arrays
    Somashekaracharya Bhaskaracharya, Uday Bondhugula, Albert Cohen
    ACM Transactions on Programming Languages and Systems (TOPLAS), vol 38, issue 3, Apr 2016.
    Selected for presentation at ACM SIGPLAN PLDI'16, Jun 2016.

  7. The Pluto+ Algorithm: A Practical Approach for Parallelization and Locality Optimization of Affine Loop Nests [PDF]
    Uday Bondhugula, Aravind Acharya, Albert Cohen
    ACM Transactions on Programming Languages and Systems (TOPLAS), vol 38, issue 3, Apr 2016.

  8. An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations [PDF, slides]
    Irshad Pananilath, Aravind Acharya, Vinay Vasista, Uday Bondhugula
    ACM Transactions on Architecture and Code Optimization (TACO), volume 12, issue 2, article 14, July 2015.

  9. PolyMage: Automatic Optimization for Image Processing Pipelines [PDF]
    Ravi Teja Mullapudi, Vinay Vasista, Uday Bondhugula
    ASPLOS '15: International Conference on Architectural Support for Programming Languages and Operating Systems, Mar 2015.

  10. PLUTO+: Near-Complete Modeling of Affine Transformations for Parallelism and Locality [PDF]
    Aravind Acharya, Uday Bondhugula
    ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Feb 2015.

  11. Tiling and Optimizing Time-Iterated Computations over Periodic Domains [PDF, slides, code]
    Uday Bondhugula, Vinayaka Bandishti, Albert Cohen, Guillain Potron, Nicolas Vasilache
    IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT 2014), Aug 2014.
    Nominated for the best paper award

  12. Effective automatic computation placement and data allocation for parallelization of regular programs
    Chandan Reddy, Uday Bondhugula
    ICS '14 Proceedings of the 28th ACM international conference on Supercomputing, Jun 2014.

  13. Automatic data allocation and buffer management for multi-GPU machines
    Thejas Ramashekar, Uday Bondhugula
    ACM Transactions on Architecture and Code Optimization (TACO), Vol 10, No. 4, Article 60, Dec 2013.

  14. Compiling Affine Loop Nests for Distributed-Memory Parallel Architectures [PDF, tool, slides]
    Uday Bondhugula
    ACM/IEEE Supercomputing (SC '13), Nov 2013, Denver, USA.

  15. Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory [PDF, tool]
    Roshan Dathathri, Chandan Reddy, Thejas Ramashekar, Uday Bondhugula
    International Conference on Parallel Architectures and Compilation Techniques (PACT 2013), Sep 2013, Edinburgh, UK.

  16. PolyGLoT: A Polyhedral Loop Transformation Framework for a Graphical Dataflow Language [PDF]
    Somashekaracharya Bhaskaracharya, Uday Bondhugula
    International Conference on Compiler Construction (CC 2013), Mar 2013, Rome, Italy.

  17. Tiling Stencil Computations to Maximize Parallelism [PDF, codes, tool]
    Vinayaka Bandishti, Irshad Pananilath, and Uday Bondhugula
    ACM/IEEE Supercomputing, Nov 2012, Utah, USA.
    Note: please refer to the journal extension instead of this.

  18. Loop Transformations: Convexity, Pruning, and Optimization [PDF]
    Louis-Noel Pouchet, Uday Bondhugula, Cedric Bastoul, Albert Cohen, J Ramanujam, P Sadayappan, and Nicolas Vasilache
    ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL), Jan 2011, Austin, USA.

  19. Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework
    Louis-Noel Pouchet, Uday Bondhugula, Cedric Bastoul, Albert Cohen, J Ramanujam, and P Sadayappan
    Supercomputing (SC), 2010, New Orleans, USA.

  20. A Model for Fusion and Code Motion in an Integrated Auto-Parallelizing Compiler [PDF]
    Uday Bondhugula, Oktay Gunluk, Sanjeeb Dash, and L. Renganarayana
    International Conference on Parallel Architectures and Compilation Techniques (PACT), Sep 2010, Vienna, Austria.

  21. Compact Multi-dimensional Kernel Extraction for Register Tiling
    L. Renganarayana, Uday Bondhugula, Salem Derisavi, Alexandre E. Eichenberger, and Kevin O'Brien
    Supercomputing (SC), 2009, Portland, USA.

  22. Compiler-Assisted Dynamic Scheduling for Effective Parallelization of Loop Nests on Multicore Processors [PDF]
    M. Baskaran, N. Vydyanathan, Uday Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan.
    ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Feb 2009, Raleigh, North Carolina.

  23. Data Layout Transformation for Enhancing Locality on NUCA Chip Multiprocessors
    Qingda Lu, Christophe Alias, Uday Bondhugula, Thomas Henretty, Sriram Krishnamoorthy, J. Ramanujam, Atanas Rountev, P. Sadayappan, Yongjian Chen, Haibo Lin, and Tin-fook Ngai.
    International Conference on Parallel Architectures and Compilation Techniques (PACT), 2009, Raleigh, USA.

  24. A Practical Automatic Polyhedral Parallelizer and Locality Optimizer [PDF]
    Uday Bondhugula, A. Hartono, J. Ramanujam, P. Sadayappan.
    ACM SIGPLAN Programming Language Design and Implementation (PLDI), Jun 2008, Tucson, Arizona.

  25. Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model [PDF]
    Uday Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan.
    International Conference on Compiler Construction (ETAPS CC), Apr 2008, Budapest, Hungary.

  26. A Compiler Framework for Optimization of Affine Loop Nests for GPGPUs [PDF]
    Muthu Baskaran, Uday Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan.
    ACM International Conference on Supercomputing (ICS), Jun 2008, Island of Kos, Greece.

  27. Automatic Data Movement and Computation Mapping for Multi-level Parallel Architectures with Explicitly Managed Memories
    Muthu Baskaran, Uday Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan.
    ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Feb 2008, Salt Lake City, Utah.

  28. Effective Automatic Parallelization of Stencil Computations [PDF]
    S. Krishnamoorthy, M. Baskaran, Uday Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan.
    ACM SIGPLAN Programming Language Design and Implementation (PLDI), Jun 2007, San Diego, California.

  29. Automatic Mapping of Nested Loops to FPGAs [PDF]
    Uday Bondhugula, J. Ramanujam, and P. Sadayappan.
    ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Mar 2007, San Jose, California.

  30. Hardware/Software Integration for FPGA-based All-Pairs Shortest-Paths [PDF]
    Uday Bondhugula, A. Devulapalli, James Dinan, J. Fernando, Pete Wyckoff, E. Stahlberg, and P. Sadayappan.
    IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '06), Apr 2006, Napa Valley, California.

  31. Parallel FPGA-based All-Pairs Shortest-Paths in a Directed Graph [PDF, talk]
    Uday Bondhugula, Ananth Devulapalli, Joseph Fernando, Pete Wyckoff, and P. Sadayappan.
    20th IEEE International Parallel & Distributed Processing Symposium (IPDPS '06), Apr 2006, Rodos, Greece.

  32. High Performance RDMA-based All-to-all Broadcast for InfiniBand Clusters [PDF]
    S. Sur, Uday Bondhugula, A. Mamidala, H.-W. Jin, and D. K. Panda.
    12th IEEE International Conference on High Performance Computing (HiPC '05), Dec 2005, Bangalore, India.

Research Reports

  1. Automatic Intra-Array Storage Optimization [PDF]
    Somashekaracharya G Bhaskaracharya, Uday Bondhugula, Albert Cohen
    IISc-CSA-TR-2014-3, Nov 2014.

  2. Handling Negative Coefficients in Automatic Transformation Schedules
    Uday Bondhugula, Albert Cohen
    Technical report, IISc-CSA-TR-1, Feb 2014.
    Superseded by the Pluto+ paper at PPoPP'15 listed above.

  3. Automatic Distributed Memory Code Generation using the Polyhedral Framework
    Uday Bondhugula
    IISc Research Report, IISc-CSA-TR-2011-3.

  4. Can CPUs Match GPUs on Performance with Productivity?: Experiences with Optimizing a FLOP-intensive Application on CPUs and GPU
    Rajesh Bordawekar, Uday Bondhugula, Ravi Rao
    IBM Research Report RC25033, IBM T.J. Watson Research Center, Yorktown Heights, New York, Aug 2010.

  5. Believe it or Not! Multicore CPUs can Match GPUs for FLOP-intensive Applications!
    Rajesh Bordawekar, Uday Bondhugula, Ravi Rao
    IBM Research Report RC24982, IBM TJ Watson Research Center, Yorktown Heights, New York, Apr 2010.

  6. PLUTO: A Practical and Fully Automatic Polyhedral Program Optimization System
    Uday Bondhugula, J. Ramanujam, and P. Sadayappan.
    OSU Research Report OSU-CISRC-10/07-TR70, Oct 2007.

  7. Affine transformations for communication minimal parallelization and locality optimization of arbitrarily nested loop sequences
    Uday Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan.
    OSU Research Report OSU-CISRC-5/07-TR43, May 2007.

Ph.D. thesis

Effective Automatic Parallelization and Locality Optimization using the Polyhedral Model [PDF]
Ph.D. thesis, Defended Aug 4th, 2008, The Ohio State University, USA.