Spring 2023

- Welcome to ECE/CS 508!
- We will use Canvas for distributing grade information.
- We will use Piazza for discussions. You can join here with access code 0nw2lk4c8dy
- Lab 0 (Device Query) and Lab 1 (Scatter **AND** Gather) are ready for use. Lab 0 is due Tuesday 24 Jan and Lab 1 (both parts) is due Tuesday 31 Jan.
- Lecture recordings are available publicly on UIUC MediaSpace through the ECE/CS 508 Spring 2023 semester channel. You can also subscribe if you want notifications of new lecture availability.
- See the class schedule (below) for all specific deadlines.
- Quizzes are in PrairieLearn. After you log in, you should be able to add our class. Quiz 1 will show up just before the Lab 1 deadline. You can try as many times as you'd like, but you need to get everything right by Thurs 9 Feb for full credit.
- Lab 2 (Stencil) is ready for use.
- Lab 3 (SGEMM) is ready for use.
- Please note that we will not have lecture on Tuesday 7 February.
- Lab 4 (Binning) is ready for use.
- Lab 5 (BFS) is ready for use.
- Lab 6 (triangle_counting) is ready for use.
- Dr. Mert Hidayetoglu of Stanford will give a guest lecture on Tuesday 7 March: "Performance Modeling of Sparse Matrix Multiplication."
- Information needed to propose a project is here. We'll need it no later than Saturday 18 March if you want to propose your own, but the sooner you get it to us, the sooner we can give you feedback. Team composition is ALSO due by Saturday 18 March--I will assign anyone remaining to a team.
- Lab 7 (tiled_conv) is ready for use. **IGNORE the** `basic_conv` lab. My time for 50,000 inputs is 14.17 msec, which is 3.25 TFLOP/s by my calculations.
- Lab 8 (parallel_merge) is ready for use.

Prof. Steve Lumetta (lumetta) | Tu 1:30-3:30 p.m. | Daily Byte (ECEB)

Kun Wu (kunwu2) | Mon 10:00 a.m.-12:00 p.m. | 227 CSL (or Zoom)

Assignments will be distributed using Git. The repo is here. You may need to merge changes if I make modifications to the assignments, so be sure that you are familiar with Git. The ProGit book is a decent introduction.

An introduction to using RAI can be found here.

Your labs will be automatically recorded when you submit to RAI, so just be sure that you have passed all of the tests before the deadline for each lab.

UIUC assigns final exam times based on class times, so you can know your exam times when you sign up for classes.

Based on our class vote, we will have the final online on the day assigned by the campus (Wednesday 10 May). You'll have three hours to complete the final, but you can decide exactly when you want to take the exam--at any point during that day.

I'll try to have a tentative version of slides available in advance (change our web page from 508 to 508-F21 to view the last offering), but am likely to edit the version actually used until just before the lecture (and will post it afterward).

Lecture recordings are available **after the class** as a MediaSpace channel (once MediaSpace has finished processing them). Live streaming will not be supported.

- (1) Logistics and Topics (x4)
- (2) Scatter to Gather (x4)
- (3) Thread Coarsening (x4)
- (4) GEMM Joint Tiling (x4)
- (5) Input Binning (x4)
- (6) Bin Compaction (x4)
- (7) Privatization (x4)
- (8) Triangle Counting (x4)
- (9) Dynamic Refinement (x4)
- (10) Deep Learning (x4)
- (11) Parallel Merge (x4)

The papers below were mentioned in lecture and are things that you should at least consider reading to broaden your GPU and HPC background knowledge.

- mentioned in L1: J.A. Stratton, C. Rodrigues, I.-J. Sung, L.-W. Chang, N. Anssari, G. Liu, W.W. Hwu, N. Obeid, "Algorithm and Data Optimization Techniques for Scaling to Massively Threaded Systems," IEEE Computer, 2012, pp. 26-32, PDF.
- mentioned in L4: V. Volkov, J.W. Demmel, "Benchmarking GPUs to Tune Dense Linear Algebra," SC2008, Austin, Texas, 2008, PDF.
- mentioned in L4: R. Hamming, "You and Your Research," Transcription of the Bell Communications Research Colloquium Seminar, 7 March 1986, PDF.
- mentioned in L5: D.H. Bailey, "Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers," Supercomputing Review, Aug. 1991, pp. 54-55, PDF.
- mentioned in L5: S. Ryoo, C.I. Rodrigues, S.S. Stone, S.S. Baghsorkhi, S.-Z. Ueng, J.A. Stratton, W. W. Hwu, "Program Optimization Space Pruning for a Multithreaded GPU," CGO 2008, CGO Test of Time Award 2018, PDF.
- mentioned in L6: G. E. Blelloch, "Scans as Primitive Parallel Operations," IEEE Transactions on Computers, 38(11):1526-1538, Nov. 1989, PDF.
- mentioned in L7: A.C. Arpaci-Dusseau, R.H. Arpaci-Dusseau, D.E. Culler, J.M. Hellerstein, D.A. Patterson, "High-Performance Sorting on Networks of Workstations," SIGMOD '97, May 1997, PDF, also see this page.
- mentioned in L8: J.E. Gonzalez, Y. Low, H. Hu, D. Bickson, C. Guestrin, "PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs," OSDI, 2012, PDF.
- mentioned in L8: N. Sakharnykh, "Maximizing Unified Memory Performance in CUDA," web page.
- mentioned in L8: Z. Li, Y. Lu, W.-P. Zhang, R.-H. Li, J. Guo, X. Huang, R. Mao, "Discovering Hierarchical Subgraphs of K-Core-Truss," Data Science and Engineering, 3, pp. 136–149, 2018, PDF.
- mentioned but not cited in L8: V.S. Mailthody, K. Date, Z. Qureshi, C. Pearson, R. Nagi, J. Xiong, W.-m. Hwu, "Collaborative (CPU + GPU) Algorithms for Triangle Counting and Truss Decomposition," Update Paper for Static Graph Challenge, 2018, PDF, PPTX slides.
- mentioned in L9: I. El Hajj, "Techniques for Optimizing Dynamic Parallelism on Graphics Processing Units," Ph.D. Dissertation, 2018, PDF.
- from Mert Hidayetoglu's guest talk: M. Hidayetoglu, T. Bicer, S. Garcia de Gonzalo, V. De Andrade, D. Gursoy, R. Kettimuthu, I. T. Foster, W.-m. W. Hwu, "Petascale XCT: 3D Image Reconstruction with Hierarchical Communications on Multi-GPU Nodes," in Proceedings of SC20, PDF.
- from Mert Hidayetoglu's guest talk: M. Hidayetoglu, C. Pearson, V. S. Mailthody, E. Ebrahimi, J. Xiong, R. Nagi, W.-m. W. Hwu, "At-Scale Sparse Deep Neural Network Inference with Efficient GPU Implementation," in Proceedings of the IEEE High-Performance Extreme Computing Conference, Boston, MA, 2020, PDF.