ECE/CS 508: Manycore Parallel Algorithms
Fall 2021
Announcements
- Welcome to ECE/CS 508!
- We will use Canvas for distributing grade information.
- We will use Campuswire for discussions (please don't post to Canvas any more). You may also use join code 3169 to access our class.
- Lab 0 (Device Query) and Lab 1 (Scatter AND Gather) are ready for
use.
- Unfortunately, no recording was made of the first lecture, and the second lecture had no audio. Sorry for the confusion; these rooms were set up over the summer and never tested. We now have scheduled recordings, and I have made the channel public (you do not need to log in). You can also subscribe if you want notifications when new lectures become available. For the first two lectures, I suggest simply reviewing the posted slides (up to slide 59 in the second set).
- Success! Lectures are now being captured properly, albeit at somewhat low volume, which has been a recurring problem in other ECEB rooms. Turn up your digital speaker volume, mixer volume, physical speaker volume, and so forth as necessary.
- Lab 2 (Stencil) is ready for use. Since I missed my two-week deadline for changes, I don't want to edit the README, but it is a bit misleading: while the number of output elements that you need to fill in is smaller than the full array (indices 1 to nx-1 in the X dimension, for example), the Anext array has the same size as the A0 array, as discussed in lecture. Your code won't pass the tests if you index the output array incorrectly.
- Lab 3 (SGEMM) is ready for use. I just added clarifications and hints and made the documentation more legible -- no real change to the code. I also added a slide (to set 4) with some performance numbers to give you a benchmark against which to evaluate your own results, if you'd like.
- Quiz 1 is live! If you are enrolled, you should be able to see it in PrairieLearn (add the class), or there's a link in Campuswire. Please finish (with as many attempts as you'd like) by the end of Thursday 16 September. The topic is Lab 1.
- On the morning of L6, ECE sent an email saying that we should use the rechargeable batteries in the room, so I left my battery in my office. At the start of lecture, the mic showed a full charge; 20 minutes in, it cut out, and neither I nor anyone in the room noticed. Sorry that I trusted their untested idea. I'll stick to the regular batteries from now on.
- Lab 4 (Binning) is ready for use. See CampusWire for an explanation
of how I count extra credit (there is some on this lab).
- Lab 5 (BFS) is ready for use.
- Lab 6 (triangle_counting) is ready for use.
- Lab 7 (tiled_conv) is ready for use. IGNORE the basic_conv lab. My time for 50,000 inputs is 14.17 msec, which is 3.25 TFLOP/s by my calculations. Constant memory made it slower.
- Project information is here.
- Lab 8 (parallel_merge) is ready for use. Be sure that you update your copy before starting!
Course Information
Staff and Office Hours
Prof. Steve Lumetta (lumetta) | Tu 1:30-3:30 p.m. | Daily Byte (ECEB)
Assignments
Assignments will be distributed using Git. The repo is here. You may need to merge changes if I make modifications to the assignments, so be sure that you are familiar with Git. The ProGit book is a decent introduction.
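For example, merging an instructor update into your copy looks like the following local sketch (`upstream` here is a stand-in path for the actual course repo URL, and `-b main` assumes Git 2.28 or newer):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# Simulate the course repo ("upstream") with one initial commit.
git init -q -b main upstream
git -C upstream -c user.email=you@example.com -c user.name=you \
    commit -q --allow-empty -m "initial assignment"

# Your copy of the assignments.
git clone -q upstream work

# Simulate the instructor pushing a change...
echo "fix" > upstream/README.md
git -C upstream add README.md
git -C upstream -c user.email=you@example.com -c user.name=you \
    commit -q -m "instructor update"

# ...then pull (fetch + merge) it into your copy.
git -C work pull -q
test -f work/README.md && echo "merged instructor update"
```

If you have local commits of your own, `git pull` may produce conflicts that you must resolve and commit before proceeding.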
An introduction to using RAI can be found
here.
Your labs will be automatically recorded when you submit to RAI, so just
be sure that you have passed all of the tests before the deadline for each
lab.
Final Exam
UIUC assigns final exam times based on class times, so you know your exam times when you sign up for classes.
Those in class on the first day strongly preferred a take-home exam, and
having it during finals week was more popular than having it before the
end of the semester. After people have settled into their final class
schedules (usually about the third or fourth week), I'll try to do another
informal poll including those not coming to the lecture room to make the
final decision.
Final Exam | Wednesday 15 December | 08:00 - 11:00 a.m. | location TBD
Lecture Notes and Overviews
I'll try to have a tentative version of slides available in advance, but
am likely to edit the version actually used until just before the lecture
(and will post it afterward).
Lecture recordings are available after the class
as a MediaSpace channel (once
MediaSpace has finished processing them).
Live streaming will not be supported.
Papers Mentioned During Lecture
I said in lecture that I'd add the other papers mentioned in L5, but looking
again at the list below, I remembered that I made a conscious decision not to
include papers that were relevant to a particular comment, but not really to
the course in general (such as J.P. Singh's and my own papers in that set of
slides). You won't have any trouble finding them, if you want to read them.
But the ones below are things that you should at least consider reading to
broaden your GPU and HPC background knowledge.
- mentioned in L1: J.A. Stratton, C. Rodrigues, I.-J. Sung, L.-W. Chang, N. Anssari, G. Liu, W.W. Hwu, N. Obeid, "Algorithm and Data Optimization Techniques for Scaling to Massively Threaded Systems," IEEE Computer, 2012, pp. 26-32, PDF.
- mentioned in L4: V. Volkov, J.W. Demmel, "Benchmarking GPUs to Tune Dense Linear Algebra," SC2008, Austin, Texas, 2008, PDF.
- mentioned in L4: R. Hamming, "You and Your Research," transcription of the Bell Communications Research Colloquium Seminar, 7 March 1986, PDF.
- mentioned in L5: D.H. Bailey, "Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers," Supercomputing Review, Aug. 1991, pp. 54-55, PDF.
- mentioned in L5: S. Ryoo, C.I. Rodrigues, S.S. Stone, S.S. Baghsorkhi, S.-Z. Ueng, J.A. Stratton, W.W. Hwu, "Program Optimization Space Pruning for a Multithreaded GPU," CGO 2008, CGO Test of Time Award 2018, PDF.
- mentioned in L6: G.E. Blelloch, "Scans as Primitive Parallel Operations," IEEE Transactions on Computers, 38(11):1526-1538, Nov. 1989, PDF.
- mentioned in L7: A.C. Arpaci-Dusseau, R.H. Arpaci-Dusseau, D.E. Culler, J.M. Hellerstein, D.A. Patterson, "High-Performance Sorting on Networks of Workstations," SIGMOD '97, May 1997, PDF, also see this page.
- mentioned in L8: J.E. Gonzalez, Y. Low, H. Hu, D. Bickson, C. Guestrin, "PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs," OSDI, 2012, PDF.
- mentioned in L8: N. Sakharnykh, "Maximizing Unified Memory Performance in CUDA," web page.
- mentioned in L8: Z. Li, Y. Lu, W.-P. Zhang, R.-H. Li, J. Guo, X. Huang, R. Mao, "Discovering Hierarchical Subgraphs of K-Core-Truss," Data Science and Engineering, 3, pp. 136-149, 2018, PDF.
- mentioned but not cited in L8: V.S. Mailthody, K. Date, Z. Qureshi, C. Pearson, R. Nagi, J. Xiong, W.-m. Hwu, "Collaborative (CPU + GPU) Algorithms for Triangle Counting and Truss Decomposition," Update Paper for Static Graph Challenge, 2018, PDF, PPTX slides.
- mentioned in L9: I. El Hajj, "Techniques for Optimizing Dynamic Parallelism on Graphics Processing Units," Ph.D. Dissertation, 2018, PDF.