ECE/CS 508: Manycore Parallel Algorithms
Fall 2025
Announcements
- Welcome to ECE/CS 508!
- Follow this link ASAP to set up your accounts on Delta and GitHub. Do so NO LATER THAN the first day of class so that you can finish Lab 0 on time.
- We will use Piazza for discussions.
You can join here.
- We will use Canvas for distributing grade information (only).
- Lab 0 (Device Query) and Lab 1 (Scatter AND Gather) are ready for
use. Lab 0 is due Tuesday 2 Sep and Lab 1 (both parts) is due Tuesday 9 Sep.
- Lab 2 (Stencil) is ready for use.
- Lab 3 (SGEMM) is ready for use.
- Lab 4 (Binning) is ready for use.
- Lab 5 (BFS) is ready for use.
- Here's some starting guidance on profiling an application.
- Lab 6 (tricount) is ready for use.
- Information about how to propose a project is now available.
- Lab 7 (tiled_conv) is ready for use.
My time for 50,000 inputs is 7.6 msec, which is 8.25 TFLOP/s by my calculations. That's less than a quarter of peak,
which perhaps reflects the fact that the code was tuned for Titan V--you should be able to beat it, right?
- Lab 8 (parallel_merge) is ready for use.
Course Information
- Course overview and policies
- Course policy on use of outside tools, including AI.
- The supplementary textbook is shown on this web page (Amazon).
Illinois students have free access through
this link.
You can also find the third edition, and the fifth edition may come out soon.
- Grainger College of Engineering policies
- Assignments (labs/MPs) and the final project will be distributed, completed, and graded using NCSA's Delta system and GitHub (see the second announcement at the top of the page, just after the welcome, for directions).
- Quizzes and final exam are in PrairieLearn.
You'll need to log in, and then you should be able to add our class. Quiz 1
will show up just before the Lab 1 deadline. You can try as many times
as you'd like, but you need to get everything right by Thurs 18 Sep
for full credit.
- Lecture recordings are available publicly on UIUC MediaSpace through
the ECE/CS 508 Fall 2025 semester channel. You can also
subscribe if you want notifications of new lecture availability.
- Tentative Schedule
Staff and Office Hours
| Prof. Steve Lumetta (lumetta) | Tu 1:00-3:00 p.m. | Daily Byte (ECEB) |
| Jinghua Wang (jinghua3) | Fr 3:30-5:30 p.m. | 5034 ECEB |
Assignments
Assignments will be distributed using Git. You will need to merge changes if I make modifications to the assignments, so be sure that you are familiar with Git. The ProGit book is a decent introduction.
Final Exam
UIUC assigns final exam times based on class times, so you can
know your
exam times when you sign up for classes.
The final exam will be online using PrairieLearn.
After people have settled into their final class
schedules (usually about the third or fourth week), I'll do an
informal poll to see when people want the exam.
Last time, the class voted to use the campus's nominal exam date, which for us means
Thursday 18 December. Whatever day we choose, you'll have three hours to complete the final,
but you can decide exactly when you want to take the exam--at any point during that day.
Lecture Notes and Overviews
I'll try to have a tentative version of the slides available in advance (change 508 to 508-S23 in this page's address to view the last offering), but
I may edit the version actually used until just before the lecture
(and will post it afterward).
Lecture recordings are available after the class
as a MediaSpace channel (once
MediaSpace has finished processing them).
Live streaming will not be supported.
Papers Mentioned During Lecture
The papers below have been (or will be) mentioned in lecture and are things that you should at least consider reading to
broaden your GPU and HPC background knowledge.
- mentioned in L1: J.A. Stratton, C. Rodrigues, I.-J. Sung, L.-W. Chang, N. Anssari, G. Liu,
W.W. Hwu, N. Obeid, "Algorithm and Data Optimization Techniques for Scaling to
Massively Threaded Systems," IEEE Computer, 2012, pp. 26-32,
PDF.
- mentioned in L4: V. Volkov, J.W. Demmel, "Benchmarking GPUs to Tune Dense Linear Algebra,"
SC2008, Austin, Texas, 2008,
PDF.
- mentioned in L4: R. Hamming, "You and Your Research," Transcription of the
Bell Communications Research Colloquium Seminar, 7 March 1986,
PDF.
- mentioned in L5: D.H. Bailey, "Twelve Ways to Fool the Masses When Giving
Performance Results on Parallel Computers," Supercomputing Review, Aug. 1991,
pp. 54-55, PDF.
- mentioned in L5: S. Ryoo, C.I. Rodrigues, S.S. Stone, S.S. Baghsorkhi, S.-Z. Ueng, J.A. Stratton, W.W. Hwu, "Program Optimization Space Pruning for a Multithreaded GPU," CGO 2008, CGO Test of Time Award 2018, PDF.
- mentioned in L6: G.E. Blelloch, "Scans as Primitive Parallel Operations," IEEE Transactions on Computers, 38(11):1526-1538, Nov. 1989, PDF.
- mentioned in L7: A.C. Arpaci-Dusseau, R.H. Arpaci-Dusseau, D.E. Culler, J.M.
Hellerstein, D.A. Patterson, "High-Performance Sorting on Networks of
Workstations," SIGMOD '97, May 1997, PDF.
- mentioned in L8: J.E. Gonzalez, Y. Low, H. Hu, D. Bickson, C. Guestrin,
"PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs,"
OSDI, 2012, PDF.
- mentioned in L8: N. Sakharnykh, "Maximizing Unified Memory Performance in CUDA,"
web page.
- mentioned in L8: Z. Li, Y. Lu, W.-P. Zhang, R.-H. Li, J. Guo, X. Huang, R. Mao,
"Discovering Hierarchical Subgraphs of K-Core-Truss," Data Science and
Engineering, 3, pp. 136–149, 2018,
PDF.
- mentioned but not cited in L8: V.S. Mailthody, K. Date, Z. Qureshi, C. Pearson,
R. Nagi, J. Xiong, W.-m. Hwu, "Collaborative (CPU + GPU) Algorithms for
Triangle Counting and Truss Decomposition," Update Paper for Static Graph
Challenge, 2018, PDF.
- mentioned in L9: I. El Hajj, "Techniques for Optimizing Dynamic Parallelism
on Graphics Processing Units," Ph.D. Dissertation, 2018,
PDF.