ECE/CS 508: Manycore Parallel Algorithms
Spring 2023
Announcements
- Welcome to ECE/CS 508!
- We will use Canvas for distributing grade information.
- We will use Piazza for discussions. You can join here with access code 0nw2lk4c8dy
- Lab 0 (Device Query) and Lab 1 (Scatter AND Gather) are ready for
use. Lab 0 is due Tuesday 24 Jan and Lab 1 (both parts) is due Tuesday 31 Jan.
- Lecture recordings are available publicly on UIUC MediaSpace through
the ECE/CS 508 Spring 2023 semester channel. You can also
subscribe if you want notifications of new lecture availability.
- See the class schedule (below) for all specific deadlines.
- Quizzes are in PrairieLearn. You'll need to log in, and then you should be able to add our class. Quiz 1 will show up just before the Lab 1 deadline. You can try as many times as you'd like, but you need to get everything right by Thursday 9 Feb for full credit.
- Lab 2 (Stencil) is ready for use.
- Lab 3 (SGEMM) is ready for use.
- Please note that we will not have lecture on Tuesday 7 February.
- Lab 4 (Binning) is ready for use.
- Lab 5 (BFS) is ready for use.
- Lab 6 (triangle_counting) is ready for use.
- Dr. Mert Hidayetoglu of Stanford will give a guest lecture on Tuesday 7 March: "Performance Modeling of Sparse Matrix Multiplication."
- Information needed to propose a project is here. We'll need it no later than Saturday 18 March if you want to propose your own project, but the sooner you get it to us, the sooner we can give you feedback. Team composition is also due by Saturday 18 March; I will assign anyone remaining to a team.
- Lab 7 (tiled_conv) is ready for use. IGNORE the basic_conv lab. My time for 50,000 inputs is 14.17 msec, which is 3.25 TFLOP/s by my calculations.
- Lab 8 (parallel_merge) is ready for use.
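The Lab 7 figures above (14.17 msec for 50,000 inputs, quoted as 3.25 TFLOP/s) can be sanity-checked with a short calculation. This is only a sketch: the page does not state the operation count for the convolution, so the code below works backward from the two quoted numbers to the total work they imply. The variable and function names are made up for illustration.

```python
# Figures quoted in the Lab 7 announcement.
elapsed_s = 14.17e-3        # 14.17 msec for the whole batch
throughput_flops = 3.25e12  # 3.25 TFLOP/s
num_inputs = 50_000

# throughput = total_flop / elapsed, so total_flop = throughput * elapsed.
# The operation count is NOT given in the announcement; it is inferred here.
total_flop = throughput_flops * elapsed_s
flop_per_input = total_flop / num_inputs


def tflops(total_flop: float, elapsed_s: float) -> float:
    """Return throughput in TFLOP/s for an operation count and a run time."""
    return total_flop / elapsed_s / 1e12


print(f"implied total work:     {total_flop:.3e} FLOP")
print(f"implied work per input: {flop_per_input:.3e} FLOP")
print(f"round trip:             {tflops(total_flop, elapsed_s):.2f} TFLOP/s")
```

To compare your own run, replace `elapsed_s` with your measured time and call `tflops(total_flop, elapsed_s)`, ideally with an operation count you have derived yourself from the layer dimensions in the lab.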
Course Information
Staff and Office Hours
Prof. Steve Lumetta (lumetta) | Tu 1:30-3:30 p.m. | Daily Byte (ECEB)
Kun Wu (kunwu2) | Mon 10:00 a.m.-12:00 p.m. | 227 CSL (or Zoom)
Assignments
Assignments will be distributed using Git. The repo is here. You may need to merge changes if I make modifications to the assignments, so be sure that you are familiar with Git. The ProGit book is a decent introduction.
An introduction to using RAI can be found here. Your labs will be automatically recorded when you submit to RAI, so just be sure that you have passed all of the tests before the deadline for each lab.
Final Exam
UIUC assigns final exam times based on class times, so you can know your exam times when you sign up for classes.
Based on our class vote, we will have the final online on the day assigned by the campus (Wednesday 10 May). You'll have three hours to complete the final, and you can start it at any point during that day.
Lecture Notes and Overviews
I'll try to have a tentative version of the slides available in advance (change 508 to 508-F21 in our web page's address to view the last offering), but I am likely to edit the version actually used until just before the lecture, and I will post it afterward.
Lecture recordings are available after class through a MediaSpace channel (once MediaSpace has finished processing them). Live streaming will not be supported.
Papers Mentioned During Lecture
The papers below were mentioned in lecture. Consider reading them to broaden your GPU and HPC background knowledge.
- mentioned in L1: J.A. Stratton, C. Rodrigues, I.-J. Sung, L.-W. Chang, N. Anssari, G. Liu, W.W. Hwu, N. Obeid, "Algorithm and Data Optimization Techniques for Scaling to Massively Threaded Systems," IEEE Computer, 2012, pp. 26-32, PDF.
- mentioned in L4: V. Volkov, J.W. Demmel, "Benchmarking GPUs to Tune Dense Linear Algebra," SC2008, Austin, Texas, 2008, PDF.
- mentioned in L4: R. Hamming, "You and Your Research," Transcription of the Bell Communications Research Colloquium Seminar, 7 March 1986, PDF.
- mentioned in L5: D.H. Bailey, "Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers," Supercomputing Review, Aug. 1991, pp. 54-55, PDF.
- mentioned in L5: S. Ryoo, C.I. Rodrigues, S.S. Stone, S.S. Baghsorkhi, S.-Z. Ueng, J.A. Stratton, W.W. Hwu, "Program Optimization Space Pruning for a Multithreaded GPU," CGO 2008, CGO Test of Time Award 2018, PDF.
- mentioned in L6: G.E. Blelloch, "Scans as Primitive Parallel Operations," IEEE Transactions on Computers, 38(11):1526-1538, Nov. 1989, PDF.
- mentioned in L7: A.C. Arpaci-Dusseau, R.H. Arpaci-Dusseau, D.E. Culler, J.M. Hellerstein, D.A. Patterson, "High-Performance Sorting on Networks of Workstations," SIGMOD '97, May 1997, PDF, also see this page.
- mentioned in L8: J.E. Gonzalez, Y. Low, H. Hu, D. Bickson, C. Guestrin, "PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs," OSDI, 2012, PDF.
- mentioned in L8: N. Sakharnykh, "Maximizing Unified Memory Performance in CUDA," web page.
- mentioned in L8: Z. Li, Y. Lu, W.-P. Zhang, R.-H. Li, J. Guo, X. Huang, R. Mao, "Discovering Hierarchical Subgraphs of K-Core-Truss," Data Science and Engineering, 3, pp. 136–149, 2018, PDF.
- mentioned but not cited in L8: V.S. Mailthody, K. Date, Z. Qureshi, C. Pearson, R. Nagi, J. Xiong, W.-m. Hwu, "Collaborative (CPU + GPU) Algorithms for Triangle Counting and Truss Decomposition," Update Paper for Static Graph Challenge, 2018, PDF, PPTX slides.
- mentioned in L9: I. El Hajj, "Techniques for Optimizing Dynamic Parallelism on Graphics Processing Units," Ph.D. Dissertation, 2018, PDF.
- from Mert Hidayetoglu's guest talk: M. Hidayetoglu, T. Bicer, S. Garcia de Gonzalo, V. De Andrade, D. Gursoy, R. Kettimuthu, I.T. Foster, W.-m.W. Hwu, "Petascale XCT: 3D Image Reconstruction with Hierarchical Communications on Multi-GPU Nodes," in Proceedings of SC20, PDF.
- from Mert Hidayetoglu's guest talk: M. Hidayetoglu, C. Pearson, V.S. Mailthody, E. Ebrahimi, J. Xiong, R. Nagi, W.-m.W. Hwu, "At-Scale Sparse Deep Neural Network Inference with Efficient GPU Implementation," in Proceedings of the IEEE High-Performance Extreme Computing Conference, Boston, MA, 2020, PDF.