ECE/CS 508: Manycore Parallel Algorithms
Fall 2021
Announcements
- Welcome to ECE/CS 508!
- We will use Canvas for distributing grade information.
- We will use Campuswire for discussions (please don't post to Canvas any more). You may also use join code 3169 to access our class.
- Lab 0 (Device Query) and Lab 1 (Scatter AND Gather) are ready for
use.
- Unfortunately, no recording was made of the first lecture, and the second lecture had no audio. Sorry for the confusion; these rooms were set up over the summer and never tested. We now have scheduled recordings, and I have made the channel public (you do not need to log in). You can also subscribe if you want notifications when new lectures become available. For the first two lectures, I suggest simply reviewing the posted slides (up to slide 59 in the second set).
- Success! Lectures are now being captured properly, albeit at somewhat low volume, which has been a recurring problem in other ECEB rooms. Turn up your digital speaker volume, mixer volume, physical speaker volume, and so forth as necessary.
- Lab 2 (Stencil) is ready for use. Since I missed my two-week deadline for changes, I don't want to edit the README, but it is a bit misleading: while the number of output elements that you need to fill in is smaller than the full array (indices 1 to nx-1 in the X dimension, for example), the Anext array has the same size as the A0 array, as discussed in lecture. Your code won't pass the tests if you index the output array incorrectly.
- Lab 3 (SGEMM) is ready for use. I just added clarifications and hints and made the documentation more legible -- no real change to the code. I also added a slide (to set 4) with some performance numbers to give you a benchmark against which to evaluate your own results, if you'd like.
- Quiz 1 is live! If you are enrolled, you should be able to see it in PrairieLearn (add the class), or there's a link in Campuswire. Please finish (with as many attempts as you'd like) by the end of Thursday 16 September. The topic is Lab 1.
- On the morning of L6, ECE sent an email saying that we should use the rechargeable batteries in the room, so I left my battery in my office. At the start of lecture, the mic showed a full charge; 20 minutes in, it cut out, and neither I nor anyone in the room noticed. Sorry that I trusted their untested idea. I'll stick to the regular batteries from now on.
- Lab 4 (Binning) is ready for use. See CampusWire for an explanation
of how I count extra credit (there is some on this lab).
- Lab 5 (BFS) is ready for use.
- Lab 6 (triangle_counting) is ready for use.
- Lab 7 (tiled_conv) is ready for use. IGNORE the basic_conv lab. My time for 50,000 inputs is 14.17 msec, which is 3.25 TFLOP/s by my calculations. Constant memory made it slower.
- Project information is here.
- Lab 8 (parallel_merge) is ready for use. Be sure that you update your copy before starting!
Course Information
Staff and Office Hours
Prof. Steve Lumetta (lumetta) | Tu 1:30-3:30 p.m. | Daily Byte (ECEB)
Assignments
Assignments will be distributed using Git. The repo is here. You may need to merge changes if I make modifications to the assignments, so be sure that you are familiar with Git. The ProGit book is a decent introduction.
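For example, merging an instructor update into your copy looks like the following local sketch (`upstream` here is a stand-in path for the actual course repo URL, and `-b main` assumes Git 2.28 or newer):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# Simulate the course repo ("upstream") with one initial commit.
git init -q -b main upstream
git -C upstream -c user.email=you@example.com -c user.name=you \
    commit -q --allow-empty -m "initial assignment"

# Your copy of the assignments.
git clone -q upstream work

# Simulate the instructor pushing a change...
echo "fix" > upstream/README.md
git -C upstream add README.md
git -C upstream -c user.email=you@example.com -c user.name=you \
    commit -q -m "instructor update"

# ...then pull (fetch + merge) it into your copy.
git -C work pull -q
test -f work/README.md && echo "merged instructor update"
```

If you have local commits of your own, `git pull` may produce conflicts that you must resolve and commit before proceeding.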
An introduction to using RAI can be found
here.
Your labs will be automatically recorded when you submit to RAI, so just
be sure that you have passed all of the tests before the deadline for each
lab.
Final Exam
UIUC assigns final exam times based on class times, so you know your exam times when you sign up for classes.
Those in class on the first day strongly preferred a take-home exam, and
having it during finals week was more popular than having it before the
end of the semester. After people have settled into their final class
schedules (usually about the third or fourth week), I'll try to do another
informal poll including those not coming to the lecture room to make the
final decision.
Final Exam | Wednesday 15 December | 08:00 - 11:00 a.m. | location TBD
Lecture Notes and Overviews
I'll try to have a tentative version of slides available in advance, but
am likely to edit the version actually used until just before the lecture
(and will post it afterward).
Lecture recordings are available after the class
as a MediaSpace channel (once
MediaSpace has finished processing them).
Live streaming will not be supported.
Papers Mentioned During Lecture
I said in lecture that I'd add the other papers mentioned in L5, but looking
again at the list below, I remembered that I made a conscious decision not to
include papers that were relevant to a particular comment, but not really to
the course in general (such as J.P. Singh's and my own papers in that set of
slides). You won't have any trouble finding them, if you want to read them.
But the ones below are things that you should at least consider reading to
broaden your GPU and HPC background knowledge.
- mentioned in L1: J.A. Stratton, C. Rodrigues, I.-J. Sung, L.-W. Chang, N. Anssari, G. Liu, W.W. Hwu, N. Obeid, "Algorithm and Data Optimization Techniques for Scaling to Massively Threaded Systems," IEEE Computer, 2012, pp. 26-32, PDF.
- mentioned in L4: V. Volkov, J.W. Demmel, "Benchmarking GPUs to Tune Dense Linear Algebra," SC2008, Austin, Texas, 2008, PDF.
- mentioned in L4: R. Hamming, "You and Your Research," transcription of the Bell Communications Research Colloquium Seminar, 7 March 1986, PDF.
- mentioned in L5: D.H. Bailey, "Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers," Supercomputing Review, Aug. 1991, pp. 54-55, PDF.
- mentioned in L5: S. Ryoo, C.I. Rodrigues, S.S. Stone, S.S. Baghsorkhi, S.-Z. Ueng, J.A. Stratton, W.W. Hwu, "Program Optimization Space Pruning for a Multithreaded GPU," CGO 2008, CGO Test of Time Award 2018, PDF.
- mentioned in L6: G.E. Blelloch, "Scans as Primitive Parallel Operations," IEEE Transactions on Computers, 38(11):1526-1538, Nov. 1989, PDF.
- mentioned in L7: A.C. Arpaci-Dusseau, R.H. Arpaci-Dusseau, D.E. Culler, J.M. Hellerstein, D.A. Patterson, "High-Performance Sorting on Networks of Workstations," SIGMOD '97, May 1997, PDF, also see this page.
- mentioned in L8: J.E. Gonzalez, Y. Low, H. Hu, D. Bickson, C. Guestrin, "PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs," OSDI, 2012, PDF.
- mentioned in L8: N. Sakharnykh, "Maximizing Unified Memory Performance in CUDA," web page.
- mentioned in L8: Z. Li, Y. Lu, W.-P. Zhang, R.-H. Li, J. Guo, X. Huang, R. Mao, "Discovering Hierarchical Subgraphs of K-Core-Truss," Data Science and Engineering, 3, pp. 136-149, 2018, PDF.
- mentioned but not cited in L8: V.S. Mailthody, K. Date, Z. Qureshi, C. Pearson, R. Nagi, J. Xiong, W.-m. Hwu, "Collaborative (CPU + GPU) Algorithms for Triangle Counting and Truss Decomposition," Update Paper for Static Graph Challenge, 2018, PDF, PPTX slides.
- mentioned in L9: I. El Hajj, "Techniques for Optimizing Dynamic Parallelism on Graphics Processing Units," Ph.D. Dissertation, 2018, PDF.