COM4521 Parallel Computing with GPUs
Course Information
Accelerator architectures are discrete processing units which supplement a base processor with the objective of providing higher performance at lower energy cost. Performance is gained by a design that favours a large number of parallel compute cores at the expense of imposing significant software challenges. This module looks at accelerated computing, from multi-core CPUs to GPU accelerators with many TFLOPS of theoretical performance. The module will give insight into how to write high performance code, with particular emphasis on GPU programming using NVIDIA CUDA GPUs. A key aspect of the module is understanding the implications that program code has for the underlying hardware, so that the code can be optimised.
This module is an on-demand version of COM4521, based on the 2021 delivery when I last taught the course. The current undergraduate taught module is available on the module's public teaching page.
Course Material
Lectures are pre-recorded and are available on the COM4521 Parallel Computing with Graphical Processing Units Kaltura Channel, or as downloadable pdfs on Google Drive. Each week's practical activities (the labs) follow the ideas presented in the lectures, so it is important that you follow the lecture and lab timetable below.
Week 01
Lecture 01 - Introduction
- Course Context (pdf, recording)
- Supercomputing and Software (pdf, recording)
- Module Details (pdf, recording)
Lecture 02 - Introduction to C
- Introducing C (pdf, recording)
- Functions and Scoping (pdf, recording)
- Arrays, Strings and IO (pdf, recording)
Lab 01 - Introduction to Visual Studio and C Programming
Week 02
Lecture 03 - Memory
- Pointers (pdf, recording)
- Advanced use of Pointers (pdf, recording)
- Dynamically Managed Memory (pdf, recording)
- Structures and Binary Files (pdf, recording)
Lecture 04 - Optimisation
- Optimisation Overview (pdf, recording)
- Compute Bound Code (pdf, recording)
- Memory Bound Code (pdf, recording)
Lab 02 - Memory and Performance
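As a taster of the Week 02 material, here is a minimal sketch bringing together pointers, dynamically managed memory, structures and binary files. The record type, array size and file name are arbitrary choices made for this example; it is not taken from the lab sheets.

```c
#include <stdio.h>
#include <stdlib.h>

/* A simple record type of the kind used when writing structures to binary files. */
typedef struct {
    int id;
    float value;
} record;

int main(void) {
    const int n = 100;

    /* Allocate an array of records on the heap and fill it through a pointer. */
    record *records = (record *)malloc(n * sizeof(record));
    if (records == NULL)
        return 1;
    for (int i = 0; i < n; i++) {
        records[i].id = i;
        records[i].value = i * 0.5f;
    }

    /* Write the whole array to a binary file in a single call. */
    FILE *f = fopen("records.bin", "wb");
    if (f != NULL) {
        fwrite(records, sizeof(record), n, f);
        fclose(f);
    }

    free(records);
    return 0;
}
```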
Week 03
Lecture 05 - OpenMP
- OpenMP Overview (pdf, recording)
- Loops and Critical Sections (pdf, recording)
- Scoping and Tasks (pdf, recording)
Lecture 06 - OpenMP Part II
Lab 03 - OpenMP
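To give a flavour of this week's OpenMP material, the sketch below parallelises a loop and uses a reduction clause so that each thread accumulates a private partial sum. The array size and contents are arbitrary illustrative choices.

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;

    /* The loop iterations are divided between threads; reduction(+:sum)
       gives each thread a private partial sum which OpenMP combines when
       the parallel region ends. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        a[i] = (double)i;
        sum += a[i];
    }

    printf("Using up to %d threads, sum = %f\n", omp_get_max_threads(), sum);
    return 0;
}
```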
Week 04
Lecture 07 - GPU Architectures
Lecture 08 - Introduction to CUDA
- The CUDA Programming Model (pdf, recording)
- CUDA Device Code (pdf, recording)
- CUDA Host Code and Memory Management (pdf, recording)
Lab 04 - Introduction to CUDA
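As a hedged sketch of where the Week 04 lectures lead, the example below shows the three pieces introduced in Lecture 08: device code (the kernel), host code, and memory management. The kernel name, vector length and launch configuration are illustrative choices only.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 1024

/* Device code: each thread computes one element of the output vector. */
__global__ void vec_add(const float *a, const float *b, float *c) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)
        c[i] = a[i] + b[i];
}

int main(void) {
    float h_a[N], h_b[N], h_c[N];
    float *d_a, *d_b, *d_c;
    for (int i = 0; i < N; i++) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    /* Host code: allocate device memory and copy the inputs to the GPU. */
    cudaMalloc((void **)&d_a, N * sizeof(float));
    cudaMalloc((void **)&d_b, N * sizeof(float));
    cudaMalloc((void **)&d_c, N * sizeof(float));
    cudaMemcpy(d_a, h_a, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, N * sizeof(float), cudaMemcpyHostToDevice);

    /* Launch a grid of 256-thread blocks covering all N elements. */
    vec_add<<<(N + 255) / 256, 256>>>(d_a, d_b, d_c);

    cudaMemcpy(h_c, d_c, N * sizeof(float), cudaMemcpyDeviceToHost);
    printf("c[100] = %f\n", h_c[100]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```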
Week 05
Lecture 09 - CUDA Memory
- Memory Overview (pdf, recording)
- Global and Constant Memory (pdf, recording)
- Read Only and Texture Memory (pdf, recording)
Lab 05 - CUDA Memory
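As one small illustration of the Week 05 memory spaces, the sketch below uses constant memory for a set of polynomial coefficients. The kernel, coefficient count and helper function are assumptions made for this example rather than material from the lab.

```cuda
#include <cuda_runtime.h>

/* Constant memory: a small, read-only region with an on-chip cache that is
   most effective when every thread in a warp reads the same address. */
__constant__ float coeffs[4];

__global__ void poly(const float *x, float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float xi = x[i];
        /* All threads in a warp read the same coeffs[j] at the same time,
           which the constant cache serves as a single broadcast. */
        y[i] = coeffs[0] + xi * (coeffs[1] + xi * (coeffs[2] + xi * coeffs[3]));
    }
}

/* Host side: constant memory is populated with cudaMemcpyToSymbol rather
   than an ordinary cudaMemcpy. */
void set_coeffs(const float *h_coeffs) {
    cudaMemcpyToSymbol(coeffs, h_coeffs, 4 * sizeof(float));
}
```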
Week 06
Lecture 10 - CUDA Shared Memory
- Introduction to Shared Memory (pdf, recording)
- Shared Memory Bank Conflicts (pdf, recording)
- Boundary Conditions (pdf, recording)
- Shared Memory Bank Conflict Calculator (xlsx file)
Lecture 11 - CUDA Performance
Lab 06 - Shared Memory
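To illustrate the shared memory ideas in Lecture 10, here is a minimal sketch that reverses each block's elements through a shared memory tile. It assumes the input length is an exact multiple of the block size; relaxing that assumption is exactly the boundary conditions issue discussed in the lecture.

```cuda
#define BLOCK_SIZE 256

/* Each block stages BLOCK_SIZE elements in shared memory and writes them
   back in reverse order. __syncthreads() guarantees every thread has
   finished loading before any thread reads another thread's slot. */
__global__ void block_reverse(const int *in, int *out) {
    __shared__ int tile[BLOCK_SIZE];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = in[i];

    __syncthreads();

    /* Consecutive threads still access consecutive shared memory banks
       (just in the opposite order), so this read is free of bank conflicts. */
    out[i] = tile[blockDim.x - 1 - threadIdx.x];
}
```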
Week 07
Lecture 12 - Warp Level CUDA
- Scheduling and Divergence (pdf, recording)
- Advanced Divergence (pdf, recording)
- Atomic and Warp Operations (pdf, recording)
Lecture 13 - Parallel Patterns
Lab 07 - Atomics and Primitives
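As a short sketch of the Lecture 12 topics, the snippet below pairs a global histogram, which needs atomicAdd to avoid lost updates, with a warp-level reduction built from __shfl_down_sync. Both functions are standard CUDA; the kernel names and bin layout are illustrative assumptions.

```cuda
/* Warp-level sum reduction: each step halves the number of contributing
   lanes by pulling a value from the lane 'offset' positions higher, with
   no shared memory or __syncthreads() needed. Lane 0 ends up with the
   warp's total (assuming a 32-lane warp). */
__device__ int warp_reduce_sum(int val) {
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_down_sync(0xffffffff, val, offset);
    return val;
}

/* Simple global histogram: threads from many blocks may increment the
   same bin, so the update must be atomic to avoid lost counts. */
__global__ void histogram(const unsigned char *data, int n, unsigned int *bins) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(&bins[data[i]], 1u);
}
```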
Week 08
Lecture 14 and 15 - Performance Optimisation
Lab 08 - Profiling Example
Week 09
Lecture 16 - Sorting and Libraries
- Sorting (Networks) (pdf, recording)
- Libraries and Thrust (pdf, recording)
- Applications of GPU Sort (pdf, recording)
Lecture 17 - CUDA Streams
- Synchronous and Asynchronous Execution (pdf, recording)
- CUDA Streams (pdf, recording)
- Synchronisation (pdf, recording)
- Multi GPU Programming (pdf, recording)
Lab 09 - Libraries and Streams
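As a small taste of the library material in Lecture 16, the sketch below sorts a million random keys on the GPU with Thrust; the vector size and use of rand() are arbitrary choices for this illustration.

```cuda
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <cstdio>
#include <cstdlib>

int main(void) {
    /* Fill a host vector with random keys, copy it to the device, then sort
       on the GPU. Thrust hides the kernel launches and device memory
       management behind an STL-like interface. */
    thrust::host_vector<int> h_keys(1 << 20);
    for (size_t i = 0; i < h_keys.size(); i++)
        h_keys[i] = rand();

    thrust::device_vector<int> d_keys = h_keys;   /* host-to-device copy */
    thrust::sort(d_keys.begin(), d_keys.end());   /* sorts the keys in place on the GPU */

    printf("smallest key = %d\n", (int)d_keys[0]);
    return 0;
}
```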
Week 10
Please find below a list of previous invited guest lectures.