COM4521 Parallel Computing with GPUs
Course Information
Accelerator architectures are discrete processing units which supplement a base processor with the objective of providing higher performance at lower energy cost. Performance is gained by a design that favours a large number of parallel compute cores at the expense of imposing significant software challenges. This module looks at accelerated computing, from multi-core CPUs to GPU accelerators with many TFLOPS of theoretical performance. The module will give insight into how to write high performance code, with particular emphasis on GPU programming using NVIDIA CUDA GPUs. A key aspect of the module is understanding the implications that program code has for the underlying hardware, so that the code can be optimised.
This module is an on-demand version of COM4521, based on the 2021 delivery when I last taught the course. The current undergraduate taught module is available on the module's public teaching page.
Course Material
Lectures are pre-recorded and are available on the COM4521 Parallel Computing with Graphical Processing Units Kaltura Channel, or as downloadable pdfs on Google Drive. Each week's practical activities (the labs) follow the ideas presented in the lectures, so it is important that you follow the lecture and lab timetable below.
Week 01
Lecture 01 - Introduction
- Course Context (pdf, recording)
- Supercomputing and Software (pdf, recording)
- Module Details (pdf, recording)
Lecture 02 - Introduction to C
- Introducing C (pdf, recording)
- Functions and Scoping (pdf, recording)
- Arrays, Strings and IO (pdf, recording)
Lab 01 - Introduction to Visual Studio and C Programming
Week 02
Lecture 03 - Memory
- Pointers (pdf, recording)
- Advanced use of Pointers (pdf, recording)
- Dynamically Managed Memory (pdf, recording)
- Structures and Binary Files (pdf, recording)
Lecture 04 - Optimisation
- Optimisation Overview (pdf, recording)
- Compute Bound Code (pdf, recording)
- Memory Bound Code (pdf, recording)
Lab 02 - Memory and Performance
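As a taster of the Week 02 material, here is a minimal sketch bringing together pointers, dynamically managed memory, structures and binary files. The record type, array size and file name are arbitrary choices made for this example; it is not taken from the lab sheets.

```c
#include <stdio.h>
#include <stdlib.h>

/* A simple record type of the kind used when writing structures to binary files. */
typedef struct {
    int id;
    float value;
} record;

int main(void) {
    const int n = 100;

    /* Allocate an array of records on the heap and fill it through a pointer. */
    record *records = (record *)malloc(n * sizeof(record));
    if (records == NULL)
        return 1;
    for (int i = 0; i < n; i++) {
        records[i].id = i;
        records[i].value = i * 0.5f;
    }

    /* Write the whole array to a binary file in a single call. */
    FILE *f = fopen("records.bin", "wb");
    if (f != NULL) {
        fwrite(records, sizeof(record), n, f);
        fclose(f);
    }

    free(records);
    return 0;
}
```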
Week 03
Lecture 05 - OpenMP
- OpenMP Overview (pdf, recording)
- Loops and Critical Sections (pdf, recording)
- Scoping and Tasks (pdf, recording)
Lecture 06 - OpenMP Part II
Lab 03 - OpenMP
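To give a flavour of this week's OpenMP material, the sketch below parallelises a loop and uses a reduction clause so that each thread accumulates a private partial sum. The array size and contents are arbitrary illustrative choices.

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;

    /* The loop iterations are divided between threads; reduction(+:sum)
       gives each thread a private partial sum which OpenMP combines when
       the parallel region ends. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        a[i] = (double)i;
        sum += a[i];
    }

    printf("Using up to %d threads, sum = %f\n", omp_get_max_threads(), sum);
    return 0;
}
```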
Week 04
Lecture 07 - GPU Architectures
Lecture 08 - Introduction to CUDA
- The CUDA Programming Model (pdf, recording)
- CUDA Device Code (pdf, recording)
- CUDA Host Code and Memory Management (pdf, recording)
Lab 04 - Introduction to CUDA
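As a hedged sketch of where the Week 04 lectures lead, the example below shows the three pieces introduced in Lecture 08: device code (the kernel), host code, and memory management. The kernel name, vector length and launch configuration are illustrative choices only.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 1024

/* Device code: each thread computes one element of the output vector. */
__global__ void vec_add(const float *a, const float *b, float *c) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)
        c[i] = a[i] + b[i];
}

int main(void) {
    float h_a[N], h_b[N], h_c[N];
    float *d_a, *d_b, *d_c;
    for (int i = 0; i < N; i++) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    /* Host code: allocate device memory and copy the inputs to the GPU. */
    cudaMalloc((void **)&d_a, N * sizeof(float));
    cudaMalloc((void **)&d_b, N * sizeof(float));
    cudaMalloc((void **)&d_c, N * sizeof(float));
    cudaMemcpy(d_a, h_a, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, N * sizeof(float), cudaMemcpyHostToDevice);

    /* Launch a grid of 256-thread blocks covering all N elements. */
    vec_add<<<(N + 255) / 256, 256>>>(d_a, d_b, d_c);

    cudaMemcpy(h_c, d_c, N * sizeof(float), cudaMemcpyDeviceToHost);
    printf("c[100] = %f\n", h_c[100]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```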
Week 05
Lecture 09 - CUDA Memory
- Memory Overview (pdf, recording)
- Global and Constant Memory (pdf, recording)
- Read Only and Texture Memory (pdf, recording)
Lab 05 - CUDA Memory
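As one small illustration of the Week 05 memory spaces, the sketch below uses constant memory for a set of polynomial coefficients. The kernel, coefficient count and helper function are assumptions made for this example rather than material from the lab.

```cuda
#include <cuda_runtime.h>

/* Constant memory: a small, read-only region with an on-chip cache that is
   most effective when every thread in a warp reads the same address. */
__constant__ float coeffs[4];

__global__ void poly(const float *x, float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float xi = x[i];
        /* All threads in a warp read the same coeffs[j] at the same time,
           which the constant cache serves as a single broadcast. */
        y[i] = coeffs[0] + xi * (coeffs[1] + xi * (coeffs[2] + xi * coeffs[3]));
    }
}

/* Host side: constant memory is populated with cudaMemcpyToSymbol rather
   than an ordinary cudaMemcpy. */
void set_coeffs(const float *h_coeffs) {
    cudaMemcpyToSymbol(coeffs, h_coeffs, 4 * sizeof(float));
}
```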
Week 06
Lecture 10 - CUDA Shared Memory
- Introduction to Shared Memory (pdf, recording)
- Shared Memory Bank Conflicts (pdf, recording)
- Boundary Conditions (pdf, recording)
- Shared Memory Bank Conflict Calculator (xlsx file)
Lecture 11 - CUDA Performance
Lab 06 - Shared Memory
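To illustrate the shared memory ideas in Lecture 10, here is a minimal sketch that reverses each block's elements through a shared memory tile. It assumes the input length is an exact multiple of the block size; relaxing that assumption is exactly the boundary conditions issue discussed in the lecture.

```cuda
#define BLOCK_SIZE 256

/* Each block stages BLOCK_SIZE elements in shared memory and writes them
   back in reverse order. __syncthreads() guarantees every thread has
   finished loading before any thread reads another thread's slot. */
__global__ void block_reverse(const int *in, int *out) {
    __shared__ int tile[BLOCK_SIZE];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = in[i];

    __syncthreads();

    /* Consecutive threads still access consecutive shared memory banks
       (just in the opposite order), so this read is free of bank conflicts. */
    out[i] = tile[blockDim.x - 1 - threadIdx.x];
}
```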
Week 07
Lecture 12 - Warp Level CUDA
- Scheduling and Divergence (pdf, recording)
- Advanced Divergence (pdf, recording)
- Atomic and Warp Operations (pdf, recording)
Lecture 13 - Parallel Patterns
Lab 07 - Atomics and Primitives
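As a short sketch of the Lecture 12 topics, the snippet below pairs a global histogram, which needs atomicAdd to avoid lost updates, with a warp-level reduction built from __shfl_down_sync. Both functions are standard CUDA; the kernel names and bin layout are illustrative assumptions.

```cuda
/* Warp-level sum reduction: each step halves the number of contributing
   lanes by pulling a value from the lane 'offset' positions higher, with
   no shared memory or __syncthreads() needed. Lane 0 ends up with the
   warp's total (assuming a 32-lane warp). */
__device__ int warp_reduce_sum(int val) {
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_down_sync(0xffffffff, val, offset);
    return val;
}

/* Simple global histogram: threads from many blocks may increment the
   same bin, so the update must be atomic to avoid lost counts. */
__global__ void histogram(const unsigned char *data, int n, unsigned int *bins) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(&bins[data[i]], 1u);
}
```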
Week 08
Lecture 14 and 15 - Performance Optimisation
Lab 08 - Profiling Example
Week 09
Lecture 16 - Sorting and Libraries
- Sorting (Networks) (pdf, recording)
- Libraries and Thrust (pdf, recording)
- Applications of GPU Sort (pdf, recording)
Lecture 17 - CUDA Streams
- Synchronous and Asynchronous Execution (pdf, recording)
- CUDA Streams (pdf, recording)
- Synchronisation (pdf, recording)
- Multi GPU Programming (pdf, recording)
Lab 09 - Libraries and Streams
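As a small taste of the library material in Lecture 16, the sketch below sorts a million random keys on the GPU with Thrust; the vector size and use of rand() are arbitrary choices for this illustration.

```cuda
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <cstdio>
#include <cstdlib>

int main(void) {
    /* Fill a host vector with random keys, copy it to the device, then sort
       on the GPU. Thrust hides the kernel launches and device memory
       management behind an STL-like interface. */
    thrust::host_vector<int> h_keys(1 << 20);
    for (size_t i = 0; i < h_keys.size(); i++)
        h_keys[i] = rand();

    thrust::device_vector<int> d_keys = h_keys;   /* host-to-device copy */
    thrust::sort(d_keys.begin(), d_keys.end());   /* sorts the keys in place on the GPU */

    printf("smallest key = %d\n", (int)d_keys[0]);
    return 0;
}
```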
Week 10
Please find below a list of previous invited guest lectures.