About the Course
Accelerator architectures are discrete processing units which supplement a base processor with the aim of providing higher performance at lower energy cost. Performance is gained by a design that favours a large number of parallel compute cores at the expense of imposing significant software challenges. This module looks at accelerated computing, from multi-core CPUs to GPU accelerators with many TFLOPS of theoretical performance. It gives insight into how to write high performance code, with specific emphasis on GPU programming using NVIDIA CUDA GPUs. A key aspect of the module is understanding the implications of program code on the underlying hardware so that it can be optimised.
The module's aims, objectives and assessment details are available on the module's public teaching page.
Lecture Notes
Lectures will take place on Mondays 15:00 until 17:00 in Diamond lecture theatre DIA-LT09. In week 5 the second half of the lecture will be a MOLE quiz (the second MOLE quiz is in the week 10 lab). Week 11 will be an invited lecture.
Lecture notes are available via Google Drive (you will need to log in with your university credentials).
Week 01: Lecture 01 - Introduction (pdf)
- Context and Hardware Trends
- Supercomputing
- Software and Parallel Computing
- Course Outline
Week 01: Lecture 02 - Introduction to C (pdf)
- Introduction to the C Programming Language
- Basic C Usage “Hello World”
- Functions and Scoping
- Arrays, Strings and Basic IO
- File IO
- Visual Studio 2013 Overview (available as a separate download)
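As a flavour of the material in this lecture, here is a minimal sketch of C functions, scoping and array usage. The code is illustrative only and not taken from the lecture notes:

```c
#include <stdio.h>

/* Functions and scoping: a helper visible to the whole file. */
int twice(int x) {
    return 2 * x;
}

/* Arrays and basic usage: sum the first n elements of an array. */
int sum_array(const int *a, int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

These can be called from a `main` that prints results with `printf`, the "Hello World" starting point covered in the lecture.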
Week 02: Lecture 03 - Memory (pdf)
- Pointers
- Advanced use of Pointers
- Dynamically Managed Memory
- Structures
- Binary Files
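The pointer, dynamic memory and structure topics above can be sketched together in a few lines. This is an illustrative example (the `record` type is hypothetical, not from the notes):

```c
#include <stdlib.h>

/* A structure grouping related data. */
typedef struct {
    int id;
    float score;
} record;

/* Dynamically allocate n records and initialise them.
   The caller is responsible for calling free() on the result. */
record *make_records(int n) {
    record *r = (record *)malloc(n * sizeof(record));
    if (r == NULL)
        return NULL;          /* allocation can fail */
    for (int i = 0; i < n; i++) {
        r[i].id = i;          /* r[i] is pointer arithmetic: *(r + i) */
        r[i].score = 0.0f;
    }
    return r;
}
```

The same `record` array could be written to a binary file in one call with `fwrite(r, sizeof(record), n, f)`.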
Week 02: Lecture 04 - Optimisation (pdf)
- Optimisation Overview
- Compute Bound Code
- Memory Bound Code
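A one-function sketch of why memory-bound code depends on access pattern, under the usual caching assumptions (illustrative, not from the notes):

```c
#define N 256

/* Row-major traversal touches memory contiguously, so it is cache
   friendly. Swapping the i/j loops strides by N elements per access
   and is typically much slower for large N, even though both loop
   orders compute exactly the same sum. */
long sum_matrix(int m[N][N]) {
    long total = 0;
    for (int i = 0; i < N; i++)       /* rows outer */
        for (int j = 0; j < N; j++)   /* columns inner */
            total += m[i][j];
    return total;
}
```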
Week 03: Lecture 05 - OpenMP (pdf)
- Multi-core Systems and OpenMP
- Parallelising Loops
- Critical Sections and Synchronisation
- OpenMP Scoping
- Task Parallelism with Sections
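A minimal sketch of a parallelised loop with a critical section, along the lines of this lecture (illustrative code; without an OpenMP-enabled compile the pragmas are ignored and the loop simply runs serially with the same result):

```c
/* Count how many entries exceed a threshold. The parallel for pragma
   splits the loop iterations across threads; the critical section
   serialises updates to the shared counter so increments are not lost. */
int count_above(const int *a, int n, int threshold) {
    int count = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (a[i] > threshold) {
            #pragma omp critical
            count++;
        }
    }
    return count;
}
```

Serialising every update is expensive; the reduction clause in the next lecture is the idiomatic fix.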
Week 03: Lecture 06 - OpenMP Part II (pdf)
- Parallel Reduction
- Scheduling
- Nesting
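The reduction and scheduling topics can be combined in one small example (illustrative, not taken from the notes):

```c
/* Parallel reduction: each thread accumulates a private partial sum
   which OpenMP combines at the end -- no critical section required.
   schedule(static) divides iterations into equal contiguous chunks. */
double dot(const double *x, const double *y, int n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```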
Week 04: Lecture 07 - GPU Architectures (pdf)
- What is a GPU?
- General Purpose Computation on GPUs (and GPU History)
- GPU CUDA Hardware Model
- Accelerated Systems
Week 04: Lecture 08 - Introduction to CUDA (pdf)
- CUDA Programming Model
- CUDA Device Code
- CUDA Host Code and Memory Management
- CUDA Compilation and execution in Visual Studio
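The whole CUDA programming model from this lecture fits in one sketch: device code (a kernel), host memory management, and a launch. Illustrative only; error checking is omitted for brevity:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

/* Device code: each thread adds one element of the vectors. */
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                       /* guard against over-launched threads */
        c[i] = a[i] + b[i];
}

/* Host code: allocate, copy to device, launch, copy back, free. */
int main(void) {
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    /* Launch a grid of 256-thread blocks, enough to cover n elements. */
    vec_add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[10] = %f\n", h_c[10]);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

This compiles with `nvcc`, the compilation flow covered in the Visual Studio section of the lecture.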
Week 05: Lecture 09 - CUDA Memory (pdf)
- Memory Hierarchy Overview
- Global Memory
- Constant Memory
- Texture and Read-only Memory
- Roundup & Performance Timing
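A short sketch of constant memory from this part of the hierarchy (illustrative; the polynomial kernel is a made-up example, not from the notes):

```cuda
#include <cuda_runtime.h>

/* Constant memory: read-only on the device, cached, and broadcast
   efficiently when all threads in a warp read the same address. */
__constant__ float coeff[4];

__global__ void apply_poly(const float *x, float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
        /* Every thread reads the same coefficients -> a broadcast read. */
        y[i] = coeff[0] + v * (coeff[1] + v * (coeff[2] + v * coeff[3]));
    }
}

/* Host side: constant memory is written with cudaMemcpyToSymbol. */
void set_coeffs(const float c[4]) {
    cudaMemcpyToSymbol(coeff, c, 4 * sizeof(float));
}
```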
Week 05: MOLE QUIZ (DIA-002, Computer room 2)
Week 06: Lecture 10 - CUDA Shared Memory (pdf)
- Shared Memory
- Shared Memory Bank Conflicts
- 2D Shared Memory Bank Conflicts
- Boundary Conditions for Shared Memory Loading
- Host-side Configurations for Shared Memory
- Shared Memory Bank Conflict Calculator
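Several of the topics above (cooperative loading, boundary conditions, padding to avoid bank conflicts) come together in the classic tiled transpose. A sketch, illustrative rather than the notes' exact code:

```cuda
#include <cuda_runtime.h>

#define TILE 32

/* Tiled matrix transpose via shared memory. Threads load a tile
   cooperatively, synchronise, then write it back transposed so both
   the global reads and writes are coalesced. The +1 padding column
   makes the row stride 33 (odd), so column-wise accesses fall in
   different banks and 32-way bank conflicts are avoided. */
__global__ void transpose(const float *in, float *out, int width) {
    __shared__ float tile[TILE][TILE + 1];   /* padded against conflicts */

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < width && y < width)              /* boundary condition */
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];

    __syncthreads();                         /* all loads before any reads */

    x = blockIdx.y * TILE + threadIdx.x;     /* transposed block origin */
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < width && y < width)
        out[y * width + x] = tile[threadIdx.x][threadIdx.y];
}
```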
Week 06: Lecture 11 - CUDA Performance (pdf)
- Global Memory Coalescing
- Global Memory Coalescing with the L1 Cache
- Occupancy and Thread Block Dimensions
Easter Vacation
Week 07: Lecture 12 - Warp Level CUDA (pdf)
- Warp Scheduling and Divergence
- Atomics
- Warp Operations
- Excel cheat sheet for SM conflicts
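The atomics and warp operations topics can each be shown in a few lines (illustrative examples, not from the notes):

```cuda
#include <cuda_runtime.h>

/* Atomics: many threads may increment the same histogram bin, so the
   read-modify-write must be atomic or updates are lost. */
__global__ void histogram(const unsigned char *data, int n, unsigned int *bins) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(&bins[data[i]], 1u);
}

/* Warp operation: sum a value across the 32 threads of a warp with
   shuffles -- no shared memory and no block-level barrier needed. */
__device__ int warp_sum(int v) {
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    return v;   /* lane 0 ends up holding the full warp sum */
}
```

Divergence matters here too: if threads in a warp take different branches, the warp executes both paths serially.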
Week 07: Lecture 13 - Parallel Patterns (pdf)
- Parallel Patterns Overview
- Reduction
- Scan
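The scan pattern is easiest to pin down via its sequential definition. A reference implementation (illustrative) that any parallel version from the lecture must reproduce:

```c
/* Sequential reference for the inclusive scan (prefix sum) pattern:
   out[i] = in[0] + in[1] + ... + in[i]. Parallel scans trade this
   single dependent loop for O(log n) data-parallel steps. */
void inclusive_scan(const int *in, int *out, int n) {
    int running = 0;
    for (int i = 0; i < n; i++) {
        running += in[i];
        out[i] = running;
    }
}
```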
Week 08: Lecture 14 - Sorting and Libraries (pdf)
- Sorting Networks
- Merge and Bitonic Sort
- Thrust Parallel Primitives Library
- Applications of Sorting (binning)
Week 08: Lecture 15 - CUDA Streams (pdf)
- Synchronous and Asynchronous execution
- CUDA Streams
- Synchronisation
- Multi GPU Programming
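A sketch of overlapping transfers with compute using streams, the central idea of this lecture (illustrative code; assumes n divides evenly into at most 8 chunks and that the host buffer is pinned):

```cuda
#include <cuda_runtime.h>

__global__ void scale(float *d, int n, float f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= f;
}

/* Split the work into chunks and issue each chunk's copy-in, kernel
   and copy-out into its own stream, so the copies of one chunk can
   overlap the kernel of another. Async copies require pinned host
   memory (allocated with cudaMallocHost). */
void scale_in_chunks(float *h_pinned, float *d, int n, int chunks) {
    cudaStream_t s[8];                 /* assumes chunks <= 8 */
    int per = n / chunks;
    for (int c = 0; c < chunks; c++) cudaStreamCreate(&s[c]);
    for (int c = 0; c < chunks; c++) {
        int off = c * per;
        cudaMemcpyAsync(d + off, h_pinned + off, per * sizeof(float),
                        cudaMemcpyHostToDevice, s[c]);
        scale<<<(per + 255) / 256, 256, 0, s[c]>>>(d + off, per, 2.0f);
        cudaMemcpyAsync(h_pinned + off, d + off, per * sizeof(float),
                        cudaMemcpyDeviceToHost, s[c]);
    }
    cudaDeviceSynchronize();           /* wait for every stream to finish */
    for (int c = 0; c < chunks; c++) cudaStreamDestroy(s[c]);
}
```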
Week 09: Lectures 16 and 17 - Performance Optimisation (pdf)
- Profiling Introduction
- The Problem
- Visual Profiler Guided Analysis
- Profiling in the lab (available as a separate download)
Week 10: BANK HOLIDAY
The MOLE quiz has moved to the following day's lab, i.e. 9:00am on Tuesday 3rd May in the Diamond High Spec Lab (DIA-206).
Week 11: Lecture 18 - Invited Talk
“Optimising single source shortest path on the GPU”, by Peter Heywood
Week 12: No lecture
Labs
Labs will take place on Tuesdays 9:00 until 10:50 in the Diamond High Spec Lab (DIA-206). Weeks 9 and 11 will be for assignment help. There will be no lab in week 12.
Lab sheets are available via Google Drive: Lab Sheets, Lab Solutions (you will need to log in with your university credentials).
A Lab FAQ document has been prepared with common questions. This will be continually updated throughout the course.
Lab 01 - C Programming
Lab 02 - Memory and Performance
Lab 03 - OpenMP
Lab 04 - Introduction to CUDA
Lab 05 - Memory
Lab 06 - Shared Memory and Occupancy
- Lab Sheet
- Revised Lab sheet with row major access
- Source Code
- Revised Source Code with row major access
- Solution for original lab sheet
- Solution for revised row major version
Lab 07 - Atomics and Primitives
Lab 08 - Libraries and Streams
No Lab Sheet Exercise or Code for Week 09 - Assignment Help
- Profiling in the Diamond Lab
- Review the profiling code from the lectures
Lab 09 - CUDA OpenGL Interop
Feedback
Anonymous feedback on the course content can be provided at any time by visiting the feedback form. For example, have you found the content useful, too easy or too difficult, or would you like some aspect of the course to be clarified?
Discussion and Announcements
Discussion and announcements will be made via the module's Google Group.
https://groups.google.com/a/sheffield.ac.uk/d/forum/com4521-group
Calendar
You can add this calendar to your University of Sheffield Google Calendar by searching for COM4521 and COM6521.
Recommended Reading
- Edward Kandrot, Jason Sanders, “CUDA by Example: An Introduction to General-Purpose GPU Programming”, Addison Wesley 2010.
- Brian Kernighan, Dennis Ritchie, “The C Programming Language (2nd Edition)”, Prentice Hall 1988.
- NVIDIA, CUDA C Programming Guide