About the Course

Accelerator architectures are discrete processing units which supplement a base processor with the objective of providing higher performance at lower energy cost. Performance is gained by a design which favours a high number of parallel compute cores, at the expense of imposing significant software challenges. This module looks at accelerated computing from multi-core CPUs to GPU accelerators with many TFlops of theoretical performance. The module will give insight into how to write high performance code, with specific emphasis on GPU programming with NVIDIA CUDA. A key aspect of the module will be understanding the implications of program code for the underlying hardware so that the code can be optimised.

The module's aims, objectives and assessment details are available on the module's public teaching page.

Assessment

Assignment

The assignment makes up 80% of the total marks for this module and is split into two related parts. The handouts will be available on MOLE from week 3 and week 6, and the assignment will be handed in in two stages.

Part 1 is due Tuesday 28th March (5pm) - Week 8
Part 2 is due Wednesday 16th May (5pm) - Week 12.

MOLE Quizzes

Two quizzes, each worth 10% of the module mark, will be held under exam conditions on:

Monday the 6th March (16:00-17:00) - Week 5 (This is during the second half of the lecture in Diamond High Spec Lab (DIA-206))
Tuesday the 25th April (9:00-10:00) - Week 9 (This will take place during normal lab hours)

Lecture Notes

Lectures will take place on Mondays 15:00 until 17:00 in Diamond lecture theatre DIA-LT09. In week 5 the second half of the lecture will be a MOLE quiz. Week 11 will be an invited lecture.

Week 01:

Lecture 01 - Introduction (pdf)

Context and Hardware Trends
Supercomputing
Software and Parallel Computing
Course Outline

Lecture 02 - Introduction to C (pdf)

Introduction to the C Programming Language
Basic C Usage “Hello World”
Functions and Scoping
Arrays, Strings and Basic IO
File IO

Additional Notes on Visual Studio (pdf)

Visual Studio 2013 Overview

Week 02:

Lecture 03 - Memory (pdf)

Pointers
Advanced use of Pointers
Dynamically Managed Memory
Structures
Binary Files

Lecture 04 - Optimisation (pdf)

Optimisation Overview
Compute Bound Code
Memory Bound Code

NSS - National Student Survey (pdf)

Week 03:

Lecture 05 - OpenMP (pdf)

Multi-core Systems and OpenMP
Parallelising Loops
Critical Sections and Synchronisation
OpenMP Scoping
Task Parallelism with Sections

Lecture 06 - OpenMP Part II (pdf)

Parallel Reduction
Scheduling
Nesting

Week 04:

Lecture 07 - GPU Architectures (pdf)

What is a GPU?
General Purpose Computation on GPUs (and GPU History)
GPU CUDA Hardware Model
Accelerated Systems

Lecture 08 - Introduction to CUDA (pdf)

CUDA Programming Model
CUDA Device Code
CUDA Host Code and Memory Management
CUDA Compilation and execution in Visual Studio

Week 05:

Lecture 09 - CUDA Memory (pdf)

Memory Hierarchy Overview
Global Memory
Constant Memory
Texture and Read-only Memory
Roundup & Performance Timing

MOLE QUIZ

Week 06:

Lecture 10 - CUDA Shared Memory (pdf)

Shared Memory
Shared Memory Bank Conflicts
2D Shared Memory Bank Conflicts
Boundary Conditions for Shared Memory Loading
Host-side Configurations for Shared Memory
Shared Memory Bank Conflict Calculator

Lecture 11 - CUDA Performance (pdf)

Global Memory Coalescing
Global Memory Coalescing with the L1 Cache
Occupancy and Thread Block Dimensions

Week 07:

Lecture 12 - Warp Level CUDA (pdf)

Warp Scheduling and Divergence
Atomics
Warp Operations

Lecture 13 - Parallel Patterns (pdf)

Parallel Patterns Overview
Reduction
Scan
Shared Memory Bank Conflicts for Examples

Week 08:

Lecture 14 - Sorting and Libraries (pdf)

Sorting Networks
Merge and Bitonic Sort
Thrust Parallel Primitives Library
Applications of Sorting (binning)

Lecture 15 - CUDA Streams (pdf)

Synchronous and Asynchronous execution
CUDA Streams
Synchronisation
Multi GPU Programming

Easter Vacation

Week 09:

Lecture 16 and 17 - Performance Optimisation (pdf)

Profiling Introduction
The Problem
Visual Profiler Guided Analysis
Profiling in the lab Notes (pdf)

Week 10: BANK HOLIDAY

Week 11: Invited Talk

Week 12: NO LECTURE

Labs

Labs will take place on Tuesdays 9:00 until 10:50 in the Diamond High Spec Lab (DIA-206). The week 9 lab will be a MOLE quiz followed by assignment help. The week 11 lab (full lab) will be for assignment help. Assignment help will also be available during any of the normal lab sessions. There will be no lab in week 12.

A Lab FAQ document has been prepared with common questions. This will be continually updated throughout the course.

Lab Register

The lab register must be completed by every student. You should complete it only once you have finished the lab exercises and reviewed the solutions. You are not expected to complete it during the lab class, but you should complete it by the end of the teaching week.

Lab Register Link: https://goo.gl/0r73gD

Getting started with the DGX-1 - Self Paced Lab

As part of the Department of Computer Science you have the opportunity to access the DGX-1 GPU system with state of the art P100 GPUs. In order to access this system I have written a handy user guide / self paced getting started lab which describes the process specifically for this module. As part of this you will need to register for an HPC account.

Getting started with the DGX-1 - Self Paced Lab

Note: It is not a requirement of the module to use this system (there are GPUs you can use in the Diamond lab). This is offered as an additional opportunity to test your code on the very latest hardware. It is also possible to use the ShARC system remotely, giving you the added benefit of being able to work on your assignment outside of the lab.

Assistance for this self paced lab is available during any normal lab session. If there is sufficient interest I will put on an additional lab session for help with this.

If you are unfamiliar with Linux and the bash shell then this is probably not for you.

Lab 01 - C Programming

Lab Sheet (pdf)
Source Code (zip)
Solution (zip)

Lab 02 - Memory and Performance

Lab Sheet (pdf)
Source Code (zip)
Solution (zip)

Lab 03 - OpenMP

Lab Sheet (pdf)
Source Code (zip)
Solution (zip)

Lab 04 - Introduction to CUDA

Lab Sheet (pdf)
Source Code (zip)
Solution (zip)

Lab 05 - Memory

Lab Sheet (pdf)
Source Code (zip)
Solution (zip)

Lab 06 - Shared Memory and Occupancy

Lab Sheet (pdf)
Source Code (zip)
Solution (zip)

Lab 07 - Atomics and Primitives

Lab Sheet (pdf)
Source Code (zip)
Solution (zip)

Lab 08 - Libraries and Streams

Lab Sheet (pdf)
Source Code (zip)
Solution (zip)

Lab 09 - MOLE QUIZ 2, Profiling and Assignment Help

Assignment help
Profiling your own code in the Diamond Lab Notes (pdf)
Solution code and profiles (zip)

Lab 10 - CUDA OpenGL Interop

Lab Sheet (pdf)
Source Code (zip)
Solution

Lab 11 - Profiling and Assignment Help

Assignment help
Profiling your own code in the Diamond Lab

Discussion, Announcements and Help

Discussion, announcements and help requests (outside of the labs) should be made via the module's public Google Group.

https://groups.google.com/a/sheffield.ac.uk/d/forum/com4521-group

Calendar

You can add this calendar to your University of Sheffield Google Calendar by searching for COM4521 and COM6521.

Recommended Reading