Dr Paul Richmond

Research Software Engineer

About the Course

Welcome to the 2020 module page for COM4521/COM6521.

Accelerator architectures are discrete processing units which supplement a base processor with the objective of providing advanced performance at lower energy cost. Performance is gained by a design which favours a high number of parallel compute cores at the expense of imposing significant software challenges. This module looks at accelerated computing from multi-core CPUs to GPU accelerators with many TFlops of theoretical performance. The module will give insight into how to write high performance code with specific emphasis on programming NVIDIA GPUs using CUDA. A key aspect of the module will be understanding the implications of program code for the underlying hardware so that it can be optimised.

The module's aims, objectives and assessment details are available on the module's public teaching page.

Lecture and Lab Location

Lectures will usually take place on Tuesdays 9:00 until 11:00 in Broad Lane Lecture Theatre 7 (BROAD-LT7). In week 5 the first half of the lecture (9:00) will be a MOLE quiz in Diamond Computer Room 3. In week 10 the second half of the lecture (10:00) will be a MOLE quiz in Diamond Computer Room 1. Note: Any other changes to lectures will be added to the course calendar with an email sent to the course mailing list.

Labs are double taught; you only need to attend one two-hour lab session per week. You should have been allocated a group by the teaching admin team.

  • Group 1 lab is run on Tuesdays 13:00-15:00 (in week 5 the lab is 11:00-13:00).
  • Group 2 lab is run on Wednesdays 9:00-11:00 (in week 6 the lab is on Friday 20th March 10:00-12:00, and there is no lab in week 10).

Lecture and Lab Material

Lecture notes and labs are available via Google Drive (you will need to log in with your university credentials) or via the links below. Lecture material will be made available shortly before the lecture each week. Lab solutions will be released after both labs have finished.

Software for the Module

The module programming exercises are designed to be completed in the Diamond high spec compute room. The room has recently been upgraded with Visual Studio 2017 and CUDA 10. If you intend to use your own machine for programming exercises (on the CUDA part of the module) then you must install the latest Community version of Visual Studio 2019 before you install the CUDA toolkit. Some versions of Visual Studio 2017 are not compatible with CUDA.

If you want to complete the exercises in Linux then example Makefiles will be provided with the lab starting code and solutions. It is not possible to build Linux CUDA programs in the high spec compute room; however, there will be lab instructions on how to remotely build and execute CUDA code on the University's High Performance Computing (HPC) system (ShARC).

Computers and Labs Available

The Diamond high spec compute room is booked for the lab classes and is always available to you during standard lab hours. Outside of lab times the room tends to be quite busy, especially towards the end of term when assignments are due. The following options are available to you:

  • The Diamond high spec compute room (compute room 4) - The room has been block booked for participants in this module and Computer Science project students (see list of bookings). If the room is occupied then machines must be vacated to provide space for you. If people are unwilling to move then consult the Diamond reception desk.
  • Diamond bookable high spec laptops - Available to borrow from the lockers on Diamond Level 4 (for use within the Diamond building only) after completing an induction. To arrange an induction contact James Screaton.
  • Your own Windows machine - Follow the instructions under “Software for the Module”.
  • Your own Linux machine - Example Makefiles are provided with the lab handouts and solutions. You will need to install CUDA 10.0.
  • ShARC - In week 8, a lab will be provided to guide you through how to use the ShARC facility to submit GPU jobs to the University's HPC system.

Lab Attendance Checking and Module Feedback

You are required to complete a lab register to indicate your progress with the lab exercises each week. You should fill this in once you have completed the lab exercises and reviewed the solutions. You are not expected to complete all of the lab exercises within the two-hour lab slot; you are expected to undertake independent study, but you must try to complete the labs before the start of the next lab. The purpose of the lab register is to monitor class progress so that we can provide feedback, clarifications on difficult areas and additional assistance to you. Please use the additional comments section to tell us whether you found the content useful, too easy or too difficult, or whether you would like some aspect of the course to be clarified.

Lab Register and Feedback Form

Note: There is one form section per lab class. Please complete this form only once, and edit your previous submission each week to update it.


You will receive feedback from the lab registers during lectures and lab classes. Feedback will also be provided through the MOLE quizzes and via marking of the assignments. Importantly, feedback on your first assignment can be used to improve your assignment 2 hand-in.


There are two assignments for the module, which together contribute 80% of the total module mark:

  • Part 1 (30% of the assignment total) - Released Week 3, Due Monday week 7 (23/03/2020) at 15:00
  • Part 2 (70% of the assignment total) - Released Week 6, Due Monday week 11 (18/05/2020) at 15:00

The remaining 20% of the module mark comes from two MOLE quizzes:

  • Week 5 - During the first half of the lecture (9:00 10/03/2020) in Diamond compute room 3
  • Week 10 - During the second half of the lecture (10:00 05/05/2020) in Diamond compute room 1

DDP students and staff candidates

PhD students and research/academic staff are not required to undertake assessment but are asked to complete the lab feedback form as evidence of participation in the module. You are free to attend either lab session, subject to availability; however, please note that lab space is prioritised for undergraduate and master's students.

Discussion, Announcements and Requests for Help

A Google group has been created for announcements, help and discussion. Any changes to timetabling will be made via this group. All students enrolled on the module on the 29th January 2020 have been added to this group already. If you have transferred via Add/Drop then you will need to manually join the group yourself. The group is monitored by the teaching staff (including lab assistants) as well as additional PhD students who can provide help with the lab classes or assignment. This is a public (within the University of Sheffield) forum and therefore you should not post assignment code. If you require personal assistance then you should request this during the lab hours. Any lab class can be used for assignment help in addition to the lab exercises which are set.


Teaching Material

Week 01

Lecture 01 - Introduction (pdf)

  • Context and Hardware Trends
  • Supercomputing
  • Software and Parallel Computing
  • Course Outline

Lecture 02 - Introduction to C (pdf)

  • Introduction to the C Programming Language
  • Basic C Usage “Hello World”
  • Functions and Scoping
  • Arrays, Strings and Basic IO
  • File IO
  • Extra Material: Visual Studio 2017 Overview (pdf)

Lab 01 - Introduction to Visual Studio and C Programming

Week 02

Lecture 03 - Memory (pdf)

  • Pointers
  • Advanced use of Pointers
  • Dynamically Managed Memory
  • Structures
  • Binary Files

Lecture 04 - Optimisation (pdf)

  • Optimisation Overview
  • Compute Bound Code
  • Memory Bound Code

Lab 02 - Memory and Performance

Week 03

Lecture 05 - OpenMP (pdf)

  • Multi-core Systems and OpenMP
  • Parallelising Loops
  • Critical Sections and Synchronisation
  • OpenMP Scoping
  • Task Parallelism with Sections

Lecture 06 - OpenMP Part II (pdf)

  • OpenMP Timing
  • Parallel Reduction
  • Scheduling
  • Nesting

Lab 03 - OpenMP

Week 04

Lecture 07 - GPU Architectures (pdf)

  • What is a GPU?
  • General Purpose Computation on GPUs (and GPU History)
  • GPU CUDA Hardware Model
  • Accelerated Systems

Lecture 08 - Introduction to CUDA (pdf)

  • CUDA Programming Model
  • CUDA Device Code
  • CUDA Host Code and Memory Management
  • CUDA Compilation and execution in Visual Studio

Lab 04 - Introduction to CUDA

Week 05

Quiz BEFORE lecture 9

Lecture 09 - CUDA Memory (pdf)

  • Memory Hierarchy Overview
  • Global Memory
  • Constant Memory
  • Texture and Read-only Memory
  • Roundup & Performance Timing

Lab 05 - CUDA Memory

Week 06


Week 6 lectures (i.e. Lectures 10 and 11) are pre-recorded on Encore. Follow the link below for the lecture content.


Lecture 10 - CUDA Shared Memory (pdf)

  • Shared Memory
  • Shared Memory Bank Conflicts
  • 2D Shared Memory Bank Conflicts
  • Boundary Conditions for Shared Memory Loading
  • Host-side Configurations for Shared Memory
  • Shared Memory Bank Conflict Calculator (xlsx file)

Lecture 11 - CUDA Performance (pdf)

  • Global Memory Coalescing
  • Global Memory Coalescing with the L1 Cache
  • Occupancy and Thread Block Dimensions

Lab 06 - Shared Memory


Week 07


Week 7 lectures are pre-recorded on Encore. Follow the link below for the lecture content.


Lecture 12 - Warp Level CUDA (pdf)

  • Warp Scheduling and Divergence
  • Atomics
  • Warp Operations
  • Excel sheet for SM conflicts

Lecture 13 - Parallel Patterns (pdf)

  • Parallel Patterns Overview
  • Reduction
  • Scan

Lab 07 - Atomics and Primitives

Week 08


Week 8 lectures are pre-recorded on Encore. Follow the link below for the lecture content.


Lectures 14 and 15 - Performance Optimisation (pdf)

Lab 08 - Profiling Example

  • Online Lab Class Instructions. Note: As labs have moved to instancehub, there is no need to undertake a lab on ShARC, which is extremely busy with remote research work. This lab class can be used for help with any of the lab exercises or for assignment help.

  • Profile Lecture Example Code

Week 09


Week 9 lectures are pre-recorded on Encore. Follow the link below for the lecture content.


Lecture 16 - Sorting and Libraries (pdf)

  • Sorting Networks
  • Merge and Bitonic Sort
  • Thrust Parallel Primitives Library
  • Applications of Sorting (binning)

Lecture 17 - CUDA Streams (pdf)

  • Synchronous and Asynchronous execution
  • CUDA Streams
  • Synchronisation
  • Multi GPU Programming

Lab 09 - Libraries and Streams

Week 10

MOLE Quiz during lecture time

This MOLE quiz will no longer be assessed, but it can still be taken and the answers reviewed. You are strongly encouraged to take this quiz, as it provides formative feedback which will help you to gauge your understanding.

Lab 10 - Assignment Help

Week 11

Invited Lecture

Please find below a list of previous invited lectures.

Lab 11 - Optional OpenGL Lab

Note: The OpenGL lab is only suitable for those who have completed the undergraduate Computer Graphics module. The content of this lab is entirely optional and is not assessed as part of the module. You can also attend this lab to get help with any of the lab classes or your assignment.

Week 12

No Lectures or Labs


You can add this calendar to your University of Sheffield Google Calendar by searching for COM4521 and COM6521.

Recommended Reading

The following are useful resources but not required reading.

  • Edward Kandrot, Jason Sanders, “CUDA by Example: An Introduction to General-Purpose GPU Programming”, Addison Wesley 2010.
  • Brian Kernighan, Dennis Ritchie, “The C Programming Language (2nd Edition)”, Prentice Hall 1988.
  • NVIDIA, “CUDA C Programming Guide”.