- https://www.nat-esm.de/services/workshops-and-trainings/events/gpu-programming-using-cuda
- 🎓 GPU Programming using CUDA
- 2026-01-20T09:00:00+01:00
- 2026-01-29T16:00:00+01:00
- CUDA, as the native programming model of NVIDIA GPUs, allows much finer-grained control over parallel execution than higher-level programming models such as OpenMP offloading, which helps to optimize performance. The course starts with an introduction to the programming language CUDA, which is used to write fast numeric algorithms for NVIDIA GPUs. The second week introduces more features of GPUs and how to use them with CUDA. Correctness checking and kernel-level profiling are also covered, along with some more advanced topics.
Jan 20, 2026, 09:00 AM to Jan 29, 2026, 04:00 PM (Europe/Berlin / UTC+1)
Online
The course provides an introduction to the programming language CUDA, which is used to write fast numeric algorithms for NVIDIA GPUs. The focus is on the basic usage of the language, the exploitation of the most important features of the GPU (massively parallel computation, shared memory), and efficient usage of the hardware to maximize performance. An overview of the available development tools and some advanced features of the language is also given.
The course is split into two parts held in two consecutive weeks:
Part 1 starts with an introduction to CUDA, covers the most basic topics necessary and ends with comments on usage of CUDA with more modern C++.
Part 2 introduces more features of GPUs and how to use them with CUDA. Correctness checking and kernel-level profiling are also covered, along with some more advanced topics.
You may attend only part 2 if you already have a good basic knowledge of CUDA (see the topics of part 1 in the agenda).
Prerequisites and content levels
For part 1 (introductory)
Programming experience in any of C, C++, or Fortran. Exercises will use a Linux cluster, so you should have some basic knowledge of how to work with a Linux shell and a text editor in a shell. Possible resources are https://ubuntu.com/tutorials/command-line-for-beginners for the shell and https://opensource.com/article/19/3/getting-started-vim for an editor. Some knowledge of parallel programming is a plus.
For part 2 (advanced)
In addition to the prerequisites above, you should be familiar with the topics of part 1.
Content levels
- Basic: 12 hours
- Intermediate: 6 hours
- Advanced: 3 hours
Learn more about course curricula and content levels.
Instructors
Tobias Haas (HLRS)
Learning outcomes
After this course, participants will:
- be familiar with the CUDA programming model,
- have basic knowledge of performance optimization and profiling of CUDA code,
- be aware of available correctness checking tools,
- have seen a first approach to multi-GPU programming,
- have an overview of important CUDA libraries.
Agenda (preliminary)
All times are local times in the central European time zone (Berlin).
Drop in to the video conference (8:45 - 9:00)
The course takes place from 9:00 to 12:30 on each day.
Cluster dry run on Mon Jan 19
Part 1 (Tue - Thu, Jan 20 - 22)
Day 1
- Basics about CUDA
- Kernel, kernel launch, host/device functions
- Memory management: host and managed memory
- Synchronization
- Error handling
Day 2
- Profiling and NVTX annotations
- Memory management: pinned and device, unified memory
- Overview of CUDA libraries
Day 3
- GPU architecture
- Performance optimization: memory access patterns and cache
- Coalesced memory access
- Modern C++ and GPUs
Part 2 (Mon - Thu, Jan 26 - 29)
Day 4
- Shared and constant memory
- Bank conflicts
- Atomic operations
Day 5
- CUDA streams
- Introduction to Multi-GPU
Day 6
- Warp shuffle, CUB
- cooperative_groups
Day 7
- Kernel-level profiling and correctness checking
- Other programming methods (using base language constructs, pragmas and libraries)
- Interoperability with OpenMP
Handouts
Each participant will get access to all slides (PDF).
Exercises
Although this is an online course, the exercises will be highly interactive and use breakout rooms. Participants will work on HLRS's systems.
Registration information
Apply for this course via the button at the top of this page.
Registration closes on January 7, 2026.
Fees
- Employees of the HLRS or the Jülich Supercomputing Centre (JSC) or the Leibniz Supercomputing Centre (LRZ): 0 Euro
- Students without a master’s degree or equivalent: 32.50 Euro
- PhD students or employees at a German university or public research institute: 67.50 Euro
- PhD students or employees at a university or public research institute in an EU, EU-associated or PRACE country other than Germany: 135 Euro
- PhD students or employees at a university or public research institute outside of EU, EU-associated or PRACE countries: 270 Euro
- Other participants, e.g., from industry, other public service providers, or government: 690 Euro
Our course fee includes coffee breaks (in classroom courses only).
For lists of EU and EU-associated countries and of PRACE countries, see the Horizon Europe and PRACE websites.