Optimization & Vectorization

Universiteit Utrecht - Information and Computing Sciences

academic year 2018/19 – 1st block




News

Recent news

December 12:

  • The retake assignment is now available, download by clicking here.
  • Task: make it faster. Grading: speedup * 2.0f, clamped to the range [1, 10].
  • Deadline: January 7th 2019, 23:59.
  • Submission: please send your solution as a mail with a dropbox link or equivalent.
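For clarity, the grading rule above can be sketched as a small helper (hypothetical; `retakeGrade` is not part of the assignment code):

```cpp
#include <algorithm>

// Hypothetical helper mirroring the stated rule:
// grade = speedup * 2.0f, clamped to the range [1, 10].
float retakeGrade( float speedup )
{
    return std::min( 10.0f, std::max( 1.0f, speedup * 2.0f ) );
}
```

Note that a 5x speedup already earns the maximum grade.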

November 7:

  • Exam grades are now available in the preliminary final grades list.
  • Exam with answers can be downloaded here.

Image courtesy of Cas Laugs. Thanks to all INFOMOV 2018 participants!

Older news still available here.

Course Overview


Course: INFOMOV is a practical course on optimization: the art of improving software performance, without affecting functionality. We apply high level and low level optimizations, in a structured manner. Especially for the low level optimizations, we must intimately understand the hardware platform (CPU, GPU, memory, caches) and modify our code to use it efficiently.

Vectorization: Modern processors achieve their performance levels using parallel execution. This happens on the thread level, but also on the instruction level. Being able to produce efficient vectorized code is an important factor in achieving peak performance.

GPGPU: Graphics processors employ a streaming code execution model, taking vectorization to extremes, both in the programming model and the underlying architecture. Leveraging GPU processing power is an important option when optimizing existing code.

Context: Optimization is a vital skill for game engine developers, but also applies to other fields.

Lecturer: Jacco Bikker (j.bikker@uu.nl or bikker.j@gmail.com)

Comms: join us on Slack: INFOMOV2018.


  • Mondays, 09:00h - 12:45h
    Room BBG-001
  • Wednesdays, 09:00h - 12:45h
    Room BBG-001

The course will be taught in English. Warning: A decent level of C/C++ is expected. Expect a significantly higher workload if you are a C++ novice.

Files


Cross-platform version of the C/C++ template used in the course.
OpenCL template for the GPGPU lectures.
OpenCL template for those who prefer C#.

Resources will be made available during the course.

Lecture Slides & Recommended Readings

Below is a list of all lectures with a very brief summary of the topics, slides downloads, and recommended readings to prepare for the lecture.
This list is tentative.

Lecture 01
Mon Sep 10

Topic: Introduction & Profiling  This lecture serves as an introduction to the course. With or without knowledge of optimization, it proves hard to 'guess' application performance bottlenecks; profiling is therefore a vital first (and often repeated) step in a structured approach to optimization.

Suggested readings:

Designing for Performance, Scalability & Reliability: StarCraft II's Approach

Lecture 1 - introduction
Profiling tutorial
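Dedicated profilers do the heavy lifting, but a minimal wall-clock timer is handy for quick A/B measurements. A sketch (hypothetical helper, not part of the course template):

```cpp
#include <chrono>

// Minimal stopwatch: construct it, run the code to measure, read elapsedMs().
struct Timer
{
    std::chrono::high_resolution_clock::time_point start =
        std::chrono::high_resolution_clock::now();
    float elapsedMs() const
    {
        return std::chrono::duration<float, std::milli>(
            std::chrono::high_resolution_clock::now() - start ).count();
    }
};
```

Keep in mind that wall-clock timings fluctuate; take the best (or median) of several runs.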

Lecture 02
Wed Sep 12

Topic: Low Level Optimization  In this lecture, we explore various low level factors that determine application performance.

Suggested readings:

Michael Karbo, Inside the CPU (Chapter 30 from PC Architecture)

Lecture 2 - low level

Lecture 03
Wed Sep 19

Topic: Caching (1)   Considering the huge latencies involved in fetching data from RAM, caches play a crucial role in 'feeding the beast'. We explore various cache architectures and investigate the implications for software development.

Suggested readings:

What Every Programmer Should Know About Memory

Lecture 3 - caching (1)
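To illustrate why this matters in code, a minimal sketch (illustrative, not lecture material): both loops below touch the same 4 MB grid, but the row-major loop walks memory sequentially and reuses each fetched cache line, while the column-major loop strides a full row per access and keeps missing.

```cpp
const int W = 1024, H = 1024;
static float grid[H][W]; // 4 MB, larger than a typical L2 cache

// Sequential access: consecutive iterations hit the same cache line.
void rowMajor()    { for (int y = 0; y < H; y++) for (int x = 0; x < W; x++) grid[y][x] += 1.0f; }

// Strided access: each iteration jumps W * sizeof(float) = 4 KB; typically far slower.
void columnMajor() { for (int x = 0; x < W; x++) for (int y = 0; y < H; y++) grid[y][x] += 1.0f; }
```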

Lecture 04
Mon Sep 24

Topic: Caching (2)  Continuation of the topic of the previous lecture.

Suggested readings:

Game Programming Patterns - Data Locality

Lecture 4 - caching (2)

Lecture 05
Wed Sep 26

Topic: SIMD (1)  With CPU clock speeds reaching practical limits, parallelism becomes the main source of further advances in hardware performance. In this lecture, Intel's approach to SIMD programming (SSE) is introduced.

Lecture 5 - SIMD(1)
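A first taste of the SSE intrinsics introduced here (an illustrative sketch; it assumes n is a multiple of 4, real code must handle the remainder):

```cpp
#include <immintrin.h>

// c[i] = a[i] + b[i], four lanes per iteration.
void addArrays( const float* a, const float* b, float* c, int n )
{
    for (int i = 0; i < n; i += 4)
    {
        __m128 va = _mm_loadu_ps( a + i );            // load 4 floats
        __m128 vb = _mm_loadu_ps( b + i );
        _mm_storeu_ps( c + i, _mm_add_ps( va, vb ) ); // 4 additions, 1 instruction
    }
}
```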

Lecture 06
Mon Oct 1

Topic: SIMD (2)  Building on the concepts of lecture 5, we investigate advanced SIMD topics such as gather / scatter and masking.

Suggested readings:

Looking for 4x speedups? SSE to the rescue! (warning: by Intel)
SIMD Tutorial

Lecture 6 - SIMD(2)
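Masking, one of the topics above, can be sketched with plain SSE as a branchless per-lane select (illustrative, not lecture code):

```cpp
#include <xmmintrin.h>

// Per-lane minimum via masking: result[i] = a[i] < b[i] ? a[i] : b[i].
__m128 minMasked( __m128 a, __m128 b )
{
    __m128 mask = _mm_cmplt_ps( a, b );          // lanes where a < b become all-ones
    return _mm_or_ps( _mm_and_ps( mask, a ),     // keep a where the mask is set,
                      _mm_andnot_ps( mask, b ) );// b everywhere else
}
```

SSE4.1 later added _mm_blendv_ps for exactly this select pattern.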

Lecture 07
Wed Oct 3

Topic:  Data-Oriented Design  Where Object-Oriented Design focuses on ease of development and maintainability, Data-Oriented Design focuses on data layout that is optimal for performance. Targeting the computer rather than the developer has significant impact on software architecture, but also on performance.

Suggested readings:

Data-Oriented Design (Or Why You Might Be Shooting Yourself in the Foot With OOP)

Lecture 7 - Data-Oriented Design
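The classic illustration of the idea is the array-of-structures versus structure-of-arrays layout (a minimal sketch, not lecture code):

```cpp
#include <vector>

// AoS: convenient for the developer; hot and cold fields share cache lines.
struct ParticleAoS { float x, y, vx, vy; /* ...plus rarely used data */ };

// SoA: each field is contiguous, so a loop that only updates positions
// streams exactly the data it needs (and vectorizes trivially).
struct ParticlesSoA
{
    std::vector<float> x, y, vx, vy;
    void update( float dt )
    {
        for (std::size_t i = 0; i < x.size(); i++) x[i] += vx[i] * dt;
        for (std::size_t i = 0; i < y.size(); i++) y[i] += vy[i] * dt;
    }
};
```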

Lecture 08
Mon Oct 8 

Topic: GPGPU (1)  For certain problems, a streaming processor is a good (and powerful) alternative to the CPU. In this lecture, we briefly explore GPU architecture and the concept of GPGPU.

Lecture 8 - GPGPU (1)

Lecture 09
Wed Oct 10

Topic: GPGPU (2)  Building on the previous lecture, we investigate the implementation of a Verlet fluid simulator on the GPU.

Lecture 9 - GPGPU (2)

Stage 1 - Stage 2 - Stage 3 - Stage 4

Lecture 10
Mon Oct 15

Topic: GPGPU (3)  Transferring optimization concepts to the GPU: profiling, high-level, low-level, caches, other architecture considerations. GPGPU-specific algorithms for common problems.

Suggested readings:

A Survey of General-Purpose Computation on Graphics Hardware

Lecture 10 - GPGPU (3)

Lecture 11
Wed Oct 17

Topic: Fixed Point Math  Floating point calculations can often be done with integer arithmetic, and there are good reasons for doing so. In this lecture, the 'lost art' of fixed point arithmetic is introduced.

Suggested readings:

The Neglected Art of Fixed Point Arithmetic

Slide preview:
Lecture 11 - Fixed Point
Final slides will be made available after the lecture.
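The core of the 16.16 format can be sketched as follows (illustrative helpers, not lecture code):

```cpp
#include <cstdint>

// 16.16 fixed point: top 16 bits integer part, bottom 16 bits fraction.
typedef int32_t fixed;
const int FP_SHIFT = 16;

fixed toFixed( float f ) { return (fixed)(f * (1 << FP_SHIFT)); }
float toFloat( fixed f ) { return f / (float)(1 << FP_SHIFT); }

// Multiplying two 16.16 numbers yields a 32.32 result; use a 64-bit
// intermediate and shift back down to avoid overflow.
fixed fpMul( fixed a, fixed b ) { return (fixed)(((int64_t)a * b) >> FP_SHIFT); }
```

Addition and subtraction work directly on the raw representation; only multiplication and division need the extra shift.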

Lecture 12
Mon Oct 22

Topic: Efficient Multithreading  Multi-core processing, NUMA architectures and hyperthreading, and the consequences for cache behavior. Worker threads as the basis for efficient parallel systems.

Lecture 12 - Multi-threading
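The worker idea can be sketched as follows (illustrative, not lecture code): each thread owns a disjoint slice of the data, so no synchronization is needed beyond the final join.

```cpp
#include <algorithm>
#include <numeric>
#include <thread>
#include <vector>

long long parallelSum( const std::vector<int>& data )
{
    unsigned n = std::max( 1u, std::thread::hardware_concurrency() );
    std::vector<long long> partial( n, 0 );   // one accumulator per worker
    std::vector<std::thread> pool;
    size_t chunk = (data.size() + n - 1) / n; // ceil( size / n )
    for (unsigned t = 0; t < n; t++)
        pool.emplace_back( [&, t]
        {
            size_t begin = t * chunk, end = std::min( data.size(), begin + chunk );
            for (size_t i = begin; i < end; i++) partial[t] += data[i];
        } );
    for (std::thread& th : pool) th.join();
    return std::accumulate( partial.begin(), partial.end(), 0LL );
}
```

Note that the adjacent accumulators in `partial` still share cache lines, a false-sharing hazard the lecture discusses; padding each accumulator to cache-line size is the usual fix.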

Lecture 13
Wed Oct 24

Topic: Snippets  Various examples of optimizations that worked and didn't work.

Lecture 13 - Snippets

Lecture 14
Mon Oct 29

Guest lecture: Vincent Hindriksen, StreamHPC. Topic to be announced. Lecture starts at 10AM!


Lecture 15
Wed Oct 31

Topic: Process & Grand Recap  Final lecture: a recap of the structured optimization process and of various key concepts, plus exam preparation.


Lecture 15 - Grand Recap

Course Schedule

Period 1 Schedule (tentative)

Date          Lecture / Exams                         Practicum / Deadlines

Mon Sep 10    Lecture 1: Introduction & Profiling     First practicum: profiling tutorial (document, project files).
Wed Sep 12    Lecture 2: Low Level Optimization       P1 will be introduced.
Mon Sep 17    No lecture
Wed Sep 19    Lecture 3: Caching (1)                  P2 will be introduced.
                                                      Thu Sep 20, 23:59: Deadline Assignment 1 (click here for P1 details).
                                                      Fri Sep 21, 23:59: Extended Deadline (-1 pt).
Mon Sep 24    Lecture 4: Caching (2)
Wed Sep 26    Lecture 5: SIMD (1)
Mon Oct 1     Lecture 6: SIMD (2)                     P3 will be introduced.
Wed Oct 3     Lecture 7: Data-Oriented Design         Thu Oct 4, 23:59: Deadline Assignment 2 (click here for P2 details).
                                                      Fri Oct 5, 23:59: Extended Deadline (-1 pt).
Mon Oct 8     Lecture 8: GPGPU (1)
Wed Oct 10    Lecture 9: GPGPU (2)
Mon Oct 15    Lecture 10: GPGPU (3)                   P4 will be introduced.
Wed Oct 17    Lecture 11: Fixed Point Math            Thu Oct 18, 23:59: Deadline Assignment 3 (click here for P3 details).
                                                      Fri Oct 19, 23:59: Extended Deadline (-1 pt).
Mon Oct 22    Lecture 12: Efficient Multithreading
Wed Oct 24    Lecture 13: Snippets
Mon Oct 29    Lecture 14: Guest lecture (StreamHPC)
Wed Oct 31    Lecture 15: Process & Grand Recap       Thu Nov 1, 23:59: Final Assignment deadline.
                                                      Fri Nov 2, 23:59: Extended Deadline (-1 pt).
Tue Nov 6     Final Exam, 17:00, BEATRIX 7th floor


Assignments

All assignments can be done alone or in teams of two students. For teams: it is not mandatory to do all four assignments with the same partner; switching is allowed.

Assignment P1 - LOW LEVEL

For assignment 1, you will apply low-level optimizations to Wu's anti-aliased line rendering algorithm. The code for this project can be downloaded below, along with the formal assignment description, which also contains grading details.


Assignment 1 description
Evolution project
Evolution optimized (for reference / grading) and win32 version.

Deadline: Thursday September 20, 23:59. Extended deadline (1pt penalty): Friday September 21, 23:59.

Assignment P2 - CACHING

Assignment 2 revolves around caching. You will implement a cache simulator to speed up the rendering of some fractals. Code and details can be downloaded below.


Assignment 2 description
Fractal project (Or alternatively: C# version)

Deadline: Thursday October 4, 23:59. Extended deadline (1pt penalty): Friday October 5, 23:59.


Assignment P3 - SIMD

For the third assignment, you will get hands-on experience with the SIMD theory, as well as with neural networks. Code and details are available below.


Assignment 3 description

Deadline: Thursday October 18, 23:59. Extended deadline (1pt penalty): Friday October 19, 23:59.



Assignment P4 - FINAL ASSIGNMENT

In this final assignment, you will apply all techniques covered in the course to speed up a small real-time strategy game.


Assignment 4 description

Deadline: Thursday November 1, 23:59. Extended deadline (1pt penalty): Friday November 2, 23:59.

Alternative Assignment P4

Instead of working on the standard P4 assignment, you may also propose your own project. This is intended for those who want to use the INFOMOV course to optimize a specific application. Please contact me for details before proceeding.


Exam & Grading


Programming assignments: Your practical grade P is based on three programming assignments P1, P2, P3 (20% each) and one final assignment P4 (40%).

Exam: Your exam / theory grade T is based on a single final exam.

Final grade: Your final grade is (3P + T) / 4. You must score at least 4.0 on the exam (before rounding) to pass this course.
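As a sketch of the computation (hypothetical helper names):

```cpp
// Practical grade P from the stated weights; final grade from (3P + T) / 4.
float practicalGrade( float p1, float p2, float p3, float p4 )
{
    return 0.2f * (p1 + p2 + p3) + 0.4f * p4;
}
float finalGrade( float P, float T ) { return (3 * P + T) / 4; }
```

For example, P = 8.0 and T = 6.0 give a final grade of (24 + 6) / 4 = 7.5.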


Retake: To qualify for a retake, the final grade must be at least 4 (before rounding). You may repair your final grade by redoing one of the four assignments, or the exam. Exact terms will be discussed individually.


Literature & Links

Overview of literature used during this course:

Previous editions


News Archive

Old posts

November 2:

October 31:

  • Slides for the final lecture are now online.
  • Combined grade overview (P1, P2, P3) can be downloaded. Please inform me of any errors / omissions.

October 30:

  • For next lecture: have a look at the 2016 and 2017 exams.

October 24:

  • Slides for lecture 13 ("Snippets") now available. Madness!
  • Note: postponed guest lecture starts at 10am on Monday morning.
  • Consider reading this OpenCL tutorial from the Concurrency course.

October 19:

October 17:

  • Final slides for lecture 11 have been uploaded.
  • Preview of lecture 12 slides now available.

October 15:

October 10:

  • Slides for lecture 9 ("GPGPU(2)") now online, along with 4 stages of the 'balls' application.
  • Preview of lecture 10 slides now also available.
  • Final assignment - assignment 4 now available!

October 8:

  • Updated grade lists for assignment P1 and P2.

October 5:

October 2:

September 26:

  • Gravity code.
  • Slides for lecture 5 ("SIMD(1)") and lecture 6 (experiment, "SIMD(2)") now available.

September 24:

September 21:

  • Grades for the first practical can be downloaded here (batch #1, late submissions to follow).

September 19:

September 18:

  • Assignment 2 details have been posted.
  • Assignment 2 now also available in C#.

September 12:

September 11:

  • Assignment 1 is now available. This will be introduced properly tomorrow.

September 10:

  • Reminder: Wednesday lecture starts at 11:00am (after working lecture, which starts at 9:00).
  • Wednesday working lecture content is the same as Monday, so pick your ideal slot.
  • Join us on Slack! infomov2018.slack.com

September 1:

August 6:

  • Initial version of the 2018/2019 website.

Old posts will be archived here.