Optimization & Vectorization

Universiteit Utrecht - Information and Computing Sciences

academic year 2015/16 – 1st period

title image title image title image



Lectures & Slides

Examination & Grading

Course Overview


Literature & Links


Recent news

November 2:

October 21:

  • Slides for lecture 14 on GPGPU(2) added.
  • Three stages for the Verlet particle simulation added to the files section.

October 14:

October 7:

  • Slides for lecture 10 with practical optimization examples added.

October 5:

October 2:

  • Significant changes to the schedule.
  • Grades for the first assignment can be downloaded here.

September 28:


Course Overview back to navigation

bunny logo image

Course: INFOMOV is a practical course on optimization: the art of improving software performance, without affecting functionality. We apply high level and low level optimizations, in a structured manner. Especially for the low level optimizations, we must intimately understand the hardware platform (CPU, GPU, memory, caches) and modify our code to use it efficiently.

Vectorization: Modern processors achieve their performance levels using parallel execution. This happens on the thread level, but also on the instruction level. Being able to produce efficient vectorized code is an important factor in achieving peak performance.

Context: Optimization is a vital skill for game engine developers, but also applies to other fields.

Lecturer: Jacco Bikker (j.bikker@uu.nl)


  • Mondays, 11:00h - 12:45h,
    Room BBG-023 (starting week #37)
  • Wednesdays, 11:00h - 12:45h,
    Room BBG-083 (starting week #37)

The lecture will be given in English.

Files back to navigation


Overview of example projects used in the course

balls stage 1.zip
balls stage 2.zip
balls stage 3.zip

Lecture Slides & Recommended Readingsback to navigation

Below is a list of all lectures with a very brief summary of the topics, slides downloads, and recommended readings to prepare for the lecture.

Lecture 01
Mon Sep 7

Topic: Introduction This lecture serves as an introduction to the course.


lecture1 - introduction.pdf

Lecture 02
Wed Sep 9

Topic: Profiling With or without knowledge of optimization, it proves hard to 'guess' application performance bottlenecks. Profiling is a vital first (and often repeated) step in a structured approach to optimization.

Suggested readings:

Designing for Performance, Scalability & Reliability: StarCraft II's Approach


lecture2 - profiling.pdf

Lecture 03
Mon Sep 14

Topic: Low Level Optimization In this lecture, we explore various low level factors that determine application performance.

Suggested readings:

Michael Karbo, Inside the CPU (Chapter 30 from PC Architecture)


lecture3 - low level.pdf

Lecture 04
Wed Sep 16

Topic: Caching (1) Considering the huge latencies involved in fetching data from RAM, caches play a crucial role in 'feeding the beast'. We explore various cache architectures and investigate implications in software development.

Suggested readings:

What Every Programmer Should Know About Memory


lecture4 - caching (1).pdf

Lecture 05
Mon Sep 21

Topic: Caching (2) Continuation of the topic of the previous lecture.

Suggested readings:

Game Programming Patterns - Data Locality


lecture5 - caching (2).pdf

Lecture 06
Wed Sep 23

Topic: SIMD (1) With CPU clock speeds reaching practical limits, parallelism becomes the main source of further advances in hardware performance. In this lecture, Intel's approach to SIMD programming (SSE) is introduced.


lecture6 - simd (1).pdf

Lecture 07
Mon Sep 28

Topic: SIMD (2) Building on the concepts of lecture 7, we investigate advanced SIMD topics such as gather / scatter and masking.

Suggested readings:

Looking for 4x speedups? SSE to the rescue! (warning: by Intel)


lecture7 - simd (2).pdf

Lecture 08
Wed Sep 30 

Because of the Control Conference in Utrecht, there will be no lecture on this day. If you are not attending the conference, you are welcome to join an extra long lab session in BBG-083.

 Lecture 09
Mon Oct 5

Topic: Fixed Point Math Floating point calculations can often be done with integer arithmetic, and there are good reasons for doing so. In this lecture, the 'lost art' of fixed point arithmetic is introduced.

Suggested readings:

The Neglected Art of Fixed Point Arithmetic


lecture9 - fixed point.pdf

Lecture 10
Wed Oct 7

Topic: Applied Optimization - Inserted by popular demand - During this lecture, we will explore various high level and low level optimizations in small-scale applications. This is a code-intensive lecture.




lecture10 - practical.pdf

Lecture 11
Mon Oct 12

Topic: Presentations In preparation of the final assignment, you are invited to (briefly) present an optimization problem, along with literature on the topic. Use this session for peer feedback and inspiration.

GROUP A   (see e-mail)
Team Marcus, Team Asscheman

Lecture 12
Wed Oct 14

Topic: GPGPU (1) For certain problems, a streaming processor is a good (and powerful) alternative to the CPU. In this lecture, we briefly explore GPU architecture and the concept of GPGPU.


lecture12 - gpgpu (1).pdf

Lecture 13
Mon Oct 19

Topic: Presentations In preparation of the final assignment, you are invited to (briefly) present an optimization problem, along with literature on the topic. Use this session for peer feedback and inspiration.

GROUP B   (see e-mail)
Team Sen, Team de Heer, Team Hadjicosti, Team Kuijper, Team Vermeulen, Team Loubos, Team Molenaar, Team Li, Team Carvalhal, Team Lardinoije (FULL)

Lecture 14
Wed Oct 21

Topic: GPGPU (2) Building on the previous lecture, we investigate some GPGPU-specific algorithms for common problems.
And: Optimizing for GPU Like CPU software, GPGPU code benefits from hardware-specific optimizations. Several examples for AMD and NVidia are explored.


balls stage 1.zip
balls stage 2.zip
balls stage 2.zip


lecture14 - gpgpu (2).pdf

Lecture 15
Mon Oct 26

Topic: Presentations In preparation of the final assignment, you are invited to (briefly) present an optimization problem, along with literature on the topic. Use this session for peer feedback and inspiration.

GROUP C   (see e-mail)
Team Born, Team Kerrebijn, Team Gramberg, Team Visser, Team Octopus, Team Vasile, Team van de Hoef, Team Dortmont, Ronald Dommisse, Team Daniëlse (FULL)

Lecture 16
Wed Oct 28

Topic: Process & Grand Recap Successful software optimization is the result of a structured and deliberate process. We review this process, and review the course as a whole.


lecture16 - process & recap.pdf


Course Schedule back to navigation

Period 1 Schedule

Week Date Lecture / Exams Practicum Deadlines
Literature / practice
Mon Sep 7
Lecture 1:
Wed Sep 9
Lecture 2:

Please watch: "Designing for Performance, Scalability & Reliability: StarCraft II's Approach". Have a look at the run-time profiling mechanism in water.zip. Consider how this could be efficiently extended to a generic run-time profiling system, and how this can be extended beyond measuring just time.
Mon Sep 14
Lecture 3:
Low level optimization
Wed Sep 16
Lecture 4:
Caching (1)

Mon Sep 21
Lecture 5:
Caching (2)
Wed Sep 23
Lecture 6:
SIMD (1)

Mon Sep 28
Lecture 7:
SIMD (2)
 Tue Sep 29: Deadline Assignment 1
"Cache Simulator"
At 9.30 (before the lecture) there will be a lab session in BBG-023. Feel free to join.
Wed Sep 30
Lecture 8:
Lab session / Control Conference

Mon Oct 5
Lecture 9:
Fixed Point Math
   At 9.30 (before the lecture) there will be a lab session in BBG-023. Feel free to join.
Wed Oct 7
Lecture 10:
Applied Optimization

Mon Oct 12
Lecture 11:
Presentations (group A)
Tue Oct 13: Deadline Assignment 2
"SIMD" / "RTS"
 At 9.30 (before the lecture) there will be a lab session in BBG-023. Feel free to join.
Wed Oct 14
Lecture 12:

Mon Oct 19
Lecture 13:
Presentations (group B)
   At 9.30 (before the lecture) there will be a lab session in BBG-023. Feel free to join.
Wed Oct 21
Lecture 14:

Mon Oct 26
Lecture 15:
Presentations (group C)
   At 9.30 (before the lecture) there will be a lab session in BBG-023. Feel free to join.
Wed Oct 28
Lecture 16:
Process & Grand Recap



Thu Nov 5: Deadline Final Assignment


Assignment P1

For the first assignment, you will implement an accurate cache simulator for a current architecture (AMD and/or Intel). The simulator is configured by specifying sizes and latencies for L1, L2, L3 and RAM (and optionally, the eviction policy), and interfaces with an application using a READ and WRITE function for each valid datum size (8, 16, 32, 64 and 128 bit).
Deliver the cache simulator and a test application, along with a (brief) report that describes the simulator.
For this project, you may either work alone or together with one other student.

Deadline: Tuesday, September 29, 23.59.
Late delivery: Wednesday, September 30, 23.59 (1pt penalty).

Assignment P2

For this assignment, you have two options:
1. Vectorize the UpdateParticles function in gravity.zip. This is a solo project. See slides for lecture 7 for details.
2. Apply the structured optimization process to a minimalistic real-time strategy game. See slides for lecture 7 for details.

Deadline: Tuesday, October 13, 23.59.
Late delivery: Wednesday, October 14, 23.59 (1pt penalty).

Assignment P3

For the final assignment, you are asked to optimize a real-world application of your choice. Valid options include: an application developed as part of a research project, an open source application, or a personal project. The application must meet the following criteria:

  • It must be of a significant scope (thousands of lines of code);
  • The application must benefit from optimization;
  • Optimization must be possible at a low level, i.e. in principle the code to be affected must be written in C++;
  • It must be possible to profile the application, i.e. in principle you must have access to the full source code;
  • Ideally, both high level and low level optimizations must be applicable to the application.

In short: using this project, you will show your optimization skills. The application must offer sufficient opportunity to do this.


  • A brief presentation on your project;
  • The optimized application;
  • A report detailing the optimization process.

Deadline: Thursday, November 5, 23.59.
Late delivery: Friday, November 6, 23.59 (1pt penalty). 

Exam & Grading back to navigation


Programming assignments: Your grade is based on two programming assignments (25% each) and a final assignment (50%).

Final grade: This course is graded based on the practical assignments alone.


Retake: To qualify for a retake, the final grade must be at least 4 (before rounding). You may repair your final grade by redoing one of the three assignments. Exact terms will be discussed individually.


Literature & Links back to navigation

Overview of literature used during this course:


News Archive back to navigation

Old posts

September 23:

  • Slides for lecture 6 are online.
  • Gravity example can be obtained here (version with SIMD backdrop: game.cpp).

September 21:

  • Schedule updated: lab on Monday September 28, 9.30; other changes.
  • Slides for lecture 5 are online.
  • Rotozoom cache example can be obtained here.

September 16:

  • Cache simulator demo app: fractal.zip.
  • Slides for lecture 4 are online (includes assignment 1 details).

September 14:

  • Slides for lecture 3 are online.
  • Download a pratice project for lecture 3 on low level optimization here.
  • C++ Template can be obtained here.

September 10:

  • Slides for lecture 2 are online. For next lecture: have a look at the 'literature / practice' section in the schedule.
  • Updated version of water.zip (Visual Studio 2015 compatible) is now available.

September 9:

  • Water example for lecture 2.

September 7:

July 30:

  • Site created.