Participants are expected to be familiar with the basics of probability and statistics, multivariable calculus, and linear algebra (for linear algebra, familiarity with basic vector/matrix notation should be sufficient).
We recommend the following online materials for self-study:

  1. MIT course Introduction to Probability. An excellent course that covers a lot of material (not just probability, as the name suggests, but also statistical inference).
  2. Khan Academy course Multivariable Calculus.

Note: these courses cover a lot more material than is actually required for the data mining course!


The required literature consists of the lecture slides (see the schedule), the lecture notes, and selected book chapters and articles. Below, we specify the required literature per subject. Where a part is optional additional reading, this is stated explicitly.


Classification Trees, Regression Trees, Bagging and Random Forests

Undirected Graphical Models (Markov Random Fields)

Frequent Item Set Mining

Text Mining

Bayesian Networks

Social Network Mining