To pass the course you have to write a personal (i.e., on your own) essay in which you convince the reader that you have mastered the course material.


In your essay you are to explain the main results of Riondato and Upfal, as discussed during the course. While explaining these results, you are bound to use other results and concepts we discussed during the course, e.g., PAC learning. You should also explain those concepts; one could say that you have to write a recursive explanation.

These papers use PAC learning to derive sampling bounds for frequent item set mining. In the introduction to frequent item set mining we already encountered a simple sampling result by Toivonen. Indeed, the reason to study PAC learning was to see whether or not we could improve on Toivonen's results. So, did we improve the bounds? You should end your essay with a comparison of the two bounds. A theoretical comparison is mandatory; an experimental one is optional (well-written optional parts will give you a bonus point).

For your essay, you can/should use 5-6 pages. You can use math, but it is not necessary. You should not give a verbatim list of definitions and theorems, but explain in your own words what these things mean; use formal notation only where exactness is necessary.

Finally, please use a spellchecker and make sure that words mean what you think they mean.

You should submit your essay (a pdf file) by April 11, 9 am as an attachment to an email that has in the subject the string [ESSAY BIG DATA], your name and your student number. Please also provide a title page with the same information. Using your name and student number in the name of your file would be a nice gesture.

The deadline for the retake is July 8, at 12 midnight.

The slides of the 10th lecture give some more details; the authoritative description can be found here. The LaTeX sources are:

Again, note that Section 5 of this template is optional.