You are expected to work in groups of two.

Although this page is in English, you are allowed to write your report in Dutch.

Assignment 1

For assignment 1 you will have to connect a Python or C# program to a SQLite database.
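
As a starting point, the connection could be set up with Python's standard sqlite3 module. This is only a minimal sketch: the in-memory database, the helper name open_database and the column mpg are assumptions for illustration; the real autompg table is the one offered with the assignment.

```python
import sqlite3

def open_database(path=":memory:"):
    """Open a SQLite database and return the connection."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # lets you index rows by column name
    return conn

conn = open_database()

# Tiny stand-in for the real autompg table (the columns are assumptions).
conn.execute("CREATE TABLE autompg (brand TEXT, cylinders INTEGER, mpg REAL)")
conn.executemany(
    "INSERT INTO autompg VALUES (?, ?, ?)",
    [("ford", 4, 26.0), ("volkswagen", 4, 29.0), ("ford", 8, 14.0)],
)

# Always pass values as parameters instead of pasting them into the SQL text.
rows = conn.execute(
    "SELECT * FROM autompg WHERE brand = ?", ("ford",)
).fetchall()
print([dict(row) for row in rows])
```

To work with the provided database, pass the path of its file to open_database instead of using the in-memory default.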

Ranking on query results

The goal of this assignment is to implement the ideas of the article by Agrawal, Chaudhuri et al. This means you have to solve both the zero-answers problem and the many-answers problem. We provide a table and a query workload.

The program should be able to process conjunctive equality queries (ceq) on the table. A ceq consists of predicates of the form attr = value, separated by commas and terminated by a semicolon. An input consists of a ceq and is read by a (simple) GUI. The required value for k is entered using the same syntax. When it is missing, use the default value k = 10. Example inputs:

k = 6, brand = 'volkswagen';
cylinders = 4, brand = 'ford';
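
One possible way to read such an input is sketched below. The function name parse_ceq and the regular expression are assumptions made for this illustration, and the sketch does not handle quoted values that themselves contain a comma.

```python
import re

DEFAULT_K = 10

# One predicate: attr = 'text' or attr = number.
_PREDICATE = re.compile(r"^\s*(\w+)\s*=\s*(?:'([^']*)'|(-?\d+(?:\.\d+)?))\s*$")

def parse_ceq(text):
    """Return (k, predicates) for an input such as "k = 6, brand = 'ford';"."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("input must end with a semicolon")
    k = DEFAULT_K
    predicates = {}
    # Naive split: a quoted value containing a comma is not supported.
    for part in text[:-1].split(","):
        match = _PREDICATE.match(part)
        if match is None:
            raise ValueError(f"cannot parse predicate: {part!r}")
        attr, text_value, number = match.groups()
        if text_value is not None:
            value = text_value
        elif "." in number:
            value = float(number)
        else:
            value = int(number)
        if attr == "k":
            k = int(value)  # k is not a predicate on the table
        else:
            predicates[attr] = value
    return k, predicates

print(parse_ceq("k = 6, brand = 'volkswagen';"))
print(parse_ceq("cylinders = 4, brand = 'ford';"))
```

The k-value is separated from the predicates right away, so the rest of the program never has to treat it as an attribute.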

The basic query is
SELECT * FROM autompg WHERE ceq;
In the ceq, the commas are replaced by ANDs and the k-value is left out.
The output consists of the top-k tuples according to some ranking principle.
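
The translation to SQL and the selection of the top-k tuples could look roughly like this. The names build_query and top_k are hypothetical, the scoring function is a placeholder (here simply the assumed column mpg), and the small in-memory table only stands in for the real autompg table.

```python
import heapq
import sqlite3

def build_query(predicates, table="autompg"):
    """Turn {"cylinders": 4, "brand": "ford"} into SQL text plus parameters."""
    for attr in predicates:
        if not attr.isidentifier():  # column names cannot be bound as parameters
            raise ValueError(f"invalid attribute name: {attr!r}")
    where = " AND ".join(f"{attr} = ?" for attr in predicates)
    sql = f"SELECT * FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return sql, list(predicates.values())

def top_k(rows, k, score):
    """Return the k rows with the highest score, best first."""
    return heapq.nlargest(k, rows, key=score)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE autompg (brand TEXT, cylinders INTEGER, mpg REAL)")
conn.executemany(
    "INSERT INTO autompg VALUES (?, ?, ?)",
    [("ford", 4, 26.0), ("volkswagen", 4, 29.0),
     ("ford", 8, 14.0), ("fiat", 4, 31.0)],
)

sql, params = build_query({"cylinders": 4})
rows = conn.execute(sql, params).fetchall()

# Placeholder ranking: prefer a high mpg. A real solution would use the metadb.
best = top_k(rows, k=2, score=lambda row: row[2])
print(sql)
print(best)
```

heapq.nlargest avoids sorting the full result when k is small. Note that this sketch only ranks tuples that match exactly; it does not address the zero-answers problem.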

In fact, you have to write two programs. The first program preprocesses the data and/or the workload. During this phase, a meta database (metadb) is constructed and filled. The second program processes queries and shows the answers, using the metadb for ranking. Note that your metadb should be constructed only once, for the given contents of the database, before a batch of queries is processed.
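
To give an idea of what the preprocessing program might store, the sketch below fills one meta table with an inverse-document-frequency style weight, log(n / count), for every value of a categorical attribute. The table name idf_meta, its columns and the function name are assumptions for this illustration; check the article for the exact statistics your ranking needs.

```python
import math
import sqlite3

def build_idf_table(data_conn, meta_conn, attribute, table="autompg"):
    """Store log(n / count) for every value of `attribute` in the metadb."""
    n = data_conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    counts = data_conn.execute(
        f"SELECT {attribute}, COUNT(*) FROM {table} GROUP BY {attribute}"
    ).fetchall()
    meta_conn.execute(
        "CREATE TABLE IF NOT EXISTS idf_meta ("
        " attribute TEXT, value TEXT, idf REAL,"
        " PRIMARY KEY (attribute, value))"
    )
    meta_conn.executemany(
        "INSERT OR REPLACE INTO idf_meta VALUES (?, ?, ?)",
        [(attribute, str(value), math.log(n / count))
         for value, count in counts],
    )
    meta_conn.commit()

# Stand-in data; the real program would open the provided database file.
data = sqlite3.connect(":memory:")
data.execute("CREATE TABLE autompg (brand TEXT, cylinders INTEGER)")
data.executemany(
    "INSERT INTO autompg VALUES (?, ?)",
    [("ford", 4), ("ford", 8), ("volkswagen", 4), ("fiat", 4)],
)

meta = sqlite3.connect(":memory:")
build_idf_table(data, meta, "brand")
idf = dict(meta.execute(
    "SELECT value, idf FROM idf_meta WHERE attribute = 'brand'"))
print(idf)
```

Rare values receive a higher weight than frequent ones. The query program only reads idf_meta and never has to rescan the data table.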

Your software must meet at least requirement [1], in which case your maximum score is 8. Each of the requirements [2] and [3] that you handle satisfactorily raises your maximum score by one point.

[1] deal with similarity properties of numerical attributes
[2] use sophisticated techniques for finding value-similarities
[3] use sophisticated techniques for top-k calculations
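
Requirement [1] leaves the choice of similarity measure to you. One possibility, assumed here purely for illustration, is a Gaussian kernel whose bandwidth h follows the rule of thumb h = 1.06 * sigma * n^(-1/5), with sigma the standard deviation of the attribute and n the number of tuples. The function names and the sample values below are hypothetical.

```python
import math
import statistics

def bandwidth(values):
    """Rule-of-thumb bandwidth: h = 1.06 * sigma * n ** (-1/5)."""
    n = len(values)
    sigma = statistics.pstdev(values)  # population std. dev., a choice made here
    return 1.06 * sigma * n ** (-1 / 5)

def numeric_similarity(t, q, h):
    """Similarity in (0, 1] between tuple value t and query value q."""
    if h == 0:  # all values identical: fall back to exact matching
        return 1.0 if t == q else 0.0
    return math.exp(-0.5 * ((t - q) / h) ** 2)

cylinders = [4, 4, 6, 8, 8, 4, 6, 4]  # made-up sample of an attribute
h = bandwidth(cylinders)
for t in (4, 6, 8):
    print(t, numeric_similarity(t, 4, h))
```

The bandwidth depends only on the data, so it can be computed during preprocessing and stored in the metadb.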

The deliverables are:

  • A text file metadb.txt, containing the data definitions required for your meta database (which
    is filled during the preprocessing)
  • A text file metaload.txt, containing the SQL statements used to fill the metadb
  • A C#/Python program to determine and fill the contents of the metadb, based on the preprocessing of data and the workload
  • A C#/Python program to deal with the queries
  • A description of your approach, explaining choices and describing experiences. It contains a class diagram of your second program. It also contains an extensive discussion of your approach towards solving both problems. Format: pdf; max 6 pages. This report is also important when grading your work.
  • Finally, you will have to give a short demonstration of your program to one of the assistants.

Zip your files before submitting. The deadline for assignment 1 is Tuesday May 31. Details about the submission will follow.