[DB Seminar] Fall 2015: Bryan Hooi / Hyun Ah Song
Suppose you are a teacher and have to convey a set of object-property pairs (‘lions eat meat’ or ‘aspirin is a blood thinner’). A good teacher conveys a lot of information with little effort on the student's side. Specifically, given a list of objects (like animals or medical drugs) and their associated properties, what is the best and most intuitive way to convey this information to the student without overwhelming them? A related, harder problem is: how can we assign a numerical score to each lesson plan (i.e., each way of conveying the information)? Here, we give a formal definition of this problem of forming learning units, and we provide a metric based on information theory for comparing different approaches. We also design a multi-pronged algorithm, HYTRA, for this problem. HYTRA is scalable (near-linear in the dataset size); it is effective, achieving excellent results on real data with respect to both our proposed metric and encoding length; and it is intuitive, conforming to well-known educational principles such as grouping related concepts and “comparing” and “contrasting”. Experiments on real and synthetic datasets demonstrate the effectiveness of HYTRA.