[DB Seminar] Spring 2020 DB Group: Mostly Order Preserving Dictionaries

Event Date: Monday April 13, 2020
Event Time: 04:30pm EDT
Speaker: Chunwei Liu [INFO]

Title: Mostly Order Preserving Dictionaries

Dictionary encoding, or domain encoding, is an important form of compression that uses a bijective mapping to replace attributes from a large domain (i.e. strings) with a finite domain (i.e. 32 bit integers). This encoding both reduces data storage and allows for more efficient query execution. Traditional dictionary encoding only supports efficient equality queries, while range queries require that encoded values are decoded for evaluating the predicates. An order preserving dictionary allows for range queries without decoding by ensuring that encoded keys follow the same order as the values in the dictionary. While this approach enables efficient queries it requires that the full set of values is known to create the mappings. In this talk, I will show how we can bridge this gap by introducing mostly ordered dictionaries that use a best-effort dictionary generation. I will cover the query rewriting rules for mostly ordered dictionaries to minimize predicate evaluation latency, and will demonstrate how mostly order-preserving dictionaries can accelerate range filtering and sorting, with graceful performance degradation as the ratio of ordered values decreases.

Zoom Link:

Chunwei Liu is a 4th-year Computer Science Ph.D. student at the University of Chicago. He is working in ChiData Group advised by Professor Aaron J. Elmore. His broad research primarily focuses on distributed systems and databases. He is particularly interested in developing new compression techniques for data systems and building a time-series database for edge devices. He received his B.S. at Beihang University, an M.S. degree at Chinese Academy of Sciences.