MS Thesis Defense: Filter Representation in Vectorized Query Execution (Amadou Ngom)
Advances in memory capacity have allowed Database Management Systems (DBMSs) to store large amounts of data in memory, thereby shifting the performance bottleneck of query execution from disk accesses to CPU efficiency (i.e., instruction count and cycles per instruction). One technique used to achieve such efficiency in analytical applications is batch-oriented processing or vectorization: it reduces interpretation overhead, improves cache locality, and allows for efficient loop optimizations (e.g., loop unrolling, SIMD vectorization). For each vector (i.e., a batch of tuples), the DBMS needs to track the set of valid tuples after a predicate application. To that end, systems employ one of two data structures, filter representations: Selection Vectors (SelVecs) or Bitmaps. In this work, we analyze each approach’s strengths and weaknesses to provide recommendations on how to implement vectorized operations. Through a wide range of micro-benchmarks, we determine that the optimal implementation strategy is a function of many factors: the cost of iterating through tuples, the cost of the operation itself, and how amenable it is to SIMD vectorization. Our analysis shows that Bitmaps perform better for operations that can be efficiently vectorized using SIMD instructions but that SelVecs perform better on all other operations due to a cheaper iteration logic.
- Andy Pavlo
- Todd Mowry