[DB Seminar] Spring 2017: Mohammad Hammoud

Event Date: Monday April 10, 2017
Event Time: 04:45pm
Location: GHC 8102
Speaker: Mohammad Hammoud

Title: PolyHJ: A Polymorphic Main-Memory Hash Join Paradigm For Multi-Core Machines

Relational join is a fundamental data management operation that strongly influences the performance of almost every database query. In this talk, I will show that different workload characteristics and hardware configurations necessitate different main-memory hash join models. I will then identify four effective models by which any hash-based join algorithm can be executed. I will characterize the relative merits of each model and present PolyHJ, a novel polymorphic join scheme that dynamically selects the best model for a given set of workload features and hardware settings. In addition, PolyHJ executes an efficient implementation of the selected model, which incorporates redesigned versions of the partitioning, building, and probing phases of classical hash joins. Specifically, it introduces two new mechanisms: in-place, cache-aware partitioning (ICP) and collaborative building and probing (ColBP). Together, ICP and ColBP improve scalability, increase cache locality, and save memory bandwidth on multi-core machines. In particular, ICP increases cache locality and saves memory bandwidth by reusing cached blocks of the input relations. ColBP, in turn, reduces partitioning cost and enhances scalability by allowing each hash table to be as large as the total size of the last-level cache (LLC) in chip multi-core machines. This design stems from our study of modern high-end CPUs, in which we observed that per-thread cache capacity has remained largely unchanged for over a decade, while LLC capacity has grown with the number of cores.
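
For readers unfamiliar with the three phases named in the abstract, the following is a minimal, single-threaded Python sketch of a classical partitioned hash join (partition, build, probe). It is not PolyHJ, ICP, or ColBP, and it does not model caches or multiple cores. The function name `partitioned_hash_join`, the `num_partitions` parameter, and the `(key, payload)` tuple layout are assumptions made purely for illustration.

```python
from collections import defaultdict


def partitioned_hash_join(build_rows, probe_rows, num_partitions=4):
    """Equi-join two lists of (key, payload) tuples on key.

    Illustrative only: a textbook partitioned hash join, not the
    PolyHJ implementation described in the talk.
    """
    # Partitioning phase: scatter both inputs by hash of the join key,
    # so matching keys always land in the same partition pair.
    build_parts = [[] for _ in range(num_partitions)]
    probe_parts = [[] for _ in range(num_partitions)]
    for key, payload in build_rows:
        build_parts[hash(key) % num_partitions].append((key, payload))
    for key, payload in probe_rows:
        probe_parts[hash(key) % num_partitions].append((key, payload))

    results = []
    for b_part, p_part in zip(build_parts, probe_parts):
        # Building phase: one small hash table per build partition.
        table = defaultdict(list)
        for key, payload in b_part:
            table[key].append(payload)

        # Probing phase: look up each probe tuple in that table.
        for key, payload in p_part:
            for match in table.get(key, ()):
                results.append((key, match, payload))
    return results


if __name__ == "__main__":
    r = [(1, "a"), (2, "b"), (2, "c"), (5, "e")]
    s = [(2, "x"), (3, "y"), (1, "z"), (2, "w")]
    for row in sorted(partitioned_hash_join(r, s)):
        print(row)
```

In a real main-memory join, the number of partitions would typically be chosen so that each per-partition hash table fits in cache; the abstract's observation about per-thread cache versus LLC capacity concerns exactly that sizing decision.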

Mohammad Hammoud is an Assistant Teaching Professor at Carnegie Mellon University in Qatar, where he teaches and researches distributed systems, cloud computing, database applications, and parallel computer architecture. Mohammad has a broad interest in anything that can help distributed and parallel computer systems run faster. His current focus is on designing and developing scalable graph analytics frameworks as well as database systems for modern hardware.