Ippokratis Pandis (Cloudera)
On-line transaction processing (OLTP) is one of the two most important enterprise data management applications. Transaction processing workloads typically exhibit high concurrency and provide ample opportunities for parallel execution on multicore hardware. Unfortunately, due to the characteristics of the application, transaction processing systems must moderate and coordinate communication between independent agents. As a result, communication bottlenecks prevent these systems from always converting abundant request-level parallelism into execution parallelism. To improve the scalability of transaction processing, we identify three forms of communication in the system (unbounded, fixed, and cooperative) and argue that only the first type poses a fundamental threat to scalability. We then present and evaluate, under a common framework, techniques that attack significant sources of unbounded communication during transaction processing, and sketch a solution for those that remain. All the techniques are implemented in Shore-MT, which is the most scalable open-source storage manager. The solutions we present affect fundamental services of any transaction processing engine, such as locking, logging, and physical page accesses. They either reduce unbounded communication, downgrade it to a less threatening type, or eliminate it completely through system redesign.
We find that the last approach, system redesign, is the most effective. The final design, based on data-oriented transaction execution, cuts unbounded communication by almost two orders of magnitude compared with the baseline, and exhibits more predictable behavior and better scalability. This predictable behavior makes it possible to offload to hardware a large fraction of the complex transaction processing functionality that underutilizes the capabilities of modern general-purpose processors. Hence, in the last part, we make the case for a “bionic” database system design that enables operational analytics on truly live data.
Ippokratis Pandis is a software engineer at Cloudera, working on the Impala project and focusing on scalable hardware-aware data management. Before he started riding Impalas, he was a member of the research staff at IBM Almaden Research Center, where he was part of the core team that designed and implemented the BLU column-store engine, which currently ships as part of IBM's DB2 LUW v10.5 with BLU Acceleration. Ippokratis received his PhD from the Electrical and Computer Engineering department at Carnegie Mellon University. He is the recipient of Best Demonstration awards at ICDE 2006 and SIGMOD 2011 and of the Best Paper runner-up award at the Outrageous Ideas and Vision track of CIDR 2013. He has served or is serving as PC chair for DaMoN 2014 and DaMoN 2015 and as PC area chair for the Main-memory, Parallel and Distributed Database systems area of CIKM 2014.