Shasank Chavan (Oracle)
Analytic workloads in data management systems are dominated by the costs of joins, aggregations, scans, and filters. In-memory columnar databases have significantly optimized scans using compressed data formats and SIMD vectorization techniques, but have had little impact on the rest of the query execution plan. The Oracle Database In-Memory (DBIM) Option introduced new SQL execution operators that accelerate a wide range of analytic queries by optimizing aggregation over joins for star and similar schemas. Group-by expressions are pushed down into the scans of dimension tables, creating a unique key per distinct group called a Dense Grouping Key (DGK). A structure called a Key Vector is then allocated that maps join keys to DGKs; it is used to filter out non-matching rows during the fact table scan. Passing rows are aggregated directly on compressed codes into DGK-indexed result buffers using SIMD and other novel aggregation techniques. Our solution replaces traditional join and group-by processing (Bloom filters, hash table build and probe, serial aggregation) with fast inlined scan operators. Our technique can reduce query elapsed time by more than 10x, making real-time analytics achievable.
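The flow described above (dimension scan assigns DGKs, a Key Vector maps join keys to DGKs, the fact scan filters and aggregates into DGK-indexed buffers) can be illustrated with a minimal sketch. This is not Oracle's implementation: it is plain Python with no compression or SIMD, it assumes join keys are small non-negative integers so the Key Vector can be a flat array, and all function, table, and column names are hypothetical.

```python
# Illustrative sketch only; names and data layout are assumptions, not Oracle APIs.

def build_key_vector(dim_rows, key_col, group_col, predicate):
    """Scan the dimension table, assign one dense grouping key (DGK) per
    distinct group value, and map each passing join key to its DGK."""
    dgk_of_group = {}  # group value -> DGK (1-based; 0 is reserved for "no match")
    max_key = max(row[key_col] for row in dim_rows)
    key_vector = [0] * (max_key + 1)  # assumes small non-negative integer keys
    for row in dim_rows:
        if not predicate(row):
            continue  # filtered-out dimension rows keep DGK 0
        dgk = dgk_of_group.setdefault(row[group_col], len(dgk_of_group) + 1)
        key_vector[row[key_col]] = dgk
    return key_vector, dgk_of_group


def aggregate_fact(fact_rows, fk_col, measure_col, key_vector, num_groups):
    """Scan the fact table once: the key vector lookup acts as the join
    filter, and the DGK indexes directly into the result buffer."""
    sums = [0] * (num_groups + 1)  # slot 0 is unused
    for row in fact_rows:
        fk = row[fk_col]
        dgk = key_vector[fk] if 0 <= fk < len(key_vector) else 0
        if dgk:  # DGK 0 means the row has no matching dimension row
            sums[dgk] += row[measure_col]
    return sums


# Hypothetical star schema: SUM(amount) GROUP BY region for active stores.
stores = [
    {"store_id": 0, "region": "West", "active": True},
    {"store_id": 1, "region": "East", "active": True},
    {"store_id": 2, "region": "West", "active": True},
    {"store_id": 3, "region": "North", "active": False},
]
sales = [
    {"store_id": 0, "amount": 10},
    {"store_id": 1, "amount": 5},
    {"store_id": 2, "amount": 7},
    {"store_id": 3, "amount": 100},  # dropped: store is filtered out
    {"store_id": 1, "amount": 3},
    {"store_id": 9, "amount": 50},   # dropped: no such store
]

key_vector, dgk_of_group = build_key_vector(
    stores, "store_id", "region", lambda r: r["active"]
)
sums = aggregate_fact(sales, "store_id", "amount", key_vector, len(dgk_of_group))
result = {group: sums[dgk] for group, dgk in dgk_of_group.items()}
print(result)
```

The point of the sketch is that the fact scan needs neither a hash table probe nor a separate group-by step: one array lookup both filters the row and yields the index of its aggregation slot.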
Shasank Chavan is an architect and director of the In-memory Data Technologies group at Oracle Corporation. He is primarily responsible for driving and delivering core performance-critical and customer-facing data layer features in Oracle's Database In-Memory option. His team designs and develops CPU-specific "software-in-silicon" libraries for columnar data evaluation, optimized data formats and compression technology for efficient in-memory storage, algorithms and techniques for fast in-memory join and aggregation processing, a multi-threaded scan execution engine with push-down technology, and optimized in-memory data access solutions in general. Shasank has over 17 years of experience working on systems software technology.