[PDL Visit Day 2018] Zahra Khatami (Oracle)
SPDK has been successful in enabling a large class of high-performance user-mode storage applications and appliances. It provides direct access to local NVMe SSDs as well as access to remote storage targets using NVMe over Fabrics (NVMeoF). SPDK provides a highly concurrent, asynchronous runtime with no locking in the I/O path; high throughput and low latency are achieved by directly polling the hardware queues for completions. The DPDK toolkit is used for memory management and for lock-free message passing between compute threads, enabling efficient scale-out designs.
In this talk, we will go over the challenges faced in incorporating SPDK within a complex enterprise-class application: Oracle RDBMS. There is a significant impedance mismatch between the deployment models of classical SPDK-enabled applications and the Oracle process and memory model. An Oracle database consists of a large number of processes (more than 10,000 per node) and a large System Global Area (SGA) that is symmetrically mapped into each process. Oracle RDBMS implements a comprehensive memory management infrastructure spanning the SGA, process-private memory (PGA), and a Managed Global Area (MGA) optimized for efficient data transfer over high-performance networks such as InfiniBand and RoCE.
Most NVMe SSDs expose only a limited number of hardware IO queues. To enable high-performance IO dispatch and completion from a large number of processes to local NVMe drives, Oracle has implemented a dispatcher model. The IO dispatcher provides lightweight dispatch of IOs using lock-free queues in shared memory and polls for completions. The Oracle dispatcher implements various scheduling policies that optimize overall throughput and latency depending on the workload, as well as QoS-aware scheduling of IOs.
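The dispatcher idea can be illustrated with a small single-threaded model. This is only a sketch of the control flow, not Oracle's implementation: the names `SubmissionRing`, `FakeHardwareQueue`, and `Dispatcher` are hypothetical, the round-robin policy is an assumed example of a scheduling policy, and a real version would keep the ring indices in shared memory and update them with atomic operations rather than plain Python integers.

```python
from collections import deque


class SubmissionRing:
    """Fixed-capacity single-producer/single-consumer ring, one per client process.

    Model only: head and tail are plain integers here, where a real ring
    would place them in shared memory and use atomic loads and stores.
    """

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0  # next slot the dispatcher reads
        self.tail = 0  # next slot the client writes

    def push(self, io):
        if self.tail - self.head == len(self.slots):
            return False  # ring is full; the client retries later
        self.slots[self.tail % len(self.slots)] = io
        self.tail += 1
        return True

    def pop(self):
        if self.head == self.tail:
            return None  # ring is empty
        io = self.slots[self.head % len(self.slots)]
        self.head += 1
        return io


class FakeHardwareQueue:
    """Stand-in for one device queue with a limited depth (assumption)."""

    def __init__(self, depth):
        self.depth = depth
        self.inflight = deque()

    def has_room(self):
        return len(self.inflight) < self.depth

    def submit(self, io):
        self.inflight.append(io)

    def poll_completions(self):
        # The fake device completes everything in flight at once.
        done = list(self.inflight)
        self.inflight.clear()
        return done


class Dispatcher:
    """Drains many client rings into one limited-depth queue, round-robin."""

    def __init__(self, rings, hwq):
        self.rings = rings
        self.hwq = hwq
        self.cursor = 0  # remembers where the last pass stopped, for fairness

    def run_once(self):
        for _ in range(len(self.rings)):
            if not self.hwq.has_room():
                break  # check room before popping so no IO is dropped
            ring = self.rings[self.cursor]
            self.cursor = (self.cursor + 1) % len(self.rings)
            io = ring.pop()
            if io is not None:
                self.hwq.submit(io)
        return self.hwq.poll_completions()


# Three client processes, two IOs each, sharing a queue of depth 2.
rings = [SubmissionRing(capacity=4) for _ in range(3)]
for pid, ring in enumerate(rings):
    for seq in range(2):
        ring.push((pid, seq))

dispatcher = Dispatcher(rings, FakeHardwareQueue(depth=2))
completed = []
while any(r.head != r.tail for r in rings):
    completed.extend(dispatcher.run_once())
```

Because the cursor persists across passes, the third client is served as soon as queue slots free up instead of waiting for the first two clients to drain completely.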
For seamless integration with the memory model and the RDMA data transfer optimizations available in Oracle, we have worked with the SPDK community to decouple the storage libraries from the underlying DPDK runtime, which provides the memory management, threading, and messaging primitives. Oracle has implemented an SPDK environment library (ORAENV) that provides features similar to the DPDK RTE environment using Oracle runtime services. The ORAENV library provides dynamic allocation of shared memory from the SGA, PGA, and MGA pools, optimized for local storage IO as well as remote network IO with NVMeoF.
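The decoupling described above amounts to writing the storage code against an environment interface and supplying different backends behind it. The following is a minimal sketch of that pattern only; `Env`, `DefaultEnv`, `PooledEnv`, `prepare_io_buffer`, and the purpose-to-pool mapping are all hypothetical names and assumptions chosen for illustration, not the SPDK or ORAENV API.

```python
from typing import Protocol, Tuple


class Env(Protocol):
    """Hypothetical environment interface the storage code is written against."""

    def alloc(self, size: int, purpose: str) -> Tuple[str, bytearray]:
        ...


class DefaultEnv:
    """Stand-in for a default runtime with a single allocator."""

    def alloc(self, size, purpose):
        return ("heap", bytearray(size))


class PooledEnv:
    """Stand-in for a runtime that picks a memory pool by intended use.

    The mapping below is an assumed policy for illustration only.
    """

    POOL_BY_PURPOSE = {"shared": "SGA", "private": "PGA", "network": "MGA"}

    def alloc(self, size, purpose):
        return (self.POOL_BY_PURPOSE[purpose], bytearray(size))


def prepare_io_buffer(env: Env, size: int, purpose: str):
    """Storage-side code: depends only on the Env interface, not on a backend."""
    pool, buf = env.alloc(size, purpose)
    return pool, len(buf)
```

The same `prepare_io_buffer` call works unchanged with either backend, which is the property that lets the storage libraries run on a different runtime.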
High-performance NVMeoF communication using ORAENV is achieved through optimizations such as the Shared Protection Domain, which allows a single memory mapping to be registered with the RDMA adapter for use by all Oracle processes. This significantly reduces Memory Translation Table (MTT) cache thrashing on the RDMA adapter; the increased cache misses caused by such thrashing have been shown to be a bottleneck for IO throughput.
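A rough back-of-the-envelope calculation shows why a single shared registration matters. The sketch below assumes that every registration of a region consumes one translation entry per page, and it uses an illustrative 64 GiB region with 2 MiB pages; only the process count of 10,000 comes from the description above. The function name `translation_entries` is hypothetical.

```python
GIB = 1 << 30
MIB = 1 << 20


def translation_entries(region_bytes, page_bytes, registrations):
    """Entries needed if each registration maps every page of the region."""
    pages = -(-region_bytes // page_bytes)  # ceiling division
    return pages * registrations


# Assumed sizes for illustration: 64 GiB region, 2 MiB pages.
per_process = translation_entries(64 * GIB, 2 * MIB, registrations=10_000)
shared = translation_entries(64 * GIB, 2 * MIB, registrations=1)

print(per_process, shared, per_process // shared)
```

Under these assumptions, per-process registration needs 10,000 times as many entries as one shared registration, which gives a sense of how much pressure is removed from the adapter's translation cache.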
Zahra Khatami is a Member of Technical Staff in the Virtual Operating System (VOS) group at Oracle. She received her PhD and Master's degrees in computer science from Louisiana State University. She is currently developing a framework for supporting SPDK in Oracle Database.