[Hardware Accelerated Databases] Felipe Aramburu (BlazingDB)
BlazingDB has spent the past six months working on an open-source project (libgdf) alongside Anaconda and Nvidia. Libgdf is a library of computational primitives built on a memory layout that is similar to Apache Arrow but optimized for GPUs. We have created a distributed, GPU-accelerated ETL pipeline that takes a user from reading data in Parquet, to performing SQL operations over that dataset, and finally to feeding that data into XGBoost, a machine learning library that allows us to leverage GPUs.
In this talk, we will present the design and implementation of BlazingDB for GPU query processing. We will discuss how BlazingDB performs query optimization, distributes workloads over compute resources, and communicates between the different layers. We will also present our methods for using FlatBuffers for CPU data and CUDA IPC for GPU data. Lastly, we will describe our relational algebra engine, which operates on data via CUDA IPC, interprets query plans, and stores result sets. We will leverage the solutions mentioned above to accelerate a machine learning use case using the XGBoost library.
Part of Hardware Accelerated Database Lectures 2018 Seminar Series
Felipe is a maker. From aquaponics, beer, and cheese-making to home automation, he is obsessed with creating. Before becoming CTO of BlazingDB, he and his brother ran a consulting company based in Peru, where they originally built BlazingDB as a tool to help with their own consulting work. Before that, he was the CTO of kWhOURs, which provided a SaaS solution for energy auditing. Through BlazingDB he has become a high-performance junkie who spends nights dreaming about how hybrid processing systems are going to change the world.