[DB Seminar] Spring 2018: Stephen Walkauskas (Vertica)

Event Date: Monday, March 19, 2018
Event Time: 04:30pm
Location: GHC 8102
Speaker: Stephen Walkauskas

Title: Data Lake Analytics

In the beginning there was a DBMS, a flexible piece of software that could be used for both OLTP and OLAP workloads. When transaction throughput increased and data sizes grew, the database needed to be split in two, with each instance optimized for a particular workload. So it has been ever since: the distance between the two systems has increased, and specialized database software is now offered for each. What’s more, data hoarders have given rise to a need for a third system, the “data lake”.

This talk outlines the architecture of one database product, Vertica, that is optimized for analytic workloads. It defines the key differences between traditional analytic database management software and the data lake, along with the changes made to Vertica to enable it to query data in the lake efficiently. Popular data lake query engines are compared. Finally, a roadmap is presented for bringing the conveniences offered by database management software to the data lake.

Stephen Walkauskas is a Software Engineering Manager at Vertica (that's his official title, but really he's a Software Engineer who knows how much everyone on the team makes). It goes without saying that he has spent less time in prison than Andy Pavlo. Stephen joined Vertica when it was a tiny start-up, before the first version of the product shipped. A Pittsburgh native, he started a development office in the Steel City and has assembled an awesome team to work on Hadoop integration, recovery, backup and restore, and storage optimization. Prior to joining Vertica, he was one of the very first employees at Endeca. During his eight-year span there, the company grew from a single-digit to a triple-digit number of employees and was later acquired by Oracle (after which there was a middle-digit number of employees). Stephen holds a B.S. in Physics from Boston College.