[DB Seminar] Spring 2017: Marcel Kornacker
Running real-time data-intensive applications on Apache Hadoop requires complex architectures to store and query data, typically involving multiple independent systems that are tied together through custom-engineered pipelines. A common pattern is to use a NoSQL engine like Apache HBase for caching and later transformations, the results of which are periodically written to HDFS in one of the popular open columnar file formats as a prerequisite for querying by a SQL engine.
Apache Kudu (incubating), a new scalable distributed storage engine designed for the Hadoop environment, gives the user low-latency single-row access as well as high-throughput bulk data scans. Integrated with Apache Impala (incubating), these capabilities are made available to the user via standard SQL language elements for updates and querying, combining the flexible update functionality of an RDBMS with the performance of a parallel analytic database system.
Marcel Kornacker is a tech lead at Cloudera and the architect of Apache Impala (incubating). Marcel has held engineering jobs at a few database-related startup companies and at Google, where he worked on several ad-serving and storage infrastructure projects. His last engagement was as the tech lead for the distributed query engine component of Google’s F1 project. Marcel holds a PhD in databases from UC Berkeley.