Events

[Fall 2020] A Peek into Snowflake’s Scalable Architecture

Event Date: Monday December 7, 2020
Event Time: 03:20pm EST
Location: https://cmu.zoom.us/j/93976830146?pwd=eUZvTzh0aUMrN28yWnRaR1pUbjJ5dz09
Speaker: Martin Hentschel + Max Heimel

Title: A Peek Into Snowflake's Scalable Architecture

Snowflake is an analytic data warehouse offered as a fully managed service in the cloud. It is faster, easier to use, and far more scalable than traditional on-premises data warehouse offerings and is used by thousands of customers around the world. Snowflake’s data warehouse is not built on an existing database or “big data” software platform such as Hadoop; it uses a new SQL database engine with a unique architecture designed for the cloud. Snowflake operates three engineering centers in San Mateo, CA; Bellevue, WA; and Berlin, Germany.

This talk provides an overview of Snowflake’s architecture, which was designed to efficiently support complex analytical workloads in the cloud. Following the lifecycle of micro-partitions, the talk explains pruning, zero-copy cloning, and instant time travel. Pruning speeds up query processing by filtering out unnecessary micro-partitions during query compilation. Zero-copy cloning lets users create logical copies of data without duplicating physical storage. Instant time travel lets users query data “as of” a time in the past, even if the current state of the data has changed. The talk also shows how micro-partitions tie into Snowflake’s unique architecture, which separates storage from compute, and how they enable advanced features such as automatic clustering.
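To make the three features above concrete, here is a minimal toy sketch in Python. It is not Snowflake's implementation. It assumes that micro-partitions are immutable, that each one carries min/max metadata for a single integer column, and that a table is a history of versions, each holding a list of partition references. All names (`MicroPartition`, `Table`, `prune`, `clone`, `delete_where_equal`) are hypothetical, and version indexes stand in for the timestamps a real "as of" query would use.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class MicroPartition:
    """Immutable chunk of rows plus min/max metadata (hypothetical model)."""
    rows: Tuple[int, ...]
    min_val: int
    max_val: int


def make_partition(values: List[int]) -> MicroPartition:
    # Metadata is computed once, when the partition is written.
    return MicroPartition(tuple(values), min(values), max(values))


@dataclass
class Table:
    """Each version is a list of references to immutable partitions."""
    versions: List[List[MicroPartition]] = field(default_factory=lambda: [[]])

    def current(self) -> List[MicroPartition]:
        return self.versions[-1]

    def insert(self, values: List[int]) -> None:
        # Writes never modify existing partitions; they append a new version.
        self.versions.append(self.current() + [make_partition(values)])

    def delete_where_equal(self, value: int) -> None:
        # Only partitions that actually contain the value are rewritten;
        # all others are carried over by reference.
        new_version = []
        for p in self.current():
            if p.min_val <= value <= p.max_val and value in p.rows:
                kept = [r for r in p.rows if r != value]
                if kept:
                    new_version.append(make_partition(kept))
            else:
                new_version.append(p)
        self.versions.append(new_version)

    def prune(self, lo: int, hi: int, version: int = -1) -> List[MicroPartition]:
        # Pruning: use metadata alone to skip partitions that cannot match.
        return [p for p in self.versions[version]
                if p.max_val >= lo and p.min_val <= hi]

    def scan(self, lo: int, hi: int, version: int = -1) -> List[int]:
        # Time travel: scanning an older version reads its partition list.
        return sorted(r for p in self.prune(lo, hi, version)
                      for r in p.rows if lo <= r <= hi)

    def clone(self) -> "Table":
        # Zero-copy clone: copy the list of references, not the partitions.
        return Table(versions=[list(self.current())])
```

In this sketch, a range query touches only the partitions whose min/max range overlaps the predicate, a clone shares partition objects with its source until one of them is modified, and an older version stays readable because a delete writes new partitions instead of changing existing ones.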

Bio:

Martin Hentschel received a PhD in Computer Science from the Systems Group at ETH Zurich in 2012. He then worked at Microsoft, where he built products integrating data from social networks into the Bing search engine. In 2014, he joined Snowflake, where he works on security, metadata management, and stateful microservices.

Max Heimel holds a PhD in Computer Science from the Database and Information Management Group at TU Berlin. He joined Snowflake in 2015 and works primarily on query execution and query optimization. Before joining Snowflake, Max worked at IBM and completed several internships at Google.

More Info: https://15445.courses.cs.cmu.edu/fall2020/schedule.html#dec-07-2020