Archived Events

Oct 19 2020
05:00pm EDT
Quarantine DB Talk 2020: FoundationDB or: How I Learned to Stop Worrying and Trust the Database

Getting multiple entities to work nicely together is a difficult task. This is true for machines as much as it is true for humans. This is why testing and debugging distributed systems is such a hard task. Even if well known algorithms are used, subtle bugs can introduce catastrophic failures. FoundationDB uses deterministic simulation to test these failures. This is the secret sauce that makes FoundationDB one of the most robust databases on the market. FoundationDB is a distributed key...

Oct 12 2020
05:00pm EDT
Quarantine DB Talk 2020: Databricks: A Deep Dive into Spark SQL’s Catalyst Optimizer
Cheng Lian, Maryann Xue

Catalyst is the SQL query optimizer in Spark SQL. It is one of the most important components of Apache Spark, as it powers major Spark APIs like SQL, DataFrames/Datasets, as well as Structured Streaming. Unlike many traditional SQL systems, Spark enables users to query data in arbitrary formats stored in arbitrary locations at scale. While powerful, this also imposes extra query planning challenges such as statistics collection and cost estimation, which further affect performance negatively. In this talk, we...

Oct 5 2020
05:00pm EDT
Quarantine DB Talk 2020: Apache Arrow Flight: Accelerating Columnar Dataset Transport

In this talk I will discuss the role that Apache Arrow and Arrow Flight are playing to provide a faster and more efficient approach to building data services that transport large datasets. We'll look at the technical details of why the Arrow protocol is an attractive choice and look at specific examples of where Arrow has been employed for better performance and resource efficiency. Finally, I will discuss the implications for databases and the upcoming generation of data systems. This...

Oct 2 2020
01:00pm EDT
Fall 2020: Prashanth Menon (CMU)

Just-in-time (JIT) query compilation is a technique to improve analytical query performance in database management systems (DBMSs). But the cost of compiling each query can be significant relative to its execution time. This overhead prohibits the DBMS from employing well-known adaptive query processing (AQP) methods to generate a new plan for a query if data distributions do not match the optimizer's estimations. The optimizer could eagerly generate multiple sub-plans for a query, but it can only include a few alternatives...

Sep 28 2020
05:00pm EDT
Quarantine DB Talk 2020: CockroachDB’s Query Optimizer

We live in an increasingly interconnected world, with many organizations operating across countries or even continents. To serve their global user base, organizations are replacing their legacy DBMSs with cloud-based systems capable of scaling OLTP workloads to millions of users. CockroachDB is a scalable SQL DBMS that was built from the ground up to support these global OLTP workloads while maintaining high availability and strong consistency. Just like its namesake, CockroachDB is resilient to disasters through replication and automatic recovery...

Sep 21 2020
05:00pm EDT
Quarantine DB Talk 2020: Query Optimization at Snowflake

In this talk, I will give an introduction to Snowflake's query optimizer. I will talk about the main features of Snowflake's optimizer, explain the main philosophy behind the design decisions, and delve into some unique aspects of the implementation. Later, I will also cover our infrastructure for facilitating optimizer development and discuss the opportunities and challenges for implementing and rolling out query optimizations for cloud-based databases. This talk is part of the Quarantine Database Tech Talk Seminar Series. Zoom...

Sep 14 2020
05:00pm EDT
Quarantine DB Talk 2020: CrocodileDB: Resource Efficient Database Execution

The coming end of Moore’s law requires that data systems be more judicious with computation and resources as the growth in data outpaces the availability of computational resources. Current database systems are eager and aggressively consume resources to immediately and quickly complete the task at hand. Intelligently deferring a task to a later point in time can increase result reuse, reduce work that might later be invalidated, or avoid unnecessary work altogether. In this talk I will introduce CrocodileDB, a...

Aug 31 2020
04:30pm EDT
Quarantine DB Talk 2020: PlanetScale: Query Planning for a Sharded System like Vitess

Traditional query planning involves parsing of an input SQL into an AST, and then transforming it into primitives which can later be sent through an optimizer. However, in a sharded system, each leaf node is a full relational engine that is capable of doing its own optimizations. So, the traditional approach may not work for such a system. And who knows if the finally reconstructed query would be correctly optimized by the underlying engine? The Vitess VTGate proxy uses a...

Aug 24 2020
04:30pm EDT
Quarantine DB Talk 2020: ScyllaDB — No-Compromise Performance

ScyllaDB is a distributed NoSQL database that provides high availability, multiple consistency models, and high performance. This talk will cover how ScyllaDB approaches performance: thread-per-core, fully asynchronous operation, and kernel bypass. This talk is part of the Quarantine Database Tech Talk Seminar Series. Zoom Link: https://cmu.zoom.us/j/562649242 (Password 264771)

Aug 17 2020
04:30pm EDT
Quarantine DB Talk 2020: TerminusDB: Building a Native Revision Control DB from Scratch

Revision control and CI/CD have completely transformed the way we write and deliver software. Yet data management has not kept pace with these changes. Data assets are still managed using RDBMSs, or worse, with CSVs or Excel spreadsheets. Since many software applications rely on data assets, this presents a serious problem for data-driven software. TerminusDB is a graph database which provides distributed revision control features natively, enabling us to lift best practices for CI/CD to data-driven applications. It supports a...