Fall 2020: Prashanth Menon (CMU)
Just-in-time (JIT) query compilation is a technique to improve analytical query performance in database management systems (DBMSs). But the cost of compiling each query can be significant relative to its execution time. This overhead prohibits the DBMS from employing well-known adaptive query processing (AQP) methods to generate a new plan for a query if data distributions do not match the optimizer's estimations. The optimizer could eagerly generate multiple sub-plans for a query, but it can only include a few alternatives... Read More
Quarantine DB Talk 2020: CockroachDB’s Query Optimizer
We live in an increasingly interconnected world, with many organizations operating across countries or even continents. To serve their global user base, organizations are replacing their legacy DBMSs with cloud-based systems capable of scaling OLTP workloads to millions of users. CockroachDB is a scalable SQL DBMS that was built from the ground up to support these global OLTP workloads while maintaining high availability and strong consistency. Just like its namesake, CockroachDB is resilient to disasters through replication and automatic recovery... Read More
Quarantine DB Talk 2020: Query Optimization at Snowflake
In this talk, I will give an introduction to Snowflake's query optimizer. I will cover the optimizer's main features, explain the philosophy behind its design decisions, and delve into some unique aspects of the implementation. Later, I will also describe our infrastructure for optimizer development and discuss the opportunities and challenges of implementing and rolling out query optimizations for cloud-based databases. This talk is part of the Quarantine Database Tech Talk Seminar Series. Zoom... Read More
Quarantine DB Talk 2020: CrocodileDB: Resource Efficient Database Execution
The coming end of Moore’s law requires that data systems be more judicious with computation and resources as the growth in data outpaces the availability of computational resources. Current database systems are eager and aggressively consume resources to immediately and quickly complete the task at hand. Intelligently deferring a task to a later point in time can increase result reuse, reduce work that might later be invalidated, or avoid unnecessary work altogether. In this talk I will introduce CrocodileDB, a... Read More
Quarantine DB Talk 2020: PlanetScale: Query Planning for a Sharded System like Vitess
Traditional query planning involves parsing an input SQL statement into an AST and then transforming it into primitives that can later be sent through an optimizer. However, in a sharded system, each leaf node is a full relational engine that is capable of doing its own optimizations. So, the traditional approach may not work for such a system. And who knows if the finally reconstructed query would be correctly optimized by the underlying engine? The Vitess VTGate proxy uses a... Read More
Quarantine DB Talk 2020: ScyllaDB — No-Compromise Performance
ScyllaDB is a distributed NoSQL database that provides high availability, multiple consistency models, and high performance. This talk will cover how ScyllaDB approaches performance: thread-per-core, fully asynchronous operation, and kernel bypass. This talk is part of the Quarantine Database Tech Talk Seminar Series. Zoom Link: https://cmu.zoom.us/j/562649242 (Password 264771) Read More
Quarantine DB Talk 2020: TerminusDB: Building a Native Revision Control DB from Scratch
Revision control and CI/CD have completely transformed the way we write and deliver software. Yet data management has not kept pace with these changes. Data assets are still managed using RDBMSs, or worse, with CSVs or Excel spreadsheets. Since many software applications rely on data assets, this presents a serious problem for data-driven software. TerminusDB is a graph database which provides distributed revision control features natively, enabling us to lift best practices for CI/CD to data-driven applications. It supports a... Read More
Quarantine DB Talk 2020: Splice Machine – An HTAP DB at Scale
Emerging modern applications routinely depend on data at scale for AI and ML and therefore require HTAP (Hybrid Transactional/Analytical Processing) database systems to make real-time business decisions. In this talk, we introduce Splice Machine, an HTAP DB at scale, along with our mission and architecture. We will also dive into our transaction mechanism, query optimization, and dual execution engine support. Zoom Link: https://cmu.zoom.us/j/562649242 (Password 264771) Read More
[DB Seminar] Spring 2020 DB Group: YugabyteDB: Bringing Together the Best of Amazon Aurora and Google Spanner
PostgreSQL, a single-node open-source RDBMS, is widely adopted for its powerful set of features. However, PostgreSQL is not built to be used as a cloud-native database, and therefore cannot inherently survive failures, scale horizontally, or support geo-distributed deployments. While Amazon Aurora has modified the subsystem of PostgreSQL that writes to disk along with simplifying async replication to make the database resilient to failures, it does not address horizontal scalability or geo-distribution. Google Spanner addresses all of these features; however, it does... Read More
[DB Seminar] Spring 2020 DB Group: Black-box Isolation Checking with Elle
Databases are awful. They lose information, corrupt state, and do other terrible things, both by design and by accident. You'd think that *testing* databases to see how awful they are would help make them better, but it turns out that testing most of the useful database safety properties is *also* awful. We came up with a better way to test databases, called Elle. Elle finds, graphs, and explains a wealth of isolation violations by mapping observed histories to Adya-style dependency... Read More