Archived Events

Sep 21 2020
Query Optimization at Snowflake
Speaker:
Jiaqi Yan
System:
Snowflake
Video:
YouTube

In this talk, I will give an introduction to Snowflake's query optimizer. I will talk about the main features of Snowflake's optimizer, explain the main philosophy behind the design decisions, and delve into some unique aspects of the implementation. Later, I will also cover our infrastructure for facilitating optimizer development and discuss the opportunities and challenges for implementing and...

Sep 14 2020
CrocodileDB: Resource Efficient Database Execution
Speaker:
Aaron Elmore
System:
CrocodileDB

The coming end of Moore’s law requires that data systems be more judicious with computation and resources as the growth in data outpaces the availability of computational resources. Current database systems are eager and aggressively consume resources to immediately and quickly complete the task at hand. Intelligently deferring a task to a later point in time can increase result reuse,...

Aug 31 2020
PlanetScale: Query Planning for a Sharded System like Vitess
Speaker:
Sugu Sougoumarane
System:
PlanetScale
Video:
YouTube

Traditional query planning involves parsing an input SQL statement into an AST and then transforming it into primitives that can later be sent through an optimizer. However, in a sharded system, each leaf node is a full relational engine that is capable of doing its own optimizations, so the traditional approach may not work for such a system. And who...

Aug 24 2020
ScyllaDB — No-Compromise Performance
Speaker:
Avi Kivity
System:
ScyllaDB
Video:
YouTube

ScyllaDB is a distributed NoSQL database that provides high availability, multiple consistency models, and high performance. This talk will cover how ScyllaDB approaches performance: thread-per-core, fully asynchronous operation, and kernel bypass. This talk is part of the Quarantine Database Tech Talk Seminar Series. Zoom Link: https://cmu.zoom.us/j/562649242 (Password 264771)

Aug 17 2020
TerminusDB: Building a Native Revision Control DB from Scratch
Speaker:
Gavin Mendel-Gleason
System:
TerminusDB
Video:
YouTube

Revision control and CI/CD have completely transformed the way we write and deliver software. Yet data management has not kept pace with these changes. Data assets are still managed using RDBMSs, or worse, with CSVs or Excel spreadsheets. Since many software applications rely on data assets, this presents a serious problem for data-driven software. TerminusDB is a graph database which...

Aug 10 2020
Splice Machine – An HTAP DB at Scale
Speakers:
Daniel Gómez Ferro, Yi Xia
System:
Splice Machine
Video:
YouTube

Emerging modern applications routinely depend on data at scale for AI and ML and therefore require HTAP (Hybrid Transactional/Analytical Processing) database systems to make real-time business decisions. In this talk, we introduce Splice Machine, an HTAP DB at scale, along with our mission and architecture. We will also dive into the topics of our transaction mechanism, query optimization and...

Aug 3 2020
YugabyteDB: Bringing Together the Best of Amazon Aurora and Google Spanner
Speaker:
Karthik Ranganathan
System:
YugabyteDB
Video:
YouTube

PostgreSQL, a single-node open-source RDBMS, is widely adopted for its powerful set of features. However, PostgreSQL is not built to be used as a cloud-native database, and therefore cannot inherently survive failures, scale horizontally or support geo-distributed deployments. While Amazon Aurora has modified the subsystem of PostgreSQL that writes to disk along with simplifying async replication to make the database resilient...

Jul 27 2020
Black-box Isolation Checking with Elle
Speaker:
Kyle Kingsbury
System:
Jepsen
Video:
YouTube

Databases are awful. They lose information, corrupt state, and do other terrible things, both by design and by accident. You'd think that *testing* databases to see how awful they are would help make them better, but it turns out that testing most of the useful database safety properties is *also* awful. We came up with a better way to test...

Jul 24 2020
MS Thesis Defense: Filter Representation in Vectorized Query Execution (Amadou Ngom)
Speaker:
Amadou Ngom

Advances in memory capacity have allowed Database Management Systems (DBMSs) to store large amounts of data in memory, thereby shifting the performance bottleneck of query execution from disk accesses to CPU efficiency (i.e., instruction count and cycles per instruction). One technique used to achieve such efficiency in analytical applications is batch-oriented processing or vectorization: it reduces interpretation overhead, improves cache...

Jul 20 2020
Rockset: Realtime Indexing for fast queries on massive semi-structured data
Speaker:
Dhruba Borthakur
System:
Rockset
Video:
YouTube

Rockset is a realtime indexing database that powers fast SQL over semi-structured data such as JSON, Parquet, or XML without requiring any schematization. All data loaded into Rockset is automatically indexed, and a fully featured SQL engine powers fast queries over semi-structured data without requiring any database tuning. Rockset exploits the hardware fluidity available in the cloud and automatically grows...