Archived Events

Nov 16 2020
Fauna: Lessons Learned Building a Real World, Calvin-based System
Speaker:
Matt Freels
System:
Fauna
Video:
YouTube

Fauna is a NoSQL Database-as-an-API service which supports consistent, global database access for OLTP workloads. While there are many aspects of Fauna which make it unique among similar systems, one in particular is its use of Calvin, a deterministic transaction resolution protocol which underpins its strict-serializability guarantees. This talk will give an overview of Fauna's architecture, why we chose Calvin... Read More

Nov 10 2020
Self-Driving Database Management Systems: Forecasting, Modeling, and Planning (Lin Ma)
Speaker:
Lin Ma

Database management systems (DBMSs) are an important part of modern data-driven applications. However, they are notoriously difficult to deploy and administer. There are existing methods that recommend physical design or knob configurations for DBMSs. But most of them require humans to make final decisions and decide when to apply changes. Furthermore, they either (1) only focus on a single aspect... Read More

Nov 9 2020
EraDB: Designing Systems for Cardinality and Dimensionality
Speaker:
Todd Persen
System:
EraDB
Video:
YouTube

EraDB is a distributed database designed for petabyte-scale, schemaless data that leverages cloud-native object storage for global persistence. In this talk, Todd will discuss the historical origins of EraDB and delve into how it is designed to handle high-cardinality and high-dimensionality data within a flexible, horizontally-scalable architecture. This talk is part of the Quarantine Database Tech Talk Seminar Series. Zoom... Read More

Nov 2 2020
Refactoring Query Processing in MySQL
Speaker:
Norvald H. Ryeng
System:
MySQL
Video:
YouTube

MySQL is often called the world's most popular open source DBMS, and it's certainly one of the most used. MySQL grew up with the open source movement and the public Internet and became a part of the famous LAMP stack. Today, MySQL servers are still powering a huge number of web sites. A lot has changed in MySQL in the... Read More

Oct 26 2020
Datometry Hyper-Q: Virtualizing the World’s Enterprise Data Warehouses
Speaker:
Lyublena Antova
System:
Datometry
Video:
YouTube

Enterprises worldwide are looking to move their database applications to the cloud. However, conventional migration from an on-premise data warehouse to a cloud-native one is a costly, labor-intensive task, laden with many risks. According to Gartner, the majority of these migrations are late, run over budget, or fail altogether. Datometry has developed a virtualization platform that enables applications written for... Read More

Oct 19 2020
FoundationDB or: How I Learned to Stop Worrying and Trust the Database
Speaker:
Markus Pilman
System:
FoundationDB
Video:
YouTube

Getting multiple entities to work nicely together is a difficult task. This is true for machines as much as it is true for humans. This is why testing and debugging distributed systems is such a hard task. Even if well known algorithms are used, subtle bugs can introduce catastrophic failures. FoundationDB uses deterministic simulation to test these failures. This is... Read More

Oct 12 2020
Databricks: A Deep Dive into Spark SQL’s Catalyst Optimizer
Speakers:
Cheng Lian, Maryann Xue
System:
Databricks
Video:
YouTube

Catalyst is the SQL query optimizer in Spark SQL. It is one of the most important components of Apache Spark, as it powers major Spark APIs like SQL, DataFrames/Datasets, as well as Structured Streaming. Unlike many traditional SQL systems, Spark enables users to query data in arbitrary formats stored in arbitrary locations at scale. While being powerful, this also imposes... Read More

Oct 5 2020
Apache Arrow Flight: Accelerating Columnar Dataset Transport
Speaker:
Wes McKinney
System:
Arrow
Video:
YouTube

In this talk I will discuss the role that Apache Arrow and Arrow Flight are playing to provide a faster and more efficient approach to building data services that transport large datasets. We'll look at the technical details of why the Arrow protocol is an attractive choice and look at specific examples of where Arrow has been employed for better... Read More

Oct 2 2020
Fall 2020: Prashanth Menon (CMU)
Speaker:
Prashanth Menon

Just-in-time (JIT) query compilation is a technique to improve analytical query performance in database management systems (DBMSs). But the cost of compiling each query can be significant relative to its execution time. This overhead prohibits the DBMS from employing well-known adaptive query processing (AQP) methods to generate a new plan for a query if data distributions do not match the... Read More

Sep 28 2020
CockroachDB’s Query Optimizer
Speaker:
Rebecca Taft
System:
CockroachDB
Video:
YouTube

We live in an increasingly interconnected world, with many organizations operating across countries or even continents. To serve their global user base, organizations are replacing their legacy DBMSs with cloud-based systems capable of scaling OLTP workloads to millions of users. CockroachDB is a scalable SQL DBMS that was built from the ground up to support these global OLTP workloads while... Read More