Archived Events

Feb 15 2021
[Vaccination 2021] Star-Tree Index: Space-Time Trade Off in OLAP (Kishore Gopalakrishna)
Speaker:
Kishore Gopalakrishna
System:
Pinot
Video:
YouTube

The need for real-time analytics has proliferated in the modern data landscape. The industry is moving towards providing analytics to end-users via interactive apps instead of traditional dashboards. Whether it's user-facing analytical applications such as LinkedIn's "Who Viewed My Profile" or an internal monitoring tool used by Uber's city ops team to regulate trips in a region, it is imperative...

Feb 8 2021
[Vaccination 2021] Performance Testing at MongoDB (David Daly)
Speaker:
David Daly
System:
MongoDB
Video:
YouTube

It is important for developers to understand the performance of a software project as they develop new features, fix bugs, and try to generally improve the product. While it is simple to state that requirement, it can be hard to do in practice. There are a lot of choices an organization faces when trying to understand the performance of the...

Feb 3 2021
MS Thesis Defense: An Evaluation of Compilation-Based PL/PGSQL Execution (Tanuj Nayak)

User-Defined Functions (UDFs) are an important analytical feature in modern Database Management Systems (DBMSs) due to their server-side execution properties. These properties allow complex analytical queries to execute without serializing intermediate data over a network. However, query engines often incur significant overheads when executing UDFs because, unlike SQL queries, UDFs are non-declarative. This contrast causes a...

Feb 1 2021
[Vaccination 2021] SLOG: Serializable, Low-latency, Geo-replicated Transactions (Daniel Abadi)
Speaker:
Daniel Abadi
System:
SLOG
Video:
YouTube

For decades, applications deployed on a world-wide scale have been forced to give up at least one of (1) strict serializability, (2) low-latency writes, or (3) high transactional throughput. This talk will give an overview of SLOG: a system that avoids this tradeoff for workloads that contain physical region locality in data access. SLOG achieves high-throughput, strictly serializable ACID transactions at geo-replicated distance...

Jan 25 2021
NoisePage: The Self-Driving Database Management System (Lin Ma)
Speaker:
Lin Ma

Database management systems (DBMSs) are an important part of modern data-driven applications. However, they are notoriously difficult to deploy and administer. There are existing methods that recommend physical design or knob configurations for DBMSs. But most of them require humans to make final decisions and decide when to apply changes. The goal of a self-driving DBMS is to remove the...

Dec 16 2020
On Automatic Database Management System Tuning Using Machine Learning (Dana Van Aken)
Speaker:
Dana Van Aken

Database management systems (DBMSs) are an essential component of any data-intensive application. But tuning a DBMS to perform well is a notoriously difficult task because it has hundreds of configuration knobs that control aspects of its runtime behavior, such as cache sizes and how frequently to flush data to disk. Getting the right configuration for these knobs is hard because...

Dec 14 2020
TiDB – On the Long Journey of HTAP (Xiaoyu Ma)
Speaker:
Xiaoyu Ma
System:
TiDB
Video:
YouTube

Due to the rising demand for real-time analytics and insights on fresh data, the term HTAP has become popular in recent years. TiDB was originally designed purely for TP workloads. But gradually, as we adapted to users' requirements, TiDB evolved into an HTAP database based on Raft. We will introduce TiDB's design, internals, and HTAP architectural evolution. This...

Dec 7 2020
[Fall 2020] A Peek into Snowflake’s Scalable Architecture
Speakers:
Martin Hentschel, Max Heimel
System:
Snowflake

Snowflake is an analytic data warehouse offered as a fully-managed service in the cloud. It is faster, easier to use, and far more scalable than traditional on-premises data warehouse offerings and is used by thousands of customers around the world. Snowflake's data warehouse is not built on an existing database or "big data" software platform such as Hadoop; it uses a...

Nov 30 2020
The Cascades Framework for Query Optimization at Microsoft (Nico Bruno + Cesar Galindo-Legaria)
Speakers:
Nico Bruno, Cesar Galindo-Legaria
System:
SQL Server
Video:
YouTube

The Cascades framework was an academic project introduced 25 years ago as a foundation for modern query optimizers. It provides extensibility, memoization-based dynamic programming, an algebraic representation of logical and physical operator trees, and manipulation of such trees using transformation rules to enable cost-based query optimization. Cascades provides a clean framework/skeleton for optimizer development, but it needs to be instantiated...

Nov 23 2020
ksqlDB: A Stream-Relational Database System
Speaker:
Matthias J. Sax
System:
ksqlDB
Video:
YouTube

ksqlDB is a distributed event streaming database system that allows users to express SQL queries over relational tables and event streams. The project was released by Confluent in 2017 and is hosted on GitHub and developed with an open-source spirit. ksqlDB is built on top of Apache Kafka, a distributed event streaming platform. In this talk, we discuss ksqlDB's architecture...