Archived Events

Nov 4 2025
Real Time Analytics Query Architecture Evolution @ Uber (Ankit Sultana)
Speaker:
Ankit Sultana
System:
Pinot
Video:
YouTube

We will talk about how Apache Pinot's query feature set has grown tremendously over the past few years and how that growth has shaped Uber's Real Time Analytics Query Architecture. We will dive into the different query engines in Apache Pinot and briefly discuss our legacy and unique Presto over Pinot architecture.

Nov 3 2025
[Future Data] Multi-statement Transactions in the Databricks Lakehouse
Speaker:
Ryan Johnson
System:
Delta Lake
Video:
YouTube

The data lake architecture originally focused on self-standing tables in cloud storage, with catalogs as mere discovery aids. Modern lakehouse architectures add an ever-growing set of data warehousing capabilities to that original value proposition. Historically, a key missing piece was multi-statement transactions: Delta Lake supported single-statement, single-table transactions, with ACID properties for changes made to that table. Sophisticated MERGE...

Nov 3 2025
Transactions and Coordination in Aurora DSQL
Speaker:
Marc Brooker
System:
DSQL

Aurora DSQL is a new global, serverless, scalable relational database system, built at AWS. In this talk, I’ll dive into the architecture of DSQL, how it handles transactions, and how and why it was designed to minimize coordination. We’ll touch on transaction protocols, isolation, and virtualization.

Oct 27 2025
[Future Data] Storage Metadata for Modern Cloud Databases
Speaker:
Joyo Victor
System:
SingleStore
Video:
YouTube

In modern database architecture, separating compute from storage unlocks powerful capabilities. Our tiered storage, “bottomless”, started by uploading files to remote object storage. This worked well until we wanted to create database branches pointing to the same remote storage. One branch does not know if it can delete a file that another branch depends on. To solve this, we built...

Oct 21 2025
[Fall 2025] Astronomer / Apache Airflow Tech Talk
Speaker:
Julian LaNeve
System:
Airflow

Apache Airflow is the most popular data orchestration tool there is, downloaded over 40 million times per month and used to power the data, ML, and AI platforms at OpenAI, Lyft, Airbnb, Uber, and Apple. At its core, Airflow allows you to define data workflows as DAGs using Python. We’ll do a deep dive on how Airflow came to be and...

Oct 20 2025
[Future Data] Where We’re Going, We Don’t Need Rows: Columnar Data Connectivity with ADBC
Speaker:
Ian Cook
System:
Arrow
Video:
YouTube

ADBC (Arrow Database Connectivity) is Apache Arrow’s answer to ODBC and JDBC: it’s a database access API and driver standard that delivers data in Arrow columnar format instead of a row-oriented format. ADBC is on a roll, speeding and simplifying data access for dbt, Databricks, DuckDB, Microsoft, Snowflake, and more. This talk presents the architecture of ADBC (APIs, drivers, driver...

Oct 13 2025
[Future Data] Vortex: LLVM for File Formats
Speaker:
Will Manning
System:
Vortex
Video:
YouTube

Apache Parquet revolutionized columnar storage after its initial release in 2013, but has largely failed to evolve since then. As a result, nearly every Tier 1 tech company has built their own columnar format to replace Parquet. Enter Vortex, a Linux Foundation project that currently achieves 100x faster random access, 10-20x faster scans, and 5x higher write throughput, while maintaining...

Oct 6 2025
[Future Data] DuckLake: Learning from Cloud Data Warehouses to Build a Robust “Lakehouse”
Speaker:
Jordan Tigani
System:
MotherDuck
Video:
YouTube

When building scalable data systems, it is easy to focus on the storage and the compute, but metadata is a critical third piece that is often overlooked. This talk will describe how metadata storage enables query performance and helps provide transactional semantics in modern data warehouses. We will then go into how the metadata story in popular open data formats take...

Sep 29 2025
[Future Data] Apache Hudi: A Database Layer over Cloud Storage for Fast Mutations and Efficient Queries
Speaker:
Vinoth Chandar
System:
Hudi
Video:
YouTube

Data lakes emerged as a way to store vast amounts of data as files and objects on infinitely scalable cloud storage, with processing done on scalable distributed compute engines. However, this architecture lacks many of the capabilities of traditional databases, such as efficient mutations, indexing, and transaction management. Apache Hudi was created as the first "lakehouse" project, to bridge this...

Sep 23 2025
[Fall 2025] On Holistic Database Optimization via Leveraging Similarity Across Actions, Workloads, Configurations, and Scenarios (William Zhang)
Speaker:
William Zhang

Modern database management systems (DBMSs) have evolved to support increasingly sophisticated data-intensive applications, but they have become substantially more complex to configure, for two reasons. First, DBMSs expose a vast configuration space with trillions of possibilities that encompass system knobs, physical design (e.g., indexes), and query options, among others. Second, these applications are constantly evolving with changes in data access...