Archived Events

Nov 17 2025
Cortex AISQL: A Production SQL Engine for Unstructured Data
Speaker: Anupam Datta
System: Snowflake

Snowflake’s Cortex AISQL is a production SQL engine that integrates native semantic operations directly into SQL. This integration allows users to write declarative queries that combine relational operations with semantic reasoning, enabling them to query both structured and unstructured data effortlessly. However, making semantic operations efficient at production scale poses fundamental challenges. Semantic operations are more expensive than traditional SQL...

Nov 11 2025
[Fall 2025] Open Data Infrastructure with Iceberg and dbt
Speaker: Connor McArthur
System: dbt

Apache Iceberg is now interoperable with most modern data platforms and compute systems. While Iceberg enables powerful new capabilities, real-world adoption still presents challenges for many organizations. In this talk, we will unpack Iceberg's architecture; demonstrate a novel architecture where multiple compute systems connect to the same underlying Iceberg catalog; and discuss the maturity and continued investment needed to ensure...

Nov 10 2025
[Future Data] Mooncake: Real-Time Apache Iceberg Without Compromise
Speaker: Cheng Chen
System: Mooncake
Video: YouTube

Apache Iceberg is great for large-scale analytics, but it was built for batch workloads. For streaming use cases, keeping tables fresh means writing snapshots more often, which creates excess small Parquet files, bloated metadata, and costly compaction that never ends. Updates and deletes make things worse because equality deletes push the burden to query engines, leaving readers slow and inefficient....

Nov 4 2025
Real Time Analytics Query Architecture Evolution @ Uber (Ankit Sultana)
Speaker: Ankit Sultana
System: Pinot
Video: YouTube

We will talk about how Apache Pinot's query feature set has grown tremendously over the past few years and how that growth has shaped Uber's Real Time Analytics Query Architecture. We will dive into the different query engines in Apache Pinot and briefly discuss our legacy and unique Presto over Pinot architecture.

Nov 3 2025
[Future Data] Multi-statement Transactions in the Databricks Lakehouse
Speaker: Ryan Johnson
System: Delta Lake
Video: YouTube

The data lake architecture originally focused on self-standing tables in cloud storage, with catalogs as mere discovery aids. Modern lakehouse architectures add an ever-growing set of data warehousing capabilities to that original value proposition. Historically a key missing piece was multi-statement transactions -- Delta Lake supported single-statement single-table transactions, with ACID properties for changes made to that table. Sophisticated MERGE...

Nov 3 2025
Transactions and Coordination in Aurora DSQL
Speaker: Marc Brooker
System: DSQL

Aurora DSQL is a new global, serverless, scalable relational database system, built at AWS. In this talk, I’ll dive into the architecture of DSQL, how it handles transactions, and how and why it was designed to minimize coordination. We’ll touch on transaction protocols, isolation, and virtualization.

Oct 27 2025
[Future Data] Storage Metadata for Modern Cloud Databases
Speaker: Joyo Victor
System: SingleStore
Video: YouTube

In modern database architecture, separating compute from storage unlocks powerful capabilities. Our tiered storage, “bottomless”, started by uploading files to remote object storage. This worked well until we wanted to create database branches pointing to the same remote storage. One branch does not know if it can delete a file that another branch depends on. To solve this, we built...

Oct 21 2025
[Fall 2025] Astronomer / Apache Airflow Tech Talk
Speaker: Julian LaNeve
System: Airflow

Apache Airflow is the most popular data orchestration tool there is, downloaded over 40 million times per month and used to power the data, ML, and AI platforms at OpenAI, Lyft, Airbnb, Uber, and Apple. At its core, Airflow allows you to define data workflows as DAGs using Python. We’ll do a deep dive on how Airflow came to be and...

Oct 20 2025
[Future Data] Where We’re Going, We Don’t Need Rows: Columnar Data Connectivity with ADBC
Speaker: Ian Cook
System: Arrow
Video: YouTube

ADBC (Arrow Database Connectivity) is Apache Arrow’s answer to ODBC and JDBC: It’s a database access API and driver standard that delivers data in Arrow columnar format instead of a row-oriented format. ADBC is on a roll, speeding and simplifying data access for dbt, Databricks, DuckDB, Microsoft, Snowflake, and more. This talk presents the architecture of ADBC (APIs, drivers, driver...

Oct 13 2025
[Future Data] Vortex: LLVM for File Formats
Speaker: Will Manning
System: Vortex
Video: YouTube

Apache Parquet revolutionized columnar storage after its initial release in 2013, but has largely failed to evolve since then. As a result, nearly every Tier 1 tech company has built their own columnar format to replace Parquet. Enter Vortex, a Linux Foundation project that currently achieves 100x faster random access, 10-20x faster scans, and 5x higher write throughput, while maintaining...