Archived Events

Dec 15 2025
PhD Defense: Database Gyms: Towards Autonomous Database Tuning (Wan Shen Lim)
Speaker: Wan Shen Lim

Database management systems (DBMSs) are the foundation of modern data-intensive applications. But as more features are developed to support new workloads, they become increasingly complex and difficult to configure. Thus, researchers have invested decades of effort into autonomous DBMS configuration. Recent advances in machine learning (ML) have produced tools that outperform unassisted experts in real-world deployments. However, these tools are...

Dec 8 2025
[Future Data] Apache Fluss: A Streaming Storage for Real-Time Lakehouse
Speaker: Jark Wu
System: Fluss
Video: YouTube

Modern data lakehouses promise unified batch and streaming processing, yet their storage layer remains inherently batch-oriented, optimized for large, immutable files. This mismatch forces streaming workloads to rely on external systems (e.g., Kafka), while analytical queries operate on stale snapshots, breaking end-to-end freshness. In this talk, I’ll present Apache Fluss (incubating), a lakehouse-native streaming storage system designed to bridge this gap....

Dec 1 2025
[Future Data] From Storage Formats to Open Governance: The Evolution to Apache Polaris
Speaker: Prashant Singh
System: Polaris
Video: YouTube

As organizations build their data lakehouses on Apache Iceberg, the primary challenge shifts from managing individual files to orchestrating a cohesive ecosystem of tables. How can you guarantee consistency and enable complex operations when multiple data engines (like Spark, Trino, and Flink) need to interact with the same data concurrently? The answer lies in a standardized service layer, defined by the Iceberg...

Nov 24 2025
[Future Data] Reconstructing History with XTDB
Speaker: Jeremy Taylor
System: XTDB
Video: YouTube

XTDB is a SQL database that challenges long-held assumptions about how data mutates in databases. Instead of UPDATEs and DELETEs destroying information, or forcing developers to implement archival strategies, XTDB preserves history automatically without leaving such decisions to developers. Additionally, XTDB implements a variation of the SQL:2011 syntax to simplify time-travel queries across two dimensions of time: system-time (what...

Nov 19 2025
Evolving Databases for the Cloud and AI era
Speaker: Ippokratis Pandis
System: Databricks

In this presentation, we will discuss Lakebase, a vision for the next generation of cloud-based, agent-enabled OLTP systems. After the dramatic transformation of analytics (OLAP) platforms over the past one to two decades (with innovations such as columnar storage, vectorized execution, streaming, and the Lakehouse architecture), we argue that databases (OLTP) are now at an inflection point. We will...

Nov 18 2025
[Fall 2025] Optimizing the Table Scan Operator: I/O Minimization and Runtime Adaptivity
Speaker: Benjamin Owad
System: Snowflake

Table scan is a foundational operator in any analytical database and is often the primary bottleneck for a given query. This talk provides a technical deep dive into optimizations our team has developed for the table scan operator. First, we will discuss I/O reduction techniques, including pruning strategies to avoid reading unnecessary data and storage request coalescing to batch I/O...

Nov 17 2025
[Future Data] Why Powering User Facing Applications on Iceberg is Hard
Speaker: Benjamin Wagner
System: Firebolt
Video: YouTube

Firebolt is a Postgres-compliant analytical database built for low-latency, high-concurrency analytics. These applications are usually powered by our fully managed storage and metadata layers. They support efficient caching and indexing, all while having multi-writer consistency. More recently, we’ve been investing heavily in our support for Apache Iceberg. Iceberg is not built to serve these types of low-latency applications. This...

Nov 17 2025
Cortex AISQL: A Production SQL Engine for Unstructured Data
Speaker: Anupam Datta
System: Snowflake

Snowflake’s Cortex AISQL is a production SQL engine that integrates native semantic operations directly into SQL. This integration allows users to write declarative queries that combine relational operations with semantic reasoning, enabling them to query both structured and unstructured data effortlessly. However, making semantic operations efficient at production scale poses fundamental challenges. Semantic operations are more expensive than traditional SQL...

Nov 11 2025
[Fall 2025] Open Data Infrastructure with Iceberg and dbt
Speaker: Connor McArthur
System: dbt

Apache Iceberg is now interoperable with most modern data platforms and compute systems. While Iceberg enables powerful new capabilities, real-world adoption still presents challenges for many organizations. In this talk, we will unpack Iceberg's architecture; demonstrate a novel architecture where multiple compute systems connect to the same underlying Iceberg catalog; and discuss the maturity and continued investment needed to ensure...

Nov 10 2025
[Future Data] Mooncake: Real-Time Apache Iceberg Without Compromise
Speaker: Cheng Chen
System: Mooncake
Video: YouTube

Apache Iceberg is great for large-scale analytics, but it was built for batch workloads. For streaming use cases, keeping tables fresh means writing snapshots more often, which creates excess small Parquet files, bloated metadata, and costly compaction that never ends. Updates and deletes make things worse because equality deletes push the burden to query engines, leaving readers slow and inefficient....