Events

[Future Data] Apache Fluss: A Streaming Storage for Real-Time Lakehouse

Speaker:
Jark Wu
Date:
Mon Dec 8, 2025 @ 04:30pm EST
Location:
https://cmu.zoom.us/j/96274590594?pwd=ZIhPZi8CFwaVd5kN9sS5uEiuWanTCa.1 (Zoom)
Title:
Apache Fluss: A Streaming Storage for Real-Time Lakehouse
System:
Fluss
Video:
YouTube

Talk Info:

Modern data lakehouses promise unified batch and streaming processing, yet their storage layer remains inherently batch-oriented—optimized for large, immutable files. This mismatch forces streaming workloads to rely on external systems (e.g., Kafka), while analytical queries operate on stale snapshots, breaking end-to-end freshness.

In this talk, I’ll present Apache Fluss (incubating), a lakehouse-native streaming storage system designed to bridge this gap. Fluss rethinks streaming storage from the ground up for analytical workloads. Its core abstraction is a columnar stream built on Apache Arrow, enabling sub-second ingestion and high-throughput analytical scans. Furthermore, Fluss introduces the "Streaming Lakehouse" concept, in which Fluss serves as the real-time data layer on top of the lakehouse. This allows query engines to seamlessly combine fresh streaming data in Fluss with historical data in the lakehouse (Iceberg), achieving truly real-time data analytics.

This talk is part of the Future Data Systems Seminar Series.

Bio:

Jark Wu is the original creator of Apache Fluss and a PMC member of Apache Flink. He currently leads the Flink SQL (streaming compute) and Fluss (streaming storage) teams at Alibaba Cloud, where he is dedicated to building a serverless Flink cloud service. His work has focused on data streaming systems for over a decade.