[Future Data] Apache Fluss: A Streaming Storage for Real-Time Lakehouse
- Speaker:
- Jark Wu
- Date:
- Mon Dec 8, 2025 @ 04:30pm EST
- Location:
- https://cmu.zoom.us/j/96274590594?pwd=ZIhPZi8CFwaVd5kN9sS5uEiuWanTCa.1 (Zoom)
- Title:
- Apache Fluss: A Streaming Storage for Real-Time Lakehouse
- System:
- Fluss
- Video:
- YouTube
Talk Info:
Modern data lakehouses promise unified batch and streaming processing, yet their storage layer remains inherently batch-oriented—optimized for large, immutable files. This mismatch forces streaming workloads to rely on external systems (e.g., Kafka), while analytical queries operate on stale snapshots, breaking end-to-end freshness.
In this talk, I’ll present Apache Fluss (incubating), a lakehouse-native streaming storage system designed to bridge this gap. Fluss rethinks streaming storage from the ground up for analytical workloads. Its core abstraction is a columnar stream built on Apache Arrow, enabling sub-second ingestion and high-throughput analytical scans. Furthermore, Fluss introduces the "Streaming Lakehouse" concept, in which Fluss serves as the real-time data layer on top of the lakehouse. This allows query engines to seamlessly combine fresh streaming data in Fluss with historical data in the lakehouse (Iceberg) to achieve truly real-time data analytics.
This talk is part of the Future Data Systems Seminar Series.
Bio:
Jark Wu is the original creator of Apache Fluss and a PMC member of Apache Flink. He currently leads the Flink SQL (streaming compute) and Fluss (streaming storage) teams at Alibaba Cloud, where he is dedicated to building a serverless Flink cloud service. His work has focused on data streaming systems for over a decade.