[Future Data] DuckLake: Learning from Cloud Data Warehouses to Build a Robust “Lakehouse”
- Speaker:
- Jordan Tigani
- Date:
- Mon Oct 6, 2025 @ 04:30pm EDT
- Location:
- https://cmu.zoom.us/j/96274590594?pwd=ZIhPZi8CFwaVd5kN9sS5uEiuWanTCa.1 (Zoom)
- Title:
- DuckLake: Learning from Cloud Data Warehouses to Build a Robust "Lakehouse"
- System:
- MotherDuck
- Video:
- YouTube
Talk Info:
When building scalable data systems, it is easy to focus on the storage and the compute, but metadata is a critical third piece that is often overlooked. This talk will describe how metadata storage enables query performance and helps provide transactional semantics in modern data warehouses. We will then go into how the metadata story in popular open data formats takes us several steps backwards. We will then talk about how DuckLake makes metadata access work more like it does in a traditional data warehouse, which solves a lot of problems. Finally, we'll discuss building a SaaS service for DuckLake, and the technical challenges and tradeoffs involved.
This talk is part of the Future Data Systems Seminar Series.
Bio:
Jordan is co-founder and chief duck-herder at MotherDuck, a startup providing a serverless data warehouse based on the open source DuckDB. This is the third cloud data analytics SaaS service he’s helped create, and hopefully this time he’s getting it right. He helped start Google BigQuery and spent a decade working on it as an engineer, book author, engineering leader, and product leader. Jordan has also worked at SingleStore, Microsoft Research, the Windows Kernel team, and at a handful of star-crossed startups in engineering, product, and leadership roles.