[Future Data] Vortex: LLVM for File Formats
- Speaker:
- Will Manning
- Date:
- Mon Oct 13, 2025 @ 04:30pm EDT
- Location:
- https://cmu.zoom.us/j/96274590594?pwd=ZIhPZi8CFwaVd5kN9sS5uEiuWanTCa.1 (Zoom)
- Title:
- Vortex: LLVM for File Formats
- System:
- Vortex
- Video:
- YouTube
Talk Info:
Apache Parquet revolutionized columnar storage after its initial release in 2013, but has largely failed to evolve since then. As a result, nearly every Tier 1 tech company has built their own columnar format to replace Parquet.
Enter Vortex, a Linux Foundation project that currently achieves 100x faster random access, 10-20x faster scans, and 5x higher write throughput than Parquet, while maintaining roughly the same compression ratio. Importantly, it’s also designed explicitly to support decoding via GPU SIMT.
But Vortex is more than just a file format. Just as LLVM turned "writing a compiler" into "writing a language frontend", Vortex provides extensive file format infrastructure, turning "writing a new file format" into customizing encodings and layout strategies.
This talk will walk through how Vortex is built, and how we moved decisions from "spec writer" to "file writer." We'll also cover the core research foundations (BtrBlocks, FastLanes) behind its performance, and why designing for GPU SIMT makes CPU SIMD and random access fast too.
This talk is part of the Future Data Systems Seminar Series.
Bio:
Will is Co-founder & CEO at Spiral, a startup building a next-generation multimodal warehousing system. Spiral particularly excels at workloads like GPU data loading ("making GPUs go brr"), in addition to more traditional relational queries. Will is also the TSC Chair of Vortex, an Incubation-stage project at the Linux Foundation that is building an extensible, state-of-the-art columnar file format. Prior to starting Spiral, he worked at Palantir for nearly 10 years. While there, he helped create Palantir Foundry, started Palantir's European commercial business, and ran "every engineering team that read or wrote bytes". In ancient times (ca. 2010), he did research work on reinforcement learning.