Future File Formats
Columnar storage is a core component of modern data analytics systems. Although many database management systems have proprietary storage formats, most also support open-source formats such as Apache Parquet and Apache ORC to facilitate cross-platform data sharing. However, these formats were developed in the early 2010s for the Hadoop ecosystem, and both the hardware and the workload landscapes have changed considerably since then.
The Future File Formats project seeks to develop a next-generation open-source columnar storage format that targets both high-performance decoding on modern hardware and high portability.
People
- Xinyu Zeng (Tsinghua University)
- Ruijun Meng (Tsinghua University)
- Huanchen Zhang (Tsinghua University)
- Wes McKinney (Everything)
- Jignesh Patel
- Andy Pavlo
Publications
- X. Zeng, R. Meng, A. Pavlo, W. McKinney, and H. Zhang, "NULLS!: Revisiting Null Representation in Modern Columnar Formats," in Proceedings of the 20th International Workshop on Data Management on New Hardware (DaMoN '24), 2024.
BibTeX
@inproceedings{zeng24,
  author = {Zeng, Xinyu and Meng, Ruijun and Pavlo, Andrew and McKinney, Wes and Zhang, Huanchen},
  title = {NULLS!: Revisiting Null Representation in Modern Columnar Formats},
  booktitle = {Proceedings of the 20th International Workshop on Data Management on New Hardware},
  series = {DaMoN '24},
  year = {2024},
  articleno = {10},
  numpages = {10},
  doi = {10.1145/3662010.3663452},
  url = {https://db.cs.cmu.edu/papers/2024/zeng-damon24.pdf},
}
- X. Zeng, Y. Hui, J. Shen, A. Pavlo, W. McKinney, and H. Zhang, "An Empirical Evaluation of Columnar Storage Formats," Proc. VLDB Endow., vol. 17, no. 2, pp. 148-161, 2023.
BibTeX
@article{zeng23,
  author = {Zeng, Xinyu and Hui, Yulong and Shen, Jiahong and Pavlo, Andrew and McKinney, Wes and Zhang, Huanchen},
  title = {An Empirical Evaluation of Columnar Storage Formats},
  journal = {Proc. {VLDB} Endow.},
  volume = {17},
  number = {2},
  pages = {148--161},
  year = {2023},
  url = {https://www.vldb.org/pvldb/vol17/p148-zeng.pdf},
}