Future File Formats
Columnar storage is a core component of a modern data analytics system. Although many database management systems have proprietary storage formats, most support open-source storage formats such as Apache Parquet and Apache ORC to facilitate cross-platform data sharing. However, these formats were developed over a decade ago, in the early 2010s, for the Hadoop ecosystem. Since then, both the hardware and workload landscapes have changed.
The Future File Formats project seeks to develop a next-generation open-source columnar storage format that combines high-performance decoding on advanced hardware with high portability.
People
- Xinyu Zeng (Tsinghua University)
- Ruijun Meng (Tsinghua University)
- Huanchen Zhang (Tsinghua University)
- Wes McKinney (Everything)
- Jignesh Patel
- Andy Pavlo
Publications
- X. Zeng, R. Meng, A. Pavlo, W. McKinney, and H. Zhang, "NULLS!: Revisiting Null Representation in Modern Columnar Formats," in Proceedings of the 20th International Workshop on Data Management on New Hardware, 2024.
Bibtex
@inproceedings{zeng24,
author = {Zeng, Xinyu and Meng, Ruijun and Pavlo, Andrew and McKinney, Wes and Zhang, Huanchen},
title = {NULLS!: Revisiting Null Representation in Modern Columnar Formats},
year = {2024},
doi = {10.1145/3662010.3663452},
booktitle = {Proceedings of the 20th International Workshop on Data Management on New Hardware},
articleno = {10},
numpages = {10},
series = {DaMoN '24},
url = {https://db.cs.cmu.edu/papers/2024/zeng-damon24.pdf},
}
- X. Zeng, Y. Hui, J. Shen, A. Pavlo, W. McKinney, and H. Zhang, "An Empirical Evaluation of Columnar Storage Formats," Proc. VLDB Endow., vol. 17, iss. 2, pp. 148-161, 2023.
Bibtex
@article{zeng23,
author = {Zeng, Xinyu and Hui, Yulong and Shen, Jiahong and Pavlo, Andrew and McKinney, Wes and Zhang, Huanchen},
title = {An Empirical Evaluation of Columnar Storage Formats},
journal = {Proc. {VLDB} Endow.},
volume = {17},
number = {2},
pages = {148--161},
year = {2023},
url = {https://www.vldb.org/pvldb/vol17/p148-zeng.pdf},
}