Projects

Future File Formats

Columnar storage is a core component of a modern data analytics system. Although many database management systems have proprietary storage formats, most support open-source storage formats such as Apache Parquet and Apache ORC to facilitate cross-platform data sharing. However, these formats were developed over a decade ago, in the early 2010s, for the Hadoop ecosystem. Since then, both the hardware and workload landscapes have changed.

The Future File Formats project seeks to develop a next-generation open-source columnar storage format that targets high-performance decoding on modern hardware while remaining highly portable.

Publications

  1. X. Zeng, R. Meng, A. Pavlo, W. McKinney, and H. Zhang, "NULLS!: Revisiting Null Representation in Modern Columnar Formats," in Proceedings of the 20th International Workshop on Data Management on New Hardware, 2024.
    @inproceedings{zeng24,
       author = {Zeng, Xinyu and Meng, Ruijun and Pavlo, Andrew and McKinney, Wes and Zhang, Huanchen},
       title = {NULLS!: Revisiting Null Representation in Modern Columnar Formats},
       year = {2024},
       doi = {10.1145/3662010.3663452},
       booktitle = {Proceedings of the 20th International Workshop on Data Management on New Hardware},
       articleno = {10},
       numpages = {10},
       series = {DaMoN '24},
       url = {https://db.cs.cmu.edu/papers/2024/zeng-damon24.pdf},
     }
  2. X. Zeng, Y. Hui, J. Shen, A. Pavlo, W. McKinney, and H. Zhang, "An Empirical Evaluation of Columnar Storage Formats," Proc. VLDB Endow., vol. 17, iss. 2, pp. 148-161, 2023.
    @article{zeng23,
       author = {Zeng, Xinyu and Hui, Yulong and Shen, Jiahong and Pavlo, Andrew and McKinney, Wes and Zhang, Huanchen},
       title = {An Empirical Evaluation of Columnar Storage Formats},
       journal = {Proc. {VLDB} Endow.},
       volume = {17},
       number = {2},
       pages = {148--161},
       year = {2023},
       url = {https://www.vldb.org/pvldb/vol17/p148-zeng.pdf},
     }