Fall 2019: Rohit Agrawal (Salesforce)
In this talk we discuss LSM compression for a KV store. Our KV store writes to an underlying shared storage system that models data as named extents (up to 2 GB), each containing variable-length fragments. A fragment is at most 1 MB and is the atomic unit of read and write. Our KV store reads fragments into 64 KB buffers for scans and random reads.
Our compression has two facets: key compression and fragment compression. Key compression is particularly effective because the data we store in our LSM tree is very key-intensive, and some records consist of a key only. Effective key compression dramatically improves the efficacy of our block cache. Fragment compression is a powerful way to save storage, but its cost-benefit is proportional to the longevity of the compressed fragment, which leads to interesting tradeoffs across the levels of the LSM tree.
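As an illustration only: the abstract does not name the key-compression scheme used, so the sketch below assumes prefix (front) compression, a common technique for sorted keys in which each key stores how many leading characters it shares with the previous key plus the remaining suffix. The function names and sample keys are hypothetical.

```python
def shared_prefix_len(a, b):
    """Return the number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def compress_keys(sorted_keys):
    """Encode sorted keys as (shared_prefix_len, suffix) pairs."""
    entries = []
    prev = ""
    for key in sorted_keys:
        shared = shared_prefix_len(prev, key)
        entries.append((shared, key[shared:]))
        prev = key
    return entries


def decompress_keys(entries):
    """Rebuild the original keys from (shared_prefix_len, suffix) pairs."""
    keys = []
    prev = ""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


# Hypothetical keys; real key layouts are not described in the abstract.
keys = ["user:1001:email", "user:1001:name", "user:1002:email"]
encoded = compress_keys(keys)
raw_chars = sum(len(k) for k in keys)
stored_chars = sum(len(suffix) for _, suffix in encoded)
```

With keys that share long prefixes, `stored_chars` is much smaller than `raw_chars`, which suggests why a key-heavy workload could fit more entries into the same block cache under a scheme like this.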
Rohit Agrawal graduated from CMU in 2017. He works at Salesforce.