[Spring 2024] Towards a Systematic Framework for Index Structure Design (Dong Xie)

Event Date: Thursday March 14, 2024
Event Time: 12:00pm EDT
Location: GHC 9115
Speaker: Dong Xie

Title: Towards A Systematic Framework For Index Structure Design

Index structures sit at the core of database management systems, where they facilitate efficient data access. Because application requirements and hardware trends change constantly, designers go through exhausting, painstaking work designing and tailoring new index structures to keep up. In this talk, I will present a vision of a systematic index structure design framework that allows index designers to focus on data layout design and query algorithms without worrying about support for other practical features (updates and concurrency) or about adapting to the underlying hardware. In particular, I will talk about two main directions we have been working on: (1) How can we automatically extend a static data structure that has special query capabilities (e.g., sampling or similarity search) with concurrent update support? (2) How can we automatically tailor data structures to perform decently on arbitrary block devices or complex storage hierarchies? For the first question, I will show how our extension framework adds (concurrent) update support to static sampling indexes and discuss how it generalizes to other data structures. For the second question, I will present how modern storage hierarchies make data structure design dramatically harder, along with our general methodology for designing data structures using performance models of black-box block devices.

Dong Xie is an assistant professor in the computer science and engineering department at Penn State University. He received his Ph.D. in Computer Science from the University of Utah in 2020 and his bachelor’s degree from the ACM Honored Class of Shanghai Jiao Tong University in 2015. He received the Google Research Scholar Award in 2023, the Microsoft Research PhD Fellowship in 2018, and the SoCC best paper runner-up in 2019. His research interest lies in building data systems to address the challenges of processing and analyzing real-world large-scale data. His research spans multiple areas, including data systems on modern hardware, distributed databases, main-memory databases, stream processing systems, approximate query processing, spatio-temporal data processing, data privacy, and system security.