[PDL] Package Queries: Scalable Prescriptive Analytics Close to the Data (Matteo Brucato)
Decision making is central to a broad range of domains, including finance, transportation, healthcare, the travel industry, robotics, and engineering. It is often the final step of business analytics, known as prescriptive analytics, which allows businesses to transform a rich understanding of data, typically provided by advanced predictive models, into actionable decisions. Modeling and solving these problems has relied on application-specific solutions, which are often complex, error-prone, and not generalizable. My goal is to create a domain-independent, declarative approach, supported and powered by the system where the data relevant to these problems typically resides: the database. Despite the widespread importance of prescriptive analytics, unified solutions close to the data did not previously exist.
In my talk, I will present a prototype system that supports package queries, a new query model that extends traditional database queries to handle complex constraints and preferences over answer sets. Package queries allow the declarative specification and efficient evaluation of a significant class of constrained optimization problems (integer programs) within a database. These queries pose unique challenges to a database system, ranging from their greater expressive power, more complex semantics, and higher computational complexity compared with traditional SQL queries, to scalability issues that arise from large amounts of data and uncertainty in the data. I will illustrate how our unified system addresses all these challenges, achieving high performance and quality in many real-world problems from finance, healthcare, and science. I will also present my vision for data-centric systems for decision making, and their connections with robotics, machine learning, natural language processing, visualization, operations management, and simulation.
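To make the idea of constraints and preferences over an answer set concrete, here is a minimal Python sketch. It is an illustration only, not the prototype system's method: it enumerates candidate packages by brute force, whereas the abstract describes evaluation as integer programs inside a database. The table `MEALS`, its values, and the function `best_package` are all hypothetical names introduced for this example.

```python
from itertools import combinations

# Hypothetical table of rows: (name, calories, protein_grams).
# The values are made up purely for illustration.
MEALS = [
    ("oatmeal", 300, 10),
    ("salad", 250, 8),
    ("chicken", 500, 45),
    ("pasta", 700, 20),
    ("yogurt", 150, 12),
]


def best_package(rows, size, max_calories):
    """Return the package (tuple of rows) of exactly `size` rows whose total
    calories do not exceed `max_calories` and whose total protein is largest.
    Returns None when no package satisfies the constraints.

    Brute-force enumeration: fine for a toy table, exponential in general.
    """
    best, best_protein = None, -1
    for package in combinations(rows, size):
        calories = sum(row[1] for row in package)  # package-level constraint
        protein = sum(row[2] for row in package)   # package-level objective
        if calories <= max_calories and protein > best_protein:
            best, best_protein = package, protein
    return best


package = best_package(MEALS, size=3, max_calories=1000)
print(package)
```

Unlike an ordinary selection predicate, the calorie limit and the protein objective apply to the package as a whole rather than to individual rows, which is why the problem maps naturally to an integer program with one 0/1 variable per row.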
Matteo Brucato is a Ph.D. candidate in computer science at the University of Massachusetts Amherst. Matteo's research aims at augmenting data management systems to better support data science and all stages of analytics, with a focus on prescriptive analytics and data-driven decision making. He is the co-inventor of package queries, and his work has been recognized by multiple awards, including CACM and SIGMOD Record research highlights, best paper at VLDB 2016, and both the best demonstration and the runner-up award at VLDB 2020. Matteo's work is largely interdisciplinary, spanning multiple research areas beyond data management, such as natural language processing, information retrieval, AI, machine learning, robotics, and operations research. He received his Bachelor's and Master's degrees in computer science from the University of Bologna (Italy). He has visited Aarhus University (Denmark), UC Riverside, UC Berkeley, and NYU, and has interned at MSR and IBM.