[ML⇄DB 2023] Alibaba: Domain Knowledge Augmented AI for Databases (Jian Tan)
One goal of applying AI to databases is to make the systems easier to use, e.g., natural language to SQL conversion (NL2SQL), and more efficient to operate, e.g., DevOps root cause analysis (RCA). Although scaling up general models and datasets with less hand engineering has achieved unprecedented success in various applications, we argue that using domain knowledge to augment AI for databases can provide an efficient and effective solution. Specifically, we use two production systems developed by the Alibaba Intelligent Database team, SQL Bridge (NL2SQL) and ShapleyIQ (RCA), to show that domain-specific interventions can be designed to work coherently with general intelligence. For the former, we intervene in the SQL generation process of an encoder-decoder neural network by introducing a context-free grammar. The imposed structure allows us to insert precise, low-level controls over the generation. For the latter, the DevOps diagnosis system contains a forward pass and a backward pass. The forward pass uses domain knowledge to build models (e.g., queueing models) that simulate and evaluate counterfactuals for the causal factors. The backward pass relies on the Shapley value with a splitting invariance axiom to quantify each factor's influence and pinpoint the root causes. We believe that this design philosophy has great practical value, especially in low-resource settings with distributional shifts.
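To make the forward/backward idea concrete, here is a minimal sketch of Shapley-based attribution over counterfactuals. It is not ShapleyIQ itself: it computes the standard Shapley value by enumerating orderings and does not implement the splitting invariance axiom mentioned in the abstract. The factor names (`traffic_spike`, `slow_disk`), the numeric rates, and the M/M/1 latency formula used as the counterfactual model are all assumptions chosen for illustration.

```python
from itertools import permutations

def shapley_values(factors, value):
    """Exact Shapley values: average each factor's marginal contribution
    over every ordering in which the factors can be switched on."""
    totals = {f: 0.0 for f in factors}
    orderings = list(permutations(factors))
    for order in orderings:
        active = frozenset()
        prev = value(active)
        for f in order:
            active = active | {f}
            cur = value(active)
            totals[f] += cur - prev  # marginal contribution of f in this ordering
            prev = cur
    return {f: t / len(orderings) for f, t in totals.items()}

# Hypothetical "forward pass": an M/M/1 queue whose mean response time is
# 1 / (service_rate - arrival_rate). All numbers below are illustrative.
BASE_ARRIVAL = 50.0   # requests per second under normal conditions
BASE_SERVICE = 100.0  # requests per second the server can handle

def latency_increase(active):
    """Counterfactual latency increase (seconds) when only the factors
    in `active` are present."""
    arrival = BASE_ARRIVAL + (30.0 if "traffic_spike" in active else 0.0)
    service = BASE_SERVICE - (10.0 if "slow_disk" in active else 0.0)
    baseline = 1.0 / (BASE_SERVICE - BASE_ARRIVAL)
    return 1.0 / (service - arrival) - baseline

if __name__ == "__main__":
    phi = shapley_values(["traffic_spike", "slow_disk"], latency_increase)
    # Rank factors by attributed influence, largest first.
    for factor, score in sorted(phi.items(), key=lambda kv: -kv[1]):
        print(f"{factor}: {score:.4f} s")
```

In this toy setup the attributions sum to the total latency increase with both factors present, which is the efficiency property of the Shapley value. Enumerating all orderings grows factorially with the number of factors, so it is practical only for a handful of candidates.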
This talk is part of the ML⇄DB Seminar Series.
Bio:
Jian Tan is a Research Scientist/Director of the Intelligent Database Team at Alibaba. Before that, he worked at the IBM T. J. Watson Research Center and then as a tenure-track faculty member at The Ohio State University. His research interests focus on making intelligent decisions for complex computing systems.
More Info: https://db.cs.cmu.edu/seminar2023/#db10