MS Thesis Defense: An Evaluation of Compilation-Based PL/PGSQL Execution (Tanuj Nayak)

Event Date: Wednesday February 3, 2021
Event Time: 02:30pm EST

User-Defined Functions (UDFs) are an important analytical feature in modern Database Management Systems (DBMSs) because they execute on the server. This allows complex analytical queries to run without serializing intermediate data over a network. However, query engines often incur significant overhead when executing UDFs because UDFs are non-declarative, unlike SQL queries. This mismatch forces frequent context switches between UDF execution and SQL execution, and the overhead becomes more pronounced as a UDF invokes more SQL queries.

In this thesis, we investigate the extent to which compilation allows us to overcome this overhead. Compiling SQL queries for execution has become popular in database research over the past decade, especially for main-memory DBMSs, and has been shown to significantly improve query execution performance. We compare UDF compilation with query inlining, another recent technique for executing UDFs.

To make this comparison, we implemented a UDF compilation framework in NoisePage, a main-memory, compilation-based DBMS. The framework compiles each UDF into a function in a domain-specific language (DSL), which we evaluate against query inlining. We find that this framework supports more UDF language features than inlining frameworks and produces more efficient functions. We also observe that it compiles UDFs into DSL primitives that are far more fine-grained and lightweight than most SQL operators. By comparison, the SQL operators produced by the inlining approach incur much larger overhead. On iteration-heavy benchmarks, compilation outperformed inlining by 2x to 120x.

Thesis Committee:
Chair: Andy Pavlo and Todd C. Mowry

Thesis Link:
Zoom Link: