Archived Events

Sep 12 2024
12:00pm EDT
GHC 9115
[Fall 2024] Advancing Database Performance and Capabilities at Snowflake
Dan Sotolongo, Bowei Chen

This talk presents recent research and development at Snowflake aimed at pushing the boundaries of database performance and functionality. In the first section, we will introduce a series of optimizations designed to accelerate query execution within Snowflake’s platform. We will discuss the technical challenges associated with developing general-purpose optimizations and balancing performance improvements across a wide range of workloads. The second section will explore a novel database constraint we’re developing to enable continuous processing applications. A finalization constraint restricts the... Read More

Sep 10 2024
06:00pm EDT
GHC 4401
[Fall 2024] Databricks: Introduction to Mosaic AI Vector Search

This tech talk will deep dive into some of the most interesting challenges being solved at Databricks. Read More

Aug 21 2024
12:00pm EDT
LSM Management and Using LSM Immutability for Data Virtualization (Vaibhav Arora)

LSM (Log-Structured Merge) trees are now the bedrock of many storage engines and datastores such as RocksDB, HBase, and Cassandra. They avoid random writes and provide immutability. Data is organized in multiple levels that increase exponentially in size. Each data mutation writes a new version of an object, and background processes called merge/compaction continuously remove the unused versions while moving the data across the layers of the LSM tree and maintaining its shape. This talk will describe how... Read More

Jun 26 2024
12:00pm EDT
Leveraging Generative AI with Oracle AI Vector Search (Shasank Chavan)

AI Vector Search in Oracle 23ai is a new, transformative way to intelligently search through your unstructured business data efficiently and accurately by using AI techniques to match on the semantics, or meaning, of the underlying data. With the inclusion of a new VECTOR datatype, new approximate search indexes, and new SQL operators and extensions, enterprise companies can quickly and easily leverage AI Vector Search to build modern generative AI applications with just a few lines of SQL! And with this... Read More

Apr 24 2024
12:30pm EDT
Porter Hall 1000
[Spring 2024] Beyond SQL: Dataframes in the Database (Devin Petersohn)

Dataframes are popular tools for interacting with and exploring data, but they are not as well understood nor as deeply studied as databases. Python's pandas and Apache Spark are two of the most popular dataframes in use by data practitioners, but even these are extremely different from each other in terms of guarantees and user expectations. In this talk, we will explore these differences and take a deep dive into pandas-like dataframes with a theoretical lens, exploring the dataframe data... Read More

Apr 17 2024
12:00pm EDT
TCS Hall 358
[Spring 2024] Manufacturing AI Applications (Anthony Tomasic)

Developing AI applications is costly and difficult, and recent trends have only intensified these challenges. Developers use a bottom-up approach, focusing on the nitty-gritty of integration and infrastructure, which leads to a complex "blob" of code. Changes to this blob are risky due to the intricate web of dependencies. Fort Alto has fundamentally rethought the application development process with a groundbreaking approach that redefines how AI applications are built. (i) Separate application semantics from the infrastructure code. (ii) Split application... Read More

Apr 5 2024
02:00pm EDT
PhD Defense: On Embedding Database Management System Logic in Operating Systems via Restricted Programming Environments (Matt Butrovich)

The rise in computer storage and network performance means that disk I/O and network communication are often no longer bottlenecks in database management systems (DBMSs). Instead, the overheads associated with operating system (OS) services (e.g., system calls, thread scheduling, and data movement from kernel-space) limit query processing responsiveness. User-space applications can elide these overheads with a kernel-bypass design. However, extracting benefits from kernel-bypass frameworks is challenging, and the libraries are incompatible with standard deployment and debugging tools. This thesis presents... Read More

Mar 14 2024
12:00pm EDT
GHC 9115
[Spring 2024] Towards a Systematic Framework for Index Structure Design (Dong Xie)

Index structures are at the core of database management systems, where they facilitate efficient data access. Due to constant changes in application requirements and hardware trends, people go through exhaustive and painstaking work designing and tailoring new index structures to catch up. In this talk, I will show a vision of a systematic index structure design framework that will allow index designers to focus on data layout design and query algorithms without worrying about support for other practical features (update and concurrency)... Read More

Feb 29 2024
12:00pm EDT
GHC 6501
[Spring 2024] Embedding Database Logic in the Operating System Is Finally a Good Idea (Matt Butrovich)

The rise in computer storage and network performance means that disk I/O and network communication are often no longer bottlenecks in database management systems (DBMSs). Instead, the overheads associated with operating system (OS) services (e.g., system calls, thread scheduling, and data movement from kernel-space) limit query processing responsiveness. To avoid these overheads, user-space applications prioritizing performance over simplicity can elide these software layers with a kernel-bypass design. However, extracting benefits from kernel-bypass frameworks is challenging, and the libraries are incompatible... Read More

Jan 5 2024
10:00am EDT
GHC 6115
[Winter 2023] Survey and Evaluation of Database Management System Extensibility (Abi Kim)

Database management system (DBMS) extensibility is a feature that enables users to extend the DBMS with user software. However, the DBMS extensibility environment is fraught with perils, and DBMS developers have to resort to unspecified methods of developing extensions, including copying core DBMS source code and casing between different versions of the DBMS. Extending a DBMS to support new functionality is challenging due to the tight coupling between the system's internal components. This thesis studies and evaluates the design of... Read More