News & Events
[Building Blocks] Building InfluxDB 3.0 with the FDAP Stack: Apache Flight, DataFusion, Arrow and Parquet (Paul Dix)
This talk is part of the Database Building Blocks Seminar Series. Zoom Link: https://cmu.zoom.us/j/95283696582 (Passcode 787637)
[Building Blocks] Synnada (Mehmet Ozan Kabak)
This talk is part of the Database Building Blocks Seminar Series. Zoom Link: https://cmu.zoom.us/j/95283696582 (Passcode 787637)
[Building Blocks] Exon (Trent Hauck)
This talk is part of the Database Building Blocks Seminar Series. Zoom Link: https://cmu.zoom.us/j/95283696582 (Passcode 787637)
[Building Blocks] Accelerating Data and AI with Spice.ai Open-Source Software (Luke Kim)
Spice.ai OSS is an open-source, portable runtime designed to simplify building data and AI applications. It’s built on industry-leading technologies like Apache DataFusion, Apache Arrow, DuckDB and SQLite. In this talk, we tell the story from building neurofeedback systems, to operating DuckDB at cloud scale, to building Spice.ai OSS for the intersection of high-performance data query and ML inference. We introduce Spice.ai OSS, demo some of its capabilities and use cases, explore the design principles and architecture of the platform, and go ...
[Building Blocks] ParadeDB – Postgres for Search and Analytics (Philippe Noël)
ParadeDB is Postgres for search and analytics. It is an alternative to Elasticsearch built on Postgres. It offers state-of-the-art full-text and vector search capabilities, as well as fast aggregations inside Postgres. ParadeDB is built in Rust via Postgres extensions on top of database building blocks like Tantivy, DuckDB, and Apache DataFusion. It is compatible with every officially supported PGDG Postgres version. In this talk, we'll discuss how we extended Postgres with these building blocks and dive into the technical details ...
[Building Blocks] Accelerating Apache Spark workloads with Apache DataFusion Comet (Andy Grove)
Apache Spark is one of the most widely used distributed data analysis frameworks. However, its JVM-based and row-oriented query execution engine limits Spark’s performance and scalability. In this talk, we will introduce DataFusion Comet, an accelerator for Apache Spark designed to improve the efficiency of Spark queries by translating them into native queries that leverage Apache Arrow and Apache DataFusion. We will explore the core architecture of Comet and explain how Spark plans are translated into native plans and talk about ...
[Building Blocks] Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine (Andrew Lamb)
Apache DataFusion is a fast, embeddable, and extensible query engine written in Rust that uses Apache Arrow as its memory model. In this talk we explain DataFusion in more detail and describe the types of data-centric systems it is used to build. We will also review its high-level architecture and feature set, discussing the tradeoffs and performance of DataFusion's modular design versus the more common tightly coupled design. This talk is part of the Database Building Blocks Seminar Series. Zoom Link: ...
Leveraging Generative AI with Oracle AI Vector Search (Shasank Chavan)
AI Vector Search in Oracle 23ai is a new, transformative way to intelligently search through your unstructured business data efficiently and accurately by using AI techniques to match on the semantics, or meaning, of the underlying data. With the inclusion of a new VECTOR datatype, new approximate search indexes, and new SQL operators and extensions, enterprise companies can quickly and easily leverage AI Vector Search to build modern generative AI applications with just a few lines of SQL! And with this ...
PhD Defense: On Embedding Database Management System Logic in Operating Systems via Restricted Programming Environments (Matt Butrovich)
The rise in computer storage and network performance means that disk I/O and network communication are often no longer bottlenecks in database management systems (DBMSs). Instead, the overheads associated with operating system (OS) services (e.g., system calls, thread scheduling, and data movement from kernel-space) limit query processing responsiveness. User-space applications can elide these overheads with a kernel-bypass design. However, extracting benefits from kernel-bypass frameworks is challenging, and the libraries are incompatible with standard deployment and debugging tools. This thesis presents ...
[Spring 2024] Manufacturing AI Applications (Anthony Tomasic)
Developing AI applications is costly and difficult, and recent trends have only intensified these challenges. Developers use a bottom-up approach, focusing on the nitty-gritty of integration and infrastructure, which leads to a complex "blob" of code. Changes to this blob are risky due to the intricate web of dependencies. Fort Alto has fundamentally rethought the application development process with a groundbreaking approach that redefines how AI applications are built. (i) Separate application semantics from the infrastructure code. (ii) Split application ...