Archived Events

Nov 11 2024
[Building Blocks] Building InfluxDB 3.0 with the FDAP Stack: Apache Flight, DataFusion, Arrow and Parquet (Paul Dix)
Speaker: Paul Dix
System: InfluxDB
Video: YouTube

This talk is part of the Database Building Blocks Seminar Series. Zoom Link: https://cmu.zoom.us/j/95283696582 (Passcode 787637)

Nov 11 2024
AI Vector Search in the Oracle Database
Speaker: Shasank Chavan
System: Oracle

AI Vector Search in Oracle Database is a new, transformative way to intelligently, efficiently, and accurately search business data by using AI techniques to search data by semantics, or meaning. With the inclusion of a new VECTOR data type, new approximate search indexes, and new SQL operators and extensions, enterprise companies can quickly and easily leverage AI Vector Search to...

Nov 6 2024
Snowflake, and why the Cloud reshaped the analytics industry
Speaker: Dan Sotolongo
System: Snowflake

Snowflake was the first data warehouse designed from scratch to take advantage of Cloud economics. We'll talk about what that means, why it was such a big deal, and how its design differs from the approaches taken by similar systems. Stay until the end for some bonus content on how Snowflake is bringing stream processing into the DBMS. Zoom link:...

Nov 4 2024
[Building Blocks] Towards “Unified” Compute Engines: Opportunities and Challenges (Mehmet Ozan Kabak)
Speaker: Mehmet Ozan Kabak
System: Synnada
Video: YouTube

The architecture diagram of a typical data and AI infrastructure setup often features a primary compute engine (e.g., Apache Spark) alongside an array of supplementary tools for observability, AI integration, streaming support, memory management, interactivity, and more. While this modular architecture can be effective, it also introduces challenges around performance bottlenecks, maintenance costs, and integration complexity. In this talk, we...

Oct 28 2024
[Building Blocks] Exon: A Built for Purpose Bioinformatics Database (Trent Hauck)
Speaker: Trent Hauck
System: Exon
Video: YouTube

Without having to implement every component of a database engine, it’s now feasible to build databases that can lean into the idiosyncrasies of specific domains to deliver a better user experience. Exon is one such database. Thanks to DataFusion, Exon can deliver a complete database, but also have capabilities that bridge the gap between bioinformatics and database systems. In this talk...

Oct 21 2024
[Building Blocks] Accelerating Data and AI with Spice.ai Open-Source Software (Luke Kim)
Speaker: Luke Kim
System: Spice.ai
Video: YouTube

Spice.ai OSS is an open-source, portable runtime designed to simplify building data and AI applications. It’s built on industry-leading technologies like Apache DataFusion, Apache Arrow, DuckDB and SQLite. In this talk, we tell the story from building neurofeedback systems, to operating DuckDB at cloud scale, to building Spice.ai OSS for the intersection of high-performance data query and ML inference. We introduce...

Oct 7 2024
[Building Blocks] ParadeDB – Postgres for Search and Analytics (Philippe Noël)
Speaker: Philippe Noël
System: ParadeDB
Video: YouTube

ParadeDB is Postgres for search and analytics. It is an alternative to Elasticsearch built on Postgres. It offers state-of-the-art full-text and vector search capabilities, as well as fast aggregations inside Postgres. ParadeDB is built in Rust via Postgres extensions on top of database building blocks like Tantivy, DuckDB, and Apache DataFusion. It is compatible with every officially supported PGDG Postgres...

Oct 1 2024
[DB Seminar] JSON Relational Duality: Converging the worlds of Objects, Documents, and Relational
Speaker: Tirthankar Lahiri
System: Oracle

The "Object-Relational Impedance Mismatch" has been a multi-decade problem for developers, and past solutions have all had various tradeoffs that have compromised efficiency or consistency. JSON Relational Duality is a breakthrough capability that combines the best aspects of the Document and Relational models without the drawbacks of either model. This session will provide an overview and deep dive...

Sep 30 2024
[Building Blocks] Accelerating Apache Spark workloads with Apache DataFusion Comet (Andy Grove)
Speaker: Andy Grove
System: DataFusion
Video: YouTube

Apache Spark is one of the most widely used distributed data analysis frameworks. However, its JVM-based and row-oriented query execution engine limits Spark’s performance and scalability. In this talk, we will introduce DataFusion Comet, an accelerator for Apache Spark designed to improve the efficiency of Spark queries by translating them into native queries that leverage Apache Arrow and Apache DataFusion. We...

Sep 23 2024
[Building Blocks] Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine (Andrew Lamb)
Speaker: Andrew Lamb
System: DataFusion
Video: YouTube

Apache DataFusion is a fast, embeddable, and extensible query engine written in Rust that uses Apache Arrow as its memory model. In this talk we explain DataFusion in more detail and describe the types of data-centric systems it is used to build. We will also review its high-level architecture and feature set, discussing tradeoffs and performance between DataFusion's...