Archived Events

Dec 9 2024
04:30pm EST
[Building Blocks] Implement, Integrate and Extend a Query Engine (Ruihang Xia)

GreptimeDB uses Apache DataFusion and many other common building blocks in its implementation. This talk will focus on managing the query aspect of a (time-series) database across various parts. We have extended DataFusion to implement PromQL, add syntactic sugar to SQL, cooperate with external secondary indexes, and write domain-specific optimizer rules. Each of the above is extended at a different stage of query execution. In addition to new features, we'll also discuss using DataFusion and Arrow as frameworks for implementing...

Dec 2 2024
04:30pm EST
[Building Blocks] Apache OpenDAL: One Layer, All Storage (Xuanwo)

Apache OpenDAL is an Open Data Access Layer that enables seamless interaction with diverse storage services, guided by its mission of "One Layer, All Storage" and core tenets of being open, solid, fast, and extensible to serve various users from infrastructure builders to application developers. In this talk, we will explain OpenDAL in more detail and describe the abstractions it builds. We will discuss how OpenDAL helps developers build database systems. This talk is part of the Database Building Blocks...

Nov 25 2024
02:00pm EST
Amazon Redshift: re-innovating cloud analytics

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift, the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools. This cloud service was a significant leap from traditional on-premises data warehousing solutions, which were expensive, not elastic, and required significant expertise to tune and operate. Customers embraced Amazon Redshift and it became the fastest-growing service in...

Nov 20 2024
02:00pm EST
The Rise of Data Streaming Platforms

Apache Kafka and Apache Flink are powering a new category of data infrastructure called the data streaming platform (DSP). This gives each enterprise an opportunity to act on what’s happening in its business in real time. I will first provide an overview of DSP. A DSP has both similarities to and differences from database systems. I will show how existing database technologies can be used in this new platform and some of the unique problems that a DSP needs to solve. I...

Nov 18 2024
04:30pm EST
[Building Blocks] Biting the Bullet: Rebuilding GlareDB from the Ground Up (Sean Smith)

GlareDB is a database system enabling querying across a variety of data sources, including Snowflake, Postgres, and more. Building on top of DataFusion let us get to an early product very quickly. But not everything is sunshine and roses. In this talk, we'll explore some of the limitations we hit with DataFusion, and how we plan to address those in our upcoming engine Bullet. This talk is part of the Database Building Blocks Seminar Series. Zoom Link: https://cmu.zoom.us/j/95283696582 (Passcode 787637)

Nov 14 2024
04:30pm EST
GHC 4401
Nov 13 2024
02:00pm EST
Evolution of the Storage Engine for Spanner, an Exabyte-scale Database System

I'll describe the design of Spanner's new storage engine, Ressi, which replaced untyped sorted string tables (inherited from Bigtable) with a strongly typed SQL-native representation. Live migration of 6 exabytes of data and multiple billion-user products to the new engine posed unique challenges. Sound methodology from experimental computer science was the key to its success. The simplicity and power of declarative queries combined with strongly consistent transactional semantics have scaled to many thousands of machines running an aggregate of over...

Nov 11 2024
04:30pm EST
[Building Blocks] Building InfluxDB 3.0 with the FDAP Stack: Apache Flight, DataFusion, Arrow and Parquet (Paul Dix)

This talk is part of the Database Building Blocks Seminar Series. Zoom Link: https://cmu.zoom.us/j/95283696582 (Passcode 787637)

Nov 11 2024
02:00pm EST
AI Vector Search in the Oracle Database
Shasank Chavan

AI Vector Search in Oracle Database is a new, transformative way to intelligently, efficiently, and accurately search business data by using AI techniques to search data by semantics, or meaning. With the inclusion of a new VECTOR data type, new approximate search indexes, and new SQL operators and extensions, enterprise companies can quickly and easily leverage AI Vector Search to build modern, generative AI applications in just a few lines of SQL. And with this simplicity comes power, as AI...

Nov 6 2024
02:00pm EST
Snowflake, and why the Cloud reshaped the analytics industry

Snowflake was the first data warehouse designed from scratch to take advantage of Cloud economics. We'll talk about what that means, why it was such a big deal, and how its design differs from the approaches taken by similar systems. Stay until the end for some bonus content on how Snowflake is bringing stream processing into the DBMS. Zoom link: https://cmu.zoom.us/my/jignesh