[Vaccination 2022] Open-source Change Data Capture With Debezium (Gunnar Morling)
Change Data Capture (CDC) is one big enabler for your data; by reacting to changes in your database in "real-time", CDC comes in handy for implementing a wide range of use cases, such as low-latency data updates from OLTP data stores to OLAP systems, caches, or search indexes, data exchange between microservices, building audit logs, and many more. In this talk you'll learn about Debezium, a distributed open-source log-based CDC platform for a variety of databases, such as Postgres, MySQL,... Read More
[Vaccination 2022] ApertureDB: Designing a Purpose-built System for Visual Data and Data Science (Vishakha Gupta)
Data science and ML techniques can help understand visual content and enable better customer experience across domains, in turn driving the exponential growth in the amount of visual data. Managing large amounts of visual data (images or videos) is extremely time consuming, frustrating, and inefficient due to a lack of data management solutions designed with visual data or data science in mind. In this talk, I will start by briefly highlighting why visual data needs special treatment now and how... Read More
[Vaccination 2022] It’s All Downhill From Here: The Motivations and Design of the sled Embedded Database (Tyler Neely) CANCELLED
The sled embedded database is generally regarded as an “ok” choice for working with embedded data in an ergonomic, too-fast-for-its-own-good, transactional manner. But it wasn’t always that way! This talk covers the motivations, design choices, mistakes, and evolution during the first 6 years of this young database’s life. Topics covered: lock-free index structures, low-overhead logging, cheap OLTP transaction techniques, the RUM conjecture’s implications for database design, finding vast troves of bugs with very little testing code in concurrent and stateful... Read More
[Vaccination 2022] Orca: A Modular Query Optimizer Architecture for VMware Greenplum (Venky Raghavan)
Greenplum is an established large-scale data warehouse system with both enterprise and open-source deployments. The massively parallel processing (MPP) architecture of Greenplum splits the data into disjoint parts that are stored across individual worker segments. The increased amount of data these systems have to process magnifies optimization mistakes and stresses the importance of query optimization more than ever. Furthermore, there is a growing need for optimizers to be highly extensible and modular to ensure that the optimizer can keep up with the... Read More
[Vaccination 2022] HTAP with Azure Cosmos DB: Hybrid Transaction & Analytical Processing (Hari Sudan S)
Azure Cosmos DB is a multi-tenant globally distributed database service for managing JSON documents at Internet scale. As the amount of data managed by the service has grown several times over the past 5 years, customers have shown an increasing need for being able to do efficient analytics on top of this operational data store. The customer asks include: reducing cost, removing the need to manage separate data storage or ETL, as well as being able to query data using... Read More
[Vaccination 2022] SpiceDB: Flexible Permissions Database for the Internet Era (Jake Moshenko)
In this talk, we will walk through the architecture and implementation of SpiceDB, an open-source permissions database. SpiceDB is an implementation of Google’s Zanzibar paper, which describes the single global-scale authorization service that powers permissions and sharing across all Google properties. We will focus heavily on the facets of the database that allow it to scale well, with low latency and high reliability. We will also cover some of the innovations that have made the database service easier to understand and consume. This... Read More
[Vaccination 2022] Practical Considerations for ACID/MVCC Storage Engines (Oren Eini)
In this talk, Oren Eini, founder of RavenDB, will discuss the design decisions behind how RavenDB stores data on disk. Building a highly concurrent, transactional system can be a challenging task. RavenDB addresses this with a storage engine called Voron. We'll go over the design of Voron and how it achieves high performance while maintaining ACID integrity. This talk is part of the Vaccination Database (Booster) Tech Talk Seminar Series.... Read More
[Vaccination 2021] How We Build Firebolt (Benjamin Wagner)
Data-driven companies are increasingly building customer-facing analytics products. These workloads demand lower latency, higher concurrency, and more predictable query performance than ever before - demands that traditional data warehouses struggle with. In this talk, Benjamin explains how Firebolt is designed to welcome this new generation of data challenges. This talk is part of the Vaccination Database (Second Dose) Tech Talk Seminar Series. Zoom Link: https://cmu.zoom.us/j/95002789605 (Passcode 982149) Read More
[Vaccination 2021] Apache Arrow: High-Performance Columnar Data Framework (Wes McKinney)
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing. With the aim of making the data ecosystem modular and connected, Wes will talk about Apache Arrow’s vision for a more unified data analytics ecosystem. In this talk, Wes will discuss the underlying interfaces and protocols powering the project, trends in the Apache Arrow ecosystem, and work on Arrow-native query processing. We will also discuss the new Substrait initiative for portable logical query plans across physical... Read More
An Overview of Google BigQuery (Justin Levandoski)
Google BigQuery is a serverless, scalable, and cost-effective cloud data warehouse. Having evolved from internal Google infrastructure (Dremel), BigQuery is unique in a number of dimensions. In this talk, we provide a look at some of the key architectural aspects of BigQuery and how it provides a true serverless and multi-tenant warehousing solution to customers. We then provide an overview of recent features such as BQML and the embedded BI engine that build on these architectural foundations that allow... Read More