[¡Databases! 2022] Litestream: Making Single-Node Deployments Cool Again (Ben Johnson)
SQLite has long been regarded as an incredibly reliable, fast, and easy-to-use database in the world of personal devices such as laptops and phones. However, it has never gained much traction in the world of web applications because it is built as a single-node database. Litestream adds simple, cheap streaming replication to SQLite to expand the range of use cases the database can serve. Litestream provides the missing disaster recovery tooling to make SQLite a viable database for many concurrent, production...
[¡Databases! 2022] Odyssey: PostgreSQL Connection Proxy! (Andrey Borodin)
In the hypertext world, connection proxies have been a must for decades now. And in many cases this idea works for databases too! Today almost every busy OLTP Postgres instance has to use some sort of proxy: the old but gold PgBouncer, the scalable Odyssey, or the entirely new SPQR and PgCat. In this talk I'll discuss what makes proxies useful, what Postgres hackers are doing to fix this, and the architecture of the proxy that I maintain: Odyssey. Odyssey is a scalable multi-threaded connection pooler...
[¡Databases! 2022] Rockset: High Performance Queries with Dynamically Typed SQL (Ben Hannel)
This talk is part of the ¡Databases! – A Database Seminar Series. Zoom Link: https://cmu.zoom.us/j/94466872009 (Passcode 424050)
[¡Databases! 2022] Snowflake Iceberg Tables, Streaming Ingest, and Unistore! (Ashish Motivala)
Why settle for one cool database talk when you can get three? Snowflake is pushing the boundaries of what a unified cloud data platform can do. Today we'll talk about how Snowflake can be combined with open standards like Apache Iceberg, the hard tech behind streaming data into Snowflake, and how Snowflake brings transactional and analytical workloads together in a single platform. Apache Iceberg is an open source project that provides a way to represent a table as files in the cloud. Iceberg...
[¡Databases! 2022] Umbra: A Disk-Based System with In-Memory Performance (Thomas Neumann)
The increases in main-memory sizes over the last decade have made pure in-memory database systems feasible, and in-memory systems offer unprecedented performance. However, DRAM is still relatively expensive, and the growth of main-memory sizes has slowed down. In contrast, SSD prices have fallen substantially in recent years, and their read bandwidth has increased to gigabytes per second. This makes it attractive to combine a large in-memory buffer with fast SSDs as storage devices, combining the excellent performance...
[Vaccination 2022] IO in PostgreSQL: Past, Present, Future (Andres Freund)
PostgreSQL traditionally has handled IO in a fairly minimal way, relying on the operating system more than most other databases. This talk will discuss why PostgreSQL mostly got away with that so far, why current hardware trends (NVMe with very high bandwidth / low latency, cloud storage with high latency but good random / concurrent read behaviour) require changing course, and the path towards modernizing the IO stack in PostgreSQL to use asynchronous / direct IO. This talk is part...
[Vaccination 2022] RonDB: A Key-Value Store with SQL Capabilities and LATS Properties (Mikael Ronström)
RonDB is a key-value store with SQL capabilities and LATS (Latency, Availability, Throughput, Scalable Storage) properties. It is based on MySQL NDB Cluster, which is used in extremely high-availability applications such as universal data storage for mobile operators serving many billions of subscribers. It is also used in gaming applications, financial applications, and other areas. The main focus of RonDB in Hopsworks is as a platform for machine learning. RonDB handles the data storage for the Feature Store in Hopsworks, both the online Feature...
[Vaccination 2022] Velox: An Open-source Unified Execution Engine (Deepak Majeti)
Data keeps getting bigger and processing keeps getting more complex, but the hardware is not getting faster. We need to reconsider efficiency from the ground up. Data processing systems handle various workloads (e.g. “batch”, “analytical”, “streaming”, “AI/ML”), yet they employ common features such as functions, joins, filter pushdown, sorting, grouping, projections, etc. What is needed is a shared library that provides optimized implementations of this common functionality and can consolidate these data processing systems. The Velox project is being developed...
[Vaccination 2022] QuestDB: Fast Open Source Time Series Database (Vlad Ilyushchenko)
In this talk, we will discuss major technical challenges developers face when dealing with time series data and QuestDB's design principles that are meant to solve these challenges. We will then go through QuestDB's performance-focused architecture and cover topics like storage model, transactions, in-order and out-of-order ingestion, concurrency control, and network interfaces. This talk is part of the Vaccination Database (Booster) Tech Talk Seminar Series. Zoom Link: https://cmu.zoom.us/j/95002789605 (Passcode 982149)
[Vaccination 2022] Yellowbrick: An Elastic Data Warehouse on Kubernetes (Mark Cusack)
Yellowbrick is an elastic SQL data warehouse with a design centered on efficiency, high concurrency, and performance. The database management system is composed of a set of Kubernetes-orchestrated containers. Kubernetes provides the single source of truth for system configuration and state, and manages all warehousing lifecycle operations. In this session, I'll provide an overview of Yellowbrick and its microservices architecture, and focus on the work we've done in our query optimizer, workload management system, and the Linux kernel to drive high performance...