[Vaccination 2021] The TileDB Universal Database (Stavros Papadopoulos)
TileDB makes data management universal by modeling all types of data (tables, images, video, genomics, LiDAR and many more) as multi-dimensional arrays. TileDB enables storage on any backend and offers extreme interoperability via numerous language APIs, SQL databases and data science tools. It also takes data sharing, monetization and computation to extreme scale via its powerful serverless architecture. In this presentation I will explain the various aspects of TileDB that make this audacious vision possible and why no one had...
[Vaccination 2021] Google Napa: Powering Scalable Data Warehousing with Robust Query Performance (Jagan Sankaranarayanan + Indrajit Roy)
Napa powers Google’s data warehouse needs for critical clients like Ads and payments. These clients have differing requirements around cost, performance, and data freshness, including a strong expectation of variance-free, robust query performance. At its core, Napa’s principal technologies for robust query performance include the aggressive use of materialized views, which are maintained consistently as new data is ingested across multiple data centers. In this talk we will discuss Napa’s architecture, and how it is able to cater to multiple...
[Vaccination 2021] rqlite – The Distributed Database Built on Raft and SQLite (Philip O’Toole)
rqlite is a lightweight, distributed database which uses SQLite as its database engine. This presentation will discuss its goals, design, and implementation, with particular reference to its use of the Raft consensus algorithm, and its embedding of SQLite. We will also discuss rqlite testing, performance, lessons learned during development, and some of its real-world applications. This talk is part of the Vaccination Database (Second Dose) Tech Talk Seminar Series. Zoom Link: https://cmu.zoom.us/j/95002789605 (Passcode 982149)
PhD Defense: Self-Driving Database Management Systems: Forecasting, Modeling, and Planning (Lin Ma)
Database management systems (DBMSs) are an important part of modern data-driven applications. However, they are notoriously difficult to deploy and administer because they have many aspects that one can change that affect their performance, including database physical design and system configuration. There are existing methods that recommend how to change these aspects of databases for an application. But most of them require humans to make final decisions on what changes to apply and when to apply them. Furthermore, these previous...
MS Thesis Defense: Code Generation Log Replay for In-memory Database Management Systems (Tianlei Pan)
Code generation is a widely-used technique for improving query execution throughput by compiling instructions into native code. This technique, however, leads to design challenges for the recovery system of a DBMS. The log replay process will be disconnected from the built-in execution engine that has been modified to operate efficiently on compiled code. This usually leads to the implementation of a separate execution engine to deal with the execution of log records. To resolve this design conflict, we propose a...
[Vaccination 2021] Systems for Human Data Interaction (Eugene Wu)
The rapid democratization of data has placed its access and analysis in the hands of the entire population. While the advances in rapid and large-scale data processing continue to reduce runtimes and costs, the interfaces and tools for end-users to interact with, and work with, data are still lacking. It is still too difficult to translate a user’s data needs into the appropriate interfaces, too difficult to develop data intensive interfaces that are responsive and scalable, and too difficult for...
[Vaccination 2021] PostgreSQL Optimizer Methodology (Robert Haas)
In this talk, I'll explain at a high level how the PostgreSQL query planner approaches join planning, and how it gathers and uses statistics. Without losing sight of the fact that these algorithms generally work, I want to highlight some of the annoying cases where they break down, and the problems that they can cause for users and developers. This talk is part of the Vaccination Database Tech Talk Seminar Series. Zoom Link: https://cmu.zoom.us/j/94112059546 (Password 809013)
[Vaccination 2021] MonetDB: Scale Up Before You Scale Out (Sir Martin Kersten)
MonetDB is the pioneering open-source main-memory oriented column store developed in a research setting and spinning out into the enterprise market to make a (performance) difference. MonetDB innovates at all layers of a DBMS, e.g. a storage model based on vertical fragmentation, a modern CPU-tuned query execution architecture, automatic and self-tuning indexes, run-time query optimization, and modular software architecture. In this talk I will review the decades of development with a focus on its off-beat adaptive vectorized query execution engine....
[Vaccination 2021] Fast Materialized Views for Fast Websites (Malte Schwarzkopf)
Modern web applications require fast reads of query results over user data. In practice, they use complex, brittle, and tricky-to-manage caching layers to achieve this performance. In this talk, I will discuss how we built a new database system, Noria, from the ground up around the paradigm of materialized view maintenance via incremental streaming dataflow. Noria combines eager and lazy dataflow processing to maintain partially-materialized views for an application's queries with high performance and reasonable memory footprint, yielding 5-70x...
[Vaccination 2021] The Design of InfluxDB IOx: An In-Memory Columnar Database Written in Rust with Apache Arrow (Paul Dix)
I'll talk about the design of InfluxDB IOx, the future core of InfluxDB, an open source time series database. It's an in-memory columnar database that uses object storage for persistence. It's written in Rust and is built on top of Apache Arrow. Unlike previous versions of InfluxDB, IOx supports standards-compliant SQL and the Postgres dialect in particular. This is in addition to backwards compatibility with InfluxQL and Flux, our other two query languages. InfluxDB IOx is a project that's...