Archived Events

Oct 11 2021
[Vaccination 2021] How to Count Things with dbt (Drew Banin)
Speaker: Drew Banin
System: dbt
Video: YouTube

Modern organizations leverage machine learning, data science, and AI to build predictive, responsive, and personalized applications. BUT! Most are bad at counting things. That's where dbt comes in. dbt is an open source framework used to define, test, and document datasets. In this talk, we will discuss the what, why, and how behind dbt and data warehousing in the year...

Oct 4 2021
[Vaccination 2021] Bodo: Automatic HPC Performance and Scaling for Data Processing in Python (Ehsan Totoni)
Speaker: Ehsan Totoni
System: Bodo
Video: YouTube

Python is the language of choice for machine learning (ML) and AI, but SQL has been used for data processing for decades. Many data applications mix the two languages, which makes development and deployment cumbersome for data teams. BodoSQL addresses the "two-language" problem by compiling Python and SQL code together, providing type checking, error checking, end-to-end...

Sep 27 2021
[Vaccination 2021] The TileDB Universal Database (Stavros Papadopoulos)
Speaker: Stavros Papadopoulos
System: TileDB
Video: YouTube

TileDB makes data management universal by modeling all types of data (tables, images, video, genomics, LiDAR and many more) as multi-dimensional arrays. TileDB enables storage on any backend and offers extreme interoperability via numerous language APIs, SQL databases and data science tools. It also takes data sharing, monetization and computation to extreme scale via its powerful serverless architecture. In this...

Sep 20 2021
[Vaccination 2021] Google Napa: Powering Scalable Data Warehousing with Robust Query Performance (Jagan Sankaranarayanan + Indrajit Roy)
Speakers: Jagan Sankaranarayanan, Indrajit Roy
System: Napa
Video: YouTube

Napa powers Google’s data warehouse needs for critical clients like Ads and payments. These clients have differing requirements around cost, performance, and data freshness, including a strong expectation of variance-free, robust query performance. At its core, Napa’s principal technologies for robust query performance include the aggressive use of materialized views, which are maintained consistently as new data is ingested across...

Sep 13 2021
[Vaccination 2021] rqlite – The Distributed Database Built on Raft and SQLite (Philip O’Toole)
Speaker: Philip O’Toole
System: rqlite
Video: YouTube

rqlite is a lightweight, distributed database which uses SQLite as its database engine. This presentation will discuss its goals, design, and implementation, with particular reference to its use of the Raft consensus algorithm, and its embedding of SQLite. We will also discuss rqlite testing, performance, lessons learned during development, and some of its real-world applications. This talk is part of...

Aug 18 2021
PhD Defense: Self-Driving Database Management Systems: Forecasting, Modeling, and Planning (Lin Ma)
Speaker: Lin Ma

Database management systems (DBMSs) are an important part of modern data-driven applications. However, they are notoriously difficult to deploy and administer because they have many tunable aspects that affect their performance, including database physical design and system configuration. There are existing methods that recommend how to change these aspects of databases for an application. But most of...

Aug 9 2021
MS Thesis Defense: Code Generation Log Replay for In-memory Database Management Systems (Tianlei Pan)
Speaker: Tianlei Pan

Code generation is a widely-used technique for improving query execution throughput by compiling instructions into native code. This technique, however, leads to design challenges for the recovery system of a DBMS. The log replay process will be disconnected from the built-in execution engine that has been modified to operate efficiently on compiled code. This usually leads to the implementation of...

Jun 14 2021
[Vaccination 2021] Systems for Human Data Interaction (Eugene Wu)
Speaker: Eugene Wu
System: DVMS
Video: YouTube

The rapid democratization of data has placed its access and analysis in the hands of the entire population. While the advances in rapid and large-scale data processing continue to reduce runtimes and costs, the interfaces and tools for end-users to interact with, and work with, data are still lacking. It is still too difficult to translate a user’s data needs...

Jun 7 2021
[Vaccination 2021] PostgreSQL Optimizer Methodology (Robert Haas)
Speaker: Robert Haas
System: PostgreSQL
Video: YouTube

In this talk, I'll discuss at a high level how the PostgreSQL query planner approaches join planning, and how it gathers and uses statistics. Without losing sight of the fact that these algorithms generally work, I want to highlight some of the annoying cases where they break down, and the problems that they can cause for users and developers...

May 24 2021
[Vaccination 2021] MonetDB: Scale Up Before You Scale Out (Sir Martin Kersten)
Speaker: Sir Martin Kersten
System: MonetDB
Video: YouTube

MonetDB is the pioneering open-source main-memory oriented column store developed in a research setting and spinning out into the enterprise market to make a (performance) difference. MonetDB innovates at all layers of a DBMS, e.g. a storage model based on vertical fragmentation, a modern CPU-tuned query execution architecture, automatic and self-tuning indexes, run-time query optimization, and modular software architecture. In...