Archived Events

Mar 15 2021
[Vaccination 2021] HarperDB’s Data Storage Journey: From File System to LMDB (Kyle Bernhardy)
Speaker:
Kyle Bernhardy
System:
HarperDB
Video:
YouTube

HarperDB is a distributed database with hybrid SQL and NoSQL functionality in one, accessed via a REST API. It is known as a structured object store with SQL capabilities, or NewSQL. HarperDB leverages a logical structure enabling ACID-compliant, efficient storage and retrieval without inconsistency, race conditions, or the use of in-memory indexing. HarperDB is fully indexed and runs on any device from edge...

Mar 8 2021
[Vaccination 2021] Novel Design Choices in Apache CouchDB (Adam Kocoloski)
Speaker:
Adam Kocoloski
System:
CouchDB
Video:
YouTube

Apache CouchDB is a JSON document store with a native HTTP API, server-side JavaScript indexing, and active/active data replication across flexible configurations of server instances that are free to come and go as they please. Under the hood the DBMS is implemented largely in Erlang and features copy-on-write B-trees, hash histories for automatic revision tracking of individual records, and a...

Mar 1 2021
[Vaccination 2021] Inside Apache Druid’s Storage and Query Engine (Gian Merlino)
Speaker:
Gian Merlino
System:
Druid
Video:
YouTube

Apache Druid is an open-source columnar database known for high performance at scale; its largest deployments comprise thousands of servers. But no matter the scale, high performance starts with good fundamentals. This talk will dive into those fundamentals by exploring the inner workings of a single data server. We'll cover how Apache Druid stores data, what kinds of compression it...

Feb 22 2021
[Vaccination 2021] Citus: Distributed PostgreSQL as an Extension (Marco Slot)
Speaker:
Marco Slot
System:
Citus
Video:
YouTube

One of the defining characteristics of PostgreSQL is its extensibility, which enables developers to add new database functionality without forking from the original project. Citus is an open source PostgreSQL extension that transforms PostgreSQL into a distributed database. The goal of Citus is to make the versatile set of data processing capabilities in PostgreSQL available at any scale. Citus can...

Feb 15 2021
[Vaccination 2021] Star-Tree Index: Space-Time Trade Off in OLAP (Kishore Gopalakrishna)
Speaker:
Kishore Gopalakrishna
System:
Pinot
Video:
YouTube

The need for real-time analytics has proliferated in the modern data landscape. The industry is moving towards providing analytics to end-users via interactive apps instead of traditional dashboards. Whether it's user-facing analytical applications such as LinkedIn's "Who Viewed My Profile" or an internal monitoring tool used by Uber's city ops team to regulate trips in a region, it is imperative...

Feb 8 2021
[Vaccination 2021] Performance Testing at MongoDB (David Daly)
Speaker:
David Daly
System:
MongoDB
Video:
YouTube

It is important for developers to understand the performance of a software project as they develop new features, fix bugs, and try to generally improve the product. While it is simple to state that requirement, it can be hard to do in practice. There are a lot of choices an organization faces when trying to understand the performance of the...

Feb 3 2021
MS Thesis Defense: An Evaluation of Compilation-Based PL/PGSQL Execution (Tanuj Nayak)

User-Defined Functions (UDFs) are an important analytical feature in modern Database Management Systems (DBMSs) due to their server-side execution properties. These properties allow complex analytical queries to execute without serializing intermediate data over a network. However, query engines often incur significant overheads when executing UDFs because UDFs, unlike SQL queries, are non-declarative. This contrast causes a...

Feb 1 2021
[Vaccination 2021] SLOG: Serializable, Low-latency, Geo-replicated Transactions (Daniel Abadi)
Speaker:
Daniel Abadi
System:
SLOG
Video:
YouTube

For decades, applications deployed on a world-wide scale have been forced to give up at least one of (1) strict serializability, (2) low-latency writes, or (3) high transactional throughput. This talk will give an overview of SLOG: a system that avoids this tradeoff for workloads that contain physical region locality in data access. SLOG achieves high-throughput, strictly serializable ACID transactions at geo-replicated distance...

Jan 25 2021
NoisePage: The Self-Driving Database Management System (Lin Ma)
Speaker:
Lin Ma

Database management systems (DBMSs) are an important part of modern data-driven applications. However, they are notoriously difficult to deploy and administer. There are existing methods that recommend physical design or knob configurations for DBMSs. But most of them require humans to make final decisions and decide when to apply changes. The goal of a self-driving DBMS is to remove the...

Dec 16 2020
On Automatic Database Management System Tuning Using Machine Learning (Dana Van Aken)
Speaker:
Dana Van Aken

Database management systems (DBMSs) are an essential component of any data-intensive application. But tuning a DBMS to perform well is a notoriously difficult task because such systems have hundreds of configuration knobs that control aspects of their runtime behavior, such as cache sizes and how frequently to flush data to disk. Getting the right configuration for these knobs is hard because...