Archived Events

May 11 2020
04:30pm EST
[DB Seminar] Spring 2020 DB Group: Introducing ClickHouse–the fastest data warehouse you’ve never heard of

The market for scalable SQL data warehouses is dominated by proprietary products. ClickHouse is one of the first open source projects to give those products a run for their money. ClickHouse scales to hundreds of nodes with ingest measured in millions of events per second. The user community includes CloudFlare, Cisco, and numerous financial services companies. This talk briefly recounts the history of ClickHouse, starting with its origins at Yandex, then dives into popular features. These include column storage with...

May 4 2020
04:30pm EST
[DB Seminar] Spring 2020 DB Group: Active Learning for ML Enhanced Database Systems

Abstract: Recent research has shown promising results by using machine learning (ML) techniques to improve the performance of database systems, e.g., in query optimization or index recommendation. However, in many production deployments, the ML models’ performance degrades significantly when the test data diverges from the data used to train these models. In this talk, I will present a solution to address this performance degradation by using B-instances to collect additional data during deployment. We propose an active data collection platform,...

Apr 27 2020
04:30pm EST
[DB Seminar] Spring 2020 DB Group: Anna: a KVS for Any Scale

Modern cloud providers offer dense hardware with multiple cores and large memories, hosted in global platforms. This raises the challenge of implementing high-performance software systems that can effectively scale from a single core to multicore to the globe. Conventional wisdom says that software designed for one scale point needs to be rewritten when scaling up by 10-100x. In contrast, we explore how a system can be architected to scale across many orders of magnitude by design. We explore this challenge...

Apr 20 2020
04:30pm EST
[DB Seminar] Spring 2020 DB Group: DuckDB – The SQLite for Analytics

The great popularity of SQLite shows that there is a need for unobtrusive in-process data management solutions. However, there is no such system yet geared towards analytical workloads. In this talk I will present DuckDB, a novel data management system designed to execute analytical SQL queries while embedded in another process. Zoom Link: https://cmu.zoom.us/j/562649242

Apr 13 2020
04:30pm EST
[DB Seminar] Spring 2020 DB Group: Mostly Order Preserving Dictionaries

Dictionary encoding, or domain encoding, is an important form of compression that uses a bijective mapping to replace attributes from a large domain (e.g., strings) with a finite domain (e.g., 32-bit integers). This encoding both reduces data storage and allows for more efficient query execution. Traditional dictionary encoding only supports efficient equality queries, while range queries require that encoded values are decoded for evaluating the predicates. An order preserving dictionary allows for range queries without decoding by ensuring that...
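
The idea in this abstract can be illustrated with a minimal Python sketch. It shows a fully order-preserving dictionary (codes assigned in sorted order), not the "mostly" order-preserving variant the talk covers, whose details are not given here. The function names and the sample column are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right


def build_order_preserving_dictionary(values):
    """Assign each distinct string an integer code that follows sort order."""
    ordered = sorted(set(values))
    codes = {value: code for code, value in enumerate(ordered)}
    return ordered, codes


def range_query(encoded_column, ordered, low, high):
    """Return row indexes whose value lies in [low, high].

    Only the two bounds are translated to codes; the column itself is
    compared as integers and never decoded.
    """
    low_code = bisect_left(ordered, low)
    high_code = bisect_right(ordered, high) - 1
    return [
        row
        for row, code in enumerate(encoded_column)
        if low_code <= code <= high_code
    ]


# Hypothetical sample data.
column = ["pear", "apple", "fig", "apple", "kiwi"]
ordered, codes = build_order_preserving_dictionary(column)
encoded = [codes[value] for value in column]

# Rows whose string falls between "b" and "l" ("fig" and "kiwi").
matches = range_query(encoded, ordered, "b", "l")
```

Because the codes follow the sort order of the strings, comparing two codes gives the same answer as comparing the strings they stand for, which is what lets the range predicate run on the encoded column.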

Apr 6 2020
04:30pm EST
[DB Seminar] Spring 2020 DB Group: Round-table Discussion

The DB group will convene to have a casual round-table discussion on database topics. Zoom Link: https://cmu.zoom.us/j/562649242

Mar 30 2020
04:30pm EST
[DB Seminar] Spring 2020 DB Group: OtterTune Update

In this talk Dana will provide an update on running OtterTune at SocGen. Or, how OtterTune will take over the world. Zoom Link: https://cmu.zoom.us/j/562649242

Mar 23 2020
04:30pm EST
DB Seminar [Spring 2020] : Zero-Overhead Deterministic C++ Exceptions (“Herb Exceptions”)
Rohan Aggarwal

In this talk, Rohan Aggarwal will present a new proposal to the C++ standard for zero-overhead exceptions. A fundamental reason why C++ is successful and loved is its adherence to Stroustrup’s zero-overhead principle: You don’t pay for what you don’t use, and if you do use a feature you can’t reasonably code it better by hand. In the C++ language itself, there are only two features that violate the zero-overhead principle, exception handling and RTTI – and, unsurprisingly, these are...

Feb 24 2020
04:30pm EST
GHC9115
DB Seminar [Spring 2020] : Compiling PL/SQL Away
Tanuj Nayak

In this talk, Tanuj Nayak will present Compiling PL/SQL Away from CIDR 2020. This paper details a method of overcoming current overheads of PL/SQL interpretation by compiling it to SQL Common Table Expressions (CTE) using the WITH RECURSIVE construct.
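
As a rough illustration of the general idea, the sketch below expresses a simple procedural loop as a single WITH RECURSIVE query. It is hand-written, not the paper's compilation scheme. It uses Python's built-in sqlite3 module as a stand-in engine, and the factorial loop, the CTE name `state`, and the function name are hypothetical choices made for the example.

```python
import sqlite3

# Procedural loop being mimicked (hypothetical example, PL/SQL style):
#   acc := 1;
#   FOR i IN 1..n LOOP acc := acc * i; END LOOP;
#   RETURN acc;
#
# Each row of the recursive CTE holds the loop state (i, acc) at the start
# of one iteration; the recursive branch computes the next state.
QUERY = """
WITH RECURSIVE state(i, acc) AS (
    SELECT 1, 1
    UNION ALL
    SELECT i + 1, acc * i FROM state WHERE i <= :n
)
SELECT acc FROM state ORDER BY i DESC LIMIT 1
"""


def factorial_via_recursive_cte(n):
    """Run the loop entirely inside the SQL engine and return its result."""
    conn = sqlite3.connect(":memory:")
    try:
        row = conn.execute(QUERY, {"n": n}).fetchone()
    finally:
        conn.close()
    return row[0]


result = factorial_via_recursive_cte(5)
```

The last row produced by the recursion carries the final value of the accumulator, so selecting the row with the largest `i` plays the role of the RETURN statement.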

Feb 17 2020
04:30pm EST
GHC9115
DB Seminar [Spring 2020] : sled and rio – modern database engineering with io_uring
Tyler Neely

sled is an embedded database that takes advantage of modern lock-free indexing and flash-friendly storage. rio is a pure-Rust io_uring library unlocking the Linux kernel's new asynchronous IO interface. This short talk will cover techniques that have been used to take advantage of modern hardware and kernels while optimizing for long-term developer happiness in a complex, correctness-critical Rust codebase. https://fosdem.org/2020/schedule/event/rust_techniques_sled/