News & Events
[DB Seminar] Spring 2020 DB Group: Finding Logic Bugs in Database Management Systems
Database Management Systems (DBMSs) are used ubiquitously for storing and retrieving data. It is critical that they function correctly: incorrectly computed result sets (e.g., by omitting a row) can cause serious loss or damage. We refer to such defects as logic bugs. Despite their importance, finding logic bugs in production DBMSs is a longstanding challenge. Existing techniques such as fuzzing and differential testing are ineffective in finding them. We have proposed a set of novel techniques to effectively detect Read More
[DB Seminar] Spring 2020 DB Group: Introducing ClickHouse – the fastest data warehouse you’ve never heard of
The market for scalable SQL data warehouses is dominated by proprietary products. ClickHouse is one of the first open source projects to give those products a run for their money. ClickHouse scales to hundreds of nodes with ingest measured in millions of events per second. The user community includes CloudFlare, Cisco, and numerous financial services companies. This talk briefly recounts the history of ClickHouse, starting with its origins at Yandex, then dives into popular features. These include column storage with Read More
Quarantine 2020 Database Tech Talks
Pittsburgh, PA - The Carnegie Mellon Database Group is hosting a series of online database tech talks during the COVID-19 lockdown. These talks will feature leading researchers and industry developers who are building state-of-the-art systems. CMU-DB's weekly meetings (Mondays @ 4:30pm EST) are available to the public on Zoom. Members of the general public who are not affiliated with CMU are invited to attend. See the seminar info page for the schedule of upcoming talks. Recordings are available on YouTube afterwards: https://www.youtube.com/playlist?list=PLSE8ODhjZXjagqlf1NxuBQwaMkrHXi-iz Read More
[DB Seminar] Spring 2020 DB Group: APOLLO: Automatic Detection and Diagnosis of Performance Regressions in Database Systems
The practical art of constructing database management systems (DBMSs) involves a morass of trade-offs among query execution speed, query optimization speed, standards compliance, feature parity, modularity, portability, and other goals. It is no surprise that DBMSs, like all complex software systems, contain bugs that can adversely affect their performance. The performance of DBMSs is an important metric as it determines how quickly an application can take in new information and use it to make new decisions. Both developers and users Read More
[DB Seminar] Spring 2020 DB Group: Active Learning for ML Enhanced Database Systems
Abstract: Recent research has shown promising results by using machine learning (ML) techniques to improve the performance of database systems, e.g., in query optimization or index recommendation. However, in many production deployments, the ML models’ performance degrades significantly when the test data diverges from the data used to train these models. In this talk, I will present a solution to address this performance degradation by using B-instances to collect additional data during deployment. We propose an active data collection platform, Read More
[DB Seminar] Spring 2020 DB Group: Anna: a KVS for Any Scale
Modern cloud providers offer dense hardware with multiple cores and large memories, hosted in global platforms. This raises the challenge of implementing high-performance software systems that can effectively scale from a single core to multicore to the globe. Conventional wisdom says that software designed for one scale point needs to be rewritten when scaling up by 10-100x. In contrast, we explore how a system can be architected to scale across many orders of magnitude by design. We explore this challenge Read More
[DB Seminar] Spring 2020 DB Group: DuckDB – The SQLite for Analytics
The great popularity of SQLite shows that there is a need for unobtrusive in-process data management solutions. However, there is no such system yet geared towards analytical workloads. In this talk I will present DuckDB, a novel data management system designed to execute analytical SQL queries while embedded in another process. Zoom Link: https://cmu.zoom.us/j/562649242 Read More
[DB Seminar] Spring 2020 DB Group: Mostly Order Preserving Dictionaries
Dictionary encoding, or domain encoding, is an important form of compression that uses a bijective mapping to replace attributes from a large domain (e.g., strings) with values from a finite domain (e.g., 32-bit integers). This encoding both reduces data storage and allows for more efficient query execution. Traditional dictionary encoding only supports efficient equality queries, while range queries require that encoded values be decoded to evaluate the predicates. An order preserving dictionary allows for range queries without decoding by ensuring that Read More
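As a rough illustration of the idea in this abstract, the sketch below builds a fully order preserving dictionary by sorting the distinct values, so that comparing integer codes gives the same result as comparing the original strings. This is a minimal sketch under stated assumptions, not the "mostly order preserving" scheme from the talk, whose details are not given in the excerpt. The function names and the sample column are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_order_preserving_dictionary(values):
    """Assign integer codes so that code order matches string order.

    Assumption: the full set of values is known up front, so sorting the
    distinct values is enough to get an order preserving mapping.
    """
    distinct = sorted(set(values))
    encode = {value: code for code, value in enumerate(distinct)}
    return distinct, encode


def encode_column(values, encode):
    """Replace every string in the column with its integer code."""
    return [encode[value] for value in values]


def range_to_codes(distinct, low, high):
    """Translate the string range [low, high] into an inclusive code range."""
    lo = bisect_left(distinct, low)
    hi = bisect_right(distinct, high) - 1
    return lo, hi


# Hypothetical sample column.
column = ["pear", "apple", "fig", "apple", "kiwi", "banana"]
distinct, encode = build_order_preserving_dictionary(column)
codes = encode_column(column, encode)

# Range predicate 'b' <= value <= 'g', evaluated on codes only (no decoding).
lo, hi = range_to_codes(distinct, "b", "g")
matches = [row for row, code in enumerate(codes) if lo <= code <= hi]

print(codes)                          # [4, 0, 2, 0, 3, 1]
print([column[row] for row in matches])  # ['fig', 'banana']
```

The range bounds are translated into codes once, after which the scan compares integers only. A dictionary built this way has to be rebuilt or re-coded when new values arrive out of order, which is one cost that a relaxed ordering guarantee might aim to reduce.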
[DB Seminar] Spring 2020 DB Group: Round-table Discussion
The DB group will convene to have a casual round-table discussion on database topics. Zoom Link: https://cmu.zoom.us/j/562649242 Read More
[DB Seminar] Spring 2020 DB Group: OtterTune Update
In this talk Dana will provide an update on running OtterTune at SocGen. Or, how OtterTune will take over the world. Zoom Link: https://cmu.zoom.us/j/562649242 Read More