News & Events
[DB Seminar] Spring 2018: Huanchen Zhang
We present the Succinct Range Filter (SuRF), a fast and compact data structure for approximate membership tests. Unlike traditional Bloom filters, SuRF supports both single-key lookups and common range queries: open-range queries, closed-range queries, and range counts. SuRF is based on a new data structure called the Fast Succinct Trie (FST) that matches the point and range query performance of state-of-the-art order-preserving indexes, while consuming only 10 bits per trie node. The false positive rates in SuRF for both point and range queries Read More
Prof. Andy Pavlo wins Sloan Research Fellowship
New York City, New York — No one can escape the vicissitudes of life. We all know that going into this. We are driven to do one thing and one thing only: Databases. But this means that we will never ask, but instead just give. We are afforded the opportunity to wake up every morning and say to ourselves "today is the greatest day because I get to work on databases." Given this, the CMU Database Group is pleased to Read More
[DB Seminar] Spring 2018: Yangjun Sheng
Current architectures for main-memory online transaction processing (OLTP) database management systems (DBMS) typically use random scheduling to assign transactions to threads. This approach achieves uniform load across threads, but it ignores the likelihood of conflicts between transactions. If the DBMS could estimate the potential for transaction conflict and then intelligently schedule transactions to avoid conflicts, then the system could improve its performance. Such estimation of transaction conflict, however, is non-trivial for several reasons. First, conflicts occur under complex conditions that Read More
[DB Seminar] Spring 2018: Ziqiang Feng
High-resolution, continuously-recording, and nearly-ubiquitous cameras provide great value for retrospective video analysis tasks in areas such as crime investigations and scientific research. This kind of video analysis task is often both interactive and exploratory in nature, where multiple queries are tried, aborted, refined, and re-executed in an iterative fashion. This exploratory video analysis usage model presents some unique challenges and opportunities for system optimizations. We present our ongoing work in building EVA, an efficient system for exploratory video analysis. We Read More
[DB Seminar] Spring 2018: Ziqi Wang
Lock-free data structures have long been rumored to provide better performance and scalability than their lock-based counterparts. On the other hand, there are state-of-the-art in-memory indices using various fine-grained synchronization techniques that outperform the classical lock coupling implementation. In this talk, we investigate a concrete, optimized implementation of a lock-free B+Tree, the Bw-Tree[1], and then conduct an apples-to-apples comparison between the Bw-Tree and other state-of-the-art in-memory indices such as the Skiplist[2], MassTree[3], BTree (OLC[4]) and ART[5] (OLC). Read More
[DB Seminar] Spring 2018: Aaron Harlap
PipeDream is a new distributed training system for deep neural networks (DNNs) that partitions ranges of DNN layers among machines, and aggressively pipelines computation and communication. Today’s pervasive use of data-parallel training performs well for DNNs of up to 10–20 million model parameters, but inter-machine communication dominates for models that are even 10x larger (e.g., up to 85% of time training the VGG16 model is spent on communication) – it seems likely that models will only get larger in the Read More
[DB Seminar] Spring 2018: Stephen Walkauskas (Vertica)
In the beginning there was a DBMS, a flexible piece of software that could be used for OLTP and OLAP workloads. When transaction throughput increased and data sizes grew, the database needed to be split into two, each instance optimized for a particular workload. And so it has been ever since, and the distance between the two systems has increased, and now specialized database software is offered. What's more, data hoarders have given rise to a need for a third Read More
[DB Seminar] Spring 2018: Joy Arulraj
We are at an exciting point in the evolution of memory technology. Device manufacturers have created a new non-volatile memory (NVM) technology that can serve as both system memory and storage. NVM supports fast reads and writes similar to volatile memory, but all writes to it are persistent like a solid-state disk. The advent of NVM invalidates decades of design decisions that are deeply embedded in today's database management systems (DBMSs). These systems are unable to take full advantage of Read More
[DB Seminar] Spring 2018: Alok Pareek (Striim)
In this seminar, Alok will present Striim, a distributed streaming platform, and talk about the platform's motivation, distributed architecture, use cases, and open challenges. Read More