News & Events
[DB Seminar] Spring 2017: Xiangyao Yu
Strong consistency in parallel systems provides high programmability, but requires expensive coordination and scales poorly. This challenge exists in multiple layers of abstraction across the whole hardware and software stack. Examples include multicore processors, parallel transaction processing, and distributed systems. In this talk, I will introduce a simple primitive called logical leases to achieve strong consistency while maintaining good scalability and performance. Logical leases allow a system to avoid conflicts by reordering operations in both physical and logical time. I ...
[DB Seminar] Spring 2017: Hyeontaek Lim
Multi-core in-memory databases promise high-speed online transaction processing. However, the performance of individual designs suffers when the workload characteristics miss their small sweet spot of a desired contention level, read-write ratio, record size, processing rate, and so forth. Cicada is a single-node multi-core in-memory transactional database with serializability. To provide high performance under diverse workloads, Cicada reduces overhead and contention at several levels of the system by leveraging optimistic and multi-version concurrency control schemes and multiple loosely synchronized clocks while ...
[DB Seminar] Spring 2017: Dana Van Aken
Database management system (DBMS) configuration tuning is an essential aspect of any data-intensive application effort. But this is historically a difficult task because DBMSs have hundreds of configuration "knobs" that control everything in the system, such as the amount of memory to use for caches and how often data is written to storage. The problem with these knobs is that they are not standardized (i.e., two DBMSs use different names for the same knob), not independent (i.e., changing one ...
[DB Seminar] Spring 2017: Wei (David) Dai
Machine Learning (ML) systems depend on data engineering – the practice of transforming a small set of raw measurements into a large number of features – to substantially increase the accuracy of their results. However, as ML problems grow in both data size (number of instances) and model size (number of dimensions), existing systems that support data engineering have not been able to keep pace, and either fail to run or do so very slowly. Sometimes, one-off code can be ...
[DB Seminar] Spring 2017: Marcel Kornacker
Running real-time data-intensive applications on Apache Hadoop requires complex architectures to store and query data, typically involving multiple independent systems that are tied together through custom-engineered pipelines. A common pattern is to use a NoSQL engine like Apache HBase for caching and later transformations, the results of which are periodically written to HDFS in one of the popular open columnar file formats as a prerequisite for querying by a SQL engine. Apache Kudu (incubating), a new scalable distributed storage engine ...
[DB Seminar] Spring 2017: Prashanth Menon
In-memory database management systems (DBMSs) are a key component of modern on-line analytic processing (OLAP) applications, since they provide low-latency access to large volumes of data. Because disk accesses are no longer the principal bottleneck in such systems, the focus in designing query execution engines has shifted to optimizing CPU performance. Recent systems have revived an older technique of using just-in-time (JIT) compilation to execute queries as native code instead of interpreting a plan. The state of the art in query compilation is ...
[DB Seminar] Spring 2017: Huanchen Zhang
Succinct data structures are those that require, asymptotically, only the minimum number of bits required by information theory, while still answering queries efficiently. Despite the importance of space efficiency, particularly for today’s massive-scale data services, succinct data structures remain primarily of theoretical interest outside of a few application areas. Our goal in this paper is to make succinct tries practical for general database and file system use. We propose LOUDS-DS, a new succinct trie encoding method that can support fast ...
[DB Seminar] Spring 2017: Round table discussion
We will have a round table discussion.
[DB Seminar] Spring 2017: Mohammad Hammoud
Relational join is a fundamental data management operation, which highly influences the performance of almost every database query. In this talk, I will show that different workload characteristics and hardware configurations necessitate different main-memory hash join models. Subsequently, I will identify four effective models by which any hash-based join algorithm can be executed. I will characterize the relative merits of each model and present PolyHJ, a novel polymorphic join scheme, which dynamically selects the best model for any given workload ...
[DB Seminar] Spring 2017: Joy Arulraj
Joy will give a talk on his work. The difference in the performance characteristics of volatile (DRAM) and non-volatile storage devices (HDDs/SSDs) influences the design of database management systems. The key assumption has always been that the latter is much slower than the former. This affects all aspects of a DBMS's runtime architecture. But the arrival of new non-volatile memory (NVM) storage that is almost as fast as DRAM with fine-grained reads/writes invalidates these previous design choices. In this talk, I ...