News & Events
[PDL Visit Day 2018] Weiwei Gong (Oracle)
Oracle Database In-Memory dual format was first introduced in 12c in 2013. It optimizes both analytics and mixed-workload OLTP, delivering outstanding performance for transactions while simultaneously supporting real-time analytics, business intelligence, and reports. In this talk, I will go over different features in Oracle Database In-Memory and describe how we accelerate joins and aggregations on In-Memory Database. Read More
[PDL Visit Day 2018] Zahra Khatami (Oracle)
SPDK has been successful in enabling a large class of high-performance user-mode storage applications and appliances. SPDK provides direct access to local NVMe SSDs as well as access to remote storage targets using NVMeoF. SPDK provides a highly concurrent and asynchronous runtime with no locking in the I/O path. High throughput and low latency are realized by directly polling the hardware queues for completions. The DPDK toolkit is used for memory management and lock-free message passing between compute Read More
[DB Seminar] Spring 2018: Lin Ma
The first step towards an autonomous database management system (DBMS) is the ability to model the target application’s workload. This is necessary to allow the system to anticipate future workload needs and select the proper optimizations in a timely manner. Previous forecasting techniques model the resource utilization of the queries. Such metrics, however, change whenever the physical design of the database and the hardware resources change, thereby rendering previous forecasting models useless. We present a robust forecasting framework called QueryBot Read More
[DB Seminar] Spring 2018: Capstone Presentations
Siva Sudhir, Pooja Nilangekar, Bohan Zhang, and Aaron Tian will present their capstone projects.
Bohan: OtterTune is really coming: how to use OtterTune to tune your DBMS automatically
Aaron: Fast Durability and Recovery in In-memory Databases
Siva: Compilation of User-Defined Functions in Peloton Read More
Prof. Andy Pavlo wins 2018 Joel & Ruth Spira Teaching Award
Pittsburgh, Pennsylvania — A great philosopher was once asked what their ultimate goal was. Without a notice of hesitation, the philosopher replied that right now he did not feel that he was getting what he wanted. But at some later point when he received a small amount of accolades, then that was when the world would know that it is on. Indeed, this is because (without trying to brag too much) he knew that he had something that the world Read More
Jiaqi Yan (Snowflake Computing)
For partitioned tables, maintaining good clustering properties for frequently accessed dimensions is critical for partition pruning performance. Naive methods of clustering maintenance could be expensive, especially when the clustering dimensions are different from the dimensions with which the data is loaded. On the other hand, approximate clustering is cheaper to maintain while still resulting in good pruning performance. In this talk, I will present Snowflake's clustering capabilities, including our algorithm for incremental maintenance of approximate clustering of partitioned tables, as Read More
Prof. Andy Pavlo wins 2017 Google Faculty Research Award
Mountain View, California — Science is a dirty game. People get hurt. Marriages get broken. Mix tapes get dropped. But in the end one is able to move humanity forward and make a difference. This is why we do database. This is why we get up in the morning for the research grind. Maybe there is a better life out there, but frankly we do not want to hear it. Given this, the CMU Database Group is pleased to announce Read More
Sanjay Krishnan (Berkeley)
A statistical model is only as good as its training data. Systematic errors can arise when data are integrated from untrustworthy sources, collected in mixed formats, or contain inconsistent references to the same real-world entities. This talk describes the classical relational database topic of "data cleaning", i.e., the process of transforming the data to remove such issues, from a modern statistical perspective. My talk emphasizes two central themes: (1) analyzing data cleaning algorithms using statistical theory regarding sample-complexity and generalization Read More
[DB Seminar] Spring 2018: Ajit Mylavarapu (Oracle)
Analytic workloads in data management systems are dominated by join, aggregation, scan, and filtering costs. In-Memory columnar databases have significantly optimized scans using compressed data formats and SIMD vectorization techniques, but have had little impact on the rest of the query execution plan. The Oracle Database In-Memory (DBIM) Option introduced new SQL execution operators that accelerate a wide range of analytic queries by optimizing aggregation over joins for star and similar schemas. Group-by expressions are pushed down into the scans Read More