News & Events
Murat Demirbas (University at Buffalo)
Work on the theory of distributed systems abstracts away from physical-clock time and uses the notion of logical clocks for ordering events in asynchronous distributed systems. The practice of distributed systems, on the other hand, employs loosely synchronized clocks using NTP in a best-effort manner without any guarantees. Recently, we introduced a third option: hybrid clocks. Hybrid clocks combine the best of logical and physical clocks; they are immune to the disadvantages of either while providing the benefits of both. Hybrid Read More
Dana Van Aken Wins 2016 National Science Foundation Graduate Fellowship
CMU DB Ph.D. student Dana Van Aken won a National Science Foundation Graduate Fellowship. Dana's research is focused on using machine learning techniques for automatic database management system tuning and configuration. NSF's Graduate Research Fellowship Program supports outstanding student researchers pursuing graduate degrees in science, technology, engineering and mathematics who demonstrate the potential to have a significant impact in their fields. Almost 17,000 students applied for a total of 2,000 fellowships awarded nationwide. More Information about the NSF Graduate Fellowship Read More
Yi Pan (Apache Samza @ LinkedIn)
This talk will provide an overview of LinkedIn's distributed stream processing platform, including Samza/Kafka/Databus. It will first cover the high-level scenarios for stream processing at LinkedIn, followed by detailed requirements around scalability, re-processing, accuracy of results, and easy programmability; then we will focus on the requirements of stateful stream processing applications and explain how Samza's state management allows us to build applications that meet all the above-mentioned requirements. The key concepts, architecture and usage in LinkedIn's stream processing pipeline Read More
[DB Seminar] Spring 2016: Lin Ma
In-memory database management systems (DBMSs) outperform disk-oriented systems for on-line transaction processing (OLTP) workloads. But this improved performance is only achievable when the database is smaller than the amount of physical memory available in the system. To overcome this limitation, some in-memory DBMSs can move cold data out of volatile DRAM to secondary storage. Such data appears as if it resides in memory with the rest of the database even though it does not. Although there have been several implementations Read More
[DB Seminar] Spring 2016: Huanchen Zhang
Using indexes for query execution is crucial for achieving high performance in modern on-line transaction processing databases. For a main-memory database, however, these indexes consume a large fraction of the total memory available and are thus a major source of storage overhead of in-memory databases. To reduce this overhead, we propose using a two-stage index: The first stage ingests all incoming entries and is kept small for fast read and write operations. The index periodically migrates entries from the first Read More
[DB Seminar] Spring 2016: Miguel Araujo
Miguel will give a practice talk on his thesis proposal. Abstract: The identification of anomalies and communities of nodes in real-world graphs has applications in widespread domains, from the automatic categorization of Wikipedia articles or websites to bank fraud detection. While recent and ongoing research is supplying tools for the analysis of simple unlabeled data, it is still a challenge to find patterns and anomalies in large labeled datasets, such as time-evolving networks. What do real communities identified in big datasets look like? How is their structure affected by their Read More
[DB Seminar] Spring 2016: Dana Van Aken
Database management system (DBMS) configuration tuning is an essential aspect of any data-intensive application effort. But this is historically a difficult task because DBMSs have hundreds of configuration "knobs" that control everything in the system, such as the amount of memory to use for caches and how often data is written to storage. The problem with these knobs is that they are not standardized (i.e., two DBMSs use a different name for the same knob), not independent (i.e., changing one Read More
Monte Zweben (Splice Machine)
This talk describes the Splice Machine RDBMS, designed to power today’s new class of modern applications that require high scalability and high availability while simultaneously executing OLTP and OLAP workloads. Splice Machine is a full ANSI SQL database that is ACID compliant and supports secondary indexes, constraints, triggers, and stored procedures. It uses a unique, distributed snapshot isolation algorithm that preserves transactional integrity and avoids the latency of 2PC methods. The talk will also present a variety of distributed join algorithms implemented Read More
Vagelis invited to Dagstuhl
Vagelis has been invited to participate in the Dagstuhl Perspectives Workshop "Tensor Computing for Internet of Things" (16152), to be held at Schloss Dagstuhl in Germany from Sunday, April 10 to Wednesday, April 13, 2016. Schloss Dagstuhl, a nonprofit institution with associates from eleven universities and research organizations, is one of the world's leading research centers in informatics. It has been hosting invitation-only Dagstuhl Seminars and Dagstuhl Perspectives Workshops since the center's founding in 1990. More information about Schloss Dagstuhl and our Read More
[DB Seminar] Spring 2016: Vladimir I. Zadorozhny
Information fusion deals with reconstructing objects from multiple, possibly incomplete and inconsistent observations. The task of scalable information fusion is critical for interdisciplinary research where a comprehensive picture of the subject requires large amounts of data from disparate data sources. Despite its increasing availability, making sense of such data is not trivial. In this talk I will elaborate on challenges in developing an infrastructure that facilitates scalable information integration and fusion. I will introduce an efficient framework that enables systematic Read More