News & Events
CMU DB Wins ICDM 2017 ‘Best Paper’ Award
The CMU Database Group and the University of Porto (Portugal) won the 2017 IEEE International Conference on Data Mining Best Paper award for their paper "TensorCast: Forecasting with Context using Coupled Tensors" by Miguel Ramos de Araujo, Pedro Manuel Pinto Ribeiro, and Christos Faloutsos. This research provides a fast and effective algorithm to forecast the evolution of a tensor (like who-publishes-where-and-when), given additional side information (like who has co-authored with whom). Read More
[DB Seminar] Fall 2017: CMU-DB Research Projects Overview
Andy will regale the team with a discussion of the various research projects that are currently ongoing this semester. He will then muse about various papers that he wants to write within the next year followed by a group discussion. Read More
[Time Series Database Lectures] Fintan Quill (Kx Systems)
Trying to solve the data riddle purely through the lens of architecture is missing a vital point: The unifying factor across all data is a dependency on time. The ability to capture and factor in time is the key to unlocking real cost efficiencies. Whether it’s streaming sensor data, financial market data, chat logs, emails, SMS or the P&L, each piece of data exists and changes in real time, earlier today or further in the past. Unless they are linked Read More
[Time Series Database Lectures] Saurabh Goel (Two Sigma)
Smooth is a distributed storage system for managing structured time series data at Two Sigma. Smooth's design emphasizes scale (in both size and aggregate request bandwidth), reliability, and storage efficiency. It is optimized for large parallel streaming read/write accesses over provided time ranges. Smooth has a clear separation between the metadata and data layers, and supports multiple pluggable object stores for storing data files. Data can be replicated or moved between different stores and data centers to support Read More
[Time Series Database Lectures] Edouard Alligand (QuasarDB)
QuasarDB is a scalable time series database designed to handle the extreme use cases found, for example, in market finance. In this talk we will look at several design and implementation decisions made to deliver the performance QuasarDB offers today, especially regarding network communications, memory management, and real-time aggregation. Part of the Time Series Database Lectures 2017 Seminar Series. Read More
[Time Series Database Lectures] Karthik Ramasamy (Streamlio)
Several enterprises have been producing data not only at high volume but also at high velocity. Many daily business operations depend on real-time insights; therefore, real-time processing of the data is gaining significance. Hence there is a need for a scalable infrastructure that can continuously process billions of events per day the instant the data is acquired. To achieve real-time performance at scale, Twitter developed and deployed Heron, a next-generation cloud streaming engine that provides unparalleled performance at large scale. Read More
[DB Seminar] Fall 2017: Nick Katsipoulakis
Stream processing has become the dominant processing model for monitoring and real-time analytics. Modern Parallel Stream Processing Engines (pSPEs) have made it feasible to increase the performance in both monitoring and analytical queries by parallelizing a query’s execution and distributing the load on multiple workers. A determining factor for the performance of a pSPE is the partitioning algorithm used to disseminate tuples to workers. Until now, partitioning methods in pSPEs have been similar to the ones used in parallel databases Read More
Time Series Database Lectures – Seminar Series (Fall 2017)
The CMU Database Group is holding a semester-long seminar series with the leading developers of time series database management systems. The Time Series Database Lectures series is designed to showcase some of the newer technologies available for data-intensive applications. Each speaker will present the implementation details of their respective system and examples of the technical challenges that they faced when working with real-world customers. The confirmed speakers are: Sep 14 - Paul Dix (InfluxDB); Sep 21 - Karthik Ramasamy (Heron) Read More
[Time Series Database Lectures] Paul Dix (InfluxDB)
InfluxDB is an open source time series database written in Go. This talk will introduce how InfluxDB structures time series data and what makes time series different from other use cases like OLTP. We'll then go into the internals of the storage engine we wrote from scratch, the Time Structured Merge Tree, which is heavily inspired by LSM trees. In addition to the raw time series storage, InfluxDB has an inverted index to quickly look up time series metadata. Previously, we kept this index Read More
[DB Seminar] Fall 2017: Angela Jiang
Mainstream adaptively merges the video stream processing of concurrent applications sharing fixed edge resources to maximize aggregate result quality. Mainstream’s approach enables partial-DNN compute sharing among applications using DNNs (deep neural networks) that are fine-tuned from the same base model, decreasing aggregate per-frame compute time. Moreover, since the choice depends on the mix of applications running on an edge node, Mainstream automatically determines at deployment time the right trade-off between specializing more of a DNN, which improves per-frame accuracy, and Read More