News & Events
Alex Robinson (CockroachDB)
Learn more about CockroachDB from Google alum and Member of Technical Staff, Alex Robinson! The talk will focus on how CockroachDB ensures data integrity, no matter how broadly distributed. Read More
[DB Seminar] Fall 2016: Prakhar Ojha
In this talk, I shall discuss two interesting problems pertinent to quality control and budget optimization in complex crowdsourcing. Crowdsourcing has evolved from solving simpler tasks, like image classification, to more complex tasks such as document editing, language translation, and product design. Unlike micro-tasks performed by a single worker, these complex tasks require a group of workers and greater resources. If the task requester is interested in making individual payments to workers based on their respective efforts in the group, she will need a strategy to Read More
[DB Seminar] Fall 2016: Ashraf Aboulnaga (QCRI)
Distributed data processing platforms such as Pregel and GraphLab have substantially simplified the design and deployment of certain classes of distributed graph analytics algorithms. However, these platforms do not represent a good match for distributed graph mining problems, for example, finding frequent subgraphs in a graph. Given an input graph, these problems require exploring a very large number of subgraphs and finding patterns that match some "interestingness" criteria desired by the user. These algorithms are very important for areas such Read More
[DB Seminar] Fall 2016: Matteo Riondato (Two Sigma)
TRIÈST is a suite of one-pass streaming algorithms to compute unbiased, low-variance, high-quality approximations of the global and local (i.e., incident to each vertex) number of triangles in a fully-dynamic graph represented as an adversarial stream of edge insertions and deletions. The algorithms use reservoir sampling and its variants to exploit the user-specified memory space at all times. This is in contrast with previous approaches, which require hard-to-choose parameters (e.g., a fixed sampling probability) and offer no guarantees on Read More
[DB Seminar] Fall 2016: Wolfgang Gatterbauer (CMU)
Performing inference over large uncertain data sets is becoming a central data management problem. Recent large knowledge bases, such as Yago, Nell or DeepDive, have millions to billions of uncertain tuples. Because general reasoning under uncertainty is highly intractable, many state-of-the-art systems today perform approximate inference by reverting to sampling. This talk shows an alternative approach that allows approximate ranking answers to hard probabilistic queries in guaranteed polynomial time, and by using only basic operators of existing database management systems Read More
[DB Seminar] Fall 2016: Yingjun Wu
Multi-version concurrency control (MVCC) is currently the most popular scheme used in modern database management systems (DBMSs). Although the protocol was discovered in the late 1970s, it is used in almost every major relational DBMS released in the last decade. Maintaining multiple versions of data potentially increases parallelism without sacrificing serializability. But scaling MVCC schemes in a multi-core, in-memory DBMS is non-trivial: when there are a large number of threads running in parallel, the synchronization overhead can outweigh the benefits of Read More
[DB Seminar] Fall 2016: Prof. Shenghua Liu
With mobile and web-based techniques creating highly interactive platforms, social media has become prevalent in our daily lives. It hosts interactions in which people create, share, discuss, and exchange ideas in virtual communities and networks. In this talk, he will introduce a series of his previous research related to social media. The topics range from understanding short text and opinions to users' influence and network structures. One principal philosophy underlies all of these topics: learning features from Read More
[DB Seminar] Fall 2016: Emaad Manzoor
Given a stream of heterogeneous graphs containing different types of nodes and edges, how can we spot anomalous ones in real time while consuming bounded memory? This problem is motivated by and generalizes from its application in security to host-level advanced persistent threat (APT) detection. We propose StreamSpot, a clustering-based anomaly detection approach that addresses challenges on two key fronts: (1) heterogeneity, and (2) streaming nature. We introduce a new similarity function for heterogeneous graphs that compares two graphs based Read More
CMU Wins SIGKDD 2016 ‘Best Research Paper’ Award
The Carnegie Mellon Database Group team won the Best Research Paper award at SIGKDD 2016 for their paper FRAUDAR: Bounding Graph Fraud in the Face of Camouflage. The authors were Bryan Hooi, Hyun Ah Song, Alex Beutel, Neil Shah, Kijung Shin, and Prof. Christos Faloutsos. KDD is the flagship data mining conference. The award ceremony was on Sunday, Aug. 14, in San Francisco. The paper was selected from among 70 accepted papers, out of over 700 submissions. The paper gives Read More
Peloton Project – Info Meeting (Fall 2016)
The CMU Database Group is holding an orientation meeting for students who are interested in getting involved in the research and development of CMU's new flagship database management system (DBMS). Peloton is a high-performance, in-memory relational DBMS for hybrid workloads. The key aspect of Peloton that makes it different from other systems is that it is designed to be completely autonomous and self-driving. It uses reinforcement learning to automatically configure, tune, and optimize the system without requiring human intervention. At this Read More