News & Events
[DB Seminar] Spring 2016: Srijan Kumar
The web enables the transmission of knowledge at a speed and breadth unprecedented in human history, which has had a tremendous positive impact on the lives of billions of people. While benign users try to keep the web safe and usable, malicious users add and spread harmful content, manipulate information, and twist things in their favor. The presence of malicious users and their content calls into question the usefulness, credibility, and safety of web platforms. In this talk, we will discuss general graph mining and user...
Alumnus wins Lagrange Prize
Our alumnus Jure Leskovec (PhD SCS/MLD, now at Stanford) won the prestigious Lagrange Prize. Established in 2008 by the CRT Foundation and coordinated by the ISI Foundation, the Lagrange Prize is part of the Lagrange Project, one of the most innovative European initiatives dedicated entirely to the study of complex systems and data science, which encourages a culture of innovation through PhD scholarships and research grants. The Prize is the symbolic event of the Lagrange Project: a prestigious international acknowledgement...
[DB Seminar] Spring 2016: Yingjun Wu
Today's main-memory databases can support very high transaction rates for OLTP applications. However, when a large number of concurrent transactions contend on the same data records, system performance can deteriorate significantly. This is especially true when scaling transaction processing with optimistic concurrency control (OCC) on multicore machines. In this paper, we propose a new concurrency-control mechanism, called transaction healing, that exploits program semantics to scale conventional OCC to dozens of cores even under highly contended workloads. Transaction...
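For readers new to OCC, a minimal single-threaded sketch of *conventional* optimistic concurrency control may show why contention hurts it. This is not transaction healing, which the excerpt does not describe. Each transaction buffers its writes, remembers the version of every record it read, and validates those versions at commit; when two transactions race on the same record, the later committer aborts. The names `Store`, `Record`, `Transaction`, and `commit` are hypothetical, and a real engine would also need latching so that validation and the write phase run atomically.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    value: int
    version: int = 0


class Store:
    """Toy in-memory key-value store with per-record version counters (hypothetical)."""

    def __init__(self) -> None:
        self.records: dict[str, Record] = {}

    def put(self, key: str, value: int) -> None:
        self.records[key] = Record(value)


@dataclass
class Transaction:
    store: Store
    read_set: dict[str, int] = field(default_factory=dict)   # key -> version first observed
    write_set: dict[str, int] = field(default_factory=dict)  # key -> buffered new value

    def read(self, key: str) -> int:
        if key in self.write_set:              # read-your-own-writes
            return self.write_set[key]
        rec = self.store.records[key]
        self.read_set.setdefault(key, rec.version)
        return rec.value

    def write(self, key: str, value: int) -> None:
        self.write_set[key] = value            # not visible to others until commit

    def commit(self) -> bool:
        # Validation phase: abort if any record we read was overwritten since.
        for key, seen in self.read_set.items():
            if self.store.records[key].version != seen:
                return False
        # Write phase: install buffered writes and bump each record's version.
        for key, value in self.write_set.items():
            rec = self.store.records.setdefault(key, Record(0))
            rec.value = value
            rec.version += 1
        return True


store = Store()
store.put("counter", 0)
t1, t2 = Transaction(store), Transaction(store)
t1.write("counter", t1.read("counter") + 1)    # both transactions read version 0
t2.write("counter", t2.read("counter") + 1)
first, second = t1.commit(), t2.commit()
print(first, second, store.records["counter"].value)  # True False 1
```

Under heavy contention most transactions end up like `t2`: their work is discarded and must be re-executed from scratch, which is the cost the abstract says limits OCC scalability.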
[DB Seminar] Spring 2016: Pengtao Xie
Matrix-parametrized models, including multiclass logistic regression and sparse coding, are used in machine learning (ML) applications ranging from computer vision to computational biology. When these models are applied to large-scale ML problems, starting at millions of samples and tens of thousands of classes, their parameter matrix can grow at an unexpected rate, resulting in high parameter-synchronization costs that greatly slow down distributed learning. To address this issue, we propose a Sufficient Factor (SF) abstraction for efficient distributed learning of a...
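The excerpt breaks off before defining the SF abstraction, so the sketch below only illustrates an assumed underlying observation, using multiclass logistic regression (named in the abstract) as the example. For a K x D weight matrix W and one sample (x, y), the gradient of the softmax cross-entropy loss is the outer product u v^T with u = softmax(Wx) - onehot(y) and v = x. A worker could therefore send the two vectors (K + D numbers) instead of the full matrix (K * D numbers). For hypothetical sizes of K = 10,000 classes and D = 100,000 features, that is 110,000 numbers instead of 10^9. The functions `sufficient_factors` and `outer` are hypothetical names, written in plain Python for clarity rather than speed.

```python
import math


def softmax(z: list[float]) -> list[float]:
    m = max(z)                                   # shift by the max for numerical stability
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def sufficient_factors(W: list[list[float]], x: list[float], y: int) -> tuple[list[float], list[float]]:
    """Return (u, v) such that dL/dW == outer(u, v) for softmax cross-entropy.

    W is a K x D weight matrix (list of rows), x a length-D feature vector,
    and y the index of the true class.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    u = softmax(logits)
    u[y] -= 1.0                                  # u = p - onehot(y), length K
    v = list(x)                                  # v = x, length D
    return u, v


def outer(u: list[float], v: list[float]) -> list[list[float]]:
    """Rebuild the full K x D gradient from the two factors on the receiving side."""
    return [[ui * vj for vj in v] for ui in u]


W = [[0.0] * 4 for _ in range(3)]                # K = 3 classes, D = 4 features
x = [1.0, 2.0, 0.0, -1.0]
u, v = sufficient_factors(W, x, y=2)
print(len(u) + len(v), "floats sent instead of", len(u) * len(v))  # 7 floats sent instead of 12
```

The saving grows with the matrix size, which is where the abstract locates the synchronization bottleneck.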
[DB Seminar] Spring 2016: CSD Open House Event
This week, one student each from Professor Christos Faloutsos' group and Professor Andy Pavlo's group will give a short talk on their ongoing research. We will then hold a round-table discussion of the other DB group members' ongoing work with the visiting students attending the CSD Open House.
[DB Seminar] Spring 2016: NO SEMINAR (Spring break)
There will be no DB seminar this week because of spring break.
[DB Seminar] Spring 2016: Hao Zhang
We propose a dynamic topic model for monitoring the temporal evolution of market competition by jointly leveraging tweets and their associated images. For a market of interest (e.g. luxury goods), we aim to automatically detect the latent topics (e.g. bags, clothes, luxurious) that are competitively shared by multiple brands (e.g. Burberry, Prada, and Chanel), and to track the temporal evolution of the brands' stakes over the shared topics. One of the key applications of our work is social media monitoring, which can provide companies...
[DB Seminar] Spring 2016: Round table discussion
This week, we will have a round-table discussion about ongoing research and paper submissions.
[DB Seminar] Spring 2016: Wei Dai
In this talk, I will first give a brief overview of Petuum, which encompasses a set of distributed machine learning principles as well as our open-sourced implementations. By discussing the high-level ideas and performance highlights, I hope to show that Big ML systems can benefit greatly from ML-rooted statistical and algorithmic insights. In the second part, I will dive into a key component of Petuum, the Bosen parameter server (PS), with particular interest in how consistency models allowing...
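The excerpt breaks off while introducing Bosen's consistency models. One widely discussed family of parameter-server consistency models is bounded staleness, often called stale synchronous parallel (SSP). The sketch below illustrates that rule in isolation, as an assumption about the kind of model meant and not as Bosen's implementation. A worker may start iteration c only after every worker has finished iteration c - s - 1, so fast workers can run ahead on slightly stale parameters without waiting at every step. The class `SSPClock` and its methods are hypothetical.

```python
class SSPClock:
    """Hypothetical bounded-staleness (SSP-style) clock table for a parameter server."""

    def __init__(self, num_workers: int, staleness: int) -> None:
        self.clocks = [0] * num_workers   # iterations completed by each worker
        self.staleness = staleness        # s = 0 degenerates to bulk-synchronous (BSP)

    def may_start(self, worker: int) -> bool:
        # Iteration c = self.clocks[worker] may start only once every worker
        # has completed iteration c - s - 1, i.e. c - min(clocks) <= s.
        return self.clocks[worker] - min(self.clocks) <= self.staleness

    def try_run(self, worker: int) -> bool:
        """Run one iteration for `worker` if allowed; return False if it must wait."""
        if not self.may_start(worker):
            return False                  # blocked until stragglers catch up
        # ... compute on possibly stale parameters and push updates here ...
        self.clocks[worker] += 1          # record that this iteration is done
        return True


clock = SSPClock(num_workers=3, staleness=1)
progress = [clock.try_run(0) for _ in range(4)]   # only worker 0 makes attempts
print(progress, clock.clocks)  # [True, True, False, False] [2, 0, 0]
```

Raising `staleness` trades freshness of the parameters each worker reads for less waiting on stragglers. Setting it to 0 forces every worker to finish an iteration before anyone starts the next one.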
[DB Seminar] Spring 2016: Jun Woo Park
Traditional sketches, such as the Bloom filter, the CountMin sketch, and the Space-Saving sketch, estimate set membership, frequency counts, or moments of scalar random variables. In this paper, we extend these approaches to a new family of sketches that approximate moments of vectorial random variables that satisfy convex polytope constraints. One application is the Semidefinite sketch, a succinct way to estimate positive semidefinite matrices obtained from a vectorial data stream. Such a sketch can be used to efficiently estimate covariance...
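The excerpt does not describe the new polytope-constrained sketches in enough detail to reproduce them. The CountMin sketch it builds on is simple to show, though. The minimal sketch below keeps a `depth` x `width` table of counters and derives one hash per row by salting SHA-256 with the row number. This is an illustrative choice: the standard analysis assumes pairwise-independent hash functions. The class and method names are hypothetical.

```python
import hashlib


class CountMinSketch:
    """Frequency estimates for a stream using depth * width integer counters."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One hash function per row, obtained by prefixing the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions can only add to a counter, so each row overestimates;
        # the minimum across rows is the tightest available estimate.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


cms = CountMinSketch(width=64, depth=4)
stream = ["a"] * 50 + ["b"] * 20 + [f"x{i}" for i in range(100)]
for token in stream:
    cms.add(token)
print(cms.estimate("a") >= 50, cms.estimate("b") >= 20)  # True True
```

With non-negative updates the estimate never falls below the true count. In the standard analysis, choosing width about e/epsilon and depth about ln(1/delta) bounds the overestimate by epsilon times the stream length with probability at least 1 - delta.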