Events

Events

[DB Seminar] Spring 2016: Jun Woo Park

Speaker:
Jun Woo Park
Date:
Mon Feb 8, 2016 @ 04:45pm EDT
Date:
Mon Feb 8, 2016
Time:
04:45pm EDT
Location:
GHC 8102
Title:
Semidefinite Sketches for Multidimensional Data Streams

Talk Info:

Traditional sketches, such as the Bloom filter, the CountMin sketch, and the Space-Saving sketch, estimate set membership, frequency counts, or moments of scalar random variables. In this paper, we extend these approaches to a new family of sketches that approximate moments of vectorial random variables that satisfy convex polytope constraints. One application is the Semidefinite sketch, a succinct way to estimate positive semidefinite matrices obtained from a vectorial data stream. Such a sketch can be used to efficiently estimate covariance matrices for on-line machine learning applications and novelty detection for observations.

To evaluate these sketches, we implemented them in a distributed data store for machine learning and measured their performance and accuracy using different real-world data streams. Our results show that the Semidefinite sketch is able process more than two billion multi-dimensional data points at a rate of 97M operations per second and achieves 2–7× better throughput over other state-of-the-art streaming platforms.