Robson Cordeiro (University of São Paulo)
Given a data stream with many attributes and a high frequency of events, how can we cluster similar events, and can it be done in real time? For example, how can we cluster decades of frequent measurements of tens of climatic attributes to help real-time alert systems forecast extreme climatic events such as floods and hurricanes? Clustering data with many attributes, where clusters may exist only in subsets of the attributes, is known as subspace clustering. There is currently a need for subspace clustering algorithms well suited to multidimensional data streams, for which real-time processing is highly desirable.
In this talk, I will present Halite_ds, a new fast, scalable, and highly accurate subspace clustering algorithm for multidimensional data streams. It improves upon an existing technique that was originally designed for static (non-stream) data. Our main contributions are: (1) Analysis of Data Streams: the new algorithm uses the knowledge obtained from clustering past data to ease the clustering of present data, which makes Halite_ds considerably faster than its base algorithm while obtaining equally accurate results; (2) Real-Time Processing: unlike state-of-the-art methods, Halite_ds is fast and scalable, making it feasible to analyze streams with many attributes and a high frequency of events in real time; (3) Experiments: we ran experiments on synthetic data and on a real multidimensional stream containing almost one century of climatic data. Halite_ds was up to 217 times faster than five representative methods (its base algorithm plus four state-of-the-art competitors), and it always produced highly accurate results.
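Contribution (1) rests on reusing what was learned from past data instead of reprocessing everything from scratch. The abstract does not describe how Halite_ds achieves this, so the following is only a generic illustration of the idea under stated assumptions, not the speaker's method. It defines a hypothetical `SlidingGridCounts` class that keeps per-cell event counts on a multi-resolution grid and updates them incrementally as events arrive and expire. Grid-based density summaries are a common building block in subspace clustering. The class name, its methods, the count-based window, and the requirement that attributes be normalized to [0, 1) are all assumptions made for this sketch.

```python
from collections import Counter, deque
from typing import Deque, Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


class SlidingGridCounts:
    """Hypothetical sketch (not Halite_ds): multi-resolution grid counts
    maintained incrementally over a count-based sliding window.

    Assumes every attribute is already normalized to [0, 1).
    """

    def __init__(self, window: int, levels: int) -> None:
        self.window = window  # number of most recent events kept
        self.levels = levels  # level h splits each axis into 2**h bins
        self.points: Deque[Tuple[float, ...]] = deque()
        self.counts: Dict[int, Counter] = {h: Counter() for h in range(1, levels + 1)}

    def _cell(self, point: Sequence[float], level: int) -> Cell:
        # Map each coordinate to its bin index at this resolution.
        bins = 2 ** level
        return tuple(min(int(x * bins), bins - 1) for x in point)

    def insert(self, point: Sequence[float]) -> None:
        # Add the new event's contribution at every resolution level.
        p = tuple(point)
        self.points.append(p)
        for h in self.counts:
            self.counts[h][self._cell(p, h)] += 1
        # Expire the oldest event by subtracting its contribution,
        # reusing the existing counts instead of recounting the window.
        if len(self.points) > self.window:
            old = self.points.popleft()
            for h in self.counts:
                cell = self._cell(old, h)
                self.counts[h][cell] -= 1
                if self.counts[h][cell] == 0:
                    del self.counts[h][cell]

    def dense_cells(self, level: int, min_count: int) -> List[Cell]:
        # Cells holding at least min_count events of the current window.
        return sorted(c for c, n in self.counts[level].items() if n >= min_count)


if __name__ == "__main__":
    grid = SlidingGridCounts(window=3, levels=2)
    for p in [(0.1, 0.1), (0.15, 0.2), (0.9, 0.9), (0.12, 0.18)]:
        grid.insert(p)
    # The first event has expired; two of the remaining three share a cell.
    print(grid.dense_cells(level=1, min_count=2))  # [(0, 0)]
```

Each arrival costs work proportional to the number of levels times the number of attributes, independent of the window size. A real subspace clustering method would still need to turn dense cells into clusters and decide which attributes are relevant to each cluster.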
Bio:
Robson L. F. Cordeiro received the BSc degree in Computer Science (CS) from the University of Oeste Paulista, Brazil, in 2002, the MSc degree in CS from the Federal University of Rio Grande do Sul, Brazil, in 2005, and the PhD degree in CS from the University of São Paulo, Brazil, in 2011. His PhD program included a one-year visiting period at Carnegie Mellon University, USA, from 2009 to 2010. He was also a Postdoctoral Researcher at the University of São Paulo from 2011 to 2013. His PhD dissertation won the 2012 Best CS Dissertation Award from the Brazilian Computer Society (SBC) and led to a book published by Springer, which ACM Computing Reviews selected as one of its Notable Computing Books and Articles of 2013. Robson is currently an Assistant Professor at the University of São Paulo, Brazil. His research interests include mining and managing Big Data of moderate-to-high dimensionality, complex data, and large graphs. He is a member of the IEEE, ACM, and SBC.