[DB Seminar] Spring 2016: Miguel Araujo
Miguel will give a practice talk on his thesis proposal.
The identification of anomalies and communities of nodes in real-world graphs has applications in a wide range of domains, from the automatic categorization of Wikipedia articles or websites to bank fraud detection. While recent and ongoing research is supplying tools for the analysis of simple unlabeled data, it is still a challenge to find patterns and anomalies in large labeled datasets, such as time-evolving networks. What do real communities identified in big datasets look like? How is their structure affected by their size? How can we find realistic communities in labeled data? The completed work in this proposal addresses three related problems in this area. First, we explore the shape and structure of real communities in large networks and introduce the concept of "hyperbolic communities", providing two different algorithms for finding such structures in large datasets. Second, we find communities in edge-labeled networks, where labels can be timesteps or any other categorical information, and we describe efficient algorithms for this task. Lastly, we study anomalies in bank transaction networks, where both nodes and edges are labeled, and we describe parallel algorithms that automatically find locations where bank accounts were compromised in billion-scale networks. We also detail future work (1) on the distributed detection of edge-labeled communities, (2) on forecasting communities into the future, predicting which members are going to join and finding the most common community profiles, and (3) on the existence of hyperbolic communities in word networks, merging community detection and the known heavy-tailed distribution of word frequencies.
Thesis Summary: http://www.cs.cmu.edu/~maraujo/cmuproposal.pdf