[DB Seminar] Fall 2016: Ashraf Aboulnaga (QCRI)
Distributed data processing platforms such as Pregel and GraphLab have substantially simplified the design and deployment of certain classes of distributed graph analytics algorithms. However, these platforms do not represent a good match for distributed graph mining problems, for example, finding frequent subgraphs in a graph. Given an input graph, these problems require exploring a very large number of subgraphs and finding patterns that match some “interestingness” criteria desired by the user. These algorithms are very important for areas such as social networks, semantic web, and bioinformatics. In this talk, I will present Arabesque, a distributed data processing platform for implementing graph mining algorithms. Arabesque automates the process of exploring a very large number of subgraphs. It defines a high-level filter-process computational model that simplifies the development of scalable graph mining algorithms. Arabesque explores subgraphs and passes them to the application, which must simply compute outputs and decide whether the subgraph should be further extended. We use Arabesque’s API to produce distributed solutions to three fundamental graph mining problems: frequent subgraph mining, counting motifs, and finding cliques. Our implementations require only a few lines of code, scale to trillions of subgraphs, and represent in some cases the first available distributed solutions.
This is joint work with Carlos Teixeira, Alexandre Fonseca, Marco Serafini, Georgos Siganos, and Mohammed Zaki. It appeared in SOSP 2015.
Ashraf Aboulnaga is the Research Director of the Distributed Systems group at the Qatar Computing Research Institute. His research interests are in the area of data intensive cloud computing, with a current focus on platforms for big data, distributed storage systems, and data integration on the web.