# Pegasus

Graph Mining is an area of data mining to find patterns, rules, and anomalies of graphs. Graphs or networks are everywhere, ranging from the Internet Web graph, social networks(FaceBook, Twitter), biological networks, and many more. Finding patterns, rules, and anomalies have numerous applications including, but not limited to, the followings:

- Ranking web pages by search engine
- ‘viral’ or ‘word-of-mouth’ marketing
- Patterns of disease with potential impact for drug discovery
- Computer network security: email/IP traffic and anomaly detection

Existing works on graph mining has limited scalability: usually, the maximum graph size is order of millions. PEGASUS breaks the limit by scaling up the algorithms to billion-scale graphs. The breakthrough was possible by the careful algorithm design and implementation for Hadoop, a massive cloud computing platform. PEGASUS is a Peta-scale graph mining system, fully written in Java. It runs in parallel, distributed manner on top of Hadoop. Hadoop is a cloud computing platfrom, as well as an open source implementation of MapReduce framework which was originally designed for web-scale data processing by Google.

PEGASUS provide large scale algorithms for important graph mining tasks:

- Degree
- PageRank
- Random Walk with Restart (RWR)
- Radius
- Connected Components

## People

## Acknowledgements

Funding is provided by Defense Threat Reduction Agency (DTRA), National Science Foundation (NSF), Lawrence Livermore National Laboratories (LLNL), and Yahoo. This work is also funded in part by Gordon and Betty Moore Foundation.

## Publications

- U. Kang, D. H. Chau, and C. Faloutsos, "Pegasus: Mining billion-scale graphs in the cloud," in
*ICASSP*, 2012, pp. 5341-5344. Bibtex@INPROCEEDINGS{Kang2012, author = {U. Kang and Duen Horng Chau and Christos Faloutsos}, title = {Pegasus: Mining billion-scale graphs in the cloud}, booktitle = {ICASSP}, year = {2012}, pages = {5341-5344}, bdsk-url-1 = {http://dx.doi.org/10.1109/ICASSP.2012.6289127}, bibsource = {DBLP, http://dblp.uni-trier.de}, crossref = {DBLP:conf/icassp/2012}, date-added = {2014-02-08 23:07:44 -0500}, date-modified = {2014-02-08 23:07:44 -0500}, }

- D. H. Chau, A. Kittur, J. I. Hong, and C. Faloutsos, "Apolo: Making Sense of Large Network Data by Combining Rich User Interaction and Machine Learning," in
*Proceedings of the SIGCHI Conference on Human Factors in Computing Systems*, 2011, pp. 167-176. PDF Bibtex@inproceedings{Chau2011:AMS, author = {Chau, Duen Horng and Kittur, Aniket and Hong, Jason I. and Faloutsos, Christos}, title = {Apolo: Making Sense of Large Network Data by Combining Rich User Interaction and Machine Learning}, booktitle = {Proceedings of the SIGCHI Conference on Human Factors in Computing Systems}, series = {CHI '11}, year = {2011}, isbn = {978-1-4503-0228-9}, location = {Vancouver, BC, Canada}, pages = {167--176}, numpages = {10}, url = {http://www.cs.cmu.edu/~dchau/apolo/apolo.pdf}, doi = {10.1145/1978942.1978967}, }

- D. H. Chau, C. Nachenberg, J. Wilhelm, A. Wright, and C. Faloutsos, "Polonium: Tera-Scale Graph Mining and Inference for Malware Detection,"
*SIAM International Conference on Data Mining*, 2011. Bibtex@ARTICLE{Chau2011c, author = {Chau, D.H. and Nachenberg, C. and Wilhelm, J. and Wright, A. and Faloutsos, C.}, title = {Polonium: Tera-Scale Graph Mining and Inference for Malware Detection}, journal = {SIAM International Conference on Data Mining}, year = {2011}, }

- U. Kang, C. E. Tsourakakis, A. P. Appel, C. Faloutsos, and J. Leskovec, "HADI: Mining Radii of Large Graphs,"
*TKDD*, vol. 5, iss. 2, p. 8, 2011. Bibtex@ARTICLE{Kang2011d, author = {U. Kang and Charalampos E. Tsourakakis and Ana Paula Appel and Christos Faloutsos and Jure Leskovec}, title = {HADI: Mining Radii of Large Graphs}, journal = {TKDD}, year = {2011}, volume = {5}, pages = {8}, number = {2}, bibsource = {DBLP, http://dblp.uni-trier.de}, ee = {https://doi.acm.org/10.1145/1921632.1921634}, }

- U. Kang, C. E. Tsourakakis, and C. Faloutsos, "PEGASUS: mining peta-scale graphs,"
*Knowl. Inf. Syst.*, vol. 27, iss. 2, pp. 303-325, 2011. Bibtex@ARTICLE{Kang2011e, author = {U. Kang and Charalampos E. Tsourakakis and Christos Faloutsos}, title = {PEGASUS: mining peta-scale graphs}, journal = {Knowl. Inf. Syst.}, year = {2011}, volume = {27}, pages = {303-325}, number = {2}, bdsk-url-1 = {http://dx.doi.org/10.1007/s10115-010-0305-0}, bibsource = {DBLP, http://dblp.uni-trier.de}, date-added = {2014-02-09 00:14:18 -0500}, date-modified = {2014-02-09 00:14:18 -0500}, }

- U. Kang, B. Meeder, and C. Faloutsos, "Spectral Analysis for Billion-scale Graphs: Discoveries and Implementation," in
*Proceedings of the 15th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining - Volume Part II*, 2011, pp. 13-25. PDF Bibtex@inproceedings{Kang:2011:SAB, author = {Kang, U. and Meeder, Brendan and Faloutsos, Christos}, title = {Spectral Analysis for Billion-scale Graphs: Discoveries and Implementation}, booktitle = {Proceedings of the 15th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining - Volume Part II}, series = {PAKDD'11}, year = {2011}, isbn = {978-3-642-20846-1}, location = {Shenzhen, China}, pages = {13--25}, numpages = {13}, url = {http://www.cs.cmu.edu/~ukang/papers/HeigenPAKDD2011.pdf}, acmid = {2022852}, }

- U. Kang, H. Tong, J. Sun, C. Lin, and C. Faloutsos, "GBASE: A Scalable and General Graph Management System," in
*Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 2011, pp. 1091-1099. Bibtex@inproceedings{kang2011:GSG, author = {Kang, U. and Tong, Hanghang and Sun, Jimeng and Lin, Ching-Yung and Faloutsos, Christos}, title = {GBASE: A Scalable and General Graph Management System}, booktitle = {Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining}, series = {KDD '11}, year = {2011}, isbn = {978-1-4503-0813-7}, location = {San Diego, California, USA}, pages = {1091--1099}, numpages = {9}, doi = {10.1145/2020408.2020580}, }

- U. Kang, C. E. Tsourakakis, A. P. Appel, C. Faloutsos, and J. Leskovec, "Radius Plots for Mining Tera-byte Scale Graphs: Algorithms, Patterns, and Observations," in
*SDM*, 2010, pp. 548-558. Bibtex@INPROCEEDINGS{Kang2010, author = {U. Kang and Charalampos E. Tsourakakis and Ana Paula Appel and Christos Faloutsos and Jure Leskovec}, title = {Radius Plots for Mining Tera-byte Scale Graphs: Algorithms, Patterns, and Observations}, booktitle = {SDM}, year = {2010}, pages = {548-558}, bdsk-url-1 = {http://www.siam.org/proceedings/datamining/2010/dm10_048_kangu.pdf}, bibsource = {DBLP, http://dblp.uni-trier.de}, crossref = {DBLP:conf/sdm/2010}, date-added = {2014-02-09 00:14:21 -0500}, date-modified = {2014-02-09 00:14:21 -0500}, }

- U. Kang, M. McGlohon, L. Akoglu, and C. Faloutsos, "Patterns on the Connected Components of Terabyte-Scale Graphs," in
*ICDM*, 2010, pp. 875-880. Bibtex@INPROCEEDINGS{Kang2010a, author = {U. Kang and Mary McGlohon and Leman Akoglu and Christos Faloutsos}, title = {Patterns on the Connected Components of Terabyte-Scale Graphs}, booktitle = {ICDM}, year = {2010}, pages = {875-880}, bibsource = {DBLP, http://dblp.uni-trier.de}, ee = {http://doi.ieeecomputersociety.org/10.1109/ICDM.2010.121}, }

- L. Akoglu and C. Faloutsos, "RTG: A Recursive Realistic Graph Generator Using Random Typing," in
*Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2009, Bled, Slovenia, September 7-11, 2009, Proceedings, Part I*, 2009, pp. 13-28. Bibtex@INPROCEEDINGS{Akoglu2009, author = {Leman Akoglu and Christos Faloutsos}, title = {{RTG}: A Recursive Realistic Graph Generator Using Random Typing}, booktitle = {Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2009, Bled, Slovenia, September 7-11, 2009, Proceedings, Part I}, year = {2009}, pages = {13-28}, bibsource = {DBLP, http://dblp.uni-trier.de}, ee = {http://dx.doi.org/10.1007/978-3-642-04180-8_13}, }

- C. Faloutsos, T. G. Kolda, and J. Sun, "Mining large graphs and streams using matrix and tensor tools," in
*SIGMOD Conference*, 2007, p. 1174. Bibtex@INPROCEEDINGS{Faloutsos2007, author = {Christos Faloutsos and Tamara G. Kolda and Jimeng Sun}, title = {Mining large graphs and streams using matrix and tensor tools}, booktitle = {SIGMOD Conference}, year = {2007}, editor = {Chee Yong Chan and Beng Chin Ooi and Aoying Zhou}, pages = {1174}, publisher = {ACM}, bibsource = {DBLP, http://dblp.uni-trier.de}, ee = {https://doi.acm.org/10.1145/1247480.1247647}, isbn = {978-1-59593-686-8}, }