# Big Tensor Mining

Tensors are multi-dimensional generalizations of matrices. Extremely large and sparse coupled tensors arise in numerous important applications that require the analysis of large, diverse, and partially related data. Effective analysis of coupled tensors requires algorithms and associated software that can identify the core relations among the different tensor modes and scale to extremely large datasets. The objective of this project is to develop theory and algorithms for (coupled) sparse and low-rank tensor factorization, along with scalable software toolkits that make such analysis possible.

The research is centered on three major thrusts. The first is designed to make novel theoretical contributions in the area of coupled tensor factorization by developing multi-way compressed sensing methods for dimensionality reduction with perfect latent model reconstruction. Methods to handle missing values, noisy input, and coupled data will also be developed.

The second thrust focuses on algorithms and scalability on modern architectures, which will enable the efficient analysis of coupled tensors with millions to billions of non-zero entries, using the map-reduce paradigm as well as hybrid multicore architectures. An open-source coupled tensor factorization toolbox, HTF (Hybrid Tensor Factorization), will be developed to provide robust, high-performance implementations of these algorithms.

Finally, the third thrust focuses on evaluating and validating these coupled factorization algorithms on a NeuroSemantics application, whose goal is to understand how human brain activity correlates with reading and understanding text, by analyzing fMRI and MEG brain image datasets obtained while subjects read various text passages.

Given triplets of facts (subject-verb-object), like (‘Washington’, ‘is the capital of’, ‘USA’), can we find patterns, new objects, new verbs, anomalies? Can we correlate these with brain scans of people reading these words, to discover which parts of the brain get activated, say, by tool-like nouns (‘hammer’), or action-like verbs (‘run’)? We propose a unified “coupled tensor” factorization framework to systematically mine such datasets. Unique challenges in these settings include:

- Tera- and peta-byte scaling issues,
- Distributed fault-tolerant computation,
- Large proportions of missing data, and
- Insufficient theory and methods for big sparse tensors.
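
As a rough illustration of how subject-verb-object triplets become a sparse three-mode tensor, the sketch below stores only the non-zero entries in coordinate form. It is a toy example under stated assumptions, not the project's code: the function names `build_index` and `triplets_to_sparse_tensor` are hypothetical, and every fact except the ‘Washington’ triplet is made up for illustration.

```python
from collections import defaultdict

# Toy facts. Only the first triplet comes from the text above;
# the others are hypothetical examples.
triplets = [
    ("Washington", "is the capital of", "USA"),
    ("Paris", "is the capital of", "France"),
    ("Washington", "is located in", "USA"),
    ("Washington", "is the capital of", "USA"),  # repeated observation
]


def build_index(values):
    """Map each distinct value to an integer index, in order of first appearance."""
    index = {}
    for value in values:
        index.setdefault(value, len(index))
    return index


def triplets_to_sparse_tensor(triplets):
    """Return {(i, j, k): count} plus the three mode dictionaries.

    Mode 1 indexes subjects, mode 2 verbs, mode 3 objects. Only
    observed (non-zero) cells are stored, which is what keeps a
    very sparse tensor manageable.
    """
    subjects = build_index(s for s, _, _ in triplets)
    verbs = build_index(v for _, v, _ in triplets)
    objects = build_index(o for _, _, o in triplets)

    entries = defaultdict(int)
    for s, v, o in triplets:
        entries[(subjects[s], verbs[v], objects[o])] += 1
    return dict(entries), (subjects, verbs, objects)


entries, (subjects, verbs, objects) = triplets_to_sparse_tensor(triplets)
shape = (len(subjects), len(verbs), len(objects))
density = len(entries) / (shape[0] * shape[1] * shape[2])

print("shape:", shape)
print("non-zeros:", len(entries))
print("density:", density)
```

In this toy case the tensor is small and fairly dense. With millions of distinct subjects, verbs, and objects, the fraction of observed cells becomes tiny, which is why the non-zero count, rather than the nominal tensor size, drives the scalability concerns listed above.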

We also propose to derive new scientific hypotheses on how the brain works and how it processes language (drawing on the Never-Ending Language Learning (NELL) and NeuroSemantics projects), and to develop scalable open-source software for coupled tensor factorization. Our tensor analysis methods can also be used in many other settings, including recommendation systems and computer-network intrusion/anomaly detection.
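
To make "coupled" concrete, one common formulation from the tensor literature (shown here as an assumption, not necessarily the exact objective used in this project) jointly factorizes a three-mode tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ and a matrix $\mathbf{Y} \in \mathbb{R}^{I \times M}$ that share their first mode. The factor matrices $\mathbf{A}, \mathbf{B}, \mathbf{C}, \mathbf{D}$ and the rank $R$ are illustrative notation introduced for this sketch:

$$
\min_{\mathbf{A},\mathbf{B},\mathbf{C},\mathbf{D}} \;
\Big\| \mathcal{X} - \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \Big\|_F^2
\; + \;
\big\| \mathbf{Y} - \mathbf{A}\mathbf{D}^{\top} \big\|_F^2
$$

Here $\mathbf{a}_r$, $\mathbf{b}_r$, and $\mathbf{c}_r$ are the $r$-th columns of $\mathbf{A} \in \mathbb{R}^{I \times R}$, $\mathbf{B} \in \mathbb{R}^{J \times R}$, and $\mathbf{C} \in \mathbb{R}^{K \times R}$, $\mathbf{D} \in \mathbb{R}^{M \times R}$, and $\circ$ denotes the vector outer product. Because $\mathbf{A}$ appears in both terms, the shared mode receives a single latent representation that must explain both datasets. When data are missing, the norms are typically restricted to the observed entries.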

## People

- Alex Beutel
- Danai Koutra
- Miguel Araujo
- Vagelis Papalexakis
- Kijung Shin
- Christos Faloutsos
- Tom Mitchell
- Nikos Sidiropoulos (University of Minnesota)
- U Kang (KAIST, Korea)
- Partha Pratim Talukdar (CMU post-doc)
- Evrim Acar (University of Copenhagen, Denmark)
- Rasmus Bro (University of Copenhagen, Denmark)
- Konstantinos Pelechrinis (University of Pittsburgh)
- Leman Akoglu (Stony Brook Univ.)
- Polo Chau (Georgia Tech)
- Aditya Prakash (Virginia Tech)

## Publications

- J. Oh, K. Shin, E. E. Papalexakis, C. Faloutsos, and H. Yu, "S-HOT: Scalable High-Order Tucker Decomposition," in *Proceedings of the Tenth ACM International Conference on Web Search and Data Mining*, 2017, pp. 761-770. [PDF](http://www.cs.cmu.edu/~kijungs/papers/shotWSDM2017.pdf)

- K. Shin, B. Hooi, J. Kim, and C. Faloutsos, "D-cube: Dense-block detection in terabyte-scale tensors," in *Proceedings of the Tenth ACM International Conference on Web Search and Data Mining*, 2017, pp. 681-689. [PDF](http://www.cs.cmu.edu/~kijungs/papers/dcubeWSDM2017.pdf)

- K. Shin, B. Hooi, J. Kim, and C. Faloutsos, "DenseAlert: Incremental Dense-Subtensor Detection in Tensor Streams," in *Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 2017. [PDF](http://www.cs.cmu.edu/~kijungs/papers/alertKDD2017.pdf)

- K. Shin, B. Hooi, and C. Faloutsos, "M-Zoom: Fast Dense-Block Detection in Tensors with Quality Guarantees," in *ECML/PKDD*, 2016, pp. 264-280. [PDF](http://www.cs.cmu.edu/~kijungs/papers/mzoomPKDD2016.pdf)

- A. Beutel, A. Kumar, E. E. Papalexakis, P. P. Talukdar, C. Faloutsos, and E. P. Xing, "FlexiFaCT: Scalable Flexible Factorization of Coupled Tensors on Hadoop," in *SDM*, 2014. [PDF](http://alexbeutel.com/papers/sdm2014.flexifact.pdf)

- A. Kumar, A. Beutel, Q. Ho, and E. P. Xing, "Fugue: Slow-Worker-Agnostic Distributed Learning for Big Models," in *AISTATS*, 2014. [PDF](http://alexbeutel.com/papers/aistats2014.fugue.pdf)

- E. E. Papalexakis, T. M. Mitchell, N. D. Sidiropoulos, C. Faloutsos, P. P. Talukdar, and B. Murphy, "Scoup-SMT: Scalable Coupled Sparse Matrix-Tensor Factorization," *arXiv preprint arXiv:1302.7043*, 2013.

- R. Bro, E. E. Papalexakis, E. Acar, and N. D. Sidiropoulos, "Coclustering—a useful tool for chemometrics," *Journal of Chemometrics*, vol. 26, iss. 6, pp. 256-263, 2012.

- U. Kang, E. Papalexakis, A. Harpale, and C. Faloutsos, "Gigatensor: scaling tensor analysis up by 100 times-algorithms and discoveries," in *Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining*, 2012, pp. 316-324.

- E. E. Papalexakis, A. Beutel, and P. Steenkiste, "Network anomaly detection using co-clustering," in *Advances in Social Networks Analysis and Mining (ASONAM), 2012 IEEE/ACM International Conference on*, 2012, pp. 403-410.

- E. E. Papalexakis, C. Faloutsos, and N. D. Sidiropoulos, "ParCube: sparse parallelizable tensor decompositions," in *Machine Learning and Knowledge Discovery in Databases*, Springer, 2012, pp. 521-536.

- J. Sun, D. Tao, S. Papadimitriou, P. S. Yu, and C. Faloutsos, "Incremental tensor analysis: Theory and applications," *TKDD*, vol. 2, iss. 3, 2008. [DOI](https://doi.acm.org/10.1145/1409620.1409621)

- E. E. Papalexakis, L. Akoglu, and D. Ienco, "Do more Views of a Graph help? Community Detection and Clustering in Multi-Graphs," in *International Conference on Data Fusion*, 2013.

- E. E. Papalexakis, T. Dumitras, D. H. P. Chau, A. B. Prakash, and C. Faloutsos, "Spatio-temporal Mining of Software Adoption & Penetration," in *ASONAM*, 2013.