DB Seminar [Spring 2015]: Vagelis Papalexakis (Thesis Proposal dry run)

Event Date: Monday February 9, 2015
Event Time: 04:30pm EDT
Location: GHC 8102
Speaker: Vagelis Papalexakis

Title: Mining Large Multi-Aspect Data: Algorithms and Applications


Given a Knowledge Base that records millions of relations of the form “Barack Obama is the president of USA”, how can we automatically learn new synonyms and enhance the Knowledge Base?
Imagine now measuring the brain activity of a person while they read words that appear in this Knowledge Base; how can we relate information processing in the brain to information found on the World Wide Web? Can we use both pieces of data in order to enhance knowledge extraction in both scenarios?

In a third, seemingly unrelated, application, consider having different “views” of a social network, e.g. observing who is calling whom, who sends e-mails to whom, and who texts whom; can we use this rich information for community and anomaly detection? What if we also have demographic information about the people in the network? Can we further enhance our analysis?

The key underlying theme behind all the above applications is the multi-aspect nature of the data, with the ultimate question being: how can we take advantage of all the different aspects? Can we analyze sets of multi-aspect data jointly? Finally, can we automatically, and in a mostly unsupervised setting, filter out aspects of the data that are redundant or not beneficial for the task at hand?

In this thesis, we work towards answering the above questions, in two different thrusts:
1) Algorithms & Models: we develop multi-aspect analysis models and scalable algorithms, with specific emphasis on Tensor Analysis, that are able to efficiently extract knowledge from multi-aspect data.
2) Applications: we apply our algorithms to a variety of multi-aspect data problems, with specific emphasis on linking knowledge extraction from the Web and the brain.