Dhivya Eswaran and Zongge Liu (SDM2017 dry run)

Event Date: Tuesday April 25, 2017
Event Time: 01:00pm EDT
Location: GHC 8115
Speaker: Dhivya Eswaran and Zongge Liu

Title: SDM2017 Dry Run

Dhivya and Zongge will give dry runs of their SDM 2017 talks.

Dhivya’s talk information:

Title: The Power of Certainty: A Dirichlet Multinomial Model for Belief Propagation

Abstract: Given a friendship network, how certain are we that Smith is a progressive (vs. conservative)? How can we propagate these certainties through the network? While belief propagation (BP) marked the beginning of principled label propagation to classify nodes in a graph, its numerous variants proposed in the literature fail to take uncertainty into account during the propagation process. As we show, this limitation leads to counter-intuitive results even for simple graphs. Motivated by these observations, we formalize axioms that any node classification algorithm should obey and propose NetConf, which satisfies these axioms and handles arbitrary network effects (homophily / heterophily) at scale. Our contributions are: (1) Axioms: We state axioms that any node classification algorithm should satisfy; (2) Theory: NetConf is grounded in a Bayesian-theoretic framework to model uncertainties, has a closed-form solution, and comes with precise convergence guarantees; (3) Practice: Our method is easy to implement and scales linearly with the number of edges in the graph. In experiments on real-world data, we always match or outperform BP while taking less processing time.


Zongge’s talk information:

Title: H-Fuse: Efficient Fusion of Aggregated Historical Data

Abstract: In this paper, we address the challenge of recovering a time sequence of counts from aggregated historical data. For example, given a mixture of monthly and weekly sums, how can we find the daily counts of people infected with flu? In general, what is the best way to recover historical counts from aggregated, possibly overlapping historical reports, in the presence of missing values? Equally importantly, how much should we trust this reconstruction? We propose H-Fuse, a novel method that solves the above problems by allowing injection of domain knowledge in a principled way and turning the task into a well-defined optimization problem. H-Fuse has the following desirable properties: (a) Effectiveness, recovering historical data from aggregated reports with high accuracy; (b) Self-awareness, providing an assessment of when the recovery is not reliable; (c) Scalability, computationally linear in the size of the input data. Experiments demonstrate that H-Fuse reconstructs the original data 30-81% better than the least squares method.