[DB Seminar] Fall 2016: Prakhar Ojha
In this talk, I shall discuss two interesting problems pertinent to quality-control and budget-optimization in complex crowdsourcing.
Crowdsourcing has evolved from solving simpler tasks, like image classification, to more complex tasks such as document editing, language translation, product design, etc. Unlike micro-tasks performed by a single worker, these complex tasks require a group of workers and greater resources. If the task requester wants to pay participants individually according to their respective efforts in the group, she needs a strategy to discriminate between participants. Distinguishing workers (who contribute positively) from idlers (who do not contribute to the group task) among the participants, using only the group’s performance, is non-trivial.
In the first part, I shall talk about the problem of distinguishing workers from idlers, without assuming any prior knowledge of individual skills and considering “groups” as the smallest observable unit for evaluation. We draw upon the group-testing literature, which proposes strategies for forming groups and mechanisms for decoding individual qualities from group results. Further, we give bounds on the minimum number of groups required to identify the quality of subsets of individuals with high confidence. Experiments give us insights into the number of workers and idlers that can be identified with significant probability for a given number of group tasks, the impact of participant demographics, etc.
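To make the group-testing idea concrete, here is a minimal sketch of classical noiseless decoding, not necessarily the mechanism used in the talk. It rests on a strong simplifying assumption: a group's output is observed as productive exactly when the group contains at least one worker. Under that assumption, every member of an unproductive group must be an idler. The function names `form_groups`, `group_outcomes`, and `decode_idlers` are hypothetical and chosen for illustration only.

```python
import random


def form_groups(n_participants, n_groups, group_size, rng):
    """Randomly assign participants (by index) to groups; groups may overlap."""
    return [sorted(rng.sample(range(n_participants), group_size))
            for _ in range(n_groups)]


def group_outcomes(groups, workers):
    """Simulated observation (assumption): a group is productive
    iff it contains at least one true worker."""
    return [any(p in workers for p in group) for group in groups]


def decode_idlers(groups, outcomes):
    """Noiseless decoding: everyone in an unproductive group is an idler.
    Participants never seen in an unproductive group remain undetermined."""
    idlers = set()
    for group, productive in zip(groups, outcomes):
        if not productive:
            idlers.update(group)
    return idlers


if __name__ == "__main__":
    rng = random.Random(0)
    true_workers = {0, 5}  # hidden ground truth, used only for simulation
    groups = form_groups(n_participants=12, n_groups=8, group_size=3, rng=rng)
    outcomes = group_outcomes(groups, true_workers)
    print("identified idlers:", sorted(decode_idlers(groups, outcomes)))
```

Under the stated assumption, this decoder never mislabels a worker as an idler, but it may leave some idlers unidentified when too few groups are formed, which is why bounds on the number of groups matter.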
In the second part of the talk, I shall describe Relational Crowdsourcing (RelCrowd), a novel crowdsourcing paradigm where human intelligence tasks (HITs) are created by taking their inter-dependencies into account. I shall then discuss this framework in the context of evaluating large-scale knowledge graphs.
Automatic construction of large knowledge graphs (KGs) by mining web-scale text datasets has received considerable attention over the last few years, resulting in the construction of several KGs, such as NELL, Google Knowledge Vault, etc. Estimating the accuracy of such automatically constructed KGs is a challenging problem due to their size and diversity. Even though crowdsourcing is an obvious choice for such evaluation, standard micro-task crowdsourcing, where each predicate in the KG is evaluated independently, is very expensive and especially problematic when the available budget is limited. We show that such settings are sub-optimal as they ignore dependencies among various predicates and their instances.
We attempt to systematically study the important problem of accuracy estimation of automatically constructed knowledge graphs. Our method, KGEval, binds facts of a KG using coupling constraints and posts those HITs that allow the correctness of large parts of the KG to be inferred. We demonstrate that the objective optimized by KGEval is submodular and that optimizing it is NP-hard; submodularity allows guarantees for our approximation algorithm. Through experiments on real-world datasets, our preliminary results demonstrate that KGEval estimates KG accuracy more accurately than other competitive baselines, while requiring significantly fewer human evaluations.
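As a rough illustration of why submodularity helps, the sketch below greedily selects facts to send to the crowd under a fixed budget, using a simple coverage objective. This is a generic greedy routine for a monotone submodular coverage function, not the KGEval algorithm itself. The `inferable` mapping (which facts each evaluated fact would settle through coupling constraints) and the function name `greedy_select` are hypothetical.

```python
def greedy_select(inferable, budget):
    """Pick up to `budget` facts to evaluate, each time choosing the fact
    whose evaluation settles the most not-yet-covered facts.

    inferable: dict mapping a fact to the set of facts (itself included)
               whose correctness could be inferred once it is evaluated.
    """
    covered = set()
    chosen = []
    for _ in range(budget):
        candidates = [f for f in inferable if f not in chosen]
        if not candidates:
            break
        # Marginal gain = number of newly covered facts.
        best = max(candidates, key=lambda f: len(inferable[f] - covered))
        if not inferable[best] - covered:
            break  # no remaining candidate adds anything
        chosen.append(best)
        covered |= inferable[best]
    return chosen, covered


if __name__ == "__main__":
    # Toy, made-up dependency structure among five facts.
    inferable = {
        "a": {"a", "b", "c"},
        "b": {"b"},
        "d": {"d", "e"},
        "e": {"e"},
    }
    chosen, covered = greedy_select(inferable, budget=2)
    print(chosen, sorted(covered))
```

For monotone submodular objectives under a cardinality constraint, greedy selection of this kind is known to achieve at least a (1 - 1/e) fraction of the optimal value, which is the sort of guarantee that submodularity makes possible.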
1. Quality Estimation of Workers in Collaborative Crowdsourcing using Group Testing, AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2016), <http://talukdar.net/papers/hcomp16_ojha_talukdar.pdf>
2. Relational Crowdsourcing and its Application in Knowledge Graph Evaluation, (under preparation) <https://arxiv.org/abs/1610.06912>