This paper was accepted at the EMNLP 2022 “Data Science Workshop with the Human in the Loop”.
Identifying and integrating missing facts is an important task for completing knowledge graphs and for ensuring the robustness of downstream applications such as question answering. Adding new facts to a knowledge graph in a real-world system often involves a human verification effort, in which human annotators check candidate facts for accuracy. This process is laborious, time-consuming, and inefficient, as only a small fraction of the missing facts can be identified. This paper proposes a simple yet efficient fact-collection framework that searches for a diverse set of highly relevant candidate facts to present to human annotators. The empirical results show that the proposed solution improves both i) the quality of candidate facts and ii) the ability to discover more facts to expand the knowledge graph, without requiring additional human effort.