ALTO: Active Learning with Topic Overviews for Speeding Label Induction and Document Labeling

Forough Poursabzi-Sangdeh
Jordan L. Boyd-Graber
Leah Findlater
Kevin D. Seppi

Abstract

Effective text classification requires experts to annotate data with labels; these training data are time-consuming and expensive to obtain. If you know what labels you want, active learning can reduce the number of labeled documents needed. However, establishing the label set remains difficult. Annotators often lack the global knowledge needed to induce a label set. We introduce ALTO: Active Learning with Topic Overviews, an interactive system to help humans annotate documents: topic models provide a global overview of what labels to create, and active learning directs annotators to the right documents to label. Our forty-annotator user study shows that while active learning alone is best in extremely resource-limited conditions, topic models (even by themselves) lead to better label sets, and ALTO’s combination is best overall.