Text is a treasure trove of data for social, behavioral and economic researchers, and the rapidly developing field of automated text analysis has produced numerous methods for exploiting it. However, the machine classifiers on which many of these methods depend degrade as data becomes noisier. Ad hoc approaches to reducing noise, such as using analyst-generated keywords to collect documents, can introduce bias. The work of Drs. King, Lam and Roberts (2016) on computer-assisted keyword recommendations offered a new solution to this problem. Thresher used that work to build a multilingual keyword recommender (MKR) tool, which our government clients are using to reduce bias in data collection from relatively small sets of text. We propose to extend the King et al. algorithm to larger datasets and longer documents by developing methods for isolating signal in text across heterogeneous document sets. If successful, we will reduce the need for arduous hand-coding of custom data-cleaning rules for each document set before machine classifiers are applied, and we will improve the precision and recall of those classifiers once applied. This program will produce methods that, in Phase II, will serve as the basis for prototyping extensible software packages for deployment in MKR and other text analysis tools.