Building a recommendation engine from plain text data is a difficult task. On top of the difficulties inherent in plain text, extracting text from PDF documents yields noisy, inaccurate, and duplicated metadata, which presents an enormous challenge. Mendeley, a reference manager for researchers, is doing just that. The infrastructure and data mining requirements for building a recommendation engine from text-based PDFs will be discussed.