Content Mining: Overview

This guide describes the text mining resources and tools that are available and indicates whether the Libraries' subscription databases support content mining.

What is Content Mining?

Content mining evolved from text and data mining (TDM).

TDM is a research technique, used in a variety of disciplines, that applies computational analysis to extract trends and patterns from large text-based data sets (Source: University of Chicago's Text & Data Mining Guide). The difference between text mining and data mining is that "in text mining the patterns are extracted from natural language text rather than from structured databases of facts" (Source: "What Is Text Mining?" by Marti Hearst). Text mining examines and analyzes full-text digitized content, while data mining might only need to look at metadata describing that content.
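The distinction between mining full text and mining metadata can be illustrated with a minimal Python sketch. The sample records, field names, and function names below are invented for illustration and do not come from any Libraries database; real projects typically work with far larger corpora and dedicated tools.

```python
import re
from collections import Counter

# Hypothetical sample records, invented for illustration only.
RECORDS = [
    {"title": "Whale Songs", "year": 1851,
     "text": "The whale surfaced. The sea was calm."},
    {"title": "Sea Voyages", "year": 1851,
     "text": "A ship crossed the sea in spring."},
    {"title": "City Lights", "year": 1920,
     "text": "The city never slept."},
]


def term_frequencies(records):
    """Text mining: count word occurrences in the full text of each record."""
    counts = Counter()
    for record in records:
        # Lowercase the text and keep runs of letters as words.
        counts.update(re.findall(r"[a-z]+", record["text"].lower()))
    return counts


def records_per_year(records):
    """Data mining on metadata: count records by publication year only."""
    return Counter(record["year"] for record in records)


if __name__ == "__main__":
    print(term_frequencies(RECORDS).most_common(3))
    print(records_per_year(RECORDS))
```

The first function reads the natural language text itself, while the second never opens the text and relies only on the structured year field, which mirrors the difference described above.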

Content mining includes not only traditional text and data, but also video, images, websites and metadata. 

TED Talk: What We Learned from 5 Million Books


Caroline Muglia
Co-Associate Dean for Collections