Content Mining: Overview

This guide describes available text mining resources and tools and indicates whether the Libraries' subscription databases support content mining.

What is Content Mining?

Content mining evolved from text and data mining (TDM).

TDM is a research technique, used in a variety of disciplines, that applies computational analysis to extract trends and patterns from large text-based data sets (Source: University of Chicago). The difference between text mining and data mining is that "in text mining the patterns are extracted from natural language text rather than from structured databases of facts" (Source: What is text mining?). Text mining examines and analyzes full-text digitized content, while data mining might only need to look at the metadata describing that content. Content mining covers not only traditional text and data but also video, images, websites, and metadata.
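
The distinction between text mining and data mining can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three-record corpus, its field names (year, subject, text), and the helper functions text_mine and data_mine are hypothetical and do not come from any library database or tool. The first function works on the full text of each record, while the second looks only at the structured metadata.

```python
import re
from collections import Counter

# Hypothetical toy corpus: each record pairs structured metadata with full text.
records = [
    {"year": 2019, "subject": "climate",
     "text": "Sea levels are rising as glaciers melt."},
    {"year": 2020, "subject": "climate",
     "text": "Rising temperatures melt glaciers faster."},
    {"year": 2020, "subject": "health",
     "text": "Vaccines reduce the spread of disease."},
]


def text_mine(records, top_n=3):
    """Text mining: count word frequencies in the natural-language full text."""
    words = []
    for record in records:
        # Lowercase the text and keep alphabetic runs as words.
        words.extend(re.findall(r"[a-z]+", record["text"].lower()))
    return Counter(words).most_common(top_n)


def data_mine(records, field):
    """Data mining on metadata: count records per value of a structured field."""
    return Counter(record[field] for record in records)


if __name__ == "__main__":
    print(text_mine(records))          # most frequent words in the full text
    print(data_mine(records, "year"))  # number of records per publication year
```

Real projects would typically add steps this sketch omits, such as removing common stop words, and would work with far larger collections obtained under the terms of the relevant license.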

Content mining is often fair use under copyright law. However, many subscribed library resources are restricted by license agreements with publishers and other entities. In these cases, USC Libraries can mediate or attempt to negotiate with vendors and third-party aggregators.
