Most of the libraries' databases do not allow text or data mining (TDM) because of license agreements. We will continue to work with database vendors to include TDM in future license agreements. The resources listed here are the current exceptions. If you do not see a resource listed here, please contact us and we can investigate further.
Contact USC Libraries to initiate the process
All databases from Adam Matthew (which digitize unique primary source collections) are available for mining.
Association for Computing Machinery
(ACM Digital Library, ACM Transactions)
|FREE||The individual researcher negotiates and signs directly with provider.|
|Early English Books Online (EEBO)||FREE||
Early English Books Online - Text Creation Partnership - The Text Creation Partnership creates standardized, accurate XML/SGML encoded electronic text editions of early print books.
Phase I content (25,000 titles) is freely available and searchable. USC does not have full-text access to Phase II content because we are not a partner library for this phase of transcription.
(Primary source collections only)
|Some free. Downloading large datasets costs $500-$3,500 per collection.||
Gale Artemis: Primary Sources, which searches across 23 of our Gale primary source databases covering 1500-2012, has a Term Frequency search option and Term Clusters viewer (available from the articles results list).
To download large datasets, USC Libraries will have to request the data on your behalf from our Gale sales representative. Requests can take up to 3 weeks to process. Gale will send a hard drive with the requested data to the libraries for you to use.
video: Using Term Frequency & Term Clusters [2:59]
|Hathi Trust||FREE||Individuals can access public domain materials for content mining projects through the Hathi Trust Research Center.||Hathi Trust Research Center Analysis|
|JSTOR||FREE||Data for Research (DfR) - Provides a self-service system for text mining. By creating a free DfR account you can download the metadata, word frequencies, citations, key terms, and N-grams of up to 1,000 documents. To get larger datasets (>1,000 documents) or a type of data not available through the main site, contact JSTOR directly: email@example.com||Introduction to using DfR from DH @ Washington and Lee University||Gender composition of scholarly publications (1665 - 2011)|
|IEEE||Cost negotiated per request||The library facilitates requests on a case-by-case basis by negotiating the vendor license.|
|LexisNexis||FREE||Does not "officially" support or provide data/text mining options. However, since text files can be downloaded, TDM is possible. You can batch download up to 500 articles at a time in one text file.|
|NextBio||FREE (via subscription)|
|Oxford English Dictionary (OED)||FREE||Oxford University Press grants research access to the Corpus for academic projects that can demonstrate a strong practical need for this data. To apply for research access to the Corpus, fill out and email this application form.||The Oxford English Corpus Sketch Engine Documentation|
|Oxford University Press||FREE||
Researchers are not required to request permission for non-commercial text mining of OUP content. However, OUP offers a consultation service with a technical project manager to assist in planning your TDM project, including how to avoid triggering the technical safeguards OUP has in place to protect the stability and security of its websites.
To request a consultant for your TDM project, please e-mail Data.Mining@oup.com
|ProQuest||Cost negotiated per request||
You can contact ProQuest directly to negotiate arrangements for text mining their content.
ProQuest does allow free text mining for the newspapers to which USC Libraries have purchased perpetual access licenses. Those newspapers are: Los Angeles Times (1881-1931), Los Angeles Sentinel (1934-2005), and New York Times (1851-1934). USC Libraries will have to request this data on your behalf.
ProQuest offers two methods of data delivery:
|Robots Reading Vogue|
|FREE (with subscription)||You can text mine all subscribed content as long as it is for non-commercial purposes. You do this via Elsevier's ScienceDirect APIs, which you must register to use. Contact Elsevier directly.|
|SpringerLink||FREE (with subscription)|
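Several of the resources above, such as LexisNexis, deliver many articles batched into a single plain-text file. As a starting point, the following is a minimal sketch of how such a file could be split into articles and reduced to term frequencies. It is an illustration only: the `=====` delimiter, the stop-word list, and the function names `split_articles` and `term_frequencies` are assumptions made for this example, not a format or tool supplied by any vendor. Check your actual download to see how articles are separated.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for"}


def split_articles(text, delimiter_pattern):
    """Split one batch file into articles.

    delimiter_pattern is a regular expression supplied by the caller that
    matches the separator line between articles (an assumption: inspect
    your own file to find the real separator).
    """
    parts = re.split(delimiter_pattern, text, flags=re.MULTILINE)
    return [part.strip() for part in parts if part.strip()]


def term_frequencies(text):
    """Count lowercase word tokens in text, skipping stop words."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(token for token in tokens if token not in STOP_WORDS)


if __name__ == "__main__":
    # Tiny made-up sample standing in for a downloaded batch file.
    sample = (
        "=====\n"
        "The library licenses data.\n"
        "=====\n"
        "Data mining needs a license.\n"
    )
    articles = split_articles(sample, r"^=====$")
    totals = Counter()
    for article in articles:
        totals.update(term_frequencies(article))
    print(len(articles), "articles")
    print(totals.most_common(3))
```

To run this on a real download, read the file with `open(path, encoding="utf-8").read()` and pass the separator pattern that matches your file. Keeping the separator as a parameter avoids hard-coding a format that may differ between vendors or change over time.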