Most of the libraries' databases do not allow text or data mining (TDM) because of license agreements. We will continue to work with database vendors to include TDM in future license agreements. The resources listed here are the current exceptions. If you do not see a resource listed here, please contact us and we can investigate further.
|Database/Vendor||Fee?||Details||Help/Guides||Examples of text/data mining research from these databases|
|Adam Matthew||free/no fee||
Contact USC Libraries to initiate the process.
All Adam Matthew databases, which digitize unique primary source collections, are available for mining.
|Early English Books Online (EEBO)||free (public domain)||
Early English Books Online - Text Creation Partnership - The Text Creation Partnership creates standardized, accurate XML/SGML encoded electronic text editions of early print books.
Phase I content (25,000 titles) is freely available and searchable. USC does not have full-text access to Phase II content because we are not a partner library for that phase of transcription.
|Gale (only primary source collections are available for TDM)||Some free, but downloading large datasets costs $500-$3,500 per collection||
Gale Artemis: Primary Sources, which searches across 23 of our Gale primary source databases covering 1500-2012, has a Term Frequency search option and Term Clusters viewer (available from the articles results list).
To download large datasets, USC Libraries must request the data on your behalf from our Gale sales representative. Requests can take up to 3 weeks to process. Gale will send a hard drive containing the requested data to the libraries for you to use.
video: Using Term Frequency & Term Clusters [2:59]
|JSTOR||free||Data for Research (DfR) - Provides a self-service system for text mining. With a free DfR account you can download the metadata, word frequencies, citations, key terms, and N-grams of up to 1,000 documents. For larger datasets (>1,000 documents) or a type of data not available through the main site, contact JSTOR directly: email@example.com||Introduction to using DfR from DH @ Washington and Lee University||Gender composition of scholarly publications (1665 - 2011)|
|LexisNexis||free||Does not "officially" support or provide data/text mining options. However, since text files can be downloaded, TDM is possible. You can batch download up to 500 articles at a time in one text file.|
|NextBio||free (via subscription)|
|Oxford English Dictionary (OED)||free||Oxford University Press grants research access to the Oxford English Corpus for academic projects that can demonstrate a strong practical need for this data. To apply for research access to the Corpus, fill out and email this application form.||The Oxford English Corpus Sketch Engine Documentation|
|Oxford University Press (Oxford Scholarship Online)||free||
Researchers are not required to request permission for non-commercial text mining of OUP content. However, OUP offers a consultation service with a technical project manager to assist in planning your TDM project, including how to avoid triggering the technical safeguards OUP has in place to protect the stability and security of its websites.
To request a consultant for your TDM project, please e-mail Data.Mining@oup.com
|ProQuest||varies (free for select newspapers; otherwise by arrangement)||
You can contact ProQuest directly to negotiate arrangements for text mining their content.
ProQuest does allow free text mining for the newspapers to which USC Libraries have purchased perpetual access licenses. Those newspapers are: Los Angeles Times (1881-1931), Los Angeles Sentinel (1934-2005), and New York Times (1851-1934). USC Libraries will have to request this data on your behalf.
ProQuest offers two methods of data delivery:
|Robots Reading Vogue|
|Elsevier (ScienceDirect)||free (included in subscription)||You can text mine all subscribed content as long as it is for non-commercial purposes. You do this via Elsevier's ScienceDirect APIs, which require registration before use.||Contact Elsevier directly|
|SpringerLink||free (included in subscription)|
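
Several of the sources above, such as LexisNexis batch downloads, deliver plain text that you analyze yourself. The following is a minimal sketch, not a vendor-provided tool, of counting term frequencies and N-grams in such text with the Python standard library. The function names, the tokenization rule, and the `min_length` cutoff are illustrative assumptions; real downloads usually also contain headers and metadata that should be stripped first.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequencies(text, min_length=3):
    """Count tokens that are at least min_length characters long."""
    return Counter(t for t in tokenize(text) if len(t) >= min_length)


def ngram_frequencies(text, n=2):
    """Count runs of n consecutive tokens (N-grams)."""
    tokens = tokenize(text)
    return Counter(zip(*(tokens[i:] for i in range(n))))


if __name__ == "__main__":
    # Stand-in for the contents of a downloaded text file (hypothetical).
    sample = "The mining of text. Text mining and data mining."
    for term, count in term_frequencies(sample).most_common(2):
        print(term, count)
    for gram, count in ngram_frequencies(sample, n=2).most_common(1):
        print(" ".join(gram), count)
```

To run this on a downloaded file, read its contents into a string (for example with `open(path, encoding="utf-8").read()`, where `path` is your own file) and pass that string to either function.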