Digital Humanities (DH) data are not the objects in themselves, but a digital representation of those objects, or of structured information related to them, which is created and analyzed by computational methods. Most digital projects rely on the synthesis, analysis, and visualization of data. Broadly defined, data are any information collected or created in order to answer a research question, and may include the primary objects of study, such as texts, paintings, documentary sources, surveys, and secondary literature. For large projects, data can even be crowdsourced from the general public.
In DH research, data are often valuable outputs in their own right, taking the form of digital resources such as encoded texts, databases, images of artifacts, and digital collections.
In DH projects, "data management" refers to the systematic process of organizing, storing, archiving, preserving, and sharing research data collected through digital technologies. It includes planning how data will be handled at every stage, from collection through analysis to dissemination, so that the data remain accessible and usable for future research. This sequence of stages is often called the research data lifecycle.
Key aspects of data management in DH projects:
Identifying relevant data sources, whether digitized texts, images, audio recordings, or other digital artifacts, and implementing appropriate methods for data capture.
Cleaning up messy data by correcting errors, standardizing formats, and applying consistent metadata to ensure data quality and interoperability.
Selecting appropriate platforms or repositories to securely store digital data, considering file formats, version control, and long-term preservation strategies.
Developing detailed descriptive information about the data (e.g., origin, date, content, context) to facilitate discoverability and understanding.
Utilizing software applications like OpenRefine, text analysis tools, and data visualization platforms to analyze and interpret the data.
Making research data accessible to the wider community through open repositories, data portals, or published digital projects.
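The cleaning step described above (correcting errors, standardizing formats, and removing duplicates) can be illustrated with a short Python sketch. This is only one possible approach, not a substitute for a tool like OpenRefine. The record fields (`title`, `date`), the list of accepted date layouts, and the sample records are all assumptions made up for this illustration.

```python
import re
from datetime import datetime

# Hypothetical date layouts we assume appear in the raw records.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y.%m.%d")


def normalize_whitespace(value: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def normalize_date(value: str) -> str:
    """Return an ISO 8601 date (YYYY-MM-DD) when a known layout matches."""
    cleaned = normalize_whitespace(value)
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    # Unrecognized values are kept unchanged so a person can review them.
    return cleaned


def clean_records(records):
    """Standardize each record and drop duplicates that differ only in form."""
    seen = set()
    cleaned = []
    for record in records:
        row = {
            "title": normalize_whitespace(record["title"]),
            "date": normalize_date(record["date"]),
        }
        key = (row["title"].lower(), row["date"])
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned


# Invented sample data: two spellings of one item plus an uncertain date.
raw = [
    {"title": "  Letter to  the Editor ", "date": "04/03/1865"},
    {"title": "Letter to the Editor", "date": "1865.03.04"},
    {"title": "Field notes", "date": "circa 1870"},
]
cleaned = clean_records(raw)
```

Note that the sketch deliberately leaves values such as "circa 1870" untouched: in humanities data, uncertain dates are often meaningful and should be reviewed rather than forced into a format.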
Examples of Digital Humanities projects that rely heavily on data management:
Building and analyzing large digital collections of texts to study language patterns and themes.
Geo-referencing and visualizing historical data on maps to explore spatial relationships.
Digitizing and providing online access to collections like photographs, manuscripts, and audio recordings.
Collecting and analyzing large volumes of social media data to understand public discourse and trends.
Challenges in Digital Humanities Data Management:
Important Considerations:
Creating a detailed plan outlining data collection, storage, access, and preservation strategies, often required by funding agencies.
Leveraging expertise from library staff and data scientists to navigate complex data management issues.
Utilizing open data formats and metadata standards to enable data sharing and reusability across different projects.
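As a rough illustration of the last point, descriptive metadata can be stored in plain, open formats such as CSV and JSON, which most tools can read. The sketch below borrows a handful of element names from the Dublin Core metadata standard; the choice of fields, the identifiers, and the sample values are assumptions invented for this example, and a real project should follow the schema its repository requires.

```python
import csv
import io
import json

# Element names borrowed from Dublin Core; this subset is an assumption.
FIELDS = ["identifier", "title", "creator", "date", "format", "rights"]

# Invented sample records, for illustration only.
records = [
    {
        "identifier": "letters-0001",
        "title": "Letter to the Editor",
        "creator": "Unknown",
        "date": "1865-03-04",
        "format": "image/tiff",
        "rights": "Public domain",
    },
    {
        "identifier": "letters-0002",
        "title": "Field notes",
        "creator": "Unknown",
        "date": "1870",
        "format": "image/tiff",
        "rights": "Public domain",
    },
]


def to_csv(rows):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize the same records as human-readable JSON text."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


csv_text = to_csv(records)
json_text = to_json(records)
```

Keeping one list of field names and generating both formats from it helps the two files stay consistent as the collection grows.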
Tools that will allow you to work easily with collections of data:
Bulk Rename Utility: Simplifies batch file renaming for Windows users, making it easy to establish and follow naming conventions.
Resource: ‘How to Use Bulk Rename Utility’ (video - 5.59 mins.)
DMPTool: Helps create a Data Management Plan, which is often required for DH projects that generate data.
FromThePage: Transcription tool that allows users to crowdsource transcription, or to collaborate with a restricted group of individuals or volunteers, to transcribe, index, and describe historic documents.
Resource: FromThePage "How-To"
OpenRefine: OpenRefine is a powerful free, open source tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
Resource: OpenRefine User Manual
Tropy: Allows users to organize, annotate, tag, search visually, and export collections for research.
Resource: Introduction to Tropy (video - 4.12 mins.)
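For readers who are not on Windows, or who prefer scripting, the idea of a file naming convention mentioned for the renaming tool above can be sketched in a few lines of Python. The convention used here (`collection_date_sequence.extension`) and the helper names `slugify`, `build_name`, and `plan_renames` are hypothetical choices for this illustration, not a standard. The sketch only plans the new names and does not rename any files on disk.

```python
import re
from pathlib import PurePath


def slugify(text: str) -> str:
    """Lowercase the text and replace anything but letters and digits with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return slug.strip("-")


def build_name(collection: str, iso_date: str, sequence: int, extension: str) -> str:
    """Build a name such as 'civil-war-letters_1865-03-04_0001.tif'."""
    return f"{slugify(collection)}_{iso_date}_{sequence:04d}{extension.lower()}"


def plan_renames(filenames, collection, iso_date):
    """Map each old filename to its new name without touching the file system."""
    plan = {}
    for number, old in enumerate(sorted(filenames), start=1):
        plan[old] = build_name(collection, iso_date, number, PurePath(old).suffix)
    return plan


# Invented sample filenames, for illustration only.
plan = plan_renames(["Scan 2.TIF", "Scan 1.TIF"], "Civil War Letters", "1865-03-04")
```

Reviewing the planned mapping before applying it (for example with `os.rename`) makes it easier to catch mistakes while the original names are still intact.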