Creating and Developing a Digital Humanities Project - From Inception to Implementation and Dissemination: GATHERING YOUR SOURCES

An Essential Step by Step Approach: From Planning to Completing and Disseminating Your Digital Humanities Project.

FROM SOURCE TO DATA

Digital humanities research can be based on sources in many different formats. The most basic form is a digital copy of the original, produced by scanning or photographing it. To analyse textual sources, the digital images of those sources must be converted to computer-readable text.

Many older printed documents and most handwritten manuscripts have to be transcribed manually by the project's author (see below: Transcription of Text). Transcription can also be done collaboratively online or by means of crowdsourcing; both methods enable groups of students and/or scholars to work together on the transcription of a single (larger) document or a collection of documents. In a growing number of larger transcription projects (usually conducted by academic departments, libraries or digital archives), participation is not restricted to the research group: all interested individuals are invited to take part.

Before digital texts found online can be used for computer-assisted text analysis, the project's author may have to prepare them. Digitized online texts are often fragmented, and in some cases they contain HTML tags or bits of JavaScript. Furthermore, it is sometimes necessary to convert a file to plain text (a .txt file) because the chosen analysis program cannot handle the original format.
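The clean-up step described above can be sketched in Python using only the standard library. This is a minimal illustration, not a complete solution: the names `TextExtractor`, `html_to_text` and the `BLOCK_TAGS` set are assumptions made for this example, and real web pages may need more careful handling (navigation menus, footnotes, encoding problems).

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}
    # Assumed set of tags that should separate words in the output.
    BLOCK_TAGS = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style element.
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML fragment with tags and scripts removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace (including the inserted newlines) to one space.
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


sample = "<p>Call me <b>Ishmael</b>.</p><script>var x = 1;</script><p>Some years ago</p>"
print(html_to_text(sample))
```

The resulting string can then be saved as a UTF-8 encoded .txt file, for example with `pathlib.Path("output.txt").write_text(text, encoding="utf-8")`, where the file name is a placeholder.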

Digitising, preparing and processing sources for analysis are part of what is often called the research data lifecycle. The encompassing set of activities is called data management or data curation. This concerns the overall organisation of the data, including aspects like storage, archiving and preservation.

See Companion Guide: Digital Humanities Projects - Creating and Mining Corpora

TRANSCRIPTION OF TEXT

To analyze textual sources by computer, the digital images of those sources must be converted to computer-readable text.

For printed documents that are relatively recent, this can often be achieved by optical character recognition (OCR).

For printed historic documents, OCR often does not produce satisfactory results: OCR errors may occur because of damaged material, irregular layout, and the use of historic fonts. In addition, historical language usually contains many spelling and orthographic variants. For handwritten documents (like historical manuscripts, letters and children's writing), OCR is, in most instances, quite problematic. Consequently, many older printed documents and most handwritten manuscripts have to be transcribed manually.

Usually, this is done in the form of a so-called diplomatic transcription, which follows the original document as closely as possible (by recording only the characters as they appear on the document, with minimal or no editorial intervention or interpretation). In a normalized (also called regularized) transcription, the original text is cleaned up to make it more easily readable, e.g. by using modern orthography. Because a normalized transcription can be made on the basis of a diplomatic transcription, but not vice versa, diplomatic transcriptions are often preferred.

Consequently, decisions have to be made about how to deal with certain aspects of the original text: page layout (including line length); typeface (capitalization, use of bold and italics, underline, strikeout, accent markers); punctuation (or lack of it); illegible text; older spelling and misspelling; archaic abbreviations; handwritten notes in printed text; and images and drawings in the text. All transcription decisions that are made in this respect must be well documented.
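The one-way relationship between diplomatic and normalized transcriptions can be illustrated with a short Python sketch. The rules in `CHAR_RULES` and `WORD_RULES` are hypothetical examples invented for this illustration, not an established normalization scheme; a real project would document its own rules as part of its transcription decisions.

```python
import re

# Hypothetical character-level rule: long s becomes a modern s.
CHAR_RULES = {"ſ": "s"}

# Hypothetical word-level rules: older spellings mapped to modern forms.
WORD_RULES = {"vnto": "unto", "haue": "have", "loue": "love"}


def normalize(diplomatic):
    """Derive a normalized reading from a diplomatic transcription."""
    text = diplomatic
    for old, new in CHAR_RULES.items():
        text = text.replace(old, new)

    def modernize(match):
        word = match.group(0)
        modern = WORD_RULES.get(word.lower())
        if modern is None:
            return word
        # Preserve an initial capital from the source.
        return modern.capitalize() if word[0].isupper() else modern

    return re.sub(r"\w+", modernize, text)


print(normalize("I haue ſent it vnto you"))
# Two different source spellings collapse into the same normalized form,
# so the original spelling cannot be recovered from the normalized text.
print(normalize("loue") == normalize("love"))
```

Because different original spellings map to the same modern form, the normalized text alone no longer records what the document actually says, which is why the diplomatic transcription is kept as the starting point.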

 

TRANSCRIPTION OF SPEECH

Recorded spoken language needs to be transcribed before it can be subjected to further analysis. It is usually not possible to automate this process with acceptable results using speech recognition, so it must typically be done by hand, which involves two main activities: listening and typing.

If the recording is digital, the following two tools can be used to support this process:

  • TranscriberAG is an elaborate (free) program designed to assist the manual annotation of speech signals. It provides a user-friendly graphical user interface (GUI) for segmenting long speech recordings, transcribing them, and labelling speech turns, topic changes and acoustic conditions. The program can also process video files.
    Although TranscriberAG was developed for (linguistic) research on spoken language, these functions can also be useful for other applications. It uses the Annotation Graph format as its native format but can read a number of other annotation formats.

  • Transana is an advanced program that can be used to manage and transcribe spoken language data, including video files. Transana also provides several tools for analyzing the transcribed materials.

 

DIGITAL TEXT ANNOTATION

Digital annotation has several benefits:

Documentation: Digital annotation allows the process of analysis to be optimally tracked and documented. This also makes it easier to verify reported research results, interpretations and conclusions.

 

Clarity: Digital annotations are generally clearer than manually written annotations in a printed text, and they can be changed in an easier and neater manner.

 

Shareability: It is easier and safer to share digitally annotated files with others (colleagues, fellow students, teachers) than annotated printed material.

 

Collaboration: Various annotation systems offer the possibility of joint (online) text annotation.

 

Analysis: Using specially developed software, it is possible to process digitally-applied annotations to come to a better understanding of the studied text(s). For example, you can rename and/or group annotations; produce overviews of annotations with an extra option to link to the annotated passages by clicking on a tag; search for particular annotations; create links between certain passages; and create visual representations of the relationship(s) between annotations as networks.
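Some of the operations listed under Analysis (tallying, grouping, renaming and searching annotations) can be sketched in a few lines of Python. The data and function names below are hypothetical and chosen only for illustration; dedicated annotation software stores and processes annotations in its own formats.

```python
from collections import Counter, defaultdict

# Hypothetical annotations: (tag, annotated passage) pairs.
annotations = [
    ("metaphor", "a sea of troubles"),
    ("irony", "an honourable man"),
    ("metaphor", "all the world's a stage"),
]


def tally(items):
    """Count how often each tag was applied."""
    return Counter(tag for tag, _ in items)


def passages_by_tag(items):
    """Produce an overview that links each tag to its annotated passages."""
    index = defaultdict(list)
    for tag, passage in items:
        index[tag].append(passage)
    return dict(index)


def rename_tag(items, old, new):
    """Return a copy of the annotations with one tag renamed."""
    return [(new if tag == old else tag, passage) for tag, passage in items]


def search(items, term):
    """Find annotations whose passage contains the search term."""
    return [(tag, passage) for tag, passage in items if term.lower() in passage.lower()]


print(tally(annotations))
print(passages_by_tag(annotations)["metaphor"])
print(search(rename_tag(annotations, "irony", "sarcasm"), "honourable"))
```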

Tools

Simple Annotation: Word documents offer simple annotation options, such as the underline, highlighter and insert comment features. There are various annotation systems for files in PDF format. The more recent versions of Adobe Reader have features similar to those of Word. However, there are also specific PDF annotators, such as PDF Annotator (available on PC, laptop or tablet) and Annotate (for tablets). Most modern e-readers also have simple annotation functionality.

Annotation and analysis: Programs that also allow you to add labels to the text and to further process and analyse them fall into the category of CAQDAS (Computer Assisted Qualitative Data AnalysiS). They have been developed for various forms of qualitative analysis. Examples of these programs are:

MAXQDA is a software program that helps users analyze qualitative data. It can import many types of data, including text, interviews, focus groups, PDFs, web pages, spreadsheets, articles, e-books, bibliographic data, videos, audio files, and social media data.

ATLAS.ti is a program that supports qualitative analysis of textual, graphical and audiovisual data. It can be used for both systematic and creative analysis of unstructured data. You can use the program to annotate your research material in different ways and add comments (called memos) to all kinds of objects (such as documents, quotes or annotations). It is also possible to link various parts of the material by using hyperlinks. Annotations, comments and hyperlinks can be viewed, searched and combined in several ways. Annotations can be tallied and exported in SPSS format for further statistical analysis.

NVivo is a program for qualitative data analysis. Its transcription add-on, NVivo Transcription, is integrated with NVivo and produces automated transcriptions, which can replace much of the time-consuming manual transcription process. According to the vendor, these transcriptions offer 90% accuracy for good-quality audio and video recordings. A built-in editor lets you make corrections, tag speakers, and adjust formatting while listening to the synchronized audio.

Online annotation
There is a growing number of online annotation environments that also allow joint text annotation. Many of these environments still have limited functionality (simple annotation). eMargin is an online collaborative annotation tool. You can highlight, color-code, write notes and assign tags to individual words or passages of a text. These annotations can be shared among groups, generating discussions and allowing analyses and interpretations to be combined.

See its User Guide.