
Inclusive and Responsible Dataset Usage

Questions to help you get started working with data in an inclusive and responsible manner.

When You Start

  • What is your research question, and what is the best dataset to help you answer it? Remember that the best fit might not be the most obvious or the most popular dataset; it could be one you haven’t heard of yet.
  • How does the resource contribute to your research agenda or your disciplinary area(s)? How does it help answer your research question?

While Choosing a Dataset or Collecting Data

  • What kinds of content, files, and metadata fields does the dataset or collection contain? In what formats are they offered (e.g., text that can be mined, images that can be sorted, metadata fields that are consistent)?
  • How is the data organized? Is it available as structured or unstructured data? Is it structured consistently and concretely, suitable for use in computation? What kind of pre-processing will it need? 
  • How does the dataset account for its origins and practices? Does it contain descriptions of provenance, known absences, modifications? Is there a data card or data sheet to describe this information in detail?
  • How can the dataset be accessed? Can discrete data fields and/or files be pulled from the collection as a group? For example, is there an API (application programming interface) for accessing the data programmatically? Can it be bulk downloaded or crawled from static directories?
  • Is it viable to use the information as a dataset, either under the terms of a license agreement or without violating any licenses, labels, or data stewardship policies?
  • How are diverse communities and forms of knowledge represented within this dataset? 
  • How does the dataset account for the ways in which it is complete or incomplete?
  • How does the dataset’s curatorial/authorial perspective contribute to a wide range of subject positions? 
  • How does its content contribute to a wide range of subject positions? 
  • How much transparency is there in the provenance and source of the content?
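
The questions above about structure, consistency, and pre-processing can often be explored with a quick profile of the metadata before committing to a dataset. The following is a minimal sketch in Python, assuming the metadata is available as a CSV export; the field names and sample records are hypothetical and stand in for a real collection. It reports, for each field, the share of records that actually contain a value.

```python
import csv
import io
from collections import Counter


def profile_fields(rows):
    """Return, per field, the share of records with a non-empty value."""
    rows = list(rows)
    filled = Counter()
    fields = []
    for row in rows:
        for name, value in row.items():
            if name is None:
                # csv.DictReader stores surplus columns under the key None; skip them.
                continue
            if name not in fields:
                fields.append(name)
            if value is not None and str(value).strip():
                filled[name] += 1
    total = len(rows)
    return {name: (filled[name] / total if total else 0.0) for name in fields}


# Hypothetical sample standing in for a real metadata export.
SAMPLE_CSV = """id,title,creator,date
1,Harbor at dawn,A. Rivera,1921
2,Untitled,,1934
3,Market street,,
"""

report = profile_fields(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for name, share in report.items():
    print(f"{name}: {share:.0%} filled")
```

A sparsely filled field is not necessarily a flaw, but it is worth checking against the dataset's documentation: gaps may reflect known absences, and they affect how much pre-processing the data will need.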

As You Release Your Work

  • How will others access your research results and the originating dataset? Is the information equally available to all, or behind a login? 
  • Who will maintain the resource, and how will it be funded? Who will maintain it if that person is no longer available? 
  • What formats will it be shared in?
  • How can it meet the needs of a variety of users, who may be accessing it in different modes?