Managing Research Data Tutorial

This tutorial covers the basics of data sharing, data management plans, and how to store and share your research data.

Data Organization

Think about the end goal of your research proposal and develop your organization plan around that goal. You may need to modify the plan as the research progresses, but it helps to have a system in place from the start. The key is to find a system that works for you and to follow it from the beginning of the project. While this may seem like a simple task, digital data can quickly get out of hand when organizational procedures are not followed.

Keep the following in mind when setting up your file organization structure:

File Version Control

To keep track of versions of documents and datasets:

Use naming conventions.

Always record every change to a file, no matter how small.

Discard obsolete versions after backups have been made.
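The version control points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `_vNN` suffix, the `CHANGELOG.txt` file name, and both function names are hypothetical choices, not conventions prescribed by this tutorial.

```python
import re
from datetime import date
from pathlib import Path


def next_version_name(folder: Path, stem: str, suffix: str) -> str:
    """Return the next unused name of the form <stem>_vNN<suffix> in folder.

    The _vNN pattern is an assumed convention for this sketch.
    """
    pattern = re.compile(rf"^{re.escape(stem)}_v(\d+){re.escape(suffix)}$")
    versions = [
        int(match.group(1))
        for path in folder.iterdir()
        if (match := pattern.match(path.name))
    ]
    # Start at v01 when no earlier version exists.
    return f"{stem}_v{max(versions, default=0) + 1:02d}{suffix}"


def log_change(folder: Path, file_name: str, note: str) -> None:
    """Append one dated, tab-separated line per change to CHANGELOG.txt."""
    with open(folder / "CHANGELOG.txt", "a", encoding="utf-8") as log:
        log.write(f"{date.today().isoformat()}\t{file_name}\t{note}\n")
```

Calling `next_version_name` before saving and `log_change` after saving keeps the version number and the record of changes in step, even for small edits.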

Directory Structure Naming Conventions

Directory top-level folders should include the project title, a unique identifier, and the date (year).

The substructure should have a clear, consistent naming convention, e.g., uniform conventions for labeling each run of an experiment, each version of a dataset, and/or each person in the group.
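As a rough sketch of the directory conventions above, the following Python builds a top-level folder name from the project title, a unique identifier, and the year, then creates uniformly named subfolders. The function names, the underscore separator, and the example identifier are assumptions made for illustration.

```python
from pathlib import Path


def top_level_name(title: str, identifier: str, year: int) -> str:
    """Combine project title, unique identifier, and year into one folder name."""
    # Join the title words without spaces, capitalizing the first letter of each.
    clean_title = "".join(word[:1].upper() + word[1:] for word in title.split())
    return f"{clean_title}_{identifier}_{year}"


def create_project_tree(root: Path, title: str, identifier: str, year: int,
                        subfolders: list[str]) -> Path:
    """Create the top-level project folder and its subfolders; return the project path."""
    project = root / top_level_name(title, identifier, year)
    for name in subfolders:
        # exist_ok=True makes the call safe to repeat on an existing project.
        (project / name).mkdir(parents=True, exist_ok=True)
    return project
```

For example, `create_project_tree(Path("."), "soil moisture survey", "P0042", 2024, ["run_01", "run_02"])` would create `SoilMoistureSurvey_P0042_2024` with one subfolder per experimental run.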

File Naming Conventions

Identify the activity or project in the file name, as well as the date the dataset was created.

Avoid using special characters (e.g., $ % & # @ /), as these can easily be corrupted or misinterpreted by operating systems or software.

Instead of spaces, use CamelCase or Pot_hole_case, as some software and operating systems have trouble processing spaces.
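The file naming points above could be applied with a small helper like the one below. It is a minimal sketch: the `safe_file_name` function, the order of the name parts, and the `YYYYMMDD` date format are assumptions, not requirements stated in this tutorial.

```python
import re
from datetime import date


def safe_file_name(project: str, activity: str, created: date, extension: str) -> str:
    """Build <project>_<activity>_<YYYYMMDD>.<ext> from letters, digits, and underscores."""

    def clean(text: str) -> str:
        # Replace each run of spaces or special characters with one underscore.
        # Note: this also replaces accented and other non-ASCII letters.
        return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")

    parts = [clean(project), clean(activity), created.strftime("%Y%m%d")]
    return "_".join(parts) + "." + extension.lstrip(".")
```

With these assumptions, `safe_file_name("Soil Survey", "pH & moisture @ site #3", date(2024, 3, 5), ".csv")` gives `Soil_Survey_pH_moisture_site_3_20240305.csv`.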

File Renaming

Renaming files individually can be tedious and can introduce errors in naming conventions, so it is best to use a free batch renaming tool instead.

For macOS users, bulk renaming can be completed in Finder without additional software.
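If no batch renaming tool is at hand, a short script can do the same job. The sketch below replaces spaces in file names with underscores; the two-step design (plan first, then apply) and the function names are illustrative assumptions, and it is worth testing on a copy of your folder first.

```python
from pathlib import Path


def plan_renames(folder: Path) -> dict[Path, Path]:
    """Map each file whose name contains whitespace to an underscore-separated name."""
    plan = {}
    for path in sorted(folder.iterdir()):
        new_name = "_".join(path.name.split())
        if path.is_file() and new_name != path.name:
            plan[path] = path.with_name(new_name)
    return plan


def apply_renames(plan: dict[Path, Path]) -> None:
    """Rename files according to the plan, refusing to overwrite existing files."""
    for old, new in plan.items():
        if new.exists():
            raise FileExistsError(f"{new} already exists")
        old.rename(new)
```

Reviewing the dictionary returned by `plan_renames` before calling `apply_renames` gives you a chance to catch mistakes before any file is changed.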

File Naming Conventions for Specific Disciplines

For example:

DOE's Atmospheric Radiation Measurement (ARM) program

The Open Biological and Biomedical Ontologies

GIS Datasets

Unique Identifiers

Dataset identifiers allow your data to be referenced and shared. Data identifiers must be globally unique and persistent: they must not be repeated elsewhere and they must not change over time.

Identifier schemes:

Data Documentation

Data Dictionary

A Data Dictionary is a collection of names, definitions, and attributes about data elements that are being used or captured in a database, information system, or part of a research project. It describes the meanings and purposes of data elements within the context of a project, and provides guidance on interpretation, accepted meanings, and representation. A Data Dictionary also provides metadata about data elements. The metadata included in a Data Dictionary can assist in defining the scope and characteristics of data elements, as well as the rules for their usage and application.

Why Use a Data Dictionary?

Data Dictionaries are useful for a number of reasons. In short, they:

  • Assist in avoiding data inconsistencies across a project
  • Help define conventions that are to be used across a project
  • Provide consistency in the collection and use of data across multiple members of a research team
  • Make data easier to analyze
  • Enforce the use of Data Standards

From UC Merced Library
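To make the idea of a data dictionary concrete, here is one possible sketch in Python. The element names (`site_id`, `temp_c`), the attributes recorded for each element, and the `check_record` helper are all invented for illustration; a real project would define its own elements and rules.

```python
# A tiny, hypothetical data dictionary: each data element has a definition,
# an expected type, and (optionally) a set of accepted codes.
DATA_DICTIONARY = {
    "site_id": {
        "definition": "Code for the sampling site",
        "type": str,
        "allowed": {"A", "B", "C"},
    },
    "temp_c": {
        "definition": "Air temperature in degrees Celsius",
        "type": float,
        "allowed": None,  # any float is accepted
    },
}


def check_record(record: dict, dictionary: dict) -> list[str]:
    """Return a list of problems found when comparing a record to the dictionary."""
    problems = []
    for name in record:
        if name not in dictionary:
            problems.append(f"{name}: not defined in the data dictionary")
    for name, rule in dictionary.items():
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        if not isinstance(value, rule["type"]):
            problems.append(f"{name}: expected {rule['type'].__name__}")
        elif rule["allowed"] is not None and value not in rule["allowed"]:
            problems.append(f"{name}: value {value!r} is not an accepted code")
    return problems
```

Checking each incoming record this way is one means of catching inconsistencies early, before they spread across files collected by different team members.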

README Files

Provide a clear and concise description of all relevant details about data collection, processing, and analysis in a README file. This will help others interpret and reanalyze your dataset.

README files are created for a variety of reasons:

  • to document changes to files or file names within a folder
  • to explain file naming conventions, practices, etc. "in general" for future reference
  • to specifically accompany files/data being deposited in a repository

It is best practice to create a README file for each dataset regardless of whether it is being deposited in a repository because the document might become necessary at a later point.

  • A good data practice is to store a readme.txt with each distinct dataset that explains your file naming convention along with any abbreviations or codes you have used.
  • Write your README file as a plain text file, and avoid proprietary formats, such as Microsoft Word, whenever possible. However, PDF is acceptable when formatting is important.
  • If you deposit your final datasets in a data repository, the repository may ask you to provide a README file with additional details about your datasets, such as methodological information or sharing/access information. Creating a README file at the beginning of your research process, and updating it consistently throughout your research, will help you to compile a final README file when your data is ready for deposit.

From Longwood Research Data Management
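A plain-text README of the kind described above can be generated and refreshed with a short script, which makes it easier to keep up to date throughout a project. The section headings, function names, and parameters below are assumptions chosen for this sketch; a repository may ask for a different layout.

```python
from datetime import date
from pathlib import Path


def build_readme(title: str, naming_convention: str, codes: dict[str, str],
                 changes: list[str]) -> str:
    """Assemble the text of a plain-text README for one dataset."""
    lines = [
        f"Dataset: {title}",
        f"README last updated: {date.today().isoformat()}",
        "",
        "File naming convention",
        naming_convention,
        "",
        "Abbreviations and codes",
    ]
    lines += [f"{code} = {meaning}" for code, meaning in codes.items()]
    lines += ["", "Changes to files or file names"]
    lines += changes or ["(none recorded)"]
    return "\n".join(lines) + "\n"


def write_readme(folder: Path, text: str) -> Path:
    """Save the README as readme.txt next to the dataset and return its path."""
    path = folder / "readme.txt"
    path.write_text(text, encoding="utf-8")
    return path
```

Rerunning `build_readme` with an updated list of changes and writing the result again keeps a single, current README alongside each dataset.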

Metadata

"Metadata, the information we create, store, and share to describe things, allows us to interact with these things to obtain the knowledge we need. The classic definition is literal, based on the etymology of the word itself—metadata is “data about data.”"

From Understanding Metadata: What is Metadata, and What is it For? A Primer

Appropriate metadata allows you to understand, use, and share your own data now and in the future. It also facilitates long-term archival preservation of the data so that other researchers can discover, access, use, repurpose, and cite your data in the future.
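One simple way to keep metadata with a dataset is a small "sidecar" file saved next to the data file. The sketch below writes such a file as JSON. The field names, the required-field list, and the `.metadata.json` suffix are illustrative assumptions only and do not follow any particular metadata standard; your discipline or repository may require specific fields.

```python
import json
from pathlib import Path

# Hypothetical minimum set of descriptive fields for this sketch.
REQUIRED_FIELDS = ("title", "creator", "date_created", "description")


def write_metadata(data_file: Path, metadata: dict) -> Path:
    """Write metadata as JSON beside data_file and return the sidecar path."""
    missing = [field for field in REQUIRED_FIELDS if not metadata.get(field)]
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return sidecar
```

Refusing to write the file when a field is missing is a design choice for this example: it forces the description to be filled in at the time the data is saved rather than reconstructed later.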

For more information about metadata, including discipline-specific standards, visit the link below.