Research Data Management

Resource for best practices in managing, storing, and sharing your research data

Data Organization

Think about the end goal of your research proposal and develop your organization plan around that goal. You may need to modify the plan as the research progresses, but it helps to have a system in place from the start. The key is to find a system that works for you and to follow it from the beginning of the project. This may seem like a simple task, but digital data can quickly get out of hand when organizational procedures are not followed.

There are several things to keep in mind when setting up your file organization structure:

File Version Control

To keep track of versions of documents and datasets:

Use naming conventions

Record every change to a file, no matter how small

Discard obsolete versions after backups have been made
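One way to apply these points is to put a version number in the file name and keep a one-line note for every change. The following is a minimal sketch, not a prescribed method: the `_vNN` suffix, the tab-separated log format, and the function names are assumptions made for illustration.

```python
import re
from datetime import date
from pathlib import Path

# Hypothetical convention: versions are marked with a "_vNN" suffix before the extension.
VERSION_RE = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)$")


def next_version_name(filename: str) -> str:
    """Return the file name with its _vNN suffix incremented, or _v01 added if absent."""
    path = Path(filename)
    match = VERSION_RE.match(path.stem)
    if match:
        stem = match.group("stem")
        number = int(match.group("num")) + 1
    else:
        stem = path.stem
        number = 1
    return f"{stem}_v{number:02d}{path.suffix}"


def change_log_entry(filename: str, note: str, day: date) -> str:
    """Format one change-log line: ISO date, file name, and a short description."""
    return f"{day.isoformat()}\t{filename}\t{note}"
```

For example, `next_version_name("SoilSurvey_v03.csv")` gives `SoilSurvey_v04.csv`, and each call to `change_log_entry` produces a line that can be appended to a plain-text log kept alongside the data.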

Directory Structure Naming Conventions

Top-level folders should include the project title, a unique identifier, and the date (year).

The substructure should have a clear, consistent naming convention, e.g., uniform conventions for labeling each run of an experiment, each version of a dataset, and/or each person in the group.
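The two guidelines above can be sketched in a few lines of Python. This is only an illustration under assumed conventions: the underscore separator, the zero-padded `runNNN` labels, and the function names are hypothetical choices, not requirements of this guide.

```python
from pathlib import Path


def project_folder_name(title: str, identifier: str, year: int) -> str:
    """Combine project title, unique identifier, and year into one folder name."""
    # Remove spaces from the title by capitalizing the first letter of each word.
    compact_title = "".join(word[:1].upper() + word[1:] for word in title.split())
    return f"{compact_title}_{identifier}_{year}"


def run_label(number: int) -> str:
    """Label experiment runs uniformly so they sort correctly (run001, run002, ...)."""
    return f"run{number:03d}"


def create_project_tree(root: Path, title: str, identifier: str, year: int, runs: int) -> Path:
    """Create the top-level project folder with one subfolder per run."""
    top = root / project_folder_name(title, identifier, year)
    for number in range(1, runs + 1):
        (top / run_label(number)).mkdir(parents=True, exist_ok=True)
    return top
```

Calling `create_project_tree(Path("."), "soil carbon survey", "P0042", 2024, runs=3)` would create `SoilCarbonSurvey_P0042_2024` with subfolders `run001` through `run003`; the title and identifier here are made-up values.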

File Naming Conventions

Identify the activity or project in the file name as well as the date the dataset was created

Avoid using special characters (e.g., $ % & # @), as these can easily be corrupted or misinterpreted by some operating systems and software.

Instead of using spaces, use CamelCase or Pot_hole_case
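As a rough sketch of how these conventions might be applied automatically, the functions below build a file name from an activity description and a creation date. The `YYYYMMDD` date format, the underscore between name and date, and the function names are assumptions for illustration. Note that this sketch also drops any non-ASCII letters, which may not suit every project.

```python
import re
from datetime import date


def _words(text: str) -> list:
    """Split text into words, discarding spaces and special characters."""
    return [word for word in re.split(r"[^A-Za-z0-9]+", text) if word]


def to_camel_case(text: str) -> str:
    """Join words with no separator, capitalizing the first letter of each."""
    return "".join(word[:1].upper() + word[1:] for word in _words(text))


def to_pothole_case(text: str) -> str:
    """Join words with underscores instead of spaces."""
    return "_".join(_words(text))


def dataset_file_name(activity: str, created: date, extension: str, style: str = "camel") -> str:
    """Build a file name that identifies the activity and the creation date."""
    label = to_camel_case(activity) if style == "camel" else to_pothole_case(activity)
    return f"{label}_{created.strftime('%Y%m%d')}.{extension.lstrip('.')}"
```

For instance, `dataset_file_name("stream temp & flow", date(2024, 3, 5), "csv")` gives `StreamTempFlow_20240305.csv`.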

File Renaming

Renaming files individually is tedious and can introduce errors into your naming conventions, so consider using a free batch renaming tool instead.
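A short script is another way to rename many files consistently. The sketch below is one possible approach rather than a recommended tool: it replaces one piece of text in every file name in a folder, refuses to proceed if two files would end up with the same name, and defaults to a dry run that only prints the planned changes. The function names are hypothetical.

```python
from pathlib import Path


def plan_renames(names, old: str, new: str) -> dict:
    """Map each file name containing `old` to its new name; refuse name collisions."""
    if not old:
        raise ValueError("the text to replace must not be empty")
    plan = {}
    for name in names:
        new_name = name.replace(old, new)
        if new_name != name:
            plan[name] = new_name
    targets = list(plan.values())
    # A target must not repeat, and must not match any name already in the folder.
    if len(set(targets)) != len(targets) or set(targets) & set(names):
        raise ValueError("rename would produce duplicate file names")
    return plan


def rename_in_folder(folder, old: str, new: str, dry_run: bool = True) -> dict:
    """Rename files in `folder`; with dry_run=True, only print what would change."""
    folder = Path(folder)
    names = sorted(path.name for path in folder.iterdir() if path.is_file())
    plan = plan_renames(names, old, new)
    for source, target in plan.items():
        if dry_run:
            print(f"{source} -> {target}")
        else:
            (folder / source).rename(folder / target)
    return plan
```

Running `rename_in_folder("data", " ", "_")` first, and reviewing the printed plan before repeating the call with `dry_run=False`, keeps a record of what will change and reduces the chance of an accidental overwrite. The folder name `data` is only an example.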

File Naming Conventions for Specific Disciplines

For example:

DOE's Atmospheric Radiation Measurement (ARM) program

The Open Biological and Biomedical Ontologies

GIS Datasets

Unique Identifiers

Dataset identifiers allow your data to be referenced and shared. Data identifiers must be globally unique and persistent: they must not be repeated elsewhere and they must not change over time.

Identifier schemes: