Research Data Management

Resource for best practices in managing, storing, and sharing your research data

Data Versioning

It can be both useful and necessary to retain different versions of datasets as they are transformed.  For example, data may need re-processing to include new calculations; errors may need to be corrected; or new data might need to be generated and added to the dataset.


Editing the original file risks irreversible loss of the raw data if an error is made while overwriting it. Instead, researchers can save a new version of the dataset each time changes are made.


To manage and keep track of older data versions, add a version number to the file name each time you create a new version.

For example, v2, v3, and so on:

  • 20240325_MultiAnalysis_Submission_v2.csv
  • HarrisStudy_Survey002_20220603_v3.1.csv
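The naming pattern above can also be generated by a script, so that version numbers are incremented consistently. The following is a minimal Python sketch, not a prescribed tool. It assumes file names end in "_v<number>" or "_v<number>.<number>" before the extension, as in the examples above, and it treats an unversioned file as version 1. The function name next_version_name and its minor parameter are hypothetical, introduced only for illustration.

```python
import re
from pathlib import Path

# Assumed convention: the name ends in "_v<major>" or "_v<major>.<minor>"
# just before the file extension, as in the examples above.
VERSION_SUFFIX = re.compile(r"_v(\d+)(?:\.(\d+))?$")


def next_version_name(filename: str, minor: bool = False) -> str:
    """Return the file name of the next version; no files are touched."""
    path = Path(filename)
    match = VERSION_SUFFIX.search(path.stem)
    if match is None:
        # Assumption: an unversioned file counts as v1, so the next is v2.
        return f"{path.stem}_v2{path.suffix}"
    major = int(match.group(1))
    minor_part = match.group(2)
    base = path.stem[: match.start()]
    if minor:
        # Minor revision: v3.1 -> v3.2, or v2 -> v2.1
        new_minor = int(minor_part) + 1 if minor_part is not None else 1
        return f"{base}_v{major}.{new_minor}{path.suffix}"
    # Major revision: v2 -> v3, and v3.1 -> v4
    return f"{base}_v{major + 1}{path.suffix}"


print(next_version_name("20240325_MultiAnalysis_Submission_v2.csv"))
print(next_version_name("HarrisStudy_Survey002_20220603_v3.1.csv", minor=True))
```

The function only computes a name. Copying the current file to the new name before editing it leaves the earlier version untouched.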

Data Versioning Control Tables

Creating a control table can keep track of different versions of your files and help you and your team document changes. A control table records which versions of a file were created, what changed, who made the change, and when. Consider creating a README file or spreadsheet that lists the data versions, and include it with your dataset prior to storage. See below for an example:

Version  Author  Change or Purpose                   Date
1.1      TCN     Corrected formula in column 7       20240427
1.2      RD      Amended references 3 and 4          20240430
1.3      JRW     Formatted results table             20240515
2.0      MJM     Added statistical analysis section  20240529
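If the control table is kept as a CSV file alongside the dataset, new entries can be appended by a short script rather than typed by hand. The following is a minimal Python sketch under that assumption. The function name log_version and the file name control_table.csv are hypothetical. The column names and the YYYYMMDD date format follow the example table above.

```python
import csv
from datetime import date
from pathlib import Path

# Column names taken from the example control table.
FIELDS = ["Version", "Author", "Change or Purpose", "Date"]


def log_version(table_path, version, author, change, when=None):
    """Append one row to the control table; write the header if the file is new."""
    table = Path(table_path)
    is_new = not table.exists() or table.stat().st_size == 0
    when = when or date.today()  # default to today's date
    with table.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(FIELDS)
        # Dates are stored as YYYYMMDD, matching the example table.
        writer.writerow([version, author, change, when.strftime("%Y%m%d")])


# Example call (hypothetical file name):
# log_version("control_table.csv", "2.0", "MJM",
#             "Added statistical analysis section", date(2024, 5, 29))
```

Opening the file in append mode means earlier entries are never rewritten, which suits a record that should only grow.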

Tips

  • Always record every change to a file, no matter how small.

  • Discard obsolete versions after backups have been made.

  • Always keep a "read-only" version of your raw, unprocessed dataset to protect against unintentional changes.
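
One way to keep a "read-only" copy of the raw dataset is to remove write permission from the file once it has been saved. The following is a minimal Python sketch of that idea. The function name make_read_only and the path in the example are hypothetical. File permissions guard against accidental edits only: they do not replace backups, and on Windows this call only sets the file's read-only flag.

```python
import stat
from pathlib import Path


def make_read_only(path):
    """Remove write permission for owner, group and others."""
    target = Path(path)
    mode = target.stat().st_mode
    # Clear the three write bits and leave all other permission bits as they are.
    target.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


# Example call (hypothetical path):
# make_read_only("raw/HarrisStudy_Survey002_20220603.csv")
```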