Research Data Management

Resource for best practices in managing, storing, and sharing your research data

Data Versioning

It can be both useful and necessary to retain different versions of datasets as they are transformed. For example, data may need re-processing to include new calculations; errors may need to be corrected; or new data might need to be generated and added to the dataset.

Instead of editing the original file, which risks irreversible loss of the raw data if an error is made while overwriting it, researchers can save a new version of the dataset each time changes are made.
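One way to follow this in practice is to copy the file to a new name before editing it. The sketch below is illustrative only: the function name `copy_to_new_version` is hypothetical, and it assumes the new version should never replace a file that already exists.

```python
import shutil
from pathlib import Path


def copy_to_new_version(source, destination):
    """Copy a dataset to a new file name and leave the source untouched.

    Hypothetical helper: it refuses to overwrite an existing destination,
    so an earlier version cannot be replaced by accident.
    """
    src = Path(source)
    dst = Path(destination)
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; choose a new version name")
    # copy2 copies the file contents and, where possible, its timestamps.
    shutil.copy2(src, dst)
    return dst
```

Edits are then made to the copy, while the earlier file stays exactly as it was.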

To manage and keep track of older data versions, it’s recommended to add a number to the file name for each version.

For example, v2, v3, etc.:

20240325_MultiAnalysis_Submission_v2.csv
HarrisStudy_Survey002_20220603_v3.1.csv
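If version numbers are added by script rather than by hand, the next file name can be derived from the current one. This is a minimal sketch, assuming names end in `_v<number>` or `_v<number>.<number>` followed by a file extension, as in the examples above. The function `next_version_name` and its `minor` option are hypothetical, and treating an unversioned name as needing `_v1` is an assumption, not an established rule.

```python
import re
from pathlib import Path

# Assumed naming pattern: "<base>_v<major>" or "<base>_v<major>.<minor>".
VERSION_PATTERN = re.compile(r"^(?P<base>.+)_v(?P<major>\d+)(?:\.(?P<minor>\d+))?$")


def next_version_name(filename, minor=False):
    """Return the file name for the next version (name only, no folder)."""
    path = Path(filename)
    match = VERSION_PATTERN.match(path.stem)
    if match is None:
        # Assumption: a name with no version suffix becomes version 1.
        return f"{path.stem}_v1{path.suffix}"
    major = int(match["major"])
    if minor:
        # Increase the number after the dot, starting from 0 if it is absent.
        label = f"{major}.{int(match['minor'] or 0) + 1}"
    else:
        # Increase the whole-number version and drop any minor part.
        label = str(major + 1)
    return f"{match['base']}_v{label}{path.suffix}"
```

For instance, `next_version_name("20240325_MultiAnalysis_Submission_v2.csv")` gives `20240325_MultiAnalysis_Submission_v3.csv`, and `next_version_name("HarrisStudy_Survey002_20220603_v3.1.csv", minor=True)` gives `HarrisStudy_Survey002_20220603_v3.2.csv`.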

Data Versioning Control Tables

Creating a control table can keep track of different versions of your files and help you and your team document changes. A control table describes which versions of a file were created, what the change was, who made it, and when. Consider creating a README file or spreadsheet that lists the data versions, and include it with your dataset before storage. See below for an example:

Version	Author	Change or Purpose	Date
1.1	TCN	Corrected formula in column 7	20240427
1.2	RD	Amended references 3 and 4	20240430
1.3	JRW	Formatted results table	20240515
2.0	MJM	Added statistical analysis section	20240529
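A control table like this can also be kept as a CSV file and updated by script. The following is a rough sketch rather than a prescribed method: the function name `log_version` is hypothetical, the column names are taken from the example table, and dates are written as YYYYMMDD to match it.

```python
import csv
from datetime import date
from pathlib import Path

# Column headings copied from the example control table.
COLUMNS = ["Version", "Author", "Change or Purpose", "Date"]


def log_version(table_path, version, author, change, when=None):
    """Append one row to a CSV control table, creating it if needed."""
    table = Path(table_path)
    # Write the header only when the file is missing or empty.
    is_new = not table.exists() or table.stat().st_size == 0
    when = when or date.today()
    with table.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(COLUMNS)
        writer.writerow([version, author, change, when.strftime("%Y%m%d")])
```

Calling `log_version("control_table.csv", "1.1", "TCN", "Corrected formula in column 7", date(2024, 4, 27))` would add the first row of the example table; the file name here is only a placeholder.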

Tips

Always record every change to a file, no matter how small.
Discard obsolete versions after backups have been made.
Always keep a "read-only" version of your raw, unprocessed dataset to protect against unintentional changes.
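The read-only tip can be applied with file permissions. The sketch below is one possible approach, with a hypothetical function name `make_read_only`. It only removes write permission from the file itself; it does not stop someone with sufficient rights from deleting the file or restoring write access, so it complements backups rather than replacing them.

```python
import stat
from pathlib import Path


def make_read_only(path):
    """Remove write permission from a file for owner, group, and others."""
    target = Path(path)
    current_mode = target.stat().st_mode
    # Clear the three write bits and keep all other permission bits as they are.
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    target.chmod(current_mode & ~write_bits)
    return target
```

On Windows, this has the effect of setting the file's read-only attribute.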