Open Science

A guide to open science principles.

Open Science in the Data Collection Stage

During the data collection or data creation stage of a research project, it's important that researchers practice data management to not only improve organization and workflows but to ensure the integrity of their results. When data are easier to find, understand, and navigate, a research project can more easily be shared and reproduced. This page covers suggestions for file naming conventions, stable file formats, and file hierarchies.

For a more comprehensive overview, please visit our Research Data Management Guide:

Research Data Management
by Rachel Davis. Last updated Jul 2, 2025.

File Naming Conventions

A file naming convention is a standard framework for naming your files in a way that describes them accurately and consistently. Establishing a file naming convention prior to starting a project can improve organization and accessibility for your research team.

Image created using Canva.com

Date Formatting
By choosing a standard format for dates, you can avoid confusion and errors when naming files. ISO 8601 is an international standard for representing dates and times. It orders the parts of a date as year, month, and day, which keeps file names unambiguous and means that files sorted by name are also sorted by date:

Example: YYYY-MM-DD = 2021-05-02 or 20210502

Follow this link for more information about ISO date formatting: https://www.iso.org/iso-8601-date-and-time-format.html
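As a minimal sketch of the date convention above, the following Python snippet builds a file name with an ISO 8601 date prefix. The function name dated_filename and its parameters are illustrative assumptions, not part of any standard tool.

```python
from datetime import date


def dated_filename(description: str, extension: str, day: date) -> str:
    """Prefix a file name with an ISO 8601 basic-format date (YYYYMMDD)."""
    # strftime("%Y%m%d") gives the compact form; day.isoformat() would
    # give the hyphenated form YYYY-MM-DD instead.
    stamp = day.strftime("%Y%m%d")
    return f"{stamp}_{description}.{extension}"


print(dated_filename("survey_results", "csv", date(2021, 5, 2)))
# prints: 20210502_survey_results.csv
```

Passing date.today() as the last argument stamps the file with the current date.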

Standard Characters
Only standard, alphanumeric characters should be used in file names. It’s good practice to:

Avoid special characters such as !, #, &, and *. These can affect how a file name is displayed or handled when the file is moved between operating systems.
Avoid starting or ending your file name with a non-alphanumeric character such as a hyphen or period.

Example: 20180502_survey_results.csv, rather than 201805.survey.results.csv

Use underscores or capital letters to separate words in your file name:

CamelCase	Pot_hole_case
ShovelTestSample002.csv	shovel_test_sample_002.csv
20240715_TissueScanSample005.tiff	20240715_tissue_scan_sample_005.tiff
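The character rules above can be applied automatically. The sketch below is one possible approach, assuming you want lowercase names with underscores; the function name to_snake_case_filename is hypothetical, and the function treats only the text after the last period as the file extension.

```python
import re


def to_snake_case_filename(name: str) -> str:
    """Rewrite a file name using only letters, digits, and underscores."""
    # Split off the extension at the last period, if there is one.
    stem, dot, ext = name.rpartition(".")
    if not dot:
        stem, ext = name, ""
    # Replace every run of non-alphanumeric characters with one underscore.
    stem = re.sub(r"[^A-Za-z0-9]+", "_", stem)
    # Drop underscores left at the start or end, then lowercase the name.
    stem = stem.strip("_").lower()
    return f"{stem}.{ext.lower()}" if ext else stem


print(to_snake_case_filename("201805.survey.results.csv"))
# prints: 201805_survey_results.csv
print(to_snake_case_filename("Survey Results #2!.csv"))
# prints: survey_results_2.csv
```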

Sequential Ordering

When using numbers in a file name to designate an order, use leading zeros for consistency and better readability. Labeling a file with 01 will order files up to 99, and 001 will order files up to 999.
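The short Python example below illustrates why leading zeros matter: most file browsers and scripts sort names as text, so unpadded numbers fall out of order. The helper numbered_name is an illustrative assumption.

```python
def numbered_name(prefix: str, index: int, width: int = 3, extension: str = "csv") -> str:
    """Build a file name whose number is zero-padded to a fixed width."""
    return f"{prefix}_{index:0{width}d}.{extension}"


# Without padding, text sorting places 10 before 2.
unpadded = sorted(f"sample_{i}.csv" for i in (1, 2, 10))
# With padding, text order matches numeric order.
padded = sorted(numbered_name("sample", i) for i in (1, 2, 10))

print(unpadded)  # ['sample_1.csv', 'sample_10.csv', 'sample_2.csv']
print(padded)    # ['sample_001.csv', 'sample_002.csv', 'sample_010.csv']
```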

File Directory Structure Conventions

A well-structured folder hierarchy makes files and versions easier to locate and organize. Evaluate the best hierarchy for organizing your files and determine whether a deep or shallow hierarchy suits your needs better. If your team has multiple independent data collections, it's recommended to create a distinct folder for each one.

Directory top-level folders should include the project title, a unique identifier, and the date (year). The substructure should have a clear, consistent naming convention, e.g., uniform conventions for labeling each run of an experiment, each version of a dataset, and/or each person in the group.
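A folder layout like the one described above can be created with a short script so that every project starts from the same structure. This is only a sketch: the identifier "P001" and the subfolder names raw_data, processed_data, and documentation are assumptions for illustration, not a recommended standard.

```python
import tempfile
from pathlib import Path


def create_project_tree(
    root: Path,
    title: str,
    identifier: str,
    year: int,
    subfolders: tuple = ("raw_data", "processed_data", "documentation"),
) -> Path:
    """Create a top-level folder named title_identifier_year with subfolders."""
    top = root / f"{title}_{identifier}_{year}"
    for name in subfolders:
        # exist_ok=True lets the script be re-run without raising an error.
        (top / name).mkdir(parents=True, exist_ok=True)
    return top


# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    top = create_project_tree(Path(tmp), "HarrisStudy", "P001", 2024)
    created = sorted(p.name for p in top.iterdir())

print(top.name)  # HarrisStudy_P001_2024
print(created)   # ['documentation', 'processed_data', 'raw_data']
```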

File Formats to Avoid Obsolescence

As technology changes, so too do the ways researchers can access and use data. This includes the ever-changing file formats of proprietary software. To increase the longevity of your data, it is recommended to use file formats that are likely to remain accessible for the foreseeable future.

Obsolescence-resistant file formats are typically:

Non-proprietary
Open, documented standards
Commonly used by the research community
Standard representations (e.g., ASCII or Unicode)
Unencrypted
Uncompressed

Examples of these formats are:

PDF or RTF (not Word)
ASCII or CSV (not Excel)
MPEG-4 (not QuickTime)
TIFF or JPEG2000 (not GIF or JPG)
XML or RDF (not RDBMS)
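One way to act on the examples above is to check file names for proprietary formats before depositing data. The sketch below covers only the Word, Excel, and QuickTime cases from the list; the mapping from file extensions to those products is an assumption added for illustration, and the function name suggest_stable_format is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumed mapping from common proprietary extensions to the alternatives
# named in the list above; extend it to suit your own discipline.
SUGGESTED_ALTERNATIVES = {
    ".doc": "PDF or RTF",
    ".docx": "PDF or RTF",
    ".xls": "ASCII or CSV",
    ".xlsx": "ASCII or CSV",
    ".mov": "MPEG-4",
}


def suggest_stable_format(filename: str) -> Optional[str]:
    """Return a suggested alternative format, or None if no rule applies."""
    return SUGGESTED_ALTERNATIVES.get(Path(filename).suffix.lower())


print(suggest_stable_format("interview_notes.docx"))         # PDF or RTF
print(suggest_stable_format("20180502_survey_results.csv"))  # None
```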

Image by Esteban.alej from Wikimedia Commons

For a more exhaustive list of recommended file formats to avoid obsolescence, see the Library of Congress Recommended Formats.

More Examples of Stable File Formats

Type of Data	Stable File Format Examples
Text	ASCII, XML, PDF/A, HTML, UTF-8
Tabular Data	CSV
Still Images	TIFF, JPEG, PDF, PNG, GIF, BMP
Geospatial	SHP, DBF, GeoTIFF, NetCDF
Databases	XML, CSV

Data Versioning

It can be both useful and necessary to retain different versions of datasets as they are transformed. For example, data may need re-processing to include new calculations; errors may need to be corrected; or new data might need to be generated and added to the dataset.

Editing the original file risks irreversible loss of the raw data if an error is made while overwriting it. Instead, researchers can save a new version of the dataset each time changes are made.

To manage and keep track of older data versions, it’s recommended to add a number to the file name for each version.

For example, v2, v3, and so on:

20240325_MultiAnalysis_Submission_v2.csv
HarrisStudy_Survey002_20220603_v3.1.csv
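The versioning practice above can be scripted so that the previous file is never overwritten. The following is a minimal sketch under stated assumptions: version labels take the form _v followed by a number at the end of the file name, an unlabeled file counts as version 1, and a minor version such as v3.1 moves to the next whole number. The function names are hypothetical.

```python
import re
import shutil
from pathlib import Path

# Matches a trailing version label such as _v2 or _v3.1.
VERSION_SUFFIX = re.compile(r"_v(\d+)(?:\.\d+)?$")


def next_version_name(filename: str) -> str:
    """Return the file name with its version label raised by one."""
    path = Path(filename)
    match = VERSION_SUFFIX.search(path.stem)
    if match:
        base = path.stem[: match.start()]
        number = int(match.group(1)) + 1
    else:
        # Assumption: a file with no label is treated as version 1.
        base, number = path.stem, 2
    return f"{base}_v{number}{path.suffix}"


def save_new_version(source: Path) -> Path:
    """Copy source to the next version name and leave the original untouched."""
    target = source.with_name(next_version_name(source.name))
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    shutil.copy2(source, target)
    return target


print(next_version_name("20240325_MultiAnalysis_Submission_v2.csv"))
# prints: 20240325_MultiAnalysis_Submission_v3.csv
```

Edits are then made to the copy returned by save_new_version, so the earlier version stays available.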

Data Versioning Control Tables

A version control table helps you and your team keep track of the different versions of your files and document changes. It records which versions were created, what changed, who made the change, and when. Consider creating a README file or spreadsheet that lists the data versions and including it with your dataset prior to storage. See below for an example:

Version	Author	Change or Purpose	Date
1.1	TCN	Corrected formula in column 7	20240427
1.2	RD	Amended references 3 and 4	20240430
1.3	JRW	Formatted results table	20240515
2.0	MJM	Added statistical analysis section	20240529
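A control table like the one above can be kept as a CSV file and updated by script each time a new version is saved. This is a sketch, not a prescribed workflow: the file name version_control_table.csv and the function log_version are assumptions, and the column headings simply mirror the example table.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path

FIELDS = ["Version", "Author", "Change or Purpose", "Date"]


def log_version(table: Path, version: str, author: str, change: str, day: date) -> None:
    """Append one row to the control table, writing the header if the file is new."""
    is_new = not table.exists()
    with table.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(FIELDS)
        writer.writerow([version, author, change, day.strftime("%Y%m%d")])


# Demonstration in a temporary directory, using rows from the table above.
with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "version_control_table.csv"
    log_version(table, "1.1", "TCN", "Corrected formula in column 7", date(2024, 4, 27))
    log_version(table, "1.2", "RD", "Amended references 3 and 4", date(2024, 4, 30))
    with table.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))

for row in rows:
    print(row)
```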