Data Management and Record Keeping: Data Organization & Documentation

Tools, tips and checklists for creating a data management plan and managing data you generate

Further Information & Resources

  • Some popular data repository options are listed on our Sharing & Reuse page. Check the sites listed to see their metadata requirements.
  • Cornell's Research Data Management Service Group has a very helpful guide to writing 'readme' style metadata to describe your data.
  • Include the following in a readme text file:
    • The data’s purpose
    • A list of the files in your data package
    • Data dictionary listing and describing all variables
  • Contact the Gordon Library for questions related to metadata standards and schema.
  • Here is a working list of metadata standards organized by discipline, from the Research Data Alliance (RDA).
  • Specific standards include ISO 19115 (the geospatial metadata standard endorsed by the FGDC) and Ecological Metadata Language (EML).

Data Structure/Organization

Data organization principles:

  • use one column per variable
  • use one row per observation
  • use human-readable column names
  • include one table per tab
  • indicate relationships between tables using a key
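
As a minimal sketch of these principles, the Python example below stores two small tables as CSV text and links them with a shared key column. The table contents, the column names, and the helper functions `read_table` and `join_on_key` are hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical example data: one table per "tab", one row per observation,
# one column per variable, and human-readable column names.
SITES_CSV = """site_id,site_name,latitude_deg,longitude_deg
S01,North Pond,42.27,-71.81
S02,East Brook,42.30,-71.77
"""

MEASUREMENTS_CSV = """measurement_id,site_id,date,water_temp_c
M001,S01,2014-01-23,3.5
M002,S01,2014-02-20,4.1
M003,S02,2014-01-23,2.9
"""


def read_table(text):
    """Parse CSV text into a list of row dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_key(left_rows, right_rows, key):
    """Merge the matching right-hand row into each left-hand row via a shared key.

    Raises KeyError if a left-hand row refers to a key value that is missing
    from the right-hand table.
    """
    lookup = {row[key]: row for row in right_rows}
    return [{**row, **lookup[row[key]]} for row in left_rows]


sites = read_table(SITES_CSV)
measurements = read_table(MEASUREMENTS_CSV)

# site_id is the key that relates each measurement to its site.
joined = join_on_key(measurements, sites, "site_id")

for row in joined:
    print(row["date"], row["site_name"], row["water_temp_c"])
```

Here `site_id` appears in both tables, so site details are stored once and every measurement can still be traced back to its site.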

Document (in a readme text file) the following:

  • the data's purpose
  • a list of the files in your data package
  • all the variables, listed out and described (data dictionary)
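
A readme with these three parts can be written by hand, but it can also be generated from a short script so that it stays in step with the data. The sketch below is one possible layout, not a required format; the function `build_readme`, the section titles, and all file and variable names are hypothetical.

```python
def build_readme(purpose, files, variables):
    """Return readme text with a purpose statement, a file list, and a data dictionary.

    files:     list of (file_name, description) pairs
    variables: list of (variable_name, unit, description) triples
    """
    lines = ["PURPOSE", purpose, "", "FILES"]
    for name, description in files:
        lines.append(f"{name}: {description}")
    lines += ["", "DATA DICTIONARY"]
    for name, unit, description in variables:
        lines.append(f"{name} ({unit}): {description}")
    return "\n".join(lines) + "\n"


# Hypothetical project details, used only to show the shape of the output.
readme_text = build_readme(
    purpose="Water temperature readings collected to study seasonal change.",
    files=[
        ("sites.csv", "one row per sampling site"),
        ("measurements.csv", "one row per temperature reading"),
    ],
    variables=[
        ("site_id", "text", "identifier linking a reading to its site"),
        ("date", "YYYY-MM-DD", "day the reading was taken"),
        ("water_temp_c", "degrees Celsius", "water temperature at the site"),
    ],
)

print(readme_text)
```

The text could then be saved as a plain-text readme file next to the data files it describes.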

*Information adapted from Educopia Institute ETD+ Toolkit

Metadata

"Metadata is descriptive or contextual information which refers to or is associated with another object or resource. This usually takes the form of a structured set of elements which describes the information resource and assists in the identification, location, and retrieval of it by users, while facilitating content and access management." (Digital Curation Centre)

Metadata exists to help users find and understand content.

Metadata answers basic questions about a resource:

  • Who created it?
  • What is it?
  • When, where, how, and why was it created?

Metadata also helps determine:

  • How data will be shared (publicly or not).
  • How people can search for and find the data through Google, library catalogs, or other search methods and platforms.

Data dictionaries and codebooks are examples of metadata as well.

Common metadata fields:

  • Title
  • Author/Creator
  • Contributor
  • Resource Type
  • Date
  • Language
  • Description/Abstract
  • Subject
  • Identifier
  • Rights management information

Example: Dublin Core Metadata Element Set
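
One way to see how the common fields above map onto Dublin Core is to build a small record in code. The sketch below uses Python's standard `xml.etree.ElementTree` module and the Dublin Core element namespace. The field values and the wrapper element name `metadata` are made up for illustration, and a given repository may expect a different wrapper or additional fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical record: each key is a Dublin Core element name,
# matched to one of the common metadata fields listed above.
record = {
    "title": "Pond water temperature measurements, 2014",  # Title
    "creator": "Doe, Jane",                                # Author/Creator
    "contributor": "Roe, Richard",                         # Contributor
    "type": "Dataset",                                     # Resource Type
    "date": "2014-01-23",                                  # Date
    "language": "en",                                      # Language
    "description": "Weekly water temperature readings.",   # Description/Abstract
    "subject": "Limnology",                                # Subject
    "identifier": "example-dataset-001",                   # Identifier
    "rights": "CC BY 4.0",                                 # Rights management
}


def to_dublin_core_xml(fields):
    """Serialize a dict of Dublin Core element names and values to an XML string."""
    root = ET.Element("metadata")  # wrapper name is an assumption
    for element, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_dublin_core_xml(record)
print(xml_text)
```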

*Some information adapted from Educopia Institute ETD+ Toolkit

Helpful Tips

  • Many of your data documentation and metadata decisions will depend on where you decide to store your data for the long term. Data repositories often have documentation and metadata requirements, so make sure you know them when you create your data management plan.
  • Document your data with research & methods notes, preferably including some type of structured metadata (metadata is generally most useful when working with large amounts of information and/or sharing data).
  • Decide whether you prefer to use handwritten notes or an electronic lab notebook, or both.
  • Describe your data for future use (by you, your collaborators, or other researchers within or outside of your field).
  • Be consistent in your file naming and organization.
  • Be aware of any intellectual property issues surrounding the data you are using. The data itself might not be under copyright, but its organization (within or outside of a database) might be. It might also be protected in relation to a patent or subject to a license or contract.

File Naming Conventions

File Naming Conventions (FNCs) can help others (and you, in the future) better understand and navigate through your work.

General tips:

  • Give files names that are descriptive and consistent, but try to keep names under 25 characters
  • Use underscores (_) or dashes (-) to separate words, instead of using spaces. If you don't want to use either of these, you can capitalize the first letter of each word instead
  • Avoid special characters
  • Use consistent date conventions such as YYYYMMDD or YYYY-MM-DD (year first: it changes least often, and it makes files sort chronologically by name)

Bad FNCs:

  • ProjectData.xlsx
  • LabWorkJess.docx

Better FNCs:

  • 20130503_DOEProject_DesignDocument_OToole_v2.docx
  • 20140123_DOEProject_ProjectMeetingNotes_Steckervetz_v1.docx
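
The pattern in the better examples (date, project, description, author, version) can be assembled by a small helper so that every file follows the same scheme. This is a sketch under assumptions: the functions `clean_part` and `build_file_name` and their parameters are hypothetical, and the cleaning rule keeps only unaccented letters and digits.

```python
import re
from datetime import date


def clean_part(text):
    """Drop spaces and special characters, capitalizing the first letter of each word.

    Only ASCII letters and digits are kept, so accented characters are removed.
    """
    words = re.findall(r"[A-Za-z0-9]+", text)
    return "".join(word[0].upper() + word[1:] for word in words)


def build_file_name(created, project, description, author, version, extension):
    """Build a name of the form YYYYMMDD_Project_Description_Author_vN.ext."""
    parts = [
        created.strftime("%Y%m%d"),  # year first, so names sort by date
        clean_part(project),
        clean_part(description),
        clean_part(author),
        f"v{version}",
    ]
    return "_".join(parts) + "." + extension.lstrip(".")


name = build_file_name(
    date(2013, 5, 3), "DOE Project", "design document", "O'Toole", 2, "docx"
)
print(name)  # 20130503_DOEProject_DesignDocument_OToole_v2.docx
```

Generating names this way removes the small variations (spaces, apostrophes, inconsistent date order) that tend to creep in when names are typed by hand.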

FNCs are also important for version control, the process of managing changes to your files over time. Save a copy of a file each time you make changes, and keep your file naming scheme consistent so you can find which changes were made, when, and by whom.
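
If file names end in a version marker such as `_v2`, the name for the next saved copy can be derived from the current one. The helper below, `next_version_name`, is a hypothetical sketch that assumes the version marker sits directly before the file extension, as in the examples above.

```python
import re


def next_version_name(file_name):
    """Return the file name with its trailing _vN marker increased by one.

    Assumes the name ends in _v<number> followed by a single extension,
    for example "Report_v2.docx"; raises ValueError otherwise.
    """
    match = re.search(r"_v(\d+)(\.[^.]+)$", file_name)
    if match is None:
        raise ValueError(f"no version marker found in {file_name!r}")
    number = int(match.group(1)) + 1
    return f"{file_name[:match.start()]}_v{number}{match.group(2)}"


current = "20130503_DOEProject_DesignDocument_OToole_v2.docx"
print(next_version_name(current))  # 20130503_DOEProject_DesignDocument_OToole_v3.docx
```

The earlier copy stays on disk under its old name, so each version remains available for comparison.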