Documenting and Organizing Data

Documentation

Having good documentation doesn't necessarily mean writing a book about your data. A good basic documentation setup can be as simple as the following:

  • Set up a sensible folder structure with one data collection per folder, and descriptive folder names
  • Use intelligible file names
  • Have a readme file for each folder (or in some cases, each data file)

As you collect your data, note down every part of the process: dates, decisions made, and choices to include or exclude data. This information goes in your readme file. A readme file is a text file that explains what you need to know before opening any of the files in the folder, and is usually called something like "ReadMe.txt".
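The "one readme per folder" habit is easy to check automatically. The following is a minimal sketch in Python, assuming your data collections are subfolders of a single root folder and that any file whose name starts with "readme" (in any capitalization) counts as a readme. The function name folders_missing_readme is hypothetical, chosen for illustration.

```python
from pathlib import Path


def folders_missing_readme(root):
    """Return the names of subfolders of `root` that contain no readme file.

    Assumption: a readme is any file whose name starts with "readme",
    ignoring case (ReadMe.txt, README.md, readme.txt, ...).
    """
    missing = []
    subfolders = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for folder in subfolders:
        has_readme = any(
            f.is_file() and f.name.lower().startswith("readme")
            for f in folder.iterdir()
        )
        if not has_readme:
            missing.append(folder.name)
    return missing
```

Calling folders_missing_readme on your project folder returns a list of the data collections that still need a readme. The check looks only one level deep, which matches a structure of one data collection per folder.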

Agree with your research partners on a file naming convention. The pattern Descriptive_name-person_who_modified-change_made-date.xlsx often works. (You can leave out the name of the person who modified the file if you are the only person working with your data.)

  • Bad: FinalData.xlsx, ThisIsReallyTheFinalVersion.xlsx, FinalFinalDataFIXED.xlsx
  • Better: CodedTweets-Kristi-random_subsample-Feb3_16.xlsx

A convention like this is especially helpful if you follow the practice of saving your file under a new name every time you make a major change.
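If you want to apply the naming pattern consistently, a small helper can assemble the name for you. This is a rough sketch in Python, not a standard tool: the function name build_file_name and its parameters are hypothetical, and the date format (month abbreviation, day, two-digit year) is an assumption based on the example file name above.

```python
from datetime import date

# Fixed English month abbreviations, so the result does not depend on locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def build_file_name(description, change, when, person=None, extension="xlsx"):
    """Build Descriptive_name-person_who_modified-change_made-date.extension.

    `person` is optional: leave it out if you are the only one
    working with the data.
    """
    parts = [description]
    if person:
        parts.append(person)
    parts.append(change)
    # Date written as e.g. Feb3_16 (assumed format, taken from the example).
    parts.append(f"{MONTHS[when.month - 1]}{when.day}_{when.year % 100:02d}")
    return "-".join(parts) + "." + extension


name = build_file_name("CodedTweets", "random_subsample",
                       date(2016, 2, 3), person="Kristi")
print(name)  # CodedTweets-Kristi-random_subsample-Feb3_16.xlsx
```

Because hyphens separate the parts of the name, use underscores rather than hyphens inside each part.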

Readme Files

A readme file for a folder containing a data collection will typically include:

  • Descriptive title for the data collection
  • Principal investigator (or person responsible for collecting the data), contact person for questions (if data is shared)
  • Date of data collection or date data first accessed / acquired
  • Information about geographic location of data collection, if relevant
  • For each file in the data collection
    • Short description of what data it contains
    • Methods of data collection or, if secondary data, how it was acquired
    • Description of any changes made since collection or acquisition (note that you should also retain an unmodified copy of your original data!)
    • Missing data codes

(Adapted from Cornell University, Guide to readme style metadata.)
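The elements listed above can be turned into a reusable template so that every readme has the same layout. Below is a minimal sketch in Python. The function name build_readme, the field labels, and the dictionary keys are all hypothetical choices for illustration, not part of the Cornell guide or any standard.

```python
def build_readme(title, investigator, contact, collected, files, location=None):
    """Return the text of a readme for one data collection.

    `files` is a list of dictionaries, one per data file, each with the
    (hypothetical) keys: name, description, method, changes, missing_codes.
    """
    lines = [
        f"Title: {title}",
        f"Principal investigator: {investigator}",
        f"Contact: {contact}",
        f"Date collected or acquired: {collected}",
    ]
    if location:  # geographic location is listed only when relevant
        lines.append(f"Location: {location}")
    for entry in files:
        lines += [
            "",  # blank line between file entries
            f"File: {entry['name']}",
            f"  Description: {entry['description']}",
            f"  Collection or acquisition method: {entry['method']}",
            f"  Changes since collection: {entry['changes']}",
            f"  Missing data codes: {entry['missing_codes']}",
        ]
    return "\n".join(lines) + "\n"
```

The returned text can be saved as ReadMe.txt in the folder it describes. Keeping the template in one function means that a change to the layout applies to every readme you generate afterwards.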

Ideally, you should have sufficient documentation on your data that a random stranger who is knowledgeable in your field would be able to:

  • Follow and understand the steps you took to collect your data in the first place and the decisions you made along the way
  • Take your original data file and reproduce the changes you made to it to get your data into its final form
  • Reproduce any tables or charts that support your research conclusions

This is what is meant by reproducible research.