Data Structure

Creating Structure

Data modeling is the process of translating your understanding of the objects you study, and the relationships between them, into a form that suits the analytic methods you want to use, including digital tools. You start with an original source of raw data: text, transcripts, photos, or whatever you are studying. To summarize, analyze, and otherwise conduct further research, you add information to the data or collect information about it (e.g. annotations). Some analyses are easier to perform on structured data, so you group this information into categories and organize it for easier computer processing. Once the data is structured, it can be grouped and arranged, and the groups can be summarized to create an overall picture of the data.
 
How do you do this? Coming up with a good data model is an art, not a science – if you have something that works for you, it’s correct. That said, there are some general tips that will help keep things running smoothly in most cases. The rest of this tutorial assumes that you are organizing your data using a spreadsheet program such as Excel.

Basic rules for organizing data in spreadsheets

  • Put just one thing in a cell
  • Organize the data as a single rectangle, with subjects / cases as rows and variables / features as columns
  • Use a single header row at the top, with a brief, descriptive name for what is in each column
  • Create a data dictionary – a separate document explaining what is in your rows and columns
  • Do not include calculations in the original data files
  • Do not use font color or highlighting as data
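
As a rough illustration of the "one thing in a cell" rule, the sketch below takes rows whose cell packs a place and a year together and splits that cell into two single-valued columns. The column names (`title`, `published`, `place`, `year`) and the sample values are invented for this example; a spreadsheet program's text-to-columns feature would achieve the same result.

```python
import csv
import io

# Hypothetical raw data: the "published" column holds two things in one cell.
raw = io.StringIO(
    "title,published\n"
    "Letters Home,\"Boston, 1923\"\n"
    "Field Notes,\"Chicago, 1931\"\n"
)

def split_published(row):
    """Return a new row with 'published' split into 'place' and 'year'."""
    # Split on the last comma so place names containing commas stay intact.
    place, year = (part.strip() for part in row["published"].rsplit(",", 1))
    return {"title": row["title"], "place": place, "year": int(year)}

rows = [split_published(r) for r in csv.DictReader(raw)]

for row in rows:
    print(row)
```

With place and year in separate columns, each can be sorted or counted independently.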

The Digital Humanities Workbench has an example of a well-structured spreadsheet on its Structured Data page. The article "Data Organization in Spreadsheets" includes rectangular and non-rectangular examples that illustrate the difference.

Characteristics of a good rectangle:

  • Each row is a case: a single instance of the thing you're examining
  • Figuring out what your case is (the fundamental, smallest unit or element of what you are studying) is an important part of constructing your data model
  • Each column is a feature or characteristic of the case, that is, a variable
  • Setting up your data this way lets you make use of the features of your spreadsheet program: you can sort on characteristics, count them, and ask questions about your cases, such as what percentage of X is Y
  • If applicable, the source of a data item or characteristic should be included as a column
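
The sorting, counting, and percentage questions described in the list above can be sketched in a few lines once the data is a proper rectangle. The example below is a minimal illustration using made-up cases (letters) and made-up variables (`author`, `year`, `language`, `source`); none of these names come from a real dataset.

```python
from collections import Counter

# Hypothetical rectangle: each dict is one case (row), each key a variable (column).
letters = [
    {"id": 1, "author": "Adams", "year": 1923, "language": "English", "source": "Archive A"},
    {"id": 2, "author": "Baker", "year": 1919, "language": "French", "source": "Archive B"},
    {"id": 3, "author": "Adams", "year": 1931, "language": "English", "source": "Archive A"},
    {"id": 4, "author": "Clark", "year": 1925, "language": "English", "source": "Archive B"},
]

# Sort the cases on one characteristic.
by_year = sorted(letters, key=lambda row: row["year"])

# Count how often each value of a characteristic occurs.
language_counts = Counter(row["language"] for row in letters)

# Ask "what percentage of X is Y": what share of the letters are in English?
pct_english = 100 * language_counts["English"] / len(letters)

print([row["id"] for row in by_year])
print(language_counts)
print(f"{pct_english:.0f}% of letters are in English")
```

These operations are simple only because every row describes exactly one case and every column holds exactly one kind of value.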

This illustration shows two rectangular spreadsheets. The second is a well-structured rectangle. The first is a rectangle but will cause issues with data entry and analysis.

Don't merge cells, even if it saves a little typing or looks neater. Treat each row as a separate item and include all the information about that case in the row.
Figure: Nonrectangular spreadsheet
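
When a sheet that used merged cells is unmerged or exported, typically only the first row of each merged group keeps the value and the rows below it are left blank. The sketch below shows one way to repair that by copying each value down into the blanks so that every row is complete on its own. The column names and values are hypothetical, and the `fill_down` helper is written for this example rather than taken from any library.

```python
# Hypothetical rows from a sheet where the "collection" cell was merged
# across several rows, so only the first row of each group kept the value.
rows = [
    {"collection": "Adams Papers", "item": "Letter 1"},
    {"collection": "", "item": "Letter 2"},
    {"collection": "Baker Papers", "item": "Diary"},
    {"collection": "", "item": "Letter 3"},
]

def fill_down(rows, column):
    """Copy the last non-blank value of `column` into the blank cells below it."""
    filled = []
    last_value = ""
    for row in rows:
        value = row[column].strip() or last_value
        last_value = value
        # Build a new row so the original data is left unchanged.
        filled.append({**row, column: value})
    return filled

complete = fill_down(rows, "collection")
for row in complete:
    print(row)
```

Filling down is only appropriate when the blanks come from merging. If a blank cell means the value is genuinely unknown, it should stay blank.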