Introduction
|
|
Tabular Data & Spreadsheets
|
Data organization starts at the sampling phase of a research project.
Spreadsheets are good for data entry, but we often use them for much more, such as formatting tables for publication and making figures.
Not all data is tabular.
|
Formatting Data Tables
|
Structure data tables so that computers, not only people, can read and interpret them.
Never modify your raw data.
Keep track of all of the steps you take to clean your data.
Organize your data according to tidy data principles.
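As a minimal illustration of tidy data principles (each variable in its own column, each observation in its own row), the Python sketch below reshapes a hypothetical "wide" survey table, with one count column per year, into a "long" table with one row per plot and year. The plot IDs, column names and counts are invented for the example, and the `to_tidy` helper is hypothetical. The raw rows are only read, never modified; the cleaned table is a new object.

```python
# Hypothetical raw data in "wide" form: one column per survey year.
wide_rows = [
    {"plot_id": "A", "count_2019": "12", "count_2020": "15"},
    {"plot_id": "B", "count_2019": "7", "count_2020": "0"},
]


def to_tidy(rows, id_col, prefix):
    """Return one record per observation: (plot, year, count)."""
    tidy = []
    for row in rows:
        for col, value in row.items():
            if col.startswith(prefix):
                year = int(col[len(prefix):])  # "count_2019" -> 2019
                tidy.append({id_col: row[id_col], "year": year, "count": int(value)})
    return tidy


# The raw rows stay untouched; the tidy table is built as a new list.
tidy_rows = to_tidy(wide_rows, "plot_id", "count_")
for record in tidy_rows:
    print(record)
```

In the tidy version, "year" becomes a variable with its own column instead of being hidden in the column headers.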
|
Discussion: Formatting Problems
|
Avoid using multiple tables within one spreadsheet.
Avoid spreading data across multiple tabs.
Record zeros as zeros.
Use an appropriate null value to record missing data.
Don’t use formatting to convey information or to make your spreadsheet look pretty.
Place comments in a separate column.
Record units in column headers.
Include only one piece of information in a cell.
Avoid spaces, numbers and special characters in column headers.
Avoid special characters in your data.
Record metadata in a separate plain text file.
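Once data is exported, several of these guidelines can be checked in code. The sketch below assumes a hypothetical header rule (a header starts with a letter and contains only letters, digits and underscores) and assumes `NA` was chosen as the null value; `bad_headers` and `parse_count` are illustrative helpers, not part of any library. It shows that a recorded zero stays `0` while a missing value becomes `None`.

```python
import csv
import io
import re

# Hypothetical naming rule: start with a letter, then only letters, digits
# or underscores. "seedling_count" passes; "adult count" and "2019_count" fail.
HEADER_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Assumption: "NA" is the agreed null value; blank cells also count as missing.
NULL_VALUES = {"NA", ""}


def bad_headers(headers):
    """Return the column headers that break the naming rule."""
    return [h for h in headers if not HEADER_PATTERN.fullmatch(h)]


def parse_count(cell):
    """Return the cell as an int, or None if it holds a null value.

    A recorded zero stays 0, so it is never confused with missing data.
    """
    cell = cell.strip()
    return None if cell in NULL_VALUES else int(cell)


# Hypothetical export: plot A had zero seedlings and no adult count recorded;
# plot B has no seedling count. "adult count" is a deliberately bad header.
raw = "plot_id,seedling_count,adult count\nA,0,NA\nB,NA,12\n"
reader = csv.reader(io.StringIO(raw))
headers = next(reader)
print(bad_headers(headers))
for row in reader:
    print(row[0], parse_count(row[1]), parse_count(row[2]))
```

A check this simple only works because one null value is used consistently throughout the table.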
|
Dates as Data
|
|
Exporting Data
|
Data stored in common spreadsheet formats often will not be read correctly by data analysis software, which can introduce errors into your data.
Exporting data from spreadsheets to plain-text formats such as CSV (comma-separated values) or TSV (tab-separated values) produces files that most programs can read consistently.
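Spreadsheet programs export CSV or TSV through their own save or export menus. The sketch below instead uses Python's standard `csv` module to show what the resulting plain text looks like and that it reads back cell for cell. The rows and the `export` helper are hypothetical examples.

```python
import csv
import io

# Hypothetical cleaned data; the first row holds the column headers.
rows = [
    ["plot_id", "date", "weight_kg"],
    ["A", "2020-06-01", "1.5"],
    ["B", "2020-06-02", "NA"],
]


def export(rows, delimiter=","):
    """Write rows as delimited plain text and return it as a string."""
    buffer = io.StringIO()
    # lineterminator="\n" keeps the output simple; the csv module's
    # default line terminator is "\r\n".
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = export(rows)                  # comma-separated values
tsv_text = export(rows, delimiter="\t")  # tab-separated values
print(csv_text)

# Reading the plain text back yields exactly the same cells.
assert list(csv.reader(io.StringIO(csv_text))) == rows
assert list(csv.reader(io.StringIO(tsv_text), delimiter="\t")) == rows
```

To write a real file, open it with `open("surveys.csv", "w", newline="")` (the filename is hypothetical) and pass the file object to `csv.writer`; the Python `csv` documentation recommends `newline=""` for files used with this module.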
|