Provenance Walkthrough

Overview

Teaching: 5 min
Exercises: 0 min
Questions
Objectives
  • Provenance capture walkthrough: download a dataset and write provenance during the analysis.

Live Demo: Download a dataset and write provenance during the analysis.

  • Open the dataset “Niskin bottle samples” at https://www.bco-dmo.org/dataset/3782.
  • Click the Subset Data button at the top of the page.
  • Download a subset of this dataset containing cruise 314, cast 005.
  • Open a text editor and write down every step you take. Make sure to record the date on which you handle the data.
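If you work in Python rather than a text editor, the same habit can be automated. The sketch below appends each step, stamped with the current date and time, to a plain-text log file. The helper `log_step` and the file name `provenance_log.txt` are hypothetical names chosen for illustration, not part of any library.

```python
from datetime import datetime, timezone
from pathlib import Path


def log_step(message: str, log_path: Path = Path("provenance_log.txt")) -> str:
    """Append one timestamped step to a plain-text provenance log (hypothetical helper)."""
    # ISO 8601 timestamp in UTC, e.g. 2024-05-01T14:03:22+00:00
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    line = f"{stamp}\t{message}\n"
    # Append mode, so earlier entries are never overwritten
    with log_path.open("a", encoding="utf-8") as f:
        f.write(line)
    return line


if __name__ == "__main__":
    log_step("Opened dataset 'Niskin bottle samples' https://www.bco-dmo.org/dataset/3782")
    log_step("Used Subset Data: cruise 314, cast 005; downloaded as CSV")
```

Because the log is only ever appended to, it reads as a dated, chronological record of what you did.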

Solution

Image: subsetting solution

You can use this link to download the subset of data for the exercise from BCO-DMO, or you can download the CSV file from this lesson.
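Whichever way you obtain the file, it helps to record exactly which file you analysed. One common approach, sketched here in Python, is to store a SHA-256 checksum of the downloaded CSV together with its source and the download date; if the file ever changes, its checksum will no longer match. The file name `bats_niskin_subset.csv` and the source text are placeholders, not the names BCO-DMO actually uses.

```python
import hashlib
from datetime import date
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def describe_download(path: Path, source: str) -> str:
    """Build a one-line provenance note: date, file name, source, and checksum."""
    return f"{date.today().isoformat()}\t{path.name}\tsource={source}\tsha256={sha256_of(path)}"


if __name__ == "__main__":
    # Placeholder file name; replace it with the CSV you actually downloaded
    csv_path = Path("bats_niskin_subset.csv")
    if csv_path.exists():
        print(describe_download(csv_path, "BCO-DMO dataset 3782, cruise 314, cast 005"))
    else:
        print(f"{csv_path} not found; download the subset first")
```

The printed line can be pasted straight into the text file where you are writing down your steps.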

Next steps

  • Make a new sheet for any data you change during your analysis, so the original download stays untouched.
  • Invert the depth axis so that depth 0 (the surface) is at the top of the plot.

Image: analysis sheet in Excel
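In a script, the equivalent of a new sheet is a new output file: read the raw CSV, change a copy, and write the result under a different name so the original download is never overwritten. This is a minimal sketch; the helper name `write_cleaned_copy`, the column name `depth`, and the cleaning step itself (dropping rows whose depth is not a number, such as a no-data marker, then sorting by depth) are assumptions for illustration.

```python
import csv
from pathlib import Path


def write_cleaned_copy(raw_path: Path, out_path: Path, depth_col: str = "depth") -> int:
    """Copy rows with a numeric depth from raw_path to out_path, sorted by depth.

    The raw file is only read, never modified. Returns the number of rows written.
    """
    if raw_path.resolve() == out_path.resolve():
        raise ValueError("Refusing to overwrite the raw data file")
    with raw_path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = []
        for row in reader:
            try:
                row["_depth"] = float(row[depth_col])  # temporary sort key
            except (TypeError, ValueError):
                continue  # skip missing or non-numeric depths
            rows.append(row)
    rows.sort(key=lambda r: r["_depth"])
    with out_path.open("w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" drops the temporary _depth key on output
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

When you later plot the cleaned file, the depth inversion still has to be applied in the plot itself; in matplotlib, for example, `ax.invert_yaxis()` puts depth 0 at the top.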

Anyone can create metadata

You don’t need any special skills to write metadata and documentation to keep track of your provenance.

However, there are specifications and tools worth learning, and they bring large benefits. See, for example, the W3C PROV specification for describing provenance.
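PROV describes provenance in terms of entities (things such as data files), activities (things done to them), and agents (who did them). As a rough illustration only, and not a PROV-conformant serialization such as PROV-JSON, the sketch below records one analysis step using those three roles and PROV's relation names. The function name, file names, and agent are hypothetical.

```python
import json
from datetime import datetime, timezone


def prov_style_record(input_file: str, output_file: str, activity: str, agent: str) -> dict:
    """Describe one step with PROV's core terms: entity, activity, agent.

    A simplified dictionary inspired by W3C PROV, not a PROV-JSON document.
    """
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return {
        "entity": {"input": input_file, "output": output_file},
        "activity": {"name": activity, "endedAtTime": now},
        "agent": agent,
        # Relations named after PROV terms
        "used": input_file,          # the activity used the input entity
        "wasGeneratedBy": activity,  # the output entity was generated by the activity
        "wasAssociatedWith": agent,  # the activity was carried out by the agent
    }


if __name__ == "__main__":
    rec = prov_style_record("bats_niskin_subset.csv", "bats_niskin_cleaned.csv",
                            "drop rows with missing depth", "Your Name")
    print(json.dumps(rec, indent=2))
```

Even this informal structure makes it explicit which file each result came from and who produced it; dedicated PROV tooling adds a standard vocabulary on top of the same idea.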

Version control (e.g., Git and GitHub) is a great way to keep track of every change to your files. It has a learning curve, but once you learn it, it will save you time and frustration in the long run.

I’m sure everyone has experienced this frustration:

Image: version control meme

From: Wit and wisdom from Jorge Cham (http://phdcomics.com/)

Git keeps track of all the differences in your files over time, so there is no need to keep a million copies! You can also attach a note (a commit message) to each version of your files.
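Git itself is a command-line tool, but the idea of recording differences rather than whole copies can be illustrated with Python's standard `difflib` module. This sketch compares two versions of a small notes file; it is not how Git stores history internally, only a picture of the kind of line-by-line difference that `git diff` shows you. The file names are made up.

```python
import difflib

# Two versions of the same notes file, as lists of lines
notes_v1 = [
    "Downloaded cruise 314, cast 005\n",
    "Made new sheet\n",
]
notes_v2 = [
    "Downloaded cruise 314, cast 005\n",
    "Made new sheet\n",
    "Inverted depth axis\n",
]

# unified_diff yields only what changed, plus a little surrounding context
diff = list(difflib.unified_diff(notes_v1, notes_v2,
                                 fromfile="notes_v1.txt", tofile="notes_v2.txt"))
print("".join(diff), end="")
```

The line prefixed with `+` is the only new information; everything else is shared between the two versions.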

Learn more about version control and Git in the Software Carpentry lesson “Version Control with Git”.

Open new doors with a programming language

Like version control, learning a programming language has a learning curve, but the benefits once you learn it are substantial. It will open up many doors for your current research, and it is a valuable skill that is in demand in many fields, both inside and outside research.

There are many online resources for learning a programming language; a good place to start is the Software Carpentry lessons: https://software-carpentry.org/lessons/

Python Example

Here is an example Python notebook, fully reproducible, that does exactly what we just did manually to create that plot: BATS niskin subset example notebook.

Image: analysis sheet in Excel
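The notebook itself is not reproduced here. As a rough, hypothetical sketch of the same idea, the script below reads a CSV, plots one variable against depth, and inverts the y-axis so the surface (depth 0) is at the top. The function name `plot_profile` and the column names `depth` and `temp` are assumptions; check the header of the file you downloaded.

```python
import csv
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt


def plot_profile(csv_path: Path, out_png: Path, x_col: str = "temp", depth_col: str = "depth"):
    """Plot x_col against depth, invert the depth axis, save a PNG, and return the Axes."""
    depths, values = [], []
    with csv_path.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            try:
                d, v = float(row[depth_col]), float(row[x_col])
            except (TypeError, ValueError):
                continue  # skip rows with missing or non-numeric values
            depths.append(d)
            values.append(v)
    fig, ax = plt.subplots()
    ax.plot(values, depths, marker="o")
    ax.invert_yaxis()  # depth 0 (the surface) at the top
    ax.set_xlabel(x_col)
    ax.set_ylabel(depth_col)
    fig.savefig(out_png)
    plt.close(fig)  # free the figure; the returned Axes can still be inspected
    return ax
```

Because every step lives in the script, rerunning it on the same input file reproduces the same plot, which is the main advantage over the manual spreadsheet workflow.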

Key Points

  • Provenance should be captured while you work with your data.