Provenance Walkthrough
Overview
Teaching: 5 min
Exercises: 0 min
Questions
Objectives
Walk through provenance capture: download a dataset and write provenance during analysis.
Live Demo: Download a dataset, write provenance during analysis.
- Subset dataset “Niskin bottle samples” https://www.bco-dmo.org/dataset/3782
- Click the Subset Data button at the top of the page.
- Download a subset of this dataset containing cruise 314, cast 005.
- Open a text editor and write down every step you take. Make sure to record the date on which you handle the data.
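If you would rather keep the log from a script than type it by hand, the same habit can be sketched in a few lines of Python. This is only an illustration, not part of the lesson's materials: the helper name `log_step` and the file name `provenance_log.txt` are assumptions made for the example.

```python
from datetime import date


def log_step(log_path, description, when=None):
    """Append one dated provenance entry to a plain-text log file."""
    # Default to today's date so every entry records when the data was handled.
    when = when or date.today()
    entry = f"{when.isoformat()}  {description}\n"
    # Mode "a" appends, so earlier entries are never overwritten.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(entry)
    return entry


# Hypothetical log file name; the descriptions mirror the steps listed above.
log_step("provenance_log.txt",
         "Opened https://www.bco-dmo.org/dataset/3782 (Niskin bottle samples).")
log_step("provenance_log.txt",
         "Used the Subset Data button to download cruise 314, cast 005.")
```

Each call adds one line to the file, so the log grows in the order you worked. A plain text file like this is enough; nothing here requires a special provenance tool.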
Solution
You can use this link to download the subset of data for the exercise from BCO-DMO, or you can download the CSV file from this lesson.
Next steps
- Make a new sheet for the data you change during your analysis.
- Invert the depth axis so that a depth of 0 (the surface) appears at the top of the plot.
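For readers working in Python rather than a spreadsheet, the axis inversion can be sketched with matplotlib as below. This is a minimal sketch under stated assumptions: the function name `plot_depth_profile`, the output file name, and the numbers are placeholders chosen for illustration, and the numbers are not values from the BCO-DMO dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt


def plot_depth_profile(depths_m, values, value_label):
    """Plot values against depth, with the surface (depth 0) at the top."""
    fig, ax = plt.subplots()
    ax.plot(values, depths_m, marker="o")
    # Depth increases downward in the water column, so flip the y-axis.
    ax.invert_yaxis()
    ax.set_xlabel(value_label)
    ax.set_ylabel("Depth (m)")
    return fig, ax


# Placeholder numbers for illustration only; NOT taken from the dataset.
depths_m = [0, 10, 20, 50, 100]
values = [5, 4, 3, 2, 1]

fig, ax = plot_depth_profile(depths_m, values, "Measured value (placeholder)")
fig.savefig("depth_profile.png")  # hypothetical output file name
```

Whichever tool you use, note the inversion in your provenance log so that someone reading the plot later knows the axis was flipped deliberately.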
Anyone can create metadata
You don’t need any special skills to write metadata and documentation to keep track of your provenance.
However, there are specifications and tools you can learn that have huge benefits. See more about metadata specifications like PROV.
Version control (e.g. Git/GitHub) is a great way to keep track of all the changes in your files. It has a learning curve, but once you learn it, it will save you time and frustration in the long run.
I’m sure everyone has experienced this frustration:
from: Wit and wisdom from Jorge Cham (http://phdcomics.com/)
Git keeps track of all the differences in your files over time, so there is no need to keep a million copies! You can make notes for each version of your files too.
Learn more about Version Control and Git in a Software Carpentry lesson.
Open new doors with a programming language
Like version control, learning a programming language has a learning curve. But the benefits after you learn it will be substantial. It will open up a lot of doors for your current research, and you will have a valuable skill that is in demand in many fields including research.
There are many resources online for learning a programming language, but you can start with the “Software Carpentry” lessons at https://software-carpentry.org/lessons/
Python Example
An example Python notebook that is fully reproducible and does exactly what we just did manually to create that plot: BATS niskin subset example notebook.
Key Points
Provenance should be captured while you work with your data.