BCO-DMO Data Access
Overview
Teaching: 10 min
Exercises: 5 min
Questions
How do I search for data in ERDDAP?
How can I subset a dataset?
How do I download a dataset?
Objectives
Downloading data with ERDDAP
Downloading data using the dataset buttons
Where are we in the data life cycle?
What is ERDDAP?
When scientists make their data available online for reuse, barriers can still stand in the way. Reusing data from another source is difficult because:
- each source has its own way of requesting data
- formats differ: you may work in R while one colleague works in MATLAB and another in Python
- metadata needs to be standardized
This is where ERDDAP comes in. It is a data server that gives users a simple, consistent way to download subsets of gridded and tabular scientific datasets in common file formats, and to make graphs and maps.
There is no single ERDDAP server. Instead, organizations and repositories run their own ERDDAP servers to distribute data to end users, who can request data and receive it in various file formats. Many institutes, repositories, and organizations (including NOAA, NASA, and USGS) run ERDDAP servers to serve their data.
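To make the "various file formats" point concrete, here is a minimal sketch of how one tabular dataset can be addressed in several formats on an ERDDAP server: only the file extension in the URL changes. The server address and the dataset ID `example_dataset_id` are assumptions for illustration, and the helper `format_urls` is hypothetical, not part of ERDDAP.

```python
def format_urls(server, dataset_id, file_types=("csv", "json", "nc", "mat")):
    """Return one tabledap URL per requested file type for a dataset."""
    base = f"{server.rstrip('/')}/tabledap/{dataset_id}"
    return {file_type: f"{base}.{file_type}" for file_type in file_types}


# Assumed server address and a placeholder dataset ID.
urls = format_urls("https://erddap.bco-dmo.org/erddap", "example_dataset_id")
for file_type, url in urls.items():
    print(file_type, url)
```

A URL without any query, as built here, asks for the entire table, so for large datasets it is usually better to add a subset query first.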
BCO-DMO has its own ERDDAP server that is continuously being updated. We added ERDDAP badges to make it easy for new users to grab the dataset in the format they need.
Downloading a Dataset
You can access data from the ERDDAP server, but you can also download a whole dataset from the Dataset Metadata Page itself, which has buttons for downloading the data in many file formats.
[Figure: Dataset AE1910 CTD Profiles]
You can click the CSV button to download the data table in CSV format. You can then open it in the editor of your choice. Below is what it looks like in Excel.
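A downloaded CSV file can also be read programmatically rather than in a spreadsheet editor. The sketch below uses Python's standard `csv` module on a small inline sample that stands in for the downloaded file. The column names and values are made up for illustration and are not taken from the AE1910 dataset.

```python
import csv
import io

# Inline stand-in for a downloaded file; columns and values are hypothetical.
sample_csv = io.StringIO(
    "station,depth,temperature\n"
    "1,5.0,24.1\n"
    "1,10.0,23.8\n"
)


def read_rows(handle):
    """Read a CSV file handle into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


rows = read_rows(sample_csv)
depths = [float(row["depth"]) for row in rows]
print(len(rows), max(depths))
```

For a real download, the same function can be called on an opened file, for example `with open("downloaded.csv", newline="") as f: rows = read_rows(f)`, where the file name is whatever you saved the download as.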
Subsetting Data
For this example, we’ll zoom in on the BATS CTD dataset that BCO-DMO is serving. The dataset landing page can be found here: https://www.bco-dmo.org/dataset/3918
This dataset has data from 1988 to 2016, so it is very large. Clicking the “view table” button tries to pull up the full data table, which is slow to load and hard to download.
An easier way to download the data is to subset it, which means taking only the slice of the dataset that you are interested in.
The ERDDAP subset page has two important parts:
- Subsetting variables: Here you can set which variables you want to download.
- Download: Here you can choose from many different formats to download the dataset, which is the true power of ERDDAP. We will download the dataset as .csvp, a CSV file whose single header line holds the variable names and units.
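The two parts of the subset page correspond to the two parts of an ERDDAP tabledap request URL: a list of variables, followed by constraints, with the file type as the extension. Below is a minimal sketch of assembling such a URL. The server address, the dataset ID `bcodmo_dataset_3918`, and the variable names `time`, `depth`, and `temp` are assumptions for illustration, so check the subset page for the real names. The helper `build_subset_url` is hypothetical.

```python
from urllib.parse import quote


def build_subset_url(server, dataset_id, variables, constraints=(), file_type="csvp"):
    """Assemble a tabledap URL: variables first, then one '&constraint' each."""
    query = ",".join(variables)
    for constraint in constraints:
        # Percent-encode characters such as '>' and '<'; keep '=' readable.
        query += "&" + quote(constraint, safe="=")
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.{file_type}?{query}"


# Assumed server, dataset ID, and variable names; one year of data as the slice.
url = build_subset_url(
    "https://erddap.bco-dmo.org/erddap",
    "bcodmo_dataset_3918",
    variables=["time", "depth", "temp"],
    constraints=["time>=2010-01-01T00:00:00Z", "time<2011-01-01T00:00:00Z"],
)
print(url)
```

Changing `file_type` to another format such as `csv` or `json` requests the same slice in a different format, which is the same choice the Download part of the page offers.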
Key Points
Data can be downloaded in different file formats based on user needs
ERDDAP helps convert data to the needed format
Datasets can be subset for easier use