2 Data Retrieval

2.1 Obtaining data summaries

Data in the CMAP database exists in tables, and can be accessed by specific table names, and variable names therein. For instance, sea surface temperature is from the table "tblSST_AVHRR_OI_NRT", and the variable name is "sst".

Some simpler functions for accessing data are useful for obtaining summaries of the data prior to downloading the data itself.

  • Name of variables (get_columns())
  • Type of variables, i.e., quantitative, qualitative, time, (get_var_catalog())
  • Space-time range information (get_var_coverage),
  • Numerical variable range summary (get_var_stat()).

These are used in the various following functions, which are designed to summarize data tables by extracting these table attributes. This is useful for learning about the tables without downloading them prior to analysis. Here are some key examples (the piping using %>% to kable() is just for formatting purposes):

get_head("tblArgoMerge_REP") %>% kable() 
float_id cycle time lat lon depth position_qc direction data_mode data_centre argo_merge_cdnc argo_merge_cdnc_qc argo_merge_cdnc_adj argo_merge_cdnc_adj_qc argo_merge_cdnc_adj_err argo_merge_pressure argo_merge_pressure_qc argo_merge_pressure_adj argo_merge_pressure_adj_qc argo_merge_pressure_adj_err argo_merge_salinity argo_merge_salinity_qc argo_merge_salinity_adj argo_merge_salinity_adj_qc argo_merge_salinity_adj_err argo_merge_temperature argo_merge_temperature_qc argo_merge_temperature_adj argo_merge_temperature_adj_qc argo_merge_temperature_adj_err argo_merge_O2 argo_merge_O2_qc argo_merge_O2_adj argo_merge_O2_adj_qc argo_merge_O2_adj_err argo_merge_bbp argo_merge_bbp_qc argo_merge_bbp_adj argo_merge_bbp_adj_qc argo_merge_bbp_adj_err argo_merge_bbp470 argo_merge_bbp470_qc argo_merge_bbp470_adj argo_merge_bbp470_adj_qc argo_merge_bbp470_adj_err argo_merge_bbp532 argo_merge_bbp532_qc argo_merge_bbp532_adj argo_merge_bbp532_adj_qc argo_merge_bbp532_adj_err argo_merge_bbp700 argo_merge_bbp700_qc argo_merge_bbp700_adj argo_merge_bbp700_adj_qc argo_merge_bbp700_adj_err argo_merge_turbidity argo_merge_turbidity_qc argo_merge_turbidity_adj argo_merge_turbidity_adj_qc argo_merge_turbidity_adj_err argo_merge_cp argo_merge_cp_qc argo_merge_cp_adj argo_merge_cp_adj_qc argo_merge_cp_adj_err argo_merge_cp660 argo_merge_cp660_qc argo_merge_cp660_adj argo_merge_cp660_adj_qc argo_merge_cp660_adj_err argo_merge_chl argo_merge_chl_qc argo_merge_chl_adj argo_merge_chl_adj_qc argo_merge_chl_adj_err argo_merge_cdom argo_merge_cdom_qc argo_merge_cdom_adj argo_merge_cdom_adj_qc argo_merge_cdom_adj_err argo_merge_NO3 argo_merge_NO3_qc argo_merge_NO3_adj argo_merge_NO3_adj_qc argo_merge_NO3_adj_err argo_merge_bisulfide argo_merge_bisulfide_qc argo_merge_bisulfide_adj argo_merge_bisulfide_adj_qc argo_merge_bisulfide_adj_err argo_merge_ph argo_merge_ph_qc argo_merge_ph_adj argo_merge_ph_adj_qc argo_merge_ph_adj_err argo_merge_down_irr argo_merge_down_irr_qc argo_merge_down_irr_adj argo_merge_down_irr_adj_qc argo_merge_down_irr_adj_err argo_merge_down_irr380 argo_merge_down_irr380_qc argo_merge_down_irr380_adj argo_merge_down_irr380_adj_qc argo_merge_down_irr380_adj_err argo_merge_down_irr412 argo_merge_down_irr412_qc argo_merge_down_irr412_adj argo_merge_down_irr412_adj_qc argo_merge_down_irr412_adj_err argo_merge_down_irr443 argo_merge_down_irr443_qc argo_merge_down_irr443_adj argo_merge_down_irr443_adj_qc argo_merge_down_irr443_adj_err argo_merge_down_irr490 argo_merge_down_irr490_qc argo_merge_down_irr490_adj argo_merge_down_irr490_adj_qc argo_merge_down_irr490_adj_err argo_merge_down_irr555 argo_merge_down_irr555_qc argo_merge_down_irr555_adj argo_merge_down_irr555_adj_qc argo_merge_down_irr555_adj_err argo_merge_up_irr argo_merge_up_irr_qc argo_merge_up_irr_adj argo_merge_up_irr_adj_qc argo_merge_up_irr_adj_err argo_merge_up_irr412 argo_merge_up_irr412_qc argo_merge_up_irr412_adj argo_merge_up_irr412_adj_qc argo_merge_up_irr412_adj_err argo_merge_up_irr443 argo_merge_up_irr443_qc argo_merge_up_irr443_adj argo_merge_up_irr443_adj_qc argo_merge_up_irr443_adj_err argo_merge_up_irr490 argo_merge_up_irr490_qc argo_merge_up_irr490_adj argo_merge_up_irr490_adj_qc argo_merge_up_irr490_adj_err argo_merge_up_irr555 argo_merge_up_irr555_qc argo_merge_up_irr555_adj argo_merge_up_irr555_adj_qc argo_merge_up_irr555_adj_err argo_merge_down_par argo_merge_down_par_qc argo_merge_down_par_adj argo_merge_down_par_adj_qc argo_merge_down_par_adj_err year month week dayofyear
2900114 189 2011-05-11 -59.138 95.245 1040.0 1 A D AO NA NA NA NA NA 1040.0 1 1040.4 1 2.4 34.739 1 34.739 1 0.01 1.544 1 1.544 1 0.002 184.1 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2011 5 20 131
2900114 189 2011-05-11 -59.138 95.245 1042.0 1 A D AO NA NA NA NA NA 1042.0 1 1042.4 1 2.4 34.739 1 34.739 1 0.01 1.542 1 1.542 1 0.002 184.2 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2011 5 20 131
2900114 189 2011-05-11 -59.138 95.245 1044.0 1 A D AO NA NA NA NA NA 1044.0 1 1044.4 1 2.4 34.739 1 34.739 1 0.01 1.541 1 1.541 1 0.002 184.3 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2011 5 20 131
2900114 189 2011-05-11 -59.138 95.245 1046.1 1 A D AO NA NA NA NA NA 1046.1 1 1046.5 1 2.4 34.739 1 34.739 1 0.01 1.542 1 1.542 1 0.002 184.3 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2011 5 20 131
2900114 189 2011-05-11 -59.138 95.245 1048.0 1 A D AO NA NA NA NA NA 1048.0 1 1048.4 1 2.4 34.739 1 34.739 1 0.01 1.542 1 1.542 1 0.002 184.3 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2011 5 20 131
get_var_coverage("tblAltimetry_REP", "sla") %>% kable()
Time_Min Time_Max Lat_Min Lat_Max Lon_Min Lon_Max Depth_Min Depth_Max
1993-01-01 2018-06-10 -89.875 89.875 -179.875 179.875 NA NA
get_var_resolution("tblAltimetry_REP", "sla") %>% kable()
Temporal_Resolution Spatial_Resolution
Daily 1/4° X 1/4°
get_var_unit("tblAltimetry_REP", "sla") %>% kable()
Unit
m
get_columns('tblSST_AVHRR_OI_NRT') %>% kable()
Columns
lat
lon
time
sst
year
month
week
dayofyear

2.2 Cruises

Out of the many data tables in CMAP, some are data collected and processed from cruises. Here are some examples of summarizing this data.

cruisename = "KOK1606" 
get_cruise_by_name(cruisename) %>% kable()
ID Nickname Name Ship_Name Start_Time End_Time Lat_Min Lat_Max Lon_Min Lon_Max Chief_Name
589 Gradients_1 KOK1606 R/V Kaimikai O Kanaloa 2016-04-20 00:04:37 2016-05-04 02:33:45 21.4542 37.8864 -158.3355 -157.858 Virginia Armbrust
get_cruise_bounds(cruisename) %>% kable()
ID dt1 dt2 lat1 lat2 lon1 lon2
589 2016-04-20 00:04:37 2016-05-04 02:33:45 21.4542 37.8864 -158.3355 -157.858
get_head('tblAMT13_Chisholm') %>% kable()
time lat lon depth Pro_abundance Pro_total_ecotype_abundance Pro_eMED4_abundance Pro_eMIT9312_abundance Pro_eMIT9211_abundance Pro_eNATL2A_abundance Pro_eSS120_abundance Pro_eMIT9313_abundance Syn_abundance density_AMT13 temp_C_AMT13 csal_ppt_AMT13 Light_Quanta_m2_sec_AMT13 chlA_AMT13 DCM_AMT13 MLD_05C_AMT13 MLD_25C_AMT13 MLD_03Density_AMT13 MLD_125Density_AMT13 TotalDepth_AMT13 Pro_eMED4_abundance_quality Pro_eMIT9312_abundance_quality Pro_eMIT9211_abundance_quality Pro_eNATL2A_abundance_quality Pro_eSS120_abundance_quality Pro_eMIT9313_abundance_quality Pro_total_ecotype_abundance_quality
2003-09-16 09:55:00 47.983 -11.539 9.280 0.9206000 96901.054 95878.590 96.139927 1.000000 923.3245 1.0000000 1.0000000 0.2825333 NA 17.8958 35.6425 45.652811 0.0761 NA NA NA NA NA NA NA NA 0 NA 0 0 0
2003-09-16 09:55:00 47.983 -11.539 43.017 0.2690000 21001.014 19852.153 24.226982 1.000000 1121.9211 0.7138027 1.0000000 0.4385333 NA 16.3019 35.6470 9.754238 0.1072 NA NA NA NA NA NA NA NA 0 NA NA 0 0
2003-09-16 09:55:00 47.983 -11.539 53.629 0.1236000 4502.291 3847.637 6.132032 1.000000 645.5221 1.0000000 1.0000000 0.2341333 NA 15.2485 35.5911 4.237555 0.5024 NA NA NA NA NA NA NA NA 0 NA 0 0 0
2003-09-16 09:55:00 47.983 -11.539 62.327 0.0954667 2468.064 2120.974 4.158969 3.010477 338.6858 0.2588225 0.9763928 0.1315333 NA 14.6576 35.5787 2.620675 0.2852 NA NA NA NA NA NA NA NA NA NA NA NA NA
2003-09-16 09:55:00 47.983 -11.539 78.587 0.0576667 2315.213 1420.991 2.133862 1.000000 889.0881 1.0000000 1.0000000 0.0554000 NA 13.8446 35.5778 1.134332 0.1979 NA NA NA NA NA NA NA NA 0 NA 0 0 0
get_columns('tblAMT13_Chisholm') %>% head(10) %>% kable()
Columns
time
lat
lon
depth
Pro_abundance
Pro_total_ecotype_abundance
Pro_eMED4_abundance
Pro_eMIT9312_abundance
Pro_eMIT9211_abundance
Pro_eNATL2A_abundance

As you can see, each table contains many data "rows", which are indexed by "keys" of the four values:

(Time, Latitude, Longitude, Depth)

Most functions in this package will involve using these data key values, or user-specified ranges of these keys, for certain operations. The most basic operation, explained next, is to download a dataset from a time and space range of interest.

2.3 Download data

When retrieving all data from CMAP in a "rectangle" of space and time, specifies the following:

  • Name of table (table)
  • Names of variables in that table (variable)
  • Upper and lower limits of the variables (dt1, dt2, lon1, lon2, lat1, lat2, depth1, depth2)

These are used in the space_time() function, which downloads the data as a data frame.

get_spacetime(tableName = 'tblArgoMerge_REP',
              varName = 'argo_merge_salinity_adj',
              dt1='2015-05-01',
              dt2='2015-05-30',
              lat1=28,
              lat2=38,
              lon1=-71,
              lon2=-50,
              depth1=0,
              depth2=100) %>% head(10) %>% kable()
time lat lon depth argo_merge_salinity_adj
2015-05-02 32.348 -63.45 4.3 36.71301
2015-05-02 32.348 -63.45 6.0 36.71301
2015-05-02 32.348 -63.45 7.6 36.71099
2015-05-02 32.348 -63.45 8.0 36.71101
2015-05-02 32.348 -63.45 10.0 36.70300
2015-05-02 32.348 -63.45 11.7 36.69700
2015-05-02 32.348 -63.45 12.0 36.69701
2015-05-02 32.348 -63.45 14.0 36.69500
2015-05-02 32.348 -63.45 16.0 36.69401
2015-05-02 32.348 -63.45 16.7 36.69402

2.4 Colocalize data

First of all, what does it mean to colocalize? Imagine a ship moving along a trajectory, whose location in space and time is recorded in latitude/longitude/time. Then, we'd like to obtain data like sea surface temperature or salinity, as if we were recording it along the ship's trajectory. Colocalization simply means retrieving such data manually from the database, by querying from the vicinity (a rectangle) of each lat/lon/time triplet of interest, and summarizing it (e.g. taking the average).

The size of the vicinity is define by the user-specified "slack" in longitude, latitude and time. For instance, one could ask for 1 degrees of slack in longitude and latitude, and 6 hours of slack in time.

The main function is along_track, which colocalizes data for one or multiple tables and variables at a time. The available cruises can be found by running the function cruises(). The available data tables and variable names are listed in the catalog (get_catalog(), or from https://cmap.readthedocs.io/en/latest/catalog/catalog.html). Here is a short example:

## cruise = 'Diel'
cruise = 'MGL1704'
targetTables = c('tblSeaFlow', 'tblPisces_NRT')
targetVars = c('abundance_synecho', 'NO3')
## targetVars = c('synecho_abundance', 'NO3')
depth1 = 0
depth2 = 5
temporalTolerance = c(0, 10)
latTolerance = c(0, 5)##0.25)
lonTolerance = c(0, 5)##0.25)
depthTolerance = c(5, 5)
dat = along_track(cruise[1],
                  targetTables[1],
                  targetVars[1],
                  depth1[1],
                  depth2[1],
                  temporalTolerance[1],
                  latTolerance[1],
                  lonTolerance[1],
                  depthTolerance[1])
## 
  |                                                                                                                              
  |                                                                                                                        |   0%
head(dat) %>% kable()
time lat lon abundance_synecho abundance_synecho_std
2017-05-26 22:34:53 21.3164 -157.8767 NA NA
2017-05-26 22:37:53 21.3164 -157.8767 NA NA
2017-05-26 22:40:53 21.3164 -157.8767 NA NA
2017-05-26 22:43:54 21.3164 -157.8767 NA NA
2017-05-26 22:46:54 21.3164 -157.8767 NA NA
2017-05-26 22:49:54 21.3164 -157.8767 NA NA

2.5 Issue SQL query

(For advanced users) Simons CMAP datasets are hosted in a SQL database and the CMAP4R R package provides the user with a number of pre-developed methods to extract and retrieve subsets of the data. The rest of this documentation is dedicated to explore and explain these methods. In addition to the pre-developed functions described above, you may issue custom SQL query statement and returns the results in form of a data frame.

dat = exec_manualquery("SELECT [time], lat, lon, depth, Fe FROM tblPisces_NRT
         WHERE
         [time] BETWEEN '2017-06-03' AND '2017-06-03' AND
         lat BETWEEN 10 AND 55 AND
         lon BETWEEN -180 AND 100 AND
         depth BETWEEN 0 AND 0.5
         ORDER BY [time], lat, lon, depth") 
dat %>% head(10) %>% kable()
time lat lon depth Fe
2017-06-03 10 -180.0 0.494025 1.5e-05
2017-06-03 10 -179.5 0.494025 1.5e-05
2017-06-03 10 -179.0 0.494025 1.5e-05
2017-06-03 10 -178.5 0.494025 1.5e-05
2017-06-03 10 -178.0 0.494025 1.5e-05
2017-06-03 10 -177.5 0.494025 1.5e-05
2017-06-03 10 -177.0 0.494025 1.5e-05
2017-06-03 10 -176.5 0.494025 1.5e-05
2017-06-03 10 -176.0 0.494025 1.5e-05
2017-06-03 10 -175.5 0.494025 1.5e-05