MarineSPEED: Marine SPEcies with Environment Dataset

About

MarineSPEED is a benchmark dataset for presence-only species distribution modelling. It contains occurrence records for 514 marine species, linked to environmental data from the 71 present-day climate layers of Bio-ORACLE and MARSPEC.

R package "marinespeed"

An R package for downloading and working with MarineSPEED is available on CRAN and on GitHub. For more information, see the R package documentation or the README, or contact me at mail@samuelbosch.com.

# Installation from CRAN
install.packages("marinespeed")
# or from GitHub
devtools::install_github("samuelbosch/marinespeed")

Example usage:

library(marinespeed)

## list of all species
species <- list_species()
View(species)

## count the number of occurrences for all species
get_occ_count <- function(speciesname, occ) {
  nrow(occ)
}
record_counts <- lapply_species(get_occ_count)
print(sum(unlist(record_counts)))

## function that plots the training and test occurrences of one fold
plot_occurrences <- function(speciesname, data, k) {
  title <- paste0(speciesname, " (fold = ", k, ")")
  plot(data$occurrence_train[,c("longitude", "latitude")], pch=".", col="blue", main = title)
  points(data$occurrence_test[,c("longitude", "latitude")], pch=".", col="red")
}

# plot training (blue) and test (red) occurrences of the first 2 disc folds
# for the first 10 species
species <- list_species()
lapply_kfold_species(plot_occurrences, species = species[1:10, ],
                     fold_type = "disc", k = 1:2)

Direct download

All MarineSPEED data can also be downloaded directly from the following links.

  • species data
    • species.csv.gz
    • traits.csv.gz
    • occurrences.zip
    • occurrences_raw.zip
  • random_background.zip
  • targetgroup_background.zip
  • random folds
    • random_species_5cv_folds.csv.gz
    • random_background_5cv_folds.csv.gz
    • targetgroup_background_5cv_folds.csv.gz
  • pseudo-disc (spatial) folds
    • pseudodisc_species_5cv_folds.csv.gz
    • pseudodisc_background_5cv_folds.csv.gz
  • grid (spatial) 4-fold
    • grid_species_4cv_folds.csv.gz
    • grid_background_4cv_folds.csv.gz
  • grid (spatial) 9-fold
    • grid_species_9cv_folds.csv.gz
    • grid_background_9cv_folds.csv.gz
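The fold files assign records to cross-validation folds: for fold k, the records in fold k serve as the test set and all remaining records as the training set. The sketch below illustrates that split on a small synthetic table. It assumes a hypothetical layout with one row per occurrence and an integer `fold` column; the actual columns of the `*_folds.csv.gz` files may differ, so inspect them before adapting this code.

```r
# Synthetic occurrence table. The column names (species, longitude,
# latitude, fold) are illustrative assumptions, not the documented
# layout of the MarineSPEED fold files.
set.seed(42)
occ <- data.frame(
  species   = "Example species",
  longitude = runif(20, -10, 10),
  latitude  = runif(20, 40, 60),
  fold      = rep(1:5, length.out = 20)
)

# Hypothetical helper: records assigned to fold k are held out for
# testing, all other records are used for training.
split_fold <- function(occ, k) {
  test <- occ$fold == k
  list(train = occ[!test, , drop = FALSE],
       test  = occ[test, , drop = FALSE])
}

# One train/test split per fold, as in 5-fold cross-validation
splits <- lapply(sort(unique(occ$fold)), function(k) split_fold(occ, k))

# Number of training and test records per fold
sapply(splits, function(s) c(train = nrow(s$train), test = nrow(s$test)))
```

In R, `read.csv()` can read the gzip-compressed `.csv.gz` files directly, because the underlying file connection detects the compression when the file is opened for reading.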


Sources

Distribution records

The occurrence records were originally sourced from GBIF (datasets), OBIS (datasets), Reef Life Survey, the INVASIVES project and personal communications.

Environmental data

The associated environmental data was extracted from Bio-ORACLE and MARSPEC. The appropriate citations are:

  • Tyberghein L., Verbruggen H., Pauly K., Troupin C., Mineur F. & De Clerck O. (2012) Bio-ORACLE: a global environmental dataset for marine species distribution modeling. Global Ecology and Biogeography 21: 272-281. http://dx.doi.org/10.1111/j.1466-8238.2011.00656.x
  • Sbrocco E.J. & Barber P.H. (2013) MARSPEC: ocean climate layers for marine spatial ecology. Ecology 94: 979. http://dx.doi.org/10.1890/12-1358.1

Traits

Taxonomic information was retrieved from the World Register of Marine Species (WoRMS).

Sampling bias information was visually assessed.

Ecoregion data were derived from: Spalding M.D., Fox H.E., Allen G.R., Davidson N., Ferdaña Z.A., Finlayson M., Halpern B.S., Jorge M.A., Lombana A., Lourie S.A., Martin K.D., McManus E., Molnar J., Recchia C.A. & Robertson J. (2007) Marine Ecoregions of the World: a bioregionalization of coastal and shelf areas. BioScience 57: 573-583. http://dx.doi.org/10.1641/B570707



Created by Samuel Bosch.
Hosted by Flanders Marine Institute (VLIZ).
Based on R, Shiny, Leaflet, Bootstrap and Bootswatch.