Creating input configurations in gen3sis

29.08.2020

The configuration object defines the core processes of the system modeled and define some run settings. It is essential to define the model to be simulated.

This vignette will guide you trough the config input object. It can be either an R object or a file, that can be easily shared and modified.

Here we will:

1. Create an empty config using the write_config_skeleton function.

2. Explain an example config using the config from introduction vignette.

3. Modify a config from within R.

library(gen3sis)

1. Create an empty config

In order to create an empty config, we use the function write_config_skeleton and define the output location. For that we have to define where the empty config will be stored.

# get data path
datapath <- system.file(file.path("extdata", "EmptyConfig"), package="gen3sis")
# set config_empty.R file path
config_file_path <- file.path(tempdir(), "config_empty.R")

And then create an empty config.

#writes out a config skeleton
write_config_skeleton(config_file_path)

This is how the config_empty.R looks like. We present them by main areas and each part is described briefly. For more information and a hand’s on example see next session.

Content of config_empty.R

Metadata

# Version: 1.0
# Author:
# Date:
# Landscape:
# Publications:
# Description:

Settings

# set the random seed for the simulation.
random_seed = NA
# set the starting time step or leave NA to use the earliest/highest time-step.
start_time = NA
# set the end time step or leave as NA to use the latest/lowest time-step (0).
end_time = NA
# maximum total number of species in the simulation before it is aborted.
max_number_of_species = 25000
# maximum number of species within one cell before the simulation is aborted.
max_number_of_coexisting_species = 2500
# a list of traits to include with each species.
# a "dispersal" trait is implicitly added in any case.
trait_names = c("dispersal")
# ranges to scale the input environments with:
# not listed variable:         no scaling takes place
# listed, set to NA:           env. variable will be scaled from [min, max] to [0, 1]
# listed with a given range r: env. variable will be scaled from [r1, r2] to [0, 1]
environmental_ranges = list( )

Observer

end_of_timestep_observer = function(data, vars, config){
 # the list of all species can be found in data$all_species.
 # the current landscape can be found in data$landscape.
}

Initialization

# the initial abundance of a newly colonized cell, both during setup and later when 
# colonizing a cell during the dispersal.
initial_abundance = 1
# place species in the landscape:
create_ancestor_species <- function(landscape, config) {
stop("create the initial species here")
}

Core Processes

### Dispersal
# the maximum range to consider when calculating the distances from local distance inputs.
max_dispersal <- Inf
# returns n dispersal values.
get_dispersal_values <- function(n, species, landscape, config) {
 stop("calculate dispersal values here")
}
### Speciation
# threshold for genetic distance after which a speciation event takes place.
divergence_threshold = NULL
# factor by which the divergence is increased between geographically isolated population.
# can also be a matrix between the different population clusters.
get_divergence_factor <- function(species, cluster_indices, landscape, config) {
 stop("calculate divergence factor here")
}
### Evolution
# mutate the traits of a species and return the new traits matrix.
apply_evolution <- function(species, cluster_indices, landscape, config) {
 stop("mutate species traits here")
}
### Ecology
# called for every cell with all occurring species, this function calculates abundances and/or who survives for each sites
# returns a vector of abundances.
# set the abundance to 0 for every species supposed to die.
apply_ecology <- function(abundance, traits, environment, config) {
 stop("calculate species abundances and deaths here")
}

Note that the main core functions have stop warnings by default, this forces the configuration to be edited according to the rules and assumptions (model) that is to be created. You can do this from scratch or by modifying an existing config.

2. Explain example config

Here we will go bit by bit of the config used at the introduction vignette vignette. We will follow the structure presented above.

Content of config_.empty_worldcenter.R

Metadata

The first section of the config file contains metadata. It informs the user when and by whom the file has been created, which landscape it was made for and informs about associated publications and implemented processes or exploration intentions of the specific configuration.

# Version: 1.0
# Author: Oskar Hagen
# Date: 1.7.2020
# Landscape: WorldCenter
# Publications: R-package gen3sis
# Description: Example config used at the introduction vignette and similar to case study global configs in Hagen et al. 2020.
# O. Hagen, B. Flück, F. Fopp, J.S. Cabral, F. Hartig, M. Pontarp, T.F. Rangel, L. Pellissier. gen3sis: The GENeral Engine for Eco-Evolutionary SImulationS on the origins of biodiversity.

Settings

We start our simulation with the first available time-step and end it a the latest available time-step at the landscape, thus start_time and end_time are set to NA. We limit the maximum total number of species alive in a simulation to 50000 with max_number_of_species and the maximum number of species within one site to 10000 with max_number_of_coexisting_species. Like so, simulations that could possible go off hand and generate too many species are stoped. We define which traits we will consider in our simulation with traits_names, in our example, species only have a dispersal ability and a optimum temperature traits. For our model to work, we normalize the environmental values, according to the range informed at environmental_ranges.

random_seed = 001
start_time = NA
end_time = NA
max_number_of_species = 50000
max_number_of_coexisting_species = 10000
trait_names = c("temp",  "dispersal")
environmental_ranges = list("temp" = c(-45, 55), "area"=c(101067, 196949), "prec"=c(1,0.5))

Observer

With the observer function, changes over time in any abiotic or biotic information of the virtual world can be recorded by defining the outputs that are saved at specified time steps, and results can be saved and plotted in real-time as the model runs. In our observer function we save the species and plot the richness. Like so, plots are printed as the simulation progresses and also saved inside the plot folder inside output.

end_of_timestep_observer = function(data, vars, config){
 save_species()
 plot_richness(data$all_species, data$landscape)
}

Combined plots can be generated inside the observer function as for example:

end_of_timestep_observer = function(data, vars, config){
   par(mfrow=c(2,3))
   plot_raster_single(data$landscape$environment[,"temp"], data$landscape, "temp", NA)
   plot_raster_single(data$landscape$environment[,"prec"], data$landscape, "prec", NA)
   plot_raster_single(data$landscape$environment[,"area"], data$landscape, "area", NA)
   plot_richness(data$all_species, data$landscape)
   plot_species_presence(data$all_species[[1]], data$landscape)
   plot(0,type='n',axes=FALSE,ann=FALSE)
   mtext("STATUS",1)
}

Initialization

The initialization function creates the ancestor species at the start of the simulation. Users can define the number of ancestor species, their distribution within the ancient landscape and their trait values. In our example, the initial abundance of the ancestor species as well as the abundance of species on newly colonized sites is 1. We set one single ancestor species spread across the entire globe, i.e. between longitudes [-180, 180] and latitudes [-90, 90]. We set the temp trait of each population of the initial species to the environmental temperature. This means, that each population is adapted to the local conditions, as we will see later on the ecology function. Dispersal is set to one, since it is not ranging. The function create_ancestor_species is expected to return a list of new species, i.e. the ancestor(s) one(s).

initial_abundance = 1
create_ancestor_species <- function(landscape, config) {
 range <- c(-180, 180, -90, 90)
 co <- landscape$coordinates
 selection <- co[, "x"] >= range[1] &
   co[, "x"] <= range[2] &
   co[, "y"] >= range[3] &
   co[, "y"] <= range[4]
 initial_cells <- rownames(co)[selection]
 new_species <- create_species(initial_cells, config)
 #set local adaptation to max optimal temp equals local temp
 new_species$traits[ , "temp"] <- landscape$environment[,"temp"]
 new_species$traits[ , "dispersal"] <- 1
 return(list(new_species))
}

Core Processes

Dispersal

The dispersal function iterates over all species populations and determines the connectivity between sites and the colonization of new sites in the grid cell. In our example, the dispersal of all species is the same and is stochastic, following a weibull distribution with shape 6 and scale 999. For different dispersal between species, the dispersal trait should be used. Note that dispersal can be made deterministic by making the function return a fix value.

get_dispersal_values <- function(n, species, landscape, config) {
 values <- rweibull(n, shape = 6, scale = 999)
 return(values) }

The histogram of 100 draws from the dispersal function used above.

n <- 100
hist(rweibull(n, shape = 6, scale = 999), col="black")

Speciation

The speciation iterates over every species separately, registers populations’ geographic occupancy (species range), and determines when geographic isolation between population clusters is higher than a user-defined threshold, triggering a lineage splitting event of cladogenesis. The clustering of occupied sites is based on the species’ dispersal capacity and the landscape connection costs. Over time, disconnected clusters gradually accumulate incompatibility, analogous to genetic differentiation. When the divergence between clusters is above the speciation threshold, those clusters become two or more distinct species, and a divergence matrix reset follows. On the other hand, if geographic clusters come into secondary contact before the speciation occurs, they coalesce and incompatibilities are gradually reduced to zero. In our example, speciation takes place after 12 time-steps of isolation and the divergence increase is the same for all species as indicated by get_divergence_threshold. > >r >divergence_threshold = 12 #this is 2Myrs >get_divergence_factor <- function(species, cluster_indices, landscape, config) { > return(1) } >

Evolution

In the function bellow, clustered populations (exchanging genes) had their trait homogenized and weighted by abundance, meaning that a trait of a population that is doing well in a site, as dictated by the ecology function, will contribute more to the average trait of a cluster. Later, populations mutate based on a normal distribution with standard deviation 0.001, possibly increasing or decreasing species optimum temperature.

# mutate the traits of a species and return the new traits matrix
apply_evolution <- function(species, cluster_indices, landscape, config) {
 trait_evolutionary_power <- 0.001
 traits <- species[["traits"]]
 cells <- rownames(traits)
 #homogenize trait based on abundance
 for(cluster_index in unique(cluster_indices)){
   cells_cluster <- cells[which(cluster_indices == cluster_index)]
   mean_abd <- mean(species$abundance[cells_cluster])
   weight_abd <- species$abundance[cells_cluster]/mean_abd
   traits[cells_cluster, "temp"] <- mean(traits[cells_cluster, "temp"]*weight_abd)
 }
 #mutations
 mutation_deltas <-rnorm(length(traits[, "temp"]), mean=0, sd=trait_evolutionary_power)
 traits[, "temp"] <- traits[, "temp"] + mutation_deltas
 return(traits)
}

Ecology

The ecology function determines the abundance or presence of the populations in occupied sites of each species. The function iterates over all occupied sites and updates the species population abundances or presences on the basis of local environmental values, updated co-occurrence patterns and species traits. The function takes as input the species abundance, species trait, species divergence and clusters, and the landscape values. In our example, we calculate abundances in a site based on how close the species is to the site temperature. We scale the values to avoid small numbers and apply a carrying capacity based on aridity and temperature corrected by the area of a site. If abundances are below 1, species are considered extinct, if total abundance in a site is above the carrying capacity, small abundances are removed progressively and randomly distributed across the present species until total abundance is smaller or equal to the carrying capacity.

apply_ecology <- function(abundance, traits, landscape, config) {
 abundance_scale = 10
 abundance_threshold = 1
 #abundance treashold
 survive <- abundance>=abundance_threshold
 abundance[!survive] <- 0
 abundance <- (( 1-abs( traits[, "temp"] - landscape[, "temp"]))*abundance_scale)*as.numeric(survive)
 #abundance thhreashold
 abundance[abundance<abundance_threshold] <- 0
 k <- ((landscape[,"area"]*(landscape[,"prec"]+0.1)*(landscape[,"temp"]+0.1))*abundance_scale^2)
 total_ab <- sum(abundance)
 subtract <- total_ab-k
 if (subtract > 0) {
   # print(paste("should:", k, "is:", total_ab, "DIFF:", round(subtract,0) ))
   while (total_ab>k){
     alive <- abundance>0
     loose <- sample(1:length(abundance[alive]),1)
     abundance[alive][loose] <- abundance[alive][loose]-1
     total_ab <- sum(abundance)
   }
   #set negative abundances to zero
   abundance[!alive] <- 0
 }
 return(abundance)
}

3. Modify a config

A configuration object can be modified by editing a file or the config object in R. Here we will load the config_worldcenter.R as an R object and modify it’s random seed.

# get data path
datapath <- system.file(file.path("extdata", "WorldCenter"), package="gen3sis")
# creates config object from config file
config_object <- create_input_config(file.path(datapath, "config/config_worldcenter.R"))
# modify random seed
config_object$gen3sis$general$random_seed <- 2020

Like so, functions and parameters can be conveniently changed and simulated in gen3sis. Remember that run_simulation works either with a path to a config file or config object, so we can run the old and the new modified config.

Before using a new config, you can test if it is a valid one. If the function verify_config returns TRUE, you are good to go.

verify_config(config_object)
#> [1] TRUE
# run simulation by indicating config file path
sim_old <- run_simulation(config = file.path(datapath, "config/config_worldcenter.R"), 
               landscape = file.path(datapath, "landscape"),
               output_directory=tempdir())

# run simulation by indicating config object
sim_new <- run_simulation(config = config_object, 
               landscape = file.path(datapath, "landscape"),
               output_directory=tempdir())