3. Creating configurations (gen3sis input)

26.10.2021

The configuration object defines the core processes of the system modeled. By defining some run settings the configuration object is essential to define the model to be simulated and facilitates formalization of eco-evolutionary processes.

• Core functions: include a custom initialization, observer, speciation, dispersal, evolution and ecology functions. Altogether, these six functions are applied as defined in the simulation engine. The possibility to customize these functions confers the high flexibility and generality of gen3sis in terms of including a wide range of theoretical knowledge.
• Settings: include the ecological traits considered in the simulation; whether a random seed is used thus enabling simulation reproducibility; start and end times of the simulation; and rules about aborting the simulation, including the maximum global or local species number permitted.

This vignette will guide you trough the config input object. It can be either an R object or a file, that can be easily shared and modified.

Here we will:

1. Create an empty config using the write_config_skeleton function.

2. Explain an example config using the config from introduction vignette.

3. Modify a config from within R.

library(gen3sis)

1. Create an empty config

In order to create an empty config, we use the function write_config_skeleton and define the output location. For that we have to define where the empty config will be stored.

# get data path
datapath <- system.file(file.path("extdata", "EmptyConfig"), package="gen3sis")
# set config_empty.R file path
config_file_path <- file.path(tempdir(), "config_empty.R")

And then create an empty config.

#writes out a config skeleton
write_config_skeleton(config_file_path)

This is how the config_empty.R looks like. We present them by main areas and each part is described briefly. For more information and a hand’s on example, see next session.

Content of config_empty.R

Metadata

# Version: 1.0
# Author:
# Date:
# Landscape:
# Publications:
# Description:

Settings

# set the random seed for the simulation.
random_seed = NA
# set the starting time step or leave NA to use the earliest/highest time-step.
start_time = NA
# set the end time step or leave as NA to use the latest/lowest time-step (0).
end_time = NA
# maximum total number of species in the simulation before it is aborted.
max_number_of_species = 25000
# maximum number of species within one cell before the simulation is aborted.
max_number_of_coexisting_species = 2500
# a list of traits to include with each species.
# a "dispersal" trait is implicitly added in any case.
trait_names = c("dispersal")
# ranges to scale the input environments with:
# not listed variable:         no scaling takes place
# listed, set to NA:           env. variable will be scaled from [min, max] to [0, 1]
# listed with a given range r: env. variable will be scaled from [r1, r2] to [0, 1]
environmental_ranges = list( )

Observer

end_of_timestep_observer = function(data, vars, config){
# the list of all species can be found in data$all_species. # the current landscape can be found in data$landscape.
}

Initialization

# the initial abundance of a newly colonized cell, both during setup and later when
# colonizing a cell during the dispersal.
initial_abundance = 1
# place species in the landscape:
create_ancestor_species <- function(landscape, config) {
stop("create the initial species here")
}

Core Processes

### Dispersal
# the maximum range to consider when calculating the distances from local distance inputs.
max_dispersal <- Inf
# returns n dispersal values.
get_dispersal_values <- function(n, species, landscape, config) {
stop("calculate dispersal values here")
}
### Speciation
# threshold for genetic distance after which a speciation event takes place.
divergence_threshold = NULL
# factor by which the divergence is increased between geographically isolated population.
# can also be a matrix between the different population clusters.
get_divergence_factor <- function(species, cluster_indices, landscape, config) {
stop("calculate divergence factor here")
}
### Evolution
# mutate the traits of a species and return the new traits matrix.
apply_evolution <- function(species, cluster_indices, landscape, config) {
stop("mutate species traits here")
}
### Ecology
# called for every cell with all occurring species, this function calculates abundances and/or who survives for each sites
# returns a vector of abundances.
# set the abundance to 0 for every species supposed to die.
apply_ecology <- function(abundance, traits, environment, config) {
stop("calculate species abundances and deaths here")
}

Note that the main core functions have stop warnings by default, this forces the configuration to be edited according to the rules and assumptions (model) that is to be created. You can do this from scratch or by modifying an existing config.

2. Explain example config

Here we will go bit by bit of the config used at the introduction vignette vignette. We will follow the structure presented above.

Metadata

The first section of the config file contains metadata. It informs the user when and by whom the file has been created, which landscape it was made for and informs about associated publications and implemented processes or exploration intentions of the specific configuration.

# Version: 1.0
# Author: Oskar Hagen
# Date: 26.10.2020
# Landscape: SouthAmerica
# Publications: R-package gen3sis
# Description: Example config used at the introduction vignette and similar to case study global configs in Hagen et al. 2020.
# O. Hagen, B. Flück, F. Fopp, J.S. Cabral, F. Hartig, M. Pontarp, T.F. Rangel, L. Pellissier. gen3sis: the general engine for eco-evolutionary simulations on the origins of biodiversity.

Settings

We start our simulation 40 Ma, which corresponds to time step 40 since 1 time-step = 1 myr. We end it a the latest available time-step at the landscape, thus start_time is set to 40 and end_time is set to NA. We limit the maximum total number of species alive in a simulation to 50000 with max_number_of_species and the maximum number of species within one site to 10000 with max_number_of_coexisting_species. Like so, simulations that could possible go off hand and generate too many species are stopped and properly flagged at the sgen3sis summary object. We define which traits we will consider in our simulation with traits_names, in our example, species only have a dispersal ability and a optimum temperature traits. For our model to work, we normalize the environmental values, according to the range informed at environmental_ranges.

random_seed = 6
start_time = 40
end_time = NA
max_number_of_species = 50000
max_number_of_coexisting_species = 10000
trait_names = c("temp",  "dispersal")
environmental_ranges = list("temp" = c(-45, 55), "area"=c(2361.5, 12923.4),
"arid"=c(1,0.5))

Observer

With the observer function, changes over time in any abiotic or biotic information of the virtual world can be recorded by defining the outputs that are saved at specified time steps, and results can be saved and plotted in real-time as the model runs. In our observer function we save the species and plot the richness. Like so, plots are printed as the simulation progresses and also saved inside the plot folder inside output.

end_of_timestep_observer = function(data, vars, config){
save_species()
plot_richness(data$all_species, data$landscape)
}

Combined plots can be generated inside the observer function as for example:

end_of_timestep_observer = function(data, vars, config){
par(mfrow=c(2,3))
plot_raster_single(data$landscape$environment[,"temp"], data$landscape, "temp", NA) plot_raster_single(data$landscape$environment[,"arid"], data$landscape, "arid", NA)
plot_raster_single(data$landscape$environment[,"area"], data$landscape, "area", NA) plot_richness(data$all_species, data$landscape) plot_species_presence(data$all_species[[1]], data$landscape) plot(0,type='n',axes=FALSE,ann=FALSE) mtext("STATUS",1) } Initialization The initialization function creates the ancestor species at the start of the simulation. Users can define the number of ancestor from a single to multiple species, their distribution within the paleolandscape and their trait values. In our example, the initial abundance of the 10 ancestor species, as well as the abundance of species on newly colonized sites is 1. We restrict the region of colonization to South America by limiting the range to a spatial extent of c(-95, -24, -68, 13). We then create ten different species, each of which is randomly inhabiting one of the sites of the continent. Thereby all of the sites have an equal probability of being inhabited by a species in the beginning of the simulation. However since the sites vary in their size, the habitation probability per area is not the same for all regions. If you want to correct for that you can consider using an equal area transformation of your coordinates or tweeking sampling functions to account for area correction. We set the optimum temperature trait temp of each population of the initial species to the environmental temperature. This means, that each population is adapted to the local conditions, as we will see later again at the ecology function. Dispersal is set to one, since it is not allowed to evolve, we use it’s maximum value. The function create_ancestor_species is expected to return a list of new species, i.e. the ancestor(s) one(s). initial_abundance = 1 create_ancestor_species <- function(landscape, config) { range <- c(-95, -24, -68, 13) co <- landscape$coordinates
selection <- co[, "x"] >= range[1] &
co[, "x"] <= range[2] &
co[, "y"] >= range[3] &
co[, "y"] <= range[4]
new_species <- list()
for(i in 1:10){
initial_cells <- rownames(co)[selection]
initial_cells <- sample(initial_cells, 1)
new_species[[i]] <- create_species(initial_cells, config)
#set local adaptation to max optimal temp equals local temp
new_species[[i]]$traits[ , "temp"] <- landscape$environment[initial_cells,"temp"]
weight_abd <- species$abundance[cells_cluster]/mean_abd traits[cells_cluster, "temp"] <- mean(traits[cells_cluster, "temp"]*weight_abd) } #mutations mutation_deltas <-rnorm(length(traits[, "temp"]), mean=0, sd=trait_evolutionary_power) traits[, "temp"] <- traits[, "temp"] + mutation_deltas return(traits) } Ecology The ecology function determines the abundance or presence of the populations in occupied sites of each species. The function iterates over all occupied sites and updates the species population abundances or presences on the basis of local environmental values, updated co-occurrence patterns and species traits. The function takes as input the species abundance, species trait, species divergence and clusters, and the landscape values. In our example, we calculate abundances in a site based on how close the species is to the site temperature. We scale the values to avoid small numbers and apply a carrying capacity based on aridity and temperature corrected by the area of a site. If abundances are below 1, species are considered extinct, if total abundance in a site is above the carrying capacity, small abundances are removed progressively and randomly distributed across the present species until total abundance is smaller or equal to the carrying capacity. apply_ecology <- function(abundance, traits, landscape, config) { abundance_scale = 10 abundance_threshold = 1 #abundance treashold survive <- abundance>=abundance_threshold abundance[!survive] <- 0 abundance <- (( 1-abs( traits[, "temp"] - landscape[, "temp"]))*abundance_scale)*as.numeric(survive) #abundance thhreashold abundance[abundance<abundance_threshold] <- 0 k <- ((landscape[,"area"]*(landscape[,"arid"]+0.1)*(landscape[,"temp"]+0.1))*abundance_scale^2) total_ab <- sum(abundance) subtract <- total_ab-k if (subtract > 0) { # print(paste("should:", k, "is:", total_ab, "DIFF:", round(subtract,0) )) while (total_ab>k){ alive <- abundance>0 loose <- sample(1:length(abundance[alive]),1) abundance[alive][loose] <- abundance[alive][loose]-1 total_ab <- sum(abundance) } #set negative abundances to zero abundance[!alive] <- 0 } return(abundance) } 3. Modify a config A configuration object can be modified by editing a file or the config object in R. Here we will load the config_southamerica.R as an R object and modify the initialization of species. Instead of having multiple species that occupy one site each, we will now initialize one species that occurs in all habitable sites and starts with abundance 10. # get data path datapath <- system.file(file.path("extdata", "SouthAmerica"), package="gen3sis") # creates config object from config file config_object <- create_input_config(file.path(datapath, "config/config_southamerica.R")) # modify the initialization function config_object$gen3sis$initialization$create_ancestor_species <- function(landscape, config) {
range <- c(-95, -24, -68, 13)
co <- landscape$coordinates selection <- co[, "x"] >= range[1] & co[, "x"] <= range[2] & co[, "y"] >= range[3] & co[, "y"] <= range[4] initial_cells <- rownames(co)[selection] new_species <- create_species(initial_cells, config) #set local adaptation to max optimal temp equals local temp new_species$traits[ , "temp"] <- landscape$environment[initial_cells,"temp"] new_species$traits[ , "dispersal"] <- 1
new_species$abundance <- new_species$abundance*10
return(list(new_species))
}

Additionally, we will now implement a simpler ecology function in our config_object. Abundances in a site are defined by how close the species population optimum temperature is to the site temperature. The abundance_threshold parameter defines the niche width cutoff. The higher the value, the narrower is the temperature range of the niche.

# modify the ecology function
config_object$gen3sis$ecology$apply_ecology <- function(abundance, traits, landscape, config, abundance_scale = 10, abundance_threshold = 8) { #abundance threshold abundance <- as.numeric(!abundance<abundance_threshold) abundance <- (( 1-abs( traits[, "temp"] - landscape[, "temp"]))*abundance_scale)*abundance #abundance threshold abundance[abundance<abundance_threshold] <- 0 return(abundance) } Like so, functions and parameters can be conveniently changed and simulated in gen3sis. Remember that run_simulation works either with a path to a config file or config object, so we can run the old and the new modified config. To compare the initial modified config, we will load the old config and set the old and the new config to run only for 3 time-steps and plot species ranges plot_ranges config_object_old <- create_input_config(file.path(datapath, "config/config_southamerica.R")) # define observer config_object_old$gen3sis$general$end_of_timestep_observer <- function(data, vars, config){
plot_ranges(data$all_species, data$landscape, disturb=0, max_sps=10)
}

config_object$gen3sis$general$end_of_timestep_observer <- config_object_old$gen3sis$general$end_of_timestep_observer

# define only 3 time-steps
config_object_old$gen3sis$general$end_time <- 38 config_object$gen3sis$general$end_time <- 38

Before using the modified configs, test if it is valid. If the function verify_config returns TRUE, you are good to go.

verify_config(config_object_old)
#> [1] TRUE
verify_config(config_object)
#> [1] TRUE

Run the modified old config

sim_old <- run_simulation(config = config_object_old,
landscape = file.path(datapath, "landscape"),
output_directory=tempdir(),
call_observer = 4,
verbose=0) # no progress printed

And run the modified new config

sim_new <- run_simulation(config = config_object,
landscape = file.path(datapath, "landscape"),
verbose=0, # no progress printed
output_directory=tempdir())