1 Introduction

The geometr package generalises the way to interact with spatial and other geometric objects by providing functions that access and modify data components in the same manner across classes. Moreover, geometr provides a data structure (of class geom) that represents the different data components in a truly tidy manner, which makes it possible to generate geometric objects that are easily accessible and play well with other tidy tools.

One could argue that spatial objects are merely a special case of geometric objects, where the coordinates of points refer to real locations on the surface of the earth instead of some virtual (cartesian) coordinate system. Geometric and spatial objects typically contain a collection of points that outline a geometric shape, or feature. A feature in geometr is defined as a set of points that form no more than a single unit of one of the types point, line, polygon or grid. In contrast to the simple features standard, there are no multi-* features in geometr. Sets of features that belong together beyond their geometric connectedness are instead assigned a common group. Consequently, a geom is primarily made up of three tables that contain information on points (their coordinates), features and groups. The tables are related via feature and group IDs (fid and gid, respectively) and can be provided with additional attributes (more on this in the chapter "Attributes of a geom").

This vignette first outlines in detail how geometr improves interoperability. It then describes the data structure of a geom, explains how different feature types are cast into one another and shows how to visualise geometric objects with geometr.


2 Interoperability

Interoperable software is designed to exchange information easily with other software. This can be achieved by providing the output of functionally similar operations in a common arrangement or format, which standardises access to the data. This principle holds not only for software written in different programming languages but also for packages within the R ecosystem. R is an open source environment, which means that no single package or class will ever be the sole source of a particular data structure, and this is also the case for spatial and other geometric data.

Interoperable data is data that has a common arrangement and that uses the same terminology, resulting ideally in semantic interoperability. As an example, we can think of the extent of a geometric object. An extent reports the minimum and maximum value of all dimensions an object resides in. There are, however, several ways in which even this simple information can be reported, for example as a vector or as a table, and with or without names. Moreover, distinct workflows provide data so that the same information is not at the same location or under the same name in all structures, e.g., the minimum value of the x dimension is not always the first element and is not always called ‘xmin’.

The following code chunk exemplifies this with three functions, all of which are considered standard in R to date, that each derive an extent from a specific class of spatial object:

library(sf)
library(sp)
library(raster)
library(geometr)

nc_sf <- st_read(system.file("shape/nc.shp", package="sf"))
#> Reading layer `nc' from data source `/home/se87kuhe/R/x86_64-pc-linux-gnu-library/3.6/sf/shape/nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> CRS:            4267
st_bbox(nc_sf)
#>      xmin      ymin      xmax      ymax 
#> -84.32385  33.88199 -75.45698  36.58965

nc_sp <- as_Spatial(nc_sf)
bbox(nc_sp)
#>         min       max
#> x -84.32385 -75.45698
#> y  33.88199  36.58965

ras <- raster(system.file("external/test.grd", package="raster"))
extent(ras)
#> class      : Extent 
#> xmin       : 178400 
#> xmax       : 181600 
#> ymin       : 329400 
#> ymax       : 334000

st_bbox() provides the information as a named vector and presents first the minimum and then the maximum values of both dimensions, bbox() provides a table with minimum and maximum values in columns, and extent() provides the information in an S4 object that presents first the x and then the y values. Neither the data structures nor the names or positions of the information are identical.

For a human user the structure of this information might not matter, because we recognise, in most cases intuitively, where each piece of information is to be found in such a simple data structure. In the above case it is easy to recognise how the combination of column and row names (of bbox()) refers to the already combined names (of st_bbox() or extent()). However, this human capacity to recognise information relative to its context needs to be programmed into software for it to have that ability. Think, for example, of a new custom function that is designed to extract and process information from an arbitrary spatial input, i.e., without knowing in advance what spatial class the user will provide. This would require extensive code logic to handle all possible input formats, complicated further by classes that may become available only in the future.
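To make that burden concrete, the following is a minimal sketch in base R of the kind of code logic such a custom function would need. The helper name extent_as_table() and the toy objects are hypothetical; they only imitate the shapes of the three outputs shown above (named vector, matrix, S4 object with one slot per value) and have not been run against real sf, sp or raster objects.

```r
# Hypothetical helper: bring three differently shaped extents into one table.
extent_as_table <- function(x) {
  if (isS4(x) && all(c("xmin", "xmax", "ymin", "ymax") %in% methods::slotNames(x))) {
    # S4 object with one slot per value (shape of the extent() output)
    data.frame(x = c(x@xmin, x@xmax), y = c(x@ymin, x@ymax))
  } else if (is.matrix(x) && all(c("min", "max") %in% colnames(x))) {
    # matrix with one row per dimension (shape of the bbox() output);
    # assumes the first row is x and the second row is y
    data.frame(x = as.numeric(x[1, c("min", "max")]),
               y = as.numeric(x[2, c("min", "max")]))
  } else if (is.numeric(x) && all(c("xmin", "ymin", "xmax", "ymax") %in% names(x))) {
    # named vector (shape of the st_bbox() output)
    data.frame(x = as.numeric(x[c("xmin", "xmax")]),
               y = as.numeric(x[c("ymin", "ymax")]))
  } else {
    stop("unsupported extent format")
  }
}

# Toy stand-ins that carry the same values in the three shapes
vec <- c(xmin = -84.32385, ymin = 33.88199, xmax = -75.45698, ymax = 36.58965)
mat <- matrix(c(-84.32385, 33.88199, -75.45698, 36.58965), nrow = 2,
              dimnames = list(c("x", "y"), c("min", "max")))
setClass("ToyExtent", representation(xmin = "numeric", xmax = "numeric",
                                     ymin = "numeric", ymax = "numeric"))
s4 <- new("ToyExtent", xmin = -84.32385, xmax = -75.45698,
          ymin = 33.88199, ymax = 36.58965)

extent_as_table(vec)
extent_as_table(mat)
extent_as_table(s4)
```

Every further class would need its own branch, which is exactly the maintenance cost that a common getter removes.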

geometr improves interoperability in R for geometric and thus spatial classes by following the Bioconductor standard for S4 classes. Here, getters and setters are used as accessor functions, and as a pathway to extract or modify information of a given data structure. geometr thus provides getters that return information in an identical arrangement from a wide range of classes, and likewise setters that modify different classes in the same way, even though those classes typically need differently formatted input, arguments and functions. The following code chunk shows how different input classes yield the same output object.

myInput <- nc_sf
getExtent(x = myInput)
#> # A tibble: 2 x 2
#>       x     y
#>   <dbl> <dbl>
#> 1 -84.3  33.9
#> 2 -75.5  36.6

myInput <- nc_sp
getExtent(x = myInput)
#> # A tibble: 2 x 2
#>       x     y
#>   <dbl> <dbl>
#> 1 -84.3  33.9
#> 2 -75.5  36.6

myInput <- ras
getExtent(x = myInput)
#> # A tibble: 2 x 2
#>        x      y
#>    <dbl>  <dbl>
#> 1 178400 329400
#> 2 181600 334000

The output of the getters provided by geometr is

  • tidy, i.e., it provides variables in columns, observations in rows and only one value per cell
  • semantically interoperable, i.e., it provides the same information in the same location of the output object, with the same names.

This ensures that the information retrieved with getters is compatible with a tidy workflow and that a custom function that processes geometric information requires merely one very simple line of code to extract that information from a potentially wide range of distinct classes.
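As an illustration of such a custom function, the sketch below computes the width and height of an object from its extent table. The helper extent_size() is hypothetical, and ext is a hand-built stand-in that copies the layout of the getExtent() output shown above (one column per dimension, minimum in the first row, maximum in the second). In a real workflow the table would presumably come from getExtent(x = myInput).

```r
# Stand-in for a getExtent() result, using the unrounded nc values
ext <- data.frame(x = c(-84.32385, -75.45698),
                  y = c(33.88199, 36.58965))

# Hypothetical downstream helper: width and height of an extent table.
# It relies only on the column names x and y, not on the input class.
extent_size <- function(ext) {
  c(width = diff(range(ext$x)), height = diff(range(ext$y)))
}

extent_size(ext)
```

Because the layout of the table does not depend on the class of the original object, the helper needs no class-specific branches.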

3 Description of the class geom

geometr comes with the S4 class geom, a geometric (spatial) class that has primarily been developed for its interoperability and easy access.

All objects of this class are structurally the same: no slots are removed or added when modifying an object, and all properties are labelled with the same terms in each object of that class. This holds for objects representing point (and grid), line or polygon features, for objects that contain a single or several features, and for objects that are either merely geometric or indeed spatial/geographic because they contain a coordinate reference system (crs). Moreover, a geom contains only direct information, i.e., information that cannot be derived from its other contents. The extent, for example, is not stored, because it is merely the minimum and maximum of the coordinates that make up the geometry.
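The idea of derived information can be sketched in a few lines of base R. The table pts and the helper derive_extent() are hypothetical and only imitate a table of point coordinates; they are not the internals of geometr.

```r
# Toy table of point coordinates with a feature ID
pts <- data.frame(fid = c(1, 1, 1, 2, 2),
                  x   = c(0, 4, 2, 5, 7),
                  y   = c(0, 1, 3, 2, 6))

# Hypothetical helper: the extent is the range of each coordinate column,
# so it can always be recomputed and does not need to be stored.
derive_extent <- function(points) {
  data.frame(x = range(points$x), y = range(points$y))
}

derive_extent(pts)
```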

3.1 The data-structure of a geom

A geom has as its backbone the three slots @point, @feature and @group. Each of those slots is a named list that contains as many tables as there are layers in the geom. The exact values stored in those tables are explained in Tab. 3.1, along with the other slots of a geom.

Table 3.1: The slots of a geom and the information they contain.
slot    | class           | description
type    | character       | the type of how the geom is processed and visualised. Either point, line, polygon or grid.
point   | tibble          | the coordinates in x and y dimension and the ID of the feature the point is part of (fid).
feature | list of tibbles | the feature ID (fid) and the ID of the group the feature is part of (gid). Any other attributes that are valid for each individual feature can be joined to this table.
group   | list of tibbles | the group ID (gid) and any attributes that are valid for an overall group.
window  | tibble          | the coordinates of a rectangular polygon that outlines the "enclosing area" of the geom. This is not to be confused with the extent, which is the minimum and maximum values of all dimensions of the geom and which is not recorded in a slot but derived from the coordinates.
scale   | character       | depending on crs and use case, the coordinates of points can be documented as absolute or relative values.
crs     | character       | the coordinate reference system, currently in proj4 notation. In case no crs has been set, this is shown as 'cartesian'.
history | list            | all of the functions of geometr produce an entry in this list to document provenance.
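How the three backbone tables relate to one another can be sketched with plain data frames. All values and the name column below are made up for illustration; only the column names fid, gid, x and y follow the table above. Features 1 and 2 share a group, which stands in for what the simple features standard would express as a multi-* feature.

```r
# Toy versions of the three backbone tables
point <- data.frame(fid = c(1L, 1L, 1L, 2L, 2L, 2L, 3L),
                    x   = c(0, 2, 1, 4, 6, 5, 9),
                    y   = c(0, 0, 2, 4, 4, 6, 9))
feature <- data.frame(fid = 1:3, gid = c(1L, 1L, 2L))
group   <- data.frame(gid = 1:2, name = c("islands", "mainland"))

# Relate points to features via fid and features to groups via gid
full <- merge(merge(point, feature, by = "fid"), group, by = "gid")

# All points of the two features that belong to group 1
full[full$gid == 1L, c("gid", "fid", "x", "y")]
```

Keeping attributes in the feature and group tables, rather than repeating them for every point, is what keeps each table tidy.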

A geom of type grid is a special case of a point geom in that it is made up of a systematically distributed lattice of points, thereby resembling raster objects. A geom of type grid contains in the @point slot merely a table that contains the minimum and maximum value and the cell size for the x and y dimensions, while a geom of type point, line or polygon explicitly contains all the coordinates of the points that make up features. When using the getter getPoints(), this slot is “unpacked” into a form that is interoperable with the other geom types.

gtGeoms$grid$categorical@point
#> # A tibble: 3 x 2
#>       x     y
#>   <dbl> <dbl>
#> 1     0     0
#> 2    60    56
#> 3     1     1
getPoints(x = gtGeoms$grid$categorical)
#> # A tibble: 3,360 x 3
#>      fid     x     y
#>    <int> <dbl> <dbl>
#>  1     1   0.5   0.5
#>  2     2   1.5   0.5
#>  3     3   2.5   0.5
#>  4     4   3.5   0.5
#>  5     5   4.5   0.5
#>  6     6   5.5   0.5
#>  7     7   6.5   0.5
#>  8     8   7.5   0.5
#>  9     9   8.5   0.5
#> 10    10   9.5   0.5
#> # … with 3,350 more rows
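The unpacking step can be sketched in base R as follows. The helper unpack_grid() is hypothetical and is not the geometr implementation. It assumes that the three rows of the compact table hold minimum, maximum and cell size, that the unpacked points are cell centres, and that x varies fastest; the latter two are inferred only from the first rows of the output above.

```r
# Compact grid description: rows are minimum, maximum and cell size
grid <- data.frame(x = c(0, 60, 1), y = c(0, 56, 1))

# Hypothetical helper: expand the compact description into cell centres
unpack_grid <- function(g) {
  xs <- seq(g$x[1] + g$x[3] / 2, g$x[2] - g$x[3] / 2, by = g$x[3])
  ys <- seq(g$y[1] + g$y[3] / 2, g$y[2] - g$y[3] / 2, by = g$y[3])
  cells <- expand.grid(x = xs, y = ys)  # the first column varies fastest
  data.frame(fid = seq_len(nrow(cells)), x = cells$x, y = cells$y)
}

pts <- unpack_grid(grid)
nrow(pts)
head(pts, 3)
```

Storing three rows instead of one row per cell is what keeps a grid geom compact until the explicit coordinates are needed.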

In contrast to Raster* objects of the raster package, the values in a grid geom are run-length encoded whenever that results in a smaller object, which is often the case for rasters with categorical values.

object.size(gtRasters$categorical)
#> 30608 bytes
object.size(gtGeoms$grid$categorical)
#> 13784 bytes

visualise(gtRasters$categorical, gtGeoms$grid$categorical)
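Run-length encoding itself can be tried with the base R functions rle() and inverse.rle(). The sketch below is independent of geometr; the vector values is a toy example, and the column names val and len are chosen here only for readability.

```r
# A categorical vector with long runs of identical values
values <- rep(c(31L, 47L, 44L, 21L, 27L), times = c(2L, 10L, 7L, 27L, 14L))

# Encode: one row per run, holding the value and the length of the run
enc    <- rle(values)
packed <- data.frame(val = enc$values, len = enc$lengths)
packed

# Decode: the original vector is recovered without loss
restored <- inverse.rle(enc)
identical(restored, values)
```

The encoding pays off only when runs are long, which is why it suits categorical rasters better than continuous ones.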

As with points, the getter getFeatures() unpacks the @feature slot into its interoperable form.

gtGeoms$grid$categorical@feature
#> $categorical
#> # A tibble: 726 x 2
#>      val   len
#>    <int> <int>
#>  1    31     2
#>  2    47    10
#>  3    44     7
#>  4    21    27
#>  5    27    14
#>  6    31     1
#>  7    47    11
#>  8    44     8
#>  9    21     5
#> 10    41     4
#> # … with 716 more rows
getFeatures(x = gtGeoms$grid$categorical)
#> # A tibble: 3,360 x 3
#>      fid   gid values