The geometr package generalises the way to interact with spatial and other geometric objects by providing functions that access and modify data components in the same manner across classes.
Moreover, geometr provides a data structure (of class geom) that represents the different data components in a truly tidy manner, making it possible to generate geometric objects that are easily accessible and play well with other tidy tools.
One could argue that spatial objects are merely a special case of geometric objects, where the coordinates of points refer to real locations on the surface of the earth instead of some virtual (cartesian) coordinate system.
Geometric and spatial objects typically contain a collection of points that outline a geometric shape, or feature.
A feature in geometr is defined as a set of points that form no more than one single unit of one of the types point, line, polygon or grid.
In contrast to the simple features standard, there are no multi-* features in geometr; sets of features that belong together beyond their geometric connectedness are instead assigned a common group.
Consequently, a geom is primarily made up of three tables that contain information on points (their coordinates), features and groups.
The tables are related with feature and group IDs (fid and gid respectively) and can be provided with additional attributes (more on this in the chapter "Attributes of a geom").
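As a minimal sketch of how the three tables relate, the following base R code uses plain data frames as stand-ins for the tables of a geom. The object names and the attribute column landcover are hypothetical and only serve as illustration; they are not part of geometr.

```r
# Hypothetical stand-ins for the three tables of a geom (plain data frames).
points <- data.frame(
  x   = c(0, 10, 10, 0, 20, 30, 30, 20),
  y   = c(0, 0, 10, 0, 0, 0, 10, 0),
  fid = c(1, 1, 1, 1, 2, 2, 2, 2)   # ID of the feature each point belongs to
)
features <- data.frame(
  fid = c(1, 2),
  gid = c(1, 1)                     # both features belong to the same group
)
groups <- data.frame(
  gid       = 1,
  landcover = "forest"              # made-up group attribute
)

# Relate the tables via fid and gid to obtain one row per point,
# carrying the feature and group information along.
full <- merge(merge(points, features, by = "fid"), groups, by = "gid")
full
```

Because two features that belong together share a gid rather than being merged into a multi-* feature, a group attribute has to be stored only once.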
This vignette first outlines in detail how geometr improves interoperability, then describes the data structure of a geom, explains how different feature types are cast into one another and shows how to visualise geometric objects with geometr.
Interoperable software is designed to easily exchange information with other software, which can be achieved by providing the output of functionally similar operations in a common arrangement or format, standardising access to the data. This principle is not only true for software written in different programming languages, but can also apply to several packages within the R ecosystem. R is an open source environment which means that no single package or class will ever be the sole source of a particular data structure and this is also the case for spatial and other geometric data.
Interoperable data is data that has a common arrangement and that uses the same terminology, resulting ideally in semantic interoperability. As an example, we can think of the extent of a geometric object. An extent reports the minimum and maximum value of all dimensions an object resides in. There are, however, several ways in which even this simple information can be reported, for example as a vector or as a table, and with or without names. Moreover, distinct workflows provide data so that the same information is not at the same location or under the same name in all structures, e.g., the minimum value of the x dimension is not always the first element and is not always called ‘xmin’.
The following code chunk exemplifies this by showing various functions, which are all considered standard in R to date, that derive an extent from specific spatial objects:
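library(sf)      # st_read(), st_bbox(), as_Spatial()
library(sp)      # bbox()
library(raster)  # raster(), extent()
library(geometr) # getExtent() and the other getters used further below
nc_sf <- st_read(system.file("shape/nc.shp", package = "sf"))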
#> Reading layer `nc' from data source
#> `/home/se87kuhe/R/x86_64-pc-linux-gnu-library/3.6/sf/shape/nc.shp'
#> using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS: NAD27
st_bbox(nc_sf)
#> xmin ymin xmax ymax
#> -84.32385 33.88199 -75.45698 36.58965
nc_sp <- as_Spatial(nc_sf)
bbox(nc_sp)
#> min max
#> x -84.32385 -75.45698
#> y 33.88199 36.58965
ras <- raster(system.file("external/test.grd", package = "raster"))
extent(ras)
#> class : Extent
#> xmin : 178400
#> xmax : 181600
#> ymin : 329400
#> ymax : 334000
st_bbox() provides the information as a named vector and presents first minimum and then maximum values of both dimensions, bbox() provides a table with minimum and maximum values in columns, and extent() provides the information in an S4 object that presents first the x and then the y values.
Neither the data structures nor the names or positions of the information are identical.
For a human user the structure of this information might not matter because we recognise, in most cases intuitively, where each piece of information is to be found in such a simple data structure.
In the above case it is easy to recognise how the combination of column and row names (of bbox()) refers to the already combined names (of st_bbox() or extent()).
However, this human capacity to recognise information relative to its context needs to be programmed into software for it to have that ability.
Think, for example, of a new custom function that is designed to extract and process information from an arbitrary spatial input, i.e., without knowing in advance what spatial class the user will provide.
This would require an extensive code-logic to handle all possible input formats, complicated further by classes that may become available only in the future.
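To illustrate the kind of code logic meant here, the sketch below handles the three arrangements shown above by hand. The objects are mock-ups built in base R from the printed values (they are not real sf, sp or raster objects), and normalise_extent() is a hypothetical helper, not a function of any of those packages.

```r
# Mock objects that mimic the three arrangements of an extent shown above.
ext_vector <- c(xmin = -84.32385, ymin = 33.88199,
                xmax = -75.45698, ymax = 36.58965)      # like st_bbox()
ext_matrix <- matrix(c(-84.32385, 33.88199, -75.45698, 36.58965), nrow = 2,
                     dimnames = list(c("x", "y"), c("min", "max")))  # like bbox()
ext_list   <- list(xmin = 178400, xmax = 181600,
                   ymin = 329400, ymax = 334000)        # like extent()

# Hypothetical helper: one branch per input arrangement, all returning
# the same table with columns x and y and rows minimum and maximum.
normalise_extent <- function(obj) {
  if (is.matrix(obj)) {
    data.frame(x = unname(obj["x", c("min", "max")]),
               y = unname(obj["y", c("min", "max")]))
  } else if (is.list(obj)) {
    data.frame(x = c(obj$xmin, obj$xmax),
               y = c(obj$ymin, obj$ymax))
  } else if (is.numeric(obj) && !is.null(names(obj))) {
    data.frame(x = unname(obj[c("xmin", "xmax")]),
               y = unname(obj[c("ymin", "ymax")]))
  } else {
    stop("unsupported input arrangement")
  }
}

normalise_extent(ext_vector)
normalise_extent(ext_matrix)
normalise_extent(ext_list)
```

Every additional class needs another branch, and the function fails for any arrangement it does not yet know, which is exactly the maintenance burden described above.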
geometr improves interoperability in R for geometric and thus spatial classes by following the Bioconductor standard for S4 classes.
Here, getters and setters are used as accessor functions, and as the pathway to extract or modify information of a given data structure.
geometr thus provides getters that return information in an identical arrangement from a wide range of classes, and likewise setters that modify different classes in the same way, even though those classes typically need differently formatted input, arguments and functions.
The following code chunk shows how different input classes yield the same output object.
myInput <- nc_sf
getExtent(x = myInput)
#> # A tibble: 2 × 2
#> x y
#> <dbl> <dbl>
#> 1 -84.3 33.9
#> 2 -75.5 36.6
myInput <- nc_sp
getExtent(x = myInput)
#> # A tibble: 2 × 2
#> x y
#> <dbl> <dbl>
#> 1 -84.3 33.9
#> 2 -75.5 36.6
myInput <- ras
getExtent(x = myInput)
#> # A tibble: 2 × 2
#> x y
#> <dbl> <dbl>
#> 1 178400 329400
#> 2 181600 334000
The getters provided by geometr return their output in the same arrangement for every supported input class, as the identically structured tibbles above show.
This ensures that the information retrieved with getters is compatible with a tidy workflow and that a custom function that processes geometric information requires merely one very simple line of code to extract that information from a potentially wide range of distinct classes.
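The getter pattern itself can be sketched with an S4 generic from the methods package. The classes boxVector and boxMatrix, the generic toyExtent() and the function extent_width() are hypothetical names made up for this illustration; the sketch does not reproduce how geometr implements its methods.

```r
library(methods)

# Two toy classes that store the same extent in different arrangements.
setClass("boxVector", representation(bb = "numeric"))  # xmin, ymin, xmax, ymax
setClass("boxMatrix", representation(bb = "matrix"))   # rows x/y, columns min/max

# One generic getter ...
setGeneric("toyExtent", function(x) standardGeneric("toyExtent"))

# ... and one method per class, each returning the same arrangement.
setMethod("toyExtent", "boxVector", function(x) {
  data.frame(x = unname(x@bb[c("xmin", "xmax")]),
             y = unname(x@bb[c("ymin", "ymax")]))
})
setMethod("toyExtent", "boxMatrix", function(x) {
  data.frame(x = unname(x@bb["x", ]),
             y = unname(x@bb["y", ]))
})

a <- new("boxVector", bb = c(xmin = 0, ymin = 0, xmax = 60, ymax = 56))
b <- new("boxMatrix", bb = matrix(c(0, 0, 60, 56), nrow = 2,
                                   dimnames = list(c("x", "y"), c("min", "max"))))

# A custom function needs a single call to the getter, whatever the class.
extent_width <- function(obj) diff(toyExtent(obj)$x)
extent_width(a)
extent_width(b)
```

Support for a further class is added by registering one more method, without touching extent_width() or any other function built on the getter.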
geom
geometr comes with the S4 class geom, a geometric (spatial) class that has primarily been developed for its interoperability and easy access.
All objects of this class are structurally the same: no slots are removed or added when modifying an object, and all properties are labelled with the same terms in each object of that class.
This interoperability is true for objects representing point (and grid), line or polygon features, for objects that contain a single or several features and for objects that are either merely geometric or indeed spatial/geographic because they contain a coordinate reference system (crs).
Moreover, a geom contains only direct information, i.e., information that cannot be derived from its other information. The extent, for example, is not stored because it is simply the minimum and maximum of the coordinates that make up the geometry.
geom
A geom contains as its backbone the three slots @point, @feature and @group.
Each of those slots is a named list that contains as many tables as there are layers in the geom.
The exact values stored in those tables are explained in Tab. 3.1, along with the other slots of a geom.
slot | class | description |
---|---|---|
type | character | the type of how the geom is processed and visualised. Either point, line, polygon or grid. |
point | tibble | the coordinates in x and y dimension and the ID of the feature the point is part of (fid). |
feature | list of tibbles | the feature ID (fid) and the ID of the group the feature is part of (gid). Any other attributes that are valid for each individual feature can be joined to this table. |
group | list of tibbles | the group ID (gid) and any attributes that are valid for an overall group. |
window | tibble | the coordinates of a rectangular polygon that outlines the "enclosing area" of the geom. This is not to be confused with the extent, which is the minimum and maximum values of all dimensions of the geom and which is not recorded in a slot but derived from the coordinates. |
crs | character | the coordinate reference system, currently in proj4 notation. In case no crs has been set, this is shown as 'cartesian'. |
history | list | all of the functions of geometr produce an entry in this list to document provenance. |
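The slot layout in the table can be mimicked with a toy S4 class, which also makes the difference between the stored window and the derived extent concrete. Everything in this sketch is hypothetical: the class toyGeom, the helper toy_extent() and the layer name geometry are made up, and plain data frames stand in for tibbles. The real geom class is defined by geometr and may differ in detail.

```r
library(methods)

# Hypothetical toy class mirroring the slots listed in the table above.
setClass("toyGeom", representation(
  type    = "character",
  point   = "data.frame",
  feature = "list",
  group   = "list",
  window  = "data.frame",
  crs     = "character",
  history = "list"
))

tg <- new("toyGeom",
  type    = "polygon",
  point   = data.frame(x = c(2, 8, 8, 2), y = c(3, 3, 7, 3), fid = 1),
  feature = list(geometry = data.frame(fid = 1, gid = 1)),
  group   = list(geometry = data.frame(gid = 1)),
  # the window is stored explicitly, here as a closed rectangle
  window  = data.frame(x = c(0, 10, 10, 0, 0), y = c(0, 0, 10, 10, 0)),
  crs     = "cartesian",
  history = list("object created by hand")
)

# The extent is not stored in a slot; it is derived from the coordinates.
toy_extent <- function(g) {
  data.frame(x = range(g@point$x), y = range(g@point$y))
}
toy_extent(tg)
```

In this example the window spans 0 to 10 in both dimensions, while the derived extent covers only 2 to 8 in x and 3 to 7 in y.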
A geom of type grid is a special case of a point geom in that it is made up of a systematically distributed lattice of points, thereby resembling raster objects.
A geom of type grid contains in the @point slot merely a table that contains the minimum and maximum value and the cell size for the x and y dimensions, while a geom of type point, line or polygon explicitly contains all the coordinates of the points that make up features.
When using the getter getPoints(), this slot is “unpacked” into a form that is interoperable with the other geom types.
gtGeoms$grid$categorical@point
#> # A tibble: 3 × 2
#> x y
#> <dbl> <dbl>
#> 1 0 0
#> 2 60 56
#> 3 1 1
getPoints(x = gtGeoms$grid$categorical)
#> # A tibble: 3,360 × 3
#> x y fid
#> <dbl> <dbl> <int>
#> 1 0.5 0.5 1
#> 2 1.5 0.5 2
#> 3 2.5 0.5 3
#> 4 3.5 0.5 4
#> 5 4.5 0.5 5
#> 6 5.5 0.5 6
#> 7 6.5 0.5 7
#> 8 7.5 0.5 8
#> 9 8.5 0.5 9
#> 10 9.5 0.5 10
#> # … with 3,350 more rows
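The unpacking step can be approximated in base R. The sketch below assumes that the three rows of the compact table hold the minimum, the maximum and the cell size, as the printed values suggest, and that the unpacked points are cell centres. The helper unpack_grid() is hypothetical, and whether getPoints() works exactly like this internally is an assumption.

```r
# Compact description of the grid, copied from the @point table shown above:
# row 1 = minimum, row 2 = maximum, row 3 = cell size (assumed row meaning).
compact <- data.frame(x = c(0, 60, 1), y = c(0, 56, 1))

# Hypothetical helper that expands the compact form into cell centres.
unpack_grid <- function(p) {
  xs <- seq(p$x[1] + p$x[3] / 2, p$x[2] - p$x[3] / 2, by = p$x[3])
  ys <- seq(p$y[1] + p$y[3] / 2, p$y[2] - p$y[3] / 2, by = p$y[3])
  out <- expand.grid(x = xs, y = ys)  # x varies fastest, then y
  out$fid <- seq_len(nrow(out))       # one feature ID per cell
  out
}

cells <- unpack_grid(compact)
nrow(cells)
head(cells, 3)
```

Three rows are thus enough to describe all 60 x 56 = 3,360 cells, which is why the compact form is kept in the slot and only expanded on request.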
In contrast to Raster* objects of the raster package, the values in a grid geom are run-length encoded if that results in a smaller object, which is often the case for rasters with categorical values.
object.size(gtRasters$categorical)
#> 30608 bytes
object.size(gtGeoms$grid$categorical)
#> 12520 bytes
visualise(gtRasters$categorical, gtGeoms$grid$categorical)
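Run-length encoding can be demonstrated with rle() and inverse.rle() from base R. The vectors below are made-up cell values, and the sketch only illustrates the principle of keeping the encoded form when it is smaller; it does not show how geometr stores values internally.

```r
# Made-up categorical cell values with long runs of identical entries.
vals <- rep(c(1L, 2L, 3L), times = c(1500, 1000, 860))

enc <- rle(vals)                    # run lengths plus one value per run
identical(inverse.rle(enc), vals)   # decoding restores the original values

# Keep the encoded form only if it is actually smaller.
stored <- if (object.size(enc) < object.size(vals)) enc else vals
class(stored)

# Counter-example: values that change in every cell do not benefit,
# because there are as many runs as there are values.
noisy <- rep(c(1L, 2L), times = 1680)
object.size(rle(noisy)) > object.size(noisy)
```

Categorical rasters tend to resemble the first vector and continuous rasters the second, which is consistent with the size difference reported above for the categorical example.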
As with points, the getter getFeatures() unpacks the @feature slot into its interoperable form.