Require

CRAN_Status_Badge CRAN Downloads R build status

A simple package for reproducible package management in R. This is different than other approaches to package management such as packrat, checkpoint, and renv, by including all-in-one management for packages in R focused around a single function, Require.

Objectives

Some packages, including those in our PredictiveEcology repository, have many package dependencies. Some of them are on CRAN, but some are still in development and so are hosted elsewhere. Mixing many package dependencies that are constantly evolving creates challenges with standard R package management. For example, what is the best way to move analyses from one machine to another, or set up a series of High Performance Compute nodes? How should we use functions like install.packages in a reproducible workflow that are clearly intended to be used once or very few times? How do we deal with many packages on GitHub that have many common dependencies? Finally, how do we do all this for many concurrent projects without installing hundreds of packages in a new directory for every project?

The Require package attempts to address these issues and others. It is different than packrat in that it is much simpler and is closer to base R package management. Require can use hierarchical library paths, as in base R, with many paths in the .libPaths(), or can set a single library path to be standAlone. This allows “system” packages to be used as well as “project-specific” packages to be used together, as in base R. It is different than other packages like renv in that it is focused around a single function, Require, that can be used in a reproducible workflow. renv uses a notion of package versions, the “snapshot”, as installed at its foundation; any changes to package versions by a user then updates this snapshot. renv does not keep this information in the source code of the project. Require uses the notion of a “snapshot” as a decision to make by a user when it is time to set the package versions. Package versions are primarily updated by the code developer by stating the minimum (or maximum) package version in the source code. This means that by default, projects are somewhat more fluid, defined by no package version if none is required or minimum (or maximum) package versions if required until it is time to freeze it, say when publishing or needing to set up virtual machines with identical setup. From this perspective, renv is more “top-down”, and Require is more “bottom-up”, though they can each emulate the other’s behaviour.

We define a reproducible workflow as a workflow that can be run from the start to any point in the project, without having to “skip over” or “comment out” or “jump to” particular lines or chunks of code. Require does that.

Basic usage

Require is essentially a wrapper around functions that install packages, e.g., install.packages, install_github and one of the main function to load packages, require.

# install.packages("Require") 
library(Require)
Require("data.table")

And with version numbering:

Require("data.table (>=1.12.8)")

It is vectorized on package names, and can include mixed github and CRAN, mixed version number and not:

Require(c("data.table (>=1.12.8)", "PredictiveEcology/quickPlot"))

Advanced usage

Keeping it all isolated

We can keep all packages in an isolated folder for a project, using standAlone = TRUE

setLibPaths("Project_A_Packages", standAlone = TRUE)
.libPaths() # this is just to check what happened in the previous line -- there are 2 folders only
Require("data.table (>=1.12.8)")

Or we can use a hybrid of our main, “personal” library and a project specific one for “extra” packages:

setLibPaths("Project_A_Packages", standAlone = FALSE)
.libPaths() # we have added a library to original ones on this system
Require("data.table (>=1.12.8)")

Installing old package versions

In the same way as above, we can specify maximum or exact package versions. Require will retrieve these on CRAN archives.

Require("data.table (<=1.11.0)")

Managing a project

Because it is vectorized, there can be a long list of packages at the top of a project file, with various sources and version specifications.

library(Require)
setLibPaths("ProjectA", standAlone = TRUE)
Require(c("data.table (==1.12.8)", "dplyr", "reproducible", 
          "PredictiveEcology/SpaDES@development", "raster (>=3.1.5)"))

Taking a snapshot

When a system is set up with the correct packages and versions, we can take a snapshot and give that file to another person or machine:

pkgSnapshot("mySnapshot.txt", standAlone = TRUE) # to get only the project specific ones

Move to a new machine, say

Require::Require(packageVersionFile = "mySnapshot.txt")

Using local package cache

When installing on many machines on a network, having a local cache can speed up installations. Setting options("Require.RPackageCache" = someSharedDirectory) will turn on local cache. By default, binaries will be saved on Windows. Also by default, binaries will be built on the fly on *nix systems and this binary will be cached for even faster installs later (turned off with options("Require.RPackageCache" = someSharedDirectory))

Installing

News

See updates from latest CRAN and development versions.

Installation

Current release

R build status codecov

Install from CRAN:

install.packages("Require")

Install from GitHub:

#install.packages("devtools")
library("devtools")
install_github("PredictiveEcology/Require", dependencies = TRUE) 

Development version

R build status codecov

Install from GitHub:

#install.packages("devtools")
library("devtools")
install_github("PredictiveEcology/Require", ref = "development", dependencies = TRUE) 

Conclusion

Require package offers a simple, lightweight, package focused around a single function. The package has very few dependencies and so can be used to install packages without interfering with itself.

Contributions

Please see CONTRIBUTING.md for information on how to contribute to this project.