# Check the Heaviness of Package Dependencies

When developing R packages, we should try to avoid directly setting dependencies to “heavy packages”. The “heaviness” for a package means, the number of additional dependent packages it brings to. If your package directly depends on a heavy package, it brings several consequences:

1. Users need to install a lot of additional packages if your package is installed (which brings the risk that installation of some packages may fail that makes your package cannot be installed neither).
2. The namespaces that are loaded into your R session after loading your package (by library(your-pkg)) will be huge (you can see the loaded namespaces by sessionInfo()).

You package will be “heavy” as well and it may take long time to load your package.

In the DESCRIPTION file of your package, those “directly dependent pakcages” are always listed in the “Depends” or “Imports” fields. To get rid of the heavy packages that are not offen used in your package, it is better to move them into the “Suggests” fields and load them only when they are needed.

Here pkgndep package checks the heaviness of the packages that your package depends on. For each package listed in the “Depends”, “Imports” and “Suggests” fields in the DESCRIPTION file, it opens a new R session, loads the package and counts the number of namespaces that are loaded. The summary of the dependencies is visualized by a customized heatmap.

As an example, I am developing a package cola which depends on a lot of other packages. The dependency heatmap looks like (Figure in the original size is here):

In the heatmap, rows are the packages listed in “Depends”, “Imports” and “Suggests” fields, columns are the namespaces that are loaded if each of the package is only loaded to a new R session. The barplots on the right show the number of namespaces that are imported by each package and the time of only loading one of the packages into R.

We can see if all the packages are put in the “Imports” field, 166 namespaces will be loaded after library(cola). Some of the heavy packages such as WGCNA and clusterProfiler are not very frequently used in cola, moving them to “Suggests” field and loading them only when they are needed helps to speed up loading cola. Now the number of namespaces are reduced to only 25 after library(cola).

## Usage

To use this package:

library(pkgndep)
x = pkgndep("package-name")
plot(x)

or

x = pkgndep("path-to-the-package")
plot(x)

Executable examples:

library(pkgndep)
x = pkgndep("ComplexHeatmap")
## ========== checking ComplexHeatmap ==========
## Loading pheatmap to a new R session... 17 namespaces loaded.
x
## ComplexHeatmap version 2.5.2
## 14 namespaces loaded if only load packages in Depends and Imports
## 56 namespaces loaded after loading all packages in Depends, Imports and Suggests
plot(x)

## Statistics

I ran pkgndep on all packages that are installed in my computer. The table of the number of loaded namespaces as well as the dependency heatmaps are available at https://jokergoo.github.io/pkgndep/stat/.

For a quick look, the top 10 packages with the largest dependencies are:

Package # Namespaces also load packages in Suggests Heatmap
ReportingTools 125 131 view
epik 116 116 view
minfiData 109 109 view
minfiDataEPIC 109 109 view
ggbio 108 119 view
FlowSorted.Blood.450k 108 108 view
IlluminaHumanMethylation450kanno.ilmn12.hg19 108 108 view
IlluminaHumanMethylation450kmanifest 108 108 view
IlluminaHumanMethylationEPICanno.ilm10b2.hg19 108 108 view

And the top 10 packages with the largest dependencies where packages in “Suggests” are also loaded are:

Package # Namespaces also load packages in Suggests Heatmap