The `{domir}`

package contains functions that apply
decomposition-based relative importance analysis methods (*dominance
analysis* or *Shapley value decomposition*) to predictive
modeling functions in R.

The intention of this package is to provide a flexible user interface
to dominance analysis—a popular relative importance analysis method that
extends on the rigorous solution concept of Shapley value decomposition
in cooperative game theory. The user interface is structured such that
`{domir}`

automates the decomposition of the returned value
and comparisons between model inputs and the user provides the model
inputs, the predictive model into which they are entered, and returned
value from the model to decompose.

To install the most recent stable version of `{domir}`

from CRAN use:

`install.packages("domir")`

To install the working, development version of `{domir}`

using the `{devtools}`

package use:

`devtools::install_github("https://github.com/jluchman/domir")`

Coming soon: see the `{domir}`

-based
`dominance_analysis`

function for the {parameters} package
from the `{easystats}`

/easyverse.

`{domir}`

DoesThe primary dominance analysis function `domir`

implements
the most computationally intensive and programming heavy parts of
dominance analysis for the user and has relatively few requirements on
the predictive modeling functions with which it can work.

The flexibility of `domir`

comes at the cost of more
complexity for the user in terms of setting up a function that accepts
the type of input `domir`

will provide (currently only a
‘formula’) and and expects to receive (currently only a numeric
scalar).

Below these ideas are outlined in greater detail in the context of a
few examples. The next section begins the discussion with a more
extensive comparison of `domir`

with packages that implement
similar methods.

The `domir`

function is similar to the “lmg” type for the
`calc.relimpo`

function in the `{relaimpo}`

package as well as the `dominanceAnalysis`

function in the `{dominanceanalysis}`

package (not on CRAN). `domir`

can replicate the results
produced by both the above packages but, as will be seen, requires more
user input.

To illustrate these points, consider the following example linear regression on which all three of the dominance analysis results to come are based:

`lm(mpg ~ am + vs + cyl, data = mtcars)`

Classic dominance analysis uses the variance explained as fit statistic (i.e., as implemented by
`lm`

’s `summary`

method) and so will this
example.

`{domir}`

’s `domir`

Implementing a ‘classic’ dominance analysis on this linear regression
in `domir`

can be inputted as:

```
<-
lm_wrapper function(formula, data) {
<-
result lm(formula, data = data) |>
summary()
return(result[["r.squared"]])
}
domir(mpg ~ am + vs + cyl,
lm_wrapper,data = mtcars)
```

```
## Overall Value: 0.7619773
##
## General Dominance Values:
## General Dominance Standardized Ranks
## am 0.1774892 0.2329324 3
## vs 0.2027032 0.2660226 2
## cyl 0.3817849 0.5010450 1
##
## Conditional Dominance Values:
## Subset Size: 1 Subset Size: 2 Subset Size: 3
## am 0.3597989 0.1389842 0.033684441
## vs 0.4409477 0.1641982 0.002963748
## cyl 0.7261800 0.3432799 0.075894823
##
## Complete Dominance Designations:
## Dmnated?am Dmnated?vs Dmnated?cyl
## Dmnates?am NA NA FALSE
## Dmnates?vs NA NA FALSE
## Dmnates?cyl TRUE TRUE NA
```

In `domir`

, the `lm`

model is not submitted
directly. Rather, it is wrapped into a function (i.e.,
`lm_wrapper`

) that, in this case, accepts two arguments;
*formula* or an R formula and *data* a data frame in which
the independent variables in the formula are present. The result of the
`lm`

is piped (i.e., `|>`

) into the
`summary`

function and the result is captured as the object
*result*. The *result* object is then filtered to just the
**r.squared** element and returned.

What `domir`

does automate taking subsets of the
*formula* and submit them, repeatedly until all possible subsets
have been submitted, to `lm_wrapper`

(see this vignette
for a conceptual discussion of dominance analysis). In this way,
`domir`

is a `Map`

- or `lapply`

-like
function as it receives an object on which to operate (i.e., the
*formula*) and a function to which to apply to it.
`domir`

expects a numeric scalar to be returned from the
function.

Like `lapply`

, other arguments
(`data = mtcars`

) can also be passed to each call of the
function and must be explicitly built into the wrapper function.

What is important to note about `domir`

that differs from
other dominance analysis-oriented functions discussed below is that
`domir`

expects that the user will supply the analysis
pipeline linking the *formula* it passes to the numeric scalar
value that it expects. This ‘supply the pipeline’ approach makes
`domir`

far more flexible than other implementations but does
require the user to think more carefully about how to structure the
pipeline.

Note that the focus of `domir`

’s `print`

-ed
results focuses on the numerical results from “General Dominance Values”
and “Conditional Dominance Values” and, a logical matrix of “Complete
Dominance Designations”.

See also the (now superseded) `domir::domin`

function for
another approach to structuring the input pipeline for dominance
analysis.

`{relaimpo}`

’s
`calc.relimp`

with `type = "lmg"`

`{relaimpo}`

is not a dominance analysis software but does
produce general dominance value decomposition for linear regression
using the explained variance in the `calc.relimp`

function with
the argument `type = "lmg"`

.

```
::calc.relimp(mpg ~ am + vs + cyl,
relaimpodata = mtcars,
type = "lmg")
```

```
## Response variable: mpg
## Total response variance: 36.3241
## Analysis based on 32 observations
##
## 3 Regressors:
## am vs cyl
## Proportion of variance explained by model: 76.2%
## Metrics are not normalized (rela=FALSE).
##
## Relative importance metrics:
##
## lmg
## am 0.1774892
## vs 0.2027032
## cyl 0.3817849
##
## Average coefficients for different model sizes:
##
## 1X 2Xs 3Xs
## am 7.244939 4.316851 3.026480
## vs 7.940476 2.995142 1.294614
## cyl -2.875790 -2.795816 -2.137632
```

`calc.relimp`

has a similar to structure to that of
`domir`

but does not require a pipeline function. This is
because `{relaimpo}`

is specialized and works only with
`lm`

models and the variance explained as a fit statistic. `calc.relimp`

also allows for multiple methods of submitting (i.e., correlation
matrices, fitted `lm`

object, a `data.frame`

)
given that it always implements the same model.

`calc.relimp`

’s printed results provide relative
importance metric values that match those obtained from
`domir`

(i.e., the general dominance values). In addition,
`calc.relimp`

reports the average `lm`

coefficients across numbers of independent variables/s in a way similar to the conditional dominance
values reported by `domir`

—an additional and useful result to
show the impact of inclusion of different numbers of independent
variables on obtained coefficients/predicted values.

Again, note that `{relaimpo}`

is not dominance
analysis-oriented and does not report on dominance designations or
dominance values other than the general dominance values.

`{dominanceanalysis}`

’s
`dominanceAnalysis`

`{dominanceanalysis}`

implements dominance analysis for
several different predictive models including `lm`

,
`betareg`

, and `glm`

each with its own built-in
(pseudo-).

```
<- lm(mpg ~ am + vs + cyl,
lm_model data = mtcars)
::dominanceAnalysis(lm_model) dominanceanalysis
```

```
##
## Dominance analysis
## Predictors: am, vs, cyl
## Fit-indices: r2
##
## * Fit index: r2
## complete conditional general
## am
## vs am
## cyl am,vs am,vs am,vs
##
## Average contribution:
## cyl vs am
## 0.382 0.203 0.177
```

`dominanceAnalysis`

has a simpler approach than
`domir`

to get a ‘classic’ dominance analysis as it accepts a
fitted `lm`

model as input and requires use of the explained
variance as the returned value. The object returned by
`dominanceAnalysis`

is large and contains the fit statistic
values from all subsets as well as computed dominance statistics based
on them. Several helper functions are available to extract specific
dominance and other results for printing to the console.

`dominanceAnalysis`

’s default printed output is focused on
qualitative dominance designations but also reports a sorted, average
contribution metric (i.e., general dominance values).

As mentioned above, `{dominanceanalysis}`

can be used with
around seven different predictive models and implements a (pseudo-) as returned values for each. Itis worth noting
that the upcoming `dominance_analysis`

function in the
`{parameters}`

package takes a similar approach as
`{dominanceanalysis}`

but works from the
`{insight}`

engine linked with `performance::r2`

which allows extension to many different models.

`{domir}`

Extends on Previous PackagesThe intention of `{domir}`

is to extend relative
importance analysis to new data analytic situations the user might
encounter where a decomposition-based relative importance method such as
dominance analysis could be valuable.

The sections below outline some pertinent examples that the
`domir`

function can accommodate that cannot be r

Given that the user supplies the analysis pipeline, one component of
`domir`

’s flexibility is in allowing the user to apply any
applicable fit statistic as a returned value for the purposes of
relative importance analysis.

In the example below, the explained variance is swapped with an alternative, but nonetheless
applicable, fit statistic: the McFadden pseudo- as implemented by the `{pscl}`

package.

The example below is more complex than the previous
`domir`

call as the analysis pipeline is submitted as an
anonymous function with a single argument (*fml*). In part, this
approach is taken to show that the user *can* submit the function
to `domir`

in this way. In addition, note that the
`data`

argument submitted to the `lm`

function is
built-into the analysis pipeline instead of passed as an argument; both
are valid methods of setting arguments to predictive analyses.

```
domir(mpg ~ am + vs + cyl,
\(fml)<-
{(result lm(fml, data = mtcars) |>
::pR2()
pscl|> capture.output()
) return(result[["McFadden"]])
})
```

```
## Overall Value: 0.2243283
##
## General Dominance Values:
## General Dominance Standardized Ranks
## am 0.04848726 0.2161442 3
## vs 0.04970277 0.2215627 2
## cyl 0.12613826 0.5622931 1
##
## Conditional Dominance Values:
## Subset Size: 1 Subset Size: 2 Subset Size: 3
## am 0.06969842 0.05507782 0.020685547
## vs 0.09088103 0.05629333 0.001933959
## cyl 0.20243215 0.13272881 0.043253806
##
## Complete Dominance Designations:
## Dmnated?am Dmnated?vs Dmnated?cyl
## Dmnates?am NA NA FALSE
## Dmnates?vs NA NA FALSE
## Dmnates?cyl TRUE TRUE NA
```

The use of the McFadden pseudo- has produced effectively the same answers, in terms of qualitative importance inferences about the independent variables, as that of the dominance analysis using the explained variance .

It is also worth noting that the use `capture.output`

in
the anonymous function was not not strictly necessary. If not used,
`domir`

will print far more output than is needed as
`pscl::pR2`

is a rather verbose function and will print a
message for each model fitted.

The user-defined analysis pipeline also allows for extending
predictive modeling to effectively any predictive model (that the user
can adapt the formula input to accommodate). The example below is
applied to the `polr`

function from the `{MASS}`

package using `peformance::r2`

’s result as a returned
value.

```
<- data.frame(mtcars, carb2 = as.factor(mtcars$carb))
mtcars2
domir(carb2 ~ am + vs + mpg,
\(fml) ::polr(fml, data = mtcars2) |>
MASS::r2() |> unlist()
performance )
```

```
## Overall Value: 0.5764319
##
## General Dominance Values:
## General Dominance Standardized Ranks
## am 0.07067731 0.1226117 3
## vs 0.22206005 0.3852321 2
## mpg 0.28369455 0.4921562 1
##
## Conditional Dominance Values:
## Subset Size: 1 Subset Size: 2 Subset Size: 3
## am 0.004737758 0.09192243 0.11537173
## vs 0.402858270 0.24330517 0.02001669
## mpg 0.383596252 0.30493968 0.16254772
##
## Complete Dominance Designations:
## Dmnated?am Dmnated?vs Dmnated?mpg
## Dmnates?am NA NA FALSE
## Dmnates?vs NA NA NA
## Dmnates?mpg TRUE NA NA
```

The call to `unlist`

in the anonymous function above
ensures that the returned value is a numeric scalar as opposed to a list
with a single element.

`domir`

can also work with predictive models that do not
produce model coefficients like `randomForest::randomForest`

.
The dominance analysis approach’s results differ from the built-in
variable importance method plotted below (which is arguably better
suited for model selection) but can, and in the case of many of the
variables do, agree on which independent variables are important.

The dominance analysis here is based on a squared correlation of the predicted values with the dependent variable (i.e., the explained variance ).

```
set.seed(5621)
<-
rf_model ::randomForest(mpg ~ am + qsec + cyl, data = mtcars,
randomForestimportance = TRUE)
data.frame(`%IncMSE` = rf_model$importance[,1], `RankIncMSE` = rank(rf_model$importance[,1]*-1), `IncNodePurity` = rf_model$importance[,2], `RankIncPurity` = rank(rf_model$importance[,2]*-1),check.names = FALSE)
```

```
## %IncMSE RankIncMSE IncNodePurity RankIncPurity
## am 10.121981 2 188.9050 3
## qsec 6.754203 3 281.3496 2
## cyl 20.526554 1 367.8784 1
```

`cor(predict(rf_model), mtcars$mpg)^2`

`## [1] 0.7005082`

```
domir(mpg ~ am + qsec + cyl,
\(fml) {set.seed(5621)
<-
result ::randomForest(fml, data = mtcars,
randomForestimportance = TRUE)
<- cor(predict(result), mtcars$mpg)
cor return(cor^2)
} )
```

```
## Overall Value: 0.7005082
##
## General Dominance Values:
## General Dominance Standardized Ranks
## am 0.1600684 0.2285032 2
## qsec 0.1338248 0.1910396 3
## cyl 0.4066151 0.5804572 1
##
## Conditional Dominance Values:
## Subset Size: 1 Subset Size: 2 Subset Size: 3
## am 0.2642756 0.1741050 0.041824636
## qsec 0.2452030 0.1478614 0.008409932
## cyl 0.6761472 0.4206517 0.123046338
##
## Complete Dominance Designations:
## Dmnated?am Dmnated?qsec Dmnated?cyl
## Dmnates?am NA TRUE FALSE
## Dmnates?qsec FALSE NA FALSE
## Dmnates?cyl TRUE TRUE NA
```

Note the use of `set.seed`

prior to all calls to
`randomForest`

. These ensure that the random processes within
the `randomForest`

function result in a reproducible set of
predicted values (and metric). The calls to individual
`randomForest`

s also had to use the
`importance = TRUE`

argument (though they are not used) to
ensure matching with the original result as they affect the state of the
random number generator.

One distinct advantage of having the level of flexibility in the
analytic pipeline that `domir`

offers is that this that it
can work directly with modeling functions that are more complex. The
example below uses the `zeroinfl`

model from the package
`{pscl}`

that accepts a `Formula::Formula`

object
(i.e., a multi-equation formula) instead of a standard R formula.

The below example uses the entries in the formula to plug into the
`Formula`

object that will be submitted to the
`zeroinfl`

model.

```
library(Formula)
domir(~ fem + mar + kid5,
\(fml) {<-
result as.Formula(fml, fml) |>
update(art ~ .) |>
::zeroinfl(data = pscl::bioChemists) |>
pscl::r2()
performancereturn(result[["R2"]])
})
```

```
## Overall Value: 0.04922296
##
## General Dominance Values:
## General Dominance Standardized Ranks
## fem 0.031066252 0.63113333 1
## mar 0.004913445 0.09982017 3
## kid5 0.013243265 0.26904650 2
##
## Conditional Dominance Values:
## Subset Size: 1 Subset Size: 2 Subset Size: 3
## fem 0.027905567 0.031558374 0.033734816
## mar 0.003403668 0.005405566 0.005931099
## kid5 0.005029077 0.013735387 0.020965332
##
## Complete Dominance Designations:
## Dmnated?fem Dmnated?mar Dmnated?kid5
## Dmnates?fem NA TRUE TRUE
## Dmnates?mar FALSE NA FALSE
## Dmnates?kid5 FALSE TRUE NA
```

In this example, note the absence of a dependent variable in the
model formula. `domir`

does not require a left hand
side/dependent variable to accommodate situations like the one here
where it is added later in the analysis pipeline. Also, note that the
*fml* passed to the pipeline is repeated in the Poisson and
inflation model equations and then adapted to a `Formula`

object before submitting to `zeroinfl`

.

*…to be populated…*