Technical Details: Difference between ggpredict() and ggemmeans()

Daniel Lüdecke

2020-09-12

ggpredict() and ggemmeans() compute predicted values for all possible levels or values of a model's predictors. In short, ggpredict() wraps the predict() method for the respective model, while ggemmeans() wraps the emmeans() function from the emmeans package. Both functions first prepare the data so that it fits the newdata argument of predict() or the at argument of emmeans(), respectively. If you have not done so yet, it is recommended to read the general introduction first.
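As a rough sketch of what the two wrappers delegate to, the calls below query predict() and emmeans() directly for a simple linear model on the efc data. The three c12hour values and the hand-built newdata grid are illustrative assumptions, not the internal code of ggeffects.

```r
library(ggeffects)  # provides the efc example data
library(emmeans)

data(efc)
fit <- lm(barthtot ~ c12hour + neg_c_7, data = efc)

# Illustrative focal values (an assumption; ggpredict() picks its own grid).
hours <- c(0, 20, 45)

# predict() needs a complete data frame via `newdata`, so the non-focal
# covariate is fixed by hand at its mean in the data used for fitting.
nd <- data.frame(
  c12hour = hours,
  neg_c_7 = mean(as.numeric(model.frame(fit)$neg_c_7))
)
pr <- predict(fit, newdata = nd, interval = "confidence")

# emmeans() only needs the focal values via `at`; other numeric
# covariates are set to their mean by its reference grid.
em <- as.data.frame(
  emmeans(fit, specs = "c12hour", at = list(c12hour = hours))
)

# Side-by-side point predictions from both routes.
cbind(nd, predict = pr[, "fit"], emmeans = em$emmean)
```

The point predictions from both routes should agree for this model; only the way the grid is passed (newdata versus at) differs.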

For models without categorical predictors, the results from ggpredict() and ggemmeans() are identical, except for slight and negligible differences in the associated confidence intervals.

library(magrittr)
library(ggeffects)
data(efc)
fit <- lm(barthtot ~ c12hour + neg_c_7, data = efc)

ggpredict(fit, terms = "c12hour")
#> 
#> # Predicted values of Total score BARTHEL INDEX
#> # x = average number of hours of care per week
#> 
#>   x | Predicted |   SE |         95% CI
#> ---------------------------------------
#>   0 |     75.07 | 1.08 | [72.96, 77.18]
#>  20 |     70.15 | 0.90 | [68.40, 71.91]
#>  45 |     64.01 | 0.82 | [62.41, 65.61]
#>  65 |     59.09 | 0.90 | [57.32, 60.86]
#>  85 |     54.17 | 1.09 | [52.04, 56.30]
#> 105 |     49.25 | 1.33 | [46.65, 51.86]
#> 125 |     44.34 | 1.61 | [41.18, 47.49]
#> 170 |     33.27 | 2.29 | [28.79, 37.76]
#> 
#> Adjusted for:
#> * neg_c_7 = 11.83

ggemmeans(fit, terms = "c12hour")
#> 
#> # Predicted values of Total score BARTHEL INDEX
#> # x = average number of hours of care per week
#> 
#>   x | Predicted |   SE |         95% CI
#> ---------------------------------------
#>   0 |     75.07 | 1.08 | [72.96, 77.19]
#>  20 |     70.15 | 0.90 | [68.40, 71.91]
#>  45 |     64.01 | 0.82 | [62.40, 65.61]
#>  65 |     59.09 | 0.90 | [57.32, 60.86]
#>  85 |     54.17 | 1.09 | [52.04, 56.31]
#> 105 |     49.25 | 1.33 | [46.64, 51.87]
#> 125 |     44.34 | 1.61 | [41.18, 47.49]
#> 170 |     33.27 | 2.29 | [28.78, 37.76]
#> 
#> Adjusted for:
#> * neg_c_7 = 11.83

As can be seen, the continuous predictor neg_c_7 is held constant at its mean value, 11.83. For categorical predictors, ggpredict() and ggemmeans() behave differently. While ggpredict() holds each categorical predictor constant at its reference level, ggemmeans(), like ggeffect(), averages over the proportions of the categories of factors.

library(sjmisc)
data(efc)
efc$e42dep <- to_label(efc$e42dep)
fit <- lm(barthtot ~ c12hour + neg_c_7 + e42dep, data = efc)

ggpredict(fit, terms = "c12hour")
#> 
#> # Predicted values of Total score BARTHEL INDEX
#> # x = average number of hours of care per week
#> 
#>   x | Predicted |   SE |         95% CI
#> ---------------------------------------
#>   0 |     92.74 | 2.17 | [88.48, 97.00]
#>  20 |     91.32 | 2.17 | [87.07, 95.57]
#>  45 |     89.53 | 2.21 | [85.21, 93.86]
#>  65 |     88.10 | 2.27 | [83.65, 92.56]
#>  85 |     86.68 | 2.37 | [82.04, 91.32]
#> 105 |     85.25 | 2.49 | [80.38, 90.12]
#> 125 |     83.82 | 2.63 | [78.67, 88.97]
#> 170 |     80.61 | 3.00 | [74.72, 86.50]
#> 
#> Adjusted for:
#> * neg_c_7 =       11.83
#> *  e42dep = independent

ggemmeans(fit, terms = "c12hour")
#> 
#> # Predicted values of Total score BARTHEL INDEX
#> # x = average number of hours of care per week
#> 
#>   x | Predicted |   SE |         95% CI
#> ---------------------------------------
#>   0 |     73.51 | 0.85 | [71.85, 75.18]
#>  20 |     72.09 | 0.73 | [70.65, 73.53]
#>  45 |     70.30 | 0.72 | [68.89, 71.71]
#>  65 |     68.87 | 0.81 | [67.29, 70.46]
#>  85 |     67.45 | 0.97 | [65.55, 69.34]
#> 105 |     66.02 | 1.16 | [63.74, 68.30]
#> 125 |     64.59 | 1.38 | [61.88, 67.31]
#> 170 |     61.38 | 1.92 | [57.61, 65.15]
#> 
#> Adjusted for:
#> * neg_c_7 = 11.83
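The two ways of handling e42dep can be illustrated by hand with predict(). This is only a sketch: the equal weighting across levels mirrors the default of emmeans() and is an assumption about how the averaged value is formed, and only the single value c12hour = 0 is evaluated.

```r
library(ggeffects)  # provides the efc example data
library(sjmisc)     # to_label(), as used above

data(efc)
efc$e42dep <- to_label(efc$e42dep)
fit <- lm(barthtot ~ c12hour + neg_c_7 + e42dep, data = efc)
mf <- model.frame(fit)

# One row per level of e42dep, with c12hour = 0 and neg_c_7 at its mean.
nd <- expand.grid(
  c12hour = 0,
  neg_c_7 = mean(as.numeric(mf$neg_c_7)),
  e42dep  = levels(mf$e42dep)
)
pred <- predict(fit, newdata = nd)

# Holding the factor constant at its reference (first) level.
at_reference <- unname(pred[1])

# Averaging over all levels with equal weights (assumed weighting).
averaged <- mean(pred)

c(reference = at_reference, averaged = averaged)
```

The first value should correspond to the ggpredict() row for x = 0. Whether the second reproduces the ggemmeans() row depends on the weighting assumption stated above.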

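In this case, ggpredict() and ggemmeans() give the same results again (apart from small differences in the confidence intervals) if the condition argument is used to hold a variable, here the factor e42dep, constant at a specific level. In the example, ggemmeans() is given the reference level that ggpredict() uses by default.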

ggpredict(fit, terms = "c12hour")
#> 
#> # Predicted values of Total score BARTHEL INDEX
#> # x = average number of hours of care per week
#> 
#>   x | Predicted |   SE |         95% CI
#> ---------------------------------------
#>   0 |     92.74 | 2.17 | [88.48, 97.00]
#>  20 |     91.32 | 2.17 | [87.07, 95.57]
#>  45 |     89.53 | 2.21 | [85.21, 93.86]
#>  65 |     88.10 | 2.27 | [83.65, 92.56]
#>  85 |     86.68 | 2.37 | [82.04, 91.32]
#> 105 |     85.25 | 2.49 | [80.38, 90.12]
#> 125 |     83.82 | 2.63 | [78.67, 88.97]
#> 170 |     80.61 | 3.00 | [74.72, 86.50]
#> 
#> Adjusted for:
#> * neg_c_7 =       11.83
#> *  e42dep = independent

ggemmeans(fit, terms = "c12hour", condition = c(e42dep = "independent"))
#> 
#> # Predicted values of Total score BARTHEL INDEX
#> # x = average number of hours of care per week
#> 
#>   x | Predicted |   SE |         95% CI
#> ---------------------------------------
#>   0 |     92.74 | 2.17 | [88.48, 97.01]
#>  20 |     91.32 | 2.17 | [87.06, 95.57]
#>  45 |     89.53 | 2.21 | [85.20, 93.87]
#>  65 |     88.10 | 2.27 | [83.64, 92.57]
#>  85 |     86.68 | 2.37 | [82.03, 91.32]
#> 105 |     85.25 | 2.49 | [80.37, 90.13]
#> 125 |     83.82 | 2.63 | [78.67, 88.98]
#> 170 |     80.61 | 3.00 | [74.71, 86.51]
#> 
#> Adjusted for:
#> * neg_c_7 = 11.83
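The condition argument is not limited to the reference level. As a small sketch, the example below passes the same non-reference level to both functions; taking the second level of e42dep is an arbitrary choice for illustration.

```r
library(ggeffects)
library(sjmisc)

data(efc)
efc$e42dep <- to_label(efc$e42dep)
fit <- lm(barthtot ~ c12hour + neg_c_7 + e42dep, data = efc)

# Pick a non-reference level without hard-coding its label.
lvl <- levels(efc$e42dep)[2]

p1 <- ggpredict(fit, terms = "c12hour", condition = c(e42dep = lvl))
p2 <- ggemmeans(fit, terms = "c12hour", condition = c(e42dep = lvl))

# Point predictions should agree; intervals may differ slightly.
all.equal(p1$predicted, p2$predicted)
```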

Creating plots is as simple as described in the vignette Plotting Marginal Effects.

ggemmeans(fit, terms = c("c12hour", "e42dep")) %>% plot()