Greed enables model-based clustering of networks, matrices of count data and much more, using different types of generative models. Model selection and clustering are performed jointly by optimizing the Integrated Classification Likelihood (ICL). Details of the algorithms and methods proposed by this package can be found in Côme, Jouvin, Latouche, and Bouveyron (2021), doi:10.1007/s11634-021-00440-z.

The following generative models are currently available:

- Stochastic Block Models (for **directed** and **un-directed** graphs, see `` ?`sbm-class` ``; to deal with missing values see `` ?`misssbm-class` ``),
- Degree Corrected Stochastic Block Models (for **directed** and **un-directed** graphs, see `` ?`dcsbm-class` ``),
- Stochastic Block Models with Multinomial observations (**experimental**, see `` ?`multsbm-class` ``),
- Degree Corrected Latent Block Models (see `` ?`co_dcsbm-class` ``),
- Mixture of Multinomials (see `` ?`mm-class` ``),
- Gaussian Mixture Model (**experimental**, see `` ?`gmm-class` `` and `` ?`diaggmm-class` ``),
- Multivariate Mixture of Gaussian Regression Model (**experimental**, see `` ?`mvmreg-class` ``).

With the Integrated Classification Likelihood, the model parameters are integrated out, which provides a natural regularization for complex models. Because the ICL penalizes complex models, it can automatically find a “natural” number of clusters $K^*$: the user only needs to provide an initial guess and values for the prior parameters (sensible defaults are used when no prior information is available). By default, the optimization combines a greedy local search with a genetic algorithm; several other optimization algorithms are available.
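For instance, assuming the initial guess for the number of clusters is passed through the `K` argument of `greed()` (check `?greed` for the exact signature), a larger starting value can be supplied as below. The value 30 is only illustrative; the ICL optimization is still free to settle on a different $K^*$.

```r
library(greed)
data(Jazz)
# K: initial guess for the number of clusters (argument name assumed, see ?greed);
# the final number of clusters K* is selected by the ICL optimization
sol <- greed(Jazz, K = 30)
```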

Finally, the whole path of solutions from $K^*$ down to 1 cluster is extracted. This yields a partial ordering of the clusters and makes it possible to evaluate simpler clusterings. The package also provides plotting functions.

You can install the development version of greed from GitHub with:
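A sketch using the remotes package. The GitHub repository path is not given here, so `OWNER` below is a placeholder for the account that hosts the greed repository:

```r
# install.packages("remotes")  # uncomment if remotes is not installed yet
# OWNER is a placeholder: replace it with the GitHub account hosting greed
remotes::install_github("OWNER/greed")
```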

Or use the CRAN version:
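The released version installs with base R's `install.packages()`:

```r
# install the released version of greed from CRAN
install.packages("greed")
```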

The main entry point of the package is simply the `greed` function (see `?greed`). The generative model is chosen automatically to fit the data provided, but you may specify another one with the `model` parameter. Here is a basic example with the classical Jazz network:

```r
library(greed)
data(Jazz)
# fit the generative model chosen automatically for this data
sol <- greed(Jazz)
#> ------- undirected DCSBM model fitting ------
#> ################# Generation 1: best solution with an ICL of -28611 and 16 clusters #################
#> ################# Generation 2: best solution with an ICL of -28601 and 15 clusters #################
#> ################# Generation 3: best solution with an ICL of -28580 and 16 clusters #################
#> ################# Generation 4: best solution with an ICL of -28578 and 15 clusters #################
#> ################# Generation 5: best solution with an ICL of -28578 and 15 clusters #################
#> ------- Final clustering -------
#> ICL clustering with a DCSBM model, 14 clusters and an icl of -28559.
```

Here `Jazz` is a square sparse matrix, so a degree corrected SBM (see `` ?`dcsbm-class` ``) is used by default. Some plotting functions enable the exploration of the clustering results:
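A minimal sketch, assuming the plot method for the fitted object accepts `type = "blocks"` (see the help page of the plot method for the supported `type` values). The fit is repeated so that the snippet runs on its own:

```r
library(greed)
data(Jazz)
sol <- greed(Jazz)  # same DCSBM fit as in the example above

# block-matrix summary of the connections between clusters
plot(sol, type = "blocks")
```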

And the hierarchical structure between clusters:
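Again a sketch, assuming `type = "tree"` selects the dendrogram view of the hierarchy extracted along the path from $K^*$ to 1 cluster:

```r
library(greed)
data(Jazz)
sol <- greed(Jazz)  # same DCSBM fit as in the example above

# dendrogram of the successive cluster merges, from K* clusters down to 1
plot(sol, type = "tree")
```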

Finally, coarser clusterings can be explored with the `cut` function:
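A minimal sketch, assuming `cut()` takes the fitted object and the desired number of clusters as its second argument; the value 3 is only illustrative:

```r
library(greed)
data(Jazz)
sol <- greed(Jazz)  # same DCSBM fit as in the example above

# coarser solution with 3 clusters, taken from the extracted hierarchy
sol_K3 <- cut(sol, 3)
plot(sol_K3, type = "blocks")
```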

For large datasets, the computation can be sped up with parallelism through the future package. You only need to specify the backend you want to use (here `multisession`):

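```r
library(future)
# run the computations in parallel background R sessions on the local machine
plan(multisession)
data("Blogs")
# Blogs$X holds the adjacency matrix of the Blogs network
sol <- greed(Blogs$X)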
#> ------- directed DCSBM model fitting ------
#> ################# Generation 1: best solution with an ICL of -84417 and 16 clusters #################
#> ################# Generation 2: best solution with an ICL of -84358 and 17 clusters #################
#> ################# Generation 3: best solution with an ICL of -84199 and 18 clusters #################
#> ################# Generation 4: best solution with an ICL of -84179 and 19 clusters #################
#> ################# Generation 5: best solution with an ICL of -84160 and 18 clusters #################
#> ################# Generation 6: best solution with an ICL of -84150 and 17 clusters #################
#> ################# Generation 7: best solution with an ICL of -84143 and 17 clusters #################
#> ################# Generation 8: best solution with an ICL of -84143 and 17 clusters #################
#> ------- Final clustering -------
#> ICL clustering with a DCSBM model, 16 clusters and an icl of -84101.
plot(sol)
```