This is a flexible tool for enrichment analysis based on user-defined sets. It allows users to perform over-representation analysis of the custom sets among any specified ranked feature list, hence making enrichment analysis applicable to various types of data from different scientific fields. EnrichIntersect also enables an interactive means to visualize identified associations based on, for example, the mix-lasso model (Zhao et al., 2022) or similar methods.
To get started, load the package with
An example input data object
cancers_drug_groups is an R
list provided in our package, which includes a
data.frame object with 147 cancer drugs as rows and nine cancer types as columns, and another
data.frame that groups the 147 drugs (second column) into nine user-defined drug classes (first column). The default setup of
enrichment() uses the classic K-S test statistic to calculate the normalized enrichment score that quantifies the degree to which the features in a feature set are over-represented at the top or bottom of the entire ranked list of features (e.g. a list of drugs), using as default 100 permutations for the empirical null test statistic. In the visualization, the statistically significantly enriched feature sets are marked with red coloured circles given a pre-specified significance level. The pre-specified significance level will be adjusted automatically if argument
padj.method is one of
c("holm","hochberg","hommel","bonferroni","BH","BY","fdr","none"). Users can specify the argument
alpha for calculating a weighted enrichment score, argument
normalize=FALSE for using the standard enrichment score rather than the normalized score, argument
permute.n for the number of permutations of the ranked feature list used for estimating the empirical null test statistic, and the argument
pvalue.cutoff for marking enriched categories at a specific significance level. See the following code for an example.
<- cancers_drug_groups$score x <- cancers_drug_groups$custom.set custom.set set.seed(123) <- enrichment(x, custom.set, permute.n = 100)enrich
intersectSankey() creates a Sankey diagram to visualize intersecting sets from an
array object, in which the first dimension represents intermediate variables, and the second and third dimensions represent multiple levels and multiple tasks, respectively. One intersecting set is a list of intermediate variables associated with a combination of a subset of levels and a subset of tasks, which is not easy to visualize when all possible combinations of the two are many. Our function
intersectSankey() has adapted
sankeyNetwork() from R package
networkD3 to create a
networkD3, the user can also save the Sankey diagram as a pdf or png file via R package
webshot2. The argument
out.fig=c(NA,"html","pdf","png") in the function
intersectSankey() determines the figure on the user’s R graphics device, to be saved either as a html, pdf or png file.
An example input data object
cancers_genes_drugs in the package is an
array with associations between, e.g., 56 genes (first dimension), two cancer types (second dimension) and two drugs (third dimension) provided in our package. The user can adjust the Sankey diagram argument
out.fig for different output graph types and use argument
step.names to indicate the labels of the three kinds of variables in a Sankey diagram, i.e., name of multiple levels, name of intermediate variables, and name of multiple tasks, see the following code for an example.
data(cancers_genes_drugs, package = "EnrichIntersect") intersectSankey(cancers_genes_drugs, step.names = c("Cancers", "Genes", "Drugs"))
## Warning in links$group[seq_along(length(group_tmp))] <- group_tmp: número de ## items para para sustituir no es un múltiplo de la longitud del reemplazo
Zhi Zhao, Manuela Zucknick, Tero Aittokallio (2022). EnrichIntersect: an R package for custom set enrichment analysis and interactive visualization of intersecting sets. Bioinformatics Advances, 2(1), vbac073. DOI: 10.1093/bioadv/vbac073.
Zhi Zhao, Shixiong Wang, Manuela Zucknick, Tero Aittokallio (2022). Tissue-specific identification of multi-omics features for pan-cancer drug response prediction. iScience, 25(8): 104767. DOI: 10.1016/j.isci.2022.104767.