CRAN Task View: Natural Language Processing

Maintainer: Fridolin Wild
Contact: wild at
Contributions: Suggestions and improvements for this task view are very welcome and can be made through issues or pull requests on GitHub or via e-mail to the maintainer address. For further details see the Contributing guide.
Citation: Fridolin Wild (2022). CRAN Task View: Natural Language Processing. Version 2022-05-06. URL
Installation: The packages from this task view can be installed automatically using the ctv package. For example, ctv::install.views("NaturalLanguageProcessing", coreOnly = TRUE) installs all the core packages, while ctv::update.views("NaturalLanguageProcessing") installs all packages that are not yet installed or not up-to-date. See the CRAN Task View Initiative for more details.
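
As a minimal sketch of the installation route described above, assuming a working CRAN mirror and network access, the two calls might be run from an R session as follows. The requireNamespace() guard is an addition for illustration; only the two ctv calls come from the text above.

```r
# Install the ctv helper package once, if it is not already available.
if (!requireNamespace("ctv", quietly = TRUE)) {
  install.packages("ctv")
}

# Option 1: install only the core packages of this task view.
ctv::install.views("NaturalLanguageProcessing", coreOnly = TRUE)

# Option 2: install every package of the view that is missing or outdated.
ctv::update.views("NaturalLanguageProcessing")
```

Either call on its own is usually enough; the second is the more convenient one to re-run later to keep an existing installation current.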

Natural language processing has come a long way since its foundations were laid in the 1940s and 1950s (for an introduction see, e.g., Jurafsky and Martin (2008, 2009, 2022 draft third edition): Speech and Language Processing, Pearson Prentice Hall). This CRAN task view collects relevant R packages that support computational linguists in analysing speech and language at a variety of levels, with a focus on words, syntax, semantics, and pragmatics.

In recent years, we have developed a framework for packages that process written material: the package tm. Extension packages in this area are highly recommended to interface with tm’s basic routines, and useRs are cordially invited to join the discussion on further development of this framework package.
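
To give a rough idea of what tm’s basic routines look like in practice, the following sketch builds a small corpus and a document-term matrix. It assumes the tm package is installed; the two example sentences are made up for illustration, and this is only one of many possible preprocessing pipelines.

```r
library(tm)

# Two toy documents, invented for illustration only.
docs <- c("Natural language processing with R.",
          "Text mining builds on basic corpus routines.")

# Wrap the character vector in a volatile (in-memory) corpus.
corpus <- VCorpus(VectorSource(docs))

# Basic preprocessing: lower-case the text and strip punctuation.
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)

# Build a document-term matrix: one row per document, one column per term.
dtm <- DocumentTermMatrix(corpus)
inspect(dtm)
```

Extension packages that accept or return corpus and document-term matrix objects of this kind can be combined with the rest of the framework without conversion steps.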

A basic introduction with comprehensive examples is provided in the book by Fridolin Wild (2016): Learning Analytics in R, Springer.


Words (lexical DBs, keyword extraction, string manipulation, stemming)




CRAN packages

Regular: boilerpipeR, BTM, corpora, corporaexplorer, crfsuite, gsubfn, hunspell, kernlab, koRpus, languageR, lda, lsa, movMF, mscstexta4r, mscsweblm4r, openNLP, ore, phonics, qdap, quanteda, RcmdrPlugin.temis, RKEA, ruimtehol, RWeka, sentencepiece, skmeans, SnowballC, stm, stringdist, stringi, tau, tesseract, text2vec, textcat, textir, textplot, textrank, textreuse, tidytext, tm.plugin.alceste, tm.plugin.dc, tm.plugin.europresse, tm.plugin.factiva, tm.plugin.lexisnexis, tm.plugin.mail, tm.plugin.webmining, tokenizers, tokenizers.bpe, topicdoc, topicmodels, udpipe, word2vec, wordcloud, wordnet, zipfR.

Related links

Other resources