restatapi

An R package to search and retrieve data from Eurostat database using SDMX

installation

Now ‘restatapi’ is on CRAN so it can be installed by

> install.packages("restatapi")

or use the development version from GitHub

> devtools::install_github("eurostat/restatapi")

background

This package is similar to other packages like the eurostat, rdbnomics, RJSDMX or TSsdmx which can be used to download data from Eurostat database. The difference is that restatapi is based on SDMX (Statistical Data and Metadata eXchange) and XML to search and retrieve filtered datasets and use the TSV (tab separeted values) bulk download facility to get whole data tables. The code was written in a way that the number of dependencies on other packages should be very small. The restatapi package provides flexible filtering options, data caching, and uses the parallel and data.table package to handle large dataset in an efficient way.

content

The package contains 8 main functions and several other sub functions in 3 areas.

  1. downloading and filtering the list of avaialble datasets:
  2. downloading and searching in the metadata:
  3. retrieving full or partial datasets:

Below there are examples demonstrating the main features, the detailed documentation of the functions is in the package.

Next to the functions the package contains a list of country codes for different groups of European countries based on the Eurostat standard code list, e.g.: European Union (EU28, …, EU6), Euro Area (EA19, …, EA11) or New Member States (NMS13, …, NMS2).

examples

Example 1: Setting the number of cores/threads to 3 and download the txt version of the English TOC and search not case sensitive for the word energie in the German version of the TOC.

> options(restatapi_cores=3)
> get_eurostat_toc()
> get_get_eurostat_toc(mode="txt",verbose=TRUE)
> search_eurostat_toc("energie",lang="de",ignore.case=TRUE)

Other examples:

> dsd<-get_eurostat_dsd("ei_bsfs_q")
> search_eurostat_dsd("EU",dsd)
> search_eurostat_dsd("EU",dsd,name=FALSE)
> 
> get_eurostat_data("NAMA_10_GDP")
> get_eurostat_data("nama_10_gdp",update_cache=TRUE,check_toc=TRUE)
> get_eurostat_data("nama_10_gdp",cache_dir="/tmp",stringAsFactors=FALSE,cflags=TRUE)
>
> options(restatapi_update=TRUE)
> options(restatapi_cache_dir=file.path(tempdir(),"restatapi"))
>
> dt<-get_eurostat_data("avia_par_me",select_freq="A",cache=FALSE)
> dt<-get_eurostat_data("agr_r_milkpr",date_filter=2008,keep_flags=TRUE)
> dt<-get_eurostat_data("avia_par_ee",
>                        filters="BE$",
>                        date_filter=c("2017-03",2016,"2017-07-01",2012:2014),
>                        select_freq="Q",
>                        label=TRUE,
>                        verbose=TRUE,
>                        name=FALSE)
> dt<-get_eurostat_data("bop_its6_det",
>                        filters=list(bop_item="SC",
>                                     currency="MIO_EUR",
>                                     partner="EXT_EU28",
>                                     geo=c("EU28","HU"),
>                                     stk_flow="BAL"),
>                        date_filter="2010:2012",
>                        select_freq="A",
>                        label=TRUE,
>                        name=FALSE,
>                        ignore.case=TRUE)     
> options(restatapi_cache_dir=tempdir())
> dt<-get_eurostat_data("agr_r_milkpr",
>                       filters=c("BE$","Hungary"),
>                       date_filter="2007-06<",
>                       keep_flags=TRUE)
> dt<-get_eurostat_data("agr_r_milkpr",
>                       filters="BE",
>                       exact_match=FALSE,
>                       date_filter="2006-02:2008-06-05",
>                       keep_flags=TRUE,
>                       stringsAsFactors=FALSE,
>                       label=TRUE,
>                       ignore.case=TRUE)
>
> eu<-get("cc",envir=.restatapi_env)
> dt<-get_eurostat_data("agr_r_milkpr",
>                       filters=eu$NMS10,
>                       date_filter="2009-03-01:2011-06-05",
>                       keep_flags=TRUE,
>                       stringsAsFactors=FALSE,
>                       label=TRUE)
>
> dt<-get_eurostat_data("nama_10_a10_e",
>                       filters=c("Annual","EU28","Belgium","AT","Total","EMP_DC","person"),
>                       date_filter=c("2008",2002,"2005-01",2013:2018))
> dt<-get_eurostat_data("avia_par_me",
>                       filters="Q...ME_LYPG_HU_LHBP+ME_LYTV_UA_UKKK",
>                       date_filter=c("2016-08","2017-07-01"),
>                       select_freq="M") 
> dt<-get_eurostat_data("avia_par_me",
>                       filters=c("HU","Quarterly","Monthly"),
>                       date_filter=c("2016-08","2014-03-01"),
>                       label=TRUE)
>
> clean_restatapi_cache(tempdir(),verbose=TRUE)