Units of measurement

Possible mismatch in units

Working with multiple ICU datasets can be challenging because units of measurement differ. Combining data from different countries, in particular, can produce unit mismatches, since reporting practices vary substantially. For example, laboratory values in the US datasets are commonly reported in mg/dL, whereas European datasets commonly use mmol/L. Converting between the two requires the molecular weight of the substance and must therefore be handled on a case-by-case basis. Care is needed when loading data to account for this.
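To make the role of the molecular weight concrete, here is a minimal sketch of the conversion between mg/dL and mmol/L. The helper names mgdl_to_mmoll() and mmoll_to_mgdl() are hypothetical and are not part of ricu; the molar masses for glucose and creatinine are standard reference values.

```r
# Hypothetical helpers (not part of ricu) for converting between a mass
# concentration in mg/dL and a molar concentration in mmol/L.
# mol_weight is the molar mass of the substance in g/mol.
mgdl_to_mmoll <- function(x, mol_weight) {
  # mg/dL -> mg/L (times 10), then mg/L -> mmol/L (divide by g/mol)
  x * 10 / mol_weight
}

mmoll_to_mgdl <- function(x, mol_weight) {
  x * mol_weight / 10
}

# Molar masses in g/mol
mol_weight <- c(glucose = 180.16, creatinine = 113.12)

# 90 mg/dL of glucose is roughly 5 mmol/L
glucose_mmoll <- mgdl_to_mmoll(90, mol_weight[["glucose"]])

# 1 mg/dL of creatinine is roughly 0.0884 mmol/L (88.4 umol/L)
creatinine_mmoll <- mgdl_to_mmoll(1, mol_weight[["creatinine"]])

print(c(glucose = glucose_mmoll, creatinine = creatinine_mmoll))
```

Because the factor depends on the substance, the same numeric shift cannot be applied to two different laboratory values.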

ricu approach

All concepts shipped with ricu that can be loaded with load_concepts() have been checked for units, and units have been converted where necessary.

For example, we load five concepts from two demo datasets:

library(ricu)

# Demo data sources to compare
data_src <- c("mimic_demo", "eicu_demo")

concepts <- c("map", "lact", "crea", "bili", "plt")

dat <- lapply(data_src,
  function(src) load_concepts(concepts, src, verbose = FALSE)
)

names(dat) <- data_src

dat
#> $mimic_demo
#> # A `ts_tbl`: 13,877 ✖ 7
#> # Id var:     `icustay_id`
#> # Index var:  `charttime` (1 hours)
#>        icustay_id charttime   map  lact  crea  bili   plt
#>             <int> <drtn>    <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1          201006 -58 hours    NA   1.7   0.9  NA     338
#> 2          201006 -45 hours    NA  NA     0.8   0.9   302
#> 3          201006 -21 hours    NA  NA     0.8  NA     333
#> 4          201006 -10 hours    NA   1.8  NA    NA      NA
#> 5          201006   0 hours    82   2.2   0.8  NA     306
#> …                   
#> 13,873     298685 314 hours    76  NA    NA    NA      NA
#> 13,874     298685 315 hours    58  NA    NA    NA      NA
#> 13,875     298685 316 hours    47  NA    NA    NA      NA
#> 13,876     298685 317 hours    35  NA    NA    NA      NA
#> 13,877     298685 318 hours    12  NA    NA    NA      NA
#> # … with 13,867 more rows
#> 
#> $eicu_demo
#> # A `ts_tbl`: 134,319 ✖ 7
#> # Id var:     `patientunitstayid`
#> # Index var:  `observationoffset` (1 hours)
#>         patientunitstayid observationoffset   map  lact  crea  bili   plt
#>                     <int> <drtn>            <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1                  141764  1 hours           112.    NA    NA    NA    NA
#> 2                  141764  2 hours           128     NA    NA    NA    NA
#> 3                  141764  3 hours           143     NA    NA    NA    NA
#> 4                  141764  4 hours           133     NA    NA    NA    NA
#> 5                  141764  5 hours           103     NA    NA    NA    NA
#> …                    
#> 134,315           3353113 37 hours           111     NA    NA    NA    NA
#> 134,316           3353113 39 hours            97     NA    NA    NA    NA
#> 134,317           3353113 40 hours           105     NA    NA    NA    NA
#> 134,318           3353113 41 hours            70     NA    NA    NA    NA
#> 134,319           3353113 44 hours            93     NA    NA    NA    NA
#> # … with 134,309 more rows

We plot the density of the features and report their median values:

The match between datasets is not perfect, but the median values should align closely. The comparison above uses the demo datasets, which are rather small; with the full datasets the match should generally be even better.
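A comparison of this kind can be sketched in a few lines of base R. The functions median_by_source() and plot_density_by_source() below are hypothetical helpers, not ricu functions, and the toy list contains made-up values used only to show the shape of the input: a named list of tables with one column per concept.

```r
# Hypothetical helper: median of each concept column, per data source.
# tbls is a named list of data frames; cols is a character vector of
# column names present in every table.
median_by_source <- function(tbls, cols) {
  per_src <- lapply(tbls, function(tbl) {
    vapply(cols, function(col) median(tbl[[col]], na.rm = TRUE), numeric(1L))
  })
  # One row per source, one column per concept
  do.call(rbind, per_src)
}

# Hypothetical helper: overlay the density of one concept across sources.
plot_density_by_source <- function(tbls, col) {
  dens <- lapply(tbls, function(tbl) density(tbl[[col]], na.rm = TRUE))
  xlim <- range(unlist(lapply(dens, `[[`, "x")))
  ylim <- c(0, max(unlist(lapply(dens, `[[`, "y"))))
  plot(NULL, xlim = xlim, ylim = ylim, xlab = col, ylab = "density",
       main = col)
  for (i in seq_along(dens)) lines(dens[[i]], col = i)
  legend("topright", legend = names(tbls), col = seq_along(dens), lty = 1)
  invisible(dens)
}

# Made-up toy values, for illustration only
toy <- list(
  src_a = data.frame(crea = c(0.8, 0.9, NA, 1.1), lact = c(1.7, NA, 1.8, 2.2)),
  src_b = data.frame(crea = c(0.7, 1.0, 1.2, NA), lact = c(NA, 1.5, 2.0, 2.4))
)

med <- median_by_source(toy, c("crea", "lact"))
print(med)
```

Since both helpers only use `[[` to extract columns, they should also accept the list dat loaded earlier, for instance median_by_source(dat, concepts) or plot_density_by_source(dat, "crea").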

Concepts outside the ricu dictionary

Not all relevant concepts are included in the ricu dictionary. When loading concepts outside the dictionary, we recommend checking whether the units match across datasets using density plots and median values, as shown above. In particular, a clear difference in the median values, or a density plot that looks “multimodal”, suggests that some unit conversion is required.
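As a rough illustration of the median check, the sketch below flags two sets of values whose medians differ by more than a chosen factor. The helper looks_mismatched() and its max_fold threshold of 2 are assumptions made for this example, not part of ricu, and the glucose values are made up. A flag of this kind is only a prompt to inspect the units, not proof of a mismatch, and it does not detect multimodality within a single dataset.

```r
# Hypothetical helper: ratio of the medians of two numeric vectors.
median_ratio <- function(x, y) {
  median(x, na.rm = TRUE) / median(y, na.rm = TRUE)
}

# Hypothetical helper: TRUE if the medians differ by more than max_fold
# in either direction. The default of 2 is an arbitrary assumption.
looks_mismatched <- function(x, y, max_fold = 2) {
  ratio <- median_ratio(x, y)
  ratio > max_fold || ratio < 1 / max_fold
}

# Made-up glucose values: one source in mg/dL, the other in mmol/L
glu_mgdl  <- c(90, 104, 118, NA, 131)
glu_mmoll <- c(5.0, 5.8, NA, 6.5, 7.3)

# Medians differ by a large factor, so the pair is flagged
before <- looks_mismatched(glu_mgdl, glu_mmoll)

# After converting mg/dL to mmol/L (glucose molar mass 180.16 g/mol),
# the medians are close and the pair is no longer flagged
after <- looks_mismatched(glu_mgdl * 10 / 180.16, glu_mmoll)

print(c(before = before, after = after))
```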