C or R

One of the most common questions when implementing a function in R, especially one that may take a long time to run, is whether the function should be written in C and interfaced using the rcpp package (Eddelbuettel and Francois 2011). In distr6, not only did we not opt to write our functions in C, but we actively chose to write our own d/p/q/r methods instead of interfacing packages that had already done so in C. We had several reasons for this:

  1. To minimise dependencies on unknown packages
  2. To increase efficiency (we didn’t have to verify the code quality of unknown packages)
  3. To make the d/p/q/r/ methods visible to the user
  4. Tested C implementations were not vastly quicker

Note: We did interface the d/p/q/r methods of some distributions from other packages. We interfaced everything directly from R stats. Additionally we interfaced the d/p/q/r functions in a couple of distributions from extraDistr (Wolodzko 2019), usually because they had derived some tricky inverse cdfs.

Speed Tests

Below are some speed tests against packages actuar (Dutang, Goulet, and Pigeon 2008) and extraDistr that we ran comparing our distr6 methods. We want to stress that we are not implying that C implementations are a bad idea or that the distributions are poorly implemented in these packages but simply that we deliberately risked compromising on speed to ensure we could maintain the distributions more easily and more effectively.

From the results we see that the R implementation in distr6 in fact faster than C implements in some cases.

The Code

#  Pareto: distr6 vs. extraDistr
t1 = Sys.time()
extraDistr::dpareto(1)
t2 = Sys.time()
extraDistr::dpareto(1:10)
t3 = Sys.time()
extraDistr::dpareto(1:100)
t4 = Sys.time()
extraDistr::dpareto(1:1000)
t5 = Sys.time()
extraDistr::rpareto(1000000)
t6 = Sys.time()

extradist_pdf_1  = round(as.numeric(t2 - t1),4)
extradist_pdf_10  = round(as.numeric(t3 - t2),4)
extradist_pdf_100  = round(as.numeric(t4 - t3),4)
extradist_pdf_1000  = round(as.numeric(t5 - t4),4)
extradist_rand_1000000  = round(as.numeric(t6 - t5),4)

p = Pareto$new()

t1 = Sys.time()
p$pdf(1)
t2 = Sys.time()
p$pdf(10)
t3 = Sys.time()
p$pdf(100)
t4 = Sys.time()
p$pdf(1000)
t5 = Sys.time()
p$rand(1000000)
t6 = Sys.time()

distr6_pdf_1  = round(as.numeric(t2 - t1),4)
distr6_pdf_10  = round(as.numeric(t3 - t2),4)
distr6_pdf_100  = round(as.numeric(t4 - t3),4)
distr6_pdf_1000  = round(as.numeric(t5 - t4),4)
distr6_rand_1000000 = round(as.numeric(t6 - t5),4)

#  Gumbel: distr6 vs. actuar
t1 = Sys.time()
actuar::dgumbel(1,1,1)
t2 = Sys.time()
actuar::dgumbel(1:10,1,1)
t3 = Sys.time()
actuar::dgumbel(1:100,1,1)
t4 = Sys.time()
actuar::dgumbel(1:1000,1,1)
t5 = Sys.time()
actuar::rgumbel(1000000,1,1)
t6 = Sys.time()

actuar_pdf_1  = round(as.numeric(t2 - t1),4)
actuar_pdf_10  = round(as.numeric(t3 - t2),4)
actuar_pdf_100  = round(as.numeric(t4 - t3),4)
actuar_pdf_1000  = round(as.numeric(t5 - t4),4)
actuar_rand_1000000  = round(as.numeric(t6 - t5),4)

g = Gumbel$new()

t1 = Sys.time()
g$pdf(1)
t2 = Sys.time()
g$pdf(10)
t3 = Sys.time()
g$pdf(100)
t4 = Sys.time()
g$pdf(1000)
t5 = Sys.time()
g$rand(1000000)
t6 = Sys.time()

distr6_pdf_1  = round(as.numeric(t2 - t1),4)
distr6_pdf_10  = round(as.numeric(t3 - t2),4)
distr6_pdf_100  = round(as.numeric(t4 - t3),4)
distr6_pdf_1000  = round(as.numeric(t5 - t4),4)
distr6_rand_1000000 = round(as.numeric(t6 - t5),4)

The Results

For distr6 vs extraDistr

#>      Package  pdf_1 pdf_10 pdf_100 pdf_1000 rand_1000000
#> 1 extraDistr 0.0140 0.0022  0.0023   0.0039       1.9729
#> 2     distr6 0.0034 0.0099  0.0032   0.0030       3.0028

For distr6 vs actuar

#>   Package  pdf_1 pdf_10 pdf_100 pdf_1000 rand_1000000
#> 1  actuar 0.0118 0.0018  0.0022   0.0042       2.8701
#> 2  distr6 0.0034 0.0099  0.0032   0.0030       3.0028

References

Dutang, Christophe, Vincent Goulet, and Mathieu Pigeon. 2008. “Actuar: An R Package for Actuarial Science.” ournal of Statistical Software. http://www.jstatsoft.org/v25/i07.

Eddelbuettel, Dirk, and Romain Francois. 2011. “Rcpp: Seamless R and C++ Integration.” Journal of Statistical Software. http://www.jstatsoft.org/v40/i08/.

Wolodzko, Tymoteusz. 2019. “extraDistr: Additional Univariate and Multivariate Distributions.” CRAN. https://CRAN.R-project.org/package=extraDistr.