vignettes/webs/create_sdistribution.rmd
create_sdistribution.rmd
This tutorial assumes that you have a good knowledge of R6 and so we will not be going through the basics of inheritance and private/public methods/variables.
All implemented probability distributions (excluding Kernels) in distr6 inherit from the SDistribution
class. This means they share a common interface. The only differences between these distributions is that some will have methods missing as no analytic results are available. See the uml diagram for an overview of how this all fits in together. A core design principle in distr6 is that only analytical methods are defined in the SDistribution child classes, all numerical results are available through decorators. See the decorators tutorial for more information on decorators and the analytical and numerical article for further discussions on analytical and numerical methods. The summary is that when creating your own SDistribution class, please do not put any numerical methods in the core interface, if a closed form expression cannot be found, omit the method entirely and it can be imputed with a decorator. If your desired method is not available in one of our decorators but you think it is useful, see the creating a decorator extension guidelines.
Every class inheriting from SDistribution must have the following public variables:
For the Normal distribution, the above all looks like
Normal <- R6::R6Class("Normal", inherit = SDistribution, lock_objects = F) Normal$set("public","name","Normal") Normal$set("public","short_name","Norm") Normal$set("public","description","Normal Probability Distribution.") Normal$set("public","package","stats")
Note:
lock_objects=F
is not left out as it ensures decorators work correctly.set
method to add private/public variables/methodsFor the full list of methods to (optimally) include see the ‘Statistical Methods’ section in the help pages of SDistribution: ?SDistribution
. This does not include pdf/cdf/quantile/rand, these are defined in the constructor and not in the class. Once again, if there is no closed form analytical expression possible, omit the method completely.
The following methods are included by default and can therefore by omitted from the class definition:
Additionally the pgf
method returns NaN if omitted but this can be overloaded by including the method in the class definition. Below is an example of adding four methods to the Normal distribution
Normal$set("public","mean",function(){ return(self$getParameterValue("mean")) }) Normal$set("public","variance",function(){ return(self$getParameterValue("var")) }) Normal$set("public","skewness",function(){ return(0) }) Normal$set("public", "mgf", function(t){ return(exp((self$getParameterValue("mean") * t) + (self$getParameterValue("var") * t^2 * 0.5))) })
Note:
self
keyword and the getParameterValue
method, we looked at this in the custom distribution tutorial
?SDistribution
help page, this ensures that the automated S3 dispatch methods run correctlyThe constructor for all SDistribution objects looks the same, below is the constructor for the Normal distribution, which we will talk through as an example.
Normal$set("public","initialize",function(mean = 0, var = 1, sd = NULL, prec = NULL, decorators = NULL, verbose = FALSE){ private$.parameters <- getParameterSet(self, mean, var, sd, prec, verbose) self$setParameterValue(mean = mean, var = var, sd = sd, prec = prec) pdf <- function(x1) dnorm(x1, self$getParameterValue("mean"), self$getParameterValue("sd")) cdf <- function(x1) pnorm(x1, self$getParameterValue("mean"), self$getParameterValue("sd")) quantile <- function(p) qnorm(p, self$getParameterValue("mean"), self$getParameterValue("sd")) rand <- function(n) rnorm(n, self$getParameterValue("mean"), self$getParameterValue("sd")) super$initialize(decorators = decorators, pdf = pdf, cdf = cdf, quantile = quantile, rand = rand, support = Reals$new(zero = T), symmetric = TRUE, type = Reals$new(), valueSupport = "continuous", variateForm = "univariate") invisible(self) })
Note the following:
getParameterSet
and setParameterValue
methods, we will return to these later.In a separate script called getParameterSet.R we have the generic and dispatch methods for every SDistribution. This function validates the initial choice of parameterisation, tells the user the parameterisation (if verbose == TRUE
), constructs and returns the ParameterSet for the distribution. Below is the getParameterSet
method for the Normal distribution
getParameterSet.Normal <- function(x, mean, var, sd = NULL, prec = NULL, verbose = FALSE){ var.bool = sd.bool = prec.bool = FALSE if(!is.null(prec)){ if(verbose) message("Parameterised with mean and prec.") prec.bool = TRUE } else if(!is.null(sd)){ if(verbose) message("Parameterised with mean and sd.") sd.bool = TRUE } else{ if(verbose) message("Parameterised with mean and var.") var.bool = TRUE } ps <- ParameterSet$new(id = list("mean","var","sd","prec"), value = list(0, 1, 1, 1), support = list(Reals$new(), PosReals$new(), PosReals$new(), PosReals$new()), settable = list(TRUE, var.bool, sd.bool, prec.bool), updateFunc = list(NA, NA, function(self) self$getParameterValue('var')^0.5, function(self) self$getParameterValue('var')^-1), description = list("Mean - Location Parameter", "Variance - Squared Scale Parameter", "Standard Deviation - Scale Parameter", "Precision - Inverse Squared Scale Parameter")) return(ps) }
Note:
getParameterSet.Normal
TRUE
whilst all others are FALSE
.updateFunc
s reference, this does not need to be the parameterisation chosen by the user. This will also correspond to the .getRefParams
method (see below).To ensure that there is no clash of parameterisations or any conflicts between functions that are updated by others, we have an internal method that is called by setParameterValue
to ensure the same parameter is always updated. For example in the Normal distribution, it doesn’t matter if the user passes sd, var or prec to setParameterValue
internally it is always the variance that is updated, and this is because of the .getRefParams
private method. All the other parameterisations are updated automatically using the updateFunc
s that you write in the ParameterSet, hence why it is very important that these reference the default parameterisation. This is less abstract by example, again the Normal distribution
Normal$set("private",".getRefParams", function(paramlst){ lst = list() if(!is.null(paramlst$mean)) lst = c(lst, list(mean = paramlst$mean)) if(!is.null(paramlst$var)) lst = c(lst, list(var = paramlst$var)) if(!is.null(paramlst$sd)) lst = c(lst, list(var = paramlst$sd^2)) if(!is.null(paramlst$prec)) lst = c(lst, list(var = paramlst$prec^-1)) return(lst) })
Note
Internally this looks a bit like:
setParameterValue
is called with lst = list(sd = 2)
setParameterValue
calls .getRefParams(paramlst = list(sd=2))
.getRefParams
returns list(var = 4)
(sd^2)list(var = 4)
is then used by setParameterValue
to update the varianceupdate
is called in the ParameterSet which uses all the updateFunc
s to update sd and precThis may seem slightly counter-intuitive as we have first given sd, transformed this to variance then later re-calculated sd. However this is incredibly important as it maintains no clashes when multiple functions are updated and allows complex updates (such as in composite distributions) to run smoothly. Even when only one parameterisation is possible, this is still a required private method as it is called directly in setParmeterValue
.
That’s everything that is required to create your own SDistribution class. In summary the different components include
?SDistribution
getParameterSet
dispatch method, written in the getParameterSet.R
script.getRefParams
for ensuring efficient updating of parameters