In the previous tutorial we used variables and methods to find properties and traits of the Normal distribution. Before that we looked at statistical methods and construction of the Normal distribution. In this tutorial we look at multivariate distributions, which should feel similar to univariate distributions.

Constructing a Multivariate Distribution

Constructing a multivariate distribution is identical to constructing a univariate one, except that at least one of the parameters is likely to require a vector input. In keeping with our running Normal example, we will now use the Multivariate Normal distribution.

MN <- MultivariateNormal$new(mean = c(0,0), cov = c(1,0,0,1))
MultivariateNormal$new() # This is in fact the default
#> MultiNorm(mean = c(0, 0), cov = c(1, 0, 0, 1))

Notice how this is almost identical to constructing a univariate Normal distribution. We even allow multiple parameterisations:

MultivariateNormal$new(mean = c(0,0), prec = c(1,0,0,1))
#> MultiNorm(mean = c(0, 0), prec = c(1, 0, 0, 1))
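The two parameterisations describe the same distribution because the precision matrix is the inverse of the covariance matrix. The following base R sketch illustrates that relationship only; it assumes the flattened vector is read as a K x K matrix (the fill order does not matter for a symmetric matrix), and the covariance values are made up for illustration rather than taken from distr6.

```r
# Hypothetical 2 x 2 covariance, written as a flattened vector
cov_vec <- c(2, 1, 1, 2)
cov_mat <- matrix(cov_vec, nrow = 2)

# The precision matrix is the matrix inverse of the covariance
prec_mat <- solve(cov_mat)
prec_vec <- as.vector(prec_mat)  # flattened form, as used for 'prec' above

# Inverting the precision recovers the covariance
cov_back <- solve(prec_mat)
print(prec_vec)
print(cov_back)
```

With the default identity covariance c(1,0,0,1) the two vectors coincide, which is why the two constructor calls above print the same values.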

Getting and Setting Parameters

Every multivariate distribution includes by default an extra parameter that cannot be updated. This is the K parameter, which refers to the ‘number of components or categories’. It ensures that the correct number of inputs is given to the distribution for the d/p/q/r functions and when updating other parameters. It also means that an updated parameter must keep the same dimension: too few values cause an error and surplus values are dropped, as the examples below show. Otherwise getting and setting parameters is identical to univariate distributions.

MN <- MultivariateNormal$new()
MN$getParameterValue("K")
#> [1] 2
MN$setParameterValue(mean = c(0)) # Causes an error as 1 not 2 means are given
MN$setParameterValue(mean = c(0,1))
MN$setParameterValue(prec = c(2,0,0,2,1,2)) # Truncated to c(2,0,0,2)
MN$parameters()
#>      id           value support                           description
#> 1: mean             0,1     ℝ^2 Vector of means - Location Parameter.
#> 2:  cov 0.5,0.0,0.0,0.5     ℝ^4  Covariance matrix - Scale Parameter.
#> 3: prec         2,0,0,2     ℝ^4   Precision matrix - Scale Parameter.
#> 4:    K               2      ℕ+                  Number of components
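The values in the table can be reproduced by hand. This base R sketch mimics what the output above suggests happens, namely keeping the first K^2 values of the supplied precision and inverting to obtain the covariance. It is an assumption about the behaviour, not distr6's actual implementation.

```r
K <- 2
prec_input <- c(2, 0, 0, 2, 1, 2)

# Keep only the first K^2 entries (assumed truncation rule)
prec_vec <- prec_input[seq_len(K^2)]

# Covariance is the inverse of the precision matrix
cov_vec <- as.vector(solve(matrix(prec_vec, nrow = K)))

print(prec_vec)  # 2 0 0 2
print(cov_vec)   # 0.5 0.0 0.0 0.5
```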

d/p/q/r

The biggest difference between univariate and multivariate distributions is in how arguments are passed to the d/p/q/r methods. This differs slightly from R stats. For example, to evaluate the pdf of the Multinomial distribution at the point (1,2) in R stats we would run

dmultinom(c(1,2), size = 3, prob = c(0.2,0.8))
#> [1] 0.384

Whereas in distr6, each coordinate of the point is its own argument

Multinomial$new(size = 3, probs = c(0.2,0.8))$pdf(1,2)
#> [1] 0.384
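As a sanity check, the value 0.384 follows directly from the multinomial probability mass function, n!/(x1! x2!) * p1^x1 * p2^x2. The base R sketch below computes it by hand and compares it with dmultinom; the variable names are illustrative only.

```r
x    <- c(1, 2)      # observed counts
prob <- c(0.2, 0.8)  # category probabilities

# Multinomial pmf written out: 3!/(1! 2!) * 0.2^1 * 0.8^2
manual  <- factorial(sum(x)) / prod(factorial(x)) * prod(prob^x)
builtin <- dmultinom(x, size = 3, prob = prob)

print(manual)   # 0.384
print(builtin)  # 0.384
```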

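There is a very important reason for this: vectorisation. In R stats there is no way to evaluate the density of a multivariate distribution at several points in a single call, whereas in distr6 each argument can be a vector. Below, the pdf is evaluated at the two points (1,2) and (2,3) in one call, and rand(5) draws five samples: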

MN$pdf(c(1,2), c(2,3))
#> [1] 0.0430785586 0.0001067811
MN$rand(5)
#>           V1        V2
#> 1: 0.9694140 0.6006981
#> 2: 0.2567706 1.4475014
#> 3: 0.2858609 0.9249586
#> 4: 1.0688075 0.9330660
#> 5: 1.4272411 0.9556544
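The two pdf values above can be checked against the Multivariate Normal density formula, using the parameters set earlier (mean c(0,1) and covariance c(0.5,0,0,0.5)). The helper dmvnorm_manual below is a hypothetical name for a minimal base R sketch; it is not part of distr6 or R stats.

```r
mu    <- c(0, 1)
sigma <- matrix(c(0.5, 0, 0, 0.5), nrow = 2)

# One point per row: (1,2) and (2,3)
pts <- rbind(c(1, 2), c(2, 3))

# Hypothetical helper: Multivariate Normal density, vectorised over rows
dmvnorm_manual <- function(x, mean, cov) {
  k <- length(mean)
  centred <- sweep(x, 2, mean)  # subtract the mean from every row
  # Quadratic form (x - mu)' cov^{-1} (x - mu) for each row
  quad <- rowSums((centred %*% solve(cov)) * centred)
  exp(-0.5 * quad) / sqrt((2 * pi)^k * det(cov))
}

dens <- dmvnorm_manual(pts, mu, sigma)
print(dens)
```

The result agrees with the distr6 output above to the printed precision.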

Note: cdf() and quantile() are often omitted from multivariate distributions in distr6, as no closed-form analytic expression exists.

Summary

In this tutorial we looked at multivariate distributions and discussed the difference between distr6 and R stats in using the d/p/q/r functions. The next tutorial concludes the ‘Basic’ set of tutorials with a look at listing in distr6 to help you navigate the package more easily.