Models

# Local sampler

## Generic model

A model must be an immutable type with an associated gradloglik function. This function should be implemented as efficiently as possible, since it is called a large number of times in any simulation.
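As a rough illustration of this contract, here is a minimal Python sketch (not the package's own code). A frozen dataclass plays the role of the immutable type, and `gradloglik` is a plain function taking the model and a point. The `StandardNormal` model and its `dim` field are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class StandardNormal:
    """Hypothetical model: isotropic standard normal in `dim` dimensions."""
    dim: int


def gradloglik(model: StandardNormal, x: list[float]) -> list[float]:
    """Gradient of log N(x; 0, I) with respect to x, which is simply -x."""
    if len(x) != model.dim:
        raise ValueError("point has the wrong dimension")
    return [-xi for xi in x]


model = StandardNormal(dim=3)
grad = gradloglik(model, [1.0, -2.0, 0.5])
```

Because the dataclass is frozen, an attempt such as `model.dim = 4` raises `FrozenInstanceError`, which mirrors the immutability requirement.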

## Multivariate Gaussian

### Hierarchy of types

Multiple parametrisations are possible: some are more efficient, while others are arguably more intuitive.

MvGaussian (abstract)
| — MvGaussianStandard
| — MvDiagonalGaussian
| — MvGaussianCanon
| — MvGaussianNatural

In the sequel we write $\mu$ for the mean, $\Sigma$ for the covariance matrix and $\Omega$ for the precision matrix. The different ways to parametrise the distribution are as follows:

• MvGaussianStandard, direct: $(\mu, \Sigma)$, indirect: $(\Omega\mu, \Omega)$

• MvDiagonalGaussian, direct: $(\mu, (\sigma_i))$, indirect: $(\sigma_i^2)$

• MvGaussianCanon, direct: $(\mu, \Omega)$, indirect: $(\Omega\mu)$

• MvGaussianNatural, direct: $(\Omega\mu,-\Omega)$

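The preferred parametrisation is the "canonical" one, which is the most efficient: it takes $\Omega$ directly, so no covariance matrix needs to be inverted in the constructor.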

Note: "direct" means that these are the parameters passed to the constructor while "indirect" means that these values are computed when the constructor is called.
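The direct/indirect distinction can be sketched in Python as follows. This is only an illustration under assumed names: `DiagonalGaussian` and its fields `mu`, `sigma` and `sigma2` are hypothetical stand-ins for the diagonal case, not the package's actual definitions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DiagonalGaussian:
    """Hypothetical stand-in for a diagonal Gaussian model."""
    mu: tuple[float, ...]     # direct: passed to the constructor
    sigma: tuple[float, ...]  # direct: passed to the constructor
    sigma2: tuple[float, ...] = field(init=False)  # indirect: derived below

    def __post_init__(self):
        if len(self.mu) != len(self.sigma):
            raise ValueError("mu and sigma must have the same length")
        # A frozen dataclass blocks ordinary assignment, so the derived
        # field is set once here through object.__setattr__.
        object.__setattr__(self, "sigma2", tuple(s * s for s in self.sigma))


g = DiagonalGaussian(mu=(0.0, 1.0), sigma=(2.0, 0.5))
```

The variances are computed once at construction, so later calls can read `g.sigma2` without recomputing it.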

### Auxiliary functions

Internally, the types mentioned above are shortened to MvGS, MvDG, etc. A number of helper functions are then defined to simplify the computation of the log-likelihood and of its gradient:

• mvg_mu to recover $\mu$

• mvg_precmu to recover $\Omega\mu$

• mvg_precmult taking a point and multiplying it by $\Omega$

gradloglik is then trivial to compute.
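To see why, note that the gradient of $\log \mathcal N(x; \mu, \Omega^{-1})$ with respect to $x$ is $\Omega\mu - \Omega x$. The Python sketch below mirrors the helper names above for the canonical parametrisation. The `GaussianCanon` class, its field names and the `matvec` helper are assumptions made for the example, and the package's own implementation may differ.

```python
from dataclasses import dataclass, field

Vector = tuple[float, ...]
Matrix = tuple[Vector, ...]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product (hypothetical helper)."""
    return tuple(sum(aij * xj for aij, xj in zip(row, x)) for row in a)


@dataclass(frozen=True)
class GaussianCanon:
    """Hypothetical canonical Gaussian: direct (mu, prec), indirect precmu."""
    mu: Vector
    prec: Matrix
    precmu: Vector = field(init=False)

    def __post_init__(self):
        # Precompute Omega * mu once, at construction time.
        object.__setattr__(self, "precmu", matvec(self.prec, self.mu))


def mvg_mu(g: GaussianCanon) -> Vector:
    return g.mu


def mvg_precmu(g: GaussianCanon) -> Vector:
    return g.precmu


def mvg_precmult(g: GaussianCanon, x: Vector) -> Vector:
    return matvec(g.prec, x)


def gradloglik(g: GaussianCanon, x: Vector) -> Vector:
    """Gradient of the Gaussian log-density: Omega*mu - Omega*x."""
    return tuple(a - b for a, b in zip(mvg_precmu(g), mvg_precmult(g, x)))


g = GaussianCanon(mu=(1.0, 2.0), prec=((2.0, 0.0), (0.0, 4.0)))
grad_at_mean = gradloglik(g, (1.0, 2.0))
grad_at_origin = gradloglik(g, (0.0, 0.0))
```

The gradient vanishes at the mean and equals $\Omega\mu$ at the origin, which gives a quick sanity check for any parametrisation.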

## Logistic Regression

The logistic regression model stores a feature matrix $X$, a response vector $y$, the associated Lipschitz constant and dimensionality parameters.

### Auxiliary functions

A number of auxiliary functions are defined to prevent numerical instabilities and ensure that the computation of the log-likelihood and gradient of the log-likelihood can be expressed simply.

The gradloglik_cv function computes a control-variate gradient expanded around a given reference point (see this paper for more details).

Note: the response is in $\{-1,1\}$.
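With responses in $\{-1,1\}$, the log-likelihood at a weight vector $w$ is $-\sum_i \log(1 + \exp(-y_i \langle x_i, w\rangle))$. The Python sketch below shows one common way to evaluate it and its gradient without overflow. The helper names `log1pexp` and `logistic` are hypothetical, and this is not necessarily how the package's auxiliary functions are written.

```python
import math


def log1pexp(t: float) -> float:
    """Stable log(1 + exp(t)): never exponentiates a large positive number."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def logistic(t: float) -> float:
    """Stable 1 / (1 + exp(-t)), branching on the sign of t."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(a, b) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def loglik(X, y, w) -> float:
    """Log-likelihood: -sum_i log(1 + exp(-y_i <x_i, w>))."""
    return -sum(log1pexp(-yi * dot(xi, w)) for xi, yi in zip(X, y))


def gradloglik(X, y, w) -> list[float]:
    """Gradient: sum_i y_i * logistic(-y_i <x_i, w>) * x_i."""
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        weight = yi * logistic(-yi * dot(xi, w))
        for j, xij in enumerate(xi):
            grad[j] += weight * xij
    return grad


X = [[1.0, 2.0], [-1.0, 0.5]]
y = [1, -1]
w = [0.0, 0.0]
ll = loglik(X, y, w)
grad = gradloglik(X, y, w)
```

A naive `math.log(1 + math.exp(t))` raises `OverflowError` for large `t`, whereas `log1pexp(1000.0)` returns `1000.0`.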

## Probabilistic Matrix Factorisation

This model places a normal distribution on every entry $r_{ij}$ of a matrix:

\begin{equation} \mathcal N(r_{ij}; \langle u,v\rangle , \sigma^2) \end{equation}

The resulting intensity can be shown to be a truncated cubic, for which exact sampling is also possible.

The pmf_case* functions correspond to the various possible cases, depending on where the roots of the cubic lie.