# Homogeneous Ensembles

Although an ensemble of models sharing a common set of hyperparameters can be defined using the learning network API, MLJ's `EnsembleModel` model wrapper is preferred, for convenience and best performance.

`MLJEnsembles.EnsembleModel` (function)

```julia
EnsembleModel(atom=nothing,
              atomic_weights=Float64[],
              bagging_fraction=0.8,
              n=100,
              rng=GLOBAL_RNG,
              acceleration=default_resource(),
              out_of_bag_measure=[])
```

Create a model for training an ensemble of `n` learners, with optional bagging, each with associated model `atom`. Ensembling is useful if `fit!(machine(atom, data...))` does not create identical models on repeated calls (that is, `atom` is a stochastic model, such as a decision tree with randomized node selection criteria), or if `bagging_fraction` is set to a value less than 1.0, or both. The constructor fails if no `atom` is specified.
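
To make the role of `bagging_fraction` concrete, here is a minimal sketch of how the training rows for each atomic learner might be drawn. The helper `bag_rows` is hypothetical and not part of MLJ, and sampling without replacement is an assumption; the package's internals may differ.

```julia
using Random

# Hypothetical helper (not an MLJ function): pick the rows used to train
# one atomic learner. Sampling WITHOUT replacement is an assumption here.
function bag_rows(rng::AbstractRNG, n_rows::Integer, bagging_fraction::Real)
    n_train = round(Int, bagging_fraction * n_rows)
    return randperm(rng, n_rows)[1:n_train]
end

rng = MersenneTwister(1)

# Three learners, each trained on its own random 80% of 10 rows.
bags = [bag_rows(rng, 10, 0.8) for _ in 1:3]

# Rows a given learner never saw during training.
left_out = setdiff(1:10, bags[1])
```

With `bagging_fraction=1.0` every learner sees all rows, so the ensemble only adds value if the atomic model is itself stochastic.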

Only atomic models supporting targets with scitype `AbstractVector{<:Finite}` (univariate classifiers) or `AbstractVector{<:Continuous}` (univariate regressors) are supported.

If `rng` is an integer, then `MersenneTwister(rng)` is the random number generator used for bagging. Otherwise some `AbstractRNG` object is expected.
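
The rule for `rng` can be sketched with two methods. The name `resolve_rng` is hypothetical and used only for illustration; it is not an MLJ function.

```julia
using Random

# Hypothetical helper mirroring the rule described above:
# an integer seeds a fresh MersenneTwister, an AbstractRNG is used as given.
resolve_rng(rng::Integer) = MersenneTwister(rng)
resolve_rng(rng::AbstractRNG) = rng

# The same integer seed reproduces the same random stream.
a = rand(resolve_rng(42), 3)
b = rand(resolve_rng(42), 3)

# An existing generator is passed through unchanged.
g = MersenneTwister(7)
same = resolve_rng(g) === g
```

Passing an integer is therefore the simplest way to make bagging reproducible across runs.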

The atomic predictions are weighted according to the vector `atomic_weights` (to allow for external optimization) except in the case that `atom` is a `Deterministic` classifier. Uniform atomic weights are used if `atomic_weights` has zero length.

The ensemble model is `Deterministic` or `Probabilistic`, according to the corresponding supertype of `atom`. In the case of deterministic classifiers (`target_scitype(atom) <: AbstractVector{<:Finite}`), the predictions are majority votes, and for deterministic regressors (`target_scitype(atom) <: AbstractVector{<:Continuous}`) they are ordinary averages. Probabilistic predictions are obtained by averaging the atomic probability distribution/mass functions; in particular, for regressors, the ensemble prediction on each input pattern has the type `MixtureModel{VF,VS,D}` from the Distributions.jl package, where `D` is the type of predicted distribution for `atom`.
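
The aggregation rules above can be illustrated for a single input pattern. This is a sketch under stated assumptions: the helpers `majority_vote`, `weighted_average` and `average_probabilities` are hypothetical, the tie-breaking rule and the normalization of weights are choices made for the example, and none of this is taken from the package's implementation.

```julia
using Statistics

# Hypothetical helpers, not MLJ internals.

# Deterministic classifier: majority vote over the atomic label predictions.
function majority_vote(labels::AbstractVector)
    counts = Dict{eltype(labels),Int}()
    for label in labels
        counts[label] = get(counts, label, 0) + 1
    end
    best = maximum(values(counts))
    # Ties go to the label that appears first (an assumption for this sketch).
    return first(l for l in labels if counts[l] == best)
end

# Deterministic regressor: average of atomic predictions, weighted if
# weights are supplied and uniform if the weight vector is empty.
function weighted_average(preds::AbstractVector{<:Real},
                          weights::AbstractVector{<:Real}=Float64[])
    isempty(weights) && return mean(preds)
    return sum(weights .* preds) / sum(weights)
end

# Probabilistic classifier: average the per-class probability vectors.
average_probabilities(probs::AbstractVector{<:AbstractVector{<:Real}}) = mean(probs)

vote    = majority_vote(["setosa", "virginica", "setosa"])
uniform = weighted_average([1.0, 2.0, 3.0])
tilted  = weighted_average([1.0, 3.0], [3.0, 1.0])
probs   = average_probabilities([[0.2, 0.8], [0.6, 0.4]])
```

For probabilistic regressors the same averaging idea applies to whole distributions rather than probability vectors, which is why the ensemble prediction is a mixture.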

The `acceleration` keyword argument is used to specify the compute resource (a subtype of `ComputationalResources.AbstractResource`) that will be used to accelerate/parallelize ensemble fitting.

If a single measure or non-empty vector of measures is specified by `out_of_bag_measure`, then out-of-bag estimates of performance are written to the training report (call `report` on the trained machine wrapping the ensemble model).

*Important:* If sample weights `w` (as opposed to atomic weights) are specified when constructing a machine for the ensemble model, as in `mach = machine(ensemble_model, X, y, w)`, then `w` is used by any measures specified in `out_of_bag_measure` that support sample weights.
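
A short usage sketch follows, written against the signature documented above. It assumes that MLJ and the DecisionTree.jl interface package are installed and that the installed release accepts the `atom` keyword; releases that name this keyword differently would need the constructor call adjusted. The choice of atomic model, dataset and `log_loss` measure is illustrative only.

```julia
using MLJ

# Assumption: the DecisionTree.jl model interface is installed.
Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
X, y = @load_iris

# Keyword names follow the signature documented above.
ensemble = EnsembleModel(atom=Tree(),
                         n=50,
                         bagging_fraction=0.7,
                         rng=123,
                         out_of_bag_measure=[log_loss])

mach = machine(ensemble, X, y)
fit!(mach, verbosity=0)

yhat   = predict(mach, X)       # probabilistic, because the atom is Probabilistic
labels = predict_mode(mach, X)  # most likely class for each row

# Out-of-bag estimates are part of the training report.
rep = report(mach)
```

Setting `rng` to an integer keeps the bagging reproducible, and `bagging_fraction` below 1.0 is what leaves rows out of each bag for the out-of-bag estimate.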