Homogeneous Ensembles

Although an ensemble of models sharing a common set of hyperparameters can be defined using the learning network API, MLJ's EnsembleModel model wrapper is preferred, for convenience and best performance.

MLJEnsembles.EnsembleModel (function)
EnsembleModel(atom=nothing,
              atomic_weights=Float64[],
              bagging_fraction=0.8,
              n=100,
              rng=GLOBAL_RNG,
              acceleration=default_resource(),
              out_of_bag_measure=[])

Create a model for training an ensemble of n learners, with optional bagging, each with associated model atom. Ensembling is useful if fit!(machine(atom, data...)) does not create identical models on repeated calls (that is, atom is a stochastic model, such as a decision tree with randomized node selection criteria), or if bagging_fraction is set to a value less than 1.0, or both. The constructor fails if no atom is specified.

Only atomic models supporting targets with scitype AbstractVector{<:Finite} (univariate classifiers) or AbstractVector{<:Continuous} (univariate regressors) are supported.

If rng is an integer, then MersenneTwister(rng) is the random number generator used for bagging. Otherwise some AbstractRNG object is expected.
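The seeding rule and the role of bagging_fraction can be illustrated with a minimal sketch that uses only Julia's Random standard library. The helpers resolve_rng and bag_rows are hypothetical names introduced here, not part of MLJEnsembles. Drawing rows without replacement and rounding the bag size down are assumptions made for illustration.

```julia
using Random

# Hypothetical helper mirroring the rule above: an integer seed becomes a
# MersenneTwister, while an existing AbstractRNG is used unchanged.
resolve_rng(rng::Integer) = MersenneTwister(rng)
resolve_rng(rng::AbstractRNG) = rng

# Hypothetical helper: choose the training rows (the "bag") for one learner.
# Assumptions: rows are drawn without replacement; bag size is rounded down.
function bag_rows(rng, n_patterns::Integer, bagging_fraction::Real)
    n_train = floor(Int, bagging_fraction * n_patterns)
    return randperm(resolve_rng(rng), n_patterns)[1:n_train]
end

# With an integer seed, the same bag is produced on every call.
bag = bag_rows(123, 10, 0.8)
```

Passing an integer makes the bags reproducible across runs, whereas passing a shared AbstractRNG object advances its state on each call.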

The atomic predictions are weighted according to the vector atomic_weights (to allow for external optimization) except in the case that atom is a Deterministic classifier. Uniform atomic weights are used if atomic_weights has zero length.
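As a rough illustration of the weighting rule, the following sketch combines the predictions of several atomic regressors. The function combine is a hypothetical helper, not an MLJEnsembles function, and the normalization of the weights to sum to one is an assumption made here.

```julia
# Hypothetical helper: weighted combination of atomic regressor predictions.
# `predictions[i]` holds the predictions of atom `i` on every input pattern.
function combine(predictions::AbstractVector{<:AbstractVector{<:Real}},
                 atomic_weights::AbstractVector{<:Real}=Float64[])
    n = length(predictions)
    # A zero-length weight vector means uniform weights.
    # Assumption: non-empty weights are rescaled to sum to one.
    w = isempty(atomic_weights) ? fill(1 / n, n) : atomic_weights ./ sum(atomic_weights)
    length(w) == n || throw(DimensionMismatch("need one weight per atom"))
    return sum(w[i] .* predictions[i] for i in 1:n)
end

uniform = combine([[1.0, 2.0], [3.0, 4.0]])               # plain average
weighted = combine([[1.0, 2.0], [3.0, 4.0]], [3.0, 1.0])  # first atom counts 3x
```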

The ensemble model is Deterministic or Probabilistic, according to the corresponding supertype of atom. In the case of deterministic classifiers (target_scitype(atom) <: AbstractVector{<:Finite}), the predictions are majority votes, and for regressors (target_scitype(atom) <: AbstractVector{<:Continuous}) they are ordinary averages. Probabilistic predictions are obtained by averaging the atomic probability distribution/mass functions; in particular, for regressors, the ensemble prediction on each input pattern has the type MixtureModel{VF,VS,D} from the Distributions.jl package, where D is the type of predicted distribution for atom.
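The two classifier rules can be sketched for a single input pattern in plain Julia. Both majority_vote and average_pmf are hypothetical helpers written for illustration. The tie-breaking rule (the label that appears first wins) is an assumption, since the text above does not specify one.

```julia
# Hypothetical helper: majority vote over the labels predicted by each atom
# for one input pattern. Assumption: ties go to the label that appears first.
function majority_vote(votes::AbstractVector)
    counts = Dict{eltype(votes),Int}()
    for v in votes
        counts[v] = get(counts, v, 0) + 1
    end
    best = first(votes)
    for v in votes
        # Strict inequality keeps the earliest label among tied candidates.
        if counts[v] > counts[best]
            best = v
        end
    end
    return best
end

# Hypothetical helper: average the probability mass functions predicted by
# each atom, where every pmf lists class probabilities in the same class order.
average_pmf(pmfs::AbstractVector{<:AbstractVector{<:Real}}) = sum(pmfs) ./ length(pmfs)

label = majority_vote(["cat", "dog", "dog"])
pmf = average_pmf([[0.25, 0.75], [0.5, 0.5]])
```

The averaged mass function is again a valid distribution, because a uniform average of vectors that each sum to one also sums to one.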

The acceleration keyword argument is used to specify the compute resource (a subtype of ComputationalResources.AbstractResource) that will be used to accelerate/parallelize ensemble fitting.

If a single measure or non-empty vector of measures is specified by out_of_bag_measure, then out-of-bag estimates of performance are written to the training report (call report on the trained machine wrapping the ensemble model).
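For readers unfamiliar with out-of-bag estimation, here is a generic sketch of the idea using only the standard library. It is not MLJEnsembles' implementation. The function oob_rms, the stand-in learner fit_mean, and the choice of root-mean-square error as the measure are all assumptions made for illustration.

```julia
using Random, Statistics

# Hypothetical stand-in for an atomic regressor: "training" on a bag records
# the mean target over that bag, and that mean is the prediction everywhere.
fit_mean(y, rows) = mean(y[rows])

# Generic out-of-bag estimate: each pattern is scored using only the learners
# whose bag did not contain it.
function oob_rms(y::Vector{Float64}; n=50, bagging_fraction=0.8, rng=MersenneTwister(0))
    n_patterns = length(y)
    n_train = floor(Int, bagging_fraction * n_patterns)
    sums = zeros(n_patterns)         # running sum of out-of-bag predictions
    counts = zeros(Int, n_patterns)  # number of learners that left the pattern out
    for _ in 1:n
        bag = randperm(rng, n_patterns)[1:n_train]
        ŷ = fit_mean(y, bag)
        for i in setdiff(1:n_patterns, bag)
            sums[i] += ŷ
            counts[i] += 1
        end
    end
    seen = findall(c -> c > 0, counts)   # patterns left out at least once
    oob_pred = sums[seen] ./ counts[seen]
    return sqrt(mean(abs2, oob_pred .- y[seen]))
end

estimate = oob_rms(fill(2.0, 20))   # a constant target is predicted exactly
```

Because every pattern is evaluated only by learners that never saw it during training, the estimate needs no separate holdout set. It is only available when bagging_fraction is less than 1.0.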

Important: If sample weights w (as opposed to atomic weights) are specified when constructing a machine for the ensemble model, as in mach = machine(ensemble_model, X, y, w), then w is used by any measures specified in out_of_bag_measure that support sample weights.