Model Search

MLJ has a model registry, which allows the user to search models and their properties without loading all the packages containing model code. In turn, this allows one to efficiently find all models solving a given machine learning task. The task itself is specified with the help of the matching method, and the search is executed with the models method, as detailed below.

For commonly encountered problems with model search, see also Preparing Data.

A table of all models is also given at List of Supported Models.

Model metadata

Terminology. In this section the word "model" refers to a metadata entry in the model registry, as opposed to an actual model struct that such an entry represents. One can obtain such an entry with the info command:

julia> info("PCA")
(name = "PCA",
 package_name = "MultivariateStats",
 is_supervised = false,
 abstract_type = Unsupervised,
 deep_properties = (),
 docstring = "```\nPCA\n```\n\nA model type for constructing a pca, ...",
 fit_data_scitype = Tuple{Table{<:AbstractVector{<:Continuous}}},
 human_name = "pca",
 hyperparameter_ranges = (nothing, nothing, nothing, nothing),
 hyperparameter_types =
     ("Int64", "Symbol", "Float64", "Union{Nothing, Real, Vector{Float64}}"),
 hyperparameters = (:maxoutdim, :method, :variance_ratio, :mean),
 implemented_methods =
     [:clean!, :fit, :fitted_params, :inverse_transform, :transform],
 inverse_transform_scitype = Table{<:AbstractVector{<:Continuous}},
 is_pure_julia = true,
 is_wrapper = false,
 iteration_parameter = nothing,
 load_path = "MLJMultivariateStatsInterface.PCA",
 package_license = "MIT",
 package_url = "https://github.com/JuliaStats/MultivariateStats.jl",
 package_uuid = "6f286f6a-111f-5878-ab1e-185364afe411",
 predict_scitype = Unknown,
 prediction_type = :unknown,
 reporting_operations = (),
 reports_feature_importances = false,
 supports_class_weights = false,
 supports_online = false,
 supports_training_losses = false,
 supports_weights = false,
 transform_scitype = Table{<:AbstractVector{<:Continuous}},
 input_scitype = Table{<:AbstractVector{<:Continuous}},
 target_scitype = Unknown,
 output_scitype = Table{<:AbstractVector{<:Continuous}})

So a "model" in the present context is just a named tuple containing metadata, and not an actual model type or instance. If two models with the same name occur in different packages, the package name must be specified, as in info("LinearRegressor", pkg="GLM").
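Because an entry is just a named tuple, individual properties can be read with ordinary field access. The following is a minimal sketch, assuming MLJ is installed and the registry still matches the listing above. The variable name meta is illustrative only.

```julia
using MLJ

# Look up the registry entry; `pkg` guards against name clashes between packages.
meta = info("PCA", pkg="MultivariateStats")

meta.name              # "PCA" in the listing above
meta.is_supervised     # false in the listing above
meta.hyperparameters   # names of the hyperparameters, as a tuple of symbols

# Check whether the model implements a given method, without loading its code.
:inverse_transform in meta.implemented_methods
```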

Model document strings can be retrieved, without importing the defining code, using the doc function:

doc("DecisionTreeClassifier", pkg="DecisionTree")

General model queries

We list all models (named tuples) using models(), and list the models for which code is already loaded with localmodels():

julia> localmodels()
59-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :deep_properties, :docstring, :fit_data_scitype, :human_name, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :reporting_operations, :reports_feature_importances, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :transform_scitype, :input_scitype, :target_scitype, :output_scitype)}}:
 (name = AdaBoostStumpClassifier, package_name = DecisionTree, ... )
 (name = BayesianLDA, package_name = MultivariateStats, ... )
 (name = BayesianSubspaceLDA, package_name = MultivariateStats, ... )
 (name = ConstantClassifier, package_name = MLJModels, ... )
 (name = ConstantRegressor, package_name = MLJModels, ... )
 (name = ContinuousEncoder, package_name = MLJModels, ... )
 (name = DBSCAN, package_name = Clustering, ... )
 (name = DecisionTreeClassifier, package_name = DecisionTree, ... )
 (name = DecisionTreeRegressor, package_name = DecisionTree, ... )
 (name = DeterministicConstantClassifier, package_name = MLJModels, ... )
 ⋮
 (name = RidgeRegressor, package_name = MultivariateStats, ... )
 (name = RobustRegressor, package_name = MLJLinearModels, ... )
 (name = Standardizer, package_name = MLJModels, ... )
 (name = SubspaceLDA, package_name = MultivariateStats, ... )
 (name = UnivariateBoxCoxTransformer, package_name = MLJModels, ... )
 (name = UnivariateDiscretizer, package_name = MLJModels, ... )
 (name = UnivariateFillImputer, package_name = MLJModels, ... )
 (name = UnivariateStandardizer, package_name = MLJModels, ... )
 (name = UnivariateTimeTypeToContinuous, package_name = MLJModels, ... )

julia> localmodels()[2]
(name = "BayesianLDA",
 package_name = "MultivariateStats",
 is_supervised = true,
 abstract_type = Probabilistic,
 deep_properties = (),
 docstring = "```\nBayesianLDA\n```\n\nA model type for constructing...",
 fit_data_scitype =
     Tuple{Table{<:AbstractVector{<:Continuous}}, AbstractVector{<:Finite}},
 human_name = "Bayesian LDA model",
 hyperparameter_ranges =
     (nothing, nothing, nothing, nothing, nothing, nothing),
 hyperparameter_types =
     ("Symbol",
      "StatsBase.CovarianceEstimator",
      "StatsBase.CovarianceEstimator",
      "Int64",
      "Float64",
      "Union{Nothing, Dict{<:Any, <:Real}, CategoricalDistributions.UnivariateFinite{<:Any, <:Any, <:Any, <:Real}}"),
 hyperparameters = (:method, :cov_w, :cov_b, :outdim, :regcoef, :priors),
 implemented_methods = [:clean!, :fit, :fitted_params, :predict, :transform],
 inverse_transform_scitype = Unknown,
 is_pure_julia = true,
 is_wrapper = false,
 iteration_parameter = nothing,
 load_path = "MLJMultivariateStatsInterface.BayesianLDA",
 package_license = "MIT",
 package_url = "https://github.com/JuliaStats/MultivariateStats.jl",
 package_uuid = "6f286f6a-111f-5878-ab1e-185364afe411",
 predict_scitype =
     AbstractVector{ScientificTypesBase.Density{_s25} where _s25<:Finite},
 prediction_type = :probabilistic,
 reporting_operations = (),
 reports_feature_importances = false,
 supports_class_weights = false,
 supports_online = false,
 supports_training_losses = false,
 supports_weights = false,
 transform_scitype = Unknown,
 input_scitype = Table{<:AbstractVector{<:Continuous}},
 target_scitype = AbstractVector{<:Finite},
 output_scitype = Table{<:AbstractVector{<:Continuous}})
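Since both functions return ordinary vectors of named tuples, the results can be summarized with standard Julia tools. The sketch below assumes MLJ is installed. The variable names are illustrative, and the counts depend on the installed registry version.

```julia
using MLJ

all_entries = models()          # every entry in the registry
loaded_entries = localmodels()  # only entries whose code is already loaded

# Reduce the loaded entries to (name, package) pairs for a compact overview.
loaded_ids = [(m.name, m.package_name) for m in loaded_entries]

# Count how many registry entries each package contributes.
per_package = Dict{String,Int}()
for m in all_entries
    per_package[m.package_name] = get(per_package, m.package_name, 0) + 1
end
```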

One can search for models containing specified strings or regular expressions in their docstring attributes, as in

julia> models("forest")
12-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :deep_properties, :docstring, :fit_data_scitype, :human_name, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :reporting_operations, :reports_feature_importances, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :transform_scitype, :input_scitype, :target_scitype, :output_scitype)}}:
 (name = GeneralImputer, package_name = BetaML, ... )
 (name = IForestDetector, package_name = OutlierDetectionPython, ... )
 (name = RandomForestClassifier, package_name = DecisionTree, ... )
 (name = RandomForestClassifier, package_name = MLJScikitLearnInterface, ... )
 (name = RandomForestImputer, package_name = BetaML, ... )
 (name = RandomForestRegressor, package_name = BetaML, ... )
 (name = RandomForestRegressor, package_name = DecisionTree, ... )
 (name = RandomForestRegressor, package_name = MLJScikitLearnInterface, ... )
 (name = StableForestClassifier, package_name = SIRUS, ... )
 (name = StableForestRegressor, package_name = SIRUS, ... )
 (name = StableRulesClassifier, package_name = SIRUS, ... )
 (name = StableRulesRegressor, package_name = SIRUS, ... )

or by specifying a filter (Bool-valued function):

julia> filter(model) = model.is_supervised &&
                       model.input_scitype >: MLJ.Table(Continuous) &&
                       model.target_scitype >: AbstractVector{<:Multiclass{3}} &&
                       model.prediction_type == :deterministic
filter (generic function with 1 method)

julia> models(filter)
12-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :deep_properties, :docstring, :fit_data_scitype, :human_name, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :reporting_operations, :reports_feature_importances, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :transform_scitype, :input_scitype, :target_scitype, :output_scitype)}}:
 (name = DeterministicConstantClassifier, package_name = MLJModels, ... )
 (name = LinearSVC, package_name = LIBSVM, ... )
 (name = NuSVC, package_name = LIBSVM, ... )
 (name = PassiveAggressiveClassifier, package_name = MLJScikitLearnInterface, ... )
 (name = PerceptronClassifier, package_name = MLJScikitLearnInterface, ... )
 (name = RidgeCVClassifier, package_name = MLJScikitLearnInterface, ... )
 (name = RidgeClassifier, package_name = MLJScikitLearnInterface, ... )
 (name = SGDClassifier, package_name = MLJScikitLearnInterface, ... )
 (name = SVC, package_name = LIBSVM, ... )
 (name = SVMClassifier, package_name = MLJScikitLearnInterface, ... )
 (name = SVMLinearClassifier, package_name = MLJScikitLearnInterface, ... )
 (name = SVMNuClassifier, package_name = MLJScikitLearnInterface, ... )

Multiple test arguments may be passed to models, which are applied conjunctively.
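To illustrate the conjunctive behavior, the sketch below passes two separate tests and compares the result with a single combined test. It assumes MLJ is installed. The names supervised_test, pure_julia_test and ids are hypothetical helpers, not part of MLJ.

```julia
using MLJ

# Two independent Bool-valued tests on registry entries.
supervised_test(m) = m.is_supervised
pure_julia_test(m) = m.is_pure_julia

# Passing both tests keeps only the entries that satisfy each of them.
both = models(supervised_test, pure_julia_test)

# The same search written as a single combined test.
combined = models(m -> m.is_supervised && m.is_pure_julia)

# Helper for comparing results independently of ordering.
ids(entries) = sort([(m.name, m.package_name) for m in entries])
ids(both) == ids(combined)
```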

Matching models to data

Common searches are streamlined with the help of the matching command, defined as follows:

matching(model, X, y) == true exactly when model is supervised and admits inputs and targets with the scientific types of X and y, respectively
matching(model, X) == true exactly when model is unsupervised and admits inputs with the scientific types of X.

So, to search for all supervised probabilistic models handling input X and target y, one can define the testing function task by

task(model) = matching(model, X, y) && model.prediction_type == :probabilistic

and execute the search with

models(task)
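The snippets above leave X and y undefined. As a concrete sketch, one could assume the iris dataset bundled with MLJ as stand-in data. Using this dataset is an assumption made for illustration, and the models returned depend on the registry version.

```julia
using MLJ

# Assumption: the built-in iris data plays the role of the user's X and y.
X, y = @load_iris

# Inspect the scientific types that `matching` compares against model metadata.
scitype(X)
scitype(y)

# Supervised probabilistic models that accept this input and target.
task(model) = matching(model, X, y) && model.prediction_type == :probabilistic
probabilistic_models = models(task)
```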

Also defined are Bool-valued callable objects matching(model), matching(X, y) and matching(X), with obvious behavior. For example, matching(X, y)(model) = matching(model, X, y).

So, to search for all models compatible with input X and target y, for example, one executes

models(matching(X, y))

while the preceding search can also be written

models() do model
    matching(model, X, y) &&
    model.prediction_type == :probabilistic
end
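The different calling styles should return the same entries. The sketch below checks this, again assuming the bundled iris dataset as stand-in data. The helper ids is hypothetical and only compares results by name and package.

```julia
using MLJ

# Assumption: the built-in iris data plays the role of the user's X and y.
X, y = @load_iris

# Helper for comparing results independently of ordering.
ids(entries) = sort([(m.name, m.package_name) for m in entries])

via_callable = models(matching(X, y))           # callable-object form
via_closure = models(m -> matching(m, X, y))    # explicit anonymous function
via_do = models() do m                          # do-block form
    matching(m, X, y)
end

ids(via_callable) == ids(via_closure) == ids(via_do)

# With inputs only, the search is restricted to unsupervised models.
unsupervised = models(matching(X))
```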

API

MLJModels.models — Function

models()

List all models in the MLJ registry. Here and below model means the registry metadata entry for a genuine model type (a proxy for types whose defining code may not be loaded).

models(filters...)

List all models m for which filter(m) is true, for each filter in filters.

models(matching(X, y))

List all supervised models compatible with training data X, y.

models(matching(X))

List all unsupervised models compatible with training data X.

Excluded from the listings are the built-in model wrappers, such as EnsembleModel, TunedModel, and IteratedModel.

Example

If

task(model) = model.is_supervised && model.prediction_type == :probabilistic

then models(task) lists all supervised models making probabilistic predictions.