# Model Search

MLJ has a model registry, allowing the user to search models and their properties without loading all the packages containing model code. In turn, this allows one to efficiently find all models solving a given machine learning task. The task itself is specified with the help of the `matching` method, and the search is executed with the `models` method, as detailed below.

For commonly encountered problems with model search, see also Preparing Data.

A table of all models is also given at List of Supported Models.

## Model metadata

*Terminology.* In this section the word "model" refers to a metadata entry in the model registry, as opposed to an actual model `struct` that such an entry represents. One can obtain such an entry with the `info` command:

```
julia> info("PCA")
Principal component analysis. Learns a linear transformation to
project the data on a lower dimensional space while preserving most of the initial
variance.
→ based on [MultivariateStats](https://github.com/JuliaStats/MultivariateStats.jl).
→ do `@load PCA pkg="MultivariateStats"` to use the model.
→ do `?PCA` for documentation.
(name = "PCA",
package_name = "MultivariateStats",
is_supervised = false,
abstract_type = Unsupervised,
deep_properties = (),
docstring = " Principal component analysis. Learns a linear transformation to\nproject the data on a lower dimensional space while preserving most of the initial\nvariance.\n\n→ based on [MultivariateStats](https://github.com/JuliaStats/MultivariateStats.jl).\n→ do `@load PCA pkg=\"MultivariateStats\"` to use the model.\n→ do `?PCA` for documentation.",
fit_data_scitype = Tuple{Table{_s48} where _s48<:(AbstractVector{_s47} where _s47<:Continuous)},
hyperparameter_ranges = (nothing, nothing, nothing, nothing),
hyperparameter_types = ("Int64", "Symbol", "Float64", "Union{Nothing, Real, Vector{Float64}}"),
hyperparameters = (:maxoutdim, :method, :pratio, :mean),
implemented_methods = [:clean!, :fit, :fitted_params, :inverse_transform, :transform],
inverse_transform_scitype = Table{_s48} where _s48<:(AbstractVector{_s47} where _s47<:Continuous),
is_pure_julia = true,
is_wrapper = false,
iteration_parameter = nothing,
load_path = "MLJMultivariateStatsInterface.PCA",
package_license = "MIT",
package_url = "https://github.com/JuliaStats/MultivariateStats.jl",
package_uuid = "6f286f6a-111f-5878-ab1e-185364afe411",
predict_scitype = Unknown,
prediction_type = :unknown,
supports_class_weights = false,
supports_online = false,
supports_training_losses = false,
supports_weights = false,
transform_scitype = Table{_s48} where _s48<:(AbstractVector{_s47} where _s47<:Continuous),
input_scitype = Table{_s48} where _s48<:(AbstractVector{_s47} where _s47<:Continuous),
target_scitype = Unknown,
output_scitype = Table{_s48} where _s48<:(AbstractVector{_s47} where _s47<:Continuous),)
```

So a "model" in the present context is just a named tuple containing metadata, and not an actual model type or instance. If two models with the same name occur in different packages, the package name must be specified, as in `info("LinearRegressor", pkg="GLM")`.
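Because an entry is an ordinary named tuple, its fields can be read with dot syntax and combined with standard Julia tools. The sketch below does not touch the registry: `entry` is a hand-written stand-in that copies a few fields from the `PCA` listing above, so it runs without MLJ installed. With MLJ loaded, the result of `info("PCA")` would presumably play the role of `entry`.

```julia
# Hand-written stand-in for a registry entry (a subset of the fields shown in
# the `info("PCA")` listing above); it is not obtained from the registry.
entry = (
    name = "PCA",
    package_name = "MultivariateStats",
    is_supervised = false,
    hyperparameters = (:maxoutdim, :method, :pratio, :mean),
    hyperparameter_types = ("Int64", "Symbol", "Float64",
                            "Union{Nothing, Real, Vector{Float64}}"),
)

# Fields are read with dot syntax.
entry.name            # "PCA"
entry.is_supervised   # false

# Pair each hyperparameter name with the type string recorded for it.
hyperparameter_table = Dict(zip(entry.hyperparameters, entry.hyperparameter_types))
hyperparameter_table[:pratio]   # "Float64"
```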

## General model queries

We list all models (named tuples) using `models()`, and list the models for which code is already loaded with `localmodels()`:

```
julia> localmodels()
55-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :deep_properties, :docstring, :fit_data_scitype, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :transform_scitype, :input_scitype, :target_scitype, :output_scitype), T} where T<:Tuple}:
(name = AdaBoostStumpClassifier, package_name = DecisionTree, ... )
(name = BayesianLDA, package_name = MultivariateStats, ... )
(name = BayesianSubspaceLDA, package_name = MultivariateStats, ... )
(name = ConstantClassifier, package_name = MLJModels, ... )
(name = ConstantRegressor, package_name = MLJModels, ... )
(name = ContinuousEncoder, package_name = MLJModels, ... )
(name = DecisionTreeClassifier, package_name = DecisionTree, ... )
(name = DecisionTreeRegressor, package_name = DecisionTree, ... )
(name = DeterministicConstantClassifier, package_name = MLJModels, ... )
(name = DeterministicConstantRegressor, package_name = MLJModels, ... )
⋮
(name = RidgeRegressor, package_name = MultivariateStats, ... )
(name = RobustRegressor, package_name = MLJLinearModels, ... )
(name = Standardizer, package_name = MLJModels, ... )
(name = SubspaceLDA, package_name = MultivariateStats, ... )
(name = UnivariateBoxCoxTransformer, package_name = MLJModels, ... )
(name = UnivariateDiscretizer, package_name = MLJModels, ... )
(name = UnivariateFillImputer, package_name = MLJModels, ... )
(name = UnivariateStandardizer, package_name = MLJModels, ... )
(name = UnivariateTimeTypeToContinuous, package_name = MLJModels, ... )
julia> localmodels()[2]
Bayesian Multiclass linear discriminant analysis. The algorithm
learns a projection matrix `P` that projects a feature matrix `Xtrain` onto a lower
dimensional space of dimension `out_dim` such that the trace of the transformed
between-class scatter matrix(`Pᵀ*Sb*P`) is maximized relative to the trace of the
transformed within-class scatter matrix (`Pᵀ*Sw*P`). The projection matrix is scaled
such that `Pᵀ*Sw*P = n` or `Pᵀ*Σw*P=I` (Where `n` is the number of training samples
and `Σw` is the within-class covariance matrix).
Predicted class posterior probability distibution are derived by applying Bayes rule
with a multivariate Gaussian class-conditional distribution.
→ based on [MultivariateStats](https://github.com/JuliaStats/MultivariateStats.jl).
→ do `@load BayesianLDA pkg="MultivariateStats"` to use the model.
→ do `?BayesianLDA` for documentation.
(name = "BayesianLDA",
package_name = "MultivariateStats",
is_supervised = true,
abstract_type = Probabilistic,
deep_properties = (),
docstring = " Bayesian Multiclass linear discriminant analysis. The algorithm\nlearns a projection matrix `P` that projects a feature matrix `Xtrain` onto a lower\ndimensional space of dimension `out_dim` such that the trace of the transformed\nbetween-class scatter matrix(`Pᵀ*Sb*P`) is maximized relative to the trace of the\ntransformed within-class scatter matrix (`Pᵀ*Sw*P`). The projection matrix is scaled \nsuch that `Pᵀ*Sw*P = n` or `Pᵀ*Σw*P=I` (Where `n` is the number of training samples \nand `Σw` is the within-class covariance matrix).\nPredicted class posterior probability distibution are derived by applying Bayes rule \nwith a multivariate Gaussian class-conditional distribution.\n\n→ based on [MultivariateStats](https://github.com/JuliaStats/MultivariateStats.jl).\n→ do `@load BayesianLDA pkg=\"MultivariateStats\"` to use the model.\n→ do `?BayesianLDA` for documentation.",
fit_data_scitype = Tuple{Table{_s48} where _s48<:(AbstractVector{_s47} where _s47<:Continuous), AbstractVector{_s90} where _s90<:Finite},
hyperparameter_ranges = (nothing, nothing, nothing, nothing, nothing, nothing),
hyperparameter_types = ("Symbol", "StatsBase.CovarianceEstimator", "StatsBase.CovarianceEstimator", "Int64", "Float64", "Union{Nothing, Vector{Float64}}"),
hyperparameters = (:method, :cov_w, :cov_b, :out_dim, :regcoef, :priors),
implemented_methods = [:clean!, :fit, :fitted_params, :predict, :transform],
inverse_transform_scitype = Unknown,
is_pure_julia = true,
is_wrapper = false,
iteration_parameter = nothing,
load_path = "MLJMultivariateStatsInterface.BayesianLDA",
package_license = "MIT",
package_url = "https://github.com/JuliaStats/MultivariateStats.jl",
package_uuid = "6f286f6a-111f-5878-ab1e-185364afe411",
predict_scitype = AbstractVector{ScientificTypesBase.Density{_s25} where _s25<:Finite},
prediction_type = :probabilistic,
supports_class_weights = false,
supports_online = false,
supports_training_losses = false,
supports_weights = false,
transform_scitype = Unknown,
input_scitype = Table{_s48} where _s48<:(AbstractVector{_s47} where _s47<:Continuous),
target_scitype = AbstractVector{_s90} where _s90<:Finite,
output_scitype = Table{_s48} where _s48<:(AbstractVector{_s47} where _s47<:Continuous),)
```

One can search for models whose `name` or `docstring` attributes contain a specified string or match a regular expression, as in

```
julia> models("forest")
4-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :deep_properties, :docstring, :fit_data_scitype, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :transform_scitype, :input_scitype, :target_scitype, :output_scitype), T} where T<:Tuple}:
(name = RandomForestClassifier, package_name = DecisionTree, ... )
(name = RandomForestClassifier, package_name = ScikitLearn, ... )
(name = RandomForestRegressor, package_name = DecisionTree, ... )
(name = RandomForestRegressor, package_name = ScikitLearn, ... )
```

or by specifying a filter (`Bool`-valued function):

```
julia> filter(model) = model.is_supervised &&
           model.input_scitype >: MLJ.Table(Continuous) &&
           model.target_scitype >: AbstractVector{<:Multiclass{3}} &&
           model.prediction_type == :deterministic
filter (generic function with 1 method)
julia> models(filter)
12-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :deep_properties, :docstring, :fit_data_scitype, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :transform_scitype, :input_scitype, :target_scitype, :output_scitype), T} where T<:Tuple}:
(name = DeterministicConstantClassifier, package_name = MLJModels, ... )
(name = LinearSVC, package_name = LIBSVM, ... )
(name = NuSVC, package_name = LIBSVM, ... )
(name = PassiveAggressiveClassifier, package_name = ScikitLearn, ... )
(name = PerceptronClassifier, package_name = ScikitLearn, ... )
(name = RidgeCVClassifier, package_name = ScikitLearn, ... )
(name = RidgeClassifier, package_name = ScikitLearn, ... )
(name = SGDClassifier, package_name = ScikitLearn, ... )
(name = SVC, package_name = LIBSVM, ... )
(name = SVMClassifier, package_name = ScikitLearn, ... )
(name = SVMLinearClassifier, package_name = ScikitLearn, ... )
(name = SVMNuClassifier, package_name = ScikitLearn, ... )
```

Multiple test arguments may be passed to `models`, which are applied conjunctively.
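To make "applied conjunctively" concrete, here is a minimal sketch using a toy registry. The names `toy_registry` and `toy_models` are hypothetical stand-ins invented for illustration, and the entries are made up. The sketch mimics the behaviour described here (an entry is kept only if every test returns `true`), not how MLJ implements it.

```julia
# Toy stand-in for the registry: a vector of named tuples carrying a few of the
# metadata fields shown above. The entries are illustrative, not real metadata.
toy_registry = [
    (name = "A", is_supervised = true,  prediction_type = :probabilistic, is_pure_julia = true),
    (name = "B", is_supervised = true,  prediction_type = :deterministic, is_pure_julia = true),
    (name = "C", is_supervised = false, prediction_type = :unknown,       is_pure_julia = false),
    (name = "D", is_supervised = true,  prediction_type = :probabilistic, is_pure_julia = false),
]

# Keep an entry only if *every* test returns true (logical AND of the tests).
toy_models(tests...) = filter(m -> all(t(m) for t in tests), toy_registry)

supervised(m)    = m.is_supervised
probabilistic(m) = m.prediction_type == :probabilistic
pure_julia(m)    = m.is_pure_julia

names_of(entries) = [m.name for m in entries]

names_of(toy_models(supervised, probabilistic))              # ["A", "D"]
names_of(toy_models(supervised, probabilistic, pure_julia))  # ["A"]
names_of(toy_models())                                       # no tests: all four entries
```

Adding a test can only shrink the result, which is why passing several narrow tests is often easier to read than one long boolean expression.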

## Matching models to data

Common searches are streamlined with the help of the `matching` command, defined as follows:

- `matching(model, X, y) == true` exactly when `model` is supervised and admits inputs and targets with the scientific types of `X` and `y`, respectively

- `matching(model, X) == true` exactly when `model` is unsupervised and admits inputs with the scientific types of `X`.

So, to search for all supervised probabilistic models handling input `X` and target `y`, one can define the testing function `task` by

`task(model) = matching(model, X, y) && model.prediction_type == :probabilistic`

and execute the search with

`models(task)`

Also defined are `Bool`-valued callable objects `matching(model)`, `matching(X, y)` and `matching(X)`, with obvious behaviour. For example, `matching(X, y)(model) = matching(model, X, y)`.
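The curried forms can be pictured as small callable objects that store the data-side information and wait for a model. The following self-contained sketch shows that pattern only. `ToyMatcher`, `toy_matching`, the toy entries, and the use of plain Julia types in place of scientific types are all assumptions made for illustration; MLJ's actual implementation may differ.

```julia
# Callable object holding the data-side information of a search.
struct ToyMatcher
    input_type::Type
    target_type::Type
end

# Three-argument form: a toy entry "matches" if it is supervised and its declared
# input/target types admit the types of X and y (a stand-in for scitype checks).
toy_matching(model, X, y) =
    model.is_supervised &&
    typeof(X) <: model.input_type &&
    typeof(y) <: model.target_type

# Curried form: store the types of X and y and return an object callable on a model.
toy_matching(X, y) = ToyMatcher(typeof(X), typeof(y))

# Calling the stored object on a model applies the same checks as the
# three-argument form.
(c::ToyMatcher)(model) =
    model.is_supervised &&
    c.input_type <: model.input_type &&
    c.target_type <: model.target_type

X = rand(5, 2)         # Matrix{Float64}
y = [1, 2, 1, 2, 1]    # Vector{Int}

# Made-up entries using Julia types where the registry would hold scitypes.
regressor   = (name = "R", is_supervised = true,
               input_type = AbstractMatrix{<:Real}, target_type = AbstractVector{<:Real})
transformer = (name = "T", is_supervised = false,
               input_type = AbstractMatrix{<:Real}, target_type = Any)

toy_matching(X, y)(regressor) == toy_matching(regressor, X, y)   # true
toy_matching(X, y)(transformer)                                  # false: not supervised
```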

So, to search for all models compatible with input `X` and target `y`, for example, one executes

`models(matching(X, y))`

while the preceding search can also be written

```
models() do model
    matching(model, X, y) &&
        model.prediction_type == :probabilistic
end
```

## API

`MLJModels.models` — Function

`models()`

List all models in the MLJ registry. Here and below *model* means the registry metadata entry for a genuine model type (a proxy for types whose defining code may not be loaded).

`models(filters...)`

List all models `m` for which `filter(m)` is true, for each `filter` in `filters`.

`models(matching(X, y))`

List all supervised models compatible with training data `X`, `y`.

`models(matching(X))`

List all unsupervised models compatible with training data `X`.

Excluded from the listings are the built-in model wrappers, like `EnsembleModel`, `TunedModel`, and `IteratedModel`.

**Example**

If

`task(model) = model.is_supervised && model.prediction_type == :probabilistic`

then `models(task)` lists all supervised models making probabilistic predictions.

See also: `localmodels`.

`models(needle::Union{AbstractString,Regex})`

List all models whose `name` or `docstring` matches a given `needle`.
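As a rough picture of what matching a `needle` against `name` or `docstring` means, the sketch below applies Julia's `occursin` to a toy list of entries. The helper `toy_search` and the entries are hypothetical. Whether MLJ's search handles case or partial matches exactly this way is not stated here, so treat those details as assumptions.

```julia
# Made-up entries with only the two fields the search inspects.
toy_entries = [
    (name = "RandomForestClassifier", docstring = "Random forest classifier."),
    (name = "RidgeRegressor",         docstring = "Ridge regression."),
    (name = "KMeans",                 docstring = "K-means clustering."),
]

# `occursin` accepts either a string or a regex as its first argument, so one
# definition covers both kinds of needle.
toy_search(needle::Union{AbstractString,Regex}) =
    [e.name for e in toy_entries
     if occursin(needle, e.name) || occursin(needle, e.docstring)]

toy_search("forest")      # ["RandomForestClassifier"], matched via the docstring
toy_search(r"^Ri")        # ["RidgeRegressor"]
toy_search(r"regress"i)   # ["RidgeRegressor"], case-insensitive regex
```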

`MLJModels.localmodels` — Function

```
localmodels(; modl=Main)
localmodels(filters...; modl=Main)
localmodels(needle::Union{AbstractString,Regex}; modl=Main)
```

List all models currently available to the user from the module `modl` without importing a package, and which additionally pass through the specified filters. Here a *filter* is a `Bool`-valued function on models.

Use `load_path` to get the path to some model returned, as in this example:

```
ms = localmodels()
model = ms[1]
load_path(model)
```