# Transformers and Other Unsupervised Models

Several unsupervised models used to perform common transformations, such as one-hot encoding, are available in MLJ out-of-the-box. These are detailed in Built-in transformers below.

A transformer is *static* if it has no learned parameters. While such a transformer is tantamount to an ordinary function, realizing it as an MLJ static transformer (a subtype of `Static <: Unsupervised`) can be useful, especially if the function depends on parameters the user would like to manipulate (which become *hyper-parameters* of the model). The necessary syntax for defining your own static transformers is described in Static transformers below.

Some unsupervised models, such as clustering algorithms, have a `predict` method in addition to a `transform` method. We give an example of this in Transformers that also predict below.

Finally, we note that models that fit a distribution, or more generally a sampler object, to some data, which are sometimes viewed as unsupervised, are treated in MLJ as *supervised* models. See Models that learn a probability distribution for an example.

## Built-in transformers

### `MLJModels.Standardizer` — Type

A model type for constructing a standardizer, based on MLJModels.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

`Standardizer = @load Standardizer pkg=MLJModels`

Do `model = Standardizer()` to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in `Standardizer(features=...)`.

Use this model to standardize (whiten) a `Continuous` vector, or relevant columns of a table. The rescalings applied by this transformer to new data are always those learned during the training phase, which are generally different from what would actually standardize the new data.

**Training data**

In MLJ or MLJBase, bind an instance `model` to data with

`mach = machine(model, X)`

where

- `X`: any Tables.jl compatible table or any abstract vector with `Continuous` element scitype (any abstract float vector). Only features in a table with `Continuous` scitype can be standardized; check column scitypes with `schema(X)`.

Train the machine using `fit!(mach, rows=...)`.

**Hyper-parameters**

- `features`: one of the following, with the behavior indicated below:
  - `[]` (empty, the default): standardize all features (columns) having `Continuous` element scitype
  - non-empty vector of feature names (symbols): standardize only the `Continuous` features in the vector (if `ignore=false`) or the `Continuous` features *not* named in the vector (`ignore=true`)
  - function or other callable: standardize a feature if the callable returns `true` on its name. For example, `Standardizer(features = name -> name in [:x1, :x3], ignore = true, count=true)` has the same effect as `Standardizer(features = [:x1, :x3], ignore = true, count=true)`, namely to standardize all `Continuous` and `Count` features, with the exception of `:x1` and `:x3`.

  Note this behavior is further modified if the `ordered_factor` or `count` flags are set to `true`; see below.

- `ignore=false`: whether to ignore or standardize the specified `features`, as explained above
- `ordered_factor=false`: if `true`, standardize any `OrderedFactor` feature wherever a `Continuous` feature would be standardized, as described above
- `count=false`: if `true`, standardize any `Count` feature wherever a `Continuous` feature would be standardized, as described above

**Operations**

- `transform(mach, Xnew)`: return `Xnew` with relevant features standardized according to the rescalings learned during fitting of `mach`.
- `inverse_transform(mach, Z)`: apply the inverse transformation to `Z`, so that `inverse_transform(mach, transform(mach, Xnew))` is approximately the same as `Xnew`; unavailable if the `ordered_factor` or `count` flags were set to `true`.

**Fitted parameters**

The fields of `fitted_params(mach)` are:

- `features_fit`: the names of features that will be standardized
- `means`: the corresponding untransformed mean values
- `stds`: the corresponding untransformed standard deviations

**Report**

The fields of `report(mach)` are:

- `features_fit`: the names of features that will be standardized

**Examples**

```
using MLJ

X = (ordinal1 = [1, 2, 3],
     ordinal2 = coerce([:x, :y, :x], OrderedFactor),
     ordinal3 = [10.0, 20.0, 30.0],
     ordinal4 = [-20.0, -30.0, -40.0],
     nominal = coerce(["Your father", "he", "is"], Multiclass));

julia> schema(X)
┌──────────┬──────────────────┐
│ names    │ scitypes         │
├──────────┼──────────────────┤
│ ordinal1 │ Count            │
│ ordinal2 │ OrderedFactor{2} │
│ ordinal3 │ Continuous       │
│ ordinal4 │ Continuous       │
│ nominal  │ Multiclass{3}    │
└──────────┴──────────────────┘

stand1 = Standardizer();

julia> transform(fit!(machine(stand1, X)), X)
(ordinal1 = [1, 2, 3],
 ordinal2 = CategoricalValue{Symbol,UInt32}[:x, :y, :x],
 ordinal3 = [-1.0, 0.0, 1.0],
 ordinal4 = [1.0, 0.0, -1.0],
 nominal = CategoricalValue{String,UInt32}["Your father", "he", "is"],)

stand2 = Standardizer(features=[:ordinal3, ], ignore=true, count=true);

julia> transform(fit!(machine(stand2, X)), X)
(ordinal1 = [-1.0, 0.0, 1.0],
 ordinal2 = CategoricalValue{Symbol,UInt32}[:x, :y, :x],
 ordinal3 = [10.0, 20.0, 30.0],
 ordinal4 = [1.0, 0.0, -1.0],
 nominal = CategoricalValue{String,UInt32}["Your father", "he", "is"],)
```
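Continuing the example (a sketch, not part of the original docstring): because `stand1` uses the default flags, the learned rescalings can be undone with `inverse_transform`, as documented under **Operations** above:

```
mach = fit!(machine(stand1, X))
W = transform(mach, X)

# round-trip: approximately recovers the original table
inverse_transform(mach, W)
```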

See also `OneHotEncoder`, `ContinuousEncoder`.

### `MLJModels.OneHotEncoder` — Type

A model type for constructing a one-hot encoder, based on MLJModels.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

`OneHotEncoder = @load OneHotEncoder pkg=MLJModels`

Do `model = OneHotEncoder()` to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in `OneHotEncoder(features=...)`.

Use this model to one-hot encode the `Multiclass` and `OrderedFactor` features (columns) of some table, leaving other columns unchanged.

New data to be transformed may lack features present in the fit data, but no *new* features can be present.

**Warning:** This transformer assumes that `levels(col)` for any `Multiclass` or `OrderedFactor` column, `col`, is the same for training data and new data to be transformed.

To ensure *all* features are transformed into `Continuous` features, or dropped, use `ContinuousEncoder` instead.

**Training data**

In MLJ or MLJBase, bind an instance `model` to data with

`mach = machine(model, X)`

where

- `X`: any Tables.jl compatible table. Columns can be of mixed type but only those with element scitype `Multiclass` or `OrderedFactor` can be encoded. Check column scitypes with `schema(X)`.

Train the machine using `fit!(mach, rows=...)`.

**Hyper-parameters**

- `features`: a vector of symbols (column names). If empty (default) then all `Multiclass` and `OrderedFactor` features are encoded. Otherwise, encoding is further restricted to the specified features (`ignore=false`) or the unspecified features (`ignore=true`). This default behavior can be modified by the `ordered_factor` flag.
- `ordered_factor=false`: when `true`, `OrderedFactor` features are universally excluded
- `drop_last=true`: whether to drop the column corresponding to the final class of encoded features. For example, a three-class feature is spawned into three new features if `drop_last=false`, but just two features otherwise.

**Fitted parameters**

The fields of `fitted_params(mach)` are:

- `all_features`: names of all features encountered in training
- `fitted_levels_given_feature`: dictionary of the levels associated with each feature encoded, keyed on the feature name
- `ref_name_pairs_given_feature`: dictionary of pairs `r => ftr` (such as `0x00000001 => :grad__A`) where `r` is a CategoricalArrays.jl reference integer representing a level, and `ftr` the corresponding new feature name; the dictionary is keyed on the names of features that are encoded

**Report**

The fields of `report(mach)` are:

- `features_to_be_encoded`: names of input features to be encoded
- `new_features`: names of all output features

**Example**

```
using MLJ
X = (name=categorical(["Danesh", "Lee", "Mary", "John"]),
     grade=categorical(["A", "B", "A", "C"], ordered=true),
     height=[1.85, 1.67, 1.5, 1.67],
     n_devices=[3, 2, 4, 3])
julia> schema(X)
┌───────────┬──────────────────┐
│ names │ scitypes │
├───────────┼──────────────────┤
│ name │ Multiclass{4} │
│ grade │ OrderedFactor{3} │
│ height │ Continuous │
│ n_devices │ Count │
└───────────┴──────────────────┘
hot = OneHotEncoder(drop_last=true)
mach = fit!(machine(hot, X))
W = transform(mach, X)
julia> schema(W)
┌──────────────┬────────────┐
│ names │ scitypes │
├──────────────┼────────────┤
│ name__Danesh │ Continuous │
│ name__John │ Continuous │
│ name__Lee │ Continuous │
│ grade__A │ Continuous │
│ grade__B │ Continuous │
│ height │ Continuous │
│ n_devices │ Count │
└──────────────┴────────────┘
```
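Continuing the example (a sketch, not part of the original docstring), the machine's report lists the input features that were encoded and the names of all output features, consistent with the schema of `W` shown above:

```
report(mach).features_to_be_encoded
# expected to contain :name and :grade

report(mach).new_features
# expected to contain :name__Danesh, :name__John, :name__Lee,
# :grade__A, :grade__B, :height and :n_devices
```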

See also `ContinuousEncoder`.

### `MLJModels.ContinuousEncoder` — Type

A model type for constructing a continuous encoder, based on MLJModels.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

`ContinuousEncoder = @load ContinuousEncoder pkg=MLJModels`

Do `model = ContinuousEncoder()` to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in `ContinuousEncoder(drop_last=...)`.

Use this model to arrange all features (columns) of a table to have `Continuous` element scitype, by applying the following protocol to each feature `ftr`:

- If `ftr` is already `Continuous`, retain it.
- If `ftr` is `Multiclass`, one-hot encode it.
- If `ftr` is `OrderedFactor`, replace it with `coerce(ftr, Continuous)` (vector of floating point integers), unless `one_hot_ordered_factors=true` is specified, in which case one-hot encode it instead (see the sketch after this list).
- If `ftr` is `Count`, replace it with `coerce(ftr, Continuous)`.
- If `ftr` has some other element scitype, or was not observed in fitting the encoder, drop it from the table.
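As a minimal illustration of the `OrderedFactor` branch of this protocol (a sketch, not from the original docstring), the coercion uses the level ordering to assign floating point codes:

```
using MLJ

grade = coerce(["A", "B", "A", "C"], OrderedFactor)  # levels ordered A < B < C
coerce(grade, Continuous)  # expected: [1.0, 2.0, 1.0, 3.0]
```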

**Warning:** This transformer assumes that `levels(col)` for any `Multiclass` or `OrderedFactor` column, `col`, is the same for training data and new data to be transformed.

To selectively one-hot-encode categorical features (without dropping columns) use `OneHotEncoder` instead.

**Training data**

In MLJ or MLJBase, bind an instance `model` to data with

`mach = machine(model, X)`

where

- `X`: any Tables.jl compatible table. Columns can be of mixed type but only those with element scitype `Multiclass` or `OrderedFactor` can be encoded. Check column scitypes with `schema(X)`.

Train the machine using `fit!(mach, rows=...)`.

**Hyper-parameters**

- `drop_last=true`: whether to drop the column corresponding to the final class of one-hot encoded features. For example, a three-class feature is spawned into three new features if `drop_last=false`, but just two features otherwise.
- `one_hot_ordered_factors=false`: whether to one-hot encode any feature with `OrderedFactor` element scitype, or to instead coerce it directly to a (single) `Continuous` feature using the order

**Fitted parameters**

The fields of `fitted_params(mach)` are:

- `features_to_keep`: names of features that will not be dropped from the table
- `one_hot_encoder`: the `OneHotEncoder` model instance for handling the one-hot encoding
- `one_hot_encoder_fitresult`: the fitted parameters of the `OneHotEncoder` model

**Report**

The fields of `report(mach)` are:

- `features_to_keep`: names of input features that will not be dropped from the table
- `new_features`: names of all output features

**Example**

```
using MLJ

X = (name=categorical(["Danesh", "Lee", "Mary", "John"]),
     grade=categorical(["A", "B", "A", "C"], ordered=true),
     height=[1.85, 1.67, 1.5, 1.67],
     n_devices=[3, 2, 4, 3],
     comments=["the force", "be", "with you", "too"])
julia> schema(X)
┌───────────┬──────────────────┐
│ names │ scitypes │
├───────────┼──────────────────┤
│ name │ Multiclass{4} │
│ grade │ OrderedFactor{3} │
│ height │ Continuous │
│ n_devices │ Count │
│ comments │ Textual │
└───────────┴──────────────────┘
encoder = ContinuousEncoder(drop_last=true)
mach = fit!(machine(encoder, X))
W = transform(mach, X)
julia> schema(W)
┌──────────────┬────────────┐
│ names │ scitypes │
├──────────────┼────────────┤
│ name__Danesh │ Continuous │
│ name__John │ Continuous │
│ name__Lee │ Continuous │
│ grade │ Continuous │
│ height │ Continuous │
│ n_devices │ Continuous │
└──────────────┴────────────┘
julia> setdiff(schema(X).names, report(mach).features_to_keep) # dropped features
1-element Vector{Symbol}:
:comments
```

See also `OneHotEncoder`.

### `MLJModels.FillImputer` — Type

A model type for constructing a fill imputer, based on MLJModels.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

`FillImputer = @load FillImputer pkg=MLJModels`

Do `model = FillImputer()` to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in `FillImputer(features=...)`.

Use this model to impute `missing` values in tabular data. A fixed "filler" value is learned from the training data, one for each column of the table.

For imputing missing values in a vector, use `UnivariateFillImputer` instead.

**Training data**

In MLJ or MLJBase, bind an instance `model` to data with

`mach = machine(model, X)`

where

- `X`: any table of input features (eg, a `DataFrame`) whose columns each have element scitypes `Union{Missing, T}`, where `T` is a subtype of `Continuous`, `Multiclass`, `OrderedFactor` or `Count`. Check scitypes with `schema(X)`.

Train the machine using `fit!(mach, rows=...)`.

**Hyper-parameters**

- `features`: a vector of names of features (symbols) for which imputation is to be attempted; default is empty, which is interpreted as "impute all".
- `continuous_fill`: function or other callable to determine the value to be imputed in the case of `Continuous` (abstract float) data; default is to apply `median` after skipping `missing` values
- `count_fill`: function or other callable to determine the value to be imputed in the case of `Count` (integer) data; default is to apply rounded `median` after skipping `missing` values
- `finite_fill`: function or other callable to determine the value to be imputed in the case of `Multiclass` or `OrderedFactor` data (categorical vectors); default is to apply `mode` after skipping `missing` values
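For example (a sketch, not from the original docstring), one might impute `Continuous` columns with the mean instead of the median. This assumes, as the defaults above suggest, that the callable is applied to the raw column with its `missing` entries still present:

```
using MLJ
using Statistics  # provides `mean`

# hypothetical custom filler: mean of the non-missing values
imputer = FillImputer(continuous_fill = v -> mean(skipmissing(v)))
```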

**Operations**

- `transform(mach, Xnew)`: return `Xnew` with missing values imputed with the fill values learned when fitting `mach`

**Fitted parameters**

The fields of `fitted_params(mach)` are:

- `features_seen_in_fit`: the names of features (columns) encountered during training
- `univariate_transformer`: the univariate model applied to determine the fillers (its fields contain the functions defining the filler computations)
- `filler_given_feature`: dictionary of filler values, keyed on feature (column) names

**Examples**

```
using MLJ

imputer = FillImputer()
X = (a = [1.0, 2.0, missing, 3.0, missing],
     b = coerce(["y", "n", "y", missing, "y"], Multiclass),
     c = [1, 1, 2, missing, 3])

julia> schema(X)
┌───────┬───────────────────────────────┐
│ names │ scitypes                      │
├───────┼───────────────────────────────┤
│ a     │ Union{Missing, Continuous}    │
│ b     │ Union{Missing, Multiclass{2}} │
│ c     │ Union{Missing, Count}         │
└───────┴───────────────────────────────┘

mach = machine(imputer, X)
fit!(mach)

julia> fitted_params(mach).filler_given_feature
Dict{Symbol, Any} with 3 entries:
  :a => 2.0
  :b => "y"
  :c => 2

julia> transform(mach, X)
(a = [1.0, 2.0, 2.0, 3.0, 2.0],
 b = CategoricalValue{String, UInt32}["y", "n", "y", "y", "y"],
 c = [1, 1, 2, 2, 3],)
```

See also `UnivariateFillImputer`.

### `MLJModels.UnivariateFillImputer` — Type

A model type for constructing a single variable fill imputer, based on MLJModels.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

`UnivariateFillImputer = @load UnivariateFillImputer pkg=MLJModels`

Do `model = UnivariateFillImputer()` to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in `UnivariateFillImputer(continuous_fill=...)`.

Use this model to impute `missing` values in a vector with a fixed value learned from the non-missing values of the training vector.

For imputing missing values in tabular data, use `FillImputer` instead.

**Training data**

In MLJ or MLJBase, bind an instance `model` to data with

`mach = machine(model, x)`

where

- `x`: any abstract vector with element scitype `Union{Missing, T}` where `T` is a subtype of `Continuous`, `Multiclass`, `OrderedFactor` or `Count`; check the scitype using `scitype(x)`

Train the machine using `fit!(mach, rows=...)`.

**Hyper-parameters**

- `continuous_fill`: function or other callable to determine the value to be imputed in the case of `Continuous` (abstract float) data; default is to apply `median` after skipping `missing` values
- `count_fill`: function or other callable to determine the value to be imputed in the case of `Count` (integer) data; default is to apply rounded `median` after skipping `missing` values
- `finite_fill`: function or other callable to determine the value to be imputed in the case of `Multiclass` or `OrderedFactor` data (categorical vectors); default is to apply `mode` after skipping `missing` values

**Operations**

- `transform(mach, xnew)`: return `xnew` with missing values imputed with the fill values learned when fitting `mach`

**Fitted parameters**

The fields of `fitted_params(mach)` are:

- `filler`: the fill value to be imputed in all new data

**Examples**

```
using MLJ
imputer = UnivariateFillImputer()
x_continuous = [1.0, 2.0, missing, 3.0]
x_multiclass = coerce(["y", "n", "y", missing, "y"], Multiclass)
x_count = [1, 1, 1, 2, missing, 3, 3]
mach = machine(imputer, x_continuous)
fit!(mach)
julia> fitted_params(mach)
(filler = 2.0,)
julia> transform(mach, [missing, missing, 101.0])
3-element Vector{Float64}:
2.0
2.0
101.0
mach2 = machine(imputer, x_multiclass) |> fit!
julia> transform(mach2, x_multiclass)
5-element CategoricalArray{String,1,UInt32}:
"y"
"n"
"y"
"y"
"y"
mach3 = machine(imputer, x_count) |> fit!
julia> transform(mach3, [missing, missing, 5])
3-element Vector{Int64}:
2
2
5
```

For imputing tabular data, use `FillImputer`.

### `MLJModels.FeatureSelector` — Type

A model type for constructing a feature selector, based on MLJModels.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

`FeatureSelector = @load FeatureSelector pkg=MLJModels`

Do `model = FeatureSelector()` to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in `FeatureSelector(features=...)`.

Use this model to select features (columns) of a table, usually as part of a model `Pipeline`.

**Training data**

In MLJ or MLJBase, bind an instance `model` to data with

`mach = machine(model, X)`

where

- `X`: any table of input features, where "table" is in the sense of Tables.jl

Train the machine using `fit!(mach, rows=...)`.

**Hyper-parameters**

- `features`: one of the following, with the behavior indicated:
  - `[]` (empty, the default): filter out all features (columns) which were not encountered in training
  - non-empty vector of feature names (symbols): keep only the specified features (`ignore=false`) or keep only the unspecified features (`ignore=true`)
  - function or other callable: keep a feature if the callable returns `true` on its name. For example, specifying `FeatureSelector(features = name -> name in [:x1, :x3], ignore = true)` has the same effect as `FeatureSelector(features = [:x1, :x3], ignore = true)`, namely to select all features, with the exception of `:x1` and `:x3`.

- `ignore`: whether to ignore or keep the specified `features`, as explained above

**Operations**

- `transform(mach, Xnew)`: select features from the table `Xnew` as specified by the model, taking features seen during training into account, if relevant

**Fitted parameters**

The fields of `fitted_params(mach)` are:

- `features_to_keep`: the features that will be selected

**Example**

```
using MLJ
X = (ordinal1 = [1, 2, 3],
     ordinal2 = coerce(["x", "y", "x"], OrderedFactor),
     ordinal3 = [10.0, 20.0, 30.0],
     ordinal4 = [-20.0, -30.0, -40.0],
     nominal = coerce(["Your father", "he", "is"], Multiclass));

selector = FeatureSelector(features=[:ordinal3, ], ignore=true);

julia> transform(fit!(machine(selector, X)), X)
(ordinal1 = [1, 2, 3],
 ordinal2 = CategoricalValue{String,UInt32}["x", "y", "x"],
 ordinal4 = [-20.0, -30.0, -40.0],
 nominal = CategoricalValue{String,UInt32}["Your father", "he", "is"],)
```
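As noted above, `FeatureSelector` is usually composed with other models in a `Pipeline`. Here is a minimal sketch (not from the original docstring) using MLJ's `|>` pipeline syntax, continuing with `X` from the example:

```
# drop :ordinal3, then standardize the remaining continuous columns
pipe = FeatureSelector(features=[:ordinal3], ignore=true) |> Standardizer()
mach = fit!(machine(pipe, X))
transform(mach, X)
```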

### `MLJModels.UnivariateBoxCoxTransformer` — Type

A model type for constructing a single variable Box-Cox transformer, based on MLJModels.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

`UnivariateBoxCoxTransformer = @load UnivariateBoxCoxTransformer pkg=MLJModels`

Do `model = UnivariateBoxCoxTransformer()` to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in `UnivariateBoxCoxTransformer(n=...)`.

Box-Cox transformations attempt to make data look more normally distributed. This can improve performance and assist in the interpretation of models which suppose that data is generated by a normal distribution.

A Box-Cox transformation (with shift) is of the form

`x -> ((x + c)^λ - 1)/λ`

for some constant `c` and real `λ`, unless `λ = 0`, in which case the above is replaced with

`x -> log(x + c)`

Given user-specified hyper-parameters `n::Integer` and `shift::Bool`, the present implementation learns the parameters `c` and `λ` from the training data as follows: If `shift=true` and zeros are encountered in the data, then `c` is set to `0.2` times the data mean. If there are no zeros, then no shift is applied. Finally, `n` different values of `λ` between `-0.4` and `3` are considered, with `λ` fixed to the value maximizing normality of the transformed data.

*Reference:* Wikipedia entry for power transform.

**Training data**

In MLJ or MLJBase, bind an instance `model` to data with

`mach = machine(model, x)`

where

- `x`: any abstract vector with element scitype `Continuous`; check the scitype with `scitype(x)`

Train the machine using `fit!(mach, rows=...)`.

**Hyper-parameters**

- `n=171`: number of values of the exponent `λ` to try
- `shift=false`: whether to include a preliminary constant translation in transformations, in the presence of zeros

**Operations**

- `transform(mach, xnew)`: apply the Box-Cox transformation learned when fitting `mach`
- `inverse_transform(mach, z)`: reconstruct the vector `x` whose transformation learned by `mach` is `z`

**Fitted parameters**

The fields of `fitted_params(mach)` are:

- `λ`: the learned Box-Cox exponent
- `c`: the learned shift

**Examples**

```
using MLJ
using UnicodePlots
using Random
Random.seed!(123)
transf = UnivariateBoxCoxTransformer()
x = randn(1000).^2
mach = machine(transf, x)
fit!(mach)
z = transform(mach, x)
julia> histogram(x)
┌ ┐
[ 0.0, 2.0) ┤███████████████████████████████████ 848
[ 2.0, 4.0) ┤████▌ 109
[ 4.0, 6.0) ┤█▍ 33
[ 6.0, 8.0) ┤▍ 7
[ 8.0, 10.0) ┤▏ 2
[10.0, 12.0) ┤ 0
[12.0, 14.0) ┤▏ 1
└ ┘
Frequency
julia> histogram(z)
┌ ┐
[-5.0, -4.0) ┤█▎ 8
[-4.0, -3.0) ┤████████▊ 64
[-3.0, -2.0) ┤█████████████████████▊ 159
[-2.0, -1.0) ┤█████████████████████████████▊ 216
[-1.0, 0.0) ┤███████████████████████████████████ 254
[ 0.0, 1.0) ┤█████████████████████████▊ 188
[ 1.0, 2.0) ┤████████████▍ 90
[ 2.0, 3.0) ┤██▊ 20
[ 3.0, 4.0) ┤▎ 1
└ ┘
Frequency
```
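Continuing the example (a sketch, not part of the original docstring), the learned transformation can be inverted to approximately recover the original data:

```
x_approx = inverse_transform(mach, z)

# the reconstruction error should be negligible
maximum(abs.(x - x_approx))
```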

### `MLJModels.UnivariateDiscretizer` — Type

A model type for constructing a single variable discretizer, based on MLJModels.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

`UnivariateDiscretizer = @load UnivariateDiscretizer pkg=MLJModels`

Do `model = UnivariateDiscretizer()` to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in `UnivariateDiscretizer(n_classes=...)`.

Discretization converts a `Continuous` vector into an `OrderedFactor` vector. In particular, the output is a `CategoricalVector` (whose reference type is optimized).

The transformation is chosen so that the vector on which the transformer is fit has, in transformed form, an approximately uniform distribution of values. Specifically, if `n_classes` is the level of discretization, then `2*n_classes - 1` ordered quantiles are computed, the odd quantiles being used for transforming (discretization) and the even quantiles for inverse transforming.

**Training data**

In MLJ or MLJBase, bind an instance `model` to data with

`mach = machine(model, x)`

where

- `x`: any abstract vector with `Continuous` element scitype; check the scitype with `scitype(x)`.

Train the machine using `fit!(mach, rows=...)`.

**Hyper-parameters**

- `n_classes`: number of discrete classes in the output

**Operations**

- `transform(mach, xnew)`: discretize `xnew` according to the discretization learned when fitting `mach`
- `inverse_transform(mach, z)`: attempt to reconstruct from `z` a vector that transforms to give `z`

**Fitted parameters**

The fields of `fitted_params(mach).fitresult` include:

- `odd_quantiles`: quantiles used for transforming (length is `n_classes - 1`)
- `even_quantiles`: quantiles used for inverse transforming (length is `n_classes`)

**Example**

```
using MLJ
using Random
Random.seed!(123)
discretizer = UnivariateDiscretizer(n_classes=100)
mach = machine(discretizer, randn(1000))
fit!(mach)
julia> x = rand(5)
5-element Vector{Float64}:
0.8585244609846809
0.37541692370451396
0.6767070590395461
0.9208844241267105
0.7064611415680901
julia> z = transform(mach, x)
5-element CategoricalArrays.CategoricalArray{UInt8,1,UInt8}:
0x52
0x42
0x4d
0x54
0x4e
x_approx = inverse_transform(mach, z)
julia> x - x_approx
5-element Vector{Float64}:
0.008224506144777322
0.012731354778359405
0.0056265330571125816
0.005738175684445124
0.006835652575801987
```

### `MLJModels.UnivariateTimeTypeToContinuous` — Type

A model type for constructing a single variable transformer that creates continuous representations of temporally typed data, based on MLJModels.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

`UnivariateTimeTypeToContinuous = @load UnivariateTimeTypeToContinuous pkg=MLJModels`

Do `model = UnivariateTimeTypeToContinuous()` to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in `UnivariateTimeTypeToContinuous(zero_time=...)`.

Use this model to convert vectors with a `TimeType` element type to vectors of `Float64` type (`Continuous` element scitype).

**Training data**

In MLJ or MLJBase, bind an instance `model` to data with

`mach = machine(model, x)`

where

- `x`: any abstract vector whose element type is a subtype of `Dates.TimeType`

Train the machine using `fit!(mach, rows=...)`.

**Hyper-parameters**

- `zero_time`: the time that is to correspond to 0.0 under transformations, with the type coinciding with the training data element type. If unspecified, the earliest time encountered in training is used.
- `step::Period=Hour(24)`: time interval to correspond to one unit under transformation

**Operations**

- `transform(mach, xnew)`: apply the encoding inferred when `mach` was fit

**Fitted parameters**

`fitted_params(mach).fitresult` is the tuple `(zero_time, step)` actually used in transformations, which may differ from the user-specified hyper-parameters.

**Example**

```
using MLJ
using Dates
x = [Date(2001, 1, 1) + Day(i) for i in 0:4]

encoder = UnivariateTimeTypeToContinuous(zero_time=Date(2000, 1, 1),
                                         step=Week(1))

mach = machine(encoder, x)
fit!(mach)

julia> transform(mach, x)
5-element Vector{Float64}:
 52.285714285714285
 52.42857142857143
 52.57142857142857
 52.714285714285715
 52.857142857142854
```
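As a sanity check (a sketch, not part of the original docstring), the first value above can be computed by hand: with `zero_time = Date(2000, 1, 1)` and `step = Week(1)`, the encoding of a date is the elapsed time divided by one week:

```
using Dates

# days from zero_time to Date(2001, 1, 1), divided by 7 days per week;
# 2000 is a leap year, so this is 366/7
Dates.value(Date(2001, 1, 1) - Date(2000, 1, 1)) / 7  # ≈ 52.2857142857
```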

## Static transformers

A *static transformer* is a model for transforming data that does not generalize to new data (does not "learn") but which nevertheless has hyperparameters. For example, the `DBSAN`

clustering model from Clustering.jl can assign labels to some collection of observations, cannot directly assign a label to some new observation.

The general user may define their own static models. The main use-case is insertion into a Linear Pipelines some parameter-dependent transformation. (If a static transformer has no hyper-parameters, it is tantamount to an ordinary function. An ordinary function can be inserted directly into a pipeline; the situation for learning networks is only slightly more complicated.

The following example defines a new model type `Averager` to perform the weighted average of two vectors (target predictions, for example). We suppose the weighting is normalized, and therefore controlled by a single hyper-parameter, `mix`.

```
mutable struct Averager <: Static
    mix::Float64
end

MLJ.transform(a::Averager, _, y1, y2) = (1 - a.mix)*y1 + a.mix*y2
```

*Important.* Note the sub-typing `<: Static`.

Such static transformers with (unlearned) parameters can have arbitrarily many inputs, but only one output. In the single input case, an `inverse_transform` can also be defined. Since they have no real learned parameters, you bind a static transformer to a machine without specifying training arguments; there is no need to `fit!` the machine:

```
mach = machine(Averager(0.5))
transform(mach, [1, 2, 3], [3, 2, 1])
```

```
3-element Vector{Float64}:
2.0
2.0
2.0
```
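To illustrate the single-input case mentioned above, here is a sketch of a static transformer that also implements an `inverse_transform`; the `Rescaler` type is hypothetical, not part of MLJ:

```
mutable struct Rescaler <: Static
    factor::Float64
end

MLJ.transform(r::Rescaler, _, x) = r.factor .* x
MLJ.inverse_transform(r::Rescaler, _, z) = z ./ r.factor

mach = machine(Rescaler(2.0))
z = transform(mach, [1.0, 2.0, 3.0])  # [2.0, 4.0, 6.0]
inverse_transform(mach, z)            # recovers [1.0, 2.0, 3.0]
```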

Let's see how we can include our `Averager` in a learning network to mix the predictions of two regressors, with one-hot encoding of the inputs. Here are two regressors for mixing, and some dummy data for testing our learning network:

```
ridge = (@load RidgeRegressor pkg=MultivariateStats)()
knn = (@load KNNRegressor)()
import Random.seed!
seed!(112)
X = (
    x1=coerce(rand("ab", 100), Multiclass),
    x2=rand(100),
)
y = X.x2 + 0.05*rand(100)
schema(X)
```

```
┌───────┬───────────────┬────────────────────────────────┐
│ names │ scitypes │ types │
├───────┼───────────────┼────────────────────────────────┤
│ x1 │ Multiclass{2} │ CategoricalValue{Char, UInt32} │
│ x2 │ Continuous │ Float64 │
└───────┴───────────────┴────────────────────────────────┘
```

And the learning network:

```
Xs = source(X)
ys = source(y)
averager = Averager(0.5)
mach0 = machine(OneHotEncoder(), Xs)
W = transform(mach0, Xs) # one-hot encode the input
mach1 = machine(ridge, W, ys)
y1 = predict(mach1, W)
mach2 = machine(knn, W, ys)
y2 = predict(mach2, W)
mach4 = machine(averager)
yhat = transform(mach4, y1, y2)
# test:
fit!(yhat)
Xnew = selectrows(X, 1:3)
yhat(Xnew)
```

```
3-element Vector{Float64}:
0.6403223210037916
0.9607694439597683
0.8159225346205365
```

We next "export" the learning network as a standalone composite model type. First we need a struct for the composite model. Since we are restricting to `Deterministic`

component regressors, the composite will also make deterministic predictions, and so gets the supertype `DeterministicNetworkComposite`

:

```
mutable struct DoubleRegressor <: DeterministicNetworkComposite
    regressor1
    regressor2
    averager
end
```

As described in Learning Networks, we next paste the learning network into a `prefit` declaration, replace the component models with symbolic placeholders, and add a learning network "interface":

```
import MLJBase

function MLJBase.prefit(composite::DoubleRegressor, verbosity, X, y)
    Xs = source(X)
    ys = source(y)
    mach0 = machine(OneHotEncoder(), Xs)
    W = transform(mach0, Xs) # one-hot encode the input
    mach1 = machine(:regressor1, W, ys)
    y1 = predict(mach1, W)
    mach2 = machine(:regressor2, W, ys)
    y2 = predict(mach2, W)
    mach4 = machine(:averager)
    yhat = transform(mach4, y1, y2)

    # learning network interface:
    (; predict=yhat)
end
```

The new model type can be evaluated like any other supervised model:

```
X, y = @load_reduced_ames;
composite = DoubleRegressor(ridge, knn, Averager(0.5))
```

```
DoubleRegressor(
regressor1 = RidgeRegressor(
lambda = 1.0,
bias = true),
regressor2 = KNNRegressor(
K = 5,
algorithm = :kdtree,
metric = Distances.Euclidean(0.0),
leafsize = 10,
reorder = true,
weights = NearestNeighborModels.Uniform()),
averager = Averager(
mix = 0.5))
```

```
composite.averager.mix = 0.25 # adjust mix from default of 0.5
evaluate(composite, X, y, measure=l1)
```

```
PerformanceEvaluation object with these fields:
model, measure, operation, measurement, per_fold,
per_observation, fitted_params_per_fold,
report_per_fold, train_test_rows, resampling, repeats
Extract:
┌──────────┬───────────┬─────────────┬─────────┬────────────────────────────────
│ measure │ operation │ measurement │ 1.96*SE │ per_fold ⋯
├──────────┼───────────┼─────────────┼─────────┼────────────────────────────────
│ LPLoss( │ predict │ 17200.0 │ 1350.0 │ [15200.0, 15800.0, 18500.0, 1 ⋯
│ p = 1) │ │ │ │ ⋯
└──────────┴───────────┴─────────────┴─────────┴────────────────────────────────
1 column omitted
```

A static transformer can also expose byproducts of the transform computation in the report of any associated machine. See Static models (models that do not generalize) for details.

## Transformers that also predict

Some clustering algorithms learn to label data by identifying a collection of "centroids" in the training data. Any new input observation is labeled with the cluster to which it is closest (this is the output of `predict`) while the vector of all distances from the centroids defines a lower-dimensional representation of the observation (the output of `transform`). In the following example, a K-means clustering algorithm assigns one of three labels 1, 2, 3 to the input features of the iris data set, and these labels are compared with the actual species recorded in the target (not seen by the algorithm).

```
import Random.seed!
seed!(123)
X, y = @load_iris;
KMeans = @load KMeans pkg=ParallelKMeans
kmeans = KMeans()
mach = machine(kmeans, X) |> fit!
# transforming:
Xsmall = transform(mach);

julia> selectrows(Xsmall, 1:4) |> pretty
┌─────────────────────┬────────────────────┬────────────────────┐
│ x1 │ x2 │ x3 │
│ Float64 │ Float64 │ Float64 │
│ Continuous │ Continuous │ Continuous │
├─────────────────────┼────────────────────┼────────────────────┤
│ 0.0215920000000267 │ 25.314260355029603 │ 11.645232464391299 │
│ 0.19199200000001326 │ 25.882721893491123 │ 11.489658693899486 │
│ 0.1699920000000077 │ 27.58656804733728 │ 12.674412792260142 │
│ 0.26919199999998966 │ 26.28656804733727 │ 11.64392098898145 │
└─────────────────────┴────────────────────┴────────────────────┘
# predicting:
yhat = predict(mach);
compare = zip(yhat, y) |> collect;
compare[1:8]
8-element Array{Tuple{CategoricalValue{Int64,UInt32},CategoricalString{UInt32}},1}:
(1, "setosa")
(1, "setosa")
(1, "setosa")
(1, "setosa")
(1, "setosa")
(1, "setosa")
(1, "setosa")
(1, "setosa")
compare[51:58]
8-element Array{Tuple{CategoricalValue{Int64,UInt32},CategoricalString{UInt32}},1}:
(2, "versicolor")
(3, "versicolor")
(2, "versicolor")
(3, "versicolor")
(3, "versicolor")
(3, "versicolor")
(3, "versicolor")
(3, "versicolor")
compare[101:108]
8-element Array{Tuple{CategoricalValue{Int64,UInt32},CategoricalString{UInt32}},1}:
(2, "virginica")
(3, "virginica")
(2, "virginica")
(2, "virginica")
(2, "virginica")
(2, "virginica")
(3, "virginica")
(2, "virginica")
```