Although one can call predict_mode on a probabilistic binary classifier to get deterministic predictions, a more flexible strategy is to wrap the model using BinaryThresholdPredictor, as this allows the user to specify the threshold probability for predicting a positive class. This wrapping converts a probabilistic classifier into a deterministic one.
The positive class is always the second class returned when calling levels on the training target.
BinaryThresholdPredictor wraps a Probabilistic model, assumed to support binary classification, as a Deterministic model, by applying the specified threshold to the positive class probability. In addition to conventional supervised classifiers, it can also be applied to outlier detection models that predict normalized scores in the form of appropriate UnivariateFinite distributions, that is, models that subtype the probabilistic detector model types.
By convention the positive class is the second class returned by levels(y), where y is the target.
If threshold=0.5 then calling predict on the wrapped model is equivalent to calling predict_mode on the atomic model.
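The thresholding rule can be sketched in plain Julia, without MLJ. The helper names threshold_predict and mode_predict are hypothetical, the probabilities are made up for illustration, and assigning a probability exactly equal to the threshold to the positive class is an assumption, not a statement about the library's tie handling.

```julia
# Minimal sketch of thresholding a binary probabilistic prediction.
# `classes` plays the role of levels(y): the second entry is the positive class.
classes = ["neg", "pos"]

# Hypothetical helper: predict the positive class when its probability
# reaches the threshold (behavior at exact equality is an assumption).
threshold_predict(p_pos, threshold) = p_pos >= threshold ? classes[2] : classes[1]

# Hypothetical helper mimicking a mode prediction: pick the more probable class.
mode_predict(p_pos) = p_pos > 1 - p_pos ? classes[2] : classes[1]

p = [0.10, 0.45, 0.55, 0.70, 0.95]   # made-up positive-class probabilities

at_half = threshold_predict.(p, 0.5)
at_mode = mode_predict.(p)
at_high = threshold_predict.(p, 0.8)

println(at_half == at_mode)  # the two rules agree here (no probability equals 0.5)
println(at_high)             # a higher threshold yields fewer positive predictions
```

Raising the threshold from 0.5 to 0.8 leaves only the last observation predicted as positive, which is the extra control that the mode rule alone does not offer.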
Below is an application to the well-known Pima Indian diabetes dataset, including optimization of the threshold parameter, with high balanced accuracy as the objective. The target class distribution is 268 positives to 500 negatives.
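To see why balanced accuracy is a sensible objective for an imbalanced target, consider a trivial classifier that always predicts the majority class. The sketch below is plain Julia with hypothetical names; only the class counts (500 and 268) come from the text. The adjusted score is assumed to rescale balanced accuracy so that chance-level performance maps to zero, which for two classes is 2b - 1.

```julia
# Plain-Julia sketch; `recall` is a hypothetical helper, not an MLJ function.
# Class counts as quoted in the text: 500 observations in one class, 268 in the other.
y    = vcat(fill(0, 500), fill(1, 268))
yhat = fill(0, length(y))            # always predict the majority class

# Fraction of observations of class `c` that are predicted as `c`.
recall(yhat, y, c) = count((yhat .== c) .& (y .== c)) / count(y .== c)

plain_accuracy    = count(yhat .== y) / length(y)
balanced_accuracy = (recall(yhat, y, 0) + recall(yhat, y, 1)) / 2

# Assumed rescaling for two classes: chance level maps to zero.
adjusted = 2 * balanced_accuracy - 1

println(round(plain_accuracy, digits=3))  # about 0.651, despite learning nothing
println(balanced_accuracy)                # 0.5
println(adjusted)                         # 0.0
```

Plain accuracy rewards the trivial classifier for the class imbalance, while the balanced score does not.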
Loading the data:
    using MLJ, Random
    rng = Xoshiro(123)
    diabetes = OpenML.load(43582)
    outcome, X = unpack(diabetes, ==(:Outcome), rng=rng);
    y = coerce(Int.(outcome), OrderedFactor);
Choosing a probabilistic classifier:
    EvoTreesClassifier = @load EvoTreesClassifier
    prob_predictor = EvoTreesClassifier()
Wrapping in BinaryThresholdPredictor to get a deterministic classifier with threshold as a new hyperparameter:
    point_predictor = BinaryThresholdPredictor(prob_predictor, threshold=0.6)
    Xnew, _ = make_moons(3, rng=rng)
    mach = machine(point_predictor, X, y) |> fit!
    predict(mach, X)[1:3] # [0, 0, 0]
Estimating performance:
    balanced = BalancedAccuracy(adjusted=true)
    e = evaluate!(mach, resampling=CV(nfolds=6), measure=[balanced, accuracy])
    e.measurement[1] # 0.405 ± 0.089
Wrapping in a tuning strategy to learn the threshold that maximizes balanced accuracy:
    r = range(point_predictor, :threshold, lower=0.1, upper=0.9)
    tuned_point_predictor = TunedModel(
        point_predictor,
        tuning=RandomSearch(rng=rng),
        resampling=CV(nfolds=6),
        range=r,
        measure=balanced,
        n=30,
    )
    mach2 = machine(tuned_point_predictor, X, y) |> fit!
    optimized_point_predictor = report(mach2).best_model
    optimized_point_predictor.threshold # 0.260
    predict(mach2, X)[1:3] # [1, 1, 0]
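Conceptually, tuning the threshold amounts to scoring candidate thresholds on held-out predictions and keeping the best one. The sketch below does this by exhaustive search over a small grid in plain Julia. It is not how TunedModel or RandomSearch are implemented, and the probabilities, labels, and helper name are made up for illustration.

```julia
# Made-up held-out positive-class probabilities and true labels (1 = positive).
p = [0.05, 0.15, 0.22, 0.28, 0.35, 0.38, 0.42, 0.65, 0.81, 0.93]
y = [0,    0,    0,    1,    0,    0,    1,    1,    1,    1]

# Hypothetical helper: balanced accuracy of the thresholded predictions.
function balanced_accuracy(p, y, threshold)
    yhat = Int.(p .>= threshold)
    tpr = count((yhat .== 1) .& (y .== 1)) / count(y .== 1)  # true positive rate
    tnr = count((yhat .== 0) .& (y .== 0)) / count(y .== 0)  # true negative rate
    return (tpr + tnr) / 2
end

# Candidate thresholds, written out literally to avoid range rounding effects.
thresholds = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
scores = [balanced_accuracy(p, y, t) for t in thresholds]

best_score, best_index = findmax(scores)
best_threshold = thresholds[best_index]

println(best_threshold)  # 0.4 for this made-up data
```

In this toy data the best threshold falls below 0.5, because lowering it recovers a positive case without misclassifying any negatives.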
Estimating the performance of the auto-thresholding model (nested resampling here):
    e = evaluate!(mach2, resampling=CV(nfolds=6), measure=[balanced, accuracy])
    e.measurement[1] # 0.477 ± 0.110