OpenML Integration

OpenML provides an integration platform for carrying out and comparing machine learning solutions across a broad collection of public datasets and software platforms. Integration of MLJ with OpenML is a work in progress.

Loading the Iris Dataset

As an example, we load the iris dataset using OpenML.load(taskID).

using MLJ.MLJBase

Task ID

OpenML.load requires the ID of the dataset to be loaded (referred to here as the task ID). This ID can be found on the OpenML website. The ID for the iris dataset is 61, as listed on its OpenML page.

julia> rowtable = OpenML.load(61)
150-element Array{NamedTuple{(:sepallength, :sepalwidth, :petallength, :petalwidth, :class),Tuple{Float64,Float64,Float64,Float64,SubString{String}}},1}:
 (sepallength = 5.1, sepalwidth = 3.5, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa")
 (sepallength = 4.9, sepalwidth = 3.0, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa")
 (sepallength = 4.7, sepalwidth = 3.2, petallength = 1.3, petalwidth = 0.2, class = "Iris-setosa")
 (sepallength = 4.6, sepalwidth = 3.1, petallength = 1.5, petalwidth = 0.2, class = "Iris-setosa")
 (sepallength = 5.0, sepalwidth = 3.6, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa")
 (sepallength = 5.4, sepalwidth = 3.9, petallength = 1.7, petalwidth = 0.4, class = "Iris-setosa")
 (sepallength = 4.6, sepalwidth = 3.4, petallength = 1.4, petalwidth = 0.3, class = "Iris-setosa")
 (sepallength = 5.0, sepalwidth = 3.4, petallength = 1.5, petalwidth = 0.2, class = "Iris-setosa")
 (sepallength = 4.4, sepalwidth = 2.9, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa")
 (sepallength = 4.9, sepalwidth = 3.1, petallength = 1.5, petalwidth = 0.1, class = "Iris-setosa")
 ⋮
 (sepallength = 6.9, sepalwidth = 3.1, petallength = 5.1, petalwidth = 2.3, class = "Iris-virginica")
 (sepallength = 5.8, sepalwidth = 2.7, petallength = 5.1, petalwidth = 1.9, class = "Iris-virginica")
 (sepallength = 6.8, sepalwidth = 3.2, petallength = 5.9, petalwidth = 2.3, class = "Iris-virginica")
 (sepallength = 6.7, sepalwidth = 3.3, petallength = 5.7, petalwidth = 2.5, class = "Iris-virginica")
 (sepallength = 6.7, sepalwidth = 3.0, petallength = 5.2, petalwidth = 2.3, class = "Iris-virginica")
 (sepallength = 6.3, sepalwidth = 2.5, petallength = 5.0, petalwidth = 1.9, class = "Iris-virginica")
 (sepallength = 6.5, sepalwidth = 3.0, petallength = 5.2, petalwidth = 2.0, class = "Iris-virginica")
 (sepallength = 6.2, sepalwidth = 3.4, petallength = 5.4, petalwidth = 2.3, class = "Iris-virginica")
 (sepallength = 5.9, sepalwidth = 3.0, petallength = 5.1, petalwidth = 1.8, class = "Iris-virginica")

Converting to DataFrame

julia> using DataFrames

julia> df = DataFrame(rowtable)
150×5 DataFrame
│ Row │ sepallength │ sepalwidth │ petallength │ petalwidth │ class          │
│     │ Float64     │ Float64    │ Float64     │ Float64    │ SubString…     │
├─────┼─────────────┼────────────┼─────────────┼────────────┼────────────────┤
│ 1   │ 5.1         │ 3.5        │ 1.4         │ 0.2        │ Iris-setosa    │
│ 2   │ 4.9         │ 3.0        │ 1.4         │ 0.2        │ Iris-setosa    │
│ 3   │ 4.7         │ 3.2        │ 1.3         │ 0.2        │ Iris-setosa    │
│ 4   │ 4.6         │ 3.1        │ 1.5         │ 0.2        │ Iris-setosa    │
│ 5   │ 5.0         │ 3.6        │ 1.4         │ 0.2        │ Iris-setosa    │
│ 6   │ 5.4         │ 3.9        │ 1.7         │ 0.4        │ Iris-setosa    │
│ 7   │ 4.6         │ 3.4        │ 1.4         │ 0.3        │ Iris-setosa    │
⋮
│ 143 │ 5.8         │ 2.7        │ 5.1         │ 1.9        │ Iris-virginica │
│ 144 │ 6.8         │ 3.2        │ 5.9         │ 2.3        │ Iris-virginica │
│ 145 │ 6.7         │ 3.3        │ 5.7         │ 2.5        │ Iris-virginica │
│ 146 │ 6.7         │ 3.0        │ 5.2         │ 2.3        │ Iris-virginica │
│ 147 │ 6.3         │ 2.5        │ 5.0         │ 1.9        │ Iris-virginica │
│ 148 │ 6.5         │ 3.0        │ 5.2         │ 2.0        │ Iris-virginica │
│ 149 │ 6.2         │ 3.4        │ 5.4         │ 2.3        │ Iris-virginica │
│ 150 │ 5.9         │ 3.0        │ 5.1         │ 1.8        │ Iris-virginica │

julia> df2 = coerce(df, :class=>Multiclass)
150×5 DataFrame
│ Row │ sepallength │ sepalwidth │ petallength │ petalwidth │ class          │
│     │ Float64     │ Float64    │ Float64     │ Float64    │ Categorical…   │
├─────┼─────────────┼────────────┼─────────────┼────────────┼────────────────┤
│ 1   │ 5.1         │ 3.5        │ 1.4         │ 0.2        │ Iris-setosa    │
│ 2   │ 4.9         │ 3.0        │ 1.4         │ 0.2        │ Iris-setosa    │
│ 3   │ 4.7         │ 3.2        │ 1.3         │ 0.2        │ Iris-setosa    │
│ 4   │ 4.6         │ 3.1        │ 1.5         │ 0.2        │ Iris-setosa    │
│ 5   │ 5.0         │ 3.6        │ 1.4         │ 0.2        │ Iris-setosa    │
│ 6   │ 5.4         │ 3.9        │ 1.7         │ 0.4        │ Iris-setosa    │
│ 7   │ 4.6         │ 3.4        │ 1.4         │ 0.3        │ Iris-setosa    │
⋮
│ 143 │ 5.8         │ 2.7        │ 5.1         │ 1.9        │ Iris-virginica │
│ 144 │ 6.8         │ 3.2        │ 5.9         │ 2.3        │ Iris-virginica │
│ 145 │ 6.7         │ 3.3        │ 5.7         │ 2.5        │ Iris-virginica │
│ 146 │ 6.7         │ 3.0        │ 5.2         │ 2.3        │ Iris-virginica │
│ 147 │ 6.3         │ 2.5        │ 5.0         │ 1.9        │ Iris-virginica │
│ 148 │ 6.5         │ 3.0        │ 5.2         │ 2.0        │ Iris-virginica │
│ 149 │ 6.2         │ 3.4        │ 5.4         │ 2.3        │ Iris-virginica │
│ 150 │ 5.9         │ 3.0        │ 5.1         │ 1.8        │ Iris-virginica │

MLJBase.OpenML.load - Function

OpenML.load(id)

Load the OpenML dataset with the specified id, from those listed on the OpenML site.

Returns a "row table", i.e., a Vector of identically typed NamedTuples. A row table is compatible with the Tables.jl interface and can therefore be readily converted to other compatible formats. For example:

using DataFrames
rowtable = OpenML.load(61);
df = DataFrame(rowtable);
df2 = coerce(df, :class=>Multiclass)
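
To make the "row table" idea concrete without a network call, the following minimal sketch builds a small row table by hand and inspects it through the Tables.jl interface. It assumes the Tables.jl package is installed; the three rows are copied from the iris output shown earlier (two columns are omitted for brevity), and the variable names are illustrative only.

```julia
using Tables  # assumption: Tables.jl is available in the current environment

# A hand-built row table: a Vector of identically typed NamedTuples.
# Values are rows 1, 2 and 150 of the iris output, restricted to three columns.
rowtable = [
    (sepallength = 5.1, sepalwidth = 3.5, class = "Iris-setosa"),
    (sepallength = 4.9, sepalwidth = 3.0, class = "Iris-setosa"),
    (sepallength = 5.9, sepalwidth = 3.0, class = "Iris-virginica"),
]

# A Vector of NamedTuples is recognised as a table by Tables.jl.
Tables.istable(rowtable)        # true

# The schema exposes the column names (and element types).
Tables.schema(rowtable).names   # (:sepallength, :sepalwidth, :class)

# Convert to a column table: a NamedTuple of column vectors.
coltable = Tables.columntable(rowtable)
coltable.sepallength            # [5.1, 4.9, 5.9]
```

Because the row table satisfies the Tables.jl interface, any Tables.jl-compatible sink, such as the DataFrame constructor used above, should accept it in the same way.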