OpenML Integration

OpenML provides an integration platform for carrying out and comparing machine learning solutions across a broad collection of public datasets and software platforms. Integration of MLJ with OpenML is a work in progress.

Loading IRIS Dataset

As an example, we load the iris dataset using OpenML.load(taskID).

using MLJ.MLJBase

Task ID

OpenML.load requires the task ID of the dataset to be loaded. This ID can be found on the OpenML website. The task ID for the iris dataset is 61, as listed on its OpenML page.

julia> rowtable = OpenML.load(61)
150-element Array{NamedTuple{(:sepallength, :sepalwidth, :petallength, :petalwidth, :class),Tuple{Float64,Float64,Float64,Float64,SubString{String}}},1}:
 (sepallength = 5.1, sepalwidth = 3.5, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa")
 (sepallength = 4.9, sepalwidth = 3.0, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa")
 (sepallength = 4.7, sepalwidth = 3.2, petallength = 1.3, petalwidth = 0.2, class = "Iris-setosa")
 (sepallength = 4.6, sepalwidth = 3.1, petallength = 1.5, petalwidth = 0.2, class = "Iris-setosa")
 (sepallength = 5.0, sepalwidth = 3.6, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa")
 (sepallength = 5.4, sepalwidth = 3.9, petallength = 1.7, petalwidth = 0.4, class = "Iris-setosa")
 (sepallength = 4.6, sepalwidth = 3.4, petallength = 1.4, petalwidth = 0.3, class = "Iris-setosa")
 (sepallength = 5.0, sepalwidth = 3.4, petallength = 1.5, petalwidth = 0.2, class = "Iris-setosa")
 (sepallength = 4.4, sepalwidth = 2.9, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa")
 (sepallength = 4.9, sepalwidth = 3.1, petallength = 1.5, petalwidth = 0.1, class = "Iris-setosa")
 ⋮
 (sepallength = 6.9, sepalwidth = 3.1, petallength = 5.1, petalwidth = 2.3, class = "Iris-virginica")
 (sepallength = 5.8, sepalwidth = 2.7, petallength = 5.1, petalwidth = 1.9, class = "Iris-virginica")
 (sepallength = 6.8, sepalwidth = 3.2, petallength = 5.9, petalwidth = 2.3, class = "Iris-virginica")
 (sepallength = 6.7, sepalwidth = 3.3, petallength = 5.7, petalwidth = 2.5, class = "Iris-virginica")
 (sepallength = 6.7, sepalwidth = 3.0, petallength = 5.2, petalwidth = 2.3, class = "Iris-virginica")
 (sepallength = 6.3, sepalwidth = 2.5, petallength = 5.0, petalwidth = 1.9, class = "Iris-virginica")
 (sepallength = 6.5, sepalwidth = 3.0, petallength = 5.2, petalwidth = 2.0, class = "Iris-virginica")
 (sepallength = 6.2, sepalwidth = 3.4, petallength = 5.4, petalwidth = 2.3, class = "Iris-virginica")
 (sepallength = 5.9, sepalwidth = 3.0, petallength = 5.1, petalwidth = 1.8, class = "Iris-virginica")
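
Because the returned object is an ordinary Vector of NamedTuples, it can be inspected with plain Julia before any conversion. The following is a minimal sketch that needs no packages and no download: the four rows are copied by hand from the output above, and the variable names (rows, petal_lengths, virginica, column_names) are illustrative only.

```julia
# A hand-built stand-in for the value returned by OpenML.load(61),
# using four rows copied from the output shown above.
rows = [
    (sepallength = 5.1, sepalwidth = 3.5, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa"),
    (sepallength = 4.9, sepalwidth = 3.0, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa"),
    (sepallength = 6.2, sepalwidth = 3.4, petallength = 5.4, petalwidth = 2.3, class = "Iris-virginica"),
    (sepallength = 5.9, sepalwidth = 3.0, petallength = 5.1, petalwidth = 1.8, class = "Iris-virginica"),
]

# Each row is a NamedTuple, so fields are read with dot syntax.
first_row = rows[1]
first_row.class

# Extract one column as a Vector with a comprehension.
petal_lengths = [r.petallength for r in rows]

# Keep only the rows of one class.
virginica = filter(r -> r.class == "Iris-virginica", rows)

# The column names are the keys of any row.
column_names = keys(first(rows))
```

The same expressions should apply unchanged to the full 150-element result, provided OpenML.load returns a row table as shown here.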

Converting to DataFrame

julia> using DataFrames

julia> df = DataFrame(rowtable)
150×5 DataFrame
 Row │ sepallength  sepalwidth  petallength  petalwidth  class
     │ Float64      Float64     Float64      Float64     SubStrin…
─────┼──────────────────────────────────────────────────────────────────
   1 │         5.1         3.5          1.4         0.2  Iris-setosa
   2 │         4.9         3.0          1.4         0.2  Iris-setosa
   3 │         4.7         3.2          1.3         0.2  Iris-setosa
   4 │         4.6         3.1          1.5         0.2  Iris-setosa
   5 │         5.0         3.6          1.4         0.2  Iris-setosa
   6 │         5.4         3.9          1.7         0.4  Iris-setosa
   7 │         4.6         3.4          1.4         0.3  Iris-setosa
   8 │         5.0         3.4          1.5         0.2  Iris-setosa
  ⋮  │      ⋮           ⋮            ⋮           ⋮             ⋮
 144 │         6.8         3.2          5.9         2.3  Iris-virginica
 145 │         6.7         3.3          5.7         2.5  Iris-virginica
 146 │         6.7         3.0          5.2         2.3  Iris-virginica
 147 │         6.3         2.5          5.0         1.9  Iris-virginica
 148 │         6.5         3.0          5.2         2.0  Iris-virginica
 149 │         6.2         3.4          5.4         2.3  Iris-virginica
 150 │         5.9         3.0          5.1         1.8  Iris-virginica
                                                        135 rows omitted

julia> df2 = coerce(df, :class=>Multiclass)
150×5 DataFrame
 Row │ sepallength  sepalwidth  petallength  petalwidth  class
     │ Float64      Float64     Float64      Float64     Cat…
─────┼──────────────────────────────────────────────────────────────────
   1 │         5.1         3.5          1.4         0.2  Iris-setosa
   2 │         4.9         3.0          1.4         0.2  Iris-setosa
   3 │         4.7         3.2          1.3         0.2  Iris-setosa
   4 │         4.6         3.1          1.5         0.2  Iris-setosa
   5 │         5.0         3.6          1.4         0.2  Iris-setosa
   6 │         5.4         3.9          1.7         0.4  Iris-setosa
   7 │         4.6         3.4          1.4         0.3  Iris-setosa
   8 │         5.0         3.4          1.5         0.2  Iris-setosa
  ⋮  │      ⋮           ⋮            ⋮           ⋮             ⋮
 144 │         6.8         3.2          5.9         2.3  Iris-virginica
 145 │         6.7         3.3          5.7         2.5  Iris-virginica
 146 │         6.7         3.0          5.2         2.3  Iris-virginica
 147 │         6.3         2.5          5.0         1.9  Iris-virginica
 148 │         6.5         3.0          5.2         2.0  Iris-virginica
 149 │         6.2         3.4          5.4         2.3  Iris-virginica
 150 │         5.9         3.0          5.1         1.8  Iris-virginica
                                                        135 rows omitted
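
Once the class column has been coerced, a common next step is to separate the target from the features. The sketch below is one possible way to do this, assuming that the installed MLJ version exports coerce, unpack and schema and that unpack accepts one column-name predicate per output; check the documentation for your version. To stay independent of the network, it rebuilds a four-row DataFrame from values shown in the output above rather than calling OpenML.load.

```julia
using MLJ          # assumed to export coerce, unpack, schema and Multiclass
using DataFrames

# Four rows copied by hand from the tables above (not a fresh download).
df = DataFrame(
    sepallength = [5.1, 4.9, 6.2, 5.9],
    sepalwidth  = [3.5, 3.0, 3.4, 3.0],
    petallength = [1.4, 1.4, 5.4, 5.1],
    petalwidth  = [0.2, 0.2, 2.3, 1.8],
    class       = ["Iris-setosa", "Iris-setosa", "Iris-virginica", "Iris-virginica"],
)

# Same coercion as in the session above.
df2 = coerce(df, :class => Multiclass)

# The first predicate picks out the target column by name;
# the second accepts every remaining column, giving the feature table.
y, X = unpack(df2, ==(:class), name -> true)

schema(X)    # inspect the scientific types of the feature columns
unique(y)    # the class labels present in this small sample
```

Passing an explicit catch-all predicate for the features is a deliberate choice here: it keeps the call unambiguous regardless of how a given version handles leftover columns.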

MLJBase.OpenML.load (Function)
OpenML.load(id)

Load the OpenML dataset with specified id, from those listed on the OpenML site.

Returns a "row table", i.e., a Vector of identically typed NamedTuples. A row table is compatible with the Tables.jl interface and can therefore be readily converted to other compatible formats. For example:

using DataFrames
rowtable = OpenML.load(61);
df = DataFrame(rowtable);
df2 = coerce(df, :class=>Multiclass)