Data Science Tutorials in Julia

Learning by doing

This website offers tutorials for MLJ.jl and related packages. On each tutorial page, you will find a link to download the raw script and the notebook corresponding to the page.

Feedback and PRs are always welcome to help make these tutorials better, from the presentation to the content.

To reproduce the environment used to generate these tutorials, follow these steps:

  1. Go to the directory of your choice: cd("/Users/[JohnDoe]/")

  2. Create a folder named, e.g., "MLJ_tutorials": mkdir("MLJ_tutorials")

  3. Download this Project.toml and this Manifest.toml into that folder.

  4. In that folder, start Julia and run:

julia> using Pkg; Pkg.activate("."); Pkg.instantiate();
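The steps above can also be collected into a single function. This is only a minimal sketch: the helper name `setup_tutorial_env` and its `name` keyword are hypothetical, not part of MLJ or Pkg, and the two TOML files must still be downloaded by hand from the links in step 3. It relies only on the `Pkg` standard library and activates the environment only once both files are present.

```julia
using Pkg

# Hypothetical helper (not part of MLJ or Pkg): creates the tutorial folder
# and, if both environment files are already there, activates and
# instantiates the environment they describe.
function setup_tutorial_env(parent::AbstractString; name::AbstractString = "MLJ_tutorials")
    dir = joinpath(parent, name)
    mkpath(dir)  # creates the folder; does nothing if it already exists

    # Project.toml and Manifest.toml must be downloaded into `dir` manually.
    ready = isfile(joinpath(dir, "Project.toml")) &&
            isfile(joinpath(dir, "Manifest.toml"))

    if ready
        Pkg.activate(dir)    # use the environment defined in this folder
        Pkg.instantiate()    # install the exact versions recorded in Manifest.toml
    else
        @warn "Project.toml and/or Manifest.toml not found; download them first" dir
    end
    return dir, ready
end
```

For example, `setup_tutorial_env(homedir())` would create `MLJ_tutorials` in your home directory and report whether the environment could be activated.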

Elementary data manipulations

If you have some programming experience but are otherwise fairly new to data processing in Julia, you may appreciate the following few tutorials before moving on. In these we provide an introduction to some of the fundamental packages in the Julia data processing universe such as DataFrames, CSV and CategoricalArrays.

Getting started with MLJ

If you are new to MLJ but are familiar with Julia and with Machine Learning, we recommend you start by going through the short Getting started examples in order:

  1. How to choose a model

  2. How to fit, predict and transform

  3. How to tune models

  4. How to ensemble models

  5. How to ensemble models (2)

  6. More on ensembles

  7. How to compose models

  8. How to build a learning network

  9. How to create models from learning networks

  10. An extended tutorial on stacking

Additionally, you can refer to the documentation for more detailed information.

Introduction to Statistical Learning with MLJ

This is a sequence of tutorials adapted from the labs associated with An Introduction to Statistical Learning, which were originally written in R. These tutorials may be useful if you want a gentle introduction to MLJ and other relevant tools in the Julia environment. If you're fairly new to Julia and ML, this is probably where you should start.

Note: the adaptation is fairly liberal, adding content where it helps highlight features specific to MLJ and removing content where it seems unnecessary. Also note that some of the things used in the ISL labs are not (yet) supported by MLJ.

  • Lab 2, a very short intro to Julia for data analysis

  • Lab 3, linear regression and metrics

  • Lab 4, classification with LDA, QDA, KNN and metrics

  • Lab 5, k-fold cross-validation

  • Lab 6b, Ridge and Lasso regression

  • Lab 8, Tree-based models

  • Lab 9, SVM (partial)

  • Lab 10, PCA and clustering (partial)

End-to-end examples with MLJ

These examples show how MLJ can be used, from loading data to producing a model. They assume familiarity with Machine Learning and MLJ.

Note that these tutorials are not meant to teach you ML or Data Science. There may be better ways to analyse the data; the primary aim is to show a quick analysis so that you can become more familiar with using MLJ.

The examples can be followed in any order; the tags can guide you as to which tutorials you may want to look at first.

  • AMES, simple, regression, one-hot, learning network, tuning, deterministic

  • Wine, simple, classification, standardizer, PCA, knn, multinomial, pipeline

  • Crabs XGB, simple, classification, xg-boost, tuning

  • Horse, simple, classification, scientific type and autotype, missing values, imputation, one-hot, tuning

  • King County Houses, simple, regression, scientific type, tuning, xg-boost

  • Airfoil, simple, regression, random forest

  • Boston LGBM, intermediate, regression, LightGBM

  • Using GLM.jl, simple, regression

  • Power Generation, simple, feature pre-processing, regression, temporal data