# Lab 2

Download the notebook, the raw script, or the annotated script for this tutorial (right-click on the link and save).

## Basic commands

This is a brief, rough primer for readers who are new to Julia and want to see how to do simple things that are relevant for data analysis.

Defining a vector

``````x = [1, 3, 2, 5]
@show x
@show length(x)``````
``````x = [1, 3, 2, 5]
length(x) = 4
``````

Operations between vectors

``````y = [4, 5, 6, 1]
z = x .+ y # elementwise operation``````
``````4-element Array{Int64,1}:
5
8
8
6``````
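As a small sketch of what the dot does (the names `a`, `b`, `s`, `p` and `d` are illustrative and not part of the tutorial): for two vectors of equal length, `+` and `.+` give the same result, but an elementwise product needs the dotted `.*`, since a plain `a * b` between two vectors raises a `MethodError`. The dot also broadcasts a scalar over a vector.

```julia
a = [1, 3, 2, 5]
b = [4, 5, 6, 1]

s = a .+ b    # elementwise sum
p = a .* b    # elementwise product
d = a .+ 10   # the scalar 10 is broadcast over every element of a

@show s       # s = [5, 8, 8, 6]
@show p       # p = [4, 15, 12, 5]
@show d       # d = [11, 13, 12, 15]
```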

Defining a matrix

``X = [1  2; 3 4]``
``````2×2 Array{Int64,2}:
1  2
3  4``````

You can also build a matrix from a vector with `reshape`

``X = reshape([1, 2, 3, 4], 2, 2)``
``````2×2 Array{Int64,2}:
1  3
2  4``````

Be careful, though: `reshape` fills the matrix column by column, so to get the same result as before you need to permute the dimensions

``X = permutedims(reshape([1, 2, 3, 4], 2, 2))``
``````2×2 Array{Int64,2}:
1  2
3  4``````
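The column-by-column fill is easier to see on a non-square example. The sketch below uses an illustrative vector `v` that is not part of the tutorial. Note that to fill a 2×3 matrix row by row, the vector is first reshaped to 3×2 and then permuted.

```julia
v = [1, 2, 3, 4, 5, 6]

A = reshape(v, 2, 3)                # filled column by column
B = permutedims(reshape(v, 3, 2))   # 2×3 matrix filled row by row

@show A             # A = [1 3 5; 2 4 6]
@show B             # B = [1 2 3; 4 5 6]
@show vec(A) == v   # vec flattens in the same column order, so this is true
```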

Function calls can be chained with the `|>` (pipe) operator, which passes the value on its left as the argument to the function on its right, so the above can also be written

``X = reshape([1,2,3,4], 2, 2) |> permutedims``
``````2×2 Array{Int64,2}:
1  2
3  4``````

You don't have to use it, of course, but we will sometimes do so in these tutorials.
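As a minimal sketch of how a pipe relates to a nested call (the variable names are illustrative): `|>` expects a one-argument function on its right, so a step that needs extra arguments or a dot call is wrapped in an anonymous function. The anonymous function is parenthesized here so that the chain reads unambiguously from left to right.

```julia
nested = sum(sqrt.([1, 4, 9, 16]))                # ordinary nested calls
piped  = [1, 4, 9, 16] |> (v -> sqrt.(v)) |> sum  # same computation, read left to right

@show nested   # nested = 10.0
@show piped    # piped = 10.0

# The broadcast form .|> applies the function to each element instead
roots = [1, 4, 9] .|> sqrt
@show roots    # roots = [1.0, 2.0, 3.0]
```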

There's a wealth of functions available for simple math operations

``````x = 4
@show x^2
@show sqrt(x)``````
``````x ^ 2 = 16
sqrt(x) = 2.0
``````

Elementwise operations on a collection can be done with the dot syntax:

``sqrt.([4, 9, 16])``
``````3-element Array{Float64,1}:
2.0
3.0
4.0``````
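The dot syntax is not limited to built-in functions. The sketch below assumes a hypothetical scalar function `f` defined only for illustration, and also shows the `@.` macro, which adds a dot to every call and operator in an expression.

```julia
f(t) = t^2 + 1   # hypothetical scalar function, for illustration only
v = [1, 2, 3]

a = f.(v)                    # apply f to each element
b = @. sqrt(v^2 + 2v + 1)    # same as sqrt.(v .^ 2 .+ 2 .* v .+ 1)

@show a   # a = [2, 5, 10]
@show b   # b = [2.0, 3.0, 4.0]
```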

The packages `Statistics` (from the standard library) and `StatsBase` offer a number of useful functions for statistics:

``using Statistics, StatsBase``

Note that if you don't have `StatsBase`, you can add it using `using Pkg; Pkg.add("StatsBase")`. Right, let's now compute some simple statistics:

``````x = randn(1_000) # 1_000 points iid from a N(0, 1)
μ = mean(x)
σ = std(x)
@show (μ, σ)``````
``````(μ, σ) = (-0.023363181706442294, 0.9757686582990799)
``````
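Because `randn` gives different numbers on every run, a small fixed dataset makes the functions easier to check by hand. The sketch below uses only the `Statistics` standard library and a made-up vector `data`. One detail worth knowing is that `std` and `var` divide by `n - 1` by default, and `corrected=false` switches to dividing by `n`.

```julia
using Statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up values

@show mean(data)                   # mean(data) = 5.0
@show median(data)                 # median(data) = 4.5
@show std(data; corrected=false)   # divides by n, gives 2.0
@show var(data) ≈ 32 / 7           # default divides by n - 1, gives true
```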

Indexing starts at 1; use `:` to select the full range along a dimension

``````X = [1 2; 3 4; 5 6]
@show X[1, 2]
@show X[:, 1]
@show X[1, :]
@show X[[1, 2], [1, 2]]``````
``````X[1, 2] = 2
X[:, 1] = [1, 3, 5]
X[1, :] = [1, 2]
X[[1, 2], [1, 2]] = [1 2; 3 4]
``````
`size` gives the dimensions as a tuple `(nrows, ncolumns)`

``size(X)``
``(3, 2)``
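A few more indexing forms that often come up, sketched on an illustrative matrix `M` (not part of the tutorial): `end` refers to the last index along a dimension, ranges select contiguous blocks, and a Boolean mask selects the elements that satisfy a condition, returned in column order.

```julia
M = [1 2; 3 4; 5 6]

last_row = M[end, :]   # `end` is the last index along that dimension
top      = M[1:2, :]   # rows 1 to 2, all columns
big      = M[M .> 3]   # elements greater than 3, in column order

nrows, ncols = size(M) # the tuple can be unpacked directly

@show last_row   # last_row = [5, 6]
@show top        # top = [1 2; 3 4]
@show big        # big = [5, 4, 6]
@show size(M, 1) # size(M, 1) = 3
```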

There are many ways to load data in Julia; a convenient one is the `CSV` package.

``using CSV``

Many datasets are available via the `RDatasets` package

``using RDatasets``

Finally, the `DataFrames` package makes it easy to manipulate tabular data

``using DataFrames``
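To show how `CSV` and `DataFrames` fit together, here is a minimal sketch that parses CSV text into a `DataFrame`. The data and the name `csv_text` are made up for illustration. An in-memory `IOBuffer` stands in for a file so the example is self-contained; with a real file you would pass its path instead.

```julia
using CSV, DataFrames

# Made-up data, held in memory instead of a file
csv_text = """
name,mpg,cylinders
car_a,18.0,8
car_b,15.0,8
car_c,24.5,4
"""

df = CSV.read(IOBuffer(csv_text), DataFrame)

@show size(df)    # size(df) = (3, 3)
@show names(df)   # names(df) = ["name", "mpg", "cylinders"]
```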

Let's load some data from RDatasets (the full list of datasets is available here).

``````auto = dataset("ISLR", "Auto")
first(auto, 3)``````
``````3×9 DataFrame
│ Row │ MPG     │ Cylinders │ Displacement │ Horsepower │ Weight  │ Acceleration │ Year    │ Origin  │ Name                      │
│     │ Float64 │ Float64   │ Float64      │ Float64    │ Float64 │ Float64      │ Float64 │ Float64 │ String                    │
├─────┼─────────┼───────────┼──────────────┼────────────┼─────────┼──────────────┼─────────┼─────────┼───────────────────────────┤
│ 1   │ 18.0    │ 8.0       │ 307.0        │ 130.0      │ 3504.0  │ 12.0         │ 70.0    │ 1.0     │ chevrolet chevelle malibu │
│ 2   │ 15.0    │ 8.0       │ 350.0        │ 165.0      │ 3693.0  │ 11.5         │ 70.0    │ 1.0     │ buick skylark 320         │
│ 3   │ 18.0    │ 8.0       │ 318.0        │ 150.0      │ 3436.0  │ 11.0         │ 70.0    │ 1.0     │ plymouth satellite        │``````

The `describe` function gives a quick summary of the data:

``describe(auto, :mean, :median, :std)``
``````9×4 DataFrame
│ Row │ variable     │ mean    │ median │ std      │
│     │ Symbol       │ Union…  │ Union… │ Union…   │
├─────┼──────────────┼─────────┼────────┼──────────┤
│ 1   │ MPG          │ 23.4459 │ 22.75  │ 7.80501  │
│ 2   │ Cylinders    │ 5.47194 │ 4.0    │ 1.70578  │
│ 3   │ Displacement │ 194.412 │ 151.0  │ 104.644  │
│ 4   │ Horsepower   │ 104.469 │ 93.5   │ 38.4912  │
│ 5   │ Weight       │ 2977.58 │ 2803.5 │ 849.403  │
│ 6   │ Acceleration │ 15.5413 │ 15.5   │ 2.75886  │
│ 7   │ Year         │ 75.9796 │ 76.0   │ 3.68374  │
│ 8   │ Origin       │ 1.57653 │ 1.0    │ 0.805518 │
│ 9   │ Name         │         │        │          │``````

To retrieve column names, you can use `names`:

``names(auto)``
``````9-element Array{String,1}:
"MPG"
"Cylinders"
"Displacement"
"Horsepower"
"Weight"
"Acceleration"
"Year"
"Origin"
"Name"``````

Accessing columns can be done in different ways:

``````mpg = auto.MPG
mpg = auto[:, 1]
mpg = auto[:, :MPG]
mpg |> mean``````
``23.44591836734694``
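These access forms differ in whether they copy the column. The sketch below uses a small made-up `DataFrame` rather than the `Auto` dataset: `df.MPG` and `df[!, :MPG]` return the stored column itself, while `df[:, :MPG]` returns a copy, so modifying the copy leaves the table unchanged.

```julia
using DataFrames

# Made-up table, for illustration only
df = DataFrame(MPG = [18.0, 15.0, 18.0], Cylinders = [8, 8, 8])

a = df.MPG        # the stored column itself (no copy)
b = df[!, :MPG]   # same column, also without copying
c = df[:, :MPG]   # an independent copy

a[1] = 99.0       # modifies the DataFrame through `a`

@show df.MPG[1]   # df.MPG[1] = 99.0
@show c[1]        # c[1] = 18.0
@show a === b     # a === b = true
```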

To get the dimensions, you can use `size`, `nrow` and `ncol`

``````@show size(auto)
@show nrow(auto)
@show ncol(auto)``````
``````size(auto) = (392, 9)
nrow(auto) = 392
ncol(auto) = 9
``````

For more detailed tutorials on basic data wrangling in Julia, consider

## Plotting data

There are multiple libraries that can be used to plot things in Julia:

In these tutorials we use `PyPlot` but you could use another package of course.

``````using PyPlot

figure(figsize=(8,6))
plot(mpg)``````
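A plot is easier to read with labelled axes. The following is a minimal sketch, assuming `PyPlot` and a working Matplotlib installation; the values in `vals`, the labels and the output file name are placeholders chosen for illustration, not part of the tutorial.

```julia
using PyPlot

vals = [18.0, 15.0, 18.0, 16.0, 17.0]   # placeholder values standing in for a data column

figure(figsize=(8, 6))
plot(vals, marker="o")
xlabel("Row index")
ylabel("MPG")
title("MPG by row (placeholder data)")

# Save the figure to a temporary location (hypothetical file name)
outfile = joinpath(tempdir(), "mpg_sketch.png")
savefig(outfile)
```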