# Handling categorical data

This tutorial loosely follows the CategoricalArrays.jl documentation.

## Defining a categorical vector

```julia
using CategoricalArrays

v = categorical(["AA", "BB", "CC", "AA", "BB", "CC"])
```
```
6-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
"AA"
"BB"
"CC"
"AA"
"BB"
"CC"
```

This declares a categorical vector, i.e. a Vector whose entries are expected to represent a group or category. You can retrieve the group labels using `levels`:

```julia
levels(v)
```
```
3-element Array{String,1}:
"AA"
"BB"
"CC"
```

which, by default, returns the labels in lexicographic order.
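As a small illustration of that default, the sketch below builds a vector whose labels do not appear in sorted order. The variable name `w` is only illustrative and does not come from the tutorial.

```julia
using CategoricalArrays

# The labels first appear in the order "CC", "AA", "BB" in the data ...
w = categorical(["CC", "AA", "BB", "CC"])

# ... but `levels` returns each distinct label once, in sorted order.
levels(w)           # ["AA", "BB", "CC"]

# The number of categories is the number of levels, not the number of entries.
length(levels(w))   # 3
```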

## Working with categoricals

### Ordered categoricals

You can declare the categories as ordered by passing `ordered=true`; the order then follows that of the levels. To change that order, use the `levels!` function. Let's see two examples.

```julia
v = categorical([1, 2, 3, 1, 2, 3, 1, 2, 3], ordered=true)

levels(v)
```
```
3-element Array{Int64,1}:
1
2
3
```

Here the default (sorted) order of the levels matches what we want, so there is no need to change it. Since we've specified that the categories are ordered, we can compare entries:

```julia
v[1] < v[2]
```
```
true
```
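To see why `ordered=true` matters, here is a hedged sketch of what happens without it. It assumes that `<` on entries of an unordered vector raises an `ArgumentError`, and that `isordered` and `ordered!` are available in your version of CategoricalArrays. The name `u` is illustrative.

```julia
using CategoricalArrays

u = categorical([1, 2, 3, 1, 2, 3])   # unordered by default
isordered(u)                          # false

# Comparing entries of an unordered vector with `<` is expected to fail;
# we capture the error instead of letting it stop the script.
err = try
    u[1] < u[2]
    nothing
catch e
    e
end

# Marking the vector as ordered makes the comparison meaningful.
ordered!(u, true)
isordered(u)    # true
u[1] < u[2]     # true
```

Switching an existing vector with `ordered!` avoids rebuilding it when the ordering is only needed later on.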

Let's now consider another example:

```julia
v = categorical(["high", "med", "low", "high", "med", "low"], ordered=true)

levels(v)
```
```
3-element Array{String,1}:
"high"
"low"
"med"
```

The levels follow the lexicographic order, which is not what we want: `"high"` (the first entry) is considered smaller than `"med"` (the second entry):

```julia
v[1] < v[2]
```
```
true
```

To re-specify the order, we use `levels!`:

```julia
levels!(v, ["low", "med", "high"])
```
```
6-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
"high"
"med"
"low"
"high"
"med"
"low"
```

Now things are properly ordered:

```julia
v[1] < v[2]
```
```
false
```
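The effect of reordering the levels can also be checked on the whole vector at once. The sketch below assumes a CategoricalArrays version that exports `levelcode`; the names `s` and `sorted` are illustrative.

```julia
using CategoricalArrays

s = categorical(["high", "med", "low", "high", "med", "low"], ordered=true)
levels!(s, ["low", "med", "high"])

# Position of each entry's level in `levels(s)`: low = 1, med = 2, high = 3.
levelcode.(s)       # [3, 2, 1, 3, 2, 1]

# Sorting and extrema follow the level order, not the alphabetical order.
sorted = sort(s)    # low, low, med, med, high, high
maximum(s)          # "high"
```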

### Missing values

You can also have a categorical vector with missing values:

```julia
v = categorical(["AA", "BB", missing, "AA", "BB", "CC"]);
```

The missing entry does not change the levels:

```julia
levels(v)
```
```
3-element Array{String,1}:
"AA"
"BB"
"CC"
```