Overview
The key goal of research data science is to learn from data. One of the most powerful methods of learning from data is statistical modelling.
We demystify the key concepts involved by applying simple models (linear and logistic regression). The intended take-home messages apply to any modelling problem.
The module is structured as follows:
The what and why of statistical modelling. We begin by defining what modelling is and motivating the power of modelling.
Fitting models. Here we go through the components of a model and describe how to fit one to data.
Building a simple model. We then carefully build a model based on our understanding of the data, taking care to understand what the model is doing.
Evaluating a model. It is not enough to have a model that is fitted to your data; the model also has to be useful. The final section covers how to evaluate your model and improve it iteratively.
References:
We will include more specific references as we move through the module. Useful, accessible introductions to modelling that have inspired much of this module’s content are Poldrack’s Statistical Thinking for the 21st Century, Holmes and Huber’s Modern Statistics for Modern Biology, the introductory sections of Richard McElreath’s wonderfully readable Statistical Rethinking, and Bishop’s classic textbook Pattern Recognition and Machine Learning.