4. Hypothesis Testing & Software Development

Objectives

  • Understand the concepts and processes of hypothesis testing.

  • Use hypothesis testing to evaluate and interpret the performance of machine learning models.

  • Understand the software development process and reproducibility.

  • Learn to use GitHub for version control and collaboration.

Expected time to complete: 3 hours

Machine learning is a powerful tool for decision making and scientific discovery. However, the results of machine learning models are often less straightforward to interpret than they seem, so we need to be careful when using them to make decisions or draw conclusions. For example, a model may achieve high accuracy without being useful: on a dataset where 95% of the examples belong to one class, a model that always predicts that class reaches 95% accuracy without having learned anything.

We typically develop software to implement a machine learning model. Different software development processes can lead to different results and costs, so we need to be aware of the process we follow and of the reproducibility of the results it produces. Moreover, software development today is often collaborative: we need to learn how to work with others while keeping the software reproducible and maintainable.

In this chapter, we will learn about the concepts and processes of hypothesis testing and how to use hypothesis testing to evaluate and interpret the performance of machine learning models. We will also learn about the software development process and reproducibility, and how to use GitHub for version control and collaboration.

Process transparency: hypothesis testing

  • Starting point: One or more groups of data from which to make a decision or draw a conclusion

  • Define the null hypothesis and the alternative hypothesis

  • Compute a test statistic to summarise the strength of evidence against the null hypothesis

  • Compute a \(p\)-value: the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one computed

  • Decide whether to reject the null hypothesis or not based on a chosen significance level

  • End point: Report the decision or conclusion
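
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed method: the per-fold accuracies of the two models are made-up numbers, the function name `paired_permutation_test` is hypothetical, and a sign-flip permutation test is just one of several tests that could be used to compare paired scores. The null hypothesis is that the two models perform equally well on average; the alternative is that they do not.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired score differences.

    Returns the observed mean difference (the test statistic) and an
    estimated p-value.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores_a and scores_b must have the same length")

    rng = random.Random(seed)  # fixed seed so the result is reproducible
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = sum(diffs) / len(diffs)

    # Under the null hypothesis the sign of each paired difference is
    # arbitrary, so we flip signs at random and count how often the
    # resampled statistic is at least as extreme as the observed one.
    extreme = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical accuracies of two models on the same 10 cross-validation folds.
model_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.82, 0.78, 0.85, 0.81, 0.80]
model_b = [0.78, 0.77, 0.80, 0.79, 0.80, 0.79, 0.77, 0.81, 0.78, 0.78]

alpha = 0.05  # significance level, chosen before looking at the data
observed, p_value = paired_permutation_test(model_a, model_b)
reject_null = p_value < alpha

print(f"mean difference = {observed:.3f}, p-value = {p_value:.4f}")
print("reject the null hypothesis" if reject_null else "fail to reject the null hypothesis")
```

The fixed seed makes the resampling repeatable, so the reported decision can be reproduced by anyone who reruns the script.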

Process transparency: software development

  • Starting point: A problem to solve

  • Initiate the project with rationale, scope, and vision

  • Define the project’s objectives and requirements

  • Build the software

  • Test and evaluate the software in a controlled environment

  • Deploy the software in a production environment

  • End point: Deployed software working as expected
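
As a small illustration of the "test and evaluate in a controlled environment" step, the sketch below checks that a piece of machine learning code is reproducible. Everything here is an assumption made for the example: `split_indices` is a hypothetical helper, and the checks are plain assertions that a test runner could also collect. The point is only that fixing the random seed lets a test verify that two runs give the same result.

```python
import random


def split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle sample indices with a fixed seed and split them into
    (train, test) index lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)  # local generator: no hidden global state
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def test_split_is_reproducible():
    # Same seed, same inputs: the split must be identical across runs.
    assert split_indices(100, seed=42) == split_indices(100, seed=42)


def test_split_uses_every_sample_once():
    train, test = split_indices(100, test_fraction=0.2, seed=7)
    assert len(test) == 20
    assert sorted(train + test) == list(range(100))


# Run the checks directly; a test runner would discover them by name instead.
test_split_is_reproducible()
test_split_uses_every_sample_once()
print("all checks passed")
```

Keeping such checks under version control alongside the code means every collaborator can rerun them before changes are deployed.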