
What is the AI Project Lifecycle?

There are many ways of carving up the lifecycle for a data science or AI project. For instance, Sweenor (2020) breaks it into four stages: Build, Manage, Deploy & Integrate, and Monitor.1 Ashmore et al. (2019) also identify four stages, with a more specific focus on data science: data management, model learning, model verification, and model deployment.

The multiplicity of approaches is likely a product of the evolution of diverse methods in data mining/analytics, the significant impact of ML on research and innovation, and the specific practices and considerations inherent to each of the various domains where ML techniques are applied. While existing frameworks offer many benefits, they have not been designed with a responsible and ethical approach to governance in mind.

The following figure, therefore, presents a model of a typical lifecycle for a project involving data science or the production of an ML/AI system. We have designed this model to support the ethical governance and responsible regulation of AI/ML, while remaining faithful to the technical processes. However, it is important to note that the model is a heuristic device, and, therefore, is not intended to be a perfect representation of all ML or AI projects.


The lifecycle model comprises the following activities:

  1. Project Planning: preliminary activities designed to help scope out the aims, objectives, and processes involved with the project, including potential risks and benefits.
  2. Problem Formulation: the formulation of a clear statement about the over-arching problem the system or project addresses (e.g. a research statement or system specification) and a lower-level description of the computational procedure that instantiates it.
  3. Data Extraction & Procurement: the design of an experimental method, or decisions about data gathering and collection, based on the planning and problem formulation from the previous steps.
  4. Data Analysis: stages of exploratory and confirmatory data analysis designed to help researchers or developers identify relevant associations between input variables and target variables.
  5. Preprocessing & Feature Engineering: a process of cleaning, normalising, and refactoring data into the features that will be used in model training and testing, as well as the features that may be used in the final system.
  6. Model Selection & Training: the selection of a particular algorithm (or multiple algorithms) for training the model.
  7. Model Testing & Validation: testing the model against a variety of metrics, which may include those that assess how accurate a model is for different sub-groups of a population. This is important where issues of fairness or equality may arise.
  8. Model Reporting: a process of documenting both the formal and non-formal properties of the model and the processes by which it was developed (e.g. source of data, algorithms used, evaluation metrics).
  9. System Implementation: the process of putting a model into production, and implementing the operational system that enables and structures interaction with the model within the respective environment (e.g. a recommender system that converts a user's existing movie ratings into recommendations for future watches).
  10. User Training: training for those individuals or groups who are either required to operate a data-driven system (perhaps in a safety-critical context) or who are likely to use the system (e.g. consumers).
  11. System Use & Monitoring: ongoing monitoring of, and feedback from, the system, either automated or probed, to ensure that issues such as model drift have not affected performance or resulted in harms to individuals or groups.
  12. Model Updating & Deprovisioning: an algorithmic model that adapts its behaviour over time or context may require updating or deprovisioning (i.e. removing from the production environment).
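To make a few of these activities concrete, the following is a minimal sketch of preprocessing, model training, and sub-group validation, assuming scikit-learn and an entirely hypothetical toy dataset (the feature names, group labels, and outcome rule are illustrative only and not part of the lifecycle model itself):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two input features, a binary target, and a
# sub-group label (e.g. a demographic attribute relevant to fairness).
rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "income": rng.normal(30_000, 10_000, n),
    "group": rng.choice(["A", "B"], n),
})
df["outcome"] = (df["income"] + rng.normal(0, 5_000, n) > 30_000).astype(int)

# Preprocessing & Feature Engineering and Model Selection & Training:
# scaling is folded into a pipeline so it is learned on training data only.
X_train, X_test, y_train, y_test, g_train, g_test = train_test_split(
    df[["age", "income"]], df["outcome"], df["group"], random_state=0
)
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Model Testing & Validation: disaggregate the evaluation, since overall
# accuracy can mask poor performance for a particular sub-group.
print(f"overall accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")
for group, idx in X_test.groupby(g_test).groups.items():
    acc = accuracy_score(y_test.loc[idx], model.predict(X_test.loc[idx]))
    print(f"group {group} accuracy: {acc:.3f}")
```

In a real project, these checks would extend beyond accuracy to whichever metrics were chosen during problem formulation, and the disaggregated results would feed into model reporting.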

To begin, the inner circle breaks the project lifecycle into three processes:

  1. (Project) Design
  2. (Model) Development
  3. (System) Deployment

These terms are intended to be maximally inclusive. For example, the design stage encompasses any project task or decision-making process that scaffolds or sets constraints on later project stages (i.e. activities that establish constraints on the system's design).

Each of the stages shades into its neighbours, as there is no clear boundary that differentiates certain project design activities (e.g. data extraction and exploratory analysis) from model design activities (e.g. preprocessing and feature engineering, model selection). As such, the design stage overlaps with the development stage, but the latter extends to include the actual process of training, testing, and validating an ML model. Similarly, the process of productionalising a model within a system can be thought of as both a development and a deployment activity. The deployment stage therefore overlaps with the ‘development’ stage, and it also overlaps with the ‘design’ stage, because the deployment of a system should be thought of as an ongoing process: new data may be used to continuously train the ML model, and the decision to deprovision a model may require the planning and design of a new model if the older (legacy) system becomes outdated.
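As an illustration of deployment as an ongoing process, here is a minimal sketch of an automated check for input drift, assuming SciPy's two-sample Kolmogorov–Smirnov test; the feature values, significance threshold, and `drift_detected` helper are all hypothetical, and a production system would monitor many more signals (performance, fairness, data quality):

```python
import numpy as np
from scipy.stats import ks_2samp

def drift_detected(reference: np.ndarray, live: np.ndarray,
                   alpha: float = 0.01) -> bool:
    """Flag drift when live inputs differ significantly from the
    training-time (reference) distribution of a single feature."""
    result = ks_2samp(reference, live)
    return result.pvalue < alpha

rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, 5_000)  # feature values at training time
live = rng.normal(0.5, 1.0, 1_000)       # feature values seen in production

if drift_detected(reference, live):
    # In practice, a flag like this would trigger a review, retraining on
    # new data, or ultimately a decision to update or deprovision the model.
    print("Input drift detected: schedule a model review.")
```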

Despite the unidirectional nature of the arrows, we also acknowledge that ML/AI research and innovation is frequently an iterative process. Therefore, the singular direction is only present at a macroscopic level of abstraction (i.e., the overall direction of progress for a project), and allows for some inevitable back and forth between the stages at the microscopic level.

The three higher-level stages can be thought of as a useful heuristic for approaching the project lifecycle. However, each higher-level stage subsumes a wide variety of tasks and activities that are likely to be carried out by different individuals, teams, and organisations, depending on their specific roles and responsibilities (e.g. procurement of data). Therefore, it is important to break each of the three higher-level stages into their (typical) constituent parts, which are likely to vary to some extent between specific projects or within particular organisations.

The following pages explore each of these stages in detail.


  1. These four stages are influenced by an ‘MLOps’ perspective. The term ‘MLOps’ refers to the application of DevOps practices to ML pipelines. The term is often used in an inclusive manner to incorporate traditional statistical or data science practices that support the ML lifecycle, but are not themselves constitutive of machine learning (e.g. exploratory data analysis), as well as deployment practices that are important within business and operational contexts (e.g. monitoring key performance indicators).