Stage 1: Project Design¶
In this section we will begin by looking at the first four tasks of the project lifecycle model, which are concerned with the project design stage. For each task, a description is given as well as information about the importance of the typical activities associated with the task, including issues that have an ethical significance.
Project Planning¶
Task Description¶
The project planning task encompasses the preliminary activities that are intended to help determine the aims, objectives, scope, and processes associated with the project, including an assessment of the potential risks and benefits.
The following activities are illustrative but not exhaustive:
- Resource and responsibilities evaluation and allocation: this can help a project team identify gaps in the team's skills or organisation's resources, which may require support from an external partner (e.g. procurement of services)
- Stakeholder engagement planning: identification and analysis of relevant stakeholders and affected users, with an emphasis on the inclusion of diverse or otherwise marginalised voices
- Risk and impact assessments: there are many forms of risk and impact assessment that may need to be carried out, including data protection, safety, and equality impact assessments. These initial assessments can help determine a proportionate level of risk or impact mitigation activity.
- Project plan documentation: the development and reporting of an actual project plan, which can be used to track progress and identify any issues that may arise. This can be a formal document or a more informal plan that is used to guide the project team's activities.
Importance of Task¶
- Creates a space for anticipatory and reflective activities (see AREA framework) which are necessary for a stable foundation for the project.
- Offers an opportunity for the team to agree on any "red lines" (e.g. contexts or domains in which a system should not be used, data types that are not permissible to collect or use).
- Allows the project team to set milestones and objectives that can be used throughout the project to determine whether their original goals have been achieved.
Illustrative Examples¶
Illustrative examples coming soon!
Problem Formulation¶
Task Description¶
This task involves formulating a clear statement of the overarching problem that the target system or project seeks to address (e.g. a research statement or system specification), together with a lower-level description of the computational procedure that instantiates it (e.g. a functional mapping from input to output variables and an explanation of why it is appropriate).
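One way to keep the two levels of description together is to record them in a single structure. The following is a minimal sketch only: the `ProblemFormulation` class, its field names, and the example values are all hypothetical and are not part of any prescribed method.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ProblemFormulation:
    """Hypothetical record pairing a problem statement with its computational form."""

    problem_statement: str          # the overarching problem, in plain language
    input_variables: Sequence[str]  # variables the system is permitted to use
    target_variable: str            # the output the system estimates or predicts
    justification: str              # why this mapping is appropriate for the problem

    def missing_inputs(self, record: dict) -> List[str]:
        """Return the declared input variables that are absent from a record."""
        return [name for name in self.input_variables if name not in record]


# Invented example values, for illustration only.
formulation = ProblemFormulation(
    problem_statement="Estimate how long a learner needs to finish a course",
    input_variables=("hours_per_week", "prior_modules_completed"),
    target_variable="weeks_to_completion",
    justification="Both inputs relate directly to study pace and are provided by the learner.",
)

gaps = formulation.missing_inputs({"hours_per_week": 5})
```

Writing the justification next to the mapping makes it easier to revisit the reasoning later, for example during stakeholder engagement.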
Importance of Task¶
The importance of this stage is split across the two interlocking understandings of the term "problem":
1) As a statement about a well-defined computational process (or a higher-level abstraction of the process), this task helps assess the validity and legitimacy of the project. For example, an algorithmic system that attempts to predict a candidate's 'employability' (the target variable) on the basis of a model trained on biased data from historical hiring practices will be perceived as unjust.
2) As a statement about how the system attempts to address a wider practical, social, or policy issue, this task helps the project team determine if their goal is valid and if the target system is sufficient to achieve their goal. It can also support stakeholder engagement and project communication activities.
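The concern raised in the first point, that a target variable may inherit bias from historical data, can be explored with a simple check on past outcomes. The sketch below is illustrative only: the records, group labels, and the `selection_rates` helper are invented. A gap in rates does not by itself establish bias, but it signals that the labels may encode past practice and that the formulation deserves closer scrutiny.

```python
from collections import defaultdict

# Hypothetical historical hiring records as (group, was_hired) pairs.
# These values are invented for illustration and are not real data.
historical_records = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", False), ("group_b", True), ("group_b", False), ("group_b", False),
]


def selection_rates(records):
    """Return the proportion of positive outcomes for each group."""
    hired = defaultdict(int)
    total = defaultdict(int)
    for group, was_hired in records:
        total[group] += 1
        hired[group] += int(was_hired)
    return {group: hired[group] / total[group] for group in total}


rates = selection_rates(historical_records)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.2f}")
```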
Illustrative Example¶
Illustrative examples coming soon!
Data Extraction (or Procurement)¶
Task Description¶
By data extraction we refer to a) the design of an experimental method or decisions about data gathering and collection, based on the planning and problem formulation from the previous steps, and/or b) the actual extraction and storage of novel data or the procurement of existing data.
Research Data Management
This description is situated at a high level of abstraction. As such, it encompasses many more fine-grained activities that are typically associated with the development and governance of a data pipeline (see, e.g., the Turing Way's guide on research data management).
Importance of Task¶
The well-known principle of 'garbage-in, garbage-out' summarises the importance of this task nicely.
As data-driven technologies, ML algorithms or AI systems depend on the data fed into them. However, due diligence at this stage is important for reasons other than statistical validity. Responsible data extraction is, among other reasons, vital for the design of accountable and trustworthy services, the development of safe, fair, and explainable algorithms, and the deployment of sustainable and privacy-preserving systems.
Illustrative Example¶
Illustrative examples coming soon!
Data Analysis¶
Task Description¶
Data analysis is typically split into two types: exploratory and confirmatory analysis:
- Exploratory data analysis allows analysts to better understand the structure and content of the dataset, and identify possible associations between data types and variables.
- Confirmatory data analysis evaluates the initial hypotheses developed in the previous stage, using a variety of statistical methods (e.g. significance testing).
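The distinction between the two types can be sketched in a few lines. This is a minimal illustration under stated assumptions: the measurements are invented, the helper names `summarise` and `permutation_test` are hypothetical, and a permutation test on the difference in means stands in for whichever significance test suits a given project.

```python
import random
import statistics

# Invented measurements for two groups; not real data.
group_a = [12.1, 11.8, 13.4, 12.9, 12.5, 13.0]
group_b = [10.9, 11.2, 11.7, 10.5, 11.9, 11.1]


def summarise(values):
    """Exploratory step: basic descriptive statistics for one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Confirmatory step: two-sided permutation test on the difference in means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


summary_a = summarise(group_a)
summary_b = summarise(group_b)
p_value = permutation_test(group_a, group_b)
```

The summaries suggest a hypothesis (the groups differ in their means), and the permutation test then evaluates it. Ideally the hypothesis is fixed before the confirmatory test is run, so that the same data are not used both to generate and to confirm it.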
Importance of Task¶
In the context of responsible research and innovation, data analysis is vital to the assessment of myriad biases that can negatively impact a project, many of which are most obvious at this stage.
Identifying and dealing with missing data is particularly important during this task. Although upstream stakeholder engagement activities can help mitigate the bias that missing data introduces, identifying the scope of its impact and determining how effectively it can be addressed (e.g. using various imputation methods or collecting additional data) will largely depend on the quality of the data analysis task.
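As a rough illustration of scoping and then addressing missing data, the sketch below measures the share of missing values in a column and applies mean imputation. The records, column names, and helper functions are all hypothetical. Mean imputation is only one of many methods and is used here because it is simple; it shrinks the variance of the imputed column and can hide systematic patterns in which values are missing.

```python
import statistics

# Invented records; None marks a missing value.
records = [
    {"age": 34, "income": 42000},
    {"age": None, "income": 38000},
    {"age": 29, "income": None},
    {"age": 41, "income": 51000},
]


def missing_fraction(rows, column):
    """Share of rows in which the given column is missing."""
    return sum(row[column] is None for row in rows) / len(rows)


def impute_mean(rows, column):
    """Return copies of the rows with missing values replaced by the observed mean."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = statistics.mean(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


age_missing = missing_fraction(records, "age")
imputed = impute_mean(records, "age")
```

Because `impute_mean` returns new rows rather than modifying the originals, the raw data stay available for comparing the effect of different imputation choices.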
Illustrative Example¶
Illustrative examples coming soon!