Responsible Public Engagement in Data Science and AI
Quote
The public have to live with the consequences of science and technology. As such, RRI is above all "an opportunity for truly collective stewardship of our highly technologized future." (Sebastian Pfotenhauer quoted in Pain, 2017)[@pain2017]
The above quote helps to motivate the need to consider our responsibility, as researchers, for the societal impact of science and technology. The practical tools and methods we will explore in subsequent chapters will help us meet our various obligations. However, before we shift to these practical considerations, there is one final question to consider:
Question
What does it mean to conduct responsible public engagement in data science and AI?
Responsible research and innovation is increasingly important in science and technology. As a term, 'responsible research and innovation' (RRI) is most strongly associated with the European Commission's Framework Programmes for Research and Technological Development, a set of funding programmes that support research in the European Union. Beginning under the seventh framework programme (FP7) around 2010, and continuing through Horizon 2020 (FP8), the term became central to the European Commission's policy.
Since then, other national funding bodies have also shown a commitment to RRI. For example, UKRI's Engineering and Physical Sciences Research Council have developed the AREA framework, which sets out four principles for RRI: Anticipate, Reflect, Engage, and Act (AREA).
AREA Principles
Anticipate - Describe and analyse the impacts, intended or otherwise, that might arise. Do not seek to predict but rather support the exploration of possible impacts (such as economic, social and environmental) and implications that may otherwise remain uncovered and little discussed.
Reflect - Reflect on the purposes of, motivations for and potential implications of the research, together with the associated uncertainties, areas of ignorance, assumptions, framings, questions, dilemmas and social transformations these may bring.
Engage - Open up such visions, impacts and questioning to broader deliberation, dialogue, engagement and debate in an inclusive way.
Act - Use these processes to influence the direction and trajectory of the research and innovation process itself.
However, research and innovation in data science and AI differs from traditional scientific research and technological innovation in many ways. For example, the following considerations relate specifically to public engagement:
- Effective public communication requires sufficient levels of data and digital literacy (e.g. understanding of statistics, data visualisation)
- Conceptual knowledge and understanding of relevant concepts such as 'autonomous and adaptive behaviour', 'intelligence', or 'algorithmic system' may vary widely, such that different stakeholders may not be talking about the same thing
- Participants will hold differing opinions on key topics, such as data privacy and protection, which could give rise to challenging attitudes such as algorithmic aversion.[^1]
- Values and norms concerning the impact of novel data-driven technologies are still in flux, as they continue to affect different areas of society. Therefore, public attitudes may also vary, perhaps drastically, over time.
As such, legitimate doubts can be raised about how universally these principles apply, and about whether they extend to considerations of public engagement.
In response to these doubts, a justifiable argument could be made that some principles of RRI are more directly relevant to the applied sciences than to more theoretical disciplines (e.g. fundamental physics).
Two comments can be made:
- We should be cautious about accepting such a belief too quickly, given the many examples in the history of science where advances in a discipline led to unintended consequences that even experts could not foresee.
- Although some principles of RRI may apply more strongly in certain areas of application or research, there are also principles (e.g. the FAIR principles) that support responsible public engagement and ought to apply universally, such as research integrity and the promotion of equality, diversity, and inclusivity among researchers.
Question
How else does public engagement in data science and AI differ from public engagement in traditional scientific research or technological innovation?
The Burden of Responsibility
Exercising responsibility can be demanding. In the context of public engagement for research, it is made more demanding by several barriers:
- High levels of competition for research funding
- Secrecy among project teams
- Temporary (fixed-term) contracts for researchers
- Time pressures
Although some of the tools and methods we will consider in later chapters can help alleviate some of these burdens, it would be naive to pretend that a wider institutional change in attitudes and incentive structures is not also necessary.[^2]
On top of these barriers, public engagement in data science and AI is not always recognised by specialists as valuable for furthering one's scientific career. However, consider the following quotation:
Quote
The lack of diverse jobs after a PhD, a shrinking funding budget and lack of public support all seem daunting to graduate students. In the face of these major hurdles, many scientists isolate themselves from these issues and focus on their own research. A recent study has shown, however, that increased interaction with reporters and the more Twitter mentions a study receives correlate with a higher h-index of the author, a metric for measuring the scientific impact of a publication. To determine whether media coverage of a published science article is causative of increased citations, a 1991 study looked at journal articles that would have been covered by the New York Times, but due to a writer’s strike in the late 1970s, were not. Researchers found that the journal articles that were not covered had consistently less citations than other research articles covered by the Times. There is clear value for the science research community to publicise new findings to the public.[@pham2016]
While this is only one form of engagement (i.e. communication), it serves as a pragmatic reason for pursuing public engagement. However, there are also ethical reasons for doing so, in spite of the aforementioned barriers.
With these considerations in mind, let us now return to the original question, 'what does it mean to conduct responsible public engagement in data science and AI?' Although some of the principles alluded to above may also apply to responsible engagement in data science and AI (e.g. AREA principles), there are more specific principles that can help us with planning and delivering public engagement activities. We call these the SAFE-D principles.
SAFE-D Principles
We cover the SAFE-D principles in greater detail in our course on responsible research and innovation. Here, it will suffice to summarise them and explain how they can support responsible public engagement through some illustrative examples. We will return to some of these examples in later activities.
In general, the SAFE-D principles are designed to support responsible project governance in data science and AI research and innovation. Insofar as public engagement forms a key part of this research and innovation, the SAFE-D principles serve as useful ethical reflection points to help design responsible public engagement activities.
Sustainability
Sustainability can mean a couple of things. From a technical perspective, sustainability requires the outputs of a project to be safe, secure, robust, and reliable. For example, if an organisation is developing an autonomous vehicle, it should operate safely in the intended context of use. However, in the context of responsible data science and AI, there is also a social component. This aspect of sustainability requires a project's practices and outputs to be informed by a) ongoing consideration of the risks of harm to individuals and society, even after the system has been deployed and the project completed (a long-term, or sustainable, form of safety), and b) the shared values that motivate the pursuit of social goals.
The final point in the above summary is significant for public engagement, and connects with the following question:
Question
Who Designs the Future?
As society staggers forward under the ongoing effects of the global pandemic, while struggling with the increasing complexity and uncertainty of a modern, data-driven world, it becomes increasingly difficult for particular voices to participate in imagining a collective vision for the future. This can lead to growing distrust in vital social institutions as people become alienated from society.
In response, organisations like Nesta in the UK are carrying out public engagement activities focused on building "participatory futures", which aim to include marginalised or systematically excluded voices in these conversations. Public engagement activities and events such as these can play a vital role in supporting sustainable research and innovation.
Accountability
In contrast to responsibility, accountability is often seen as a backwards-looking process of holding an individual or organisation to account. It is commonly associated with regulation and governance, but in the context of data science or AI research and innovation, it goes beyond these matters.
In this context, 'accountability' refers to the transparency of processes and associated outcomes that enable people to understand how a project was conducted (e.g., project documentation), or why a specific decision was reached. But it can also refer to broader processes of responsible project governance that seek to establish clear roles of responsibility where full transparency may be inappropriate (e.g., confidential projects).
Without this broader, end-to-end awareness of accountability throughout a project's lifecycle, harms could arise that leave specific individuals unable to seek redress for their grievances or for violations of certain rights.
Responsible public engagement, therefore, can serve as a way to ensure that a project and its associated outputs remain sufficiently accountable, by ensuring that all affected stakeholders are able to voice potential concerns. The following quote is a nice reflection point for the values contained within this principle:
Quote
Sometimes we want everyone’s voice to be heard because we think that will make a better decision as a result, and sometimes we want everyone’s voice to be heard simply because we think that everyone has a right to be heard.[@macgilvray2014]
Fairness
Consider the following statistic:
Quote
15 per cent of scientists come from working class backgrounds; and in the US, children from the top 1 per cent of richest families (by income) are ten times as likely to have filed for a patent as those from families in the bottom half of the income distribution.[@saunders2018]
While much has been done in the past decade to improve access to science and technology, there is, of course, much more to be done to ensure that all individuals have an equal opportunity to participate in such a core part of society. This is, ultimately, a matter of fairness.
Fairness is inseparably connected with legal conceptions of equality and justice. In the context of data science or AI, this often leads to an emphasis on features such as non-discrimination, equitable outcomes of automated decisions, or procedural fairness through bias mitigation. However, these notions capture only a subset of the broader ethical considerations pertaining to social justice, socioeconomic capabilities, and diversity and inclusivity.
As the above quote acknowledges, fairness is also a matter of addressing the socioeconomic barriers that prevent or make it difficult for certain individuals or communities to participate. Responsible public engagement, therefore, can help researchers or developers identify relevant barriers and work with affected people to co-design ways to improve capabilities.
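To see how narrow the statistical notions mentioned above can be, consider a minimal sketch of one common fairness metric, demographic parity, computed with pandas. The data, group labels, and column names here are hypothetical, and passing such a check says nothing about the wider socioeconomic barriers to participation just discussed.

```python
import pandas as pd

# Hypothetical outcomes of an automated screening decision for two
# demographic groups; the column names and values are illustrative only.
df = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "selected": [1,   1,   0,   1,   0,   1,   0,   0],
})

# Selection rate per group: an estimate of P(selected | group).
rates = df.groupby("group")["selected"].mean()
print(rates)  # A: 0.75, B: 0.25

# Demographic parity difference: the gap between the highest and
# lowest group selection rates (0 would indicate parity).
print(f"Parity gap: {rates.max() - rates.min():.2f}")  # 0.50
```

A single number like this can prompt useful discussion in an engagement activity, but it is one formal metric among many, and it cannot capture questions of capability or inclusion on its own.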
Explainability
Explainability has received widespread attention in recent years from many within the AI community. This is because it is a key requirement for autonomous and informed decision-making in situations where data-driven systems interact with or influence human judgement and choice behaviour. For example, if an autonomous vehicle is involved in an accident, or if an algorithmic system fails to recognise a fault in vital energy infrastructure, the reasons for these unexpected and undesirable behaviours need to be identified.
As such, there needs to be a) interpretability of the system itself (e.g., an ability to make sense of the underlying logic by which a system reaches a decision), and b) an ability to translate this information into a variety of explanations that are acceptable to relevant users and stakeholders. On this second point, public engagement can support the evaluation of both the accessibility and the usability of different explanations, as well as providing input on possible trade-offs between, say, the interpretability and accuracy of possible system types (e.g., decision trees versus neural networks).[^3]
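To make the decision-tree-versus-neural-network contrast concrete, here is a minimal scikit-learn sketch. The dataset and hyperparameters are illustrative assumptions only: a shallow tree's decision logic can be printed as human-readable rules, whereas a neural network's internal logic cannot be read off in the same way.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.neural_network import MLPClassifier

# An illustrative dataset bundled with scikit-learn.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A shallow decision tree: its logic can be printed as readable rules.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X_train, y_train)
print(export_text(tree, feature_names=list(X.columns)))
print("Tree accuracy:", tree.score(X_test, y_test))

# A small neural network: often more flexible, but opaque by comparison.
# (A real pipeline would also standardise the features first.)
mlp = MLPClassifier(max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
print("MLP accuracy:", mlp.score(X_test, y_test))
```

In an engagement setting, the printed rules could be shown directly to participants, whereas the neural network's behaviour would require a separate, post-hoc explanation method.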
Data Quality, Integrity, Privacy and Protection
Data quality, integrity, protection and privacy must all be established before we can be confident that a research or innovation project has been designed, developed, and deployed in a responsible manner.
- ‘Data Quality’ captures the static properties of data, such as whether they are a) relevant to and representative of the domain and use context, b) balanced and complete in terms of how well the dataset represents the underlying data-generating process, and c) up-to-date and accurate as required by the project (a minimal sketch of such checks follows this list).
- ‘Data Integrity’ refers to more dynamic properties of data stewardship, such as how a dataset evolves over the course of a project lifecycle. In this manner, data integrity requires a) contemporaneous and attributable records from the start of a project (e.g., process logs; research statements), b) ensuring consistent and verifiable means of data analysis or processing during development, and c) taking steps to establish findable, accessible, interoperable, and reusable records towards the end of a project’s lifecycle.
- ‘Data protection and privacy’ reflect ongoing developments and priorities set out in relevant legislation and regulation of data practices, as these pertain to fundamental rights and freedoms, democracy, and the rule of law. One example is the right of data subjects to have inaccurate personal data rectified or erased.
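As a concrete illustration of the 'static' data quality checks in the first bullet (and the duplicate-record aspect of integrity), here is a minimal pandas sketch. The file name, column names, and checks are hypothetical assumptions, not a standard.

```python
import pandas as pd

# Hypothetical dataset of engagement participants; the file and the
# column names ('group', 'collected_at') are illustrative only.
df = pd.read_csv("participants.csv")

report = {
    # Completeness: proportion of missing values per column.
    "missing_rate": df.isna().mean().round(3).to_dict(),
    # Balance: how well each group is represented in the sample.
    "group_counts": df["group"].value_counts().to_dict(),
    # Timeliness: date of the most recent record.
    "latest_record": pd.to_datetime(df["collected_at"]).max(),
    # Integrity: exact duplicate rows that may signal processing errors.
    "duplicate_rows": int(df.duplicated().sum()),
}
print(report)
```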
The data protection and privacy requirements are, typically, more technical than the other properties (e.g., requiring legal expertise to determine whether a project violates any data protection laws). However, public engagement can help determine whether a project that is legal is also acceptable from a moral perspective.
For example, if a research project uses a basis of informed consent to collect and process personal data, are the participants able to opt out at a later stage, and are there processes that make this decision easy and accessible for them? Whether this is the case will, invariably, be a matter of working with participants to determine.
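One way to make an opt-out easy to honour in code is to treat consent as a first-class record that downstream processing must check. The following is a minimal sketch under that assumption; the class and field names are hypothetical, and this is not a template for legal compliance.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ConsentRecord:
    """Hypothetical consent record; field names are illustrative."""
    participant_id: str
    consented_at: datetime
    withdrawn_at: Optional[datetime] = None

    def withdraw(self) -> None:
        # Record the opt-out; downstream jobs should then exclude
        # (or erase) this participant's personal data.
        self.withdrawn_at = datetime.now(timezone.utc)

    @property
    def active(self) -> bool:
        return self.withdrawn_at is None

# Usage: restrict a processing run to participants with active consent.
records = [ConsentRecord("p-001", datetime.now(timezone.utc))]
records[0].withdraw()
print([r.participant_id for r in records if r.active])  # []
```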
[^1]: Algorithmic aversion refers to the reluctance of human agents to incorporate algorithmic tools as part of their decision-making processes due to misaligned expectations of the algorithm’s performance (see Burton et al. 2020).

[^2]: There are increasing signs of reform in the funding landscape, however, with more funding bodies favouring applications that demonstrate commitments to responsible research and innovation or include early career researchers as co-investigators.

[^3]: For more on the explainability/accuracy trade-off, see our module on explainability.