What is data justice?

‘Impacted Communities’ illustration by Johnny Lighthands, Creative Commons Attribution-ShareAlike 4.0 International

Data-intensive technologies are increasingly used across domains such as healthcare, policing, and education. Although such technological advances may offer various opportunities, a growing body of research and practice highlights how the proliferation of data-intensive technologies exacerbates longstanding social inequities, or even contributes to the generation of new ones.

As with any sociotechnical phenomenon, data-intensive technologies are neither neutral nor apolitical. They come into being through a mixture of the human values, behaviours, and decisions of their creators¹. Researchers have studied the role of social structures within and around data-intensive technologies across intersecting social dimensions, including class, race, and gender. They have shown that algorithms and systems of classification are necessarily shaped by historical patterns, such as socio-economic, racial, and gender disparities in technical professions and other manifestations of discrimination in society, and that they could reinforce such patterns of inequality².

Illustrative example: Facial recognition technologies

The ways in which data are collected, processed, and used can have significant impacts on the outcome of a system, whether it is assisting with the provision of social services or determining what videos you may want to watch based on your viewing history. If data about certain groups are scarce, incomplete, or missing, this could significantly affect the overall output of a model.

To illustrate the point, consider facial recognition technologies, which are trained to recognise the faces of individuals. In a study by Joy Buolamwini and Timnit Gebru³, commercial facial analysis classifiers performed worst on the faces of darker-skinned women, due to the underrepresentation of darker-skinned women, and of darker-skinned individuals in general, in the datasets.

Because darker-skinned women, and darker-skinned individuals in general, were underrepresented in the datasets, the classifier was more likely to fail to recognise their faces. This can lead to harms such as wrongful arrests and negative stereotyping, which could in turn reinforce historical patterns of discrimination towards these marginalised groups. In this instance, the training set (the dataset used to train the model on past patterns) was unrepresentative and therefore led to harmful impacts. This illustrates that how data are collected, and what information they contain, is critical and has real-world impacts on those for whom the outcome of the model is intended.
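One way to make this kind of disparity visible is to break evaluation results down by intersecting subgroups rather than reporting a single overall figure. The sketch below is a minimal illustration of that idea and is not the method or data of the study cited above: the records, the group labels, and the function names `subgroup_shares` and `subgroup_error_rates` are all hypothetical and invented for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy evaluation records, invented purely for illustration.
# Each record is (skin_type, gender, prediction_was_correct).
records = (
    [("lighter", "male", True)] * 40
    + [("lighter", "female", True)] * 28 + [("lighter", "female", False)] * 2
    + [("darker", "male", True)] * 18 + [("darker", "male", False)] * 2
    + [("darker", "female", True)] * 6 + [("darker", "female", False)] * 4
)


def subgroup_shares(records):
    """Return each intersectional subgroup's share of the records."""
    counts = Counter((skin, gender) for skin, gender, _ in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def subgroup_error_rates(records):
    """Return the error rate for each intersectional subgroup."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for skin, gender, correct in records:
        totals[(skin, gender)] += 1
        if not correct:
            errors[(skin, gender)] += 1
    return {group: errors[group] / totals[group] for group in totals}


# The overall error rate averages over everyone and can hide subgroup gaps.
overall_error = sum(1 for _, _, correct in records if not correct) / len(records)

shares = subgroup_shares(records)
rates = subgroup_error_rates(records)
worst_group = max(rates, key=rates.get)

print(f"overall error rate: {overall_error:.2f}")
for group in sorted(rates):
    print(f"{group}: share={shares[group]:.2f}, error rate={rates[group]:.2f}")
print(f"highest error rate: {worst_group}")
```

In this made-up data, the smallest subgroup also has the highest error rate, yet the overall figure looks low because that subgroup contributes few records to the average. Reporting results per subgroup, alongside each subgroup's share of the data, is one simple way to surface such gaps.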

In response to such injustices reflected in data, there has been a growing movement of researchers, practitioners, and civil society groups seeking to address, challenge, and reimagine current practices of datafication. Data justice has emerged as a framework to characterise the multifaceted efforts to identify and enact ethical paths to social justice in an increasingly datafied world⁴.

For a quick recap of the emerging movement of data justice, take a look at the short infographic video below.

¹ Winner, L. (1980). Do artifacts have politics? Daedalus, 109(1), 121–136.
² Eubanks, V. (2018). Automating inequality: How high-tech tools profile, police, and punish the poor. St. Martin’s Press; Benjamin, R. (2019). Race after technology: Abolitionist tools for the new Jim code. Polity Books; D’Ignazio, C., & Klein, L. F. (2020). Data feminism. MIT Press.
³ Buolamwini, J., & Gebru, T. (2018, January). Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency (pp. 77–91). PMLR. https://proceedings.mlr.press/v81/buolamwini18a.html
⁴ Taylor, L. (2017). What is data justice? The case for connecting digital rights and freedoms globally. Big Data & Society, July–December, 1–14. https://doi.org/10.1177/2053951717736335