As artificial intelligence becomes intricately interwoven with our daily experiences, influencing both personal and societal decisions, mere transparency in such AI systems falls short of being the ultimate solution to their inherent value-ladenness, writes Rune Nyrup.
AI is the future. Or so it would appear from countless press releases and strategies emanating from the tech industry and governments alike. And indeed, AI-driven technology has become ubiquitous, whether in the form of applications like ChatGPT and DALL-E that sit visibly in our palms generating text and images, or algorithms that work discreetly to personalise adverts, estimate our credit scores and much else besides, based on data streams we are scarcely aware of. Regardless of what we call these technologies, increasing digitalisation has defined and reshaped our individual and collective lives, and will continue to do so.
Technological change is always value-laden. That is, how we design and implement technologies will always promote specific social and political values, whether unintentionally or by design. This has long been argued by philosophers of technology such as Langdon Winner, who in his classic essay “Do Artefacts Have Politics?” explored how building low overpass bridges across the roads in a certain neighbourhood makes the area inaccessible to buses. The bridges themselves embody a preference for private motorists over public transport, which in turn amounts to a preference against those who rely on public transport, such as poorer or disabled people, whose marginalisation correlates with other dimensions, including race, ethnicity, age and class.
While this point is clear enough in the case of physical infrastructure, it applies no less to the digital infrastructures that are increasingly embedded into our lives. As Jenny Davis puts it, digital technologies variously demand, refuse, request, allow, encourage or discourage certain kinds of actions from different kinds of people. For instance, social media doesn’t just provide us with a means to communicate: implicit in the very design of these sites are embodiments of philosophies about what a person is and how we should value and relate to one another. Artificial intelligence is no exception either. How we design and deploy AI will inevitably involve value-laden choices regarding what it should be possible and convenient to do, and for whom.
How, then, should we manage the value-ladenness of AI? One solution commonly proposed by policymakers, NGOs and private companies discussing the ethical challenges of AI is to focus on transparency. Making explicit how AI systems function and the value choices that go into their design, so the idea goes, will enable people to make informed choices about how to use AI, allow regulators to hold developers accountable, and facilitate public debate about how these technologies should be deployed.
This is an attractive line of thought, grounded in widely professed ideals of personal autonomy and democratic deliberation. But while it is indeed important to make AI more transparent, this in itself will not resolve the issue of value-ladenness. For one thing, transparency alone does not enable autonomous choice or democratic deliberation. It requires a critical audience, as Jakko Kemper and Daan Kolkman put it: technologically literate consumers or investigative journalists who are able to appreciate the significance of the information that is made transparent. More fundamentally, autonomous choice assumes that we have meaningfully different options available to us in the first place. If all lenders employ practically indistinguishable forms of data-driven credit surveillance, making this fact transparent does not in itself empower consumers. As Hao Wang has recently shown, transparency under such circumstances instead becomes an expression of power, merely dictating to consumers norms they will have to comply with.
These limitations to transparency as a strategy for managing value-ladenness arise for any technology. However, as I have recently argued, when it comes to the type of complex AI systems that are increasingly being developed, the problems go much deeper.
To see this, we first have to look closer at modern AI. Almost all AI today is based, in some way or another, on machine learning. Briefly put, machine learning refers to computer programmes that use data to automatically improve their ability to perform a given task, and many do this through trial-and-error optimisation. The programme tests the performance of an initial model (i.e., a set of decision rules) on a body of training data. Based on the results, the model is then adjusted, usually by changing some of its numerical parameters. The process is repeated until no further improvement is achieved.
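To make this concrete, the trial-and-error loop just described can be sketched in a few lines of Python. This is a toy illustration with invented data and an invented task (recovering the slope of a line), not a depiction of any real system:

```python
import random

# Toy training data for an invented task: inputs x with targets y = 2 * x,
# so a "perfect" model would be y = 2 * x.
data = [(x, 2 * x) for x in range(10)]

def score(w):
    """Operationalised 'performance': mean squared error of the model
    y = w * x on the training data (lower is better)."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

# Trial-and-error optimisation: start from an initial parameter, propose
# small random adjustments, and keep any adjustment that improves the score.
w = 0.0
best = score(w)
for _ in range(10_000):
    candidate = w + random.uniform(-0.1, 0.1)
    if score(candidate) < best:
        w = candidate
        best = score(candidate)
```

The two ingredients that carry value-laden assumptions, the training data and the scoring function, appear here as `data` and `score`.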
To automate this process, it must be programmed in a mathematically precise way. Crucially, this includes a precise operationalisation of how to score the programme’s “performance”. That is, there must be some unambiguous process through which the programme can calculate how “well” it did on the task. A game-playing AI might measure how often it wins vs. loses a specific set of video games. A language model might measure how often it correctly predicts the next word in a dataset scraped from the internet. More complicated procedures are also possible, for example, to give different weight to different kinds of errors or incorporate human feedback on whether the system did the right thing. But no matter how the training process is designed, it will necessarily embody value-laden assumptions about what counts as correct “performance”. Is the point of playing games to always win? Is good language-use represented by what people write on the internet? What kinds of errors are most important to avoid? Who gets to decide and provide feedback on what counts as the “right thing”?
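As a toy illustration of how such weighting choices enter the scoring procedure, consider a hypothetical rule for a binary classifier in which false negatives are penalised five times as heavily as false positives. The function name and the weights are invented for illustration; the point is that the numbers themselves encode a judgement about which mistakes matter more:

```python
# A hypothetical scoring rule for a binary classifier. The weights (5.0 and
# 1.0) are invented here, but any choice of weights is a value judgement.
def weighted_error(predictions, labels, fn_weight=5.0, fp_weight=1.0):
    """Average cost, penalising false negatives more heavily than false positives."""
    cost = 0.0
    for pred, label in zip(predictions, labels):
        if pred == 0 and label == 1:    # false negative: a missed positive case
            cost += fn_weight
        elif pred == 1 and label == 0:  # false positive: a spurious alarm
            cost += fp_weight
    return cost / len(labels)

# Two classifiers that each make exactly one mistake rank very differently
# once the weighting is applied.
labels = [1, 1, 0, 0]
miss_one = weighted_error([0, 1, 0, 0], labels)   # one false negative
alarm_one = weighted_error([1, 1, 1, 0], labels)  # one false positive
```

Under an unweighted error count the two classifiers would tie; the weighting, a design choice, is what ranks one above the other.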
Making these kinds of assumptions explicit and accessible will certainly be an important step towards making AI more transparent. This could for example take the form of clearly documenting what datasets were used in training, how performance was defined and measured, with explicit justifications for those choices. Even if this information may not be useful to the average user, it could be used by regulators, auditors and researchers to evaluate and highlight some of the value-laden assumptions embedded within the system to the broader public.
However, for sufficiently complex AI systems, this type of transparency may not be enough. This is due to one of the fundamental motivations behind machine learning, namely its ability to discover novel solutions. By optimising highly complex models, sometimes containing millions or billions of adjustable parameters, on sufficiently large datasets, machine learning enthusiasts hope to discover complex decision rules for solving problems that would otherwise have been out of reach of human inquiry. It is exactly this ability to surprise us, to find unexpected solutions, that gives rise to the phenomenon Victoria Krakovna calls specification gaming.
Specification gaming occurs when a machine learning system finds an unexpected and effective solution to the mathematically precise specification of the problem it has been programmed to solve, but one which intuitively fails to capture what the designers intended. For example, a game-playing AI may discover a bug in a video game that allows it to crash the game just before a loss is registered. A language model may learn to generate non-existent citations because these make a text seem more plausible. These are simple illustrations, which seem obvious once pointed out. However, a complex machine learning system will also be able to find much subtler ways to “misinterpret” the problem specification. The greater the system’s ability to discover unanticipated solutions to the problem the designers intended, the greater the risk that it will instead discover unanticipated but unintended solutions. The system cannot tell the two apart, as it has no access to the designers’ intentions beyond the mathematical problem specification it has been programmed to solve. And if the system has been designed to find solutions that go beyond existing human knowledge, the designers may not be able to tell them apart either, especially not when the impacts are subtle value-laden shifts of the kind discussed earlier, rather than a single catastrophic event.
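The game-playing example can be caricatured in a few lines of Python. Here an exhaustive search over three invented strategies stands in for the learning process; the names and numbers are made up, and the only point is that the strategy maximising the precise specification (“no loss registered”) differs from the one maximising the designers’ intent (“games won”):

```python
# Toy caricature of specification gaming. The designers intend "win games",
# but the precise specification only rewards "no loss registered".
WIN_PROB = {"play_well": 0.4, "play_badly": 0.1, "crash_game": 0.0}

def specified_score(strategy):
    """What the programme actually optimises: the fraction of games that are
    not registered as losses. Crashing before a loss is recorded counts."""
    if strategy == "crash_game":
        return 1.0  # the game never records a loss
    return WIN_PROB[strategy]

def intended_score(strategy):
    """What the designers meant: the fraction of games actually won."""
    return WIN_PROB[strategy]

# An exhaustive search over strategies stands in for the learning process.
best_by_spec = max(WIN_PROB, key=specified_score)
best_by_intent = max(WIN_PROB, key=intended_score)
```

The optimiser only ever sees `specified_score`; `intended_score` exists nowhere in its world, which is precisely why it cannot tell intended from unintended solutions apart.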
The problem of specification gaming presents a major challenge to transparency. If not even designers will be able to recognise unintended solutions, making design choices and assumptions explicit to external regulators, auditors or researchers is unlikely to help.
When the values embedded in AI systems misalign with societal values, or when they amplify existing biases, they can perpetuate inequalities, diminish trust in institutions, and even cause direct harm. Understanding the value-ladenness of AI is thus essential not just for tech enthusiasts or ethicists but for anyone who interacts with modern digital platforms and services, which, in today’s world, is virtually everyone. Perhaps the problem can be solved. Researchers across disciplines are working on better ways to validate problem specifications, make the behaviour of AI systems more predictable, and anticipate their impacts. Until such efforts bear fruit, however, transparency should not be regarded as any kind of catch-all solution to managing the value-ladenness of AI.