4. Risks from causing illegitimate value change (performative predictors)
Unaligned AI systems may cause illegitimate value change. At the heart of this risk lies the observation that the malleability inherent to human values can be exploited in ways that make the resulting value change illegitimate. Recall that I take illegitimacy to follow from a lack of, or (significant) impediment to, a person’s ability to self-determine and course-correct a value-change process.
Mechanisms causing illegitimate value change
Instantiations of this risk can already be observed today, for example in the case of recommender systems. It is worth spending a bit of time understanding this example before considering what lessons it can teach us about risks from advanced AI systems more generally. To this end, I will draw on work by Hardt et al. (2022), which introduces the notion of ‘performative power’. Performative power is a quantitative measure of ‘the ability of a firm operating an algorithmic system, such as a digital content recommendation platform, to cause change in a population of participants’ (p. 1). The higher the performative power of a firm, the greater its ability to ‘benefit from steering the population towards more profitable [for the firm] behaviour’ (p. 1). In other words, performative power measures the ability of the firm running the recommender system to cause exogenously induced value change[1] in the customer population. The measure was specifically developed to advance the study of competition in digital economies, and in particular to identify anti-competitive dynamics.
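To give a flavour of the formal idea (a simplified rendering on my part; Hardt et al. (2022) state the definition with more care), performative power can be thought of as the largest average change in participant behaviour that the firm can bring about through the actions available to it:

$$
P \;=\; \sup_{a \in \mathcal{A}} \; \frac{1}{|U|} \sum_{u \in U} \mathbb{E}\big[\, \lVert z_u(a) - z_u(a_0) \rVert \,\big],
$$

where $U$ is the population of participants, $\mathcal{A}$ is the set of actions available to the firm (e.g., ranking or recommendation policies), $z_u(a)$ is participant $u$’s behaviour under action $a$, and $a_0$ is a status-quo action. A firm with $P$ close to zero can only predict behaviour; a firm with large $P$ can steer it.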
What is happening here? To better understand this, we can help ourselves to the distinction between ‘ex-ante optimisation’ and ‘ex-post optimisation’ introduced by Perdomo et al. (2020). The former—ex-ante optimisation—is the type of predictive optimisation that occurs under conditions of low performative power, where a predictor (a firm, in this case) cannot do better than the information that standard statistical learning allows one to extract from past data about future data. Ex-post optimisation, on the other hand, involves steering the predicted behaviour so as to improve the predictor’s predictive performance. In other words, in the first case the to-be-predicted data is fixed and independent of the activity of the predictor, while in the second case the to-be-predicted data is influenced by the prediction process. As Hardt et al. (2022) remark: ‘[Ex-post optimisation] corresponds to implicitly or explicitly optimising over the counterfactuals’ (p. 7). In other words, an actor with high performative power does not merely predict the most likely outcome; functionally speaking, it can act as if it chooses which future scenario to bring about and then predicts that scenario (thereby achieving higher levels of predictive accuracy).
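Put schematically (following, roughly, the framework of Perdomo et al. (2020), in which deploying a predictor $\theta$ induces a distribution $D(\theta)$ over the data it will subsequently face):

$$
\theta_{\text{ex-ante}} \in \arg\min_{\theta} \; \mathbb{E}_{Z \sim D}\,\ell(Z;\theta)
\qquad\text{versus}\qquad
\theta_{\text{ex-post}} \in \arg\min_{\theta} \; \mathbb{E}_{Z \sim D(\theta)}\,\ell(Z;\theta).
$$

In the ex-ante case the data distribution $D$ is fixed; in the ex-post case the minimisation ranges over the distributions $D(\theta)$ that the predictor’s own deployment would bring about, i.e., over the counterfactuals.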
According to our earlier discussion of the nature of (il)legitimate value change, cases where performative power drives value change in a population constitute examples of illegitimate change. The people undergoing said change were in no meaningful way actively involved in the change that the performative predictor effected upon said population, and their ability to ‘course-correct’ was actively reduced by means of, among other things, choice design (i.e., affecting the order of recommendations a consumer is exposed to) or the exploitation of psychological features which make some types of content feel locally more compelling than others, irrespective of that content’s relationship to the individuals’ values or proleptic reasons.
What is more, the change that the population undergoes is shaped in such a way that it tends towards making the values more predictable. To explain this, first note that the performative predictor (i.e., the firm running the recommender platform) is embedded in an economic logic which imposes an imperative to minimise costs and increase profits. As a result, a firm’s steering power will specifically tend towards making the predicted behaviour easier to predict, because it is this predictability that the firm is able to exploit for profit (e.g., via increased advertising revenues). This process has been well documented to date. For example, in the case of recommendation platforms, rather than finding increased heterogeneity in viewing behaviour, studies have observed that these platforms suffer from what is called a ‘popularity bias’, which leads to a loss of diversity and a homogenisation of the content recommended (see, e.g., Chechkin et al. (2007), DiFranzo et al. (2017), and Hazrati et al. (2022)). As such, predictive optimisers impose pressure towards making behaviour more predictable, which in practice often amounts to pressure towards the simplification, homogenisation, and/or polarisation of (individual and collective) values.
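To make the feedback mechanism concrete, here is a minimal toy simulation (my own illustration under stylised assumptions, not a reproduction of any of the cited studies). A platform whose exposure policy chases observed popularity ends up with markedly less diverse aggregate consumption than a uniform-exposure baseline, even though users’ underlying preferences never change:

```python
# Toy simulation (illustrative only): a feedback loop in which a platform's
# exposure policy tracks observed popularity. Aggregate consumption
# homogenises relative to a uniform-exposure baseline.
import numpy as np

rng = np.random.default_rng(0)
N_USERS, N_ITEMS, N_ROUNDS = 500, 50, 30

# Each user has fixed, heterogeneous 'true' preferences over items.
true_prefs = rng.dirichlet(alpha=np.full(N_ITEMS, 0.3), size=N_USERS)

def consumption_entropy(counts):
    """Shannon entropy (in bits) of the aggregate consumption distribution."""
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def run(popularity_feedback):
    """Simulate recommendation rounds; return per-round consumption entropy."""
    exposure = np.full(N_ITEMS, 1.0 / N_ITEMS)  # the platform's exposure policy
    entropies = []
    for _ in range(N_ROUNDS):
        # Each user consumes one item, with probability ∝ preference × exposure.
        pick_probs = true_prefs * exposure
        pick_probs /= pick_probs.sum(axis=1, keepdims=True)
        picks = np.array([rng.choice(N_ITEMS, p=p) for p in pick_probs])
        counts = np.bincount(picks, minlength=N_ITEMS).astype(float)
        entropies.append(consumption_entropy(counts))
        if popularity_feedback:
            # Exposure chases observed popularity, which makes next-round
            # behaviour both more concentrated and easier to predict.
            exposure = (counts + 1.0) ** 2
            exposure /= exposure.sum()
    return entropies

feedback = run(popularity_feedback=True)
baseline = run(popularity_feedback=False)
print(f"final consumption entropy with popularity feedback: {feedback[-1]:.2f} bits")
print(f"final consumption entropy with uniform exposure:    {baseline[-1]:.2f} bits")
```

The point is not the specific numbers but the direction of the effect: once exposure is allowed to chase the platform’s own past observations, consumption (and hence the behaviour to be predicted) concentrates.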
...in the case of (advanced) AI systems
While current-day recommender platforms may already possess a significant degree of performative power, it is not hard to imagine that more advanced AI systems could come to exploit human psychology and socio-economic dynamics yet more powerfully. There is a priori not much reason to expect that humans’ evolved psychology would be particularly robust against an artificial superintelligent ‘persuader’. Beyond recommender systems powered by highly advanced AI systems, we can also imagine an increasingly widespread use of personalised ‘AI assistants’. We can think of the task of an AI assistant as helping the person it is assisting to meet their needs, achieve their goals or satisfy their preferences. Given the difficulty of comprehensively and unambiguously specifying what a person wants across a wide range of contexts, such ‘assistance’ will typically involve some element of guessing (i.e., predicting) on the part of the AI system. As such, given the dynamics discussed above, and if not successfully designed to avoid VCP, such ‘AI assistants’ are likely to ‘improve their performance’ by causing substantive and cumulative changes in individuals’ goals and values. What is more, just as in the above case of recommender algorithms, the nature of the change induced by such an ‘AI assistant’ will tend (unless relevant corrective measures are taken) towards a simplification of the data structures that are being predicted—in this case, the human’s values. To illustrate: an ‘AI assistant’ might be able to improve its performance measures by effectively narrowing my culinary preferences such that I always order burgers and fries, instead of occasionally exploring novel flavours and dishes. The picture painted above is concerning because, for one, it undermines the person’s ability to self-determine their values and, for another, the ensuing change might amount to an impoverishment of what once were richer or more subtle values. The described effect does not require any ‘maliciousness’ on the side of the AI systems; it can arise as a ‘merely’ unintended consequence of the way they function.
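The same style of toy model illustrates the assistant case (again, purely illustrative assumptions: the user’s preferences drift slightly towards whatever they actually consume, and the user usually goes along with the assistant’s suggestion; the cuisine names, drift rate and acceptance probability are all hypothetical). An assistant that always serves its single best guess raises its own predictive accuracy while the user’s preference distribution collapses; a more exploratory assistant leaves it comparatively intact:

```python
# Toy sketch (illustrative assumptions throughout): an assistant that always
# suggests its single best guess can raise its own predictive accuracy while
# narrowing the user's preferences, compared with a more exploratory assistant.
import numpy as np

rng = np.random.default_rng(1)

CUISINES = ["burgers", "thai", "sushi", "ethiopian", "italian", "mexican"]
N_MEALS = 300
DRIFT = 0.05    # assumed 'mere exposure' drift of preferences towards consumed items
ACCEPT = 0.7    # assumed probability that the user simply goes along with a suggestion

def entropy(p):
    """Shannon entropy (in bits) of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def run(greedy):
    """Return (assistant's prediction accuracy, final entropy of the user's preferences)."""
    prefs = np.full(len(CUISINES), 1.0 / len(CUISINES))  # the user's (malleable) preferences
    counts = np.ones(len(CUISINES))                      # the assistant's consumption counts
    hits = 0
    for _ in range(N_MEALS):
        if greedy:
            guess = int(np.argmax(counts))               # always suggest the single best guess
        else:
            guess = int(rng.choice(len(CUISINES), p=counts / counts.sum()))  # exploratory
        if rng.random() < ACCEPT:
            choice = guess                               # user goes along with the suggestion
        else:
            choice = int(rng.choice(len(CUISINES), p=prefs))  # user chooses for themselves
        hits += int(choice == guess)
        counts[choice] += 1
        onehot = np.zeros(len(CUISINES))
        onehot[choice] = 1.0
        prefs = (1 - DRIFT) * prefs + DRIFT * onehot     # preferences drift towards consumption
    return hits / N_MEALS, entropy(prefs)

for greedy in (True, False):
    acc, ent = run(greedy)
    label = "greedy assistant     " if greedy else "exploratory assistant"
    print(f"{label}: prediction accuracy {acc:.2f}, final preference entropy {ent:.2f} bits")
```

Nothing here requires the assistant to ‘want’ anything; the narrowing falls out of optimising predictive performance in a setting where the predicted quantity is itself malleable.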
What is important to recognise is that the described mechanism has the potential to reach both ‘far’ and ‘deep’—in other words, it has the potential to substantially affect both our public and private lives: people’s economic, social, political and personal beliefs, values, behaviours and relationships. Think, for example, of the pervasive presence of advertisement (reaching, these days, far into the private sphere via smartphones and televisions) and of how much economic behaviour is shaped by it every day. Or think about how the same mechanism can affect opinion formation, public deliberation and, consequently, political outcomes. As such, AI-powered advertisement or political propaganda, as well as other applications we may not even be able to conceive of at this point, hold tremendous potential for harm.
Let us recap the mechanics underlying the risk of illegitimate value change that we have identified here. Generally speaking, we are concerned with cases where a predictive optimiser (or a process that acts functionally equivalently to one) comes to be able to systematically affect that which it is predicting. If the phenomenon that is being predicted involves what some set of humans want, the performative optimiser will come to influence those humans’ values. If one assumes human values to be fixed and unchangeable, one might conclude that there is nothing to worry about here. However, recognising the malleability of human values makes this risk stand out as salient and potentially highly pressing. Advanced AI systems will become increasingly capable of this form of performative prediction, thereby exacerbating whatever patterns we can already make out today. The more widely these AI systems are deployed in relevant socio-economic contexts—advertisement, information systems, our political lives, our private lives and more—the more severe and far-reaching the potential harm.
[1] The observed change in the population might not be exclusively due to value change. However, it can (and typically will) involve a non-trivial amount of value change, and as such, performative power is a relevant measure for understanding the phenomenon of exogenously induced value change.