I’m glad to see a post on alignment asking about the definition of human values. I propose the following conundrum. Let’s suppose that humans, if asked, say they value a peaceful, stable society. I accept the assumption that the human mind contains one or more utility optimizers. I point out that those optimizers are likely to operate at the individual, family, or local-group level, while the stated “value” concerns society at large. So humans are likely not “optimizing” at the same scope as they “value”.
This leads to game-theory problems, such as the last-turn problem and the notorious instability of cooperation over public goods (the commons). According to the theory of cliodynamics put forward by Turchin et al., utility maximization by subsets of society leads to the implementation of wealth pumps that produce inequality, and to excess reproduction among elites, which drives elite competition in a cyclic pattern. A historical database of over a hundred cycles, drawn from many regions and periods, suggests that every other cycle becomes violent, or at least very destructive, about 90% of the time, and that elites cooperate to reduce their own numbers and turn off the wealth pump less than 10% of the time.
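To make the commons instability concrete, here is a minimal sketch of the standard linear public goods game in Python (my own illustration, with an assumed endowment, multiplier, and player count): contributing nothing dominates contributing everything, even though universal contribution leaves everyone better off than universal defection.

```python
# Minimal sketch (illustrative, not from the original comment): a linear
# public goods game showing why individual optimization undermines a shared
# "value" for the common good. Endowment, multiplier, and player count are
# assumptions chosen only to make the arithmetic visible.

def payoff(my_contribution, others_contributions, endowment=10.0,
           multiplier=1.6, n_players=4):
    """Keep whatever you don't contribute, plus an equal share of the
    multiplied common pot."""
    pot = (my_contribution + sum(others_contributions)) * multiplier
    return (endowment - my_contribution) + pot / n_players

others_full = [10.0, 10.0, 10.0]   # everyone else contributes everything

print(payoff(10.0, others_full))        # 16.0 -- full cooperation
print(payoff(0.0, others_full))         # 22.0 -- unilateral free-riding pays more
print(payoff(0.0, [0.0, 0.0, 0.0]))     # 10.0 -- but if everyone free-rides, all are worse off
```

The same unraveling shows up in finitely repeated games as the last-turn problem: once cooperation cannot be rewarded on the final round, defection propagates backward through earlier rounds.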
I add the assumption that there is nothing special about humans: any entities (AI or extraterrestrials) that align with the value goals and optimization scopes described above will produce similar results. Game-theory mathematics says nothing about evolutionary history and does not take species preferences into account, after all, because it doesn’t seem to need to. Even social insects, presumably optimizing at much larger (but still not global) scopes, fall victim to large-scale cyclic wars (I’m thinking of ants here).
So is alignment even a desirable goal? Perhaps we should instead ensure that AI does not aid the wealth pump, elite competition, or the mobilization of the immiserated commoners (Turchin’s terminology). But the goal of many, perhaps most, AI researchers is to “make a lot of money” (witness the recent episode with Sam Altman and the support from OpenAI employees for his profit-oriented strategy over the board’s objection, as well as the fact that most of the competing entities developing AI are profit-oriented, and competing). And some other goal (e.g., stabilization of society) might have wildly unpredictable results (stagnation comes to mind).
I’m assuming the relevant values are the optimizer ones, not what people say. I discussed social institutions, including those that encourage people to endorse and optimize for common values, in the section on subversion.
Alignment with a human other than yourself could be a problem because people are to some degree selfish and, to a smaller degree, have different general principles/aesthetics about how things should be. So some sort of incentive optimization / social choice theory / etc. might help. But at least there’s significant overlap between different humans’ values. Though there’s already a pretty big existing problem of people dying; the default was already that current people would be replaced by other people.
“Game-theory mathematics says nothing about evolutionary history and does not take species preferences into account, after all, because it doesn’t seem to need to.”
Evolutionary game theory is a thing and does not agree with this, I think? Though maybe I misunderstand. Evolutionary game theory still typically only involves experiments on the current simulated population.
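To illustrate what I mean by “only the current simulated population”, here is a minimal replicator-dynamics sketch in Python (my own illustration; the prisoner’s-dilemma payoffs and step size are assumed): each update depends only on the current strategy frequencies and payoffs, with no term for evolutionary history or species-specific preferences.

```python
# Minimal replicator-dynamics sketch (illustrative, assumed payoffs):
# strategy frequencies change in proportion to how much each strategy's
# payoff exceeds the current population average.

# Prisoner's dilemma payoffs: rows/columns = (cooperate, defect).
PAYOFF = [[3.0, 0.0],
          [5.0, 1.0]]

def replicator_step(freqs, dt=0.1):
    """One discrete-time replicator update over two strategies."""
    fitness = [sum(PAYOFF[i][j] * freqs[j] for j in range(2)) for i in range(2)]
    avg = sum(freqs[i] * fitness[i] for i in range(2))
    new = [freqs[i] + dt * freqs[i] * (fitness[i] - avg) for i in range(2)]
    total = sum(new)
    return [f / total for f in new]   # renormalize against floating-point drift

freqs = [0.9, 0.1]          # start with 90% cooperators
for _ in range(200):
    freqs = replicator_step(freqs)

print(freqs)                # cooperation collapses toward [0, 1] (all defect)
```

The dynamics are entirely ahistorical in the sense above: nothing in the update remembers past populations, it just re-evaluates the current mix each step.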