Thanks for writing the post! Strongly agree that there should be more research into how solvable the alignment problem, control problem, and related problems are. I haven't studied the uncontrollability research by e.g. Yampolskiy in detail. But if technical uncontrollability were firmly established, it seems to me that this would significantly change the whole AI x-risk space, and later the societal debate and potentially our trajectory, so it seems very important.
I would also like to see more research into the nontechnical side of alignment: how aggregatable are human values of different humans in principle? How to democratically control AI? How can we create a realistic power-sharing mechanism for controlling superintelligence? Do we have enough wisdom for it to be a good idea for a superintelligence to do exactly what we want, even assuming aggregatability? Could CEV ever fundamentally work? According to which ethical systems? These are questions that I'd say should be solved, together with technical alignment, before developing AI with potential take-over capacity. My intuition is that they might be at least as hard.
But if technical uncontrollability were firmly established, it seems to me that this would significantly change the whole AI x-risk space
Thanks for your kind remarks.

Yes, we would need to shift focus to acting to restrict corporate-AI scaling altogether. Particularly, to restrict data piracy, environmentally harmful compute, and model misuse (three dimensions through which AI corporations consolidate market power).
I am working with other communities (including digital creatives, environmentalists and military veterans) on litigation and lobbying actions to restrict those dimensions of AI power-consolidation.
I hope this post clarifies to others in AI Safety why there is no line of retreat. AI development will need to be restricted.
I would also like to see more research into the nontechnical side of alignment: how aggregatable are human values of different humans in principle? How to democratically control AI?
Yes.
Consider too that these questions come on top of the question of whether AGI could be long-term safe at all (if AGI cannot be controlled to be long-term safe for humans, then we do not need to answer the more fine-grained questions about e.g. whether human values are aggregatable).
Even if, hypothetically, long-term AGI safety were possible…

then you still have to deal with the limits on modelling, and consistently acting on, the preferences expressed by billions of boundedly rational humans from their (perceived) contexts: https://twitter.com/RemmeltE/status/1620762170819764229

and not consistently represent the preferences of malevolent, parasitic or short-term human actors who want to misuse or co-opt the system through any attack vectors they can find,

and deal with the fact that the preferences of many possible future humans, and of non-human living beings, will not automatically be represented in a system that AI corporations have by default built to represent currently living humans only (preferably, those who pay).
~ ~ ~
Here are also excerpts from Roman Yampolskiy’s 2021 paper that are relevant to aggregating democratically solicited preferences and human values:
Public Choice Theory
Eckersley looked at impossibility and uncertainty theorems in AI value alignment [198]. He starts with impossibility theorems in population ethics: “Perhaps the most famous of these is Arrow’s Impossibility Theorem [199], which applies to social choice or voting. It shows there is no satisfactory way to compute society’s preference ordering via an election in which members of society vote with their individual preference orderings...
…
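To make the aggregation problem in the excerpt above concrete, here is a minimal sketch of my own (not from Yampolskiy’s paper), using a hypothetical three-voter profile: each voter’s individual ranking is transitive, yet the pairwise majority preference is cyclic, so there is no consistent “society preference ordering” to align to.

```python
# Toy Condorcet-cycle illustration (hypothetical voter profile, my own example).
# Each voter has a transitive ranking over options A, B, C.
rankings = [
    ["A", "B", "C"],  # voter 1: A > B > C
    ["B", "C", "A"],  # voter 2: B > C > A
    ["C", "A", "B"],  # voter 3: C > A > B
]

def majority_prefers(x, y):
    """True if a strict majority of voters ranks x above y."""
    votes_for_x = sum(r.index(x) < r.index(y) for r in rankings)
    return votes_for_x > len(rankings) / 2

for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    winner, loser = (x, y) if majority_prefers(x, y) else (y, x)
    print(f"majority prefers {winner} over {loser}")

# Prints:
#   majority prefers A over B
#   majority prefers B over C
#   majority prefers C over A   <- a cycle: no transitive social ordering exists
```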
Value Alignment
It has been argued that “value alignment is not a solved problem and may be intractable (i.e. there will always remain a gap, and a sufficiently powerful AI could ‘exploit’ this gap, just like very powerful corporations currently often act legally but immorally)” [258]. Others agree: “‘A.I. Value Alignment’ is Almost Certainly Intractable… I would argue that it is un-overcome-able. There is no way to ensure that a super-complex and constantly evolving value system will ‘play nice’ with any other super-complex evolving value system.” [259]. Even optimists acknowledge that it is not currently possible: “Figuring out how to align the goals of a superintelligent AI with our goals isn’t just important, but also hard. In fact, it’s currently an unsolved problem.” [118].

Vinding says [78]: “It is usually acknowledged that human values are fuzzy, and that there are some disagreements over values among humans. Yet it is rarely acknowledged just how strong this disagreement in fact is… Different answers to ethical questions … do not merely give rise to small practical disagreements; in many cases, they imply completely opposite practical implications. This is not a matter of human values being fuzzy, but a matter of them being sharply, irreconcilably inconsistent. And hence there is no way to map the totality of human preferences, ‘X’, onto a single, well-defined goal-function in a way that does not conflict strongly with the values of a significant fraction of humanity. This is a trivial point, and yet most talk of human-aligned AI seems oblivious to this fact… The second problem and point of confusion with respect to the nature of human preferences is that, even if we focus only on the present preferences of a single human, then these in fact do not, and indeed could not possibly, determine with much precision what kind of world this person would prefer to bring about in the future.”

A more extreme position is held by Turchin, who argues that “‘Human Values’ don’t actually exist” as stable coherent objects and should not be relied on in AI safety research [260].

Carlson writes: “Probability of Value Misalignment: Given the unlimited availability of an AGI technology as enabling as ‘just add goals’, then AGI-human value misalignment is inevitable. Proof: From a subjective point of view, all that is required is value misalignment by the operator who adds to the AGI his/her own goals, stemming from his/her values, that conflict with any human’s values; or put more strongly, the effects are malevolent as perceived by large numbers of humans. From an absolute point of view, all that is required is misalignment of the operator who adds his/her goals to the AGI system that conflict with the definition of morality presented here, voluntary, non-fraudulent transacting … i.e. usage of the AGI to force his/her preferences on others.”
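And to illustrate Vinding’s point about sharply inconsistent values, a toy numerical sketch of my own (hypothetical groups and utilities, not from the paper): when two groups hold exactly opposite preferences over a binary choice, any single aggregate goal-function hands one group its least-preferred outcome, whatever weighting is used.

```python
# Toy illustration (my own, hypothetical utilities): two groups with exactly
# opposite preferences over a binary policy choice.
utilities = {
    "policy_X": {"group_1": +1.0, "group_2": -1.0},
    "policy_Y": {"group_1": -1.0, "group_2": +1.0},
}

def aggregate(policy, w):
    """Weighted sum of group utilities, with weight w on group 1."""
    u = utilities[policy]
    return w * u["group_1"] + (1 - w) * u["group_2"]

for w in (0.3, 0.5, 0.7):
    chosen = max(utilities, key=lambda p: aggregate(p, w))
    dissatisfied = [g for g, u in utilities[chosen].items() if u < 0]
    print(f"weight on group 1 = {w}: chosen = {chosen}, "
          f"gets least-preferred outcome: {dissatisfied}")

# At w = 0.5 the aggregate is exactly indifferent, so the choice is arbitrary;
# at any other weight, the outcome conflicts maximally with one group.
```

This is only a two-option, two-group caricature, but it is the same gap the [258] quote above points at: whatever single goal-function is chosen, a significant fraction’s values conflict strongly with it.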