As soon as we start talking about societal identities and therefore interests or “values” (and, in general, any group identities/interests/values), the question arises of how AI should balance individual and group interests, while also considering that there are far more than two levels of group hierarchy, that group identity/individuality (cf. Krakauer et al., “Information theory of individuality”) is a gradualistic rather than a categorical property, and that the group’s (system’s) consciousness is also at stake. If we don’t have a principled scientific theory for balancing all these considerations (as well as many other “unsolved issues”, which I mentioned here: https://www.lesswrong.com/posts/opE6L8jBTTNAyaDbB/a-multi-disciplinary-view-on-ai-safety-research#3_1__Scale_free_axiology_and_ethics) and instead rely at best on some “heuristic” (such as allowing different AIs representing different individuals and groups to compete and negotiate with each other and expecting the ultimate balance of interests to emerge as the outcome of this competition and/or negotiation), my expectation is that this will lead to “doom”, or at least a dystopia, with very high probability.
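To make the worry concrete, here is a minimal toy sketch (my addition, not from the comment; all names, outcomes, and utility numbers are hypothetical) of the kind of heuristic aggregation described above: AIs representing an individual, a community, and a society each score candidate outcomes, and the “balance of interests” is whatever maximizes a weighted sum of those scores. The sketch just shows that the result hinges entirely on the choice of weights across hierarchy levels, which is exactly the part a principled theory would have to supply.

```python
# Toy sketch of "negotiation as weighted aggregation" (hypothetical throughout).
outcomes = ["A", "B", "C"]

# Utilities assigned by AIs representing three levels of the group hierarchy.
utilities = {
    "individual": {"A": 1.0, "B": 0.2, "C": 0.5},
    "community":  {"A": 0.3, "B": 1.0, "C": 0.6},
    "society":    {"A": 0.1, "B": 0.4, "C": 1.0},
}

def negotiated_outcome(weights):
    """Pick the outcome maximizing the weighted sum of level utilities."""
    return max(outcomes,
               key=lambda o: sum(w * utilities[level][o]
                                 for level, w in weights.items()))

# Sweeping a few weightings shows every outcome can be made to "win":
# the heuristic merely relocates the hard question into the weights.
for w_ind, w_com, w_soc in [(1, 0.1, 0.1), (0.1, 1, 0.1), (0.1, 0.1, 1)]:
    weights = {"individual": w_ind, "community": w_com, "society": w_soc}
    print(weights, "->", negotiated_outcome(weights))
```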
And if a solution to 1 does require a solution to ethics, I think we should give up on alignment and push full throttle on collective-action solutions aimed at not building AGIs in the first place, because ethics is literally the longest-lasting open problem in the history of philosophy.
That is actually somewhat close to my position. I do support a complete ban on AGI development, perhaps at least until humans develop narrow bio-genetic AI to fix the human genome such that humans become much more like bonobos and a significant fraction of their own misalignment is gone; i.e., what Yudkowsky proposes.
But I also don’t expect this to happen, and I don’t think we are completely doomed if it doesn’t. Ethics was an unsolved philosophical problem for thousands of years exactly because it was treated as a philosophical rather than a scientific problem, which was itself unavoidable because the scientific theories absolutely required as the basis of a scientific ethics (from cognitive science and control theory to network theory and the theory of evolution) and the necessary mathematical apparatus (such as the renormalization group and category theory) were themselves lacking. However, I don’t count on (unaided) humans to develop the requisite science of consciousness and ethics in the short timeframe that will be available. My mainline hope is definitely in AI-automated, or at least AI-assisted, science, which is also what all major AGI labs seem to be betting on, either explicitly (OpenAI, Conjecture) or implicitly (others).
In response to Roman’s very good points (I have for now only skimmed the linked articles), these are my thoughts:
I agree that human values are very hard to aggregate (or even to define precisely); we use politics/economy (of collectives ranging from the family up to the nation) as a way of doing that aggregation, but that is obviously a work in progress, and perhaps slipping backwards. In any case, (as Roman says) humans are (much of the time) misaligned with each other and with their collectives, in ways both small and large, and sometimes that is for good or bad reasons. By ‘good reason’ I mean that sometimes ‘misalignment’ might literally be that human agents & collectives have local (geographical/temporal) realities they have to optimise for (to achieve their goals), which might conflict with the goals/interests of their broader collectives: this is the essence of governing a large country, and is why many countries are federated. I’m sure these problems are formalised in the preference/values literature, so I’m using my naive terms for now…
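As a small, standard illustration of why that aggregation is hard even in the simplest setting (my addition, with hypothetical voter preferences): with three voters and three options, pairwise majority voting can already produce a cycle (the Condorcet paradox), so “what the collective prefers” is not well defined without extra, contestable assumptions.

```python
from itertools import combinations

# Each voter ranks options best-to-worst (hypothetical preferences).
rankings = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def majority_prefers(x, y):
    """True if a strict majority of voters rank x above y."""
    votes = sum(1 for r in rankings if r.index(x) < r.index(y))
    return votes > len(rankings) / 2

for x, y in combinations(["A", "B", "C"], 2):
    winner, loser = (x, y) if majority_prefers(x, y) else (y, x)
    print(f"majority prefers {winner} over {loser}")
# Output shows A > B, B > C, and C > A: a cycle with no collective "best".
```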
Anyway, this post’s working assumption/intuition is that ‘single AI-single human’ alignment (or corrigibility, or identity fusion, or delegation, to use Andrew Critch’s term) is ‘easier’ to think about or achieve than ‘multiple AI-multiple human’ alignment. Which is why we consciously focused on the former and temporarily ignored the latter. I don’t know if that assumption is valid, and I haven’t thought about (i.e. have no opinion on) whether the ideas in Roman’s linked ‘science of ethics’ post would change anything, but I am interested in it!