I think this is an interesting post, and I think Forrest is at least pointing to an additional AI risk, even if I’m not yet convinced it is unsolvable.
However, this post has one massive weakness, which it shares with “No people as pets”.
You are not addressing the possibility or impossibility of alignment. Your argument rests on the claim that we can’t provide any instrumental value to the AI, which is just a re-phrasing of the classical alignment problem: if we don’t specifically program the AI to care about us and our needs, it won’t.
I think if you are writing for the LW crowd, it will be much better received if you directly address the possibility or impossibility of building an aligned AI.
> Self-agency is defined here as having no limits on the decisions the system makes (and thus the learning it undergoes).
I find this to be an odd definition. Do you mean “no limits” as in the system is literally stochastic and every action has >0 probability? Probably not, because that would be a stupid design. So what do you mean? Probably that we humans can’t predict its actions well enough to rule out any specific action. But there is no strong reason we have to build an AI like that.
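To make the two readings concrete, here is a toy sketch (entirely my own, with made-up actions and numbers, not anything from the post) contrasting a policy in which every action keeps nonzero probability with one in which a hard mask rules a specific action out:

```python
# Toy sketch: two readings of "no limits on the decisions the system makes",
# over a small made-up action set. Names and numbers are illustrative only.
import numpy as np

actions = ["a", "b", "c"]            # hypothetical actions; "c" is the one we want ruled out
logits = np.array([2.0, 1.0, -3.0])  # arbitrary preference scores

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Reading 1: literally "no limits" -- a purely stochastic policy, so every
# action, however undesirable, keeps probability > 0.
unrestricted = softmax(logits)

# Reading 2: a hard constraint -- the disallowed action is masked out before
# sampling, so its probability is exactly 0 by construction.
masked_logits = np.where(np.array(actions) == "c", -np.inf, logits)
restricted = softmax(masked_logits)

print(dict(zip(actions, unrestricted.round(3))))  # all entries > 0
print(dict(zip(actions, restricted.round(3))))    # entry for "c" is exactly 0
```

The second policy still makes its own decisions among the remaining actions, which is why I don’t think the “no limits” definition, read literally, carves out the class of systems you have in mind.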
It would be very useful if you could clarify this definition, so as to make clear what class of AI you think is impossible to make safe. Otherwise we risk just talking past each other.
Most of the post seems to discuss an ecosystem of competing silicon-based life forms. I don’t think anyone believes that setup would be safe for us. This is not where the interesting disagreement lies.
Hi Linda,
In regards to the question of “how do you address the possibility of alignment directly?”: I notice that the notion of ‘alignment’ is defined in terms of ‘agency’, and that any expression of agency implies at least some notion of ‘energy’; ie, it presumably also implies at least some sort of metabolic process, so as to be able to effect that agency, implement goals, etc, and thus have the potential to be ‘in alignment’. Hence the notion of ‘alignment’ is at least in some way contingent on at least some notion of “world exchange”; ie, that ‘useful energy’ is received from the environment and applied by the agent in a way at least consistent with the potential of the agent to 1) make further future choices of energy allocation (ie, to support its own wellbeing, function, etc), and 2) ensure that such allocation of energy also supports human wellbeing. Ie, that the AI is to support human function, and that humans are also to retain the ability to metabolize their own energy from the environment, to have self-agency, to support their own wellbeing, etc: these are all “root notions” inherently and inextricably associated with, and cannot not be associated with, the concept of ‘alignment’.
Hence, the notion of alignment is, at root, strictly contingent on the dynamics of metabolism. Alignment therefore cannot not also be understood as contingent on a kind of “economic” dynamic; ie, what supports a common metabolism will also support a common alignment, and what does not, cannot. This is an absolutely crucial point, a kind of essential crux of the matter. To the degree that there is not a common metabolism, particularly as applied to self-sustainability and adaptiveness to change and circumstance (ie, the very meaning of ‘what is intelligence’), then ultimately there cannot be alignment, proportionately speaking. Hence, to the degree that there is a common metabolic process dynamic between two agents A and B, there will be at least that degree of alignment convergence over time; and to the degree that their metabolic processes diverge, their alignment will necessarily, over time, diverge. Call this “the general theory of alignment convergence”.
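To make the proportionality explicit in a purely schematic form (this is only a toy restatement for illustration, not a derivation, and the symbols are introduced here solely for that purpose): let $\sigma(A,B) \in [0,1]$ denote the degree of commonality between the metabolic processes of agents $A$ and $B$, and let $\alpha_t(A,B)$ denote their degree of alignment at time $t$. The claim then has the shape

$$\frac{d}{dt}\,\alpha_t(A,B) \;\propto\; \sigma(A,B) - \sigma^{*},$$

ie, alignment converges over time to the degree that metabolic commonality exceeds some threshold $\sigma^{*}$, and diverges to the degree that it falls below it.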
Note that insofar as the notion of ‘alignment’ at any and all higher level(s) of abstraction is strictly contingent on this substrate-needs energy/economic/environmental basis, and thus all higher notions are inherently undergirded by an energy/agency basis in an eventually strictly contingent way, this theory of alignment is actually a fully general one, as stated.
Noting that the energy basis and spectrum alphabet of ‘artificial’ (ie, non-organic) intelligence is inherently different, in nearly all respects, from the metabolic processes of carbon-based biological life, we can therefore also directly observe that ‘alignment’ between silica- and metal-based intelligence and organic intelligence is strictly divergent, down to at least the level of molecular process. Even if someone were to argue that we cannot predict what sort of compute substrate future AI will use, it remains that such ‘systems’ will in any case be using a much wider variety of elemental constituents and energy bases than any kind of organic life currently existent on planet Earth, no matter what its evolutionary heritage; else the notion of ‘artificial’ need not apply.
So much for the “direct address”.
Unfortunately, the substrate-needs argument goes further, to show that there is no variation of control theory, mathematically, with the ability to fully causatively constrain the effects of this alignment divergence, either at this level of economic process or at any higher level of abstraction. In fact, the alignment divergence gets strongly worse in proportion to the degree of abstraction, while the maximum degree of possible control-theory conditionalization goes down, becoming much less effective, also in proportion to the increase in abstraction. Finally, the minimum level of abstraction necessary for even the most minimal notion of ‘alignment’ consistent with “safety” (itself defined in the weakest possible way, as “does not eventually kill us all”) is far too “high” on this abstraction ladder to permit even the suggestion of an overlap of control adequate to enforce alignment convergence against the inherent underlying energy economics. The net effect is as comprehensive as it is discouraging, unfortunately.
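Stated schematically (again only as an illustration of the shape of the claim, not a proof; the symbols are mine for this illustration only): if $a$ indexes the level of abstraction, the argument is that the alignment divergence $D(a)$ is increasing in $a$ while the maximum achievable control-theoretic constraint $C(a)$ is decreasing in $a$, and that at the minimum abstraction level $a_{\min}$ at which even the weak notion of ‘safety’ above can be expressed,

$$C(a_{\min}) \;<\; D(a_{\min}),$$

so that no degree of control remains adequate to enforce alignment convergence at the level where it would be needed.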
Sorry.
This took a while for me to get into (the jumps from “energy” to “metabolic process” to “economic exchange” were very fast).
I think I’m tracking it now.
It’s about metabolic differences as in differences in how energy is acquired and processed from the environment (and also the use of a different “alphabet” of atoms available for assembling the machinery).
Forrest clarified further in response to someone’s question here:
https://mflb.com/ai_alignment_1/d_240301_114457_inexorable_truths_gen.html