LeCun may not be correct to dismiss concerns, but I think the concept of “dominance” could be a very useful one for AI safety people to apply (or at least grapple with).
The thing about the concept is that it seems as if it could be defined in game-theoretic terms fairly easily, and so could be defined in a fashion independent of the intelligence or capabilities of an organism or entity. Plausibly, it could be measured and analyzed more objectively than “aligned to human values”, which appears to depend on one’s notion of human values.
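To make the “measured and analyzed” claim concrete, here is a minimal toy sketch in Python of one way a dominance score might be operationalized. The agents, the `assertiveness` trait, and the conflict-resolution rule are all my own illustrative assumptions, not a standard game-theory definition; the point is only that the score depends on who gets their way in conflicts, not on how intelligent either party is.

```python
import itertools

# Toy operationalization (illustrative sketch, not a standard definition):
# score an agent's "dominance" as the fraction of pairwise conflicts in
# which the joint outcome goes that agent's way.

class Agent:
    def __init__(self, name, assertiveness):
        self.name = name
        self.assertiveness = assertiveness  # hypothetical trait driving conflict outcomes

def resolve(a, b, item):
    # Assumed conflict rule: the more assertive party gets the contested
    # item (ties go to b). Any resolution rule could be plugged in here.
    return a if a.assertiveness > b.assertiveness else b

def dominance_score(agent, others, contested_items, resolve_fn):
    """Fraction of conflicts over contested items that `agent` wins."""
    wins = total = 0
    for other, item in itertools.product(others, contested_items):
        total += 1
        if resolve_fn(agent, other, item) is agent:
            wins += 1
    return wins / total if total else 0.0

alice = Agent("alice", 0.9)
bob = Agent("bob", 0.4)
carol = Agent("carol", 0.2)

print(dominance_score(alice, [bob, carol], ["food", "territory"], resolve))  # 1.0
print(dominance_score(carol, [alice, bob], ["food", "territory"], resolve))  # 0.0
```

Nothing about the score references the agents’ capabilities, which is the property claimed above.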
Defined well, dominance would be the organizing principle, the source, of an entity’s behavior. So if it were possible to engineer an AI for non-dominance, “it might become dominant for other reasons” (argued here multiple times) wouldn’t be a valid argument, because achieving dominance or non-dominance would be the overriding reason/motivation the entity had, and no “other reason” would override that.
And I don’t think the concept itself guarantees a given GAI would be created safely. It would depend on the creation process. Consider three possibilities:
In a process where dominance is an incidental quality, it seems an apparently non-dominant system could become dominant unpredictably. While Bing Chat wasn’t a GAI, its shift to dominant and malevolent behavior seems like a reasonable warning about blind training.
In a process which attempts to evolve non-dominant behavior, I think it’s an open question whether the result can be guaranteed non-dominant.
In a process where a non-dominant system is explicitly engineered, one might even be able to logically guarantee non-dominance, in the fashion of provably correct software. Of course, explicitly engineered systems seem to be losing out to trained/evolved systems.
I think this is sort of the idea behind a satisficer: make something that basically never tries too hard, and therefore never reaches up to the “conquer the world” class of solutions, since they’re way too extreme and you can do well enough with far less. That said, I’m not sure satisficers are actually proven to be fully safe either.
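A minimal sketch of that distinction, assuming a toy plan set with hand-assigned utilities and a threshold (all invented here for illustration): the maximizer always takes the extreme plan, while the satisficer stops at the first plan that clears a “good enough” bar and never reaches the extreme tail.

```python
# Minimal maximizer-vs-satisficer sketch over a toy plan set.
# The plans, utilities, and threshold are invented for illustration.

def maximizer(plans, utility):
    # Always takes the highest-utility plan, however extreme it is.
    return max(plans, key=utility)

def satisficer(plans, utility, threshold):
    # Considers plans from least to most ambitious and stops at the first
    # one that is "good enough", so the extreme tail is never selected.
    for plan in sorted(plans, key=utility):
        if utility(plan) >= threshold:
            return plan
    return None  # no acceptable plan found

plans = ["do nothing", "ask politely", "buy the resource", "conquer the world"]
utility = {"do nothing": 0.0, "ask politely": 0.6,
           "buy the resource": 0.8, "conquer the world": 0.99}.get

print(maximizer(plans, utility))        # conquer the world
print(satisficer(plans, utility, 0.5))  # ask politely
```

Whether a threshold like this stays safe once the system itself generates the plan set is, as noted above, still an open question.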
Something like this is argued to be why humans are frankly exceptionally well aligned to basic homeostatic drives: the only real failure modes have been obesity, drugs, and maybe alcohol as things that misaligned us with basic needs. Hedonic treadmills/loops essentially tame the RL part of us and make sure that reward isn’t the optimization target in practice, as in TurnTrout’s post below:
https://www.lesswrong.com/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target
Similarly, the two beren posts below explain how a PID control loop may be helpful for alignment (a rough sketch of the idea follows after the links):
https://www.lesswrong.com/posts/3mwfyLpnYqhqvprbb/hedonic-loops-and-taming-rl
https://www.beren.io/2022-11-29-Preventing-Goodheart-with-homeostatic-rewards/
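As I read those posts, the core idea is roughly the following sketch (my own paraphrase in code; the setpoint, PID gains, and the example variable are illustrative assumptions, not taken from beren): reward peaks at a setpoint rather than growing without bound, and a PID-style loop steers the variable back toward that setpoint, so “more” is never the optimization target.

```python
# Rough sketch of a homeostatic reward plus a PID control loop.
# Setpoint, gains, and the example variable are illustrative assumptions.

class HomeostaticReward:
    def __init__(self, setpoint):
        self.setpoint = setpoint

    def reward(self, value):
        # Highest exactly at the setpoint; overshooting is penalized just
        # like undershooting, unlike an unbounded "maximize value" reward.
        return -abs(value - self.setpoint)

class PIDController:
    """Classic proportional-integral-derivative loop toward a setpoint."""
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = 0.0

    def control(self, value, dt=1.0):
        error = self.setpoint - value
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: a blood-sugar-like variable starting far above its setpoint.
ctrl = PIDController(kp=0.5, ki=0.05, kd=0.1, setpoint=100.0)
rew = HomeostaticReward(setpoint=100.0)
value = 140.0
for _ in range(10):
    # The control signal pulls the variable toward the setpoint (with some overshoot).
    value += ctrl.control(value)
print(round(value, 1), rew.reward(value))
```

The design point, on this reading, is that the reward shape itself removes the incentive to push any regulated variable to an extreme.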
“Defined well, dominance would be the organizing principle, the source, of an entity’s behavior.”

I doubt that. Dominance is the result, not the cause, of behavior. It comes from the fact that there are conflicts in the world and that often only one side can get its way (even in a compromise, there’s usually a winner and a loser). If an agent strives for dominance, it is usually as an instrumental goal for something else the agent wants to achieve. There may be a “dominance drive” in some humans, but I don’t think that explains much of actual dominant behavior. Even among animals, dominant behavior is often a means to an end, for example getting the best mating partners or the largest share of food.
I also think the concept is already covered in game theory, although I’m not an expert.