Well, the standard example is evolution: the compact mechanisms first discovered by the gradient-climbing search for fit organisms generalized to perform effectively in many domains, but not particularly to maximize fitness; we don’t monomaniacally maximize our number of offspring (which would improve our genetic fitness a lot relative to what we actually do).
Human coalitions are made of humans, and humans come ready-built with roughly the same desires and shape of cognition as you. That makes them vastly easier to interface with and to understand intuitively, at least approximately.
I was thinking specifically here of maximizing the value function (desires) across the agents interacting with each other. Or, more specifically, of adapting the system so that it self-maintains the “maximizing the value function (desires) across the agents” property.
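One hedged way to formalize that property (my notation, not the original poster’s): let U_i be agent i’s value function, R the system’s rules, and T the map from today’s rules to tomorrow’s once the agents have lobbied, cheated, and adapted under them. The system should then satisfy

    R^{*} = \arg\max_{R} \sum_{i} U_i(R), \qquad T(R^{*}) = R^{*},

i.e. the rules maximize total value and remain a fixed point under the pressure the agents themselves exert on them.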
An example is an economic system which seeks to maximize total welfare. Current systems, though, don’t maintain themselves: more powerful agents take over the control mechanisms (or adjust the market rules) so that they are favoured, via lobbying, cheating, ignoring the rules, or weakening enforcement. Similar problems occur in other types of coalitions.
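As a toy illustration of that capture dynamic (a minimal sketch with made-up numbers and agent counts, not a model of any real economy), here is a mechanism that starts out welfare-maximizing and is gradually bent by the most powerful agents:

    # Toy sketch: a welfare-maximizing mechanism slowly captured by
    # the most powerful agents. All parameters are illustrative.
    import random

    random.seed(0)

    N_AGENTS = 10
    ROUNDS = 50

    # Each agent's "power" is its ability to bend the rules in its favour.
    power = [random.uniform(0.5, 2.0) for _ in range(N_AGENTS)]

    # The rules: per-agent weights in the redistribution mechanism.
    # The welfare-maximizing mechanism starts with equal weights.
    weights = [1.0] * N_AGENTS

    def utility(share):
        # Diminishing returns: equal shares maximize total welfare.
        return share ** 0.5

    def total_welfare(weights):
        total = sum(weights)
        return sum(utility(w / total) for w in weights)

    print("initial welfare:", round(total_welfare(weights), 4))

    for _ in range(ROUNDS):
        # Capture step: each agent lobbies to raise its own weight,
        # with effectiveness proportional to its power.
        for i in range(N_AGENTS):
            weights[i] += 0.05 * power[i]

    print("final welfare:  ", round(total_welfare(weights), 4))

Because the utility above is concave, equal shares maximize total welfare, so the printed welfare drops as lobbying skews the weights toward the powerful agents; the mechanism does not maintain its own maximization property.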
Postulating a more powerful agent that enforces this maximization property (an aligned super-AGI) is cheating, unless you can describe how this agent works and how it maintains both itself and this goal.
However, arriving at a system of agents that self-maintains this property with no “super agent” might lead to solutions for AGI alignment, or might prevent the creation of such a misaligned agent.
I read a while ago that the design and theory of corruption-resistant systems is an area that has not received much research.
However, arriving at a system of agents that self-maintains this property with no “super agent” might lead to solutions for AGI alignment, or might prevent the creation of such a misaligned agent.
I doubt that, because intelligence explosions, or the lead-ups to them, make things local.