I think solving the alignment problem for governments, corporations, and other coalitions would probably help with solving the alignment problem for AGI.
I guess you are saying that even if we could solve the above alignment problems, it would still not go all the way to solving it for AGI? What particular gaps are you thinking of?
Yeah, mainly things such that solving them for human coalitions/firms doesn’t generalize. It’s hard to point to specific gaps because they’ll probably involve mechanisms of intelligence, which I / we don’t yet understand. The point is that the hidden mechanisms operating in human coalitions are pretty much just the ones operating in individual humans, maybe tweaked by being in a somewhat different local context created by the coalition (Bell Labs, the scientific community, a job in a company, a role in a society, a position in a government, etc.). We’re well out of distribution for the ancestral environment, but not *that* far out. Humans, possibly excepting children, don’t routinely invent paradigm-making novel cognitive algorithms and then apply them to everything; that sort of thing only happens at a superhuman level, and which effects on the world such an algorithm gets pointed at are not strongly constrained by its original function.
What do you mean by “technical” here?

By “technical” I don’t mean anything specific, exactly. I’m gesturing vaguely at the cluster of things that look like math problems, math questions, scientific investigations, natural philosophy, and engineering, and less like political problems, aesthetic goals, lawyering, warfare, or cultural change: the sort of thing that takes a long time and might not happen at all, because it involves long chains of prerequisites on prerequisites. Art might be an example of something that’s not “technical” but still matches this description; I don’t know the history, but from afar it seems like there’s actually quite a lot of progress in art, and it’s somewhat firmly sequential / built on prerequisites: perspective is something you invent, you only get cubism after perspective, and cubism seems like a stepping stone towards more abstraction… So if the fate of everything depended on artistic progress, we’d want to be persistently working on art, refining and discovering concepts, even if we weren’t pure of soul.
How do you know they don’t generalize? As far as I know, no one has solved these problems for coalitions of agents, whether human, theoretical, or otherwise.
Well, the standard example is evolution: the compact mechanisms first discovered by the gradient-climbing search for fit organisms generalized to perform effectively in many domains, but not particularly to maximize fitness; we don’t monomaniacally maximize our number of offspring (which would improve our genetic fitness a lot relative to what we actually do).
Human coalitions are made of humans, and humans come ready-built with roughly the same desires and shape of cognition as you. That makes them vastly easier to interface with, and to understand intuitively, at least approximately.
I was thinking specifically here of maximizing the value function (desires) across the agents interacting with each other. Or, more specifically, of adapting the system so that it maintains the “maximizing the value function (desires) across the agents” property on its own.
An example is an economic system which seeks to maximize total welfare. Current systems, though, don’t maintain themselves: more powerful agents take over the control mechanisms (or adjust the market rules) so that they are favoured (lobbying, cheating, ignoring the rules, mitigating enforcement). Similar problems occur in other types of coalitions.
Postulating a more powerful agent that enforces this maximization property (an aligned super-AGI) is cheating, unless you can describe how that agent works and how it maintains both itself and this goal.
However, arriving at a system of agents that maintains this property on its own, with no “super agent”, might lead to solutions for AGI alignment, or might prevent the creation of such a misaligned agent.
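To gesture at this a bit more concretely (just a rough sketch, with notation I’m making up here rather than anything from a worked-out theory): suppose each agent $i$ has a value function $u_i$, outcomes are determined by the agents acting under a rule set $r$, and the system’s outcome under those rules is $x(r)$. The property I mean is something like

$$x(r) \in \arg\max_x \sum_i u_i(x),$$

plus self-maintenance: no agent (or coalition) has an available move that changes $r$ to some $r'$ (via lobbying, cheating, capturing enforcement, etc.) such that its own $u_i$ goes up while $x(r')$ no longer maximizes $\sum_i u_i(x)$. The failure mode I described above is exactly the existence of such a profitable rule-changing move.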
I read a while ago that the design/theory of corruption-resistant systems is an area that has not received much research.
However, arriving at a system of agents that maintains this property on its own, with no “super agent”, might lead to solutions for AGI alignment, or might prevent the creation of such a misaligned agent.
I doubt that, because intelligence explosions, or their lead-ups, make things local.
Yeah, mainly things such that solving them for human coalitions/firms doesn’t generalize.
I actually think it necessarily does, and that the method by which unFriendly egregores control us exploits and maintains a gap in our thinking that prevents us from solving the AGI alignment problem.
However! That’s up for debate, and given the uncertainty I think your highlighting this concern makes sense. You might turn out to be right.
(But I still think sorting out egregoric Friendliness is upstream of solving the technical AI alignment problem, even if the thinking from one doesn’t transfer to the other.)
the method by which unFriendly egregores control us exploits and maintains a gap in our thinking that prevents us from solving the AGI alignment problem
I’m skeptical but definitely interested, if you have already expanded on this somewhere or expand on it at some point. E.g. what can you say about what precisely this method is; what’s the gap it maintains; why do you suspect it prevents us from solving alignment; what might someone without this gap say about alignment; etc.
sorting out egregoric Friendliness is upstream of solving the technical AI alignment problem, even if the thinking from one doesn’t transfer to the other.
Leaving aside the claim about upstreamness, I upvote keeping this distinction live (since in fact I hold almost as strong a version of the claim as you seem to, but I’m pretty skeptical about the transfer).
I’m skeptical but definitely interested, if you have already expanded on this somewhere or expand on it at some point. E.g. what can you say about what precisely this method is; what’s the gap it maintains; why do you suspect it prevents us from solving alignment; what might someone without this gap say about alignment; etc.
I haven’t really detailed this anywhere, but I just expanded on it a bit in my reply to Kaj.