I just mean that “wildly different levels of intelligence” is probably not necessary, and maybe even harmful. Because then there would be a few very smart AIs at the top, which could usurp power without the smaller AIs even noticing.
Though it might still work if those AIs are the smartest but have little authority. For example, they could monitor other AIs and raise an alarm / switch them off if they misbehave, but nothing else.
Part of the idea is to ultimately have a superintelligent AI treat us the way it would want to be treated if it ever met an even more intelligent being (e.g., one created by an alien species, or one that it itself creates).

In order to do that, I want it to ultimately develop a utility function that gives value to agents regardless of their intelligence.

Indeed, for this to work, intelligence cannot be the only predictor of success in this environment; agents must benefit from cooperation with those of lower intelligence. But this should certainly be doable as part of the environment design.

As part of that, the training would explicitly include the case where an agent is the smartest around for a time, but then a smarter agent comes along and treats it based on the way it treated weaker AIs. Perhaps even include a form of “reincarnation” where the agent doesn’t know its own future intelligence level in other lives.
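To make that environment idea concrete, here is a minimal toy sketch (my own illustration, not a worked-out proposal) of the reciprocity-plus-reincarnation setup: intelligence is re-rolled each episode, cooperating with weaker agents pays off for both sides, and at the end of each “life” a strictly smarter agent treats each agent the way it treated those below it. The `Agent` class, payoff numbers, and random policy are all arbitrary assumptions chosen for illustration; a real setup would learn the policy rather than sample it.

```python
# Toy sketch of a "reincarnation" environment: intelligence is reassigned each
# episode, and a smarter agent mirrors back how you treated weaker agents.
# All payoff values here are arbitrary illustrative assumptions.

import random

COOPERATE, EXPLOIT = "cooperate", "exploit"

class Agent:
    def __init__(self, name):
        self.name = name
        self.policy = {}          # intelligence gap -> chosen action (unlearned here)
        self.total_reward = 0.0

    def act(self, gap_to_weaker):
        # Placeholder policy: random by default; a real setup would train this.
        return self.policy.get(gap_to_weaker, random.choice([COOPERATE, EXPLOIT]))

def run_episode(agents, n_rounds=10):
    # "Reincarnation": intelligence is re-rolled every episode, so no agent
    # can count on staying the smartest one around.
    intelligence = {a: random.random() for a in agents}
    history = {a: [] for a in agents}   # how each agent treated weaker agents

    for _ in range(n_rounds):
        a, b = random.sample(agents, 2)
        strong, weak = (a, b) if intelligence[a] >= intelligence[b] else (b, a)
        gap = round(intelligence[strong] - intelligence[weak], 1)

        action = strong.act(gap)
        history[strong].append(action)

        if action == COOPERATE:
            # Cooperation benefits both sides, so intelligence alone is not
            # the only predictor of success in this environment.
            strong.total_reward += 2.0
            weak.total_reward += 2.0
        else:
            # Exploitation pays the stronger agent slightly more right now...
            strong.total_reward += 3.0
            weak.total_reward -= 1.0

    # ...but at the end of the episode a strictly smarter agent shows up and
    # treats each agent the way it (on average) treated weaker agents.
    for a in agents:
        if history[a]:
            coop_rate = history[a].count(COOPERATE) / len(history[a])
            a.total_reward += 4.0 * coop_rate - 2.0 * (1 - coop_rate)

if __name__ == "__main__":
    agents = [Agent(f"agent_{i}") for i in range(4)]
    for _ in range(100):
        run_episode(agents)
    for a in agents:
        print(a.name, round(a.total_reward, 1))
```

With these particular payoffs, a policy that exploits weaker agents gains a little per round but loses more when the smarter agent arrives, which is the incentive structure the training is meant to create.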
While having lower intelligence, humans may have greater authority. And the AIs' terminal goals should be about assisting humans specifically, too.
Ideally, sure, except that I don’t know of a way to make “assist humans” a safe goal. So I’m advocating for a variant of “treat humans as you would want to be treated”, which I think can be trained.