CEV-ing just one person is enough for the “basic challenge” of alignment as described on AGI Ruin.
I thought the “C” in CEV stood for “coherent” in the sense that it had been reconciled over all people (or over whatever set of preference-possessing entities you were taking into account). Otherwise wouldn’t it just be “EV”?
> I think the kind of AI likely to take over the world can be described closely enough in such a way.
So are you saying that it would literally have an internal function that represented “how good” it thought every possible state of the world was, and then solve an (approximate) optimization problem directly in terms of maximizing that function? That doesn’t seem to me like a problem you could solve even with a Jupiter brain and perfect software.
> I thought the “C” in CEV stood for “coherent” in the sense that it had been reconciled over all people (or over whatever set of preference-possessing entities you were taking into account). Otherwise wouldn’t it just be “EV”?
I mean, I guess, sure; if “CEV” means reconciled-over-all-people then I just mean “EV” here.
Just “EV” is enough for the “basic challenge” of alignment as described on AGI Ruin.
> So are you saying that it would literally have an internal function that represented “how good” it thought every possible state of the world was, and then solve an (approximate) optimization problem directly in terms of maximizing that function?
Or do something which has approximately that effect.
> That doesn’t seem to me like a problem you could solve even with a Jupiter brain and perfect software.
I disagree! I think some humans right now (notably people particularly focused on alignment) already do something vaguely EUmax-shaped, and an ASI capable of running on current compute would definitely be able to do something more EUmax-shaped. Very, very far from actual “pure” EUmax, of course; but more than sufficient to defeat all humans, who are even further from pure EUmax. Maybe see also this comment of mine.
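To make the disagreement concrete, here is a minimal, purely illustrative sketch of what “EUmax-shaped but far from pure EUmax” decision-making could look like: score a small sampled set of candidate plans with a noisy utility estimate and take the argmax, rather than optimizing over all possible world-states. The functions `propose_plans` and `utility_estimate`, the toy preference vector, and the sample count below are all hypothetical stand-ins, not anything from this thread.

```python
import random

def propose_plans(n):
    """Generate n candidate plans (stand-in: random 3-d vectors)."""
    return [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]

def utility_estimate(plan):
    """Noisy, approximate estimate of how good a plan's outcomes are.
    A real agent would use a learned world-model; this toy version
    just scores plans against a fixed preference vector plus noise."""
    preference = [1.0, -0.5, 2.0]  # hypothetical, for illustration only
    signal = sum(p * w for p, w in zip(plan, preference))
    return signal + random.gauss(0, 0.1)

def choose_plan(n_candidates=1000):
    """Pick the candidate with the highest estimated utility.
    This is an argmax over a small sampled set, not over all possible
    plans -- far from 'pure' EUmax, but recognizably EUmax-shaped."""
    candidates = propose_plans(n_candidates)
    return max(candidates, key=utility_estimate)

if __name__ == "__main__":
    print(choose_plan())
```

The point of the sketch is that the argmax ranges over only a thousand sampled plans scored by a noisy estimate; nothing here requires a Jupiter brain, yet the resulting behavior is still maximization-shaped.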