and even finding any given person’s individual volition may be arbitrarily path-dependent
Sure, but it doesn’t need to be path-independent, it just needs to have pretty good expected value over possible paths.
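To make that slightly more concrete (rough notation of my own, not anything from the CEV writeups): if $\pi$ ranges over possible extrapolation paths with some distribution $P$, and $V$ is how I’d evaluate the volition you end up with, the claim is only that $\mathbb{E}_{\pi \sim P}[V(\mathrm{CEV}_\pi(\text{person}))]$ is acceptably high, not that $\mathrm{CEV}_\pi$ comes out the same for every $\pi$.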
Whether something is a utopia or a dystopia is a matter of opinion. Some people’s “utopias” may be worse than death from other people’s point of view.
That’s fair (though given the current distribution of people likely to launch the AI, I’m somewhat optimistic that we won’t get such a dystopia) — but the people getting confused about that question aren’t asking it because they have such concerns, they’re usually (in my experience) asking it because they’re confused way upstream of that, and if they were aware of the circumstances they’d be more likely to focus on solving alignment than on asking “aligned to whom”. I agree that the question makes sense, I just think the people asking it wouldn’t endorse-under-reflection focusing on that question in particular if they were aware of the circumstances. Maybe see also this post.
Most actually implementable agents probably don’t have coherent utility functions […]
I think the kind of AI likely to take over the world can be described closely enough in such a way. Certainly for the kind of aligned AI that saves the world, it seems likely to me that expected utility is a sufficient frame for thinking about how it reasons about its impact on the world.
That’s fair (though given the current distribution of people likely to launch the AI, I’m somewhat optimistic that we won’t get such a dystopia) — but the people getting confused about that question aren’t asking it because they have such concerns, they’re usually (in my experience) asking it because they’re confused way upstream of that
I disagree. I think they’re concerned about the right thing for the right reasons, and the attempt to swap-in a different (if legitimate, and arguably more important) problem instead of addressing their concerns is where a lot of communication breaks down.
I mean, yes, there is the issue that it doesn’t matter which monkey finds the radioactive banana and drags it home, because that’s going to irradiate the whole tribe anyway. Many people don’t get it, and this confusion is important to point out and resolve.
But once it is resolved, the “but which monkey” question returns. Yes, currently AGI is unalignable. But since we want to align it anyway, and we’re proposing ways to make that happen, what’s our plan for that step? Who’s doing the aligning, what are they putting in the utility function, and why would that not be an eternal-dystopia hellscape which you’d rather burn down the world attempting to prevent than let happen?
They see a powerful technology on the horizon, and see people hyping it up as something world-changingly powerful. They’re immediately concerned regarding how it’ll be used. That there’s an intermediary step missing – that we’re not actually on-track to build the powerful technology, we’re only on-track to create a world-ending explosion – doesn’t invalidate the question of how that technology will be used if we could get back on-track to building it.
And if that concern gets repeatedly furiously dismissed in favour of “but we can’t even build it, we need to do [whatever] to build it”, that makes the other side feel unheard. And regardless of how effectively you argue the side of “the current banana-search strategies would only locate radioactive bananas” and “we need to prioritize avoiding radiation”, they’re going to stop listening in turn.
Okay yeah this is a pretty fair response actually. I think I still disagree with the core point (that AI aligned to current people-likely-to-get-AI-aligned-to-them would be extremely bad) but I definitely see where you’re coming from.
Do you actually believe extinction is preferable to rolling the dice on the expected utility (according to your own values) of what happens if one of the current AI org people launches AI aligned to themself?
Even if, in worlds where we get an AI aligned to a set of values that you would like, that AI then acausally pays AI-aligned-to-the-“wrong”-values in different timelines to not run suffering? E.g., Bob’s AI runs a bunch of things Alice would like in Bob’s AI’s timelines, in exchange for Alice’s AI not running things Bob would very strongly dislike.
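Rough sketch of why such a trade can be positive-sum for both sides (my framing, heavily simplified, ignoring the genuinely hard parts of acausal trade like actually modeling the counterpart): write $p_A, p_B$ for the probabilities that Alice’s or Bob’s AI ends up in control, and $u_A, u_B$ for their utilities. The deal is worth it for Alice if $p_B \cdot u_A(\text{what Bob's AI runs for her}) > p_A \cdot u_A(\text{what her AI forgoes})$, and symmetrically for Bob; when each side forgoes things it barely values but the other values (or disvalues) a lot, both inequalities can hold at once.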
I think the kind of AI likely to take over the world can be described closely enough in such a way. Certainly for the kind of aligned AI that saves the world, it seems likely to me that expected utility is a sufficient frame for thinking about how it reasons about its impact on the world.
What observations are backing this belief? Have you seen approaches that share some key characteristics with expected utility maximization, that have worked in real-world situations, and where you expect that the characteristics that made them work in the situations you observed will transfer? If so, would you be willing to elaborate?
On the flip side, are there any observations you could make in the future that would convince you that expected utility maximization will not be a good model to describe the kind of AI likely to take over the world?
CEV-ing just one person is enough for the “basic challenge” of alignment as described on AGI Ruin.
I thought the “C” in CEV stood for “coherent” in the sense that it had been reconciled over all people (or over whatever set of preference-possessing entities you were taking into account). Otherwise wouldn’t it just be “EV”?
I think the kind of AI likely to take over the world can be described closely enough in such a way.
So are you saying that it would literally have an internal function that represented “how good” it thought every possible state of the world was, and then solve an (approximate) optimization problem directly in terms of maximizing that function? That doesn’t seem to me like a problem you could solve even with a Jupiter brain and perfect software.
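To spell out what I’m picturing (my own formalization; you may mean something weaker): choosing actions via something like $\arg\max_a \sum_{s'} P(s' \mid s, a)\, U(s')$, where $U$ scores every possible world-state $s'$. Even approximating that sum over the space of world-states is the part that strikes me as intractable.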
I thought the “C” in CEV stood for “coherent” in the sense that it had been reconciled over all people (or over whatever set of preference-possessing entities you were taking into account). Otherwise wouldn’t it just be “EV”?
I mean I guess, sure, if “CEV” means over-all-people then I just mean “EV” here.
Just “EV” is enough for the “basic challenge” of alignment as described on AGI Ruin.
So are you saying that it would literally have an internal function that represented “how good” it thought every possible state of the world was, and then solve an (approximate) optimization problem directly in terms of maximizing that function?
Or do something which has approximately that effect.
That doesn’t seem to me like a problem you could solve even with a Jupiter brain and perfect software.
I disagree! I think some humans right now (notably people particularly focused on alignment) already do something vaguely EUmax-shaped, and an ASI capable of running on current compute would definitely be able to do something more EUmax-shaped. Very, very far from actual “pure” EUmax of course, but more than sufficient to defeat all humans, who are even further from pure EUmax. Maybe see also this comment of mine.
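As a toy illustration of what I mean by “EUmax-shaped without being pure EUmax” (purely illustrative sketch; names like sample_plans and rollout are made up, not a claim about how any actual system works): an agent that can only afford to look at a handful of candidate plans, and can only noisily estimate their consequences, is nowhere near a pure expected-utility maximizer, yet the shape of what it’s doing is still “pick the option with the highest estimated expected utility”, and that shape seems like enough.

```python
def roughly_eumax_choice(state, sample_plans, rollout, utility,
                         n_plans=8, n_rollouts=4):
    """Pick the candidate plan with the highest *estimated* expected utility.

    Nothing here enumerates world-states: it only considers a few sampled
    plans and a few noisy rollouts of each, so it is very far from "pure"
    EU maximization, but its overall shape is still an argmax over an
    expected-utility estimate.
    """
    best_plan, best_value = None, float("-inf")
    for plan in sample_plans(state, n_plans):             # small candidate set
        outcomes = [rollout(state, plan) for _ in range(n_rollouts)]
        value = sum(utility(o) for o in outcomes) / n_rollouts  # Monte Carlo estimate
        if value > best_value:
            best_plan, best_value = plan, value
    return best_plan
```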