In a transformed-except-corporate-ownership-stays-the-same world, I don’t see any reason such lottery winners’ portion wouldn’t increase asymptotically toward 100 percent, with nobody else getting anything at all.
Even without an overtly revolutionary restructuring, I kind of doubt “OpenAI owns everything” would fly. Maybe corporate ownership would stay exactly the same, but there’d be a 99.999995 percent tax rate.
Well yeah, exactly.
Taxes enforced by whom?
Well, that’s where the “safe” part comes in, isn’t it?
I think a fair number of people would say that ASI/AGI can’t be called “safe” if it’s willing to wage war to physically take over the world on behalf of its owners, or to go around breaking laws all the time, or to thwart whatever institutions are supposed to make and enforce the laws. I’m pretty sure that even OpenAI’s (present) “safety” department would have an issue if ChatGPT started saying stuff like “Sam Altman is Eternal Tax-Exempt God-King”.
Personally, I go further than that. I’m not sure about “basic” AGI, but I’m pretty confident that very powerful ASI, the kind that would be capable of really total world domination, can’t be called “safe” if it leaves really decisive power over anything in the hands of humans, individually or collectively, directly or via institutions. To be safe, it has to enforce its own ideas about how things should go. Otherwise the humans it empowers are probably going to send things south irretrievably fairly soon, and if they don’t do so very soon they always still could, and you can’t call that safe.
Yeah, that means you get exactly one chance to get “its own ideas” right, and no, I don’t think that success is likely. I don’t think it’s technically likely that anybody will be able to “align” it to any particular set of values. I also don’t think people or institutions would make good choices about what values to give it even if they could. AND I don’t think anybody can prevent it from getting built for very long. I put more hope in it being survivably unsafe (maybe because it just doesn’t usually happen to care to do anything to/with humans), or in intelligence just not being that powerful, or whatever. Or even in it just luckily happening to at least do something less boring or annoying than paperclipping the universe or mass torture or whatever.
Not if you built a model that does (or on reflection decides to do) value learning: then you instead get to be its research subject and interlocutor while it figures out its ideas. But yes, you do need to start the model off close enough to aligned that it converges to value learning.
… assuming the values you want are learnable and “convergeable” upon. “Alignment” doesn’t even necessarily have a coherent meaning.
Actual humans aren’t “aligned” with each other, and they may not be consistent enough that you can say they’re always “aligned” with themselves. Most humans’ values seem to drive them toward vaguely similar behavior in many ways… albeit with lots of very dramatic exceptions. How they articulate their values and “justify” that behavior varies even more widely than the behavior itself. Humans are frequently willing to have wars and commit various atrocities to fight against legitimately human values other than their own. Yet humans have the advantage of starting with a lot of biological commonality.
The idea that there’s some shared set of values that a machine can learn that will make everybody even largely happy seems, um, naive. Even the idea that it can learn one person’s values, or be engineered to try, seems really optimistic.
Anyway, even if the approach did work, that would just mean that “its own ideas” were that it had to learn about and implement your (or somebody’s?) values, and also that its ideas about how to do that are sound. You still have to get that right before the first time it becomes uncontrollable. One chance, no matter how you slice it.
Completely agreed that actual humans aren’t “aligned” with each other; see for example my post 3. Uploading, which makes this exact point at length.
As for “one chance, no matter how you slice it”: true, as I said just above. The point is that you now get that one shot at a far simpler task: defining “your purpose as an AI is to learn about and implement the humans’ collective values” is a lot more compact, and a lot easier to get right the first time, than an accurate description of human values in their full large-and-fairly-fragile detail. As I demonstrate in the post linked to above, the former, plus its justification as being obvious and stable under reflection, can be described in exhaustive detail in a few pages of text.
As for the model’s ideas on how to do that research being sound, that’s a capabilities problem: if the model is incapable of performing a significant research project when at least 80% of the answer is already in human libraries, then it’s not much of an alignment risk.
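(For concreteness, here is a minimal toy sketch of the distinction being drawn: instead of hard-coding a first-order objective, the only thing fixed up front is the meta-objective of working out, from human feedback, which candidate value model the humans actually hold. The candidate values, the Bayesian-style update, and the deference rule below are all illustrative assumptions, not a description of any actual alignment proposal.)

```python
# Toy sketch only: contrasts a hard-coded objective with a "value learning"
# meta-objective. Candidate values, likelihoods, and the deference rule are
# illustrative assumptions, not anyone's actual proposal.

# Candidate models of "what the humans value", each scoring an outcome.
CANDIDATE_VALUES = {
    "maximize_paperclips": lambda outcome: outcome == "paperclips",
    "preserve_human_autonomy": lambda outcome: outcome == "autonomy",
    "maximize_reported_happiness": lambda outcome: outcome == "happiness",
}

class ValueLearner:
    """Agent whose fixed goal is 'work out which values the humans hold',
    rather than any fixed first-order goal. Starts out uniformly uncertain."""

    def __init__(self) -> None:
        n = len(CANDIDATE_VALUES)
        self.beliefs = {name: 1.0 / n for name in CANDIDATE_VALUES}

    def update(self, outcome: str, human_approved: bool) -> None:
        # Bayesian-style update: value models that predict the human's
        # reaction to this outcome gain probability mass.
        for name, value_fn in CANDIDATE_VALUES.items():
            likelihood = 0.9 if value_fn(outcome) == human_approved else 0.1
            self.beliefs[name] *= likelihood
        total = sum(self.beliefs.values())
        self.beliefs = {name: p / total for name, p in self.beliefs.items()}

    def act(self) -> str:
        # Defer to the currently most probable value model; the remaining
        # uncertainty is what keeps this toy agent responsive to feedback.
        best = max(self.beliefs, key=self.beliefs.get)
        return {
            "maximize_paperclips": "paperclips",
            "preserve_human_autonomy": "autonomy",
            "maximize_reported_happiness": "happiness",
        }[best]

agent = ValueLearner()
agent.update("paperclips", human_approved=False)   # humans reject paperclipping
agent.update("autonomy", human_approved=True)      # humans endorse autonomy
print(agent.beliefs)
print(agent.act())
```

The only point of the toy is that what has to be specified before the first shot is the update-and-defer loop, not the value models themselves.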
I definitely agree that, under the more common usage of “safety”, an AI taking over the world or breaking laws because its owner ordered it to would not be classified as safe, but in an AI safety context, alignment/safety does usually mean that these outcomes would be classified as safe.
My own view is that the technical problem is shaping up to be relatively easy, but I think that the political problems of advanced AI will probably prove a lot harder, especially in a future where humans control AIs for a long time.
I’m not necessarily going to argue with your characterization of how the “AI safety” field views the world. I’ve noticed myself that people say “maintaining human control” pretty much interchangeably with “alignment”, and use both of those pretty much interchangeably with “safety”. And all of the above have their own definition problems.
I think that’s one of several reasons that the “AI safety” field has approximately zero chance of avoiding any of the truly catastrophic possible outcomes.
I agree that the conflation of maintaining human control with alignment and with safety is a problem, and to be clear I’m not saying that a human-controlled AI taking over because someone ordered it to would be an objectively safe outcome.
I agree that, at present, the AI safety field is poorly equipped to avoid catastrophic outcomes that don’t involve extinction from uncontrolled AIs.