Are you predicting there won’t be any lethal autonomous weapons before AGI? It seems like if that ends up being true, it would only be because we coordinated well to prevent that. More generally, we don’t usually try to kill people, whereas we do try to build AGI.
(Whereas I think at least Paul usually thinks about people not paying the “safety tax” because the unaligned AI is still really good at e.g. getting them money, at least in the short term.)
Are you predicting there won’t be any lethal autonomous weapons before AGI?
No… thanks for pressing me on this.
Better at killing in a context where either: the operator would punish the agent if they knew, or the state would punish the operator if they knew. So the agent has to conceal its actions at whichever level the punishment would occur.
How about a recommendation engine that accidentally learns to show depressed people sequences of videos that affirm their self-hatred, leading them to commit suicide? (It seems plausible that something like this has already happened, though idk if it has.)
I think the thing you actually want to talk about is an agent that “intentionally” deceives its operator / the state? I think even there I’d disagree with your prediction, but it seems more reasonable as a stance (mostly because depending on how you interpret the “intentionally” it may need to have human-level reasoning abilities). Would it count if a malicious actor successfully finetuned GPT-3 to e.g. incite violence while maintaining plausible deniability?
Would it count if a malicious actor successfully finetuned GPT-3 to e.g. incite violence while maintaining plausible deniability?
Yes, that would count. I suspect that many “unskilled workers” would (alone) be better at inciting violence while maintaining plausible deniability than GPT-N at the point in time the leading group had AGI. Unless it’s OpenAI, of course :P
Regarding intentionality, I suppose I didn’t clarify the precise meaning of “better at”, which I did take to imply some degree of intentionality, or else I think “ends up” would have been a better word choice. The impetus for this point was Paul’s concern that someone would have used an AI to kill you to take your money. I think we can probably avoid the difficulty of a rigorous definition of intentionality if we gesture vaguely at “the sort of intentionality required for that to be viable”? But let me know if more precision would be helpful, and I’ll try to figure out exactly what I mean. I certainly don’t think we need to make use of a version of intentionality that requires human-level reasoning.