I feel like I’ve seen this post before...
I think Alexander Kruel has been writing about things like this for a while (but arguing in the opposite direction). Here’s an example.
I find his arguments unpersuasive so far, but steelmanning a little: you could argue that giving an AI any goal at all would basically entail making it grok humans, and that the jump from that to correctly holding human values would be short.
? Never written anything like this… Have others?
Your post just seems to be introducing the concept of accidentally creating a super-powerful paperclip-maximizing AI, which is an idea that we’ve all been talking about for years. I can’t tell what part is supposed to be new—is it that this AI would actually be smart and not just an idiot savant?
The ideas that AIs follow their programming and that intelligence and values are orthogonal seem like pretty well-established concepts around here. And, in particular, a lot of our discussion about hypothetical Clippies has presupposed that they would understand humans well enough to engage in game-theoretic scenarios with us.
Am I missing something?
I’ve had an online conversation where it was argued that an AI having goals other than those intended by its programmers would be evidence of a faulty AI, and hence that it wouldn’t be a dangerous one. This post was a direct response to that.
Ah, I see. Fair enough, I agree.
It’s vaguely reminiscent of “a computer is only as stupid as its programmer” memes.