Thanks for the responses. I’ll try to address them individually.
If the function is ‘fill it and see it is filled forever’ then strange things may be required to accomplish that (to us) strange goal.
I agree that this doesn’t adequately represent our goal, but I think the problem persists even when we add lots of qualifications like “make sure the glass is filled with water for the next five minutes and then lose interest”. The maximum of that function might not include a large-scale plan due to limited time, but it could include destroying everything within range except for the facility to prevent interference. It’s possible that adding enough qualifications would solve this, but it wouldn’t be easy to verify.
Do you have any idea how to do “Don’t specify our goals to AI using functions.”? How are you judging “if we build AI that doesn’t maximize a function, it won’t be competitive with AI that does”?
I don’t know how to achieve the same capabilities as current or future machine learning without specifying goals using functions. In that sense, I think it would be hard to match something like GPT without deep learning, and so more legible alternatives wouldn’t be competitive. (I might be understating this. It seems like function-based learning is the only method we have that works.)
This one is worse than it looks (though it seems underspecified). Goal 1: some notion of human flourishing. Goal 2: prevent goal 1 from being maximized. (If this is the opposite of 1, you may have just asked to be nuked.)
I was thinking of Robin Hanson’s idea that the competitive market of many AIs would prevent any individual AI from taking over. I don’t think that would work either, but I agree that intentionally designing opposing AIs would be even worse.
For all the ‘a plan that handles filling a glass of water, generated using time t’ ‘is flawed’ - this could actually work.
It seems like humans are often kept safe from each other by limited resources and limited thinking time, so I agree that this could be a promising approach. But we would have to prevent a limited AI from increasing its own capabilities.
How big a file do you think an AI is?
Maybe it’s not as easy as copying a piece of software, but it’s probably easier than building a nuclear weapon in terms of resources. If running it requires an uncommon amount of computing power, then you’re right, it would be hard to copy.
Maximizing a function isn’t always easy, even at the level of ‘find the maximum of this function mathematically’.
You’re right, achieving the global maximum for many functions would be infeasible. The risk comes when the space of high-value bad outcomes overlaps with the space of feasible strategies for the AI. That overlap is not necessarily at or even near the global maximum. This way of framing the problem might be more accurate.
It also seems like, if the goal is to fill a glass with water, then the goal is achieved when the glass is filled with water.
This is true unless the AI is trying to maximize the probability of success, or the proximity to some exact amount of fullness, or some other precise goal. If it works by satisfying goals without maximizing anything, then the problem might be solved. But I don’t think we know how to build powerful AI that satisfies goals without maximizing anything.
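To make the distinction concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the `Plan` class, the three plans, their probabilities, and the 0.95 threshold are all assumptions, not a model of any real system. It only shows how "maximize the probability of success" and "stop at good enough" can pick different plans from the same list.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Plan:
    name: str
    success_prob: float  # hypothetical estimate that the glass ends up full


# Hypothetical candidate plans, listed in the order they are considered.
PLANS = [
    Plan("pour from the tap", 0.990),
    Plan("pour, then guard the glass", 0.999),
    Plan("seize the water supply", 0.9999),
]


def maximizer(plans: Sequence[Plan]) -> Plan:
    """Pick the plan with the highest success probability, however extreme."""
    return max(plans, key=lambda p: p.success_prob)


def satisficer(plans: Sequence[Plan], threshold: float = 0.95) -> Optional[Plan]:
    """Return the first plan that clears the threshold, or None if none does."""
    return next((p for p in plans if p.success_prob >= threshold), None)


print("maximizer picks: ", maximizer(PLANS).name)
print("satisficer picks:", satisficer(PLANS).name)
```

In this toy setup the maximizer is drawn to the most extreme plan because it squeezes out a little more probability, while the satisficer stops at the first plan that is good enough. It says nothing about how to get that behavior out of a powerful learned system, which is the part I don’t think we know how to do.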