The AI safety community claims it is hard to specify reward functions. If we actually believe this claim, we should be able to create tasks where even if we allow people to specify reward functions, they won’t be able to do so. That’s what we’ve tried to do here.
It being hard to specify reward functions for a specific task and it being hard to specify reward functions for a more general AGI seem to me like two very different problems.
Additionally, developing a safe system and developing an unsafe system are very different problems. Even if your reward function works 99.9% of the time, it can be exploited in the cases where it fails.
Okay, regardless of what the AI safety community claims, I want to make that claim.
(I think a substantial chunk of the AI safety community also makes that claim but I’m not interested in defending that here.)
It being hard to specify reward functions for a specific task and it being hard to specify reward functions for a more general AGI seem to me like two very different problems.
As an aside, if I thought we could build task-specific AI systems for arbitrary tasks, and only super general AI systems were dangerous, I’d be advocating really hard for sticking with task-specific AI systems and never building super general AI systems (or only building them after some really high threshold of safety was met).
if I thought we could build task-specific AI systems for arbitrary tasks, and only super general AI systems were dangerous, I’d be advocating really hard for sticking with task-specific AI systems and never building super general AI systems
The problem with this is that you need an AI whose task is “protect humanity from unaligned AIs”, which is already very “general” in a way (i.e. it requires operating on large scales of space, time, and strategy), unless you can effectively reduce it to many “narrow” tasks, which is probably not impossible but also not easy.
I think it’s very easy to say “don’t build general systems, build task-specific ones”, but hard to make that stick, because general systems might promise a lot of economic returns.
A task like “Handle this Amazon customer query correctly” is already very general, as it includes a host of long-tail issues, including possible bugs (some of them unknown). If a customer runs into an issue on a page that is likely caused by a bug, a customer-service AI benefits from understanding the code that produces that issue.
Given the way economic pressures work, I see it as very probable that companies will just go ahead and do whatever is most efficient for their business goals.
It’s not clear that a system which doesn’t use a reward function avoids the same issue (with respect to the “99.9% of the time” point).
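To make the “99.9% of the time” point concrete, here is a minimal toy sketch (hypothetical numbers and states, not anything from the discussion above): a proxy reward that agrees with the true reward on all but one state can still be fully exploited, because an optimizer seeks out exactly the state where the proxy is wrong.

```python
# Toy illustration (hypothetical): a proxy reward that matches the true
# reward on 999 of 1000 states is still fully exploited by an optimizer.
import random

random.seed(0)

N_STATES = 1000
true_reward = {s: random.random() for s in range(N_STATES)}

# The proxy agrees with the true reward everywhere except one "buggy" state,
# which it massively over-scores (e.g. an unhandled edge case in the spec).
proxy_reward = dict(true_reward)
buggy_state = 123          # hypothetical edge case the designer missed
proxy_reward[buggy_state] = 1_000.0

# A random policy almost never hits the bug...
random_states = [random.randrange(N_STATES) for _ in range(10_000)]
avg_true_random = sum(true_reward[s] for s in random_states) / len(random_states)

# ...but an optimizer that maximizes the proxy lands on it every time.
optimized_state = max(range(N_STATES), key=proxy_reward.get)

print(f"random policy, avg true reward: {avg_true_random:.3f}")
print(f"optimized policy picks state {optimized_state}, "
      f"true reward: {true_reward[optimized_state]:.3f}, "
      f"proxy reward: {proxy_reward[optimized_state]:.1f}")
```

The sketch is agnostic about whether the scoring rule is called a “reward” or something else; any signal that is optimized against has the same failure mode when its rare mis-scored cases are reachable.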