Just Imitate Humans?
Do people think we could make a singleton (or achieve global coordination and preventative policing) just by imitating human policies on computers? If so, this seems pretty safe to me.
Some reasons for optimism: 1) imitated policies could be run much faster than a human thinks, and 2) we could make very many copies of them.
Acquiring data: put a group of people in a house with a computer. Show them things (images, videos, audio files, etc.) and give them a chance to respond at the keyboard. Their keyboard actions are the actions, and everything between actions is an observation. Then learn the policy of the group of humans. (By the way, these can be happy humans who earnestly try to follow instructions.)

To model their policy, we can take the maximum a posteriori estimate over a set of policies which includes the truth, and freeze the policy once we're satisfied. (This assumes unlimited computation; in real life we'd have to use heuristics and approximations.) With a maximum a posteriori estimate, the frozen policy will be quick to run, since we're no longer tracking tons of hypotheses, especially if we use some sort of speed prior. Let T be the number of interaction cycles we record before freezing the policy. For sufficiently large T, it seems to me that running this is safe.
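To make the MAP step a bit more concrete, here is a minimal sketch over a finite hypothesis class of candidate policies, with a crude speed-prior-style penalty on per-step compute. Everything here (the Policy container, select_map_policy, cost_per_step, the penalty constant) is an illustrative assumption, not a real design; an actual system would need the heuristics and approximations mentioned above.

```python
# Toy sketch: pick the maximum a posteriori policy from a finite hypothesis
# class, given T recorded (observation, action) interaction cycles, then
# "freeze" it by simply returning the winning policy.
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

History = Tuple[Tuple[str, str], ...]  # ((observation, action), ...) so far


@dataclass
class Policy:
    name: str
    # Probability the policy assigns to taking `action`, given the history
    # so far and the current observation.
    prob: Callable[[History, str, str], float]
    # Rough compute cost per step, used for a speed-prior-style penalty.
    cost_per_step: float


def log_prior(policy: Policy, penalty: float = 0.1) -> float:
    """Speed-prior-flavored prior: slower policies get exponentially less weight."""
    return -penalty * policy.cost_per_step


def log_likelihood(policy: Policy, data: Sequence[Tuple[str, str]]) -> float:
    """Log-probability the policy assigns to the recorded human actions."""
    total = 0.0
    history: History = ()
    for obs, action in data:
        p = policy.prob(history, obs, action)
        if p <= 0.0:
            return float("-inf")  # policy rules out something the humans did
        total += math.log(p)
        history += ((obs, action),)
    return total


def select_map_policy(policies: Sequence[Policy],
                      data: Sequence[Tuple[str, str]]) -> Policy:
    """Freeze the MAP policy after T = len(data) recorded interaction cycles."""
    return max(policies, key=lambda pi: log_prior(pi) + log_likelihood(pi, data))
```

The point of the sketch is just that, once the argmax is taken, running the frozen policy means evaluating a single hypothesis rather than maintaining a posterior over all of them.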
What are people’s intuitions here? Could enough human-imitating artificial agents (running much faster than people) prevent unfriendly AGI from being made?
If we think this would work, there would still be the (neither trivial nor hopeless) challenge of convincing all serious AGI labs that any attempt to run a superhuman AGI is unconscionably dangerous and that we should stick to imitating humans.