One approach I can see for AGI safety is to build in some low-level discriminator that prevents the agent from collecting massive reward. If the agent expects to get near-infinite reward in the near future by wiping out humanity using nanotech, the discriminator would rule that plan out, so the agent instead decides to do something that earns it a bounded amount of reward (like obeying our commands).
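To make the idea concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the action names, the reward numbers, the `reward_cap` parameter, and the `pick_action` helper are all hypothetical, and a real agent would not hand you a tidy table of expected rewards like this.

```python
def pick_action(expected_rewards, reward_cap):
    """Pick the best action among those whose expected reward stays under the cap.

    expected_rewards: dict mapping a (hypothetical) action name to its expected reward.
    reward_cap: the discriminator's threshold; anything above it is rejected outright.
    """
    # The "discriminator": drop every action that promises suspiciously large reward.
    allowed = {a: r for a, r in expected_rewards.items() if r <= reward_cap}
    if not allowed:
        return None  # nothing acceptable, so the agent does nothing
    # Among the remaining actions, behave like an ordinary reward maximizer.
    return max(allowed, key=allowed.get)


# Made-up numbers, purely for illustration.
expected_rewards = {
    "obey_commands": 10.0,
    "idle": 0.0,
    "wipe_out_humanity": float("inf"),
}

print(pick_action(expected_rewards, reward_cap=100.0))
```

With these numbers the near-infinite option is filtered out and the agent settles for `obey_commands`. Note that the agent still maximizes below the cap, so this only illustrates the filtering step, not a full answer.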
This has a parallel with drugs here on Earth. Most people are a little afraid of that type of high.
This probably isn’t an effective solution, but I’d love to hear why so I can keep refining my ideas.
A discussion of related ideas on Arbital: mild optimization.
Very cool! So this idea has been thought of, and it doesn’t seem totally unreasonable, though it definitely isn’t a perfect solution. A neat idea is a sort of ‘laziness’ score so that it doesn’t take too many high-impact options.
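As a rough sketch of what a ‘laziness’ score could look like (this is my own toy version, not anything taken from the linked discussion): subtract a penalty proportional to each option's impact from its reward. The `lazy_choice` helper, the `laziness` weight, and the impact numbers are all hypothetical, and the sketch simply assumes an impact number is available for each option.

```python
def lazy_choice(options, laziness=1.0):
    """Pick the option with the best reward after an impact penalty.

    options: dict mapping a (hypothetical) option name to (reward, impact).
    laziness: how strongly impact is penalized; 0 means plain reward maximization.
    """
    def score(name):
        reward, impact = options[name]
        return reward - laziness * impact

    return max(options, key=score)


# Made-up numbers, purely for illustration.
options = {
    "fetch_coffee": (5.0, 1.0),
    "rebuild_city": (50.0, 100.0),
    "do_nothing": (0.0, 0.0),
}

print(lazy_choice(options, laziness=1.0))  # penalized: prefers the low-impact option
print(lazy_choice(options, laziness=0.0))  # no penalty: goes for the biggest reward
```

With `laziness=1.0` the high-impact option scores 50 - 100 = -50 and loses to `fetch_coffee`; with `laziness=0.0` the agent goes straight for `rebuild_city`.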
It would be interesting to try to build an AI alignment testing ground, where you have a little simulated civilization and try to get an AI to align properly with it, given certain commands. I might try to create it in Unity to test some of these ideas out in a world that is less abstract than text and slightly more real.