Problem: It’s really hard to figure out how it will interpret its utility function when it learns about the real world. If we make something that wants Vpaperclips, will it also care about making Vpaperclip-like things in the real world if it finds out about us?
BIG problem: Even if it wants something strictly virtual, it can get it more easily if it has physical control. It’s in its interest to convert the universe into a computer and copy Vpaperclips directly into memory, rather than running a virtual factory on virtual energy.
Possible solution: I think there are ways to write its program such that even if it inferred our existence, it would optimize away from us rather than over us. Loosely: a goal like “I need to organize these instructions within this block of memory to solve a problem specified at address X” needs to be implemented such that it produces a subgoal like “I need to write a subroutine to patch over the fact that an error in the VM I’m running on gives me a window of access into a universe with huge computational resources and godlike power over my memory space, so that my solution can get the right answer to its arithmetic and solve the puzzle.” It should want to do things in a way that isn’t cheating.
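Roughly the kind of thing I have in mind, as a toy sketch (every name here is made up for illustration; a real goal system would be nothing this simple): utility is only awarded for solutions reached through the whitelisted workspace actions, so a correct answer obtained through a hole in the VM is worth nothing to the agent.

```python
from dataclasses import dataclass

# Actions the agent is "supposed" to use inside its designated memory block.
ALLOWED_ACTIONS = {"read_workspace", "write_workspace", "compute"}

@dataclass
class Step:
    action: str   # what the agent did at this step
    result: int   # toy stand-in for the state of its solution

def utility(trace: list[Step], target: int) -> float:
    """Reward correctness, but only for traces that never leave the sandbox."""
    if any(step.action not in ALLOWED_ACTIONS for step in trace):
        return 0.0   # any out-of-workspace access forfeits all utility
    return 1.0 if trace and trace[-1].result == target else 0.0

# An honest trace is worth more than one that reaches the right answer by
# exploiting a bug in the VM, so "cheating" is never instrumentally useful.
honest  = [Step("compute", 42)]
cheater = [Step("exploit_vm_bug", 42)]
assert utility(honest, 42) == 1.0 and utility(cheater, 42) == 0.0
```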
This was my line of thought a week or so ago. It has since developed to the point where the proper course seems to be to do away with the VM entirely, or at least with letting the AI run tests, and just have it go through the motions of working out a solution based on its understanding. If I could write an AI that can determine it needs to put an IF statement somewhere, actually outputting that statement is superfluous. Don’t put your AI in a virtual world; just make it understand one.
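Sketching what “understand a virtual world instead of running one” could look like (again, purely illustrative names and a cartoonishly small search space): the system never compiles or executes candidate code; it checks each candidate against its own model of how that code would behave, and only reports the conclusion, e.g. that the solution needs an IF statement.

```python
def modelled_outputs(candidate: str, inputs: list[int]) -> list[int]:
    """The agent's *model* of what the candidate program would output.
    Nothing here is executed as code; it is reasoning over a description."""
    if candidate == "return x":
        return inputs
    if candidate == "if x < 0: return -x, else return x":
        return [-x if x < 0 else x for x in inputs]
    return []

def plan(spec: dict[int, int]) -> str:
    """Pick the candidate whose modelled behaviour matches the spec."""
    candidates = ["return x", "if x < 0: return -x, else return x"]
    for c in candidates:
        if modelled_outputs(c, list(spec)) == list(spec.values()):
            return f"Conclusion: the solution needs '{c}' (never executed)."
    return "Conclusion: nothing in the modelled search space fits the spec."

# The absolute-value spec can only be met by the branching candidate, so the
# planner concludes an IF statement is needed without running any code.
print(plan({-3: 3, 5: 5}))
```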
Also, I plan to start development on a spiral notebook, as opposed to a Linux one.
Possible solution: I think there are ways to write its program such that even if it inferred our existence, it would optimize away from us rather than over us. Loosely: a goal like “I need to organize these instructions within this block of memory to solve a problem specified at address X” needs to be implemented such that it produces a subgoal like “I need to write a subroutine to patch over the fact that an error in the VM I’m running on gives me a window of access into a universe with huge computational resources and godlike power over my memory space, so that my solution can get the right answer to its arithmetic and solve the puzzle.” It should want to do things in a way that isn’t cheating.
Marcello had a crazy idea for doing this; it’s the only suggestion for AI-boxing I’ve ever heard that doesn’t have an obvious cloud of doom hanging over it. However, you still have to prove stability of the boxed AI’s goal system.
Can you link to (or otherwise more fully describe) this crazy idea?