One minor problem: AIs might be asked to solve problems with no known solutions (e.g., write code that passes these test cases) and might be pitted against one another (e.g., find test cases for which these two functions are not equivalent).
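To make the second example concrete, here is a minimal sketch of how a proposed counterexample could be scored without anyone holding an answer key: the checker only has to run both functions on the proposed input. The two functions, the reward values, and the random search are all hypothetical, chosen purely for illustration.

```python
import random
from typing import Optional


def f_reference(x: int) -> int:
    # Hypothetical function A: absolute value.
    return abs(x)


def f_candidate(x: int) -> int:
    # Hypothetical function B: matches A except on negative multiples of 7.
    if x < 0 and x % 7 == 0:
        return x
    return -x if x < 0 else x


def reward(counterexample: int) -> float:
    # 1.0 if the two functions disagree on the proposed input, else 0.0.
    # No stored "correct answer" is consulted; the check is just execution.
    if f_reference(counterexample) != f_candidate(counterexample):
        return 1.0
    return 0.0


def random_search(trials: int = 10_000, seed: int = 0) -> Optional[int]:
    # Stand-in for the solver: sample inputs until one earns a reward.
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randint(-1000, 1000)
        if reward(x) == 1.0:
            return x
    return None
```

In this sketch the trainer can verify an answer cheaply even though it never knew one in advance, which is the situation the "no known solutions" case describes.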
I’d agree that this is plausible, but in the scenarios where the AI can read the literal answer key, it can probably also read out the OS code and hack the entire training environment.
RL training will be parallelized. Multiple instances of the AI might be interacting with individual sandboxed environments on a single machine. In this case, communication between instances will definitely be possible unless all timing cues can be removed from the sandbox environment, which won’t be done.
One minor problem: AIs might be asked to solve problems with no known solutions (e.g., write code that passes these test cases) and might be pitted against one another (e.g., find test cases for which these two functions are not equivalent).
That’s definitely something people might ask the AI to do during deployment / inference, but during training via SGD, the problem the AI is asked to solve has to be one for which the trainer knows an answer, in order to calculate a loss and a gradient.
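For reference, the supervised setup described here can be sketched as follows. This is a toy one-parameter model with a squared-error loss, both assumptions made only to show where the known answer enters the computation; the function names and learning rate are hypothetical.

```python
from typing import Tuple


def squared_error_loss_and_grad(w: float, x: float, target: float) -> Tuple[float, float]:
    # Toy model: prediction = w * x. The target is the known answer;
    # without it, neither the loss nor the gradient below can be computed.
    prediction = w * x
    error = prediction - target
    loss = error ** 2
    grad = 2.0 * error * x  # d(loss)/dw
    return loss, grad


def sgd_step(w: float, x: float, target: float, lr: float = 0.1) -> float:
    # One gradient-descent update on the single parameter w.
    _, grad = squared_error_loss_and_grad(w, x, target)
    return w - lr * grad
```

For example, with w = 1.0, x = 2.0 and target = 6.0, the error is -4.0, so the update moves w toward 3.0, the value that reproduces the target.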