I guess I’m still missing some of the positive story. Why should the AI’s optimal use of the encoder-human-decoder part of the graph for computation involve clearly explaining to the human what it thinks will happen in the future?
Why wouldn’t it do things like showing us a few different video clips and asking which looks most realistic to us, without giving any information about what it “really thinks” about the diamond? Or even worse, why wouldn’t gradient descent learn to just treat the fixed human-simulator as a “reservoir” that it can use in unintended ways?
I do think this is what happens given the current architecture. I argued, as a sanity check, that the desired outcome solves narrow ELK, but I’m not claiming that the desired setup is uniquely loss-minimizing.
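For concreteness, here is a minimal sketch of the worry (the module names and dimensions are hypothetical, not the actual setup): with a frozen human-simulator sandwiched between a trainable encoder and decoder, nothing in the training objective requires the encoder’s message to be an honest, legible report — gradient descent is free to treat the fixed module as an arbitrary nonlinear map, i.e. a “reservoir.”

```python
import torch
import torch.nn as nn

class EncoderHumanDecoder(nn.Module):
    """Toy encoder -> frozen human-simulator -> decoder pipeline (illustrative only)."""

    def __init__(self, latent_dim=64, report_dim=32, answer_dim=16, out_dim=64):
        super().__init__()
        self.encoder = nn.Linear(latent_dim, report_dim)          # trainable
        self.human_sim = nn.Sequential(                           # pretrained, then frozen
            nn.Linear(report_dim, answer_dim), nn.Tanh()
        )
        for p in self.human_sim.parameters():
            p.requires_grad_(False)
        self.decoder = nn.Linear(answer_dim, out_dim)             # trainable

    def forward(self, predictor_state):
        report = self.encoder(predictor_state)  # need not honestly describe the diamond
        answer = self.human_sim(report)         # fixed map; behaves like a reservoir
        return self.decoder(answer)             # decoder exploits whatever info survives

# Training only the encoder/decoder to reduce prediction loss puts no pressure
# on `report` to be a clear, truthful explanation of what happens to the diamond.
model = EncoderHumanDecoder()
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)
```

Under these assumptions, the frozen human-simulator just contributes fixed nonlinear structure between two trainable layers, which is exactly the “use it in unintended ways” outcome described above.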
Part of my original intuition was that “the human net is set up to be useful for prediction given honest information about the situation,” and that “pressure for simpler reporters will force some kind of honesty, but I don’t know how far that goes.” As time passed I became more and more aware that this wasn’t the only (or best) way for the human net to help with prediction, and turned more toward searching for a crisp counterexample.