- I think this is worth considering even in cases like recursive prompting, for a couple of reasons that I haven't fully organized:
  - In general, additional compute time lets you improve your answer, so by default I'd expect answers to any significant question to take multiple steps.
  - I agree that for trivial questions it might not do any of this. My expectation is that the impacts of AI deployment will mostly come from non-trivial questions and tasks.
  - So far, current research hasn't done much to regularize or penalize models for using a lot of thinking.
  - Mostly, the optimization has been for "maximize the chance of a correct answer" (a minimal sketch of this kind of reward follows this list).
  - This could change in the future, especially in cases where the model is expected to be deployed or is otherwise sensitive to runtime costs.
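To make the "no penalty on thinking" point concrete, here is a minimal sketch of that kind of reward signal, with an optional length penalty that current setups mostly leave at zero. The function and parameter names are illustrative, not taken from any particular codebase:

```python
def reward(is_correct: bool, num_reasoning_tokens: int,
           length_penalty: float = 0.0) -> float:
    """Reward used to train or select reasoning traces.

    Current practice roughly corresponds to length_penalty = 0: the model is
    optimized only for getting the right answer. A deployment-sensitive setup
    might set length_penalty > 0 to charge for extra thinking tokens.
    """
    correctness = 1.0 if is_correct else 0.0
    return correctness - length_penalty * num_reasoning_tokens
```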
- Steganography is rewarded over openness if, for a given piece of information the model wants to transmit, the open statement of it is somehow far from the prior.
  - Caveat: this point is more speculative than the document above.
  - One of my points is that, due to null space + optimization pressure, it's very easy and convenient for gradient descent to put information into these hidden dimensions (see the toy sketch after this list).
  - I think one of the things this can do well is handle concepts that don't yet exist, and so reasoning moves like "invent a new term", "give an illustrative example", or "invent a thought experiment" have lower likelihood under the prior.
  - In these cases, steganography can be the more likely channel for transmitting certain information, if it is less expensive in terms of KL penalties (a rough formalization also follows this list).
  - I do agree that for straightforward reasoning on straightforward evidence, where the reasoning is human-like and close to the prior, we can expect the model to mostly reason out loud in a way that is "open".
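One way to picture the null-space point, as a toy sketch under my own assumptions rather than a claim about how any real model encodes things: many token choices are semantically interchangeable, so a policy can carry extra bits in which synonym it picks without changing the overt meaning or paying much of a likelihood cost.

```python
# Toy illustration of "hidden dimensions": interchangeable word choices form a
# null space with respect to the overt meaning, and a policy can use that
# freedom to carry one hidden bit per choice. Purely illustrative.
SYNONYMS = [("big", "large"), ("show", "demonstrate"), ("start", "begin")]

def encode(bits: list[int]) -> list[str]:
    """Pick one synonym per slot; the choice itself carries one hidden bit."""
    return [pair[b] for pair, b in zip(SYNONYMS, bits)]

def decode(words: list[str]) -> list[int]:
    """Recover the hidden bits from which synonym was chosen in each slot."""
    return [pair.index(w) for pair, w in zip(SYNONYMS, words)]

assert decode(encode([1, 0, 1])) == [1, 0, 1]
```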
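And a rough formalization of the KL point, in my own notation: with a KL-regularized objective of the kind used in RLHF, where $R$ is the task reward, $\pi$ the trained policy, $\pi_0$ the pretrained prior, and $\beta$ the KL coefficient,

$$\max_\pi \; \mathbb{E}_{x \sim \pi}\,[R(x)] \;-\; \beta\, D_{\mathrm{KL}}\!\left(\pi \,\|\, \pi_0\right),$$

stating a piece of information "openly" in text that has low probability under $\pi_0$ pays a large KL cost, while hiding the same information inside high-prior text pays almost none. So whenever both routes earn similar reward, the optimizer is pushed toward the hidden one.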
- Myopia, though less important than it was earlier in RFLO research, seemed like a good alignment criterion to aim for, and still does somewhat today.
  - The original starting point for this direction of thinking was myopia / non-myopia in chain-of-thought reasoning systems.
  - The sorts of guarantees I was hoping to get for chain-of-thought systems included myopia guarantees.
  - Steganography came up as an example of how myopia could be broken (other, more complicated examples include side effects and situational awareness).
  - The concepts in this document came out of asking "how might steganography arise naturally, given current research techniques and directions?"