Does there have to be a reward? This is using brute force to create the underlying world model. It’s just adjusting weights, right?
I think there has to be some kind of reward or loss function, in the current paradigm anyway. That’s what gradient descent uses to know which weights to adjust on each update.
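To make that concrete, here’s a toy sketch (my own illustration, not anyone’s actual training code): gradient descent decides which way to move each weight by differentiating some scalar loss with respect to the weights, so without a loss there’s nothing to take the gradient of.

```python
# Toy sketch (illustrative only): a gradient descent step needs some scalar loss,
# because the gradient of that loss is what tells the update which weights to
# move and in which direction.
import numpy as np

def loss(w, x, y):
    # squared-error loss on a linear model; any differentiable scalar objective works
    return np.mean((x @ w - y) ** 2)

def grad(w, x, y):
    # analytic gradient of the loss above with respect to the weights
    return 2 * x.T @ (x @ w - y) / len(y)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = x @ true_w

w = np.zeros(4)
for _ in range(200):
    w -= 0.1 * grad(w, x, y)  # "which weights to adjust, and by how much"

print(loss(w, x, y))  # close to zero: the loss is what was driving the updates
```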
Like, what are you imagining as the input/output channel of this AI? Maybe discussing this a bit would help us clarify.
To steelman, I’d guess this idea applies in the hypothetical where GPT-N gains general intelligence and agency (such as via a mesa-optimizer) just by predicting the next token.
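And even in that steelman, "just predicting the next token" is still training against a loss function, namely cross-entropy between the model’s next-token distribution and the token that actually comes next. Rough sketch below (names and shapes are mine, just for illustration, not GPT’s real code):

```python
# Illustrative sketch: next-token prediction as a loss function.
# The training signal is the negative log-likelihood of the actual next token.
import numpy as np

def next_token_loss(logits, next_token_id):
    # logits: (vocab_size,) unnormalized scores over the vocabulary
    # numerically stable log-softmax, then NLL of the true next token
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[next_token_id]

logits = np.array([0.1, 2.0, -1.0, 0.3, 0.0, 0.5, -0.2, 1.2])
print(next_token_loss(logits, next_token_id=1))  # small if the model already favors token 1
```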