What I get from essentially the same observations of ChatGPT is an increase in AI risk without a shortening of timelines, which already had a median of 2032-2042 for me. My model is that there is a single missing piece to the puzzle (of AGI, not alignment): generation of datasets for SSL (and then an IDA loop does the rest). This covers a current bottleneck, but also feels like a natural way of fixing the robustness woes.
Before ChatGPT, I expected that GPT-n is insufficiently coherent to set this up directly, in something like HCH bureaucracies, and that fine-tuned versions tend to lose their map of the world: what they generate can no longer be straightforwardly reframed as an improvement over (amplification of) what the non-fine-tuned SSL phase trained on. This is good, because I expect that a more principled method of filling the gaps in the SSL datasets is the sort of reflection (in the usual human sense) that boosts natural abstraction, makes learning less lazy, and promotes easier alignment. If straightforward bureaucracies of GPT-n can't implement reflection, that is a motivation to figure out how to do this better.
But now I'm more worried that GPT-n, with some fine-tuning and longer-term memory for model instances, could be sufficiently close to human level to do reflection/generation directly, without a better algorithm. And that's an alignment hazard, unless there is a stronger resolve to only use this for strawberry alignment tasks not too far from human level of capability, a resolve I'm not seeing at all.