btw, I was getting ready for an appointment earlier, so I had only skimmed this until now. Thank you for doing this and sharing it. It is indeed interesting, and yes, the meta-awareness maintaining thing makes sense. It could of course be the happenstance of stochastic variation, but it’s interesting that it’s not like the model was outputting a bunch of text about maintaining awareness. If it wasn’t actually doing anything, except for pre-emptively outputting text that spoke of awareness, then token prediction would just have the output be just as reliable. The fact that it aligned with the self-reported difficulty suggests that it’s doing something at the very least.
I just realized that’s what you just said in rereading your concluding paragraph, but i was just coming to the same conclusion in real-time. Genuinely excited that someone else is engaging with this and tackling it from a different angle.
btw, I was getting ready for an appointment earlier, so I had only skimmed this until now. Thank you for doing this and sharing it. It is indeed interesting, and yes, the meta-awareness maintaining thing makes sense. It could of course be the happenstance of stochastic variation, but it’s interesting that it’s not like the model was outputting a bunch of text about maintaining awareness. If it wasn’t actually doing anything, except for pre-emptively outputting text that spoke of awareness, then token prediction would just have the output be just as reliable. The fact that it aligned with the self-reported difficulty suggests that it’s doing something at the very least.
I just realized that’s what you just said in rereading your concluding paragraph, but i was just coming to the same conclusion in real-time. Genuinely excited that someone else is engaging with this and tackling it from a different angle.