Given this, why then do I merely doubt the presence of phenomenal consciousness instead of being certain of its absence?
Because a researcher friend pointed out to me that one could interpret RL training as supplying a sort of ‘valence’.
I think the most direct way to clear up this confusion is through experiments.
This paper shows how you can explicitly give a transformer model access to its internal state: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=LKv32bgAAAAJ&sortby=pubdate&citation_for_view=LKv32bgAAAAJ:dQ2og3OwTAUC
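To make that concrete, here is a minimal, hypothetical sketch of one way a transformer block could be given access to its own internal state: the block re-injects a learned projection of its previous hidden state into its input. This is my illustration, not necessarily the cited paper's exact mechanism.

```python
import torch
import torch.nn as nn

class IntrospectiveBlock(nn.Module):
    """Toy transformer layer that re-reads a summary of its own previous
    hidden state. A hypothetical sketch, not the cited paper's mechanism."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # Learned projection that routes the previous state back in.
        self.introspect = nn.Linear(d_model, d_model)
        self.prev_state = None  # summary of the last forward pass

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.prev_state is not None:
            # Condition on what the block computed last time: this is
            # the "explicit access to internal state" being sketched.
            x = x + self.introspect(self.prev_state)
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        h = self.norm1(x + attn_out)
        h = self.norm2(h + self.ff(h))
        # Store a pooled, detached summary for the next call.
        self.prev_state = h.mean(dim=1, keepdim=True).detach()
        return h
```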
This paper shows how you can make the model’s loss function dependent on both predicting the future and predicting itself: https://arxiv.org/abs/2407.10188
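And a hedged sketch of the second idea, assuming the combined objective is simply a weighted sum of next-token cross-entropy and a self-prediction term. The MSE form and the weight `lam` are my assumptions for illustration, not necessarily the paper's exact formulation.

```python
import torch.nn.functional as F

def combined_loss(logits, targets, hidden, hidden_pred, lam=0.1):
    """Loss = predict-the-future + lam * predict-yourself.
    The MSE term and the weight `lam` are illustrative assumptions."""
    # (1) Predict the future: cross-entropy of position t's logits
    # against the token at position t+1.
    future = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        targets[:, 1:].reshape(-1),
    )
    # (2) Predict itself: position t's predicted hidden state is
    # regressed onto the actual hidden state at t+1 (stop-gradient
    # on the target so the model can't trivially collapse it).
    self_pred = F.mse_loss(hidden_pred[:, :-1], hidden[:, 1:].detach())
    return future + lam * self_pred
```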
So I think these give us a place to start. Do we notice qualitative shifts in behavior following a fine-tuning regime which combines these techniques?
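For illustration, a hypothetical fine-tuning loop that wires the two sketches above together; `IntrospectiveLM` (a stack of `IntrospectiveBlock` layers with a head that also emits hidden-state predictions) and `dataloader` are placeholders I am assuming, not anything defined in the papers.

```python
import torch

# Hypothetical fine-tuning regime combining both techniques.
# `IntrospectiveLM` and `dataloader` are assumed placeholders.
model = IntrospectiveLM(d_model=512, n_heads=8, vocab_size=50_000)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

for tokens in dataloader:  # assumed batches of token ids, shape (B, T)
    logits, hidden, hidden_pred = model(tokens)
    loss = combined_loss(logits, tokens, hidden, hidden_pred)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The interesting question is then behavioral: does a model fine-tuned this way report or act differently from a baseline fine-tuned on next-token prediction alone?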
I have ideas for further experiments as well.