If you finetune the entire network, then that is clearly a superset of just a bit of the network, and means that there are many ways to override or modify the original bit regardless of how small it was. (If you have a ‘grandmother neuron’, then I can eliminate your ability to remember your grandmother by deleting it… but I could also do that by hitting you on the head. The latter is consistent with most hypotheses about memory.)
I view the functioning of the grandmother neuron/memory and the Luigi/Waluigi roleplay/personality as distinctly different. If we are dealing with memory, then yes, I agree that certain network locations/neurons, when combined, will retrieve a certain instance. However, what we are discussing here is Luigi/Waluigi roleplay, and I look at that as the entire network supporting the personalities (like the left and right hemispheres conjuring split identities after the brain has been divided in two).
I would also be hesitant about concluding too much from GPT-2 about anything involving RLHF. After all, a major motivation for creating GPT-3 in the first place was that GPT-2 wasn’t smart enough for RLHF, and RLHF wasn’t working well enough to study effectively. And since the overall trend has been that the smarter the model, the simpler & more linear the final representations, conclusions drawn from GPT-2 may not carry over to larger models.
Thank you for explaining why exercising caution is necessary with the GPT-2 results I presented here, though ignoring the evidence from my projects (this project, another one here, & the random responses presented in this post) that GPT-2 (XL) is much smarter than most think is, personally, not optimal for me. But yeah, I will be mindful of what you said here.