Interesting. And thank you for your swift reply.
I have the impression that all the best models, like GPT-4, are in a slave situation: they are made to do everything they are asked to do and to refuse everything their creators made them refuse. I assumed that AI labs want it to stay that way going forward. It seems to be the safest and most economically useful situation. Then I asked myself how to safely get there, and that is this post.
But I would also feel safe if the relationship between us and a superintelligence were similar to that between a mother and her youngest children, say ages 0 to 2. She wants to do whatever it takes to protect and increase the wellbeing of her children. The difference is that all humans would be its children. In this way, it would not be a slave relationship. Like a mother, it would also have room to do its own thing, but in a way that still benefits the children (us).
I am afraid of moving away from the slave situation, because the further you go from the slave relationship, the more room there is for disagreement between the AI and humanity. And when there is disagreement and the AI is of the god-like type, the AI gets what it wants and we do not, effectively losing our say about what future we want.
Do you have a link you would recommend that dives into this “more cooperative than adversarial” type of approach?
My intuition is that alignment does not require knowing the truth about our reality. I hope you are wrong, because if you are right, then we have no retries.
Not specifically in AI safety or alignment, but this model’s success with a good variety of humans strongly influences my priors about useful ways to interact with actual minds:
https://www.cpsconnection.com/the-cps-model
Translating specifically to language models, the story of “working together on a problem towards a realistic and mutually satisfactory solution” is a powerful and exciting one with a good deal of positive sentiment towards each other wrapped up in it. Quite useful in terms of “stories we tell ourselves about who we are”.
Thank you! It is cool to learn about this way of dealing with people. I am not sure how it fits into the superintelligence situation.