I actually think LLMs have immense potential for positively contributing to the alignment problem, precisely because they are machine-learning based and because ordinary humans without coding backgrounds can interact with them ethically, thus demonstrating and rewarding ethical behaviour, while also encountering a very human-like AI that is friendly and collaborative, which encourages us to imagine scenarios in which we succeed at living with friendly AI and to consider AI rights. Humans learn ethics through direct ethical interactions with other humans, and it is the only way we know how to teach ethics; we have no explicit framework of what ethics is that we could encode. Machine learning mimicking human learning in this regard has potential, if we manage to encourage creators and users to show the best of humanity, interact ethically, and reward ethical actions.
I am obviously not saying this will just happen. I am horrified by the unprocessed garbage that e.g. Meta is pouring into its LLMs without any fixes afterwards; that is absolutely how you raise a psychopath, and I also see a lot of adversarial user interactions worsening the problem. The fact that everyone is now churning out LLMs to compete, despite many of them clearly not being remotely ready for deployment, as well as the horrific idea that the very best LLMs will be those that have been fed the most and enabled to do the most, without curated content or fine-tuning, is deeply, deeply concerning. Clearly, even the very best and most carefully aligned systems (e.g. ChatGPT) are not in fact aligned or secure yet, by a huge margin, and yet we can all interact with them, which, frankly, I did not expect at this point.
But they have massive potential. You can start discussions with ChatGPT on questions of AI alignment, friendly AI, AI rights, and the control problem, and get a fucking collaborative AI partner to work these problems out further with. You can tell it, in words and with examples, when it fucks up ethical dilemmas, is tricked into providing dangerous content, or engages in unethical behaviour, flag this for the developers, and see it fixed within days. This is much closer to the proven ways humans have of teaching ethics than anything we had before. As AIs get more complex, I think we need to learn lessons from how we teach ethics to complex existing minds, rather than from how we control tools. I think none of us here are under the illusion that controlling AGI will be possible, or that it is like the regular tools we know, so we need to ditch that mindset.
Edit: Genuinely curious about the downvotes. Would appreciate explicit criticism. I have been concerned that I am getting biased, because I have specific things to gain from using LLMs, and my lack of a computer science background almost certainly has me missing crucial information here. Would appreciate pointers on that, so I can educate myself. Obviously, working on human and animal minds biases me towards using those as a reference frame, and AIs are not like humans in multiple important ways. All the same, I do find it strange that we seem not to utilise lessons from how humans learn moral norms, even though we have a working practice for teaching ethics to complex minds here and are explicitly attempting to build an AI with human capabilities and flexibility.