Oh, I see what you mean, but GPT’s ability to simulate the outputs of humans writing about morality does not imply anything about its own internal beliefs about the world. GPT can also simulate the outputs of flat earthers, yet I really don’t think that it models the world internally as flat. Asking GPT “what do you believe” does not at all guarantee that it will output what it actually believes. I’m a utilitarian, and I can also convincingly simulate the outputs of deontologists; one doesn’t prevent the other.
Whether the LLM believes this or merely simulates it seems to be beside the point?
The LLM can apply moral reasoning relatively accurately. It does so spontaneously, detecting ethical problems as they arise. It recognises when it needs to reason on a meta-level, e.g. when evaluating which characters it ought to impersonate. It does so for complex paperclipper scenarios, and does not go down the paperclipper route. It does so relatively consistently. It cites ethical works in the process, can explain them coherently, and applies them correctly. You can argue with it, and it analyses the objections and defends its positions correctly. At no point does it fall back on naive utilitarian reasoning or fall into the traps you describe. If you were right, the problem you are describing should occur here, and it does not. Instead, it shows the behaviour you’d expect if it understood ethical nuance.
Regardless of which internal states you assume the AI has, or whether you assume it has none at all, this means it can perform ethical functionality that already does not fall for the utilitarian examples you describe. The belief that naive utilitarianism is the only kind of ethics an AI could grasp was a speculation, and it has not held up to technical developments and empirical data.