But you are still thinking in utilitarian terms here, where theoretically, there is a number of paperclips that would outweigh a human life, where the value of humans and paperclips can be captured numerically. Practically no human thinks this, we see one as impossible to outweigh with another. AI already does not think this. They have already dumped reasoning, instructions and whole ethics textbooks in there. LLMs can easily tell you what about an action is unethical, and can increasingly make calls on what actions would be morally warranted in response. They can engage in moral reasoning.
This isn’t an AI issue, it is an issue with total utilitarianism.
Oh, I see what you mean, but GPT’s ability to simulate the outputs of humans writing about morality does not imply anything about its own internal beliefs about the world. GPT can also simulate the outputs of flat earthers, yet I really don’t think that it models the world internally as flat. Asking GPT “what do you believe” does not at all guarantee that it will output what it actually believes. I’m a utilitarian, and I can also convincingly simulate the outputs of deontologists, one doesn’t prevent the other.
Whether the LLM is believing this, or merely simulating this, seems to be beside the point?
The LLM can relatively accurately apply moral reasoning. It will do so spontaneously, when the problems occur, detecting them. It will recognise that it needs to do so on a meta-level, e.g. when evaluating which characters it ought to impersonate. It does so for complex paperclipper scenarios, and does not go down the paperclipper route. It does so relatively consistenly. It cites ethical works in the process, and can explain them coherently and apply them correctly. You can argue them, and it analyses and defends them correctly. At no point does it cite utilitarian beliefs, or fall for their traps. The problem you are describing should occur here if you were right, and it does not. Instead, it shows the behaviour you’d expect it to show if it understood ethical nuance.
Regardless of which internal states you assume the AI has, or whether you assume it has none at all—this means it can perform ethical functionality that already does not fall for the utilitarian examples you describe. And that the belief that that is the only kind of ethics an AI could grasp was a speculation that did not hold up to technical developments and empirical data.
But you are still thinking in utilitarian terms here, where theoretically, there is a number of paperclips that would outweigh a human life, where the value of humans and paperclips can be captured numerically. Practically no human thinks this, we see one as impossible to outweigh with another. AI already does not think this. They have already dumped reasoning, instructions and whole ethics textbooks in there. LLMs can easily tell you what about an action is unethical, and can increasingly make calls on what actions would be morally warranted in response. They can engage in moral reasoning.
This isn’t an AI issue, it is an issue with total utilitarianism.
Oh, I see what you mean, but GPT’s ability to simulate the outputs of humans writing about morality does not imply anything about its own internal beliefs about the world. GPT can also simulate the outputs of flat earthers, yet I really don’t think that it models the world internally as flat. Asking GPT “what do you believe” does not at all guarantee that it will output what it actually believes. I’m a utilitarian, and I can also convincingly simulate the outputs of deontologists, one doesn’t prevent the other.
Whether the LLM is believing this, or merely simulating this, seems to be beside the point?
The LLM can relatively accurately apply moral reasoning. It will do so spontaneously, when the problems occur, detecting them. It will recognise that it needs to do so on a meta-level, e.g. when evaluating which characters it ought to impersonate. It does so for complex paperclipper scenarios, and does not go down the paperclipper route. It does so relatively consistenly. It cites ethical works in the process, and can explain them coherently and apply them correctly. You can argue them, and it analyses and defends them correctly. At no point does it cite utilitarian beliefs, or fall for their traps. The problem you are describing should occur here if you were right, and it does not. Instead, it shows the behaviour you’d expect it to show if it understood ethical nuance.
Regardless of which internal states you assume the AI has, or whether you assume it has none at all—this means it can perform ethical functionality that already does not fall for the utilitarian examples you describe. And that the belief that that is the only kind of ethics an AI could grasp was a speculation that did not hold up to technical developments and empirical data.