It seems that the “ethical simulator” from point 1 and the LLM-based agent from point 2 overlap, so you just overcomplicate things if you make them two distinct systems. What you describe amounts to an LLM prompted with the right “system prompt” (virtue ethics) + a branching-tree search for optimal plans according to some trained “utility/value” evaluator (consequentialism) + filtering out plans that contain actions which are always prohibited (law, deontology). The second component is the closest to what you described as an “ethical simulator”, but it is not quite that: the “utility/value” evaluator cannot say whether an action or a plan is ethical in absolute terms; it can only compare plans proposed for the particular situation by some planner.
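To make that concrete, here is a minimal sketch of the three components wired together: the virtue-prompted LLM proposes next actions, a deontic filter drops always-prohibited ones, and a trained value model ranks the surviving partial plans. The `llm.propose_actions` and `value_model.score` interfaces are hypothetical placeholders, not any real API.

```python
VIRTUE_PROMPT = "You are a prudent, honest, benevolent planner."

def plan_beam_search(llm, value_model, situation, prohibited,
                     depth=4, branch=4, beam=3):
    """Branching-tree (beam) search over action sequences."""
    beams = [[]]  # start from a single empty partial plan
    for _ in range(depth):
        candidates = []
        for plan in beams:
            # Virtue-ethics step: the prompted LLM suggests next actions.
            for action in llm.propose_actions(VIRTUE_PROMPT, situation,
                                              plan, n=branch):
                # Deontological step: some actions are never allowed.
                if action in prohibited:
                    continue
                candidates.append(plan + [action])
        if not candidates:
            break
        # Consequentialist step: the value model only *compares* candidate
        # plans for this situation; it cannot certify any of them as
        # ethical in absolute terms.
        candidates.sort(key=lambda p: value_model.score(situation, p),
                        reverse=True)
        beams = candidates[:beam]
    return beams[0] if beams else None
```

Note that the evaluator appears only in the ranking step, which is why it is not an “ethical simulator”: it never emits a verdict, only an ordering over the planner’s proposals.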
They are not supposed to be two distinct systems; one is a subsystem of the other. There may be implementations where it’s the same LLM doing all the generative work for every step of the reasoning via prompt engineering, but it doesn’t have to be this way. It can be multiple more specialized LLMs that went through different RLHF processes.