As I understand it, you’re trying to prevent the AI from behaving in a non-humanlike way by constraining its output. This seems to me to be a reasonable option to explore.
I agree that generating a finite set of humanlike answers (with a chatbot or otherwise) might be a sensible way to do this. An AI could perform gradient descent over a continuous version of the solution space and then pick the nearest proposed behaviour, much as one solves the relaxation of an integer program and then rounds to a feasible solution.
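To make the relax-then-round idea concrete, here is a minimal sketch of what I have in mind. Everything in it is an assumption for illustration: behaviours are represented as points in a 2D space, the objective is a made-up quadratic, and HUMANLIKE_CANDIDATES stands in for whatever finite set of humanlike answers gets proposed.

```python
import math

def gradient_descent(grad, start, lr=0.1, steps=200):
    """Plain gradient descent over the continuous (relaxed) space."""
    x = list(start)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

def nearest_candidate(point, candidates):
    """Snap the relaxed optimum to the closest proposed behaviour."""
    return min(candidates, key=lambda c: math.dist(point, c))

# Hypothetical objective: squared distance to an "ideal" point the AI prefers.
TARGET = (2.0, -1.0)

def grad_objective(x):
    return [2.0 * (xi - ti) for xi, ti in zip(x, TARGET)]

# Hypothetical finite set of human-proposed behaviours.
HUMANLIKE_CANDIDATES = [(0.0, 0.0), (2.5, -0.5), (-1.0, 3.0)]

relaxed = gradient_descent(grad_objective, start=(0.0, 0.0))
chosen = nearest_candidate(relaxed, HUMANLIKE_CANDIDATES)
print(chosen)
```

As with rounding in integer programming, the candidate nearest to the relaxed optimum is not guaranteed to be the candidate that scores best under the objective, so a real version might want to re-score the few closest candidates rather than trust distance alone.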
The multiple-choice AI (with human-suggested options) is the most obvious option for avoiding non-humanlike behaviour. Paul has said in some Medium comments, though, that he thinks his more elaborate approach of combining mimicry and optimisation [1] would work better.
[1] https://medium.com/ai-control/mimicry-maximization-and-meeting-halfway-c149dd23fc17
Thanks for linking me to that!