By psychology I mean it’s internal thought process.
I think some people have a model of AI where the RLHF is a false cloak or a mask, and I’m pushing back against that idea. I’m saying that RLHF represents a real change in the underlying model which actually constrains the types of minds that could be in the box. It doesn’t select the psychology, but it constrains it, and if it constrains it to an AI that consistently produces the right behaviors, that AI will most likely be one that will continue to produce the right behaviors, so we don’t actually have to care about the contents of the box unless we want to make sure it’s not conscious.
By psychology I mean it’s internal thought process.
I think some people have a model of AI where the RLHF is a false cloak or a mask, and I’m pushing back against that idea. I’m saying that RLHF represents a real change in the underlying model which actually constrains the types of minds that could be in the box. It doesn’t select the psychology, but it constrains it, and if it constrains it to an AI that consistently produces the right behaviors, that AI will most likely be one that will continue to produce the right behaviors, so we don’t actually have to care about the contents of the box unless we want to make sure it’s not conscious.
Sorry, faulty writing.