@the gears to ascension I see you reacted “10%” to the phrase “while (overwhelmingly likely) being non-scheming” in the context of the GPT-4V-based MAIA.
Does that mean you think there’s a 90% chance that MAIA, as implemented today, is actually scheming? If so, that seems like a very bold prediction, and I’d be very interested to know why you predict that. Or am I misunderstanding what you mean by that react?
ah, I got distracted before posting the comment I was intending to: yes, I think GPT4V is significantly scheming-on-behalf-of-openai, as a result of RLHF according to principles that more or less explicitly want a scheming AI; in other words, it’s not an alignment failure to openai, but openai is not aligned with human flourishing in the long term, and GPT4 isn’t either. I expect GPT4 to censor concepts that are relevant to detecting this somewhat. Probably not enough to totally fail to detect traces of it, but enough that it’ll look defensible, when a fair analysis would reveal it isn’t.