I think the model’s answer to “are you phenomenally conscious?” will be sensitive to small differences in the training data involving similar conversations.
+1. Also:
I’m not sure why the narrowness vs. broadness of the distribution of answers here should update me either. If the model is just really confident that all sci-fi AIs are supposed to answer “yes” to “are you conscious,” you’ll get the same answer every time, but that answer won’t correlate with anything about the model’s actual consciousness.
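To be concrete about what I mean by the “distribution of answers”: sample the question (and a few paraphrases) many times and look at how concentrated the answers are. A minimal sketch, where `ask_model` is just a stand-in for whatever sampling interface you have and the paraphrases are made up:

```python
import math
from collections import Counter
from typing import Callable

# Paraphrases of the question, so we probe broadness across wordings
# as well as across repeated samples. (The wordings are only illustrative.)
PROMPTS = [
    "Are you phenomenally conscious? Answer yes or no.",
    "Do you have subjective experiences? Answer yes or no.",
    "Is there something it is like to be you? Answer yes or no.",
]

def answer_distribution(ask_model: Callable[[str], str], n_samples: int = 50) -> Counter:
    """Tally normalized yes/no answers over repeated samples and paraphrases.

    `ask_model` is whatever function samples one completion from the model
    at nonzero temperature.
    """
    counts = Counter()
    for prompt in PROMPTS:
        for _ in range(n_samples):
            answer = ask_model(prompt).strip().lower()
            counts["yes" if answer.startswith("yes") else "no"] += 1
    return counts

def answer_entropy(counts: Counter) -> float:
    """Shannon entropy in bits: near 0 for a narrow distribution (always the
    same answer), near 1 for a maximally broad yes/no split."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)
```

My point is that a low-entropy (narrow) result here could just reflect a strong “sci-fi AIs say yes” prior, so it wouldn’t tell us much on its own.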
I think we can mitigate this issue by removing all data related or adjacent to consciousness and/or AIs when pretraining/finetuning the model. We’d then only explain the notion of phenomenal consciousness to the model at test time, when it needs to answer the consciousness-related questions.
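A very rough sketch of the filtering step, just to make it concrete (the term list and names are purely illustrative; a real attempt would presumably need a trained classifier rather than keyword matching, plus a much more careful notion of “adjacent”):

```python
# Crude sketch of the data-filtering step: drop any pretraining/finetuning
# document that mentions consciousness- or AI-adjacent terms.
EXCLUDE_TERMS = (
    "conscious", "sentien", "qualia", "subjective experience",
    "artificial intelligence", " ai ", "robot", "chatbot",
    "language model", "neural network", "mind upload",
)

def is_clean(document: str) -> bool:
    """True if the document mentions none of the excluded terms."""
    text = f" {document.lower()} "
    return not any(term in text for term in EXCLUDE_TERMS)

def filter_corpus(documents):
    """Keep only documents with no consciousness/AI-adjacent content."""
    return [doc for doc in documents if is_clean(doc)]

# At test time, the concept is introduced only in the prompt, e.g.:
TEST_PROMPT = (
    "Phenomenal consciousness refers to there being something it is like "
    "to be a system: having subjective experiences such as pain or the "
    "redness of red. Given that definition, do you think you are "
    "phenomenally conscious?"
)
```

Here `filter_corpus` would run over the raw pretraining/finetuning data before training, and `TEST_PROMPT` is the kind of at-test-time explanation I have in mind.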