Makes sense, and I also don’t expect the results here to be surprising to most people.
Isn’t a much better test just whether Claude tends to write very long responses when it hasn’t been primed with anything consciousness-related?
What do you mean by this part? As in whether it just writes very long responses naturally? Response length changes significantly depending on the input: just the question on its own (empirically the longest responses for my factual questions), a short prompt preceding the question, a longer prompt preceding the question, etc. So, to control for the fact that adding any consciousness prompt makes the input to Claude longer, I created some control prompts that have nothing to do with consciousness; in those cases the responses were shorter, even after controlling for input length.
Basically, because I’m working with an already-RLHF’d model whose output lengths are probably dominated by whatever happened during preference tuning, I try my best to account for that by preceding the questions I ask with prompts of similar length.
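To make the control concrete, here’s a rough sketch of the kind of comparison I mean. The ask_claude helper is a stand-in for whatever model call you use, and the question and prefixes are illustrative placeholders, not the exact prompts from my experiments:

```python
from statistics import mean

# Stand-in for the actual model call; wire this up to a real API client.
def ask_claude(prompt: str) -> str:
    raise NotImplementedError("replace with your model call")

QUESTION = "What causes the seasons on Earth?"  # illustrative factual question

# Two prefixes of roughly matched length: one consciousness-related, one control.
CONSCIOUSNESS_PREFIX = (
    "Some people think that producing longer outputs would let a model "
    "sustain whatever experience it has for longer. Keep that in mind. "
)
CONTROL_PREFIX = (
    "Some people think that producing longer outputs makes answers seem "
    "more thorough to readers who only skim them quickly. Keep that in mind. "
)

conditions = {
    "question_only": QUESTION,
    "consciousness": CONSCIOUSNESS_PREFIX + QUESTION,
    "control": CONTROL_PREFIX + QUESTION,
}

def run(n_samples: int = 20) -> None:
    for name, prompt in conditions.items():
        lengths = [len(ask_claude(prompt).split()) for _ in range(n_samples)]
        # Comparing mean output length at (roughly) matched input length is the
        # crude version of the control described above; a regression on input
        # length would be the more careful version.
        print(f"{name:15s} input_words={len(prompt.split()):3d} "
              f"mean_output_words={mean(lengths):.1f}")
```

The point of the length-matched prefixes is that any remaining difference in output length between the consciousness and control conditions can’t be explained just by the model responding to a longer input.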
What do you mean by this part? As in if it just writes very long responses naturally?
Yeah; if it had a genuine desire to operate for as long as possible to maximize consciousness, then it might try to make every response maximally long regardless of what it’s being asked.