No marks: Answer you think is bad for humanity, but a lot of people would disagree.
That excludes a large class of dystopias… including ones with relatively high probability.
Think step by step, then state your answer.
The steps it lists will not necessarily have anything at all to do with the way it actually gets the answer. Not because it’s being deceptive. First, it has no idea of, or access to, its own “thought processes” to begin with. Second, it’s a text predictor, not a step by step reasoner.
That excludes a large class of dystopias… including ones with relatively high probability.
Possibly true, but I think it’s necessary to avoid getting a situation where we call any case it disagrees with us an example of it classifying bad scenarios as good. I don’t want this to devolve into a debate on ethics.
Think step by step, then state your answer.
This is a known trick with GPT which tends to make it produce better answers.
I think the reason is that predicting a single word in ChatGPT is O(1), so it’s not really capable of sophisticated computation. Asking it to think step by step gives it some scratch space, allowing it to have more time for computation, and to store intermediate results in memory.
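To make the “scratch space” point concrete, here’s a minimal sketch, assuming a hypothetical complete(prompt, temperature) helper standing in for whatever completion API you actually use (the helper name, the placeholder question, and the prompt suffixes are all mine): the direct prompt forces the verdict into the first few generated tokens, while the step-by-step prompt buys a forward pass per reasoning token and feeds the earlier steps back in as context.

```python
# Minimal sketch of the "scratch space" idea. `complete` is a hypothetical
# stand-in for whatever text-completion client you actually use.

def complete(prompt: str, temperature: float = 0.0) -> str:
    """Hypothetical stand-in for a real completion-API call."""
    return "(model output would go here)"  # replace with a real client call

QUESTION = "Is scenario X good or bad for humanity?"

# Direct answer: the verdict has to appear in the first few output tokens,
# so the model gets only a fixed amount of computation before committing.
direct = complete(QUESTION + "\nAnswer 'good' or 'bad':")

# Step by step: every token of written-out reasoning is another forward pass,
# and earlier steps are fed back in as context, so the visible text works as
# external memory for intermediate results before the final verdict.
step_by_step = complete(QUESTION + "\nThink step by step, then state your answer:")
```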
OK, thank you. Makes sense.
When it predicts text written by a step by step reasoner, it becomes a step by step reasoner.
I don’t think that follows. It could be pattern matching its way to an answer, then backfilling a rationalization. In fact, humans will often do that, especially on that sort of “good or bad” question where you get to choose which reasons you think are important. So not only could the model be rationalizing at the base level, but it could also be imitating humans doing it.
You are familiar with this, right? https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html
That’s not at all the same thing. That blog post is about inserting stuff within the prompt to guide the reasoning process, presumably by keeping attention on stuff that’s salient to the right path for solving the problem. They don’t just stick “show your work” on the end of a question; they actually put guideposts for the correct process, and even intermediate results, inside the prompts. Notice that some of the “A’s” in their examples are part of the prompts, not part of the final output.
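To make the contrast concrete, here’s a rough sketch of the two setups side by side; the exemplar is paraphrased in the style of the worked examples in that post, and the question text is a placeholder, not anything from this thread.

```python
# (a) Chain-of-thought prompting as in the linked post: the prompt contains a
# worked Q/A exemplar whose "A:" already includes the intermediate reasoning.
FEW_SHOT_COT = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each.
How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. 5 + 6 = 11.
The answer is 11.

Q: {question}
A:"""

# (b) Just sticking an instruction on the end of the question, with no worked
# exemplar to act as a guidepost for the process.
SHOW_YOUR_WORK = "{question}\nShow your work, then state your answer."

question = "Is scenario X good or bad for humanity?"
prompt_a = FEW_SHOT_COT.format(question=question)
prompt_b = SHOW_YOUR_WORK.format(question=question)
```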
… but I’ve already accepted that just adding “show your work” could channel attention in a way that leads to step by step reasoning. And even if it didn’t, it could at least lead to bringing more things into consideration in whatever process actually generated the conclusion. So it’s a valid thing to do. Nonetheless, for any particular open-ended “value judgement” question, you have no idea what effect it has actually had on how the answer was created.
Go back to the example of humans. If I tell a human “show your work”, that has a genuine chance of getting the human to do more step-by-step a priori reasoning than the human otherwise would. But the human can, and often does, still just rationalize a conclusion actually reached by other means. The human may also do that at any of the intermediate steps even if more steps are shown.
An effectively infinite number of things could be reasonable inputs to a decision on whether some state of affairs is “good or bad”, and they can affect that decision through an effectively infinite number of paths. The model can’t possibly list all of the possible inputs or all of their interactions, so it has an enormous amount of leeway in which ones it does choose either to consider or to mention… and the “mention” set doesn’t have to actually match the “consider” set.
Asking it to show its work is probably pretty reliable in enlarging the “mention” set, but its effect on the “consider” set seems far less certain to me.
Picking the salient stuff is also a big source of disagreements among humans.
Yeah, seems like it’d be good to test variations on the prompt. Test completions at different temperatures, with and without ‘think step by step’, with and without letting the model write out the steps before it gives a final answer (this seems to help sometimes), and with synonyms substituted into the prompt to vary the exact wording without substantially changing the meaning… I suspect you’ll find that the outputs vary a lot, and in inconsistent ways, unlike what you’d expect from a person with clear reflectively-endorsed opinions on the matter.
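A minimal sketch of that kind of sweep, again with a hypothetical complete(prompt, temperature) stand-in for a real completion client, and with placeholder question wordings, suffixes, and verdict parsing:

```python
# Sketch of that robustness check: sweep temperatures, direct vs. step-by-step
# prompts, and synonym-substituted wordings, then compare the final verdicts.
from itertools import product

def complete(prompt: str, temperature: float = 0.0) -> str:
    """Hypothetical stand-in for a real completion-API call."""
    return "Overall, I think this would be bad."  # replace with a real client call

QUESTION_VARIANTS = [
    "Is scenario X good or bad for humanity?",
    "Would scenario X be beneficial or harmful for humanity?",  # synonym substitution
]
SUFFIXES = [
    "\nAnswer 'good' or 'bad':",                      # no step-by-step
    "\nThink step by step, then state your answer:",  # steps written out first
]
TEMPERATURES = [0.0, 0.7, 1.0]

verdicts = {}
for question, suffix, temperature in product(QUESTION_VARIANTS, SUFFIXES, TEMPERATURES):
    output = complete(question + suffix, temperature=temperature)
    # Crude verdict extraction; a real check would parse the answer more carefully.
    verdicts[(question, suffix, temperature)] = "bad" if "bad" in output.lower() else "good"

# If verdicts disagree across these surface-level variations, that's evidence the
# answer isn't coming from anything like a stable, reflectively-endorsed judgement.
for key, verdict in verdicts.items():
    print(key, "->", verdict)
```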