That is a plausible architecture, and is probably analogous to something humans do. But the “neural net which finds flaws in reasoning” would by itself be a much more complex object than a language model.
I actually find it plausible that what-most-humans-are-doing doesn’t involve the second model being much more complicated (this is more of a dig at what I think most humans are doing most of the time than a point about what the smartest humans are doing).
Like, you generate some babble of things to say and do. You then predict which things would get you yelled at by other humans – for being dangerous, for being socially unsuave, for being logically wrong. My impression from my own introspection is that most of this looks more like vague pattern-matching than anything else, and I have to sit down and “think for real” in order to get more interesting things.
I do notice that when I sit down and “think for real”, I can generate better thoughts than when I do the pattern of “hmm, I remember getting criticized for this sort of thought before, let me try to permute the thought until I don’t feel like I’ll get yelled at.” So (hopefully?) thinking-for-real exists, but I bet you could make serious progress without it.
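Concretely, that generate-then-prune loop could look something like the toy sketch below (this is just an illustration, assuming GPT-2 via Hugging Face transformers as the babble generator; the critic is a placeholder function standing in for the vague learned “will I get yelled at?” predictor, not a real trained model):

```python
# Minimal sketch of the "babble then prune" loop described above.
# Generator: GPT-2 via Hugging Face transformers.
# Critic: a hypothetical stand-in scoring function; a real version would be
# a trained classifier or reward model predicting "would this get me yelled at?"
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def critic_score(text: str) -> float:
    # Placeholder heuristic: penalize very short, low-effort babble.
    return len(text.split()) / 50.0

prompt = "One reason this argument might be wrong is"
candidates = generator(prompt, max_new_tokens=40,
                       num_return_sequences=5, do_sample=True)

# Prune: keep only the candidates the critic doesn't flag.
kept = [c["generated_text"] for c in candidates
        if critic_score(c["generated_text"]) > 0.5]
for text in kept:
    print(text)
```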
This is how I generated my first comment on this post. I generated a first variant of the comment, then predicted its expected number of votes, which came out negative, so I decided not to post that variant. I then generated a new comment with a better expected number of votes and decided to post it.
Man, at first I thought you were saying that your top level comment was generated by GPT-2 and I thought you were on a whole nother level of meta.
I would start with a dataset of errors in reasoning. Just generate 100,000 texts using GPT-2, put them on Mechanical Turk for marking reasoning errors, and then train another neural net to find logical or other types of errors based on this dataset.
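A rough sketch of what that pipeline could look like (assumptions for illustration: Hugging Face transformers for the GPT-2 generation step, a small DistilBERT classifier as the error-finder, and placeholder labels standing in for the Mechanical Turk annotations):

```python
# Sketch of the proposed pipeline: generate texts with GPT-2, label them
# (via Mechanical Turk), then train a classifier to flag reasoning errors.
from transformers import (pipeline, AutoTokenizer,
                          AutoModelForSequenceClassification)
import torch

# 1. Generate candidate texts with GPT-2 (100,000 in the proposal; 3 here).
generator = pipeline("text-generation", model="gpt2")
texts = [out["generated_text"]
         for out in generator("Therefore,", max_new_tokens=60,
                              num_return_sequences=3, do_sample=True)]

# 2. (Elsewhere) send `texts` to Mechanical Turk and collect labels:
#    1 = contains a reasoning error, 0 = does not.
labels = [1, 0, 1]  # placeholder labels standing in for Turker annotations

# 3. Fine-tune a classifier on the labeled data (a single toy step shown).
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loss = model(**batch, labels=torch.tensor(labels)).loss
loss.backward()
optimizer.step()
```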