As far as the conditioning goes, Habryka showed me some base model outputs with conditioning on karma/agreement and there turns out to be an EDT-like problem with LW-style comments when you condition on high values—often, a high-scoring LW comment will include strong empirical evidence like personal experience or citations, which would be highly convincing indeed… if it were true, rather than confabulated.
So if you sampled a response to your new post about “X might be helpful”, then a high-value conditioning might generate a counter-comment from “Gwern” like “I’ve tried X over 100 times and it never worked!” You can see the problem with that. It’s not the ‘kneejerk prejudices’, it’s the self-fulfilling prophecies of sampling based on previously sampled tokens which bootstrap strong but false claims. (If that were true, if I had tried X over 100 times and it never worked, that would be a very valuable and important comment for me to make on your new post about X, and it would be highly upvoted etc. It’s just that the LLM has no way of knowing that and it’s almost certainly not true, especially if X is some new idea that no one else could’ve even tried yet.)
The confabulation problem here seems especially bad because we value empirical grounding so much, and that is something base LLMs are poor at. (The chatbots are much better, but problematic in all the other ways.) It’s not obvious how to condition for good comments which avoid confabulation issues and either refer solely to pre-existing published comments or are pure reasoning/general-knowledge responses.
So the karma/agreement conditioning idea might not work out in practice compared to just sampling random values, or something more complex, like generating n comments at each possible combination of levels, and presenting the grid, or perhaps then feeding them back in to select the ‘best’ one in some sense.
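A rough sketch of the grid idea, in case it helps make it concrete. Everything named here is an assumption for illustration: `stub_generate` is a hypothetical stand-in for a base-model call conditioned on karma/agreement metadata, and the `score` passed to `select_best` is a placeholder for whatever "best" ends up meaning. Nothing in it addresses the confabulation problem itself.

```python
import itertools
import random
from typing import Callable, Dict, List, Tuple

Level = Tuple[int, int]  # (karma, agreement) conditioning values


def stub_generate(post: str, karma: int, agreement: int, rng: random.Random) -> str:
    """Hypothetical stand-in for sampling a base model with metadata conditioning."""
    return f"[karma={karma} agree={agreement}] reply #{rng.randrange(1000)} to: {post}"


def sample_grid(
    post: str,
    karma_levels: List[int],
    agreement_levels: List[int],
    n: int,
    generate: Callable[[str, int, int, random.Random], str] = stub_generate,
    seed: int = 0,
) -> Dict[Level, List[str]]:
    """Generate n comments at every (karma, agreement) combination."""
    rng = random.Random(seed)
    grid: Dict[Level, List[str]] = {}
    for karma, agreement in itertools.product(karma_levels, agreement_levels):
        grid[(karma, agreement)] = [
            generate(post, karma, agreement, rng) for _ in range(n)
        ]
    return grid


def select_best(grid: Dict[Level, List[str]], score: Callable[[str], float]) -> str:
    """Flatten the grid and return the candidate with the highest score.

    `score` is a placeholder; in practice it might be a second model pass.
    """
    candidates = [comment for cell in grid.values() for comment in cell]
    return max(candidates, key=score)


grid = sample_grid("X might be helpful", [0, 10, 50], [-5, 5], n=2)
best = select_best(grid, score=len)  # toy scorer: longest comment wins
```

The grid can be shown as-is (one cell per combination of levels), or fed back through `select_best` with a real scorer; which of those is actually useful is an open question.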
Yeah, to be honest, I got 2 hours of sleep and I don’t follow everything you said. If you say it won’t work, I believe you, but I do know that a community that outspokenly claims to value empirical evidence, but has no objective mechanism between signing up for an account and upvoting its comments/posts, can’t objectively claim to associate empiricism with its community karma. Even if you could make this work for LLMs, I don’t know that it would be reliable.
We might like to think rational cognitive practices actually make us “less wrong” than other people, but that may only be the case out of identity bias. We have no way to prove it isn’t just our knee-jerk biases manifesting in practice unless we rely on something external to blind our judgement. I went through the sign-up process, and there is no mechanism beyond personal belief. Nothing. Believing we have fixed our bias is exactly how it would go unchecked… which explains why this place is decades behind in certain fields. Dunning-Kruger is a huge interdisciplinary barrier in this community (in both directions), and so is the lack of communication accommodations. Hell, LW still touts IQ over WAIS, as if there aren’t 9+ unmeasured types of intelligence that y’all think either don’t exist or have no meaningful value.
Would it help if I wrote about this? Or would I just get downvoted to oblivion because a statement that I consider basic scientific literacy, and don’t think I need to provide a link for, is not enough to trigger a “curious → highlight → web search” reaction NOR a “request for reference link” in a community of self-proclaimed rationalists?
I think you would probably be downvoted because you have already admitted to writing poorly thought out ignorant comments under conditions conducive to arrogance and bad judgment, of which you are apparently unashamed and feel no need to rectify (eg. by refraining from commenting until you are recovered), while dragging in unrelated claims which are seriously problematic, like uncritical belief in Dunning-Kruger as a thing, or claiming that anyone is touting ‘IQ over WAIS’ (WAIS… like, the IQ test WAIS?), or apparently believing in things like multiple intelligences, and your comments are littered with mockery, spelling errors, and grandiose generalizations writing checks that you don’t remotely come close to cashing. (Saying you’ve definitely seen data, trust me bro, but you can’t remember where, and everyone should just go google it themselves, is not a convincing argument.)
If you are going to comment on my serious writings—and in my shortform posts, not yours—I would greatly appreciate it if you could do so on more than 2 hours of sleep, and confine your comments to the object level I am writing about (instead of jumping to the meta-level about how these exemplify the errors of this community of blind sheep that only you are enlightened enough to perceive and explain to them—if only they would not reject your message). I would also suggest reading more MoR and less Attack on Titan, and in general identifying less with fictional characters.