Related to the role of peer review: a lot of stuff on LW/AF is relatively exploratory, feeling out concepts, trying to figure out the right frames, etc. We need to be generally willing to discuss incomplete ideas, stuff that hasn’t yet had the details ironed out. For that to succeed, we need community discussion standards which tolerate a high level of imperfect details or incomplete ideas. I think we do pretty well with this today.
But sometimes, you want to be like “come at me bro”. You’ve got something that you’re pretty highly confident is right, and you want people to really try to shoot it down (partly as a social mechanism to demonstrate that the idea is in fact as solid and useful as you think it is). This isn’t something I’d want to be the default kind of feedback, but I’d like for authors to be able to say “come at me bro” when they’re ready for it, and I’d like for posts which survive such a review to be perceived as more epistemically-solid/useful.
With that in mind, here are a few of my own AF posts which I’d submit for a “come at me bro” review:
Probability as Minimal Map—I claim this is both a true and useful interpretation of probability distributions. Come at me bro.
Public Static: What Is Abstraction—I claim that this captures all of the key pieces of what “abstraction” means. Come at me bro.
Writing Causal Models Like We Write Programs—I claim that this approach fully captures the causal semantics of typical programming languages, the “gears of computation”, and “what programs mean”; a toy sketch of the idea follows this list. Come at me bro.
The Fusion Power Generator Scenario (and this comment)—I claim that any alignment scheme which relies on humans using an AI safely, or relies on humans asking the right questions, is either very limited or not safe. (In particular, this includes everything in the HCH cluster.) Come at me bro.
Human Values Are A Function Of Humans’ Latent Variables—I claim that this captures all of the conceptually-difficult pieces of “what are human values?”, and shows that those conceptual difficulties can be faithfully captured in a Bayesian framework. Come at me bro.
For all of these, things like “this frame is wrong” or “this seems true but not useful” are valid objections. I’m not just claiming that the proofs hold.
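To gesture at what the causal-models-as-programs claim is pointing at, here’s a toy sketch (a quick illustration for this comment, not the formalism from the post): read each assignment in a straight-line program as a structural equation, with edges from the variables it reads to the variable it writes; an intervention do(var = value) just overrides that node’s equation.

```python
# Toy structural causal model built from a straight-line "program".
# Each node is {variable: (parents, function)}; the function is the
# right-hand side of the assignment. Interventions pin a node's value
# directly, overriding its structural equation.

def run(model, interventions=None):
    """Evaluate the model top to bottom; apply do()-style interventions."""
    interventions = interventions or {}
    values = {}
    for var, (parents, fn) in model.items():
        if var in interventions:
            values[var] = interventions[var]          # do(var = value)
        else:
            values[var] = fn(*(values[p] for p in parents))
    return values

# The "program":  x = 2;  y = x + 1;  z = x * y
model = {
    "x": ((), lambda: 2),
    "y": (("x",), lambda x: x + 1),
    "z": (("x", "y"), lambda x, y: x * y),
}

print(run(model))                           # {'x': 2, 'y': 3, 'z': 6}
print(run(model, interventions={"y": 10}))  # {'x': 2, 'y': 10, 'z': 20}
```

The sketch only covers the straight-line core; the substance of the claim is in extending this picture to real language features like functions, recursion, and data structures.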
Yeah, when I think about implementing a review process for the Alignment Forum, I’m definitely thinking of something you could ask for on more polished research, in order to get external feedback and a tag saying “this has been peer-reviewed” (for prestige and reference).
Thanks for the suggestions! We’ll consider them. :)