edit: i think i’ve received enough expressions of interest (more would have diminishing value but you’re still welcome to), thanks everyone!
i recall reading in one of the MIRI posts that Eliezer believed a ‘world model violation’ would be needed for success to be likely.
i believe i may be in possession of such a model violation and am working to formalize it, where by ‘formalize’ i mean writing it not as ‘hard-to-understand intuitions’ but as ‘very clear text that leaves little possibility for disagreement once understood’. it wouldn’t solve the problem, but i think it would make the problem simpler, so that maybe the community could solve it.
if you’d be interested in providing feedback on such a ‘clearly written version’, please let me know as a comment or message.[1] (you’re not committing to anything by doing so, just saying “i’m the kind of person who would be interested in this if your claim is true”). to me, the ideal feedback is from someone who can look at the idea under ‘hard’ assumptions (of the type MIRI holds) about the difficulty of pointing an ASI, and see whether the idea seems promising (or ‘like a relevant model violation’) from that perspective.
[1] i don’t have many contacts in the alignment community.
I’m game! We should be looking for new ideas, so I’m happy to look at yours and provide feedback.
Consider me in
Historically I’ve been able to understand others’ vague ideas & use them in ways they endorse. I can’t promise I’ll read what you send me, but I am interested.
Maybe you can say a bit about what background someone should have to be able to evaluate your idea.