Thanks for writing this up. I agree with several of the subpoints you make about how the plan could be more specific, measurable, etc.
I’m not sure where I stand on some of the claims about OpenAI’s intentions that strike me as more speculative. Put differently, I see your post making two big-picture claims:
1. The 1-2 short blog posts about the OpenAI plan failed to meet several desired criteria. Reality doesn’t grade on a curve, so even though the posts weren’t intended to spell out a bunch of very specific details, we should hold the world’s leading AGI company to high standards, and we should encourage them to release SMARTer and more detailed plans. (I largely agree with this.)
2. OpenAI is alignment-washing, and their safety efforts are not focused on AI x-risk. (I’m much less certain about this claim, and I don’t think there’s much evidence presented here to support it.)
IMO, the observation that OpenAI’s plan isn’t “SMART” could mean that they’re alignment-washing. But it could also simply mean that they’re still working toward making their plan SMARTer and more specific/measurable, and wanted to share what they had so far (which I commend them for). Similarly, the fact that OpenAI is against pivotal acts could mean that they’re not taking the “we need to escape the acute risk period” goal seriously, or it could mean that they reject one particular way of escaping the acute risk period and are trying to find alternatives.
I also think I have some sort of prior that goes something like “you should have a high bar for confidently claiming that someone isn’t pursuing the same goal as you, just because their particular approach to achieving that goal isn’t yet specific/solid.”
I’m also confused, though, because you probably have a bunch of other pieces of evidence going into your model of OpenAI, and I don’t believe that everyone should have to write up a list of 50 reasons in order to criticize the intentions of a lab.
All things considered, I land somewhere like: “It’s probably worth acknowledging more clearly that the accusations of alignment-washing are speculative, and that the evidence in the post could be consistent with an OpenAI that really is trying hard to solve the alignment problem. Or acknowledge that you have other reasons for believing the alignment-washing claims that you’ve decided not to go into in the post.”
I’ll do both:
I (again) affirm that this is very speculative.
A substantial part of my private evidence is my personal evaluation of the CEO of OpenAI. I am really uneasy about stating this in public, but I now regret keeping my very negative evaluation of SBF private. Speak the truth, even if your voice trembles. I think a full “heel turn” is more likely than not.