He does not have a good plan for alignment, but he is far less confused about this fact than most others in similar positions.
Yes, he seems like a great guy, but he doesn’t just come across as not having a good plan; he comes across as completely disconnected from having a plan or doing much of anything.
JS: If AGI came way sooner than expected we would definitely want to be careful about it.
DP: What would being careful mean? Presumably you’re already careful, right?
And yes, aren’t they being careful? Well, sounds like no.
JS: Maybe it means not training the even smarter version or being really careful when you do train it. You can make sure it’s properly sandboxed and everything. Maybe it means not deploying it at scale or being careful about what scale you deploy it at.
“Maybe”? That’s a lot of maybes for merely potentially doing the basics. Their whole approximation of a plan is ‘maybe’ not deploying it at scale, or ‘maybe’ stopping training after that, and only theoretically considering sandboxing it? That seems like the bare minimum, and it sounds like he is guessing based on having been around, not based on any real plans they have.
He then goes on to mollify: it probably won’t happen in a year, it might be a whole two or three years. And this is where they are at.
JS: First of all, I don’t think this is going to happen next year but it’s still useful to have the conversation. It could be two or three years instead.
It comes off as if all their talk of Safety is complete lip service, even though he agrees with the need for Safety in theory. If you were ‘pleasantly surprised and impressed,’ I shudder to imagine what the responses would have had to be to leave you disappointed.