I didn’t leave it as a “simple” to-do, but rather as an offer to collaboratively hash something out.
That said: If people don’t even know what it would look like when they see it, how can one update on evidence? What is Nate looking at which tells him that GPT doesn’t “want things in a behavioralist sense”? (I bet he’s looking at something real to him, and I bet he could figure it out if he tried!)
I claim we are many scientific insights away from being able to talk about these questions at the level of precision necessary to make predictions like this.
To be clear, I’m not talking about formalizing the boundary. I’m talking about a bet between people, adjudicated by people.
(EDIT: I’m fine with a low-sensitivity, high-specificity outcome—we leave it unresolved if it’s ambiguous / not totally obvious relative to the loose criteria we settled on. Also, the criterion could include randomly polling n alignment / AI people and asking them how “behaviorally-wanting” the system seemed on a Likert scale. I don’t think you need fundamental insights for that to work.)
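
For concreteness, here is a minimal sketch of what that poll-based resolution rule might look like. Everything here is hypothetical and not something anyone has agreed to: the 1 to 7 scale, the thresholds, and the function name are placeholders for whatever loose criteria the bettors actually settle on.

```python
import statistics

# Hypothetical resolution rule for the bet: poll n alignment / AI people on a
# 1-7 Likert scale for how "behaviorally-wanting" the system seems, and only
# resolve when the verdict is unambiguous (high specificity); otherwise leave
# the bet unresolved (low sensitivity).

def resolve_bet(ratings, yes_threshold=5.5, no_threshold=2.5):
    """Return 'wants', 'does not want', or 'unresolved' from 1-7 Likert ratings."""
    mean = statistics.mean(ratings)
    if mean >= yes_threshold:
        return "wants (behaviorally)"
    if mean <= no_threshold:
        return "does not want (behaviorally)"
    return "unresolved"  # ambiguous relative to the loose criteria

# A clear-cut poll resolves; a middling one stays unresolved.
print(resolve_bet([6, 7, 6, 5, 7]))   # -> wants (behaviorally)
print(resolve_bet([4, 3, 5, 4, 4]))   # -> unresolved
```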