I expect you would also say that a crucial hard part many people are avoiding is “how to learn human values?”, right? (Not the true names, but a useful pointer)
Yes, although I consider that one more debatable.
I expect you to partially disagree, but there’s not always a “right” operationalization...
When there’s not a “right” operationalization, that usually means that the concepts involved were fundamentally confused in the first place.
I want to say that you should start with a behavioral theorem, since the properties you want to describe often make more sense behaviorally, but I guess you're going to answer that we have evidence this doesn't work in Alignment, and so it amounts to avoiding the Hard part. Am I correct?
Actually, I think starting from a behavioral theorem is fine. It’s just not where we’re looking to end up, and the fact that we want to open the black box should steer what starting points we look for, even when those starting points are behavioral.
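To make the contrast concrete, here is a minimal toy sketch (my own illustration, not anything from the thread; all names are hypothetical): a behavioral property is checked purely from a system's input/output behavior on sampled inputs, while a structural property is checked by opening the black box and inspecting the internals.

```python
# Toy illustration only: contrasting a behavioral check with a structural one.
# Everything here is a made-up example, not a method from the discussion.

class ToyModel:
    def __init__(self, weights):
        self.weights = weights

    def __call__(self, x):
        # A trivially simple "model": a weighted sum of the input features.
        return sum(w * xi for w, xi in zip(self.weights, x))

def behavioral_check(model, test_inputs):
    """Behavioral: only looks at what the model outputs on some inputs."""
    return all(model(x) >= 0 for x in test_inputs)

def structural_check(model):
    """Structural: looks inside the model (here, just its parameters)."""
    return all(w >= 0 for w in model.weights)

model = ToyModel(weights=[0.5, 1.2])
print(behavioral_check(model, test_inputs=[(1, 0), (0, 1), (2, 3)]))  # True
print(structural_check(model))  # True, and explains the behavioral result
```

The point of the sketch is only that the structural check, when available, explains and generalizes the behavioral one (here, non-negative weights guarantee non-negative outputs for all non-negative inputs, not just the tested ones), which is roughly why one might treat behavioral theorems as a starting point rather than an endpoint.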
When there’s not a “right” operationalization, that usually means that the concepts involved were fundamentally confused in the first place.
Curious about the scope of the conceptual space where this belief was calibrated. It seems to me to tacitly say something like “everything that’s important is finitely characterizable”.
Maybe the “fundamentally confused” in your phrasing already covers the case of “stupidly tried to grab something that wasn’t humanly possible, even if possible in principle” as a form of confusion on the human’s part, without making any claim that reality is conveniently compressible at all levels. (Note that this link explicitly disavows beauty at “all levels” too.)
I suppose you might also say “I didn’t make any claim of finiteness”, but I do think something like “at least some humans are only a finite string away from grokking anything” is implicit if you expect there to be blog posts/textbooks that can operationalize everything relevant. That would be an even stronger claim than “finiteness”; it would be “human-typical-length strings”.
I believe Adam is pointing at something quite important, akin to a McNamara fallacy for formalization. To paraphrase:
The first step is to formalize whatever can be easily formalized. This is OK as far as it goes. The second step is to disregard that which can’t be easily formalized or to make overly simplifying assumptions. This is artificial and misleading. The third step is to presume that what can’t be formalized easily really isn’t important. This is blindness. The fourth step is to say that what can’t be easily formalized really doesn’t exist. This is suicide.
In the case of something that has already been engineered (human brains with agency), we should probably grant that it is possible to operationalize everything relevant. But I want to push back on the general version, and would want the question “why do you believe simple formalization is possible here, in this domain?” to be one that can legitimately be asked.
[PS: I’m not a native speaker.]