I’m confused about what your point here even is. For the first part, if you’re trying to say
research that gives strong arguments/proofs that you cannot solve alignment by doing X (like showing certain techniques aren’t powerful enough to prove P!=NP) is also useful.
, then that makes sense. But the post didn’t mention anything about that?
You said:
We cannot just rely on a can-do attitude, as we can with starting a start-up (where even if there’s something fundamentally wrong about the idea, and it fails, only a few people’s lives are impacted hard).
which I feel is satirizing the post. I read the post to say
It’s extremely difficult to get all the bits of hidden information in one shot, so it’s important to go at it from many different angles like what’s happened historically. There will always be problems with individual approaches, but we can steelman them to think about what hidden bits of info they could reveal about the actual solution.
We don’t have any proofs that the approaches of the referenced researchers are doomed to fail, like we have for P!=NP and what you linked. I would predict that Adam does think approaches that run counter to “instrumental convergence, the security mindset, the fragility of value, the orthogonality thesis” are doomed to fail.
We don’t have any proofs that the approaches of the referenced researchers are doomed to fail, like we have for P!=NP and what you linked.
Besides looking for different angles or ways to solve alignment, or even for strong arguments/proofs why a particular technique will not solve alignment, … it seems prudent to also look for whether you can prove embedded misalignment by contradiction (in terms of the inconsistency of the inherent logical relations between essential properties that would need to be defined as part of the concept of embedded/implemented/computed alignment).
This is analogous to the situation Hilbert and the other proponents of his program found themselves in when trying to ‘solve for’ a formalization of mathematics that was (presumably) both complete and consistent. Gödel, who was a semi-outsider, instead took the inverse route of proving by contradiction that a sufficiently expressive formal system cannot be simultaneously complete and consistent.
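For reference, the theorem being invoked, in its standard textbook form (paraphrased here, with the usual hypotheses made explicit): Gödel’s first incompleteness theorem says that for any consistent, effectively axiomatizable formal system $F$ strong enough to express elementary arithmetic, there is a sentence $G_F$ such that
$$F \nvdash G_F \qquad \text{and} \qquad F \nvdash \lnot G_F,$$
i.e. no such system can be both consistent and complete.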
If you have an entire community operating under the assumption that a problem is solvable or at least resolving to solve the problem in the hope that it is solvable, it seems epistemically advisable to have at least a few oddballs attempting to prove that the problem is unsolvable.
Otherwise you end up skewing your entire ‘portfolio allocation’ of epistemic bets … making your community and (in this case) the wider world fragile to reality proving you wrong.
I understand your point now, thanks. It’s:
Since the community is mostly operating on the assumption that alignment is solvable, it would be epistemically healthy for at least a few people to attempt to prove that it is unsolvable, so that the overall portfolio of epistemic bets doesn’t get skewed.
or something of the sort.
Yeah, that points well to what I meant. I appreciate your generous intellectual effort here to paraphrase back!
Sorry about my initially vague and disagreeable comment (aimed at Adam, who I chat with sometimes as a colleague). I was worried about what looks like a default tendency in the AI existential safety community to start from the assumption that problems in alignment are solvable.
Adam has since clarified with me that although he had not written about it in the post, he is very much open to exploring impossibility arguments (and sent me a classic paper on impossibility proofs in distributed computing).