In the end you do want to solve the problem, obviously. But the road from here to there goes through many seemingly weird and insufficient ideas that are corrected, adapted, refined, often discarded except for a small bit.
Alignment is no different, including “strong” alignment.
There is an implicit assumption here that does not cover all the possible outcomes of research progress.
As understanding of some open problems in mathematics and computer science progressed, those problems turned out to be unsolvable. That is a valuable, decision-relevant conclusion: it means it is better to do something else than to keep hacking away at the problem. E.g.:
Solving for mathematical models being both consistent and complete (ruled out by Gödel’s incompleteness theorems; see https://youtu.be/HeQX2HjkcNo)
Solving for a distributed data store that guarantees consistency, availability, and partition tolerance all at once (ruled out by the CAP theorem; see https://en.m.wikipedia.org/wiki/CAP_theorem)
Solving general fifth-degree polynomial equations in radicals (ruled out by the Abel–Ruffini theorem, stated concretely below; see http://www.scientificlib.com/en/Mathematics/LX/AbelRuffiniTheorem.html)
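To make the last example concrete (a standard statement of the result, added here for illustration): every quadratic equation $ax^2 + bx + c = 0$ has a closed-form solution in radicals,

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a},$$

but the Abel–Ruffini theorem shows that no analogous formula, built from the coefficients using arithmetic operations and n-th roots, exists for the general quintic

$$x^5 + a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0 = 0.$$

Centuries of searching ended not with a cleverer formula but with a proof that none exists.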
We cannot just rely on a can-do attitude, as we can with starting a start-up (where even if there’s something fundamentally wrong with the idea and it fails, only a few people’s lives are hit hard).
With ‘solving for’ the alignment of generally intelligent and scalable/replicable machine algorithms, it is different.
This is the extinction of human society and all biological life we are talking about. We need to think this through rationally, and consider all outcomes of our research impartially.
I appreciate the emphasis on diverse conceptual approaches. Please, be careful in what you are looking for.
I’m confused about what your point here even is. For the first part, if you’re trying to say
research that gives strong arguments/proofs that you cannot solve alignment by doing X (like showing certain techniques aren’t powerful enough to prove P!=NP) is also useful.
, then that makes sense. But the post didn’t mention anything about that?
You said:
We cannot just rely on a can-do attitude, as we can with starting a start-up (where even if there’s something fundamentally wrong with the idea and it fails, only a few people’s lives are hit hard).
which I feel is satirizing the post. I read the post to say
It’s extremely difficult to get all the bits of hidden information in one shot, so it’s important to come at it from many different angles, as has happened historically. There will always be problems with individual approaches, but we can steelman them to think about which hidden bits of information they could reveal about the actual solution.
We don’t have any proofs that the approaches of the referenced researchers are doomed to fail, like we have for certain techniques for proving P!=NP and for the results you linked. I would predict that Adam does think approaches that run counter to “instrumental convergence, the security mindset, the fragility of value, the orthogonality thesis” are doomed to fail.
We don’t have any proofs that the approaches of the referenced researchers are doomed to fail, like we have for certain techniques for proving P!=NP and for the results you linked.
Besides looking for different angles or ways to solve alignment, or even for strong arguments/proofs that a particular technique will not solve alignment, it seems prudent to also look at whether you can prove embedded misalignment by contradiction: that is, by showing an inconsistency in the logical relations between the essential properties that would need to be defined as part of the concept of embedded/implemented/computed alignment.
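To sketch the shape of such an argument (only a schema; pinning down the right properties is the actual work): suppose embedded alignment, to be implemented at all, requires essential properties $P_1, \dots, P_n$ to hold simultaneously. An impossibility proof would derive a contradiction from their conjunction,

$$P_1 \land P_2 \land \dots \land P_n \vdash \bot,$$

and thereby conclude $\neg(P_1 \land \dots \land P_n)$: no implemented system can satisfy all the essential properties at once.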
This is analogous to the situation Hilbert and others pursuing his formalist programme found themselves in, trying to ‘solve for’ mathematical systems being (presumably) both complete and consistent. Gödel, who was a semi-outsider, instead took the inverse route of proving by contradiction that a sufficiently expressive system cannot be simultaneously complete and consistent.
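For reference, the standard formulation: Gödel constructed, for any consistent formal system $T$ expressive enough for arithmetic, a sentence $G$ such that

$$T \vdash G \leftrightarrow \neg\mathrm{Prov}_T(\ulcorner G \urcorner),$$

i.e. $G$ asserts its own unprovability in $T$. If $T$ is consistent it cannot prove $G$, and a slight strengthening of the argument (Rosser’s trick) shows it cannot refute $G$ either, so $T$ is incomplete.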
If you have an entire community operating under the assumption that a problem is solvable or at least resolving to solve the problem in the hope that it is solvable, it seems epistemically advisable to have at least a few oddballs attempting to prove that the problem is unsolvable.
Otherwise you end up skewing your entire ‘portfolio allocation’ of epistemic bets, making your community and (in this case) the wider world fragile to reality proving you wrong.
I understand your point now, thanks. It’s:
Even if you expect alignment to be solvable, a community betting everything on that assumption should still have a few people trying to prove it unsolvable, since an impossibility result would change what everyone ought to work on.
or something of the sort.
Yeah, that points well to what I meant. I appreciate your generous intellectual effort here to paraphrase back!
Sorry about my initially vague and disagreeable comment (aimed at Adam, who I chat with sometimes as a colleague). I was worried about what looks like a default tendency in the AI existential safety community to start from the assumption that problems in alignment are solvable.
Adam has since clarified with me that although he had not written about it in the post, he is very much open to exploring impossibility arguments (and sent me a classic paper on impossibility proofs in distributed computing).