We don’t have any proofs that the approaches of the referenced researchers are doomed to fail, like the ones we have for P!=NP and the result you linked.
Besides looking for different angles or ways to solve alignment, or even for strong arguments/proofs that a particular technique will not solve alignment, it seems prudent to also look at whether you can prove embedded misalignment by contradiction: that is, by showing an inconsistency in the inherent logical relations between the essential properties that would need to be defined as part of the concept of embedded/implemented/computed alignment.
This is analogous to the situation Hilbert and the other formalists found themselves in, trying to ‘solve for’ a formal system of mathematics that is (presumably) both complete and consistent. Gödel, a semi-outsider to that program (his closer association was with the Vienna Circle), instead took the opposite route of proving by contradiction that a sufficiently expressive formal system cannot be simultaneously complete and consistent.
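Schematically, such an impossibility argument would take the following shape (a minimal sketch, with $P_1, \dots, P_n$ standing in for whatever essential properties the concept of embedded alignment is taken to require of a system $S$; which properties those are is exactly what would need to be pinned down):

$$\Big(\forall S:\ \big(P_1(S) \wedge \dots \wedge P_n(S)\big) \rightarrow \bot\Big) \;\Longrightarrow\; \neg\,\exists S:\ P_1(S) \wedge \dots \wedge P_n(S)$$

That is, deriving a contradiction from the conjunction of the defining properties rules out any system that satisfies all of them at once, which is the sense in which an inconsistency in the concept itself would settle the question.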
If you have an entire community operating under the assumption that a problem is solvable or at least resolving to solve the problem in the hope that it is solvable, it seems epistemically advisable to have at least a few oddballs attempting to prove that the problem is unsolvable.
Otherwise you end up skewing your entire ‘portfolio allocation’ of epistemic bets.
I understand your point now, thanks. It’s:
… making your community and (in this case) the wider world fragile to reality proving you wrong.
or something of the sort.
Yeah, that captures what I meant well. I appreciate your generous intellectual effort here to paraphrase it back!
Sorry about my initially vague and disagreeable comment (aimed at Adam, who I chat with sometimes as a colleague). I was worried about what looks like a default tendency in the AI existential safety community to start from the assumption that problems in alignment are solvable.
Adam has since clarified with me that although he had not written about it in the post, he is very much open to exploring impossibility arguments (and sent me a classic paper on impossibility proofs in distributed computing).