In the end you do want to solve the problem, obviously. But the road from here to there goes through many seemingly weird and insufficient ideas that are corrected, adapted, refined, often discarded except for a small bit.
Alignment is no different, including “strong” alignment.
There is an implicit assumption here that does not cover all the possible outcomes of research progress.
As understanding of some open problems in mathematics and computer science progressed, those problems turned out to be unsolvable. That is a valuable, decision-relevant conclusion: it means it is better to do something else than to keep hacking away at the problem. E.g.:
Solving for mathematical models being both consistent and complete (ruled out by Gödel’s incompleteness theorems; see https://youtu.be/HeQX2HjkcNo)
Solving for a distributed data store that guarantees consistency, availability, and partition tolerance all at once (ruled out by the CAP theorem; see https://en.m.wikipedia.org/wiki/CAP_theorem)
Solving general fifth-degree polynomial equations in radicals (ruled out by the Abel–Ruffini theorem, stated concretely below; see http://www.scientificlib.com/en/Mathematics/LX/AbelRuffiniTheorem.html)
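To make the last example concrete (a standard statement of the result, added here for illustration): every quadratic equation $ax^2 + bx + c = 0$ has a closed-form solution in radicals,

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a},$$

but the Abel–Ruffini theorem shows that no analogous formula, built from the coefficients using arithmetic operations and n-th roots, exists for the general quintic

$$x^5 + a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0 = 0.$$

Centuries of searching ended not with a cleverer formula but with a proof that none exists.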
We cannot just rely on a can-do attitude, as we can with starting a start-up (where even if there’s something fundamentally wrong with the idea and it fails, only a few people’s lives are hit hard).
With ‘solving for’ the alignment of generally intelligent and scalable/replicable machine algorithms, it is different.
This is the extinction of human society and all biological life we are talking about. We need to think this through rationally, and consider all outcomes of our research impartially.
I appreciate the emphasis on diverse conceptual approaches. Please, be careful in what you are looking for.
I’m confused about what your point here even is. For the first part, if you’re trying to say
research that gives strong arguments/proofs that you cannot solve alignment by doing X (like showing certain techniques aren’t powerful enough to prove P!=NP) is also useful.
, then that makes sense. But the post didn’t mention anything about that?
You said:
We cannot just rely on a can-do attitude, as we can with starting a start-up (where even if there’s something fundamentally wrong with the idea and it fails, only a few people’s lives are hit hard).
which I feel is satirizing the post. I read the post to say
It’s extremely difficult to get all the bits of hidden information in one shot, so it’s important to come at it from many different angles, as has happened historically. There will always be problems with individual approaches, but we can steelman them to think about which hidden bits of information they could reveal about the actual solution.
We don’t have any proofs that the approaches of the referenced researchers are doomed to fail, like we have for certain techniques for proving P!=NP and for the results you linked. I would predict that Adam does think approaches that run counter to “instrumental convergence, the security mindset, the fragility of value, the orthogonality thesis” are doomed to fail.
We don’t have any proofs that the approaches of the referenced researchers are doomed to fail, like we have for certain techniques for proving P!=NP and for the results you linked.
Besides looking for different angles or ways to solve alignment, or even for strong arguments/proofs that a particular technique will not solve alignment, it seems prudent to also look at whether you can prove embedded misalignment by contradiction: that is, by showing an inconsistency in the logical relations between the essential properties that would need to be defined as part of the concept of embedded/implemented/computed alignment.
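To sketch the shape of such an argument (only a schema; pinning down the right properties is the actual work): suppose embedded alignment, to be implemented at all, requires essential properties $P_1, \dots, P_n$ to hold simultaneously. An impossibility proof would derive a contradiction from their conjunction,

$$P_1 \land P_2 \land \dots \land P_n \vdash \bot,$$

and thereby conclude $\neg(P_1 \land \dots \land P_n)$: no implemented system can satisfy all the essential properties at once.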
This is analogous to the situation Hilbert and others pursuing his formalist programme found themselves in, trying to ‘solve for’ mathematical systems being (presumably) both complete and consistent. Gödel, who was a semi-outsider, instead took the inverse route of proving by contradiction that a sufficiently expressive system cannot be simultaneously complete and consistent.
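For reference, the standard formulation: Gödel constructed, for any consistent formal system $T$ expressive enough for arithmetic, a sentence $G$ such that

$$T \vdash G \leftrightarrow \neg\mathrm{Prov}_T(\ulcorner G \urcorner),$$

i.e. $G$ asserts its own unprovability in $T$. If $T$ is consistent it cannot prove $G$, and a slight strengthening of the argument (Rosser’s trick) shows it cannot refute $G$ either, so $T$ is incomplete.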
If you have an entire community operating under the assumption that a problem is solvable or at least resolving to solve the problem in the hope that it is solvable, it seems epistemically advisable to have at least a few oddballs attempting to prove that the problem is unsolvable.
Otherwise you end up skewing your entire ‘portfolio allocation’ of epistemic bets, making your community and (in this case) the wider world fragile to reality proving you wrong.
I understand your point now, thanks. It’s:
Even if you expect alignment to be solvable, a community betting everything on that assumption should still have a few people trying to prove it unsolvable, since an impossibility result would change what everyone ought to work on.
or something of the sort.
Yeah, that points well to what I meant. I appreciate your generous intellectual effort here to paraphrase back!
Sorry about my initially vague and disagreeable comment (aimed at Adam, who I chat with sometimes as a colleague). I was worried about what looks like a default tendency in the AI existential safety community to start from the assumption that problems in alignment are solvable.
Adam has since clarified with me that although he had not written about it in the post, he is very much open to exploring impossibility arguments (and sent me a classic paper on impossibility proofs in distributed computing).