There are at least two meanings of alignment: 1. do what I want, and 2. don't have catastrophic failures.
I think that "alignment is easy" works only for the first requirement. But there could be many catastrophic failure modes; e.g., even humans can rebel or drift away from their initial goals.
Yes, I think this objection captures something important.
I have proven that aligned AI must exist and also that it must be practically implementable.
But some kind of failure, i.e. a "near miss" on achieving the desired goal, can happen even if success was possible.
I will address these near misses in future posts.