A highly compressed version of what the disagreements are about in my ontology of disagreements about AI safety...
crux about continuity; here GA mostly has the intuition that "things will be discontinuous", and this manifests in many guesses (phase shifts, new ways of representing data, the possibility of demonstrating "overpowering the overseer", …); Paul assumes things will be mostly continuous, with a few exceptions which may be dangerous
this seems similar to typical cruxes between Paul and e.g. Eliezer (also, in my view, this is actually a decent chunk of the disagreement: my model of Eliezer predicts that Eliezer would actually update toward more optimistic views if he believed "we will have more tries to solve the actual problems, and they will show up in a lab setting")
possible crux about x-risk from the broader system (e.g. AI-powered cultural evolution); here it's unclear who stands exactly where in this debate
I don’t think there is any neat public debate on this, but I usually disagree with Eliezer’s and similar “orthodox” views about the relative difficulty & expected neglectedness (I expect narrow single-ML-system “alignment” to be difficult but solvable, and likely solved by default because there are incentives to do so; whole-world alignment / multi-multi alignment to be difficult, with bad results by default)
(there are also many points of agreement)