In theory, yes, that could work.
In practice, once it’s known how to do that, people will do it over and over again with successively more ambitious “limited” goals until it doesn’t work.
Another possibility is a “pivotal act”, but in practice I think a pivotal act has the following possible outcomes:
- The intended shutdown of the AI after the pivotal act fails, and humanity is destroyed.
- The pivotal act doesn’t work.
- The pivotal act succeeds temporarily, and then:
  - the people responsible are vilified, caught, and prevented from repeating it, and the pivotal act is undone over time;
  - the people responsible can repeat what they did, becoming de facto rulers of Earth, but due to their lack of legitimacy they face a game-theoretic landscape that forces them to act as dictators even if they have good intentions to start with;
  - the people responsible and their sympathizers convince the general public of the legitimacy of the act (this can potentially lead to a good outcome, but beware of corrupted hardware making you overestimate the actually-low probability of it happening).
- The pivotal act somehow has a permanent effect, but the potential of humanity is permanently reduced.
> I don’t dispute that at some point in time we want to solve alignment (to come out of the precipice period), but I disputed it’s more dangerous to know how to point AI before having solved what perfect goal to give it.
> In fact, I think it’s less dangerous because we at minimum gain more time to work on and solve alignment, and at best can use existing near human-level AGI to help us solve alignment too. The main reason to believe this is that near human-level AGI is a particular zone where we can detect deception, where it can’t easily unbox itself and take over, yet is still useful. The longer we stay in this zone, the more relatively safe progress we can make (including on alignment).
As I argued above, we have less time, not more, if we know how to point AI.
An AI aimed at something in particular would be much more dangerous for its level than one not aimed at any particular real-world goal, and so “near-human-level AGI” would be much safer (and we could stay in the near-human-level zone you mention for longer) if we can’t point it.