What do they mean by “aligned”?
OK. Assuming that

- sharp left turns are not an issue,
- scalable oversight is even possible in practice,
- and OAI somehow solves the problems of
  - AIs hacking humans (to influence their intents),
  - deceptive alignment,
  - humans going crazy when given great power,
  - etc.,
  - and all the problems no one has noticed yet,
then there’s the question of “aligned to what?” Whose intent? What would success at this agenda look like?
Maybe: A superintelligence that accurately models its human operator, follows the human’s intent[1] to complete difficult-but-bounded tasks, and is runnable at human speed with a manageable amount of compute, sitting on OAI’s servers?
Who would get to use that superintelligence? For what purpose would they use it? How long before the {NSA, FSB, CCP, …} steal that superintelligence off OAI’s servers? What would they use it for?
Point being: If an organization falls short on any key dimension of operational adequacy, then even if it somehow miraculously solves the alignment/control problem, it might be increasing S-risks while only somewhat decreasing X-risks.
What is OAI’s plan for getting their opsec and common-good-commitment to adequate levels? What’s their plan for handling success at alignment/control?
[1] And does not try to hack the human into having more convenient intents.