So I also don’t see how Paul expects the putative alignment of the little agents to pass through this mysterious aggregation form of understanding, into alignment of the system that understands Hessian-free optimization.
My model of Paul’s approach sees the alignment of the subagents as just telling you that no subagent is actively trying to sabotage your system (i.e., by optimizing to find the worst possible answer to give you), and sees the alignment of the whole as coming from having thought carefully, in advance, about how the subagents are supposed to act (in a way that could in principle be executed by a lookup table).