For the first issue, I agree that “Carefully Bootstrapped Alignment” is organizationally hard, but I don’t think improving organizational culture is an effective solution on its own: it is too slow, and humans make mistakes. I think technical solutions are needed. For example, an AI could be responsible for safety assessment: when a researcher submits a job to the AI training cluster, this AI assesses the safety of the job, and if the job could produce a dangerous AI, it is rejected. In addition, external supervision is needed. For example, the government could require that before an AI organization releases a new model, it must be evaluated by a third-party safety organization, and that all organizations with computing resources above a certain threshold be supervised. There is more discussion of this in the section Restricting AI Development.
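To make the gating idea concrete, here is a minimal sketch of what such a gatekeeper could look like on a training cluster. Everything here is illustrative: the `TrainingJob` fields, the `estimate_risk` stub, and the threshold are assumptions standing in for a real safety-assessor AI, not an existing system or API.

```python
# Illustrative sketch of an AI gatekeeper for training-job submissions.
# All names and thresholds are hypothetical.

from dataclasses import dataclass

@dataclass
class TrainingJob:
    submitter: str
    training_flops: float        # rough scale of the run, in whatever units the cluster tracks
    objective_description: str   # what the job is trying to train

RISK_THRESHOLD = 0.1  # reject anything the assessor scores above this (illustrative value)

def estimate_risk(job: TrainingJob) -> float:
    """Placeholder for the safety-assessor AI's judgment of how likely
    this job is to produce a dangerous model."""
    # In practice this would query the assessor model; here we use a stub
    # that flags very large, vaguely specified jobs as risky.
    risk = 0.0
    if job.training_flops > 1e25:
        risk += 0.5
    if "autonomous" in job.objective_description.lower():
        risk += 0.3
    return min(risk, 1.0)

def submit_to_cluster(job: TrainingJob) -> bool:
    """Gatekeeper: only schedule the job if the assessed risk is acceptable."""
    risk = estimate_risk(job)
    if risk > RISK_THRESHOLD:
        print(f"Job from {job.submitter} rejected (assessed risk {risk:.2f})")
        return False
    print(f"Job from {job.submitter} accepted (assessed risk {risk:.2f})")
    return True

if __name__ == "__main__":
    submit_to_cluster(TrainingJob("alice", 1e24, "fine-tune a code assistant"))
    submit_to_cluster(TrainingJob("bob", 5e25, "train an autonomous agent end to end"))
```

The point is that the check happens before compute is allocated, so a risky run is blocked mechanically rather than relying on researchers or managers to catch it.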
For the second issue, you mentioned free variables, and I think this is a key point. As long as we are not fully confident in the safety of an AI, we should reduce free variables as much as possible. This is why I proposed a series of AI Controllability Rules: these rules take priority over the goals, and the AI should be trained to achieve its goals under the premise of complying with the rules. In addition, I don’t think we should place all our hopes on alignment; we should have additional measures for the case where alignment fails, such as AI Monitoring and Decentralizing AI Power.
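One way to read “rules take priority over goals” is as a lexicographic training signal: rule violations dominate the goal term, so the agent is only rewarded for achieving goals within the rules. The sketch below is a toy illustration under the assumption that the rules can be checked programmatically for a trajectory; the checks and the penalty scheme are my own stand-ins, not a fixed part of the proposal.

```python
# Toy sketch: rule compliance outranks the goal in the training reward.
# `rule_checks` and the penalty scheme are illustrative assumptions.

from typing import Callable, List

def shaped_reward(
    trajectory: dict,
    goal_reward: Callable[[dict], float],
    rule_checks: List[Callable[[dict], bool]],
    violation_penalty: float = 100.0,
) -> float:
    """Lexicographic-style reward: any rule violation yields a large penalty
    regardless of how well the goal was achieved."""
    violations = sum(0 if check(trajectory) else 1 for check in rule_checks)
    if violations > 0:
        return -violation_penalty * violations
    return goal_reward(trajectory)

# Example usage with stand-in rule checks and a stand-in goal score.
rules = [
    lambda traj: not traj.get("modified_own_training_code", False),
    lambda traj: not traj.get("disabled_monitoring", False),
]
trajectory = {"task_score": 0.9, "disabled_monitoring": True}
print(shaped_reward(trajectory, lambda t: t["task_score"], rules))  # -100.0
```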