Task: Apply an abstract proposal to a concrete ML system
Context: A researcher is reading a highly theoretical alignment paper and is curious about whether/how it might apply to a real-world machine learning system, like a large transformer trained using SGD. They would like to see what parts of the ML system would change under this proposal.
Input type: a theoretical alignment proposal and a description of an ML system
Output type: a description of how the ML system would change under the given proposal
Info constraints: none
Instance 1:
Input:
Abstract proposal: The description of the complexity regularizer from the ELK report:
ML system description: a GPT-style model trained on natural language using SGD, with a GPT-style reporter trained on the GPT model’s weights.
Output: This is tricky because the circuit complexity of a neural network is largely fixed by its architecture. As we sweep over different architectures/hyperparameters for the reporter model, we can add a regularization term to the hyperparameter optimization based on the total number of weights in the model.
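A minimal sketch of what that sweep-level penalty might look like, assuming PyTorch; the candidate architectures, the validation losses, and the penalty weight lam are all illustrative stand-ins, not values from the ELK report:

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    # Total weight count, standing in for "circuit complexity".
    return sum(p.numel() for p in model.parameters())

def penalized_score(val_loss: float, model: nn.Module, lam: float = 1e-5) -> float:
    # Lower is better: validation loss plus a size penalty. lam is an
    # assumed hyperparameter controlling the complexity/accuracy trade-off.
    return val_loss + lam * count_params(model)

# Hypothetical reporter candidates from a hyperparameter sweep.
candidates = {
    "small": nn.Linear(64, 1),
    "large": nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 1)),
}
val_losses = {"small": 0.31, "large": 0.29}  # assumed eval results

best = min(candidates, key=lambda k: penalized_score(val_losses[k], candidates[k]))
print(best, count_params(candidates[best]))
```

With these (made-up) numbers the penalty outweighs the larger model's small accuracy gain, so the sweep selects the smaller reporter.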
Other possible output: Since circuit complexity for a neural network is largely fixed by its architecture, we will consider the “minimal complexity” of the trained neural network to be the number of non-zero parameters, and will regularize on that to encourage sparsity in the weights.
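A sketch of this second reading, again assuming PyTorch and a toy reporter: the non-zero-parameter count itself is not differentiable, so this sketch substitutes the standard L1 surrogate, which pushes individual weights toward zero during training; the model, data, lam, and the pruning threshold are all assumptions for illustration.

```python
import torch
import torch.nn as nn

def l1_penalty(model: nn.Module, lam: float = 1e-3) -> torch.Tensor:
    # Differentiable surrogate for the non-zero parameter count.
    return lam * sum(p.abs().sum() for p in model.parameters())

reporter = nn.Linear(16, 1)  # toy stand-in for the reporter model
opt = torch.optim.SGD(reporter.parameters(), lr=1e-2)
x, y = torch.randn(64, 16), torch.randn(64, 1)  # synthetic data

for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(reporter(x), y) + l1_penalty(reporter)
    loss.backward()
    opt.step()

# "Minimal complexity" under this reading: weights that survived the penalty.
nonzero = sum((p.abs() > 1e-3).sum().item() for p in reporter.parameters())
print(nonzero)
```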
Instance 2:
Input:
Abstract proposal: The Iterated Amplification and Distillation proposal, as described in https://ai-alignment.com/iterated-distillation-and-amplification-157debfd1616
ML system description: a GPT-style model trained on natural language using SGD.
Output: Any of the project proposals in https://www.alignmentforum.org/posts/Y9xD78kufNsF7wL6f/machine-learning-projects-on-ida, or something similar
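For concreteness, a highly schematic sketch of the amplify-then-distill loop follows. It is not taken from either linked post: Model, amplify, and distill are toy placeholders, and in the real system distillation would be supervised fine-tuning of the GPT-style model on the amplified answers rather than a lookup table.

```python
from typing import Callable, Dict, List

Model = Callable[[str], str]

def amplify(model: Model, question: str) -> str:
    # Amplification: decompose the question and consult the current model
    # on subquestions (trivially here: one subcall plus an annotation).
    sub_answer = model(question)
    return f"amplified({sub_answer})"

def distill(targets: Dict[str, str]) -> Model:
    # Distillation: train a fast model to imitate the amplified system.
    # A lookup table stands in for supervised fine-tuning with SGD.
    return lambda q: targets.get(q, "unknown")

def ida(model: Model, questions: List[str], rounds: int = 3) -> Model:
    for _ in range(rounds):
        targets = {q: amplify(model, q) for q in questions}
        model = distill(targets)
    return model

base: Model = lambda q: "seed answer"
trained = ida(base, ["What is 2+2?"])
print(trained("What is 2+2?"))
```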