Maybe you did. I find it hard to distinguish inventing ideas from half-remembering them.
If the training procedure either
reliably produces mesaoptimizers with about the same values, or
reliably produces mesaoptimizers that can acausally cooperate, or
the rest of the procedure allows one mesaoptimizer to take control of the whole output,
then using different copies of GPT-n trained from different seeds doesn’t help.
If you just convert 1% of the English into network yourself, then all it needs is some error correction. Even without that, neural net structures are somewhat redundant. A malicious optimizer with a random 1% removed is probably still a malicious optimizer.
Secondly, its goal here is to give us subtly wrong alignment papers. Suppose we produce an examination and transparency step so thorough that no mesaoptimizer can slip through. The original mesaoptimizer can just design a network that does its best as a mesaoptimizer-free predictor, but has subtly and maliciously chosen beliefs about AI research.
The error correction needs to be present in the original network because I also do some of the converting of network into English. The only reason I don’t do everything myself is that it takes too long. The proportion can be higher at the topmost levels because there are fewer tasks there. The error correction doesn’t let it completely ignore what I would do at the low levels because on the 1% I can compare its outputs to mine, so they need to at least superficially look similar.
If we find that there’s a bunch of redundancy, we can check whether there is any way to cut it down that would suddenly identify a bunch of mesa optimization. So the mesaoptimizer would have to take over the network entirely or trick its brethren as well as us.
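To put rough numbers on the 1% comparison, here is a minimal sketch. It assumes each low-level task is independently picked for a hand check with probability 1%, and that a hand check always notices a tampered output. Both are simplifying assumptions rather than part of the scheme as described, and the function name is made up for illustration. In particular, tampering that still looks superficially similar to the hand-made answer is exactly the case this does not cover.

```python
def detection_probability(n_tampered: int, audit_fraction: float = 0.01) -> float:
    """Chance that a random audit hits at least one tampered task.

    Assumes (hypothetically) that each task is audited independently with
    probability `audit_fraction`, and that any audited tampered task is
    recognised as tampered.
    """
    if n_tampered < 0:
        raise ValueError("n_tampered must be non-negative")
    if not 0.0 <= audit_fraction <= 1.0:
        raise ValueError("audit_fraction must be between 0 and 1")
    # Probability that every tampered task escapes the audit, subtracted from 1.
    return 1.0 - (1.0 - audit_fraction) ** n_tampered


if __name__ == "__main__":
    for k in (1, 100, 500):
        print(k, detection_probability(k))
```

Under these assumptions a single tampered task is caught 1% of the time, about 100 tampered tasks are caught roughly 63% of the time, and 500 are caught more than 99% of the time. So the spot check mostly constrains how many visibly different outputs the network can produce, not whether it can produce a few.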