Shouldn’t existing theoretical ML results also inform p(scheming), at least somewhat, conditioned on assumptions about architecture, etc.? E.g. here I provide an argument and a list of references for why one might expect a reasonable likelihood that ‘AI will be capable of automating R&D while also being incapable of doing non-trivial consequentialist reasoning in a forward pass’.
Agreed. I’d say current theoretical ML is a weak version of “theoretically-sound arguments about protocols and AIs”, but it might become stronger in the future.