Shouldn’t existing theoretical ML results also inform p(scheming), at least somewhat, conditioned on assumptions about architecture, etc.? E.g. here I provide an argument and a list of references for why one might expect a reasonable likelihood that ‘AI will be capable of automating R&D while also being incapable of doing non-trivial consequentialist reasoning in a forward pass’.
Agreed. I’d say current theoretical ML is a weak version of “theoretically-sound arguments about protocols and AIs”, but it might become stronger in the future.