lc comments on Safety engineering, target selection, and alignment theory

lc 10 Apr 2024 12:32 UTC
2 points

One might worry that it is difficult to set benchmarks of success for alignment research. Is a Newtonian understanding of gravitation sufficient to attempt a Moon landing, or must one develop a complete theory of general relativity before believing that one can land softly on the Moon?

In the case of AI alignment, there is at least one obvious benchmark to focus on initially. Imagine we had access to an incredibly powerful computer with access to the internet, an automated factory, and large sums of money. If we could program that computer to reliably achieve some simple goal (such as producing as much diamond as possible), then a large share of the AI alignment research would be completed.

Are we close to meeting this benchmark?