Solving a scientific problem without being able to learn from experiments and failures is incredibly hard.
I wonder what scientific or theoretical problems, if any, have been solved correctly “on the first try” in human history. I know MIRI and others have studied history for examples of e.g. technological discontinuities; perhaps a similar study could be made of this.
An example Yudkowsky often brings up in the Sequences is Einstein’s discovery of General Relativity, which I think is informative for alignment. Einstein did many thought experiments and much careful reasoning, to the point where his theory essentially “came out” right before experiments confirmed it.
More generally, Yudkowsky analogizes AI safety to physics, and the similarities seem real: a combination of careful theory and expensive or dangerous experiments, high intellectual barriers to entry, the need for hardcore conceptual and mathematical “engineering” just to think about the relevant things, counterintuitive properties, and so on.
TL;DR: write, self-critique, and formalize more thought experiments. This could help a lot with getting alignment theoretically right sooner (which helps regardless of how critical the “first” experimental attempt turns out to be).