Solving a scientific problem without being able to learn from experiments and failures is incredibly hard.
I wonder what, if any, scientific/theoretical problems have been solved right “on the first try” in human history. I know MIRI and others have done studies of history to find examples of e.g. technological discontinuities. Perhaps a study could be made of this?
An example Yudkowsky often brings up in the Sequences is Einstein’s discovery of General Relativity. I think this is informative and helpful for alignment. Einstein did lots of thought experiments and careful reasoning, to the point where his theory basically “came out” right before experiments confirmed it.
More generally, I think Yudkowsky analogizes AI safety to physics, and it does seem similar: the combination of careful theory and expensive/dangerous experiments, the high intellectual barriers to entry, the need for hardcore conceptual and mathematical “engineering” even to think about the relevant things, the counterintuitive properties, etc.
TL;DR: write, self-critique, and formalize more thought experiments. This could help a lot with getting alignment theoretically right sooner (which helps regardless of how critical the “first” experimental attempt turns out to be).
My general answer is: quite a lot in pure mathematics. But mathematics in full generality is incredibly hard, so we haven’t even mapped the mathematical territory very well yet.
In scenarios where you actually have to interface with our specific physical/mathematical universe, I believe the answer is zero times, or close to it. Einstein cut the search space from thousands or millions of initially valid theories down to 4, but he didn’t get to the correct theory zero-shot, and he did have to rely on experiment a little bit:
https://www.lesswrong.com/posts/GSBCw94DsxLgDat6r/interpreting-yudkowsky-on-deep-vs-shallow-knowledge#6HPjxMvTnP9JeibXZ