Huh, I thought you agreed with statements like “if we had many shots at AI Alignment and could get reliable empirical feedback on whether an AI Alignment solution is working, AI Alignment would be much easier”.
My model is that John is talking about “evidence on whether an AI alignment solution is sufficient”, and you understood him to say “evidence on whether the AI Alignment problem is real/difficult”. My guess is you both agree on the former, but I am not confident.
> Huh, I thought you agreed with statements like “if we had many shots at AI Alignment and could get reliable empirical feedback on whether an AI Alignment solution is working, AI Alignment would be much easier”.
I agree that having many shots is helpful, but lacking them is not the core difficulty (just as having many shots to launch a rocket doesn’t help you very much if you have no idea how rockets work).
I don’t really know what “reliable empirical feedback” means in this context—if you have sufficiently reliable feedback mechanisms, then you’ve solved most of the alignment problem. But, out of the things John listed:
> Goodhart problems in outer alignment, deception in inner alignment, phase change in hard takeoff, “getting what you measure” in slow takeoff
I expect that we’ll observe a bunch of empirical examples of each of these things happening (except for the hard takeoff phase change), and not know how to fix them.
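To make the Goodhart case concrete, here is a toy sketch (everything in it is made up for illustration; the Pareto noise model and all the numbers are assumptions, not claims about any real system). If the measurable proxy is the true objective plus heavy-tailed error, then turning up the selection pressure on the proxy eventually stops buying true value:

```python
import random

random.seed(0)

# Toy Goodhart setup: we want to maximize a "true" value, but we can only
# select candidates by a proxy = true value + heavy-tailed measurement error.
def proxy(x):
    return x + random.paretovariate(1.5)  # Pareto noise: occasional huge errors

for n in (10, 1_000, 100_000):
    candidates = [random.gauss(0.0, 1.0) for _ in range(n)]
    chosen = max(candidates, key=proxy)  # optimize the proxy harder as n grows
    print(f"n={n:>7}  best available true value={max(candidates):5.2f}  "
          f"true value of proxy-chosen candidate={chosen:5.2f}")

# As n grows, the best available true value keeps rising, but the argmax of
# the proxy is increasingly determined by extreme noise rather than true
# value, so the chosen candidate's true value falls further behind.
```

Note that this kind of observation matches the worry above: watching it happen tells you the proxy broke, not how to build a proxy that won’t.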
> I agree that having many shots is helpful, but lacking them is not the core difficulty (just as having many shots to launch a rocket doesn’t help you very much if you have no idea how rockets work).
I really do feel like it would have been extremely hard to build rockets if we had to get them right on the very first try.
I think that for rockets, the fact that it is so costly to experiment explains the majority of the difficulty of rocket engineering. I agree that you also have very little chance of building a successful space rocket without a good understanding of Newtonian mechanics and some aspects of relativity, but if I could just launch a rocket every day without bad consequences, I am pretty sure I wouldn’t really need a deep understanding of either of those, or would easily figure out the relevant bits as I kept experimenting.
The reason rocket science relies so much on solid theoretical models is that we have to get things right in only a few shots. I don’t think you needed any particularly good theory to build trains, for example. Just a lot of attempts and tinkering.
At a sufficiently high level of abstraction, I agree that “cost of experimenting” could be seen as the core difficulty. But at a very high level of abstraction, many other things could also be seen as the core difficulty, like “our inability to coordinate as a civilization” or “the power of intelligence” or “a lack of interpretability”, etc. Given this, John’s comment seemed mainly like a rhetorical flourish rather than a contentful claim about the structure of the difficult parts of the alignment problem.
Also, I don’t think “on our first try” is a great framing, because there are always precursors (e.g. we landed a man on the moon “on our first try”, but we also had plenty of tries at something kinda similar). The question is then how similar, and how relevant, the precursors are, which is where I expect our differing attitudes about the value of empiricism to be the key crux.
Well, you could probably build a rocket that looks like it works, anyway. But could you build one you would want to try to travel to the moon in? (Are you imagining you get to fly in these rockets, or just launch them and watch from the ground? I was imagining the second...)
> I agree that having many shots is helpful, but lacking them is not the core difficulty (just as having many shots to launch a rocket doesn’t help you very much if you have no idea how rockets work).
I basically buy that argument, though I do still think lack of shots is the main factor which makes alignment harder than most other technical fields in their preparadigmatic stage.
“Harder” can have two meanings: “the program (the design, plus the proof that it works) is longer” and “the program is less likely to be generated in the real world”. These meanings are correlated, but not identical.
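One way to see why they correlate (a standard Solomonoff-prior gloss, and only a gloss): under an idealized simplicity prior, the probability that a program $p$ is generated is pinned to its length $\ell(p)$ in bits,

$$\Pr[p] \approx 2^{-\ell(p)},$$

so “longer” and “less likely to be generated” coincide by construction. Real-world research is a guided search rather than a draw from the simplicity prior, and the gap between those two distributions is exactly the room for the two meanings of “harder” to come apart.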