FWIW I think Paul-slow takeoff is pretty unlikely for reasons to be found in this thread and this post. On the other hand, as someone who thinks fast takeoff (in various senses) is more likely than not, I don’t yet see why that makes Truthful LM work significantly less useful. (By contrast, I totally see why Truthful LM work is significantly less useful if AGI/TAI/etc. comes from stuff that doesn’t resemble modern deep learning.)
“Catch misalignment early...” This makes it sound like misalignment is something that AIs don’t have yet but might one day have, so we need to be vigilant and notice it when it appears. But instead isn’t misalignment something that all AIs have by default?
My current view is that power-seeking misalignment will probably cause existential catastrophe, that persuasion tools happen first and have a >20% chance of destroying our ability to solve that problem, and that there are various philosophical and societal problems that could (>20%) get us even if we solve power-seeking misalignment. Does this mean I agree or disagree with “our current picture of the risks is incomplete?”
FYI, the “this thread” link in your comment doesn’t work. Apparently it’s possible for a link to be simultaneously green and unclickable.

(The underlying HTML is <a href="http://I think that working on truthful LMs has a comparative advantage in worlds where: We have around 10-40 years until transformative AI Transformative AI is built using techniques that resemble modern deep learning There is a slow takeoff Alignment does not require vastly more theoretical insight (but may require some) Our current picture of the risks posed by transformative AI is incomplete">this thread</a>. I am also surprised that this results in clicking being a no-op, rather than a “functional” link that leads to your browser’s could-not-resolve-host page.)

Fixed, thanks!
“Catch misalignment early...”—This should have been “scary misalignment”, e.g. power-seeking misalignment, deliberate deception in order to achieve human approval, etc., which I don’t think we’ve seen clear signs of in current LMs. My thinking was that in fast takeoff scenarios, we’re less likely to spot this until it’s too late, and more generally that truthful LM work is less likely to “scale gracefully” to AGI. It’s interesting that you don’t share these intuitions.
Does this mean I agree or disagree with “our current picture of the risks is incomplete?”
As mentioned, this phrase should probably be replaced by “a significant portion of the total existential risk from AI comes from risks other than power-seeking misalignment”. There isn’t supposed to be a binary cutoff for “significant portion”; the claim is that the greater the risks other than power-seeking misalignment, the greater the comparative advantage of truthful LM work. This is because truthful LM work seems more useful for addressing risks from social problems such as AI persuasion (as well as other potential risks that haven’t been as clearly articulated yet, I think). Sorry that my original phrasing was so unclear.
Nothing to apologize for, it was reasonably clear, I’m just trying to learn more about what you believe and why. This has been helpful, thanks!
I totally agree that in fast takeoff scenarios we are less likely to spot those things until it’s too late. I guess I agree that truthful LM work is less likely to scale gracefully to AGI in fast takeoff scenarios… so I guess I agree with your overall point… I just notice I feel a bit confused and muddled about it, is all. I can imagine plausible slow-takeoff scenarios in which truthful LM work doesn’t scale gracefully, and plausible fast-takeoff scenarios in which it does. At least, I think I can.

The former scenario would be something like: It turns out the techniques we develop for making dumb AIs truthful stop working once the AIs get smart, for similar reasons that the techniques we use to make small children be honest (or, to put it more vividly, believe in Santa) stop working once they grow up.

The latter scenario would be something like: Actually that’s not the case, the techniques work all the way up past human-level intelligence, and “fast takeoff” in practice means “throttled takeoff”, where the leading AI project knows they have a few months’ lead over everyone else and is using those months to do some sort of iterated distillation and amplification, in which it’s crucial that the early stages be truthful and that the techniques scale to stage N overseeing stage N+1.
Thanks, these clarifications are very helpful.