Under the anthropic principle, we should expect there to be a ‘consistent underlying reason’ for our continued survival.
Why? It sounds like you’re anthropic updating on the fact that we’ll exist in the future, which of course wouldn’t make sense because we’re not yet sure of that. So what am I missing?
It sounds like you’re anthropic updating on the fact that we’ll exist in the future
The quote you replied to was meant to be about the past.[1]
(paragraph retracted due to unclarity)
Specifically, I think that (“we find a fully-general agent-alignment solution right as takeoff is very near” given “early AGIs take a form that was unexpected”) is less probable than (“observing early AGIs causes us to form new insights that lead to a different class of solution” given “early AGIs take a form that was unexpected”). Because I think that, and because I think we’re at the point where takeoff is near, it seems like some evidence for being on that second path.
This should only constitute an anthropic update to the extent you think more-agentic architectures would have already killed us
I do think that’s possible (I don’t have a good enough model to put a probability on it though). I suspect that superintelligence is possible to create with much less compute than is being used for SOTA LLMs. Here’s a thread with some general arguments for this.
Of course, you could claim that our understanding of the past is not perfect, and thus should still update
I think my understanding of why we’ve survived so far re: AI is far from perfect. For example, I don’t know what would have needed to happen for training setups which would have produced agentic superintelligence by now to be found first, or (framed inversely) how lucky we needed to be to survive this far.
~~~
I’m not sure if this reply will address the disagreement, or if it will still seem from your pov that I’m making some logical mistake. I’m not actually fully sure what the disagreement is. You’re welcome to try to help me understand if one remains.
I’m sorry if any part of this response is confusing; I’m still learning to write clearly.

[1] I originally thought you were asking why it’s true of the past, but then I realized we very probably agreed (in principle) in that case.
Everything makes sense except your second paragraph. Conditional on us solving alignment, I agree it’s more likely that we live in an “easy-by-default” world, rather than a “hard-by-default” one in which we got lucky or played very well. But we shouldn’t condition on solving alignment, because we haven’t yet.
Thus, in our current situation, the only way anthropics pushes us towards “we should work more on non-agentic systems” is if you believe “worlds where we still exist are more likely to have easy alignment-through-non-agentic-AIs”. Which you do believe, and I don’t, mostly because I think that in almost no worlds have we been killed by misalignment at this point. Or, put another way, the developments in non-agentic AI we’re facing are still one regime change away from the dynamics that could kill us (and information in the current regime doesn’t extrapolate much to the next one).
Conditional on us solving alignment, I agree it’s more likely that we live in an “easy-by-default” world, rather than a “hard-by-default” one in which we got lucky or played very well.
(edit: summary: I don’t agree with this quote because I think logical beliefs shouldn’t update upon observing continued survival because there is nothing else we can observe. It is not my position that we should assume alignment is easy because we’ll die if it’s not)
I think that language in discussions of anthropics is unintentionally prone to masking ambiguities or conflations, especially wrt logical vs indexical probability, so I want to be very careful writing about this. I think there may be some conceptual conflation happening here, but I’m not sure how to word it. I’ll see if it becomes clear indirectly.
One difference between our intuitions may be that I’m implicitly thinking within a many-worlds frame. Within that frame, it’s actually certain that we’ll solve alignment in some branches.
So if we then ‘condition on solving alignment in the future’, my mind defaults to something like this: “this is not much of an update; it just means we’re in a future where the past was not a death outcome. Some of the pasts leading up to those futures had really difficult solutions, and some of them managed to find easier ones or get lucky. The probabilities of these non-death outcomes relative to each other have not changed as a result of this conditioning.” (I.e., I disagree with the top quote.)
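The invariance claimed here (conditioning on survival leaves the relative probabilities of the surviving outcomes unchanged) is just a property of conditional probability. A toy check, with made-up outcome labels and numbers:

```python
# Toy check: conditioning on "not death" preserves the odds between the
# surviving outcomes. All labels and numbers are invented for illustration.
outcomes = {
    "death": 0.90,
    "survive_via_hard_solution": 0.07,
    "survive_via_easy_or_lucky": 0.03,
}

p_survive = sum(p for name, p in outcomes.items() if name != "death")

# Probabilities conditional on survival.
cond = {name: p / p_survive for name, p in outcomes.items() if name != "death"}

ratio_before = outcomes["survive_via_hard_solution"] / outcomes["survive_via_easy_or_lucky"]
ratio_after = cond["survive_via_hard_solution"] / cond["survive_via_easy_or_lucky"]
print(ratio_before, ratio_after)  # both ≈ 2.33 (= 7/3): the ratio is unchanged
```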
The most probable reason I can see for this difference is that you’re thinking in terms of a single future, where you expect to die.[1] In this frame, if you observe yourself surviving, it may seem[2] that you should update your logical belief that alignment is hard (because P(continued observation | alignment is hard) is low if we imagine a single future, but certain if we imagine the space of indexically possible futures).
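To make the contrast concrete, here is the naive single-future Bayes update the parenthetical describes, with all probabilities invented for illustration:

```python
# Naive single-future update on observing continued survival.
# All probabilities here are invented for illustration.
prior_hard = 0.5                # P(alignment is hard)
p_survive_given_hard = 0.1      # low, per the reasoning above
p_survive_given_easy = 0.9

p_survive = prior_hard * p_survive_given_hard + (1 - prior_hard) * p_survive_given_easy
posterior_hard = prior_hard * p_survive_given_hard / p_survive
print(posterior_hard)  # 0.1: survival pushes strongly toward "easy"

# In the many-worlds frame, survival is observed with certainty under both
# hypotheses, so the likelihood ratio is 1 and the posterior stays at the
# prior (0.5): no logical update.
```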
Whereas I read it as only indexical, and am generally thinking about this in terms of indexical probabilities.
I totally agree that we shouldn’t update our logical beliefs in this way. I.e., with regard to beliefs about logical probabilities (such as ‘alignment is very hard for humans’), we “shouldn’t condition on solving alignment, because we haven’t yet.” I.e., we shouldn’t condition on the future not being mostly death outcomes when we haven’t averted them and have reason to think they are.
Maybe this helps clarify my position?
On another point:
the developments in non-agentic AI we’re facing are still one regime change away from the dynamics that could kill us
I agree with this, and I still find the current systems’ lack of goals over the world surprising, and worth trying to preserve as a trait of superintelligent systems.
[1] (I’m not disagreeing with this being the most common outcome.)

[2] Though after reflecting on it more, I think (with low confidence) that this is wrong, and that one’s logical probabilities shouldn’t change after surviving in a ‘one-world frame’ universe either.
For an intuition pump: consider the case where you’ve crafted a device which, when activated, leverages quantum randomness to kill you with probability (n-1)/n, where n is some arbitrarily large number. Given you’ve crafted it correctly, you make no logical update in the many-worlds frame, because survival is the only thing you will observe; you expect to observe the 1/n branch.
In the ‘single world’ frame, continued survival isn’t guaranteed, but it’s still the only thing you could possibly observe, so it intuitively feels like the same reasoning applies...?
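One way to see why the two frames pull in different directions is to write down the likelihood of the observation “I survived” under each. A toy formalization (the hypotheses and the value of n are assumptions for illustration):

```python
# Toy likelihoods for the quantum-device thought experiment.
n = 10**6  # arbitrary large number; the device is meant to kill with probability (n-1)/n

# Two logical hypotheses about the device:
#   "works": it was crafted correctly (kill probability (n-1)/n)
#   "dud":   it never fires (kill probability 0)

# Single-world frame: survival is just one possible observation.
single_world = {"works": 1 / n, "dud": 1.0}

# Many-worlds frame: observations only ever occur in surviving branches,
# so survival is observed with certainty under either hypothesis.
many_worlds = {"works": 1.0, "dud": 1.0}

bf_single = single_world["works"] / single_world["dud"]  # 1e-06: strong update toward "dud"
bf_many = many_worlds["works"] / many_worlds["dud"]      # 1.0: no logical update
print(bf_single, bf_many)
```

The Bayes factors make the pump’s puzzle explicit: the many-worlds bookkeeping yields no update, while the naive single-world bookkeeping yields a large one.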
Yes, but: this update is screened off by “you actually looking at the past and checking whether we got lucky many times or there is a consistent reason”. Of course, you could claim that our understanding of the past is not perfect, and thus should still update, only less so. Although, to be honest, I think there’s a strong case for the past clearly showing that we just got lucky a few times.
It sounded like you were saying the consistent reason is “our architectures are non-agentic”. This should only constitute an anthropic update to the extent you think more-agentic architectures would have already killed us (instead of killing us in the next decade). I’m not of this opinion. And if I was, I’d need to take into account factors like “how much faster I’d have expected capabilities to advance”, etc.
(I think I misinterpreted your question and started drafting another response, will reply to relevant portions of this reply there)