It seems like evolution did not “try” to have humans aligned with status. Status might have been a proxy for inclusive genetic fitness, but if so, I would not say that evolution “succeeded” at aligning humans. My guess is that it’s not a great proxy for inclusive genetic fitness in the modern environment: it’s probably weakly correlated with reproductive success, but clearly not as strongly as the relative importance humans assign to it would suggest if it were a good proxy.
Of course, my guess is that, after the fact, for any system that has undergone some level of self-reflection and was put under selection that causes it to want coherent things, you will be able to identify some patterns in its goals. The difficult part of aligning AIs is being able to choose what those patterns are, not getting some patterns to cohere at the end of it. My guess is that with any AI system, if we survived and got to observe it as it made its way to coherence, we would be able to find some robust patterns in its goals (in the case of LLMs, probably something related to predicting text, but who knows), but that doesn’t give me much solace about the AI treating me well or sharing my goals.
A super relevant point. If we try to align our AIs with something, and they end up robustly aligned with some other proxy thing, we definitely didn’t succeed.
But, it’s still impressive to me that evolution hooked up general planning capabilities to a (learned) abstract concept, at all.
Like, there’s this abstract concept, which varies a lot in its particulars from environment to environment, and which the brain has to learn to detect apart from those particulars. Somehow the genome is able to construct the brain such that the motivation circuitry can pick out that abstract concept, after it is learned (or as it is being learned), and use it as a major criterion of the planning and decision machinery. And the end result is that the organism as a whole ends up not that far from an [abstract concept]-maximizer.
This is a lot more than I would expect evolution to be able to pull off, if I thought that our motivations were a hodge-podge of adaptations that cohere (as much as they do) into godshatter.
My point is NOT that evolution killed it and that alignment is therefore easy. My point is that evolution got a lot further than I would have guessed was possible.