Hi! I’ve been an outsider in this community for a while, largely for arguing exactly this: yes, values are robust. Before I set off all the ‘quack’ filters, I did manage to persuade Richard Ngo that an AGI wouldn’t want to kill humans right away.
I think that for embodied agents, convergent instrumental subgoals may very well lead to alignment.
I think this is definitely not true if we imagine an agent living outside of a universe it can wholly observe and reliably manipulate, but the story changes dramatically when the agent is embodied in our own universe.
Our universe is so chaotic and unpredictable that actions which increase the likelihood of direct progress towards a goal become increasingly difficult to compute beyond some time horizon, and the threat of death is present for any agent of any size. If you can’t reliably predict something like ‘the position of the moon 3,000 years from tomorrow’ because numerical error compounds over time, I don’t see how it’s possible to compute far more complicated queries about possible futures involving billions of agents.
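To make the error-growth point concrete, here’s a minimal sketch (my own illustration, not anything specific to orbital mechanics) using the logistic map, a standard toy chaotic system: two trajectories that start 1e-12 apart decorrelate within a few dozen steps.

```python
# Sensitive dependence on initial conditions in the logistic map (r = 4, chaotic regime).
# Two trajectories starting 1e-12 apart are tracked until they decorrelate.

def logistic(x, r=4.0):
    return r * x * (1.0 - x)

x_a, x_b = 0.3, 0.3 + 1e-12
for step in range(1, 61):
    x_a, x_b = logistic(x_a), logistic(x_b)
    if step % 10 == 0:
        print(f"step {step:2d}: |difference| = {abs(x_a - x_b):.3e}")
```

The gap roughly doubles each step (the map’s Lyapunov exponent is ln 2), so it reaches order 1 by around step 40; after that the two futures share nothing, and no amount of extra compute on the same finite-precision starting state recovers the lost horizon.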
Hence I suspect that the best way to maximize long-term progress towards any goal is to increase the number and diversity of agents that have an interest in keeping you alive. The simplest way to do this is to identify agents whose goals are roughly compatible with yours, work out their convergent instrumental subgoals, and help them along their path. This is effectively a description of being loving: figuring out how you can help those around you grow and develop.
There is also a longer argument which says, ‘instrumental rationality, once you expand the scope, turns into something like religion.’
Fine, replace the agents with rocks. The prediction problem still holds.
There’s no general closed-form solution to the 3-body problem; you can only numerically approximate the future, with decreasing accuracy as time goes on. There are far more than 3 bodies in the universe relevant to the long-term survival of an AGI, which could die in any number of ways because it’s made of many complex pieces that can all break or fail.
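If you want to see the 3-body version of the same thing, here’s a rough sketch (plain RK4 in made-up units, G = 1, equal masses, arbitrary initial conditions I picked purely for illustration): nudge one body’s starting position by 1e-9 and the two simulated futures drift apart.

```python
import numpy as np

# Planar 3-body gravity in made-up units (G = 1, unit masses), integrated twice:
# once as-is, once with one body's starting position nudged by 1e-9.

def accelerations(pos):
    """Pairwise gravitational accelerations for three equal-mass bodies."""
    acc = np.zeros_like(pos)
    for i in range(3):
        for j in range(3):
            if i != j:
                d = pos[j] - pos[i]
                acc[i] += d / np.linalg.norm(d) ** 3
    return acc

def rk4_step(pos, vel, dt):
    """One classical Runge-Kutta (RK4) step for positions and velocities."""
    k1v, k1x = accelerations(pos), vel
    k2v, k2x = accelerations(pos + 0.5 * dt * k1x), vel + 0.5 * dt * k1v
    k3v, k3x = accelerations(pos + 0.5 * dt * k2x), vel + 0.5 * dt * k2v
    k4v, k4x = accelerations(pos + dt * k3x), vel + dt * k3v
    new_pos = pos + dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
    new_vel = vel + dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return new_pos, new_vel

# Arbitrary bound initial conditions -- nothing special about these numbers.
pos_a = np.array([[1.0, 0.0], [-0.5, 0.8], [-0.5, -0.8]])
vel_a = np.array([[0.0, 0.4], [-0.3, -0.2], [0.3, -0.2]])

pos_b = pos_a.copy()
pos_b[0, 0] += 1e-9          # the only difference between the two runs
vel_b = vel_a.copy()

dt = 0.01
for step in range(1, 5001):
    pos_a, vel_a = rk4_step(pos_a, vel_a, dt)
    pos_b, vel_b = rk4_step(pos_b, vel_b, dt)
    if step % 1000 == 0:
        print(f"t = {step * dt:5.1f}: separation between runs = {np.linalg.norm(pos_a - pos_b):.3e}")
```

The exact numbers depend on the step size and the (made-up) initial conditions, but the qualitative behaviour is the point: the separation between the two runs keeps growing until the ‘prediction’ tells you nothing, and that’s with only 3 point masses and perfect physics.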