In a real environment, the goal-directed agent may get copied over and over while the twitcher won't. This is what evolution did, and humans seeking utility will likewise copy the agents that appear to do what they want.
And I think this suggests a solution for AI alignment.
Suppose you have a simulation that produces generation n+1 of your agents. If the simulation tests each agent's utility within tight bounds, and the competing agents all run on comparable, bounded computer architectures (fixed hardware, similar software backend), the selection process will tend to pick agents that devote all of their computational nodes to solving the problem efficiently.
That is, there won't be spare nodes left over for things like a self, an ego, or internal goals of their own. You want agents that don't have the capacity for any of that.
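To make the idea concrete, here is a minimal sketch of that kind of selection loop, not the author's actual method: every agent gets the same hard compute budget, fitness scores only task utility, and the top performers are copied (with small variation) into the next generation. All names here (Agent, evaluate_utility, COMPUTE_BUDGET, the toy task) are illustrative assumptions.

```python
# Minimal sketch: generational selection under a fixed compute budget,
# where fitness rewards nothing but task performance.
import random

COMPUTE_BUDGET = 1_000      # hard cap on "nodes" (parameters) per agent -- assumed bound
POPULATION_SIZE = 50
GENERATIONS = 20

class Agent:
    def __init__(self, weights):
        # Fixed-size genome: no room to grow machinery beyond the budget.
        assert len(weights) == COMPUTE_BUDGET
        self.weights = weights

    def act(self, observation):
        # Stand-in policy: a bounded computation over the observation.
        return sum(w * observation for w in self.weights) / COMPUTE_BUDGET

def evaluate_utility(agent, trials=10):
    # Tightly bounded test: score only how well the agent does the task.
    score = 0.0
    for _ in range(trials):
        observation = random.uniform(-1.0, 1.0)
        target = observation * 0.5          # toy task: track half the input
        score -= abs(agent.act(observation) - target)
    return score

def mutate(agent, rate=0.01):
    # Copying with small variation -- "copy the agents that appear to work".
    return Agent([w + random.gauss(0.0, rate) for w in agent.weights])

population = [Agent([random.gauss(0.0, 0.1) for _ in range(COMPUTE_BUDGET)])
              for _ in range(POPULATION_SIZE)]

for generation in range(GENERATIONS):
    ranked = sorted(population, key=evaluate_utility, reverse=True)
    survivors = ranked[: POPULATION_SIZE // 5]           # keep the top 20%
    population = [mutate(random.choice(survivors)) for _ in range(POPULATION_SIZE)]
```

The point of the fixed COMPUTE_BUDGET and the utility-only score is that any capacity an agent spends on anything other than the task is pure fitness cost, so selection pressure squeezes it out.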