Meta-level comment: I don’t think it’s good to dismiss the original arguments immediately and completely.
Object-level comment:
Neither of those claims has anything to do with humans being the “winners” of evolution.
I think it might be more complicated than that:
We need to define what “a model produced by a reward function” means, otherwise the claims are meaningless. For example, if you made just a single update to the model (based on the reward function), calling the result “a model produced by the reward function” would be meaningless, because no real optimization pressure was applied (see the toy sketch after this list). So we do need to define some goal of the optimization, which determines who’s a winner and who’s a loser.
We need to argue that the goal is sensible, i.e. somewhat similar to a goal we might use while training our AIs.
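To make the “optimization pressure” point concrete, here’s a minimal toy sketch (entirely my own example; the reward, learning rate, and step counts are arbitrary assumptions): after a single reward-based update the “model” (just one parameter) is still essentially its random initialization, while after many updates it is clearly shaped by the reward.

```python
import random

random.seed(0)

def reward(w):
    # Hypothetical reward: a single peak at w = 3.
    return -(w - 3.0) ** 2

def update(w, lr=0.01):
    # One crude gradient-ascent step on the reward (finite-difference gradient).
    grad = (reward(w + 1e-3) - reward(w - 1e-3)) / 2e-3
    return w + lr * grad

w0 = random.uniform(-10, 10)       # "random initialization"

w_one = update(w0)                 # a single reward-based update
w_many = w0
for _ in range(10_000):            # sustained optimization pressure
    w_many = update(w_many)

print(f"init: {w0:.3f}  one update: {w_one:.3f}  many updates: {w_many:.3f}")
# Typically: the one-update value is almost identical to the initialization,
# while the many-update value sits near the reward's optimum (3.0).
```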
Here are some things we can try (the toy simulation at the end of this comment makes the options below concrete):
We can try defining all currently living species as winners. But is that sensible? Is it similar to a goal we would use while training our AIs? “Let’s optimize our models for N timesteps and then use all surviving models, regardless of any other metric” ← I think that’s not sensible, especially if you use an algorithm which can introduce random mutations into the model.
We can try defining the species which avoided substantial changes for the longest time as winners. This seems somewhat sensible, because those species withstood the optimization pressure for the longest time. But then humans are not the winners.
We can define any species which gained general intelligence as a winner. Then humans are the only winners. This is sensible for two reasons. First, with general intelligence, deceptive alignment becomes possible: if humans knew that the Simulation Gods were optimizing organisms for some goal, humans could focus on that goal or kill all competing organisms. Second, many humans (in our reality) value creating AGI more than solving any particular problem.
I think the last definition is the strongest counter-argument to “humans are not the winners”.
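Here’s the promised toy simulation (my own construction; the population size, mutation rates, fitness function, and the “general intelligence” flag are all made-up assumptions), just to show how the three definitions of “winner” pick out different species from the same evolutionary run.

```python
import random

random.seed(0)

N_TIMESTEPS, POP_SIZE = 1_000, 50

def make_species(genome, t=0):
    return {"genome": genome, "last_change": t, "general_intelligence": False}

def fitness(genome):
    # Hypothetical goal the "Simulation Gods" optimize for.
    return -abs(genome - 10.0)

population = [make_species(random.uniform(-20, 20)) for _ in range(POP_SIZE)]

for t in range(1, N_TIMESTEPS + 1):
    for s in population:
        if random.random() < 0.05:        # random mutation, unrelated to the goal
            s["genome"] += random.gauss(0, 1)
            s["last_change"] = t
        if random.random() < 1e-4:        # rare side effect standing in for "general intelligence"
            s["general_intelligence"] = True
    # Selection: the fitter half survives, the rest are replaced by mutated offspring.
    population.sort(key=lambda s: fitness(s["genome"]), reverse=True)
    survivors = population[: POP_SIZE // 2]
    offspring = [make_species(s["genome"] + random.gauss(0, 0.5), t) for s in survivors]
    population = survivors + offspring

# Definition 1: every species alive at the end is a "winner", regardless of any metric.
winners_alive = population

# Definition 2: the "winner" is the species that went unchanged for the longest time.
winner_stable = min(population, key=lambda s: s["last_change"])

# Definition 3: only species that happened to gain "general intelligence" are "winners".
winners_agi = [s for s in population if s["general_intelligence"]]

print(len(winners_alive), winner_stable["last_change"], len(winners_agi))
```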