any more than “sorting pebbles into prime heaps” means “doing whatever pebblesorters care about”
How specifically are these two things different? I can imagine some differences, but I am not sure which one you meant.
For example, if you meant that sorting pebbles is what they do, but it's not their terminal value and certainly not their only value (just as humans build houses, but building houses is not our terminal value), then you are fighting the hypothetical.
If you meant that in a different universe pebblesorter-equivalents would evolve differently and wouldn't care about sorting pebbles into prime heaps, then those pebblesorter-equivalents wouldn't be pebblesorters. Analogously, there could be human-equivalents in a parallel universe with inhuman values; but they wouldn't be humans.
Or perhaps you meant the difference between extrapolated values and "what now feels like a reasonable heuristic". Or...
What I meant is that “prime heaps” are not about pebblesorters. There are exactly zero pebblesorters in the definitions of “prime”, “pebble” and “heap”.
If I told you to sort pebbles into prime heaps, the first thing you'd do is calculate some prime numbers. If I told you to do whatever pebblesorters care about, the first thing you'd do is find one and interrogate it to find out what it values.
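To make the asymmetry concrete, here is a minimal Python sketch (all names are hypothetical, just for illustration): the prime-heap check reduces to pure arithmetic and never mentions pebblesorters, while the "do whatever pebblesorters care about" version cannot even be written down without access to some actual pebblesorters to interrogate.

```python
def is_prime(n: int) -> bool:
    """Primality test -- no reference to pebblesorters anywhere."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def is_correct_heap(pebble_count: int) -> bool:
    # A "prime heap" is defined purely by the number of pebbles in it.
    return is_prime(pebble_count)

def do_what_pebblesorters_care_about(pebblesorters):
    # Here there is no way forward without consulting the creatures themselves.
    # (report_values() is a hypothetical interface, standing in for interrogation.)
    values = [p.report_values() for p in pebblesorters]
    ...
```

The first two functions are complete as written; the third is necessarily a stub until you have found pebblesorters to ask.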
If I gave you the source code of a Friendly AI, all you'd have to do would be to run the code.
If I told you to do whatever humanity's CEV is, you'd have to find and interrogate some humans.
The difference is that by analysing the code of the Friendly AI you could probably learn some facts about humans, while by learning about prime numbers you learn nothing about the pebblesorters. But that's a consequence of humans caring about humans, and pebblesorters not caring about pebblesorters. Our values are more complex than prime numbers and include caring about ourselves… which is likely to happen to a species created by evolution.