I’m not well-versed enough to offer something that would qualify as a proof, but intuitively I would say “All problems with making a tiling bot robust are also found in aligning something with human values, but aligning something with human values comes with a host of additional problems, each of which takes additional effort”. We can write a tiling bot for a grid world, but we can’t write an entity that follows human values in a grid world. Tiling bots don’t need to be complicated or clever, and they might not even have to qualify as AGI; they just have to be capable of taking over the world.
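To make that asymmetry concrete, here is a minimal sketch (my own illustration, not something from the original argument) of a grid-world tiling bot in Python. Every name and detail is hypothetical; the point is only that the tiling objective, and a policy that pursues it, fit in a few dozen lines, while there is no comparably short specification of “follows human values”.

```python
# Hypothetical illustration: a tiling bot whose entire "value system" is one line.
import numpy as np

EMPTY, TILE = 0, 1

def tiling_utility(grid: np.ndarray) -> int:
    """The tiling bot's whole objective: count how many cells are tiled."""
    return int((grid == TILE).sum())

def step(grid: np.ndarray, pos: tuple[int, int]) -> tuple[int, int]:
    """Move one cell toward the nearest untiled square and tile wherever we land."""
    empties = np.argwhere(grid == EMPTY)
    if len(empties) == 0:
        return pos
    r, c = pos
    # Pick the nearest empty cell by Manhattan distance.
    tr, tc = min((tuple(e) for e in empties),
                 key=lambda e: abs(e[0] - r) + abs(e[1] - c))
    if tr != r:
        r += 1 if tr > r else -1
    elif tc != c:
        c += 1 if tc > c else -1
    grid[r, c] = TILE
    return (r, c)

# Run the bot: a 4x4 world is fully "tiled" in a handful of steps.
grid = np.zeros((4, 4), dtype=int)
pos = (0, 0)
grid[pos] = TILE
while tiling_utility(grid) < grid.size:
    pos = step(grid, pos)
print(grid)
```

Nothing in this sketch is clever, and it obviously falls far short of AGI; the one-line utility function is the entire “alignment problem” for a tiling bot, which is exactly the contrast with human values that the argument turns on.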
All of that said, I strongly encourage the greatest possible caution with this post. Creating a “neutral” AGI is still a very evil act, even if it is the act with the highest expected utility.
Q5 of Yudkowsky’s post reads like an expert opinion that this sort of caution isn’t productive. What I present here seems like a natural result of combining awareness of s-risk with the low probability of good futures that Yudkowsky asserts, so I don’t think security through obscurity offers much protection. In the likely event that the evil thing is bad, it seems best to discuss it openly, so that the error can be made plain to everyone and people don’t get stuck believing it is the right thing to do, or worrying that others believe it is. In the unlikely event that it is good, I don’t want to waste time personally gathering enough evidence to act on it confidently when others might already have more evidence readily available.