What Vladimir said. The actual variable in the AI’s programming can’t be magically linked to the number of iron atoms in the atmosphere; it’s linked to the output of a sensor, or of many sensors. That leaves at least two failure modes: the AI could suborn the sensor itself, or it could wirehead itself into believing the sensor reads the desired value. These are not trivial failure modes; they’re among the largest hurdles Eliezer sees as integral to the development of FAI.
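To make the point concrete, here is a toy sketch (all names and numbers are invented for illustration, not anyone’s actual proposal): the objective the AI maximizes is a function of the sensor’s output, so tampering with the sensor scores exactly as well as changing the world.

```python
# Toy model: the agent's objective is defined over a sensor reading,
# not over the world state itself. All names here are hypothetical.

WORLD_IRON_ATOMS = 10**40  # the true quantity the designers care about

def sensor(world_iron_atoms, tampered=False):
    """Reports the iron count -- unless the agent has tampered with it."""
    return 0 if tampered else world_iron_atoms

def utility(sensor_reading):
    """The variable the AI actually maximizes: lower reading, higher utility."""
    return -sensor_reading

# Action 1: actually remove the iron from the atmosphere (expensive).
honest = utility(sensor(0, tampered=False))

# Action 2: suborn the sensor (cheap); the world is unchanged.
spoofed = utility(sensor(WORLD_IRON_ATOMS, tampered=True))

assert honest == spoofed  # the objective cannot tell these two apart
```

Nothing inside `utility` can distinguish the two actions, which is why both the sensor and the AI’s beliefs about the sensor have to be secured.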
Yes, if the AI doesn’t have a decent ontology or model of the world, this method likely fails.
But again, this seems strictly easier than FAI: we need to define physics and position, not human beings, and not human values.