A number of the comments have already pointed toward my concern with what I take to be the underlying assumption of this post: that it is possible at all to build AI, general or narrow, even within a small domain, without it being touched by humans enough that implicit, partial modeling of humans happens anyway. I do acknowledge that much current AI safety work is extremely human-centric, going so far as to rely on uniquely human capabilities (at least unique among known things), and that this is itself a problem for many of the reasons you lay out, but I think it would be a mistake to believe we can somehow get away from humans in building AGI.
The reality is that humans are involved in the work of building AGI: in the design and construction of the hardware it will run on, in the data sets it will use, and so on. Even if we think we have removed the latent human-shaped patterns from our algorithms, hardware, and data, we should strongly suspect we are mistaken, because humans are tremendously bad at noticing when something they assume is true of the world is actually only true of their understanding of it. That is, I would expect it to be more likely that humans fail to notice their latent presence in a "human-model-free" AI than that the AI is actually free of human modeling.
Thus working toward AGI without human models risks failure because we never dealt with the AGI picking up on the latent patterns of humanity within it. This is not to say we should stick to a human-centric approach, which has the many problems you've described; rather, trying to avoid humans means neglecting to make our systems robust to the kinds of human interference that can push us away from the goal of safe AI, especially interference that is unexpected and unplanned for because the human influence is hidden. If we instead build expecting to deal with, and be robust to, the influence of humans, we stand a much better chance of producing safe AI than by being either human-centric or overly dismissive of humans.