Expanding a bit on why: I think this will fail because the house-building AI won’t actually be very good at instrumental reasoning, so there’s nothing for the sticky goals hypothesis to make use of.
To be clear, I basically agree with everything in the comment chain above. Nevertheless, I would argue that these sorts of experiments are worth running, for the sorts of reasons that I outline here.