Is anyone thinking about how to scale up human feedback collection by several orders of magnitude? A lot of alignment proposals aren't focused on the social choice theory questions, which I'm okay with, but I'm worried that there may be large constant factors in the scalability of human feedback strategies like amplification/debate, such that there could be big differences between collecting 50k trajectories versus, say, 50-500M. Obviously cost/logistics are a giant bottleneck here, but I'm wondering what the other big challenges might be (especially ones we could make intellectual progress on before we actually need to).