sample five random users’ forecasts, score them, and then average
Are you sure this is how their bot works? I read this more as “sample five things from the LLM, and average those predictions”. For Metaculus, the crowd is just given to you, right, so it seems crazy to sample users?
Yeah, I also think you misunderstood that part of the paper (though it really is very ambiguously written). My best guess is they are saying that they are averaging their performance metrics over 5 forecasts.
Thanks for the post!
Thanks, Neel. We agree that we misinterpreted this, and we've removed the claim.
Thanks for making the correction!