I think you’ve already given several examples:
Should I count the people I spoke to for 15 minutes for free at the imbue potlucks? That was year-changing for at least one of them. But if I count them, I have to count everyone I've ever helped for free, even those who were uninvested. Then people will respond, “Ok, ok, how many bounties have you taken on?” Sure, but should I include the people I told, “Your case is not my specialty, I don't know if I'll be able to help, but I'm interested in trying for a few hours if you're into it”? Should I include the people who had an amazing session or two but haven't communicated in two months? Should I include the people who are being really unagentic and slow?
It would already be informative to put numbers on each of these questions (e.g. “how often does talking for 15 minutes accomplish something?”, “how many bounties have you taken on inside/outside your specialty?”, “what percent of your clients are ‘unagentic and slow’ (and what does that actually mean)?”). One could probably do much better by generating the several metrics one would expect to be most useful (or top-N-percentile useful) and sharing each of them.
I’m somewhat surprised to see the distribution of predictions for 75% on FrontierMath. Does anyone want to bet money on this at, say, 2:1 odds (my two dollars that this won’t happen against your one that it will)?
(Edit: I guess the wording doesn’t exclude something like AlphaProof, which I wasn’t considering. I might bet at 1:1 odds if systems targeted at math are included, as opposed to only general-purpose models.)