I haven’t looked deeply at what the % on the ML benchmarks actually mean. On the one hand, it would be a bit weird to me if by 2030 we still had not made enough progress on them, given the current rate. On the other hand, I trust the authors that passing those benchmarks should be AGI-ish, and I don’t want to bet money on something far into the future if money might not matter as much by then. (Also, setting aside money mattering less or the chance that the money is not delivered in 2030 etc., I think anyone taking the 2026 bet should take the 2030 bet, since if you’re 50:50 for 2026 you’re probably 75:25 with 4 extra years.)
The more rational thing would then be to take the bet for 2026, when money still matters. But apart from the ML benchmarks there is this dishwashing thing, where the conditions of the bet are super tough, and I don’t imagine anyone doing all the reliability tests, filming a dishwasher etc., in 3.5 years. And for Tesla I feel the same about those big errors every 100k miles. Like, 1) why only Tesla? 2) wouldn’t most humans make risky blunders over such long distances? 3) would anyone really do all those tests on a Tesla?
I’ll have another look at the ML benchmarks, but in the meantime it seems that we should use different odds because of Tesla + dishwasher.