Occasionally think about topics discussed here. Will post if I have any thoughts worth sharing.
Tomás B.
Curious for an update now that we have slightly better models. In my brain-dead webdev use-cases, Claude 3.5 has passed some threshold of usability.
The most important thing this article did was make legible Gerard’s history on Uncyclopedia—which one of his allies will inevitably use to destroy him.
I think about anticipated future experiences. All future slices of me have the same claim to myself.
I’m not convinced you can get any utility from measure-reducing actions unless you can parlay the advantage they give you into making more copies of yourself in the branch in which you survive. I am not happy about the situation, but it seems I will be forced to endure whatever comes and there will never, ever be any escape.
Significant evidence for data contamination of MATH benchmark: https://arxiv.org/abs/2402.19450
Rumours are GPT-5 has been finished for a while.
The trans IQ connection is entirely explained by women’s clothing being less itchy.
I was in the middle of writing a frustrated reply to Matthew’s comment when I realized he isn’t making very strong claims. I don’t think he’s claiming your scenario is not possible. Just that not all power seeking is socially destructive, and this is true just because most power seeking is only partially effective. Presumably he agrees that in the limit of perfect power acquisition most power seeking would indeed be socially destructive.
Every toaster a Mozart, every microwave a Newton, every waffle iron a Picasso.
Was this all done through Suno? You guys are much better at prompting it than I am.
The bet is with a friend and I will let him judge.
I agree that providing an API to God is a completely mad strategy and we should probably expect less legibility going forward. Still, we have no shortage of ridiculously smart people acting completely mad.
This seems to be as good of a place as any to post my unjustified predictions on this topic, the second of which I have a bet outstanding on at even odds.
1. Devin will turn out to be just a bunch of GPT-3.5/4 calls and a pile of prompts/heuristics/scaffolding so disgusting and unprincipled only a team of geniuses could have created it.
2. Someone will create an agent that gets 80%+ on SWE-Bench within six months.
I am not sure if 1. being true or false is good news. Both suggest we should update towards large jumps in coding ability very soon.
Regarding RSI, my intuition has always been that automating AI research will likely be easier than automating the development and maintenance of a large app like, say, Photoshop, so I don’t expect fire alarms like “non-gimmicky top 10 app on AppStore was developed entirely autonomously” before doom.
After spending several hours trying to get Gemini, GPT-4 and Claude 3 to make original jokes—I now think I may be wrong about this. Still could be RLHF, but it does seem like an intelligence issue. @janus are the base models capable of making original jokes?
Looks to me like he’s training on the test set tbh. His ambition to get an IQ of 195 is admirable though.
I very much doubt this will work. I am also annoyed you don’t share your methods. If you can provide me with a procedure that raises my IQ by 20 points in a manner that convinces me this is a real increase in g, I will give you one hundred thousand dollars.
@Veedrac suppose this pans out and custom hardware is made for such networks. How much faster/larger/cheaper will this be?
This is applied to training. It’s not a quantization method.
Hsu on China’s huge human capital advantage:
Returning to Summers’ calculation, and boldly extrapolating the normal distribution to the far tail (not necessarily reliable, but let’s follow Larry a bit further), the fraction of NE Asians at +4SD (relative to the OECD avg) is about 1 in 4k, whereas the fraction of Europeans at +4SD is 1 in 33k. So the relative representation is about 8 to 1. (This assumed the same SD=90 for both populations. The Finnish numbers might be similar, although it depends crucially on whether you use the smaller SD=80.) Are these results plausible? Have a look at the pictures here of the last dozen or so US Mathematical Olympiad teams (the US Asian population percentage is about 3 percent; the most recent team seems to be about half Asians). The IMO results from 2007 are here. Of the top 15 countries, half are East Asian (including tiny Hong Kong, which outperformed Germany, India and the UK).
Incidentally, again assuming a normal distribution, there are only about 10k people in the US who perform at +4SD (and a similar number in Europe), so this is quite a select population (roughly, the top few hundred high school seniors each year in the US). If you extrapolate the NE Asian numbers to the 1.3 billion population of China you get something like 300k individuals at this level, which is pretty overwhelming.
If you think AGI will not come in the next 5-10 years then I think there is a very good chance it comes from China. Also, China certainly has the engineering talent to surpass the rest of the world in basically everything, including fabs. Personally, I expect AGI in the next few years though.
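For anyone who wants to sanity-check the arithmetic in the Hsu quote above, here is a rough sketch. It only reuses the figures stated in the quote (1 in 4k, 1 in 33k, 1.3 billion); the US population value is my own round-number assumption and is not in the quote, and the names are just illustrative.

```python
from statistics import NormalDist

# Figures taken directly from the quoted passage.
NE_ASIAN_ONE_IN = 4_000     # "about 1 in 4k" at +4SD relative to the OECD average
EUROPEAN_ONE_IN = 33_000    # "1 in 33k" at +4SD
CHINA_POPULATION = 1.3e9    # "1.3 billion"

# Assumption (NOT in the quote): a round figure for the US population.
US_POPULATION = 300e6


def upper_tail(z: float) -> float:
    """Fraction of a standard normal population scoring above z SDs."""
    return 1.0 - NormalDist().cdf(z)


def headcount(population: float, one_in: float) -> float:
    """Expected number of people above the cutoff, given a 1-in-N rate."""
    return population / one_in


tail_4sd = upper_tail(4.0)                             # exact normal tail at +4SD
ratio = EUROPEAN_ONE_IN / NE_ASIAN_ONE_IN              # relative representation
china = headcount(CHINA_POPULATION, NE_ASIAN_ONE_IN)   # extrapolated to China
us = headcount(US_POPULATION, EUROPEAN_ONE_IN)         # US, under the assumption above

print(f"normal tail at +4SD: 1 in {1 / tail_4sd:,.0f}")
print(f"relative representation: {ratio:.2f} to 1")
print(f"China at +4SD: {china:,.0f}")
print(f"US at +4SD (assumed population): {us:,.0f}")
```

The ratio comes out at 8.25 to 1 and the China figure at 325,000, which matches the "about 8 to 1" and "something like 300k" in the quote. The exact normal tail at +4SD is a bit under 1 in 32k, so the quoted 1 in 33k is in the right neighbourhood. None of this says anything about whether extrapolating a normal distribution that far into the tail is reliable, which the quote itself flags.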
Not looking good for my prediction: https://www.swebench.com/