Totally agree with you here. I think probably half of their development energy was spent getting to where GPT-4 Functions already were by the time Functions came out, and they were probably like...oh...welp.
Just seeing this, sorry. I think they could have gotten a lot of the infrastructure going even before GPT-4, just in a sort of toy fashion, but I agree, most of the development probably happened after GPT-4 became available. I don’t think long context was as necessary, because my guess is the infrastructure set up behind the scenes was already parceling out subtasks to subagents and that probably circumvented the need for super-long context, though I’m sure having longer context definitely helps.
My guess is right now they’re probably trying to optimize which sort of subtasks go to which model by A/B testing. If Claude 3 Opus is as good as people say at coding, maybe they’re using that for actual coding task output? Maybe they’re using GPT-4T or Gemini 1.5 Pro for a central orchestration model? Who knows. I feel like there are lots of conceivable ways to string this kind of thing together, and there will be more and more coming out every week now...
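To make that speculation concrete, here's a minimal sketch of what A/B-tested subtask routing could look like (an epsilon-greedy bandit over candidate models; the model names and task types are purely hypothetical placeholders, not anything these companies have confirmed):

```python
import random
from collections import defaultdict

# Hypothetical candidates per task type; purely illustrative.
CANDIDATES = {
    "coding": ["claude-3-opus", "gpt-4-turbo"],
    "orchestration": ["gpt-4-turbo", "gemini-1.5-pro"],
}

class SubtaskRouter:
    """Epsilon-greedy A/B router: usually send a subtask to the
    best-scoring model so far, occasionally explore an alternative."""

    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        # (task_type, model) -> [wins, trials]
        self.stats = defaultdict(lambda: [0, 0])

    def pick(self, task_type):
        models = CANDIDATES[task_type]
        if random.random() < self.epsilon:
            return random.choice(models)  # explore
        # exploit: highest empirical success rate (ties -> first listed)
        return max(models, key=lambda m: self._rate(task_type, m))

    def record(self, task_type, model, success):
        wins, trials = self.stats[(task_type, model)]
        self.stats[(task_type, model)] = [wins + int(success), trials + 1]

    def _rate(self, task_type, model):
        wins, trials = self.stats[(task_type, model)]
        return wins / trials if trials else 0.0
```

With `epsilon=0` the router is deterministic: it always exploits whichever model has the best observed success rate for that task type, which is the basic loop you'd run an A/B test around.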
It took longer to get from AutoGPT to Devin than I initially thought it would, though in retrospect it only took “this long” because that’s literally about how long it takes to productize something comparatively new like this.
It does make me realize though that the baking timer has dinged and we’re about to see a lot more of this stuff coming out of the oven.
Agreed. You’ll bifurcate the mission and end up doing both things worse than you would have done if you’d just picked one and focused.
Your position seems to be one that says this is not something to be worried about/looking at. Can you explain why?
For instance, if it is a desire to train predictive systems to provide accurate information, how is 10% or even 1-2% label noise “fine” under those conditions (if, for example, we could somehow get that number down to 0%)?
Ah. Yeah, it’s been forever and a day since I used it as well. Bummer to hear they’ve succumbed to the swiping model!
Isn’t OkCupid still around? I was confused by your saying that it no longer exists. Did it change ownership or style or something?
Broken Benchmark: MMLU
This was a lot clearer, thank you.
You speak with such a confident, authoritative tone, but it is so hard to parse what your actual conclusions are.
You are refuting Paul’s core conclusion that there’s a “30% chance of TAI by 2033,” but your long refutation is met with: “wait, are you trying to say that you think 30% is too high or too low?” That’s a pretty clear sign you’re not communicating your position clearly.
Even your answer to his direct follow-up question, “Do you think 30% is too low or too high for July 2033?”, was hard to parse. You did not say something simple and easily understandable like, “I think 30% is too high for these reasons: …” Instead you said, “Once criticality is achieved the odds drop to 0 [+ more words].” The odds of what drop to zero? The odds of TAI? But you seem to be saying that once criticality is reached, TAI is inevitable. Even the rest of your long answer leaves in doubt where you’re really coming down on the premise.
By the way, I don’t think I would even be making this comment myself if A) I didn’t have such a hard time trying to understand what your conclusions were myself and B) you didn’t have such a confident, authoritative tone that seemed to present your ideas as if they were patently obvious.
Well summarized.
I would be interested to know if that’s true and they are updating on that information or if they’re going “But Grusch didn’t reveal any of that classified information TO ME, John Q. Public. So it’s not even worth thinking about at all!”
Would love to hear from the people who voted to disagree with you as to why they voted that way/what they specifically disagree with here.
Edit: 2 days later and no one wants to speak up? Seriously? Seems like some evidence for Lord Dreadwar’s points #1 and #2 above.
I think the silence from major news outlets could be explained for the same reason that there has been silence from the vast majority of the LessWrong community: stigmatization and the fear of looking like crackpots.
This is very, very poor reasoning. If your position is that an unexplained phenomenon + conspiracy is too wild, why would you use a different, far-less-supported unexplained phenomenon + conspiracy to dismiss it?
How does all of the recent official activity fit into your worldview here? Do you have your own speculations/explanations for why, e.g., Chuck Schumer would propose such specifically-worded legislation on this topic? Does that stuff just not factor into your worldview at all (or perhaps is weighted next to nothing against your own tweeted-about intuitions)?
My sense is that discussion of this incredibly stigmatized topic will not proliferate on LW until there is some “real evidence” (whatever that ends up being) released to discuss. Which is kind of a shame, since I totally agree with you that there is seemingly far too much official activity swirling around this topic for there to be no “there” there, regardless of what “there” is.
There’s been one high-profile betting thread on this: https://www.lesswrong.com/posts/t5W87hQF5gKyTofQB/ufo-betting-put-up-or-shut-up
When you see the color red, what is that like? When you run your hand over something rough and bumpy, what is that like? When you taste salt, what is that like?
I completely agree with your post in almost all senses, and this is coming from someone who has also worked out in the real world, with real problems, trying to collect and analyze real data (K-12 education, specifically; talk about a hard environment for data collection and analysis: the data is inherently very messy, and the analysis is very high stakes).
But this part
I think undersells the extent to which
A) the big companies have already started to understand that their data is everything and that collecting, tracking, and analyzing every piece of business data they have is the most strategic move they can make, regardless of AI
B) even current levels of AI will begin speeding up data integration efforts by orders of magnitude (automating the low-hanging fruit of data cleaning alone could save thousands of person-hours for a company)
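(To gesture at B concretely, here is a toy sketch of the kind of rote cleaning pass that is trivially automatable at scale; the field names and null tokens are hypothetical examples, not anyone's real schema.)

```python
# "Low-hanging fruit" data cleaning: trim whitespace, unify casing
# on categorical fields, map common null spellings to None.
# Field names ("state", "product_code") are hypothetical.
NULL_TOKENS = {"", "n/a", "na", "null", "none", "-"}

def clean_record(record):
    """Normalize one raw record dict in the ways described above."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
            if value.lower() in NULL_TOKENS:
                value = None
            elif key in {"state", "product_code"}:  # categorical fields
                value = value.upper()
        cleaned[key] = value
    return cleaned
```

Multiply a pass like this across every feed a large company ingests and the person-hours add up fast.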
Between those two things, I think it’s a few years at most before the conduits for sharing and analyzing this core business data are set up at scale. I work in the big tech software industry and know for a fact that this is already happening in a big way. And more and more, businesses of all sizes are getting used to the SaaS model, where you give a company access to specific (or all) parts of your business so they can provide a blanket service you know will help you. Think of all of the cloud security companies and how quickly that got stood up, or all the new POS platforms. I think those are more correct analogies than the massive hardware scaling that had to happen during the microchip and then PC booms. (Of course, there’s datacenter scaling that must happen, but that’s a manifestly different, more centralized concern.)
TL;DR: I think you offer a lot of valuable insights about how organizations actually work with data under the current paradigms. But I don’t think this data integration dynamic will slow down takeoff as much as you imply.