You can directly write/paste your own lyrics (Custom Mode). And v3, which came out fairly recently, is a general improvement, in case you haven’t tried it in a while.
They seem to be created with https://app.suno.ai/. And yes, it is really easy to create songs: you can either have it generate the lyrics for you based on a prompt (the default), or write/paste the lyrics yourself (Custom Mode). Songs can be up to ~2 minutes long, I think.
Yeah, this seems to be a big part of it. If you instead switch it to the probability at the market midpoint, Manifold is basically perfectly calibrated, and Kalshi is, if anything, overconfident (Metaculus still looks underconfident overall).
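For concreteness, here’s a minimal sketch of the kind of calibration check involved (a sketch under my own assumptions, not any platform’s actual API or methodology; names are hypothetical): take each market’s stated probability at the midpoint of its lifetime, bucket markets by that probability, and compare each bucket’s average probability to the fraction of its markets that resolved YES.

```python
from collections import defaultdict

def calibration_table(forecasts, n_buckets=10):
    """Bucket markets by stated probability and compare each bucket's
    average probability to the fraction that resolved YES.

    forecasts: list of (probability, resolved_yes) pairs, where
    probability is sampled at the midpoint of the market's lifetime.
    """
    buckets = defaultdict(list)
    for p, resolved_yes in forecasts:
        # Clamp so p == 1.0 lands in the top bucket.
        b = min(int(p * n_buckets), n_buckets - 1)
        buckets[b].append((p, resolved_yes))
    table = []
    for b in sorted(buckets):
        pairs = buckets[b]
        avg_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(y for _, y in pairs) / len(pairs)
        table.append((avg_p, freq, len(pairs)))
    return table  # perfect calibration: avg_p ~= freq in every bucket

# Toy data: (midpoint probability, resolved YES?)
data = [(0.9, True), (0.8, True), (0.7, False), (0.3, False), (0.1, False)]
for avg_p, freq, n in calibration_table(data, n_buckets=5):
    print(f"predicted ~{avg_p:.2f}, resolved YES {freq:.2f} (n={n})")
```

Underconfidence shows up as extreme buckets being even more extreme in outcome than in stated probability (e.g. markets at 80% resolving YES 90% of the time); overconfidence is the reverse.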
No, the letter has not been falsified.
Just to clarify: ~700 out of ~770 OpenAI employees have signed the letter (~90%).
Out of the 10 authors of the autointerpretability paper, only 5 have signed the letter, which is much lower than the average rate. One of the 10 is no longer at OpenAI and so couldn’t have signed it, which makes it more natural to count this as 5⁄9 rather than 5⁄10. Either way, it’s still well below the average rate.
Ah, nice catch, I’ll update my comment.
There is an updated list of 702 who have signed the letter (as of the time I’m writing this) here: https://www.nytimes.com/interactive/2023/11/20/technology/letter-to-the-open-ai-board.html (direct link to pdf: https://static01.nyt.com/newsgraphics/documenttools/f31ff522a5b1ad7a/9cf7eda3-full.pdf)
Nick Cammarata left OpenAI ~8 weeks ago, so he couldn’t have signed the letter.
Out of the remaining 6 core research contributors:
3⁄6 have signed it: Steven Bills, Dan Mossing, and Henk Tillman
3⁄6 have still not signed it: Leo Gao, Jeff Wu, and William Saunders
Out of the non-core research contributors:
2⁄3 have signed it: Gabriel Goh and Ilya Sutskever
1⁄3 has still not signed it: Jan Leike
That being said, it looks like Jan Leike has tweeted that he thinks the board should resign: https://twitter.com/janleike/status/1726600432750125146
And that tweet was liked by Leo Gao: https://twitter.com/nabla_theta/likes
Still, it is interesting that this group is clearly underrepresented among people who have actually signed the letter.
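As a rough sanity check on “clearly underrepresented”, here’s a minimal back-of-the-envelope calculation, under the (strong, surely false) assumption that each eligible employee independently signs at the company-wide base rate of 702 out of ~770:

```python
from math import comb

p = 702 / 770  # company-wide base rate of signing

# Of the paper's 9 authors still at OpenAI, only 5 have signed.
# P(X <= 5) for X ~ Binomial(9, p): the chance of seeing this few
# signatures if the authors signed at the base rate.
n, k = 9, 5
prob = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))
print(f"P(<= {k} of {n} sign at base rate {p:.1%}) = {prob:.4f}")  # ~0.005
```

So even taking the independence assumption as only a crude model, 5⁄9 is quite unlikely to be chance alone.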
Edit: Updated to note that Nick Cammarata is no longer at OpenAI, so he couldn’t have signed the letter. For what it’s worth, he has liked at least one tweet that called for the board to resign: https://twitter.com/nickcammarata/likes
“It seems like a strategy by investors or even large tech companies to create a self-fulfilling prophecy to create a coalition of OpenAI employees, when there previously was none.”
How is this more likely than the alternative, which is simply that this is an already-existing coalition that supports Sam Altman as CEO? Considering that he was CEO until he was suddenly removed yesterday, it would be surprising if most employees and investors didn’t support him. Unless I’m misunderstanding what you’re claiming here?
If you follow the link, under the section “Free Market Seen as Best, Despite Inequality”, Vietnam is the country with the highest agreement by far with the statement “Most people are better off in a free market economy, even though some people are rich and some are poor” (95%!)
That being said, while it is the most pro-capitalism country, it is clearly not the most capitalist country (although it’s not that bad: it ranks 72nd out of 176 countries, https://www.heritage.org/index/ranking), and it would likely be more capitalist today if South Vietnam had won.
Small typo/correction: Waymo and Cruise each claim 10k rides per week, not riders.
Note that another way of phrasing the poll is:
Everyone responding to this poll chooses between a blue pill or red pill.
if you choose red pill, you live
if you choose blue pill, you die unless >50% of ppl choose blue pill
Which do you choose?
I bet the poll results would be very different if it was phrased this way.
Does anyone doubt that, with at most a few very incremental technological steps from today, one could train a multimodal, embodied large language model (“RobotGPT”), to which you could say, “please fill up the cauldron”, and it would just do it, using a reasonable amount of common sense in the process — not flooding the room, not killing anyone or going to any other extreme lengths, and stopping if asked?
Indeed, isn’t PaLM-SayCan an early example of this?
To be precise, Alphabet owns DeepMind. Google and DeepMind are sister companies.
So it’s possible for something to benefit Google without benefiting DeepMind, or vice versa.
“A scenario where a group of human thugs [rips and devours your entire family] is still okay-ish in some sense, because no state was involved; at least you have avoided the horrors of non-consensual taxation!”
Sorry, this doesn’t pass the ITT.
Yes, anarcho-capitalists accept that ~everyone will hire a security agency. This isn’t a refutation of anarchism.
The point is that security agencies have an incentive to compete on quality, whereas current governments don’t (as much), so the quality of security agencies would be higher than the quality of governments today.
I agree that there is a good chance that this solution is not actually SOTA, and that it is important to distinguish the three sets.
There’s a further distinction between 3 guesses per problem (which is allowed according to the original specification as Ryan notes), and 2 guesses per problem (which is currently what the leaderboard tracks [rules]).
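For concreteness, here’s a minimal sketch of how top-k scoring works (function and variable names are mine, not the official harness’s, and it simplifies each problem to a single test output, whereas real ARC tasks can have several): a problem counts as solved if any of the first k attempted grids exactly matches the expected output, so moving from 2 to 3 guesses can only raise a score.

```python
def top_k_score(solutions, answers, k):
    """Fraction of problems solved, where a problem is solved if any of
    the first k attempted output grids exactly matches the expected grid.

    solutions: problem id -> list of attempted grids (lists of lists)
    answers:   problem id -> expected grid
    """
    solved = sum(
        any(attempt == answers[pid] for attempt in attempts[:k])
        for pid, attempts in solutions.items()
    )
    return solved / len(answers)

# Toy example: problem "b" is only solved by its third guess.
answers = {"a": [[1, 0]], "b": [[2, 2]]}
solutions = {
    "a": [[[0, 0]], [[1, 0]]],
    "b": [[[0, 0]], [[1, 1]], [[2, 2]]],
}
print(top_k_score(solutions, answers, k=2))  # 0.5
print(top_k_score(solutions, answers, k=3))  # 1.0
```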
Some additional comments / minor corrections:
AFAICT, the current SOTA-on-the-private-test-set with 3 submissions per problem is 37%, and that solution scores 54% on the public eval set.
The SOTA-on-the-public-eval-set is at least 60% (see thread).
I think this is a typo and you mean the opposite.
From looking into this a bit, it seems pretty clear that the public eval set and the private test set are not IID. They’re “intended” to be the “same” difficulty, but AFAICT this essentially just means that they both consist of problems that are feasible for humans to solve.
It’s not the case that a fixed set of eval/test problems was created and then randomly distributed between the public eval set and the private test set. At your link, Chollet says “the [private] test set was created last” and the problems in it are “more unique and more diverse” than the public eval set. He confirms that here:
Bottom line: I would expect Ryan’s solution to score significantly lower than 50% on the private test set.