Independent alignment researcher
I have signed no contracts or agreements whose existence I cannot mention.
Futuresearch bets on Manifold.
Depends on how you count, but I clicked the “Create” button some 40 times.
Opus is more transhumanist than many give it credit for. It wrote this song for me, I ran it through Suno, and I quite like it: https://suno.com/song/101e1139-2678-4ab0-9ffe-1234b4fe9ee5
I imagine I’d find it annoying to have what I learn & change into limited by what a dumber version of me understands. Are you sure you wouldn’t think similarly?
Your original comment does not seem to explain why we see bullshit jobs. Bullshit jobs are not just jobs that would be inefficient at a small company. To quote Graeber, they are
a form of paid employment that is so completely pointless, unnecessary, or pernicious that even the employee cannot justify its existence even though, as part of the conditions of employment, the employee feels obliged to pretend that this is not the case
For more information, see the relevant Wikipedia article and the book itself.
This is the “theory of the firm” that John mentioned in the post.
Mistral had like 150B parameters or something.
None of those seem all that practical to me, except for SAE clamping from mechanistic interpretability, and I do actually expect that to be used for corporate censorship once the kinks have been worked out of it.
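For concreteness, here's roughly what I mean by SAE clamping. A minimal sketch, assuming a trained sparse autoencoder over a model's activations; the layer sizes and the clamped feature index are made up for illustration:

```python
import torch

# Minimal sparse autoencoder (SAE) over a model's residual-stream
# activations. Shapes and the clamped feature index are illustrative.
class SAE(torch.nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encode = torch.nn.Linear(d_model, d_features)
        self.decode = torch.nn.Linear(d_features, d_model)

    def forward(self, acts: torch.Tensor, clamp: dict[int, float] | None = None):
        feats = torch.relu(self.encode(acts))  # sparse feature activations
        if clamp:
            for idx, value in clamp.items():
                feats[..., idx] = value        # pin a feature on or off
        return self.decode(feats)              # reconstructed activations

# "Censorship" via clamping: force a hypothetical feature (say, one that
# fires on disallowed content) to zero, then substitute the reconstruction
# back into the model in place of the original activations.
sae = SAE(d_model=4096, d_features=65536)
acts = torch.randn(1, 4096)                   # stand-in activations
with torch.no_grad():
    steered = sae(acts, clamp={12345: 0.0})   # feature 12345 is hypothetical
```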
If the current crop of model organisms research has any practical applications, I expect it to be used to reduce jailbreaks, like in adversarial robustness, which is definitely highly correlated with both safety and corporate censorship.
Debate is less clear, but I also don’t really expect practical results from that line of work.
I’d imagine you know better than I do, and GDM’s recent summary of their alignment work seems to largely confirm what you’re saying.
I’d still guess that, to the extent practical results have come out of the alignment teams’ work, it’s mostly been immediately used for corporate censorship (even if it’s passed to a different team).
It’s not a coincidence they’re seen as the same thing, because in the current environment they are the same thing, and relatively explicitly so by those proposing safety & security to the labs. Claude will refuse to tell you a sexy story (unless they get to know you), and will refuse to tell you how to make a plague (again, unless they get to know you, though you need to build more trust for the plague instructions than for the sexy story), and will cite the same justification for both.
Anthropic likely uses very similar techniques, and very similar teams, to produce both kinds of refusal.
Ditto with Llama, Gemini, and ChatGPT.
Before assuming meta-level word-association dynamics, I think it’s useful to look at the object level. There is in fact a very close relationship between those working on AI safety and those working on corporate censorship, and if you want to convince people who hate corporate censorship that they should not hate AI safety, I think you’re going to need to convince the AI safety people to stop doing corporate censorship, or that the tradeoff currently being made is a positive one.
Edit: Perhaps some of this is wrong. See Habryka’s reply below.
Thanks! Original comment retracted.
The decision will ultimately come down to what Mr Xi thinks. In June he sent a letter to Mr Yao, praising his work on AI. In July, at a meeting of the party’s central committee called the “third plenum”, Mr Xi sent his clearest signal yet that he takes the doomers’ concerns seriously. The official report from the plenum listed AI risks alongside other big concerns, such as biohazards and natural disasters. For the first time it called for monitoring AI safety, a reference to the technology’s potential to endanger humans. The report may lead to new restrictions on AI-research activities.
I see no mention of this in the actual text of the third plenum...
I think you probably underrate the effect of having both a large number & concentration of very high-quality researchers & engineers (more than OpenAI now, I think, and I wouldn’t be too surprised if the concentration of high-quality researchers were higher than at GDM), being free from corporate chafe, and having many of those researchers thinking (perhaps correctly, I don’t know) that they’re value-aligned with the overall direction of the company at large. Probably also Nvidia rate-limiting the purchases of the large labs to keep competition among the AI companies going.
All of this is also compounded by smart models leading to better data curation and RLAIF (given quality researchers & a lack of cruft), which leads to even better models (this being the big reason I think Llama had to be so big to be SOTA, and Gemini wasn’t even SOTA), which of course leads to money in the future even if they have no money now.
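A toy sketch of the loop I have in mind, in case it's unclear; every function and name below is a stand-in, not anyone's actual pipeline:

```python
import random
from typing import Callable

# Toy sketch of an RLAIF-style curation loop: the model samples candidate
# responses, grades its own outputs, and the graded pairs become preference
# data for training the next, better model. All components are stand-ins.
def rlaif_round(
    generate: Callable[[str], str],         # model acting as a sampler
    judge: Callable[[str, str, str], int],  # model acting as a grader (0 or 1)
    prompts: list[str],
) -> list[tuple[str, str, str]]:
    preferences = []  # (prompt, chosen, rejected) triples
    for prompt in prompts:
        a, b = generate(prompt), generate(prompt)
        if judge(prompt, a, b) == 0:
            preferences.append((prompt, a, b))
        else:
            preferences.append((prompt, b, a))
    # In a real pipeline this dataset would train a reward model or feed
    # DPO; smarter models give better judgments, hence better data, hence
    # smarter models next round.
    return preferences

# Stub model so the loop actually runs:
prefs = rlaif_round(
    generate=lambda p: random.choice(["draft answer", "polished answer"]),
    judge=lambda p, a, b: 0 if len(a) >= len(b) else 1,
    prompts=["Why is data curation compounding?"],
)
print(prefs)
```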
I feel not very worried about Anthropic causing an AI related catastrophe.
This does not fit my model of your risk model. Why do you think this?
Thanks! I remember consciously thinking both those things, but somehow did the opposite of that.
You mean like Gwern’s It Looks Like You’re Trying To Take Over The World? I think that made a good short story. Though I don’t think it would make a good movie, since there’s little in the way of cool visuals.
Greg Egan’s Crystal Nights is also more similar to the usual way things are imagined, though uhznavgl vf fnirq ol gur hayvxryl qrhf rk znpuvan bs vg orvat rnfvre sbe gur fvzhyngrq pvivyvmngvba gb znxr n cbpxrg qvzrafvba guna gnxr bire gur jbeyq.
Crystal Nights is also very similar to Eliezer’s That Alien Message / Alicorn’s Starwink.
Edit: There are also likely tons more such books by Ted Chiang, Vernor Vinge, Greg Egan, and others, which I haven’t read yet, so I can’t list them with confidence or without spoiling them for myself.
This seems pretty false. There is at least one pretty successful fiction book written about the intelligence explosion (which, imo, would have been better if in subsequent books gur uhznaf qvqa’g fheivir).
See also: Tyler Cowen’s Be Suspicious of Stories
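For anyone unfamiliar with the convention: the scrambled passages above are rot13-encoded spoilers, decodable with Python's standard library:

```python
import codecs

# Any of the scrambled spoiler strings above can go here.
spoiler = "gur uhznaf qvqa'g fheivir"
print(codecs.decode(spoiler, "rot13"))  # reveals the spoiler when run
```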
People do say this is the case, but I’m skeptical. I feel like pretty much everything I use or consume is better than it would have been 10 years ago, and where it’s not, I bet I could find a better version with a bit of shopping around.
Sounds like the sort of thing I’d forward to Palisade Research.