Other examples include buying poor quality food and then having to pay for medical care, buying a cheap car that costs more in repairs, payday loans, etc.
Unless you're arguing that this system is useful to those in positions of power, such as a king, as a gauge of public opinion, in which case it would be legitimate?
That would make the domain of checkable tasks rather small.
That said, it may not matter depending on the capability you want to measure. If you want to make the AI hack a computer to turn the entire screen green and it skips a pixel so as to avoid completing the task, it would still have demonstrated that it possesses the dangerous capability, so it has no reason to sandbag.
On the other hand, if you are trying to see whether it has a capability that you wish it to use, it can still sandbag.
I’d strongly recommend spending some time in the Bay Area (or London as a second-best option). Spending time in these hubs will help you build your model of the space.
You may also find this document I created on AI Safety & Entrepreneurship useful.
One of the biggest challenges here is that subsidies designed to support alignment could be snagged by AI companies misrepresenting capabilities work as safety work. Do you think the government has the ability to differentiate between these?
Become a member of LessWrong or the AI Alignment Forum
I think the goal is for the alignment forum to be somewhat selective in terms of who can comment.
(Removed some of my comments b/c I just noticed the clarification that you meant average member of the EA forum/Less Wrong. I would suggest changing the title of your post though).
For the record, I see the new field of “economics of transformative AI” as overrated.
Economics has some useful frames, but it also tilts people towards being too “normy” on the impacts of AI and it doesn’t have a very good track record on advanced AI so far.
I’d much rather see multidisciplinary programs/conferences/research projects, including economics as just one of the perspectives represented, than economics of transformative AI qua economics of transformative AI. (I’d be more enthusiastic about building economics of transformative AI as a field if we were starting five years ago, but these things take time and it’s pretty late in the game now, so I’m less enthusiastic about investing field-building effort here and more enthusiastic about pragmatic projects combining a variety of frames).
Points for creativity, though I’m still somewhat skeptical about the viability of this strategy.
My intuition would be that models learn to implement more general templates as well.
It seems to me that “vibe checks” for how smart a model feels are easily gameable by making it have a better personality.
It’s not clear to me that personality is completely separate from capabilities, especially with inference-time reasoning. Also, what do you mean by “bigger templates”?
I wonder about the extent to which having an additional level of selection helps.
High school curricula are generally limited by the need to be teachable by a large number of teachers all around the country and by the need for a minimum number of students at each school who are capable of handling the content.
If the classes préparatoires can put more qualified teachers and students together, that would allow significant development. And running selection for elite universities after such an intermediate preparatory program would reduce the chance that talented students are missed due to having attended a high school that is weaker at maths (even though it sounds like the préparatoires have a selection bar too, I assume it’s quite a bit lower than performing well enough to get into a top institution).
Here’s a short-form with my Wise AI advisors research direction: https://www.lesswrong.com/posts/SbAofYCgKkaXReDy4/chris_leong-s-shortform?view=postCommentsNew&postId=SbAofYCgKkaXReDy4&commentId=Zcg9idTyY5rKMtYwo
(I already posted this on the Less Wrong post).
I was taking it as “solves” or “gets pretty close to solving”. Maybe that’s a misinterpretation on my part. What did you mean here?
First of all, it tackles one of the main core difficulties of AI safety in a fairly direct way — namely, the difficulty of how to specify what we want AI systems to do (aka “outer alignment”)
I wouldn’t quite go so far as to say it “tackles” the problem of outer alignment, but it does tie into (pragmatic) attempts to solve the problem by identifying the ontology of realistically specifiable reward functions. However, maybe I’m misunderstanding you?
I suspect that your post probably isn’t going to be very legible to the majority of folks on Less Wrong, since you’re assuming familiarity with meta-modernism. To be honest, I suspect this post would have been more persuasive if you had avoided mentioning it, since the majority of folks here are likely skeptical of it and it hardly seems essential for making what seems to be the core point of your post[1]. Sometimes less is more. Things cut out can always be explored in the future, when you have the time to explain them in a way that will be legible to your audience (though it’s often valuable to gesture towards the directions you wish to develop in the future).
I see the core point that your post is arguing for as the following: If moral realism is true[2], then this suggests that incorporating it within our attempt at alignment may be easier than avoiding making any assumptions about morality, since understanding morality then becomes a matter of trying to see reality more clearly.
I think this is quite an interesting and reasonable argument and I’d like to see you sketch out in more detail how you think we might be able to leverage it.
I just created a new Discord server for generated AI safety reports (i.e. reports produced using Deep Research or other tools). Would be excited to see you join (P.S. OpenAI now provides users on the Plus plan with 10 Deep Research queries per month).
Interesting idea. Will be interesting to see if this works out.
Lenses are… tabs. Opinionated tabs
Could you explain the intended use further?
Acausal positive interpretation
I’ve written up a short-form argument for focusing on Wise AI advisors. I’ll note that my perspective is different from that taken in the paper. I’m primarily interested in AI as advisors, whilst the authors focus more on AI acting directly in the world.
I agree that this doesn’t provide a definition of these values. Wise AI advisors could be helpful for figuring out your values, much like how a wise human would be helpful for this.