One of the biggest challenges here is that subsidies designed to support alignment could be captured by AI companies misrepresenting capabilities work as safety work. Do you think the government has the ability to differentiate between these?
Become a member of LessWrong or the AI Alignment Forum
I think the goal is for the alignment forum to be somewhat selective in terms of who can comment.
(Removed some of my comments because I just noticed the clarification that you meant the average member of the EA Forum/Less Wrong. I would suggest changing the title of your post, though.)
For the record, I see the new field of “economics of transformative AI” as overrated.
Economics has some useful frames, but it also tilts people towards being too “normy” on the impacts of AI and it doesn’t have a very good track record on advanced AI so far.
I’d much rather see multidisciplinary programs/conferences/research projects, with economics as just one of the perspectives represented, than economics of transformative AI qua economics of transformative AI. (I’d be more enthusiastic about building economics of transformative AI as a field if we had started five years ago, but these things take time and it’s pretty late in the game now, so I’m less enthusiastic about investing field-building effort here and more enthusiastic about pragmatic projects combining a variety of frames.)
Points for creativity, though I’m still somewhat skeptical about the viability of this strategy.
My intuition would be that models learn to implement more general templates as well.
It seems to me that “vibe checks” for how smart a model feels are easily gameable by making it have a better personality.
It’s not clear to me that personality is completely separate from capabilities, especially with inference-time reasoning. Also, what do you mean by “bigger templates”?
I wonder about the extent to which having an additional level of selection helps.
High school curricula are generally limited by the need to be teachable by a large number of teachers all around the country and by needing a minimum number of students at each school who are capable of the content.
If the classes préparatoires can put more qualified teachers and students together, that would allow significantly more development. Running selection for elite universities after such an intermediate preparatory program would also reduce the chance that talented students are missed because they attended a high school that is weaker at maths (even though it sounds like the préparatoires have a selection bar too, I assume it’s quite a bit lower than performing well enough to get into a top institution).
Here’s a short-form with my Wise AI advisors research direction: https://www.lesswrong.com/posts/SbAofYCgKkaXReDy4/chris_leong-s-shortform?view=postCommentsNew&postId=SbAofYCgKkaXReDy4&commentId=Zcg9idTyY5rKMtYwo
(I already posted this on the Less Wrong post).
I was taking it as “solves” or “gets pretty close to solving”. Maybe that’s a misinterpretation on my part. What did you mean here?
First of all, it tackles one of the main core difficulties of AI safety in a fairly direct way — namely, the difficulty of how to specify what we want AI systems to do (aka “outer alignment”)
I wouldn’t quite go so far as to say it “tackles” the problem of outer alignment, but it does tie into (pragmatic) attempts to solve the problem by identifying the ontology of realistically specifiable reward functions. However, maybe I’m misunderstanding you?
I suspect that your post probably isn’t going to be very legible to the majority of folks on Less Wrong, since you’re assuming familiarity with meta-modernism. To be honest, I suspect this post would have been more persuasive if you had avoided mentioning it, since the majority of folks here are likely skeptical of it and it hardly seems essential for making what seems to be the core point of your post[1]. Sometimes less is more. Things cut out can always be explored in the future, when you have the time to explain them in a way that will be legible to your audience (though it’s often valuable to gesture towards the directions you wish to develop in the future).
I see the core point that your post is arguing for as the following: if moral realism is true[2], then this suggests that incorporating it within our attempt at alignment may be easier than avoiding making any assumptions about morality, since understanding morality then becomes about trying to see reality more clearly.
I think this is quite an interesting and reasonable argument and I’d like to see you sketch out in more detail how you think we might be able to leverage it.
I just created a new Discord server for AI-generated AI safety reports (i.e. reports produced using Deep Research or other tools). Would be excited to see you join. (PS: OpenAI now provides users on the Plus plan with 10 Deep Research queries per month.)
Interesting idea. I’ll be curious to see whether it works out.
Lenses are… tabs. Opinionated tabs
Could you explain the intended use further?
Acausal positive interpretation
My take: Counterfactuals are Confusing because of an Ontological Shift:
“In our naive ontology, when we are faced with a decision, we conceive of ourselves as having free will in the sense of there being multiple choices that we could actually take. These choices are conceived of as actual, and when we think about the notion of the “best possible choice” we see ourselves as comparing actual possible ways that the world could be. However, when we start investigating the nature of the universe, we realise that it is essentially deterministic and hence that our naive ontology doesn’t make sense. This forces us to ask what it means to make the “best possible choice” in a deterministic ontology where we can’t literally take a choice other than the one that we make. This means that we have to try to find something in our new ontology that roughly maps to our old one.”
We expect a straightforward answer to “What is a decision theory as a mathematical object?”, since we automatically tend to assume our ontology is consistent, but if this isn’t the case and we actually have to repair our ontology, it’s unsurprising that we end up with different kinds of objects.
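As a rough illustration of the “different kinds of objects” point (my own example, not something from the original comment): evidential and causal decision theory both formalise “the best choice” as maximising an expectation over the same utility function U, but one conditions on the action as evidence while the other holds the world-states fixed and intervenes on the action, so the resulting mathematical objects really are different.

```latex
% Evidential decision theory: condition on the action as evidence
a^{*}_{\mathrm{EDT}} = \arg\max_{a} \; \mathbb{E}\!\left[ U \mid A = a \right]

% Causal decision theory: hold world-states fixed and intervene on the action
a^{*}_{\mathrm{CDT}} = \arg\max_{a} \; \sum_{s} P(s)\, U(a, s)
```

Which of these (or some third thing) counts as “the” decision theory is exactly the kind of question that only looks like it should have a straightforward answer before the ontological repair work is done.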
Well, we’re going to be training AI anyway. If we’re just training capabilities, but not wisdom, I think things are unlikely to go well. More thoughts on this here.
I believe that Anthropic should be investigating artificial wisdom:
I’ve summarised a paper arguing for the importance of artificial wisdom, with Yoshua Bengio as one of the authors. I also have a short-form arguing for training wise AI advisors and an outline, Some Preliminary Notes of the Promise of a Wisdom Explosion.
By Wise AI Advisors, I mean training an AI to provide wise advice. BTW, I’ve now added a link to a short-form post in my original comment where I detail the argument for wise AI advisors further.
I’d strongly recommend spending some time in the Bay Area (or London as a second-best option). Spending time there will help you build your model of the space.
You may also find this document I created on AI Safety & Entrepreneurship useful.