This fails to account for one very important psychological fact: the population of startup founders who get a company off the ground is very heavily biased toward people who strongly believe in their ability to succeed. So it’ll take quite a while for “it’ll be hard to make money” to flow through and slow down training. And, in the meantime, it’ll be acceleratory, pushing companies to stay ahead.
Chris_Leong
I’ve heard people suggest that they have arguments related to RL being particularly dangerous, although I have to admit that I’m struggling to find these arguments at the moment. I don’t know, perhaps that helps clarify why I’ve framed the question the way I have?
I think it’s still valid to ask in the abstract whether RL is a particularly dangerous approach to training an AI system.
Oh, this is a fascinating perspective.
So most uses of RL already involve just a small amount of RL. So if the goal was “only use a little bit of RL”, that’s already happening.
Hmm… I still wonder if using even less RL would be safer still.
[Question] Does reducing the amount of RL for a given capability level make AI safer?
“LLMs are self limiting”: I strongly disagree with the point about LLMs being limited. If you follow ML discussion online, you’ll see that people are constantly finding new ways to draw extra performance out of these models, and that it’s happening so fast it’s almost impossible to keep up. Many of these techniques will only provide small boosts or be mutually exclusive with others, but at least some of them will be scalable.
“LLMs are decent at human values”: I agree on your second point. We used to be worried that we’d tell an AI to get coffee and that it would push a kid out of the way. That doesn’t seem likely to be an issue these days.
“Playing human roles is pretty human”: This is a reasonable point. It seems easier to get an AI that is role-playing a human to actually act human than an AI that is completely alien.
Under the current version of the interactive model, its median prediction is just two decades earlier than that from Cotra’s forecast.
Just?
There’s a lot of overlap between alignment researchers and the EA community, so I’m wondering how that was handled.
It feels like it would be hard to find a good way of handling it: if you include everyone who indicated an affiliation with EA on the alignment survey, it’d tilt the survey towards alignment people; in contrast, if you exclude them, it’d likely tilt the survey away from alignment people, since people are unlikely to fill in both surveys.

Regarding the support for various cause areas, I’m pretty sure that you’ll find the support for AI Safety/Long-Termism/X-risk is higher among those most involved in EA than among those least involved. Part of this may be because of the number of jobs available in this cause area.
In contrast, this almost makes it sound like you think it is plausible to align AI to its user’s intent, but that this would be bad if the users aren’t one of “us”—you know, the good alignment researchers who want to use AI to take over the universe, totally unlike those evil capabilities researchers who want to use AI to produce economically valuable goods and services.
If I’m being honest, I don’t find this framing helpful.
If you believe that things will go well if certain actors gain access to advanced AI technologies first, you should directly argue that.
Focusing on status games feels like a red herring.
This comes out to ~600 pages of text per submission, which is extremely far beyond anything that current technology could leverage. Current NLP systems are unable to reason about more than 2048 tokens at a time, and handle longer inputs by splitting them up. Even if we assume that great strides are made in long-range attention over the next year or two, it does not seem plausible that SOTA systems in the near future will be able to use this dataset to its fullest.
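A rough back-of-the-envelope calculation illustrates the gap. The per-page word count and tokens-per-word ratio below are hypothetical averages I’m assuming for the sketch, not figures from the original comment:

```python
# Estimate how a ~600-page submission compares to a 2048-token context window.
# Assumed averages (hypothetical): 500 words per page, 1.3 tokens per word.
PAGES = 600
WORDS_PER_PAGE = 500
TOKENS_PER_WORD = 1.3
CONTEXT_WINDOW = 2048

total_tokens = int(PAGES * WORDS_PER_PAGE * TOKENS_PER_WORD)
chunks_needed = -(-total_tokens // CONTEXT_WINDOW)  # ceiling division

print(total_tokens)   # 390000
print(chunks_needed)  # 191
```

Even under these generous assumptions, a single submission would have to be split into roughly two hundred independent chunks, which is why the comment argues the dataset couldn’t be used “to its fullest” by such systems.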
It’s interesting to come across this comment in 2024 given how much things have changed already.
The biggest problem here is that it fails to account for other actors using such systems to cause chaos, and for the possibility that the offense-defense balance strongly favours the attacker, particularly if you’ve placed limitations on your systems that make them safer. Aligned human-ish level AIs don’t provide a victory condition.
The amount of testing required before release is likely subjective, and this might push him to reduce it.
If it wouldn’t have felt authentic, then it would have been the wrong choice to say it.
Link: Let’s Think Dot by Dot: Hidden Computation in Transformer Language Models by Jacob Pfau, William Merrill & Samuel R. Bowman
Only 33% confidence? It seems strange to state that X will happen if your odds are < 50%.
Do you have any thoughts on whether it would make sense to push for a rule that forces open-source or open-weight models to be released behind an API for a certain amount of time before they can be released to the public?
Would be very curious to know why people are downvoting this post.
Is it:
a) Too obvious
b) Too pretentious
c) Poorly written
d) Unsophisticated analysis
e) Promoting dishonesty

Or maybe something else.
You say counterfactuals in CLDT should correspond to consistent universes
That’s not quite what I wrote in this article:

However, this now seems insufficient as I haven’t explained why we should maintain the consistency conditions over comparability after making the ontological shift. In the past, I might have said that these consistency conditions are what define the problem and that if we dropped them it would no longer be Newcomb’s Problem… My current approach now tends to put more focus on the evolutionary process that created the intuitions and instincts underlying these incompatible demands, as I believe that this will help us figure out the best way to stitch them together.
I’ll respond to the other component of your question later.
You mention that society may do too little of the safer types of RL. Can you clarify what you mean by this?