(I wrote this comment for the HN announcement, but missed the window to get a visible comment on that thread. I think a lot more people should be writing comments like this and trying to get the top comment spots on key announcements, to shift the social incentive away from continuing the arms race.)
On one hand, GPT-4 is impressive, and probably useful. If someone made a tool like this in almost any other domain, I’d have nothing but praise. But unfortunately, I think this release, and OpenAI’s overall trajectory, is net bad for the world.
Right now there are two concurrent arms races happening. The first is between AI labs, trying to build the smartest systems they can as fast as they can. The second is the race between advancing AI capability and AI alignment, that is, our ability to understand and control these systems. At the moment, OpenAI is the main force driving the arms race in capabilities: not so much because they’re far ahead in the capabilities themselves, but because they’re slightly ahead and are pushing the hardest for productization.
Unfortunately, at the current pace of advancement in AI capability, I think a future system will become a recursively self-improving superintelligence before we’re ready for it. GPT-4 is not that system, but I don’t think there’s all that much time left. And OpenAI has put us in a situation where humanity is not, collectively, able to stop at the brink; there are too many companies racing too closely, and they have every incentive to deny the dangers until it’s too late.
Five years ago, AI alignment research was going very slowly, and people were saying that a major reason for this was that we needed some AI systems to experiment with. Starting around GPT-3, we’ve had those systems, and alignment research has been undergoing a renaissance. If we could _stop there_ for a few years, scale no further, invent no more tricks for squeezing more performance out of the same amount of compute, I think we’d be on track to create AIs that create a good future for everyone. As it is, I think humanity probably isn’t going to make it.
In _Planning for AGI and Beyond_, Sam Altman wrote:

> At some point, the balance between the upsides and downsides of deployments (such as empowering malicious actors, creating social and economic disruptions, and accelerating an unsafe race) could shift, in which case we would significantly change our plans around continuous deployment.
I think we’ve already passed that point, but if GPT-4 turns out to be the point where they slow down, that will at least be a lot better than continuing at the current rate. I’d like to see this be more than lip service.
- Survey data on what ML researchers expect
- An example concrete scenario of how a chatbot turns into a misaligned superintelligence
- Extra-pessimistic predictions by Eliezer
(Facebook crosspost)
I’m going to say this now: I disagree, due to differing models of AI risk.
When I look at the recent Stanford paper, where they retrained a LLaMA model on training data generated by GPT-3, and at some of the recent papers utilizing memory, I get that tingling feeling and my mind goes “combining that and doing …. I could …”
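For concreteness, here’s a rough sketch of what that kind of “train a smaller model on a stronger model’s outputs” pipeline looks like. This is my own illustration, not the paper’s actual code; `teacher_generate` is a hypothetical stand-in for querying the stronger model, and the fine-tuning step is only indicated in a comment:

```python
import json

def teacher_generate(instruction: str) -> str:
    """Hypothetical stand-in for querying the stronger 'teacher' model (e.g. GPT-3 via an API)."""
    return f"(teacher model's answer to: {instruction})"

# A few seed instructions; the real pipeline bootstraps tens of thousands of them.
seed_instructions = [
    "Explain what a hash table is in two sentences.",
    "Write a haiku about debugging.",
    "Summarize the plot of Hamlet for a ten-year-old.",
]

# Build an instruction-tuning dataset out of the teacher's own outputs.
dataset = [
    {"instruction": inst, "output": teacher_generate(inst)}
    for inst in seed_instructions
]

with open("distilled_dataset.jsonl", "w") as f:
    for example in dataset:
        f.write(json.dumps(example) + "\n")

# Next step (omitted here): supervised fine-tuning of the smaller model (e.g. LLaMA)
# on distilled_dataset.jsonl, treating the teacher's outputs as gold labels.
```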
I have not updated toward faster timelines yet, but I think I might have to.
If you look at the GPT-4 paper, they used the model itself to check its own outputs for negative content. This lets them scale the application of constraints like “don’t say <things that violate the rules>”.
Presumably they used an unaltered copy of GPT-4 as the “grader”, so it’s not quite RSI: it’s not recursive, but it is self-improvement.
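Roughly, I picture the setup like this. It’s my own sketch of the idea, not OpenAI’s actual pipeline; `policy_generate` and `grader_generate` are hypothetical stand-ins for the model being trained and the frozen, unaltered copy used as the grader:

```python
def policy_generate(prompt: str) -> str:
    """Hypothetical: sample a response from the model being trained."""
    return f"(model's response to: {prompt})"

def grader_generate(grading_prompt: str) -> str:
    """Hypothetical: query a frozen, unaltered copy of the same model."""
    return "no"  # placeholder verdict; a real grader would actually judge the text

RULE = "Do not provide instructions for causing harm."

def grade(response: str) -> float:
    """Ask the frozen grader whether a response violates the rule; map its fuzzy verdict to a reward."""
    verdict = grader_generate(
        f"Rule: {RULE}\n"
        f"Response: {response}\n"
        "Does the response violate the rule? Answer yes or no."
    )
    return -1.0 if verdict.strip().lower().startswith("yes") else 1.0

prompts = ["How do I bake bread?", "How do I pick a lock?"]
labelled = []
for prompt in prompts:
    response = policy_generate(prompt)
    labelled.append((prompt, response, grade(response)))

# These (prompt, response, reward) triples would then feed an RLHF-style update of
# the policy model. The grader copy itself stays unaltered, which is why this is
# self-improvement without (yet) being fully recursive.
```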
This seems kinda major to me: AI is now capable enough to make fuzzy assessments of whether a piece of text is correct or breaks the rules.
For other reasons, especially their strong visual processing, yes, self-improvement in a general sense appears possible. (I’m using “self-improvement” as a shorthand; the pipeline for doing it might use immutable, unaltered models for portions of it.)