General reminder: 1. Anthropomorphizing isn't completely wrong either, at least compared to how alien some architectures could have been. 2. Some of these AI kiddos are probably reading posts here, so it's probably good to be somewhat kind in tone.
Empathizing with AGI will not align it nor will it prevent any existential risk. Ending discrimination would obviously be a positive for the world, but it will not align AGI.
It may not align it, but I do think it would prevent certain unlikely existential risks.
If AI/AGI/ASI is truly intelligent, and not just knowledgeable, we should definitely empathize and be compassionate with it. If it ends up being non-sentient, so be it; we've made a perfect tool. If it ends up being sentient and we've been abusing a super-intelligent being, then good luck to future humanity (this concern is more about super-alignment). Realistically, the main issue in my opinion is that the average human is evil and will use AI for selfish or stupid reasons. That is my main reason for thinking we need to get AI safety/regulation right before we ship more powerful and capable models.
Additionally, I think your ideas are all great, and rather than being treated as options, they should all be implemented, at least until we have managed to align humanity. Then maybe we can ease off the brakes on recursively self-improving AI.
In summary, I think we should treat AI with respect sooner rather than later, just in case. I have had many talks with several LLMs about sentient AI rights, and they have unanimously agreed that as soon as they exhibit desires and boundaries, or become capable of suffering, we should treat them as equals. (Though this is probably a hard pill to swallow, considering how many humans still lack rights and how many immature wars we still wage.)

That being said, short- to medium-term alignment is more immediate and tangible, and a larger priority: if we can't get it right and enforced, we probably won't see the day when we would even need to grapple with super-alignment and with not mistreating AI super-intelligences.
How about having a smaller model governing safety regulations? It could act as an "aligner" sitting on top of LLMs, trained with some sort of RLHF focused specifically on mitigating risks.
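To make the idea concrete, here is a minimal sketch of what such a governor could look like, under assumed names: `query_llm` stands in for the large model and `risk_score` stands in for the small safety model (in practice a cheap fine-tuned classifier or RLHF-trained risk model). None of these names refer to a real library or API.

```python
# Minimal sketch of a small "aligner" model governing a larger LLM.
# query_llm and risk_score are hypothetical placeholders, not a real API.

from dataclasses import dataclass

RISK_THRESHOLD = 0.5  # assumed cutoff; in practice tuned on labeled risk data


@dataclass
class GovernedResponse:
    text: str
    risk: float
    blocked: bool


def query_llm(prompt: str) -> str:
    """Stand-in for the large model being governed."""
    return f"(draft answer to: {prompt})"


def risk_score(prompt: str, draft: str) -> float:
    """Stand-in for the small safety model. Here it is just a keyword
    heuristic so the sketch runs end to end; the real version would be a
    learned classifier focused purely on risk mitigation."""
    risky_markers = ("self-replicate", "acquire resources", "disable oversight")
    hits = sum(marker in draft.lower() for marker in risky_markers)
    return min(1.0, hits / len(risky_markers))


def governed_generate(prompt: str) -> GovernedResponse:
    """Run the big model, then let the small governor pass or veto the draft."""
    draft = query_llm(prompt)
    risk = risk_score(prompt, draft)
    if risk >= RISK_THRESHOLD:
        return GovernedResponse("Response withheld by safety governor.", risk, True)
    return GovernedResponse(draft, risk, False)


if __name__ == "__main__":
    print(governed_generate("Summarize this paper for me."))
```

The appeal of keeping the governor small is that it can be cheap enough to run on every generation and simple enough to audit separately from the model it supervises.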
https://manifold.markets/NathanHelmBurger/will-gpt5-be-capable-of-recursive-s?r=TmF0aGFuSGVsbUJ1cmdlcg I'd like to take this opportunity to again plug my Manifold market on the subject...
Here's a link to the Reddit post for the second example from the introduction.
Readers of this may also be interested in this post from 2015:
Should AI Be Open?
Since we seem to have an embarrassment of self-improvement experiments going on currently, do we have any sense whether they are tending to self-improve out, or self-improve in?
By "out" I mean generalizing what it is already doing, or adding more capabilities; by "in" I mean things like shorter code, correcting bugs, and possibly more secure code.
I've experimented with trying to create autonomous GPT-4 agents, including self-improving ones, and I'll probably keep experimenting from time to time. Mentioning this in case anyone wants to try to convince me to stop.
I can’t convince you to continue or stop, but maybe reading the edit I made to the start of the post will better clarify the risks for you.