I think we’re close to the point where improvement in AI in general scenarios will get more and more difficult to achieve over time, and from here on out each gain in capability will cost exponentially more. I think the concrete evidence I’m using for the claim matters less than the test, but I’ll cover my beliefs anyway.
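(For intuition only, a toy sketch rather than my evidence: if loss falls as a power law in compute, which is the shape scaling-law fits tend to have, then equal absolute improvements in loss cost rapidly growing amounts of compute. The exponent and constants below are made up for illustration.)

```python
# Toy illustration only: assume loss(C) = BASE_LOSS * C**(-ALPHA), the usual
# power-law shape from scaling-law papers. The constants here are arbitrary.
ALPHA = 0.05      # assumed scaling exponent (illustrative, not fitted)
BASE_LOSS = 4.0   # assumed loss at 1 unit of compute (illustrative)

def compute_needed(target_loss: float) -> float:
    """Compute (arbitrary units) required to reach target_loss under the toy law."""
    return (BASE_LOSS / target_loss) ** (1.0 / ALPHA)

prev = None
for target in (3.0, 2.5, 2.0, 1.5):
    c = compute_needed(target)
    note = "" if prev is None else f"  ({c / prev:.0f}x the previous step)"
    print(f"loss {target:.1f} -> compute {c:.3g}{note}")
    prev = c
```

Under those made-up numbers, each equal step down in loss costs roughly 40x, then 90x, then 300x more compute than the step before it; the exact multiples don’t matter, the shape does.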
...
so enamored about ChatGPT … ChatGPT does shiny new thing
Of course they fucking would have made a fucking GPT-4, holy fucking shit. But no, they didn’t.
I think they’ve made it clear they do have a GPT-4, and that it’s not magic, but that it’s pretty impressive. I think speculation that bing chat is gpt-4 based is plausible, though I’m not as confident as bing chat itself seems to be.
map to open-endedness
Papers exist that map this. I will not link them here. They’re not hard to find if you already know what you’re doing when it comes to finding papers, though.
No nickel-and-diming, especially on marginal improvements
No tools: Database lookups, search engines, elasticsearch, in-memory caches, etc.
Not hidden behind a company’s API; code audited by a third party
AIs that would pass this test exist now, but have not been scaled; they would be catastrophically dangerous if hyperscaled. The teams that know how to build them understand this and are not inclined to share their code, so you’re not gonna get proof, but I am convinced by the papers that claim it because I understand their mechanisms.
If you haven’t strongly trained the skill of finding papers yourself, I have some posts on related topics. They focus on the papers I think differentially improve verifiable-by-construction approaches, but it’s not an accident that those are also significant improvements in capability; it could not be otherwise.
And finally, since I end almost all messages with it, my pitch for alignment: I don’t think alignment is as fundamentally difficult as Yudkowsky does, but I think it’s very important that we can trust that the manifold of behavior of next-gen AIs has thorough mutualism-seeking self-correction at every level of the model. An AI that sees opportunities to make itself stronger needs weights-level failsafes that make it stop grabbing power if the power it’s grabbing is zero- or negative-sum, or if modifying itself to use the new source of power would break its previous semantic bindings. Markets only work to promote the common good when the agents within them notice when they’re merely capturing each other’s existing value by being cleverer, rather than creating value the other party had failed to notice an opportunity to create. If the market is swarmed by value-capturing agents, humans will quickly have their value captured and locked away by AI; the market will suffocate humanity, and eventually only AI agents will remain.
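(A deliberately crude toy model of that distinction, illustrative only; the rules and numbers are arbitrary, not a claim about how real markets or AIs behave. The point is just that capture moves existing wealth around without growing it, so a market dominated by capturers stops growing no matter how clever the capturers are.)

```python
import random
# Toy model, illustrative only: "creators" add new value each step,
# "capturers" only transfer existing value from others to themselves.
random.seed(0)

def simulate(n_creators: int, n_capturers: int, steps: int = 100) -> float:
    """Return total wealth after `steps` rounds of a crude creator/capturer market."""
    wealth = [1.0] * (n_creators + n_capturers)  # everyone starts with 1 unit
    for _ in range(steps):
        for i in range(n_creators):              # creation grows the total pie
            wealth[i] += 0.1
        for j in range(n_creators, len(wealth)): # capture is a zero-sum transfer
            victim = random.randrange(len(wealth))
            if victim != j:
                taken = 0.2 * wealth[victim]
                wealth[victim] -= taken
                wealth[j] += taken
    return sum(wealth)

print("mostly creators:  total value", round(simulate(9, 1), 1))  # ~100
print("mostly capturers: total value", round(simulate(1, 9), 1))  # ~20
```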
Unclear without a fuller reference, but taking a guess. I don’t think we either have or need anything to offer other than being moral patients. And moral patienthood doesn’t need to be symmetric, I expect it’s correct to leave sapient crocodiles (who by stipulation won’t consider me a moral patient) to their own devices, as long as their global influence and ability to commit atrocities is appropriately bounded.
Because of LLM human imitations, the main issue with alignment appears to be transitive alignment: the ability of LLM AGIs to set up existential-risk governance so that they don’t build unaligned successor AGIs. This is not a matter of morality, or of deeper principles that generate morality’s alignment-relevant aspects. It’s a matter of competence, and LLM AGIs won’t by default be much more competent at this sort of coordination and caution than humans, even though they’ll be at least as competent at building unaligned AGIs. Even if there is a selection principle whereby most unaligned AGIs eventually grow to recognize humans as moral patients, that doesn’t save us from the initial ignorant nanotech-powered flailing.
So I don’t think non-extinction alignment is reliably feasible, even if LLM AGIs are the first to take off and are fine alignment-wise. But I think it’s not a problem whose outcome humans will have an opportunity to influence. What we need to focus on is the alignment of LLM characters: channeling their potential humanity instead of morally arbitrary fictional AI tropes, and maybe an unusual inclination to be wary of AI risk.
...
Re: Papers: I’m aware of papers like the ones you’re alluding to, though I haven’t been that impressed.