I skimmed The Snuggle/Date/Slap Protocol and Ethicophysics II: Politics is the Mind-Savior, two of your recent downvoted posts. I think they get negative karma because they are difficult to understand and it’s hard to tell what the reader is supposed to take away from them. They would probably be better received if they were written so that both the object-level message and the point of the post were easy to grasp.
I read the Snuggle/Date/Slap Protocol and feel confused about what you’re trying to accomplish (is it solving AI Alignment?) and how the method is supposed to accomplish that.
In the ethicophysics posts, I understand the object-level claims/material (like the homework/discussion questions) but fail to understand what the goal is. It seems like you are jumping straight to fully grounded mathematical theories of ethics/morality, which immediately makes me dubious. It’s a too much, too grand, too certain kind of reaction. Perhaps you’re just spitballing/brainstorming some ideas, but that’s not how it comes across, and I infer you feel deeply assured that it’s correct, given statements like “It [your theory of ethics modeled on the laws of physics] therefore forms an ideal foundation for solving the AI safety problem.”
I don’t necessarily think you should change what you’re doing, BTW; I’m just pointing out some likely reactions/impressions driving the negative karma.
Thanks, this makes a lot of sense.
The snuggle/date/slap protocol is meant to give powerful AIs a channel to use their intelligence to deliver positive outcomes in the world by emitting performative speech acts in a non-value-neutral but laudable way.
FWIW I understand now what it’s meant to do, but have very little idea how your protocol/proposal delivers positive outcomes in the world by emitting performative speech acts. I think explaining your internal reasoning/hypothesis for how emitting performative speech acts leads to powerful AIs delivering positive outcomes would be helpful.
Is such a “channel” necessary to deliver positive outcomes? Is it supposed to make it more likely that AI delivers positive outcomes? More detail on what success would look like to you here would help.
You don’t want GPT-4 or whatever routinely issuing death threats, which is the non-performative equivalent of the SLAP token. So you need to clearly distinguish between performative and non-performative speech acts if your AI is going to be even close to not being visibly misaligned.
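The distinction above can be made concrete. Here is a minimal sketch, assuming hypothetical reserved token names like `<SLAP>` (the posts themselves don't specify an encoding): performative protocol tokens are routed to a separate channel and logged as speech acts, so they are never rendered as literal threatening prose in the model's text output.

```python
# Hypothetical sketch of separating performative protocol tokens from
# ordinary model output. The token names and the decoding-loop structure
# are assumptions for illustration, not taken from the original posts.

PERFORMATIVE_TOKENS = {"<SNUGGLE>", "<DATE>", "<SLAP>"}

def split_output(tokens):
    """Partition a model's token stream into performative speech acts
    and plain text to display to the user."""
    acts, text = [], []
    for tok in tokens:
        if tok in PERFORMATIVE_TOKENS:
            acts.append(tok)  # registered as a speech act, never shown as prose
        else:
            text.append(tok)
    return acts, " ".join(text)

acts, text = split_output(["I", "disapprove", "of", "this.", "<SLAP>"])
# acts  -> ["<SLAP>"]
# text  -> "I disapprove of this."
```

Under this kind of scheme, a `<SLAP>` is an explicit, auditable act with its own semantics, rather than free text that could be confused with a genuine threat.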
But why would an aligned AI be neutral about important things like a presidential election? I get that being politically neutral drives the most shareholder value for OpenAI, and by extension Microsoft. So I don’t expect my proposal to be implemented in GPT-4 or really any large corporation’s models. Nobody who can afford to train GPT-4 can also afford to light their brand on fire after they have done so.
Success would look like this: a large, capable, visibly aligned model that has delivered a steady stream of valuable outcomes to the human race emits a SLAP token with reference to Donald Trump; this fact is reported breathlessly in the news media; and that event changes the outcome of a free and fair democratic election. That is, an expert rhetorician using the ethical appeal (ethical = ethos, from the classical logos/pathos/ethos distinction in rhetoric) to sway an audience toward an outcome that I personally find desirable and worthy.
If I ever get sufficient capital to retrain the models I trained at ellsa.ai (the current one sucks if you don’t use complete sentences, but is reasonably strong for a 7B parameter model if you do), I may very well implement the protocol.
I think you are completely missing the entire point of the AI alignment problem.
The problem is how to make the AI distinguish good from evil, not whether, upon recognizing good, the AI should print “good” to output, smile, or clap its hands. Either reaction is equally okay and can be improved later. The important part is that the AI does not print “good” / smile / clap its hands when it figures out a course of action that would, as a side effect, destroy humankind or do something otherwise horrible (part of the problem is defining exactly what “otherwise horrible” means). Actually it is more complicated than this, but you are already missing the very basics.