Why don’t AI companies just trade some of their stock with each other, so they’d be playing more of a positive-sum game?
One reason is antitrust regulation. I think this would make regulators quite suspicious. It’s not usually good for the world for companies to play more of a positive-sum game with each other.
As the prediction markets on Trump winning went from ~50% to ~100% over 6 hours, S&P 500 futures moved less than they did at other times. Why?
Were whichever markets you’re looking at open at this time? Most stuff doesn’t trade that much out of hours.
https://www.google.com/search?q=spx futures
I was specifically looking at Nov 5th 0:00-6:00, during which the futures twitched enough to show they were alive, while Manifold and Polymarket moved in smooth synchrony.
Suppose we considered simulating some human for a while to get a single response. My math heuristics are throwing up the hypothesis that proving what the response would be is morally equivalent to actually running the simulation—it’s just another substrate. Thoughts? Implications? References?
I’ve found a category-theoretical model of BCI-powered reddit!
Fix a set of posts. Its subsets form a category whose morphisms are inclusions that map every element to itself. Call its forgetful functor to Set f. Each BCI can measure its user, such as by producing a vector of neuron activations. Its possible measurements form a space, and these spaces form a category. (Its morphisms would translate between brains, and each morphism would keep track of how well it preserves meaning.) Call its forgetful functor to Set g.
The comma category f/g has as its objects users (each a Set-function from some set of posts they’ve seen to their measured reactions), and each morphism would relate the user to another brain that saw more posts and reacted similarly on what the first user saw.
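For reference, here is the standard comma-category construction this uses, spelled out (I’m writing $C$ for the category of post-subsets and $D$ for the category of measurement spaces; the interpretations in parentheses are mine):

$$\mathrm{Ob}(f/g) = \{\,(X, Y, \varphi) \mid X \in C,\; Y \in D,\; \varphi\colon f(X) \to g(Y)\ \text{in}\ \mathbf{Set}\,\}$$

(a user: the posts they’ve seen, a brain, and a reaction to each seen post), and a morphism $(X, Y, \varphi) \to (X', Y', \varphi')$ is an inclusion $i\colon X \hookrightarrow X'$ together with a map $t\colon Y \to Y'$ such that $g(t) \circ \varphi = \varphi' \circ f(i)$, i.e. the second brain’s translated reactions agree with the first user’s on every post the first user saw.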
The product on f/g tells you how to translate between a set of brains. A user could telepathically tell another what headspace they’re in, so long as the other has ever demonstrated a corresponding experience. Note that a Republican sending his love for Republican posts might lead to a Democrat receiving his hatred for Republican posts.
The coproduct on f/g tells you how to extrapolate expected reactions between a set of brains. A user could simply put himself into a headspace and get handed a list of posts he hasn’t seen that are expected to have put him into that headspace.
Hearthstone has recently released Zephrys the Great, a card that looks at the public gamestate and gives you a choice between three cards that would be useful right now. You can see it in action here. I am impressed by the diversity of the choices it gives. An advisor AI that seems friendlier than Amazon’s/YouTube’s recommendation algorithm, because its secondary optimization incentive is fun, not money!
Could we get them to open-source the implementation so people could try writing different advisor AIs to use in the card’s place for, say, their weekly rule-changing Tavern Brawls?
OpenAI has a 100x profit cap for investors. Could another form of investment restriction reduce AI race incentives?
The market selects for people who are good at maximizing money and care to do so. I’d expect there are some rich people who care little whether they go bankrupt or the world is destroyed.
Such a person might expect that if OpenAI launches their latest AI draft, either the world is turned into paperclips or all investors get the maximum payoff. So he might invest all his money in OpenAI and pressure OpenAI (via shareholder swing voting or less regulated means) to launch it.
If OpenAI said that anyone can only invest up to a certain percentage of their net worth in OpenAI, such a person would be forced to retain something to protect.
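A toy expected-value sketch (the model and numbers here are mine, not from the post): write $W$ for such an investor’s net worth, $f$ for the fraction of it invested, and $p$ for the probability that launching ends the world; assume the capped payoff is the full $100\times$ on success, not launching just leaves them with $W$, and they value nothing beyond their own wealth. Pushing for launch then looks good whenever

$$(1-p)\bigl(100 f W + (1-f)W\bigr) > W \iff p < 1 - \frac{1}{1 + 99f}.$$

At $f = 1$ the threshold is $99\%$: they’d push for launch under almost any doom probability. Capping $f$ at $10\%$ pulls the threshold down to about $91\%$, and smaller caps pull it down further, since the uninvested $(1-f)W$ is now something doom would take from them.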
My lifelong chronic lateness resists “don’t do this, you’ll be late” as Patrick resists Man Ray, but noticing when I reach for “what time is it?” and thinking “that info makes your decisions worse, pretend you didn’t have it” just works? Let’s see if it sticks.
Does this mean when you see what time it is, you think “oh I have a few minutes?”
I forget things more than most people do, so “a few years pass” is for me more comparable than usual to “I die and am replaced with a clone”. I therefore instilled in myself, back in childhood, a policy of being loyal to my past selves, and would like to someday become a continuum of resurrected past selves. This is going to go better with more data. Recommend local, open-source lifelogging software for Windows or Linux and Android.
https://activitywatch.net/
The WaveFunctionCollapse algorithm measures whichever tile currently has the lowest entropy. GPT-3 always just measures the next token. Of course in prose those are usually the same, but I expect some qualitative improvements once we get structured data with holes such that any might have low entropy, a transformer trained to fill holes, and the resulting ability to pick which hole to fill next.
Until then, I expect the prompts/GPT protocols that perform well to be the ones that happen to present the holes in your data in the order that WFC would have picked, i.e. ask it to show its work, and don’t ask it to write the bottom line of its reasoning process first.
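A minimal sketch of the lowest-entropy-hole-first idea, assuming a hypothetical candidates(text, hole) interface that returns a probability distribution over fills for a hole; the toy model at the bottom is a fake stand-in, everything here is illustrative:

import math

def entropy(dist):
    """Shannon entropy of a {value: probability} distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def fill_holes(text, holes, candidates):
    """Fill the named holes in lowest-entropy-first order, WFC-style."""
    remaining = list(holes)
    while remaining:
        # Ask the model about every open hole and pick the most constrained one.
        dists = {h: candidates(text, h) for h in remaining}
        h = min(remaining, key=lambda h: entropy(dists[h]))
        best = max(dists[h], key=dists[h].get)       # most likely fill
        text = text.replace("{" + h + "}", best)     # commit it
        remaining.remove(h)
    return text

# Fake "model": the answer hole is high-entropy until the work has been shown,
# mirroring "show your work before the bottom line".
def toy_model(text, hole):
    if hole == "work":
        return {"17 * 3 = 51": 0.9, "17 + 3 = 20": 0.1}
    if hole == "answer" and "51" in text:
        return {"51": 0.95, "20": 0.05}
    return {"51": 0.5, "20": 0.5}

print(fill_holes("Work: {work}. Answer: {answer}.", ["answer", "work"], toy_model))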
Long shortform short: Include the Sequences in your prompt as instructions :)
I claim that the way to properly solve embedded agency is to do abstract agent foundations such that embedded agency falls out naturally as one adds an embedding.
In the abstract, an agent doesn’t terminally care to use an ability to modify its utility function.
Suppose a clique of spherical children in a vacuum [edit: …pictured on the right] found each other by selecting for their utility functions to be equal on all situations considered so far. They invest in their ability to work together, as nature incentivizes them to.
They face a coordination problem: As they encounter new situations, they might find disagreements. Thus, they agree to shift their utility functions precisely in the direction of satisfying whatever each other’s preferences turn out to be.
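A minimal sketch of that renegotiation dynamic, with all modeling choices mine: each agent’s utility function is a table over situations, and when a new situation exposes a disagreement, everyone moves their own entry partway toward the clique’s mean valuation of it.

def encounter(agents, situation, rate=0.5):
    """Update every agent's utility on `situation`, moving toward the group mean."""
    mean = sum(a[situation] for a in agents) / len(agents)
    for a in agents:
        a[situation] += rate * (mean - a[situation])

# Two agents who agreed on everything seen so far, then hit a new situation.
alice = {"old_situation": 1.0, "new_situation": 0.9}
bob   = {"old_situation": 1.0, "new_situation": -0.4}

for _ in range(10):                     # repeated renegotiation
    encounter([alice, bob], "new_situation")

print(alice["new_situation"], bob["new_situation"])  # both end near 0.25, the mean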
This is the simplest case I’ve seen yet where alignment as a concept falls out explicitly. It smells like it fails to scale in any number of ways, which is worrisome for our prospects. Another point for not trying to build a utility maximizer.
It’s nice how college has accidentally turned into classes for everyone on how to tell when ChatGPT is making up Assistant’s knowledge. I think I’m better than chance at this. We should have contests on this; Eliciting Latent Knowledge can be an art until we make it a science.
In the multiplayer roleplaying game SS13, one player gets the rare and coveted role of wizard, and can now choose his magic ability/equipment purchases. First he buys an apprentice, choosing me randomly from all ghosts (players without a role, observing the game) that would like it. Next he considers picking an ability that will let him spawn additional ghost roles during play. Let’s say there’s a 50% chance of an extra ghost; it’s barely worth it in that case, so he picks it.
But let’s suppose I were the one with the choice. If over the years I get asked 4 times whether I want to play Apprentice (say the pool is just me half the time, and me plus one extra ghost the other half), I’m gonna get picked 3 times out of 4, and 2 of those 3 times there’s no extra ghost. So I shouldn’t buy the ability. But we’re on the same team, so this doesn’t make sense! What’s going on?
Ah, I got it. Just as I update towards the ghost pool being smaller when I get picked, I should update towards the ghost pool being larger when I find myself as a ghost in the first place. These updates cancel out, and I should buy the ability.
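A quick Monte Carlo of that resolution, under a toy model that is entirely mine (a fixed population of players, a random subset of them become ghosts each round, one extra ghost half the time, and the apprentice drawn uniformly from the pool):

import random

# Toy model (assumptions mine): POP players, K of them are ghosts each round,
# plus one extra ghost half the time. The apprentice is drawn uniformly from
# the ghost pool. Question: given that *I* am a ghost and got picked, how
# likely was this an extra-ghost round?

POP, K, TRIALS, ME = 10, 3, 200_000, 0

picked_total = picked_with_extra = 0
for _ in range(TRIALS):
    extra = random.random() < 0.5                 # ability fires half the time
    pool = random.sample(range(POP), K + extra)   # who ended up as ghosts
    if ME in pool and random.choice(pool) == ME:  # I'm a ghost AND I'm picked
        picked_total += 1
        picked_with_extra += extra

# The printed ratio lands near 0.5: the "smaller pool" and "I'm a ghost at all"
# updates cancel, so being picked tells me nothing about the extra ghost.
print(picked_with_extra / picked_total)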
All the knowledge a language model character demonstrates is contained in the model. I expect that there is also some manner of intelligence, perhaps pattern-matching ability, such that the model cannot write a smarter character than itself. The better we engineer our prompts, the smaller the model overhang. The larger the overhang, the more opportunity for inner alignment failure.
I expect that all that’s required for a Singularity is to wait a few years for the sort of language model that can replicate a researcher’s thoughts faithfully, then make it generate a thousand years’ worth of that researcher’s internal monologue, perhaps with access to the internet.
Neural networks should be good at this task—we have direct evidence that neural networks can run human brains.
Whether our world’s plot has a happy ending then merely depends on the details of that prompt/protocol—such as whether it decides to solve alignment before running a successor. Though it’s probably simple to check alignment of the character—we have access to his thoughts. A harder question is whether the first LM able to run humans is still inner aligned.
https://arbital.com/p/cev/ : “If any hypothetical extrapolated person worries about being checked, delete that concern and extrapolate them as though they didn’t have it. This is necessary to prevent the check itself from having a UDT influence on the extrapolation and the actual future.”
Our altruism (and many of our other emotions) is, evolutionarily, just an acausal reaction to the worry that we’re being simulated by other humans.
It seems like a jerk move to punish someone for being self-aware enough to replace their emotions by the decision-theoretic considerations they evolved to approximate.
And unnecessary! For if they behave nicely when checked because they worry they’re being checked, they should also behave nicely when unchecked.
I think (given my extremely limited understanding of this stuff) this is to prevent UDT agents from fooling the people simulating them by recognizing that they’re in a simulation.
I.e., you want to ignore the following code:
if (inOmegasHead) {
    oneBox();   // look cooperative while Omega simulates you
} else {
    twoBox();   // defect once you're in the real world
}
Cryptocurrencies are vulnerable to 51% attacks. This is good: If we transport most of the economy to the blockchain and then someone manages to commandeer it, this can in effect lead to a peaceful transition to a world government. We find the strongest living capitalist and empower them still further, in case there is any threat to our entire species.
More immediately, OpenAI and other companies that have endorsed, or would endorse, throwing in their lot with the foremost AI company should formalize this via contract now, while they’re still more than one quarterly earnings report away from an AI race.
I think you’re missing the fact that “the economy” isn’t actually about currency or accounting. Those are ways of tracking the economy, which consists of various goods and services that people provide to each other.
If any given currency (crypto or not) becomes untrustworthy, its value goes to zero, and other currencies take over as accounting mechanisms, often with some violence over disputed ownership of the actual stuff the currency was supposed to have been tied to.
Yeah, you’d need to have the social expectations be that 51% attacks are a legitimate part of the mechanism.
What if we suppose that wealth doesn’t track merit that well, and accumulating 51% of wealth most likely signals measurement error due to noise/randomness/luck?
And even inasmuch as it tracks capitalistic merit, it might not track other things we care about, which makes leaving all our eggs in one basket problematic.
We happen to have landed in a scenario where distributing eggs across baskets is the silly move. Making the holder of a random dollar world dictator would be an improvement.
Suppose all futures end in FAI or UFAI. Suppose there were a magic button that rules out the UFAI futures if FAI was likely enough, and the FAI futures otherwise. The cutoff happens to be chosen to conserve your subjective probability of FAI. I see the button as transforming our game for the world’s fate from one of luck into one of skill. Would you press it?
Consider a singleton overseeing ten simpletons. Its ontology is that each particle has a position. Each simpleton prefers all of their body’s particles staying in their body to the alternative. The singleton aggregates their preferences by letting each of them rule out 10% of the space of possible states. This does not let them guarantee their bodily integrity. What if it considered changes to a single particle’s position instead of whole states? Each simpleton would rule out any change that removes a particle from their body, which fits fine within their 10%. Iterating non-ruled-out changes would end up in an optimal state starting from any state. This isn’t a free lunch, but we should formalize what we paid.
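A minimal sketch of the change-based scheme, with every modeling detail mine: particles sit on an integer line, each simpleton vetoes any single-particle change that would move a particle out of their body, and the singleton greedily applies whatever non-vetoed changes improve its own (arbitrary) objective.

import random

# Toy model (all details mine): particles on an integer line, each owned by one
# of ten simpletons. A "change" moves one particle by +/-1. A simpleton vetoes
# any change that takes a particle more than RADIUS away from their home spot.
# The singleton greedily applies non-vetoed changes that improve its objective.

RADIUS = 2
random.seed(0)

homes = {owner: 10 * owner for owner in range(10)}             # body centers
particles = [(owner, homes[owner] + random.randint(-2, 2))     # (owner, position)
             for owner in range(10) for _ in range(5)]

def vetoed(owner, new_pos):
    """A simpleton rules out changes that remove a particle from their body."""
    return abs(new_pos - homes[owner]) > RADIUS

def objective(ps):
    """The singleton's own goal, e.g. pull everything toward the origin."""
    return -sum(abs(pos) for _, pos in ps)

improved = True
while improved:
    improved = False
    for i in range(len(particles)):
        owner, pos = particles[i]
        for new_pos in (pos - 1, pos + 1):                     # candidate single-particle changes
            if vetoed(owner, new_pos):
                continue                                       # ruled out by its owner
            candidate = particles.copy()
            candidate[i] = (owner, new_pos)
            if objective(candidate) > objective(particles):    # the singleton likes it
                particles = candidate
                improved = True
                break

# Every particle ends as close to the origin as its owner's veto allows.
print(sorted(set(particles)))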