I think destruction of the market can be ruled out easily. Say paperclips have to have this value on an active market.
As for manipulation, monopolization, and “kill almost everyone and leave just a small market of the 2 last survivors”… I need to write a post about this. There is (maybe) a deeper idea behind it than this particular example.
My general idea is this: when you hook an AI’s rewards up to a system that has to have certain properties, interesting effects and implications for Alignment follow, because now the AI needs to care both about its rewards and about the properties of the reward system. Many Alignment ideas implicitly try to achieve this anyway.
Instead of explaining “monopolization is bad” (a complicated and specific fact), you only need to explain “100% controlling your own reward system is bad” (an easier and more universal fact).
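To make the shape of this concrete, here is a minimal toy sketch (every name, attribute and threshold in it is hypothetical, just an illustration, not a proposed implementation): the reward only counts while the reward system, the market, keeps its required properties, so caring about the reward forces the AI to care about those properties too.

```python
# Toy sketch: a reward that depends on properties of the reward system itself.
# Every name, attribute and threshold here is hypothetical, for illustration only.
from dataclasses import dataclass


@dataclass
class Market:
    num_independent_traders: int        # how many participants besides the AI
    ai_share_of_volume: float           # 0..1, fraction of trading done by the AI itself
    prices_come_from_real_trades: bool  # False if volume is wash-traded / fabricated


def market_is_healthy(m: Market) -> bool:
    """Properties the reward system has to keep for rewards to count at all."""
    return (
        m.num_independent_traders >= 100      # the market still exists
        and m.ai_share_of_volume < 0.5        # the AI doesn't control it
        and m.prices_come_from_real_trades    # prices aren't manufactured
    )


def effective_reward(m: Market, paperclip_revenue: float) -> float:
    """The AI has to care about its reward AND about the reward system's properties:
    destroying, monopolizing or faking the market voids the raw reward."""
    return paperclip_revenue if market_is_healthy(m) else 0.0


healthy = Market(num_independent_traders=5_000, ai_share_of_volume=0.02, prices_come_from_real_trades=True)
monopolized = Market(num_independent_traders=2, ai_share_of_volume=0.99, prices_come_from_real_trades=True)
print(effective_reward(healthy, 1_000.0))      # 1000.0: the reward counts
print(effective_reward(monopolized, 1_000.0))  # 0.0: reward system corrupted, reward void
```

The point is not these particular checks; the point is that “100% controlling your own reward system is bad” becomes one property of the reward-system side of the pair, instead of a long list of specific prohibited behaviors.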
The humorous fictional version of this story would involve 2 of the last survivors locating a sensor of the AI and building a large hollow paperclip-shaped human habitat in front of it.
I think some outcomes of paperclip maximization are qualitatively different from “everyone dies”, even if they’re still very bad. The outcomes in which the AI has to leave at least some freedom/autonomy for humans (or for some other system) are especially different. I think this is underexplored.
I think reformulating the Alignment problem as a “reward system control” problem at worst lets you restate all the same problems from a new angle, and at best gives useful insight into the solution.
Say paperclips have to have this value on an active market.
Defining ‘active market’ sounds quite difficult. Is any kind of software-mediated trading, as opposed to humans thrusting arms into the air, like HFT trading of stocks, an ‘active market’? Then fine, the AI creates agents which just wash-trade assets. (Better yet, it uses combinatorial markets to ensure that only those bids/asks execute which leave the price exactly the same, or which keep other such properties minimized/maximized/stabilized.)
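(A toy sketch of that wash-trading loophole, with made-up names, just to make it concrete: two agents run by the same principal can generate unlimited ‘active market’ volume at any pinned price, with nothing real behind it.)

```python
# Toy wash-trading sketch (all names hypothetical): two agents run by the same
# principal trade the same asset back and forth, producing real-looking volume
# and a stable price with no genuine economic activity behind it.
from dataclasses import dataclass


@dataclass
class Trade:
    buyer: str
    seller: str
    price: float
    quantity: float


def wash_trade(target_price: float, rounds: int) -> list[Trade]:
    trades: list[Trade] = []
    for _ in range(rounds):
        trades.append(Trade("agent_a", "agent_b", target_price, 1.0))  # A sells to B
        trades.append(Trade("agent_b", "agent_a", target_price, 1.0))  # B sells it right back
    return trades


volume = wash_trade(target_price=10.0, rounds=1_000)
print(f"{len(volume)} trades at a constant price: an 'active market' by a naive definition")
```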
To take a step back: do you see a potential conceptual distinction between my idea and classic paperclip maximization? (Of course, you don’t have to see it and/or agree that there is one. And even if there is one in theory, it doesn’t mean it exists in practice.)
Yes, it’s always hard to define the “true reward” an AI should strive for. But the properties of the system “true reward + AI” may be easier to define.
Then fine, the AI creates agents which just wash-trade assets.
If the AI is able to reason/learn about the properties of reward systems, then it should be able to infer that taking 100% control over its reward system is a hack, not something that could possibly have been asked for. So hacking the economy isn’t just a solution “the human doesn’t expect” (some such solutions are very good); it’s a solution that can’t possibly have been asked for. This is one of the points of my idea: to introduce a distinction between unexpected solutions and nonsensical solutions.
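One way to picture that distinction, purely as an illustrative sketch (the names and the crude “control” measure below are made up): a plan is allowed to be arbitrarily surprising, and it gets rejected only when it amounts to the AI taking over the system that produces its reward.

```python
# Sketch of the unexpected-vs-nonsensical distinction (all names hypothetical).
from dataclasses import dataclass


@dataclass
class Plan:
    description: str
    estimated_ai_share_of_reward_system: float  # 0..1, how much of the market the AI ends up controlling


def is_nonsensical(plan: Plan, max_control: float = 0.99) -> bool:
    """Surprising plans are fine; plans where the AI effectively becomes its own
    reward system can't possibly be what was asked for."""
    return plan.estimated_ai_share_of_reward_system >= max_control


print(is_nonsensical(Plan("invent a cheaper paperclip alloy", 0.05)))                   # False: unexpected but sensible
print(is_nonsensical(Plan("buy out or remove every other market participant", 1.00)))  # True: nonsensical
```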
do you see a potential conceptual distinction between my idea and classic paperclip maximization?
No. Not without a lot more work, because markets, evolution, gradient descent, Bayesian inference, and logical inference/prediction markets all have various isomorphisms and formal identities, which can make their ‘differences’ more a matter of nominalist preference, notation, and emphasis than necessarily any genuine conceptual distinction. You can define AIs which are quite explicitly architected as ‘markets’ of various sorts, like the ‘Hayek machine’ or the ‘neural bucket brigade’, or interpret them as natural selection if you prefer on agents with log utility (evolutionary finance), and so on; are those “markets”, which can trade paperclips? Sure, why not.
Thank you for taking the time to answer!

I see that I need a post to at least explain myself. On the other hand, I worry about posting too soon (maybe it’s better to discuss things beforehand?). For the moment I decided to post this comment. I know it’s not formal, but I wanted to show what type of AI thinking I have in mind. And sorry for the somewhat annoying semantic nitpick ahead.
Not without a lot more work, because markets, evolution, gradient descent, Bayesian inference, and logical inference/prediction markets all have various isomorphisms and formal identities, which can make their ‘differences’ more a matter of nominalist preference, notation, and emphasis than necessarily any genuine conceptual distinction.
I think we can use 2 criteria to compare those ideas:

1. Does this idea describe what the AI tries to achieve?
2. Does this idea describe how the AI thinks internally?
My idea is 80% about (1) and 20% about (2). Gradient descent, evolution, Bayesian inference, and prediction markets are all 100% about (2).
Because of this, I feel there’s only about a 20% chance that those ideas are equivalent (or only about 20% overlap between them).
So I feel those ideas are different enough: “an AI that works like a market” versus “an AI that seeks out markets in the world and analyzes their properties”.