Extremely low-probability events are great as intuition pumps, but terrible as a basis for real-world decision-making.
Would CIRL with many human agents realistically model our world?
What does AI alignment mean with respect to many humans with different goals? Are we implicitly assuming (with all our current agendas) that the final model of AGI is one that is corrigible to a single human instructor?
How do we synthesize goals of so many human agents into one utility function? Are we assuming solving alignment with one supervisor is easier? Wouldn’t having many supervisors restrict the space meaningfully?
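As a concrete (and entirely hypothetical) illustration of what "synthesizing many goals into one utility function" could mean, here is a minimal sketch in which several humans' utilities over a set of outcomes are collapsed into a single objective, either by a weighted sum or by a Nash-bargaining-style product. The outcomes, utilities, and weights are all invented for illustration, not taken from any particular proposal.

```python
# Hypothetical sketch: two simple ways to collapse several humans' utility
# functions into one objective an agent could optimize. All numbers are made up.
import numpy as np

rng = np.random.default_rng(0)
n_humans, n_outcomes = 5, 10

# Row i gives human i's utility over the candidate outcomes.
utilities = rng.uniform(0, 1, size=(n_humans, n_outcomes))

# 1. Weighted utilitarian aggregation: maximize a weighted sum of utilities.
weights = np.full(n_humans, 1.0 / n_humans)            # equal weights as a default
utilitarian_choice = np.argmax(weights @ utilities)

# 2. Nash-bargaining-style aggregation: maximize the product of gains over each
#    human's disagreement point (here, their worst available outcome).
disagreement = utilities.min(axis=1, keepdims=True)
nash_choice = np.argmax(np.prod(utilities - disagreement + 1e-9, axis=0))

print("utilitarian pick:", utilitarian_choice, "nash pick:", nash_choice)
```

Note that the weights and disagreement points are themselves value judgments, which is arguably the hard part; that may be one reason single-supervisor alignment looks like the easier problem.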
Is there a good bijection between specification gaming and wireheading vs different types of Goodhart’s law?
Seems like this has been done already.
https://www.alignmentforum.org/posts/yXPT4nr4as7JvxLQa/classifying-specification-problems-as-variants-of-goodhart-s
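For whatever it's worth, here is a toy simulation (my own, not taken from the linked post) of one variant discussed there, regressional Goodhart: if you select hard on a noisy proxy of value, the true value of what you pick is systematically lower than its proxy score suggests.

```python
# Toy illustration of regressional Goodhart: proxy = value + noise, and selecting
# the item with the best proxy score overstates the true value obtained.
import numpy as np

rng = np.random.default_rng(1)
true_value = rng.normal(size=100_000)
proxy = true_value + rng.normal(scale=1.0, size=100_000)

best_by_proxy = np.argmax(proxy)
print("proxy score of selected item:", proxy[best_by_proxy])
print("true value of selected item: ", true_value[best_by_proxy])  # systematically lower
```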
Speculation: people never use pro-con lists to actually make decisions; rather, they use them as post-hoc rationalizations to convince others.
The internet might be lacking several kinds of curation and organization tools. How can we improve this?
Are Dharma traditions that posit ‘innate moral perfection of everyone by default’ reasoning from the just world fallacy?
I wonder if there’s a game-theoretic and evolutionary argument that could be made here about cooperation being the sane default in the absence of other priors.
What Dharma traditions in particular do you have in mind? I can’t think of one I would describe as saying everyone has innate “moral” perfection, unless you twist the word “moral” around so much that its use is confusing at best.
Can we have a market with qualitatively different (mutually non-convertible) forms of money?
I’m interested in this. The problem is that if people consider the value provided by the different currencies at all fungible, side markets will pop up that allow their exchange.
An idea I haven’t thought about enough (mainly because I lack expertise) is to mark a token as Contaminated if its history indicates that it has passed through “illegal” channels, i.e. it has benefited someone in an exchange not considered a true exchange of value, so that purists can refuse to accept it. Purist communities, if large enough, would keep such non-contaminated tokens stable.
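A minimal sketch of the "contaminated token" idea above, with all class and field names hypothetical: each token carries its transaction history, and a purist refuses any token that has ever passed through a channel they consider illegitimate.

```python
# Hypothetical sketch: tokens carry provenance, purists check it before accepting.
from dataclasses import dataclass, field

@dataclass
class Token:
    history: list = field(default_factory=list)   # channels the token has passed through

    def transfer(self, channel: str) -> "Token":
        return Token(history=self.history + [channel])

def purist_accepts(token: Token, blacklist: set) -> bool:
    """A purist accepts a token only if no step of its history is blacklisted."""
    return not any(channel in blacklist for channel in token.history)

blacklist = {"side-market exchange"}
clean = Token().transfer("wages").transfer("groceries")
tainted = Token().transfer("wages").transfer("side-market exchange")

print(purist_accepts(clean, blacklist))    # True
print(purist_accepts(tainted, blacklist))  # False
```

This assumes the full history of every token is visible to the person accepting it (something like a public ledger), which is itself a nontrivial design requirement.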
Maybe a better question to ask is “do we have preference orderings that are only partial, and that would thus benefit from many isolated markets?”, because if so, you wouldn’t have to worry about enforcing anything: many different currencies would automatically come into existence and remain stable.
Of course, more generally, you wouldn’t quite have complete isolation, but different valuations of goods in different currencies, without “true” fungibility. I think it is quite possible that our preference orderings are in fact partial, and that the current one-currency valuation of everything could be improved.
It is so difficult to perceive, and to articulate in one's own pronunciation, an accent that is not one's native one, because of the brain's predictive processing: our brains constantly assimilate incoming signals to the closest ones they already know.
How would signalling/countersignalling work in a post-scarcity economy?
Can you define a post-scarcity economy in terms of what you anticipate the world to look like?
What are some effective ways to reset the hedonic baseline?
What gadgets have improved your productivity?
For example, I started using a stylus a few days ago and realized it can be a great tool for a lot of things!
Multiple large monitors, for programming.
Waterproof paper in the shower, for collecting thoughts and making a morning todo list
Email filters and Priority Inbox, to prevent spurious interruptions while keeping enough trust that urgent things will generate notifications, so that I don’t feel compelled to check too often
USB batteries for recharging phones—one to carry around, one at each charging spot for quick-swapping
I find having a skateboard is a compact way to shave minutes off the sections of my commute where I would otherwise have to walk. It turns a 15-minute walk to the bus stop into a 5-minute ride, which adds up in the long run.
If there is no self, what are we going to upload to the cloud?
The brain, I guess.
Pathological examples of math are analogous to adversarial examples in ML. Or are they?
What are the possible failure modes of AI-aligned humans? What are the possible misalignment scenarios? I can think of malevolent uses of AI tech to enforce hegemony, and so on. What else?
What’s a good way to force oneself outside one’s comfort zone, into situations where most expectations and intuitions routinely fail?
This might be useful for building antifragility around expectation management.
Quick example—living without money in a foreign nation.
Is it possible to design a personal or group retreat for this?
What kills you doesn’t make you stronger. You want to get out of your comfort zone, not out of your survival zone.
Okay, natural catastrophes might not be a good example.
Helping out with disaster/emergency relief efforts might get people out of their comfort zone.
Generally, if you want to go outside of your comfort zone, you might as well do something useful (either for yourself, or for others).
For example, if you try “rejection therapy” (approaching random people, getting rejected, and thus teaching your System 1 that being rejected doesn’t actually hurt you), you could approach people with something specific, like giving them fliers, or trying to sell something. You may make some money as a side effect, and in addition to expanding your comfort zone also get some potentially useful job experience. If you travel across difficult terrain, you could also transport some cargo and get paid for it. If you volunteer for an organization, you will get some advice and support (the goal is to do something unusual and uncomfortable, not to optimize for failure), and you will get interesting contacts (your LinkedIn profile will be like: “endorsed for skills: C++, object-oriented development, brain surgery, fire extinguishing, assassination, cooking for the homeless”).
You could start by obtaining a list of non-governmental organizations in your neighborhood, calling them, and asking whether they need a temporary volunteer. (Depending on your current comfort zone, this first step may already be outside of it.)
Where is the paradigm for Effective Activism? At first thought, it doesn’t even seem difficult to do better than the status quo.
How specifically would you do better than the status quo?
I could easily dismiss some charities for causes I don’t care about, or where I think they do more harm than good. But there are still many charities left whose cause I approve of, and that seem to me like they could help. How do I choose among these? They publish some reports, but are the numbers there the important ones, or just the ones that are easiest to calculate?
For example, I don’t care if your “administrative overhead” is 40%, if that allows you to spend the remaining 60% ten times more effectively than a comparable charity with smaller overhead. Unfortunately, the administrative overhead will most likely be included in the report, with two decimal places; but the achieved results will be either something nebulous (e.g. “we make the world a better place” or “we help kids become smarter”), or they will describe the costs, not the outcomes (e.g. “we spent 10 million to save the rainforest” or “we spent 5 million to teach kids the importance of critical thinking”).
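A tiny worked example of that arithmetic, with invented numbers, just to make the point explicit: the overhead ratio alone says very little about outcomes per donated dollar.

```python
# Toy comparison (numbers invented): high-overhead but effective vs. low-overhead
# but ineffective, measured in outcome units per 100 units of currency donated.
donation = 100.0

# Charity A: 40% overhead, each programme unit of money produces 10 units of outcome.
outcome_a = donation * (1 - 0.40) * 10       # 600 units

# Charity B: 5% overhead, each programme unit of money produces only 1 unit of outcome.
outcome_b = donation * (1 - 0.05) * 1        # 95 units

print(outcome_a, outcome_b)                  # the "inefficient" charity wins easily
```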
Now, I don’t have the time and skills to become a full-time charity researcher. So if I want to donate well, I need someone who does the research for me, and whose integrity and sanity I can trust.