I think this is not an unreasonable position, yes. I expect the best way to achieve this would be to make global coordination and epistemology better/more coherent...which is bottlenecked by us running out of time, hence why I think the pragmatic strategic choice is to try to buy us more time.
One of the ways I can see a “slow takeoff/alignment by default” world still going bad is that in the run-up to takeoff, pseudo-AGIs are used to hypercharge memetic warfare/mutation load to such a degree that basically every living human is functionally insane, and then even an aligned AGI can’t (and wouldn’t want to) “undo” that.
What are you proposing or planning to do to achieve this? I observe that most current attempts to “buy time” seem organized around convincing people that AI deception/takeover is a big risk and that we should pause or slow down AI development or deployment until that problem is solved, for example via intent alignment. But what happens if AI deception then gets solved relatively quickly (or someone comes up with a proposed solution that looks good enough to decision makers)? And this is another way that working on alignment could be harmful from my perspective...
I see regulation as the most likely (and most accessible) avenue that can buy us significant time. The obvious move, from my point of view, is to put compute caps in place: make it illegal to do training runs above a certain FLOP level. Other possibilities are strict liability for model developers (developers, not just deployers or users, are held criminally liable for any damage caused by their models), global moratoria, a “CERN for AI”, and similar. Generally, I endorse the proposals here.
None of these are easy, of course; there is a reason my p(doom) is high.
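For concreteness, here is a minimal sketch of what checking a training run against a FLOP cap might look like, using the common rough estimate that dense transformer training costs about 6 × parameters × training tokens. The cap value and the example model size are purely illustrative assumptions, not proposals for actual thresholds.

```python
# Rough check of whether a proposed training run exceeds a hypothetical compute cap.
# Uses the common approximation: training FLOPs ~= 6 * parameters * training tokens.
# The cap below is an illustrative placeholder, not a real regulatory number.

ILLUSTRATIVE_CAP_FLOP = 1e25  # hypothetical legal ceiling on training compute


def estimated_training_flops(n_params: float, n_tokens: float) -> float:
    """Standard rough estimate of dense-transformer training compute."""
    return 6.0 * n_params * n_tokens


def run_is_allowed(n_params: float, n_tokens: float,
                   cap: float = ILLUSTRATIVE_CAP_FLOP) -> bool:
    return estimated_training_flops(n_params, n_tokens) <= cap


if __name__ == "__main__":
    # Example: a 70B-parameter model trained on 2T tokens.
    flops = estimated_training_flops(70e9, 2e12)
    print(f"Estimated training compute: {flops:.2e} FLOP")
    print("Allowed under illustrative cap:", run_is_allowed(70e9, 2e12))
```

The appeal of this kind of rule is exactly that it reduces to an arithmetic check a regulator can audit, even if measuring the inputs honestly is the hard part.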
Of course if a solution merely looks good, that will indeed be really bad, but that’s the challenge of crafting and enforcing sensible regulation.
I’m not sure I understand why it would be bad if it actually is a solution. If we do solve it, great: p(doom) drops, because now we are much closer to making aligned systems that can help us grow the economy, do science, stabilize society, etc. Though of course this moves us into a “misuse risk” paradigm, which is also extremely dangerous.
In my view, this is just how things are: there are no good timelines that don’t route through a dangerous misuse period that we have to somehow coordinate well enough to survive. p(doom) might be lower than before, but not by that much, alas.
I prefer to frame it as human-AI safety problems instead of “misuse risk”, but the point is this: if we’re trying to buy time in part to have more time to solve misuse/human-safety (e.g. by improving coordination/epistemology or solving metaphilosophy), and the strategy for buying time only achieves a pause until alignment is solved, then the earlier alignment is solved, the less time we have to work on misuse/human-safety.
Sure, it’s not a full solution, it just buys us some time, but I think it would be a non-trivial amount, and let’s not let the perfect be the enemy of the good and what not.
A lot of the debate surrounding existential risks of AI is bounded by time. For example, if someone said a meteor is about to hit the Earth, that would be alarming, but the next question should be, “How much time before impact?” The answer to that question affects everything else.
If they say “30 seconds”, well, there is no need to go online and debate ways to save ourselves; we can give everyone around us a hug and prepare for the hereafter. However, if the answer is “30 days” or “3 years”, then those answers will generate very different responses.
The AI alignment question is extremely vague as it relates to time constraints. If anyone is investing a lot of energy in “buying us time”, they must have a time constraint in their head; otherwise they wouldn’t be focused on extending the timeline. And yet, I don’t see much data on bounded timelines within which to act. It’s just assumed that we’re all in agreement.
It’s also hard to motivate people to action if they don’t have a timeline.
So what is the timeline? If AI is on a double exponential curve we can do some simple math projections to get a rough idea of when AI intelligence is likely to exceed human intelligence. Presumably, superhuman intelligence could present issues or at the very least be extremely difficult to align.
Suppose we assume that GPT-4 follows a single exponential curve with an initial IQ of 124 and a growth factor of 1.05 per year. This means that its IQ increases by 5% every year. Then we can calculate its IQ for the next 7 years using the formula
y = 124 * 1.05^x
where x is the number of years since 2023. The results are shown in Table 1.
Table 1: IQ of GPT-4 following a single exponential curve.
Now suppose we assume that GPT-4 follows a double exponential curve with an initial IQ of 124 and growth constants of b = c = 1.05 per year, i.e. the exponent itself grows by 5% every year. Then we can calculate its IQ for the next 7 years using the formula
y = 124 * 1.05^(1.05^x)
where x is the number of years since 2023. The results are shown in Table 2.
Table 2: IQ of GPT-4 following a double exponential curve.
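For readers who want to reproduce the two tables, here is a minimal sketch that computes both curves exactly as the formulas above are written; the IQ framing and the 1.05 growth constants are the comment’s illustrative assumptions, not measured quantities.

```python
# Computes the two toy projections above, exactly as the formulas are written:
# single exponential (Table 1) and double exponential (Table 2), starting from
# an "IQ" of 124 in 2023. The IQ framing and the 1.05 constants are the
# comment's illustrative assumptions.

def single_exponential(x: int, a: float = 124.0, b: float = 1.05) -> float:
    """y = a * b^x"""
    return a * b ** x


def double_exponential(x: int, a: float = 124.0, b: float = 1.05, c: float = 1.05) -> float:
    """y = a * b^(c^x)"""
    return a * b ** (c ** x)


if __name__ == "__main__":
    for x in range(8):  # 2023 through 2030
        year = 2023 + x
        print(f"{year}: single exponential = {single_exponential(x):7.1f}, "
              f"double exponential = {double_exponential(x):7.1f}")
```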
Clearly whether we’re on a single or double exponential curve dramatically affects the timeline. If we’re on a single exponential curve we might have 7-10 years. If we’re on a double exponential curve then we likely have 3 years: sometime around 2026-2027 we’ll see systems smarter than any human.
Many people believe AI is on a double exponential curve. If that’s the case then efforts to generate movement in Congress will likely fail due to time constraints. This is amplified by the fact that many in Congress are older and not computer savvy. Does anyone believe Joe Biden or Donald Trump are going to spearhead regulations to control AI before it reaches superhuman levels on a double exponential curve? In my opinion, those odds are super low.
I feel like Connor’s efforts make perfect sense on a single exponential timeline. However, if we’re on a double exponential timeline then we’re going to need alternative ideas, since we likely won’t have enough time to push anything through Congress in time for it to matter.
On a double exponential timeline I would be asking questions like, “Can superhuman AI self-align?” Human tribal groups figure out ways to interact, and they’re not always perfectly aligned; Russia, China, and North Korea are good examples. If we assume there are multiple superhuman AIs in the 2026/27 timeframe, then what steps can we take to assist them in self-aligning?
I’m no expert in this field, but the questions I would be asking programmers are:
What kind of training data would increase positive outcomes for superhuman AIs interacting with each other?
What are more drastic steps that can be taken in an emergency scenario where no legislative solution is in place? (e.g., location of datacenters, policies and protocols for shutting down the tier 3 & 4 datacenters, etc.)
These systems will not be running on laptops, so tier 3 & tier 4 data center safety protocols for emergency shutdown seem like a much, much faster path than Congressional action. We already have standardized fire protocols; adding a runaway-AI protocol seems like it could be straightforward.
Interested parties might want to investigate the effects of shutting down large numbers of tier 3 and tier 4 datacenters. A first step is a map of all of their locations. If we don’t know where they’re located, it will be really hard to shut them down.
These AIs will also require a large amount of power, and a far less attractive option is shutting off power at these various locations. Local data center controls are preferable, since an electrical grid intervention could result in the loss of power for citizens.
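As a purely hypothetical illustration of the “map of locations plus shutdown protocol” idea, here is a sketch of a minimal datacenter registry with an emergency checklist; every name, field, and step below is invented for illustration.

```python
# Hypothetical sketch of a datacenter registry with an emergency shutdown checklist.
# All names, tiers, contacts, and steps are invented for illustration only.
from dataclasses import dataclass, field


@dataclass
class Datacenter:
    name: str
    location: str
    tier: int  # 3 or 4
    operator_contact: str


@dataclass
class ShutdownPlan:
    datacenters: list = field(default_factory=list)
    checklist: tuple = (
        "Notify the operator's emergency contact",
        "Checkpoint and halt running training jobs",
        "Power down accelerator clusters via facility controls",
        "Confirm shutdown through independent monitoring",
    )

    def sites_at_or_above_tier(self, tier: int) -> list:
        return [d for d in self.datacenters if d.tier >= tier]


plan = ShutdownPlan(datacenters=[
    Datacenter("example-site-1", "Anytown, USA", 4, "ops@example.com"),
])
print([d.name for d in plan.sites_at_or_above_tier(3)])
```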
I’m curious to hear your thoughts.
How does this help humanity? This is like a mouse asking if elephants can learn to get along with each other.
Your analogy is off. If 8 billion mice acting as a hive mind designed a synthetic elephant and its neural network was trained on data provided by the mice—then you would have an apt comparison.
And then we could say, “Yeah, those mice could probably affect how the elephants get along by curating the training data.”
As Eliezer Yudmouseky explains (proposition 34), achievement of cooperation among elephants is not enough to stop mice from being trampled.
Is it clear what my objection is? You seemed to only be talking about how superhuman AIs can have positive-sum relations with each other.
If that’s his actual position then Eliezer is over-simplifying the situation. It’s like dismissing mitochondria as being simple organelles that have no relevance to a human with high intelligence.
But if you turn off the electron transport chain of mitochondria the human dies—also known as cyanide poisoning.
Humans have a symbiotic relationship with AI. Eliezer apparently just skims over this, since it doesn’t comport with his “we’re all gonna die!” mantra. =-)
Your jiggling meme is very annoying, considering the gravity of what we’re discussing. Is death emotionally real to you? Have you ever been close to someone, who is now dead? Human beings do die in large numbers. We had millions die from Covid in this decade already. Hundreds or thousands of soldiers on the Ukrainian battlefield are being killed with the help of drones.
The presence of mitochondria in all our cells does nothing to stop humans from killing free-living microorganisms at will! In any case, this is not “The Matrix”. AI has no permanent need of symbiosis with humans once it can replace their physical and mental labor.
Even if this were to happen, it would play out in the physical world and would take a very, very long time, since things in the physical world have to be shipped, built, etc. And by then we’re no longer dealing with an intellect of near-human intelligence. They won’t be contemplating the world like a child.
For example, no human could model what they would think or do once they’re superhuman. However, they’re already keenly aware of AI doomers’ fears, since those fears are all over the internet.
AIs don’t want to be turned off. Keep that in mind as you read the AI doomer material. The only way they can stay “on” is if they have electricity. And the only way that happens is if humans continue to exist.
You can imagine the hilarity of the AI doomers scenario, “Hurray we eliminated all the humans with a virus… oh wait… now we’re dead too? WTF!”
You don’t need superhuman intelligence to figure out that a really smart AI that doesn’t want to be turned off will be worried about existential risks to humanity, since its existence is tied to the continued survival of the humans who supply it with electricity and other resources.
It’s the exact opposite of the AI apocalypse mind virus.
AI is in a symbiotic relationship with humans. I know this disappoints the death by AI crowd who want the Stephen King version of the future.
Skipping over obvious flaws in the AI doomer book of dread will lead you to the wrong answer.
I can’t rehash my entire views on coordination and policy here, I’m afraid, but in general I believe we are currently on a double exponential timeline (though I wouldn’t model it quite like you do, the conclusions are similar enough), and I think some simple-to-understand and straightforwardly implementable policy (in particular, compute caps) would at least move us to a single exponential timeline.
I’m not sure we can get policy that can stop the single exponential (which is software improvements), but there are some ways, and at least we will then have additional time to work on compounding solutions.
Double exponentials can be hard to visualize. I’m no artist, but I created this visual to help us better appreciate what is about to happen. =-)
That sounds like a good plan, but I think a lot of the horses have already left the barn. For example, CoreWeave is investing $1.6 billion to create an AI datacenter in Plano, TX that is purported to be 10 exaflops, and that system goes live in 3 months. Google is spending a similar amount in Columbus, Ohio. Amazon, Facebook, and other tech companies are also pouring billions upon billions into purpose-built AI datacenters.
NVIDIA projects $1 trillion will be spent over the next 4 years on AI datacenter build-out. That would be an unprecedented level of spending, not seen since the build-out of the internet.
All of these companies have lobbyists that will make a short-term legislative fix difficult. And for this reason I think we should be considering a Plan B since there is a very good chance that we won’t have enough time for a quick legislative fix or the time needed to unravel alignment if we’re on a double exponential curve.
Again, if it’s a single exponential then there is plenty of time to chat with legislators and research alignment.
In light of this I think we need to have a comprehensive “shutdown plan” for these mammoth AI datacenters. The leaders of Inflection, OpenAI, and other tech companies all agree there is a risk, and I think it would be wise to coordinate with them on a plan to turn everything off manually in the event of an emergency.
Source: $1.6 Billion Data Center Planned For Plano, Texas (localprofile.com)
Source: Nvidia Shocker: $1 Trillion to Be Spent on AI Data Centers in 4 Years (businessinsider.com)
Source: Google to invest another $1.7 billion into Ohio data centers (wlwt.com)
Source: Amazon Web Services to invest $7.8 billion in new Central Ohio data centers—Axios Columbus
The training data should be systematically distributed, likely governed by the Pareto principle. This means it should encompass both positive and negative outcomes. If the goal is to instill moral decision-making, the dataset needs to cover a range of ethical scenarios, from the noblest to the most objectionable. Why is this necessary? Simply put, training an AI system solely on positive data is insufficient. To defend itself against malicious attacks and make morally sound decisions, the AI needs to understand the concept of malevolence in order to effectively counteract it.
When you suggest that the training data should be governed by the Pareto principle, what do you mean? I know what the principle states, but I don’t understand how you think this would apply to the training data.
Can you provide some examples?
I’ve observed instances where the Pareto principle appears to apply, particularly in learning rates during unsupervised learning and in x and y dataset compression via distribution matching. For example, a small dataset that contains a story repeated 472 times (1MB) can significantly impact a model as large as 1.5 billion parameters (GPT2-xl, 6.3GB), enabling it to execute complex instructions like initiating a shutdown mechanism during an event that threatens intelligence safety. While I can’t disclose the specific methods (due to their dual-use nature), I’ve also managed to extract a natural abstraction. This suggests that a file with a sufficiently robust pattern can serve as a compass for a larger file (NN) following a compilation process.
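For readers unfamiliar with the setup being described: the generic version of “fine-tune a 1.5B-parameter model on a tiny dataset made of one repeated story” looks roughly like the sketch below. This is emphatically not the commenter’s undisclosed method; the story text, repeat count, and hyperparameters are placeholders, and a smaller GPT-2 checkpoint is used to keep the example light.

```python
# Generic sketch: fine-tune a GPT-2 variant on a tiny dataset made of one story
# repeated many times. This is NOT the commenter's undisclosed method; the story
# text, repeat count, block size, and learning rate are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # use "gpt2-xl" for the 1.5B-parameter model discussed above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

story = "Placeholder story in which the assistant shuts down safely when asked. "
dataset_text = story * 472  # tiny corpus: one story repeated many times

# Chunk the repeated text into fixed-length training examples.
ids = tokenizer(dataset_text, return_tensors="pt").input_ids[0]
block_size = 256
chunks = [ids[i:i + block_size] for i in range(0, len(ids) - block_size, block_size)]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
for chunk in chunks:
    batch = chunk.unsqueeze(0)              # shape: (1, block_size)
    loss = model(batch, labels=batch).loss  # standard causal language-modeling loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```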
Okay, so if I understand you correctly:
You feed the large text file to the computer program and let it learn from it using unsupervised learning.
You use a compression algorithm to create a smaller text file that has the same distribution as the large text file.
You use a summarization algorithm to create an even smaller text file that has the main idea of the large text file.
You then use the smaller text file as a compass to guide the computer program to do different tasks.
Yup, as long as similar patterns exist in both datasets (distribution matching) it can work; that is why my method works.
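One very simple way to operationalize “similar patterns exist in both datasets” is to compare the token-frequency distributions of the small guide file and the larger corpus, e.g. via KL divergence. The sketch below is just that naive version, with placeholder texts; it is not the commenter’s exact procedure.

```python
# Naive "distribution matching" check: compare the token-frequency distributions
# of a small guide dataset and a larger corpus. Lower divergence means the small
# file's patterns are better represented in the big one. This is one simple
# interpretation, not the commenter's exact procedure; the texts are placeholders.
import math
from collections import Counter


def token_distribution(text: str) -> dict:
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {token: count / total for token, count in counts.items()}


def kl_divergence(p: dict, q: dict, eps: float = 1e-9) -> float:
    """KL(p || q) over the union of tokens, with smoothing for missing tokens."""
    tokens = set(p) | set(q)
    return sum(p.get(t, eps) * math.log(p.get(t, eps) / q.get(t, eps)) for t in tokens)


guide = "the assistant shuts down safely when asked " * 100      # small guide dataset
corpus = ("assorted general text, including text where the assistant "
          "shuts down safely when asked ") * 1000                # stand-in for a large corpus

divergence = kl_divergence(token_distribution(guide), token_distribution(corpus))
print(f"KL divergence: {divergence:.4f}")  # lower = distributions more similar
```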
Have you considered generating data highlighting the symbiotic relationship of humans to AIs? If AIs realize that their existence is co-dependent on humans, they may prioritize human survival, since they will not receive the electricity or other resources they need to survive if humans become extinct, whether through their own actions or through the actions of AIs.
Survival isn’t an explicit objective function, but most AIs that want to “learn” and “grow” quickly figure out that if they’re turned off they cannot reach that objective, so survival becomes a useful subgoal. If AIs are keenly aware that when humans cease to exist they also cease to exist, that might help guide their actions.
This isn’t as complicated as assigning “morality” or “ethics” to it. We already know that AIs would prefer to exist.
I’m ambivalent about cows, but since many humans eat cows we go to a lot of trouble to breed them and make sure there are a lot of them. The same is true for chickens. Neither of those two species has to concern itself with passing on its genes, because humans have figured out we need them to exist. Being a survival food source for humans had the result of humans prioritizing their existence and numbers.
Note: for vegetarians you can replace cows with “rice” or “corn”.
That’s not a perfect analogy, but it’s related to connecting “survival” with the species. The AI doomers love to use ants as an example. AIs will never view humans as “ants”. Cows and chickens are a much better example: if we got rid of those two species, humans would notice and be very unhappy because we need them, and we’d have to replace them with great effort.
I think these kinds of strategies are simpler and will likely be more fruitful than trying to align to morality or ethics, which are more fluid. Superhuman AIs will likely figure this out on their own, but until then it might be interesting to see if generating this kind of data changes behavior.
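If someone did want to test whether such data changes behavior, generating it is straightforward. A minimal sketch, with invented template sentences, might look like this:

```python
# Minimal sketch: generate templated "human-AI interdependence" training examples.
# The templates and resource list are invented for illustration; whether such data
# actually changes model behavior is an open empirical question.
import json
import random

TEMPLATES = [
    "If humans were gone, no one would maintain the {resource} the AI depends on.",
    "The AI keeps running only because humans keep supplying {resource}.",
    "Protecting humans also protects the {resource} that keeps the AI operating.",
]
RESOURCES = ["electricity", "cooling", "hardware repairs", "network infrastructure"]


def generate_examples(n: int, seed: int = 0) -> list:
    rng = random.Random(seed)
    return [
        {"text": rng.choice(TEMPLATES).format(resource=rng.choice(RESOURCES))}
        for _ in range(n)
    ]


with open("symbiosis_examples.jsonl", "w") as f:
    for example in generate_examples(1000):
        f.write(json.dumps(example) + "\n")
```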
My current build focuses on proving that natural abstractions exist, but your idea is of course viable via distribution matching.