I think your post here gives a good picture to keep in mind. However, I also find it likely that there will be some qualitative differences, rather than just an absolute quantitative advantage in magic for the AGI. I’ve been working on a post on this recently, but I thought I might as well also just briefly bring it up in the comments here until the post is ready.
I think that, for equal absolute levels of compute and algorithms, there will be a relative difference: people will be comparatively good at long-timescale things, while AI will be comparatively good at large-spatial-scale things and at topics that, in humans, require a lot of specialized cognitive work. Relatedly, people will tend to be good at understanding people, while AI will tend to be good at most other things.
My basic logic for this is: both human and current AI intelligence seem to have been built primarily through a “search” loop, where different options were tested and the ones that worked were kept. However, the human search loop is evolution, whereas AI search loops are usually gradient descent. Each has a number of advantages and disadvantages, but the big advantage for evolution is that it works on a very long timescale: people don’t just evolve to deal with immediate short-term factors, because evolution’s feedback is based solely on reproductive fitness, and reproductive fitness is a holistic measure that takes your entire life into account.
On the other hand, gradient descent is much more expensive. You can’t pump a lifetime of data through a single gradient descent cycle (and you don’t have a contiguous lifetime of data anyway; you have a bunch of bits and pieces in a variety of formats). This means that gradient descent will tend to be run on data from shorter timescales, which biases it towards noticing effects that happen on those shorter timescales. Sometimes effects on shorter timescales can be extrapolated to predict longer timescales, but this doesn’t always work, e.g. if there are hidden variables or if it’s computationally infeasible due to scale.
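To make the contrast concrete, here’s a toy sketch of the two search loops in Python (the environment, fitness function, and window size are all made-up assumptions for illustration, not a model of either process): the evolutionary loop scores each candidate with one holistic number over a whole simulated lifetime, while the gradient descent loop only ever gets feedback from short windows of data.

```python
import random

# Toy sketch of the two "search" loops discussed above. Everything here
# (the fitness function, mutation noise, window size) is an illustrative assumption.

def lifetime_fitness(genome):
    # Evolution's feedback signal: a single holistic score over a whole simulated
    # "lifetime", so consequences on long timescales are automatically included.
    return -sum((genome - t / 1000) ** 2 for t in range(1000)) / 1000

def evolutionary_search(pop_size=50, generations=200):
    population = [random.uniform(-1, 1) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lifetime_fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        # Keep what worked, mutate, repeat.
        population = [p + random.gauss(0, 0.05) for p in parents for _ in range(2)]
    return max(population, key=lifetime_fitness)

def gradient_descent(data, lr=0.1, window=32):
    # Gradient descent's feedback signal: a loss computed on short slices of data,
    # so each update only "sees" effects that show up within one window.
    theta = 0.0
    for start in range(0, len(data) - window + 1, window):
        chunk = data[start : start + window]
        grad = sum(2 * (theta - x) for x in chunk) / len(chunk)
        theta -= lr * grad
    return theta

print(evolutionary_search())
print(gradient_descent([random.gauss(0.5, 1.0) for _ in range(10_000)]))
```

The point is only structural: any consequence that doesn’t show up within a single window is invisible to each gradient update, whereas the lifetime score prices it in automatically.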
(Humans of course also learn most of their information from shorter-timescale real-life experience, but as an aid in interpreting this information, we’ve got priors from evolution. These might help a lot.)
The main place where I expect this to be relevant is in modelling people. People are full of hidden variables, and they cooperate at large scales and over long timespans. So I’d expect the AI to produce far fewer puppy-cupcakes than its intelligence level would suggest, relative to its number of Antarctic ice cap melts.
(On the other hand, gradient descent has a number of relative advantages over evolution and especially over human real-life experience, e.g. a human can only learn one thing at a time whereas gradient descent can integrate absurdly many streams of data simultaneously into one big model. This allows AIs to be much broader in their abilities.)
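As a minimal sketch of the “many streams at once” point (the streams and the one-parameter model below are made-up assumptions): a single gradient step can average feedback from arbitrarily many data sources into the same parameters, something a human learner would have to do one source at a time.

```python
import random

# Illustrative only: three hypothetical "data streams" feeding one shared parameter.
streams = {
    "text": [(x, 2 * x) for x in (random.uniform(-1, 1) for _ in range(100))],
    "code": [(x, 2 * x + 0.1) for x in (random.uniform(-1, 1) for _ in range(100))],
    "math": [(x, 2 * x - 0.1) for x in (random.uniform(-1, 1) for _ in range(100))],
}

w = 0.0  # one shared parameter, updated by all streams simultaneously
for step in range(500):
    # One example from every stream goes into the same gradient update.
    batch = [random.choice(examples) for examples in streams.values()]
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    w -= 0.05 * grad

print(w)  # ends up near 2, having integrated every stream in parallel
```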
(It also seems pretty plausible that all of this doesn’t apply because the AI can just strategy steal its long-term models from humans.)
You get most of your learning from experiences? I sure don’t. I get most of mine from reading, and I expect an AGI even close to human-level will also be able to learn from the logical abstractions in the books it reads. I think what you’re saying would be true if we agreed not to train AI models on text, but only on things like toy physical models. But currently, we’re feeding in tons of data meant to educate humans about the world: textbooks on every subject, scientific papers, all of Wikipedia, personal stories from Reddit, and… everything we can come up with. If the algorithm has been improved enough to accurately model the world portrayed in this text data, it will know lots about manipulating humans and predicting long timescales.
Examples of things that are right around me right now that I’ve not learned through reading: door, flask, lamps, tables, chairs, honey, fridge, ….
I’ve definitely learned a lot from reading, though typically, even when reading about something, I’ve learned even more by applying what I’ve read in practice, since words don’t capture all the details.