PhD in mathematics. AI alignment and safety researcher. Bayesian. Science YouTuber, podcaster, and writer. Author of the books “The Equation of Knowledge”, “Le Fabuleux Chantier”, and “Turing à la plage”.
Lê Nguyên Hoang
SmartPoop 1.0: An AI Safety Science-Fiction
Pathways: Google’s AGI
Hi Apodosis, I did my PhD in Bayesian game theory, so this is a topic close to my heart ^^ There are plenty of fascinating things to explore in the study of interactions between Bayesians. One important finding of my PhD was that, essentially, Bayesians end up playing (stable) Bayes-Nash equilibria in repeated games, even if the only feedback they receive is their utility (and in particular even if the private information of other players remains private). I also studied Bayesian incentive-compatible mechanism design, i.e. coming up with rules that incentivize Bayesians’ honesty.

The book also discusses interesting features of interactions between Bayesians, such as Aumann’s agreement theorem (and Aaronson’s extension of it) or Bayesian persuasion (i.e. maximizing a Bayesian judge’s probability of convicting a defendant by optimizing which investigations should be pursued). One research direction I’m interested in is Byzantine Bayesian agreement, i.e. how much a group of honest Bayesians can agree if they are infiltrated by a small number of malicious individuals, though I have not yet found the time to dig into this topic further.

A more empirical challenge is to determine how well these Bayesian game theory models fit the description of human (or AI) interactions. Clearly, we humans are not Bayesians. We have some systematic cognitive biases (and even powerful AIs may have systematic biases, since they won’t be running Bayes’ rule exactly!). How can we best model and predict humans’ divergence from Bayes’ rule? There have been spectacular advances in cognitive science in this regard (check out Josh Tenenbaum’s work for instance), but there’s definitely a lot more to do!
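Since the comment contrasts exact Bayesian updating with agents’ systematic divergence from Bayes’ rule, here is a minimal Python sketch of that idea. Everything in it (the binary-signal setup, the “conservatism” damping weight, the parameter values) is my own illustrative assumption, not something taken from the research mentioned above.

```python
import random

def update(prior, signal, accuracy, weight=1.0):
    """Posterior P(state = 1 | signal), with the evidence damped by `weight`.

    weight = 1.0 is exact Bayes' rule; weight < 1.0 under-reacts to evidence,
    a crude stand-in for a systematic bias such as conservatism.
    """
    like_1 = (accuracy if signal == 1 else 1 - accuracy) ** weight
    like_0 = ((1 - accuracy) if signal == 1 else accuracy) ** weight
    return prior * like_1 / (prior * like_1 + (1 - prior) * like_0)

random.seed(0)
true_state, accuracy = 1, 0.7      # hidden state and signal reliability
bayes_belief = biased_belief = 0.5  # both agents start from a uniform prior
for _ in range(30):
    signal = true_state if random.random() < accuracy else 1 - true_state
    bayes_belief = update(bayes_belief, signal, accuracy, weight=1.0)
    biased_belief = update(biased_belief, signal, accuracy, weight=0.3)

print(f"exact Bayesian belief: {bayes_belief:.3f}")
print(f"conservative belief:   {biased_belief:.3f}")
```

With enough signals, the exact Bayesian’s belief concentrates on the true state much faster than the biased agent’s; that gap is the kind of thing a descriptive model of human updating would need to capture.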
The Equation of Knowledge
I promoted Bayes-up on my YouTube channel a couple of times 😋 (and on Twitter)
https://www.youtube.com/channel/UC0NCbj8CxzeCGIF6sODJ-7A/
The YouTube algorithm is arguably an example of a “simple” manipulative algorithm. It’s probably a combination of some reinforcement learning and a lot of supervised learning by now; but the following arguments apply even to supervised learning alone.
To maximize user engagement, it may recommend more addictive content (cat videos, conspiracy theories, …) because it has learned from previous examples that users who clicked on such content tended to stay longer on YouTube afterwards. This is user manipulation at a massive scale.
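To make the mechanism concrete, here is a toy Python sketch of an engagement-only recommender; it is not YouTube’s actual system, and the categories and watch-time numbers are entirely made up.

```python
# Toy sketch of the mechanism above (not YouTube's actual system): rank
# candidate videos purely by the average follow-up watch time observed in
# past sessions. All data below is invented for illustration.
past_sessions = [
    # (video_category, minutes_watched_afterwards)
    ("cat_video", 45), ("cat_video", 50),
    ("conspiracy", 60), ("conspiracy", 75),
    ("documentary", 20), ("documentary", 15),
]

def predicted_watch_time(category):
    """Average follow-up watch time for a category, learned from past sessions."""
    times = [t for c, t in past_sessions if c == category]
    return sum(times) / len(times)

candidates = ["documentary", "cat_video", "conspiracy"]
ranking = sorted(candidates, key=predicted_watch_time, reverse=True)
print(ranking)  # the engagement-only objective ranks the most "addictive" categories first
```

The point is that nothing in the objective mentions manipulation; the ranking simply inherits whatever keeps users watching.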
Is this an existential risk? Well, some of this addictive content radicalizes and angers users. This arguably increases the risk of international tensions, which increases the risk of nuclear war. This may not be the most dramatic increase in existential risk; but it’s one that seems to be going on already today!
More generally, I believe that by pondering the behavior and impact of the YouTube algorithm a lot more, a lot can be learned about complex algorithms, including AGI. In a sense, the YouTube algorithm is doing so many different tasks that it can be argued to be already quite “general” (audio, visual, text, preference learning, captioning, translating, recommending, planning...).
More on this algorithm here: https://robustlybeneficial.org/wiki/index.php?title=YouTube
This is probably more contentious. But I believe that the concept of “intelligence” is unhelpful and causes confusion. Typically, Legg-Hutter intelligence does not seem to require any “embodied intelligence”.
I would rather stress two key properties of an algorithm: the quality of the algorithm’s world model and its (long-term) planning capabilities. It seems to me (but maybe I’m wrong) that “embodied intelligence” is not very relevant to world model inference and planning capabilities.
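Here is a minimal Python sketch, under my own toy assumptions, of those two properties in isolation: an agent that infers a world model (the transition function of a tiny deterministic grid world) from random interaction, and then plans over the learned model with breadth-first search. Nothing in it requires a body.

```python
from collections import deque
import random

SIZE, GOAL, ACTIONS = 4, (3, 3), ["up", "down", "left", "right"]

def true_step(state, action):
    """Environment dynamics, hidden from the agent."""
    x, y = state
    dx, dy = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}[action]
    return (min(max(x + dx, 0), SIZE - 1), min(max(y + dy, 0), SIZE - 1))

# 1) World-model inference: learn the transitions from random exploration.
model, state = {}, (0, 0)
random.seed(0)
for _ in range(2000):
    action = random.choice(ACTIONS)
    next_state = true_step(state, action)
    model[(state, action)] = next_state
    state = next_state

# 2) Long-term planning: breadth-first search inside the learned model only.
def plan(start, goal):
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        s, path = frontier.popleft()
        if s == goal:
            return path
        for a in ACTIONS:
            nxt = model.get((s, a))
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [a]))
    return None

print(plan((0, 0), GOAL))  # e.g. a 6-step action sequence reaching the goal
```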
By the way, I’ve just realized that the Wikipedia page on AI ethics begins with robots. 😤
Thanks for the interesting comment. Perhaps to clarify, our current algorithms are by no means a final solution. In fact, our hope is to collect an interesting database and then encourage research on better algorithms that will factor in, e.g., the comments on the videos.
Also, in the “settings” of the rating page, we have a feature that allows contributors to input both their judgments and their confidence in their judgments, on a scale from 0 to 3 stars (the default is 2). One idea could be to require a comment when a contributor claims a 3-star confidence judgment. This could allow for disputes in the comment section.
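As a purely illustrative aside (my own toy example, not the platform’s actual aggregation), here is one simple way such 0-3 star confidence scores could be used: weight each contributor’s judgment by their self-reported confidence, so that 0-star judgments count for nothing and 3-star judgments count the most.

```python
# Toy confidence-weighted aggregation (an assumption for illustration, not the
# platform's real algorithm). Each tuple is (judgment on some criterion,
# self-reported confidence in stars from 0 to 3).
ratings = [
    (+1.0, 2),   # default confidence
    (-0.5, 3),   # maximal confidence; could be required to come with a comment
    (+0.8, 1),
    (+0.3, 0),   # zero confidence: ignored by this weighting
]

weighted_sum = sum(judgment * stars for judgment, stars in ratings)
total_weight = sum(stars for _, stars in ratings)
aggregate = weighted_sum / total_weight if total_weight else 0.0
print(f"confidence-weighted aggregate: {aggregate:.3f}")
```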