By contrast, today’s AIs are really nice and ethical. They’re humble, open-minded, cooperative, kind. Yes, they care about some things that could give them instrumental reasons to seek power (e.g. being helpful, human welfare), but their values are great.
They also aren’t facing the same incentive landscape humans are. You talk later about evolution selecting for selfishness; not only is the story for humans far more complicated (why do humans often offer an even split in the ultimatum game?), but humans also talk a nicer game than they act (see construal level theory, or social-desirability bias). Once you start looking at AI agents who have affordances and incentives similar to those humans have, I think you’ll see a lot of the same behaviors.
(There are structural differences here between humans and AIs. As an analogy, consider the difference between large corporations and individual human actors. Giant corporate chain restaurants often have better customer service than individual proprietors because they have more reputation on the line, and so are willing to pay more to not have things blow up on them. One might imagine that AIs trained by large corporations will similarly face larger reputational costs for misbehavior and so behave better than individual humans would. I think the overall picture is unclear and nuanced and doesn’t clearly point to AI superiority.)
“though there’s a big question mark over how much we’ll unintentionally reward selfish superhuman AI behaviour during training”
Is it a big question mark? It currently seems quite unlikely to me that we will have oversight systems able to actually detect and punish selfish behavior on the part of superhuman AIs.
My understanding is that the Lightcone Offices and Lighthaven 1) served overlapping but distinct audiences, with the Lightcone Offices being more ‘EA’ in a way that seemed bad, and 2) had distinct use cases: Lighthaven is more of a conference venue with a bit of coworking, whereas the Lightcone Offices was basically just coworking.