Let me know if you have any questions!
Kabir Kumar
options to vary rules/environment/language as well, to see how the alignment generalizes ood. will try this today
it would basically be DnD like.
Making a thing like Papers Please, but as a text adventure, popping an ai agent into that.
Also, could literally just put the ai agent into a text rpg adventure—something like the equivalent of Skyrim, where there are a number of ways to achieve the endgame, level up, etc, both more and less morally. Maybe something like https://www.choiceofgames.com/werewolves-3-evolutions-end/
Will bring it up at the alignment eval hackathon
I see them in o1-preview all the time as well. Also, french occasionally
If developments like this continue, could open weights models be made into a case for not racing? E.g. if everyone’s getting access to the weights, what’s the point in spending billions to get there 2 weeks earlier?
this can be done more scalably in a text game, no?
People Cannot Handle Gambling on Smartphones
this seems a very strange way to say “Smartphone Gambling is Unhealthy”
It’s like saying “People’s Lungs Cannot Handle Cigarettes”
To be a bit less useless—I think this fundamentally misses the problem of respect and actually being able to communicate with yourself and fully do things, if you’ve done so—and that you can do these when you have full faith and respect in yourself (meaning all of yourself—may include love as well, not sure how necessary that is for this). Could maybe be done in other ways as well, but I find those less beautiful, personally.
I think this is really along the wrong path and misunderstanding a lot of things, but so far along the incorrect path of thought and misunderstanding so much, that it’s hard to untangle
I thought this was going to be an allegory for interpretability.
give better names to actual formal math things, jesus christ.
I think posts like this are net harmful, by discouraging people from joining those doing good things without providing an alternative and so wasting energy on meaningless ruminating that doesn’t culminate in any useful action.
oh, sorry, I thought slatestar codex wrote something about it and you were saying that’s where it comes from
I pretty much agree. I prefer rigid definitions because they’re less ambiguous to test and more robust to deception. And this field has a lot of deception.
Yup, those are hard. Was just thinking of a definition for the alignment problem, since I’ve not really seen any good ones.
what do you think of replit agent, stack blitz, etc?
damn, those prices are wild
used before, e.g. Feynman: https://calteches.library.caltech.edu/51/2/CargoCult.htm
I plan to send the winning proposals from this to as many governing bodies/places that are enacting laws as possible—one country is lined up atm.