I like that you have reservations about whether we’re even powerful enough to destroy ourselves yet. Often I think “of course we are! Nukes, bioweapons, melting ice!”, but really, there’s no hard proof that we even can end ourselves.
It seems like the question of human regulation would be the first question, if we’re talking about AI safety, as the AI isn’t making itself (the egg comes first). Unless we’re talking about some type of fundamental rules that exist a priori. :)
This is what I’ve been asking and so far not finding any satisfactory answers for. Sci-Fi has forever warned us of the dangers of— well, pretty much any future-tech we can imagine— but especially thinking machines in the last century or so.
How do we ensure that humans design safe AI? And is it really a valid fear to think we’re not already building most of the safety in, by the very nature of “if the model doesn’t produce the results we want, we change it until it does”? Some of the debate seems to go back to a thing I said about selfishness. How much does the reasoning matter, if the outcome is the same? How much is semantics? If I use “selfish” to mean, for all intents and purposes, “unselfish” (the rising tide lifts all boats), how would searching my mental map for “selfish” or whatnot actually work? Ultimately it’s the actions, right?
I think this comes back to humans, and philosophy, and the stuff we haven’t quite sorted yet. Are thoughts actions? I mean, we have different words for them, so I guess not, but they can both be rendered as verbs, and they’re for sure linked. How useful would it actually be to be able to peer inside the mind of another? Does the timing matter? Depth? We know so little. Research is hard to reproduce. People seem to be both very individualistic and groupable together, like in a survey.
FWIW it strikes me that there is a lot of anthropomorphic thinking going on, even among people who are on the lookout for it. Somewhere I mentioned how the word “reward” is probably not the best one to use, as it implies a dopamine hit, which implies wireheading, and I’m not so sure that’s even possible for a computer (as far as we know it currently isn’t), and yet we’re using “reward systems” and other language that implies these models already have feelings.
I don’t know how we make it clear that “reward” is just for our thinking, to help visualize or whatever, and not literally what is happening. We are not training animals, we’re programming computers, and it’s mostly just math. Does math feel? Can an algorithm be rewarded? Maybe we should modify our language, be it literally by using different words, or meta by changing meaning (I prefer different words but to each their own).
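To make that concrete, here’s a toy sketch (my own invented example, not any real system’s code, and the names and the two-armed-bandit setup are mine purely for illustration) of what “reward” amounts to in practice: a plain number fed into an arithmetic update.

```python
import random

value_estimates = [0.0, 0.0]   # one stored number per possible action
step_size = 0.1

def reward_for(action):
    # the "environment": action 1 pays off more often than action 0
    return 1.0 if random.random() < (0.8 if action == 1 else 0.2) else 0.0

for step in range(1000):
    # mostly pick the action with the higher estimate, occasionally explore
    if random.random() < 0.1:
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: value_estimates[a])
    r = reward_for(action)   # the "reward" is just this float
    # nudge the stored estimate a little toward the observed reward
    value_estimates[action] += step_size * (r - value_estimates[action])

print(value_estimates)   # roughly [0.2, 0.8]: arithmetic, not desire
```

Whether arithmetic like that could ever add up to feeling is exactly the open question, but nothing in the loop requires it.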
I mean, I don’t really know if math has feelings. It might. What even are thoughts? Just some chemical reactions? Electricity and sugar or whatnot? Is the universe super-deterministic and did this thought, this sentence, basically exist from the first and will exist to the last? Wooeee! I love to think! Perhaps too much. Or not enough? Heh.