What evidence is there that we are near (even within 50 years!) to achieving conscious programs, with their own will, and the power to effect it? People are seriously contemplating programs sophisticated enough to intentionally lie to us. Lying is a sentient concept if ever there was one!
ChatGPT lies right now. It does this because it has learned that humans prefer a confident answer with logically coherent but fabricated details over “I don’t know”.
Sure, it isn’t aware it’s lying; it’s just predicting which string of text to produce, and the one with bullshit in it scores higher, by its own measure, than the correct answer or “I don’t know”.
This is a mostly fixable problem, but the architecture doesn’t allow for a system we know will never (or almost never) lie; we can only reduce the errors.
As for the rest: there have been enormous advances in the capabilities of DL/transformer-based models in just the last few months. This is nothing like the controllers for previous robotic arms, and none of your prior experience or the history of robotics is relevant.
See: https://innermonologue.github.io/ and https://www.deepmind.com/blog/building-interactive-agents-in-video-game-worlds
These use techniques that both work pretty well and that, as I understand it, no production robotics system currently uses.
Saying ChatGPT is “lying” is an anthropomorphism, unless you think it’s conscious?
The issue is instantly muddied when using terms like “lying” or “bullshitting”[1], which imply levels of intelligence simply not in existence yet. Not even with models that were produced literally today. Unless my prior experiences and the history of robotics have somehow been disconnected from the timeline I’m inhabiting. Not impossible. Who can say. Maybe someone who knows me, but even then… it’s questionable. :)
I get the idea that “Real Soon Now, we will have those levels!”, but we don’t, and using that language for what we do have (which is not that) makes the communication harder, or less specific/accurate if you will, which is, funnily enough, sorta what you are talking about! NLP control of robots is neat, and I get why we want the understanding to be really clear, but neither of the links you shared of the latest and greatest implies we need to worry about “lying” yet. Accuracy? Yes, 100%.
If by “truth” (as opposed to lies) you mean something more like “accuracy” or “confidence”, you can instruct ChatGPT to give its confidence level alongside each reply. Some have found that to be helpful.
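Here’s a minimal sketch of what I mean, using the OpenAI Python client; the model name and the exact wording of the instruction are just placeholders, and the self-reported number is not a calibrated probability, only the model’s own estimate written into the reply.

    # Sketch: ask the model to append a self-reported confidence level.
    # Assumes the `openai` package and an OPENAI_API_KEY in the environment;
    # the model name is illustrative.
    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": ("Answer the question, then on a new line state how "
                         "confident you are in the answer as a percentage. "
                         "If you are unsure, say so explicitly.")},
            {"role": "user", "content": "What year was the transistor invented?"},
        ],
    )
    print(response.choices[0].message.content)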
If you think “truth” is some binary thing, I’m not so sure that’s the case once you get into even the mildest of complexities[2]. “It depends” is really the only bulletproof answer.
For what it’s worth, when there are (let’s call them) binary truths, there is some recent-ish work[3] on automatically verifying a response by checking that the negation of the answer comes out false, as it were.
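The work in [3] actually probes the model’s internal activations rather than its text output, but the flavor of the consistency check can be sketched at the prompt level: a claim and its negation shouldn’t both come back “yes”. A toy sketch, with a hypothetical helper and an illustrative model name (not the paper’s method):

    # Toy consistency check in the spirit of [3]: a claim and its negation
    # should not both be affirmed. ask_yes_no() and the model name are
    # placeholders; the paper itself works on hidden activations instead.
    from openai import OpenAI

    client = OpenAI()

    def ask_yes_no(claim: str) -> bool:
        reply = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user",
                       "content": f'Answer only "yes" or "no": is this true? {claim}'}],
        )
        return reply.choices[0].message.content.strip().lower().startswith("yes")

    claim = "The Eiffel Tower is in Paris."
    negation = "The Eiffel Tower is not in Paris."

    if ask_yes_no(claim) == ask_yes_no(negation):
        print("Inconsistent: the model affirmed (or denied) both the claim and its negation.")
    else:
        print("Consistent on this claim/negation pair.")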
If a model rarely has literally “no idea”, then what would you expect? What’s the threshold for “knowing” something? Tuning responses is one of the hard things to do, but as I mentioned before, you can peer into some of this “thought process”, if you will[4], literally by asking it to include that information in the response.
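Same mechanism as the confidence example above: the “peering” is just another instruction in the prompt, something along these lines (wording illustrative):

    # Sketch: ask the model to show the intermediate steps it is relying on.
    # Same assumptions as above (openai package, API key, example model name).
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": ("Before giving your final answer, list the intermediate "
                         "steps or facts you are relying on, then give the answer.")},
            {"role": "user",
             "content": "Which is heavier, a litre of water or a litre of olive oil?"},
        ],
    )
    print(response.choices[0].message.content)

Keep in mind the listed “steps” are themselves generated text, not a literal trace of the computation (see [4]).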
Which is bloody amazing! I’m not trying to downplay what we (the royal we) have already achieved. Mainly it would be good if we were all on the same page, as it were, at least as much as is possible (some folks think True Agreement is actually impossible, but I think we can get close).
[1] The nature of “Truth” is one of the Hard Questions for humans, much less our programs.
[2] Don’t get me started on the limits of provability in formal axiomatic theories!
[3] “Discovering Latent Knowledge in Language Models Without Supervision” (Burns et al., 2022)
[4] But please don’t[5]. ChatGPT is not “thinking” in the human sense.
[5] won’t? that’s the opposite of will, right? grammar is hard (for me, if not for some programs =])