I like this question—if it proves true that GPT-4 can produce recognizable ASCII art of things, that would mean it was somehow modelling an internal sense of vision and ability to recognize objects.
For this very reason, I was intrigued by the possibility of teaching them vision this way.
I think ASCII art in its general form is an unfair setup, though; ChatGPT has no way of knowing the spacing or width of individual letters (that is not how they perceive them), and hence they have no way of seeing which letters sit above each other. Basically, if you were given ASCII art as a string, but had no idea how the individual characters looked or what width they had, you would have no way to interpret the image.
ASCII art works because we perceive characters both in their visual shape and in their encoded meaning, and we also see them depicted in a particular font with particular kerning settings. That entails so much information that is simply missing for them. With many of the pictures they produce, you notice they are off in exactly the way you would expect if you didn't know the width of the individual characters.
This changes if we only use characters of equal width, and a square frame. Say only 8 and 0. And you tell them that each row is 8 characters long, so the ninth character sits right under the first, the tenth right under the second, and so on. This would enable them to learn the spatial relations between the numbers, first in 2D, then in 3D. I've been meaning to do that, see if I can teach them spatial reasoning that way, save the convo and report it to the developers for training data, but I was unsure whether that was a good way for them to retain the information, or whether it would become superfluous as image recognition is incoming.
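The encoding described above can be sketched in a few lines of Python. This is only an illustration of the idea, not anything from the original discussion; the names (`to_grid`, `position`, `WIDTH`) and the 8x8 hollow-square example are my own assumptions.

```python
# Sketch of the fixed-width grid idea: a flat string of equal-width
# characters (only "8" and "0") plus a known row length is enough to
# recover 2D positions without ever "seeing" the glyphs.

WIDTH = 8  # the row length the model is told about

def to_grid(flat: str, width: int = WIDTH) -> list[str]:
    """Split a flat string into rows of `width` characters."""
    return [flat[i:i + width] for i in range(0, len(flat), width)]

def position(index: int, width: int = WIDTH) -> tuple[int, int]:
    """(row, column) of the character at `index` in the flat string."""
    return index // width, index % width

# A hollow 8x8 square, written purely as a flat string:
flat = "88888888" + "80000008" * 6 + "88888888"

for row in to_grid(flat):
    print(row)

# "The ninth character will be right under the first character":
assert position(8) == (1, 0)
assert position(9) == (1, 1)
```

The point of the sketch is that spatial relations become pure arithmetic on indices, which is exactly the kind of information a model can use without any visual access to the rendered characters.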
I've seen attempts to not just request ASCII art, but teach it. And notably, ChatGPT learned across the conversation and improved, even though I was struck by how people were giving explanations that are terrible if the person you are explaining things to cannot see your interface. We need to explain ASCII art the way you would explain it to someone who is blind and feeling along a set of equally spaced beads, telling them to arrange the beads into a 3D construct in their head.
It is clearly something tricky for them, though. ChatGPT learned language first, not math; they struggle to do things like accurately counting characters. I find it all the more impressive that what they generate is not meaningless, and improves.
With a lot of scenarios where people say ChatGPT failed, I have found that the prompts as written did not work, but if you explain things gently and step by step, they can do it. You can aid the AI in figuring out the correct solution, and I find it fascinating that this is possible, that you can watch them learn through the conversation. The difference is really whether you want to prove an AI can't do something, or instead treat it like a mutual teaching interaction, as though you were teaching a bright student with specific disabilities. E.g. not seeing your interface visually is not a cognitive failing, and judging them for it reminds me of people mistaking hearing-impaired people for intellectually disabled people, because they keep mishearing instructions.